Template to retrieve the same standard information but from different sites

Tagged: structured, opengraph

This topic has 13 replies, 2 voices, and was last updated 6 years, 2 months ago by Suman M..

April 20, 2020 at 4:19 pm #1895

herve f
Participant

Hi,
I would like to retrieve the same standard information from different sites:
* the title,
* the first 200 characters,
* the url and
* if possible, the author, the creation date, the 1st image.
The difficulty is to avoid redoing the learning step for each site: I would give your plugin a list of URLs of web articles from different sites with different layouts, and it would try to recover as much of this information as possible!
Could your plugin go in this direction?
Regards


April 20, 2020 at 4:31 pm #1897

herve f
Participant

;;


April 20, 2020 at 5:02 pm #1896

herve f
Participant

This reply has been marked as private.


April 20, 2020 at 5:19 pm #1900

Suman M.
Keymaster

Hi, I checked it and you can scrape from the above source sites. But this one is not scrapable – https://www.pourleco.com/ca-clashe/debat-des-economistes/dominique-meda-la-crise-du-covid-19-nous-oblige-reevaluer-lutilite
Also I didn’t get this exactly – “I want to recover to make a short quote with the data mentioned in the previous post to make a wordpress article by article.”

Do you want to scrape the following fields? And anything else you want to be done? Do let us know.
* the title,
* the first 200 characters,
* the url and
* if possible, the author, the creation date, the 1st image.

And please send us the backend login details so that we can create a task for you. Thanks!


April 20, 2020 at 5:31 pm #1901

herve f
Participant

Hi,
Let me try to explain it better.
Today, for each article (a URL on a website), I manually retrieve the following information:
* the title,
* the url
* the first x characters,
* the author,
* the creation date,
* the 1st image.

With your help I would like to create a single task to retrieve this information, which any site should have. Sure, it will fail from time to time, but if I can recover 8 out of 10 sites correctly, that would be fine.
Concretely, I would put the list of URLs in a task and your plugin would collect as much of this information as possible. If this is not possible by positioning, could you consider, in the future, using Google structured data or Facebook Open Graph tags to fill in the fields more easily 🙂 ?

Regards
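
For readers unfamiliar with the Open Graph idea raised above, here is a minimal sketch of how those tags can be read from any page without a per-site template. It is not how the plugin works (the plugin does not support this, as stated later in the thread). The class name, function name, output keys and sample HTML are all hypothetical, and fetching the page itself is left out.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> and <meta property="article:..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        content = attributes.get("content")
        if name.startswith(("og:", "article:")) and content:
            # Keep only the first occurrence of each property.
            self.properties.setdefault(name, content.strip())


def extract_open_graph(html):
    """Map Open Graph properties to the fields requested in the thread.

    Fields that the page does not declare are returned as None.
    """
    parser = OpenGraphParser()
    parser.feed(html)
    og = parser.properties
    return {
        "title": og.get("og:title"),
        "url": og.get("og:url"),
        "description": og.get("og:description"),
        "image": og.get("og:image"),
        "author": og.get("article:author"),
        "published": og.get("article:published_time"),
    }


# Hypothetical page head, for illustration only.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="An example article" />
<meta property="og:url" content="https://example.com/articles/1" />
<meta property="og:image" content="https://example.com/images/1.jpg" />
<meta property="article:published_time" content="2020-04-20T16:19:00+02:00" />
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_open_graph(SAMPLE_HTML))
```

Because missing fields come back as None rather than raising an error, a page that only declares some tags still yields a partial record, which matches the "8 out of 10 sites" tolerance described above.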


April 20, 2020 at 5:33 pm #1902

herve f
Participant

This reply has been marked as private.


April 21, 2020 at 4:41 am #1905

Suman M.
Keymaster

This reply has been marked as private.


April 21, 2020 at 9:32 am #1908

herve f
Participant

Hi,
Thank you for your help, but did you understand what I wanted?
I knew that your plugin would be able to retrieve this information, but my request is more specific.
I would like to be able to retrieve hundreds of articles from dozens of different sites. It is unmanageable if I have to create a new task every time. It would be faster to do it manually!
This is why I was wondering whether there is a trick with the current Multiple Post scrape, or whether the other idea could work: recovering the metadata (structured data, Open Graph), because apart from the first 200 characters, most major sites now provide this structured data!
Regards
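
The one field mentioned above that metadata does not cover is the first 200 characters. A rough, layout-independent approximation is to take the text of the page's `<p>` elements and truncate it. The sketch below assumes the article body is written in `<p>` tags, which will not hold for every site; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphTextParser(HTMLParser):
    """Collects the text found inside <p> elements, in document order."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False
            # Separate consecutive paragraphs with a space.
            self.chunks.append(" ")

    def handle_data(self, data):
        if self._in_paragraph:
            self.chunks.append(data)


def first_characters(html, limit=200):
    """Return the first `limit` characters of the paragraph text."""
    parser = ParagraphTextParser()
    parser.feed(html)
    parser.close()  # Flush any text still buffered by the parser.
    # Collapse runs of whitespace (newlines, indentation) into single spaces.
    text = " ".join("".join(parser.chunks).split())
    return text[:limit]


# Hypothetical page body, for illustration only.
SAMPLE_HTML = (
    "<html><body><h1>Title</h1>"
    "<p>First <b>bold</b> paragraph.</p>"
    "<p>Second   paragraph.</p>"
    "</body></html>"
)

if __name__ == "__main__":
    print(first_characters(SAMPLE_HTML))
```

On real pages this will also pick up paragraphs from cookie banners or sidebars, so the result is best treated as a draft excerpt to review.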


April 22, 2020 at 7:58 am #1917

herve f
Participant

Hi,
Can you answer the previous, more urgent question?
I am also sending you a screenshot of a screen where I do not understand which choice to make.
Regards


April 22, 2020 at 8:22 am #1920

Suman M.
Keymaster

The above example is for Single Post scrape. You can also do Multiple Post scrape which will scrape all the items from a page like https://www.lemonde.fr/planete/
I’ve created task ‘lemonde.fr/planete – multiple-post’ in your site for this.

Also, you can do bulk scraping from multiple URLs at a time: https://support.wpbots.net/documentation/scraping-urls-in-bulk/
But for this, all the single posts must share the same pattern/HTML structure.


April 22, 2020 at 2:58 pm #1925

herve f
Participant

Hi,
My request was to scrape URLs with different structures / patterns.

This, however, is the method I would have liked:
paste into the same task URLs from different sites A, B … C.

1 / As they have a different structure, I asked whether you could use (when it exists) structured data from Google or Open Graph?
2 / Why, for Release, do I get this message when opening the task:
“XML or RSS Feed parsing?” And what is the best thing to do?
Regards


April 23, 2020 at 5:10 pm #1942

Suman M.
Keymaster

Yes, you can clone the task and then make required changes to it.

1) This is not supported by the plugin.

2) We’ll be fixing this. But as of now, you can simply Cancel that popup.


April 24, 2020 at 2:01 pm #1949

herve f
Participant

Hi,
1/ I expected that. I thought I had been clear, because the need for this basic information seems generic to me.
Are you interested in implementing this type of support?

If not, I had made other suggestions. Will you release a new version soon?
Regards


April 25, 2020 at 1:24 pm #1952

Suman M.
Keymaster

Hi,

1) Can you please explain this requirement using examples, so that we can decide further? Also, please give an example of structured data from Google and Open Graph.

Thanks & Regards!
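
As an illustration of the kind of example requested above: "structured data from Google" usually means schema.org markup embedded as JSON-LD in a `<script type="application/ld+json">` block. The sketch below shows a hypothetical sample block and one way to read the title, author, date and image from it. The names and the sample page are assumptions for illustration, not plugin code, and real pages vary (for example, `@type` may be a list, which this sketch ignores).

```python
import json
from html.parser import HTMLParser

ARTICLE_TYPES = ("Article", "NewsArticle", "BlogPosting")


class JsonLdParser(HTMLParser):
    """Collects the raw contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def _first(value, key):
    """Reduce a string, object, or list of either to a single string."""
    if isinstance(value, list):
        value = value[0] if value else None
    if isinstance(value, dict):
        value = value.get(key)
    return value


def extract_article(html):
    """Return fields of the first article-like JSON-LD object, or None."""
    parser = JsonLdParser()
    parser.feed(html)
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Skip malformed blocks instead of failing the page.
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") in ARTICLE_TYPES:
                return {
                    "title": item.get("headline"),
                    "author": _first(item.get("author"), "name"),
                    "published": item.get("datePublished"),
                    "image": _first(item.get("image"), "url"),
                }
    return None


# Hypothetical page, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "An example article",
  "datePublished": "2020-04-20T16:19:00+02:00",
  "author": {"@type": "Person", "name": "Jane Doe"},
  "image": ["https://example.com/images/1.jpg"]
}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_article(SAMPLE_HTML))
```

The same field names appear on every site that publishes this markup, so a list of URLs from different sites could be processed with a single routine instead of one task per layout.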
