Template to retrieve the same standard information but from different sites
Tagged: structured, opengraph
This topic has 13 replies and 2 voices, and was last updated 4 years, 8 months ago by Suman M.

April 20, 2020 at 4:19 pm #1895 herve f (Participant)
Hi,
I am looking for a template to retrieve the same standard information, but from different sites:
* the title,
* the first 200 characters,
* the url and
* if possible, the author, the creation date, the 1st image.
The challenge is to avoid redoing the learning step for each site: ideally, I would give your plugin a list of URLs of web articles from different sites with different layouts, and it would recover as much of this information as possible.
Could your plugin work this way?
Regards

April 20, 2020 at 4:31 pm #1897 herve f (Participant)

April 20, 2020 at 5:02 pm #1896 herve f (Participant)
This reply has been marked as private.

April 20, 2020 at 5:19 pm #1900 Suman M. (Keymaster)
Hi, I checked, and you can scrape the source sites above. However, this one is not scrapable: https://www.pourleco.com/ca-clashe/debat-des-economistes/dominique-meda-la-crise-du-covid-19-nous-oblige-reevaluer-lutilite
Also, I didn’t quite understand this: “I want to recover to make a short quote with the data mentioned in the previous post to make a wordpress article by article.” Do you want to scrape the following fields? Is there anything else you want done? Let us know.
* the title,
* the first 200 characters,
* the url and
* if possible, the author, the creation date, the 1st image.
Please also send us your backend login details so that we can create the task for you. Thanks!

April 20, 2020 at 5:31 pm #1901 herve f (Participant)
Hi,
Let me try to explain it better.
Today, for each article (a URL on some website), I manually retrieve the following information:
* the title,
* the url
* the first x characters,
* the author,
* the creation date,
* the 1st image.
With your help, I would like to create a single task that retrieves this information, which any site should have. Sure, it will fail from time to time, but if I can recover 8 out of 10 sites correctly, that would be fine.
Concretely, I would put the list of URLs into one task and your plugin would collect as much of this information as it can. If this is not possible with positional (element-based) selection, could you consider, in the future, using Google structured data or Facebook Open Graph metadata to fill in the fields more easily? 🙂
Regards
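The Open Graph idea can be illustrated with a minimal Python sketch (standard library only). It is not a feature of the plugin, and the names `OpenGraphParser` and `extract_metadata` are hypothetical. It assumes the page HTML has already been downloaded, reads the `og:title`, `og:url`, `og:image`, `article:published_time` and `article:author` properties defined by the Open Graph protocol (plus a plain `<meta name="author">` fallback), and, as a rough heuristic, builds the excerpt from the first 200 characters of `<p>` text, because `og:description` is a summary rather than the article's opening. Note that `article:author` is usually a profile URL, not a display name.

```python
from html.parser import HTMLParser

# Open Graph properties mapped to the fields requested in this thread.
# The article:* properties belong to the Open Graph "article" object type.
OG_FIELDS = {
    "og:title": "title",
    "og:url": "url",
    "og:image": "image",
    "article:published_time": "date",
    "article:author": "author",  # often a profile URL rather than a name
}


class OpenGraphParser(HTMLParser):
    """Collect Open Graph <meta> values and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            content = (attrs.get("content") or "").strip()
            if not content:
                return
            if key in OG_FIELDS:
                # setdefault keeps the first value seen, e.g. the first og:image.
                self.fields.setdefault(OG_FIELDS[key], content)
            elif key == "author":
                # Plain HTML <meta name="author">; whichever author tag comes first wins.
                self.fields.setdefault("author", content)
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested inline tags (<a>, <em>, ...) also lands here.
        if self._in_p:
            self.paragraphs[-1] += data


def extract_metadata(html, excerpt_len=200):
    """Return the Open Graph fields found plus a plain-text excerpt."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace inside each paragraph and drop empty paragraphs.
    text = " ".join(" ".join(p.split()) for p in parser.paragraphs if p.strip())
    result = dict(parser.fields)
    result["excerpt"] = text[:excerpt_len]
    return result
```

A site that lacks some of these tags simply yields a result without the corresponding keys, so a batch over many URLs degrades per field instead of failing outright.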

April 20, 2020 at 5:33 pm #1902 herve f (Participant)
This reply has been marked as private.

April 21, 2020 at 4:41 am #1905 Suman M. (Keymaster)
This reply has been marked as private.

April 21, 2020 at 9:32 am #1908 herve f (Participant)
Hi,
Thank you for your help, but I’m not sure you understood what I wanted.
I knew your plugin could retrieve this information, but my request is more specific.
I would like to retrieve hundreds of articles from dozens of different sites. It’s unmanageable if I have to create a new task every time; it would be faster to do it manually!
This is why I was wondering whether there is currently a trick with the Multiple Post mode, or whether my other idea, recovering the metadata (structured data, Open Graph), could work: apart from the first 200 characters, most major sites now publish this structured data!
Regards

April 22, 2020 at 7:58 am #1917 herve f (Participant)
Hi,
Could you answer my previous question, which is more urgent?
I am also sending you a screenshot of the screen where I do not understand which option to choose.
Regards
Attachment: screenshot

April 22, 2020 at 8:22 am #1920 Suman M. (Keymaster)
The above example is for a Single Post scrape. You can also do a Multiple Post scrape, which scrapes all the items from a page like https://www.lemonde.fr/planete/
I’ve created the task ‘lemonde.fr/planete – multiple-post’ on your site for this.
You can also do bulk scraping of multiple URLs at a time: https://support.wpbots.net/documentation/scraping-urls-in-bulk/
However, for this, all the single posts must share the same pattern/HTML structure.

April 22, 2020 at 2:58 pm #1925 herve f (Participant)
Hi,
My request was to scrape URLs with different structures/patterns. This is, however, the method I would have liked: put different URLs from sites A, B … C into the same task.
1/ As they have different structures, I asked whether you could use Google structured data or Open Graph (when they exist)?
2/ Why, for Release, do I get the message “XML or RSS Feed parsing?” when opening the task, and what is the best thing to do?
Regards

April 23, 2020 at 5:10 pm #1942 Suman M. (Keymaster)
Yes, you can clone the task and then make the required changes to it.
1) This is not supported by the plugin.
2) We’ll be fixing this. For now, you can simply cancel that popup.

April 24, 2020 at 2:01 pm #1949 herve f (Participant)
Hi,
1/ I expected as much. I thought I had been clear, because the need for this basic information seems generic to me.
Would you be interested in implementing this kind of support? If not, I had made other suggestions; will a new version be available soon?
Regards

April 25, 2020 at 1:24 pm #1952 Suman M. (Keymaster)
Hi,
1) Could you please explain this requirement with examples, so that we can decide how to proceed? Please also give an example of Google structured data and Open Graph.
Thanks & Regards!
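As an illustration of what Google structured data looks like in practice: many news sites embed a schema.org `NewsArticle` (or `Article`/`BlogPosting`) object as JSON-LD inside `<script type="application/ld+json">`, with properties such as `headline`, `image`, `datePublished` and `author`. The minimal Python sketch that follows uses hypothetical names (`JsonLdParser`, `extract_article`) and is not a plugin feature. It assumes the HTML is already downloaded and returns the fields of the first article-typed object it finds, including one nested in an `@graph` array. Because real pages vary, it skips malformed JSON and returns `None` when no article object exists.

```python
import json
from html.parser import HTMLParser

# schema.org types that typically describe a news or blog article.
ARTICLE_TYPES = {"Article", "NewsArticle", "BlogPosting"}


class JsonLdParser(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_ld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

    def handle_data(self, data):
        if self._in_ld:
            self.blocks[-1] += data


def _iter_nodes(node):
    """Yield every JSON object, descending into lists and "@graph" arrays."""
    if isinstance(node, list):
        for item in node:
            yield from _iter_nodes(item)
    elif isinstance(node, dict):
        yield node
        yield from _iter_nodes(node.get("@graph", []))


def _first_str(value, key=None):
    """Reduce a schema.org value (string, object or list) to one string."""
    if isinstance(value, list):
        value = value[0] if value else None
    if isinstance(value, dict) and key:
        value = value.get(key)
    return value if isinstance(value, str) else None


def extract_article(html):
    """Return title/url/image/author/date of the first article object, else None."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is not unusual; skip this block
        for node in _iter_nodes(data):
            types = node.get("@type")
            if isinstance(types, str):
                types = [types]
            if not isinstance(types, list):
                continue
            if any(isinstance(t, str) and t in ARTICLE_TYPES for t in types):
                return {
                    "title": _first_str(node.get("headline")),
                    "url": _first_str(node.get("url"))
                    or _first_str(node.get("mainEntityOfPage"), "@id"),
                    "image": _first_str(node.get("image"), "url"),
                    "author": _first_str(node.get("author"), "name"),
                    "date": _first_str(node.get("datePublished")),
                }
    return None
```

Structured data carries no excerpt of the body text, so the first 200 characters would still have to come from the page content itself.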