Template to retrieve the same standard information but from different sites


Viewing 14 posts - 1 through 14 (of 14 total)
  • #1895
    herve Dherve f
    Participant

    Hi,
    I would like to retrieve the same standard information from different sites:
    * the title,
    * the first 200 characters,
    * the URL and
    * if possible, the author, the creation date, the 1st image.
    The difficulty is that I do not want to repeat the learning step for each site. Instead, I would give your plugin a list of URLs of web articles from different sites with different layouts, and it would try to recover as much of this information as possible!
    Could your plugin go in this direction?
    Regards

    #1897
    herve Dherve f
    Participant

    ;;

    #1896
    herve Dherve f
    Participant
    This reply has been marked as private.
    #1900
    Suman M.
    Keymaster

    Hi, I checked it and you can scrape from the above source sites. But this one is not scrapable: https://www.pourleco.com/ca-clashe/debat-des-economistes/dominique-meda-la-crise-du-covid-19-nous-oblige-reevaluer-lutilite
    Also, I did not quite understand this: “I want to recover to make a short quote with the data mentioned in the previous post to make a wordpress article by article.”

    Do you want to scrape the following fields? Is there anything else you want done? Do let us know.
    * the title,
    * the first 200 characters,
    * the URL and
    * if possible, the author, the creation date, the 1st image.

    Please also send us the backend login details so that we can create the task for you. Thanks!

    #1901
    herve Dherve f
    Participant

    Hi,
    Let me try to explain it better.
    Today, for each article (a URL on a website), I manually retrieve the following information:
    * the title,
    * the URL,
    * the first x characters,
    * the author,
    * the creation date,
    * the 1st image.

    With your help, I would like to create a single task to retrieve this information, which any site should have. Sure, it will fail from time to time, but if I can recover 8 out of 10 sites correctly, that would be fine.
    Concretely, I would put the list of URLs in a task and your plugin would collect as much of this information as possible. If this is not possible with positioning, could you consider, in the future, using Google structured data or Facebook Open Graph to fill in the fields more easily 🙂?

    Regards
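
For readers unfamiliar with the Open Graph idea raised in this post: many sites publish the title, URL, description and first image as `<meta property="og:...">` tags in the page head, so the same code can read them regardless of layout. The following is a minimal Python sketch of that idea, not a feature of the plugin. The function name `extract_open_graph`, the sample page and the 200-character limit are assumptions made for illustration; `og:description` is a site-written summary rather than the true first 200 characters of the article, and `article:author` is often missing or a profile URL.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta> tags whose property starts with "og:" or "article:"."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if name and content and name.startswith(("og:", "article:")):
            # Keep the first value seen, e.g. the first og:image on the page.
            self.properties.setdefault(name, content.strip())


def extract_open_graph(html):
    """Map Open Graph tags onto the fields requested in the thread.

    Hypothetical helper: fields the page does not declare come back as None.
    """
    parser = OpenGraphParser()
    parser.feed(html)
    og = parser.properties
    return {
        "title": og.get("og:title"),
        "url": og.get("og:url"),
        "excerpt": (og.get("og:description") or "")[:200] or None,
        "author": og.get("article:author"),
        "date": og.get("article:published_time"),
        "image": og.get("og:image"),
    }


# Made-up sample page, used only to show the shape of the markup.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example article">
<meta property="og:url" content="https://example.com/article">
<meta property="og:description" content="A short summary of the article.">
<meta property="og:image" content="https://example.com/cover.jpg">
<meta property="article:published_time" content="2020-05-01T08:00:00+02:00">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_open_graph(SAMPLE_HTML))
```

Because missing fields are returned as `None` instead of raising an error, a list of URLs from unrelated sites could be processed in one loop, with incomplete results reviewed by hand afterwards.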

    #1902
    herve Dherve f
    Participant
    This reply has been marked as private.
    #1905
    Suman M.
    Keymaster
    This reply has been marked as private.
    #1908
    herve Dherve f
    Participant

    Hi,
    Thank you for your help, but did you understand what I wanted?
    I knew that your plugin would be able to retrieve this information, but my request is more specific.
    I would like to be able to retrieve hundreds of articles from dozens of different sites. It is unmanageable if I have to create a new task every time. I would be faster doing it manually!
    This is why I was wondering whether there is a trick with the current Multiple Post feature, or whether you could instead recover the metadata (structured data, Open Graph): apart from the first 200 characters, most major sites now provide this structured data!
    Regards

    #1917
    herve Dherve f
    Participant

    Hi,
    Could you answer my previous question, which is the more urgent one?
    I am also sending you a screenshot of the screen where I do not understand which choice to make.
    Regards

    #1920
    Suman M.
    Keymaster

    The above example is for a Single Post scrape. You can also do a Multiple Post scrape, which will scrape all the items from a page like https://www.lemonde.fr/planete/
    I’ve created the task ‘lemonde.fr/planete – multiple-post’ on your site for this.

    Also, you can do bulk scraping from multiple URLs at a time: https://support.wpbots.net/documentation/scraping-urls-in-bulk/
    But for this, all the single posts must share the same pattern/HTML structure.

    #1925
    herve Dherve f
    Participant

    Hi,
    My request was to scrape URLs with different structures/patterns.

    This is, however, the method I would have liked:
    paste into the same task different URLs from sites A, B … C.

    1/ As they have different structures, I asked whether you could use (when it exists) Google structured data or Open Graph.
    2/ Why, for Release, do I get the message “XML or RSS Feed parsing?” when opening the task,
    and what is the best thing to do?
    Regards

    #1942
    Suman M.
    Keymaster

    Yes, you can clone the task and then make the required changes to it.

    1) This is not supported by the plugin.

    2) We’ll be fixing this. For now, you can simply cancel that popup.

    #1949
    herve Dherve f
    Participant

    Hi,
    1/ I expected that. I thought I had been clear, because the need for this basic information seems generic to me.
    Are you interested in implementing this type of support?

    If not, I had made other suggestions. Will you release a new version soon?
    Regards

    #1952
    Suman M.
    Keymaster

    Hi,

    1) Could you please explain this requirement using examples, so that we can decide how to proceed? Please also give an example of Google structured data and Open Graph.

    Thanks & Regards!
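
As one possible illustration of the structured data asked about here: the markup that search engines read is commonly a schema.org object embedded as JSON-LD inside a `<script type="application/ld+json">` tag, with properties such as `headline`, `author`, `datePublished` and `image`. The Python sketch below shows what such markup can look like and how it could be read. It is an assumption-laden example, not the plugin's behaviour: the sample page, the helper names (`extract_json_ld`, `find_article`, `first_value`) and the list of accepted `@type` values are all hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


# Assumed set of article-like types; real sites may use others.
ARTICLE_TYPES = {"Article", "NewsArticle", "BlogPosting"}


def find_article(node):
    """Search lists and @graph containers for the first article-like object."""
    if isinstance(node, list):
        for item in node:
            found = find_article(item)
            if found:
                return found
    elif isinstance(node, dict):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if ARTICLE_TYPES.intersection(types):
            return node
        return find_article(node.get("@graph", []))
    return None


def first_value(value):
    """author and image may be a string, an object, or a list of either."""
    if isinstance(value, list):
        value = value[0] if value else None
    if isinstance(value, dict):
        return value.get("name") or value.get("url")
    return value


def extract_json_ld(html):
    """Return title, author, date and image from the first article found."""
    parser = JsonLdParser()
    parser.feed(html)
    for block in parser.blocks:
        try:
            article = find_article(json.loads(block))
        except json.JSONDecodeError:
            continue  # Skip blocks that are not valid JSON.
        if article:
            return {
                "title": article.get("headline"),
                "author": first_value(article.get("author")),
                "date": article.get("datePublished"),
                "image": first_value(article.get("image")),
            }
    return None


# Made-up sample page showing the shape of JSON-LD article markup.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example article",
  "datePublished": "2020-05-01T08:00:00+02:00",
  "author": {"@type": "Person", "name": "Jane Doe"},
  "image": ["https://example.com/cover.jpg"]
}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_json_ld(SAMPLE_HTML))
```

The function returns `None` when a page carries no article markup, so a caller could fall back to Open Graph tags or to a site-specific task for those pages.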
