Scraping for Content is not working

Viewing 15 posts - 1 through 15 (of 17 total)
    #2248
    Martin Wollank
    Participant

    Hey guys,

    I bought this plugin to scrape content from Steam. I want to pull items from the Steam Workshop onto my site:

    https://steamcommunity.com/workshop/browse/?appid=1213210&browsesort=mostrecent&section=readytouseitems&actualsort=mostrecent&p=1

    It doesn’t work as expected: it fails completely almost every time, and occasionally it scrapes just 1 item. It is essential for me to get this working; otherwise I really don’t need this plugin. I suspect there is some kind of server-side blocking by Steam to prevent this kind of content fetching.

    I already tried increasing the delay per processed URL, and I raised the PHP memory and time limits, but it didn’t help at all.

    Thank you in advance!

    #2249
    Suman M.
    Keymaster

    Hi, thanks for contacting us. Please try using the following XPath for Item’s Path ( https://www.screencast.com/t/tl1RP4rtaYr )

    //div[contains(@class,”workshopItem”)]//div[contains(@class,”workshopItemTitle”)]/ancestor::a

    I used it and could scrape from the given site. Do let us know.

    #2251
    Martin Wollank
    Participant

    Hey thanks for your quick answer,

    Unfortunately there must be a mistake in the syntax you gave me. I checked a few times and copied and pasted it correctly, but it can’t find any items with the XPath provided:

    //div[contains(@class,”workshopItem”)]//div[contains(@class,”workshopItemTitle”)]/ancestor::a

    It says Items count 0 and I can’t advance to the Post content.

    #2252
    Suman M.
    Keymaster

    Oops! Sorry for that. The forum’s text editor converted the straight double quotes into curly ones. Below is the correct XPath:

    //div[contains(@class,"workshopItem")]//div[contains(@class,"workshopItemTitle")]/ancestor::a
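    The quote problem described above can be guarded against before an expression is pasted into a scraper. The following is a minimal Python sketch, not part of the plugin: the helper name `normalize_xpath_quotes` and the set of characters it maps are assumptions chosen for illustration.

```python
# Typographic quotes that rich-text editors often substitute for ASCII quotes.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def normalize_xpath_quotes(xpath: str) -> str:
    """Return the XPath with curly quotes replaced by straight ones (hypothetical helper)."""
    return xpath.translate(SMART_QUOTES)


# The expression as it appeared after the forum editor altered it.
pasted = (
    "//div[contains(@class,\u201dworkshopItem\u201d)]"
    "//div[contains(@class,\u201dworkshopItemTitle\u201d)]/ancestor::a"
)
print(normalize_xpath_quotes(pasted))
```

    Running a copied expression through a helper like this restores the straight quotes that XPath string literals require.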

    #2324
    Martin Wollank
    Participant

    Sorry to say, but after a few more attempts it still causes problems. It looks like the target has some anti-scraping mechanism activated. So I tried some other pages, which had different problems. This plugin might be useful, but I have absolutely no use for it in my case because none of the things I want to achieve with it work.

    Sorry, but I want my purchase to be rolled back/refunded.

    #2329
    Suman M.
    Keymaster

    I tried to scrape from the above source site and could fetch the content. Can you please let us know your website’s wp-admin login details so that we can check the issue there?
    If it doesn’t work we’ll be happy to make the refund. Thanks for your understanding.

    #2335
    Martin Wollank
    Participant

    Unfortunately I cannot give you administrative access, because we hold sensitive customer data of thousands of users and this would not be allowed under GDPR without a proper data processing agreement. But can you tell me the XPaths with which you managed to grab the players from this leaderboard, so that we can present them on our page?

    https://cnc.community/command-and-conquer-remastered/leaderboard/tiberian-dawn

    Additionally, how did you manage to get the items from the Steam Workshop without being soft-banned from scraping their content? :/

    #2344
    Suman M.
    Keymaster

    Alright, we understand your privacy concerns. I could scrape from https://steamcommunity.com. I have given the XPath in my previous post – https://support.wpbots.net/support/topic/scraping-for-content-is-not-working/page/3/#post-2252

    Please let us know your scraper task ID so that we can check – https://www.screencast.com/t/vG5tD4XGW

    • This reply was modified 3 years, 9 months ago by Suman M.
    #2346
    Martin Wollank
    Participant

    As said, the Steam Workshop scraping worked at first but failed after a number of items. I will take a thorough look at both cases in a few hours when I am at home and let you know the details and problems for each.
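    If failures that start after a number of items come from request throttling (an assumption; nothing in this thread confirms how Steam limits requests), spacing out retries is one thing to try outside the plugin. Below is a minimal Python sketch of retrying with a doubling delay. The `fetch` callable, the parameter names, and the delay values are hypothetical and do not reflect how the plugin works.

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[str], Optional[str]],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url) until it returns content, doubling the wait after each failure.

    `fetch` is a hypothetical callable that returns the page body,
    or None when the request was refused or came back empty.
    """
    for attempt in range(max_attempts):
        body = fetch(url)
        if body is not None:
            return body
        # Wait before the next try, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None


if __name__ == "__main__":
    # Stand-in fetcher that fails twice and then succeeds; no network access.
    attempts = []

    def flaky_fetch(url: str) -> Optional[str]:
        attempts.append(url)
        return "<html>ok</html>" if len(attempts) == 3 else None

    result = fetch_with_backoff(flaky_fetch, "https://example.com/", sleep=lambda s: None)
    print(result, "after", len(attempts), "attempts")
```

    Passing `sleep` as a parameter keeps the sketch testable without real waiting; in actual use the default `time.sleep` would apply.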

    #2351
    Martin Wollank
    Participant

    The Task ID is 141377c3718331d2ca882a31866a13c3

    #2353
    Suman M.
    Keymaster
    This reply has been marked as private.
    #2364
    Martin Wollank
    Participant

    Yep, as said in #2324 😉 That’s the problem. So this does not seem to work as expected; therefore the only remaining use case that might be useful would be to scrape these leaderboards: https://cnc.community/command-and-conquer-remastered/leaderboard/tiberian-dawn but I am not sure how to start here :/

    #2371
    Suman M.
    Keymaster
    This reply has been marked as private.
    #2375
    Martin Wollank
    Participant

    Error 404 🙁

    Task ID is: 7a8822db99cb26a45e81ee0be51997ef

    Scraping the complete HTML table is not an option because the formatting is very different; it looks really bad :/

    #2376
    Suman M.
    Keymaster

    We tried to scrape from this site and it’s not returning any data to the bots. Thus this site cannot be scraped.
