Scraping for Content is not working
- This topic has 16 replies, 2 voices, and was last updated 4 years, 4 months ago by Suman M..
June 6, 2020 at 10:07 pm #2248 Martin Wollank (Participant)
Hey guys,
I bought this plugin to scrape content from Steam; I want to show items from the Steam Workshop on my site.
It doesn’t work as expected: it fails completely almost every time, and occasionally it scrapes only 1 item. It is essential for me to get this working; otherwise I really don’t need this plugin. I suspect there is some kind of server-side blocking by Steam to prevent such content fetching.
I already tried increasing the delay per processed URL, and I increased the PHP memory and time limits, but it didn’t help at all.
Thank you in advance!
June 7, 2020 at 6:24 am #2249 Suman M. (Keymaster)
Hi, thanks for contacting us. Please try using the following XPath for the Item’s Path ( https://www.screencast.com/t/tl1RP4rtaYr ):
//div[contains(@class,”workshopItem”)]//div[contains(@class,”workshopItemTitle”)]/ancestor::a
I used it and could scrape from the given site. Do let us know.
June 7, 2020 at 3:15 pm #2251 Martin Wollank (Participant)
Hey, thanks for your quick answer.
Unfortunately there must be some mistake in the syntax you gave me. I checked a few times and copied and pasted it correctly, but it can’t find any item with the XPath provided:
//div[contains(@class,”workshopItem”)]//div[contains(@class,”workshopItemTitle”)]/ancestor::a
It says Items count 0 and I can’t advance to the Post content.
June 7, 2020 at 4:33 pm #2252 Suman M. (Keymaster)
Oops! Sorry for that. The forum’s text editor changed the double quotes. Below is the correct XPath:
//div[contains(@class,"workshopItem")]//div[contains(@class,"workshopItemTitle")]/ancestor::a
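For anyone hitting the same "Items count 0" result after pasting an XPath from a web page: typographic ("smart") quotes are not valid string delimiters in XPath, so the expression has to use plain ASCII quotes. Below is a minimal Python sketch for cleaning a pasted expression. The function name normalize_xpath_quotes and the quote table are illustrative assumptions and not part of the plugin.

```python
# Map typographic quotes, as inserted by some rich-text editors,
# to the plain ASCII quotes that XPath string literals require.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # low double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def normalize_xpath_quotes(xpath: str) -> str:
    """Return the expression with smart quotes replaced by ASCII quotes."""
    return xpath.translate(str.maketrans(SMART_QUOTES))


# The expression as the forum editor mangled it (right double quotes, U+201D).
pasted = (
    "//div[contains(@class,\u201dworkshopItem\u201d)]"
    "//div[contains(@class,\u201dworkshopItemTitle\u201d)]/ancestor::a"
)
fixed = normalize_xpath_quotes(pasted)
print(fixed)
```

Pasting the cleaned string into the Item's Path field should then match the corrected XPath given above.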
June 30, 2020 at 6:22 am #2324 Martin Wollank (Participant)
Sorry to say, but after a few more attempts it still causes problems. It looks like the target has some anti-scraping mechanisms activated. So I tried some other pages, with different problems. This plugin might be useful, but I can say that I have absolutely no use for it in my case, because none of the things I want to achieve with it work.
Sorry, but I want my purchase to be rolled back/refunded.
June 30, 2020 at 3:51 pm #2329 Suman M. (Keymaster)
I tried to scrape from the above source site and could fetch the content. Can you please let us know your website’s wp-admin login details so that we can check the issue there?
If it doesn’t work we’ll be happy to make the refund. Thanks for your understanding.
July 1, 2020 at 8:02 pm #2335 Martin Wollank (Participant)
Unfortunately I cannot give you administrative access, because we hold sensitive customer data of thousands of users and this would not be allowed under GDPR without a proper data processing agreement. But can you tell me the XPaths with which you managed to grab the players from this leaderboard, to present them on our page?
https://cnc.community/command-and-conquer-remastered/leaderboard/tiberian-dawn
Additionally, how did you manage to get the items from the Steam Workshop without being soft-banned from scraping their content? :/
July 2, 2020 at 11:14 am #2344 Suman M. (Keymaster)
Alright, we understand your privacy concerns. I could scrape from https://steamcommunity.com. I gave the XPath in my previous post: https://support.wpbots.net/support/topic/scraping-for-content-is-not-working/page/3/#post-2252
Please let us know your scraper task ID so that we can check: https://www.screencast.com/t/vG5tD4XGW
July 2, 2020 at 11:54 am #2346 Martin Wollank (Participant)
As I said, the Steam Workshop scraping worked at first but failed after a number of items. I will take an extensive look at both cases in a few hours when I am at home and let you know the details and problems in both cases.
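A scrape that works at first and then fails after a number of items would be consistent with server-side rate limiting, although nothing in this thread confirms that. As an illustration only, and not a description of how the plugin works internally, here is a minimal Python sketch of spacing out requests and backing off when throttled. The helper name fetch_politely, the fetch callback, and the assumption that throttling shows up as HTTP 429 or 503 are all hypothetical.

```python
import time


def fetch_politely(urls, fetch, delay=2.0, max_retries=3, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    fetch(url) is a caller-supplied function returning (status_code, body).
    On a 429 or 503 status (assumed here to mean "slow down"), wait and
    retry with a doubling delay, up to max_retries retries per URL.
    """
    results = []
    for url in urls:
        wait = delay
        for attempt in range(max_retries + 1):
            status, body = fetch(url)
            if status not in (429, 503):
                results.append((url, status, body))
                break
            if attempt < max_retries:
                sleep(wait)   # back off before the next attempt
                wait *= 2     # exponential backoff
        else:
            # Every attempt was throttled: record the failure and move on.
            results.append((url, status, None))
        sleep(delay)          # fixed pause between consecutive URLs
    return results


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs without network access:
    # it reports throttling once, then succeeds.
    seen = []

    def fake_fetch(url):
        seen.append(url)
        return (429, "") if len(seen) == 1 else (200, "<html>ok</html>")

    print(fetch_politely(["https://example.com/item/1"], fake_fetch, delay=0.1))
```

In a real run, fetch would wrap an actual HTTP client. The pause and retry counts are arbitrary starting points rather than values known to work for any particular site.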
July 2, 2020 at 10:06 pm #2351 Martin Wollank (Participant)
The Task ID is 141377c3718331d2ca882a31866a13c3
July 3, 2020 at 5:52 am #2353 Suman M. (Keymaster)
This reply has been marked as private.
July 6, 2020 at 2:17 pm #2364 Martin Wollank (Participant)
Yep, as I said in #2324 😉 That’s the problem. So this does not seem to work as expected; therefore the only use case left that might be useful would be to scrape these leaderboards: https://cnc.community/command-and-conquer-remastered/leaderboard/tiberian-dawn but I am not sure how to start here :/
July 6, 2020 at 4:25 pm #2371 Suman M. (Keymaster)
This reply has been marked as private.
July 7, 2020 at 5:35 am #2375 Martin Wollank (Participant)
Error 404 🙁
Task ID is: 7a8822db99cb26a45e81ee0be51997ef
Scraping the complete HTML table is not an option because of the very different formatting. It looks really bad :/
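One alternative to copying a whole table with its original styling is to pull out only the cell text, row by row, and re-render it in the target site's own markup. The Python sketch below uses only the standard library and a made-up table. The names TableRows and table_rows and the sample HTML are hypothetical, and the approach only applies if the page's HTML can be retrieved in the first place.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html: str):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    return parser.rows


# Made-up sample markup, not taken from the leaderboard page.
sample = (
    "<table>"
    "<tr><th>Rank</th><th>Player</th></tr>"
    "<tr><td>1</td><td><a href='#'>Alice</a></td></tr>"
    "</table>"
)
print(table_rows(sample))
```

Because only the text is kept, links, images, and inline styles from the source table are dropped, which is the point when the original formatting clashes with the destination page.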
July 7, 2020 at 1:15 pm #2376 Suman M. (Keymaster)
We tried to scrape from this site and it’s not returning any data to the bots. Thus this site cannot be scraped.