Scraping for Content is not working

This topic has 16 replies, 2 voices, and was last updated 6 years ago by Suman M..

Viewing 15 posts - 1 through 15 (of 17 total)


Author

Posts
June 6, 2020 at 10:07 pm #2248

Martin Wollank
Participant

Hey guys,

I bought this plugin to scrape content from Steam. I want to pull items from the Steam Workshop onto my site:

https://steamcommunity.com/workshop/browse/?appid=1213210&browsesort=mostrecent&section=readytouseitems&actualsort=mostrecent&p=1

It doesn’t work as expected: it fails completely almost every time, and occasionally it scrapes just 1 item. It is essential for me to get this working, otherwise I really don’t need this plugin. I suspect that Steam has some kind of server-side blocking in place to prevent this sort of content fetching.

I already tried to increase the delay per processed URL and I increased the PHP memory and time limits but it didn’t help at all.

Thank you in advance!


June 7, 2020 at 6:24 am #2249

Suman M.
Keymaster

Hi, thanks for contacting us. Please try using the following XPath for Item’s Path ( https://www.screencast.com/t/tl1RP4rtaYr )

//div[contains(@class,”workshopItem”)]//div[contains(@class,”workshopItemTitle”)]/ancestor::a

I used it and could scrape from the given site. Do let us know.
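
To illustrate what the Item’s Path expression above is meant to select, here is a minimal sketch in Python. It is not how the plugin evaluates XPath. The markup in `SAMPLE` is invented and heavily simplified (the real Steam Workshop page is not reproduced here), and `has_class_fragment` and `item_links` are hypothetical helper names. Python's standard `xml.etree.ElementTree` does not support `contains()` or the `ancestor::` axis, so the same logic is spelled out by hand: find every title `div` that sits inside a workshop item `div`, then collect its enclosing `a` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for illustration only.
SAMPLE = """
<div>
  <div class="workshopItem">
    <a href="https://example.invalid/item/1">
      <div class="workshopItemTitle ellipsis">First item</div>
    </a>
  </div>
  <div class="workshopItem">
    <a href="https://example.invalid/item/2">
      <div class="workshopItemTitle ellipsis">Second item</div>
    </a>
  </div>
  <div class="other">
    <a href="https://example.invalid/skip">
      <div class="title">Not a workshop item</div>
    </a>
  </div>
</div>
"""


def has_class_fragment(el, fragment):
    """Rough stand-in for div[contains(@class, fragment)]."""
    return el.tag == "div" and fragment in el.get("class", "")


def item_links(root):
    """Collect the <a> ancestors of title divs nested inside item divs."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}

    def ancestors(el):
        while el in parent:
            el = parent[el]
            yield el

    links = []
    for title in root.iter("div"):
        if not has_class_fragment(title, "workshopItemTitle"):
            continue
        chain = list(ancestors(title))
        # The title must be a descendant of a div whose class contains "workshopItem".
        if not any(has_class_fragment(a, "workshopItem") for a in chain):
            continue
        for a in chain:
            if a.tag == "a" and a not in links:
                links.append(a)
    return links


links = item_links(ET.fromstring(SAMPLE))
hrefs = [a.get("href") for a in links]
```

With this sample, `hrefs` holds the two item URLs and skips the third link, because its inner `div` carries neither class fragment.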


June 7, 2020 at 3:15 pm #2251

Martin Wollank
Participant

Hey thanks for your quick answer,

Unfortunately there must be a mistake in the syntax you gave me. I checked a few times and copied and pasted it correctly, but it can’t find a single item with the XPath provided:

//div[contains(@class,”workshopItem”)]//div[contains(@class,”workshopItemTitle”)]/ancestor::a

It says “Items count 0” and I can’t advance to the Post content step.


June 7, 2020 at 4:33 pm #2252

Suman M.
Keymaster

Oops! Sorry for that. The forum’s text editor has changed the double quotes. Below is the correct XPath:

//div[contains(@class,"workshopItem")]//div[contains(@class,"workshopItemTitle")]/ancestor::a
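
For anyone hitting the same copy-and-paste problem, a small sketch of how typographic quotes could be converted back to plain ASCII quotes before an expression is pasted into the plugin. This is an illustration in Python, not part of the plugin; `QUOTE_MAP` and `normalize_quotes` are hypothetical names.

```python
# Map common typographic quotes to their plain ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # low double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def normalize_quotes(xpath):
    """Return the expression with curly quotes replaced by straight ones."""
    return xpath.translate(QUOTE_MAP)


# A fragment as a rich-text editor might mangle it.
pasted = "//div[contains(@class,\u201dworkshopItem\u201d)]"
fixed = normalize_quotes(pasted)
```

Here `fixed` is the same fragment with straight double quotes, which an XPath parser accepts as string delimiters.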


June 30, 2020 at 6:22 am #2324

Martin Wollank
Participant

Sorry to say, but after a few more attempts it is still causing problems. It looks like the target has some anti-scraping mechanism activated. So I tried some other pages and ran into different problems. This plugin might be useful to others, but I have absolutely no use for it in my case, because none of the things I want to achieve with it work.

Sorry, but I would like my purchase to be refunded.


June 30, 2020 at 3:51 pm #2329

Suman M.
Keymaster

I tried to scrape from the source site above and could fetch the content. Could you please share your website’s wp-admin login details so that we can check the issue there?
If it doesn’t work we’ll be happy to make the refund. Thanks for your understanding.


July 1, 2020 at 8:02 pm #2335

Martin Wollank
Participant

Unfortunately I cannot give you administrative access, because we hold sensitive customer data for thousands of users and under GDPR this would not be allowed without a proper data processing agreement. But can you tell me the XPaths with which you managed to grab the players from this leaderboard, so we can present them on our page?

https://cnc.community/command-and-conquer-remastered/leaderboard/tiberian-dawn

Additionally, how did you manage to get the items from the Steam Workshop without being soft-banned from scraping their content? :/


July 2, 2020 at 11:14 am #2344

Suman M.
Keymaster

Alright, we understand your privacy concerns. I could scrape from https://steamcommunity.com. I gave the XPath in my previous post – https://support.wpbots.net/support/topic/scraping-for-content-is-not-working/page/3/#post-2252

Please let us know your scraper task ID so that we can check – https://www.screencast.com/t/vG5tD4XGW
- This reply was modified 6 years ago by Suman M..
July 2, 2020 at 11:54 am #2346

Martin Wollank
Participant

As I said, the Steam Workshop scraping worked at first but failed after a number of items. I will take a thorough look at both cases in a few hours when I am home and let you know the details and problems for each.


July 2, 2020 at 10:06 pm #2351

Martin Wollank
Participant

The Task ID is 141377c3718331d2ca882a31866a13c3


July 3, 2020 at 5:52 am #2353

Suman M.
Keymaster

This reply has been marked as private.


July 6, 2020 at 2:17 pm #2364

Martin Wollank
Participant

Yep, as I said in #2324 😉 That’s the problem. So this does not seem to work as expected. The only use case left that might be useful would be scraping these leaderboards: https://cnc.community/command-and-conquer-remastered/leaderboard/tiberian-dawn but I am not sure where to start :/


July 6, 2020 at 4:25 pm #2371

Suman M.
Keymaster

This reply has been marked as private.


July 7, 2020 at 5:35 am #2375

Martin Wollank
Participant

Error 404 🙁

Task ID is: 7a8822db99cb26a45e81ee0be51997ef

Scraping the complete HTML table is not an option, because its formatting is very different from our site’s. It looks really bad :/
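
As a rough idea of the alternative to importing the table wholesale, the sketch below pulls only the cell text out of a table so it can be restyled on the target site. It assumes the table HTML is already available as a string, and it is plain Python rather than anything the plugin provides. The markup in `SAMPLE`, the player names, and the names `TableRows` and `table_rows` are all made up for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical leaderboard markup; the real page is not reproduced here.
SAMPLE = (
    "<table>"
    "<tr><th>Rank</th><th>Player</th><th>Points</th></tr>"
    "<tr><td>1</td><td><b>PlayerOne</b></td><td>1200</td></tr>"
    "<tr><td>2</td><td>PlayerTwo</td><td>1150</td></tr>"
    "</table>"
)

rows = table_rows(SAMPLE)
```

Each entry in `rows` is a plain list of strings, so the source table's styling is left behind and only the data is carried over.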


July 7, 2020 at 1:15 pm #2376

Suman M.
Keymaster

We tried to scrape from this site and it’s not returning any data to the bots. Thus this site cannot be scraped.


Scraper - Automatic Content Crawler Plugin for WordPress

by wpBots
