Scraping Issues

This topic has 1 reply, 2 voices, and was last updated 6 years, 9 months ago by Suman M..

September 25, 2019 at 1:23 am #606

Michael Golley
Participant

Hi, thanks for the plugin. I've had great success so far, however… (Sorry if this is long-winded; I think this covers most of what I need to know.)

I’m trying to extract information from the following website as multiples – https://www.classicfootballshirts.co.uk/premiership-clubs/manchester-united.html

⁃ I would like to extract product tags from the breadcrumbs at the top of the page, but they seem to disappear when the URL is added to the plugin.

⁃ I can’t seem to find separate paths for the regular price and the sale price. The plugin seems to combine both prices into one in the preview. How can I fix this?

⁃ When adding the page content, I’d like to keep the paragraphs and the bold lettering originally on the page. Is this possible?

⁃ I want to extract the part of the content that shows the shirt size and use it as a product tag. I’ve been successful in removing the word ‘size’ and highlighting just the ‘XL’ part. However, when I press scrape and then preview, it seems to dump the entire contents into the tag information.

FYI, I’ve installed WooCommerce and am saving the items as products.

Thanks for your help.
Regards Michael


September 25, 2019 at 7:07 am #610

Suman M.
Keymaster

Hi, thanks for contacting us.

⁃ I would like to extract product tags from the breadcrumbs at the top of the page, but they seem to disappear when the URL is added to the plugin.
>> Which ones? Can you show them in a screenshot, please?

⁃ I can’t seem to find the separate paths for regular price and sale price. The plugin seems to add both prices into one on the preview, how can I find a solution?
>> You’ll need to add the XPath manually in such cases. For this page:
for sale price – //p[contains(@class,"special-price")]/span[2]
for regular price – //p[contains(@class,"old-price")]/span[2]
You’ll also need to strip the non-numeric characters from the price – https://www.screencast.com/t/VaYuGlX5
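
To see what the two steps above do outside the plugin, here is a rough sketch in Python. It is only an illustration, not how the plugin works internally. The sample markup is a hypothetical, simplified stand-in for the real page, and because the standard library's ElementTree does not support contains(), the class is matched exactly instead.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified price markup (an assumption; the real page may differ).
SAMPLE = """
<div>
  <p class="old-price"><span>Regular Price:</span><span>£49.99</span></p>
  <p class="special-price"><span>Special Price</span><span>£29.99</span></p>
</div>
"""


def clean_price(text):
    """Drop everything except digits and the decimal point."""
    return re.sub(r"[^0-9.]", "", text)


def extract_prices(markup):
    """Return (regular_price, sale_price) as cleaned strings."""
    root = ET.fromstring(markup)
    # ElementTree has no contains(), so the class is matched exactly here.
    regular = root.find(".//p[@class='old-price']/span[2]")
    sale = root.find(".//p[@class='special-price']/span[2]")
    return clean_price(regular.text), clean_price(sale.text)


regular_price, sale_price = extract_prices(SAMPLE)
print(regular_price, sale_price)
```

The span[2] step picks the second span inside each price paragraph, which is why the label text in the first span never reaches the price field.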

⁃ When adding the page content, I’d like to keep the paragraphs and the bold lettering originally on the page. Is this possible?
>> Yes, please set the Part field to ‘HTML source code’ – https://www.screencast.com/t/KuAmkSwW0epK
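
As a rough picture of why this setting matters, the sketch below compares the plain text of an element with its HTML source. It is an illustration only, using a made-up content fragment and Python's standard library, and is not taken from the plugin's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical content fragment (an assumption, not the real page markup).
SAMPLE = "<div><p>Original <b>1999 home</b> shirt.</p><p>Size XL</p></div>"

root = ET.fromstring(SAMPLE)

# Text only: tags are dropped, so paragraphs and bold lettering are lost.
text_only = "".join(root.itertext())

# HTML source: the child elements are serialized with their tags intact.
html_source = "".join(ET.tostring(child, encoding="unicode") for child in root)

print(text_only)
print(html_source)
```

The text version runs the two paragraphs together, while the HTML version keeps the p and b tags that WordPress needs to render the original formatting.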

⁃ I want to extract the part of the content that shows the shirt size and use it as a product tag. I’ve been successful in removing the word ‘size’ and highlighting just the ‘XL’ part. However, when I press scrape and then preview, it seems to dump the entire contents into the tag information.
>> Can you please check if the text “size” is still there after importing the product to your site?
