Scraping Issues

Viewing 2 posts - 1 through 2 (of 2 total)
  • #606
    Michael Golley
    Participant

    Hi, thanks for the plugin. I’ve had great success so far,

    however… (Sorry if this is long-winded; I think this covers most of what I need to know.)

    I’m trying to extract information from the following website as multiples – https://www.classicfootballshirts.co.uk/premiership-clubs/manchester-united.html

    ⁃ I would like to extract product tags from the breadcrumbs at the top of the page; however, they seem to disappear when the URL is added to the plugin.

    ⁃ I can’t seem to find the separate paths for regular price and sale price. The plugin seems to merge both prices into one in the preview. How can I fix this?

    ⁃ When adding the page content, I’d like to keep the paragraphs and the bold lettering originally on the page. Is this possible?

    ⁃ I want to extract the part of the content that shows the shirt size as a product tag. I’ve managed to remove the word ‘size’ and highlight just the ‘XL’ part; however, when I press scrape and then preview, the entire contents seem to be dumped into the tag.

    FYI, I’ve installed WooCommerce and am saving the items as products.

    Thanks for your help.
    Regards Michael

    #610
    Suman M.
    Keymaster

    Hi, thanks for contacting us.

    ⁃ I would like to extract product tags from the breadcrumbs at the top of the page; however, they seem to disappear when the URL is added to the plugin.
    >> Which ones? Can you show us with a screenshot, please?

    ⁃ I can’t seem to find the separate paths for regular price and sale price. The plugin seems to merge both prices into one in the preview. How can I fix this?
    >> You’ll need to add the XPath manually in such cases. For this page:
    for sale price – //p[contains(@class,"special-price")]/span[2]
    for regular price – //p[contains(@class,"old-price")]/span[2]
    Also, you’ll need to strip non-numeric characters from the price – https://www.screencast.com/t/VaYuGlX5

    ⁃ When adding the page content, I’d like to keep the paragraphs and the bold lettering originally on the page. Is this possible?
    >> Yes, please set the Part field to ‘HTML source code’ – https://www.screencast.com/t/KuAmkSwW0epK

    ⁃ I want to extract the part of the content that shows the shirt size as a product tag. I’ve managed to remove the word ‘size’ and highlight just the ‘XL’ part; however, when I press scrape and then preview, the entire contents seem to be dumped into the tag.
    >> Can you please check if the text “size” is still there after importing the product to your site?
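
For anyone who wants to check the price XPaths outside the plugin, here is a minimal Python sketch of the two steps from the price answer above: pick the second span inside the matching p element, then strip non-numeric characters. The sample markup, the helper names second_span_text and strip_non_numeric, and the price values are all assumptions made for illustration. The real page markup may differ, and this is not how the plugin itself works. The standard-library ElementTree does not support contains(), so the class check is done in plain Python instead.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup: the real product page may be structured differently.
SAMPLE = """
<div>
  <p class="old-price"><span class="price-label">Was:</span><span class="price">£49.99</span></p>
  <p class="special-price"><span class="price-label">Now:</span><span class="price">£29.99</span></p>
</div>
"""


def second_span_text(root, class_fragment):
    """Roughly emulate //p[contains(@class, fragment)]/span[2].

    Returns the text of the second direct-child span of the first
    matching p element, or None if there is no such element.
    """
    for p in root.iter("p"):
        if class_fragment in p.get("class", ""):
            spans = p.findall("span")  # direct child spans only
            if len(spans) >= 2:
                return spans[1].text or ""
    return None


def strip_non_numeric(raw):
    """Keep only digits and the decimal point."""
    return re.sub(r"[^0-9.]", "", raw)


root = ET.fromstring(SAMPLE)
regular = strip_non_numeric(second_span_text(root, "old-price"))
sale = strip_non_numeric(second_span_text(root, "special-price"))
print(regular, sale)
```

If the two values come out separately here but still merge in the plugin preview, the selector in the plugin is probably matching a parent element that contains both prices.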
