Web Scraping tool suggestion

Discussion in 'SW Helpdesk' started by skimr, Mar 16, 2019.

  1. skimr

    skimr Registered User

    Joined:
    Jan 13, 2019
    Messages:
    130
    Likes Received:
    1,015
    I need a suggestion for web scraping software. I know there are courses in Python and JavaScript, but I want something simpler.

    Basically, I have a page with a list that has a pagination control at the bottom (20 pages). Each item in the list has a hyperlink; clicking it opens a details page for that item.

    I need to extract specific data from those pages and put it in Excel columns.

    Does anyone know of software they would suggest? Google gives too many results, and I want to know the best software for the job.
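
    For reference, the workflow described above (walk the list pages, follow each item link, collect fields into columns) can be sketched in a few lines of Python. This is only a rough outline under stated assumptions: `crawl`, `to_csv`, the fake page data, and the `name`/`price` columns are all hypothetical, and the two fetch callables stand in for whatever actually downloads and parses the real pages. The output is CSV, which Excel opens directly.

```python
import csv
import io


def crawl(fetch_list_page, fetch_detail, page_count):
    """Visit every list page, follow each item link, return one dict per item."""
    rows = []
    for page in range(1, page_count + 1):
        for link in fetch_list_page(page):
            rows.append(fetch_detail(link))
    return rows


def to_csv(rows, fieldnames):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical stand-ins for the real site: list page -> item links,
# item link -> extracted fields.
FAKE_LISTS = {1: ["/item/1", "/item/2"], 2: ["/item/3"]}
FAKE_DETAILS = {
    "/item/1": {"name": "Alpha", "price": "10"},
    "/item/2": {"name": "Beta", "price": "20"},
    "/item/3": {"name": "Gamma", "price": "30"},
}

rows = crawl(FAKE_LISTS.__getitem__, FAKE_DETAILS.__getitem__, page_count=2)
print(to_csv(rows, ["name", "price"]))
```

    Passing the fetch functions in as arguments keeps the loop itself independent of how pages are downloaded, so the same outline would apply to the 20-page case by swapping in real fetchers and setting `page_count=20`.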
  2. Sir.Dev.A.Lot

    Sir.Dev.A.Lot Trusted Warez Poster, Shadow Moderator, Staff Member, Super Moderator, DEV Guild, Reverser, Translator

    Joined:
    Oct 10, 2008
    Messages:
    21,105
    Likes Received:
    83,566
  3. grantr

    grantr Registered User

    Joined:
    Jan 11, 2019
    Messages:
    3
    Likes Received:
    4
    It is worth looking into Power Query. It's free from Microsoft: it is built into Excel 2016 and later, and there is an add-in for earlier versions.
  4. H4nZ83

    H4nZ83 Registered User

    Joined:
    May 29, 2017
    Messages:
    67
    Likes Received:
    88
    Try HTTrack, it's free.
    I've been using it since 2018 and it works great: everything was copied, and all links were changed to local links.
  5. Wonderman

    Wonderman Trusted Warez Poster, DEV Guild Member, DEV Guild

    Joined:
    Mar 16, 2011
    Messages:
    2,576
    Likes Received:
    18,231
    iMacros, Content Grabber, and WinAutomation are the best tools for what you need.
  6. WarezNet

    WarezNet Registered User

    Joined:
    Jul 26, 2017
    Messages:
    74
    Likes Received:
    543
    Content Grabber is the best solution out there by far. You can even use custom C# code in it to do things that are not built in, like detecting phone numbers. The built-in controls take care of the basics, such as pagination and link navigation.
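
    To give an idea of what "detect phone numbers" custom logic might involve, here is a minimal standalone sketch in Python. It is not Content Grabber's API, and the pattern is an assumption: it only covers 10-digit North American style numbers such as `(555) 123-4567` or `555.123.4567`. The `find_phone_numbers` name and the sample text are made up for illustration.

```python
import re

# Assumed format: optional parentheses around a 3-digit area code, then
# 3 digits and 4 digits, each optionally separated by a space, dot, or hyphen.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b")


def find_phone_numbers(text):
    """Return every substring of text that matches the assumed phone format."""
    return PHONE_RE.findall(text)


sample = "Call (555) 123-4567 or 555.987.6543. Order no. 12345678901 is not a phone."
print(find_phone_numbers(sample))
```

    Real-world data usually needs a looser or country-specific pattern, so treat the regular expression as a starting point only.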
  7. trainfan

    trainfan Registered User

    Joined:
    Mar 31, 2019
    Messages:
    29
    Likes Received:
    25
    The most basic way: batch download (scrape) the pages, then use a keyboard macro in your favorite text editor to extract the links.

    You can scrape using HTTrack, as some mentioned above.

    Personally, I would use curl and grep/sed.
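
    For anyone who prefers a script to a keyboard macro, the link-extraction step could look roughly like this in Python. It is a minimal sketch using only the standard library; the `LinkCollector` and `extract_links` names, the sample HTML, and the `example.com` base URL are made up for illustration, and the input is assumed to be a page already saved by HTTrack or curl.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the URL of the page.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return absolute URLs for all links found in the HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical sample standing in for a downloaded list page.
SAMPLE = '<ul><li><a href="/item/1">One</a></li><li><a href="item/2">Two</a></li></ul>'
print(extract_links(SAMPLE, "https://example.com/list/"))
```

    Unlike a plain grep for `href=`, the parser copes with attributes in any order and with either quote style, at the cost of a few more lines.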
  8. ogrishman

    ogrishman Registered User

    Joined:
    Sep 25, 2018
    Messages:
    217
    Likes Received:
    1,605
    I suggest trying Content Grabber based on your description. It's easy to use, powerful and extensible. You can use C# or Python to extend its functionality to make it meet your requirements.
  9. mattstyle

    mattstyle Registered User

    Joined:
    Apr 7, 2019
    Messages:
    1
    Likes Received:
    1