I want to rip the contents of a paid website, but I have to log in through its web page to get access.

Does anyone know of any good Windows tools for that?

I’m guessing that any such tool must have a built-in browser, or be a browser plugin, for it to work.
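
For sites that use a plain form login, a built-in browser may not be strictly required: a script can log in once and resend the session cookie itself. Below is a minimal sketch using only the Python standard library. The login URL and the form field names (`username`, `password`) are assumptions for illustration, and sites that rely on JavaScript logins or CAPTCHAs would not work this way.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical value: replace with the site's real login URL.
LOGIN_URL = "https://example.com/login"


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, username, password):
    """Build a POST request carrying the form-encoded credentials."""
    # "username" and "password" are assumed field names; check the real login form.
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(url, data=form.encode("utf-8"), method="POST")


# Typical use (needs network access, so it is left commented out):
#   opener, jar = make_session()
#   opener.open(build_login_request(LOGIN_URL, "me", "secret"))
#   page = opener.open("https://example.com/members/index.html").read()
```

Every later `opener.open(...)` call reuses whatever cookies the login response set, which is roughly what a browser does behind the scenes.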

  • zabadohOP
    1 year ago

    Okay, I found SurfOffline, which does the trick without too much hassle, but…

    It’s verrrrrrrry slooooooooow.

    It uses Internet Explorer as a module and requests each individual resource separately instead of copying files from IE’s cache, which is weird and slow, especially when hundreds of images are involved.

    And SurfOffline doesn’t appear to be supported anymore, i.e. the support email’s inbox is full.

    edit: Aaaaand SurfOffline doesn’t save to .html files with a directory structure!!! It stores everything in some kind of SQL database, and it only exports to .mht and .chm files, which are legacy Microsoft formats (MHTML web archive and compiled HTML Help)!!!

    What it does have is a built-in web server that only works while the program is running.

    So what I plan to do is leave the program running but idle, while I sic HTTrack on the 127.0.0.1 web address for my ripped website.

    HTTrack will hopefully “extract” the website to .html format.

    Whew, what a hassle!
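
    For anyone curious what a mirroring tool does when pointed at a local address like this, the core step is collecting the links on each page and following only the ones that stay on the same host. A rough sketch of that step in Python follows. It is not how HTTrack is implemented, just an illustration, and the port number in the usage note is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href/src attribute values found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_host_links(page_url, html):
    """Return absolute, de-duplicated links from html that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).netloc
    found = []
    for link in collector.links:
        absolute = urljoin(page_url, link).split("#")[0]  # drop fragments
        if urlsplit(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found
```

    Calling `same_host_links("http://127.0.0.1:8080/index.html", html)` would return only the pages and images served from that local address, skipping links to other hosts.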

    • zabadohOP
      1 year ago

      To continue my travails:

      HTTrack didn’t do a great job: it was slow, even copying from the same machine, and it flattened the directory structure of the website it wrote out, making it almost un-navigable.

      Here’s where Cyotek WebCopy shines: it’s copying the website from SurfOffline’s built-in web server quickly, so I should have the entire website re-extracted very soon!
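
      On the flattening problem: keeping a mirror navigable mostly comes down to mapping each URL path onto the same folder layout on disk. Here is a small sketch of that mapping in Python. The `mirror` output folder and the `index.html` default are assumptions for illustration, and query strings are simply ignored.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def local_path(url, root="mirror"):
    """Map a URL to a relative file path that mirrors the site's directories."""
    path = unquote(urlsplit(url).path)
    if path == "" or path.endswith("/"):
        path += "index.html"  # directory-style URLs need a file name
    # Drop empty, "." and ".." segments so nothing escapes the output folder.
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    return str(PurePosixPath(root, *parts))
```

      With this, a URL ending in `/gallery/2020/photo.jpg` lands in `mirror/gallery/2020/photo.jpg` rather than in one flat folder, so relative links between pages keep working.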