Use httrack to download an entire website

Posted on Thursday, October 1, 2015

The other day I discovered this awesome command line tool that lets you effectively download a website and save it locally.  I used it to download a WordPress site of mine and convert it to a static backup of the site (images and all).

Download and install

The main website for httrack is [1]


Installing on Ubuntu is cake

     > sudo apt-get install webhttrack


When looking into how to do this I found this page [3], which led me to this page [4].

From Cygwin, run setup.exe, but use the -K option to point it at the Cygwin Ports project.

     > cygstart -- /cygdrive/c/cygwin64/setup-x86_64.exe -K

When you get to "Choose a Download Site", click Add and then Next.

Doh!  Error

unable to get setup.ini from ftp://ftp/

… what did I do wrong?

Oh, I had a space in the URL.  Once I removed it, it worked.

Cool, now I can just search for httrack and it shows up :)

Start a new Cygwin window and check if it's installed.

     > which httrack

How does it work?

Well, for a more advanced explanation check out [2].

Basically you give it a start location.  For example, look at this command.

     > httrack -v ""  -O whiteboardcoder

It will then copy that page and start copying images and other pages that the first starter page links to.  It's smart: it won't start copying pages outside the URL you gave it.  In fact, unless you tell it otherwise, it will only drill down, not up.  So… even if it has a link to , it won't copy .  It won't even copy other subdomains… ex

You can define how you want to filter and even what you want to filter out.

For me, I want to copy all of *  Here is the command that would do that.

     > httrack -v ""  -O whiteboardcoder "+**"

This is the filter: all subdomains and every file (that the site links to… as long as they are part of the same domain).

If you want to get detailed about what you skip and what you get, see [2].  It may be of some value to you… for example, if you want to skip all .zip files, you could do so with an exclusion filter.

Let me run this with the time command to see how long it takes to download my entire blog.

     > time httrack -v ""  -O whiteboardcoder "+**"

It took almost 1.5 hours to download my site.

Now I have a nice self-contained version of my website on my desktop.
Clicking from page to page just loads files locally from my hard drive, while external links still take me to other URLs.

And the original URL is preserved in the folder structure.  Nice.


[1]        httrack main site
                Accessed 10/2015
[2]        Filters
                Accessed 10/2015
[3]        cygwin-ports
                Accessed 10/2015
[4]        cygwin-ports how to run
                Accessed 10/2015
