Download a whole website and view it offline using wget!

If you want to view the contents of this website (or any website) offline, you can download the whole site (including images if desired) to a folder on your hard disk, as follows:

Quick Method

Download the WebLeach.zip file at the bottom of this page, extract the files and run the WebLeach.cmd script file. You will be prompted for the URL (e.g. www.xxxxx.com) that you want to download and the drive letter that you want to map the website folder to (e.g. Z). You can then browse the whole website offline. The downloaded files will be copied to C:\temp\www.xxxxx.com for you to browse at any time.

You might also like to try the free WinHTTrack GUI application for Windows/Linux/BSD from http://www.httrack.com, which has been recommended to me. You can filter out certain files or file extensions and also update your local copy at any time. It also copes with an interrupted connection and will auto-resume.

Google Sites

1. For Google Sites you can use the Sites Import/Export GUI here. To run it from Windows, right-click on the .jar file and choose Open with - Java SE Platform Binary (Java SE must be installed).

Entry fields are as follows (for a site at http://sites.google.com/a/xxxx.com/www):
Host: sites.google.com
Domain: xxxx.com
Webspace: www
Username: (your Gmail account address)
Password: (your Gmail account password)
If you have 2-step verification enabled, your Google password won't work. Instead, generate an application-specific password from https://accounts.google.com/b/0/IssuedAuthSubTokens#accesscodes and paste it into the Password field.
Choose Target Directory: c:\temp\zzz

2. Click on C:\temp\zzz\home\index.html to start browsing
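
The Host, Domain and Webspace fields follow directly from the parts of the site URL. As a rough illustration only, here is a small Python sketch that splits a URL of the form shown above into those three fields. The function name sites_fields is hypothetical and is not part of the Sites Import/Export tool; it assumes the /a/<domain>/<webspace> path layout from the example.

```python
from urllib.parse import urlsplit


def sites_fields(url):
    """Split a Google Sites URL into Host / Domain / Webspace values.

    Hypothetical helper: assumes the path looks like /a/<domain>/<webspace>,
    as in the example http://sites.google.com/a/xxxx.com/www.
    """
    parts = urlsplit(url)
    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 3 or segments[0] != "a":
        raise ValueError("not a /a/<domain>/<webspace> URL: " + url)
    return {
        "Host": parts.netloc,
        "Domain": segments[1],
        "Webspace": segments[2],
    }


if __name__ == "__main__":
    for field, value in sites_fields("http://sites.google.com/a/xxxx.com/www").items():
        print(field + ": " + value)
```

The printed values are what you would type into the corresponding entry fields.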

Manual Method

1. Download wget for Windows (the folder names below assume the GnuWin32 package). Here is the manual.

2. Install the software onto your Windows computer.

3. Open a command prompt and use the CD command to get to the installation folder by typing:

cd "C:\Program Files (x86)\GnuWin32\bin"

OR if you have a 32-bit system, type:

cd "C:\Program Files\GnuWin32\bin"

4. Type the command:

wget -mk -P C:\temp www.rmprepusb.com

(assuming this is the site you want to leech!)

Or add -R followed by a comma-separated list to prevent files with certain extensions from being downloaded, e.g.

wget -mk -R jpg,png -P C:\temp www.rmprepusb.com

A new folder named after the site (www.rmprepusb.com) will be created under the C:\temp folder.

5. Now run a subst command to map a new drive letter to the new folder, e.g.

subst k: "C:\temp\www.rmprepusb.com"

6. Then you can start to browse the website by typing the following (if wget saved the home page as index.html, use that name instead):

start k:\index.htm

and because the index file is now at the root of the new drive, the links to all the other pages should work correctly and you can browse the whole site offline.
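
If you repeat steps 4 to 6 often, they can be chained together in a short script. The following is only a minimal Python sketch of that idea, not the WebLeach.cmd script from the Quick Method. It assumes wget is on your PATH and that you are running on Windows; the function names (build_mirror_command, find_index, mirror_and_open) and the default drive letter are made up for illustration.

```python
import os
import shutil
import subprocess
import sys
from pathlib import Path


def build_mirror_command(site, dest, reject=None):
    """Return the wget argument list used to mirror `site` into `dest`."""
    # -m mirrors the site, -k converts links so they work locally.
    cmd = ["wget", "-m", "-k"]
    if reject:
        # -R takes a comma-separated list of file suffixes to skip.
        cmd += ["-R", ",".join(reject)]
    # -P sets the folder under which the site folder is created.
    cmd += ["-P", str(dest), site]
    return cmd


def find_index(folder):
    """Return the first index page found in `folder`, or None."""
    for name in ("index.html", "index.htm"):
        candidate = Path(folder) / name
        if candidate.is_file():
            return candidate
    return None


def mirror_and_open(site, dest=r"C:\temp", drive="K:", reject=None):
    """Mirror the site, map its folder to a drive letter, open the index page."""
    if shutil.which("wget") is None:
        sys.exit("wget was not found on PATH")
    # wget exits with a non-zero code if any single link fails,
    # so the exit code is reported rather than treated as fatal.
    result = subprocess.run(build_mirror_command(site, dest, reject))
    print("wget exit code:", result.returncode)
    folder = Path(dest) / site
    # Same as typing: subst K: "C:\temp\<site>"
    subprocess.run(["subst", drive, str(folder)], check=True)
    index = find_index(folder)
    if index is None:
        sys.exit("no index page found in " + str(folder))
    # os.startfile is Windows-only; it opens the page in the default browser.
    os.startfile(drive + "\\" + index.name)
```

On Windows, calling mirror_and_open("www.rmprepusb.com", reject=["jpg", "png"]) would run the same wget and subst commands as above and then open the index page from the new drive. The index file is looked up by name because the downloaded home page may be saved as either index.html or index.htm.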

Steve Si,
15 Sep 2012, 05:33