Posted by rwczippy on 10/04/08
Tagged: curl, Shell, Bash, linux

Using wget or curl to download web sites for archival


Published in: Bash

URL: http://psung.blogspot.com/2008/06/using-wget-or-curl-to-download-web.html

The command below starts at the specified URL and recursively downloads pages up to 3 links away from the original page, but only pages that are in the directory of the URL you specified (emacstips/) or one of its subdirectories.
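If you only wanted that recursion behaviour on its own, the relevant options look like this (a minimal sketch; dropping -k and -p here is my simplification, not part of the original snippet):

  # Follow links up to 3 levels deep (-l3) and never ascend above
  # emacstips/ (-np); without -k the saved pages keep their original links.
  wget -r -l3 -np http://web.psung.name/emacstips/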

wget will also rewrite the links in the pages it downloads so that your copy works as a local, browsable mirror, and it will download all page prerequisites (e.g. images, stylesheets, and the like).
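As an aside, those two behaviours are also useful without any recursion at all, for saving a single page (this one-liner is my addition, not from the original post):

  # Fetch one page plus its images and stylesheets (-p) and rewrite its
  # links for offline viewing (-k).
  wget -p -k http://web.psung.name/emacstips/index.html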

The last two options, -nH and --cut-dirs=1, control where wget places the output. If you omitted them, wget would, for example, download http://web.psung.name/emacstips/index.html and place it under the subdirectory web.psung.name/emacstips of the current directory. With only -nH ("no host directory") wget would write that same file to a subdirectory emacstips, and with both options it would write it to the current directory. In general, to reduce the number of extraneous directories created, set --cut-dirs to the number of leading directories in the path of your URL.
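To make that concrete, here is where that one file ends up in each case (the paths follow directly from the explanation above):

  # wget -rkp -l3 -np http://web.psung.name/emacstips/
  #   -> ./web.psung.name/emacstips/index.html   (host dir + site dirs)
  # wget -rkp -l3 -np -nH http://web.psung.name/emacstips/
  #   -> ./emacstips/index.html                  (host directory suppressed)
  # wget -rkp -l3 -np -nH --cut-dirs=1 http://web.psung.name/emacstips/
  #   -> ./index.html                            (leading emacstips/ cut as well)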

  wget -rkp -l3 -np -nH --cut-dirs=1 http://web.psung.name/emacstips/


Comments

Posted By: hemanthhm on January 11, 2009

More tips and tricks: http://ubunt2.blogspot.com/2009/01/wget-tircks-and-tips.html

