How to download files recursively

There is no better utility than wget to recursively download interesting files from the depths of the internet. I will show you why that is the case.

Download files recursively. Note that the default maximum recursion depth is 5.

$ wget --recursive https://example.org/open-directory/

Download files recursively with a defined maximum recursion depth. Remember that level 0 is equivalent to inf, that is, infinite recursion.

$ wget --recursive --level 1 https://example.org/files/presentation/

Download files recursively and specify a directory prefix. If no prefix is given, files are stored under the current directory.

$ wget --recursive --directory-prefix=/tmp/wget/ https://example.org/open-directory/

Download files recursively but do not ascend to the parent directory.

$ wget --recursive --no-parent https://example.org/files/presentation/

Download files recursively, do not ascend to the parent directory, and set the User-Agent header field in case the server filters requests by user agent.

$ wget --recursive --no-parent --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0" https://example.org/files/presentation/

Download files recursively, do not ascend to the parent directory and reject index.html files.

$ wget --recursive --no-parent --reject "index.html*" https://example.org/files/presentation/
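
The trailing asterisk in the pattern matters because directory listings often link to variants such as index.html?C=N;O=D. The sketch below approximates the wildcard matching with a shell case statement; matches_reject is a hypothetical helper, not part of wget, and wget's own matching may differ in details.

```shell
# Hypothetical helper: report whether a file name matches a reject pattern.
# This only approximates wget's wildcard matching using shell patterns.
matches_reject() {
    # $1 = file name, $2 = pattern as passed to --reject
    case "$1" in
        $2) echo "rejected" ;;   # unquoted $2 is treated as a pattern
        *)  echo "kept" ;;
    esac
}

matches_reject "index.html" "index.html*"          # prints: rejected
matches_reject "index.html?C=N;O=D" "index.html*"  # prints: rejected
matches_reject "slides.pdf" "index.html*"          # prints: kept
```

Note that wget still has to fetch the HTML index pages to find links; rejected pages are removed after they have been parsed.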

Download files recursively, do not ascend to the parent directory and accept only PDF files.

$ wget --recursive --no-parent --accept "*.pdf" https://example.org/files/presentation/

Download files recursively but ignore the robots.txt file, as it sometimes gets in the way.

$ wget --recursive --execute robots=off https://example.org/

Download files recursively, do not ascend to the parent directory, and wait about 10 seconds between requests. With --random-wait the actual delay varies between 0.5 and 1.5 times the --wait value.

$ wget --recursive --no-parent --wait 10 --random-wait https://example.org/files/presentation/
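
To make the interval concrete, here is a small sketch that computes the bounds of the random delay for a given --wait value. The function name random_wait_range is an assumption for illustration, and it uses integer arithmetic only, so odd values are rounded down.

```shell
# Hypothetical helper: print the lower and upper bound (in seconds) of the
# delay chosen by --random-wait, i.e. 0.5 and 1.5 times the --wait value.
random_wait_range() {
    wait_seconds=$1
    min=$((wait_seconds / 2))        # 0.5 * wait, rounded down
    max=$((wait_seconds * 3 / 2))    # 1.5 * wait, rounded down
    printf '%s %s\n' "$min" "$max"
}

random_wait_range 10   # prints: 5 15
```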

Download files recursively but limit the retrieval rate to 250KB/s.

$ wget --recursive --limit-rate=250k https://example.org/files/
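
A rough way to judge what such a limit means in practice is to estimate the minimum transfer time for a known amount of data. The sketch below assumes that the k suffix stands for 1024 bytes and ignores protocol overhead and waits between requests; estimate_seconds is a hypothetical helper.

```shell
# Hypothetical helper: estimate the minimum number of seconds needed to
# transfer a given number of bytes at a rate expressed like --limit-rate=<N>k.
# Assumption: 1k = 1024 bytes.
estimate_seconds() {
    size_bytes=$1
    rate_bytes=$(( $2 * 1024 ))
    # Integer division rounded up.
    echo $(( (size_bytes + rate_bytes - 1) / rate_bytes ))
}

# 100 MiB (104857600 bytes) at 250k per second
estimate_seconds 104857600 250   # prints: 410
```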

Download files recursively, do not ascend to the parent directory, accept only PDF and PNG files, and do not create any directories. Every downloaded file will be stored in the current directory.

$ wget --recursive --no-parent --accept "*.pdf,*.png" --no-directories https://example.org/files/presentation/
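
When the list of file types grows, the comma-separated pattern can be built from plain extensions. This is a minimal sketch; accept_list is a hypothetical helper, not a wget feature.

```shell
# Hypothetical helper: turn extensions into a comma-separated
# pattern list suitable for --accept.
accept_list() {
    list=""
    for ext in "$@"; do
        # Add a comma only when the list already has an entry.
        list="${list:+$list,}*.$ext"
    done
    printf '%s\n' "$list"
}

accept_list pdf png   # prints: *.pdf,*.png

# Possible usage:
#   wget --recursive --no-parent --accept "$(accept_list pdf png)" --no-directories https://example.org/files/presentation/
```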

Download files recursively but do not create the example.org host-prefixed directory.

$ wget --recursive --no-host-directories https://example.org/files/

Download files recursively using a defined username and password.

$ wget --recursive --user="username" --password="password" https://example.org/
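
A password given on the command line can be seen in the process list and the shell history. As an alternative, wget can read credentials from a netrc file. The sketch below writes such a file into a temporary directory so that it does not touch a real ~/.netrc; NETRC_DIR is an assumption for illustration, and the host and credentials are the placeholders from the command above.

```shell
# Sketch: create a netrc file readable only by its owner.
# NETRC_DIR stands in for the home directory in this example.
NETRC_DIR=$(mktemp -d)
(
    # Restrict permissions for files created in this subshell.
    umask 077
    printf 'machine %s login %s password %s\n' \
        "example.org" "username" "password" > "$NETRC_DIR/.netrc"
)

# With the same content stored in ~/.netrc, the credentials
# no longer need to appear on the command line:
#   wget --recursive https://example.org/
```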

Download files recursively, do not ascend to the parent directory, do not create the host-prefixed directory, and ignore the first two directory components. The downloaded content is stored in the first-presentation directory.

$ wget --recursive --no-parent --no-host-directories --cut-dirs=2 https://example.org/files/presentation/first-presentation/
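
The effect of --cut-dirs is easier to see on a concrete path. The following sketch mimics, in simplified form, how the local path is derived once the host directory is dropped; local_path is a hypothetical helper and slides.pdf is an assumed file name.

```shell
# Hypothetical helper: drop the first N directory components of a URL path,
# always keeping the file name itself.
local_path() {
    # $1 = URL path without scheme and host, $2 = value of --cut-dirs
    path=${1#/}
    file=${path##*/}
    case "$path" in
        */*) dir=${path%/*} ;;
        *)   dir="" ;;
    esac
    n=$2
    while [ "$n" -gt 0 ] && [ -n "$dir" ]; do
        case "$dir" in
            */*) dir=${dir#*/} ;;   # remove the leading component
            *)   dir="" ;;          # last component removed
        esac
        n=$((n - 1))
    done
    if [ -n "$dir" ]; then
        printf '%s/%s\n' "$dir" "$file"
    else
        printf '%s\n' "$file"
    fi
}

local_path "/files/presentation/first-presentation/slides.pdf" 0
# prints: files/presentation/first-presentation/slides.pdf
local_path "/files/presentation/first-presentation/slides.pdf" 2
# prints: first-presentation/slides.pdf
local_path "/files/presentation/first-presentation/slides.pdf" 3
# prints: slides.pdf
```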

Download files recursively using only IPv4 or IPv6 addresses.

$ wget --recursive --inet4-only https://example.org/notes.html
$ wget --recursive --inet6-only https://example.org/notes.html

Continue a download started by a previous instance of wget.

$ wget --recursive --continue https://example.org/notes.html

About Milosz Galazka

Milosz is a system administrator working for a successful Polish company and a long-time supporter of the Free Software Foundation and the Debian operating system.

Gdansk, Poland https://sleeplessbeastie.eu