wget

Non-interactive network downloader for retrieving files from web servers

Syntax: wget [options] [URL...]

Note: wget supports HTTP, HTTPS, and FTP, and can keep running in the background after you log out.

Description

wget is a command-line utility for downloading files from web servers. It's designed to work robustly over slow or unstable network connections, with features like automatic retries, recursive downloads, and the ability to resume interrupted downloads. It's particularly useful for scripting and automation tasks.

Basic Options

Option	Description
`-O, --output-document=FILE`	Write documents to FILE
`-c, --continue`	Resume partial download
`-t, --tries=NUMBER`	Set number of tries (0 means unlimited)
`-T, --timeout=SECONDS`	Set network timeout
`-q, --quiet`	Quiet mode (no output)
`-v, --verbose`	Verbose output
`-b, --background`	Run in background
`-P, --directory-prefix=PREFIX`	Save files to PREFIX directory

Basic Usage

Simple file download:

# Download a file
wget https://example.com/file.zip

# Download with custom filename
wget -O myfile.zip https://example.com/file.zip

# Download to specific directory
wget -P /downloads https://example.com/file.zip

Resume and Retry Options

Handling interrupted downloads:

# Resume interrupted download
wget -c https://example.com/largefile.iso

# Set retry attempts
wget -t 5 https://example.com/file.zip

# Infinite retries
wget -t 0 https://example.com/file.zip

# Set timeout
wget -T 30 https://example.com/file.zip
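
These flags are usually combined. A minimal sketch, assuming a hypothetical helper named `fetch_resumable` and illustrative values; `--waitretry` and `--retry-connrefused` are standard wget options that are not listed in the table above:

```shell
# fetch_resumable URL [DIR]
# Hypothetical helper: resume a partial file, try up to 5 times with a
# 30 s timeout, and wait up to 10 s between retries.
# All numbers are illustrative, not recommendations.
fetch_resumable() {
    wget --continue \
         --tries=5 \
         --timeout=30 \
         --waitretry=10 \
         --retry-connrefused \
         --directory-prefix="${2:-.}" \
         "$1"
}
```

Example call: `fetch_resumable https://example.com/largefile.iso /downloads`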

Recursive Downloads

Website mirroring:

# Recursive download (follows links up to the default depth of 5)
wget -r https://example.com

# Mirror with specific depth
wget -r -l 2 https://example.com

# Mirror for offline browsing
wget -r -p -k -E https://example.com

# Keep only image files while crawling the site
wget -r -A jpg,jpeg,png,gif https://example.com

Advanced Recursive Options

Option	Description
`-r, --recursive`	Turn on recursive retrieving
`-l, --level=NUMBER`	Maximum recursion depth
`-k, --convert-links`	Convert links for local viewing
`-p, --page-requisites`	Download all files needed to display page
`-E, --adjust-extension`	Save HTML/CSS files with proper extensions
`-m, --mirror`	Shortcut for -r -N -l inf --no-remove-listing

File Type Filtering

Accept and reject patterns:

# Download only specific file types
wget -r -A "*.pdf,*.doc,*.docx" https://example.com

# Exclude specific file types
wget -r -R "*.gif,*.jpg,*.jpeg,*.png" https://example.com

# Download files matching pattern
wget -r -A "*2024*" https://example.com/files/

# Exclude directories (paths are given relative to the site root)
wget -r --exclude-directories=/temp,/cache https://example.com

Authentication

HTTP authentication:

# Basic HTTP authentication
wget --user=username --password=password https://example.com/file.zip

# Prompt for password
wget --user=username --ask-password https://example.com/file.zip

# Use ~/.netrc for credentials (read automatically when none are given on the command line)
wget https://example.com/file.zip
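
wget looks up credentials in `~/.netrc` when none are passed on the command line. A minimal sketch of writing an entry in netrc format, assuming a hypothetical helper `write_netrc_entry` and placeholder host and credentials:

```shell
# write_netrc_entry FILE HOST USER PASSWORD
# Hypothetical helper: append one netrc entry and restrict the file to
# its owner. Assumes none of the values contain whitespace.
write_netrc_entry() {
    printf 'machine %s login %s password %s\n' "$2" "$3" "$4" >> "$1"
    chmod 600 "$1"
}
```

Example call: `write_netrc_entry ~/.netrc example.com username password`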

Headers and User Agent

Custom headers:

# Set custom user agent
wget --user-agent="Mozilla/5.0 (compatible; MyBot/1.0)" https://example.com

# Add custom headers
wget --header="Accept: application/json" https://api.example.com/data

# Multiple headers
wget --header="Authorization: Bearer token123" \
     --header="Content-Type: application/json" \
     https://api.example.com/data

Rate Limiting and Politeness

Bandwidth and delay control:

# Limit download speed
wget --limit-rate=200k https://example.com/largefile.zip

# Add delay between requests
wget --wait=2 -r https://example.com

# Random wait time (varies each pause between 0.5x and 1.5x of --wait)
wget --wait=2 --random-wait -r https://example.com

# Respect robots.txt (this is the default for recursive downloads)
wget -e robots=on -r https://example.com
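
The politeness options can be bundled into one call. A minimal sketch, assuming a hypothetical helper `polite_crawl`; the depth, wait, and rate values are illustrative:

```shell
# polite_crawl URL
# Hypothetical helper: shallow recursive fetch that pauses 1-3 s between
# requests (0.5x to 1.5x of --wait=2), caps bandwidth, and honors
# robots.txt.
polite_crawl() {
    wget --recursive --level=2 --no-parent \
         --wait=2 --random-wait \
         --limit-rate=200k \
         -e robots=on \
         "$1"
}
```

Example call: `polite_crawl https://example.com`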

HTTPS and SSL Options

SSL/TLS handling:

# Ignore SSL certificate errors (use with caution)
wget --no-check-certificate https://example.com/file.zip

# Specify CA certificate
wget --ca-certificate=/path/to/ca-cert.pem https://example.com

# Use specific SSL protocol
wget --secure-protocol=TLSv1_2 https://example.com

Logging and Output

Output control:

# Quiet mode
wget -q https://example.com/file.zip

# Verbose output
wget -v https://example.com/file.zip

# Log to file
wget -o download.log https://example.com/file.zip

# Append to log file
wget -a download.log https://example.com/file.zip

# Background download with log
wget -b -o download.log https://example.com/largefile.zip

FTP Downloads

FTP protocol:

# Anonymous FTP download
wget ftp://ftp.example.com/pub/file.tar.gz

# FTP with credentials
wget --ftp-user=username --ftp-password=password ftp://ftp.example.com/file.zip

# Passive FTP mode
wget --passive-ftp ftp://ftp.example.com/file.zip

# Recursive FTP download
wget -r ftp://ftp.example.com/pub/

Practical Examples

Website backup:

# Complete website mirror
wget --mirror --convert-links --adjust-extension \
     --page-requisites --no-parent \
     https://example.com

# Depth-limited backup that skips executables and archives
wget -r -l 3 -k -p -E -np \
     --reject="*.exe,*.zip" \
     https://example.com
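
wget does not compress what it downloads; a compressed backup needs a second step. A minimal sketch that mirrors a site and packs the result with tar, assuming a hypothetical helper `mirror_and_archive` and caller-supplied paths:

```shell
# mirror_and_archive URL WORKDIR ARCHIVE
# Hypothetical helper: mirror URL into WORKDIR, then pack WORKDIR into a
# gzip-compressed tarball. Any non-zero wget status (including broken
# links on the site) aborts before the archive step.
mirror_and_archive() {
    wget --mirror --convert-links --adjust-extension \
         --page-requisites --no-parent \
         --directory-prefix="$2" "$1" || return 1
    tar -czf "$3" -C "$2" .
}
```

Example call: `mirror_and_archive https://example.com ./site ./site.tar.gz`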

Batch downloads:

# Download from URL list
wget -i urls.txt

# Download with different names (quote URLs that contain ? or &)
wget -O file1.zip "https://example.com/download?id=1"
wget -O file2.zip "https://example.com/download?id=2"

# Parallel downloads (using xargs, 4 at a time)
xargs -n 1 -P 4 wget < urls.txt
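
`wget -i` fetches a whole list in one run; looping over the list instead makes it easy to see which URLs failed. A minimal sketch, assuming a hypothetical helper `download_list` and a list file with one URL per line:

```shell
# download_list FILE
# Hypothetical helper: fetch each URL in FILE, print the ones that
# failed, and return non-zero if any download failed.
download_list() {
    failed=0
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines and lines starting with '#'
        case $url in ''|'#'*) continue ;; esac
        if ! wget -q "$url"; then
            echo "FAILED: $url"
            failed=1
        fi
    done < "$1"
    return "$failed"
}
```

Example call: `download_list urls.txt || echo "some downloads failed"`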

Scripting with wget

Shell script integration:

#!/bin/bash

# Check that the file exists before downloading
if wget -q --spider https://example.com/file.zip; then
    echo "File exists, downloading..."
    wget https://example.com/file.zip
else
    echo "File not found"
fi

# Download with error handling (test wget's exit status directly)
if wget -t 3 -T 30 https://example.com/file.zip; then
    echo "Download successful"
else
    echo "Download failed (wget exit status $?)"
fi
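
wget reports the kind of failure through its exit status, which a script can translate into a readable message. A minimal sketch, assuming a hypothetical helper `describe_wget_status`; the meanings follow the exit status list in the wget manual:

```shell
# describe_wget_status CODE
# Hypothetical helper: map a wget exit status to a short description.
describe_wget_status() {
    case $1 in
        0) echo "success" ;;
        1) echo "generic error" ;;
        2) echo "parse error (command line or .wgetrc)" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response" ;;
        *) echo "unknown status $1" ;;
    esac
}
```

Typical use is to run wget, save `$?` in a variable on the next line, and pass that variable to `describe_wget_status` when logging the result.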

Configuration File

Using .wgetrc:

# ~/.wgetrc configuration file
user_agent = Mozilla/5.0 (compatible; MyBot/1.0)
timeout = 30
tries = 3
wait = 2
robots = on
continue = on

# wget reads ~/.wgetrc automatically; no extra option is needed
wget https://example.com/file.zip
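
Settings can also live in a per-project file that is passed with `--config=FILE` instead of `~/.wgetrc`. A minimal sketch, assuming a hypothetical helper `write_project_wgetrc` and an illustrative file name:

```shell
# write_project_wgetrc FILE
# Hypothetical helper: write a startup file with a subset of the
# settings from the ~/.wgetrc example.
write_project_wgetrc() {
    cat > "$1" <<'EOF'
timeout = 30
tries = 3
wait = 2
robots = on
EOF
}

# Usage (not run here):
#   write_project_wgetrc ./project.wgetrc
#   wget --config=./project.wgetrc https://example.com/file.zip
```

Keeping the file next to the script makes the download settings explicit and avoids depending on whatever the current user has in `~/.wgetrc`.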

Common Use Cases

File downloads: Download software, documents, and media files
Website mirroring: Create offline copies of websites
API data retrieval: Download data from REST APIs
Backup automation: Automated backup of web content
Software deployment: Download packages and updates
Web scraping: Extract content from websites
Monitoring: Check if files or pages are available
Batch processing: Download multiple files automatically

Troubleshooting

Common issues:

# Check if URL is accessible
wget --spider https://example.com/file.zip

# Debug connection issues
wget --debug https://example.com/file.zip

# Limit the number of redirects followed (default is 20)
wget --max-redirect=5 https://example.com/file.zip

# Deal with cookies
wget --save-cookies cookies.txt --keep-session-cookies \
     https://example.com/login
wget --load-cookies cookies.txt https://example.com/protected
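
Saving cookies from a login page usually only helps if the login form is actually submitted, which wget can do with `--post-data`. A minimal sketch, assuming a hypothetical helper `login_and_fetch` and assumed form field names `user` and `password` (real sites use their own names):

```shell
# login_and_fetch LOGIN_URL PROTECTED_URL USER PASSWORD
# Hypothetical helper: POST the login form, keep the session cookies in
# a temporary jar, then fetch the protected page with them.
# Values are sent as-is; URL-encode them first if they contain
# characters such as '&' or '='.
login_and_fetch() {
    jar=$(mktemp) || return 1
    if wget -q --save-cookies "$jar" --keep-session-cookies \
            --post-data "user=$3&password=$4" \
            -O /dev/null "$1" &&
       wget -q --load-cookies "$jar" "$2"; then
        status=0
    else
        status=1
    fi
    rm -f "$jar"
    return "$status"
}
```

Example call: `login_and_fetch https://example.com/login https://example.com/protected username password`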

Best Practices: Always respect robots.txt, use appropriate delays between requests, and be mindful of server resources when performing recursive downloads.

wget vs curl

Feature	wget	curl
Recursive downloads	Yes	No
Resume downloads	Yes	Yes
Protocol support	HTTP, HTTPS, FTP	Many protocols
Library support	No	Yes (libcurl)

Related Commands: curl, lynx, ftp, scp, rsync
