wget

Non-interactive network downloader for retrieving files from web servers

Syntax: wget [options] [URL...]
Note: wget is a free utility for non-interactive download of files from the web. It supports HTTP, HTTPS, and FTP protocols and can work in the background.

Description

wget is a command-line utility for downloading files from web servers. It's designed to work robustly over slow or unstable network connections, with features like automatic retries, recursive downloads, and the ability to resume interrupted downloads. It's particularly useful for scripting and automation tasks.

Basic Options

Option                          Description
-O, --output-document=FILE      Write documents to FILE
-c, --continue                  Resume a partial download
-t, --tries=NUMBER              Set number of tries (0 = unlimited)
-T, --timeout=SECONDS           Set network timeout
-q, --quiet                     Quiet mode (no output)
-v, --verbose                   Verbose output (the default)
-b, --background                Run in background
-P, --directory-prefix=PREFIX   Save files to PREFIX directory

Basic Usage

Simple file download:
# Download a file
wget https://example.com/file.zip

# Download with custom filename
wget -O myfile.zip https://example.com/file.zip

# Download to specific directory
wget -P /downloads https://example.com/file.zip

Resume and Retry Options

Handling interrupted downloads:
# Resume interrupted download
wget -c https://example.com/largefile.iso

# Set retry attempts
wget -t 5 https://example.com/file.zip

# Infinite retries
wget -t 0 https://example.com/file.zip

# Set timeout
wget -T 30 https://example.com/file.zip
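
The options above are usually combined. A minimal sketch, assuming a helper named fetch_resumable (a hypothetical name, not part of wget) and illustrative default values:

```shell
#!/bin/sh
# Sketch: resume-friendly download with bounded retries.
# fetch_resumable is a hypothetical helper, not a wget feature.
fetch_resumable() {
    url=$1
    tries=${2:-5}       # attempts before giving up (wget's own default is 20)
    timeout=${3:-30}    # seconds, applied to DNS, connect, and read timeouts

    # -c resumes a partial file if one exists.
    # --waitretry=10 waits 1s, 2s, ... up to 10s between retries of a failed file.
    # --retry-connrefused treats "connection refused" as a transient error.
    wget -c -t "$tries" -T "$timeout" \
         --waitretry=10 --retry-connrefused \
         "$url"
}
```

Calling fetch_resumable https://example.com/largefile.iso 3 15 would make at most three attempts with a 15-second timeout.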

Recursive Downloads

Website mirroring:
# Recursive download (follows links up to the default depth of 5)
wget -r https://example.com

# Mirror with specific depth
wget -r -l 2 https://example.com

# Mirror for offline browsing
wget -r -p -k -E https://example.com

# Download images linked from a page (one level deep, no directory tree)
wget -r -l 1 -nd -A jpg,jpeg,png,gif https://example.com

Advanced Recursive Options

Option                    Description
-r, --recursive           Turn on recursive retrieving
-l, --level=NUMBER        Maximum recursion depth
-k, --convert-links       Convert links for local viewing
-p, --page-requisites     Download all files needed to display page
-E, --adjust-extension    Save HTML/CSS files with proper extensions
-m, --mirror              Shortcut for -r -N -l inf --no-remove-listing

File Type Filtering

Accept and reject patterns:
# Download only specific file types
wget -r -A "*.pdf,*.doc,*.docx" https://example.com

# Exclude specific file types
wget -r -R "*.gif,*.jpg,*.jpeg,*.png" https://example.com

# Download files matching pattern
wget -r -A "*2024*" https://example.com/files/

# Exclude directories
wget -r --exclude-directories=temp,cache https://example.com

Authentication

HTTP authentication:
# Basic HTTP authentication
wget --user=username --password=password https://example.com/file.zip

# Prompt for password
wget --user=username --ask-password https://example.com/file.zip

# Use .netrc file for credentials (wget reads ~/.netrc by default)
wget https://example.com/file.zip
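
wget looks up credentials in ~/.netrc when none are given on the command line, which keeps passwords out of the process list and shell history. A minimal sketch of creating an entry, assuming a hypothetical helper write_netrc_entry and placeholder host and credentials:

```shell
#!/bin/sh
# Sketch: append one "machine" entry to a netrc file.
# write_netrc_entry is a hypothetical helper; all values are placeholders.
write_netrc_entry() {
    netrc_file=$1 host=$2 user=$3 pass=$4

    # Create the file (if missing) readable by the owner only,
    # then tighten permissions in case it already existed.
    ( umask 077 && touch "$netrc_file" ) || return 1
    chmod 600 "$netrc_file" || return 1

    # netrc format: machine HOST login USER password PASSWORD
    printf 'machine %s login %s password %s\n' "$host" "$user" "$pass" >> "$netrc_file"
}
```

After write_netrc_entry "$HOME/.netrc" example.com username password, a plain wget https://example.com/file.zip can authenticate without --user or --password.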

Headers and User Agent

Custom headers:
# Set custom user agent
wget --user-agent="Mozilla/5.0 (compatible; MyBot/1.0)" https://example.com

# Add custom headers
wget --header="Accept: application/json" https://api.example.com/data

# Multiple headers
wget --header="Authorization: Bearer token123" \
     --header="Content-Type: application/json" \
     https://api.example.com/data

Rate Limiting and Politeness

Bandwidth and delay control:
# Limit download speed
wget --limit-rate=200k https://example.com/largefile.zip

# Add delay between requests
wget --wait=2 -r https://example.com

# Random wait time (varies between 0.5 and 1.5 times the --wait value)
wget --wait=2 --random-wait -r https://example.com

# Respect robots.txt (the default; set via -e, there is no --robots option)
wget -e robots=on -r https://example.com
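
For a recursive job these settings work best together. A minimal sketch, assuming a hypothetical helper polite_mirror and illustrative defaults for depth, delay, and bandwidth:

```shell
#!/bin/sh
# Sketch: depth-limited recursive download that goes easy on the server.
# polite_mirror is a hypothetical helper; the defaults are illustrative only.
polite_mirror() {
    url=$1
    wait_s=${2:-2}      # base delay between requests, in seconds
    rate=${3:-200k}     # bandwidth cap passed to --limit-rate

    # --random-wait varies each delay between 0.5 and 1.5 times --wait.
    # --no-parent keeps the crawl below the starting directory.
    wget -r -l 2 --no-parent \
         --wait="$wait_s" --random-wait \
         --limit-rate="$rate" \
         "$url"
}
```

For example, polite_mirror https://example.com/docs/ 5 100k waits roughly 2.5 to 7.5 seconds between requests and caps throughput at 100 KB/s.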

HTTPS and SSL Options

SSL/TLS handling:
# Ignore SSL certificate errors (use with caution)
wget --no-check-certificate https://example.com/file.zip

# Specify CA certificate
wget --ca-certificate=/path/to/ca-cert.pem https://example.com

# Use specific SSL protocol
wget --secure-protocol=TLSv1_2 https://example.com

Logging and Output

Output control:
# Quiet mode
wget -q https://example.com/file.zip

# Verbose output
wget -v https://example.com/file.zip

# Log to file
wget -o download.log https://example.com/file.zip

# Append to log file
wget -a download.log https://example.com/file.zip

# Background download with log
wget -b -o download.log https://example.com/largefile.zip

FTP Downloads

FTP protocol:
# Anonymous FTP download
wget ftp://ftp.example.com/pub/file.tar.gz

# FTP with credentials
wget --ftp-user=username --ftp-password=password ftp://ftp.example.com/file.zip

# Passive FTP mode
wget --passive-ftp ftp://ftp.example.com/file.zip

# Recursive FTP download
wget -r ftp://ftp.example.com/pub/

Practical Examples

Website backup:
# Complete website mirror
wget --mirror --convert-links --adjust-extension \
     --page-requisites --no-parent \
     https://example.com

# Depth-limited backup that skips executables and archives
wget -r -l 3 -k -p -E -np \
     --reject="*.exe,*.zip" \
     https://example.com

Batch downloads:
# Download from URL list
wget -i urls.txt

# Download with different names (quote URLs containing ? or & so the shell leaves them alone)
wget -O file1.zip "https://example.com/download?id=1"
wget -O file2.zip "https://example.com/download?id=2"

# Parallel downloads (using xargs, 4 at a time)
xargs -n 1 -P 4 wget < urls.txt
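
When a list is long, it helps to know which URLs failed. A minimal sketch, assuming a hypothetical helper download_list and a plain-text list with one URL per line; skipping blank and # lines is this helper's own convention:

```shell
#!/bin/sh
# Sketch: download every URL in a list and record the ones that fail.
# download_list and the default file name failed-urls.txt are hypothetical.
download_list() {
    list=$1
    failed=${2:-failed-urls.txt}

    : > "$failed"    # start with an empty failure list

    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines and lines starting with '#'.
        case $url in ''|'#'*) continue ;; esac

        # stdin is redirected so wget cannot consume the rest of the list.
        wget -q "$url" < /dev/null || printf '%s\n' "$url" >> "$failed"
    done < "$list"

    # Succeed only if no URL was recorded as failed.
    [ ! -s "$failed" ]
}
```

Running download_list urls.txt returns non-zero if any download failed, and the failed URLs can be retried later with wget -i failed-urls.txt.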

Scripting with wget

Shell script integration:
#!/bin/bash

# Check that the file exists before downloading
if wget -q --spider https://example.com/file.zip; then
    echo "File exists, downloading..."
    wget https://example.com/file.zip
else
    echo "File not found"
fi

# Download with error handling
if wget -t 3 -T 30 https://example.com/file.zip; then
    echo "Download successful"
else
    echo "Download failed"
fi
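
wget's exit status says more than pass or fail: the GNU Wget manual documents codes 0 through 8. A minimal sketch that turns them into messages, assuming a hypothetical helper describe_wget_status:

```shell
#!/bin/sh
# Sketch: translate wget's documented exit statuses into readable messages.
# describe_wget_status is a hypothetical helper name.
describe_wget_status() {
    case $1 in
        0) echo "success" ;;
        1) echo "generic error" ;;
        2) echo "command-line or .wgetrc parse error" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Example use in a script (commented out so the sketch makes no network calls):
#   wget -q -t 3 -T 30 https://example.com/file.zip
#   status=$?
#   [ "$status" -eq 0 ] || echo "wget failed: $(describe_wget_status "$status")" >&2
```

Distinguishing status 4 (worth retrying) from status 8 (for example a 404) lets a script decide whether another attempt makes sense.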

Configuration File

Using .wgetrc:
# ~/.wgetrc configuration file
user_agent = Mozilla/5.0 (compatible; MyBot/1.0)
timeout = 30
tries = 3
wait = 2
robots = on
continue = on

# wget reads ~/.wgetrc automatically; no extra flag is needed
wget https://example.com/file.zip
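
Settings can also live in a per-project file instead of ~/.wgetrc, selected with --config=FILE or the WGETRC environment variable. A minimal sketch, assuming a hypothetical helper write_project_wgetrc and a hypothetical file name project.wgetrc:

```shell
#!/bin/sh
# Sketch: write a project-specific startup file for wget.
# write_project_wgetrc and project.wgetrc are hypothetical names.
write_project_wgetrc() {
    rc_file=$1
    cat > "$rc_file" <<'EOF'
# Project-specific wget settings
timeout = 30
tries = 3
wait = 2
robots = on
EOF
}

# Example use (commented out so the sketch makes no network calls):
#   write_project_wgetrc ./project.wgetrc
#   wget --config=./project.wgetrc https://example.com/file.zip
#   WGETRC=./project.wgetrc wget https://example.com/file.zip
```

Keeping the file next to the script makes the download behavior reproducible on machines whose ~/.wgetrc differs.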

Common Use Cases

  • File downloads: Download software, documents, and media files
  • Website mirroring: Create offline copies of websites
  • API data retrieval: Download data from REST APIs
  • Backup automation: Automated backup of web content
  • Software deployment: Download packages and updates
  • Web scraping: Extract content from websites
  • Monitoring: Check if files or pages are available
  • Batch processing: Download multiple files automatically

Troubleshooting

Common issues:
# Check if URL is accessible
wget --spider https://example.com/file.zip

# Debug connection issues
wget --debug https://example.com/file.zip

# Limit the number of redirects followed (default is 20)
wget --max-redirect=5 https://example.com/file.zip

# Deal with cookies
wget --save-cookies cookies.txt --keep-session-cookies \
     https://example.com/login
wget --load-cookies cookies.txt https://example.com/protected

Best Practices: Always respect robots.txt, use appropriate delays between requests, and be mindful of server resources when performing recursive downloads.

wget vs curl

Feature               wget               curl
Recursive downloads   Yes                No
Resume downloads      Yes                Yes
Protocol support      HTTP, HTTPS, FTP   Many protocols
Library support       No                 Yes (libcurl)
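
For a simple single-file download, a script can use whichever of the two tools is installed. A minimal sketch, assuming a hypothetical helper named fetch:

```shell
#!/bin/sh
# Sketch: download URL to a file with wget, falling back to curl.
# fetch is a hypothetical helper name.
fetch() {
    url=$1 out=$2
    if command -v wget >/dev/null 2>&1; then
        wget -q -O "$out" "$url"
    elif command -v curl >/dev/null 2>&1; then
        # -f: fail on HTTP errors, -sS: quiet but still print errors,
        # -L: follow redirects (wget follows them by default)
        curl -fsSL -o "$out" "$url"
    else
        echo "fetch: neither wget nor curl is installed" >&2
        return 127
    fi
}
```

For example, fetch https://example.com/file.zip file.zip behaves the same on systems that ship only one of the two tools.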

Related Commands: curl, lynx, ftp, scp, rsync