wget
Non-interactive network downloader for retrieving files from web servers
Syntax:
wget [options] [URL...]
Note: wget is a free utility for non-interactive download of files from the web. It supports HTTP, HTTPS, and FTP protocols and can work in the background.
Description
wget is a command-line utility for downloading files from web servers. It's designed to work robustly over slow or unstable network connections, with features like automatic retries, recursive downloads, and the ability to resume interrupted downloads. It's particularly useful for scripting and automation tasks.
Basic Options
| Option | Description |
|---|---|
| -O, --output-document=FILE | Write documents to FILE |
| -c, --continue | Resume a partially downloaded file |
| -t, --tries=NUMBER | Set number of retries (default 20; 0 or inf for unlimited) |
| -T, --timeout=SECONDS | Set network timeout |
| -q, --quiet | Quiet mode (no output) |
| -v, --verbose | Verbose output (the default) |
| -b, --background | Go to background after startup; output goes to wget-log unless -o is given |
| -P, --directory-prefix=PREFIX | Save files to PREFIX directory |
Basic Usage
Simple file download:
# Download a file
wget https://example.com/file.zip
# Download with custom filename
wget -O myfile.zip https://example.com/file.zip
# Download to specific directory
wget -P /downloads https://example.com/file.zip
Resume and Retry Options
Handling interrupted downloads:
# Resume interrupted download
wget -c https://example.com/largefile.iso
# Set retry attempts
wget -t 5 https://example.com/file.zip
# Infinite retries
wget -t 0 https://example.com/file.zip
# Set timeout (seconds)
wget -T 30 https://example.com/file.zip
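wget's own `-t` retries apply within a single run. For long transfers over an unreliable link, a script might also wrap `wget -c` in an outer loop so that each new attempt resumes the partial file instead of starting over. A minimal sketch; the function name `retry_cmd` and the `RETRY_DELAY` variable are hypothetical:

```bash
#!/usr/bin/env bash
# retry_cmd MAX CMD [ARGS...]
# Runs CMD up to MAX times and stops at the first success (exit status 0).
# RETRY_DELAY (hypothetical, default 10) is the pause in seconds between attempts.
retry_cmd() {
    local max=$1
    shift
    local attempt
    for ((attempt = 1; attempt <= max; attempt++)); do
        if "$@"; then
            return 0
        fi
        echo "Attempt $attempt/$max failed" >&2
        # No pause after the final attempt
        if ((attempt < max)); then
            sleep "${RETRY_DELAY:-10}"
        fi
    done
    return 1
}
```

Usage: `retry_cmd 5 wget -c https://example.com/largefile.iso`. Because `-c` is passed on every attempt, each run continues from the bytes already on disk.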
Recursive Downloads
Website mirroring:
# Recursive download (follows links up to the default depth of 5)
wget -r https://example.com
# Recursive download limited to depth 2
wget -r -l 2 https://example.com
# Mirror for offline browsing (page requisites, converted links, .html extensions)
wget -r -p -k -E https://example.com
# Download all images linked from a single page
wget -r -l 1 -A jpg,jpeg,png,gif https://example.com
Advanced Recursive Options
| Option | Description |
|---|---|
| -r, --recursive | Turn on recursive retrieving |
| -l, --level=NUMBER | Maximum recursion depth (default 5) |
| -k, --convert-links | Convert links for local viewing |
| -p, --page-requisites | Download all files needed to display the page |
| -E, --adjust-extension | Save HTML/CSS files with proper extensions |
| -m, --mirror | Shortcut for -r -N -l inf --no-remove-listing |
File Type Filtering
Accept and reject patterns:
# Download only specific file types
wget -r -A "*.pdf,*.doc,*.docx" https://example.com
# Exclude specific file types
wget -r -R "*.gif,*.jpg,*.jpeg,*.png" https://example.com
# Download files matching pattern
wget -r -A "*2024*" https://example.com/files/
# Exclude directories
wget -r --exclude-directories=temp,cache https://example.com
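The wget manual states that an `-A`/`-R` element containing any of the wildcard characters `*`, `?`, `[` or `]` is treated as a pattern, while any other element is treated as a filename suffix. The function below is a rough approximation of that rule for previewing names before a long recursive run. It is not wget's actual matching code (it ignores `--ignore-case`, query strings and regex options), and the name `would_accept` is hypothetical:

```bash
#!/usr/bin/env bash
# would_accept NAME LIST
# Rough approximation of wget's -A rule: LIST is comma-separated; an element
# containing *, ?, [ or ] is a shell-style pattern, any other is a suffix.
would_accept() {
    local name=$1 elem
    local -a elems
    # Split on commas without letting the shell glob-expand the patterns
    IFS=',' read -r -a elems <<< "$2"
    for elem in "${elems[@]}"; do
        if [[ $elem == *'*'* || $elem == *'?'* || $elem == *'['* || $elem == *']'* ]]; then
            # Pattern element: the whole name must match it
            [[ $name == $elem ]] && return 0
        else
            # Suffix element: the name must end with it
            [[ $name == *"$elem" ]] && return 0
        fi
    done
    return 1
}
```

For example, `would_accept photo.jpg jpg,png` succeeds through the suffix rule, while `would_accept index.html "*.pdf,*.doc"` fails.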
Authentication
HTTP authentication:
# Basic HTTP authentication (the password is visible in the process list and shell history)
wget --user=username --password=password https://example.com/file.zip
# Prompt for password
wget --user=username --ask-password https://example.com/file.zip
# Use credentials stored in ~/.netrc (wget reads this file automatically)
wget https://example.com/file.zip
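wget can take credentials from `~/.netrc`, which keeps passwords off the command line. Each entry has the form `machine HOST login USER password PASS`. A minimal sketch that appends one entry and restricts the file to its owner; the function name `add_netrc_entry` is hypothetical, and it does not check for an existing entry for the same host:

```bash
#!/usr/bin/env bash
# add_netrc_entry FILE HOST USER PASSWORD
# Appends one machine entry to FILE and makes FILE readable only by its owner.
add_netrc_entry() {
    local file=$1 host=$2 user=$3 pass=$4
    # Create the file with restrictive permissions before writing secrets
    ( umask 077 && touch "$file" )
    chmod 600 "$file"
    printf 'machine %s login %s password %s\n' "$host" "$user" "$pass" >> "$file"
}
```

Usage: `read -rs pass; add_netrc_entry ~/.netrc example.com username "$pass"`, after which a plain `wget https://example.com/file.zip` can authenticate without `--password`.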
Headers and User Agent
Custom headers:
# Set custom user agent
wget --user-agent="Mozilla/5.0 (compatible; MyBot/1.0)" https://example.com
# Add custom headers
wget --header="Accept: application/json" https://api.example.com/data
# Multiple headers
wget --header="Authorization: Bearer token123" \
--header="Content-Type: application/json" \
https://api.example.com/data
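When the header list is assembled at runtime (for example, a token taken from an environment variable), collecting the options in a bash array keeps each header a single argument even though it contains spaces. A sketch; the `API_TOKEN` variable and the `token123` fallback are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Headers to send; API_TOKEN is a hypothetical environment variable
headers=(
    "Authorization: Bearer ${API_TOKEN:-token123}"
    "Accept: application/json"
)
# Build one --header=... option per header
wget_args=()
for h in "${headers[@]}"; do
    wget_args+=("--header=$h")
done
```

The array is then expanded with quotes so no header is split on spaces: `wget "${wget_args[@]}" https://api.example.com/data`.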
Rate Limiting and Politeness
Bandwidth and delay control:
# Limit download speed
wget --limit-rate=200k https://example.com/largefile.zip
# Wait 2 seconds between requests
wget --wait=2 -r https://example.com
# Randomize the wait (0.5 to 1.5 times the --wait value)
wget --wait=2 --random-wait -r https://example.com
# robots.txt is honored by default in recursive mode; set it explicitly with -e
wget -e robots=on -r https://example.com
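`--limit-rate` takes a byte count with an optional `k` or `m` suffix. Assuming wget's binary multipliers (1k = 1024 bytes, 1m = 1048576 bytes), a small helper can estimate how long a rate-limited transfer will take. The function names are hypothetical; the helper accepts integers only (wget itself also allows decimals such as `2.5k`) and ignores protocol overhead:

```bash
#!/usr/bin/env bash
# rate_to_bytes RATE: converts "200k", "2m" or "500" to bytes per second.
# Assumes binary multipliers (k = 1024, m = 1048576); integers only.
rate_to_bytes() {
    local rate=$1
    case $rate in
        *[kK]) echo $(( ${rate%[kK]} * 1024 )) ;;
        *[mM]) echo $(( ${rate%[mM]} * 1048576 )) ;;
        *)     echo $(( rate )) ;;
    esac
}

# estimate_seconds SIZE_BYTES RATE: transfer time in whole seconds, rounded down.
estimate_seconds() {
    local size=$1 bps
    bps=$(rate_to_bytes "$2")
    echo $(( size / bps ))
}
```

For example, a 100 MiB file (104857600 bytes) at `--limit-rate=200k` gives `estimate_seconds 104857600 200k` = 512 seconds, roughly 8.5 minutes.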
HTTPS and SSL Options
SSL/TLS handling:
# Ignore SSL certificate errors (use with caution)
wget --no-check-certificate https://example.com/file.zip
# Specify CA certificate
wget --ca-certificate=/path/to/ca-cert.pem https://example.com
# Use specific SSL protocol
wget --secure-protocol=TLSv1_2 https://example.com
Logging and Output
Output control:
# Quiet mode
wget -q https://example.com/file.zip
# Verbose output (already the default)
wget -v https://example.com/file.zip
# Log to file (overwrites download.log)
wget -o download.log https://example.com/file.zip
# Append to log file
wget -a download.log https://example.com/file.zip
# Background download with log
wget -b -o download.log https://example.com/largefile.zip
FTP Downloads
FTP protocol:
# Anonymous FTP download
wget ftp://ftp.example.com/pub/file.tar.gz
# FTP with credentials
wget --ftp-user=username --ftp-password=password ftp://ftp.example.com/file.zip
# Passive FTP mode (already the default; --no-passive-ftp switches to active mode)
wget --passive-ftp ftp://ftp.example.com/file.zip
# Recursive FTP download
wget -r ftp://ftp.example.com/pub/
Practical Examples
Website backup:
# Complete website mirror
wget --mirror --convert-links --adjust-extension \
--page-requisites --no-parent \
https://example.com
# Depth-limited backup, skipping executables and ZIP archives
wget -r -l 3 -k -p -E -np \
--reject="*.exe,*.zip" \
https://example.com
Batch downloads:
# Download from URL list (one URL per line)
wget -i urls.txt
# Download with different names (quote URLs containing ?)
wget -O file1.zip "https://example.com/download?id=1"
wget -O file2.zip "https://example.com/download?id=2"
# Parallel downloads (4 at a time, using xargs; -q keeps output from interleaving)
xargs -n 1 -P 4 wget -q < urls.txt
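When each URL needs its own output name, a loop over the IDs avoids repeating the command. A sketch with a hypothetical `DRY_RUN` switch that prints the commands instead of running them; the `download?id=N` URL scheme and `fileN.zip` names are the placeholders from the example above:

```bash
#!/usr/bin/env bash
# download_ids ID...: fetch https://example.com/download?id=ID into fileID.zip.
# With DRY_RUN=1 (hypothetical switch), print each command instead of running it.
download_ids() {
    local id url out
    for id in "$@"; do
        url="https://example.com/download?id=$id"
        out="file$id.zip"
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            printf "wget -O '%s' '%s'\n" "$out" "$url"
        else
            # Report failures but keep going with the remaining IDs
            wget -O "$out" "$url" || echo "Failed: $url" >&2
        fi
    done
}
```

Running `DRY_RUN=1 download_ids 1 2` first shows exactly which requests would be made.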
Scripting with wget
Shell script integration:
#!/bin/bash
# Check that the file exists before downloading it
if wget -q --spider https://example.com/file.zip; then
    echo "File exists, downloading..."
    wget https://example.com/file.zip
else
    echo "File not found"
fi
# Download with error handling (3 tries, 30-second timeout)
if wget -t 3 -T 30 https://example.com/file.zip; then
    echo "Download successful"
else
    echo "Download failed"
fi
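A plain success/failure test hides why a download failed. wget's exit status also encodes the kind of failure; the codes below are the ones listed in the GNU Wget manual, which also notes that, apart from 0 and 1, lower-numbered codes take precedence when several errors occur. A sketch of a lookup helper (the function name is hypothetical):

```bash
#!/usr/bin/env bash
# wget_status_message CODE: describe a wget exit status (codes per the GNU Wget manual).
wget_status_message() {
    case $1 in
        0) echo "No problems occurred" ;;
        1) echo "Generic error" ;;
        2) echo "Parse error (command line, .wgetrc or .netrc)" ;;
        3) echo "File I/O error" ;;
        4) echo "Network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "Username/password authentication failure" ;;
        7) echo "Protocol error" ;;
        8) echo "Server issued an error response" ;;
        *) echo "Unknown exit status: $1" ;;
    esac
}
```

Usage: after `wget -t 3 -T 30 https://example.com/file.zip; status=$?`, a script can log `wget_status_message "$status"` instead of a bare "Download failed".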
Configuration File
Using .wgetrc:
# ~/.wgetrc configuration file
user_agent = Mozilla/5.0 (compatible; MyBot/1.0)
timeout = 30
tries = 3
wait = 2
robots = on
continue = on
# Use configuration (settings apply to every run)
wget https://example.com/file.zip
Common Use Cases
- File downloads: Download software, documents, and media files
- Website mirroring: Create offline copies of websites
- API data retrieval: Download data from REST APIs
- Backup automation: Automated backup of web content
- Software deployment: Download packages and updates
- Web scraping: Extract content from websites
- Monitoring: Check if files or pages are available
- Batch processing: Download multiple files automatically
Troubleshooting
Common issues:
# Check if URL is accessible
wget --spider https://example.com/file.zip
# Debug connection issues
wget --debug https://example.com/file.zip
# Limit how many redirects are followed (default: 20)
wget --max-redirect=5 https://example.com/file.zip
# Deal with cookies
wget --save-cookies cookies.txt --keep-session-cookies \
https://example.com/login
wget --load-cookies cookies.txt https://example.com/protected
Best Practices: Always respect robots.txt, use appropriate delays between requests, and be mindful of server resources when performing recursive downloads.
wget vs curl
| Feature | wget | curl |
|---|---|---|
| Recursive downloads | Yes | No |
| Resume downloads | Yes | Yes |
| Protocol support | HTTP, HTTPS, FTP | Many protocols |
| Library support | No | Yes (libcurl) |