Linux Pipes
Connect commands together using the pipe operator (|) to create powerful command chains and process data efficiently.
What are Pipes?
Pipes are a fundamental feature of Unix-like systems that allow you to connect the output of one command to the input of another command. The pipe operator | creates a pipeline where data flows from left to right through a series of commands.
How Pipes Work
- The output (stdout) of the first command becomes the input (stdin) of the second command
- Commands run concurrently, not sequentially
- Data streams through the pipeline as it is produced, although some commands (such as sort) must read all of their input before they write any output
- Error messages (stderr) are not piped by default
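The stderr behaviour described above can be checked with a small experiment. This is a minimal sketch for bash; emit is a hypothetical helper written for this illustration, not a standard command.

```shell
#!/usr/bin/env bash
# emit is a hypothetical helper: two lines to stdout, one line to stderr.
emit() {
  echo "out 1"
  echo "out 2"
  echo "err 1" >&2
}

# Only stdout travels through the pipe, so wc counts 2 lines.
# (stderr is sent to /dev/null here just to keep the terminal quiet.)
stdout_only=$(emit 2>/dev/null | wc -l)

# With 2>&1, stderr is merged into stdout before the pipe, so wc counts 3.
merged=$(emit 2>&1 | wc -l)

echo "stdout only: $stdout_only, merged: $merged"
```

Without the 2>/dev/null in the first pipeline, the "err 1" line would go straight to the terminal and bypass wc entirely.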
Basic Syntax
command1 | command2
command1 | command2 | command3
command1 | command2 | command3 | ... | commandN
The pipe operator | connects commands, allowing the output of one to become the input of the next.
Basic Examples
Simple pipes
# List files and count them
ls | wc -l
# Show processes and search for a specific one (the [f] keeps grep from matching itself)
ps aux | grep '[f]irefox'
# Display file content and page through it
cat largefile.txt | more
# Sort file contents
cat names.txt | sort
Basic pipe operations for common tasks
Text processing chains
# Count unique lines in a file
cat file.txt | sort | uniq | wc -l
# Find and count specific patterns
grep "error" logfile.txt | wc -l
# Extract and sort specific columns
cut -d: -f1 /etc/passwd | sort
# Remove duplicates and sort
cat data.txt | sort | uniq
Chain commands for text processing and analysis
System monitoring
# Find the largest directories (add -a to du to include individual files)
du -h | sort -hr | head -10
# Show the top 5 processes by CPU usage
ps aux | sort -k3,3nr | head -5
# Check disk usage
df -h | grep -v tmpfs
# Network connections
netstat -tuln | grep LISTEN
Use pipes for system monitoring and analysis
Advanced Pipe Usage
Complex data processing
# Analyze log files
cat access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10
# Process CSV data
cat data.csv | cut -d, -f2 | tail -n +2 | sort -n | uniq -c
# Extract and analyze specific patterns
grep "ERROR" app.log | cut -d' ' -f1-3 | sort | uniq -c | sort -nr
# Multi-step text transformation
cat file.txt | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z ]//g' | sort | uniq -c
Complex data processing with multiple pipe stages
Combining with redirection
# Process and save results
cat data.txt | sort | uniq > unique_data.txt
# Append processed data
ps aux | grep '[p]ython' | awk '{print $2}' >> python_pids.txt
# Process and display with save
cat logfile | grep ERROR | tee errors.txt | wc -l
# Complex processing with multiple outputs
cat data.txt | sort | tee sorted.txt | uniq | tee unique.txt | wc -l
Combine pipes with redirection for flexible output handling
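To see what tee actually saves at each stage, the multi-output pattern above can be run on a tiny sample. This is a sketch only: the temporary directory and the sample words are assumptions made for the demonstration.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory; the file names below are illustrative.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' pear apple pear fig apple > "$workdir/data.txt"

# tee writes a copy of the stream to a file and passes it on unchanged,
# so each stage can be saved without interrupting the pipeline.
unique_count=$(sort "$workdir/data.txt" \
  | tee "$workdir/sorted.txt" \
  | uniq \
  | tee "$workdir/unique.txt" \
  | wc -l)

sorted_count=$(wc -l < "$workdir/sorted.txt")
echo "sorted lines: $sorted_count, unique lines: $unique_count"
```

sorted.txt keeps all five lines, while unique.txt and the final count reflect the three distinct words.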
Error handling in pipes
# Pipe both stdout and stderr
command1 2>&1 | command2
# Pipe only stderr
command1 2>&1 >/dev/null | grep error
# Handle errors in pipeline
set -o pipefail # Make pipeline fail if any command fails
command1 | command2 | command3
# Check pipeline exit status
command1 | command2
echo "Pipeline exit status: ${PIPESTATUS[@]}"
Handle errors and check status in command pipelines
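The effect of pipefail and PIPESTATUS is easiest to see with commands whose exit codes are known in advance. The following is a small bash sketch (both features are bash-specific and not available in plain sh); false, true, and the subshell are stand-ins for real commands.

```shell
#!/usr/bin/env bash
# Without pipefail, the pipeline's status is that of the last command.
set +o pipefail
false | true
default_status=$?

# With pipefail, the rightmost non-zero status is reported instead.
set -o pipefail
false | true
pipefail_status=$?

# PIPESTATUS is overwritten by every command, so copy it immediately.
false | true | (exit 3)
codes=("${PIPESTATUS[@]}")

echo "default=$default_status pipefail=$pipefail_status stages=${codes[*]}"
# prints: default=0 pipefail=1 stages=1 0 3
```

Copying PIPESTATUS into another array first matters because even the echo that prints it would otherwise replace its contents.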
Common Pipe Patterns
File and directory operations
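# List files by size, largest first (-print0 and -0 keep names with spaces intact)
find . -type f -print0 | xargs -0 ls -lh | sort -k5 -hr
# Count files by extension
find . -type f -name "*.*" | sed 's/.*\.//' | sort | uniq -c | sort -nr
# Count lines containing ERROR across all .log files
find . -name "*.log" -print0 | xargs -0 grep "ERROR" | wc -l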
# Directory size analysis
du -sh * | sort -hr
Common patterns for file and directory operations
Text analysis and reporting
# Word frequency analysis (-s squeezes runs of whitespace so no empty "words" are counted)
cat text.txt | tr -s '[:space:]' '\n' | sort | uniq -c | sort -nr | head -10
# Length of the longest line
cat file.txt | awk '{print length}' | sort -n | tail -1
# Extract email addresses
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt | sort | uniq
# Generate reports
cat sales.csv | awk -F, '{sum+=$3} END {print "Total:", sum}'
Text analysis and reporting with pipes
System administration
# Process monitoring
ps aux | awk '{print $3, $11}' | sort -nr | head -10
# Memory usage analysis
free -m | grep Mem | awk '{print ($3/$2)*100}'
# Network analysis
ss -tuln | awk '{print $1}' | sort | uniq -c
# Log analysis
tail -f /var/log/syslog | grep --line-buffered ERROR
System administration tasks using pipes
Named Pipes (FIFOs)
Creating and using named pipes
# Create a named pipe
mkfifo mypipe
# Write to named pipe (in one terminal)
echo "Hello World" > mypipe
# Read from named pipe (in another terminal)
cat < mypipe
# Use named pipe for inter-process communication
mkfifo /tmp/logpipe
tail -f /var/log/syslog > /tmp/logpipe &
grep ERROR < /tmp/logpipe
Create and use named pipes for inter-process communication
Named pipe applications
# Log processing pipeline
mkfifo /tmp/logprocessor
# Producer process
tail -f /var/log/apache2/access.log > /tmp/logprocessor &
# Consumer process
while IFS= read -r line; do
    echo "Processing: $line"
    # Process log line here
done < /tmp/logprocessor
# Cleanup: stop the background tail, then remove the pipe
kill $!
rm /tmp/logprocessor
Practical applications of named pipes
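The examples above depend on live log files, so here is a self-contained variant that can be run anywhere. It is a minimal sketch: the FIFO path, the three sample log lines, and the error counter are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Create the FIFO in a private temp directory (paths are illustrative).
workdir=$(mktemp -d)
fifo="$workdir/demo.fifo"
mkfifo "$fifo"
# Remove the FIFO and its directory even if the script exits early.
trap 'rm -rf "$workdir"' EXIT

# Producer: opening a FIFO for writing blocks until a reader opens it,
# so the writer has to run in the background.
printf '%s\n' "INFO start" "ERROR disk full" "INFO done" > "$fifo" &
producer_pid=$!

# Consumer: read line by line until the producer closes its end.
error_count=0
while IFS= read -r line; do
  case $line in
    ERROR*) error_count=$((error_count + 1)) ;;
  esac
done < "$fifo"

wait "$producer_pid"
echo "errors seen: $error_count"
```

Unlike tail -f, this producer closes the pipe when it finishes, so the read loop sees end-of-file and the script terminates on its own.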
Performance Considerations
Pipe Performance Tips
- Buffer size - Pipes have a limited kernel buffer (64 KiB by default on Linux)
- Blocking behavior - Writers block when buffer is full
- Parallel processing - Commands in pipeline run concurrently
- Memory usage - Pipes use kernel memory, not user space
- Error propagation - Use pipefail to catch errors
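The concurrency point can be checked with a rough timing experiment. This sketch assumes bash (it relies on the SECONDS variable), and the two sleep commands merely stand in for real work.

```shell
#!/usr/bin/env bash
# SECONDS is a bash variable that counts whole seconds since it was reset.
SECONDS=0

# Both stages sleep for 2 seconds. Because they start together, the
# pipeline takes about 2 seconds in total rather than 4.
{ sleep 2; echo "from stage 1"; } | { sleep 2; cat; } > /dev/null

elapsed=$SECONDS
echo "pipeline took about ${elapsed}s"
```

If the stages ran one after another, the elapsed time would be roughly the sum of the two sleeps.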
Optimizing pipe performance
# Use appropriate buffer sizes
cat largefile | buffer -s 1M | sort
# Avoid unnecessary pipes
# Instead of: cat file | grep pattern
# Use: grep pattern file
# Use parallel processing (GNU sort can sort with several threads)
sort --parallel=4 data.txt
# Monitor pipeline performance
time (command1 | command2 | command3)
Optimize pipe performance for large data processing
Debugging Pipes
Debugging techniques
# Use tee to inspect intermediate results
command1 | tee debug1.txt | command2 | tee debug2.txt | command3
# Check pipeline exit codes
set -o pipefail
command1 | command2 | command3
echo "Exit codes: ${PIPESTATUS[@]}"
# Add debugging output (the message is printed once command1 has closed its output)
command1 | (cat; echo "Stage 1 complete" >&2) | command2
# Use verbose mode
set -x
command1 | command2 | command3
set +x
Debug complex pipelines and identify issues
Common debugging scenarios
# Debug empty output
command1 | wc -l # Check if command1 produces output
# Debug slow pipelines
command1 | pv | command2 # Monitor data flow rate
# Debug memory issues
command1 | (ulimit -v 100000; command2) # Limit memory
# Debug hanging pipelines (timeout on its own would limit only command1)
timeout 30 bash -c 'command1 | command2' # Time-limit the whole pipeline
Common debugging scenarios and solutions
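When a later stage is the one that hangs, the time limit has to cover the whole pipeline. The sketch below assumes GNU coreutils timeout is installed; echo and sleep are placeholders for real commands.

```shell
#!/usr/bin/env bash
# Assumes GNU coreutils timeout is available.
# The second stage is the slow one here, so limiting only the first
# stage would not help. Wrapping the pipeline in bash -c puts every
# stage under the same 1-second limit.
timeout 1 bash -c 'echo start | sleep 5'
status=$?

# timeout exits with status 124 when it had to stop the command.
echo "exit status: $status"
```

A status of 124 is a convenient signal in scripts that the pipeline was cut off rather than finishing normally.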
Best Practices
Pipe Best Practices
- Use set -o pipefail in scripts to catch pipeline errors
- Avoid unnecessary use of cat in pipes
- Use tee when you need to save intermediate results
- Consider memory usage with large data sets
- Test pipelines with small data sets first
- Document complex pipelines with comments
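Several of these practices can be combined in one short script. The following is a sketch rather than a recommended template: top_words is a hypothetical helper invented for this example, and the sample sentence stands in for a small test data set.

```shell
#!/usr/bin/env bash
set -o pipefail

# top_words is a hypothetical helper: print the N most frequent words on stdin.
top_words() {
  local n=${1:-3}
  tr '[:upper:]' '[:lower:]' |   # normalise case
    tr -cs '[:alpha:]' '\n' |    # one word per line, punctuation dropped
    sort |                       # group identical words together
    uniq -c |                    # count each group
    sort -k1,1nr -k2 |           # highest count first, ties alphabetical
    head -n "$n"
}

# Try the pipeline on a tiny sample before pointing it at real data.
result=$(printf 'The cat saw the dog. The dog ran.\n' | top_words 2)
echo "$result"
```

Putting one stage per line with a trailing comment keeps a long pipeline readable, and the small sample makes it easy to confirm that "the" (3) and "dog" (2) come out on top.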
Common Pitfalls
- Useless use of cat - cat file | grep pattern vs grep pattern file
- Ignoring errors - Pipeline continues even if early commands fail
- Full buffers - A writer blocks whenever the reader falls behind and the pipe buffer fills up
- Broken pipes - Reader exits before writer finishes, so the writer receives SIGPIPE
- Resource leaks - Named pipes not cleaned up
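The broken-pipe case is easy to reproduce. In this bash sketch, yes stands in for any long-running writer and head for a reader that stops early; the exact exit code of the writer is an assumption that depends on how SIGPIPE is handled in the environment.

```shell
#!/usr/bin/env bash
# yes writes "y" forever; head exits after one line and closes the pipe.
yes | head -n 1 > /dev/null
codes=("${PIPESTATUS[@]}")

# The writer is usually killed by SIGPIPE, which bash reports as
# 141 (128 + signal 13). The reader exits normally with 0.
echo "yes exited with ${codes[0]}, head exited with ${codes[1]}"
```

This is normally harmless, but with set -o pipefail the non-zero status of the writer makes the whole pipeline count as failed, which is worth knowing before adding head to a script.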