Linux Pipes

Connect commands together using the pipe operator (|) to create powerful command chains and process data efficiently.

What are Pipes?

Pipes are a fundamental feature of Unix-like systems that allow you to connect the output of one command to the input of another command. The pipe operator | creates a pipeline where data flows from left to right through a series of commands.

How Pipes Work
  • The output (stdout) of the first command becomes the input (stdin) of the second command
  • Commands run concurrently, not sequentially
  • Data streams through the pipeline as it is produced, although some commands (such as sort) must read all of their input before they write any output
  • Error messages (stderr) are not piped by default
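
The points in the list above can be checked with a small experiment. This is a minimal sketch, assuming bash and the standard sleep, ls, and wc utilities; /nonexistent-demo-path is a hypothetical path chosen only because it should not exist.

```bash
#!/usr/bin/env bash
# Concurrency: both sleeps start together, so the pipeline takes
# roughly 2 seconds in total rather than 4.
start=$SECONDS
sleep 2 | sleep 2
elapsed=$((SECONDS - start))
echo "elapsed: ${elapsed}s"

# stderr bypasses the pipe: the error message from ls is printed directly,
# and wc receives nothing on its stdin.
# /nonexistent-demo-path is a hypothetical path assumed not to exist.
count=$(ls /nonexistent-demo-path | wc -l)
echo "lines received through the pipe: $count"
```

If the commands ran one after another, the first measurement would be about twice as long.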

Basic Syntax

    command1 | command2
    command1 | command2 | command3
    command1 | command2 | command3 | ... | commandN

The pipe operator | connects commands, allowing the output of one to become the input of the next.

Basic Examples

Simple pipes

    # List files and count them
    ls | wc -l

    # Show processes and search for a specific one
    ps aux | grep firefox

    # Display file content and page through it
    cat largefile.txt | more

    # Sort file contents
    cat names.txt | sort

Basic pipe operations for common tasks

Text processing chains

    # Count unique lines in a file
    cat file.txt | sort | uniq | wc -l

    # Find and count specific patterns
    grep "error" logfile.txt | wc -l

    # Extract and sort specific columns
    cut -d: -f1 /etc/passwd | sort

    # Remove duplicates and sort
    cat data.txt | sort | uniq

Chain commands for text processing and analysis

System monitoring

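    # Find the largest files and directories (-a includes files, not only directories)
    du -ah | sort -hr | head -10

    # Show the processes using the most CPU
    ps aux | sort -k3 -nr | head -5

    # Check disk usage, ignoring tmpfs mounts
    df -h | grep -v tmpfs

    # List listening network sockets
    netstat -tuln | grep LISTEN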

Use pipes for system monitoring and analysis

Advanced Pipe Usage

Complex data processing

    # Analyze log files
    cat access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10

    # Process CSV data
    cat data.csv | cut -d, -f2 | tail -n +2 | sort -n | uniq -c

    # Extract and analyze specific patterns
    grep "ERROR" app.log | cut -d' ' -f1-3 | sort | uniq -c | sort -nr

    # Multi-step text transformation
    cat file.txt | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z ]//g' | sort | uniq -c

Complex data processing with multiple pipe stages
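
To make the log-analysis chain concrete, here is a small worked example. It is only a sketch: the sample lines are hypothetical data standing in for access.log, and the final awk stage is an addition that swaps the columns so the output is easy to compare.

```bash
#!/usr/bin/env bash
# Hypothetical sample data: one request per line, client address in field 1.
sample_log='10.0.0.1 GET /index.html
10.0.0.2 GET /about.html
10.0.0.1 GET /contact.html
10.0.0.3 GET /index.html
10.0.0.1 GET /index.html
10.0.0.2 GET /index.html'

# field 1 -> group identical addresses -> count them -> most frequent first
top_clients=$(printf '%s\n' "$sample_log" |
  awk '{print $1}' |
  sort |
  uniq -c |
  sort -nr |
  awk '{print $2, $1}')   # turn "count address" into "address count"

printf '%s\n' "$top_clients"
```

The sort before uniq -c matters: uniq only collapses adjacent duplicate lines, so unsorted input would produce several partial counts per address.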

Combining with redirection

    # Process and save results
    cat data.txt | sort | uniq > unique_data.txt

    # Append processed data
    ps aux | grep python | awk '{print $2}' >> python_pids.txt

    # Process and display with save
    cat logfile | grep ERROR | tee errors.txt | wc -l

    # Complex processing with multiple outputs
    cat data.txt | sort | tee sorted.txt | uniq | tee unique.txt | wc -l

Combine pipes with redirection for flexible output handling

Error handling in pipes

    # Pipe both stdout and stderr
    command1 2>&1 | command2

    # Pipe only stderr
    command1 2>&1 >/dev/null | grep error

    # Handle errors in pipeline
    set -o pipefail  # Make pipeline fail if any command fails
    command1 | command2 | command3

    # Check pipeline exit status
    command1 | command2
    echo "Pipeline exit status: ${PIPESTATUS[@]}"

Handle errors and check status in command pipelines
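
The difference between the default exit status, PIPESTATUS, and pipefail is easiest to see with commands whose results are known in advance. The following is a minimal sketch assuming bash (PIPESTATUS is bash-specific); false, true, and cat stand in for real commands.

```bash
#!/usr/bin/env bash
# Default behaviour: the pipeline's status is that of the last command (cat),
# so the failure of "false" goes unnoticed.
false | cat
default_status=$?

# PIPESTATUS holds one status per stage. Copy it immediately,
# because the next command overwrites it.
false | true | cat
stages=("${PIPESTATUS[@]}")

# With pipefail, the pipeline reports the last non-zero status of any stage.
set -o pipefail
pipefail_status=0
false | cat || pipefail_status=$?
set +o pipefail

echo "default:   $default_status"
echo "per stage: ${stages[*]}"
echo "pipefail:  $pipefail_status"
```

Copying PIPESTATUS into a separate array is a deliberate choice here: even a plain echo replaces its contents.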

Common Pipe Patterns

File and directory operations

    # Find files by size (-print0 and -0 keep file names with spaces intact)
    find . -type f -print0 | xargs -0 ls -lh | sort -k5 -hr

    # Count files by extension
    find . -type f -name "*.*" | sed 's/.*\.//' | sort | uniq -c | sort -nr

    # Find and process files
    find . -name "*.log" -print0 | xargs -0 grep "ERROR" | wc -l

    # Directory size analysis
    du -sh * | sort -hr

Common patterns for file and directory operations

Text analysis and reporting

    # Word frequency analysis
    cat text.txt | tr ' ' '\n' | sort | uniq -c | sort -nr | head -10

    # Length of the longest line
    cat file.txt | awk '{print length}' | sort -n | tail -1

    # Extract email addresses
    grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt | sort | uniq

    # Generate reports (sum the third CSV column)
    cat sales.csv | awk -F, '{sum+=$3} END {print "Total:", sum}'

Text analysis and reporting with pipes

System administration

    # Process monitoring
    ps aux | awk '{print $3, $11}' | sort -nr | head -10

    # Memory usage analysis
    free -m | grep Mem | awk '{print ($3/$2)*100}'

    # Network analysis
    ss -tuln | awk '{print $1}' | sort | uniq -c

    # Log analysis
    tail -f /var/log/syslog | grep --line-buffered ERROR

System administration tasks using pipes

Named Pipes (FIFOs)

Creating and using named pipes

    # Create a named pipe
    mkfifo mypipe

    # Write to named pipe (in one terminal)
    echo "Hello World" > mypipe

    # Read from named pipe (in another terminal)
    cat < mypipe

    # Use named pipe for inter-process communication
    mkfifo /tmp/logpipe
    tail -f /var/log/syslog > /tmp/logpipe &
    grep ERROR < /tmp/logpipe

Create and use named pipes for inter-process communication
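
The two-terminal example can also be run from a single script by putting the writer in the background. This is a minimal sketch, assuming bash, mktemp, and mkfifo are available; the name demo.fifo and the sample log lines are hypothetical.

```bash
#!/usr/bin/env bash
# Create the FIFO in a private temporary directory and remove it on exit,
# even if the script is interrupted.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
fifo="$workdir/demo.fifo"   # hypothetical name
mkfifo "$fifo"

# Writer runs in the background; opening the FIFO for writing blocks
# until a reader opens the other end.
printf 'INFO start\nERROR disk full\nINFO done\n' > "$fifo" &

# Reader: filter the stream just as it would with an anonymous pipe.
errors=$(grep ERROR < "$fifo")
wait   # reap the background writer

echo "$errors"
```

The trap handles the cleanup step, so the FIFO does not linger if the script stops early.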

Named pipe applications

    # Log processing pipeline
    mkfifo /tmp/logprocessor

    # Producer process
    tail -f /var/log/apache2/access.log > /tmp/logprocessor &

    # Consumer process (IFS= and -r keep whitespace and backslashes intact)
    while IFS= read -r line; do
        echo "Processing: $line"
        # Process log line here
    done < /tmp/logprocessor

    # Cleanup
    rm /tmp/logprocessor

Practical applications of named pipes

Performance Considerations

Pipe Performance Tips
  • Buffer size - Each pipe has a limited kernel buffer (64 KiB by default on Linux)
  • Blocking behavior - Writers block when the buffer is full, and readers block when it is empty
  • Parallel processing - Commands in pipeline run concurrently
  • Memory usage - Pipes use kernel memory, not user space
  • Error propagation - Use pipefail to catch errors
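
The buffer limit and blocking behavior can be observed directly. The sketch below assumes bash, dd, and the timeout utility from GNU coreutils. The reader deliberately waits before reading, so the writer fills the buffer and then blocks. The number reported is system-dependent; on Linux with default settings it is expected to match the default pipe capacity.

```bash
#!/usr/bin/env bash
# dd writes zeros in 1 KiB blocks until the pipe buffer is full, then blocks.
# timeout stops the blocked writer after half a second.
# The reader sleeps first, then wc counts whatever was held in the buffer.
buffered=$(timeout 0.5 dd if=/dev/zero bs=1024 2>/dev/null | { sleep 1; wc -c; })
echo "bytes held in the pipe buffer: $buffered"
```

Without the timeout, dd would stay blocked until the reader started consuming data, which is the behavior the list describes.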

Optimizing pipe performance

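    # Use appropriate buffer sizes (buffer is a separate utility that may need to be installed)
    cat largefile | buffer -s 1M | sort

    # Avoid unnecessary pipes
    # Instead of: cat file | grep pattern
    # Use:        grep pattern file

    # Use parallel processing (GNU parallel): sort chunks in parallel,
    # then merge the sorted chunk files and remove them
    cat data.txt | parallel --pipe --files sort | parallel -Xj1 sort -m {} ';' rm {}

    # Monitor pipeline performance
    time (command1 | command2 | command3)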

Optimize pipe performance for large data processing

Debugging Pipes

Debugging techniques

    # Use tee to inspect intermediate results
    command1 | tee debug1.txt | command2 | tee debug2.txt | command3

    # Check pipeline exit codes
    set -o pipefail
    command1 | command2 | command3
    echo "Exit codes: ${PIPESTATUS[@]}"

    # Add debugging output (the message is printed once command1's output has ended)
    command1 | (cat; echo "Stage 1 complete" >&2) | command2

    # Use verbose mode
    set -x
    command1 | command2 | command3
    set +x

Debug complex pipelines and identify issues
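
When writing intermediate files with tee is more than is needed, a pass-through stage that reports line counts on stderr can show where data disappears. This is a minimal sketch; tap is a hypothetical helper name, not a standard utility, and it assumes an awk that accepts /dev/stderr as an output file (as gawk and mawk do).

```bash
#!/usr/bin/env bash
# tap LABEL: copy stdin to stdout unchanged and report the line count on stderr.
# "tap" is a hypothetical helper defined here, not an existing command.
tap() {
  awk -v label="$1" '{ print } END { print label ": " NR " lines" > "/dev/stderr" }'
}

# Counts go to stderr, so only the real data is captured in result.
result=$(printf 'b\na\nb\n' | tap input | sort | tap sorted | uniq | tap unique)
echo "$result"
```

Because the counts travel on stderr, the helper can be dropped between any two stages without changing what the next command receives.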

Common debugging scenarios

    # Debug empty output
    command1 | wc -l  # Check if command1 produces output

    # Debug slow pipelines
    command1 | pv | command2  # Monitor data flow rate

    # Debug memory issues
    command1 | (ulimit -v 100000; command2)  # Limit command2's virtual memory (value in KiB)

    # Debug hanging pipelines
    timeout 30 command1 | command2  # The timeout applies to command1 only

Common debugging scenarios and solutions

Best Practices

Pipe Best Practices
  • Use set -o pipefail in scripts to catch pipeline errors
  • Avoid unnecessary use of cat in pipes
  • Use tee when you need to save intermediate results
  • Consider memory usage with large data sets
  • Test pipelines with small data sets first
  • Document complex pipelines with comments

Common Pitfalls
  • Useless use of cat - cat file | grep pattern vs grep pattern file
  • Ignoring errors - Pipeline continues even if early commands fail
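  • Full pipe buffer - A writer blocks whenever a slower reader lets the buffer fill up
  • Broken pipes - The reader exits before the writer finishes, so the writer's next write fails (normally with SIGPIPE)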
  • Resource leaks - Named pipes not cleaned up
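
The broken-pipe pitfall is usually harmless and can be reproduced safely. The sketch below assumes bash and the standard yes and head utilities. On most systems the first status is 141 (128 plus signal 13, SIGPIPE), but that value depends on how the environment handles SIGPIPE.

```bash
#!/usr/bin/env bash
# yes writes "y" forever; head exits after one line and closes the read end.
# The next write by yes then fails, normally terminating it with SIGPIPE.
first=$(yes | head -n 1)

yes | head -n 1 > /dev/null
stages=("${PIPESTATUS[@]}")   # bash-specific; copy before it is overwritten

echo "first line: $first"
echo "exit statuses (yes, head): ${stages[*]}"
```

This is also why set -o pipefail can report a failure for a pipeline such as this one even though it produced the expected output.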
