🐧 Week 2: Linux + Networking Basics

Day 3: Text Processing & Searching

Duration: 5 Hours

📚 Learning Objectives

By the end of this session, you will be able to:

  • Search text in files with grep, including its most common options
  • Find files with the find command
  • Process text with sed and awk
  • Use pipes to chain commands
  • Redirect input and output

📖 Core Concepts (2 Hours)

grep - Search Text in Files

```bash
# Basic grep
grep "error" logfile.txt

# Case-insensitive
grep -i "error" logfile.txt

# Show line numbers
grep -n "error" logfile.txt

# Recursive search (all files in a directory)
grep -r "TODO" /path/to/project/

# Invert match (lines NOT containing the pattern)
grep -v "INFO" logfile.txt

# Count matching lines
grep -c "error" logfile.txt

# Show context (lines before/after)
grep -B 2 -A 2 "error" logfile.txt   # 2 before, 2 after
grep -C 3 "error" logfile.txt        # 3 lines of context

# Regular expressions
grep -E "error|warning" logfile.txt  # Extended regex
grep "^2024" logfile.txt             # Lines starting with 2024
grep "completed$" logfile.txt        # Lines ending with completed
```

find - Search for Files

```bash
# Find by name
find /path -name "*.txt"
find . -name "config*"

# Find by type
find . -type f      # Files only
find . -type d      # Directories only

# Find by size
find . -size +10M   # Larger than 10 MB
find . -size -1k    # Smaller than 1 KB

# Find by time
find . -mtime -7    # Modified in the last 7 days
find . -mmin -60    # Modified in the last 60 minutes

# Find and execute
find . -name "*.log" -exec rm {} \;       # Delete all .log files
find . -name "*.sh" -exec chmod +x {} \;  # Make scripts executable

# Combine conditions
find . -name "*.py" -size +1k -mtime -7
```
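Before running a destructive `-exec` such as the `rm` above, it pays to preview the matches first. A cautious pattern, run here against throwaway files in `/tmp/findsafe` (a made-up path for this sketch):

```bash
# Set up disposable test files
mkdir -p /tmp/findsafe && cd /tmp/findsafe
touch a.log b.log keep.txt

# 1. Preview exactly what would be deleted
find . -name "*.log"

# 2. Then delete; `+` batches all filenames into one rm invocation,
#    which is faster than `\;` (one rm process per file)
find . -name "*.log" -exec rm {} +

# keep.txt survives
ls
```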

Pipes and Redirection

```bash
# Pipe (|) - send one command's output to another command
cat file.txt | grep "error"
ls -l | head -5
ps aux | grep nginx

# Chain multiple pipes
cat logfile.txt | grep "error" | wc -l

# Output redirection
echo "hello" > file.txt    # Overwrite
echo "world" >> file.txt   # Append

# Input redirection
sort < unsorted.txt

# Redirect both stdout and stderr
command > output.txt 2>&1           # Both to the same file
command > output.txt 2> errors.txt  # To separate files

# Discard all output
command > /dev/null 2>&1
```
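One detail worth internalizing: redirections are processed left to right, so the order of `>` and `2>&1` matters. A small sketch using a throwaway function:

```bash
# A command that writes one line to each stream
emit() { echo "to stdout"; echo "to stderr" >&2; }

# Correct order: point stdout at the file first, THEN duplicate
# stderr onto stdout -- both lines land in the file
emit > /tmp/both.txt 2>&1

# Reversed order: stderr is duplicated from the *current* stdout
# (the terminal) before the file redirection happens, so only
# stdout is captured and stderr still prints to the screen
emit 2>&1 > /tmp/only_stdout.txt
```

This is why the idiom is always written `> file 2>&1`, never the other way around.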

sed - Stream Editor

```bash
# Replace text
sed 's/old/new/' file.txt    # First occurrence per line
sed 's/old/new/g' file.txt   # All occurrences

# Replace in place
sed -i 's/old/new/g' file.txt

# Delete lines
sed '/pattern/d' file.txt    # Delete matching lines
sed '1,5d' file.txt          # Delete lines 1-5

# Print specific lines
sed -n '10,20p' file.txt     # Print lines 10-20

# Common DevOps uses
sed 's/localhost/192.168.1.100/g' config.txt
sed -i 's/DEBUG=false/DEBUG=true/g' .env
```

awk - Pattern Processing

```bash
# Print specific columns
awk '{print $1}' file.txt        # First column
awk '{print $1, $3}' file.txt    # First and third columns
awk '{print $NF}' file.txt       # Last column

# With a custom delimiter
awk -F':' '{print $1}' /etc/passwd   # Usernames from passwd

# Filter and print
awk '/error/ {print}' logfile.txt
awk '$3 > 100 {print $1, $3}' data.txt

# Calculate
awk '{sum += $1} END {print sum}' numbers.txt
awk '{sum += $1} END {print "Average:", sum/NR}' numbers.txt

# Format output
awk '{printf "%-20s %s\n", $1, $2}' file.txt
```
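As a concrete check of the aggregation idioms above (the numbers file is made up for this sketch; `NR` is awk's running record count):

```bash
# Three sample values
printf '10\n20\n30\n' > /tmp/numbers.txt

# Sum: 10 + 20 + 30
awk '{sum += $1} END {print sum}' /tmp/numbers.txt               # 60

# Average: sum divided by the number of lines read (NR)
awk '{sum += $1} END {print "Average:", sum/NR}' /tmp/numbers.txt  # Average: 20
```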

Other Useful Commands

```bash
# Sort
sort file.txt       # Alphabetical
sort -n file.txt    # Numeric
sort -r file.txt    # Reverse
sort -u file.txt    # Unique only
sort -k2 file.txt   # Sort by column 2

# Unique
uniq file.txt           # Remove consecutive duplicates
sort file.txt | uniq    # Remove all duplicates
uniq -c file.txt        # Count occurrences

# Cut
cut -d':' -f1 /etc/passwd   # First field, ':' delimiter
cut -c1-10 file.txt         # Characters 1-10

# tr - translate characters
echo "hello" | tr 'a-z' 'A-Z'   # Uppercase
cat file.txt | tr -d '\r'       # Remove carriage returns
```

🔬 Hands-on Lab (2.5 Hours)

Lab 1: Log Analysis with grep

  • Search for errors in log files
  • Extract specific patterns
  • Count occurrences
```bash
# Create a sample log
cat > ~/devops-project/logs/access.log << 'EOF'
192.168.1.100 - - [15/Jan/2024:10:00:01] "GET /api/users HTTP/1.1" 200 1234
192.168.1.101 - - [15/Jan/2024:10:00:02] "POST /api/login HTTP/1.1" 200 456
192.168.1.100 - - [15/Jan/2024:10:00:03] "GET /api/products HTTP/1.1" 500 789
192.168.1.102 - - [15/Jan/2024:10:00:04] "GET /api/users HTTP/1.1" 200 1234
192.168.1.100 - - [15/Jan/2024:10:00:05] "DELETE /api/users/5 HTTP/1.1" 403 123
192.168.1.103 - - [15/Jan/2024:10:00:06] "GET /api/orders HTTP/1.1" 200 5678
192.168.1.100 - - [15/Jan/2024:10:00:07] "POST /api/orders HTTP/1.1" 500 234
EOF

# Lab exercises
cd ~/devops-project/logs

# Find all server errors (the spaces anchor on the status field,
# so a "500" inside an IP or byte count won't match)
grep " 500 " access.log

# Find requests from a specific IP
grep "192.168.1.100" access.log

# Count GET vs POST requests
grep -c "GET" access.log
grep -c "POST" access.log

# Find all failed requests (non-200)
grep -v " 200 " access.log
```

Lab 2: Data Processing Pipeline

  • Extract specific columns from log
  • Sort and find unique values
  • Create summary statistics
```bash
# Extract all IP addresses (first column)
awk '{print $1}' access.log

# Find unique IPs
awk '{print $1}' access.log | sort | uniq

# Count requests per IP
awk '{print $1}' access.log | sort | uniq -c | sort -rn

# Extract HTTP status codes (with this sample log's single-field
# timestamp, the status code is field 8)
awk '{print $8}' access.log | sort | uniq -c

# Find the top requested URLs (field 6 in this layout)
awk '{print $6}' access.log | sort | uniq -c | sort -rn
```

Lab 3: Config File Manipulation

  • Use sed to modify configuration
  • Replace values in config files
  • Extract specific settings
```bash
# Create a sample config
cat > ~/devops-project/config/app.conf << 'EOF'
# Application Configuration
APP_NAME=myapp
APP_PORT=8080
DEBUG=false
LOG_LEVEL=INFO
DATABASE_HOST=localhost
DATABASE_PORT=5432
CACHE_ENABLED=true
EOF

cd ~/devops-project/config

# View the config
cat app.conf

# Change the port
sed 's/APP_PORT=8080/APP_PORT=3000/' app.conf

# Enable debug mode
sed 's/DEBUG=false/DEBUG=true/' app.conf

# Change the database host
sed 's/DATABASE_HOST=localhost/DATABASE_HOST=db.example.com/' app.conf

# Multiple replacements (save to a new file)
sed -e 's/APP_PORT=8080/APP_PORT=3000/' \
    -e 's/DEBUG=false/DEBUG=true/' \
    -e 's/LOG_LEVEL=INFO/LOG_LEVEL=DEBUG/' \
    app.conf > app.dev.conf
```
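When editing config files in place, it is safer to keep a backup of the original. GNU sed's `-i` accepts a suffix for exactly this; a small sketch using a throwaway file in `/tmp`:

```bash
# Disposable demo config
printf 'DEBUG=false\n' > /tmp/demo.conf

# -i.bak edits the file in place AND saves the original as demo.conf.bak
sed -i.bak 's/DEBUG=false/DEBUG=true/' /tmp/demo.conf

cat /tmp/demo.conf       # DEBUG=true
cat /tmp/demo.conf.bak   # DEBUG=false (the untouched original)
```

If the replacement goes wrong, restoring is just `mv demo.conf.bak demo.conf`.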

📝 Practice Exercises

  1. Find all .py files in your home directory modified in the last week
  2. Extract all unique IP addresses from /var/log/auth.log
  3. Replace "http://" with "https://" in a config file
  4. Use awk to calculate average response time from logs
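One possible set of solutions, sketched against throwaway sample data in `/tmp/day3` so the commands are safe to experiment with (real paths such as `/var/log/auth.log`, and the column holding response times, vary per system):

```bash
mkdir -p /tmp/day3 && cd /tmp/day3

# 1. .py files modified in the last 7 days (here: under /tmp/day3)
touch script.py
find . -name "*.py" -mtime -7

# 2. Unique IPs from a log; -oE prints only the IPv4-shaped matches,
#    which is more robust than guessing a field position
printf 'Failed login from 10.0.0.5\nAccepted login from 10.0.0.5\n' > auth.log
grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' auth.log | sort -u

# 3. http:// -> https:// (using | as the sed delimiter avoids
#    having to escape the slashes in the URLs)
echo 'URL=http://example.com' > conf.txt
sed -i 's|http://|https://|g' conf.txt

# 4. Average of a numeric column (column 2 stands in for response time)
printf 'a 120\nb 80\n' > times.log
awk '{sum += $2; n++} END {if (n) print "Average:", sum/n}' times.log
```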

💡 DevOps Use Cases

  • grep: Search logs for errors, find processes
  • find: Locate config files, clean old logs
  • sed: Automate config changes, template processing
  • awk: Parse logs, generate reports
  • Pipes: Build powerful one-liners for automation
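These tools combine naturally into one-liners. For example, a single pipeline can answer "which client IPs are generating 500 errors?" (the log below is made up, in a simplified layout where the status code is field 8):

```bash
# Made-up access log: IP in field 1, status code in field 8
cat > /tmp/access-demo.log << 'EOF'
10.0.0.1 - - [t] "GET / HTTP/1.1" 200 10
10.0.0.2 - - [t] "GET / HTTP/1.1" 500 10
10.0.0.2 - - [t] "GET / HTTP/1.1" 500 10
10.0.0.3 - - [t] "GET / HTTP/1.1" 500 10
EOF

# Filter to 500s (awk), count per IP (sort | uniq -c),
# worst offender first (sort -rn)
awk '$8 == 500 {print $1}' /tmp/access-demo.log | sort | uniq -c | sort -rn
```

Each stage does one small job; the pipe is what turns five simple tools into a report.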

✅ Day 3 Checklist

  • Can search text with grep (including regex)
  • Can find files with various criteria
  • Understand pipes and redirection
  • Can use sed for text replacement
  • Can extract columns with awk
  • Can chain commands for complex operations