Module 08 // System Shell

Linux Commands

> The terminal is the data engineer's workbench. Navigate the filesystem, manipulate files, control processes, and pipe data between tools — the UNIX way.

Try It // Sandbox Shell

A sandboxed shell with a virtual filesystem runs entirely in your browser. Type help to see the supported commands, or hit Run examples to walk through the basics. Use the up and down arrow keys for command history.

shell // sandbox
DATA//STRUCT sandbox shell — type `help` for available commands.
ada@data-struct:~$
tip: simulated shell — try help, ls -la, cat readme.txt, or pipes like cat projects/data/access.log | grep GET | wc -l

Filesystem Navigation

Every shell session starts in a working directory. These commands move you around and reveal what's there.

pwd                     # print working directory
ls                      # list files
ls -la                  # long format, include hidden
ls -lh /var/log         # human-readable sizes

cd /etc                 # change directory (absolute)
cd ../projects          # relative path
cd ~                    # home directory
cd -                    # previous directory

tree -L 2               # show tree, 2 levels deep
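
As a minimal sketch of how relative paths and cd - behave, the snippet below builds a throwaway tree with mktemp -d and walks through it. The directory names (projects, etc) are illustrative assumptions, not paths from the sandbox.

```shell
# Build a scratch tree; the directory names are hypothetical.
base=$(mktemp -d)
mkdir -p "$base/projects/data" "$base/etc"

cd "$base/etc"
cd ../projects              # relative: up one level, then into projects
here=$(basename "$PWD")

cd - > /dev/null            # back to the previous directory (cd - also prints it)
back=$(basename "$PWD")

echo "$here -> $back"       # projects -> etc

# Leave the scratch tree before deleting it.
cd / && rm -rf "$base"
```

The shell keeps the previous directory in OLDPWD, which is why cd - can toggle between two locations.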

File Operations

Create, copy, move, and delete — the four verbs of file manipulation.

touch report.txt              # create empty file
mkdir -p data/raw/2025        # make nested dirs

cp source.csv backup.csv      # copy file
cp -r src/ build/             # copy directory recursively

mv old.txt new.txt            # rename
mv *.log archive/             # move multiple

rm temp.txt                   # delete file
rm -rf node_modules/          # delete dir recursively (DANGER)

ln -s /var/log/app.log app    # symbolic link
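
The commands above can be tried safely in a scratch directory. The following is a rough sketch, with made-up file names, that shows two details worth knowing: the shell (not mv) expands *.log, and a symbolic link stores a path rather than a copy of the file.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

mkdir -p data/raw archive
touch data/raw/a.log data/raw/b.log
cp data/raw/a.log data/raw/a.bak     # copy: the original stays in place
mv data/raw/*.log archive/           # the shell expands *.log before mv runs
ln -s archive/a.log latest           # the link records the path it points to

moved=$(ls archive | wc -l)          # number of files now in archive/
left=$(ls data/raw)                  # only the .bak copy did not match *.log
target=$(readlink latest)            # the path stored in the symlink

echo "moved=$moved left=$left target=$target"

cd / && rm -rf "$work"
```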

Reading & Searching Files

cat config.yml                # print whole file
less /var/log/syslog          # paginated view (q to quit)
head -n 20 access.log         # first 20 lines
tail -n 50 access.log         # last 50 lines
tail -f app.log               # follow live

# Search inside files
grep "ERROR" app.log
grep -r "TODO" src/           # recursive
grep -in "warn" *.log         # case-insensitive, line numbers

# Find files by name
find . -name "*.py"
find /var -type f -size +10M
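
To see what these flags return, here is a small sketch against a four-line log generated on the spot. The log contents and the name app.log are invented for illustration.

```shell
# Create a tiny sample log; its contents are hypothetical.
work=$(mktemp -d)
printf '%s\n' 'INFO start' 'WARN disk at 91%' 'ERROR db timeout' 'info done' > "$work/app.log"

first=$(head -n 1 "$work/app.log")                   # first line
last=$(tail -n 1 "$work/app.log")                    # last line
errors=$(grep -c "ERROR" "$work/app.log")            # -c counts matching lines
infos=$(grep -ic "info" "$work/app.log")             # -i ignores case
warn_line=$(grep -n "WARN" "$work/app.log" | cut -d: -f1)   # -n prefixes the line number
found=$(find "$work" -type f -name "*.log" | wc -l)  # files matching the name pattern

echo "first=$first last=$last errors=$errors infos=$infos warn_line=$warn_line"

rm -rf "$work"
```

Quoting the pattern in find -name "*.log" matters: unquoted, the shell may expand the glob before find ever sees it.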

Permissions & Ownership

Linux permissions are three triplets — owner, group, others — each controlling read (r), write (w), and execute (x).

ls -l script.sh
# -rwxr-xr-- 1 ada devs  240 Apr 24 12:00 script.sh

chmod +x script.sh            # add execute bit
chmod 755 script.sh           # rwxr-xr-x (numeric)
chmod -R u=rwX,go=rX docs/    # recursive; X keeps directories traversable (plain 644 would not)

chown ada:devs file.txt       # change owner & group
sudo chown -R www:www /srv/web

umask 022                     # mask for new files: files get 644, directories 755

The sandbox tracks real permission bits and ownership. Watch how ls -l changes after each chmod / chown, and inspect a file with stat.
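
The numeric form is easier to read once you know that each digit is a sum of r=4, w=2, x=1. As a small sketch, the snippet below rebuilds the mode of the script.sh example from those values and applies it to a temporary file.

```shell
# Each octal digit is the sum of r=4, w=2, x=1 for one triplet.
owner=$((4 + 2 + 1))     # rwx
group=$((4 + 0 + 1))     # r-x
others=$((4 + 0 + 0))    # r--
octal="$owner$group$others"

f=$(mktemp)
chmod "$octal" "$f"
mode=$(ls -l "$f" | cut -c1-10)   # first 10 characters: file type plus three triplets

echo "$octal $mode"               # 754 -rwxr-xr--

rm -f "$f"
```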


Pipes & Redirection

The UNIX philosophy: small tools that do one thing well, composed via pipes (|) and redirection (>, >>, <).

# Pipe stdout of one command into another
ps aux | grep '[p]ython' | wc -l    # [p] stops grep from counting its own process

# Redirect output to file (overwrite / append)
ls > files.txt
echo "done" >> log.txt

# Redirect stderr separately
./build.sh 2> errors.log
./build.sh > out.log 2>&1     # both streams

# Read input from a file
sort < unsorted.txt

# Useful filter combo
cat access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head

Try the pipeline above in the sandbox: it counts the requests from each IP address in the access log and lists the busiest addresses first.
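
Outside the sandbox, the same pipeline can be checked against a log you control. This sketch writes a five-line sample (the addresses and requests are invented, and the format is only assumed to put the client IP in the first field) and reads off the busiest address.

```shell
# Write a small sample log; every entry here is made up.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - "GET /index.html" 200
10.0.0.2 - - "GET /about.html" 200
10.0.0.1 - - "POST /login" 302
10.0.0.3 - - "GET /index.html" 200
10.0.0.1 - - "GET /data.csv" 200
EOF

# Field 1 is the IP; sort groups duplicates so uniq -c can count them.
top=$(awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2, $1}')

# sort -u keeps one line per distinct IP.
unique=$(awk '{print $1}' "$log" | sort -u | wc -l)

echo "busiest: $top, distinct addresses: $unique"

rm -f "$log"
```

The first sort is required because uniq only collapses adjacent duplicate lines.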


Processes & Jobs

ps aux                        # all processes
ps -ef | grep nginx           # filter by name
top                           # interactive monitor
htop                          # nicer alternative

kill 1234                     # send SIGTERM to PID
kill -9 1234                  # force kill (SIGKILL)
killall python                # by name

# Background & foreground
long_task &                   # run in background
jobs                          # list background jobs
fg %1                         # bring job 1 to foreground
nohup ./worker.sh &           # survive logout
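
A short sketch of the background-job lifecycle, using sleep as a stand-in for a long task: start it, capture its PID, terminate it, and read the exit status. By shell convention, a process killed by a signal reports 128 plus the signal number.

```shell
sleep 30 &                 # stand-in for a long-running task
pid=$!                     # $! holds the PID of the most recent background job

kill "$pid"                # no signal given, so SIGTERM (15) is sent

status=0
wait "$pid" || status=$?   # wait returns the job's exit status

echo "exit status: $status"   # 128 + 15 = 143
```

Preferring plain kill over kill -9 gives the process a chance to clean up, since SIGTERM can be caught and SIGKILL cannot.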

Networking

ping google.com
curl -s https://api.example.com/users | head
wget https://files.example.com/data.zip

ip addr                       # network interfaces
ss -tuln                      # listening sockets (modern netstat)

ssh ada@server.example.com
scp report.csv ada@host:/srv/data/
rsync -avz ./src/ host:/srv/app/

Environment & Variables

echo $HOME
echo $PATH

export DATABASE_URL="postgres://localhost/app"
env | grep DATABASE

# Persist in shell config
echo 'export EDITOR=nvim' >> ~/.bashrc
source ~/.bashrc

# Inline for one command
NODE_ENV=production node server.js
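
The difference between a plain assignment, export, and the inline form is which processes see the value. A minimal sketch, using a hypothetical variable named APP_STAGE and a child sh process as the observer:

```shell
unset APP_STAGE                      # start from a clean slate
APP_STAGE=local                      # plain shell variable, not exported

plain=$(sh -c 'echo "${APP_STAGE:-unset}"')      # child does not see it

export APP_STAGE                     # now part of the environment
exported=$(sh -c 'echo "${APP_STAGE:-unset}"')   # child inherits it

inline=$(APP_STAGE=prod sh -c 'echo "$APP_STAGE"')   # override for one command only
after="$APP_STAGE"                   # the current shell keeps its own value

echo "plain=$plain exported=$exported inline=$inline after=$after"
```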

Archives & Compression

# tar — create / extract
tar -czvf backup.tar.gz src/      # create gzipped
tar -xzvf backup.tar.gz           # extract
tar -tzvf backup.tar.gz           # list contents

# zip / unzip
zip -r project.zip project/
unzip project.zip

# gzip a single file
gzip large.csv                    # -> large.csv.gz
gunzip large.csv.gz
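
A quick way to trust a backup is to round-trip it. The sketch below archives a scratch directory, lists the archive, and extracts it elsewhere. The file names are invented, and -C (change directory before acting) is used so the archive stores relative paths.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/restore"
echo "id,name" > "$work/src/users.csv"

# -C "$work" makes tar store src/... instead of the full temporary path.
tar -czf "$work/backup.tar.gz" -C "$work" src

listed=$(tar -tzf "$work/backup.tar.gz" | grep -c 'users.csv')   # entries naming the file
tar -xzf "$work/backup.tar.gz" -C "$work/restore"                # extract somewhere else

restored=$(cat "$work/restore/src/users.csv")
echo "listed=$listed restored=$restored"

rm -rf "$work"
```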

Shell Scripting Primer

A script is just a file of commands with a shebang line and execute permission.

deploy.sh
#!/usr/bin/env bash
set -euo pipefail               # fail fast, undefined vars, pipe errors

APP_NAME="data-struct"
DEPLOY_DIR="/srv/${APP_NAME}"

echo "Deploying $APP_NAME..."

if [ ! -d "$DEPLOY_DIR" ]; then
  mkdir -p "$DEPLOY_DIR"
fi

for file in dist/*.js; do
  cp "$file" "$DEPLOY_DIR/"
done

echo "Done. $(date)"
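
To make the effect of the set line concrete, this sketch runs three one-line bash commands and records their exit statuses. UNDEFINED_NAME is a placeholder that is assumed not to be set in your environment.

```shell
# Default: a pipeline reports only the status of its last command.
bash -c 'false | true' && default=0 || default=$?

# pipefail: the pipeline fails if any stage fails.
bash -c 'set -o pipefail; false | true' && strict=0 || strict=$?

# -u: expanding an unset variable is an error rather than an empty string.
bash -c 'set -u; echo "$UNDEFINED_NAME"' 2>/dev/null && nounset=0 || nounset=$?

echo "default=$default pipefail=$strict nounset=$nounset"
```

Here default is 0 and strict is 1, while nounset is non-zero because bash aborts on the unset variable. Combined with -e, any of these failures stops the script instead of letting it continue with bad state.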

System Inspection

uname -a                      # kernel info
uptime                        # load average
df -h                         # disk free, human-readable
du -sh ./*                    # size of each item here
free -h                       # memory usage
who                           # logged-in users
history | tail -20            # recent commands