Linux
Commands
> The terminal is the data engineer's workbench. Navigate the filesystem, manipulate files, control processes, and pipe data between tools — the UNIX way.
Try It // Sandbox Shell
A sandboxed shell with a virtual filesystem runs entirely in your browser. Type help to see the supported commands, or hit Run examples to walk through the basics. Use ↑/↓ for command history.
Try help, ls -la, cat readme.txt, or pipes like cat projects/data/access.log | grep GET | wc -l
Filesystem Navigation
Every shell session starts in a working directory. These commands move you around and reveal what's there.
pwd # print working directory
ls # list files
ls -la # long format, include hidden
ls -lh /var/log # human-readable sizes
cd /etc # change directory (absolute)
cd ../projects # relative path
cd ~ # home directory
cd - # previous directory
tree -L 2 # show tree, 2 levels deep
File Operations
Create, copy, move, and delete — the four verbs of file manipulation.
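A minimal sketch of the four verbs, run inside a throwaway directory so nothing real is touched. The function name file_ops_demo and every path under the mktemp directory are illustrative assumptions, not part of the original examples.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical demo: all paths live under a scratch directory created by mktemp.
file_ops_demo() {
    local scratch
    scratch="$(mktemp -d)"

    mkdir -p "$scratch/data/raw"                                  # create nested directories
    touch "$scratch/data/raw/report.txt"                          # create an empty file
    cp "$scratch/data/raw/report.txt" "$scratch/data/backup.txt"  # copy
    mv "$scratch/data/backup.txt" "$scratch/data/archive.txt"     # move (rename)
    rm "$scratch/data/raw/report.txt"                             # delete

    # List the files that survived, relative to the scratch directory.
    (cd "$scratch" && find . -type f | sort)

    # ${scratch:?} aborts instead of expanding to "" if the variable is ever empty,
    # which guards the recursive delete against running on the wrong path.
    rm -rf "${scratch:?}"
}

file_ops_demo
```

Only ./data/archive.txt is printed: the original was deleted, and the copy was renamed before the listing ran.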
touch report.txt # create empty file
mkdir -p data/raw/2025 # make nested dirs
cp source.csv backup.csv # copy file
cp -r src/ build/ # copy directory recursively
mv old.txt new.txt # rename
mv *.log archive/ # move multiple
rm temp.txt # delete file
rm -rf node_modules/ # delete dir recursively (DANGER)
ln -s /var/log/app.log app # symbolic link
Reading & Searching Files
cat config.yml # print whole file
less /var/log/syslog # paginated view (q to quit)
head -n 20 access.log # first 20 lines
tail -n 50 access.log # last 50 lines
tail -f app.log # follow live
# Search inside files
grep "ERROR" app.log
grep -r "TODO" src/ # recursive
grep -in "warn" *.log # case-insensitive, line numbers
# Find files by name
find . -name "*.py"
find /var -type f -size +10M
Permissions & Ownership
Linux permissions are three triplets — owner, group, others — each controlling read (r), write (w), and execute (x).
ls -l script.sh
# -rwxr-xr-- 1 ada devs 240 Apr 24 12:00 script.sh
chmod +x script.sh # add execute bit
chmod 755 script.sh # rwxr-xr-x (numeric)
chmod -R u=rwX,go=rX docs/ # recursive; capital X keeps directories traversable (a plain 644 would not)
chown ada:devs file.txt # change owner & group
sudo chown -R www:www /srv/web
umask 022 # default permission mask
The sandbox tracks real permission bits and ownership. Watch how ls -l changes after each chmod / chown, and inspect a file with stat.
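To connect the numeric modes to the rwx triplets, here is a small sketch that decodes a three-digit octal mode by testing the read (4), write (2), and execute (1) bits of each digit. The helper names digit_to_rwx and mode_to_symbolic are hypothetical, and the sketch ignores special bits such as setuid and sticky.

```shell
# Hypothetical helper: convert one octal digit (0-7) into its rwx triplet.
digit_to_rwx() {
    local d="$1" r='-' w='-' x='-'
    if [ $((d & 4)) -ne 0 ]; then r='r'; fi   # 4 = read
    if [ $((d & 2)) -ne 0 ]; then w='w'; fi   # 2 = write
    if [ $((d & 1)) -ne 0 ]; then x='x'; fi   # 1 = execute
    printf '%s%s%s' "$r" "$w" "$x"
}

# Hypothetical helper: decode owner, group, and others digits in that order.
mode_to_symbolic() {
    local mode="$1" rest
    rest="${mode#?}"                        # drop the owner digit
    printf '%s%s%s\n' \
        "$(digit_to_rwx "${mode%??}")" \
        "$(digit_to_rwx "${rest%?}")" \
        "$(digit_to_rwx "${mode#??}")"
}

mode_to_symbolic 754   # the mode of script.sh in the ls -l line above
mode_to_symbolic 755
mode_to_symbolic 644
```

The three calls print rwxr-xr--, rwxr-xr-x, and rw-r--r--, matching what ls -l would show after the corresponding chmod.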
Pipes & Redirection
The UNIX philosophy: small tools that do one thing well, composed via pipes (|) and redirection (>, >>, <).
# Pipe stdout of one command into another
ps aux | grep '[p]ython' | wc -l # [p] stops grep from counting its own process
# Redirect output to file (overwrite / append)
ls > files.txt
echo "done" >> log.txt
# Redirect stderr separately
./build.sh 2> errors.log
./build.sh > out.log 2>&1 # both streams
# Read input from a file
sort < unsorted.txt
# Useful filter combo
cat access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head
Try this pipeline in the sandbox: it counts requests per IP address in the access log and lists the most frequent first.
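To make each stage of that pipeline visible, here is a self-contained sketch that runs it on a few made-up log lines. The sample data and the function names sample_log and top_ips are assumptions for illustration. A final awk stage is added only to tidy the padding that uniq -c puts in front of each count.

```shell
# Hypothetical sample data: the first field of each line is the client IP.
sample_log() {
    printf '%s\n' \
        '10.0.0.1 GET /index.html' \
        '10.0.0.2 GET /about.html' \
        '10.0.0.1 POST /login' \
        '10.0.0.3 GET /index.html' \
        '10.0.0.1 GET /data.csv' \
        '10.0.0.2 GET /index.html'
}

# Reads log lines on stdin, writes "ip count" lines, busiest IP first.
top_ips() {
    awk '{print $1}' |        # keep only the IP column
        sort |                # group identical IPs (uniq only merges adjacent lines)
        uniq -c |             # collapse each group into "count ip"
        sort -rn |            # highest count first
        awk '{print $2, $1}'  # normalise the output to "ip count"
}

sample_log | top_ips
```

With this sample the output is 10.0.0.1 3, then 10.0.0.2 2, then 10.0.0.3 1. Because top_ips reads stdin, the same function works on a real file with top_ips < access.log.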
Processes & Jobs
ps aux # all processes
ps -ef | grep nginx # filter by name
top # interactive monitor
htop # nicer alternative
kill 1234 # send SIGTERM to PID
kill -9 1234 # force kill (SIGKILL)
killall python # by name
# Background & foreground
long_task & # run in background
jobs # list background jobs
fg %1 # bring job 1 to foreground
nohup ./worker.sh & # survive logout
Networking
ping google.com
curl -s https://api.example.com/users | head
wget https://files.example.com/data.zip
ip addr # network interfaces
ss -tuln # listening sockets (modern netstat)
ssh ada@server.example.com
scp report.csv ada@host:/srv/data/
rsync -avz ./src/ host:/srv/app/
Environment & Variables
echo $HOME
echo $PATH
export DATABASE_URL="postgres://localhost/app"
env | grep DATABASE
# Persist in shell config
echo 'export EDITOR=nvim' >> ~/.bashrc
source ~/.bashrc
# Inline for one command
NODE_ENV=production node server.js
Archives & Compression
# tar — create / extract
tar -czvf backup.tar.gz src/ # create gzipped
tar -xzvf backup.tar.gz # extract
tar -tzvf backup.tar.gz # list contents
# zip / unzip
zip -r project.zip project/
unzip project.zip
# gzip a single file
gzip large.csv # -> large.csv.gz
gunzip large.csv.gz
Shell Scripting Primer
A script is just a file of commands with a shebang line and execute permission.
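As a minimal sketch of those two ingredients, the snippet below writes a two-line script to a temporary file, marks it executable, and runs it by path. The function name make_and_run_script and the greeting text are illustrative assumptions. It also assumes the temporary directory allows execution, which may not hold on systems that mount it noexec.

```shell
# Hypothetical demo: create a tiny script, make it executable, run it, clean up.
make_and_run_script() {
    local script
    script="$(mktemp)"

    # The quoted 'EOF' keeps ${1:-script} literal, so it expands when the
    # generated script runs rather than while it is being written.
    cat > "$script" <<'EOF'
#!/usr/bin/env bash
echo "hello from ${1:-script}"
EOF

    chmod +x "$script"     # without the execute bit, running it by path is denied
    "$script" "shebang"    # the kernel reads the #! line and hands the file to bash
    rm -f "$script"
}

make_and_run_script
```

Running it prints hello from shebang. Dropping the chmod line would make the call fail with a permission error, even though bash "$script" would still work, because that form only needs read access.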
#!/usr/bin/env bash
set -euo pipefail # fail fast, undefined vars, pipe errors
APP_NAME="data-struct"
DEPLOY_DIR="/srv/${APP_NAME}"
echo "Deploying $APP_NAME..."
if [ ! -d "$DEPLOY_DIR" ]; then
  mkdir -p "$DEPLOY_DIR"
fi
for file in dist/*.js; do
  cp "$file" "$DEPLOY_DIR/"
done
echo "Done. $(date)"
System Inspection
uname -a # kernel info
uptime # load average
df -h # disk free, human-readable
du -sh ./* # size of each item here
free -h # memory usage
who # logged-in users
history | tail -n 20 # recent commands
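The df -h output above is formatted for people. A script usually wants a single number it can compare. The sketch below pulls the use percentage for one path out of df -P and checks it against a limit. The function names disk_use_percent and warn_if_full and the 90 percent threshold are assumptions for illustration. It also assumes the filesystem name contains no spaces, so that capacity is the fifth field.

```shell
# Hypothetical helper: print the use% of the filesystem holding a path as a bare integer.
disk_use_percent() {
    # -P forces the POSIX layout (one line per filesystem);
    # field 5 of the second line is the capacity, e.g. "42%".
    df -P "${1:-.}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: compare usage against a limit and report the result.
warn_if_full() {
    local path="$1" limit="$2" used
    used="$(disk_use_percent "$path")"
    if [ "$used" -ge "$limit" ]; then
        echo "WARN: $path is ${used}% full (limit ${limit}%)"
    else
        echo "OK: $path is ${used}% full"
    fi
}

warn_if_full / 90
```

The same pattern of a POSIX-format flag plus an awk field pick tends to be more robust in scripts than parsing the human-readable -h columns.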