From VMS to Linux HOWTO: Real Life Examples

11. Real Life Examples

UNIX' core idea is that there are many simple commands that can linked together via piping and redirection to accomplish even really complex tasks. Look at the following examples; I'll only explain the most complex ones, for the others, please study the above sections and the man pages.

Problem: ls is too quick and the file names fly away.

Solution:


$ ls | less

Problem: I have a file containing a list of words. I want to sort it in reverse order and print it.

Solution:


$ cat myfile.txt | sort -r | lpr

Problem: my datafile has some repeated lines! How do I get rid of them?

Solution:


$ sort datafile.dat | uniq > newfile.dat

Problem: I have a file called 'mypaper.txt' or 'mypaper.tex' or some such somewhere, but I don't remember where I put it. How do I find it?

Solution:


$ find ~ -name "mypaper*"

Explanation: find is a very useful command that lists all the files in a directory tree (starting from ~ in this case). Its output can be filtered to meet several criteria, such as -name.

Problem: I have a text file containing the word 'entropy' in this directory, is there anything like SEARCH?

Solution: yes, try


$ grep -l 'entropy' *

Problem: somewhere I have text files containing the word 'entropy', I'd like to know which and where they are. Under VMS I'd use search entropy [...]*.*.*, but grep can't recurse subdirectories. Now what?

Solution:


$ find . -exec grep -l "entropy" {} \; 2> /dev/null

Explanation: find . outputs all the file names starting from the current directory, -exec grep -l "entropy" is an action to be performed on each file (represented by {}), \ terminates the command. If you think this syntax is awful, you're right.

In alternative, write the following script:

#!/bin/sh
# rgrep: recursive grep
if [ $# != 3 ]
then
  echo "Usage: rgrep --switches 'pattern' 'directory'"
  exit 1
fi
find $3 -name "*" -exec grep $1 $2 {} \; 2> /dev/null

Explanation: grep works like search, and combining it with find we get the best of both worlds.

Problem: I have a data file that has two header lines, then every line has 'n' data, not necessarily equally spaced. I want the 2nd and 5th data of each line. Shall I write a Fortran program...?

Solution: nope. This is quicker:


$ awk 'NL > 2 {print $2, "\t", $5}' datafile.dat > newfile.dat

Explanation: the command awk is actually a programming language: for each line starting from the third in datafile.dat, print out the second and fifth field, separated by a tab. Learn some awk---it saves a lot of time.

Problem: I've downloaded an FTP site's ls-lR.gz to check its contents. For each subdirectory, it contains a line that reads "total xxxx", where xxxx is size in kbytes of the dir contents. I'd like to get the grand total of all these xxxx values.

Solution:


zcat ls-lR.gz | awk ' $1 == "total" { i += $2 } END {print i}'

Explanation: zcat outputs the contents of the .gz file and pipes to awk, whose man page you're kindly requested to read ;-)

Problem: I've written a Fortran program, myprog, to calculate a parameter from a data file. I'd like to run it on hundreds of data files and have a list of the results, but it's a nuisance to ask each time for the file name. Under VMS I'd write a lengthy command file, and under Linux?

Solution: a very short script. Make your program look for the data file 'mydata.dat' and print the result on the screen (stdout), then write the following script:

#!/bin/sh
# myprog.sh: run the same command on many different files
# usage: myprog.sh *.dat
for file in $*  # for all parameters (e.g. *.dat)
do
  # append the file name to result.dat
  echo -n "${file}:    " >> results.dat
  # copy current argument to mydata.dat, run myprog 
  # and append the output to results.dat
  cp ${file} mydata.dat ; myprog >> results.dat
done

Problem: I want to replace `geology' with `geophysics' in all my text files. Shall I edit them all manually?

Solution: nope. Write this shell script:

#!/bin/sh
# replace $1 with $2 in $*
# usage: replace "old-pattern" "new-pattern" file [file...]
OLD=$1          # first parameter of the script
NEW=$2          # second parameter
shift ; shift   # discard first two parameters: the next are the file names
for file in $*  # for all files given as parameters
do
# replace every occurrence of OLD with NEW, save on a temporary file
  sed "s/$OLD/$NEW/g" ${file} > ${file}.new
# rename the temporary file as the original file
  /bin/mv ${file}.new ${file}
done

Problem: I have some data files, I don't know their length and have to remove their last but one and last but two lines. Er... manually?

Solution: no, of course. Write this script:

#!/bin/sh
# prune.sh: removes n-1th and n-2th lines from files
# usage: prune.sh file [file...]
for file in $*   # for every parameter
do
  LINES=`wc -l $file | awk '{print $1}'`  # number of lines in file
  LINES=`expr $LINES - 3`                 # LINES = LINES - 3
  head -n $LINES $file > $file.new        # output first LINES lines
  tail -n 1 $file >> $file.new            # append last line
done

I hope these examples whetted your appetite...