Key UNIX commands for Tufts cluster1

Adapted from

The critical commands are in bold (but others you may eventually find useful). Gavin Schnitzler 11/2011.

The best way to communicate with the cluster on a Mac is to open a command window and use ssh. Normal Mac copy and paste key combinations work here, and the up/down arrows review recent commands. To transfer files use sftp (below).

The best way to communicate with the cluster from a PC is to use the free program “Tera Term VT”. To copy and paste, you need to use [alt]-c and [alt]-v (since control-c is the UNIX interrupt command). Up/down arrows review recent commands. To transfer files most easily, use the WinSCP program.

*    A wildcard that matches any sequence of characters in filenames, for most commands. Thus ‘head *.txt’ prints the top 10 lines of every file in the current directory whose name ends in ‘.txt’.
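
As a small illustration of the wildcard, the sketch below works in a scratch directory with made-up file names (notes.txt, data.txt and readme.md are assumptions for the example). It shows that the shell expands ‘*.txt’ into the matching names before the command runs.

```shell
# Work in a throwaway directory so nothing real is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir"
touch notes.txt data.txt readme.md   # hypothetical file names

# The shell replaces *.txt with every matching name, in sorted order.
matches=$(echo *.txt)
echo "$matches"
```

Only the two .txt files are listed; readme.md does not match the pattern.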

> or <    Indicators of direction for command output (>) and input (<). ‘command file1 > file2’ is extremely useful for directing the output of running command on file1 into file2 (instead of, by default, to the screen).

| (pipe)    Used to link commands together, taking the output of the command on the left as input to the command on the right. E.g. ‘head -n 100 file | more’ prints the top 100 lines of a file one screenful at a time.
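
Here is a minimal sketch of >, < and | working together. It assumes the ‘seq’ command is available (it ships with most Linux and Mac systems), and numbers.txt is a made-up file name.

```shell
tmpdir=$(mktemp -d)

# > sends output to a file instead of the screen (seq prints 1..200, one per line).
seq 1 200 > "$tmpdir/numbers.txt"

# | feeds the output of head into wc, which counts the lines it receives.
count=$(head -n 100 "$tmpdir/numbers.txt" | wc -l)

# < makes a command read its input from a file.
total=$(wc -l < "$tmpdir/numbers.txt")

echo "first chunk: $count lines, whole file: $total lines"
```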

awk 'BEGIN {FS=OFS="\t"} {print $1, $4+$5, "etc"}' infile > outfile    Creates columns in outfile based on specified tab-separated columns in infile (\t means tab), where $1 = column 1. In this example column 1 is copied from infile, column 2 gets the sum of infile columns 4 & 5, and column 3 is always “etc”.
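
To make the awk entry concrete, this sketch runs a column-building command on a tiny made-up table (the gene names and numbers are invented for the example). It assumes the input really is tab-separated, so both FS (input separator) and OFS (output separator) are set to a tab.

```shell
tmpdir=$(mktemp -d)

# Two made-up rows with five tab-separated columns each.
printf 'geneA\tx\ty\t2\t3\ngeneB\tx\ty\t10\t5\n' > "$tmpdir/infile"

# Column 1 is copied, column 2 is the sum of columns 4 and 5, column 3 is the text "etc".
awk 'BEGIN {FS=OFS="\t"} {print $1, $4+$5, "etc"}' "$tmpdir/infile" > "$tmpdir/outfile"

cat "$tmpdir/outfile"
```

Note that the items after print are separated by commas; that is what makes awk put the output separator between them.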

bjobs    Tells you about jobs you have running in batch. ‘bjobs -l’ gives detailed information.

bkill job#    Kills a job you have running in batch (find the job number with ‘bjobs’).

bsub -oo output.logfile command [parameters]    Submits a run of any command as a background batch process (a job). You must do this for processes that take more than about 30 seconds or the sysadmins will get upset. The -oo file contains information about the run, including any errors and anything that would have been sent to the screen (thus if you use ‘bsub command infile > outfile’, the output will end up in the -oo file rather than in outfile).

bsub -Ip -q int_public command    Makes the job run “interactive”. Normal program output goes to the screen (or is redirected by > as appropriate). You can’t do anything in that window while it’s running, but the sysadmins won’t kill your program (which is what they do for long-running programs started w/o bsub).

bzip2 -d file.bz2    Decompresses a bzip2 (.bz2) file, removing the compressed original.

cat f1    Prints the contents of f1 to the screen (‘more’ is generally better).

cat f1 f2 > f3    Joins f1 and f2 (f2 after the end of f1) and puts the result in f3; f1 and f2 are unchanged.

cd directoryname    Change directory. ‘cd ..’ moves you up one directory, ‘cd ../..’ moves up 2. ‘cd ../dirname’ moves up one & then down into dirname. ‘cd /’ takes you to the root (top level) directory, and you can specify subdirectories between /’s. Thus ‘cd /cluster/shared/gschni01’ takes you to my shared directory. ‘cd ~’ takes you to your home directory (the only place that is not cleaned out every 28 days on the cluster, but it has limited storage). One last thing about pathnames: “.” stands for the current directory. This is sometimes useful to get UNIX to recognize an executable file in the current directory: if ‘command’ doesn’t work, try ‘./command’.
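
The relative paths described for cd can be tried safely in a scratch directory. In this sketch the directory names top, d1 and d2 are made up for the example.

```shell
start=$(mktemp -d)                 # scratch directory for the example
mkdir -p "$start/top/d1" "$start/top/d2"

cd "$start/top/d1"
cd ..                              # up one directory
up_one=$(basename "$(pwd)")

cd d1
cd ../d2                           # up one, then down into d2
sideways=$(basename "$(pwd)")

cd ../..                           # up two directories
up_two=$(pwd)
cd .                               # "." is the current directory, so nothing changes
same=$(pwd)

echo "$up_one $sideways"
```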

clear    Clears the screen.

cp f1 f2    Copies f1 to f2 (f1 is unchanged; an existing f2 is overwritten).

cp -r d1 ../d2    Copies the directory d1 and everything in it to d2 (in this case d2 will be placed up one directory).

^-c (ctrl c)    Kills the currently running process.

diff f1 f2    Lists differences between f1 & f2, line by line.

export NAME=definition    Generally used to tell UNIX where to find things (often necessary to run programs), as in…

export CREAD=/mydir/cread_folder/… export PATH=$PATH:$CREAD/bin    This adds the folder /mydir/cread_folder/bin to the end of the places UNIX searches for programs to run, where $PATH was your original list. Be sure to include ‘$PATH:’; without it UNIX stops finding the standard commands and you may need to log out & restart.
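
A minimal sketch of extending PATH, using a scratch folder in place of a real install. The folder and the program name cread_demo are hypothetical; ‘command -v’ is used to ask the shell where it would find the program.

```shell
# Hypothetical install location; substitute the real folder of your program.
CREAD=$(mktemp -d)
mkdir -p "$CREAD/bin"
printf '#!/bin/sh\necho "cread ran"\n' > "$CREAD/bin/cread_demo"   # made-up program
chmod +x "$CREAD/bin/cread_demo"

export CREAD
export PATH="$PATH:$CREAD/bin"     # keep $PATH: so the standard commands are still found

# Ask the shell where it now finds the program.
found=$(command -v cread_demo)
echo "$found"
```

The setting lasts only for the current login session unless it is added to a startup file.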

find ./ -name "*namepart*" -print    Finds files whose names contain “namepart” in the current or lower directories.

find . -exec touch '{}' \;    The magic phrase that “touches” all files in and below the current directory (run it from your shared directory), so their access date becomes the current date and they are not deleted when the sysadmins wipe old files every 28 days.
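
The two find commands can be tried on a scratch directory. In this sketch the file names are made up, one file is backdated to the year 2000 to stand in for an old file, and ‘-mtime +27’ is used only to list files last modified more than 27 days ago (how the cluster actually decides what to wipe is an assumption here).

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/sub"
touch "$tmpdir/my_reads.txt" "$tmpdir/other.log"
touch -t 200001010000 "$tmpdir/sub/old_reads.txt"   # backdate one file to 1 Jan 2000

# Quote the pattern so that find, not the shell, expands the wildcard.
hits=$(find "$tmpdir" -type f -name "*reads*" -print | wc -l)

# Count files older than 27 days, before and after touching everything.
stale_before=$(find "$tmpdir" -type f -mtime +27 | wc -l)
find "$tmpdir" -exec touch '{}' \;
stale_after=$(find "$tmpdir" -type f -mtime +27 | wc -l)

echo "matches: $hits, old files before: $stale_before, after: $stale_after"
```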

du    Lists all subdirectories and their sizes.

expr first_number operator 2nd_number (e.g. expr 1 + 1)    Does simple math. Leave spaces around the operator. Either number can be replaced by any command that returns a number, enclosed in backquotes (e.g. expr `wc -l < f1` + 1).
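
A short sketch of expr in use. It writes command substitution as $(command), which behaves like backquotes; the inputs fed to wc are made up for the example.

```shell
sum=$(expr 1 + 1)
product=$(expr 6 \* 7)     # * must be escaped so the shell does not treat it as a wildcard

# Use the output of commands as the numbers: count the lines of two small inputs and add them.
total=$(expr $(printf 'a\nb\n' | wc -l) + $(printf 'c\n' | wc -l))

echo "$sum $product $total"
```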

env    Lists current environment settings, such as PATH, etc.

ftp account (or sftp for secure connection)    Makes an ftp connection to a machine account you have access to (login & password). Once there, ‘ls’ and ‘cd’ commands work on the remote account. Use ‘lls’ and ‘lcd’ for local directories. ‘get’ gets a file from that account. ‘put’ puts a file there (using current local & remote directories). Use sftp to transfer files between the cluster and a UNIX shell on your Mac or PC.

grep "pattern" file > output    Searches for ‘pattern’ within a file and outputs only those lines containing it, so grep "dog" finds any line with ‘dog’ in it. In the pattern, ‘.’ means any one character and ‘.*’ means any number of characters (‘*’ on its own repeats the character before it), ‘^’ means start of line and ‘$’ means end of line.

grep -c "pattern" file    Tells the number of lines containing “pattern”, where -c means ‘count’.
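
The grep pattern characters are easiest to see on a small file. This sketch uses a made-up five-line file (animals.txt) and ‘grep -c’ to count the matching lines for each pattern.

```shell
tmpdir=$(mktemp -d)
printf 'dog\nhot dog stand\ncat chases dog\ndig\ncatalog\n' > "$tmpdir/animals.txt"

with_dog=$(grep -c 'dog' "$tmpdir/animals.txt")          # dog anywhere in the line
starts_dog=$(grep -c '^dog' "$tmpdir/animals.txt")       # dog at the start of the line
ends_dog=$(grep -c 'dog$' "$tmpdir/animals.txt")         # dog at the end of the line
d_any_g=$(grep -c 'd.g' "$tmpdir/animals.txt")           # d, any one character, then g
cat_then_dog=$(grep -c 'cat.*dog' "$tmpdir/animals.txt") # cat, anything, then dog

echo "$with_dog $starts_dog $ends_dog $d_any_g $cat_then_dog"
```

Single quotes around the pattern keep the shell from interpreting ‘$’ and ‘*’ itself.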

gunzip file.gz    Decompresses a .gz file (removing the .gz file).

gzip file    Compresses a file into .gz format, removing the original file (useful to save space on the cluster).
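
A quick round trip shows that gzip and gunzip each replace the file they are given. The file name sample.txt is made up for the example.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'line one\nline two\n' > sample.txt

gzip sample.txt                 # sample.txt is replaced by sample.txt.gz
after_gzip=$(ls)

gunzip sample.txt.gz            # sample.txt.gz is replaced by sample.txt
after_gunzip=$(ls)

echo "$after_gzip then $after_gunzip"
```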

head -n # file    Prints only the top # lines of a file (default 10). If # is negative, prints all lines except the last # lines.

history > file    Records all of your past ~1000 or so command entries to a file.

ls    Lists the contents of the current directory. ‘ls -l’ gives details. ‘ls path/directory/’ lists that directory’s contents.

man command    Gives usage help on a command (often full of many un-needed details). Use [space] to move forward, “q” to stop.

mkdir D    Makes a new directory named ‘D’.

module available    Tells what “modules” (special environments to run specific programs) are available on the cluster. To start one, type ‘module add’ followed by the name before the / in that listing (the number after the / is the version number).

module add R    Makes R available on the cluster; follow the instructions to begin (usually typing ‘bsub -Ip -q int_public R’).

module add python    Makes python 2.6.5 your default version, instead of 2.4 (necessary to run some programs).

more F    Lists contents of file F one screen at a time. To move forward hit [space], to stop hit “q”.

mv f1 f2    Changes the name of f1 to f2.

mv f1 directory    Moves f1 to the specified directory (in current directory, or specified by ../dir/dir, /dir/dir, etc. path). See ‘cd’ for details.

perl program.pl parameters    Runs a perl program with associated program-specific parameters. Without parameters, most programs will print a brief usage summary.

printenv PATH    Prints just the PATH variable of your environment settings.

python program.py parameters    Runs a python program with associated program-specific parameters.

pwd    Gives the pathname of the current directory (stands for “print working directory”).

rm f    Removes a file. ‘rm *.extension’ removes all files ending with “.extension” (use caution).

rm -r directory    Removes directory and all files in it (use extreme caution).

rmdir D    Removes directory D (only if empty).

sort f1 > f2    Alphabetically sorts file f1 and puts the results in f2.
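
A brief sketch of sorting with the ‘sort’ command, using made-up file contents. It also shows one thing worth knowing: an alphabetical sort orders numbers character by character, and the ‘-n’ option sorts them by value instead.

```shell
tmpdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$tmpdir/f1"
printf '10\n9\n100\n' > "$tmpdir/nums"

sort "$tmpdir/f1" > "$tmpdir/f2"          # alphabetical sort into a new file
alpha=$(cat "$tmpdir/f2")

as_text=$(sort "$tmpdir/nums")            # numbers compared as text
as_numbers=$(sort -n "$tmpdir/nums")      # numbers compared by value

echo "$alpha"
```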

sed -n '#p' file    Prints line number # of a file.

sed '#d' file1 > file2    Deletes line number # from file1 (not changing the original file) and writes the new file to file2.
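
Both sed forms can be checked on a three-line file. In this sketch # is replaced by the line number 2, and the file contents are made up.

```shell
tmpdir=$(mktemp -d)
printf 'first\nsecond\nthird\n' > "$tmpdir/file1"

line2=$(sed -n '2p' "$tmpdir/file1")            # print only line 2
sed '2d' "$tmpdir/file1" > "$tmpdir/file2"      # copy everything except line 2
remaining=$(cat "$tmpdir/file2")
original=$(wc -l < "$tmpdir/file1")             # file1 itself still has 3 lines

echo "$line2"
```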

sftp    Secure ftp. See ‘ftp’.

ssh    Connects you to your cluster account from any UNIX shell (e.g. on your PC).

tar file[s]    Combines multiple files into one archive or, more usefully, unpacks such archives. ‘tar -xf file’ unpacks anything with a .tar extension. ‘tar -xjf file’ unpacks bzip2-compressed archives (.tar.bz2). ‘tar -xvzf file’ unpacks gzip-compressed archives (.tgz or .tar.gz), listing the files as it goes. Alternatively, for a doubly packed file such as file.tar.gz, you can use gunzip first & then tar.
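
A small round trip with tar, packing a made-up directory into a gzip-compressed archive and unpacking it again. The names project and a.txt are assumptions for the example.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
mkdir project
echo "hello" > project/a.txt

tar -czf project.tar.gz project     # c = create, z = gzip, f = archive file name
rm -r project                       # remove the original to prove the archive works
tar -xzf project.tar.gz             # x = extract

restored=$(cat project/a.txt)
echo "$restored"
```

Unlike gzip, tar leaves the archive file in place after unpacking.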

tail -n # file    Prints the last # lines of a file; the default (without -n) is 10.
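
Used together, head and tail can pull out any block of lines. This sketch assumes ‘seq’ is available and uses a made-up file of the numbers 1 to 20.

```shell
tmpdir=$(mktemp -d)
seq 1 20 > "$tmpdir/nums.txt"

top=$(head -n 3 "$tmpdir/nums.txt")                   # first 3 lines
bottom=$(tail -n 2 "$tmpdir/nums.txt")                # last 2 lines
middle=$(head -n 7 "$tmpdir/nums.txt" | tail -n 3)    # lines 5 to 7

echo "$middle"
```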

unzip    Unpacks files with .zip extensions.

vi    Opens the vi text editor. The basics are: Use arrow keys to move. Type “i” to begin entering text, “[esc]” to stop. “:wq” to save changes and quit, and “:q!” to quit without saving. “/” to find a pattern, “:#” to go to line #, “:$” to go to the end of the file, “$” to go to the end of a line and “^” to go to the beginning of a line. “[#]dd” deletes # lines starting with the current line (default 1 line); “[#]yy” (yank) copies # lines starting with the current line. “p” pastes the lines most recently deleted or yanked.

wc < filename    “Word count”: tells the number of lines, words & characters in a file. ‘wc -l’ (only lines), ‘wc -w’ (only words).
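
A minimal check of the three counts on a made-up two-line file. The ‘-c’ option, which counts characters (bytes), is used here in addition to the options listed above.

```shell
tmpdir=$(mktemp -d)
printf 'one two\nthree\n' > "$tmpdir/f"

lines=$(wc -l < "$tmpdir/f")    # number of lines
words=$(wc -w < "$tmpdir/f")    # number of words
chars=$(wc -c < "$tmpdir/f")    # number of bytes, newlines included

echo "$lines lines, $words words, $chars characters"
```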

wget URL    Gets a file from a URL. To get the full URL, right- or command-click on a link & choose ‘copy link location’.

which command    Gives the full path of the version of ‘command’ that is run by default.

whereis command    Tells the locations of all executable versions of ‘command’.