Difference between revisions of "Bash Course: Lesson 1"

From FreekiWiki
(created page -- class notes for bash course)
 

Revision as of 11:22, 5 May 2010

Preliminaries

Getting Started

What is a Shell Script?

The shell (what you run when you use a terminal) is a command interpreter. Used by itself, on the command line, it acts as a way to interact with a computer, but it also has the features of a programming language, one that gives you access to all the unix tools and utilities you might normally use by themselves on the command line.

It's possible to program directly on the command line, using semicolons to separate commands, e.g.

 cd /home/paul;ls

but normally it is done in a shell program, normally called a shell script. These take the form of a text file.

Shell scripts are commonly used in systems administration. A knowledge of shell scripting is essential for this. Many aspects of a Linux/UNIX system involve shell scripts. For instance, the boot procedure is controlled by the shell scripts in /etc/rc.d. Sysadmins often use shell scripts to automate tasks or put together simple tools. While bash scripting does not have all the power of other more fully featured programming languages, that drawback is often outweighed by the advantages of being easy to use, and providing access to powerful UNIX tools you might already be familiar with.

Starting with a Bang

At its most basic, a shell script is just a list of commands that are executed in order, one by one. All shell scripts start the same way: with a #! , followed by the path to a command. This is known as a sh-bang (often written "shebang"). It tells the system which interpreter the commands in the file should be passed to. It's a form of [magic] (see man 5 magic for the gory details).

The commonest form of shell script is the Bash Script. Not only is Bash (Bourne Again SHell) the default shell on most modern GNU/Linux systems, it is also particularly suitable for shell scripting. It is what we will be using on this course.

The first line of a Bash script always starts like this:

 #!/bin/bash

Other examples are a Perl script:

 #!/usr/bin/perl
 

or this, which is a self-destructing script:

 #!/bin/rm

Hello World - the First Script

By long tradition the first thing you do when you learn to program is to write a simple example that outputs "Hello World!" This is how you do it with a Bash script.

<file bash hello_world.sh>
#!/bin/bash

echo "Hello World!"
</file>

N.B. Note the file has the extension .sh . This is a convention to say it is a shell script. It's not absolutely necessary but it's useful to remind you of what it is. If I'm writing a quick script that I might only use once or twice, I always use this extension so I can quickly figure out what it is later. If I write a script that I will be using a lot, just like a regular program, then I won't use it, but I make sure that these are stored in a special place on my file system that's used for storing programs (see Running the Script below).


Before You Begin

Before we can run the script there is one more thing we need to do: set the right file permissions so it can run, specifically the "executable" bit. You do this with the [chmod] (short for CHange MODe) command. There are various ways of setting the permissions (see the man page) but at the very least you need to make sure that you, the owner, have "executable" permissions:

  chmod  u+x hello_world.sh

If you want everyone on your system to be able to run it you can use

  chmod a+x hello_world.sh

or, as a shorthand, just

  chmod +x hello_world.sh
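As a quick check of the above, here is a minimal sketch that builds a throwaway copy of the script, sets the executable bit for the owner and runs it. The scratch directory and the DEMO_DIR and OUTPUT names are assumptions made for this example; they are not part of the lesson's files.

```shell
#!/bin/bash
# Sketch: make a script executable and run it.
# DEMO_DIR is a hypothetical scratch directory created just for this example.
DEMO_DIR=$(mktemp -d)

# write a two-line script into the scratch directory
printf '%s\n' '#!/bin/bash' 'echo "Hello World!"' > "$DEMO_DIR/hello_world.sh"

chmod u+x "$DEMO_DIR/hello_world.sh"   # give the owner execute permission
ls -l "$DEMO_DIR/hello_world.sh"       # the permissions column should now show an x

OUTPUT=$("$DEMO_DIR/hello_world.sh")   # run it by giving its path
echo "$OUTPUT"
```

If the chmod line is left out, the last step should fail with a "Permission denied" message instead.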

Running the Script

To run the script you have to make sure you specify the path to the script. If you are in the same directory as the script, you can type

 ./hello_world.sh

in a terminal. Otherwise give the full path, e.g.

 /home/paul/hello_world.sh


It should say:

   Hello World!

If it does, congratulations. Otherwise go back and check the file permissions (look for an x on the left if you do ls -l) and make sure it is all typed correctly.

You can also use the shortcut

 ~/hello_world.sh

~/ is the shorthand for your home directory, /home/[USER], which is /home/paul in my case.

You don't need to specify the path if the script is in a directory that's in your $PATH. You can type echo $PATH to find out what it is. When you type something on the command line the interpreter (e.g. Bash) scans this list and looks for executable files in these directories that match what you have just typed. It runs the **first** one it comes across (so you need to be careful what you name your scripts). Here's my path:

 /usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games

so first my system looks in /usr/local/bin then /usr/bin etc.

If you want everyone on your system to be able to use your script you can put it in /usr/local/bin, as long as you have the right permissions. Then you can run it by just typing its name on the command line.

Another thing to do is create your own bin directory (mkdir ~/bin) and add it to $PATH. If you look in your .bash_profile file (cat ~/.bash_profile) you might see something that looks like this:

 # set PATH so it includes user's private bin if it exists
 if [ -d ~/bin ] ; then
    PATH=~/bin:"${PATH}"
 fi

With the above added, you will now have ~/bin at the front of your path if that directory exists. This doesn't work for non-login shells however, such as terminal windows you open up in a GUI. Add the above code to ~/.bashrc and it will work in this case as well.
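To see the path lookup at work without touching your real ~/bin, here is a small sketch. It assumes a scratch directory (MY_BIN) and a made-up script name (greet_demo); command -v is a standard shell built-in that reports which file the shell would run for a given name.

```shell
#!/bin/bash
# Sketch: how the shell finds a command through $PATH.
# MY_BIN and the script name greet_demo are made up for this example.
MY_BIN=$(mktemp -d)

printf '%s\n' '#!/bin/bash' 'echo "Hello from my own bin"' > "$MY_BIN/greet_demo"
chmod u+x "$MY_BIN/greet_demo"

PATH="$MY_BIN:$PATH"               # put the directory at the front of the path

FOUND_AT=$(command -v greet_demo)  # ask the shell which file it would run
echo "$FOUND_AT"
greet_demo                         # no path needed now
```

The change to PATH here only lasts for the life of the script; the .bash_profile and .bashrc lines above are what make it permanent.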


Comments

//"Good comments don't repeat the code or explain it. They clarify its intent. Comments should explain, at a higher level of abstraction than the code, what you're trying to do."// Steve McConnell , Code Complete

Comments are there to help you understand what is going on in a script or program. They are lines that will be ignored by a computer but can be read by humans. They are especially important if anyone else is going to read your code. Often, if you are dealing with somebody else's code and they have not put any comments in, or just bad ones, it can be quicker to start from scratch than to try to figure out what they were trying to do. In practice you should assume that somebody else will always end up reading yours. Even if you are the only person that ever looks at your code, I can guarantee there will come a point when you come back to some code, look at it and ask yourself, "What the hell was I trying to do here?" From experience I can tell you that this can happen in less than a week. The only ways to counter this are good programming style (clear program structure, good use of white space, etc) and the most important part of it: comment and comment well.

The three common beginner's mistakes when commenting are:

 - Not enough comments, or none at all
 - Too many, i.e. superfluous, comments
 - Comments that just echo the code they are commenting on

To clarify the last point: comments shouldn't just tell you what the code does. The code itself should do this. If you can't look at a snippet of code and see what it does, you should try rewriting it so it is clear. In scripting, unclear code often comes from sticking messy command line hacks together into 'spaghetti code': hard to understand shortcuts or long regular expressions compounded together. Some people like to do the latter to show how clever or obscure they can be, but the point of code is not to boost your ego. The point is to work, and if other people can't understand it, it's not doing its job. Similarly, Bash scripting isn't fast, so there is no point in sacrificing clarity for speed.

Comments should be there to help you understand the structure and intent of the code: what it should be doing, and how. You can also use temporary comments in your code to remind yourself to do something later on.

Comment lines in bash always start with a #. You can put them on a line of their own (above the code) or append them to a line of code if they are short.

Here is the Hello World Script with comments:

<file bash hello_world.sh>
#!/bin/bash
# says hello to the world by outputting a string to <STDOUT>

echo "Hello World!"
</file>

A Note on Output

We used echo above to produce some output. By default it goes to <STDOUT> (short for STandarD OUT; normally this is the terminal you are in when you run a command on the command line), as does the output from any other command. You can also use Bash to output to a file or another program just as you can on the command line.

A Pipe ( | ) redirects output from one command to the input of another.

 cat hello_world.sh | grep echo

> (angle bracket, or "arrow" for short) overwrites a file and >> (double-arrow) appends output to the end.

Here is a script that logs uptime with the date.

<file bash uptime_logger.sh>
#!/bin/bash
# logs uptime prepended by today's date

date +%Y/%m/%d >> log.txt   # uses ISO date format, not US, for easy sorting
uptime >> log.txt
</file>
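The pipe and the two redirection arrows can be tried out safely on a scratch file. This is only a sketch: DEMO_LOG and LINE_COUNT are made-up names, and the file is created with mktemp so nothing of yours gets overwritten.

```shell
#!/bin/bash
# Sketch: overwrite with >, append with >>, and pipe with |.
# DEMO_LOG is a hypothetical scratch file created just for this example.
DEMO_LOG=$(mktemp)

echo "first line"  >  "$DEMO_LOG"       # > replaces whatever the file held
echo "second line" >> "$DEMO_LOG"       # >> adds to the end
echo "third line"  >> "$DEMO_LOG"

LINE_COUNT=$(cat "$DEMO_LOG" | wc -l)   # pipe the file contents into wc to count lines
echo $LINE_COUNT                        # three lines so far

echo "start again" > "$DEMO_LOG"        # overwriting leaves only this one line
cat "$DEMO_LOG"
```

This is why the logger script uses >> rather than >: with a single arrow each run would wipe out the previous entries.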

Extended Output with Echo

echo -n will avoid putting a newline on the end so you can string things together.

echo -e will print 'escaped' characters:

 - \n means newline
 - \r means return
 - \t means tab
 - \v means vertical tab
 - \b means backspace
 - \a means "alert" (beep or flash)
 - \0xx prints out the ASCII character with octal code xx

You can also use hexadecimal codes, e.g. echo -e "\xA9" outputs the © sign on a terminal that uses the Latin-1 (ISO-8859-1) character set, see [[1]].

Variables
What is a Variable?

You can think of a variable as a container like a box. It can hold things (it can also be empty). While a box in real life can contain objects, a variable contains data. The variable name is like a label on the box. It never changes, while the contents of the box or variable might. So for example, there might be a box on my desk labeled 'Inbox' that has a gas bill in it. We could have an equivalent variable called $INBOX whose value equals "gas bill" (value equals is another way of saying what data it contains).

Just like the contents of a box, the contents of a variable can change. This means we can keep track of something that changes over time and refer to it by the same name. We can also use a variable to define the contents once and then refer to it multiple times using this name. This helps prevent mistakes, and makes it **much** easier to change things at a later date.

For instance, in the uptime_logger.sh script, it would be a good idea to use a variable to hold the name of the log file. That way if we want to change the name at a later date, we would only have to change the name of the file in one place, not in all the places we had used it. For this reason it is also useful to put variables like this right at the beginning of a script.


The contents of a variable in Bash can be an integer (a number), a string (a string of characters such as letters, numbers and punctuation e.g. a name or a sentence) or even another variable.

You don't have to do anything special before you use a variable in Bash. Some programming languages do; these are called 'typed' languages because the type of variable matters. Bash, by contrast, is 'untyped' because it doesn't really matter. This makes it easier to use, but it does allow you to get yourself in trouble on occasion. Also, Bash doesn't really care whether a variable contains a string or a number, it just makes a guess depending on what you are doing with it.
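Here is a small sketch of that "guess". The variable names are made up, and the $(( )) arithmetic syntax is a standard shell feature that this lesson does not otherwise cover; it is only used to show the same value being treated as a number.

```shell
#!/bin/bash
# Sketch: the same variable used as a string and as a number.
# COUNT, AS_STRING and AS_NUMBER are made-up names for this example.
COUNT=5

AS_STRING="${COUNT}0"        # joined to another string: 5 followed by 0
AS_NUMBER=$((COUNT + 10))    # used in arithmetic: 5 plus 10

echo "$AS_STRING"   # outputs 50
echo "$AS_NUMBER"   # outputs 15
```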

N.B. Unlike my Inbox, a variable can only hold one thing at a time. There are other data types that can hold more than one thing that we will learn about in another lesson.

Variable Names

When you refer to a variable in Bash, its name always begins with a dollar sign. A variable name can be pretty much anything you want it to be (letters, numbers and underscores, as long as it doesn't start with a number), but there are good variable names and bad ones. There are also conventions on how to name variables.

N.B. **Variable names can not include spaces**. If you want to join words together in a variable the convention is to use underscores.

 $A_LONG_VARIABLE_NAME
 

Using ALL_CAPS for variable names is another Bash convention. It helps them stand out in a script. It is not always necessary or desirable to do this. When to use them is partly a matter of personal preference but there are a number of factors that can help you decide.

There is a special class of variables that always use ALL_CAPS: external variables that you do not set in the script but inherit when you use bash. These include "environment" variables such as $PATH (use env on the command line to get a list) and bash built-ins such as $HOSTNAME (see this [[http://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html#Bash-Variables|list]]).

When To Use All Caps

 - For clarity, such as when a variable is going to be used with a command and needs to stand out
 - When a variable is going to be preserved, i.e. kept and referred to over and over again
 - When a variable will be unchanged, e.g. if it refers to the name of something
 - When it refers to something outside the script
 - When you might export it (pass it to another script or program)

With the uptime_logger.sh script, it is clear that the log file variable meets most, or all, of these criteria so it would be a good candidate to keep capitalized. We might choose to call it $LOG_FILE for example.

Good and Bad Variable Names

Beginners and bad programmers typically make two common mistakes when it comes to naming a variable: names that are too short, and names that are nonsensical (eg $L and $FRED). A good programmer, on the other hand, does not hesitate to use a long variable name. She aims to make a variable name as specific and as clear as possible, so that the next programmer can tell exactly what it does merely by looking at it. She might choose $UPTIME_LOG_FILE or $DAILY_UPTIME_LOG_FILE so it is clear exactly what it is for, and so the next programmer can easily add a reference to another log file if he chooses to. A good variable name is a useful form of comment in its own right.

Using Variables

Assigning Variables

Assigning a variable means giving a variable its value or contents. We do it like this:

 UPTIME_LOG_FILE="uptime.log"
 

Note the lack of spaces around the equals sign.

N.B. when you assign a variable you do not use a dollar sign. Setting a variable's value (as here, or with a command such as read) is the only time you don't use a dollar sign when working with variables in Bash.
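To show why the spaces matter, here is a hedged sketch. With spaces around the equals sign the shell treats the variable name as a command to run, which fails. The STATUS variable and the use of a subshell with hidden error messages are choices made for this example; $? is explained under Other Special Parameters.

```shell
#!/bin/bash
# Sketch: why there must be no spaces around the equals sign.
UPTIME_LOG_FILE="uptime.log"     # correct: assigns the string

# With spaces, the shell looks for a command called UPTIME_LOG_FILE and
# passes it "=" and "other.log" as arguments. It is run inside ( ) with
# error messages hidden so the failure does not clutter the output.
( UPTIME_LOG_FILE = "other.log" ) 2>/dev/null
STATUS=$?                        # exit value of the failed attempt

echo "$UPTIME_LOG_FILE"          # still uptime.log
echo "$STATUS"                   # non-zero, because no such command was found
```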

Referring to Variables

"Referring to variables" is another way of saying "use variables." It is more technically accurate to say that "I referred to variable $X" than "I used variable $X." Sometimes it is also referred to as "calling a variable." It is done by using them with the dollar sign.

 # outputs the name of the log file
 echo $UPTIME_LOG_FILE

Putting Them Together

<file bash uptime_logger.sh>
#!/bin/bash

UPTIME_LOG_FILE="uptime.log"

date +%Y/%m/%d >> $UPTIME_LOG_FILE   # uses ISO date format not US for easy sorting
uptime >> $UPTIME_LOG_FILE
</file>

Variable Substitution

Variable Substitution means assigning the contents of one variable to another:

 FOO=$BAR

$FOO now has the same contents as $BAR (foo, bar and foobar are the names typically used for example variables; they aren't good names themselves).

More examples

 FOO="cat"
 BAR=$FOO            # $BAR contains "cat"
 echo "$FOO $BAR"    # outputs "cat cat"
 FOO="dog"
 echo "$FOO $BAR"    # outputs "dog cat"
 BAR="rabbit"
 echo "$FOO $BAR"    # outputs "dog rabbit"

Getting Input

The read command is a handy way of getting user input from the command line. Used by itself in the form read FOO it waits for you to type a string, followed by ENTER, then sets $FOO to that string.

<file bash hello_name.sh>
#!/bin/bash
# asks you your name then says hello

echo "What's your name?"
read NAME
echo "Hello $NAME"
</file>

There are a number of handy options to read you might want to use. -p sets a prompt. -n number reads only that many characters without waiting for you to press enter. It is normally used in the form -n1 to read only 1 character for instant feedback. It is often useful to use -s with this; this stops the terminal echoing back what you just typed. You can also use -t seconds to set a time out.

<file bash what_key.sh>
#!/bin/bash
# captures a single key stroke (without return)

read -s -n 1 key_stroke
echo $key_stroke
</file>

 # Ask a question, then wait five seconds for an answer
 read -s -t5 -n1 -p"Do this [y/n]?" ANSWER

 # Do something with $ANSWER here
 # Don't forget to check if they have answered y or n
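One possible way of doing that check is sketched below. It uses a case statement and a small function, neither of which this lesson has covered yet, so treat it as a preview rather than the way to do it. The check_answer name is made up, and $ANSWER is set by hand here to stand in for the value read would have stored.

```shell
#!/bin/bash
# Sketch: checking a y/n answer with a case statement.
# check_answer is a hypothetical helper, not a Bash built-in.
check_answer() {
    case "$1" in
        y|Y) echo "yes" ;;
        n|N) echo "no" ;;
        *)   echo "no answer" ;;   # anything else, including an empty value after a timeout
    esac
}

ANSWER="y"               # stands in for the value set by read
check_answer "$ANSWER"   # outputs yes
check_answer ""          # outputs no answer
```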

You can also look into the zenity command to get fancy GUI input and output using the GTK toolkit (it's not too hard either).

Quotes, Brackets and Other Special Things

Quotes

Double Quotes

Whenever we have used a string so far it has been surrounded by double quotes like this "this is a string." It is not always necessary but it is generally what you want, so it's a good habit to get into.

 FOO=foobar     # this works
 FOO="foobar"   # this also works
 FOO="foo bar"  # this works as well
 FOO=foo bar    # FAIL this doesn't work

Quoting with double quotes has two effects:

1. It allows variable names to be 'interpreted' (or expanded), i.e. the variable name will be replaced by its contents.

 NAME="Paul"
 echo "My name is $NAME" # outputs "My name is Paul"

2. It escapes most special characters (except \ ` and $): it stops them being interpreted, so they are treated literally.

 ls foo*   # outputs "foo.sh foobar foo1 foo2 etc"
 ls "foo*" # outputs "ls: cannot access foo*: No such file or directory"

Quoting a variable also stops echo 'eating' the newlines it contains, which is useful if you want output to appear just as if you typed it on the command line rather than as one continuous line.

Single Quotes

Quoting with single quotes has a different effect. It is very literal: it stops everything being interpreted.

 NAME="Paul"
 echo 'My name is $NAME' # outputs "My name is $NAME"

Escaping Characters

A backslash can be used to 'escape' characters. Inside double quotes this will stop $ (and \) being interpreted, so you can use this to stop variable names being expanded.

 NAME="Paul"
 echo "\$NAME is $NAME" # outputs "$NAME is Paul". Useful for debugging*

It can also be used to output certain special characters, with echo -e and double quotes:

 NAME="Paul"
 echo -e "Hello World! \n" # note the -e. \n = newline
 echo "How are you $NAME"

produces

 Hello World!

 How are you Paul

You can also use $'\X' without the -e:

 echo $'\n' # outputs a new line.

Here is a list of the most useful ones:

^ Character ^ Produces ^
| \n | newline |
| \r | return |
| \t | tab |
| \v | vertical tab |
| \b | backspace |
| \a | alert (beep or flash) |

\0xx outputs an ASCII character (in octal):

 echo -e "\042"    # outputs a quote mark ' " '

This page has a [list] of ASCII codes.

  • Debugging is working out where mistakes are in programs, then getting rid of them. The story goes that insects were attracted to the warmth given off by electrical components in the first computers, causing short circuits and then errors in calculations, so the term used to be meant literally.


Curly Brackets

Curly Brackets {} are used to 'protect' variable names. This allows us to combine variable names with other strings in useful ways.


 UPTIME_LOG_FILE="uptime.log"
 COMPRESSED_UPTIME_LOG_FILE="${UPTIME_LOG_FILE}.tgz" # equivalent to uptime.log.tgz

 LOG_PATH="/var/log"
 echo "${LOG_PATH}/${UPTIME_LOG_FILE}" # outputs /var/log/uptime.log

 # Compress log file (like tar -cvzf /var/log/uptime.log.tgz /var/log/uptime.log)
 tar -cvzf "${LOG_PATH}/${UPTIME_LOG_FILE}.tgz" "${LOG_PATH}/${UPTIME_LOG_FILE}"

There are other [[http://linuxgazette.net/issue55/okopnik.html|neat tricks]] you can do with curly brackets:

 UPTIME_LOG_FILE="uptime.log"

 string_length=${#UPTIME_LOG_FILE} # the number of characters in $UPTIME_LOG_FILE
 echo $string_length # outputs 10

 # starting at offset 2 output the next 4 characters
 echo ${UPTIME_LOG_FILE:2:4} # outputs "time", remember computers start counting at 0


Parentheses

With variables, parentheses are used to capture the output of a command in a variable like this:

 UPTIME=$(uptime) # stores the output of the uptime command in variable $UPTIME

This is also known as command substitution. You can also use it directly on the command line, such as with echo:

 echo $(date) # outputs the time and date

 grep foo $(ls bar*) # search for lines containing the string "foo" in all files that begin with bar

An alternative way of doing this is with back ticks a.k.a. back quotes (UPTIME=`uptime`; echo `date`). While you might see this in other people's code, it is not really a good idea. Back ticks are much easier to miss and it is less clear what is going on. It is mostly there as a historical artifact.
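One practical reason to prefer the parentheses is that they can be placed one inside another without any extra escaping. The sketch below assumes a made-up example path (the file does not need to exist) and uses the standard dirname and basename commands to pick it apart.

```shell
#!/bin/bash
# Sketch: capturing command output, including one substitution inside another.
# LOG_FILE_PATH is a made-up example path; the file does not need to exist.
LOG_FILE_PATH="/var/log/uptime.log"

LOG_DIR=$(dirname "$LOG_FILE_PATH")                      # the directory part
LOG_DIR_NAME=$(basename "$(dirname "$LOG_FILE_PATH")")   # last part of that directory

echo "$LOG_DIR"        # outputs /var/log
echo "$LOG_DIR_NAME"   # outputs log
```

Doing the same with back ticks would mean escaping the inner pair with backslashes, which is exactly the kind of thing that is easy to get wrong.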

Special Variables

You have already come across some special variables: environment variables such as $PATH (use env on the command line to get a list) and bash built-ins such as $HOSTNAME (see this [[http://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html#Bash-Variables|list]]). These are typically things it is useful to have access to in your script without having to go through lots of extra effort to find them out.

 echo $HOSTNAME

 # is roughly equivalent to (on systems that keep the name in /etc/hostname)

 HOST_NAME=$(cat /etc/hostname)
 echo $HOST_NAME


Command Line Arguments

You will already be familiar with command line arguments:

 ls -l /var/log

ls is the command, and -l and /var/log are the command arguments.

In bash these are provided for your use within a script through the special variables $0 $1 $2 etc.

$0 gives you the name (and path) of the command (such as your script) as it was called.

<file bash whatami.sh>
#!/bin/bash
# outputs the name and path of itself when it's run.

echo $0 # outputs ./whatami.sh or /home/paul/bin/whatami.sh etc depending on how you ran it
</file>

$1 $2 $3... give you the first, second and third command line arguments etc.

<file bash hello.sh>
#!/bin/bash

FIRST_NAME=$1
LAST_NAME=$2
echo "Hello $FIRST_NAME $LAST_NAME"
</file>

 ./hello.sh Paul M

would output

 Hello Paul M

There is also:

 - $# which gives you the number of command line arguments
 - $* which gives you all the command line arguments as a single string
 - $@ which works like $* but each argument is treated as a separate string (the difference will become clear later on when we look at loops)

The last two **always** need to be quoted with double quotes ("$*" "$@").

<file bash hello.sh>
#!/bin/bash

echo "Hello $* you have $# words in your name"
# echo "Hello $@ you have $# words in your name" would work in the same way here
</file>
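If you would like a glimpse of the difference before we get to loops, here is a minimal sketch. It assumes a made-up helper function, count_args, that just reports how many arguments it was given, and uses set -- (a standard shell built-in) to pretend the script was called with two arguments.

```shell
#!/bin/bash
# Sketch: "$*" is one string, "$@" keeps each argument separate.
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() {
    echo $#
}

set -- "Paul M" "Smith"         # pretend the script was called with these two arguments

STAR_COUNT=$(count_args "$*")   # passes one argument: "Paul M Smith"
AT_COUNT=$(count_args "$@")     # passes two arguments: "Paul M" and "Smith"

echo "$STAR_COUNT"   # outputs 1
echo "$AT_COUNT"     # outputs 2
```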


Other Special Parameters

$? is really useful in shell scripting. It contains the exit value of the last command (or function) run. If a command is successful it will be equal to 0, otherwise a positive integer (number) whose meaning depends on the command; it is often 1 or 2 if an error occurred. You can use this to see if something worked properly and maybe what went wrong if it didn't.

 date
 echo $? # 0, it worked!

 ls foo  # outputs "ls: cannot access foo: No such file or directory"
 echo $? # 2, an error occurred

 grep foo bar
 echo $? # 0 if it finds the string foo in the file bar
         # 1 if it does not find the string foo in the file bar
         # 2 if an error occurs e.g. file bar does not exist

$! is the Process ID (PID) of the last job run in the background. You could use this to monitor or otherwise control a job you put in the background with & (this leaves a process/command running, but allows you to use other commands in the meantime without waiting for it to finish). You might want to use this for logging purposes.

$$ is the Process ID (PID) of the script itself. This can be used to construct unique file names for temporary files so they don't overwrite each other, or to make them easily identifiable, for example.
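Both of these can be seen in a short sketch. The file name under /tmp and the variable names are made up for the example, and wait is a standard shell built-in, not covered in this lesson, that pauses until the given background job has finished.

```shell
#!/bin/bash
# Sketch: using $$ and $! (file and variable names are made up).
TEMP_FILE="/tmp/uptime_demo.$$"   # $$ makes the name unique to this run of the script
date > "$TEMP_FILE"               # put something in the temporary file

sleep 1 &                         # start a job in the background
BACKGROUND_PID=$!                 # remember its process ID
echo "Script PID: $$, background job PID: $BACKGROUND_PID"

wait "$BACKGROUND_PID"            # pause until the background job has finished
WAIT_STATUS=$?                    # exit value of the background job
echo "$WAIT_STATUS"

rm "$TEMP_FILE"                   # tidy up the temporary file
```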

Putting It All Together

An updated version of the uptime logger. You need to give the name of the log file on the command line and make sure that you have a directory called log in your home directory. With just what we have learned so far we can already produce a useful system tool. If you were to add it to your crontab you could use this to continuously monitor system uptime and load average on your computer.

Crontab entry:

  # m h dom mon dow     command
  */10 * * * *       /home/paul/bin/uptime_logger.sh uptime.log # logs uptime and load average every ten minutes  

The script:

<file bash uptime_logger.sh>
#!/bin/bash
# set the log file name and the directory to store it in

UPTIME_LOG_FILE=$1
UPTIME_LOG_PATH="${HOME}/log"

ISO_DATE=$(date +%Y/%m/%d)

# write hostname, date and uptime to log file in a pretty format

echo -e "${HOSTNAME}: \t $ISO_DATE $(uptime)" >> "${UPTIME_LOG_PATH}/${UPTIME_LOG_FILE}"
</file>

This will result in logfiles that look something like this:

 prairie:         2010/02/24  02:18:35 up 5 days, 15:44,  5 users,  load average: 0.71, 0.99, 0.88
 
As you can see, my computer is in danger of getting overloaded!