bash shell scripting language
- Last Updated: Tuesday, 09 May 2023 22:30
- Published: Friday, 16 November 2018 22:16
Bash: Bash is the most popular shell and is the one enabled by default on most Linux distros.
Bash documentation is on the GNU website (www.gnu.org), as well as on the TLDP website (www.tldp.org).
A few good bash docs from these websites:
https://www.gnu.org/software/bash/manual/bash.pdf
http://tldp.org/LDP/Bash-Beginners-Guide/Bash-Beginners-Guide.pdf
http://tldp.org/LDP/abs/abs-guide.pdf
Bash startup files: When bash starts as a login shell (i.e the first shell you get when you log in), it reads a few startup files before it brings up the shell prompt. It first reads /etc/profile, then looks for ~/.bash_profile, ~/.bash_login and ~/.profile, in that order, and reads only the first one that exists and is readable. When a login shell exits, it reads ~/.bash_logout before exiting. When bash starts as an interactive non-login shell (i.e we open a terminal or type "bash" after the gui has come up), it reads and executes ~/.bashrc (if it exists).
Simple bash script example: create a file test.sh (1st line of script specifies what interpreter to use, 2nd line is regular cmd)
#!/bin/bash
echo "hello";
Run this script by doing "chmod 755 test.sh", and then typing ./test.sh. Running ./test.sh is similar to doing "bash test.sh", as the 1st line of the script says that the bash interpreter is to be used. Just typing "test.sh" on the cmd line won't work, as then bash wouldn't know where test.sh is, so it will start looking in its std paths (which are listed in env var PATH), and if it doesn't find it there, it will complain that this cmd was not found. However, if we provide the full path, then ./ is not needed, i.e typing "/home/ashish/scripts/test.sh" on the cmd line would work, as the shell doesn't need to figure out the path. "./" is needed to tell the shell to run the script in the current dir.
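The 3 ways of running the script described above can be tried out with a small sketch like the one below. It is only an illustration: it assumes mktemp is available and that the temp dir allows executing files, and it uses a throwaway dir instead of a real scripts dir.

```shell
#!/bin/bash
# Sketch only: work in a throwaway dir (assumption: mktemp exists).
demo_dir=$(mktemp -d)

# Create the 2 line script from the text above.
cat > "$demo_dir/test.sh" <<'EOF'
#!/bin/bash
echo "hello";
EOF
chmod 755 "$demo_dir/test.sh"

# 1. Full path given: the shell runs the file directly, no PATH search.
out_fullpath=$("$demo_dir/test.sh")

# 2. File passed to bash as an arg: the execute bit is not even needed.
out_bash=$(bash "$demo_dir/test.sh")

# 3. Bare name: only found once its dir is listed in PATH.
#    Done inside $( ) so the PATH change does not leak out.
out_bare=$(PATH="$PATH:$demo_dir"; test.sh)

echo "$out_fullpath / $out_bash / $out_bare"
rm -rf "$demo_dir"
```

All 3 variables should end up holding the same text, since the same file is run each time; only the way the shell locates it differs.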
bash itself has many options that can be provided on the cmd line to control its behaviour.
ex: bash --dump-strings --rcfile file1.rc --version --verbose -ir -D -o option => options may be single char or multi char. Single char options take -, while multi char options take --. Options with -- need to be provided before options with -, else the -- options won't be recognized.
Bash syntax:
Characters (alphanumeric and special char on keyboard) are used to form words. These words form the syntax of any language. Usually some words formed from alphabetic char (a-z, A-Z) are reserved by the language; these are called reserved keywords (such as for, if, etc). Other words using alphabetic or alphanumeric (a-z, A-Z, 0-9) char are used to form variables. The remaining special characters on the keyboard (;, +, &, etc) are used by the language to do special tasks. In most languages, space is used to separate out different words (i.e break a line into tokens). These reserved keywords or cmds (i.e if, for, ls, echo, etc), variables (strings, etc) and special characters (operators, etc) are the building blocks of any programming language. Bash is no different.
We'll look at reserved keywords/cmds, variables and special char. On a single line, different tokens are separated by whitespace (space or tab). Special characters such as ;, &, |, etc are identified as tokens even if there's no whitespace around them. However it's usually safe to have spaces around special char too (the = sign of an assignment is an exception, as explained below). Each line in bash is a cmd followed by args for that cmd. Cmds are separated from each other by newline (enter key), semicolon (;) or control characters (explained later).
A. Commands:
In bash, keywords are basically cmds (simple or compound). These cmds may be of 2 types:
1. simple cmd: A simple cmd is just a sequence of words separated by blanks, and terminated by one of the control characters explained below. 1st word is the actual cmd, with the rest of the words being the cmd's args. Any control character or newline (enter key) ends the cmd. Simple cmds are those unix cmds that are explained in another section. These cmds are in reality programs that were written in the unix world by various users to help in carrying out basic tasks. However, their use became widespread, so a lot of these cmds became std with standardized options, and are supported by all linux distros. Simple cmds themselves are of 2 types, depending on whether they are part of the shell, or an external pgm being called:
A. shell built-in cmds: These are not run by the kernel as a separate pgm, but are rather executed by the shell itself. So, these are fast. ex: cd, exec, pushd, bg, fg, etc. Different shells have different built-in cmds.
Bash in built cmds: http://www.gnu.org/software/bash/manual/html_node/Bash-Builtins.html
Most bash builtin cmds that take options preceded by - also accept -- to signify the end of options for that cmd. These shell builtin cmds are explained later. A few ex:
- alias: alias allows a string to be substituted for a word when it is used as the first word of a simple command. Aliases are usually put in initialization files like ~/.bashrc so that the shell sees all the aliases (instead of typing them each time or sourcing the alias file). One caveat to note is that aliases are not inherited by child processes, so any bash script that you run will not see these aliases unless you define them again in your script.
- ex: alias e='emacs' => Now we can type "e" on cmd line and it will get replaced by emacs and so emacs will open.
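The caveat about child processes can be checked with a short sketch like the one below. It assumes that no startup file (for ex one named in BASH_ENV) defines the same alias for the child shell; the variable name child_has_alias is just for illustration.

```shell
#!/bin/bash
# Define the alias from the ex above in the current shell.
alias e='emacs'

# The current shell knows the alias: this prints its definition.
alias e

# A child bash process starts with no aliases, so asking it about "e" fails.
if bash -c 'alias e' >/dev/null 2>&1; then
    child_has_alias=yes
else
    child_has_alias=no
fi
echo "child shell knows alias e: $child_has_alias"
```

Note that the sketch only queries the alias and never runs it, so emacs does not need to be installed.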
B. external cmds or programs: These external cmds are not built in. Their binary pgm needs to be loaded and run by the kernel, just like any other pgm that we run. Here, the shell passes control to the kernel, and once the pgm is done, the kernel transfers control back to the shell. ex: grep, find, etc. Some cmds that could easily be external, i.e echo, are actually built in for efficiency reasons (as echo gets called very frequently).
Remember, any cmd you type on a shell (i.e ls *, pwd, grep "my" file.txt, etc) is interpreted by that shell. So, all syntax rules that apply to shells in general also apply to shell scripts. We can type shell cmds on the prompt (known as an interactive shell session) or we can put them in a file and run that file (known as shell scripting). Depending on which shell you are typing the cmds in, they may generate slightly different outputs. Unix cmds are nothing but programs written to support some functionality. These unix cmds may be built into the shell, so that the shell does not need to call a separate program for that cmd. That speeds up the execution of the cmd. Any linux cmd that is not built into the shell will be called just like any other program (like calling pgm "emacs"). To keep the interface consistent, a lot of UNIX cmds that have been in use for a long time (i.e ls, grep, pwd, etc) have been standardized (for ex by POSIX) and can be called from any shell. They have a defined consistent syntax, which makes it easier to learn these unix cmds once, and then use them anywhere (since the syntax remains the same). We'll talk about unix cmds in a separate section.
2. complex/compound cmd: These cmds are shell pgm constructs. Apart from the above simple cmds, the shell supports a lot of other keywords, similar to programming language statements such as if, for, etc. These keywords are called complex or compound cmds. These make shells more powerful, as simple unix cmds can be made conditional, and complicated scripts are possible. We'll look at the syntax of these cmds later.
B. Variables:
Apart from these simple/compound cmds, we can have variables, whose names are a combination of alphanumeric characters and _ (i.e myname is set as "jk_679"). Exactly which characters can be used as part of a variable name depends on the shell syntax. Variables are basically names of memory locations which store a value, which may be any combination of char. In bash, the variable name itself may only contain letters, numbers and _. A var which has not been assigned any value is unset; in a plain $var expansion it gives an empty string, same as a var set to the null string (bash tells the 2 apart only in special expansions such as ${var-default}). Vars are assigned using the "=" sign. There are 2 kinds of var: global and local.
1. local var: var only available in the current shell. Can contain letters, numbers and _, but can't start with a number. Usually given a lowercase name. "set" cmd prints all var (local and global). NOTE: no space can be provided on either side of the = sign, as then the shell will treat the LHS as a keyword/cmd by itself, and the = and RHS as args of that cmd. If no such keyword/cmd exists, it will error out with "cmd not found". This problem exists b/c no special cmd is used for assignment (unlike "set" in csh).
ex: libs="123_2"; => NOTE: no space around the = sign. This whole line is treated as a single cmd with no args, as there is no space in this line. The semicolon signals the end of the cmd.
ex: me_12=a/b/c?*; => all of these special char are treated as literals and are part of the RHS, as the parser is looking for a space or ; to mark the end of the RHS char stream. Single/double quoting may be preferred here to remove any ambiguity (i.e me_12='a/b/c?*'; explained later). However the name me_12 itself can't contain any of these special char (i.e my_?12=... is invalid: as soon as the parser sees ?, it doesn't consider my_?12 a var anymore, so it treats = as a literal and the whole thing is seen as 1 big cmd).
export: Vars created above are local, as any child process or subshell started as a separate pgm won't be able to see them. To make a var global (accessible to child shells and other child processes), we use the "export" builtin cmd. Local vars created within a script are local to that script only, while global vars can also be accessed by other scripts run from within the same shell.
export var1="12a/c" => Now, var1 can be seen by child processes. We can do the assignment and export on 2 separate lines too (i.e var1="123"; export var1;)
2. global var: These are also called environment variables (or reserved variables), and many of them are available across other shells (but some are bash specific such as BASHOPTS, BASH_ARGC, etc). Usually given an all uppercase name. "env" or "printenv" cmd prints all env var. A few ex of env var are PATH, HOME, SHELL, PWD, PS1, PS2, CC, CFLAGS, LD_LIBRARY_PATH, TERM, USER, USERNAME, etc. Some are readonly while some can be set. To make a var global in every new shell, we should put the "export" cmd above in a file that gets sourced at startup, such as ~/.bashrc.
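The difference between a plain var and an exported var can be seen with a sketch like the one below. It starts a child bash process and asks it which of the 2 vars it can see; the name child_view and the <unset> marker are only for illustration.

```shell
#!/bin/bash
libs="123_2"            # plain shell var: stays in the current shell
export var1="12a/c"     # exported var: copied into the env of child processes

# Single quotes stop the current shell from expanding the vars,
# so it is the child bash that reports what it can see.
child_view=$(bash -c 'echo "libs=${libs:-<unset>} var1=${var1:-<unset>}"')
echo "$child_view"
```

The child reports libs as unset and var1 with its value, which matches the description of export above.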
Accessing variables: $ is used to get the value of a variable assigned previously. ex: abc=10; echo $abc; => here $abc gets the value stored in var abc. It's preferred to put $var in double quotes ("$var") to prevent reinterpretation of special char (everything except $, `, \ is hidden from shell reinterpretation when inside double quotes, as we'll see later). Indirect referencing of a var (similar to pointers in C) is also available via ${!varname}, or in older scripts via eval with \$$varname (not really needed for simple scripts). $, {}, =, etc are special char discussed later.
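A small sketch of why "$var" is preferred over $var, and of indirect referencing with ${!varname}. The variable names files and varname are made up for this ex.

```shell
#!/bin/bash
abc=10
echo "$abc"

files="a   b"                # value with 3 spaces in the middle
unquoted=$(echo $files)      # unquoted: split into 2 words, spaces collapse
quoted=$(echo "$files")      # quoted: value passed on unchanged

varname=abc                  # varname holds the NAME of another var
indirect=${!varname}         # indirect expansion: value of the var named abc

echo "unquoted=[$unquoted] quoted=[$quoted] indirect=$indirect"
```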
Just as we use $ before a user defined var to access its value, we can use $ before special shell reserved vars to access their value (i.e echo $SHELL). We may also modify these values by assigning to them, i.e for changing the PATH var in bash, we may overwrite or append to the PATH env variable.
ex: in /home/ashish/.bash_profile (or .bashrc), we may set the PATH env var. This is a very imp env var, used to find the path of any cmd (when the path to that cmd is not provided). We usually want to add more paths to this env var, so that scripts or cmds that we have in other non std dirs can still be run w/o providing the full path. One way to modify this var is to do as follows in the .bash_profile file:
echo $PATH => shows std path, usually "/usr/local/bin:/usr/bin:/usr/sbin". This means any cmd or script for which the path is not provided is going to be searched for in these std paths. They are searched in the order listed, i.e /usr/local/bin is searched 1st, then /usr/bin and so on, until that script or cmd is found. If the end of the list is reached w/o finding the cmd, the shell gives an error "cmd not found".
PATH=$PATH:/home/ashish/scripts => This appends the new path "/home/ashish/scripts" to the already existing list of paths in the PATH env var, so any script /home/ashish/scripts/mytest.pl can now be run by just calling "mytest.pl", instead of providing the full path "/home/ashish/scripts/mytest.pl". We have to do an "export" cmd also in this file, so that this modified PATH is available to child processes or subshells. Here, we appended to the path instead of overwriting it, as that's safer (the std paths are kept).
echo $PATH => now it shows the modified path var, which has our custom path appended. So, it shows "/usr/local/bin:/usr/bin:/usr/sbin:/home/ashish/scripts"
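If the startup file gets sourced more than once, a plain PATH=$PATH:... line appends the same dir again each time. A minimal sketch of one way to avoid that is below. path_append is a hypothetical helper name (not a bash builtin), and the dir is the ex dir from the text, which does not need to exist for the sketch to run.

```shell
#!/bin/bash
# path_append DIR: append DIR to PATH only if it is not already listed.
# (hypothetical helper, shown only as an illustration)
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: do nothing
        *) PATH="$PATH:$1" ;;        # otherwise append at the end
    esac
}

saved_path=$PATH                      # kept only to show what changed
path_append /home/ashish/scripts
path_append /home/ashish/scripts      # 2nd call is a no-op
export PATH                           # make the new PATH visible to children

echo "$PATH"
```

Wrapping PATH in colons (":$PATH:") lets the same pattern match the dir whether it is the first, middle or last entry.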
prompt: The prompt is the text (typically the dir name followed by a "$" sign) that you see at the start of each cmd line in a terminal. To set the prompt in bash, we overwrite the PS1 var. See bash documentation for prompt.
echo $PS1 => shows [\u@\h \W]\$ => This means that it's showing "[", then username, then @, then hostname, then current working dir followed by "]" and finally a $ sign. So, my prompt in a terminal looks like [ashish@linux-desktop ~]$
below assignment modifies PS1 or bash prompt to something custom.
PS1="\[\e[31m\] \w \[\e[0m\][\!]\[\e[32m\]$ \[\e[0m\]"; => \w =>current working dir, \u =>user name, \h =>host name, \n =>newline, \! => history number. \[ ... \] is used to embed a seq of non-printing char. The \[\e[31m\] sequence sets the text color to red, the \[\e[32m\] sequence sets it to green, and \[\e[0m\] returns to the normal color.
Whenever we run any pgm in a shell, the shell stores the cmd line that was run in reserved vars. These can be accessed within the pgm by using these special reserved vars. They are not available outside of the pgm that was invoked. So, the pgm needs to be such that it can access these shell vars. If the pgm being run is itself a shell pgm, it's very easy to use these vars.
command line arguments: are stored in $0, $1, $2 and so on. $0 is the script name itself, $1 is the first arg following the script name, $2 the 2nd arg, and so on. $# stores the number of args, not counting the script name (i.e if $n is the last arg, then $# is n). NOTE: $argv[1], $argv[2], $#argv and $argv[*] are the csh equivalents of $1, $2, $# and $*. They do not exist in bash, and the 2 ex below are in csh syntax.
ex (csh): ./test.csh name1 name2 => $0=./test.csh, $1=name1, $2=name2
ex (csh): foreach name ($argv[*]); echo $name; end => this prints all args above. The bash equivalent is: for name in "$@"; do echo "$name"; done
Below some ex show how $ is followed by some other special character, or a number to access these special var.
ex: ./command -yes -no /home/username
- $# = 3 (stores num of cmd line args that were passed to pgm)
- $0 = ./command, $1 = -yes etc. (stores value of 1st arg, 2nd arg, etc). These are called positional parameters.
- $* = -yes -no /home/username => all the arguments that were entered on the command line ($1 $2 ...). Unquoted $* gets split again on whitespace, and "$*" joins all args into 1 word, so args that contain spaces get mangled. Prefer "$@".
- $@ = array: {"-yes", "-no", "/home/username"} => when written as "$@", expands to the args as a list, each one kept as its own quoted word
- $$ = expands to the process ID of the shell or script in which it appears
- $? = expands to the exit status of the most recently executed foreground pipeline. This is used very often in bash scripts to check if the cmd on the previous line executed successfully, by using an if-else cond. An exit status of 0 means the cmd executed successfully, while any other number indicates an error.
- $! = expands to the process ID of the most recently executed background cmd.
- !$ = history expansion (works in interactive shells): gets the last arg from the previous cmd.
- ex: less a.log
- grep "me" !$ => this causes !$ to be replaced with a.log, as that was the last arg of the previous cmd above
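The difference between "$@", "$*" and unquoted $* shows up as soon as an arg contains a space. Below is a small sketch; show_args is a hypothetical helper, and "set --" is used to fake the cmd line args so that the sketch can run without being called with real args.

```shell
#!/bin/bash
# show_args (hypothetical helper): report how many args it received.
show_args() {
    echo "count: $#"
    for arg in "$@"; do
        echo "arg: [$arg]"
    done
}

# set -- replaces the positional parameters ($1, $2, ...) of this shell.
set -- -yes "two words" /home/username

n_quoted_at=$(show_args "$@" | grep -c '^arg:')     # each arg kept intact
n_quoted_star=$(show_args "$*" | grep -c '^arg:')   # all args joined into 1
n_unquoted=$(show_args $* | grep -c '^arg:')        # "two words" split in 2

echo "\"\$@\" gives $n_quoted_at, \"\$*\" gives $n_quoted_star, \$* gives $n_unquoted"

# $? holds the exit status of the last cmd. The part after || runs only
# when the cmd on the left fails, so $? there is still that cmd's status.
( exit 3 ) || failed_status=$?
echo "status of failed cmd: $failed_status"
```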
getopts builtin: Parsing cmd line options more gracefully: Above we saw how to retrieve cmd line args using $1, $2, etc. However, if we want to parse options the way linux cmds do, then the above code gets complicated. We would like to specify option flags as -<flag_name> followed by a value as <flag_value>. There's a shell builtin cmd called "getopts" for this, which is specified by POSIX (NOT getopt, which is a separate external pgm whose behaviour differs between systems).
Here's an excellent article on how to do it: https://www.howtogeek.com/778410/how-to-use-getopts-to-parse-linux-shell-script-options/
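As a rough idea of what getopts based parsing looks like, here is a minimal sketch. The flags -v (no value) and -n <name> (takes a value), the function name parse_opts and the variables verbose, name and rest are all made up for this ex; they are not from the article linked above.

```shell
#!/bin/bash
# parse_opts (hypothetical): accepts -v and -n <name>, then plain args.
parse_opts() {
    local OPTIND=1 opt        # reset OPTIND so the function can be reused
    verbose=0
    name=""
    # Leading ":" = silent mode, we print our own error messages.
    # "n:" = option -n needs a value, which getopts puts in OPTARG.
    while getopts ":vn:" opt; do
        case $opt in
            v) verbose=1 ;;
            n) name=$OPTARG ;;
            :) echo "option -$OPTARG needs a value" >&2; return 1 ;;
            \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))     # drop the options that were parsed
    rest=("$@")               # whatever is left are the non-option args
}

parse_opts -v -n ajay file1 file2
echo "verbose=$verbose name=$name rest=${rest[*]}"
```

In a real script the function would normally be called as parse_opts "$@" so that it sees the script's own cmd line.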
Data types: Any pgm language has diff data types that it supports. In bash, the primitive data type is the char string (i.e everything is interpreted as a group of char). However, depending on context, strings are interpreted as numbers (if they contain digits only) for arithmetic operations. Floating point numbers are not natively supported, but external pgms like bc or dc can be used. The derived data type in bash is the array, which is not that commonly used. Bash does provide a "declare" or "typeset" builtin cmd to declare a var as constant, integer or array. NOTE: bash doesn't support a lot of data types, so it's limited in doing complex data manipulations.
1. String: Everything is a char string. It's internally interpreted as a number depending on context, or if declared so via the declare cmd.
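A short sketch of how the same digits behave as a string or as a number depending on context. The variable names are made up for this ex; only bash builtins are used, so bc or dc are not needed here.

```shell
#!/bin/bash
a=5; b=7
joined=$a$b              # string context: the values are concatenated
sum=$((a + b))           # arithmetic context: the values are added
half=$((b / 2))          # integer division only, the fraction is dropped

declare -i n=10          # integer attribute: RHS is evaluated as arithmetic
n=n+5

readonly max_retries=3   # constant: later assignments are rejected
# Try to change it in a subshell, so that a failure can't stop this script.
( max_retries=4 ) 2>/dev/null || readonly_blocked=yes

echo "joined=$joined sum=$sum half=$half n=$n readonly_blocked=$readonly_blocked"
```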
2. constant: ex: readonly TEMP=abc;
3. array: an array contains multiple values. The format is the same as a normal assignment, except that multiple values are given, so we use parentheses ( ). We do not define the size of the array, so any non-negative index number can be used. Associative arrays are also allowed (they have to be declared first with declare -A), which allow the subscript to be any string instead of a number. To dereference the array (i.e get the value of an item), we use $ just like with other var. However, since arrays use [ ] (which is a special char used for some other function explained later), it may get interpreted differently, so we use { } around the array to remove any ambiguity.
ex: my_array=(one two three) => this assigns my_array[0]=one, my_array[1]=two and so on.
ex: echo $my_array[2] => this prints "one[2]". This is because a var name can't have [ .. ], so as soon as [ is encountered, var interpretation stops, so the value of "my_array" is printed, which is the first element (index 0) of this array, i.e it prints ${my_array[0]}. Then it prints "[2]", since for echo it's just literals to print. If we want to print index 2 of this array, we need to use curly braces, i.e echo ${my_array[2]}. Now everything inside { } is interpreted as the var name, so it prints "three".
ex: me_arr[7]=me => this is equally valid syntax. Even though square brackets are not valid in a var name, bash recognizes name[subscript]=value as an assignment to an array element. "echo $me_arr[7]" doesn't print the element, for the same reason as above: $me_arr is expanded first (it means ${me_arr[0]}, which is not set, so it gives nothing), and then the literal [7] is printed. It needs curly braces to work, i.e "echo ${me_arr[7]}"
ex: echo ${my_array[*]} => this displays all values of an array. Instead of *, we could use @ also. $my_array[*] would interpret $my_array and [*] separately, and would print "one[*]" on screen
ex: echo ${#my_array[@]} => prints num of elements in array. In general, ${#VAR} prints num of char in VAR, so ${#my_array} (without [@]) prints only the length of element 0.
ex: declare -A my_map; my_map[name]='ajay'; => associative array. Without declare -A, the subscript "name" is evaluated as an arithmetic expression (0 for an unset var), so my_array[name]='ajay' would just overwrite my_array[0].
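The array ex above can be put together in one sketch. A 4th element is appended here so that the number of elements (4) and the length of the first element (3 char in "one") are visibly different. my_map and its keys are made up for this ex, and declare -A assumes bash version 4 or newer.

```shell
#!/bin/bash
my_array=(one two three)
my_array+=(four)                  # append 1 more element

third=${my_array[2]}              # braces: element at index 2
no_braces="$my_array[2]"          # no braces: element 0 followed by literal [2]
count=${#my_array[@]}             # number of elements
first_len=${#my_array}            # length of element 0 only

declare -A my_map                 # associative array must be declared first
my_map[name]='ajay'
my_map[city]='unknown'            # 2nd key, only for illustration
map_count=${#my_map[@]}

echo "third=$third no_braces=$no_braces count=$count first_len=$first_len"
echo "name=${my_map[name]} map_count=$map_count"
```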
C. Special Characters or metacharacters:
Apart from the alphanumeric characters on the keyboard (that are used for cmds and variables), we have multiple other characters on the keyboard, and some of them have a special meaning. When these special characters are not part of a var name, they can have their own special meaning. These special char are called meta characters.
Printable char: Following are the special char seen on your keyboard (some of these are accessible by using the shift key). We'll study the special meaning of each of these later.
~ ` ! @ # $ % ^ & * ( ) _ - + = { } [ ] | \ : ; " ' < > , . /
Non-printable char: Apart from the above printable char, we also have other non printable keys such as backspace (i.e delete), space, tab, ctrl, shift, alt, esc, enter (return), insert, function and a few others. Many of these keys have ASCII codes associated with them, but for right now we'll consider the 2 most important ones => the space and enter keys.
A. space key: the "space" special character is used to separate out words into "tokens".
B. Enter key: The Enter key or newline key is used to separate out cmds from each other (by signifying the end of the current cmd). ; may also be used to separate out cmds.
NOTE: The way the parser works is that it separates out keywords/cmds at spaces, newlines or other special char. Other special char may be parsed into their own token. Some special char can be placed right next to each other to form an operator of 2 or 3 char (i.e &&, <=, etc), which is then separated out as its own token and considered a special character group.
ASCII:
Most of the keys on the keyboard have an ASCII code associated with them:
Standard ASCII is a 7 bit code, so it defines 128 codes (0 to 127 in decimal). Each code is stored in 1 byte, which leaves room for 256 values (0 to 255). Codes 0 to 127 are the normal characters, while codes 128 to 255 are not part of ASCII itself: they are called extended ASCII codes, are encoding dependent, and so are not the same on all platforms. Depending on the encoding, they hold special char such as pi, beta, square root, etc that are not present on the keyboard.
A std keyboard has a total of 105 keys, of which 53 keys are in the 4 main rows; out of these, caps lock and the 2 shift keys don't have an equiv ascii code. BS, TAB, ENTER don't have 2 ascii codes, as it's the same code with or without shift. This accounts for 47*2+3=97 normal characters. Space, Del and ESC keys are 3 more keys with equiv ascii codes. The remaining 28 ascii codes are from 00 (0x00) to 31 (0x1F) (4 of these: BS, TAB, CR, ESC are already accounted for above).
ASCII codes (shown as decimal, hex):
00 = 0x00 = NUL (null character; in C, the end of a string is always marked by a null character)
ASCII codes for CTRL-A is 0x01, CTRL-B is 0x02 and so on till CTRL-Z is 0x1A (decimal 26)
08 = 0x08 = BS (backspace) => equiv to CTRL-H (^H)
09 = 0x09 = TAB (horizontal tab) => equiv to CTRL-I (^I)
10 = 0x0A = LF (Line feed or NEW LINE \n) = moves the cursor all the way to the left and advances to a new line => equiv to CTRL-J (^J)
13 = 0x0D = CR (carriage return or ENTER) = moves the cursor all the way to the left but doesn't advance to a new line => equiv to CTRL-M (^M).
NOTE: Any file made in windows treats end of line as "CR LF", so it stores ascii values "0x0D 0x0A", but linux files treat end of line as "LF" only, so the ascii code is "0x0A". So, for files imported from windows to linux (files created in windows using notepad, and then opened in linux using vi or emacs), linux sees the extra "0x0D" and the editor shows ^M at the end of each line. Many newer editors (xemacs) are smarter and ignore "0x0D" when it precedes "0x0A". So, be careful when transferring text files b/w Linux and Windows. For plain text it is mostly a nuisance, but a shell script with "CR LF" line endings can fail to run, since the CR becomes part of the last word on each line (for ex the 1st line is then read as /bin/bash^M).
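A sketch of how to check a file for these extra CR char and strip them with std tools (grep and tr). The file here is a temp file created only for the ex, and mktemp is assumed to be available.

```shell
#!/bin/bash
tmp=$(mktemp)

# Write 2 lines with windows style "CR LF" (\r\n) line endings.
printf 'echo one\r\necho two\r\n' > "$tmp"

# Count lines containing a CR char. $'\r' is bash syntax for a literal CR.
cr_before=$(grep -c $'\r' "$tmp" || true)

# tr -d deletes every CR char, leaving unix style "LF" line endings.
tr -d '\r' < "$tmp" > "$tmp.unix"

# grep -c prints 0 but returns non-zero when nothing matches, hence || true.
cr_after=$(grep -c $'\r' "$tmp.unix" || true)

echo "lines with CR before: $cr_before, after: $cr_after"
rm -f "$tmp" "$tmp.unix"
```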
27 = 0x1B = ESC (escape)
32 = 0x20 = SPACE (space key)
48 to 57 = 0x30 to 0x39 = 0 to 9
65 to 90 = 0x41 to 0x5A = A to Z
97 to 122 = 0x61 to 0x7A = a to z
127 = 0x7F = DEL
Special characters from 128 to 255 are encoding dependent and so are different for different OS. On windows, if we have the NUMLOCK key activated, we can press the "ALT" key, type the extended ASCII code in decimal on the numeric pad on the right, and then release the "ALT" key. We'll get the special character printed on screen. So, if we typed 128, we would get the special char for 0x80 (which is C with a cedilla in the DOS code page 437).
128 = 0x80 = special character C with a cedilla (small hook at the bottom).
254 = 0xFE = special character filled in square
255 = 0xFF = special character "nothing" or "blank"
Printable char (0-9, a-z, A-Z, @, ?, etc) are from ASCII code 33 to 126.
A very good doc of these special char is here: http://www.tldp.org/LDP/abs/html/special-chars.html
showkey cmd: To know what ascii code is generated for which key, we can run cmd "showkey -a" which will show ascii codes generated for any key.
a=97, esc=27, enter=13, backspace=127
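showkey needs a terminal and key presses, so it can't be used inside a script. For printable char, the conversion b/w a char and its ascii code can also be done with the printf builtin, as in the sketch below. code_of and char_of are hypothetical helper names.

```shell
#!/bin/bash
# code_of CHAR: print the decimal code of CHAR.
# A leading ' in a numeric printf arg means "numeric value of the next char".
code_of() { printf '%d\n' "'$1"; }

# char_of CODE: print the char for a decimal CODE,
# by converting it to a 3 digit octal escape such as \101.
char_of() { printf "\\$(printf '%03o' "$1")\n"; }

code_a=$(code_of a)
code_A=$(code_of A)
char_65=$(char_of 65)
char_122=$(char_of 122)

echo "a=$code_a A=$code_A 65=$char_65 122=$char_122"
```

The results agree with the table above (a to z are 97 to 122, A to Z are 65 to 90).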
Many combination of keys such as "ctrl key + C key" pressed together, etc are not printable but can be used on cmd line of shell to do various things. This is what we are going to study next.
cmd line editing: Any shell has a cmd line i/f thru which we enter cmds. In bash, editing on this i/f is provided by the Readline library, which is used by several other pgms too. That is why cmd line editing keys are the same across the pgms that use Readline; other shells such as tcsh have their own editor with similar emacs style keys. Cmd line editing is basically using keys to edit typed char or to move the cursor to the desired location on the line. We use simple keys like the delete key, arrow keys etc to edit the cmd line, but we also have combinations of keys available which can do more efficient editing. These are divided into control keys and meta keys. These control or meta characters are not normally used inside a script, though they can be used via their octal or hexadecimal code, i.e ascii code 0x0a is the code for newline or control char C-j. These control/meta char have the same meaning irrespective of whether the caps lock key is turned ON or not, i.e pressing control + small w behaves the same as pressing control + capital W.
control keys: control keys are keys activated by pressing the "ctrl" key. When we press the "control" key and the "c" key (i.e press the ctrl key, and then while keeping it pressed, press the c key), it kills the current cmd. This is the special behaviour invoked by pressing these 2 keys together. Pressing the "control" key and the "k" key is referred to in text as "C-k" (Control k). The control key by itself doesn't have an ascii code, but when pressed in combination with a letter, it generates ascii decimal codes 1 to 26, i.e C-a = 1, while C-z = 26. However, as we can see from the ascii table above, some of the ascii codes within 1 to 26 are already assigned to other keys such as tab, enter, etc. In such a case, the ctrl key with that char serves the same purpose as the key. Ex: the tab key is assigned code=9, but C-i is also assigned code 9. So, tab functionality can also be achieved by pressing C-i. Similarly, the enter key has code=13, but C-m is also assigned code 13. So, the functionality of enter can be achieved by pressing C-m. C-0 to C-9 may also be mapped to ascii codes such as 28, 29, 30, 31 etc, or just be the ascii code of that number itself, implying no special treatment. Control keys show on the terminal o/p as a caret (^). So, C-a shows as ^A and so on. This is also a very common way of writing control keys in books.
We have control characters from C-a to C-z, but we'll look at only a few important ones.
C-b/C-f => move back/forward 1 char
C-a/C-e => Move to start/end of line
C-d => sends EOF (end of input). On an empty cmd line this logs out of the shell, similar to the exit cmd
C-k => Kill the text from the current cursor position to the end of the line. (kills text forward)
C-u => Kill the text from the current cursor position to the start of the line. (kills text backward)
C-w => Kill from the cursor to the previous whitespace. (kills text backward)
C-y => This is used to yank (paste back) the text killed via the above cmds.
job control cmds: A subset of control key cmds allow us to selectively stop or continue execution of various processes. The shell associates a job with each pipeline. It keeps a table of currently executing jobs, which may be listed with the jobs command.
C-c => kills currently running process
C-z => suspends the currently running process and returns control to bash. In the MS-DOS world, C-z is the EOF (END OF FILE) character.
C-s => suspend. This freezes the terminal output (flow control), so nothing you type seems to have any effect. Use C-q to resume. Many times, when you see a terminal not responding on the cmd line, it's because it's suspended (due to someone accidentally pressing C-s, as C-x C-s is the cmd in emacs to save, so if the cursor is not in the emacs window but instead on the terminal, it will freeze the terminal w/o the user knowing it). In such cases, use C-q to see if it restores it.
jobs => lists all jobs running in that shell (NOTE: only jobs under the control of this shell). Many options supported. This shows the jobid in [ .. ].
bg <job_id> => This resumes a suspended job in the background. The result is the same as if the cmd had been started with & at the end (which causes any job to run in the background). If no <job_id> is provided, then the current job is used
fg <job_id> => This resumes a suspended job in the foreground. If no <job_id> is provided, then the current job is used. Typing %<job_id> brings that job to the foreground too.
kill %<job_id> => kills that job. The % is needed, as a plain number is taken as a process ID instead of a job id. Many options supported.
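jobs, bg and fg rely on job control, which is normally only active in interactive shells. Inside a script, the same idea can be sketched using & together with $!, kill and wait on a process ID, as below. The sleep 30 cmd just stands in for any long running job.

```shell
#!/bin/bash
sleep 30 &              # & starts the cmd in the background
bg_pid=$!               # $! holds the process ID of that background cmd

kill "$bg_pid"          # default signal is SIGTERM (signal number 15)

# wait returns the exit status of the process. For a process killed by a
# signal this is 128 + signal number. The part after || runs only when the
# status is non-zero, and $? there is still the status of wait.
wait "$bg_pid" || wait_status=$?

echo "background cmd $bg_pid ended with status $wait_status"
```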
meta keys: Similarly, pressing the "Meta" key and the "k" key is referred to as "M-k" (Meta k). Older unix keyboards used to have a meta key, but windows dominated PCs had an ALT key instead. So, linux started treating the ALT key the same as the Meta key. The Alt key by itself doesn't have an ascii code, but when pressed with a char, it generates a sequence of 2 ascii codes. For ex, when we press M-a, there is no single ascii code for this. Instead, the terminal program receives a character sequence beginning with the escape character (byte 27 or 0x1B, sometimes written \e or ^[) followed by char "a", so ascii code 27 followed by ascii code 97. Since the "meta" or "alt" key generates this "esc" character as the first char, we can achieve the same behaviour by pressing the esc key and then pressing the char. If the Alt key is not present or is inconvenient, many people prefer to use the "Esc" key. However, with the esc key, we press Esc, release it and then press the char. This is different from what we do with the control key or alt key, where we press the key and the char simultaneously. The reason for this is that we need to generate 2 ASCII codes to mimic the meta key behaviour, so we press the esc key, release it and then in quick succession press the char key to generate those 2 ascii codes. Sometimes, it works by pressing the esc key, keeping it pressed and then pressing the char key, but that works by luck, as the ascii codes for the 2 keys get generated right one after the other, even though the esc key was kept pressed all the while. It may not always work, so it's better to use the approach of pressing and releasing the "Esc" key.
The Esc key is printed on screen as "^[", so esc + K = ^[K as shown on screen. Alt + K also shows as ^[K. So M-K is represented as ^[K. The showkey cmd shows ascii decimal codes 27 and 75 being generated for M-K (27 and 107 for lowercase k), irrespective of whether the Alt key or the Esc key is used. Meta keys have a loose convention that they operate on words, while control keys operate on char. A word is defined as a seq of letters and digits, so anything separated by whitespace, /, etc is considered 1 word. Some imp meta keys below:
M-b/M-f => move back/forward 1 word (instead of 1 char as in C-b/C-f).
M-d => Kill from the cursor to the end of the current word, or, if between words, to the end of the next word. (kills text forward)
M-DEL => Kill from the cursor to the start of the current word, or, if between words, to the start of the previous word. (kills text backward)
Other keys: Pressing other keys such as "home", "end", etc generates their own sequence of ascii codes. "home", "end" generate a sequence of 3 ascii codes, "page up", "page down" generate a seq of 4 ascii codes, while some function keys generate a seq of 5 ascii codes. So, their behaviour could be mimicked by pressing the keys corresponding to these ascii codes in quick succession. However, the theme that remains common is that all these key combinations start with the "Esc" code, since that is the way readline identifies that the seq of char coming after "esc" is special and is to be treated as one.
special character usage: Some of these meta or special char take on different meaning depending on context, so it can be confusing. We'll talk about some of the imp meta char below:
1. # => comment. These are used to write comments, and can be at the beginning or end of a line. Anything after a # is ignored by the shell (until the new line). So, if we put "\" (which is continuation of line) at the end of a comment line, it doesn't include the next line as part of the comment (i.e the \ line continuation rule doesn't apply to comments). Another shell, tclsh, differs from bash/csh in this regard, as \ at the end of a comment line causes the comment to continue on the next line.
2. " ' \ => These characters help hide special char from shell.
Hiding special characters from shell by using these 3 char (" ' \) : Though special char above have special meaning, we may force the shell to treat them as literals. The double quotation marks (""), the single quotation marks (''), and the backslash (\) are all used to hide special characters from the shell. Hiding means it preserves the literal value of the character, instead of interpreting it. Each of these methods hides varying degrees of special characters from the shell. These 3 are referred as quoting mechanism.
I. Double Quotes " " : weak quoting
The double quotation marks are the least powerful (weak quoting) of the three methods. When you surround characters with double quotes, all special characters are hidden from the shell, except these 3: $, `(backtick), and \ . The dollar sign and the backticks retain their special meaning within the double quotes. backslash has special rules as to when it retains its special meaning.
The backslash retains its meaning only when followed by dollar (\$), backtick (\`), double quote(\"), backslash(\\) or newline(\newline_at_end_of_line (i.e \ followed by "enter" key), not the newline char \n). Within double quotes, the backslashes are removed from the input stream when followed by one of these characters. Backslashes preceding characters that don't have a special meaning are left unmodified for processing by the shell interpreter (i.e \n is left unmodified). A double quote may be quoted within double quotes by preceding it with a \. Similarly $, ` may be printed as is by preceding with \.
This type of quoting is most useful when you are assigning strings that contain more than one word to a variable. For example, if you wanted to assign the string hello there to the variable greeting, you would type the following command:
ex: greeting="Hello there \" name ' me" => This command would store the string Hello there " name ' me in the variable "greeting" as one word. If you typed this command without the quotes (greeting=Hello there), greeting would not be set in the current shell: bash treats greeting=Hello as a temporary environment assignment for a cmd named "there", and then complains that cmd "there" is not found.
test="This is"; echo $test => prints "This is" on screen. Since $test is not quoted, $ expansion is done for var named test.
echo "$test" => prints "This is" as $test is substituted. However, echo '$test' prints $test on screen and not "This is" as $test is not substituted by its value.
mkdir "test dir" => creates dir named "test dir" (with space in between test and dir, as space loses its special meaning of separating tokens). If we do mkdir test dir, then it creates 2 dir with names test and dir. mkdir 'test dir' works same way as mkdir "test dir".
II. Single Quotes ' ': strong quoting
Single quotes are the most powerful form of quoting. They hide all special characters from the shell (so a superset of double quotes). This is useful if the command that you enter is intended for a program other than the shell. Because the single quotes are the most powerful, you could have written the previous example using single quotes too. Only thing to remember is that single quote may not occur within single quote (even if preceded with \, as nothing is escaped in single quotes), as at that point, the 2nd single quote is identified as ending of quotes (i.e anything within 1st and 2nd single quote is treated as 1 string).
ex: greeting="Hello there $LOGNAME \n" => This would store the string "Hello there " and the value of $LOGNAME into the variable greeting (greeting would be assigned "Hello there Ashish \n"). The LOGNAME variable is a shell variable that contains the username of the person who is logged in to the system.
ex: greeting='Hello there $LOGNAME \n' => the single quotes would hide the dollar sign from the shell and the shell wouldn't know that it was supposed to perform a variable substitution. So, greeting would be assigned "Hello there $LOGNAME \n"
NOTE: special characters such as \n, \t, have been assigned to newline (\n), tab (\t) etc. However, in bash they are not treated as special char with 1 byte ASCII value , but instead as 2 literals. So, above \n is printed as literal, not expanded as newline ascii char. However "\\n" would print \n (as \ is escaped with double quotes), while '\\n' would print \\n (as \ is not escaped with single quotes).
\a=bell(alert), \b=backspace, \n=newline, \r=carriage return, \t=horizontal tab, \xH or \xHH = 8 bit character whose val is 1 to 2 hex digits, \uHHHH = unicode char whose value is 1 to 4 hex digits. These escapes are expanded only by echo -e, by the printf format string, and inside $'...' quoting.
A \n inside ordinary single or double quotes is never converted to a newline, so to store a newline in a var we use printf -v (or $'...' quoting).
ex:
printf -v str "Hello \n"; => stores Hello followed by newline byte into var "str".
echo "$str"; => we need to use double quotes for echo in order to print newline, else newline are converted into space.
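The difference between a literal \n and a real newline byte can be checked by looking at string lengths. This is a minimal sketch; the var names plain, ansi and viaprintf are made up for illustration.

```bash
#!/bin/bash
plain="Hello \n"                 # holds a backslash and the letter n, not a newline
ansi=$'Hello \n'                 # $'...' quoting turns \n into a real newline byte
printf -v viaprintf 'Hello \n'   # printf -v stores the formatted text in the var

# ${#var} gives the length of the string stored in var
echo "${#plain} ${#ansi} ${#viaprintf}"   # 8 7 7
```

The literal version is 1 char longer, since \ and n are stored as 2 separate char.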
III. Backslash \ : NOTE: we refer \ as backslash, as it leans backward (as if resting on a chair). forward slash is / as it leans forward (as if about to fall down to the front). In Linux, all dir paths etc are forward slash (key on bottom of keyboard), while in windows dir paths are separated by backslash (key on top of keyboard).
Using the backslash is the third way of hiding special characters from the shell. Like the single quotation mark method, the backslash hides all special characters from the shell, but it can hide only one character at a time, as opposed to groups of characters. You could rewrite the greeting example using the backslash instead of double quotation marks by using the following command:
ex: greeting=Hello\ There => In this command, the backslash hides the space character from the shell, and the string "Hello There" is assigned to the variable "greeting". bash did not even look at the space, it just saw the escape character, and continued with space just as with any other valid character. Then when it ultimately hit a new line, it completed the cmd, and assigned the whole thing to "greeting".
Backslash quoting is used most often when you want to hide only a single character from the shell. This is usually done when you want to include a special character in a string. For example, if you wanted to store the price of a box of computer disks into a variable named disk_price, you would use the following command:
ex: disk_price=\$5.00 => The backslash in this example would hide the dollar sign from the shell. If the backslash were not there, the shell would try to expand $5 (the 5th positional parameter) and substitute its value. Assuming that $5 is not set, the shell would assign a value of .00 to the disk_price variable. This is because the shell would substitute a value of null for $5 (any undefined parameter expands to null). The disk_price example could also have used single quotes to hide the dollar sign from the shell.
\\, \', \", \? => all of these quoting characters and other special characters are escaped due to \.
If we put \ at end of line in a script, then it escapes newline char. Bash removes \n byte altogether (since \ tells it to hide newline char) and causes continuation of 1st line on 2nd line, w/o any newline in b/w. ex:
a=cde\
efg;
echo a=$a; #prints cdeefg (w/o any space b/w cde and efg). However, putting a space before or after \ breaks this: with "a=cde \" the line becomes "a=cde efg", so bash runs efg as a cmd (with a=cde set only in that cmd's environment) and reports cmd not found; with "a=cde\ " the \ escapes the space instead of the newline, so a is assigned "cde " (with trailing space), the line ends there, and efg is run as a separate cmd.
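The 3 quoting mechanisms above can be compared side by side. This is a minimal sketch; the var "user" is a made-up stand-in for $LOGNAME, so that the o/p doesn't depend on who is logged in.

```bash
#!/bin/bash
user="ashish"                    # hypothetical value, used only for this illustration

weak="Hello there $user \n"      # double quotes: $user is expanded, \n stays as 2 literal char
strong='Hello there $user \n'    # single quotes: nothing is expanded
escaped=Hello\ there\ \$user     # backslash: hides 1 char at a time (space, space, $)

echo "$weak"      # Hello there ashish \n
echo "$strong"    # Hello there $user \n
echo "$escaped"   # Hello there $user
```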
3. End of cmd: A newline character (by pressing enter/return) is used to denote end of 1 cmd line. For multiple cmds on same line, semicolon (;) can be used to separate them. No space is needed around the semicolon, as ; is a metacharacter that the parser always treats as a separate token (echo a;echo b works).
Ex:
> wc temp ; rm temp ; mkdir ana => ; used only when cmds are on same line. Enter separates cmds on separate lines, so no ; needed for each line. Any unix cmd can be used directly in tcsh/bash scripts.
Control operators: Some special characters or combination perform control function (i.e to indicate separation of cmds or end of cmd). These are: |, |&, &, ;, ||, &&, ;;, ;&, ;;&, (, ). We already saw semicolon as a control operator. We'll see others below. Exit status of combination of cmds is the exit status of last cmd executed.
- Pipe cmd (|): Pipe is one of the most used operators in linux, and connects cmds together. Pipe is done using "|" (key on right side of keyboard just above enter key). The o/p of each cmd is connected via pipe (| or |&) to i/p of next cmd. "|" passes cmd's stdout to next cmd's stdin, while "|&" passes cmd's stdout and stderr to next cmd's stdin (it is shorthand for "2>&1 |"). This is a method of chaining commands together. Each cmd in pipeline is executed in its own subshell.
ex: cat *.txt | sort | uniq => merges all .txt files, sorts them, and then deletes duplicate lines. pipe operator keeps on passing o/p of 1 cmd as i/p to next cmd.
- List: List is seq of 1 or more pipeline separated by operators (; or & or || or &&), and optionally terminated by one of these (; or & or newline).
- OR/AND (|| &&): Many times, we have a chain of cmds, and we want subsequent cmds to be executed depending on whether previous ones executed successfully. For ex, when running make, a lot of cmds make sense to run only if previous make cmds ran w/o error. In such cases, &&, || come in handy. They are logical AND and OR of 2 booleans which may be true or false. Exit status of these is the exit status of last cmd executed (0=success, non-zero=failure. So 0=TRUE and any non-zero status is FALSE). So, if exit status can be decided by 1st cmd itself, then 2nd cmd is not executed.
- cmd1 && cmd2 => cmd2 is executed only if cmd1 is success (i.e returns exit status of 0. That means first cmd returns TRUE, so result = TRUE && cmd2 = cmd2. So, cmd2 is run).
- cmd1 || cmd2 => cmd2 is executed only if cmd1 is failure (i.e returns exit status of anything non zero. That means first cmd returns FALSE, so result = FALSE || cmd2 = cmd2. i.e cmd2 has to be executed to determine the boolean value of this expression, so, cmd2 is run).
- semicolon (;): This separates diff cmds, and indicates that shell should wait for current cmd to finish, before executing next cmd. Newline serves same purpose as ;
- background (&): shell executes current cmd in background, so that next cmd does not wait for previous cmd to finish, but can start immediately.
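The control operators above can be tried out with the builtins "true" (always exit status 0) and "false" (always exit status 1). This is a minimal sketch; the var names are made up for illustration.

```bash
#!/bin/bash
# cmd2 runs only when cmd1 succeeds (exit status 0)
true  && and_result="ran"
# cmd2 runs only when cmd1 fails (non-zero exit status)
false || or_result="ran"
# short circuit: the assignment after && is skipped because false failed
skipped="no"
false && skipped="yes"

# & starts the cmd in background; "wait" blocks until it has finished
sleep 1 &
wait $!          # $! holds the process id of the last background cmd
bg_status=$?     # exit status of the background cmd

echo "and=$and_result or=$or_result skipped=$skipped bg=$bg_status"
# and=ran or=ran skipped=no bg=0
```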
4. source or . cmd: dot command or period is equiv to builtin "source" cmd. "source" cmd is used to source a file (i.e read and execute the cmds in the file specified, in the current shell rather than in a new one). The "." form is the POSIX one and works in all sh-style shells; "source" is a synonym in bash (csh/tcsh have "source" but not "."). Whether invoked from cmd line or from within a script, the effect is the same: the code in the file is loaded into the current shell (similar to #include of C pgm), so any var or function it defines remains available afterwards.
ex: prompt> source abc.sh => executes cmds in abc.sh in the current shell. The file must contain shell cmds; sourcing a script written in another language (for ex a python file) doesn't work.
ex: prompt> . abc.sh => same as above
5. backquote or backtick (`): This is the key just below ESC key on top left of keyboard (it's NOT the single quote key found with the double quote key on right middle of keyboard). This makes available the o/p of a cmd for assignment to a var or to other cmd (cmd substitution). cmd substitution invokes a subshell and removes trailing newlines if present. cmd may be any cmd that can be typed on an interactive shell. parenthesis () explained later achieves same purpose as backtick. Since backtick retains its special meaning in double quotes, we can always enclose ` within " .. ".
ex: rm `cat file.txt` => here o/p of cmd cat is passed on to rm cmd.
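ex: a="`ls -al`"; echo $a; => lists all the files. " .. " around the backticks doesn't make any difference in the assignment. var "a" stores the o/p as a string, not an array. In csh, this o/p is stored in an array. Since $a is unquoted in echo, the newlines in the listing are printed as spaces; use echo "$a" to keep them.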
ex: echo `pwd`/test => here cmd within backtick is expanded, so it will print something like this: /home/jack/test
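Cmd substitution can be sketched with a cmd whose o/p is known in advance. This uses both the backtick form and the $( ) form that is covered later in this article; the var names are made up for illustration.

```bash
#!/bin/bash
# Both forms capture the stdout of the cmd; trailing newlines are removed.
old_style=`printf 'one\ntwo\n\n'`
new_style=$(printf 'one\ntwo\n\n')

# Unquoted expansion is word-split, so the inner newline shows up as a space.
echo $new_style        # one two
# Quoted expansion keeps the inner newline, so this prints 2 lines.
echo "$new_style"

# $( ) can be nested directly; backticks would need escaping to do this.
parent_name=$(basename "$(dirname /home/jack/test)")
echo "$parent_name"    # jack
```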
6A. user interaction: All languages provide some way of getting i/p from a user and dumping o/p. In bash, we can use these builtin cmds to do this:
Output cmds: echo and printf cmds supported. Both are bash builtins, and both are also specified by POSIX, so other shells provide them too (standalone versions such as /usr/bin/printf exist on most Linux systems).
I. echo: echo built-in command outputs its arguments, separated by spaces and terminated with a newline character. The return status is zero unless a write error occurs. echo takes a couple of options (look in manual)
ex: echo -n abc rad 17 => even w/o single or double quotes, this prints everything, as everything before a newline is considered its arguments. newline is automatically added at end, but not here since -n option used (-n suppresses newline to be added at end). There are many more options supported.
II. printf: This is another builtin cmd. It is not specific to bash (POSIX defines it), though bash adds extensions such as the -v option. This is good replacement for echo as it follows syntax similar to C language.
ex: printf "a=$a b=%d \n" $b => $ is expanded, so $a takes value of var a. %d is similar to C lang, where args outside double quotes are substituted for %d. So, assuming a=1, b=2, it prints "a=1 b=2" with a newline at end (by default printf doesn't add a newline). NOTE: there is no comma outside double quotes before putting the var name (as in C lang)
Input cmds: read is the builtin i/p cmd. read is defined by POSIX, so other sh-style shells have it too, but options such as -a and -p are bash extensions. Bash also has the mapfile (readarray) builtin for reading lines into an array. csh has "$<" for reading i/p.
I. read: the i/p line is read until enter is pressed. There are various options supported. If no var name is given, the line read is stored in var "REPLY"
ex: read; echo "read line is = $REPLY" => If i/p entered is "My name", then var REPLY stores "My name".
ex: read Name Age Address; echo "name= $Name, age = $Age, addr = $Address"; This splits the line into various words. 1st word is assigned to $Name, 2nd word to $Age, and remaining words to $Address. The characters in the value of the IFS variable are used to split the input line into words or tokens; By default $IFS is space, tab or newline, so words are split on space boundary.
ex: read -a my_array => here various words of line are assigned to array = my_array. my_array[0] stores 1st word, my_array[1] stores 2nd word, and so on ...
ex: read -p "what's your name? " name => here, this prints the string on prompt, so that we don't have to do a echo separately. $name stores the i/p line entered after echoing "what's your name? "
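The read examples above can be run w/o a keyboard by feeding the i/p line with a here string (<<<, covered under IO redirection later in this article). This is a minimal sketch; the i/p lines are made-up sample data.

```bash
#!/bin/bash
# 1st word goes to Name, 2nd to Age, everything that remains to Address
read -r Name Age Address <<< "Ravi 30 12 Main Street"
printf 'name=%s age=%d addr=%s\n' "$Name" "$Age" "$Address"

# -a splits the line into an array (bash extension)
read -r -a words <<< "red green blue"
printf 'second word=%s, count=%d\n' "${words[1]}" "${#words[@]}"

# IFS set in front of read applies only to this one read cmd
IFS=: read -r user shell <<< "root:/bin/bash"
printf 'user=%s shell=%s\n' "$user" "$shell"
```

The -r option stops read from treating \ in the i/p as an escape char, which is usually what we want.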
6B. IO redirection: good link here:
http://www.tldp.org/LDP/abs/html/io-redirection.html
redirection means capturing o/p from a file, script, cmd, pgm etc and passing it to another file, script, cmd, pgm, etc. By default, there are always 3 files open: stdin (keyboard), stdout (screen) and stderr (error msg o/p to the screen). Each open file has a numeric file descriptor, so Unix assigns these 3 files, file descriptors of 0, 1 and 2. For opening additional files, there remain descriptors 3 to 9.
The file descriptors or handles for the default files are:
i/p = STDIN or file descriptor 0 (fd0). This is /dev/stdin
o/p = STDOUT or file descriptor 1 (fd1). This is /dev/stdout
error = STDERR or file descriptor 2 (fd2). This is /dev/stderr (by default it goes to the screen, same as stdout)
process file desc for each process are in /proc/<process_id>/fd/0,1,2 (for i/p, o/p, err). These are soft links pointing to whatever file, terminal or pipe is currently open on that descriptor. On Linux, /dev/stdin, /dev/stdout, /dev/stderr are soft links to /proc/self/fd/0,1,2, and /dev/fd is a soft link to /proc/self/fd, so /dev/stdin is equiv to /dev/fd/0, and so on. When we pipe o/p of 1 cmd into i/p of other cmd using pipe (|), the shell creates a pipe and attaches fd 1 of 1st cmd to its write end and fd 0 of 2nd cmd to its read end (so that whatever 1st cmd writes is what 2nd cmd reads).
So, when we run "read" cmd, it takes i/p from STDIN, and when we run printf, it dumps o/p to STDOUT. If we want to change this default behaviour (redirect i/p or o/p to other places instead of these 3 default files), we use redirection operator < (to redirect i/p), > (to redirect o/p). The > operator creates new file (or overwrites existing one), but if we want to append to existing file, then use >>. The redirect operators are >, >>, <, >&, &>. <&-, >&- are used for closing various file descriptors.
output redirection using > or >> => ">" redirects o/p to named file instead of outputting to stdout (i.e screen). > overwrites the file if present, while >> appends to the existing file if present.
ex: ls -al > cmd.txt => So, here it lists the contents of current dir not on stdout (screen), but on file "cmd.txt". Any error from cmd is still directed to STDERR. >& or &> causes both STDOUT and STDERR to be redirected to file cmd.txt. It's preferable to have no space after > or >> (i.e ls >>cmd.txt)
input redirection using < => "<" redirects i/p to be taken from the named file instead of taking i/p from stdin (i.e keyboard).
ex: grep "ajay" <names.txt => So, here it looks for name "ajay" in file names.txt instead of taking i/p from stdin (keyboard). We can have a space after <, but preferable to have no space.
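We can use file descriptor numbers too for redirection, i.e M>file => file descriptor M is redirected to the named file, while M>&N => file descriptor M is redirected to wherever file descriptor N currently points (ex: 2>&1 sends stderr to the same place as stdout)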
ex: ls -al 2> file1 => this redirects fd2 (or STDERR) to file1. N> implies redirect for fd0, fd1 or fd2 depending on N=0,1,2. When N not provided then > implies STDOUT while < implies STDIN.
ex: cmd1 2>error.txt => file desc 2 (i.e stderr) gets redirected to error.txt => error msg from o/p of cmd1 gets redirected to error.txt
ex: cmd2 &>out.txt => &> redirects both stdout and stderr to out.txt
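The redirect operators above can be sketched with a small function that writes 1 line to stdout and 1 line to stderr. The function name "emit" and the file names are made up for illustration, and everything is written in a temp dir so that no existing file is touched.

```bash
#!/bin/bash
tmpdir=$(mktemp -d)

# writes 1 line to stdout and 1 line to stderr
emit() { echo "to stdout"; echo "to stderr" >&2; }

emit >"$tmpdir/out.txt" 2>"$tmpdir/err.txt"   # fd1 and fd2 go to separate files
emit &>"$tmpdir/both.txt"                      # bash shorthand: both go to 1 file
emit >"$tmpdir/both2.txt" 2>&1                 # same thing spelled with fd numbers
echo "again" >>"$tmpdir/out.txt"               # >> appends instead of overwriting

out=$(<"$tmpdir/out.txt")                      # $(<file) is bash's way to read a file w/o cat
err=$(<"$tmpdir/err.txt")
both_lines=$(wc -l <"$tmpdir/both.txt")        # < feeds the file to stdin of wc

echo "$out"; echo "$err"; echo "$both_lines"
rm -r "$tmpdir"
```

Note that in "2>&1" the order matters: it has to come after the > redirection, since it copies wherever fd1 points at that moment.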
here cmd uses << and <<< for redirection. Here's little intro:
HEREDOC (<<): Here documents are used in most shells. This is a form of i/p redirection. Frequently, your script might call on another program or script that requires input. The here document provides a way of instructing the shell to read input from the current source until a line containing only the search string (aka limit string) is found (no trailing or starting blanks). All of the lines read up to that point are then used as the standard input for a command. They are called heredoc probably because the document is here instead of coming externally from some other file.
We can use any char inside the HEREDOC. If search string is w/o any quoting (i.e no " " or ' '), then all text within the heredoc is treated like a double quoted string: parameter substitution, cmd substitution and arithmetic expansion are done. Many times, we use heredoc to generate a script to be used later. In such cases, we want to treat text inside heredoc literally with no substitution/expansion etc done (to print text as is with no modification). We can do this by putting single quotes or double quotes around limit string (or a backslash before it). The kind of quoting doesn't matter: bash only checks whether any part of the limit string is quoted, and if so it turns off all expansion in the heredoc body, so $, `, and \ are taken literally. That is why double quotes work here even though they don't hide $, ` and \ elsewhere. Using HEREDOC is lot better than using bunch of echo/printf to print those in a file, and then redirecting i/p from that file. There may be space after << (doesn't matter)
ex: here NAMES is the search string. Last NAMES should be on a line by itself with no trailing spaces to be identified as end of HEREDOC.
cat << "NAMES"
Roy $name; c=$[$a+$b]; echo $c; => everything is printed literally since "NAMES" used. If we just used NAMES w/o the quotes in start, then $a $b, $c, $name will be substituted and expression will be evaluated.
Bob #next line has NAMES by itself (no spaces) to indicate end of heredoc
NAMES
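The effect of quoting the limit string can be seen by feeding the same heredoc body twice. This is a minimal sketch; the var names are made up, and the o/p of cat is captured with $( ) only so that it can be compared.

```bash
#!/bin/bash
name="Roy"

# Unquoted limit string: $name and $(( )) are expanded inside the body.
expanded=$(cat <<NAMES
user=$name sum=$((2+3))
NAMES
)

# Quoted limit string: the body is passed through untouched.
literal=$(cat <<'NAMES'
user=$name sum=$((2+3))
NAMES
)

echo "$expanded"   # user=Roy sum=5
echo "$literal"    # user=$name sum=$((2+3))
```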
HERE string (<<<): A here string can be considered as a stripped-down form of a here document. It consists of nothing more than COMMAND <<< $WORD, where $WORD is expanded and fed to the stdin of COMMAND.
ex: grep -q "txt" <<< "$VAR" => here i/p to grep is taken from $VAR
ex: String="This is"; read -r -a Words <<< "$String" => reads words from the given string
7. Brackets [ ] , braces { } and parenthesis ( ) : [] and {} are used in pattern matching using glob. All [], {}, () are used in pattern matching in BRE/ERE. See in regular expression section. However, they are used in other ways also: single [], (), {} and double [[ ]], (( )) versions are supported ({{ }} has no special meaning in bash).
I. single ( ) { } [ ]:
( ) { } => these are used to group cmds, to be executed as a single unit. parenthesis (list) causes all cmds in list to be executed in separate subshell, while curly braces { list; } causes them to be executed in same shell. NOTE: parenthesis (list) do not require blank space to be recognized as parenthesis, as () are operators, and hence recognized as a separate token, blank space or not. However, curly braces { list; } has historically been a reserved word (just like for, if, etc), so it needs a blank or other metacharacter, else parser will not recognize it as separate token. Also a ; or newline is required to indicate end of list in { list; }, but not in (list).
subshell: any cmd enclosed within parentheses ( ) is run in a separate shell, similar to the "direct" or "csh/bash/etc" execution of a shell script. Ex:
/home/proj/ > (cd ~ ; pwd; a=5;) => runs cd in a subshell and prints dir name after doing cd. then it returns back to main shell, forgetting all actions it did. a=5 is valid only in subshell, and not remembered outside of subshell
/home/kagrwal
/home/proj/ > pwd => note, pwd prints dir that it was in prior to executing cd cmd.
/home/proj/
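The difference between ( ) and { } grouping can be checked by looking at what survives after the group ends. This is a minimal sketch; the var names are made up for illustration.

```bash
#!/bin/bash
start_dir=$PWD
a=1

# ( ) runs in a subshell: the cd and the assignment are lost afterwards
( cd / ; a=5 )
dir_after_subshell=$PWD
a_after_subshell=$a

# { } runs in the current shell: the assignment is kept
{ a=5; }
a_after_group=$a

echo "subshell: a=$a_after_subshell"   # subshell: a=1
echo "group: a=$a_after_group"         # group: a=5
```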
NOTE: if we run any shell script, the linux kernel invokes a program (the interpreter) to "run the script". It runs script in a separate shell that it creates for the sole purpose of running script. Any actions you do inside script (cd .., etc) are carried out in the new shell that it created. At the end of execution of script, the shell is killed, all actions that the script did in that new shell are gone along with that new shell, and control returns back to original shell that launched that script. If we do not want to spawn a new child shell, we can use "exec" cmd to run our cmd, which will replace current shell with the new cmd.
parentheses ( ) allow cmd substitution too. To save the output of any linux cmd to a var, we can put $ in front of the parentheses, and then use that var. Trailing newline char in the o/p of the cmd are stripped (newlines in the middle are kept).
myvar=$( ls /etc | wc -l ) => returns the line count from this cmd (i.e number of entries in /etc), and assigns the value to var "myvar"
echo $myvar => prints value stored in myvar variable. It can be number, string, etc
$(cmd) or `cmd` both achieve same purpose.
ex: echo $(date); or echo `date`; => both print date as "Mon Jun 24 16:41:55 PDT 2017".
parentheses ( ) also do array init as shown in arrays above.
{ } => Braces { } are also used to unambiguously identify variables. They protect the var within {} as one var. { ..} is optional for simple parameter expansion (i.e $name is actually simplified form of ${name})
ex: var1=abc; path=${var1}_file/txt; echo $path; => This assigns path to abc_file/txt. If we did path=$var1_file/txt, then parser would look for var named var1_file (the name stops at /, which can't be part of a var name). That var doesn't exist, so it expands to nothing, and path is assigned /txt.
{} are also used to substitute parts of var value:
ex: ${STRING/name/bob} => this replaces 1st match of pattern "name" with "bob" in var STRING
ex: ${STRING//maya/bob} => this replaces all matches (see // instead of /) of pattern "maya" with "bob" in var STRING
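The 2 substitution forms above can be compared on a string that contains the pattern twice. This is a minimal sketch with a made-up value for STRING.

```bash
#!/bin/bash
STRING="my name is name"

first=${STRING/name/bob}     # only the 1st match is replaced
all=${STRING//name/bob}      # every match is replaced

echo "$first"    # my bob is name
echo "$all"      # my bob is bob
echo "$STRING"   # my name is name (the var itself is not modified)
```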
We can also assign/change value to a parameter within { } by using =,+,-,?, etc. By putting ":" before the operator, the default is used if parameter is unset or its value is null. By omitting ":", the default is used only if parameter is unset (a parameter that is set to null is left alone). := and :- are more commonly used.
ex: := (new value assigned to parameter only if parameter is unset or null)
IN_STEP=A;
IN_STEP=${IN_STEP:=rtl}; => here IN_STEP would be assigned rtl only if it was unset or its value was null. Since value for IN_STEP is not null, IN_STEP is not assigned new value of rtl, but retains old value of "A"
OUT_STEP=C; echo "Mine= ${OUT_STEP=B}"; => since ":" is omitted, OUT_STEP would be assigned B only if it was unset. Since parameter OUT_STEP exists with value of "C", it is left unchanged. So, "Mine= C" is printed.
ex: :- (here one of the 2 parameters is assigned to the whole thing, depending on if 1st parameter exists or not)
a=${test1:-tmp} => here if "test1" is undefined or null, then "tmp" is substituted, else value of "$test1" is substituted. So a=$test1 or a=tmp. Note, it's not $tmp, but tmp
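The effect of the ":" can be seen by trying the operators on an unset var, a var set to null, and a var with a value. This is a minimal sketch; all var names are made up for illustration.

```bash
#!/bin/bash
unset unset_var
empty_var=""
set_var="A"

# :- substitutes a default w/o assigning it
r1=${unset_var:-tmp}     # tmp
r2=${empty_var:-tmp}     # tmp (with the colon, null counts as missing)
r3=${empty_var-tmp}      # null (w/o the colon, only an unset var gets the default)
r4=${set_var:-tmp}       # A

# := also stores the default back into the var
: "${empty_var:=rtl}"    # ":" is the do-nothing cmd; it is there only to force the expansion
: "${set_var:=rtl}"

echo "r1=$r1 r2=$r2 r3=[$r3] r4=$r4 empty_var=$empty_var set_var=$set_var"
# r1=tmp r2=tmp r3=[] r4=A empty_var=rtl set_var=A
```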
Indirect expansion of parameter within braces is done when it's of form ${!PARAMETER}, i.e 1st char is !. Bash uses the value of "PARAMETER" as the name of another variable; this variable is then expanded and that value is used in the rest of the substitution, rather than the value of "PARAMETER" itself. The forms ${!PREFIX*} and ${!PREFIX@} are an exception: they don't do indirection, but expand to the names of all variables whose names begin with PREFIX.
ex: echo "${!SH*}" => expands to the names of all var starting with SH, such as SHELL, SHELLOPTS and SHLVL. The names "SHELL SHELLOPTS SHLVL" are printed, rather than the values of $SHELL, $SHELLOPTS, $SHLVL.
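The 2 uses of ! inside ${ } can be sketched as below. All var names here (fruit, pointer, demo_one, demo_two) are made up for illustration.

```bash
#!/bin/bash
fruit="apple"
pointer="fruit"

# ${!pointer}: take the value of pointer (fruit) as a var name, and expand that var
indirect=${!pointer}
echo "$indirect"          # apple

# ${!prefix*}: expands to the NAMES of all var starting with the prefix
demo_one=1
demo_two=2
names="${!demo_*}"
echo "$names"             # prints the names demo_one and demo_two, not 1 and 2
```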
{ } can also be used to separate out a block of code. Spaces should be used after {, and the last cmd inside needs a ; (or newline) before }. ex: a=3; { c=95; ....; }; echo $c;
{} are used in glob and RE/ERE too. They are used for expansion of patterns inside them. They are explained more in Regular expression topic. ex: echo a{c,d,e}f => prints acf adf aef. There can't be any unquoted space inside the braces, else they won't be recognized as a brace expansion.
Other uses: ${#parameter}, ${parameter/pattern/string}, and many more.
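Brace expansion, as in the echo a{c,d,e}f example above, can be sketched as below, along with what happens when a space is put inside the braces. The var names are made up for illustration.

```bash
#!/bin/bash
list=$(echo a{c,d,e}f)          # comma separated list
range=$(echo file{1..3}.txt)    # numeric range
spaced=$(echo a{c, d}f)         # space inside the braces: no expansion is done

echo "$list"     # acf adf aef
echo "$range"    # file1.txt file2.txt file3.txt
echo "$spaced"   # a{c, d}f
```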
[ ] => square brackets are used for globbing as explained above, but they are also cmd by themselves (i.e /usr/bin/[ is a binary executable, so putting "[" in script calls the cmd "[" . [ can be builtin cmd or an external cmd). It has the same functionality as the test command (it's actually a synonym for test), except that the last argument needs to be the closing square bracket ]. Also, it needs blank space around it to be identified as a cmd, else parser will recognize it as a globbing char. test is a builtin command and is frequently used as part of a conditional expr. It is used to test file attributes, and perform string and arithmetic comparisons. It has a lot of options, and can take in anywhere from 0 args to 5+ args, which can do a lot of complicated tests. detailed doc for test cmd is here: https://www.computerhope.com/unix/bash/test.htm
ex: num=4; if (test $num -gt 5); then echo "yes"; else echo "no"; fi => This tests if $num is greater than 5. Since $num is smaller than 5, the test fails and "no" is printed. The parentheses around test are optional (they just run test in a subshell).
ex: num=4; if [ $num -gt 5 ]; then echo "yes"; else echo "no"; fi => this is equiv to above as cmd "test" is replaced by [ ... ]. NOTE: there are spaces on both side of [. ] also needs to have space on both sides, but ; instead of space is parsed correctly. However, if we omit the space before ]; then we get error "missing ]".
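A few typical uses of test and [ ] can be sketched as below; the exit status of each test is saved so that it can be printed. The var names are made up for illustration.

```bash
#!/bin/bash
num=4
if [ "$num" -gt 5 ]; then size="big"; else size="small"; fi

# test and [ ] are the same cmd; only the closing ] differs
test "$num" -lt 5; status_test=$?
[ "$num" -lt 5 ];  status_bracket=$?

# string comparison: < has to be escaped so that it is not taken as redirection
[ "Apple" \< "Banana" ]; status_str=$?

# file test: -d is true when the path is an existing dir
[ -d / ]; status_dir=$?

echo "$size $status_test $status_bracket $status_str $status_dir"
# small 0 0 0 0
```

Putting the var in double quotes ("$num") keeps the test valid even when the var is empty or has spaces in it.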
[ ] are used to denote array elements as explained in array section above, and for evaluating integer expressions as explained below.
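arithmetic operators: $[expression] is used to evaluate arithmetic expr => same as $(( ... )). $[ ] is the old form, which bash still accepts but has deprecated, so $(( )) is the one to use in new scripts. command "expr" or "let" can also be used to do arithmetic operations. Syntax similar to C lang are used here: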
- number arithmetic: +, -, *, /, %, **(exponent), id++/id-- (post inc/dec), ++id/--id (pre inc/dec)
- bitwise: &, |, ^(bitwise xor), ~(bitwise negation), <<(left shift), >>(right shift). ~ is also used as expansion to home dir name.
- logical: &&, ||, !(logical negation)
- string comparison: ==(equality), !=(inequality), <, >. These are not arithmetic comparisons but lexicographical (alphabetic) comparisons on strings, based on ASCII numbering. With test or [ ], we need to precede < > with backslash, so that these characters are not interpreted as redirection, but instead as alphabetic comparators. So, test "Apple" \< "Banana" is true, but test "apple" \< "Banana" is false, because all uppercase letters have a lower ASCII number than the lowercase ones. There is no <= or >= for strings. To compare numbers, we use -lt, -gt with the test cmd explained above.
- arithmetic comparison: -lt, -le, -gt, -ge, -eq, -ne => These are used for arith comparison (i.e [ 12 -gt 14 ] returns false, as 12<14 and not >).
- assignment: =(assigns RHS to LHS), *=, /= %= += -= <<= >>= &= ^= |= => these are assignments where RHS is operated on by the operator before =, and then assigned to LHS (i.e a*=b; is same as a=a*b).
- matching: =~ this is a matching operator (similar to perl syntax) where string on RHS is considered ERE, and is matched with string on LHS. It works only inside [[ ]], not in [ ] or test. ex: [[ $line =~ ^a+b ]]
- conditional evaluation: expr ? expr1 : expr2 => similar to C if else stmt
- comma : comma is used as separator b/w expr
ex: c=$[2+3]; => assigns 5 to c.
ex: a=7;b=10; c=$[a*b]; => gives 70. Note: we can use var name or var value interchangeably. i.e c=$[$a*$b]; also gives same result. This is because inside an arithmetic expr, a bare name is taken as a var and replaced by its value, so $ is optional.
ex: expr $a + $b; => this prints 17. Note: spaces have to be provided b/w args of "expr" as its syntax demands that. expr $a+$b doesn't add anything: expr sees the single arg 7+10 and prints it back as a string. expr 5 + 4 will print 9.
ex: a=$(expr 10 \* $a ); echo $a => will print value of 10*$a which is 10*7=70. NOTE: * has to be escaped (or quoted) so that shell doesn't expand it as a glob before expr sees it. The space before the closing ) is optional.
NOTE: a plain assignment doesn't evaluate arithmetic. ex: c=$a*$b just stores the string 7*10, and c=$a * $b is not valid. Arithmetic has to be put inside $(( )) or (( )) (or done via "let" or the external cmd "expr"), and numeric comparison is done with (( )) or "test". csh allows direct arithmetic operations.
ex: i=2; j=`expr $i '*' 5`; => as can be seen here, we can't directly do j=$i*5. We had to use expr along with cmd substitution (j=$((i*5)) does the same w/o starting an external cmd).
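The different ways of doing arithmetic can be put next to each other. This is a minimal sketch using $(( )), which is the standard arithmetic expansion; the var names are made up for illustration.

```bash
#!/bin/bash
a=7; b=10

c=$((a * b))            # var names need no $ inside $(( ))
d=$(( (a + b) % 4 ))    # 17 % 4
e=$((2 ** 5))           # exponent
f=$(expr "$a" + "$b")   # external expr cmd: each operand and operator is a separate arg
g=$a*$b                 # plain assignment does no arithmetic: g is the string 7*10
((a++))                 # C style increment, a becomes 8
let "b += 5"            # let evaluates its arg as arithmetic expr, b becomes 15

echo "c=$c d=$d e=$e f=$f g=$g a=$a b=$b"
# c=70 d=1 e=32 f=17 g=7*10 a=8 b=15
```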
II. double (( )) {{ }} [[ ]]
(( ... )) => double parentheses are used for arithmetic operations. We can't directly add numbers as c=a+b; We need to enclose them as c=$((a+b)) => this evaluates the expr and substitutes the result. spaces are not required around (( )). $((expr)) is the standard (POSIX) form and is preferred over the older $[expr].
ex: c=$((2+3)); => assigns 5 to c. same as c=$[2+3]
ex: ((a++)); => inc var a (c style manipulation of var). $ is not needed infront of (( unless we want to assign the result to some other var
Used as conditional constructs to test expr.
ex: (( $num == 40 ))
also in for loop as shown above.
ex: for ((i=0; i<10; i++)); => This works since any expr in (( .. )) is valid, so ((i=0)) assigns i to 0, ((i<10)) is conditional construct to test, and ((i++)) is arithmetic expr to inc i.
NOTE: $(cmd) and $((expr)) usage. single parentheses are used for cmd evaluation/substitution, while double parentheses for expr evaluation/substitution.
[[ ... ]] => double square bracket are equiv to new "test" cmd, which enables additional functionality, not available with old test cmd or single brackets. This was added later in bash, and is a shell keyword, rather than a pgm or builtin cmd, so we won't find pgm named "[[". It does the same thing (basically) as a single bracket. It's not POSIX compliant, so use single [ .. ] instead of double [[ ... ]] when portability matters. You can think of [[ ... ]] as a superset of [ ... ], where it can do everything that single bracket does, but also a lot more. Also, [[ ... ]] is easier to use as var inside don't need to be quoted and < > don't need escape char (as no word splitting or filename expansion is done in new test). Also, ==, &&, || are supported in new test. In old test or [ ], = is used to test for equality (= can still be used to test for equality in new test also, but == preferred in new test cmd)
ex: var=abcd; if [[ $var == abc ]]; then echo "yes"; else echo "no"; fi => prints "no".
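A small sketch comparing new test and old test. Var names r1..r4 are made up for illustration. Note that inside [[ ]] an unquoted right side of == is treated as a pattern.

```bash
#!/bin/bash
var="abcd"

# Exact string: abcd is not abc
if [[ $var == abc ]]; then r1="yes"; else r1="no"; fi

# Unquoted right side is a pattern: abcd matches abc*
if [[ $var == abc* ]]; then r2="yes"; else r2="no"; fi

# && works inside [[ ]], and $var needs no quotes here
if [[ -n $var && $var != xyz ]]; then r3="yes"; else r3="no"; fi

# Portable form with [ ]: quote the vars and put && between two tests
if [ -n "$var" ] && [ "$var" != "xyz" ]; then r4="yes"; else r4="no"; fi

echo "$r1 $r2 $r3 $r4"   # no yes yes yes
```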
{{ ... }} => double braces are not defined as anything special in bash. So, do not use them. Using ${{SHELL}} gives a "bad substitution" error (${SHELL} is the correct form).
NOTE: $ in front of (), [] or {} has different meanings. $(cmd) causes cmd evaluation and substitution, while $[expr] causes expr evaluation and substitution. $[expr] is same as $((expr)), but is deprecated. ${VAR} is same as $VAR, and is used to remove var name ambiguity.
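A short sketch of the three $ forms side by side. All var names and values here are made up for illustration.

```bash
#!/bin/bash
# $(cmd): run cmd and substitute its output
word=$(echo "hello")

# $((expr)): evaluate arithmetic expr and substitute the result
n=$((6 * 7))

# ${VAR}: same as $VAR, but the braces mark where the var name ends
name="log"
good="${name}_old"    # var "name" followed by the text _old
bad="$name_old"       # looks up a var called name_old, which is not set

echo "word=$word n=$n good=$good bad='$bad'"   # word=hello n=42 good=log_old bad=''
```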
8. pattern matching: glob and RE/ERE are used in a lot of unix cmds for pattern matching. The special characters below are used for that. More details in Regular expression section. NOTE: many of these characters are used for other purposes also (dual/triple purpose, depending on what else is around them, for ex: as shown in the section above, braces, brackets and parentheses are used for eval/substitution etc). So, whenever we use these, we have to be careful that they are interpreted correctly.
* ? [] ! {} => These characters are used as wildcards in pattern matching in glob. curly braces may be used too, depending on your system's settings.
. * ^ $ \ [] {} <> () ? + | [: :] => These characters are used as wildcards in pattern matching in RE/ERE. These are explained in Regular expression topic.
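A minimal sketch of the glob characters in action. It works in a scratch dir made with mktemp, and the file names are made up for illustration.

```bash
#!/bin/bash
# Create a scratch dir with a few files to match against.
tmp=$(mktemp -d)
touch "$tmp/a1.txt" "$tmp/a2.txt" "$tmp/b1.txt" "$tmp/notes.md"
cd "$tmp" || exit 1

star=$(echo *.txt)         # * matches any string:       a1.txt a2.txt b1.txt
qmark=$(echo a?.txt)       # ? matches one char:         a1.txt a2.txt
bracket=$(echo [ab]1.txt)  # [] matches one char of set: a1.txt b1.txt
negate=$(echo [!a]*)       # [!..] negates the set:      b1.txt notes.md
brace=$(echo {a,b}1.txt)   # {} is brace expansion:      a1.txt b1.txt

cd - >/dev/null
rm -rf "$tmp"
printf '%s\n' "$star" "$qmark" "$bracket" "$negate" "$brace"
```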
9. looping constructs: These 3 are used to form loops => until, while, for. "break" and "continue" builtins are used to control loop execution. break exits the loop, not the script. continue skips the remaining stmts in the loop body that come after it and goes on to the next iteration.
- while: while <test-cmds>; do <consequent-cmds>; done => execute <consequent-cmds> as long as <test-cmds> have exit status which is zero (i.e stop when exit status is non-zero, which implies failure of <test-cmds>). "true" may be used as <test-cmds> to run loop infinitely.
- while [ $i -lt 4 ]; do i=$((i+1)); done
- until: until <test-cmds>; do <consequent-cmds>; done => execute <consequent-cmds> as long as <test-cmds> have exit status which is non-zero (i.e stop when exit status is 0, which implies success of <test-cmds>). This is opposite of while, i.e continue loop while <test-cmds> is "false"
- until [ $i -ge 4 ]; do i=$((i+1)); done => equiv to above "while" ex. NOTE: the operator is -ge (there is no -gte).
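A small sketch of while, until, break and continue together. Var names are made up for illustration.

```bash
#!/bin/bash
# while: loop as long as the test succeeds
i=0
while [ "$i" -lt 4 ]; do i=$((i + 1)); done

# until: loop as long as the test fails
j=0
until [ "$j" -ge 4 ]; do j=$((j + 1)); done

# break and continue: add up the odd numbers from 1 to 9
odd_sum=0
k=0
while true; do
    k=$((k + 1))
    if [ "$k" -gt 9 ]; then break; fi             # leave the loop
    if [ $((k % 2)) -eq 0 ]; then continue; fi    # skip even numbers
    odd_sum=$((odd_sum + k))
done

echo "i=$i j=$j odd_sum=$odd_sum"   # i=4 j=4 odd_sum=25
```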
- for: There are 2 formats of for loop:
- for name in <LIST> ... ; do <cmds>; done => name takes values from each item of LIST in each loop. If "in <LIST>" not provided, then "in $@" is used as default (i.e values from cmd line args)
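- for i in `ls`; do cat $i; done => or "for i in $(ls); ...". Don't use "ls -al" here, as its long listing adds permissions, sizes, etc as extra words. File names with spaces also break this form, so "for i in *; ..." is safer.
- for (( expr1; expr2; expr3 )); do <cmds>; done => similar to C pgm style of for loop. There must be no space between the two opening (or the two closing) parentheses.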
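Both for loop formats can be sketched as below. The function count_args and the var names are hypothetical, used only for illustration.

```bash
#!/bin/bash
# Format 1: iterate over a list of words
list=""
for name in apple banana cherry; do
    list="$list$name,"
done

# Format 1 without "in <LIST>": iterates over "$@" (here the function's args)
count_args() {
    local n=0
    for arg; do
        n=$((n + 1))
    done
    echo "$n"
}
nargs=$(count_args one "two words" three)

# Format 2: C style loop
squares=""
for ((i=1; i<=3; i++)); do
    squares="$squares$((i * i)) "
done

echo "list=$list nargs=$nargs squares=$squares"   # list=apple,banana,cherry, nargs=3 squares=1 4 9
```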
10. Conditional constructs: These 3 are used to test for conditions: if-else, case, select. Note the ending keyword for if block is fi (if written backward), for case is esac (case written backward).
- if-else: if <test-cmds>; then <consequent-cmds>; elif <more-test-cmds>; then <more-consequent-cmds>; else <alternate-consequent-cmds>; fi => execute <consequent-cmds> if <test-cmds> have exit status 0, else continue with further test-cmds.
- ex: if [ -a file.txt ]; then echo "file exists"; else echo "doesn't exist"; fi => here test-cmds are put inside [ test-cmd ], as square brackets provide test (explained below). option -e can also be used instead of -a.
- ex: if [ -d $dirname ]; then echo "dir $dirname exists"; fi => This checks for existence of directory with name $dirname. there are many more options supported. Look in bash manual.
- ex: if (( (year % 4) == 0 )) || (( year != 1999 )); then echo "this"; fi => (( ... )) can be used to test expr as explained above. Inside (( ... )) use C style operators (==, !=, <, >=); the -eq/-ne style operators belong to [ ] and are not valid here. We can also use C style ?: within (( ... )) to test.
- ex: if (( var0 = var1<3?a:b )) => This is equiv to => if var1<3 then var0=a else var0=b. The if itself then succeeds when the value assigned to var0 is non-zero. Spaces are optional inside (( ... )), so (( var0 = var1 < 3 ? a : b )) works too.
- if ...; then ...; elif ...; then ...; else ...; fi => when written on one line, a ; (or newline) is needed before every then, elif, else and fi.
- if [ "$T1" = "$T2" ]; then echo expression evaluated as true; else echo expression evaluated as false; fi =>NOTE: = used instead of == as it's within [ ].
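- ex: rm abc*; if [ "$?" -eq 0 ]; then echo "success"; fi => $? holds the exit status of the last cmd (rm here). NOTE: a space is required before the closing ].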
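A minimal sketch of if/elif/else with (( )) tests, the ?: operator, and a test on a cmd's exit status. The function classify and the var names are hypothetical, and the leap year rule is deliberately simplified (it ignores century years).

```bash
#!/bin/bash
# if / elif / else with arithmetic tests
classify() {
    local n=$1
    if (( n < 0 )); then
        echo "negative"
    elif (( n == 0 )); then
        echo "zero"
    else
        echo "positive"
    fi
}
r1=$(classify -5)
r2=$(classify 0)
r3=$(classify 7)

# C style ?: inside (( )); simplified rule, century years not handled
year=2024
(( days = (year % 4 == 0) ? 366 : 365 ))

# "if" can test the exit status of any cmd directly, no $? needed
if printf 'hello\n' | grep -q hello; then status="found"; else status="not found"; fi

echo "$r1 $r2 $r3 days=$days status=$status"   # negative zero positive days=366 status=found
```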
- case: case <word> in pattern1 ) <cmd_list1> ;; pattern2 ) <cmd_list2> ;; ... esac. Each pattern list together with its cmd list is known as a clause. So, we can have multiple clauses, each for a set of matching patterns. Clauses are terminated by ;;, but ;& and ;;& can also be used, which have different meaning. NOTE: pattern list only needs the closing parenthesis; the opening parenthesis is optional. Also double semicolon is used instead of single semicolon for separating clauses. The same effect as case can be achieved with if-else, but the code looks messy if we have too many if-else, so case is preferred in such cases.
- case $animal in
- horse | dog |cat) echo "animal 1";; #1st bracket ( is optional. Also | is used to match multiple patterns, so that if any of them match, this cmd executed
- kangaroo | a[bcd]*) echo "animal2";;
- *) echo "unknown";; #* means default match, as * matches everything.
- esac
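The same clauses as in the ex above, wrapped in a function so that they can be run with different words. The function name classify_animal is hypothetical.

```bash
#!/bin/bash
classify_animal() {
    case $1 in
        horse | dog | cat)  echo "animal 1" ;;   # any of the 3 patterns may match
        kangaroo | a[bcd]*) echo "animal2"  ;;   # glob patterns are allowed
        *)                  echo "unknown"  ;;   # default clause
    esac
}

r1=$(classify_animal dog)
r2=$(classify_animal abacus)   # a, then b, then anything: matches a[bcd]*
r3=$(classify_animal zebra)

echo "$r1 / $r2 / $r3"   # animal 1 / animal2 / unknown
```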
- select: same syntax as for, except that "break" has to be used to get out of select loop. select <name> in <words> ... ; do <cmds>; done . It is very useful for generating user options menu, similar to what you see when a bash script asks you for your choice on screen. More details in pdf manual
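A small sketch of a select menu. The function pick_fruit and the menu words are made up for illustration. select prints the menu and the PS3 prompt on stderr and reads the choice from stdin, so here the answer "2" is fed in with a here-string instead of being typed.

```bash
#!/bin/bash
pick_fruit() {
    local fruit
    PS3="Choose a fruit: "              # prompt shown by select
    select fruit in apple banana cherry; do
        if [ -n "$fruit" ]; then        # valid number was entered
            echo "$fruit"
            break                       # needed to get out of the select loop
        fi
        echo "invalid choice: $REPLY" >&2
    done
}

# Non-interactive run: answer "2", hide the menu printed on stderr
choice=$(pick_fruit <<< "2" 2>/dev/null)
echo "picked: $choice"   # picked: banana
```

When run from a terminal, calling pick_fruit without the here-string shows the numbered menu and waits for input.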
11. null cmd or colon (:) => It's a NOP cmd. It's a shell built-in cmd, and its exit status is always true (or 0). It's used in while loops, if-else, etc when there's no condition or no stmt to be specified.
ex: while :; do ...; done => here : returns true, so equiv to while true; do ...; done. So, it's endless loop.
ex: if condition; then :; else ...; fi => here "then" doesn't have any stmt to execute, so : used.
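ex: : $((n = n + 1)) => Any line in bash is cmd followed by args. Here if we just write "$((n = n + 1))" on a line by itself, the shell substitutes the result (say 1) and then tries to run "1" as a cmd, which fails with "command not found". Putting : before it makes : the cmd and the expansion its arg, so the expr is evaluated for its side effect and the result is discarded. similar ex => : $[ n = n + 1 ] => works the same way (older, deprecated syntax). NOTE: ": $(a=23)" does not set a in the current shell, as $(...) runs its cmd in a subshell.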
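A minimal sketch of the three uses of : described above. The var GREETING and its default value are made up for illustration.

```bash
#!/bin/bash
# : as an always-true loop condition, and as a carrier for an expansion
n=0
while :; do
    : $((n = n + 1))                  # evaluate for the side effect only
    if [ "$n" -ge 3 ]; then break; fi
done

# : as an empty "then" branch
if [ "$n" -eq 3 ]; then :; else echo "unexpected value: $n"; fi

# : with a default-value expansion: set GREETING only if it is unset or empty
unset GREETING
: "${GREETING:=hello}"

echo "n=$n GREETING=$GREETING"   # n=3 GREETING=hello
```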
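12. existence of file/dir: -e/-d => the ex below is written in csh. In bash the same tests go inside [ ], e.g [ -d dir1 ] and [ ! -e file ].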
#!/bin/csh
if (-d dir1) then ... endif
if (!(-e ${AMS_DIR}/net.v)) then ... else ... endif
exit
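For comparison with the csh snippet above, a bash sketch of the same kind of checks. It uses a scratch dir from mktemp, and the file names here are made up for illustration.

```bash
#!/bin/bash
# Scratch dir with one sub dir and one file
tmp=$(mktemp -d)
mkdir "$tmp/dir1"
touch "$tmp/net.v"

# -d: true if the path exists and is a directory
if [ -d "$tmp/dir1" ]; then d="dir exists"; else d="no dir"; fi

# ! -e: true if the path does not exist
if [ ! -e "$tmp/missing.v" ]; then e="file missing"; else e="file exists"; fi

# -f: true if the path exists and is a regular file
if [ -f "$tmp/net.v" ]; then f="regular file"; else f="not a regular file"; fi

rm -rf "$tmp"
echo "$d / $e / $f"   # dir exists / file missing / regular file
```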
Advanced Bash cmds: Read in bash manual for more on this.