Linux/Unix cmds: All shells support the majority of simple cmds used in scripting. Many of these simple cmds came from the unix world, and as such we will refer to them as unix cmds (e.g. ls, cd, mkdir, etc).

The syntax for these unix cmds is something like this:

Syntax: cmd <options prefixed by "-"> <other args for filtering>

  • Options: These unix cmds allow options in the args, which are specified prefixed by a "-" (i.e ls -a => here "a" is an option that specifies how the listing should be done). The purpose of these options is to customize how these cmds behave. We may combine multiple options after just one "-", i.e ls -a -l -S may be written as ls -alS => here we combined 3 separate options into one. The options may be put in any order, so ls -lSa is the same as ls -alS or ls -al -S, etc.
  • Other args: These unix cmds also allow other args such as the name of a file, etc that might be needed for that particular cmd. Not all cmds need these other args. Wildcards (as *, ?, etc) may also be used (in lieu of a full name, etc), which we'll discuss under the "regular expression" section. These wildcards are not part of the cmd syntax, but rather part of the shell. The shell expands these wildcards based on its rules, and then passes the final expanded form to the unix cmd, which is then executed (see the short sketch right after this list). However, some options on the cmd line may dictate how these wildcards should be treated. The POSIX std defines the behaviour of cmds, and all shells try to conform to the POSIX std, so these unix cmds behave consistently across most shells. Also, quoting mechanisms (single quote, double quote, backslash, etc) may hide special characters from being interpreted by the shell, so that they are passed untouched to the unix cmd. Read the man page of a unix cmd on your system (details below).
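
A minimal sketch of this shell-side expansion and quoting (the file names a.txt and b.txt are just hypothetical examples in the current dir):

ls *.txt        # shell expands *.txt to "a.txt b.txt" and passes those two names as args to ls
ls "*.txt"      # quotes hide * from the shell; ls gets the literal arg *.txt and errors out unless a file literally named *.txt exists
echo *.txt      # echo simply prints whatever the shell already expanded: a.txt b.txt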

 

There are more than 200 unix cmds supported by major shells. We'll just discuss a few in the unix cmds section later (the rest are available in "advanced bash manual - part 4", see bash section; note some of these unix cmds may be bash specific or csh specific).

very good website: http://www.grymoire.com

Versions: Unix cmds may be implemented differently on different Linux systems. To make them consistent, POSIX defines a standard syntax for unix cmds. All GNU implementations of these unix cmds follow this standard, by supporting a minimum set of options for each cmd. Since most Linux systems originate from the same Linux kernel and incorporate a lot of basic stuff from gnu.org, they adhere to the GNU/POSIX standard. However, some vendors start supporting extra options on some of these unix cmds, which are not POSIX standard. These keep getting added with time, and sometimes they become POSIX standard later on, and newer versions of these cmds start incorporating them. It's impossible to know offhand what variant of a unix cmd is supported on your particular system, but knowing the flavor of that unix cmd can help you a lot to debug why some linux cmd works fine on one machine but not on another.

For any unix cmd, run it with -V (or --version) on the cmdline. If -V is not supported, run "rpm -q <cmd>" on RHEL type OS. One of these should work on most Linux distros. That will show the version number and other details. It's recommended to always use the GNU cmds (downloaded from gnu.org), as that is what is supported on most Linux systems, and most of the documentation on the web refers to the GNU implementation of these cmds (even though it may not state that explicitly). Some linux distros may implement their own version of some of these unix cmds, which may behave slightly differently, so always check.

Ex: -V: running "grep -V" shows that grep is GNU grep and is version 2.20.

/home/aseth $ grep -V
GNU grep 2.20

Ex: rpm: running "rpm -q grep" shows that grep is version 2.20-3.

/home/aseth $  rpm -q grep
grep-2.20-3.el7.x86_64

If I want to update grep to a later version, I can do that by going to gnu.org and downloading the latest grep. I could even download some other flavor of grep instead of GNU grep (i.e Solaris grep) by going to that vendor's website, but as I said, stick to GNU. Do not worry too much even if you are on an older version of a cmd, as most basic features are supported even on very old versions, and suffice for 99.9% of users.

MAN: man stands for manual, and is used to find all the details of any cmd. For ex, to see what the "ls" cmd does, and all the options it supports, we can type this on the terminal:

  • man ls =>  shows description and all options for "ls" cmd. man can be used for any other utility also such as diff, etc (not just unix cmds)

A kernel doesn't have any user interface; it can only interact with other pgms. Any shell (bash, csh, etc) is a pgm that reads a line of input from you, and then interprets it as one or more commands to manipulate files or run other programs. These cmds may be of 2 types: built in or external. We'll look at both types of cmds:

CMDS: There are tons of Linux/Unix cmds with lots of options, probably more than anyone can learn. However, I'm listing some common ones that will allow you to get your job done. Of these cmds, grep and find are the 2 most powerful and most widely used, and you should look into them. Below is a link to a cheatsheet of the most used linux cmds.

https://phoenixnap.com/kb/linux-commands-cheat-sheet

I'm also listing imp linux cmds separately in the list below:

1. dir/file cmds:

cd: cd <dir> => change dir to specified dir

  • cd  - => changes to previous working dir
  • ~ corresponds to internal variable $HOME (i.e home dir as /home/ashish). cd ~ is same as cd $HOME. Also ~ followed by user name means home dir of that user. i.e ~rohan means /home/rohan.
  • ~+ corresponds to internal variable $PWD or current working dir. Similarly ~- corresponds to internal variable $OLDPWD or previous working dir. 

mkdir/rmdir: mkdir <dir> => makes new dir. rmdir <dir> => removes the named dir.

  • If -p is used then mkdir creates any missing parent dirs and doesn't complain if a dir already exists (-p is usually used for creating nested dirs). ex: mkdir -p dir1/dir2/dir3 => creates nested dirs dir1/dir2/dir3 instead of creating them one by one (so all 3 dirs created by just 1 cmd).
  • -m is used to assign dir permissions. ex: mkdir -m 755 dir1 => applies "chmod 755" to dir1 (instead of running a separate chmod cmd on dir1)

pushd: pushd <dir> (pushd means push dir) => saves the current dir on top of the dir stack and then cd's into <dir>. This is a convenient way to remember the previous dir, and then go back to it later using popd.

popd: popd (popd means pop dir) => This pops the dir from the top of the dir stack, and then cd's into it. pushd and popd are typically used as a pair.

dirs: This prints the dir stack that was formed by doing pushd cmds (see the example below).
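
A minimal sketch of pushd/popd/dirs in action (the dir names /tmp and /var/log are just illustrative):

cd ~
pushd /tmp        # cd to /tmp; dir stack is now: /tmp ~
pushd /var/log    # cd to /var/log; dir stack is now: /var/log /tmp ~
dirs              # prints the stack: /var/log /tmp ~
popd              # pops /var/log off the stack and cd's back to /tmp
popd              # back to ~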

cp: copies files/dirs. syntax: cp [options] src dest.

More details: https://www.tutorialspoint.com/unix_commands/cp.htm

  • copy files: cp src.txt dest.txt => copies the file named src.txt to dest.txt. Even links are followed and the actual files are copied. If you want to just copy the link (i.e preserve the soft link as it is), use -P.
  • copy dir: cp -r src_dir mydir/dest_dir => Here all files from src dir are copied to dest dir, as -r (or -R) means copy recursively. There's no diff b/w -r and -R. Behaviour of cp cmd varies based on whether dest dir exists or not. If the dest_dir doesn’t exist, cp creates it and copies content of src_dir recursively as it is. But if dest_dir exists, then copy of src_dir becomes sub-directory under dest_dir. When -r option is used, all soft links are preserved when copying recursively, i.e links are copied rather than the contents pointed to by the link. This is different behaviour than regular cp cmd without -r option, that copies the contents and not the links. To get same behaviour as regular cp cmd, we have to use -L which dereferences symbolic links. So, use cmd: cp -rL to copy files pointed by the links, when copying recursively.
  • Copy updated files only: cp -u [source] [destination] or cp --update [source] [destination] which will only copy files if the source files are newer than the destination files, or if the destination files do not exist.
  • preserve all attr: This is called "archive" and can be enabled with option "-a". It preserves link, mode, ownership, timestamps and all other attributes.
  • copy hidden files only: If you want to copy hidden files and folders in Linux using the cp command, the first thing people will think of is cp -r .* /dir/ but this will actually match ./ and ../ as well, which will copy all files in the current working directory, and also copy all the files from the parent directory. So to copy only hidden files in Linux, you would want to run cp -r .[a-zA-Z0-9]* /dir/ - this way it will only match names that start with a "." where the next character is a-z, A-Z, or 0-9, with everything after that being a wildcard.

rm: removes files and dirs. To remove a dir recursively, use rm -r <dir_name>. However, sometimes a dir can't get removed, because permissions are not set correctly at all levels of the hier. So, do chmod -R 777 top_dir_name. This sets rwx permissions to 777 in all subdirs recursively. Then use the rm cmd. Note that a dir (even an empty one) won't get deleted by a plain rm <dir_name>; either use rm -r <dir_name>, or for an empty dir just use rmdir <dir_name>.

ls: list all files and dirs in specified dir. If nothing specified, then list for current dir.

  • ls dir1/dir2 => lists all files and dirs in dir1/dir2. We can also put / at the end, it makes no diff. i.e ls dir1/dir2/
  • ls dir1/dir2/* => ls by itself lists all files and dir in given dir, but no contents of subdir within that dir. By using *, we make ls list contents of that subdir too.
  • ls dir1/*my* will list all files and dirs which have the letters "my" in them. But if there's a dir which has "my" in its name, then it will list its contents too. We may not want that. In such a case, we may use option "-d" to list the matching entries themselves without descending into any matching dir (since the contents of most dirs are files, all those will be omitted, making it easier to see). ls -d dir1/*my* => lists only the entries which have "my" in their name. The other way to list such dirs is by using the find cmd with the mindepth and maxdepth options. Check the "find" cmd details later.
  • ls -l -a => This gives long listing of files (i.e with all details as time, permissions, etc). -a causes all files/dir to be shown (including hidden files/dirs whose names start with ".")
  • ls -S -r => -S option causes files to be listed by size (with the largest file at the top by default). -r option causes the listing to be reversed, so the largest file is at the bottom (this is more commonly used). -h causes sizes to be displayed in human readable format (i.e KB, MB, GB, etc).
  • ls -lrt => very commonly used cmd. long lists all files sorted by their timestamp (with latest file at bottom due to -r being used)

readlink: display full file path
ex: readlink -f file1.txt => displays full path of that file (so it's easy to cp and paste)

realpath: expands all symbolic links and resolves references to produce absolute path name:

ex: realpath /home/ash/project/../arm => displays real absolute path = /home/ash/arm


tree: tree -f => displays all dir/files etc in that dir as a tree structure. Easy to see.

less/more: These cmds allow us to read a file from within the terminal itself, w/o opening any other pgm as emacs, vi, etc. less is most widely used, as it can do everything that more does. less is one of the most used cmds for reading a file (NOT writing, as that requires a text editor as vi), so get used to it. The options in less are similar to those in the vi text editor (many key bindings are the same), so it's comfortable to use if you have used the vi editor in the past.

less file1.txt => shows the contents of file1.txt within the terminal, a few lines at a time. Use arrow keys to scroll up/down.

There are many options in less that can customize the file appearance.

less -N -M file1 => -N displays line numbers for each line in file. -M displays filename while the file is open.

Apart from cmd line options above, there are many interactive keystrokes/cmds that we can apply while the file is open. Many of the cmd line options above, work in interactive mode also, so you can apply them anywhere.

-N => When you type "-N", and then hit enter, it will show line numbers for each line in file. Pressing -N again gets rid of the line numbers. If file is very large, it may take a while, before it displays line numbers, so don't assume that the pgm is hung

g / G (shift+g) => hitting small "g" takes you to the beginning of the file while capital G takes you to the end of the file. If you prefix g with a number then it takes you to that line, i.e 234g takes you to line 234.

^G => hitting ctrl + G key displays the file name. Same as -M option.

 

2. exit => to exit from a script. exit with an optional number as arg returns that number as the exit code. So "exit 5" exits with a return code of 5 (non-zero = failure).
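
A tiny sketch of how the exit code shows up in the caller via $? (the script name my_check.sh is hypothetical):

#!/bin/bash
# my_check.sh - does nothing useful, just exits with code 5
exit 5

chmod 755 my_check.sh; ./my_check.sh
echo $?    # prints 5, the exit code of the last cmd run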

3. input/output cmds: To print on screen or get input from the keyboard, bash has multiple cmds. For outputting on screen, there are 2 cmds: echo and printf. To get input from the user, there is the builtin cmd "read". read is specific to bash, but echo and printf are POSIX unix cmds, and supported by most shells. However, the behaviour of echo is inconsistent from shell to shell, so printf is preferred.

echo: echo prints whatever is provided in its args, terminated by a newline (to suppress the automatic newline at the end, use option -n). However, \n, \t etc within the args are printed as is. In order to recognize \n as a "newline", use option -e (i.e echo -e "my \t name"). -e recognizes all backslash escaped chars. ANSI C quoting may also be used to recognize these special chars. ANSI C quoting puts these escape chars in the form $'\n', so that they are expanded by bash as newline, tab, etc.

ex: echo me $'\t' name $'\n' => This prints tab and newline correctly. NOTE: everything until the end of line or ; (whichever comes first) is considered part of echo cmd (unless ; is hidden from shell by putting it inside single or double quotes)

printf: similar to the C style printf function. It prints a string to STDOUT by default, or using option "-v <var_name>" causes the o/p of printf to be assigned to the var named "var_name" instead.
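
A small sketch of printf, both printing to STDOUT and capturing the o/p into a var with -v (the var name "msg" is just illustrative):

printf "pi is roughly %.2f\n" 3.14159        # prints: pi is roughly 3.14
printf -v msg "user=%s id=%d" "ashish" 42    # prints nothing; o/p is stored in var msg
echo "$msg"                                  # prints: user=ashish id=42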

read: It has lot of options to read from keyboard (read year => reads and stores value in var "year") or from file (read -u FD => to read from file descriptor FD)

ex: echo -n "Enter your gender"; read -n 1 gender; if [ $gender == "M" ] ...; => -n num option says read only "num" characters and return, instead of waiting for newline char

4. alias: allows a string to be an alias or substitute name for a simple cmd. unalias is used to remove an alias.

ex: alias ml="ls -al"

To see what alias is being used for any cmd, use "type" cmd. See in section below.

NOTE: aliases are not expanded when the shell is not interactive, so scripts using aliases may break. Using a shell function is always preferred over using an alias (see the sketch below).
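
A small sketch of the same shortcut written both ways (the name "ml" is just illustrative); the function version also works inside scripts and passes along extra args:

alias ml="ls -al"         # fine interactively, but not expanded in non-interactive scripts
ml() { ls -al "$@"; }     # equivalent shell function; "$@" forwards any extra args
ml /tmp                   # runs: ls -al /tmp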

5. history: displays the history of all cmds run in that shell (with line numbers). A lot of options supported. History expansion is provided similar to csh. This helps to run a cmd quickly, without doing copy/paste or using up/down arrow keys.

NOTE: there are settings in .bashrc (for bash shell) that control the number of history cmds to keep in memory (HISTSIZE=100) and the max no. of entries to hold in the .bash_history file (HISTFILESIZE=2000). Many more settings are possible (see the sketch below).
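
A sketch of what such settings typically look like in ~/.bashrc (the values are just examples):

HISTSIZE=100              # no. of cmds kept in memory for the current shell
HISTFILESIZE=2000         # max no. of lines kept in ~/.bash_history across sessions
HISTCONTROL=ignoredups    # optional: don't store consecutive duplicate cmds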

!n => run cmd line n from history. ex: !123 => run cmd number 123 from history of cmds

!-n => run cmd n lines back. So, !-2 means run 2nd last cmd in history.

!! => runs the last cmd. We can also append to this cmd, i.e !! | grep "a" => this runs the last cmd and searches for "a" in the o/p of that cmd. Other ex: !!2 => appends 2 to the last cmd run. So, if the last cmd was "ls prog", then this becomes "ls prog2".

!string => run most recent cmd in history that starts with string <string> ex: !ma => runs last cmd that started with letters ma.

!?string? => run most recent cmd in history that has string <string>in it. ex: !?prog1? => runs last cmd that matches pattern "prog1". i.e if 2 cmds were run "make prog1" and "make prog2", then "make prog2" will be run.

What if we don't want to run the cmd, but instead just display it? For that use :p at the end => that will display the matching cmd w/o running it.

ex: !sudo:p => This will show the most recent cmd starting with the word "sudo" w/o executing it.

ex: history | grep <string> => This will search for all cmds matching the string and display them. Here we used a pipe and another linux cmd "grep" to filter the results.

6. eval => shell built in cmd. If we store a unix cmd in a variable, then to run that cmd reliably we use eval.

ex: CMD="ls -lrt"; eval $CMD; => just typing $CMD happens to work for very simple cmds, but fails once the var contains pipes, redirection, quotes, etc, since those are not re-parsed by the shell; eval forces the whole line to be parsed again.

7. exec: Normally when we run a new pgm, a new process is created. It runs in a new subshell, inheriting env vars from its parent shell. Sometimes, we want to run a new pgm that replaces the current process instead of creating a child. The exec cmd is used for that: it replaces the contents of the current process with the binary image of the new pgm. exec may have 0 or more arguments. If no arguments are provided, then exec is only used to redefine the current shell's file descriptors, and the shell continues after the exec. However, if 1 or more arguments are provided, then the first arg is taken as the cmd, and the remaining args are passed as arguments to that cmd. In this case, if there are cmds in a script after the exec cmd, then those cmds will never get executed; exec never returns (unlike calling other pgms), since the new pgm has replaced the original pgm.

ex: exec wc -l myfile.txt => runs wc cmd passing "-l myfile.txt" as args to wc
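
A minimal sketch of the "no arguments" form of exec, which only redefines the current shell's file descriptors (the log file name run.log is just illustrative):

#!/bin/bash
exec > run.log 2>&1      # from here on, STDOUT and STDERR of this script go to run.log
echo "this line lands in run.log, not on the terminal"
ls /no/such/dir          # the error msg also goes to run.log; the script keeps running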

ex: sometimes we see code where a shell script1 execs a pgm which sources the same original script1. This looks like a recursive infinite loop, but can work well when script1 contains tcl commands. The script myscript.sh below is a bash shell script. Only the 1st 3 lines are treated as bash cmds, the remaining are pt_shell cmds. On running, this script starts running as bash based on the 1st line, ignores the 2nd line which is a comment, and runs the "exec" cmd on the 3rd line. That calls the pgm pt_shell with "source $0" in its args. pt_shell gets called in a new process and sources myscript.sh. However, pt_shell is a pgm that can only source tcl files, so it sees myscript.sh as a tcl file. Any line starting with # is treated as a comment. So, it ignores the first line as a comment. It looks at the 2nd line, sees the continuation on the 3rd line (due to the \ at the end of the 2nd line being treated as a valid continuation of a comment line in tcl), and so considers the 2nd line as a comment (with the 3rd line included in it). Then it moves to the 4th line and starts reading the file normally. So, here we are able to use the same file as both a script and a source file.

#!/bin/sh

# \

  exec pt_shell "source $0"

other tcl cmds ....

8. env: Unix has a way of defining a list of strings, each assigned a value (in the form name=value). This whole set is called the environment. When a shell is invoked, it scans its environment and stores these environment vars to pass to any pgm that is invoked from within the shell. Whenever any pgm is invoked, the shell passes these env vars to the child process. Some vars are already set by default, as PATH, DISPLAY, SHELL, USER, etc. We can also add new vars or modify existing vars. The syntax differs slightly b/w bash and csh (more details in the bash and csh sections).

csh:

setenv X_ROOT /some/specified/path
setenv XDB    ${X_ROOT}/db
setenv PATH   ${X_ROOT}/bin:${PATH} => The PATH env var is unique in the sense that we are adding values to PATH w/o deleting the old ones. To append new values, we use ":".

bash:

X_ROOT=/some/path => This just defines a var X_ROOT with some value. However, it is not an env var yet.
export X_ROOT => the keyword export makes this var an env var, and it's now available to all child processes

export X_ROOT=/some/paths => this cmd is equiv to above 2 cmds combined.

using Linux cmd "env" prints all these env var with their values. There are few other uses of env cmd:

1. env is used to run pgms, instead of running them directly. This is useful in cases where we do not want any aliases, shell functions, shell builtin cmds to replace or alter args of any cmd run from within shell. Since env is an external cmd, it has no knowledge of aliases, and simply passes the pgm and it's args to an "exec" call.

ex: env name=value name2=value2 program and args => here the pgm and its args are preceded by extra env vars whose values are as defined. This runs the cmd "program and args" with an environment formed by extending the current environment with the environment variables and values as shown. This is helpful in cases where we want just this pgm to have these appended env vars, but still leave the env intact for the current shell.

ex: env -i <your cmd> => -i ignores the environment completely when running the cmd. This is generally useful when creating a new environment (without any existing environment variables) for a new shell. ex: env -i /bin/sh => Here a new shell is created with the environment completely cleared out.

ex: env -u VAR1 => This unsets VAR1 environment variable

2. env is used in the first line of scripts. The reason is this => env always searches the PATH for the cmd. So, if we provide the first line of a bash script as #!/bin/bash, it's possible that bash is not in /bin on that system. And #!bash doesn't work either (even though the PATH variable has /bin in it), because the shebang line doesn't search the PATH variable for the bash pgm. To avoid specifying a hardcoded path to bash, we do this:

ex: #!/usr/bin/env bash => Here env, being an external program, searches PATH for the bash pgm, which it finds in (say) /bin/bash, and then runs that pgm "/bin/bash". NOTE: this still requires an absolute path for env, but env is almost always at /usr/bin/env on Linux systems.

NOTE: When Bash invokes an external command, the variable ‘$_’ is set to the full pathname of the command and passed to that command in its environment. 

9. grep: Global Regular Expression Print. Most powerful cmd for searching any pattern in multiple files across a dir hier. This is the most widely used unix cmd. Lots of options are available that you should look into.

detailed usage here: https://www.computerhope.com/unix/ugrep.htm

grep <pattern> file_name => As explained under the regular expression section, the pattern is a regular expression while the filenames are simple glob patterns. By default, grep patterns are BRE, but can be made ERE with -E. To search for any pattern as it is, without interpreting it, use -F (this means searching for /a/b#[$"/ef\f will search for this pattern exactly, w/o interpreting any character as a special character). Using -F is the same as using fgrep (fixed/fast grep), and using -E is the same as egrep (extended grep), but usage of fgrep and egrep is deprecated. -r is used to search recursively, and is the same as using rgrep (-r doesn't search thru soft links, use -R for that). grep returns the names of files that match the given pattern along with the line that matched (separated by :).

grep --color "am(18a)" *.txt => This will search for BRE "am(18a)" as ( ) are not treated as metacharacters (but instead as literals) by default in BRE, so they do not need to be escaped. If we do escape them "am\(18a\)" then they will match "am18a" and anything matching 18a can be recalled later using \1. --color highlights the matched patterns.

grep -E "am\(18a\)" *.txt => This is exactly same as above (w/o the highlight for matched pattern). It will search for ERE "am(18a)" as ( ) are treated as metacharacters (and not as literals) by default in ERE, so they need to be escaped. If we do not escape them "am(18a)" then they will match "am18a" and anything matching 18a can be recalled later using \1.
grep -A 5 -B 4 "anon" *.txt => This will search for "anon" in all files, and then print 5 stmt after and 4 stmt before each matching line.
grep -rl "myname" . => searches for string "myname" in all files of that dir recursively. -l prints file names with a match

grep "name1" */*/*.tcl => This searches for "name1" in all files ending with .tcl extension in dir 2 levels down. If we use recurive option then grep -r "name1" *.tcl doesn't work (It will say doesn't match anything, even though there may be matches in .tcl files).

grep -r "name1" --include="*.tcl" . => With -r option, use option --include="*.tcl" to search only in .tcl files. "." in the end implies search in current dir.

grep -E "^start|^end" *.txt => To find all lines starting with either "start" or "end". Here | is or operator only available in Extended expression, so use -E (though grep "^start\|^end" works too implying | is valid in RE as long as escape char \ is used before it)

zgrep: grep doesn't work on zipped (compressed) files. zgrep is used instead in such cases. Most options remain the same as grep (-r or recursive doesn't work on zgrep).

ex: zgrep "abc" rpt.gz

10. find: find is other very powerful cmd to find any file or subdir in any dir matching a given name. This is very useful where you partially remember name of a file, or you want to find all files with a given file extension, etc. This searches recursively in all subdir starting from a given dir. Lots of options available that you should look into.

detailed usage here: https://www.computerhope.com/unix/ufind.htm

Both the dir path and the name of the file are provided as glob patterns. Quotes are needed when a pattern is provided for the name of the file. Without quotes or some other mechanism to escape the metacharacters, the patterns would be expanded by the shell, which may give incorrect input to the "find" cmd. We want the "find" cmd itself to see the pattern in the file name, so we use single or double quotes.


find /dir1/dir2 -name file1 -type f => Find files named file1 in or below the directory /dir1/dir2. Here the file name "file1" needs to match exactly, i.e file11 won't match. "-type f" means find only files with given name, while "-type d" means find only dir with given name
find /home/docs/ -name "*driver*" => lists all files starting from dir "docs" that have "driver" in part of their name. This will match my_driver, driver.txt, or any such files

find . -iname "*.Php" => . means start searching from current dir. option -iname ignores case, so test.php, test2.PHP, test3.Php all match

find ./test ! -name "*.php" => ./test means start searching from "test" subdir in current dir. ! does invert match, i.e all files which don't end in .php. Instead of !, we could also use "-not" for invert matching

find ./test -name ".*.txt" -not -name "*.php" -o -name ".*.rpt" => Here we have multiple criteria with -name. However -not means not match .php, while -o means or, so match names with .txt or .rpt

find /home/*/project/*/*/code/ -name "*my_test?_??_*.txt" => Here we match in multiple dirs since * is used in the dir path (the dir globs are expanded by the shell). The -name pattern matches my_test1_00_rpt.txt, my_test2_99_.txt, etc.

find /home/docs/ -mindepth 2 -maxdepth 3 -name "*driver*" => The mindepth/maxdepth options are used to limit the search to a certain depth, otherwise find searches all the way down to the leaf dirs (note the start dir must come before the options). Here, it starts matching 2 levels below /home/docs and stops 3 levels below, so in essence it searches only /home/docs/*/*driver* and /home/docs/*/*/*driver*. "-mindepth 1 -maxdepth 1" is used to search in the starting dir only, without going any deeper.

11. cut: cut text from lines. cut can do most of what we do using scripting languages as perl, sed, etc.

cut -d":" -f2,4 file_in > out_file => This cuts the file using delimiter ":". By default delimiter is TAB (not individual space), so if you have TAB separated text, then you can omit -d.  To cut it by a sinngle space, do -d" ". After cutting, it names the columns as f1, f2, f3, etc that are separated by this delimiter. f2,4 says that print out column 2 and column 4. The output is then printed to out_file.


more file1 | cut -d'=' -f1 | grep REG2 => This cuts each line of file "file1", with delimiter =, and names fields as f1,f2 with = as delim. Then it takes field f1 out and prints it. Then grep prints only those lines that have REG2 in them
ex: CONFIG_REG2.RESERVED = new("RESERVED", 26,..) => for this file, cmd above gives CONFIG_REG2.RESERVED

grep and cut cmds may be combined via pipe to edit multiple files.

ex: grep "assign SDA" tests/I2C/*/test.v =>this may return o/p which is something like as shown below (names of files where pattern was found, along with the line where it matched separated by :) then cut gets file names, and then we checkout those files
tests/I2C/dir1/test.v:assign SDA = 1'b0
tests/I2C/dir2/test.v:assign SDA = 1'b1;

ex: cut -d":" -f1 => If we apply cut cmd to above o/p, then we get f1, which is names of files. Then we can do whatever we want with these files.

ex: grep "assign SDA" tests/I2C/*/test.v | cut -d":" -f1 | xargs cleartool co -nc => The 1st part greps "assign SDA" stmt and lists all the files that have it. Then we use ":" as delimiter, and extract name of files. Then "xargs" cmd takes all the names of files as arguments to cmd "cleartool co -nc". So, this causes all files to be checked out which had that "assign SDA" stmt in them.

ex: grep "assign SDA" tests/I2C/*/test.v | cut -d":" -f1 | xargs sed -i 's/assign SDA/assign #10 SDA/' => replaces "assign SDA" with "assign #10 SDA" using sed to all files that have this line. sed cmd explained later.

ex: more ~/z.txt | tr -s " " | cut -d" " -f2 | less => Here we are cutting by single space. If our text contains multiple spaces, then use "tr" cmd (explained below) to squeeze multiple spaces into one.

12. sort: sort is another very useful cmd to sort lines in a file. It sorts lines alphabetically (not ascii), and then prints o/p on screen. -u sorts them uniquely (i.e if it finds multiple lines with exactly same content, it only prints 1 of them)

ex: grep Scope /sim/.../violations.txt | sort -u => It looks for "Scope", and then sorts the matching lines uniquely. This is very useful to uniquify gate level timing violations coming from the cadence simulator.

sort can also be used to sort on specific columns separated by space (instead of from start of line) by using option "-k <col_num>", and can sort numerically on that entry (instead of sorting alphabetically) by using "-n" (sorts from smallest to largest number). -r reverses the sorting order (irrespective of whether it's ascii sorting or numerical sorting)

ex: grep "me" *.txt | sort -n -k 3 > sorted_col3.txt => This sorts the file based on entries in 3rd column and does that numerically. So, if 3rd col has 21, 36,05, 17, then it will sort as 05,17,21,36.

13. tee: used in complex pipes to record some stage o/p (diverts copy of that to the named file).
syntax: tee [ -ai ] file_list
-a => appends o/p to existing file, w/o this file is overwritten.
-i => Ignore interrupts. If you press the Delete key, tee ignores the interrupt signal sent to it and continues processing.


Ex: sort somefile.txt | tee sorted_file.txt | uniq -c | head -n 12 > top12.txt

14. list: (note: this is a tcl cmd, not a unix cmd) returns a list comprised of all the args; braces and backslashes get added as necessary
syntax: list ?arg arg ...?
ex: list a b "c d e " " f {g h}" => returns this: a b {c d e } { f {g h}}

15. tr: This translates, squeezes or deletes characters. It takes input from STDIN (not a file), and writes to STDOUT. So, we can only use this cmd in a pipe |, or provide the text directly via redirection. tr SET1 SET2 translates chars in SET1 to the corresponding chars in SET2 (that is the default operation); -d deletes the listed chars, and -s squeezes each input sequence of a repeated listed character into a single occurrence of that character. This cmd is very useful to pipe into other unix cmds which don't handle multiple spaces.

ex: more ~/z.txt | tr -s " " | less => This squeezes runs of multiple spaces in file z.txt into a single space. Now, this can be used with the "cut" cmd, for ex. See the cut cmd section above.

16. system cmds: These are cmds that show system related info.

ps: process info
ps -ef => lists all running processes. The /proc/ dir has all processes in it as separate dirs (named by PID), with files containing details about each process.
The ps cmd greps the process PID and other details from this /proc/ dir

lsof: list open files

lsof -D dir1/ => This lists all open files in a given dir, and the processes which are using them. The -D option is needed only if we want to apply it recursively on all the files in the dir; to apply to a single file only, use "lsof <file>". This is very helpful when you can't delete a file due to "device/resource busy", and ps doesn't show which process is keeping it open. This is the reverse of "ps": it says which process has that file open, and then we can kill that process using kill -9 <pid>.

uname: info about system
~ [538]$ uname -a => prints all system info: kernel name, hostname, kernel release/version, machine, processor, h/w platform and OS.
Linux lindesk51.dal.design.ti.com 2.6.9-89.0.16.ELxenU #1 SMP Tue Oct 27 04:12:25 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
NOTE: above cmd doesn't show which Linux is used. To find that, do: ls /etc/*rel*. If it shows one of the files as "/etc/redhat-release", then its redhat OS. If it shows "/etc/SuSE-release", then it's SuSE OS. You can see the version number by looking into this file. As an ex for centOS: /etc/centos-release file shows => "CentOS Linux release 7.7 (core)" implying it's version 7.7
Other way to find OS is to run: lsb_release -a => std unix cmd which prints certain LSB (Linux Standard Base) and Distribution information. -a displays all info.

hostname => shows name of the host on which this cmd ran.

who/w: The who and w cmds display info about all users logged into that machine. The w cmd provides more details than who and also shows what processes they are running.

which: shows the full path of any cmd (i.e path of binary or pgm being executed). This is very useful cmd to see where any pgm is getting executed from.

ex: which ls => /usr/bin/ls => this shows that path to unix cmd "ls" is as shown

file: return the "type" of a file.

ex: file my.png => PNG image data, 1600 x 900, 8-bit/color RGB, non-interlaced

type: indicates how its argument would be interpreted if used as a command. If cd is aliased to something else, it would show that. See the args for the type cmd.

ex: type cd => returns "cd is a shell builtin"

NOTE: It only seems to work on bash shell, not on csh.

whereis: locates the source, executable image, and/or manual pages for a command. This is a superset of the "which" cmd, as it also shows the source code location, manual pages, etc. This is one of the most underrated cmds, and is very helpful.

ex: whereis ls
ls: /usr/bin/ls /usr/share/man/man1/ls.1.gz /usr/share/man/man1p/ls.1p.gz

17. super user or root user cmds:

Whenever Linux is installed, a user with username "root" is created by default, and we set a password for root user at that time. Then we can create additional users, setting their own passwords. Root user is also called superuser, and has write permission to all files in system. / dir is called root dir, and root user has read/write permission to everything under this root dir. The prompt will change to "#" (from "$" in bash or "%" in csh) to indicate that current user is root. If user's name is displayed, then it will show root# for root and ashishb$ for other users.

sudo: super user do. "super user" and "root" are used interchangeably to imply a user who has admin rights. Most cmds that make changes to the Linux system (i.e install new software, etc) are allowed for the root user only, who has full system privileges. This is done for security reasons. However, since most users log into their own account, they don't have root privileges. sudo is the cmd that allows us to get around that. By typing sudo before any cmd that requires root privileges, we temporarily allow the current user to act as the root user, and allow him to run that cmd. This cmd has many options, details here: https://www.sudo.ws/man/1.8.16/sudo.man.html

ex: sudo yum install tcl => yum install is reserved for the root user only, so if any other user tries to run "yum install tcl", it will give a "privilege error". However, by prepending "sudo" to the cmd, it allows the current user to act as root for this cmd only. It will ask for the current user's password (NOT the root user's password), and on a matching password, it runs this cmd w/o issues.

So, sudo allows us to run a privileged cmd w/o logging in as root. This is the most preferred way, as we don't have to log in as root. So, the question is: how is this secure? It's secure because only those users whose username matches an entry in the special file called sudoers (/etc/sudoers) are allowed to run the sudo cmd, and which cmds they can run is also specified in this file. Others will need to log in as root by using the su cmd. Other users on the system can be granted super user rights by adding their user names to this file, along with the cmds they are allowed to run (see the sketch below).
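
A hedged sketch of what entries in /etc/sudoers look like (user/group names are illustrative; always edit this file via "visudo" so a syntax error doesn't lock you out):

# /etc/sudoers (edit with: visudo)
root     ALL=(ALL)   ALL              # root may run any cmd as any user
ashishb  ALL=(ALL)   ALL              # user ashishb may run any cmd via sudo
%wheel   ALL=(ALL)   ALL              # any user in group "wheel" may run any cmd via sudo
rohan    ALL=(root)  /usr/bin/yum     # rohan may only run yum (as root) via sudo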

sudo can be used to change user to root (similar to su cmd below):

sudo -i => -i or --login allows us to log in as the target user. Here, since the sudo cmd is used, the target user is root. So this lets us log in as root in that terminal, so that we don't have to type the sudo cmd all the time. We see the prompt change to root# from sanjay#. Now we can type any cmd w/o the sudo prefix. This does what the su cmd does below. The su cmd to change to the root user doesn't work on ubuntu by default, but this cmd works.

su: switch user. This is similar to sudo cmd, but this is used to switch user from current user to any other user whose account exists on system. However, here we need password of the other user whose a/c we are switching to.

ex: su ashishb => this asks for the password for user "ashishb", and on matching the password, switches to user account "ashishb" from current account. However, this doesn't switch the environment of current user to ashishb env, so cmds as cd, running executables with permissions set to ashishb may not work. To get even the env of new user, we have to do proper login into other user via - or -l, i.e: su - ashishb (or su -l ashishb) => this switches the env to ashishb env

ex: su => su cmd w/o any options switches to root account. That is why sometimes su is abbreviated as "super user". However, we need to know the root password for this. Moreover, just as in sudo, su w/o - sign, doesn't switch the env to root, but when we use "su -", the env of current user switches to root (i.e pwd will print /root instead of /home/ankitk as user's current dir)

Other way to switch to root account is via "sudo -i".

To logout from root, run "exit" or "logout" to return back to your normal user account. "su -" (instead of just su) is preferred way to login as root, since this command simulates a complete root login (running init script, setting env var, etc)

Ubuntu root account:

Root account is setup on most linux distro when installing but is not done for ubuntu based derivatives. This means that you don't get to choose a root password for ubuntu based systems. Actually, the developers of Ubuntu decided to disable the administrative root account by default by not assigning a root password.

Article on enabling root a/c: https://www.computernetworkingnotes.com/linux-tutorials/how-to-enable-and-disable-root-login-in-ubuntu.html

So, when we run su cmd for ubuntu based systems, and provide a password, it will always say "incorrect password". We can enable root account as shown in link above. But using "sudo -i" works too.

CentOS root account:

In CentOS, su cmd works by default as root password is setup during installation. (If on single user system, default user is also assigned as admin, so that user's password is also the root password).  Now we can run any cmd w/o needing to enter "sudo" for any cmd, as we are root user and have all the privileges.

 

misc cmds: These cmds have been explained in other sections of linux, so look over there too for other details.

lsblk: lsblk cmd is explained in "File systems". It allows us to list all devices. 

dd: dd cmd is explained in "File systems". It allows us to copy contents of any device.

printer cmds:

lpr: print to printer
1. lpr -P LKE_4001 file.txt => prints file.txt in non-duplex mode. We can set options to print it the desired way; for that we provide options with -o. We can get valid option names for a printer by running the cmd "lpoptions" on that printer. Different printers have different options.
2. lpoptions -p LKE_4001 -l => gives all options for the named printer. For ex: for duplex, we see option "Duplex/2-Sided Printing: *None DuplexNoTumble DuplexTumble", meaning the option name is Duplex, and the option value is currently set to None, although the other allowed values are "DuplexNoTumble" and "DuplexTumble".
3. lpr -PLKE_4001 -p -o Duplex=DuplexNoTumble -o HPwmFontSize=pt30 file.txt => -p is used for pretty printing. The default fontsize of pt48 is too large, so using pt30.
4. lpr -PLKE_4001 -duplex test.txt => instead of above, we can use this as for every printer that can duplex, there exists a -simplex and -duplex queue that simply forces either option

a2ps: converts to ps format
a2ps -s 2 -P LKE_4001 file.txt => prints duplex. use option -Pvoid to print to void printer(displays no. of pages that it's going to use). -f8 prints using font size=8.

enscript:
enscript -BH -r -2 -dLKE_4001 -fCourier7 -DDuplex:true file.txt => prints in a pretty format. alt to a2ps

lpstat: printer statistics
lpstat -d -p => gives names and desc of all printers connected.

misc programs and useful utilities: There are many other useful pgms included in all major linux distro by default.

tar/untar, zip/unzip: these cmds are actually linux pgms to compress/decompress files and dirs. First tar is used to put the whole dir into a single file (.tar), and then gzip/bzip2 is used to compress that file (.tar.gz). gzip/bzip2 compression doesn't directly work on a dir, so we have to use tar. Both gzip and bzip2 compression algos are supported with tar, though gzip is more common. bzip2 has a higher compression rate, but takes more time to compress. To get our dir back, we do the reverse: gunzip first and then untar (i.e tar with a diff option). The resulting file .tar.gz/.tar.bz or .tgz/.tbz is called a "tarball". .gz is for gzip while .bz or .bz2 is for bzip2.

  • Tar and zip: First tar and then zip.
    • tar -cvf file1.tar dir1 => tars dir1 into file1.tar and puts the named tar in the current dir. Option -c => create archive, -v => verbose mode where progress is displayed on the terminal, -f => filename of the archive (here it's file1.tar). W/o -f, tar tries to write to a default tape device (or $TAPE), so -f is practically always used. We can also create a tar of files by providing names of files or using globs (*, etc). ex: tar -cvf file1.tar dir1 dir2 dir3/file1 file4 => This puts all of these in a single tarball. The "--exclude=file_name_pattern" switch can be used for each directory or file you want to exclude (so that not all files within a given dir are included in the tarball).
    • gzip file1.tar => zips file1.tar into file1.tar.gz and puts it in current dir. .gz file is compressed to about 1/4th the orig size.
    • Both the cmds above can be combined into one by adding "z" option to tar which says zip the file using gzip compression too. -j compresses it using bzip2 compression algorithm. ex: "tar -zcvf archive.tar.gz directory/"
  • Untar and Unzip: First unzip and then untar.
    • gunzip file1.tar.gz => unzips the file and creates file1.tar. If we want to keep the existing .gz file and create the untarred file elsewhere (usually in cases where we don't have write permission for that dir), we can use the -c option of gunzip. The -c option writes the unzipped data to stdout and does not remove the original .gz file. We redirect this stdout to a file: ex: gunzip -c file1.tar.gz > ~/file1.tar
    • tar -xvf file1.tar => leaves the original tar file intact. Extracts all files from the tar file into the current dir, recreating the dir structure stored in the tarball (here a dir named file1). To extract only desired files, we can provide names of files/dirs and only those will be extracted, i.e "tar -xvf file1.tar file1/file2 file1/dir3/file3" => extracts only these files/dirs into the extracted "file1" dir.
    • Both the cmds above can be combined into one by adding "z" option to tar which says gunzip the file too. ex: "tar -zxvf archive.tar.gz". switch "-C ~/dir2" will extract the archive in ~/dir2 dir.
  • Viewing contents w/o untarring or unzipping: We can extract files using unzip and untar as shown above. However, many times we have to look thru a bunch of tar files, and it's not always possible to untar/unzip everything as it takes time and lots of space. We just want to look at the contents of the tarball or search for patterns in individual files. For this use the options below:
    • less file1.tar => This will show all subdirs and files inside the tarball. It won't show the contents of the files though.
    • zgrep -Hna "my*pattern" file1.tar => zgrep will search for the given pattern "my*pattern" in the individual files of the tarball. -H lists the filename that contains the match, -n shows the line number, while -a treats all files as text files. Unfortunately -H shows only the tarball name, not the individual filename inside it, which is not very helpful. "--label=filename" may also be added to zgrep to change the name that gets reported for the input.
    • tar -tvf file1.tar dir1 => This shows the contents of the archive (the optional dir1 arg limits the listing to that dir inside the archive). The -t flag specifies that we only want to view the contents of the archive. This is similar to using "less" above.
    • tar -xOf /dir1/*.tar.gz */logs/syn.*.log => Option -O (NOT zero (0) but capital O) will not extract the files to disk, but will instead extract them to standard o/p (on screen). We can pipe it thru "less" to view the file or search for any pattern using grep. Very helpful option.

openssl: This is a linux pgm to encrypt/decrypt files; openssl supports a wide range of ciphers.
openssl aes-128-cbc -e -in file1 -out file1.aes -K <hex_key> -iv 0 => uses aes-128-cbc for encryption (openssl enc -ciphername is the same as openssl ciphername). -e=encrypt, -d=decrypt, i/p=file1, o/p=file1.aes, -iv specifies the actual initialization vector (IV) to use. The cipher modes CBC, CFB, OFB and CTR all need an IV. -iv must be used when only the key is specified using -K and no password is specified (-k or -pass specifies a password to use instead). The IV should consist of hex digits only. A new, random IV should be created for every encryption of data. Think of the IV as a nonce (number used once) - it's public but random and unpredictable. -nopad disables padding (otherwise the encrypted data is larger than the i/p data, as the engine adds extra bytes to the i/p). The key is given in hex only (-K F01A95...4C => no need to use 0x...)

od: octal dumping pgm. Can also dump hex, decimal or ascii. -a=ascii, -x=hex, -d=decimal, -o=octal. See c_pgm_lang.txt for ASCII codings.
ex: od -a test1.txt (test1.txt has: abcd CR 1234 CR tab CR)
0000000 a b c d nl 1 2 3 4 nl ht nl => shows byte 1 is a, byte 2 is b, byte 5 is newline, byte 11 is tab(ht) and so on.
#NOTE: we can run od on an executable file to see what's in there. Usually the o/p is long, so pipe it to "head" to see only the first few lines
od hello | head

sendmail: email pgm at /usr/sbin/sendmail, used to send email from the cmd line or from scripts.

diff: diff is a very important pgm to find the differences b/w 2 text files. It's used very often. However, its o/p is a little cryptic, so "tkdiff" is often used, which is a gui representation of the differences. meld (open source) and bc (beyond compare, a proprietary tool) are other gui based diff tools which can diff entire dirs recursively.

ex: diff -c file1.txt file2.txt => shows differences in content of file1.txt vs file2.txt. file2.txt is treated as the reference for comparison purposes. The default o/p is a little cryptic; to make it more human friendly, use option -c (context format).

ex: diff --color -c -i -w file1 file2 => here differences are highlighted in red color. -i ignores case. -w ignores all white space (this is very helpful where we don't want space differences b/w the files to be reported as actual diffs). There are many other variants of space handling, such as -b (ignores changes in the amount of white space), -E (ignores changes due to tab expansion), -Z (ignores white space at line end), -B (ignores blank lines), etc.

ex: diff -qr dir1/dir2 dir3/dir4 => -q only shows if files are different w/o showing actual differences. -r does recursive comparison for any subdir too, for all files in given dir

 

cron/at: Many times, we want to run a script automatically every day or at particular times. This can easily be done via 2 cmds that are widely supported on Linux OS. However, you may need to install the software, if it isn't included by default.

  • cron: "crond" is the cron daemon (background service) that runs cron jobs.
    • To install, type => sudo yum install cronie (for RPM OS m/c). Then start and enable service using systemctl.
    • To add jobs to cron, we need to add entries to the crontab file. Open it using crontab -e.
    • The syntax for each entry is => <time/date to execute> <cmd_to_execute>. There are 5 fields for time/date: the 1st field is for minute, 2nd for hour, 3rd for day of month, 4th for month, 5th for day of week.
      • ex: 20 3 * * * /home/run_it.sh => This will run "run_it.sh" script everyday at 3:20am
  • at: Many times, we need to run a job only once in the future and never again. at is easier to use in such situations. Also, many times we don't have admin permissions to set up cron on other m/c; in those cases at can still be used. Similar to cron, at has its own daemon, "atd", that runs at jobs.
    • To install, type => sudo yum install at (for RPM OS m/c).
    • Syntax for running a script at a specific time => at <time/date> -f <script_name>
      • ex: at 3:50 -f ~/my_script.csh => This will run the script at 3:50am
    • To use at for running daily jobs, we can use a trick used in other scripts => We run the script using at, and then schedule the same script inside the main script. This way, the script will run once and then reschedule itself to run again after specified time. Ex below makes the script run every 24 hrs.
      • my_script.sh => (Add this line in script) => ./daily/script_daily.csh; echo "$0" | at now + 24 hours
      • chmod 755 my_script.sh
      • ./my_script.sh | at now

 


 

Aside from GNU Autotools, one other tool that is very popular for building projects is "CMake". Just like GNU Autotools, it's open source and cross platform. It also provides support for testing and packaging using CTest and CPack. The CMake project started in 2000. The executable pgms cmake, cpack and ctest are all written in C++. CMake cmds are case insensitive, meaning that we can write the cmd names in CMake files in any case and they mean the same (for ex: add_executable can be written as ADD_EXECUTABLE, Add_Executable, etc, and it still runs correctly); variable names and file names, however, are case sensitive.

Links:

https://cmake.org/cmake-tutorial/

https://www.johnlamp.net/cmake-tutorial.html  => whole tutorial also available as pdf: https://www.johnlamp.net/files/CMakeTutorial.pdf

 https://ecrafter.wordpress.com/category/programming/

https://ilcsoft.desy.de/portal/e279/e346/infoboxContent560/CMake_Tutorial.pdf

Just as Make uses a Makefile to build the source code, CMake has CMakeLists.txt files. These describe your project to CMake. On Linux, CMake generates the Makefiles for you, instead of us writing them manually. Then we run make using these Makefiles to create executables and libraries, just as we do with a regular make. These Makefiles are native build files (i.e generated based on what's available on the current system), so they can be run on that system (hence CMake is called cross platform).

CMakeLists.txt files are fairly simple, especially compared to Makefiles. Note that CMake is also intended for other platforms as Microsoft Windows and Apple Mac, so it can generate build systems other than Makefiles. We'll talk only about Unix Makefile generation using CMake, as that is what CMake is used for on Linux platforms.

A simple 3 line CmakeLists.txt is as follows:

CMakeLists.txt: NOTE: End of each line is signaled with newline, no ; or anything else needed to signify end of line

cmake_minimum_required(VERSION 2.8 FATAL_ERROR) #This line is optional, but is strongly recommended. It states the minimum cmake version required (2.8 here). The last option "FATAL_ERROR" is optional. This whole line says that if the version of cmake is < 2.8, then cmake should error out with a FATAL ERROR. If you aren't sure what version to set, use the version of CMake you have installed (cmake -version). It is recommended that this command be used in all top level CMakeLists.txt files.

#This is a comment. Comments can also be used at the end of a cmd on the same line, as shown in the line above. Multi line (bracket) comments are written as #[[ ... ]]
project("My tutorial") => Name of project, if name has spaces, put them inside " "

add_executable(tutorial tutorial.cxx) => This command tells CMake you want to make an executable and adds it as a target. The first argument is the name of the executable and the rest are the source files. You may notice that header files aren’t listed. CMake handles dependencies automatically so headers don’t need to be listed. 

 Once we have this file in a dir, we can run cmake. We will do an out of source build, where we create all the generated files in a separate dir. This is the recommended approach for any build software, as it keeps generated files separate from source files. We can also do an in source build where we create all the new files in the same dir, but in source builds are not preferred.

Let's start with same example as of the tutorial in Make:

 /home/aseth/ => cp hello.c and config.h in this dir.

Create CMakeLists.txt as below, with only 2 lines:

project("My tutorial") # Name of project,if multiple words, separated by spaces, put them inside " "
add_executable(hello hello.c) #name of config.h is not put here, as CMake handles headers automatically

mkdir build
cd build/
cmake -G "Unix Makefiles" .. => This creates "Makefile" for us in same dir as "Makefile". (option -G provides Generator name, here we create Unix Makefiles as generator). It also creates couple of subdir here. ".." is needed since that specifies path of CMakeLists.txt, which is 1 level up. Here we are doing an out of source build as we are in a separate dir build. If we were doing an in source build, by running this cmd in main dir (/home/aseth), we would have to replace ".." with "." as that specifies the path of CMakeLists.txt which is now in current dir for in source build.

Build package => similar to AutoTools
make VERBOSE=1 => runs make as you would with a Makefile created manually. Makefile created above by Cmake is quite fancy. It suppresses the standard output. While this provides a neater and cleaner experience it can make debugging more difficult as you can’t check the flags passed to the compiler, etc. We can get all of that output by running make VERBOSE=1.

NOTE: in the make cmd above, we didn't specify a target. If you look at the Makefile generated by cmake above, you will see it has a lot of std targets as all, clean, etc. By default, the target "all" is run, so the above cmd is equiv to "make all VERBOSE=1"

executable hello is created in this build dir. object files for hello and other related files are all created in CMakeFiles/hello.dir subdir.

Install package => similar to AutoTools

make install => we can omit build step above, as both build and install can be handled by this cmd

CMakeCache.txt => this is one other important file created during build process, in build dir itself. It has entries of form: VAR:TYPE=VALUE. It speeds up build process. There is no need to modify it manually.

Syntax: All files in CMake are written in CMake syntax. All details of syntax can be found on CMake website. We'll just discuss important ones.

Variables: Cmake comes with list of predefined var as CMAKE_BUILD_TYPE, etc. These Variables can be changed directly in the build files (CmakeLists.txt) or through the command line by prefixing a variable's name with -D (i.e -DBUILD_SHARED_LIBS=OFF )

CMAKE_MODULE_PATH => Path (dir) to where the CMake modules are located. These CMake modules are loaded by the include() or find_package() commands before checking the default modules that come with CMake. By default it is empty; it is intended to be set by the project.

CMAKE_INSTALL_PREFIX => Where to put files when calling 'make install'

CMAKE_BUILD_TYPE => Type of build (Debug, Release, ...)

BUILD_SHARED_LIBS => Switch between shared and static libraries
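
A sketch of setting these predefined vars from the cmd line with -D during an out of source build (the paths and values are just illustrative):

mkdir build && cd build
cmake -G "Unix Makefiles" \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INSTALL_PREFIX=$HOME/local \
      -DBUILD_SHARED_LIBS=OFF ..
make VERBOSE=1 && make install    # installs under $HOME/local instead of /usr/local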

We can change predefined var or define our own variables using set cmd. Data type for variables are strings and list (which are lists of strings). All var can be accessed via ${VAR}.

set(STRING_VARIABLE "value")

set(LIST_VARIABLE "value1 value2")


To print the value of a variable, use the message command and dereference the variable using ${}:

message("The value of STRING_VARIABLE is ${STRING_VARIABLE}")

ex:

set(CMAKE_CXX_FLAGS "-Wall -std=c++0x") => this sets additional compiler flags for C++ (the compiler driver also sees them on the link line). This var is a predefined var, but we can assign any value to it.

Conditional constructs (a small sketch follows this list):
1. IF() ... ELSE()/ELSEIF() ... ENDIF() => ex: IF (UNIX)

2. WHILE() ... ENDWHILE()

3. FOREACH() ... ENDFOREACH()
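A minimal sketch of these constructs (the MY_SRCS variable and the messages are made up for illustration):

IF(UNIX)
  MESSAGE("Building on a Unix-like system")
ELSE()
  MESSAGE("Building on some other system")
ENDIF()

SET(MY_SRCS a.c b.c c.c)          # a list variable with 3 entries
FOREACH(src ${MY_SRCS})
  MESSAGE("source file: ${src}")  # prints each list entry in turn
ENDFOREACH()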

CMake Modules: Special cmake files written for the purpose of finding a certain piece of software and setting its libraries, include files and definitions into appropriate variables so that they can be used in the build process of another project (e.g. FindJava.cmake, FindGSL.cmake).

 These modules are in the dir where CMake is installed. On my system, cmake is installed in the /usr/local/ dir, just like any other pkg.
dir: /usr/local/share/cmake-3.14/Modules/ => there are hundreds of modules here as *.cmake and *.in
 
Ex: FindGSL.cmake (/usr/local/share/cmake-3.14/Modules/FindGSL.cmake) => Finds native GSL includes and libraries, and sets the appropriate vars. The file reads roughly like this:
 
include(...file.cmake) => includes a helper cmake file to handle the args passed to the module
if GSL_ROOT_DIR is defined, use it;
else use the PkgConfig module => find_package(PkgConfig), pkg_check_modules(GSL QUIET gsl)
set GSL_INCLUDE_DIR, GSL_LIBRARY, GSL_CBLAS_LIBRARY, GSL_INCLUDE_DIRS, GSL_LIBRARIES (debug versions too)
If PkgConfig wasn't used, try to find the version via gsl-config or by reading gsl_version.h; set GSL_VERSION
Finally, handle the QUIETLY and REQUIRED arguments and set GSL_FOUND to TRUE if all listed variables (GSL_INCLUDE_DIR, GSL_LIBRARY, GSL_CBLAS_LIBRARY, GSL_VERSION) are TRUE

On running, the FindGSL module will set the following variables in your project (other modules set similar vars with their own prefix, e.g. OpenCV sets OpenCV_FOUND, etc). A usage sketch follows the list:

GSL_FOUND          - True if GSL found on the local system
GSL_INCLUDE_DIRS   - Location of GSL header files => all include dirs. Here, it's the same as GSL_INCLUDE_DIR
GSL_LIBRARIES      - The GSL libraries => all libraries; here it's both GSL_LIBRARY and GSL_CBLAS_LIBRARY
GSL_VERSION        - The version of the discovered GSL install.
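As a sketch, assuming GSL is installed and our project already defines the target hello, these variables are typically consumed in CMakeLists.txt like this:

find_package(GSL REQUIRED)                      # runs FindGSL.cmake from the module dirs
if(GSL_FOUND)
  include_directories(${GSL_INCLUDE_DIRS})      # add GSL header dirs to the include path
  target_link_libraries(hello ${GSL_LIBRARIES}) # link hello against libgsl and libgslcblas
endif()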

Packages: Packages provide dependency information to CMake based buildsystems. Let's say we installed a package for OpenCV. This package might have been built with CMake, or it may have been built with some other pkg building software such as AutoTools, etc. If the package was built with CMake, then it has a "<name>Config.cmake" (i.e. OpenCVConfig.cmake) file associated with it. This is called a Config-file package. If CMake was not used to build OpenCV, then we need to write a new file named Find<name>.cmake (i.e. FindOpenCV.cmake) and put it in one of the directories listed in CMAKE_MODULE_PATH. This is called a Find-module package. Now, in order to use this package in another project, we have to tell our new project how to find this package. Packages of both kinds are found with the find_package() command.

1. Find-module Packages => These are the ones that have a Find<name>.cmake file associated with them. This is the default package type that is assumed for any package (i.e. CMake assumes that the external package was not built with CMake). In order to use the YARP package in a new project called "hello", CMake searches the directories listed in CMAKE_MODULE_PATH for a file called Find<name>.cmake. If found, this module is executed and it is responsible for finding the package.

FIND_PACKAGE(YARP) => searches for file named FindYARP.cmake in dir specified by CMAKE_MODULE_PATH 

2. Config-file Packages => These are the ones that were built with CMake and have a "<name>Config.cmake" or <lower-case-name>-config.cmake file associated with them. This file is created in the build dir selected by the user, when CMake was used to build that package. In order to use the YARP package in a new project called "hello", we specify the location of this file by filling a cache entry called <name>_DIR (this entry is created by CMake automatically). If the file is found, it is processed to load the settings of the package (an error or a warning is displayed otherwise).

So, we set YARP_DIR to appr dir where this file is, and then use FIND_PACKAGE macro.

SET(YARP_DIR "$ENV{YARP_ROOT}") # We can also specify the dir directly (e.g. /usr/include) instead of using an env var
FIND_PACKAGE(YARP) => searches for file named YARPConfig.cmake or yarp-config.cmake in YARP_DIR specified above.

YARPConfig.cmake creates the entries YARP_LIBRARIES, YARP_INCLUDE_DIRS and YARP_DEFINES. To use these in a project, do:

  INCLUDE_DIRECTORIES(${YARP_INCLUDE_DIRS})
  ADD_DEFINITIONS(${YARP_DEFINES}) # optional
  ...
  TARGET_LINK_LIBRARIES(your_target_name ${YARP_LIBRARIES})

include:  include(file|module) => Load and run CMake code from a file or module. If a module is specified instead of a file, the file with name <modulename>.cmake is searched first in CMAKE_MODULE_PATH, then in the CMake module directory. 

ex: include(FindGSL) => searches for FindGSL.cmake and runs it.

 

 

1. find_package(Qt4 MODULE) => the MODULE keyword forces find_package into Module mode, i.e. it only looks for a FindQt4.cmake module and never for a Qt4Config.cmake file.

2. pkg_check_modules(<PREFIX> <MODULE>) invokes pkg-config, queries for <MODULE> and returns the result in variables whose names begin with <PREFIX>_. Thus <MODULE> is the name of the software that you (or pkg-config) are looking for.

With "pkg-config --list-all"  (type this in build dir on linux shell cmd line) you get a list of all modules known by pkg-config. If your <MODULE> is not listed here, then  pkg_check_modules can never find it. If it does find it, use "pkg-config --modversion <MODULE>" to find what version of <MODULE> it's finding.

Ctest:


Now, to enable testing, we can add more lines in CMakeLists.txt :

cmake_minimum_required(VERSION 2.8 FATAL_ERROR)

project("My tutorial") 
enable_testing() #Enables testing for this CMake project. This should only be used in top level CMakeLists.txt. The main thing this does is enable the add_test() command.  
add_executable(hello hello.c)
add_test(HelloTest hello) => registers a test named "HelloTest" that runs the hello executable. Together with enable_testing(), this adds a "test" target to the Makefile generated by CMake; making that target runs CTest, which in turn runs all of our tests (in our case, just the one). See the sketch below.
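With that in place, a typical sequence to build and run the tests from the build dir looks roughly like this (output format varies with the CMake/CTest version):

cd build
cmake ..
make
make test        # or run "ctest" directly; both invoke CTest, which runs HelloTest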

 CPack:

-------

Design Reliability:

In chips, we have transistors and wires connecting them. They need to keep functioning for 10 yrs or so, so we run simulations to account for aging of transistors and wires.

BEOL = Back end of Line (metals and vias)

FEOL = Front end of Line (transistors, everything below contact)

Transistor reliability: We measure how much each of these parameters degrades over 10 yrs (~100K hrs) of operation. FIT (Failures In Time) rates are based on 10 yrs of operation, so the mechanisms below are evaluated against that lifetime:

  1. CHC: channel hot carrier: affects Transistor VT
  2. BTI: Bias Temperature Instability: It degrades the VT of a transistor during the "ON" state at high temperature. It affects both NMOS and PMOS, though PMOS is affected more (|VT| increases in both: VT goes up for NMOS and down for PMOS => weaker tran). BTI happens when the tran is ON, but starts fixing itself once the tran is off.
    1. So, there are 2 BTI phases:
      1. Stress phase: here the tran is ON, gate bias is max, causing max VT shift
      2. Relaxation phase: here the tran is OFF, gate bias is min, reversing the VT shift caused by the stress phase. It's able to recover 20% of the VT shift caused by the Stress phase, so only 80% of the VT degradation remains after the Relaxation phase.
    2. Comparing BTI across tech:
      1. 180nm device => For a PMOS which is constantly ON for 100K hrs @ 105C, with Vgs=1.8V, ΔVT = 10mV. 10mV is acceptable since there's already enough margin built into the design.
      2. 10nm FinFET device => for NMOS it's about 10mV, and for PMOS 50mV (assuming worst case 100% ON). At 1V applied across src/drn of a 1nm device, the field is 1V/nm => 1KV/um, enough to break Si and SiO2 bonds. PBTI is better for FinFET.
    3. 2 kinds of BTI (-ve and +ve):
      1. NBTI: -ve BTI, VT goes down. Here the gate is at a lower voltage than src, drn and substrate. For the tran to remain ON, this can only happen for PMOS. It is dominated by substrate interface traps. It's more severe than PBTI.
      2. PBTI: +ve BTI, VT goes up. Here the gate is at a higher voltage than src, drn and substrate. For the tran to remain ON, this can only happen for NMOS. It is dominated by bulk HK (high-k) trapped charges.
  3. HCI: same as CHC? FinFET HCI is very challenging.
  4. Self Heating: worse in FinFET compared to planar, as the fins are surrounded by oxide.
  5. SEU: soft error vulnerability in SRAM cells and logic.
  6. ESD: ESD protection is needed to protect the gate oxide. In FinFET, gate oxide breakdown is lower, and the breakdown voltage of devices is also lower (=> FF more fragile). So, equivalent ESD protection requires more area, i.e. bigger ESD devices.

 

Few more reliability notes:

 

  • Effective VT degradation = CHC + BTI
  • OverDrive capability of a tran (i.e. how much over-voltage it can tolerate): For tran aging, stress voltages are applied beyond the rated voltage (1V = rated voltage for 10nm NMOS/PMOS, stress voltage = 1.2V), along with higher temperatures. That's why thin-oxide tran are not used in IO. We make a 2D plot of voltage and temperature, and check the lifetime of the tran, as well as its VT shift. We see that higher voltage (30% higher than rated, so a 1V rated tran stressed to 1.3V) and higher temp (> 50C) reduce the lifetime of the tran to a couple of days. Very high voltages of > 1.5V and temp > 50C start causing VT shifts of 50mV or more (else it remains < 10mV). So, to assess overdrive capability w/o getting the effect of VT shift, experiments are limited to 30% voltage overdrive with temp at 50C.
  • HTOL stress (High Temp Operating Life): high voltage and high temp for 12-24 hrs gets the same effect as aging.

 

 

Wire reliability:

Wires/Vias may suffer from reliability issues too. They may develop opens/shorts due to multiple reasons.

  1. EM: Electromigration: Iavg/Irms/Ipeak calc.
  2. Self heating: happens for wires too.

 

Autotools:

We saw the make utility for building (compiling and generating executables for) large programs. However, usually we write a Makefile for compiling programs for the system on which we are running make, i.e. if I am working on a Linux OS, the executable I generate is for that Linux OS. However, that same executable will not work on Windows, and may not even work on other flavors of Linux. This is because the same compiler version, lib files, etc may not exist on all Linux systems. So, we need a different Makefile tailored for each platform. This would create too many Makefiles, each unique to a system.

To remedy this situation, we need to have a Makefile which has #ifdefs for different platforms, and which generates the executable differently for each platform (i.e. if an older version of gcc exists on some system, it may not support some flag; in that case, the ifdef helps resolve such issues and still lets the Makefile build an executable). Having all these ifdefs in a Makefile and making sure that such a pgm works on all arch is a very tedious task. So, most software that is intended to be distributed for multiple platforms uses a tool called "Autotools". This tool automates the task of generating binaries for different platforms.

Most of the Linux programs that we download have either a binary executable or source code. For the programs whose binary executable is directly available, we do not need to do anything; we just run the executable and the pgm starts running. However, for programs whose source code is available, we usually need to run 3 steps as an end user to generate the executable. This is called the "GNU Build". The installer on your system unpacks the downloaded package (if it's a tar.gz file, then use gunzip and tar; if a .deb pkg, then use dpkg or apt, etc) and then runs these 3 steps:

  1. ./configure => analyzes your system to see what kind of programs and libraries you have, so it knows how to build the program best
  2. make => actual building is done using Makefile generated from above step (same way as we do using make with a Makefile)
  3. sudo make install => installs the pgm (puts pgm libs,binary,etc in appr dir with appr permissions). By default, it's put in /usr/local/ (bin in /usr/local/bin, lib in /usr/local/lib, etc)

These 3 steps are needed because over here the Makefile gets generated differently for different platforms: ./configure generates a Makefile, then make runs on this Makefile to generate the executable, and then "install" puts the generated executable in the appropriate dir. We may never need to write a program that we distribute to other people, so you may wonder why learn Autotools. The reason is that most of the time we end up using such pgms, which requires us to run these steps. Having a brief understanding of "Autotools" helps us when building (compiling and installing) 3rd party pgms on our system. Learning Autotools is a full time job in itself, so I'll just highlight a few basic cmds with an example.

Full detailed tutorial for autotools here: https://www.lrde.epita.fr/~adl/autotools.html

Brief tutorial on this is: http://markuskimius.wikidot.com/programming:tut:autotools

Autotools is a collection of three tools:

  • Autoconf — This is used to generate the “configure” shell script. As I mentioned earlier, this is the script that analyzes your system at compile-time. For example, does your system use “cc” or “gcc” as the C compiler? Full Autoconf doc here: https://www.gnu.org/software/autoconf/manual/autoconf.html
  • Automake — This is used to generate Makefiles. It uses information provided by Autoconf. For example, if your system has “gcc”, it will use “gcc” in the Makefile. Or, if it finds “cc” instead, it will use “cc” in the Makefile. Full automake docs here: https://www.gnu.org/software/automake/manual/automake.html
  • Libtool — This is used to create shared libraries, platform-independently. No need to know this as it's complicated topic for advanced users.

Autotools build process has some standard things in GNU build. Good to know these:

A. make options:

1. Std Makefile targets: make all, make install, make uninstall, make clean, make check => std targets used after a pkg has been downloaded to your system

2. For making distribution: make dist (creates a tarball named *.tar.gz by collecting all src/other files, which is ready for distribution), make distcheck (to check the pkg for any errors/issues), make distclean.

3. Staged installation: using DESTDIR, we can divert the install step of "make install" to a dir other than the usual one. Then we can choose and move files to whichever dir we want.

ex: make DESTDIR=~/scratch install

B. configure options: configure --help gives all options, few important ones are listed below.

1. Std directory var: prefix (default is /usr/local). By changing the value of this, we can put bin, lib, doc, etc in other dirs.

ex: ./configure --prefix=$HOME/usr => puts the binary "hello" in ~/usr/bin/hello, etc.

2. Std configuration var: CC, CFLAGS, LDFLAGS, CPPFLAGS.

ex: ./configure CC=gcc3 => configure automatically chooses appropriate default values for these, but sometimes we may want to override the defaults.

3. Parallel build tree: The GNU build system has 2 trees: the source tree and the build tree. The source tree is the dir containing "configure", which has all the src files. The build tree is the dir where "./configure" is run, creating object files and other intermediate files. Most of the time, we run "./configure" in the same dir where configure is located, so the source and build trees are the same. But, if we want to keep our source files uncluttered from generated files, we can have the build tree in a separate dir by doing this:

ex: ~/.../top-dir-pkg (this is dir where you extracted files, and has configure script). "mkdir build", "cd build", run "../configure" and "make" in build dir. This keeps all generated files in build dir, keeping main source dir intact.

4. Cross compilation: To generate binaries for a different system than the one where we are compiling the files. By default, binaries are generated for the same system where we compile the files.

ex: ./configure --build=i68cpc --host=solaris => Here, build denotes our system, whereas host is the system for which we generate binaries. For binaries to get generated for the host system, a cross compiler has to exist on the native system, else it will error out.

5. Pgms can be renamed by using --program-prefix, --program-suffix (i.e. instead of installing a pgm with the name "tar", we can install it as "my-tartest" by using prefix=my- and suffix=test, to prevent overwriting the "tar" that is already installed), as sketched below.
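A sketch of the renaming options, using the tar example from above:

./configure --program-prefix=my- --program-suffix=test
make
sudo make install     # installs the binary as "my-tartest" instead of "tar"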


Simple example of building a pkg: This ex shows how to build pkg like *.tar.gz from source files that can be distributed.

Autotools is usually installed by default. We can check the version numbers of Autoconf/Automake by running them with --version:

$ autoconf --version
autoconf (GNU Autoconf) 2.69

$ automake --version
automake (GNU automake) 1.13.4

$ autoreconf --version
autoreconf (GNU Autoconf) 2.69

ex1: write a C pgm (hello.c) that uses the gettimeofday function. We need to run autotools in the same dir as hello.c. These are the steps for running autotools. The whole goal of autotools is to generate 2 files: configure and Makefile.

0. write C pgm as below called hello.c:

#include <stdio.h>
#include <sys/time.h> // this is added on purpose, so that we can make this system dependent
 
int main(int argc, char* argv[])
{
   double sec;
   struct timeval tv;
 
   gettimeofday(&tv, NULL); // This function only exists in sys/time.h, so if that file doesn't exist, this will error out
   sec = tv.tv_sec;
   sec += tv.tv_usec / 1000000.0;
 
   printf("%f\n", sec);
 
   return 0;
}

1. autoconf + automake steps =>

  • Autoconf has a series of steps to generate configure script. configure is a very large bash script. Autoconf needs configure.ac as an i/p file to generate configure script.

    we can write configure.ac as below: AC_* and AM_* are M4 macros. AC_* are autoconf macros, while AM_* are automake macros.

    AC_PREREQ([2.69]) => Autoconf version number (optional)
    AC_INIT([hello-pkg], [1.0], [<bug-report email address>]) => pkg name, version, email addr for bug reporting
    AC_CONFIG_SRCDIR([hello.c])
    AM_INIT_AUTOMAKE([-Wall -Werror foreign]) => NOTE: this is an automake macro (AM_*), not an autoconf macro (AC_*). The options inside are optional. We turn on all warnings and report them as errors by using -Wall and -Werror. "foreign" allows us to proceed even w/o having files such as README, AUTHORS, NEWS, etc; else automake will complain about these missing files and won't allow us to generate the pkg. Also, if autoconf alone is run on a configure.ac with this AM_* macro (without aclocal having supplied its definition), it will error out with "undefined macro".
    AC_CONFIG_HEADERS([config.h]) => causes the configure script to create a config.h file gathering the ‘#define’s defined by other macros in configure.ac. This config.h file can be included in hello.c, and we can then use those defines in our program to make it portable.
    AC_PROG_CC => causes the configure script to search for a C compiler and define the variable CC with its name
    AC_CHECK_HEADERS([sys/time.h]) => checks for header files
    AC_CHECK_FUNCS([gettimeofday]) => checks for library func in src files
    AC_CONFIG_FILES([Makefile]) => list of all Makefiles that should be generated from Makefile.in files. If Makefiles are in nested dirs, list all of them here
    AC_OUTPUT => closing command that actually produces the part of the script in charge of creating the files registered with AC_CONFIG_HEADERS and AC_CONFIG_FILES (i.e config.h and Makefile)

  • Automake generates Makefile.in. It needs Makefile.am and configure.ac as i/p to generate Makefile.in.

    Makefile.am can be simple file specifying o/p binary file, and i/p C pgm as shown below.

    bin_PROGRAMS=hello
    hello_SOURCES=hello.c

2. autoreconf --install => With the 3 files above (hello.c, configure.ac, Makefile.am), we could now run autoconf on configure.ac to generate configure, and then run automake on Makefile.am and configure.ac to generate config.h.in and Makefile.in. But it would require a lot of work to get that working. Autoreconf is a script that calls autoconf, automake, and a bunch of other commands in the right order, so this is the preferred step instead of running autoconf and automake separately. This step creates the configure, config.h.in and Makefile.in files. It also creates a bunch of other files such as install-sh, depcomp, missing, aclocal.m4 and the dir autom4te.cache.

3. configure => At this point the build system is complete. Steps 3 and 4 are what a user would run on any system where the package is downloaded, to create the executable. We run these steps here to check that everything runs OK. With the 3 files (configure, config.h.in and Makefile.in) generated in step 2, running the ./configure script creates Makefile (from Makefile.in) and config.h (from config.h.in). These 2 files are created after probing the system, so they are usable on this system. There are also extra files created, called config.status and config.log.

4. make => running make generates the executable hello (and shows the actual steps). hello.o and hello will be the files generated by this step.

5. make install => We do not run this step as it would install the binaries in the appropriate system dirs, which we do not want on our system. Steps 3, 4, 5 are run by folks downloading our pkg.

6. make distcheck => creates the final *.tar.gz distribution pkg as "hello-pkg-1.0.tar.gz"

Now that we have the final pkg, it can be distributed to anyone. However, our program is not yet portable to all systems, as there are functions in our hello.c pgm that may not be present in the C library on some systems. config.h is the file that comes to our rescue here. It looks at all the functions/headers that configure checked for, and provides us with constants in the form of "#define"s that we can use to check whether the system has that function in its C library or not. For ex, looking in config.h, we see these lines:

/* Define to 1 if you have the `gettimeofday' function. */
#define HAVE_GETTIMEOFDAY 1

/* Define to 1 if you have the <sys/time.h> header file. */
#define HAVE_SYS_TIME_H 1

Now, in our C pgm, we can use these constants to check for the existence of these on that system. So, we modify our C pgm to make it portable.

Our modified C pgm looks like this:

#include <stdio.h>
#ifdef HAVE_CONFIG_H
#include <config.h>   /* generated by configure; defines the HAVE_* macros used below */
#endif
#ifdef HAVE_SYS_TIME_H
#include <sys/time.h>
#else
#include <time.h>
#endif

int main(int argc, char* argv[])
{
   double sec;
 #ifdef HAVE_GETTIMEOFDAY
   struct timeval tv;
 
   gettimeofday(&tv, NULL);
   sec = tv.tv_sec;
   sec += tv.tv_usec / 1000000.0;
 #else
   sec = time(NULL);
#endif
   printf("%f\n", sec);
 
   return 0;
}

Now, since we modified our pgm, we need to rerun steps 3 and 4 to make sure it still compiles fine. Then we can run step 6 to create the tar.gz that can be distributed.

----------

OPTIONAL: The steps below are an alternate set of steps that are not recommended. But they are good to know, in case we do not want to run autoreconf, but instead plan to run autoconf and automake separately.

A. autoscan => generates configure.scan. It's a small file. It should look very similar to the configure.ac file above. It has autoconf macros only (i.e. AC_*). We will need to add the automake macros to it (AM_*) and rename it to configure.ac to use it in the flow above.

B. autoconf => uses configure.ac to generate configure. The generated configure script will also need config.h.in and Makefile.in. If we do not want to write config.h.in from scratch, we can use autoheader to generate config.h.in. Makefile.in contains very basic Makefile instructions, which are used to generate Makefile.

C. autoheader => generates config.h.in. It just has few constants which are undefined.

D. automake =>  generates Makefile.in from Makefile.am.

E. aclocal => There will be a lot of errors in the automake step above. The 1st set of errors will be about Automake macros which aren't found in configure.ac. If we add these macros to configure.ac, then autoconf will freak out, since it doesn't know these macros. To remedy this, we provide the definitions of these macros in aclocal.m4. We run aclocal to create aclocal.m4 automatically with the definitions of all these automake macros.

At this point, we have config.h.in and Makefile.in. So, the configure script can run now, followed by make.

G. run configure: ./configure => generates config.h from config.h.in, and Makefile from Makefile.in. config.h will look the same as config.h.in, except that all constants are now #define'd. Makefile will look similar to Makefile.in.

H. run make => Once Makefile is generated, we can run make. "make all" uses the Makefile to build the default target, which generates the executable hello using the rules in the Makefile. Now we can run ./hello to get the executable running.

 

GCC: GNU Compiler Collection

Before learning C or C++, we need to learn how to compile a C/C++ program. The program to compile C/C++ into machine code is called GCC (GNU Compiler Collection). Very good pdf here (by Brian Gough) = https://tfetimes.com/wp-content/uploads/2015/09/An_Introduction_to_GCC-Brian_Gough.pdf

Installing GCC:

Check if gcc is installed by running "gcc -v" on your linux terminal.

gcc -v => shows "gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) " along with some other info.

GCC is installed by default on CentOS. However, on Linux Mint, you will get errors about various std libs not being found when trying to run GCC (even though gcc is installed). This is because not all the libs needed by gcc are installed. If you get errors running gcc and gcc is already installed on your system, follow these steps ($ below represents the terminal prompt).

Debian based OS: (Linux Mint, Ubuntu etc): Run following 2 cmds:

$ sudo apt update => updates pkg data repository. Needed before you install anything

$ sudo apt install build-essential => build-essential package is a reference for all the packages needed to compile a Debian package. It generally includes the GCC/g++ compilers and libraries and some other utilities (as Make, etc).

Fedora based OS (RHEL, CentOS, etc): Run following 2 cmds: 

$ sudo yum makecache => makes sure that the yum cache is up to date with the latest metadata. (Not sure if we can use "sudo yum update" instead of this)

$ sudo yum group install "Development Tools" => "Development Tools" is a yum group which contains all the pgms for compiling etc (gcc, cvs, rpm-build, etc). This installs all of those in 1 cmd.

 

Running "which gcc" shows that gcc is in path /usr/bin/gcc (binary file). Compiler "cc" used to be the default compiler in past, so usually there is soft link in /usr/bin/cc pointing to gcc, so that cc can be run as well.

We will explore gcc in more detail as we learn C and C++. Here are the basics with the help of a C/C++ pgm. C pgms need gcc to compile, while C++ pgms require the g++ compiler.

Compiling C pgm using gcc:

C pgm ex: write program hello.c as below

#include <stdio.h>   /* this file is in /usr/include/stdio.h */

int main (void) {

  printf ("Hello, world!\n");   /* printf is a function that is declared in stdio.h, so stdio.h had to be included. Only the declaration of the function is in stdio.h; the actual body of "printf" is stored in the library /usr/lib/libc.a. */

  return 0;

}

#include files:
-------------
There are 2 versions of the #include preprocessor directive. A full path, partial path or just the name of the file can be provided. If a full path is provided, then the 2 versions of #include have the same effect; otherwise they differ in how they search for the file.
1. #include <file_name> => system include. used for std header files. Here compiler searches for the file in std paths. Usually it's /usr/local/include (higher precedence) and /usr/include (lower precedence). We can provide full path of file here too, however that is not a good habit, as that file may not have same path on other systems, thereby making this pgm non portable. There is -I option that can be used for non-std path, which is discussed later.
2. #include "file_name" => user include. used for user defined header files. Here compiler first searches for the include file in the dir where your current source file resides. The current source file is the file that contains the directive #include "file_name". The compiler then searches for the include file according to the search order described above in version 1.

GCC options:

To compile pgm above, type:
gcc hello.c => This compiles the hello.c pgm into an executable called a.out in the same dir. Running ./a.out will print "Hello, world!" on the screen. The #include directive instructs the preprocessor to pull in the stdio.h file at the appropriate point, which is why we don't need to explicitly compile that file.

gcc -Wall -v hello.c -o hello => -o specifies that the output executable should be named hello instead of a.out. -Wall turns on all warnings (recommended to always use it). We can turn on specific warnings by using -Wcomment, -Wformat, etc (or even more warnings by using -W in addition to -Wall). -v shows details about the various paths and options used.

Producing a machine language executable is a 2 step process when multiple files are involved. First we create a compiled object file for each source file, and then a linker program (called ld, but invoked automatically by gcc) links all these compiled object files to produce an executable a.out. An object file contains machine code where any references to the memory addresses of functions (or variables) in other files are left undefined. This allows source files to be compiled without direct reference to each other. The linker fills in these missing addresses when it produces the executable.

steps:

1. gcc -Wall -c main.c => If we use option -c, then instead of generating an executable file, an object file called main.o is generated. Here an object file with the same name as the source file is created by default (so main.c creates an object file main.o). Similarly, we create object files for all other files. When creating object files, the compiler just notes any unresolved symbols and leaves the addr "blank" for that symbol/function.

2. gcc -Wall -c other.c => generates other.o

3. gcc main.o other.o -o hello => this step calls the linker ld, which links all object files to create an executable. Now ./hello can be run. Order is important here. Files are searched from left to right, so files which have functions that are called by other files should appear last. So, if main.c calls a function my_func defined in other.c, then main.o should be put before other.o.

Instead of running the 3 cmds separately, we can also run it in 1 cmd as follows:

gcc main.c other.c -o hello => produces executable hello

Linking with external libraries:

A library is a collection of precompiled object files which can be linked into programs. Libraries are typically stored in special archive files with the extension ‘.a’, referred to as static libraries. They are created from object files with a separate tool, the GNU archiver ar, and used by the linker to resolve references to functions at compile-time. The standard system libraries are usually found in the directories ‘/usr/local/lib’ (higher precedence) and ‘/usr/lib’ (lower precedence). On 64 bit platforms, additional lib64 dirs are also searched.

C std lib: /usr/include/stdio.h and few other *.h has all std header files (which have function declaration), while /usr/lib/libc.a is the C std lib which has all the functions defined in C std. We just include the header files in C pgm. Then the std C lib is linked by default for all C pgms.

C math lib: /usr/include/math.h has all std header files (which have function declaration for math functions as sqrt), while /usr/lib/libm.a is the C math lib which has all the math functions. This lib is not linked by default, even if we include math.h in the C pgm. Compiler option -lNAME (small letter "l" (as in love) with no space b/w l and NAME) will attempt to link object files with a library file ‘libNAME .a’ in the standard library directories. So, to link math lib, we should use "-lm" (that links libm.a from std dir which is /usr/lib/). To link more lib, we'll need -lNAME for each of them. Instead of -lm, we can also provide the full path of file as /usr/lib/libm.a on cmd line of gcc.
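For example, a hypothetical calc.c that calls sqrt() from math.h would be compiled like this (calc.c is just a made-up name):

gcc -Wall calc.c -o calc -lm               # -lm links libm from the std lib dirs
gcc -Wall calc.c -o calc /usr/lib/libm.a   # equivalent: give the full path of the library on the cmd line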

 The list of directories for header files is often referred to as the include path and the list of directories for libraries as the library search path or link path.

 When additional libraries are installed in other directories, it is necessary to extend the search paths in order for the libraries to be found. The compiler options ‘-I’ (capital I as in India) and ‘-L’ (capital L as in Love) add new directories to the beginning of the include path and library search path respectively.

ex: gcc -Wall -I/opt/gdbm/include -L/opt/gdbm/lib dbmain.c -lgdbm (here the non-std gdbm pkg is installed in /opt/gdbm: gdbm.h is in /opt/gdbm/include/gdbm.h, while libgdbm.a is in /opt/gdbm/lib/libgdbm.a)

There are environment variables also which can be set instead of -I and -L options above.

1. include path: var C_INCLUDE_PATH (for C header files), CPLUS_INCLUDE_PATH (for C++ header file)

2. Static Lib search path: var LIBRARY_PATH

These vars can be set on the cmdline, or be put in the .bashrc file, so that they take effect all the time.

ex: add these in .bashrc in home dir.

C_INCLUDE_PATH=.:/opt/gdbm-1.8.3/include:/net/include:$C_INCLUDE_PATH => adds current dir (due to . in front) and other paths to C_INCLUDE_PATH if it had any.

LIBRARY_PATH=.:/opt/gdbm-1.8.3/lib:/net/lib:$LIBRARY_PATH => adds current dir and other paths to LIBRARY_PATH if it had any.

export C_INCLUDE_PATH; export LIBRARY_PATH => export cmd is needed so that these var can be seen outside of current shell by other pgms as gcc.

So far, we have been dealing with static libraries. There is a concept of shared libraries, explained nicely in the pdf book. Dynamic linking of these shared libraries is done at run time, so the executable file (a.out) is smaller in size (as a.out doesn't contain the full object code of the functions in the .a file). Instead it keeps a small table that tells it where to get them from. The OS takes care of this by loading a single copy of the shared lib in DRAM memory, and providing a pointer to that shared lib whenever a.out requests access to it. Instead of the .a extension, shared libs have the .so extension, and reside in the same dirs where the .a files reside. By default, .so files will be linked instead of .a files if .so files are present. If the .so files are in a non-std path, then we either need to provide the full path to the .so file on the cmd line, or need to add this 3rd var also:

3. Dynamic lib search path: var LD_LIBRARY_PATH

We can force compiler to do static linking only by using option -static.
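A minimal sketch of building and using a shared library (the file names fn.c, main.c and the lib name libfn are made up for illustration):

gcc -Wall -fPIC -c fn.c            # -fPIC generates position-independent code, needed for shared libs
gcc -shared -o libfn.so fn.o       # create the shared library libfn.so
gcc -Wall main.c -L. -lfn -o main  # link against it (-L. adds the current dir to the link path)
LD_LIBRARY_PATH=. ./main           # tell the dynamic loader where to find libfn.so at run time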

C language standards:

The original C language standards are the ANSI/ISO C standards (called c89 and c99). Then GNU added extensions to the language, called the GNU standards (gnu89 and gnu99). By default, gcc compiles GNU C pgms; that means it uses the GNU C lib (glibc). However, if we want a strict ANSI/ISO C pgm, we can compile with the -ansi or -std=c99 option.

 Preprocessor:

# statements in C: #define and #ifdef ... #endif are used to compile only desired sections of C code. Instead of using #define in the C pgm (which would require changing the C pgm), we can define it on the cmd line using -DNAME (i.e. for #ifdef TEST ... #endif, we can do -DTEST, which is equivalent to #define TEST). To give a value to a macro, we can do -DNUM=23 (equivalent to #define NUM 23), or -DMSG='"My Hero"' (the extra quotes are needed so the shell passes a string literal), etc.
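A small sketch showing -D on the cmd line (test.c is a made-up file name):

/* test.c */
#include <stdio.h>
int main(void)
{
#ifdef TEST
   printf("TEST build, NUM=%d\n", NUM);
#else
   printf("normal build\n");
#endif
   return 0;
}

gcc -Wall -DTEST -DNUM=23 test.c -o test   # same effect as "#define TEST" and "#define NUM 23" in the source
gcc -Wall test.c -o test                   # without -DTEST, the #else branch is compiled instead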

 Optimization level:

Different optimization levels are supported by gcc. -O0 is level 0 opt, and is the default. -O1, -O2 and -O3 refer to higher levels of code opt.

 Platform specific options:

GCC produces executable code which is compatible with all the processors in the x86 family by default if it's running on an x86 system, going all the way back to the 386. However, it is also possible to compile for a specific processor to obtain better performance.

gcc -march=pentium4 => produces code that is tuned for the Pentium 4, so it may not work on all x86 processors. Better not to use this option, as it provides only a little speed improvement. Similarly there are options for PowerPC, SPARC and DEC Alpha processors.

gcc -m32 generates 32 bit code on 64bit AMDx86-64 systems. Not using -m32 will produce 64 bit code by default.

other options:

gcc --help

gcc --version

gcc -v test.c => verbose compilation, shows exact seq of cmds used to compile and link. Shows full dir paths used to search header files and libs.

Compiling C++ pgm using g++:

C++ pgm ex: write program hello.cc as below

#include <iostream>

int main () {

   std::cout << "Hello, world!" << std::endl; //similar to printf func of C

   return 0;

}

compile: g++ -Wall hello.cc -o hello => here we used g++ for compiling the C++ pgm. We could have used gcc too, as it would compile all files ending in .cc, .C, .cxx or .cpp as C++ pgms. The only problem that may happen when using gcc to compile C++ files is that the appropriate C++ lib may not get linked (*.o files produced by g++ can't be linked using gcc). It's always preferable to use g++ for compiling C++ pgms, and gcc for C pgms. g++ has exactly the same options as gcc.

C++ std lib: The C++ standard library ‘libstdc++’ supplied with GCC provides a wide range of generic container classes such as lists and queues, in addition to generic algorithms such as sorting.

Compiler related tools:

1. GNU archiver: called "ar", it combines a collection of object files into a single archive file, also known as a library.

ar cmd: ar cr libfn.a hello.o bye.o => creates an archive from 2 simple object files. cr = create and replace

ar t libfn.a => lists all object files in archive. Here it lists hello.o and bye.o

gcc -L. main.c -lfn -o main => This lib archive libfn.a can be used like any other static lib. -L. just adds . to lib search path (assuming we generated libfn.a in current dir)

2. gprof: gnu profiler for measuring the performance of a pgm. A usage sketch for gprof and gcov follows item 3 below.

3. gcov: gnu coverage tool analyzes coverage of pgm = how many times each line of pgm is run during execution
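Typical usage of these two tools, as a sketch (prog.c is a made-up file name):

gcc -Wall -pg prog.c -o prog                              # -pg adds profiling instrumentation for gprof
./prog                                                    # running the pgm writes gmon.out in the current dir
gprof prog gmon.out                                       # analyze the profile data

gcc -Wall -fprofile-arcs -ftest-coverage prog.c -o prog   # instrument for gcov
./prog                                                    # running the pgm writes the coverage data (.gcda file)
gcov prog.c                                               # produces prog.c.gcov with per-line execution counts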

Compiler steps: Running gcc/g++ involves these 4 steps. These are all run behind the scenes when running gcc/g++, but can be run separately too.

1. preprocessing of macros: the preprocessor expands all macros and header files.

ex: gcc -E hello.c > hello.i => hello.i contains the source code with all macros expanded (-E stops gcc after preprocessing and writes the result to stdout)

2. assembly code generation: assembly code is then generated. It still has calls to external functions.

ex: gcc -S hello.i => hello.s is generated which has assembly code

3. assembler: converts assembly language into machine code and generate an object file. Addr of External functions still left undefined to be filled in by linker

ex: as hello.s -o hello.o

4. Linking: Any external functions from the system or C run time lib (crt) are linked here.

ex: ld -dynamic-linker /usr/.../.so /../crt1.o hello.o ... => All these object files are linked together (with the proper addr of the functions called)

ex: gcc hello.o -o hello => this gcc cmd invokes the linker automatically when generating an executable from object files

Examining Compiled Files:

ex: file a.out => shows details of file a.out, whether it's ELF format, 32/64 bit, which processor it was compiled for (INTEL 80386, etc), dynamic/static link, and whether it contains a symbol table.

nm a.out => this shows the location of all vars and funcs used in the executable. T against a func name indicates the func is defined in the object file, while U indicates undefined (may be because it's going to be dynamically linked at run time, or we need to link the file having that func with this executable)

ldd a.out => This shows list of all shared lib, that are to be linked at runtime. It shows all dynamic lib (usually libc.so, libm.so), as well as dynamic loader lib (ld-linux.so)

 --------------------