gcc - GNU compiler collection
- Last Updated: Friday, 16 October 2020 00:37
- Published: Sunday, 10 February 2019 14:00
GCC: GNU Compiler Collection
Before learning C or C++, we need to learn how to compile a C/C++ program. The program that compiles C/C++ into machine code is called GCC (GNU Compiler Collection). A very good pdf (by Brian Gough) is here: https://tfetimes.com/wp-content/uploads/2015/09/An_Introduction_to_GCC-Brian_Gough.pdf
Installing GCC:
Check if gcc is installed by running "gcc -v" on your linux terminal.
gcc -v => shows "gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) " along with some other info.
GCC is installed by default on CentOS. However, on Linux Mint, you may get errors about various std libs not being found when trying to run GCC (even though gcc is installed). This is because not all the libs needed by gcc are installed. If you get errors running gcc even though it is already installed on your system, follow these steps ($ below represents the terminal prompt).
Debian based OS: (Linux Mint, Ubuntu etc): Run following 2 cmds:
$ sudo apt update => updates pkg data repository. Needed before you install anything
$ sudo apt install build-essential => build-essential is a meta-package that pulls in all the packages needed to compile a Debian package. It generally includes the gcc/g++ compilers, libraries and some other utilities (such as make).
Fedora based OS (RHEL, CentOS, etc): Run following 2 cmds:
$ sudo yum makecache => makes sure that the yum cache is up to date with the latest metadata. (Not sure if we can use "sudo yum update" instead of this)
$ sudo yum group install "Development Tools" => "Development Tools" is a yum group which contains all pgms for compiling etc (gcc, cvs, rpm-build, etc). This installs all of those in 1 cmd.
Running "which gcc" shows that gcc is in path /usr/bin/gcc (binary file). The compiler "cc" used to be the default compiler in the past, so usually there is a soft link /usr/bin/cc pointing to gcc, so that cc can be run as well.
We will explore gcc in more detail as we learn C and C++. Here are the basics with the help of a C/C++ pgm. C pgms are compiled with gcc, while C++ pgms are compiled with g++.
Compiling C pgm using gcc:
C pgm ex: write program hello.c as below
#include <stdio.h> => this file is in /usr/include/stdio.h
int main (void) {
printf ("Hello, world!\n"); => printf is a function that is declared in stdio.h, so stdio.h had to be included. Only the declaration of function is done in stdio.h, actual body of function "printf" is itself stored in library /usr/lib/libc.a.
return 0;
}
#include files:
-------------
2 versions of #include preprocessor directive. Full path, partial path or just name of file can be provided. If full path is provided, then 2 versions of #include have same effect, else they differ in how they search for the file.
1. #include <file_name> => system include. used for std header files. Here compiler searches for the file in std paths. Usually it's /usr/local/include (higher precedence) and /usr/include (lower precedence). We can provide full path of file here too, however that is not a good habit, as that file may not have same path on other systems, thereby making this pgm non portable. There is -I option that can be used for non-std path, which is discussed later.
2. #include "file_name" => user include. used for user defined header files. Here compiler first searches for the include file in the dir where your current source file resides. The current source file is the file that contains the directive #include "file_name". The compiler then searches for the include file according to the search order described above in version 1.
GCC options:
To compile pgm above, type:
gcc hello.c => This compiles hello.c into an executable called a.out in the same dir. Running ./a.out will print "Hello, world!" on screen. The #include directive instructs the preprocessor to insert the contents of stdio.h at that point in the source. That is why we don't need to explicitly compile this file.
gcc -Wall -v hello.c -o hello => -o specifies that the output executable should be named hello instead of a.out. -Wall turns on the most commonly used warnings (recommended to always use it). We can turn on specific warnings by using -Wcomment, -Wformat, etc (or even more warnings by using -W, also spelled -Wextra, in addition to -Wall). -v shows details about the various paths and options used.
Producing a machine language executable is a 2 step process when multiple files are involved. First we create a compiled object file for each source file, and then a linker program (called ld, but it's invoked automatically by gcc) links all these compiled object files to produce an executable a.out. An object file contains machine code where any references to the memory addresses of functions (or variables) in other files are left undefined. This allows source files to be compiled without direct reference to each other. The linker fills in these missing addresses when it produces the executable.
steps:
1. gcc -Wall -c main.c => If we use option -c, then instead of generating an executable file, an object file called main.o is generated. Here an object file with the same name as the source file is created by default (so main.c creates an object file main.o). Similarly, we create object files for all other files. When creating object files, the compiler just notes any unresolved symbols and leaves the addr "blank" for that symbol/function.
2. gcc -Wall -c other.c => generates other.o
3. gcc main.o other.o -o hello => this step calls the linker ld, which links all object files to create an executable. Now, ./hello can be run. Order can be important here. Files are searched from left to right, so files which define functions that are called by other files should appear last. So, if main.c calls a function my_func defined in other.c, then main.o should be put before other.o. (Most current linkers resolve symbols across all object files regardless of order, but the order still matters for libraries.)
Instead of running the 3 cmds separately, we can also do it in 1 cmd as follows:
gcc -Wall main.c other.c -o hello => produces executable hello
Linking with external libraries:
A library is a collection of precompiled object files which can be linked into programs. Libraries are typically stored in special archive files with the extension ‘.a’, referred to as static libraries. They are created from object files with a separate tool, the GNU archiver ar, and used by the linker to resolve references to functions when the executable is built. The standard system libraries are usually found in the directories ‘/usr/local/lib’ (higher precedence) and ‘/usr/lib’ (lower precedence). On 64 bit platforms, additional lib64 dirs are also searched.
C std lib: /usr/include/stdio.h and few other *.h has all std header files (which have function declaration), while /usr/lib/libc.a is the C std lib which has all the functions defined in C std. We just include the header files in C pgm. Then the std C lib is linked by default for all C pgms.
C math lib: /usr/include/math.h is the std header file which declares the math functions (such as sqrt), while /usr/lib/libm.a is the C math lib which has the definitions of all the math functions. This lib is not linked by default, even if we include math.h in the C pgm. Compiler option -lNAME (small letter "l" (as in love) with no space b/w l and NAME) will attempt to link object files with a library file ‘libNAME.a’ in the standard library directories. So, to link the math lib, we should use "-lm" (that links libm.a from the std dir, which is /usr/lib/). To link more libs, we'll need a -lNAME for each of them. Instead of -lm, we can also provide the full path of the file as /usr/lib/libm.a on the cmd line of gcc.
The list of directories for header files is often referred to as the include path and the list of directories for libraries as the library search path or link path.
When additional libraries are installed in other directories it is necessary to extend the search paths, in order for the libraries to be found. The compiler options ‘-I’ (capital I as in India) and ‘-L’ (capital L as in Love) add new directories to the beginning of the include path and library search path respectively.
ex: gcc -Wall -I/opt/gdbm/include -L/opt/gdbm/lib dbmain.c -lgdbm (here the non std gdbm pkg is installed in /opt/gdbm. gdbm.h is in /opt/gdbm/include/gdbm.h, while libgdbm.a is in /opt/gdbm/lib/libgdbm.a)
There are environment variables also which can be set instead of -I and -L options above.
1. include path: var C_INCLUDE_PATH (for C header files), CPLUS_INCLUDE_PATH (for C++ header file)
2. Static Lib search path: var LIBRARY_PATH
These vars can be set on the cmdline, or be put in the .bashrc file, so that they take effect all the time.
ex: add these in .bashrc in home dir.
C_INCLUDE_PATH=.:/opt/gdbm-1.8.3/include:/net/include:$C_INCLUDE_PATH => adds current dir (due to . in front) and other paths to C_INCLUDE_PATH if it had any.
LIBRARY_PATH=.:/opt/gdbm-1.8.3/lib:/net/lib:$LIBRARY_PATH => adds current dir and other paths to LIBRARY_PATH if it had any.
export C_INCLUDE_PATH; export LIBRARY_PATH => export cmd is needed so that these var can be seen outside of current shell by other pgms as gcc.
So far, we have been dealing with static libraries. There is also the concept of shared libraries, explained nicely in the pdf book. Dynamic linking of these shared libraries is done at run time, so the executable file (a.out) is smaller in size (as a.out doesn't contain a full copy of the function's object code from the .a file). Instead it keeps a small table that tells it where to get it from. The OS takes care of this by loading a single copy of the shared lib in memory, and providing a pointer to that shared lib whenever a.out requests access to it. Instead of the .a extension, they have the .so extension, and reside in the same dir where .a files reside. By default, .so files will be linked instead of .a files if .so files are present. If .so files are in a non std path, the loader must also be able to find them when the pgm is run; one way to do this is to set this 3rd var:
3. Dynamic lib search path: var LD_LIBRARY_PATH
We can force compiler to do static linking only by using option -static.
C language standards:
The original C language stds are the ANSI/ISO C stds (called c89 and c99). GNU then added extensions to the language, giving the GNU dialects (called gnu89 and gnu99). By default, gcc compiles pgms using a GNU dialect of C (gnu89 in older releases such as the 4.8.5 shown above, a later GNU dialect in newer releases), so GNU extensions are accepted. However, if we want strict ANSI/ISO C, we can compile with the -ansi or -std=c99 option.
Preprocessor:
# statements in C. #define and #ifdef ... #endif are used to compile only desired sections of C code. Instead of using #define in the C pgm (which requires changing the pgm), we can define the macro on the cmd line using -DNAME (i.e. for #ifdef TEST ... #endif, we can do -DTEST, which has the same effect as putting #define TEST in the source). To give a macro a value, we can do -DNUM=23 (equiv to #define NUM 23), or -DMSG='"My Hero"' (the outer single quotes stop the shell from stripping the double quotes), etc.
Optimization level:
Different opt levels are supported by gcc. -O0 (no optimization) is the default. -O1, -O2 and -O3 refer to progressively higher levels of code opt.
Platform specific options:
GCC produces executable code which is compatible with all the processors in the x86 family by default if it's running on an x86 system, going all the way back to the 386. However, it is also possible to compile for a specific processor to obtain better performance.
gcc -march=pentium4 => produces code that is tuned for pentium4, so it may not work on all x86 processors. Better to not use this option, as it provides only a little speed improvement. Similarly there are options for powerpc, sparc and dec alpha processors.
gcc -m32 generates 32 bit code on 64bit AMDx86-64 systems. Not using -m32 will produce 64 bit code by default.
other options:
gcc --help
gcc --version
gcc -v test.c => verbose compilation, shows exact seq of cmds used to compile and link. Shows full dir paths used to search header files and libs.
Compiling C++ pgm using g++:
C++ pgm ex: write program hello.cc as below
#include <iostream>
int main () {
std::cout << "Hello, world!" << std::endl; //similar to printf func of C
return 0;
}
compile: g++ -Wall hello.cc -o hello => here we used g++ for compiling the C++ pgm. We could have used gcc too, as it compiles all files ending in .cc, .C, .cxx or .cpp as C++. The only problem when using gcc for C++ files is that the C++ std lib is not linked automatically (so *.o files produced from C++ sources usually fail to link with plain gcc unless -lstdc++ is added). It's always preferable to use g++ for compiling C++ pgms, and gcc for C pgms. g++ accepts the same options as gcc.
C++ std lib: The C++ standard library ‘libstdc++’ supplied with GCC provides a wide range of generic container classes such as lists and queues, in addition to generic algorithms such as sorting.
Compiler related tools:
1. GNU archiver : called as "ar", it combines a collection of object files into a single archive file, also known as a library.
ar cmd: ar cr libfn.a hello.o bye.o => creates an archive from 2 simple object files. cr=create and replace
ar t libfn.a => lists all object files in archive. Here it lists hello.o and bye.o
gcc -L. main.c -lfn -o main => This lib archive libfn.a can be used like any other static lib. -L. just adds . to lib search path (assuming we generated libfn.a in current dir)
2. gprof: gnu profiler for measuring performance of pgm.
3. gcov: gnu coverage tool analyzes coverage of pgm = how many times each line of pgm is run during execution
Compiler steps: Running gcc/g++ involves these 4 steps. These are all run behind the scenes when running gcc/g++, but can be run separately too.
1. preprocessing of macros: preprocessor expands all macros and header files.
ex: gcc -E hello.c > hello.i => -E stops after preprocessing. hello.i contains source code with all macros expanded
2. assembly code generation: assembly code is then generated. It still has calls to external functions.
ex: gcc -S hello.i => hello.s is generated which has assembly code
3. assembler: converts assembly language into machine code and generate an object file. Addr of External functions still left undefined to be filled in by linker
ex: as hello.s -o hello.o
4. Linking: Any external functions from system or C run time lib (crt) are linked here.
ex: ld -dynamic-linker /usr/.../.so /../crt1.o hello.o ... => All these object files linked together (with proper addr of func called)
ex: gcc hello.o -o hello => this gcc cmd invokes linker automatically when generating an executable from object files
Examining Compiled Files:
ex: file a.out => shows details of file a.out, whether it's ELF format, 32/64 bit, which processor it was compiled for (INTEL 80386, etc), dynamic/static link, and whether it contains a symbol table.
nm a.out => this shows the location of all vars and funcs used in the executable. T against a func name indicates the func is defined in the file, while U indicates undefined (maybe because it's going to be dynamically linked at run time, or because we still need to link the file having that func with this executable)
ldd a.out => This shows list of all shared lib, that are to be linked at runtime. It shows all dynamic lib (usually libc.so, libm.so), as well as dynamic loader lib (ld-linux.so)
--------------------