Linux executables and binaries
- Last Updated: Monday, 21 January 2019 19:05
- Published: Tuesday, 16 October 2018 17:02
Unix executables = binaries and scripts:
There are two kinds of executable Unix programs: binaries and scripts. An executable file is recognized by the "x" permission set on it; the execute permission tells the kernel that the file may be run. Whenever you type the name of an executable file on the command line (e.g. emacs or any other application), the shell asks the kernel to run it through the exec system call. The details of the whole process are explained at this link: https://stackoverflow.com/questions/8352535/how-does-kernel-get-an-executable-binary-file-running-under-linux. In short, the kernel checks the first few bytes of the file (called the magic number) to decide whether it is a binary file or a script. Most executable files are binaries, so the kernel loads them into memory and the processor runs them directly, e.g. vi, soffice, etc. Shell programs such as bash and csh are also binary executables and are run the same way. Many of these binary executables take optional arguments that name the files they work on. For example: vi test.c. Here vi is the binary executable and test.c is its argument, so vi works on test.c, which is a plain text file. test.c doesn't need to be executable, because vi only reads and processes it. The processor never runs test.c, as it contains no machine code. The same holds for a shell script given as an argument. For example: csh test.csh. Here csh is the binary executable that works on test.csh, so test.csh only needs to be readable. The execute permission on a script matters only when you run the script directly by its own name (e.g. ./test.csh); in that case the kernel refuses to run it unless "x" is set. Linux never requires other kinds of text files to be executable.
When you ask for a file to be run and it is executable, the kernel runs it in the steps shown in the link above. To find out whether it is a binary, a script or some other file, Unix uses the magic number concept. Any file can have a magic number as its first few bytes. For an executable, it tells the kernel how to run the file when no program to run it with was named. The magic number is a binary bit pattern, but it may happen to correspond to printable ASCII characters. This allows magic numbers to be used in text files. For example, the magic number for PostScript files is 0x25 0x21, which is %!, and the magic number for executable script files is #!. Binary programs run on the hardware directly, while scripts need a program or interpreter to run them. When we generate a binary executable a.out from a C program, the first few bytes of the binary are the magic number, the next few are other header information, and after that come the real machine language instructions (such as MOV, ADD, etc. for an x86 processor). Because this file format and the system interfaces the code relies on differ between operating systems, a binary built for Linux will not run on Windows, even though the underlying processor and its instruction set are the same. The format of binary executables on Linux and most modern Unix systems is called ELF.
ELF executables (ELF stands for Executable and Linkable Format) start with the byte 0x7F followed by the ASCII letters "ELF". (That is why, when we run "hexdump -C a.out" on Linux, we see the first 4 bytes as "7f 45 4c 46".) Scripts start with the hex codes 0x23 0x21, which in ASCII is #!. These two characters begin the shebang line: "#!" is followed by the path to an interpreter, so that Linux knows that the file is e.g. a Perl program or a shell script, and, if it is a shell script, which shell should be used to interpret it. The magic number concept is used in Unix to identify more than just executable programs. For example, the two-byte magic number 0x1f 0x8b identifies a particular kind of compressed file (GNU gzip files).
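The magic number check can be illustrated with a small Python sketch. This is not how the kernel implements it; it is only a minimal lookup over the magic numbers mentioned in the text, and the names MAGIC_NUMBERS, classify_header and classify_file are made up for this example.

```python
# Hypothetical table: only the magic numbers mentioned in the text above.
MAGIC_NUMBERS = [
    (b"\x7fELF", "ELF binary"),             # 0x7f 0x45 0x4c 0x46
    (b"#!", "script with shebang line"),    # 0x23 0x21
    (b"%!", "PostScript file"),             # 0x25 0x21
    (b"\x1f\x8b", "gzip compressed file"),  # 0x1f 0x8b
]


def classify_header(header: bytes) -> str:
    """Return a description for the first bytes of a file."""
    for magic, description in MAGIC_NUMBERS:
        if header.startswith(magic):
            return description
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first 4 bytes of the file at 'path' and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(4))
```

For example, classify_file("/bin/ls") would normally report an ELF binary on a Linux machine, assuming that path exists there.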
When asked to execute a file, the kernel first checks that the current user has execute permission ("x") on it. If so, it looks at the first few bytes. If they match a magic number the kernel knows, it hands the file to the matching handler in the exec process: a binary is loaded and executed directly, while a script is executed by starting the interpreter named after #! on its first line, with the script's path passed to it as an argument. If the first few bytes do not match any known magic number, the kernel's exec call fails (with the error ENOEXEC), and the shell that tried to run the file then typically falls back to interpreting it as a shell script itself. Note that a script needs read permission too, since the interpreter has to read the file in order to run it.
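How the interpreter name is taken from the first line of a script can be sketched in Python as follows. The function name parse_shebang is made up for this example, and the rule that everything after the interpreter path is passed on as one single argument is how Linux is understood to behave; other Unix systems may split the line differently.

```python
def parse_shebang(first_line: bytes):
    """Split a script's first line into (interpreter, optional argument).

    Returns None when the line does not start with the #! magic number
    or when no interpreter path follows it.
    """
    if not first_line.startswith(b"#!"):
        return None
    # Drop the "#!" and anything past the end of the first line.
    rest = first_line[2:].split(b"\n", 1)[0].strip()
    if not rest:
        return None
    # Assumption: the remainder after the path is kept as ONE argument.
    parts = rest.split(None, 1)
    interpreter = parts[0].decode()
    argument = parts[1].decode() if len(parts) > 1 else None
    return interpreter, argument
```

So for a script whose first line is "#!/bin/csh -f", the sketch gives the interpreter /bin/csh and the argument -f, and running the script directly is then roughly equivalent to running "/bin/csh -f" followed by the script's path.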
If you try to run a script directly when it has no execute permission, the kernel returns a "Permission denied" error and the script is not run. An extension in the file name has no meaning to the Linux kernel; it's for user readability only. So the ".tcl" in tests.tcl doesn't mean anything to the kernel, which just sees it as part of a long file name. It is the #! line in tests.tcl that tells the kernel to run the file with the tcl interpreter. If you open a "test.xls" file without naming the program (i.e. not as "soffice test.xls"), the kernel is not involved in choosing one: test.xls is not an executable file, so it is the desktop environment that picks a program, usually based on the file type it detects from the file's contents (its magic number) and its name. Such a data file can still have a magic number even though it is never executed. For scripts, the magic number #! is chosen so that its first byte is the comment character of the interpreter (for csh scripts, # starts a comment). The csh interpreter therefore ignores the #! line, and the behaviour of test.csh remains the same whether it is invoked with an interpreter name or without one.
So, in summary, there are 3 ways to run executables:
1. a.out => kernel sees it as binary executable, and knows how to run it. No extra program needed to run it.
2. csh test => csh is seen as binary executable, and "test" is the name of csh file provided as an argument. Magic number in test, even if provided, is not used for anything (csh treats that line as a comment). "test" only needs to be readable here; x does not need to be set on it.
3. test => kernel checks magic number in file test. If it's executable and magic number is found, then the path of interpreter provided in the file is used to run it. If it's executable and no magic number is found, then the shell runs it itself as a shell script. If the file is not executable, it cannot be run this way; when such a file is opened from a desktop, the linux desktop manager/environment decides which program to use to open it.
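The difference between ways 2 and 3 can be tried out with a short Python sketch. It is only an illustration under some assumptions: it uses sh instead of csh (since csh may not be installed), it assumes a Unix-like system, and the function name run_both_ways is made up for this example.

```python
import os
import subprocess
import tempfile


def run_both_ways(script_text: str):
    """Run a script that is readable but NOT executable in two ways."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test")
        with open(path, "w") as f:
            f.write(script_text)
        os.chmod(path, 0o644)  # read/write only, no "x" bit set

        # Way 2: name the interpreter; the script is just an argument.
        result = subprocess.run(["sh", path], capture_output=True, text=True)
        via_interpreter = result.stdout.strip()

        # Way 3: execute the file itself; the kernel checks "x" first.
        try:
            subprocess.run([path], capture_output=True, text=True)
            direct = "ran"
        except PermissionError:
            direct = "permission denied"
    return via_interpreter, direct
```

Calling run_both_ways("#!/bin/sh\necho hello\n") on a typical Linux system should show that the script prints hello when sh is named explicitly, while the direct attempt is refused because the execute permission is missing.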
-------------------