linux startup

startup process:

After powerup, m/c goes thru different steps before the login prompt appears. A lot of useful info here: https://www.thegeekstuff.com/2011/02/linux-boot-process

and here: https://www.tldp.org/LDP/sag/html/boot-process.html

and here: https://www.linuxnix.com/linux-booting-process-explained/

and here: https://wiki.archlinux.org/index.php/Arch_boot_process

and here: http://tldp.org/LDP/khg/HyperNews/get/tour/tour.html

In short these are the steps:


1. Powerup: As soon as the power button is pressed, the system powers up and the CPU is released from reset. An 80x86 CPU (in 8086 real mode) puts addr 0xFFFF0, the reset vector 16 bytes below the top of the 1 MB real-mode address space, on its addr lines; this is the very first addr that the CPU reads from. (Modern 32/64-bit x86 CPUs start at 0xFFFFFFF0, 16 bytes below 4 GB, but the idea is the same.) This addr (historically) is mapped to the EEPROM, a separate chip on the motherboard that holds the firmware. EEPROM stands for Electrically Erasable Programmable Read-Only Memory; we use EEPROM (nowadays flash memory) instead of plain ROM so that manufacturers can push out updates easily. The location 0xFFFF0 holds a jump instruction to the real start of the firmware code elsewhere in the EEPROM (a form of indirection), since 16 bytes are far too few to hold the firmware itself. That code is known as BIOS/UEFI or boot firmware code (UEFI is often referred to as BIOS or UEFI BIOS, even though UEFI is different from BIOS). A few more bytes remain after that jump, up to addr 0xFFFFF. We may store a soft power-up (i.e. reset or warm boot) addr at such top locations, so that the code can jump to different addrs in ROM depending on the power-up scenario.
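The 0xFFFF0 addr falls out of 8086 real-mode addressing, where physical addr = segment × 16 + offset, cut to 20 addr lines. A minimal Python sketch, assuming the original 8086 reset state CS=0xFFFF, IP=0x0000 (the helper name real_mode_address is just for illustration):

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address on an 8086: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF  # only 20 address lines (1 MB)

# 8086 reset state: CS = 0xFFFF, IP = 0x0000
reset_vector = real_mode_address(0xFFFF, 0x0000)
print(hex(reset_vector))  # 0xffff0

# Bytes left between the reset vector and the top of the 1 MB space
print(0xFFFFF - reset_vector + 1)  # 16

# The boot sector load addr 0x7C00 is often reached as 0000:7C00
print(hex(real_mode_address(0x0000, 0x7C00)))  # 0x7c00
```

The same formula explains why only 16 bytes are available at the reset vector, which is why a jump instruction is placed there.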

A. BIOS/UEFI: This is the code stored in EEPROM. On older systems, BIOS used to be stored here. On newer computers, UEFI is stored instead of BIOS. UEFI is simply different firmware code from BIOS.

BIOS/UEFI booting consists of 2 parts. The 1st part is the firmware stored in EEPROM; for both BIOS and UEFI it initializes the system in a similar way. The 2nd part is stored on the hard disk or some other disk, and is called the 1st stage boot loader.

For BIOS systems, BIOS code first does POST (power-on self test). It then initializes the remaining hardware, detects the connected peripherals (mouse, keyboard, pendrive, etc.) and checks that all connected devices are healthy. You might remember the single 'beep' that desktops used to make after a successful POST. Finally, the firmware cycles through the storage devices (floppy disk, hard drive, USB stick, etc.) in the configured boot order and looks for a boot loader. It chooses a disk drive, reads its very first sector (512 bytes) into memory at addr 0x7C00, and jumps to it, provided the sector ends with the boot signature bytes 0x55 0xAA. On floppy disks this sector is called the boot sector. On a hard disk it's called the master boot record (MBR), since a HD can have several partitions, each with its own boot sector. The MBR is the 1st 512 bytes on the HD. The code in the MBR then transfers control to other code, which does the main booting.
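The BIOS boot-device scan described above can be sketched as below. This is a simplified model, not real firmware: the device names and the read_first_sector callback are hypothetical; only the 512-byte sector, the 0x55 0xAA signature check and the 0x7C00 load addr come from the BIOS boot convention.

```python
from typing import Callable, Optional

SECTOR_SIZE = 512
LOAD_ADDR = 0x7C00  # where the BIOS copies the boot sector in RAM

def is_bootable(sector: bytes) -> bool:
    """A boot sector is 512 bytes long and ends with the signature 0x55 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"

def pick_boot_device(boot_order: list[str],
                     read_first_sector: Callable[[str], Optional[bytes]]
                     ) -> Optional[tuple[str, bytes]]:
    """Return (device, sector) for the first device in boot order with a valid boot sector."""
    for device in boot_order:
        sector = read_first_sector(device)  # None if no media or read error (hypothetical callback)
        if sector is not None and is_bootable(sector):
            # real firmware would now copy the sector to LOAD_ADDR and jump there
            return device, sector
    return None  # real firmware would report that no bootable device was found
```

For example, with a boot order of floppy, usb, hdd0 where only hdd0 has a sector ending in 0x55 0xAA, the function returns hdd0 and its sector.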

For UEFI systems, UEFI code first initializes hardware to the point where it can detect key presses, display something on screen, and access the EFI partitions of detected drives. The EFI partition is a separate partition (formatted with a FAT file system) used for UEFI booting. So, the 1st part of UEFI code stored in EEPROM also contains a file system driver to read that EFI partition, since only then can it load EFI programs from there. The boot loader on the EFI partition plays the role of the 1st stage boot loader.

These boot loaders (either on MBR or EFI partition) are called 1st stage boot loader.

Difference b/w BIOS and UEFI: This link explains it nicely: https://www.freecodecamp.org/news/uefi-vs-bios/

BIOS: BIOS stands for Basic Input Output System. In short, BIOS is stored in EEPROM and all of the BIOS-related code is in this memory. It runs in 16-bit (real) mode, which is why it doesn't have any graphics support, and we have to use the arrow and Enter keys to navigate through the options. Its setup screen is what shows up when you press F1, F2 or some other key (depending on your laptop manufacturer) on computer startup.

UEFI: UEFI stands for Unified Extensible Firmware Interface. It does the same job as a BIOS, but with one basic difference: it keeps the startup code that loads the OS in .efi files on disk, instead of inside the firmware. These .efi files live on a special partition on the hard disk called the EFI System Partition (ESP), and the boot loader itself is one of them. UEFI was designed to overcome many limitations of the old BIOS, such as the max drive size (2 TiB with MBR partitioning), slow boot time, etc. UEFI is fast and runs in 32/64-bit mode (that's why it can offer a GUI). All modern systems support UEFI, so if you bought a laptop in the last 5 years, you almost certainly have UEFI on it.
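How UEFI firmware picks an .efi file can be modelled roughly as follows: it walks the boot entries listed in the BootOrder NVRAM variable (each Boot#### entry points to an .efi file on the ESP), and if none works it falls back to a default path, which on x86-64 is \EFI\BOOT\BOOTX64.EFI. This is a very simplified sketch; the dicts/sets standing in for NVRAM and the ESP are hypothetical, and real firmware also handles device paths, load options and Secure Boot checks.

```python
from typing import Optional

FALLBACK_LOADER = r"\EFI\BOOT\BOOTX64.EFI"  # default path, assuming an x86-64 machine

def pick_efi_loader(boot_order: list[int], boot_entries: dict[int, str],
                    esp_files: set[str]) -> Optional[str]:
    """Return the first .efi path (in BootOrder) that exists on the ESP, else the fallback."""
    for number in boot_order:
        path = boot_entries.get(number)  # e.g. Boot0001 -> \EFI\ubuntu\grubx64.efi
        if path is not None and path in esp_files:
            return path
    # no usable boot entry: try the default fallback path
    return FALLBACK_LOADER if FALLBACK_LOADER in esp_files else None
```

This is also why a Linux installer on a UEFI machine usually just adds its own boot entry and .efi file instead of overwriting a shared boot sector.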

Secure boot: One of the most important features added by UEFI is a security feature called "Secure Boot", which prevents the computer from booting unauthorized/unsigned boot loaders and OS kernels. This helps in preventing rootkits, but also hampers dual-booting, as it treats any OS without a trusted signature as an unsigned application. Windows is signed, and some Linux distros (e.g. Ubuntu, Fedora) ship a signed boot loader, but many others don't. This is a big pain when installing such a Linux OS. One way to get around it is to disable the "Secure Boot" feature (see Linux Installation section).

B. MBR/GPT: After the code in EEPROM runs, control is transferred to the MBR (for BIOS startup) or to the EFI partition (for UEFI). These usually reside on the hard disk (see Filesystems section for details on hard disks, partitions, etc.)

MBR (Master Boot Record): It is the first 512 bytes on the hard drive (the first 446 bytes are code, the next 64 bytes are the partition table with 4 entries of 16 bytes each, and the last 2 bytes are the boot signature 0x55 0xAA; read more about MBR in File systems section). BIOS first loads and executes the code in the MBR, which identifies the bootable (active) partition on that hard disk, reads the boot sector of that partition, and starts the code in that boot sector. This code (in the boot sector of that partition) reads the kernel from the partition and starts it. Ideally, the kernel image would be stored in consecutive sectors, so the code could just read it sequentially. However, this would require a separate partition for the kernel image, which is not practical. Instead, the kernel image is stored in a file system (FS), where it may be scattered over different physical sectors, so the boot code needs to figure out which sectors the FS has used for the kernel image. The most common way to achieve this is using a boot loader such as GRUB or LILO.
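The MBR layout above (446 + 4 × 16 + 2 bytes) can be decoded with Python's struct module. A minimal sketch; the dict keys are just illustrative names. Each 16-byte entry holds a status byte (0x80 = active), a CHS start addr, a partition type, a CHS end addr, the first LBA sector and the sector count, the last two as little-endian 32-bit numbers. Reading a real disk such as /dev/sda needs root; a disk image file works too.

```python
import struct

def parse_mbr(mbr: bytes):
    """Split a 512-byte MBR into boot code, used partition entries and a signature flag."""
    if len(mbr) != 512:
        raise ValueError("MBR must be exactly 512 bytes")
    boot_code = mbr[:446]                     # bytes 0-445: boot loader code
    table = mbr[446:510]                      # bytes 446-509: 4 x 16-byte entries
    signature_ok = mbr[510:512] == b"\x55\xaa"
    partitions = []
    for i in range(4):
        entry = table[i * 16:(i + 1) * 16]
        # status, CHS start, type, CHS end, first LBA, number of sectors
        status, _chs_start, ptype, _chs_end, first_lba, num_sectors = struct.unpack(
            "<B3sB3sII", entry)
        if ptype == 0:  # type 0 marks an unused slot
            continue
        partitions.append({"index": i + 1, "active": status == 0x80, "type": ptype,
                           "first_lba": first_lba, "sectors": num_sectors})
    return boot_code, partitions, signature_ok
```

Since the 32-bit sector numbers are part of each entry, this layout is also where the MBR disk size limit comes from.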

GPT (GUID Partition Table): The MBR scheme is limited: its 512 bytes only have room for 4 primary partitions, and its 32-bit sector numbers limit disks to 2 TiB (with 512-byte sectors). So, a newer scheme called GPT was introduced. It is a standard for the layout of the partition table on a physical hard disk, using globally unique identifiers (GUIDs). Usually, MBR and BIOS (MBR + BIOS), and GPT and UEFI (GPT + UEFI), go hand in hand. This pairing is compulsory for some systems (e.g. Windows), while optional for others (e.g. Linux): UEFI supports the traditional MBR too, and BIOS may support the modern GPT too.
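One way to tell a GPT disk from a plain MBR disk: a GPT disk keeps a "protective MBR" in sector 0 whose first partition entry has type 0xEE, and the GPT header in sector 1 starts with the 8-byte signature "EFI PART". A minimal sketch, assuming 512-byte sectors (on 4K-sector disks the header starts at byte 4096 instead); it also shows the arithmetic behind the 2 TiB MBR limit.

```python
SECTOR = 512  # assumed logical sector size

def partition_scheme(disk_start: bytes) -> str:
    """Classify a disk as 'gpt', 'mbr' or 'none' from its first two sectors."""
    mbr, lba1 = disk_start[:SECTOR], disk_start[SECTOR:2 * SECTOR]
    if mbr[510:512] != b"\x55\xaa":
        return "none"
    first_type = mbr[446 + 4]  # partition type byte of the 1st MBR entry
    if first_type == 0xEE and lba1[:8] == b"EFI PART":
        return "gpt"
    return "mbr"

# MBR stores sector numbers in 32 bits -> max addressable size with 512-byte sectors
max_mbr_bytes = 2**32 * SECTOR
print(max_mbr_bytes // 2**40, "TiB")  # 2 TiB
```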

3. GRUB: GRUB is a boot loader pgm used in the Linux world. The original (legacy) GRUB has been replaced by grub2 in practically all current Linux distros. grub2 was rewritten from scratch (mostly in C); only the scripts that generate its config (grub-mkconfig and the files in /etc/grub.d) are shell scripts. Legacy grub files are not found anymore on current Linux distros, so we'll be talking about grub2 (even though we may write grub, we mean grub2 unless mentioned specifically).

For systems still using classical BIOS on powerup, GRUB replaces the boot code in the MBR (the first 446 bytes; the partition table is left intact) with its own code. This is done by any Linux installer without you doing anything. So, when the system is powered up in BIOS mode and the code in the MBR is executed, it's the GRUB boot loader code that gets executed (instead of the code that was in the MBR before GRUB replaced it, e.g. Windows boot code). For systems using UEFI, GRUB is copied as an .efi file onto the EFI partition (again done by any Linux installer without you doing anything), and the firmware loads it directly from that EFI System Partition. GRUB has the advantage of being able to read ext2, ext3, and ext4 partitions and load its configuration file from there. GRUB names disks hd0, hd1, etc. (e.g. (hd0,1)) instead of the sda, sdb names used by Linux.
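To translate Linux disk names into GRUB's naming, here is a rough sketch. It assumes the firmware's disk order matches the Linux sda, sdb, ... order (not guaranteed in practice) and only handles names like /dev/sdb3; the function name grub_names is hypothetical. Both GRUB versions count disks from 0; legacy GRUB also counts partitions from 0, while grub2 counts them from 1 (grub2 also accepts a partition-table prefix, e.g. (hd1,gpt3) or (hd1,msdos3)).

```python
import re

def grub_names(linux_dev: str) -> tuple[str, str]:
    """Map /dev/sdXN to (legacy GRUB name, grub2 name), e.g. /dev/sdb3 -> ((hd1,2), (hd1,3))."""
    m = re.fullmatch(r"/dev/sd([a-z])(\d+)", linux_dev)
    if m is None:
        raise ValueError(f"unsupported device name: {linux_dev}")
    disk = ord(m.group(1)) - ord("a")  # sda -> 0, sdb -> 1, ...
    part = int(m.group(2))             # Linux partition numbers start at 1
    return f"(hd{disk},{part - 1})", f"(hd{disk},{part})"
```

So the legacy name (hd1,2) used in the grub shell example below corresponds to /dev/sdb3 under this assumption.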

 
GRUB for BIOS:

GRUB for BIOS is slightly different from GRUB for UEFI. GRUB for BIOS is rarely used on systems built after 2020, so you can skip the section below.

Grub for BIOS intro here: https://www.dedoimedo.com/computers/grub.html

GRUB (for BIOS) works in 2 stages. This 2-stage approach allows the large GRUB code to be executed w/o the 512-byte limitation of the MBR.

  • 1st stage: This is located in the MBR and mainly points to Stage 2, since the MBR is too small to contain all of the needed data.  
  • 2nd stage: This reads its configuration file, which contains all of the complex user interface and options we are normally familiar with when talking about GRUB. Stage 2 can be located anywhere on the disk. If Stage 2 cannot find its configuration file, GRUB stops the boot sequence and presents the user with a command line for manual configuration.  

If we do an auto install of any Linux distro, GRUB gets installed by default. To install it manually (legacy GRUB), we first log into a Linux OS and, from within there, run the grub cmd to get a grub prompt. Next we need to place GRUB Stage 1 in the first sector of the hard disk (the MBR) or of a partition. For this, we first find all GRUB installs on the system (there may be more than 1 GRUB if more than 1 Linux distro is installed). Then we choose the one which we want to be copied to the MBR.

  • grub> find /boot/grub/stage1 => this will possibly return (hd0,1) for SUSE, (hd1,2) for Ubuntu, etc (assuming system has multiple linux OS installed). That means it found stage 1 grub in these partitions.
  • grub> root (hd1,2) => This says: take GRUB from disk 1 (the 2nd disk, as disks are counted from 0) and partition 2 (the 3rd partition on that disk, as legacy GRUB counts partitions from 0) to copy to the MBR.
  • grub> setup (hd0) => Write the above GRUB stage 1 from (hd1,2) to hd0 (the MBR of the 1st disk)
  • grub> quit

Instead of the above 4-step process, we can use a single cmd that states where to copy GRUB stage 1 => grub-install /dev/sda => Here we copy GRUB stage 1 into the MBR of the 1st disk (/dev/sda; /dev/hda on older IDE systems).

GRUB files are in 2 main dir => /boot/grub and /usr/lib/grub.

  • /boot/grub/ => The GRUB cfg (or menu) is located here, on the root partition (or on a separate /boot partition, if there is one). For legacy grub, we have the menu.lst file, while for grub2, we have grub.cfg.
    • Legacy grub has /boot/grub/menu.lst. menu.lst lists one entry per bootable OS (partition); an entry can also mark its partition as active. The entries are listed one by one. Format is:
      • 1st partition is Windows OS and is marked as active. Since this OS is not understood by GRUB, chainloader is used.
        • title Windows 95/98/NT/2000 => title says what OS it is (for human understanding)
        • rootnoverify (hd0,0) => This specifies the partition where Windows resides. In this instance, the boot image is on (hd0,0) => hd0 means the 1st hard disk, and the following 0 means the 1st partition on that disk. rootnoverify means GRUB doesn't try to mount this partition, since GRUB cannot understand Windows OS, i.e. Windows is not multiboot compliant. The job of loading the boot image is left to chainloader (see below).
        • makeactive => this sets the active partition to this partition, i.e. (hd0,0)
        • chainloader +1 => This feature is used for OS such as Windows that cannot be booted directly. They are booted by the method of chainloading (GRUB passes the control of the boot sequence to another bootloader, located on the device to which the menu entry points).
      • 2nd partition is Linux OS
        • title Linux => implies it's Linux OS
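        • root (hd0,1) => root sets GRUB's root device, i.e. the partition from which GRUB reads files such as the kernel. Here it is (hd0,1), i.e. hard disk 0, partition 1 (the 2nd partition).
        • kernel /vmlinuz root=/dev/hda3 ro => loads the kernel file /vmlinuz from GRUB's root device above; root=/dev/hda3 tells the kernel which partition to mount as / (this can differ from GRUB's root, e.g. when /boot is a separate partition), and ro mounts it read-only at first.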
    • grub2 has /boot/grub/grub.cfg (/boot/grub2/grub.cfg on Fedora/RHEL). This is autogenerated at install via grub-mkconfig, using templates from /etc/grub.d and settings from /etc/default/grub, and regenerated later (e.g. by update-grub on Debian/Ubuntu).
      • /etc/grub.d/ => It has scripts that grub-mkconfig runs in order to build the entries of the grub menu (the options that we see to choose the OS when grub is loaded), e.g. 00_header, 10_linux, 20_memtest86+, 40_custom, etc. 40_custom is meant for our own custom entries.
  • /usr/lib/grub/ => For legacy grub, it has the stage1 and stage2 files under /usr/lib/grub/i386-pc/. In grub2, you see *.mod files here, as they are all modules for various things such as boot, sleep, etc.
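The legacy menu.lst format described above is simple enough to parse: every title line starts a new menu entry, and the lines after it are commands for that entry. A minimal sketch (the function name is hypothetical; it skips comments and ignores global settings such as default and timeout that appear before the first title):

```python
def parse_menu_lst(text: str) -> list[dict]:
    """Group legacy GRUB menu.lst lines into entries of the form {'title': ..., 'commands': [...]}."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, *rest = line.split(None, 1)
        if keyword == "title":
            entries.append({"title": rest[0] if rest else "", "commands": []})
        elif entries:
            entries[-1]["commands"].append(line)  # command of the current entry
        # lines before the first title (default, timeout, ...) are global settings, ignored here
    return entries
```

For the Windows/Linux example above, this yields two entries titled "Windows 95/98/NT/2000" and "Linux", the first with the commands rootnoverify, makeactive and chainloader.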

GRUB for EFI:

The GRUB cfg file for EFI is at /boot/efi/EFI/<linux_OS_name>/grub.cfg

This grub.cfg is typically a very small wrapper that loads the main /boot/grub/grub.cfg.

If we have multiple Linux OSes installed, then every time we install a new OS, its GRUB overwrites the prior GRUB in the MBR (on UEFI systems, it becomes the default boot entry instead). So, we may not be able to access the other Linux OSes, if the latest Linux OS only sees itself and Windows (assuming Windows is there). To prevent this, we don't let the later Linux OS install GRUB (don't auto-install that Linux OS, but instead choose "manual installation" and then turn off "grub installation"). This way, we leave the GRUB from the prior Linux OS installation untouched.

User Selection for OS:

Once the second stage boot loader (from the chosen partition in BIOS, or from the EFI partition in UEFI) is in memory, it presents the user with a menu (text or graphical) showing the different operating systems or kernels it has been configured to boot (when you update the kernel, the boot loader configuration file is updated automatically). On this screen, a user can use the arrow keys to choose which operating system or kernel to boot and press Enter. If no key is pressed, the boot loader loads the default selection after a configurable period of time. Depending on what the user chose, it loads Windows, Linux or any other OS listed in the menu. Assuming Linux is selected, it locates the corresponding kernel binary in the /boot/ directory. The kernel binary is named using the format /boot/vmlinuz-<kernel-version> (where <kernel-version> corresponds to the kernel version specified in the boot loader's settings). On CentOS/RHEL 7, the file is /boot/vmlinuz-3.10.0* (a single file, size ~7MB; there is also a rescue copy of it with the same size).
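Picking the newest kernel among /boot/vmlinuz-<kernel-version> files needs a version-aware sort, since plain string sorting would put 5.9 after 5.15. A rough sketch, assuming versions look like 5.15.0-91-generic and skipping RHEL/CentOS-style rescue images (vmlinuz-0-rescue-...); it only illustrates the idea, as GRUB itself just shows the entries already written into its config file.

```python
import re
from typing import Optional

def kernel_sort_key(filename: str) -> tuple[int, ...]:
    """Turn 'vmlinuz-5.15.0-91-generic' into (5, 15, 0, 91) for numeric comparison."""
    version = filename.removeprefix("vmlinuz-")
    return tuple(int(n) for n in re.findall(r"\d+", version))

def newest_kernel(filenames: list[str]) -> Optional[str]:
    """Return the vmlinuz-* file with the highest version, ignoring rescue images."""
    kernels = [f for f in filenames
               if f.startswith("vmlinuz-") and "rescue" not in f]
    return max(kernels, key=kernel_sort_key, default=None)
```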
 
The boot loader then places one or more appropriate initramfs images into memory. The initramfs is used by the kernel to load drivers and modules necessary to boot the system. Once the kernel and the initramfs image(s) are loaded into memory, the boot loader hands control of the boot process to the kernel. Boot loaders only need to support the file system on which kernel and initramfs reside (the file system on which /boot is located). 
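The initramfs image is a cpio archive in the "newc" format (usually compressed, e.g. with gzip). As a sketch of what the kernel unpacks, the function below lists the member names of an uncompressed newc archive. It assumes the archive has already been decompressed and handles only the 070701 variant: each member has a 110-byte ASCII header (6-byte magic plus 13 fields of 8 hex digits), then its NUL-terminated name and its data, each padded to a 4-byte boundary, and the archive ends with a member named TRAILER!!!.

```python
HEADER_LEN = 110  # 6-byte magic + 13 fields of 8 hex digits

def _align4(n: int) -> int:
    """Round n up to a multiple of 4."""
    return (n + 3) & ~3

def list_newc(archive: bytes) -> list[str]:
    """Return the member names of an uncompressed 'newc' (070701) cpio archive."""
    names, pos = [], 0
    while pos + HEADER_LEN <= len(archive):
        header = archive[pos:pos + HEADER_LEN]
        if header[:6] != b"070701":
            raise ValueError(f"bad cpio magic at offset {pos}")
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]  # c_filesize, c_namesize
        name_start = pos + HEADER_LEN
        name = archive[name_start:name_start + namesize - 1].decode()  # drop trailing NUL
        if name == "TRAILER!!!":  # end-of-archive marker
            break
        names.append(name)
        data_start = _align4(name_start + namesize)  # header + name padded to 4 bytes
        pos = _align4(data_start + filesize)         # file data padded to 4 bytes
    return names
```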
 

 

4A. Kernel load: First, the kernel is loaded into RAM, where it remains until shutdown.

The very first part of the Linux kernel is written in 8086 assembly language (boot/bootsect.S). (This describes early Linux kernels; modern kernels no longer boot themselves this way and are loaded by a boot loader such as GRUB.) When run, it moves itself to absolute address 0x90000, loads the next 2 kBytes of code from the boot device to address 0x90200, and the rest of the kernel to address 0x10000. The message "Loading..." is displayed during system load. Control is then passed to the code in boot/setup.S, another real-mode assembly source.

The setup portion identifies some features of the host system and the type of vga board. If requested to, it asks the user to choose the video mode for the console. It then moves the whole system from address 0x10000 to address 0x1000, enters protected mode and jumps to the rest of the system (at 0x1000).

4B. Kernel decompress: The Linux kernel is installed compressed, so it will first uncompress itself. The beginning of the kernel image contains a small program that does this. The code at 0x1000 comes from zBoot/head.S which initializes registers and invokes decompress_kernel(), which in turn is made up of zBoot/inflate.c, zBoot/unzip.c and zBoot/misc.c. The decompressed data goes to address 0x100000 (1 Meg = 2^20), and this is the main reason why Linux can't run with less than 2 megs ram.
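The idea of a small program at the start of the image that unpacks the rest can be illustrated with a toy model: the image is a stub followed by a gzip-compressed payload. The sketch below (not the real zBoot code; the stub bytes in the usage are made up) finds the gzip magic bytes 1f 8b 08 and inflates what follows, similar in spirit to how the kernel's scripts/extract-vmlinux tool pulls vmlinux out of a compressed image.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip header: ID1, ID2, CM=8 (deflate)

def extract_payload(image: bytes) -> bytes:
    """Find the gzip-compressed payload inside an image and decompress it."""
    start = image.find(GZIP_MAGIC)
    if start < 0:
        raise ValueError("no gzip payload found")
    # wbits = 16 + MAX_WBITS makes zlib expect a gzip header; bytes after the
    # compressed stream end up in unused_data instead of raising an error
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return inflater.decompress(image[start:])
```

For example, an image built as b"SETUP-STUB" + gzip.compress(b"kernel code") plus some trailing padding gives back b"kernel code".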

The decompressed code is executed, and eventually the routine start_kernel is invoked. start_kernel executes a wide range of initialization functions, including (indirectly) unpacking the initramfs (initial RAM filesystem), which becomes the initial root filesystem. The purpose of the initramfs is to bootstrap the system to the point where it can access the real root filesystem.

The source for the above operations is in boot/head.S. start_kernel() resides in init/main.c, and never returns. Everything from now on is coded in C, apart from interrupt management and system call entry/exit. The kernel then executes init (the /init program of the initramfs, if there is one) as the first process, and early userspace starts. Until this point, the initial root filesystem in RAM was being used. At the final stage of early userspace, the real root filesystem is mounted (i.e. the / root dir and all dirs under it are set up), and it replaces the initial root filesystem. With an initramfs this is done by switch_root, which deletes the contents of the temporary root file system, mounts the real root over /, and starts the real init; the older initrd scheme used pivot_root() instead. The memory used by the temporary root file system is then reclaimed. If mounting the root filesystem fails, for example because you didn't include the corresponding filesystem driver in the kernel (or the initramfs), the kernel panics and halts the system.


5. Init: Once the real root file system is mounted, the kernel finishes its own part of the boot process and starts /sbin/init, which is the first user pgm, with process id=1. On classic SysV-init systems (e.g. older RHEL/CentOS), init first runs the script /etc/rc.d/rc.sysinit, which checks system properties, hardware, display, SELinux, loads kernel modules, checks and mounts file systems, etc. Next, /etc/inittab is read to determine the run level, and the pgms for that run level are run. Once all of them have run, init runs one more file, /etc/rc.local, which holds the last commands run in the initialization (and boot) process. init itself keeps running until shutdown.
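The run-level lookup can be sketched like this: find the initdefault line in /etc/inittab (each line has the form id:runlevels:action:process, e.g. id:3:initdefault:), then run the scripts of /etc/rc<N>.d in the order of their names, K* (stop) scripts first and then S* (start) scripts. A simplified sketch covering only the start scripts, assuming classic SysV-style names such as S10network (the script names in the test are hypothetical):

```python
from typing import Optional

def default_runlevel(inittab: str) -> Optional[str]:
    """Return the run level from the 'initdefault' line of an inittab, if any."""
    for line in inittab.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")  # id:runlevels:action:process
        if len(fields) >= 3 and fields[2] == "initdefault":
            return fields[1]
    return None

def start_order(rc_dir_entries: list[str]) -> list[str]:
    """Start scripts (S##name) run in lexical order of their names, so S10 before S55."""
    return sorted(name for name in rc_dir_entries if name.startswith("S"))
```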

On newer systems using systemd, init is replaced by systemd. So, 1st user pgm with process id=1 is systemd (and NOT init) in such a case. See details under "init vs systemd" section.

After exec()ing the init program above, the kernel has no direct control over the program flow. Its role from now on is to provide processes with system calls, and to service asynchronous events (such as hardware interrupts). Multitasking has been set up, and it is now init which manages multiuser access by fork()ing system daemons and login processes.


6. Prompt: init (or systemd) starts getty (get terminal) processes, and each getty hands over to login (for user login) once a username is entered. Together these allow the user to log into the system.

A. getty: getty is run once for each virtual terminal (typically six of them); it initializes its tty and brings up a "login:" prompt asking for a username. Without this getty pgm, logging in via a terminal can't happen. The init program starts up other programs similar to getty for networked connections. For example, sshd, telnetd, and rlogind are started to service logins via ssh, telnet, and rlogin, respectively. Instead of being tied directly to a specific physical terminal or modem line, these programs connect users' shells to pseudo ttys, which are devices that emulate terminals over network connections. getty listens at the terminal and waits for the user to signal that he is ready to log in (this usually means that the user must type something). When it notices a user, getty outputs a welcome message (stored in /etc/issue), and prompts for the username. On my CentOS laptop, /etc/issue has the 2 lines shown below. \S, \r and \m in them are escape characters that getty replaces with the corresponding system info: \S = OS (distro) name, \r = kernel release, \m = machine architecture (\n = hostname).

ex: /etc/issue => this has 2 lines shown as below (on Ubuntu, it just has a single line with the name of the Linux distro "Ubuntu 22.04.2 LTS \n")

\S

Kernel \r on an \m

Above 2 lines print this message followed by the "login:" prompt

CentOS Linux 7

Kernel 3.18xxx on an x86_64

DESKTOP-ASHISH login:
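The escape expansion that getty applies to /etc/issue can be sketched as a simple substitution. In this sketch the system info is passed in as a dict with hypothetical keys; a real agetty fills \r and \m from uname, takes \S from PRETTY_NAME in /etc/os-release, and supports many more escapes than the four handled here (unknown escapes are left as-is).

```python
import re

def expand_issue(template: str, info: dict[str, str]) -> str:
    """Replace the \\S, \\r, \\m and \\n escapes of an /etc/issue template."""
    escapes = {
        "S": info["pretty_name"],  # distro name (PRETTY_NAME in /etc/os-release)
        "r": info["release"],      # kernel release, like `uname -r`
        "m": info["machine"],      # architecture, like `uname -m`
        "n": info["hostname"],     # node (host) name
    }
    # a backslash followed by one char; unknown escapes are kept unchanged
    return re.sub(r"\\(.)", lambda m: escapes.get(m.group(1), m.group(0)), template)
```

With pretty_name "CentOS Linux 7" and machine "x86_64", the 2-line template above expands to the welcome message shown before the "login:" prompt.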

Now this login prompt may be CLI or GUI. If we did a basic installation of Linux with no graphical interface, then we see a CLI login on all virtual terminals. Else, if X11 was installed, a display manager (dm), started by init/systemd rather than by getty, runs on one of these virtual terminals (the default one) and brings up a graphical login screen (you will see a gdm pgm running, under user "root"). On the text terminals, once the username is provided, getty exec()s the login pgm at /bin/login to complete the login process, i.e. login replaces getty in the same process.

B. login: The login pgm gets the username as a parameter from the getty pgm, and prompts the user for the password. It checks them against /etc/passwd and /etc/shadow.

The file /etc/passwd used to store the encrypted (hashed) passwords too in the old days, but this file is readable by everyone. So now, for security reasons, the password field in this file just holds "x", and the real hashed password is stored in the /etc/shadow file, which is readable only by root (the super user). So, even the hashed password is not visible to anyone other than root.

On successful matching of username/password, the login program begins a session for the user by setting environment variables and starting the user's shell, based on /etc/passwd.

The /etc/passwd file contains one line for each user of the system. That line specifies, among other things, the login name, home directory, and program to start up when that user logs in. Each line has seven fields separated by colons. The exact syntax of this file is explained here: https://www.cyberciti.biz/faq/understanding-etcpasswd-file-format/

The last field (the program to start up) is stored after the last colon of each line. If nothing follows the last colon, the standard shell /bin/sh is assumed by default. The following are typical lines from my /etc/passwd file:

  • root:x:0:0:root:/root:/bin/bash => this is used when we type "root" at the login prompt; the login shell started for the root user is bash
  • gdm:x:42:42::/var/lib/gdm:/sbin/nologin => in RHEL, every process runs as a particular user. Users that exist only to run certain processes never need to log in, so they are assigned the nologin shell.
  • ashish:x:1000:1000:ashish:/home/ashish:/bin/bash => this is the line used when user "ashish" logs in. After the password matches, the bash shell is started. NOTE: we don't have to start a shell. We could run any program, e.g. /home/my_test.tcl is an equally valid program to put here, and the login pgm will start that pgm instead of a shell.
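Splitting such a line into its seven fields is straightforward. A minimal sketch (the PasswdEntry type and its field names are illustrative; on Unix, Python's standard pwd module, e.g. pwd.getpwnam("ashish"), gives the same information for the running system):

```python
from typing import NamedTuple

class PasswdEntry(NamedTuple):
    name: str
    password: str  # "x" means the hash is kept in /etc/shadow
    uid: int
    gid: int
    gecos: str     # comment / full name
    home: str
    shell: str     # program login starts after a successful login

def parse_passwd_line(line: str) -> PasswdEntry:
    """Parse one /etc/passwd line into its 7 colon-separated fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    name, password, uid, gid, gecos, home, shell = fields
    # an empty shell field defaults to /bin/sh
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell or "/bin/sh")
```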

The blinking cursor that appears when typing password in CLI is the login program running. No shell has been started yet.


7. login: After a successful login, the login program displays the contents of /etc/motd (message of the day), and then executes the pgm mentioned in /etc/passwd, which is usually a shell pgm. This is called a login shell. The login shell may be any one of the supported shells, as given by the entry in the /etc/passwd file shown above. The default shell on most Linux distros is bash. Once the user's login shell is started, it runs its runtime configuration files before presenting a prompt to the user. A bash login shell first executes /etc/profile, and then the first one it finds of .bash_profile, .bash_login and .profile in the user's home directory (.bashrc is read by interactive non-login shells, and is usually sourced from .bash_profile). For csh, .cshrc and then .login are executed. However, if the login was done graphically via a dm, then a different set of runtime configuration files is used to bring up the graphical windows i/f. Still, on the other virtual terminals (accessible by pressing ctrl+alt+F2, ctrl+alt+F3, and so on), we see the text login prompt, and get a text login shell after entering username and password.
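The bash login-shell rule can be written down as a small function. A sketch that models only the file-selection logic from the bash manual's invocation rules, with the set of existing files passed in (paths written with ~ for readability):

```python
def bash_login_files(existing: set[str]) -> list[str]:
    """Files a bash *login* shell reads, given which startup files exist."""
    read = []
    if "/etc/profile" in existing:
        read.append("/etc/profile")
    # only the FIRST of these that exists is read
    for candidate in ("~/.bash_profile", "~/.bash_login", "~/.profile"):
        if candidate in existing:
            read.append(candidate)
            break
    return read  # ~/.bashrc is not read directly; .bash_profile usually sources it
```

This explains a common surprise: once ~/.bash_profile exists, a bash login shell no longer reads ~/.profile.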


8. xinit: For a graphical windows i/f started from a text login, the runtime configuration files (or the user) call startx or xinit after a successful login. xinit runs the user's xinitrc runtime configuration file (~/.xinitrc), which normally starts a window manager (wm). When the user is finished and exits the window manager, xinit, startx, the shell, and login terminate in that order, returning to getty. NOTE: xinit may not be used any longer on Linux distros that have switched to systemd (instead of init).

startx: This cmd is a wrapper to xinit, and is implemented differently on diff OS variants. More details here: https://www.computerhope.com/unix/startx.htm. startx is a standalone cmd and is used to start the wm under full user control. This is not the cmd that is used by the OS to start wm, but whatever "xinit" script is being used by the OS achieves results similar to this.