ARM ISA
- Last Updated: Saturday, 28 March 2020 18:16
- Published: Saturday, 28 March 2020 18:16
ISA:
Any ISA needs a minimum of 3 inst types: load/store inst, conditional branch, and logical (and|or|not) functions. Load/store inst are needed to move contents from one place to another. Branch inst are needed to implement if-else conditions, which form the basis of doing intelligent things based on conditions. Logical functions are needed to implement any arithmetic/logical operation such as add, multiply, etc. (since the basic gates and, or, not are sufficient to implement any logic function). However, such an ISA would require very large code size to do simple operations like ADD, SUB, CMP, etc. So, we extend the ISA to include more commonly used inst, so that code size and compute cycles are reduced.
ARM has supported multiple inst sets, and across so many products this became very confusing. Details of the inst sets are provided in the ARM Architecture Reference Manual (ARM ARM).
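To make the code-size argument concrete, here is a minimal Python sketch of a 32 bit ADD built only from and/or/not. It additionally assumes a one-bit left shift and a loop (i.e. a conditional branch) are available; the helper names `xor32` and `add32` are made up for this illustration.

```python
MASK32 = 0xFFFFFFFF  # model 32 bit registers

def xor32(a, b):
    # XOR expressed with only and/or/not: (a OR b) AND NOT (a AND b)
    return (a | b) & ~(a & b) & MASK32

def add32(a, b):
    # Ripple the carry until none is left; at most 32 iterations,
    # because the carry moves one bit to the left on every pass.
    a &= MASK32
    b &= MASK32
    while b:
        carry = ((a & b) << 1) & MASK32  # assumed extra primitive: shift left by 1
        a = xor32(a, b)                  # sum without carries
        b = carry
    return a
```

A single ADD inst replaces this whole loop, which is the reason real ISAs go beyond the minimal set.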
Initially the ARM inst set (called ARM ISA or A32) was the only ISA in ARM processors. It is a fixed 32 bit inst set. Bits 27 to 20 store the various opcodes, while other fixed bit fields store the source/dest reg numbers. There are 16 registers in user space (R0-R15). However, code size for ARM ISA is large because every inst is 32 bit, even though it has good perf with low power. So, later 16 bit inst (called Thumb ISA) were added to improve code density. The 1st popular chip to include Thumb ISA was ARM7TDMI (the T in the name implies Thumb ISA). Initially Thumb ISA included only 16 bit inst, but later more inst were added to both ARM ISA and Thumb ISA, resulting in a few 32 bit inst in Thumb ISA. ARM ISA, with its fixed inst size, is classic RISC style, while Thumb ISA, mixing 16 and 32 bit inst, has the variable inst length more typical of CISC (it is still a load/store ISA).
Thumb ISA: these inst were mostly 16 bit. To be able to use Thumb inst in ARM processors (which support ARM ISA), there is a decompressor in ARM hardware (in cores such as ARM7TDMI), which decompresses each 16 bit Thumb inst into a 32 bit ARM inst, which is then passed on to the ARM instruction decoder. Each 16 bit Thumb inst had an equiv 32 bit ARM inst. The main motivation for introducing Thumb ISA was to reduce code size, by encoding the most commonly used 32 bit ARM instructions in 16 bit. This was very useful in embedded designs, where memory is limited and expensive. Thus the inst length became variable: while most inst were 16 bit, a few inst were encoded in 32 bit where a 16 bit encoding wasn't possible (but the inst was needed to improve cycle time). There were 2 sets of Thumb ISA introduced.
- Thumb-1: had 35 inst, of which only the 'BL' inst is 32 bit. Most 16 bit inst can only access the lower 8 registers (R0-R7), since there are only 3 bits to encode a register. Since most inst were 16 bit, this reduced code size by about 30% and reduced I-cache misses, but increased cycle counts. The solution was to blend Thumb-1 and ARM inst, where critical sections of code were written in ARM ISA and the rest in Thumb-1 ISA. This gave rise to Thumb-2 ISA. Loosely speaking, "Thumb ISA" usually refers to Thumb-1 ISA.
- Thumb-2: Thumb-2 ISA has all of Thumb-1 ISA + some new 16 bit inst for code size wins. A few new 32 bit inst were also added (DMB, DSB, ISB, MRS, MSR, BL/BLX). On top of this, 32 bit equiv inst for the corresponding 16 bit inst were also provided. Thus Thumb-2 provides most of the ARM inst too. This resulted in a total of 100's of inst. Virtually all inst available in ARM ISA (with a few exceptions) were now available in Thumb-2, which allowed a unified assembly language (UAL) for ARM and Thumb-2, which can then be assembled to generate binaries for either ISA. Thumb-2 is also known as T32 ISA. T32 gives programmers the flexibility to code sections of their pgm in 16 bit as well as 32 bit inst, depending on whether code size or performance is more important. There is no mode to switch b/w 16 bit and 32 bit: all T32 inst (whether they are 16 bit or 32 bit) are decoded by the ARM core the same way. Since Thumb-2 is a confusing name, we'll use T32 for it instead. NOTE: T32 or Thumb-2 includes all 16/32 bit Thumb inst, as well as the equiv 32 bit ARM inst. Thumb-2 first appeared in the ARM1156T2-S core (ARMv6T2 arch) and is the only inst set supported by Cortex-M3, which was the 1st Cortex processor. Thumb-2 kind of unified Thumb-1 ISA and ARM ISA into one, which allowed Cortex-M processors to run in 1 operation state, instead of switching b/w ARM state (when running 32 bit ARM ISA) and Thumb state (when running 16 bit Thumb ISA). This was a huge advantage in terms of perf for these cores.
UAL (unified assembly language): This assembly language syntax is for ARM assembly tools. T32 had its own assembly language syntax, while A32 had its own. This caused confusion. Since most inst were almost the same b/w T32 and A32, UAL was developed to allow both ISA to share the same assembly language syntax. This allows easier porting b/w the 2. We'll use the new UAL syntax for any assembly language code. The "THUMB" directive in an assembly file indicates that the code is in UAL syntax (the "CODE16" directive implies it's in traditional Thumb syntax). Since most inst in T32 have 16 bit and 32 bit variants, the assembler chooses which variant to emit. The suffix ".W" after any inst indicates a 32 bit inst (W=wide), while ".N" indicates a 16 bit inst (N=narrow). If no suffix is provided, the assembler can choose b/w 16 bit and 32 bit (it usually defaults to 16 bit to get smaller code size).
Since Thumb-2 had to be backward compatible with Thumb-1 (meaning any Thumb-1 code should run on a Thumb-2 m/c), the 16 bit inst from Thumb-1 could not be changed. The trick was to make the processor recognize the new 16 bit inst as well as the new 32 bit inst. On looking at the original Thumb-1 ISA, it was seen that only 2 inst had bits [15:13] = 111. These 2 inst were "B" (unconditional branch), which is a 16 bit inst, and "BL" (long branch with link), which is a 32 bit inst. So, to accommodate 32 bit inst, the 3 MSB [15:13] of the first 16 bits are used to indicate whether it is a 32 bit inst. The process is as follows:
Look at bits [15:13] of the first halfword (HW): If it's anything other than "111", it's a 16 bit Thumb inst. If it's "111", it may be B, BL or some other new inst. Now, look at bits [12:11]:
1. 00 => it's the Thumb-1 unconditional branch (B), which is a 16 bit inst.
2. 01, 10, 11 => it's a 32 bit inst (including BL, which was the only 32 bit inst in Thumb-1).
New Thumb-2 inst which are 16 bit were encoded in the 16 bit encodings that were still unused.
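The length check described above can be sketched in a few lines of Python. This is only an illustration of the bit test, not a decoder; the function name `thumb_inst_size` is made up here.

```python
def thumb_inst_size(hw1: int) -> int:
    """Return the inst size in bytes (2 or 4), given the first halfword."""
    hw1 &= 0xFFFF
    if (hw1 >> 13) != 0b111:
        return 2                      # ordinary 16 bit Thumb inst
    if ((hw1 >> 11) & 0b11) == 0b00:
        return 2                      # 11100... = 16 bit unconditional B
    return 4                          # 11101, 11110, 11111 = 32 bit inst
```

For example, 0xE7FE (a branch-to-self) is reported as 2 bytes, while 0xF7FF (first halfword of a BL) is reported as 4 bytes.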
NOTE: Thumb instruction execution enforces 16-bit alignment on all inst. This means that 32-bit inst are treated as two halfwords, hw1 and hw2, with hw1 at the lower address. So, when the 4 bytes of a 32 bit inst are viewed as one little-endian word, the layout is as follows:
Data bits:  31:24  23:16  15:8  7:0   => HW2 = [31:16], HW1 = [15:0]
Byte addr:  A+3    A+2    A+1   A     => Addr can only be 16 bit aligned, so lsb of Addr is always 0.
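As a small check of this layout, the Python sketch below reads the two halfwords of a 32 bit inst from a byte buffer, assuming little-endian memory. The 4 bytes used are the well-known encoding of a BL that branches to itself; `split_halfwords` is a hypothetical helper name.

```python
import struct

def split_halfwords(mem: bytes, addr: int):
    """Return (hw1, hw2) of the inst at addr; hw1 sits at the lower address."""
    if addr & 1:
        raise ValueError("Thumb inst must be halfword aligned")
    return struct.unpack_from("<HH", mem, addr)

mem = bytes([0xFF, 0xF7, 0xFE, 0xFF])        # bytes at A, A+1, A+2, A+3
hw1, hw2 = split_halfwords(mem, 0)
word = struct.unpack_from("<I", mem, 0)[0]   # same 4 bytes as one word
```

Here `hw1` is 0xF7FF and `hw2` is 0xFFFE, while the word view gives 0xFFFEF7FF, i.e. HW1 in bits [15:0] and HW2 in bits [31:16].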
Thumb-2 introduced the CBZ (compare and branch if zero) inst, which replaces what previously required 2 separate inst. It also introduced predication via the if-then inst (IT), which makes the next 1-4 inst in memory conditional. Thumb-2 performance was about 98% of ARM perf, and code size was about 30% less than ARM ISA. So, Thumb-2 became the ISA of choice.
Later an ARM ISA for 64 bit (aka A64) was added. So, in a nutshell, A64 ISA is for 64 bit processors, while A32/T32 ISA are for 32 bit processors. When we talk about Thumb ISA, we'll mean Thumb-2 ISA, or refer to it as T32. We won't treat Thumb-1 as a separate ISA any more: it's all Thumb-2 or T32. A32 is the other 32 bit ISA, used in the A and R profiles. A32 and T32 are almost the same ISA (in the original ARM7TDMI implementation the processor decompressed Thumb inst into A32 internally). T32 ISA can be thought of as a compressed version of A32 ISA to save memory space (where some 32 bit inst from A32 are encoded as 16 bit inst in T32). Thus there is not much diff b/w T32 and A32 ISA.
Thumb-1 Instructions:
There are 35 total Thumb-1 inst (34 are 16 bit while 1 is 32 bit). Out of the 16 bits, a few msb bits are used for opcode encoding, while the lower bits specify reg, const, etc. There are 19 instruction formats for these 35 inst. An instruction format refers to the location of the opcode and reg num fields (i.e. the same opcode ADD may appear in 3 or 4 formats, depending on whether it's adding 2 reg, or adding a constant number, etc). Below we list all 35 inst based on their type, and NOT on their format (all inst are listed in the ARM7TDMI manual):
1. Arithmetic: 6 inst = ADD, ADC (add with carry bit), SUB, SBC (subtract with carry bit), MUL (multiply 2 reg), NEG (2's complement, Rd=-Rs). These arithmetic inst can be b/w reg or b/w reg and constant.
ADD has 4 formats:
- add 9 bit signed constant to the stack pointer
- add 10 bit constant to either PC or SP, and load the resulting addr into a reg. So, this is a "load addr" instead of a "load data"
- add a constant to a reg (a 3 bit constant with a separate dest reg, or an 8 bit constant added to the reg in place)
- add 2 reg
NEG: do negative of 1 reg and store in other reg, i.e Rd = -Rs
2. load from mem: 7 inst = LDRB (load byte), LDRH (load half word or 2 bytes), LDR (load full word or 4 bytes), LDM/LDMIA (load multiple), LDSB (load sign extended byte), LDSH (load sign extended half word), POP. NOTE: there is no separate load/store inst in 8051; the move inst in 8051 does the load/store func. The move inst in ARM moves from reg to reg, and not from/to mem.
LDMIA: load multiple reg (only the 8 reg from R0 to R7 are possible, since 8 bits are allocated for the list in the inst) from mem, starting at the addr contained in a base reg (3 bits are allocated for the base reg, 000=R0 ... 111=R7).
POP: pop the reg specified by the list (optionally PC also, depending on an extra opcode bit) from the stack in mem (i.e. load contents from stack mem to reg). Only the 8 reg from R0 to R7 are possible in the list, since 8 bits are allocated in the inst. Used during function/subroutine calls; popping PC returns from the function.
3. store to mem: 5 inst = STRB, STRH, STR (store word), STM/STMIA (store multiple), PUSH. Stores don't have sign extended versions as loads do. These inst are the same as the loads above, except that they store to mem (instead of loading from mem).
PUSH: same as POP, except that it pushes reg contents onto the stack (i.e. stores contents from reg to stack mem), and the optional extra reg is LR instead of PC. Used during function/subroutine calls.
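The 8 bit register list used by PUSH/POP can be decoded as sketched below in Python. The sketch assumes the ARM7TDMI 16 bit encoding 1011 L 10 R rlist (L=1 for POP, R=1 adds LR for PUSH or PC for POP); the function name `decode_push_pop` is hypothetical.

```python
def decode_push_pop(hw: int):
    """Decode a 16 bit Thumb PUSH/POP into (mnemonic, list of reg names)."""
    if (hw & 0xF600) != 0xB400:            # check fixed bits 1011 x10x
        raise ValueError("not a PUSH/POP encoding")
    is_pop = bool(hw & 0x0800)             # L bit: 0 = PUSH, 1 = POP
    regs = [f"R{i}" for i in range(8) if hw & (1 << i)]  # one bit per R0-R7
    if hw & 0x0100:                        # R bit: extra LR (PUSH) or PC (POP)
        regs.append("PC" if is_pop else "LR")
    return ("POP" if is_pop else "PUSH"), regs
```

For example, 0xB5F0 decodes as PUSH of R4-R7 and LR, and 0xBDF0 as the matching POP of R4-R7 and PC, which is a typical function prologue/epilogue pair.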
4. move from reg to reg: 2 inst
MOV (move one reg to another reg, or move a constant to a reg)
MVN (move NOT of one reg to another reg)
5. logical: 8 inst = AND, ORR (or), EOR (xor), LSL (logical shift left, <<), LSR (logical shift right, >>), ASR (arithmetic shift right), ROR (rotate right), BIC, plus TST (AND test, which only sets flags)
BIC: bit clear = AND NOT of 2 reg, i.e. Rd = Rd AND NOT Rs
TST: (AND test) = set condition codes (flags in the PSR reg) on Rd AND Rs. So it's similar to AND, except that the result is discarded and only the condition codes are updated
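The difference b/w the shift types, and what BIC and TST compute, can be sketched on 32 bit values in Python. This is a simplified model: shift amounts are assumed to be in the range 0-31, only the N and Z flags are shown for TST, and all function names are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    return (x << n) & MASK32               # zeros shifted in from the right

def lsr(x, n):
    return (x & MASK32) >> n               # zeros shifted in from the left

def asr(x, n):
    x &= MASK32
    if x & 0x80000000:                     # negative: copies of the sign bit
        x -= 1 << 32                       # are shifted in from the left
    return (x >> n) & MASK32

def ror(x, n):
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32  # bits wrap around

def bic(rd, rs):
    return rd & ~rs & MASK32               # Rd = Rd AND NOT Rs

def tst(rd, rs):
    result = rd & rs & MASK32              # result is only used for flags
    return {"N": result >> 31, "Z": int(result == 0)}
```

For example, `lsr(0x80000000, 4)` gives 0x08000000 while `asr(0x80000000, 4)` gives 0xF8000000, since ASR preserves the sign.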
6. Branch: 4 inst = B, Bxx, BL, BX. A branch may be conditional or unconditional.
B: unconditional PC relative branch. The offset is bits [10:0], so 11 bits, but it's shifted left by one (<< 1), since addr is always HW aligned. So, the offset actually becomes a 12 bit 2's complement offset, and the range of addr that can be jumped to is PC +/- 2048 bytes.
BX: branch indirect (branch and exchange). Performs an unconditional branch to the addr specified in a LO or HI reg. Bit 0 of that addr selects whether the processor continues in Thumb state or ARM state.
BL: long unconditional branch with link. This is the only 32 bit inst in Thumb-1. It is the same as B, except that the offset is a 23 bit 2's complement value, where the upper 11 bits of the offset are stored in the 1st 16 bit half, and the lower 11 bits are stored in the 2nd 16 bit half (the combined 22 bits are then shifted left by one, so the offset becomes 23 bits). The addr of the inst following the BL is placed in LR, so that at the end of the subroutine, PC can return to where it was.
Bxx: this performs a conditional branch depending on the state of the CPSR condition codes. Each of the condition bits N, Z, C, V can be tested for set or clear, so 8 opcodes are allocated to these. The remaining 6 opcodes are allocated to combinations of set/clear bits. There are 4 bits for the condition, from 0000 to 1111, but only 14 are used as branch conditions (BEQ, BNE, BCS, BCC, etc).
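The offset arithmetic for B and BL can be sketched as follows in Python. The sketch assumes the Thumb-1 encodings (B = 11100 + 11 bit offset; BL = 11110 + upper 11 bits, then 11111 + lower 11 bits) and that PC reads as the inst addr + 4 in Thumb state. The function names are made up for this illustration.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def b_target(addr: int, hw: int) -> int:
    """Target of a 16 bit unconditional B located at addr."""
    if (hw >> 11) != 0b11100:
        raise ValueError("not a Thumb B encoding")
    offset = sign_extend(hw & 0x7FF, 11) << 1    # 11 bits -> 12 bit byte offset
    return (addr + 4 + offset) & 0xFFFFFFFF      # PC = addr + 4

def bl_target(addr: int, hw1: int, hw2: int) -> int:
    """Target of a Thumb-1 BL whose first halfword is located at addr."""
    if (hw1 >> 11) != 0b11110 or (hw2 >> 11) != 0b11111:
        raise ValueError("not a Thumb-1 BL encoding")
    high = hw1 & 0x7FF                           # offset bits [22:12]
    low = hw2 & 0x7FF                            # offset bits [11:1]
    offset = sign_extend((high << 12) | (low << 1), 23)
    return (addr + 4 + offset) & 0xFFFFFFFF
```

For example, 0xE7FE at addr 0x1000 branches back to 0x1000 (offset -4 cancels the PC + 4), and the BL pair 0xF7FF, 0xFFFE does the same with a 23 bit offset.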
7. compare b/w 2 reg: 2 inst = CMP, CMN.
CMP: this inst is used in 3 ways: compare b/w 2 LO reg, compare where a HI reg is involved, or compare b/w a reg and a constant. All of these do a subtract, discard the result and only set the condition flags (N,Z,C,V flags in the PSR reg)
CMN: compare negative. Adds 2 reg, discards the result and only sets the condition flags
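How CMP and CMN produce the N,Z,C,V flags, and how a conditional branch then tests them, can be modeled in Python as below. This is a sketch that follows the usual ARM convention that C after a subtract means "no borrow"; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def cmn_flags(rd: int, rs: int):
    """Flags (N, Z, C, V) for rd + rs; the sum itself is discarded."""
    a, b = rd & MASK32, rs & MASK32
    full = a + b
    r = full & MASK32
    c = int(full > MASK32)                   # carry out of bit 31
    v = ((a ^ r) & (b ^ r)) >> 31            # both operands differ in sign from r
    return (r >> 31, int(r == 0), c, v)

def cmp_flags(rd: int, rs: int):
    """Flags (N, Z, C, V) for rd - rs, computed as rd + NOT(rs) + 1."""
    a, b = rd & MASK32, rs & MASK32
    full = a + (~b & MASK32) + 1
    r = full & MASK32
    c = int(full > MASK32)                   # C=1 means no borrow (a >= b unsigned)
    v = ((a ^ b) & (a ^ r)) >> 31            # signed overflow on subtract
    return (r >> 31, int(r == 0), c, v)

def cond_passed(cond: str, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate one of the 14 branch conditions on the given flags."""
    table = {
        "EQ": z == 1, "NE": z == 0, "CS": c == 1, "CC": c == 0,
        "MI": n == 1, "PL": n == 0, "VS": v == 1, "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v, "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
    }
    return table[cond]
```

For example, after comparing 3 with 5, `cond_passed("LT", *cmp_flags(3, 5))` is True, so a BLT following that CMP would be taken.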
8. software interrupt: 1 inst = SWI (it's not a hardware interrupt). It causes the processor to switch to ARM state and enter supervisor (SVC) mode. It loads the SWI vector addr (addr 0x08 in the vector table) into the PC. This is not a non-maskable interrupt (NMI) vector; offset 0x08 holds the NMI vector only in the differently organized Cortex-M vector table. This 16 bit inst has an 8 bit comment field which can be used by the SWI handler; the field is ignored by the processor.
Thumb-2 instructions: These include all of Thumb-1 ISA + new 16 bit inst + new 32 bit inst + 32 bit equiv inst for all 16 bit inst. Total inst count is over 100. Some inst from here are part of the v7-M arch only (i.e. they are not supported in the v6-M arch)
1. Arithmetic: ADR, CPY, RSB, REV*, SXT*/UXT* have 16 bit encodings; SDIV/UDIV, RBIT, BFC/BFI and SBFX/UBFX are 32 bit only
ADR => load a PC relative addr into a reg
SDIV/UDIV = signed/unsigned divide
CPY => copy reg to reg (older name for MOV)
RSB => reverse subtract
REV/REV16/REVSH => reverse byte order (individual bytes are not reversed or modified). REV is for a full word, REV16 (listed as REVH in some older docs) is for half words (both half words are reversed separately), while REVSH reverses the lower half word and then sign extends the result with its MSB.
RBIT => reverses bit order in a data word. Useful for processing serial bit streams in data communication, where the entire stream needs to be reversed
BFC/BFI => bit field clear (BFC), bit field insert (BFI)
SBFX/UBFX => signed and unsigned bit field extract
SXTB/SXTH, UXTB/UXTH = used to extend a byte/HW into a word. S=sign extend with MSB (bit [7] for byte or bit [15] for HW), U=unsigned, value is 0 extended to 32 bits
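What these data manipulation inst compute can be modeled on 32 bit values in Python. This is a behavioral sketch only (the Python function names simply mirror the mnemonics, and bit field arguments are given as lsb and width, which is an assumption of this sketch):

```python
MASK32 = 0xFFFFFFFF

def rev(x):      # reverse the 4 bytes of a word
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")

def rev16(x):    # reverse the 2 bytes inside each half word separately
    x &= MASK32
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)

def sxtb(x):     # sign extend bit [7] into bits [31:8]
    return (((x & 0xFF) ^ 0x80) - 0x80) & MASK32

def sxth(x):     # sign extend bit [15] into bits [31:16]
    return (((x & 0xFFFF) ^ 0x8000) - 0x8000) & MASK32

def uxtb(x):     # zero extend a byte
    return x & 0xFF

def uxth(x):     # zero extend a half word
    return x & 0xFFFF

def revsh(x):    # swap the 2 low bytes, then sign extend the half word
    return sxth(((x & 0xFF) << 8) | ((x >> 8) & 0xFF))

def rbit(x):     # reverse all 32 bits
    return int(f"{x & MASK32:032b}"[::-1], 2)

def ubfx(x, lsb, width):   # extract a field, zero extended
    return (x >> lsb) & ((1 << width) - 1)

def sbfx(x, lsb, width):   # extract a field, sign extended
    sign = 1 << (width - 1)
    return ((ubfx(x, lsb, width) ^ sign) - sign) & MASK32

def bfc(x, lsb, width):    # clear a field
    return x & ~(((1 << width) - 1) << lsb) & MASK32

def bfi(rd, rn, lsb, width):  # insert low `width` bits of rn into rd at lsb
    mask = ((1 << width) - 1) << lsb
    return ((rd & ~mask) | ((rn << lsb) & mask)) & MASK32
```

For example, `rev(0x12345678)` gives 0x78563412, while `rev16(0x12345678)` gives 0x34127856.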
2. barrier instructions: new 32 bit inst, to force a memory/inst barrier. A barrier forces all mem accesses/inst before it to complete, before allowing mem accesses/inst coming after it to complete. This may be needed in complex mem systems, where out of order execution can cause race conditions. None of the 3 inst below can be coded directly in a high level language, so they are accessed via functions defined in a CMSIS compliant device driver library, i.e. void __DMB(void); //function defn for DMB inst
DMB: data mem barrier. It forces all mem access before it to complete, before new mem access can be done. This is helpful in multi processor systems, where shared mem is used.
DSB: data sync barrier. It forces all mem access before it to complete, before allowing inst coming after it to complete.
ISB: inst sync barrier. It forces all inst before it to complete, before allowing inst coming after it to complete.
3. move inst to rd/wrt special reg:
MRS: move contents of special reg (i.e APSR, IPSR, PSR, MSP, PSP, etc) to general purpose reg. This causes rd of special purpose reg.
MSR: move contents of general purpose reg to special reg. This causes wrt of special purpose reg. MRS is used in conjunction with MSR as part of rd-modify-wrt seq (ex: to update a PSR to clear Q flag)
4. hint inst: 16 bit
SEV = send event, causes an event to be signaled to all processors within a multiprocessor system. It also sets the local event reg to 1.
WFE = sleep and wait for event
WFI = sleep and wait for interrupt. This inst puts the processor to sleep until a wakeup event happens
NOP = no operation
5. branch: 16 bit
CBZ = compare and branch if zero
CBNZ = compare and branch if non-zero
BLX = branch indirect with link. The register form (BLX Rm) is supported. The form with an immediate offset, which switches to ARM state, existed in traditional ARM processors but is unsupported on cores that only run Thumb (such as Cortex-M).
IT = if then. Allows up to 4 succeeding inst to be conditionally executed. It avoids branch penalties, as there is no change to pgm flow.
6. misc: 16 bit
SVC = supervisor call. Causes an SVC exception
BKPT = breakpoint
CPS (CPSIE/CPSID) = change processor state
7. All 32 bit equiv inst for the 16 bit inst above. These include 16 bit inst from Thumb-1 as well as from Thumb-2 (ex: if the total number of 16 bit inst were 50, then there are 50 equiv 32 bit inst in Thumb-2 ISA)
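For the IT inst in the branch group above, the way the mnemonic assigns a condition to each following inst can be sketched in Python. Each extra letter after "IT" is T (then, same condition) or E (else, inverse condition); the first inst always uses the base condition. The helper name `it_conditions` is made up, and this models only the assembler-level meaning, not the mask encoding.

```python
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS",
    "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}

def it_conditions(mnemonic: str, cond: str):
    """Return the condition applied to each of the 1-4 inst after an IT."""
    m = mnemonic.upper()
    suffix = m[2:]
    if not m.startswith("IT") or len(suffix) > 3 or any(ch not in "TE" for ch in suffix):
        raise ValueError("expected IT followed by up to three T/E letters")
    conds = [cond]                       # 1st inst: always the base condition
    for ch in suffix:
        conds.append(cond if ch == "T" else INVERSE[cond])
    return conds
```

For example, `it_conditions("ITTE", "EQ")` returns `["EQ", "EQ", "NE"]`: the first two inst execute if equal, the third if not equal.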
ARM instructions:
A32 is pretty much similar to T32, except that it has no 16 bit inst. So, it can loosely be considered a subset of T32 with some minor changes. A64 is a separate, newer ISA for 64 bit processors, and competes with x86_64.