ARM arch and mem
- Last Updated: Sunday, 14 June 2020 07:26
- Published: Saturday, 28 March 2020 18:10
ARM profile:
Starting in the early 2000s, ARM decided to diversify its product portfolio to cater to different market segments. It developed architecture version 7 (v7) and defined 3 profiles, each targeting a different segment: A=Application profile, R=Realtime profile and M=Microcontroller profile. It branded the processor families in these 3 profiles as "Cortex" processors. The arch version before v7 was v6, which was used in the ARM11 family of processors.
ARM's very first Cortex core was the 32 bit Cortex-M3, introduced in 2004. It's a microcontroller core (M), meant for embedded use in microcontrollers, and is based on the v7-M arch. Cortex-M0 and Cortex-M1 were designed later with the smallest instruction set in the Cortex family, to get the smallest silicon die (a minimal Cortex-M0 core can be built in on the order of 10K gates, where a gate refers to a 2 i/p NAND gate, which is very impressive. In contrast, modern x86 processors have billions of transistors). M0 and M1 are based on the older v6-M arch, which evolved from the v6 arch. Together with the later M0+, these are the only Cortex processors from the v6 arch; the rest of the Cortex processors are from arch v7 and above.
Then in 2005, ARM came up with a 32 bit Application core (A), the Cortex-A8. It was basically a full blown processor core for use in high performance SOCs. The Cortex-A8 was the first Cortex design to be adopted on a large scale in consumer devices. In 2012, ARM introduced its first 64 bit cores: the powerful Cortex-A57, and the energy efficient alternative, the Cortex-A53.
The real time (R) profile started with the Cortex-R4 (around 2006), followed by the Cortex-R5 in 2011. These cores are optimized for hard realtime and safety critical applications. The R profile is similar to the A profile, but adds features which make it more fault tolerant.
The M profile is 32 bit, and is the most popular profile, present in billions of embedded devices. The 64 bit A profile is most popular in consumer handheld devices such as phones, tablets, wearables, etc.
These are the 3 profiles:
1. Application profile (A): Cortex A5, A8, A9, A15, A32, A53 thru A77 series, used for complex OS and user apps. It is the only profile to support a 64 bit ISA. A5, A8, A9, A15, A32 are 32 bit, while A53 onwards are 64 bit. In the late 2000s, more and more applications were being developed for 64 bit processors from Intel/AMD, and ARM had to move to 64 bit. With 64 bit, ARM moved to the newer v8 architecture. All A profile cores support the virtual memory system arch (VMSA), based on a memory management unit (MMU). This is essential, as some OSes require an MMU in order to work. These 64 bit ARM processors are the ones used in most of the SOC chips found in phones, watches, handhelds, etc. They easily run at over 1GHz clk frequency. For ex, the Raspberry Pi Broadcom SOC has A53 in Pi 3, and A72 in Pi 4. Many companies now prefer to design their own cpu based on the v8-A arch by licensing the arch, instead of licensing RTL for Cortex A cores. This is done to differentiate their products, as anyone using a standard Cortex core from ARM can't do much better than a competitor using the same core. For ex, the SOCs in early iPads and iPhones used Cortex A8 and A9 cores from ARM, but starting with iPhone 5, Apple started designing its own cores. The first, based on the ARM v7-A arch, was called "Swift" (the SOC had 2 of these cores). For iPhone 5S and the later iPad Air and Mini, Apple designed an in-house 2 core processor called "Cyclone", based on the ARM v8-A arch. Later SOCs from Apple had multiple performance cores and multiple efficiency cores on the same chip (for ex, the A12X Bionic SOC from Apple has four high-performance cores and four high-efficiency cores).
More details on various ARM v8-A cores: https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_cores
More info on Apple processors here: https://en.wikipedia.org/wiki/Apple-designed_processors
A profile supports both ARM (A64, A32) and Thumb (T32) inst sets.
2. Real Time profile (R): Cortex R4-R8 series, used for real time apps in embedded systems. Has an FPU for high perf. These cores support the protected memory system arch (PMSA), based on a memory protection unit (MPU).
R profile supports both ARM (A32) and Thumb (T32) inst sets.
3. MicroController profile (M): Cortex M0-M3 series, used for embedded microcontrollers (instead of 8051). M3 was the very first Cortex core (2004), followed by M1, M0 and then M0+ (2012). There are other variants such as M4, M7, M23, etc. M profile supports only the Thumb ISA (no support for the ARM ISA). There are 2 variants of the T32 ISA: Thumb-1 and Thumb-2. Thumb-2, being a superset of Thumb-1, is what almost all M profile cores support, so Thumb-1 on its own is more or less obsolete.
Cortex M0/M1: supports a subset of the Thumb-2 (T32) inst set.
Cortex M3: supports a more complete set of the Thumb-2 (T32) inst set. More inst are supported here (mostly 32 bit equiv of 16 bit inst), as it's more powerful.
So, the T32 ISA is the most widely used (as it's used in M profile), and we'll concentrate on it. A32 and A64 are more complex ISAs with a lot more instructions. NOTE: when we say that a Cortex core supports the T32, A32 or A64 ISA, it supports only a subset of that ISA (not every instruction in it). We need to look in the reference manual for that processor to know exactly which inst it supports.
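Since T32 mixes 16 bit and 32 bit inst in the same stream, a decoder has to tell them apart from the first halfword. A minimal Python sketch of that check is below: in Thumb-2 encoding, a first halfword whose top 5 bits are 0b11101, 0b11110 or 0b11111 starts a 32 bit inst, anything else is a complete 16 bit inst. The function name is made up for illustration, and this only looks at the length, it does not check whether a given core actually implements the inst.

```python
def thumb_instruction_size(first_halfword):
    """Return the size in bytes (2 or 4) of a T32 inst, given its first halfword."""
    if not 0 <= first_halfword <= 0xFFFF:
        raise ValueError("expected a 16 bit halfword")
    top5 = first_halfword >> 11  # bits [15:11]
    # These three prefixes mark the first half of a 32 bit encoding.
    if top5 in (0b11101, 0b11110, 0b11111):
        return 4
    return 2


if __name__ == "__main__":
    for hw in (0x2001, 0xBF00, 0xE7FE, 0xF000, 0xF8D0):
        print(hex(hw), "->", thumb_instruction_size(hw), "bytes")
```

For ex, 0x2001 (MOVS r0, #1) and 0xBF00 (NOP) are 16 bit inst, while a halfword of the form 0xF000 (the first half of a BL) starts a 32 bit inst.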
ARM architecture:
Besides the 3 profiles and different ISAs, ARM defines different versions of the arch. These arch are still evolving. A few ARM arch are: ARMv4T, ARMv5E, ARMv6, ARMv6-M, ARMv7-A, ARMv7-R, ARMv7-M, and ARMv8-A => the architecture gets more complex, but faster (with pipelining) as we go from v4T to v8. Arch v4T thru v6 were used in classic ARM cores, and so are not relevant for the Cortex family. The suffix A, R or M after the version num indicates the profile that arch is used in. Starting from version v7, an arch was defined for each profile. These arch capture the ISA and profile info, so each Cortex processor is tied to one of these arch:
ARMv6-M = For M profile. Implemented in Cortex M0/M0+/M1. Simplest arch. These are the only Cortex cores that are based on v6. All other Cortex cores are based on v7/v8 arch.
ARMv7-M = For M profile. Implemented in Cortex M3
ARMv7-R = For R profile. Implemented in Cortex R4 thru R8
ARMv7-A = For A profile. Implemented in Cortex A8 thru A17, which were all 32 bit
ARMv8-A = For A profile. Implemented in Cortex A32, and A53 thru A77, which were all 64 bit (except A32, which was 32 bit but still has v8 arch). Most complex arch. Minor revisions such as v8.1, etc have since been released. This competes with the most advanced processors developed by Intel/AMD.
So, M profile uses v6, v7. R profile uses v7, while A profile uses v7, v8.
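The arch to core mapping above is easy to keep as a small lookup table. The Python sketch below holds only a few representative cores per arch (not the full "thru" ranges), and the names ARCH_TABLE, arch_of and versions_per_profile are made up for illustration. It also derives the "which profile uses which version" summary from the table instead of hard coding it.

```python
# arch name -> profile letter and a few representative cores from the list above
ARCH_TABLE = {
    "ARMv6-M": {"profile": "M", "cores": ["Cortex-M0", "Cortex-M0+", "Cortex-M1"]},
    "ARMv7-M": {"profile": "M", "cores": ["Cortex-M3"]},
    "ARMv7-R": {"profile": "R", "cores": ["Cortex-R4", "Cortex-R5", "Cortex-R8"]},
    "ARMv7-A": {"profile": "A", "cores": ["Cortex-A8", "Cortex-A9", "Cortex-A15", "Cortex-A17"]},
    "ARMv8-A": {"profile": "A", "cores": ["Cortex-A32", "Cortex-A53", "Cortex-A57", "Cortex-A72", "Cortex-A77"]},
}


def arch_of(core):
    """Return the arch name a core is tied to, e.g. 'Cortex-M3' -> 'ARMv7-M'."""
    for arch, info in ARCH_TABLE.items():
        if core in info["cores"]:
            return arch
    raise KeyError(core)


def versions_per_profile():
    """Group arch versions by profile, e.g. {'M': {'v6', 'v7'}, ...}."""
    result = {}
    for arch, info in ARCH_TABLE.items():
        version = arch.split("-")[0].replace("ARM", "")  # "ARMv7-M" -> "v7"
        result.setdefault(info["profile"], set()).add(version)
    return result


if __name__ == "__main__":
    print(arch_of("Cortex-A53"))
    print(versions_per_profile())
```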
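General ARM arch: ARM arch is RISC. It has a uniform register file, and a load/store design (only load/store inst access memory). In the 32 bit ISAs (A32/T32), all core reg are 32 bit; A64 uses 64 bit reg. The reg set described below is the one seen in the M profile: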
13 general purpose registers - R0 to R12
R0 to R7 are LO reg, and can be accessed by all 16 bit Thumb inst and all 32 bit Thumb-2 inst.
R8 to R12 are HI reg, and can be accessed by all 32 bit Thumb-2 inst, but not by all 16 bit Thumb inst.
3 special purpose registers - R13, R14, R15
R13 = SP=stack pointer. It can be MSP=Main stack pointer or PSP=Process stack pointer; only one of these 2 reg can be accessed at any time. Usually we would expect only 1 SP, but having 2 SP allows 2 separate stack memories to be set up. MSP is the default SP and is used by code that requires privileged access, while PSP is used by unprivileged code. So, MSP is used by the OS kernel as well as exception handlers, while PSP is used by user application code. Having 2 SP prevents a stack error in the user application (thread mode) from corrupting the stack used by the OS (handler mode). Since PUSH and POP operations are always word aligned (i.e addr = 0x0, 0x4, 0x8, etc), SP has its lowest 2 bits tied to 00.
R14 = LR=Link reg. Used to store the return pgm counter when a subroutine/function is called. Even though PC bit 0 is always 0, LR bit 0 is readable/writable and is therefore not guaranteed to be 0. This LSB set to 0 indicates ARM state, while 1 means Thumb state.
R15 = PC=pgm counter. Bit 0 of PC is always 0, as inst addr are half-word aligned. However, in branching, either by writing to PC or using a branch inst, the LSB of the target addr should always be set to 1 to indicate Thumb state operation. Setting it to 0 will cause a switch to ARM state, which may not be supported, causing a fault exception. NOTE: even though the LSB is written as 1, the branch takes place with LSB=0, as the LSB is tied to 0 and can't be changed.
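The role of bit 0 in LR and in branch targets can be modelled in a few lines. The Python sketch below is only a host-side illustration of the rule described above (function names are made up): bit 0 of the written value selects the state, while the actual fetch addr always has bit 0 cleared.

```python
MASK32 = 0xFFFFFFFF


def branch_target(value):
    """Split a value written to PC into (fetch addr, requested state)."""
    state = "thumb" if value & 1 else "arm"
    address = (value & ~1) & MASK32  # bit 0 of PC is tied to 0
    return address, state


def lr_for_thumb_call(return_addr):
    """Value a Thumb state call leaves in LR: return addr with bit 0 set."""
    return (return_addr | 1) & MASK32


def faults_on_thumb_only_core(value):
    """On a core with no ARM state (M profile), a target with bit 0 = 0 faults."""
    return branch_target(value)[1] == "arm"


if __name__ == "__main__":
    lr = lr_for_thumb_call(0x08000100)
    print(hex(lr), branch_target(lr), faults_on_thumb_only_core(lr))
```

So returning via an LR of 0x08000101 fetches from 0x08000100 in Thumb state, while writing 0x08000100 to PC would request ARM state.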
Special purpose reg: These reg can only be accessed via the MRS and MSR inst. They are not mapped to mem; like R0-R15, they live inside the core.
PSR = pgm status reg. Divided into 3 = APSR (application psr), IPSR (intr psr), EPSR (execution psr)
Interrupt mask reg = PRIMASK, FAULTMASK, BASEPRI, etc
Control reg = CONTROL. Bit 1 of this reg controls which SP is used in thread mode. If it's 1, PSP is used in thread mode, while if it's 0, MSP is used for both thread and handler mode. Handler mode always uses MSP.
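The SP selection rule fits in one small function. The Python sketch below models it on the host side; the mode strings and function names are made up for illustration, and SPSEL is the name ARM documentation uses for CONTROL bit 1. It also models the "lowest 2 bits tied to 00" behaviour of SP.

```python
THREAD = "thread"
HANDLER = "handler"
SPSEL = 1 << 1  # CONTROL bit 1


def active_stack_pointer(mode, control):
    """Return which banked SP ('MSP' or 'PSP') R13 refers to right now."""
    if mode == HANDLER:
        return "MSP"  # handler mode always uses MSP
    if mode != THREAD:
        raise ValueError("mode must be 'thread' or 'handler'")
    return "PSP" if control & SPSEL else "MSP"


def write_sp(value):
    """SP ignores writes to its lowest 2 bits, so it is always word aligned."""
    return value & 0xFFFFFFFC


if __name__ == "__main__":
    print(active_stack_pointer(THREAD, 0b10))   # user code on the process stack
    print(active_stack_pointer(HANDLER, 0b10))  # exception handler
    print(hex(write_sp(0x20001FFF)))
```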
memory map: Since addr is 32 bits for T32/A32, addr from 0x0000_0000 to 0xFFFF_FFFF can be accessed, resulting in 4GB of addr space. This space is divided into several segments. Each segment has particular attributes, such as whether it can be written, can be cached, etc.
1. 0x0000_0000 to 0x1FFF_FFFF (code segment) => Bottom 0.5GB is for pgm code. This is where we have flash mem or ROM to store our whole pgm
2. 0x2000_0000 to 0x3FFF_FFFF (on chip sram segment) => Next 0.5GB is for pgm data. This is where we have sram or volatile mem to rd/wrt our data (as stack, var, etc)
3. 0x4000_0000 to 0x5FFF_FFFF (on chip peripheral segment) => Next 0.5GB is for peripheral devices. This is where the AHB/APB reg of all on chip peripheral devices are mapped
4. 0x6000_0000 to 0x9FFF_FFFF (off chip sram segment) => Next 1.0GB is for external volatile mem. Mem regions 1,2 and 4 above are the only ones from where code execution is allowed
5. 0xA000_0000 to 0xDFFF_FFFF (off chip peripheral segment) => Next 1.0GB is for external peripheral devices
6. 0xE000_0000 to 0xFFFF_FFFF (system segment) => Last 0.5GB is for mem mapped reg. This contains all system reg (i.e IPR, SCR, CPUID, etc), ROM tables, and some vendor specific area. Except for a small part of this space, transactions to this segment don't appear as rd/wrt on the AHB bus, as these reg are in the NVIC, which is internal to the processor. The processor supports only word size accesses in the range 0xE0000000 - 0xEFFFFFFF.
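The 6 segments above can be written down as a table and used to classify any addr. The Python sketch below does that; the table and function names are made up for illustration, and the "code execution allowed" flag follows the note that only regions 1, 2 and 4 allow execution.

```python
# (start, end, name, code execution allowed)
MEMORY_MAP = [
    (0x0000_0000, 0x1FFF_FFFF, "code", True),
    (0x2000_0000, 0x3FFF_FFFF, "on-chip SRAM", True),
    (0x4000_0000, 0x5FFF_FFFF, "on-chip peripheral", False),
    (0x6000_0000, 0x9FFF_FFFF, "off-chip SRAM", True),
    (0xA000_0000, 0xDFFF_FFFF, "off-chip peripheral", False),
    (0xE000_0000, 0xFFFF_FFFF, "system", False),
]


def region_of(addr):
    """Return (segment name, executable?) for a 32 bit addr."""
    if not 0 <= addr <= 0xFFFF_FFFF:
        raise ValueError("address must fit in 32 bits")
    for start, end, name, executable in MEMORY_MAP:
        if start <= addr <= end:
            return name, executable
    raise AssertionError("memory map has a gap")


def total_size():
    """Sum of all segment sizes in bytes; should cover the whole 4GB space."""
    return sum(end - start + 1 for start, end, _, _ in MEMORY_MAP)


if __name__ == "__main__":
    for addr in (0x0000_0004, 0x2000_0000, 0x4000_1000, 0xE000_E100):
        print(hex(addr), region_of(addr))
    print(total_size() == 4 * 1024**3)
```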
The processor contains a bus matrix that arbitrates the processor core and optional Debug Access Port (DAP) memory accesses to both the external memory system and to the internal NVIC and debug components. Transactions are routed as follows:
1. All accesses below 0xE0000000 or at and above 0xF0000000 appear as AHB-Lite transactions on the AHB-Lite master port of the processor.
2. Accesses in the range 0xE0000000 to 0xEFFFFFFF are handled within the processor and do not appear on the AHB-Lite master port of the processor.
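The routing rule is just a range check on the addr. A minimal Python sketch is below; the function names and return strings are made up for illustration. It also encodes the restriction that only word size accesses are supported in the internal range (other sizes are assumed to be fine elsewhere, which is a simplification).

```python
INTERNAL_START = 0xE000_0000
INTERNAL_END = 0xEFFF_FFFF


def route(addr):
    """Return where the bus matrix sends an access: 'internal' or 'AHB-Lite'."""
    if INTERNAL_START <= addr <= INTERNAL_END:
        return "internal"  # handled inside the processor (NVIC, debug)
    return "AHB-Lite"      # appears on the AHB-Lite master port


def access_supported(addr, size_bytes):
    """Internal range takes word accesses only; elsewhere byte/halfword/word."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    if route(addr) == "internal":
        return size_bytes == 4
    return True


if __name__ == "__main__":
    for addr in (0xDFFF_FFFF, 0xE000_0000, 0xEFFF_FFFF, 0xF000_0000):
        print(hex(addr), route(addr))
```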
NVIC: Nested Vectored Interrupt Controller: NVIC provides nested intr support, i.e intr can be programmed to different priority levels, and depending on priority levels, a new intr can preempt the currently running intr. Whenever the processor gets an intr, it jumps to the appropriate intr handler and executes that code. The addr of the intr handler is stored in the vector table.
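Nesting can be pictured as a stack of active handlers. The Python sketch below is a toy model, not how the NVIC is implemented: it assumes the Cortex-M convention that a numerically lower priority value means higher urgency, and that an intr of equal priority does not preempt. The function name and the intr names are made up for illustration.

```python
def dispatch(active_stack, name, priority):
    """Try to enter a handler. Return True if it preempts and starts running.

    active_stack is a list of (name, priority) tuples, innermost handler last.
    Assumption: lower priority number = more urgent.
    """
    if active_stack:
        running_priority = active_stack[-1][1]
        if priority >= running_priority:
            return False  # stays pending until the running handler returns
    active_stack.append((name, priority))
    return True


if __name__ == "__main__":
    stack = []
    print(dispatch(stack, "UART", 3))   # nothing running, handler starts
    print(dispatch(stack, "TIMER", 1))  # more urgent, nests on top of UART
    print(dispatch(stack, "GPIO", 2))   # less urgent than TIMER, must wait
    print(stack)
```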
The very bottom of the code segment holds this vector table. Addr 0x0000_0000 has the initial value of MSP. From addr 0x0000_0004 onwards, we store the handler addr for exception #1 thru exception #255, one word per exception. Addr 0x0000_0004 stores the handler addr for the reset exception, addr 0x0000_0008 stores the handler addr for the NMI (non maskable interrupt) exception, and so on. Exceptions up to #15 are system exceptions (reset, NMI, faults, system calls, etc). Exception #16 onwards are external intr, which are activated when an external intr line is pulled active by an external device. Since exception numbers stop at #255, there can be at most 240 external intr lines; the actual number depends on the core and the implementation (for ex, Cortex-M0 supports up to 32). Depending on which one is activated, the processor reads the handler addr stored at that entry of the vector table and jumps to it.
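Since each entry is one word, the location of a vector is simply 4 times the exception number. The Python sketch below computes entry addr and builds a tiny little-endian table image on the host to show the layout. All names, the handler addr and the MSP value are made up for illustration, and the table is assumed to sit at addr 0 as described above. Handler addr are stored with bit 0 set, as they are Thumb code.

```python
import struct

FIRST_EXTERNAL_EXCEPTION = 16


def vector_address(exception_number, table_base=0):
    """Addr of the vector table entry for an exception (#1 = reset, #2 = NMI)."""
    if not 1 <= exception_number <= 255:
        raise ValueError("exception number must be 1..255")
    return table_base + 4 * exception_number


def irq_to_exception(irq):
    """External intr line n is exception number 16 + n."""
    return FIRST_EXTERNAL_EXCEPTION + irq


def build_table(initial_msp, handlers):
    """Pack a table image: word 0 = initial MSP, word n = handler of exception n."""
    words = [0] * (max(handlers) + 1)
    words[0] = initial_msp
    for number, addr in handlers.items():
        words[number] = addr | 1  # bit 0 set marks Thumb code
    return struct.pack("<%dI" % len(words), *words)


def read_entry(table, exception_number):
    """Read back the word stored for an exception from a table image."""
    (word,) = struct.unpack_from("<I", table, vector_address(exception_number))
    return word


if __name__ == "__main__":
    table = build_table(0x20002000, {1: 0x100, 2: 0x120, irq_to_exception(0): 0x200})
    print(hex(vector_address(1)), hex(read_entry(table, 1)))
    print(hex(vector_address(irq_to_exception(0))), hex(read_entry(table, 16)))
```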
--------