DRAM Memory

When we talk about volatile memory, almost always we talk about DRAM (more specifically SDRAM). Wikipedia: https://en.wikipedia.org/wiki/Synchronous_dynamic_random-access_memory

Good article on basics of DRAM (taken from NXP): nxp_ddr_dram_basics

A DRAM memory module that you buy from the market (a long stick with pins along one edge) is a DIMM. It has multiple memory chips on it (usually 8 or 9). These are the actual memory chips that have the memory array in them. Each chip supplies 8 bits of the DRAM data bus (chips with 4 or 16 data bits also exist). So, 8 chips supply 64 bits of data bus, and a 9th chip, when present, widens it to 72 bits (the extra 8 bits are typically used for ECC).

Various gen of SDRAM:

1. SDR (or SDR SDRAM): This is the single data rate SDRAM. The term "SDRAM" is sometimes used to mean single data rate DRAM, but it properly stands for Synchronous DRAM, so "SDR SDRAM" is the clearer name for the single data rate kind. This was the 1st generation of SDRAM, which transfers data once every clock cycle (i.e. data is sent or received on only one clock edge, typically the rising edge). Clock speeds were from 66MHz to 133MHz. Supply voltage was 3.3V.

2. DDR (or DDR SDRAM): This is the double data rate SDRAM. To double the speed without increasing the clock speed, DDR SDRAM was introduced, which allows data to change twice every clock cycle: once on the rising edge and once on the falling edge. This effectively doubled the bandwidth, as we get double the data rate at the same clock speed.

JEDEC standard for naming memory chips:

Specifying only the clock speed for memory chips wouldn't make sense, as that would imply that DDR memory has the same speed (i.e. the same bandwidth) as SDR memory at that clock. So, JEDEC came up with a standard that allows the effective speed or bandwidth to be specified.

  • Speed convention: This specifies the effective clock speed of the memory module, with SDR/DDR as prefix followed by the effective clock speed. It is of the form SDR-xxxx or DDR-xxxx, where xxxx refers to the effective clock speed (strictly, the transfer rate in MT/s) of the memory stick. So, SDR mem with a 200MHz clk would be named SDR-200, while DDR mem with a 200MHz clk is named DDR-400 (a 200MHz clk on DDR mem is effectively a 400MHz clk, as both rise/fall edges are used). We also use nomenclature such as DDR2-800 to further clarify that it's DDR2 memory.
  • Bandwidth convention: This specifies the effective bandwidth (bytes/sec) of the memory module, with PC as prefix followed by the bandwidth in MB/sec. Memory sticks are identified as PC-xxxx, where xxxx refers to the effective bandwidth (in MB/sec) of the stick. So, a DDR2-800 memory stick with a 64 bit data interface transfers 8 bytes * 800 MT/s = 6400MB/sec. So, it's named PC-6400 or, more precisely, PC2-6400.
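
The two naming conventions above boil down to simple arithmetic, which can be sketched as below. The function names and parameters are made up for illustration (they are not from any JEDEC document), and the sketch assumes a DDR-type module with a 64 bit (8 byte) data i/f. Real module labels round the result for some speed grades (e.g. the 133MHz parts are sold as PC-2100), which this sketch does not try to reproduce.

```python
def ddr_speed_name(bus_clock_mhz, generation=""):
    """Speed name, e.g. DDR-400: two transfers per bus clock cycle."""
    transfer_rate = bus_clock_mhz * 2          # MT/s, the "effective clock speed"
    return f"DDR{generation}-{transfer_rate}"


def pc_bandwidth_name(transfer_rate_mt_s, bus_bytes=8, generation=""):
    """Bandwidth name, e.g. PC2-6400: MT/s times bytes per transfer."""
    bandwidth_mb_s = transfer_rate_mt_s * bus_bytes
    return f"PC{generation}-{bandwidth_mb_s}"


print(ddr_speed_name(200))                      # DDR-400
print(pc_bandwidth_name(400))                   # PC-3200
print(ddr_speed_name(400, generation="2"))      # DDR2-800
print(pc_bandwidth_name(800, generation="2"))   # PC2-6400
```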

DDR Generations:

Various generation of DDR were introduced starting from year 2000. Below are the 5 gen of DDR memory as of 2020. None of the DDR mem are backwards compatible with SDR mem as the supply voltage was reduced from 3.3V which was the supply voltage of SDR mem. 

  • DDR1: Here, clock speeds were 133MHz to 200MHz. However, since data was rd/wrt twice every cycle, effective clock speed was 266MHz to 400MHz. Supply voltage was 2.5 V. For a 133MHz SDR to be converted to DDR, the clock rate of internal RAM operations wasn't changed. Instead, internally in the SDRAM, 2 data bits (per data pin) were fetched on every +ve edge of the clock. This was done by allowing 2 parts of the mem array to dump out the 2 bits in parallel. These were put in a 2 bit prefetch queue. Externally, the data bus grabbed bits from this prefetch queue 2 times every cycle, one on the rising edge of clk and the other on the falling edge of clk.
    • Typical DDR SDRAM clock rates are 133, 166 and 200 MHz (7.5, 6, and 5 ns/cycle), generally described as DDR-266, DDR-333 and DDR-400. Corresponding 184-pin DIMMs (having 64 bit or 8 byte data i/f) are known as PC-2100, PC-2700 and PC-3200. DDR-550 also available.
  • DDR2: DDR2 SDRAM is very similar to DDR1 SDRAM, but doubles the minimum read or write unit again, to four consecutive words. It does this by doubling the bus clock rate without increasing the clock rate of internal RAM operations; instead, internal operations are performed in units four times as wide as SDR SDRAM. The prefetch queue depth was doubled from 2 bits to 4 bits, so the internal memory dumps 4 bits of data every internal cycle (due to the doubling of memory width). The external memory i/f clk runs at twice the internal clk and still transfers 2 bits per bus cycle (one on rising edge and one on falling edge), i.e. 4 bits per internal cycle. Also, an extra bank address pin (BA2) was added to allow eight banks on large RAM chips. Supply voltage was 1.8 V.
    • Typical DDR2 SDRAM clock rates are 200, 266, 333 or 400 MHz (periods of 5, 3.75, 3 and 2.5 ns), generally described as DDR2-400, DDR2-533, DDR2-667 and DDR2-800. Corresponding 240-pin DIMMs (still has 8 byte data i/f) are known as PC2-3200 through PC2-6400. DDR2-1066 and DDR2-1250 also available.
  • DDR3: DDR3 continues the trend, doubling the minimum read or write unit to eight consecutive words (prefetch queue depth of 8). This allows another doubling of bandwidth and external bus rate without having to change the clock rate of internal operations, just the width. Supply voltage was reduced to 1.5 V. DDR3 was mass adopted around 2008. DDR3 allowed max memory of 16GB per DIMM. DDR3 also has low voltage version called DDR3L which operates at 1.35V.
    • Typical DDR3 SDRAM clock rates are 400, 533, 666 or 800 MHz, generally described as DDR3-800, DDR3-1066, DDR3-1333 and DDR3-1600. Corresponding DIMMs (still has 8 byte data i/f) are known as PC3-6400 through PC3-12800. DDR3-2800 also available.
  • DDR4: DDR4 improves speeds further, but NOT by increasing the prefetch depth. Depth is still kept at 8, but banks are divided into more selectable bank groups where transfers to different bank groups may be done more rapidly. Internal banks are increased to 16 (4 bank select bits), with up to 8 ranks per DIMM. Supply voltage VDD/VDDQ was reduced to 1.2 V, with a 2.5 V auxiliary supply for wordline boost called VPP. DDR4 was introduced in 2014 and is still the most widely used memory as of 2022. DDR4 allowed max memory of 64GB per DIMM. DDR4 has NO low voltage version. DDR4 memory is supplied in 288-pin DIMMs, similar in size to 240-pin DDR3 DIMMs, by placing the pins more closely (0.85mm vs 1mm).
    • Typical DDR4 SDRAM clock rates are 800 to 1600 MHz, generally described as DDR4-1600 to DDR4-3200. DDR4-4800 also available.
  • DDR5: DDR5 is a major departure from previous gen mem, in that it has active circuitry on the DIMM, which makes the interface to the DIMM different from the interface to the RAM chips themselves. DDR5 DIMMs are supplied with management interface power at 3.3 V, and use on-board circuitry and associated passive components to convert to the lower voltage required by the memory chips. Final voltage regulation close to the point of use provides more stable power. The spec was released in 2020, and has yet to get mass adoption (as of 2022). Max DIMM capacity is 512GB.
    • Typical DDR5 SDRAM clock rates are 2400 MHz, generally described as DDR5-4800. It's quite a feat that we have clks running at 2.4GHz on PCB traces connecting 2 chips. Earlier, such Multi GHz clks could only run inside a chip generated locally from PLL.
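
The prefetch idea running through the generations above can be checked with a little arithmetic: each generation widens the internal fetch so that the memory array clock can stay low while the per-pin data rate grows. Below is a minimal sketch, using the prefetch depths quoted in the list; the names PREFETCH_DEPTH, array_clock_mhz and bus_clock_mhz are made up for illustration.

```python
# Prefetch depth (bits fetched per data pin per internal cycle), per the list above.
PREFETCH_DEPTH = {"DDR1": 2, "DDR2": 4, "DDR3": 8, "DDR4": 8}


def array_clock_mhz(generation, data_rate_mt_s):
    """Internal array clock needed to sustain the given per-pin data rate."""
    return data_rate_mt_s / PREFETCH_DEPTH[generation]


def bus_clock_mhz(data_rate_mt_s):
    """External bus clock: two transfers per cycle on a DDR interface."""
    return data_rate_mt_s / 2


for gen, rate in [("DDR1", 400), ("DDR2", 800), ("DDR3", 1600), ("DDR4", 3200)]:
    print(gen, rate, "MT/s: bus clk", bus_clock_mhz(rate),
          "MHz, array clk", array_clock_mhz(gen, rate), "MHz")
```

Under these assumptions DDR-400, DDR2-800 and DDR3-1600 all work out to the same 200MHz array clock, while DDR4-3200 (prefetch still 8) needs 400MHz, which is why DDR4 had to look elsewhere (bank groups) for extra speed.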

 

Low Power DDR (LPDDR):

What we talked about above was regular DDR mem used in laptops and desktops. LPDDR is a variant of DDR that consumes less power and is targeted for laptops, tablets and mobile phones. LPDDR technology standards are developed independently of DDR standards. LPDDR allows 16 and 32 bit data i/f in addition to the 64 bit data i/f that is std in regular DDR mem modules. Their mem capacity is smaller than that of their desktop counterparts, as they are smaller in form factor.

  • LPDDR1: This is a slightly modified form of DDR1 to reduce power consumption. Power supply is reduced from 2.5V to 1.8V. LPDDR1 only had one voltage for all circuits (VDD) at 1.8 V. Additional savings come from temperature-compensated refresh (DRAM requires refresh less often at low temperatures), partial array self refresh, and a "deep power down" mode which sacrifices all memory contents. Additionally, chips are smaller, using less board space than their non-mobile equivalents.
  • LPDDR2: Similar to the previous low power version, but more power efficient than DDR2. LPDDR2 created separate supplies for the data bus (VDDQ), the command/address bus (VDDCA) and peripheral circuits (VDD2) at 1.2 V, keeping VDD (which powered the main memory array), now renamed VDD1, at 1.8 V. We refer to the operating voltage of LPDDR2 as 1.2V, even though the main internal capacitive memory runs at 1.8V.
  • LPDDR3: LPDDR3 offers a higher data rate, greater bandwidth and power efficiency, and higher memory density. It went mainstream in 2013, running at 800 MHz DDR (1600 MT/s). With a 64 bit data i/f, that implies 1600 MT/s * 8 bytes = 12800MB/sec, which is comparable to notebook memory from 2011. Samsung introduced the first 4Gbit 20nm LPDDR3 module capable of transmitting 2133MT/s. Supply voltages of LPDDR3 were the same as those of LPDDR2.
  • LPDDR4: Doubling of i/f speed to 1600MHz (3200MT/s), while consuming 50% less energy. This was achieved partially by lowering VDDQ/VDD2 slightly to 1.1V (from 1.2V) and removing VDDCA, but still keeping VDD1 at 1.8V. Hence we refer to the operating voltage of LPDDR4 as 1.1V.
  • LPDDR5: Spec for LPDDR5 was published in 2019. It doubled the speed to 6400MT/s, and used differential clocking. It has a bunch of power saving techniques. LPDDR5 created two possible values for VDD2, 0.9 V (low) or 1.05 V (high), depending on the frequency the memory is running at, also allowing VDDQ to be between 0.3 V and 0.5 V. VDD1 was kept the same as previous gens at 1.8V.
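
Since LPDDR allows narrower data i/f than the 64 bits of a regular DIMM, the peak bandwidth depends on both the data rate and the i/f width. The sketch below tabulates this for the data rates quoted in the list above; the 16/32/64 bit widths are the ones mentioned earlier, and the names used are made up for illustration. These are theoretical peaks, not measured numbers.

```python
# Per-pin data rates (MT/s) quoted in the list above.
LPDDR_RATES_MT_S = {"LPDDR3": 1600, "LPDDR4": 3200, "LPDDR5": 6400}


def peak_bandwidth_mb_s(data_rate_mt_s, bus_bits):
    """Peak bandwidth in MB/s = transfers per second * bytes per transfer."""
    return data_rate_mt_s * bus_bits // 8


BANDWIDTH_TABLE = {
    gen: {bits: peak_bandwidth_mb_s(rate, bits) for bits in (16, 32, 64)}
    for gen, rate in LPDDR_RATES_MT_S.items()
}

for gen, row in BANDWIDTH_TABLE.items():
    print(gen, row)
```

Note how each doubling of the data rate lets the i/f be half as wide for the same peak bandwidth: 64 bit at 1600MT/s, 32 bit at 3200MT/s and 16 bit at 6400MT/s all come out to 12800MB/sec.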

 

Graphics DDR (GDDR):

We talked about DDR mem above, which is used mainly for the cpu, but we also have DDR mem designed specifically for GPUs, which require much higher bandwidth. These are called GDDR, and you see this mentioned on computers which have a discrete graphics card. Just like regular DDR gens, we have GDDR1, GDDR2, etc. Transfer rate for GDDR5 is about 100GB/sec.

 

DDR Memory interface and Commands:

The bus that connects the memory pins on the microprocessor to the pins on the memory chips (DIMM on the motherboard) is known as the memory bus interface. There are a bunch of cmds that are driven on these pins that dictate the operation.

There are 2 good articles on "Introduction to DRAM operation":

Intro to DRAM => https://www.allaboutcircuits.com/technical-articles/introduction-to-dram-dynamic-random-access-memory/

Basic operation => https://www.allaboutcircuits.com/technical-articles/executing-commands-memory-dram-commands/

Let's look at organization of DRAM memory chips and the DIMM:

DIMM: Contains multiple onboard DRAM chips. DIMM will have the memory size of DIMM as well as the organization detail as 2Rx8 etc.

  • DIMM module organization:
    • Rank: The rank of a DRAM module is the highest level of organization within a DIMM. A rank is a separately addressable set of DRAMs within a DIMM. In earlier DIMMs, all DRAM chips on a DIMM would be addressed by a given addr. This was because each DRAM chip provided some bits of the final data bus that was output from the DIMM. In case of a DIMM with a 64 bit data bus o/p, 8 DRAM chips, each 8 bits wide, would provide the 64 bits. So, the DIMM had 1 rank. However, later with memory capacity increasing and more and more DRAM chips on a single DIMM, we could separate the DRAM chips into groups, where each group behaves like a single DIMM. One group has no relation to the other groups. So, ranks can be considered to be like different DIMMs, except that they are physically on a single board.
  • DRAM chip organization:
    • Bank: Bank is the next level of organization below a rank. A bank is an organization of memory arrays within a DRAM chip. So, while rank refers to organization at the DIMM level, bank refers to organization at the DRAM chip level. Each bank operates independently of the others. This means that reading, writing, and precharging can all be done on one bank without impacting the others. Each bank may have multiple memory arrays, where each array is a set of rows and columns of memory. Each memory array outputs 1 bit, so the output width of a bank indicates the number of arrays it has. Therefore in a x4 DRAM chip, the internal banks would each have four memory arrays. Only 1 bank is accessed at a time in each DRAM chip, so banks can be thought of as ranks within a DRAM chip.
      • NOTE: Multiple Banks may also be combined to form a Bank Group.
    • Rows/Columns: Banks are further divided into rows and columns, which are grouped into memory arrays. Each memory array outputs 1 bit as explained above.
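
To make the organization labels concrete, here is a minimal sketch that reads a label like 2Rx8 as "2 ranks of x8 chips" and works out how many chips the DIMM carries and its data capacity. The chip density is not part of the label, so it is passed in as an assumption; the function names and the ecc flag are made up for illustration.

```python
import re


def parse_organization(label):
    """Split a label such as '2Rx8' into (ranks, chip data width in bits)."""
    match = re.fullmatch(r"(\d+)Rx(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized organization label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def dimm_summary(label, chip_density_gbit, ecc=False):
    """Chip count and data capacity for a DIMM with a 64 bit data bus.

    chip_density_gbit is an assumed per-chip capacity (not encoded in the label).
    With ecc=True the bus is 72 bits wide; the extra chips hold check bits and
    are not counted towards data capacity.
    """
    ranks, chip_width = parse_organization(label)
    data_chips_per_rank = 64 // chip_width
    chips_per_rank = (72 if ecc else 64) // chip_width
    return {
        "ranks": ranks,
        "chips_per_rank": chips_per_rank,
        "total_chips": ranks * chips_per_rank,
        "capacity_gbyte": ranks * data_chips_per_rank * chip_density_gbit / 8,
    }


print(dimm_summary("2Rx8", chip_density_gbit=8))
print(dimm_summary("1Rx8", chip_density_gbit=4, ecc=True))
```

For example, a 2Rx8 DIMM built from (assumed) 8Gbit chips has 2 ranks of 8 chips each, i.e. 16 chips and 16GB, while a 1Rx8 ECC DIMM has the 9 chips mentioned at the top of this page.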

 

DRAM Chip Interface:

Though pins for DRAM chips vary a little from gen to gen and among different vendors, there are a few pins that are fundamental to the operation of DRAM. Several of these pins (clocks and strobes in particular) are provided as differential signals (i.e. complementary signals in a pair), as that provides better noise immunity. Major pins of DRAM are:

  • Clock: This is the main clock provided as i/p to the DRAM chip.
    • CMD/ADDR CLK CK (Differential clock signal, i/p): This is a pair of clock signals sent to the DRAM chip that is used to sample all the i/p signals for control and Addr (NOT to latch data signals) as shown below. This is differential (i.e. +ve and -ve clk) so that it has better noise immunity. On DDR chips, clk has historically been provided as a differential pair, as clk needs to be very clean and free of glitches, else wrong values may get captured. Cmd/Addr are sampled on both edges of clk, so this is a ddr clk. Cmd/Addr signals are lower speed, so this clk is usually slower than the data clk discussed later.
    • Clock Enable CKE => Some early generations of chips had a separate clock enable pin, which would enable the clk when this pin was high. CKE was pulled low for auto/self refresh cmd as those cmds didn't need a clk.
    • WRT DATA CLK WCK (Differential clock signal, i/p): This is pair of clock signals sent to the DRAM chip for Write data capture and Read data output. CMD/ADDR clk is a separate clk (discussed above) that can't be used as wrt data clk, as wrt data clk is 2X-4X faster to support higher freq writes. This clk is also used for outputting read data and read strobe signal from the DRAM chip back to SOC. Wrt data is centered b/w the rise and fall edge of wrt clk.
  • Control/Addr: These are control signals that determine the action to take.
    • Chip Select CS# (active low i/p) => This activates the selected DRAM chip when low. All other DRAM chips with CS# high are not activated. At the DIMM level, there is a bus of CS#[x:0] coming in, which goes thru a decoder, and that decides which DRAM chip to select. This is sampled on rising edge of Cmd/Addr clk.
    • Cmd/Addr CA[x:0] (i/p bus) => This is a bus that serves dual purpose. It has cmd phase and Addr phase. Various cmds such as read, write, precharge, etc are sent on cmd bus, followed by the Addr to which this cmd applies to. This is the most common approach on latest gen of DDR, though earlier chips had many more pins to achieve the same purpose. Some of these are listed below:
      • Bank Addr BA[x:0] => Since we may have multiple banks, there is a separate bank addr too on some of the chips. Bank Addr bus may be considered part of the same Addr bus, as it's functionally just one big set of addr.
      • Row/Col addr strobe RAS#/CAS# (active low i/p)=> For Addr, we didn't separate out row/col addr. Both of them are embedded into the Addr bus. Row and col addr are provided in 2 separate cycles. The cmd protocol itself may define which is col addr and which is row addr. In many chips, we have dedicated pins called RAS# (Row addr strobe) and CAS# (column addr strobe) which indicate when row addr is going on the Addr bus and when col addr is flowing on Addr bus. Both RAS# and CAS# are active low.
      • Write Enable WE# (active low i/p) => We need a separate line for saying whether it's read or wrt. When low, it indicates a wrt cmd, while high indicates a rd cmd. RAS, CAS and WE together define a cmd. Now these are all embedded within the CA[x:0] bus.
  • Data: This refers to Data signals to Rd or Write. There's also a strobe signal that goes with this data bus, that serves as the clk for the data signals
    • Data DQ[x:0] (bidirectional bus) => This is Data bus that carries the data to be written to memory (in case of write cmd) or data read from memory (in case of read cmd). Optional ECC (Error correcting Code) bits are also provided on few of these lines to help with Error correction. This bus is bidirectional.
    • Data Strobe DQS[x:0] (bidirectional bus) => This is the strobe signal for the Data bus. Since a clk is already provided to the DRAM chip as i/p to latch all incoming signals, the same clk could in principle be used for latching the rd/wrt data. The problem is that the clk may be shifted and not perfectly aligned with the data. So, a separate strobe that travels with the rd/wrt data is needed. For writes, data and strobe are both driven by the same clk on the SOC (system on chip), and are aligned with the help of a DLL (delay-locked loop). This data strobe is used to latch wrt data on the DRAM chip. However, if we have separate clk lines for wrt (Wrt Data Clk), then we don't need a strobe for writes, in which case this strobe signal is only used for reads. Reads work the same way in the other direction: rd data and strobe are both driven by the same clk (wrt clk) on the DRAM chip, aligned with the help of a DLL, and the strobe is used to latch rd data on the SOC. For this reason, DQS is bidirectional. DQS is edge aligned with read data, but centered in wrt data.
    • Data Mask DM[x:0] => These are data mask bits that can be used to mask data bytes that we don't want to write to DRAM chip. Since DM bits are similar to Wrt Data bits, they are also latched using Data strobe. Mask bits are usually on per byte basis. These Mask bits may also have other function, when wrt is not taking place.
  • Power: This refers to power supply and gnd signals.
    • VDDQ/VSSQ => These are the power supply for o/p ports of the DRAM chip. i/p ports will also need this power supply for the level shifter before the signal gets into the internal power domain. DRAM IO power follows the level of VDDQ input.
    • VDD/VSS => These power everything else except the IO ports. All internal circuitry, peripheral logic, DRAM memory, etc. runs off this supply. This was subdivided into 2 parts after a few gens to save power:
      • VDD1 => This is the main power that powers the capacitive memory array. The voltage here is usually the highest. Most critical voltage as memory array rd/wrt speed dependent on this voltage.
      • VDD2 => This is the peripheral power which powers all logic which are not on IO and not in capacitive memory array. This is kept lower than VDD1 to save on power. It can be lowered even further depending on Freq requirement of the DRAM.
    • VREF => This is a reference voltage provided as input.
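
The effect of the Data Mask (DM) bits described above can be modelled in a few lines: for each byte of a write, the DM bit decides whether the incoming byte replaces what is stored or is ignored. This is only a behavioural sketch. It assumes one DM bit per byte and that a set bit means "masked"; the polarity differs between DRAM generations, and the function name is made up for illustration.

```python
def masked_write(stored, incoming, dm):
    """Return the new memory contents after a write with per-byte data mask.

    stored, incoming -- bytes objects of equal length
    dm               -- one flag per byte; True (assumed polarity) keeps the
                        stored byte and ignores the incoming one
    """
    if not (len(stored) == len(incoming) == len(dm)):
        raise ValueError("stored, incoming and dm must have the same length")
    return bytes(old if masked else new
                 for old, new, masked in zip(stored, incoming, dm))


old = bytes([0x11, 0x22, 0x33, 0x44])
new = bytes([0xAA, 0xBB, 0xCC, 0xDD])
result = masked_write(old, new, [False, True, False, True])
print(result.hex())   # aa22cc44
```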

 

DRAM Basic cmds:

Though there are many cmds in the latest gen of DRAM chips, there are a few basic cmds that are needed for all DRAM chips. In chips which have separate cmd pins, a combo of CS, RAS, CAS, WE and CKE determines which cmd is going to be executed. In newer gen chips, the cmd is embedded within the cmd bus, as there are no separate cmd pins. Below are the 5 basic cmds for any rd or wrt to take place. We start with Activate to open a row, then one or more rd/wrt take place on that row, followed by Precharge to close the row before a different row in the same bank can be activated. Auto Refresh is issued every so often in between to preserve DRAM values.

  • Activate (row access): Activate is essentially the row access command. Meaning, it opens up a row and moves the charge from the capacitors into the sense amplifiers. Accessing a row is always done before a column in DRAM. This command is paired with inputs to a bank address register (that selects the current bank) and a row address register (that selects the desired row). One important note on the activate command is that whichever row is currently open remains open until a precharge command is issued (more on precharge later). To use this command most DRAMs require CS and RAS to be pulled low, while CAS and WE are pulled high.
  • Precharge (row precharge): Precharge deactivates the row currently open in a bank. When issued a precharge command, the DRAM is told to restore the values read from the row of capacitors. This is done by the sense amplifiers and when completed prepares the bank for another row access. Precharge is performed by pulling CS, RAS, and WE low and leaving CAS high.
  • Read (col access): The read command can also be thought of as a column read command. When combined with a proper bank address and column address, the data recently moved into the sense amplifiers from an activate command (row access) is now pushed onto the data bus. DRAMs often include a “Read and Auto-Precharge” command that performs the column read and then closes/precharges the row. This way, a separate precharge command need not be issued. If the same row, but a different column, needed to be accessed then a precharge would not be issued at all and the row would be left open. To use the read command CS and CAS are pulled low, while RAS and WE are pulled high. 
  • Write (col access): A write command is virtually the same as a read, except for the direction of the data. During a write command, data is pulled off of the data bus and put into the selected bank, row, and column. Auto-precharging can be performed much like a read and closes the currently activated row when the write is done. To perform a write, CS, CAS, and WE are pulled low, while RAS is held high.
  • Auto Refresh: In DRAM, the refresh command is issued every so often. It's needed since DRAM bits lose charge over time (irrespective of whether they are accessed or not), so all bits need to be refreshed periodically. One important aspect of refreshing is that any active banks should be precharged before the command is issued. To perform a refresh, CS, RAS, and CAS are pulled low with WE high. After refreshing, the DRAM keeps track of the last refreshed row and increments a refresh counter so that the next refresh command will operate on the next row. When a refresh command is issued, the current row in every bank is refreshed. Most DRAMs will perform 8192 refresh cycles every 64 ms. That's one refresh every 7.8125 μs (64 ms / 8192). This has remained constant despite growing device densities.
  • Other Cmds: Other common DRAM commands include NOP (No Operation), Burst Terminate, and Load Mode Register. NOP is used to force the DRAM to do nothing. This is useful when the DRAM needs to wait, for instance if it is currently being refreshed. In reality, read and writes to DRAM are done in short bursts. Burst terminate will truncate the read or write command, i.e., stop it prior to finishing. DRAM can be placed into different modes. These modes are changed via the Load Mode Register command.
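
The pin combinations listed above for the 5 basic cmds form a small truth table, which can be sketched as a decoder. Each argument is the electrical level on the pin (0 = low, 1 = high). Only the combinations described above are decoded; reporting a high CS# as "DESELECT" and lumping everything else into "OTHER" are simplifications made for this sketch. The second function reproduces the refresh interval arithmetic.

```python
# (RAS#, CAS#, WE#) levels with CS# low, as described in the list above.
COMMANDS = {
    (0, 1, 1): "ACTIVATE",
    (0, 1, 0): "PRECHARGE",
    (1, 0, 1): "READ",
    (1, 0, 0): "WRITE",
    (0, 0, 1): "REFRESH",
}


def decode_command(cs_n, ras_n, cas_n, we_n):
    """Map pin levels (0 = low, 1 = high) to a basic DRAM command name."""
    if cs_n == 1:
        return "DESELECT"          # chip not selected, pins are ignored
    return COMMANDS.get((ras_n, cas_n, we_n), "OTHER")


def refresh_interval_us(retention_ms=64, refresh_cycles=8192):
    """Average time between refresh cmds so that all rows get refreshed in time."""
    return retention_ms * 1000 / refresh_cycles


print(decode_command(0, 0, 1, 1))   # ACTIVATE
print(decode_command(0, 1, 0, 0))   # WRITE
print(refresh_interval_us())        # 7.8125
```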

 

LPDDR5:

We'll look at the LPDDR5 memory i/f from the JEDEC spec (the JEDEC spec is only available to members, but I'll list the important stuff below):