Chip design - clock tree

Clock is the most important signal in a chip. It runs all the synchronous elements in design, and is running all the time. It consumes about 30% of chip power, so it's important to design it efficiently.

In an ideal world, we want clk to reach all the sequential elements at the same time. That's called an ideal clock. However, in reality, clock needs to go thru multiple drivers before reaching the seq flops, and we need multiple drivers to drive 1000's of flops on chip. The goal of any clock design is to meet setup and hold timing for all paths, with minimal use of logic gates. There are 2 strategies for designing clock:

1. clock tree: In this kind of design, clk is built as a tree. So, we start with clk originating point in the center of chip, and then make branches and sub branches (as in a tree). The delay from clk starting point to final clk endpoint which drives the flop has to be the same for all diff flops on chip, in order to get as close as possible to an ideal clk. The difference in arrival time b/w different end points of clk is called skew. Ideally we want 0 skew. By minimizing skew, we can make clks as ideal as possible, and that requires fewer gates to fix setup/hold violations in paths. This design is the one that is used in most of the chips and SOCs being built today, as it consumes less power.

2. clock mesh: In this kind of design, clk is still built as a tree, but the leaf driver of each branch is then connected to other leaf drivers thru a mesh of wires. This mesh is similar to a power grid where you have X and Y wires running in different layers. Here there is an additional clock mesh connecting all final driver outputs. The advantage of this scheme is that the skew is minimized even further: since the leaf driver o/p are shorted together, they rise/fall at nearly the same time. The only remaining skew is from this final clk point (the input port of the module) to the final clk load being driven inside the module. However, this large skew reduction comes at a large power cost, since now we have a large grid that is being driven by clk drivers all the time. For very high speed designs (i.e couple of GHz clk speed), clock mesh is preferred. Clock mesh is still used in multi GHz processor design at Intel/AMD.
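As a small illustration of the skew definition above, the sketch below computes skew as the spread between the latest and earliest clk arrival at the flop clk pins. The flop names and delay values are made up for illustration; real numbers would come from a timing report.

```python
def clock_skew(arrival_ps):
    """Global skew: latest clk arrival minus earliest clk arrival (ps)."""
    return max(arrival_ps.values()) - min(arrival_ps.values())

# Hypothetical delays (ps) from clk starting point to each flop's clk pin.
arrivals = {"flop_a": 412.0, "flop_b": 405.0, "flop_c": 420.0}

skew_ps = clock_skew(arrivals)
print(f"skew = {skew_ps} ps")
```

A perfectly balanced tree would give identical arrival times and hence 0 skew.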

Clk tree design:

1. Cells used for clk tree are specifically designed to be used for clk tree, i.e you can't use regular inverters and buffers used in design for clk tree as they may not have same rise/fall time, or they may not be designed for continuous switching, etc. These clk cells usually have names starting with CK or CLK (e.g. CLKBUF, CKINV, etc). These clk cells have lots of vias in their layout to have better current flow, and also to mitigate IR drop and to reduce Electromigration effects.

2. Clk tree cells should be of same VT type, i.e if using LVT cells, use LVT for all cells on clk tree. Different VT types don't track each other well (i.e their correlation may not be good), so for a clk tree built using diff VT types, simulations may not capture real silicon variations, resulting in a broken chip. Usually LVT cells are used on clk tree for good clk slew (see bullet 7 below).

3. For very high freq clks, we use even more special clk cells which have inbuilt DeCap (Decoupling Capacitor) on supply lines to reduce Voltage variations on these cells. These cells are usually named DCCK*, and are used for clks > 1GHz freq.

4. Mix of buffers and inverters is not recommended on clk tree. This is again due to the fact that buffers and inverters may not track each other at different PVT corners. Usually only inverters are used.

5. All clk routes should be shielded so that they don't get any cross talk or noise from neighboring routes. Shielding is provided by running VDD or VSS lines parallel to these clk routed lines. Also clk routes are made thicker (double wide), so that their resistance and process variation is low. Places where it's not possible to have VDD/VSS lines, metal fills should be used, so that we have uniform clk skew for all clk routes. Also lots of vias should be used on clk routes to mitigate IR drop and have low Electro Migration (EM) effect.

6. Clk routes should not be placed under chip bumps. This is because a lot of current flows thru the chip bumps, and it can cause noise on these clk routes, resulting in unaccounted delay or crosstalk. Cross talk on clk nets shouldn't exceed a certain threshold, usually 24%.

7. Clk slew (rise or fall transition time) should be low, so that we have crisp edges for clk waveforms. Usually we try to keep slew to be less than 1/10 of cycle time. This requires having large drivers with low fanout on the output of drivers. We still try to get to as large of a fanout as possible without degrading the slew too much. In the past for tech > 90nm, HVT cells were used exclusively as clk cells. They provided good enough slew and were less leaky. As tech progressed towards < 3nm, these HVT cells started getting replaced by LVT and eventually by ULVT cells. Reason is that power supply voltages are dropping to < 0.5V for low nm tech, and HVT cells don't have very crisp edges due to very low voltage headroom (Vgs - Vth). Here, LVT cells are more commonly used on clk tree path. Even though they may be more leaky than HVT cells, they provide crisp edges at these low voltages. If we use HVT cells, then we may have to add more buffers to maintain a crisp edge. This may eventually cause an HVT clk tree to consume more dynamic power than an LVT one, even though HVT cells leak less. In clock tree path, we are more concerned about dynamic power than leakage power, since cells on clock tree are switching most of the time. So, LVT or ULVT cells are OK to use as clk cells.
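The "slew less than 1/10 of cycle time" rule of thumb from bullet 7 can be turned into a quick check. This is only a sketch: the function names are made up, and the 1/10 ratio is the guideline quoted above, not a limit from any particular library or tool.

```python
def max_clk_slew_ps(freq_hz, ratio=10):
    """Largest allowed clk slew (ps) if slew must stay under period/ratio."""
    period_ps = 1e12 / freq_hz
    return period_ps / ratio

def slew_ok(slew_ps, freq_hz, ratio=10):
    """True if a measured slew meets the period/ratio guideline."""
    return slew_ps <= max_clk_slew_ps(freq_hz, ratio)

# Example: a 1 GHz clk has a 1000 ps period, so the slew budget is 100 ps.
print(max_clk_slew_ps(1e9))
print(slew_ok(80.0, 1e9), slew_ok(120.0, 1e9))
```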

Clock uncertainty:

Clock period is usually defined as a fixed number, let's say freq=100MHz or period=10ns. However, if you look at 1000's of such successive clk waveforms, not each clk waveform period will be exactly 10ns. Some will be 9.9ns, while some may be 10.1ns and so on. You can view clk period as a gaussian curve whose mean is at 10ns, but sigma is around 0.1ns or so. The middle edge of the clk (i.e half period of the clk) also has uncertainty, just like the other edges. This variation from clock edge to clock edge across millions of clk cycles is called clk uncertainty. It can be from rise to rise edge (for full cycle) or rise to fall edge (for half cycle) or vice-versa. Uncertainty in middle edge of clk hurts more, since those are mostly half cycle paths (which have half the time to meet timing) and accounting for uncertainty leaves us with even less time to meet timing for that path. We have to account for this uncertainty in our timing runs, as worst case setup timing must now be checked against the smallest clk period that can be formed with these uncertainties. A timing tool has no knowledge of such clk deviation on its own, as it assumes an ideal clk at the freq that we specify.
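To make the effect on setup timing concrete, the sketch below subtracts an uncertainty number from the launch-to-capture window for full cycle and half cycle paths. The 0.3ns uncertainty is an assumed value for illustration only (the text above gives only a sigma of about 0.1ns), and the function name is made up.

```python
def available_setup_time_ns(period_ns, uncertainty_ns, half_cycle=False):
    """Time left for a path once clk uncertainty is removed from the window."""
    window_ns = period_ns / 2 if half_cycle else period_ns
    return window_ns - uncertainty_ns

# 10ns nominal period with an assumed 0.3ns clk uncertainty.
full = available_setup_time_ns(10.0, 0.3)
half = available_setup_time_ns(10.0, 0.3, half_cycle=True)
print(f"full cycle: {full:.2f} ns, half cycle: {half:.2f} ns")
```

The same uncertainty takes a proportionally bigger bite out of the half cycle window, which is why uncertainty on the middle edge hurts more.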

There are several sources of clk uncertainty:

1. jitter: jitter is the movement of clk edge due to clk generator ckt, noise, power supply variations, temperature variations, interference from nearby ckt, etc. It's the main component of clk uncertainty, and most of the time, people refer to jitter and clk uncertainty as the same thing. However, jitter is not the only component of clk uncertainty, though it's the major one. jitter is usually found from sims and may be expressed as Δt per mV of power supply variation.

There's a short article on various clk jitter terms here: https://vlsi.pro/clock-jitter/

RMS jitter: RMS (root mean square) value of clock period deviation over the selected number of cycles (usually a very large number of cycles are considered)

Peak to peak jitter: difference between maximum deviation & minimum deviation within the selected number of cycles (usually a very large number of cycles are considered)

Above jitter values are calculated over all clock cycles, i.e the 1st clock cycle or the 1000th clock cycle. This is known as the clock period jitter and is the most common way of reporting jitter. Sometimes we are also interested in the cycle to cycle jitter, which is the difference in period between two adjacent clock cycles, measured over a large number of clock cycles. This jitter value can be reported as either RMS jitter or Peak to peak jitter. This value is useful in DDR and other very fast GHz clks.
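The jitter terms above can be sketched numerically from a list of measured clk periods. This follows one common convention (period jitter measured against the mean period, cycle to cycle jitter from differences of adjacent periods); the sample periods are made up, and a real measurement would use a very large number of cycles.

```python
import math

def period_jitter(periods):
    """Return (RMS, peak-to-peak) of period deviation from the mean period."""
    mean = sum(periods) / len(periods)
    dev = [p - mean for p in periods]
    rms = math.sqrt(sum(d * d for d in dev) / len(dev))
    pk_pk = max(dev) - min(dev)
    return rms, pk_pk

def cycle_to_cycle_jitter(periods):
    """Return (RMS, peak) of the period difference between adjacent cycles."""
    diffs = [b - a for a, b in zip(periods, periods[1:])]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    peak = max(abs(d) for d in diffs)
    return rms, peak

# Hypothetical measured periods (ps) for a nominal 10ns clk.
periods_ps = [10000, 10100, 9900, 10050, 9950]

rms_ps, pk_pk_ps = period_jitter(periods_ps)
c2c_rms_ps, c2c_peak_ps = cycle_to_cycle_jitter(periods_ps)
print(f"period jitter: RMS={rms_ps:.1f} ps, pk-pk={pk_pk_ps:.1f} ps")
print(f"cycle to cycle: RMS={c2c_rms_ps:.1f} ps, peak={c2c_peak_ps} ps")
```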

Total jitter includes two components:

  • Bounded (or deterministic) jitter (DJ) => Deterministic jitter is a predictable and repeatable behavior which can be expressed as a peak-to-peak value. Deterministic jitter can be of 2 types:
    • data dependent jitter => It's correlated to the data stream. There are 2 types of data dependent jitter:
      • duty-cycle dependent jitter (also known as duty-cycle distortion or DCD) =>
      • Inter Symbol Interference (ISI) =>
    • bounded uncorrelated jitter (BUJ) => It's uncorrelated to the data stream. It's least understood in jitter family. It is considered “uncorrelated” since it’s statistically not possible to correlate it with other parts of the system. BUJ is caused by system phenomena, though. An example of BUJ would be noise inside a chip caused by sources external to the chip, be it power-rail ripple or RF interference. A number of tools are available to help model crosstalk and identify aggressor signals, but causes of BUJ are generally outside of the designer’s control.
    • Periodic jitter (PJ) =>
  • Unbounded (or random) jitter (RJ) => Random jitter is the unbounded jitter part of Total jitter, usually expressed as RMS value. It's not expressed as peak to peak value, as peak to peak is unbounded (can be infinite). It is mainly caused by unpredictable thermal noise, flicker noise or shot noise. These are all studied in device physics. Random jitter typically follows a normal distribution (gaussian curve). Since Total jitter plot of any clock shows a gaussian distribution, that implies that RJ is the main component of total jitter.

NOTE: Jitter arises because different clk edges occur at different times, i.e starting edge of clk may see a different voltage, temperature, etc than the next edge of clk. So, this will cause jitter and will need to be accounted for in timing runs, since now the full cycle may not be available for the paths due to jitter component. So, all full cycle setup timing runs need to consider this jitter in the clk path since the launch and capture are 2 different edges. However, full cycle hold timing runs don't consider this jitter, since both the launch and capture clk are on same edge and occur at almost same time. However, if one of the launch or capture clk paths is skewed so much wrt the other that they are shifted in time too much, then we'll need to consider jitter in hold path too. In timing tools, we can specify clk uncertainty for both setup and hold (usually jitter component is removed from hold clk uncertainty). However, jitter is considered for both setup and hold for half cycle paths, since they are time shifted.

When we calculate jitter for a clock path, we separate it out in 2 components:

  1. Source jitter: This is the jitter coming from PLL or clk source. For a 24MHz oscillator, it may be something like 3ps/mV. PLL may add its own jitter to this clk. So, for a PLL clk with freq of 1GHz, jitter may be 25ps (which is 2.5% of 1ns cycle time).
  2. Network jitter: This jitter is extra jitter added in clock network, where the clk is getting routed. This jitter is caused due to clk network supply voltage and temperature fluctuations. 5% power supply noise is very common on chips with < 1V power supply and varying IR drops. This can easily cause jitter of 50ps or so.

Total jitter = $\sqrt{jitter_{pll}^2 + jitter_{stage1}^2 + jitter_{stage2}^2 + \dots + jitter_{stageN}^2}$ where N is the number of stages in clk tree. jitter per stage is usually provided in a tabular form from running sims, and may be 5ps/stage or something like that. So, more the number of stages in clk tree, more the jitter.
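The root-sum-square formula above is easy to sketch in code. The 25ps PLL jitter and 5ps/stage numbers are the example values from the text; the 11 stage tree depth is an assumption picked for illustration.

```python
import math

def total_jitter_ps(pll_jitter_ps, stage_jitters_ps):
    """Root-sum-square of PLL jitter and per-stage clk network jitter."""
    sum_sq = pll_jitter_ps ** 2 + sum(j ** 2 for j in stage_jitters_ps)
    return math.sqrt(sum_sq)

# 25ps source jitter, 5ps per stage, assumed 11 stages in the clk tree.
stages = [5.0] * 11
print(total_jitter_ps(25.0, stages))

# Adding stages increases total jitter, but slower than a straight sum would.
print(total_jitter_ps(25.0, [5.0] * 20))
```

Because the terms add in quadrature, the 11 stages (55ps if summed linearly) raise the total only from 25ps to 30ps in this example.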

2. DCD (duty cycle distortion): This is applicable for half cycle paths only, so DCD doesn't contribute anything for full cycle paths. As mentioned above, DCD is actually "data dependent deterministic jitter". This also has 2 components:

  1. Source DCD: This is the DCD coming from PLL or clk source
  2. Network DCD: This DCD is extra DCD added in clock network, due to clk network supply voltage and temperature fluctuations. Network structural (mean) and variation (sigma) DCD are now handled natively in timing tools such as PrimeTime.

3. Convergence uncertainties: These are NOT real uncertainties caused by devices. Here we model some other effect of clock as uncertainty, when we do not yet know the impact of that effect. As an ex, consider Synthesis stage of design, where we are optimizing circuit. In synthesis stage, clock tree is assumed to be ideal (as clock tree is not built during synthesis, but during PnR stage). However, we do know that there will be some skew b/w different end points of clk. To model this skew, we set clk uncertainty equal to an educated guess for skew. That way, the logic gets optimized with that clk uncertainty, so that later in PnR stage, we do not get too many timing violations due to skew. So, here we modeled skew as "clk uncertainty", even though it's not a component of clk uncertainty. In PnR stage, the real skew gets added for each and every path, so there is no need to add this skew as clk uncertainty in that stage. Similarly clk ocv (on chip variation) and clk xtalk (cross talk) are also modeled as clk uncertainty during synthesis stage, so that synthesis stage sees clk uncertainty as close as possible to what will be seen after CTS build in PnR stage.

NOTE: clk uncertainty may be different for different clocks, depending on their PLL source, network and other effects unique to each clk.

Derate: There may still be unaccounted jitter and DCD in clk path. To model this, we add a derating factor, which basically worsens jitter and DCD by a given percentage. This provides us extra margin in simulations, in case silicon shows more jitter and DCD than what sims showed.

set_clock_uncertainty: All of the clock uncertainty components above are summed up and provided as a single number to the timing tool via "set_clock_uncertainty" cmd. More details on this cmd are in "PT- clock command" section.
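As a rough sketch of how the pieces above might be rolled into one number per check, the code below sums derated jitter, derated DCD (half cycle paths only) and a convergence margin, then prints the resulting set_clock_uncertainty lines using its -setup and -hold options. All values, the clk name core_clk and the helper function are assumptions for illustration; an actual flow may combine components differently.

```python
def clock_uncertainty_ns(jitter_ns, dcd_ns, margin_ns, derate=1.0,
                         half_cycle=False, include_jitter=True):
    """Sum uncertainty components; derate scales only jitter and DCD."""
    total = margin_ns
    if include_jitter:
        total += jitter_ns * derate
    if half_cycle:
        # DCD only matters for half cycle paths.
        total += dcd_ns * derate
    return total

# Assumed numbers: 30ps jitter, 10ps DCD, 50ps convergence margin, 10% derate.
jitter, dcd, margin, derate = 0.030, 0.010, 0.050, 1.10

# Full cycle paths: setup sees jitter, hold (same edge) does not.
setup_unc = clock_uncertainty_ns(jitter, dcd, margin, derate)
hold_unc = clock_uncertainty_ns(jitter, dcd, margin, derate,
                                include_jitter=False)

clk = "core_clk"  # hypothetical clock name
print(f"set_clock_uncertainty -setup {setup_unc:.3f} [get_clocks {clk}]")
print(f"set_clock_uncertainty -hold {hold_unc:.3f} [get_clocks {clk}]")
```

For half cycle paths, the same helper would be called with half_cycle=True and jitter kept in for both setup and hold.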

 

Clock Tree construction:

Clock tree construction is done during the backend PnR stage. It's not done during synthesis, as we assume ideal clk during synthesis. It's done during PnR, as clk tree construction needs placement information of all cells, to come up with optimal clk tree. Clk tree construction is also called CTS (Clk Tree Synthesis). Imp parameters for any clk tree are jitter, DCD, slew and skew.

We'll look at CTS in more detail under PnR section.