CDC

CDC = Clock Domain Crossing

In a single clock design, all paths have the same launch and capture clock. Timing tools are able to time all such paths for setup and hold. When we have more than one clock, what happens to paths in diff clks? Well, timing tools can still time all paths within a clock domain, i.e. all paths launched and captured by the same clk are still timed. What happens to paths that are launched from one clk and captured by the other clk? Such paths are still timed by timing tools, as all clocks are considered synchronous unless told otherwise. However the question to ask is => Should these paths even be timed? If the 2 clks are async, the capture clk can come at any time wrt launch clk, so there's really no timing that we can meet here. It's not like a regular clk->clk path that needs to meet setup/hold timing. So, in timing tools, we define all such paths crossing 2 clks as false paths (FP) (to be precise, we use SDC cmd "set_clock_groups -asynchronous -group <clk1> -group <clk2>" to set all paths b/w clk1 and clk2 to be async, meaning they are all FP).

However, do we need to do anything in design to make sure signals can transmit correctly from clk1 to clk2? Well, it turns out that a signal transmitting from clk1 to clk2 will randomly violate setup/hold of the capturing flop in clk2 domain some of the time, depending on the exact arrival time of clk2 wrt clk1. We have no control over the relative arrival time of clk1 and clk2, as they might be coming from 2 diff PLLs. Three possible scenarios:

  1. signal from clk1 meets setup time of clk2 (i.e. signal from clk1 comes way before clk2 capture) => New value of signal will be captured correctly at end of current cycle of clk2.
  2. signal from clk1 misses setup time of clk2: 2 scenarios are possible here:
    1. signal from clk1 comes in the setup/hold violation window of clk2 capture (i.e. signal comes right close to the clk2 capture edge) => Since the signal is in the violation window, we don't know what is going to get captured: is it the old value or the new value? In this situation, the capturing flop's output actually goes metastable (i.e. an intermediate value b/w 0 and 1) for a little while (maybe a few tens to hundreds of ps), before it makes its decision to remain at old value or move to new value. Either old value or new value of signal will end up captured at end of current cycle of clk2.
    2. signal from clk1 comes after the setup/hold violation window of clk2 capture (i.e. signal from clk1 comes way after clk2 capture) => Since the signal comes after the violation window, it missed getting captured at the end of current cycle of clk2. So, at the end of current cycle of clk2, the flop will have old value of signal, and then in next cycle, it'll have the new value of signal.

Depending on which scenario we end up in, we can either have old value or new value at the end of current cycle of clk2. But in next cycle of clk2, we are guaranteed to have new value of signal. Thus there exists a 1 cycle uncertainty, where the value may be old value or new value. This is shown in diagram below (See Async clk section in middle)
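The 3 scenarios above can be sketched as a small classifier. This is only an illustrative model: the function name, the argument names and the numbers in the usage lines are made up for this note, and a real flop's violation window comes from its library characterization.

```python
def capture_outcome(arrival, capture_edge, t_setup, t_hold):
    """Classify what a clk2 flop sees for a clk1 signal transition.

    All times are in the same unit (e.g. ns). The violation window is
    assumed to span [capture_edge - t_setup, capture_edge + t_hold].
    """
    if arrival <= capture_edge - t_setup:
        # Scenario 1: transition arrives before the setup window.
        return "new value captured this cycle"
    if arrival <= capture_edge + t_hold:
        # Scenario 2.1: transition lands inside the setup/hold window.
        return "metastable: resolves to old or new value"
    # Scenario 2.2: transition arrives after the hold window.
    return "old value this cycle, new value next cycle"


# Hypothetical numbers: capture edge at 10 ns, 100 ps setup, 50 ps hold.
for t in (2.0, 9.95, 10.5):
    print(t, "->", capture_outcome(t, 10.0, 0.1, 0.05))
```

In all 3 cases the new value is present by the following clk2 edge, which is the 1 cycle uncertainty described above.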

 

There's an excellent paper on CDC by Clifford Cummings at SNUG 2008: CummingsSNUG2008Boston_CDC.pdf

Based on the above paper, I found this nice 3 part series on CDC explaining it in further detail: https://www.verilogpro.com/clock-domain-crossing-part-1/

 

Metastability, MTBF and Synchronizers:

Once we capture the signal in clk2 domain (with a flop), can we just allow it to flow freely to other flops in clk2 domain? If we look at case 2.1 above, where the signal comes in the setup/hold window of clk2 domain, the captured signal may go metastable for a short time. If this signal is passed to other flops, then there are 2 problems:

  1. The first is a timing problem. From this 1st capture flop (we'll call it synchronizer flop) to other flops, we have only clk2 domain (no clk1 domain). That means all flops must meet setup/hold in clk2 domain. When STA tools analyze timing for paths, they assume no metastable state for any signal, i.e. they assume that the signal coming out of the launch flop has met timing and is a clean 0 or 1, and they time the path from launch flop to capture flop on that assumption. In our example, the metastable state has eaten into the timing of the path from the very first flop to the next flop, so when the STA tool reports that a particular path has met timing, there's no guarantee that it has indeed met timing. As an ex: if the metastable state lasted for 1/2 cycle of clk2, and the path delay is 3/4 cycle of clk2 (from synchronizer flop to some other connecting flop), then the STA tool will say that the path met timing with a slack of +1/4 cycle, but in reality the path failed timing with a slack of -1/4 cycle.
  2. The second problem is that metastability is a probabilistic phenomenon. We can never say that metastability will resolve 100% within a given time. The probability of the flop output still being metastable falls exponentially with time, but it is nonzero after any finite time. So, our other capture flops may sample the synchronizer flop output while it is still metastable. Each capturing flop will resolve it to 0 or 1, but if there are multiple capturing flops, not all of them will necessarily resolve to the same value. Now, we have the possibility that the same signal gets captured as different values in different flops, while in RTL we assumed them all to be the same value. This may result in incorrect functionality of the gate level circuit even though RTL may work fine. This is an unacceptable failure.

Metastability is measured as failure rate or MTBF (Mean time b/w failures). Here's a good paper from TI on measuring MTBF for flops => https://www.ti.com/lit/an/scza004a/scza004a.pdf

How long the flop o/p remains metastable depends on a flop time constant, known as "tau". Formula for probability that metastable state is unresolved at time tr is:

f(r) = e^(-tr/tau) where function f(r) is the probability of nonresolution as a function of resolve time allowed, tr, and the metastability decay time constant, tau (value dependent on electrical characteristic of flop). As an ex, if tau=0.2ns, then at time=1ns, the probability that the metastable state is still present = e^-(1/0.2) = 0.007 or 0.7%.
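As a quick check of the example numbers, here is a minimal sketch of the formula. The function name is made up for this note; tr and tau just need to be in the same unit.

```python
import math


def prob_unresolved(tr, tau):
    """f(r) = e^(-tr/tau): probability that the flop is still metastable
    after resolve time tr, for decay time constant tau (same unit as tr)."""
    return math.exp(-tr / tau)


# Example from the text: tau = 0.2 ns, tr = 1 ns.
p = prob_unresolved(tr=1.0, tau=0.2)
print(f"{p:.4f}")  # 0.0067, i.e. about 0.7%
```

Doubling tr to 2 ns squares the probability (e^-10, roughly 0.0045%), which is why giving the flop more time to settle helps so much.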

Probability of generation of metastable event = Probability of incoming signal being in setup/hold window of capturing flop = t0/Td where t0=setup/hold window of capturing flop, and Td= time period of data transition (dependent on launch freq of clk1 and how many times data transitions per clk period of clk1).

Failure within time tr = f(r) * probability of generation of metastable event = f(r)*t0/Td

From Probability eqns:

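  • If failure rate = λ, then cumulative failures since time=0 is λ*t (this is from derivation of probability density function as shown in MTBF doc). So, failure rate (λ) = cumulative failures / t

Using above eqn, we come up with Failure rate (λ) = Failure within time tr / tr = f(r)*t0/Td*(1/tr). Writing Fd=1/Td for the data transition freq, and taking tr as roughly one period of the capture clk (so 1/tr = Fc, i.e. one capture attempt per clk2 cycle), this becomes λ = f(r)*t0*Fd*Fc.

MTBF = 1/(λ) = e^(tr/tau)/(t0*Fd*Fc) where Fd=1/Td is the data transition freq and Fc=clk freq of capturing clk, assuming we are capturing this metastable value in a flop running at Fc.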

Thus we can get MTBF for a 1 flop synchronizer, which tells us how often we'd expect a failure if we feed the output of the first synchronizer flop into any other flop in clk2 domain. However, if we add 1 more flop to capture the output of 1st sync flop, then net MTBF = MTBF1 * e^(tr2/tau) = e^(tr1/tau)/(t0*Fd*Fc) * e^(tr2/tau)

Thus MTBF for 2 flop synchronizer goes up exponentially by just adding 2nd stage. If we have N stages, and assume tr is same for all, then MTBF = e^(N*tr/tau)/(t0*Fd*Fc)

So, if MTBF for 1 flop sync was A*e^6 sec, then MTBF for 2 flop sync is A*e^12 sec (i.e. for A=1, MTBF goes from about 400 sec to about 160K sec).
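The MTBF formula can be played with using a small sketch like the one below. The function and argument names are made up for this note, and the numbers are hypothetical, picked only so that tr/tau = 6 and t0*Fd*Fc = 1 (i.e. A = 1); they are not from any real flop or datasheet.

```python
import math


def mtbf_seconds(tr, tau, t0, fd, fc, n_stages=1):
    """MTBF = e^(N*tr/tau) / (t0*Fd*Fc).

    tr, tau, t0 in seconds; fd, fc in Hz; n_stages = number of
    synchronizer flops, each assumed to get the same resolve time tr.
    """
    return math.exp(n_stages * tr / tau) / (t0 * fd * fc)


# Hypothetical values: tr/tau = 6 and t0*fd*fc = 1.
params = dict(tr=6e-9, tau=1e-9, t0=1e-10, fd=1e5, fc=1e5)
one_flop = mtbf_seconds(n_stages=1, **params)
two_flop = mtbf_seconds(n_stages=2, **params)
print(f"1 flop: {one_flop:.3g} s, 2 flop: {two_flop:.3g} s")
print(f"improvement factor: {two_flop / one_flop:.3g}")
```

With these values the second stage multiplies MTBF by e^6, roughly a factor of 400.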

So we almost always have a 2 stage synchronizer to transfer any signal from one clk to another clk. We may still have a metastable signal at the output of the 2nd stage, but the probability of that happening is so low (once in 10000 years or so) that we treat it as a 0 probability event. There's still a timing path from 1st flop to 2nd flop that may not meet timing (due to metastability). But we already know that, and we rely on the probability of the metastable state not resolving within 1 period being very low. To further help us, we don't put any logic delay in this path, so that the whole cycle time is available for the metastable value to settle.
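Structurally, a 2 stage synchronizer is just 2 back-to-back flops in clk2 domain with nothing in between. A purely behavioral sketch is below (class and method names are made up for this note). It does NOT model metastability at all; it only shows the structure and the latency that the synchronizer adds.

```python
class TwoFlopSync:
    """Behavioral model of a 2 flop synchronizer clocked by clk2."""

    def __init__(self):
        self.ff1 = 0  # 1st stage: may go metastable in real hardware
        self.ff2 = 0  # 2nd stage: output used by the rest of clk2 logic

    def clock(self, d):
        """Apply one clk2 rising edge with async input d; return ff2."""
        # Both flops update on the same edge: ff2 takes the old ff1 value,
        # and there is no logic between the 2 stages.
        self.ff2, self.ff1 = self.ff1, d
        return self.ff2


sync = TwoFlopSync()
outputs = [sync.clock(d) for d in [0, 1, 1, 1, 0, 0]]
print(outputs)
```

A change on d shows up on the output at the 2nd clk2 edge after it is first sampled.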

Transferring single signal from one clk to other clk:

We saw above that by having a 2 stage synchronizer, we can capture the signal and fan it out to all the logic in clk2 domain without any timing issues (i.e STA will time paths correctly). Is that all we need or is there something else that's needed? Let's explore.

When we transfer signals from 1 clk to other clk, our main objective is to preserve the shape of the signal waveform, i.e. if signal was going from 0->1->1->0->0->1 in clk1 domain, then this exact waveform should be seen in clk2 domain. However, as we saw above, this is not always possible. There will always be 1 cycle uncertainty in clk2 domain. Luckily this is not a problem as long as ALL the signal transitions are captured in clk2 domain. In other words, the signal intent should be maintained in clk2 domain - i.e. if signal was supposed to be 0 in 1st clk, then 1 in the 2nd clk, then 1 in the 3rd clk (all in clk1 domain), then this exact sequence of 0->1->1 should be captured in clk2. NOTE: clk2 freq may be very diff than clk1 freq, so signals may need to be elongated in clk1 domain to make sure they are captured in clk2 domain. Let's look at 3 freq scenarios separately:

  1. Launch clk freq = Capture clk freq => Here depending on relative arrival time of capture clk, the signal may either get captured correctly every cycle of clk2, or may get missed every cycle, or may be a mix of the two.
  2. Launch clk freq > Capture clk freq => Here since capture clk is slower, some of the transitions of signal in clk1 domain may get missed in clk2 domain.
  3. Launch clk freq < Capture clk freq => Here since capture clk is faster, all transitions of signal in clk1 domain will get captured correctly in clk2 domain.

These 3 scenarios are also shown at the bottom of the page in the link above. As we can see, the signal in clk1 domain should be at least as long as 1 cycle of clk2, otherwise it may not get captured in clk2. However, even 1 cycle of clk2 is not enough, since there's some setup/hold time requirement. So, we need to add a "delta" margin. The requirement for signals transferred from clk1 to clk2 boils down to this:

Src signal length > (1 cycle of dest clk + ε) where ε = delta margin. Usually we take 1/2 a cycle of clk2 as the "delta margin", so that it's easy for tools to verify it (as all sims/checkers etc are based on clk edges).

If we don't want to introduce half cycle paths, we can just have the signals crossing clk domains to be at least as wide as 2 cycles of destination clk. This works well in practice. We can achieve it in 2 ways:

  1. We know the freq of clk2 in advance from spec. We code the RTL so that the signal crossing into dest clk is held constant for as many cycles of clk1 as needed (i.e. signal held stable for cycles=ceil(2*Tdest/Tsrc); we round up, since rounding to nearest could leave the signal shorter than 2 dest cycles)
  2. However if the dest clk or src clk freq are not fixed values, but may change based on settings, then above approach may not work. Or we may have to design cumbersome logic to take care of all the cases. In such cases, a feedback loop is employed where signal in clk1 domain is not allowed to change until clk2 domain acks that it received the signal. This is a very robust solution, as it guarantees that the signal will be captured correctly no matter what. This is what is usually employed.
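For approach 1, the number of src clk cycles to hold the signal can be worked out as below. This is a small sketch with a made-up function name; it assumes we round up so that the signal is never shorter than the required number of dest cycles. The clk periods in the usage lines are hypothetical.

```python
import math


def hold_cycles(t_dest, t_src, dest_cycles=2):
    """Number of clk1 (src) cycles to hold a signal so that it spans at
    least dest_cycles periods of clk2 (dest). Periods in the same unit."""
    return math.ceil(dest_cycles * t_dest / t_src)


# Hypothetical periods in ns.
print(hold_cycles(t_dest=10, t_src=4))    # fast src, slow dest -> 5
print(hold_cycles(t_dest=12, t_src=10))   # 2.4 src cycles needed -> 3
print(hold_cycles(t_dest=2.5, t_src=10))  # slow src, fast dest -> 1
```

In the 2nd case rounding to nearest would give 2 cycles = 20 ns, which is less than 2 dest cycles (24 ns), so rounding up is the safe choice.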

CDC rules for transferring single signal from one clk to other clk:

  1. The signal needs to go to a single synchronizer with at least 2 flops in the synchronizer. 1 flop synchronizer may also work for slow clk speed on capture side. There should be no logic in between the 2 flops of synchronizer so as to give a full clk cycle for the metastable signal to settle down. The same signal can't go to more than 1 synchronizer in capture clk domain, as then the values in the synchronizers may be off by a clk cycle (due to 1 cycle uncertainty). This may cause the same signal to have 2 different values at a given time, which is not what the RTL intended and will cause gate sim failures.
  2. The signal needs to be at least 1.5 cycles of the capture clk wide, so as to allow the signal to be captured on the receiving side. Most of the time feedback logic on receiving side sends an ACK signal back to the source indicating that the signal has been captured.
  3. There shouldn't be any combo logic on the data path crossing clk domain. This is because combo logic may cause short glitches, which may get captured by the synchronizer of receiving clk domain, which then passes this incorrect pulse to all other logic on receiving side.

Transferring multiple signals from one clk to other clk:

Above we looked at transferring one signal from clk1 to clk2. An example of this would be "interrupt" signals. As long as each of these interrupt signals gets captured correctly, our function in clk2 will work fine. But let's assume that multiple signals have a relation between them, and we need to maintain that relation in clk2 domain. An ex of this would be a 2 bit binary counter counting from 0 to 3. The 2 bits cnt[1] and cnt[0] go from 00->01->10->11. If we have 2 stage synchronizers for each of them, then they may get time shifted by 1 cycle wrt each other, depending on which bit got its new value in the current cycle and which still has its old value. This is shown in diagram below.

FIXME
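Until the diagram is in place, the problem can be illustrated with a small sketch. It enumerates the values clk2 might see during the uncertain cycle, assuming each bit independently ends up with either its old or its new value (function name is made up for this note).

```python
from itertools import product


def possible_samples(old, new, width):
    """Values clk2 may see in the uncertain cycle when every bit has its
    own synchronizer and independently shows its old or new value."""
    seen = set()
    for picks in product((0, 1), repeat=width):
        value = 0
        for bit, pick_new in enumerate(picks):
            src = new if pick_new else old
            value |= ((src >> bit) & 1) << bit
        seen.add(value)
    return sorted(seen)


# Binary count 01 -> 10: both bits toggle, so 00 and 11 may also appear.
print(possible_samples(0b01, 0b10, width=2))  # [0, 1, 2, 3]
# 00 -> 01: only one bit toggles, so only old or new value is possible.
print(possible_samples(0b00, 0b01, width=2))  # [0, 1]
```

Whenever more than 1 bit toggles in the same clk1 cycle, clk2 can see a value that the counter never held.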

The problem happened because here we needed to transfer a group of signals as 1 entity which needs to maintain the waveform relation amongst each other at all times. Separate synchronizers aren't going to solve this issue. So, we need to come up with other solutions. There are multiple solutions as listed below:

  1. Combine signals: Combine multiple signals into one signal, and then transfer it. As an ex, if we have a 2 bit counter that we want to xfer from one clk to other clk, we can have counters in both clk domains, and then just use a signal called cntr_inc or something that goes high whenever the counter increments in clk1 domain. We can use this signal to increment counter in clk2 domain, and that way both counters will be in sync, without transferring the counter values from clk1 to clk2.
  2. Gray Coding: Make only one of the signals toggle at a time, if it's possible to do so. Gray counters are an ex.
  3. MCP Formulation (aka load ctrl structure): Make a MCP (multi cycle path) formulation, also known as load-ctrl structure. Here we make a separate signal that acts as a ctrl signal for the set of signals that we want to pass across to other clk domain.
  4. Stabilize Using XOR: Wait for all signals to be stable before capturing them. In this technique we capture each signal independently thru independent synchronizers, but then we don't pass it on to subsequent logic. For each bit, we put an extra flop after the synchronizer and XOR the synchronizer output with this delayed copy to detect a toggle, and then we OR all these XOR values. An OR value of 0 implies that there is no change occurring in any synchronizer, implying all of them are stable. So, in next cycle, we capture the whole data bus into a subsequent set of flops, and from here on the data values are passed to subsequent logic. This is a very expensive technique. We need 2 extra flops after the synchronizer for each data bit, on top of a bunch of XOR/OR gates. Not used in practice.
  5. FIFO: Make a FIFO which will allow signals to be passed continuously without waiting for each of them to be consumed. We can think of a "MCP formulation" as a "1 deep" FIFO. If we put an extra flop, we can have a very simple "2 deep" FIFO. To make a "n deep" FIFO, we need a more complicated scheme. Here we need multiple ctrl signals (which are addr of FIFO which indicate next entry to be read or written). They all need to be synced. FIXME. Put a diagram.
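For the Gray coding option (which is also a common way to sync FIFO addr pointers across domains), the standard binary-reflected Gray code conversion is sketched below. Function names are made up for this note; the 2 bit width matches the counter ex above.

```python
def bin_to_gray(n):
    """Binary -> binary-reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Gray code -> binary, by XOR-ing in successively shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


codes = [bin_to_gray(i) for i in range(4)]
print([format(c, "02b") for c in codes])  # ['00', '01', '11', '10']

# Consecutive codes (including the wrap 10 -> 00) differ in exactly 1 bit,
# so a 1 cycle uncertainty can only give the old or the new count.
for i in range(4):
    diff = codes[i] ^ codes[(i + 1) % 4]
    assert bin(diff).count("1") == 1
```

Note that the single-bit-change property at wraparound only holds when the counter range is a power of 2.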

 

CDC tools:

Spyglass was the most popular tool for CDC analysis.

RDC tools: