Uncategorised

Details: Published: Friday, 05 January 2018 18:47; Hits: 506

Digital refers to high (power supply) and low (ground) signals. high is indicated by 1 and low by 0. Any circuit that is designed to operate on these high/low signals as inputs and generate high/low signals as outputs is referred as digital logic.

Using transistors, it's possible to design digital circuits which will compute AND, OR, XOR and many other complicated digital functions. Before getting into logic details, let's look at some basic digital concepts.

boolean algebra: A and B are digital (0 or 1)

AND = A.B

OR = A + B

NOT = ? = ~A

4 important logic reduction equations:

1. A + A.B = A => A.(1+B)=A

2. A + ?.B = A+B => A+A.B+?.B=A+(A+?).B=A+B

3. (A+B).(A+C) = A+B.C => A.A+A.C+A.B+B.C = A.(1+C+B)+B.C = A+B.C

4. (A+B).(?+C) = A.B + ?.C => A.?+B.?+A.C+B.C=

De Morgan's law: (can be extended to any number of variables, not limited to 2 variables)

1. ~(A+B+C+...X) = ~(A) . ~(B). ~(C). ... ~(X)

2. ~(A.B.C....X) = ~(A) + ~(B) + ~(C) + ... + ~(X)

Any logical eqn can be written as POS (product of Sum) or SOP (sum of product). SOP is referred as minterm while POS as maxterm.

SOP (AND-OR): F = x1.x2 + x3.x4

POS (OR-AND): F = (x1+x2).(x3+x4)

Shannon's expansion theorem: useful to compare logical equations

F(x1,x2,...,xn) = x1?.F(0,x2,...,xn) + x1.F(1,x2,....xn) = (x1?+F(1,x2,...,xn)).(x1+F(0,x2,...,xn))

Shannon's thm gives rise to canonical form: where there exists only form for each eqn. This is useful for comparing various eqn.

Minterm canonical representation: (SOP form)

F(x1,x2,...,xn) = F(0,0,...0).x1?.x2?...xn? + (It has 2^n minterms)

F(0,0,...1).x1?.x2?...xn +

........ +

F(1,1,...1).x1.x2...xn

Maxterm canonical representation: (POS form)

F(x1,x2,...,xn) = [F(0,0,...0) + x1 + x2 + ... + xn]. (It has 2^n maxterms)

[F(0,0,...1) + x1? + x2 + ... + xn].

........

[F(1,1,...1) + x1? + x2? + ... + xn?]

Digital basic blocks:

NAND gate: Y = ?(A.B)

NOR gate: Y = ?(A+B)

NOT gate: Y = ?A

NAND, NOR and NOT are called fundamental gates, as any logic function can be built using these 3 kinds of gates.

XOR gate: Y = A ? B

Adder: Adders are one of the fundamental logic blocks that are found in digital library along with other logic gates. Reason is because adders are used widely, so they are optimized and put in library

Half Adder (HA): adds 2 input bits and produces 2 outputs, Sum and Carry

S=A ? B

C=A.B

Full Adder (FA): adds 3 input bits and produces 2 outputs, Sum and Carry

S=A ? B ? Cin

C=A.B + Cin.(A+B)

optimizing logic functions:

There are combinatorial and sequential logic in any design. When we talk about optimizing design, it's means reducing the overall cost of design. Cost may be defined as area, speed, power, etc.

For any given logic function, firstly Truth tables are determined which describe what the output of a logic function is given it's input. These truth tables can be converted into equations with AND,OR etc implementing the functionality (i.e Y = A.B.C.D + ?.(C+E) + C.F). We can use logic reduction techniques (Karnaugh maps) to implement function using minimal logic. However K-maps are manual tools which are not very efficient for large logic equations. There are automated tools called synthesis tools, which produce such reduced logic. These tools determine prime implicant (PI) of the logic function, and then using Quine's Prime Implicant Therom, they select a subset of Prime Implicant that give minimal cost. If number of PI is very high, then heuristics are used to reduce run time. This may not give lowest cost, but are pretty close to being lowest cost solution for that function. Synthesis tool "Espresso" uses PI technique to optimize logic.

optimization of logic can be done by reducing eqn to 2 level logic either in SOP form (1st layer of AND gate followed by 2nd layer of OR gate) or in POS form (1st layer of OR gate followed by 2nd layer of AND gate). Although 2 level logic minimization is easy, but it may not give optimal cost, so multilevel logic minimization also done by tools.

ex: F = s.t.v+s.t.w.x+s.t.y.z+u.v+w.x+u.y.z => This is 2 level logic minimization. Can't be reduced any further in 2 levels. It uses 6 AND gates and 1 OR gate. However if factoring done, we get F = (s.t+u).(v+w.y+y.z) => This is 3 level logic minimization (since it has 3 levels = AND at 1st level, OR at 2nd level and AND at 3rd level). It uses 4 AND gates and 2 OR gate. So, it uses 1 less gate, and fanin of gates is also lower resulting in much smaller area, delay and power. However, we would not have arrived at this optimal solution if we confined ourselves to 2 level logic. This multilevel optimization is done via factoring (where prime divisors are found and boolean division done)

After optimizing combinatorial logic, we need to optimize sequential logic (flops, latches) too. However, there isn't much to optimize there as the number of flops is fixed by design requirement. The only place where flops can be optimized in in finite state machine (FSM). Synthesis tools optimize this by choosing lowest number of flops and gates to implement FSM.

Details: Published: Thursday, 28 December 2017 20:20; Hits: 1326

Verilog

Verilog is the most popular HDL, and unlike software programming languages which keep on cropping up dozen a day, HDL can't change that easily as it has to be supported by all the CAD tools. So, You should learn Verilog if you are ever going to work in VLSI/Hardware. It's a very simple language, and easy to follow (very much C style language). We have separate section for SystemVerilog which is just an extension of Verilog geared towards Testing, and NOT design.

Syntax:

Verilog is case sensitive.(different than vhdl which is case insensitive). All verilog keywords are lowercase. wire is verilog keyword. Wire and WIRE are totally different names (not keyword since their case is different than verilog keyword "wire" which is all small case).
Verilog file is composed of stmt, where each stmt starts with a keyword.
Verilog file syntax is: comment | module ... endmodule | primitive ... endprimitive | compiler_directive
- comment: // => for 1 line, /* ... */ => for multiline (same as C)
- module: module ... endmodule
- primitive: primitive ... endprimitive
- Compiler directives: can be put anywhere in verilog file and are handled by preprocessor part of compiler. All compiler directives precede by ` (back quote). Their scope is until the point where next compiler directive changes previous directive. Below are few common Compiler directives:
  - `include "file1.v"
  - `define WORDSIZE 64
  - `timescale 1ns/1ps => 1 ns is used for reporting but 1ps is used internally for resolution.
  - `ifdef `else `endif
  - `default_nettype trireg => default is wire
  - `resetall => resets allcompiler directives back to original default values

So, in short, everything in verilog file has to be within "module ... endmodule" keywords (with exception of comments and compiler directives). There is one more exception "primitive ... endprimitive" keyword, which is similar to "module ... endmodule", but it describes logic behaviour in terms of truth table, instead of in terms of logic operations. This is useful in modeling certain logic. Since primitive is only used for modeling latches/flops, etc, and not used in real design, we will not worry about it.

Keywords:

Every language has reserved keywords that are used by compiler to make sense of the program. Look at pg 17 of "verilog book" by vivek sagdeo.

gates: (total 26 primitives provided from which larger structural models may be built). Mostly used in models of gate stdcells. UDP are used to model more complex seqential elements. These gates may take i/p as 0,1,x,z and give o/p as 0,1,x. Usually effect of z and x on i/p are the same, as they give same o/p.
- A. logic gates: and/nand, or/nor, xor/xnor. when instantiating, 1st parameter is o/p, rest all are i/p. delays may be specified for propagation time, while strength may be specified on outputs. o/p can't be z (only 0,1,x).
  NOTE: there's no mux defined, so we define mux using udp (primitive). muludp (Q,S,A,B) used in model of MUX2, etc.
  ex: nand #1 g_mynand (myout, myin1, ~myin2); => typically used in gate models of NA210, etc.
- B. buf/inv gates: buf/bufif0/bufif1, not/notif0/notif1: buf/not have 1 o/p and 1 data i/p while *if* have 1 extra ctl i/p to model 3 state drivers. o/p is z when ctl line is inactive.
- C. mos gates: nmos/pmos, rnmos/rpmos: these model transmission gates. "r" versions model transistors with very high resistivity when conducting. These have 1 o/p(src/drn), 1 data i/p(drn/src) and 1 ctl i/p(gate). o/p is z when ctl line is inactive.
- D. Bidir gates: tran/tranif1/tranif0, rtran/rtranif1/rtranif0: these model true bidir transmission gates. tran/rtran have 2 inout terminals, while *tranif* have extra ctl i/p.
- E. pullup/pulldown gates: pull0/pull1/pullup/pulldown: these are single o/p, which drive pull strength value onto o/p net. strength may be specified
strength: weak0/weak1, strong0/strong1, supply0/supply1, highz0/highz1
data types: wire/wand/wor/tri/tri0/tri1/trior/triand/trireg, reg, time, integer, real, input, output, inout, event, parameter. We define variables to be one of these data types. We'll look thru all data types later. Variable names can be any set of characters, but for special characters, we have to use "escape character \". Also, var can't start with numeric digits, so \ should be used there too, where var name starts with digits 0-9. escaped identifier terminate with white space.
- ex: wire \reset* ; => this creates a var named "reset*". IF there was no whitespace (wire \reset*;) then var would be "reset*;", which would be syntax error since ";" is missing. other ex: wire \123name1 ; => name starts with digit, so "\" used.
- ex: Let's say netlist is flat, and has net names of form "a/b.c". So, we'll need to use special char "\a/b.c". But then a space is needed after that, otherwise tool won't now where that name ends. Note the space below
  veridian_tb.u_dig_top.\u_efuse_wrap/u_efuse_top.u_rom0 .mem[8][31] = 1'b0; //NOTE: "u_efuse_wrap/u_efuse_top.u_rom0" is name of 1 instance in verilog netlist. If name itself started with \, then we need \\. i.e: \efuse/i.b => \\efuse/i.b (NOTE: always a space at end)
- sometimes, we need everything to be treated as one name, and space messes that up. In that case, we can use double quotes.
  ncsim> tcheck -off "veridian_tb.u_dig_top.\u_efuse_wrap/u_efuse_top.u_rom0 .mem[8].u_sync_flop"
Stmt: case/casex/casez/default/endcase, if/else, while, for/forever, repeat, (In SV, we can use foreach, see in sv.txt file)
Blocks: begin/end, module/endmodule, specify/endspecify, primitive/endprimitive, function/endfunction, task/endtask, table/endtable,
Others: always, assign/deassign, initial, fork/join, edge/posedge/negedge, force/release, defparam/specparam, wait

Data Types:

Any language has data types that var are assigned to. Verilog has 2 diff kind of data types: structural and Behavioral.

Structural data type: This data type is used to decribe the structure of design. 2 primary structural data types are reg and wire. These refer to physical nets in deign. In real hardware, any data type can have only 2 values - 0 or 1 (since these are digital logic that can be either at VDD or GND). However, in Verilog, we allow every reg/wire to take 4 values - 0, 1, X (unknown) or Z (floating). X and Z are extra values that help us in simulating the design and point to errors. Reg and wire look similar, however there are subtle diff b/w them. Initial value of reg is "x", while that of wire is "z". Declaration for wire/reg is done inside a module, but outside of initial/always blocks.
Behavioural data types:

Assignment:

Let's say we have var defined (i.e var "a") and assigned it to correct data type (i.e type wire). Now we need to assign values to those var (i.e a=1). This needs to mimic how values on nets/wires etc are assigned in real hardware. For this, Verilog has 2 kinds of assignment statements. Each assignment needs to be either one of the 2 kinds:

Procedural assign stmt: used within procedural blocks (always, initial). initial is used for tb only. Here we can directly assign values to reg or wire, ex. a=2'b00; or digtop_tb.b=0 or a<=b&c;
- NOTE: any variables being assigned here have to be "reg" and not "wire". This is just a verilog syntax thing. Usually we get an error "A net is not a legal value in this context" when we directly try to assign some port of dut in the tc file. We have to declare it as a reg in digtop_tb file. However "force a = 0" is allowed inside "initial block" even if a is a wire, as force doesn't care.
Continuous assign stmt: used with assign stmt. ex: assign a=0; If we write a=0; then it has to be within always or initial block (i.e make it a procedural assign stmt).

So, within a module, every stmt has to be within procedural block (always, initial) or has to be a continuous assign stmt. Gate primitives(i.e and, or, etc), module instantiations, udp instantiation and specify_blocks can also be used.

Verilog Process:

One important place where HDL differ from Regular software languages is "Process". Process is a special block to model logic running concurrently in hardware. Only 2 ways to declare a verilog process: always & initial.

A. always:

always stmt keeps on running for ever. It's an infinite loop similar to while (1) {...}. It starts executing stmt that are within the always until it reaches the end, then it starts again, and keeps on repeating. It stops only on sensitivity list or on encountering # delay. We can have always block with sensitivity list or without sensitivity list.

1. w/o sensitivity list: used to implement clk osc. ex: always begin ... end.
ex: below code implements a clk with frq=2. It starts with clk=x at time=0, then goes to clk=0 at time 1, then clk=1 at time 2, then comes to the beginning of loop and again gets clk=0 at time=3 and so on.
reg clk = 0; //This stmt added so that at time=0, clk=0 and not x. It starts with x and then switches to 0 at time 0. We don't see an x in sims at time 0, as both events happen at time0, but there's a -ve edge at time 0 which is seen by simulator.
always begin
#1 clk = 1'b0; //delay of #1 causes the simulator to move this event to queue to be evaluated 1ns later, and go to other always block
#1 clk = 1'b1; //same as above. after this, it goes to beginning of this loop, and again encounters #1 causing it to wait.
end

ex: always a = 1'b0; => this will cause simulator to hang, as this loop will run forever. Once control comes to this loop, simulator can never exit this always stmt, as there's no delay or sensitivity list to exit this.
ex: always a = b; => this will cause simulator to hang as explained above. To prevent this, we need always @* a = b; This causes loop to get executed only when b changes.
ex: always a = #10 b; => this will run, but it runs and "a" gets updated at 0ns, then at 10ns, 20ns and so on with whatever "b" was 10ns earlier. It doesn't have anything to do with "b" changing, as it doesn't have @*.

2. with sensitivity list: Wait for event operator "@" can be used to provide sensitivity list. @ is a edge sensitive event control ("wait" provides level sensitive event control). If the sensitivity list contains all the inputs of the always block, then the code in always block will execute whenever it's i/p changes. This models a combinatorial logic.
V1995: always @a begin stmt end; => wait for +ve or -ve edge of a, and when that event happenes, execute all the stmt with "begin end" block. Then return back to start of the block at always and keep waiting until "a" changes. The block won't look at "a" while it's within the "begin end" block. This is OK for comb logic, as it will look at it when it comes at start of always block, as long as there are no delays inside the block.
V1995: always @(a or b or c) begin .. end => whenevr any of a,b,c,d changes.
V2001: always @* => implies that whenever any i/p changes, execute the block
V2001: always @(a,b,c) => comma allowed instead of or.

NOTE: For the clk generation example shown above, if we use always @*, then clk will always remain at x, since there is no input inside the block (it's tied off values 0 or 1 which don't change).

B. Initial:

initial-block is a special-case of the always-block, one which terminates after it completes for the first time. however keyword "forever" can be used to keep on running it for ever.

always is used to model both seq (NBA <=) and comb (BA =). Comb can also be specified using cont assign stmt (assign). assign is preferred for comb. always is used for comb when using case, if-else, loops (for,while,forever) or when we want to have large comb logic inside a block as an entity. Any var within always should be a reg, while that in assign should be wire. by defasult a var is wire, so no need to define it explicitly. NBA should be used wherever there are storage elements involved.

always stmt models flop or latch:
1. flop: always @(posedge clk or negedge reset).
2. latch: always @* begin if(en) Q = D end // whenever an if-else/case stmt is incompletely specified, latch is built. We could have also coded it as "always @(en,D) begin .. end".
3. comb logic:
A: assign x=b&c; //assign stmt builts comb
B: always @* if (b) x=c else x=a; //whenever if-else/case stmt is completely specified , comb is built. In the sensitivity list, no edges can be specified for a comb logic. If any of the i/p are omitted from sensitivity list (i.e when @* is not used), then it doesn't model a comb logic.

--------------
NBA vs BA: all BA assgn can be replaced with NBA, and all sims will work fine, but will take more time, as NBA use 2 queues (evaluate and update). NBA are necessary for sim purpose. synthesis tools don't diff b/w BA and NBA, and will generate the same o/p netlist.

ex:
always @(posedge clk)
regb = rega;
always @(posedge clk
rega = data;

This is a shift reg when written correctly using NBA (<=). It is OK even with BA, provided first always block executes before second always block.
In pre-synthesis simulation, the second always block may execute before the first always block. In that case, data will run through to regb on the same posedge of the clock. However, this will synthesize to a shift register, where data will not run through to regb on the same posedge of the clock. So, sim o/p will differ b/w rtl vs gate.

statements:

1. if-else: if (expr) statement_or_null [else statement_or_null]. If else can only be used with a procedural assign stmt (i.e always, begin). It can't be used with continuous assign stmt (i.e assign). else is optional and is paired with nearest if clause. expr of 0,x,z is treated as false, while any other value is treated true. NOTE: there can be only 1 "if stmt" in a "procedural assign", but if there are multiple (with no else clause), then the last one overwrites all previous ones. Ex:
always @(posedge ....) begin
if (a=1'b1) x<= c&d;
if (b=1'b1) x<= e&f; => This gets prioroty as it's last one simulated. So on +ve edge of clk, if both a and b are 1, then x gets value of e&f; It's same scenario as we do in state machine, where we assign default, and then overwrite it only in states where we need to. However this kind of code should be avoided as it can have 0 width glitch as x will change from "c&d" to "e&f" in zero time.
end
2. ?: conditional operator: expr ? expr1 : expr2. Used in place of if-else stmt. One diff b/w ?: and if-else is that since this is an operator, it can be in any expr that is either a part of procedural assign stmt or continuous assign stmt.
3. Loops: 4 stmt: repeat, for, while, forever. These can only be used in procedural assign stmt.
4. case: case expr comparison is effective when all compared bits are identical. comparison is done using 4 valued logic (0,1,x,z), so 2 bit case cond can evaluate to 16 diff values. Therefore, special types of case stmt are provided, which can contain don't-care values. casex treats both "x" and "z" as don't care, while casez treats only "z" as don't care. "?" is treated as "don't care" value for comparison purpose.
ex: reg a;
casez (a) //here if a=1'bz or 1'b?, then stmt1 is executed since then it's don't care. after stmt1 matches, stmt4 is never looked at.
1'b0 : statement1;
1'b1 : statement2;
1'bx : statement3;
1'bz : statement4;
endcase
ex:
case (1'b1) //stmt1 executed only if a=1'b1
a : statement1;
endcase

if-else-if vs case stmt:
---
1. if-else is more general as any set of comparison expr may be used in if-else-if, while in case, all case expr are compared against a common controlling expr.
2. if stmt comparison involving x/z results in x/z which is interpreted as false (unless case equality === is used). case stmt compares with x/z, and match happens only when all x/z match. So, comp involving x/z is interpreted as not matching to anything (if only 0/1 bits are there) and results in going to default. So, comp expr in case may include x/z which is useful in debug.

--------------

5. function and task: explained in testbench section below.

Module definition/instantiation:

Module definition:

In v2001, we specify port dirn and type in one stmt within brackets. This is aka "ANSI C style for module ports definitions". Recommended to have 1 input or 1 output per line

---------
module patgen ( //declarations are contained in parentheses itself
input wire [7:0] a,b, // no need to specify wire, as it's understood by default
output c,
output reg [3:0] y //note last entry doesn't have a comma after it
);

//internal wires, reg should be decalred next
reg [1:0] a,b; //
reg clk = 0; //in v2001, we can init var at time 0 within the declaration, instead of having separate initial stmt. Note if we have, reg a=top.b; => it will assign a with whatever top.b signal is at time=0, which would be x(if top.b hasn't been init). to assign a cont, we should do:
always @* a <= top.b;

//now have always/initial/assign blocks
initial begin ... end
assign b=clk;
always @* a <= b&~clk;

endmodule

---------

or we can define module ports as below: //not preferred as it's not as compact as above. It's NOT ANSI C style.
module patgen (a,b,c,y); //declarations not contained in parentheses
input wire [7:0] a,b;
output wire c;
output reg [3:0] y;
...
endmodule

In v1995, we had to specify port dirn and type separately in 2 stmts outside the brackets with port names within brackets. This is what we see in above ex. This was a necessity in V1995. With V2001, we don't need to do this anymore.
module patgen (a,b,y);
input a,b;
output y;
reg y;
// wire a,b; //a,b data type not defined as it's understood to be wire by default
...
reg clk;
initial
clk = 0; //in v1995, we had to use initial to init a var
...
endmodule

NO ports: when module doesn't have any ports, we can define like this
module Adder; //or as module Adder();
begin ...end ...
endmodule

module instantiation:

module name followed by instance name and pins
-------------------
Ex:
module Adder ( .... define adder ...)
Adder I_adder(A,B,S,C); //instantiation of Adder

Parameter/localparam:

parameter: To set value to a particular parameter within a module. They are written within module body, but NOT within always or assign block. It can be overridden by using defparam, or during instantiation.
ex: module tfilter ( ...);
parameter SREG_SIZE = 2; //sets default value to 2
parameter real r=5.0 from [-5:+5] exclude 0; => specifies real param with range of -5 to +5 (both included) and excluding 0. []=>include range, ()=>exclude range
endmodule

1. override using defparam:
defparam a_cont_deg.SREG_SIZE = 3; //overrides parameter value of 2 in a_cont_deg isnt of tfilter module. can't be inside always or assign block.
tfilter a_cont_deg (.reset(..), ...);

2. override during instantiation: If an override value is not specified, default value is used.
ex: tfilter #(3,6,7 ..) a_cont_deg (.reset(..), ...); //if multiple parameters, they are overridden in the order they are defined in module. The ones that are not defined use default values. To avoid confusion, we can do explicit assignment (added in V2001) as shown below (recommended):
ex: tfilter #(.SREG_SIZE(3),.MSIZE(4),...) a_cont_deg (.reset(..), ...); //being explicit (V2001)

3. on irun cmd line:
+defparam+veridian_tb.u_rom0.PRELOADFILE="efuse.img" \ => works. But if full path to efuse.img provided, then you need \ before "
+defparam+veridian_tb.u_rom0.PRELOADFILE=\"${DVWORK}/efuse/efuse.img\" \ => works
-defparam veridian_tb.u_rom0.PRELOADFILE="${DVWORK}/efuse/efuse.img" \ => doesn't work??

localparam: was added in V2001. It doesn't allow the parameter value to be overridden. Trying to overwrite it gives an error.
module tfilter ( ...);
localparam SREG_SIZE = 2; //sets value to 2. It can't be changed from anywhere.
endmodule

Time scale:

Timescale directives should be given before each module in each file separately. Since it's compiler directive, it's expanded at time of compilation. So, for nested modules, lower level module timescale is inherited from higher ones if not defined for the lower ones. If timescale directive not defined, then previous one remains in force until overridden by next such directive. If nothing defined at all, then timescale directive from simulator cmd line option used (irun ... +nctimescale+1ns/1ps ..) => equiv to defining timescale at beginning of each source file. So, depending on order of compilation, modules with no timescale directive might get timescale values from different files, which might cause different simulators to yield different results. That's why it's recommended to provide Timescale directive at beginning of each file or module.
`timescale time_unit/time_precision
Delays are multiples of time_unit rounded to time_precision. #x => x*time_unit rounded to time_precision. On nWave, it shows up as that number in units you choose.
i.e.
`timescale 10ns/1ns => #1 is 10ns.
#1.55 a = b;
'a' gets 'b' after 16 ns because 10ns*1.55 = 15.5 ns = 16ns rounded to nearest 1ns

`timescale 1ns/1ps => #1 is 1ns.
#1.00055 a = b;
'a' gets 'b' after 1.0006 ns because 1ns*1.00055ns = 1.00055ns = 1.0006ns rounded to the nearsest pico second

values of 1,10,100 are allowed in time_unit with unit of meas as s,ms,us,ns,ps,fs

NOTE: in verilog, time units (ns,ms,etc) are not allowed to be part of time value (ie #10, but not #10ms, etc). time units are inferred from timescale directive. However, SV allows us to specify #10ns, thus allowing us to remove ambiguity as to what delay #10 represents. It also saves typing when we have to specify large units as seconds (in SV, we write #10s; while in verilog we have to write #10_000_000_000;).

Delays:

Used in 3 different kinds of modelling: gate level, dataflow, behavioural.

I. Gate modelling: models gate delays

3 values of delays allowed which is min delay, typ delay and max delay [for each of rising, falling and turnoff (Z or floating) event, 0->Z or 1->Z]. turnoff event only applies to bufif0/1, notif0/1.
4 types of delay:

Rise delay: 0,x,z->1
Fall Delay: 1,x,z->0
Turn-Off Delay: 0,1,x->z
Change-to-unknown: 0,1,z->x (delay value is taken to be the minimum of the above three)

The 3 entries separated by comma are each for rising, falling and turnoff(if applicable). Within each entry, the triplet are for min,typ and max delay. If only 1 delay value specified, it's taken for all 3(rise,fall,turn-off). If 2 delay values specified, they are taken for rise and fall resp and min of these is taken for turn-off.

Ex: bufif1 #(1:2:3,4:5:6,7:8:9) (bus, out,dir) => #(R_Min:R_typ:R_max, F_Min:F_typ:F_max, Z_Min:Z_typ:Z_max)
Ex: nand #(12,15) g(a,b,out) => nand gate has rising delay of 12 and falling delay of 15 time units. This applies for all 3: min,typ and max.
Ex: buf #1 I_buf (Y,A); => in gate level models, we've stmt like this. this causes of delay of 1 units on Y, but the flow continues, so the other rtl stmt following this don't see this delay. This is because these are module inst, which are separate continuous blocks by themselves.

The delay model above is used for distributed delay (every element of ckt is assigned delay, and those are added up). Other delay model being used is lumped delay, where complete modules instead of gates are specified delays. The third option is path delay or pin-to-pin delay, where delays specified from pin to pin using specify.

Path delay:

To specify delays to paths across a module (for ex for a flop, we might want to specify D->Q pin delay) we can use specify block inside a module to specify timing b/w module's i/p and o/p. specify specparam ... endspecify (see in verilog book, pg 155). this is called path delay mode as the total delay from i/p to o/p is specified, and runs faster in simulation than distributed delay mode where delay is assigned to each gate in the module. (for TI_functiononly, we assign #0 gate delay to primitive gate). specify is useful when we do sdf annotation, as annotator will disregard delays in specify section, but will keep delay values specified as # delay. All our timing checks (setup,hold,etc) are also put in specify section for the same reason, so that these values will be disregarded when sdf values are present. However, the arcs in specify section have to match 1-to-1 with arcs in sdf file, else we'll get warnings that sdf annotation didn't happen for those pins.

2 methods to describe module path delays: one using "=>" and othet using "*>"

=>: establishes parallel conn b/w src i/p bits and dest o/p bits. Each bit in src connects to corresponding bit in dest. Ex: In models file in AN210.v, we have (A +=> Y) = 0.01 ; => A=>Y would imply rising/falling delay from A to Y is 0.01 time units (upto 9 values can be specified for delays as explained for delay ex above). However with += it means i/p A is not inverted internally in the module, -= means i/p A is inverted internally in the module (i.e there is a bubble on i/p A) . So, -= will apply to cells with input AZ (ex: NO311F). This polarity token is ignored by most logic simulators, but may be used by timing analyzers.
*>: establishes full conn b/w src i/p bits and dest o/p bits. Each bit in src connects to every bit in dest. Ex: in DTP20.v, we have ( CLK,PREZ *> Q,QZ) = (0.100000:0.100000:0.100000 , 0.100000:0.100000:0.100000); => delay from clk->Q, clk->QZ, prez->Q, prez->QZ are all 0.1 unit delay for rise/fall (min,typ,max)

II. Dataflow modelling: to model delays for nets or assign stmt

all 3 ex below are equiv. Any changes of the signals being assigned to the o/p net will only be propagated after the specified delay. These delay are called inertial delay, as any changes in i/p which are less than the delay specified are ignored (i.e glitches < delay are not propagated). inertial delays are easy for the simulator to implement.
In ex below: If either of the values of in1 or in2 should happen to change before the assigment to out has taken place, then the previous assignment will not be carried out, and new assign will be substituted for the old assgn with new timing, as input pulses shorter than the specified delay are filtered out.

ex 1: net declaration delay - this delay is associated with net (propagation delay) and is added to the gate delay of driver, if any. This models wire with delay of 10.
wire #10 out;
assign out = in1 & in2;

ex 2: Regular Assignment Delay - this models delay from when RHS changes to when LHS changes.
wire out;
assign #10 out = in1 & in2;

ex 3: Implicit Continuous Assigment - this is same as ex 2, but 2 stmt combined into 1.
wire #10 out = in1 & in2;
wire a = 1'b1; => equiv to forcing a to 1.

ex 4: Incorrect (this is inter-assignment delay with incorrect syntax)
wire out;
#10 assign out = in1 & in2; => here, this continuous assign is outside always/initial block. So, delay of 10 should cause all stmt after it to be delayed, but continuous assign are all running in parallel, so invalid. To make it work, we've to put it in always block as shown below in "inter-assignment delay"

III. behavioural modelling: These delays can only be used in always/initial blocks in RTL for reg type.

1. inter-assignment delay (inertial delay), aka regular delays or delayed assignment (most common type used in testbench). This used with either continuous assgn ( as with wire above in ex 1-3) or with always/initial block. used most commonly with blocking assgn (a = b etc). This inertial delay, so glitches less than propagation delay are filtered out. SDF file treats delays as intertial delays, so glitches get filtered out.
ex:
reg q; //note: q is defined as reg since it's used within always block
always @* #10 q=x+y; //waits for 10 time steps before executing the cmd. So, this causes a delay of 10 for all subsequent rtl stmt within that always block.
This delayed assgn above is equiv to this code:
begin // Equivalent to delayed assignment above.
#10; // Delay.
q = x+y; // Assign q. Overall same as #10 q = x+y. Note: no temporary staorage of result(x+y). Whatever is the current value of x,y is used.
end

2. intra-assignment delay (transport delay). used most commonly with non-blocking assgn (a <= b etc). This is non-inertial delay, so glitches not filtered out. These are used to model delay for that piece of logic only, as it would be in real gate delay. Using BA to model transport delay (as shown below) doesn't work (have to use NBA)
ex:
reg q;
always @* q = #10 x+y; //value of x + y is stored in tmp var at the time that the assignment is executed, but this value is not assigned to q until after the delay period, regardless of whether or not x or y have changed during that time. After storing it in tmp var, sim waits for #10 (execution moves to other always/initial blocks), and then copies tmp to q after delay of 10. execution doesn't move forward for this block, as it's BA. Any intermediate changes to a,b are lost, as it doesn't come back to start of this block. It comes to start of "always @" waiting for any signal change on x,y only after all stmts are done in this block. So, glitches are filtered here (inertial delay). This causes a delay of 10 not only for this stmt, but for any rtl code after this stmt within that always block, as it's BA. Stmt in other initial/always block still move forward, as #delay moves control to other blocks.
This intra assgn delay above is equiv to this code:
begin // Equivalent to intra-assignment delay.
hold = x+y; // Sample and hold y immediately.
#10; // Delay. NOTE: this delay applies to all stmt after this. At this point, execution jumps to other blocks, and comes back here after #10 delay has occurred.
q = hold; // Assignment to q. Overall same as q = #10 x+y.
end

General rules for assigning delays:

Assign of delay to LHS or RHS of BA to model combo logic is flawed.
- 1. LHS delay: always @* #10 q=a+b; => bad for mdeling, ok for tb. will cause q to update with latest value of a+b. Supposed to model inertial delay but flawed. use "assign" instead of "always @*".
- 2. RHS delay: always @* q=#10 a+b; => bad for mdeling, bad for tb. will cause q to update with sampled value of a+b, and missing out on any changes of a,b within that #10. So, value of q will be wrong for a while until a,b changes again after #10. supposed to model transport delay but flawed. use NBA instead of BA. @* is imp here. If we use "always q=#10 a+b;" then q will get updated every #10 irrespective of changes on a or b (see detailed explanation above).
Assign of delay to LHS of NBA to model combo logic is flawed, but to RHS of NBA is good (models non-inertial(transport) delay).
- 1. LHS delay: always @* #10 q<=a+b; => bad for mdeling, bad for tb (as it's inefficient compared to BA, so use BA instead). will cause q to update with latest value of a+b. same behaviour as with LHS delay on BA.
- 2. RHS delay: always @* q<=#10 a+b; => good for mdeling, ok for tb. will cause q to update with sampled value of a+b. since it's NBA, it willgo back to beginning of "always @" and wait for any new event. So any changes of a,b within that #10 will again be captured. So, value of q will be updated after #10 for any change on a,b. So, it models transport delays accurately.
Assgin of delay to LHS of "assign" is good and models inertial delays
- 1. assign #5 q=a+b; => q is scheduled to be updated #5 later. However, if a,b updates within that #5, then that future scheduled event is killed and replaced with newer event later. So, changes on a,b < #5 are killed, so it's inertial delay. So, assign is not exactly similar to "always @". assign don't queue up o/p assgn, they only keep track of next o/p value and when it will occur. It's like "always @* #5 q<=a+b;" but it doesn't wait for #5 before looking at i/p again. It continuosly looks at i/p.
- 2. assign q=#5 a+b; => not valid (RHS delay not allowed on assign)
- 3. assign q<=a+b; => not valid (NBA not valid with assign)

NUTSHELL: use assign with BA to model inertial delay, and RHS delay with NBA to model transport delay.

ex:
always @* q <= @(posedge clk) D; D is evaluated whenever it changes, but is not assigned to q until after clk posedge. => trnasport delay
always @(posedge clk) q <= D; Here on posedge of clk, instantaneous value of D is assigned to q (D is not held anywhere) => inertial delay

ex: Here clk starts with x.
initial #2 clk = 1'b0;
always #1 clk = ~clk; => at #1, RHS clk=x,so LHS clk=x; at #2, RHS clk=0 or x depending on whether initial executed first or always executed first. Then LHS clk=1 if initial executed first. If always executes first, then LHS clk=x, but then initial will execute and clk will be 0. So, b/w #2 to #3 clk=1 or 0, depending on what stmt executes first.
From the sims, looks like initial executed first, resulting in clk=1 at #2. however, by changing clk<=1'b0, we can make always execute first, resulting in clk=0 at #2.

NOTE: By default, verilog gate level models and interconnect delays are always simulated as transport delays, but they look as if they are simulated as pure inertial delays (since they don't allow glitches shorter than prop delay to pass thru). This is beacuse, by default, pulse_r and pulse_e are set to 100% in simulators. See simulation section for more details.

Operators: operator precedence decides what the final value is (pg 310 in verilog book). All operator treat Reg/wire as unsigned (unless specified explicitly as signed in V2001) and real/integer as signed or unsigned (depending on what they are specified as).

--------
logical:   !, &&, || => logical negation,and,or. two sides are logical T or F values. True is "1" or anything non-zero, while false is "0".
ex: a=4'b1100; => here a is non-zero, so a=True, so !a=false or 0. If a=4'b0000; then a=False, so !a=True or 1.
Bitwise: ~, &,   |,   ^, ^~ (or ~^) => bitwise negation, and, or, xor and xnor. Each bit in that operand is operated on.
ex: a=8'b1010xzxz; => ~a=8'b0101xxxx;
Unary reduction: & (nand = ~&), | (nor = ~|), ^ (xnor = ~^ or ^~) => unary reduction produces the single bit operation on all of the bits of the operand. Unary reduction and bitwise operation are distinguished by syntax.
Ex: any_val = |(val[7:0]); => or of all 8 bits of val.
Ex: !(reset_n) => when reset_n is not high/true (i.e low/false). similar to ~reset_n

comparison:
==, != :logical equality/inequality, result will be unknown if x or z in the input.
Ex: if (A==1) then stmt1 else stmt2; => if A=x, then this will result in x, which is "false" so stmt2 will execute. So, with ==, comparison with "x" or "z" is always evaluated to false. To do comparison with "x" or "z", we use ===.
===, !== :logical equality/inequality including x and z
> >=, <, <=, >>, << :less than/greater than, right/left shift

arithmetic: +, -, *, /, %(modulus).

concatenation: {} => joins together bits from 2 or more comma separated expr. It can also have a repition multiplier right before {.
Ex: {c, b, 2{a,b}} => {c, b, a, b, a, b}
ex: wire [3:0] c = {4{a}}; => NOTE: curly brackets are reuired outside of 4 too. Just 4{a] won't work, as 4 copies of a will need to be concatenetated with outer brackets. ex: c = 4{a} => c = a,a,a,a => incorrect. {4{a}} => c={a,a,a,a} which does the correct concatenation

NOTE: in SV, we can use these operators in C style. i,e a=a+b can be written as a += b;

structural data type: Wire or reg:
------------
2 primary data types are reg and wire. declaration for wire/reg is done inside a module, but outside of initial/always blocks. Initial value of reg is "x", while that of wire is "z".

In V95, reg, wire, ports were all unsigned only. The only data type in V95 that could be signed was integer data type. so, if we wanted to do signed operations, we had to define that var as integer which limited that number to 32 bits. Or, we could sign extend the msb of reg, and operate on those extended reg. Remember that any reg[x:0] is interpreted as +ve number 0 to 2^x-1 in V95. + just does a bitwise addition (numbers can be -ve too), while - does a 2's complement of the number to subtract and then adds it to the original number. However in V2001, reg, wire,ports could be defined as signed in which case, verilog internally sign extends them to get the correct result.
NOTE that result of any operation is dependent on RHS (operator and operands, number of bits in those, etc) and not on LHS.

NOTE: In verilog, any -ve number is internally rep in 2's complement format. We can do any arithmetic on any signed/unsigned number and we'll always get the correct result, if the result is within the range of numbers that can be rep. For ex in 4 bits, +ve numbers are 0 to +15, while -ve numbers are -8 to +7. If we subtract 0101 from 1111, then result would be 15-5=1010=10 (if 1111 is considered unsigned=15) or -1-5=1010=-6(if 1111 is considered signed=-1). So result is correct in both cases. If result falls out of range, then we need extra bits to rep it. For ex: in 4 bit rep: 2-5=0010-0101=0010+1011=1101=(rep correctly as -3 in 2's complement since it's within range but incorrectly as +13 in unsigned rep, since it's outside valid range). However in that case, we have to sign extend the operands by extra bits so that we get correct result which has the extra msb. If we don't sign extend, then look at this ex in 3 bits: if A=-3=101, B=+2=010 => Sum=101+010=111 (in 4 bits, it's 0111 which is +7=> not correct. In 3 bits, it's still rep correctly as -1). However, if we sign extend A and B to 4 bits, then Sum=1101+0010=1111 which is -1=> correct. We sign extend, since adding 2 n-bit values, results in (n+1) bit value (we can sign extend indefinitely any number and the number will still be the same = that's characteristic of 2's complement).

ex: reg cntr[3:0], cntr <= cntr - 5; => this takes a 2's complement of 5, reps it in same no. of bits as other operand (which is 4 bits in this ex) and then adds it to cntr. This will work correctly as long as cntr is 0 or +ve. As soon as cntr value becomes -ve, it won't be rep correctly as unsigned number. As a signed number, cntr can still be rep correctly in 4 bits upto -8, but after that more msb bits are needed to rep it correctly.

V1995: ex: to add signed numbers (in V95) => here input should be provided in 2's complement format in order for this to work.
module add_signed_1995 (input [2:0] A, input [2:0] B, output [3:0] Sum)
assign Sum = {A[2],A} + {B[2],B};

V2001: ex: to add signed numbers (in V2001) => we declare ports as type signed:
module add_signed_2001 (input signed [2:0] A, input signed [2:0] B, output signed [3:0] Sum)
assign Sum = A + B; //here sign extension done automatically

NOTE: in v2001, be careful when any of the operands in unsigned i.e sum=A+B+Carry; where Carry is unsigned 1 bit carry. Verilog states that if any operand in unsigned, then the result is unsigned. Also, all signed operands will be signed extended to match the size of the largest signed operand. So, here A and B won't be sign extended, as whole summation is assumed to be unsigned, so msb of sum won't be correct for -ve numbers (same as in V95 where we don't sign extend). During synthesis, we'll see a warning "signed to unsigned conversion occurs (VER-318)". If we declare C to be signed, then C would just be sign-extended, so if C=1, it would be sign extended to 4'b1111. This is incorrect as this would subtract 1 instead of adding 1. Correct way to do it would be to declare C as unsigned and then do a $signed conversion after prefixing a 0 to it i.e Sum=A+B+$signed({1'b0,C});

-------------------

Procedural assignments (any assignment in a procedure as always,initial,task,function) are used to model seq logic, while assign are used to model combinatorial logic. The reg variables store the last value that was procedurally assigned to them whereas the wire variables represent physical connections between structural entities such as gates. wires don't store any value. (NOTE: net data type can be wire, wand (wired and), wor(wired or), tri).
whenever you use assign statement (combinatorial) to a o/p port, use output wire. We can omit "wire" from port type, since by default it's assumed to be wire.
whenever you use something in always or initial statement or tasks or functions (all these are procedural assignments) to a o/p port, use output reg. (so, in our *_tc.v, all references or assignments (digtop_tb.nRST = 1'b0) should have variables declared as reg in digtop_tb).

For assign, we need to declate a var as wire and then assign it.
wire a;
assign a = b&c; //we can't write a=b&c; without an assign stmt. But if a was declared as reg, we could have done that as long it was done within an "always" block or in an "initial" block.

However, to save typing, we can do:
wire a = b&c; => does same job as above

NOTE: Reg can be used as a sequential element. But wire cannot be. Reg can be driven from an intial/always block.
1. Output/Inout port of an instantiated module can be connected to a wire only (not reg), as the net is just connecting the o/p port to some other port so it can't be driven from an initial/always block (ex: digtop dut (.Yout(y_out)). Here even though Yout might be a reg, y_out must be declared a wire).
2. Also, any net that we are assigning value to in an intial block has to be a reg, since wires can't be assigned values (ex: initial #1 digtop_tb.a_net=1'b0. here a_net should be declared a reg and NOT a wire as it's being assigned value from within initial block.

So, reg can be used anywhere, but wire can only be used with assign stmt. This is just a syntax thing from verilog. From simulation point, there's one subtle difference:
The value of wire(on R.H.S) is evaluated for every simulation delta/ change in simulation time, where as the reg is evaluated only when there is change in any of the signal in the sensitivity list. That is why even though the combo logic implemented using assign/procedural blocks is functionally same but has the above difference. Also, because of the above behavior we say that reg need to store the value it has until there is any change in the sensitivtiy list.

This gives rise to issues (as discussed in wire/reg section above), where reg a=top.b gets assigned only once at time 0, while wire a=top.b gets a cont assgn with top.v. So, with reg, we've to use "always @* a=top.b" to mimic behaviour of wire.
Starting from sv (system verilog), we use logic for both reg and wire. logic behaves same way as reg, so we need to use always @*.

Behavioral Data Types: integer, real, and time. num can be int or real (signed or unsigned)
---------
These data types used for testbench only and not for synthesis, as structural data type wire/reg are the only ones that synthesize efficiently. Integers can be displayed using %d, while real with %f, and time with %t.
const numbers can be rep in decimal, binary, octal or hex. -ve numbers rep in 2's complement format.
_ is legal anywhere in num, except as first char, where it's ignored.

1. integer num: In verilog, keyword "integer" is used to define integers. In SV, "int" is used. They can be sized (as specified) or unsized numbers (Unsized size is 32 bits).
-----------
Syntax: <size>'<radix><value>; => size is specified as decimal number in number of bits. default size is 32 bits and default radix is decimal. This is valid for interger as well as reg.

When <size> is smaller than <value>, then leftmost bits of <value> are truncated
When <size> is larger than <value>, then leftmost bits are filled with 0,Z,X(if 0/1,Z,X in leftmost bit in <value>).
1 => stored as 00000000000000000000000000000001 (32 bit decimal num)
6'b10_0011 => stored as 100011 (only 6 bits used to store it, as 6 bit size is specified)
2'd1 => stored as 01 (as 2 implies 2 bits to store the value of decimal 1. since leftmost bit is 1, so remaining left most bits are filled with 0)
2'b1 => stored as 01 (as 2 implies 2 bits to store the value of binary 1. since rightmost bit is 1, so remaining left most bits are filled with 0). To store both bits as 1, do 2'b11 or 2{1'b1} or {1'b1,1'b1}. If you do 'b1, it will store it as 00000000000000000000000000000001, so not what we wanted.
'hF => stored as 00000000000000000000000000001111 )as no size specified so 32 bits)

ex: reg [63:0] a; a='d200999000777000; //this will cause an overflow, and a will get assigned incorrect value (even though "a" is 64 bit, number on RHS is by default stored as 32 bit, so some truncated 32 bit num will get assigned to 64 bit var). overflow warning will be displayed. To prevent overflow, do: a=64'd200999000777000; => This 64'd forces the number to be stored in 64 bit format, and then it gets correctly assigned to LHS.
NOTE: any number is stored as 32 bit except explicitly asked to store in more bits. ex: 17689999001 + 3457788890 will give incorrect result as both numbers are stored in 32 bit, which will cause overflow.
ex: integer a; a=2'd3;
Note: If we define int x=3; Then x[31:0]=00000000000000000000000000000011. We can access any bits of x. Ex: 8th bit of x is x[7]=0. To access 4 lsb of x, x[3:0]=0011. This is helpful when we are looping in "for" loop using x as variable, then we can specify individual bits of x to some internal register to be written.

ex: reg [1:0] a=13; //this is same as "reg a=2'b01" as reg stores 2 lsb of 32 bit integer "13"=4'b1101. So, 01 is stored in a[1:0]. Note: every number is default to decimal radix, and is internally stored as 32 bit int.
ex: reg [3:0] a=1111; This assigns a=4'b0111 as 4'd1111 = 'b100_0101_0111. So, 4 lsb bits are 0111.

2. real num: keyword "real" is used to define real. either decimal (<value>.<value>) or scientific (<mantissa>E<exponent>)
------------
1.2 => +ve real num 1.2
3.5E6 => 3.5*(10^6) = 3,500,000.0
10e-1 => 10 *(10^-1) = 1 (e/E already implies a 10)
ex: real a; a=5.2;

Any number that does not have negative sign prefix is a positive number. -ve num rep as 2's complement.
32'hDEAD_BEEF => Unsigned (or signed positive) number rep internally as 32'hDEAD_BEEF
-32'hDEAD_BEEF => -ve num rep internally in 2's complement form as 32'h21524111
14'h1234 => Unsigned (or signed positive) number rep internally as 32'h00001234
-14'h1234 => -ve num rep internally in 2's complement form as 32'ffffedcc

3. time: holds sim time which is returned from system fn $time. size is 64 bit. If min timing resolution is 1fs(10^-15sec), then 50 bits are enough to rep 1 sec. So, 64 bits can rep about 10,000 sec.

--------------------

Data Structures: Arrays, vectors, memories.
---------------
NOTE: see system_verilog.txt for packed vs unpacked arrays.

Array: used to hold several objects of the same type.
----
ex: integer i[3:0]; //integer array with a length of 4. i[0],i[1],i[2],i[3] are each an integer with 32 bits
ex: reg     r[7:0]; //scalar reg array with length of 8. r[0] to r[7] are 8 distinct reg.

We can also have arrays of instances (added in v1995).
ex: Adder I_adder[3:0] (sum, {c_out, carry[3:1]}, a, b, {carry[3:1], c_in});
=> is equiv to
Adder I_adder[0] (sum[0], carry[1], a[0], b[0], c_in)
Adder I_adder[1] (sum[1], carry[1], a[1], b[1], carry[1]) and so on ..

generate:
--------
In V2001, generate stmt was added to achieve this and much more. generate was taken from vhdl. During elaboration, the compiler replaces stmt inside generate with multiple copies of those stmt, so it saves typing. 3 kinds of generate stmt:
1. for loop generate 2. if-else generate 3. case generate
These "generate for loops" are different than "normal for loops" since normal for loops are never replaced with multiple copies, but rather are executed at run time. generate stmt are synthesizable.
ex: generate stmt to have 4 copies of U=U[0] to U[3].
genvar i; => new variable type that can only be used inside generate. It's a local +ve integer and can be declared inside or outside generate stmt.
generate
for (i=0; i < 4; i=i+1) begin : MEM
memory U (read, write, data_in[(i*8)+7:(i*8)], address,data_out[(i*8)+7:(i*8)]);
end
endgenerate => no semicolon (endgenerate is optional)

ex:
generate
for (i=0; i < 2; i=i+1) begin : MEM
always @(reset, reg[i]) begin
   case (i)
    'd0: reg_a[i]=8'h01;
    'd1: reg_a[i]=8'h10;
   end
end
end
endgenerate

ex:
generate //genvar i has to be declared. If i has already been declared and used in some other for loop, then some variable has to be defined here, or else it gives an error
for (i=0; i < 2; i=i+1) begin : MEM //for loops should be outside always stmt, else it gives an error
   always @(posedge clk or negedge rst[i])
     if(~rst[i]) A[i+1] <= 1'b0;
     else        A[i+1] <= data[i+5];
end
endgenerate

Vectors: used to represent multi-bit busses. These are multi-bit words of type reg or wire
------
ex: reg [7:0] MultiBitWord1;    // 8-bit reg vector with MSB=7 LSB=0. This is diff than "reg r[7:0]" as we can't refrence r, we can only ref r[0],r[1] etc as 1 bit. However, when we define "reg [7:0] r", then we can ref r, as that implies word r, which is 8 bits in length.
ex: reg a;                      // single bit vector often referred to as a scalar

ref vectors:
ex: a = MultiBitWord1; //if a is also 8 bit wide then all 8 bits of MultiBitWord1 assigned to a
ex: bitslice = MultiBitWord1[3:0]; //applies the 3-0 bits of MultiBitWord1 to bitslice

Memories: array of vector reg
----------
reg [7:0] ram; // This is a 8 bit register vector.

to build mem array, do as below
reg [15:0] ram[255:0]; // This says there is an array of ram from 255 to 0, which is each 16 bits wide. so ram[0] has 16 bits, ram[1] has 16 bits and so on. For ex. 3rd bit of 2nd byte is ram[1][2]. In SV, we could have also written it as "byte ram[255:0]".
input [7:0] addr; //this is the 8 bit addr to array of ram
input [15:0] wrt_data;
output [15:0] rd_data;

So to read:
assign rd_data = ram[addr]; // This is equiv to rd_data[15:0] = ram[addr[7:0]][15:0] same as rd_data[15:0] = ram[8'bxxxx_xxxx] , so rd 16 bits from that addr location.
NOTE: Sometimes , we don't have all bits defined(i.e some bits may be x) for ram[addr], so in those cases we use case stmt:
always @(*) begin
case(addr)
ADDR1 : rd_data = {1'b0,ram[addr][6:0]};
ADDR2 : rd_data = {VAL1, ram[addr][6],VAL2[5:0]};
default: rd_data = 8'b00;
endcase
end

To write: (to write into latch, where all ram locations are built using latch and not flop)
always @(*)
if(wrt_en) ram[addr] = wrt_data; // Writes data[15:0] to that ram addr.

----
Verilog 2001 added support for auto increment/decrement array index. The offset direction indicates if the width expression is added to or subtracted from the base expression.
[base_expr +: width_expr] //positive offset
[base_expr -: width_expr] //negative offset

ex: wire [7:0] byteN=rd_data[8*count +:8]; => if count=4, then rd_data[39:32] assgn to byteN. base_expr=32, width=8, so limit_expr=32 + width of 8 = 39. If it was -, then it would be [32:25]. to get [31:24], we should do rd_data[(8*count-1) -:8];

-------------
----------------------------------
latches:
--------
To implement latch, we could have written the code as above:
----
always @(*)
if(en) q = data;
----

with reset:
----
always @(*)
if (~reset) q = 1'b0;
else if (en) q = data;

however, instead of =, prefrred way would be <=
---
always @(*)
if (~reset) q <= 1'b0;
else if (en) q <= data;

Here, since reset is async reset for a latch, there is no way for the tool to figure out that it needs to use a async latch. It just sees it as a normal if else combinational logic (latch followed by 2 muxes). So, to infer an async latch, we have to say
// synopsys async_set_reset "reset"
This says to the synopsys tool that use a async latch (instead of combinational logic for reset) with async signal tied to reset.

Note: instead of *, we could use all i/p in the sensitivity list of always @. But * automatically implies that all i/p are in sensitivity list in V2001 and beyond.
always @ ( en or reset or data) //or always @(en,reset,data) or always @(*) or always @*
if (~reset) q <= 1'b0;
else if (en) q <= data;

-----------------------------------

Coding style
------------
always @(posedge clk) begin
a<= 1;
a<= 0;
end

The above code may simulate correctly overwriting previous value of a with final assigned value. So, a will be 0 at every clk edge. However Lint (syntax checking for verilog and VHDL) will catch this.

The below code is correct but Lint will report it. It's equiv to assigning a to 1 in cases 00,01 and 11, and a to 0 in case 10. However it saves typing if we assign it at start. Then it gets overwritten in the case of 10.

always @(posedge clk) begin
a<= 1;
case (b) begin
   00: y<=c;
   01: y<=d;
   10: a<=0;y<=e;
   11: y<=f;
endcase

-------------------------

synthesis imposed coding style:
----------------------------
for a set/reset flip-flop: This style requires that all signals in a sequential-logic sensitivity list be specified with an edge (posedge or negedge).
ex: always @(posedge clk or negedge n_reset or negedge n_set) begin
      if (!n_reset) q_out <= 1'b0; //reset has priority over set
     else if (!n_set) q_out <= 1'b1;        // set all bits to one
     else q_out <= data_in;   //

However, this causes simulation to be different than synthesized FF, as synthesized FF are level sensitive to set/reset.
Problem scenarion: when n_reset goes low, then n_set goes low, and then n_reset goes high. Above code will have q_out=0 until the next clock, even though n_set is active. Actual h/w is level sensitive, so it will cause q_out=1 as soon as n_reset goes high. To match actual h/w, we need to add this to sensitivity list:
Fixed: always @(posedge clk or negedge n_reset or negedge n_set or posedge (rst_n & ~set_n))

------------------------
Compilation directive: `elsif and `ifndef were added in v2001.
`ifdef, `else, `elsif, `ifndef, `endif

ex1:
`ifdef TYPE_1
$display(" TYPE_1 message ");
`else
`ifdef TYPE_2
$display(" TYPE_2 message ");
   `endif
`endif

Compile with: +define+TYPE_1
Then simulate,RESULT: TYPE_1 message

Compile with    +define+TYPE_2
Then simulate,RESULT: TYPE_2 message

ex2:
`ifdef U00
reg f01;
`elsif D01
reg t01;
`elsif D01
   `ifdef D01
   reg t040;
   `elsif D00
   reg f040;
   `endif
`else
reg t03;
`endif

***************
verilog TestBench
***************

-------
DUT connected to testbench: 3 modules are there. DUT is original design RTL, Test is module in each testcase separately (module name Test should be the same for all testcases so that TOP_TB can just inst Test and not change name for each diff testcase) and TOP_TB is toplevel testbench that instantiates and connects both DUT and Test. Test has the logic to drive DUT i/p signals. Test can also be written as task() instead of as module.

module DUT (input logic A, output logic B, ...);   always @ .... endmodule => these are all sv files, so logic used
module Test (input logic B, output logic A, ...);   initial begin ... $finish; end endmodule => NOTE: A,B dirn are reversed from DUT
module TOP_TB;
   logic A,B,..;
   logic clk=0, reset; //internal clk, reset signals
    always #5 clk = ~clk; //clk osc with clk=0 at time=0
   DUT(.A(A), .B(B), .CLK(clk), ..);
   Test(.B(B), .A(A), ...);
endmodule

-------------------
Procedural timing control: stmt following this don't execute until condition satsified.
1. delay control: stmt delayed in its execution
ex: #(d+e)/2 rega=regb; // after delay of (d+e)/2, rega equals regb (d and e need to be defined as parameters)

2. event control: using implicit event or declared event
A. implicit event: value changes on nets and variable used as events to trigger the execution of a statement.
negedge is detected on 1->x, 1->z, 1->0, x->0, z->0.
posedge is detected on 0->x, 0->z, 0->1, x->1, z->1.
ex:
@r rega = regb; // controlled by any value change in the reg r
@(posedge clock) rega = regb; // controlled by posedge on clock
forever @(negedge clock) rega = regb; // controlled by negative edge
@(posedge clock); // wait until posedge of clk appears

B. explicit event: A new data type, in addition to nets and variables, called event can be declared. An identifier declared as an event data type is called a named event. An event name shall be declared explicitly before it is used. Event is abstract in nature and doesn't require any port connection (thus it can be used across module boundary). event can be triggered at any place in code, and then those stmt that are waiting on this event get executed.
NOTE: For a trigger to unblock a process waiting on an event, the waiting process must execute the @ statement before the triggering process executes the trigger operator, ->. If the trigger executes first, then the waiting process remains blocked. So, event value (trigger) must be changed by a separate process (trigger followed by control in same process will not work, and pgm will keep on waiting indefinitely for unblock to happen).

ex: event e; => Event declaration. variable e is of data type event
initial begin => Event triggering.
#10;
-> e; => This can be put in various places of testbench code
... => these stmts get executed nevertheless. They don't wait for event e or anything.
d=1;
-> e; => calls event e again, which changes value of variable e. This triggers the below always block again.
end

always @e d = 0; => This stmt waits for a change on variable e. variable e changes on ->e, so at that time d gets assigned 0. Then this process is done, and it keeps on waiting for another change in variable e. When that happens in above code, this assignment is again done.
wait (e); => this is another way of waiting for event "e" to get triggered. stmt following this will execute only when "e" is triggered somewhere. This is level sensitive (in contrast to @ which is edge sensitive), so it looks for e to be true. If e=0, then it keeps on waiting. Useful for syncing various processes. just as with @, we can use any reg,wire,etc instead of event "e" as an arg to wait. wait(0) is always false, while wait(1) is always true.

-------------------

fork-join : causes processes to run in parallel
--------
fork
begin proc1 end //can also be a single stmt like: repeat (16) @(negedge digtop_tb.clk_osc);
begin proc2 end
join

However join happens only when both processes complete. If one of the processes doesn't complete, then join never happens. To prevent this in testbenches where some event may never happen in case of a fault, we use join_any (SV construct). However, these forked processes are still running and will run until they complete. It's just that the pgm can continue forward. If it finds a $finish, then it terminates these forked processes and causes simulator to stop.

//when either 100 edges of ecp_clk OR 500 edges of clk_osc happen, join happens
fork
repeat (100) @(negedge digtop_tb.ecp_clk);
repeat (500) @(posedge digtop_tb.clk_osc);
join_any

join_all: in SV is equiv to join in verilog, as join_all waits for all forked processes to complete before proceeding.
join_none: in SV allows the pgm to move forward without waiting for any forked processes to complete. However, these forked processes are running in background till they complete. This is helpful by putting these in tasks, as this proc will run forever, and the pgm can continue forward to next line from where the task was called.

In verilog, sim terminate when all forked proc complete. But in SV, since we have variations of join, there are 2 variation of fork:
1. wait fork; => This allows fork to wait (before proceeding) until all forked child proc have finished. This is used in cacses, where before terminating the pgm, we want to ensure that all forked processes have completed. Ex: we use it after join_any or join_none, but right before $finish
2. disable fork; => This allows all forked processes to be terminated. This is useful, when we want to kill all forked proc, when any of the forked proc complete. Ex: we use it after "join_any" to kill all other forked processes on completion of any 1 forked process.

ex:
task a();
   fork begin // outer fork
      fork         //2 forked proc below. we wait for event to happen. If signal_1 changes within that time, then it's an ERROR
         begin
            @(posedge signal_1);
            $display("ERROR: Unexpected change in output signal at %t", $time);
         end
         begin
            @stop_trig; //wait for event to happen via ->stop_trig;
         end
      join_any    //join if any of them finish
      disable fork; //stop the other forked proc, if any of them finish
   end join_none // This is outer fork, and keeps internal 2 forked proc running. It gets out of this task to next line in pgm, as "join_none" allows it to move forward.
endtask

verilog task and functions: they are defined within a module.
--------------------------
task:
----
like s/w procedure. task call is separate procedural stmt. It can't be called from cont assgn or be used in an expr. o/p of a task is contained in o/p port.
NOTE: variables defined outside the task (which are not local to the task, called global var) can be accessed within the task, but variables defined within the task (called local var) can't be accessed outside the task defn.
The above var type, local/global refers to scope of var - i.e where it can be acccessed. There is other var type auto/static which refers to storage duration of var - i.e when a var is created/destroyed. "static" or "automatic (auto)" type can be local or global. All var by default are auto meaning their value is lost when execution leaves their scope, and are recreated when scope is entered. When defined as "static", variables remain allocated in the memory throughout the life of the program irrespective of whatever function/task. i.e values are retained from one call of the function to another.
ex: function a();
     static int a =0; //if static is not defined, then var is auto by default.
     a= a +1; //with each call, value of "a" printed is 0,1,2,3,4,.... If it wasn't defined as static, then val would always be 0.
    endfunction

NOTE: task/function can also be defined as auto or static. By default they are auto i.e re-entrant in pgm languages as C - items declared within the task are dynamically allocated (on stack) rather than shared between different invocations of the task (static storage). In verilog-1995, unlike C, args and local var were static and NOT auto by default. Meaning they were stored in fixed location. This caused confusion, as various calls to same task/fn from several places in program caused the value to be indeterminate. So, "auto" keyword was added from V2001 onwards to to force simulator to use stack for local var/args. To do this put word "automatic" in program, task, fn, module stmt.
ex: program automatic test; ... task ... endtask ... endprogram => all var in task stored on stack.
This allows us to write recursive functions in verilog. It's good coding style to put task/function as auto, since if task is defined as "non-auto" and if same task is called at same time with diff values, returned val might be indeterminate.
ncvlog/NOAUTO = In Verilog, variables declared in an automatic task or function are automatic. In SystemVerilog, there are more complex rules for which variables are automatic. Automatic variables are deallocated when execution leaves their scope. variables declared in a static task or function are static = i.e they are initialized only once. Ex: function static display(); //all var declared within this fn are now static.

Ex: task defn and call
module acc;
reg var_acc;
...
//task call
task_name(a_1, b_1[7:0]); => i/p passed thru a_1, o/p copied into b_1[7:0], These ports a_1,b_1 are local to the task, and are overwritten with each task call.
...
//task defn: There is no "always" block inside a task, as it runs only when called.
task task_name (input a, output reg [7:0] b); => i/p, o/p can only be passed thru these ports. These port defn is same as those for module defn (where V2001 takes newer concise format, while previous version take older style). Also from V2001, ports are defaulted to input logic. So, for port a, we can just use "a" instead of "input a"
reg [1:0] a; => local variables declared whose scope is local to the task. If these var need to be initialized, they can be init here in SV (in next stmt do a=2'b00;). However in verilog, you've to do init within begin/end stmt.
begin ... var_acc = 1'b1; ... end => variables defined outside the task can be read as well as be written here.
endtask
...
endmodule

NOTE:
1. If we don't define a "type" for i/p, o/p port variables, then default data type "wire" is picked up, the same way as it happens for module defn. we could have written task as:
task task_name(input integer a, output reg [7:0] b); to force certain type.
2. In verilog 95, these i/o ports have to be defined separately,
eg: task task_name; input a; output [7:0] b; => similar to how we did it for module defn
3. Cadenece IUS doesn't support unpacked arrays for i/o ports of task, module,etc. unpacked arrays have to be declared internally within module or task defn. It gives Error like (assuming rdata defined as port "output reg [7:0] rdata[7:0]"):
ncvlog: *E,MEMDIO : Memory 'rdata' previously declared as input/output/inout.
To get rid of this error, declare rdata as "output reg [63:0] rdata", and then declare rev_data within the module or task (outside begin/end stmt) as "reg [7:0] rev_data[7:0]; begin rev_data[0]=rdata[7:0]; rev_data[1]=rdata[15:8]; ..."

function:
--------
like s/w function (fn). called from with an expr or cont assgn. o/p of a fn is contained in name of fn itself. so, during execution of fn, value must be assgn to fn name.
Ex:
function reg [7:0] calc_parity(); //here o/p value being returned is 8 bit reg. If nothing is to be returned, use "void" (default)
calc_parity = data[0] ^ data[1]; //o/p value is returned in calc_parity function name itself
endfunction : calc_parity //name of function here is optional

diff b/w task/fn:
---------------
task can contain timing/event ctl stmt (#,@,wait), but a function can't. i.e task can consume time.
task may have 0 or more input,output or inout ports, while function has 1 or more input ports (no output or inout ports for fn)
task may call other tasks and fn, but fn may only call other fn and not other task. fn can be called from other task and fn.
fn must have return value, and value must be used as in assgn stmt. return value can be ignored by casting result to void

verilog system task:
--------------------
all verilog system tasks are preceded by $. They are not separate procedural call, so they have to be in "inital begin .. end" or in "always @ ...". That is why we put all $display stmt in *_tc.v module as it has an "initial" block, within which all $display stmt go.

1A. $display: to format, capital or small letters are both equiv. i.e %d or %D are both fine for displaying decimal.
-----------
ex: $display("hello a=%0d \t b=%d at time=%t\n",a,b,$time);
[0]=> its optional to place 0. when used, it prints w/o any leading zeroes or spaces.
real: %[w.d]e, %[w.d]f, %[w.d]g = scintific/decimal/short form
decimal: %[0]d (%b/%o/%h for binary/octal/hex)
char: %[0]c (%s for string) => If we use %s to display 8 bit hex value, then ASCII equiv of that hex is displayed.
For ex, if reg [7:0] a=8'h61; then ($display("%s",a); will display "a".
time: %[0]t for current format time. time is 64 bit. If timescale is 1ns/1ps, then it displays in ps. If we use %d for time, then it displays in ns.
names: %[0]m,%[0]M = display hier name. This is helful in displaying "from what modules is some lower level module called from". Ex: if some synchronizer is called from multiple places, we can have an initial stamt in lower level module with display statement. Ex: sync_2ff.v
// synopsys translate_off => used for non-synthesizable code, it turns synthesis off
module sync_2ff ( .... )
initial
begin
$display("sync_2ff: %m"); => this displays higher level module name for all modules at start of sim.
end
// synopsys translate_on => it turns synthesis back on following this stmt
always @....
endmodule

ex: reg [255:0] string1; => string defined as reg since there's no string type in verilog. SV has string type.
In Verilog: string1={"/path_to","/","file",".txt"}; $display("%s",string1); => concatenation works for string
In SV: string str1; str1 = "first string";

1A. $sformat: used extensively in uvm_error to format anything to string.
ex: $sformat(str,"%s %d %s",str1,num,str2); => would produce string with these 3 var concatenated and with space in between. str can now be passed to anywhere i.e $system(str); => will run str cmd (i.e if str = "ls -lrt", then that would be run)
ex: `uvm_error("ERROR", $sformat("ckm error at time %t",$time))

1B. $write: $write is same as $display except that $display always adds a newline at end, while $write doesn't.

1C. $strobe: strobe.

1D. $monitor: continuous monitoring

1E. dump vcd files: This is std task in verilog, so is supported by all simulators.
   - $dumpfile("tmp.vcd"); => task specifies which file to dump the variables in. If name not provided then dumped in verilog.dump
   - $dumpvars(level, list_of_variables_or_modules); => this task specifies which variables should be dumped. If no args provided, then all var at all levels dumped. level condition:
        - If level = 0, then all variables within the modules from the list will be dumped. If any module from the list contains module instances, then all variables from these modules will also be dumped. So, basically all hier starting frpm parent is dumped.
        - If level = 1, then only listed variables and variables of listed modules will be dumped.

2. $finish: to finish simulation. $finish(0)=>prints nothing, $finish(1 or 2)=>prints some info. $finish is equiv to $finish(1). $stop stops sim and returns control back to simulator's cmd line interpreter.

3. $random: generates random num. random num generated are from predetermined seq, and seed controls initial starting value of that seq.
---------
xmit_start = $random; => generates a random no b/w -(2**32-1) to (2**32). To generate +ve random number, use braces.
xmit_start = {$random}; => generates +ve random no from 0 to (2**32)
xmit_start = {$random} % (2^3); => generates random no from 0 to 7
$random(seed) => generates random number with seed(integer seed;) specified. Otherwise default seed is 0, and it will always generate the same seq of random num. 1st random num gen would be with seed 0 (if not specified), and then seed (specified or not) would be modfied by $random to create new seed to be used for next call of $random.
NOTE: seed is an inout value, so it's passed as i/p at start of $random, but is returned back a modified value as o/p at end of $random. So, using $random(5) is illegal as arg here is const 5, instead of variable. seed is not a special keyword, and we could have used $random(a) also. default value of any integer is 0, so $random starts with seed as 0, but then for next call, a gets modified to some other random value, which is used for next call of $random(a). So, seed is just controling the starting point of the random seq.

SV has $urandom and $urandom_range which are more flexible (generates unsigned num):
a = $urandom; => This fn has exactly same syntax as $random. Using this fn allows us to control seed from irun cmd line. We should see a msg saying "SVSEED set from command line: 2069655130" implying seed was taken from cmd line. With $random, we don't see this msg.
b=$urandom(seed); //seed is optional, and can be controlled from within pgm too.
ex: irun -svseed 145 => simulator assigns 145 as seed to urandom
ex: irun -svseed random => simulator assigns seed to random num using current time and process Id. So, this is most preferred way.

a = $urandom_range(255,1); => returns a random num b/w max=255 and min=1. Seed comes from cmd line or else defaults as 0. If min_val not provided, then min taken as 0. $urandom_range(1,255) behaves same as $urandom_range(255,1).

4. $time: returns simulation time. $realtime returns time in unit specified (timeunit 1ns; timeprecision 1ps; $realtime => 3.114ns

5. file system tasks: In V95, file IO was limited to reading hex files into memory array using readmemb/readmemh(data in the file could be binary/hex numbers only separated by white space) and writing file using $display and $monitor. But in V2001, system tasks were added to do C-type file operations.
$fopen: opens a file, $fclose: closes a file,
$fread: reads binary data from the file into a register or memory.
$fwrite: writes data in given format to file
$fscanf: reads characters from the file, interprets them according to a format, and stores results in its arguments.
$fgetc: reads character at a time, $fgets: reads a line at a time.
ex:
integer fd, dnum; => fd to store file descriptor. It's a 32 bit file descriptor.
fd=$fopen("image.bin", "rb"); => rb or r => rd. wb or w => wrt. ab or a => append. b=> binary file. fd is the file descriptor returned. It's 0, if cmd is unsuccessful.
reg [63:0] data; => this reg is used to store read data.
$fwrite(fd,"%b\n",data); => writes data in binary format to file
dnum=$fread(data,fd); => reads binary data. By default $fread will store data in the first data location through the final location. so data[0]=1st_bit, data[63]=64th bit. dnum stores the number of elements read from the file. -1 is returned for error. so dnum can be used to check for errors.
dnum = $fscanf(fd,"%h %h\n",din[31:16],din[15:0]); => Reads file charcaters in hex format, and stores them in 2, 16 bit reg.
r = $fscanf(fd, "%d", CMD ); => reads file and stores char as decimal number in integer variable "CMD".
$fclose(fd); => we close the file once finished reading/writing

readmemh/readmemb: h or b refers to input file format. Data stored in array is always binary. readmem* always works, while fscanf etc may not always work.
--------
reg [31:0] arr[0:7]; //make sure array size [31:0] is larger than each line of file, else readmem will not load values into array and give an error "size too large". NOTE: arr[lsb:msb] should be provided else we get error.
initial $readmemh("include_files/otp.img",arr); => here file is first divided based on white space or newline character. Then bits are read with arr[0][0] storing LSB of 1st line, arr[0][31] storing MSB of 1st line (i.e 9C gets stored as arr[0][7:0]=10011100 (NOTE: nums are still stored as binary even though file is read in hex. This is since they are stored in reg which can only be binary. They can be displayed in any format, hex, dec, etc)
for(l=0;l<=31;l=l+1) begin => to display all contents read
$display("otp.img: location = %d content = %h",l,arr[l][7:0]);
end

$readmemb => reads binary file (which contains only 0 and 1).

force/release, deposit: (can be used in input.tcl (irun cmd line) or in testcase)
------------
force/release:
ex: force digtop.freq_check = 1'b1; //in any verilog file
    #100 release digtop.freq_check; //releases the force so that value is driven by logic again
ex: force digtop.bus[7:0] = 8'hFF; //in input.tcl

deposit:
ex: deposit veridian_tb.CLKOUT = 1'b0; (in input.tcl)
ex: #20 $deposit(tb.CLKOUT, 1'b0); (in any verilog file, task, initial, etc)

timing check system tasks: there are 12 timing checks in verilog.
-------------
timing check related ones are setup/hold, skew, removal/recovery, period and width. These may only be used in specify blocks, so that back annotated values can work when using sdf files. In sdf files, we have TIMINGCHECKS section, which has SETUPHOLD, RECREM, etc which has exact values for these, instead of some arbitrary values being used in verilog timing checks inside specify blocks.

1. $setup(data_line, clk_line, limit[, notifier]); => limit is period before the event on the clk_line (normally a rising edge) during which the data_line signal is not allowed to change. If the signal breaks this constraint, an error is generated. +ve limit specifies data should change before clk.
2. $hold(clk_line, data_line, limit[, notifier]); => limit here specifies period after an event on the clk_line. Note: here 1st 2 args are in opposite order. +ve limit specifies data should change after clk.
3. $rec/$recovery(reference_event, data_event, recovery_limit [,notifier]); => specifies time constraint b/w async ctl signal and clk signal.
ex: $recovery( posedge set, posedge clk, 10 ); => viol reported if posedge clk (data) hapens within 10 units of posedge set (clk/ref)
recovery( posedge set, posedge clk, recovery_param );
4. $rem/$removal => removal
5. $width(reference_event, limit [,threshold, notifier]); => specifies min pulse width from one edge transition to opposite edge transition.
ex: $width (posedge clk, 5); => if pulse width from posedge clk to negedge clk is < 5, it reports violation.
6. $period(reference_event, limit [,notifier]); => specifies min pulse width from one edge transition to same edge transition.
7. $skew(reference_event, data_event, limit [,[notifier]]); => specifies max delay allowable b/w 2 signals. $timeskew and $fullskew also available.
ex: specparam skew_param=14; $skew(posedge clk1, negedge clk2, skew_param);
8. $nochange(ctl_port, data_port, start_edge_offset, end_edge_offset) => This check involves 3 transitions rather than 2 associated with all other timing checks. It checks if the data signal is stable in an interval of start_edge_offset and end_edge_offset of ctl signal being high or low. ctl signal has to be edge specified, while data signal doesn't have to. data must be stable "start_edge_offset" before specified edge of ctl port and must remain stable "end_edge_offset" after next edge of ctl port. So, it checks for 3 edges. data should setup before specified edge of control signal and hold after opposite edge of control signal.
ex: $nochange (posedge clock, d_input, 3, 5); => report a violation if "d_input" changes in the period of 3 time units before +ve edge of clk and 5 time units after -ve edge of the clock. In the whole time while clock is high, d_input shouldn't change (-3 from +ve dge and +5 from -ve edge of clk).
//below 2 stmt could be combined into 1. Not sure, why 2 separate ones provided in lib. probably because +ve and -ve RET have diff values in lib, so combining into 1 in verilog model will not work (as it won't be able to annotate)
ex: $nochange(negedge CLK, posedge RET, 0.005: 0.005: 0.005, 0.005: 0.005: 0.005, GVCnotifier5); //5ps setup/hold wrt -ve CLK (only for +ve RET). As per this check, -ve RET can still happen in this time window.
ex: $nochange(negedge CLK, negedge RET, 0.005: 0.005: 0.005, 0.005: 0.005: 0.005, GVCnotifier5); //this checks for -ve RET.

NOTE: setup/hold can be combined in 1. similarly for recrem. This completes all 12 timing checks for verilog.
$setuphold (reference_event, data_event, setup_limit, hold_limit [, notifier] [, tstamp_cond] [, tcheck_cond] [, delayed_clk] [, delayed_data]); => [...] mean optional. NOTE: SETUPHOLD in sdf file has data first and then clk, while $setuphold in verilog model file has clk first and data later (similar to $hold, see the syntax). So, be careful when comparing arcs.

1. notifier: reg variable used as a flag. When a timing violation occurs, the model functionality can use the notifier flag to modify the model outputs. So, we can generate an x or whatever message we want to o/p in such a case. notifier switches (x->1, 0->1, 1->0) whenever a timing violation occurs. This is passed into the UDP primitive for flop to generate an x for o/p. Notifier reg are not init, since that may cause UDP to goto x state at time 0, depending on the order in which the UDP received its i/p.
2. tstamp_cond: Places a condition on the <control_event> and the <clk_event>, if both <setup_limit> and <hold_limit> are positive values. Places a condition only on the <control_event> if the <setup_limit> is negative. Places a condition only on the <clk_event> if the <hold_limit> is negative.
3. tcheck_cond: Places a condition on the <control_event> and the <clk_event> if both <setup_limit> and <hold_limit> are positive values. Places a condition only on the <clk_event> if the <setup_limit> is negative. Places a condition only on the <control_event> if the <hold_limit> is negative.
4. delayed_clk: Delayed signal value for <clk_event> when one of the limits is negative.
5. delayed data: Delayed signal value for <data_event> when one of the limits is negative.
These delayed copies of clk/data are used as inputs to udp, instead of using CLK or D. This is because if UDPs latch data on a clk, and setup or hold times are -ve, then event driven simulators will give incorrect results unless udp inputs are delayed accordingly. clk and/or data are delayed so that there is +ve setup and hold time. (Then at edge of clk, simulators can see if D changed correctly before clk changed. If D setup time was -ve and it could change after clk change, then simulator has no way to check for future event) This doesn't change anything, only the clk or data is delayed by an amount setup+hold+delta. Simulators may give "non-convergence warning" (ncelab: *W, NTCNNC) in cases where delayed signal still gives -ve setup or hold time. This may happen when there are diff constraints for +ve and -ve data wrt clk, or when violation regions created by timing checks do not overlap. If any of setup or hold is -ve in sdf file, delays are added. If after delaying, we can make all setup and hold numbers to be +ve, then algorithm converges. If we can't make all of them +ve even after delaying, then all -ve limits are set to 0 to make algorithm converge. if violation regions do not overlap, then all -ve limits are set to 0. By setting -ve limits to 0, we are more pessimistic and are making violation window larger. This is OK as design is still guranteed to meet setup/hold if it passes this more rigorous 0 limit. By adding delays, and making algo converge w/o forcing any -ve limit to 0, we keep the original sdf limits, so that's preferred option.
Look in Ncsim doc(incisive_sim_overview.pdf and negative_timing_check_NTCNNC_AppNote.pdf ) for detail.
NOTE:
1. sum of setup+hold should be always +ve. +ve setup means before the clk, while +ve hold means after the clk.
2. Adding delay on clk line, dec setup time while inc hold time. Adding delay on data line, does opposite. However, violation window (sum of setup+hold) remains same.

ex: (from verilog models file SDC20.v)

$setuphold(posedge CLK, posedge D, 0.01: 0.01: 0.01, 0.01: 0.01: 0.01, GVCnotifier1_zd ,,TCHKON_AND_GVC_S_NOT1_CLRZ_NOT0_ != 0, GVC_CLK_CLK, GVC_D_D ); => 10ps is specified as setup/hold time for pos D wrt pos clk. similarly for neg D. GVCnotifier1_zd is the notifier which toggles if a violation occurs. GVCnotifier1_zd goes from x to 0, whenever a violation occurs. This switching causes la_nudp primitive to o/p an "x". Before toggling, it does tcheck_cond (TCHKON_AND_GVC_S_NOT1_CLRZ_NOT0_ != 0). GVC_CLK_CLK is an exact delayed copy of CLK, while GVC_D_D is an exact delayed copy of D.

$recrem(posedge CLRZ, posedge CLK &&& SLEEPMODE, 0.01: 0.01: 0.01, 0.01: 0.01: 0.01, GVCnotifier1_zd ,,TCHKON_AND_GVC_SD_NOT0_D_NOT0_S_WHATEVER_ != 0, GVC_CLRZ_CLRZ , GVC_CLK_CLK); //here (COND SLEEPMODE) is added in sdf file to recrem arc

$width(negedge CLRZ &&& TCHKON_AND_Q != 0 ,0.01 : 0.01 : 0.01 ,0, GVCnotifier2_zd) ;

--------------------------
verilog simulation details:
---------------------------
At beginning of sim, time T (tracks sim time in timesteps) is set to 0, nets set to z, variables set to x. All procedural blocks (initial and always blocks) then become active. In Verilog-2001, variables may be initialized in their respective declarations and this initialization is permitted either before or after the procedural blocks become active at time 0.

These active events (evaluate or update) are put in active event queue. They are then evaluated or updated. update and evaluate events can happen in any order, depending on their position in active event queue.
If it's update event, then specific objects are updated, and any evaluation events resulting from these are added to active event queue.
If it's evaluate event, then specific processes are evaluated and any update events resulting from these are then added to active event queue. Note that Blocking assgn (BA) are evaluated but have no update events.

A. We keep going thru these active event queue. active events such as blocking assignments and contiuous assignments can trigger additional assignments and procedural blocks causing more active events and NBA update events to be scheduled in the same time step. Under these circumstances, the new active events would be executed, until queue gets empty.
B. Then inactive events (as #0 blocking assignments) are activated causing them to move from "inactive event queue" to active event queue. They may cause more events to be activated and this contiues, until "active event queue" gets empty.
C. Then we activate all NBA update events => NBA update events are put in active event queue from NBA update event queue. When these activated events are executed (causing LHS to be executed), they may cause additional processes to trigger and cause more active events and more nonblocking update events to be scheduled in the same time step. Activity in the current time step continues to iterate until all events in the current time step have been executed and no more processes, that could cause more events to be scheduled, can be triggered.
D. At this point, all of the $monitor and $strobe commands will get moved from "monitor event queue" to "active event queue". This causes them to get executed and display their respective values. Then the simulation time T can be advanced to the next time step. Then we activate all inactive events for time T.

Queues: there are 4 queues that are kept for current sim time, and many others for future sim time:
1. active event queue: most verilog events scheduled here: BA, evaluate RHS of NBA, continuous assgn, $display stmt, evaluate i/p and update o/p of primitives. These events can happen in any order.
2. inactive event queue: all #0 BA kept here. Note, these are evaluated when there are no more active events in the current time T. These #0 stmt should NOT be used.
3. NBA update event queue: updates LHS of NBA and keeps the update event here.
4. monitor event queue: $monitor and $strobe stmt.

Ex of code of RTL simulating: in snug_2002_cec_verilog.pdf page 8-10 (section 4.0)
module tb;
reg a=0; reg b=0; //assigns a to 0 at time T=0

tb techniques:
--------------
assgn at time 0:
-----
at time 0, either always block or initial block may become active first (IEEE std says all procedural blocks become active, but doesn't specify the order). That may cause race condition in sim if the initial block becomes active first, as first edge of signal in initial block may not be seen by always blocks. All vendors have implemented Verilog simulators to activate all always blocks before activating initial blocks, which means that the always blocks are ready for the edge signal before the edge signal is defined in an initial block. We can't count on this, so we use NBA for signals in initial block.

In the reset ex below, If the initial block becomes active before the always block, the always block will not recognize reset until the next detected posedge clk or the next assertion of reset. So, we can either use NBA or put a delay for reset so that it asserts 1-2 clk cycles after sim starts

//reset coding
-------------
initial begin
rst_n <= 0; // NBA will force the reset signal to be executed at the end of time step 0, after all of the always blocks have become active. This will force the always blocks to trigger again when the reset is updated, still at time 0.
//#5 rst_n = 0; //this will also work, as rst_n remains x for 5 delay units.
...
end

always @(posedge clk or negedge rst_n) ...

//clk osc coding
----------------
`define CLK_PRD 10
initial begin
clk <= 0; //NBA forces signal to go low at end of time 0, after all seq proceses have become active. This ensures that any procedural that might be sensitive to a negedge clk will be triggered at time 0. We chose 0 as initial value, since most of the designs are posedge clk based, so we avoid +ve edge at time 0 by having this. For negedge clk based design, we could have clk <= 1.
forever #(`CLK_PRD/2) clk = ~clk; //this is more sim efficient BA (blocking assignment) inside forever stmt. We could have also written the whole procedure separately as "always #50 clk = ~clk;" and setting clk to 0 at time 0 by doing "reg clk = 0;".
end

ex: always #50 clk <= ~clk; => This has NBA, so ~clk is stored in tmp0, and sim progresses to next delta step, since there's nothing to block it. It comes to begin of same stmt and again stores a copy of ~clk in tmp1. It keeps on doing it, till it runs out of memroy. So, use BA so that the stmt gets executed before the time can adance further.

##we can also model clk as follows
initial begin clk = 0; end //this causes a -ve edge on clk from x to 0
always @clk clk <= #10 ~clk; //this is executed on 1st x->0 edge (from above initial stmt). clk will get stored in tmp0 and get assigned to 1 after #10. loop comes back to beginning as it's NBA. It waits for next edge of clk which happens #10 later.

##generally, we see clk osc modeled as follows: (This is OK except that clk -ve transition from X to 0 at time 0 may not be seen by all blocks)
initial forever begin
clk = 0;
#5 clk = 1;
#5;
end

#this also models clk in just 1 line
ex: wire #5 clk = clk === 0; => starts with clk=0, then changes to clk=1 at #5 and then repeats.

NOTE: we can also build clk osc using clk osc example shown in "always" notes above.

//timeout coding => to kill test when sim has run for a large time, and hasn't ended normally (put this in digtop_tb.v)
`define SIM_TIMEOUT 500_000_000
initial begin
#`SIM_TIMEOUT
$display ("** TEST KILLED ** (Time:%d)", $time);
$finish(2);
end

//flag "x" and "glitches" on any o/p pin => since they indicate some underlying ckt problem. include this file in digtop_tb.v
time time_prev; //64 bit int num used
always @(out1) begin //shown only for 1 o/p port. Repeat this for all o/p ports
if (out[1]===1'bx) $display("X-state: at time %t",$time); //x-state
if ($time-time_prev < glitch_limit_out[1]) $display("Glitch: at time %t",$time); //glitch compared
time_prev=$time; //this stores stores time when signal changed.
end

//state machine coding
2 ways: one where we separate out combo and seq flops, and other where we directly code it as one.
1. coded separately: next state coded as combo logic (preferred as per LINT tools)
ex: always @* begin //combo logic coded here
     sm_state_a = sm_state_r; //_a refers to i/p of flop (combo=next state), while _r refers to o/p of flop (seq=current state). This stmt is needed as it says that incase nothing is assigned to sm_state_a below, assign flop o/p sm_state_r to it. If we don't do this, then a latch may be inferred for those cases in this combo logic. This stmt may also be put below in if-else stmt instead of putting it here, but it's safer here as it guarantees that old value will be kept incase no new value is assigned in code below, we don't need to check for all if-else conditions below to find out if latch is inferred or not.
     sm_data_a = sm_data_r;
     if (...) sm_state_a = 3'b000 else if (...) sm_state_a = 3'b111
     else begin
       case (sm_state_r) //NOTE: It has next state from flop. usual state machines are written starting from here as above stmt only assign defaults in case they are not assigned below.
         SM_STATE0: begin sm_state_a = SM_STATE1; sm_data_a = 'hF; end
         SM_STATE1: begin ... end
         default : begin ... end //default is not needed as we have sm_state_a = sm_state_r at top to take care of any default cases
        endcase
      end
     end

     always @(posedge CLK, negedge XRST) begin //seq logic coded here
       if (~XRST) begin sm_state_r <= 0; sm_data_r <= 0; ... end
       else       begin sm_statr_r <= sm_statr_a; sm_data_r <= sm_data_a; ... end
     end

2. coded as one: next state coded on RHS while current state coded on RHS of flop

-----------------------------------------------
verilog simulator: see in simulation.txt file for info.
************
NcSim: type "quit" at command line if it keeps on running

--------------------------

crypto openssl code:
-------
test.v:
always @(trig) begin
      fm = $fopen("file_msg","w");
      fk = $fopen("file_key","w");
      fo = $fopen("file_out","w");
      $fwrite(fm,"%h%h%h%h",msg4,msg3,msg2,msg1);   //msg, key, tmp, etc are defined as int, so stored as hex here
      $system("xxd -r -p file_msg file.bin"); //-r=reverse (hexdump to binary) converts hex msg to bin msg
      $sformat(str,"openssl aes-128-ecb -e -in file.bin -out file.aes -K %h%h%h%h -iv 0 -nopad",key4,key3,key2,key1);
      //str = "openssl aes-128-ecb -e -in file.bin -out file.aes -K f6e1a2ed6bd2ebd7f98854f35e0c0fbc -iv 0 -nopad"
      $system(str);         //execute above openssl cmd (str stores the cmd)
      $system("xxd -p file.aes > file_out"); //-p=binary to hexdump. dump o/p in hexdump
      $readmemh("file_out",arr); //store enc msg contents in "reg [127:0] arr"
      tmp4 = arr[127:96];
      tmp3 = arr[95:64];
      tmp2 = arr[63:32];
      tmp1 = arr[31:0];
      $display(" Msg values are: msg4=%h msg3=%h msg2=%h msg1=%h",msg4,msg3,msg2,msg1);
      $display(" Key values are: key4=%h key3=%h key2=%h key1=%h",key4,key3,key2,key1);
      $display(" Enc values are: tmp4=%h tmp3=%h tmp2=%h tmp1=%h",tmp4,tmp3,tmp2,tmp1);
end

------------------------------

Details: Published: Thursday, 28 December 2017 12:37; Hits: 2414

This section deals with all aspects of hardware design.

Open source tools:

There has been a lot of development in open source CAD tools for VLSI design. Though these are not state of the art, but they are good enough to fabricate multi million gate designs in 14nm and above. A lot of developments have taken place, and as of 2023, tons of chips have been fabricated relying entirely on open source tools. FOSSi (Free and Open Source Silicon) Foundation ( https://fossi-foundation.org ) is behind this development too.

Check this link on youtube for latest developments: https://www.youtube.com/watch?v=OmEbzRp_NGg

In keeping up with philosophy of open source, I'm going to list most relevant open source tools that can be used to design hardware. In the past, you had to assemble all these open source tools, and then use them together to start from RTL and get to the final gds. However, now as of 2023, there are multiple flows available, that take these separate open source tools, put them in a package, and just download the flow kit. This has made things lot easier.

For anyone to be able to fabricate the final chip, we need 3 components:

PDK (Process Design Kit) from fab: There are tons of fabs in US and abroad that take your design in GDS format, and print it in silicon. TSMC is most well known which is a pure-play silicon foundary. Samsung, Intel, etc have their own Foundary for chips that they design in house.
- SKywater Tech (https://www.skywatertechnology.com): Skywater Tech is the only US based pure-play silicon Foundary, based in Bloomington, Minnesota. It was formed in 2017 and went public in 2021. It has collaborated with Efabless and Google to create the first open source chip manufacturing program. They fabricate chips in 90nm and 130nm CMOS process. They have open sourced Sky 130nm (S130) PDK, which is available on github. Link: https://github.com/google/skywater-pdk. This was the last hurdle for open source tool chain to clear, as prior to this, there were no real world open source PDK. There were some experimental PDK, but not ones that could take your design and fabricate silicon.
- Efabless (https://efabless.com): Efabless is another Fab Company which is a crowdsourcing design platform for custom silicon. They have Multi Project wafer (MPW) shuttles that anyone can get their silicon design on. They use
Use tool chain flow from RTL to GDS
Flow for submitting the GDS to fab
Librarystart from RTL to

These are the 2 flowchain (or tool chain) that I'm going to talk about.

Open Road:

This is the latest open source toolflow chain that has industry heavyweights behind it. It was launched in 2018 with DARPA. UC San Diego is leading the effort with involvement from companies as Qualcomm, ARM, etc.

Official website: https://theopenroadproject.org/

Resource link on above site talks about all the steps needed => https://theopenroadproject.org/resources/

All User uides on installation/setup/running are here (all relevant docs are here): https://openroad.readthedocs.io/en/latest/

Start from here which starts with flow-scripts: https://openroad-flow-scripts.readthedocs.io/en/latest/

Ubuntu and CentOS are both supported. I'll go with Ubuntu, since CentOS isn't supported officially anymore (as of 2023). So lot of stuff in CentOS breaks with installation on newer laptops (due to older drivers not working any more). We'll go with local installation of open road on our Linux system.

Steps here: https://openroad-flow-scripts.readthedocs.io/en/latest/user/BuildLocally.html

Qflow:

This was a flow that was developed around 2018. The website is http://opencircuitdesign.com. It has links to all open source tools that can actually be used to design and simulate real circuit. The founder of this website, Tim Edwards, has actually designed and fabricated a microcontroller using only the open source tools (in 2018), and it was a first time silicon success (i.e no bugs in fabricated chip). Some of the open source tools have been written by him, while some he got from others. But he combined all of them, put them in a neat flow, that can take an RTL, and generate gds. No doubt, he has done an amazing job. I've followed his instructions step and step, and have been able to get the whole toolflow working on Linux OS "CentOS Linux release 7.5.1804". Below, I'm going to show step by step instructions on how to get started with his toolflow, called "qflow".

Before we go into the toolflow, we need design of transistors, gates etc that can be fabricated in fabs. OSU (Oklahoma State universiyty) provides all of this at this link: https://vlsiarch.ecen.okstate.edu/flow/. These are included as part of tool flow "qflow", so you do not need to download anything from here, but it's good to keep a copy of the material in a separate directory on your machine.

The download link at bottom of this (http://vlsiarch.ecen.okstate.edu/flows) page provides all design related files needed for different cad tool flows (synopsys and cadence and mosis). Using these cad tool flow, full designs can be done and chips fabricated in different fabs (AMI =American Microsystems Inc (purchased by OnSemi) and TSMC). These open design related files were developed by this prof, James Stine. He was initially working at IIT (Illinois Insttitute of Tech), and then moved o to OSU (Oklahoma State University). So, you will find material from both places, but OSU stuff is latest, so use that for design flow.

If you follow the link, you will see 3 dir: These dir have tech files related to these nodes => AMI 0.5um, AMI 0.35um, TSMC 0.25um, TSMC 0.18um, FreePDK 45nm. FreePDK 45nm has been designed jointly with North Carolina State University, and is the most advanced node currently supported. The 3 dir you see are:

1. FreePDK_SRC => It has 45nm design files. In this there is *.tar.gz. Download that file and extract it in a dir named "FreePDK_SRC" (or any other name, doesn't matter). It will create a dir named "OSU_FreePDK". Inside it are 2 *.tar.gz. Extract both of them to create 2 new dir: OSU_FreePDK_Tech and osu_freepdk_1.0. We will not bother with 45nm design at all, as it's very advanced for our experimental purpose.

2. MOSIS_SCMOS => This has MOSIS (Metal Oxide Semiconductor Implementation Service) SCMOS (scalable CMOS) design files. MOSIS is a middle man foundary service that provides fab access to TSMC, GF (global Foundaries), AMS and AMI (now part of OnSemi). It has IIT and OSU stdcell libraries. We will not bother with IIT libs, as they are older.We will only deal with these 2 dir (download these 2 tar.gz files in a dir named "MOSIS_SCMOS" (or any other name, doesn't matter):

A. osu_soc_v2.7 => Inside this is a tar.gz file. Download and extract it in a dir named "osu_soc_v2.7" ((or any other name, doesn't matter). It will create 2 subdir "cadence" and "synopsys", as shown in the link. This is the version that is used by "qflow".

B. osu_stdcells_v2.4 => Inside this there are 3 tar.gz files. Download and extract it in a dir named "osu_stdcells_v2.4" ((or any other name, doesn't matter). After extracting all 3 of them, it will create 3 subdir "flow", "lib" and "ref_designs", as shown in the link.

3. stdcell_datasheet => This has datasheet for all stdcells in different tech (AMI 0.6um, AMI 0.5um, AMI 0.35um, TSMC 0.25um, TSMC 0.18um, FreePDK 45nm). We do not need to download anything from this dir, as it is for informative purpose only. We will need to refer to this datasheet from time to time though, so will be nice to keep this link bookmarked.

In my case, after downloading and extracting everything, dir structure looks like this:

/home/vlsi/osu_flows

FreePDK_SRC => It has files for FreePDK 45nm
- OSU_FreePDK
  - OSU_FreePDK_Tech
    - cdssetup
    - lib
    - techfile
  - osu_freepdk_1.0
    - flow
    - lib
    - ref_design
MOSIS_SCMOS => It has files for AMI 0.5um, AMI 0.35um, TSMC 0.25um, TSMC 0.18um
- osu_stdcells_v2.4
  - flow
  - lib
  - ref_design
- osu_soc_v2.7
  - cadence
    - flow
    - lib
    - ref_design
  - synopsys
    - flow
    - lib
    - ref_design
flow => flow dir has techfiles, tcl scripts
lib => lib dir has .v files, .lef files
ref_design => ref_design dir has a reference design of a mips processor, that can be used as a sample design to work with.

First, we need to understand chip (IC) design, and then use this understanding for system design, so that we can have a hardware that can actually do something. Chip design will be explained in a separate section. Here I'll go with the toolflow "qlow" instructions and setup. You will need to know the design process, before you go into this toolflow section.

Page 79 of 79

Start
Prev
70
71
72
73
74
75
76
77
78
79
Next
End

Nav view search

Navigation

Search