PT path timing cmds:

The most important info we get from running a timing tool on the design is whether all the paths in the design meet their timing requirements. There are different types of timing checks that need to be met for each path, i.e. setup, hold, recovery/removal, etc. We saw under the "PT: object access functions" section that get_* and report_* are 2 kinds of cmds that allow us to access and report objects. For timing paths, both kinds are available:

1. report_timing cmd: This reports path timing as a text report meant for visual inspection. Its output is text, not objects, so it can't be used to pull path attributes into scripts (other than by parsing the text).

2. get_timing_paths cmd: This gets timing paths as a collection of objects, which can then be used inside a script to get parameters of interest.

report_timing => the most powerful cmd to see details of a timing path. This is the cmd you'll use the most in any timing tool to debug paths and check timing, so it gets a detailed section of its own. At most 2M paths are shown (if you specify a value >2M in any of the options below, it will error out or won't be honored).

Syntax: report_timing <options> <path_collection>

Options: There are lots of options available (see the PT manual for full syntax). Here are the important ones.

  • -from/-to/-through: These are the same as in the timing exception cmds (see that section), where we specify the start/end points (pins, ports, nets, seq cells or clocks) of the path. Direction can also be specified with -rise_from, -fall_from, -rise_to, -fall_to, -rise_through, -fall_through.
    • If multiple objects are specified in -through option, then all paths passing thru any of the objects are reported. Ex: report_timing -from A1 -through {B1 B2} -through C1 -to D1 => this cmd gets a path that starts at A1, then passes through either B1 or B2, then passes through C1, and ends at D1.
  • -path_type < summary | end | short | full | full_clock | full_clock_expanded > => summary shows only startpoint, endpoint and slack, which is useful for a quick overview of paths. The full_* values are the same as those described for the deprecated -path option below.
    • This old format is deprecated: -path < full | full_clock | full_clock_expanded > => default is full (meaning the full clock path is not shown). Use full_clock_expanded to see the full clk path (including the generated clk source point). full_clock and full_clock_expanded differ only when the clk is a generated clock. Since -path is deprecated, use -path_type shown above.
  • -delay_type min|max => min=hold, max=setup (min_max reports both min and max timing). -delay (used previously) looks deprecated as of PT 2018, so use -delay_type.
    • Default is to show "max" (i.e. setup) paths if no -delay_type option is provided. So, to see hold paths we have to specify "-delay_type min" (and "-delay_type max" for setup paths). Use -delay_type min_max to show both setup and hold arcs in the same report. -delay_type also accepts max_rise, max_fall, min_rise and min_fall, to report only paths that are rising/falling at the data path endpoint.
  • -nets -cap(or -capacitance) -tran(or -transition_time)=> to show these values in timing reports. -voltage shows voltage (when we have Multi Voltage design, it's useful). -derate, -variation are used to report when we have derate/variations applied to PT runs.
  • -input_pins / -include_hierarchical_pins => -input_pins shows input pins in the path report (by default, the report shows only output pins). This is helpful when we want to know both the i/p pin and the o/p pin of a gate, to check the arc in the .lib. -include_hierarchical_pins shows the hier pins crossed, as well as all leaf pins in the path (the hier pins show 0 incremental delay).
  • -sort_by group|slack => By default, paths are sorted by group, and within each group, by slack (i.e if multiple groups present, all paths for grp 1 will be shown first, then paths for group 2 and so on). 
  • -nworst : number of paths to report per endpoint (default=1). We need this option when we do report_timing to a particular endpoint and want to see all failing paths through it. We usually set -nworst to something large so that we see all the worst violating paths, irrespective of whether they end at the same endpoint or not. Otherwise we'll only see 1 failing path to each endpoint (even if many different startpoints fail to it).
  • -max_paths : number of paths to report per path group. (default=1)
  • -start_end_pair => by default, only the 1 worst path is reported (as max_paths=1 by default). This option instead reports the single worst path for each startpoint-endpoint pair. It can result in a large number of timing paths being reported, so use -slack_lesser_than to limit them. We can't use this option with -nworst, -max_paths, -unique_pins or -slack_greater_than (-start_end_pair is meant for when we just want one path per pair and don't want to clutter the report with similar paths).
  • -unique_pins => This reports only the single worst timing path through any sequence of pins. The same sequence of pins can show up in different paths because rise/fall may differ at each pin, so multiple such paths are possible (I think only 2 paths are possible for a unique start and end point, one with flop o/p rise at start and the other with flop o/p fall at start; all other rise/fall directions are determined by gate polarity). This option avoids displaying such duplicate paths that differ only in their rise/fall transitions. It's especially useful with non-unate logic such as XOR gates, which can produce a large number of rise/fall combinations and make report_timing show 100's of paths (since an XOR o/p depends on the value of the other pins too, it may act as inverting or non-inverting, so all cases may have to be considered).
  • -false => usually we provide a list of false paths to PT to remove those paths from timing consideration. With this option, PT can also automatically detect false paths based on the logic configuration of the design. This option is rarely used, as we don't want to rely on the tool to figure out false paths for us.
  • -exceptions all => reports the timing exceptions that apply to that path (e.g. to see why a certain path is unconstrained). An exception may be a false path, multicycle path or min/max delay. This is helpful when you have unconstrained endpoints in the design: use this option on such a path to see why it is unconstrained. Other values are "dominant" and "overridden". We use "all" as that shows us all exceptions that apply to the path.
  • -slack_greater_than/-slack_lesser_than => used to show paths within particular slack range. Default is to show paths with slack < 0. To show paths with all slacks, use "-slack_lesser_than infinity"
  • -include_hier => reports timing with all hier included for a given net. So, if a given net is traversing 3 hier, we only see 1 line in reports. But by using this option, this one net will show up 3 times, with all 3 hier shown on separate lines. This is sometimes easier to debug as we may know the net name at a certain hier only. In such a case, printing all hier helps.
  • -group {*gating} => shows paths only for that path group. Use -group [get_path_groups *] to get reports for all path groups, where worst path for each path group is reported. If we don't use this option, then by default only the worst path in the whole design is shown. Usually we use it when we have multiple clock groups, and we want to see path associated with particular clock, but we don't know the start or end point.
  • -crosstalk_delta => reports annotated delta delay and delta transition time, which are computed during crosstalk SI analysis. Using this option does not initiate crosstalk analysis.
  • -nosplit => this option is same as that used in other PT cmds, where it prevents line splitting. This way everything is printed in 1 line (by default, PT creates new line when text can't fit in 1 line, which becomes an issue for scripts that parse reports)
  • -normalized_slack => Sometimes we may want to report paths by normalized slack instead of raw slack, and this option allows us to do that. To enable normalized slack analysis, set the timing_enable_normalized_slack variable to true before running timing analysis. The rationale behind normalized slack is this: let's say we have a few paths on a clk running at 100MHz and a few other paths on a clk at 1GHz. If both paths are failing by 0.1ns, the 1GHz path is much harder to fix, since 0.1ns is 10% of its 1ns cycle, while for the 100MHz path it's only 1% of its 10ns cycle (relaxing that clk from 100MHz to about 99MHz would be enough to pass timing). So, instead of arranging paths by raw slack, it's more useful to list them by a slack that takes clk freq into account. PT reports normalized slack = slack/allowed_path_delay. Allowed path delay (aka normalized delay) is 1 clk cycle for a full cycle path, 0.5 cycle for a half cycle path, n cycles for an n-cycle MCP, etc. So normalized slack is a fraction of the allowed delay (negative for violating paths): -0.2 means the path delay exceeds its allowed delay by 20% of that allowed delay.
  • -pba_mode < none | path | exhaustive > => specifies path based timing analysis modes. There are 2 kinds of timing analysis (Link: https://solvnet.synopsys.com/retrieve/012134.html):
    1. Graph based (GBA, default): looks at worst case i/p edge rate and load on a cell, and picks up the appropriate delay to use. Option <none> enables this mode.
      • Why do we even have GBA in STA tools? The answer is => for faster run times. Let's take an ex: if we have a nand gate, we should time both the paths from A1->ZN and B1->ZN with their own slew rate and delays. This is the correct thing to do. Then at o/p ZN, we will get 2 different values of slew and delay, depending on whether the path came from pin A1 or pin B1. Now let's assume that downstream of Nand gate, we have a buffer, which also has an arc from I->Z. Now pin I of this buffer can have 2 possible slew values depending on whether it came from A1 pin of Nand gate or B1 pin of Nand gate. Then o/p Z of buffer can also have 2 possible slew values and 2 possible delay values (corresponding to each slew). If the upstream nand gate was 100 i/p gate, then this buffer would have 100 possible slew/delay values (one for each path). If STA tools started storing slew/delay values for each upstream path for a given arc, then we would have 1000's of values stored per arc, and STA will take forever to run. To help STA run faster, we store only 1 slew and delay value for each arc (in reality, we store 2 values: 1 for min corner, and 1 for max corner). We stamp these min/max values for each arc in design. 1 slew value will give only 1 delay value, since delay of a timing arc is dependent on i/p slew and o/p load, both of which are fixed (i.e have only 1 value). In order to ensure that the design will time conservatively, we choose the worst case slew at the input pins of a gate, and stamp that worst slew on all i/p pins. That gives us a single o/p slew rate, that we use to propagate downstream. In ex above, we stamp these worst case min/max slew values on buffer/nand_gate timing arc. These worst slews are now propagated for all i/p pins of a gate, even though in silicon, these worst slews won't happen for all the paths. But that's the price we pay to speed up STA.
    2. Path based (PBA): This looks at each path in question separately, and figures out which path gives the worst timing. If path x has larger delay but better edge rate than path y, which has smaller delay but much worse edge rate, PBA analyzes both paths separately, each with its own slew, and picks the worst one. In GBA, to analyze path x, the tool would have just used the worst edge rate from path y (stamped on the i/p pin of the cell for path x) together with the larger delay from path x, even though this combination may never happen. Values <path> and <exhaustive> are used for PBA. <path> performs path-based analysis on paths after they have been gathered by graph-based analysis. It's faster, but the true worst-case path may not be reported. <exhaustive> performs an exhaustive path-based analysis to determine the truly worst-case paths in the design (after doing the recalc). This is the most accurate and most computation-intensive mode. You cannot use the exhaustive mode together with the -start_end_pair, -cover_design, or path_collection options. We always run with -pba_mode path first; if that makes the path pass timing, we are good. If it still fails, we use exhaustive mode.
      • NOTE: For Cadence ETS, cmd for pba mode is "report_timing -retime path_slew_propagation"
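To make the GBA vs PBA difference concrete, here is a toy Python model of the NAND-driving-a-buffer example from the GBA discussion. It assumes a made-up linear delay/slew model and made-up arrival/slew numbers (real tools use .lib tables). GBA keeps one (worst) slew on the NAND output for every path, while PBA lets the buffer see the slew produced by each path's own NAND arc.

```python
# Toy model: NAND2 (pins A1, B1) driving a buffer.
# Linear delay/slew equations with hypothetical coefficients; real tools use .lib tables.

def arc_delay(in_slew, load):
    return 0.02 + 0.3 * in_slew + 0.5 * load

def arc_out_slew(in_slew, load):
    return 0.01 + 0.2 * in_slew + 0.8 * load

NAND_LOAD, BUF_LOAD = 0.01, 0.02

# Hypothetical (arrival_ns, slew_ns) at each NAND input pin:
# A1 arrives later with a sharp edge, B1 earlier with a slow edge.
nand_inputs = {"A1": (1.00, 0.05), "B1": (0.80, 0.40)}

def buffer_output_arrival(pin, buf_in_slew):
    arrival, slew = nand_inputs[pin]
    nand_out = arrival + arc_delay(slew, NAND_LOAD)
    return nand_out + arc_delay(buf_in_slew, BUF_LOAD)

def pba(pin):
    # PBA: the buffer sees the slew produced by this path's own NAND arc
    _, slew = nand_inputs[pin]
    return buffer_output_arrival(pin, arc_out_slew(slew, NAND_LOAD))

def gba(pin):
    # GBA: one worst slew is stored on the NAND output and reused for every path
    worst = max(arc_out_slew(s, NAND_LOAD) for _, s in nand_inputs.values())
    return buffer_output_arrival(pin, worst)

for pin in nand_inputs:
    print(pin, f"GBA={gba(pin):.4f} PBA={pba(pin):.4f}")
```

In this model the A1 path comes out later under GBA than under PBA (it inherits B1's slow slew at the buffer), while the B1 path is the same in both, which is the pessimism PBA removes.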

Ex:

  • report_timing -from A/CP -to B/D -delay_type min => reports the worst hold path from A/CP to B/D.
  • report_timing -group [get_path_groups *] -path_type summary => This reports 1 path for each path group in STA with summary only. i.e it only shows SP, EP and Slack for top path in each path grp.
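The normalized slack arithmetic described under -normalized_slack can be checked with a few lines of Python. This is only a sketch of the formula quoted above (slack / allowed path delay); the helper names are hypothetical, and it doesn't model how PT derives the allowed delay for more complex exceptions.

```python
def allowed_path_delay(period_ns, cycles=1.0):
    # cycles = 1 for a full-cycle path, 0.5 for a half-cycle path, N for an N-cycle MCP
    return period_ns * cycles

def normalized_slack(slack_ns, allowed_delay_ns):
    # normalized slack = slack / allowed path delay
    return slack_ns / allowed_delay_ns

# The two paths from the text: both fail by 0.1 ns
fast = normalized_slack(-0.1, allowed_path_delay(1.0))   # 1 GHz clk, 1 ns period
slow = normalized_slack(-0.1, allowed_path_delay(10.0))  # 100 MHz clk, 10 ns period
print(f"1 GHz path: {fast:+.2f}, 100 MHz path: {slow:+.2f}")

# Frequency each clk would have to be relaxed to for the path to pass instead
for period in (1.0, 10.0):
    print(f"{1e3 / period:.0f} MHz -> {1e3 / (period + 0.1):.1f} MHz")
```

The 1 GHz path gets a normalized slack 10x worse than the 100 MHz path, even though both raw slacks are -0.1 ns.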

Specifying start/end points: The objects given to -from/-to/-through are the start, through or end points (the <path_collection> argument itself is a collection of timing paths, e.g. from get_timing_paths). As we saw in the section on "PT timing exception cmd", we can specify cell names for start/end points. Then, all paths from all pins of the start cell to all pins of the end cell are considered. We know that for a data-to-data path the valid start point is the CLK pin of the launch flop and the valid end point is the Data pin of the capture flop. But a cell has 4-5 pins such as Clk, Data, Set, Reset, Out pin Q etc. Since only one of these pins is a valid start/end point for a given flop, PT warns that there are multiple invalid start/end points and that it'll ignore them.
Ex: report_timing -from sync_reg -to tsd_latch => PT warns that of 5 pins in start_point of sync_reg (Dff10 has PREZ,CLK,D,Q,QZ pins), 4 are invalid start points. CLK of Dff10 is the only valid start point. PT also warns that of 4 pins in end_point of tsd_latch (LAT10 has SZ,RZ,Q,QZ pins), 2 are invalid end points. SZ/RZ of LAT10 are the only valid end points. For PT false paths, start point should always be CLK and endpoint should always be D.

IMP: Other tools such as DC and VDIO don't warn about invalid start/end points when cells are provided in the collection list. They just ignore the constraint if it doesn't conform to a valid start/end point. VDIO/ETS reports may show startpoints as Q or D, but when false pathing, we should always write them as from CLK, or they might get ignored.

 

Correlate timing shown in report_timing with that in lib files: Sometimes, we want to check if the delay reported by report_timing for a certain cell is what we'd expect from that cell's delay in the .lib file. To do this, we run report_timing for the path of interest, then run the following cmds:

  • Get name of lib cell for cell of interest: use get_lib_cells cmd.
    • pt_shell> get_lib_cells -of [get_cells chip/gen/U134] => {"TSM_SS/OAI21_LVT"} => This is the lib cell used.
  • Get name of .lib file where that lib cell is: use get_attr cmd.
    • pt_shell> get_attr [get_lib TSM_SS] source_file_name => /home/.../TSM_SS.db => This is the name of the file where that lib cell resides.
  • Report pin to pin delay for the cell
    • pt_shell> report_delay_calculation -from chip/mod1/A1 -to chip/mod1/Z => this shows details of how cell delay was calculated (at given i/p transition and o/p load)
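When comparing the delay from report_delay_calculation against the .lib, keep in mind that NLDM delay tables are indexed by input transition and output load, and the tool interpolates between table points. Below is a minimal bilinear interpolation sketch in Python. The table values are made up, and PT's exact interpolation/extrapolation rules (and other delay models such as CCS) are not modeled here.

```python
from bisect import bisect_right

def bilinear_lookup(index_1, index_2, values, x, y):
    """Interpolate values[i][j] (rows follow index_1, columns follow index_2) at (x, y).
    Assumes each axis has at least 2 points; outside the table, extrapolates linearly
    from the nearest edge cell."""
    def bracket(axis, v):
        # neighbouring indices i, i+1 with axis[i] <= v <= axis[i+1] when v is inside
        i = min(max(bisect_right(axis, v) - 1, 0), len(axis) - 2)
        return i, (v - axis[i]) / (axis[i + 1] - axis[i])

    i, tx = bracket(index_1, x)
    j, ty = bracket(index_2, y)
    v00, v01 = values[i][j], values[i][j + 1]
    v10, v11 = values[i + 1][j], values[i + 1][j + 1]
    return (v00 * (1 - tx) * (1 - ty) + v01 * (1 - tx) * ty
            + v10 * tx * (1 - ty) + v11 * tx * ty)

# Hypothetical cell_rise table: input transition (ns) x output load (pF) -> delay (ns)
index_1 = [0.01, 0.10, 0.50]
index_2 = [0.001, 0.010, 0.050]
cell_rise = [
    [0.020, 0.045, 0.150],
    [0.030, 0.055, 0.160],
    [0.070, 0.095, 0.200],
]
print(f"{bilinear_lookup(index_1, index_2, cell_rise, 0.055, 0.0055):.4f}")
```

Plug in the input transition and load that report_delay_calculation shows for the arc, and the result should be close to the reported cell delay if the tool is using that table.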

 

report_clock_timing => This cmd is specifically for showing detailed rpt on clks. It shows clock timing info summary, which lists max/min of skew, latency and transition time over given clk n/w.

  • report_clock_timing -type summary -clock [get_clocks *] => lists clock timing info summary, which lists max/min of skew, latency and transition time over given clk n/w.
  • report_clock_timing -type skew -setup -verbose -clock [get_clocks *] => This gives more detailed info about given clk attr (over here for skew). By default, the report displays the values of these attributes only at sink pins (that is, the clock pins of sequential devices) of the clock network. Use the -verbose option to display source-to-sink path traces.
  • report_constraint -min_period => This cmd also reports all min period violations.

 


 

PT reporting style:

PT reports timing for clk and data path in 2 separate sections. 1st section "data_arrival_time" refers to data path from start point, while 2nd section "data_required_time" refers to clk path of end point. 1st section shows path from clk to data_out of seq element and then thru the combinational path all the way to data_in of next seq element, while 2nd section shows primarily the clk path of final seq element, ending at clk pin. Towards the end of 2nd section, it shows the final "data check setup time" inferred from .lib file for that cell.

Reports are shown per stage. A stage consists of a cell together with its fanout net, so the transition time reported is at the i/p of the next cell, and the delay shown is the combined delay from the i/p of the cell to its o/p and through the net to the i/p of the next cell. & in the report indicates parasitic data.

Ex: report_timing -from reg0 -to reg1 => a typical path from one flop to another flop
Point Incr Path
------------------------------------------------------------------------------
clock clk_800k (rise edge) 1.00 1.00 => start point of 1st section
clock network delay (propagated) 3.41 4.41
.....
Imtr_b/itrip_latch_00/SZ (LAB10) 0.00 7.37 r
data arrival time 7.37
-----
clock clk_800k (rise edge) 101.00 101.00 => start point of 2nd section (usually starts at 1 clk cycle delay, 100 ns is the cycle time here)
clock network delay (propagated) 3.85 104.85
.....
data check setup time -0.04 105.76 => setup time implies wrt clk, data has to setup. So, we subtract setup time from .lib file to get data required time (as +ve setup time means data should come earlier)
data required time 105.76
------------------------------------------------------------------------------
data required time 105.76
data arrival time -7.37
------------------------------------------------------------------------------
slack (MET) 98.39
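The final arithmetic of the report above is just required minus arrival. A short Python check using the numbers from that report; the value 105.80 isn't printed in the report, it's inferred as 105.76 + 0.04 (the path column just before the setup check).

```python
data_arrival = 7.37             # "data arrival time"
required_before_setup = 105.80  # inferred: path column just before "data check setup time"
setup_time = 0.04               # .lib setup time, shown as an incremental -0.04

data_required = required_before_setup - setup_time
slack = data_required - data_arrival
status = "MET" if slack >= 0 else "VIOLATED"
print(f"data required {data_required:.2f}, slack ({status}) {slack:.2f}")
```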

 

Async paths in DC vs PT:

Async paths are paths ending in a clear or preset pin instead of a D pin. These paths are handled differently from regular data-clock paths, and DC (Design Compiler from Synopsys, used for synthesizing RTL) and PT treat them differently when reporting.

2 kinds of Async paths:

  • recovery/removal checks: PT performs these checks by default, but DC neither analyzes nor optimizes these paths.
    • DC: To analyze and opt these paths in DC, use this: set enable_recovery_removal_arcs true
    • PT: To disable these paths in PT, use this: set timing_disable_recovery_removal_checks true
  • Timing paths thru asynchronous pins (i.e paths flowing thru set/reset pins to Q/QZ o/p pin of the flop and then setting up to clk of next flop as D pin, these are clear/preset arcs in .lib file) : by default neither PT nor DC report these paths.
    • DC: To report these paths in DC, use the -enable_preset_clear_arcs option of report_timing. This option only lets us view these paths during reporting; DC never optimizes them.
    • PT: To report these paths in PT, use this: set timing_enable_preset_clear_arcs true (default is false)

NOTE: For recovery/removal paths, use Q of 1st flop as startpoint and CLRZ/PREZ of next flop as end point. For some reason, using CLK of 1st flop as startpoint doesn't work.

Latch based borrowing:

In latch based paths, borrowing occurs if data arrives in the transparency window. See PT doc (page 36-38). So startpoints may be from D, Q or CLK pins of Latch. CLK and Q startpoints are treated as starting from the clk of latch, while D is treated as starting from the D i/p of latch and going thru the latch to Q o/p pin of latch. Note, this behaviour is different when VDIO/ETS is used to report such paths. In VDIO/ETS path from startpoint D is still the same, but paths from CLK and Q startpoints are treated as worst case slack paths from D, CLK or Q.

To report timing for such paths and see the latch borrowing, use -trace_latch_borrow (otherwise only the startpoint delay at D is shown, not the actual path leading to D).
Ex: report_timing -from ... -to ... -trace_latch_borrow

Ex: of time borrowing path
Point Incr Path
---------------------------------------------------------------
clock spi_stb_clk (rise edge) 55.00 55.00 => start point
...
data arrival time 56.55 => this is data path delay from 1st flop.
---
clock clk_latch_reg (rise edge) 1.00 1.00 => end point
...
Iregfile/tm_bank0_reg_9/C (LAH1B) 3.66 r => this is total delay to clk of latch
time borrowed from endpoint 52.89 56.55 => since data comes much later than rising edge of clk, we borrow the difference (56.55-3.66=52.89) from this latch, so that next path from latch o/p will consider that D->Q delay is 52.89ns.
data required time 56.55
---------------------------------------------------------------
data required time 56.55
data arrival time -56.55
---------------------------------------------------------------
slack (MET) 0.00

Time Borrowing Information
---------------------------------------------------
clk_latch_reg nominal pulse width 100.00 => this is width of clk pulse = clk_period/2
clock latency difference -0.36 => this is the reduction of pulse width due to the difference in latency of the rising and falling edges
library setup time -0.26 => this is setup time of latch (wrt falling edge of clk) which needs to be subtracted, as max time available is reduced by this
---------------------------------------------------
max time borrow 99.37 => final max time available to borrow
actual time borrow 52.89 => actual time borrowed < max available. so timing met
---------------------------------------------------
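The time borrowing numbers above can be reproduced with a small Python sketch. It covers only the three terms shown in this report's "Time Borrowing Information" block; other constraints that can limit borrowing aren't modeled. Summing the displayed entries gives 99.38 rather than the reported 99.37, presumably because the displayed values are rounded.

```python
def max_time_borrow(pulse_width, latency_diff, lib_setup):
    # Nominal pulse width plus the two (negative) adjustments shown in the report
    return pulse_width + latency_diff + lib_setup

def time_borrowed(data_arrival, latch_open_edge):
    # Borrow only the part of the data arrival that falls after the latch opens
    return max(0.0, data_arrival - latch_open_edge)

max_borrow = max_time_borrow(100.00, -0.36, -0.26)
borrow = time_borrowed(56.55, 3.66)
print(f"max borrow {max_borrow:.2f}, actual borrow {borrow:.2f}, ok={borrow <= max_borrow}")
```

As in the report, when the needed borrow fits within the max borrow, the required time moves out to the data arrival time, so the slack into the latch is 0.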

 


 

get_timing_paths => A direct counterpart of report_timing is get_timing_paths. This cmd creates a collection of timing paths for custom reporting or other processing. All options are the same as for report_timing. The order in which paths are returned from get_timing_paths matches the order in which report_timing reports them.

Syntax: get_timing_paths <options> <path_collection> => Same as in report_timing cmd.

NOTE: the collection returned above will always show only one path unless we use the option "-max_paths". By default, max_paths is 1 (as in report_timing), so only 1 path will be reported, and size of collection will always be 1. Even with "-max_paths" set, we still have to provide option "-slack_lesser_than infinity" to see all paths, otherwise only paths with -ve slack will be put into the collection.

Ex below:

  • pt_shell> set mypaths [get_timing_paths -nworst 4 -max_paths 20] => gets a collection of up to 20 paths (only violating ones, per the NOTE above) and sets the "mypaths" var to that collection for later processing.
  • pt_shell> sizeof_collection $mypaths => returns 20 (if at least 20 violating paths exist)
  • pt_shell> report_timing $mypaths => 20 path timing reports displayed

Ex: report_timing [filter_collection [get_timing_paths -max_paths 10000 -slack_lesser_than -10 -slack_greater_than -25] "dominant_exception == min_max_delay"] -path_type summary -max_paths 10 => Here we passed a get_timing_paths collection to report_timing, to report timing for the selected paths only.
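The interaction of -max_paths, -nworst and the slack filter described above is easy to get wrong, so here is a toy Python model of the selection. It follows the rules as stated in these notes (max_paths per path group, nworst per endpoint, slack < 0 by default); it is not PT's implementation, groups are walked in name order only for determinism, and the path names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Path:
    group: str
    startpoint: str
    endpoint: str
    slack: float

def select_paths(paths, max_paths=1, nworst=1, slack_lesser_than=0.0):
    """Toy model of -max_paths / -nworst / -slack_lesser_than selection."""
    by_group = defaultdict(list)
    for p in sorted(paths, key=lambda p: p.slack):  # worst slack first
        if p.slack < slack_lesser_than:
            by_group[p.group].append(p)
    selected = []
    for group in sorted(by_group):
        per_endpoint = defaultdict(int)
        kept = []
        for p in by_group[group]:
            if per_endpoint[p.endpoint] < nworst and len(kept) < max_paths:
                per_endpoint[p.endpoint] += 1
                kept.append(p)
        selected.extend(kept)
    return selected

# Hypothetical paths: two fail into r9/D, one into r8/D, one passes into r7/D
paths = [
    Path("clk", "r1/CP", "r9/D", -0.3),
    Path("clk", "r2/CP", "r9/D", -0.2),
    Path("clk", "r3/CP", "r8/D", -0.1),
    Path("clk", "r4/CP", "r7/D", 0.2),
]
for kwargs in ({}, {"max_paths": 20}, {"max_paths": 20, "nworst": 4},
               {"max_paths": 20, "nworst": 4, "slack_lesser_than": float("inf")}):
    print(kwargs, len(select_paths(paths, **kwargs)))
```

With max_paths=20 but nworst=1, the second failing path into r9/D is still hidden, which is why both options usually need raising together.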

To get info about paths (i.e start/end points, slack, etc) we have to use get_attributes cmd. But for that we need to know what all attr are available for paths. We can use our familiar list_attributes cmd (explained in PT cmds section). define_user_attribute and set_user_attribute can be used to define our own attr and set it on paths of interest.

list_attributes -application -class timing_path => Most useful attr for paths are

  • startpoint,
  • startpoint_clock,
  • endpoint,
  • endpoint_clock,
  • slack (slack of path),
  • startpoint_clock_latency,
  • endpoint_clock_latency,
  • clock_uncertainty,
  • common_path_pessimism (CPP)
  • points => the most widely used attr of a timing path. It's a collection of points, where each point corresponds to a pin or port along the path. Iterate through these points and collect attributes on them.
  • arrival => arrival time of signal at a point in path starting from clk pin of launch point taken as time=0. There are 2 different ways, arrival is reported
    • arrival of a point in path => This takes CLK pin of launch flop as time=0 and doesn't account for clk n/w delay of launch flop.
    • arrival of the path => This accounts for clk n/w delay, i.e. wherever the clk originates from is taken as time=0. In report_timing, data delay is shown accounting for the clk waveform, i.e. if data is launched on the -ve edge of the clk, then 1/2 cycle of delay is added when reporting the data path. "arrival" here doesn't account for that, as it's purely meant to provide the delay of the path, with clk n/w delay accounted for.
    • IMP: As a result of discrepancy b/w the 2 arrival times, you have to add "startpoint_clock_latency" to arrival time of last point in data path, to get the "arrival time" of the full path.

 

Ex: report the arrival time at each cell in the specified path. The path has a launch flop, a nand2 gate and a capture flop. For simplicity, we assume just one path. If there are multiple such paths, the outer foreach loop iterates thru all of them.

foreach_in_collection path [get_timing_paths -from A/flop1 -to B/flop2 -max_paths 100 -slack_lesser_than infinity] {
    set path_points [get_attribute $path points]
    set startpoint  [get_attribute $path startpoint]
    set endpoint    [get_attribute $path endpoint]
    # Delay of the whole data path, accounting for the launch clk n/w delay
    set path_delay  [get_attribute $path arrival]
    set slack       [get_attribute $path slack]

    puts "startpoint -> [get_attribute $startpoint full_name]"
    foreach_in_collection point $path_points {
        set arrival [get_attribute $point arrival]
        puts " point -->  [get_object_name [get_attribute $point object]] arrival -> $arrival"
    }
    puts "endpoint -> [get_attribute $endpoint full_name]"
    puts "slack -> $slack"
    puts "path delay -> $path_delay"

    # Clock latency of the startpoint. It isn't included in the per-point "arrival" numbers
    # above (it is included in the "arrival" of the whole path). Max/min delay is already
    # picked based on whether this is a setup or hold path.
    set launch_clk_lat [get_attribute $path startpoint_clock_latency]

    # Clock latency of the endpoint, also max/min based on setup or hold path
    set capture_clk_lat [get_attribute $path endpoint_clock_latency]

    # Clk uncertainty is reported as a -ve value, as it eats into the available setup cycle time
    set clk_unc [get_attribute $path clock_uncertainty]

    # CPP value used in CPPR (common path pessimism removal). Reported as a +ve number, as it
    # gives back extra time to meet setup: it's the clk pessimism (max_dly-min_dly) on the
    # common part of the launch and capture clk paths.
    set clk_cpp [get_attribute $path common_path_pessimism]

    # Add all 3 values: clk_cpp is +ve and clk_uncertainty is -ve
    set final_capture_clk_delay_shown_in_pt [expr {$capture_clk_lat + $clk_cpp + $clk_unc}]

    # Net offset b/w launch and capture clks. Adding it to the data delay measured from the
    # launch flop CLK pin (the arrival of the last point, i.e. $arrival after the loop; not
    # $path_delay, which already includes launch latency) gives data arrival wrt the capture clk.
    # Cycle time minus lib setup time minus that value gives the setup slack from report_timing
    # (for a simple full-cycle flop-to-flop path).
    set clk_diff [expr {$launch_clk_lat - $final_capture_clk_delay_shown_in_pt}]
}

Output from above code => It iterates thru the given path and reports arrival time at each pin or port along the path. The delay starts with a value of 0 at the CLK pin of launch flop, and keeps on adding incremental delays thru each gate it encounters until it gets to the D pin of the capture flop.

The launch/capture clk latencies read at the end of the script are only stored in variables, not printed. Sample output:

startpoint -> A/flop1/CP

point -->  A/flop1/CP arrival -> 0.00 => NOTE: clk delay is taken as 0 at the startpoint (clk pin of the flop); no clk n/w delay is added here. So an arrival time of 0 is assigned to the CLK pin
point -->  A/flop1/Q arrival -> 0.22

point -->  A/I_nand2/A arrival -> 0.22

point -->  A/I_nand2/Z arrival -> 0.29
point -->  B/flop2/D arrival -> 0.29

endpoint -> B/flop2/D

slack -> -0.11

path delay -> 0.35 => This assumes a clk n/w delay of 0.06 (0.29+0.06=0.35), i.e. the clk network delay is added to the "arrival time" at the CP pin.
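The relationship between per-point arrival, path arrival and setup slack can be written out in Python. This is a sketch for a simple full-cycle flop-to-flop setup path, following the sign convention in the script above (clock uncertainty negative, CPP positive). The 0.29/0.06 values come from the sample output; the period, capture latency, CPP, uncertainty and setup values in the test are hypothetical.

```python
def path_arrival(last_point_arrival, startpoint_clock_latency):
    # "arrival" of the whole path = last point arrival (CLK pin of launch flop = 0)
    # plus the launch clk latency
    return last_point_arrival + startpoint_clock_latency

def setup_slack_direct(period, last_point_arrival, launch_lat, capture_lat, cpp, unc, setup):
    # required time minus arrival time, as report_timing lays it out
    required = period + capture_lat + cpp + unc - setup
    return required - path_arrival(last_point_arrival, launch_lat)

def setup_slack_via_clk_diff(period, last_point_arrival, launch_lat, capture_lat, cpp, unc, setup):
    # Same slack via the clk_diff variable computed in the script
    clk_diff = launch_lat - (capture_lat + cpp + unc)
    return period - setup - (last_point_arrival + clk_diff)

print(f"path delay -> {path_arrival(0.29, 0.06):.2f}")
```

Both slack functions are algebraically the same, which is why clk_diff must be added to the last point's arrival and not to the path arrival (that would count the launch latency twice).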

 ----------------

Instead of going thru the loop for all points in the path, we can also get attr directly for all points on the path.

ex:

  • get_attribute [get_timing_paths -from x1 -to x2] launch_clock_paths => gets the launch clk path, i.e. from the launch clk to the clk pin of the launching flop. Similarly, for capture clock paths we use capture_clock_paths.
  • set my_points [get_attribute [get_attribute [get_timing_paths -from x1 -to x2] launch_clock_paths] points] => Here we get all the points on the launch clk path shown above.
  • set my_obj [get_attribute [get_attribute [get_attribute [get_timing_paths -from x1 -to x2] launch_clock_paths] points] object] OR set my_obj [get_attribute $my_points object]=> Here we get "object" attr of all points on the path. We need to get object attr in order to get object names
  • get_object_name $my_obj => This shows names of all points, i.e CLK_PAD I_BUF1 I_AND2 ... I_FLOP/CLK (the clk launch path starts from clk pad pin and ends at clk pin of the flop)

 


 

Reporting logic depth of any path:

The above cmd can be used to find logic depth of any path in PT. Solvnet already has a proc here: https://solvnetplus.synopsys.com/s/article/Find-the-Logic-Depth-of-a-Timing-Path-in-PrimeTime-1576092703524

The proc is meant for PT, but a similar proc for the Synthesis tool is in the same link above. It goes thru the timing path, getting all the objs in the path. Then it gets all the input pins among them (there should be only 1 i/p pin per gate in the path, assuming 2 i/p pins are not shorted to the same signal). Then it gets the cells of all such pins and reports their count. You can exclude buf/inv by using -exclude_unary (then it counts only cells that don't have exactly 2 pins, on the assumption that inv/buf are the only combinational cells with 2 pins).

Put this proc in a file named get_logic_depth.tcl, source that file and run cmd as below

pt_shell> get_logic_depth [get_timing_paths -slack_lesser_than 0] => reports "15" as the logic depth for the top path with slack < 0. Only 1 path is reported for get_timing_paths, so only top paths logic depth reported.

proc get_logic_depth {my_path {exclude_unary ""} } {
  set my_cells  [get_cells -quiet -of \
                [get_pins -quiet \
                [get_attr -quiet \
                [get_attr $my_path points] object] \
                -f "pin_direction==in"] \
                -f "is_combinational==true && defined (lib_cell)"]
  if {$exclude_unary == "-exclude_unary"} {
    set my_cells [filter_collection $my_cells "number_of_pins!=2"]
  }
  return [sizeof_collection $my_cells]
}

define_proc_attributes get_logic_depth -info "Find Logic Depth of a Timing Path" \
  -define_args {
      {path "A single path collection" path list required} \
      {-exclude_unary "Exclude Buffers/Inverters along the path" \
      "\b" string optional}
}
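The counting logic of the proc can be modeled in Python to see exactly what gets counted. This is a sketch, not the Solvnet proc: the Point record and the sample path below are hypothetical stand-ins for what the proc reads from the "points" attr and the cell attributes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    pin: str               # full pin name on the path
    direction: str         # "in" or "out"
    cell: str              # instance the pin belongs to
    is_combinational: bool
    number_of_pins: int    # pin count of that cell

def logic_depth(points, exclude_unary=False):
    """Count distinct combinational cells entered through an input pin on the path."""
    cells = {p.cell: p for p in points if p.direction == "in" and p.is_combinational}
    if exclude_unary:
        # Same heuristic as the proc: a 2-pin cell is assumed to be a buffer/inverter
        cells = {c: p for c, p in cells.items() if p.number_of_pins != 2}
    return len(cells)

# Hypothetical path: flop1 -> buf -> nand -> inv -> flop2
path = [
    Point("I_flop1/CP", "in", "I_flop1", False, 3),
    Point("I_flop1/Q", "out", "I_flop1", False, 3),
    Point("I_buf/I", "in", "I_buf", True, 2),
    Point("I_buf/Z", "out", "I_buf", True, 2),
    Point("I_nand/A1", "in", "I_nand", True, 3),
    Point("I_nand/ZN", "out", "I_nand", True, 3),
    Point("I_inv/I", "in", "I_inv", True, 2),
    Point("I_inv/ZN", "out", "I_inv", True, 2),
    Point("I_flop2/D", "in", "I_flop2", False, 3),
]
print(logic_depth(path), logic_depth(path, exclude_unary=True))
```

The launch and capture flops drop out because they aren't combinational, and with exclude_unary only the NAND remains.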

 


 

check_timing cmd:

check_timing => reports possible timing problems in the design. This cmd is run before generating timing reports, to identify potential timing constraint issues. It prints all the checks it runs and the violations found. This is a very imp cmd that should be run, and all violations fixed (or at least understood, since violations here are serious).

options are:

  • -verbose => prints detailed violations, instead of just summary.
  • -override_defaults <check_list> => The default checks are those listed in the "timing_check_defaults" var. Using this override option, we run and report only the checks in the check_list provided.
    • ex: check_timing -override_defaults no_clock -verbose => -override_defaults allows us to skip the standard set of timing checks and report only the ones listed. Here we listed only the "no_clock" check, so only this check is run. This is useful to isolate various checks in their own separate report files.
  • -include/-exclude <check_list> => This adds/removes checks to/from the default checks in the "timing_check_defaults" var.

Checks: There are many checks performed, but these are the important ones to look at:

1. no_clock: Warns if no clock reaches a register clock pin. There should not be any except for spare flops whose clock pins are tied to high/low. These should be looked at carefully, as there will be no setup/hold checks performed on the data pin of that flop.

2. unconstrained_endpoints: Warns about unconstrained timing endpoints. This warning identifies timing endpoints (output ports and register data pins) that are not constrained for maximum delay (setup) checks. If the endpoint is a register data pin, it can be constrained by using create_clock for the appropriate clock source. You can constrain output ports using the set_output_delay or set_max_delay commands.

  • Endpoint is unconstrained because of no_capture_clock (create_clock needed on the capture flop clk pin to fix this), dangling_end_point (set_output_delay needed on the o/p pin to fix this), or fanin_of_disabled (paths ending at the fanin of a disabled timing arc; to fix this, undo the disabled timing arc)
  • Startpoint is unconstrained because of no_launch_clock, dangling_start_point or fanout_of_disabled (paths starting from fanout of disabled timing arc)
    #In both PT and ETS reports, whenever there is "no clock found" (no_clock warning in item 1 above) associated with any capture flop, all i/p (D, CLRZ, PREZ) are reported as unconstrained.

3. unexpandable_clocks: Warns if there are sets of clocks for which periods are not expandable with respect to each other. The checking is only done for the related clock domains, such as ones where there is at least one path from one clock domain to the other. This could be because of an incorrectly defined clock period for one or more of the clocks. Another possibility is when asynchronous clocks with unexpandable periods are interacting where they should have been defined in different clock domains.

  • Generally, for the multiple clocks defined, PrimeTime must ensure that the common base period for a set of related clocks is divisible by every clock period in the set. The 'unexpandable_clocks' check warns if there are pairs of clocks whose periods are not expandable with respect to each other. When the two clock periods differ, PrimeTime calculates a common base period (LCM) between the two clocks and expands both clocks to this common base period until it finds the common edge between them. If the clocks cannot be expanded to a common base period, the "PTE-053" warning is issued.

Clocks within a set are called "related" when there are cross-clock domain timing paths between them. Such paths require all the clocks in the set to expand to a common base period; the warning means that, due to differences in their clock periods, they cannot be expanded.
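The LCM-based expansion described above can be sketched in Python. This is a minimal model, not PT's implementation: periods are given as decimal strings so they convert exactly to fractions, and `max_expansion` is a hypothetical cap (not a PT variable) standing in for whatever limit makes PT give up and issue PTE-053.

```python
from fractions import Fraction
from math import gcd


def lcm_int(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def common_base_period(periods, max_expansion=None):
    """Return the common base period (LCM) of a set of clock periods.

    periods: decimal strings such as "2.5" (converted exactly to Fractions).
    max_expansion: hypothetical cap on how many times any one clock may be
    repeated within the base period; None means no cap. Returns None when
    the cap is exceeded, i.e. the clocks are treated as unexpandable.
    """
    fracs = [Fraction(p) for p in periods]
    # LCM of reduced fractions a/b and c/d is lcm(a, c) / gcd(b, d)
    num, den = fracs[0].numerator, fracs[0].denominator
    for f in fracs[1:]:
        num = lcm_int(num, f.numerator)
        den = gcd(den, f.denominator)
    base = Fraction(num, den)
    if max_expansion is not None:
        for f in fracs:
            if base / f > max_expansion:
                return None
    return base


print(common_base_period(["10", "15"]))     # related clocks, small base period
print(common_base_period(["2.5", "4"]))
print(common_base_period(["10", "10.01"]))  # tiny mismatch, huge base period
```

A 10 vs 10.01 mismatch pushes the base period to 10010, i.e. 1001 repetitions of the 10 clock, which is the kind of case where expansion becomes impractical and an incorrectly defined period is the likely culprit.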

4. no_input_delay, partial_input_delay, no_driving_cell, dangling startpoints: All are warnings for i/p ports, flagged if no i/p delay or no driving cell is specified. Partial i/p delay (e.g. set_input_delay -min or -max specified, but not both) is also flagged.

5. clock_crossing, ideal_clocks, generated_clocks: All are clock related warnings.

  • clock_crossing checks for clock interactions b/w multiple clock domains. It reports each interaction regardless of any FP or async/exclusive clk relation specified in constraints, but marks it: if all paths between the two clocks are false paths, or the two clocks are exclusive/asynchronous, the crossing is marked with *. If only some of the paths are false paths (or exclusive/asynchronous), it is marked with #.
    • Report format: there are as many lines as clocks with crossings (each line lists one clk followed by its interacting clks), so the report can be parsed easily via tcl cmds.
      • 1st line: <CLK1> <list of clks that have paths from CLK1 to them, separated by commas>, e.g. clk1 clk2*, clk4#, clk9, clk13
      • 2nd line: <CLK2> <list of clks that have paths from CLK2 to them, separated by commas>, and so on for all the clocks in the design
  • ideal and generated clocks are shown (there shouldn't be any ideal clocks, as all clocks should be propagated in PT runs)
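Since the clock_crossing lines follow a fixed format, they are easy to post-process. A minimal Python sketch, assuming the crossing lines have already been extracted from the report exactly in the format above (any header or legend lines the real report prints are not handled here; the clock names are the hypothetical ones from the example):

```python
def parse_clock_crossing(report_text):
    """Parse clock_crossing lines into {from_clk: [(to_clk, status), ...]}.

    status is "all_excepted" for a trailing '*', "partly_excepted" for a
    trailing '#', and "timed" otherwise. Lines without at least a source
    clock and one destination are skipped.
    """
    crossings = {}
    for line in report_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) < 2:
            continue
        src, rest = parts
        dests = []
        for item in rest.split(","):
            item = item.strip()
            if not item:
                continue
            if item.endswith("*"):
                dests.append((item[:-1], "all_excepted"))
            elif item.endswith("#"):
                dests.append((item[:-1], "partly_excepted"))
            else:
                dests.append((item, "timed"))
        crossings[src] = dests
    return crossings


report = "clk1 clk2*, clk4#, clk9, clk13\nclk2 clk1*\n"
for src, dests in parse_clock_crossing(report).items():
    # Crossings with no exception at all are the ones worth reviewing first
    print(src, [d for d, status in dests if status == "timed"])
```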

6. loops: warns of combo feedback loops, as these loops are not analyzed. There shouldn't be any combo feedback loops, so be careful with these warnings. PT automatically breaks these combo loops by disabling one or more timing arcs, if not already broken by user using "set_disable_timing" cmds.

  • IMP: Synthesis tools such as Genus break the timing loop by inserting a buffer "cdn_loop_breaker" and disabling the timing arc from its input to its output, thus breaking the loop. These paths are not timed. PT, on the other hand, treats these buffers just like regular buffers. PT may break the timing loop somewhere else, or may even time thru the loop. So, there might be timing differences b/w synthesis and PT if these combo loop paths are present. Always look for timing loops before you investigate any timing path.
  • Genus: In a synthesis tool such as Genus, run "report_timing -lint" to report the loops in terms of "cdn_loop_breaker" cells. Then compare them to those in PT.

7. Voltages: All signal levels, supply voltages, pg pin connections are reported.

  • signal_level: Checks that the driver signal level matches the load signal level. The signal levels are determined from the UPF or from lib attr.
  • supply_net_voltage: Checks that each segment of UPF supply nets has voltage assigned to it by set_voltage command.
  • unconnected_pg_pins - Checks that each power and ground pin is connected to a UPF supply net. The connection can be implicit (for example, power domains using UPF) or explicit (for example, connect supply_net).

8. Misc: Many more checks. Look in PT manual.

Timing Exceptions: These are cmds that adjust the required time of path or disable/enable paths for timing. So, these paths are not analyzed for timing or are analyzed with new modified timing if the cmd specifies so. These cmds are used in synthesis flow also (they are part of SDC cmds) to force synthesis tool to optimize for new timing specified via these cmds. These cmds can be of 2 types. 

A. point to point timing exceptions cmds: (from node A to node B in timing path). 5 of them. All have options as below:

  • -setup/-hold => cmd applies to setup or hold timing path only. By default, these cmds apply to both setup and hold timing for the specified paths.
  • -from/-rise_from/-fall_from => specifies a list of timing path startpoint objects (clock, PI, seq cell, clk pin of seq cell, data pin of level sensitive latch, or pin that has i/p delay specified). The rise/fall variants specify that the path must rise or fall from the object specified (if a clk is specified, then rise/fall refers only to the paths launched by the rising/falling edge of the clock at the clock source, not at the flop; if a flop is specified, then rise/fall refers to the startpoint/endpoint of the flop). If a clk is specified as startpoint, all registers and primary inputs related to that clock are used as path startpoints. If a cell is specified, one path startpoint on that cell is affected. Only one of the 3 options can be used.
  • -to/-rise_to/-fall_to => specifies list of timing path endpoint objects (clock, PO, Seq_cell, data pin of seq cell, or pin that has o/p delay specified). _rise/_fall specify that path must rise or fall to object specified (if clk is specified, then rise/fall meaning same as shown in above option). If clk is specified as endpoint, all registers and primary outputs related to that clock are used as path endpoints. If a cell is specified, one path endpoint on that cell is affected. Only one of 3 options can be used.
  • -through/-rise_through/-fall_through => Specifies a list of pins, ports, cells, and nets through which the paths must pass. rise/fall applies to path with rise or fall at specified objects.
  • -rise/-fall => applies the timing exception only to paths having a rising or falling signal at the timing endpoint. This looks similar to -rise_to/-fall_to (except that you do not specify the endpoint explicitly). NOT to be used, as it's confusing (it may be kept for legacy purposes).
  • -comment <string> => specifies comment for our understanding. This -comment section may be used by your custom scripts to apply custom things to these cmds (i.e some keywords in -comment section may be used to indicate that this exception is only to be applied when running at particular PVT, your internal script can then handle such cases efficiently)

 

NOTE:

  • All startpoint or endpoint in -from/-to have to be pins (of leaf cells) or ports of design (with the exception that clocks and sequential cells may be specified). They can't be nets or pins of sub modules. -through is where we can specify nets.
  • We can specify pins or ports directly as start/end points (i.e. -from top_a/b_reg/Din), although it's good practice to wrap them in get_pins, get_cells, get_nets etc (i.e. -from [get_pins top_a/b_reg/Din]).
  • Patterns may be used in startpoint/endpoint object names to specify multiple paths in the same exception cmd. However, PT issues a warning saying that wildcards are being used in these cmds. Using patterns in start/end points is strongly discouraged, as the timing tool won't tell us all the places where the timing exception got applied. It will apply to all eligible points based on pattern matching, and it may get applied wrongly to points you never intended. You will never know about it until the path fails in silicon, and you report timing on that path only to find out that it has a timing exception on it. If you do have to use a wildcard pattern, expand it first and loop over each matched element; within the loop, apply the timing exception cmd to each expanded element explicitly.
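One way to follow that advice outside the tool is to expand the pattern against a dumped list of pin names and emit one explicit exception per match, so every target can be reviewed (inside PT the same idea would be a foreach_in_collection loop over the expanded collection). The sketch below assumes that `*`/`?` do not cross the `/` hierarchy separator, as with non-hierarchical get_pins matching; the pin names and helper names are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a name pattern into a compiled regex.

    Assumption: '*' matches any run of characters and '?' any single
    character, but neither crosses the '/' hierarchy separator.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def explicit_false_paths(from_pattern, to_pin, pin_names):
    """Expand from_pattern against pin_names and return one explicit
    set_false_path line per matched pin, sorted for easy review/diffing."""
    rx = pattern_to_regex(from_pattern)
    matched = sorted(p for p in pin_names if rx.fullmatch(p))
    return [
        f"set_false_path -from [get_pins {{{p}}}] -to [get_pins {{{to_pin}}}]"
        for p in matched
    ]


# Hypothetical pin names, e.g. dumped from the netlist
pins = ["Ireg/trim_reg_0/CLK", "Ireg/trim_reg_1/CLK",
        "Ireg/sub/trim_reg_2/CLK", "Ispi/U49/A"]
for line in explicit_false_paths("Ireg/*/CLK", "Ispi/U49/A", pins):
    print(line)
```

Note that Ireg/sub/trim_reg_2/CLK is not matched, which is exactly the kind of silent difference that is easy to miss when the wildcard is left inside the exception cmd.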

 



1. set_false_path: sets a false path on the timing path specified. This timing path is not checked for timing at all. Both pins and cells may be used in the from/to lists. This is the most used cmd for specifying exceptions. Read the detailed syntax in the PT manual.
set_false_path -from [list [get_cells Ireg/trim_reg_0] ... [get_pins Ireg/trim_reg_2/CLK]] -to [list [get_cells Ispi/sreg_reg_1] ... [get_pins Ispi/U49/A]]

see "reset_path" cmd for resetting this exception.

 




2. set_multicycle_path: sets a multicycle path on the timing path specified. This timing path is still checked for timing, but the default behaviour is altered. By default, PT/DC exhibit single-cycle behaviour for any path. This means that when data is launched at a particular clock edge, the tool performs a setup check at the very next clock edge. There are 2 hold checks performed for each setup check, as discussed in the "setup/hold" section. For the simplest designs based on single freq clocks, where the flops are fired every cycle, the setup check is performed on the next edge (one clock cycle apart), while the hold check is performed on the same edge (zero clock cycles apart). MCP changes this default behaviour. We can specify any number of cycles for setup and any number of cycles for hold, and the tool will honor these new cycle requirements (the old requirements are removed).

To change default behaviour, we can do it in 2 ways via MCP:

  • move the capture edge (or end-point clock) to the right (later in time, i.e. moving the clk edge forward and allowing more time), or to the left (earlier in time, i.e. moving the clk edge backward and allowing less time)
  • move the launch edge (or start-point clock) to the left (earlier in time, i.e. moving the clk edge backward and allowing more time), or to the right (later in time, i.e. moving the clk edge forward and allowing less time)

By default, when we move edges by applying MCP, the capture edge is moved to the right (forward) for setup and to the left (backward) for hold. The launch edge is left untouched. We can change this behaviour by using the -start/-end options, discussed later below. First we talk about the "multiplier", which we specify to get multicycle behaviour.

Multiplier: Multiplier is a somewhat confusing term. It specifies the number of cycles the data path must have for setup or hold, relative to the startpoint or endpoint clock, before data is required at the endpoint. Assume a single freq clk at both launch and capture, so the startpoint and endpoint clocks have the same freq. Here "setup of n cycles of MCP" implies "n cycles" b/w launch and capture. But for hold, "hold of n cycles of MCP" doesn't imply "n cycles" b/w launch and capture. By default, the hold check is always automatically placed 1 cycle before setup, so "setup of n cycles" changes hold to "n-1" cycles. This is the default hold behaviour, and we specify it by saying "hold of 0 cycles of MCP", which means hold has to meet a timing requirement of "n-1" cycles. This is a very difficult hold requirement to meet, as data has to be held constant for "n-1" cycles, i.e. data can change only between (n-1) cycles and n cycles, so the path will need to be buffered up (min delay > (n-1) cycles, max delay < n cycles). If we want to relax hold timing, then we need to move the hold capture edge inwards (to the left), closer to the launch edge. If we move the hold requirement to "n-2" cycles, then we specify it as "hold of 1 cycle of MCP". Timing tools default to "hold of n-1 cycles" because it's the most pessimistic hold requirement to meet, so the risk is lowest.

setup/hold: These 2 extra options, -setup and -hold, specify whether the MCP is for setup paths or hold paths. If neither is specified, the multiplier is for setup (hold is assigned a multiplier of 0). If only a setup multiplier is specified, hold is automatically moved to 1 cycle before setup (for both hold checks). If we want to change this hold behaviour, then we should specify a multiplier for hold also. Specifying a multiplier for setup only (no hold multiplier) changes both setup and hold behaviour. But specifying a multiplier for hold changes hold behaviour only, and doesn't affect setup behaviour. Using -hold with MCP is more confusing, and the set_min_delay cmd (see next section) may be preferred to set the hold requirement.

  • set_multicycle_path 4 -from ... -to ... => Here no -setup or -hold is specified, so this multiplier is for setup. The setup check is now 4 cycles instead of 1 cycle. The hold multiplier is kept at its default value of 0, so the hold check is at 3 cycles instead of 0 cycles (i.e. same as also adding this MCP: set_multicycle_path -hold 0 -from ... -to ...)
  • set_multicycle_path -setup 4 ... => Here the -setup option is used, so the setup check is the same as above (setup is assumed by default). There is no difference b/w this and the above MCP (assuming we don't specify any other MCP with -hold for this path)
  • set_multicycle_path -setup 4 ...; set_multicycle_path -hold 0 ...; => This is again the same behaviour as above, as we kept the default hold value of 0. So, setup check = 4 cycles, hold check = 3 cycles. This is the most stringent hold requirement, as data can't change before 3 cycles and can't change after 4 cycles, so effectively only 1 cycle (the 4th cycle) is available for data to change. A lot of buffers will be added on the data path, so that there is enough delay to meet these requirements.
  • set_multicycle_path -setup 4 ...; set_multicycle_path -hold 1 ...; => Here the setup behaviour is the same as above, but the hold check is moved in by 1 cycle compared to its default of 0. So, setup check = 4 cycles, hold check = 2 cycles.
  • set_multicycle_path -setup 4 ...; set_multicycle_path -hold 2 ...; => Here the setup behaviour is the same as above, but the hold check is moved in by 2 cycles compared to its default of 0. So, setup check = 4 cycles, hold check = 1 cycle.
  • set_multicycle_path -setup 4 ...; set_multicycle_path -hold 3 ...; => Here the setup behaviour is the same as above, but the hold check is moved in by 3 cycles compared to its default of 0. So, setup check = 4 cycles, hold check = 0 cycles. The hold check is now on the same clock edge, as in simple designs with flops firing on every clk cycle. This is the most relaxed hold requirement, as data can't change before 0 cycles and can't change after 4 cycles, so effectively 4 cycles are available for data to change. This is most likely what we want in the design, as an MCP implies the existence of clk gating or something similar that prevents the other clk from firing during that time, so there's no risk of capturing newer data on subsequent capture clk edges before the correct capture clk edge arrives.
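The single-clock cycle arithmetic in the bullets above fits in a few lines. A Python sketch, assuming one clock frequency (so both hold checks reduce to the same relationship) and the default references; edges are counted in clock cycles after the launch edge, and the function name is hypothetical:

```python
def mcp_check_edges(setup_mult=1, hold_mult=0):
    """Return (setup_edge, hold_edge) in clock cycles after the launch edge.

    Single-frequency model of set_multicycle_path with default references:
      setup_edge = setup multiplier
      hold_edge  = setup_edge - 1 - hold multiplier
    """
    setup_edge = setup_mult
    hold_edge = setup_edge - 1 - hold_mult
    return setup_edge, hold_edge


for hold in range(4):
    s, h = mcp_check_edges(4, hold)
    # Data may only change between the hold edge and the setup edge
    print(f"-setup 4 -hold {hold}: setup at cycle {s}, hold at cycle {h}, "
          f"window {s - h} cycle(s)")
```

The loop reproduces the four -setup 4 cases listed above: the window in which data may change grows from 1 cycle (-hold 0) to 4 cycles (-hold 3), and the no-MCP default mcp_check_edges() gives setup at 1, hold at 0.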

In report_timing cmd, when we report_timing for MCP using "report_timing -from <> -to <> -exceptions all", we'll see it report setup cycles and hold cycles as whatever was specified in the MCP cmd. So, if MCP has "-setup 5 -hold 2", then report_timing will report "5 setup cycles and 2 hold cycles".

start/end: Other extra 2 options for MCP are -start/-end.

If you note above, we specified the multiplier but never said which clock that multiplier applies to: is it the start clk or the end clk? For single clk designs, the start clk and end clk have the same freq, so it doesn't matter. But if the launch clk and capture clk are at different freqs (i.e. paths crossing clk domains in a multi freq design, or divided versions of sync clks) and the paths are valid (usually paths crossing clk domains are marked as invalid, but paths b/w divided clks are still valid), then we also need to specify which clk's period the multiplier refers to. In regular timing runs, the capture clk is relevant for setup and the launch clk for hold. PT follows the same principle for MCP in its default assignment of the multiplier => it uses the multiplier wrt the end clock (capture clk) for setup, and wrt the start clock (launch clk) for hold. To change this default behaviour, we can specify -start for setup, and -end for hold. By specifying this option, we control 2 things:

  • Launch/Capture edge: By using -start for setup, we move the start (launch) edge forward/backward. By using -end for hold, we move the end (capture) edge forward/backward. I don't know if this is true, as timing runs on PT_SHELL show capture clk as always moving left/right. I need to run more experiments to find that.
  • Launch/Capture freq: By using -start, we specify that multiplier should use freq of start (launch) clk. By using -end, we specify that multiplier should use freq of end (capture) clk.

4 cases possible:

  • A setup multiplier of 2 with -start moves the relation backward one cycle of the start clock. i.e start clk moves 1 cycle to left.
  • A hold multiplier of 1 with -start moves the relation forward one cycle of the start clock. i.e start clk moves 1 cycle to right. Here we are reducing hold check by 1 cycle, i.e making it less stringent.
  • A setup multiplier of 2 with -end moves the relation forward one cycle of the end clock. i.e end clk moves 1 cycle to right. This is what happens by default for setup checks when MCP is applied => capture edge gets moved.
  • A hold multiplier of 1 with -end moves the relation backward one cycle of the end clock.
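For two related clocks at different frequencies, the effect of -start vs -end on a setup multiplier can be modeled as below. A simplified sketch, not PT's algorithm: it assumes rising-edge clocks with integer periods (e.g. in ps) that both have an edge at time 0, and ignores latency and uncertainty. It finds the tightest default setup relationship over the common base period, then shifts it by (mult-1) periods of the chosen reference clock; the function names are hypothetical.

```python
from math import gcd


def default_setup_relation(t_launch, t_capture):
    """Tightest single-cycle setup relationship between two rising-edge
    clocks that both have an edge at time 0 (integer periods).

    For every launch edge within the common base period, the capture edge
    is the first capture edge strictly after it; the smallest
    launch-to-capture distance is the default setup relation.
    """
    base = t_launch * t_capture // gcd(t_launch, t_capture)
    best = None
    for launch in range(0, base, t_launch):
        capture = (launch // t_capture + 1) * t_capture
        rel = capture - launch
        if best is None or rel < best:
            best = rel
    return best


def setup_relation_with_mcp(t_launch, t_capture, mult, ref="end"):
    """Model set_multicycle_path -setup <mult> with -end (default) or -start:
    -end pushes the capture edge later by (mult-1) capture periods,
    -start pulls the launch edge earlier by (mult-1) launch periods."""
    rel = default_setup_relation(t_launch, t_capture)
    if ref == "end":
        return rel + (mult - 1) * t_capture
    if ref == "start":
        return rel + (mult - 1) * t_launch
    raise ValueError("ref must be 'start' or 'end'")


# Fast (10) launch clock to slow (20) capture clock
print(default_setup_relation(10, 20))
print(setup_relation_with_mcp(10, 20, 2, "start"))
print(setup_relation_with_mcp(10, 20, 2, "end"))
```

For a fast 10 launch clock and a slow 20 capture clock the default relation is 10; -setup 2 -start gives 20 (one full slow cycle), while -setup 2 -end gives 30, which shows how much the choice of reference clock matters.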

Very good explanation on SolvNet: https://solvnetplus.synopsys.com/s/article/Setup-and-Hold-Checks-Between-Fast-and-Slow-Clock-Domains-1576007257628


Ex:
set_multicycle_path -setup  2 -fall_from [get_clock auto_sden_scanclk] -rise_to [get_clock SCLK] => implies setup check is now 2 cycles instead of 1 cycle. Here capture edge is moved to right by 1 cycle. In order to move launch edge to the left, we have to use -start (use -end to move capture edge to right, which is default anyway). By moving setup check, hold check also got moved by 1 cycle (i.e instead of checking hold time in same cycle, tool checks it against next cycle). If we want to change this behaviour and keep the original behaviour of hold, we need to move hold by 1 cycle (MCP -hold 1 ...).
set_multicycle_path -hold 1 -to u2/D -end => implies hold is moved in by 1 cycle to the left, so it's back to how it was before we set the multicycle for setup (if we had -setup 5, then we would need -hold 4 to get the default behaviour back for hold). Here the hold capture edge is moved to the left (use -start to move the launch edge to the right, which is the default anyway).

set_multicycle_path 18 -hold -from clock_in -to ModeSelects -start => -start/-end specifies whether the multicycle information is relative to the period of the start clock or the end clock. These options are only needed for multifrequency designs; otherwise start and end are equivalent. The start clock is the clock source related to the register or primary input at the path startpoint. The end clock is the clock source related to the register or primary output at the path endpoint. The default is to move the setup check relative to the end clock, and the hold check relative to the start clock. A setup multiplier of 2 with -end moves the relation forward one cycle of the end clock. A setup multiplier of 2 with -start moves the relation backward one cycle of the start clock.

NOTE: FP takes precedence over MCP (since for timing exceptions, FP implies the path is not to be considered for timing). Also, a specific SMD cmd (bullet 3 below) overrides a general MCP cmd.

see "reset_path" cmd for resetting this exception.

 



3. set_max_delay/set_min_delay (SMD) => Specifies the desired maximum (setup)/minimum (hold) delay for paths in the current design. It is a timing exception command, i.e. it overrides the default single-cycle timing relationship (derived from clk waveforms and i/o delays) for one or more timing paths. set_max_delay has the side effect of breaking the timing graph at the 2 endpoints of the constraint (similar to the set_disable_timing constraint). So, be very careful when using this cmd, as the only constraint that remains on the specified path is the delay check between the endpoints specified. All other checks that were there on this path are removed, i.e. there's no setup/hold arc anymore. SMD works only on valid paths; if the path is invalid, SMD is simply ignored. This SolvNet article on the Synopsys website states that SMD cmds don't work for false paths (or paths b/w clocks defined as async clks). Link: https://solvnetplus.synopsys.com/s/article/Constraining-Paths-Between-Asynchronous-Clock-Domains-1576092601595

Sometimes, an SMD cmd is more useful than a multicycle path cmd. We don't usually use SMD cmds in the synthesis flow, as they are mostly used for controlling delays. Sometimes designers do use them in synthesis to balance skew between multiple signals (e.g. using a min delay of 1ns and a max delay of 1.2ns for all bits of a bus to keep all bits within 0.2ns of each other). SMD cmds are most useful when running a hier design, where lower level blocks are placed/routed separately and then integrated at chip level. If there is a feed thru path thru a lower level partition, we would want to constrain that feed thru path within a certain range at block level, so that the path can meet timing at full chip level. Block level synthesis and PnR will then try to meet that delay requirement. Later, when we run full chip STA with that block level partition as a black box, we'll be able to meet timing at top level w/o touching the block level placement or routing. Another place where SMD is used is in constraining the delay from a pad of the chip to some internal capture node, when we don't want the different bits of a bus to skew too much.
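The bus-skew use case above reduces to a window check: if every bit meets both the min and the max delay, the spread between bits cannot exceed max - min. A small Python sketch with hypothetical per-bit delays in ps, using the 1ns/1.2ns window from the text; the function name is hypothetical.

```python
def check_bus_window(bit_delays, min_delay, max_delay):
    """Check each bus bit against a [min_delay, max_delay] window.

    Returns (violations, skew): violations maps bit name -> delay for bits
    outside the window; skew is the spread of the actual delays. When no
    bit violates, skew is guaranteed to be <= max_delay - min_delay.
    """
    violations = {bit: d for bit, d in bit_delays.items()
                  if d < min_delay or d > max_delay}
    delays = bit_delays.values()
    skew = max(delays) - min(delays)
    return violations, skew


# Hypothetical per-bit delays in ps against a 1000..1200 ps window
delays = {"d[0]": 1050, "d[1]": 1180, "d[2]": 1250}
print(check_bus_window(delays, 1000, 1200))
```

Here d[2] misses the max delay, and the resulting 200 ps skew reaches the 0.2ns bound only because of that violation; once d[2] is fixed the bound is guaranteed.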

NOTE: SMD should never be used in the regular STA flow in functional mode. Since SMD only works on synchronous paths (i.e. valid paths), it removes the setup/hold requirement of the original valid path and replaces it with the SMD requirement. The tool then no longer checks the setup/hold requirement of the original synchronous path, which can cause silicon failure. If we do have to use SMD, we should create a separate mode just for SMD purposes, put all the SMDs that we want honored there, and let the tool optimize the path for SMD. Then when we merge modes (Functional mode and this other mode) during PnR, the tool will try to meet both requirements (i.e. the setup/hold requirement of Func mode and the SMD requirement from this other mode). The set_data_check cmd (explained below) should always be used instead of SMD, as set_data_check can do everything that the SMD cmd does without breaking the original timing graph. set_data_check is very well suited for bounding the skew of different bits of a bus within a certain limit. set_data_check also works on async paths, so there's no reason to use SMD at all.

SMD cmds should be used in pair, i.e both set_max_delay and set_min_delay should be used on a given path, so that we meet pseudo setup as well as pseudo hold requirement for this segment of path.

NOTE: SMD cmds are different from set_input_delay and set_output_delay specified in SDC and used in the synthesis flow. Look in the SDC section for details on those.

options:

  • -from/-to/-through (and all other variations such as -rise_from, -fall, etc) => The Synopsys manual states that -from/-to need to be valid start/end points (i.e. flop CP or D pin, clocks, etc). However, if we specify invalid start/end points, it causes timing path segmentation, and PT issues a UITE-217 warning. One of -from or -to is needed. -through is used to force the cmd thru certain cells, etc.
  • -ignore_clock_latency => When the path startpoint or endpoint is on a seq cell, the max_delay cmd includes clk skew in the computed delay. This option allows us to ignore the launch/capture clk latencies and treat the paths as clockless. When reporting paths (report_timing) by clocks that have been ignored by this option, the result will be unconstrained paths.
  • <delay> => This is a number provided in the same units as those specified in the library. Input/output external delays are included in the path delay if the path start/end points are ports. Also, unless clk latency is ignored with the option above, launch/capture clk latencies are included if start/end points are on a seq device. Also, c2q of the launch flop and setup time of the capture flop are included in the delay for start/end points on seq cells.
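The delay accounting in the <delay> bullet can be written out as a slack formula. A sketch for the flop-to-flop case only (external input/output delays for port endpoints are omitted); the function and parameter names are hypothetical, and the model simply places the launch edge at 0 and the capture edge at the max delay value.

```python
def max_delay_slack(max_delay, data_path, clk_to_q, setup,
                    launch_latency=0, capture_latency=0,
                    ignore_clock_latency=False):
    """Slack of a set_max_delay check between two flops (sketch).

    Launch edge at 0, capture edge at max_delay. Clock latencies shift the
    edges unless ignore_clock_latency is set. clk-to-q adds to the arrival;
    the capture flop's setup time is subtracted from the required time.
    """
    if ignore_clock_latency:
        launch_latency = capture_latency = 0
    arrival = launch_latency + clk_to_q + data_path
    required = max_delay + capture_latency - setup
    return required - arrival


# Hypothetical values in ps for set_max_delay 1500 -from ff1/CP -to ff2/D
print(max_delay_slack(1500, data_path=900, clk_to_q=100, setup=50,
                      launch_latency=300, capture_latency=200))
print(max_delay_slack(1500, data_path=900, clk_to_q=100, setup=50,
                      launch_latency=300, capture_latency=200,
                      ignore_clock_latency=True))
```

With these numbers the launch clock arrives 100 ps later than the capture clock, so including latencies costs 100 ps of slack (350 vs 450); -ignore_clock_latency removes exactly that skew term.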

report_timing on SMD paths: When doing "report_timing" on these paths, we have to use -from/-to to be the points that we used as -from/-to in the SMD cmd. This is because the original path is gone, and the only path that remains now is the path from the starting point in "-from" of SMD to the endpoint in "-to" of SMD. When seq points are included in SMD (or valid start/end points), then report_timing should have -from as clk pin of launch flop and -to should be data pin of capture flop (or valid start/end points).

ex: set_min_delay 0.5 -from ff1/CP -to ff2/D => sets hold delay to 0.5 units from CP->D. Here the original 0 cycle hold timing path that existed b/w CP->D is broken, and replaced by this min delay. We should also use a set_max_delay to constrain the max delay of this path too, else we miss the pseudo setup time requirement.

ex: set_max_delay 1.5 -from [get_clocks clk_A] -to [get_clocks clk_B] => This sets a max delay of 1.5 units for all paths from startpoint=clk_A to endpoint=clk_B. But if clk_A and clk_B are async, then this cmd doesn't work. In such cases, we should use some switch that still allows timing analysis on those paths. For the "set_clock_groups -asynchronous -group {clk_A} -group {clk_B}" cmd, where the two clocks are defined asynchronous, we may use the "-allow_paths" switch. This switch enables timing analysis on the paths between the clocks and preserves SI analysis using an infinite arrival window. So, SMD will now work on such paths.

USAGE SCENARIO: We need an SMD like the one in the example above (with "-allow_paths") when we want to constrain the delay b/w these 2 clocks, e.g. for the data bus of a load ctrl synchronizer, where we want to keep the data bus delay to less than 2 clock periods of the capture clock. This may be needed if we have a 2 flop synchronizer on the ctrl bus. Apart from "set_data_check" (which can always be used instead of SMD), there is no other way to constrain the delay for the data bus.

see "reset_path" cmd for resetting this exception.

 


 

4. set_data_check (SDC) => usual checks are data or set/reset wrt clock. However sometimes, we need 1 data signal to be stable before other data signal. A good example of this would be some sort of gater (similar to nand/nor clock gater) where one signal is Data and other signal is En. We might want En to be stable while Data is changing. This will require a data check. Data check should only be applied to data signals, and never to clk signals (i.e for checking skew b/w diff clock, this check not allowed). The data check is treated as non-sequential; that is, the path goes through the related and constrained pins and is not broken at these pins (seq arcs break the data path). So, original timing graphs remain intact. So, this cmd is always preferred over SMD as we don't risk removing valid paths. By using SDC, we are only adding more constraints on top of what was already there. SDC has same precedence as regular reg2reg timing, so it will be overridden or masked by other constraints or exceptions like set_false_path/set_multicycle_path/set_max_delay, etc.

options:

  • -from/-rise_from/-fall_from, -to/-rise_to/-fall_to,etc: behaves same way as other exception cmds
    • -from is always CLK (related pin, startpoint), while -to is always DATA (constrained pin, endpoint) for any path. This is the same convention as the arc for the En pin wrt the CLK pin in a clk gater, where En is the constrained pin and CLK is the related pin. We write that arc as "-from CLK -to En". Similarly, in a flop, the setup/hold arc is always written as "-from CLK -to D". In summary, SDC creates a non-sequential setup/hold check from the related pin to the constrained pin.
  • -setup/-hold => If no -setup/-hold is specified, the value is used for both setup and hold. If we specify -setup, a setup check is done. If we specify -hold, a hold check is done. If we specify -setup for a path and don't specify -hold in another cmd for the same path, only the setup check is done.
  • -clock => In general, paths to constrained and related pins may come from multiple clocks. Multiple clks at startpoints may be due to 2 reasons => either each startpoint is a single flop with multiple clks reaching it, or there are multiple flops as startpoints with a single clk on each. The -clock option is used in both cases (though the PT manual says it's only for the latter case), and it specifies clks for related pins only (NOT for clks on constrained pins). There's no way to specify clks for constrained pins, as the sdc cmd line options don't support it (in short, all the clks reaching constrained pins will be checked with the SDC constraints applied). If there are multiple clocks on the related pin and we don't force a particular clock using this option, then the sdc constraints are set with all possible clocks and put in their respective clock groups (remember that clk grps are always based on the destination clk, and in a data to data check, the related pin is the destination clk). This is the default behaviour (when not using -clock). Usually, we don't need the -clock option, as there's only 1 clk, or even if there are multiple clks, we want the sdc cmd to apply to all of them. This option is useful only if we want to exclude certain clks from the SDC cmd.
    • A side effect of this is that SDC cmds may get applied to paths we never intended. Imagine 2 flops (flop_A1 and flop_B) driving the constrained pin, each with a different clk (clk_A and clk_B). Now we have a 3rd flop (flop_A2), also driven by clk_A, driving the related pin. Now we set the SDC cmd as => set_data_check -clock clk_A -from flop_A2/D -to flop_A1/D -setup 2 => Here we wanted to set SDC on the real data/ctrl path only, but SDC will also get set on the path to the constrained pin starting from flop_B, as there's nothing in the SDC cmd to prevent this. Whether that SDC actually takes effect or gets overridden by other constraints such as clk grps, FP, etc depends on those other constraints.

set_data_check -from Data -to En -setup 2 => En must arrive 2ns before Data. If neither -setup nor -hold is specified, the value applies to both setup and hold. If only -setup is specified (as here, with no -hold option), then only the setup check is done. Options: -from/-from_rise/-from_fall, similarly -to/-to_rise/-to_fall.

set_data_check -from Data -to En -hold 10 => En must be stable for 10ns after Data

In examples above, -from (Data) is the related pin, and -to (En) is the constrained pin. So, Data acts as CLK in our example.

report_timing => When running report_timing to check these SDC paths, we have to use "rt -to <constrained_pin>". This will report either the normal R2R path or the SDC path, depending on which one is the worst violator. We don't use "-from <related_pin>", as that is not a valid startpoint (even though we use -from <related_pin> in the SDC cmd). The valid startpoint is the start flop that launches the constrained pin, similar to how we run report_timing for reg2reg paths. Since rt shows both R2R and SDC paths, consider putting all sdc paths into a separate clock group, so that it's easy to filter and see only the SDC paths by specifying the clk grp. To tell whether the reported path is an sdc path or not, look at the bottom of the report for "data check setup/hold time".

Normal flop-to-flop paths in the design are inferred with a multicycle multiplier of 1. This means that setup captures a cycle later than launch, and hold captures on the same cycle as launch. However, a data to data check is inferred with a multicycle multiplier of 0, which pulls the capture one cycle to the left for both setup and hold checking. So, setup checks happen on the same cycle, while hold checks happen 1 cycle earlier. The reason is simple to understand: both related and constrained signals are launched in the same cycle. We can think of the related signal as the clk going into a flop, which captures the constrained signal as data. It's all happening in the same cycle, so the setup check has to be in the same cycle. Hold is always 1 cycle earlier than setup, so that the new value isn't captured by the earlier edge (only by this one). The primary usage model of set_data_check was meant to be setup checks applied to the design. Hold checks exist for symmetry reasons, but the resulting zero-cycle hold check behavior is much less intuitive than the zero-cycle setup check behavior. Most of the time the hold check isn't really meaningful, so we specify a data to data check with the -setup option only (which does the setup check only, no hold check).

A very good link explaining this: https://vlsiuniverse.blogspot.com/2013/07/data-to-data-checks-constraining.html
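A small Python model can make the zero-cycle arithmetic concrete. This is a sketch under simplifying assumptions: both signals are launched by the same clock with period "period", arrival times are measured from the same launch edge, and clock latency, uncertainty and derates are ignored. "data_check_slacks" is a hypothetical helper for illustration, not a PT cmd, and the numbers are made up.

```python
def data_check_slacks(related_arrival, constrained_arrival, period,
                      setup=None, hold=None):
    """Simplified zero-cycle data-to-data check (multicycle multiplier 0).

    The related pin acts like the clock, the constrained pin like the data.
    Returns a dict with "setup" and/or "hold" slack (positive = met).
    """
    slacks = {}
    if setup is not None:
        # Setup: checked against the related edge in the SAME cycle,
        # so the constrained signal must arrive `setup` before it.
        required = related_arrival - setup
        slacks["setup"] = required - constrained_arrival
    if hold is not None:
        # Hold: pulled one cycle earlier, i.e. checked against the related
        # edge of the PREVIOUS cycle plus the hold value.
        required = (related_arrival - period) + hold
        slacks["hold"] = constrained_arrival - required
    return slacks


# En (constrained) arrives at 2.5 ns, Data (related) at 5.0 ns, 10 ns clock.
print(data_check_slacks(5.0, 2.5, 10.0, setup=2.0))  # {'setup': 0.5}
print(data_check_slacks(5.0, 2.5, 10.0, hold=1.0))   # {'hold': 6.5}
```

In this model the hold check compares En against the Data edge of the previous cycle, which is why the zero-cycle hold result is hard to relate to what the designer usually intends, and why -setup alone is the common usage.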

Data to data checks can be specified using "set_data_check" in PT, or can be put in .lib files as "non-sequential setup/hold checks" for that cell. The advantage of putting them in .lib is that we can specify diff values for setup/hold based on the transition of the i/p signals. However, data to data checks between pins that are not on the same lib cell can't be put in .lib as arcs, so we need to put them in the PT script as "set_data_check".

A very good SolvNet article on the Synopsys website is here (you can only access it with a Synopsys account):

https://solvnetplus.synopsys.com/s/article/What-is-the-Difference-Between-set-data-check-and-Non-Sequential-Library-Arcs-1576002483648

Most of the time, set_data_check is used for constraining skew requirements on different signals (i.e we may want different bits of a bus to be within a few ps of each other; in such cases, set_data_check with -ve values allows us to specify such a requirement). More in diagram below. We could have used set_max_delay for the skew check too, but set_data_check is preferred for skew checks amongst bits of a bus, since it doesn't break the timing graph.

FIXME: Attach hand made diagram
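Here is one way to read the skew use case, derived from the setup rule above: a -setup value s requires the constrained pin to arrive s before the related pin, so a -ve value -x lets it arrive up to x after. Applying the same -x check in both directions between each pair of bits bounds their skew to x. The Python sketch below only models that arithmetic; it assumes the check is applied in both directions, and "bus_skew_slack", the bit names and the arrival times (in ns) are hypothetical.

```python
from itertools import permutations


def bus_skew_slack(arrivals, max_skew):
    """Worst slack when every ordered pair of bits gets a data check
    with setup value -max_skew (bit a = related, bit b = constrained).

    One pair: required(b) = arrival(a) - (-max_skew); slack = required - arrival(b).
    Checking both directions bounds |arrival(a) - arrival(b)| <= max_skew.
    """
    worst = None
    for a, b in permutations(arrivals, 2):
        slack = (arrivals[a] + max_skew) - arrivals[b]
        if worst is None or slack < worst:
            worst = slack
    return worst


bits = {"bus[0]": 1.00, "bus[1]": 1.02, "bus[2]": 1.03}  # arrival times in ns
print(round(bus_skew_slack(bits, 0.05), 3))  # 0.02: 30 ps skew meets a 50 ps limit
print(round(bus_skew_slack(bits, 0.02), 3))  # -0.01: violates a 20 ps limit
```

Equivalently, the worst slack is max_skew minus (latest arrival - earliest arrival) across the bus.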

 


 

5. set_path_margin => specifies a margin to adjust the data required time. +ve margin means a tighter check, while -ve margin means a relaxed check. This helps flag paths that have historically been very tight in silicon but always show +ve slack in PT runs. For such paths, we can add an extra margin of, let's say, 50ps, so that they show up in reports as failing even if they have +ve slack of less than 50ps. It's a point-to-point exception cmd, so it adjusts the timing of individual paths, and has the same options as other exception cmds (i.e -to, -thr, -from, -setup, -hold etc). By default, the margin applies to both setup and hold. We need to apply this cmd in the Synthesis/PnR tool too, so that they can tighten the path; PT is just for checking, and can't fix the path.

IMP: Most of the time, we only need to tighten/relax setup paths, and NOT hold paths. So use -setup with this cmd. Hold paths are usually 0 cycle paths, and relaxing hold paths is dangerous. Tightening hold paths is still ok, but you should try to get to the root cause of why hold tightening is needed on these paths. Maybe our models etc are not in line with silicon. A very small hold tightening is OK.

ex: set_path_margin 10 -to [get_clocks CLK] => all paths to endpoints clocked by CLK are tightened by 10 units (applies to both setup and hold paths). When we run "report_timing", we'll see an extra "path margin" line at the very end, which (for setup) reduces the data required time by 10 units (i.e subtracts 10), similar to how "clock uncertainty" is subtracted from the required time.

This cmd may be used to tighten hold paths by a fixed amount: "set_path_margin -hold 10 -to [all_clocks]" will add 10ps of hold time to all paths in the design. We also use the "set_clock_uncertainty" cmd to add extra hold margin to all paths. "set_annotated_*" cmds, which annotate delays on nets, cells, etc, may also be used to add extra margin on paths (see the annotated cmds section).

ex: set_path_margin -5 -from u_top/ff_1 -to ff_2 => This relaxes the path from u_top/ff_1 to ff_2 by 5 units.
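The slack arithmetic for path margins can be sketched in a few lines of Python. Assumptions: "required" already includes clock edges, latency, uncertainty and library setup/hold, so only the margin term is modeled. For hold, a +ve margin is assumed to raise the required time (the mirror image of setup), consistent with a -hold margin adding hold time. The function names are hypothetical.

```python
def setup_slack(arrival, required, margin=0.0):
    """+ve margin tightens the setup check by lowering the required time."""
    return (required - margin) - arrival


def hold_slack(arrival, required, margin=0.0):
    """+ve margin tightens the hold check by raising the required time."""
    return arrival - (required + margin)


# A setup path with 30 units of slack, tightened by a 50 unit margin.
print(setup_slack(arrival=970, required=1000))              # 30
print(setup_slack(arrival=970, required=1000, margin=50))   # -20: now reported as failing
# A -ve margin (like -5 above) relaxes the check instead.
print(setup_slack(arrival=1002, required=1000, margin=-5))  # 3
```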

 


 

6. reset_path => this resets the specified path to its default single-cycle behaviour of 1 cycle setup and 0 cycle hold, i.e this command undoes the effect of the set_multicycle_path, set_false_path, set_max_delay and set_min_delay commands. It has the same options as the timing exception cmds: -from, -to, -thr, -setup, -hold, -rise, etc. By default, both setup and hold are reset for the given path. This cmd is useful when we are debugging a path and want to try a few different FPs for it. In such a case, we can try 1 FP, then do reset_path -option <path>, do report_timing to make sure we see the original timing again, then try another FP and so on.

NOTE: reset_path has to be applied the same way as the original MCP/FP, etc. i.e if the MCP was set with "-rise_to clk1", then reset_path also needs to use "-rise_to clk1". We can't use something like "-to clk1" or "-to flop1", even though flop1 is an endpoint of one of the paths on that clk that were MCPed. PT won't complain (it returns status code 1), but it just won't reset the paths.

ex: reset_path -from clk1 => resets all paths launched by clk1 to single cycle (there has to be some MCP/FP used earlier with option "-from clk1" for this reset to take effect).

ex: reset_path -from { ff1/CP } -to { ff2/D } => resets path from ff1 to ff2 to single cycle

 


 

B. timing enable/disable cmds: Another kind of exception, which disables timing arcs within cells (i.e the CLK->Q path of a flop may be disabled).



1. set_disable_timing: this disables timing through a list of cells, ports, pins, or timing_arcs, if specified. Compile applies min sizing on them, since there are no timing constraints on them.

options: The only options are -from/-to, which must be pins on the specified cells (this disables all arcs b/w these 2 pins on the cell). If no -from/-to is specified, then all arcs on the specified cells are disabled for timing. Names of objects must be instance names, not design names. If we specify a lib cell name, then arcs are disabled for that lib cell, implying all instances of that cell will have their timing arcs disabled.


set_disable_timing [get_cells Idigital_mux/U179] => disables all timing thru this mux

report_disable_timing -nosplit => By default, this reports all disabled timing arcs in the design. Since this would be a long list, we can provide a collection or list of cells/ports. set_disable_timing is not the only way to disable timing arcs; this report_* cmd shows the reason for each disabled timing arc as a 1-letter code in a flag column. It's important to check that flag when analyzing why a certain arc is disabled. Below are the various flags shown:

  • flag=c: case analysis: propagation of set_case_analysis causes arcs to be disabled.
  • flag=C: presence of conditional arc in library (i.e stmt "when (A&B):" etc defined in library)
  • flag=f: disabled due to false net arcs (i.e due to set_false_path??)
  • flag=l or L: PT does loop breaking automatically where it finds loops in design, so such arcs get disabled
  • flag=m: This disabling of arc happens due to the current mode of design being different than the modes specified in lib files for those timing arcs (FIXME ??)
  • flag=p: propagation of constant values disables the arcs here
  • flag=u or U: This disabling is due to set_disable_timing. "u" implies arcs disabled due to set_disable_timing on cell instance arcs, while "U" implies arcs disabled due to set_disable_timing on lib cell arcs.

 ex: report_disable_timing [get_cells { o_reg4 }] => this reports all disabled timing arcs for cell inst "o_reg4", with a flag and a reason for that flag.

 


 

2. set_max_time_borrow:

 


 

C. Forcing pins to constant values:

Here, we force certain pins to a constant value 0 or 1. This may be because certain pins are actually constant in some modes, so for those modes, you may want to mimic how the chip state is going to be in silicon. You don't want to unnecessarily time those paths which are not even valid in that mode.

1. set_case_analysis (SCA): used to force a certain value 0/1 on ports or pins. However, be careful: once a pin is forced to a particular value, there is no more timing path through that pin (as it has a constant value and doesn't change), so all such paths are removed from timing analysis. If such pins do toggle in normal operation, we won't be able to catch timing violations on them. In theory, we never need set_case_analysis, but we have different modes of operation (functional, scan, etc). If we run w/o any case analysis, all modes are timed in the same run, causing a lot of bogus timing paths. So, we use case analysis on quasi-static pins (pins which only change when switching modes, but otherwise remain constant), and run different modes in different timing runs. However, if we need to time paths through a pin where we set SCA, we'll need to remove the SCA in a separate run, and just do report_timing for all paths through that pin/net. This guarantees that we covered these timing paths too (in case the signal turns out to be toggling and there's no way for the user to control it on silicon).

report_case_analysis is used to report all pins/ports which have case analysis set. Useful for debugging.

syntax: report_case_analysis <options> <pin_or_port_list> => If no options provided then all pins/ports in design that have SCA on them are reported. In this case, it doesn't show pins/ports where these constant values set by SCA propagate to. But if we provide pin/port list, then propagated SCA values are shown. Many times we want to know the source pin that causes this case value. That's done using -to option:

ex:  report_case_analysis -to mod1/I_clkmux2/S => This will report case value on the given pin, and also show the source pin in the path where this propagated value came from, as well as the intermediate gates in the path and their case value.

remove_case_analysis is used to remove the case analysis set on a pin.
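To see why forcing one pin removes timing paths well beyond that pin, here is a toy Python model of constant propagation. Everything in it is hypothetical (the netlist, the net names, and the simplified rules for which arcs get disabled); it is only an illustration and not how PT implements case analysis, which also handles library conditions, sequential cells, etc.

```python
# Hypothetical netlist: (gate_type, output_net, [input_nets]).
# For a MUX the inputs are [I0, I1, S]; S=0 selects I0, S=1 selects I1.
NETLIST = [
    ("INV", "scan_mode_n", ["scan_mode"]),
    ("AND", "func_gate", ["scan_mode_n", "func_en"]),
    ("MUX", "clk_out", ["clk_func", "clk_scan", "scan_mode"]),
]


def propagate_constants(case_values, netlist):
    """Push 0/1 case values forward through the gates until nothing changes."""
    values = dict(case_values)
    changed = True
    while changed:
        changed = False
        for kind, out, ins in netlist:
            if out in values:
                continue
            v = [values.get(net) for net in ins]  # None = not constant
            result = None
            if kind == "INV" and v[0] is not None:
                result = 1 - v[0]
            elif kind == "AND":
                if 0 in v:
                    result = 0  # one controlling 0 forces the output
                elif all(x == 1 for x in v):
                    result = 1
            elif kind == "MUX" and v[2] is not None:
                result = v[v[2]]  # constant only if the selected input is
            if result is not None:
                values[out] = result
                changed = True
    return values


def disabled_arcs(values, netlist):
    """Return (input_net, output_net) arcs that can no longer carry a transition."""
    arcs = []
    for kind, out, ins in netlist:
        for i, net in enumerate(ins):
            if net in values or out in values:
                arcs.append((net, out))  # constant at either end of the arc
            elif kind == "MUX" and ins[2] in values and i != values[ins[2]]:
                arcs.append((net, out))  # data input not selected by S
    return arcs


# Functional mode: force scan_mode to 0, like a case value of 0 on that port.
vals = propagate_constants({"scan_mode": 0}, NETLIST)
print(vals)
for arc in disabled_arcs(vals, NETLIST):
    print("disabled:", arc)
```

With scan_mode=0, the clk_scan -> clk_out arc is disabled (so no scan clock paths are timed through the mux), while clk_func -> clk_out and func_en -> func_gate are still timed.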

 


 

D. misc set_* cmds:

1. set_mode: In liberty files for cells, we can specify an optional mode for timing arcs (see liberty section).

syntax: set_mode -type (cell|design) <mode_list> <instance-list> => instance-list is the instances (not instance definitions) on which this mode is to be applied. When we use the set_mode cmd, only the modes explicitly specified are set, and all other modes are unset. If we don't use this cmd at all to set any modes, then all modes are enabled (NOT disabled) by default. This behavior may look confusing, but it ensures that the tool still reads all timing arcs even if no mode is set. The mode type may be one of 2 types:

  • cell (default) => modes are defined on library cells in design, which is the most common case, or
  • design => these are user specified modes using few other mode cmds. We won't look at this option at all.

ex:  set_mode func1 block1/u_sram => Here, in liberty file for sram.lib, we have modes defined for certain arcs using stmt "mode(my_mode, func1)". Using the PT cmd "set_mode" in PT script, we are setting the mode to be func1 for particular instance (block1/u_sram) of sram.lib. NOTE: the name of mode group "my_mode" is not specified here, so not sure how multiple mode groups would work?

report_mode: Using this cmd, we can see all modes that are enabled or disabled. Good to use this cmd to check for all modes set.

pt_shell> report_mode => both modes are enabled by default

Cell                         Mode(Group)         Status    Condition  Reason
--------------------------------------------------------------------------------
block1/u_sram(sram_180nm_lib)                           
                          scan_shift (my_mode)  ENABLED        -      cell
                             func2    (my_mode)  ENABLED       -      cell

pt_shell> set_mode func2 block1/u_sram => Here we are explicitly setting mode for u_sram
1


pt_shell> report_mode => Now func2 is enabled, but all other modes get disabled.

Cell                         Mode(Group)         Status    Condition  Reason
--------------------------------------------------------------------------------
block1/u_sram(sram_180nm_lib)                           
                          scan_shift (my_mode)  disabled        -      cell
                             func2    (my_mode)  ENABLED       -      cell
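The enable/disable rule described above (no set_mode at all means every mode is enabled; once set_mode is used on an instance, only the listed modes stay enabled) can be summarized by a tiny Python model. It is only a sketch of that rule for a single instance and a single mode group; "mode_status" is a hypothetical helper, not a PT cmd.

```python
def mode_status(all_modes, explicitly_set):
    """Model of the set_mode rule for one instance.

    No set_mode call -> every mode is ENABLED.
    After set_mode   -> only the explicitly set modes stay ENABLED.
    """
    if not explicitly_set:
        return {mode: "ENABLED" for mode in all_modes}
    return {mode: ("ENABLED" if mode in explicitly_set else "disabled")
            for mode in all_modes}


modes = ["scan_shift", "func2"]
print(mode_status(modes, set()))      # both ENABLED (default, no set_mode)
print(mode_status(modes, {"func2"}))  # only func2 ENABLED after set_mode func2
```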

 

 


 

2. set_annotated_*: These cmds are used to override the default delays or checks applied by the tool, and instead annotate them with user-provided values. We can either override the tool-calculated value, or add on top of it (using the -increment option). These cmds are useful in Synthesis and Timing analysis tools to add extra margin to the design or to specific paths. The usual -from/-to, -rise/-fall etc options are supported. remove_annotated_* and report_annotated_* may be used to remove or report such annotations.

  1. set_annotated_check: Sets the setup, hold, recovery, removal, or nochange timing check value between two pins on a cell or net. -increment specifies that the value is to be added to the calculated check value of the specified timing arc. If an annotated incremental value already exists, then the new value is added to the existing one.
    • ex: set_annotated_check -setup -from ff/CP -to ff/D 1.0 => 1 unit is annotated on the setup time between the clock pin and the data pin of a cell instance. If there was already a value provided by lib or calculated by tool, that value is overwritten with 1 unit delay.
    • ex: set_annotated_check -increment -setup -from ff/CP -to ff/D 0.5 => Now, we add an increment of 0.5, so total setup value is 1.0+0.5=1.5 units.
  2. set_annotated_delay: Sets cell or net delays (instead of check values as in the cmd above). Cell delays exist between pins of the same leaf cell. Net delays exist between leaf-cell pins or top-level ports connected by a net. The specified delay value overrides the internally-estimated cell and net delay value. There's an extra -load_delay option which specifies whether the delay resulting from the capacitive load of the net should be considered part of the net delay or the cell delay. This affects the annotation.
    • ex: set_annotated_delay -net -rise 1.4 -load_delay cell -from U1/Z -to U2/A => annotates a rise net delay of 1.4 units between output pin U1/Z and input pin U2/A. delay value for this net does not include load delay.
  3. set_annotated_transition: Sets the transition time that is annotated on pins in the current design. The specified transition time value overrides the internally-estimated transition time value.
    • ex: set_annotated_transition -rise 0.5 [get_pins U1/U2/U3/A] => sets a rise transition time of 0.5 units on pin A
  4. set_annotated_power: annotates the internal and leakage power on the specified cells and annotates the switching power on the specified nets. No -increment option available, it always overwrites tool calculated value.
    • ex: set_annotated_power -internal_power 0.1 -leakage_power 0.2 U0/U1 => annotates 0.1 units of internal power and 0.2 units of leakage power on cell U0/U1
  5. set_annotated_clock_network_power: annotate the power values on the clock networks. No -increment option available, it always overwrites tool calculated value.
    • ex: set_annotated_clock_network_power -internal 1.0e-03 -switching 2.0e-03 -clock CKA => internal and switching pwr annotated for clock CKA only.

 


 

PrimeTime Multivoltage flow

So far, we looked at PrimeTime timing runs where we specified only one voltage domain, i.e all the libraries were specified with a single voltage. What if we have a design with multiple voltage domains, where some part of the logic runs on Voltage A, while the rest runs on Voltage B?

PT can perform multivoltage analysis, where different cells in design can have different power supply. For this PT needs to know the power intent of design, which is in the UPF file.

Cmds for MV:

Instead of that, we can generate a PG netlist which has all pwr ports (VDD, VSS, VSUB, etc) of each leaf instance (the libcells need to have these pwr ports in .lib). Pwr ports are in the netlist at the top level too, and they get connected to these PG pins of leaf cells appropriately. Synthesis tools can dump out these PG netlists with correct pwr connections as specified in the UPF. It's a lot easier to work with this netlist, as the UPF is not required any more: all PG connectivity info is directly in the netlist.

Cmds in PT:

 

 

Regression analysis:

This term is used extensively in AI, and is the starting point for AI. In statistics, regression analysis is a set of statistical processes for estimating the relationship b/w a dependent variable (also commonly called the outcome variable) and one or more independent variables (often called 'predictors', 'covariates', or 'features'). For ex: heart attack vs weight. Here heart attack is the dependent var (on the Y axis), which depends on weight, an independent var (on the X axis). We are trying to find a relationship b/w the 2 and see if they are related, i.e does higher weight cause more heart attacks, etc.

Correlation Coefficient (R):  R is a correlation coeff that measures how well X,Y in given dataset are correlated, i.e if X changes by a certain amount, does Y also change by a proportional amount. The correlation of 2 random variables X and Y is the strength of the linear relationship between them. It's a number b/w -1 and +1 (-1 meaning perfect -ve correlation, while +1 meaning perfect +ve correlation, and 0 meaning no correlation).

There are many types of correlation coeff, but the most commonly used is Pearson's correlation coeff (rep by "r" or "R"). To measure R mathematically, we define it as follows

Pearson's r = R = Correlation(X,Y) = Cov(X,Y) / (σ(X) * σ(Y)) => Correlation exhibits the same properties as covariance, as it is defined the same way. However, we divide it by the std deviation terms to normalize it, so that correlation stays b/w -1 and +1. See statistics section for the definitions of variance and covariance.
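Pearson's r can be computed directly from the definition Cov(X,Y) / (σ(X) * σ(Y)). A minimal Python sketch using only the standard library; it uses population statistics (divide by n), and since the 1/n factors cancel between numerator and denominator, using n-1 instead gives the same r.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r = Cov(X,Y) / (sigma(X) * sigma(Y))."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - x_mean) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)


print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0  (perfect +ve)
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0 (perfect -ve)
print(round(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]), 6))  # 0.8  (fairly strong +ve)
```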

The most common form of regression analysis is Linear Regression. A special case of Linear Regression is logistic regression.

 


 

Linear Regression:

Linear Regression is a linear approach used in statistics to model a relationship b/w an o/p response (dependent var Y) and i/p parameters (independent explanatory vars x0, x1, x2 ...). In simple terms, it's an X,Y plot, where numerous (X,Y) data points are given. Our goal is to find an eqn that very closely fits all the data points. Since this is a linear approach, the data is fitted with a straight line (Y = mX + b), and the loss or error is calculated by summing the squares of the differences for each data point. Minimizing this loss gives us the best fit, and is called the "least squares" approach to fitting models to data. Linear fitting, or linear regression, is the simplest approach and works well, so it's very widely used.
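For a single i/p var, the least-squares line has a closed form, obtained by setting the derivatives of the squared error w.r.t. m and b to 0. A minimal Python sketch; the height/weight numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = mX + b (simple linear regression).

    Setting the derivatives of sum((y - (m*x + b))^2) w.r.t. m and b to 0 gives
    m = sum((x - x_mean)*(y - y_mean)) / sum((x - x_mean)^2), b = y_mean - m*x_mean.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


heights = [150, 160, 170, 180]  # hypothetical X data (cm)
weights = [52, 58, 65, 71]      # hypothetical Y data (kg)
m, b = fit_line(heights, weights)
print(round(m, 3), round(b, 3))  # 0.64 -44.1
```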

NOTE: We need both a Fitting func and an Error func. W/O defining an Error func, we have no definitive way to quantify how well our fitting func fits the data. Generally, insight into the fitting func lets us come up with an error func. Finding the Fitting func is the harder part.

There are 2 kinds of Linear regression.

1. Simple linear regression: Here there is only one explanatory var on which the o/p response depends. Let's say the weight of a person depends on his height; then we can plot Y (weight of person) against X (height of person). We'll collect lots of (X,Y) data, plot it, and then do a best linear fit by drawing a line Y=mX+b through that data. This is simple linear regression.

2. Multiple linear regression: Here there is more than one explanatory var on which the o/p response depends. Let's say in the above ex, the weight of a person depends on race along with height. Then we have 2 explanatory vars (height and race) on which o/p Y (weight) depends. We'll collect lots of (X0, X1, Y) data, plot it, and then do a best linear fit by drawing a plane Y=m0X0 + m1X1 + b (here X0 is height and X1 is race) through it. This is a 3D plot (i.e the equation of a plane in 3 vars) with 3 axes: two i/p axes (X0, X1) and one o/p axis (Y). Similarly, it's an (n+1) dimensional plot (eqn of a hyperplane) for "n" i/p vars. Since it's the eqn of a plane, it's flat and can't zig-zag, so if the data zig-zags, it may not fit very well. NOTE: here we don't have b0, b1, ... separately, since all of them can be clubbed into 1 var b (as b = b0 + b1 + ...).

Error func:

We choose our error func for best fit to be something that sums up the differences b/w the actual and predicted values. We take squares, since we want both +ve and -ve differences to be treated as errors, and NOT to cancel each other out. So, our error func to determine the best fit is the sum of ( Ygiven - Ypredicted )^2, and we try to minimize it. We use calculus to come up with the values of m, b that minimize this error. Mean Square Error (MSE) is the mean of these squares, i.e the sum divided by the number of samples. We divide by the number of samples to get the avg error, so that the error func doesn't keep going up as we increase the number of samples. Root mean square error (RMSE) is the square root of MSE, so that RMSE has the same units as Y.
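A minimal Python sketch of MSE and RMSE, including a tiny made-up case showing why the differences are squared: plain differences of +2 and -2 cancel to 0, while the squared version still reports the error.

```python
import math


def mse(y_given, y_predicted):
    """Mean square error: average of (Ygiven - Ypredicted)^2."""
    n = len(y_given)
    return sum((yg - yp) ** 2 for yg, yp in zip(y_given, y_predicted)) / n


def rmse(y_given, y_predicted):
    """Root mean square error: same units as Y."""
    return math.sqrt(mse(y_given, y_predicted))


y_given = [1, 3]
y_predicted = [3, 1]
# Plain differences cancel out and hide the error ...
print(sum(yg - yp for yg, yp in zip(y_given, y_predicted)))   # 0
# ... squared differences don't.
print(mse(y_given, y_predicted), rmse(y_given, y_predicted))  # 4.0 2.0
```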

Coefficient of determination (R2 or r2 or R-squared): Above we saw that R specifies the extent of the linear relationship b/w vars X,Y. R2 is another term used to specify the goodness of fit of a model. It has multiple definitions. The most widely used is that it is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). This link explains it nicely: http://danshiebler.com/2017-06-25-metrics/

Pearson's r2 (square of the correlation coefficient) is simply the square of Pearson's r value, and is a number b/w 0 and 1. It's not the same as the R2 we talked about above, but in special cases it becomes the same (e.g. for a simple linear regression fitted by least squares with an intercept, R2 = r^2).

R2 is most commonly used in regression analysis, where we are more interested in how well our predicted data fits the actual data. This R-squared is different from the "squared error" we talked about above. R2 measures how much better the fit is compared to the mean. It is (normally) a number b/w 0 and 1, and is calculated as follows:

R2 = 1 - (MSE / variance(Y)) => variance(Y) is the MSE we'd get from the mean line, i.e a flat line at the mean of Y which doesn't change at all with X. This is the worst fit we'd settle for, since it returns the mean value of Y for any given X. As long as our fit is no worse than the mean line, MSE ≤ variance(Y), so our R2 value is b/w 0 and 1. In the worst case, Y_predicted is the same as the mean line, so R2 = 0 (very bad correlation of predicted data), while in the best case, Y_predicted is exactly the same as Y_given, implying R2 = 1 (very good correlation of predicted data). Hypothetically, R2 can range from -ve infinity to 1, since we can always do worse than the mean prediction by choosing a line for Y_predicted going in the opposite direction to the real one. However, that would be intentional. We choose the mean as the worst case, since we can always use the mean line as our starting point and see if we can do better than it; otherwise we stay at the mean line for our predicted values too. From the formula, we can see why it's called R-squared: we don't take square roots, but keep squared terms in both the numerator and the denominator.
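A minimal Python sketch of R2 = 1 - MSE / variance(Y), with made-up predictions for the three cases discussed: a perfect fit, the mean line, and a line going in the opposite direction (which gives a -ve R2).

```python
def r_squared(y_given, y_predicted):
    """R2 = 1 - MSE / variance(Y)."""
    n = len(y_given)
    y_mean = sum(y_given) / n
    mse = sum((yg - yp) ** 2 for yg, yp in zip(y_given, y_predicted)) / n
    var_y = sum((yg - y_mean) ** 2 for yg in y_given) / n  # MSE of the mean line
    return 1 - mse / var_y


y = [1, 2, 3, 4]
print(r_squared(y, [1, 2, 3, 4]))          # 1.0: perfect prediction
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # 0.0: mean line
print(r_squared(y, [4, 3, 2, 1]))          # -3.0: worse than the mean line
```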

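R2 = (variance(Y) - variance(Y - Y_predicted)) / variance(Y) = 1 - ( variance(Y - Y_predicted) / variance(Y)) => This is another formula for R2, where variance(Y - Y_predicted) is the variance of the residuals (errors). When the residuals average to 0 (as they do for a least-squares fit with an intercept), their variance is exactly the MSE, so this boils down to the same formula as above.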

R2=0.25 means that only 25% of the original variation in the data is explained by the given relationship; the other 75% of the variation comes from some other relationship that we don't know yet. So, here the correlation is weak (in other words, we are only 25% of the way from the mean to a perfect fit). On the other hand, R2=0.9 means that 90% of the original variation in the data is explained by the given relationship, so the correlation is very strong.

Higher order non linear eqn: Above we used linear regression, which used eqn Y=f(X). But for a better fit, we could go to higher order eqns (i.e Y=f(X^2), Y=f(X^3), ...), which will fit the plot better but get more complex. It turns out that higher order eqns that look non linear are actually still linear regression. Let's say we have 2 i/p vars, X0 and X1. In linear regression, Y=f(X0, X1). However, if we go to a 2nd order eqn, then Y=f(X0, X1, (X0)^2, (X1)^2, X0*X1). If we choose X2=(X0)^2, X3=(X1)^2, X4=X0*X1, then the eqn can be expressed as Y=f(X0, X1, X2, X3, X4). This eqn is linear in its coefficients; it just happens to have 5 i/p vars, instead of the 2 we had before. So, higher order non linear eqns can be treated as Multiple linear regression for all analysis. However, be careful, as these higher order eqns may cause overfitting, and may not represent real world effects.
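The variable substitution above can be sketched in Python: expand (X0, X1) into the 5 features (X0, X1, X2, X3, X4), and the model is then an ordinary weighted sum of those features plus b, i.e linear in the weights, so the same least-squares or gradient descent machinery applies unchanged. "quadratic_features" and "predict" are hypothetical helper names, and the weights in the example are arbitrary.

```python
def quadratic_features(x0, x1):
    """Map (X0, X1) to (X0, X1, X2, X3, X4) with X2=X0^2, X3=X1^2, X4=X0*X1."""
    return [x0, x1, x0 ** 2, x1 ** 2, x0 * x1]


def predict(weights, b, features):
    """Linear in the weights: Y = m0*X0 + m1*X1 + ... + m4*X4 + b."""
    return sum(w * f for w, f in zip(weights, features)) + b


features = quadratic_features(2, 3)
print(features)                                    # [2, 3, 4, 9, 6]
print(predict([1, 0, 0.5, 0, 2], 3.0, features))   # 19.0
```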

 


 

Logistic Regression:

Logistic regression (or Logit) is a special case of linear regression. In many textbooks, it's not even referred to as logistic regression, but rather as logistic classification. Here the o/p Y can't take infinitely many values (i.e Y is not a continuous function), but can only have a certain number of distinct possibilities, e.g. there may be only 4 outcomes: a shape is a square, circle, rectangle or triangle. However, there are key differences b/w linear and logistic regression. One is that logistic regression predicts the probability of particular outcomes rather than the outcomes themselves, so its o/p is restricted to values 0 to 1, i.e o/p Y represents the probability of that event happening for a given X. Second is that the conditional distribution is a "Bernoulli distribution" rather than a "Gaussian distribution", because the dependent variable is binary: for a given X, each observed Y is a single yes/no trial that is 1 with probability p and 0 with probability 1-p, which is exactly a Bernoulli distribution (linear regression instead assumes Y is continuous, with Gaussian noise around the fitted line).

Logistic regression is very nicely explained on StatQuest: https://www.youtube.com/watch?v=yIYKR4sgzI8

There are multiple types of logistic regression:

1. Binomial or binary logistic regression: They deal with situations in which the observed outcome for a dependent variable can have only two possible types, "0" and "1" (which may represent, for example, "dead" vs. "alive" or "win" vs. "loss").

2. Multinomial logistic regression: They deal with situations where the outcome can have three or more possible types (e.g., "disease A" vs. "disease B" vs. "disease C") that are not ordered.

Since in Logistic regression Y is not continuous but has distinct values, we don't try to fit Y directly, but instead try to fit the probability of Y, employing the same curve fitting methods.

Linear model: Y=m0X0 + m1X1 + b => Linear model with 2 predictors X0,  X1

Logistic Model: L = log_b(p/(1-p)) = m0X0 + m1X1 + c, where p = P(Y=1) and L = log odds of the event that Y=1 => Logit model with 2 predictors X0, X1. The logistic model predicts the probability p that Y=1 for a given X. It assumes a linear relationship between predictors X0, X1 and the log odds of the event that Y=1. Base b is usually taken as "e", but sometimes base 10 and 2 are used too. I changed the Y intercept to c here, so as not to confuse it with base b.

NOTE: we are modeling the log of the "odds of the event that Y=1", and NOT the "probability of the event that Y=1". The odds of the event that Y=1 is "probability of event that Y=1 / probability of event that Y≠1". So, if P(Y=1)=0.5, then P(Y≠1)=0.5, so the odds of the event that Y=1 is 1 and NOT 0.5, implying that Y=1 is as likely as Y≠1. If P(Y=1)=0.8, then P(Y≠1)=0.2, so the odds of the event that Y=1 is 0.8/0.2 = 4, implying that Y=1 is 4 times as likely as Y≠1.

Now, the question is why we do logistic regression this way: why don't we just use the same fitting methodology as Linear Regression, i.e "Y=m0X0 + m1X1 + b", instead of "log_b(p/(1-p)) = m0X0 + m1X1 + c"? The reason is that such a straight line will never achieve a good curve fit, as it has to run through the middle to fit the data. A portion of the data is saturated at the lower end, while the remaining portion is saturated at the higher end. No matter what slope or Y-intercept we choose, the error will be enormous, as the line will always run through the middle to minimize error, basically always predicting that the value is 0.5 or so (which makes no sense). For binary classification, we need something non-linear like an S shape, which will fit the values with the least error. Also, we need probability here, as a line will almost never predict exactly 0 or 1 for Y: if we try to predict using Y=m0X0 + m1X1 + b, then for a given X we may get Y=0.7, but that value is meaningless as Y is either 0 or 1. But if we use probability, then p(Y=1) for a given X makes more sense, as p(Y=1) = 0.7 means there is a 70% chance that Y=1 for the given X.

If we choose base b=e and solve the eqn above for p=P(Y=1), we get a sigmoid function, i.e p = σ(z) = 1 / (1 + e^(-z)), where z = m0X0 + m1X1 + c. So, it's easier to understand if we assume the sigmoid func came first: it constrains our o/p values to b/w 0 and 1, gives a probability func, and fits our requirement well. Then, by taking the sigmoid of our predicted value, we end up with that log func of p/(1-p).
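A minimal Python sketch of odds, log odds (base e) and the sigmoid p = 1 / (1 + e^(-z)), note the + sign in the denominator, showing that the sigmoid undoes the log odds. "predict_proba" is a hypothetical helper name for the 2-predictor model log(p/(1-p)) = m0X0 + m1X1 + c.

```python
import math


def odds(p):
    """Odds of Y=1: P(Y=1) / P(Y!=1)."""
    return p / (1 - p)


def log_odds(p):
    """Logit with base b = e (natural log)."""
    return math.log(odds(p))


def sigmoid(z):
    """Inverse of log_odds: p = 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))


def predict_proba(m0, m1, c, x0, x1):
    """P(Y=1) for the logistic model log(p/(1-p)) = m0*X0 + m1*X1 + c."""
    return sigmoid(m0 * x0 + m1 * x1 + c)


print(odds(0.5))                         # 1.0
print(round(odds(0.8), 6))               # 4.0
print(sigmoid(0.0))                      # 0.5
print(round(sigmoid(log_odds(0.8)), 6))  # 0.8: sigmoid undoes the log odds
```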

This sigmoid is the standard logistic func used to fit data. It gives us the probability of o/p Y for a given i/p X, rather than the value of o/p Y. So, our Y data (which is the probability) is always between 0 and 1.

We could have chosen some other eqn too, i.e "log_b(p) = m0X0 + m1X1 + c", however that may give a worse fit. It's just a conjecture; I don't know that for sure. Assuming the sigmoid is the best function for our requirement, let's calculate the error.

Error func:

Now the question that comes to mind is how we calculate the error for this function to get the best fit. Can we use the "residual square" method used in linear regression? It turns out that if we use residual squares here, we end up with a non-convex cost graph with many local minima for logistic regression. For linear regression, we got a beautiful convex graph with 1 minimum, so it was easy to find the lowest cost. This link explains it very well.

https://towardsdatascience.com/optimization-loss-function-under-the-hood-part-ii-d20a239cde11

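As explained in the link, a better function to minimize is Error = -[ Ygiven * Log( Ypredicted ) + (1-Ygiven) * Log( 1-Ypredicted ) ]. The leading minus sign makes the error +ve (since the Log of a value b/w 0 and 1 is -ve), so minimizing this error is the same as maximizing the log likelihood.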

If we plot this error function using the desmos.com graphing utility, with Ygiven = Ypredicted = x, we'll see that the function is umbrella-shaped (like an upside-down parabola), with 0 at both ends (at x=0 and x=1). It reaches its max value at x=0.5. Here, we chose Ygiven to be the same as Ypredicted. So, when both are the same value at the ends (i.e both are 0 or both are 1), we predicted perfectly and the error func is 0 (as seen at the 2 ends). Anywhere in between, even if Ygiven and Ypredicted are the same (i.e 0.5, etc), the error func gives a non-zero value. This is OK, as we never have Ygiven to be anything other than 0 or 1. The respective term, either Log( Ypredicted ) or Log( 1-Ypredicted ), takes over when Ygiven = 1 or Ygiven = 0 respectively. This takes the error value to very large numbers (to infinity if we predict the totally opposite value of what Y is supposed to be). So, the algorithm will try to stay away from predicting totally opposite values. This is exactly how we wanted our error func to behave.

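The above eqn is what we use in logistic regression as our error function to minimize. We calculate the error for each sample using the above eqn, sum them up (or take the average) and try to minimize that sum. That sum equals the negative log of the likelihood (the probability the model assigns to the labels we actually observed), so minimizing it is the same as maximizing the likelihood. That's why for logistic regression we call our cost approach "maximum likelihood", instead of "residual square".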
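The link between the summed error and "likelihood" can be checked numerically. This Python sketch uses a hypothetical 2-input model with made-up weights and data (all names and numbers are illustrative): it computes the summed per-sample error and, separately, the product of the probabilities the model assigns to the observed labels. The first equals minus the Log of the second.

```python
import math


def sigmoid(z: float) -> float:
    # Overflow-safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, b, x_row):
    # Probability that Y = 1 for one sample
    return sigmoid(sum(w * x for w, x in zip(weights, x_row)) + b)


def total_error(weights, b, X, Y):
    """Sum over samples of -[y*Log(p) + (1-y)*Log(1-p)]."""
    total = 0.0
    for x_row, y in zip(X, Y):
        p = predict(weights, b, x_row)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total


def likelihood(weights, b, X, Y):
    """Probability the model gives to the observed labels: p where y = 1, (1 - p) where y = 0."""
    prob = 1.0
    for x_row, y in zip(X, Y):
        p = predict(weights, b, x_row)
        prob *= p if y == 1 else 1 - p
    return prob


if __name__ == "__main__":
    # Hypothetical data: 4 samples, 2 inputs each, labels 0/1
    X = [[0.5, 1.0], [2.0, -1.0], [-1.5, 0.3], [1.0, 1.0]]
    Y = [1, 0, 0, 1]
    w, b = [0.8, -0.4], 0.1
    print("total error      :", total_error(w, b, X, Y))
    print("-Log(likelihood) :", -math.log(likelihood(w, b, X, Y)))
```

In practice the sum of Logs is used rather than the raw product, because a product of many probabilities quickly underflows to 0 in floating point.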

 


 

AI logistic regr example

Let's say we have m pictures, each with just 1 pixel (each pixel has a value from 0 to 255, representing 256 possible colors), and we try to plot each picture's popularity based on that pixel value. So, on the X axis we will have these pixel values, and on the Y axis their popularity number. We can do simple linear regr and plot a best fit line: Y=mX+b.
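For the 1-pixel case, the residual-square best fit line has a closed-form answer (ordinary least squares): m = sum((X - meanX)(Y - meanY)) / sum((X - meanX)^2) and b = meanY - m*meanX. A minimal Python sketch; the function name `fit_line` and the pixel and popularity numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares best fit line Y = m*X + b (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx  # requires at least 2 distinct X values
    b = mean_y - m * mean_x
    return m, b


if __name__ == "__main__":
    # Hypothetical data: pixel values (0..255) and made-up popularity scores
    pixels = [10, 60, 120, 200, 250]
    popularity = [5.0, 12.0, 20.0, 33.0, 40.0]
    m, b = fit_line(pixels, popularity)
    print(f"best fit: Y = {m:.4f}*X + {b:.4f}")
```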

However, if we have 2 pixel values for each pic, this becomes multiple linear regr, and the best fit plane becomes Y=m0X0 + m1X1 + b. Similarly, if we had nx pixel values, we would have Y=m0X0 + m1X1 + ... + m(nx-1)X(nx-1) + b as the best fit plane (strictly a hyperplane once nx > 2). This is exactly what we do in AI when finding the best fit. We call these slopes (m) weights: w0, w1, ... and so on.
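Once the weights are known, predicting Y for one picture is just a weighted sum of its pixel values plus b. A minimal Python sketch; the function name `predict` and the weights and pixel values are hypothetical.

```python
from typing import Sequence


def predict(weights: Sequence[float], b: float, pixels: Sequence[float]) -> float:
    """Y = w0*X0 + w1*X1 + ... + w(nx-1)*X(nx-1) + b for one picture with nx pixel values."""
    if len(weights) != len(pixels):
        raise ValueError("need exactly one weight per pixel value")
    return sum(w * x for w, x in zip(weights, pixels)) + b


if __name__ == "__main__":
    # Hypothetical weights and 3 pictures with nx = 3 pixel values each
    weights, b = [0.5, -1.0, 2.0], 3.0
    pictures = [[10, 4, 1], [0, 0, 0], [255, 128, 64]]
    print([predict(weights, b, pic) for pic in pictures])
```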

Gradient Descent (GD):

This method is used to find the plane with the best fit. A very good video about gradient descent is here: https://www.youtube.com/watch?v=sDv4f4s2SB8

Finding the weights (slopes) that minimize the error when fitting the plane to the data is a difficult problem. However, calculus comes to our rescue here and gives us the "gradient descent" method, which allows us to find such a plane (weights w0, w1, ...) so that the total error across all X is minimized. It works amazingly well (like magic)!!
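A minimal sketch of batch gradient descent for the multi-pixel plane, assuming the residual-square cost J = (1/(2m)) * sum((prediction - Y)^2) (the 1/2 just makes the derivative cleaner). The learning rate alpha, the iteration count and the toy data are illustrative assumptions; raw pixel values in the 0 to 255 range would usually need scaling first, or a much smaller alpha, to avoid diverging.

```python
def gradient_descent(X, Y, alpha=0.01, iterations=10000):
    """Batch gradient descent for Y ~ w0*X0 + ... + w(nx-1)*X(nx-1) + b.

    X: list of m samples, each a list of nx values; Y: list of m targets.
    Minimizes J = (1/(2m)) * sum((prediction - y)^2).
    """
    m, nx = len(X), len(X[0])
    w, b = [0.0] * nx, 0.0
    for _ in range(iterations):
        # Residual (prediction - target) for every sample with the current weights
        errors = [sum(wj * xj for wj, xj in zip(w, x)) + b - y for x, y in zip(X, Y)]
        # Partial derivatives of J with respect to each weight and to b
        grad_w = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(nx)]
        grad_b = sum(errors) / m
        # Take one step downhill
        w = [wj - alpha * gj for wj, gj in zip(w, grad_w)]
        b -= alpha * grad_b
    return w, b


if __name__ == "__main__":
    # Hypothetical data generated from Y = 2*X0 - 3*X1 + 1 (no noise)
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
    Y = [2 * x0 - 3 * x1 + 1 for x0, x1 in X]
    w, b = gradient_descent(X, Y, alpha=0.1, iterations=5000)
    print("weights:", [round(v, 4) for v in w], "b:", round(b, 4))
```

Each step moves every weight against its own slope of the cost, so the weights that most increase the error get corrected the most.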

Andrew Ng's course on coursera.org called "Supervised Machine Learning: Regression and Classification" talks about gradient descent and has labs on it. Try it if you want to learn the basics of gradient descent. Instead of doing gradient descent, we could also just find the derivative (slope) of the cost function and set it to 0 to find the minimum. GD, however, lets you see what's going on step by step. Also, setting the derivative to 0 only works directly when that equation can be solved in closed form (as it can for linear regression with residual square); for many cost functions, such as the logistic regression cost above, it can't. A computer can always run GD, though, stepping downhill until the derivative is close to 0. That's why we learn and implement GD in computer programs.
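The same GD loop works for logistic regression, where setting the derivative to 0 has no closed-form solution. A sketch in Python on hypothetical, deliberately non-separable 1-pixel data (the names, data, alpha and iteration count are all assumptions). It minimizes the average of the logistic error -[Y*Log(p) + (1-Y)*Log(1-p)], whose derivative with respect to each weight works out to the average of (Ypredicted - Ygiven) * X.

```python
import math


def sigmoid(z: float) -> float:
    # Overflow-safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    # Probability that Y = 1 for one sample
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def average_error(w, b, X, Y):
    """Average of -[y*Log(p) + (1-y)*Log(1-p)] over all samples."""
    total = 0.0
    for x, y in zip(X, Y):
        p = predict(w, b, x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(X)


def logistic_gd(X, Y, alpha=0.1, iterations=2000):
    m, nx = len(X), len(X[0])
    w, b = [0.0] * nx, 0.0
    for _ in range(iterations):
        # (Ypredicted - Ygiven) for each sample drives the gradient
        diffs = [predict(w, b, x) - y for x, y in zip(X, Y)]
        grad_w = [sum(d * x[j] for d, x in zip(diffs, X)) / m for j in range(nx)]
        grad_b = sum(diffs) / m
        w = [wj - alpha * gj for wj, gj in zip(w, grad_w)]
        b -= alpha * grad_b
    return w, b


if __name__ == "__main__":
    # Hypothetical 1-pixel data; not separable (X = -0.5 is labeled 1, X = 0.5 is labeled 0)
    X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
    Y = [0, 0, 0, 1, 0, 1, 1, 1]
    w, b = logistic_gd(X, Y)
    print("w:", w, "b:", b)
    print("average error before:", average_error([0.0], 0.0, X, Y), "after:", average_error(w, b, X, Y))
```

Because this toy data is mirror-symmetric around X = 0, b stays at (essentially) 0, and w ends up positive since larger X values are more often labeled 1. The non-separable labels keep the best w finite; on perfectly separable data, GD would keep growing w without settling.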