Optimization algorithms: Course 2 - Week 2

This course goes over how to optimize the algo for finding the lowest cost. We saw gradient descent, which finds the lowest cost by differentiating the cost function and moving toward the minima. However, with a NN training on lots of data, each step of finding the lowest cost may take a long time. Any improvement in the training algo helps a lot.

Several such algos are discussed below:

1. Mini batch gradient descent (mini gd):

We train our NN on m examples, where "m" may be in millions. We vectorized these m examples so that we don't have to run expensive for loops. But even then, it takes a long time to run across m examples. So, we divide our m examples into "k" mini batches with m/k examples in each mini batch. We call our original gradient descent scheme "Batch gd".

We run each mini batch inside an inner "for" loop. Then there is an outer for loop that iterates "num" times to find the lowest cost. Each pass through all the examples is called one "epoch".
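
A minimal sketch of the two loops (numpy assumed; forward_backward() and update_params() are hypothetical stand-ins for the usual forward/backward pass and parameter update):

import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    # shuffle the m examples, then slice them into mini batches
    np.random.seed(seed)
    m = X.shape[1]
    perm = np.random.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    return [(X_shuf[:, k:k+batch_size], Y_shuf[:, k:k+batch_size])
            for k in range(0, m, batch_size)]

def train(params, X, Y, alpha, num_epochs, forward_backward, update_params):
    # outer loop: each iteration over all mini batches is one "epoch"
    for epoch in range(num_epochs):
        # inner loop: one gradient step per mini batch
        for X_mb, Y_mb in random_mini_batches(X, Y, seed=epoch):
            grads = forward_backward(params, X_mb, Y_mb)
            params = update_params(params, grads, alpha)  # W = W - alpha*dW, etc.
    return params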

 With "Batch gd", our cost function would come down with each iteration. However, with mini batch, our cost function is noisy and oscillates up and down, as it comes down to a minima. One hyper parameter to choose is the size of mini batch. 3 possibilities:

  1. Size of m:  When each mini batch contains all "m" examples, then it becomes same as "batch gd". It takes too long when "m" is very large.
  2. Size of 1:  When each mini batch contains only 1 example, then it it's called "stochastic gd". This is the other extreme of batch gd. Here we lose all the advantage of vectorization, as we have a for loop for each example. It's cost function is too noisy as it keeps on oscillating, as it approaches the minima.
  3. Size of 1<size<m: When each mini batch contains only a subset of example, then it it's called "mini batch gd". This is the best approach, provided we can tune the size of each mini batch. typical mini batch sizes are power of 2, and chosen as 64, 128, 256 and 512. Mini batch size of 1024 is also employed, though it's less common. Mini batch size should be such that it fits in CPU/GPU memory, or else performance will fall off the cliff (as we'll continuously be swapping training set data in and out of memory)

2. Gradient Descent with momentum:

We make an observation when running gradient descent with mini batches: there are oscillations, which are due to W, b getting updated with only a small number of examples in each step. When the next mini batch is seen, W, b may get corrected to a different value in the opposite direction, resulting in oscillations. These oscillations are in the Y direction (i.e. values of weight/bias jumping around) as we approach an optimal value (of the cost function) in the X direction. These Y direction oscillations are the ones that we don't want. They can be reduced by a technique known as "Exponentially weighted avg". Let's see what it is:

Exponentially weighted average:

Here, we average the new weight/bias value with previous values. So, in effect, any dramatic update to weight/bias in the current mini batch doesn't cause too much change immediately. This smooths out the curve.

Exponentially weighted avg is defined as:

Vt = beta*V{t-1} + (1-beta)*Xt => Here Xt is sample number "t". t goes from 1, 2, ... n, where n is the total number of samples, and V0 = 0.

It can be proved that Vt is approximately an avg over the last 1/(1-beta) samples. So, for beta=0.9, Vt is an avg over the last 10 samples. If beta=0.98, then Vt is an avg over the last 50 samples. The higher the beta, the smoother the curve, as it averages over a larger number of samples.

It's called exponential because if we expand Vt, we see that it contains exponentially decaying weights for previous samples, i.e. Vt = (1-beta)*Xt + (1-beta)*[ beta*X{t-1} + beta^2*X{t-2} + beta^3*X{t-3} + ... ]

It can be shown that the weight decays to 1/e of its peak when we look back 1/(1-beta) samples (since beta^(1/(1-beta)) ≈ 1/e). So, in effect, Vt is taking an avg of the last 1/(1-beta) samples.

However, the above eqn has an issue at startup. We choose V0=0, so the first few values of Vt are way off from the avg value of Xt, until we have collected a few samples of Xt. To fix this, we add a bias correction term as follows:

Vt (bias corrected) = 1/(1-beta^t) * [ beta*V{t-1} + (1-beta)*Xt ] => Here we divide the whole term by (1-beta^t). For the first few values of "t", (1-beta^t) is small, so the division scales the small startup values of Vt up to the right range. As "t" gets larger, 1/(1-beta^t) goes to 1 and has no impact.
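
A small sketch of this in pure Python (beta=0.9 assumed; "samples" is any list of numbers):

def ewa(samples, beta=0.9):
    # exponentially weighted avg with bias correction
    v = 0.0
    out = []
    for t, x in enumerate(samples, start=1):
        v = beta * v + (1 - beta) * x      # Vt = beta*V{t-1} + (1-beta)*Xt
        out.append(v / (1 - beta ** t))    # bias correction: divide by (1-beta^t)
    return out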

gd with momentum:

Now we apply the above technique of "Exponentially weighted avg" to gd, giving gd with momentum. Instead of using dW and db directly to update W, b, we use the weighted avgs of dW and db. This results in a much smoother curve, where W, b don't oscillate too much with each iteration.

Instead of doing W=W- alpha*dW, b=b-alpha*db, as we did in our original gd algo, we use the weighted avgs of dW and db here:

W=W- alpha*Vdw, b=b-alpha*Vdb, where Vdw, Vdb are exponentially weighted avgs of dW and db (over roughly the last 1/(1-beta1) samples), defined as

Vdw = beta1*Vdw + (1-beta1)*dW, Vdb = beta1*Vdb + (1-beta1)*db
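
As a sketch, one momentum update step per mini batch (numpy arrays assumed; VdW, Vdb initialized to zeros of the same shapes as W, b):

def momentum_update(W, b, dW, db, VdW, Vdb, alpha=0.01, beta1=0.9):
    # running weighted avgs of the gradients
    VdW = beta1 * VdW + (1 - beta1) * dW
    Vdb = beta1 * Vdb + (1 - beta1) * db
    # update with the smoothed gradients instead of raw dW, db
    W = W - alpha * VdW
    b = b - alpha * Vdb
    return W, b, VdW, Vdb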

3. Gradient Descent with RMS prop:

This is a slightly different variant of gd with momentum. Here also, we use the technique of exponentially weighted avg, but instead of averaging dW and db, we average the squares of dW and db. Also, note that in "gd with momentum", we never knew which values of dW and db were oscillating; we smoothed all of them equally. Here, we smooth out more the values which oscillate more, and vice versa. We achieve this by dividing dW and db by the square root of their weighted avgs (instead of using weighted avgs of dW and db directly in the eqn). Whichever dW or db oscillates the most (maybe w1, w7 and w9 oscillate the most), their derivatives are going to be the largest (dw1, dw7 and dw9 have high values). So, dividing these large derivatives by larger numbers smooths them out more than the ones with lower derivatives. Eqns are as follows:

W=W- alpha*dW/√Sdw, b=b-alpha*db/√Sdb, where Sdw, Sdb are exponentially weighted avgs of (dW)^2 and (db)^2, defined as

Sdw = beta2*Sdw + (1-beta2)*(dW)^2, Sdb = beta2*Sdb + (1-beta2)*(db)^2

NOTE: we used beta1 in gd with momentum, and beta2 in gd with RMS prop, to emphasize that they are different betas. Also, we add a small epsilon=10^-8 to √Sdw and √Sdb so that we don't run into the numerical issue of dividing by 0 (or by a number so small that computers effectively treat it as 0). So, the modified eqn becomes:

W=W- alpha*dW/(√Sdw + epsilon), b=b-alpha*db/(√Sdb + epsilon)
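
One RMS prop update step as a sketch (numpy assumed; SdW, Sdb initialized to zeros):

import numpy as np

def rmsprop_update(W, b, dW, db, SdW, Sdb, alpha=0.01, beta2=0.999, eps=1e-8):
    # running weighted avgs of the squared gradients
    SdW = beta2 * SdW + (1 - beta2) * dW**2
    Sdb = beta2 * Sdb + (1 - beta2) * db**2
    # gradients that oscillate the most get divided down by the biggest numbers
    W = W - alpha * dW / (np.sqrt(SdW) + eps)
    b = b - alpha * db / (np.sqrt(Sdb) + eps)
    return W, b, SdW, Sdb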

4. Gradient Descent with Adam (Adaptive Moment Estimation):

The Adam optimization method took both "gd with momentum" and "gd with RMS prop" and put them together. It works better than both of them, and works extremely well across a wide range of applications. Here, we modify RMS prop a little bit. Instead of using dW and db with alpha, we use VdW and Vdb with alpha. This reduces oscillations even more, since we are applying 2 separate oscillation reducing techniques in one. The technique is called moment estimation since we are using different moments: dW is called the 1st moment, (dW)^2 is called the 2nd moment, and so on.

So, the eqns look like:

W=W- alpha*Vdw/(√Sdw + epsilon), b=b-alpha*Vdb/(√Sdb + epsilon), where Vdw, Vdb, Sdw and Sdb are defined as above

Here there are 4 hyper parameters to choose: alpha needs to be tuned, but beta1, beta2 and epsilon can be chosen as follows:

beta1=0.9, beta2=0.999, epsilon=10^-8. These values work well in practice and tuning these doesn't help much.
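
A sketch of one Adam update step for W at iteration t (numpy assumed; the full Adam algorithm also bias-corrects the two moments, as in the exponentially weighted avg section; the same update applies to b with Vdb, Sdb):

import numpy as np

def adam_update(W, dW, VdW, SdW, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    VdW = beta1 * VdW + (1 - beta1) * dW        # 1st moment (momentum part)
    SdW = beta2 * SdW + (1 - beta2) * dW**2     # 2nd moment (RMS prop part)
    VdW_c = VdW / (1 - beta1 ** t)              # bias correction
    SdW_c = SdW / (1 - beta2 ** t)
    W = W - alpha * VdW_c / (np.sqrt(SdW_c) + eps)
    return W, VdW, SdW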

5. Learning rate decay:

In all the gd techniques above, we kept the learning rate "alpha" constant. However, the learning rate doesn't need to be constant. It can be kept high when we start, as we need to take big steps, but can be reduced as we approach the optimal cost, since smaller steps suffice as we converge to the optimal value. Large steps cause oscillations, so reducing alpha reduces these oscillations and allows us to converge smoothly. This approach is called learning rate decay, and there are various techniques to achieve it.

The simplest formula for implementing learning rate decay is:

alpha = 1/(1+decay_rate*epoch_num) * alpha0, where alpha0 is our initial learning rate and epoch_num is the current epoch number.

So, as we do more and more epochs, we keep reducing the learning rate, until it gets close to 0. Now we have one more hyper parameter "decay_rate" on top of alpha0, both of which need to be tuned.

Other formulas for implementing learning rate decay are:

alpha = ((decay_rate)^epoch_num) * alpha0 => exponential decay (works for decay_rate < 1)

alpha = (k/√epoch_num) * alpha0 => This also decays the learning rate

Some people also manually reduce alpha every couple of hours or days based on run time. No matter which formula is used, they all achieve the same result of reducing oscillations as training progresses.
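
A sketch of the three decay formulas in one helper (pure Python; the scheme names are made up for illustration):

import math

def decayed_alpha(alpha0, epoch_num, decay_rate=1.0, scheme="inverse", k=1.0):
    if scheme == "inverse":                       # 1/(1+decay_rate*epoch_num) * alpha0
        return alpha0 / (1 + decay_rate * epoch_num)
    if scheme == "exponential":                   # decay_rate^epoch_num * alpha0
        return (decay_rate ** epoch_num) * alpha0
    return k / math.sqrt(max(epoch_num, 1)) * alpha0   # k/sqrt(epoch_num) * alpha0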

Conclusion:

Adam and all the other techniques discussed above speed up NN learning. They solve the problem of plateaus in gd, where the gradient changes very slowly over a large space, resulting in very slow learning. All these gd techniques speed things up by speeding up learning in the x direction. There is also the problem of getting stuck in local minima, but this turns out not to be an issue in NN with large dimensions. This is because, instead of hitting local minima (where the shape is like the bottom of a trough or valley), we hit saddle points (where the shape is like the saddle of a horse). For a local minimum, all dimensions need to have a shape like a trough, which is highly unlikely for our NN to get stuck at. At least one of the dimensions will have a slope to get us out of the saddle point, and we keep making progress w/o ever getting stuck at a local minimum.

Programming Assignment 1: here we implement the different optimization techniques discussed above. We apply them to a 3 layer NN (of blue/red dots):

  • batch gd: This is the same as what we did in the last few exercises. This is just to warm up. Nothing new here.
  • mini batch gd: We implement mini batch gd by shuffling the data randomly and partitioning it into mini batches.
  • Momentum gd: This implements gd with momentum.
  • Adam gd: This implements Adam - which combines momentum and RMS prop.
  • model: Now we apply the 3 optimization techniques to our 3 layer NN: mini batch gd, momentum gd and Adam gd.

Here's the link to the pgm assignment:

Optimization_methods_v1b.html

This project comes with 2 supporting python pgms (A and B below), plus my main pgm (C).

A. testCases.py => There are a bunch of testcases here to test your functions as you write them. In my pgm, I've turned them off.

testCases.py

There is also a dataset that we use to run our model on:

data.mat

B. opt_utils_v1a.py => This pgm defines various functions similar to what we used in previous assignments.

opt_utils_v1a.py

C. test_cr2_wk2.py => This pgm calls functions in above pgms. It implements all optimization techniques discussed above. It loads the dataset, and runs the model with mini batch gd, momentum gd and Adam gd. Adam gd performs the best.

test_cr2_wk2.py

 

Summary:

By finishing this exercise, we learnt many faster techniques for implementing gradient descent. We would get the same accuracy for all of the gd methods discussed above; it's just that the slower gd techniques are going to require a lot more iterations to get there. This can be verified by setting "NUM_EPOCH" to 90,000 in our program.

Derate:

We learned OCV in the previous section. However, running OCV at 2 different PVT corners may not always be practical. For ex, consider the voltage drops seen on a chip due to IR. We may not be able to get a lib at that particular voltage corner after accounting for the voltage drop due to IR. Similarly for temperature, we may not be able to get a lib for that exact temperature after accounting for on chip heating. Also, even if we are able to get these libs, ocv analysis requires running at 2 extreme corners. If we do not want to run analysis at 2 diff corners for ocv, we can run it at 1 corner only by specifying derating. Derating is an alternate approach where we speed up or slow down certain paths so that they can indirectly achieve the same results as OCV. Derating is basically applying a certain multiplying factor to each gate delay so that the delay can be scaled up (by having a multiplying factor > 1), or scaled down (by having a multiplying factor < 1). The advantage of derate is that each and every gate in the design can now be customized to have a particular delay on it. With OCV analysis, we weren't able to do this, as the flow just chose b/w WC and BC lib and applied one or the other to each gate in the design. Here, we first choose a nominal voltage for which we have a library available, and then apply derates to achieve the effects of Voltage and Temperature variations.

There are different kind of derates:

  • Timing derates: When we run sta at a particular voltage/temperature, we assume the same voltage and same temperature on all transistors. However, based on IR drop and temperature variation as well as aging effects, we know that not all transistors will be at the same voltage/temperature. So, we apply timing derating. We apply these timing derates as "voltage guardband derates". Even though we say voltage, we include the effects of temperature and aging, so that the "voltage derate" covers all of these. In the PT flow, these derates are specified via "set_timing_derate -pocvm guardband" or "set_timing_derate -aocvm guardband" => This is explained later. By default, set_timing_derate specifies ocv derates, which are derates due to local process variations only. Then we apply either an aocv or pocv voltage guardband derate, which accounts for Voltage+Temperature+reliability derates.
  • POCVM distance derates: Only applied on clocks. This is an additional derate on top of the voltage derate above.
  • LDE (Layout Dependent Effect) derates: provided by the foundry. Applied as incremental derates.
  • MixedVt / MixedLg derates: Due to differences in threshold voltage as well as in "Length" of transistors, we experience differences in delay which don't scale the same way. i.e. process is not correlated across different VTs, so LVT might be at the SS -3σ corner, but ULVT, instead of being perfectly at the SS -3σ corner, may be a little bit faster or slower. e.g. in a slow-slow corner, the capture clock path built from LVT cells might be slightly faster than the corner model predicts due to Vt mistracking. This VT mistracking is not OCV related: OCV models local process variations, while MixedVt models global process corner correlation. We model this MixedVt correlation effect via derates.
  • Margining derates: Other derates used for margining

What derating factor to apply for ocv/aocv/pocv is derived by running monte carlo sims.

1. set_timing_derate => Allows us to adjust delays by a multiplying factor. It automatically sets the op cond to ocv mode. The derating factor applies only to the specified objects, which is a collection of cells, lib_cells or nets. If no objects are specified, then derating applies to the whole design. report_timing -derate shows timing with the derating factor shown for each cell/net in the path. We do not derate slews or constraint arcs by default (as they are not supported for AOCV or POCV), but we do have options for setting these in the set_timing_derate cmd.

options:

-early/-late => Unlike other cmds, there is no default here. We have to specify -early to apply derating to early (shortest delay) paths, and -late for late (longest delay) paths. We need separate cmds for early and late, else derating will get applied to only one. The tool applies worst-case delay adjustments, both early and late, at the same time. For example, for a setup check, it applies late derating on the launch clock path and data path, and early derating on the capture clock path. For a hold check, it does the opposite. We get these derating values from simulations. First, we find the worst/best case voltage drop on transistor power pins (after accounting for off chip IR drop, PMU overshoot/undershoot and on chip IR drop) and then apply derating accordingly.

  • Early derating: We apply early derating corresponding to the voltage level seen with off chip IR drop only. This is the absolute highest voltage that can be seen by any transistor on die. Then we add extra derate to account for temperature offset. We apply the same early derate for both clk and data paths.
  • Late derating: We apply late derating corresponding to the voltage level seen with off chip IR drop + on chip IR/power_switch drop + reliability margin (due to the VT shift seen by transistors with low activity). This is the absolute lowest voltage that can be seen by any transistor on die. Here, we apply slightly different derates for clk and data paths. For the clock path, we don't add the reliability margin, since clk is always switching, so there is no reliability penalty that the clk path incurs. So, the clk path sees a slightly higher voltage.

NOTE: since we apply these early/late derates, we want the nominal voltage at which we are going to run STA to be around these early/late voltages. If our library's nominal voltage is too far from these early/late voltages, then we have to apply large derates, which may not produce accurate results.

-cell_delay/-net_delay => By default, derating applies to both cell and net delays, but not to cell timing check constraints. This allows derating to apply only to cell or net delays. -cell_check allows derating to be applied to cell timing check constraints (setup/hold constraints)

-data/-clock => By default, delays are derated in both data paths and clock paths. This allows derating to be applied to only data or clock

-rise/-fall => By default, both rising and falling delays are derated. This allows derating to be applied to cell/net delays that end with a rising or falling transition only

There are many more options that we'll see later (including the -aocvm guardband/-pocvm guardband options). If the options -aocvm guardband/-pocvm guardband are not used, then the above derating cmd sets an OCV derate, which only accounts for local process variation related derating. Voltage/Temperature and Reliability derates are captured via an additional derate specified with the -aocvm guardband/-pocvm guardband options. Thus the OCV derate and the aocv/pocv guardband derate are both needed to account for all PVT+reliability variations.

ex: set_timing_derate -early 0.9; set_timing_derate -late 1.2 => The first command decreases all early (shortest-path) cell and net delays by 10 percent, such as those in the data path of a hold check (and clk path of setup check). The second command increases late (longest-path) cell and net delays by 20 percent, such as those in the data path of a setup check (and clk path of hold check). These two adjustments result in a more conservative analysis. We should use derating < 1 for early and >1 for late, since we are trying to simulate worst case ocv that can happen. Derating gets applied to whole design, as we did not specify any object.

ex: set_timing_derate -increment -cell_delay -data -rise -late 0.0171 [get_lib_cells { TSM_svt_ff_1v_125c_cbest/SDF_NOM_D1_SVT}] => applies a derating of 1.7% only to the lib cell specified, for the rise direction and late (long delay) paths. -increment adds this derating on top of any derating already applied globally or to this cell earlier.

ex: set_timing_derate -cell_delay -net_delay -late 1.05 [get_cells top/H1] => sets a late derating factor on all cells and nets in the hierarchical cell top/H1, including its lower-level hierarchical cells

report_timing_derate => shows derates set for all cells in the design. This is a very long list, so it's better to redirect it to some o/p file.

AOCV and POCV: OCV is OK, but it doesn't model advanced levels of variation at 65nm and below, which results in overdesign. OCV allows us to model different derating for diff cells (by using the set_timing_derate cmd), but fails to capture other factors. To mitigate some of these OCV issues, advanced forms of OCV came into the picture. AOCV (advanced OCV) was used earlier, but the even more advanced POCV (parametric OCV) is used now. We will go over details of both.

PT has app_var variables which allow advanced OCV and parametric OCV. To report all app_var, we can use this cmd:

pt_shell > report_app_var *ocv* => reports all aocv and pocv app_var settings

AOCV: timing_aocvm_enable_analysis => setting it to true enables AOCV => needs AOCV tables in a side file

POCV: timing_pocvm_enable_analysis => setting it to true enables POCV => needs a POCV side file or liberty file

AOCV: Advanced on chip variation

OCV doesn't handle below factors:

  1. path depth => variation reduces on long paths due to statistical canceling. Even if each cell has a lot of variation, due to the random nature of variations, they can be +ve or -ve. The more gates in a path, the higher the chances that +ve and -ve variations will cancel out, resulting in a very low overall level of variation. So, path depth is a factor only for random variations on die.
  2. path distance => variation increases as a path travels across more die area. This is based on the simple silicon observation that nearby structures have less variation, but structures far away from each other have larger variation. That is why in analog circuits, matching transistors are placed as close as possible to each other to minimize variation b/w them. So, path distance is a factor only for systematic spatial variations on die. Even if you have more gates in a path but they are close by, the spatial variation will be very low compared to when these gates are far away. In other words, these variations are correlated more or less depending on their proximity to each other. So, the total variation in any path is a function of both random variation and spatial variation. However, random variation dominates, so path distance based variation is not very critical.
  3. different cell types => variation varies depending on the transistors used in cells. Lower width transistors have more variation than larger width ones. However, cell level derating is already captured in the simple derating cmd above, as it allows us to set different derates for different kinds of cells.

AOCV was proposed to provide path depth and distance based derating factors, to capture random and systematic on-die variation effects respectively. Instead of applying a constant derating factor to any cell as in ocv, we adjust the derating factor for a given cell based on path distance and depth. This is provided in the form of a 2D table in a file. AOCV provides a single number for the delay of a path based on the derating used (the derating value itself is taken from the 2D table based on path depth and path distance for that cell). It only models delay variation; it does not model transition or constraint variation. Thus AOCV is the same as OCV except for the derating added for the above 2 factors.

Both GBA and PBA AOCV can be performed. GBA AOCV reduces pessimism somewhat compared to GBA OCV, which may be sufficient for signoff. If a few paths still fail, PBA AOCV can be run on the selected failing paths, which reduces pessimism even further.

AOCV flow:

 set_app_var read_parasitics_load_locations true => this is so that read_parasitics can read location info from spef file

read_parasitics file.spef => To use the distance based derating specified in the aocv file below, we need the physical locations of the various gates, nets, etc. This info is contained in SPEF files, and can be read via the read_parasitics cmd. If we have a hierarchical flow, where there are separate spef files for blocks and for the top level, PT can automatically determine the correct offset and orientation of the blocks. However, if it fails, we can specify them manually via extra args to the read_parasitics cmd.

set_app_var timing_aocvm_enable_analysis true => enables GBA AOCV analysis

read_ocvm file.aocvm => reads derating from this aocvm file. It has a 2D table of derates with depth and distance as indices (it can also be a 1D table with either depth or distance as the index, although this gives less accurate results). aocv derates take precedence over ocv derates specified for any cell, as they are more accurate. The syntax of this file is explained under the pocv section below.

set_timing_derate -aocvm_guardband -early 0.95 => this applies an additional guardband to model non process related effects (ir drop, margin, etc) in the aocv flow. For fast paths, we reduced delays by a further 5%. Final derate = aocv_derate * guardband_derate. set_timing_derate -increment adds a derate on top of this derate (instead of multiplying, it adds). So, Final derate = aocv_derate * guardband_derate + incremental_derate. Either a guardband derate or an incremental derate can be applied, as the two are mutually exclusive.

set_timing_derate -aocvm_guardband -late 1.04 => For slow paths, we increased delays by 4%.

update_timing => performs timing update

report_ocvm -type aocvm => reports the number of cells/nets that aocvm deratings were applied on, in a summarized table. Any cell/net not annotated with a derate is also listed here.

report_ocvm -type aocvm I0/U123 => If an object list is specified, the derating on the specific instances or arcs is reported.

report_timing => shows timing paths with aocv analysis. -derate shows derate factor too

POCV: Parametric on chip variation

POCV is even more advanced, and a radical departure from conventional methods. Here, timing values are stored not as one single number but rather as statistical quantities (i.e. as a gaussian distribution with mean mu and std dev sigma). These statistical distributions are propagated along the path, and probability/statistics theorems are applied to come up with a statistical distribution for the final delay at the end point of a path. AOCV models derating only for delay, but the POCV statistical method is applied not only to delay, but also to the transition variation for each cell on a path. It also models constraint variation (setup/hold times on flops), as these vary too depending on variation within the cell, as well as transition variation on the clk and data pins. mu and sigma values are stored in lib files for each cell for delay, transition and constraint (the latter only if provided for flops). By default, only delay variation is enabled. Transition variation and constraint variation have to be enabled to get a better match with HSPICE monte carlo sims. Timing values can be reported at any N sigma corner (since sigma is known). Usually, we report at 3 sigma, as that implies 99.9% of the dies for that timing path will lie within 3 sigma (i.e. only ~0.1% of chips will fail for that path).

POCV takes care of path depth automatically, as it propagates each distribution as an independent random variable. So, statistical cancellation takes care of path depth. Path distance is handled by using distance based AOCV tables. So, these tables are 1D in the case of POCV (as opposed to 2D tables in AOCV).
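
A toy illustration of this statistical cancellation (the per-cell numbers below are made up): mean delays add linearly along a path, but independent sigmas add in quadrature, so the relative variation shrinks as depth grows:

import math

# hypothetical path: each cell has mean delay 10ps, sigma 1ps (10% variation per cell)
for depth in (1, 4, 16, 64):
    mu = 10.0 * depth                          # means add linearly
    sigma = math.sqrt(depth) * 1.0             # independent sigmas add in quadrature
    print(depth, mu + 3*sigma, 100*sigma/mu)   # 3-sigma path delay, % variation

The % variation drops from 10% at depth 1 to 1.25% at depth 64, which is why a flat per-cell derate (plain OCV) overestimates variation on deep paths.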

Lower VDD, as found in low nm tech, increases sensitivity to delay, transition and constraint variation (as Vdd is close to Vth, so a small change in Vdd causes large changes in current). So, POCV accounts for all this sensitivity, and prevents overdesign during PnR. POCV run with GBA provides better correlation with PBA, as it reduces pessimism in GBA. On the other hand, with AOCV, exhaustive PBA had to be run at signoff as GBA had inbuilt pessimism, increasing run time. The tight GBA-PBA correlation in POCV avoids running exhaustive PBA.

POCV is run in PT as a regular flow. The only extra step is reading variation information from liberty files, or from sidefiles in an AOCV-like table format. Then timing is reported at a specific sigma corner.

POCV input data: 2 methods: one is providing a sidefile for distance based derating (or a single sigma value called a single coefficient), and the other is a liberty file with sigma values across i/p slew rate and o/p load. Derates in POCV can be applied to both mean and sigma values. Derates are applied to mean values available in the .lib file, and to sigma values available in the .lib file or in the sidefile as a single coeff.

1. sidefile with POCV single coefficient: This sidefile is just an extension of the AOCV table format in version 4.0 (this is the same synopsys file format as shown in the AOCV section). It can either have distance based derates, or a constant coefficient for sigma. Syntax is as below (file1.pocvm or file1.aocvm or any name):

version: 4.0

ocvm_type: pocvm => it can be aocvm or pocvm
object_type: lib_cell => this can be design, lib_cell or hier cell
rf_type: rise fall => for both rise/fall

voltage: 0.9 => this allows voltage based derating where diff derate can be applied based on voltage the cell is at.
delay_type: cell => this can be cell or net. For net, object_spec is empty to apply it for all nets
derate_type: early => early means it's applied on shortest paths (for setup, clk paths are early, while for hold, data paths are early => to model worst case scenario)
path_type: clock => below derating is applied only for cells in clk path (applicable only for setup). For cells in data path for early (applicable only for hold), we may specify a different derating (<1).
object_spec: -quiet TSM_LIB/* => applies to all cells. For particular cells, we can do TSM_LIB/invx1*. -quiet prevents warnings from showing up
distance: 0 5000000 10000000 20000000 30000000 40000000 => this specifies the distance values used for derating purposes
table: 1.000000 0.990442 0.986483 0.980885 0.976588 0.972967 => since type is early (fast paths), derates are < 1 to model worst scenario.

coefficient: 0.05 => this specifies a single coefficient which is sigma divided by mean = sigma/mu (random variation coeff normalized to mu). This is specified if we want to do single coeff based POCV for our timing runs, instead of the more accurate liberty based sigma. However, coefficient and distance are mutually exclusive; you can specify only one of them. Different values can be specified for diff cells, etc. Usually the more accurate lib files are used to provide sigma, instead of providing a coefficient here.

ocvm_type: pocvm
object_type: lib_cell
rf_type: rise fall
delay_type: cell
derate_type: late => late means it's applied on longest paths
path_type: clock => below derating only for cells on clk path (applicable only for hold). We specify derating separately for cells on data path for late (applicable only for setup)
object_spec: -quiet TSM_LIB/*
distance: 0 5000000 10000000 20000000 30000000 40000000
table: 1.000000 1.009558 1.013517 1.019115 1.023412 1.027033 => since type is late (slow paths), derates are > 1 to model worst scenario. 

coefficient: 0.02

2. LVF (liberty variation format) file: These files have additional groups which contain sigma info for delay, transition and constraint variation. They may also have distance based derating values here, instead of in a sidefile (using the ocv_derate group).

The format of this is explained in the Liberty section.

POCV flow:

set_app_var read_parasitics_load_locations true => this is so that read_parasitics can read location info from spef file

read_parasitics file.spef => reads the parasitics along with location info (same as in the AOCV flow above)

set_app_var timing_pocvm_enable_analysis true => sets GBA POCV analysis

set_app_var timing_pocvm_corner_sigma 3 => sets corner value for timing to be 3 sigma. It can be set to 5 sigma for more conservative analysis

set_app_var timing_enable_slew_variation true => to enable transition variation (i/p slew variation affects delay variation as well as o/p transition variation). Optional but recommended for better accuracy at < 16nm

set_app_var timing_enable_constraint_variation true => to enable constraint variation (setup/hold, rec/rem, clkgating variation). Optional but recommended for better accuracy at < 16nm

read_ocvm file.pocvm => reads single coeff or distance based derating from side file based on what's available

set_timing_derate -pocvm_guardband -early 0.95 => For fast paths, we reduced delays by a further 5%. The POCV guardband is applied to both the mean delay and the sigma delay (the AOCV guardband is only for mean delay, as there's no concept of sigma in AOCV). If we want to derate only the sigma, we can scale the pocvm coefficient from the sidefile or liberty file (w/o modifying the value directly in the sidefile or liberty file) by using "set_timing_derate -pocvm_coefficient_scale_factor 1.03", which scales only sigma and not mean delay. The pocvm coeff scaling is applied on top of the guardband for sigma delay.

  • Final derate_mean = pocv_derate * guardband_derate + incremental_derate,
  • Final_derate_sigma = guardband_derate * pocvm_coefficient_scale_factor

set_timing_derate -pocvm_guardband -late 1.04 => For slow paths, we increased delays by 4%

update_timing => performs timing update

report_ocvm -type pocvm => reports summarized pocvm deratings applied on cells/nets. If an object list is provided, it shows the coeff and distance based derating picked from the sidefile or LVF.

report_timing => shows timing paths with pocv analysis. -derate shows the derate factors too. However, now we may want to see both mean and sigma delays (since sigma delays are taken into account when reporting slack). Slacks are not a simple difference b/w expected arrival time and actual arrival time; the sigmas combine as the square root of the sum of squares (since we are now dealing with statistical quantities). To see both mean and sigma delays, set this app var:

set_app_var variation_report_timing_increment_format delay_variation => Now report_timing -derate will show 2 columns: mean (delay w/o variation) and sensit (sigma, or delay variation). The incremental time column for that arc should equal mean +/- 3*sensit (+ or - depending on slow(max) or fast(min) paths). mean and sensit are shown with derating already applied. Apart from the incremental columns, there are also path columns, which show both mean and sensit again. Mean here is the cumulative mean up to that point (sum of all means), while sensit is the cumulative sensitivity up to that point (sqrt of the sum of squares of all sigmas). These help to verify various path delays and how they contribute to the overall delay. There are also statistical corrections applied to get the numbers to add up, and statistical graph pessimism applied in timing analysis. Latch borrowing also needs to be treated differently in POCV mode.

  • final cell_delay mean derated = cell_delay_mean * final_derate_mean = cell_delay * ( "POCVM guardband" * "POCVM distance derate" + "Incremental derate" )
  • final cell_delay sigma derated = cell_delay_sigma_adjusted * final_derate_sigma = cell_delay_sigma_adjusted * ( "POCVM guardband" * pocvm_coefficient_scale_factor ) => cell_delay_sigma here is adjusted from the original sigma number reported in the liberty file (if LVF is used) to account for the fact that transition variation on the i/p will affect delay variation (as well as o/p transition variation), depending on the correlation b/w transition and delay. Synopsys applies a proprietary formula to come up with the adjusted sigma number.
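
A toy calculation of the numbers above (all input values below are made up for illustration, loosely reusing the derates from the example cmds):

# hypothetical late-path cell, values for illustration only
cell_delay_mean  = 50.0     # ps, mean delay from .lib
cell_delay_sigma = 2.0      # ps, adjusted sigma
guardband   = 1.04          # set_timing_derate -pocvm_guardband -late 1.04
dist_derate = 1.0135        # POCVM distance derate picked from sidefile table
incr_derate = 0.0171        # set_timing_derate -increment ... 0.0171
coeff_scale = 1.03          # -pocvm_coefficient_scale_factor 1.03

mean_derated  = cell_delay_mean  * (guardband * dist_derate + incr_derate)
sigma_derated = cell_delay_sigma * (guardband * coeff_scale)
print(mean_derated + 3 * sigma_derated)   # late-path incremental time = mean + 3*sensit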

Now update_timing runs pocv, and report_timing shows timing paths with pocv analysis.

report_delay_calculation -from buf1/A -to buf1/Z -derate => This shows detailed calculation of cell mean delay and cell sigma delay with derating. This is useful for debug.

POCV analysis with xtalk: POCV analysis is only applied on non-SI cell delay. However, POCV can indirectly change crosstalk delta delay due to the difference in timing window alignment.

POCV characterization: POCV data can be generated via Synopsys SiliconSmart. It can generate sigma for use in LVF, or a coeff for use in a pocv side file. Distance based derating in the sidefile can't be generated via any tool, and is generated via silicon measurements.

car oil change

If you have a regular car, chances are that an oil change is one of the things that you have to do once or twice a year. Without an oil change, your car may get into serious trouble. If you skip oil changes for a couple of years, or bought a used car that had neglected oil changes, then your car may not last long. An oil change is the #1 maintenance thing that you have to do. If you buy a new car, for the first 10 years of ownership, you may not have to do any other maintenance except for an oil change every year. So, don't skip it, but also don't overdo it. Overdoing it doesn't harm your car, but it is money getting flushed down the toilet.

Oil change can be done either yourself or at a mechanic's shop. I'll list both options below:

1. At a shop: This usually runs from $20 to $100 depending on the car. Older or cheaper cars use conventional oil, which is usually cheaper but needs an oil change every 6 months or so. Newer cars use "synthetic oil", which is more expensive but needs an oil change just once a year. More about oils below. So, in the long run, all kinds of cars cost about $50-$100 in oil changes per year. Walmart is the cheapest place for an oil change. They don't look for unnecessary repairs for your car, or come back with a list of 100 things that you have to get done on your car. If you take your own oil and filter to them, they charge you just for the labor, which is $10. However, some walmart locations will still charge you the full price irrespective of whether you bring your own oil+filter or get their oil.

Other places such as Jiffy Lube, Pep Boys, car dealers etc. have coupons for oil changes. Look on their websites. Car dealers do your oil change for almost the same price as these local chains or mom and pop stores, especially if you use the coupon that they have on their website most of the time. I would rather get the oil change from the dealer than from these smaller shops.

2. Do it yourself (DIY): Of course you are here on this website to save money, so we do everything humanly possible ourselves. An oil change is such an easy job that it can be done yourself in less than 30 minutes in your parking space. It not only saves you money but time too. Going to a repair shop, waiting there, getting the oil change and coming back is half a day lost for nothing. Not to mention the cheap oil that is used, on top of it being an expensive oil change. I've started doing my own oil changes, and I had never done any tool work in my life before. So, if I can do it, anyone can do it !! You will need to buy some tools though, but they will all pay off in one oil change.

Items needed for Oil Change:

Get these things before you start doing the oil change. These are a one time investment, and can be reused.

1. Ramp: You need to raise the car so that you can get under it. This is the most dangerous part, and it gets people extremely scared to do an oil change. Raising the car on jacks is one option, but jacks are not easy, and you don't know if it's done right. Ramps are a solution to this. You just drive your car over the ramp. The front 2 wheels get on the ramp, and are now raised by a couple of inches compared to the back wheels. Then you slide in from the front of the car. Only 1/4th to 1/3rd of your body needs to be under the car, as the oil drain screw is at the front of the car. These ramps are very sturdy. One thing to note is that your ramps start sliding once you drive your front wheels onto the ramps. To prevent that, place the front of the ramps against some raised edge (like the entry of a garage, where the outside of the garage is a little bit lower than the inside, giving an edge that will prevent sliding). Or people use a rope, or some other tricks. Read the slickdeals links below for various ideas.

I bought Rhino ramps from walmart for about $35. Here's a deal for  ramps for $30:

https://slickdeals.net/f/15026011-rhino-gear-rhinoramp-29-99-at-advance-auto-parts

2. Screw: Opening the screw at the bottom of the car, from where the oil drains, is another big thing. You need to have the right size "screw opener" to open it. Look in Youtube videos for your particular brand of car to find out the size you need. Any generic toolbox has the screw opener you need.

3. Oil Filter Wrench: You not only need the screw opener, but also a circular oil filter opener (called a wrench). This is the hardest part to find. The filter box should be easy to get out, but it's circular, and so it's hard for anything to grip it without slipping. I bought one at Autozone for about $10 (it specifically mentioned on the item that it works for Toyota cars), which works great on my Toyota Sienna. All the other styles that I bought never worked. So, choose this or a similar style:

Link: https://www.autozone.com/shop-and-garage-tools/oil-filter-wrench/p/performance-tool-oil-filter-wrench-w54105

4. Drain pan container: To drain and store the oil, you need a drain pan. It collects the used oil as you drain it. Then you can close the top opening of the container, take it to a local auto shop, and they will get rid of the oil for you - for free. These used to cost $7 or so, but as of 2022, I'm seeing prices of $15.

One such Link: https://www.ebay.com/itm/374355003098

5. Funnel: Any funnel to pour oil into the engine. I bought this super funnel from walmart for $2.50, but you may easily get smaller ones for a dollar or less.

Link: https://www.walmart.com/ip/FloTool-05034-Super-Funnel/20440553

6. Rags: These are any plastic sheets and old clothes lying around. You don't need to buy anything here. Just use old kids' clothes, socks, underwear or whatever - something to soak up the oil if it drips on the floor, as well as to clean the floor that you are going to lie down on.

7. Oil:

First things first: oil is not the same as gasoline. In countries other than the USA, "oil" loosely refers to gasoline, but here oil and gasoline are 2 different things. Oil is the one that goes in your engine to lubricate it. You change it once a year or so on newer cars. Gasoline is what your car runs on (diesel or petrol). You will need to know what oil goes into your car. Assuming you are driving a regular vehicle, you will see oil with numbering such as 0W-20, 5W-30, etc. written on it. There will also be synthetic oil, conventional oil, full synthetic oil etc. Most of the time, synthetic oil is what is used in newer cars. However, your car manual is the ultimate guide on what kind of oil can go in your car. Read it. Many times, even though the manual says conventional oil, you can still go for synthetic (search online forums to see if it's supported). Given an option, go for full synthetic oil, as the price is about the same as the other synthetic oils (semi synthetic, blend etc). Most cars get better mileage and need less frequent oil changes with synthetic oil.

Viscosity of oil: Oil viscosity is its resistance to flow. Higher number => thicker or more viscous (higher viscosity) oil. Oil with viscosity=5 flows better than oil with viscosity=20. Thicker oil generally gives lower fuel economy, and more stress on the engine. Oil gets thinner when hot and thicker when cold. So, to maintain optimal performance, you would want to use thinner oil in winter (as oil will get thicker anyway with the colder temp), and thicker oil during summer (as oil will get thinner anyway with the hotter temp). That way you maintain the best fuel economy and the least stress on your engine. This is what people used to do in the old times. They would change to oils with different viscosity for summer and winter to keep optimal performance. These were called single weight oils. Single weight oils are not supposed to be used anymore. We have multi weight oils now that take care of viscosity automatically in high and low temps. They have a mixed formula where the oil behaves like a thinner oil in winter and a thicker oil in summer. Thus one oil works for both summer and winter, and there's no need to change oil as the outside temps change.

Oil Numbering (5W-20, etc.) refers to this viscosity at low and high temps. Numbers such as 5W-20 refer to the weight and viscosity (or thickness) of the oil. The letter 'W' stands for winter. The first number, before the letter, refers to the oil's thickness at cold temperature, and the number following the letter indicates the thickness at operating temperature.

Oil is expensive. Your car needs 5-7 quarts of oil. You can almost always find oil on sale. Look in the gasoline/oil deals section. It will cost about $1/quart when on sale.

Steps for Oil Change:

Search on youtube for your specific model of car followed by "oil change", e.g. "Honda Civic 2012 oil change". You should find at least some video which has all the steps for an oil change. Even if your exact model or year is not there, many of the oil change steps are similar for the same brand of car with similar specs. This is because car companies don't make a lot of changes around where different components are placed, as that disrupts their assembly line process and incurs higher cost. Maybe once every 10 years you will see some changes, but that's it.

This is one of such videos showing how to change oil for 2010-2016 Toyota Sienna minivan: https://www.youtube.com/watch?v=8jlpDcSeUz0

I'll list the basic steps again for the cars that I've worked on (I've done oil changes on both Toyota and Honda cars. The first time it took about an hour, but now it takes 30 minutes or less).

  1. Put the ramps on flat ground. It's better if you can find the edge of your garage to put the front of your ramps against. Since there's a small notch (a slightly higher surface separating the outside driveway from the inside of the garage), you can rest your ramps against that notch so that the ramps don't slide. Now drive your car onto those ramps. It looks scary the first time you drive up, because the car goes really high (or at least it seems that way). At this point you will notice that the ramps start sliding; this is where the notch helps you and keeps those ramps from sliding. Make sure that the car's front wheels are on the flat surface of the ramps.
  2. Open the hood of the car, and take out the "oil dipstick". Check the oil level, and leave the dipstick on the side, or put it back. We'll use it to check the level of oil once we fill in the new oil.
  3. Once the car is resting in a stable state on the ramps, get underneath the car from the front. You will have to slide your body maybe 2 feet inside to get to the nut that holds the oil in.
  4. Get your oil drain pan under the nut that you are going to unscrew. Oil is going to drain fast as soon as you unscrew the nut. Unscrew it slowly, so that you have enough time to move the drain pan to the right spot to collect the oil from the car.
  5. Now you need to get out the oil filter that is sitting close to this nut. This oil filter is tricky, as sometimes you need a special tool to open it. I had to use the "oil filter wrench" for my toyota sienna. Some leftover oil will gush out from here too.
  6. Let the car sit for 5 min or so to let all the oil come out. Once done, replace the old oil filter with the new oil filter, and put the filter back in place using the wrench. Now screw the nut back exactly to the point it was at before. If you don't screw it in enough, you will see drops of oil dripping when the car is standing overnight. If it's too tight, it may not unscrew at all the next time you do an oil change. That's even more painful, since that may require a mechanic shop visit to unscrew it, or worse, break the nut, which will cost you a bit. As a guide, I mark a point on the nut against the body of the car before I unscrew the nut. Then when I put it back, I screw it in until those marks align again. That way I know that it's tightened enough that it won't leak (as it wasn't leaking before).
  7. Now at the hood of the car, open the oil cap, from where you are going to fill in the oil. Put in a little bit of oil and check if it's leaking from the bottom, where you tightened the nut. If not, fill more. Keep filling until you are slightly below the spec. If it says 6.4 quarts, you go to 5 quarts and then check the oil level. Of course the car is tilted, so the oil levels will not be accurate. Close the oil cap, and put the dipstick back where it came from.
  8. Now start the car, and bring it back to the ground.
    1. IMP: Do not drive the car before filling in the new oil. Since you have drained out the old oil, the engine is w/o oil, and driving may damage your car seriously.
  9. Put in some more oil now, and keep checking the oil level via the dipstick. The dipstick is not always accurate, so you have to rely on the spec. Make sure you fill slightly under the spec, i.e. if it says 6.4 quarts, go up to 6.2 quarts and monitor the car for a day. Check the dipstick and put in some more oil if needed. Never overfill, else we'll need to drain some from the bottom by loosening the nut a little. That is too much work, so play it safe.
  10. You should keep monitoring oil levels in general, and if it goes below the min, you should fill it with extra oil from the top. That shouldn't happen though, so it might indicate a leak or something else wrong with the engine. If it's over the max, then you need to see how much over. A little bit over is fine, but if it's too much, then drain out some oil from the bottom.

At this point you are done with the oil change. Collect your used oil, and take it to any car repair shop. They have to take your used oil and dispose of it in a lawful way. They can't refuse to take it. Or go to a walmart auto shop, which might be a lot easier. Keep lots of rags and papers handy, as oil may get on your hands, the floor, the inside of the car, etc. Congrats on a job well done !!

solid state devices:

Before we get into VLSI and running various tools, it's important to get a basic idea of the transistors on which all of the VLSI material is based. Knowing about transistors is not really needed for doing VLSI work, but it will help you get a more wholesome picture of how things work.

The transistor is the basic fundamental element for making any circuit. We had resistors, capacitors and inductors, but they were all considered passive elements. When transistors were invented, they solved the problem of making automatic switches - switches that you can turn on or off by applying different voltages. These were called active elements and brought a paradigm shift in making circuits.

 

Modern transistors are built in silicon, and come in 2 flavors: NMOS and PMOS. Together this is called CMOS (Complementary Metal Oxide Semiconductor) technology.

A transistor can be in the off state or the on state. In the on state, there are 2 separate regions of operation:

1. linear region:

Linear current of a transistor is defined via following eqn:

Ids(lin) = µ*Cox*(W/L)*[(Vgs-VTH)*Vds - Vds^2/2]

2. saturation region:

Saturation current of a transistor is defined via following eqn:

Ids(sat) = µ*Cox*(W/L)*(Vgs-VTH)^2/2
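
As a quick sanity check of the two eqns, here's a toy evaluation in Python (all device values below are made up for illustration):

# hypothetical device parameters
u_cox = 200e-6       # µ*Cox in A/V^2
w_l   = 10.0         # W/L ratio
Vgs, Vth = 0.9, 0.4

Ids_lin = u_cox * w_l * ((Vgs - Vth) * 0.05 - 0.05**2 / 2)  # linear region, Vds=0.05 < Vgs-Vth
Ids_sat = u_cox * w_l * (Vgs - Vth)**2 / 2                  # saturation region, Vds >= Vgs-Vth
print(Ids_lin, Ids_sat)   # ~47.5uA and 250uA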

 

 

Below is the map for various attractions in Dallas TX, USA:

[Embedded map: Dallas, TX]

In terms of attractions, there is not much in Dallas that will specifically make you want to plan a trip.