Restaurants:

This section covers restaurant chains where you can get food at a decent price. As of 2022, prices are going up at all restaurants, so pricing may be outdated. Below are some of the chains that provide value for your money. If you are looking for fast food options, please check the "fast food" section.

 

Olive Garden:

This is a chain with good Italian food for a very reasonable price. It's not fast food; it feels like a high end restaurant, with meals served at the table. All their entrees include unlimited soup, salad and bread sticks, which by itself is worth $5 or more. Their lunch entrees are $8 and regular entrees are $15. Quantity is good enough for 2 people. Many times you can find their gift cards on sale ($40 for a $50 GC).

A few vegetarian dishes here that I like:

  1. Eggplant Parmigiana => This is a favorite among veggie Indians. Breaded eggplant topped with marinara and cheese; tastes nice.
  2. Stuffed Fettuccine Alfredo => This is a nice option. It's stuffed with cheese.
  3. Five Cheese Ziti al Forno => I don't remember how it tasted. Will update later.

Here's a link from slickdeals with various options to try: https://daily.slickdeals.net/food/olive-garden-special-meal-deals/

  • Monday-Friday lunch specials for $8-$10. Includes unlimited soup and salad.
  • With every dine-in entree, you can take home an extra entree for $5 more. There are 3 options for the take-home entree: Fettuccine Alfredo, Five Cheese Ziti al Forno and Spaghetti with Meat Sauce. Lunch specials don't qualify for the take-home.

DEALS:

 

All gift card deals for fast food are in the gift card section. Consider buying those GCs where possible and then use them on these deals.

 

2023:


09/27/23: Olive Garden - Unlimited Pasta for $14 - limited time:

Good offer that comes once in a while: https://slickdeals.net/f/16947016-olive-garden-never-ending-pasta-bowl-w-soup-salad-and-breadsticks-14-dine-in-only-at-participating-locations

 

Semiconductor Memory:

The processor and memory are the 2 most important components of any digital chip. Just as transistors are used to build logic functionality on a chip (such as AND and OR gates to build an adder, etc.), the same transistors are used to serve as memory to store bits. Memory can store bit 0 (voltage = 0 Volts) and bit 1 (voltage = VDD Volts).

There are 2 kinds of memory:

1. Volatile Memory:

These are the memories that lose their contents when power is turned off. In your laptop, you have a hard drive, which is non-volatile memory: it keeps its contents even when power is turned off. The CPU transfers programs from the hard drive to volatile memory, and accesses them from there. That makes the programs run faster, as there is significantly lower delay accessing contents from this faster volatile memory. There are 2 kinds of volatile memory:

A. SRAM (static random access memory):  This is usually seen on a processor, integrated with other logic. Any circuit that has 2 back to back inverters can serve as a memory, so we could use flops, latches, etc. to serve as memory. However, flops and latches need a large number of gates (usually 8 or more), which is very costly in terms of area. Early on, engineers started making custom versions of these latches so that they could be packed closer together and need fewer transistors. They came up with the idea of using 6 transistors to make a memory cell (very similar to a latch but with fewer transistors). They also reduced the size of the transistors, and optimized the layout and decode logic, to start building compact memory modules. This memory is called 6T SRAM and is used in all logic chips to make register files, caches, etc. These memories are fast, but costly in terms of area, so they are usually limited in size to 64MB or so. They are used in caches and other memories on microprocessors. They are not sold as standalone memories.

B. DRAM (dynamic random access memory):  This is the memory that is usually built and sold separately. It's not integrated into the processor, but sits right next to it. It is slower than SRAM, as it sits further away and has a lot more cells packed in. However, it requires only 1 transistor and 1 capacitor per memory cell, which makes it much smaller than SRAM. In the absence of back to back inverters, though, there is no feedback loop to hold the bit value at 0 or 1, so periodic refreshing of the value is needed, which slows DRAM further. Since it needs to be refreshed periodically, it's called dynamic. All the memory that you hear about in news, journals, etc. is this DRAM. This is the memory used on the external memory modules that you buy from BestBuy, Amazon, etc. (known as DDR memory cards). They can go as large as 128GB or more (Samsung has already reported 512GB DRAM memory modules). DRAM started out as SDR DRAM, and then moved to DDR style DRAM. DRAM is also known as SDRAM (synchronous DRAM), as all signals are driven synchronous to a clock. NOTE: SDRAM and SDR DRAM refer to 2 different things.

 

Next we look at each of these memories in detail:

A. SDR

B. DDR

DDR1

DDR2

DDR3

DDR4

DDR5

LP DDR

 

Foundations of CNN - Course 4 week 1

This course goes over the basics of CNN. Detecting edges is the basic motivation behind CNN. In any picture, we want to detect horizontal and vertical edges, so that we can identify boundaries of different things in the picture.

We construct a filter (or a kernel) with some dimension, and then convolve it with a picture to get an output. The convolution operator is denoted by an asterisk (*), which is the same symbol used for multiplication. This causes confusion, but that's the notation used for convolution in Digital Signal Processing, so we use the same operator here. In Python, the function "conv_forward" does convolution, while in TF, tf.nn.conv2d does the job.

Convolution just applies the convolution operation for a given filter on all parts of the picture, one part at a time. When convolving, we multiply each entry of the filter element-wise with the corresponding entry of the picture, and sum them up to get a single number. See the example explained in lecture.

ex: A 6x6 matrix convolved with a 3x3 matrix gives 4x4 matrix.
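The sliding-window operation above can be sketched in plain numpy (a minimal sketch, assuming stride 1 and no padding; note that deep learning frameworks actually compute cross-correlation, i.e. they don't flip the kernel, and that's what is done here; the image and kernel values are made up):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    sum the element-wise products at each position."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # a toy 6x6 "picture"
kernel = np.ones((3, 3))                          # a toy 3x3 filter
print(conv2d(image, kernel).shape)                # (4, 4), matching n-f+1 = 6-3+1 = 4
```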

Edge Detectors:

An example of a vertical edge detector would be a 3x3 filter with the 1st column all 1s, the 2nd column all 0s, and the 3rd column all -1s. This detects edges if we associate +ve numbers with whiteness, -ve numbers with darkness, and 0 being in b/w white and black (i.e. gray). We can also make a horizontal detector by switching rows with columns, i.e. 1st row all 1s, 2nd row all 0s, and 3rd row all -1s.
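As a quick sanity check, the vertical edge detector above can be tried on a toy 6x6 picture whose left half is bright and right half is dark (the pixel values 10 and 0 are made up for illustration):

```python
import numpy as np

# Left half bright (10s), right half dark (0s): a vertical edge down the middle.
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])

# Vertical edge detector: columns of 1, 0, -1.
v_filter = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]], dtype=float)

# Convolve (stride 1, no padding): the 4x4 output lights up where the edge is.
out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * v_filter)
print(out[0])  # [ 0. 30. 30.  0.] -> large values exactly at the edge columns
```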

Instead of hard coding these 9 values in an edge detector filter, we can define them as 9 parameters, w1 to w9, and let the NN pick the most optimal numbers. Back propagation is used to learn these 9 parameters. This gives the best results.

Padding and Striding:

Valid Conv: Here no padding is used, so the o/p matrix shape is not the same as the i/p matrix shape.

An n x n picture convolved with an f x f filter gives a matrix with dimension (n-f+1) x (n-f+1). That's why a 6x6 matrix convolved with a 3x3 filter gave a 4x4 o/p (as n=6, f=3, so o/p = 6-3+1 = 4).

Same conv: To keep the dimension of the o/p the same as that of the i/p pic, we can use padding, where we pad the picture with extra pixels on its border. This involves adding rows or cols of 0 (or occasionally some other value). We choose the padding number p such that the o/p matrix dim remains the same as that of the i/p pic.

With padding p, an n x n picture (padded with p pixels on each side of its border) convolved with an f x f filter gives a matrix with dimension (n+2p-f+1) x (n+2p-f+1). That's why a 6x6 matrix (with p=1) convolved with a 3x3 filter gives a 6x6 o/p (as n=6, p=1, f=3, so o/p = 6+2-3+1 = 6). So the o/p matrix retains the same shape as the i/p matrix.

For any general shape of the i/p matrix, we have to choose p such that the o/p matrix shape is the same as the i/p matrix shape. For that to happen, n+2p-f+1 = n => p=(f-1)/2. So, for a filter of size 3, we have to choose p=(3-1)/2=1.

With padding, we increase the size of the o/p matrix. Striding does the opposite: it reduces the size of the o/p matrix. Striding is where we jump by more than 1 when calculating the conv for adjoining boxes. So far, we used a stride of 1 for all our convs, but we could have used any stride number such as 2, 3, etc. We do this striding or skipping in both horizontal and vertical directions.

With stride s, an n x n picture (padded with p pixels on each side of its border) convolved with an f x f filter gives a matrix with dimension floor((n+2p-f)/s + 1) x floor((n+2p-f)/s + 1). We use the floor function in case the numbers don't divide to give an integer.
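The dimension formula above can be wrapped in a small helper to check the examples from this section (the 7x7/stride-2 case is an extra made-up example):

```python
from math import floor

def conv_out_dim(n, f, p=0, s=1):
    """Output size of an n x n input convolved with an f x f filter,
    with padding p and stride s: floor((n + 2p - f)/s + 1)."""
    return floor((n + 2 * p - f) / s) + 1

print(conv_out_dim(6, 3))             # valid conv: 6-3+1 = 4
print(conv_out_dim(6, 3, p=1))        # same conv with p=(f-1)/2: 6
print(conv_out_dim(7, 3, p=0, s=2))   # stride 2: floor((7-3)/2)+1 = 3
```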

By using padding and striding together, we can do "same conv".

Convolution over Volume:

So far we have been doing conv over a 2D matrix. We can extend this concept to do conv over a volume (i.e. a 3D matrix). In such a case, the i/p matrix is 3D (where the 3rd dimension is for the channel, i.e. each 2D matrix is for a separate color R, G, B). The filter is also 3D. The o/p matrix in such a case is still 2D with the same dim as before, (n-f+1) x (n-f+1) (assuming p=0 and s=1).

Conv over a volume is the same as over an area: multiplication and addition is done over all elements including the 3rd dim. So, the o/p returned for each conv operation is still a single value for one given box.

However, if we have more than 1 filter for conv operation (i.e one filter is for vertical edge detection, while other filter is for horizontal edge detection, and so on), then the o/p matrix becomes a 3D matrix.

For N filters being applied on i/p pic with dim n x n x nc and filter with dim f x f x nc , the o/p matrix shape would be (n-f+1) x (n-f+1) x N.

Note that nc which is the number of channels in the i/p has to be the same for the filter.

Ex: An i/p pic of 6x6x3 conv with 2 filters of shape 3x3x3 gives o/p matrix of shape 4x4x2 (since n=6, f=3, nc=3 and N=2)
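A small numpy sketch of conv over volume with multiple filters (shapes are the ones from the example above; the loop implementation and random values are for illustrating shapes, not speed):

```python
import numpy as np

def conv_volume(image, filters):
    """image: (n, n, nc); filters: (N, f, f, nc).
    Each conv position sums over all nc channels, giving one number per
    filter, so the output is (n-f+1, n-f+1, N)."""
    n, _, nc = image.shape
    N, f = filters.shape[0], filters.shape[1]
    out = np.zeros((n - f + 1, n - f + 1, N))
    for k in range(N):
        for i in range(n - f + 1):
            for j in range(n - f + 1):
                out[i, j, k] = np.sum(image[i:i+f, j:j+f, :] * filters[k])
    return out

image = np.random.rand(6, 6, 3)        # 6x6 RGB picture (nc=3)
filters = np.random.rand(2, 3, 3, 3)   # N=2 filters, each 3x3x3
print(conv_volume(image, filters).shape)  # (4, 4, 2)
```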

1 Layer of CNN:

For CNN also, we have multiple layers as in Deep NN. In Deep NN, for each layer, we compute activation func a[l]=g(z[l]) where g is the function used for that layer and z[l] = w[l] *a[l-1] + b[l] (* here means matrix multiplication).

In CNN, for each layer, we compute a convolution instead of a matrix multiplication. So, for i/p layer a[0], z[1] = w[1]*a[0] + b[1], where w[1] is the filter matrix, and b[1] is the offset added as before. Here the asterisk * refers to the convolution operation. Then we use an activation function such as ReLU, sigmoid, etc. to compute the o/p matrix a[1]=g(z[1]). This is true even if we have more than 1 filter; our weight matrix will just have one extra dim for the number of filters.

In general for each layer "l" , we have following relation:

f[l] = filter size

p[l] = padding size

s[l] = stride size

nc[l] = number of filters. Each filter is of dim f[l] x f[l] x nc[l-1] 

dim for the "l"th layer's i/p a[l-1] = nh[l-1] x nw[l-1] x nc[l-1], where nh = number of pixels across the height of the pic, nw = number of pixels across the width, and nc = number of channels (for RGB, we have 3 channels).

dim for the "l"th layer's o/p a[l] = nh[l] x nw[l] x nc[l], where nh[l] = floor( (nh[l-1] + 2p[l] - f[l])/s[l] + 1 ), nw[l] = floor( (nw[l-1] + 2p[l] - f[l])/s[l] + 1 )

For m examples, A[l] = m x nh[l] x nw[l] x nc[l]

dim of weight matrix w[l] = f[l]  x f[l]  x nc[l-1] x nc[l], where nc[l], is the number of filters in layer "l"

dim of bias matrix b[l] = 1 x 1 x 1 x nc[l] => bias is a single number for each filter, so for nc[l] filters, we have nc[l] parameters.
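The shape relations above can be collected into one small helper (the 39x39x3 input and the other numbers below are made up purely to exercise the formulas):

```python
from math import floor

def conv_layer_dims(nh, nw, nc_prev, f, p, s, nc):
    """Shapes for layer l: input is nh x nw x nc_prev, and we apply
    nc filters of size f x f x nc_prev with padding p and stride s."""
    nh_out = floor((nh + 2 * p - f) / s) + 1
    nw_out = floor((nw + 2 * p - f) / s) + 1
    return {
        "a_out": (nh_out, nw_out, nc),   # output activation a[l]
        "W":     (f, f, nc_prev, nc),    # one f x f x nc_prev filter per output channel
        "b":     (1, 1, 1, nc),          # one bias number per filter
    }

dims = conv_layer_dims(nh=39, nw=39, nc_prev=3, f=3, p=0, s=1, nc=10)
print(dims["a_out"])  # (37, 37, 10), since (39 + 0 - 3)/1 + 1 = 37
```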

Example of Conv NN: provided in lecture

3 Types of layers in a conv NN: Just using convolution layers may suffice to give us good results, but in practice, supplementing CONV layers with POOL layers and FC layers gives better results.

  1. convolution layer (CONV): This is about using the convolution operator.
  2. Pooling layer (POOL): This is about using max or avg of a subset of matrix, so as to reduce the size of matrix.
  3. Fully connected layer (FC): This is similar to conventional NN, where we connect each i/p entry to each o/p entry which results in a lot of weights being used. But since we use the FC feature in the last few stages of the NN, the size of matrix is greatly reduced by that time, resulting in fewer entries in weight matrix.
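Since POOL layers reduce the matrix size as described above, here is a minimal max-pooling sketch (f=2 and s=2 are a common choice; the input values are made up):

```python
import numpy as np

def max_pool(image, f=2, s=2):
    """Max pooling: take the max of each f x f window, moving with stride s.
    A 4x4 input with f=2, s=2 shrinks to 2x2."""
    n = image.shape[0]
    out_n = (n - f) // s + 1
    out = np.zeros((out_n, out_n))
    for i in range(out_n):
        for j in range(out_n):
            out[i, j] = image[i*s:i*s+f, j*s:j*s+f].max()
    return out

image = np.array([[1., 3., 2., 1.],
                  [4., 6., 5., 2.],
                  [3., 2., 1., 0.],
                  [1., 2., 3., 4.]])
print(max_pool(image))  # [[6. 5.]
                        #  [3. 4.]]
```

Average pooling works the same way with .mean() in place of .max().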

Reasons for using Conv NN: (see in last lecture)

  1. Parameter sharing: Same conv filter can be used at multiple places in the image
  2. Sparsity of connections: Not every i/p needs to be connected to every o/p, since most of the o/p values only depend on a subset of the i/p matrix.

 

 Finding optimal values of Weights:

We use the same technique of gradient descent to find the lowest value of the cost across different weight matrices and filters. The derivation is not shown in the programming assignment, but look in my hand written notes.

 

 Assignment 1:

 

Assignment 2:


Course 4 - Convolutional Neural Networks

This course is one of the most important courses on NN, as the convolutional NN (CNN) is what is used in most places in computer vision. Computer vision is the discipline of CS where we extract features from any given picture or video, i.e. in autonomous cars, it's the art of extracting features from a picture, such as other cars, pedestrians, etc. CNN has been very successful in computer vision.

There are 4 sections in this course:

1. Foundations of CNN: This is a lengthy section (2 hrs) as it talks about the basics of CNN (conv layers and pooling layers). It explains this with some basic examples. Spend some time on this section, and go over it again so that you get the basics. There are 2 programming assignments.

2. Deep Convolutional models: case studies: It has 1 pgm assgn on Residual Networks (ResNets)

3. Object detection: It has 1 pgm assgn on "car detection with YOLO"

4. Special applications: Facial recognition and neural style transfer: This has 2 pgm assgn. 1st one is "art generation with neural style transfer", while 2nd one is on "facial recognition"

Course 3 - Structuring ML projects

This course is a slight departure from the technical discussion of NN. It discusses several techniques for dealing with ML projects. It has no pgm assignments and only 2 sections. Both sections are theoretical, and can be finished in 3-4 hours. Even if you skip this course altogether, you won't miss much. I've summarized the lectures below:

1. ML strategy 1: This talks about following:

A. Orthogonalization: This refers to choosing orthogonal knobs to control or improve certain aspects of your NN.

B. Single number evaluation metric: We should use a single metric to evaluate and compare performance across different ML algo

C. Satisficing and Optimizing metric: Out of the different metrics we use to evaluate our algo, some may be classified as "satisficing" metrics, where you just need to satisfy them (i.e. the performance needs to meet a certain threshold for those metrics). Other metrics may be classified as "optimizing" metrics, which we really want our algo to optimize for.

D. Distribution: The distribution of data in the train set, dev set and test set should be similar; otherwise our algo may perform badly on sets whose data is vastly different from the training data.

E. Size of train/dev/test set: In the big data era, where we have millions of training examples, we usually divide the available data into 98% training data, 1% dev data and 1% test data. We can do this since even 1% is 10K data points, which is large enough to work as a dev/test set.

F. Weights: Sometimes we may want to assign different weights to different loss terms, i.e. there may be cases where we want to assign a much larger weight to the loss term where an elephant pic is identified as a cat pic, but a much lower weight to the loss term if a bobcat is identified as a cat. This can be done by multiplying each loss term with its weight term and then summing the products. To normalize the sum, we then divide it by the sum of the weights (instead of dividing by the number of examples). These weights are different from the weights we optimize in our loss.
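A tiny numpy sketch of this weighted loss (all the loss and weight values are hypothetical; note the division by the sum of the weights rather than by the number of examples):

```python
import numpy as np

# Hypothetical per-example losses; the 2nd example is the costly mistake
# (e.g. an elephant labeled as a cat), so it gets a much larger weight.
losses  = np.array([0.2, 0.9, 0.1, 0.5])
weights = np.array([1.0, 10.0, 1.0, 1.0])

# Weighted cost: multiply each loss by its weight, sum the products,
# then normalize by the sum of the weights.
weighted_cost = np.sum(weights * losses) / np.sum(weights)

# Compare with the plain (unweighted) mean over the 4 examples.
plain_cost = np.mean(losses)
print(weighted_cost, plain_cost)
```

The costly mistake dominates the weighted cost, which is what pushes the optimizer to fix that case first.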

G. Human level performance: All ML algo strive to reach human level performance. Bayes error is the lowest error that you can get and for Computer vision, human error is pretty close to Bayes error. So, once you get your ML algo to get to human level performance, you are pretty close to lowest error that's possible. It's very hard to get even incremental improvements to error once you reach human level.

The difference b/w human error and training set error is called "avoidable bias", as that error gap can be brought close to 0. The gap b/w training error and dev/test set error is called variance. Both "avoidable bias" and "variance" may be a problem for our ML project, so we have to be careful about which one to target more to get the lowest error on our dev/test set. "Avoidable bias" can be reduced by choosing a larger training model (deeper NN), or using better algos such as Momentum, RMS prop, Adam, etc. To reduce "variance", we can use a larger training set, or use regularization techniques such as L2, dropout, etc.

2. ML strategy 2: This talks about following:

A. Analyzing error: It's important to analyze your errors, i.e. all the cat pics that were misclassified. Once we start categorizing these errors into different buckets, we can start seeing exactly where our ML is not working as expected. Sometimes the o/p label itself is incorrect (i.e. a cat pic is incorrectly labeled as a "non cat" pic). This may or may not be worth fixing, depending on how severe the issue is. We also have to make sure that our training data and test/dev data come from the same distribution, else there will be a lot of variance. One way to find out if variance is due to mismatched data b/w the training and dev sets is to carve out a small percentage of the training data as a train-dev set, and not use this portion for training but use it as a dev set. If the variance is small on this train-dev set, but large on the dev set, then that indicates a mismatch b/w the train data and the dev/test data. To address data mismatch, one other solution is to include as much varied data as possible in the training set, so that the ML system is able to optimize across all such data.

B. Build system quickly and then iterate: It's always better to build a barely working system quickly, and then iterate a lot to fine tune the ML system to reduce errors.

C. Transfer learning: This is where we use a model developed for one ML project in some other project with minimal changes. This is usually employed in cases where we have very little training data to train our ML algo, so we use parameters developed for some other ML project, and just change the o/p layer parameters, or the parameters for the last couple of layers. This allows us to get very good performance. For ex, in radiology image diagnosis, a NN developed for image recognition may be used, since both applications are similar.

D. Multi task learning: This is where we use same model to do multiple things instead of doing one thing. An ex is autonomous car, where the image recognition model needs to identify images of cars, pedestrians, stop signs, etc all at same time. Instead of building separate NN for each of them, we can build a single NN with many different o/p values, where each o/p value is for a specific task as other car, pedestrian, stop sign, etc.

E. End to End Deep Learning: This is where a NN can take an i/p and give an o/p w/o requiring intermediate steps to do more processing. As an ex, translating an audio clip to text traditionally required many steps of complex pipelining to get it to work. But with large amounts of big data, a deep NN just learns from the i/p data to produce the translation w/o requiring any intermediate pipeline. Sometimes we do divide the task into 2-3 intermediate steps before we implement DL on it, as that performs better. We have real life examples of both kinds: cases where end to end DL works better, as well as cases where breaking it down into a few smaller steps works better.