Deep Learning

Most of the material here is from Andrew Ng's AI course on coursera.org, called the Deep Learning Specialization. Even though it's called Deep Learning, it starts with basic concepts of AI, and then moves to ML, ANN and finally to CNN. It consists of 5 courses, as outlined in the link below:

https://www.coursera.org/specializations/deep-learning

The 5 courses in deep learning specialization are as follows:

Course 1: Neural Networks and Deep Learning: => Has 4 weeks worth of material, requiring about 20 hrs to complete.

Course 2: Hyperparameter tuning, Regularization and Optimization => Has 3 weeks worth of material, requiring about 18 hrs to complete.

Course 3: Structuring ML projects => Has 2 weeks worth of material, requiring about 5 hrs to complete.

Course 4: Convolutional Neural Networks (CNN) => Has 4 weeks worth of material, requiring about 20 hrs to complete. CNN is the most popular type of NN for image related tasks.

Course 5: Sequence models => Has 3 weeks worth of material, requiring about 15 hrs to complete.

More AI related research and info is available on https://www.deeplearning.ai/

Before we go into the course work, we have to get prepared for doing exercises in Python. Without doing exercises and playing around, you will never get a feel of AI. AI is a very vast field, and we will never be able to learn even a little fraction of it, but whatever we learn, we should make sure we learn the basics well.

Installation of Python and various modules:

Below are some of the programs you will need to install on your computer before you can do any exercises from Coursera. Of course, Coursera provides Jupyter Notebook for you to work in (Jupyter Notebook migrated to the Coursera Lab environment starting Sept, 2020). Jupyter Notebook is an app that allows you to run Python and many other programming languages from within it. However, working only in the hosted notebook, you may not be able to understand all the bits and pieces of how things are working. Also, as these keep changing on the Coursera website (i.e. you are at the mercy of Coursera on how long they continue with which app), I highly encourage you to install Python and the other needed modules on your local machine (running any Linux OS; I'm running it on CentOS 7), and do all the programming exercises locally. It'll be much more fun. I'm doing it locally myself, so I will post all needed info below.

Python: Visit the section under Python programming. Install Python3 (python 3.6 as of July, 2020) as detailed in that section, and then install these other modules.

NumPy: Install NumPy package for python as detailed there, and go thru the basic tutorial

H5py: Install H5py package for python which is used to read data files in HDF5 format. This format is used for our exercises to store large amounts of data.

matplotlib: Install matplotlib module for python and go thru tutorials as explained in that section

PIL: Install Pillow module for python, which we'll use widely for reading images, as compared to matplotlib and scipy. See PIL/Pillow section.
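Once these are installed, a quick way to confirm that Python can actually find them is a check like the one below. This is only a minimal sketch: the import names (numpy, h5py, matplotlib, PIL) are the usual ones for these packages, while the helper name check_modules is made up here for illustration.

```python
import importlib.util

# Usual import names for the packages listed above (Pillow is imported as "PIL").
REQUIRED = ["numpy", "h5py", "matplotlib", "PIL"]

def check_modules(names):
    """Return {module name: True if Python can locate it, else False}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_modules(REQUIRED).items():
        print(f"{name}: {'OK' if found else 'MISSING'}")
```

Any module reported as MISSING needs to be installed before the exercises that import it will run.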

Downloading various local functions and datasets:

There are many functions and datasets that you will see being imported in Python programs on Coursera. You do not see them on the main notebook page. One way to see all the files being used in a Python program is to go to Jupyter Notebook, and click on File->Open on the top of the page. This will take you to a new page, which will show all the folders and files for that programming assignment. There is a "download" button on top, so download all the files that you need (one file at a time; if you try to download multiple files, the download button disappears).


 

 

Machine Learning (ML):

ML is a particular subset of AI. ML itself has many branches. One particular algorithmic approach of machine learning is called artificial neural networks (ANN), which is loosely based on how human brains work. In this section, we'll learn very basic concepts of ML.

 

 

 

Maths:

This section deals with basic maths.

I've divided the Maths section into multiple parts. I'll mostly be discussing the maths curriculum as it's organized in USA schools and universities. USA schools have grades from Kindergarten (KG), then Grade 1 (i.e. class 1 in India) through Grade 12 (i.e. class 12 in India). After grade 12, you go to a college of your choice.

Best place to learn school level Maths is from www.khanacademy.org

It's free as it's non profit, and has videos which are very easy for kids to understand. It's one of the best resources for any educative material.

Then there is this wonderful fun website to learn everything related to Maths: https://www.mathsisfun.com/

One more website with lots of free sample exercise and theory is: https://www.math-only-math.com/

Other website similar to math-only-math with lots of practice questions is: https://www.mathopolis.com/questions/course.php

Interesting Maths Questions:

This is a list of very elegant Maths questions, the kind which don't require anything more than elementary or basic high school Maths. These are the type asked in Maths Olympiads:

  • Finding radius of a semicircle inside a right angled triangle: https://www.youtube.com/watch?v=_o79ngJ0TI4
    • Soln: on link above
  • Find the number of distinct pairs of integers (x,y) such that 0<x<y and √1984 = √x + √y.
    • Soln: 1984=16*4*31. So, √1984 = 8*√31. So, 8*√31 = √x + √y => 8*√31 - √x = √y => sq both sides => 1984 + x - 16√(31*x) = y => 16√(31*x) = 1984 + x - y => RHS is an integer as all numbers are integers, so LHS also needs to be an integer. That means (31*x) needs to be a perfect square => x=31*a^2 where a is an integer. By the same argument y=31*b^2, and then a+b=8. Possible values of a=1,2,...,7.
    • x=31,31*4,31*9,31*16,31*25,31*36,31*49 => √x=1*√31,2*√31,3*√31,4*√31,5*√31,6*√31,7*√31
    • y=31*49,31*36,31*25,31*16,31*9,31*4,31 =>  √y=7*√31,6*√31,5*√31,4*√31,3*√31,2*√31,1*√31
    • Since we need x<y, only a=1,2,3 qualify. So there are 3 such pairs: (31,1519), (124,1116) and (279,775).
  • This appeared as Problem 25 in AMC 8, 2015. I've generalized it as follows: Given a square of side length n, one cuts squares of side length m from each corner. What's the area of the largest square that can fit in the remaining area?
    • There are various ways to solve it, and the question is not hard. However, there is one solution which is a one line solution and really smart. See link:
    • Solution 2 (Contest Soln) is the smartest way to solve it. Area will be n*(n-2*m). To convince ourselves, we can also solve it the other way, where we find the ratio of sides of the 2 similar triangles that will form the side of the new square that can fit in. Once we find the ratios, it's easy to solve. But this will be a longer soln.
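
The √1984 question above can also be checked by brute force. The sketch below (the function name sqrt_pairs is made up for illustration) uses the same squaring step as the solution: y = n + x - 2√(n*x), so n*x must be a perfect square, and it keeps only the pairs with x<y.

```python
from math import isqrt

def sqrt_pairs(n):
    """Return all integer pairs (x, y) with 0 < x < y and sqrt(x) + sqrt(y) == sqrt(n)."""
    pairs = []
    for x in range(1, n):
        # sqrt(y) = sqrt(n) - sqrt(x)  =>  y = n + x - 2*sqrt(n*x),
        # which is an integer only if n*x is a perfect square.
        r = isqrt(n * x)
        if r * r != n * x:
            continue
        y = n + x - 2 * r
        if x < y:
            pairs.append((x, y))
    return pairs

print(sqrt_pairs(1984))  # [(31, 1519), (124, 1116), (279, 775)]
```

Using integer arithmetic (isqrt) instead of floating point square roots avoids any rounding doubts when testing whether a number is a perfect square.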

 

 

Interesting Maths Tricks:

Some of these tricks will look like magic even to sophisticated Maths folks. A lot of them here: https://puzzling.stackexchange.com

  • A five card trick (aka Fitch Cheney trick): Alice and Bob perform a magic trick as a team. Alice shuffles a pack of cards, and then asks someone from the audience (Charlie) to pick 5 cards out of this pack. Charlie looks at the 5 cards and returns them to Alice. Alice hands over 4 cards to Bob, and 1 card back to Charlie. Bob looks at the 4 cards, looks at Charlie, and is then able to tell which card Charlie is holding. This looks impossible, as there are 52 random cards, and figuring out 1 of the remaining 48 based on 4 cards seems just insane. However, the trick is in arranging the 4 cards in a pattern, and then using that pattern to figure out the 5th card. To narrow down the choice to 1 unique card, Alice also uses a fixed algo to decide which card to hand back to Charlie. Together, these always identify a unique card. Details here: https://puzzling.stackexchange.com/questions/6569/a-five-card-trick-how-does-it-work
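
One well-known way to make the five card trick work is sketched below. It is an assumption that this matches the linked answer exactly; variants exist. It relies on two facts: among any 5 cards at least 2 share a suit, and the 3 leftover cards can be ordered in 3! = 6 ways. Alice shows one of the same-suit cards first (which gives the suit), and picks the hidden card so that it is 1 to 6 ranks ahead of the shown card going around cyclically (ace=1 ... king=13, then back to ace). The order of the other 3 cards encodes that number. The card representation and the names alice_encode and bob_decode are made up for illustration.

```python
from itertools import permutations

SUITS = "CDHS"  # clubs, diamonds, hearts, spades; ranks are 1 (ace) to 13 (king)

def card_key(card):
    """Agreed ordering of cards: by rank first, then by suit."""
    rank, suit = card
    return (rank, SUITS.index(suit))

def alice_encode(hand):
    """Given 5 distinct (rank, suit) cards, return (4 cards in order, hidden card)."""
    for i, a in enumerate(hand):
        for b in hand[i + 1:]:
            if a[1] != b[1]:
                continue
            # Of the two same-suit cards, hide the one that is 1..6 steps
            # ahead of the other on the 13-rank circle.
            if (b[0] - a[0]) % 13 <= 6:
                shown, hidden = a, b
            else:
                shown, hidden = b, a
            steps = (hidden[0] - shown[0]) % 13
            rest = sorted((c for c in hand if c not in (shown, hidden)), key=card_key)
            # The 6 orderings of the remaining 3 cards encode steps = 1..6.
            ordered = list(permutations(rest))[steps - 1]
            return [shown, *ordered], hidden
    raise ValueError("expected 5 distinct cards")

def bob_decode(four):
    """Recover the hidden card from the 4 cards in the order Alice handed them over."""
    shown, *others = four
    rest = sorted(others, key=card_key)
    steps = list(permutations(rest)).index(tuple(others)) + 1
    rank = (shown[0] - 1 + steps) % 13 + 1
    return (rank, shown[1])

hand = [(1, "S"), (13, "S"), (5, "H"), (9, "D"), (2, "C")]
four, hidden = alice_encode(hand)
print(four, hidden, bob_decode(four) == hidden)
```

The cyclic distance is what makes 6 orderings enough: for any two different ranks of the same suit, one of them is at most 6 steps ahead of the other.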

Biology:

This section deals with Biology, which is basically study of all forms of life. The first place to start learning about life is our own human body.

We'll learn about Human body as well as about forms of life. A good deal will be spent on human cells, as they are the power house and the building blocks of human body.

Then we'll learn about all human parts.

We'll also briefly cover other forms of life.

MCAT (Medical College Admission Test) =>  MCAT is a standardized test administered by the Association of American Medical Colleges (AAMC). This is to get into a medical school when graduating from a 4 year College. Students take it in their Junior or Senior year of college. The topics covered in MCAT are in general advanced concepts in Biology, Physics, Chemistry, etc and are a good starting point for anyone wanting to get into details, and not just superficial knowledge.

Khan Academy (KA) MCAT link => https://www.khanacademy.org/test-prep/mcat

Biology AP Course: Full AP Biology course taught in High Schools is again very nicely covered on Khan Academy. It has a lot of overlap with topics under MCAT, but the AP course is more basic than MCAT course. High Schools also offer "Biology Honors" which is more basic than AP course, and I would recommend to do AP Biology instead of Honors Biology to get better understanding.

KA AP Biology link=> https://www.khanacademy.org/science/ap-biology

I've also included notes from Biology High School Course. This is a Honors Biology course, so the material is much more basic.

  1. Basics of Life => talks about very basic concepts:  Unit 1 => Chemistry of Life
    1. Human body is mostly composed of O, C and H. By weight (approx): oxygen=65%, carbon=18%, hydrogen=10%, nitrogen=3%. In earth's crust, about 46% is oxygen and about 28% is silicon, with minuscule amounts of C, H and N. So, basically our body is made up of elements that are also found in earth's crust, though in very different proportions. It's just that the molecules they form from these are different in living beings than what is there in non living things.
    2. Water is a polar molecule (with oxygen being slightly -ve charged, and hydrogen being slightly +ve charged). So, water causes ionic compounds or polar solutes to dissociate by breaking their bonds. Most chemical reactions in our body take place in the watery environment in our cells.
  2. Biomolecules => These are biological molecules, or molecules that make life. Khan Academy video on biomolecules => https://www.youtube.com/watch?v=j5VA6YrqTNs
    1. Organic compounds are carbon containing compounds. A monomer is one small unit of such a compound, while polymers are many such units/molecules connected together. Macromolecules (macro means large) are even larger polymers made up of more parts. Biomolecules are the macromolecules of these organic compounds and are the ones that eventually make cells of living things. Biomolecules are divided into 4 groups and all 4 are required in all living organisms.
      1. Carbohydrates (carbs) => They contain C,H,O. There is 1 carbon atom for every water molecule (H2O). So, formula is Cn(H2O)n or CnH2nOn. This composition gives carbohydrates their name: they are made up of carbon (carbo-) plus water (-hydrate). Carbs are used by mitochondria in cells to generate immediate energy or can be stored for later use. Carbohydrate chains come in different lengths, and biologically important carbohydrates belong to three categories: monosaccharides, disaccharides, and polysaccharides. Saccharides are sugars (C, H, O in ring form). Mono, di and poly refer to 1, 2 or many of such rings connected to each other. Names of sugars end in -ose.
        1. Monosaccharides (mono- = “one”; sacchar- = “sugar”) are simple sugars, the most common of which is glucose. Monosaccharides typically contain three to seven carbon atoms.
          1. Common monosaccharides: One important monosaccharide is glucose, a six-carbon sugar with the formula C6H12O6. Other common monosaccharides include galactose (which forms part of lactose, the sugar found in milk) and fructose (found in fruit). All 3 are the simplest forms of carbs, and are isomers. Glucose and galactose are stereoisomers of each other (differing only in 3D orientation), while fructose is a structural isomer of glucose and galactose, meaning that its atoms are actually bonded together in a different order. Glucose is broken down into H2O and CO2 by taking O2 as additional input, and generating energy in the process (in form of ATP). Larger carbs need to be broken down into simple sugars like glucose before they can be used by the cell to generate energy.
        2. Disaccharides (di- = “two”) form when two monosaccharides join together via a dehydration reaction (see below). In this process, the hydroxyl group of one monosaccharide combines with the hydrogen of another, releasing a molecule of water and forming a covalent bond known as a glycosidic linkage.
          1. Common disaccharides: include lactose, maltose, and sucrose. Lactose is a disaccharide consisting of glucose and galactose and is found naturally in milk. Many people can't digest lactose as adults, resulting in lactose intolerance (which you or your friends may be all too familiar with). Maltose, or malt sugar, is a disaccharide made up of two glucose molecules. The most common disaccharide is sucrose (table sugar), which is made of glucose and fructose.
        3. Polysaccharide (poly- = “many”) is a long chain of monosaccharides linked by glycosidic bonds. Polysaccharides are used for storing energy as well as for providing structure.
          1. Common polysaccharides: Starch, glycogen, cellulose, and chitin are some major examples of polysaccharides important in living organisms.
            1. Starch: is the stored form of sugars in plants and is made up of a mixture of two polysaccharides, amylose and amylopectin (both polymers of glucose). Plants are able to synthesize glucose using light energy gathered in photosynthesis, and the excess glucose, beyond the plant’s immediate energy needs, is stored as starch in different plant parts, including roots and seeds. The starch in the seeds provides food for the embryo as it germinates and can also serve as a food source for humans and animals, who will break it down into glucose monomers using digestive enzymes. Amylose is unbranched, while amylopectin has a branched structure.
            2. Glycogen: is the storage form of glucose in humans and other vertebrates (as opposed to Starch which is stored energy in plants). Like starch, glycogen is a polymer of glucose monomers. Glycogen is usually stored in liver and muscle cells. Whenever blood glucose levels decrease, glycogen is broken down via hydrolysis to release glucose monomers that cells can absorb and use.
            3. Cellulose: is the structural form of glucose and is a major component of plant cell walls, which are rigid structures that enclose the cells (and help make lettuce and other veggies crunchy). Wood and paper are mostly made of cellulose, and cellulose itself is made up of unbranched chains of glucose monomers linked by glycosidic bonds (straight rigid structure). The β glycosidic linkages in cellulose can't be broken by human digestive enzymes, so humans are not able to digest cellulose. (That’s not to say that cellulose isn’t found in our diets, it just passes through us as undigested, insoluble fiber.) However, some herbivores, such as cows, koalas, buffalos, and horses, have specialized microbes that help them process cellulose. Cellulose is the most abundant organic material on earth (as it's found in all plants), and still we can't digest it !!
            4. Chitin: is similar to cellulose in that the chains form long rigid structures. It is found in the shells of some animals, the exoskeleton and wings of insects, etc. to provide rigidity.
      2. Lipids (Fats) => They also contain C,H,O (similar to carbs), but end up creating different compounds. They may sometimes contain P. Lipids are not strictly macromolecules as they are not polymers, and are much smaller than the other 3 biomolecules. Lipids are nonpolar and so insoluble in water. Lipids are a long term energy source, stored in the body. Ex: Triglyceride. Cell membranes are made of lipids. Lipids create a waxy covering for moisture retention. Lipids also provide insulation to keep the body warm. Lipids don't have true monomers, and instead are long chains of C and H. They mainly use fatty acids and glycerol to form long chains.
        1. Fatty Acids: These are long chain of C and H. They can be considered as a base unit which attach to others. Fatty acids can be of 2 types depending on carbon bond.
          1. Saturated Fatty acid: Carbon atoms with single bonds b/w them (single covalent bond with each other) form a straight chain and are called saturated.
          2. Unsaturated Fatty acid: Carbon atoms with double bonds b/w them cause a bend anywhere there is a double bond, and are called unsaturated. Because of the bends, these chains can't be nicely stacked on top of each other. That is why saturated fat (like butter) is solid at room temp, as its straight chains pack tightly, while unsaturated fat (like oil) is liquid at room temp.
        2. Triglyceride. It's a lipid which is long term energy source for animals. It's formed by dehydration synthesis of 1 glycerol with 3 fatty acids. Glycerol has formula C3H8O3.
        3. Phospholipid: It is used to make cell membrane. It consists of 2 fatty acids, 1 glycerol and 1 Phosphate group (PO4, -ve charged). Each phospholipid molecule has a head (made of Phosphate) and 2 tails (made of fatty acids). Heads have the Phosphate group and so are hydrophilic (water loving as polar). Tails are hydrophobic (water hating as fatty acids are non polar). The 2 tails are different: one is saturated forming a straight chain, while the other is unsaturated forming a bent chain. These phospholipid molecules go together in pairs with the tails facing each other and the heads exposed (one to the outside of cell and other to the inside of cell; both have watery environment so this helps as the heads are hydrophilic). This is called the phospholipid bilayer. The bent structure of one of the tails keeps the membrane fluid and flexible. More details under cells section.
        4. Steroid lipid: Cholesterol is a steroid lipid. Steroid hormones regulate functions in body (ex: testosterone).
      3. Proteins => They contain N in addition to C,H,O. They sometimes contain S too. Proteins provide structural support. They build up bones, muscles, hairs, nails, etc. They are also present in cell membrane as Transport proteins. The simplest units of proteins are amino acids, which are the monomers.
        1. Amino Acids (monomer): In amino acids, there's always a central carbon attached to hydrogen, which is called alpha carbon. On the left of this we have an amine group (NH2), while to the right, we have a carboxyl group (COOH). Another group attached to the bottom of alpha carbon is different for different amino acids, and is called R-group. There are 20 standard amino acids that our body uses to build proteins, 9 of which are "essential", meaning our body can't make them and they must come from food. Glycine is the simplest amino acid and has just a single H as the R group.
        2. Polypeptides (polymer): These amino acids are stitched together to form proteins by forming bonds called peptide bonds. The polymers formed are called polypeptides. These bonds are formed by "Dehydration synthesis", which involves removal of H from NH2 and OH from COOH. There are 4 levels of structure that proteins may have:
          1. Primary: The linear sequence formed by stitching of amino acids is called Primary structure. It may contain 100's of amino acids stitched together. By itself it's NOT a functional protein, as it needs to be folded into a definite structure to become a functional protein. Even a very small protein like insulin (51 amino acids, it regulates blood sugar) folds into a definite shape, with 2 short chains held together by disulfide bonds.
          2. Secondary: Primary st folds into local patterns, and this is called secondary structure. The shape comes from Hydrogen bonds formed between different amino acids in the backbone (between H of N-H and O of C=O), which help it get that shape and maintain it. There are 2 possible st here => the alpha helix (spiral structure) and the beta pleated sheet.
          3. Tertiary: The secondary structures fold further into an overall 3D shape called tertiary structure, which can be a functional Protein. Lots of interactions between the R-groups happen to make this functional protein. It has more bonds than just H bonds (ex: ionic bonds, disulfide bridges, hydrophobic interactions) to give this 3D shape of protein.
          4. Quaternary: If a protein has more than 1 polypeptide chain, the way they are arranged is its quaternary st. In collagen, 3 polypeptide chains combine, while in hemoglobin, 4 polypeptide chains combine to form a protein with quaternary structure.
        3. Enzymes: These are proteins that help speed up metabolic reactions. Enzymes lower the activation energy needed for a reaction. Enzyme names end in -ase, ex: lipase. Enzymes don't get used up. They act on the substrate only if the substrate fits the active site of enzyme perfectly. Active site is the area of an enzyme that binds to the substrate during the reaction. Each enzyme's active site has a shape that is specific to only that type of substrate, so that the enzyme can only act on that substrate and nothing else. This is like a key-lock mechanism, where there is only 1 key for a given lock, where enzyme is the lock and substrate is the key. There may be multiple active sites in a given enzyme, meaning it can act on multiple substrates, but each site is specific to only 1 type of substrate.
          1. Enzymes can either break substrate into parts (catabolic reaction), or it can make substrate combine into another molecule (anabolic reaction).
          2. Enzymes can be activated or inhibited. We saw how the key-lock mechanism allows enzymes to speed up reaction for a given substrate. However, another molecule (an inhibitor, such as some antibiotics) may bind to the active site instead. Then the original substrate on which this enzyme was supposed to act can no longer fit on the active site, and hence can no longer react. This inhibits the enzyme from working properly, and is called competitive inhibition. There may also be non competitive inhibition, where the inhibitor binds to some other site on the enzyme rather than the active site, and changes the enzyme so that it no longer works properly (ex: cyanide poison).
          3. Denaturing is the changing of shape of the enzyme (and so of its active site) due to temperature or pH of the medium. This may cause the substrate to no longer fit in the active site, thus rendering the enzyme useless. This change is often permanent. Each enzyme has its own optimal temperature and pH: many human enzymes are most effective near body temperature (about 37 degrees C) and near neutral pH, but pepsin in the stomach works best at around pH 2.
        4. Antibodies: Antibodies are proteins produced by the immune system to neutralize foreign invaders such as viruses. Antigens, which are often themselves proteins, are an integral part of any virus. Viruses use these antigens to attach to cells. Antibodies prevent this from happening by attaching to antigens, so that they are blocked and can't attach to cells anymore.
        5. Peptide hormones: are hormones made of proteins which act on the surface of target cells to send msg around the body. Ex: insulin.
      4. Nucleic acids => They contain P in addition to C,H,O,N. They are the most important ones as they instruct what happens inside our cells, and pass that hereditary info to offspring. They are considered the most fundamental macromolecules to life. These were first observed in the nucleus of cells, and are acidic in nature; that's where the name comes from. Khan academy link (intro to nucleic acids) => https://www.youtube.com/watch?v=hI4v7v8AdfI
        1. Nucleotides (monomer): Building blocks of Nucleic acids are called Nucleotides. Each consists of a 5 carbon sugar (pentose sugar) ring in the center, with a phosphate on the left and a nitrogenous base (NB) on the right.
        2. RNA (Ribonucleic acid): RNA molecules are thought to be the ones that evolved first, and are unstable. The Phosphate group in one nucleotide attaches to the carbon ring of another, forming long chains. There are 4 kinds of NB that we have for RNA:
          1. Guanine (G)
          2. Adenine (A)
          3. Cytosine (C)
          4. Uracil (U)
        3. DNA (Deoxyribonucleic acid): DNA molecules are thought to have evolved after RNA molecules. DNA is formed from 2 strands of nucleotides (with deoxyribose as the sugar), joined via hydrogen bonds between the bases. This provides stability. G always bonds with C via 3 hydrogen bonds, while A always bonds with T via 2 hydrogen bonds. The 4 kinds of NB in DNA are:
          1. Guanine (G)
          2. Adenine (A)
          3. Cytosine (C)
          4. Thymine (T)
    2. 2 kinds of chemical reactions => make larger chains from smaller chains or vice versa
      1. Dehydration Synthesis => here smaller units attach to form bigger units. It is also called condensation reaction. It generally requires energy (think of how building something like a house from bricks requires energy). Dehydration synthesis is where the H and OH at the ends of 2 shorter units are removed, releasing H2O, and the units combine to form a larger polymer.
      2. Hydrolysis => here a larger polymer breaks into smaller polymers. -lysis means breaking, while hydro is water, so breaking by water is hydrolysis. It's exactly the reverse of synthesis: it uses H2O to put H and OH back at the ends of the smaller polymers, and thus separates them. This process generally releases energy.
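
The atom bookkeeping behind dehydration synthesis and hydrolysis can be sketched in a few lines of Python. This is only an illustration (the function names are made up here): each bond formed removes one H2O, and each bond broken adds one H2O back. Joining 2 glucose units (C6H12O6) gives C12H22O11, which is the formula of disaccharides such as maltose.

```python
from collections import Counter

GLUCOSE = Counter({"C": 6, "H": 12, "O": 6})
WATER = Counter({"H": 2, "O": 1})

def dehydration_synthesis(*monomers):
    """Join monomers into one chain; each of the (count - 1) bonds releases one H2O."""
    total = Counter()
    for monomer in monomers:
        total += monomer
    for _ in range(len(monomers) - 1):
        total -= WATER
    return total

def hydrolysis_atoms(polymer, bonds_broken):
    """Total atoms in the pieces after breaking bonds; each break consumes one H2O."""
    total = Counter(polymer)
    for _ in range(bonds_broken):
        total += WATER
    return total

disaccharide = dehydration_synthesis(GLUCOSE, GLUCOSE)
print(dict(disaccharide))                       # {'C': 12, 'H': 22, 'O': 11}
print(hydrolysis_atoms(disaccharide, 1) == GLUCOSE + GLUCOSE)  # True
```

The same counting applies to the triglyceride mentioned earlier: 1 glycerol plus 3 fatty acids form 3 bonds, so 3 water molecules are released.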

DFT: Design for Testability

Any chip that is fabricated is going to have some defects during fabrication, which will cause some of the transistors or wires on the chip to not function properly. This may cause the chip to fail. One way to check if the chip manufactured is good or not, is to run thru the same functional patterns on the chip pins that the chip is going to go thru when it's in operation.

For small chips this method may work, but for large chips, it's practically not feasible for 2 reasons. First, there may be billions of such possible patterns on chip pins that we may have to apply, which is time prohibitive. Secondly, it may still not find all bad devices or bad connections in the chip, since those patterns may not target 100% of the chip devices.
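
To get a feel for why applying every possible pattern is time prohibitive, here is a rough back-of-the-envelope sketch. It assumes a purely combinational block with num_inputs input pins (so 2^num_inputs possible patterns) and a hypothetical tester rate of 100 million patterns per second; both the function name and the rate are made up for illustration.

```python
def exhaustive_test_seconds(num_inputs, patterns_per_second=100e6):
    """Seconds needed to apply every possible input pattern once.

    Assumes a combinational block: num_inputs pins give 2**num_inputs patterns.
    The default tester rate is a hypothetical figure, not a real tester spec.
    """
    patterns = 2 ** num_inputs
    return patterns / patterns_per_second

SECONDS_PER_YEAR = 365 * 24 * 3600

for pins in (20, 32, 64):
    seconds = exhaustive_test_seconds(pins)
    print(f"{pins} inputs: {seconds:.3g} s ({seconds / SECONDS_PER_YEAR:.3g} years)")
```

With these assumptions, 20 inputs take a fraction of a second, but 64 inputs would take thousands of years, and real chips with internal state are far worse than this simple count suggests.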

Without having a 100% check to test each and every transistor and each and every connection, we can never be sure if the chip being shipped is 100% functional or not. This is where DFT comes in. DFT simply means adding extra logic on chip so as to allow us to test the whole chip. DFT is a broad field by itself, and you will usually see thousands of job postings just for DFT engineers.

In this section, we will go thru the basics of DFT,