1.2 - Neural Network Basics
Course 1 - week 2 - Neural Network Basics:
This is the first technical introduction to NN. Well, the material for this week doesn't really talk about NN; it talks about regression, and how to do a linear and logistic regression. But in later weeks, you will see that these regressions are the simplest kind of NN. Logistic regression is a concept from statistics, but it defines the building block for AI.
For Linear and Logistic regression, see the AI section on "Statistics - Regression". That is all this week's lecture is about: trying to do binary classification on a picture with nx pixels, to find out if it's a cat or not. First, we give m such training pictures to our regression engine, let it find the weights that give it the lowest cost, and then use those weights to predict on a test picture. If our weights are optimal, and the test picture is close to our training set pictures, then our regression algo should do a good job of classifying the picture correctly.
However, just from common sense, it looks like this approach of simple regression will never work, as cats can come in any color, shape, position, background, etc. Regression is just matching pixels and trying to minimize a cost; it has no spatial information (i.e. if there are 10 pixels next to each other forming an eye, our logistic regression model doesn't care whether these 10 pixels are in 10 different corners of the picture, or right next to each other).
As an example, consider an 8x8 pixel black and white picture. Each pixel can have 2 values: 0 for black and 1 for white. So, the total number of possible pictures is 2^(8*8) = 2^64 unique pictures. Our regression analysis is trying to go thru a limited set of such possible combinations and predict what each picture is going to be. It's impossible to do that even for an 8x8 pixel black and white picture. Just imagine how to do that for a 64x64 colored picture !! And then for even larger pictures. It's just not possible by a brute force "least error" regression technique. Something better has to be done. That's for later courses !!
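Just to put rough numbers on this argument, here's a quick back-of-the-envelope check in python:

import math

print(2 ** (8 * 8))  # 18446744073709551616, i.e. ~1.8*10^19 distinct 8x8 B/W pictures
# For a 64x64 colored picture with 256 levels per R,G,B channel, the count is
# 256^(64*64*3). Even just the number of digits in that count is huge:
print(int(64 * 64 * 3 * math.log10(256)) + 1)  # 29593 digits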
This week has a programming assignment that is an absolute must to complete, if you want to learn AI. It helps you go thru the simplest NN that's possible, which is actually logistic regression. All the new concepts are developed here. Take your time to finish this assignment.
Programming Assignment 1: This is a simple image recognition pgm. It reads a file of images to get trained (using whatever algorithm we choose; here we use logistic regression), and then we run the pgm on test images to see how well our algorithm works.
Here's the link to the pgm assignment:
Logistic_Regression_with_a_Neural_Network_mindset_v6a.html
This project has 2 python pgms that we need to understand.
A. lr_utils.py => this is a pgm that defines a function "load_dataset". We'll import this file in our main pgm. However, instead of keeping it as a separate pgm, I copied the function defined in this file into the main python pgm.
The function load_dataset() reads 2 files: test data and training data. Below are the two h5 files that contain our training data and test data. Feel free to download the 2 files by right clicking and choosing "save link as". If you directly click on a link, it will open the h5 file in the browser itself, which will look like garbage, as it's not a text file that the browser knows how to display. A quick way to peek inside these files with python is shown right after the two items below:
1. training data: This data is used to train our algo. It has 209 training examples with label "train_set_x". These are 209 2D pictures, each 64x64 pixels, where each pixel has a triplet of R,G,B values.
2. testing data: This data is used to test our algo. It has 50 testing examples with label "test_set_x". These are 50 2D pictures, each 64x64 pixels, where each pixel has a triplet of R,G,B values.
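Before writing any model code, we can peek inside the downloaded h5 files with the h5py library. This is just a sketch, assuming the 2 files were saved in the current dir:

import h5py

# open the training file read-only and list the datasets stored inside it
with h5py.File("train_catvnoncat.h5", "r") as f:
    print(list(f.keys()))          # ['list_classes', 'train_set_x', 'train_set_y']
    print(f["train_set_x"].shape)  # (209, 64, 64, 3): 209 pics, 64x64 pixels, R,G,B per pixel
    print(f["train_set_y"].shape)  # (209,): one 0/1 label per picture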
Below I'm writing the function "load_dataset" from lr_utils.py
import numpy as np
import h5py

def load_dataset():
    train_dataset = h5py.File('train_catvnoncat.h5', "r")
    test_dataset = h5py.File('test_catvnoncat.h5', "r")
    classes = np.array(test_dataset["list_classes"][:])  # the 2 class labels: non-cat, cat
    print("train = ", train_dataset, "test = ", test_dataset, "classes = ", classes, classes.shape)

    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # training pictures
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # training labels (1=cat, 0=non-cat)
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])     # test pictures
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])     # test labels
    print(train_set_x_orig.shape, train_set_y_orig.shape, test_set_x_orig.shape, test_set_y_orig.shape)

    # reshape the y labels from 1D arrays of shape (m,) into 2D row vectors of shape (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
result:
train =  <HDF5 file "train_catvnoncat.h5" (mode r)> test =  <HDF5 file "test_catvnoncat.h5" (mode r)> classes =  [b'non-cat' b'cat'] (2,)
(209, 64, 64, 3) (209,) (50, 64, 64, 3) (50,)
=> train_dataset and test_dataset are just file handles. classes is a 1D array with just 2 string values [non-cat cat]. The x arrays hold the 209 training and 50 test pictures, and the y labels are 1D arrays here (they get reshaped to (1, m) before being returned).
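For completeness, here's a minimal usage example of load_dataset(), assuming the 2 h5 files are in the same dir (the y shapes reflect the reshape at the end of the function):

train_set_x, train_set_y, test_set_x, test_set_y, classes = load_dataset()
print(train_set_x.shape, train_set_y.shape)  # (209, 64, 64, 3) (1, 209)
print(test_set_x.shape, test_set_y.shape)    # (50, 64, 64, 3) (1, 50)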
B. test_cr1_wk2.py => This pgm calls func load_dataset() defined in lr_utils, and we define our algorithm for logistic regression here to find optimal weights, by training our algorithm on the training data. We then apply those weights on the test data to predict whether a picture has a cat or not.
The whole pgm is made up of the function defined in lr_utils.py above, plus the functions below (a condensed sketch of how they all fit together follows this list):
- sigmoid() => defines the sigmoid func for any input z, i.e. sigmoid(z) = 1/(1+e^(-z))
- initialize_with_zeros() => initializes the w array and the scalar b with 0
- propagate() => computes the total cost and the gradients. Given X, w, b, this func calculates the activation A (which is the sigmoid function of the linear eqn w1*x1+...+wn*xn+b) and then computes the cost (the cross entropy log function of A,Y: cost = -(1/m)*sum(Y*log(A) + (1-Y)*log(1-A))). Then it computes the gradients dw = (1/m)*X*(A-Y)^T and db = (1/m)*sum(A-Y). It stores dw, db in dictionary "grads". It returns scalar "cost" and dictionary "grads".
- optimize() => This function iterates thru the cost function to find the optimal values of w,b that give the lowest cost. It forms a "for" loop for a predetermined number of iterations. Within each loop, it calls function propagate() with the given values of X,w,b. In the beginning, w and b are 0. propagate() returns new dw,db. Then it updates w,b based on dw, db and the learning rate chosen: w = w - learning_rate*dw, b = b - learning_rate*db. Then it starts the next iteration, feeding the newly computed values of w,b into propagate() to get even newer dw, db, and updates w,b again. It keeps repeating this process for "num_iterations", until it gets to w,b which hopefully give a lot lower cost than what we started with.
- predict() => Given an input picture array X, it predicts Y (i.e. whether each pic is a cat or not). It uses the w,b calculated using the optimize function. We can provide a set of "n" pictures here in a single array X (we don't need to provide each pic individually as an array). This vectorization is done for efficiency purposes, as Prof Andrew explains multiple times in his courses.
- model() => This is the main func that will be called in our pgm. We provide both the training and test pictures as 2 big arrays as i/p to this func. It calls the above functions as shown below:
- calls func initialize_with_zeros() to init w,b,
- then calls optimize() to optimize w,b to give the lowest cost across the training set,
- then calls predict() to predict on any picture. predict() is called twice, once for the training set and once for the test set, to predict cat vs non-cat,
- accuracy is then reported for all pictures, i.e. what they actually were vs what our pgm predicted.
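To make the flow concrete, here's a condensed sketch of all these functions. The function names match the assignment, but the bodies are my minimal reconstruction (the hyperparameter defaults num_iterations=2000 and learning_rate=0.005 are assumptions), not the graded solution copied verbatim:

import numpy as np

def sigmoid(z):
    # sigmoid func for any input z (works elementwise on numpy arrays)
    return 1 / (1 + np.exp(-z))

def initialize_with_zeros(dim):
    # one weight per pixel value, plus a scalar bias, all starting at 0
    return np.zeros((dim, 1)), 0.0

def propagate(w, b, X, Y):
    # forward pass: activation A = sigmoid(w.T X + b), then the cross entropy cost
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    # backward pass: gradients of the cost w.r.t. w and b
    grads = {"dw": np.dot(X, (A - Y).T) / m, "db": np.sum(A - Y) / m}
    return grads, cost

def optimize(w, b, X, Y, num_iterations, learning_rate):
    # plain gradient descent: step w,b against the gradients, num_iterations times
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
    return w, b

def predict(w, b, X):
    # one vectorized forward pass on a whole batch of pictures at once
    A = sigmoid(np.dot(w.T, X) + b)
    return (A > 0.5).astype(float)  # threshold at 0.5: 1 = cat, 0 = non-cat

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.005):
    # init w,b, optimize them on the training set, then predict on both sets
    w, b = initialize_with_zeros(X_train.shape[0])
    w, b = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)
    for name, X, Y in (("train", X_train, Y_train), ("test", X_test, Y_test)):
        print(name, "accuracy:", 100 - np.mean(np.abs(predict(w, b, X) - Y)) * 100, "%")
    return w, b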
Below is the explanation of the main code, after we have defined our functions as above (a sketch of this main code follows the list):
- We load the dataset X,Y for the m pictures stored in the h5 files.
- Then we enter a loop, where we can repeat running this program as many times as we want, for whatever reason. NOT really needed.
- Inside the loop, we flatten and normalize the array X that we read from the dataset in the h5 file. We flatten the array of R,G,B pixels for each picture into shape (nx*nx*3, 1). This flattening is done since our weight array is also flattened. We want one weight for each pixel value, so both weights and pixel values have to be 1D arrays, so that we can just multiply them directly as w1*x1+w2*x2+...+wn*xn. In our numpy implementation, we make them 2D arrays, but they still have only 1 row or col filled (i.e. they behave like 1D).
- Now we run function model() on array X (which already has all m training pics in it), and find the optimal w,b by running it on the training set. Function model() then runs predict() and reports prediction accuracy for both the training and test sets.
- Then we have a choice of trying various learning rates, and seeing the effect on the minimal cost achieved by our pgm. Learning rates matter a lot, as we see by trying small/large rates.
- Then finally we have a choice of trying 10 diff random images (these images are in the all_data dir), which are predicted by calling predict(). The prediction value for each image is reported. We see that the accuracy is bad (about 50%). Here we used the Image module from the PIL library; I couldn't get "imread" from matplotlib to work.
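Putting these steps together, here's a minimal sketch of the main code. It assumes the functions from the earlier sketch are already defined, and "my_image.jpg" is just a placeholder name for one of our own test pictures:

import numpy as np
from PIL import Image  # used instead of matplotlib's imread

# load the dataset X,Y for the m pictures stored in the h5 files
train_x_orig, train_y, test_x_orig, test_y, classes = load_dataset()

# flatten each (64, 64, 3) picture into a (64*64*3, 1) column and normalize to [0, 1]
train_x = train_x_orig.reshape(train_x_orig.shape[0], -1).T / 255.
test_x = test_x_orig.reshape(test_x_orig.shape[0], -1).T / 255.

# find the optimal w,b on the training set; model() prints accuracy for both sets
w, b = model(train_x, train_y, test_x, test_y)

# try various learning rates (example values) to see their effect on the cost reached
for lr in (0.01, 0.001, 0.0001):
    model(train_x, train_y, test_x, test_y, num_iterations=1500, learning_rate=lr)

# predict on one of our own pictures (assuming an RGB image file)
img = np.array(Image.open("my_image.jpg").resize((64, 64))) / 255.
pred = predict(w, b, img.reshape(64 * 64 * 3, 1))
print("prediction:", classes[int(pred.squeeze())].decode())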
Summary:
By finishing this exercise, we learnt how to do logistic regression to figure out the optimal weight for each pixel of a picture, so that it can predict a cat vs non-cat picture.