Course 3 - Structuring ML projects

This course is a slight departure from the technical discussion of NNs. It covers several techniques for managing ML projects. It has no programming assignments and only 2 sections. Both sections are theoretical only, and can be finished in 3-4 hours. Even if you skip this course altogether, you won't miss much. I've summarized the lectures below:

1. ML strategy 1: This talks about the following:

A. Orthogonalization: This refers to choosing orthogonal knobs to control or improve certain aspects of your NN, so that tuning one knob affects only one aspect at a time.

B. Single number evaluation metric: We should use a single metric to evaluate and compare performance across different ML algorithms.
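One common single-number metric is the F1 score, which folds precision and recall into one number so two classifiers can be compared directly. A minimal sketch (my own illustration, not code from the course):

```python
# F1 score: harmonic mean of precision and recall (illustrative numbers).
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f1(0.95, 0.90))  # classifier A -> ~0.924
print(f1(0.98, 0.85))  # classifier B -> ~0.910, so A wins on the single metric
```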

C. Satisficing and Optimizing metric: Out of the different metrics we use to evaluate our algorithm, some may be classified as "satisficing" metrics, where we just need to satisfy them (i.e. the performance needs to meet a certain threshold for those metrics). Other metrics may be classified as "optimizing" metrics, which we really want our algorithm to optimize for.
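As a concrete sketch (my own illustration, not from the course): suppose runtime is a satisficing metric with a 100 ms threshold and accuracy is the optimizing metric.

```python
# Pick the most accurate model among those that satisfy the runtime threshold.
models = [
    {"name": "A", "accuracy": 0.90, "runtime_ms": 80},
    {"name": "B", "accuracy": 0.92, "runtime_ms": 95},
    {"name": "C", "accuracy": 0.95, "runtime_ms": 1500},  # most accurate, but too slow
]
feasible = [m for m in models if m["runtime_ms"] <= 100]   # satisficing metric
best = max(feasible, key=lambda m: m["accuracy"])          # optimizing metric
print(best["name"])  # "B"
```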

D. Distribution: The distribution of data in the train set, dev set and test set should be similar, otherwise our algorithm may perform badly on sets whose data is vastly different from the training data.

E. Size of train/dev/test set: In the big data era, where we have millions of training examples, we usually divide the available data into 98% training data, 1% dev data and 1% test data. We can do this because even 1% is 10K data points, which is large enough to work as a dev/test set.
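A quick sketch of such a split (my own illustration, assuming 1 million examples):

```python
import numpy as np

m = 1_000_000                      # total examples
idx = np.random.permutation(m)     # shuffle before splitting
n_dev = n_test = m // 100          # 1% each = 10,000 examples

dev_idx   = idx[:n_dev]
test_idx  = idx[n_dev:n_dev + n_test]
train_idx = idx[n_dev + n_test:]   # remaining 98% for training
```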

F. Weights: Sometimes we may want to assign different weights to different loss terms, i.e. there may be cases where we want to assign a much larger weight to the loss term where an elephant pic is identified as a cat pic, but a much lower weight to the loss term where a bobcat is identified as a cat. This can be done by multiplying each loss term by its weight and then summing the products. To normalize the sum, we then divide it by the sum of the weights (instead of dividing by the number of examples). These weights are different from the weights we optimize in our NN.
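A small numeric sketch of this weighted loss (my own numbers, not from the course):

```python
import numpy as np

losses  = np.array([0.2, 1.5, 0.1])   # per-example loss terms
weights = np.array([1.0, 10.0, 1.0])  # e.g. 10x weight when an elephant is called a cat

# Normalize by the sum of the weights, not by the number of examples.
weighted_loss = np.sum(weights * losses) / np.sum(weights)
print(weighted_loss)  # ~1.275, dominated by the heavily weighted mistake
```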

G. Human level performance: ML algorithms strive to reach human level performance. Bayes error is the lowest error that you can get, and for computer vision, human error is pretty close to Bayes error. So, once your ML algorithm reaches human level performance, you are pretty close to the lowest error that's possible. It's very hard to get even incremental improvements once you reach human level.

The difference b/w human error and training set error is called "avoidable bias", as that error gap can be brought close to 0. The gap b/w training error and dev/test set error is called variance. Both "avoidable bias" and "variance" may be a problem for our ML project, so we have to be careful about which one to target to get the lowest error on our dev/test set. "Avoidable bias" can be reduced by choosing a larger model (deeper NN), or using better optimization algorithms such as Momentum, RMS prop, Adam, etc. To reduce "variance", we can use a larger training set, or use regularization techniques such as L2, dropout, etc.
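A small worked example of the bias/variance split (my own numbers):

```python
human_error = 0.01   # proxy for Bayes error
train_error = 0.08
dev_error   = 0.10

avoidable_bias = train_error - human_error   # 0.07
variance       = dev_error - train_error     # 0.02
# Avoidable bias dominates here, so a bigger network or a better optimizer
# (Momentum, RMSprop, Adam) is likely to help more than regularization.
```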

2. ML strategy 2: This talks about the following:

A. Analyzing error: It's important to analyze your errors, i.e. all cat pics that were misclassified. Once we start categorizing these errors into different buckets, we can start seeing exactly where our ML system is not working as expected. Sometimes the o/p label itself is incorrect (i.e. a cat pic is incorrectly labeled as a "non cat" pic). This may or may not be worth fixing, depending on how severe the issue is. We also have to make sure that our training data and test/dev data come from the same distribution, else there will be a lot of variance. One way to find out if the variance is due to mismatched data b/w the training and dev set is to carve out a small percentage of the training data as a train-dev set, and not use this portion for training, but use it like a dev set. If the error gap is small on this train-dev set, but large on the dev set, then that indicates a mismatch b/w the train data and the dev/test data. To address data mismatch, one other solution is to include as much varied data as possible in the training set, so that the ML system is able to optimize across all such data.
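A small worked example of the train-dev diagnosis (my own numbers):

```python
train_error     = 0.01
train_dev_error = 0.015   # held-out slice drawn from the *training* distribution
dev_error       = 0.10    # dev set drawn from a different distribution

variance      = train_dev_error - train_error   # 0.005 -> not a variance problem
data_mismatch = dev_error - train_dev_error     # 0.085 -> train and dev data differ
```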

B. Build system quickly and then iterate: It's always better to build a barely working system quickly, and then iterate a lot to fine-tune the ML system to reduce errors.

C. Transfer learning: This is where we use a model developed for one ML project in some other project with minimal changes. This is usually employed in cases where we have very little training data to train our ML algorithm, so we reuse parameters developed for some other ML project, and just retrain the o/p layer parameters, or the parameters of the last couple of layers. This allows us to get very good performance. For ex, in radiology image diagnosis, a NN developed for general image recognition may be reused, since both applications are similar.
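A minimal sketch of transfer learning in Keras (a hypothetical example, not code from the course): reuse a network pretrained on a large image dataset, freeze its layers, and train only a new output layer on the small dataset.

```python
import tensorflow as tf

# Pretrained image model without its original output layer.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,
                                         pooling="avg",
                                         weights="imagenet")
base.trainable = False  # freeze the pretrained layers; keep their learned features

# Fresh output layer for the new task (e.g. a binary diagnosis).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(small_x, small_y, epochs=5)  # small_x/small_y: your (small) new dataset
```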

D. Multi task learning: This is where we use the same model to do multiple things instead of one thing. An ex is an autonomous car, where the image recognition model needs to identify cars, pedestrians, stop signs, etc all at the same time. Instead of building a separate NN for each of them, we can build a single NN with many different o/p values, where each o/p value is for a specific task such as other car, pedestrian, stop sign, etc.
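A minimal sketch of such a multi-output network in Keras (hypothetical, not from the course): one shared network with several sigmoid outputs, one per object type, so each output contributes its own loss term.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(64, 64, 3))
x = tf.keras.layers.Conv2D(16, 3, activation="relu")(inputs)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(64, activation="relu")(x)
# 3 independent binary outputs: car, pedestrian, stop sign (illustrative labels)
outputs = tf.keras.layers.Dense(3, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
# Binary cross-entropy is applied per output, so the tasks are learned jointly.
model.compile(optimizer="adam", loss="binary_crossentropy")
```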

E. End to End Deep Learning: This is where a NN can take an i/p and give an o/p w/o requiring intermediate steps to do more processing. As an ex, translating an audio clip to text traditionally required many steps of complex pipelining to get it to work. But with large amounts of data, a deep NN can learn directly from the i/p data to produce the transcription w/o requiring any intermediate pipeline. Sometimes we do divide the task into 2-3 intermediate steps before we apply DL to it, as that performs better. There are real life examples where End to End DL works better, as well as cases where breaking the task into a few smaller steps works better.

Indian Visa and OCI Card:

THE BELOW ARTICLE APPLIES ONLY TO INDIANS WHO HAVE MOVED TO THE USA AND OBTAINED USA CITIZENSHIP (BY SURRENDERING THEIR INDIAN CITIZENSHIP) AFTER BEING ON A GREEN CARD FOR A WHILE.

Once you have attained USA citizenship, it's time to get the documents required to travel to India. After you get your USA Citizenship, you are no longer an Indian Citizen (even though you may have an unexpired Indian Passport). This is because the Indian Constitution doesn't allow dual Citizenship, so your Indian citizenship is automatically cancelled once you acquire USA citizenship.

Once you get your USA citizenship, you should apply for a USA passport. It's not possible to enter back into the USA after international travel if you don't carry a USA passport. Your "Green Card" is marked invalid, so a USA passport is your only way of entry. Once you've gotten your USA passport (see my previous article on how to get a USA passport), you are now a foreigner when it comes to visiting India. So, you have to enter India just like a foreigner. If you try to enter India using your Indian Passport (while still having your USA passport), it's considered illegal. Don't try that, as they may return you from India. It's not worth the risk. Since you have been an Indian Citizen before, or have Indian origin, you get preferred treatment in getting documents.

Service Providers:

The Indian government hires outsourcing companies to help with processing of Indian Passports, Visas, OCI, etc. Prior to Oct 2020, "Cox and Kings Global Services" was the outsourcing provider for all kinds of immigration services for the Indian Community in the USA. However, since Oct 2020, VFS Global has replaced Cox and Kings as the outsourcing provider.

https://www.indiawest.com/news/global_indian/vfs-global-to-replace-cox-kings-as-outsourcing-provider-for-visas-passport-services-for-indian/article_4c644bea-0b94-11eb-bb82-43f1c23eb95b.html

If you search on the internet for any India related immigration services, you will notice a lot of articles pointing to the CKGS website. This no longer works as of Nov 2020. The CKGS website is here: https://www.in.ckgs.us/  => Please don't use it for anything.

You should instead use VFS global website: https://www.vfsglobal.com/en/individuals/index.html

 VFS global is in service as of Dec 2020. Choose "from" as USA and "to" as India, and it will open a new window with this link: https://visa.vfsglobal.com/usa/en/ind

This is where you can get all your immigration related services. VFS Global is just an outsourcing company. They collect all the required documents and pass them on to the Indian Embassy. When the Indian Embassy returns the documents, they pass them back to us. They are our only point of contact, but they have no say in anything. You can email them and ask them any sort of question, and they usually respond in a couple of days. Do NOT call them, as they ask for your credit card number and then charge you $2.50/minute for anything over 5 minutes of talk time. Since they already have your credit card number, they can bill you for any amount, and there's nothing that you can do.

Rantings: This is all a ludicrous process, but there is nothing we can do. They charge so much money, and provide 0 customer service (or charge you by the minute for talking to them). Their response via email may come in a few days if you are lucky, but may take weeks or never come at all. They have conflicting documents at multiple places for the same application. Some of their upload process will crash the browser, or sometimes you may never be able to get back to your partially filled application. Sometimes, the pictures uploaded will get rotated for no reason, and remain that way unless you start a new application on a different computer. This is how the Indian Embassy has always worked. Beware: it's going to be a very frustrating process, but we need the card because there's no alternative. So, suck it up and keep going.

There is a good discussion forum here: https://www.immihelp.com/oci-card-experiences/1/

I'll list the steps as I've gone thru the process, but the process may change anytime w/o notice. The below process worked as of Dec, 2020.

Services:

So, let's look at all the services provided by VFS Global:

1. Indian Passport => If you are in the USA, and need any Indian Passport related services (i.e. renewing a Passport, name change, etc), you won't have to deal with any officials in India. The Indian consulates in the USA handle it for you.

2. OCI card => This is needed if you have acquired a foreign citizenship, and still want to go to India w/o getting a visa every time.

3. Renunciation of Indian Citizenship => This is needed if you want to get an OCI card.

 

Trip to India:

If you are in the USA, and you need to make a trip to India, then how to proceed depends on whether you are an Indian citizen or not. If you have a valid Indian Passport, then you can enter India anytime. However, if your kids were born here, then they are US citizens, and hence can't enter India w/o some additional documents. Similarly, if you acquired USA citizenship, you can't enter India w/o those additional documents. We'll talk about these documents below:

There are 2 kinds of documents that will allow you to enter India (if you are a citizen of the USA). If you are an Indian citizen (with an unexpired valid Indian Passport), then you can enter India just by showing your Indian Passport; you don't need the documents listed below:

1. Visa:

An Indian Visa is the quickest way to travel to India. However, an Indian Visa is a short term solution, as it expires in a short time frame (I believe it's 6 months for US citizens of Indian origin), and you will need to reapply for it every couple of years (or maybe each time you plan to travel to India). I don't even see any option for an Indian Visa on the VFS website. Looks like you can apply for an Indian Visa directly on the Indian govt consular website. This is the link that explains that: https://www.indianembassyusa.gov.in/extra?id=78

On the above page, you have to click on the link https://www.indianembassyusa.gov.in/extra?id=79 to initiate the Visa process. The requirements are simple. However, there's no detail on the fees, timeline or physical appointment requirement. A lot of the information looks out of date.

One of the requirements is a "copy of renunciation certificate", which I've explained below under the OCI section. It takes a month or more to get that, even though it says 2 weeks processing time. So, basically there is no way for you to get an immediate "Visa" if you have become a US citizen and want to visit India. However, it looks like there's also an electronic Visa (e-Visa), so you don't need to go anywhere to get the Visa, it just comes in the mail. But I'm not sure about the "renunciation certificate" requirement. That basically makes an "immediate Visa" impossible.

I would suggest going with the OCI card option below. The Visa process looks as long as the OCI process, and costs as much, I guess.

COVID UPDATE: I read on some sites that due to Covid19, the Indian Government is not allowing anyone to enter India on any Visa, so OCI may be your only option if you want to enter India. The only visa they are allowing is an emergency Visa, for which you need a certified paper by a doctor or someone certifying that it's an emergency. I would rather get the OCI while there's still time, rather than get into the mess of a Visa.

2. OCI (overseas citizen of India) Card:

This is the preferred option for traveling to India. This card is for a lifetime, so once you get it, you are set for life. However, every time your passport is renewed, you have to renew the OCI card too. So, it's not really lifetime, as you have to again go thru the same torturous process. Before you can apply for an OCI card, you will need to renounce your Indian citizenship. If you don't have this Renunciation certificate or a cancelled Indian Passport, your OCI card won't be processed. So, it's extremely important to finish this step before applying for OCI. If you never had an Indian Passport, then obviously you don't need this step.

TIPS: A few notes before you start the application:

  • The USA passport should be valid for at least 6 months at time of application. I would say to stay safe, your USA passport should be valid for 8 months, since processing and delays may add extra 1-2 months by the time embassy starts working on the application.
  • Photos need to be sent too. See the photo spec (it should be similar to what you have for the US passport. They state that the photo attire shouldn't be textured or patterned, and should be in a light blue color. They are not strict about enforcing that. I used the same photo as in my US passport, which had textured attire, and it went fine).
  • You need to provide a copy of your US Passport. It asks for a photocopy of the 1st page with your passport details, along with the last 2 pages. The last 2 pages that it's asking for are those blank pages which don't have "visa" written on them (page 26, page 27 in my passport). These last 2 pages are called amendment pages, as they are used for making amendments to your passport, and can't be used for Visa stamping. Indian officials just want to make sure there's nothing amended in the passport. NOTE: It's NOT the last page that has info about restrictions, customs etc.
  • Some of the documents need to be notarized. Most places charge you $8/page or so for notarizing. I've never paid any money for notarizing. Most of the big banks such as Chase, Bank of America, Wells Fargo, etc will notarize it for free. You can also check at your workplace. The admins or other support people often have notary stamps, and will gladly notarize your forms.
    • You have the option of choosing their Fedex service or your own UPS/USPS service. I would strongly advise to use their FEDEX service. It costs $5 more each way, but it's worth it. You don't want to add an extra variable to an already messy process.
  • These are general tips while filling application. It's going to be a long frustrating process, so keep calm:
    1. use chrome browser to fill in and upload your application. Firefox kept on looping while uploading photos. Chrome too gave issues, but not as bad as Firefox.
    2. Try to complete the whole application in one go from start to finish. If you exit application, and try to get in again, then you may not be able to get in again (depending on where in application process you were and what got saved).
    3. Chrome will crash, if you try to upload document too many times or take too long to upload. So, you will have to start the whole application again from scratch (as the original application will assume that you uploaded the pics/documents, even though you may not have uploaded them because of chrome crashing). You may never know if the docs/pics uploaded or not, since it doesn't show you your uploaded section anymore.
    4. There is an issue with uploading the photo (this is only applicable for OCI applicants and not for renunciation). If you upload the photo in rotated position the first time, subsequent photos uploaded will show as rotated even though they have been uploaded correctly. I changed browsers, started new application, but nothing resolved it. In the end, starting brand new application on a different computer resolved the issue. So be very careful when uploading the photo the first time. Make sure it's not rotated, or you are in for endless torture.
    5. Save option: You can't spend too much time on any page of any application. There is no option to "save". The only options are "save and exit" or "save and continue". If you spend more than 5-10 minutes on any page, it will time out and all your info will be lost. So, keep all info ready for each page, before you start filling it out. I learned the hard way.
    6. There are too many other issues to list here, so consider yourself lucky if you are able to get thru the whole process in the first go.

 

A. Renunciation of Indian Citizenship: This is where the Indian Embassy marks "Cancelled" on your Indian Passport, so that your Indian Passport can't be used for anything anymore. However, they charge about $225 just for providing you this certificate, which is an insane amount of money for something so trivial (you are asking them to cancel your existing document, not to renew or issue a new document). It's just a money-making business.

This link explains the steps involved in getting the renunciation certificate: https://visa.vfsglobal.com/usa/en/ind/apply-for-renunciation.

On this link, in step 1, there is a link for renunciation here: https://www.vfsglobal.com/one-pager/india/united-states-of-america/renunciation-of-indian-citizenship/. It has more details on fees, photo, etc (the photo shouldn't be in textured attire; also the attire shouldn't be white), as well as a few documents which are not available anywhere else. We'll download some forms from here later on. For now, just keep this link as a reference.

Follow the step by step guide. These are the steps in detail:

  1. GOVT WEBSITE: Apply online at Indian embassy website: https://embassy.passportindia.gov.in
    1. Choose USA as your country of origin. It will open a new window. Register yourself. you will get a link in email.
    2. Click on the link. That activates your account
    3. Now you need to login into your account (by clicking on 3rd box "User login" on right. It's below the register box and track status box)
    4. Apply for "surrender of Indian Passport". You will have a long application to fill online (a total of 7 steps). It takes about 15 minutes to fill.
    5. Once completed, you will be taken to final screen, and then asked to print the application form by going to home page. Once you scroll to the bottom of home page, you will see the application form. click on it, and you will see a "print application" button show up. Print the form.
    6. Make a note of ARN - Application Reference number. You will need this number in step 2 below.
  2. VFS WEBSITE: Once done with step 1, start with VFS website application
    1. click on this link: https://visa.vfsglobal.com/usa/en/ind/apply-for-renunciation
    2. Register by providing your email and password. Once registered, click on link sent in email to activate your account (if you don't click on link in email, you won't be able to log in). If you are already registered, then just log in directly, Do NOT create another account.
    3. Log into your VFS account. click on "create new application"
      1. Select the correct entries - category as "Renunciation of Indian citizenship" and application category as "Naturalized after June 2010 and does NOT have a stamp"
      2. Select online payment (they do charge 3.5% extra, so a cashiers check, etc may also be an option to save couple of dollars)
      3. Next screen asks you to add customer details:
        1. Put Govt Reference number from embassy website in step 1 above.
        2. In top half of form, you enter your USA passport details. Nationality is USA, as you are filling details for USA Passport
        3. In bottom 2 entries (where it asks for "Do you have your most recent Indian passport in your possession?"), you enter your Indian Passport details.
        4. Once you complete the details, you are asked to select Courier service: 2 options - select option 1 below
          1. If you select the VFS provided courier (Fedex is the only option), then you have to pay $30 for round trip service. This is about $15 more expensive than doing it yourself thru USPS. I don't want to pay these guys anything at all, but unfortunately I have to, for the reason in the next bullet.
          2. If you select "I will use my own courier service", then you need to provide "incoming and outgoing" courier labels before you can proceed further. Since USPS is the cheapest courier provider, we'd select USPS, go to a USPS facility, and get 2 courier labels, one with the "VFS address" (incoming courier) and the other with "your home address" (outgoing courier). This should cost about $7.75 x 2 = $15.50. That's 2 trips to USPS to save $15, so I wouldn't recommend this (i.e. select option 1). The VFS address depends on your area consulate (see the MAIL step below).
          3. It will show the total amount you owe (~$230), and then ask you to enter your home address (this should match your home address in the application). Once the address is validated and you hit "submit", it takes you to the summary screen to "confirm application".
        5. Once you click on "confirm application", it will ask you to pay the money (if paying online). Once you pay the money, it brings you back to your account. At this point, you have completed the online application. Make a note of the "Reference number" listed on top right. 
      4. Now you are done. There is an "application confirmation letter" in your VFS account for you to print. It has a list of required documents.
  3. DOCUMENTS: For a list of documents, check this link: https://www.vfsglobal.com/one-pager/india/united-states-of-america/renunciation-of-indian-citizenship/index.html#. Under "documents required" section, there is a checklist pdf file. This checklist and your checklist from "application confirmation letter" on VFS portal have inconsistencies. Below is the list of documents that I believe are required to be submitted (double check in the checklist, but this is all I submitted):
    1. GOVERNMENT APPLICATION ONLINE FORM - This is the application from step 1.5 (from passportindia.gov.in site). It's 2 pages. Paste a Passport picture and sign it on both pages.
    2. PHOTOGRAPH & SIGNATURE - Attach 2 extra hard-copy photos. See the photo spec above. NOTE: The printed form asks for a digital photo and signature to be uploaded on the government website. There is no option anywhere to upload these on the government website. This is an incorrect requirement (it's needed when applying for OCI, but it's incorrectly copied here).
    3. DECLARATION OF RENUNCIATION OF CITIZENSHIP OF INDIA FORM - 2 copies of Renunciation form to be filled and submitted. You can either fill them in Adobe reader or download them and hand write it. This is the link to download it: https://visa.vfsglobal.com/one-pager/india/united-states-of-america/passport-services/pdf/renunciation-certificate-fillable-form.pdf
    4. PROOF OF ADDRESS - Attach a copy of your driver's license (or any other doc as listed) for this.
    5. PHOTOCOPY OF US/NON-US PASSPORT = This is a copy of your US passport. The original is not needed. See above in the TIPS section for details on which pages are needed.
    6. AFFIDAVIT OF NATURALIZATION AFTER THE EXPIRY OF INDIAN PASSPORT - If your Indian passport expired before you had the oath ceremony for US Naturalization, then you will need this affidavit. I hand wrote the reason on a plain piece of paper and got it notarized by Bank of America. The reason for the gap can be anything, like your US citizenship process took longer than expected, or you had no plans to go to India in the near future, etc.
    7. MOST RECENT INDIAN PASSPORT  - Original Indian Passport is to be submitted here. Photocopy of first three pages of the passport and last two pages of the passport is also needed.
    8. NATURALIZATION CERTIFICATE - Photocopy of US Naturalization certificate (one that you get at US Naturalization ceremony).
    9. COURIER LABELS - If you are using VFS for courier services, then they will have a link on left side of your VFS account portal to download the 2 courier labels (one for outgoing and one for return). Attach the outgoing one to the Fedex Envelope that you use to send to VFS. The return label is just to be put inside the packet, along with all the other documents.
    10. APPLICATION CONFIRMATION LETTER - This is another document that you download from your VFS account portal. It has details of the customer, money paid and all documents needed. The documents listed here may not be the same as what I've listed above. Don't worry, what I've listed above is more accurate.
  4. MAIL: Put all the above documents in a packet and head to Fedex (if using VFS service) or to USPS (if using your own labels). Paste the address label on the package and send them to VFS. Address is based on your area consulate. For VFS provided labels, address is already there.
  5. STATUS: Hopefully, in 1-2 weeks, you would have your renunciation certificate. Once you have it, you can now apply for OCI card. Here's how to check status:
    1. You can check the status on VFS website: https://visa.vfsglobal.com/usa/en/ind/track-application
      • click on "Track application", and then on new window, enter following:
        1. Applicant Reference Number (ARN) - This is the govt ref number in the form 20-2002895674 or something similar. You need to enter the dash "-" too. Application confirmation letter in step 2.4 has this number.
        2. Date Of Birth (DOB) - Enter date in dd/mm/yyyy format (for 17th May, 1988, enter 17/05/1988)
        3. Click "submit". If it gives "invalid inputs", you may have entered wrong ARN (may be - is missing, or extra space), or date is in wrong format.
    2. You can also check the status of Govt passport website: https://portal4.passportindia.gov.in/Online/index.html
      • click on "Track status" and enter same details as above (ARN and DOB) on the new popup. For me, it kept on saying, no records found. Not sure why? Finally sent an email to: This email address is being protected from spambots. You need JavaScript enabled to view it. (Houston consulate, since that is where my application went), and they responded promptly. You can find the email addr for all consulates here: https://portal4.passportindia.gov.in/Online/index.html. Click on "connect to your embassy/consulate" (4th right tab on bottom). Choose USA as your country, and it will show email addr for all 6 consulates.

 

B. OCI card:

Once done with the renunciation of Indian Citizenship, it's time to apply for the OCI card. Remember, the renunciation certificate is needed only if you were an Indian Citizen at any time (i.e. carried an Indian Passport). If you were never an Indian citizen, then you don't need step A above. For ex, if your kids were born in the USA, they don't need the "renunciation certificate" above.

This link explains steps involved in getting the OCI card: https://visa.vfsglobal.com/usa/en/ind/apply-oci-services. Follow the step by step guide. These are the general steps for all OCI categories.

There is one other link too. Open the "oci services" link here:  https://services.vfsglobal.com/one-pager/india/united-states-of-america/oci-services/. Look thru and make sure you are applying in the correct category. It has more details on fees, photo, as well as few documents which are not available anywhere else. We'll download some forms from here later on. For now, just keep this link as reference.

Follow the step by step guide. These are the steps in general: (specific steps for each category are listed later in more detail).

  1. GOVT WEBSITE:  Apply online at government OCI website: https://ociservices.gov.in.
    1. Choose correct category for OCI card: There are 4 categories:
      1. New OCI registration: Choose this if you never had a PIO card or an OCI card. This would apply to most of adult Indians applying for the first time (after getting a US citizenship). This may also apply to your kids (as long as they are US citizens with Indian origin. They don't need to have Indian Passports).
      2. OCI registration (in lieu of valid PIO card): Choose this if you have a valid PIO card. This usually applies to your minor kids, if you ever applied for their PIO cards in past.
      3. OCI registration (in lieu of lost PIO card): Same as #2 above, except that the PIO card is lost
      4. OCI misc services: Choose this if you are an OCI card holder, but need to renew or get a new reissued OCI card for any reason.
    2. Most people fall in category 1 "New OCI registration". Categories 2 and 3 apply to people who had a PIO card before. This is going to be applicable to very few people who actually applied for a PIO card before 2015 (as PIO cards are no longer accepted as valid documents). I don't have details on category 4 (OCI misc services), as I haven't used it yet. I suppose the process should be very similar to the other categories.
    3. Once you choose the correct category, there is a long form to fill with 2 parts - Part A and Part B. I've details later on how to fill it. Once you complete part A, and move to part B, you can't edit Part A. For any changes, you will need to restart as fresh applicant.
    4. After you complete Part A, you will be asked to upload passport picture and signature.
      • If the pictures you uploaded are within the specs, then you will get a green message saying the pictures are good. Pictures have to be within the size spec, as well as the pixel spec. One way to do it is to resize the image using Paint. I work on a Linux machine and used the resize option on the picture. As long as the size of the picture is within spec, the inbuilt crop feature on the website will allow you to crop and resize the picture, so don't worry if you don't know how to use "Paint" or other software. Just reduce the size of the picture by cropping it or scanning it at lower DPI, and then adjust the pixels and dimensions using "crop" on the website itself.
      • Make sure that the picture and signature are in the right orientation, i.e. the final thing that displays should be an upright picture with the signature on the bottom going left to right. If it's not, rotate the pictures to get the correct orientation and then upload them again.
    5. Once upload is done, you move to Part B. Part B is very small. Once done with Part A and Part B, you will be asked to upload supporting documents. These are the general tips:
      • Docs need to be in separate pdf files. Each pdf file can't exceed 500KB. I used the "Windows Fax and Scan" software to scan images at 300 DPI. I then installed the "CutePDF" software from https://www.cutepdf.com/index.htm. There are various options here - for our purposes we'll install the CutePDF Writer, which is free. This allows you to print those scanned images into a pdf file.
        1. After CutePDF is installed, just use the print option on any document to print it. It will show "CutePDF Writer" as one of the printers. (NOTE: select print, not save, in the menu of the scanner or printer software; save will only offer jpeg, with no option for pdf.) Then select "CutePDF Writer" as the printer, which saves the file in pdf format.
        2. I chose the dpi as 72dpi for this printed pdf file which takes around 100KB per page. This is very low resolution file, however it meets the requirement of keeping each file size to be < 500KB (as some of the documents have 3-4 pages).
        3. We also need to combine these multiple pdf files into one file, as each document is supposed to have only 1 pdf file for all its pages. For that, we click on the "CutePDF Editor" link on the cutepdf.com website. It opens the software in the browser itself w/o downloading anything. Then on the right side, there is an option to "merge pdf" (scroll down to see that option). When you click on that, it allows you to add a number of pdf files, and clicking OK will combine them into one. Just save that file, and now you have all those pdfs in one file.
    6. Once you have completed all steps on the ociservices.gov.in website, you are done with it. Print the application form. It already has a Passport picture and signature on the 1st page. It has a File Reference Number (FRN) in the form "USAH***" or something similar. Note this number as it will be needed later. The checklist asks you to affix a Passport picture and sign it. There is no place to affix the passport picture on the form, so just ignore that. Move on to the VFS website now.
  2. VFS WEBSITE: Now apply at VFS website, by creating an account, and filling in details.
    1. click on this link: https://services.vfsglobal.com/one-pager/india/united-states-of-america/oci-services/
    2. Click on Apply now. At the bottom of this page (in step 3), there is a link to "complete questionnaire and create profile". Click on that.
    3. It will take you over here: https://row1.vfsglobal.com/IHCUSPostalAppointment/IHCQuestionnaire/Index. There are 2 choices here.
      1. If you already created an account with VFS earlier (maybe for renunciation purpose or other purpose), then you can directly log into the account. Hit on "click here" at bottom of that page to go directly to Login page. You don't need to make a new account.
      2. If you don't have an account already, then clicking on "OCI" will bring a form to fill in the details.
        1. Fill in the details. You will have to provide your OCI Govt ref number (aka File Ref number starting with "USA...", the one that we got when we completed application in step 1 on ociservices.gov.in). This is how your VFS application is linked to a particular govt application.
        2. Register by providing your email and password. Once registered, click on the link sent in the email to activate your account (if you don't click on the link in the email, you won't be able to log in). Now log into the VFS account.
    4. Fill in the details, link your www.ociservices.gov.in application to application over here (by entering your govt FRN from step 1.6 above) , and look at the checklist for your particular category.
    5. Print all the paperwork from vfswebsite as well as from ociservices.gov.in website. Collect all supporting documents too.
    6. Also, the ociservices.gov.in website says that all documents need to be self attested. However, that requirement is not mentioned anywhere else, so I never bothered self attesting anything (as it causes more confusion as to where to attest on the document, who attests for a minor, etc). The process works without self attesting anything, but if you are paranoid, go ahead and self attest. You have valid grounds for not self attesting, as nowhere on the VFS website do they ask you to self attest.
  3. DOCUMENTS: Put all the above documents in a packet and head to Fedex (if using VFS service) or to USPS (if using your own labels). Paste the address label on the package and send them to VFS. Address is based on your area consulate. For VFS provided labels, address is already there.
  4. STATUS: Hopefully, in 4-8 weeks, you would have your OCI card. Finally the end of nonsense torture.
    1. You can check the status on VFS website: https://visa.vfsglobal.com/usa/en/ind/track-application
      • click on "Track application", and then on new window, enter following:
        1. Applicant Reference Number (ARN) - This is the govt ref number (i.e File reference number) in the form "USAH***" or something similar. Application form in step 1.6 has this number. This is not the application number that you may get from vfs website.
        2. Date Of Birth (DOB) - Enter date in dd/mm/yyyy format (for 17th May, 1988, enter 17/05/1988)
        3. Click "submit". If it gives "invalid inputs", you may have entered wrong ARN, or date is in wrong format.
    2. You can also check the status of Govt passport website: https://ociservices.gov.in/welcome. 
      • This will show valid status only after the Indian office has received your application. This usually happens few days after vfs has notified you that your application is "In-Transit to the Embassy of India/Consulate General of India".
      • Over there, click on status inquiry. On new window, enter following:
        1. File Reference Number (FRN) - This is the govt file ref number in the form "USAH***" or something similar. It's same number as you entered for tracking above.
        2. Passport number - Enter your USA passport number
        3. Click "submit". If it gives "invalid inputs", you may have entered wrong FRN, or may have spaces somewhere. If it says "no such records found" and it has been a couple of days, then you need to contact them. Their email addr is:This email address is being protected from spambots. You need JavaScript enabled to view it.. They never responded to me, so it may be hit or miss.

Below are the specific details for each category, on top of what has been provided above.

1. New OCI Registration (category 1 for Adult/Minors)

Below is the complete set of steps for Adults (I've added additional steps for MINOR where needed):

  1. GOVT WEBSITE: Apply online at government OCI website: https://ociservices.gov.in.
    1. Choose correct category for OCI card. Here Choose "New OCI Registration". Click Proceed. A pop up box appears. click OK. There is no separate option for Minors, so the steps below apply to Minors too. Just that if the Minor is less than 5 years old, you need his/her thumb impression (left thumb for boys and right thumb for girls).
    2. The form now needs to be filled up. Long form with 2 parts (Part A and B) and bunch of docs to be uploaded.
      • Sample form here: https://services.vfsglobal.com/one-pager/india/united-states-of-america/oci-services/pdf/new-oci-sample-form.pdf
      • There is another one here too (Not sure which one is the latest one, there are some differences): https://services.vfsglobal.com/one-pager/india/united-states-of-america/oci-services/pdf/oci-government-sample-form.pdf
      • Part A: This is a long form with 2 pages.
        • Page I:
          • Under Applicant's info, enter details as in Indian passport. For current nationality, choose "United States". For visible marks, write "NONE". NOTE: "given name" includes your first name and middle name, while surname includes only your last name. Make sure, it matches exactly with what's in your Indian Passport.
          • Under Passport details, enter your USA passport details. For place of issue, write "USDOS". Enter your previous USA passport number under "Previous Passport number", if you had more than 1 USA passports.
          • Under Applicant's Family details, enter information as per your indian Passport. Parents name should match exactly as what's mentioned on your Indian Passport (last page)
          • Under spouse details, enter your spouse's details. It asks for this info even though you are NOT applying for OCI based on your spouse. Under "Relation with Root Indian", select "self". I left the spouse Passport details empty, as they weren't mandatory. Also, I wasn't sure which Passport details (USA or Indian) it was asking for. Most likely USA, but again, why bother.
        • Page II:
          • Enter Occupation details. If no applicable category, choose "Others". Provide your employer address and contact info.
          • Enter your address, email, phone details. 10 digit phone number with no dashes is what I used.
        • Next page asks you to verify all information entered. You will see that most of spouse details are not present here (meaning they were useless to start with).
        • After you complete Part A, you will be asked to upload passport picture and signature. You are also given your File Reference number - note it down. Click on "Yes, I'm ready". It will bring, new pop up box. Browse the correct images for passport photo and signature, and click on upload. Look above on tips for uploading.
        • Once done, it will ask you to confirm that everything looks correct and then move to part "B". Once you move to Part B, you can't come back to Part A.
      • Part B: This is a much simpler form.
        • Enter Citizenship details. All questions are answered "no" except the first question, which is about "other members who have applied for OCI". Enter the details. If OCI is in progress for some other member (i.e. spouse or kids), there's no need to mention that, as OCI has not been granted yet.
        • Enter nationality details - the date you were naturalized, and the renounced Indian Passport number.
        • For any relatives staying in India, enter details of some relatives. You can enter Parents, Grandparents, Aunts, Uncles, etc. I just entered it for parents for adults or grandparents for minors (1 or 2 names are enough).
      • Once done with Part A and Part B, you will be asked to upload supporting documents. Following are the docs needed (see above general section for tips on uploading):
        • Current USA passport - page with passport details and last endorsement page (merged as 1 pdf)
        • Naturalization certificate - 1 pdf
        • Renunciation Certificate - This is the certificate that was delivered along with your cancelled Indian passport, when you applied for "Renunciation". It doesn't specifically ask for this under ADULT checklist, but asks for this under MINOR checklist. Better to upload this, if you have it.
        • Cancelled Indian Passport - first page with passport details, the page with "surrender of passport" written on it, as well as the last page with personal details. Combine all of this as 1 pdf and upload it. On the checklist, it asks for this as "Indian Origin proof". Upload it under that category, as there is no separate category for uploading "Cancelled Indian passport".
        • MINORS ONLY: If a minor is applying based on the parent's Indian origin, then both parents need to upload the below relationship documents (assuming Minor is a US citizen). Minors just need copy of their current US passport along with the below documents:
          • Birth certificate of Minor showing Parents name
          • Notarized parental Authorization form: Download from here: https://services.vfsglobal.com/one-pager/india/united-states-of-america/oci-services/pdf/parental-authorization.pdf. Needs to be signed by both the parents and notarized. See Tips section above on how to do notary for free.
          • Both Parents Indian Passport copy (1st 3 pages and last 2 pages) + OCI card copy (if OCI card granted to parents)
          • Copy of Naturalization certificate of both parents => To show current nationality
          • Copy of US passport (1st page and last 2 pages) of both parents => To show current nationality. This may not be needed, but why risk it?
          • Copy of Parents' marriage certificate => This is difficult to get. If you don't find it, write the reason on a piece of paper and notarize it. The birth certificate of the minor has the names of both parents anyway, so that should suffice, I guess. Not sure why they insist on a marriage certificate.
    3. Once you have completed all steps on ociservices.gov.in website, you are done with ociservices website. There is an option to print form. Download and Print it as we'll need to send it along with other documents. Move to VFS application now.
  2. VFS WEBSITE: Now let's start with VFS website application
    1. click on this link: https://services.vfsglobal.com/one-pager/india/united-states-of-america/oci-services/. Click on "new oci registration".
    2. Once you log into your VFS account. click on "create new application". 
      1. fill in all entries. Main category is "OCI", while Application category is "New OCI"
      2. On the next page, add the customer and the details. Enter the OCI Govt reference number that you got in step 1 above on the ociservices.gov.in website. It starts with "USA...". Passport details are for the USA passport. Fill in your Father's name from the Indian Passport. Click submit when done.
      3. On next page, select "VFS offered courier service". It will show total charges as around $350 (since it's $30 extra for using VFS provided fedex courier labels for OCI + $15 extra for returning US passport. Consular fee is $275 + other misc charges). Click continue and provide mailing address. Click validate and then submit.
      4. check "accept terms" on next page. Make a note of "application reference number". This is a new number for your VFS application. You will need this number to download "courier labels" in next step. Click on"confirm application". It will ask you for credit card details to pay the fees.
      5. After paying the fees, you will see links to download the "application confirmation letter" and "courier labels". Download and print them for later use.
  3. DOCUMENTS: Now you are done. The "application confirmation letter" in your VFS account as well as the checklist below have a list of required documents. Below are the required documents:
    1. CHECKLIST - This checklist below is to be printed out and signed on page 5. It has a list of all the documents to be sent with the application. On the top of each page, there are applicant's details to be filled in. You can hand write those details. Passport number to be written is for USA.
    2. GOVERNMENT APPLICATION ONLINE FORM - This is the application from step 1 (from ociservices.gov.in). It's 7 pages (combining both Part A and Part B). I just sent an extra picture, since there's no place to affix the picture. Page 4 (or the last page of the form before the instructions start) has a place to sign that needs to be signed by the applicant.
      • MINORS ONLY: If the applicant is a minor, then both parents need to sign and notarize the form (lower half of page 4). It says "signature of the applicant" even in the place where the parents are supposed to sign, so to be safe, have the minor sign it as well as both parents. So, for a minor, you will need to go to the notary office twice - initially to get the Parental Authorization form notarized, which is then uploaded on the govt website, which then generates the govt online form. This form then again needs to be notarized. Just more nonsense. Very important to get it notarized or your application will be rejected.
    3. PHOTOGRAPH & SIGNATURE - Attach 2 extra hard photos. NOTE: Here again it asks us to affix photo on physical govt application form. Since there is no place to affix it, send 3 photos instead of 2.
    4. PROOF OF ADDRESS - Attach a copy of Driver's license (or any other doc as listed) for this.
    5. PHOTOCOPY OF USA PASSPORT = This is your US passport. Just photocopy 1st page with your passport details, along with the last 2 endorsement pages. See Tips above.
    6. ORIGINAL USA PASSPORT = This is your US passport. The original is also needed for verification. They will return it to you in a week using one of the courier return labels.
    7. PROOF OF RENUNCIATION: Copy of renunciation certificate (It was issued when you applied for "Renunciation").
    8. PROOF OF INDIAN ORIGIN: Photocopy of first three pages and last two pages of the passport is needed. I would also include the page which has "Surrender of Passport" written on it (if it's not in 1st 3 or last 2 pages).
    9. NATURALIZATION CERTIFICATE - Photocopy of US Naturalization certificate (one that you get at US Naturalization ceremony).
    10. COURIER LABELS - If you are using VFS for courier services, then they will have a link on left side of your VFS account portal to download the 3 courier labels (one for outgoing and one for return. The 3rd one is to return your US Passport). Attach the outgoing one to the Fedex Envelope that you use to send to VFS. The 2 return labels are just to be put inside the packet, along with all the other documents.
    11. APPLICATION CONFIRMATION LETTER - This is another document that you download from your VFS account portal. It has details of the customer, money paid and all documents needed. The documents listed here aren't actually attached (i.e. no hyperlink or anything).
    12. MINORS ONLY: Since Minors may not have had Indian Passport, and are applying based on their parents origin, they need to submit additional documents:
      • Birth certificate of Minor showing Parents name
      • Notarized parental Authorization form (PAF): Download from here: https://services.vfsglobal.com/one-pager/india/united-states-of-america/oci-services/pdf/parental-authorization.pdf. Needs to be signed by both the parents and notarized. See Tips section on how to do notary for free.
      • Both Parents Indian Passport copy (1st 3 pages and last 2 pages) + OCI card copy (if OCI card granted to parents)
      • Copy of Naturalization certificate of both parents => To show current nationality
      • Copy of US passport (1st page and last 2 pages) of both parents => To show current nationality. This may not be needed, but why risk it?
      • Copy of Parents' marriage certificate => This is difficult to get. If you don't find it, write reason on piece of paper and notarize it. Birth certificate of the minor has name of both the parents, so that should suffice, I guess??
  4. MAIL: So, in all the documents above, you just need to sign in 2 places - one in the Checklist and the other in the Government application form. (For Minors, parents need to sign in 2 additional places as noted above - one on page 4 of the govt application form, and the other on the notarized PAF.) Put all the above documents in a packet and head to Fedex (if using the VFS service) or to USPS (if using your own labels). Paste the address label on the package and send it to VFS. The address is based on your area consulate. For VFS provided labels, the address is already there.
  5. STATUS: Check status as explained in general section above.

 

2. Conversion from PIO to OCI (category 2 and 3 for Adults/Minor):

Below is the complete set of steps for Adults (I've added additional steps for MINOR where needed):

  1.  GOVT WEBSITE: Apply online at government OCI website: https://ociservices.gov.in.
    1. Choose correct category for OCI card. Here Choose "OCI in lieu of valid PIO" or "OCI in lieu of lost PIO" as needed. Click Proceed. A pop up box appears. click OK. The steps below are for category 2 (OCI in lieu of valid PIO), but nearly same steps should apply for category 3 (OCI in lieu of lost PIO). There is no separate option for Minors, so the steps below apply to Minors too. Just that if the Minor is less than 5 years old, you need his/her thumb impression (left thumb for boys and right thumb for girls).
    2. The form now needs to be filled up. Long form with 2 parts (Part A and B) and bunch of docs to be uploaded.
      • Sample form here: https://services.vfsglobal.com/one-pager/india/united-states-of-america/oci-services/pdf/oci-government-sample-form.pdf
      • Part A: This is a long form with 2 pages.
        • Page 1:
          • Place of birth should be exactly as in USA Passport - for ex, if it's TEXAS, USA, enter "TEXAS USA" without the comma.
          • For visible marks, write "NONE"
          • For Parents nationality, write "USA" (if parents have become US citizens), else write "India".
          • For place of issue, write "USDOS"
        • Page 2: Enter PIO card details
        • Next page asks you to verify all information entered.
        • After you complete Part A, you will be asked to upload passport picture and signature. You are also given your File Reference number - note it down. Click on "Yes, I'm ready". It will bring, new pop up box. Browse the correct images for passport photo and signature, and click on upload. Look above on tips for uploading.
        • Once done, it will ask you to confirm that everything looks correct and then move to part "B". Once you move to Part B, you can't come back to Part A.
      •  Part B: This is a much simpler form.
        • Enter Citizenship details. All questions are answered "no" except the first question, which is about "other members who have applied for OCI". Enter the details. If OCI is in progress for some other member (i.e. spouse or kids), there's no need to mention that, as OCI has not been granted yet.
        • Enter nationality details - the date you were naturalized, and the renounced Indian Passport number.
        • For any relatives staying in India, enter details of some relatives. You can enter Parents, Grandparents, Aunts, Uncles, etc. I just entered it for parents for adults or grandparents for minors (1 or 2 names are enough).
      •  Once done with Part A and Part B, you will be asked to upload supporting documents. Following are the docs needed (see above general section for tips on uploading):
        • PIO card - first and last page
        • Current USA passport - page with passport details and last endorsement page
        • MINORS ONLY: Parental Authorization form - Notarized and filled in. See above for notary tips.
    3. Once you have completed all steps on ociservices.gov.in website, you are done with ociservices website. There is an option to print form. Download and Print it as we'll need to send it along with other documents. Move to VFS application now.
  2. VFS WEBSITE: Now let's start with VFS website application
    1. click on this link: https://services.vfsglobal.com/one-pager/india/united-states-of-america/oci-services/. Click on "oci registration (in lieu of valid/lost PIO card)". See above on how to create an account and log in.
    2. Once you log into your VFS account. click on "create new application". 
      1. fill in all entries.  Application category is "OCI in lieu of valid/lost PIO"
      2. On the next page, add the customer and the details. Enter the OCI Govt reference number that you got in step 1 above on the ociservices.gov.in website. It starts with "USA...". Passport details are for the USA passport. Fill in your Father's name from the Indian Passport. Click submit when done.
      3. On next page, select "VFS offered courier service". It will show total charges as around $150 (since it's $30 extra for using VFS provided fedex courier labels for OCI). clcik continue and provide mailing address. Click validate and then submit.
      4. check "accept terms" on next page. Make a note of "application reference number". This is a new number for your VFS application. You will need this number to download "courier labels" in next step. Click on"confirm application". It will ask you for credit card details to pay the fees.
      5. After paying the fees, you will see links to download the "application confirmation letter" and "courier labels". Download and print them for later use.
  3. DOCUMENTS: Now you are done. The "application confirmation letter" in your VFS account as well as the checklist below have a list of required documents. Below are the required documents:
    1. CHECKLIST - This checklist below is to be printed out and signed on page 5. It has a list of all the documents to be sent with the application. On the top of each page, there are applicant's details to be filled in. You can hand write those details. Passport number to be written is for USA.
    2. GOVERNMENT APPLICATION ONLINE FORM - This is the application from step 1 (from ociservices.gov.in). It's 7 pages (combining both Part A and Part B). I just sent an extra picture, since there's no place to affix the picture. Page 4 (or the last page of the form before the instructions start) has a place to sign that needs to be signed by the applicant.
      • MINORS ONLY: If the applicant is a minor, then both parents need to sign and notarize the form (lower half of page 4). The spot where the parents are supposed to sign is again labeled "signature of the applicant", so to be safe, have the minor sign it as well as both parents. So, for a minor, you will need to go to the notary office twice - initially to get the Parental Authorization form notarized, which is then uploaded on the govt website, which then generates the govt online form (in step 3). This form then again needs to be notarized. Just more nonsense. Very important to get it notarized or your application will be rejected.
    3. PHOTOGRAPH & SIGNATURE - Attach 2 extra hard-copy photos. NOTE: Here again it asks us to affix a photo on the physical govt application form. Since there is no place to affix it, send 3 photos instead of 2.
    4. PROOF OF ADDRESS - Attach a copy of Driver's license (or any other doc as listed) for this.
    5. PHOTOCOPY OF USA PASSPORT - This is your US passport. Just photocopy 1st page with your passport details, along with the last 2 endorsement pages. See Tips above.
    6. ORIGINAL PIO: Original PIO card, and a copy of all pages of the PIO card
    7. AFFIDAVIT: Affidavit in lieu of originals. This needs to be notarized. For minors, parents take this oath.
    8. COURIER LABELS - If you are using VFS for courier services, then they will have a link on left side of your VFS account portal to download the 2 courier labels (one for outgoing and one for return). Attach the outgoing one to the Fedex Envelope that you use to send to VFS. The return label is just to be put inside the packet, along with all the other documents.
    9. APPLICATION CONFIRMATION LETTER - This is another document that you download from your VFS account portal. This has details of the customer, money paid and all documents needed. The documents listed here aren't really there (i.e no hyperlink or anything)
    10. MINORS ONLY: Since Minors are applying based on their parents origin, they need to submit additional documents:
  4. MAIL: So, in all the documents above, you just need to sign at 2 places - one in the Checklist and the other in the Government application form. (For Minors, parents need to sign in 2 additional places as noted above - one on page 4 of the govt application form, and the other on the notarized PAF). Put all the above documents in a packet and head to Fedex (if using VFS service) or to USPS (if using your own labels). Paste the address label on the package and send them to VFS. The address is based on your area consulate. For VFS provided labels, the address is already there.
  5. STATUS: Check status as explained in general section above.

 

3. Misc OCI services, lost OCI, etc (category 4 and beyond):

I don't have details on these, but if you read thru any of the categories above, you will get a good idea of how to proceed. Steps should look pretty similar to other categories of OCI.

 

Indian Passport:

For any Passport related services, you have to use VFS. Indian Passport services in USA are much faster. This is the link to apply for renewal/new Indian passport:

https://visa.vfsglobal.com/usa/en/ind/apply-passport

Steps are same as what we saw above for OCI services. You can read thru the above nonsense and apply it here too. I guess it's no less torture than any other application category.

 

GOOD LUCK TO EVERYONE ON COMPLETING THIS TORTUROUS JOURNEY

Programming Frameworks:

When learning NN, we wrote a lot of our own functions for finding the optimal weights. There were too many parameters to tune, too many algorithms to chase, and writing each of them from scratch for each project isn't very productive. So, the idea is to put all of this into reusable python modules. That's what programming frameworks are: they provide all these functions pre-written for us in a compact library, with a lot of additional features for speed, efficiency, etc. The most popular AI frameworks are PyTorch, TensorFlow, Keras, etc. These are all open source.

TensorFlow:

TensorFlow (tf) is one of the programming frameworks used in AI. It was developed by google and is open source now. The TensorFlow framework provides a collection of libraries to develop and train ML models using pgm languages such as python, javascript, etc. We'll concentrate on using TensorFlow in Python only, since tf is most widely used with python. Its flexible architecture allows for the easy deployment of computation across a variety of platforms (i.e CPUs, GPUs, TPUs).

Official website for tensorflow is:

https://www.tensorflow.org

This is a good place to get started with basic syntax and installation:

https://www.tensorflow.org/guide

Gotchas of TensorFlow:

Caution: If you start learning tensorflow, there's actually no clear tutorial or simple documentation for it. So, you learn by examples: you write some cryptic looking code, and it does the job. It's very hard to see why it works, how it works and how to debug it if it fails. In raw python, you can just debug by writing your debug code and having enough "print" statements to see where it went wrong. In tf, a lot of steps are combined into one cryptic function, and if it doesn't return the desired result, there's little help beyond cryptic looking messages. There's TensorBoard that supposedly helps you in this debug process. I haven't tried that yet. A lot of AI folks hate tensorflow for its obscure programming style. One such rant here:

http://nicodjimenez.github.io/2017/10/08/tensorflow.html

A lot of these complaints are about the initial version of TF known as TensorFlow 1. So, google came out with a new revision of TensorFlow, called TensorFlow 2, which supposedly is better than the earlier version. More details below.

Installation:

Tensorflow is installed as a python module, just like any other module. tf pkgs are available as TensorFlow 1 and TensorFlow 2: tf 1 is the older tf, while tf 2 is the newer one.

TensorFlow 1: This is original TensorFlow pkg (one with lots of complaints). Version 1.0 of TensorFlow was released in 2017. Final version of TensorFlow 1 is 1.15. For TensorFlow 1.x, CPU and GPU packages are separate:

  • tensorflow==1.15 —Release for CPU-only
  • tensorflow-gpu==1.15 —Release with GPU support

TensorFlow 2: When Facebook released their own ML framework called PyTorch, it immediately started gaining ground against TF. By 2018, the popularity of TensorFlow started declining. PyTorch seemed more intuitive to people. So, google made a major version release of TensorFlow, named TensorFlow 2. This was released in 2019. TensorFlow 2.0 introduced many changes, the most significant being TensorFlow eager, which changed the automatic differentiation scheme from the static computational graph to the "define by run" scheme originally made popular by Chainer and later PyTorch. Here, the CPU and GPU packages are combined into one.

Migration from TF1 to TF2:

We can write our code in TF 1, and then migrate that code to be able to run in TF 2, by applying very few changes to TF 1 code. This link shows how:

https://www.tensorflow.org/guide/migrate

Install TensorFlow 1:

TensorFlow gets installed by installing the "tensorflow" package using pip3. The documentation doesn't say which major version gets installed by running the install cmd. It looks like tf 2 gets installed by default, provided your system meets the requirements. However tf 2 needs newer python and pip3 versions. Basic tf installation needs python 3.5 or greater and pip3. For tf 2 we need python 3.8 or greater (not sure??) and pip3 version 19.0 or greater. First check the versions, to find out if tf can even be installed or not, and if so, which major version.

$ python3 --version => returns Python 3.6.8 on my local m/c
$ pip3 --version => returns "pip 9.0.3 from /usr/lib/python3.6/site-packages (python 3.6)" on my local m/c

TensorFlow can be installed on CentOS 7 via following cmd in a terminal (assuming pip is installed).

sudo python3.6 -m pip install tensorflow => Type exactly as is. If you omit any of the options (i.e not doing sudo or not using python3.6), the cmd will give you a lot of errors, and won't be able to install tensorflow for python 3.6. After installing, check in the python3.6 dir to make sure the package is there:

$ ls /usr/local/lib64/python3.6/site-packages/ => shows the following tensorflow related new dirs. As can be seen, TensorFlow 1.14 got installed (probably due to not meeting the python3 and pip3 requirements for tf 2). Even though the last release for tensorflow 1 is 1.15, we see that 1.14 got installed (and not the latest 1.15) - this may be due to some other system requirement not being met.

tensorflow-1.14.0.dist-info/

keras_applications/                
Keras_Applications-1.0.8.dist-info/   
keras_preprocessing/      
Keras_Preprocessing-1.1.2.dist-info/    

TensorFlow 1 vs TensorFlow 2: TensorFlow 2 is a radical departure from TensorFlow 1, and if you are planning to learn TensorFlow, don't even bother with tf 1. Just learn tf 2 and you will be saved from a lot of grief. However, we are going to learn tf 1, as that is what gets used in the Coursera Deep Learning courses. If we use tf 2, we may not be able to get our programming assignments to work (as they are written for tf 1), and tf is still cryptic enough that any bug is not easy to debug.

NOTE: The documentation on google's site doesn't mention what version exactly gets installed when you install tensorflow as above. Nor is there any documentation to help us understand how to install only tf 1 or tf 2. It just happened that tf 1 v1.14 got installed for me. From the tons of warnings that I receive from the installed version, it looks like not everything got installed the right way for tf 1. However the warnings seemed benign, so I continued on. I didn't try to update python3 and pip3 to see if tf 2 would get installed. The latest python3 version may not be available on many linux distros, and updating may even break your other python applications. It's always a major risk to update python3 and pip3, so I would be careful doing that on my system. So, let's live with tf 1 for now.

Syntax: Even though we have installed tf 1, all documentation on tensorflow.org refers to tf 2. My notes below are relevant for tf 1, but I'll highlight tf 2 wherever applicable.

Tensorflow is like any other module in Python. So, all cmds of python remain the same, except that we call functions/methods in tensorflow as "tf" followed by dot and the function/method name (i.e tf.add(), just as we do for any other module in python). Some of these functions/methods have to be run a certain way though, which is the start of the cryptic coding style of tf. We'll see details later.

After installing tf, run a quick pgm to see if everything got installed correctly. Name file as "test_tf.py".

#!/usr/bin/python3.6

import math
import numpy as np
import tensorflow as tf

print(tf.version) #print version

We see that on running above pgm, we get tons of warnings as below: those are OK. Also, the version gets printed with the file name and module v1. This indicates it's tf 1. For tf 2, we would see v2 instead of v1 (my hunch??).

/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. => These are the warnings ...
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])

.... and so on ....

<module 'tensorflow._api.v1.version' from '/usr/local/lib/python3.6/site-packages/tensorflow/_api/v1/version/__init__.py'> => this line shows "v1" implying it's tensorflow 1. So, our TF 1 got installed correctly !!
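By the way, if you just want the bare version string instead of the module path, printing tf.__version__ should give it (this extra print is my own addition, not part of the run above):

print(tf.__version__) #should print just the version string, i.e something like 1.14.0 here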

Tensor Data Structure

TensorFlow is "flow of tensors". Tensors are used as the basic data structures in TensorFlow language. Tensors represent the connecting edges in any flow diagram called the Data Flow Graph. Tensors are defined as multidimensional array or list with a uniform type (called a dtype). If you're familiar with NumPy, tensors are (kind of) like np.arrays. Tensor objects are rep as tf.Tensor.

You can see all supported dtypes at tf.dtypes.DType(). Some of the Dtype objects are float, int, bool and string (i.e tf.float16/32/64, tf.int16/32/64, tf.uint16/32/64, tf.bool, tf.string and a few more). These look the same as numpy dtypes (array1.dtype). So, there's no difference b/w the two, except that one is for numpy, while the other one is for tf. Not sure why they had to define their own data types, when they are the same as numpy data types.

NOTE: All tensors are immutable like Python numbers and strings: you can never update the contents of a tensor, only create a new one.

When writing a TensorFlow program, the main object that is manipulated and passed around is the tf.Tensor. A tf.Tensor object has a shape/rank (dimensions as rows, columns, etc. same as shape in numpy) and dtype (data type of tensor elements,  all of which need to be of the same data type). We operate on these tensor objects, i.e add, multiply, etc (just like what we do for any arrays).

Tensors can be of any rank (i.e dimension). See in python - numpy section for details on arrays.

Rank 0 Tensor: Rank 0 means it's a scalar and not an array. ex: 4. Higher rank tensors are arrays: rank 1 is a vector, rank 2 is a matrix, and so on.

Rank 1 Tensor: Rank 1 is an array with 1 dim. Ex: [2, 3, 4]

Rank 2 Tensor: Rank 2 is an array with 2 dim. Ex: [ [2.1, 3.4], [3.5, 4.0] ]

And so on for higher dim Tensors.

NOTE: commas are needed to separate individual elements, as in numpy arrays. numpy arrays can be used in many TensorFlow functions/methods. To carry out any computation on these tensors, such as matrix multiplication, matrix addition, etc, we use tf functions/methods instead of numpy functions/methods. Just as a numpy array holds values of a single data type, the values present in a tensor all hold an identical data type, with the known dimensions of the array. So, tensors are the same as arrays in numpy for all practical purposes.

ex: tensor_2d = np.array([(1,2,3,4),(4,5,6,7),(8,9,10,11),(12,13,14,15)]) => declares a 2D tensor, and can be used in tensor operations. NOTE: tensor_2d is just a 2D numpy array.

Specialized Tensors: Constants, Variables, and Placeholders: We'll learn about how to create Tensors of different data types, and different ranks. There are many functions available to do this, but we'll look at 3 most important ones.

1. tf constants: TensorFlow constant is the simplest category of Tensors. It is not trainable, and its value is fixed at the time of definition. It is used to store constant values. The "constant" function is used here to declare constants of any rank.

syntax: constant(value, dtype=None, shape=None, name='Const', verify_shape=False) => where value is the constant value that will be used; dtype is the data type of the value (float, int, etc.); shape defines the shape of the constant (it's optional); name defines the optional name of the tensor, and verify_shape is a Boolean value that will verify the shape.

ex: L=tf.constant(10, name="length", dtype=tf.int32) => Defines constant 10, with name "length" and of type int32.

print("L=",L) => prints L= Tensor("length_1:0", shape=(), dtype=int32) => This shows that the object is a Tensor object with shape blank (since it's a scalar) and type int32. NOTE: it doesn't display the value of constant. In tf 1, values are computed when session is run. We'll learn running sessions later. In TF 2, looks like the data is printed right here, even w/o running the session (that is what google tensorflow tutorials show).

ex: c = tf.constant([[4.0, 5.0, 1.2], [10.0, 1.0, 4.3]]) => Defines a rank 2 tensor. Since the type is not specified, it's automatically inferred based on the contents of the tensor. Here the type is float32.

print("c=",c) => prints c= Tensor("Const_2:0", shape=(2,3), dtype=float32) => This shows the object is a Tensor of rank=2 with shape=(2,3). As expected, the type is assigned as float, even though we never explicitly assigned the type. The name here is "Const_2:0", since we didn't assign a specific name.

2. tf placeholders: TensorFlow placeholder is basically used to feed data to the computation graph during runtime. Thus, it is used to take the input parameter during runtime. We use the feed_dict argument of sess.run() to feed the data to the tensors during session runtime. How to do this is explained later. Function "constant" discussed above had a constant value assigned at the time of declaration, but here we assign the value when running the session. Declaration of a TensorFlow Placeholder is done via function "placeholder"

syntax: placeholder(dtype, shape=None, name=None) => Here, dtype is the data type of the tensor; shape defines the shape of the tensor, and name will have the optional name of the tensor.

ex: L2= tf.placeholder(tf.float32) => placeholder of type float32. Here we didn't define the shape, so any shape tensor can be stored into it.

print("L2=",L2) => prints L2= Tensor("Placeholder:0", dtype=float32). => Note: name here is "Placeholder:0" since we didn't specify a name.

ex: sess.run(L2, feed_dict = {L2: 3}) => This assigns a value of 3 to L2 during session run time. NOTE: We have to put L2 as 1st arg in sess.run to actually run Tensor "L2". If we don't do that, it will error out.

ex: L2= tf.placeholder(tf.float32, shape=(2,3)) => Here it's an array of rank=2. NOTE: the order of positional args has to be the same as defined in the syntax above (or use keyword args like shape=(2,3)), else it errors out.

print("L2=",L2) => prints L2= Tensor("Placeholder:0", shape=(2, 3), dtype=float32)

ex: sess.run(L2, feed_dict = {L2: [[2, 3, 1],[1, 2, 1]]}) => This assigns array values as shown to L2 during session run time.
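Putting the placeholder pieces together, here's a minimal runnable sketch of my own (assuming tf 1 as installed above; sessions are explained in detail a bit later):

import tensorflow as tf

x = tf.placeholder(tf.float32) #placeholder with no fixed shape
y = x * 2.0 #operation on the placeholder (operators like * are overloaded)

sess = tf.Session()
print(sess.run(y, feed_dict = {x: 3.0})) #prints 6.0
print(sess.run(y, feed_dict = {x: [1.0, 2.0]})) #prints [2. 4.]
sess.close()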

3. tf Variables: These are variables used to store values that can change during operation. We can assign initial values and they can store other values later. These are similar to variables in other languages. We use function "Variable" to define a var. tf Variables act and feel like Tensors and are backed by tf.Tensor. Like tensors, they have a dtype and a shape. For all practical purposes, we can treat them as Tensors.

ex: Here we define a constant, and then create a var using that constant as the initial value. We don't define type and shape as they are automatically inferred. We don't define a name either.

my_tensor = tf.constant([[1.0, 2.0], [3.0, 4.0]]) => Here we define a constant
my_var = tf.Variable(my_tensor) => Here we defined a variable "my_var" which has the initial value defined by the constant above. Its shape is automatically inferred to be (2,2) and type as float32.

print("my_var = ",my_var) => prints my_var = <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32_ref> => NOTE: here it doesn't print "Tensor" but instead prints "tf.Variable" as Varaibles are not Tensor objects, but are backed by tf.Tensor. We have to explicitly convert them to tensors (by using tf.convert_to_tensor func explained later), if any function requires a Tensor as i/p.

ex: my_var = tf.Variable([[1.0, 2.0], [3.0, 4.0]]) => This is exactly same as above. We just put the initial value of variable into the function itself.

ex: Var1= tf.Variable(tf.zeros((1,2)), dtype=tf.float32, name="Var1") => Here we create a 2D tensor with shape=(1,2) named "Var1" that we init to 0. See syntax of tf.zeros later.

print("var1 = ",Var1) => prints var1 =  <tf.Variable 'Var1:0' shape=(1, 2) dtype=float32_ref>

Non trainable variables: Although variables are important for differentiation, some variables will not need to be differentiated. You can turn off gradients for a variable by setting trainable to false at creation. An example of a variable that would not need gradients is a training step counter.

ex: step_counter = tf.Variable(1, trainable=False) # initial value is assigned to 1. However by declaring it as non-trainable, we prevent it from differentiation.

ex: Var=tf.Variable( tf.add(x, y), trainable=False) => Here we define variable "Var", which is the sum of tensors/arrays x, y. We don't explicitly init it to anything, because the init value is picked up from the result of tf.add(x, y).

Irrespective of whether we initialized variables or not, actual initialization of these variables does NOT take place at the time of defining, but when we run func "global_variables_initializer()".

ex: init= tf.global_variables_initializer() #Here we assigned this func to "init". Now, initialization takes place when we run init as "sess.run(init)"

tf.get_variable() => Gets an existing variable with these parameters or creates a new one. Not sure about the diff b/w tf.Variable() and tf.get_variable(). Maybe here we get many more options for initialization, regularization, etc.

Syntax: tf.get_variable(name, shape=None, dtype=None, initializer=None, regularizer=None, ... many more options):

ex: tf.get_variable("W1", [2,3], initializer = tf.contrib.layers.xavier_initializer(seed = 1)) => This creates a var "W1" with shape=(2,3) and initializer set to Xavier initialization.

Difference b/w constants, placeholders and variables: constants are easy = their value remains fixed. Placeholders are like constants, but they allow us to change their values at run time so that we can run the pgm with many different values. Variables are like variables in any other pgm language => They allow us to store results of any computation.

shape, type, numpy: We can get the shape or type of any Tensor by using Tensor.shape, Tensor.dtype, etc (i.e my_var.shape, my_var.dtype). We can also convert Tensors to numpy by using Tensor.numpy() (i.e my_var.numpy() should print the array [[1.0, 2.0], [3.0, 4.0]]). However, in my installation (tf 1), it gives an error => AttributeError: 'RefVariable' object has no attribute 'numpy'.
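Since .numpy() errors out in tf 1, a hedged workaround (my own sketch) is to evaluate the variable inside a session (sessions are explained in the next section) to get the numpy array back:

import tensorflow as tf

my_var = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
init = tf.global_variables_initializer() #variables get their values only when this is run

sess = tf.Session()
sess.run(init)
print(my_var.shape, my_var.dtype) #shape and dtype are available without a session
print(sess.run(my_var)) #prints [[1. 2.] [3. 4.]] as a numpy ndarray
sess.close()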

TensorFlow programs: Once we have created tensors (constants, placeholders, variables), we can use these in TensorFlow programs. Making a tf pgm involves three components:

  • Graph: It is the basic building block of TensorFlow that helps in understanding the flow of operations.
  • Tensor: It represents the data that flows between the operations. Tensors are constants/variables that we created above. operations are add, multiply, etc. In the data flow graph, nodes are the mathematical operations and the edges are the data in the form of tensor, hence the name Tensor-Flow.
  • Session: A session is used to execute the operations. Session is the most important and odd concept in TF. More details later.

Writing and running programs in TensorFlow has the following steps:

  1. Create Tensors (constants, placeholders, variables, as shown above) that are not yet executed/evaluated.
  2. Write operations between those Tensors (i.e multiply, add, etc). These operations can be done via tf functions as add, mul, etc or by using plain +, * etc as these operators are overloaded in tf (since these are originally in python). We can also use numpy arrays as i/p to Tensor operators, as the arrays will automatically be converted into Tensors. When you specify the operations needed for a computation, you are telling TensorFlow how to construct a computation graph. We put them in computation graph, but we haven't run them yet. This is different than what we do in conventional programming, where computation is carried out, as soon as we write the operation. The computation graph can have some placeholders whose values you will specify only later.
  3. Initialize your Tensors. Constants are already initialized via func "constant()", but variables need to be initialized using func "global_variables_initializer()" shown above.
  4. Create a Session using function "Session()". A Session object encapsulates the environment in which Operation objects are executed, and Tensor objects are evaluated.
    • syntax: tf.Session(target= ' ', graph=None, config=None). Usually no args are provided, so, just call Session() and pass the handle to a var.
    • ex: sess=tf.Session().
  5. Run the Session, using method "run()" on that session.  By running the session we can get values of Tensor objects and results of operations.This will run the operations you'd written above. You have to specify the function/method inside run() to run that particular func. If you have defined placeholders, you need to assign their values here. When you run the session, you are telling TensorFlow to execute the computation graph.
    • syntax: run(fetches, feed_dict=None, options=None, run_metadata=None) => Runs operations and evaluates tensors in fetches. This method runs one "step" of TensorFlow computation, by running the necessary graph fragment to execute every Operation and evaluate every Tensor in fetches, substituting the values in feed_dict for the corresponding input values. options and run_metadata are not used for our purposes.
    • The fetches argument may be a single graph element, or an arbitrarily nested list, tuple or dict containing graph elements at its leaves. A graph element can be one of the following types (there are few more, but we list 2 that we mostly use: Operation and Tensor):
      • A tf.Operation. The corresponding fetched value will be None.
      • A tf.Tensor. The corresponding fetched value will be a numpy ndarray containing the value of that tensor. This is important to note that the fetched value is not Tensor but numpy ndarray.
    • The optional feed_dict argument allows the caller to override the value of tensors in the graph. Each key in feed_dict can be a tf.Tensor, the value of which may be a Python scalar, string, list, or numpy ndarray that can be converted to the same dtype as that tensor. Each value in feed_dict must be convertible to a numpy array of the dtype of the corresponding key.
    • The value returned by run() has the same shape as the fetches argument, where the leaves are replaced by the corresponding values returned by TensorFlow.
    • ex: a = tf.constant([10, 20]); b = tf.constant([1.0, 2.0])
      • v = sess.run(a) => Here a is evaluated. Since "fetches" arg is a single graph element of type Tensor, return value is numpy array [10, 20]
      • v = sess.run([a, b]) => Here a and b are evaluated. Since "fetches" arg is a list of 2 graph elements of type Tensor, return value is a list with 2 numpy array [10, 20] and [1.0, 2.0]
  6. Close the session, using method "close()" on that session. A session may own resources, so by closing the session, we release the resources.

So, why do we do these complicated steps of making a graph, and then running them via session? Most likely, this is to map these computations to different nodes of CPU/GPU/TPU, etc. We keep on defining various operations of the final graph (i.e add, mul, etc to calculate cost function), and then map it to various nodes of GPU/TPU. Once we've mapped these, then in the very last step, we simply provide i/p values to the i/p nodes, and processor can easily compute the node value for all nodes in the graph.

ex: Below is an example where we multiply 2 constants to get the result.

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b) # we can also write c=a*b, as operators are overloaded
print(c) # You will not see result for c=20, but instead get this Tensor => "Tensor("Mul:0", shape=(), dtype=int32)". 
#This says that the result is a tensor with an empty shape (i.e a scalar), and it is of type "int32".

sess = tf.Session() # In order to actually multiply the two numbers, you will have to create a session and run it.
print("result = ",sess.run(c)) # Now, we run the session for computation graph c. we get the result 20.
print(c) # This will again not show 20, but show the Tensor.
#Reason is that return value of sess.run(c) has to be grabbed to get the value of c.
print("res =", sess.run(c**2) => We could write any operation, i.e c**2 and it would compute c^2=400. This prints 400.
#Or o/p of sess.run(c) can be stored in a var and printing that var will show the value 20. i.e
tmp=sess.run(c)
print(tmp) # This will print value 20, as o/p is stored in tmp, which won't change now.
sess.close()
 

ex: Here we solve the eqn Y=m*X+C. Here Y is computed for variables m, C (with fixed initial values) and placeholder X, which is fed values from 10 to 40.

m= tf.Variable( [2.7], dtype=tf.float32) #define m as var with initial value 2.7

C= tf.Variable( [-2.0], dtype=tf.float32) #define C as var with initial value -2.0

X=tf.placeholder(tf.float32) #X is defined as a placeholder, as its value is going to be assigned during session runtime later.

Y = m*X + C #we write the eqn directly instead of using func add, mul, as this involves single value computation

sess = tf.Session() #create session

init = tf.global_variables_initializer() #func to Initialize all the variables, as var initialization takes place only via this func

sess.run(init) #running session for init

print(sess.run( Y, feed_dict = {X :[10, 20, 30, 40]})) #Running session for computing Y. feed_dict func used to feed X data. We see o/p as: [ 25.  52.  79. 106.]

sess.close()

In both the examples above, we see a lot of warnings about many of these names being deprecated. This is likely because TensorFlow 2 (v2) is now released, so the earlier TensorFlow 1 (v1) names have been moved to tf.compat.v1.* (compatibility version v1) so as to not cause confusion. I see the warnings below; adding compat.v1 to the names should get rid of them. I haven't tried that yet (UPDATE: on trying that, a lot of other things broke, so it's not worth it just to fix these warnings).

WARNING:tensorflow:From ./test_tf.py:14: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From ./test_tf.py:25: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From ./test_tf.py:28: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

Running Sessions: We ran session above using one of the ways to run sessions. There are actually 2 ways to run sessions:

Method 1: This is the method we used above

sess = tf.Session()
# Run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
sess.close() # Close the session

Method 2: This is the method that is more concise (requires less lines of code)

with tf.Session() as sess: 
    # run the variables initialization (if needed), run the operations
    result = sess.run(..., feed_dict = {...})
    # This takes care of closing the session for you :)
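For ex, the constant-multiply example from earlier, rewritten with method 2 (a minimal sketch of my own, same result as before):

import tensorflow as tf

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)

with tf.Session() as sess: #session gets closed automatically when the block exits
    print("result = ", sess.run(c)) #prints result = 20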

TensorFlow Functions: We'll look at a few important functions used in tf. Many functions take i/p as Tensors or numpy ndarrays with no issues.

tf.one_hot() => This returns a 1 hot tensor (a tensor is an array or list). One hot is used very widely in AI in multi class classification. Here we have an o/p vector Y which has a classification for each i/p vector. As an example, consider a picture which can be cat, dog, mouse, or others. So, a given picture can be any of these 4 classes. We give these classes numbers as cat=0, dog=1, mouse=2, others=3. In AI, we write the o/p vector for 6 different pictures as Y=[1 3 0 2 0 2] => This implies the 1st picture is dog (class=1), the 2nd picture is others (class=3) and so on. However, we can't use this Y vector directly in NN equations, as we need to write it in a form which says whether each picture is cat/not-cat, dog/not-dog, mouse/not-mouse, others/not-others. This is the same form as what we wrote for 2 class classification, which said if a picture is cat or not-cat.

To write in above form, we need to have 4 cols for each picture, each of which says whether it's cat/not-cat, dog/not-dog, mouse/not-mouse, other/not-others. 

So, Y(1-hot) =

[ [ 0 1 0 0 ]    => 1st row is for picture 1: it's not-cat, is dog, not-mouse, not-others (i.e it's a dog picture written in 1 hot form: 1 for dog, 0 for the rest)
  [ 0 0 0 1 ]
  [ 1 0 0 0 ]
  [ 0 0 1 0 ]
  [ 1 0 0 0 ]
  [ 0 0 1 0 ] ]  => 6th row is for picture 6: it's not-cat, not-dog, is mouse, not-others (i.e it's a mouse picture written in 1 hot form: 1 for mouse, 0 for the rest)

syntax: tf.one_hot (indices, depth, on_value=None, off_value=None, axis=None, dtype=None, name=None) => Of all args, important ones are indices and depth. Indices is a tensor (or an array or list) and the locations represented by indices in indices take value on_value (default=1), while all other locations take value off_value (default=0). depth is a scalar (i.e a single number) that defines the depth of 1 hot dimension.

If indices is a scalar the output shape will be a vector of length depth.

If the input indices is rank N, the output will have rank N+1. The new axis is created at dimension axis (default is -1: which means the new axis is appended at the end). If indices is 1D (i.e rank=1), then for axis=-1, the shape of output is (length_of_indices X depth), while for axis=0, the shape of output is (depth X length_of_indices)

ex:

indices = np.array([1,2,3,0,2,1]) #Here indices is a 1D array with rank=1

depth=4 #depth is needed since we don't know how many total classes we have for classification. indices may not contain all the classes.

one_hot = tf.one_hot(indices, depth) #one_hot is a 2D tensor of shape (len(indices), depth) => since axis is not specified, it's set to the default of -1.

print("one_hot = \n", one_hot) => This prints the Tensor w/o computing it. So, it prints => Tensor("one_hot:0", shape=(6, 4), dtype=float32). We need to run session in order to compute the graph.

sess = tf.Session() #create session

one_hot = sess.run(tf.one_hot(indices, depth)) #sess.run (note: not run.sess) actually computes the graph and returns the numpy array

sess.close() #session can be closed once computation graph has run.

print ("one_hot = \n", one_hot) => This prints the one_hot tensor vector as below. one_hot is a 2D tensor, with shape (6,4)

one_hot = 
[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [1. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 1. 0. 0.]]

tf.zeros() / tf.ones() => These functions initialize a tensor to zeros or ones. They take in a shape and return an array of dimension shape, full of zeros or ones respectively. Here shape is a list of integers, a tuple of integers, or a 1-D Tensor. These are the same as numpy np.zeros / np.ones, except that in numpy the shape is usually given as a tuple (a,b), while in tf, the shape can also be given as a list or 1D tensor = [a,b].

ex: tf.zeros((1,2)) => Here we create 2D tensor with shape=(1,2). NOTE: we specified shape in tf.zeros as (1,2) which is same syntax as numpy. However, we can specify shape as array too, i.e tf.zeros([1,2]). This is what you will see used more commonly.

ex: tf.ones([2, 3], tf.int32) => This returns 2D tensor of shape(2,3) =  [[1, 1, 1], [1, 1, 1]]

ex: tf.zeros([3]) => This returns 1D tensor of shape (3,) = [0. 0. 0.]
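A quick runnable check of both functions (my own sketch, assuming tf 1 and a session as shown earlier):

import tensorflow as tf

z = tf.zeros([3])
o = tf.ones([2, 3], tf.int32)

with tf.Session() as sess:
    print(sess.run(z)) #prints [0. 0. 0.]
    print(sess.run(o)) #prints [[1 1 1] [1 1 1]]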

tf.convert_to_tensor()  => This converts  Python objects of various types to Tensor objects. It accepts Tensor objects, numpy arrays, Python lists, and Python scalars. "tf.Variable" which is not a Tensor object is converted into Tensor type by using this func.

syntax: tf.convert_to_tensor(value, dtype=None, dtype_hint=None, name=None) => This converts "value" into a tensor. dtype is the element type for the returned tensor. If missing, the type is inferred from the type of value.

ex: y1 = tf.convert_to_tensor([[1.0, 2.0], [3.0, 4.0]]) => This converts the array into a tensor object.

print("y1=",y1) => returns "y1= Tensor("Const_4:0", shape=(2, 2), dtype=float32)" => this shows that it's a Tensor now, with type float32 inferred from the i/p type.

tf.train.GradientDescentOptimizer(learning_rate = 0.005).minimize(cost) =>  tf.train.GradientDescentOptimizer is an Optimizer that implements the gradient descent algorithm. It uses the learning rate specified. It has a method "minimize" which adds operations to minimize loss by updating var_list. The minimize() method simply combines calls to compute_gradients() and apply_gradients(). This whole function with its method is called in Tensorflow to do back propagation and parameter update for 1 iteration on the "loss" equation. We iterate over it multiple times to get the optimal "weights" that give the lowest loss.

syntax of minimize: minimize(loss, var_list=None) => loss is a Tensor containing the value to minimize. var_list is an Optional list or tuple of "tf.Variable" objects to update to minimize loss. Defaults to the list of variables collected in the graph under the key GraphKeys.TRAINABLE_VARIABLES.

For Adam optimizer, we can use AdamOptimizer.

ex: optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
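To see minimize() in action on a toy problem, here's a minimal sketch of my own (not from the course): we minimize cost = (w-5)^2, so gradient descent should drive w towards 5.

import tensorflow as tf

w = tf.Variable(0.0, dtype=tf.float32) #the parameter to optimize
cost = tf.square(w - 5.0) #toy cost function, minimum at w=5
train_step = tf.train.GradientDescentOptimizer(learning_rate = 0.1).minimize(cost)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(100):
        sess.run(train_step) #1 iteration of back prop + parameter update
    print(sess.run(w)) #should be very close to 5.0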

tf.nn.softmax_cross_entropy_with_logits: => This function computes softmax cross entropy between logits and labels. logits are the o/p of last nn layer, before it feeds into the exponential function. So, Z[L] is the logit. It's a matrix of shape (c,m), where c=num of classes, m=num of examples. It feeds into sigmoid function for a binary classifier to yield a[L]. For Multi class classifier, it feeds into exponent function to yield a[L] which is a matrix of same shape as Z[L] . Labels is a matrix of same shape as Z[L]. Softmax cross entropy is the loss function that is defined in AI section, i.e Loss(Y, Yhat) = - ∑ Yj * loge(Yhat(j)) where Yhat = a[L] and Y is output labels vector. Backpropagation will happen into both logits and labels.

syntax: tf.nn.softmax_cross_entropy_with_logits(labels, logits, axis=-1, name=None) => It returns a Tensor that contains the softmax cross entropy loss. Its type is the same as logits and its shape is the same as labels except that it does not have the last dimension of labels. So, loss returned is a vector where each entry is for each example.

ex: Here it's a (2,3) matrix for logits and labels. As per the syntax, logits and labels are transposed, so the shape of logits and labels feeding into this function is (m,c). So, below we have data for 2 examples, and 3 classes. labels don't need to be 1-hot; they can be probability values that add up to 1.

logits = [[4.0, 2.0, 1.0], [0.0, 5.0, 1.0]] => Here Z[L] for 1st example is [4,2,1], while for 2nd example it's [0,5,1]
labels = [[1.0, 0.0, 0.0], [0.0, 0.8, 0.2]] => Here the probability of the 3 classes for the 1st example is 1, 0, 0, while for the 2nd example it's 0, 0.8, 0.2.
print(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
Tensor("softmax_cross_entropy_with_logits_sg/Reshape_2:0", shape=(2,), dtype=float32) => o/p shape is 1D vector with 2 entries, 1 for each example
print("y=",sess.run(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)))
y= [0.16984604 0.82474494] => This is the loss value computed for the 2 examples.

This is how it's computed. NOTE: log is with base e (i.e it's ln and NOT log with base 10):
Loss for 1st example = -( 1.0*log(e^4/(e^4+e^2+e^1)) + 0.0*log(e^2/(e^4+e^2+e^1)) + 0.0*log(e^1/(e^4+e^2+e^1)) ) = -( 1*log(54.6/(54.6+7.4+2.7)) + 0 + 0 ) = -log(54.6/64.7) = -(-0.17) = 0.17
Loss for 2nd example = -( 0.0*log(e^0/(e^0+e^5+e^1)) + 0.8*log(e^5/(e^0+e^5+e^1)) + 0.2*log(e^1/(e^0+e^5+e^1)) ) = -( 0 + 0.8*log(148.4/(1+148.4+2.7)) + 0.2*log(2.7/(1+148.4+2.7)) )
= -( 0.8*log(148.4/152.1) + 0.2*log(2.7/152.1) ) = -( 0.8*(-0.025) + 0.2*(-4.03) ) ≈ 0.82

This computation matches closely with what's computed by the softmax function.
 

tf.reduce_mean: This computes the mean across all entries of an array (same as numpy np.mean). This is used in conjunction with the above softmax function to calculate the final cost. Final cost is the mean of all the costs (i.e sum of all the costs for "m" examples, divided by "m").

ex: tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)) => This returns y= 0.4972955, which is the mean of the entries in the array returned above = (0.17+0.82)/2 = 0.49
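Putting the two functions together into a runnable snippet (my own sketch, using the same logits/labels as above):

import tensorflow as tf

logits = tf.constant([[4.0, 2.0, 1.0], [0.0, 5.0, 1.0]])
labels = tf.constant([[1.0, 0.0, 0.0], [0.0, 0.8, 0.2]])

loss_per_example = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits) #may print a deprecation warning in tf 1.14
cost = tf.reduce_mean(loss_per_example)

with tf.Session() as sess:
    print(sess.run(loss_per_example)) #approx [0.17 0.82]
    print(sess.run(cost)) #approx 0.497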

 --------

LINKS:

This section is for putting random links to websites, articles or anything else that have been useful or interesting to me:

 


 

General Websites:

www.wikipedia.org: => Number 1 website for all my learning needs. Whether it's mathematical, geographical, complex advanced science material, wikipedia has always given me the best material to start with.

www.slickdeals.net: => This is the number 1 website for all your deals. However, you may not want to buy anything by clicking the link from slickdeals, as you don't make any cashback from there. There are cashback websites that give you cash for buying things on the internet, so use those for making money. Two reliable ones that I use are topcashback.com and rakuten.com

www.doctorofcredit.com => This is another very good website for finding any deal that makes you money. It's different from slickdeals, in that it puts up all deals (financial, credit cards, cashback websites, etc) that are not necessarily sponsored. You will never find these kinds of deals on slickdeals unless they are sponsored. The comments on this site also help a lot in deciding if you should pursue a deal or not.

www.stallman.org => This is the site of Richard M Stallman (rms), who started the open source revolution (Open Source Foundation and GNU project). "Basic human freedom" is the cornerstone of his views.

 


 

Educational sites:

3blue1brown: I would have never known about this site, had I not run into it during an AI video search. It's a channel+website started by a Stanford guy named Grant Sanderson. Absolutely amazing !! If you can find your topic in his videos, you don't need to watch any other video for that topic, that's how good they are. Lots of topics on Maths, AI, Crypto, etc. Not even sure how 1 person can have absolute expertise in such unrelated fields. Learning a lot:

Personal site with videos: https://www.3blue1brown.com

YouTube Video channel (named 3blue1brown): https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw

 


 

Puzzles:

Maths puzzles for kids: https://www.weareteachers.com/10-magical-math-puzzles/

Google interview puzzle about finding fastest 3 horses among 25 horses in minimum number of races => https://www.youtube.com/watch?v=i-xqRDwpilM 

 


 

Random Articles on web:

https://getpocket.com/explore/item/the-feynman-technique-the-best-way-to-learn-anything => Very nice technique discussed to learn something = break complex things into simplest things

https://getpocket.com/explore/item/whoa-this-is-what-happens-to-your-body-when-you-drink-enough-water => benefits of drinking lots of water. Make sure you drink at least 2 litres of water everyday. 1 litre of water should be drunk every morning on empty stomach after you wake up, and before you use the restroom.

https://www.propublica.org/article/what-are-2020s-tax-brackets-and-will-i-get-audited => Tax details

https://getpocket.com/explore/item/mental-models-how-to-train-your-brain-to-think-in-new-ways => mental models to train your brain

https://getpocket.com/explore/item/indian-employers-are-stubbornly-obsessed-with-elite-students-and-it-s-hurting-them?utm_source=pocket-newtab => somewhat interesting take on hiring under performers from non elite colleges in India 

https://getpocket.com/explore/item/work-stress-how-the-42-rule-could-help-you-recover-from-burnout  => amount of rest your body needs is 10 hr/day.

https://getpocket.com/explore/item/why-being-lazy-and-procrastinating-could-make-you-wildly-successful => How laziness and procrastination is so awesome !! My Favorite article. Let me go back to ...

https://www.bbc.com/worklife/article/20210222-how-a-beginners-mindset-can-help-you-learn-anything => How "beginner's mindset" helps us learn anything

 


 

YouTube Channels:

Robert Greene: Channel => https://www.youtube.com/@RobertGreeneOfficial

Andrew Huberman: A neuroscience professor from Stanford. Tons of podcasts. Channel => https://www.youtube.com/@hubermanlab

 

 


 

Course 2 - week 3 - Hyperparameter tuning, Batch Normalization and Programming Frameworks

This week's course is divided into various sections. The 1st 2 sections are a continuation of previous week's material. The last section is about using Programming frameworks, which is totally new and will require some time to understand.

Hyperparameter tuning:

There are various hyper parameters that we saw in the previous section, that need to be tuned for our NN. These parameters, in order of their effect on NN performance, are:

1. learning rate (alpha): Most important hyper parameter to tune. Not choosing this value properly may cause large oscillations in optimal cost function.

2. Mini batch size, number of hidden units and momentum (beta): These are second in importance.

3. Number of layers (L), learning rate decay, Adam parameters (beta1, beta2, epsilon): These are last in importance. Adam hyperparameters (beta1=0.9, beta2=0.999, epsilon=10^-8) are usually not tuned, as these values work well in practice.

It's hard to know in advance what hyper parameter values will work, so we try random values of these hyper parameters from within a bounding box (maybe changing 2 at a time, or 3, or even more at a time). Once we find a smaller bounding box where the hyper parameters seemed to perform better, we use the "coarse to fine" technique to start trying finer values, until we get close to the optimal hyperparameters.

We need to choose the scale on which to sweep the hyper parameters very carefully, so that we cover the whole range. For ex, to sweep learning rate alpha, we sweep the hyperparameter on a log scale from 0.0001 (10^-4) to 1 (10^0), in steps of x10 (i.e 10^-4, then 10^-3, then 10^-2, then 10^-1 and finally 10^0).
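A common way to implement this random sampling on a log scale (a small numpy sketch of my own, following the recipe above):

import numpy as np

r = np.random.uniform(-4, 0) #exponent picked uniformly between -4 and 0
alpha = 10 ** r #learning rate is then spread uniformly on the log scale from 10^-4 to 10^0
print(alpha)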

 There are 2 approaches to hyper parameter tuning:

1. caviar approach: We use this if we have many computing resources available. We train many NN in parallel with different hyper parameter settings, and see which ones work. The name comes from how fish reproduce: they lay a huge number of eggs (caviar) and just let the best ones survive.

2. panda approach: Here, we just run one NN model with a set of hyperparameters. However as time passes, we tune hyperparameters and see if they are making the performance of the NN better or worse, and keep on adjusting hyper parameters every day or so. So, here we babysit just 1 model, similar to how pandas do with their babies. They don't produce too many babies, but keep on watching their one baby with all their effort and making it stronger.

 

Batch Normalization:

Here, we normalize inputs to speed up our NN. We subtract the mean from the inputs, and then divide by the standard deviation (the square root of the variance). That way inputs get more uniformly distributed around a center, which causes our cost function to be more symmetric, resulting in faster execution when finding the minima.

For a deep NN, we can normalize inputs to each layer. Input to each layer is o/p of activation func, a[l]. However, instead of normalizing a[l], we normalize Z[l].

μ = 1/m * Σ Z[i]

σ² = 1/m * Σ (Z[i] − μ)²

Znorm[i] = (Z[i] − μ) / √(σ² + ε)

Now instead of using Z[i] in our previous NN eqn, we use Znorm[i] which is the normalized value.

If we want to be more flexible in how we want to use Z[i], we may define learnable parameters gamma and beta, which allows the model to choose either raw Z[i], normalized Znorm[i] or any other intermediate value of Z[i]. This can be achieved by defining new var Z˜[i] (Z tilde)

Ztilde[i] = γ*Znorm[i] + β => by changing values of gamma and beta, we can get any Ztilde[i]. For ex, if γ=1 and β=0, then Ztilde[i] = Znorm[i]. If γ=√(σ²+ε) and β=μ, then Ztilde[i] = Z[i]

Since gamma and beta are learnable parameters (just like weights), we really don't have to worry about the most optimal values of gamma and beta. The gd algo would choose the values that give the lowest cost for our cost function. Note that each layer has its own gamma and beta, so they can be treated just like weights for each layer. gd now calculates γ[l] and β[l], on top of W[l] and b[l]. However since we are normalizing, we will see that b[l] is cancelled out, so we can omit b[l]. So, we have 3 parameters to optimize: γ[l], β[l] and W[l] for each layer l. We can extend this to the mini batch technique too, with all gd algorithms like momentum, adam, etc.
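A minimal numpy sketch of the batch norm forward step for one layer (my own illustration; the function name batch_norm_forward is made up):

import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    # Z has shape (units in layer, m); mean and variance are taken over the m examples
    mu = np.mean(Z, axis=1, keepdims=True)
    var = np.var(Z, axis=1, keepdims=True)
    Z_norm = (Z - mu) / np.sqrt(var + eps)  # Znorm
    return gamma * Z_norm + beta            # Ztilde = gamma*Znorm + beta (learnable scale and shift)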

Batch norm works because it makes NN computation more immune to covariate shifts. The i/p data and all other intermediate i/p data are always normalized. It ensures that mean and variance of all i/p will remain the same, no matter how the data moves. This makes these values more stable even if i/p shifts.

Multi Class classification:

Binary classification is what we have used so far, which classifies any picture into just 2 outcomes = cat vs non-cat. However, we can have multi class classification, where o/p layer produces multiple outputs, i.e if the picture is cat, dog, cow or horse (known as 4 class classification). It outputs the final probability of each of the classes, and the sum of these probabilities is 1.

Here the o/p layer L, instead of generating 1 o/p, generates multiple o/p values one for each class. So, the o/p layer Z[L], instead of being a 1x1 matrix in binomial classification, is a Cx1 matrix now for multi class classification, where C is the number of classes to classify. Previously activation function for o/p layer a[L] used to be sigmoid function which worked well for binomial classification. However, now with multi class classification, we need a different activation function for o/p layer. We choose activation function to be exponent function normalized by sum of exponents.

For 2 class classification, we use the sigmoid func:

Sigmoid function σ(z) = 1/(1+e^-z) = e^z/(1+e^z)

prob for being in class 0 = yhat = σ(z) and

prob for being in class 1 (not in class 0, or class=others) = 1 - yhat = 1 - σ(z) = 1/(1+e^z)

We generalize above eqn for C classes. We use exponent func in o/p layer (also called as softmax layer)

exponent func = e^zk/(e^z1 + e^z2 + ... e^zc) where C is the number of classes, k is the kth class

prob for being in class 0 = yhat[0] =  e^z1/(e^z1 + e^z2 + ... e^zc) 

prob for being in class 1 = yhat[1] =  e^z2/(e^z1 + e^z2 + ... e^zc) 

...

prob for being in class C-1 = yhat[c] =  e^zc/(e^z1 + e^z2 + ... e^zc) 

So, the probabilities all add up to 1. Matrix a[L] or yhat is a Cx1 matrix

For C=2, multiclass reduces to binary classification. For implementation of multi class, the only difference in algo would be to compute o/p layer differently, and then do back prop.

For 2 class, if we set z2=0 (so that e^z2=1), then we get

prob for being in class 0 = yhat[0] =  e^z1/(e^z1 + e^z2) = e^z1/(e^z1 + 1) 

prob for being in class 1 = yhat[1] =  e^z2/(e^z1 + e^z2) = 1/(e^z1 + 1) 

Which is exactly what we got by using our sigmoid function earlier. so, exponent func and sigmoid func in o/p layer give the same result, implying sigmoid was just a special case of exponent func.

NOTE: In binary classification, we had an extra function at the o/p which converted yhat to 0 or 1 depending on whether its value was greater than 0.5 or not. That was called hard max. Here in multi class classification, we don't have that extra function. We just stop once we get the probabilities of each class. This is called softmax.
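A small numpy sketch of the softmax computation for one example (my own illustration):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z)) #subtracting the max is a standard numerical-stability trick; it doesn't change the result
    return e / np.sum(e)

print(softmax(np.array([4.0, 2.0, 1.0]))) #probabilities that add up to 1, largest for the biggest z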

Logits: In multi class classification, computed vector Z = [Z1, Z2 ... ZC] are called logits. The shape of logits is (C,m) where C=number of classes, m=number of examples

Labels: In multi class classification, given vector Y = [Y1, Y2 ... YC] are called labels. Each Y1, Y2 is 1 hot, so it has C entries, instead of just one entry. The shape of labels is same as that of logits i.e shape = (C,m)

Cost eqn: For multiclass classification, the loss for one example is a generalization of the binary loss: Loss(Y, Yhat) = - Σj Yj * log(Yhatj), where the sum is over the C classes, and the overall cost is the mean of the losses over all m examples.

Programming Frameworks

Instead of writing all these NN functions ourselves (forward prop, backprop, adam, gd, etc), we have NN frameworks, which provide all these functions to us. Tensorflow is one such framework. We'll use tensorflow in python for our exercises. You can get introductory material for tensorflow, including installation, in the "python - tensorflow" section. Once you've completed that section, come back here.

Programming Assignment 1: here we have 2 parts. In the 1st part, we learn basics of tensorflow (TF), while in the 2nd part, we build a NN using TF.

Here's the link to pgm assignment:

TensorFlow_Tutorial_v3b.html

This project has 3 python pgm, that we need to understand.

A. tf_utils.py => this is a pgm that defines following functions that are used in our NN model later:

tf_utils.py

  • load_dataset() => It loads test and training data from h5 files, similar to the function used in section "1.2 - Neural Network basics - Assignment 1". The only difference is that Y is now a number from 0 to 5 (6 numbers), instead of being a binary number - either 0 or 1. This is because we are doing a multi class classification here. Each picture is a sign language picture representing number 0, 1, 2, 3, 4 or 5.
    • Training set: 1080 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (180 pictures per number). X after flattening is 2D vector with shape = (12288, 1080), while Y after flattening is 2D vector with shape = (1, 1080)
    • Test set: 120 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (20 pictures per number). X after flattening is 2D vector with shape = (12288, 120), while Y after flattening is 2D vector with shape = (1, 120)
  • random_mini_batches() => This creates a list of random mini batches from set X,Y. These random mini batches are shuffled, and each has a size as specified in the argument.
  • convert_to_one_hot(Y, C) => This returns a 1 hot matrix for given o/p vector Y, and for "C" classes. 1 hot vector is needed for multi class classification.
  • predict(X, parameters) => Given i/p picture X, and optimized weights, it returns the prediction Yhat, i.e what number from 0 to 5 is the picture representing
  • forward_propagation_for_predict(X, parameters) => Implements the forward propagation for the model. It returns Z3, which is the o/p of the last linear unit (before it feeds into the softmax function to yield a[3])

B. improv_utils.py => this pgm is not used anywhere, so you can ignore it. This is a pgm that defines all the functions that are used in our NN model later. It has all the functions that were in tf_utils.py, as well as all the func that are going to be defined in test_cr2_wk3.py. So, this pgm is basically the solution to the assignment, as all the functions that we are going to write in our assignment later are already written here. You should not look at this pgm at all, nor should you use it (unless you want to check your work after writing your own functions).

improv_utils.py

C. test_cr2_wk3.py => Below is the whole pgm,

test_cr2_wk3.py

This pgm has 2 parts to it. In 1st part, we just explore TF library, while in 2nd part, we write the actual NN model using TF.

Part 1: This is where we explore TF library. All i/p and o/p of these examples is Tensor Data. NOTE: we don't use any of these functions below in our NN model that we build in part 2. This is just for practise.

  • compute loss eqn: a simple loss eqn value is computed by creating a TF variable for loss.
  • multiply using constant: multiplying 2 constant numbers and printing result.
  • multiply using placeholder: Here we feed value into placeholder at runtime, and compute 2*x.
  • linear_function(): Here we compute Y=W*X+B, where W,X,B and Y are all Tensor vectors (i.e Matrices) of a pre determined shape
  • sigmoid(z): Given i/p z, compute sigmoid of z.
  • cost(logits, labels): This computes cost using tf func "tf.nn.sigmoid_cross_entropy_with_logits()".  This calculates cost= - ( y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z)) ). This returns a vector with 1 entry for each logits/label pair. When you have "m" examples for each logit/label, then it computes summation and mean. However in NN model that we build later, we'll be using "tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits())", which works for multiclass classification. This func is explained in Python-TF section.
  • one_hot_matrix(labels, C): This returns a 1 hot matrix for given labels, and for "C" classes.
  • ones(shape) => creates a Tensor matrix of given shape, and initializes it with all 1.

 

Part 2: This is where we build a neural network using tensorflow. Our job here is to identify numbers 0 to 5 from sign language pictures. We implement a 3 layer NN. The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX.

Below are the functions defined in our pgm for part 2:

  • create_placeholders() => creates placeholders for i/p vector X and o/p vector Y
  • initialize_parameters() => initializes w,b arrays. W is init with random numbers, while b is init with 0.
  • forward_propagation(X, parameters) => Given X, w, b, this func calculates Z3 instead of a3 (Z3 is the output of the last NN layer, which feeds into the softmax (exponent) function)
  • compute_cost(Z3, Y) => This computes cost (which is the log function of A3,Y). A3 is computed from Z3, and cost is calculated as per loss eqn for softmax func. We use following TF func for computing cost: tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...)). logits=Z3, while labels=Y.
  • backward propagation and parameter update: This is done using a TF func "tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)". This is explained in TF section.The notes talks about TF func "tf.train.GradientDescentOptimizer" but we use "tf.train.AdamOptimizer" for our exercise. This func is instantiated directly within the model, as it's a built in func (not a user defined func that we need to define or write function for)
  • predict() => Given input picture array X, it predicts Y (i.e whether pic is cat or not). It uses w,b calculated using optimize function. We can provide a set of "n" pictures here in single array X (we don't need to provide each pic individually as an array). This is done for efficiency purpose, as Prof Andrew explains multiple times in his courses.
  • model() => This is the NN model that will be called in our pgm. We provide both the training and test pictures as 2 big arrays as i/p to this func. This model has 2 parts: first it defines the functions, and then it runs (calls) them inside a session. These are the 2 parts:
    • Define the functions as shown below:
      • defines func create_placeholders()  for X,Y.
      • defines func initialize_parameters() to init w,b randomly
      • Then it defines forward_propagation() to compute Z3
      • Then it defines compute_cost() to compute the total cost given Z3 and the o/p labels Y.
      • Then it defines the TF func "tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)" to do backward propagation and update the parameters for 1 iteration, so as to optimize w,b to give the lowest cost across the training set.
      • Defines an init func "tf.global_variables_initializer()". This is needed to init all variables. See the TF section for details.
    • Now it creates a session, forms a loop, and calls the above functions
      • start the session. Inside the session. run these steps
        • Run the init func defined above => "tf.global_variables_initializer()".
        • Make a loop and iterate the steps below "num_of_epoch" times. It's set to 1500; we will also change it to 10,000 and see the impact on accuracy.
          • Form minibatches of X,Y and shuffle them
          • iterate thru each minibatch
            • call these two funcs, "tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)" and "compute_cost", for each minibatch. Strictly, we don't need to call "compute_cost" explicitly, since running "minimize(cost)" computes the cost anyway; the reason we do it is that we want the "cost" o/p returned by "compute_cost" for our plotting of cost vs iterations.
            • We add the cost from each minibatch to "total_cost" and average it over the epoch, to get the avg cost.
        • Now plot "total_cost" vs "num of iterations". This shows how the cost keeps going down as we iterate more and more.
        • Now it runs the "parameters" node again to get the values of the parameters. NOTE: running "parameters" again doesn't run the func "initialize_parameters()" again; it just returns the computed values for that node.
        • It then calls tf functions to calculate prediction and accuracy for all examples in test and training set. Accuracy is then reported for all pictures on what they actually were vs what our pgm predicted.
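Below is a minimal sketch (assuming TF 1.x) of the key pieces of part 2 described above; random_mini_batches() and create_placeholders() are the course helpers, and num_minibatches is assumed to be computed elsewhere:

import tensorflow as tf

def forward_propagation(X, parameters):
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]
    W3, b3 = parameters["W3"], parameters["b3"]
    Z1 = tf.add(tf.matmul(W1, X), b1)    # LINEAR
    A1 = tf.nn.relu(Z1)                  # RELU
    Z2 = tf.add(tf.matmul(W2, A1), b2)   # LINEAR
    A2 = tf.nn.relu(Z2)                  # RELU
    Z3 = tf.add(tf.matmul(W3, A2), b3)   # LINEAR (softmax is folded into the cost)
    return Z3

def compute_cost(Z3, Y):
    # softmax_cross_entropy_with_logits expects shape (m, num_classes),
    # so the (num_classes, m) tensors are transposed first
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

# Inside model(): define the graph, then run it in a session (shown here as comments):
#   X, Y       = create_placeholders(n_x, n_y)
#   parameters = initialize_parameters()
#   Z3         = forward_propagation(X, parameters)
#   cost       = compute_cost(Z3, Y)
#   optimizer  = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
#   init       = tf.global_variables_initializer()
#   with tf.Session() as sess:
#       sess.run(init)
#       for epoch in range(num_epochs):
#           epoch_cost = 0.
#           for mb_X, mb_Y in random_mini_batches(X_train, Y_train, minibatch_size):
#               _, mb_cost = sess.run([optimizer, cost], feed_dict={X: mb_X, Y: mb_Y})
#               epoch_cost += mb_cost / num_minibatches   # running avg cost for the plot
#       # accuracy: compare the predicted class (argmax of Z3) against the true label
#       correct  = tf.equal(tf.argmax(Z3), tf.argmax(Y))
#       accuracy = tf.reduce_mean(tf.cast(correct, "float"))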

 Below is the explanation of main code (after we have defined our functions as above):

  1. We get our dataset X,Y by calling load_dataset().
  2. Next we can enter index of any picture, and it will show the corresponding picture for our training and test set. This is for our own understanding. Once we have seen a few pictures, we can enter "N" and the pgm will continue.
  3. Now we flatten the returned array and normalize it. We also use the "one_hot" function to convert each label from a single entry to a one-hot entry, since our labels need to be in one-hot format for the softmax func to work.
  4. Now we call our function model() defined above. We provide the X,Y training and testing arrays (which are not Tensors, but numpy arrays). We see that these numpy arrays are used as Tensor i/p to many functions above; this works because the conversion from numpy to Tensors happens automatically when the arrays are fed into the placeholders.
  5. In the above exercise, we used a 3 layer NN with a fixed number of hidden units for each layer. We ran it for 1,500 and then 10,000 epochs; the results are shown below. A rough sketch of this main flow is shown after this list.
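The rough sketch of the main flow (reusing load_dataset(), convert_to_one_hot() and model() described above; the image array is assumed to have shape (m, 64, 64, 3), so this is an illustration rather than the pgm's exact code):

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

# flatten each 64x64x3 picture into one column and scale pixel values to [0, 1]
X_train = X_train_orig.reshape(X_train_orig.shape[0], -1).T / 255.
X_test  = X_test_orig.reshape(X_test_orig.shape[0], -1).T / 255.

# labels 0..5 become one-hot columns so they match the 6-way softmax output
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test  = convert_to_one_hot(Y_test_orig, 6)

# numpy arrays go in; TF converts them to tensors when they are fed into the placeholders
parameters = model(X_train, Y_train, X_test, Y_test)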

Results:

On running above pgm, we see these results:

  • On running the above model with 1500 iterations, we get a training accuracy of 70%.
    • Cost after epoch 0: 1.913693
    • Cost after epoch 100: 1.049044 .... => If you get "Cost after epoch 100: 1.698457", that means you are still using GradientDescentOptimizer". Switch to "tf.train.AdamOptimizer".
    • Cost after epoch 1400: 0.053902
  • When we increase the number of iterations to 10,000, our training accuracy goes to 89%. See how cost keeps on going down and then kind of flattens out.
    •  Cost after epoch 1400: 0.053902 ...
    • Cost after epoch 2500: 0.002372
    • Cost after epoch 5000: 0.000091
    • Cost after epoch 9900: 0.000003

 

Programming Assignment 2: This is my own programming assignment. It's not part of the lecture series. Here, I took an example from one of the earlier programming assignments and rewrote it using TF to see if I could do that. It does work, but I'm not sure if everything is working correctly (the cost is different from the previous assignment, and there is no way to verify the accuracy).

test_cr2_wk2_tf.py => Below is the whole pgm. This pgm is copied from the course 2 week 2 pgm => course2/week2/test_cr2_wk2.py, rewritten using tensorflow functions. We implement it for batch GD only (not the other variants).

test_cr2_wk2_tf.py

We implement a 3 layer NN. The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX.

Even though this is a binary classifier, we still used a softmax implementation, as binary classification is a special case of softmax with number of classes = 2. All the functions that we defined in Pgm assignment 1 are the same here. The only differences are in the initialize_parameters() func and the model() func, as explained below:

  • initialize_parameters() => Here we allowed args to be passed for the "number of hidden units" of each layer, so that we can keep it consistent with our "course 2 week 2 pgm". That also allows us to play around with a different number of hidden units for each layer and observe the impact. The default number of hidden units for the 3 layers is [2, 25, 12, 2], where the 1st entry 2 is for the input layer (see the sketch after this list).
  • model() => Here, the definition of functions is same as in assignment 1. These are few differences:
    • Test set: Since we have only training set in this example, we don't have arg for "test_set" in model() func. model() func is copied from course2 week 2 and is modified wherever needed to work for TF.
    • optimizer: We call "tf.train.GradientDescentOptimizer" instead of the Adam Optimizer. We could try both; this is just to keep it consistent with the "course 2 week 2" pgm.
    • cost_avg: One other diff is that we don't compute "cost_avg" by dividing by "m", as we already average when we divide by "mini_batch_size" within the loop.
    • All other part of model() is same, except that we don't evaluate test accuracy (since there's no test set)
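Below is a minimal sketch (assuming TF 1.x) of an initializer that takes the layer sizes as an argument; the default [2, 25, 12, 2] matches the note above, where the 1st entry is only used for the shape of W1:

import tensorflow as tf

def initialize_parameters(layer_dims=(2, 25, 12, 2)):
    parameters = {}
    for l in range(1, len(layer_dims)):
        # W is (units in layer l) x (units in layer l-1), b is a column vector
        parameters["W" + str(l)] = tf.get_variable(
            "W" + str(l), [layer_dims[l], layer_dims[l - 1]],
            initializer=tf.contrib.layers.xavier_initializer(seed=1))
        parameters["b" + str(l)] = tf.get_variable(
            "b" + str(l), [layer_dims[l], 1],
            initializer=tf.zeros_initializer())
    return parameters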

Now we run the main pgm code the same way as in assignment 1. These are the diff:

  • We load the red/blue dataset (by using diff func load_dataset_rb_dots()). This is needed since the dataset here is different and is created by writing python cmds.
  • We convert the Y label to one-hot. This is needed for the softmax function as explained earlier. We convert Y = [ 0 1 0] into Y(one_hot) = [ [1 0] [0 1] [1 0] ], where 0=red, 1=blue (see the small example after this list).
  • We now call model() with desired number of hidden units, and it gives us the prediction accuracy.
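A small numpy illustration of the label conversion (not the pgm's exact code; whether examples end up as rows or columns depends on the helper's convention):

import numpy as np

Y = np.array([0, 1, 0])   # 0 = red, 1 = blue
print(np.eye(2)[Y])       # one one-hot row per example:
# [[1. 0.]
#  [0. 1.]
#  [1. 0.]]
# the course helper convert_to_one_hot(Y, 2) returns the transpose (one column per example)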

Results:

This is the result we get (with the default settings we have in our pgm):

Cost after epoch 0: 0.051880
Cost after epoch 1000: 0.038114
Cost after epoch 2000: 0.030764
Cost after epoch 3000: 0.027093
Cost after epoch 4000: 0.025386
Cost after epoch 5000: 0.022814
Cost after epoch 6000: 0.021766
Cost after epoch 7000: 0.021067
Cost after epoch 8000: 0.019954
Cost after epoch 9000: 0.019063
Parameters have been trained!
Train Accuracy: 0.9166667

 

Summary:

Here we built a 3 layer NN using TensorFlow. TF is not easy or intuitive, so I'm lost too on why some things work with tensors and some with numpy, what running a session does, and so on. But eventually it did work. The main takeaway is that multi-class classification worked just as easily as binary classification, and got us ~90% accuracy when trained for long enough. Our optional second assignment helped us see how we can transform a regular NN pgm written using numpy into a TF NN pgm.