Python GUI: tkinter

Python also has GUI support, provided via the tkinter package. The "tkinter" pkg just provides an i/f to the Tcl/Tk GUI toolkit, so the cmds from Tcl/Tk can be used to create a gui. It is available for both python2 and python3, though the name of the package changes slightly (in python2 it's named Tkinter (capital T), while in python3 it's named tkinter (small t)). The majority of the python standard library is provided in the python*-libs package, which should be installed automatically along with the python* executable (in /usr/lib64/python*/ dir). python-tkinter, which is the gui lib, is not installed by default, as it's big. So, we need to install it separately.

sudo yum install python36-tkinter => this installs tkinter for python3.6. I was constantly getting dependency errors while downloading tkinter for python34, so I chose to install and download python36 and python36-tkinter.

After installation, check to see if it got installed correctly. Type these 2 cmds. python2 will give an error for tkinter, while python3 will bring up a new box.

python -m tkinter => shows error msg => /usr/bin/python: No module named tkinter. This is because we didn't install tkinter for python2
python3 -m tkinter => brings up a new box which shows "Tcl/Tk version 8.5"

Object Oriented Programming:

Classes, objects, etc are part of Object Oriented programming (OOP) in python. You can do almost anything with regular programming in python, without using OOP at all, so this section is mostly optional. However, many times you need to know OOP in python so that you can understand python pgms written by others in OOP style (a lot of new large scripts in Workplace are increasingly written in OOP by cad folks).

class: an object that defines a set of attributes.
ex:

class Employee: => creates new class called Employee
   'Common base class for all employees'
   empCount = 0 => class var which is shared among all inst of this class. This can be accessed as Employee.empCount from inside the class or outside the class.

   #The first method __init__() is a special method, which is called class constructor or initialization method that Python calls when you create a new instance of this class.
   def __init__(self, name, salary):
      self.name = name
      self.salary = salary
      Employee.empCount += 1
   
   #declare other class methods like normal functions with the exception that the first argument to each method is self. Python adds the self argument to the list for you; you do not need to include it when you call the methods.
   def displayCount(self):
     print("Total Employee %d" % Employee.empCount)

"This would create first object of Employee class"
emp1 = Employee("Zara", 2000) => create instances of a class. calls __init__ method
"This would create second object of Employee class"
emp2 = Employee("Manni", 5000) => we can add or change any attr as follows: emp2.age=8
emp1.displayCount() => calls the method defined above. It prints "Total Employee 2"
tot_emp = Employee.empCount

overloading, inheritance etc are allowed with classes.
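As a rough python3 sketch of the class above together with inheritance and method overriding: the Manager subclass, the describe() method and the "reports" attribute are made-up names for illustration only, they are not part of the original example.

```python
class Employee:
    """Common base class for all employees."""
    emp_count = 0  # class variable, shared by all instances

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary
        Employee.emp_count += 1

    def describe(self):
        return "%s earns %d" % (self.name, self.salary)


class Manager(Employee):  # hypothetical subclass, inherits from Employee
    def __init__(self, name, salary, reports):
        super().__init__(name, salary)  # reuse the parent constructor
        self.reports = reports

    def describe(self):  # overrides the parent method
        return super().describe() + " and manages %d people" % self.reports


emp1 = Employee("Zara", 2000)
mgr1 = Manager("Manni", 5000, 3)
print(emp1.describe())
print(mgr1.describe())
print("Total Employee %d" % Employee.emp_count)
```

Since Manager calls the parent __init__, the shared counter is incremented for managers too.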

Functions vs methods:

Now that we have seen both functions and method, we may wonder what's the difference b/w the two.

Functions: As we saw earlier, functions may be inbuilt or user defined, or we import them from various modules. We call func by providing optional args, and they may optionally return a value on exit.

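ex: ans = log(a) => Here "log" is a function called with arg a. NOTE: log is not a true built-in function, it lives in the math module, so we need "from math import log" first. len(a) is an ex of a true built-in function.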

Methods: As we saw above, methods are OO concept, and they are associated with objects/classes. They are same as functions, and can have optional args, return value etc.

ex: new_arr = my_array.reshape(-1) => Here my_array is an object of type array, and we call a method named reshape() on that object with arg -1. new_arr stores the array returned. Some methods need no args at all, i.e my_array.sum(). NOTE: my_array.shape is an attribute and not a method, so it's written without parentheses; it returns a tuple.

Differences: Methods serve the same purpose as functions, except that they work implicitly on the object on which they are called. Also, a method is only accessible by objects of that class, i.e if an object doesn't belong to the class for which the method is defined, then that method won't work on that object. In short, a method is a function that belongs to an object, not everyone can use it.

NOTE: Some methods modify the object they are called on, since they operate on it directly. ex: my_array.sort() sorts the my_array object in place (and returns None), while np.sort(my_array) copies my_array, sorts the copy and returns it, so that the original my_array remains intact. However, most methods don't modify the object they are called on, so there is usually no diff b/w method and function. They operate and return the result in a new object or variable.
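The in-place vs copy behaviour of sort can be checked with a small sketch like this (the array values are arbitrary):

```python
import numpy as np

my_array = np.array([3, 1, 2])

# Function form: returns a sorted copy, original is untouched
sorted_copy = np.sort(my_array)
original_after_function = my_array.tolist()

# Method form: sorts the object itself and returns None
result = my_array.sort()

print(sorted_copy, original_after_function, result, my_array)
```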

Module method vs Module Function: With most modules, you will see that they provide both function and method equivalent for operating on an object. We could use either one as per our convenience. However, some modules may not provide an equiv method for a function. Also, there are many built-in functions for which there is no method equivalent, so the function form tends to be the more generic and complete one. Most tutorials that you find online mix methods and functions: for some operations they use methods, while for others they use functions. You should consult the python documentation for that module to find out whether both method and function equivalents exist for an operation.

In NumPy module, we have various functions and methods for same operation.

Transpose: transpose of a matrix has function transpose(), as well as attribute T on the array (and an equivalent method transpose()). We could use any of these, however T is more convenient to use in this case, as it's shorter to write.

1. function: new_arr = np.transpose(Arr) => Here transpose is a function with arg Arr, which is an array

2. attribute/method: new_arr = Arr.T => Here Arr array is an object, and T is an attribute of that object which returns its transpose (no parentheses needed). Arr.transpose() is the equivalent method call.

Median: No method exists for median in NumPy, although for mean, both method and function equivalent exist. Look in documentation to see what exists.

1. function: y = np.median(my_arr) => returns median of array object

2. method: y = my_arr.median() => This returns an error "AttributeError: 'numpy.ndarray' object has no attribute 'median'".
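A short sketch to check which forms exist for transpose, mean and median (array values are arbitrary):

```python
import numpy as np

my_arr = np.array([[1, 2, 3], [4, 5, 60]])

# Transpose: function, attribute and method all give the same result
t_func = np.transpose(my_arr)
t_attr = my_arr.T
t_meth = my_arr.transpose()

# Mean: both function and method exist
mean_func = np.mean(my_arr)
mean_meth = my_arr.mean()

# Median: only the function exists, ndarray has no median() method
median_func = np.median(my_arr)
has_median_method = hasattr(my_arr, "median")

print(t_attr.shape, mean_func, median_func, has_median_method)
```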

Matplotlib: one of the most popular Python packages used for data visualization. It is a cross-platform library for making 2D plots from data in arrays. Matplotlib is written in Python and makes use of NumPy. Matplotlib along with NumPy can be considered as the open source equivalent of MATLAB, which is a proprietary programming language developed by MathWorks, used for plotting as well as carrying out various kinds of scientific computations. In fact, for plotting you don't need MATLAB or any open source equivalent of it (not even gnuplot) at all, once you start using Matplotlib. Matplotlib can create static, animated, and interactive visualizations.

Official matplotlib website: https://matplotlib.org/

Very good tutorial here: https://www.tutorialspoint.com/matplotlib/index.htm

Source code for matplotlib here: https://matplotlib.org/_modules/matplotlib.html

As you can see, there are lots of functions defined, that can be called from this library.

Installation: on python3.6 version. NOTE: All our examples are with python3 (version 3.6).

1. CentOS:

$ python3 -m pip install matplotlib => this installs it for specific version of python (here for python3, which is pointing to python3.6). We can verify that it got installed for python 3.6 by looking in dir for python 3.6 library.

$ ls /usr/local/lib64/python3.6/site-packages/matplotlib/* => We see this "matplotlib" dir in python3.6 lib dir, so it's installed correctly

Usage:

Pyplot: A very important module of matplotlib is pyplot.

matplotlib.pyplot is a collection of command style functions that make Matplotlib work like MATLAB. Matplotlib is the whole package; matplotlib.pyplot is a module in Matplotlib.

Here's the full source code for pyplot: https://matplotlib.org/_modules/matplotlib/pyplot.html

Here's introductory pyplot tutorial: https://matplotlib.org/3.1.0/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py

import pyplot: Before we can use pyplot, we need to import pyplot in python script:

import matplotlib.pyplot as plt => This imports pyplot module as plt (plt is just a convention, we can use any name)

from matplotlib import pyplot as plt => This is an alternative way of importing pyplot

Pyplot functions: Each Pyplot function makes some change to a figure (i.e xlabel, title, plot, etc). These are a few common ones:

pyplot.imread() => Reads an image from a file into an array. Similarly pyplot.imsave() saves an array as an image file, while pyplot.imshow() displays an image on the axes.

pyplot.show() => displays a figure by invoking plot viewer window. This is used in interactive mode if you want to see the plot. Most of the time, you will just want to save the plot. pyplot.savefig() saves the current figure, while pyplot.close() closes a figure window.

pyplot.bar() => makes a bar plot, similarly options for histogram (pyplot.hist), scatter plot (pyplot.scatter), line plot (pyplot.plot), etc. These create the plot in background, but do not display it. We use pyplot.show() to display the plot.

A. scatter plot (plt.scatter): scatter plot is used widely in stats. The syntax for scatter plot is:

matplotlib.pyplot.scatter(x, y) => A scatter plot of y vs x with varying marker size and/or color that can be specified with optional args. X, Y are array like. Fundamentally, scatter works with 1-D arrays; x, y may be input as 2-D arrays, but within scatter they will be flattened. shape of X and Y are (n,) where n is the number of elements in the 1D array. Important Optional args are size and color.

s=area (or size). The marker size is given as an area in points**2, so we usually write it in the form R**2, where R is a length in points for that marker. This can be provided as a scalar number or as an array of shape (n,) (i.e 1D array of n elements), in which case each entry of the size array is mapped to the corresponding element of the X,Y arrays.

c=color. color is a little complicated. We can specify a single color format string or a sequence of colors. Seq of colors may be specified as 1D array of shape (n,) of numbers (floats or ints, in any range). These numbers are not colors by themselves: they are scaled linearly so that the smallest value maps to 0 and the largest to 1 (or to the range given by the optional vmin/vmax args), and the scaled value is then looked up in the colormap. We can have optional cmap option (cmap=my_map), where we specify a colormap instance or a registered colormap name. cmap is used only if c is an array of n numbers.

ex: Below ex draws X-Y plot with random size and random color of each point which is itself placed randomly.

N = 50
x = np.random.rand(N) #This returns 1D array with 50 random entries in it (all b/w 0 and 1). returns x = [ 0.46875992 .... 0.498185 ]
y = np.random.rand(N) #similar 1D random array y
colors = np.random.rand(N) #similar 1D array for color. Each of 50 entry corresponds to one (X,Y) point. Each colors value is random floating num from 0 to 1.
area = (30 * np.random.rand(N))**2 # 0 to 15 point radii. This is again a 1D array with 50 random entries, one for each (x,y) point.
plt.scatter(x, y, s=area, c=colors, alpha=0.5) #c is color, s is area in points**2. alpha is the "alpha blending value", between 0 (transparent) and 1 (opaque). alpha=0.5 is in between transparent and opaque.
plt.show() #this shows the plot for display

ex:Below ex draws same kind of random plot as above, but here size is fixed at 40 and we use a colormap called Spectral. Spectral colormap has a mapping from number to color, so it maps accordingly.

color=np.random.randint(5,size=(400,))

plt.scatter(X[0, :], Y[0, :], c=color, s=40, cmap=plt.cm.Spectral) => This plots elements of 2D arrays X and Y (assumed here to have 400 columns each). [0,:] slices out row 0, i.e all elements along axis=1, for both arrays. c is color, s is area of 40 and cmap is a registered colormap. Here color is provided as 1D array of 400 values, with each value being a random int from 0 to 4. The values are scaled linearly b/w the min and max of the array and then looked up in the Spectral cmap, so 0 lands at the red end of the map, 4 at the opposite (blue-violet) end, and 1, 2, 3 in between. So, each dot on scatter plot gets a random color from a set of 5 colors. This also explains why any real numbers in any range can be provided for color and it still works: only their position b/w the min and max matters.
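To see how numbers given to c end up as colors, here is a minimal sketch that performs the two steps by hand, assuming the default linear scaling b/w min and max (Normalize is the matplotlib class for that scaling; the 5 input values are just an illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

values = np.array([0, 1, 2, 3, 4])

# Step 1: scale the numbers linearly into the range 0..1
norm = Normalize(vmin=values.min(), vmax=values.max())
scaled = np.asarray(norm(values))

# Step 2: look up each scaled number in the colormap, giving one RGBA row per value
rgba = plt.cm.Spectral(scaled)

print(scaled)
print(rgba)
```

Multiplying all the values by 100 would give exactly the same colors, since only the relative position within the range is used.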

B. Line plot (plt.plot): We draw a line plot (most common kind of plots) using plt.plot() func.

Ex: Below is a simple plot of sine wave, where x is varied from 0 to 2*pi, while y is a sine wave constructed from these x values.

#!/usr/bin/python3.6
from matplotlib import pyplot as plt

import numpy as np

import math #needed for definition of pi

x = np.arange(0, math.pi*2, 0.05) #numpy lib has arange function that provides ndarray objects of angles between 0 and 2π with step=0.05. If we increase this step to 1, we get a very non smooth plot.

y = np.sin(x)

plt.plot(x,y) #plots line plot with x and y

plt.xlabel("angle")

plt.ylabel("sine")

plt.title('sine wave')

plt.show() #displays the plot

Ex: plot an array with no x values provided => uses index of array as X values
ex: a = [1, 2, 3, 4, 5]
plt.plot(a) #here we provide only Y axis values to plot. Since it's an array, the array index are plotted on X axis.
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(b, "or") #Here we plot graph "b" but not as a line, option "or" means o=circle, and r=red. So, it says to plot it as scatterplot, with X axis being the index, and dots being circle with red color.


C. Contour plot (plt.contour): These are one of the ways to show 3D surface on 2D planes. More details here: https://www.tutorialspoint.com/matplotlib/matplotlib_contour_plot.htm
meshgrid is needed to create grids of X and Y array values on which the plot of Z is drawn. plt.contourf() fills the regions between contour lines with color, while plt.contour() just shows the contour lines

ex: draws meshgrid with X and Y both in range of -3 to +3. Z is the function whose contour plot we draw.

import numpy as np
import matplotlib.pyplot as plt
xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X**2 + Y**2)
plt.contourf(X, Y, Z)
plt.show()
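What meshgrid produces is easier to see on a tiny grid. A small sketch with made-up values (2 x values and 3 y values instead of 100 each):

```python
import numpy as np

xs = np.array([0.0, 3.0])
ys = np.array([0.0, 4.0, 8.0])

# X repeats xs along every row, Y repeats ys down every column
X, Y = np.meshgrid(xs, ys)

# Z is evaluated element by element, so it has the same shape as X and Y
Z = np.sqrt(X**2 + Y**2)

print(X)
print(Y)
print(Z)
```

Both X and Y have shape (len(ys), len(xs)), so Z[i, j] is the value of the function at x=xs[j], y=ys[i].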


Figures: all plots that we saw above used a figure to draw the plot on. We never had to call any function separately to create a figure. However, we can do that too.

1. Figure (plt.figure): matplotlib.figure module in matplotlib contains the Figure class. It is a top-level container for all plot elements. A Figure object is created by calling the figure() function from the pyplot module, which instantiates the Figure class and returns the new instance (see OO section for how creating an instance runs the class's __init__ method).

fig=plt.figure(figsize=(3,4)) => This returns figure instance by calling figure class. Here we specified figure (width,height) in inches. If we don't specify anything, i.e. plt.figure() then default size figure is opened.

2. Axes: Axes object is the region of the image with the data space. The Axes contains two (or three in the case of 3D) Axis objects. Axes object is added to figure by calling the add_axes() method. It returns the axes object and adds an axes at position rect [left, bottom, width, height] where all quantities are in fractions of figure width and height.

ax=fig.add_axes([0.05,0.06,0.9, 0.9]) => This adds axes on the figure whose left edge is at 0.05*figure_width and bottom edge at 0.06*figure_height, and whose width and height are 0.9*figure_width and 0.9*figure_height. Usually you'll see [0,0,1,1] as axes position, but that doesn't show the axis ticks and labels, as the axes are right on the edge of figure. So, we leave some margin on sides.

Now on the axes object, we can draw plots, put labels, legends, etc

l1 = ax.plot(xlist,ylist,'ys-') # solid line with yellow colour and square marker
l2 = ax.plot(x2list,ylist,'go--') # dash line with green colour and circle marker
ax.legend(labels = ('tv', 'Smartphone'), loc = 'lower right') # legend placed at lower right
ax.set_title("Advertisement effect on sales") #title on top of axes plot
ax.set_xlabel('medium') => label on x axis
ax.set_ylabel('sales') => label on y axis
plt.show() => this finally shows the fig with the plot on it
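Put together as one runnable script, the figure/axes steps might look like the sketch below. The data lists (tv_spend, phone_spend, sales) are made-up numbers for illustration, and the figure is saved to an in-memory buffer instead of being shown, so that it also runs without a display.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# Made-up data for illustration only
sales = [1, 2, 3, 4, 5]
tv_spend = [1, 3, 4, 6, 8]
phone_spend = [2, 3, 5, 6, 9]

fig = plt.figure(figsize=(6, 4))
ax = fig.add_axes([0.12, 0.14, 0.83, 0.76])  # leave margins for ticks and labels

ax.plot(tv_spend, sales, 'ys-', label='tv')            # yellow, square marker, solid line
ax.plot(phone_spend, sales, 'go--', label='Smartphone')  # green, circle marker, dashed line
ax.legend(loc='lower right')
ax.set_title("Advertisement effect on sales")
ax.set_xlabel('medium')
ax.set_ylabel('sales')

buf = io.BytesIO()
fig.savefig(buf, format="png")  # use a file name here to write a png to disk
```

Passing label=... to each plot call and then calling ax.legend() is an alternative to passing the labels tuple to legend directly.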

3. subplots: Apart from plot function, we also have subplots func, that is used to create a figure and a set of subplots. The subplots() function returns both the fig and the axes. subplots() is used when we want to create more than 1 plot on the same figure. plt.subplots(nrows, ncols) creates a subplot grid with nrows rows and ncols cols, and returns the axes as an array of that shape.

ex: fig, ax = plt.subplots() => Now we can use fig and axis the regular way (i.e ax.set_xlabel, etc). i.e ax.plot(x,y)

ex: Below ex creates 2 rows and 2 cols with 4 plots total. We are only plotting 2 plots, the other 2 plots remain empty.

fig, axs = plt.subplots(2, 2)

axs[0, 0].plot(x, y)

axs[1, 1].scatter(x, y)

plt.show()

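We can also use plt.title, plt.xlabel, plt.ylabel etc, instead of ax.set_title, etc. The diff is that the plt.* functions act on the "current" axes (usually the one created or used most recently), while the ax.set_* methods act on the specific axes object they are called on. With a single plot both give the same result; with several subplots the ax.set_* form is safer, since it's explicit about which plot gets the title/label.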
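One way to see how plt.title relates to ax.set_title: pyplot functions act on whichever axes is "current", while axes methods act on the axes they are called on. A small sketch (plt.sca is used here to pick the current axes explicitly; the title strings are arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2)

# Explicit form: the title goes to exactly this axes object
axs[0].set_title("set via axes")

# pyplot form: the title goes to the current axes, selected here with plt.sca()
plt.sca(axs[1])
plt.title("set via pyplot")

titles = [a.get_title() for a in axs]
print(titles)
```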

GDP = Gross Domestic Product. Now you know it !!

We hear this term so much in everyday life, yet I knew nothing about it. So, I started researching it to find out what it is, and if it has any relevance at all.

Wages and salaries: If we add up everyone's wages and salaries in a given year, that may give us some idea of how much more money people are making every year, compared to previous year. However, even if a person makes more money this year than last year, it doesn't mean much, unless with his increased paycheck, he could buy more things.

Let's say with $100 in wages last year, a person was able to buy 20 kgs of rice (at $5/kg). This year, let's say he made $110, but could still buy only 20kg of rice. Then he didn't really make any more money, since he could still buy the same amount of rice as last year (and nothing more, since price of rice this year is $5.50/kg).

Historical GDP for each country can be found here: https://data.oecd.org/gdp/gross-domestic-product-gdp.htm

Total GDP =$80T, USA=$20T, China=$12T, Japan=$5T, Germany=$4T. Next is India, UK, France and Brazil. Just top 15 countries, with 50% of world population account for 75% of world GDP. GDP grows by rate of 3% per year. From $1.4T in 1960, it has grown 50 fold in last 50 years. No matter which part of world you live, you need to spend about $1K/year on your food and basic supplies to survive. So, 8B people would imply a minimum of $8T in GDP. Of course, top 10% of the world spend more than that on a phone every year, so rest of the GDP comes from those rich people. About 2.5B people in world are very poor, do not get enough to eat and are malnourished. 70% of world population lives on less than $10/day. Only 7% or 500M people live on > $50/day. Most of these rich people are in USA, Canada, UK, Australia and western European countries.

Usually countries with high population also have large GDP. Since GDP is closely tied to increasing population (more people, more consumption, more GDP), countries with fast increase in population will have higher GDP growth every year.

USA GDP:

USA GDP data is more reliable than world GDP data. And since we'll be looking at US economy data in more detail, it's better to look at USA GDP. I'll mostly be talking about nominal GDP and not real GDP.

Nominal GDP = GDP as in current US dollars

Real GDP = Nominal GDP adjusted for inflation, i.e nominal GDP divided by a price index (For this we consider some particular year as a baseline, and then express the GDP of every other year in that baseline year's prices).
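The nominal vs real distinction can be worked through with the rice numbers from the wages example above. This is only a sketch of the arithmetic for one person and one good, not how official GDP statistics are computed:

```python
# Numbers from the rice example: wages and price of rice in two consecutive years
wage_last, wage_now = 100.0, 110.0
price_last, price_now = 5.0, 5.5

nominal_growth = wage_now / wage_last - 1   # growth in dollars
inflation = price_now / price_last - 1      # growth in price of rice

# "Real" quantities: how much rice the wage actually buys
rice_last = wage_last / price_last
rice_now = wage_now / price_now
real_growth = rice_now / rice_last - 1

print(nominal_growth, inflation, rice_last, rice_now, real_growth)
```

The nominal wage grew by about 10%, but so did the price, so the real growth is zero.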

US nominal GDP: https://fred.stlouisfed.org/series/NGDPNSAXDCUSQ

 

 

 

 

 

 

NumPy: Numerical python. Very popular python library used for working with arrays. Python has native lists that work as arrays but they are very slow. NumPy is very fast. It has a lot of functions to work with the arrays too. It is the fundamental package for scientific computing with Python. Numpy is used heavily in ML/AI, so we need to have this installed. All exercises in AI use numpy.

Official numpy website with good intro material is: https://numpy.org/doc/stable/

A good tutorial is here: https://www.geeksforgeeks.org/python-numpy

Installation:

CentOS: We install it using pip.

 $ sudo python3.6 -m pip install numpy => installs numpy for python3.6 using pip (this works on any Linux OS, not just CentOS). We can also run "sudo python3 -m pip install numpy".

Arrays:

Basics of Array: Number of square brackets [ ... ] in the beginning or end determine the dimension of array. so, [ ... ] is 1 dimensional, while [ [ ... ] ] is 2 dimensional and so on, as you will see below.

1 Dimensional array is an array which has only 1 index to find out any element. ex: arr_1D = [ 1 2 3 4 5 ] => This is a 1D array with 5 elements. arr_1D[0]=1, arr_1D[1]=2, ...

2 Dimensional array is an array which has 2 indices to find out any element. So, we have 2 square brackets here.

ex: arr_2D = [ [1 2 3 ]  [7 8 9]  [4 5 6]  [2 4 6] ] => Here we see that outer array has 4 elements (similar to 1D array), but now each element of this outer array is itself an array. so, if we try to print each element of this outer array, it will print the array element. ex: arr_2D[0] = [1 2 3], arr_2D[1] = [7 8 9], and so on. Now if we want to print element of each internal array too (i.e the final value stored in array), we have to provide that index too, i.e arr_2D[1][2] = 9 => here arr_2D[1] points to array [7 8 9], and then for this we can report any index. So, if var=arr_2D[1] = [7 8 9], then var[0]=7, var[1]=8. var[2]=9. But here var happens to be arr_2D[1], so arr_2D[1][2] gives 2nd internal array and 3rd entry in this array. So, full array range is arr_2D[0:3][0:2].

Sometimes writing 2D array in other way is more visual. Writing above array in row/col format, we now see that there are 4 rows and 3 cols. So, it's 4X3 matrix array, i.e outermost has 4 elements and each of that contains 3 elements.

[ [ 1 2 3 ] 

  [7 8 9] 

  [4 5 6] 

  [2 4 6] ]

ex: arr_2D = [ [1] [2] [3] ] => Each element is 1D array. So, it's a 3X1 matrix, i.e outermost has 3 elements and each of that contains 1 element. So, shape is 3X1, and dimension is 2.

[ [1]

 [2]

 [3] ]

ex: arr_2D = [ [1 2 3] ] =>Each element is 1D array with 1X3 matrix. So, shape is 1X3, and dimension is 2.

3 Dimensional array is an array which has 3 indices to find out any element.

ex: arr_3D = [ [ [1 2 ] [3 4] ] [ [5 6] [7 8] ]  [ [9 0] [1 4] ] ]. Here outer array has 3 elements, all 3 of which are 2D array. The 2D array is 2X2. So, full array range is arr_3D[0:2][0:1][0:1]. so, it's a 3X2X2 matrix, i.e outermost has 3 elements and each of that contains 2 elements and each of these 2 elements finally contains 2 elements. So start with innermost entries, that determines the final dimension of matrix. Then move outward.

[ [ [1 2 ] [3 4] ]

 [ [5 6] [7 8] ]

 [ [9 0] [1 4] ] ]
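The same 2D and 3D arrays can be built with NumPy (covered in the next sections) to check the indexing described above. A small sketch:

```python
import numpy as np

arr_2D = np.array([[1, 2, 3], [7, 8, 9], [4, 5, 6], [2, 4, 6]])
arr_3D = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 0], [1, 4]]])

print(arr_2D.shape)     # 4 rows, 3 cols
print(arr_2D[1])        # 2nd inner array
print(arr_2D[1][2])     # 3rd entry of 2nd inner array
print(arr_3D.shape)     # 3 outer entries, each a 2X2 array
print(arr_3D[2][0][1])  # one index per dimension
```

With NumPy arrays, arr_2D[1, 2] is a shorter way of writing arr_2D[1][2].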

Usage of Numpy:

We saw array module in python section to create arrays. However, it's highly preferred to use numpy module to work on arrays, instead of using array module, that's included in python by default.

Import numpy module:

First we need to import numpy module in our python script in order to use it:

ex: import numpy => imports numpy. Now, we can call numpy functions as numpy.array, etc.

NumPy is usually imported under the np alias, so that we can use the short name np instead of longer NumPy

ex: import numpy as np

Creating numpy array:

After importing numpy module, we can use array( ) function in numpy module to create numpy array object. The class of this array object is ndarray (it will be seen as "numpy.ndarray" object in pgm). See in "python: Object Oriented" section on how classes are created.

array() function: Input to array function can be python objects of data type list, tuples, etc. See in Python section for list, tuples, defn. These list, tuples, etc are converted into numpy array object of class "ndarray" by the array() func. Array in Numpy is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. The type of array created is figured out automatically based on the type of input contents (i.e if list has int type, then array created has int type). If we have mixed contents, then numpy picks one common type that can hold all the elements (i.e a mix of int and float becomes float, and a mix of numbers and strings becomes string). We can also explicitly define a type for ndarray object, that we'll see later.

import numpy as np
arr = np.array( [1, 2, 3, 4, 5] ) #here input is a python list, with all integers. An ndarray object is created with all integer elements, i.e arr = [1 2 3 4 5]

arr = np.array((1, 2, 3, 4, 5)) # here tuple is provided as an input to array() function.

Print: can be used to print elements of array
print(arr) => prints array elements [1 2 3 4 5]. "arr" is ndarray object. It has no commas when it's printed. We don't know what form is ndarray object stored internally, but "print" func prints it in this form. This is 1D array. arr[0] = 1, arr[4]=5, and so on

NOTE: In above ex, the input list, tuple etc, has elements which are separated by a comma (as per the syntax of list, tuple, etc), and they get printed the same way with commas. However, the output of array() func is ndarray object, which is printed with no commas. i.e arr = [1 2 3 4 5]. However, [1 2 3 4 5] is not ndarray object (it's just the printed o/p), arr is the ndarray object. If we try to apply any numpy func on [1 .. 5], we'll get an error: i.e. arr=np.array([1 2 3 4 5]) gives syntax error.

We can't create numpy array by just assigning a python "list" to a var.

arr= [ [1,2], [3,4],[5,6] ] => This assigns the "list" to var "arr". Since it's not numpy array (since we didn't use numpy.array() function on this), we wouldn't expect any numpy function/method to work on this list. However, it does work for a lot of functions, i.e np.squeeze(arr) will work, even though arr is a list (and NOT ndarray object). This is because most numpy functions accept any "array-like" input and automatically convert a list or tuple into an ndarray object internally. Methods and attributes (i.e arr.shape) won't work on the list though, as those belong to the ndarray object. Best thing to do is to convert list/tuple into numpy "ndarray" object using np.array() func, and then work on it. Later, we'll see many other functions to create numpy array (besides the array() func)
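A short sketch of numpy functions accepting a plain list: np.asarray is the numpy function that converts array-like input to an ndarray (and returns the input unchanged if it's already an ndarray).

```python
import numpy as np

nested = [[1, 2], [3, 4], [5, 6]]   # plain python list, not an ndarray

# numpy functions accept the list directly and return an ndarray
col_sums = np.sum(nested, axis=0)

# Explicit conversion
as_array = np.asarray(nested)
already_array = np.asarray(as_array) is as_array  # no copy is made for an ndarray

# Attributes of ndarray are not available on the list itself
list_has_shape = hasattr(nested, "shape")

print(col_sums, as_array.shape, already_array, list_has_shape)
```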

Data types (dtype): data types in NumPy are same as those in Python, just a few more. They are rep as int32, float64, etc or we can specify it in short form as single char followed by number of bytes, i.e int8 is rep by "i1", "f4" for 32 bit float, "?" for boolean ("b" on its own means a signed byte, i.e int8), "S2" for byte string with length 2, etc. Instead of S type, we use U (unicode) type string in Python 3. See details for unicode in regular python section.

We don't have a separate type for each element of ndarray object, as ndarray can have elements of only one type. As we saw above, numpy array object inherits the "type" from type of list/tuple. This type becomes the data type of whole array. It's referred to as attribute "dtype" of the array object.

print(arr.dtype) => attribute "dtype" gives data type of an array. Here it prints "int64", since data is integer rep with 64 bits

When declaring array using array() func, we may specify dtype explicitly. Then those array contents are converted to that data type and stored (if possible)

ex: arr=np.array(['2', '72', 'a'], dtype='int64') => Here it errors out since 3rd entry 'a' can't be converted to int type. '2' and '72' are OK to be converted even though they are strings. If 'a' is replaced by 823, then arr would be [2 72 823], i.e array with int64 elements and NOT string.

arr=np.array(['23', 'cde', 71],dtype='S2') => Here we are creating an array of 3 elements with dtype as string of 2 bytes. So, numpy converts 71 (which is without quotes, and so an int) to a string too. However, 'cde' needs 3 bytes, but since we are forcing it to 2 bytes, 'e' is dropped and only 'cd' is stored

print(arr.dtype, arr) => It returns => |S2 [b'23' b'cd' b'71'] => S2 means its dtype is string with 2 Byte length. b'23' means string "23" is stored as bytes. Here array got printed with these b' prefixes, which we don't want. To print only the string, we can convert these to unicode by using decode method, i.e arr[1].decode("utf-8") returns "cd" unicode string

ex: arr=np.array(['2', '32', 7],dtype='i4'); print(arr.dtype, arr) => returns => int32 [ 2 32  7] as dtype is int32 and array elements are converted to 4 byte integer, so string '2' and '32' become integer 2 and 32.
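The conversions above can be tried with a sketch like the one below. NOTE: it uses the astype() method to convert an existing string array, which is an alternative to passing dtype to array() as done in the text; the expected behaviour (conversion, error, truncation) is the same.

```python
import numpy as np

# Strings holding digits convert cleanly to integers
ints = np.array(['2', '72', '823']).astype('int64')

# A non-numeric string can't be converted and raises ValueError
try:
    np.array(['2', '72', 'a']).astype('int64')
    conversion_failed = False
except ValueError:
    conversion_failed = True

# Forcing 2-byte strings drops the extra characters
short = np.array(['23', 'cde', '71']).astype('S2')
decoded = short[1].decode('utf-8')  # bytes -> regular (unicode) string

print(ints, conversion_failed, short, decoded)
```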

array with multiple data types: ex: np.array( ['as', 2, "me", 4.457]  ) => here all 4 elements of the input list are of diff data types. This is valid. By default, dtype here is U=Unicode, i.e all array elements get converted to unicode strings (2 becomes '2' and 4.457 becomes '4.457'), irrespective of whether they were string, int or float. The width is chosen so that the longest element fits; since '4.457' has length=5, it needs at least U5 (the exact width reported by arr.dtype may be larger, depending on numpy version). Operations like arr[2] + arr[3] won't behave like numeric addition, since the elements are now strings.

shape: A tuple of integers giving the size of the array along each dimension is known as shape of the array, i.e the shape of an array is the number of elements in each dimension.

print(arr.shape) => returns a tuple where each entry gives the number of elements along the corresponding dimension. For a 2D array such as arr = np.array([[1, 2, 3], [4, 5, 6]]) it returns (2,3), meaning array is 2 dimensional, with 2 elements (rows) along the 1st dimension and 3 elements (cols) along the 2nd, so it's 2X3 array.

Since shape is a tuple, we can access each element of this tuple via index, i.e arr.shape[0] returns 2 (num of inner arrays), while arr.shape[1] returns 3 (elements in each inner array)

Dimension (ndim): This shows dimension of an array as 1D, 2D and so on. In Numpy, number of dimensions of the array is also called rank of the array (i.e 2D array has rank of 2).

print(arr.ndim) => "ndim" attr returns the number of dimensions of an array. For the 1D array arr = np.array([1, 2, 3, 4, 5]) created earlier, this returns 1

0D array => a_0D = np.array(2) => This is an array with just 1 element, i.e there is only 1 value. so, it's not really an array, but a scalar. It shows ndim=0. It shows an empty tuple for shape, i.e a_0D.shape = ( )

1D array => b_1D = np.array( [2] ) => By adding square brackets, we convert 0D array into 1D array. It shows ndim=1, and b_1D.shape = (1,). We might have expected it to show (1,1) for 1 row and 1 col, but a 1D array has only a single axis, so there is no separate notion of rows and cols (if it had both, it would be a 2D array). So the shape tuple has just one entry, the number of elements. This is called a rank 1 array, and because it's neither a row vector nor a col vector (explained below), it can give surprising results in matrix operations. So, it's safer to avoid these 1D arrays in AI computations. We usually use reshape function (explained later) to transform it to a 2D array as row vector.

ex: b_1D = np.array( [2, 3, 5] ) => This shows shape as (3,) since this has 3 elements.

2D array => c_2D = np.array( [[1, 2, 3], [4, 5, 6]] ) => This is 2D array with 1st row [1 2 3] and 2nd row [4 5 6]. c_2D.ndim=2, c_2D.shape = (2,3) since there are 2 rows and 3 columns. NOTE: there are comma in between elements and in between arrays.

print( c_2D[0]) => prints 1st element of array c_2D which is "[1 2 3]", c_2D[1]=[4 5 6], c_2D[1,2] = 6

arr_2D = np.array( [[1, 2, 3]] )=> This is 2D array which has only 1st row which is a 1D array with 3 elements. So, arr_2D.ndim=2, arr_2D.shape=(1,3). NOTE: this 2D array has 1 row and 3 columns, unlike 1D array which had no rows and just 3 columns.

row vector: These are 2D array of shape (1,n) i.e they have a single row. ex: [ [ 1 2 3 ] ]

column vector: These are 2D array of shape (m,1) i.e they have a single col. ex: [ [1] [2] [3] ]


3D array => d_3D = np.array( [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]] ) => 3D array can be seen as each row itself being 2D array. d_3D.ndim=3, d_3D.shape = (2,2,3) since it has 2 outermost entries, then each of these 2 entries has 2 array, and each of these 2 have 3 elements each.
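The ndim and shape values quoted above, plus the reshape of a rank 1 array into row and column vectors, can be checked with this sketch:

```python
import numpy as np

a_0D = np.array(2)
b_1D = np.array([2, 3, 5])
c_2D = np.array([[1, 2, 3], [4, 5, 6]])
d_3D = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for arr in (a_0D, b_1D, c_2D, d_3D):
    print(arr.ndim, arr.shape)

# Turn the rank 1 array into proper 2D vectors (-1 means "work out this size")
row_vec = b_1D.reshape(1, -1)   # single row
col_vec = b_1D.reshape(-1, 1)   # single col
print(row_vec.shape, col_vec.shape)
```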

Axis of Numpy array: In numpy, number of dimension of array is called as number of axis of an array, i.e 3D is called an array with 3 axis. 1st axis or axis=0 is the outermost array. Then 2nd axis or axis=1 is the next inner array and so on.

For N dim matrix as (N1, N2, ... Nn) => There are N1 data-points of shape (N2, N3 .. Nn) along axis-0. Applying a function across axis-0 means you are performing computation between these N1 data-points. Each data-point along axis-0 will have N2 data-points of shape (N3, N4 .. Nn). These N2 data-points would be considered along axis-1. Applying a function across axis-1 means you are performing computation between these N2 data-points. N3 data points would be considered along axis-2. Similarly, it goes on. The dimension of the result is reduced as well, since the axis we computed across is gone.

As an ex: For a 2D array, Let's try computing across the 2 axis. ex: data = numpy.array([[1, 2, 3], [4, 5, 6]]);

1. axis=0: adding across 1st axis or axis=0 means collapsing the rows, i.e for each col, we add its entries vertically down all the rows. We get one sum per column.

ex: result = data.sum(axis=0); print(result) => prints [1+4 , 2+5, 3+6] = [5 7 9] => This is a 1D array now instead of 2D array.

2. axis=1: adding across 2nd axis or axis=1 means collapsing the cols, i.e for each row, we add its entries horizontally across all the cols. We get one sum per row.

ex: result = data.sum(axis=1); print(result) => prints [1+2+3, 4+5+6] = [6 15]  => this is again a 1D array
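
The two cases above can be checked with a short sketch. The keepdims option and the 3D example are additions for illustration (they are not part of the example above); keepdims=True keeps the summed axis with length 1 so that the result stays 2D.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

col_sums = data.sum(axis=0)                     # collapse rows: one sum per column
row_sums = data.sum(axis=1)                     # collapse cols: one sum per row
row_sums_2D = data.sum(axis=1, keepdims=True)   # same sums, but shape stays 2D

print(col_sums)            # [5 7 9]
print(row_sums)            # [ 6 15]
print(row_sums_2D.shape)   # (2, 1)

# For a 3D array of shape (2, 2, 3), summing across an axis removes that axis
d_3D = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(d_3D.sum(axis=0).shape)   # (2, 3)
print(d_3D.sum(axis=2).shape)   # (2, 2)
```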

More ways to generate a new array: there are many functions in numpy to generate a new array with any given shape, and initialize it with values.

1. arange: arange function returns an ndarray object containing evenly spaced values within a given range (i.e arange= array range). The array is 1D and its size is the number of values that fit in that range.

Syntax: numpy.arange(start, stop, step, dtype) => "stop" is required (stop itself is excluded, so with step=1 the final element is stop-1), all others are optional. By default start=0 and step=1, and dtype is inferred from the args provided, so if stop is float, then type is float too.

x = np.arange(5) => returns [0 1 2 3 4]. Here range is defined as 0 to 4 with step of 1. data type=integer since 5 is integer.

x = np.arange(10,20,2) => returns [10 12 14 16 18]. It's 1D array with 5 elements in it.
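
A quick sketch of the arange cases described above. The variable names and the float examples are illustrative additions.

```python
import numpy as np

x_int = np.arange(5)             # 0, 1, 2, 3, 4 (stop=5 is excluded)
x_step = np.arange(10, 20, 2)    # 10, 12, 14, 16, 18
x_float = np.arange(5.0)         # float stop => float array 0., 1., 2., 3., 4.
x_frac = np.arange(0, 1, 0.25)   # 0., 0.25, 0.5, 0.75 (1 is excluded)

print(x_int, x_int.dtype)
print(x_step, x_step.size)       # size is 5
print(x_float, x_float.dtype)    # dtype is float64
print(x_frac)
```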

2. linspace: Similar to arange. It returns ndarray object with evenly spaced numbers over a specified interval.

Syntax: numpy.linspace(start, stop, num) => start, stop are required. num=50 by default

np.linspace(2.0, 3.0, num=5) => returns array([2.  , 2.25, 2.5 , 2.75, 3.  ]) => Here 5 samples are included b/w 2 and 3 with equally spaced values.
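
The main difference from arange is that with linspace we pick the number of samples, and the step is worked out for us. A small sketch follows; the endpoint=False case is an extra illustration, not something covered above.

```python
import numpy as np

pts = np.linspace(2.0, 3.0, num=5)                        # both ends included
default_pts = np.linspace(0, 1)                           # num defaults to 50
open_pts = np.linspace(2.0, 3.0, num=5, endpoint=False)   # stop value left out

print(pts)                # 2.0, 2.25, 2.5, 2.75, 3.0
print(default_pts.size)   # 50
print(open_pts)           # 5 values starting at 2.0 with spacing 0.2
```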

3. zeros/ones: These are 2 other functions that init an array with zeros or ones.

Syntax: numpy.zeros(shape, dtype) => Returns an array containing all 0 with given shape. dtype is optional and is float by default.

x = np.zeros(2) => returns [0. 0.]. 2 implies 1 dim array with shape of (2,)

y = np.ones((3,2), dtype=int) => returns a 2 dim array of shape (3,2), with type of 1 as integer, so it's 1 and NOT 1.0 or 1. (i.e NOT decimal 1, but integer 1)

[[ 1  1]
 [ 1  1]
 [ 1  1]]
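
A minimal sketch of zeros/ones with and without an explicit dtype. The array z is an extra example for illustration.

```python
import numpy as np

x = np.zeros(2)                  # 1D array of 2 floats
y = np.ones((3, 2), dtype=int)   # 2D array of shape (3, 2) holding integer 1
z = np.zeros((2, 3))             # for 2D and above, shape is passed as a tuple

print(x, x.shape, x.dtype)       # [0. 0.] (2,) float64
print(y.shape)                   # (3, 2)
print(z.shape)                   # (2, 3)
```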

4. random: There is a random module in NumPy to generate random data. It has lots of methods which are very useful in AI and ML for generating random dataset.

from numpy import random => this line is optional. numpy has its own random module, while python has its inbuilt random module, and here we want to use numpy's. When we import numpy as np, its random module is already available as np.random, so we could just write np.random everywhere. If we instead did "import random" and called "random", we'd be calling python's inbuilt random module. So, we add the line "from numpy import random", which lets us write "random" directly for numpy's module, since that is more convenient than np.random. NOTE: after this line, the name "random" refers to numpy's random module, not python's inbuilt one.

Seed: All random numbers are generated from a given seed. Seed provides i/p to the pseudo random num generator to generate random numbers corresponding to that seed. Different seeds cause numpy to generate diff sets of random numbers.

np.random.seed(1) => this will generate pseudo random numbers for all random functions using seed=1. We could use any integer number as seed. We don't really need to provide this seed at all, since by default, numpy chooses a random seed and generates random num corresponding to that seed. But then our seq of random numbers generated will be diff for each run of pgm, which will be difficult to debug or reproduce. so, we usually assign a seed, when coding our program the 1st time. Once we have debugged the pgm with couple of seeds, we can get rid of this seed function.
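
The effect of the seed can be seen by seeding twice with the same value. A small sketch, with illustrative variable names:

```python
import numpy as np

np.random.seed(1)
first = np.random.rand(3)    # 3 random floats after seeding with 1

np.random.seed(1)            # same seed again => same sequence again
second = np.random.rand(3)

np.random.seed(2)            # diff seed => diff sequence
third = np.random.rand(3)

print(np.array_equal(first, second))   # True
print(np.array_equal(first, third))    # False
```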

randint():

ex: x = random.randint(100) => randint method says to generate integer random number, and arg=100 says the range is from 0 to 100-1 (i.e 0 to 99). Note, we could have written np.random.randint(100) too, but we don't need that since "from numpy import random" imports random into current workspace.

To generate 1D or 2D random numbers, we can specify size.

ex: random.randint(50, size=(3,5)) => generates a 2D array of size=3X5, with each element being a random int from 0 to 49
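
A short sketch of randint, checking the range and shape of what comes back. The (low, high) form in the last call is an extra illustration; with 2 numbers, the range is from low to high-1.

```python
from numpy import random

single = random.randint(100)               # one int from 0 to 99
grid = random.randint(50, size=(3, 5))     # 3X5 array, each entry from 0 to 49
ranged = random.randint(10, 20, size=5)    # 1D array, each entry from 10 to 19

print(single)
print(grid.shape)                  # (3, 5)
print(grid.min(), grid.max())      # both within 0..49
print(ranged)
```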

rand():

ex: random.rand(3) => "rand()" method returns random floats b/w 0 and 1 (1 itself is excluded). Number inside it reps the size of array, i.e 3 means it's a 1D array of size 3. NOTE: unlike randint, rand doesn't take a size arg (i.e we don't write random.rand(size=3)). We directly specify the length of each dimension as separate args.

ex: x = random.rand(3, 5) => returns 2D array with matrix=3X5.

[[0.14252791 0.44691071 0.59274288 0.73873487 0.22082345]
 [0.00484242 0.36294206 0.88507594 0.56948479 0.15075563]
 [0.69195833 0.75111379 0.92780785 0.57986471 0.6203633 ]]

randn(): returns samples from the standard normal distribution. A normal (gaussian) distribution has mean mu and spread sigma; the std normal dist is the one with mu=0 and sigma=1. So here instead of having equal probability for different numbers, it has a probability distribution that is higher for numbers closer to the mean, and the probability keeps on falling as you get away from the mean. About 99.7% of the values lie within 3 sigma of the mean. We provide shape of array as i/p.

ex: x=random.randn(3,4,5) => returns 3D array of shape=(3,4,5) with random floats in it, drawn from a distribution with mean=0, sigma=1.

To get values corresponding to other mean and sigma, multiply the samples by sigma and add the mean:

ex: Two-by-four array of samples from N(3, 6.25): Here mean=3, sigma=√6.25 = 2.5

3 + 2.5 * np.random.randn(2, 4) => about 68% of numbers will be b/w 3-2.5 to 3+2.5, i.e in b/w 0.5 to 5.5
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random

ex: np.random.randn() => returns single random float "2.192...".  Here only float is returned since no array shape specified. 
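
The scaling rule and the 1 sigma / 3 sigma fractions can be checked empirically by drawing a large number of samples. This is only a sketch: the sample count of 100000 and the seed value are arbitrary choices, and the printed numbers are approximate.

```python
import numpy as np

np.random.seed(0)                  # fixed seed so the run is repeatable
mu, sigma = 3, 2.5
samples = mu + sigma * np.random.randn(100000)

mean_est = samples.mean()                                 # close to 3
sigma_est = samples.std()                                 # close to 2.5
within_1 = np.mean(np.abs(samples - mu) < 1 * sigma)      # roughly 0.68
within_3 = np.mean(np.abs(samples - mu) < 3 * sigma)      # roughly 0.997

print(mean_est, sigma_est)
print(within_1, within_3)
```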


Operations on Array:

array slicing => array[start: end: step] => The slice runs from start up to end-1 (end itself is excluded). If omitted, start=0, step=1, end=length of array. For nD array, we can slice each axis of the array.

IMP: When we provide the end index of a slice, the slice stops at end-1. So, arr[2:4] will have arr[2], arr[3], but not arr[4], as range is from 2:(4-1). This is the same behaviour that we saw with lists/tuples/arrays in python. One other thing to note is that complex slicing is allowed on numpy multi dimensional arrays, which is not possible on lists/tuples/arrays in python. This is where numpy turns out to be much more powerful in terms of operations being done on arrays. Also, in numpy, we access elements of array via arr[2,3,0], while the same element is accessed in a list/tuple/array via arr[2][3][0] (i.e commas are used in numpy). However, python list syntax works for numpy too, i.e arr[2][3][0] is equally valid in numpy, but we usually don't access numpy arrays that way.

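np_arr = np.array( [[300, 200,100,700,212], [600, 500, 400,900,516], [21, 23,45,67,45]] ) => it has to be a numpy array (not a plain python list) for the comma indexing below to work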

ex: np_arr[0] = returns entry of index=0, which is itself an array, so returns all of that array => [300 200 100 700 212]

ex: np_arr[1,2] => returns 400 (since it's index=1 for outer array and index=2 for inner array). So, for multi dim array, we specify indices separated by commas.  np_arr[1][2] also works, though as explained above, that's not the preferred way.

ex: np_arr[0:2:1] = Here we provided the outermost array index (since there are no commas for inner indices). The range is from 0 to 1 with increment of 1. So, this returns as below:

[[300 200 100 700 212]
 [600 500 400 900 516]]

ex: d_3D[1,0,1:3] = [8 9] => Using the 3D array d_3D from above. Here we take index=1 for axis=0 which is [[7 8 9] [10 11 12]], then we take index=0 for axis=1, which is [7 8 9], then we take slice 1:3 of this final one which is [8 9]. NOTE: each integer index removes an axis, so the result is a 1D array, not a 3D array. Also, slice 1:2 would return only [8], since the end index is excluded.

x=np_arr[0:2,3:1:-1]  => Here we provide index range for both dimension of array. axis=0 goes from 0 to 1 (since range 0:2 implies 0 and 1), while axis=1 goes from index 3 to 2 in reverse direction (if we do 1:3:-1, this would return empty array, since 1:3 index can never be achieved by going in reverse dir. This is important to remember). NOTE: array entries are now reversed, i.e the array x gets assigned the values as [700 100] instead of [100 700] as in original array. Array "x" still remains a 2D array.

x =

[[700 100]
 [900 400]]
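
The indexing and slicing examples on np_arr can be run as below. This is a sketch with np_arr built as a numpy array (comma indexing does not work on a plain python list); the names rows and empty are illustrative.

```python
import numpy as np

np_arr = np.array([[300, 200, 100, 700, 212],
                   [600, 500, 400, 900, 516],
                   [21, 23, 45, 67, 45]])

print(np_arr[0])       # [300 200 100 700 212]
print(np_arr[1, 2])    # 400
print(np_arr[1][2])    # 400 (list style indexing gives the same element)

rows = np_arr[0:2:1]         # rows 0 and 1, all cols
x = np_arr[0:2, 3:1:-1]      # rows 0 and 1, cols 3 then 2 (reversed)
empty = np_arr[0:2, 1:3:-1]  # can't go from 1 to 3 in reverse => no cols

print(rows.shape)      # (2, 5)
print(x)               # 1st row 700, 100 and 2nd row 900, 400
print(empty.shape)     # (2, 0)
```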

ex: arr = [1 2 3 4 5]; arr[: : 2] = [1 3 5] => prints every other element (since start and end are not specified, start=0 and end=length of array).

ex: arr = np.array([[[1, 2], [3, 4]], [[5, 6],[2,3]]])

ex: print(arr[:]) => prints all elements of array since no start/end specified. All 3D elements printed. Same as what would have been printed with print(arr)

ex: print(arr[0,:]) => This prints all elements of index=0 for axis=0, i.e prints [[1 2]   [3 4]]. The same o/p is printed with arr[0][:] (i.e list/tuple format in python). NOTE: it's 2D array now.

print(arr[:,0]) => prints [[1 2] [5 6]]. This says that for axis=0, slice everything since no range specified, so the whole array is returned. Then 0 says that for axis=1 return index=0. Array for axis=1 is [ [1 2] [3 4] ] and [ [5 6] [2 3] ]. index=0 is [1 2] from 1st one and [5 6] from 2nd one.
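
The slices on the 3D array above can be checked with the sketch below. The last slice, arr[:, :, 0], is an extra illustration that picks index=0 along the innermost axis.

```python
import numpy as np

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [2, 3]]])

first_block = arr[0, :]     # index=0 on axis=0 => 2D array
first_rows = arr[:, 0]      # index=0 on axis=1, for every entry of axis=0
first_cols = arr[:, :, 0]   # index=0 on axis=2, for every entry of axis=0 and 1

print(first_block.tolist())   # [[1, 2], [3, 4]]
print(first_rows.tolist())    # [[1, 2], [5, 6]]
print(first_cols.tolist())    # [[1, 3], [5, 2]]

every_other = np.array([1, 2, 3, 4, 5])[::2]
print(every_other.tolist())   # [1, 3, 5]
```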

reshape: Reshaping means changing the shape of an array. reshape(m,n) changes an array into m arrays with n elements each (i.e turns the array into 2D array), provided it's possible to do that (total number of elements must stay the same), else it returns an error. Similarly reshape(p,q,r) changes an array into 3D array with p arrays that each contains q arrays, each with r elements.

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3) => this changes the above 1D array into 2D array with 4 arrays and each having 3 elements. So, newarr.ndim=2, newarr.shape=(4,3) since it has 4 arrays with 3 elements in each.

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


ex: n_arr = arr.reshape((1, arr.shape[0])) => Since arr is 1D array, shape=(12, ) i.e 12 followed by blank. To convert it into 2D array, we use reshape method as shown. Since shape[0] returns 12, this becomes n_arr=arr.reshape(1,12). This becomes 2D array with 1 row of 12 elements. So arr=[1 2 .. 12] while n_arr = [ [ 1 2 ... 12] ]. NOTE: 2 square brackets in n_arr, as compared to single brackets in arr (since it's a 2D array now). n_arr.ndim=2, n_arr.shape=(1,12)

assert (n_arr.shape == (1,12)) => This asserts or checks for the condition that shape of array n_arr is (1,12). This is helpful to figure out bugs in code, since if the shape is not as expected, this will throw an error (AssertionError).

newarr.reshape(12) => When only 1 integer provided, then result is 1D array of that length. So, this returns [1 2 ... 12]. We can also provide -1 as the length of array to get same result.

newarr.reshape(-1) => flattens the array, i.e converts any array into 1D array. So, this returns [1 2 ... 12]. However, if we provide other integers for new shape along with -1 as last integer, then array is converted into required shape, with other values inferred.


newarr.reshape(d_3D.shape[0],-1) => Here 1st value is 2 (from above example). So tuple is (2,-1) meaning it's 2X6 (since 6 is inferred automatically. -1 implies flatten other dimension, so 6 is the only other value). result is: 

[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]

Other way to flatten an array is by using func ravel() or method ravel. It's same as reshape(-1).

ex: np.ravel(newarr) => converts newarr array into flattened 1D array. We could also apply method ravel on newarr as newarr.ravel()
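
The reshape and ravel cases above, put together in one sketch. The try/except at the end is an illustrative addition that shows the error raised when the requested shape doesn't fit the number of elements.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3)             # 2D, shape (4, 3)
n_arr = arr.reshape((1, arr.shape[0])) # 2D row vector, shape (1, 12)
flat = newarr.reshape(-1)              # back to 1D, shape (12,)
two_rows = newarr.reshape(2, -1)       # -1 is inferred as 6 => shape (2, 6)
raveled = newarr.ravel()               # same result as reshape(-1)

assert n_arr.shape == (1, 12)
print(newarr.shape, flat.shape, two_rows.shape)   # (4, 3) (12,) (2, 6)

reshape_failed = False
try:
    arr.reshape(5, 3)                  # 12 elements can't fill a 5X3 array
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```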

We usually want a 2D array with one row (a row vector) instead of a 1D array. It's easier to work with 2D array. NOTE: They look almost the same when printed, except that there are 2 square brackets in 2D array, while only 1 square bracket in 1D.

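new_arr = arr.reshape(arr.shape[0], -1).T => Here we convert an array of shape (m,n,p,q) into array of shape (n*p*q, m) i.e we convert 4D array into 2D array with outer m array not flattened, but everything inside it flattened, giving one column per outer entry. NOTE: the method to call is reshape (arr.shape is just the shape tuple and can't be called). Also, reshaping directly via arr.reshape(arr.shape[1]*arr.shape[2]*arr.shape[3], arr.shape[0]) gives the same shape (n*p*q, m), but then values from different outer entries get mixed within a column. So we reshape to (m, n*p*q) first and then transpose.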
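
A small sketch of flattening a 4D array into one column per outer entry, assuming a made-up array of shape (m,n,p,q) = (2,2,2,3) filled with 0..23. It also shows why a direct reshape to (n*p*q, m) is not the same thing, even though the shape matches.

```python
import numpy as np

# Hypothetical 4D array with m=2 outer entries, each of shape (2, 2, 3)
arr = np.arange(24).reshape(2, 2, 2, 3)
m = arr.shape[0]

new_arr = arr.reshape(m, -1).T   # shape (12, 2): column j is entry j, flattened
naive = arr.reshape(-1, m)       # also shape (12, 2), but entries are interleaved

print(new_arr.shape, naive.shape)                        # (12, 2) (12, 2)
print(np.array_equal(new_arr[:, 0], arr[0].ravel()))     # True
print(np.array_equal(naive[:, 0], arr[0].ravel()))       # False
```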

squeeze: this func removes axes of length 1 from the shape of the given array. This is used in the opposite scenario, e.g. where a 2D array with a single row is converted back to a 1D array. An axis to be squeezed must have length=1. By default, all axes of length 1 are squeezed. We can also specify which axis to squeeze via the axis arg.

ex: y=np.squeeze(x) => x is 3D array with shape (1,3,3) while y now becomes 2D array with (3,3), i.e axis0 is squeezed

X = [[[0 1 2] [3 4 5] [6 7 8]]]
Y = [[0 1 2] [3 4 5] [6 7 8]]
The shapes of X and Y array: (1, 3, 3) (3, 3)

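
A sketch of squeeze with and without the axis arg. The array z with 2 axes of length 1 is an illustrative addition; it shows that without an axis arg every length 1 axis is removed, and that squeezing an axis of any other length raises an error.

```python
import numpy as np

x = np.arange(9).reshape(1, 3, 3)
y = np.squeeze(x)
print(x.shape, y.shape)               # (1, 3, 3) (3, 3)

z = np.zeros((1, 3, 1))
print(np.squeeze(z).shape)            # (3,)   => both length 1 axes removed
print(np.squeeze(z, axis=0).shape)    # (3, 1) => only axis 0 removed

squeeze_failed = False
try:
    np.squeeze(z, axis=1)             # axis 1 has length 3, can't be squeezed
except ValueError as err:
    squeeze_failed = True
    print("squeeze failed:", err)
```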
r_: This is a simple way to build up arrays quickly. Translates slice objects to concatenation along the first axis.
ex:
np.r_[np.array([1,2,3]), 0, 0, np.array([4,5,6])] => returns array([1, 2, 3, 0, 0, 4, 5, 6]) => This ex concatenates 1D array then 0, 0, then another 1D array. It concatenates along axis=0, still returns 1D array.
ex:
np.r_['1,2,0', [1,2,3], [4,5,6]] => the numbers within '...' before the arrays specify how to concatenate. Here the 1st number 1 specifies concat along axis=1 (2nd axis). Returns array([[1, 4], [2, 5], [3, 6]])
 
c_: Translates slice objects to concatenation along the second axis.
np.c_[np.array([[1,2,3]]), 0, 0, np.array([[4,5,6]])] => returns array([[1, 2, 3, 0, 0, 4, 5, 6]]) => This ex concatenates 2D array along axis=1 (2nd axis).
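
The r_ and c_ examples can be run as below. The slice example np.r_[0:5] and the c_ call on two 1D arrays are extra illustrations; the variable names are made up.

```python
import numpy as np

joined = np.r_[np.array([1, 2, 3]), 0, 0, np.array([4, 5, 6])]
from_slice = np.r_[0:5]                            # slice 0:5 expands to 0, 1, 2, 3, 4
stacked = np.r_['1,2,0', [1, 2, 3], [4, 5, 6]]     # concat along axis=1

as_cols = np.c_[np.array([1, 2, 3]), np.array([4, 5, 6])]          # 1D arrays become cols
wide = np.c_[np.array([[1, 2, 3]]), 0, 0, np.array([[4, 5, 6]])]   # 2D, joined on axis=1

print(joined.tolist())       # [1, 2, 3, 0, 0, 4, 5, 6]
print(from_slice.tolist())   # [0, 1, 2, 3, 4]
print(stacked.tolist())      # [[1, 4], [2, 5], [3, 6]]
print(as_cols.tolist())      # [[1, 4], [2, 5], [3, 6]]
print(wide.shape)            # (1, 8)
```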


Matrix Operations:

matrix transpose: This is one of the useful functions to find transpose of a matrix. Transpose of 2D array is easy to see, its rows and columns are swapped, so rows become columns and columns become rows (i.e 3X4 matrix becomes 4X3 matrix, w/o any change to any of the contents). You can transpose any n Dim matrix too, and specify how to transpose it. By default for n Dim matrix, the order of the axes is reversed, i.e 2X3X4 matrix becomes 4X3X2 matrix.

ex: np.transpose(newarr) => changes newarr from 4X3 to 3X4 array

Instead of using function, we can also use method to transpose.

ex: y=newarr.T => Here we are using the "T" attribute (T is the name for transpose) of the newarr object. Since it's an attribute and not a method, there are no brackets after T. Result is same as transpose function above.
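
A short sketch of transpose on 2D and 3D arrays. The 3D array d and the explicit axes order (1, 0, 2) are illustrative additions showing how to specify the new order of axes.

```python
import numpy as np

newarr = np.arange(1, 13).reshape(4, 3)
t_func = np.transpose(newarr)    # function form
t_attr = newarr.T                # attribute form, same result

print(newarr.shape, t_func.shape)        # (4, 3) (3, 4)
print(np.array_equal(t_func, t_attr))    # True
print(t_attr[0].tolist())                # [1, 4, 7, 10] => 1st col became 1st row

d = np.zeros((2, 3, 4))
print(d.T.shape)                         # (4, 3, 2) => axes reversed by default
print(np.transpose(d, (1, 0, 2)).shape)  # (3, 2, 4) => only 1st two axes swapped
```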

matrix dot operation: To find the dot product of 2 matrices, we use the dot function. NOTE: dot operation is different from the multiplication operation. Mult just multiplies each element of 1 array with the corresponding element of the other array, while dot operation is the mult/add of different elements of the arrays (each row of the 1st with each column of the 2nd). You can find more details of dot operation on matrix in "high school maths" section. For 2-D arrays, it is equivalent to matrix multiplication. For 1-D arrays, it is the inner product of the vectors. For N-dimensional arrays, it is a sum product over the last axis of a and the second-last axis of b. The dimensions of the two matrices have to be compatible for the dot operation (for 2D, cols of a = rows of b), else we'll get an error. Instead of using the dot function, we can write a for loop that iterates over each element of the 2 arrays and sums the products appr. However, this for loop takes a long time to run, since each step goes through the python interpreter and can't use parallel instructions such as SIMD (single inst multiple data). The dot function in numpy runs in optimized compiled code (typically a BLAS library) that can use these SIMD inst, which significantly speeds up the multiplication/addition part. NumPy itself runs on the CPU; GPUs are used by other libraries that offer similar array operations. Using array operations such as dot instead of explicit loops is called vectorization, and in AI related courses, we'll always hear this term, where we'll always be asked to vectorize our code (meaning put it in array form and then use dot functions to do multiplication).

a = np.array([[1,2],[3,4]]) => 2D array of 2x2
b = np.array([[11,12],[13,14]]) => 2D array of 2x2
np.dot(a,b)
This produces below 2D array of 2x2 which is calculated as follows =>
[[1*11+2*13, 1*12+2*14],[3*11+4*13, 3*12+4*14]]
[[37  40] 
 [85  92]] 
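
To make the loop vs dot comparison concrete, here is a sketch with a hand written triple loop next to np.dot. The helper dot_loops is a made-up name for illustration; the @ operator is another way to write matrix multiplication for 2D arrays.

```python
import numpy as np

def dot_loops(a, b):
    """Matrix product of 2D arrays a and b using plain python loops (slow)."""
    rows, inner = a.shape
    inner_b, cols = b.shape
    assert inner == inner_b, "cols of a must equal rows of b"
    out = np.zeros((rows, cols), dtype=a.dtype)
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i, j] += a[i, k] * b[k, j]
    return out

a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])

print(np.dot(a, b).tolist())      # [[37, 40], [85, 92]]
print(dot_loops(a, b).tolist())   # [[37, 40], [85, 92]]
print((a @ b).tolist())           # [[37, 40], [85, 92]]
print((a * b).tolist())           # [[11, 24], [39, 56]] => element wise mult, NOT dot
```

On 2x2 arrays the speed difference is not visible; it only shows up on large arrays.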

matrix add/sub/mult/div operations: All other matrix operations such as add, divide, multiply, abs, log, etc can be done by using specific numpy functions that work element by element on whole arrays, instead of using a for loop.

ex: c=np.add(a,b) => adds 2 matrices a and b. Each element of matrix a is added to the corresponding element of matrix b. Similarly for np.subtract(a,b)

ex: c=np.divide(a,b) => divides 2 matrices a and b. Each element of matrix a is divided by the corresponding element of matrix b. Similarly for np.multiply(a,b)

Other misc operations: Many other operations are defined that work on a single array.

log: ex: c=np.log(a) => computes natural log (base e) of each element of array "a"

abs: ex: c=np.abs(a) => computing absolute value of each element of array "a"
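
The element wise functions above, run on the same a and b used in the dot example. This is a sketch with illustrative variable names; the operators +, -, *, / give the same results as the named functions.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])

added = np.add(a, b)          # same as a + b
diff = np.subtract(a, b)      # same as a - b
prod = np.multiply(a, b)      # same as a * b
ratio = np.divide(a, b)       # same as a / b, result is float
logs = np.log(a)              # natural log of each element
mags = np.abs(diff)           # absolute value of each element

print(added.tolist())         # [[12, 14], [16, 18]]
print(diff.tolist())          # [[-10, -10], [-10, -10]]
print(mags.tolist())          # [[10, 10], [10, 10]]
print(logs[0, 0])             # 0.0 since log(1) = 0
```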

sum: There is another function "sum" (NOT add) which adds up the elements along the rows or columns of an array to return a 1D array.

ex: A = [ [300, 200,100], [600, 500, 400] ]
C=np.sum(A,axis=0) => adds each col (since axis=0) and returns 1D array with shape=(3,). result=[900 700 500]

C=np.sum(A,axis=1) => adds each row (since axis=1) and returns 1D array with shape=(2,). result=[600 1500]

C=np.sum(A) => adds all rows and cols (since no axis specified, it adds across all axis) and returns a scalar 2100.

Broadcasting: Array broadcasting is a concept in NumPy, where we can perform element wise matrix operations even when the shapes of the matrices are not identical. NumPy expands the required rows or columns by (conceptually) duplicating them. Certain rules apply as follows:

Rule 1. matrix of dim=mXn operated with matrix of dim=1Xn (1 row only) or with matrix of dim=mX1 (1 col only) => operations are +, -, *, /. The matrix 1Xn or mX1 are converted into matrix mXn by duplicating rows or col, and then operation is performed.

ex: A = [ [200, 100] , [300, 400] ] , B = [ [1, 2] ] => Here A is 2X2 matrix, while B is 1X2 matrix.

C= np.add(A,B) => Here, B is broadcast to 2X2 matrix, by duplicating 1st row. so, result is C = [[201 102]  [301 402]]. NOTE: np.sum(A,B) is not the right call here, since the 2nd arg of sum is the axis.

Rule 2: matrix of dim 1Xn or of dim mX1 => We can do operations of +, -, *, / on these matrix with a real number. The real number will be converted into 1Xn or mX1 matrix and then operation performed.

ex: B = [ [1, 2, 3] ] => This is 1X3 matrix. If we add real number 2 to this matrix, then the number 2 is converted to [ [ 2, 2, 2 ] ] and then addition performed.

C=np.add(B,2) => [ [1, 2, 3] ]  + 2 = [ [3 4 5] ]
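
Both broadcasting rules can be tried out as below. This is a sketch: the column matrix col and the incompatible example are illustrative additions, and np.broadcast_to is used only to show what the expanded B looks like.

```python
import numpy as np

A = np.array([[200, 100], [300, 400]])   # 2X2
B = np.array([[1, 2]])                   # 1X2 (1 row only)
col = np.array([[10], [20]])             # 2X1 (1 col only)

C = np.add(A, B)                 # B's row is reused for every row of A
D = A + col                      # col's column is reused for every col of A
E = np.add(np.array([[1, 2, 3]]), 2)     # real number 2 is reused for every element

print(np.broadcast_to(B, A.shape).tolist())   # [[1, 2], [1, 2]]
print(C.tolist())                # [[201, 102], [301, 402]]
print(D.tolist())                # [[210, 110], [320, 420]]
print(E.tolist())                # [[3, 4, 5]]

broadcast_failed = False
try:
    A + np.array([[1, 2, 3]])    # 2X2 with 1X3: col counts 2 and 3 don't match
except ValueError as err:
    broadcast_failed = True
    print("broadcast failed:", err)
```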

Other operations on array: iteration over elements of array, join, split, search, sort, etc are miscellaneous functions provided to work on arrays.