Python module

Python module - numpy

Details: Last Updated: Thursday, 01 October 2020 05:08; Published: Monday, 24 August 2020 14:48; Hits: 1050

NumPy: Numerical Python. It is a very popular python library used for working with arrays. Python has native lists that work as arrays, but they are slow for numerical work. NumPy arrays are much faster, because the data is stored in one typed block of memory and the operations run in compiled code. It has a lot of functions to work with the arrays too. It is the fundamental package for scientific computing with Python. Numpy is used heavily in ML/AI, so we need to have this installed. All exercises in AI use numpy.

Official numpy website with good intro material is: https://numpy.org/doc/stable/

A good tutorial is here: https://www.geeksforgeeks.org/python-numpy

Installation:

CentOS: We install it using pip.

$ sudo python3.6 -m pip install numpy => installs numpy for python3.6 using pip. This works on most Linux OS, not just CentOS. We can also run "sudo python3 -m pip install numpy".

Arrays:

Basics of Array: Number of square brackets [ ... ] in the beginning or end determine the dimension of array. So, [ ... ] is 1 dimensional, while [ [ ... ] ] is 2 dimensional and so on, as you will see below.

1 Dimensional array is an array which needs only 1 index to find any element. ex: arr_1D = [ 1 2 3 4 5 ] => This is a 1D array with 5 elements. arr_1D[0]=1, arr_1D[1]=2, ...

2 Dimensional array is an array which has 2 indices to find out any element. So, we have 2 square brackets here.

ex: arr_2D = [ [1 2 3 ] [7 8 9] [4 5 6] [2 4 6] ] => Here the outer array has 4 elements (similar to a 1D array), but each element of this outer array is itself an array. So, if we print an element of the outer array, we get an inner array. ex: arr_2D[0] = [1 2 3], arr_2D[1] = [7 8 9], and so on. To get a final value stored in an inner array, we have to provide that index too, i.e arr_2D[1][2] = 9 => here arr_2D[1] points to array [7 8 9], and within it we can pick any index. So, if var=arr_2D[1] = [7 8 9], then var[0]=7, var[1]=8, var[2]=9. Since var is arr_2D[1], arr_2D[1][2] gives the 2nd inner array and the 3rd entry in that array. So, valid indices run from arr_2D[0][0] to arr_2D[3][2], i.e the 1st index goes from 0 to 3 and the 2nd index from 0 to 2.

Sometimes writing 2D array in other way is more visual. Writing above array in row/col format, we now see that there are 4 rows and 3 cols. So, it's 4X3 matrix array, i.e outermost has 4 elements and each of that contains 3 elements.

[ [ 1 2 3 ]

[7 8 9]

[4 5 6]

[2 4 6] ]

ex: arr_2D = [ [1] [2] [3] ] => Each element is 1D array. So, it's a 3X1 matrix, i.e outermost has 3 elements and each of that contains 1 element. So, shape is 3X1, and dimension is 2.

[ [1]

[2]

[3] ]

ex: arr_2D = [ [1 2 3] ] =>Each element is 1D array with 1X3 matrix. So, shape is 1X3, and dimension is 2.

3 Dimensional array is an array which needs 3 indices to find any element.

ex: arr_3D = [ [ [1 2 ] [3 4] ] [ [5 6] [7 8] ] [ [9 0] [1 4] ] ]. Here the outer array has 3 elements, all 3 of which are 2D arrays. Each 2D array is 2X2. So, valid indices run from arr_3D[0][0][0] to arr_3D[2][1][1], i.e the 1st index goes from 0 to 2, while the 2nd and 3rd index go from 0 to 1. So, it's a 3X2X2 matrix, i.e outermost has 3 elements and each of that contains 2 elements and each of these 2 elements finally contains 2 elements. The innermost entries determine the last number in the shape. Then move outward for the earlier numbers.

[ [ [1 2 ] [3 4] ]

[ [5 6] [7 8] ]

[ [9 0] [1 4] ] ]

Usage of Numpy:

We saw array module in python section to create arrays. However, it's highly preferred to use numpy module to work on arrays, instead of using array module, that's included in python by default.

Import numpy module:

First we need to import numpy module in our python script in order to use it:

ex: import numpy => imports numpy. Now, we can call numpy functions as numpy.array, etc.

NumPy is usually imported under the np alias, so that we can use the short name np instead of longer NumPy

ex: import numpy as np

Creating numpy array:

After importing numpy module, we can use array( ) function in numpy module to create numpy array object. The class of this array object is ndarray (it will be seen as "numpy.ndarray" object in pgm). See in "python: Object Oriented" section on how classes are created.

array() function: Input to array function can be python objects of data type list, tuples, etc. See in Python section for list, tuples, defn. These list, tuples, etc are converted into numpy array object of class "ndarray" by the array() func. Array in Numpy is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. The type of array created is figured out automatically based on the type of input contents (i.e if list has int type, then array created has int type). If we have mixed contents, numpy converts all elements to one common type that can hold them all (ex: int and float become float, numbers and strings become strings). We can also explicitly define a type for ndarray object, that we'll see later.

import numpy as np
arr = np.array( [1, 2, 3, 4, 5] ) #here input is a python list, with all integers. An ndarray object is created with all integer elements, i.e arr = [1 2 3 4 5]

arr = np.array((1, 2, 3, 4, 5)) # here tuple is provided as an input to array() function.

Print: can be used to print elements of array
print(arr) => prints array elements [1 2 3 4 5]. "arr" is ndarray object. It has no commas when it's printed. We don't know what form is ndarray object stored internally, but "print" func prints it in this form. This is 1D array. arr[0] = 1, arr[4]=5, and so on

NOTE: In above ex, the input list, tuple etc, has elements which are separated by a comma (as per the syntax of list, tuple, etc), and they get printed the same way with commas. However, the output of array() func is ndarray object, which is printed with no commas. i.e arr = [1 2 3 4 5]. However, [1 2 3 4 5] is not ndarray object (it's just the printed o/p), arr is the ndarray object. If we try to apply any numpy func on [1 .. 5], we'll get an error: i.e. arr=np.array([1 2 3 4 5]) gives syntax error.

We can't create numpy array by just assigning a python "list" to a var.

arr= [ [1,2], [3,4],[5,6] ] => This assigns the "list" to var "arr". Since it's not a numpy array (we didn't use numpy.array() function on this), it has no numpy methods or attributes, i.e arr.shape gives an error. However, numpy functions do work on it, i.e np.squeeze(arr) will work, even though arr is a list (and NOT ndarray object). This is because most numpy functions accept any "array_like" input and convert a list or tuple into an ndarray object internally before working on it. Still, the best thing to do is to convert list/tuple into numpy "ndarray" object using np.array() func once, and then work on it. Later, we'll see many other functions to create numpy array (besides the array() func)
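As a small sketch of the points above (the variable names from_list, from_tuple, plain and squeezed are only illustrative), we can check the type of what array() returns, and see that a numpy function accepts a plain list and hands back an ndarray:

```python
import numpy as np

from_list = np.array([1, 2, 3, 4, 5])    # list input
from_tuple = np.array((1, 2, 3, 4, 5))   # tuple input

print(type(from_list))   # <class 'numpy.ndarray'>
print(from_list)         # [1 2 3 4 5]  (printed without commas)

# A plain nested list is still a Python list...
plain = [[1, 2], [3, 4], [5, 6]]
print(type(plain))       # <class 'list'>

# ...but numpy functions convert list/tuple arguments to ndarray on the way in.
squeezed = np.squeeze(plain)
print(type(squeezed), squeezed.shape)   # <class 'numpy.ndarray'> (3, 2)
```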

Data types (dtype): data types in NumPy are same as those in Python, just a few more. They are rep as int32, float64, etc or we can specify it in short form as single char followed by number of bytes, i.e int8 is rep by "i1", "f4" for 32 bit float, "?" for boolean ("b" on its own means a signed byte, i.e int8), "S2" for byte string with length 2, etc. Instead of S type, we use U (unicode) type string in Python 3. See details for unicode in regular python section.

We don't have a separate type for each element of ndarray object, as ndarray can have elements of only one type. As we saw above, numpy array object inherits the "type" from type of list/tuple. This type becomes the data type of whole array. It's referred to as attribute "dtype" of the array object.

print(arr.dtype) => attribute "dtype" gives the data type of an array. Here it prints "int64" on most 64-bit systems, since data is integer rep with 64 bits (8 bytes)

When declaring array using array() func, we may specify dtype explicitly. Then those array contents are converted to that data type and stored (if possible)

ex: arr=np.array(['2', '72', 'a'], dtype='int64') => Here it errors out since 3rd entry 'a' can't be converted to int type. '2' and '72' are OK to be converted even though they are strings. If 'a' is replaced by 823, then arr would be [2 72 823] , i.e array with int64 elements and NOT string.

arr=np.array(['23', 'cde', 71],dtype='S2') => Here we are creating an array of 3 elements with dtype as string of 2 byte. So, numpy converts 71 (which is without quotes, and so an int) to a string too. However, 'cde' needs 3 bytes, but since we are forcing it to 2 bytes, 'e' is dropped and only 'cd' is stored

print(arr.dtype, arr) => It returns => |S2 [b'23' b'cd' b'71'] => S2 means its dtype is string with 2 Byte length. b'23' means string "23" is stored as bytes. Here array got printed with these b' prefixes, which we don't want. To print only the string, we can convert an element to unicode by using decode method, i.e arr[1].decode("utf-8") returns "cd" unicode string

ex: arr=np.array(['2', '32', 7],dtype='i4'); print(arr.dtype, arr) => returns => int32 [ 2 32 7] as dtype is int32 and array elements are converted to 4 byte integer, so string '2' and '32' become integer 2 and 32.

array with multiple data types: ex: np.array( ['as', 2, "me", 4.457] ) => here all 4 elements of array are of diff data types. This is valid: numpy picks a common type that can hold all of them, which here is U=Unicode string. The reported length depends on the numpy version: it may be U5 (since '4.457' has length=5) or a wider type such as U32. So, all elements of this array are stored as unicode strings, irrespective of whether they were string, int or float. As a result, numeric operations no longer apply, e.g. arr[1] + arr[3] would join the strings '2' and '4.457' instead of adding the numbers.
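The dtype behaviour described above can be tried out with a short sketch like the one below. The variable names are only illustrative, and the default integer type printed on the first line depends on the platform, so no fixed output is claimed for it:

```python
import numpy as np

a = np.array([1, 2, 3])
print(a.dtype)            # default integer type (platform dependent)

f = np.array([1, 2, 3], dtype='f4')   # force 32 bit float
print(f.dtype, f)         # float32 [1. 2. 3.]

# Byte strings of length 2: 'cde' is truncated to 'cd'
s = np.array(['23', 'cde', '71'], dtype='S2')
print(s.dtype, s)         # |S2 [b'23' b'cd' b'71']
decoded = s[1].decode('utf-8')
print(decoded)            # cd

# Strings holding numbers can be converted to integers
nums = np.array(['2', '32', '7']).astype('i4')
print(nums.dtype, nums)   # int32 [ 2 32  7]

# Mixed contents end up as one common string type
mixed = np.array(['as', 2, 'me', 4.457])
print(mixed.dtype.kind)   # U
```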

shape: A tuple of integers giving the size of the array along each dimension is known as shape of the array, i.e the shape of an array is the number of elements in each dimension.

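print(arr.shape) => returns a tuple where each entry is the number of elements along the corresponding dimension. For the 1D array arr = [1 2 3 4 5] above, it returns (5,). For a 2D array such as np.array([[1, 2, 3], [4, 5, 6]]), it returns (2,3), meaning the array is 2 dimensional (the tuple has 2 entries), with 2 elements along the 1st dimension and 3 elements along the 2nd, so it's a 2X3 array.

Since shape is a tuple, we can access each element of this tuple via index, i.e for a 2X3 array shape[0] returns 2 (num of inner arrays), while shape[1] returns 3 (elements in each inner array)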

Dimension (ndim): This shows dimension of an array as 1D, 2D and so on. In Numpy, number of dimensions of the array is also called rank of the array (i.e 2D array has rank of 2).

print(arr.ndim) => "ndim" attr returns the number of dimension of an array. since arr has 1 dimension, this returns 1

0D array => a_0D = np.array(2) => This is an array with just 1 element, i.e there is only 1 value. So, it's not really an array, but a scalar. It shows ndim=0. It shows an empty tuple for shape, i.e a_0D.shape = ( )

1D array => b_1D = np.array( [2] ) => By adding square brackets, we convert 0D array into 1D array. It shows ndim=1, and b_1D.shape = (1, ). We might have expected it to show (1,1) for 1 row and 1 col, but a 1D array has only a single axis, so the shape tuple has only one entry (the number of elements). If it had separate rows and cols, it would be a 2D array. This is called a rank 1 array, and because it's neither a row vector nor a col vector (explained below), it can give surprising results, e.g. transposing it leaves it unchanged. So, in AI computations it's usually safer to avoid these 1D arrays. We usually use reshape function (explained later) to transform it to a 2D array as row vector.

ex: b_1D = np.array( [2, 3, 5] ) => This shows shape as (3, ) since this has 3 columns.

2D array => c_2D = np.array( [[1, 2, 3], [4, 5, 6]] ) => This is 2D array with 1st row [1 2 3] and 2nd row [4 5 6]. c_2D.ndim=2, c_2D.shape = (2,3) since there are 2 rows and 3 columns. NOTE: there are comma in between elements and in between arrays.

print( c_2D[0]) => prints 1st element of array c_2D which is "[1 2 3]", c_2D[1]=[4 5 6], c_2D[1,2] = 6

arr_2D = np.array( [[1, 2, 3]] )=> This is 2D array which has only 1 row, which is a 1D array with 3 elements. So, arr_2D.ndim=2, arr_2D.shape=(1,3). NOTE: this 2D array has 1 row and 3 columns, unlike the 1D array of shape (3, ) which has a single axis and so no separate notion of rows and columns.

row vector: These are 2D array of shape (1,n) i.e they have a single row. ex: [ [ 1 2 3 ] ]

column vector: These are 2D array of shape (m,1) i.e they have a single col. ex: [ [1] [2] [3] ]

3D array => d_3D = np.array( [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]] ) => 3D array can be seen as each row itself being 2D array. d_3D.ndim=3, d_3D.shape = (2,2,3) since it has 2 outermost entries, then each of these 2 entries has 2 array, and each of these 2 have 3 elements each.
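The ndim and shape values quoted above can be checked with a quick sketch (the names row_vec and col_vec are just illustrative labels for the row and column vector examples):

```python
import numpy as np

a_0D = np.array(2)
b_1D = np.array([2, 3, 5])
c_2D = np.array([[1, 2, 3], [4, 5, 6]])
row_vec = np.array([[1, 2, 3]])        # shape (1, n)
col_vec = np.array([[1], [2], [3]])    # shape (m, 1)
d_3D = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for name, arr in [("a_0D", a_0D), ("b_1D", b_1D), ("c_2D", c_2D),
                  ("row_vec", row_vec), ("col_vec", col_vec), ("d_3D", d_3D)]:
    print(name, "ndim =", arr.ndim, "shape =", arr.shape)

print(c_2D[1, 2])   # 6 (row index 1, column index 2)
```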

Axis of Numpy array: In numpy, number of dimension of array is called as number of axis of an array, i.e 3D is called an array with 3 axis. 1st axis or axis=0 is the outermost array. Then 2nd axis or axis=1 is the next inner array and so on.

For N dim matrix as (N1, N2, ... Nn) => There are N1 data-points of shape (N2, N3 .. Nn) along axis-0. Applying a function across axis-0 means you are performing computation between these N1 data-points.. Each data-point along axis-0 will have N2 data-points of shape (N3, N4 .. Nn). These N2 data-points would be considered along axis-1. Applying a function across axis-1 means you are performing computation between these N2 data-points. N3 data points would be considered along axis-2. Similarly, it goes on. The dimension of the array is reduced as well, since 1 or more axis are gone.

As an ex: For a 2D array, Let's try computing across the 2 axis. ex: data = numpy.array([[1, 2, 3], [4, 5, 6]]);

1. axis=0: adding across 1st axis or axis=0 means adding the rows together, i.e for each col, we add all its entries (vertically down).

ex: result = data.sum(axis=0); print(result) => prints [1+4 , 2+5, 3+6] = [5 7 9] => This is a 1D array now instead of 2D array.

2. axis=1: adding across 2nd axis or axis=1 means adding the cols together, i.e for each row, we add all its entries (horizontally across).

ex: result = data.sum(axis=1); print(result) => prints [1+2+3 , 4+5+6] = [6 15] => this is again a 1D array
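A short sketch of the axis behaviour described above. The 3D array "cube" and the keepdims option are additions for illustration and are not part of the original example. keepdims=True keeps the summed axis with length 1, so the result stays 2D:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

col_sums = data.sum(axis=0)   # collapse axis 0 (add rows together)
row_sums = data.sum(axis=1)   # collapse axis 1 (add cols together)
print(col_sums)               # [5 7 9]
print(row_sums)               # [ 6 15]

# keepdims=True keeps the collapsed axis with length 1
kept = data.sum(axis=1, keepdims=True)
print(kept.shape)             # (2, 1)

# For a 3D array of shape (2, 3, 4), summing over an axis removes that axis
cube = np.arange(24).reshape(2, 3, 4)
print(cube.sum(axis=0).shape)   # (3, 4)
print(cube.sum(axis=1).shape)   # (2, 4)
print(cube.sum(axis=2).shape)   # (2, 3)
```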

More ways to generating a new array: there are many functions in numpy to generate a new array with any given shape, and inititalize it with values.

1. arange: arange function returns an ndarray object containing evenly spaced values within a given range (i.e arange= array range). The array is 1D and it's size is the range of numbers that will fit in that array.

Syntax: numpy.arange(start, stop, step, dtype) => "stop" is required and is NOT included in the result (with integers and step=1, the final element is stop-1), all others are optional. By default start=0, step=1 and dtype is inferred from the other args, so if stop is float, then type is float too.

x = np.arange(5) => returns [0 1 2 3 4]. Here range is defined as 0 to 4 with step of 1. data type=integer since 5 is integer.

x = np.arange(10,20,2) => returns [10 12 14 16 18]. It's 1D array with 5 elements in it.

2. linspace: Similar to arange. It returns ndarray object with evenly spaced numbers over a specified interval.

Syntax: numpy.linspace(start, stop, num) => start, stop are required. num=50 by default

np.linspace(2.0, 3.0, num=5) => returns array([2.  , 2.25, 2.5 , 2.75, 3.  ]) => Here 5 samples are included b/w 2 and 3 with equally spaced values.

3. zeros/ones: These are 2 other functions that init an array with zeros or ones.

Syntax: numpy.zeros(shape, dtype) => Returns an array containing all 0 with given shape. dtype is optional and is float by default.

x = np.zeros(2) => returns [0. 0.]. 2 implies 1 dim array with shape of (2,)

y = np.ones((3,2), dtype=int) => returns a 2 dim array of shape (3,2), with type of 1 as integer, so it's 1 and NOT 1.0 or 1. (i.e NOT decimal 1, but integer 1)

[[ 1  1]
 [ 1  1]
 [ 1  1]]
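The array creation functions above can be tried together in one short sketch (the variable names, and the extra float-step example for arange, are only illustrative):

```python
import numpy as np

x1 = np.arange(5)            # stop only: 0 up to (but not including) 5
x2 = np.arange(10, 20, 2)    # start, stop, step
x3 = np.arange(0, 1, 0.25)   # float step gives a float array
print(x1)                    # [0 1 2 3 4]
print(x2)                    # [10 12 14 16 18]
print(x3)                    # [0.   0.25 0.5  0.75]

# linspace includes the stop value and takes the number of samples
x4 = np.linspace(2.0, 3.0, num=5)
print(x4)                    # [2.   2.25 2.5  2.75 3.  ]

z = np.zeros(2)                   # float by default
o = np.ones((3, 2), dtype=int)    # shape given as a tuple
print(z, z.dtype)                 # [0. 0.] float64
print(o.shape)                    # (3, 2)
```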

4. random: There is a random module in NumPy to generate random data. It has lots of methods which are very useful in AI and ML for generating random dataset.

from numpy import random => this line is optional. When we import numpy as np, its random module is already available as np.random, so we could write np.random everywhere. However, python has its own inbuilt random module too, which is what we get under the name "random" if we do "import random". By adding "from numpy import random" instead, the name "random" refers to numpy's random module, so we can write random.randint(...) instead of the longer np.random.randint(...). We shouldn't also do "import random" in the same script, else the two names clash.

Seed: All random numbers are generated from a given seed. Seed provides i/p to the pseudo random num generator to generate random numbers corresponding to that seed. Different seeds cause numpy to generate diff set of random numbers.

np.random.seed(1) => this will generate pseudo random numbers for all random functions using seed=1. We could use any integer number as seed. We don't really need to provide this seed at all, since by default, numpy chooses a random seed and generates random num corresponding to that seed. But then our seq of random numbers generated will be diff for each run of pgm, which will be difficult to debug or reproduce. so, we usually assign a seed, when coding our program the 1st time. Once we have debugged the pgm with couple of seeds, we can get rid of this seed function.

randint():

ex: x = random.randint(100) => randint method says to generate integer random number, and arg=100 says the range is from 0 to 100-1 (i.e 0 to 99). Note, we could have written np.random.randint(100) too, but we don't need that since "from numpy import random" imports random into current workspace.

To generate 1D or 2D random numbers, we can specify size.

ex: random.randint(50, size=(3,5)) => generates a 2D array of size=3X5, with each element being a random int from 0 to 49

rand():

ex: random.rand(3) => just "rand()" method returns random float b/w 0 to 1. Number inside it reps the size of array, i.e 3 means it's a 1D array of size 3. i.e random.rand(size=(3)), however we don't write it that way (size=3) with rand method, we directly specify the size, as rand method is different than randint

ex: x = random.rand(3, 5) => returns 2D array with matrix=3X5.

[[0.14252791 0.44691071 0.59274288 0.73873487 0.22082345]

[0.00484242 0.36294206 0.88507594 0.56948479 0.15075563]

[0.69195833 0.75111379 0.92780785 0.57986471 0.6203633 ]]

randn(): returns samples from standard normal distribution. Std normal dist is the gaussian distribution with mean mu=0 and spread sigma=1. So here instead of having equal probability for different numbers, it has probability distribution that is higher for numbers closer to mean, and the probability keeps on falling down as you get away from mean. About 99.7% of the values lie within 3 sigma of mean. We provide shape of array as i/p.

ex: x=random.randn(3,4,5) => returns 3D array of shape=(3,4,5)with random float in it which have mean=0, sigma=1.

To get values corresponding to other mean and sigma, just multiply the terms appr:

ex: Two-by-four array of samples from N(3, 6.25): Here mean=3, sigma=√6.25 = 2.5

3 + 2.5 * np.random.randn(2, 4) => about 68% of numbers will be b/w 3-2.5 to 3+2.5, i.e in b/w 0.5 to 5.5
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random

ex: np.random.randn() => returns single random float "2.192...". Here only float is returned since no array shape specified.
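A small sketch of the random functions above, showing that the same seed reproduces the same numbers. The variable names are illustrative, and no specific random values are claimed since they depend on the seed and the generator:

```python
import numpy as np

np.random.seed(1)
first = np.random.rand(3)     # 3 floats in [0, 1)
np.random.seed(1)
second = np.random.rand(3)    # same seed again => same 3 floats
print(np.array_equal(first, second))   # True

ints = np.random.randint(50, size=(3, 5))   # ints from 0 to 49, shape 3X5
print(ints.shape)                           # (3, 5)

std_normal = np.random.randn(3, 4, 5)       # mean=0, sigma=1
shifted = 3 + 2.5 * np.random.randn(2, 4)   # mean=3, sigma=2.5
print(std_normal.shape, shifted.shape)      # (3, 4, 5) (2, 4)

single = np.random.randn()                  # no shape => a single float
print(type(single))
```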

Operations on Array:

array slicing => array[start:end:step] => "end" is exclusive, i.e the slice stops at index end-1. If omitted, start=0, step=1, end=length of array (so the slice runs through the last index). For nD array, we can slice each index of the array.

IMP: When we provide the end index of a slice, the slice stops at end-1. So, arr[2:4] will have arr[2], arr[3], but not arr[4], as range is from 2:(4-1). This is the same behaviour that we saw with lists/tuples/arrays in python. One other thing to note is that complex slicing is allowed on numpy multi dimensional arrays, which was not possible on lists/tuples/arrays in python. This is where numpy turns out to be much more powerful in terms of operations being done on arrays. Also, in numpy, we access elements of array via arr[2,3,0], while the same element is accessed in a list/tuple/array via arr[2][3][0] (i.e commas are used in numpy). However, python list syntax works for numpy too, i.e arr[2][3][0] is equally valid in numpy, but we don't usually access numpy arrays that way.

np_arr = np.array( [[300, 200,100,700,212], [600, 500, 400,900,516], [21, 23,45,67,45]] )

ex: np_arr[0] => returns entry of index=0, which is itself an array, so returns all of that array => [300 200 100 700 212]

ex: np_arr[1,2] => returns 400 (since it's index=1 for outer array and index=2 for inner array). So, for multi dim array, we specify indices separated by commas. np_arr[1][2] also works, though as explained above, that's not the preferred way.

ex: np_arr[0:2:1] = Here we provided the outermost array index (since there are no commas for inner indices). The range is from 0 to 1 with increment of 1. So, this returns as below:

[[300 200 100 700 212]
[600 500 400 900 516]]

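ex: d_3D[1,0,1:3] = [8 9] => This uses the 3D array d_3D defined earlier. Here we take index=1 for axis=0 which is [[7 8 9] [10 11 12]], then we take index=0 for axis=1, which is [7 8 9] , then we take slice 1:3 of this final one which is [8 9]. Each plain integer index removes its axis, so the result is a 1D array and NOT a 3D array. To keep all 3 dimensions, we use slices everywhere, i.e d_3D[1:2,0:1,1:3] = [ [ [8 9] ] ].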

x=np_arr[0:2,3:1:-1] => Here we provide index range for both dimension of array. axis=0 goes from 0 to 1 (since range 0:2 implies 0 and 1), while axis=1 goes from index 3 to 2 in reverse direction (if we do 1:3:-1, this would return empty array, since 1:3 index can never be achieved by going in reverse dir. This is important to remember). NOTE: array entries are now reversed, i.e the array x gets assigned the values as [700 100] instead of [100 700] as in original array. Array "x" still remains a 2D array.

x =

[[700 100]
[900 400]]

ex: arr = [1 2 3 4 5]; arr[: : 2] = [1 3 5] => returns every other element (since start and end are not specified, start=0 and end=length of array).

ex: arr = np.array([[[1, 2], [3, 4]], [[5, 6],[2,3]]])

ex: print(arr[:]) => prints all elements of array since no start/end specified. All 3D elements printed. Same as what would have been printed with print(arr)

ex: print(arr[0,:]) => This prints all elements of index=0 for axis=0. The same o/p is printed with arr[0][:] (i.e list/tuple format in python) prints [[1 2] [3 4]]. NOTE: it's 2D array now.

print(arr[:,0]) => prints [[1 2] [5 6]]. This says that for axis=0, slice everything since no range specified, so the whole array is returned. Then 0 says that for axis=1 return index=0. Array for axis=1 is [ [1 2] [3 4] ] and [ [5 6] [2 3] ]. index=0 is [1 2] from 1st one and [5 6] from 2nd one.
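The slicing examples above can be verified with a sketch like the following. It assumes np_arr is created with np.array() (comma indexing such as np_arr[1,2] does not work on a plain python list), and the names rev, empty and every_other are only illustrative:

```python
import numpy as np

np_arr = np.array([[300, 200, 100, 700, 212],
                   [600, 500, 400, 900, 516],
                   [21, 23, 45, 67, 45]])

print(np_arr[1, 2])      # 400
print(np_arr[0:2])       # first 2 rows

# Rows 0..1, columns 3 down to 2 (reverse step)
rev = np_arr[0:2, 3:1:-1]
print(rev)               # [[700 100]
                         #  [900 400]]

# Going from 1 to 3 with a negative step selects nothing
empty = np_arr[0:2, 1:3:-1]
print(empty.shape)       # (2, 0)

every_other = np.array([1, 2, 3, 4, 5])[::2]
print(every_other)       # [1 3 5]

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [2, 3]]])
print(arr[0, :])         # [[1 2]
                         #  [3 4]]
print(arr[:, 0])         # [[1 2]
                         #  [5 6]]
```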

reshape: Reshaping means changing the shape of an array. reshape(m,n) changes an array into m arrays with n elements each (i.e turns the array into 2D array), provided the total number of elements matches (m*n), else it raises an error. Similarly reshape(p,q,r) changes an array into 3D array with p arrays that each contains q arrays, each with r elements.

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3) => this changes the above 1D array into 2D array with 4 arrays and each having 3 elements. So, newarr.ndim=2, newarr.shape=(4,3) since it has 4 arrays with 3 elements in each.

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

ex: n_arr = arr.reshape((1, arr.shape[0])) => Since arr is 1D array, shape=(12, ) i.e 12 followed by blank. To convert it into 2D array, we use reshape method as shown. Since shape[0] returns 12, this becomes n_arr=arr.reshape(1,12). This becomes 2D array with 1 row and 12 elements in that row. So arr=[1 2 .. 12] while n_arr = [ [ 1 2 ... 12] ]. NOTE: 2 square brackets in n_arr, as compared to single brackets in arr (since it's a 2D array now). n_arr.ndim=2, n_arr.shape=(1,12)

assert (n_arr.shape == (1,12)) => This asserts or checks for the condition that shape of array n_arr is (1,12). This is helpful to figure out bugs in code, since if the shape is not as expected, this will throw an error.

newarr.reshape(12) => When only 1 integer provided, then result is 1D array of that length. So, this returns [1 2 ... 12]. We can also provide -1 as the length of array to get same result.

newarr.reshape(-1) => flattens the array, i.e converts any array into 1D array. So, this returns [1 2 ... 12]. However, if we provide other integers for new shape along with -1 as last integer, then array is converted into required shape, with other values inferred.

newarr.reshape(d_3D.shape[0],-1) => Here 1st value is 2 (from above example). So tuple is (2,-1) meaning it's 2X6 (since 6 is inferred automatically. -1 implies flatten other dimension, so 6 is the only other value). result is:

[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]

Other way to flatten an array is by using func ravel() or method ravel. It's same as reshape(-1).

ex: np.ravel(newarr) => converts newarr array into flattened 1D array. We could also apply method ravel on newarr as newarr.ravel()

We usually want a 2D array, with one row, instead of 1D array with 1 row. It's easier to work with 2D array. NOTE: They are kind of same except that there are 2 square brackets in 2D array, while only 1 square bracket in 1D.

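new_arr = arr.reshape(arr.shape[0], -1).T => Here arr is a 4D array of shape (m,n,p,q), and we convert it into a 2D array of shape (n*p*q, m), i.e the outer m arrays are kept separate (one per column), but everything inside each of them is flattened. NOTE: reshape is a method, so it's arr.reshape(...) and not arr.shape(...). Also, directly calling arr.reshape(arr.shape[1]*arr.shape[2]*arr.shape[3], arr.shape[0]) gives the same shape, but then the elements of different outer arrays get mixed within a column, so we flatten each outer array first and then transpose.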
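A sketch of the reshape calls above. The 4D array "samples" of shape (2,3,2,2) is a made-up stand-in for an (m,n,p,q) array, and the assumption here is that the intent is to keep each of the m outer arrays together as one column. The last lines show why a direct reshape to (n*p*q, m) does not do that:

```python
import numpy as np

arr = np.arange(1, 13)                 # [1 2 ... 12], shape (12,)
newarr = arr.reshape(4, 3)             # 2D, shape (4, 3)
n_arr = arr.reshape(1, arr.shape[0])   # row vector, shape (1, 12)
flat = newarr.reshape(-1)              # back to 1D, same as newarr.ravel()
two_rows = newarr.reshape(2, -1)       # -1 is inferred as 6
print(newarr.shape, n_arr.shape, flat.shape, two_rows.shape)
# (4, 3) (1, 12) (12,) (2, 6)

# Hypothetical 4D data: m=2 samples, each of shape (3, 2, 2)
samples = np.arange(24).reshape(2, 3, 2, 2)

# Flatten each sample, then transpose => one sample per column
per_column = samples.reshape(samples.shape[0], -1).T
print(per_column.shape)                                       # (12, 2)
print(np.array_equal(per_column[:, 0], samples[0].ravel()))   # True

# Direct reshape has the same shape, but mixes the samples
naive = samples.reshape(12, 2)
print(np.array_equal(naive[:, 0], samples[0].ravel()))        # False
```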

squeeze: this func removes axes of length 1 from the shape of the given array. This is used in the opposite scenario, e.g where a 2D array with a single row is converted to a 1D array. An axis to be squeezed must be of length=1. By default, all axes of length 1 are squeezed. We can also specify which axis to squeeze.

ex: y=np.squeeze(x) => x is 3D array with shape (1,3,3) while y now becomes 2D array with (3,3), i.e axis0 is squeezed

X = 
[[[0 1 2]
 [3 4 5]
 [6 7 8]]]

Y = 
[[0 1 2]
 [3 4 5]
 [6 7 8]]

The shapes of X and Y array:
(1, 3, 3) (3, 3)
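The squeeze behaviour can be sketched as below. The array z of shape (1,3,1) is an extra illustration, added to show what happens when more than one axis has length 1 and when an axis of length other than 1 is requested:

```python
import numpy as np

x = np.arange(9).reshape(1, 3, 3)
y = np.squeeze(x)
print(x.shape, y.shape)        # (1, 3, 3) (3, 3)

z = np.zeros((1, 3, 1))
all_squeezed = np.squeeze(z)            # every length-1 axis removed
only_first = np.squeeze(z, axis=0)      # only axis 0 removed
print(all_squeezed.shape)      # (3,)
print(only_first.shape)        # (3, 1)

# Squeezing an axis whose length is not 1 raises an error
try:
    np.squeeze(z, axis=1)
    raised = False
except ValueError:
    raised = True
print(raised)                  # True
```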

r_: This is a simple way to build up arrays quickly. Translates slice objects to concatenation along the first axis.
ex: np.r_[np.array([1,2,3]), 0, 0, np.array([4,5,6])] => returns array([1, 2, 3, 0, 0, 4, 5, 6]) => This ex concatenates 1D array then 0, 0, then another 1D array. It concatenates along axis=0, still returns 1D array.
ex: np.r_['1,2,0', [1,2,3], [4,5,6]] => the numbers within '...' before the array specify how to concatenate. Here the 1st number 1 specifies concat along axis=1 (2nd axis), the 2nd number 2 forces each input to have at least 2 dimensions, and the 3rd number 0 says where the original axis of each input is placed in the upgraded shape (here at the start, so each 1D input becomes a 3X1 column).
array([[1, 4],
       [2, 5],
       [3, 6]])

c_: Translates slice objects to concatenation along the second axis.

np.c_[np.array([[1,2,3]]), 0, 0, np.array([[4,5,6]])] => returns array([[1, 2, 3, 0, 0, 4, 5, 6]]) => This ex concatenates 2D array along axis=1 (2nd axis).
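A short sketch of r_ and c_ based on the examples above. The slice example np.r_[0:5] and the two-column c_ example are additions for illustration:

```python
import numpy as np

r1 = np.r_[np.array([1, 2, 3]), 0, 0, np.array([4, 5, 6])]
print(r1)     # [1 2 3 0 0 4 5 6]

# A slice inside r_ is expanded like arange
r2 = np.r_[0:5]
print(r2)     # [0 1 2 3 4]

# String directive: concat along axis 1, at least 2 dims, original axis first
r3 = np.r_['1,2,0', [1, 2, 3], [4, 5, 6]]
print(r3)     # [[1 4]
              #  [2 5]
              #  [3 6]]

# c_ stacks 1D arrays as columns
c1 = np.c_[np.array([1, 2, 3]), np.array([4, 5, 6])]
print(c1)     # same 3X2 result as r3

c2 = np.c_[np.array([[1, 2, 3]]), 0, 0, np.array([[4, 5, 6]])]
print(c2)     # [[1 2 3 0 0 4 5 6]]
```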



Matrix Operations:

matrix transpose: This is one of the useful functions to find transpose of a matrix. Transpose of 2D array is easy to see, its rows and columns are swapped, so rows become columns and columns become rows (i.e 3X4 matrix becomes 4X3 matrix, w/o any change to any of the contents). You can transpose any n Dim matrix too, and specify how to transpose it. By default for n Dim matrix, the order of axes is reversed, i.e 2X3X4 matrix becomes 4X3X2 matrix.

ex: np.transpose(newarr) => changes newarr from 4X3 to 3X4 array

Instead of using function, we can also use an attribute of the array to transpose.

ex: y=newarr.T => Here we are reading the "T" attribute (T is the name for transpose) of the newarr object. It's an attribute and NOT a method, so there are no brackets after T. Result is same as transpose function above.

matrix dot operation: To find the dot product of 2 matrices, we use the dot function. NOTE: the dot operation is different from the multiplication operation. Mult just multiplies each element of one array with the corresponding element of the other array, while the dot operation multiplies rows of the 1st array with columns of the 2nd array and adds up the products. You can find more details of dot operation on matrix in "high school maths" section. For 2-D arrays, it is equivalent to matrix multiplication. For 1-D arrays, it is the inner product of the vectors. For N-dimensional arrays, it is a sum product over the last axis of a and the second-last axis of b. The dimensions of the two matrices have to be compatible for the dot operation (for 2-D arrays, number of columns of a must equal number of rows of b), else we'll get an error. Instead of using the dot function, we can write a for loop and iterate over each element of the 2 arrays and sum them appr. However, this for loop takes a long time to run, since every step goes through the python interpreter and can't use parallel instructions such as SIMD (single inst multiple data). NumPy's dot function runs in compiled code, typically an optimized BLAS library that uses SIMD instructions and often multiple CPU cores, which significantly speeds up the multiplication/addition part. NumPy itself runs on the CPU and does not use a GPU. Replacing explicit loops with whole-array operations such as dot is called vectorization, and in AI related courses, we'll always hear this term, where we'll be asked to vectorize our code (meaning put the data in array form and then use array functions such as dot to do the multiplication)

a = np.array([[1,2],[3,4]]) => 2D array of 2x2
b = np.array([[11,12],[13,14]]) => 2D array of 2x2
np.dot(a,b)
This produces below 2D array of 2x2 which is calculated as follows => [[1*11+2*13, 1*12+2*14],[3*11+4*13, 3*12+4*14]]

[[37  40] 
 [85  92]]
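To make the difference between the loop version and the vectorized version concrete, here is a minimal sketch. The helper dot_with_loops is a hypothetical function written only for this comparison and is not part of numpy:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])

def dot_with_loops(x, y):
    """Matrix product of two 2D arrays using explicit python loops."""
    rows, inner = x.shape
    inner_y, cols = y.shape
    assert inner == inner_y, "columns of x must match rows of y"
    out = np.zeros((rows, cols), dtype=x.dtype)
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i, j] += x[i, k] * y[k, j]
    return out

looped = dot_with_loops(a, b)
dotted = np.dot(a, b)        # vectorized, same result
print(dotted)                # [[37 40]
                             #  [85 92]]
print(np.array_equal(looped, dotted))   # True

# Element-wise multiplication is a different operation
elementwise = a * b
print(elementwise)           # [[11 24]
                             #  [39 56]]
```

For small arrays like these, the speed difference is not noticeable. It shows up when the arrays have thousands of rows and columns.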

matrix add/sub/mult/div operations: All other matrix operations as add, divide, multiply, abs, log, etc can be done by using specific matrix functions similar to matrix mult shown above, instead of using for loop.

ex: c=np.add(a,b) => adds 2 matrices a and b. Each element of matrix a is added to corresponding element of matrix b. Similarly for np.subtract(a,b)

ex: c=np.divide(a,b) => divides 2 matrices a and b. Each element of matrix a is divided by corresponding element of matrix b. Similarly for np.multiply(a,b)

Other misc operations: Many other operations defined working on single array.

log: ex: c=np.log(a) => computes natural log of each element of array "a"

abs: ex: c=np.abs(a) => computes absolute value of each element of array "a"

sum: There is another function "sum" (NOT add) which adds up the elements along each row or column of an array to return 1D array.

ex: A = [ [300, 200,100], [600, 500, 400] ]
C=np.sum(A,axis=0) => adds each col (since axis=0) and returns 1D array with shape=(3,). result=[900 700 500]

C=np.sum(A,axis=1) => adds each row (since axis=1) and returns 1D array with shape=(2,). result=[600 1500]

C=np.sum(A) => adds all rows and cols (since no axis specified, it adds across all axis) and returns a scalar 2100.
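The element-wise functions and sum can be tried together in a quick sketch (the arrays used for log and abs are made-up inputs chosen so the results are easy to check):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])

added = np.add(a, b)            # same as a + b
subbed = np.subtract(b, a)      # same as b - a
multiplied = np.multiply(a, b)  # element-wise, same as a * b
divided = np.divide(b, a)       # element-wise, result is float
print(added)                    # [[12 14]
                                #  [16 18]]

logs = np.log(np.array([1.0, np.e]))      # natural log
absolute = np.abs(np.array([-1, 2, -3]))
print(logs)                     # [0. 1.]
print(absolute)                 # [1 2 3]

A = np.array([[300, 200, 100], [600, 500, 400]])
print(np.sum(A, axis=0))        # [900 700 500]
print(np.sum(A, axis=1))        # [ 600 1500]
print(np.sum(A))                # 2100
```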

Broadcasting: Array broadcasting is a concept in NumPy, where we can perform element-wise operations, even when the arrays don't have exactly the same shape. NumPy stretches the rows or columns that have size 1, as if they were duplicated (no actual copy is made). Certain rules apply as follows:

Rule 1. matrix of dim=mXn operated with matrix of dim=1Xn (1 row only) or with matrix of dim=mX1 (1 col only) => operations are +, -, *, /. The matrix 1Xn or mX1 are converted into matrix mXn by duplicating rows or col, and then operation is performed.

ex: A = [ [200, 100] , [300, 400] ] , B = [ [1, 2] ] => Here A is 2X2 matrix, while B is 1X2 matrix.

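C= np.add(A,B) => Here, B is broadcast to 2X2 matrix, by duplicating 1st row. So, result is C = [[201 102] [301 402]]. NOTE: we use np.add (or A+B) here and NOT np.sum, since the 2nd arg of np.sum is the axis and not another array.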

Rule 2: matrix of dim 1Xn or of dim mX1 => We can do operations of +, -, *, / on these matrix with a real number. The real number will be converted into 1Xn or mX1 matrix and then operation performed.

ex: B = [ [1, 2, 3] ] => This is 1X3 matrix. If we add real number 2 to this matrix, then the number 2 is converted to [ [ 2, 2, 2 ] ] and then addition performed.

C=np.add(B,2) => [ [1, 2, 3] ] + 2 = [ [3 4 5] ]
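Both broadcasting rules can be sketched as follows. The column vector "col" and the use of np.broadcast_to (to show what the stretched array looks like) are additions for illustration:

```python
import numpy as np

A = np.array([[200, 100], [300, 400]])   # 2X2
B = np.array([[1, 2]])                   # 1X2, row is stretched downward
col = np.array([[10], [20]])             # 2X1, column is stretched across

C = np.add(A, B)                         # same as A + B
print(C)                                 # [[201 102]
                                         #  [301 402]]
print(A + col)                           # [[210 110]
                                         #  [320 420]]

# What B looks like after being stretched to the shape of A
stretched = np.broadcast_to(B, A.shape)
print(stretched)                         # [[1 2]
                                         #  [1 2]]

# Rule 2: a plain number is stretched to the shape of the array
B3 = np.array([[1, 2, 3]])
print(np.add(B3, 2))                     # [[3 4 5]]
```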

Other operations on array: iteration over elements of array, join, split, search, sort, etc are miscellaneous functions provided to work on arrays.
