Python module - Matplotlib

Matplotlib is one of the most popular Python packages for data visualization. It is a cross-platform library for making 2D plots from data in arrays. Matplotlib is written in Python and makes use of NumPy; together, Matplotlib and NumPy can be considered the open source equivalent of MATLAB. MATLAB is a proprietary programming language developed by MathWorks, used for plotting as well as for various kinds of scientific computation. In fact, once you start using Matplotlib, you may not need MATLAB or any open source equivalent of it (not even gnuplot) for plotting. Matplotlib can create static, animated, and interactive visualizations.

Official matplotlib website: https://matplotlib.org/

Very good tutorial here: https://www.tutorialspoint.com/matplotlib/index.htm

Source code for matplotlib here: https://matplotlib.org/_modules/matplotlib.html

As you can see, this library defines lots of functions that can be called.

Installation: on python3.6. NOTE: All our examples are with python3 (version 3.6).

1. CentOS:

$ python3 -m pip install matplotlib => this installs it for specific version of python (here for python3, which is pointing to python3.6). We can verify that it got installed for python 3.6 by looking in dir for python 3.6 library.

$ ls /usr/local/lib64/python3.6/site-packages/matplotlib/* => We see this "matplotlib" dir in python3.6 lib dir, so it's installed correctly
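Another way to confirm the installation is to import the package and print its version and location. This is a minimal sketch, assuming it is run with the same python3 interpreter used for the pip command above; the variable names are only for illustration.

```python
import matplotlib

# Version string of the installed package (exact value depends on your install)
version = matplotlib.__version__
# Path of the package's __init__.py; it should sit under the site-packages dir listed above
location = matplotlib.__file__

print(version)
print(location)
```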

Usage:

Pyplot: A very important module of matplotlib is pyplot.

matplotlib.pyplot is a collection of command style functions that make Matplotlib work like MATLAB. Matplotlib is the whole package; matplotlib.pyplot is a module in Matplotlib.

Here's the full source code for pyplot: https://matplotlib.org/_modules/matplotlib/pyplot.html

Here's introductory pyplot tutorial: https://matplotlib.org/3.1.0/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py

import pyplot: Before we can use pyplot, we need to import pyplot in python script:

import matplotlib.pyplot as plt => This imports pyplot module as plt (plt is just a convention, we can use any name)

from matplotlib import pyplot as plt => This is an alternative way of importing pyplot

Pyplot functions: Each pyplot function makes some change to a figure (e.g. xlabel, title, plot, etc). These are a few common ones:

pyplot.imread() => Reads an image from a file into an array. Similarly, pyplot.imsave() saves an array as an image file, while pyplot.imshow() displays an image on the axes.

pyplot.show() => displays a figure by invoking the plot viewer window; this is used in interactive mode if you want to see the plot. Most of the time, you will just want to save the plot. pyplot.savefig() saves the current figure, while pyplot.close() closes a figure window.

pyplot.bar() => makes a bar plot. There are similar functions for histogram (pyplot.hist), scatter plot (pyplot.scatter), line plot (pyplot.plot), etc. These create the plot in the background, but do not display it. We use pyplot.show() to display the plot.
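The functions listed above can be combined in a short save-and-reload sketch. The data, the temporary output path and the choice of the non-interactive "Agg" backend are assumptions made here so the example runs without a display; drop the matplotlib.use() line for interactive use.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no viewer window needed
import matplotlib.pyplot as plt

# Hypothetical data for the bar plot
positions = [1, 2, 3]
heights = [3, 7, 5]

plt.figure()
plt.bar(positions, heights)     # draws the bars on the current figure (not displayed yet)
out_path = os.path.join(tempfile.mkdtemp(), "bars.png")
plt.savefig(out_path)           # writes the current figure to a PNG file
plt.close()                     # closes the figure and frees its memory

img = plt.imread(out_path)      # reads the PNG back as a NumPy array
print(img.shape)                # (height, width, channels)
```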

A. scatter plot (plt.scatter): scatter plot is used widely in stats. The syntax for scatter plot is:

matplotlib.pyplot.scatter(x, y) => A scatter plot of y vs x with varying marker size and/or color that can be specified with optional args. x, y are array-like. Fundamentally, scatter works with 1-D arrays; x, y may be input as 2-D arrays, but within scatter they will be flattened. The shape of x and y is (n,), where n is the number of elements in the 1D array. The important optional args are size and color.

s=area (or size). This is the marker size in points**2, i.e. an area, so it is usually written in the form W**2, where W is the marker width (diameter) in points; the radius is W/2. It can be given as a scalar or as an array of shape (n,) (i.e. 1D array of n elements), in which case each element of the size array is applied to the corresponding (x, y) point.

c=color. Color is a little more complicated. We can specify a single color format string or a sequence of colors. A sequence of numbers is given as a 1D array of shape (n,), whose elements may be floats or ints in any range. By default these numbers are scaled linearly so that the smallest value maps to the start of the colormap and the largest to its end (the optional vmin, vmax and norm args change this scaling). The optional cmap arg (cmap=my_map) takes a colormap instance or a registered colormap name. cmap is used only if c is an array of n numbers.
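How numbers in c turn into colors can be sketched by doing the two steps by hand: a linear rescaling with matplotlib.colors.Normalize, followed by a colormap lookup. The three input values are hypothetical, and this only mirrors the default behaviour (no norm, vmin or vmax passed to scatter).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is required
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

values = np.array([0.0, 2.0, 4.0])     # hypothetical c values, any range works

# Step 1: rescale linearly so min(values) -> 0.0 and max(values) -> 1.0
norm = Normalize(vmin=values.min(), vmax=values.max())
scaled = norm(values)

# Step 2: look up each rescaled number in a colormap
cmap = plt.get_cmap("Spectral")        # registered colormap, fetched by name
rgba = cmap(scaled)                    # one (red, green, blue, alpha) row per value

print(scaled)
print(rgba)
```

Because of step 1, only the relative position of each number between the smallest and largest value matters, which is why c accepts numbers in any range.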

ex: Below ex draws X-Y plot with random size and random color of each point which is itself placed randomly.

import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.random.rand(N) #This returns 1D array with 50 random entries in it (all b/w 0 and 1). returns x = [ 0.46875992 .... 0.498185 ]
y = np.random.rand(N) #similar 1D random array y
colors = np.random.rand(N) #similar 1D array for color. Each of the 50 entries corresponds to one (x,y) point. Each colors value is a random floating num from 0 to 1.
area = (30 * np.random.rand(N))**2 # 0 to 15 point radii (marker widths of 0 to 30 points). This is again a 1D array with 50 random entries, one for each (x,y) point.
plt.scatter(x, y, s=area, c=colors, alpha=0.5) #c is color, s is area in points**2. alpha is the "alpha blending value", between 0 (transparent) and 1 (opaque). alpha=0.5 is in between transparent and opaque.
plt.show() #this shows the plot for display

ex: Below ex draws the same kind of random plot as above, but here size is fixed at 40 and we use a colormap called Spectral. A colormap maps numbers to colors, so each value in c picks a color.

color=np.random.randint(5,size=(400,))

plt.scatter(X[0, :], Y[0, :], c=color, s=40, cmap=plt.cm.Spectral) => This plots the first row of X against the first row of Y, where X and Y are 2D arrays (defined elsewhere; each row needs 400 elements to match color). The slice [0, :] takes index 0 along axis 0 and all elements along axis 1. c is the color, s is the marker area of 40 and cmap is a colormap instance (a registered colormap name such as "Spectral" also works). Here color is a 1D array of 400 values, each a random integer from 0 to 4. The values are scaled linearly, so the smallest (0) maps to one end of the Spectral colormap (red) and the largest (4) to the other end (blue/violet), with 1, 2 and 3 falling in between. So, each dot on the scatter plot gets a random color from a set of 5 colors. Because of this scaling, real numbers in any range can be provided for color and it still works.
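Since X and Y are not defined in the snippet above, here is a self-contained sketch of the same call. The random X and Y arrays of shape (1, 400), the fixed seed and the added colorbar are assumptions for illustration only, as is the "Agg" backend.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is required
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)       # fixed seed so the sketch is repeatable
X = rng.rand(1, 400)                 # hypothetical 2D array: one row of 400 x values
Y = rng.rand(1, 400)                 # hypothetical 2D array: one row of 400 y values
color = rng.randint(5, size=(400,))  # random integers 0..4, one per point

points = plt.scatter(X[0, :], Y[0, :], c=color, s=40, cmap=plt.cm.Spectral)
plt.colorbar(points)                 # legend bar showing which number maps to which color
# plt.show() or plt.savefig("scatter.png") would go here to view the result
```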

B. Line plot (plt.plot): We draw a line plot (most common kind of plots) using plt.plot() func.

Ex: Below is a simple plot of a sine wave, where x is varied from 0 to 2*pi, while y is a sine wave constructed from these x values.

#!/usr/bin/python3.6
from matplotlib import pyplot as plt
import numpy as np
import math #needed for definition of pi

x = np.arange(0, math.pi*2, 0.05) #numpy lib has arange function that provides ndarray objects of angles between 0 and 2π with step=0.05. If we increase this step to 1, we get a very non smooth plot.
y = np.sin(x)
plt.plot(x,y) #plots line plot with x and y

plt.xlabel("angle")
plt.ylabel("sine")
plt.title('sine wave')

plt.show() #displays the plot

Ex: plot an array with no x values provided => uses index of array as X values
a = [1, 2, 3, 4, 5]
plt.plot(a) #here we provide only Y axis values to plot. Since it's an array, the array indices are plotted on X axis.
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(b, "or") #Here we plot "b" with markers only, not as a line. The format string "or" means o=circle marker and r=red. So the result looks like a scatter plot, with X axis being the index and each point drawn as a red circle.
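The two behaviours above can be checked by inspecting the Line2D objects that plt.plot() returns. This is a small sketch; the variable names are made up, and the "Agg" backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is required
import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
(line_a,) = plt.plot(a)             # plot() returns a list with one Line2D per curve
x_used = list(line_a.get_xdata())   # x values matplotlib filled in: the array indices

b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
(line_b,) = plt.plot(b, "or")       # format string: "o" = circle marker, "r" = red
marker_used = line_b.get_marker()   # marker picked from the format string
color_used = line_b.get_color()     # color picked from the format string

print(x_used)
print(marker_used, color_used)
```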


C. Contour plot (plt.contour): This is one of the ways to show a 3D surface on a 2D plane. More details here: https://www.tutorialspoint.com/matplotlib/matplotlib_contour_plot.htm
np.meshgrid is needed to create the grid of X and Y values on which the plot of Z is drawn. plt.contourf() fills the regions between contour lines with color, while plt.contour() just shows the contour lines.

ex: draws meshgrid with X and Y both in range of -3 to +3. Z is the function whose contour plot we draw.

import numpy as np
import matplotlib.pyplot as plt
xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X**2 + Y**2)
plt.contourf(X, Y, Z)
plt.show()
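To see contourf() and contour() side by side, the same grid can be drawn with filled bands plus labelled lines on top. The chosen levels (1, 2, 3), the black line color and the colorbar are assumptions added for illustration, as is the "Agg" backend.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is required
import matplotlib.pyplot as plt

xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)    # X and Y are both 100x100 grids of coordinates
Z = np.sqrt(X**2 + Y**2)            # distance from the origin at every grid point

filled = plt.contourf(X, Y, Z)                              # colored bands
lines = plt.contour(X, Y, Z, levels=[1, 2, 3], colors="k")  # black lines at chosen heights
plt.clabel(lines)                                           # write the level value on each line
plt.colorbar(filled)                                        # color scale for the filled bands
# plt.show() or plt.savefig("contour.png") would go here to view the result
```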


Figures: all plots that we saw above used a figure to draw the plot on. We never had to call any function separately to create a figure. However, we can do that too.

1. Figure (plt.figure): The matplotlib.figure module contains the Figure class. It is the top-level container for all plot elements. A Figure object is created by calling the figure() function from the pyplot module, which instantiates the Figure class and returns the instance (see OO section for details on how classes are instantiated).

fig=plt.figure(figsize=(3,4)) => This returns a Figure instance. Here we specified the figure (width,height) in inches. If we don't specify anything, i.e. plt.figure(), then a figure of default size is created.
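The size of a figure can be read back from the Figure object. The sketch below also converts inches to pixels using the figure's dpi; the variable names are made up, and the "Agg" backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is required
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(3, 4))              # width = 3 inches, height = 4 inches
width_in, height_in = fig.get_size_inches()   # read the size back, in inches

# Size in pixels is the size in inches times dots-per-inch
width_px = width_in * fig.dpi
height_px = height_in * fig.dpi

print(width_in, height_in, fig.dpi)
print(width_px, height_px)
```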

2. Axes: Axes object is the region of the image with the data space. The Axes contains two (or three in the case of 3D) Axis objects. Axes object is added to figure by calling the add_axes() method. It returns the axes object and adds an axes at position rect [left, bottom, width, height] where all quantities are in fractions of figure width and height.

ax=fig.add_axes([0.05,0.06,0.9, 0.9]) => This adds axes on the figure starting from left which is 0.05*figure_width, bottom which is 0.06*figure_height, and then width and height are 0.9 times the width and height of the figure. Usually you'll see [0,0,1,1] as axes position, but that doesn't show the axes as the axes are right on the edge of figure. So, we leave some margin on sides.

Now on the axes object, we can draw plots, put labels, legends, etc

l1 = ax.plot(xlist,ylist,'ys-') # solid line with yellow colour and square marker
l2 = ax.plot(x2list,ylist,'go--') # dash line with green colour and circle marker
ax.legend(labels = ('tv', 'Smartphone'), loc = 'lower right') # legend placed at lower right
ax.set_title("Advertisement effect on sales") #title on top of axes plot
ax.set_xlabel('medium') => label on x axis
ax.set_ylabel('sales') => label on y axis
plt.show() => this finally shows the fig with the plot on it
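The snippet above relies on xlist, x2list and ylist being defined elsewhere. A self-contained sketch of the same steps follows; the data values are made up for illustration, the axes rectangle is one possible choice of margins, and the "Agg" backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is required
import matplotlib.pyplot as plt

# Hypothetical data, made up for illustration
medium = [1, 2, 3, 4, 5]
tv_sales = [10, 14, 19, 25, 33]
phone_sales = [8, 15, 21, 30, 42]

fig = plt.figure(figsize=(6, 4))
ax = fig.add_axes([0.1, 0.12, 0.85, 0.8])    # [left, bottom, width, height] as figure fractions

ax.plot(medium, tv_sales, "ys-", label="tv")             # yellow, square marker, solid line
ax.plot(medium, phone_sales, "go--", label="Smartphone") # green, circle marker, dashed line
ax.legend(loc="lower right")                 # legend text comes from the label= args
ax.set_title("Advertisement effect on sales")
ax.set_xlabel("medium")
ax.set_ylabel("sales")
# plt.show() or fig.savefig("sales.png") would go here to view the result
```

Passing label= to each plot call, instead of listing labels in legend(), keeps every label next to the line it describes.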

3. subplots: Apart from the plot function, we also have the subplots function, which creates a figure and a set of subplots. subplots() returns both the figure and the axes (a single Axes object by default, or an array of Axes objects for a grid). It is typically used when we want more than 1 plot on the same figure. plt.subplots(nrows, ncols) creates a subplot grid with nrows rows and ncols columns.

ex: fig, ax = plt.subplots() => Now we can use fig and ax the regular way (e.g. ax.plot(x,y), ax.set_xlabel, etc).

ex: Below ex creates 2 rows and 2 cols with 4 plots total. Only 2 plots are drawn; the other 2 remain empty.

fig, axs = plt.subplots(2, 2)

axs[0, 0].plot(x, y)

axs[1, 1].scatter(x, y)

plt.show()
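A self-contained version of the 2x2 grid is sketched below, with x and y defined as the sine wave from the line plot section. Hiding the two unused panels with axis("off") and calling tight_layout() are optional additions made here for illustration; the "Agg" backend is an assumption so that it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is required
import matplotlib.pyplot as plt

x = np.arange(0, 2 * np.pi, 0.05)
y = np.sin(x)

fig, axs = plt.subplots(2, 2)      # axs is a 2x2 NumPy array of Axes objects

axs[0, 0].plot(x, y)               # top-left panel: line plot
axs[0, 0].set_title("line")
axs[1, 1].scatter(x, y, s=4)       # bottom-right panel: scatter plot
axs[1, 1].set_title("scatter")

axs[0, 1].axis("off")              # hide the two panels we do not use
axs[1, 0].axis("off")
fig.tight_layout()                 # adjust spacing so titles do not overlap panels
# plt.show() or fig.savefig("grid.png") would go here to view the result
```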

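We can also use plt.title, plt.xlabel, plt.ylabel, etc. instead of ax.set_title, etc. The plt.* functions act on the "current" axes (normally the most recently created one), while the ax.set_* methods act on the specific axes object they are called on. The difference matters once a figure has more than one subplot.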
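A minimal sketch of that difference on a two-panel figure follows. plt.sca() is used here to choose which axes pyplot treats as current; the titles are made up, and the "Agg" backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is required
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2)       # axs is a 1D array holding two Axes objects

plt.sca(axs[0])                     # make the left axes the "current" one
plt.title("left (via pyplot)")      # pyplot call: applied to the current axes

axs[1].set_title("right (via Axes method)")  # method call: applied to exactly this axes

print(axs[0].get_title())
print(axs[1].get_title())
```

With a single plot the two styles give the same result; with several subplots, the ax.set_* form makes the target explicit.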