Heatmaps in Python
Introduction
Data visualization makes large datasets much easier to understand. A heatmap is one such visualization method, and the Seaborn Python package provides a dedicated function for drawing it. A heatmap represents values graphically with shades of color, and the mapping is consistent: the same value is always drawn with the same shade.
Seaborn for Data Visualization
Seaborn is a popular data visualization library built on Matplotlib. It produces polished figures and offers high-level functions for drawing attractive, informative statistical graphics. Because Seaborn is built on top of Matplotlib, you can further tweak its figures with Matplotlib methods.
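As a small illustration of that layering: seaborn.heatmap() draws onto an ordinary Matplotlib Axes and returns it, so Matplotlib calls such as set_title() work on the result. A minimal sketch; the matrix size, figure size, and title text are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb

# Small random matrix to plot (seeded so it is reproducible)
val = np.random.default_rng(0).integers(low=15, high=150, size=(5, 5))

# Create the figure and axes with Matplotlib, then let Seaborn draw into them
fig, ax = plt.subplots(figsize=(6, 4))
ax = sb.heatmap(data=val, ax=ax)

# The returned object is a Matplotlib Axes, so Matplotlib methods apply directly
ax.set_title("Random values")
fig.tight_layout()
plt.show()
```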
Heatmaps and their use
Heatmaps are 2D graphical representations of values arranged in matrix form. The Seaborn Python package lets data analysts create annotated heatmaps. Each value is mapped to a color through a colormap, so higher values, or higher activity, appear as more intense shades (for example, deeper reds or blues, depending on the colormap). To draw a heatmap, import the Seaborn library and call the seaborn.heatmap() function. Heatmaps are useful for showing the magnitude, variance, strength, and concentration of data, for revealing patterns and the intensity of activity, and for spotting anomalies.
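To make the value-to-color idea concrete, the sketch below reproduces the basic mapping with Matplotlib: each value is rescaled to [0, 1] between the minimum and maximum, then looked up in a colormap. It illustrates the concept only and is not Seaborn's internal code; the "viridis" colormap and the tiny matrix are arbitrary choices.

```python
import numpy as np
from matplotlib import colormaps
from matplotlib.colors import Normalize

# A tiny matrix in which the value 40 appears twice
val = np.array([[10, 40],
                [40, 90]])

# Linearly rescale values so that the minimum maps to 0.0 and the maximum to 1.0
norm = Normalize(vmin=val.min(), vmax=val.max())

# Look each normalized value up in a colormap; the result holds one RGBA color per cell
cmap = colormaps["viridis"]
rgba = cmap(norm(val))

print(norm(val))   # normalized values
print(rgba[0, 1])  # color of the first 40
print(rgba[1, 0])  # color of the second 40, the same as the first
```

Because both cells holding 40 normalize to the same number, they receive exactly the same color.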
The syntax is:
seaborn.heatmap(data, *, vmin=None, vmax=None, cmap=None, center=None, annot=None, fmt='.2g', annot_kws=None, linewidths=0, linecolor='white', cbar=True, ...)
Here are the parameters:
· data: A rectangular (2D) dataset that can be coerced into an ndarray. If a pandas DataFrame is passed, its index and column labels are used to label the rows and columns.
· vmin, vmax: Values that anchor the colormap. If they are not given, they are inferred from the data.
· cmap: The mapping from data values to color space, given as a Matplotlib colormap name or object.
· center: The value at which to center the colormap when plotting divergent data.
· annot: If True, writes the data value in each cell. An array with the same shape as the data can be passed instead to annotate the cells with other values.
· fmt: The string formatting code used for the annotations (for example, "d" for integers).
· linewidths: The width of the lines dividing the cells of the heatmap.
· linecolor: The color of the lines dividing the cells.
· cbar: Whether to draw a color bar.
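The sketch below combines several of these parameters in one call so their effects can be compared side by side. The data, the limits vmin=0 and vmax=10, and the "viridis" colormap are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb

# 3 x 4 matrix holding the whole numbers 0..11
val = np.arange(12).reshape(3, 4)

ax = sb.heatmap(
    data=val,
    vmin=0, vmax=10,     # anchor the colormap; 11 lies above vmax and gets the top color
    cmap="viridis",      # data-to-color mapping
    annot=True,          # write each value in its cell
    fmt="d",             # format annotations as integers
    linewidths=1,        # width of the lines between cells
    linecolor="white",   # color of those lines
    cbar=True,           # draw the color bar
)
plt.show()
```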
Program
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
# generate a 12 x 12 matrix of random integers between 15 and 149
val = np.random.randint(low=15, high=150, size=(12, 12))
print("Here is the data to be plotted in matrix form:\n")
print(val)
# plotting the heatmap
heatm = sb.heatmap(data=val)
# display the figure
plt.show()
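If the matrix is wrapped in a pandas DataFrame, Seaborn takes the tick labels from the DataFrame's index and columns. The sketch below also uses a seeded NumPy generator so the "random" data is identical on every run; the row and column names and the seed are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sb

# Seeded generator so the random matrix is the same on every run
rng = np.random.default_rng(seed=42)
val = rng.integers(low=15, high=150, size=(3, 4))

# Wrap the matrix in a DataFrame; Seaborn uses the index and columns as tick labels
df = pd.DataFrame(val,
                  index=["row A", "row B", "row C"],
                  columns=["Q1", "Q2", "Q3", "Q4"])

ax = sb.heatmap(data=df)
plt.show()
```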
Customizing Heatmaps
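i. Colors: Colors are the most critical and appealing part of a visualization chart. To plot the heatmap in shades of a single color, set cmap to a single-hue colormap such as "Blues". Any other Matplotlib colormap name, such as the qualitative "tab20", can be passed the same way: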
heatm = sb.heatmap(data=val, cmap="Blues")
heatm = sb.heatmap(data=val, cmap="tab20")
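Besides named colormaps, Seaborn can build a colormap from a palette: light_palette(..., as_cmap=True) returns a single-hue Matplotlib colormap, and appending "_r" to a Matplotlib colormap name reverses it. A short sketch; the color "seagreen" and the side-by-side layout are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb

val = np.random.randint(low=15, high=150, size=(12, 12))

# Single-hue colormap running from a light tint to "seagreen"
green_cmap = sb.light_palette("seagreen", as_cmap=True)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
sb.heatmap(data=val, cmap=green_cmap, ax=ax1)
# "_r" reverses a named Matplotlib colormap, so high values become light
sb.heatmap(data=val, cmap="Blues_r", ax=ax2)
plt.show()
```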
ii. Labeling: A data analyst can also customize the heatmap through the tick labels on the x- and y-axis. Replacing the default numeric tick labels with meaningful names, such as month names, makes the chart better suited to a presentation.
val = np.random.randint(low=15, high=150, size=(12, 12))
# month names to use as x-axis tick labels
xtick = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# plotting the heatmap with month labels and no y-axis tick labels
heatm = sb.heatmap(data=val, xticklabels=xtick,
                   yticklabels=False)
# display the figure
plt.show()
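Axis titles and tick-label rotation can be added afterwards with ordinary Matplotlib calls on the returned Axes. A minimal sketch; the row labels "Item 1" to "Item 12" and the axis titles are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb

val = np.random.randint(low=15, high=150, size=(12, 12))
xtick = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# Hypothetical row labels for illustration
ytick = [f"Item {i + 1}" for i in range(12)]

ax = sb.heatmap(data=val, xticklabels=xtick, yticklabels=ytick)

# Axis titles and label rotation are plain Matplotlib calls
ax.set_xlabel("Month")
ax.set_ylabel("Item")
ax.tick_params(axis="x", labelrotation=45)
ax.tick_params(axis="y", labelrotation=0)
plt.show()
```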
iii. Centering the Heatmap: When plotting divergent data, pass the center parameter with the value that should sit at the middle of the colormap. Values above and below it are then drawn from opposite halves of the colormap.
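# value to place at the middle of the colormap (here, the median of the data)
center = np.median(val)
heatm = sb.heatmap(data=val,
                   cmap="Blues",
                   center=center)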
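For divergent data it often helps to make the color scale symmetric about the midpoint, so that equal distances above and below it get equally strong colors. One way, sketched here under the assumption that the data are deviations from the mean (so the midpoint is 0), is to pass matching vmin and vmax together with a diverging colormap such as "coolwarm".

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb

val = np.random.randint(low=15, high=150, size=(12, 12))

# Deviations from the mean: negative and positive values around 0
dev = val - val.mean()

# Largest absolute deviation, used for symmetric color limits
lim = np.abs(dev).max()

ax = sb.heatmap(data=dev, cmap="coolwarm", center=0, vmin=-lim, vmax=lim)
plt.show()
```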
iv. Customized lines: Data analysts can change the thickness and color of the lines that separate the cells. To do so, pass values to the linewidths and linecolor parameters.
heatm = sb.heatmap(data=val,
                   cmap="Blues",
                   linewidths=2.5,
                   linecolor="green")
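Lines can also be drawn around a single cell, for example to highlight the largest value. In a Seaborn heatmap, the cell in row r and column c spans x from c to c + 1 and y from r to r + 1 in data coordinates, so a Matplotlib Rectangle placed there outlines it. A sketch; the red border and its width are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb
from matplotlib.patches import Rectangle

val = np.random.randint(low=15, high=150, size=(12, 12))

ax = sb.heatmap(data=val, cmap="Blues", linewidths=0.5, linecolor="white")

# Locate the largest value as a (row, column) pair
row, col = np.unravel_index(np.argmax(val), val.shape)

# Outline that cell: lower-left corner at (col, row), width and height of one cell
ax.add_patch(Rectangle((col, row), 1, 1, fill=False, edgecolor="red", linewidth=3))
plt.show()
```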
v. Disable color bars and remove labels: To disable the color bar, set the cbar parameter to False. To remove the tick labels, set the xticklabels and yticklabels parameters to False.
heatm = sb.heatmap(data=val,
                   cbar=False,
                   xticklabels=False,
                   yticklabels=False)
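Turning off the color bar is also useful when several heatmaps share one scale. In the sketch below, two panels use the same vmin and vmax; the first draws no color bar, and the second draws its color bar into a narrow third axes passed through cbar_ax. The random data and the width ratios are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb

rng = np.random.default_rng(0)
a = rng.random((6, 6))
b = rng.random((6, 6))

# Two heatmap panels plus a narrow third axes reserved for the color bar
fig, (ax1, ax2, cax) = plt.subplots(1, 3, figsize=(11, 4),
                                    gridspec_kw={"width_ratios": [1, 1, 0.05]})

# Same vmin/vmax on both panels so one color bar describes both
sb.heatmap(data=a, vmin=0, vmax=1, cbar=False, ax=ax1)
sb.heatmap(data=b, vmin=0, vmax=1, cbar_ax=cax, ax=ax2)
plt.show()
```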
Correlation Matrix
A correlation matrix is a table that shows the correlation coefficient between every pair of variables in the data. Because the correlation of A with B equals the correlation of B with A, the matrix is symmetric, so half of it is redundant. Seaborn's heatmap supports masking through its mask parameter, which hides every cell where the mask is True, and a NumPy boolean array is a convenient way to build the mask.
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
# 10 variables with 100 random observations each (one variable per row)
data = np.random.randint(low=15, high=150, size=(10, 100))
# 10 x 10 correlation matrix between the variables
corr = np.corrcoef(data)
# boolean mask: True on and above the diagonal, i.e. the redundant half
mask = np.triu(np.ones_like(corr, dtype=bool))
print("Here is the mask in matrix form:\n")
print(mask)
# plotting the heatmap; cells where the mask is True are left blank
heatm = sb.heatmap(data=corr, mask=mask)
# display the figure
plt.show()
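When the data already lives in a pandas DataFrame, DataFrame.corr() computes the correlation matrix directly. Annotating the visible half and fixing vmin=-1, vmax=1, and center=0 makes the sign and strength of each correlation easy to read. A minimal sketch; the column names, the random data, and the correlation induced between "a" and "b" are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sb

# Hypothetical data: 100 observations of four variables
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])
df["b"] += df["a"]  # make b correlated with a

corr = df.corr()  # 4 x 4 Pearson correlation matrix

# True on and above the diagonal: those cells repeat the lower triangle
mask = np.triu(np.ones_like(corr, dtype=bool))

ax = sb.heatmap(corr, mask=mask, annot=True, fmt=".2f",
                vmin=-1, vmax=1, center=0, cmap="coolwarm", square=True)
plt.show()
```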
Annotated Heatmaps
Annotated heatmaps are another important form of heatmap: they write additional information, usually the data value itself, inside each cell. This lets readers see exact values while still comparing multiple metrics across the rows and columns of the grid.
import matplotlib.pyplot as plt
import seaborn as sb
sb.set_theme()
# flights is one of Seaborn's example datasets (load_dataset downloads it, so an internet connection is needed)
flights_val = sb.load_dataset("flights")
# reshape into a month x year table of passenger counts
flights = flights_val.pivot(index="month", columns="year", values="passengers")
# annotated heatmap that shows the numeric value in each cell
f, ax = plt.subplots(figsize=(9, 6))
sb.heatmap(flights, annot=True, cmap="tab10", fmt="d", linewidths=.5, ax=ax)
plt.show()
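The annot parameter also accepts an array of the same shape as the data, which lets each cell show custom text instead of its value. Since the labels are already strings, fmt is set to an empty string. A sketch; the "high"/"low" tags and the threshold of 5 are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb

val = np.array([[3, 7],
                [1, 9]])

# Hypothetical labels: each value followed by a "high"/"low" tag
labels = np.array([[f"{v}\n{'high' if v >= 5 else 'low'}" for v in row] for row in val])

# annot takes an array with the same shape as the data; fmt="" because labels are strings
ax = sb.heatmap(data=val, annot=labels, fmt="", cmap="Blues")
plt.show()
```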
Conclusion
Heatmaps are well suited to density-based visual analysis. Scatter plots are an alternative, but they become hard to read when there is a lot of data: as the number of points grows, the points start to overlap, and that is where heatmaps become beneficial.
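To turn an overplotted scatter into a heatmap, the points can be counted in a grid of bins with np.histogram2d and the counts plotted as a heatmap. A minimal sketch; the 100,000 correlated normal points and the 30 x 30 grid are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb

# 100,000 hypothetical points; as a scatter plot they would merge into one blob
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x + rng.normal(scale=0.5, size=100_000)

# Count points in a 30 x 30 grid of bins
counts, x_edges, y_edges = np.histogram2d(x, y, bins=30)

# histogram2d puts x along the rows, so transpose to put y on the rows
ax = sb.heatmap(counts.T, cmap="viridis", xticklabels=False, yticklabels=False)
ax.invert_yaxis()  # put low y values at the bottom, like a scatter plot
plt.show()
```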