kokobob.com

Understanding Matplotlib: A Beginner's Guide to Effective Data Visualization

Do you find Matplotlib perplexing? If you're new to it, your confusion may stem from not having familiarized yourself with its specific terms and functions. If you think that might be true, keep reading! This guide is brief and will help clarify things.

Matplotlib is the leading open-source library for creating plots in Python. It allows users to generate both quick, straightforward charts and intricate, detailed figures where every element can be customized. Given its long-standing popularity, you can find a wealth of resources and sample code to assist you.

However, like many powerful tools, Matplotlib can be "syntactically tedious," as one author noted. While basic plots can be straightforward, complexity escalates quickly. Although platforms like the Matplotlib gallery offer useful code snippets, you may still find yourself puzzled if you need something slightly different from what's available.

Many users resort to copying and modifying existing code until they achieve a satisfactory result. As one user remarked, "Every time I use Matplotlib, it feels like the first time!"

Fortunately, you can significantly reduce this frustration by investing time in learning some fundamental aspects of the library. This article will focus on the terminology and plotting methods that often create confusion. With this knowledge, you might start to see Matplotlib as a valuable asset rather than a daunting task.

What's Causing the Confusion?

Based on my experience with Matplotlib, here are three common sources of confusion:

  1. The unusual terminology used for plots.
  2. The existence of two plotting styles: the pyplot approach and the object-oriented style.
  3. Similar but differently named methods for manipulating plots in these two styles.

Let's examine each of these issues.

The Structure of a Plot

To grasp Matplotlib better, it's crucial to understand the terminology associated with plotting. Let's break down a plot and its components.

In Matplotlib, plots reside within a Figure object, which serves as a blank canvas for all plot elements. This object not only provides the canvas for drawing but also manages aspects like plot size, aspect ratio, spacing between multiple plots, and the ability to save the plot as an image. The leftmost square in the accompanying diagram represents a Figure object.

The actual plots, or figures as we commonly refer to them, are represented by the Axes class, which is at the center of the diagram. This class encompasses most figure elements, including lines, polygons, markers (points), text, titles, and methods for interacting with them. It also establishes the coordinate system. A Figure can contain multiple Axes objects, but each Axes object belongs to only one Figure.

It's essential to distinguish the Axes object from the Axis element, which refers to the numerical values along the x- or y-axis of a plot, including tick marks, labels, and limits. All these elements are part of the Axes class.

Each component in the diagram exists within a hierarchical structure. The lowest level includes elements like each axis, tick marks, labels, and the curve (Line2D). At the highest level is the Figure object, which contains all lower-level components.

Since a Figure can hold multiple Axes objects, you can have several Axes pointing to a single Figure. A common scenario is subplots, where one Figure canvas contains two or more distinct plots.
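As a rough sketch of this hierarchy, the snippet below builds one Figure holding two Axes and then inspects how the pieces point at each other. The variable names (ax_top, ax_bottom, line) are illustrative choices, not part of the original article.

```python
import matplotlib.pyplot as plt
from matplotlib.axis import XAxis
from matplotlib.lines import Line2D

# One Figure (the canvas) holding two Axes (the individual plots).
fig, (ax_top, ax_bottom) = plt.subplots(2, 1)
(line,) = ax_top.plot([1, 2, 3])

print(len(fig.axes))                    # how many Axes the Figure contains
print(ax_top.figure is fig)             # each Axes belongs to exactly one Figure
print(isinstance(ax_top.xaxis, XAxis))  # the Axis element lives inside the Axes
print(isinstance(line, Line2D))         # the plotted curve is a Line2D object
```

Note the spelling: ax_top is an Axes (a whole plot), while ax_top.xaxis is an Axis (the ticks, labels, and limits along one direction).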

The Pyplot and Object-Oriented Styles

Matplotlib offers two main interfaces for plotting. The first, known as the pyplot approach, relies on the pyplot module to automatically create and manage Figure and Axes objects, which you then manipulate using pyplot methods. This approach is designed primarily for single plots and minimizes the amount of code you need to write. Its API resembles MATLAB's, which makes it convenient for quick, interactive tasks.

Here's a simple example:

import matplotlib.pyplot as plt
import numpy as np

data = np.arange(5, 10)  # y-values 5, 6, 7, 8, 9
plt.plot(data)           # pyplot creates the Figure and Axes for you

This plot required just one line of code, with the pyplot module making all decisions, including line type, color, thickness, axis ranges, and font styles. It even assumed corresponding x-values for each y-value, starting from 0 and increasing by 1.
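To see the x-values that pyplot filled in, you can inspect the Line2D object that plot() returns. This is a small sketch; the variable name line is only illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(5, 10)
(line,) = plt.plot(data)  # plot() returns a list of Line2D objects

print(line.get_xdata())   # the x-values pyplot generated: 0 through 4
print(line.get_ydata())   # the y-values we passed in: 5 through 9
```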

In contrast, the object-oriented style requires you to explicitly create Figure and Axes objects and then call methods on those objects. This approach grants you greater control over customizing plots and managing multiple plots in larger programs. It also simplifies interactions with other libraries when you create an Axes object first.

import matplotlib.pyplot as plt
import numpy as np

data = np.arange(5, 10)
fig, ax = plt.subplots()  # explicitly create the Figure and Axes
ax.plot(data)             # call the plotting method on the Axes

Both approaches yield identical results. You can identify the object-oriented style by the line:

fig, ax = plt.subplots()

The plt.subplots() method generates a Figure instance and a set of subplots. If you don’t specify the number of subplots, it returns a single Axes object by default. If you request more than one subplot, it returns a NumPy array of Axes objects.

Since two objects are returned, you need to unpack the results into two variables, conventionally named fig and ax. In the pyplot approach, these entities are created automatically behind the scenes.
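The following sketch checks what plt.subplots() hands back and confirms that pyplot's implicit "current" objects are the same ones you just unpacked. plt.gcf() and plt.gca() are the pyplot calls for "get current figure" and "get current Axes".

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

fig, ax = plt.subplots()

print(isinstance(fig, Figure))  # the canvas
print(isinstance(ax, Axes))     # the single plot area
print(plt.gcf() is fig)         # pyplot tracks this Figure as the current one
print(plt.gca() is ax)          # ...and this Axes as the current one
```

In other words, the two styles operate on the same underlying objects; the object-oriented style just gives you names for them.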

In the following sections, we will explore both styles. However, Matplotlib documentation recommends choosing one method for consistency, particularly for complex plots or scripts intended for reuse in larger projects.

It’s worth noting that one reason beginners find Matplotlib daunting is that they often encounter a mix of both approaches in existing code, especially on platforms like Stack Overflow. To mitigate confusion, I recommend familiarizing yourself with both approaches, enabling you to make an informed choice for your own work and understand legacy code or tutorials you may encounter.

Using the Pyplot Approach

In the previous section, we created a plot using pyplot with a single line of code:

plt.plot(data)

Notably, we didn’t explicitly refer to Figure or Axes objects, as pyplot handled these behind the scenes. Additionally, we didn’t specify which elements to display, including ticks and values along the axes. Instead, Matplotlib made intelligent choices based on the data.

The plot() method generates line charts, scatter() produces scatter plots, bar() creates bar charts, hist() generates histograms, and pie() makes pie charts, among others. Examples of all these can be found in the Matplotlib plot types index.

The automatic nature of the pyplot plot creation methods is beneficial for quick data exploration, but the resulting plots can often be too basic for presentations or reports. For instance, the default settings for methods like plt.plot() assume you want the axis limits to fit the data range closely (in the earlier example, a y-axis running from about 5 to 9 rather than from 0 to 10).

Moreover, it assumes you do not want a legend, title, or axis labels, and that lines and markers should be blue. However, pyplot offers several methods to enhance charts with titles, axis labels, grid backgrounds, and more, which we will explore next.
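As a quick check of these defaults, the sketch below plots the earlier data and then asks the Axes what it chose. It uses the object-oriented calls only to read the values back; the plot itself is the same one pyplot would produce.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(5, 10))

x_limits = ax.get_xlim()  # fitted to the generated x-values, plus a small margin
y_limits = ax.get_ylim()  # fitted to the data (5 to 9), plus a small margin
print(x_limits, y_limits)

print(repr(ax.get_title()))   # empty string: no title by default
print(repr(ax.get_xlabel()))  # empty string: no axis label by default
print(ax.get_legend())        # None: no legend by default
```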

Creating and Modifying Plots with Pyplot Methods

Despite being considered simpler than the object-oriented style, pyplot can still produce intricate plots. Let's use some pyplot methods to create a more advanced plot.

A catenary is the curve formed by a chain suspended from both ends, commonly seen in nature and architecture, such as a square sail under wind or the Gateway Arch in St. Louis. You can create a catenary with the following code, where cosh(x) denotes the hyperbolic cosine of the x values.

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5, 5, 0.1)
y = np.cosh(x)  # hyperbolic cosine of each x value

plt.title('A Catenary')
plt.xlabel('Horizontal Distance')
plt.ylabel('Height')
plt.xlim(-8, 8)
plt.ylim(0, 60)
plt.grid()
plt.plot(x, y, lw=3, color='firebrick')

While the code is somewhat verbose, it remains logical and easy to follow, with all plotting steps calling methods on plt.

In Matplotlib, elements displayed on a Figure canvas, such as titles, legends, or lines, are referred to as Artist objects. Standard graphical elements like rectangles, circles, and text are called primitive Artists. The objects that house these primitives, such as Figure, Axes, and Axis objects, are termed container Artists.
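One way to see the Artist relationship is to test a few objects against the Artist base class. This is only a sketch; the objects checked here (a Figure, an Axes, an Axis, a line, and a title) are a small illustrative sample.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1], [0, 1])  # a primitive Artist (Line2D)
title = ax.set_title("Demo")       # a primitive Artist (Text)

# Containers (Figure, Axes, Axis) and primitives (line, text) are all Artists.
for obj in (fig, ax, ax.xaxis, line, title):
    print(type(obj).__name__, isinstance(obj, Artist))
```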

Some frequently used pyplot methods for creating plots and working with Artists are listed in the following tables. For a complete list, visit the Matplotlib pyplot summary page. Clicking method names in this online list will lead you to detailed information about parameters and example applications. To learn more about Artists, visit the Matplotlib artist’s page.

Keep in mind that the code examples in these tables represent simple cases. Most methods accept numerous arguments, allowing you to fine-tune your plots regarding properties like font style, line widths, colors, rotation angles, and more.

Working with Subplots

Thus far, we've dealt with single figures, but there will be times when you want to compare two plots side by side or group several charts into a summary view. For such situations, Matplotlib provides the subplot() method. To illustrate this, let's start by generating data for two different sine waves:

time = np.arange(-12.5, 12.5, 0.1)
amplitude = np.sin(time)
amplitude_halved = np.sin(time) / 2

One way to compare these waveforms is to plot them within the same Axes object:

plt.plot(time, amplitude, c='k', label='sine1')
plt.plot(time, amplitude_halved, c='firebrick', ls='--', label='sine2')
plt.legend()

By default, the two curves would be plotted in different colors (blue and orange). We overrode this by specifying black ('k') and "firebrick" red. We also changed the line style using the ls parameter; otherwise, both lines would have been solid. (For a list of the available marker and line style characters, see the Matplotlib documentation for plot().)

If you're comparing several curves, a single plot can become cluttered and hard to interpret. In such cases, you can create separate stacked plots using the subplot() method. The following diagram illustrates the syntax for this method, where four subplots (Axes) are arranged within a single Figure container.

The subplots will be organized in a grid, and the first two arguments passed to the subplot() method define the grid's dimensions. The first argument indicates the number of rows, the second indicates the number of columns, and the third specifies the index of the active subplot (highlighted in gray in the diagram).

The active subplot is the one you are currently plotting in when you call a method like plot() or scatter(). Unlike most Python constructs, the first index is 1 instead of 0.

Pyplot keeps track of the "current figure" and the "current Axes" so that it knows where to draw. For example, when you call plt.plot(), pyplot draws on the current Axes, creating a Figure and an Axes first if none exist. When using multiple subplots, the index argument tells pyplot which subplot becomes the current Axes.

For convenience, you don't need to use commas with the subplot() arguments. For instance, plt.subplot(223) is equivalent to plt.subplot(2, 2, 3), although the former may be less readable.
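The sketch below checks both points: that the three-digit shorthand selects the same subplot as the comma-separated form, and that calling subplot() changes which Axes pyplot treats as current. The names ax_a, ax_b, and ax_c are illustrative.

```python
import matplotlib.pyplot as plt

plt.figure()
ax_a = plt.subplot(2, 2, 3)  # 2 rows, 2 columns, third position (lower left)
ax_b = plt.subplot(223)      # shorthand for the same subplot

print(ax_a is ax_b)          # both calls refer to the same Axes
print(plt.gca() is ax_a)     # which is now the current Axes

ax_c = plt.subplot(2, 2, 1)  # switch to the first position (upper left)
print(plt.gca() is ax_c)     # the current Axes has changed
```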

Now, let's plot our sine waves as two separate stacked plots. We will invoke the subplot() method and adjust its active subplot argument to change the current subplot. For each active subplot, the plot() method will display the data specific to that subplot, as follows:

plt.subplot(2, 1, 1)  # top subplot becomes active
plt.plot(time, amplitude, label='sine1')
plt.legend(loc='upper right')

plt.subplot(2, 1, 2)  # bottom subplot becomes active
plt.ylim(-1, 1)
plt.plot(time, amplitude_halved, label='sine2')
plt.legend(loc='best')

Note that if you don’t specify the y-limits on the second plot, pyplot will automatically scale each subplot to fit its own data, so the two curves appear identical. By manually setting the scale on the second subplot using the ylim() method, it becomes clear that the second sine wave has half the amplitude of the first.
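To make the autoscaling effect concrete, this sketch records the limits pyplot picks for each subplot and then copies the first subplot's limits onto the second, as an alternative to hard-coding (-1, 1). The names full_limits and auto_limits are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

time = np.arange(-12.5, 12.5, 0.1)

plt.figure()
plt.subplot(2, 1, 1)
plt.plot(time, np.sin(time))
full_limits = plt.ylim()  # called without arguments, ylim() returns the limits

plt.subplot(2, 1, 2)
plt.plot(time, np.sin(time) / 2)
auto_limits = plt.ylim()  # autoscaled to the smaller wave

print(full_limits, auto_limits)

plt.ylim(full_limits)     # reuse the first subplot's scale for a fair comparison
```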

This overview covers some basic syntax for the pyplot approach. Next, let's turn to the object-oriented style.

Using the Object-Oriented Style

The object-oriented plotting method typically requires a bit more code than the previously described pyplot approach, but it gives you access to more of Matplotlib's capabilities. By explicitly creating Figure and Axes objects, you can exert greater control over your plots, better comprehend interactions with other libraries, and create plots with multiple x- and y-axes.

Creating and Modifying Plots with the Object-Oriented Style

To become familiar with the object-oriented style, let's recreate the catenary plot from earlier in the article. To showcase the enhanced functionality of this style, we’ll position the y-axis in the center of the plot.

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5, 5, 0.1)
y = np.cosh(x)

fig, ax = plt.subplots()

The code above creates a single empty figure. To customize the plot, call the Axes object’s set() method and pass keyword arguments for the title, axis labels, and limits. The set() method is a convenience function that allows you to set multiple properties at once instead of calling specific methods for each.

ax.set(title='A Catenary',
       xlabel='Horizontal Distance',
       ylabel='Height',
       xlim=(-8, 8.1),
       ylim=(0, 60))

Next, we will reposition the y-axis to the center of the chart. In Matplotlib, spines are the lines connecting the axis tick marks and delineating the area containing the plotted data.

By default, spines are placed around a plot with tick marks and labels positioned along the left and bottom margins. However, spines can be relocated to arbitrary positions. With the object-oriented style, we can achieve this by using the set_position() method of the Spine class.

The following code moves the left (y) axis to the 0 value on the x-axis, then sets the line width to 2 to make the axis stand out from the background grid we will use later.

ax.spines.left.set_position('zero')
ax.spines.left.set_linewidth(2)

The next line disables the right boundary of the plot by setting its color to none:

ax.spines.right.set_color('none')

The next three lines repeat this process for the bottom and top spines, respectively:

ax.spines.bottom.set_position('zero')
ax.spines.bottom.set_linewidth(2)
ax.spines.top.set_color('none')

To complete the plot, we add a background grid and call the plot() method, passing in the x and y data, setting the line width to 3 and the color to firebrick:

ax.grid()
ax.plot(x, y, lw=3, color='firebrick')
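The spine calls above can also be written more compactly. The sketch below is an alternative, not the article's method: it uses dictionary-style access (ax.spines['left']), a loop over the two spines being moved, and set_visible(False) in place of set_color('none') to hide the top and right spines. Selecting several spines with a list, like the dotted access used above, assumes Matplotlib 3.4 or later.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-5, 5, 0.1)
y = np.cosh(x)

fig, ax = plt.subplots()
ax.set(title='A Catenary', xlim=(-8, 8.1), ylim=(0, 60))

# Move the left and bottom spines to the zero position and thicken them.
for side in ('left', 'bottom'):
    ax.spines[side].set_position('zero')
    ax.spines[side].set_linewidth(2)

# Hide the top and right spines in a single call.
ax.spines[['top', 'right']].set_visible(False)

ax.grid()
ax.plot(x, y, lw=3, color='firebrick')
```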

If you skip the code related to the spines, you can replicate the pyplot version of this figure with a similar amount of code. Thus, the verbosity of the object-oriented style is largely due to its enhanced capabilities, which users typically take advantage of.

Methods in the pyplot approach have equivalents in the object-oriented style, though the method names may differ. For instance, title() in pyplot becomes set_title(), and xticks() becomes set_xticks(). This is one reason why it’s advisable to choose one approach and stick with it.
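As a side-by-side sketch of this naming difference, the snippet below builds the same small plot twice, once in each style, and confirms the results match. The title, label, and tick values are arbitrary placeholders.

```python
import matplotlib.pyplot as plt

# Pyplot approach: functions act on the current Axes.
plt.figure()
plt.plot([1, 2, 3])
plt.title('Demo')
plt.xlabel('x')
plt.xticks([0, 1, 2])
pyplot_ax = plt.gca()  # grab the implicit Axes so we can compare

# Object-oriented style: set_* methods called on an explicit Axes.
fig, ax = plt.subplots()
ax.plot([1, 2, 3])
ax.set_title('Demo')
ax.set_xlabel('x')
ax.set_xticks([0, 1, 2])

print(pyplot_ax.get_title() == ax.get_title())
print(pyplot_ax.get_xlabel() == ax.get_xlabel())
print(list(pyplot_ax.get_xticks()) == list(ax.get_xticks()))
```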

Some common methods for creating object-oriented plots are listed in the following table. You can find additional methods for generating box plots, violin plots, and more in the plot types index and the Matplotlib gallery.

Common methods for manipulating Figure and Axes objects are provided in the subsequent tables. In many instances, these operate similarly to the pyplot methods, although the method names may differ.

As previously mentioned, the code examples in these tables illustrate simple cases. Most methods accept a variety of arguments, enabling you to fine-tune your plots concerning properties such as font style, line widths, colors, rotation angles, and more. To explore further, refer to the Matplotlib documentation.

Working with Subplots

Like the pyplot approach, the object-oriented style supports subplots. Although multiple methods exist for assigning subplots to Figure and Axes objects, the plt.subplots() method is convenient. When you request more than one subplot, it returns a NumPy array of Axes objects, allowing you to select subplots using standard indexing, such as axs[0, 0], or to unpack them into individual names, such as ax1. Another advantage is that you can preview the subplots’ layout before plotting any data.

It's important to note that the object-oriented method for creating subplots is spelled subplots, while the pyplot approach uses subplot. Remember this distinction by associating the simpler technique, pyplot, with the shorter name.

Calling plt.subplots() without arguments generates a single empty plot and returns a single Axes object (displayed as AxesSubplot in older versions of Matplotlib).

fig, ax = plt.subplots()

Generating multiple subplots functions similarly to the plt.subplot() method, but without an index argument for the active subplot. The first argument specifies the number of rows, while the second denotes the number of columns. By convention, multiple Axes are given the plural name, axs, to avoid confusion with a single Axes instance.

Passing the plt.subplots() method two arguments controls the number of subplots and their layout. The following code creates a 2×2 grid of subplots, storing four Axes objects in the axs variable as a 2×2 NumPy array.

fig, axs = plt.subplots(2, 2)
axs  # in a notebook, this displays the array of Axes objects
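Because axs is an ordinary NumPy array, the usual array tools apply. The sketch below checks its shape and uses the array's flat iterator to visit every subplot in turn; the plotted values are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, axs = plt.subplots(2, 2)

print(isinstance(axs, np.ndarray))  # the Axes come back in a NumPy array
print(axs.shape)                    # one entry per grid cell: 2 rows, 2 columns

# axs.flat walks the grid row by row, which is handy for loops.
for number, ax in enumerate(axs.flat, start=1):
    ax.plot([0, number])
    ax.set_title(f'Subplot {number}')
```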

To activate a subplot, you can use its index. In this example, we plot on the second subplot in the first row:

fig, axs = plt.subplots(2, 2)
axs[0, 1].plot([1, 2, 3])

Alternatively, you can name and store the subplots individually using tuple unpacking for multiple Axes. Each row of subplots will need to be in its own tuple, allowing you to select a subplot by name instead of relying on a less-readable index:

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
ax3.plot([1, 2, 3])

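In both the pyplot approach and the object-oriented style, you can adjust the padding around the subplots so that titles and tick labels no longer crowd their neighbors. In the object-oriented style, you do this by invoking the tight_layout() method on the Figure object: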

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
ax3.plot([1, 2, 3])
fig.tight_layout()

Now the subplots appear less cramped. For the pyplot approach, you would use plt.tight_layout().

Alternative Methods for Creating Subplots

Regardless of the method you choose, there are higher-level alternatives available to help you divide a figure into a grid of subareas. This is particularly useful for creating subplots with varying widths and heights, resulting in multipanel displays that are ideal for summarizing information in presentations and reports.

Among these paneling tools are Matplotlib's GridSpec class (in the matplotlib.gridspec module) and the subplot_mosaic() method.
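The article's original figure isn't reproduced here. As a minimal sketch of the idea, the snippet below builds the same three-panel layout (one wide panel above two narrower ones) with each tool. The layout and the panel names 'wide', 'left', and 'right' are illustrative assumptions, not taken from the article.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# GridSpec: define a 2x2 grid, then let the top Axes span both columns.
fig = plt.figure()
grid = GridSpec(2, 2, figure=fig)
ax_wide = fig.add_subplot(grid[0, :])   # top row, all columns
ax_left = fig.add_subplot(grid[1, 0])   # bottom row, first column
ax_right = fig.add_subplot(grid[1, 1])  # bottom row, second column
ax_wide.plot([1, 2, 3])

# subplot_mosaic: describe the same layout with labels; repeating a
# label makes that panel span the repeated cells.
fig2, panels = plt.subplot_mosaic([['wide', 'wide'],
                                   ['left', 'right']])
panels['wide'].plot([1, 2, 3])
```

subplot_mosaic() returns the Axes in a dictionary keyed by label, which can be easier to read than grid indices when a layout has many panels.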

For more information on these tools, check out the Matplotlib documentation on Working with Multiple Figures and Axes and Arranging Multiple Axes in a Figure, as well as my GridSpec tutorial article in Better Programming.

Summary

If you plot data in Python, understanding Matplotlib is invaluable. To fully grasp Matplotlib, you need to become familiar with its primary plotting terminology and the two plotting styles.

The Figure object serves as the canvas for your plots, controlling aspects such as plot size, aspect ratio, spacing between subplots, supertitles, and the ability to save the plot.

Figure objects can contain multiple Axes objects, which form what we typically consider figures or diagrams. These include lines, points, text, titles, and the plot's coordinate system. Multiple Axes objects within the same Figure constitute subplots.

Within an Axes object, the Axis element represents numerical values along the x, y, or z axes, including tick marks, labels, and limits.

Matplotlib provides two main approaches for plotting. The pyplot method is designed for quick and easy plotting, suitable for exploratory data analysis. This approach creates Figure and Axes objects automatically and makes most decisions (like axis scaling, colors, line styles, etc.) for you, though you can override them if needed.

For more complex plots, such as those for reports and presentations, the object-oriented style explicitly creates Figure and Axes objects (traditionally labeled as fig and ax). This style provides greater control and makes it easier to understand interactions with other Python libraries.

If you're unfamiliar with these two plotting paradigms, it's easy to become confused when using code snippets found online, such as on Stack Overflow. Because the methods used in each approach are similar but distinct, the Matplotlib developers advise choosing one method and applying it consistently.

Citations

1. “Python Tools for Scientists: An Introduction to Using Anaconda, JupyterLab, and Python’s Scientific Libraries” (No Starch Press, 2023) by Lee Vaughan.

Thanks!

Thank you for reading! Follow me for more Quick Success Data Science articles in the future. If you're interested in learning more about Matplotlib and other Python plotting libraries, check out my book, Python Tools for Scientists, available online and at fine bookstores like Barnes and Noble.
