Matplotlib Basics | Complete Guide to Data Visualization in Python

Key points

A hands-on Matplotlib guide: line, bar, histogram, and scatter plots, multi-panel figures, export settings, and workflow tips for clean charts.

Introduction

“Turn data into pictures”

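Matplotlib is the de facto standard plotting library for Python. It is a third-party package rather than part of the standard library, and tools such as Seaborn and pandas' .plot() methods are built on top of it.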


1. Matplotlib basics

Installation

pip install matplotlib
pip install pandas  # only needed for the sales example in section 7

First plot

import matplotlib.pyplot as plt

# Data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Plot
plt.plot(x, y)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Line plot')
plt.show()
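The same chart can also be built with the object-oriented interface, where you hold explicit Figure and Axes handles; sections 6 and 7 use this style. A minimal sketch, assuming the non-interactive Agg backend so it runs without a display (drop the `matplotlib.use` line and call `plt.show()` to get a window):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it for an interactive window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create the Figure and Axes explicitly instead of relying on pyplot's "current axes"
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("X")       # pyplot's plt.xlabel becomes ax.set_xlabel
ax.set_ylabel("Y")
ax.set_title("Line plot")
```

The method names mostly mirror pyplot with a `set_` prefix, which makes it easy to move between the two styles.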

2. Line plots

Basic line plot

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

plt.plot(x, y1, label='sin(x)', color='blue', linestyle='-')
plt.plot(x, y2, label='cos(x)', color='red', linestyle='--')

plt.xlabel('x')
plt.ylabel('y')
plt.title('Trigonometric functions')
plt.legend()
plt.grid(True)
plt.show()
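Matplotlib also accepts a compact format string as the third positional argument: `"b-"` is shorthand for a blue solid line and `"r--"` for a red dashed one. A minimal sketch of the same chart in that style, assuming the Agg backend so it runs headless:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), "b-", label="sin(x)")   # same as color='blue', linestyle='-'
ax.plot(x, np.cos(x), "r--", label="cos(x)")  # same as color='red', linestyle='--'
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Trigonometric functions")
ax.legend()
ax.grid(True)

# Inspect what was drawn: (linestyle, color as lowercase hex) for each line
styles = [(line.get_linestyle(), to_hex(line.get_color())) for line in ax.lines]
```

Keyword arguments are more readable in shared code; format strings are handy for quick exploration.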

3. Bar charts

Vertical bars

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
values = [25, 40, 30, 55]

plt.bar(categories, values, color='skyblue')
plt.xlabel('Category')
plt.ylabel('Value')
plt.title('Bar chart')
plt.show()

Horizontal bars

plt.barh(categories, values, color='lightgreen')
plt.xlabel('Value')
plt.ylabel('Category')
plt.title('Horizontal bar chart')
plt.show()
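Bar charts are often easier to read with the value printed on each bar. A minimal sketch using `Axes.bar_label` (available in Matplotlib 3.4 and later), assuming the Agg backend so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
values = [25, 40, 30, 55]

fig, ax = plt.subplots()
bars = ax.bar(categories, values, color="skyblue")  # returns a BarContainer of rectangles
labels = ax.bar_label(bars)  # one text label per bar, formatted with '%g' by default
ax.set_xlabel("Category")
ax.set_ylabel("Value")
ax.set_title("Bar chart with value labels")
```

For horizontal bars, pass the container returned by `ax.barh` to `bar_label` in the same way.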

4. Histogram

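import numpy as np
import matplotlib.pyplot as plt

# 1,000 samples from the standard normal distribution (mean 0, std 1)
data = np.random.randn(1000)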

plt.hist(data, bins=30, color='purple', alpha=0.7, edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
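`hist` also returns the numbers behind the picture: the count in each bin, the bin edges, and the drawn bar patches. A minimal sketch, assuming a seeded NumPy generator for reproducible data and the Agg backend:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)          # seeded so the sample is reproducible
data = rng.standard_normal(1000)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(data, bins=30, color="purple", alpha=0.7, edgecolor="black")
# 30 bins -> 30 counts, 31 edges, 30 bar patches

# density=True rescales the bars so their total area is 1 (a probability density)
fig2, ax2 = plt.subplots()
density, edges2, _ = ax2.hist(data, bins=30, density=True)
area = (density * np.diff(edges2)).sum()
```

`density=True` is useful when comparing samples of different sizes on the same axes.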

5. Scatter plot

import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(50)
y = np.random.rand(50)
colors = np.random.rand(50)
sizes = 1000 * np.random.rand(50)

plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='viridis')
plt.colorbar()
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter plot')
plt.show()
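With the object-oriented API, `colorbar` needs to be told which mapped artist it describes; the scatter collection returned by `ax.scatter` serves that role. A minimal sketch, assuming a seeded generator and the Agg backend (the colorbar label text is illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x, y = rng.random(50), rng.random(50)
colors = rng.random(50)          # mapped through the colormap
sizes = 1000 * rng.random(50)    # marker area in points^2

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap="viridis")
cbar = fig.colorbar(sc, ax=ax, label="color value")  # colorbar tied to this scatter
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Scatter plot")
```

The colorbar gets its own Axes, so the figure ends up with two.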

6. Multiple subplots

import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Top-left
axes[0, 0].plot([1, 2, 3], [1, 4, 9])
axes[0, 0].set_title('Line')

# Top-right
axes[0, 1].bar(['A', 'B', 'C'], [3, 7, 5])
axes[0, 1].set_title('Bar')

# Bottom-left
axes[1, 0].hist(np.random.randn(100), bins=20)
axes[1, 0].set_title('Histogram')

# Bottom-right
axes[1, 1].scatter(np.random.rand(50), np.random.rand(50))
axes[1, 1].set_title('Scatter')

plt.tight_layout()
plt.show()
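For a grid, `plt.subplots` returns the Axes as a 2-D NumPy array, so `axes.flat` lets you fill every panel in one loop instead of indexing each cell. A minimal sketch with random-walk data, assuming the Agg backend; `sharex=True` makes the panels share one x-axis range:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True)

# axes.flat walks the grid row by row: top-left, top-right, bottom-left, bottom-right
for i, ax in enumerate(axes.flat):
    ax.plot(rng.standard_normal(20).cumsum())  # a short random walk per panel
    ax.set_title(f"Series {i + 1}")

fig.tight_layout()
```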

7. Practical example

Sales visualization

import matplotlib.pyplot as plt
import pandas as pd

# Data
sales_data = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'sales': [150, 180, 165, 220, 250, 240],
    'profit': [30, 45, 35, 60, 75, 70]
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Revenue trend
ax1.plot(sales_data['month'], sales_data['sales'],
         marker='o', linewidth=2, markersize=8)
ax1.set_title('Monthly sales trend', fontsize=14, fontweight='bold')
ax1.set_xlabel('Month')
ax1.set_ylabel('Sales (10,000 KRW)')
ax1.grid(True, alpha=0.3)

# Profit bars
ax2.bar(sales_data['month'], sales_data['profit'],
        color='green', alpha=0.7)
ax2.set_title('Monthly profit', fontsize=14, fontweight='bold')
ax2.set_xlabel('Month')
ax2.set_ylabel('Profit (10,000 KRW)')

plt.tight_layout()
plt.savefig('sales_report.png', dpi=300)
plt.show()

Practical tips

Styling

# Built-in style ('seaborn-v0_8' is the name since Matplotlib 3.6; list all with plt.style.available)
plt.style.use('seaborn-v0_8')

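# Font for CJK (e.g. Korean) labels; OS-specific, this example is for Windows
plt.rcParams['font.family'] = 'Malgun Gothic'
# Draw the minus sign as an ASCII hyphen, since many CJK fonts lack the Unicode minus glyph
plt.rcParams['axes.unicode_minus'] = False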

# Figure size
plt.figure(figsize=(10, 6))

# Color palette
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1']
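A palette like this is most useful as the color cycle, so successive lines pick it up automatically. A minimal sketch that sets it on one Axes with `set_prop_cycle`, assuming the Agg backend; to apply it to every new Axes instead, assign `cycler(color=colors)` to `plt.rcParams['axes.prop_cycle']`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

colors = ['#FF6B6B', '#4ECDC4', '#45B7D1']

fig, ax = plt.subplots(figsize=(10, 6))
ax.set_prop_cycle(color=colors)  # successive plot() calls take colors in this order
for k in range(3):
    ax.plot([0, 1], [k, k + 1], label=f"series {k + 1}")
ax.legend()

# Normalize to lowercase hex to check which colors were actually used
used = [to_hex(line.get_color()) for line in ax.lines]
```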

Going deeper

Scatter with regression line and residual histogram (runnable)

Synthetic data from NumPy, a linear fit with np.polyfit, and a histogram of the residuals; the figure is also written to disk with savefig for use in reports.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 80)
y = 2.5 * x + 1.0 + rng.normal(0, 1.8, size=x.shape)

coef = np.polyfit(x, y, 1)
y_hat = np.poly1d(coef)(x)
residuals = y - y_hat

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4), constrained_layout=True)
ax1.scatter(x, y, alpha=0.7, label="Observed")
ax1.plot(x, y_hat, color="crimson", linewidth=2, label="Linear fit")
ax1.set_title("Scatter with regression line")
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.hist(residuals, bins=18, color="steelblue", edgecolor="black", alpha=0.85)
ax2.set_title("Residual distribution")
ax2.grid(True, alpha=0.3)

fig.savefig("regression_residuals.png", dpi=200)
plt.show()
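Before plotting, it can help to check the fit numerically and build a legend label from the coefficients. A minimal sketch; `fit_line` is a hypothetical helper, not part of NumPy or Matplotlib. Note that `np.polyfit` returns coefficients highest power first, and that least-squares residuals for a line with an intercept average to zero up to rounding:

```python
import numpy as np

def fit_line(x, y):
    """Hypothetical helper: least-squares line fit returning (slope, intercept, residuals)."""
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest power first
    residuals = y - (slope * x + intercept)
    return slope, intercept, residuals

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 80)
y = 2.5 * x + 1.0 + rng.normal(0, 1.8, size=x.shape)

slope, intercept, residuals = fit_line(x, y)
# A label string you could pass as label=... to the fitted-line plot call
label = f"y = {slope:.2f}x + {intercept:.2f}"
```

If the mean residual is far from zero, the model was fitted without an intercept or the data were misaligned.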

Common mistakes

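  • Calling plt.savefig() after plt.show() can write a blank image: once the window is closed, pyplot has usually discarded the figure, so savefig saves a new, empty one. Save first, then show, or call fig.savefig on the figure object.
  • Non-Latin labels (e.g. Korean) render as empty boxes when the active font lacks those glyphs; set font.family to a font installed on your OS.
  • Mixing the object-oriented API (fig, ax) with pyplot's implicit "current axes" state, so artists land on the wrong axes. Prefer calling methods on an explicit ax.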
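The first and third pitfalls share one remedy: keep an explicit figure handle, save through it, and only then close it. A minimal sketch that saves to an in-memory buffer (an assumption made so the example leaves no files behind) with the Agg backend:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()        # explicit handles: no reliance on "current axes"
ax.plot([1, 2, 3], [1, 4, 9])
ax.set_title("Saved before closing")

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)  # save first...
plt.close(fig)                           # ...then release the figure
png_bytes = buf.getvalue()
```

Passing a filename such as `"chart.png"` instead of `buf` works the same way.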

Caveats

  • Journals often prefer vector formats (PDF/SVG); for raster, set dpi explicitly.
  • Prefer colorblind-friendly colormaps such as viridis or cividis.
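Both caveats come down to arguments of `savefig` and `cmap`. A minimal sketch that renders a small image with the cividis colormap and writes vector (PDF, SVG) and raster (PNG at an explicit 300 dpi) versions; writing to in-memory buffers instead of files is an assumption for the example, and the array data are illustrative:

```python
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

data = np.arange(100).reshape(10, 10)
fig, ax = plt.subplots()
im = ax.imshow(data, cmap="cividis")  # perceptually uniform, colorblind-friendly
fig.colorbar(im, ax=ax)

outputs = {}
for fmt in ("pdf", "svg"):            # vector formats scale without pixelation
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt)
    outputs[fmt] = buf.getvalue()

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=300)  # raster: choose the resolution explicitly
outputs["png"] = buf.getvalue()
plt.close(fig)
```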

In production

  • Share styles via matplotlibrc or plt.style.context.
  • For batch reports, call plt.close(fig) to free memory.
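A batch job can combine both points: apply a shared style with `plt.style.context` and close each figure right after saving it, so memory does not grow with the number of charts. A minimal sketch; `render_report`, the region names, and the temporary output directory are hypothetical, and the built-in "ggplot" style stands in for your own `.mplstyle` file:

```python
import tempfile
from pathlib import Path
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, typical for batch jobs
import matplotlib.pyplot as plt

def render_report(series_by_name, out_dir, style="ggplot"):
    """Hypothetical helper: one PNG per series, shared style, figures closed after saving."""
    paths = []
    with plt.style.context(style):  # the style applies only inside this block
        for name, values in series_by_name.items():
            fig, ax = plt.subplots(figsize=(6, 4))
            ax.plot(values, marker="o")
            ax.set_title(name)
            path = Path(out_dir) / f"{name}.png"
            fig.savefig(path, dpi=150)
            plt.close(fig)          # free this figure before building the next one
            paths.append(path)
    return paths

rng = np.random.default_rng(3)
data = {f"region_{i}": rng.integers(100, 300, size=6) for i in range(5)}
with tempfile.TemporaryDirectory() as tmp:
    paths = render_report(data, tmp)
    all_written = all(p.exists() and p.stat().st_size > 0 for p in paths)
open_figures = plt.get_fignums()
```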

Alternatives

Tool       | Best for
-----------|-------------------------------------------------------
Matplotlib | Fine-grained control, papers, non-interactive backends
Seaborn    | Quick statistical plots
Plotly     | Interactive web charts


Summary

Key takeaways

  1. Matplotlib: core Python plotting library
  2. pyplot: convenient stateful API
  3. Chart types: line, bar, histogram, scatter
  4. Subplots: grid of axes
  5. Style: colors, fonts, layout

Next steps