본문으로 건너뛰기
Previous
Next
Pandas Basics | Complete Guide to Python Data Analysis

Pandas Basics | Complete Guide to Python Data Analysis

Pandas Basics | Complete Guide to Python Data Analysis

이 글의 핵심

Pandas basics for Python: Series and DataFrame, I/O, filtering, groupby, merge, and real-world patterns. Practical tutorial with examples for CSV, Excel, and analytics workflows.

Introduction

“The essential tool for data analysis”

Pandas is Python’s core library for data analysis and manipulation.

1. Pandas basics

Installation

pip install pandas

Series and DataFrame

import pandas as pd
# Series (1D)
s = pd.Series([1, 2, 3, 4, 5])
print(s)
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# DataFrame (2D)
df = pd.DataFrame({
    'name': ['철수', '영희', '민수'],
    'age': [25, 30, 28],
    'city': ['서울', '부산', '대구']
})
print(df)
#   name  age city
# 0  철수   25  서울
# 1  영희   30  부산
# 2  민수   28  대구

2. Reading and writing data

CSV files

# Read CSV
df = pd.read_csv('data.csv')
# Write CSV
df.to_csv('output.csv', index=False, encoding='utf-8-sig')
# Read selected columns only
df = pd.read_csv('data.csv', usecols=['name', 'age'])
# Custom delimiter
df = pd.read_csv('data.tsv', sep='\t')

Excel files

# Read Excel
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
# Write Excel
df.to_excel('output.xlsx', index=False)

3. Exploring data

Basic information

# First 5 rows
print(df.head())
# Last 5 rows
print(df.tail())
# Summary info
print(df.info())
# Descriptive statistics
print(df.describe())
# Shape
print(df.shape)  # (rows, columns)
# Column names
print(df.columns)

4. Selecting data

Column selection

# Single column
ages = df['age']
# Multiple columns
subset = df[['name', 'age']]
# Boolean filtering
adults = df[df['age'] >= 30]
seoul_users = df[df['city'] == '서울']
# Compound conditions
result = df[(df['age'] >= 25) & (df['city'] == '서울')]

Row selection

# By position (iloc)
first_row = df.iloc[0]
first_three = df.iloc[:3]
# By label (loc)
df_indexed = df.set_index('name')
chul = df_indexed.loc['철수']
# Conditional rows
young = df[df['age'] < 30]

5. Transforming data

Add and drop columns

# Add columns
df['country'] = '한국'
df['birth_year'] = 2026 - df['age']
# Drop one column
df = df.drop('country', axis=1)
# Drop multiple columns
df = df.drop(['col1', 'col2'], axis=1)

Changing values

# Update specific values
df.loc[df['name'] == '철수', 'age'] = 26
# Apply a function
df['age_group'] = df['age'].apply(
    lambda x: '청년' if x < 30 else '중년'
)
# Apply to multiple columns
df[['age', 'birth_year']] = df[['age', 'birth_year']].applymap(int)

6. Grouping and aggregation

groupby

# Average age by city
city_avg = df.groupby('city')['age'].mean()
print(city_avg)
# Multiple aggregations
result = df.groupby('city').agg({
    'age': ['mean', 'min', 'max'],
    'name': 'count'
})
print(result)

7. Combining data

merge (joins)

# Two DataFrames
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['철수', '영희', '민수']
})
df2 = pd.DataFrame({
    'id': [1, 2, 4],
    'score': [85, 90, 88]
})
# Inner join
merged = pd.merge(df1, df2, on='id', how='inner')
print(merged)
#    id name  score
# 0   1  철수     85
# 1   2  영희     90
# Left join
merged = pd.merge(df1, df2, on='id', how='left')

concat (stacking)

# Concatenate vertically
df_concat = pd.concat([df1, df2], ignore_index=True)
# Concatenate horizontally
df_concat = pd.concat([df1, df2], axis=1)

8. Practical example

Sales analysis

import pandas as pd
# Load data
sales = pd.read_csv('sales.csv')
# Basic info
print(f"총 {len(sales)}건의 판매")
print(f"총 매출: {sales['amount'].sum():,}원")
# Monthly revenue
sales['date'] = pd.to_datetime(sales['date'])
sales['month'] = sales['date'].dt.month
monthly_sales = sales.groupby('month')['amount'].sum()
print(monthly_sales)
# Top 10 products by revenue
top_products = sales.groupby('product')['amount'].sum().sort_values(ascending=False).head(10)
print(top_products)
# Save results
monthly_sales.to_csv('monthly_report.csv')

Practical tips

Pandas tips

# ✅ Memory optimization
df = pd.read_csv('large.csv', dtype={'id': 'int32'})
# ✅ Missing values
df = df.dropna()   # drop rows with NaN
df = df.fillna(0)  # fill with 0
# ✅ Duplicates
df = df.drop_duplicates()
# ✅ Method chaining
result = (df
    .query('age >= 25')
    .groupby('city')['age']
    .mean()
    .sort_values(ascending=False)
)

Summary

Key takeaways

  1. Pandas: the standard library for tabular data analysis
  2. DataFrame: 2D labeled table
  3. I/O: CSV, Excel, JSON
  4. Operations: filtering, grouping, merging
  5. Aggregation: sum, mean, count

Next steps

  • [NumPy](/en/blog/python-series-17-numpy/
  • [Matplotlib visualization](/en/blog/python-series-18-matplotlib/

  • [Hands-on data analysis with Pandas](/en/blog/python-series-20-data-analysis/

자주 묻는 질문 (FAQ)

Q. 이 내용을 실무에서 언제 쓰나요?

A. Pandas basics for Python: Series and DataFrame, I/O, filtering, groupby, merge, and real-world patterns. Practical tutor… 실무에서는 위 본문의 예제와 선택 가이드를 참고해 적용하면 됩니다.

Q. 선행으로 읽으면 좋은 글은?

A. 각 글 하단의 이전 글 또는 관련 글 링크를 따라가면 순서대로 배울 수 있습니다. Python 시리즈 목차에서 전체 흐름을 확인할 수 있습니다.

Q. 더 깊이 공부하려면?

A. cppreference와 해당 라이브러리 공식 문서를 참고하세요. 글 말미의 참고 자료 링크도 활용하면 좋습니다.


같이 보면 좋은 글 (내부 링크)

이 주제와 연결되는 다른 글입니다.

  • [NumPy Basics](/en/blog/python-series-17-numpy/
  • [Matplotlib Basics](/en/blog/python-series-18-matplotlib/
  • [Hands-on Data Analysis with Python | Pandas Workflows](/en/blog/python-series-20-data-analysis/

이 글에서 다루는 키워드 (관련 검색어)

Python, Pandas, Data Analysis, DataFrame, CSV, Data Wrangling 등으로 검색하시면 이 글이 도움이 됩니다.