Pandas Basics | Complete Guide to Python Data Analysis
이 글의 핵심
A hands-on guide to Pandas basics: Python’s core library for tabular data—load, explore, transform, and aggregate with clear examples.
Introduction
“The essential tool for data analysis”
Pandas is Python’s core library for data analysis and manipulation.
1. Pandas basics
Installation
pip install pandas
Series and DataFrame
import pandas as pd
# Series (1D)
s = pd.Series([1, 2, 3, 4, 5])
print(s)
# 0 1
# 1 2
# 2 3
# 3 4
# 4 5
# DataFrame (2D)
df = pd.DataFrame({
'name': ['철수', '영희', '민수'],
'age': [25, 30, 28],
'city': ['서울', '부산', '대구']
})
print(df)
# name age city
# 0 철수 25 서울
# 1 영희 30 부산
# 2 민수 28 대구
2. Reading and writing data
CSV files
# Read CSV
df = pd.read_csv('data.csv')
# Write CSV
df.to_csv('output.csv', index=False, encoding='utf-8-sig')
# Read selected columns only
df = pd.read_csv('data.csv', usecols=['name', 'age'])
# Custom delimiter
df = pd.read_csv('data.tsv', sep='\t')
Excel files
# Read Excel
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
# Write Excel
df.to_excel('output.xlsx', index=False)
3. Exploring data
Basic information
# First 5 rows
print(df.head())
# Last 5 rows
print(df.tail())
# Summary info
print(df.info())
# Descriptive statistics
print(df.describe())
# Shape
print(df.shape) # (rows, columns)
# Column names
print(df.columns)
4. Selecting data
Column selection
# Single column
ages = df['age']
# Multiple columns
subset = df[['name', 'age']]
# Boolean filtering
adults = df[df['age'] >= 30]
seoul_users = df[df['city'] == '서울']
# Compound conditions
result = df[(df['age'] >= 25) & (df['city'] == '서울')]
Row selection
# By position (iloc)
first_row = df.iloc[0]
first_three = df.iloc[:3]
# By label (loc)
df_indexed = df.set_index('name')
chul = df_indexed.loc['철수']
# Conditional rows
young = df[df['age'] < 30]
5. Transforming data
Add and drop columns
# Add columns
df['country'] = '한국'
df['birth_year'] = 2026 - df['age']
# Drop one column
df = df.drop('country', axis=1)
# Drop multiple columns
df = df.drop(['col1', 'col2'], axis=1)
Changing values
# Update specific values
df.loc[df['name'] == '철수', 'age'] = 26
# Apply a function
df['age_group'] = df['age'].apply(
lambda x: '청년' if x < 30 else '중년'
)
# Apply to multiple columns
df[['age', 'birth_year']] = df[['age', 'birth_year']].applymap(int)
6. Grouping and aggregation
groupby
# Average age by city
city_avg = df.groupby('city')['age'].mean()
print(city_avg)
# Multiple aggregations
result = df.groupby('city').agg({
'age': ['mean', 'min', 'max'],
'name': 'count'
})
print(result)
7. Combining data
merge (joins)
# Two DataFrames
df1 = pd.DataFrame({
'id': [1, 2, 3],
'name': ['철수', '영희', '민수']
})
df2 = pd.DataFrame({
'id': [1, 2, 4],
'score': [85, 90, 88]
})
# Inner join
merged = pd.merge(df1, df2, on='id', how='inner')
print(merged)
# id name score
# 0 1 철수 85
# 1 2 영희 90
# Left join
merged = pd.merge(df1, df2, on='id', how='left')
concat (stacking)
# Concatenate vertically
df_concat = pd.concat([df1, df2], ignore_index=True)
# Concatenate horizontally
df_concat = pd.concat([df1, df2], axis=1)
8. Practical example
Sales analysis
import pandas as pd
# Load data
sales = pd.read_csv('sales.csv')
# Basic info
print(f"총 {len(sales)}건의 판매")
print(f"총 매출: {sales['amount'].sum():,}원")
# Monthly revenue
sales['date'] = pd.to_datetime(sales['date'])
sales['month'] = sales['date'].dt.month
monthly_sales = sales.groupby('month')['amount'].sum()
print(monthly_sales)
# Top 10 products by revenue
top_products = sales.groupby('product')['amount'].sum().sort_values(ascending=False).head(10)
print(top_products)
# Save results
monthly_sales.to_csv('monthly_report.csv')
Practical tips
Pandas tips
# ✅ Memory optimization
df = pd.read_csv('large.csv', dtype={'id': 'int32'})
# ✅ Missing values
df = df.dropna() # drop rows with NaN
df = df.fillna(0) # fill with 0
# ✅ Duplicates
df = df.drop_duplicates()
# ✅ Method chaining
result = (df
.query('age >= 25')
.groupby('city')['age']
.mean()
.sort_values(ascending=False)
)
Summary
Key takeaways
- Pandas: the standard library for tabular data analysis
- DataFrame: 2D labeled table
- I/O: CSV, Excel, JSON
- Operations: filtering, grouping, merging
- Aggregation: sum, mean, count