Transform data into compelling visual stories that drive decisions and insights
Imagine trying to understand 10,000 rows of sales data in a spreadsheet versus seeing a beautiful chart that instantly shows trends, patterns, and outliers. That's the power of visualization - turning numbers into insights that anyone can understand!
Data Visualization is the graphical representation of data. It uses visual elements like charts, graphs, and maps to help people understand patterns, trends, and insights in data that would be hard to see in raw numbers.
Example: Sales Data
Numbers: 100, 150, 120, 180, 200, 250...
Chart: 📈 Upward trend clearly visible!
Instant Understanding
A clear chart can reveal a trend in seconds that a table of raw numbers hides
Find Patterns
Spot trends, outliers, and correlations easily
Tell Stories
Communicate insights to non-technical audiences
Drive Decisions
Make data-driven choices with confidence
Matplotlib is the foundation of Python visualization - like the Swiss Army knife of plotting! It gives you complete control over every element of your charts. Think of it as the "low-level" tool that other libraries build upon.
Line plots show trends over time. Perfect for stock prices, temperature changes, or any continuous data that changes over time.
# Import libraries
import matplotlib.pyplot as plt
import numpy as np
# Create data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [15000, 18000, 22000, 25000, 30000, 35000]
# Create line plot
plt.figure(figsize=(10, 6)) # Set size
plt.plot(months, sales, marker='o', linewidth=2, color='#8b5cf6')
plt.title('Monthly Sales Trend', fontsize=16, fontweight='bold')
plt.xlabel('Month', fontsize=12)
plt.ylabel('Sales ($)', fontsize=12)
plt.grid(True, alpha=0.3) # Add grid
plt.tight_layout() # Prevent label cutoff
plt.show()
# Scatter plot - show relationship between two variables
study_hours = [1, 2, 3, 4, 5, 6, 7, 8]
test_scores = [55, 60, 65, 70, 75, 85, 90, 95]
plt.figure(figsize=(8, 6))
plt.scatter(study_hours, test_scores, s=100, c='#8b5cf6', alpha=0.6)
plt.title('Study Hours vs Test Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Test Score')
plt.show()
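A scatter plot suggests a relationship, and a fitted line makes it explicit. The sketch below is one way to overlay a least-squares trend line on the study-hours data using np.polyfit. The helper name fit_trend_line is made up for this example, and the figure is built without calling plt.show() so the snippet also runs in scripts without a display.

```python
import numpy as np
import matplotlib.pyplot as plt

study_hours = [1, 2, 3, 4, 5, 6, 7, 8]
test_scores = [55, 60, 65, 70, 75, 85, 90, 95]


def fit_trend_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


slope, intercept = fit_trend_line(study_hours, test_scores)

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(study_hours, test_scores, s=100, c='#8b5cf6', alpha=0.6, label='Students')
# Evaluate the fitted line at each x value and draw it over the points
fitted = [slope * h + intercept for h in study_hours]
ax.plot(study_hours, fitted, color='red', linestyle='--',
        label=f'Trend: {slope:.1f} points per hour')
ax.set_title('Study Hours vs Test Scores')
ax.set_xlabel('Hours Studied')
ax.set_ylabel('Test Score')
ax.legend()
```

Call plt.show() afterwards to display the chart. With this sample data the slope comes out a little under 6 points per extra hour of study.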
# Bar chart - compare categories
products = ['Laptop', 'Phone', 'Tablet', 'Watch']
revenue = [45000, 65000, 30000, 20000]
plt.figure(figsize=(10, 6))
plt.bar(products, revenue, color='#8b5cf6', alpha=0.7)
plt.title('Product Revenue Comparison')
plt.xlabel('Product')
plt.ylabel('Revenue ($)')
plt.xticks(rotation=45) # Rotate labels
plt.show()
# Histogram - show distribution of data
ages = np.random.normal(35, 10, 1000) # 1000 ages, mean 35, std 10
plt.figure(figsize=(10, 6))
plt.hist(ages, bins=30, color='#8b5cf6', alpha=0.7, edgecolor='black')
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.axvline(ages.mean(), color='red', linestyle='--', label='Mean')
plt.legend()
plt.show()
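Use plt.style.use('seaborn-v0_8') at the start of your script to give Matplotlib plots a more modern, professional look! (The older style name 'seaborn' was deprecated in Matplotlib 3.6 and no longer works in recent releases.)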
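When several views belong together, Matplotlib's object-oriented interface (plt.subplots) can place them in one figure. The following is a minimal sketch that reuses the sample values from the examples above. The function name build_overview and the 2x2 layout are choices made for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def build_overview(seed=0):
    """Draw line, scatter, bar, and histogram panels in a single 2x2 figure."""
    rng = np.random.default_rng(seed)  # seeded so the histogram is reproducible
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))

    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
    sales = [15000, 18000, 22000, 25000, 30000, 35000]
    axes[0, 0].plot(months, sales, marker='o', color='#8b5cf6')
    axes[0, 0].set_title('Monthly Sales Trend')

    hours = [1, 2, 3, 4, 5, 6, 7, 8]
    scores = [55, 60, 65, 70, 75, 85, 90, 95]
    axes[0, 1].scatter(hours, scores, c='#8b5cf6', alpha=0.6)
    axes[0, 1].set_title('Study Hours vs Test Scores')

    products = ['Laptop', 'Phone', 'Tablet', 'Watch']
    revenue = [45000, 65000, 30000, 20000]
    axes[1, 0].bar(products, revenue, color='#8b5cf6', alpha=0.7)
    axes[1, 0].set_title('Product Revenue Comparison')

    ages = rng.normal(35, 10, 1000)
    axes[1, 1].hist(ages, bins=30, color='#8b5cf6', alpha=0.7, edgecolor='black')
    axes[1, 1].set_title('Age Distribution')

    fig.tight_layout()  # keep titles and tick labels from overlapping
    return fig


fig = build_overview()
```

Each entry in axes is an independent plotting area, so every panel can be styled separately. Call plt.show() or fig.savefig(...) to view or export the result.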
Seaborn is built on top of Matplotlib but makes beautiful statistical plots with less code! It's like Matplotlib with a designer's touch - perfect for exploring relationships in data.
Heatmaps show data as colors in a matrix. Perfect for correlation matrices, confusion matrices, or any grid-based data.
# Import Seaborn
import seaborn as sns
import pandas as pd
# Create sample data
data = pd.DataFrame({
    'Math': [85, 90, 78, 92, 88],
    'Science': [88, 85, 80, 95, 90],
    'English': [75, 80, 85, 88, 82]
})
# Create correlation heatmap
plt.figure(figsize=(8, 6))
corr = data.corr() # Calculate correlations
# Correlations range from -1 to 1, so use a diverging colormap centered on 0
sns.heatmap(corr, annot=True, cmap='RdBu_r', vmin=-1, vmax=1, center=0)
plt.title('Subject Correlation Heatmap')
plt.show()
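For readers curious what data.corr() computes: by default it returns the Pearson correlation coefficient for each pair of columns. The sketch below recomputes one entry by hand with NumPy and compares it with the pandas result. The helper pearson is written for this example only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'Math': [85, 90, 78, 92, 88],
    'Science': [88, 85, 80, 95, 90],
    'English': [75, 80, 85, 88, 82]
})


def pearson(x, y):
    """Pearson r: co-variation of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


manual = pearson(data['Math'], data['Science'])
from_pandas = data.corr().loc['Math', 'Science']
print(f"manual={manual:.4f}, pandas={from_pandas:.4f}")
```

The two numbers should agree. Values near 1 mean the subjects rise together, values near -1 mean one rises as the other falls, and values near 0 mean no linear relationship.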
# Pair plot - visualize all pairwise relationships
iris = sns.load_dataset('iris') # Load sample dataset (downloaded on first use, so it needs internet access)
sns.pairplot(iris, hue='species', palette='Set2')
plt.suptitle('Iris Dataset Pair Plot', y=1.02)
plt.show()
# Violin plot - shows distribution shape
tips = sns.load_dataset('tips')
plt.figure(figsize=(10, 6))
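# Seaborn 0.13+: assign hue when using a palette; legend=False hides the redundant legend
sns.violinplot(data=tips, x='day', y='total_bill', hue='day', palette='Purples', legend=False)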
plt.title('Total Bill Distribution by Day')
plt.show()
# Box plot - shows quartiles and outliers
plt.figure(figsize=(10, 6))
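# Seaborn 0.13+: assign hue when using a palette; legend=False hides the redundant legend
sns.boxplot(data=tips, x='day', y='total_bill', hue='day', palette='Set2', legend=False)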
plt.title('Total Bill Box Plot by Day')
plt.show()
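A box plot is a picture of a few summary numbers. As a rough sketch of what is drawn by default, the snippet below computes the quartiles and flags outliers with the common 1.5 x IQR rule, which is the default whisker setting in both Matplotlib and Seaborn. The function box_stats and the sample bills are invented for illustration.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Return quartiles, whisker fences, and the points lying outside the fences."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1  # interquartile range: the height of the box
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        'q1': float(q1), 'median': float(median), 'q3': float(q3),
        'lower_fence': float(lower_fence), 'upper_fence': float(upper_fence),
        'outliers': outliers.tolist(),
    }


# Hypothetical bill amounts; 95.0 is deliberately far from the rest
bills = [12.5, 14.0, 15.5, 16.0, 18.0, 19.5, 21.0, 22.5, 24.0, 95.0]
stats = box_stats(bills)
print(stats)
```

The box spans q1 to q3, the line inside it is the median, and any point beyond the fences is drawn as an individual outlier marker.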
Plotly creates interactive charts that users can zoom, pan, and hover over! Perfect for dashboards and web applications. Your charts come alive with interactivity!
# Import Plotly
import plotly.graph_objects as go
import plotly.express as px
# Create interactive line chart
dates = pd.date_range('2024-01-01', periods=30)
values = np.cumsum(np.random.randn(30)) + 100
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=dates, y=values,
    mode='lines+markers',
    name='Stock Price',
    line=dict(color='#8b5cf6', width=2)
))
fig.update_layout(
    title='Interactive Stock Price Chart',
    xaxis_title='Date',
    yaxis_title='Price ($)',
    hovermode='x unified'
)
fig.show()
# 3D scatter plot
df = px.data.iris()
fig = px.scatter_3d(df, x='sepal_length', y='sepal_width', z='petal_width',
                    color='species', size='petal_length',
                    title='3D Iris Dataset')
fig.show()
Plotly Express (px): High-level, quick plots with one line of code
Graph Objects (go): Low-level, complete control over every detail
Streamlit turns Python scripts into interactive web apps in minutes! No HTML, CSS, or JavaScript needed. Perfect for creating data dashboards and sharing your analysis with others.
# Install: pip install streamlit
# Run: streamlit run app.py
# app.py
import streamlit as st
import pandas as pd
import plotly.express as px
# Title and description
st.title('📊 Sales Dashboard')
st.write('Interactive sales analysis dashboard')
# Sidebar filters
year = st.sidebar.selectbox('Select Year', [2022, 2023, 2024])
region = st.sidebar.multiselect('Select Region', ['East', 'West', 'North', 'South'])
# Load and display data (sales.csv is expected to contain 'date' and 'revenue' columns)
df = pd.read_csv('sales.csv')
st.dataframe(df.head())
# Metrics
col1, col2, col3 = st.columns(3)
col1.metric('Total Revenue', '$1.2M', '+12%')
col2.metric('Orders', '1,234', '+5%')
col3.metric('Customers', '567', '+8%')
# Interactive chart
fig = px.line(df, x='date', y='revenue', title='Revenue Trend')
st.plotly_chart(fig, use_container_width=True)
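In the dashboard above, the year and region selections are collected but never applied to the data. One possible way to wire them in is a small pandas helper like the sketch below. It assumes the sales data has date, region, and revenue columns (the region column is an assumption, since sales.csv is not shown), and filter_sales is a name invented here.

```python
import pandas as pd


def filter_sales(df, year, regions):
    """Keep rows from the chosen year; an empty region list means 'all regions'."""
    dates = pd.to_datetime(df['date'])
    mask = dates.dt.year == year
    if regions:
        mask &= df['region'].isin(regions)
    return df[mask]


# Tiny stand-in for sales.csv so the helper can be tried without the real file
sample = pd.DataFrame({
    'date': ['2023-03-01', '2024-01-15', '2024-02-20', '2024-02-21'],
    'region': ['East', 'East', 'West', 'North'],
    'revenue': [100.0, 250.0, 300.0, 50.0],
})

filtered = filter_sales(sample, year=2024, regions=['East', 'West'])
total_revenue = filtered['revenue'].sum()
print(f"Rows: {len(filtered)}, revenue: ${total_revenue:,.2f}")
```

Inside the app, the filtered frame could feed both the table and the chart, for example st.dataframe(filter_sales(df, year, region)), and the st.metric values could be computed from it instead of being hard-coded.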
Choosing the right chart is crucial! The wrong chart can confuse your audience, while the right one makes insights crystal clear. Here's your decision guide:
Line Chart
Use for: Trends over time
Examples: Stock prices, temperature, website traffic
Best when: Showing continuous data with time on x-axis
Bar Chart
Use for: Comparing categories
Examples: Sales by product, survey responses
Best when: Comparing 3-10 categories
Pie Chart
Use for: Parts of a whole
Examples: Market share, budget breakdown
Best when: Showing 2-5 categories that sum to 100%
Scatter Plot
Use for: Relationships between variables
Examples: Height vs weight, price vs demand
Best when: Looking for correlations
Box Plot
Use for: Distribution and outliers
Examples: Salary ranges, test scores
Best when: Comparing distributions across groups
Heatmap
Use for: Matrix data with color intensity
Examples: Correlations, confusion matrix
Best when: Showing patterns in 2D data
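The guide above can also be written down as a small lookup table, which is handy as a checklist while exploring a new dataset. This is only an illustrative sketch: the goal keywords and the suggest_chart helper are invented here, and the chart names are inferred from the descriptions in the guide.

```python
# Map an analysis goal to the chart type suggested by the guide
CHART_GUIDE = {
    'trend': 'line chart',
    'comparison': 'bar chart',
    'composition': 'pie chart',
    'relationship': 'scatter plot',
    'distribution': 'box plot',
    'matrix': 'heatmap',
}


def suggest_chart(goal):
    """Return the suggested chart for a goal, or raise ValueError listing valid goals."""
    key = goal.strip().lower()
    if key not in CHART_GUIDE:
        options = ', '.join(sorted(CHART_GUIDE))
        raise ValueError(f"Unknown goal {goal!r}; choose one of: {options}")
    return CHART_GUIDE[key]


print(suggest_chart('Trend'))
```

Treat the result as a starting point rather than a rule: the number of categories and the audience still matter.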
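Colors aren't just decoration - they communicate meaning! But roughly 8% of men and 0.5% of women have some form of color vision deficiency, most often difficulty telling red from green. A chart that relies on red versus green alone may be hard or impossible for them to read!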
Sequential (for ordered data)
Light to dark of same color: Perfect for heatmaps, choropleth maps
sns.color_palette("Blues")
sns.color_palette("Purples")
Diverging (for data with meaningful midpoint)
Two colors meeting at middle: Perfect for showing positive/negative
sns.color_palette("RdBu") # Red-Blue
sns.color_palette("PiYG") # Pink-Green
Qualitative (for categories)
Distinct colors: Perfect for categorical data
sns.color_palette("Set2")
sns.color_palette("tab10")
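Seaborn also ships a qualitative palette designed with color vision deficiency in mind, named "colorblind". The sketch below pulls its colors as hex codes and applies them to a bar chart. The category names and values are placeholders.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Qualitative palette intended to stay distinguishable for color-blind readers
palette = sns.color_palette("colorblind")
hex_codes = palette.as_hex()

categories = ['A', 'B', 'C', 'D']   # placeholder category names
values = [4, 7, 3, 5]               # placeholder values

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(categories, values, color=hex_codes[:len(categories)])
ax.set_title('Bars Colored with the "colorblind" Palette')
```

Color alone should not carry the message. Direct labels, markers, or line styles give readers a second cue.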
Let's put everything together! We'll analyze a sales dataset and create a comprehensive visualization report using Matplotlib, Seaborn, and Plotly.
# Complete Visualization Project: Sales Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
plt.style.use('seaborn-v0_8')
# Step 1: Create sample sales data
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=365)
products = ['Laptop', 'Phone', 'Tablet', 'Watch', 'Headphones']
regions = ['East', 'West', 'North', 'South']
data = []
for date in dates:
    for _ in range(np.random.randint(5, 15)):
        data.append({
            'date': date,
            'product': np.random.choice(products),
            'region': np.random.choice(regions),
            'quantity': np.random.randint(1, 5),
            'price': np.random.uniform(50, 1500)
        })
df = pd.DataFrame(data)
df['revenue'] = df['quantity'] * df['price']
df['month'] = df['date'].dt.month
df['day_of_week'] = df['date'].dt.day_name()
# Step 2: Overview Statistics
print("=== SALES OVERVIEW ===")
print(f"Total Revenue: ${df['revenue'].sum():,.2f}")
print(f"Total Orders: {len(df):,}")
print(f"Average Order Value: ${df['revenue'].mean():,.2f}")
print(f"Date Range: {df['date'].min().date()} to {df['date'].max().date()}")
# Step 3: Revenue Trend Over Time (Matplotlib)
daily_revenue = df.groupby('date')['revenue'].sum()
monthly_revenue = df.groupby('month')['revenue'].sum()
plt.figure(figsize=(14, 6))
plt.plot(daily_revenue.index, daily_revenue.values, color='#8b5cf6', linewidth=1.5)
plt.title('Daily Revenue Trend 2024', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Revenue ($)', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('revenue_trend.png', dpi=300, bbox_inches='tight')
plt.show()
# Step 4: Product Performance (Seaborn)
product_stats = df.groupby('product').agg({
    'revenue': 'sum',
    'quantity': 'sum'
}).sort_values('revenue', ascending=False)
plt.figure(figsize=(10, 6))
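# Seaborn 0.13+: assign hue when using a palette; legend=False hides the redundant legend
sns.barplot(x=product_stats.index, y=product_stats['revenue'], hue=product_stats.index, palette='Purples_r', legend=False)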
plt.title('Revenue by Product', fontsize=16, fontweight='bold')
plt.xlabel('Product', fontsize=12)
plt.ylabel('Total Revenue ($)', fontsize=12)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# Step 5: Regional Analysis (Heatmap)
pivot_data = df.pivot_table(
    values='revenue',
    index='product',
    columns='region',
    aggfunc='sum'
)
plt.figure(figsize=(10, 6))
sns.heatmap(pivot_data, annot=True, fmt='.0f', cmap='Purples', cbar_kws={'label': 'Revenue ($)'})
plt.title('Revenue Heatmap: Product vs Region', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
# Step 6: Interactive Dashboard (Plotly)
fig = px.sunburst(
    df,
    path=['region', 'product'],
    values='revenue',
    title='Revenue Distribution: Region > Product',
    color='revenue',
    color_continuous_scale='Purples'
)
fig.update_layout(height=600)
fig.show()
# Step 7: Key Insights Summary
print("\n=== KEY INSIGHTS ===")
best_product = product_stats.index[0]
best_region = df.groupby('region')['revenue'].sum().idxmax()
best_day = df.groupby('day_of_week')['revenue'].sum().idxmax()
print(f"1. Top Product: {best_product} (${product_stats.loc[best_product, 'revenue']:,.2f})")
print(f"2. Best Region: {best_region}")
print(f"3. Best Day: {best_day}")
print(f"4. Growth Trend: {(monthly_revenue.iloc[-1] / monthly_revenue.iloc[0] - 1) * 100:.1f}% from Jan to Dec")
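Daily totals like the ones in Step 3 tend to be noisy, which can hide the underlying trend. A common remedy is to overlay a rolling mean. The sketch below uses a 7-day window on simulated daily revenue. The window size and the simulated numbers are assumptions for illustration, not part of the project data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
dates = pd.date_range('2024-01-01', periods=90)
# Simulated daily revenue: a gentle upward drift plus random noise
daily_revenue = pd.Series(10000 + 20 * np.arange(90) + rng.normal(0, 1500, 90),
                          index=dates)

# Mean of each day and the six days before it; the first six entries are NaN
rolling_avg = daily_revenue.rolling(window=7).mean()

fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(daily_revenue.index, daily_revenue.values, color='#8b5cf6',
        alpha=0.4, linewidth=1, label='Daily revenue')
ax.plot(rolling_avg.index, rolling_avg.values, color='#8b5cf6',
        linewidth=2, label='7-day rolling mean')
ax.set_title('Daily Revenue with 7-Day Rolling Mean')
ax.set_xlabel('Date')
ax.set_ylabel('Revenue ($)')
ax.legend()
```

In the project itself, the same call would be daily_revenue.rolling(window=7).mean() on the series built in Step 3. A 7-day window is a natural choice for sales data because it averages out weekday effects.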
You can now create stunning visualizations that tell compelling data stories! In the next module, we'll dive into Machine Learning Fundamentals - building models that learn from data and make predictions.