Module 6: Database Optimization
Master query optimization, indexing strategies, and performance tuning for high-scale applications.
Why Database Optimization Matters
A slow database is like traffic congestion - it affects everything. Database optimization ensures your application stays fast as data grows. A well-optimized query can be 100x faster than an unoptimized one!
⚡ Performance Impact:
- User Experience: Faster page loads, happier users
- Cost Savings: Less server resources needed
- Scalability: Handle more users with same hardware
- Reliability: Fewer timeouts and errors
- Competitive Edge: Speed is a feature
Query Optimization Techniques
1. Use EXPLAIN to Analyze Queries
EXPLAIN shows how the database executes your query - like an X-ray for SQL.
-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM users
WHERE email = 'john@example.com';
-- Look for:
-- Seq Scan vs Index Scan (a Seq Scan over a large table is a red flag)
-- Execution time
-- Rows scanned vs rows returned
2. Select Only What You Need
Don't use SELECT * - it wastes bandwidth and memory.
❌ Slow:
SELECT * FROM users;
✅ Fast:
SELECT id, username, email
FROM users;
3. Avoid N+1 Query Problem
Loading related data in a loop causes many queries. Use JOINs instead.
❌ N+1 Problem:
// 1 query for posts
posts = SELECT * FROM posts;
// N queries for authors
for each post:
SELECT * FROM users
WHERE id = post.user_id;
✅ Single Query:
SELECT
posts.*,
users.username
FROM posts
JOIN users
ON posts.user_id = users.id;
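In application code, the fix looks like the sketch below. It assumes a configured `pg` pool; the function names (`postsWithAuthors`, `postsWithAuthorsBatched`, `uniqueUserIds`) are illustrative, and the table/column names follow the example.

```javascript
// Two ways to avoid N+1, sketched against a configured `pg` pool.

// Option 1: one JOIN query, exactly as in the SQL above.
async function postsWithAuthors(pool) {
  const { rows } = await pool.query(
    `SELECT posts.*, users.username
       FROM posts
       JOIN users ON posts.user_id = users.id`
  );
  return rows;
}

// Option 2: two queries total (still not N+1) - fetch the posts,
// then every needed author in a single IN-style query.
function uniqueUserIds(posts) {
  return [...new Set(posts.map(p => p.user_id))];
}

async function postsWithAuthorsBatched(pool) {
  const { rows: posts } = await pool.query('SELECT * FROM posts');
  const { rows: users } = await pool.query(
    'SELECT id, username FROM users WHERE id = ANY($1)',
    [uniqueUserIds(posts)]
  );
  const byId = new Map(users.map(u => [u.id, u.username]));
  return posts.map(p => ({ ...p, username: byId.get(p.user_id) }));
}
```

Option 2 is what most ORMs do under the hood when you "eager load" a relation; it avoids duplicating post columns for every joined author row.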
4. Use LIMIT for Pagination
Don't load all records at once. Use pagination.
-- Page 1 (first 20 records)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;
-- Page 2 (next 20 records)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 20;
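Note that OFFSET still scans and discards all the skipped rows, so deep pages get slower. Below is a sketch of the offset arithmetic plus the common alternative, keyset (cursor) pagination; the helper names are illustrative.

```javascript
// OFFSET math for page-numbered pagination (pages start at 1).
function offsetFor(page, perPage) {
  return (page - 1) * perPage;
}

// Keyset pagination: remember the last created_at you returned and
// ask only for older rows. The skipped rows are never scanned.
const keysetPageQuery = `
  SELECT * FROM posts
  WHERE created_at < $1  -- cursor: last created_at of the previous page
  ORDER BY created_at DESC
  LIMIT $2`;

console.log(offsetFor(2, 20)); // → 20 (skip the first page of 20)
```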
5. Avoid Functions on Indexed Columns
Wrapping an indexed column in a function keeps the planner from using a plain index on that column (an index built on the expression itself, such as LOWER(email), is the exception).
❌ Can't Use Index:
SELECT * FROM users
WHERE LOWER(email) =
'john@example.com';
✅ Uses Index:
SELECT * FROM users
WHERE email =
'john@example.com';
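When you genuinely need case-insensitive matching, PostgreSQL lets you index the expression itself (e.g. `CREATE INDEX ON users (LOWER(email))`), or you can normalize at write time so the plain index keeps working. A minimal write-time sketch (`normalizeEmail` is a hypothetical helper):

```javascript
// Normalize the email once, before INSERT, so the plain index on
// users(email) keeps working for case-insensitive lookups.
function normalizeEmail(email) {
  return email.trim().toLowerCase();
}

console.log(normalizeEmail('  John@Example.COM ')); // → john@example.com
```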
Advanced Indexing Strategies
Indexes are like book indexes - they help find information quickly without reading everything. But too many indexes slow down writes.
Single Column Index
Index frequently queried columns.
-- Create index
CREATE INDEX idx_users_email ON users(email);
-- This query will use the index
SELECT * FROM users WHERE email = 'john@example.com';
Composite Index
Index multiple columns together. Order matters!
-- Create composite index
CREATE INDEX idx_posts_user_created
ON posts(user_id, created_at);
-- Can use index for:
WHERE user_id = 1
WHERE user_id = 1 AND created_at > '2024-01-01'
-- Cannot use index for:
WHERE created_at > '2024-01-01' -- Skips the leading column (user_id)!
Partial Index
Index only rows that match a condition - saves space.
-- Index only active users
CREATE INDEX idx_active_users
ON users(username)
WHERE is_active = true;
-- Index only unpublished posts
CREATE INDEX idx_draft_posts
ON posts(user_id)
WHERE published_at IS NULL;
Covering Index
Include all columns needed by a query in the index.
-- Create covering index
CREATE INDEX idx_posts_covering
ON posts(user_id, created_at)
INCLUDE (title, views);
-- This query only reads the index (fast!)
SELECT title, views FROM posts
WHERE user_id = 1
ORDER BY created_at DESC;
💡 Index Best Practices:
- Index columns used in WHERE, JOIN, ORDER BY
- Don't over-index - each index slows writes
- Monitor index usage and remove unused ones
- Consider index size vs benefit
- Rebuild indexes periodically
- Use UNIQUE indexes for constraints
Connection Pooling
Creating database connections is expensive. Connection pooling reuses connections like a taxi stand - taxis wait for passengers instead of driving away after each ride.
// Node.js with pg (PostgreSQL)
const { Pool } = require('pg');
const pool = new Pool({
host: 'localhost',
database: 'myapp',
max: 20, // Max connections
idleTimeoutMillis: 30000, // Close idle after 30s
connectionTimeoutMillis: 2000 // Wait 2s for connection
});
// Use pooled connection
const result = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);
🎯 Pool Size Guidelines:
- Formula: connections = ((core_count * 2) + effective_spindle_count)
- Typical: 10-20 connections for most apps
- Monitor: Watch for connection exhaustion
- Don't: Set pool size too high (wastes resources)
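The formula above can be turned into a quick sanity check. Treat it as a heuristic starting point, not a hard rule; `suggestedPoolSize` is an illustrative helper.

```javascript
// Heuristic from the guideline above:
// connections = (core_count * 2) + effective_spindle_count
function suggestedPoolSize(coreCount, spindleCount) {
  return coreCount * 2 + spindleCount;
}

console.log(suggestedPoolSize(4, 1)); // → 9 connections
```

On SSD-backed or cloud databases the spindle term is usually taken as 1; measure under load before settling on a number.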
Query Result Caching
Cache frequently accessed query results to avoid hitting the database.
// Cache with Redis (assumes a connected client such as ioredis, plus the pg pool from above)
async function getUser(userId) {
const cacheKey = `user:${userId}`;
// Try cache first
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
// Query database
const result = await pool.query(
'SELECT * FROM users WHERE id = $1',
[userId]
);
// Cache for 1 hour
await redis.setex(cacheKey, 3600, JSON.stringify(result.rows[0]));
return result.rows[0];
}
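The same cache-aside pattern works with any store. Here is a self-contained sketch with an in-memory Map standing in for Redis; `cached` and `loadUser` are hypothetical names, and `load` is whatever database call you would otherwise make.

```javascript
// Cache-aside with an in-memory Map and per-entry TTL, standing in
// for Redis so the whole pattern is visible and runnable.
const cache = new Map(); // key -> { value, expiresAt }

async function cached(key, ttlMs, load) {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit
  const value = await load(); // cache miss: run the real query
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Usage (loadUser is a hypothetical database call):
// const user = await cached(`user:${id}`, 3600_000, () => loadUser(id));
```

Remember to delete or overwrite the cache entry whenever the underlying row changes, or readers will see stale data until the TTL expires.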
Table Partitioning
Split large tables into smaller pieces for better performance. Like organizing files into folders by year.
-- Partition by date range
CREATE TABLE logs (
id SERIAL,
message TEXT,
created_at TIMESTAMP
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE logs_2024_01 PARTITION OF logs
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE logs_2024_02 PARTITION OF logs
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Queries automatically use correct partition
SELECT * FROM logs
WHERE created_at >= '2024-01-15'
AND created_at < '2024-01-20';
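PostgreSQL routes rows and prunes partitions automatically; you never compute this yourself. For intuition only, here is a sketch of the monthly routing the example's naming scheme implies (`partitionFor` is an illustrative helper):

```javascript
// Given a row's created_at, return the monthly partition it would
// land in, following the logs_YYYY_MM naming from the example.
function partitionFor(createdAt) {
  const y = createdAt.getUTCFullYear();
  const m = String(createdAt.getUTCMonth() + 1).padStart(2, '0');
  return `logs_${y}_${m}`;
}

console.log(partitionFor(new Date('2024-01-15T00:00:00Z'))); // → logs_2024_01
```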
Performance Monitoring
You can't optimize what you don't measure. Monitor these key metrics:
📊 Key Metrics:
- Query execution time
- Slow query log
- Connection pool usage
- Cache hit ratio
- Index usage statistics
- Table bloat
- Lock contention
🔧 Monitoring Tools:
- pg_stat_statements (PostgreSQL)
- EXPLAIN ANALYZE
- pgAdmin
- DataDog, New Relic
- Grafana + Prometheus
- Application Performance Monitoring
-- Find slow queries (requires the pg_stat_statements extension)
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- On PostgreSQL 12 and earlier these columns are named
-- total_time, mean_time, max_time.
-- Check index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan < 10
ORDER BY idx_scan;
Optimization Checklist
✅ Query Level
- □ Use EXPLAIN ANALYZE on slow queries
- □ Select only needed columns
- □ Use JOINs instead of multiple queries
- □ Add LIMIT for pagination
- □ Avoid functions on indexed columns
✅ Index Level
- □ Index WHERE clause columns
- □ Index JOIN columns
- □ Use composite indexes wisely
- □ Remove unused indexes
- □ Monitor index size
✅ Application Level
- □ Use connection pooling
- □ Implement caching layer
- □ Batch operations when possible
- □ Use read replicas for scaling
- □ Monitor performance metrics
✅ Database Level
- □ Regular VACUUM and ANALYZE
- □ Partition large tables
- □ Archive old data
- □ Tune database configuration
- □ Plan for backups and recovery
🛠️ Hands-On Project: Optimize a Slow Application
Take a slow application and optimize it using all the techniques learned.
Project Tasks:
- ✓ Identify slow queries using EXPLAIN
- ✓ Add appropriate indexes
- ✓ Implement connection pooling
- ✓ Add Redis caching layer
- ✓ Fix N+1 query problems
- ✓ Set up monitoring dashboard
- ✓ Measure before/after performance
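For the before/after measurement task, a minimal timing harness like the sketch below is enough to start; `timeIt` is an illustrative helper, and for real benchmarks you should run many iterations and warm up first.

```javascript
// Minimal before/after timing harness. Quick comparison aid only;
// proper benchmarks need warm-up and repeated runs.
async function timeIt(label, fn) {
  const start = process.hrtime.bigint();
  const result = await fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return { result, ms };
}

// Usage (pool, slowSql, fastSql assumed from earlier sections):
// await timeIt('unoptimized', () => pool.query(slowSql));
// await timeIt('optimized', () => pool.query(fastSql));
```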
📚 Module Summary
You've mastered database optimization:
- ✓ Query optimization techniques
- ✓ Advanced indexing strategies
- ✓ Connection pooling
- ✓ Query result caching
- ✓ Table partitioning
- ✓ Performance monitoring
Congratulations! You've completed the Database Engineering learning path!