
Module 6: Database Optimization

Master query optimization, indexing strategies, and performance tuning for high-scale applications.

Why Database Optimization Matters

A slow database is like traffic congestion - it affects everything. Database optimization ensures your application stays fast as data grows. A well-optimized query can be 100x faster than an unoptimized one!

⚡ Performance Impact:

  • User Experience: Faster page loads, happier users
  • Cost Savings: Less server resources needed
  • Scalability: Handle more users with same hardware
  • Reliability: Fewer timeouts and errors
  • Competitive Edge: Speed is a feature

Query Optimization Techniques

1. Use EXPLAIN to Analyze Queries

EXPLAIN shows how the database executes your query - like an X-ray for SQL.

-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM users
WHERE email = 'john@example.com';

-- Look for:
-- Seq Scan (reads every row; usually bad on large tables) vs Index Scan (good)
-- Execution time
-- Rows scanned vs rows returned

2. Select Only What You Need

Don't use SELECT * - it wastes bandwidth and memory.

❌ Slow:

SELECT * FROM users;

✅ Fast:

SELECT id, username, email
FROM users;
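The savings are easy to see even without a database. Here is a sketch comparing the serialized size of full rows versus only the columns a page actually needs; the row shape and field sizes are invented for illustration.

```javascript
// Simulated rows as a driver might return them for SELECT * (hypothetical shape)
const fullRows = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  username: `user${i}`,
  email: `user${i}@example.com`,
  bio: 'x'.repeat(500),           // large column the page never displays
  avatar_base64: 'y'.repeat(2000) // another heavy, unused column
}));

// Keep only what SELECT id, username, email would return
const slimRows = fullRows.map(({ id, username, email }) => ({ id, username, email }));

const fullBytes = JSON.stringify(fullRows).length;
const slimBytes = JSON.stringify(slimRows).length;
console.log(`full: ${fullBytes} bytes, slim: ${slimBytes} bytes`);
```

Every one of those extra bytes is fetched from disk, held in memory, and sent over the wire on every request.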

3. Avoid N+1 Query Problem

Loading related data in a loop causes many queries. Use JOINs instead.

❌ N+1 Problem:

// 1 query for posts
posts = SELECT * FROM posts;

// N queries for authors
for each post:
    SELECT * FROM users WHERE id = post.user_id;

✅ Single Query:

SELECT
  posts.*,
  users.username
FROM posts
JOIN users ON posts.user_id = users.id;
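The difference is easy to demonstrate with in-memory stand-ins for the two tables. This sketch counts how many "queries" each approach issues; the sample rows and the runQuery counter are invented for illustration.

```javascript
// In-memory stand-ins for the users and posts tables (hypothetical data)
const users = [{ id: 1, username: 'ada' }, { id: 2, username: 'alan' }];
const posts = [
  { id: 10, user_id: 1, title: 'Hello' },
  { id: 11, user_id: 2, title: 'World' },
  { id: 12, user_id: 1, title: 'Again' }
];

let queryCount = 0;
const runQuery = (fn) => { queryCount++; return fn(); };

// ❌ N+1: one query for posts, then one more per post for its author
queryCount = 0;
const allPosts = runQuery(() => posts);
for (const post of allPosts) {
  runQuery(() => users.find(u => u.id === post.user_id));
}
const n1Queries = queryCount; // 1 + posts.length

// ✅ Join-style: one query that attaches the author to each post
queryCount = 0;
const joined = runQuery(() =>
  posts.map(p => ({ ...p, username: users.find(u => u.id === p.user_id).username }))
);
const joinQueries = queryCount;
console.log(n1Queries, joinQueries); // 4 1
```

With 3 posts that is 4 queries versus 1; with 1,000 posts it would be 1,001 versus 1.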

4. Use LIMIT for Pagination

Don't load all records at once. Use pagination.

-- Page 1 (first 20 records)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;

-- Page 2 (next 20 records)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 20;
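One caveat: the database still walks past every skipped row, so deep OFFSET pages get slower and slower. Keyset (cursor) pagination filters on the last value seen instead, e.g. WHERE created_at < $cursor ORDER BY created_at DESC LIMIT 20. A minimal in-memory sketch of the idea; the data and page size are invented.

```javascript
// Posts sorted newest-first, as ORDER BY created_at DESC would return them
const posts = Array.from({ length: 95 }, (_, i) => ({ id: i, created_at: 1000 - i }));

// Keyset page: "rows older than the cursor, limited", mimicking
// WHERE created_at < $cursor ORDER BY created_at DESC LIMIT $size
function keysetPage(rows, cursor, size) {
  const page = rows.filter(r => cursor === null || r.created_at < cursor).slice(0, size);
  const nextCursor = page.length ? page[page.length - 1].created_at : null;
  return { page, nextCursor };
}

// Walk all pages by passing each page's last timestamp as the next cursor
let cursor = null;
let pages = 0;
while (true) {
  const { page, nextCursor } = keysetPage(posts, cursor, 20);
  if (page.length === 0) break;
  pages++;
  cursor = nextCursor;
}
console.log(pages); // 5 pages of up to 20 rows each
```

With an index on created_at, the real query jumps straight to the cursor position instead of counting past skipped rows.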

5. Avoid Functions on Indexed Columns

Using functions on indexed columns prevents index usage.

❌ Can't Use Index:

SELECT * FROM users
WHERE LOWER(email) = 'john@example.com';

✅ Uses Index:

SELECT * FROM users
WHERE email = 'john@example.com';

-- If you genuinely need case-insensitive matching, keep LOWER()
-- in the query and create an expression index it can use:
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

Advanced Indexing Strategies

Indexes are like book indexes - they help find information quickly without reading everything. But too many indexes slow down writes.

Single Column Index

Index frequently queried columns.

-- Create index
CREATE INDEX idx_users_email ON users(email);

-- This query will use the index
SELECT * FROM users WHERE email = 'john@example.com';

Composite Index

Index multiple columns together. Order matters!

-- Create composite index
CREATE INDEX idx_posts_user_created
ON posts(user_id, created_at);

-- Can use the index for:
WHERE user_id = 1
WHERE user_id = 1 AND created_at > '2024-01-01'

-- Cannot use the index for:
WHERE created_at > '2024-01-01' -- leading column (user_id) missing!

Partial Index

Index only rows that match a condition - saves space.

-- Index only active users
CREATE INDEX idx_active_users
ON users(username)
WHERE is_active = true;

-- Index only unpublished posts
CREATE INDEX idx_draft_posts
ON posts(user_id)
WHERE published_at IS NULL;

Covering Index

Include all columns needed by a query in the index.

-- Create covering index (INCLUDE requires PostgreSQL 11+)
CREATE INDEX idx_posts_covering
ON posts(user_id, created_at)
INCLUDE (title, views);

-- This query can be answered from the index alone (fast!)
SELECT title, views FROM posts
WHERE user_id = 1
ORDER BY created_at DESC;

💡 Index Best Practices:

  • Index columns used in WHERE, JOIN, ORDER BY
  • Don't over-index - each index slows writes
  • Monitor index usage and remove unused ones
  • Consider index size vs benefit
  • Rebuild indexes periodically
  • Use UNIQUE indexes for constraints

Connection Pooling

Creating database connections is expensive. Connection pooling reuses connections like a taxi stand - taxis wait for passengers instead of driving away after each ride.

// Node.js with pg (PostgreSQL)
const { Pool } = require('pg');

const pool = new Pool({
  host: 'localhost',
  database: 'myapp',
  max: 20,                       // Max connections
  idleTimeoutMillis: 30000,      // Close idle connections after 30s
  connectionTimeoutMillis: 2000  // Wait up to 2s for a free connection
});

// Use a pooled connection
const result = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);

🎯 Pool Size Guidelines:

  • Formula: connections = ((core_count * 2) + effective_spindle_count)
  • Typical: 10-20 connections for most apps
  • Monitor: Watch for connection exhaustion
  • Don't: Set pool size too high (wastes resources)
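The formula above is the widely cited sizing heuristic from the HikariCP pool-sizing guide; the arithmetic is trivial, and the core and spindle counts below are just example values.

```javascript
// connections = (core_count * 2) + effective_spindle_count
// (an SSD is usually counted as one effective "spindle")
function recommendedPoolSize(coreCount, effectiveSpindleCount) {
  return coreCount * 2 + effectiveSpindleCount;
}

// An 8-core database server with one SSD
console.log(recommendedPoolSize(8, 1)); // 17
```

Treat the result as a starting point, not a law: measure throughput at a few sizes around it and watch for connection exhaustion.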

Query Result Caching

Cache frequently accessed query results to avoid hitting the database.

// Cache with Redis
async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  // Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Query database
  const result = await pool.query(
    'SELECT * FROM users WHERE id = $1',
    [userId]
  );

  // Cache for 1 hour
  await redis.setex(cacheKey, 3600, JSON.stringify(result.rows[0]));
  return result.rows[0];
}
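The cache-aside pattern above works with any store, not just Redis. Here is a minimal in-memory TTL cache sketch of the same flow; the fake database and injectable clock exist only to make the example self-contained and testable.

```javascript
// Minimal TTL cache: key -> { value, expiresAt }
function makeCache(now = () => Date.now()) {
  const store = new Map();
  return {
    get(key) {
      const entry = store.get(key);
      if (!entry || entry.expiresAt <= now()) { store.delete(key); return null; }
      return entry.value;
    },
    set(key, value, ttlMs) {
      store.set(key, { value, expiresAt: now() + ttlMs });
    }
  };
}

// Cache-aside: try the cache, fall back to the (simulated) database
let dbHits = 0;
const fakeDb = (id) => { dbHits++; return { id, username: `user${id}` }; };
const cache = makeCache();

function getUserCached(id) {
  const key = `user:${id}`;
  const hit = cache.get(key);
  if (hit) return hit;
  const row = fakeDb(id);
  cache.set(key, row, 3600 * 1000); // cache for 1 hour
  return row;
}

getUserCached(1); // miss -> hits the database
getUserCached(1); // hit  -> served from cache
console.log(dbHits); // 1
```

The TTL is the knob that trades freshness for load: shorter TTLs mean more database hits but less stale data.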

Table Partitioning

Split large tables into smaller pieces for better performance. Like organizing files into folders by year.

-- Partition by date range
CREATE TABLE logs (
  id SERIAL,
  message TEXT,
  created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE logs_2024_01 PARTITION OF logs
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE logs_2024_02 PARTITION OF logs
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries automatically hit only the matching partition
SELECT * FROM logs
WHERE created_at >= '2024-01-15'
AND created_at < '2024-01-20';
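One operational detail: a monthly partition must exist before rows for that month arrive, so teams usually script partition creation. A sketch of a helper that generates the DDL for one month, following the logs_YYYY_MM naming used above; the function name and scheme are illustrative, not a standard API.

```javascript
// Build the DDL for one monthly partition of the logs table,
// matching the logs_YYYY_MM naming convention used above
function monthlyPartitionDDL(year, month) {
  const pad = (n) => String(n).padStart(2, '0');
  const name = `logs_${year}_${pad(month)}`;
  const from = `${year}-${pad(month)}-01`;
  // Upper bound is the first day of the next month (handle December rollover)
  const nextYear = month === 12 ? year + 1 : year;
  const nextMonth = month === 12 ? 1 : month + 1;
  const to = `${nextYear}-${pad(nextMonth)}-01`;
  return `CREATE TABLE ${name} PARTITION OF logs\n` +
         `FOR VALUES FROM ('${from}') TO ('${to}');`;
}

console.log(monthlyPartitionDDL(2024, 1));
// CREATE TABLE logs_2024_01 PARTITION OF logs
// FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```

Run a helper like this from a scheduled job a month or two ahead, so inserts never land in a missing partition.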

Performance Monitoring

You can't optimize what you don't measure. Monitor these key metrics:

📊 Key Metrics:

  • Query execution time
  • Slow query log
  • Connection pool usage
  • Cache hit ratio
  • Index usage statistics
  • Table bloat
  • Lock contention

🔧 Monitoring Tools:

  • pg_stat_statements (PostgreSQL)
  • EXPLAIN ANALYZE
  • pgAdmin
  • DataDog, New Relic
  • Grafana + Prometheus
  • Application Performance Monitoring

-- Find slow queries (requires the pg_stat_statements extension)
SELECT
  query,
  calls,
  total_exec_time,  -- total_time before PostgreSQL 13
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Check for rarely used indexes
SELECT
  schemaname,
  relname,       -- table name
  indexrelname,  -- index name
  idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan < 10
ORDER BY idx_scan;

Optimization Checklist

✅ Query Level

  • □ Use EXPLAIN ANALYZE on slow queries
  • □ Select only needed columns
  • □ Use JOINs instead of multiple queries
  • □ Add LIMIT for pagination
  • □ Avoid functions on indexed columns

✅ Index Level

  • □ Index WHERE clause columns
  • □ Index JOIN columns
  • □ Use composite indexes wisely
  • □ Remove unused indexes
  • □ Monitor index size

✅ Application Level

  • □ Use connection pooling
  • □ Implement caching layer
  • □ Batch operations when possible
  • □ Use read replicas for scaling
  • □ Monitor performance metrics

✅ Database Level

  • □ Regular VACUUM and ANALYZE
  • □ Partition large tables
  • □ Archive old data
  • □ Tune database configuration
  • □ Plan for backups and recovery

🛠️ Hands-On Project: Optimize a Slow Application

Take a slow application and optimize it using all the techniques learned.

Project Tasks:

  • ✓ Identify slow queries using EXPLAIN
  • ✓ Add appropriate indexes
  • ✓ Implement connection pooling
  • ✓ Add Redis caching layer
  • ✓ Fix N+1 query problems
  • ✓ Set up monitoring dashboard
  • ✓ Measure before/after performance

📚 Module Summary

You've mastered database optimization:

  • ✓ Query optimization techniques
  • ✓ Advanced indexing strategies
  • ✓ Connection pooling
  • ✓ Query result caching
  • ✓ Table partitioning
  • ✓ Performance monitoring

Congratulations! You've completed the Database Engineering learning path!