
Module 6: Database Optimization

Master query optimization, indexing strategies, and performance tuning for high-scale applications.

Why Database Optimization Matters

A slow database is like traffic congestion - it affects everything. Database optimization ensures your application stays fast as data grows. A well-optimized query can be 100x faster than an unoptimized one!

⚡ Performance Impact:

  • User Experience: Faster page loads, happier users
  • Cost Savings: Less server resources needed
  • Scalability: Handle more users with same hardware
  • Reliability: Fewer timeouts and errors
  • Competitive Edge: Speed is a feature

Query Optimization Techniques

1. Use EXPLAIN to Analyze Queries

EXPLAIN shows how the database executes your query - like an X-ray for SQL.

-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM users
WHERE email = 'john@example.com';

-- Look for:
-- Seq Scan (reads every row; usually bad on large tables) vs Index Scan (good)
-- Execution time
-- Rows scanned vs rows returned

2. Select Only What You Need

Don't use SELECT * - it wastes bandwidth and memory.

❌ Slow:

SELECT * FROM users;

✅ Fast:

SELECT id, username, email
FROM users;
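The savings are easy to see even without a database. Here is a sketch comparing the serialized size of full rows versus only the columns a page actually needs; the row shape and field sizes are invented for illustration.

```javascript
// Simulated rows as a driver might return them for SELECT * (hypothetical shape)
const fullRows = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  username: `user${i}`,
  email: `user${i}@example.com`,
  bio: 'x'.repeat(500),           // large column the page never displays
  avatar_base64: 'y'.repeat(2000) // another heavy, unused column
}));

// Keep only what SELECT id, username, email would return
const slimRows = fullRows.map(({ id, username, email }) => ({ id, username, email }));

const fullBytes = JSON.stringify(fullRows).length;
const slimBytes = JSON.stringify(slimRows).length;
console.log(`full: ${fullBytes} bytes, slim: ${slimBytes} bytes`);
```

Every one of those extra bytes is fetched from disk, held in memory, and sent over the wire on every request.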

3. Avoid N+1 Query Problem

Loading related data in a loop causes many queries. Use JOINs instead.

❌ N+1 Problem:

// 1 query for posts
posts = SELECT * FROM posts;

// N queries for authors
for each post:
    SELECT * FROM users WHERE id = post.user_id;

✅ Single Query:

SELECT
  posts.*,
  users.username
FROM posts
JOIN users ON posts.user_id = users.id;
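The difference is easy to demonstrate with in-memory stand-ins for the two tables. This sketch counts how many "queries" each approach issues; the sample rows and the runQuery counter are invented for illustration.

```javascript
// In-memory stand-ins for the users and posts tables (hypothetical data)
const users = [{ id: 1, username: 'ada' }, { id: 2, username: 'alan' }];
const posts = [
  { id: 10, user_id: 1, title: 'Hello' },
  { id: 11, user_id: 2, title: 'World' },
  { id: 12, user_id: 1, title: 'Again' }
];

let queryCount = 0;
const runQuery = (fn) => { queryCount++; return fn(); };

// ❌ N+1: one query for posts, then one more per post for its author
queryCount = 0;
const allPosts = runQuery(() => posts);
for (const post of allPosts) {
  runQuery(() => users.find(u => u.id === post.user_id));
}
const n1Queries = queryCount; // 1 + posts.length

// ✅ Join-style: one query that attaches the author to each post
queryCount = 0;
const joined = runQuery(() =>
  posts.map(p => ({ ...p, username: users.find(u => u.id === p.user_id).username }))
);
const joinQueries = queryCount;
console.log(n1Queries, joinQueries); // 4 1
```

With 3 posts that is 4 queries versus 1; with 1,000 posts it would be 1,001 versus 1.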

4. Use LIMIT for Pagination

Don't load all records at once. Use pagination.

-- Page 1 (first 20 records)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;

-- Page 2 (next 20 records)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 20;
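One caveat: the database still walks past every skipped row, so deep OFFSET pages get slower and slower. Keyset (cursor) pagination filters on the last value seen instead, e.g. WHERE created_at < $cursor ORDER BY created_at DESC LIMIT 20. A minimal in-memory sketch of the idea; the data and page size are invented.

```javascript
// Posts sorted newest-first, as ORDER BY created_at DESC would return them
const posts = Array.from({ length: 95 }, (_, i) => ({ id: i, created_at: 1000 - i }));

// Keyset page: "rows older than the cursor, limited", mimicking
// WHERE created_at < $cursor ORDER BY created_at DESC LIMIT $size
function keysetPage(rows, cursor, size) {
  const page = rows.filter(r => cursor === null || r.created_at < cursor).slice(0, size);
  const nextCursor = page.length ? page[page.length - 1].created_at : null;
  return { page, nextCursor };
}

// Walk all pages by passing each page's last timestamp as the next cursor
let cursor = null;
let pages = 0;
while (true) {
  const { page, nextCursor } = keysetPage(posts, cursor, 20);
  if (page.length === 0) break;
  pages++;
  cursor = nextCursor;
}
console.log(pages); // 5 pages of up to 20 rows each
```

With an index on created_at, the real query jumps straight to the cursor position instead of counting past skipped rows.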

5. Avoid Functions on Indexed Columns

Using functions on indexed columns prevents index usage.

❌ Can't Use Index:

SELECT * FROM users
WHERE LOWER(email) = 'john@example.com';

✅ Uses Index:

SELECT * FROM users
WHERE email = 'john@example.com';

-- If you genuinely need case-insensitive matching, keep LOWER()
-- in the query and create an expression index it can use:
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

Advanced Indexing Strategies

Indexes are like book indexes - they help find information quickly without reading everything. But too many indexes slow down writes.

Single Column Index

Index frequently queried columns.

-- Create index
CREATE INDEX idx_users_email ON users(email);

-- This query will use the index
SELECT * FROM users WHERE email = 'john@example.com';

Composite Index

Index multiple columns together. Order matters!

-- Create composite index
CREATE INDEX idx_posts_user_created
ON posts(user_id, created_at);

-- Can use the index for:
WHERE user_id = 1
WHERE user_id = 1 AND created_at > '2024-01-01'

-- Cannot use the index for:
WHERE created_at > '2024-01-01' -- leading column (user_id) missing!

Partial Index

Index only rows that match a condition - saves space.

-- Index only active users
CREATE INDEX idx_active_users
ON users(username)
WHERE is_active = true;

-- Index only unpublished posts
CREATE INDEX idx_draft_posts
ON posts(user_id)
WHERE published_at IS NULL;

Covering Index

Include all columns needed by a query in the index.

-- Create covering index (INCLUDE requires PostgreSQL 11+)
CREATE INDEX idx_posts_covering
ON posts(user_id, created_at)
INCLUDE (title, views);

-- This query can be answered from the index alone (fast!)
SELECT title, views FROM posts
WHERE user_id = 1
ORDER BY created_at DESC;

💡 Index Best Practices:

  • Index columns used in WHERE, JOIN, ORDER BY
  • Don't over-index - each index slows writes
  • Monitor index usage and remove unused ones
  • Consider index size vs benefit
  • Rebuild indexes periodically
  • Use UNIQUE indexes for constraints

Connection Pooling

Creating database connections is expensive. Connection pooling reuses connections like a taxi stand - taxis wait for passengers instead of driving away after each ride.

// Node.js with pg (PostgreSQL)
const { Pool } = require('pg');

const pool = new Pool({
  host: 'localhost',
  database: 'myapp',
  max: 20,                       // Max connections
  idleTimeoutMillis: 30000,      // Close idle connections after 30s
  connectionTimeoutMillis: 2000  // Wait up to 2s for a free connection
});

// Use a pooled connection
const result = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);

🎯 Pool Size Guidelines:

  • Formula: connections = ((core_count * 2) + effective_spindle_count)
  • Typical: 10-20 connections for most apps
  • Monitor: Watch for connection exhaustion
  • Don't: Set pool size too high (wastes resources)
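The formula above is the widely cited sizing heuristic from the HikariCP pool-sizing guide; the arithmetic is trivial, and the core and spindle counts below are just example values.

```javascript
// connections = (core_count * 2) + effective_spindle_count
// (an SSD is usually counted as one effective "spindle")
function recommendedPoolSize(coreCount, effectiveSpindleCount) {
  return coreCount * 2 + effectiveSpindleCount;
}

// An 8-core database server with one SSD
console.log(recommendedPoolSize(8, 1)); // 17
```

Treat the result as a starting point, not a law: measure throughput at a few sizes around it and watch for connection exhaustion.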

Query Result Caching

Cache frequently accessed query results to avoid hitting the database.

// Cache with Redis
async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  // Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Query database
  const result = await pool.query(
    'SELECT * FROM users WHERE id = $1',
    [userId]
  );

  // Cache for 1 hour
  await redis.setex(cacheKey, 3600, JSON.stringify(result.rows[0]));
  return result.rows[0];
}
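The cache-aside pattern above works with any store, not just Redis. Here is a minimal in-memory TTL cache sketch of the same flow; the fake database and injectable clock exist only to make the example self-contained and testable.

```javascript
// Minimal TTL cache: key -> { value, expiresAt }
function makeCache(now = () => Date.now()) {
  const store = new Map();
  return {
    get(key) {
      const entry = store.get(key);
      if (!entry || entry.expiresAt <= now()) { store.delete(key); return null; }
      return entry.value;
    },
    set(key, value, ttlMs) {
      store.set(key, { value, expiresAt: now() + ttlMs });
    }
  };
}

// Cache-aside: try the cache, fall back to the (simulated) database
let dbHits = 0;
const fakeDb = (id) => { dbHits++; return { id, username: `user${id}` }; };
const cache = makeCache();

function getUserCached(id) {
  const key = `user:${id}`;
  const hit = cache.get(key);
  if (hit) return hit;
  const row = fakeDb(id);
  cache.set(key, row, 3600 * 1000); // cache for 1 hour
  return row;
}

getUserCached(1); // miss -> hits the database
getUserCached(1); // hit  -> served from cache
console.log(dbHits); // 1
```

The TTL is the knob that trades freshness for load: shorter TTLs mean more database hits but less stale data.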

Table Partitioning

Split large tables into smaller pieces for better performance. Like organizing files into folders by year.

-- Partition by date range
CREATE TABLE logs (
  id SERIAL,
  message TEXT,
  created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE logs_2024_01 PARTITION OF logs
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE logs_2024_02 PARTITION OF logs
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries automatically hit only the matching partition
SELECT * FROM logs
WHERE created_at >= '2024-01-15'
AND created_at < '2024-01-20';
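One operational detail: a monthly partition must exist before rows for that month arrive, so teams usually script partition creation. A sketch of a helper that generates the DDL for one month, following the logs_YYYY_MM naming used above; the function name and scheme are illustrative, not a standard API.

```javascript
// Build the DDL for one monthly partition of the logs table,
// matching the logs_YYYY_MM naming convention used above
function monthlyPartitionDDL(year, month) {
  const pad = (n) => String(n).padStart(2, '0');
  const name = `logs_${year}_${pad(month)}`;
  const from = `${year}-${pad(month)}-01`;
  // Upper bound is the first day of the next month (handle December rollover)
  const nextYear = month === 12 ? year + 1 : year;
  const nextMonth = month === 12 ? 1 : month + 1;
  const to = `${nextYear}-${pad(nextMonth)}-01`;
  return `CREATE TABLE ${name} PARTITION OF logs\n` +
         `FOR VALUES FROM ('${from}') TO ('${to}');`;
}

console.log(monthlyPartitionDDL(2024, 1));
// CREATE TABLE logs_2024_01 PARTITION OF logs
// FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```

Run a helper like this from a scheduled job a month or two ahead, so inserts never land in a missing partition.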

Performance Monitoring

You can't optimize what you don't measure. Monitor these key metrics:

📊 Key Metrics:

  • Query execution time
  • Slow query log
  • Connection pool usage
  • Cache hit ratio
  • Index usage statistics
  • Table bloat
  • Lock contention

🔧 Monitoring Tools:

  • pg_stat_statements (PostgreSQL)
  • EXPLAIN ANALYZE
  • pgAdmin
  • DataDog, New Relic
  • Grafana + Prometheus
  • Application Performance Monitoring

-- Find slow queries (requires the pg_stat_statements extension)
SELECT
  query,
  calls,
  total_exec_time,  -- total_time before PostgreSQL 13
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Check for rarely used indexes
SELECT
  schemaname,
  relname,       -- table name
  indexrelname,  -- index name
  idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan < 10
ORDER BY idx_scan;

Optimization Checklist

✅ Query Level

  • □ Use EXPLAIN ANALYZE on slow queries
  • □ Select only needed columns
  • □ Use JOINs instead of multiple queries
  • □ Add LIMIT for pagination
  • □ Avoid functions on indexed columns

✅ Index Level

  • □ Index WHERE clause columns
  • □ Index JOIN columns
  • □ Use composite indexes wisely
  • □ Remove unused indexes
  • □ Monitor index size

✅ Application Level

  • □ Use connection pooling
  • □ Implement caching layer
  • □ Batch operations when possible
  • □ Use read replicas for scaling
  • □ Monitor performance metrics

✅ Database Level

  • □ Regular VACUUM and ANALYZE
  • □ Partition large tables
  • □ Archive old data
  • □ Tune database configuration
  • □ Plan for backups and recovery

🛠️ Hands-On Project: Optimize a Slow Application

Take a slow application and optimize it using all the techniques learned.

Project Tasks:

  • ✓ Identify slow queries using EXPLAIN
  • ✓ Add appropriate indexes
  • ✓ Implement connection pooling
  • ✓ Add Redis caching layer
  • ✓ Fix N+1 query problems
  • ✓ Set up monitoring dashboard
  • ✓ Measure before/after performance

📚 Module Summary

You've mastered database optimization:

  • ✓ Query optimization techniques
  • ✓ Advanced indexing strategies
  • ✓ Connection pooling
  • ✓ Query result caching
  • ✓ Table partitioning
  • ✓ Performance monitoring

Congratulations! You've completed the Database Engineering learning path!