Module 5: Data Modeling & Design
Master database design principles, normalization, and schema patterns for scalable applications.
What is Data Modeling?
Data modeling is like creating a blueprint for your house before building it. It's the process of designing how data will be stored, organized, and related in your database. Good data modeling prevents future headaches and ensures your application scales smoothly.
šÆ Why Data Modeling Matters:
- Data Integrity: Prevents inconsistencies and errors
- Performance: Optimized queries and faster access
- Scalability: Easier to grow and modify
- Clarity: Team understands data structure
- Maintenance: Simpler to update and debug
Entity-Relationship Diagrams (ERD)
ERDs are visual representations of your data model - like a map showing how different pieces of data connect. They use simple shapes to represent entities (tables) and their relationships.
š ERD Components:
- Entities: Objects or concepts (Users, Products, Orders)
- Attributes: Properties of entities (name, email, price)
- Relationships: How entities connect (User "places" Order)
- Cardinality: One-to-one, one-to-many, many-to-many
Example: E-commerce ERD
āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā
ā USERS ā ā ORDERS ā ā PRODUCTS ā
āāāāāāāāāāāāāā⤠āāāāāāāāāāāāāā⤠āāāāāāāāāāāāāāā¤
ā id (PK) āāāāāā ā id (PK) ā āāāāāā id (PK) ā
ā username ā ā ā user_id(FK) ā ā ā name ā
ā email ā āāāā{'>'} ā total ā ā ā price ā
ā created_at ā ā status ā ā ā stock ā
āāāāāāāāāāāāāāā ā created_at ā ā āāāāāāāāāāāāāāā
āāāāāāāāāāāāāāā ā
ā ā
ā ā
ā¼ ā
āāāāāāāāāāāāāāā ā
ā ORDER_ITEMS ā ā
āāāāāāāāāāāāāā⤠ā
ā id (PK) ā ā
ā order_id(FK)āāāāāā
ā product_id āāāāāā
ā quantity ā
ā price ā
āāāāāāāāāāāāāāāDatabase Relationships
1. One-to-One (1:1)
One record in Table A relates to exactly one record in Table B. Example: User and Profile.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE
);
CREATE TABLE user_profiles (
id SERIAL PRIMARY KEY,
user_id INTEGER UNIQUE REFERENCES users(id),
bio TEXT,
avatar_url VARCHAR(255)
);
2. One-to-Many (1:N)
One record in Table A can relate to many records in Table B. Example: User has many Posts.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50)
);
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
title VARCHAR(200),
content TEXT
);
3. Many-to-Many (M:N)
Many records in Table A relate to many in Table B. Requires a junction table. Example: Students and Courses.
CREATE TABLE students (
id SERIAL PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE courses (
id SERIAL PRIMARY KEY,
title VARCHAR(100)
);
-- Junction table
CREATE TABLE enrollments (
student_id INTEGER REFERENCES students(id),
course_id INTEGER REFERENCES courses(id),
enrolled_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (student_id, course_id)
);
Normalization
Normalization is the process of organizing data to reduce redundancy and improve integrity. Think of it as decluttering - each piece of information should have one clear home.
First Normal Form (1NF)
Each column contains atomic (indivisible) values. No repeating groups.
ā Not 1NF:
Orders
id | customer | products
1 | John | Apple, Banana
ā 1NF:
OrderItems
order_id | product
1 | Apple
1 | Banana
Second Normal Form (2NF)
Must be in 1NF. All non-key attributes depend on the entire primary key.
ā Not 2NF:
OrderItems
order_id | product_id | product_name
1 | 10 | Apple
ā 2NF:
OrderItems: order_id, product_id
Products: id, name
Third Normal Form (3NF)
Must be in 2NF. No transitive dependencies (non-key attributes depending on other non-key attributes).
ā Not 3NF:
Orders
id | customer_id | customer_city
1 | 100 | NYC
ā 3NF:
Orders: id, customer_id
Customers: id, city
š” When to Normalize:
- ⢠Normalize for: Data integrity, reducing redundancy, write-heavy apps
- ⢠Denormalize for: Read performance, analytics, reporting
- ⢠Balance: Start normalized, denormalize only when needed
Denormalization
Sometimes breaking normalization rules improves performance. Denormalization adds redundancy intentionally to speed up reads.
-- Normalized (requires JOIN)
SELECT posts.title, users.username
FROM posts
JOIN users ON posts.user_id = users.id;
-- Denormalized (faster, but duplicates username)
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INTEGER,
username VARCHAR(50), -- Denormalized!
title VARCHAR(200)
);
SELECT title, username FROM posts; -- No JOIN needed
ā Denormalize When:
- ⢠Read performance is critical
- ⢠Data rarely changes
- ⢠Complex JOINs slow queries
- ⢠Analytics/reporting needs
ā Risks:
- ⢠Data inconsistency
- ⢠Increased storage
- ⢠Complex update logic
- ⢠Harder to maintain
Common Schema Design Patterns
Soft Delete Pattern
Instead of deleting records, mark them as deleted. Allows recovery and audit trails.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50),
deleted_at TIMESTAMP NULL
);
-- "Delete" user
UPDATE users SET deleted_at = NOW() WHERE id = 1;
-- Query active users
SELECT * FROM users WHERE deleted_at IS NULL;
Audit Trail Pattern
Track who created/modified records and when.
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title VARCHAR(200),
created_at TIMESTAMP DEFAULT NOW(),
created_by INTEGER REFERENCES users(id),
updated_at TIMESTAMP,
updated_by INTEGER REFERENCES users(id)
);
Polymorphic Association
One table relates to multiple other tables. Example: Comments on Posts and Photos.
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
content TEXT,
commentable_type VARCHAR(50), -- 'Post' or 'Photo'
commentable_id INTEGER -- ID in that table
);
-- Get comments for a post
SELECT * FROM comments
WHERE commentable_type = 'Post' AND commentable_id = 1;
Versioning Pattern
Keep history of changes to records.
CREATE TABLE document_versions (
id SERIAL PRIMARY KEY,
document_id INTEGER,
version INTEGER,
content TEXT,
created_at TIMESTAMP DEFAULT NOW(),
UNIQUE(document_id, version)
);
Database Migrations
Migrations are version control for your database schema. They track changes and allow you to evolve your database structure safely.
// Example migration (Node.js with Knex.js)
exports.up = function(knex) {
return knex.schema.createTable('users', (table) => {
table.increments('id').primary();
table.string('username', 50).unique().notNullable();
table.string('email', 100).unique().notNullable();
table.timestamps(true, true);
});
};
exports.down = function(knex) {
return knex.schema.dropTable('users');
};
š§ Migration Best Practices:
- ⢠Always write both up and down migrations
- ⢠Test migrations on staging before production
- ⢠Keep migrations small and focused
- ⢠Never modify existing migrations
- ⢠Backup database before running migrations
- ⢠Use transactions when possible
Database Design Process
Step 1: Requirements Gathering
Understand what data you need to store and how it will be used.
Step 2: Identify Entities
List all the main objects (Users, Products, Orders, etc.).
Step 3: Define Relationships
Determine how entities relate (one-to-many, many-to-many).
Step 4: Create ERD
Draw the entity-relationship diagram.
Step 5: Normalize
Apply normalization rules to reduce redundancy.
Step 6: Optimize
Add indexes, consider denormalization for performance.
Step 7: Implement
Create tables, constraints, and migrations.
š ļø Hands-On Project: Library Management System
Design a complete database schema for a library management system.
Project Requirements:
- ā Members can borrow multiple books
- ā Books can have multiple authors
- ā Track borrowing history
- ā Late fee calculation
- ā Book reservations
- ā Audit trail for all changes
- ā Soft delete for members
š Module Summary
You've mastered data modeling and database design:
- ā Entity-Relationship Diagrams
- ā Database relationships (1:1, 1:N, M:N)
- ā Normalization and denormalization
- ā Common schema design patterns
- ā Database migrations
- ā Design process and best practices
Next: Learn database optimization and performance tuning!