Back to Database Engineering

Module 5: Data Modeling & Design

Master database design principles, normalization, and schema patterns for scalable applications.

What is Data Modeling?

Data modeling is like creating a blueprint for your house before building it. It's the process of designing how data will be stored, organized, and related in your database. Good data modeling prevents future headaches and ensures your application scales smoothly.

šŸŽÆ Why Data Modeling Matters:

  • Data Integrity: Prevents inconsistencies and errors
  • Performance: Optimized queries and faster access
  • Scalability: Easier to grow and modify
  • Clarity: Team understands data structure
  • Maintenance: Simpler to update and debug

Entity-Relationship Diagrams (ERD)

ERDs are visual representations of your data model - like a map showing how different pieces of data connect. They use simple shapes to represent entities (tables) and their relationships.

šŸ“Š ERD Components:

  • Entities: Objects or concepts (Users, Products, Orders)
  • Attributes: Properties of entities (name, email, price)
  • Relationships: How entities connect (User "places" Order)
  • Cardinality: One-to-one, one-to-many, many-to-many

Example: E-commerce ERD

ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”         ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”         ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│   USERS     │         │   ORDERS    │         │  PRODUCTS   │
ā”œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¤         ā”œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¤         ā”œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¤
│ id (PK)     │────┐    │ id (PK)     │    ā”Œā”€ā”€ā”€ā”€ā”‚ id (PK)     │
│ username    │    │    │ user_id(FK) │    │    │ name        │
│ email       │    └───{'>'} │ total       │    │    │ price       │
│ created_at  │         │ status      │    │    │ stock       │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜         │ created_at  │    │    ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
                        ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜    │
                               │           │
                               │           │
                               ā–¼           │
                        ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”    │
                        │ ORDER_ITEMS │    │
                        ā”œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¤    │
                        │ id (PK)     │    │
                        │ order_id(FK)ā”‚ā”€ā”€ā”€ā”€ā”˜
                        │ product_id  ā”‚ā”€ā”€ā”€ā”€ā”˜
                        │ quantity    │
                        │ price       │
                        ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜

Database Relationships

1. One-to-One (1:1)

One record in Table A relates to exactly one record in Table B. Example: User and Profile.

CREATE TABLE users (

id SERIAL PRIMARY KEY,

username VARCHAR(50) UNIQUE

);

CREATE TABLE user_profiles (

id SERIAL PRIMARY KEY,

user_id INTEGER UNIQUE REFERENCES users(id),

bio TEXT,

avatar_url VARCHAR(255)

);

2. One-to-Many (1:N)

One record in Table A can relate to many records in Table B. Example: User has many Posts.

CREATE TABLE users (

id SERIAL PRIMARY KEY,

username VARCHAR(50)

);

CREATE TABLE posts (

id SERIAL PRIMARY KEY,

user_id INTEGER REFERENCES users(id),

title VARCHAR(200),

content TEXT

);

3. Many-to-Many (M:N)

Many records in Table A relate to many in Table B. Requires a junction table. Example: Students and Courses.

CREATE TABLE students (

id SERIAL PRIMARY KEY,

name VARCHAR(100)

);

CREATE TABLE courses (

id SERIAL PRIMARY KEY,

title VARCHAR(100)

);

-- Junction table

CREATE TABLE enrollments (

student_id INTEGER REFERENCES students(id),

course_id INTEGER REFERENCES courses(id),

enrolled_at TIMESTAMP DEFAULT NOW(),

PRIMARY KEY (student_id, course_id)

);

Normalization

Normalization is the process of organizing data to reduce redundancy and improve integrity. Think of it as decluttering - each piece of information should have one clear home.

First Normal Form (1NF)

Each column contains atomic (indivisible) values. No repeating groups.

āŒ Not 1NF:

Orders

id | customer | products

1 | John | Apple, Banana

āœ… 1NF:

OrderItems

order_id | product

1 | Apple

1 | Banana

Second Normal Form (2NF)

Must be in 1NF. All non-key attributes depend on the entire primary key.

āŒ Not 2NF:

OrderItems

order_id | product_id | product_name

1 | 10 | Apple

āœ… 2NF:

OrderItems: order_id, product_id

Products: id, name

Third Normal Form (3NF)

Must be in 2NF. No transitive dependencies (non-key attributes depending on other non-key attributes).

āŒ Not 3NF:

Orders

id | customer_id | customer_city

1 | 100 | NYC

āœ… 3NF:

Orders: id, customer_id

Customers: id, city

šŸ’” When to Normalize:

  • • Normalize for: Data integrity, reducing redundancy, write-heavy apps
  • • Denormalize for: Read performance, analytics, reporting
  • • Balance: Start normalized, denormalize only when needed

Denormalization

Sometimes breaking normalization rules improves performance. Denormalization adds redundancy intentionally to speed up reads.

-- Normalized (requires JOIN)

SELECT posts.title, users.username

FROM posts

JOIN users ON posts.user_id = users.id;

-- Denormalized (faster, but duplicates username)

CREATE TABLE posts (

id SERIAL PRIMARY KEY,

user_id INTEGER,

username VARCHAR(50), -- Denormalized!

title VARCHAR(200)

);

SELECT title, username FROM posts; -- No JOIN needed

āœ… Denormalize When:

  • • Read performance is critical
  • • Data rarely changes
  • • Complex JOINs slow queries
  • • Analytics/reporting needs

āŒ Risks:

  • • Data inconsistency
  • • Increased storage
  • • Complex update logic
  • • Harder to maintain

Common Schema Design Patterns

Soft Delete Pattern

Instead of deleting records, mark them as deleted. Allows recovery and audit trails.

CREATE TABLE users (

id SERIAL PRIMARY KEY,

username VARCHAR(50),

deleted_at TIMESTAMP NULL

);

-- "Delete" user

UPDATE users SET deleted_at = NOW() WHERE id = 1;

-- Query active users

SELECT * FROM users WHERE deleted_at IS NULL;

Audit Trail Pattern

Track who created/modified records and when.

CREATE TABLE posts (

id SERIAL PRIMARY KEY,

title VARCHAR(200),

created_at TIMESTAMP DEFAULT NOW(),

created_by INTEGER REFERENCES users(id),

updated_at TIMESTAMP,

updated_by INTEGER REFERENCES users(id)

);

Polymorphic Association

One table relates to multiple other tables. Example: Comments on Posts and Photos.

CREATE TABLE comments (

id SERIAL PRIMARY KEY,

content TEXT,

commentable_type VARCHAR(50), -- 'Post' or 'Photo'

commentable_id INTEGER -- ID in that table

);

-- Get comments for a post

SELECT * FROM comments

WHERE commentable_type = 'Post' AND commentable_id = 1;

Versioning Pattern

Keep history of changes to records.

CREATE TABLE document_versions (

id SERIAL PRIMARY KEY,

document_id INTEGER,

version INTEGER,

content TEXT,

created_at TIMESTAMP DEFAULT NOW(),

UNIQUE(document_id, version)

);

Database Migrations

Migrations are version control for your database schema. They track changes and allow you to evolve your database structure safely.

// Example migration (Node.js with Knex.js)

exports.up = function(knex) {

return knex.schema.createTable('users', (table) => {

table.increments('id').primary();

table.string('username', 50).unique().notNullable();

table.string('email', 100).unique().notNullable();

table.timestamps(true, true);

});

};

exports.down = function(knex) {

return knex.schema.dropTable('users');

};

šŸ”§ Migration Best Practices:

  • • Always write both up and down migrations
  • • Test migrations on staging before production
  • • Keep migrations small and focused
  • • Never modify existing migrations
  • • Backup database before running migrations
  • • Use transactions when possible

Database Design Process

Step 1: Requirements Gathering

Understand what data you need to store and how it will be used.

Step 2: Identify Entities

List all the main objects (Users, Products, Orders, etc.).

Step 3: Define Relationships

Determine how entities relate (one-to-many, many-to-many).

Step 4: Create ERD

Draw the entity-relationship diagram.

Step 5: Normalize

Apply normalization rules to reduce redundancy.

Step 6: Optimize

Add indexes, consider denormalization for performance.

Step 7: Implement

Create tables, constraints, and migrations.

šŸ› ļø Hands-On Project: Library Management System

Design a complete database schema for a library management system.

Project Requirements:

  • āœ“ Members can borrow multiple books
  • āœ“ Books can have multiple authors
  • āœ“ Track borrowing history
  • āœ“ Late fee calculation
  • āœ“ Book reservations
  • āœ“ Audit trail for all changes
  • āœ“ Soft delete for members

šŸ“š Module Summary

You've mastered data modeling and database design:

  • āœ“ Entity-Relationship Diagrams
  • āœ“ Database relationships (1:1, 1:N, M:N)
  • āœ“ Normalization and denormalization
  • āœ“ Common schema design patterns
  • āœ“ Database migrations
  • āœ“ Design process and best practices

Next: Learn database optimization and performance tuning!