
Module 4: PyTorch for Deep Learning

Learn PyTorch, the flexible deep learning framework preferred by researchers

🔥 What is PyTorch?

PyTorch is an open-source deep learning framework, originally developed at Meta (formerly Facebook), that feels like writing regular Python code. It's loved by researchers for its flexibility and by companies for its production capabilities. Think of it as the "Pythonic" way to do deep learning!

Why PyTorch?

Feels like regular Python (easy to debug)
Dynamic computation graphs (flexible)
Preferred by researchers (cutting-edge)
Used by Tesla, OpenAI, Meta

🌟 Fun Fact:

PyTorch powers many AI breakthroughs including GPT models, DALL-E, and Tesla's self-driving system!

βš–οΈ PyTorch vs TensorFlow

Both are excellent! The choice often comes down to personal preference and use case. Let's compare:

Feature          PyTorch                    TensorFlow
Learning curve   Easier (Pythonic)          Steeper (but Keras helps)
Debugging        Easy (standard Python)     Harder (graph-based)
Flexibility      Very flexible              More structured
Research         Preferred by researchers   Also popular
Production       Good (TorchServe)          Excellent (TF Serving)
Mobile           PyTorch Mobile             TensorFlow Lite (better)
Community        Growing fast               Larger, more mature

💡 Which to Choose?

  • PyTorch: Research, experimentation, learning, flexibility
  • TensorFlow: Production deployment, mobile apps, established pipelines
  • Both: Many companies use both! Learn one deeply, understand the other.

📦 Tensors and Operations

Tensors are PyTorch's version of NumPy arrays, but with superpowers! They can run on GPUs for massive speed boosts and automatically calculate gradients for training neural networks.

What are Tensors?

A tensor is a multi-dimensional array: a 0D tensor is a scalar, a 1D tensor is a vector, a 2D tensor is a matrix, and 3D+ tensors generalize to higher dimensions.

# Install PyTorch (run this in your terminal, not in Python)
pip install torch torchvision

# Import PyTorch
import torch
import torch.nn as nn

# Create tensors
x = torch.tensor([1, 2, 3])         # 1D tensor (shape [3])
y = torch.tensor([[1, 2], [3, 4]])  # 2D tensor (shape [2, 2])
z = torch.zeros(3, 3)               # 3x3 of zeros
r = torch.randn(2, 3)               # 2x3 of random normal values

# Tensor operations
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
print(a + b)            # element-wise addition
print(a * b)            # element-wise multiplication
print(torch.dot(a, b))  # dot product

# Move to GPU (if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x_gpu = x.to(device)    # runs on the GPU if one was found
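The "automatic gradients" superpower comes from autograd: mark a tensor with requires_grad=True and PyTorch records every operation on it so the chain of computations can be differentiated. A minimal sketch:

```python
import torch

# Mark a tensor as requiring gradients
x = torch.tensor(2.0, requires_grad=True)

# Build a computation: y = x^2 + 3x
y = x ** 2 + 3 * x

# Backpropagate: computes dy/dx = 2x + 3
y.backward()

print(x.grad)  # tensor(7.) since 2*2 + 3 = 7
```

This is exactly the machinery that loss.backward() uses later when training neural networks.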

πŸ—οΈ Building Models with nn.Module

In PyTorch, you build models by creating a class that inherits from nn.Module. This gives you full control over your model's architecture - it's like building with LEGO blocks!

# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Define layers
        self.fc1 = nn.Linear(784, 128)  # input to hidden
        self.fc2 = nn.Linear(128, 64)   # hidden to hidden
        self.fc3 = nn.Linear(64, 10)    # hidden to output
        self.relu = nn.ReLU()

    def forward(self, x):
        # Define the forward pass
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)  # no activation on output (CrossEntropyLoss expects raw logits)
        return x

# Create a model instance
model = SimpleNN()
print(model)

# Make a prediction
sample_input = torch.randn(1, 784)  # batch of 1
output = model(sample_input)
print(f"Output shape: {output.shape}")  # torch.Size([1, 10])

🎯 Key Concepts:

  • __init__: Define all layers here
  • forward: Define how data flows through the layers
  • nn.Linear: Fully connected layer
  • nn.ReLU: Activation function
  • PyTorch handles backpropagation automatically!
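For straightforward layer stacks like SimpleNN, nn.Sequential is a more compact alternative to subclassing nn.Module. This sketch builds the same architecture:

```python
import torch
import torch.nn as nn

# Same architecture as SimpleNN, expressed as a plain layer stack
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),  # no activation on output
)

output = model(torch.randn(1, 784))
print(output.shape)  # torch.Size([1, 10])
```

Subclassing nn.Module is still the way to go once your forward pass needs branching, loops, or multiple inputs.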

🔄 Training Loops Explained

Unlike TensorFlow/Keras where training is handled by model.fit(), PyTorch gives you full control with explicit training loops. This seems harder at first but gives you ultimate flexibility!

The Training Loop Pattern

1. Forward pass: Get predictions

2. Calculate loss: How wrong are we?

3. Backward pass: Calculate gradients

4. Update weights: Improve the model

5. Repeat for all batches and epochs

Complete Training Example

# Setup
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

model = SimpleNN()
criterion = nn.CrossEntropyLoss()  # loss function
optimizer = optim.Adam(model.parameters(), lr=0.001)

# (Assumes train_loader and test_loader are DataLoaders over your data,
#  e.g. DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True))

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()  # set to training mode
    running_loss = 0.0
    for inputs, labels in train_loader:
        # 1. Zero gradients
        optimizer.zero_grad()
        # 2. Forward pass
        outputs = model(inputs)
        # 3. Calculate loss
        loss = criterion(outputs, labels)
        # 4. Backward pass
        loss.backward()
        # 5. Update weights
        optimizer.step()
        running_loss += loss.item()
    avg_loss = running_loss / len(train_loader)
    print(f"Epoch {epoch+1}, Loss: {avg_loss:.4f}")

# Evaluation
model.eval()  # set to evaluation mode
correct = 0
total = 0
with torch.no_grad():  # don't calculate gradients
    for inputs, labels in test_loader:
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy:.2f}%")

💡 Important Steps:

  • optimizer.zero_grad(): Clear old gradients
  • loss.backward(): Calculate gradients
  • optimizer.step(): Update weights
  • model.train() / model.eval(): Switch modes
  • torch.no_grad(): Disable gradient tracking during evaluation
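The reason optimizer.zero_grad() is essential: PyTorch accumulates gradients across backward() calls rather than overwriting them. A tiny demonstration:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# First backward: d(2x)/dx = 2
(2 * x).backward()
print(x.grad)  # tensor(2.)

# Second backward WITHOUT clearing: gradients accumulate (2 + 2 = 4)
(2 * x).backward()
print(x.grad)  # tensor(4.)

# Clearing first (this is what optimizer.zero_grad() does for every parameter)
x.grad.zero_()
(2 * x).backward()
print(x.grad)  # tensor(2.)
```

Forgetting this step is one of the most common PyTorch bugs: the model "trains" on stale gradients from previous batches.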

🎓 Transfer Learning

Why train from scratch when you can use a model that's already learned from millions of images? Transfer learning lets you use pre-trained models and adapt them to your specific task - it's like hiring an expert instead of training a beginner!

The Concept

A model trained on ImageNet (about 1.2 million images, 1000 classes) has learned to recognize edges, textures, shapes, and objects. You can reuse this knowledge for your task!

Pre-trained Model (ImageNet) → Your Task (Cats vs Dogs)

Benefits:

• Train faster (hours instead of days)
• Need less data (hundreds instead of thousands of examples)
• Better accuracy (you start from expert knowledge)

Using Pre-trained Models

# Import torchvision models
import torchvision.models as models
import torch.nn as nn
import torch.optim as optim

# Load a pre-trained ResNet
# (on torchvision >= 0.13, weights= replaces the old pretrained=True argument)
model = models.resnet18(weights="DEFAULT")
print(model)  # see the architecture

# Freeze all layers (don't train them)
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for your task
num_classes = 2  # cats vs. dogs
model.fc = nn.Linear(model.fc.in_features, num_classes)  # ResNet18: 512 input features

# Now train only the final layer
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)

# Train as usual...

🎯 Two Strategies:

  • Feature Extraction: Freeze all layers, train only final layer
    Use when: Small dataset, similar to ImageNet
  • Fine-tuning: Unfreeze some layers, train them slowly
    Use when: Larger dataset, different from ImageNet

πŸ† Popular Pre-trained Models

PyTorch provides many pre-trained models through torchvision. Here are the most popular ones:

ResNet (Residual Network)

Uses "skip connections" to train very deep networks (up to 152 layers!). Winner of ImageNet 2015.

models.resnet18(weights="DEFAULT")
models.resnet50(weights="DEFAULT")

✅ Good balance of speed and accuracy

VGG (Visual Geometry Group)

Simple architecture with many layers. Easy to understand but slower than ResNet.

models.vgg16(weights="DEFAULT")
models.vgg19(weights="DEFAULT")

✅ Good for learning, simple architecture

MobileNet

Designed for mobile devices. Small, fast, but slightly less accurate.

models.mobilenet_v2(weights="DEFAULT")

✅ Perfect for mobile apps and edge devices

EfficientNet

State-of-the-art efficiency. Best accuracy for the size.

models.efficientnet_b0(weights="DEFAULT")

✅ Best for production (accuracy + speed)

⚡ Model Optimization

Make your models faster and smaller without losing much accuracy. Essential for production deployment!

Mixed Precision Training

Use 16-bit floats instead of 32-bit where it's safe. On modern GPUs this often trains 2-3x faster with comparable accuracy.

from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
with autocast():                # run the forward pass in mixed precision
    outputs = model(inputs)
    loss = criterion(outputs, labels)
scaler.scale(loss).backward()   # scale the loss to avoid fp16 underflow
scaler.step(optimizer)          # unscale gradients and update weights
scaler.update()

Model Quantization

Convert to 8-bit integers. 4x smaller, 2-4x faster, minimal accuracy loss.

quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

Model Pruning

Remove unnecessary weights. Can reduce size by 50-90%!

import torch.nn.utils.prune as prune

# prune the 30% smallest-magnitude weights of a layer (e.g. module = model.fc1)
prune.l1_unstructured(module, name='weight', amount=0.3)

Knowledge Distillation

Train a small model to mimic a large one. The student can often retain most of the teacher's accuracy at a fraction of its size.

Teacher (large model) → Student (small model)
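The teacher → student idea can be sketched as a loss function: the student is trained to match both the true labels and the teacher's softened output distribution. A minimal sketch of the standard distillation loss; the temperature `T` and mixing weight `alpha` are hyperparameters you would tune:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hard loss: student vs. the true labels
    hard = F.cross_entropy(student_logits, labels)
    # Soft loss: student vs. the teacher's temperature-softened predictions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients after dividing logits by T
    return alpha * hard + (1 - alpha) * soft

# Usage sketch: the teacher runs under no_grad; only the student is trained
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```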

🎯 Complete PyTorch Workflow

Let's put it all together with a complete image classification example using transfer learning.

# Complete workflow: image classification
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader

# 1. Data preparation
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # ImageNet stats
])

train_dataset = datasets.ImageFolder('data/train', transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# 2. Load a pre-trained model
model = models.resnet18(weights="DEFAULT")
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 2)  # 2 classes

# 3. Set up training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# 4. Training loop
for epoch in range(10):
    model.train()
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

# 5. Save the model
torch.save(model.state_dict(), 'model.pth')

# 6. Load and use the model
model.load_state_dict(torch.load('model.pth'))
model.eval()
with torch.no_grad():
    prediction = model(test_image)  # test_image: a preprocessed [1, 3, 224, 224] tensor


🎯 What's Next?

You now understand PyTorch and can build flexible deep learning models! In the next module, we'll explore Large Language Models - the technology behind ChatGPT, Claude, and other AI assistants.