Learn PyTorch, the flexible deep learning framework preferred by researchers
PyTorch is a deep learning framework, created at Facebook (now Meta), that feels like writing regular Python code. It's loved by researchers for its flexibility and by companies for its production capabilities. Think of it as the "Pythonic" way to do deep learning!
PyTorch powers many AI breakthroughs including GPT models, DALL-E, and Tesla's self-driving system!
Both are excellent! The choice often comes down to personal preference and use case. Let's compare:
| Feature | PyTorch | TensorFlow |
|---|---|---|
| Learning Curve | Easier (Pythonic) | Steeper (but Keras helps) |
| Debugging | Easy (standard Python) | Harder (graph-based) |
| Flexibility | Very flexible | More structured |
| Research | Preferred by researchers | Also popular |
| Production | Good (TorchServe) | Excellent (TF Serving) |
| Mobile | PyTorch Mobile | TensorFlow Lite (better) |
| Community | Growing fast | Larger, more mature |
Tensors are PyTorch's version of NumPy arrays, but with superpowers! They can run on GPUs for massive speed boosts and automatically calculate gradients for training neural networks.
A tensor is a multi-dimensional array. 1D tensor = vector, 2D tensor = matrix, 3D+ tensor = higher dimensions.
```shell
# Install PyTorch
pip install torch torchvision
```

```python
# Import PyTorch
import torch
import torch.nn as nn

# Create tensors
x = torch.tensor([1, 2, 3])         # 1D tensor
y = torch.tensor([[1, 2], [3, 4]])  # 2D tensor
z = torch.zeros(3, 3)               # 3x3 zeros
r = torch.randn(2, 3)               # Random normal

# Tensor operations
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
print(a + b)            # Element-wise addition
print(a * b)            # Element-wise multiplication
print(torch.dot(a, b))  # Dot product

# Move to GPU (if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x_gpu = x.to(device)    # Runs on the GPU when one is available
```
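The automatic-gradient "superpower" mentioned above takes only a few lines to see in action. A minimal autograd sketch using nothing beyond core `torch`:

```python
import torch

# Track operations on x so PyTorch can differentiate through them
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x   # y = x^2 + 3x

y.backward()         # compute dy/dx
print(x.grad)        # dy/dx = 2x + 3 = 7 at x = 2
```

This is exactly the machinery that `loss.backward()` uses during training, just applied to a scalar function instead of a network.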
In PyTorch, you build models by creating a class that inherits from nn.Module. This gives you full control over your model's architecture - it's like building with LEGO blocks!
```python
# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        # Define layers
        self.fc1 = nn.Linear(784, 128)  # Input to hidden
        self.fc2 = nn.Linear(128, 64)   # Hidden layer
        self.fc3 = nn.Linear(64, 10)    # Hidden to output
        self.relu = nn.ReLU()

    def forward(self, x):
        # Define forward pass
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)  # No activation on output
        return x

# Create model instance
model = SimpleNN()
print(model)

# Make a prediction
sample_input = torch.randn(1, 784)  # Batch of 1
output = model(sample_input)
print(f"Output shape: {output.shape}")  # [1, 10]
```
Unlike TensorFlow/Keras where training is handled by model.fit(), PyTorch gives you full control with explicit training loops. This seems harder at first but gives you ultimate flexibility!
1. Forward pass: Get predictions
2. Calculate loss: How wrong are we?
3. Backward pass: Calculate gradients
4. Update weights: Improve the model
5. Repeat for all batches and epochs
```python
# Setup
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# (Assumes train_loader and test_loader are DataLoaders over your dataset)
model = SimpleNN()
criterion = nn.CrossEntropyLoss()  # Loss function
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()  # Set to training mode
    running_loss = 0.0
    for inputs, labels in train_loader:
        # 1. Zero gradients
        optimizer.zero_grad()
        # 2. Forward pass
        outputs = model(inputs)
        # 3. Calculate loss
        loss = criterion(outputs, labels)
        # 4. Backward pass
        loss.backward()
        # 5. Update weights
        optimizer.step()
        running_loss += loss.item()
    avg_loss = running_loss / len(train_loader)
    print(f"Epoch {epoch+1}, Loss: {avg_loss:.4f}")

# Evaluation
model.eval()  # Set to evaluation mode
correct = 0
total = 0
with torch.no_grad():  # Don't calculate gradients
    for inputs, labels in test_loader:
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy:.2f}%")
```
Why train from scratch when you can use a model that's already learned from millions of images? Transfer learning lets you use pre-trained models and adapt them to your specific task - it's like hiring an expert instead of training a beginner!
A model trained on ImageNet (1.4 million images, 1000 classes) has learned to recognize edges, textures, shapes, and objects. You can use this knowledge for your task!
Pre-trained Model (ImageNet) → Your Task (Cats vs Dogs)
Benefits:
- Train faster (hours instead of days)
- Need less data (hundreds instead of thousands of examples)
- Better accuracy (expert knowledge)
```python
# Import torchvision models
import torchvision.models as models
import torch.nn as nn
import torch.optim as optim

# Load pre-trained ResNet
# (older torchvision used models.resnet18(pretrained=True), now deprecated)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
print(model)  # See the architecture

# Freeze all layers (don't train them)
for param in model.parameters():
    param.requires_grad = False

# Replace final layer for your task
# ResNet18 has 512 features before the final layer
num_classes = 2  # Cats vs Dogs
model.fc = nn.Linear(512, num_classes)  # new layer is trainable by default

# Now train only the final layer
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)
# Train as usual...
```
PyTorch provides many pre-trained models through torchvision. Here are the most popular ones:
**ResNet**
Uses "skip connections" to train very deep networks (up to 152 layers!). Winner of the ImageNet competition in 2015.
✓ Good balance of speed and accuracy

**VGG**
Simple architecture with many layers. Easy to understand but slower than ResNet.
✓ Good for learning, simple architecture

**MobileNet**
Designed for mobile devices. Small, fast, but slightly less accurate.
✓ Perfect for mobile apps and edge devices

**EfficientNet**
State-of-the-art efficiency. Best accuracy for the size.
✓ Best for production (accuracy + speed)
Make your models faster and smaller without losing much accuracy. Essential for production deployment!
**Mixed Precision Training**
Use 16-bit floats instead of 32-bit. Trains 2-3x faster with the same accuracy!
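A minimal sketch of mixed precision with `torch.autocast` (shown here on CPU with bfloat16 so it runs anywhere; on a GPU you would use `device_type='cuda'` with float16, typically together with a gradient scaler):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                 # toy model for illustration
data = torch.randn(8, 16)
target = torch.randint(0, 4, (8,))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Eligible ops inside the autocast region run in lower precision
with torch.autocast(device_type='cpu', dtype=torch.bfloat16):
    loss = criterion(model(data), target)

loss.backward()                          # backward runs outside autocast
optimizer.step()
```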
**Model Quantization**
Convert to 8-bit integers. 4x smaller, 2-4x faster, minimal accuracy loss.
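The simplest variant, dynamic quantization, takes one call. A sketch using `torch.ao.quantization` (available in recent PyTorch versions; it quantizes the weights of the listed layer types to int8):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Dynamic quantization: weights stored as int8, activations quantized on the fly
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # same interface, smaller model
```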
**Model Pruning**
Remove unnecessary weights. Can reduce size by 50-90%!
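PyTorch ships pruning utilities in `torch.nn.utils.prune`. A small sketch that zeroes out half of a layer's weights by L1 magnitude:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(100, 50)

# Zero out the 50% of weights with the smallest L1 magnitude
prune.l1_unstructured(layer, name="weight", amount=0.5)

sparsity = (layer.weight == 0).float().mean().item()
print(f"Sparsity: {sparsity:.0%}")
```

Note that this masks weights rather than shrinking the tensors; the size savings come when the sparse weights are stored or exported in a compressed form.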
**Knowledge Distillation**
Train a small model to mimic a large model. Get ~90% of the accuracy at ~10% of the size!
Teacher (large model) → Student (small model)
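The teacher-to-student objective is usually a temperature-softened KL divergence between the two output distributions. A minimal sketch (the teacher and student here are toy stand-in networks, not real trained models):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(20, 10)   # stand-in for a large pre-trained model
student = nn.Linear(20, 10)   # small model we actually want to deploy

x = torch.randn(32, 20)
T = 4.0                       # temperature softens the teacher's distribution

with torch.no_grad():         # the teacher is frozen
    teacher_logits = teacher(x)
student_logits = student(x)

# KL divergence between softened distributions, scaled by T^2 (standard practice)
distill_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=1),
    F.softmax(teacher_logits / T, dim=1),
    reduction="batchmean",
) * (T * T)
distill_loss.backward()       # gradients flow only into the student
```

In practice this term is combined with the ordinary cross-entropy loss on the true labels.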
Let's put it all together with a complete image classification example using transfer learning.
```python
# Complete workflow: Image classification
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader

# 1. Data preparation
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

train_dataset = datasets.ImageFolder('data/train', transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# 2. Load pre-trained model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 2)  # 2 classes

# 3. Setup training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# 4. Training loop
for epoch in range(10):
    model.train()
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

# 5. Save model
torch.save(model.state_dict(), 'model.pth')

# 6. Load and use model
model.load_state_dict(torch.load('model.pth', map_location=device))
model.eval()
with torch.no_grad():
    # test_image: a preprocessed [1, 3, 224, 224] tensor
    prediction = model(test_image.to(device))
```
You now understand PyTorch and can build flexible deep learning models! In the next module, we'll explore Large Language Models - the technology behind ChatGPT, Claude, and other AI assistants.