Deploy AI models to production with APIs, Docker, cloud platforms, and MLOps
You've trained an amazing AI model on your laptop - great! But how do users access it? Deployment is the process of making your model available to the world through APIs, web apps, or mobile apps.
Model Deployment means taking your trained ML model and making it accessible to users or other applications. It's like opening a restaurant after perfecting your recipes!
The Journey:
1. Train model on your computer
2. Save the model
3. Create an API to serve predictions
4. Deploy to a server/cloud
5. Users can now use your model!
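Step 2 above, saving the model, can be sketched like this. The example uses Python's built-in pickle (joblib is a common alternative for scikit-learn models), and the `ThresholdModel` class is purely illustrative, standing in for any real trained model:

```python
import pickle

# Stand-in for any trained model object (a real project would use
# scikit-learn, PyTorch, etc. -- this class is purely illustrative)
class ThresholdModel:
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, xs):
        return [1 if x >= self.threshold else 0 for x in xs]

model = ThresholdModel(threshold=0.5)

# Step 2: serialize the trained model to disk
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (e.g. at API startup), load it back
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict([0.2, 0.9]))  # [0, 1]
```

The saved file is exactly what the API server loads once at startup, so the expensive training step never happens inside a request.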
REST API: most common; HTTP endpoints for predictions
Batch Processing: process large datasets offline
Edge Deployment: run on devices (phones, IoT)
Streaming: real-time predictions on data streams
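Of these patterns, batch processing is the simplest to sketch: load the model once, then score records in fixed-size chunks instead of one HTTP request at a time. In this minimal illustration the `score` function is a placeholder for any real model's predict call:

```python
def score(record):
    # Placeholder for model.predict on a single record
    return record["amount"] * 2

def batch_predict(records, chunk_size=2):
    """Score records in fixed-size chunks, as an offline batch job would."""
    results = []
    for i in range(0, len(records), chunk_size):
        chunk = records[i:i + chunk_size]
        results.extend(score(r) for r in chunk)
    return results

records = [{"amount": a} for a in (1, 2, 3, 4, 5)]
print(batch_predict(records))  # [2, 4, 6, 8, 10]
```

In a real pipeline each chunk would typically be read from and written back to storage (files, a database, or a queue) so the job can resume after failures.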
FastAPI is the modern, fast way to build APIs in Python. It's perfect for serving ML models - automatic documentation, type checking, and blazing fast performance!
# Install FastAPI and dependencies
pip install fastapi uvicorn python-multipart
# main.py - Create ML API
from fastapi import FastAPI, File, UploadFile
from pydantic import BaseModel
import joblib
import numpy as np
from PIL import Image
import io

# Initialize FastAPI app
app = FastAPI(title="ML Model API")

# Load trained model at startup
model = joblib.load("model.pkl")

# Define request/response models
class PredictionInput(BaseModel):
    feature1: float
    feature2: float
    feature3: float

class PredictionOutput(BaseModel):
    prediction: float
    confidence: float

# Health check endpoint
@app.get("/")
def root():
    return {"status": "healthy", "model": "loaded"}

# Prediction endpoint
@app.post("/predict", response_model=PredictionOutput)
def predict(data: PredictionInput):
    # Prepare input
    features = np.array([[data.feature1, data.feature2, data.feature3]])
    # Make prediction
    prediction = model.predict(features)[0]
    confidence = model.predict_proba(features).max()
    return {
        "prediction": float(prediction),
        "confidence": float(confidence)
    }

# Image classification endpoint
@app.post("/predict-image")
async def predict_image(file: UploadFile = File(...)):
    # Read and preprocess image
    contents = await file.read()
    image = Image.open(io.BytesIO(contents))
    image = image.resize((224, 224))
    img_array = np.array(image) / 255.0
    # Predict
    prediction = model.predict(np.expand_dims(img_array, axis=0))
    return {"class": int(prediction[0])}

# Run the server
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
# Start the server
uvicorn main:app --reload
# Visit http://localhost:8000/docs for auto-generated docs!
# Test with curl
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"feature1": 1.5, "feature2": 2.3, "feature3": 0.8}'
# Test with Python requests
import requests
response = requests.post(
    "http://localhost:8000/predict",
    json={"feature1": 1.5, "feature2": 2.3, "feature3": 0.8}
)
print(response.json())
Docker packages your model, code, and dependencies into a container that runs anywhere. It's like shipping your entire development environment with your model!
"It works on my machine" is no longer an excuse! Docker ensures your model runs the same way everywhere - your laptop, colleague's computer, or production server.
# Dockerfile
# Use official Python image
FROM python:3.9-slim
# Set working directory
WORKDIR /app
# Copy requirements first (for caching)
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port
EXPOSE 8000
# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
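The Dockerfile above expects a requirements.txt next to main.py. A plausible one for this API is sketched below (left unpinned here for brevity; in a real project, pin the exact versions you tested with):

```text
fastapi
uvicorn[standard]
python-multipart
joblib
numpy
scikit-learn
pillow
```

Copying requirements.txt before the rest of the code (as the Dockerfile does) lets Docker cache the dependency layer, so rebuilds after a code change skip the slow pip install step.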
# Build Docker image
docker build -t ml-api:latest .
# Run container
docker run -p 8000:8000 ml-api:latest
# Run in background
docker run -d -p 8000:8000 --name ml-api ml-api:latest
# View logs
docker logs ml-api
# Stop container
docker stop ml-api
# docker-compose.yml
version: '3.8'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/models/model.pkl
    volumes:
      - ./models:/models
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
# Start all services
docker-compose up -d
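The MODEL_PATH environment variable set in the compose file lets the same image run locally and in production. One way main.py might pick it up (a sketch, assuming a local model.pkl as the development fallback):

```python
import os

def resolve_model_path(default="model.pkl"):
    """Prefer the MODEL_PATH env var (as set in docker-compose.yml),
    falling back to a local file for development."""
    return os.environ.get("MODEL_PATH", default)

# Simulate the containerized setting
os.environ["MODEL_PATH"] = "/models/model.pkl"
print(resolve_model_path())  # /models/model.pkl
```

Keeping configuration in environment variables means the image itself never changes between environments; only the values injected at runtime do.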
Cloud platforms provide scalable infrastructure for ML models. Deploy once, handle millions of requests!
Amazon SageMaker is a complete ML platform to train, deploy, and manage models at scale.
# Deploy with SageMaker SDK
import sagemaker
from sagemaker.sklearn import SKLearnModel

model = SKLearnModel(
    model_data='s3://bucket/model.tar.gz',
    role=role,
    entry_point='inference.py',
    framework_version='1.0-1'
)

predictor = model.deploy(
    instance_type='ml.t2.medium',
    initial_instance_count=1
)
Azure Machine Learning is an enterprise-grade ML platform with strong integration into the Microsoft ecosystem.
Google Cloud's ML platform (Vertex AI) is tightly integrated with TensorFlow and well suited to deep learning and large-scale deployments.
MLOps (Machine Learning Operations) is DevOps for ML. It's about automating and monitoring the entire ML lifecycle - from training to deployment to monitoring.
1. Data Collection → Gather and version data
2. Data Validation → Check data quality
3. Model Training → Train and experiment
4. Model Validation → Test performance
5. Model Deployment → Push to production
6. Monitoring → Track performance
7. Retraining → Update with new data
Model Versioning: track different versions of models like code (Git for models)
Experiment Tracking: log hyperparameters, metrics, and artifacts (MLflow, Weights & Biases)
CI/CD for ML: automate testing and deployment of models
Monitoring: track accuracy, latency, and data drift in production
A/B Testing: compare model versions with real users
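The A/B testing idea above is often implemented as a deterministic traffic split: hash a stable user ID so the same user always sees the same model variant, with no assignment state to store. A minimal sketch (the 10% treatment size is an arbitrary example):

```python
import hashlib

def assign_variant(user_id, treatment_pct=10):
    """Deterministically route a user to 'B' (candidate model)
    or 'A' (current model) based on a hash of their ID."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # uniform bucket in 0..99
    return "B" if bucket < treatment_pct else "A"

# The same user is always routed the same way
assert assign_variant("user-42") == assign_variant("user-42")

# Over many users the split approaches the configured percentage
counts = {"A": 0, "B": 0}
for i in range(1000):
    counts[assign_variant(f"user-{i}")] += 1
print(counts)  # roughly 900 A / 100 B
```

In the API, the variant would pick which loaded model serves the request, and the variant label would be logged alongside outcomes so the two models can be compared on real traffic.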
# Add monitoring to FastAPI
from prometheus_client import Counter, Histogram
import time

# Define metrics
prediction_counter = Counter('predictions_total', 'Total predictions')
prediction_latency = Histogram('prediction_latency_seconds', 'Prediction latency')

@app.post("/predict")
def predict(data: PredictionInput):
    start_time = time.time()
    # Make prediction
    result = model.predict(...)
    # Record metrics
    prediction_counter.inc()
    prediction_latency.observe(time.time() - start_time)
    return result
You've completed the AI & Machine Learning course! You now have the skills to build, train, and deploy AI models. From Python basics to production deployment, you're ready to create real-world AI applications!