Module 8: AI Integration & Deployment

Deploy AI models to production with APIs, Docker, cloud platforms, and MLOps

🚀 What is Model Deployment?

You've trained an amazing AI model on your laptop - great! But how do users access it? Deployment is the process of making your model available to the world through APIs, web apps, or mobile apps.

Simple Definition

Model Deployment means taking your trained ML model and making it accessible to users or other applications. It's like opening a restaurant after perfecting your recipes!

The Journey:

1. Train model on your computer

2. Save the model

3. Create an API to serve predictions

4. Deploy to a server/cloud

5. Users can now use your model!
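Steps 2 and 3 of the journey can be sketched in a few lines. This toy uses Python's built-in pickle module and a hypothetical ThresholdModel stand-in; a real project would typically save a fitted scikit-learn estimator with joblib, or use a framework-native format.

```python
import pickle

# A stand-in for a trained model; real projects would save a fitted
# scikit-learn estimator (usually with joblib) or a framework format.
class ThresholdModel:
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return 1 if x >= self.threshold else 0

model = ThresholdModel(threshold=0.5)

# Step 2: save the trained model to disk
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (e.g. inside the API process): load it back
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded.predict(0.8))  # → 1
```

The API server then loads the saved file once at startup (as the FastAPI example below does with joblib) instead of re-training on every request.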

🌟 Why Deployment Matters:

  • Accessibility: Users can access your model from anywhere
  • Scalability: Handle thousands of requests simultaneously
  • Monitoring: Track performance and errors in real-time
  • Updates: Improve models without disrupting users
  • Value: Turn research into real-world impact

Deployment Options

REST API

Most common - HTTP endpoints for predictions

Batch Processing

Process large datasets offline

Edge Deployment

Run on devices (phones, IoT)

Streaming

Real-time predictions on data streams
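Batch processing, for instance, can be as simple as looping over a dataset in fixed-size chunks. A minimal sketch, where predict_batch is a hypothetical stand-in for any model's batch scoring call:

```python
# Score a large dataset offline in chunks instead of one request at a
# time. predict_batch is a placeholder "model": label 1 if positive.

def predict_batch(rows):
    return [1 if r > 0 else 0 for r in rows]

def score_in_chunks(data, chunk_size=1000):
    results = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]        # slice one batch
        results.extend(predict_batch(chunk))  # score it
    return results

data = [0.3, -1.2, 2.5, -0.1, 0.0]
print(score_in_chunks(data, chunk_size=2))  # → [1, 0, 1, 0, 0]
```

Chunking keeps memory bounded and lets real batch jobs checkpoint progress between chunks.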

⚡ FastAPI for ML Models

FastAPI is the modern, fast way to build APIs in Python. It's perfect for serving ML models - automatic documentation, type checking, and blazing fast performance!

Complete ML API Example

# Install FastAPI and dependencies
pip install fastapi uvicorn python-multipart

# main.py - Create ML API
from fastapi import FastAPI, File, UploadFile
from pydantic import BaseModel
import joblib
import numpy as np
from PIL import Image
import io

# Initialize FastAPI app
app = FastAPI(title="ML Model API")

# Load trained model at startup
model = joblib.load("model.pkl")

# Define request/response models
class PredictionInput(BaseModel):
    feature1: float
    feature2: float
    feature3: float

class PredictionOutput(BaseModel):
    prediction: float
    confidence: float

# Health check endpoint
@app.get("/")
def root():
    return {"status": "healthy", "model": "loaded"}

# Prediction endpoint
@app.post("/predict", response_model=PredictionOutput)
def predict(data: PredictionInput):
    # Prepare input
    features = np.array([[data.feature1, data.feature2, data.feature3]])
    # Make prediction
    prediction = model.predict(features)[0]
    confidence = model.predict_proba(features).max()
    return {
        "prediction": float(prediction),
        "confidence": float(confidence),
    }

# Image classification endpoint
@app.post("/predict-image")
async def predict_image(file: UploadFile = File(...)):
    # Read and preprocess image
    contents = await file.read()
    image = Image.open(io.BytesIO(contents)).convert("RGB")  # ensure 3 channels
    image = image.resize((224, 224))
    img_array = np.array(image) / 255.0
    # Predict
    prediction = model.predict(np.expand_dims(img_array, axis=0))
    return {"class": int(prediction[0])}

# Run the server
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Testing the API

# Start the server
uvicorn main:app --reload

# Visit http://localhost:8000/docs for auto-generated docs!

# Test with curl
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"feature1": 1.5, "feature2": 2.3, "feature3": 0.8}'

# Test with Python requests
import requests

response = requests.post(
    "http://localhost:8000/predict",
    json={"feature1": 1.5, "feature2": 2.3, "feature3": 0.8},
)
print(response.json())

✨ FastAPI Benefits:

  • Automatic Docs: Interactive API documentation at /docs
  • Type Safety: Pydantic models catch errors early
  • Fast: Built on Starlette and Pydantic (very fast)
  • Async Support: Handle many requests concurrently
  • Easy Testing: Built-in test client

🐳 Docker for ML Models

Docker packages your model, code, and dependencies into a container that runs anywhere. It's like shipping your entire development environment with your model!

Why Docker?

"It works on my machine" is no longer an excuse! Docker ensures your model runs the same way everywhere - your laptop, colleague's computer, or production server.

Dockerfile for ML API

# Dockerfile

# Use official Python image
FROM python:3.9-slim

# Set working directory
WORKDIR /app

# Copy requirements first (for caching)
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8000

# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Docker Commands

# Build Docker image
docker build -t ml-api:latest .

# Run container
docker run -p 8000:8000 ml-api:latest

# Run in background
docker run -d -p 8000:8000 --name ml-api ml-api:latest

# View logs
docker logs ml-api

# Stop container
docker stop ml-api

Docker Compose (Multi-Container)

# docker-compose.yml
version: '3.8'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/models/model.pkl
    volumes:
      - ./models:/models

  redis:
    image: redis:alpine
    ports:
      - "6379:6379"

# Start all services
docker-compose up -d

🎯 Docker Best Practices:

  • Use slim/alpine base images to reduce size
  • Copy requirements.txt first for better caching
  • Don't include training data in image
  • Use .dockerignore to exclude unnecessary files
  • Tag images with versions (not just :latest)
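A typical .dockerignore for an ML project might look like this (the entries are illustrative; adjust them to your repo layout):

```text
# .dockerignore (illustrative)
__pycache__/
*.pyc
.git/
.env
# keep training data, notebooks, and tests out of the image
data/
notebooks/
tests/
```

Excluding data/ and .git/ alone often cuts image build context from gigabytes to megabytes.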

☁️ Cloud Deployment

Cloud platforms provide scalable infrastructure for ML models. Deploy once, handle millions of requests!

Major Cloud Platforms

AWS

AWS SageMaker

Complete ML platform - train, deploy, and manage models at scale.

# Deploy with SageMaker SDK
import sagemaker
from sagemaker.sklearn import SKLearnModel

role = sagemaker.get_execution_role()

model = SKLearnModel(
    model_data='s3://bucket/model.tar.gz',
    role=role,
    entry_point='inference.py',
    framework_version='1.0-1'
)

predictor = model.deploy(
    instance_type='ml.t2.medium',
    initial_instance_count=1
)

Pros:

  • Fully managed
  • Auto-scaling
  • Built-in monitoring

Cons:

  • Can be expensive
  • AWS-specific
  • Learning curve
Azure

Azure Machine Learning

Enterprise-grade ML platform with strong integration with Microsoft ecosystem.

Features:

  • AutoML capabilities
  • MLOps integration
  • Designer (no-code)
  • Strong security

Best For:

  • Enterprise deployments
  • Microsoft stack users
  • Compliance needs
GCP

Google Cloud Vertex AI (formerly AI Platform)

Integrated with TensorFlow, great for deep learning and large-scale deployments.

Strengths:

  • TensorFlow integration
  • TPU support
  • Vertex AI (unified)
  • Good pricing

Use Cases:

  • Deep learning
  • Computer vision
  • NLP applications

🔄 MLOps Pipeline

MLOps (Machine Learning Operations) is DevOps for ML. It's about automating and monitoring the entire ML lifecycle - from training to deployment to monitoring.

The MLOps Lifecycle

1. Data Collection → Gather and version data

2. Data Validation → Check data quality

3. Model Training → Train and experiment

4. Model Validation → Test performance

5. Model Deployment → Push to production

6. Monitoring → Track performance

7. Retraining → Update with new data

Key MLOps Concepts

Model Versioning

Track different versions of models like code (Git for models)

Experiment Tracking

Log hyperparameters, metrics, and artifacts (MLflow, Weights & Biases)

CI/CD for ML

Automate testing and deployment of models

Model Monitoring

Track accuracy, latency, data drift in production

A/B Testing

Compare model versions with real users
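A/B testing in practice often starts with deterministic traffic splitting. A minimal sketch, assuming an illustrative 10% candidate share and hypothetical model names, where hashing the user ID keeps each user's assignment stable across requests:

```python
import hashlib

# Deterministically route a fraction of users to a candidate model.
# The 10% split and the model names are illustrative assumptions.

def assign_model(user_id, candidate_share=0.10):
    # Stable hash so the same user always sees the same model
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "model_b" if bucket < candidate_share * 100 else "model_a"

print(assign_model("user-42"))  # always the same answer for this user

counts = {"model_a": 0, "model_b": 0}
for i in range(1000):
    counts[assign_model(f"user-{i}")] += 1
print(counts)  # roughly a 90/10 split across users
```

Because the split is deterministic, you can later compare metrics per model version without storing per-user assignments.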

Simple Monitoring Example

# Add monitoring to FastAPI
from prometheus_client import Counter, Histogram
import time

# Define metrics
prediction_counter = Counter('predictions_total', 'Total predictions')
prediction_latency = Histogram('prediction_latency_seconds', 'Prediction latency')

@app.post("/predict")
def predict(data: PredictionInput):
    start_time = time.time()
    # Make prediction (same feature handling as the earlier endpoint)
    features = np.array([[data.feature1, data.feature2, data.feature3]])
    result = model.predict(features)
    # Record metrics
    prediction_counter.inc()
    prediction_latency.observe(time.time() - start_time)
    return {"prediction": float(result[0])}

⚠️ Common Production Issues:

  • Data Drift: Input data changes over time
  • Model Decay: Accuracy decreases as world changes
  • Latency: Slow predictions hurt user experience
  • Resource Usage: Memory/CPU spikes
  • Errors: Edge cases not seen in training
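Data drift, the first issue above, can be caught with even a crude statistical check. A sketch using only the standard library; the 2-standard-deviation threshold is a common rule of thumb, not a universal constant:

```python
import statistics

# Toy data-drift check: compare the mean of live inputs against the
# training distribution for one feature.

def drifted(train_values, live_values, z_threshold=2.0):
    train_mean = statistics.mean(train_values)
    train_std = statistics.stdev(train_values)
    live_mean = statistics.mean(live_values)
    # How many training standard deviations has the live mean moved?
    z = abs(live_mean - train_mean) / train_std
    return z > z_threshold

train = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1]
print(drifted(train, [1.0, 1.1, 0.9]))  # → False: similar distribution
print(drifted(train, [3.0, 3.2, 2.9]))  # → True: inputs have shifted
```

Production systems run richer tests (per-feature statistics, KS tests, population stability index), but the idea is the same: alert when live inputs stop looking like training data.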

✨ Production Best Practices

Do's

  • Version everything: Code, data, models
  • Monitor actively: Set up alerts
  • Test thoroughly: Unit, integration, load tests
  • Use caching: Cache frequent predictions
  • Implement fallbacks: Handle model failures
  • Log predictions: For debugging and retraining
  • Gradual rollouts: Test with small traffic first
  • Document APIs: Clear documentation for users

Don'ts

  • Don't deploy untested: Always test first
  • Don't ignore monitoring: Catch issues early
  • Don't hardcode configs: Use environment variables
  • Don't skip validation: Validate inputs
  • Don't forget security: Authentication, rate limiting
  • Don't ignore costs: Monitor cloud spending
  • Don't deploy without rollback: Have a plan B
  • Don't forget documentation: Future you will thank you
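"Don't hardcode configs" in practice means reading settings from the environment with sensible defaults. MODEL_PATH and MAX_BATCH_SIZE are illustrative names (MODEL_PATH matches the docker-compose example above):

```python
import os

def load_config():
    # Environment variables win; the second argument is the default
    return {
        "model_path": os.environ.get("MODEL_PATH", "model.pkl"),
        "max_batch_size": int(os.environ.get("MAX_BATCH_SIZE", "32")),
    }

print(load_config())

# Override without touching code, e.g. via docker-compose `environment:`
os.environ["MAX_BATCH_SIZE"] = "64"
print(load_config()["max_batch_size"])  # → 64
```

The same code then runs unchanged on your laptop, in Docker, and in the cloud; only the environment differs.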


🎉 Congratulations!

You've completed the AI & Machine Learning course! You now have the skills to build, train, and deploy AI models. From Python basics to production deployment, you're ready to create real-world AI applications!

What You've Learned:

  • Python, NumPy, Pandas for data science
  • Machine learning fundamentals
  • Deep learning with TensorFlow/PyTorch
  • Large Language Models and prompt engineering
  • Natural Language Processing
  • Computer Vision and CNNs
  • Model deployment and MLOps
  • Production best practices