
Module 5: Large Language Models

Master LLMs, transformers, prompt engineering, and build AI-powered applications

🤖 What are Large Language Models?

Imagine having a super-smart assistant that has read almost everything on the internet - books, articles, code, conversations. It can write essays, answer questions, write code, translate languages, and even have conversations. That's a Large Language Model (LLM)!

Simple Definition

A Large Language Model is an AI trained on massive amounts of text (billions of words) to understand and generate human-like text. It predicts what words should come next based on patterns it learned from training data.

Think of it like super-advanced autocomplete:

You type: "The capital of France is..."

LLM predicts: "Paris"

But it can do much more complex tasks!
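The "super-advanced autocomplete" idea can be sketched with a toy next-word predictor. This is a hypothetical bigram model built from a tiny made-up corpus — nothing like a real LLM's neural network — but it shows the core mechanic: predict the next word from what came before, based on patterns in training data.

```python
from collections import Counter, defaultdict

# Toy training corpus (a real LLM is trained on billions of words)
corpus = "the capital of france is paris . the capital of italy is rome .".split()

# Count which word follows which (a "bigram" model)
following = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    following[prev_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in training."""
    return following[word].most_common(1)[0][0]

print(predict_next("capital"))  # -> "of" (followed "capital" twice)
```

A real LLM does the same thing in spirit, but scores every token in its vocabulary with a deep neural network instead of counting pairs.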

🌟 Real-World Examples:

  • ChatGPT: Conversational AI that answers questions and helps with tasks
  • GitHub Copilot: Writes code based on your comments
  • Grammarly: Improves your writing with AI suggestions
  • Google Translate: Translates between languages
  • Content Generation: Write articles, emails, marketing copy

Why "Large"?

Massive Data

Trained on billions of web pages, books, and documents

Huge Parameters

GPT-3 has 175 billion parameters (learned values)

Powerful Compute

Requires thousands of GPUs and months to train

🔄 How Transformers Work

Transformers are the architecture behind modern LLMs. Think of them as a super-efficient way to understand context and relationships between words, no matter how far apart they are in a sentence.

The Attention Mechanism

The key innovation is "attention" - the model learns which words are important for understanding other words. It's like highlighting the most relevant parts of a text.

Example sentence: "The cat sat on the mat because it was tired."

Question: What does "it" refer to?

Attention mechanism looks back and focuses on "cat" (not "mat")

It understands context and relationships!
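A rough sketch of that idea in plain Python, using made-up 3-dimensional word vectors (real models learn vectors with hundreds of dimensions and use many attention heads): each candidate word is scored against "it" with a dot product, and the scores are normalized with softmax into attention weights.

```python
import math

# Hypothetical word vectors, invented for illustration:
# "it" is made similar to "cat", not "mat"
vectors = {
    "cat": [0.9, 0.1, 0.2],
    "mat": [0.1, 0.8, 0.3],
    "it":  [0.8, 0.2, 0.1],
}

def attention_weights(query, keys):
    """Score `query` against each key (dot product), then softmax-normalize."""
    scores = [sum(q * k for q, k in zip(vectors[query], vectors[key]))
              for key in keys]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return {key: e / total for key, e in zip(keys, exps)}

weights = attention_weights("it", ["cat", "mat"])
print(weights)  # "it" attends more strongly to "cat" than to "mat"
```

The weights always sum to 1, so attention acts like a spotlight that distributes the model's focus across the other words.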

Transformer Components

1. Tokenization

Break text into pieces (tokens) - words or subwords

"Hello world" → ["Hello", " world"]

2. Embeddings

Convert tokens to numbers (vectors) that capture meaning

"cat" and "kitten" have similar vectors

3. Self-Attention

Each word looks at all other words to understand context

Determines which words are relevant to each other

4. Feed-Forward Networks

Process the attended information through neural networks

Transforms and refines the understanding
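The embedding step (component 2) can be illustrated with cosine similarity. The vectors below are invented for illustration, but the principle holds for real embeddings: words with related meanings get nearby vectors, and cosine similarity measures how nearby.

```python
import math

# Hypothetical embeddings (real ones have hundreds of dimensions)
embeddings = {
    "cat":    [0.8, 0.6, 0.1],
    "kitten": [0.7, 0.7, 0.2],
    "car":    [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """1.0 means same direction (very similar), near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))  # high
print(cosine_similarity(embeddings["cat"], embeddings["car"]))     # low
```

This same similarity measure is what vector databases use in the RAG section later: retrieval means finding the stored chunks whose embeddings are closest to the question's embedding.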

💡 Why Transformers Changed Everything:

  • Parallel Processing: Can process all words at once (not one-by-one)
  • Long-Range Dependencies: Understands relationships across entire documents
  • Transfer Learning: Pre-train once, fine-tune for many tasks
  • Scalability: Performance improves with more data and compute

🎯 Popular LLMs: GPT, BERT, Claude

Different LLMs are designed for different tasks. Let's understand the major players and when to use each.

GPT (Generative Pre-trained Transformer)

Created by OpenAI. Best for generating text, conversations, and creative tasks. Predicts the next word based on previous words (left-to-right).

Best For:

  • Text generation and completion
  • Conversations (ChatGPT)
  • Code generation
  • Creative writing
  • Question answering

Versions:

  • GPT-3.5: Fast, cost-effective
  • GPT-4: Most capable, slower
  • GPT-4 Turbo: Faster GPT-4

BERT (Bidirectional Encoder Representations)

Created by Google. Best for understanding text. Looks at words from both directions (bidirectional) to understand context better.

Best For:

  • Text classification
  • Sentiment analysis
  • Named entity recognition
  • Question answering
  • Search and retrieval

Key Difference:

BERT understands text (encoder), GPT generates text (decoder)

Claude (by Anthropic)

Focused on being helpful, harmless, and honest. Great for long documents and detailed analysis. Similar to GPT but with different training approach.

Best For:

  • Long document analysis
  • Detailed explanations
  • Code review and debugging
  • Research assistance
  • Safe, reliable outputs

Strengths:

  • 100K+ token context
  • Strong reasoning
  • Constitutional AI training

🔌 OpenAI API Integration

Instead of training your own LLM (which costs millions), you can use APIs to access powerful models. Let's build a simple chatbot using OpenAI's API!

Setup

# Install the OpenAI library
pip install openai

# Get your API key from platform.openai.com
# Set it as an environment variable
export OPENAI_API_KEY='your-api-key-here'

Complete Example: AI Chatbot

# Import the library
from openai import OpenAI
import os

# Initialize the client
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY")
)

# Simple chat completion
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or "gpt-4"
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,  # Creativity (0-2)
    max_tokens=150    # Response length
)

# Get the response
answer = response.choices[0].message.content
print(answer)

Interactive Chatbot

# Chatbot with conversation history
# (reuses the `client` initialized in the previous example)
def chatbot():
    messages = [
        {"role": "system", "content": "You are a friendly AI assistant."}
    ]
    print("Chatbot started! Type 'quit' to exit.")

    while True:
        # Get user input
        user_input = input("You: ")
        if user_input.lower() == 'quit':
            break

        # Add user message to history
        messages.append({"role": "user", "content": user_input})

        # Get AI response
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages
        )
        assistant_message = response.choices[0].message.content
        messages.append({"role": "assistant", "content": assistant_message})
        print(f"AI: {assistant_message}")

# Run the chatbot
chatbot()

⚠️ Important Parameters:

  • temperature (0-2): Higher = more creative/random, Lower = more focused/deterministic
  • max_tokens: Maximum length of response (1 token ≈ 0.75 words)
  • top_p (0-1): Alternative to temperature for controlling randomness
  • frequency_penalty: Reduce repetition of words
  • presence_penalty: Encourage new topics

✍️ Prompt Engineering

Prompt engineering is the art of writing instructions that get the best results from LLMs. The same question asked differently can give vastly different answers!

What is a Prompt?

A prompt is the input you give to an LLM. It can be a question, instruction, or context. Good prompts are clear, specific, and provide necessary context.

Prompt Engineering Techniques

1. Be Specific and Clear

❌ Vague Prompt:

"Tell me about dogs."

Too broad, unclear what you want

✅ Specific Prompt:

"List 5 dog breeds suitable for apartment living, with brief descriptions."

Clear, specific, actionable

2. Provide Context and Role

✅ Good Prompt:

"You are an experienced Python developer. Explain list comprehensions to a beginner who knows basic Python syntax. Use simple examples."

Sets role, audience, and style

3. Use Examples (Few-Shot Learning)

Prompt:

"Convert these sentences to questions:

Sentence: The sky is blue.

Question: What color is the sky?

Sentence: She lives in Paris.

Question: Where does she live?

Sentence: The meeting starts at 3pm.

Question:"

AI completes: "What time does the meeting start?"

4. Chain of Thought (Step-by-Step)

✅ Better Results:

"Solve this problem step by step: If a train travels 60 mph for 2.5 hours, how far does it go? Show your work."

Asking for steps improves accuracy

5. Specify Output Format

"List the top 3 programming languages for web development.

Format your response as:

1. [Language]: [Brief description]

2. [Language]: [Brief description]

3. [Language]: [Brief description]"

🎯 Pro Tips:

  • Iterate: Try different phrasings to find what works best
  • Use delimiters: Triple quotes or ### to separate sections
  • Set constraints: "In 100 words or less..."
  • Ask for verification: "Double-check your answer"
  • Use system messages for consistent behavior
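Several of these tips compose naturally in code. The sketch below (the role text and wording are just an illustration) wraps untrusted user text in ### delimiters and adds a word-count constraint:

```python
def build_summary_prompt(text, max_words=100):
    """Combine a role, delimiters, and a length constraint into one prompt."""
    return (
        "You are a concise technical writer.\n"
        f"Summarize the text between the ### delimiters in {max_words} words or less.\n"
        "###\n"
        f"{text}\n"
        "###"
    )

prompt = build_summary_prompt("Transformers process all tokens in parallel...")
print(prompt)
```

Delimiters also make prompt injection harder, because the model can be told to treat everything between the markers as data rather than instructions.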

🔍 RAG (Retrieval Augmented Generation)

LLMs don't know about your company's data or recent events. RAG solves this by retrieving relevant information from your documents and feeding it to the LLM along with the question!

How RAG Works

1. Index Your Documents

Split documents into chunks, convert to embeddings, store in vector database

2. User Asks Question

"What is our company's vacation policy?"

3. Retrieve Relevant Chunks

Search vector database for most similar content

4. Augment Prompt

Combine question + retrieved context

5. Generate Answer

LLM answers based on provided context

Simple RAG Example

# Install required libraries first:
#   pip install openai chromadb

# Import libraries
from openai import OpenAI
import chromadb

# Sample documents
documents = [
    "Our company offers 20 days of vacation per year.",
    "Employees can work remotely 3 days per week.",
    "Health insurance covers dental and vision."
]

# Create vector database
client = chromadb.Client()
collection = client.create_collection("company_docs")

# Add documents
collection.add(
    documents=documents,
    ids=["doc1", "doc2", "doc3"]
)

# Query function
def ask_question(question):
    # Retrieve relevant documents
    results = collection.query(
        query_texts=[question],
        n_results=2  # Top 2 relevant docs
    )

    # Get context
    context = "\n".join(results['documents'][0])

    # Create prompt with context
    prompt = f"""Answer the question based on this context:

Context: {context}

Question: {question}

Answer:"""

    # Get answer from LLM
    openai_client = OpenAI()
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Test it
answer = ask_question("How many vacation days do we get?")
print(answer)

🎯 RAG Use Cases:

  • Customer Support: Answer questions from documentation
  • Internal Knowledge Base: Company policies, procedures
  • Research Assistant: Query scientific papers
  • Code Documentation: Search and explain codebases
  • Legal/Compliance: Find relevant regulations

🔗 LangChain: Building AI Apps

LangChain is a framework that makes it easy to build applications with LLMs. It provides tools for chaining prompts, managing memory, connecting to data sources, and more!

What is LangChain?

LangChain is like a toolkit for LLM applications. Instead of writing everything from scratch, you use pre-built components for common tasks like prompts, chains, agents, and memory.

Key Concepts

Chains

Combine multiple LLM calls in sequence. Output of one becomes input of next.

Agents

LLM decides which tools to use and in what order to accomplish a task.

Memory

Remember previous conversations and context across interactions.

Tools

Give LLM access to external functions like search, calculators, APIs.

Simple LangChain Example

# Install LangChain first:
#   pip install langchain langchain-openai

# Basic chain example
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

# Create prompt template
prompt = ChatPromptTemplate.from_template(
    "Write a {length} poem about {topic}."
)

# Create chain (the | operator pipes the prompt into the LLM)
chain = prompt | llm

# Run chain
result = chain.invoke({
    "length": "short",
    "topic": "artificial intelligence"
})
print(result.content)

Chatbot with Memory

# Chatbot that remembers conversation
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

# Create memory
memory = ConversationBufferMemory()

# Create conversation chain (reuses the `llm` from the previous example)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # See what's happening
)

# Have a conversation
conversation.predict(input="Hi, my name is Alice.")
conversation.predict(input="What's my name?")
# It remembers: "Your name is Alice!"

🚀 What You Can Build:

  • Chatbots with memory and personality
  • Document Q&A systems (RAG)
  • Agents that use tools (search, calculator, APIs)
  • Multi-step reasoning systems
  • Code generation and analysis tools

💰 Token Limits and Costs

LLMs charge by tokens (pieces of words). Understanding tokens and costs is crucial for building cost-effective applications!

What are Tokens?

Tokens are pieces of words. 1 token ≈ 0.75 words (or 4 characters). Both input and output count!

Examples (exact counts vary by tokenizer):

"Hello world" = 2 tokens

"ChatGPT is amazing!" = 5 tokens

"artificial intelligence" = 4 tokens

Pricing (as of 2024)

Model           | Input (per 1M tokens) | Output (per 1M tokens) | Context Window
----------------|-----------------------|------------------------|---------------
GPT-3.5 Turbo   | $0.50                 | $1.50                  | 16K tokens
GPT-4           | $30.00                | $60.00                 | 8K tokens
GPT-4 Turbo     | $10.00                | $30.00                 | 128K tokens
Claude 3 Sonnet | $3.00                 | $15.00                 | 200K tokens
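Per-million-token prices translate into per-request costs with simple arithmetic. A sketch using the GPT-3.5 Turbo figures above as defaults:

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m=0.50, output_price_per_m=1.50):
    """Cost in dollars; defaults are the GPT-3.5 Turbo prices above."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# A chat request with a 500-token prompt and a 200-token reply
print(f"${request_cost(500, 200):.6f}")  # $0.000550
```

Individual requests are cheap, but a chatbot that resends the full conversation history on every turn multiplies the input-token cost quickly.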

💡 Cost Optimization Tips:

  • Use GPT-3.5 for simple tasks, GPT-4 for complex reasoning
  • Keep prompts concise - every word costs money
  • Cache responses for repeated questions
  • Use streaming for better UX without extra cost
  • Set max_tokens to limit response length
  • Monitor usage with OpenAI dashboard
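"Cache responses" can be as simple as a dict keyed on the prompt. A self-contained sketch where fake_llm is a hypothetical stand-in for a paid API call, so you can see how many calls the cache actually saves:

```python
cache = {}
api_calls = {"count": 0}

def fake_llm(prompt):
    """Stand-in for a paid LLM API call; counts real invocations."""
    api_calls["count"] += 1
    return f"answer to: {prompt}"

def cached_completion(prompt):
    """Only pay for a prompt the first time it is seen."""
    if prompt not in cache:
        cache[prompt] = fake_llm(prompt)
    return cache[prompt]

cached_completion("What is RAG?")
cached_completion("What is RAG?")  # second call is served from the cache
print(api_calls["count"])  # 1 -- only one paid call was made
```

A production cache would add expiry and normalize prompts (whitespace, casing) before using them as keys, but the principle is the same.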

✨ Best Practices for Using LLMs

Do's

  • Validate outputs: Always verify critical information
  • Use system messages: Set consistent behavior
  • Handle errors: Implement retry logic and fallbacks
  • Monitor costs: Track token usage
  • Test prompts: Iterate to find what works
  • Add context: Provide relevant information
  • Use streaming: Better UX for long responses
  • Implement rate limiting: Avoid API limits

Don'ts

  • Don't trust blindly: LLMs can hallucinate (make things up)
  • Don't expose API keys: Keep them secret and secure
  • Don't send sensitive data: Privacy concerns
  • Don't ignore token limits: Requests will fail
  • Don't use for critical decisions: Without human review
  • Don't forget error handling: APIs can fail
  • Don't over-engineer: Start simple
  • Don't ignore costs: Can add up quickly

🔒 Security & Privacy:

  • Never hardcode API keys - use environment variables
  • Don't send PII (personally identifiable information) to LLMs
  • Use OpenAI's data retention policies (opt-out of training)
  • Implement input validation to prevent prompt injection
  • Consider using Azure OpenAI for enterprise compliance


🎯 What's Next?

You now understand LLMs, transformers, and how to build AI applications! Next, we'll explore Natural Language Processing (NLP) - the techniques that power text understanding and generation.