Master LLMs, transformers, prompt engineering, and build AI-powered applications
Imagine having a super-smart assistant that has read almost everything on the internet - books, articles, code, conversations. It can write essays, answer questions, generate code, translate languages, and even hold a conversation. That's a Large Language Model (LLM)!
A Large Language Model is an AI trained on massive amounts of text (billions of words) to understand and generate human-like text. It predicts what words should come next based on patterns it learned from training data.
Think of it like super-advanced autocomplete:
You type: "The capital of France is..."
LLM predicts: "Paris"
But it can do much more complex tasks!
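The "super-advanced autocomplete" idea can be sketched with a toy model - nothing like a real LLM, but it shows the core mechanic of predicting the next word from patterns seen in training text:

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows each word in a tiny corpus.
# Real LLMs learn far richer patterns over billions of words, but the core
# idea - predict the next token from what came before - is the same.
corpus = "the capital of france is paris . the capital of italy is rome .".split()

next_word = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word[current][following] += 1

def predict(word):
    """Return the word most often seen after `word` in the corpus."""
    return next_word[word].most_common(1)[0][0]

print(predict("capital"))  # "of"
print(predict("is"))       # "paris" ("paris" and "rome" tie; first seen wins)
```

A real LLM replaces these frequency counts with billions of learned parameters and conditions on the entire preceding context, not just one word.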
Massive Data
Trained on billions of web pages, books, and documents
Huge Parameters
GPT-3 has 175 billion parameters (learned values)
Powerful Compute
Requires thousands of GPUs and months to train
Transformers are the architecture behind modern LLMs. Think of them as a super-efficient way to understand context and relationships between words, no matter how far apart they are in a sentence.
The key innovation is "attention" - the model learns which words are important for understanding other words. It's like highlighting the most relevant parts of a text.
Example sentence: "The cat sat on the mat because it was tired."
Question: What does "it" refer to?
Attention mechanism looks back and focuses on "cat" (not "mat")
It understands context and relationships!
Break text into pieces (tokens) - words or subwords
"Hello world" → ["Hello", " world"]
Convert tokens to numbers (vectors) that capture meaning
"cat" and "kitten" have similar vectors
Each word looks at all other words to understand context
Determines which words are relevant to each other
Process the attended information through neural networks
Transforms and refines the understanding
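The steps above can be shown in miniature. The attention step - each word scoring every other word, then blending their vectors - is just a softmax over dot products. This toy version uses made-up 2-number "embeddings" (invented for illustration, not learned values):

```python
import math

# Toy 2-dimensional "embeddings" - invented for illustration, not learned
embeddings = {
    "the": [0.1, 0.2],
    "cat": [0.9, 0.1],
    "sat": [0.3, 0.8],
}
tokens = ["the", "cat", "sat"]   # Step 1: text already split into tokens
vectors = [embeddings[t] for t in tokens]  # Step 2: tokens -> vectors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Step 3: self-attention - each word scores every word (including itself),
# then takes a weighted average of all vectors using those scores.
attended = []
for query in vectors:
    weights = softmax([dot(query, key) for key in vectors])
    attended.append([
        sum(w * v[i] for w, v in zip(weights, vectors))
        for i in range(len(query))
    ])

# Step 4 (not shown) would pass `attended` through a feed-forward network.
print(attended[1])  # "cat", now blended with context from "the" and "sat"
```

Real transformers use learned projections (queries, keys, values), many attention heads, and hundreds of dimensions, but the mechanism is this same score-and-blend operation.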
Different LLMs are designed for different tasks. Let's understand the major players and when to use each.
Created by OpenAI. Best for generating text, conversations, and creative tasks. Predicts the next word based on previous words (left-to-right).
Best For:
Versions:
Created by Google. Best for understanding text. Looks at words from both directions (bidirectional) to understand context better.
Best For:
Key Difference:
BERT understands text (encoder), GPT generates text (decoder)
Focused on being helpful, harmless, and honest. Great for long documents and detailed analysis. Similar to GPT, but trained with a different approach (Anthropic's Constitutional AI).
Best For:
Strengths:
Instead of training your own LLM (which costs millions), you can use APIs to access powerful models. Let's build a simple chatbot using OpenAI's API!
# Install the OpenAI library
pip install openai
# Get your API key from platform.openai.com
# Set it as an environment variable
export OPENAI_API_KEY='your-api-key-here'
# Import the library
from openai import OpenAI
import os
# Initialize the client
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY")
)

# Simple chat completion
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or "gpt-4"
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,  # Creativity (0-2)
    max_tokens=150    # Response length
)

# Get the response
answer = response.choices[0].message.content
print(answer)
# Chatbot with conversation history
def chatbot():
    messages = [
        {"role": "system", "content": "You are a friendly AI assistant."}
    ]
    print("Chatbot started! Type 'quit' to exit.")

    while True:
        # Get user input
        user_input = input("You: ")
        if user_input.lower() == 'quit':
            break

        # Add user message to history
        messages.append({"role": "user", "content": user_input})

        # Get AI response
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages
        )
        assistant_message = response.choices[0].message.content
        messages.append({"role": "assistant", "content": assistant_message})
        print(f"AI: {assistant_message}")

# Run the chatbot
chatbot()
Prompt engineering is the art of writing instructions that get the best results from LLMs. The same question asked differently can give vastly different answers!
A prompt is the input you give to an LLM. It can be a question, instruction, or context. Good prompts are clear, specific, and provide necessary context.
❌ Vague Prompt:
"Tell me about dogs."
Too broad, unclear what you want
✅ Specific Prompt:
"List 5 dog breeds suitable for apartment living, with brief descriptions."
Clear, specific, actionable
✅ Good Prompt:
"You are an experienced Python developer. Explain list comprehensions to a beginner who knows basic Python syntax. Use simple examples."
Sets role, audience, and style
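A good-prompt recipe like this can be captured in a small helper that fills in the role, audience, and style every time (a hypothetical utility, not part of any library):

```python
def build_prompt(role, audience, task, style="Use simple examples."):
    """Assemble a prompt that sets role, audience, task, and style explicitly."""
    return (
        f"You are {role}. "
        f"Explain {task} to {audience}. "
        f"{style}"
    )

prompt = build_prompt(
    role="an experienced Python developer",
    audience="a beginner who knows basic Python syntax",
    task="list comprehensions",
)
print(prompt)
```

Templating prompts this way keeps the structure consistent and makes it easy to swap the role or audience without rewriting the whole prompt.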
Prompt:
"Convert these sentences to questions:
Sentence: The sky is blue.
Question: What color is the sky?
Sentence: She lives in Paris.
Question: Where does she live?
Sentence: The meeting starts at 3pm.
Question:"
AI completes: "What time does the meeting start?"
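Few-shot prompts like the one above can be assembled programmatically from example pairs - a sketch of building the prompt string (the helper name is made up for illustration):

```python
def few_shot_prompt(instruction, examples, new_input):
    """Build a few-shot prompt: instruction, worked examples, then the new case."""
    parts = [instruction, ""]
    for sentence, question in examples:
        parts.append(f"Sentence: {sentence}")
        parts.append(f"Question: {question}")
    parts.append(f"Sentence: {new_input}")
    parts.append("Question:")  # leave the answer for the model to complete
    return "\n".join(parts)

prompt = few_shot_prompt(
    "Convert these sentences to questions:",
    [
        ("The sky is blue.", "What color is the sky?"),
        ("She lives in Paris.", "Where does she live?"),
    ],
    "The meeting starts at 3pm.",
)
print(prompt)
```

The assembled string can then be sent as a single user message to any chat API.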
✅ Better Results:
"Solve this problem step by step: If a train travels 60 mph for 2.5 hours, how far does it go? Show your work."
Asking for steps improves accuracy
"List the top 3 programming languages for web development.
Format your response as:
1. [Language]: [Brief description]
2. [Language]: [Brief description]
3. [Language]: [Brief description]"
LLMs don't know about your company's data or recent events. RAG solves this by retrieving relevant information from your documents and feeding it to the LLM along with the question!
1. Index Your Documents
Split documents into chunks, convert to embeddings, store in vector database
2. User Asks Question
"What is our company's vacation policy?"
3. Retrieve Relevant Chunks
Search vector database for most similar content
4. Augment Prompt
Combine question + retrieved context
5. Generate Answer
LLM answers based on provided context
# Install required libraries
pip install openai chromadb
# Import libraries
from openai import OpenAI
import chromadb

# Sample documents
documents = [
    "Our company offers 20 days of vacation per year.",
    "Employees can work remotely 3 days per week.",
    "Health insurance covers dental and vision."
]

# Create vector database (named to avoid clashing with the OpenAI client)
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("company_docs")

# Add documents
collection.add(
    documents=documents,
    ids=["doc1", "doc2", "doc3"]
)

# Query function
def ask_question(question):
    # Retrieve relevant documents
    results = collection.query(
        query_texts=[question],
        n_results=2  # Top 2 relevant docs
    )

    # Get context
    context = "\n".join(results['documents'][0])

    # Create prompt with context
    prompt = f"""Answer the question based on this context:

Context: {context}

Question: {question}

Answer:"""

    # Get answer from LLM
    openai_client = OpenAI()
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Test it
answer = ask_question("How many vacation days do we get?")
print(answer)
LangChain is a framework that makes it easy to build applications with LLMs. It provides tools for chaining prompts, managing memory, connecting to data sources, and more!
LangChain is like a toolkit for LLM applications. Instead of writing everything from scratch, you use pre-built components for common tasks like prompts, chains, agents, and memory.
Combine multiple LLM calls in sequence. Output of one becomes input of next.
LLM decides which tools to use and in what order to accomplish a task.
Remember previous conversations and context across interactions.
Give LLM access to external functions like search, calculators, APIs.
# Install LangChain
pip install langchain langchain-openai
# Basic chain example
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

# Create prompt template
prompt = ChatPromptTemplate.from_template(
    "Write a {length} poem about {topic}."
)

# Create chain (the | operator pipes the prompt's output into the LLM)
chain = prompt | llm

# Run chain
result = chain.invoke({
    "length": "short",
    "topic": "artificial intelligence"
})
print(result.content)
# Chatbot that remembers conversation
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
# Create memory
memory = ConversationBufferMemory()
# Create conversation chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # See what's happening
)
# Have a conversation
conversation.predict(input="Hi, my name is Alice.")
conversation.predict(input="What's my name?")
# It remembers: "Your name is Alice!"
LLM providers charge by the token (pieces of words). Understanding tokens and costs is crucial for building cost-effective applications!
Tokens are pieces of words. As a rule of thumb, 1 token ≈ 0.75 words (about 4 characters of English). Both input and output tokens count toward your bill!
Examples (approximate - exact counts depend on the model's tokenizer):
"Hello world" ≈ 2 tokens
"ChatGPT is amazing!" ≈ 5 tokens
"artificial intelligence" ≈ 4 tokens
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| GPT-3.5 Turbo | $0.50 | $1.50 | 16K tokens |
| GPT-4 | $30.00 | $60.00 | 8K tokens |
| GPT-4 Turbo | $10.00 | $30.00 | 128K tokens |
| Claude 3 Sonnet | $3.00 | $15.00 | 200K tokens |
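The table translates into a simple calculation: cost = (tokens / 1M) × price per 1M, summed over input and output. A small sketch using the prices listed above (prices change often - check current pricing before relying on these numbers):

```python
# Prices per 1M tokens, taken from the table above (subject to change)
PRICING = {
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
    "gpt-4": {"input": 30.00, "output": 60.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
}

def estimate_cost(model, input_tokens, output_tokens):
    """Dollar cost: (tokens * price per 1M tokens) / 1M, input plus output."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# 1,000 requests, each ~500 input and ~200 output tokens
cost = estimate_cost("gpt-3.5-turbo", 1000 * 500, 1000 * 200)
print(f"${cost:.2f}")  # $0.55
```

The same workload on GPT-4 would cost roughly 40x more - which is why many applications route easy requests to a cheap model and reserve the expensive one for hard cases.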
You now understand LLMs, transformers, and how to build AI applications! Next, we'll explore Natural Language Processing (NLP) - the techniques that power text understanding and generation.