Build neural networks and deep learning models with TensorFlow and Keras
Imagine your brain: billions of neurons connected together, passing signals to recognize patterns, make decisions, and learn from experience. Artificial Neural Networks work the same way - they're computer systems inspired by the human brain!
A Neural Network is a series of algorithms that tries to recognize patterns in data by mimicking how the human brain works. It consists of layers of interconnected "neurons" that process and transform information.
Structure:
Input Layer → Hidden Layers → Output Layer
Example: Image Recognition
Pixels → Pattern Detection → "It's a cat!"
"Deep" refers to having many hidden layers (sometimes hundreds!). More layers = ability to learn more complex patterns. Traditional ML uses 0-2 layers, deep learning uses 10-100+ layers.
A neuron is the basic building block of a neural network. Think of it as a tiny decision-maker that takes inputs, processes them, and produces an output.
1. Receive Inputs
Gets numbers from previous layer (or raw data)
inputs = [0.5, 0.8, 0.3]
2. Multiply by Weights
Each input has a "weight" (importance)
weights = [0.4, 0.6, 0.2]
weighted = [0.2, 0.48, 0.06]
3. Sum and Add Bias
Add all weighted inputs + bias term
sum = 0.2 + 0.48 + 0.06 + 0.1 = 0.84
4. Apply Activation Function
Transform the sum (adds non-linearity)
output = activation(0.84) = 0.70
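The four steps above can be reproduced in a few lines of NumPy. The activation used here is sigmoid, an assumption chosen because it reproduces the 0.70 output shown; any activation function could be plugged in.

```python
import numpy as np

# The neuron from the steps above, as plain NumPy
inputs = np.array([0.5, 0.8, 0.3])
weights = np.array([0.4, 0.6, 0.2])
bias = 0.1

# Steps 1-3: multiply by weights, sum, add bias
weighted_sum = float(np.dot(inputs, weights) + bias)  # 0.2 + 0.48 + 0.06 + 0.1 = 0.84

# Step 4: apply an activation (sigmoid here -- an assumption)
output = 1 / (1 + np.exp(-weighted_sum))

print(round(weighted_sum, 2))  # 0.84
print(round(output, 2))        # 0.7
```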
Imagine deciding whether to go to a party. You consider factors (inputs): weather (0.8 good), friends going (0.9), tiredness (0.3). Each factor has importance (weights). You sum them up, and if the total exceeds your threshold, you go! That's how a neuron decides.
Activation functions add non-linearity to neural networks, allowing them to learn complex patterns. Without them, neural networks would just be fancy linear regression!
The most popular activation function. Simple: if input is positive, keep it; if negative, make it zero.
ReLU(x) = max(0, x)
Examples:
ReLU(5) = 5
ReLU(-3) = 0
✅ Fast, works well in hidden layers, helps avoid vanishing gradients
Squashes any number into a range between 0 and 1. Perfect for probabilities!
Sigmoid(x) = 1 / (1 + e^(-x))
Examples:
Sigmoid(0) = 0.5
Sigmoid(5) = 0.99
Sigmoid(-5) = 0.01
✅ Good for binary classification output layer
Converts numbers into probabilities that sum to 1. Perfect for multi-class classification!
Input: [2.0, 1.0, 0.1]
Softmax: [0.66, 0.24, 0.10]
Sum = 1.0 (100%)
✅ Perfect for output layer when predicting multiple classes
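All three activation functions are simple enough to implement by hand. This sketch reproduces the example values given above (ReLU on 5 and -3, Sigmoid(0), and Softmax of [2.0, 1.0, 0.1]):

```python
import numpy as np

def relu(x):
    # Keep positives, zero out negatives
    return np.maximum(0, x)

def sigmoid(x):
    # Squash any number into (0, 1)
    return 1 / (1 + np.exp(-x))

def softmax(x):
    # Convert numbers into probabilities that sum to 1
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

print(relu(np.array([5, -3])))                 # [5 0]
print(round(sigmoid(0), 2))                    # 0.5
print(np.round(softmax([2.0, 1.0, 0.1]), 2))   # [0.66 0.24 0.1]
```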
TensorFlow is Google's powerful deep learning framework. Keras is a high-level API that makes TensorFlow easy to use. Think of TensorFlow as the engine and Keras as the steering wheel!
# Install TensorFlow
pip install tensorflow
# Import libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
# Create a simple neural network
model = keras.Sequential([
    # Input layer (784 features for 28x28 images)
    layers.Dense(128, activation='relu', input_shape=(784,)),
    # Hidden layer
    layers.Dense(64, activation='relu'),
    # Output layer (10 classes for digits 0-9)
    layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(
    optimizer='adam',                        # How to update weights
    loss='sparse_categorical_crossentropy',  # Error function
    metrics=['accuracy']                     # What to track
)
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=10,            # Number of times to see all data
    batch_size=32,        # Process 32 samples at a time
    validation_split=0.2  # Use 20% for validation
)
# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc:.2%}")
# Make predictions
predictions = model.predict(X_test[:5])
print(f"Predicted classes: {np.argmax(predictions, axis=1)}")
Convolutional Neural Networks (CNNs) are specialized for processing images. They're inspired by how your visual cortex processes what you see - detecting edges, then shapes, then objects!
Imagine sliding a small window (filter) across an image, looking for specific patterns like edges or corners. That's convolution! The network learns what patterns to look for.
Layer 1: Detects edges and simple patterns
Layer 2: Combines edges into shapes
Layer 3: Combines shapes into parts (eyes, ears)
Layer 4: Combines parts into objects (cat, dog)
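The "sliding window" idea can be shown directly with NumPy. This sketch slides a hand-picked 3x3 vertical-edge filter over a tiny toy image (in a real Conv2D layer, the filter values are learned, not hand-picked):

```python
import numpy as np

# Toy 3x5 "image": a bright vertical stripe in the middle
image = np.array([
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)

# Hand-picked vertical-edge filter
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

# Slide the filter across every position (the convolution)
h, w = image.shape
k = kernel.shape[0]
out = np.zeros((h - k + 1, w - k + 1))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        out[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)

print(out)  # [[-3.  0.  3.]] -- strong responses at the stripe's two edges
```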
# Build a CNN for image classification
model = keras.Sequential([
    # Convolutional layer: learn 32 filters
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # Pooling: reduce size, keep important features
    layers.MaxPooling2D((2, 2)),
    # Another conv layer: learn 64 filters
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # Flatten to 1D for dense layers
    layers.Flatten(),
    # Dense layers for classification
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
Recurrent Neural Networks (RNNs) are designed for sequential data like text, time series, or audio. They have "memory" - they remember previous inputs when processing new ones!
When reading a sentence, you remember previous words to understand the current word. RNNs work the same way - they pass information from one step to the next.
Input: "The cat sat on the ___"
↓
RNN remembers: "cat" + "sat" + "on" + "the"
↓
Prediction: "mat" (makes sense in context!)
Long Short-Term Memory (LSTM) is an improved RNN that can remember information for longer periods. It's like having both short-term and long-term memory!
# Build an LSTM for text classification
model = keras.Sequential([
    # Embedding: convert words to vectors
    layers.Embedding(10000, 128),
    # LSTM layer with 64 units
    layers.LSTM(64),
    # Output layer
    layers.Dense(1, activation='sigmoid')
])
Training a neural network is like teaching a student. You show examples, check their answers, and help them improve. Let's understand the key concepts.
Epochs
One complete pass through all training data. More epochs = more learning, but too many = overfitting.
epochs=10 means see all data 10 times
Batches
Number of samples processed before updating weights. Smaller batches = more updates but noisier.
batch_size=32 means process 32 samples at once
Loss Function
Measures how wrong the model is. Training tries to minimize this. Lower loss = better model.
Common: CrossEntropy (classification), MSE (regression)
Optimizer
Algorithm that updates weights to reduce loss. Adam is most popular - it's like smart trial and error.
Adam, SGD, RMSprop are common optimizers
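The loss function is less mysterious than it sounds. For classification, sparse categorical cross-entropy is just the negative log of the probability the model assigned to the true class. This sketch computes it by hand with toy numbers (not from a real model):

```python
import numpy as np

# Model output after softmax: probabilities for 3 classes
probs = np.array([0.7, 0.2, 0.1])
true_class = 0  # the correct answer

# Cross-entropy: -log(probability assigned to the true class)
loss = -np.log(probs[true_class])
print(round(loss, 3))  # 0.357 -- low, because the model was confident and right

# A less confident model (only 0.3 on the true class) is penalized more
print(round(-np.log(0.3), 3))  # 1.204
```

Training nudges the weights (via the optimizer) so this number shrinks across epochs.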
You now understand deep learning and can build neural networks with TensorFlow! In the next module, we'll explore PyTorch - another powerful deep learning framework with a different approach.