Master Python fundamentals and essential libraries for AI and Machine Learning
Imagine you want to teach a computer to recognize cats in photos. You need a language that's easy to write, has powerful tools for working with data, and can handle complex math. That's Python - the most popular programming language for AI and Machine Learning!
Python is a beginner-friendly programming language that reads almost like English. It's the #1 choice for AI/ML because it has amazing libraries (pre-built tools) that do the heavy lifting for you.
Example: Print "Hello AI!"
print("Hello AI!")
That's it! Just one line.
Easy to Learn
Reads like English, perfect for beginners
Rich Libraries
NumPy, Pandas, TensorFlow, PyTorch - all the tools you need
Huge Community
Millions of developers, tons of tutorials and help
Industry Standard
Used by Google, Facebook, Netflix, and more
Let's learn the fundamental building blocks of Python. Think of these as the alphabet and grammar you need before writing sentences.
A variable is like a labeled box where you store information. You give it a name and put data inside.
# Numbers (integers and floats)
age = 25
temperature = 98.6
# Text (strings)
name = "Alice"
message = "Hello, AI!"
# True/False (booleans)
is_student = True
has_experience = False
# Lists (collections of items)
scores = [95, 87, 92, 88]
names = ["Alice", "Bob", "Charlie"]
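To see these types in action, here is a quick sketch (the variable names are just illustrative):

```python
# Store a few values and check what type Python gives them
age = 25
name = "Alice"
scores = [95, 87, 92, 88]

print(type(age))   # <class 'int'>
print(type(name))  # <class 'str'>

# Lists are ordered: index from 0, append to grow
print(scores[0])   # 95 (the first item)
scores.append(90)  # add a new score at the end
print(len(scores)) # 5
```

Notice that you never declare a type - Python figures it out from the value you assign.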
A function is like a recipe - you give it ingredients (inputs), it follows steps, and gives you a result (output). Functions help you reuse code instead of writing the same thing over and over.
# Define a function
def greet(name):
    return f"Hello, {name}!"
# Use the function
message = greet("Alice")
print(message) # Output: Hello, Alice!
# Function with multiple parameters
def calculate_average(numbers):
    total = sum(numbers)
    count = len(numbers)
    return total / count
avg = calculate_average([95, 87, 92, 88])
print(avg) # Output: 90.5
Think of a function like a coffee machine. You put in coffee beans and water (inputs), the machine does its magic (processing), and you get coffee (output). You don't need to know how the machine works internally - you just use it!
NumPy (Numerical Python) is like a supercharged calculator for Python. It lets you work with large arrays of numbers incredibly fast - essential for AI/ML where you're processing millions of data points.
An array is like a grid of numbers. A 1D array is a list, a 2D array is a table, and a 3D array is like a cube of numbers. Images are 3D arrays (height × width × colors)!
# Import NumPy
import numpy as np
# Create a 1D array (like a list)
arr1d = np.array([1, 2, 3, 4, 5])
print(arr1d) # [1 2 3 4 5]
# Create a 2D array (like a table)
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])
print(arr2d)
# [[1 2 3]
# [4 5 6]]
# Create arrays with special functions
zeros = np.zeros(5) # [0. 0. 0. 0. 0.]
ones = np.ones(3) # [1. 1. 1.]
range_arr = np.arange(0, 10, 2) # [0 2 4 6 8]
random_arr = np.random.rand(3, 3) # 3x3 random numbers
# Math operations (element-wise)
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
print(a + b) # [11 22 33 44]
print(a * b) # [10 40 90 160]
print(a ** 2) # [1 4 9 16] (square each)
# Statistical operations
data = np.array([95, 87, 92, 88, 90])
print(data.mean()) # 90.4 (average)
print(data.std()) # 2.87 (standard deviation)
print(data.min()) # 87 (minimum)
print(data.max()) # 95 (maximum)
# Reshaping arrays
arr = np.arange(12) # [0 1 2 ... 11]
reshaped = arr.reshape(3, 4) # 3 rows, 4 columns
print(reshaped)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
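As a concrete illustration of the earlier point that images are 3D arrays, here is a tiny made-up 2×2 "image" with 3 color channels per pixel:

```python
import numpy as np

# A tiny 2x2 RGB "image": height x width x color channels
image = np.array([[[255, 0, 0],   [0, 255, 0]],
                  [[0, 0, 255],   [255, 255, 255]]])

print(image.shape)  # (2, 2, 3) - height, width, channels
print(image[0, 0])  # [255 0 0] - the top-left pixel (pure red)
```

A real photo works exactly the same way, just with a much larger height and width.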
NumPy operations are often 10-100x faster than equivalent loops over regular Python lists! When training AI models with millions of calculations, this speed difference is crucial. Plus, NumPy is the foundation for Pandas, TensorFlow, and PyTorch.
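You can see the difference yourself with a rough timing sketch (the exact speedup varies by machine and array size):

```python
import time
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Pure Python: square every element with a loop
start = time.perf_counter()
squared_list = [x ** 2 for x in py_list]
py_time = time.perf_counter() - start

# NumPy: one vectorized operation over the whole array
start = time.perf_counter()
squared_arr = np_arr ** 2
np_time = time.perf_counter() - start

print(f"Python loop: {py_time:.4f}s, NumPy: {np_time:.4f}s")
```

Both approaches produce the same numbers; NumPy just does the work in optimized C code instead of the Python interpreter.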
Pandas is like Excel for Python - but way more powerful! It lets you work with tables of data (called DataFrames), clean messy data, and analyze it easily. Most of the effort in AI/ML work goes into preparing data, and Pandas makes it simple.
A DataFrame is a table with rows and columns, like a spreadsheet. Each column can have a different type of data (numbers, text, dates).
# Import Pandas
import pandas as pd
# Create a DataFrame from a dictionary
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'score': [95, 87, 92]
}
df = pd.DataFrame(data)
print(df)
# Output:
#       name  age  score
# 0    Alice   25     95
# 1      Bob   30     87
# 2  Charlie   35     92
# Read from CSV file (most common)
df = pd.read_csv('data.csv')
# Read from Excel
df = pd.read_excel('data.xlsx')
# Read from JSON
df = pd.read_json('data.json')
# Quick look at your data
df.head() # First 5 rows
df.tail() # Last 5 rows
df.info() # Data types and missing values
df.describe() # Statistical summary
# Select columns
names = df['name'] # Single column
subset = df[['name', 'age']] # Multiple columns
# Filter rows
adults = df[df['age'] >= 30] # Age 30 or older
high_scores = df[df['score'] > 90] # Score above 90
# Add new columns
df['grade'] = df['score'].apply(lambda x: 'A' if x >= 90 else 'B')
# Handle missing data (each call returns a new object - assign it to keep the result)
df.dropna() # Remove rows with missing values
df.fillna(0) # Replace missing values with 0
df['age'].fillna(df['age'].mean()) # Fill missing ages with the average
# Group and aggregate
df.groupby('grade')['score'].mean() # Average score by grade
Imagine you have a CSV file with 10,000 customer records. Some have missing ages, some have duplicate entries. With Pandas, you can clean this data in just a few lines - remove duplicates, fill missing values, filter by criteria, and calculate statistics. This would take hours in Excel!
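Here is a minimal sketch of that cleaning workflow on a toy table (the column names and values are invented for illustration):

```python
import pandas as pd

# Toy customer data: one duplicate row, one missing age
customers = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Bob', 'Charlie'],
    'age': [25, 30, 30, None],
})

# Remove exact duplicate rows
customers = customers.drop_duplicates()

# Fill the missing age with the column average ((25 + 30) / 2 = 27.5)
customers['age'] = customers['age'].fillna(customers['age'].mean())

# Filter: customers aged 27 or older
older = customers[customers['age'] >= 27]
print(older)
```

With 10,000 real rows the code would be exactly the same - that is the point of Pandas.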
A picture is worth a thousand numbers! Visualization helps you understand patterns in data that you'd never see in tables. Matplotlib and Seaborn are Python's main visualization libraries.
Matplotlib is like a digital canvas where you can draw any type of chart. It's very flexible but requires more code.
# Import libraries
import matplotlib.pyplot as plt
import numpy as np
# Line plot
x = np.arange(0, 10, 0.1)
y = np.sin(x)
plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.show()
# Bar chart
categories = ['A', 'B', 'C', 'D']
values = [23, 45, 56, 78]
plt.bar(categories, values, color='purple')
plt.title('Sales by Category')
plt.show()
# Scatter plot
x = np.random.rand(50)
y = np.random.rand(50)
plt.scatter(x, y, alpha=0.5)
plt.title('Random Points')
plt.show()
Seaborn builds on Matplotlib but makes beautiful, statistical plots with less code. It's perfect for data analysis and works great with Pandas DataFrames.
# Import Seaborn
import seaborn as sns
sns.set_theme() # Use Seaborn's beautiful style
# Histogram (distribution of data)
sns.histplot(data=df, x='age', bins=20)
plt.title('Age Distribution')
plt.show()
# Box plot (see outliers and quartiles)
sns.boxplot(data=df, x='grade', y='score')
plt.title('Scores by Grade')
plt.show()
# Heatmap (correlation between variables)
correlation = df.corr(numeric_only=True) # numeric_only skips text columns
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title('Feature Correlations')
plt.show()
# Pair plot (see all relationships at once)
sns.pairplot(df, hue='grade')
plt.show()
Jupyter Notebooks are like interactive documents where you can write code, see results immediately, add notes, and create visualizations - all in one place. They're the standard tool for data science and AI/ML work.
Interactive Execution
Run code in small chunks (cells) and see results instantly
Mix Code and Text
Add explanations, notes, and documentation alongside your code
Visualizations Inline
See charts and graphs right below the code that created them
Easy Sharing
Share notebooks with others - they can see code, results, and explanations
# Install Jupyter
pip install jupyter
# Start Jupyter Notebook
jupyter notebook
# This opens in your web browser!
# Or use JupyterLab (modern interface)
pip install jupyterlab
jupyter lab
Let's put everything together! Here's a complete example that loads data, cleans it, analyzes it, and creates visualizations.
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Create sample student data
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'math_score': [95, 87, 92, 88, 90],
    'science_score': [88, 92, 85, 95, 89],
    'study_hours': [5, 3, 4, 6, 4]
}
df = pd.DataFrame(data)
# Calculate average score
df['avg_score'] = (df['math_score'] + df['science_score']) / 2
# Assign grades
def assign_grade(score):
    if score >= 90: return 'A'
    elif score >= 80: return 'B'
    else: return 'C'
df['grade'] = df['avg_score'].apply(assign_grade)
# Display results
print(df)
print(f"\nClass Average: {df['avg_score'].mean():.2f}")
print(f"Top Student: {df.loc[df['avg_score'].idxmax(), 'name']}")
# Create visualizations
plt.figure(figsize=(12, 4))
# Bar chart of scores
plt.subplot(1, 3, 1)
df.plot(x='name', y=['math_score', 'science_score'], kind='bar', ax=plt.gca())
plt.title('Scores by Subject')
plt.xticks(rotation=45)
# Scatter plot: study hours vs average score
plt.subplot(1, 3, 2)
plt.scatter(df['study_hours'], df['avg_score'], s=100, alpha=0.6)
plt.xlabel('Study Hours')
plt.ylabel('Average Score')
plt.title('Study Time vs Performance')
# Grade distribution
plt.subplot(1, 3, 3)
df['grade'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.title('Grade Distribution')
plt.tight_layout()
plt.show()
You now have the Python foundation for AI/ML! In the next module, we'll dive into Machine Learning fundamentals - building your first predictive models with scikit-learn.