Goal: To create machines/algorithms that can "think" — simulate human cognitive
abilities like learning, reasoning, and problem-solving.
Historical Context
Myths & Legends: Automatons in Ancient Egypt, Talos (Greece), Golem (Europe)
Science Fiction: The Cyberiad (Lem), Star Wars, Blade Runner, Alien
Modern Era: Turing Test (1950), the term "Artificial Intelligence" coined at the Dartmouth workshop (1956)
Key Tools Used in AI
Mathematical Foundations
Logic & reasoning
Probability theory
Linear algebra
Optimization theory
Application Domains
Game theory & economics
Robotics & control
Natural language processing
Computer vision
AI involves optimization: training ML models, tuning hyperparameters, finding optimal routes, selecting
best game moves, and more.
Vectors & Matrices
Vector Operations
Operation            Formula                     Result
Dot Product          x·y = Σ xᵢyᵢ                Scalar
Vector Length        ||x|| = √(Σ xᵢ²)            Scalar
Euclidean Distance   d(x,y) = √(Σ (xᵢ-yᵢ)²)      Scalar
Orthogonality
Two vectors are orthogonal (perpendicular) if and only if their dot
product equals zero:
x·y = 0 ⟺ x ⊥ y
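A minimal NumPy sketch of these operations (the vectors x and y are arbitrary examples; their dot product happens to be zero, illustrating orthogonality):

    import numpy as np

    x = np.array([3.0, 4.0])
    y = np.array([4.0, -3.0])

    dot = x @ y                    # dot product: Σ xᵢyᵢ -> 0.0 here
    length = np.linalg.norm(x)     # vector length: √(Σ xᵢ²) -> 5.0
    dist = np.linalg.norm(x - y)   # Euclidean distance between x and y

    print(dot, length, dist)
    print(np.isclose(dot, 0.0))    # True: x ⊥ y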
Matrix Operations
Addition: Element-by-element: Cᵢⱼ = Aᵢⱼ + Bᵢⱼ
Scalar Multiplication: Bᵢⱼ = αAᵢⱼ
Matrix Multiplication: Cᵢⱼ = Σₖ AᵢₖBₖⱼ (dot product of row i and column j)
Transposition: Aᵀᵢⱼ = Aⱼᵢ (rows ↔ columns)
Note: Matrix multiplication is NOT commutative: AB ≠ BA in general. Also, (AB)ᵀ = BᵀAᵀ.
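A short NumPy sketch checking both notes (A and B are arbitrary example matrices):

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[0, 1], [1, 0]])

    print(A + B)       # element-by-element addition
    print(2 * A)       # scalar multiplication
    print(A @ B)       # matrix multiplication: row i of A dotted with column j of B
    print(np.array_equal(A @ B, B @ A))          # False: AB ≠ BA for these matrices
    print(np.array_equal((A @ B).T, B.T @ A.T))  # True: (AB)ᵀ = BᵀAᵀ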
Machine Learning Basics
Decision Table Structure
Data is organized as a table where each row is a case/observation and each
column is an attribute (feature). The last column is typically the decision/target attribute.
Types of Learning
Supervised Learning
Training data includes labeled examples (input-output pairs)
Classification: Discrete output (class labels)
Regression: Continuous output (numbers)
Unsupervised Learning
No labels — discover hidden patterns
Clustering: Group similar items
Dimensionality Reduction: PCA
Classic Example: Iris Dataset
Classify 3 species of iris flowers (setosa, versicolor, virginica) based on 4 attributes: sepal length, sepal width, petal length, and petal width.
Unlabeled data is often easier to obtain (from instruments/computers), while labeled data requires human
intervention (e.g., manual sentiment analysis).
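A quick look at this decision-table layout, assuming scikit-learn is available (its bundled copy of the Iris dataset is already labeled):

    from sklearn.datasets import load_iris

    iris = load_iris()
    print(iris.data.shape)     # (150, 4): 150 cases (rows), 4 attributes (columns)
    print(iris.feature_names)  # sepal length/width, petal length/width
    print(iris.target_names)   # decision attribute: setosa, versicolor, virginica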
Classifier Evaluation
Confusion Matrix
                      Predicted Positive      Predicted Negative
Actually Positive     TP (True Positive)      FN (False Negative)
Actually Negative     FP (False Positive)     TN (True Negative)
Key Metrics
Metric       Formula             Meaning
Accuracy     (TP + TN) / Total   Overall correctness
Precision    TP / (TP + FP)      Of predicted positives, how many are correct?
Recall       TP / (TP + FN)      Of actual positives, how many did we find?
F-measure    2·P·R / (P + R)     Harmonic mean of Precision & Recall
Precision vs Recall Trade-off: High precision = few false positives; High recall = few
false negatives. F-measure balances both.
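A small sketch of these formulas as one function; the counts below are invented purely for illustration:

    def evaluate(tp, fp, fn, tn):
        # Metrics computed straight from the confusion-matrix counts.
        accuracy = (tp + tn) / (tp + fp + fn + tn)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f_measure = 2 * precision * recall / (precision + recall)
        return accuracy, precision, recall, f_measure

    # Hypothetical counts: 90 TP, 10 FP, 30 FN, 870 TN
    print(evaluate(tp=90, fp=10, fn=30, tn=870))
    # -> (0.96, 0.9, 0.75, 0.818...): high precision, lower recall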
k-Nearest Neighbors (k-NN)
Algorithm
1. Calculate the distance from the query point to all training points
2. Select the k closest neighbors
3. Vote: classify based on the majority class among the neighbors (sketched in code at the end of this section)
Distance Metrics
Euclidean: d(x,y) = √(Σ (xᵢ - yᵢ)²)
Pro tip: Compare d² instead of d (skip square root) — same ordering, easier math!
Choosing k
Small k (e.g., 1): Sensitive to noise, may overfit
Large k: Smoother boundaries, but may underfit
Odd k: Helps avoid ties in 2-class problems
k-NN is a lazy learner — no explicit training phase. All computation happens at
prediction time.
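A minimal sketch of the whole algorithm (knn_predict and the toy data are invented for illustration). It uses the d² trick from above, and all the work indeed happens at prediction time:

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, query, k=3):
        # Squared Euclidean distances: same ordering as d, no square root needed.
        d2 = np.sum((X_train - query) ** 2, axis=1)
        nearest = np.argsort(d2)[:k]           # indices of the k closest points
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]      # majority class among neighbors

    # Toy 2-class data
    X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
    y = np.array([0, 0, 1, 1])
    print(knn_predict(X, y, np.array([1.1, 0.9]), k=3))  # -> 0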