🧠 NAI - Tools of Artificial Intelligence

Comprehensive Lecture Notes | Key Concepts & Formulas

1. Introduction to Artificial Intelligence

What is AI?

Goal: To create machines/algorithms that can "think" — simulate human cognitive abilities like learning, reasoning, and problem-solving.

Historical Context

Key Tools Used in AI

Mathematical Foundations

  • Logic & reasoning
  • Probability calculus
  • Linear algebra
  • Optimization theory

Application Domains

  • Game theory & economics
  • Robotics & control
  • Natural language processing
  • Computer vision
Much of AI involves optimization: training ML models, tuning hyperparameters, finding optimal routes, selecting the best game moves, and more.

2. Vectors & Matrices

Vector Operations

| Operation | Formula | Result |
| --- | --- | --- |
| Dot Product | x·y = Σ xᵢyᵢ | Scalar |
| Vector Length | ||x|| = √(Σ xᵢ²) | Scalar |
| Euclidean Distance | d(x,y) = √(Σ (xᵢ-yᵢ)²) | Scalar |

Orthogonality

Two vectors are orthogonal (perpendicular) if and only if their dot product equals zero:
x·y = 0 ⟺ x ⊥ y
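
A minimal sketch of these operations with NumPy (the example vectors are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, -1.0, 0.0])

dot = np.dot(x, y)             # dot product: sum of x_i * y_i
length = np.linalg.norm(x)     # vector length ||x|| = sqrt(sum of x_i^2)
dist = np.linalg.norm(x - y)   # Euclidean distance between x and y

print(dot, length, dist)       # 0.0  3.0  ≈3.74
print(np.isclose(dot, 0.0))    # True -> x and y are orthogonal
```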

Matrix Operations

Note: Matrix multiplication is NOT commutative: AB ≠ BA in general. Also, (AB)ᵀ = BᵀAᵀ
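
A quick NumPy check of both facts on two small example matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print(np.array_equal(A @ B, B @ A))          # False: AB != BA in general
print(np.array_equal((A @ B).T, B.T @ A.T))  # True: (AB)^T = B^T A^T
```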

3. Machine Learning Basics

Decision Table Structure

Data is organized as a table where each row is a case/observation and each column is an attribute (feature). The last column is typically the decision/target attribute.

Types of Learning

Supervised Learning

Training data includes labeled examples (input-output pairs)

  • Classification: Discrete output (class labels)
  • Regression: Continuous output (numbers)

Unsupervised Learning

No labels — discover hidden patterns

  • Clustering: Group similar items
  • Dimensionality Reduction: PCA

Classic Example: Iris Dataset

Classify 3 species of iris flowers based on 4 attributes:
  • Sepal length (cm)
  • Sepal width (cm)
  • Petal length (cm)
  • Petal width (cm)
Species: Iris-setosa, Iris-versicolor, Iris-virginica
Unlabeled data is often easier to obtain (from instruments/computers), while labeled data requires human intervention (e.g., manual sentiment analysis).
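
If scikit-learn is available, the Iris data can be loaded directly; a minimal sketch (the 70/30 split is just illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target            # 150 cases, 4 attributes, 3 classes

# Hold out part of the labeled data to evaluate a supervised classifier later
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

print(X_train.shape, X_test.shape)       # (105, 4) (45, 4)
print(iris.target_names)                 # ['setosa' 'versicolor' 'virginica']
```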

4. Classifier Evaluation

Confusion Matrix

| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actually Positive | TP (True Positive) | FN (False Negative) |
| Actually Negative | FP (False Positive) | TN (True Negative) |

Key Metrics

| Metric | Formula | Meaning |
| --- | --- | --- |
| Accuracy | (TP + TN) / Total | Overall correctness |
| Precision | TP / (TP + FP) | Of predicted positives, how many are correct? |
| Recall | TP / (TP + FN) | Of actual positives, how many did we find? |
| F-measure | 2·P·R / (P + R) | Harmonic mean of Precision & Recall |
Precision vs Recall Trade-off: High precision = few false positives; High recall = few false negatives. F-measure balances both.
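
A small worked example, computing all four metrics from hypothetical confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts
TP, FN, FP, TN = 40, 10, 5, 45

total = TP + FN + FP + TN
accuracy = (TP + TN) / total          # 0.85
precision = TP / (TP + FP)            # 40/45 ≈ 0.889
recall = TP / (TP + FN)               # 40/50 = 0.8
f_measure = 2 * precision * recall / (precision + recall)  # ≈ 0.842

print(accuracy, precision, recall, f_measure)
```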

5. k-Nearest Neighbors (k-NN)

Algorithm

To classify a query point:
  1. Compute the distance from the query to every training example.
  2. Select the k training examples with the smallest distances.
  3. Predict by majority vote of their class labels (or their average, for regression).

Distance Metrics

Euclidean: d(x,y) = √(Σ (xᵢ - yᵢ)²)

Pro tip: Compare d² instead of d (skip square root) — same ordering, easier math!

Choosing k

Small k is sensitive to noise; large k smooths over class boundaries. An odd k avoids ties in two-class problems, and in practice k is often chosen by cross-validation.

k-NN is a lazy learner — no explicit training phase. All computation happens at prediction time.
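
A minimal plain-Python sketch of such a classifier, using squared distances as suggested above (the toy data and names are illustrative):

```python
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among the k nearest training points."""
    # Squared Euclidean distance is enough: it preserves the ordering of distances
    dists = [(sum((a - b) ** 2 for a, b in zip(x, query)), label)
             for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    top_k_labels = [label for _, label in dists[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]

# Toy example
train_X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_X, train_y, (1.1, 0.9), k=3))   # "A"
```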

6. Naive Bayes Classifier

Bayes' Theorem

P(Class | Features) ∝ P(Class) × P(Features | Class)

The "Naive" Assumption

Assume all features are conditionally independent given the class:
P(f₁, f₂, ..., fₙ | Class) = P(f₁|Class) × P(f₂|Class) × ... × P(fₙ|Class)

Classification Rule

Predict the class with the highest (naive) posterior:
class* = argmax over classes of P(Class) × P(f₁|Class) × ... × P(fₙ|Class)

Laplace Smoothing

When a count is zero, so that P(feature|Class) = 0 would wipe out the whole product, apply smoothing:
P(value|Class) = (count + 1) / (class_total + number_of_possible_values)
where class_total is the number of training examples of that class.
Despite the "naive" independence assumption often being wrong, Naive Bayes works surprisingly well in practice, especially for text classification!
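
A minimal sketch of a categorical Naive Bayes with Laplace smoothing, following the formulas above (the toy weather data and function names are illustrative):

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate P(class) and P(value | class) per feature, per the naive assumption."""
    class_counts = Counter(labels)
    # value_counts[feature_index][class][value] = how often `value` occurs in that class
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[i][label][value] += 1
    # Number of possible values per feature, needed for Laplace smoothing
    possible = {i: {row[i] for row in rows} for i in range(len(rows[0]))}
    return class_counts, value_counts, possible, len(labels)

def predict_nb(model, row):
    class_counts, value_counts, possible, n = model
    best_class, best_score = None, -1.0
    for cls, cls_total in class_counts.items():
        score = cls_total / n                       # prior P(Class)
        for i, value in enumerate(row):
            count = value_counts[i][cls][value]
            # Laplace smoothing: (count + 1) / (class total + number of possible values)
            score *= (count + 1) / (cls_total + len(possible[i]))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "mild")))   # "yes"
```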

7. Perceptrons & Neural Networks

Perceptron Model

A perceptron is a simple mathematical model of a neuron:
net = Σ wᵢxᵢ = w·x (weighted sum)
output = activation(net - θ)
where θ (theta) is the threshold.

Activation Functions

| Type | Output Type | Values |
| --- | --- | --- |
| Unipolar (discrete) | Binary | {0, 1} |
| Bipolar (discrete) | Binary | {-1, 1} |
| Unipolar (continuous) | Sigmoid | [0, 1] |
| Bipolar (continuous) | Tanh | [-1, 1] |

Perceptron Learning Rule (Delta Rule)

w' = w + α × (d - y) × x
θ' = θ - α × (d - y)
Key insight: If the perceptron doesn't activate when it should → decrease the threshold. If it activates when it shouldn't → increase the threshold.
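
A minimal sketch of the delta rule with a unipolar step activation, learning the logical AND function (the learning rate and epoch count are illustrative):

```python
def train_perceptron(samples, alpha=0.1, epochs=20):
    """Perceptron with unipolar step activation, trained by the delta rule."""
    w = [0.0, 0.0]
    theta = 0.0
    for _ in range(epochs):
        for x, d in samples:
            net = sum(wi * xi for wi, xi in zip(w, x))
            y = 1 if net - theta >= 0 else 0                       # step activation
            error = d - y
            w = [wi + alpha * error * xi for wi, xi in zip(w, x)]  # w' = w + α(d−y)x
            theta = theta - alpha * error                          # θ' = θ − α(d−y)
    return w, theta

# Learn the logical AND function
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, theta = train_perceptron(samples)
for x, d in samples:
    net = w[0] * x[0] + w[1] * x[1]
    print(x, 1 if net - theta >= 0 else 0)   # matches d after training
```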

8. Decision Trees & ID3 Algorithm

Knowledge Representation

ID3 Algorithm Concept

Builds trees by selecting the attribute that provides the most information gain at each step.

Entropy

H(S) = -Σ pᵢ log₂(pᵢ)

Information Gain

Gain(S, A) = H(S) - Σᵥ (|Sᵥ|/|S|) × H(Sᵥ)

where Sᵥ is the subset of S in which attribute A takes value v.

Select attribute A with highest information gain at each node.
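
A minimal sketch of the entropy and information-gain computations (the toy data and attribute values are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -Σ p_i log2(p_i) over the class proportions in `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Gain(S, A) = H(S) - Σ_v (|S_v|/|S|) * H(S_v)."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute_index], []).append(label)
    remainder = sum(len(sub) / n * entropy(sub) for sub in subsets.values())
    return entropy(labels) - remainder

# Toy data: attribute 0 splits the classes perfectly, attribute 1 does not
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))   # 1.0 -> ID3 would split on this attribute
print(information_gain(rows, labels, 1))   # 0.5
```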

Decision trees are highly interpretable — they can be converted to simple IF-THEN rules that humans can understand!

9. Model Selection & Overfitting

The Bias-Variance Trade-off

Underfitting (High Bias)

  • Model too simple
  • Can't capture patterns
  • Poor on both train & test

Overfitting (High Variance)

  • Model too complex
  • Memorizes training data
  • Great train, poor test

Model Complexity Examples

Cross-Validation

k-Fold Cross-Validation:
  1. Split data into k equal parts (folds)
  2. Train on k-1 folds, test on 1 fold
  3. Repeat k times (each fold is test once)
  4. Average the results
Cross-validation gives more reliable performance estimates than a single train-test split, especially with limited data.
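
Assuming scikit-learn is installed, a minimal 5-fold cross-validation sketch on the Iris data with a 3-NN classifier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: train on 4 folds, test on the remaining fold, 5 times
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores)          # one accuracy value per fold
print(scores.mean())   # averaged estimate of generalization performance
```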

10. Clustering Algorithms

k-Means Algorithm

Centroid Update:
cᵢ = (1/|Cᵢ|) × Σ x for all x in cluster Cᵢ

Hierarchical Clustering

k-Means Tips

k-Means is sensitive to initial centroid positions. Multiple runs with different initializations can help find better solutions.
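
A minimal sketch of the k-Means loop (Lloyd's algorithm) in plain Python; the sample points are illustrative, and in practice you would also add a convergence check and several random restarts:

```python
import random

def kmeans(points, k, iterations=100):
    """Assign each point to the nearest centroid, then recompute the centroids."""
    centroids = random.sample(points, k)               # result depends on this initial choice
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid becomes the mean of the points assigned to it
        for i, cluster in enumerate(clusters):
            if cluster:                                 # keep old centroid if cluster is empty
                centroids[i] = tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 1.2), (0.8, 0.9), (5.0, 5.0), (5.5, 4.8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```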

11. Optimization & Greedy Algorithms

Role of Optimization in AI

Greedy Approach

Make the locally optimal choice at each step, hoping to find a global optimum. Works for some problems, not all!

Matroids & Rado-Edmonds Theorem

Key insight (Rado-Edmonds theorem): the greedy algorithm yields an optimal solution for every weight function if and only if the underlying independence system is a matroid.

Huffman Coding

A classic greedy algorithm: build an optimal prefix code by repeatedly merging the two least frequent symbols (or subtrees), as sketched below.
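
A minimal sketch of the construction using a binary heap (the symbol frequencies are illustrative):

```python
import heapq

def huffman_code(frequencies):
    """Build a prefix code by repeatedly merging the two least frequent subtrees."""
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)   # least frequent subtree
        f2, _, codes2 = heapq.heappop(heap)   # second least frequent subtree
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_code({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
```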

Local Search Methods

12. Knapsack Problem

Problem Definition

Given items with values and weights, and a knapsack with limited capacity, select items to maximize total value without exceeding capacity.

Problem Variants

| Variant | Description | Solution |
| --- | --- | --- |
| 0/1 Knapsack | Take or leave each item | Dynamic Programming |
| Fractional | Can take fractions of items | Greedy (by value/weight ratio) |

Solution Strategy (0/1)

Dynamic programming over items and remaining capacity:
dp[i][w] = max(dp[i-1][w], dp[i-1][w - wᵢ] + vᵢ)  if wᵢ ≤ w, otherwise dp[i-1][w]

Value/Weight Ratio

For fractional knapsack, sort items by value/weight ratio (efficiency) and take items greedily.
efficiency = value / weight
For small 0/1 knapsack instances (such as exam exercises), enumerate all valid combinations. For larger instances, use dynamic programming.
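
Minimal sketches of both strategies, assuming items are given as (value, weight) pairs (the example items are illustrative):

```python
def knapsack_01(items, capacity):
    """0/1 knapsack by dynamic programming over capacity; items are (value, weight)."""
    dp = [0] * (capacity + 1)                        # dp[w] = best value with weight limit w
    for value, weight in items:
        for w in range(capacity, weight - 1, -1):    # reverse: each item used at most once
            dp[w] = max(dp[w], dp[w - weight] + value)
    return dp[capacity]

def knapsack_fractional(items, capacity):
    """Fractional knapsack: greedily take items in decreasing value/weight order."""
    total = 0.0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        take = min(weight, capacity)                 # take the whole item, or what still fits
        total += value * (take / weight)
        capacity -= take
        if capacity == 0:
            break
    return total

items = [(60, 10), (100, 20), (120, 30)]
print(knapsack_01(items, 50))           # 220   (take the 20 kg and 30 kg items)
print(knapsack_fractional(items, 50))   # 240.0 (take 2/3 of the 30 kg item as well)
```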