Machine Learning is a subset of AI where systems learn from data to improve performance without being explicitly programmed.
Types of ML: Supervised (learn from labeled data), Unsupervised (find structure in unlabeled data), Reinforcement (learn by trial and error from rewards).
ML Pipeline:
Data Collection → Data Preprocessing → Feature Engineering →
Model Selection → Training → Evaluation → Deployment → Monitoring
Linear Regression: predicts a continuous output from input features.
Simple Linear Regression: y = β₀ + β₁x
Cost Function (MSE): J = (1/2m)Σ(ŷᵢ - yᵢ)²
Gradient Descent: β₁ = β₁ - α·(∂J/∂β₁)
Multiple Linear Regression: y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
Assumptions: Linearity, independence, homoscedasticity, normality of residuals
Regularization: L1 (Lasso) adds λΣ|βⱼ| to the cost and can zero out coefficients; L2 (Ridge) adds λΣβⱼ² and shrinks them. Both penalize large weights to reduce overfitting.
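The cost function and gradient-descent update above can be sketched in plain Python. The toy data, learning rate, and epoch count are illustrative assumptions, not values from the notes:

```python
# Simple linear regression y = b0 + b1*x fit by batch gradient descent.
# Cost J = (1/2m) * sum((yhat_i - y_i)^2); the gradients below follow
# from differentiating J with respect to b0 and b1.

def fit_linear(xs, ys, alpha=0.05, epochs=2000):
    m = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        # Residuals (yhat - y) for the current parameters.
        errs = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives dJ/db0 and dJ/db1.
        grad_b0 = sum(errs) / m
        grad_b1 = sum(e * x for e, x in zip(errs, xs)) / m
        # Update step: beta = beta - alpha * dJ/dbeta.
        b0 -= alpha * grad_b0
        b1 -= alpha * grad_b1
    return b0, b1

# Toy data generated from y = 1 + 2x, so the fit should recover b0≈1, b1≈2.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
b0, b1 = fit_linear(xs, ys)
```

Because the data lie exactly on a line, the iterates converge to the true coefficients; on noisy data they converge to the least-squares fit instead.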
Logistic Regression: despite the name, used for binary classification.
Sigmoid Function: σ(z) = 1/(1+e⁻ᶻ) → output between 0 and 1
Decision boundary: P(y=1|x) ≥ 0.5 → class 1
Cost Function (Binary Cross-Entropy): J = -(1/m)Σ[ylog(ŷ) + (1-y)log(1-ŷ)]
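The sigmoid and binary cross-entropy formulas above translate directly to code; the probe values 0.9 and 0.1 are illustrative:

```python
import math

# Sigmoid squashes any real z into (0, 1), read as P(y=1|x).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Binary cross-entropy: J = -(1/m) * sum(y*log(yhat) + (1-y)*log(1-yhat)).
def bce(y_true, y_pred):
    m = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / m

p = sigmoid(0.0)        # z = 0 sits exactly on the 0.5 decision boundary
# A confident correct prediction is cheap; a confident wrong one is expensive.
low = bce([1], [0.9])
high = bce([1], [0.1])
```

This asymmetry is why cross-entropy, not MSE, is the standard cost for logistic regression: it punishes confident mistakes heavily.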
Decision Trees: tree-like model of decisions, one feature test per internal node.
Splitting Criteria: Gini impurity and entropy/information gain (classification); variance reduction (regression).
CART Algorithm: Builds binary decision tree recursively.
Random Forest: Ensemble of decision trees
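Gini impurity, one of the splitting criteria CART evaluates, is easy to compute by hand. The spam/ham labels here are a made-up toy example:

```python
# Gini impurity of a label set: 1 - sum(p_k^2). 0 means pure, 0.5 is the
# worst case for two balanced classes.
def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# Weighted impurity after a binary split; CART picks the split minimizing this.
def split_impurity(left, right):
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

pure = gini(["spam"] * 4)            # all one class
mixed = gini(["spam", "ham"] * 2)    # perfectly mixed
# A split that separates the classes perfectly has weighted impurity 0.
after = split_impurity(["spam", "spam"], ["ham", "ham"])
```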
Support Vector Machine (SVM): finds the maximum-margin hyperplane separating the classes.
Naive Bayes: based on Bayes' theorem with a naive independence assumption: P(C|features) ∝ P(C) × Πᵢ P(featureᵢ|C)
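The proportionality above can be sketched with a toy spam filter; all the priors and word likelihoods below are invented illustrative numbers:

```python
# Naive Bayes scoring: P(C|features) is proportional to P(C) * product of
# P(feature_i|C). We pick the class with the highest unnormalized score.
priors = {"spam": 0.4, "ham": 0.6}            # toy class priors
likelihood = {                                 # toy P(word | class) values
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham":  {"free": 0.1, "meeting": 0.7},
}

def score(cls, words):
    p = priors[cls]
    for w in words:
        p *= likelihood[cls][w]    # naive independence: just multiply
    return p

best = max(priors, key=lambda c: score(c, ["free"]))
```

Normalizing the scores by their sum would recover the actual posterior probabilities, but the argmax is the same either way.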
Hierarchical Clustering: builds a hierarchy of clusters.
Linkage methods: Single (min dist), Complete (max dist), Average, Ward's
PCA (Principal Component Analysis): reduces dimensionality while preserving maximum variance.
Variance explained: eigenvalue/sum of eigenvalues
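The variance-explained ratio can be checked on a 2×2 covariance matrix, where the eigenvalues have a closed form and no linear-algebra library is needed. The matrix entries are illustrative:

```python
import math

# Eigenvalues of a symmetric 2x2 covariance matrix [[a, b], [b, c]]:
# (a+c)/2 ± sqrt(((a-c)/2)^2 + b^2).
def eig2(a, b, c):
    mean = (a + c) / 2
    d = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean + d, mean - d

# Toy covariance matrix [[4, 2], [2, 2]] (made-up numbers).
l1, l2 = eig2(4.0, 2.0, 2.0)
# Variance explained by each component = eigenvalue / sum of eigenvalues.
explained = [l1 / (l1 + l2), l2 / (l1 + l2)]
```

The ratios always sum to 1, and the eigenvalue sum equals the trace of the covariance matrix (total variance), which is a handy sanity check.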
Perceptron: Simplest neural network — single neuron
Multilayer Perceptron (MLP): an input layer, one or more hidden layers, and an output layer, each fully connected to the next.
Activation Functions:
| Function | Formula | Range | Use |
|---|---|---|---|
| Sigmoid | 1/(1+e⁻ˣ) | (0,1) | Binary output |
| Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | (-1,1) | Hidden layers |
| ReLU | max(0,x) | [0,∞) | Most popular hidden |
| Leaky ReLU | max(0.01x, x) | (-∞,∞) | Fixes dying ReLU |
| Softmax | eˣᵢ/Σⱼeˣʲ | (0,1), sum=1 | Multi-class output |
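The activation functions in the table can be written directly from their formulas:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))          # (0, 1)

def relu(x):
    return max(0.0, x)                      # [0, inf)

def leaky_relu(x):
    return max(0.01 * x, x)                 # small slope for x < 0

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability;
    # the outputs are positive and sum to 1, so they act as probabilities.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```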
Backpropagation: computes the gradient of the loss with respect to every weight by applying the chain rule layer by layer, from the output back toward the input.
Deep Learning: neural networks with many hidden layers that learn hierarchical feature representations automatically.
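Backpropagation can be seen on the smallest possible network, a single sigmoid neuron trained on one sample with squared-error loss. The weights, input, target, and learning rate below are illustrative assumptions:

```python
import math

def forward(w, b, x):
    # One sigmoid neuron: yhat = sigmoid(w*x + b).
    return 1 / (1 + math.exp(-(w * x + b)))

w, b = 0.5, 0.0          # toy initial parameters
x, target = 1.0, 1.0     # one toy training sample
alpha = 0.5              # toy learning rate

before = (forward(w, b, x) - target) ** 2
for _ in range(100):
    yhat = forward(w, b, x)
    # Chain rule: dL/dw = dL/dyhat * dyhat/dz * dz/dw,
    # with dL/dyhat = 2*(yhat - target) and sigmoid'(z) = yhat*(1 - yhat).
    grad = 2 * (yhat - target) * yhat * (1 - yhat)
    w -= alpha * grad * x
    b -= alpha * grad
after = (forward(w, b, x) - target) ** 2
```

A full MLP repeats exactly this chain-rule step per layer, reusing each layer's error term to compute the one before it.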
Confusion Matrix:
             Predicted +   Predicted -
Actual +         TP            FN
Actual -         FP            TN
When to use Precision vs Recall: favor precision when false positives are costly (e.g., flagging legitimate email as spam); favor recall when false negatives are costly (e.g., missing a disease diagnosis).
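The metrics follow directly from the confusion-matrix counts. The counts below are a made-up imbalanced spam example (1000 emails, only 50 truly spam):

```python
# Toy confusion-matrix counts: illustrative, not from the notes.
tp, fn, fp, tn = 10, 40, 5, 945

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)                    # of flagged emails, how many were spam
recall    = tp / (tp + fn)                    # of actual spam, how much was caught
f1        = 2 * precision * recall / (precision + recall)
```

Here accuracy is 95.5% even though the model misses 80% of the spam (recall 0.2), which is exactly the trap described in the 95%-accuracy exam question above.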
Q1 (2023): What is gradient descent and why do we use it? Gradient descent is an optimization algorithm that minimizes a cost function by iteratively moving in the direction of steepest descent (negative gradient). Used because most ML cost functions are too complex for analytical solutions — gradient descent finds minimum numerically.
Q2 (2023): Decision tree for spam detection has accuracy 95%. Is this a good model? Not necessarily. If 95% of emails are not spam, a model that always predicts "not spam" would also achieve 95% accuracy. Use Precision, Recall, F1-Score, and AUC-ROC for better evaluation of imbalanced classification.
Q3 (2022): Explain bias-variance tradeoff with diagram. Bias: error from an oversimplified model (underfitting). Variance: error from an overcomplex model (overfitting). Goal: find the sweet spot with low bias AND low variance, achieved through proper model complexity and regularization.
Supervised: labeled training data with correct answers (regression, classification). Unsupervised: no labels, find patterns on own (clustering, dimensionality reduction).
Overfitting: model memorizes training data but fails on new data (high variance). Prevention: more training data, regularization (L1/L2), dropout, cross-validation, pruning.
Complete Machine Learning notes for B.Tech CS Semester 7 — supervised/unsupervised learning, regression, classification, clustering, neural networks, and model evaluation.
58 pages · 2.9 MB · Updated 2026-03-11