Machine Learning is a subset of AI where systems learn from data to improve performance without being explicitly programmed.
Types of ML: Supervised (learn from labeled data), Unsupervised (find structure in unlabeled data), Reinforcement (learn by trial and error from rewards).
ML Pipeline:
Data Collection → Data Preprocessing → Feature Engineering →
Model Selection → Training → Evaluation → Deployment → Monitoring
Linear Regression: predicts a continuous output from input features.
Simple Linear Regression: y = β₀ + β₁x
Cost Function (MSE): J = (1/2m)Σ(ŷᵢ - yᵢ)²
Gradient Descent: β₁ = β₁ - α·(∂J/∂β₁)
Multiple Linear Regression: y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
Assumptions: Linearity, independence, homoscedasticity, normality of residuals
Regularization: L1 (Lasso) adds λΣ|βⱼ| to the cost and can zero out coefficients; L2 (Ridge) adds λΣβⱼ² and shrinks them. Both penalize large weights to reduce overfitting.
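The cost function and gradient-descent update above can be sketched in plain Python. The toy data, learning rate, and epoch count are illustrative assumptions, not values from the notes:

```python
# Simple linear regression y = b0 + b1*x fit by batch gradient descent.
# Cost J = (1/2m) * sum((yhat_i - y_i)^2); the gradients below follow
# from differentiating J with respect to b0 and b1.

def fit_linear(xs, ys, alpha=0.05, epochs=2000):
    m = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        # Residuals (yhat - y) for the current parameters.
        errs = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives dJ/db0 and dJ/db1.
        grad_b0 = sum(errs) / m
        grad_b1 = sum(e * x for e, x in zip(errs, xs)) / m
        # Update step: beta = beta - alpha * dJ/dbeta.
        b0 -= alpha * grad_b0
        b1 -= alpha * grad_b1
    return b0, b1

# Toy data generated from y = 1 + 2x, so the fit should recover b0≈1, b1≈2.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
b0, b1 = fit_linear(xs, ys)
```

Because the data lie exactly on a line, the iterates converge to the true coefficients; on noisy data they converge to the least-squares fit instead.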
Logistic Regression: despite the name, used for binary classification.
Sigmoid Function: σ(z) = 1/(1+e⁻ᶻ) → output between 0 and 1
Decision boundary: P(y=1|x) ≥ 0.5 → class 1
Cost Function (Binary Cross-Entropy): J = -(1/m)Σ[ylog(ŷ) + (1-y)log(1-ŷ)]
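The sigmoid and binary cross-entropy formulas above translate directly to code; the probe values 0.9 and 0.1 are illustrative:

```python
import math

# Sigmoid squashes any real z into (0, 1), read as P(y=1|x).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Binary cross-entropy: J = -(1/m) * sum(y*log(yhat) + (1-y)*log(1-yhat)).
def bce(y_true, y_pred):
    m = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / m

p = sigmoid(0.0)        # z = 0 sits exactly on the 0.5 decision boundary
# A confident correct prediction is cheap; a confident wrong one is expensive.
low = bce([1], [0.9])
high = bce([1], [0.1])
```

This asymmetry is why cross-entropy, not MSE, is the standard cost for logistic regression: it punishes confident mistakes heavily.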
Decision Trees: tree-like model of decisions, one feature test per internal node.
Splitting Criteria: Gini impurity and entropy/information gain (classification); variance reduction (regression).
CART Algorithm: Builds binary decision tree recursively.
Random Forest: Ensemble of decision trees
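Gini impurity, one of the splitting criteria CART evaluates, is easy to compute by hand. The spam/ham labels here are a made-up toy example:

```python
# Gini impurity of a label set: 1 - sum(p_k^2). 0 means pure, 0.5 is the
# worst case for two balanced classes.
def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# Weighted impurity after a binary split; CART picks the split minimizing this.
def split_impurity(left, right):
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

pure = gini(["spam"] * 4)            # all one class
mixed = gini(["spam", "ham"] * 2)    # perfectly mixed
# A split that separates the classes perfectly has weighted impurity 0.
after = split_impurity(["spam", "spam"], ["ham", "ham"])
```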
Support Vector Machine (SVM): finds the maximum-margin hyperplane separating the classes.
Naive Bayes: based on Bayes' theorem with a naive independence assumption: P(C|features) ∝ P(C) × Πᵢ P(featureᵢ|C)
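The proportionality above can be sketched with a toy spam filter; all the priors and word likelihoods below are invented illustrative numbers:

```python
# Naive Bayes scoring: P(C|features) is proportional to P(C) * product of
# P(feature_i|C). We pick the class with the highest unnormalized score.
priors = {"spam": 0.4, "ham": 0.6}            # toy class priors
likelihood = {                                 # toy P(word | class) values
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham":  {"free": 0.1, "meeting": 0.7},
}

def score(cls, words):
    p = priors[cls]
    for w in words:
        p *= likelihood[cls][w]    # naive independence: just multiply
    return p

best = max(priors, key=lambda c: score(c, ["free"]))
```

Normalizing the scores by their sum would recover the actual posterior probabilities, but the argmax is the same either way.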
Hierarchical Clustering: builds a hierarchy of clusters.
Linkage methods: Single (min dist), Complete (max dist), Average, Ward's
PCA (Principal Component Analysis): reduces dimensionality while preserving maximum variance.
Variance explained: eigenvalue/sum of eigenvalues
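The variance-explained ratio can be checked on a 2×2 covariance matrix, where the eigenvalues have a closed form and no linear-algebra library is needed. The matrix entries are illustrative:

```python
import math

# Eigenvalues of a symmetric 2x2 covariance matrix [[a, b], [b, c]]:
# (a+c)/2 ± sqrt(((a-c)/2)^2 + b^2).
def eig2(a, b, c):
    mean = (a + c) / 2
    d = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean + d, mean - d

# Toy covariance matrix [[4, 2], [2, 2]] (made-up numbers).
l1, l2 = eig2(4.0, 2.0, 2.0)
# Variance explained by each component = eigenvalue / sum of eigenvalues.
explained = [l1 / (l1 + l2), l2 / (l1 + l2)]
```

The ratios always sum to 1, and the eigenvalue sum equals the trace of the covariance matrix (total variance), which is a handy sanity check.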
Perceptron: Simplest neural network — single neuron
Multilayer Perceptron (MLP): an input layer, one or more hidden layers, and an output layer, each fully connected to the next.
Activation Functions:
| Function | Formula | Range | Use |
|---|---|---|---|
| Sigmoid | 1/(1+e⁻ˣ) | (0,1) | Binary output |
| Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | (-1,1) | Hidden layers |
| ReLU | max(0,x) | [0,∞) | Most popular hidden |
| Leaky ReLU | max(0.01x, x) | (-∞,∞) | Fixes dying ReLU |
| Softmax | eˣᵢ/Σⱼeˣʲ | (0,1), sum=1 | Multi-class output |
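The activation functions in the table can be written directly from their formulas:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))          # (0, 1)

def relu(x):
    return max(0.0, x)                      # [0, inf)

def leaky_relu(x):
    return max(0.01 * x, x)                 # small slope for x < 0

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability;
    # the outputs are positive and sum to 1, so they act as probabilities.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```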
Backpropagation: computes the gradient of the loss with respect to every weight by applying the chain rule layer by layer, from the output back toward the input.
Deep Learning: neural networks with many hidden layers that learn hierarchical feature representations automatically.
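Backpropagation can be seen on the smallest possible network, a single sigmoid neuron trained on one sample with squared-error loss. The weights, input, target, and learning rate below are illustrative assumptions:

```python
import math

def forward(w, b, x):
    # One sigmoid neuron: yhat = sigmoid(w*x + b).
    return 1 / (1 + math.exp(-(w * x + b)))

w, b = 0.5, 0.0          # toy initial parameters
x, target = 1.0, 1.0     # one toy training sample
alpha = 0.5              # toy learning rate

before = (forward(w, b, x) - target) ** 2
for _ in range(100):
    yhat = forward(w, b, x)
    # Chain rule: dL/dw = dL/dyhat * dyhat/dz * dz/dw,
    # with dL/dyhat = 2*(yhat - target) and sigmoid'(z) = yhat*(1 - yhat).
    grad = 2 * (yhat - target) * yhat * (1 - yhat)
    w -= alpha * grad * x
    b -= alpha * grad
after = (forward(w, b, x) - target) ** 2
```

A full MLP repeats exactly this chain-rule step per layer, reusing each layer's error term to compute the one before it.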
Confusion Matrix:
             Predicted +   Predicted -
Actual +         TP            FN
Actual -         FP            TN
When to use Precision vs Recall: favor precision when false positives are costly (e.g., flagging legitimate email as spam); favor recall when false negatives are costly (e.g., missing a disease diagnosis).
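The metrics follow directly from the confusion-matrix counts. The counts below are a made-up imbalanced spam example (1000 emails, only 50 truly spam):

```python
# Toy confusion-matrix counts: illustrative, not from the notes.
tp, fn, fp, tn = 10, 40, 5, 945

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)                    # of flagged emails, how many were spam
recall    = tp / (tp + fn)                    # of actual spam, how much was caught
f1        = 2 * precision * recall / (precision + recall)
```

Here accuracy is 95.5% even though the model misses 80% of the spam (recall 0.2), which is exactly the trap described in the 95%-accuracy exam question above.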
Q1 (2023): What is gradient descent and why do we use it? Gradient descent is an optimization algorithm that minimizes a cost function by iteratively moving in the direction of steepest descent (negative gradient). Used because most ML cost functions are too complex for analytical solutions — gradient descent finds minimum numerically.
Q2 (2023): Decision tree for spam detection has accuracy 95%. Is this a good model? Not necessarily. If 95% of emails are not spam, a model that always predicts "not spam" would also achieve 95% accuracy. Use Precision, Recall, F1-Score, and AUC-ROC for better evaluation of imbalanced classification.
Q3 (2022): Explain bias-variance tradeoff with diagram. Bias: error from an oversimplified model (underfitting). Variance: error from an overcomplex model (overfitting). Goal: find the sweet spot with low bias AND low variance, achieved through proper model complexity and regularization.
Supervised: labeled training data with correct answers (regression, classification). Unsupervised: no labels, find patterns on own (clustering, dimensionality reduction).
Overfitting: model memorizes training data but fails on new data (high variance). Prevention: more training data, regularization (L1/L2), dropout, cross-validation, pruning.
Complete Machine Learning notes for B.Tech CS Semester 7 — supervised/unsupervised learning, regression, classification, clustering, neural networks, and model evaluation.
58 pages · 2.9 MB · Updated 2026-03-11