Data Mining
Last Updated : 21 May, 2026
What is Data Mining?
Data Mining is the process of discovering interesting, previously unknown patterns, correlations, anomalies, and insights from large amounts of data stored in databases and data warehouses.
It is part of the larger KDD (Knowledge Discovery in Databases) process.
KDD Process (Knowledge Discovery in Databases)
Data Mining Tasks
1. Classification
Assigns data items to predefined categories based on their attributes.
| Aspect | Description |
|---|---|
| Goal | Given attributes, predict a category label. |
| Input | Age=35, Income=75000, Married=Yes, Owns_Home=No |
| Output | Credit Risk = LOW / MEDIUM / HIGH |
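As a rough illustration of the input-to-label mapping above, here is a minimal 1-nearest-neighbour classifier in plain Python. The training rows, the 0/1 encoding of Married and Owns_Home, and the helper names (`column_ranges`, `scale`, `predict`) are all assumptions made for this sketch; a real credit-risk model would be trained on real data with a proper library.

```python
# Hypothetical labelled customers: (age, income, married, owns_home) -> credit risk.
# These rows are invented for illustration; they are not from a real dataset.
TRAINING = [
    ((25, 20000, 0, 0), "HIGH"),
    ((45, 30000, 1, 0), "HIGH"),
    ((30, 50000, 0, 0), "MEDIUM"),
    ((50, 55000, 1, 1), "MEDIUM"),
    ((38, 80000, 1, 0), "LOW"),
    ((55, 120000, 1, 1), "LOW"),
]


def column_ranges(rows):
    """Return (min, max) per attribute so each attribute can be scaled to [0, 1]."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, ranges):
    """Min-max scale one row; a constant column maps to 0.0."""
    return [
        (value - low) / (high - low) if high > low else 0.0
        for value, (low, high) in zip(row, ranges)
    ]


def predict(query, training=TRAINING):
    """Return the label of the training row closest to `query` (1-nearest neighbour)."""
    ranges = column_ranges([features for features, _ in training])
    scaled_query = scale(query, ranges)

    def squared_distance(features):
        scaled = scale(features, ranges)
        return sum((a - b) ** 2 for a, b in zip(scaled, scaled_query))

    _, label = min(training, key=lambda pair: squared_distance(pair[0]))
    return label


# Age=35, Income=75000, Married=Yes, Owns_Home=No
print(predict((35, 75000, 1, 0)))
```

Scaling matters here: without it, income (tens of thousands) would dominate age (tens) in the distance calculation.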
2. Clustering
Groups data items into clusters such that items in the same cluster are similar and items in different clusters are dissimilar. No predefined labels.
| Aspect | Description |
|---|---|
| Goal | Group data items by similarity (unsupervised). |
| Example | Customer data grouped into three clusters: |
| Cluster 1 | Young, high income, tech-savvy |
| Cluster 2 | Middle-aged, average income, family-oriented |
| Cluster 3 | Senior, low income, low digital engagement |
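One common way to produce groups like these is k-means. The sketch below is a bare-bones version in plain Python; the customer points (age, income in thousands) are invented, and seeding the centroids with the first `k` points is a simplification chosen to keep the run deterministic. Library implementations typically use smarter initialisation such as k-means++.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Return (cluster index per point, final centroids)."""
    centroids = list(points[:k])  # simplistic, deterministic initialisation
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Hypothetical customers as (age, income in thousands); no labels are supplied.
customers = [
    (24, 90), (42, 55), (66, 25),
    (27, 95), (45, 60), (70, 20),
    (29, 85), (48, 50), (72, 30),
]

labels, centers = kmeans(customers, k=3)
for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label + 1)
```

The two attributes here happen to have similar magnitudes; with raw income values they would normally be scaled first.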
3. Association Rule Mining
Discovers interesting relationships (associations) between variables in large datasets.
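| Aspect | Description |
|---|---|
| Goal | Find rules of the form: IF {A, B} THEN {C} |
| Example rule | IF {Bread, Butter} THEN {Milk} |
| Support | 40% of transactions contain all three items (Bread, Butter, Milk) |
| Confidence | 80% of transactions with Bread and Butter also contain Milk |
| Lift | Confidence divided by the overall support of the consequent; a value above 1 means the items co-occur more often than they would if independent |
| Algorithm | Apriori Algorithm |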
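The three measures can be computed directly from a list of transactions. In the sketch below, the ten baskets are hypothetical, constructed only so that the rule {Bread, Butter} → {Milk} reproduces the 40% support and 80% confidence quoted above; the function names are likewise illustrative.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Hypothetical baskets, invented to match the percentages in the table.
baskets = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Milk"},
    {"Milk", "Eggs"},
    {"Bread"},
    {"Butter"},
    {"Eggs"},
]

rule_if, rule_then = {"Bread", "Butter"}, {"Milk"}
print(f"support    = {support(rule_if | rule_then, baskets):.2f}")
print(f"confidence = {confidence(rule_if, rule_then, baskets):.2f}")
print(f"lift       = {lift(rule_if, rule_then, baskets):.2f}")
```

In this made-up data Milk appears in 6 of 10 baskets, so the lift is 0.8 / 0.6, roughly 1.33: buying Bread and Butter makes Milk about a third more likely than its baseline rate.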
4. Regression
Predicts a continuous numerical value based on input attributes.
| Aspect | Description |
|---|---|
| Goal | Predict a numeric output. |
| Input | Size=1500 sqft, Bedrooms=3, Location=Delhi |
| Output | House Price = ₹85,00,000 |
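A minimal sketch of the idea, using ordinary least squares on a single attribute (size) in plain Python. The four training houses are invented and lie exactly on a straight line, so the numbers come out round; Bedrooms and Location from the example above are ignored here and would need a multi-variable model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: size in sqft, price in rupees.
sizes = [1000, 1200, 1800, 2000]
prices = [6_000_000, 7_000_000, 10_000_000, 11_000_000]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 1500 + intercept  # price estimate for a 1500 sqft house
print(f"price = {slope:.0f} * size + {intercept:.0f}")
print(f"predicted price for 1500 sqft: {predicted:,.0f}")
```

With this toy data the fitted line predicts 8,500,000 rupees for 1500 sqft, which is ₹85,00,000 in Indian digit grouping.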
5. Anomaly Detection (Outlier Analysis)
Identifies data points that deviate significantly from normal behavior.
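One simple way to make "deviates significantly" concrete is a z-score test: flag any value that lies more than a chosen number of standard deviations from the mean. The sketch below assumes a threshold of 2.0 and uses invented transaction amounts; both are illustrative choices, and z-scores are only one of many outlier tests.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts; one is far from the rest.
amounts = [52, 48, 50, 51, 49, 50, 500]
print(find_outliers(amounts))
```

Because the outlier itself inflates the mean and standard deviation, this test can miss anomalies in small samples; median-based measures are a common, more robust alternative.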
Decision Tree Example
Apriori Algorithm — Step by Step
Transactions:

| Transaction | Items |
|---|---|
| T1 | {Bread, Butter, Milk} |
| T2 | {Bread, Butter} |
| T3 | {Butter, Milk} |
| T4 | {Bread, Milk} |
| T5 | {Bread, Butter, Milk} |

Itemset support:

| Itemset | Support |
|---|---|
| {Bread} | 4/5 = 80% ✓ |
| {Butter} | 4/5 = 80% ✓ |
| {Milk} | 4/5 = 80% ✓ |
| {Bread, Butter} | 3/5 = 60% ✓ |
| {Bread, Milk} | 3/5 = 60% ✓ |
| {Butter, Milk} | 3/5 = 60% ✓ |
| {Bread, Butter, Milk} | 2/5 = 40% ✗ (below min_support) |

Association rules from {Bread, Butter} (60% support):

| Rule | Confidence |
|---|---|
| Bread → Butter | 3/4 = 75% |
| Butter → Bread | 3/4 = 75% |
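The walkthrough above can be reproduced with a short level-wise implementation. This is a sketch, not a tuned implementation: it uses the five transactions T1 to T5 from the example, and it assumes `MIN_SUPPORT = 0.5`, since the text only shows that the threshold lies above 40% and at or below 60%. The function names are illustrative.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Bread", "Butter", "Milk"},  # T1
    {"Bread", "Butter"},          # T2
    {"Butter", "Milk"},           # T3
    {"Bread", "Milk"},            # T4
    {"Bread", "Butter", "Milk"},  # T5
]
MIN_SUPPORT = 0.5  # assumed value; the text does not state the threshold


def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, built one itemset size at a time."""
    n = len(transactions)
    candidates = [frozenset([item]) for item in sorted(set().union(*transactions))]
    frequent = {}
    while candidates:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for c in candidates:
            s = count(c, transactions) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        size = len(next(iter(level))) + 1 if level else 0
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every k-item subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        ]
    return frequent


def rules_from(itemset, transactions):
    """Yield (antecedent, consequent, confidence) for each split of `itemset`."""
    whole = count(itemset, transactions)
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            yield a, itemset - a, whole / count(a, transactions)


frequent = apriori(TRANSACTIONS, MIN_SUPPORT)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"support = {s:.0%}")

for a, c, conf in rules_from(frozenset({"Bread", "Butter"}), TRANSACTIONS):
    print(sorted(a), "->", sorted(c), f"confidence = {conf:.0%}")
```

The three-item set is generated as a candidate because all of its two-item subsets are frequent, and it is then rejected by the support check (2/5 = 40%), matching the table.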
Data Mining vs Machine Learning vs Statistics
| Field | Focus |
|---|---|
| Statistics | Mathematical framework for inference from data |
| Data Mining | Discovering patterns in DATABASES (large-scale) |
| Machine Learning | Algorithms that LEARN from data to make predictions |
Data Mining Tools
| Tool | Description |
|---|---|
| Weka | Open-source ML/DM toolkit (Java) |
| RapidMiner | Visual DM workflow designer |
| Python (scikit-learn, pandas) | Widely used general-purpose DM/ML library ecosystem |
| R | Statistical computing and DM |
| KNIME | Open-source analytics platform |
| Apache Mahout | Distributed ML on Hadoop |
| SQL with analytic functions | Window functions such as RANK() used with OVER (PARTITION BY ...) |
Exam Focus
Revise definitions, diagrams, examples, and short-answer points for Data Mining.
Interview Use
Prepare one clear explanation, one practical example, and one common mistake for this DBMS topic.