Basic Machine Learning Concepts
1. What is Machine Learning?
Answer:
Machine Learning (ML) is a subset of Artificial Intelligence (AI) where systems learn patterns from data and improve performance over time without being explicitly programmed. Instead of writing step-by-step rules, we give the machine data and let it infer rules or predictions.
2. What are the main types of Machine Learning?
Answer:
- Supervised Learning: The model learns from labeled data (input-output pairs). Example: predicting house prices using Linear Regression.
- Unsupervised Learning: The model finds hidden patterns in unlabeled data. Example: customer segmentation using K-Means clustering.
- Reinforcement Learning: The model learns by interacting with an environment and receiving rewards or penalties. Example: game-playing agents like AlphaGo.
3. Explain Overfitting and Underfitting.
Answer:
- Overfitting – The model fits the training data too closely, including its noise, and performs poorly on unseen data.
- Underfitting – The model is too simple to capture the underlying patterns and performs poorly on both training and test data.
The goal is a model that sits between the two and generalizes well to unseen data. Cross-validation helps detect overfitting, while regularization and (for decision trees) pruning help reduce it.
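The contrast can be seen on a tiny synthetic dataset. The sketch below is illustrative only: the data (a line `y = 2x` with alternating +1/-1 noise), the noise-free test targets, and the three toy models (`fit_mean`, `fit_line`, `fit_nearest`) are assumptions made for this example, not part of any library.

```python
def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    """Too simple: always predicts the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Simple least-squares line with slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    """Too flexible: memorizes the training set (1-nearest neighbor)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Hypothetical data: true relationship y = 2x, training targets carry noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]
test_x = [i + 0.4 for i in range(9)]
test_y = [2 * x for x in test_x]  # noise-free targets, for simplicity

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("nearest", fit_nearest)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:8s} train MSE={results[name][0]:.3f}  test MSE={results[name][1]:.3f}")
```

In this setup the mean predictor has high error on both sets (underfitting), the nearest-neighbor model has zero training error but a larger test error than the line (overfitting), and the line lands in between.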
4. What is a Confusion Matrix?
Answer:
A confusion matrix is a performance evaluation tool for classification models. It shows the counts of:
- True Positives (TP): Actual positives correctly predicted as positive
- False Positives (FP): Actual negatives incorrectly predicted as positive
- True Negatives (TN): Actual negatives correctly predicted as negative
- False Negatives (FN): Actual positives incorrectly predicted as negative
From this, we derive:
- Precision = TP / (TP + FP) – Out of predicted positives, how many are correct.
- Recall = TP / (TP + FN) – Out of actual positives, how many we captured.
- Accuracy = (TP + TN) / (TP + TN + FP + FN) – Overall correctness.
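The formulas above can be checked by hand on a small example. This is a minimal sketch in plain Python; the helper names `confusion_counts` and `precision_recall_accuracy` and the label lists are made up for illustration, and returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def precision_recall_accuracy(tp, fp, tn, fn):
    """Derive the three metrics; 0.0 is returned when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy

# Hypothetical labels: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)                              # (3, 1, 3, 1)
print(precision_recall_accuracy(*counts))  # (0.75, 0.75, 0.75)
```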
5. What is the difference between Classification and Regression?
Answer:
- Classification: Predicts discrete categories (e.g., spam vs. not spam).
- Regression: Predicts continuous values (e.g., house prices).
6. What is the Bias-Variance Tradeoff?
Answer:
- High Bias (Underfitting): Model is too simple and misses important patterns.
- High Variance (Overfitting): Model is too complex and learns noise.
The tradeoff is about balancing bias and variance to achieve good generalization.
7. What is Cross-Validation?
Answer:
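Cross-validation is a resampling method for evaluating models. In k-fold cross-validation, the dataset is split into k folds. The model is trained on k-1 folds and tested on the remaining fold, and this is repeated k times so that each fold serves as the test set exactly once. Averaging the k scores gives a more reliable performance estimate than a single train/test split.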
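The procedure can be sketched in a few lines of plain Python. The helpers `k_fold_indices` and `cross_validate`, the mean-predictor model, and the data are hypothetical names and values chosen for this illustration. The folds here are contiguous and unshuffled to keep the example deterministic; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)

# Toy model (an assumption for this sketch): predict the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mean_absolute_error(model, xs, ys):
    return sum(abs(model - y) for y in ys) / len(ys)

xs = [0, 1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10, 12]
folds = list(k_fold_indices(len(xs), 3))
cv_score = cross_validate(xs, ys, 3, fit_mean, mean_absolute_error)
print(folds[0])   # ([2, 3, 4, 5], [0, 1])
print(cv_score)   # average of the three per-fold errors (6, 1, 6)
```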
8. What are some common Machine Learning algorithms?
Answer:
- Linear Regression (predict continuous values)
- Logistic Regression (binary classification)
- Decision Trees & Random Forests (classification & regression)
- Support Vector Machines (SVM)
- K-Nearest Neighbors (KNN)
- Naïve Bayes
- Neural Networks
9. What is Feature Engineering?
Answer:
Feature engineering is the process of transforming raw data into meaningful features that improve model performance. This can involve:
- Scaling and normalization
- Encoding categorical variables
- Creating interaction features
- Handling missing values
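Each of these steps can be illustrated on a toy column. The following is a minimal sketch with hypothetical helper names (`impute_mean`, `min_max_scale`, `one_hot`) and made-up data. Mean imputation and min-max scaling are only one choice among several for handling missing values and scaling.

```python
def impute_mean(values):
    """Handle missing values: replace None with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(categories):
    """Encoding: one 0/1 column per distinct category, in sorted order."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]

# Hypothetical raw columns.
ages = [20, None, 40, 30]
colors = ["red", "blue", "red", "green"]
heights = [1.0, 2.0, 3.0, 4.0]
widths = [2.0, 2.0, 0.5, 1.0]

imputed = impute_mean(ages)            # [20, 30.0, 40, 30]
scaled = min_max_scale(imputed)        # [0.0, 0.5, 1.0, 0.5]
levels, encoded = one_hot(colors)      # levels: ['blue', 'green', 'red']
area = [h * w for h, w in zip(heights, widths)]  # interaction feature: product of two columns

print(imputed, scaled, levels, encoded, area)
```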
10. What is Regularization?
Answer:
Regularization reduces overfitting by penalizing large model weights. Common techniques:
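- L1 Regularization (Lasso): Penalizes the sum of absolute weights and can shrink some coefficients exactly to zero, which acts as feature selection.
- L2 Regularization (Ridge): Penalizes the sum of squared weights; it shrinks coefficients toward zero but does not eliminate them.
- ElasticNet: Combines both the L1 and L2 penalties.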
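The different behavior of the two penalties shows up even with a single feature. The sketch below assumes a one-weight model `y ≈ w * x` with no intercept and the objective `sum((y - w*x)^2) + l1*|w| + l2*w^2`, for which the minimizer has a closed form. The function names and data are made up for illustration, and libraries often scale the penalty terms differently.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net_weight(xs, ys, l1=0.0, l2=0.0):
    """Minimize sum((y - w*x)^2) + l1*|w| + l2*w^2 for a single weight w.

    l1=0 gives ridge, l2=0 gives lasso, both 0 gives ordinary least squares.
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return soft_threshold(sum_xy, l1 / 2) / (sum_xx + l2)

# Hypothetical data where the unregularized weight is exactly 1.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

print(elastic_net_weight(xs, ys))            # 1.0  (no penalty)
print(elastic_net_weight(xs, ys, l2=14.0))   # 0.5  (ridge: shrunk)
print(elastic_net_weight(xs, ys, l2=28.0))   # still positive: ridge never reaches zero here
print(elastic_net_weight(xs, ys, l1=14.0))   # 0.5  (lasso: shrunk)
print(elastic_net_weight(xs, ys, l1=28.0))   # 0.0  (lasso: eliminated)
```

With a large enough L1 penalty the weight becomes exactly zero, while the L2 penalty only makes it smaller.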