Top 10 Basic Machine Learning Interview Questions

Basic Machine Learning Concepts 


1. What is Machine Learning?

Answer:
Machine Learning (ML) is a subset of Artificial Intelligence (AI) where systems learn patterns from data and improve performance over time without being explicitly programmed. Instead of writing step-by-step rules, we give the machine data and let it infer rules or predictions.


2. What are the main types of Machine Learning?

Answer:

  1. Supervised Learning: The model learns from labeled data (input-output pairs). Example: predicting house prices using Linear Regression.

  2. Unsupervised Learning: The model finds hidden patterns in unlabeled data. Example: customer segmentation using K-Means clustering.

  3. Reinforcement Learning: The model learns by interacting with an environment and receiving rewards or penalties. Example: game-playing agents like AlphaGo.
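A minimal pure-Python sketch of the first two examples, assuming a single numeric feature: `fit_line` learns a slope and intercept from labeled (size, price) pairs by ordinary least squares, and `kmeans_1d` groups unlabeled numbers with the standard assign-then-update k-means loop. The function names, the toy data and the fixed starting centers (chosen only so the result is reproducible) are illustrative assumptions, not a library API.

```python
def fit_line(xs, ys):
    """Supervised: learn y ~ slope * x + intercept from labeled pairs (least squares)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def kmeans_1d(points, centers, iterations=10):
    """Unsupervised: group unlabeled 1-D points around len(centers) cluster centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


# Hypothetical toy data: house sizes with known prices, then unlabeled values.
print(fit_line([50, 80, 100, 120], [150, 240, 300, 360]))  # (3.0, 0.0)
print(kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5]))    # [2.0, 11.0]
```

The supervised function needs the answers (prices) to learn from; the unsupervised one only sees the numbers and discovers the two groups on its own.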


3. Explain Overfitting and Underfitting.

Answer:

  • Overfitting – The model learns the training data too well, including noise, and performs poorly on unseen data.

  • Underfitting – The model is too simple to capture the underlying patterns and performs poorly on both training and test data.

The goal is a model that generalizes well, sitting between these two extremes. Cross-validation helps detect overfitting, while regularization and (for decision trees) pruning help prevent it.
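To make the two failure modes concrete, here is a small sketch on hypothetical data: the true trend is y = 2x, training measurements carry +1/-1 noise, and the "test" set re-measures the same inputs with the noise flipped. A constant model stands in for underfitting, a 1-nearest-neighbour "memorizer" for overfitting, and a least-squares line for a balanced model. All names and data are illustrative assumptions.

```python
# Hypothetical toy data: the true trend is y = 2x, and every measurement
# carries +1 or -1 of noise. The "test" set re-measures the same inputs
# with the noise flipped, so it shares the trend but not the noise.
xs = list(range(10))
noise = [1 if x % 2 == 0 else -1 for x in xs]
train_y = [2 * x + e for x, e in zip(xs, noise)]
test_y = [2 * x - e for x, e in zip(xs, noise)]


def mse(preds, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Underfitting: ignore x and always predict the training mean.
mean_y = sum(train_y) / len(train_y)


def constant_model(x):
    return mean_y


# Overfitting: 1-nearest-neighbour returns the stored training label, noise included.
def memorizer(x):
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return train_y[nearest]


# In between: a least-squares straight line.
mean_x = sum(xs) / len(xs)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, train_y))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x


def line_model(x):
    return intercept + slope * x


for name, model in [("constant", constant_model), ("1-NN", memorizer), ("line", line_model)]:
    preds = [model(x) for x in xs]
    print(f"{name}: train MSE = {mse(preds, train_y):.2f}, test MSE = {mse(preds, test_y):.2f}")
```

The constant model scores 32.00 on training data and 36.00 on test data (poor on both), the memorizer scores a perfect 0.00 on training data but 4.00 on test data, and the line scores about 0.97 and 1.09, generalizing best.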



4. What is a Confusion Matrix?

Answer:
A confusion matrix is a performance evaluation tool for classification models. It shows the counts of:

  • True Positives (TP): Correctly predicted positives

  • False Positives (FP): Actual negatives incorrectly predicted as positive

  • True Negatives (TN): Correctly predicted negatives

  • False Negatives (FN): Actual positives incorrectly predicted as negative

From this, we derive:

  • Precision = TP / (TP + FP) – Out of predicted positives, how many are correct.

  • Recall = TP / (TP + FN) – Out of actual positives, how many we captured.

  • Accuracy = (TP + TN) / (TP + TN + FP + FN) – Overall share of correct predictions.
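The counts and formulas above can be checked by hand with a short sketch. The function names and the labels below are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here (libraries handle that case in different ways).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, TN, FN) for a binary task; `positive` marks the positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall_accuracy(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    # Convention assumed here: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))           # (2, 1, 3, 2)
print(precision_recall_accuracy(y_true, y_pred))  # (0.6666666666666666, 0.5, 0.625)
```

Here the model is fairly precise (2 of its 3 positive calls are right) but misses half of the actual positives, a difference that accuracy alone (0.625) hides.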





5. What is the difference between Classification and Regression?

Answer:

  • Classification: Predicts discrete categories (spam vs. not spam).

  • Regression: Predicts continuous values (house prices).
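One way to see that the difference lies in the output, not the method, is K-Nearest Neighbors, which can do both: it finds the same neighbours and then either takes a majority vote (classification) or an average (regression). This is a sketch with one numeric feature; the function names and the house-size data are hypothetical.

```python
from collections import Counter


def k_nearest(train, query, k):
    """Return the k (x, target) training pairs whose x is closest to `query`."""
    return sorted(train, key=lambda pair: abs(pair[0] - query))[:k]


def knn_classify(train, query, k=3):
    """Classification: majority vote over the neighbours' discrete labels."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: average of the neighbours' continuous values."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)


# Hypothetical toy data: house size -> price band (classification) or price (regression).
bands = [(50, "cheap"), (60, "cheap"), (70, "cheap"),
         (120, "expensive"), (130, "expensive"), (140, "expensive")]
prices = [(50, 150), (60, 180), (70, 210), (120, 360), (130, 390), (140, 420)]

print(knn_classify(bands, 65))  # cheap
print(knn_regress(prices, 65))  # 180.0
```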


6. What is the Bias-Variance Tradeoff?

Answer:

  • High Bias (Underfitting): Model is too simple and misses important patterns.

  • High Variance (Overfitting): Model is too complex and learns noise.

  • The tradeoff: increasing model complexity usually lowers bias but raises variance, so the goal is the level of complexity that gives the lowest error on unseen data.
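For squared-error loss the tradeoff can be written out exactly. Assuming the data follow y = f(x) + ε, where the noise ε has mean zero and variance σ² and is independent of the training set, and writing \hat f for a model trained on a randomly drawn training set, the expected test error at a point x splits into three terms:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(f(x) - \mathbb{E}[\hat f(x)]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Overly simple models inflate the first term, overly complex models inflate the second, and no model can remove the third.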



7. What is Cross-Validation?

Answer:
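Cross-validation is a resampling method for estimating how well a model will generalize. In k-fold cross-validation, the dataset is split into k folds; the model is trained on k-1 folds and tested on the remaining fold, and this is repeated k times so that each fold serves as the test set exactly once. Averaging the k scores gives a more reliable estimate than a single train/test split.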
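The procedure fits in a few lines of Python. This sketch uses contiguous, unshuffled folds and a deliberately trivial "predict the mean" model with MSE as the score; the helper names are hypothetical. In practice the data is usually shuffled first, and scikit-learn provides this ready-made (`KFold`, `cross_val_score`).

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, repeat k times, return the mean score."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / k


def fit_mean(xs, ys):
    """Hypothetical baseline model: always predict the mean training target."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def mse_score(model, xs, ys):
    """Mean squared error of `model` on the held-out points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


print(cross_validate(list(range(6)), [1, 2, 3, 4, 5, 6], k=3, fit=fit_mean, score=mse_score))  # 6.25
```

The three fold scores here are 9.25, 0.25 and 9.25: a single split could have reported either extreme, while the average (6.25) is more representative.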


8. What are some common Machine Learning algorithms?

Answer:

  • Linear Regression (predict continuous values)

  • Logistic Regression (binary classification)

  • Decision Trees & Random Forests (classification & regression)

  • Support Vector Machines (SVM)

  • K-Nearest Neighbors (KNN)

  • Naïve Bayes

  • Neural Networks
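As an illustration of one entry in this list, here is a minimal logistic regression for binary classification with a single feature, trained by plain batch gradient descent on log-loss. The learning rate, epoch count, function names and the hours-studied data are all assumptions for the sketch; real implementations use more robust optimizers and support many features.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # For log-loss, d(loss)/d(logit) simplifies to (predicted probability - label).
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical data: hours studied vs. pass (1) / fail (0).
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]
center = sum(hours) / len(hours)  # centering the input is a common trick that helps gradient descent
w, b = train_logistic([h - center for h in hours], passed)
probs = [sigmoid(w * (h - center) + b) for h in hours]
print([round(p, 2) for p in probs])
print([int(p > 0.5) for p in probs])  # [0, 0, 0, 1, 1, 1]
```

The model outputs a probability, and thresholding it at 0.5 turns it into a class label.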


9. What is Feature Engineering?

Answer:
Feature engineering is the process of transforming raw data into meaningful features that improve model performance. This can involve:

  • Scaling and normalization

  • Encoding categorical variables

  • Creating interaction features

  • Handling missing values
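Each of the four transformations above can be sketched as a small pure-Python function. The names and the three-house dataset are hypothetical. In a real pipeline the statistics (mean, min, max, category list) should be computed on the training data only and reused on test data, and scikit-learn provides equivalents such as `SimpleImputer`, `MinMaxScaler` and `OneHotEncoder`.

```python
def impute_mean(values):
    """Handle missing values: replace None with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Scaling: map values linearly onto [0, 1] (all zeros if every value is equal)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Encoding: one 0/1 column per distinct category, in sorted order."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]


def interaction(a, b):
    """Interaction feature: element-wise product of two numeric columns."""
    return [x * y for x, y in zip(a, b)]


# Hypothetical raw columns for three houses; one size is missing.
size = impute_mean([50, None, 150])         # [50, 100.0, 150]
print(min_max_scale(size))                  # [0.0, 0.5, 1.0]
print(one_hot(["Paris", "Lyon", "Paris"]))  # (['Lyon', 'Paris'], [[0, 1], [1, 0], [0, 1]])
print(interaction(size, [2, 3, 5]))         # [100, 300.0, 750]
```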


10. What is Regularization?

Answer:
Regularization reduces overfitting by adding a penalty on large model weights to the loss function. Common techniques:

  • L1 Regularization (Lasso): Can shrink some coefficients exactly to zero, which acts as feature selection.

  • L2 Regularization (Ridge): Shrinks coefficients toward zero but typically does not eliminate them.

  • ElasticNet: Combines both L1 and L2.
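The L1 vs. L2 difference is easiest to see in a simplified case. Assuming orthonormal features (XᵀX = I), each penalized coefficient has a closed form in terms of the unregularized least-squares coefficient b: minimizing ½‖y − Xw‖² + λ‖w‖₁ gives soft-thresholding, minimizing ½‖y − Xw‖² + (λ/2)‖w‖² gives b / (1 + λ), and combining both penalties gives soft-thresholding followed by that shrinkage. This is an illustration under that assumption, not a general solver: libraries scale λ differently, and correlated features need an iterative method such as coordinate descent.

```python
def lasso_coef(b, lam):
    """L1 (soft-thresholding): shrink by lam, and set to exactly 0 if |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_coef(b, lam):
    """L2: scale every coefficient toward 0; it only reaches 0 if b is already 0."""
    return b / (1.0 + lam)


def elastic_net_coef(b, l1, l2):
    """ElasticNet: soft-threshold (L1 part), then shrink (L2 part)."""
    return lasso_coef(b, l1) / (1.0 + l2)


# Hypothetical unregularized (least-squares) coefficients, assuming orthonormal features.
ols = [3.0, 0.5, -0.2, -2.0]
print([lasso_coef(b, 1.0) for b in ols])             # [2.0, 0.0, 0.0, -1.0]
print([ridge_coef(b, 1.0) for b in ols])             # [1.5, 0.25, -0.1, -1.0]
print([elastic_net_coef(b, 0.5, 1.0) for b in ols])  # [1.25, 0.0, 0.0, -0.75]
```

With the same λ, Lasso zeroes out the two small coefficients while Ridge only halves every coefficient.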


