This document provides guidance on topics and skills you should prepare for the ML Lab exam. The actual exam will contain questions based on these topics. Practice these concepts thoroughly using Google Colab.

1. Data Preprocessing & Handling

What to prepare:

  • Loading datasets (Titanic, Iris, Wine, Breast Cancer, Diabetes, Boston Housing)
  • Checking for missing values and displaying counts
  • Filling missing values using mean/median
  • Splitting data into train-test sets (80-20 split)
  • Understanding train_test_split and random_state
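Below is a minimal sketch of the preprocessing steps above. It assumes pandas and scikit-learn, both available in Colab. The small Titanic-style DataFrame (columns Age, Fare, Pclass, Survived) is hypothetical illustration data, not the real Titanic file. The choice of random_state=42 is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical Titanic-style data with a few missing values (illustration only).
df = pd.DataFrame({
    "Age":      [22.0, 38.0, np.nan, 35.0, np.nan, 54.0, 2.0, 27.0, 14.0, np.nan],
    "Fare":     [7.25, 71.28, 7.92, 53.10, 8.05, np.nan, 21.08, 11.13, 30.07, 16.70],
    "Pclass":   [3, 1, 3, 1, 3, 1, 3, 3, 2, 3],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
})

# 1. Count missing values per column.
print(df.isnull().sum())

# 2. Fill gaps: median for Age (less affected by outliers), mean for Fare.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Fare"] = df["Fare"].fillna(df["Fare"].mean())
print("Missing after filling:", df.isnull().sum().sum())

# 3. 80-20 split; random_state fixes the shuffle so the split is reproducible.
X = df.drop(columns="Survived")
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (8, 3) (2, 3)
```

The built-in scikit-learn loaders (load_iris, load_wine, load_breast_cancer, load_diabetes) return complete data with no missing values. The fill step therefore matters mainly for files such as a Titanic CSV read with pd.read_csv. Note that load_boston was removed in scikit-learn 1.2, so Boston Housing has to be loaded from a separate file.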

Practice Tasks:

  • Work with different datasets and explore their structure
  • Practice handling missing data with various techniques
  • Understand how to properly split datasets

2. Linear Regression

What to prepare:

  • Building simple Linear Regression models
  • Training on regression datasets
  • Displaying actual vs predicted values
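Below is a sketch of a simple regression workflow. It assumes scikit-learn's built-in diabetes dataset and an 80-20 split with random_state=42. Any regression dataset from your assignments can be substituted.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Side-by-side table of actual vs predicted values for the test rows.
comparison = pd.DataFrame({"Actual": y_test, "Predicted": y_pred.round(1)})
print(comparison.head(10))

r2 = r2_score(y_test, y_pred)
print("R^2:", r2)
print("MSE:", mean_squared_error(y_test, y_pred))
```

On scaling: for ordinary least squares with an intercept, standardizing the features changes the coefficients but not the predictions, so R² stays the same. Scaling starts to matter for regularized models such as Ridge and for gradient-based training.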

Practice Tasks:

  • Build regression models on various datasets
  • Understand the impact of scaling on model performance
  • Practice displaying model predictions

3. Logistic Regression

What to prepare:

  • Binary classification using Logistic Regression
  • Calculating accuracy score
  • Displaying confusion matrix
  • Generating classification reports (precision, recall, F1-score)
  • Understanding binary vs multi-class classification
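Below is a sketch of binary classification with Logistic Regression. It assumes scikit-learn's breast cancer dataset. The StandardScaler pipeline and max_iter=1000 are also assumptions, added to avoid solver convergence warnings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # binary target: 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling first helps the solver converge; max_iter is raised for the same reason.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Accuracy:", acc)
print("Confusion matrix:\n", cm)
# Precision, recall and F1-score for each class.
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```

The same approach works on a multi-class dataset such as Iris once the target_names argument is updated or dropped. LogisticRegression handles more than two classes automatically, and the confusion matrix grows to one row and column per class.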

Practice Tasks:

  • Build classification models on different datasets
  • Practice evaluating models with various metrics

4. Decision Trees

What to prepare:

  • Building Decision Tree classifiers
  • Visualizing decision trees using plot_tree
  • Understanding parameters: max_depth, min_samples_leaf, min_samples_split
  • Decision tree pruning
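Below is a sketch contrasting two approaches, assuming the Iris dataset:

  • pre-pruning, which limits growth with max_depth, min_samples_split and min_samples_leaf;
  • post-pruning, which uses scikit-learn's minimal cost-complexity pruning (cost_complexity_pruning_path and the ccp_alpha parameter).

Your course may present a different pruning method, so treat this as one option.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Pre-pruning: stop growth early with depth and sample-count limits.
for depth in [1, 2, 3, None]:
    tree = DecisionTreeClassifier(
        max_depth=depth, min_samples_split=4, min_samples_leaf=2, random_state=42
    )
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: depth={tree.get_depth()}, "
          f"leaves={tree.get_n_leaves()}, test acc={tree.score(X_test, y_test):.3f}")

# Post-pruning: compute the effective alphas of the fully grown tree,
# then refit once per alpha (larger alpha -> smaller tree).
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)
results = []
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    pruned.fit(X_train, y_train)
    results.append((alpha, pruned.get_n_leaves(), pruned.score(X_test, y_test)))

for alpha, leaves, acc in results:
    print(f"ccp_alpha={alpha:.4f}: leaves={leaves}, test acc={acc:.3f}")
```

The largest alpha in the path prunes the tree back to a single leaf. A value somewhere in the middle usually balances tree size against test accuracy.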

Practice Tasks:

  • Build decision trees with different depth settings
  • Practice visualization of tree structures

5. K-Nearest Neighbors (K-NN)

What to prepare:

  • Building K-NN classifiers
  • Testing with different K values (K=3, 5, 7)
  • Plotting accuracy vs K value
  • Calculating precision, recall, and confusion matrix
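Below is a sketch of the K-NN experiment for K = 3, 5, 7. It assumes the Wine dataset and a standardizing pipeline. Precision and recall are macro-averaged because Wine has three classes and the default binary averaging raises an error on multi-class targets.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # 3 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

k_values = [3, 5, 7]
accuracies = []
for k in k_values:
    # K-NN is distance-based, so features are standardized first.
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))
    # average="macro" averages the per-class scores.
    print(f"K={k}: accuracy={accuracies[-1]:.3f}, "
          f"precision={precision_score(y_test, y_pred, average='macro'):.3f}, "
          f"recall={recall_score(y_test, y_pred, average='macro'):.3f}")
    print(confusion_matrix(y_test, y_pred))

# Accuracy vs K plot.
plt.plot(k_values, accuracies, marker="o")
plt.xlabel("K (number of neighbors)")
plt.ylabel("Test accuracy")
plt.title("K-NN accuracy vs K")
plt.xticks(k_values)
plt.show()
```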

Practice Tasks:

  • Experiment with different K values on various datasets
  • Understand how K affects model performance
  • Practice plotting performance metrics

6. Support Vector Machines (SVM)

What to prepare:

  • Building SVM classifiers with linear kernel
  • Building SVM classifiers with RBF kernel
  • Binary classification using SVM
  • Calculating accuracy and confusion matrix
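Below is a sketch comparing linear and RBF kernels on a binary problem. The breast cancer dataset, C=1.0 and the scaling pipeline are assumptions. The RBF kernel keeps scikit-learn's default gamma.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # binary classification
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scores = {}
for kernel in ["linear", "rbf"]:
    # SVMs are sensitive to feature scale, so standardize inside a pipeline.
    svm = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    scores[kernel] = accuracy_score(y_test, y_pred)
    print(f"{kernel} kernel: accuracy={scores[kernel]:.3f}")
    print(confusion_matrix(y_test, y_pred))
```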

Practice Tasks:

  • Work with both linear and RBF kernels
  • Compare performance across different datasets

7. Naive Bayes Classifier

What to prepare:

  • Working with small datasets provided in the exam
  • Making predictions on new instances

Practice Tasks:

Practice with small datasets from various domains:

  • Weather-related predictions
  • Purchase behavior predictions
  • Classification problems with categorical features
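Below is a from-scratch sketch of Naive Bayes with categorical features on a small weather dataset. The 14 rows are the widely used PlayTennis example. The function names train_naive_bayes and predict are hypothetical. Probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter

# Columns: Outlook, Temperature, Humidity, Wind -> PlayTennis
data = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]

def train_naive_bayes(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = Counter()  # key: (attribute index, value, class)
    for row in rows:
        for i, value in enumerate(row[:-1]):
            value_counts[(i, value, row[-1])] += 1
    return class_counts, value_counts

def predict(instance, class_counts, value_counts):
    """Return the class maximizing P(c) * prod_i P(x_i | c), plus all scores."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for i, value in enumerate(instance):
            score *= value_counts[(i, value, c)] / n_c  # likelihood P(x_i | c)
        scores[c] = score
    return max(scores, key=scores.get), scores

class_counts, value_counts = train_naive_bayes(data)
label, scores = predict(("Sunny", "Cool", "High", "Strong"), class_counts, value_counts)
print(scores)               # No ≈ 0.0206, Yes ≈ 0.0053
print("Prediction:", label)  # Prediction: No
```

For the new instance (Sunny, Cool, High, Strong), the two class scores are:

  • No: 5/14 · 3/5 · 1/5 · 4/5 · 3/5 ≈ 0.0206
  • Yes: 9/14 · 2/9 · 3/9 · 3/9 · 3/9 ≈ 0.0053

The prediction is therefore No. If a value never occurs with a class, its count is zero and the whole product becomes zero. The usual fix is Laplace smoothing: add 1 to each count and add the number of possible values to each denominator. Smoothing is not included here.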

8. FIND-S Algorithm

What to prepare:

  • Starting with the most specific hypothesis (all attributes as ‘ϕ’ or null)
  • Updating hypothesis based on positive examples only
  • Ignoring negative examples
  • Using ‘?’ for generalizing attributes
  • Displaying hypothesis after each update
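Below is a from-scratch sketch of FIND-S that follows the steps listed above. The EnjoySport-style data (six attributes, four examples) and the function name find_s are illustrative assumptions. As in the list above, 'ϕ' marks the empty constraint and '?' means "any value".

```python
# Attributes: Sky, AirTemp, Humidity, Wind, Water, Forecast -> EnjoySport
examples = [
    (["Sunny", "Warm", "Normal", "Strong", "Warm", "Same"], "Yes"),
    (["Sunny", "Warm", "High", "Strong", "Warm", "Same"], "Yes"),
    (["Rainy", "Cold", "High", "Strong", "Warm", "Change"], "No"),
    (["Sunny", "Warm", "High", "Strong", "Cool", "Change"], "Yes"),
]

def find_s(examples):
    n = len(examples[0][0])
    h = ["ϕ"] * n  # most specific hypothesis: matches nothing
    print("h0 =", h)
    for step, (x, label) in enumerate(examples, start=1):
        if label == "Yes":  # only positive examples update h
            for i, value in enumerate(x):
                if h[i] == "ϕ":
                    h[i] = value  # first positive example: copy its values
                elif h[i] != value:
                    h[i] = "?"    # conflicting values: generalize
        # Negative examples are ignored, so h is unchanged for them.
        print(f"h{step} =", h)
    return h

final_h = find_s(examples)
# final_h == ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```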

Practice Tasks:

  • Implement the algorithm on different small datasets
  • Practice with 4-6 attribute problems
  • Work with 4-8 training examples
  • Understand hypothesis generalization process

Expected Dataset Types:

  • Weather-related (Sky, Temperature, Humidity, Wind)
  • Sports-related (various conditions)
  • 4-6 attributes with 4-8 training examples

9. CANDIDATE ELIMINATION Algorithm

What to prepare:

  • Maintaining both S (Specific) and G (General) boundaries
  • Updating S boundary for positive examples
  • Updating G boundary for negative examples
  • Displaying version space (S and G) after each example
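Below is a from-scratch sketch of Candidate Elimination for conjunctive hypotheses, using EnjoySport-style data. It rests on two simplifying assumptions:

  • each attribute's domain is taken from the values seen in the training data;
  • S is kept as a single hypothesis, which is enough for conjunctive hypotheses because the minimal generalization is unique.

The function names are hypothetical.

```python
examples = [
    (["Sunny", "Warm", "Normal", "Strong", "Warm", "Same"], "Yes"),
    (["Sunny", "Warm", "High", "Strong", "Warm", "Same"], "Yes"),
    (["Rainy", "Cold", "High", "Strong", "Warm", "Change"], "No"),
    (["Sunny", "Warm", "High", "Strong", "Cool", "Change"], "Yes"),
]

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers."""
    if "ϕ" in h2:  # h2 covers nothing, so anything is at least as general
        return True
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    n = len(examples[0][0])
    # Assumption: attribute domains come from the values seen in the data.
    domains = [list(dict.fromkeys(x[i] for x, _ in examples)) for i in range(n)]
    S = ["ϕ"] * n
    G = [["?"] * n]
    for step, (x, label) in enumerate(examples, start=1):
        if label == "Yes":
            # Drop general hypotheses that fail to cover the positive example.
            G = [g for g in G if covers(g, x)]
            if not G:
                raise ValueError("G is empty: the version space has collapsed")
            # Minimally generalize S (same rule as FIND-S).
            S = [xv if sv == "ϕ" else (sv if sv == xv else "?") for sv, xv in zip(S, x)]
        else:
            if covers(S, x):
                raise ValueError("S covers a negative example: version space is empty")
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)  # g already excludes x
                    continue
                # Minimal specializations of g that exclude x but stay above S.
                for i in range(n):
                    if g[i] != "?":
                        continue
                    for value in domains[i]:
                        if value != x[i]:
                            h = g[:i] + [value] + g[i + 1:]
                            if more_general_or_equal(h, S):
                                new_G.append(h)
            # De-duplicate, then keep only the maximally general members.
            unique = [list(t) for t in dict.fromkeys(tuple(g) for g in new_G)]
            G = [g for g in unique
                 if not any(h != g and more_general_or_equal(h, g) for h in unique)]
        print(f"Step {step}: S = {S}")
        print(f"        G = {G}")
    return S, G

S, G = candidate_elimination(examples)
```

The version space evolves as follows:

  • After the third (negative) example, G splits into three hypotheses.
  • The fourth (positive) example removes the one that requires Forecast = Same.

This leaves S = <Sunny, Warm, ?, Strong, ?, ?> and G = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}.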

Practice Tasks:

  • Work with datasets having both positive and negative examples

Expected Dataset Types:

  • 4-5 attributes
  • 4-6 training examples
  • Mix of positive and negative examples

10. Model Evaluation Metrics

What to prepare:

  • Accuracy score calculation
  • Confusion matrix display and interpretation
  • Precision, Recall, F1-score understanding
  • Classification report generation
  • Understanding true positive, false positive, etc.
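Below is a small worked example linking the confusion matrix to the other metrics. The ten labels are made up for illustration. The point is that accuracy, precision, recall and F1 all follow directly from the four counts TP, FP, FN and TN.

```python
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score)

# Hypothetical labels for a binary problem (1 = positive class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

# For labels [0, 1], scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)  # [[3 2]
           #  [1 4]]

# The same metrics computed by hand from the four counts.
accuracy = (tp + tn) / (tp + tn + fp + fn)         # 7/10 = 0.70
precision = tp / (tp + fp)                          # 4/6 ≈ 0.667
recall = tp / (tp + fn)                             # 4/5 = 0.80
f1 = 2 * precision * recall / (precision + recall)  # 8/11 ≈ 0.727

# They agree with scikit-learn's implementations.
assert abs(accuracy - accuracy_score(y_true, y_pred)) < 1e-9
assert abs(precision - precision_score(y_true, y_pred)) < 1e-9
assert abs(recall - recall_score(y_true, y_pred)) < 1e-9
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-9

print(classification_report(y_true, y_pred))
```

Precision answers "of the predicted positives, how many were right?" (TP / (TP + FP)). Recall answers "of the actual positives, how many were found?" (TP / (TP + FN)).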

Practice Tasks:

  • Work with various evaluation metrics
  • Practice interpreting confusion matrices
  • Understand metric selection based on problem type

11. Visualization Skills

What to prepare:

  • Plotting decision trees

Practice Tasks:

  • Practice different types of visualizations
  • Work with tree visualization functions
  • Create comparison plots for model metrics
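Below is a sketch of the two plot types named above: a decision tree drawn with sklearn.tree.plot_tree, and a bar chart comparing test accuracy across models. The Iris dataset, the three models and their settings are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# 1. Tree visualization with plot_tree (filled=True colors nodes by class).
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
plt.figure(figsize=(12, 6))
plot_tree(tree, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.title("Decision tree (max_depth=3) on Iris")
plt.show()

# 2. Bar chart comparing test accuracy across models.
models = {
    "Decision Tree": tree,
    "K-NN (K=5)": KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train),
    "SVM (RBF)": SVC(kernel="rbf").fit(X_train, y_train),
}
accuracies = {name: m.score(X_test, y_test) for name, m in models.items()}
plt.figure(figsize=(6, 4))
plt.bar(list(accuracies), list(accuracies.values()))
plt.ylim(0, 1.05)
plt.ylabel("Test accuracy")
plt.title("Model comparison on Iris")
plt.show()
```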

Important Points to Remember

  • Some of the lab exam questions are taken from the assignments and presentations you are currently working on.
  • Know how to handle missing values, check data quality, split datasets, and prepare data for modeling.
  • Refer to the datasets mentioned in your assignments and presentations.
  • In the lab exam, we will look for three things:
    • Understanding the question – Read carefully, identify what’s being asked
    • Knowing which algorithm to use – Choose the right tool for the problem
    • Understanding how it works – Not just running code, but knowing what the algorithm does, how data preprocessing impacts results, and how to measure performance using appropriate metrics.

Machine Learning Lab Exam – Answer Template (for your reference)

    Good luck with your preparation! Work hard and you’ll do great!

    Exam Day Preparation Checklist

    ✅ Can you load and preprocess datasets?

    ✅ Can you build Linear Regression models?

    ✅ Can you build Logistic Regression models?

    ✅ Can you build and visualize Decision Trees?

    ✅ Can you implement Decision Tree pruning?

    ✅ Can you build K-NN classifiers with different K values?

    ✅ Can you build SVM classifiers (linear and RBF)?

    ✅ Can you work with Naive Bayes on small datasets?

    ✅ Can you implement the FIND-S algorithm from scratch?

    ✅ Can you implement the CANDIDATE ELIMINATION algorithm from scratch?

    ✅ Can you calculate and display accuracy and a confusion matrix?

    ✅ Can you display classification reports?

    ✅ Can you create basic visualizations?