ML Assignment and Presentation (For 2022 Batch)

These assignments are designed to be more straightforward to implement and present, while still providing valuable learning experiences. They have clearer scopes, use well-established techniques, and work with readily available datasets.

Please read these instructions carefully before proceeding.

1. You need to select ONE assignment out of the 50 assignments listed below.

2. Assignment allotment is on a FIRST COME, FIRST SERVE BASIS. Each student must work on a unique assignment.

3. Once a topic is selected by a student, it will not be available for others.

4. Submit the Assignment Allotment Form as early as possible (DEADLINE: 26-10-2025) to secure your preferred assignment.

ASSIGNMENT ALLOTMENT FORM

5. Check the Assignment Allotment Form (Responses) Google Sheet (link shared in the ML WhatsApp group) to see which assignments are already taken.

6. For assignments with dataset links, you must use the specified dataset. For assignments where example datasets are mentioned, you may use those or similar alternatives. If no dataset is specified, you may select an appropriate dataset based on your assignment question.

7. See the Deliverables and Submission Guidelines sections after the Assignment List.

Assignment List

1. Iris Flower Classification Comparison

Implement and compare three different classification algorithms (e.g., Decision Tree, k-NN, Logistic Regression) on the classic Iris flower dataset. Focus on visualizing decision boundaries and comparing model performance metrics.

2. Boston Housing Price Prediction

Implement a linear regression model to predict housing prices using the Boston Housing dataset. Apply feature scaling, evaluate using appropriate metrics, and visualize the importance of different features.

3. Digit Recognition with k-Nearest Neighbors

Implement a k-NN classifier for recognizing handwritten digits using the MNIST dataset. Experiment with different values of k and distance metrics to optimize performance.
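
As a starting point only (not a required design), the idea of trying several values of k and more than one distance metric can be sketched in plain Python. The toy 2-D points, the labels, and the function names below are illustrative assumptions; a real submission would use flattened MNIST images and would likely rely on a library implementation.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Toy 2-D points standing in for flattened digit images (illustrative data only).
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["zero", "zero", "zero", "one", "one", "one"]

# Compare predictions across several k values and both metrics.
for k in (1, 3, 5):
    for metric in (euclidean, manhattan):
        label = knn_predict(train_X, train_y, (1, 1), k=k, distance=metric)
        print(k, metric.__name__, label)
```

On the real dataset, the same loop would record accuracy on a held-out validation split for each (k, metric) pair rather than printing a single prediction.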

4. Sentiment Analysis with Naive Bayes

Create a simple sentiment classifier using the Naive Bayes algorithm on movie reviews or product reviews. Compare the performance of different text preprocessing techniques.

Datasets: e.g., IMDB, Amazon reviews

5. Customer Segmentation with k-Means

Apply k-means clustering to segment customers based on purchase behavior. Visualize the clusters and interpret what each customer segment represents.

Datasets: e.g., Online Retail dataset

6. Wine Quality Classification

Build a classification model to predict wine quality based on physicochemical properties. Compare the performance of two different algorithms of your choice.

Datasets: e.g., UCI Wine Quality dataset

7. Credit Card Fraud Detection

Implement a simple model to detect fraudulent credit card transactions. Focus on handling the class imbalance problem using techniques such as undersampling or oversampling.

Datasets: e.g., Simplified/sampled versions of fraud datasets
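
To make the two resampling techniques concrete, here is a minimal sketch of random oversampling and random undersampling for a binary problem. The helper names, the toy rows, and the "fraud"/"legit" labels are assumptions made for illustration, and the sketch assumes the minority class really is the smaller one.

```python
import random

def split_by_class(rows, labels, minority):
    """Separate (row, label) pairs into minority and majority groups."""
    small = [(r, y) for r, y in zip(rows, labels) if y == minority]
    large = [(r, y) for r, y in zip(rows, labels) if y != minority]
    return small, large

def random_oversample(rows, labels, minority, seed=0):
    """Duplicate randomly chosen minority pairs until both classes are equal in size."""
    rng = random.Random(seed)
    small, large = split_by_class(rows, labels, minority)
    extra = [rng.choice(small) for _ in range(len(large) - len(small))]
    return large + small + extra

def random_undersample(rows, labels, minority, seed=0):
    """Keep a random subset of the majority class equal in size to the minority class."""
    rng = random.Random(seed)
    small, large = split_by_class(rows, labels, minority)
    return rng.sample(large, len(small)) + small

# Toy data: 8 legitimate transactions and 2 fraudulent ones (illustrative only).
rows = [[amount] for amount in (10, 12, 9, 11, 13, 8, 14, 10, 950, 1200)]
labels = ["legit"] * 8 + ["fraud"] * 2

over = random_oversample(rows, labels, minority="fraud")
under = random_undersample(rows, labels, minority="fraud")
print(len(over), len(under))
```

Resampling should be applied to the training split only, so that the test set keeps the original class ratio and the reported metrics remain realistic.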

8. Titanic Survival Prediction

Create a model to predict passenger survival on the Titanic. Perform feature engineering on passenger attributes and compare the performance of different classifiers.

9. Weather Prediction with Decision Trees

Build a simple weather prediction model using decision trees to predict whether it will rain tomorrow based on today’s weather conditions.

10. Fake News Detection

Implement a basic fake news detector using a simple text classification approach and a classifier of your choice.

11. Movie Recommendation with Simple Collaborative Filtering

Implement a basic user-based or item-based collaborative filtering system for movie recommendations using a small subset of a movie ratings dataset.

Datasets: e.g., Small subset of MovieLens dataset
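
A user-based approach can be sketched as follows, assuming ratings are held in a nested dictionary. The user names, movie titles, and function names are made up for illustration, and computing cosine similarity over co-rated movies only is one simplification among several possible choices.

```python
import math

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob":   {"Movie A": 4, "Movie B": 5, "Movie D": 5},
    "carol": {"Movie A": 1, "Movie C": 5, "Movie D": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the movies both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in shared))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in shared))
    return dot / (norm_u * norm_v)

def predict_rating(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for the given movie."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        numerator += sim * their_ratings[movie]
        denominator += abs(sim)
    return numerator / denominator if denominator else None

print(predict_rating("alice", "Movie D", ratings))
```

An item-based variant would apply the same similarity function to columns (movies) instead of rows (users).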

12. Student Performance Prediction

Build a regression model to predict student performance based on demographic and study habit features. Compare the performance of linear regression and one non-linear algorithm.

Datasets: e.g., UCI Student Performance dataset

13. Diabetes Prediction with Ensemble Methods

Use ensemble methods (Random Forest or Gradient Boosting) to predict diabetes diagnosis based on diagnostic measurements.

Datasets: e.g., Pima Indians Diabetes dataset

14. Email Spam Classification

Build a simple spam filter using text features and a basic classifier like Naive Bayes or Logistic Regression.

Datasets: e.g., Email spam datasets

15. Stock Price Trend Prediction with Moving Averages

Implement a simple stock trend prediction system using moving averages and technical indicators. Focus on a single stock or index for simplicity.

16. Customer Churn Prediction

Build a model to predict customer churn using a telecommunications or banking dataset.

17. Credit Risk Assessment

Develop a credit scoring model to predict loan default risk. Implement a classification algorithm with appropriate handling of class imbalance.

Datasets: e.g., German Credit dataset

18. Market Basket Analysis with Association Rules

Implement association rule mining (Apriori algorithm) to discover patterns in transaction data and identify items frequently purchased together.
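
The core of Apriori is that a larger itemset is only worth counting if all of its smaller subsets are already frequent. The sketch below illustrates that pruning step together with support and confidence on a handful of made-up transactions; the function names and the 0.5 support threshold are assumptions for illustration, not requirements.

```python
from itertools import combinations

# Illustrative transactions; each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search that returns {itemset: support} for all frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while current:
        kept = {c: support(c, transactions) for c in current}
        kept = {c: s for c, s in kept.items() if s >= min_support}
        frequent.update(kept)
        size += 1
        # Apriori pruning: a candidate survives only if all its subsets were kept.
        survivors = sorted({i for c in kept for i in c})
        current = [
            frozenset(c)
            for c in combinations(survivors, size)
            if all(frozenset(sub) in kept for sub in combinations(c, size - 1))
        ]
    return frequent

freq = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in freq.items():
    print(sorted(itemset), s)
print(confidence(frozenset({"bread"}), frozenset({"milk"}), transactions))
```

Rules would then be generated from the frequent itemsets by keeping only those whose confidence (and, optionally, lift) exceeds a chosen threshold.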

19. Text Classification with Bag-of-Words

Implement a simple text classifier using the Bag-of-Words approach and a classifier of your choice. Apply to news categorization or topic classification.

Datasets: e.g., 20 Newsgroups
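
For orientation, the Bag-of-Words representation simply maps each document to a vector of word counts over a fixed vocabulary. The sketch below uses only the standard library; the tokenization rule, the two sample sentences, and the function names are illustrative assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(documents):
    """Sorted list of every distinct token seen in the documents."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})

def vectorize(text, vocabulary):
    """Count how often each vocabulary word occurs in text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

# Two made-up headlines standing in for news articles.
docs = ["The team won the match", "Stocks fell as the market closed"]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
print(vocab)
print(vectors)
```

The resulting count vectors can be passed to any classifier; word order is discarded, which is the main limitation worth discussing in the report.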

20. Sales Forecasting with Time Series

Implement simple time series forecasting methods (moving average, exponential smoothing) to predict future sales for a retail company.

Datasets: e.g., Superstore Sales dataset
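
The two methods named above can be sketched in a few lines. This is a minimal illustration assuming a plain list of past sales figures; the numbers, the window size, and the smoothing factor alpha are arbitrary choices made for the example. Simple exponential smoothing updates a running level as level = alpha * value + (1 - alpha) * level and uses the final level as the next-period forecast.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * value + (1 - alpha) * level
    return level

# Illustrative monthly sales figures (not real data).
sales = [100, 120, 110, 130, 125, 140]

ma_forecast = moving_average_forecast(sales, window=3)
ses_forecast = exponential_smoothing_forecast(sales, alpha=0.5)
print(ma_forecast, ses_forecast)
```

A larger alpha makes the forecast react faster to recent changes, while a smaller alpha produces a smoother, slower-moving forecast; comparing a few values against held-out months is a reasonable experiment for the report.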

21. Employee Attrition Analysis

Build a model to predict employee attrition using HR analytics data. Focus on identifying the most important factors contributing to attrition.

Datasets: e.g., IBM HR Analytics dataset

22. Income Level Prediction

Develop a model to predict income levels (above/below threshold) based on demographic and employment data using the Adult/Census Income dataset.

23. Online Ad Click-Through Rate Prediction

Build a model to predict whether a user will click on an advertisement based on user and ad features.

24. Restaurant Revenue Prediction

Create a regression model to predict restaurant revenue based on location, demographics, and restaurant characteristics.

Datasets: e.g., TFI Restaurant Revenue dataset

25. Song Genre Classification using Audio Features

Predict music genres using audio features extracted from songs (without using raw audio data). Focus on features like tempo, energy, danceability, etc.

Datasets: e.g., Spotify features dataset

26. Product Recommendation with Content-Based Filtering

Implement a simple content-based recommendation system that suggests products based on item features and user preferences.

27. Song Popularity Prediction

Develop a regression model to predict the popularity of songs based on audio features and metadata.

Datasets: e.g., Spotify features dataset

28. Call Center Volume Forecasting

Build a time series forecasting model to predict call center volume by hour or day to help with staff scheduling.

Datasets: e.g., Call center datasets

29. Network Intrusion Detection

Implement a classification model to detect network intrusions or anomalous network behavior using the KDD Cup 99 dataset or a simplified version.

30. Hospital Readmission Prediction

Build a model to predict which patients are likely to be readmitted to a hospital within 30 days after discharge.

Datasets: e.g., Healthcare datasets

31. Rental Price Prediction

Create a regression model to predict rental prices based on property features and location data.

Datasets: e.g., Housing/rental datasets

32. Email Campaign Response Prediction

Develop a model to predict which customers will respond to an email marketing campaign based on customer characteristics and past behavior.

Datasets: e.g., Marketing datasets

33. Air Quality Prediction

Build a regression model to predict air quality (PM2.5 or Air Quality Index) based on weather conditions and time variables.

Dataset: UCI Air Quality Dataset

34. Flight Delay Prediction

Develop a model to predict flight delays based on flight information, weather conditions, and airport data.

Dataset: Bureau of Transportation Statistics or Kaggle Flight Delays

35. Bike Sharing Demand Prediction

Create a regression model to predict hourly bike rental demand based on weather and seasonal factors.

Dataset: UCI Bike Sharing Dataset

36. Water Potability Analysis

Develop a classification model to predict whether water is safe for drinking based on various quality metrics.

Dataset: Water Potability Dataset

37. Supermarket Sales Analysis

Build a sales forecasting model for a supermarket chain, predicting sales by product category or store.

Dataset: Supermarket Sales Dataset

38. Heart Disease Prediction

Create a classification model to predict the presence of heart disease based on patient attributes.

Dataset: UCI Heart Disease Dataset

39. Student Academic Performance Prediction

Develop a model to predict student performance based on demographic, social, and academic factors.

Dataset: Student Performance Dataset

40. Car Price Prediction

Build a regression model to predict used car prices based on features like make, model, year, mileage, and other specifications.

Dataset: Used Cars Dataset (simplified version)

41. Telecom Customer Churn Analysis

Create a classification model to identify customers likely to leave a telecom service provider.

Dataset: Telco Customer Churn

42. Online News Popularity Prediction

Develop a model to predict how popular an online article will be based on its content and metadata.

Dataset: Online News Popularity

43. Mushroom Edibility Classification

Build a classifier to determine whether a mushroom is edible or poisonous based on its physical characteristics.

Dataset: UCI Mushroom Dataset

44. Diabetes Progression Prediction

Create a regression model to predict disease progression in diabetes patients based on diagnostic measurements.

Dataset: Diabetes Dataset

45. Credit Card Approval Prediction

Develop a classification model to predict whether a credit card application will be approved based on applicant information.

Dataset: Credit Card Approval Dataset

46. Job Satisfaction Prediction

Build a model to predict employee job satisfaction based on various workplace and personal factors.

Dataset: Job Satisfaction Dataset

47. Mobile App Rating Prediction

Create a regression model to predict mobile app ratings based on app features and metadata.

Dataset: Mobile App Store Dataset

48. Concrete Strength Prediction

Build a regression model to predict the compressive strength of concrete based on its components and age.

Dataset: Concrete Compressive Strength

49. Traffic Accident Severity Analysis

Develop a model to predict the severity of traffic accidents based on location, time, and weather conditions.

Dataset: UK Road Safety Data

50. Fertilizer Recommendation System

Create a classification model to recommend the type of fertilizer based on soil characteristics and crop type.

Dataset: Fertilizer Prediction Dataset

Deliverables

Each student must submit the following:

1. Project Code

Well-documented Python code with clear instructions for execution
Include all necessary files to reproduce your results
Use comments to explain your implementation decisions
Submit code as Jupyter/Google Colab notebooks (.ipynb), along with the dataset files

2. Technical Report (PDF format, 5-10 pages; a hard copy is also required for the Mid II Assignment)

Abstract: Brief summary of the project (150-250 words)
Introduction: Problem statement and background
Methodology: Detailed explanation of your approach
Implementation: Key aspects of your code implementation
Experiments: Dataset description, preprocessing steps, and evaluation metrics
Results: Presentation and analysis of results with visualizations
Discussion: Interpretation of results and limitations
Conclusion: Summary of findings and future work
References: List of all sources cited

3. Video Presentation (5-8 minutes)

Introduction to the problem and its significance
Brief overview of the dataset used
Explanation of your methodology and implementation
Demonstration of key results with visualizations
Discussion of challenges faced and solutions implemented
Conclusion and potential improvements
Format: MP4 file

Submission Guidelines

All deliverables must be submitted through the submission link.
Name your files as: RollNumber_AssignmentNumber_ItemName (e.g., 22251A17XX_AssignmentXX_Report.pdf)
This assignment must represent your own individual work. While you may discuss general concepts with your classmates, the implementation, report, and presentation must be entirely your own.
You may use AI tools for assistance.
Deliverables Submission Deadline: 02-11-2025.

Good luck with your assignments and presentation!

Submission Link

ML ASSIGNMENT & PRESENTATION SUBMISSION

CONSENT FORM (for uploading the best assignment presentation videos to the CHANDRAS EDU YouTube channel) [Optional]