These assignments are designed to be more straightforward to implement and present, while still providing valuable learning experiences. They have clearer scopes, use well-established techniques, and work with readily available datasets.
Please read these instructions carefully before proceeding.
1. You need to select ONE assignment out of the 50 assignments listed below.
2. Assignment allotment is on a FIRST-COME, FIRST-SERVED BASIS. Each student must work on a unique assignment.
3. Once a topic is selected by a student, it will not be available for others.
4. Submit the Assignment Allotment Form as early as possible (DEADLINE: 26-10-2025) to secure your preferred assignment.
5. Check the Assignment Allotment Form (Responses) Google Sheet (link shared in the ML WhatsApp group) to see which assignments are already taken.
6. For assignments with dataset links, you must use the specified dataset. For assignments where example datasets are mentioned, you may use those or similar alternatives. If no dataset is specified, you may select an appropriate dataset based on your assignment question.
7. See the Deliverables and Submission Guidelines sections after the Assignment List.
Assignment List
1. Iris Flower Classification Comparison
Implement and compare three different classification algorithms (e.g., Decision Tree, k-NN, Logistic Regression) on the classic Iris flower dataset. Focus on visualizing decision boundaries and comparing model performance metrics.
2. Boston Housing Price Prediction
Implement a linear regression model to predict housing prices using the Boston Housing dataset. Apply feature scaling, evaluate using appropriate metrics, and visualize the importance of different features.
3. Digit Recognition with k-Nearest Neighbors
Implement a k-NN classifier for recognizing handwritten digits using the MNIST dataset. Experiment with different values of k and distance metrics to optimize performance.
4. Sentiment Analysis with Naive Bayes
Create a simple sentiment classifier using the Naive Bayes algorithm on movie reviews or product reviews. Compare the performance of different text preprocessing techniques.
Datasets: e.g., IMDB, Amazon reviews
5. Customer Segmentation with k-Means
Apply k-means clustering to segment customers based on purchase behavior. Visualize the clusters and interpret what each customer segment represents.
Datasets: e.g., Online Retail dataset
6. Wine Quality Classification
Build a classification model to predict wine quality based on physicochemical properties. Compare the performance of two different algorithms of your choice.
Datasets: e.g., UCI Wine Quality dataset
7. Credit Card Fraud Detection
Implement a simple model to detect fraudulent credit card transactions. Focus on handling the class imbalance problem using techniques such as undersampling and oversampling.
Datasets: e.g., Simplified/sampled versions of fraud datasets
8. Titanic Survival Prediction
Create a model to predict passenger survival on the Titanic. Perform feature engineering on passenger attributes and compare the performance of different classifiers.
9. Weather Prediction with Decision Trees
Build a simple weather prediction model using decision trees to predict whether it will rain tomorrow based on today’s weather conditions.
10. Fake News Detection
Implement a basic fake news detector using a simple text classification approach and a classifier of your choice.
11. Movie Recommendation with Simple Collaborative Filtering
Implement a basic user-based or item-based collaborative filtering system for movie recommendations using a small subset of a movie ratings dataset.
Datasets: e.g., Small subset of MovieLens dataset
12. Student Performance Prediction
Build a regression model to predict student performance based on demographic and study habit features. Compare the performance of linear regression and one non-linear algorithm.
Datasets: e.g., UCI Student Performance dataset
13. Diabetes Prediction with Ensemble Methods
Use ensemble methods (Random Forest or Gradient Boosting) to predict diabetes diagnosis based on diagnostic measurements.
Datasets: e.g., Pima Indians Diabetes dataset
14. Email Spam Classification
Build a simple spam filter using text features and a basic classifier like Naive Bayes or Logistic Regression.
Datasets: e.g., Email spam datasets
15. Stock Price Trend Prediction with Moving Averages
Implement a simple stock trend prediction system using moving averages and technical indicators. Focus on a single stock or index for simplicity.
16. Customer Churn Prediction
Build a model to predict customer churn using a telecommunications or banking dataset.
17. Credit Risk Assessment
Develop a credit scoring model to predict loan default risk. Implement a classification algorithm with appropriate handling of class imbalance.
Datasets: e.g., German Credit dataset
18. Market Basket Analysis with Association Rules
Implement association rule mining (Apriori algorithm) to discover patterns in transaction data and identify items frequently purchased together.
19. Text Classification with Bag-of-Words
Implement a simple text classifier using the Bag-of-Words approach and a classifier of your choice. Apply to news categorization or topic classification.
Datasets: e.g., 20 Newsgroups
20. Sales Forecasting with Time Series
Implement simple time series forecasting methods (moving average, exponential smoothing) to predict future sales for a retail company.
Datasets: e.g., Superstore Sales dataset
21. Employee Attrition Analysis
Build a model to predict employee attrition using HR analytics data. Focus on identifying the most important factors contributing to attrition.
Datasets: e.g., IBM HR Analytics dataset
22. Income Level Prediction
Develop a model to predict income levels (above/below threshold) based on demographic and employment data using the Adult/Census Income dataset.
23. Online Ad Click-Through Rate Prediction
Build a model to predict whether a user will click on an advertisement based on user and ad features.
24. Restaurant Revenue Prediction
Create a regression model to predict restaurant revenue based on location, demographics, and restaurant characteristics.
Datasets: e.g., TFI Restaurant Revenue dataset
25. Song Genre Classification using Audio Features
Predict music genres using audio features extracted from songs (without using raw audio data). Focus on features like tempo, energy, danceability, etc.
Datasets: e.g., Spotify features dataset
26. Product Recommendation with Content-Based Filtering
Implement a simple content-based recommendation system that suggests products based on item features and user preferences.
27. Song Popularity Prediction
Develop a regression model to predict the popularity of songs based on audio features and metadata.
Datasets: e.g., Spotify features dataset
28. Call Center Volume Forecasting
Build a time series forecasting model to predict call center volume by hour or day to help with staff scheduling.
Datasets: e.g., Call center datasets
29. Network Intrusion Detection
Implement a classification model to detect network intrusions or anomalous network behavior using the KDD Cup 99 dataset or a simplified version.
30. Hospital Readmission Prediction
Build a model to predict which patients are likely to be readmitted to a hospital within 30 days after discharge.
Datasets: e.g., Healthcare datasets
31. Rental Price Prediction
Create a regression model to predict rental prices based on property features and location data.
Datasets: e.g., Housing/rental datasets
32. Email Campaign Response Prediction
Develop a model to predict which customers will respond to an email marketing campaign based on customer characteristics and past behavior.
Datasets: e.g., Marketing datasets
33. Air Quality Prediction
Build a regression model to predict air quality (PM2.5 or Air Quality Index) based on weather conditions and time variables.
Dataset: UCI Air Quality Dataset
34. Flight Delay Prediction
Develop a model to predict flight delays based on flight information, weather conditions, and airport data.
Dataset: Bureau of Transportation Statistics or Kaggle Flight Delays
35. Bike Sharing Demand Prediction
Create a regression model to predict hourly bike rental demand based on weather and seasonal factors.
Dataset: UCI Bike Sharing Dataset
36. Water Potability Analysis
Develop a classification model to predict whether water is safe for drinking based on various quality metrics.
Dataset: Water Potability Dataset
37. Supermarket Sales Analysis
Build a sales forecasting model for a supermarket chain, predicting sales by product category or store.
Dataset: Supermarket Sales Dataset
38. Heart Disease Prediction
Create a classification model to predict the presence of heart disease based on patient attributes.
Dataset: UCI Heart Disease Dataset
39. Student Academic Performance Prediction
Develop a model to predict student performance based on demographic, social, and academic factors.
Dataset: Student Performance Dataset
40. Car Price Prediction
Build a regression model to predict used car prices based on features like make, model, year, mileage, and other specifications.
Dataset: Used Cars Dataset (simplified version)
41. Telecom Customer Churn Analysis
Create a classification model to identify customers likely to leave a telecom service provider.
Dataset: Telco Customer Churn
42. Online News Popularity Prediction
Develop a model to predict how popular an online article will be based on its content and metadata.
Dataset: Online News Popularity
43. Mushroom Edibility Classification
Build a classifier to determine whether a mushroom is edible or poisonous based on its physical characteristics.
Dataset: UCI Mushroom Dataset
44. Diabetes Progression Prediction
Create a regression model to predict disease progression in diabetes patients based on diagnostic measurements.
Dataset: Diabetes Dataset
45. Credit Card Approval Prediction
Develop a classification model to predict whether a credit card application will be approved based on applicant information.
Dataset: Credit Card Approval Dataset
46. Job Satisfaction Prediction
Build a model to predict employee job satisfaction based on various workplace and personal factors.
Dataset: Job Satisfaction Dataset
47. Mobile App Rating Prediction
Create a regression model to predict mobile app ratings based on app features and metadata.
Dataset: Mobile App Store Dataset
48. Concrete Strength Prediction
Build a regression model to predict the compressive strength of concrete based on its components and age.
Dataset: Concrete Compressive Strength
49. Traffic Accident Severity Analysis
Develop a model to predict the severity of traffic accidents based on location, time, and weather conditions.
Dataset: UK Road Safety Data
50. Fertilizer Recommendation System
Create a classification model to recommend the type of fertilizer based on soil characteristics and crop type.
Dataset: Fertilizer Prediction Dataset
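As an illustration of the level of implementation expected, the two simple forecasting methods named in Assignments 15 and 20 (moving average and exponential smoothing) can be sketched in plain Python as shown here. This is a minimal sketch, not a required template: the function names, the `alpha` value, and the `monthly_sales` figures are made-up assumptions for demonstration, and your own submission should use a real dataset.

```python
from typing import List, Sequence


def moving_average_forecast(series: Sequence[float], window: int) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window


def exponential_smoothing(series: Sequence[float], alpha: float) -> List[float]:
    """Return the smoothed level after each observation.

    Uses level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with the first
    level set to the first observation. The last level can be used as the
    one-step-ahead forecast.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    levels = [float(series[0])]
    for y in series[1:]:
        levels.append(alpha * y + (1.0 - alpha) * levels[-1])
    return levels


# Hypothetical monthly sales figures, for illustration only.
monthly_sales = [120.0, 130.0, 125.0, 140.0, 150.0, 145.0]

ma_forecast = moving_average_forecast(monthly_sales, window=3)
es_forecast = exponential_smoothing(monthly_sales, alpha=0.5)[-1]

print("3-period moving average forecast:", ma_forecast)
print("Exponential smoothing forecast (alpha=0.5):", es_forecast)
```

A smaller `alpha` gives a smoother forecast that reacts slowly to recent changes; a larger `alpha` tracks recent observations more closely. Comparing a few settings on held-out data is one reasonable way to choose.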
Deliverables
Each student must submit the following:
1. Project Code
- Well-documented Python code with clear instructions for execution
- Include all necessary files to reproduce your results
- Use comments to explain your implementation decisions
- Submit code as Jupyter/Google Colab notebooks (.ipynb), along with the dataset files
2. Technical Report (PDF format, 5-10 pages) (a hard copy is also required for the Mid II Assignment)
- Abstract: Brief summary of the project (150-250 words)
- Introduction: Problem statement and background
- Methodology: Detailed explanation of your approach
- Implementation: Key aspects of your code implementation
- Experiments: Dataset description, preprocessing steps, and evaluation metrics
- Results: Presentation and analysis of results with visualizations
- Discussion: Interpretation of results and limitations
- Conclusion: Summary of findings and future work
- References: List of all sources cited
3. Video Presentation (5-8 minutes)
- Introduction to the problem and its significance
- Brief overview of the dataset used
- Explanation of your methodology and implementation
- Demonstration of key results with visualizations
- Discussion of challenges faced and solutions implemented
- Conclusion and potential improvements
- Format: MP4 file
Submission Guidelines
- All deliverables must be submitted through the submission link.
- Name your files as: RollNumber_AssignmentNumber_ItemName (e.g., 22251A17XX_AssignmentXX_Report.pdf)
- This assignment must represent your own individual work. While you may discuss general concepts with your classmates, the implementation, report, and presentation must be entirely your own.
- You may use AI tools for assistance.
- Deliverables Submission Deadline: 02-11-2025.
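To avoid naming mistakes, the file naming rule above (RollNumber_AssignmentNumber_ItemName) can be generated with a small helper like the one sketched here. The helper name `build_filename`, the two-digit zero-padding of the assignment number, the item names other than "Report", and the roll number used in the example are all assumptions for illustration; follow the instructor's example if it differs.

```python
def build_filename(roll_number: str, assignment_number: int,
                   item_name: str, extension: str) -> str:
    """Build a file name of the form RollNumber_AssignmentNN_ItemName.ext.

    Zero-padding the assignment number to two digits is an assumption
    based on the "AssignmentXX" pattern in the guidelines.
    """
    if not 1 <= assignment_number <= 50:
        raise ValueError("assignment_number must be between 1 and 50")
    # Accept the extension with or without a leading dot.
    ext = extension if extension.startswith(".") else "." + extension
    return f"{roll_number}_Assignment{assignment_number:02d}_{item_name}{ext}"


# Hypothetical roll number and assignment number, for illustration only.
for item, ext in [("Code", "ipynb"), ("Report", "pdf"), ("Video", "mp4")]:
    print(build_filename("22251A1701", 7, item, ext))
```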
Good luck with your assignments and presentation!
Submission Link
- CONSENT FORM (for uploading the best assignment presentation videos to the CHANDRAS EDU YouTube channel) [Optional]