Predicting the Quality of Wine

In Spring 2020, I participated in a semester-long Data Science course offered by Forge (formerly HackCville). Throughout this course, I developed data science skills such as data cleaning, machine learning and web scraping in Python.

The second project for this course was a machine learning project focused on predicting the quality of wine from variables such as citric acid, density, pH and alcohol. To see who could build the most successful model, we set up a Kaggle competition that scored each competitor’s model using the F1 score. The data on Kaggle is linked here and my code on GitHub is linked here.
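For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The snippet below is a minimal sketch that computes it by hand. It assumes a binary label (for example 1 for a good wine and 0 otherwise), which is an assumption made for illustration. The function name `f1_score_binary` and the toy labels are hypothetical and do not come from the competition data.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute the F1 score for one positive class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are both 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
f1 = f1_score_binary(y_true, y_pred)
print(f"F1 = {f1:.2f}")  # 3 TP, 1 FP, 1 FN -> precision = recall = 0.75 -> F1 = 0.75
```

Because it combines precision and recall, the F1 score penalizes a model that does well on only one of the two.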

Data Cleaning

Exploratory Data Analysis

Machine Learning Models

Logistic Regression

The F1 score for this model was 0.752411, so the model did a decent job of predicting the quality of wine.

K-Nearest Neighbors

The F1 score for this model was 0.715189, so it performed slightly worse than the logistic regression model.

Random Forest Classifier

The F1 score for this model was the highest of the three at 0.805194, so the random forest did a good job of predicting the quality of wine.
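The comparison of the three models can be sketched roughly as follows with scikit-learn. This is not the original notebook code: it uses a synthetic dataset from `make_classification` as a stand-in for the wine data, assumes a binary target, and picks arbitrary settings (`n_neighbors=5`, `n_estimators=100`, an 80/20 split), so the scores it prints will not match the ones reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption: 11 numeric features, binary label).
X, y = make_classification(
    n_samples=500, n_features=11, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Logistic regression and KNN are scale-sensitive, so they get a scaler first.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

# Print the models from best to worst F1 score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {score:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair, since each F1 score is computed on identical examples.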

Conclusions

Since Kaggle calculated the F1 score of our models with only 20% of the test data, the values differed slightly from the ones I calculated. Compared to the other 12 participants in the competition, my model proved to be the most successful at predicting wine quality on 20% of the test data, with a mean F1 score of 0.84375.

This was the first machine learning project that I completed independently, and I was very excited not only to develop new skills but also to place first in the Kaggle competition! I learned the differences between machine learning models, how to fit the best model by tuning hyperparameters and how to compare the models using the F1 score. In future projects, I would like to tune hyperparameters on all models and use more performance metrics to evaluate them.
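As an illustration of what hyperparameter tuning can look like, here is a minimal sketch using scikit-learn's `GridSearchCV` with the F1 score as the selection metric. The parameter grid, the 3-fold cross-validation and the synthetic data are all assumptions chosen to keep the example small. They are not the settings used in the competition.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the wine data (assumption: binary label).
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=6, random_state=0
)

# Hypothetical grid: every combination of these values is tried.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # pick the combination with the best cross-validated F1
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated F1: {search.best_score_:.3f}")
```

The same pattern applies to the other models by swapping in a different estimator and a grid of its own hyperparameters, such as `n_neighbors` for K-Nearest Neighbors.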

I am a third-year student at the University of Virginia studying Applied Statistics and Data Science, and I am excited to showcase my machine learning projects.