Using Classification Models on the Pima Indians Diabetes Data Set

The goal of my project is to predict whether a patient has diabetes using supervised classification. Diabetes is a chronic disease that occurs when the pancreas is unable to produce enough insulin, so sugar in the blood is not broken down by the body and stays at a high level, which damages organs and tissues.

The data set used is the Pima Indians Diabetes dataset. It has 9 columns (8 features plus the outcome label) that will be used to train the model:

  1. Glucose: Plasma glucose concentration
  2. Blood Pressure: The diastolic blood pressure (mm…

The goal is to understand the concept of overfitting by using polynomial regression.

Overfitting is one of the challenges faced in Machine Learning. It usually occurs when the model we are trying to fit is too complex or the data set used for training is too small. As a result, the model performs well on the training data set, giving a low error, but performs poorly on the test data set, giving a much larger error.
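To make the idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the 20 noisy points sampled from a sine curve, the even/odd train/test split, the degrees 1, 3 and 9, and the helper names `fit_polynomial`, `predict` and `mse`. It is not the assignment's actual data or code.

```python
import math
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) w = A^T y."""
    n = degree + 1
    # ata[r][c] = sum of x^(r+c); aty[r] = sum of y * x^r
    ata = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            factor = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= factor * ata[col][c]
            aty[r] -= factor * aty[col]
    # Back substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(ata[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aty[r] - tail) / ata[r][r]
    return coeffs  # coeffs[k] multiplies x**k

def predict(coeffs, x):
    return sum(w * x ** k for k, w in enumerate(coeffs))

def mse(coeffs, xs, ys):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: 20 evenly spaced inputs with noisy sine targets
random.seed(0)
xs = [-1 + 2 * i / 19 for i in range(20)]
ys = [math.sin(math.pi * x) + random.gauss(0, 0.2) for x in xs]

# Even-indexed points train the model, odd-indexed points test it
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]

results = {}
for degree in (1, 3, 9):
    w = fit_polynomial(train_x, train_y, degree)
    results[degree] = (mse(w, train_x, train_y), mse(w, test_x, test_y))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```

The training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. With 10 training points, the degree 9 polynomial passes through every one of them, so its training error is close to zero. The test error is the number to watch: when it is much larger than the training error, the model has fit the noise rather than the underlying curve.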

In our assignment I was tasked with creating 20 pairs of Input…


The goal is to apply a Convolutional Neural Net model to the CIFAR10 image data set and measure how accurately it classifies the images.

CIFAR10 is a collection of images used to train Machine Learning and Computer Vision algorithms. It contains 60K images of dimension 32x32 in ten different classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. We train our Neural Net model, specifically a Convolutional Neural Net (CNN), on this data set.

CNNs are a class of Deep Learning algorithms that can recognize and classify particular features…
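As a rough illustration of how a convolutional layer picks out a feature, the sketch below slides a small filter over a tiny grayscale image in plain Python. The 4x4 image, the 2x2 vertical-edge filter and the function name `convolve2d_valid` are all made up for this example; a real CNN learns its filter values during training and applies many filters per layer.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return
    the sum of element-wise products at each position. Like most deep
    learning libraries, this does not flip the kernel (cross-correlation)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Hypothetical image: dark on the left half, bright on the right half
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical filter that responds to a dark-to-bright vertical edge
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the middle column, where the filter window straddles the boundary between the dark and bright halves. This is the sense in which a filter "recognizes" a feature.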


My contribution towards the Titanic Challenge on Kaggle.

The goal is to predict which passengers survived, given data sets with information about the passengers on board.

I used the K-Nearest Neighbours method to predict whether each passenger survived.
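For readers new to the method, here is a minimal sketch of K-Nearest Neighbours in plain Python. The two-feature toy rows, the labels and the function name `knn_predict` are invented for illustration. They are not the Titanic data, and this is not the notebook's actual code, which may well rely on a library implementation.

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest
    training points, using Euclidean distance."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two features already scaled to [0, 1],
# label 1 = survived, label 0 = did not survive
train_rows = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
    (0.80, 0.90), (0.90, 0.80), (0.85, 0.75),
]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_rows, train_labels, (0.2, 0.2), k=3))
print(knn_predict(train_rows, train_labels, (0.8, 0.8), k=3))
```

Because the prediction depends on distances, features measured on very different scales (for example age and fare) should be scaled before use. Otherwise the feature with the largest range dominates the vote.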

References (sources I took help from):

1) https://youtu.be/hxauqndYYUo

2) https://youtu.be/50sWPzlmxOE

3) https://youtu.be/HnLiVutur8A

4) https://www.kaggle.com/spidy20/titanic-eda-with-80-prediction-on-sb

5) https://www.kaggle.com/biswarupray/knn-titanic

The original tutorial submission for the Titanic data set used a Random Forest Classifier to predict which passengers survived. The base score it gets is 0.77511.

I experimented in the same notebook with various other classification models such as Extra Trees classifiers, AdaBoost…

Shonit Gangoly

Software Engineer in the making
