Kaggle presentation friday

Page 1: Kaggle presentation friday

An analysis of the Titanic dataset to explore whether port of embarkation influenced survival rates.

PETER REYNOLDS, SEAMUS O’ CONGHAILE, DAVID BOURKE

Page 2: Kaggle presentation friday

Introduction

What Kaggle is, how it works & what it asks competitors to do

The Titanic competition and what it broadly asks competitors to do

The data available to competitors

The question within the Titanic dataset that we focused on

Page 3: Kaggle presentation friday

Data Mining & Machine Learning

Data mining is a process whereby we try to “discover novel, interesting and potentially useful patterns from large datasets”.

Machine learning, as Lantz (2013) points out, is “interested in the development of computer algorithms for transforming data into intelligent action”.

How these processes help us discover patterns within large datasets

Page 4: Kaggle presentation friday

Our Approach

We chose a Classification approach because the target we were predicting (survived or died) is a categorical label.

The Classification tool we used – Decision Tree

Cross and Split Validation

Our use of RapidMiner: what it is and why we chose it.

Page 5: Kaggle presentation friday

Implementation

Clean and prepare the data

Build a Decision Tree

Apply the model

Apply the validation model

Different types of validation sampling models

Export results (data file)

Submit findings to Kaggle
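The first and last steps above (cleaning the data and exporting a results file) can be sketched in a few lines. The slides describe this work being done in RapidMiner, so the Python below is only an illustrative stand-in: the sample rows are hypothetical, the helper names are invented for this sketch, and the "female survives" rule is a placeholder, not the authors' Decision Tree. The column names are those of the Kaggle Titanic CSV files.

```python
import csv
import io
from collections import Counter

def fill_missing_embarked(rows):
    """Replace blank Embarked values with the most common port."""
    known = [r["Embarked"] for r in rows if r["Embarked"]]
    most_common = Counter(known).most_common(1)[0][0]
    return [dict(r, Embarked=r["Embarked"] or most_common) for r in rows]

def write_submission(predictions, out):
    """Write (PassengerId, Survived) pairs as a two-column CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    writer.writerows(predictions)

# Hypothetical sample in the shape of the competition file (three columns only).
raw = "PassengerId,Sex,Embarked\n1,male,S\n2,female,\n3,female,S\n4,male,C\n"
rows = fill_missing_embarked(list(csv.DictReader(io.StringIO(raw))))

# Placeholder rule standing in for the trained model's predictions.
predictions = [(r["PassengerId"], 1 if r["Sex"] == "female" else 0) for r in rows]

out = io.StringIO()
write_submission(predictions, out)
print(out.getvalue())
```

Filling a blank port with the most common value is one simple cleaning choice among several; the slides do not say which one was used.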

Page 6: Kaggle presentation friday

Decision Tree
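A Decision Tree is grown by repeatedly choosing the attribute that best separates the classes. The sketch below shows one such choice using Gini impurity. It assumes categorical attributes and uses hypothetical toy rows; the slides do not state which split criterion was configured in RapidMiner, so Gini is used here only because it is short to write down.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_impurity(rows, attribute, label="Survived"):
    """Weighted Gini impurity after splitting rows on one categorical attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

def best_attribute(rows, attributes, label="Survived"):
    """Pick the attribute whose split leaves the least impurity."""
    return min(attributes, key=lambda a: split_impurity(rows, a, label))

# Hypothetical toy rows, not taken from the competition data.
rows = [
    {"Sex": "female", "Embarked": "S", "Survived": 1},
    {"Sex": "female", "Embarked": "C", "Survived": 1},
    {"Sex": "female", "Embarked": "Q", "Survived": 1},
    {"Sex": "male", "Embarked": "S", "Survived": 0},
    {"Sex": "male", "Embarked": "C", "Survived": 0},
    {"Sex": "male", "Embarked": "S", "Survived": 1},
]
root = best_attribute(rows, ["Sex", "Embarked"])
print(root)
```

A full tree would apply `best_attribute` again to each resulting group until the groups are pure or a depth limit is reached.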

Page 7: Kaggle presentation friday

Cross Validation & Split validation

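Linear: divides the example set into consecutive partitions, keeping the original order

Shuffled: builds random subsets

Stratified: builds random subsets that keep the class distribution of the whole set

Automatic: stratified by default

Leave One Out: tests the model on one example at a time, training on all the others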
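The sampling types listed above differ only in how examples are assigned to the training and test parts. A minimal plain-Python sketch follows, assuming a 70/30 split and hypothetical toy data; the function names are invented for illustration and this is not RapidMiner's implementation.

```python
import random
from collections import defaultdict

def linear_split(rows, train_ratio=0.7):
    """Linear sampling: keep the original order and cut once."""
    cut = round(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

def shuffled_split(rows, train_ratio=0.7, seed=0):
    """Shuffled sampling: randomise the order, then cut."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return linear_split(shuffled, train_ratio)

def stratified_split(rows, label_of, train_ratio=0.7, seed=0):
    """Stratified sampling: split each class separately so both parts keep a similar class mix."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    train, test = [], []
    for members in by_class.values():
        part_train, part_test = shuffled_split(members, train_ratio, seed)
        train.extend(part_train)
        test.extend(part_test)
    return train, test

def leave_one_out(rows):
    """Leave one out: each example is the test set exactly once."""
    for i in range(len(rows)):
        yield rows[:i] + rows[i + 1:], [rows[i]]

# Hypothetical toy data: (passenger id, survived label), 30% survivors.
rows = [(i, 1 if i % 10 < 3 else 0) for i in range(100)]
train, test = stratified_split(rows, label_of=lambda r: r[1])
print(len(train), len(test), sum(label for _, label in test))
```

With stratified sampling the test part keeps the same 30% share of survivors as the full toy set, which a linear or shuffled split does not guarantee.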

Page 8: Kaggle presentation friday

Results

1st Kaggle prediction accuracy of 24.42%

Revised Model for 2nd attempt

2nd Kaggle prediction accuracy of 77.51%

Page 9: Kaggle presentation friday

Results

Survived

Southampton: 197 passengers

Cherbourg: 83 passengers

Queenstown: 41 passengers

Died

Southampton: 719 passengers

Cherbourg: 187 passengers

Queenstown: 82 passengers

Survival Rates

Southampton: 21%

Cherbourg: 31%

Queenstown: 33%
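The survival rates follow directly from the counts on this slide: survivors divided by all passengers from that port. A short check of the arithmetic, using only the figures reported above:

```python
# Counts as reported on the slide: (survived, died) per port of embarkation.
counts = {
    "Southampton": (197, 719),
    "Cherbourg": (83, 187),
    "Queenstown": (41, 82),
}

def survival_rate(survived: int, died: int) -> float:
    """Share of passengers from one port who survived, as a percentage."""
    return 100.0 * survived / (survived + died)

rates = {port: survival_rate(s, d) for port, (s, d) in counts.items()}

for port, rate in rates.items():
    print(f"{port}: {rate:.1f}%")
```

To one decimal place this gives 21.5%, 30.7% and 33.3%, which the slide reports as whole percentages.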

Page 10: Kaggle presentation friday

Conclusion

The results show no significant evidence of a correlation between port of embarkation and survival

Queenstown had the highest survival rate even though its passengers were predominantly 3rd Class

Other models, such as Naïve Bayes or Random Forest, could yield higher prediction accuracy

Scope for future work

Page 11: Kaggle presentation friday

Questions?