Titanic Classification Disaster Kaggle


[2] “Titanic: Machine Learning from Disaster,” Kaggle.com. [Online]. Available: https://www.kaggle.com/c/titanic-gettingStarted. [Accessed: 13-Dec-2013].
[3] Wiki, “Titanic.” [Online]. Available: http://en.wikipedia.org/wiki/Titanic. [Accessed: 13-Dec-2013].
Field | Action | Comment
PassengerId | Removed | Not needed for analysis as it’s just an identifier
Survived | Converted to No/Yes | Needed nominal identifier
Pclass | Removed -> created Class column instead | Needed nominal identifier
Class | New column | Simple calculation based upon class coding
Age | Removed -> created AgeGroup | Wanted simple classification
AgeGroup | Formula based; some values not supplied. But ended up with 4 groups other than Unknown (Child, Adolescent, Adult, Old) | Arbitrarily did the following: =IF(F2="", "Unk", IF(F2<10, "Child", IF(F2<20, "Adolescent", IF(F2<50, "Adult", "Old"))))
Ecode | Removed -> created class | Needed nominal identifier
Embarked | New column that converted Ecode to the real name of the departure point for the
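
The AgeGroup spreadsheet formula in the table above can be sketched in Python as follows. This is only an illustration of the same cut-offs (10, 20, 50); the function name age_group and the handling of NaN values are assumptions, not part of the original slides.

```python
import math


def age_group(age):
    """Mirror the spreadsheet formula: blank -> Unk, then <10, <20, <50, else Old."""
    # A blank cell in the spreadsheet corresponds to None or "" here (assumption).
    if age is None or age == "":
        return "Unk"
    age = float(age)
    # Treat NaN (e.g. a missing value loaded by a CSV reader) as unknown too.
    if math.isnan(age):
        return "Unk"
    if age < 10:
        return "Child"
    if age < 20:
        return "Adolescent"
    if age < 50:
        return "Adult"
    return "Old"


for raw in ["", 4, 19.5, 35, 50]:
    print(repr(raw), "->", age_group(raw))
```

As in the formula, the boundaries are half-open: an age of exactly 10 falls into Adolescent, and exactly 50 into Old.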
@relation 'train4-weka.filters.unsupervised.attribute.Remove-R1,3,6,8'

@attribute Survived {No,Yes}

@attribute Class {1st,2nd,3rd}
@attribute Sex {male,female}
@attribute AgeGroup {Child,Adolescent,Adult,Old,Unk}
@attribute Embarked {Southampton,Cherbourg,Queenstown,Unk}
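
The nominal values declared in the header above could be produced from the raw Kaggle codes roughly as sketched here. The sketch assumes the raw file encodes Survived as 0/1, Pclass as 1/2/3, and the embarkation port as S/C/Q; the function and dictionary names are hypothetical.

```python
# Lookup tables from raw codes to the nominal labels used in the ARFF header.
CLASS_NAMES = {1: "1st", 2: "2nd", 3: "3rd"}
PORT_NAMES = {"S": "Southampton", "C": "Cherbourg", "Q": "Queenstown"}


def recode(survived, pclass, embarked_code):
    """Return the nominal Survived, Class and Embarked values for one passenger."""
    code = (embarked_code or "").strip().upper()
    return {
        "Survived": "Yes" if int(survived) == 1 else "No",
        "Class": CLASS_NAMES[int(pclass)],
        # Missing or unrecognised port codes fall back to "Unk".
        "Embarked": PORT_NAMES.get(code, "Unk"),
    }


print(recode(1, 3, "S"))
print(recode(0, 1, ""))
```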

J48 pruned tree

Sex = male: No (577.0/109.0)

Sex = female
| Class = 3rd
| | Embarked = Southampton: No (88.0/33.0)
| | Embarked = Cherbourg: Yes (23.0/8.0)
| | Embarked = Queenstown
| | | AgeGroup = Child: Yes (0.0)
| | | AgeGroup = Adolescent: Yes (5.0/1.0)
| | | AgeGroup = Adult: No (5.0/1.0)
| | | AgeGroup = Old: Yes (0.0)
| | | AgeGroup = Unk: Yes (23.0/4.0)
| | Embarked = Unk: No (0.0)
| Class = 1st: Yes (94.0/3.0)
| Class = 2nd: Yes (76.0/6.0)

Number of Leaves : 11

Size of the tree : 15
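
Read as nested rules, the pruned tree above amounts to a handful of if-statements. The following is a hand transcription of the printed tree, not Weka output; the function name j48_predict is hypothetical.

```python
def j48_predict(sex, cls, embarked, age_group):
    """Return the 'Survived' label that the printed J48 tree assigns."""
    if sex == "male":
        return "No"
    # From here on: Sex = female.
    if cls in ("1st", "2nd"):
        return "Yes"
    # From here on: female, Class = 3rd.
    if embarked == "Southampton":
        return "No"
    if embarked == "Cherbourg":
        return "Yes"
    if embarked == "Queenstown":
        # Only the Adult leaf predicts No; every other age group predicts Yes.
        return "No" if age_group == "Adult" else "Yes"
    return "No"  # Embarked = Unk


print(j48_predict("female", "3rd", "Queenstown", "Adult"))
print(j48_predict("female", "1st", "Southampton", "Old"))
```

The function has 11 return paths once the Queenstown age groups are counted separately, matching the reported number of leaves.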

=== Summary ===

Correctly Classified Instances 722 81.0325 %

Incorrectly Classified Instances 169 18.9675 %
Kappa statistic 0.5714
Mean absolute error 0.2911
Root mean squared error 0.385
Relative absolute error 61.5359 %
Root relative squared error 79.1696 %
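
The summary figures can be cross-checked against the leaf counts of the tree. In Weka's notation, a leaf annotated (n/e) was reached by n training instances, of which e were misclassified. The sketch below copies those pairs from the tree; the variable names are illustrative.

```python
# (instances reaching the leaf, misclassified instances), in tree order.
leaves = [
    (577, 109),                                   # Sex = male
    (88, 33), (23, 8),                            # female, 3rd: Southampton, Cherbourg
    (0, 0), (5, 1), (5, 1), (0, 0), (23, 4),      # female, 3rd, Queenstown by AgeGroup
    (0, 0),                                       # female, 3rd, Embarked = Unk
    (94, 3), (76, 6),                             # female, 1st and 2nd
]

total = sum(n for n, _ in leaves)
errors = sum(e for _, e in leaves)
train_accuracy = (total - errors) / total

# Accuracy as reported in the summary block.
summary_accuracy = 722 / (722 + 169)

print(f"instances: {total}, leaf errors: {errors}")
print(f"accuracy from leaf counts: {100 * train_accuracy:.4f} %")
print(f"accuracy from summary:     {100 * summary_accuracy:.4f} %")
```

The leaf counts give 165 errors on 891 instances, slightly fewer than the 169 in the summary. The slides do not say how the model was evaluated; one possible explanation is that the summary comes from cross-validation rather than from the training data itself.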
Information gain
Amount of information gained by knowing the value of the attribute:
(entropy of distribution before the split) – (entropy of distribution after it)
Claude Shannon, American mathematician and scientist 1916–2001
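
As a worked illustration of this definition, the sketch below computes the information gain of splitting on Sex. The class counts are derived from the (n/e) leaf annotations of the tree, which is an assumption about how to read them: 577 males of whom 109 survived, and 314 females of whom 233 survived. The function names are hypothetical.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Entropy before the split minus the size-weighted entropy after it."""
    total = sum(parent)
    after = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - after


# Counts as [did not survive, survived].
males = [577 - 109, 109]
females = [314 - 233, 233]
everyone = [males[0] + females[0], males[1] + females[1]]

gain = information_gain(everyone, [males, females])
print(f"entropy before split: {entropy(everyone):.4f} bits")
print(f"information gain for Sex: {gain:.4f} bits")
```

With these counts the gain comes out at roughly 0.22 bits.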
