Titanic Classification Disaster Kaggle


[2] “Titanic: Machine Learning from Disaster,” Kaggle.com. [Online]. Available: https://www.kaggle.com/c/titanic-gettingStarted. [Accessed: 13-Dec-2013].
[3] Wiki, “Titanic.” [Online]. Available: http://en.wikipedia.org/wiki/Titanic. [Accessed: 13-Dec-2013].
http://www.cs.waikato.ac.nz/ml/weka/
https://www.kaggle.com/c/titanic-gettingStarted
| Field | Action | Comment |
|---|---|---|
| PassengerId | Removed | Not needed for analysis as it’s just an identifier |
| Survived | Converted to No/Yes | Needed nominal identifier |
| Pclass | Removed -> created Class column instead | Needed nominal identifier |
| Class | New column | Simple calculation based upon Pclass |
| Age | Removed -> created AgeGroup class coding | Wanted simple classification |
| AgeGroup | Formula based; some values not supplied. Ended up with 4 groups other than Unknown (Child, Adolescent, Adult, Old) | Arbitrarily did the following: =IF(F2="", "Unk", IF(F2<10, "Child", IF(F2<20, "Adolescent", IF(F2<50, "Adult", "Old")))) |
| Ecode | Removed -> created class Embarked | Needed nominal identifier |
| Embarked | New column | Converts Ecode to the real name of the passenger’s departure point |
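The recoding described in the table can also be sketched outside the spreadsheet. The Python below is a minimal illustration, not the author's workflow: the function and dictionary names are hypothetical, and the raw codes (1/2/3 for Pclass, S/C/Q for the embarkation code) are assumed from the public Kaggle file rather than stated on the slide. The age thresholds mirror the spreadsheet formula.

```python
# Hypothetical lookup tables; the raw codes are an assumption about the Kaggle CSV.
CLASS_NAMES = {1: "1st", 2: "2nd", 3: "3rd"}
PORT_NAMES = {"S": "Southampton", "C": "Cherbourg", "Q": "Queenstown"}


def age_group(age):
    """Mirror of the nested IF formula: a missing age becomes 'Unk'."""
    if age is None:
        return "Unk"
    if age < 10:
        return "Child"
    if age < 20:
        return "Adolescent"
    if age < 50:
        return "Adult"
    return "Old"


def recode(survived, pclass, sex, age, ecode):
    """Turn one raw record into the five nominal values used for Weka."""
    return (
        "Yes" if survived == 1 else "No",
        CLASS_NAMES[pclass],
        sex,
        age_group(age),
        PORT_NAMES.get(ecode, "Unk"),  # a missing or unknown code becomes 'Unk'
    )
```

For example, `recode(0, 3, "male", 22, "S")` gives `("No", "3rd", "male", "Adult", "Southampton")`.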
@relation 'train4-weka.filters.unsupervised.attribute.Remove-R1,3,6,8'

@attribute Survived {No,Yes}
@attribute Class {1st,2nd,3rd}
@attribute Sex {male,female}
@attribute AgeGroup {Child,Adolescent,Adult,Old,Unk}
@attribute Embarked {Southampton,Cherbourg,Queenstown,Unk}

@data
No,3rd,male,Adult,Southampton
Yes,1st,female,Adult,Cherbourg
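An ARFF file like the one above is plain text, so it can be produced with ordinary string handling. The following Python is a rough sketch under that assumption; `to_arff` and the relation name `titanic-train` are hypothetical and are not part of Weka or of the original slides.

```python
# Attribute names and allowed nominal values, copied from the ARFF header above.
ATTRIBUTES = [
    ("Survived", ["No", "Yes"]),
    ("Class", ["1st", "2nd", "3rd"]),
    ("Sex", ["male", "female"]),
    ("AgeGroup", ["Child", "Adolescent", "Adult", "Old", "Unk"]),
    ("Embarked", ["Southampton", "Cherbourg", "Queenstown", "Unk"]),
]


def to_arff(relation, attributes, rows):
    """Render nominal rows as ARFF text, rejecting values outside the declared sets."""
    lines = [f"@relation '{relation}'", ""]
    for name, values in attributes:
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        for value, (name, allowed) in zip(row, attributes):
            if value not in allowed:
                raise ValueError(f"{value!r} is not a declared value of {name}")
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# 'titanic-train' is a made-up relation name used only for this example.
arff_text = to_arff(
    "titanic-train",
    ATTRIBUTES,
    [
        ("No", "3rd", "male", "Adult", "Southampton"),
        ("Yes", "1st", "female", "Adult", "Cherbourg"),
    ],
)
```

Validating each value against the declared set is a design choice for the sketch: it catches a misspelled category before the file reaches Weka.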
J48 pruned tree
------------------

Sex = male: No (577.0/109.0)
Sex = female
| Class = 3rd
| | Embarked = Southampton: No (88.0/33.0)
| | Embarked = Cherbourg: Yes (23.0/8.0)
| | Embarked = Queenstown
| | | AgeGroup = Child: Yes (0.0)
| | | AgeGroup = Adolescent: Yes (5.0/1.0)
| | | AgeGroup = Adult: No (5.0/1.0)
| | | AgeGroup = Old: Yes (0.0)
| | | AgeGroup = Unk: Yes (23.0/4.0)
| | Embarked = Unk: No (0.0)
| Class = 1st: Yes (94.0/3.0)
| Class = 2nd: Yes (76.0/6.0)

Number of Leaves : 11

Size of the tree : 15
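Read as rules, the pruned tree above is a short chain of conditions. The Python function below is one way to transcribe it by hand, as an illustration only: the name `predict_survived` is hypothetical, and the three Queenstown leaves that predict Yes are collapsed into a single branch.

```python
def predict_survived(sex, pclass, age_group, embarked):
    """Hand transcription of the J48 pruned tree; returns 'Yes' or 'No'."""
    if sex == "male":
        return "No"
    # From here on the passenger is female.
    if pclass in ("1st", "2nd"):
        return "Yes"
    # Female, 3rd class: the tree splits on the port of embarkation.
    if embarked == "Southampton":
        return "No"
    if embarked == "Cherbourg":
        return "Yes"
    if embarked == "Queenstown":
        # Only the Adult leaf predicts No; every other age group predicts Yes.
        return "No" if age_group == "Adult" else "Yes"
    return "No"  # Embarked = Unk
```

Applied to the two data rows shown in the ARFF excerpt, the function returns No for the male third-class passenger and Yes for the female first-class passenger, matching their recorded labels.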

=== Summary ===

Correctly Classified Instances         722               81.0325 %
Incorrectly Classified Instances       169               18.9675 %
Kappa statistic                          0.5714
Mean absolute error                      0.2911
Root mean squared error                  0.385
Relative absolute error                 61.5359 %
Root relative squared error             79.1696 %
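The two percentages in the summary follow directly from the instance counts. The short Python check below recomputes them and also totals the per-leaf counts printed in the tree, reading each leaf as (instances reaching it / instances misclassified). The variable names are illustrative. The leaves add up to 165 errors while the summary reports 169; the slides do not explain the gap, and one possible reason (an assumption, not stated in the source) is that the summary comes from cross-validation rather than from the training data itself.

```python
# (instances reaching the leaf, instances misclassified), copied from the J48 output.
LEAVES = [
    (577, 109),                                   # Sex = male
    (88, 33), (23, 8),                            # female, 3rd, Southampton / Cherbourg
    (0, 0), (5, 1), (5, 1), (0, 0), (23, 4),      # female, 3rd, Queenstown by AgeGroup
    (0, 0),                                       # female, 3rd, Embarked = Unk
    (94, 3), (76, 6),                             # female, 1st / 2nd
]


def percent(part, whole):
    """Share of `whole` expressed as a percentage."""
    return 100.0 * part / whole


total_instances = sum(n for n, _ in LEAVES)
leaf_errors = sum(e for _, e in LEAVES)

# Counts reported in the Weka summary.
summary_correct, summary_incorrect = 722, 169
accuracy = percent(summary_correct, summary_correct + summary_incorrect)
error_rate = percent(summary_incorrect, summary_correct + summary_incorrect)
```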
Information gain
Amount of information gained by knowing the value of the attribute:
(entropy of the distribution before the split) - (entropy of the distribution after the split)
Claude Shannon, American mathematician and scientist 1916–2001
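As a worked illustration of the definition, the Python sketch below computes the information gain of splitting on Sex. The survivor counts are not quoted anywhere in the slides; they are derived here from the J48 leaf counts, on the assumption that each leaf reads (instances reaching it / instances misclassified), so a No leaf's errors are survivors and a Yes leaf's errors are non-survivors. The function names are illustrative, and the entropy after the split is taken as the size-weighted average of the branch entropies.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Entropy before the split minus the size-weighted entropy after it."""
    total = sum(parent)
    after = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - after


# Non-empty female leaves from the tree: (predicted label, instances, misclassified).
FEMALE_LEAVES = [
    ("No", 88, 33), ("Yes", 23, 8), ("Yes", 5, 1), ("No", 5, 1),
    ("Yes", 23, 4), ("Yes", 94, 3), ("Yes", 76, 6),
]

female_total = sum(n for _, n, _ in FEMALE_LEAVES)
female_survived = sum(e if label == "No" else n - e for label, n, e in FEMALE_LEAVES)

# The single male leaf predicts No with 109 of 577 misclassified, i.e. 109 survivors.
male = (109, 577 - 109)                                   # (survived, died)
female = (female_survived, female_total - female_survived)
parent = (male[0] + female[0], male[1] + female[1])

gain_sex = information_gain(parent, [male, female])
```

With these counts the entropy before the split is about 0.96 bits and the gain from knowing Sex is about 0.22 bits.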
