represented the identity of the person doing the related typing tasks.

Preprocessing, Model Description and Results
The dataset was analyzed to ascertain whether any of the features could be removed. Two features, one representing the individual session identification and a second representing the individual observation number, were removed from the data to ensure that no bias was created by features associated with identification only. This reduced the dimensionality of the dataset to thirty-one features and one dependent variable; these elements were used to train and test the model for the study.
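As a minimal sketch of this pre-processing step, the two identification-only columns can simply be dropped before training. The data frame name typing_data and the column names sessionIndex and rep are illustrative assumptions, not names taken from the paper.

# Drop the session-identification and observation-number columns,
# leaving the 31 features plus the dependent variable.
# `typing_data` is assumed to have been loaded already (e.g. with read.csv).
id_cols     <- c("sessionIndex", "rep")                 # hypothetical column names
typing_data <- typing_data[, !(names(typing_data) %in% id_cols)]
str(typing_data)                                        # confirm 32 remaining columns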
The Classification and Regression Training (caret) package for R was chosen for model training and testing, and the dataset was randomly stratified and divided into separate training and testing sets containing 70% and 30% of the data, respectively.
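A sketch of the stratified 70/30 split is shown below using caret's createDataPartition, which samples within each level of the outcome. The column name subject for the typist's identity and the random seed are assumptions for illustration.

library(caret)

typing_data$subject <- factor(typing_data$subject)      # outcome must be a factor

set.seed(100)                                           # illustrative seed
in_train  <- createDataPartition(typing_data$subject, p = 0.70, list = FALSE)
train_set <- typing_data[in_train, ]                    # 70% used for training
test_set  <- typing_data[-in_train, ]                   # 30% held out for testing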
The algorithm chosen for this study, Extreme Gradient Boosting (XGB), is a combination of Gradient Descent (Burges et al. 2005) and Boosting (Dietterich 2000) and offers several tuning parameters that were modified with the goal of building an optimal model. The tuning parameters included the number of iterations, maximum tree depth, shrinkage, minimum loss reduction, subsample rate, and minimum sum of instance weights. To provide a necessary point of comparison, models were also created using C50 and KNN so that the benefit of using XGB could be easily seen. Similar tuning parameters for both of the additional algorithms were established before training began.
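As a sketch, the two comparison models could be fit through the same caret interface; the cross-validation settings shown here are illustrative defaults rather than the tuning choices reported in the paper, and train_set follows the split above.

library(caret)

ctrl <- trainControl(method = "cv", number = 5)         # illustrative resampling

# C5.0 decision-tree comparison model
c50_fit <- train(subject ~ ., data = train_set,
                 method = "C5.0", trControl = ctrl)

# k-nearest neighbors comparison model (features centered and scaled)
knn_fit <- train(subject ~ ., data = train_set,
                 method = "knn", trControl = ctrl,
                 preProcess = c("center", "scale"))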
When using XGB, the number of iterations specified how many times the data would be analyzed; 150 was chosen as the optimal number of iterations through numerous cycles of training, in order to avoid unnecessary training time. The maximum depth of the model tree was limited to just two branches to adequately prevent overfitting. Shrinkage was set at 0.3 to ensure the model would be strong enough to generalize to new data when making predictions while also improving the performance of the model in the most optimal way. The minimum loss reduction (gamma) was set at 0 for the entire training process, and the fraction of observations selected for each tree (the subsample rate) was set at 0.6. Lastly, the minimum sum of instance weights was left at its default setting of 1 for training. Using the parameters previously described, the models produced the results shown in Table I.
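This configuration can be expressed as a caret tuning grid for the xgbTree method, as in the sketch below. The grid values mirror the settings described in the text; colsample_bytree = 1 is an added assumption (column subsampling is not discussed here, but caret's xgbTree grid requires a value for it), and the resampling scheme is illustrative.

library(caret)

xgb_grid <- expand.grid(
  nrounds          = 150,   # number of boosting iterations
  max_depth        = 2,     # shallow trees to limit overfitting
  eta              = 0.3,   # shrinkage (learning rate)
  gamma            = 0,     # minimum loss reduction
  subsample        = 0.6,   # fraction of observations per tree
  min_child_weight = 1,     # minimum sum of instance weights (default)
  colsample_bytree = 1      # assumed: all features available to each tree
)

xgb_fit <- train(subject ~ ., data = train_set,
                 method    = "xgbTree",
                 trControl = trainControl(method = "cv", number = 5),
                 tuneGrid  = xgb_grid)

# Accuracy and Kappa on the held-out 30%, the metrics compared in Table I
xgb_pred <- predict(xgb_fit, newdata = test_set)
confusionMatrix(xgb_pred, test_set$subject)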
Table I – Model Metric Comparison (chart of Accuracy and Kappa for the XGB, C50, and KNN models)

Conclusion

These results are favorable for the use of keyboard dynamics data to verify that the user of a system is the same user from whom the training data was obtained. This process could also be scaled and expanded to allow for extended training using larger amounts of unstructured data, such as daily postings on social media or common daily and routine tasks that require text entry into the system.

Typing is an integral input interaction for the vast majority of user-based software systems, and it would be easy to integrate a similar predictive model into the lifecycle of software development for high-security scenarios. Minimally, this would help to prevent inside attacks in which users who are not the owners of valid credentials try to enter passwords and other login information that are not their own, but are deemed valid for entry into the system. In addition, this prospective model could also be used to assist verification during larger multi-factor authentication scenarios where users enter simple username/password combinations along with other text information, such as the answers to challenge questions.

These two scenarios, one for each part of a system, demonstrate how Gradient Boosting models and machine learning can be used to prevent attacks from the inside of a system by identifying users through the validation of an identity using behavioral biometrics in the form of keystroke dynamics. This is notable because the findings of this study are part of a much larger proposal on designing a decision support system for multifactor authentication.

References

Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005, August). Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning (pp. 89-96). ACM.

Dietterich, T. G. (2000, June). Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems (pp. 1-15). Springer Berlin Heidelberg.

Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378.

Killourhy, K. S., & Maxion, R. A. (2009, June). Comparing anomaly-detection algorithms for keystroke dynamics. In 2009 IEEE/IFIP International Conference on Dependable Systems & Networks (pp. 125-134). IEEE.

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.