[MRG + 1] Fix reference in fetch_kddcup99 based on #7861 #8071

Merged: 4 commits, Dec 19, 2016
16 changes: 9 additions & 7 deletions sklearn/datasets/kddcup99.py
@@ -45,7 +45,7 @@ def fetch_kddcup99(subset=None, shuffle=False, random_state=None,

The KDD Cup '99 dataset was created by processing the tcpdump portions
of the 1998 DARPA Intrusion Detection System (IDS) Evaluation dataset,
- created by MIT Lincoln Lab [1] . The artificial data was generated using
+ created by MIT Lincoln Lab [1]. The artificial data was generated using
a closed network and hand-injected attacks to produce a large number of
different types of attack with normal activity in the background.
As the initial goal was to produce a large training set for supervised
@@ -134,7 +134,7 @@ def fetch_kddcup99(subset=None, shuffle=False, random_state=None,
shuffle : bool, default=False
Whether to shuffle dataset.

- percent10 : bool, default=False
+ percent10 : bool, default=True
Whether to load only 10 percent of the data.

download_if_missing : bool, default=True
@@ -155,9 +155,11 @@ def fetch_kddcup99(subset=None, shuffle=False, random_state=None,
Detection Evaluation Richard Lippmann, Joshua W. Haines,
David J. Fried, Jonathan Korba, Kumar Das

- .. [2] A Geometric Framework for Unsupervised Anomaly Detection: Detecting
- Intrusions in Unlabeled Data (2002) by Eleazar Eskin, Andrew Arnold,
- Michael Prerau, Leonid Portnoy, Sal Stolfo
+ .. [2] K. Yamanishi, J.-I. Takeuchi, G. Williams, and P. Milne. Online
+ unsupervised outlier detection using finite mixtures with
+ discounting learning algorithms. In Proceedings of the sixth
+ ACM SIGKDD international conference on Knowledge discovery
+ and data mining, pages 320-324. ACM Press, 2000.

"""
kddcup99 = _fetch_brute_kddcup99(shuffle=shuffle, percent10=percent10,
@@ -214,7 +216,7 @@ def fetch_kddcup99(subset=None, shuffle=False, random_state=None,

def _fetch_brute_kddcup99(subset=None, data_home=None,
download_if_missing=True, random_state=None,
- shuffle=False, percent10=False):
+ shuffle=False, percent10=True):

"""Load the kddcup99 dataset, downloading it if necessary.

@@ -242,7 +244,7 @@ def _fetch_brute_kddcup99(subset=None, data_home=None,
shuffle : bool, default=False
Whether to shuffle dataset.

- percent10 : bool, default=False
+ percent10 : bool, default=True
Whether to load only 10 percent of the data.

Returns
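
Since this change flips the `percent10` default from `False` to `True`, callers that depend on a particular dataset size may prefer to pass the flag explicitly rather than rely on the default. The following is a minimal sketch of that idea, not part of the PR: the wrapper name `load_kddcup99` and its `full_dataset` argument are hypothetical, and only the keyword arguments shown in the diff above (`subset`, `shuffle`, `random_state`, `percent10`) are assumed to exist on `fetch_kddcup99`.

```python
def load_kddcup99(full_dataset=False, subset=None, random_state=0):
    """Hypothetical wrapper that always states `percent10` explicitly.

    full_dataset=False loads the 10 percent sample; True loads everything.
    """
    # Imported lazily so that merely defining this helper does not
    # require scikit-learn to be importable.
    from sklearn.datasets import fetch_kddcup99

    return fetch_kddcup99(
        subset=subset,
        shuffle=True,
        random_state=random_state,
        # Explicit value: behavior no longer depends on the library default.
        percent10=not full_dataset,
    )
```

Calling `load_kddcup99()` would then request the 10 percent sample regardless of which default the installed version uses; note that the underlying call may download the data on first use.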