permutation_test_score always outputs 0.0099 as p-value regardless of input #8379

@afshinrahimi

Description

permutation_test_score's p-value is always 0.0099 no matter what dataset/estimator is used. The p-value produced in the user guide is also this magic number.

Steps/Code to Reproduce

from __future__ import print_function

import logging
import numpy as np
from optparse import OptionParser
import sys
from time import time
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.grid_search import GridSearchCV
from sklearn.linear_model import SGDClassifier
from sklearn.cross_validation import permutation_test_score

categories = ['sci.space', 'comp.graphics']
data_train = fetch_20newsgroups(subset='train', categories=categories,
                                shuffle=True, random_state=42)


y_train = data_train.target
vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5,
                             stop_words='english')
X_train = vectorizer.fit_transform(data_train.data)
estimator = SGDClassifier(loss='log', penalty='l2', l1_ratio=0.5, n_iter=20,
                          shuffle=True, verbose=False, n_jobs=6, alpha=1e-6,
                          average=False, class_weight='balanced')
# Tune alpha with 5-fold cross-validation, scored by macro-averaged F1.
parameters = {'alpha': [10 ** a for a in range(-10, -1)]}
clf = GridSearchCV(estimator, parameters, cv=5, scoring='f1_macro', n_jobs=5)
clf.fit(X_train, y_train)
# Run the permutation test on the best estimator with 100 label permutations.
score, permutation_scores, pvalue = permutation_test_score(
    clf.best_estimator_, X_train, y_train, cv=5, n_jobs=5,
    n_permutations=100, scoring='accuracy')
print('score %f p-value %f' % (score, pvalue))
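
A quick sanity check on the number itself may help when reading the result. The sketch below assumes the p-value is the empirical estimate (C + 1) / (n_permutations + 1), where C counts the permutation scores that are at least as good as the unpermuted score. The helper `empirical_pvalue` and the simulated chance-level scores are hypothetical and only illustrate the arithmetic; they are not scikit-learn code. Under that assumption, 0.0099 is simply 1/101, the smallest value reachable with n_permutations=100.

```python
import numpy as np


def empirical_pvalue(score, permutation_scores):
    """Hypothetical helper: (C + 1) / (n_permutations + 1).

    C is the number of permutation scores >= the unpermuted score.
    """
    permutation_scores = np.asarray(permutation_scores)
    n_permutations = permutation_scores.shape[0]
    count = np.sum(permutation_scores >= score)
    return (count + 1.0) / (n_permutations + 1)


# Simulated chance-level scores (made-up data, not from the report above).
rng = np.random.RandomState(0)
chance_scores = rng.uniform(0.4, 0.6, size=100)

# A score that beats every permutation gives C = 0, i.e. 1 / 101.
floor_pvalue = empirical_pvalue(0.95, chance_scores)
print('%.4f' % floor_pvalue)  # 0.0099
```

If this assumption holds, any estimator that beats all 100 permuted runs would report the same value, and raising n_permutations (for example to 1000) should lower the floor to 1/1001.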

Versions

Please run the following snippet and paste the output below.

import platform; print(platform.platform())
Linux-3.13.0-76-generic-x86_64-with-Ubuntu-14.04-trusty
import sys; print("Python", sys.version)
('Python', '2.7.6 (default, Oct 26 2016, 20:30:19) \n[GCC 4.8.4]')
import numpy; print("NumPy", numpy.__version__)
('NumPy', '1.11.3')
import scipy; print("SciPy", scipy.__version__)
('SciPy', '0.18.1')
import sklearn; print("Scikit-Learn", sklearn.__version__)
('Scikit-Learn', '0.17.1')
