
AttributeError: 'PrettyPrinter' object has no attribute '_indent_at_name' #12906

Closed
adrinjalali opened this issue Jan 2, 2019 · 11 comments · Fixed by #12938

@adrinjalali
Member

There's a failing example in #12654, and here's a piece of code causing it:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.decomposition import PCA, NMF
from sklearn.feature_selection import SelectKBest, chi2

pipe = Pipeline([
    # the reduce_dim stage is populated by the param_grid
    ('reduce_dim', 'passthrough'),
    ('classify', LinearSVC(dual=False, max_iter=10000))
])

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        'reduce_dim': [PCA(iterated_power=7), NMF()],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
]
reducer_labels = ['PCA', 'NMF', 'KBest(chi2)']

grid = GridSearchCV(pipe, cv=5, n_jobs=1, param_grid=param_grid, iid=False)
from tempfile import mkdtemp
from joblib import Memory

# Create a temporary folder to store the transformers of the pipeline
cachedir = mkdtemp()
memory = Memory(location=cachedir, verbose=10)
cached_pipe = Pipeline([('reduce_dim', PCA()),
                        ('classify', LinearSVC(dual=False, max_iter=10000))],
                       memory=memory)

# This time, a cached pipeline will be used within the grid search
grid = GridSearchCV(cached_pipe, cv=5, n_jobs=1, param_grid=param_grid,
                    iid=False, error_score='raise')
digits = load_digits()
grid.fit(digits.data, digits.target)

With the stack trace:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/path/to//sklearn/model_selection/_search.py", line 683, in fit
    self._run_search(evaluate_candidates)
  File "/path/to//sklearn/model_selection/_search.py", line 1127, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/path/to//sklearn/model_selection/_search.py", line 672, in evaluate_candidates
    cv.split(X, y, groups)))
  File "/path/to//sklearn/externals/joblib/parallel.py", line 917, in __call__
    if self.dispatch_one_batch(iterator):
  File "/path/to//sklearn/externals/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/path/to//sklearn/externals/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/path/to//sklearn/externals/joblib/_parallel_backends.py", line 182, in apply_async
    result = ImmediateResult(func)
  File "/path/to//sklearn/externals/joblib/_parallel_backends.py", line 549, in __init__
    self.results = batch()
  File "/path/to//sklearn/externals/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/path/to//sklearn/externals/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/path/to//sklearn/model_selection/_validation.py", line 511, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/path/to//sklearn/pipeline.py", line 279, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/path/to//sklearn/pipeline.py", line 244, in _fit
    **fit_params_steps[name])
  File "/path/to/packages/joblib/memory.py", line 555, in __call__
    return self._cached_call(args, kwargs)[0]
  File "/path/to/packages/joblib/memory.py", line 521, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/path/to/packages/joblib/memory.py", line 720, in call
    print(format_call(self.func, args, kwargs))
  File "/path/to/packages/joblib/func_inspect.py", line 356, in format_call
    path, signature = format_signature(func, *args, **kwargs)
  File "/path/to/packages/joblib/func_inspect.py", line 340, in format_signature
    formatted_arg = _format_arg(arg)
  File "/path/to/packages/joblib/func_inspect.py", line 322, in _format_arg
    formatted_arg = pformat(arg, indent=2)
  File "/path/to/packages/joblib/logger.py", line 54, in pformat
    out = pprint.pformat(obj, depth=depth, indent=indent)
  File "/usr/lib64/python3.7/pprint.py", line 58, in pformat
    compact=compact).pformat(object)
  File "/usr/lib64/python3.7/pprint.py", line 144, in pformat
    self._format(object, sio, 0, 0, {}, 0)
  File "/usr/lib64/python3.7/pprint.py", line 167, in _format
    p(self, object, stream, indent, allowance, context, level + 1)
  File "/path/to//sklearn/utils/_pprint.py", line 175, in _pprint_estimator
    if self._indent_at_name:
AttributeError: 'PrettyPrinter' object has no attribute '_indent_at_name'
@rth
Member

rth commented Jan 3, 2019

So for some reason, the class is PrettyPrinter instead of _EstimatorPrettyPrinter (which inherits from PrettyPrinter). But then

  File "/path/to//sklearn/utils/_pprint.py", line 175, in _pprint_estimator
    if self._indent_at_name:

is a _EstimatorPrettyPrinter method, so I don't understand what is going on...

@adrinjalali
Member Author

By the way, the example also fails on master, but somehow circle-ci on master is green.

@adrinjalali
Member Author

By "example", I mean examples/compose/plot_compare_reduction.py

@adrinjalali
Member Author

#12791 seems to be failing for the same reason.

@rth
Member

rth commented Jan 5, 2019

> By the way, the example also fails on master, but somehow circle-ci on master is green.

I can't see it in the latest build on master.

@jnothman
Member

jnothman commented Jan 6, 2019

I think it's because this line should involve a .copy()

_dispatch = pprint.PrettyPrinter._dispatch

@jnothman
Member

jnothman commented Jan 6, 2019

That is, we're modifying the dispatch used by pprint rather than the local pretty printer.

But then it's also a bit weird that _pprint_estimator references a method on the class. This means that configuration of the class cannot affect anything. Rather it should perhaps reference an instancemethod on a configuration singleton??
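
To illustrate the aliasing described above, here is a minimal sketch that uses only the standard library. The names `AliasedPrinter`, `CopiedPrinter`, `Marker` and `_render_marker` are hypothetical and are not taken from scikit-learn. It relies on `_dispatch`, a private CPython implementation detail of `pprint`, in the same way the line quoted above does.

```python
import pprint


class AliasedPrinter(pprint.PrettyPrinter):
    # No copy: this name is bound to the base class's own dict object.
    _dispatch = pprint.PrettyPrinter._dispatch


class CopiedPrinter(pprint.PrettyPrinter):
    # Shallow copy: registrations made here stay local to this subclass.
    _dispatch = pprint.PrettyPrinter._dispatch.copy()


class Marker:
    """Hypothetical type that only the custom printer should know about."""

    def __repr__(self):
        return "Marker()"


def _render_marker(printer, obj, stream, indent, allowance, context, level):
    # Handlers in _dispatch are plain functions called as p(printer, obj, ...).
    stream.write("<marker>")


# Register the handler on the copy only; pprint looks handlers up by
# type(obj).__repr__.
CopiedPrinter._dispatch[Marker.__repr__] = _render_marker

shares_dict = AliasedPrinter._dispatch is pprint.PrettyPrinter._dispatch
leaked = Marker.__repr__ in pprint.PrettyPrinter._dispatch

# width=1 forces pprint to consult _dispatch even for this short repr.
custom_output = CopiedPrinter(width=1).pformat(Marker())
plain_output = pprint.PrettyPrinter(width=1).pformat(Marker())

print(shares_dict, leaked, custom_output, plain_output)
```

With the copy, the plain `PrettyPrinter` never sees the extra handler and falls back to the ordinary `repr`, which is the behaviour other libraries calling `pprint.pformat` would expect.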

@jnothman
Member

jnothman commented Jan 7, 2019

@NicolasHug do you want to fix this, or should we open it to other contributors?

@NicolasHug
Member

Thanks @jnothman, I didn't see this. I'll take a look.

@NicolasHug
Member

You're right, @jnothman: we should make a copy of _dispatch.

The bug happens because joblib is calling PrettyPrinter on an estimator, but the _dispatch dict of PrettyPrinter was updated by _EstimatorPrettyPrinter at some earlier point, which tells the plain PrettyPrinter object to use _EstimatorPrettyPrinter._pprint_estimator to render BaseEstimator objects.

Pretty sneaky... I'll submit a fix.

However I'm not sure I follow your concern about _pprint_estimator being a method.
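
The failure mode can be sketched without scikit-learn or joblib. In the snippet below, `Widget` and `LeakyPrinter` are hypothetical stand-ins for an estimator and for `_EstimatorPrettyPrinter`, and `_marker` plays the role of `_indent_at_name`. It is an illustration of the mechanism under those assumptions, not the actual scikit-learn code. It also relies on the private `pprint.PrettyPrinter._dispatch` dict and undoes its change to that dict at the end.

```python
import pprint


class Widget:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __repr__(self):
        # Longer than the default width of 80, so pprint consults _dispatch.
        return "Widget(" + "x" * 100 + ")"


class LeakyPrinter(pprint.PrettyPrinter):
    """Hypothetical stand-in for _EstimatorPrettyPrinter."""

    # Bug pattern: no .copy(), so this is the base class's own dict.
    _dispatch = pprint.PrettyPrinter._dispatch

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._marker = "<widget>"  # exists only on LeakyPrinter instances

    def _pprint_widget(self, obj, stream, indent, allowance, context, level):
        stream.write(self._marker)

    # Registers the handler in the shared dict, so every PrettyPrinter
    # in the process now uses it for Widget objects.
    _dispatch[Widget.__repr__] = _pprint_widget


error = None
try:
    # A plain PrettyPrinter, as created by pprint.pformat in other libraries.
    pprint.PrettyPrinter().pformat(Widget())
except AttributeError as exc:
    error = str(exc)
finally:
    # Undo the global side effect so the demo does not pollute the process.
    pprint.PrettyPrinter._dispatch.pop(Widget.__repr__, None)

print(error)
```

The handler is stored as a plain function and called as `p(self, ...)`, so nothing checks that `self` is a `LeakyPrinter`. That is why a method defined on the subclass can end up running against a base-class instance that lacks the subclass's attributes.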

@NicolasHug
Member

Minimal reproducing example:

from pprint import PrettyPrinter
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
PrettyPrinter().pprint(lr)


4 participants