
DOC improve docstring of covariance module following doc guideline #16105

Merged: 5 commits, Jan 15, 2020

55 changes: 27 additions & 28 deletions sklearn/covariance/_elliptic_envelope.py
@@ -16,10 +16,10 @@ class EllipticEnvelope(OutlierMixin, MinCovDet):

 Parameters
 ----------
-store_precision : boolean, optional (default=True)
+store_precision : bool, default=True
     Specify if the estimated precision is stored.

-assume_centered : boolean, optional (default=False)
+assume_centered : bool, default=False
     If True, the support of robust location and covariance estimates
     is computed, and a covariance estimate is recomputed from it,
     without centering the data.
@@ -28,16 +28,17 @@ class EllipticEnvelope(OutlierMixin, MinCovDet):
     If False, the robust location and covariance are directly computed
     with the FastMCD algorithm without additional treatment.

-support_fraction : float in (0., 1.), optional (default=None)
+support_fraction : float, default=None
     The proportion of points to be included in the support of the raw
     MCD estimate. If None, the minimum value of support_fraction will
     be used within the algorithm: `[n_sample + n_features + 1] / 2`.
+    Range is (0, 1).

-contamination : float in (0., 0.5), optional (default=0.1)
+contamination : float, default=0.1
     The amount of contamination of the data set, i.e. the proportion
-    of outliers in the data set.
+    of outliers in the data set. Range is (0, 0.5).

-random_state : int, RandomState instance or None, optional (default=None)
+random_state : int or RandomState instance, default=None
     The seed of the pseudo random number generator to use when shuffling
     the data. If int, random_state is the seed used by the random number
     generator; If RandomState instance, random_state is the random number
@@ -46,17 +47,17 @@ class EllipticEnvelope(OutlierMixin, MinCovDet):

 Attributes
 ----------
-location_ : array-like, shape (n_features,)
+location_ : ndarray of shape (n_features,)
     Estimated robust location

-covariance_ : array-like, shape (n_features, n_features)
+covariance_ : ndarray of shape (n_features, n_features)
     Estimated robust covariance matrix

-precision_ : array-like, shape (n_features, n_features)
+precision_ : ndarray of shape (n_features, n_features)
     Estimated pseudo inverse matrix.
     (stored only if store_precision is True)

-support_ : array-like, shape (n_samples,)
+support_ : ndarray of shape (n_samples,)
     A mask of the observations that have been used to compute the
     robust estimates of location and shape.
@@ -102,7 +103,6 @@ class EllipticEnvelope(OutlierMixin, MinCovDet):
 .. [1] Rousseeuw, P.J., Van Driessen, K. "A fast algorithm for the
     minimum covariance determinant estimator" Technometrics 41(3), 212
     (1999)
-
 """
 def __init__(self, store_precision=True, assume_centered=False,
              support_fraction=None, contamination=0.1,
@@ -119,12 +119,11 @@ def fit(self, X, y=None):

 Parameters
 ----------
-X : numpy array or sparse matrix, shape (n_samples, n_features).
-    Training data
+X : {array-like, sparse matrix} of shape (n_samples, n_features)
+    Training data.

 y : Ignored
-    not used, present for API consistency by convention.
-
+    Not used, present for API consistency by convention.
 """
 super().fit(X)
 self.offset_ = np.percentile(-self.dist_, 100. * self.contamination)
@@ -135,17 +134,16 @@ def decision_function(self, X):

 Parameters
 ----------
-X : array-like, shape (n_samples, n_features)
+X : array-like of shape (n_samples, n_features)
     The data matrix.

 Returns
 -------
-
-decision : array-like, shape (n_samples, )
+decision : ndarray of shape (n_samples, )
     Decision function of the samples.
     It is equal to the shifted Mahalanobis distances.
     The threshold for being an outlier is 0, which ensures a
     compatibility with other outlier detection algorithms.
-
 """
 check_is_fitted(self)
 negative_mahal_dist = self.score_samples(X)
@@ -156,11 +154,12 @@ def score_samples(self, X):

 Parameters
 ----------
-X : array-like, shape (n_samples, n_features)
+X : array-like of shape (n_samples, n_features)
+    The data matrix.

 Returns
 -------
-negative_mahal_distances : array-like, shape (n_samples, )
+negative_mahal_distances : array-like of shape (n_samples,)
     Opposite of the Mahalanobis distances.
 """
 check_is_fitted(self)
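
Read together with fit (which stores `offset_` as the `contamination` percentile of the negated training distances) and decision_function above, these hunks pin down a simple invariant: `decision_function(X) == score_samples(X) - offset_`. A quick sketch checking it (illustrative only, toy data invented):

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(0)
X = rng.randn(300, 2)

ee = EllipticEnvelope(contamination=0.1, random_state=0).fit(X)

# the shifted Mahalanobis distances differ from the raw scores by offset_
assert np.allclose(ee.decision_function(X),
                   ee.score_samples(X) - ee.offset_)
```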
@@ -173,11 +172,12 @@ def predict(self, X):

 Parameters
 ----------
-X : array-like, shape (n_samples, n_features)
+X : array-like of shape (n_samples, n_features)
+    The data matrix.

 Returns
 -------
-is_inlier : array, shape (n_samples,)
+is_inlier : ndarray of shape (n_samples,)
     Returns -1 for anomalies/outliers and +1 for inliers.
 """
 X = check_array(X)
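
The ±1 convention documented for `is_inlier` matches the other outlier detectors. A small sketch (illustrative only; the inlier/outlier split is invented):

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(42)
X_inliers = rng.randn(200, 2)                           # dense Gaussian blob
X_outliers = rng.uniform(low=-6, high=6, size=(10, 2))  # scattered far points

ee = EllipticEnvelope(contamination=0.05, random_state=42).fit(X_inliers)
labels = ee.predict(np.vstack([X_inliers, X_outliers]))
print(np.unique(labels))  # [-1  1]: -1 for anomalies/outliers, +1 for inliers
```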
@@ -196,19 +196,18 @@ def score(self, X, y, sample_weight=None):

 Parameters
 ----------
-X : array-like, shape (n_samples, n_features)
+X : array-like of shape (n_samples, n_features)
     Test samples.

-y : array-like, shape (n_samples,) or (n_samples, n_outputs)
+y : array-like of shape (n_samples,) or (n_samples, n_outputs)
     True labels for X.

-sample_weight : array-like, shape (n_samples,), optional
+sample_weight : array-like of shape (n_samples,), default=None
     Sample weights.

 Returns
 -------
 score : float
-    Mean accuracy of self.predict(X) wrt. y.
-
+    Mean accuracy of self.predict(X) w.r.t. y.
 """
 return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
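
Since score is the accuracy of predict against ±1 labels, a fit with `contamination=0.1` flags roughly 10% of its own training data, so scoring the training set against all-inlier labels lands near 0.9 (a sketch under that assumption, toy data invented):

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y_true = np.ones(100)  # pretend every sample is an inlier

ee = EllipticEnvelope(contamination=0.1, random_state=0).fit(X)
print(ee.score(X, y_true))  # ~0.9: about 10% of X is flagged as outliers
```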
59 changes: 27 additions & 32 deletions sklearn/covariance/_empirical_covariance.py
@@ -29,15 +29,16 @@ def log_likelihood(emp_cov, precision):

 Parameters
 ----------
-emp_cov : 2D ndarray (n_features, n_features)
-    Maximum Likelihood Estimator of covariance
+emp_cov : ndarray of shape (n_features, n_features)
+    Maximum Likelihood Estimator of covariance.

-precision : 2D ndarray (n_features, n_features)
-    The precision matrix of the covariance model to be tested
+precision : ndarray of shape (n_features, n_features)
+    The precision matrix of the covariance model to be tested.

 Returns
 -------
-sample mean of the log-likelihood
+log_likelihood_ : float
+    Sample mean of the log-likelihood.
 """
 p = precision.shape[0]
 log_likelihood_ = - np.sum(emp_cov * precision) + fast_logdet(precision)
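
With the return section now named and typed, here is a short sketch of how log_likelihood is typically called, pairing an empirical covariance with a model precision matrix (illustrative only, toy data invented):

```python
import numpy as np
from sklearn.covariance import empirical_covariance, log_likelihood

rng = np.random.RandomState(0)
X = rng.randn(50, 3)

emp_cov = empirical_covariance(X)
precision = np.linalg.pinv(emp_cov)  # precision of the model being tested
print(log_likelihood(emp_cov, precision))  # sample mean of the log-likelihood
```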
@@ -52,20 +53,19 @@ def empirical_covariance(X, assume_centered=False):

 Parameters
 ----------
-X : ndarray, shape (n_samples, n_features)
+X : ndarray of shape (n_samples, n_features)
     Data from which to compute the covariance estimate

-assume_centered : boolean
+assume_centered : bool, default=False
     If True, data will not be centered before computation.
     Useful when working with data whose mean is almost, but not exactly
     zero.
     If False, data will be centered before computation.

 Returns
 -------
-covariance : 2D ndarray, shape (n_features, n_features)
+covariance : ndarray of shape (n_features, n_features)
     Empirical covariance (Maximum Likelihood Estimator).
-
 """
 X = np.asarray(X)
 if X.ndim == 1:
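
As a sanity check on the documented return value, the Maximum Likelihood Estimator divides by n_samples, so it matches NumPy's biased covariance (illustrative sketch, toy data invented):

```python
import numpy as np
from sklearn.covariance import empirical_covariance

X = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]])
cov = empirical_covariance(X, assume_centered=False)

# the MLE normalizes by n_samples, i.e. np.cov with bias=True
print(np.allclose(cov, np.cov(X, rowvar=False, bias=True)))  # True
```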
@@ -92,24 +92,24 @@ class EmpiricalCovariance(BaseEstimator):

 Parameters
 ----------
-store_precision : bool
+store_precision : bool, default=True
     Specifies if the estimated precision is stored.

-assume_centered : bool
+assume_centered : bool, default=False
     If True, data are not centered before computation.
     Useful when working with data whose mean is almost, but not exactly
     zero.
     If False (default), data are centered before computation.

 Attributes
 ----------
-location_ : array-like, shape (n_features,)
+location_ : ndarray of shape (n_features,)
     Estimated location, i.e. the estimated mean.

-covariance_ : 2D ndarray, shape (n_features, n_features)
+covariance_ : ndarray of shape (n_features, n_features)
     Estimated covariance matrix

-precision_ : 2D ndarray, shape (n_features, n_features)
+precision_ : ndarray of shape (n_features, n_features)
     Estimated pseudo-inverse matrix.
     (stored only if store_precision is True)
@@ -144,10 +144,9 @@ def _set_covariance(self, covariance):

 Parameters
 ----------
-covariance : 2D ndarray, shape (n_features, n_features)
+covariance : array-like of shape (n_features, n_features)
     Estimated covariance matrix to be stored, and from which precision
     is computed.
-
 """
 covariance = check_array(covariance)
 # set covariance
@@ -163,9 +162,8 @@ def get_precision(self):

 Returns
 -------
-precision_ : array-like
+precision_ : array-like of shape (n_features, n_features)
     The precision matrix associated to the current covariance object.
-
 """
 if self.store_precision:
     precision = self.precision_
@@ -183,13 +181,12 @@ def fit(self, X, y=None):
     Training data, where n_samples is the number of samples and
     n_features is the number of features.

-y
-    not used, present for API consistence purpose.
+y : Ignored
+    Not used, present for API consistence purpose.

 Returns
 -------
 self : object
-
 """
 X = check_array(X)
 if self.assume_centered:
@@ -214,15 +211,14 @@ def score(self, X_test, y=None):
     X_test is assumed to be drawn from the same distribution than
     the data used in fit (including centering).

-y
-    not used, present for API consistence purpose.
+y : Ignored
+    Not used, present for API consistence purpose.

 Returns
 -------
 res : float
     The likelihood of the data set with `self.covariance_` as an
     estimator of its covariance matrix.
-
 """
 # compute empirical covariance of the test set
 test_cov = empirical_covariance(
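
The returned float is the held-out log-likelihood of X_test under the fitted covariance, so it can be used to compare estimators on the same test split (illustrative sketch, toy data invented):

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance

rng = np.random.RandomState(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.2], [0.2, 0.5]], size=600)
X_train, X_test = X[:500], X[500:]

cov = EmpiricalCovariance().fit(X_train)
print(cov.score(X_test))  # higher is better on the same test set
```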
@@ -242,26 +238,26 @@ def error_norm(self, comp_cov, norm='frobenius', scaling=True,
 comp_cov : array-like of shape (n_features, n_features)
     The covariance to compare with.

-norm : str
+norm : {"frobenius", "spectral"}, default="frobenius"
     The type of norm used to compute the error. Available error types:
     - 'frobenius' (default): sqrt(tr(A^t.A))
     - 'spectral': sqrt(max(eigenvalues(A^t.A))
     where A is the error ``(comp_cov - self.covariance_)``.

-scaling : bool
+scaling : bool, default=True
     If True (default), the squared error norm is divided by n_features.
     If False, the squared error norm is not rescaled.

-squared : bool
+squared : bool, default=True
     Whether to compute the squared error norm or the error norm.
     If True (default), the squared error norm is returned.
     If False, the error norm is returned.

 Returns
 -------
-The Mean Squared Error (in the sense of the Frobenius norm) between
-    `self` and `comp_cov` covariance estimators.
-
+result : float
+    The Mean Squared Error (in the sense of the Frobenius norm) between
+    `self` and `comp_cov` covariance estimators.
 """
 # compute the error
 error = comp_cov - self.covariance_
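
A sketch of the two documented norms in action, comparing a fitted estimate against the covariance that generated the data (illustrative only, toy data invented):

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance

rng = np.random.RandomState(0)
true_cov = np.array([[1.0, 0.4], [0.4, 0.5]])
X = rng.multivariate_normal([0, 0], true_cov, size=1000)

cov = EmpiricalCovariance().fit(X)
print(cov.error_norm(true_cov))  # squared Frobenius error / n_features
print(cov.error_norm(true_cov, norm='spectral', squared=False))
```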
@@ -296,9 +292,8 @@ def mahalanobis(self, X):

 Returns
 -------
-dist : array, shape = [n_samples,]
+dist : ndarray of shape (n_samples,)
     Squared Mahalanobis distances of the observations.
-
 """
 precision = self.get_precision()
 # compute mahalanobis distances
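
The distances are squared, so their mean over the fitting data equals n_features for the MLE covariance, which makes a handy sanity check (illustrative sketch, toy data invented):

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance

rng = np.random.RandomState(0)
X = rng.randn(300, 2)

cov = EmpiricalCovariance().fit(X)
d2 = cov.mahalanobis(X)  # squared Mahalanobis distance of each observation
print(d2.shape)          # (300,)
print(d2.mean())         # 2.0 up to rounding: trace(S^-1 S) = n_features
```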