
Use of class_weight dictionary in LogisticRegression modifies sample_weight object #18347

@KrisKusano

Describe the bug

When a dictionary with unequal weights is passed to the class_weight keyword argument and sample_weight is also specified, LogisticRegression.fit modifies the caller's sample_weight array in place.
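The sketch below illustrates the kind of aliasing that would produce this behavior. It is hypothetical: it assumes (this report does not confirm it) that the estimator converts sample_weight without copying and then scales it in place by each sample's class weight. The helpers scale_inplace and scale_copy are illustrative names, not scikit-learn functions.

```python
import numpy as np


def scale_inplace(sample_weight, y, class_weight):
    """Hypothetical buggy pattern: scales the caller's array in place."""
    # np.asarray returns the very same object when the input is already float64
    sw = np.asarray(sample_weight, dtype=np.float64)
    per_sample = np.array([class_weight[label] for label in y])
    sw *= per_sample  # in-place multiply: writes through to the caller's array
    return sw


def scale_copy(sample_weight, y, class_weight):
    """Hypothetical safe pattern: the caller's array is left untouched."""
    sw = np.asarray(sample_weight, dtype=np.float64)
    per_sample = np.array([class_weight[label] for label in y])
    return sw * per_sample  # out-of-place multiply allocates a new array


y = np.array([0, 1, 1, 2])
class_weight = {0: 1.0, 1: 10.0, 2: 1.0}

W = np.ones(4)
scale_copy(W, y, class_weight)
print(W.sum())  # 4.0: W unchanged

scale_inplace(W, y, class_weight)
print(W.sum())  # 22.0: W mutated through the alias
```

The only difference between the two helpers is `sw *= ...` versus `sw * ...`; the first reuses the input buffer, the second allocates a new one.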

Steps/Code to Reproduce

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import numpy as np

X, y = load_iris(return_X_y=True)
np.random.seed(1234)
W = np.random.random(len(X)) * 10.0

print('Sum of weight before: {}'.format(W.sum()))

# fit model
clf = LogisticRegression(random_state=0,
                         class_weight={0: 1.0, 1: 10.0, 2: 1.0},
                         max_iter=200)
clf.fit(X, y, sample_weight=W)

print('Sum of weight after: {}'.format(W.sum()))

This produces:

Sum of weight before: 761.8436984163643
Sum of weight after: 3075.667075252882

Expected Results

The weight array (W) should be unaltered after the call to LogisticRegression.fit.
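A generic way to check this expectation is to snapshot the array, run the call, and compare. The helper below, leaves_array_unchanged, is a hypothetical name introduced for illustration; it is demonstrated here on two toy functions rather than on scikit-learn itself.

```python
import numpy as np


def leaves_array_unchanged(func, arr):
    """Call func(arr) and report whether arr still holds its original values."""
    snapshot = arr.copy()  # independent copy taken before the call
    func(arr)
    return np.array_equal(arr, snapshot)


def pure(w):
    return w * 2.0  # out-of-place: the argument is not touched


def mutating(w):
    w *= 2.0  # in-place: the argument is modified


W = np.linspace(0.0, 10.0, 150)
print(leaves_array_unchanged(pure, W))      # True
print(leaves_array_unchanged(mutating, W))  # False
```

Applied to the reproduction script, the check would be `leaves_array_unchanged(lambda w: clf.fit(X, y, sample_weight=w), W)`, which, going by the output reported in this issue, would return False on scikit-learn 0.23.2.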

The expected behavior can be obtained by passing a deep copy of the weight array instead:

import copy

# fit model on a deep copy of W (the rest of the script is unchanged)
clf = LogisticRegression(random_state=0,
                         class_weight={0: 1.0, 1: 10.0, 2: 1.0},
                         max_iter=200)
clf.fit(X, y, sample_weight=copy.deepcopy(W))

This produces:

Sum of weight before: 761.8436984163643
Sum of weight after: 761.8436984163643
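Because W is a plain NumPy float array, copy.deepcopy is not strictly required: ndarray.copy() already allocates a new data buffer. The sketch below uses np.shares_memory to compare the options; it assumes, as in the reproduction script, that W is a float64 array.

```python
import copy

import numpy as np

W = np.random.RandomState(1234).random_sample(150) * 10.0

candidates = {
    "W itself": W,
    "np.asarray(W)": np.asarray(W),  # no copy for an existing float64 array
    "copy.deepcopy(W)": copy.deepcopy(W),
    "W.copy()": W.copy(),
}

for name, arr in candidates.items():
    # True means in-place writes to arr would also show up in W
    print(f"{name:18} shares memory with W: {np.shares_memory(arr, W)}")
```

The first two entries report True and the two copies report False, so either copy protects W from in-place modification.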

Actual Results

The weight array (W) was modified in place: its sum was 761.8 before the fit and 3075.7 after.
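These numbers are consistent with each weight having been multiplied once by its class weight. Assuming exactly that (an inference from the output, not from the scikit-learn source), write c_k for the class weight of class k and S_1 for the sum of the original weights of class-1 samples:

```latex
\sum_i c_{y_i} W_i = \sum_i W_i + (c_1 - 1)\, S_1,
\qquad c_0 = c_2 = 1,\quad c_1 = 10,\quad S_1 = \sum_{i:\, y_i = 1} W_i

3075.67 = 761.84 + 9\, S_1 \;\Rightarrow\; S_1 \approx 257.09
```

S_1 is roughly a third of the total weight, which is plausible since each iris class has 50 of the 150 samples.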

Versions

System:
    python: 3.8.3 (default, May 19 2020, 06:50:17) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\kusan\work\bayes_inj_risk\.venv\Scripts\python.exe
   machine: Windows-10-10.0.19041-SP0

Python dependencies:
          pip: 20.2.2
   setuptools: 49.6.0
      sklearn: 0.23.2
        numpy: 1.19.1
        scipy: 1.5.2
       Cython: None
       pandas: 1.1.1
   matplotlib: 3.3.1
       joblib: 0.16.0
threadpoolctl: 2.1.0

Built with OpenMP: True
