
Implementing Temperature Scaling #29517


Closed

virchan wants to merge 72 commits

Conversation

virchan
Member

@virchan virchan commented Jul 18, 2024

Reference Issues/PRs

Towards #28574

What does this implement/fix? Explain your changes.

Temperature scaling is a probability calibration method for multi-class classification. It is given by the formula:

$$[ \ell_1, \cdots , \ell_n] \mapsto \mathrm{softmax} \left( \left[ \frac{\ell_1}{T}, \cdots, \frac{\ell_n}{T} \right] \right)$$

where the $\ell_i$ are logits from a multi-class classifier, and $T$ is the temperature parameter, trained to calibrate the softmax output.
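
For intuition, here is a minimal NumPy sketch of this map (the function name is illustrative, not part of this PR). $T > 1$ flattens the predicted distribution, $T < 1$ sharpens it, and $T = 1$ recovers the ordinary softmax:

import numpy as np

def softmax_with_temperature(logits, temperature):
    # Subtracting the row-wise maximum leaves the softmax value unchanged
    # but prevents overflow in np.exp.
    scaled = (logits - logits.max(axis=1, keepdims=True)) / temperature
    exp_scaled = np.exp(scaled)
    return exp_scaled / exp_scaled.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])
print(softmax_with_temperature(logits, 1.0))  # ordinary softmax
print(softmax_with_temperature(logits, 5.0))  # flatter, less confident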

This PR implements the temperature scaling method in the `CalibratedClassifierCV` class by adding a new option, `method='temperature'`, passed when instantiating the `CalibratedClassifierCV` object:

calibrated_classifier = CalibratedClassifierCV(
    base_classifier, cv=3, method="temperature"
)

When `.fit()` is called, the temperature parameter is trained. The `.predict_proba()` method then computes multi-class probabilities using the formula above. The temperature is initially set to 1.0 (Guo et al. 2017, Section 5), and the optimised temperature is constrained to the interval [1e-2, inf) for numerical stability.
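
For example, a minimal end-to-end sketch of the proposed usage on the Iris data (mirroring `sklearn/_temperature_scaling_test.py`; note that `method='temperature'` exists only with this PR applied, so this will not run against a released scikit-learn):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_calib, y_train, y_calib = train_test_split(X, y, random_state=0)

# SVC(probability=False) exposes decision_function, so the calibrator
# is fitted on logits; .fit() trains the temperature parameter T.
calibrated = CalibratedClassifierCV(
    SVC(probability=False), cv=3, method="temperature"
)
calibrated.fit(X_train, y_train)

# Temperature-scaled softmax probabilities for the held-out split.
proba = calibrated.predict_proba(X_calib)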

This PR includes two files:

  1. `sklearn/calibration_temperature.py`, intended to replace `sklearn/calibration.py` in the final merge.
  2. `sklearn/_temperature_scaling_test.py`, providing demonstrations of the new feature; it will not be included in the final merge.

References

  1. C. Guo, G. Pleiss, Y. Sun & K. Q. Weinberger. On Calibration of Modern Neural Networks. ICML 2017.

virchan added 21 commits June 12, 2024 16:52

- …`ensemble/_weight_boosting.py` file, moving it below the `Examples` section for improved organization.
- Included an AdaBoost example reference within the DecisionTree class in the `tree/-class.py` file.
- …sion Trees user guide.
- Modified the doc-string wording in the `AdaBoostClassifier` class, referencing the aforementioned example.
- Modified the `negative_log_likelihood` function to allow labels to be one-hot.
- Added the `_temperature_scaling_test.py` file.
- …e outputs from `decision_function`. Also added the `_additive_smoothing` function to avoid numerical instability when applying the logarithm.
- …argument.
- Modified the `_temperature_scaling` function: the initial temperature is now 1.0, and the optimised temperature lies in the interval [1e-2, inf).
- Revised doc-strings.

github-actions bot commented Jul 18, 2024

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


black

black detected issues. Please run black . locally and push the changes. Here you can see the detected issues. Note that running black might also fix some of the issues which might be detected by ruff. Note that the installed black version is black=24.3.0.


--- /home/runner/work/scikit-learn/scikit-learn/sklearn/_temperature_scaling_test.py	2024-07-29 16:37:33.325250+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/_temperature_scaling_test.py	2024-07-29 16:37:42.675309+00:00
@@ -1,6 +1,6 @@
-'''
+"""
 This file is created to test if the custom 'TemperatureScaling' class runs properly,
 and serves as proof of work for the changes made to the scikit-learn repository.
 
 The file also includes examples related to developing a temperature scaling method
 for probability calibration in multi-class classification.
@@ -11,11 +11,11 @@
     .. [1]  https://github.com/scikit-learn/scikit-learn/issues/28574. Original issue
             on Github.
 
     .. [2]  On Calibration of Modern Neural Networks,
             C. Guo, G. Pleiss, Y. Sun & K. Q. Weinberger, ICML 2017
-'''
+"""
 
 from sklearn.calibration_temperature import CalibratedClassifierCV_test
 from sklearn import datasets
 from sklearn.model_selection import train_test_split
 from sklearn.svm import SVC
@@ -34,38 +34,35 @@
 SV_classifier: SVC = SVC(probability=False)
 Logistic_classifier: LogisticRegression = LogisticRegression()
 Tree_classifier: DecisionTreeClassifier = DecisionTreeClassifier()
 
 # Initiate the temperature scaling calibrators for the classifiers
-SVC_scaled: CalibratedClassifierCV_test = CalibratedClassifierCV_test(SV_classifier,
-                                                                      cv=3,
-                                                                      method='temperature'
-                                                                      )
-Logistic_scaled: CalibratedClassifierCV_test = CalibratedClassifierCV_test(Logistic_classifier,
-                                                                           cv=7,
-                                                                           method='temperature'
-                                                                           )
-Tree_scaled: CalibratedClassifierCV_test = CalibratedClassifierCV_test(Tree_classifier,
-                                                                       cv=3,
-                                                                       method='temperature'
-                                                                       )
+SVC_scaled: CalibratedClassifierCV_test = CalibratedClassifierCV_test(
+    SV_classifier, cv=3, method="temperature"
+)
+Logistic_scaled: CalibratedClassifierCV_test = CalibratedClassifierCV_test(
+    Logistic_classifier, cv=7, method="temperature"
+)
+Tree_scaled: CalibratedClassifierCV_test = CalibratedClassifierCV_test(
+    Tree_classifier, cv=3, method="temperature"
+)
 
 # Calibrate the classifiers with temperature scaling
 # The calibrators are trained with the output of
 # `decision_function` for the support vector classifier
 # and logistic regression, while they are trained with
 # `predict_proba` for the decision tree classifier.
-SVC_scaled.fit(X_train,y_train)
-Logistic_scaled.fit(X_train,y_train)
-Tree_scaled.fit(X_train,y_train)
+SVC_scaled.fit(X_train, y_train)
+Logistic_scaled.fit(X_train, y_train)
+Tree_scaled.fit(X_train, y_train)
 
 print("Optimal Temperatures For Each Classifiers")
 print(f"{SVC_scaled.calibrated_classifiers_[0].calibrators[0].T_=}")
 print(f"{Logistic_scaled.calibrated_classifiers_[0].calibrators[0].T_=}")
 print(f"{Tree_scaled.calibrated_classifiers_[0].calibrators[0].T_=}")
 
-print('\n')
+print("\n")
 print("Printing calibrated probabilities...")
 print(f"{SVC_scaled.predict_proba(X_calib)=}")
 print(f"{Logistic_scaled.predict_proba(X_calib)=}")
 print(f"{Tree_scaled.predict_proba(X_calib)=}")
 print(f"{y_calib=}")
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/_temperature_scaling_test.py
--- /home/runner/work/scikit-learn/scikit-learn/sklearn/calibration_temperature.py	2024-07-29 16:37:33.325250+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/calibration_temperature.py	2024-07-29 16:37:43.327826+00:00
@@ -659,20 +659,20 @@
     Y = label_binarize(y, classes=classes)
     label_encoder = LabelEncoder().fit(classes)
     pos_class_indices = label_encoder.transform(clf.classes_)
     calibrators = []
 
-    if (method == 'isotonic') or (method == 'sigmoid'):
+    if (method == "isotonic") or (method == "sigmoid"):
         for class_idx, this_pred in zip(pos_class_indices, predictions.T):
             if method == "isotonic":
                 calibrator = IsotonicRegression(out_of_bounds="clip")
             else:  # "sigmoid"
                 calibrator = _SigmoidCalibration()
             calibrator.fit(this_pred, Y[:, class_idx], sample_weight)
             calibrators.append(calibrator)
 
-    elif method == 'temperature':
+    elif method == "temperature":
         calibrator = _TemperatureScaling()
         calibrator.fit(predictions, Y, sample_weight)
         calibrators.append(calibrator)
 
     pipeline = _CalibratedClassifier(clf, calibrators, method=method, classes=classes)
@@ -739,11 +739,11 @@
         pos_class_indices = label_encoder.transform(self.estimator.classes_)
 
         proba = np.zeros((_num_samples(X), n_classes))
 
         # Sigmoid and Isotonic methods
-        if (self.method == 'sigmoid') or (self.method == 'isotonic'):
+        if (self.method == "sigmoid") or (self.method == "isotonic"):
 
             for class_idx, this_pred, calibrator in zip(
                 pos_class_indices, predictions.T, self.calibrators
             ):
                 if n_classes == 2:
@@ -764,13 +764,15 @@
                 proba = np.divide(
                     proba, denominator, out=uniform_proba, where=denominator != 0
                 )
 
         # Temperature Scaling method
-        elif self.method == 'temperature':
-
-            assert len(self.calibrators) == 1, 'Temperature scaling should consists of one calibrator.'
+        elif self.method == "temperature":
+
+            assert (
+                len(self.calibrators) == 1
+            ), "Temperature scaling should consists of one calibrator."
 
             proba = self.calibrators[0].predict(predictions)
 
         # Deal with cases where the predicted probability minimally exceeds 1.0
         proba[(1.0 < proba) & (proba <= 1.0 + 1e-5)] = 1.0
@@ -950,14 +952,11 @@
     np.ndarray
         An array of the same shape as `data` where each row has been normalized
         by subtracting the maximum value of that row.
     """
 
-    row_max: np.ndarray = np.max(data,
-                                 axis=1,
-                                 keepdims=True
-                                 )
+    row_max: np.ndarray = np.max(data, axis=1, keepdims=True)
 
     return data - row_max
 
 
 def _additive_smoothing(probabilities: np.ndarray) -> np.ndarray:
@@ -983,20 +982,23 @@
         The smoothed probability array, with values adjusted to avoid 0 and 1.
     """
 
     n_classes: int = probabilities.shape[1]
 
-    smooth_probabilities: np.ndarray = (probabilities * (n_classes - 1) + 0.5) / n_classes
+    smooth_probabilities: np.ndarray = (
+        probabilities * (n_classes - 1) + 0.5
+    ) / n_classes
 
     smooth_probabilities = smooth_probabilities.astype(dtype=probabilities.dtype)
 
     return smooth_probabilities
 
 
-def _softmax_t(X: np.ndarray,
-               temperature: float,
-               ) -> np.ndarray:
+def _softmax_t(
+    X: np.ndarray,
+    temperature: float,
+) -> np.ndarray:
     """Compute the temperature-scaled softmax of the input array.
 
     Parameters
     ----------
     X : np.ndarray
@@ -1019,13 +1021,11 @@
     softmax_t_output = softmax_t_output.astype(dtype=X.dtype)
 
     return softmax_t_output
 
 
-def _exp_t(X: np.ndarray,
-           temperature: float
-           ) -> np.ndarray:
+def _exp_t(X: np.ndarray, temperature: float) -> np.ndarray:
     """Scale predictions by the inverse temperature and apply the exponential function.
 
     Parameters
     ----------
     X : np.ndarray
@@ -1048,15 +1048,16 @@
     exp_t_output = exp_t_output.astype(dtype=X.dtype)
 
     return exp_t_output
 
 
-def _temperature_scaling(predictions: np.ndarray,
-                         labels: np.ndarray,
-                         sample_weight=None,
-                         initial_temperature: float = 1.0
-                         ) -> float:
+def _temperature_scaling(
+    predictions: np.ndarray,
+    labels: np.ndarray,
+    sample_weight=None,
+    initial_temperature: float = 1.0,
+) -> float:
     """Probability Calibration with temperature scaling (Guo-Pleiss-Sun-Weinberger 2017).
 
     Parameters
     ----------
     predictions : ndarray of shape (n_samples,)
@@ -1081,11 +1082,11 @@
     ----------
     Guo, Pleiss, Sun & Weinberger, "On Calibration of Modern Neural Networks"
     """
 
     def negative_log_likelihood(temperature: float):
-        """ Compute the negative log likelihood loss and its derivative
+        """Compute the negative log likelihood loss and its derivative
             with respect  to temperature.
 
         Parameters
         ----------
         temperature : float
@@ -1118,17 +1119,17 @@
         exp_t: np.ndarray = _exp_t(predictions, temperature)
         exp_t_sum = exp_t.sum(axis=1)
 
         term_1: np.ndarray = predictions
         term_1 = _row_max_normalization(predictions)
-        term_1 /= temperature ** 2
-        term_1 = - term_1[np.arange(term_1.shape[0]), class_indices]
+        term_1 /= temperature**2
+        term_1 = -term_1[np.arange(term_1.shape[0]), class_indices]
         term_1 *= exp_t_sum
 
         term_2: np.ndarray = predictions
         term_2 = _row_max_normalization(term_2)
-        term_2 /= temperature ** 2
+        term_2 /= temperature**2
         term_2 *= exp_t
         term_2 = term_2.sum(axis=1)
 
         dlosses_dts: np.ndarray = (term_1 + term_2) / exp_t_sum
 
@@ -1136,19 +1137,21 @@
         if sample_weight is not None:
             dlosses_dts *= sample_weight
 
         return -losses.sum(), -dlosses_dts.sum()
 
-    temperature_minimizer: minimize = minimize(negative_log_likelihood,
-                                               np.array([initial_temperature]),
-                                               method="L-BFGS-B",
-                                               bounds=[(1e-2, None)],
-                                               jac=True,
-                                               options={"gtol": 1e-6,
-                                                        "ftol": 64 * np.finfo(float).eps,
-                                                        }
-                                               )
+    temperature_minimizer: minimize = minimize(
+        negative_log_likelihood,
+        np.array([initial_temperature]),
+        method="L-BFGS-B",
+        bounds=[(1e-2, None)],
+        jac=True,
+        options={
+            "gtol": 1e-6,
+            "ftol": 64 * np.finfo(float).eps,
+        },
+    )
 
     return temperature_minimizer.x[0]
 
 
 def _is_predict_proba(X: np.ndarray) -> bool:
@@ -1185,16 +1188,11 @@
     T_ : float
         The optimised temperature for probability calibration.
         Available after the calibrator is fitted.
     """
 
-
-    def fit(self,
-            X,
-            y,
-            sample_weight=None
-            ):
+    def fit(self, X, y, sample_weight=None):
         """Fit the model using X, y as training data.
 
         Parameters
         ----------
         X : np.ndarray
@@ -1215,11 +1213,13 @@
         """
 
         # If X are outputs of `decision_function`
         # i.e., logits (e.g., SVC(probability=False) )
         if _is_predict_proba(X):
-            self.T_ = _temperature_scaling(np.log(_additive_smoothing(X)), y, sample_weight)
+            self.T_ = _temperature_scaling(
+                np.log(_additive_smoothing(X)), y, sample_weight
+            )
 
         # If X are outputs of `predict_proba`
         else:
             self.T_ = _temperature_scaling(X, y, sample_weight)
 
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/calibration_temperature.py

Oh no! 💥 💔 💥
2 files would be reformatted, 923 files would be left unchanged.

ruff

ruff detected issues. Please run ruff check --fix --output-format=full . locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.5.1.


sklearn/_temperature_scaling_test.py:1:1: CPY001 Missing copyright notice at top of file
  |
1 | '''
  |  CPY001
2 | This file is created to test if the custom 'TemperatureScaling' class runs properly,
3 | and serves as proof of work for the changes made to the scikit-learn repository.
  |

sklearn/_temperature_scaling_test.py:18:1: I001 [*] Import block is un-sorted or un-formatted
   |
16 |   '''
17 |   
18 | / from sklearn.calibration_temperature import CalibratedClassifierCV_test
19 | | from sklearn import datasets
20 | | from sklearn.model_selection import train_test_split
21 | | from sklearn.svm import SVC
22 | | from sklearn.linear_model import LogisticRegression
23 | | from sklearn.tree import DecisionTreeClassifier
24 | | 
25 | | # We demonstrate with the Iris dataset, because
   | |_^ I001
26 |   # it is small, multi-class, and self-provided.
27 |   X, y = datasets.load_iris(return_X_y=True)
   |
   = help: Organize imports

sklearn/_temperature_scaling_test.py:43:89: E501 Line too long (95 > 88)
   |
41 |                                                                       method='temperature'
42 |                                                                       )
43 | Logistic_scaled: CalibratedClassifierCV_test = CalibratedClassifierCV_test(Logistic_classifier,
   |                                                                                         ^^^^^^^ E501
44 |                                                                            cv=7,
45 |                                                                            method='temperature'
   |

sklearn/calibration_temperature.py:1:1: CPY001 Missing copyright notice at top of file
  |
1 | """Calibration of predicted probabilities."""
  |  CPY001
2 | 
3 | # Author: Alexandre Gramfort <alexandre.gramfort@telecom-paristech.fr>
  |

sklearn/calibration_temperature.py:67:89: E501 Line too long (115 > 88)
   |
66 | class CalibratedClassifierCV_test(ClassifierMixin, MetaEstimatorMixin, BaseEstimator):
67 |     """Probability calibration with isotonic regression, logistic regression, or temperature scaling (in-progress).
   |                                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^ E501
68 | 
69 |     This class uses cross-validation to both estimate the parameters of a
   |

sklearn/calibration_temperature.py:771:89: E501 Line too long (103 > 88)
    |
769 |         elif self.method == 'temperature':
770 | 
771 |             assert len(self.calibrators) == 1, 'Temperature scaling should consists of one calibrator.'
    |                                                                                         ^^^^^^^^^^^^^^^ E501
772 | 
773 |             proba = self.calibrators[0].predict(predictions)
    |

sklearn/calibration_temperature.py:988:89: E501 Line too long (90 > 88)
    |
986 |     n_classes: int = probabilities.shape[1]
987 | 
988 |     smooth_probabilities: np.ndarray = (probabilities * (n_classes - 1) + 0.5) / n_classes
    |                                                                                         ^^ E501
989 | 
990 |     smooth_probabilities = smooth_probabilities.astype(dtype=probabilities.dtype)
    |

sklearn/calibration_temperature.py:1058:89: E501 Line too long (89 > 88)
     |
1056 |                          initial_temperature: float = 1.0
1057 |                          ) -> float:
1058 |     """Probability Calibration with temperature scaling (Guo-Pleiss-Sun-Weinberger 2017).
     |                                                                                         ^ E501
1059 | 
1060 |     Parameters
     |

sklearn/calibration_temperature.py:1147:89: E501 Line too long (89 > 88)
     |
1145 |                                                jac=True,
1146 |                                                options={"gtol": 1e-6,
1147 |                                                         "ftol": 64 * np.finfo(float).eps,
     |                                                                                         ^ E501
1148 |                                                         }
1149 |                                                )
     |

sklearn/calibration_temperature.py:1220:89: E501 Line too long (92 > 88)
     |
1218 |         # i.e., logits (e.g., SVC(probability=False) )
1219 |         if _is_predict_proba(X):
1220 |             self.T_ = _temperature_scaling(np.log(_additive_smoothing(X)), y, sample_weight)
     |                                                                                         ^^^^ E501
1221 | 
1222 |         # If X are outputs of `predict_proba`
     |

Found 10 errors.
[*] 1 fixable with the `--fix` option.

Generated for commit: 3ed6eed. Link to the linter CI: here

@adrinjalali
Member

Thanks for the PR @virchan. I see you're new to this repo. This PR, as it is right now, is quite far from something we'd merge. I suggest you continue with a few easier issues in the meantime, get more familiar with the codebase, and then try tackling more challenging issues. So I'm closing this PR for now.

@virchan
Member Author

virchan commented Jul 31, 2024

Thank you for your time!
