
[MRG] FEA Lift metric and curve #21320

Open

nawarhalabi wants to merge 44 commits into main from lift_metric

Commits (44)
6e5bacb
added unit tests
nawarhalabi Oct 11, 2021
cd4853d
added first version of lift curve metric function.
nawarhalabi Oct 11, 2021
7e1cdcc
removed old code after return statement of lift curve function
nawarhalabi Oct 11, 2021
e0061e9
fixed outdated example in docs of lift curve
nawarhalabi Oct 11, 2021
b04181d
added testing pos_label of lift curve
nawarhalabi Oct 11, 2021
0e171ae
added see also section of the lift curve function
nawarhalabi Oct 12, 2021
425f15c
added lift_score tests
nawarhalabi Oct 12, 2021
72423a6
added lift_score function
nawarhalabi Oct 12, 2021
5fbc53c
bug and doc fixes
nawarhalabi Oct 12, 2021
3d78136
added first documentation of lift_score and lift_curve
nawarhalabi Oct 12, 2021
942dbd4
Added LiftCurveDisplay to plot lift curves. Same structure as other Di…
nawarhalabi Oct 13, 2021
209ccf5
added lift display function for plotting with the tests
nawarhalabi Oct 13, 2021
26c3ff3
added lift display functions to the documentation
nawarhalabi Oct 13, 2021
effdfe6
Merge branch 'main' into lift_metric
nawarhalabi Oct 13, 2021
bb7febf
extended documentation and added reference to the lift curve
nawarhalabi Oct 13, 2021
aeecef1
ran black for code formatting and committing changes
nawarhalabi Oct 13, 2021
88f11c7
Fixed bug after running flake8
nawarhalabi Oct 13, 2021
01e2c63
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Oct 13, 2021
d3db8e7
correct example in lift_curve function docs
nawarhalabi Oct 13, 2021
b018fbd
added changelog
nawarhalabi Oct 13, 2021
ceb51b3
added changelog reference to pr
nawarhalabi Oct 13, 2021
68121b5
corrected bug in whats new
nawarhalabi Oct 13, 2021
6568468
fixed bug in whats new
nawarhalabi Oct 13, 2021
f31bbf0
fixed example of lift_curve not matching output value in doctest
nawarhalabi Oct 13, 2021
09f83e9
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Oct 17, 2021
eb12357
fixed documentation according to new standards
nawarhalabi Oct 17, 2021
4760796
fixed doc bug added in last commit by accident in precision_recall_curve
nawarhalabi Oct 17, 2021
597cc76
adjusted doc of lift_curve function to match standards
nawarhalabi Oct 17, 2021
783346b
Added more words to lift_curve plotting function's docs description t…
nawarhalabi Oct 17, 2021
6ce7403
added ignoring docstring testing which is newly added. Fails for no c…
nawarhalabi Oct 17, 2021
03d5cb3
fixed plot_lift_curve function to comply with docstrings
nawarhalabi Oct 17, 2021
4f296ff
fixed docstrings of lift_curve and lift_score
nawarhalabi Oct 17, 2021
a7caf13
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Oct 19, 2021
b5ae771
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Oct 19, 2021
21965eb
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Oct 20, 2021
f24f4d5
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Oct 22, 2021
a735182
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Oct 25, 2021
c8f54da
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Oct 27, 2021
f987a5d
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Nov 3, 2021
0a89221
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Nov 3, 2021
55f42b9
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Nov 11, 2021
cac14e6
fixed documentation missing new line in v1.1
nawarhalabi Nov 11, 2021
e5ecfb2
Merge remote-tracking branch 'upstream/main' into lift_metric
nawarhalabi Nov 25, 2021
a0508c3
added online references for the lift_score function
nawarhalabi Nov 25, 2021
4 changes: 4 additions & 0 deletions doc/modules/classes.rst
@@ -960,6 +960,8 @@ details.
metrics.hamming_loss
metrics.hinge_loss
metrics.jaccard_score
metrics.lift_curve
metrics.lift_score
metrics.log_loss
metrics.matthews_corrcoef
metrics.multilabel_confusion_matrix
@@ -1123,6 +1125,7 @@ See the :ref:`visualizations` section of the user guide for further details.

metrics.plot_confusion_matrix
metrics.plot_det_curve
metrics.plot_lift_curve
metrics.plot_precision_recall_curve
metrics.plot_roc_curve

@@ -1132,6 +1135,7 @@ See the :ref:`visualizations` section of the user guide for further details.

metrics.ConfusionMatrixDisplay
metrics.DetCurveDisplay
metrics.LiftCurveDisplay
metrics.PrecisionRecallDisplay
metrics.RocCurveDisplay
calibration.CalibrationDisplay
57 changes: 57 additions & 0 deletions doc/modules/model_evaluation.rst
@@ -68,6 +68,7 @@ Scoring Function
'f1_macro' :func:`metrics.f1_score` macro-averaged
'f1_weighted' :func:`metrics.f1_score` weighted average
'f1_samples' :func:`metrics.f1_score` by multilabel sample
'lift' :func:`metrics.lift_score`
'neg_log_loss' :func:`metrics.log_loss` requires ``predict_proba`` support
'precision' etc. :func:`metrics.precision_score` suffixes apply as with 'f1'
'recall' etc. :func:`metrics.recall_score` suffixes apply as with 'f1'
@@ -308,6 +309,7 @@ Some of these are restricted to the binary classification case:
precision_recall_curve
roc_curve
det_curve
lift_curve


Others also work in the multiclass case:
@@ -334,6 +336,7 @@ Some also work in the multilabel case:
hamming_loss
jaccard_score
log_loss
lift_score
multilabel_confusion_matrix
precision_recall_fscore_support
precision_score
@@ -740,6 +743,60 @@ In the multilabel case with binary label indicators::
or superset of the true labels will give a Hamming loss between
zero and one, exclusive.

.. _lift_score:

Lift
----

Lift [WikipediaLift2021]_ can be understood in several ways. One is as the
rate of positive responses obtained when a treatment is targeted at a chosen
subset of the dataset, relative to the rate of positive responses in the
dataset as a whole.

Lift can also be understood as a kind of normalized precision of the positive
class.

.. math::

Lift = \frac{n \times tp}{(tp + fp) \times (tp + fn)},

.. math::

Lift = \frac{Precision}{pr}

where :math:`tp`, :math:`fp`, :math:`fn`, :math:`n` and :math:`pr` are the
true positive count, false positive count, false negative count, dataset size
and the rate of positive instances in the true labels, respectively.
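
The two forms agree: substituting :math:`Precision = tp / (tp + fp)` and
:math:`pr = (tp + fn) / n` into the second expression recovers the first:

.. math::

   Lift = \frac{tp / (tp + fp)}{(tp + fn) / n} = \frac{n \times tp}{(tp + fp) \times (tp + fn)}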

:func:`lift_score` implements this metric in scikit-learn.

Here is a small usage example::

>>> from sklearn.metrics import lift_score
>>> y_pred = [1, 1, 1, 1, 1, 2, 2, 2]
>>> y_true = [1, 1, 1, 2, 2, 1, 2, 2]
>>> lift_score(y_true, y_pred)
1.2
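
The same value can be recovered directly from the counts in the formula above
(a quick check with plain numpy, shown only for illustration)::

    >>> import numpy as np
    >>> y_true = np.array([1, 1, 1, 2, 2, 1, 2, 2])
    >>> y_pred = np.array([1, 1, 1, 1, 1, 2, 2, 2])
    >>> tp = np.sum((y_pred == 1) & (y_true == 1))  # 3 true positives
    >>> fp = np.sum((y_pred == 1) & (y_true == 2))  # 2 false positives
    >>> fn = np.sum((y_pred == 2) & (y_true == 1))  # 1 false negative
    >>> n = y_true.shape[0]                         # 8 samples
    >>> float(n * tp / ((tp + fp) * (tp + fn)))
    1.2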

Related to :func:`lift_score` is :func:`lift_curve`. The lift curve shows the
lift on the y-axis against the positive classification rate (the fraction of
the population classified as the positive class) on the x-axis.

Intuitively, the lift curve shows how effective a treatment is on a subset of
the population as the size of that subset grows, relative to the effectiveness
of treating a random subset. It is closely related to
:func:`precision_recall_curve`.

:class:`LiftCurveDisplay` can be used to visually represent a lift curve. See
:class:`LiftCurveDisplay` and :func:`lift_curve` for examples and instructions.
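
For intuition, a lift curve can be sketched by hand from continuous classifier
scores: rank the samples by decreasing score, and for each top fraction of the
population divide its precision by the overall positive rate. The snippet
below is a minimal numpy illustration of that idea; the exact inputs and
return values of :func:`lift_curve` may differ::

    import numpy as np

    y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3])

    order = np.argsort(-y_score)              # sort by decreasing score
    y_sorted = y_true[order]
    base_rate = y_true.mean()                 # positive rate of the full set

    k = np.arange(1, len(y_sorted) + 1)
    precision_at_k = np.cumsum(y_sorted) / k  # precision of the top-k subset
    positive_rate = k / len(y_sorted)         # x-axis values
    lift = precision_at_k / base_rate         # y-axis values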

.. topic:: References:

.. [WikipediaLift2021] Wikipedia contributors. Lift (data mining). Wikipedia,
      October 13, 2021, 21:00 UTC. Available at:
      https://en.wikipedia.org/wiki/Lift_(data_mining).
      Accessed October 13, 2021.

.. _precision_recall_f_measure_metrics:

Precision, recall and F-measures
2 changes: 2 additions & 0 deletions doc/visualizations.rst
@@ -87,6 +87,7 @@ Functions
inspection.plot_partial_dependence
metrics.plot_confusion_matrix
metrics.plot_det_curve
metrics.plot_lift_curve
metrics.plot_precision_recall_curve
metrics.plot_roc_curve

@@ -102,5 +103,6 @@ Display Objects
inspection.PartialDependenceDisplay
metrics.ConfusionMatrixDisplay
metrics.DetCurveDisplay
metrics.LiftCurveDisplay
metrics.PrecisionRecallDisplay
metrics.RocCurveDisplay
8 changes: 8 additions & 0 deletions doc/whats_new/v1.1.rst
@@ -175,6 +175,14 @@ Changelog
backward compatibility, but this alias will be removed in 1.3.
:pr:`21177` by :user:`Julien Jerphanion <jjerphan>`.

- |Feature| :func:`metrics.lift_score` to compute the lift score. :pr:`21320`
  by :user:`Nawar Halabi <nawarhalabi>`.

- |Feature| :func:`metrics.lift_curve` to compute the lift curve, i.e. lift
  values at different positive classification rates (fraction of data points
  classified as positive). :pr:`21320` by :user:`Nawar Halabi <nawarhalabi>`.

- |Feature| :class:`metrics.LiftCurveDisplay` to plot the result of
  :func:`metrics.lift_curve`. :pr:`21320` by :user:`Nawar Halabi <nawarhalabi>`.

- |API| Parameters ``sample_weight`` and ``multioutput`` of :func:`metrics.
mean_absolute_percentage_error` are now keyword-only, in accordance with `SLEP009
<https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html>`.
8 changes: 8 additions & 0 deletions sklearn/metrics/__init__.py
@@ -15,6 +15,7 @@
from ._ranking import precision_recall_curve
from ._ranking import roc_auc_score
from ._ranking import roc_curve
from ._ranking import lift_curve
from ._ranking import top_k_accuracy_score

from ._classification import accuracy_score
@@ -28,6 +29,7 @@
from ._classification import hinge_loss
from ._classification import jaccard_score
from ._classification import log_loss
from ._classification import lift_score
from ._classification import matthews_corrcoef
from ._classification import precision_recall_fscore_support
from ._classification import precision_score
@@ -86,6 +88,8 @@

from ._plot.det_curve import plot_det_curve
from ._plot.det_curve import DetCurveDisplay
from ._plot.lift_curve import plot_lift_curve
from ._plot.lift_curve import LiftCurveDisplay
from ._plot.roc_curve import plot_roc_curve
from ._plot.roc_curve import RocCurveDisplay
from ._plot.precision_recall_curve import plot_precision_recall_curve
@@ -131,6 +135,9 @@
"jaccard_score",
"label_ranking_average_precision_score",
"label_ranking_loss",
"LiftCurveDisplay",
"lift_curve",
"lift_score",
"log_loss",
"make_scorer",
"nan_euclidean_distances",
@@ -157,6 +164,7 @@
"pairwise_kernels",
"plot_confusion_matrix",
"plot_det_curve",
"plot_lift_curve",
"plot_precision_recall_curve",
"plot_roc_curve",
"PrecisionRecallDisplay",
120 changes: 120 additions & 0 deletions sklearn/metrics/_classification.py
@@ -1767,6 +1767,126 @@ def precision_score(
return p


def lift_score(
    y_true,
    y_pred,
    *,
    pos_label=1,
    sample_weight=None,
    zero_division="warn",
):
    """Compute the lift.

    The lift is the ratio ``(n * tp) / ((tp + fp) * (tp + fn))`` where ``tp``
    is the number of true positives, ``fp`` the number of false positives,
    ``fn`` the number of false negatives and ``n`` the total number of
    samples. The lift is intuitively the relative improvement in positive
    class precision over selecting a random subset and labeling it positive.

    Another way to think of lift is the ratio ``precision / pr`` where ``pr``
    is the rate of positives among the true labels.

    The worst value is 0 but lift does not have an upper bound.

    Read more in the :ref:`User Guide <lift_score>`.

    Parameters
    ----------
    y_true : 1d array-like, or label indicator array / sparse matrix
        Ground truth (correct) target values.

    y_pred : 1d array-like, or label indicator array / sparse matrix
        Estimated targets as returned by a classifier.

    pos_label : str or int, default=1
        The class treated as positive. Lift is only defined for binary
        targets; all other labels are treated as negative.

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.

    zero_division : "warn", 0 or 1, default="warn"
        Sets the value to return when there is a zero division. If set to
        "warn", this acts as 0, but a warning is also raised.

    Returns
    -------
    lift : float
        Lift of the positive class.

    See Also
    --------
    lift_curve : Calculate the lift for different positive classification
        rates.
    precision_recall_curve : Calculate precision and recall for different
        classification thresholds.

    Notes
    -----
    When ``tp + fp == 0``, lift returns 0 and raises
    ``UndefinedMetricWarning``. This behavior can be modified with
    ``zero_division``, which is passed on to the underlying precision
    computation. Lift is only defined for binary data.

    References
    ----------
    .. [1] `Wikipedia entry for lift
           <https://en.wikipedia.org/wiki/Lift_(data_mining)>`_.
    .. [2] `Example of lift in practice
           <https://www.kdnuggets.com/2016/03/lift-analysis-data-scientist-secret-weapon.html>`_.
    .. [3] `Lift curve in machine learning
           <https://howtolearnmachinelearning.com/articles/the-lift-curve-in-machine-learning/>`_.

    Examples
    --------
    >>> from sklearn.metrics import lift_score
    >>> y_true = [0, 1, 1, 0, 0, 1, 1]
    >>> y_pred = [0, 1, 1, 0, 1, 0, 1]
    >>> lift_score(y_true, y_pred)
    1.3125
    """
    # Precision of the positive class (zero_division is handled here)
    p, _, _, _ = precision_recall_fscore_support(
        y_true,
        y_pred,
        average="binary",
        pos_label=pos_label,
        warn_for=("precision",),
        sample_weight=sample_weight,
        zero_division=zero_division,
    )

    # Binarize the true labels against the positive class
    y_true = column_or_1d(y_true)
    y_true = y_true == pos_label

    # Default to uniform sample weights
    if sample_weight is None:
        sample_weight = np.ones(y_true.shape[0], dtype=np.int64)
    else:
        sample_weight = column_or_1d(sample_weight)
        check_consistent_length(y_true, sample_weight)

    # Weighted rate of positives among the true labels
    pr = (y_true * sample_weight).sum() / sample_weight.sum()

    # Lift; fall back to the zero_division policy when y_true contains
    # no positive samples
    if pr == 0 or np.isnan(pr):
        zero_division_value = np.float64(1.0)
        if zero_division in ["warn", 0]:
            zero_division_value = np.float64(0.0)
        lift = zero_division_value
    else:
        lift = p / pr

    return lift


def recall_score(
    y_true,
    y_pred,