
DOC: update CrossEntropyLoss with note and example of incorrect target specification #155649


Open
wants to merge 9 commits into base: main
Fix more backticks
mikaylagawarecki authored and pytorchmergebot committed Aug 11, 2025
commit 8bf57b51a07532d0f264be05aa7c7d04db710aae
10 changes: 5 additions & 5 deletions torch/nn/modules/loss.py
@@ -1319,14 +1319,14 @@ class probabilities only when a single class label per minibatch item is too res
 
     .. note::
         When ``target`` contains class probabilities, it should consist of soft labels—that is,
-        each `target` entry should represent a probability distribution over the possible classes for a given data sample,
-        with individual probabilities between `[0,1]` and the total distribution summing to 1.
+        each ``target`` entry should represent a probability distribution over the possible classes for a given data sample,
+        with individual probabilities between ``[0,1]`` and the total distribution summing to 1.
         This is why the :func:`softmax()` function is applied to the ``target`` in the class probabilities example above.
 
-        PyTorch does not validate whether the values provided in `target` lie in the range `[0,1]`
-        or whether the distribution of each data sample sums to `1`.
+        PyTorch does not validate whether the values provided in ``target`` lie in the range ``[0,1]``
+        or whether the distribution of each data sample sums to ``1``.
         No warning will be raised and it is the user's responsibility
-        to ensure that `target` contains valid probability distributions.
+        to ensure that ``target`` contains valid probability distributions.
         Providing arbitrary values may yield misleading loss values and unstable gradients during training.
 
     Examples:
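For reference, a minimal sketch (not part of the patch) of the behavior the note describes; the shapes and values below are illustrative assumptions::

    import torch
    import torch.nn as nn

    loss_fn = nn.CrossEntropyLoss()
    logits = torch.randn(3, 5)  # 3 samples, 5 classes

    # Valid soft labels: softmax makes each row a probability
    # distribution over the 5 classes that sums to 1.
    good_target = torch.randn(3, 5).softmax(dim=1)
    print(loss_fn(logits, good_target))  # meaningful loss

    # Invalid target: entries lie in [0, 1) but the rows do not sum
    # to 1. No error or warning is raised, and the resulting loss can
    # be misleading, exactly as the note warns.
    bad_target = torch.rand(3, 5)
    print(loss_fn(logits, bad_target))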