Add `label_smoothing` param in `nn.BCELoss` and `nn.BCEWithLogitsLoss` #150282
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150282
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit b4750fb with merge base 842cc77.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot rebase -b main
@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here.
Successfully rebased; force-pushed from 4ffccbd to f088eaa.
Hello @jbschlosser @mikaylagawarecki, please help review this PR, thanks!
Hello @jbschlosser @mikaylagawarecki, please help review the change when available, thanks!
Python side impl looks good to me, thanks for the contribution :)
I will leave it to torch.nn maintainer @mikaylagawarecki for the final stamp though!
Hi @zeshengzong, I'll leave a proper review on this before the end of the week. Thank you for your patience and your multiple contributions!
just nit comments, thanks!
torch/nn/functional.py (outdated):
```python
) -> Tensor:
    r"""Compute Binary Cross Entropy between the target and input probabilities.

    See :class:`~torch.nn.BCELoss` for details.
```
Hm, why delete the args here and below? The convention in this file seems to be to document the args even though there's the see :class: ... on all the ops
Hi, I previously saw a comment saying the docs of torch.nn.functional methods were intentionally left empty to avoid duplicating the class documentation, as in torch.nn.functional.adaptive_avg_pool1d, torch.nn.functional.adaptive_avg_pool2d, and torch.nn.functional.adaptive_avg_pool3d, but some methods do have param docs, like this one.
I think it would be better to make the torch.nn.functional docs consistent (either all have param docs, or all are left empty and point users to the class docs), so people don't mistake the missing param docs for an oversight. But I'm not sure which is the right way to fix them. WDYT? Thanks!
Oh, hmm, I'm not sure what you mean, as it seems the Args are still documented for adaptive_avg_pool1d:
pytorch/torch/nn/functional.py, lines 1357 to 1358 in d596624:
```python
    Args:
        output_size: the target output size (single integer)
```
I think it would be good not to delete these. If there is an inconsistent convention, we can resolve that in a separate PR.
test/test_nn.py (outdated):
```python
def test_bce_label_smoothing_errors(self):
    N, C = 3, 4
    inputs = torch.randn((N, C))
    target = torch.randn((N, C))
    for loss_fn in (nn.BCELoss, nn.BCEWithLogitsLoss):
        loss = loss_fn(label_smoothing=1.2)
        with self.assertRaisesRegex(AssertionError,
                                    r"label_smoothing must be between 0\.0"):
            loss(inputs, target)

def test_bce_label_smoothing(self):
    N, C = 3, 4
    inputs = torch.rand((N, C))
    target = torch.rand((N, C))
    label_smoothings = [0.05, 0.15]

    for loss_fn, label_smoothing in product([nn.BCELoss, nn.BCEWithLogitsLoss], label_smoothings):
        loss = loss_fn(label_smoothing=label_smoothing)
        output_with_smoothing = loss(inputs, target)
        target_with_smoothing = target * (1 - label_smoothing) + (1 - target) * label_smoothing
        loss = loss_fn()
        output_with_manual_smoothing = loss(inputs, target_with_smoothing)
        self.assertEqual(output_with_smoothing, output_with_manual_smoothing)
```
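For context, the smoothing these tests verify reduces to a small target transform plus a range check. Here is a minimal sketch of both (a hypothetical helper, not the PR's actual code; the name `smooth_binary_targets` is mine, and the assert message mirrors what test_bce_label_smoothing_errors expects):

```python
import torch

def smooth_binary_targets(target: torch.Tensor, label_smoothing: float) -> torch.Tensor:
    # Range check matching the message asserted in test_bce_label_smoothing_errors.
    assert 0.0 <= label_smoothing <= 1.0, (
        f"label_smoothing must be between 0.0 and 1.0, but got {label_smoothing}"
    )
    # Interpolate targets toward 0.5: hard 1s become 1 - eps, hard 0s become eps.
    return target * (1 - label_smoothing) + (1 - target) * label_smoothing
```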
would be nice to update the ModuleInputs instead :) (test/test_nn.py is somewhat legacy)
pytorch/torch/testing/_internal/common_modules.py, lines 1455 to 1542 in 2247aa6:
```python
def module_inputs_torch_nn_BCELoss(module_info, device, dtype, requires_grad, training, **kwargs):
    make_input = partial(make_tensor, device=device, dtype=dtype, requires_grad=requires_grad)
    make_target = partial(make_tensor, device=device, dtype=dtype, requires_grad=False)
    make_weight = partial(make_tensor, device=device, dtype=dtype, requires_grad=False)

    cases: list[tuple[str, dict]] = [
        ('', {}),
        ('reduction_sum', {'reduction': 'sum'}),
        ('reduction_mean', {'reduction': 'mean'}),
        ('reduction_none', {'reduction': 'none'}),
        ('weights', {'weight': make_weight((10,))}),
    ]

    def bce_loss_reference_fn(m, p, i, t, reduction='mean', weight=None):
        result = -(t * i.log() + (1 - t) * (1 - i).log())
        if weight is not None:
            result = result * weight
        if reduction == 'none':
            return result
        elif reduction == 'mean':
            return result.sum() / i.numel()
        else:
            return result.sum()

    module_inputs = []
    for desc, constructor_kwargs in cases:
        module_inputs.append(
            ModuleInput(constructor_input=FunctionInput(**constructor_kwargs),
                        forward_input=FunctionInput(make_input((15, 10), low=1e-2, high=1 - 1e-2),
                                                    make_target((15, 10)).gt(0).to(dtype)),
                        desc=desc,
                        reference_fn=partial(bce_loss_reference_fn, **constructor_kwargs))
        )

    scalar_weight = make_weight(())
    module_inputs.append(
        ModuleInput(constructor_input=FunctionInput(weight=scalar_weight),
                    forward_input=FunctionInput(make_input((), low=1e-2, high=1 - 1e-2),
                                                make_target(()).gt(0).to(dtype)),
                    desc='scalar_weight',
                    reference_fn=partial(bce_loss_reference_fn, weight=scalar_weight))
    )

    return module_inputs


def module_inputs_torch_nn_BCEWithLogitsLoss(module_info, device, dtype, requires_grad, training, **kwargs):
    make_input = partial(make_tensor, device=device, dtype=dtype, requires_grad=requires_grad)
    make_target = partial(make_tensor, device=device, dtype=dtype, requires_grad=False)
    make_weight = partial(make_tensor, device=device, dtype=dtype, requires_grad=False)

    cases: list[tuple[str, dict]] = [
        ('', {}),
        ('reduction_sum', {'reduction': 'sum'}),
        ('reduction_mean', {'reduction': 'mean'}),
        ('reduction_none', {'reduction': 'none'}),
        ('weights', {'weight': make_weight((10,))}),
        ('scalar_weights', {'weight': make_weight(())})
    ]

    def bce_withlogitsloss_reference_fn(m, p, i, t, reduction='mean', weight=None):
        # TODO: add pos_weight to the definition here and corresponding SampleInputs
        max_val = (-i).clamp(min=0)
        result = (1 - t).mul_(i).add_(max_val).add_((-max_val).exp_().add_((-i - max_val).exp_()).log_())
        if weight is not None:
            result = result * weight
        if reduction == 'none':
            return result
        elif reduction == 'mean':
            return result.sum() / i.numel()
        else:
            return result.sum()

    module_inputs = []
    for desc, constructor_kwargs in cases:
        module_inputs.append(
            ModuleInput(constructor_input=FunctionInput(**constructor_kwargs),
                        forward_input=FunctionInput(make_input((15, 10), low=1e-2, high=1 - 1e-2),
                                                    make_target((15, 10)).gt(0).to(dtype)),
                        desc=desc,
                        reference_fn=partial(bce_withlogitsloss_reference_fn, **constructor_kwargs))
        )

    return module_inputs
```
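A rough sketch of what adding label_smoothing coverage to these ModuleInputs could look like (the case name, the 0.15 value, and the extra keyword with its default are assumptions, not the PR's exact diff):

```python
# Hypothetical extra entry appended to `cases` in both functions:
#     ('label_smoothing', {'label_smoothing': 0.15}),
# with the reference function smoothing the target before the usual BCE formula:
def bce_loss_reference_fn(m, p, i, t, reduction='mean', weight=None, label_smoothing=0.0):
    # Fold the smoothing into the target, matching the transform in the tests above.
    t = t * (1 - label_smoothing) + (1 - t) * label_smoothing
    result = -(t * i.log() + (1 - t) * (1 - i).log())
    if weight is not None:
        result = result * weight
    if reduction == 'none':
        return result
    elif reduction == 'mean':
        return result.sum() / i.numel()
    else:
        return result.sum()
```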
Changed, thanks!
Force-pushed from 653b42e to d596624.
Looks good, please run lint again and revert the Args deletion for now :)
Changed, thanks!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #91545
Changes
Add `label_smoothing` param and docs in `nn.BCELoss` and `nn.BCEWithLogitsLoss`.
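A minimal usage sketch of the new parameter (values are illustrative; the constructor keyword follows the tests above):

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss(label_smoothing=0.1)
logits = torch.randn(8, 1)
target = torch.randint(0, 2, (8, 1), dtype=torch.float32)
loss = criterion(logits, target)  # targets are smoothed toward 0.5 internally
```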
Test Result