
Update upstream opinfo to generate appropriately scaled sample inputs #158018

Open: wants to merge 5 commits into main

Conversation

matthewhagraphcore (Collaborator)

Currently, opinfo generates random inputs for _scaled_mm but does not enforce type saturation, unlike the upstream test implementation, which explicitly saturates both fp8 data types.

Problem:
The current random input generation in sample_inputs_scaled_mm may produce values outside the representable range of the fp8 input types, missing the overflow/saturation edge cases that the CUDA tests intentionally cover.
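For context, the fp8 formats accepted by _scaled_mm have narrow representable ranges. A small sketch of their maximum finite values, derived from the bit layouts (these equal torch.finfo(dtype).max in PyTorch):

```python
# Maximum finite values of the fp8 types used by _scaled_mm, from the bit layouts:
#   float8_e4m3fn: 4 exponent bits (bias 7), 3 mantissa bits, no inf encoding
#                  -> max finite = 1.75 * 2**8  = 448.0
#   float8_e5m2:   5 exponent bits (bias 15), 2 mantissa bits (IEEE-style)
#                  -> max finite = 1.75 * 2**15 = 57344.0
E4M3_MAX = 1.75 * 2 ** 8
E5M2_MAX = 1.75 * 2 ** 15
print(E4M3_MAX, E5M2_MAX)  # -> 448.0 57344.0
```

Any randomly generated value above these bounds overflows on conversion unless it is saturated first.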

Solution:
Modify the sample_inputs_scaled_mm implementation to:

  1. Apply the same input saturation logic used in the CUDA _scaled_mm tests:

     def test_scaled_mm_vs_emulated(self, base_dtype):
         torch.manual_seed(42)
         input_dtype = e4m3_type
         output_dtype = base_dtype
         compare_type = torch.float32

         x = torch.randn(16, 16, device="cuda", dtype=base_dtype)
         y = torch.randn(32, 16, device="cuda", dtype=base_dtype).t()

         x_scale = tensor_to_scale(x, input_dtype).float()
         y_scale = tensor_to_scale(y, input_dtype).float()

         x_fp8 = to_fp8_saturated(x * x_scale, input_dtype)
         y_fp8 = to_fp8_saturated(y * y_scale, input_dtype)

         # Calculate actual F8 mm
         out_scaled_mm = mm_float8(
             x_fp8,
             y_fp8,
             a_scale=x_scale,
             b_scale=y_scale,
             output_dtype=output_dtype,
         )

         # Calculate emulated F8 mm
         out_emulated = mm_float8_emulated(
             x_fp8,
             x_scale,
             y_fp8,
             y_scale,
             output_dtype,
         )

         if output_dtype != base_dtype:
             out_scaled_mm = out_scaled_mm.to(compare_type)
             out_scaled_mm = out_scaled_mm / tensor_to_scale(out_scaled_mm, input_dtype)

             out_emulated = out_emulated.to(compare_type)
             out_emulated = out_emulated / tensor_to_scale(out_emulated, input_dtype)

         if base_dtype in {torch.bfloat16, torch.float16}:
             atol, rtol = 7e-2, 7e-2
         else:
             atol, rtol = 3e-3, 3e-3

         torch.testing.assert_close(out_scaled_mm, out_emulated, atol=atol, rtol=rtol)
  2. Maintain the existing random generation approach but clamp values to type-appropriate ranges.
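A minimal, framework-free sketch of the clamping described in step 2 (the helper name and list-based signature are illustrative only; the actual opinfo change operates on tensors):

```python
E4M3_MAX = 448.0  # max finite value of float8_e4m3fn (torch.finfo(torch.float8_e4m3fn).max)

def saturate(values, fp8_max=E4M3_MAX):
    """Clamp each value into [-fp8_max, fp8_max] so a subsequent fp8
    conversion saturates instead of overflowing to inf/NaN. Mirrors the
    intent of to_fp8_saturated in the CUDA _scaled_mm tests."""
    return [max(-fp8_max, min(v, fp8_max)) for v in values]

print(saturate([1000.0, -600.0, 3.5]))  # -> [448.0, -448.0, 3.5]
```

Values already inside the range pass through unchanged, so the existing random generation is preserved except at the extremes.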

This required a bit of alteration to adapt to the late realization of the inputs. I think I have done this correctly, but am open to suggestions.

@pytorch-bot pytorch-bot bot added the release notes: python_frontend python frontend release notes category label Jul 10, 2025

pytorch-bot bot commented Jul 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158018

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit a2174ed with merge base 178515d:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mikaylagawarecki mikaylagawarecki requested a review from drisspg July 14, 2025 15:02
@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jul 14, 2025
@drisspg
Contributor

drisspg commented Jul 15, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased PYT-996-update-upstream-opinfo-to-generate-appropriately-scaled-sample-inputs onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout PYT-996-update-upstream-opinfo-to-generate-appropriately-scaled-sample-inputs && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the PYT-996-update-upstream-opinfo-to-generate-appropriately-scaled-sample-inputs branch from 5c2a209 to 5192752 Compare July 15, 2025 23:36
@matthewhagraphcore
Collaborator Author

REGRESSION: benchmark ('basic_modules_ListOfLinears_inductor_gpu_force_shape_pad', 'compile_time_instruction_count') failed; actual result 17947513212 is 1.80% higher than expected 17630000000 ±1.50%. If this is an expected regression, please update the expected results.

This seems unrelated? I've made no changes to inductor.
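For what it's worth, the reported percentage does check out against the raw instruction counts, so the failure is a genuine threshold breach rather than a reporting glitch:

```python
# Sanity-check the benchmark bot's arithmetic using the numbers it reported.
expected = 17_630_000_000
actual = 17_947_513_212
overshoot = (actual - expected) / expected * 100
print(f"{overshoot:.2f}% over expected (allowed noise band: ±1.50%)")  # -> 1.80%
```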

@matthewhagraphcore
Collaborator Author

@drisspg Could I get a re-review?

Labels: open source · release notes: python_frontend (python frontend release notes category) · topic: not user facing (topic category) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
5 participants