JSON reporting #131

Conversation
This just tests which names exist without checking anything else. It also doesn't use hypothesis, so it should work even if the library doesn't have proper hypothesis support (assuming the test suite can even run at all in that case, I'm not completely sure).
That way it can be run even on libraries that fail the hypothesis sanity checks.
```
@@ -34,6 +34,10 @@
    f for n, f in inspect.getmembers(array, predicate=inspect.isfunction)
    if n != "__init__"  # probably exists for Sphinx
]
array_attributes = [
```
How about array_properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in any case, if you could add this to __all__. Maybe this is just a weird side product of how I learnt Python, but I like using it to denote the internally-public API of a module.
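For readers unfamiliar with the convention, a tiny illustration (the names and values here are made up, not the actual module contents):

```python
# Minimal illustration of the __all__ convention: only the names listed in
# __all__ are considered the module's (internally) public API.
array_attributes = ["dtype", "ndim", "shape"]  # illustrative values only
_cache = {}  # private helper state, deliberately left out of __all__

__all__ = ["array_attributes"]
```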
pytest-json-report looks nifty!

@asmeurer nice! For review/discussion it would be useful if you added an example of what the JSON output looks like (maybe in a gist?) if you have one.
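For readers without the gist handy, a pytest-json-report file is roughly shaped like the following. This is a hand-written illustration based on that plugin's documented format, with made-up values, not output from an actual run:

```json
{
  "created": 1634000000.0,
  "duration": 42.5,
  "exitcode": 1,
  "environment": {"Python": "3.9.7", "Platform": "Linux"},
  "summary": {"passed": 10, "failed": 2, "collected": 12, "total": 12},
  "tests": [
    {
      "nodeid": "array_api_tests/test_has_names.py::test_has_names[abs]",
      "outcome": "passed",
      "metadata": {"note": "custom fields can be attached per test"}
    }
  ],
  "warnings": []
}
```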
…lizable

Otherwise it just shows up as a warning in the test results, which might easily be missed.
This is ready to be merged. I've extracted a lot of custom metadata from the tests into the report.

For hypothesis errors, for now it only extracts the error message (string). I could look into trying to extract the actual data itself and JSON-serializing it, but I think that will be harder, so for now I'm going to wait until it looks like we actually need that.

I have uploaded an example report for

If there is any other metadata you can think of that should be included, let me know.
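For reference, the mechanism pytest-json-report provides for this is its json_metadata fixture, a per-test dict whose contents end up in the report. A minimal sketch (the keys here are illustrative, not the ones this PR actually uses):

```python
# Sketch: attaching custom metadata via pytest-json-report's json_metadata fixture.
def test_example(json_metadata):
    json_metadata["array_module"] = "numpy.array_api"  # illustrative key/value
    json_metadata["hypothesis_error"] = None           # e.g. filled in on failure
    assert 1 + 1 == 2
```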
It was actually not enabling it at all ever.
The inputs could be things that aren't generically recognizable, e.g., for plain numpy, they could be numpy ufuncs.
Something to keep in mind when doing any further development on the test suite is to make sure that any inputs to test functions are JSON-serializable. If they aren't, the

Maybe using
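One defensive pattern for this, sketched under the assumption that the report only needs something human-readable (this is the general idea, not the PR's actual code), is to fall back to repr() for anything json can't handle:

```python
import json

def jsonable(value):
    """Return value unchanged if it is JSON-serializable, else fall back to repr().

    This keeps inputs like numpy ufuncs from breaking report generation.
    """
    try:
        json.dumps(value)
        return value
    except TypeError:
        return repr(value)
```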
This reverts commit f356ffd. It doesn't actually save that much space, so let's just use the default since it has more information and it keeps the code simpler.
OK, it turns out I was actually wrong here. The tracebacks do contribute some amount of data in general, but they're not actually the reason the plain numpy report is over half a gigabyte. That's apparently because of the warnings. There are over a million warnings emitted in the run, and they account for 98% of the JSON file size. Of those, 98% are the `np.bool` deprecation warning. Here's the full warning metadata that's repeated so many times:

```json
{
    "message": "`np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.\nDeprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations",
    "category": "DeprecationWarning",
    "when": "runtest",
    "filename": "/Users/aaronmeurer/anaconda3/envs/array-apis/lib/python3.9/site-packages/hypothesis/extra/array_api.py",
    "lineno": 119
}
```

@honno, any idea why this warning would be omitted every single time hypothesis calls

I did try switching from the default traceback type to native, but the savings are actually pretty small, so I'm going to revert it because the default type has more information, and because it keeps the code simpler.
(I assume you're testing the top-level

If you meant emitted, I'd guess it's to do with the pytest reporting steps. Hypothesis/pytest might "merge" warnings together as it does errors in the reporting stage, so pytest would only show one warning, but however we're capturing warnings here ignores that. Generally it makes sense that the warning is emitted many times, as
Note: the flaky CI failure is due to #125.
Like preferring array_attributes -> array_properties, personally I'd prefer test_has_names.py::test_has_names -> test_has_attributes.py::test_has_attributes.
conftest.py
```
@@ -7,8 +7,9 @@
from array_api_tests import _array_module as xp
from array_api_tests._array_module import _UndefinedStub

settings.register_profile("xp_default", deadline=800)
from reporting import pytest_metadata, add_extra_json_metadata  # noqa
```
Could you add a small comment explaining what's being imported here and why? I assume it's doing some magic when imported. Oh, import reporting is importing the file you've introduced. I'd lean towards just deleting reporting.py and putting those fixtures in conftest.py, but I see value in not bloating conftest.py.
Yes, the names have to be defined here so that pytest will see them as fixtures. I decided to keep the reporting in a separate file to keep things clean, since it's already a good bit of code and could potentially be more if we decide to add more stuff to the report in the future.
Your suggestion is to have two separate files for this? I don't really see the value in that, especially since the file is so short and uses the same logic for names and attributes.
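For anyone following along: pytest only picks up fixtures and hooks that are defined in (or imported into) conftest.py or a registered plugin, which is why the import is needed even though nothing in conftest.py calls those names directly. A sketch of the pattern, using the names from the diff above:

```python
# conftest.py (sketch)
# The reporting logic lives in reporting.py; importing its hook/fixtures here
# makes pytest discover them. The noqa silences the "unused import" warning.
from reporting import pytest_metadata, add_extra_json_metadata  # noqa: F401
```

An alternative would be registering reporting.py as a plugin via pytest_plugins = ["reporting"] in the top-level conftest.py, which avoids the unused-import look, though the effect is much the same.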
My point with the warnings is that by default, warnings are only emitted once per warning:

```python
>>> import numpy as np
>>> np.bool
<stdin>:1: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
<class 'bool'>
>>> np.bool
<class 'bool'>
```

But this exact same warning is emitted every single time. So somewhere something is changing the warnings filter to enable them always. I did see some
Ah I was being lazy heh, I meant rename
Ooo, this feels like NumPy is doing some magic which pytest/Hypothesis isn't equipped for? Probably requires some hacking around to fix.
It shouldn't do anything fancy; the one thing is
NumPy's pytest.ini is irrelevant here. It looks like the pytest defaults are to store each warning invocation separately. Since the warning metadata doesn't even contain enough information to distinguish the different warnings, I think the simplest solution here is to post-process the JSON to deduplicate the warnings.

(Probably a lot of the stuff I'm doing here could be upstreamed to the pytest-json-report library.)
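A minimal sketch of what that post-processing could look like, assuming the report keeps its warnings in a top-level "warnings" list with entries shaped like the metadata shown earlier (this is not the PR's actual implementation):

```python
import json
from collections import Counter

def dedupe_warnings(report_path):
    """Collapse repeated warnings in a pytest-json-report file into unique
    entries with an added "count" field."""
    with open(report_path) as fp:
        report = json.load(fp)

    counts = Counter(
        (w["message"], w["category"], w["filename"], w["lineno"])
        for w in report.get("warnings", [])
    )
    report["warnings"] = [
        {"message": msg, "category": cat, "filename": fname,
         "lineno": lineno, "count": n}
        for (msg, cat, fname, lineno), n in counts.items()
    ]

    with open(report_path, "w") as fp:
        json.dump(report, fp)
```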
This was causing the report file sizes to be huge, especially for plain numpy.
OK, I deduplicated the warnings output. The plain numpy report is now 11 MB and numpy.array_api is 6.1 MB (both uncompressed). I've uploaded examples of each to the gist. I'd like to hear feedback from Athan on whether I need to try to reduce this file size any further. Again, these are uncompressed sizes, and they compress well (both are under 0.5 MB after basic zip compression).
Ah my confusion here is, say the following:

```python
# Using ipython
>>> class foopy_factory:
...     @property
...     def bool(self):
...         warn("don't use this", DeprecationWarning)
...         return bool
...
>>> foopy = foopy_factory()
>>> foopy.bool
DeprecationWarning: don't use this
bool
>>> foopy.bool
DeprecationWarning: don't use this  # warning is emitted again
bool
...
>>> np.bool
DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`...
np.bool
>>> np.bool
DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`...  # warning is emitted again
np.bool
```

the warning is emitted for each access of

I see now that the emit-once behaviour is seen using the default:

```python
# Using python
>>> class foopy_factory:
...     @property
...     def bool(self):
...         warn("don't use this", DeprecationWarning)
...         return bool
...
>>> foopy = foopy_factory()
>>> foopy.bool
DeprecationWarning: don't use this
bool
>>> foopy.bool
bool  # no warning this time
```

Seems like Aaron's found a solution anywho, but might be worth noting that environment things can affect this behaviour... or I might be missing something.
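For what it's worth, the difference honno is seeing comes down to which warnings filter is active, and environments like IPython or pytest can install their own. A small self-contained demonstration using only the standard library:

```python
import warnings

def noisy():
    # The warning is issued from this line, so "default" dedupes repeat calls.
    warnings.warn("don't use this", DeprecationWarning)

warnings.resetwarnings()
warnings.simplefilter("default")  # show only the first occurrence per issuing location
noisy()  # warning printed
noisy()  # suppressed: same message from the same location

warnings.simplefilter("always")   # what some environments effectively use
noisy()  # warning printed
noisy()  # warning printed again
```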
Need to have this merged so I can start testing it.
@asmeurer Looking through the sample JSON reports, they look fine to me. No need to reduce further, IMO. The raw files can be parsed and transformed for subsequent presentation on, e.g., a website. So not too concerned regarding initial report file sizes, particularly as we can store them on disk in compressed format.
One thing we should consider, however, is versioning the JSON format, e.g., using JSON Schema. This would allow versioning our backend server APIs to handle transformations of different versions of the JSON report. In which case, adding a
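A minimal sketch of stamping a version into the report via pytest-json-report's pytest_json_modifyreport hook (the hook name comes from that plugin; the schema_version field name and value are just a suggestion):

```python
# conftest.py (sketch)
REPORT_SCHEMA_VERSION = "1.0.0"  # suggested field; bump on breaking format changes

def pytest_json_modifyreport(json_report):
    # json_report is the dict that pytest-json-report is about to write out.
    json_report["schema_version"] = REPORT_SCHEMA_VERSION
```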
JSON reporting will be done with the pytest-json-report package. This produces pretty usable JSON and makes it easy to extend it with custom metadata at test run time.
The idea will be to take the JSON generated by pytest-json-report and translate it into something more usable by a reporting tool. To start with, I will focus on test_has_names, which has been reintroduced here, to list which spec names are or are not defined in the given package.
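As a rough illustration of that translation step, a downstream tool could pull the test_has_names results out of the report and list the failures. This sketch assumes test_has_names is parametrized per name so each spec name appears in the nodeid, which may not match the suite's exact layout:

```python
import json

def missing_names(report_path):
    """Collect spec names whose test_has_names case failed, assuming nodeids like
    'array_api_tests/test_has_names.py::test_has_names[<name>]'."""
    with open(report_path) as fp:
        report = json.load(fp)

    missing = []
    for test in report.get("tests", []):
        nodeid = test.get("nodeid", "")
        if "test_has_names[" in nodeid and test.get("outcome") == "failed":
            missing.append(nodeid.split("[", 1)[1].rstrip("]"))
    return missing
```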