TST: Improve tests for numpy.pad #12789

lagru · 2019-01-17T23:42:37Z

This aims to improve the test suite for numpy.pad in anticipation of #11358. For example, it refactors the classes ValueError1, ValueError2, ..., adds some missing edge cases, and groups tests that cover the same aspect of pad. Commit e67ecc3 removes unit tests that were ineffective.

These tests were ineffective. The TypeError raised in these tests was not actually caused by pad_width receiving the wrong type but by the missing parameter mode. Added the missing type complex to the appropriate existing test that checks how pad_width's type is handled.
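The trap described here is easy to reproduce outside NumPy. In the sketch below, `pad_like` and `raises_type_error` are hypothetical helpers (not part of NumPy or pytest); `pad_like` mimics the old signature in which `mode` was a required positional argument. Without `mode`, every call raises a TypeError, so a check on the exception type alone passes no matter what pad_width is. Supplying `mode` and matching on the message makes the check exercise the intended code path.

```python
import re


def pad_like(array, pad_width, mode):
    """Hypothetical stand-in for the old np.pad, where `mode` was required."""
    if not isinstance(pad_width, int):
        raise TypeError("`pad_width` must be of integral type.")
    return array


def raises_type_error(func, *args, match=None):
    """Return True if func(*args) raises a TypeError whose message matches."""
    try:
        func(*args)
    except TypeError as exc:
        return match is None or re.search(match, str(exc)) is not None
    return False


# Ineffective: passes, but the TypeError comes from the missing `mode`.
ineffective = raises_type_error(pad_like, [1, 2, 3], 1j)

# Effective: `mode` is given, so only the complex pad_width can fail.
effective = raises_type_error(pad_like, [1, 2, 3], 1j, "constant",
                              match="integral type")

# Adding a match string exposes the ineffective variant.
exposed = raises_type_error(pad_like, [1, 2, 3], 1j, match="integral type")
```

pytest's `pytest.raises(..., match=...)` applies the same idea, since it searches the exception message with a regular expression.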

lagru · 2019-01-17T23:44:17Z

numpy/lib/tests/test_arraypad.py

@@ -12,6 +12,20 @@
 from numpy.lib.arraypad import _as_pairs


+_all_modes = {


At some point it made sense to introduce this global variable to avoid duplication.

lagru · 2019-01-17T23:48:07Z

numpy/lib/tests/test_arraypad.py

+        ((1, 2), (3, 4), (5, 6)),
+        ((3, 4, 5), (0, 1, 2)),
+    ])
+    @pytest.mark.parametrize("mode", _all_modes.keys())


Using parametrize instead of for loops makes the code more readable but really bloats the number of unit tests and thus the output of pytest. What's the general preference here?

I think we tend to use parametrize now. If it really does bloat the output too much, maybe we can also ask pytest to add some knobs to help.

Okay, good to know.
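As a rough illustration of the trade-off (the table `MODES` and the test name are hypothetical), stacking two `pytest.mark.parametrize` decorators makes pytest generate one test case per combination, here 2 pad widths times 3 modes, each reported separately. A for loop inside a single test would be reported as one item, but a failure would not show which combination broke.

```python
import numpy as np
import pytest

# Small hypothetical mode table (mode name -> extra keyword arguments).
MODES = {"constant": {"constant_values": 0}, "edge": {}, "wrap": {}}


# Stacked parametrize decorators produce the Cartesian product of their
# arguments: 2 pad widths x 3 modes = 6 separately reported test cases.
@pytest.mark.parametrize("pad_width", [
    ((1, 2), (3, 4)),
    ((0, 0), (2, 1)),
])
@pytest.mark.parametrize("mode", list(MODES))
def test_padded_shape(pad_width, mode):
    arr = np.arange(12).reshape(3, 4)
    result = np.pad(arr, pad_width, mode=mode, **MODES[mode])
    expected = tuple(size + before + after
                     for size, (before, after) in zip(arr.shape, pad_width))
    assert result.shape == expected
```

Passing `list(MODES)` gives pytest a sequence in a deterministic order; a later commit in #11358 ("TST: Keep test parametrization ordered") notes that this matters for pytest-xdist.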

lagru · 2019-01-17T23:49:52Z

Oh, and reviewing is probably easiest if you follow the commits one by one. Most are independent of one another.

seberg

Looks very thorough to me. Do you want to add more things, or should we merge this?

seberg · 2019-02-12T19:21:41Z


numpy/lib/tests/test_arraypad.py

lagru · 2019-02-12T19:47:56Z

@seberg Thanks for looking at this. It's been some time, but I remember that there were a few things I still wanted to look at. I'll do so in the next few days and try to finish this.


lagru · 2019-02-14T13:57:40Z

@seberg I think this is ready to go. Test coverage is at 99%. Not covered are:

numpy/numpy/lib/arraypad.py

Line 961 in dea8580

return (array,)

I'm not really sure what this function does or how to fix that.

numpy/numpy/lib/arraypad.py

Lines 1288 to 1292 in dea8580

    
if pad_before > 0 or pad_after > 0:
    raise ValueError("There aren't any elements to reflect"
                     " in axis {} of `array`".format(axis))
# Skip zero padding on empty axes.
continue

I think this is a really weird bug. Coverage.py reports that the if statement is never false and that execution therefore never jumps to the continue statement, which is plainly wrong. If I add a statement (e.g. _ = 1) in front of the continue statement, the bug disappears...

numpy/numpy/lib/arraypad.py

Line 1337 in dea8580

elif mode == 'wrap':

This condition is never false. It can be ignored, as it will be obsolete with the rewrite anyway.

Also, I couldn't figure out how to use runtests.py for this purpose. I didn't manage to restrict the selected tests with the -t option (the script complains with ERROR: file or package not found), nor to use the --coverage option (it prints runtests.py: error: unrecognized arguments: --cov-report=html:/**/numpy/build/coverage --cov=/**/numpy/numpy). Instead, I used coverage.py together with pytest.

mattip · 2019-02-14T14:06:37Z

-t specifies the path to the Python file, e.g. -t numpy/core/tests/test_multiarray.py

--cov needs pytest-cov

mattip · 2019-02-14T14:08:56Z

numpy/lib/tests/test_arraypad.py

+def test_missing_mode():
+    match = r"pad\(\) missing 1 required positional argument: 'mode'"
+    with pytest.raises(TypeError, match=match):
+        np.pad(np.ones((5, 6)), 4)


Tests are failing; it seems the message or exception is different?

That's actually what this test is meant to check. The problem seems to be that match doesn't match because the returned message starts with _pad_dispatcher() missing 1 required positional [...] instead of pad(). Should I simply change the match string, or is this actually a bug (e.g. _pad_dispatcher not wrapping correctly)? When I try this in my console, the returned error starts with pad().

Just start the match with "missing 1 ...". The _pad_dispatcher is another problem, but out of scope for this PR, pinging @shoyer to take a look

See #12028 (comment)
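Under NEP-18, a dispatcher such as `_pad_dispatcher` returns the arguments that may implement `__array_function__` and is called with the same arguments as the public function. The sketch below is a heavily simplified, hypothetical model of that (it is not NumPy's `array_function_dispatch`), assuming the dispatcher runs before the implementation. Under that assumption, a missing `mode` is reported against `_pad_dispatcher()`, and a match on "missing 1 required positional argument" works either way. NumPy 1.16 gated the protocol behind the `NUMPY_EXPERIMENTAL_ARRAY_FUNCTION` environment variable, so if CI enabled it and the local run did not, that would explain the differing messages.

```python
import functools
import re


def dispatch_sketch(dispatcher):
    """Hypothetical, simplified NEP-18-style decorator.

    The dispatcher is called with the caller's arguments *before* the
    implementation, so a missing argument is reported against it.
    """
    def decorator(implementation):
        @functools.wraps(implementation)
        def public_api(*args, **kwargs):
            relevant = dispatcher(*args, **kwargs)  # may raise TypeError
            for arg in relevant:
                override = getattr(type(arg), "__array_function__", None)
                if override is not None:
                    # Simplified: real NEP-18 also passes the set of types.
                    return override(arg, public_api, args, kwargs)
            return implementation(*args, **kwargs)
        return public_api
    return decorator


def _pad_dispatcher(array, pad_width, mode, **kwargs):
    # Only `array` could carry an override, so only it is returned.
    return (array,)


@dispatch_sketch(_pad_dispatcher)
def pad(array, pad_width, mode, **kwargs):
    # Toy implementation: constant zero padding of a flat list.
    return [0] * pad_width + list(array) + [0] * pad_width


def missing_mode_message():
    """Return the TypeError message produced when `mode` is omitted."""
    try:
        pad([1, 2, 3], 4)
    except TypeError as exc:
        return str(exc)


# Robust pattern: does not depend on which function name leads the message.
ROBUST_MATCH = r"missing 1 required positional argument: 'mode'"
```

Anchoring the pattern on `pad\(\)` fails against the dispatcher's message, while `ROBUST_MATCH` succeeds for both.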

lagru · 2019-02-14T14:15:58Z

@mattip Thanks! It seems I need to specify an absolute path for the -t option; for some reason, a relative path won't work.


seberg · 2019-02-14T19:19:22Z

Thanks @lagru! Nice work.

lagru · 2019-02-14T21:30:27Z

You're welcome. 🙂 I'll merge these changes into #11358 tomorrow and have a look at what is left to do.

* ENH: Add support for constant, edge, linear_ramp to new numpy.pad
  Passes unit tests:
  - TestConstant
  - TestEdge
  - TestZeroPadWidth
  - TestLegacyVectorFunction
  - TestNdarrayPadWidth
  - TestUnicodeInput
  - TestLinearRamp
* MAINT: Simplify diff / change order of functions
* MAINT: Revert to old handling of keyword-only arguments
* ENH: Add support for stat modes
* ENH: Add support for "reflect" mode
* MAINT: Remove _slice_column
* ENH: Add support for "symmetric" mode
* MAINT: Simplify mode "linear_ramp"
  Creating the linear ramp as an array with 1-sized dimensions except for the one given by `axis` allows implicit broadcasting to the needed shape. This seems to be even a little bit faster than doing this by hand and allows the simplification of the algorithm. Note: Profiling and optimization will be done again at a later stage.
* MAINT: Reorder arguments of a sum and fix typo
  Addresses feedback raised in PR.
* ENH: Add support for "wrap" mode
  This completes the first draft of the complete rewrite, meaning all unit tests should pass from this commit onwards.
* MAINT: Merge functions for "reflect" and "symmetric" mode
  The set functions were nearly the same, apart from some index offsets. Merging them reduces code duplication.
* TST: Add regression test for gh-11216
  The rewrite in past commits fixed this bug.
* BUG: Fix edge case for _set_wrap_both when pad_amt contains 0
  Also include a test to protect against regression.
* MAINT: Simplify and optimize pad modes
  Major changes & goals:
  - Don't deal with pad area in the front and back separately. This modularity isn't needed and makes handling of the right edge more awkward. All modes now deal with the left and right side at the same time.
  - Move the creation of the linear ramps fully to its own function, which behaves like a vectorized version of linspace.
  - Separate calculation and application of the pad area where possible. This means that _get_edges can be reused for _get_linear_ramps.
  - Combine _normalize_shape and _validate_lengths in a single function, which should handle common cases faster.
  - Add new mode "empty", which leaves the padded areas undefined.
  - Add documentation where it was missing.
* TST: Don't use np.empty in unit tests
* MAINT: Reorder workflow in numpy.pad and deal with empty dimensions
  Only modes "constant" and "empty" can extend dimensions of size 0. Deal with this edge case gracefully for all other modes: either fail or return an empty array with padded non-zero dimensions. Handle default values closer to their actual usage. Also validate keyword arguments that must be numbers.
* MAINT: Add small tweaks to control flow and documentation
* BUG: Ensure wrap mode works if right_pad is 0
* ENH: Use reduced region of interest for iterative padding
  When padding multiple dimensions iteratively, corner values are unnecessarily overwritten multiple times. This function reduces the working area for the first dimensions so that corners are excluded.
* MAINT: Restore original argument order in _slice_at_axis
* MAINT: Keep original error message of broadcast_to
* MAINT: Restore old behavior for non-number end_values.
* BENCH: Make the pad benchmark pagefault in setup
* ENH/TST: Preserve memory layout (order) of the input array and add appropriate unit test.
* STY: Revert cosmetic changes to reduce diff
* MAINT: Pin dtype to float64 for np.pad's benchmarks
* MAINT: Remove redundant code path in _view_roi
* MAINT/TST: Provide proper error message for unsupported modes and add appropriate unit test.
* STY: Keep docstrings consistent and fix typo.
* MAINT: Simplify logical workflow in pad
* MAINT: Remove dtype argument from _linear_ramp
  The responsibility of rounding (but without type conversion) is not really needed in _linear_ramp and only makes it a little bit harder to reason about.
* DOC: Add version tag to new argument "empty"
* MAINT: Default to C-order for padded arrays unless the input is F-contiguous.
* MAINT: Name slice of original area consistently for all arguments describing the same thing.
* STY: Reduce vertical space
* MAINT: Remove shape argument from _slice_at_axis
  Simplifies calls to this function and the function itself. Using `(...,)` instead should keep this unambiguous. This change is not compatible with Python 2.7, which doesn't support this syntax outside sequence slicing. If that is wanted, one could use `(Ellipsis,)` instead.
* TST: Test if end_values of linear_ramp are exact, which was not the case in the old implementation `_arange_ndarray`.
* DOC: Improve comments and wrap long line
* MAINT: Refactor index_pair to width_pair
  Calling the right value an index is just plain wrong, as it can't be used as such.
* MAINT: Make _linear_ramp compatible with size=0
* MAINT: Don't rely on negative indices for slicing
  Calculating the proper positive index of the start of the right pad area makes it possible to omit the extra code paths for a width of 0. This should make the code easier to reason about.
* MAINT: Skip calculation of right_stat if identical
  If the input area for both sides is the same, we don't need to calculate it twice.
* TST: Adapt tests from gh-12789 to rewrite of pad
* TST: Add tests for mode "empty"
* TST: Test dtype persistence for all modes
* TST: Test exception for unsupported modes
* TST: Test repeated wrapping for each side individually.
  Reaches some only partially covered if-statements in _set_wrap_both.
* TST: Test padding of empty dimension with constant
* TST: Test if end_values of linear_ramp are exact, which was not the case in the old implementation `_arange_ndarray`. (Was accidentally overwritten during the last merge.)
* TST: Test persistence of memory layout
  Adapted from an older commit 3ac4d2a, which was accidentally overwritten during the last merge.
* MAINT: Simplify branching in _set_reflect_both
  Reduce branching and try to make the calculation of the various indices easier to understand.
* TST: Parametrize TestConditionalShortcuts class
* TST: Test empty dimension padding for all modes
* TST: Keep test parametrization ordered
  Keep parametrization ordered; otherwise pytest-xdist might believe that different tests were collected during parallelization, causing test failures.
* DOC: Describe performance improvement of np.pad as well as the new mode "empty" in release notes (see gh-11358).
* DOC: Remove outdated / misleading notes
  These notes are badly worded or actually misleading. For a better explanation of how these functions work, have a look at the context and comments just above the lines calling these functions.
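The "Simplify mode linear_ramp" and "Remove shape argument from _slice_at_axis" items describe two small techniques. The sketch below illustrates both under stated assumptions: `slice_at_axis` and `pad_left_linear_ramp` are hypothetical helpers written for this example, not NumPy's private functions, and only the left pad area is handled.

```python
import numpy as np


def slice_at_axis(sl, axis):
    """Index tuple that applies `sl` along `axis` and takes everything else.

    Hypothetical helper modeled on the commit's description of
    `_slice_at_axis`; the trailing `(...,)` avoids needing the array's shape.
    """
    return (slice(None),) * axis + (sl,) + (...,)


def pad_left_linear_ramp(padded, width, axis, end_value):
    """Fill the first `width` entries along `axis` in place with a linear
    ramp from `end_value` towards the first original value (sketch only)."""
    edge = padded[slice_at_axis(slice(width, width + 1), axis)]
    # The ramp has size `width` along `axis` and size 1 elsewhere, so it
    # broadcasts against the 1-sized `edge` slab without explicit tiling.
    shape = [1] * padded.ndim
    shape[axis] = width
    fractions = (np.arange(width) / width).reshape(shape)
    padded[slice_at_axis(slice(0, width), axis)] = (
        end_value + (edge - end_value) * fractions)
    return padded
```

For example, with `original = np.array([[6.0, 7.0], [9.0, 1.0]])` copied into the last two columns of `np.zeros((2, 5))`, calling `pad_left_linear_ramp(padded, 3, 1, 0.0)` fills the first three columns with `[[0, 2, 4], [0, 3, 6]]`. The ramp excludes the edge value itself.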


lagru added 7 commits January 17, 2019 16:21

TST: Move test for negative stat_length

fa29e36

and extend coverage to all modes and more variations of a negative stat_length.

TST: Merge tests for pad_width in single class

b26c0c8

TST: Test behavior of pad's kwargs for all modes

60f47bb

TST: Move test to TestReflect

55d91e8

Can be grouped with already existing test class checking the behavior for the reflect mode.

TST: Simplify regression test for object input

f124d76

TST: Move testing pad_width as ndarray

3f602fc

Can be grouped in class TestPadWidth as this test checks if an ndarray is accepted as the value to pad_width
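Two of the behaviors these commits test can be checked directly against the public API. The helpers below are hypothetical sketches, not the tests from the PR. The quoted ValueError message is an assumption about recent NumPy versions and may vary, so the helper checks only the exception type. Because `mode` and a valid `pad_width` are supplied, the error can reasonably be attributed to stat_length.

```python
import numpy as np


def pad_width_as_ndarray_matches_tuple():
    """An ndarray pad_width should behave like the equivalent nested tuple."""
    arr = np.arange(12).reshape(3, 4)
    widths = ((1, 2), (0, 3))
    from_tuple = np.pad(arr, widths, mode="edge")
    from_array = np.pad(arr, np.array(widths), mode="edge")
    return np.array_equal(from_tuple, from_array)


def negative_stat_length_rejected(mode, stat_length):
    """Return True if np.pad raises ValueError for the given stat_length."""
    arr = np.arange(30).reshape(6, 5)
    try:
        np.pad(arr, 2, mode=mode, stat_length=stat_length)
    except ValueError:
        # Assumed message in recent NumPy: "index can't contain negative values"
        return True
    return False
```

A positive control such as `negative_stat_length_rejected("mean", 3)` returning False guards against the check passing for an unrelated reason.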

lagru commented Jan 17, 2019


charris added 05 - Testing component: numpy.lib 25 - WIP labels Feb 4, 2019

seberg approved these changes Feb 12, 2019


lagru added 6 commits February 14, 2019 14:01

TST: Move test for pad_width of zero

fcea96c

TST: Move test for simple stat_length

491d76a

TST: Simplify classes with only one test

7c07506

TST: Add naive test for non-contiguous arrays

2fc12ca

MAINT: Don't import pad directly

bbd4ab5

Using np.pad instead of directly importing the function seems to be more inline with other test modules.

STY: Make class layout consistent in module

bcd558b
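A naive check of this kind might compare padding a strided (non-contiguous) view with padding a contiguous copy of the same values. The function below is a hypothetical sketch, not the test added in 2fc12ca.

```python
import numpy as np


def padding_ignores_memory_layout(mode="edge"):
    """Pad a non-contiguous view and a contiguous copy; results should match."""
    base = np.arange(40).reshape(5, 8)
    view = base[::2, 1::3]  # strided view: neither C- nor F-contiguous
    assert not view.flags.c_contiguous and not view.flags.f_contiguous
    copy = np.ascontiguousarray(view)
    return np.array_equal(np.pad(view, 2, mode=mode),
                          np.pad(copy, 2, mode=mode))
```

Running it for several modes catches implementations that accidentally assume contiguous input.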

lagru changed the title from "WIP: TST: Improve tests for numpy.pad" to "TST: Improve tests for numpy.pad" Feb 14, 2019

mattip reviewed Feb 14, 2019


TST: Fix match-string for missing pad mode error

5ac67e7

The CI fails because the error message refers to _pad_dispatcher() instead of pad(). For some reason, this test passes when run locally.

seberg removed the 25 - WIP label Feb 14, 2019

seberg merged commit d5ccaee into numpy:master Feb 14, 2019

lagru deleted the test-old-pad branch February 14, 2019 20:55

lagru added a commit to lagru/numpy that referenced this pull request Feb 15, 2019

TST: Adapt tests from numpygh-12789 to rewrite of pad

b3b22a6

shoyer mentioned this pull request Feb 15, 2019

Tracking issue for implementation of NEP-18 (__array_function__) #12028

Closed

33 tasks
