
Add a new memleak script that does everything #5360


Merged (5 commits) on Nov 5, 2015


mdboom (Member) commented Oct 30, 2015

This replaces our 4 memleak scripts with one that is able to test any
backend, with or without plot content, and with or without interactive
mode.

The calculation of average increase per iteration has been fixed.
Before, it assumed that memory usage increased monotonically, when in
fact it fluctuates quite a bit. Therefore, it now calculates the
difference between each pair of consecutive results and averages that.

Also, the results are stored in pre-allocated NumPy arrays rather than
Python lists to avoid including the increasing size of the Python lists
in the results.
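
The rationale for pre-allocating can be illustrated with a small, dependency-free sketch. It is not taken from the script: a fixed-length Python list stands in for the pre-allocated NumPy array, and the function names are hypothetical. A container that grows by `append` changes its own footprint during the run, while a container sized up front does not.

```python
import sys


def appended_size_history(n):
    """Record the list's own size (bytes) after each of n appends."""
    results = []
    sizes = []
    for i in range(n):
        results.append(i)  # the container grows while we measure
        sizes.append(sys.getsizeof(results))
    return sizes


def preallocated_size_history(n):
    """Record the container's size (bytes) when it is sized up front."""
    results = [0] * n  # stand-in for a pre-allocated NumPy array
    sizes = []
    for i in range(n):
        results[i] = i  # in-place store, no reallocation of the container
        sizes.append(sys.getsizeof(results))
    return sizes


grown = appended_size_history(1000)
fixed = preallocated_size_history(1000)
print("appended list grew by", grown[-1] - grown[0], "bytes")
print("pre-allocated container grew by", fixed[-1] - fixed[0], "bytes")
```

In a leak test, the growth in the first case would show up in the readings as if the code under test had allocated it.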

mdboom (Member Author) commented Oct 30, 2015

Not sure how to milestone this. Since it's just a dev utility it can probably go just about anywhere.

Also, I should add that since this uses the new tracemalloc module (added in Python 3.4), it is Python 3 only.
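
For readers unfamiliar with `tracemalloc`, here is a minimal sketch of how per-iteration readings could be collected with it. The helper `traced_bytes_per_iteration` and the deliberately leaky function are hypothetical illustrations, not code from this pull request. `tracemalloc.get_traced_memory()` returns a `(current, peak)` tuple in bytes.

```python
import tracemalloc


def traced_bytes_per_iteration(func, iterations):
    """Call func repeatedly and record traced Python memory after each call."""
    readings = []
    tracemalloc.start()
    try:
        for _ in range(iterations):
            func()
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
    finally:
        tracemalloc.stop()
    return readings


_leaked = []


def leaky():
    """Hypothetical leak: keep a reference to 10 kB on every call."""
    _leaked.append(bytearray(10_000))


readings = traced_bytes_per_iteration(leaky, 20)
print("growth over the run:", readings[-1] - readings[0], "bytes")
```

Only allocations made through Python's allocators are traced, so memory allocated directly by C libraries would not appear in these numbers.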

garbage_arr[i] = garbage

print('Average memory consumed per loop: %1.4f bytes\n' %
      (np.sum(rss_arr[starti+1:] - rss_arr[starti:-1]) / float(endi - starti)))
Member (review comment on the lines above):

This looks like just the sum of the differences, which is the end value minus the start value.

((rss_arr[-1] - rss_arr[starti]) / float(endi - starti))
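
The reviewer's point is that the sum telescopes: every intermediate reading cancels, leaving only the last value minus the first. A small sketch with made-up readings (plain lists instead of NumPy arrays, hypothetical function names) shows the identity, and also how strongly the result depends on where the window starts:

```python
def sum_of_pairwise_differences(values):
    """Sum of values[i+1] - values[i] over all consecutive pairs."""
    return sum(b - a for a, b in zip(values, values[1:]))


def average_increase(values, start):
    """Average per-step increase from index `start` to the last reading."""
    return (values[-1] - values[start]) / (len(values) - 1 - start)


# Made-up, spiky readings for illustration only.
readings = [150, 180, 140, 200, 170, 210]

# The intermediate terms cancel, so both expressions agree.
assert sum_of_pairwise_differences(readings) == readings[-1] - readings[0]

print(average_increase(readings, 2))  # window starts in a valley (140)
print(average_increase(readings, 3))  # window starts on a spike (200)
```

With these numbers, starting in the valley gives roughly 23.3 per step, while starting one sample later on the spike gives 5.0.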

mdboom (Member Author) replied:

I suppose that's true. We need a different mechanism, then -- something that will take into account the spikiness of the data. If you select the start and end points incorrectly here you get wildly different results.

[Image: report]

efiring (Member) commented Oct 30, 2015

Mike, in the top subplot it looks like pymalloc is showing small fluctuations near 13M, and rss is showing small fluctuations around 150k--maybe rss units here are 512b or 1k blocks.

mdboom (Member Author) commented Oct 30, 2015

Yes -- rss is in different units. I just haven't implemented a conversion, because the units depend on the platform, the platform version, etc. In any case, they should be on different scales, because pymalloc (which includes only allocations from the Python interpreter itself) will always be significantly smaller than rss. The important thing is not their relative sizes but the first derivative anyway.

@QuLogic QuLogic mentioned this pull request Oct 31, 2015
mdboom (Member Author) commented Nov 2, 2015

I've updated this so that the average memory increase is calculated based on the peak memory usage rather than an instantaneous reading. This should get around the problem where the result seems artificially high or low if it happens to pick a valley as the start or end point.
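
One way to read "based on the peak memory usage" is to replace each reading with the highest value seen so far and measure growth on that running peak. The sketch below makes that assumption explicit. It is an illustration with made-up readings and hypothetical function names, and is not necessarily how the merged script computes the figure.

```python
from itertools import accumulate


def running_peak(values):
    """Replace each reading with the maximum seen up to that point."""
    return list(accumulate(values, max))


def average_peak_increase(values, start):
    """Average per-step growth of the running peak from `start` to the end.

    Assumes start < len(values) - 1.
    """
    peaks = running_peak(values)
    return (peaks[-1] - peaks[start]) / (len(values) - 1 - start)


# Made-up, spiky readings for illustration only.
readings = [150, 180, 140, 200, 170, 210]

print(running_peak(readings))
print(average_peak_increase(readings, 2))  # start index falls in a valley
print(average_peak_increase(readings, 3))  # start index falls on a spike
```

Because the running peak never decreases, a valley at the start index no longer drags the baseline down. The two start points here give 10.0 and 5.0, compared with roughly 23.3 and 5.0 when the raw readings are used.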

This has also been updated to use the psutil package rather than our home-grown report_memory function. This has a couple of advantages: The units are all in bytes, regardless of the version of the OS being used (the definition of rss has changed over time). It also allows us to track the number of open file handles, as file handle leaking is also something we have trouble with from time to time.
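
As a rough sketch of the kind of reading psutil makes available (the helper `read_process_stats` is hypothetical and not part of this pull request): `memory_info().rss` is reported in bytes, `num_fds()` exists on Unix-like systems, and `num_handles()` exists on Windows. The import is guarded so the snippet still runs where psutil is not installed.

```python
try:
    import psutil  # third-party package; assumed to be installed
except ImportError:
    psutil = None


def read_process_stats():
    """Return RSS (bytes) and open handle count for the current process.

    Returns None when psutil is not available.
    """
    if psutil is None:
        return None
    proc = psutil.Process()
    stats = {"rss": proc.memory_info().rss}
    if hasattr(proc, "num_fds"):  # Unix-like platforms
        stats["open_handles"] = proc.num_fds()
    elif hasattr(proc, "num_handles"):  # Windows
        stats["open_handles"] = proc.num_handles()
    return stats


print(read_process_stats())
```

Sampling this once per loop iteration would give both a memory series and a file-handle series to compare across the run.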

@mdboom mdboom force-pushed the new-memleak-script branch from 6f932a1 to b59627b Compare November 2, 2015 16:16
tacaswell added a commit that referenced this pull request Nov 5, 2015
TST: Add a new memleak script that does everything
@tacaswell tacaswell merged commit d063dee into matplotlib:master Nov 5, 2015
tacaswell (Member) commented:

I don't think we have to back-port this to any other branch unless we want to go memory hunting on them.

@QuLogic QuLogic added this to the proposed next point release (2.1) milestone Nov 5, 2015
@mdboom mdboom deleted the new-memleak-script branch November 10, 2015 02:46

4 participants