Skip to content

Fix memory pressure in DefaultMemoryManager #2801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 24, 2020

Conversation

padentomasello
Copy link
Contributor

Summary: This fixes a bug that caused memory to not be garbage collected until we ran out of memory in v3.7.0.

When we switched from MemoryManagerImpl to DefaultMemoryManager, we switched the memory pressure check to check for lock_buffers instead of total_buffers which I'm assuming was unintentional:

https://github.com/arrayfire/arrayfire/blob/v3.6.4/src/backend/common/MemoryManagerImpl.hpp#L379

https://github.com/arrayfire/arrayfire/blob/v3.7.0/src/backend/common/DefaultMemoryManager.cpp#L132

Also, we now check against getMemoryPressureThreshold() which default value is 1.0. Since we were returning 1.0 from getMemoryPressure(), it would never trigger.
(https://github.com/padentomasello/arrayfire/blob/v3.7.0/src/backend/common/DefaultMemoryManager.cpp#L165)

I changed it to return to ratio now instead. This should have the intended behavior, plus allow users to adjust the memory_threshold for meaningful changes.

@jacobkahn
Copy link
Contributor

jacobkahn commented Mar 21, 2020

So after talking to @padentomasello offline, we've discovered the issue is a bit simpler. There are two distinct issues that affect the default memory manager:

  1. A JIT evaluation/memory pressure bug that was introduced when moving JIT heuristics into the memory manager. The new memory pressure measure returns 1.0 if the default JIT eval heuristic is met, and 0.0 otherwise. In the CPU backend, the memoryPressureThreshold (returned by getMemoryPressureThreshold()) defaults to 1.0, and the comparison checks if getMemoryPressure() >= getMemoryPressureThreshold(). In the OpenCL and CUDA backends, however, this check is only getMemoryPressure() > getMemoryPressureThreshold() (to be clear, this is >, not >=), so the JIT eval condition will never be met even if getMemoryPressure(), which means CUDA and OpenCL backends won't call kernel launches, leading to OOMs and odd behavior.
  2. The behavior of garbage collection in the default memory manager has changed slightly in 3.7.0. The check for garbage collection in 3.6.4 during allocations was based on, among other things, a comparison between total_buffers and max_buffers. In 3.7.0, this is implicitly calling getMemoryPressure(), which is comparing lock_buffers and max_buffers.

tl;dr — we don't need the ratio idea presented in this PR and can use the existing getMemoryPressure() implementation, but we have to make sure that JIT eval checks are always checking that getMemoryPressure() >= getMemoryPressureThreshold(), and also that garbage collection checks at alloc time in the memory manager are consistent with 3.6.4 behavior.

cc @umar456 @9prady9

Copy link
Member

@umar456 umar456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent catch!

As @jacobkahn mentioned in his response I think the correct fix would be to revert to total_buffers and fix the condition where ever the getMemoryPressure function is used. I want to avoid changing behavior in bug fixes and patch releases especially if we cannot measure the performance implications of such a change.

I noticed that these comparisons will need to be modified:
https://github.com/arrayfire/arrayfire/blob/master/src/backend/cpu/queue.hpp#L72
https://github.com/arrayfire/arrayfire/blob/master/src/backend/cuda/Array.cpp#L251
https://github.com/arrayfire/arrayfire/blob/master/src/backend/opencl/Array.cpp#L299

@jacobkahn
Copy link
Contributor

This gist should cover everything that needs changing for now.

@padentomasello
Copy link
Contributor Author

@jacobkahn @umar456 Thank you for the feedback. I did not initially realize that getMemoryPressure was used outside of the memory manager. I have updated the PR to match @jacobkahn's gist.

Copy link
Member

@umar456 umar456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks perfect. Thanks!

@umar456 umar456 merged commit 10a82db into arrayfire:master Mar 24, 2020
umar456 pushed a commit to umar456/arrayfire that referenced this pull request Mar 26, 2020
* Fix getMemoryPressure comparison, and revert DefaultMemoryManager GC comparison.

Co-authored-by: Paden Tomasello <padentomasello@devfair049.maas>
umar456 pushed a commit that referenced this pull request Mar 27, 2020
* Fix getMemoryPressure comparison, and revert DefaultMemoryManager GC comparison.

Co-authored-by: Paden Tomasello <padentomasello@devfair049.maas>
@9prady9 9prady9 added this to the 3.7.1 milestone Apr 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants