Skip to content

[MRG + 2] FIX avoid memory cost when sampling from large parameter grids #4840

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 24, 2015

Conversation

jnothman
Copy link
Member

@jnothman jnothman commented Jun 9, 2015

Solve the issue reported at #3850 (comment) by making ParameterGrid able to dynamically look up a param setting by index; i.e. list of all possible settings is never realised.

Needs:

  • test
  • docstring
  • clearer code

@jnothman jnothman changed the title [WIP] FIX avoid memory cost when sampling from large parameter grids [MRG] FIX avoid memory cost when sampling from large parameter grids Jun 9, 2015
@jnothman
Copy link
Member Author

jnothman commented Jun 9, 2015

Now MRG. Open for review.

# Try the next grid
ind -= modulo[0]
else:
offsets = ind // modulo[1:] % sizes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if there is a helpful comment that can be done on this line. took me a minute but alright.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well yes, I guess the code is unnecessarily obfuscating seeing as the numbers are small so efficiency is little concern. I'd initially thought I'd memoize the modulo arrays, but that seems overkill. I'll rewrite more legibly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give this another quick look to see if it needs further clarification.

@amueller
Copy link
Member

amueller commented Jun 9, 2015

Looks good apart from the fact that I'm not 100% sure the tests is non-vacuous.

@amueller
Copy link
Member

amueller commented Jun 9, 2015

thanks for the fix :)

@amueller
Copy link
Member

amueller commented Jun 9, 2015

lgtm.

@amueller amueller changed the title [MRG] FIX avoid memory cost when sampling from large parameter grids [MRG + 1] FIX avoid memory cost when sampling from large parameter grids Jun 9, 2015
@jnothman
Copy link
Member Author

jnothman commented Jun 9, 2015

great, thanks!

On 10 June 2015 at 07:10, Andreas Mueller notifications@github.com wrote:

lgtm.


Reply to this email directly or view it on GitHub
#4840 (comment)
.

@chausler
Copy link

@jnothman super cool 👍 Thanks for the quick fix!, nice and elegant

@jnothman
Copy link
Member Author

Another review?


Returns
-------
params : dict of string to any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same description is used above in object constructor. It means "a dict mapping strings to arbitrary-type values".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok my bad

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not bad at all!

@TomDLT
Copy link
Member

TomDLT commented Jun 23, 2015

looks good
the code without memoization is much clearer

for sub_grid in self.param_grid:
# XXX: could memoize information used here
if not sub_grid:
if ind == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this branch covered by the test? It looks to me like the test doesn't cover subgrids.

Nevermind, somehow I missed that line. But I still would like a test where a non-empty subgrid follows an empty subgrid, for the else below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@amueller
Copy link
Member

@vene mrg?

@vene
Copy link
Member

vene commented Jun 24, 2015

Looks good to me 👍

@vene vene changed the title [MRG + 1] FIX avoid memory cost when sampling from large parameter grids [MRG + 2] FIX avoid memory cost when sampling from large parameter grids Jun 24, 2015
jnothman added a commit that referenced this pull request Jun 24, 2015
[MRG] FIX avoid memory cost when sampling from large parameter grids
@jnothman jnothman merged commit 7488fc0 into scikit-learn:master Jun 24, 2015
@jnothman
Copy link
Member Author

Thanks, @chausler for reporting, and to the reviewers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants