Skip to content

Map values according to a dict #257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

eriknw
Copy link
Member

@eriknw eriknw commented Jun 4, 2022

Closes #246.

A.apply({1: 10, 2: 20}, 999)

Heh, I know this isn't high priority (at all!), but it is oh-so-fun and was quick to do. I think the function we jit with numba is pretty competitive.

I doubt this handles UDTs, and we could probably be a bit smarter with dtypes in general.

@eriknw eriknw requested a review from jim22k June 4, 2022 04:10
@coveralls
Copy link

coveralls commented Jun 4, 2022

Coverage Status

Coverage: 99.449% (-0.02%) from 99.47% when pulling b703d61 on eriknw:apply_dict into 9a4808a on python-graphblas:main.

@jim22k
Copy link
Member

jim22k commented Jun 4, 2022

I'm currently -1 on this. The mental model I see here is that we can take a dict and construct a Matrix from it, then use that to update another Matrix. In that case, it would be assign, not apply.

Looking at your example was not at all obvious what was going on. And that is the worry I have with each new addition. They seem simple, but unless someone knows all the corners and shortcuts, it makes it harder to read someone else's code.

I'm open to being convinced, but these are my initial thoughts on the proposal.

@eriknw
Copy link
Member Author

eriknw commented Jun 4, 2022

Round(s) of convincing (and likely improvements) is expected and appreciated.

The mental model I see here is that we can take a dict and construct a Matrix from it, then use that to update another Matrix. In that case, it would be assign, not apply.

I have no idea what you mean by this. The equivalent recipe is:

C = Matrix(A.dtype, A.nrows, A.ncols)
C(A.S) << default
for k, v in d.items()
    C(A == k) << v

This is just like df.map(d) in pandas. map would probably be a better method name for us too--it reads better, and apply is already pretty loaded. Having a separate method also make future enhancements easier (no idea what, but I'm regularly impressed by the savviness of users and their suggestions).

I don't doubt use cases for this will arise (supposing we get more users), and alternatives to perform this are pretty ugly and most likely slower. So, I think this is worth exploring, but we can keep it on the back-burner and wait until we have conventions for showing examples in docstrings. We've been pushing core features forward faster than we can document them (with good reason), but we need documentation to catch up, and we should be in the habit of requiring sufficient documentation for PRs like this.

Possible enhancements:

  • It would be nice if we could come up with an efficient recipe that could result in missing values instead of using a default value.
  • Alternatively, to use the original value instead of default or missing.
  • Maybe get the default value from d.__missing__ if available (pandas does this).

@eriknw eriknw added discussion Discussing a topic with no specific actions yet feature Something is missing lowpriority labels Jun 4, 2022
@eriknw eriknw changed the title Apply with a dict 🎉 Map values according to a dict Jun 6, 2022
@eriknw
Copy link
Member Author

eriknw commented Jun 12, 2022

Okay, I think the natural API for this is:

def applymap(self, mapping, default=None):

where mapping is a dict-like Mapping. This returns an expression and does not modify the original.

default can be:

  • None: drop non-matching values
  • scalar: use this as the default value
  • Matrix: get default values from this object
    • if default is self, then the original value is "passed through", and we can have a more efficient recipe
    • mymatrix.applymap accepts Matrix and myvector.applymap accepts Vector (naturally)

I think the keys should be coerced to the dtype of the parent object dtype, and values coerced to the output object dtype. This will let us handle UDTs and awkward dtypes like uint64.

Edit: changed name from map to applymap based on suggestion from @jim22k

@eriknw eriknw marked this pull request as draft November 16, 2022 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussing a topic with no specific actions yet feature Something is missing lowpriority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Apply dict to map values
3 participants