-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Any way to make BinaryTree support fused type? #7059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Either way, the value of data_arr is overwritten in init, and On 21 July 2016 at 23:55, Yen notifications@github.com wrote:
|
Thanks @jnothman, you're right. However, after some attempts to try to convert And then I found this unfortunate description in the official page here:
The only solution I can come up so far is like this commit, declare another float memoryview attributes and we can decide which memoryview we want to use when we want to access |
You can make it a void * and typecast it within a function, and see how that goes. |
Hello @jnothman , sorry to interrupt again. I'm assuming you mean something like this? (I use variable Above code works, but if I change both
which I can't understand since Any idea? |
Well, I think the first rule of compilers is that "Compiler crash" isn't expected behaviour! I presume you're uncovering a Cython bug. It looks like type inference is failing. Have you attempted to make the code that crashes minimal? I don't know what you mean by should not affect other |
Oh do you mean |
The line numbers there don't seem to correspond to the most recent Cython, so you should probably upgrade before trying to alert someone to a bug. |
@jnothman Thanks for the clarifying. I'll dive into it to see if this is really a Cython bug and get back. |
I produced a minimum example here, but I'm sorry that I overlooked another error that actually happend on top of it, i.e.,
I think this error occured because Cython compiler can't decide which type of In previous PR, we didn't encounter this because fused types like However, in this case, it seems that compiler can't decide what Does this make sense? |
If so, we need fused types to appear in function arguments if we want to use fused types local variables and assign value to it in that function? This seems like a big limitation of Cython fused types and worth to be documented. |
This seems like a big limitation of Cython fused types and worth to be documented.
Maybe you should ask on the Cython mailing list?
|
@GaelVaroquaux thanks, I just asked the question. |
Ah of course, I too missed the context. Yes, fused types only work in functions/methods where the type can be specialised by sniffing the arguments. Which means we just need a way to specify type at |
You may be able to specialise things with the |
I'm worry about since we declare Like here, we can't typecast it to Sorry I am not 100% sure of my concern, and I will wait for your elaboration, thanks a lot! |
Again I'll have to put aside some time to look into it later in the week, On 25 July 2016 at 14:30, Yen notifications@github.com wrote:
|
I think data_sample argument is a nice solution, but I'm curious about how to declare data? Using If so, it seems that we almost need to include |
Yes, we'd use (And we don't actually need a separate |
I've never head of this defaults, and the error message appears in On 5 August 2016 at 14:01, Yen notifications@github.com wrote:
|
It doesn't look like anyone responded there. I suppose one should check On 5 August 2016 at 16:57, Joel Nothman joel.nothman@gmail.com wrote:
|
Well ... Nevermind, it's a cython bug, see below. |
@jnothman After countless repeat experiments, I found out that it's a bug! 🐞 It turns out that if you want to put fused type arguments in extension class methods, you cannot have default value arguments in the same function. See this ipython notebook I just created. |
I've found a way to successfully include fused types arguments in |
Great that we've got some kind of solution! That's a pretty frustrating bug/limitation in Cython. It'd be worth reporting so that at least there's a record of it. |
Yes, I almost killed myself during Cython experiments. And I have to say sorry that I'm currently modifying #7153 , so that I may force push it later since some code on the github will cause problems. |
hey @yenchenlin and @jnothman, I just created a KDTree from a dataset that was stored as uint8. The original data is about 200mb, but the final KDTree is like 1.6gb and when I looked at the arrays they seem to be stored as floats. Is there any progress on keeping the original dtypes in the tree and thereby possibly reducing the memory footprint? Or at least a way to change the dtypes after the tree was constructed? (I'm not familiar with Cython so I didn't get how to access any of the arrays directly in a way that lets you modify them, preferably without breaking anything.) I'm especially concerned as I wanted to use the tree in a WebApp, but the heroku server only allows for 512mb of RAM... >.> Thanks! :) |
Maybe the finishing the refactoring such as #4217 would help. I started some of it in #15097 Another blocker is read-only fused memory-views #10624 that will be only available in the future Cython 3.0.
In any case, this would likely apply to float64, float32. Anything else is so far out of scope, in scikit-learn. |
Is there any update on this? I would really like to be able to construct trees with float32 data. The data I have is single-precision (and large, ~1.5GB), so constructing a tree not only has to copy the data but also doubles the memory usage by converting to float64. If it were possible to construct trees with single-precision data internally, then this data set would be manageable. Thanks! |
Having Yet, it all comes done to a limitation within Cython as explained by cython/cython#3283 (comment). For scikit-learn, we are currently trying to get over this problem by extending some Cython implementations to support 32bit datasets using Tempita (see #22590). This first necessitates refactoring the implementations of |
Hello guys, currently I'm trying to make neighbors tree algorithms to support Cython fused types so that the memory needed can be drastically reduced.
However, both
KDTree
andBallTree
are subclasses ofBinaryTree
, and this line ofBinaryTree
seems to decide the datatype (np.float64
) of data it stores at initialization time, i.e., even before it sees the actual datatype of training data (which may benp.float32
).The reason to initialize array in
__cinit__
is explained in the comments of code:I wonder what is the
rare cases where __init__ is not called
?Can we handle those rare cases as special cases and move the data initialization to
__init__
, so that it can init a array of datatype of training data?Thanks in advance.
ping @jakevdp @jnothman @MechCoder
The text was updated successfully, but these errors were encountered: