-
Notifications
You must be signed in to change notification settings - Fork 0
ENH add pairwise distances backend for LocalOutlierFactor #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4c1d801
to
fed7db5
Compare
@jjerphan I think the part that I might need guidance with is that we actually want the indices and distances computed completely as |
Those distances are stored in the heaps, so after the execution, it should be possible to access the distances (here Those are technical details that I should probably better document. I think I need some more time. |
I understand. However I think the point where I added the code is not correct as _parallel_on_X_prange_iter_finalize will not contain the complete X. I think we might need to define this in _finalize where the computation is complete? |
@jjerphan Does it seem correct to include the implementation in the _finalize_results method? |
Thanks for the heads-up. I think we can first try implementing the logic in |
a0f5945
to
9d66cd7
Compare
@jjerphan I have made changes to accommodate the fit method with the dedicated backend. However there seems to be a complication in the score samples method now. The local reachability density needs the fit distances of the training data X. But now since we use a dedicated backend we are not storing these distances anymore and as a result they are not available in the score_samples method. |
I would be grateful if you could provide any suggestions. I was thinking about storing the distances in the call to fit but then we would still need these distances when computing score_samples. That would mean the fit distances would need to be available in the ArgKminLRD backend but this kind of creates a difference between how we are using it in fit and score_samples. |
I just realized we also need self._lrd in score_samples scikit-learn/sklearn/neighbors/_lof.py Lines 468 to 483 in a9b06a7
This means that we need to return the lrd from the backend in any case. So we can't compute the negative outlier factor in the backend. But then we would also need to return neighbors_indices_fit_X as they are used to index into self._lrd to calculate lrd_ratios_array in the fit method scikit-learn/sklearn/neighbors/_lof.py Lines 290 to 307 in a9b06a7
Looking at this I think we might need to return the neighbor_sample_indices along with lrd. |
9d66cd7
to
1c5ddc0
Compare
1c5ddc0
to
efeab7a
Compare
Hi @OmarManzoor, must scikit-learn#26316 be reviewed before this PR? I trying to understand why this branch targets the |
Closed. This was just for discussion before opening the actual PR on scikit-learn. |
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Any other comments?
CC: @jjerphan Could you kindly have a look at this because I might need some guidance on how to access the indices and distances arrays properly