[MRG] Changed code example for FAQ #7820
Closed
Reference Issue
Related to: #7669
What does this implement/fix? Explain your changes.
Problem:
The doc/faq example "How do I deal with string data (or trees, graphs...)?" contains a code snippet that stops working when the number of samples is increased.
Fix:
The code example needs the parameter `algorithm='brute'` for the custom distance metric with data indexing to work correctly. Otherwise, for larger datasets, the default `algorithm='auto'` will choose an incompatible algorithm, which breaks the custom data indexing.
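For reference, a minimal sketch of what the fixed snippet could look like. The pure-Python `levenshtein` helper below is an assumption added so the example is self-contained; it stands in for whichever edit-distance function the FAQ example imports. The three-string dataset and the `eps`/`min_samples` values are illustrative only.

```python
import numpy as np
from sklearn.cluster import dbscan

data = ["ACCTCCTAGAAG", "ACCTACTAGAAGTT", "GAATATTAGGCCGA"]


def levenshtein(a, b):
    """Plain dynamic-programming edit distance (stand-in helper, not from the FAQ)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def lev_metric(x, y):
    # x and y are one-element rows of X holding indices into `data`,
    # not the strings themselves.
    i, j = int(x[0]), int(y[0])
    return levenshtein(data[i], data[j])


# Each sample is represented only by its index in `data`.
X = np.arange(len(data)).reshape(-1, 1)

# algorithm='brute' makes every distance go through lev_metric on actual
# rows of X, so each value passed to the metric is a valid index into `data`.
core_samples, labels = dbscan(
    X, metric=lev_metric, eps=5, min_samples=2, algorithm="brute"
)
print(core_samples, labels)
```

With `algorithm='brute'`, the metric is only ever called on pairs of rows taken from `X`, which is the property the index-lookup trick depends on.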
Any other comments?
Here's a notebook with code demonstrating the issue:
https://github.com/sillystring13/cuddly-barnacle/blob/master/Example%20-%20Sklearn%20FAQ%20code%20test.ipynb