sklearn MDS vs skbio PCoA #15272
Comments
I'd be happy to have the other implementation as an alternative, and a parameter to set it as the
Related to #4485?
What's the relation between PCA and PCoA? They seem quite similar, but I'm not really familiar with MDS and have never heard the term PCoA.
PCoA is essentially a PCA, but starting from a distance matrix (which doesn't need to be Euclidean). The scikit-bio implementation is quite nice: _principal_coordinate_analysis.py
Is anyone working on that? If not, I can take it.
Seems like it's available, feel free to take it @panpiort8
Ok, I'll take it (then I suppose 'help wanted' is misleading)
Could someone review the above PR?
We're mostly busy with the release at the moment. We should be able to focus on this more if you ping in 3 weeks ;)
Ok, I fully understand. I'll ping soon :)
gentle ping (;
Could someone review the above PR? (gentle ping no. 2)
Has this finally been implemented? I read the documentation, but there is nothing about SVD yet :(
It's almost implemented (PR #16067), but it still needs quite a bit of work before it can be merged. Feel free to continue this PR.
Has there been any progress on this recently? Looking forward to decreasing my reliance on
There is some confusion in the discussion above. PCoA is another name for "classical MDS". It takes a matrix of pairwise distances, double-centers the squared distances, performs an eigendecomposition, and reads the embedding off the leading eigenvectors. The loss function that is minimized is sometimes called "strain". If the input pairwise distances are Euclidean distances between some vectors, then this procedure is equivalent to PCA of those vectors. In contrast, "metric MDS" optimizes a DIFFERENT loss function called "stress", which has no closed-form solution and can only be minimized iteratively. These are two different optimization problems with two different solutions. There is also "non-metric MDS", which is yet another different thing. I fully agree that classical MDS aka PCoA should be implemented in sklearn. My question is: what is the best API for that? Ideally I would prefer What do people think?
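To make the description above concrete, here is a minimal NumPy sketch of classical MDS and of its equivalence to PCA when the input distances are Euclidean. The helper name `classical_mds` and the random test data are illustrative assumptions; this is not the scikit-learn or scikit-bio implementation.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Classical MDS (PCoA) from a square matrix of pairwise distances.

    Hypothetical helper for illustration only.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, J = I - 11^T / n
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean input can give negative eigenvalues; clip them to zero
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))


# Euclidean distances between some random vectors
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Y_mds = classical_mds(D, n_components=2)

# PCA scores of the same vectors, via SVD of the centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Y_pca = U[:, :2] * S[:2]

# Each axis is only defined up to sign, so align signs before comparing
signs = np.sign(np.sum(Y_mds * Y_pca, axis=0))
print(np.allclose(Y_mds * signs, Y_pca))
```

The final check compares the two embeddings after aligning the sign of each axis, which is the only freedom left when the leading eigenvalues are distinct.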
@amueller @adrinjalali What do you think about the optimal API (see my comment above)? Does
I think the
I would like to understand the specific stress function currently implemented in scikit-learn for both metric and non-metric MDS. Does the implementation follow the Kruskal stress function as described in the referenced papers in the documentation? If not, could you kindly clarify the formula used for each and any key differences?
I replied in more detail in your closed issue, but I think Wikipedia has a good overview:
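For readers following the terminology in this thread, the usual textbook definitions of the losses are sketched below. The symbols are assumptions chosen here: $\delta_{ij}$ are the input dissimilarities, $D^{(2)}$ is the matrix of squared dissimilarities, $d_{ij}(X) = \lVert x_i - x_j \rVert$ are distances in the embedding $X$, and $\hat d_{ij}$ are the disparities obtained by monotone regression in the non-metric case. These are the standard forms, not a statement of what scikit-learn computes internally.

```latex
% Classical MDS (PCoA): strain
B = -\tfrac{1}{2}\, J D^{(2)} J, \qquad J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top
\mathrm{Strain}(X) = \lVert X X^\top - B \rVert_F^2

% Metric MDS: raw stress
\mathrm{Stress}(X) = \sum_{i<j} \bigl(\delta_{ij} - d_{ij}(X)\bigr)^2

% Non-metric MDS: Kruskal's Stress-1
\mathrm{Stress\text{-}1}(X) = \sqrt{\frac{\sum_{i<j} \bigl(d_{ij}(X) - \hat d_{ij}\bigr)^2}{\sum_{i<j} d_{ij}(X)^2}}
```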
I made a new PR to implement classical MDS aka PCoA: #31322.
Multidimensional Scaling (MDS) and Principal Coordinate Analysis (PCoA) are two names for the same dimensionality reduction technique*.
In scikit-learn, MDS is implemented with the SMACOF algorithm, while other Python libraries (such as scikit-bio) and most R packages that offer it implement it using singular value decomposition.
This often confuses people who compare sklearn's MDS results with those of other MDS implementations, since sklearn is the odd one out.
How could one add another implementation of MDS to sklearn? Or maybe create a new PCoA method?
cc @adrinjalali
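To illustrate why the SMACOF-based approach differs in kind from a decomposition-based one, here is a minimal NumPy sketch of the SMACOF iteration (repeated Guttman transforms with unit weights) minimizing raw stress. The function names `raw_stress` and `smacof_step` are hypothetical, and the sketch omits everything a real implementation adds (multiple initializations, a convergence tolerance, the non-metric variant), so it should not be read as scikit-learn's code.

```python
import numpy as np


def raw_stress(X, D):
    """Sum over pairs i < j of (D_ij - ||x_i - x_j||)^2."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sum(np.triu(D - d, k=1) ** 2)


def smacof_step(X, D):
    """One Guttman transform with unit weights (hypothetical helper)."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # ratio_ij = D_ij / d_ij, set to 0 where d_ij == 0 (including the diagonal)
    ratio = np.divide(D, d, out=np.zeros_like(D), where=d > 0)
    B = -ratio
    B[np.diag_indices(n)] = ratio.sum(axis=1)
    return B @ X / n


# Target distances come from 3-D points; the embedding is 2-D
rng = np.random.default_rng(0)
points = rng.normal(size=(15, 3))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

X = rng.normal(size=(15, 2))  # random initial configuration
stresses = [raw_stress(X, D)]
for _ in range(50):
    X = smacof_step(X, D)
    stresses.append(raw_stress(X, D))

print(stresses[0], stresses[-1])
```

Each step can only keep the stress the same or lower it, but the result depends on the starting configuration, which is one reason the output will generally not match an eigendecomposition-based PCoA.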