Skip to content

Request for project inclusion: Helstrom Quantum Centroid Classifier #44

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
6 tasks done
leockl opened this issue Jan 24, 2020 · 21 comments
Open
6 tasks done

Request for project inclusion: Helstrom Quantum Centroid Classifier #44

leockl opened this issue Jan 24, 2020 · 21 comments

Comments

@leockl
Copy link

leockl commented Jan 24, 2020

Request for project inclusion in scikit-learn-contrib

  • Project name: helstrom-quantum-centroid-classifier
  • Project description: A package which implements a quantum-inspired supervised classification approach for data with binary classes
  • Authors: Leo Chow, Giuseppe Sergioli, Roberto Giuntini
  • Current repository: https://github.com/leockl/helstrom-quantum-centroid-classifier
  • Requirements:
  • scikit-learn compatible (check_estimator passed)
  • Documentation (guide, API reference, example gallery)
  • Unit tests (coverage: 90%)
  • Python3 compatible
  • PEP8 compliant
  • Continuous integration
@leockl leockl changed the title Request for Project Inclusion: Helstrom Quantum Centroid Classifier Request for project inclusion: Helstrom Quantum Centroid Classifier Jan 24, 2020
@adrinjalali
Copy link
Member

Looks like a descent one to me :)

@leockl
Copy link
Author

leockl commented Jan 29, 2020

Thank you @adrinjalali for reviewing this and for the thumbs up!

@chkoar
Copy link
Member

chkoar commented Mar 5, 2020

Great package. I would use all package names, file names and variable names in snake_lower case.

@chkoar
Copy link
Member

chkoar commented Mar 5, 2020

Couldn't we use a centroids_ array instead of both centroid_class_0_ and centroid_class_1 to stay in line with NearestCentroid?

@leockl
Copy link
Author

leockl commented Mar 9, 2020

Thanks for the upvote @chkoar. We will get back to you on the two feedback you have raised soon, as we are also in the process of updating the package with some new features.

@chkoar
Copy link
Member

chkoar commented May 30, 2020

Any update on this?

@leockl
Copy link
Author

leockl commented Jun 1, 2020

Hey guys, sorry I have been caught up with a few too many things the last couple of weeks. I will get this completed by the end of this week.

My upmost sincere apologies for the delay.

@leockl
Copy link
Author

leockl commented Jun 7, 2020

Hi @chkoar, the package have now been updated:

  • package names, file names and variable names in snake_lower case - with the exception of the variable feature matrix X and variable names using X, to be in line with Scikit-learn naming convention and readability purposes.

  • centroids_ array is now used instead of both centroid_class_0_ and centroid_class_1 to stay in line with NearestCentroid.

Thank you and looking forward to your response.

@chkoar
Copy link
Member

chkoar commented Jun 7, 2020

Hey @leockl ,

  • I would named it as centroids_, as you said. Not centroid_, as it is named now.
  • It seems that the classifier can cope only with binary problems, right? Could you detect that and warn the user? It seems that estimators tags is the way to go.
  • In the scikit-learn ecosystem is not common to have constructors in lower case, no? So, I would name this line to HQC

@leockl
Copy link
Author

leockl commented Jun 8, 2020

Hi @chkoar, yes for now the classifier can only cope with binary problems only. I have already included this warning in the 1st paragraph in the README.md file and in the 1st paragraph in the documentation. The estimator tag is also already included in the source code here. Are these enough? Otherwise, I can include a few more lines of code to produce a warning when there are more than 2 classes.

I will update centroid_ to centroids_ and change the class constructor back to HQC.

Many thanks.

@leockl
Copy link
Author

leockl commented Jul 13, 2020

Hi @chkoar, could you please provide your updated feedback on the estimators tags that you have mentioned above?

@leockl
Copy link
Author

leockl commented Jul 26, 2020

Hi @chkoar, I have updated the package with your previous two suggestions:

  • updated centroid_ to centroids_.
  • changed the class constructor to capitals HQC.

I haven't heard back from you in awhile and I haven't heard back from you about the estimator tag that I have mentioned above if it's good enough. Could you please have a look and complete this. This package has been sitting here for quite awhile now.

Many thanks in advance!

@chkoar
Copy link
Member

chkoar commented Jul 27, 2020

Hey @leockl,

The estimator tag is also already included in the source code here.

I had the impression that the check estimator was testing the case where your have a binary classifier and you are passing a non-binary problem. But as I can see this is not the case.

Hi @chkoar, yes for now the classifier can only cope with binary problems only.

IINM as a user I would expect as many probas (from predict_proba) as the classes of the problem.
So, I suppose that you should raise an exception. Something similar like the following:

if len(self.classes_) > 2:
    raise ValueError('Only 2 classes are supported')

Apart from this, I would add test cases for the HQC learnt attributes. For instance check the centroid's values on a known problem or something like this. Also, it seems to me that requirements.txt contains redundant dependencies, no? Also setup.py is missing install_requires.

ping @rth @glemaitre @adrinjalali

@leockl
Copy link
Author

leockl commented Aug 4, 2020

Hi @chkoar,

Many thanks for your response. I will update all the suggested requests. With the request on adding test cases for the HQC learnt attributes, do you have any examples or guidelines on how this is done?

@chkoar
Copy link
Member

chkoar commented Aug 7, 2020

@leockl could you take a look here?

@leockl
Copy link
Author

leockl commented Sep 2, 2020

Hi @chkoar,

My sincerely apologies for the delay on my reply. I have now updated the package with your suggestions:

  • added the exception error when there are more than 2 classes.
  • added test cases for the HQC learnt attribute hels_bound_.
  • removed redundant dependencies in requirements.txt.
  • included install_requires in setup.py.

Please review. Many thanks in advance.

@leockl
Copy link
Author

leockl commented Nov 1, 2020

Hi @chkoar, could I please get an update on this? Many thanks.

@chkoar
Copy link
Member

chkoar commented Nov 16, 2020

@leockl sorry for the delayed reply.

Below, I list some minors (probably) since I cannot open issues in your fork.

  • It seems that this block and this are identical. You could refactor in order to reduce the LOC.

  • 'ij,ji->' appears three times. It could be a constant.

  • In this and this line we have a self-assignment. Is there a reason?

  • Is u being reused in this and this lines? If not I would change it to _.

  • Is n being reused in this line? If not I would change it to _.

Apart of that, as @adrinjalali said, looks like a decent one.

@rth @glemaitre could this estimator be part of scikit-learn-extra repo? In any case it would be nice if any other folk could review this package.

@leockl
Copy link
Author

leockl commented Nov 20, 2020

Hi @chkoar,

Need to clarify a few points with you:

  • 2nd point: What do you mean by it could be a constant? Could you show an example.
  • 3rd point: This is correct and it's part of the algorithm when n_copies == 1.
  • 4th point: What do you mean by change it to _. Could you show an example.
  • 5th point: I guess this would be the same as the 4th point above.

Could I confirm if you require anymore changes @chkoar?

@rth @glemaitre Please let us know if you have any feedback? If not I would hope this would be the final changes that is required.

Please let me know. Many thanks in advance!

@chkoar
Copy link
Member

chkoar commented Nov 20, 2020

2nd point: What do you mean by it could be a constant? Could you show an example

subscripts = "ij,ji->"

4th point: What do you mean by change it to _. Could you show an example.

Among others, the underscore variable it's a convention for a "throwaway" variable. A variable that won't be used next. Common use cases are iterations and the unpackings.

alist = ["one", "two", "three"]

for index, _ in enumerate(alist):
    print(index) 

@leockl
Copy link
Author

leockl commented Dec 9, 2020

Hi @chkoar,

Many thanks for the above. I have now updated the code according to your change requests:

  • It seems that this block and this are identical. You could refactor in order to reduce the LOC.

Done.

  • 'ij,ji->' appears three times. It could be a constant.

Done.

  • In this and this line we have a self-assignment. Is there a reason?

This is correct and it's part of the algorithm when n_copies == 1. Please refer to the research paper.

  • Is u being reused in this and this lines? If not I would change it to _.

Done.

  • Is n being reused in this line? If not I would change it to _.

This was an error on my part. n is not reused in predict_proba(), therefore I have now removed n here.

I have also updated the code from using check_X_y() and check_array() to self._validate_data() as required by scikit-learn v0.25.

Please find the updated code here:
https://github.com/leockl/HQC/blob/master/Christos/HQC%20-%20CPU%20-%20Christos.py

I haven't uploaded the updated code into pip to release a new version yet. Would be great for you to approve all final changes first, as it takes time to keep releasing a new version every time new changes are requested. I hope you could understand. Many thanks once again Christos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants