
Folder/Directory descriptions not present #31443


Closed
hardik1408 opened this issue May 28, 2025 · 7 comments

Comments

@hardik1408

Describe the issue linked to the documentation

I was navigating through the codebase, trying to find the source code for some algorithms. I noticed that folders contain no description of the files inside them; such descriptions would make the codebase easier to navigate. We could add a small README file to each folder describing what it contains.

Suggest a potential alternative/fix

No response

@hardik1408 hardik1408 added Documentation Needs Triage Issue requires triage labels May 28, 2025
@allmight05

Is it open? Can I take this PR?

@HussainAther

This is a great point. I’ve also found that navigating large codebases like scikit-learn can be challenging without brief descriptions or READMEs in subdirectories. Even a single sentence explaining the intent of each folder (e.g., algorithm families, utilities, tests) could go a long way for new contributors or those exploring the internals for the first time.

It might be helpful to prioritize documentation for the most commonly accessed directories first (like sklearn/linear_model/, sklearn/ensemble/, etc.), and even include guidance for contributors who want to help write these descriptions.

Definitely a solid suggestion that aligns well with improving dev onboarding and overall accessibility.

@hardik1408
Author

Yeah exactly, that was the whole point. I am ready to work on this issue if someone assigns it to me.

@shivamchhuneja
Contributor

The intent behind this sounds great. I had a similar experience when starting out, but since I work with scikit-learn it was comparatively easy to navigate.

That said, adding and maintaining separate README files across many subdirectories might be a lot of overhead for relatively low impact, especially as the structure evolves. (IMHO)

I guess it would also require a ton of maintenance with every iteration.

Would it make more sense to:

  • Add brief docstrings
  • Or maybe create a single CODEBASE_OVERVIEW.md file at the root level that explains the role of each major submodule like linear_model, ensemble, neighbors, etc. from a "for contributors" lens?

What do you think?

@adrinjalali
Member

Any description of which files in which folders contain what is doomed to become outdated very quickly, since we merge quite a lot of contributions to the repo daily.

However, I don't mind a generic overview of where things are in the contributing.rst file. The issue is most people don't read that file and many questions are already answered there.

Note that this is NOT a good first issue. Needs to be written by someone familiar with the codebase.

Maybe @StefanieSenger wants to have a look.

@adrinjalali adrinjalali removed the Needs Triage Issue requires triage label Jun 4, 2025
@betatim
Member

betatim commented Jun 4, 2025

There are docstrings at the top of all the files, including the __init__.py which give a more generic overview of the directory. Combined with the fact that generally the naming of folders/submodules is pretty good I think we have enough existing documentation.
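A minimal sketch of how those __init__.py docstrings can already serve as a per-folder overview: it prints the summary line of each subpackage's docstring. It parses the files with Python's standard ast module instead of importing them, so it should work on a plain source checkout. The helper name summarize_subpackages and the assumption that it is run from the repository root are illustrative only; this is not a scikit-learn tool.

```python
import ast
from pathlib import Path


def summarize_subpackages(package_dir):
    """Return {subpackage name: first line of its __init__.py docstring}.

    Hypothetical helper. Each __init__.py is parsed with ``ast`` rather than
    imported, so compiled extensions do not need to be built first.
    """
    summaries = {}
    # Only direct subpackages, e.g. sklearn/linear_model/__init__.py.
    for init_file in sorted(Path(package_dir).glob("*/__init__.py")):
        source = init_file.read_text(encoding="utf-8")
        docstring = ast.get_docstring(ast.parse(source))
        # Keep only the summary line; use a marker when there is no docstring.
        first_line = docstring.splitlines()[0] if docstring else "(no docstring)"
        summaries[init_file.parent.name] = first_line
    return summaries


if __name__ == "__main__":
    # Assumption: run from the root of a scikit-learn source checkout.
    for name, summary in summarize_subpackages("sklearn").items():
        print(f"{name:25} {summary}")
```

Because the summaries come straight from the docstrings, they stay as current as the code itself, which is the maintenance concern raised earlier in the thread.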

@betatim betatim closed this as completed Jun 4, 2025
@adrinjalali
Member

For the record, something I always repeat in our sprints is along the lines of this answer:

A generic overview of the files could be something like this:

There are a few main categories of files in the repository:

  • Documentation: the documentation is generated from 3 different sources:

    • doc/ which includes all user guides, which are written in ReST (.rst) format.
      sphinx generates HTML files from these files.
    • examples/ which includes all examples. These are written as Python (.py)
      files, but all comments inside them are in ReST format. sphinx-gallery
      generates .rst files from these, which are then processed by sphinx to
      generate HTML files.
    • The API docs are generated from the docstrings written inside the codebase itself.
  • Implementation: the main part of the codebase is under the sklearn/ directory.
    Most submodules have their own folders there. The best way to find the implementation
    of a class is to search the codebase for class ClassName.

  • Tests: each module has its own set of tests located under a tests folder in the same
    directory as the module. We use pytest to write and run the tests.

  • CI: there are different folders in the repository, such as .github or .circleci,
    which define our CI pipelines on different platforms. We use CircleCI to generate
    our documentation, and Azure and GitHub Actions to run our tests.

  • Benchmarks: asv_benchmarks/ includes a set of benchmarks, which are not touched
    very often and can be used to benchmark certain changes in the codebase.
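The "search for class ClassName" tip can be sketched as a small helper. This is a hypothetical function (find_class_definitions is not part of scikit-learn), assuming it is run from the root of a source checkout; git grep or an editor's project search does the same job. It also scans .pyx files, since some internals are written in Cython.

```python
import re
from pathlib import Path


def find_class_definitions(class_name, root="sklearn"):
    """Return (path, line number) pairs where ``class <class_name>`` is defined.

    Hypothetical helper: a plain-text search over .py and Cython .pyx files.
    """
    # Require "(" or ":" after the name so that "Ridge" does not match "RidgeCV".
    # The optional "cdef" prefix covers Cython extension types.
    pattern = re.compile(
        rf"^\s*(?:cdef\s+)?class\s+{re.escape(class_name)}\s*[(:]"
    )
    hits = []
    for suffix in ("*.py", "*.pyx"):
        for path in sorted(Path(root).rglob(suffix)):
            with open(path, encoding="utf-8") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if pattern.match(line):
                        hits.append((str(path), lineno))
    return hits


if __name__ == "__main__":
    # Assumption: run from the root of a scikit-learn source checkout.
    for path, lineno in find_class_definitions("LogisticRegression"):
        print(f"{path}:{lineno}")
```

Tests for a class found this way usually sit in the tests folder next to its module, as described in the list above.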

Projects
None yet
Development

No branches or pull requests

6 participants