Skip to content

feat: introduced initial parser version #33

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Jul 12, 2024

Conversation

ovflowd
Copy link
Member

@ovflowd ovflowd commented Jul 6, 2024

Closes #2

This PR introduces the initial codebase for the API Doc Parser. It includes the initial codebase for parsing API Doc files into Metadata that our Generators can then ingest.

The main Parser tries to make as few modifications as possible, only making the primary mutations to the tree and generating a horizontal (plain) metadata tree.

Each generator is responsible for consuming the metadata and extra content and doing things with it. For example, the JSON transformer will further iterate these items, transforming extra pieces (such as the method signatures) into more JavaScript objects and the flat list into a Node tree.

The MDX transformer would grab certain pieces of the Metadata and pull them into MDX Components.

The Parser itself is relatively simple.

Review Methodology

  • Please ensure to review this Pull Request extensively
  • Feedback regarding best practices and design patterns is more than welcome; some pieces here were done in a hurry
  • Since I continuously refactored the codebase, some documentation blocks might be with erroneous types/comments. Please feel free to comment, document, or review
  • Please feel free to check this against numerous different API docs

Testing Locally:

npm i
import createParser from './src/parser.mjs';
import createLoader from './src/loader.mjs';
import { writeFileSync } from 'node:fs';

const { loadFiles } = createLoader();
const { parseApiDocs } = createParser();

const writeResult = result =>
  writeFileSync('api-docs.json', JSON.stringify(result, null, 2), 'utf-8');

const apiDocFiles = loadFiles('../node/doc/api/*.md');

const parsedApiDocs = await parseApiDocs(apiDocFiles);

writeResult(parsedApiDocs);

@ovflowd ovflowd requested a review from a team as a code owner July 6, 2024 14:27
@ovflowd ovflowd changed the title feat: introduced initial parser version (not finished) feat: introduced initial parser version Jul 6, 2024
Copy link
Member

@AugustinMauroy AugustinMauroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely add test, at least for utils.
You can use node test runner or vitest. Node test-runner requires no config

@ovflowd
Copy link
Member Author

ovflowd commented Jul 6, 2024

We should definitely add test, at least for utils. You can use node test runner or vitest. Node test-runner requires no config

This is a draft of the initial implementation. Tests will be added as a follow-up as both the structure and API design of this solution will change.

I don't believe that at this point we should be caring much about tests.

@AugustinMauroy
Copy link
Member

I don't believe that at this point we should be caring much about tests.

I agree with you. But I think that at least testing the utility functions would be beneficial to laying the groundwork for the project.

@flakey5
Copy link
Member

flakey5 commented Jul 6, 2024

I don't believe that at this point we should be caring much about tests.

+1, can add them when the structure is sorted and we have a good idea of what the api will look like.

Are there any changes that we already know we want to make to the code?

@ovflowd
Copy link
Member Author

ovflowd commented Jul 6, 2024

FYI: Ive updated the logic to support multiple files (through a glob), updated the example too.

@ovflowd
Copy link
Member Author

ovflowd commented Jul 6, 2024

I don't believe that at this point we should be caring much about tests.

I agree with you. But I think that at least testing the utility functions would be beneficial to laying the groundwork for the project.

The thing is that most of the utility functions are deeply tied to things I cannot guarantee; See, our API docs are super extensive, and I would need to dedicate a lot of time beforehand just with the tests of said functions.

I believe @flakey5 wants to take care of those tests; I just hope those functions are well enough documented for him to understand the expectations. Ill try to add more inline code blocks of historical knowledge and more details of said functions later.

@ovflowd
Copy link
Member Author

ovflowd commented Jul 6, 2024

Are there any changes that we already know we want to make to the code?

Yes. On the other issues it is written what we want to iterate. I doubt the parser itself will change much (if any, but it is a MVP after all), now we want to focus on:

  • Creating the CLI tool (cc @canerakdas)
  • Creating Unit Tests
  • Start writing the legacy HTML generator (which hooks into the output of the parser and builds the HTML templates we have today on the nodejs.org/api)
  • We can use this opportunity for some small cleanup, but ideally we just want to copy the assets statically and the templates and add some extra logic to transform that Markdown syntax into HTML and the extra Metadata into also HTML elements.
  • Then we can start working on the JSON generator (the missing pieces are the generation of the method signatures (https://github.com/nodejs/node/blob/main/tools/doc/json.mjs#L288, https://github.com/nodejs/node/blob/main/tools/doc/json.mjs#L374) which also we want to cleanup, refactor and simplify;

So we are a bit far from having any meaningful generator ready, but the core logic and generation of metadata is working 100% :) eventually we can move the logic used by the JSON generator for the method signatures to the core parser itself. (whenever we have that ready)

Copy link
Member

@AugustinMauroy AugustinMauroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT ! I've tried all kinds of files.

@ovflowd
Copy link
Member Author

ovflowd commented Jul 7, 2024

LGMT ! I've tried all kinds of files.

What is LGMT? Did you mean LGTM?

@ovflowd ovflowd requested review from flakey5 and AugustinMauroy July 7, 2024 19:12
@ovflowd
Copy link
Member Author

ovflowd commented Jul 7, 2024

FYI @nodejs/web-infra Ive made a few refactors, could y'all please re-review?

Copy link
Member

@AugustinMauroy AugustinMauroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not on my laptops but it's look good! I'm sure Claudio, you have made some test.

BTW great and clear doc 👍

@ovflowd
Copy link
Member Author

ovflowd commented Jul 7, 2024

FYI I found some bugs here with the last refactor, going to fix it soon!

@ovflowd
Copy link
Member Author

ovflowd commented Jul 8, 2024

Alrighty, I've updated the code once again and now everything seems clean and working again :)

@ovflowd
Copy link
Member Author

ovflowd commented Jul 8, 2024

Props to @wooorm for the support. Without people like Titus, Node and other projects wouldn't be able to succeed :)

@ovflowd ovflowd requested review from flakey5 and a team July 8, 2024 01:22
@ovflowd
Copy link
Member Author

ovflowd commented Jul 8, 2024

I made some final adjustments, sorry for all the re-review requests! I should have marked this as a draft 😅

Anyhow! Iterating through all the API doc files (which some of which are humungous) took a whooping:

node test.mjs  9.25s user 0.37s system 156% cpu 6.150 total

I still wonder if this can be further optimized 🤔

@ovflowd ovflowd force-pushed the feat/initial-doc-tooling-parser branch from 3f19dfa to 19c4a01 Compare July 9, 2024 21:13
@ovflowd
Copy link
Member Author

ovflowd commented Jul 9, 2024

With the latest commit I was able to simplify the code even more and reduce the overall parsing time around 30%

Copy link
Collaborator

@bmuenzenmeyer bmuenzenmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only really had time for a functional pass - and I am happy to report that this works on a Windows machine too.

I have a question about stability entries within output. Specifically, the experimental 1.X stages are stored in the description (see below). Example: "description": ".2 - Release candidate"

image

Number of occurrences:

image

Example: (from this doc entry)

  {
    "api": "module",
    "slug": "module.html#loadurl-context-nextload",
    "updates": [],
    "changes": [
      {
        "version": "v20.6.0",
        "pr-url": "https://github.com/nodejs/node/pull/47999",
        "description": "Add support for `source` with format `commonjs`."
      },
      {
        "version": [
          "v18.6.0",
          "v16.17.0"
        ],
        "pr-url": "https://github.com/nodejs/node/pull/42623",
        "description": "Add support for chaining load hooks. Each hook must either call `nextLoad()` or include a `shortCircuit` property set to `true` in its return."
      }
    ],
    "heading": {
      "text": "`load(url, context, nextLoad)`",
      "type": "module",
      "name": "`load(url, context, nextLoad)`",
      "depth": 4
    },
    "stability": {
      "index": 1,
      "description": ".2 - Release candidate"
    },
    "content": "<omitted>"
  },

I only mention this because it feels like the parser may not be accounting for this type of data - and if it's locked away in the description, it limits our ability to do things with it in the future or would require us to strip out or custom parse things on consumption side. For example, if we wanted to make a chip/tag/label/badge feature that showed the status as release candidate.

If none of this is a concern, since a 1.0 is the same as a 1.1 or 1.2 in terms of stage/color/rendering, then it's no big deal, but we might be losing some parsed metadata here.

@ovflowd
Copy link
Member Author

ovflowd commented Jul 10, 2024

I only mention this because it feels like the parser may not be accounting for this type of data - and if it's locked away in the description, it limits our ability to do things with it in the future or would require us to strip out or custom parse things on consumption side. For example, if we wanted to make a chip/tag/label/badge feature that showed the status as release candidate.

This is the same parsing code that exists on the living version of the API doc tooling for JSON generation. The difference, of course, is that we are stripping away the HTML Nodes and parsing them into a simple string.

What I can do is also use AST Nodes, but only transform them into a string whenever we are generating a JSON (ie, JSON.stringify)

Copy link
Member

@canerakdas canerakdas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!, I added a few small comments, we can add them if necessary. Other than that, great work 🚀 🐢

Copy link
Member

@flakey5 flakey5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm apart from @canerakdas's comments

@ovflowd
Copy link
Member Author

ovflowd commented Jul 11, 2024

cc @bmuenzenmeyer I added the stability index as raw nodes, and @canerakdas Ive applied the code reviews.

Please review the code again 🙏

@canerakdas
Copy link
Member

canerakdas commented Jul 11, 2024

@ovflowd When I compare it with the output of the previous version, the "updates" and "changes" arrays in some objects appear empty. Is this expected? (- 1971 removals, + 215 additions)

image (Previous output, Current output)

@ovflowd
Copy link
Member Author

ovflowd commented Jul 11, 2024

I know why, I did a bobo of changing updateProperties to setProperties and forgot within on section there might be t yaml blocks

@ovflowd
Copy link
Member Author

ovflowd commented Jul 12, 2024

cc @canerakdas Ive fixed the issue. We definitely should add tests as a follow-up PR :P

@ovflowd
Copy link
Member Author

ovflowd commented Jul 12, 2024

@bmuenzenmeyer are you good with the latest revision?

@bmuenzenmeyer
Copy link
Collaborator

I'm on vacation. So please don't wait for me

@ovflowd ovflowd merged commit 020b4ae into main Jul 12, 2024
6 checks passed
@ovflowd ovflowd deleted the feat/initial-doc-tooling-parser branch July 12, 2024 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add API Docs parser to the repository
5 participants