feat: introduced initial parser version #33

ovflowd · 2024-07-06T14:27:27Z

Closes #2

This PR introduces the initial codebase for the API Doc Parser, which parses API doc files into metadata that our generators can then ingest.

The main parser tries to make as few modifications as possible, applying only the essential mutations to the tree and generating a flat (non-nested) metadata list.

Each generator is responsible for consuming the metadata and extra content and transforming it into its own output format. For example, the JSON transformer will iterate over these items further, transforming extra pieces (such as the method signatures) into JavaScript objects and the flat list into a Node tree.
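To illustrate the "flat list into a tree" step, here is a minimal sketch of how a generator could nest flat entries by heading depth. It is not the code from this PR: the `toTree` helper and the `children` field are hypothetical, and only `heading.text` and `heading.depth` mirror the parser output.

```javascript
// Hypothetical helper: nests flat metadata entries by their heading depth.
// Only `heading.text` and `heading.depth` come from the parser output;
// the `children` field is an assumption made for this sketch.
const toTree = entries => {
  const root = { children: [] };
  // Stack of currently open ancestors, from the root down to the latest heading
  const stack = [root];

  for (const entry of entries) {
    const node = { ...entry, children: [] };

    // Close every open heading that is at the same depth or deeper
    while (
      stack.length > 1 &&
      stack[stack.length - 1].heading.depth >= entry.heading.depth
    ) {
      stack.pop();
    }

    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }

  return root.children;
};

// Usage with made-up entries
const flatEntries = [
  { heading: { text: 'Modules', depth: 1 } },
  { heading: { text: 'Hooks', depth: 2 } },
  { heading: { text: 'load(url, context, nextLoad)', depth: 3 } },
  { heading: { text: 'Other', depth: 2 } },
];

console.log(JSON.stringify(toTree(flatEntries), null, 2));
```

Keeping the parser output flat means each generator can decide for itself whether it needs this nesting at all.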

The MDX transformer would grab certain pieces of the Metadata and pull them into MDX Components.

The Parser itself is relatively simple.

Review Methodology

Please review this pull request thoroughly
Feedback regarding best practices and design patterns is more than welcome; some pieces here were done in a hurry
Since I continuously refactored the codebase, some documentation blocks might have erroneous types or comments. Please feel free to comment, document, or review
Please feel free to check this against numerous different API docs

Testing Locally:

npm i

import createParser from './src/parser.mjs';
import createLoader from './src/loader.mjs';
import { writeFileSync } from 'node:fs';

const { loadFiles } = createLoader();
const { parseApiDocs } = createParser();

const writeResult = result =>
  writeFileSync('api-docs.json', JSON.stringify(result, null, 2), 'utf-8');

const apiDocFiles = loadFiles('../node/doc/api/*.md');

const parsedApiDocs = await parseApiDocs(apiDocFiles);

writeResult(parsedApiDocs);

src/loader.mjs

AugustinMauroy

We should definitely add tests, at least for the utils.
You can use the Node.js test runner or Vitest. The Node.js test runner requires no config.

ovflowd · 2024-07-06T17:19:31Z

We should definitely add tests, at least for the utils. You can use the Node.js test runner or Vitest. The Node.js test runner requires no config.

This is a draft of the initial implementation. Tests will be added as a follow-up as both the structure and API design of this solution will change.

I don't believe that at this point we should be caring much about tests.

src/constants.mjs

AugustinMauroy · 2024-07-06T17:28:59Z

I don't believe that at this point we should be caring much about tests.

I agree with you. But I think that at least testing the utility functions would be beneficial to laying the groundwork for the project.

flakey5 · 2024-07-06T17:29:22Z

I don't believe that at this point we should be caring much about tests.

+1, can add them when the structure is sorted and we have a good idea of what the api will look like.

Are there any changes that we already know we want to make to the code?

ovflowd · 2024-07-06T18:00:31Z

FYI: I've updated the logic to support multiple files (through a glob) and updated the example too.

ovflowd · 2024-07-06T18:02:02Z

I don't believe that at this point we should be caring much about tests.

I agree with you. But I think that at least testing the utility functions would be beneficial to laying the groundwork for the project.

The thing is that most of the utility functions are deeply tied to things I cannot guarantee. Our API docs are very extensive, and I would need to dedicate a lot of time up front just to the tests for those functions.

I believe @flakey5 wants to take care of those tests; I just hope those functions are documented well enough for him to understand the expectations. I'll try to add more inline code blocks with historical knowledge and more details about those functions later.

ovflowd · 2024-07-06T18:06:40Z

Are there any changes that we already know we want to make to the code?

Yes. The other issues describe what we want to iterate on. I doubt the parser itself will change much, if at all (it is an MVP, after all). For now we want to focus on:

Creating the CLI tool (cc @canerakdas)
Creating unit tests
Writing the legacy HTML generator (which hooks into the output of the parser and builds the HTML templates we have today on nodejs.org/api)
We can use this opportunity for some small cleanup, but ideally we just want to copy the assets and the templates statically and add some extra logic to transform the Markdown syntax into HTML and the extra metadata into HTML elements as well.
Then we can start working on the JSON generator. The missing piece is the generation of the method signatures (https://github.com/nodejs/node/blob/main/tools/doc/json.mjs#L288, https://github.com/nodejs/node/blob/main/tools/doc/json.mjs#L374), which we also want to clean up, refactor, and simplify.

So we are a bit far from having any meaningful generator ready, but the core logic and generation of metadata is working 100% :) Eventually we can move the logic used by the JSON generator for the method signatures to the core parser itself (whenever we have that ready).

AugustinMauroy

LGMT ! I've tried all kinds of files.

ovflowd · 2024-07-07T12:37:11Z

LGMT ! I've tried all kinds of files.

What is LGMT? Did you mean LGTM?

ovflowd · 2024-07-07T19:12:46Z

FYI @nodejs/web-infra I've made a few refactors, could y'all please re-review?

AugustinMauroy

I'm not on my laptop, but it looks good! Claudio, I'm sure you have done some testing.

BTW great and clear doc 👍

ovflowd · 2024-07-07T23:22:16Z

FYI I found some bugs here after the last refactor; I'm going to fix them soon!

ovflowd · 2024-07-08T01:20:03Z

Alrighty, I've updated the code once again and now everything seems clean and working again :)

ovflowd · 2024-07-08T01:20:42Z

Props to @wooorm for the support. Without people like Titus, Node and other projects wouldn't be able to succeed :)

ovflowd · 2024-07-08T03:33:13Z

I made some final adjustments, sorry for all the re-review requests! I should have marked this as a draft 😅

Anyhow! Iterating through all the API doc files (some of which are humongous) took a whopping:

node test.mjs  9.25s user 0.37s system 156% cpu 6.150 total

I still wonder if this can be further optimized 🤔

src/loader.mjs

src/queries.mjs

src/metadata.mjs

ovflowd · 2024-07-09T22:15:48Z

With the latest commit I was able to simplify the code even more and reduce the overall parsing time by around 30%.

bmuenzenmeyer

I only really had time for a functional pass - and I am happy to report that this works on a Windows machine too.

I have a question about stability entries within the output. Specifically, the experimental 1.X stages are stored in the description (see below). Example: "description": ".2 - Release candidate"

Number of occurrences:

Example: (from this doc entry)

  {
    "api": "module",
    "slug": "module.html#loadurl-context-nextload",
    "updates": [],
    "changes": [
      {
        "version": "v20.6.0",
        "pr-url": "https://github.com/nodejs/node/pull/47999",
        "description": "Add support for `source` with format `commonjs`."
      },
      {
        "version": [
          "v18.6.0",
          "v16.17.0"
        ],
        "pr-url": "https://github.com/nodejs/node/pull/42623",
        "description": "Add support for chaining load hooks. Each hook must either call `nextLoad()` or include a `shortCircuit` property set to `true` in its return."
      }
    ],
    "heading": {
      "text": "`load(url, context, nextLoad)`",
      "type": "module",
      "name": "`load(url, context, nextLoad)`",
      "depth": 4
    },
    "stability": {
      "index": 1,
      "description": ".2 - Release candidate"
    },
    "content": "<omitted>"
  },

I only mention this because it feels like the parser may not be accounting for this type of data - and if it's locked away in the description, it limits our ability to do things with it in the future or would require us to strip out or custom parse things on consumption side. For example, if we wanted to make a chip/tag/label/badge feature that showed the status as release candidate.

If none of this is a concern, since a 1.0 is the same as a 1.1 or 1.2 in terms of stage/color/rendering, then it's no big deal, but we might be losing some parsed metadata here.
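As an illustration of what capturing that data could look like, here is a minimal sketch that parses a stability line and keeps the sub-stage as its own field. This is not the parser's actual code: `parseStability`, the `stage` field, and the exact input string format are assumptions for this example.

```javascript
// Hypothetical pattern: captures the index ("1"), an optional sub-stage ("2"),
// and the remaining description. The "Stability: " prefix is an assumed input format.
const STABILITY_PATTERN = /^Stability: (\d+)(?:\.(\d+))?(?:\s*-\s*)?(.*)$/s;

// Hypothetical helper: returns null when the text is not a stability line
const parseStability = text => {
  const match = STABILITY_PATTERN.exec(text.trim());

  if (!match) {
    return null;
  }

  const [, index, stage, description] = match;

  return {
    index: Number(index),
    // `stage` is an assumed field name; null when there is no ".X" part
    stage: stage === undefined ? null : Number(stage),
    description: description.trim(),
  };
};

console.log(parseStability('Stability: 1.2 - Release candidate'));
console.log(parseStability('Stability: 2 - Stable'));
```

With a separate field like this, a consumer could render a "Release candidate" badge without re-parsing the description string.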

ovflowd · 2024-07-10T12:50:26Z

I only mention this because it feels like the parser may not be accounting for this type of data - and if it's locked away in the description, it limits our ability to do things with it in the future or would require us to strip out or custom parse things on consumption side. For example, if we wanted to make a chip/tag/label/badge feature that showed the status as release candidate.

This is the same parsing code that exists in the current version of the API doc tooling for JSON generation. The difference, of course, is that we are stripping away the HTML nodes and parsing them into a simple string.

What I can do is keep the AST nodes and only transform them into a string when we are generating JSON (i.e., with JSON.stringify).
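A rough sketch of that idea, assuming mdast-like nodes with `value` and `children` fields: the object keeps the raw node and only flattens it inside `toJSON()`, which `JSON.stringify` calls automatically. The `createStability` and `nodeToString` helpers are hypothetical and not taken from this PR.

```javascript
// Hypothetical helper: concatenates the `value` of every literal node in a tree
const nodeToString = node =>
  'value' in node
    ? node.value
    : (node.children ?? []).map(nodeToString).join('');

// Hypothetical factory: stores the description as a raw AST node
const createStability = (index, descriptionNode) => ({
  index,
  description: descriptionNode,
  // JSON.stringify calls toJSON() when it exists, so the AST is only
  // flattened into a string at serialization time
  toJSON() {
    return { index: this.index, description: nodeToString(this.description) };
  },
});

const stability = createStability(1, {
  type: 'paragraph',
  children: [
    { type: 'text', value: 'Release ' },
    { type: 'inlineCode', value: 'candidate' },
  ],
});

// Generators that want the AST can still read `stability.description` directly
console.log(stability.description.type);
console.log(JSON.stringify(stability));
```

This keeps the richer node structure available to generators such as HTML or MDX while the JSON output stays a plain string.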

canerakdas

LGTM! I added a few small comments; we can address them if necessary. Other than that, great work 🚀 🐢

src/metadata.mjs

src/utils/parser.mjs

src/utils/unist.mjs

flakey5

lgtm apart from @canerakdas's comments

ovflowd · 2024-07-11T11:06:41Z

cc @bmuenzenmeyer I added the stability index as raw nodes, and @canerakdas I've applied the code reviews.

Please review the code again 🙏

canerakdas · 2024-07-11T17:53:58Z

@ovflowd When I compare it with the output of the previous version, the "updates" and "changes" arrays in some objects appear empty. Is this expected? (- 1971 removals, + 215 additions)

(Previous output, Current output)

ovflowd · 2024-07-11T17:56:21Z

I know why: I made a boo-boo by changing updateProperties to setProperties and forgot that within one section there might be multiple YAML blocks.
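To show why that swap can empty the arrays, here is a minimal sketch under stated assumptions. The names `setProperties` and `updateProperties` come from the comment above, but the implementations below (and the `createMetadata` factory) are guesses for illustration, not the PR's code.

```javascript
// Hypothetical metadata holder; the real implementation in this PR may differ
const createMetadata = () => {
  const properties = { updates: [], changes: [] };

  return {
    // Overwrites: a later YAML block replaces values from an earlier one
    setProperties(values) {
      Object.assign(properties, values);
    },
    // Merges: array properties are appended to instead of replaced
    updateProperties(values) {
      for (const [key, value] of Object.entries(values)) {
        properties[key] = Array.isArray(properties[key])
          ? [...properties[key], ...[].concat(value)]
          : value;
      }
    },
    getProperties: () => ({ ...properties }),
  };
};

// Two made-up YAML blocks found within the same section
const firstBlock = { changes: [{ version: 'v20.6.0' }] };
const secondBlock = { changes: [] };

const overwritten = createMetadata();
overwritten.setProperties(firstBlock);
overwritten.setProperties(secondBlock);

const merged = createMetadata();
merged.updateProperties(firstBlock);
merged.updateProperties(secondBlock);

console.log(overwritten.getProperties().changes.length);
console.log(merged.getProperties().changes.length);
```

In this sketch the overwriting variant loses the first block's entries as soon as a second block arrives, which would match the empty "updates" and "changes" arrays reported above.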

ovflowd · 2024-07-12T02:09:00Z

cc @canerakdas I've fixed the issue. We definitely should add tests in a follow-up PR :P

ovflowd · 2024-07-12T16:16:23Z

@bmuenzenmeyer are you good with the latest revision?

bmuenzenmeyer · 2024-07-12T17:37:19Z

I'm on vacation. So please don't wait for me

ovflowd requested a review from a team as a code owner July 6, 2024 14:27

ovflowd changed the title ~~feat: introduced initial parser version (not finished)~~ feat: introduced initial parser version Jul 6, 2024

ovflowd mentioned this pull request Jul 6, 2024

Redesign of the Node.js API Docs nodejs/node#52343

Open

flakey5 approved these changes Jul 6, 2024

src/loader.mjs (resolved)

AugustinMauroy reviewed Jul 6, 2024

src/constants.mjs (outdated, resolved)

AugustinMauroy approved these changes Jul 6, 2024

ovflowd requested review from flakey5 and AugustinMauroy July 7, 2024 19:12

AugustinMauroy approved these changes Jul 7, 2024

flakey5 approved these changes Jul 7, 2024

ovflowd requested review from flakey5 and a team July 8, 2024 01:22

flakey5 approved these changes Jul 8, 2024

canerakdas mentioned this pull request Jul 8, 2024

feat: introduced CLI mode to the parser #36

Merged

2 tasks

canerakdas reviewed Jul 9, 2024

src/loader.mjs (outdated, resolved)

src/queries.mjs (resolved)

src/metadata.mjs (outdated, resolved)

feat: introduced initial parser version (not finished)

0885b53

ovflowd force-pushed the feat/initial-doc-tooling-parser branch from 3f19dfa to 19c4a01 Compare July 9, 2024 21:13

ovflowd added 7 commits July 9, 2024 23:13

chore: deep optimization and fix of edge cases

a009ec4

chore: remove console log

a16f029

chore: final cleanup and optimization

d462238

chore: simplification of parser and separation of concerns

edafe6f

chore: updated docs

bf4004e

chore: code review changes

19c4a01

refactor: simplified the parsing of the tree without using a transformer

410770e

ovflowd requested review from canerakdas, AugustinMauroy and flakey5 July 9, 2024 22:15

ovflowd added 4 commits July 10, 2024 00:19

chore: renamed variables for simplicity

1b35124

chore: pass apidoc itself

44f9122

chore: more doc updates

b3ff296

chore: more doc updates

ba96883

bmuenzenmeyer approved these changes Jul 10, 2024

canerakdas approved these changes Jul 10, 2024

flakey5 approved these changes Jul 10, 2024

refactor: applied cdoe reviews and stability index as Parent

77f9884

fix: partial metadata definition

be1927f

canerakdas approved these changes Jul 12, 2024

ovflowd merged commit 020b4ae into main Jul 12, 2024
6 checks passed

ovflowd deleted the feat/initial-doc-tooling-parser branch July 12, 2024 21:59
