-
Notifications
You must be signed in to change notification settings - Fork 7
feat: introduced initial parser version #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely add test, at least for utils.
You can use node test runner or vitest. Node test-runner requires no config
This is a draft of the initial implementation. Tests will be added as a follow-up as both the structure and API design of this solution will change. I don't believe that at this point we should be caring much about tests. |
I agree with you. But I think that at least testing the utility functions would be beneficial to laying the groundwork for the project. |
+1, can add them when the structure is sorted and we have a good idea of what the api will look like. Are there any changes that we already know we want to make to the code? |
FYI: Ive updated the logic to support multiple files (through a glob), updated the example too. |
The thing is that most of the utility functions are deeply tied to things I cannot guarantee; See, our API docs are super extensive, and I would need to dedicate a lot of time beforehand just with the tests of said functions. I believe @flakey5 wants to take care of those tests; I just hope those functions are well enough documented for him to understand the expectations. Ill try to add more inline code blocks of historical knowledge and more details of said functions later. |
Yes. On the other issues it is written what we want to iterate. I doubt the parser itself will change much (if any, but it is a MVP after all), now we want to focus on:
So we are a bit far from having any meaningful generator ready, but the core logic and generation of metadata is working 100% :) eventually we can move the logic used by the JSON generator for the method signatures to the core parser itself. (whenever we have that ready) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT ! I've tried all kinds of files.
What is LGMT? Did you mean LGTM? |
FYI @nodejs/web-infra Ive made a few refactors, could y'all please re-review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not on my laptops but it's look good! I'm sure Claudio, you have made some test.
BTW great and clear doc 👍
FYI I found some bugs here with the last refactor, going to fix it soon! |
Alrighty, I've updated the code once again and now everything seems clean and working again :) |
Props to @wooorm for the support. Without people like Titus, Node and other projects wouldn't be able to succeed :) |
I made some final adjustments, sorry for all the re-review requests! I should have marked this as a draft 😅 Anyhow! Iterating through all the API doc files (which some of which are humungous) took a whooping:
I still wonder if this can be further optimized 🤔 |
3f19dfa
to
19c4a01
Compare
With the latest commit I was able to simplify the code even more and reduce the overall parsing time around 30% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only really had time for a functional pass - and I am happy to report that this works on a Windows machine too.
I have a question about stability
entries within output. Specifically, the experimental 1.X stages
are stored in the description
(see below). Example: "description": ".2 - Release candidate"
Number of occurrences:
Example: (from this doc entry)
{
"api": "module",
"slug": "module.html#loadurl-context-nextload",
"updates": [],
"changes": [
{
"version": "v20.6.0",
"pr-url": "https://github.com/nodejs/node/pull/47999",
"description": "Add support for `source` with format `commonjs`."
},
{
"version": [
"v18.6.0",
"v16.17.0"
],
"pr-url": "https://github.com/nodejs/node/pull/42623",
"description": "Add support for chaining load hooks. Each hook must either call `nextLoad()` or include a `shortCircuit` property set to `true` in its return."
}
],
"heading": {
"text": "`load(url, context, nextLoad)`",
"type": "module",
"name": "`load(url, context, nextLoad)`",
"depth": 4
},
"stability": {
"index": 1,
"description": ".2 - Release candidate"
},
"content": "<omitted>"
},
I only mention this because it feels like the parser may not be accounting for this type of data - and if it's locked away in the description
, it limits our ability to do things with it in the future or would require us to strip out or custom parse things on consumption side. For example, if we wanted to make a chip/tag/label/badge feature that showed the status as release candidate
.
If none of this is a concern, since a 1.0
is the same as a 1.1
or 1.2
in terms of stage/color/rendering, then it's no big deal, but we might be losing some parsed metadata here.
This is the same parsing code that exists on the living version of the API doc tooling for JSON generation. The difference, of course, is that we are stripping away the HTML Nodes and parsing them into a simple string. What I can do is also use AST Nodes, but only transform them into a string whenever we are generating a JSON (ie, JSON.stringify) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!, I added a few small comments, we can add them if necessary. Other than that, great work 🚀 🐢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm apart from @canerakdas's comments
cc @bmuenzenmeyer I added the stability index as raw nodes, and @canerakdas Ive applied the code reviews. Please review the code again 🙏 |
@ovflowd When I compare it with the output of the previous version, the "updates" and "changes" arrays in some objects appear empty. Is this expected? ( ![]() |
I know why, I did a bobo of changing updateProperties to setProperties and forgot within on section there might be t yaml blocks |
cc @canerakdas Ive fixed the issue. We definitely should add tests as a follow-up PR :P |
@bmuenzenmeyer are you good with the latest revision? |
I'm on vacation. So please don't wait for me |
Closes #2
This PR introduces the initial codebase for the API Doc Parser. It includes the initial codebase for parsing API Doc files into Metadata that our Generators can then ingest.
The main Parser tries to make as few modifications as possible, only making the primary mutations to the tree and generating a horizontal (plain) metadata tree.
Each generator is responsible for consuming the metadata and extra content and doing things with it. For example, the JSON transformer will further iterate these items, transforming extra pieces (such as the method signatures) into more JavaScript objects and the flat list into a Node tree.
The MDX transformer would grab certain pieces of the Metadata and pull them into MDX Components.
The Parser itself is relatively simple.
Review Methodology
Testing Locally: