feat(mdxish): add legacy variable tokenizer #1339
Some type adjustments I needed to make: adding the Glossary node type introduced some TypeErrors elsewhere.
I'm not sure why there's a failing test; it looks like it won't run the build for some reason. I've verified that all tests pass locally, though.
Adding the tokenizer to the table cell magic block slowed parsing down a bit, so I had to increase this to make it pass.
const EXCLUDED_TAGS = new Set(['HTMLBlock', 'Table', 'Glossary', 'Anchor']);

const inlineMdProcessor = unified().use(remarkParse);
const inlineMdProcessor = unified()
For parsing legacy variables inside MDX components like <Comp>Hello <<name>></Comp>
value: string;
}

interface Glossary extends Node {
Creating a glossary node so we can represent it in mdast
Nice start here @eaglethrost; thanks for picking this up! Working well in the baseline scenarios, but I'm still seeing some discrepancies with these <<legacy_vars>> when used in a code context, or when trying to escape said var, per the following screenshot/example Markdown:
- **Old**: <<email>>
- **New**: {user.email}
- ***Escaped***
- **Old**: \<<email>>
- **New**: \{user.email}
- ***Inline Code***
- **Old**: `<<email>>`
- **New**: `{user.email}`
- ***Code Block***
- **Old**:
```
<<email>>
```
- **New**:
```
{user.email}
```
(Also just noticing that it seems like the new {user.var} syntax isn't being escaped properly either, but that's an issue for a different PR…)
Oh, I didn't think we'd want to resolve variables in code; I thought code should be left untouched. But I will look into it.
commit e657a21
Author: eagletrhost <dimazanugrah12@gmail.com>
Date: Mon Feb 16 19:51:12 2026 +1100
refactor: clean code & comment

commit c28853d
Author: eagletrhost <dimazanugrah12@gmail.com>
Date: Mon Feb 16 19:40:18 2026 +1100
test: fix legacy

commit 7574b19
Author: eagletrhost <dimazanugrah12@gmail.com>
Date: Mon Feb 16 18:36:04 2026 +1100
feat: glossary adjustments

commit 6e92df0
Author: eagletrhost <dimazanugrah12@gmail.com>
Date: Mon Feb 16 18:13:02 2026 +1100
feat: parse legacy vars in codes & api header block
Update: It now resolves legacy variables in code blocks, though it ended up being more complicated than expected. I still had to resort to pre-processing the legacy vars in code blocks because the tokenizer doesn't operate on code nodes. I'm looking to see if there's a cleaner way, but it works now. I also extended it to match the glossary resolution behaviour in legacy.

Another limitation I'm working on is correctly parsing legacy variables with spaces, which doesn't work yet.

Demo:
Another update: the current state tokenises legacy variables into variable nodes only IF they're not in code blocks or inline code. I've reverted my preprocessing function that converted legacy vars to MDX vars, because legacy doesn't touch vars in code blocks either, and it still wouldn't really work for variables with spaces or special characters. Tokenizing code content isn't an option either: the code string has to stay intact, and unparsed, for the Code component's syntax highlighter to work. Hence, as of now, legacy vars in code NO LONGER get resolved.

After a bit of digging, I found that in legacy they get resolved in the CodeMirror syntax highlighter (see the Code component). So I think the best path forward is to extend the syntax-highlighter package to be mdxish-aware and allow it to also parse legacy variables for mdxish (currently it only parses MDX-style user variables in MDXish). This way, we can keep the legacy variable syntax in code blocks and still have it resolved at render time, which follows how legacy handles it. I've made a PR for that here: readmeio/syntax-highlighter#608

Important @kevinports
Ok, this makes sense. But I do want to push back a little on whether the syntax highlighter is the best place to do this. If you run this example: https://non-git.readme.io/docs/variables you'll see the variable resolution doesn't happen until after the React app mounts (because I believe the syntax highlighting is client-side only). So you see a flash of unresolved syntax from the SSR:

Screen.Cast.2026-02-18.at.12.48.47.PM.mp4

That UX kind of blows, right? Is there any way we can do this engine side? Or at the very least on SSR?
I am fine with moving the work to resolve vars within code blocks as a fast follow.
Yeah, we definitely can; we just need to pass the variables list to the engine for resolution. It will be straightforward to do: we just need to extend the function arguments to accept the variables.

So to summarize, we want to resolve legacy AND MDX variables in code on the engine side? To do this, I think we can just add a transformer that visits code nodes and uses regex to resolve the variables, or extend the variable transformer we have now. Do note, though, that I think that way of resolving vars in code is different from legacy & MDX, but I guess it would be an improvement.
If you're happier to move this work to a follow-up PR, then this PR is basically done! Let me know if you're happy with my plan above and whether it's better to create a follow-up. (@kevinports)
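For reference, the engine-side plan described above (pass the variables in, visit code nodes, resolve with a regex) could look roughly like the sketch below. It is only an illustration under stated assumptions: `MdNode`, `Variables`, `resolveInString` and `resolveVariablesInCode` are hypothetical names, the tree walk is hand-rolled rather than the engine's actual transformer API, and unknown variables are simply left as written.

```typescript
// Minimal mdast-like node shape (assumption; the engine's real types may differ).
interface MdNode {
  type: string;
  value?: string;
  children?: MdNode[];
}

// Hypothetical shape of the variables passed into the engine.
type Variables = Record<string, string>;

// One pattern for both syntaxes: <<name>> (legacy) and {user.name} (MDX).
// A single pass avoids re-resolving text that a substituted value introduced.
const VARIABLE_PATTERN = /<<([^<>\n]+)>>|\{user\.([A-Za-z_$][\w$]*)\}/g;

function resolveInString(code: string, vars: Variables): string {
  return code.replace(VARIABLE_PATTERN, (match: string, legacy?: string, mdx?: string) => {
    const key = (legacy ?? mdx ?? '').trim();
    const value = vars[key];
    // Leave the original syntax in place when the variable is unknown.
    return typeof value === 'string' ? value : match;
  });
}

// Walk the tree and rewrite only code blocks and inline code, in place.
function resolveVariablesInCode(tree: MdNode, vars: Variables): MdNode {
  const visit = (node: MdNode): void => {
    if ((node.type === 'code' || node.type === 'inlineCode') && typeof node.value === 'string') {
      node.value = resolveInString(node.value, vars);
    }
    if (node.children) node.children.forEach((child) => visit(child));
  };
  visit(tree);
  return tree;
}
```

Because the substitution happens on the tree before rendering, the resolved value would already be present in the SSR output, which is the point of doing it engine side rather than in the client-side syntax highlighter.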
kevinports left a comment
Lgtm.
That UX kind of blows right? Is there any way we can do this engine side? or at the very least on SSR?
Yeah, we definitely can; we just need to pass the variables list to the engine for resolution. It will be straightforward to do: we just need to extend the function arguments to accept the variables.
So to summarize, we want to resolve legacy AND MDX variables in code on the engine side? To do this, I think we can just add a transformer that visits code nodes and uses regex to resolve the variables. Do note, though, that I think that way of resolving vars in code is different from legacy & MDX, but I guess it would be an improvement.
I am fine with moving the work to resolve vars within code blocks as a fast follow
If you're happier to move this work in a follow up PR, then this PR is basically done! Let me know if you're happy with my plan above and if it's better to create a follow up.
Sounds good to me. I think the improvement is worth it while we're here.
[![PR App][icn]][demo] | Fix RM-XYZ
:-------------------:|:----------:

## 🧰 Changes

As a follow up of #1339, we want to resolve variables in inline code & code blocks, and the tokenizer couldn't do that. This PR adds an additional argument to the engine for the project variables, and a transformer to visit code nodes & use regexes to resolve legacy & MDX variables to their value.

## 🧬 QA & Testing

The variables in this should get resolved, test with code blocks

```
`<<name>> {user.name}`

// Remove the \, basically make it a code block
\```
My name is <<name>>
My other name is {user.name}
\```

[block:code]
{
  "codes": [
    {
      "code": "My name is <<name>> and {user.name}",
      "language": "js"
    }
  ]
}
[/block]
```

- [Broken on production][prod].
- [Working in this PR app][demo].

[demo]: https://markdown-pr-PR_NUMBER.herokuapp.com
[prod]: https://SUBDOMAIN.readme.io
[icn]: https://user-images.githubusercontent.com/886627/160426047-1bee9488-305a-4145-bb2b-09d8b757d38a.svg
## Version 13.3.0

### ✨ New & Improved

* **mdxish:** add legacy variable tokenizer ([#1339](#1339)) ([8e8b11b](8e8b11b))
* add option to perserve variable syntax in plain text compiler ([#1345](#1345)) ([5ab350e](5ab350e))
* **mdxish:** resolve variables in code blocks ([#1350](#1350)) ([a6460f8](a6460f8))
* **mdxish:** use variable name for heading slug generation ([#1340](#1340)) ([61a97d3](61a97d3))

<!--SKIP CI-->
This PR was released! Changes included in v13.3.0
🧰 Changes
Adds a micromark tokenizer for legacy <<>> variables so that MDXish can parse them into variable nodes. It follows the pattern we have for magic block parsing. This improves the engine architecture and removes the need for the legacy variable preprocessing we're doing in the frontend in the readme repo.
The tokenizer also supports parsing legacy glossary terms, which it converts to a glossary node (created here), which is then converted to a Glossary component.
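To make the intended output concrete, here is a rough, string-level sketch of turning legacy syntax into text, variable and glossary nodes. It is not the micromark tokenizer from this PR and does not use the micromark API; `InlineNode` and `tokenizeLegacyVariables` are hypothetical names, the node fields (`name`, `term`) are illustrative, and the `<<glossary:term>>` prefix and the handling of a backslash-escaped `\<<var>>` as literal text are assumptions.

```typescript
// Hypothetical node shapes, for illustration only.
type InlineNode =
  | { type: 'text'; value: string }
  | { type: 'variable'; name: string }
  | { type: 'glossary'; term: string };

const GLOSSARY_PREFIX = 'glossary:'; // assumed legacy glossary syntax: <<glossary:term>>

function tokenizeLegacyVariables(input: string): InlineNode[] {
  // Optional leading backslash (escape), then <<body>> on a single line.
  const pattern = /(\\)?<<([^<>\n]+)>>/g;
  const nodes: InlineNode[] = [];
  let buffer = ''; // pending plain text
  let lastIndex = 0;

  const flush = (): void => {
    if (buffer) {
      nodes.push({ type: 'text', value: buffer });
      buffer = '';
    }
  };

  let match: RegExpExecArray | null;
  while ((match = pattern.exec(input)) !== null) {
    const raw = match[0];
    buffer += input.slice(lastIndex, match.index);
    lastIndex = match.index + raw.length;

    if (match[1]) {
      // Escaped: drop the backslash and keep the variable syntax as literal text.
      buffer += raw.slice(1);
      continue;
    }

    flush();
    const body = (match[2] ?? '').trim();
    if (body.startsWith(GLOSSARY_PREFIX)) {
      nodes.push({ type: 'glossary', term: body.slice(GLOSSARY_PREFIX.length).trim() });
    } else {
      nodes.push({ type: 'variable', name: body });
    }
  }

  buffer += input.slice(lastIndex);
  flush();
  return nodes;
}
```

For example, `tokenizeLegacyVariables('Hello <<name>>')` would yield a text node followed by a variable node. A real tokenizer runs inside the Markdown parser, so code spans and code blocks never reach it; this sketch has no such context and would need to be applied to text content only.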
🧬 QA & Testing
opts.mdxishpart