Skip to content

feat(mdxish): add legacy variable tokenizer#1339

Merged
eaglethrost merged 23 commits intonextfrom
dimas/RM-15239-parse-legacy-vars-in-mdxish
Feb 19, 2026
Merged

feat(mdxish): add legacy variable tokenizer#1339
eaglethrost merged 23 commits intonextfrom
dimas/RM-15239-parse-legacy-vars-in-mdxish

Conversation

@eaglethrost
Copy link
Contributor

@eaglethrost eaglethrost commented Feb 13, 2026

PR App Fix RM-15239

🧰 Changes

Adds a legacy variable <<>> micromark tokenizer so that MDXish can parse it to variable nodes. It follows the pattern we have for magic block parsing. This improves the engine architecture and removes the need for the legacy variable preprocessing we're doing in the frontend in the readme repo.

The tokenizer also supports parsing legacy glossary, which it converts to a glossary node (created here), which will then be converted to Glossary component.

🧬 QA & Testing

  1. You can test quicker in the markdown playground. Turn on the mdxish flag and type <> syntax in normal and weird cases. The <> syntax should get resolved to VARIABLE. I'm not sure why it's not resolved to a value, maybe because we haven't added the value context, but if it turns to the capitalised version and strips away the <<>>, the tokenizer has worked (because it wasn't doing that before).
  2. If you want to test in the readme app, make sure to prevent the legacy variable preprocessing to happen on MDXish RDMD here by removing the opts.mdxish part

@eaglethrost eaglethrost marked this pull request as draft February 13, 2026 08:54
@eaglethrost eaglethrost changed the title fear(mdxish): add legacy variable tokenizer feat(mdxish): add legacy variable tokenizer Feb 13, 2026
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some type adjustments I needed to make after adding the Glossary node type introduced some TypeErrors elsewhere

@eaglethrost
Copy link
Contributor Author

I'm not sure why there's failing test, looks like it won't run the build for some reason.. I've verified all tests pass locally though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the tokenizer to the table cell magic block slowed parsing down a bit, had to increase this to make it pass

const EXCLUDED_TAGS = new Set(['HTMLBlock', 'Table', 'Glossary', 'Anchor']);

const inlineMdProcessor = unified().use(remarkParse);
const inlineMdProcessor = unified()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For parsing legacy variables inside MDX components like <Comp>Hello <<name>></Comp>

value: string;
}

interface Glossary extends Node {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a glossary node so we can represent it in mdast

@eaglethrost eaglethrost marked this pull request as ready for review February 13, 2026 15:14
@eaglethrost eaglethrost requested review from kevinports and maximilianfalco and removed request for maximilianfalco February 13, 2026 15:14
Copy link
Contributor

@rafegoldberg rafegoldberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice start here @eaglethrost; thanks for picking this up! Working well in the baseline scenarios, but still seeing some discrepancies with these <<legacy_vars>> when used in a code context, or when trying to escaping said var per the following screenshot/example Markdown:

- **Old**: <<email>>
- **New**: {user.email}
- ***Escaped***
  - **Old**: \<<email>>
  - **New**: \{user.email}
- ***Inline Code***
  - **Old**: `<<email>>`
  - **New**: `{user.email}`
- ***Code Block***
  - **Old**:
    ```
    <<email>>
    ```
  - **New**:
    ```
    {user.email}
    ```

(Also just noticing that it seems like the new {user.var} syntax isn't being escaped properly either, but that's an issue for a different PR…)

@eaglethrost
Copy link
Contributor Author

Nice start here @eaglethrost; thanks for picking this up! Working well in the baseline scenarios, but still seeing some discrepancies with these <<legacy_vars>> when used in a code context, or when trying to escaping said var per the following screenshot/example Markdown:

- **Old**: <<email>>
- **New**: {user.email}
- ***Escaped***
  - **Old**: \<<email>>
  - **New**: \{user.email}
- ***Inline Code***
  - **Old**: `<<email>>`
  - **New**: `{user.email}`
- ***Code Block***
  - **Old**:
    ```
    <<email>>
    ```
  - **New**:
    ```
    {user.email}
    ```

(Also just noticing that it seems like the new {user.var} syntax isn't being escaped properly either, but that's an issue for a different PR…)

Oh I didn’t think we’d want to resolve variables in codes, I thought codes should be untouched. But will look into it πŸ™

commit e657a21
Author: eagletrhost <dimazanugrah12@gmail.com>
Date:   Mon Feb 16 19:51:12 2026 +1100

    refactor: clean code & comment

commit c28853d
Author: eagletrhost <dimazanugrah12@gmail.com>
Date:   Mon Feb 16 19:40:18 2026 +1100

    test: fix legacy

commit 7574b19
Author: eagletrhost <dimazanugrah12@gmail.com>
Date:   Mon Feb 16 18:36:04 2026 +1100

    feat: glossary adjustments

commit 6e92df0
Author: eagletrhost <dimazanugrah12@gmail.com>
Date:   Mon Feb 16 18:13:02 2026 +1100

    feat: parse legacy vars in codes & api header block
@eaglethrost
Copy link
Contributor Author

Update: Now it resolves legacy variables in code blocks, though it ended up being more complicated than expected. I still had to resort to some kind of pre-processing the legacy vars in code blocks because the tokenizer doesn't operate on code nodes. Looking to see if there's cleaner way, but it works now. Also extended to match glossary resolution behaviour in legacy

Another limitation I'm working on is correctly parsing legacy variables with spaces which doesn't work yet now.

Demo:
https://github.com/user-attachments/assets/94339567-a1b8-4485-8c87-d2d300146ad6

@eaglethrost
Copy link
Contributor Author

eaglethrost commented Feb 17, 2026

Another update: So the current state works in tokenising legacy variables to variable nodes IF they're not in code blocks / inline code. I've reverted my preprocessing function to convert legacy vars to MDX vars because neither legacy touches vars in code blocks, and it still won't really work for variables with spaces / special chars. It's also not an option to tokenize code content because it's important to keep the code string intact for the Code component syntax-highlighter to work & not parse it.

Hence, as of now legacy vars in codes NO LONGER get resolved. After doing a bit of digging, I found that in legacy, they get resolved in the CodeMirror syntax-highlighter (see the Code component). So I think the best path forward is to extend the syntax-highlighter package to be mdxish aware, and allow it to also parse legacy variables for mdxish (currently it only parses MDX style user variables in MDXish). This way, we can keep the legacy variable syntax in code blocks and have it still get resolved in rendering, and follows how legacy handles it.

I've made a PR for that here: readmeio/syntax-highlighter#608

Important @kevinports
So to get this tokenizer full parity with legacy, that PR is blocking this. However, it's not really needed to resolve this akamai ticket which relies on this PR, because it doesn't care about code blocks. If we need to get it resolved asap then we can do a fast follow to add back the code block resolution

@kevinports
Copy link
Contributor

kevinports commented Feb 18, 2026

Hence, as of now legacy vars in codes NO LONGER get resolved. After doing a bit of digging, I found that in legacy, they get resolved in the CodeMirror syntax-highlighter (see the Code component). So I think the best path forward is to extend the syntax-highlighter package to be mdxish aware, and allow it to also parse legacy variables for mdxish (currently it only parses MDX style user variables in MDXish). This way, we can keep the legacy variable syntax in code blocks and have it still get resolved in rendering, and follows how legacy handles it.

Ok this makes sense. But I do want to push back a little on whether the syntax highlighter is the best place to do this. If you run this example: https://non-git.readme.io/docs/variables you'll see the variable resolution doesn't happen until after the React app mounts (because I believe the syntax highlighting is client side only). So you see a flash of unresolved syntax from the SSR:

Screen.Cast.2026-02-18.at.12.48.47.PM.mp4

That UX kind of blows right? Is there any way we can do this engine side? or at the very least on SSR?

So to get this tokenizer full parity with legacy, that PR is blocking this. However, it's not really needed to resolve this akamai ticket which relies on this PR, because it doesn't care about code blocks. If we need to get it resolved asap then we can do a fast follow to add back the code block resolution

I am fine with moving the work to resolve vars within code blocks as a fast follow.

@eaglethrost
Copy link
Contributor Author

eaglethrost commented Feb 18, 2026

That UX kind of blows right? Is there any way we can do this engine side? or at the very least on SSR?

Yeah we definitely can, we just need to pass in the variables list to the engine for resolution. It will be straightforward to do and we just need to extend the function arguments to accept the variables

So to summarize, we want to resolve legacy AND mdx variables in codes on the engine side? To do this I think we can just add transformer that visit code nodes and use regex to resolve the variables / extend the variable transformer we have now. Do note though that I think that way of resolving vars in code is different from legacy & mdx, but I guess it would be an improvement.

I am fine with moving the work to resolve vars within code blocks as a fast follow

If you're happier to move this work in a follow up PR, then this PR is basically done! Let me know if you're happy with my plan above and if it's better to create a follow up. (@kevinports)

Copy link
Contributor

@kevinports kevinports left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm.

That UX kind of blows right? Is there any way we can do this engine side? or at the very least on SSR?

Yeah we definitely can, we just need to pass in the variables list to the engine for resolution. It will be straightforward to do and we just need to extend the function arguments to accept them the variables

So to summarize, we want to resolve legacy AND mdx variables in codes on the engine side? To do this I think we can just add a transformer that visit code nodes and use regex to resolve the variables. Do note though that I think way of resolving vars in code is different from legacy & mdx, but I guess it would be an improvement.

I am fine with moving the work to resolve vars within code blocks as a fast follow

If you're happier to move this work in a follow up PR, then this PR is basically done! Let me know if you're happy with my plan above and if it's better to create a follow up.

Sounds good to me. I think the improvement is worth it while we're here.

@eaglethrost eaglethrost merged commit 8e8b11b into next Feb 19, 2026
21 of 24 checks passed
@eaglethrost eaglethrost deleted the dimas/RM-15239-parse-legacy-vars-in-mdxish branch February 19, 2026 12:43
kevinports pushed a commit that referenced this pull request Feb 20, 2026
[![PR App][icn]][demo] | Fix RM-XYZ
:-------------------:|:----------:

## 🧰 Changes

As a follow up of #1339, we want to resolve variables in inline code &
code blocks, and the tokenizer couldn't do that. This PR adds an
additional argument to the engine for the project variables, and a
transformer to visit code nodes & use regexes to resolve legacy & MDX
variables to their value.

## 🧬 QA & Testing

The variables in this should get resolved, test with code blocks
```
`<<name>> {user.name}`

// Remove the \, basically make it a code block
\```
My name is <<name>>

My other name is {user.name}
\```

[block:code]
{
  "codes": [
    {
      "code": "My name is <<name>> and {user.name}",
      "language": "js"
    }
  ]
}
[/block]
```

- [Broken on production][prod].
- [Working in this PR app][demo].


[demo]: https://markdown-pr-PR_NUMBER.herokuapp.com
[prod]: https://SUBDOMAIN.readme.io
[icn]:
https://user-images.githubusercontent.com/886627/160426047-1bee9488-305a-4145-bb2b-09d8b757d38a.svg
rafegoldberg pushed a commit that referenced this pull request Feb 20, 2026
## Version 13.3.0
### ✨ New & Improved

* **mdxish:** add legacy variable tokenizer ([#1339](#1339)) ([8e8b11b](8e8b11b))
* add option to perserve variable syntax in plain text compiler ([#1345](#1345)) ([5ab350e](5ab350e))
* **mdxish:** resolve variables in code blocks ([#1350](#1350)) ([a6460f8](a6460f8))
* **mdxish:** use variable name for heading slug generation ([#1340](#1340)) ([61a97d3](61a97d3))

<!--SKIP CI-->
@rafegoldberg
Copy link
Contributor

This PR was released!

πŸš€ Changes included in v13.3.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants