Write a language's grammar once, as an executable definition. Monogram runs it as a real parser, proves it against the language's official conformance suite, then derives the syntax highlighters — TextMate, tree-sitter, Monarch — from that same proven grammar. Highlighting correctness flows down from a parser-verified model instead of up from hand-tuned regex.
mono + grammar — one grammar definition, many derived artifacts.
Status: an active research project. Four languages run on one shared, language-agnostic engine, each proven as a parser before its highlighter is trusted:

- TypeScript (`typescript.ts`): mature; 100% valid-code coverage, 97.8% bidirectional vs `tsc`.
- JavaScript (`javascript.ts`): the standalone ECMAScript base TypeScript builds on (subset → superset); parses real-world JS, with less conformance-corpus depth than TS so far.
- HTML (`html.ts`): the engine reaching past token streams into markup; ~95 lines, validated against `parse5`.
- Vue (`vue.ts`): a dialect of `html.ts`, adding SFC blocks that embed Monogram's own TS/JS/CSS, plus directives and `{{ }}` interpolation.

Requires Node 24+ (runs `.ts` directly: no build step, no `tsx`).

```sh
npm install
node src/cli.ts typescript.ts   # regenerate every artifact from the grammar
```

```ts
import { createParser } from './src/gen-parser.ts';
import grammar from './typescript.ts';

const { parse } = createParser(grammar);
const cst = parse('const x = f(a, b)'); // → a concrete syntax tree
```

A TextMate grammar is a pile of regexes guessing at a language's structure. It's written by hand, independently of any parser, and perpetually wrong at the edges; VS Code's official TypeScript grammar carries 100+ open issues for exactly this reason. Everyone trying to fix it competes on the same losing axis: who can hand-write better regexes.
Take `typeof x < y`. A regex highlighter has to guess whether `<` opens a generic argument list or is a less-than comparison, and it guesses wrong somewhere, forever. A parser doesn't guess; the grammar already decides. Monogram inverts the dependency:
- Write the grammar, then prove it. The grammar is executable. Monogram runs it as a recursive-descent + Pratt (operator-precedence) parser over the TypeScript conformance suite, measured bidirectionally: it must accept every input `tsc` accepts and reject every input `tsc` rejects.
- Derive the highlighters from that proven grammar; never hand-write them. The TextMate, tree-sitter, and Monarch outputs are all generated from the one parser-validated definition, so their correctness is underwritten by the conformance run, not by regex tuning.

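Measuring "bidirectionally" can be pictured as two rates plus one overall agreement score. The sketch below is hypothetical: `ConformanceCase`, `measure`, and the toy acceptor are illustrative names, not Monogram's harness, and it assumes the reference verdict for each input (for example from `tsc`) is already known.

```typescript
// A conformance input and the reference compiler's verdict on it.
interface ConformanceCase {
  source: string;
  referenceAccepts: boolean;
}

interface ConformanceReport {
  validCoverage: number;    // share of reference-accepted inputs we also accept
  invalidRejection: number; // share of reference-rejected inputs we also reject
  bidirectional: number;    // share of all inputs where our verdict matches
}

// `accepts` stands in for "the grammar's parser parses this without error".
function measure(cases: ConformanceCase[], accepts: (src: string) => boolean): ConformanceReport {
  let valid = 0, validOk = 0, invalid = 0, invalidOk = 0;
  for (const c of cases) {
    const ours = accepts(c.source);
    if (c.referenceAccepts) {
      valid++;
      if (ours) validOk++;
    } else {
      invalid++;
      if (!ours) invalidOk++;
    }
  }
  const ratio = (n: number, d: number) => (d === 0 ? 1 : n / d);
  return {
    validCoverage: ratio(validOk, valid),
    invalidRejection: ratio(invalidOk, invalid),
    bidirectional: ratio(validOk + invalidOk, cases.length),
  };
}

// Toy acceptor: only checks that '(' and ')' counts match, so it wrongly accepts ')a('.
const count = (s: string, ch: string) => s.split(ch).length - 1;
const toyAccepts = (src: string) => count(src, '(') === count(src, ')');

const report = measure(
  [
    { source: '(a)', referenceAccepts: true },
    { source: 'a', referenceAccepts: true },
    { source: '((a)', referenceAccepts: false },
    { source: ')a(', referenceAccepts: false },
  ],
  toyAccepts,
);
// report: validCoverage 1, invalidRejection 0.5, bidirectional 0.75
```

An acceptor that says yes to everything would score 100% valid-code coverage while failing the reject half entirely, which is why a one-directional number alone says little.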
That single source reaches across grammars, too: an embedded snippet runs another Monogram grammar. A `<script>` body is highlighted by Monogram's own JavaScript grammar, so `<script>const x = 1 < 2</script>` colors `<` as a JS operator, resolving the same ambiguity inside the embed. VS Code's embeds fray where two independently written grammars meet with nothing checking the seam; Monogram owns both sides, so self-verifying that seam becomes possible (a design goal beyond today's standard `contentName` injection).
The same question, asked of every language at once: take the bugs reported against each hand-written official grammar and ask whether the derived grammar solves them. Which bugs does only the official grammar solve, which does only Monogram solve, and which do both still get wrong (the shared frontier neither reaches today)?
Each hand-written official grammar vs Monogram's derived one, on the bugs filed against it: TypeScript 25/27 (official 9/27) · TSX 10/11 (official 6/11) · HTML 19/20 (official 13/20) · Vue 19/19 (official 15/19). Per-issue detail below, auto-generated by `npm run bench:issues`.

| TypeScript issue | Monogram | official |
|---|---|---|
| #1050: `typeof y < string` is a relational operator, not generic | ✓ | · |
| #978: `typeof x < string` then `function` | ✓ | · |
| #859: `as` cast inside `< >` comparison | ✓ | · |
| #1020: `new Map<number, number>;` (no parens) | ✓ | · |
| #855: `new Map</* comment */string, IArgs>()` | ✓ | · |
| #853: `throw /foo/` is a regex | ✓ | · |
| #804: `/[a-b]/g` character class recognized | ✓ | · |
| #869: `x in obj ? x : fallback` ternary works | ✓ | · |
| #770: function-call parens are punctuation | ✓ | · |
| #1021: regex with the `v` (unicode-sets) flag is recognized | ✓ | · |
| #1025: for-of without surrounding space keeps `of` a loop keyword | ✓ | · |
| #815: a class method named `new` is a method name, not the operator | ✓ | · |
| #992: casting to a type named `type` does not break highlighting | ✓ | · |
| #995: paren-wrapped `as keyof typeof` assertion tokenizes | ✓ | · |
| #891: `from` as an ordinary variable is not a keyword | ✓ | · |
| #814: `a instanceof B & c` keeps the operand a value, not a type | ✓ | · |
| #950: default import named `type`; the binding is a variable, not the `type` keyword | · | · |
| #1058: `import defer` should scope `defer` as a keyword | · | · |

… and 9 more that both grammars already handle (✓ / ✓)

| TypeScript issue | Monogram | official |
|---|---|---|
| #1063: `/\cJ/` control-char escape | ✓ | ✓ |
| #736: `obj.example()` method gets `entity.name.function` | ✓ | ✓ |
| #788: optional chaining `?.` is the optional accessor | ✓ | ✓ |
| #881: `override` modifier on a method is `storage.modifier` | ✓ | ✓ |
| #1066: triple-slash reference directive is a comment | ✓ | ✓ |
| #994: default type-parameter value is colored | ✓ | ✓ |
| #1027: nested generic `>>` closes two type-arg lists, not a shift | ✓ | ✓ |
| #956: `as const satisfies Foo` colors the `satisfies` keyword and the type | ✓ | ✓ |
| #907: `typeof x extends string ? 1 : 2` conditional-type ternary | ✓ | ✓ |

| TSX issue | Monogram | official |
|---|---|---|
| #967: generic arrow with a default type in `.tsx` | ✓ | · |
| #979: `const` modifier on a type parameter in `.tsx` | ✓ | · |
| #1042/#990: default generic arrow function in `.tsx` | ✓ | · |
| #627: member-expression JSX tag name | ✓ | · |
| #825: `<` and tag name on separate lines | · | · |

… and 6 more that both grammars already handle (✓ / ✓)

| TSX issue | Monogram | official |
|---|---|---|
| #1033: JSX component with a generic type argument | ✓ | ✓ |
| #794: non-null `!` then `/` (division) in a JSX-attribute object | ✓ | ✓ |
| #585: `//` line comment inside a JSX open tag | ✓ | ✓ |
| #754: JSX element right after a `/**/` block comment | ✓ | ✓ |
| #667: arrow function + ternary inside a JSX attribute | ✓ | ✓ |
| #624: JSX element in an array after a template-literal attribute | ✓ | ✓ |

| HTML issue | Monogram | official |
|---|---|---|
| tmbundle#118: trailing `/` in an unquoted URL value | ✓ | · |
| tmbundle#108: nested `<svg>` is a valid tag, not flagged invalid | ✓ | · |
| tmbundle#113: `//` in an `onclick=` JS string read as a comment | ✓ | · |
| tmbundle#104: mixed-case `onChange=` event handler still reads as JS | ✓ | · |
| tmbundle#88: inline `style=` value embeds CSS | ✓ | · |
| tmbundle#65: `<` of `</script>` is HTML punctuation, not `source.js` | ✓ | · |
| tmbundle#74: `<` of `</style>` is HTML punctuation, not `source.css` | ✓ | · |
| tmbundle#85: `//</script>` on its own line still closes the script | · | ✓ |

… and 12 more that both grammars already handle (✓ / ✓)

| HTML issue | Monogram | official |
|---|---|---|
| tmbundle#124: slash in unquoted value `foo/` | ✓ | ✓ |
| vscode#140360: `/` inside an unquoted value (path) | ✓ | ✓ |
| tmbundle#84: tag name a prefix of a sibling (`<i>`/`<input>`) | ✓ | ✓ |
| tmbundle#117: SVG camelCase tag name | ✓ | ✓ |
| tmbundle#122: `<` inside a quoted attr value | ✓ | ✓ |
| tmbundle#115: `>` inside a quoted attr value | ✓ | ✓ |
| tmbundle#97: space before `>` in an end tag | ✓ | ✓ |
| tmbundle#81: character entity `&` in text | ✓ | ✓ |
| tmbundle#102: `<style>` element CSS is tokenized, not a flat blob | ✓ | ✓ |
| tmbundle#50: `onclick=` event-handler value is colored as JS | ✓ | ✓ |
| tmbundle#51: self-closing `/` is tag punctuation | ✓ | ✓ |
| tmbundle#82: `<script type="application/json">` body is not parsed as HTML | ✓ | ✓ |

| Vue issue | Monogram | official |
|---|---|---|
| #6007/#2096/#520: `as` type assertion in directive value | ✓ | · |
| #5660: `as const` cast in a `v-for` value | ✓ | · |
| #4716/#5571: `as` cast followed by another attribute | ✓ | · |
| #4291: `<script lang="tsx">` body is embedded code | ✓ | · |

… and 15 more that both grammars already handle (✓ / ✓)

| Vue issue | Monogram | official |
|---|---|---|
| #3400: `instanceof` in `{{ }}` | ✓ | ✓ |
| #5370: `typeof x !==` in `v-if` | ✓ | ✓ |
| #5118: `?.` / `??` in `{{ }}` | ✓ | ✓ |
| #1675: arrow `=>` in `{{ }}` | ✓ | ✓ |
| #6039/#4741: `<` operator in `{{ }}` (not a tag!) | ✓ | ✓ |
| #5722: negated ternary + quotes in `{{ }}` | ✓ | ✓ |
| #5538/#2060: trailing `export type` before `</script>` | ✓ | ✓ |
| #3999: multi-line `<script>` start tag doesn't break the code after it | ✓ | ✓ |
| #4769: tag name starting with `template` | ✓ | ✓ |
| #5701: `{{` inside a `<script>` string | ✓ | ✓ |
| #6070: capitalized component then a `<style>` block | ✓ | ✓ |
| #4410: dynamic directive argument `:[attr]` | ✓ | ✓ |
| #3727: `.prop` modifier shorthand | ✓ | ✓ |
| #2666: dynamic slot name from a template literal | ✓ | ✓ |
| #2560/#1290: `type` as a `v-for` loop variable | ✓ | ✓ |

A sampled ledger of real tracker issues, not an exhaustive audit. Run `npm run bench:issues` to regenerate it (this needs the official grammars: VS Code's installed TS/JS/HTML grammars and the Vue fixtures; see `test/vue-bench.ts`). Sources: `test/issue-cases.ts`, `test/html-issue-cases.ts`, `test/vue-issue-cases.ts`.
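The four possible outcomes per issue reduce to a simple bucketing of two booleans. This is a hypothetical sketch: the row shape and `summarize` are illustrative and not the format of `test/issue-cases.ts`; the three sample rows reuse statuses from the HTML tables above.

```typescript
// One ledger row: whether each grammar highlights the reported case correctly.
interface IssueResult {
  id: string;
  monogram: boolean;
  official: boolean;
}

type Bucket = 'onlyMonogram' | 'onlyOfficial' | 'both' | 'neither';

function bucketOf(r: IssueResult): Bucket {
  if (r.monogram && r.official) return 'both';
  if (r.monogram) return 'onlyMonogram';
  if (r.official) return 'onlyOfficial';
  return 'neither'; // the shared frontier
}

function summarize(results: IssueResult[]) {
  const buckets: Record<Bucket, string[]> = { onlyMonogram: [], onlyOfficial: [], both: [], neither: [] };
  for (const r of results) buckets[bucketOf(r)].push(r.id);
  const n = results.length;
  const mono = results.filter((r) => r.monogram).length;
  const off = results.filter((r) => r.official).length;
  return { buckets, score: `${mono}/${n} (official ${off}/${n})` };
}

const html = summarize([
  { id: 'tmbundle#118', monogram: true, official: false },
  { id: 'tmbundle#85', monogram: false, official: true },
  { id: 'tmbundle#124', monogram: true, official: true },
]);
// html.score is '2/3 (official 2/3)'; html.buckets.onlyOfficial is ['tmbundle#85']
```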
Deriving from a proven parser wins the disambiguations that TextMate can express but that are infeasible to hand-write (regex vs. division, generic vs. comparison, whitespace-fragile multiline generics): the only-Monogram column. The both-miss cases are ones neither grammar gets right today, which does not by default make them cases TextMate can't express.
"TextMate can't express X" is not a guess or an assertion; it is a claim to be proven from the model. TextMate is a line-oriented matcher whose only cross-line memory is a finite stack of scope contexts, so a proof must exhibit an X whose correct highlighting provably needs memory that model lacks: unbounded lookback to a token that is not an enclosing context. A failed attempt to derive a pattern is not such a proof, because a cleverer pattern may exist. Most "impossible for TextMate" folklore makes exactly this error: the multiline and nested-generic cases turn out to be TM-expressible once a parser supplies the pattern, which is why the derived grammar gets them right. Where a construct provably exceeds the model, Monogram's tree-sitter target (a real parser over the whole tree) resolves it.
From one grammar definition (a small TypeScript combinator API), five outputs are fully functional:
- A lexer: tokenizes source straight from the grammar's token definitions; usable on its own (`createLexer(grammar).tokenize`).
- A CST parser: recursive descent + Pratt precedence on top of the lexer, producing a CST (concrete syntax tree) in which every token is a node, including punctuation and keywords. That is roughly 2× an AST's nodes, by design, which is exactly what the highlighter and lossless source reconstruction need.
- A TextMate grammar: a `.tmLanguage.json` for VS Code / Sublime syntax highlighting, derived from the same rules, including derived JSDoc-body and regex-internal sub-grammars. (TextMate scopes are the dot-separated labels, such as `entity.name.function` and `keyword.control`, that a theme maps to colors.)
- A VS Code language configuration: `language-configuration.json` (comments, bracket pairs, auto-close/surround, folding) derived from the same tokens.
- CST node types: a TypeScript discriminated union (keyed by rule) for typed tree consumers.

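Keeping every token as a leaf is what makes lossless reconstruction trivial. In this hypothetical sketch (the node shapes and helpers are illustrative, not the generated `typescript.cst-types.ts`), each token also carries its leading trivia, so printing the leaves in order reproduces the source exactly, and rule nodes form a discriminated union keyed by rule name.

```typescript
// A leaf: one token plus the whitespace/comments (trivia) that precede it.
interface TokenNode {
  kind: 'token';
  type: string; // token name, e.g. 'Ident' or '('
  text: string;
  leading: string;
}

// Rule nodes: a discriminated union keyed by the rule that built them.
type RuleNode =
  | { kind: 'rule'; rule: 'Call'; children: CstNode[] }
  | { kind: 'rule'; rule: 'Member'; children: CstNode[] };

type CstNode = TokenNode | RuleNode;

const tok = (type: string, text: string, leading = ''): TokenNode => ({ kind: 'token', type, text, leading });

// Lossless printing: concatenate every leaf's trivia and text in order.
function print(node: CstNode): string {
  return node.kind === 'token' ? node.leading + node.text : node.children.map(print).join('');
}

// A typed consumer switches on `rule`; with strict checks, a missing case is a compile error.
function labelOf(node: RuleNode): string {
  switch (node.rule) {
    case 'Call':
      return `call of ${print(node.children[0]).trim()}`;
    case 'Member':
      return `member ${print(node.children[node.children.length - 1]).trim()}`;
  }
}

// `foo( a, b )`: the parens and the comma are nodes too; an AST would typically drop them.
const call: RuleNode = {
  kind: 'rule',
  rule: 'Call',
  children: [
    tok('Ident', 'foo'),
    tok('(', '('),
    tok('Ident', 'a', ' '),
    tok(',', ','),
    tok('Ident', 'b', ' '),
    tok(')', ')', ' '),
  ],
};
// print(call) reproduces 'foo( a, b )' byte for byte
```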
And, from the same grammar, generators for the rest of the ecosystem, at varying maturity:

- tree-sitter: `grammar.js` + a structural `queries/highlights.scm` + an external scanner for context-sensitive lexing. tree-sitter's GLR absorbs the grammar and compiles to wasm; the derived query scores 95.9% token-family accuracy against a neutral `tsc` oracle (above the official tree-sitter grammar's 92.7%) and is CI-gated by `npm run gate:treesitter`.
- Monarch: a Monaco (web) tokenizer (functional, bounded by JS-regex limits).

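One plausible reading of "token-family accuracy": collapse each token's full scope into a coarse family, then count how often the highlighter's family agrees with the oracle's. The family table, the prefix matching, and the assumption that both token streams are aligned one-to-one are all hypothetical here; the project's actual metric may be defined differently.

```typescript
// Hypothetical family table: the first matching scope prefix decides the family.
const FAMILIES: Array<[prefix: string, family: string]> = [
  ['comment', 'comment'],
  ['string', 'string'],
  ['constant.numeric', 'number'],
  ['keyword', 'keyword'],
  ['storage', 'keyword'],
  ['entity.name.function', 'function'],
  ['entity.name.type', 'type'],
  ['variable', 'variable'],
];

function familyOf(scope: string): string {
  for (const [prefix, family] of FAMILIES) {
    if (scope === prefix || scope.startsWith(prefix + '.')) return family;
  }
  return 'other';
}

// Assumes both sides tokenized the same source into the same tokens, in order.
function familyAccuracy(predicted: string[], oracle: string[]): number {
  if (predicted.length !== oracle.length) throw new Error('token streams are not aligned');
  if (oracle.length === 0) return 1;
  let hits = 0;
  for (let i = 0; i < oracle.length; i++) {
    if (familyOf(predicted[i]) === familyOf(oracle[i])) hits++;
  }
  return hits / oracle.length;
}

const accuracy = familyAccuracy(
  ['storage.type.ts', 'variable.other.constant', 'keyword.operator.assignment', 'constant.numeric.decimal'],
  ['keyword.control', 'variable.other.readwrite', 'keyword.operator', 'string.quoted'],
);
// Three of four families agree (the last is number vs string), so accuracy is 0.75
```

Comparing families rather than exact scopes keeps the score from punishing harmless naming differences between grammars while still catching real misreads such as a number colored as a string.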
A grammar is a TypeScript module: tokens, operator precedence, and rules built from small combinators. A self-contained mini-example:

```ts
import { token, rule, defineGrammar, left, op, sep } from './src/api.ts';

const Ident = token(/[a-zA-Z_$][a-zA-Z0-9_$]*/, { identifier: true });
const Number = token(/[0-9]+(\.[0-9]+)?/);

const Expr = rule($ => [
  Ident,
  Number,
  [$, op, $],                     // binary operators (precedence declared below)
  [$, '(', sep(Expr, ','), ')'],  // call: foo(a, b)
  [$, '.', Ident],                // member: obj.name
]);

export default defineGrammar({
  name: 'mini',
  tokens: { Ident, Number },
  prec: [ left('+', '-'), left('*', '/') ],
  rules: { Expr },
  entry: Expr,
});
```

The parser uses these rules to build a CST. The highlighter reads the same rule shapes and infers most scopes structurally, with no per-rule annotation:

- `foo(x)` → `foo` is `entity.name.function` (from the `$ '(' …` call form)
- `obj.name` → `name` is `entity.other.property` (from the `$ '.' Ident` form)
- `'class' Ident` → `Ident` is `entity.name.type` (from declaration structure)
- `Expr '<' Type '>' '('` → a generic call, not a comparison (from rule structure)

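The `prec` list in the mini grammar is the kind of table a Pratt loop consumes. Below is a minimal, hypothetical Pratt parser over pre-split tokens; it assumes (commonly, but not confirmed here) that earlier `prec` entries bind more loosely, and it builds a plain binary tree rather than a CST. It is not `src/gen-parser.ts`.

```typescript
type Assoc = 'left' | 'right';
interface Level { assoc: Assoc; ops: string[] }

// Mirrors `prec: [left('+', '-'), left('*', '/')]`, assuming lowest precedence first.
const levels: Level[] = [
  { assoc: 'left', ops: ['+', '-'] },
  { assoc: 'left', ops: ['*', '/'] },
];

// Binding power per operator: later levels bind tighter.
const power = new Map<string, { bp: number; assoc: Assoc }>();
levels.forEach((level, i) => {
  for (const op of level.ops) power.set(op, { bp: (i + 1) * 10, assoc: level.assoc });
});

type ExprNode = string | { op: string; left: ExprNode; right: ExprNode };

function parseExpr(tokens: string[]): ExprNode {
  let pos = 0;

  function primary(): ExprNode {
    const t = tokens[pos++];
    if (t === undefined || t === ')' || power.has(t)) throw new Error(`unexpected ${t ?? 'end of input'}`);
    if (t !== '(') return t; // an identifier or number
    const inner = expr(0);
    if (tokens[pos++] !== ')') throw new Error('expected )');
    return inner;
  }

  // Pratt loop: keep absorbing operators that bind tighter than `minBp`.
  function expr(minBp: number): ExprNode {
    let left = primary();
    for (;;) {
      const op = tokens[pos];
      if (op === undefined) break;
      const info = power.get(op);
      if (!info || info.bp <= minBp) break;
      pos++;
      // Left-assoc: the right operand only absorbs strictly tighter operators; right-assoc allows equal.
      const right = expr(info.assoc === 'left' ? info.bp : info.bp - 1);
      left = { op, left, right };
    }
    return left;
  }

  const tree = expr(0);
  if (pos !== tokens.length) throw new Error(`unexpected ${tokens[pos]}`);
  return tree;
}

const show = (n: ExprNode): string => (typeof n === 'string' ? n : `(${n.op} ${show(n.left)} ${show(n.right)})`);
// show(parseExpr(['a', '+', 'b', '*', 'c'])) gives '(+ a (* b c))'
```

Declaring precedence as data keeps the rule itself flat (`[$, op, $]`): the loop, not a tower of per-level rules, decides how `a - b - c` and `a + b * c` group.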
Flat, irreducible facts — which keywords are control flow, which punctuation is an operator — are declared once in a small scopes map (≈50 lines for TypeScript) rather than inferred. Structure is derived; vocabulary is declared.
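The split between derived structure and declared vocabulary can be sketched in a few lines. This is hypothetical: the `form` tags, the `vocabulary` map, and `inferScopes` are illustrative, not Monogram's generator. A token's position inside a call or member form decides `entity.name.function` or `entity.other.property`; only flat facts come from the declared map.

```typescript
// Hypothetical CST: rule nodes are tagged with the rule form that produced them.
type Cst =
  | { kind: 'token'; type: string; text: string }
  | { kind: 'rule'; form: 'call' | 'member' | 'other'; children: Cst[] };

// Declared vocabulary: flat facts that structure cannot reveal.
const vocabulary = new Map<string, string>([
  ['if', 'keyword.control'],
  ['return', 'keyword.control'],
  ['+', 'keyword.operator'],
  ['.', 'punctuation.accessor'],
]);

interface Scoped { text: string; scope: string }

// Structure first: a child's position inside its parent's form can fix its scope.
function structuralScope(parent: Extract<Cst, { kind: 'rule' }>, index: number): string | undefined {
  if (parent.form === 'call' && index === 0) return 'entity.name.function';   // foo in foo(x)
  if (parent.form === 'member' && index === 2) return 'entity.other.property'; // name in obj.name
  return undefined;
}

function inferScopes(node: Cst, out: Scoped[] = []): Scoped[] {
  if (node.kind === 'token') {
    // No structural hint: fall back to the vocabulary, then a default.
    out.push({ text: node.text, scope: vocabulary.get(node.text) ?? (node.type === 'Ident' ? 'variable.other' : 'source') });
    return out;
  }
  for (let i = 0; i < node.children.length; i++) {
    const child = node.children[i];
    const hint = child.kind === 'token' ? structuralScope(node, i) : undefined;
    if (child.kind === 'token' && hint) out.push({ text: child.text, scope: hint });
    else inferScopes(child, out);
  }
  return out;
}

// foo(obj.name)
const tree: Cst = {
  kind: 'rule', form: 'call', children: [
    { kind: 'token', type: 'Ident', text: 'foo' },
    { kind: 'token', type: '(', text: '(' },
    { kind: 'rule', form: 'member', children: [
      { kind: 'token', type: 'Ident', text: 'obj' },
      { kind: 'token', type: '.', text: '.' },
      { kind: 'token', type: 'Ident', text: 'name' },
    ] },
    { kind: 'token', type: ')', text: ')' },
  ],
};
const scopes = inferScopes(tree);
```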
Nothing in the engine knows about TypeScript. Everything language-specific lives in the grammar (keywords, which token is the identifier, template-literal delimiters, the regex-vs-division lexer ambiguity), all declared per token:

```ts
import { token } from './src/api.ts';

// `…` marks parts of the patterns elided in this excerpt.
const Template = token(/`…`/, { template: { open: '`', interpOpen: '${', interpClose: '}' } });
const Regex = token(/\/…\//, {
  regex: true,
  regexContext: {
    divisionAfterTypes: ['Ident', 'Number', 'String', 'Template'],
    divisionAfterTexts: [')', ']', 'this', 'true', /* … */],
    regexAfterTexts: ['return', 'typeof', 'instanceof', /* … */],
  },
});
```

`test/agnostic.ts` proves it directly: the same engine parses a toy grammar whose identifier token is `Word`, with no templates or regex. The deeper proof is `html.ts`: markup shares nothing with TypeScript's token stream, yet the same engine handles it (and Vue layers SFC blocks + `{{ }}` interpolation on top).
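The `regexContext` fields read naturally as a decision on the previous significant token. The sketch below is a hypothetical interpretation of those three lists (in particular, the order in which they are consulted is an assumption, not the engine's documented rule) and uses only the values shown in the excerpt. One-token lookbehind is a known simplification: in `if (x) /re/.test(s)` a regex follows `)`.

```typescript
// Same shape as the `regexContext` option above.
interface RegexContext {
  divisionAfterTypes: string[];
  divisionAfterTexts: string[];
  regexAfterTexts: string[];
}

interface PrevToken { type: string; text: string }

// Decide whether a '/' right after `prev` opens a regex literal.
function slashStartsRegex(prev: PrevToken | undefined, ctx: RegexContext): boolean {
  if (prev === undefined) return true;                           // start of input
  // Checked before the type list: keywords like `return` are often lexed as Ident.
  if (ctx.regexAfterTexts.includes(prev.text)) return true;     // return /x/
  if (ctx.divisionAfterTexts.includes(prev.text)) return false; // f() / 2
  if (ctx.divisionAfterTypes.includes(prev.type)) return false; // a / b
  return true;                                                   // after an operator or '(': a regex
}

const ctx: RegexContext = {
  divisionAfterTypes: ['Ident', 'Number', 'String', 'Template'],
  divisionAfterTexts: [')', ']', 'this', 'true'],
  regexAfterTexts: ['return', 'typeof', 'instanceof'],
};
```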
A new language is one grammar file on the unchanged engine:
1. Write the grammar with the combinator API (`src/api.ts`): tokens, operator precedence, rules. Everything language-specific lives here.
2. Prove it as a parser against the language's own official test suite, measured bidirectionally (accept what the reference accepts, reject what it rejects).
3. Drop in the official TextMate grammar as the baseline, so highlighter coverage is measured against what you're replacing, not asserted.

The lexer, CST types, and all three highlighters fall out of step 1; a dialect (`.tsx`/`.jsx` via `jsx.ts`, or Vue on `html.ts`) reuses a base grammar's rules by name in a few lines. The conformance and highlighter harnesses are currently TypeScript-specific (they call `tsc` and read VS Code's grammar), so point them at your own reference compiler.
A handful of token patterns are scoped differently from VS Code's official TypeScript grammar. All are intentional, and in some cases Monogram is arguably more correct (these are deliberate divergences, distinct from the bug-class fixes the ledger measures):

| Token | Monogram | Official | Why we keep ours |
|---|---|---|---|
| `console` in `console.log` | `support.variable` | `variable.other.object` | We highlight built-in globals (`console`, `window`, …) distinctly, a deliberate, common choice. |
| `transform` (a function parameter) | `variable.parameter` | `entity.name.function` | It is a parameter. Official's heuristic misreads `name: (…) => T` as a function definition; we're more correct. |
| `error` (the method in `console.error(…)`) | `entity.name.function` | `variable.other.readwrite` | We scope a called method as a function name, which is arguably more informative. |

Built-in class names in type position (e.g. `Error` in `extends Error`) correctly emit `entity.name.type`, matching official; in value position (`new Error()`) they remain `support.class`, also matching official.
Matching the official grammar exactly would, in cases like transform, make the output worse. The metric counts these as differences, not defects.
```
typescript.ts one grammar (TypeScript combinator API)
│
├─ src/gen-lexer.ts ───────▶ lexer → tokens (standalone: createLexer)
│ ▲ composed by
├─ src/gen-parser.ts ───────▶ CST parser (recursive descent + Pratt + packrat memoization;
│ run against the conformance suite = the grammar's proof)
│
├─ src/gen-tm.ts ───────────▶ typescript.tmLanguage.json (TextMate highlighter)
├─ src/gen-vscode-config.ts ▶ typescript.language-configuration.json (editor behavior)
├─ src/gen-treesitter.ts ───▶ tree-sitter/ (grammar.js + highlights.scm + scanner.c)
├─ src/gen-monarch.ts ──────▶ typescript.monarch.json
└─ src/gen-ast-types.ts ────▶ typescript.cst-types.ts
shared src/grammar-utils.ts structural helpers used across stages
src/api.ts, types.ts the grammar's combinator + type surface
```
Every highlighter target is produced by the same structural scope inference, retargeted per format. The lexer, parser, and generators are generic runtimes; all language specifics live in the grammar.
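"Generic runtime" means the lexer holds no language knowledge of its own; it runs whatever token table the grammar supplies. Here is a minimal, hypothetical table-driven lexer (not Monogram's `createLexer`; `createTableLexer` and `TokenDef` are illustrative names) using sticky regexes and longest match, with earlier definitions winning ties.

```typescript
// A token definition: name, pattern, and whether to drop it (e.g. whitespace).
interface TokenDef { name: string; pattern: RegExp; skip?: boolean }
interface Tok { type: string; text: string; start: number }

function createTableLexer(defs: TokenDef[]) {
  // Sticky ('y') copies anchor every attempt at the current offset.
  const compiled = defs.map((d) => ({ ...d, re: new RegExp(d.pattern.source, 'y') }));
  return function tokenize(src: string): Tok[] {
    const out: Tok[] = [];
    let pos = 0;
    while (pos < src.length) {
      let best: { def: (typeof compiled)[number]; text: string } | undefined;
      for (const def of compiled) {
        def.re.lastIndex = pos;
        const m = def.re.exec(src);
        // Longest match wins; on equal length the earlier definition wins.
        if (m && m[0].length > 0 && (!best || m[0].length > best.text.length)) best = { def, text: m[0] };
      }
      if (!best) throw new Error(`unexpected character ${JSON.stringify(src[pos])} at ${pos}`);
      if (!best.def.skip) out.push({ type: best.def.name, text: best.text, start: pos });
      pos += best.text.length;
    }
    return out;
  };
}

// All language knowledge lives in this table; the runtime above never changes.
const tokenize = createTableLexer([
  { name: 'WS', pattern: /\s+/, skip: true },
  { name: 'Keyword', pattern: /(?:const|let)\b/ },
  { name: 'Ident', pattern: /[A-Za-z_$][\w$]*/ },
  { name: 'Number', pattern: /[0-9]+/ },
  { name: 'Punct', pattern: /[=;(),]/ },
]);
const toks = tokenize('const x = 42;');
```

Listing `Keyword` before `Ident` lets `const` win the tie, while `constant` still lexes as an identifier because the keyword pattern requires a word boundary.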
| Tool | Parser | Highlighting | Single source |
|---|---|---|---|
| TextMate grammars | — | manual regex | — |
| tree-sitter | yes | queries (written separately) | — |
| ANTLR | yes | — | — |
| Langium | yes | Monarch (separate config) | — |
| ungrammar | AST types | — | — |
| Monogram | CST, conformance-proven | derived from the parser grammar | yes |
Several tools here have a real parser, but none derives the highlighter from the parser's own grammar as a single source, which is the one thing Monogram is for.