MUMPS Source (.m)
│
▼
┌─────────────────────────────────────────────────────┐
│ GRAMMAR (grammar/) │
│ Four textX grammar files define MUMPS syntax │
└──────────────────────┬──────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ PARSER (parser/) │
│ Two-layer parsing: routine structure, then lines │
│ SemanticAnalyzer transforms textX CST → ASG nodes │
└──────────────────────┬──────────────────────────────┘
│ MRoutine (ASG)
▼
┌─────────────────────────────────────────────────────┐
│ ANALYSIS (analysis/) │
│ Multi-pass enrichment: references → GOTOs → FOR │
│ → quit context → variables → signatures │
└──────────────────────┬──────────────────────────────┘
│ Enriched MRoutine
▼
┌─────────────────────────────────────────────────────┐
│ CODEGEN (codegen/) │
│ Strategy selection → Python source generation │
│ SIMPLE_FUNCTIONS or TRAMPOLINE pattern │
└──────────────────────┬──────────────────────────────┘
│ Python source
▼
┌─────────────────────────────────────────────────────┐
│ RUNTIME (runtime/) + CORE (core/) │
│ Execution support: MArray, MUMPSRuntime, globals, │
│ devices, helpers, shared value semantics │
└─────────────────────────────────────────────────────┘
ASG-first design: Analysis passes enrich the ASG with semantic information (GOTO classifications, loop types, variable flow, scope strategies). Code generation only reads these pre-computed fields — it never recomputes semantic information. If codegen discovers a missing analysis result, the fix belongs in the analysis layer, not codegen.
Four textX grammar files define MUMPS syntax:
| File | Scope |
|---|---|
| `mumps.tx` | Routine structure: labels, lines, formal parameter lists |
| `line.tx` | Line content: whitespace, command sequences, comments |
| `commands.tx` | All MUMPS commands (~60) with case-insensitive matching |
| `expressions.tx` | Expressions: operators, functions, variables, literals, indirection |
MUMPS has no operator precedence — all binary operators evaluate strictly left-to-right. The expression grammar reflects this by chaining UnaryExpr → BinaryOpTail* without precedence levels.
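As a rough illustration of strict left-to-right evaluation, the sketch below folds a flat chain of one leading operand followed by `(operator, operand)` tails, which mirrors the `UnaryExpr → BinaryOpTail*` shape. The `OPS` table and `eval_left_to_right` are hypothetical names for this example, not part of the project, and only four arithmetic operators are covered.

```python
import operator

# Hypothetical operator table covering a small subset of MUMPS binary operators.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_left_to_right(first, tails):
    """Fold a leading operand and its (op, operand) tails strictly left to right."""
    result = first
    for op, operand in tails:
        result = OPS[op](result, operand)
    return result


# MUMPS reads 2+3*4 as (2+3)*4, whereas Python applies precedence.
mumps_value = eval_left_to_right(2, [("+", 3), ("*", 4)])
python_value = 2 + 3 * 4
print(mumps_value, python_value)  # 20 14
```

Because the chain is flat, no precedence table is needed: the order of the tails is the order of evaluation.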
Parsing uses a two-layer architecture for error tolerance:
- Layer 1 (`mumps.tx` via textX): Parses routine structure, meaning labels, formal parameters, and raw line content strings. Individual lines that fail to parse don't prevent the rest of the routine from being processed.
- Layer 2 (`line.tx` / `commands.tx` / `expressions.tx`): For each line's raw content, parses commands and expressions into textX CST nodes.
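The error-tolerance idea behind the two layers can be sketched without textX. In the example below, a regular expression stands in for layer 1 and a toy command check stands in for layer 2; `LINE_RE`, `KNOWN_COMMANDS`, `parse_structure`, `parse_line`, and `parse_routine` are all hypothetical names, and the real grammars handle far more syntax than this.

```python
import re

# Layer 1 stand-in: optional label, optional formal list, line-start whitespace, raw content.
LINE_RE = re.compile(
    r"^(?P<label>[%A-Za-z][A-Za-z0-9]*)?(?:\((?P<params>[^)]*)\))?[ \t]+(?P<content>.*)$"
)

# Layer 2 stand-in: a toy command set, nowhere near the full MUMPS command list.
KNOWN_COMMANDS = {"S", "SET", "W", "WRITE", "Q", "QUIT", "D", "DO"}


def parse_structure(source):
    """Split a routine into (line number, label, raw content) without parsing content."""
    lines = []
    for number, raw in enumerate(source.splitlines(), start=1):
        match = LINE_RE.match(raw)
        if match:
            lines.append((number, match.group("label"), match.group("content")))
    return lines


def parse_line(content):
    """Parse one line's content; raise ValueError if the command is not recognized."""
    command, _, args = content.partition(" ")
    if command.upper() not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command {command!r}")
    return command.upper(), args


def parse_routine(source):
    """Parse every line independently so one bad line does not abort the routine."""
    parsed, errors = [], []
    for number, label, content in parse_structure(source):
        try:
            parsed.append((number, label, parse_line(content)))
        except ValueError as exc:
            errors.append((number, str(exc)))
    return parsed, errors


SOURCE = "START SET X=1\n BOGUS Y\n WRITE X\n QUIT"
parsed, errors = parse_routine(SOURCE)
print(len(parsed), errors)
```

Line 2 is reported as an error while lines 1, 3, and 4 are still parsed, which is the property the two-layer split is meant to provide.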
After textX parsing, SemanticAnalyzer.analyze_command() transforms each textX CST command into a fully-typed ASG statement node (MStatement subclass). This is where textX wrapper nodes are unwrapped, expression trees are built, MCall references are created for DO/GOTO targets, and pattern matches are compiled to regex.
Key classes:
- `MUMPSParser`: orchestrates parsing and delegates to analysis functions
- `SemanticAnalyzer`: CST-to-ASG transformation
- `compile_mumps_line()` (`compiler.py`): self-contained parse pipeline for XECUTE, so codegen doesn't import parser/analysis internals directly
Pure data model — Python dataclasses representing MUMPS program structure. No behavior beyond tree traversal helpers. See asg-reference.md for details.
The ASG root is MRoutine, containing MLabel nodes, each with a MScope body of MStatement nodes. Expressions are MExpr subtypes. Cross-references (MCall) link GOTO/DO targets to their MLabel definitions.
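A minimal sketch of how such a dataclass tree might look is shown below. The class names follow the text, but every field, the `MDoStatement` subclass, and the choice to make `is_resolved` a property are assumptions made for illustration; asg-reference.md remains the authority on the real model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MStatement:
    """Base statement node; the `line` field is illustrative."""
    line: int = 0


@dataclass
class MScope:
    statements: list[MStatement] = field(default_factory=list)


@dataclass
class MLabel:
    name: str
    body: MScope = field(default_factory=MScope)


@dataclass
class MCall:
    """Cross-reference from a DO/GOTO to a label; `target` is filled in later."""
    label_name: str
    target: Optional[MLabel] = None

    @property
    def is_resolved(self) -> bool:
        return self.target is not None


@dataclass
class MDoStatement(MStatement):
    """Hypothetical statement subclass carrying one call reference."""
    call: Optional[MCall] = None


@dataclass
class MRoutine:
    name: str
    labels: list[MLabel] = field(default_factory=list)


# Build a two-label routine in which START does DO HELPER.
start, helper = MLabel("START"), MLabel("HELPER")
call = MCall("HELPER")
start.body.statements.append(MDoStatement(line=1, call=call))
routine = MRoutine("DEMO", [start, helper])

# Linking the call to its label definition turns the tree into a graph.
call.target = {label.name: label for label in routine.labels}[call.label_name]
print(call.is_resolved)  # True
```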
Six analysis passes run in sequence after parsing. Order matters — later passes depend on results from earlier ones:
| Pass | Function | Enriches |
|---|---|---|
| 1. Reference resolution | `resolve_references()` | `MCall.target`, `MCall.is_resolved`, `MLabel.callers`/`goto_sources` |
| 2. GOTO classification | `classify_gotos()` | `MGotoStatement.goto_type`/`codegen_pattern`, `MRoutine.needs_trampoline` |
| 3. FOR analysis | `analyze_for_loops()` | `MForStatement.loop_type`, loop variable modification detection |
| 4. QUIT context | `analyze_quit_context()` | `MQuitStatement.exits_for`/`exits_do_block` |
| 5. Variable analysis | `analyze_variables()` | `MLabel.input_variables`/`output_variables`, scope strategy |
| 6. Signature computation | `compute_signatures()` | `MLabel.signature` (`FunctionSignature`), `MRoutine.routine_state_vars` |
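To make the ordering constraint concrete, the sketch below runs simplified versions of the first two passes over toy node types. The `Label`, `Goto`, and `Routine` classes, the `"intra_label"`/`"cross_label"` values, and the rule that any cross-label GOTO sets `needs_trampoline` are assumptions for this example; the real passes operate on the full ASG and classify more cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Label:
    name: str


@dataclass
class Goto:
    source_label: str                 # label that contains the GOTO
    target_name: str                  # label named by the GOTO
    target: Optional[Label] = None    # filled in by pass 1
    goto_type: Optional[str] = None   # filled in by pass 2


@dataclass
class Routine:
    labels: list
    gotos: list
    needs_trampoline: bool = False


def resolve_references(routine):
    """Pass 1: link each GOTO to the label it names."""
    by_name = {label.name: label for label in routine.labels}
    for goto in routine.gotos:
        goto.target = by_name.get(goto.target_name)


def classify_gotos(routine):
    """Pass 2: classify GOTOs; relies on targets resolved by pass 1."""
    for goto in routine.gotos:
        if goto.target is None:
            raise RuntimeError("classify_gotos needs resolved targets; run resolve_references first")
        same_label = goto.target.name == goto.source_label
        goto.goto_type = "intra_label" if same_label else "cross_label"
    routine.needs_trampoline = any(g.goto_type == "cross_label" for g in routine.gotos)


PASSES = [resolve_references, classify_gotos]


def run_passes(routine):
    """Apply each pass in order, enriching the routine in place."""
    for analysis_pass in PASSES:
        analysis_pass(routine)
    return routine


routine = run_passes(Routine(
    labels=[Label("A"), Label("B")],
    gotos=[Goto("A", "B"), Goto("B", "B")],
))
print(routine.needs_trampoline)  # True
```

Swapping the two entries in `PASSES` makes `classify_gotos` fail immediately, which is the kind of dependency the fixed pass order protects against.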
Additional analysis:
- `PatternCompiler` converts MUMPS pattern match syntax to Python regex during semantic analysis (pass 0, integrated into parsing)
- Fallthrough detection identifies labels that flow into the next label without explicit exit
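The pattern-to-regex conversion can be pictured with a small sketch. The function below is not the project's `PatternCompiler`; it is an assumed, ASCII-only translation that handles repeat counts (`3`, `1.3`, `.3`, `1.`, `.`), the pattern codes `A`, `U`, `L`, `N`, and `E`, and quoted string literals, and it omits alternation and the `P` and `C` codes.

```python
import re

# ASCII character ranges for a subset of MUMPS pattern codes (assumption: no locale support).
PATCODES = {"A": "A-Za-z", "U": "A-Z", "L": "a-z", "N": "0-9"}

# One pattern atom: repeat count, then either pattern codes or a quoted literal.
ATOM = re.compile(r'(\d*)(\.?)(\d*)(?:([AULNE]+)|"((?:[^"]|"")*)")', re.IGNORECASE)


def compile_pattern(pattern):
    """Translate a MUMPS pattern (the right-hand side of `?`) into a compiled regex."""
    pos, parts = 0, []
    while pos < len(pattern):
        match = ATOM.match(pattern, pos)
        if not match or not (match.group(1) or match.group(2)):
            raise ValueError(f"unsupported pattern atom at offset {pos}: {pattern[pos:]!r}")
        low, dot, high, codes, literal = match.groups()
        if literal is not None:
            # MUMPS doubles a quote inside a string literal.
            body = "(?:" + re.escape(literal.replace('""', '"')) + ")"
        elif "E" in codes.upper():
            body = r"[\s\S]"  # E matches any character
        else:
            body = "[" + "".join(PATCODES[c] for c in codes.upper()) + "]"
        # "n" is an exact count; "n.m" is a range with either bound optional.
        quantifier = "{%s,%s}" % (low or "0", high) if dot else "{%s}" % low
        parts.append(body + quantifier)
        pos = match.end()
    return re.compile("".join(parts))


phone = compile_pattern('3N1"-"4N')
identifier = compile_pattern("1U.AN")
print(bool(phone.fullmatch("555-1234")))   # True
print(bool(identifier.fullmatch("x12")))   # False
```

`fullmatch` is used because the MUMPS `?` operator tests the whole string against the pattern, not a substring.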
Translates the enriched ASG into executable Python. See codegen.md for full details.
Two strategies are automatically selected based on ASG analysis flags:
- SIMPLE_FUNCTIONS: Labels become plain Python functions. Variables live in a `_scope` dict. Used when there are no cross-label GOTOs or computed offsets.
- TRAMPOLINE: Labels return `(next_label, state)` tuples dispatched by a `while` loop. A `RoutineState` dataclass carries variables across label boundaries. Prevents stack overflow for cyclic GOTO patterns.
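The TRAMPOLINE shape can be sketched by hand as follows. This is an illustrative approximation rather than actual generated output: the label functions, the `LABELS` table, the `run` helper, and the fields on `RoutineState` are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RoutineState:
    """Hypothetical state carrier; field names are made up for this example."""
    i: int = 0
    total: int = 0


def label_start(state):
    state.i, state.total = 0, 0
    return "LOOP", state


def label_loop(state):
    if state.i >= 100_000:
        return "DONE", state
    state.i += 1
    state.total += state.i
    return "LOOP", state  # a cyclic GOTO, expressed as a return value rather than a call


def label_done(state):
    return None, state  # None tells the dispatcher to stop


LABELS = {"START": label_start, "LOOP": label_loop, "DONE": label_done}


def run(entry="START"):
    """Dispatch loop: look up the next label and call it until a label returns None."""
    state = RoutineState()
    next_label = entry
    while next_label is not None:
        next_label, state = LABELS[next_label](state)
    return state


result = run()
print(result.total)  # 5000050000
```

The loop label is entered 100,000 times without growing the Python call stack, whereas expressing each GOTO as a direct function call would exceed the default recursion limit.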
Key files: routine.py (module structure), statements.py (~6800 lines, all statement types), expressions.py (~2600 lines, all expression types), var_access.py (3-way variable access dispatch), indirection.py (@ expressions and XECUTE), shared_state.py (RoutineState generation).
The runtime library imported by generated Python code. See runtime.md for full details.
- `MArray`: MUMPS hierarchical sparse arrays (each node has both a value AND children)
- `MUMPSRuntime`: central runtime instance covering I/O, global variables, indirection resolution, XECUTE compilation, error handling, stack frames, and intrinsic special variables
- `GlobalStorageBackend`: pluggable global storage with multiple backends:
  - `InMemoryGlobalStorage`: default in-process backend (no external database)
  - `SQLiteGlobalStorage`: cross-process JOB/LOCK support
  - `YottaDBGlobalStorage`: YottaDB database backend (requires YDB container)
  - `IRISGlobalStorage`: InterSystems IRIS backend (requires IRIS container)
  - Backend selection via `M2PY_GLOBAL_BACKEND` env var or `get_global_storage()` factory
- Device layer: `PrincipalDevice` (stdin/stdout), `FileDevice`, `TCPDevice`
- JOB subprocess support via `job_runner.py`
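The "value AND children" property of MUMPS arrays is easy to miss coming from Python dicts, so a small sketch may help. `SparseArraySketch` below is a hypothetical stand-in, not the `MArray` API; its `data` method imitates the standard `$DATA` result codes (0, 1, 10, 11), and subscript ordering and deletion are left out.

```python
class Node:
    """One array node: an optional value plus child nodes keyed by subscript."""
    __slots__ = ("value", "children")

    def __init__(self):
        self.value = None
        self.children = {}


class SparseArraySketch:
    """Hypothetical stand-in for a hierarchical sparse array."""

    def __init__(self):
        self._root = Node()

    def _find(self, subscripts):
        node = self._root
        for sub in subscripts:
            node = node.children.get(sub)
            if node is None:
                return None
        return node

    def set(self, subscripts, value):
        """Store a value, creating intermediate nodes that hold no value of their own."""
        node = self._root
        for sub in subscripts:
            node = node.children.setdefault(sub, Node())
        node.value = value

    def get(self, subscripts, default=""):
        node = self._find(subscripts)
        if node is None or node.value is None:
            return default
        return node.value

    def data(self, subscripts):
        """Imitate $DATA: 0 undefined, 1 value only, 10 children only, 11 both."""
        node = self._find(subscripts)
        if node is None:
            return 0
        return (1 if node.value is not None else 0) + (10 if node.children else 0)


arr = SparseArraySketch()
arr.set(("A",), "top")
arr.set(("A", 1), "child")
arr.set(("X", "Y"), "leaf")
print(arr.data(("A",)), arr.data(("A", 1)), arr.data(("X",)), arr.data(("B",)))  # 11 1 10 0
```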
Canonical implementations of MUMPS semantics shared identically by both compile-time codegen and runtime. This layer was created (spec 018–019) to break backward imports from runtime→codegen.
| Module | Purpose |
|---|---|
| `values.py` | `m_str()`, `m_num()`, `m_truth()`, `m_compare()`, `mumps_canonical_str()` |
| `names.py` | `NameTranslator`: bidirectional MUMPS↔Python name mapping |
| `subscripts.py` | `SubscriptCanonicalizer`: canonical subscript forms per MUMPS spec |
| `scope.py` | `CurrentScope`: unified variable access abstraction |
| `indirection.py` | `IndirectionResolver`: runtime @-expression resolution |
| `tokenizer.py` | `split_at_toplevel()`: delimiter splitting respecting nesting and quotes |
| `parsing.py` | `parse_subscripted_name()`: parses `ARR(1,2)` to `('ARR', ['1', '2'])` |
| `exceptions.py` | `LVUNDEFError` (M6), `VarExpectedError` |
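To give a feel for the two parsing helpers, here is a minimal sketch of delimiter splitting that respects parentheses and quotes, plus a subscripted-name parser built on top of it. The function names are borrowed from the table, but the signatures and bodies are assumptions written for this example and may differ from the actual `core/` implementations.

```python
def split_at_toplevel(text, delimiter=","):
    """Split on the delimiter, ignoring any inside parentheses or double quotes."""
    parts, current = [], []
    depth, in_quotes = 0, False
    for ch in text:
        if ch == '"':
            # A doubled quote ("") toggles twice, so the quoted state stays correct.
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == delimiter and depth == 0:
                parts.append("".join(current))
                current = []
                continue
        current.append(ch)
    parts.append("".join(current))
    return parts


def parse_subscripted_name(ref):
    """Split a reference such as ARR(1,2) into its name and raw subscript strings."""
    name, paren, rest = ref.partition("(")
    if not paren:
        return name, []
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced parentheses in {ref!r}")
    return name, split_at_toplevel(rest[:-1])


print(parse_subscripted_name("ARR(1,2)"))         # ('ARR', ['1', '2'])
print(split_at_toplevel('1,"a,b",$P(X,",",2)'))
```

The second call returns three parts: the commas inside the string literal and inside the nested function call are not treated as separators.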
The m2py CLI transpiles .m files and directories to Python. Entry point: m2py.cli:main (registered as m2py in [project.scripts]).
| Module | Responsibility |
|---|---|
| `__init__.py` | Argument parsing (`argparse`), summary output, exit codes |
| `transpile.py` | Transpilation pipeline: `generate_python` → `ruff check --fix` → `ruff format` → write. Parallel batch transpilation via `ProcessPoolExecutor`. Result/summary dataclasses. |
The previous approach using YottaDB opcodes failed due to difficulty with sequential processing of compiler IR, constant folding, and complex control flow reconstruction. textX provides a declarative grammar-to-model transformation that directly produces structured ASG nodes, making the parser maintainable and the CST-to-ASG transformation straightforward.
Two patterns, a trampoline and a state machine, were prototyped for cross-label GOTO handling (spec 006). The trampoline pattern was selected because:
- Better testability — each label is an independent function
- Better refactorability — labels can be extracted, inlined, or composed
- Clearer control flow: the dispatch loop is a simple `while` with a dictionary lookup
- The state machine approach had fewer lines but was harder to test in isolation
Early in development, the runtime module imported codegen utilities for name translation and subscript handling, creating a backward dependency (runtime→codegen). Spec 018–019 extracted these shared concerns into core/, establishing a clean dependency direction: both codegen and runtime depend on core, but never on each other.