Conversation

@loyaltypollution (Contributor)

Summary:
This pull request (Fixes #68) introduces a unified Nearley + Moo-based parsing system. It replaces the old, fragmented setup (a manual tokenizer, a static grammar file, and a separate AST DSL) with a single declarative pipeline that integrates tokenization, grammar rules, and AST generation.

Key Improvements:

  • Adds a Moo lexer for tokens and keywords.
  • Defines a Nearley grammar aligned with the existing language subset.
  • Embeds AST node generation within grammar rules (a minimal sketch follows this list).
  • Maintains compatibility with generate-ast.ts.
  • Introduces a build step that compiles the grammar into the parser.
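
To make the shape of the new pipeline concrete, here is a minimal sketch of a combined lexer + grammar file. All token names, rule names, and AST shapes below are illustrative assumptions; the real definitions live in lexer.moo and grammar.ne.

```ne
# grammar.ne -- a minimal illustrative sketch, not this PR's actual grammar
@{%
const moo = require("moo");

// Hypothetical token rules; the real ones live in lexer.moo.
// (Whitespace handling is omitted for brevity; a real grammar must
// either reference %ws in its rules or filter it out of the stream.)
const lexer = moo.compile({
  ws:     /[ \t]+/,
  number: /[0-9]+/,
  name:   { match: /[A-Za-z_][A-Za-z0-9_]*/,
            type: moo.keywords({ kw_def: "def", kw_return: "return" }) },
  plus:   "+",
  nl:     { match: /\n/, lineBreaks: true },
});
%}

@lexer lexer

# AST nodes are generated directly inside the grammar's postprocessors
expr -> expr %plus term {% ([lhs, , rhs]) => ({ type: "BinOp", op: "+", lhs, rhs }) %}
      | term            {% id %}

term -> %number {% ([tok]) => ({ type: "Num", value: Number(tok.value) }) %}
```

The build step then turns a file like this into an importable parser module, e.g. `nearleyc grammar.ne -o parser.js` (or `-o parser.ts` with `@preprocessor typescript` in the grammar); the exact script wiring in this PR may differ.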

Benefits:

  • Centralized source of truth for the grammar (grammar.ne + lexer.moo).
  • Automatic derivation of TokenType enums from the lexer rules, with no manual syncing (see the sketch after this list).
  • Deprecates tokenizer.ts and Grammar.gram.
  • Easier debugging through Nearley’s readable parse trees.
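
One way such a derivation can work in TypeScript is to treat the lexer's rule object as the single source of truth and compute the token-type union from its keys. The rule names below are hypothetical, and the PR's actual mechanism may differ:

```ts
import * as moo from "moo";

// Hypothetical lexer rules standing in for the real ones in lexer.moo.
const rules = {
  number: /[0-9]+/,
  name:   /[A-Za-z_][A-Za-z0-9_]*/,
  plus:   "+",
} as const;

// "number" | "name" | "plus" -- in sync with the rules by construction,
// so there is no separate TokenType enum to maintain by hand.
type TokenType = keyof typeof rules;

// The same object feeds the lexer, so lexer and types cannot drift apart.
const lexer = moo.compile(rules);
```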

Next Steps:

  • Expand the grammar to support more Python constructs (+=, comprehensions, etc.); a sketch of one such rule follows below.
  • Reassess the integration with generate-ast.ts -> the post-processors currently create ExprNS/StmtNS objects directly.

  • Moved error-handling logic to a dedicated errors module, improving organization and maintainability.
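
For example, supporting += could look roughly like the rule below. The token names (%name, %plusequals) and the AugAssign node shape are assumptions for illustration only:

```ne
# Sketch: augmented assignment, one of the constructs listed above
simple_stmt -> %name %plusequals expr
    {% ([target, , value]) => ({ type: "AugAssign", op: "+=",
                                 target: target.value, value }) %}
```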

loyaltypollution commented Oct 29, 2025

Just discovered from this link that the code is slow:

main -> statement:* {% flatten %}

It turns out that, instead of the post-processor function being executed once when all statements are matched, it gets executed at every increment:

with 0 statements
with 1 statement
with 2 statements

This was the benchmarking result for a simple Ackermann function:

[benchmark screenshot]

To ensure the Nearley parser emitted the current internal Python AST, we created these post-processor functions. However, they are slow. A move to the Nearley parser might require rethinking the internal Python AST.
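
Since flatten re-walks a growing list at every increment, total parse time becomes quadratic in the number of statements. One common workaround, sketched here as an assumption rather than a decision for this PR, is to drop the `:*` EBNF sugar and hand-write the repetition so the per-step post-processor does O(1) work by mutating an accumulator:

```ne
# Sketch: hand-written repetition with an O(1) per-step postprocessor.
# In-place mutation is only safe where this part of the grammar is
# unambiguous; otherwise separate parse branches would share the array.
main -> statements {% id %}

statements -> null                 {% () => [] %}
            | statements statement {% ([rest, stmt]) => { rest.push(stmt); return rest; } %}
```

An alternative is to keep the post-processors trivial (id) and build the ExprNS/StmtNS objects in a single pass over the finished parse result, which also connects to the "rethinking the internal Python AST" point above.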


Linked issue: #68 (Hard-coded Tokenizer and Parser limit Python language extensibility)