5/08/2011

Language processing for dummies

I enjoyed reading recently "Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages" by Terence Parr (the ANTLR guy) as available from The Pragmatic Bookshelf. The book uses a design pattern style to present techniques for language processing. Eventually the book gets to leverage ANTLR for many problems, but it also explains language processing in terms of patterns that require regular programming effort without the help of a program generator. This is a clever way of teaching some basics of grammars, parsing, tree walking and other concepts that can be explained all too easily (elsewhere) in a too complicated manner, or in a manner requiring substantial background.

In my (very) advanced Bachelor's course Programming techniques and technologies, I face students who don't know much yet about formal languages and certainly nothing about compiler construction; they are aware though of context-free grammars (EBNFs) for the purpose of syntax definition. The course covers many programming techniques and technologies, and I am willing to dedicate two lectures to language processing. Hence, I designed patterns that are really simple and that allow me to discuss a number of language processors for 101companies in a systematic manner.

"Manual" patterns
  • The Chopper Pattern: Chop input into pieces for classification and iterator-based processing.
  • The Lexer Pattern: Transform character stream into token stream for iterator-based processing.
  • The Copy/Replace Pattern: Implement text transformation by copying tokens from input to output while replacing tokens of interest.
  • The Acceptor Pattern: Implement a context-free grammar with procedures for parsing input and signaling acceptance or rejection.
  • The Parser Pattern: Advance the Acceptor Pattern such that semantic actions are to be executed along with parsing input.

"Generative" patterns
  • The Lexer Generation Pattern: Use a lexer/parser generator to derive a character-to-token stream conversion from a grammar generatively.
  • The Acceptor Generation Pattern: Use a parser generator to derive an acceptor from a grammar generatively.
  • The Parser Generation Pattern: Use a parser generator to derive an parser from a grammar with associated semantic actions generatively.
  • The Text-to-object Pattern: Instantiate the Parser (Generation) Pattern such that the semantic actions construct objects that resemble the syntactical structure of the input.
  • The Object-to-text Pattern: Implement an operation on an object model that writes out the object graph in the underlying textual notation.

Associated 101companies implementations

Slide decks (.pdf): [1]. [2]

Lecture videos (.mov): [1]. [2]

Regards,
PF

No comments:

Post a Comment