Ontario Research Centre for Computer Algebra Technical Reports

UWO ORCCA TR-06-07 Summary

reflex: A Scanner Transformer for Unicode Grammars

S. L. Huerter and S. M. Watt, July 2006, 34 pages

Abstract: Conventional table-driven scanners, and the tools that generate them, evolved alongside programming languages with small character sets. The Universal Character Set has since carried modern programming languages beyond the reach of their traditional lexical analyzers: its code space of more than one million code points far exceeds the practical implementation limits of table-based scanners. We address this problem by transforming lexical grammars over Unicode into lexical grammars over smaller alphabets. In the regular expressions used by the generated scanners, each Unicode scalar value is expanded to its corresponding code sequence under a chosen Unicode encoding scheme. We introduce an original tool, reflex, that performs this translation and can be applied to scanner specifications written in the Flex and JFlex languages.
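
The expansion described in the abstract can be illustrated with a minimal sketch. It assumes UTF-8 as the chosen encoding scheme and handles only individual scalar values and small sets of them. The function names utf8_escape and expand_class are hypothetical, and nothing here is taken from reflex itself, whose actual algorithm is described in the report rather than on this page.

```python
def utf8_escape(cp: int) -> str:
    """Return one Unicode scalar value as a Flex-style escaped UTF-8 byte sequence.

    Hypothetical helper for illustration only; not part of reflex.
    """
    # Scalar values are code points 0..0x10FFFF excluding the surrogate range.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    return "".join(f"\\x{b:02X}" for b in chr(cp).encode("utf-8"))


def expand_class(scalars) -> str:
    """Rewrite a set of scalar values as an alternation over byte sequences.

    The result is a regular expression over the 256-symbol byte alphabet,
    which is small enough for a table-driven scanner.
    """
    return "(" + "|".join(utf8_escape(cp) for cp in sorted(set(scalars))) + ")"


if __name__ == "__main__":
    print(utf8_escape(0x20AC))             # EURO SIGN
    print(expand_class([0x3B1, 0x3B2]))    # GREEK SMALL LETTER ALPHA, BETA
```

A practical transformer would also need to expand character ranges and negated classes without enumerating every member; this sketch leaves that out.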

