The Ontario Research Centre for Computer Algebra
The UWO ORCCA Reading Room
reflex: A Scanner Transformer for Unicode Grammars
S. L. Huerter and S. M. Watt, July 2006, 34 pages
Abstract: Conventional table-driven scanners, and the tools that generate them, evolved alongside conventional programming languages with small character sets. Modern programming languages have since adopted the Universal Character Set, leaving their traditional lexical analyzers behind: its code space of more than one million code points far exceeds the practical limits of table-based scanners. We address this problem by transforming lexical grammars over Unicode into lexical grammars over smaller alphabets: the generated scanners use regular expressions in which each Unicode scalar value has been expanded into its corresponding code sequence under a chosen Unicode encoding scheme. We introduce an original tool, reflex, that performs this translation and can be applied to scanner specifications written in the Flex and JFlex languages.
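The abstract does not describe how reflex itself performs the expansion. Purely as an illustration, the Python sketch below assumes UTF-8 as the chosen encoding scheme and shows one well-known way to turn a range of Unicode scalar values into byte-level patterns: split the range until each piece maps to a fixed sequence of per-byte ranges (the same idea behind the `Utf8Sequences` iterator in Rust's `regex-syntax` crate). The names `utf8_sequences`, `to_regex` and `compile_range` are hypothetical and not part of reflex; the output uses Flex-style `\xHH` byte escapes.

```python
import re

# Upper bounds of the 1-, 2- and 3-byte UTF-8 encodings (assumes UTF-8).
LENGTH_BOUNDS = (0x7F, 0x7FF, 0xFFFF)


def utf8_sequences(lo, hi):
    """Split the scalar-value range [lo, hi] into UTF-8 byte-range sequences.

    Each sequence is a list of (first, last) byte ranges, one per byte
    position; a byte string matches a sequence if every byte lies in the
    corresponding range. Surrogates (U+D800..U+DFFF) are skipped because
    they are not Unicode scalar values.
    """
    if not 0 <= lo <= hi <= 0x10FFFF:
        raise ValueError("need 0 <= lo <= hi <= 0x10FFFF")
    out = []
    _split(lo, hi, out)
    return out


def _split(lo, hi, out):
    if lo > hi:
        return
    # Remove the surrogate gap, which has no UTF-8 encoding.
    if lo <= 0xDFFF and hi >= 0xD800:
        _split(lo, 0xD7FF, out)
        _split(0xE000, hi, out)
        return
    # Split where the encoded length changes (1 -> 2 -> 3 -> 4 bytes).
    for bound in LENGTH_BOUNDS:
        if lo <= bound < hi:
            _split(lo, bound, out)
            _split(bound + 1, hi, out)
            return
    # lo and hi now encode to the same number of bytes. Split further so that,
    # wherever lo and hi differ above a 6-bit boundary, lo's lower bits are all
    # zeros and hi's are all ones; then independent per-byte ranges describe
    # exactly [lo, hi].
    n = len(chr(lo).encode("utf-8"))
    for i in range(1, n):
        m = (1 << (6 * i)) - 1
        if (lo & ~m) != (hi & ~m):
            if (lo & m) != 0:
                _split(lo, lo | m, out)
                _split((lo | m) + 1, hi, out)
                return
            if (hi & m) != m:
                _split(lo, (hi & ~m) - 1, out)
                _split(hi & ~m, hi, out)
                return
    first, last = chr(lo).encode("utf-8"), chr(hi).encode("utf-8")
    out.append(list(zip(first, last)))


def byte_class(a, b):
    # One byte position as a hex escape or a bracketed byte range.
    return f"\\x{a:02X}" if a == b else f"[\\x{a:02X}-\\x{b:02X}]"


def to_regex(seqs):
    """Render byte-range sequences as an alternation of byte-level patterns."""
    return "|".join("".join(byte_class(a, b) for a, b in seq) for seq in seqs)


def compile_range(lo, hi):
    """Compile the pattern for [lo, hi] as a Python bytes regex, for checking."""
    return re.compile(to_regex(utf8_sequences(lo, hi)).encode("ascii"))


if __name__ == "__main__":
    print(to_regex(utf8_sequences(0x80, 0x7FF)))  # [\xC2-\xDF][\x80-\xBF]
    pat = compile_range(0x100, 0x2FFFF)
    print(bool(pat.fullmatch(chr(0xE9).encode("utf-8"))))  # False (U+00E9 < U+0100)
    print(bool(pat.fullmatch(chr(0x20AC).encode("utf-8"))))  # True (U+20AC)
```

Under these assumptions, a character class spanning thousands of code points becomes a handful of alternatives over byte ranges, so a table-driven generator only ever needs transition tables indexed by the 256 byte values rather than by the full Unicode code space.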