The UWO ORCCA Reading Room

UWO ORCCA TR-06-07 Summary

reflex: A Scanner Transformer for Unicode Grammars

S. L. Huerter and S. M. Watt, July 2006, 34 pages

Abstract: Conventional table-driven scanners and the tools to generate them evolved alongside conventional programming languages with small character sets. Lately the Universal Character Set has swept up-to-date programming languages ahead of their traditional lexical analyzers: its imposing size of more than one million code points far exceeds practical implementation limits for table-based scanners. We address this issue by transforming lexical grammars over Unicode into lexical grammars over smaller alphabets; the regular expressions used by the generated scanners are those in which Unicode scalars have been expanded to the corresponding code sequences under a chosen Unicode encoding scheme. We introduce an original tool, reflex, which performs this translation and may be applied to scanner specifications in the Flex and JFlex languages.
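To make the idea of expanding Unicode scalars into code sequences concrete, here is a minimal Python sketch. It is not taken from the report and is not reflex's algorithm: the function names scalar_to_byte_pattern and range_to_byte_pattern are hypothetical, the output uses Flex-style \xHH byte escapes as an assumed target notation, and ranges are expanded by naive enumeration, which is practical only for small character classes.

```python
def scalar_to_byte_pattern(cp: int, encoding: str = "utf-8") -> str:
    """Expand one Unicode scalar value into a sequence of \\xHH byte escapes.

    Hypothetical helper for illustration; the encoding is any Unicode
    encoding scheme that Python's codecs support (e.g. "utf-8", "utf-16-be").
    """
    # Scalar values are code points in 0..0x10FFFF excluding surrogates.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    return "".join(f"\\x{b:02X}" for b in chr(cp).encode(encoding))


def range_to_byte_pattern(lo: int, hi: int, encoding: str = "utf-8") -> str:
    """Expand an inclusive scalar range into an alternation of byte sequences.

    Naive enumeration: one alternative per scalar. Surrogates inside the
    range are skipped because they are not scalar values.
    """
    alternatives = [
        scalar_to_byte_pattern(cp, encoding)
        for cp in range(lo, hi + 1)
        if not 0xD800 <= cp <= 0xDFFF
    ]
    return "(" + "|".join(alternatives) + ")"


if __name__ == "__main__":
    # Greek small letters alpha..gamma as a pattern over UTF-8 bytes.
    print(range_to_byte_pattern(0x03B1, 0x03B3))
    # The same scalar under a different encoding scheme.
    print(scalar_to_byte_pattern(0x03B1, "utf-16-be"))
```

The resulting pattern ranges over an alphabet of 256 byte values instead of the full Unicode code space, which is what keeps a table-driven scanner's tables small. A practical transformer would merge alternatives that share prefixes into byte ranges rather than listing every scalar.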

If you have any questions or comments regarding this page please send mail to tech-reports@orcca.on.ca.
