Scanner Assignment

The first step in building a compiler is to create a scanner. You will use the Flex scanner generator to construct a scanner for the C-Flat language. It is up to you to read this document carefully, decide what all of the token types are, and define each of them precisely using Flex regular expressions. Make sure that you define all reserved words, identifiers, operators, constants, statements, and any other program elements. If you think that the specification is not clear, be sure to ask for clarification.

Your program must be written in plain C (not C++) and use flex to generate the scanner. You must provide a Makefile so that typing make compiles all the pieces and produces a binary program called cflat. In addition, make clean must delete all temporary files, so that the program can be rebuilt from scratch.

Your program must be invoked as follows:

./cflat -scan sourcefile.cflat

The program must print to the standard output, using printf, the symbolic token type of each element of the input. For tokens of type STRING_LITERAL and CHARACTER_LITERAL, you must also output the string or character, with the quotes removed and any escape codes translated. If the input contains an invalid token, then the program should print a message to the standard error stream (fprintf(stderr,"...")) and immediately call exit(1). Otherwise, the program should exit with return code zero.
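
One way to handle the "quotes removed and escape codes translated" requirement is a small helper that the scanner calls on the matched text of a literal. The following is only a sketch: the function name decode_literal is hypothetical, and the set of escapes it accepts (\n, \t, \0, \\, \', \") is an assumption, since the C-Flat specification decides which escape codes are legal.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy the body of a quoted literal from `src` into
 * `dst`, dropping the surrounding quotes and translating backslash escapes.
 * The escape set below is an assumption, not taken from the C-Flat spec.
 * Returns the number of decoded characters, or -1 if the literal is
 * malformed or does not fit in `dst` (including the terminating NUL).
 * Note that a decoded \0 leaves an embedded NUL in `dst`.
 */
int decode_literal(const char *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL);
    if (dst_size == 0) return -1;

    char quote = src[0];                 /* either " or ' */
    if (quote != '"' && quote != '\'') return -1;

    size_t n = 0;
    const char *p = src + 1;
    while (*p != '\0' && *p != quote) {
        char c = *p++;
        if (c == '\\') {
            switch (*p++) {
            case 'n':  c = '\n'; break;
            case 't':  c = '\t'; break;
            case '0':  c = '\0'; break;
            case '\\': c = '\\'; break;
            case '\'': c = '\''; break;
            case '"':  c = '"';  break;
            default:   return -1;        /* unknown escape or trailing backslash */
            }
        }
        if (n + 1 >= dst_size) return -1; /* keep room for the terminator */
        dst[n++] = c;
    }
    if (*p != quote) return -1;          /* closing quote is missing */
    dst[n] = '\0';
    return (int)n;
}
```

In a flex action, such a helper would typically be applied to yytext before the token is printed. A return value of -1 is a natural place to report a scan error and exit(1).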

For example, if the input looks like this:

string
1534
'a'
Notre Dame
"Notre\nDame";
>=
@

then your output should be:

STRING
INTEGER_LITERAL
CHARACTER_LITERAL a
IDENTIFIER
IDENTIFIER
STRING_LITERAL Notre
Dame
SEMICOLON
GE
scan error: @ is not a valid character
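
The output lines above could be produced by mapping each token code to its symbolic name and formatting one line per token. This is a minimal sketch under stated assumptions: the enum token_t, its members, and the functions token_name and format_token are hypothetical names, and the enum lists only the tokens that appear in the example, not the full C-Flat token set.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical token codes; the real list comes from the C-Flat specification. */
typedef enum {
    TOKEN_STRING,
    TOKEN_INTEGER_LITERAL,
    TOKEN_CHARACTER_LITERAL,
    TOKEN_STRING_LITERAL,
    TOKEN_IDENTIFIER,
    TOKEN_SEMICOLON,
    TOKEN_GE,
    TOKEN_COUNT   /* number of token kinds, not a token itself */
} token_t;

/* Return the symbolic name printed for a token, or NULL for an unknown code. */
const char *token_name(token_t t)
{
    static const char *const names[] = {
        "STRING", "INTEGER_LITERAL", "CHARACTER_LITERAL", "STRING_LITERAL",
        "IDENTIFIER", "SEMICOLON", "GE"
    };
    /* Catch a name table that has drifted out of step with the enum. */
    assert(sizeof names / sizeof names[0] == (size_t)TOKEN_COUNT);

    if ((int)t < 0 || t >= TOKEN_COUNT) return NULL;
    return names[t];
}

/*
 * Format one output line (without the newline) into `buf`.
 * `text` is the decoded literal, or NULL for tokens that carry no text.
 * Returns the length reported by snprintf, or -1 for an unknown token.
 */
int format_token(char *buf, size_t size, token_t t, const char *text)
{
    const char *name = token_name(t);
    if (name == NULL) return -1;
    return text ? snprintf(buf, size, "%s %s", name, text)
                : snprintf(buf, size, "%s", name);
}
```

A driver loop would call format_token for each token returned by the scanner and print the buffer with printf("%s\n", buf). When the scanner meets an invalid character, the driver would instead write the "scan error" message to stderr and call exit(1).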

A compiler has many odd corner cases that you must carefully handle. You must test your program extensively by designing and testing a large number of test cases. To encourage you to test thoroughly, we will also require you to turn in ten testing input files. Five should be named good[1-5].cflat and should contain valid tokens. Five should be named bad[1-5].cflat and should contain at least one erroneous token.

For this assignment, correctness of output is the only element of your grade. We will construct some hidden test cases and add to them all of the test cases submitted by students. Your grade will depend upon the fraction of the tests that your scanner correctly processes. In addition, for every test that you write that detects a bug in someone else's scanner, you will receive one extra credit point, up to a maximum of five points.

To turn in the assignment, copy your source files, Makefile, and testing files into your dropbox directory, which is:

/afs/nd.edu/coursefa.08/cse/cse40243.1/dropbox/YOURNAME/scanner