Phase 1:Code for a Simple Case: Integer Token

Tokennizer

We are going to analyze and implement a program to tokenize a string expression. The lecture covered the steps of analysis such as I/O investigation, pseudo-code of various parts. Here is the implementation strategy.

Phase 1:Code for a simple case: integer token

//handle the case of token number.

Input : “125+ x” Expected output: (“125” , number)

Phase 2. Create a framework for a general case as follows

//Read the string expression: StrExp

//Extract into variable firstChar, the first character of StrExp

repeat //until no more char in StrExp

//Inspect firstChar

case firstChar

‘0’..’9’ or ‘.’: // Phase 1

// Get the token and print it out

// Reset token to be null

blank : while ( firstChar is blank)

{ Reduce StrExp by its first character

Extract into into firstChar the first character of StrExp}

‘x’: print x and its type

Reduce StrExp by its first character

Extract into into firstChar the first character of StrExp

LeftParenthesis: print ( and its type

Reduce StrExp by its first character

Extract into into firstChar the first character of StrExp

RightParenthesis: print ) and its type

Reduce StrExp by its first character

Extract into into firstChar the first character of StrExp

…………………………….

endcase

until there is no more char in strExp

Testing:

Input : “ 125.2 + x” Expected output: (“125.2” , number) (“+”, operator) (“x”, Math variable)

Input : “ 125.2 + 45 ” Expected output: (“125.2” , number) (“+”, operator) (“45”, number)

Phase 3: Error Detection and Hidden Tokens.

Try to detect the input error, extraction error and then fix them. The error can be caused by the extraction of a null string.
Expand the current source code to cover the case of

125x or x125.

We have discussed in class that when we find a new token, the token will be print out with its type. Let’s save its type into a variable tokenType. This variable is initialized to be –1 as no token. Before we set a tokenType be be a new value, it holds OF COURSE the token type of the LAST token. We can this variable to see if we have 125x. In this case we inset a new token “*” and its type, then we will print the current token and set the tokenType to be the current type.

Phase 4: Save the output into a linked list

// Build a class of TokenNode of 3 fields: tokenName, tokenType, and tokenNext

class TokenNode

{

String tokenName

inttokenType

TokenNodetokenNext

}

// Declare a variable to hold the linked list of tokens and initialize it to be null.

TokenNodetop  null;

// At any println in your source code, add the following code to print (link) to the variable top

// Create a new node for a new discovered token.

temp  new TokenNode();

temp.tokenName  printed token

temp.tokenType  printed type

temp.tokenNext  null

// Link it to the variable top

temp.tokenNext  top

top  temp

// After all tokens are extracted (in this case StrExp is empty), we print the linked list

// print out the linked list for testing

temp top

while (temp is not null) do

{ print (temp.tokenName);

println(temp.tokenType);

temp temp.tokenNext;

}

Note: You would not be surprised that the token stream is printed in a reversed order. Look at the code and modify it so that the var. top will hold the tokens in a proper order.

Answer:

In this case you need a new variable bottom that references the last node. You must insert the codes below appropriately to get the desired result.

// first time top and bottom are both null

top = null; bottom = null

// for the first node temp, top and bottom point to the same node

top = temp;

bottom = temp;

// second node and after, top is unchanged, but bottom is changed.

bottom.tokenNext = temp;

bottom = temp;

Phase 4 alternate: Save the output into an array.

Option 1:

It is much easier to save the tokens and their types into two different arrays:

String tokenName [] and initialize it to be null in all indices;

int tokenType[] and initialize it to be –1 for all indices.

Each time you print out the token and its type, you need to save them into these arrays of the same index.

tokenName[index]  token;

tokenType[index]  the value you printed out.

Option 2:

It is a little bit harder. You will save a token and its type into a 1-dimentional array of TokenNode.
Each TokenNode is a pair of information (token, type). In this case, you must create a class of tokenNode which is a set of pairs (token, type).

Class TokenNode

{

String tokenName;

int tokenType;

}

// declaration of the array

TokenNode tokenStream[];

You must implement it and write a paragraph to compare pros and cons between the two structures: linked list and array in this particular program Tokenizer.

Grading plan:

1)Documentation 2 points

2)Implementation strategy: 2 points

3)Code/test: 6 points

4)Bonus to do Phase 4/alt and comparison: 3 points.

Note about implementation:

During the course of implementation you might get a lot of error messages regarding about extracting and reducing a string.
In Java for example, you can not reduce string if it is a single-character string by using the built-in method: strExp = strExp.substring(1); This makes sense when you think strExp has one character at the index 0 and there is no index 1.
To overcome this difficulty, you must use if-statement to reduce the string to null:

if ( strExp.length == 1) {strExp = null;}

else { strExp = strExp.substring(1);}

spham (Tokenizer)Page 110/08/2018