Tokennizer
We are going to analyze and implement a program to tokenize a string expression. The lecture covered the steps of analysis such as I/O investigation, pseudo-code of various parts. Here is the implementation strategy.
Phase 1:Code for a simple case: integer token
//handle the case of token number.
Input : “125+ x” Expected output: (“125” , number)
Phase 2. Create a framework for a general case as follows
//Read the string expression: StrExp
//Extract into variable firstChar, the first character of StrExp
repeat //until no more char in StrExp
//Inspect firstChar
case firstChar
‘0’..’9’ or ‘.’: // Phase 1
// Get the token and print it out
// Reset token to be null
blank : while ( firstChar is blank)
{ Reduce StrExp by its first character
Extract into into firstChar the first character of StrExp}
‘x’: print x and its type
Reduce StrExp by its first character
Extract into into firstChar the first character of StrExp
LeftParenthesis: print ( and its type
Reduce StrExp by its first character
Extract into into firstChar the first character of StrExp
RightParenthesis: print ) and its type
Reduce StrExp by its first character
Extract into into firstChar the first character of StrExp
…………………………….
endcase
until there is no more char in strExp
Testing:
Input : “ 125.2 + x” Expected output: (“125.2” , number) (“+”, operator) (“x”, Math variable)
Input : “ 125.2 + 45 ” Expected output: (“125.2” , number) (“+”, operator) (“45”, number)
Phase 3: Error Detection and Hidden Tokens.
- Try to detect the input error, extraction error and then fix them. The error can be caused by the extraction of a null string.
- Expand the current source code to cover the case of
125x or x125.
- We have discussed in class that when we find a new token, the token will be print out with its type. Let’s save its type into a variable tokenType. This variable is initialized to be –1 as no token. Before we set a tokenType be be a new value, it holds OF COURSE the token type of the LAST token. We can this variable to see if we have 125x. In this case we inset a new token “*” and its type, then we will print the current token and set the tokenType to be the current type.
Phase 4: Save the output into a linked list
// Build a class of TokenNode of 3 fields: tokenName, tokenType, and tokenNext
class TokenNode
{
String tokenName
inttokenType
TokenNodetokenNext
}
// Declare a variable to hold the linked list of tokens and initialize it to be null.
TokenNodetop null;
// At any println in your source code, add the following code to print (link) to the variable top
// Create a new node for a new discovered token.
temp new TokenNode();
temp.tokenName printed token
temp.tokenType printed type
temp.tokenNext null
// Link it to the variable top
temp.tokenNext top
top temp
// After all tokens are extracted (in this case StrExp is empty), we print the linked list
// print out the linked list for testing
temp top
while (temp is not null) do
{ print (temp.tokenName);
println(temp.tokenType);
temp temp.tokenNext;
}
Note: You would not be surprised that the token stream is printed in a reversed order. Look at the code and modify it so that the var. top will hold the tokens in a proper order.
Answer:
In this case you need a new variable bottom that references the last node. You must insert the codes below appropriately to get the desired result.
// first time top and bottom are both null
top = null; bottom = null
// for the first node temp, top and bottom point to the same node
top = temp;
bottom = temp;
// second node and after, top is unchanged, but bottom is changed.
bottom.tokenNext = temp;
bottom = temp;
Phase 4 alternate: Save the output into an array.
Option 1:
- It is much easier to save the tokens and their types into two different arrays:
String tokenName [] and initialize it to be null in all indices;
int tokenType[] and initialize it to be –1 for all indices.
- Each time you print out the token and its type, you need to save them into these arrays of the same index.
tokenName[index] token;
tokenType[index] the value you printed out.
Option 2:
- It is a little bit harder. You will save a token and its type into a 1-dimentional array of TokenNode.
- Each TokenNode is a pair of information (token, type). In this case, you must create a class of tokenNode which is a set of pairs (token, type).
Class TokenNode
{
String tokenName;
int tokenType;
}
// declaration of the array
TokenNode tokenStream[];
You must implement it and write a paragraph to compare pros and cons between the two structures: linked list and array in this particular program Tokenizer.
Grading plan:
1)Documentation 2 points
2)Implementation strategy: 2 points
3)Code/test: 6 points
4)Bonus to do Phase 4/alt and comparison: 3 points.
Note about implementation:
- During the course of implementation you might get a lot of error messages regarding about extracting and reducing a string.
- In Java for example, you can not reduce string if it is a single-character string by using the built-in method: strExp = strExp.substring(1); This makes sense when you think strExp has one character at the index 0 and there is no index 1.
- To overcome this difficulty, you must use if-statement to reduce the string to null:
if ( strExp.length == 1) {strExp = null;}
else { strExp = strExp.substring(1);}
spham (Tokenizer)Page 110/08/2018