Comp 501 Projects – Fall 2012

COMP501 Introduction to Programming Languages

Projects - Fall 2012 - Prof. M Werner

Project 1 - Research Paper - Compare another language to Java

Select one of the languages in the University of Michigan’s CIS 400 Programming Languages Guide or from Wikipedia’s language list: and compare it to C++ according to the criteria listed below. All work is individual, and no more than two people can compare the same language.

You are expected to use informed opinion as a part of your rating justifications. References should be cited in the text of your essay (eg. Wirth, 1978). A complete reference list should appear at the paper's end.

Chapters 1 - 3 from Sebesta provide a good starting point. An Internet search will yield valuable material. However, you must use additional sources from the library. We currently receive a number of periodicals including ACM SIGPLAN Notices, ACM Communications, IEEE Computer, ACM Computer Surveys. We also have access, through the library, to a number of on-line databases.

You must have at least one paragraph comparing your language to Java for each of the criteria below. Hence, besides a brief introduction your paper will consist of numbered sections by criteria.

Criteria for Language Design Evaluation:

  1. Efficiency (translation and execution)
  2. Simplicity (readability and writability)
  3. Reliability
  4. Expressivity (does the language facilitate translating a program design into code?)
  5. Abstraction facilities (data and procedural)
  6. Portability (can the programs be ported to different hardware and operating systems?)
  7. Tools availability (is there a unified development environment with an editor, compiler and debugger?)
  8. Major application areas (scientific, systems, business, etc.)
  9. Extent of current use of the language
  10. Reverse engineering (Can the source code be easily recovered from the run-time code?)
  11. Exception handling
  12. Support for object orientation
  13. Support for web programming

Please let me know by Sep 12th what language you are planning to compare to C++. You may send me e-mail or tell me in class. I must formally approve your request before you begin. No more than 2 evaluations of the same language will be approved (so choose early).

Project 2 - Parser for Dots Graphical Description Language

Dots is a graphical description language developed by Eleftherios Koutsofios and Stephen C. North at AT&T Bell Laboratories. A paper describing it is available at Here is a simple example of a dots description and graph.


Here is a fancy graph.


Here is the grammar used for making fancy graphs:

A Graph File Grammar

The following is an abstract grammar of graph files. Double­line brackets [ and ] enclose optional items. Vertical bars | separate alternatives. Curly braces { } enclose items repeated zero or more times.

graph -> [strict] ( digraph | graph ) id {stmt­list}

stmt­list -> [stmt [;] [stmt­list ] ]

stmt -> attr­stmt | node­stmt | edge­stmt | subgraph | id = id

attr­stmt -> (graph | node | edge) [ [ attr­list ] ]

attr­list -> id=id [attr­list ]

node­stmt -> node­id [ opt­attrs ]

node­id -> id [: id ]

opt­attrs -> [attr­list]

edge­stmt -> (node­id | subgraph) edgeRHS [opt­attrs ]

edgeRHS -> edgeop ( node­id | subgraph ) [edgeRHS ]

subgraph -> [subgraph id] { stmt­list } | subgraph id

An id is any alphanumeric string not beginning with a digit, but possibly including underscores; or a number; or any quoted string possibly containing escaped quotes.

An edgeop is ­> in directed graphs and ­­ in undirected graphs.

Semicolons aid readability but are not required except in the rare case that a named subgraph with no body immediate preceeds an anonymous subgraph, because under precedence rules this sequence is parsed as a subgraph with a heading and a body.

A dot graph consists of a collection of nodes and a collection of edges. Nodes are identified by their id, which is unique. A node is created the first time its name appears in the file. Hence, when a node name appears, it is first checked to see if it matches an already created node, if not, a new node is created. Edges are created when nodes are joined by the edge operator -> or --.

Dot graphs may also have a collection of subgraphs, which are themselves dot graphs. Drawing a dot graph requires recursively drawing subgraphs.

Preparing the .jj file for processing by JavaCC.

Consider the production:

graph -> [strict] ( digraph | graph ) id {stmt­list}

It contains a mix of reserved words and symbols, tokens and nonterminals. The goal is to build a Java program to parse the language. Since Java doesn’t allow hyphens in identifiers, change stmt-list to stmt_list.

Enclose reserved words and symbols in double quotes. EBNF metasymbols [,],(,) are left unchanged. Nonterminals are shown as methods. The production will be used to define the method for parsing its LHS, in this case graph. Here is the way the production is represented in the .jj file.

void graph() :

{

//Initialization area

}

{

[ "strict" ] ("digraph" | "graph") <ID> "{" stmt_list() "}"

}

Terminals must be defined as tokens. So this production requires tokens: <STRICT>, <DIGRAPH>, <GRAPH>, <ID>, <LBRACE> AND <RBRACE>. Nonterminals appearing in the RHS of an EBNF production become method calls. So in this case stmt_list() will need to be defined elsewhere in the .jj file.

EBNF uses braces to indicate repetition as in {…}. In JavaCC braces have other uses, repetition is indicated using parentheses and * as in (…)*.

Actions are inserted into the .jj file using braces. Here is an example:

void graph() :

{

//Initialization area

String graphType;

Token t;

}

{

[ "strict" ]

("digraph" {graphType = "Digraph";}| "graph" graphType = "Graph";})

t= <ID>

{System.out.println("Parsing a "+graphType+" named "+ t.image);}

"{" stmt_list() "}"

}

Project 2 Deliverables:

1.Modify the fancy graph grammar by stripping away extra features, so that it is adequate to describe simple graphs. Call this the SimpleDots grammar.

2.Use JavaCC to build a parser to recognize the SimpleDots language.

3.Add actions to the parser, so that it builds a parse tree. This requires building a class for each nonterminal in your grammar. Each production should return an object of its matching class. The object represents an interior node of the parse tree. It has fields set to point to its children. Here is an example taken from a .jj file:

//EdgeRHS = EdgeOp Nodeid [EdgeRHS ]

EdgeRHS EdgeRHS() :

{//initialization section

EdgeRHS edgeRHS = new EdgeRHS();

EdgeOp edgeOp;

Nodeid nodeid;

EdgeRHS nextEdge;

}

{

edgeOp = EdgeOp() {edgeRHS.setEdgeOp(edgeOp);}

nodeid = Nodeid() {edgeRHS.setToNode(nodeid);}

[nextEdge = EdgeRHS() {edgeRHS.setEdgeRHS(nextEdge);} ]

{return edgeRHS;}

}

4.Build Java classes for Graph, Node and Edge. The Graph class contains a hash table named nodes for keeping its collection of Node objects. (We are assuming that all nodes have distinct labels.) It has a print() method, which prints all the Nodes in a graph.

The Node class has a field named label, which is also used as the hash key. It also has vectors named outgoing and incoming for keeping track of edges. It has a method print(), which prints a Node object as follows:

Node: execute

Incoming edges from: parse

Outgoing edges to: make_string, compare, printf

5.Do a traversal of the parse tree to identify the nodes in the graph. When a node is discovered, construct a new Node object and put it in the hash table. Since it is a hash table, there will be no duplicates.

Do a second traversal of the parse tree to identify edges. When you find an edge, construct a new Edge object and add it to both the outgoing collection of its source node and the incoming collection of its target node.

Test your program by parsing at least two simple graphs, and printing out the nodes using the Graph::print() method.

Note: Project 2 consists of parts 1 - 5 above. The parts below will be assigned in the second half of the course. Some of these will be for extra credit.

6.You will now write a program to draw a Graph object. The first job is to lay out the nodes on a canvas. The simplest way is to use a square grid without regard to the overlapping of edges. So, for example, if there are 10 nodes, the next highest perfect square is 16. Hence, use a 4 by 4 grid with 4 nodes in row 1, 4 in row 2, and 2 in row 3. Draw the nodes as ellipses, draw the labels on top, and draw the edges below the nodes from center to center. Draw the arrowheads in the middle of the edges.

7.Improve on Part 5. Place the arrows where the edge strikes the target node. Make the ellipses the right size to hold their labels. Perhaps improve the layout.

8.Make the graph interactive. The user should be able to drag any node, with the links following it.

9.Provide a facility for users to add nodes and edges by clicking and dragging.

10.(extra credit - mucho). Build a parser and graph drawer for the FancyGraph language, including many of the features described in the article.

(Optional – Extra Credit)

Project 3 Build a Postscript Interpreter

Postscript is a stack-based programming language used primarily for preparing documents for printing. Luc Devroye, a professor in the School of Computer Science at McGillUniversity, provides a useful set of links for someone interested in learning about Postscript.

The Adobe company is the authoritative source. In particular, the Blue book contains tutorial and sample scripts.

3.1 Build a postfix interpreter in Java

The interpreter opens a text file containing a stream of numbers and operators. It uses a stack. When a number is scanned it is pushed on the stack. When an operator is scanned, its operands are popped from the stack, the operation applied, and the result is pushed. The operators are:

Arithmetic Operators

add

sub

mul

div – Division resulting in a floating point answer

idiv – Division resulting in an integer answer

Stack Operators

clear – Removes all stack contents

dup - Duplicates top of stack

exch – Reverses the order of top two items on stack

pop – Removes top of stack

== - Destructively prints top of stack

Lines starting with % are comments

3.2 Expand the interpreter to handle defined constants and procedures

A definition looks like this:

/mil 0.001 def //defines the name mil to mean 0.001

or

/square {dup mul} def //defines the procedure square to mean duplicate the top of the stack

// and multiply

When a definition is made it is stored in a dictionary. The dictionary entry associates a key (the word being defined) with a definition. It is convenient to implement dictionaries using a hash table. Postscript allows multiple dictionaries to be stacked.

When a defined name is scanned from the input, its definition is retrieved from the (top) dictionary. If it is a number it is pushed. If it is a procedure it is scanned token by token from the definition and interpreted.

For example, the script:

/five 5 def

/square {dup mul} def

five

square

==

should produce 25 as an output

You may wish to use this grammar for simplified postscript

<program> -> {<entry>} "end"

<entry> -> <number_entry> | <def> | <name> | <string>

<number_entry> -> <number>

<literal> -> "/" <IDENTIFIER>

<name> -> <builtin_operator> | <user_defined_operator>

<user_defined_operator> -> <IDENTIFIER>

<string> -> <STRING>

<procedure> -> <PROCEDURE>

<number> -> <NUMBER>

<builtin_operator> -> <arithmetic_operator> | <stack_operator> | <graphic_operator>

<arithmetic_operator> -> <add> <mul> <sub> <div>

<stack_operator> -> <clear> | <dup> | <exch> | <pop> | <top>

<graphic_operator> -> <lineto> | <moveto> | <stroke>

<add> -> "add"

<mul> -> "mul"

<sub> -> "sub"

<div> -> "div"

<clear> -> "clear"

<dup> -> "dup"

<pop> -> "pop"

<top> -> "=="

<lineto> -> "lineto"

<moveto> -> "moveto"

<stroke> -> "stroke"

<def> -> <literal> ( <number> | <procedure> ) "def"

//Tokens

<NUMBER> -> ["1"-"9"] (["0"-"9"])* | (["0"-"9"])+ "." (["0"-"9"])*

<IDENTIFIER> -> ["a"-"z","A"-"Z"] (["a"-"z","A"-"Z"])*

<STRING> -> "(" ( ( ~["(",")","\"","\\","\n","\r"]))+ ")"

<PROCEDURE> -> "{" ( ( ~["{","}"]))+ "}"

Tokens for simplified postscript

<NUMBER> - Either integer or floating point. Stored as f.p.

<IDENTIFIER> (a-zA-Z)+

Literals are words being defined inside a def operation.

The definition is stored in a dictionary.

Design to enter tokens from a defined procedure into the stream

When a literal is parsed, the definition is stored as a string in a dictionary

When an Identifier is parsed, its definition is looked up in the dictionary. The string found is then tokenized and processed.

Here are two approaches to do this:

1) Instantiate a new parser to read from the string

java.io.StringReader sr = new java.io.StringReader( str );

java.io.Reader r = new java.io.BufferedReader( sr );

XXX parser = new XXX( r );

When this string is consumed this parser is destroyed and parsing resumes from the file.

The same DoubleStack is used for both parsers. This is done by making it static.

Note: To make this work I needed to introduce lexical states.

The IN_DEFINITION state used the PROCEDURE token above. I entered this state on reaching a slash and left it on reaching a "def".

The DEFAULT state was used elsewhere, in particular when parsing a string to prevent the entire string being matched by <PROCEDURE>

2) Interpret the .ps file in two passes.

  1. Enter all definitions into the dictionary and expand all defined words using their definitions. The first pass writes out an intermediary file stripped of definitions and with all defined words expanded.

The second pass is a straightforward interpretation of the remaining postscript commands.

3.3 Add graphics to handle simple postscript files (Extra Credit)

Here is a script that draws a simplified Wentworth logo.

%!PS-Adobe-2.0

/Helvetica findfont

12 scalefont setfont

/thick 25 def

.8 setgray

/inch {144 mul } def

/box { 1 inch 0 rlineto 0 1 inch rlineto -1 inch 0 rlineto 0 -1 inch rlineto } def

200 200 moveto box stroke

/inch { 108 mul } def

218 218 moveto box fill

.0 setgray

232 331 moveto (INSTITUTE OF) show

331 314 moveto

-90 rotate (TECHNOLOGY) show

90 rotate

213 231 moveto

90 rotate (WENTWORTH) show

-90 rotate

250 210 moveto

/Helvetica findfont

9 scalefont setfont

(FOUNDED) show

260 202 moveto (1904) show

showpage

%%EOF

To interpret a script like this you will need to add graphics operations such as moveto, rlineto, stroke, rotate, etc. (See the official postscript documentation) You will then need to open a canvas inside a frame to draw on. The commands need to be interpreted, properly translating postscripts coordinate system to Java’s.

Project 4,5 and 6 – Parser for PGN Chess Language

This project involves parsing standardized chess notation. Of course, you will need to know how to play chess in order to build the parser. See The Rules of Chess.

PGN - Portable Game Notation is a standard designed for the representation of chess game data using ASCII text files. PGN is structured for easy reading and writing by human users and for easy parsing and generation by computer programs. A text file composed exclusively of PGN data records should have a file name with ".pgn" as the suffix. More information can be found at this site.

- above quoted from logicalchess.com

Below is an article showing the description of a chess game using PGN:

/ From Mark Weeks,
Your Guide to Chess.
Chess
Portable Game Notation (PGN)

Downloading games is a popular chess activity on the Internet. PGN makes it simple.

Related Resorces
•Chess game downloads

Computers and the Internet have had a profound impact on the game of chess. Opening & game databases, tactical calculation, endgame tablebases, online & email play, computer vs. computer, vs. human, algorithm & game theory, have all been with us for less than an average human lifetime.
The advances in chess have paralleled the advances in computing. The first mainframes to play chess knew little more than how the pieces moved. The best programs today, running on tiny processors, easily beat all but the best human players. One of the greatest advances in the last ten years has been the ability to exchange game scores in digital formats. This has led to databases of millions of games that can be searched in a few seconds.
All commercial chess database software is built around some database format which is proprietary to that software. Databases in that format can be exchanged only between owners of the software. A more general exchange of data between owners of incompatible software packages requires a neutral format which is understood by both packages.
Fortunately for the chess world a neutral format was created in the mid-1990s and caught on quickly. It's called Portable Game Notation (PGN). Files using PGN encoding are usually assigned the extension *.PGN. Here's a sample game in PGN format.
[Event "chp"]
[Site "USA"]
[Date "1956.??.??"]
[Round "?"]
[White "Byrne, D."]
[Black "Fischer, R."]
[Result "0-1"]
1.Nf3 Nf6 2.c4 g6 3.Nc3 Bg7 4.d4 O-O 5.Bf4 d5 6.Qb3 dxc4 7.Qxc4 c6 8.e4 Nbd7 9.Rd1 Nb6 10.Qc5 Bg4 11.Bg5 Na4 12.Qa3 Nxc3 13.bxc3 Nxe4 14.Bxe7 Qb6 15.Bc4 Nxc3 16.Bc5 Rfe8+ 17.Kf1 Be6 18.Bxb6 Bxc4+ 19.Kg1 Ne2+ 20.Kf1 Nxd4+ 21.Kg1 Ne2+ 22.Kf1 Nc3+ 23.Kg1 axb6 24.Qb4 Ra4 25.Qxb6 Nxd1 26.h3 Rxa2 27.Kh2 Nxf2 28.Re1 Rxe1 29.Qd8+ Bf8 30.Nxe1 Bd5 31.Nf3 Ne4 32.Qb8 b5 33.h4 h5 34.Ne5 Kg7 35.Kg1 Bc5+ 36.Kf1 Ng3+ 37.Ke1 Bb4+ 38.Kd1 Bb3+ 39.Kc1 Ne2+ 40.Kb1 Nc3+ 41.Kc1 Rc2+ 0-1
Here's a diagram showing the two character notation for each square on the board.

The format is defined in a document called the PGN Specification and Implementation Guide which can be found by following the 'Chess game downloads' link in the right sidebar of this article. The document defines all necessary formats for encoding chess game data using text files.
A PGN game score has two sections, formally known as the:-
Tag pair section and
Movetext section
The tag pairs are the lines enclosed in square brackets ('[' and ']'). Each tag pair consists of a tag name plus its value. In the tag pair '[Event "chp"]', 'Event' is the tag name and 'chp' is its value. It's clear from the tags that the game score is the well-known 'Game of the century' which 13-year old Bobby Fischer won against Donald Byrne in the 1956 U.S. Championship.
The movetext section is the game score in algebraic notation, which the PGN document calls Standard Algebraic Notation (SAN). The document defines details like 'O-O' instead of '0-0' for castling, and '=' for promotions ('c8=Q').
The tags in the Byrne - Fischer example ('Event', 'Site', 'Date', 'Round', 'White', 'Black', & 'Result') are collectively known as the Seven Tag Roster (STR). [In the computer world everything gets an acronym. There's the famous story of the technician who was confused by the meaning of 'FAN' on a schematic until he was told to look it up in the dictionary.]
Every PGN game score is required to have those seven tags. The PGN document says, 'The interpretation of these tags is fixed as is the order in which they appear. Although the definition and use of additional tag names and semantics is permitted and encouraged when needed, the STR is the common ground that all programs should follow for public data interchange.' If the value of one of the seven tags is unknown, it is filled with '?', as seen in the 'Round' tag from the example.
Other tags in common use are 'ECO', which gives the ECO classification, and 'Annotator', which gives the name of the person annotating the game. Yes, the PGN document standardizes the format for game comments. It also doesn't require a game to start from the initial position. The 'FEN' tag (Forsyth-Edwards Notation), described in the document, can be used to specify any legal chess position.
The great advantage of PGN is that it can be read and easily understood by people and by computers. Written in 1994 by Steven J. Edwards, the PGN document saw widespread distribution via the Internet and within a few years had become *the* standard. Today it is rare to see a text file using any other standard.
What are the disadvantages of the PGN standard? One common complaint is that its text-based encoding is not particularly efficient. The same data in ChessBase format uses about half the disk space. Web site operators with large chess databases often prefer to distribute their data in ChessBase or other format to save on disk costs. Even when compressed using ZIP, the space savings of a chess-specific format are substantial.
Another disadvantage is that there are no general guidelines for the values. The 'Site' tag value is often crammed into the 'Event' tag or vice versa. The values can be terse to the point of incomprehensible : [Event "ct"] means a candidates tournament. Players names are not written uniformly in all databases and not every chess program knows that Chirov and Shirov can be the same person.
But these problems are insignificant compared to the advantages of PGN. Searching and downloading game files is one of the most popular chess activities on the Internet, and PGN can take a lot of credit for this popularity. The quote from the introduction to the PGN document...
If now, while they are one people, all speaking the same language, they have started to do this, nothing will later stop them from doing whatever they propose to do. - Genesis XI, v.6
...is apt indeed.

The PGN standard is described here: