Lab 2: NLE Software in Windows

Week 3, Tuesday 14th

Two suites of NLE tools are installed on the lab machines. Xelda, the Xerox-developed suite of NL processing tools based on finite-state technology, includes tools for tokenising, morphological analysis, and part-of-speech tagging. WordNet is a lexical database. The goal of this lab is to get acquainted with Xelda; we will cover WordNet in a subsequent lab.

1. Xelda

The Xelda suite includes tokenizers, part-of-speech taggers, NP extractors, and morphological analysers for several languages, including English, French, German, Spanish, Italian, Portuguese, and Russian. The tools have a client/server architecture: client programs (such as the application program jxelda, described below, or your own Java programs) send requests to the Xelda server, which performs all the calculations.

The tools are in the folder

C:\Xelda

Fairly good online documentation is available at

C:\Xelda\doc\manuals\home.html

Exercise 1: open the folder C:\Xelda, and the online documentation file.

1.1 Starting the server

To use Xelda, whether through the graphical interface or from your own Java program, you first need to start the server. To start the Xelda server, click on the Start button, then select Programs, Xelda, and finally xelda_server. This should open a command window. (Alternatively, change to the folder containing the xelda_server script (C:\Xelda\bin) and double-click on xelda_server.bat.)
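Because every client depends on the server being up, it can be handy to check for it before starting a client. The following is a minimal sketch, assuming the server listens on TCP port 40002 of the local machine (the port used in the newService call in section 1.3). The class ServerCheck and its method isListening are hypothetical helpers written for this handout; they are not part of Xelda, and a successful connection only shows that some program is listening on that port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Xelda: tests whether any program
// is accepting TCP connections on a given host and port.
class ServerCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not reachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: 40002 is the port the Xelda server listens on.
        int port = 40002;
        if (isListening("localhost", port, 1000)) {
            System.out.println("Something is listening on port " + port);
        } else {
            System.out.println("Nothing is listening on port " + port
                    + "; start xelda_server first");
        }
    }
}
```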

1.2 Jxelda

After you start the server, the simplest way to see what Xelda can do is to start the application client jxelda. From the Start button, choose Programs, Xelda, and jxelda; or from the C:\Xelda\bin folder, click on jxelda.bat. A second command window should open. Then two more windows should open, one called simply 'Jxelda', the other 'Jxelda Result'. You should get a window that looks like this (ignore the text for the moment):

Next, you have to work around a small problem with the current setup. Some time after starting jxelda, a window will appear saying that the connection with ' is not possible. Just click OK; then, in the Jxelda window, click on 'Options', then 'New Connection'. A new window will appear; click 'OK' again.

Now everything should be working. The first thing you should try is the tokenizer. Type the following in the window:

Clairson International Corp. said it expects to report a net loss for its second quarter ended March 26 and doesn't expect to meet analysts' profit estimates of $3.9 to $4 million, or 76 cents a share to 79 cents a share, for its year ending Sept. 24.

Then select the text with the mouse (see above) and click on 'Tok'. In the 'Jxelda Result' window, you'll see that the Xelda tokenizer correctly identifies the last period as an end-of-sentence marker, whereas the earlier periods are treated as part of a figure ($3.9) or of an abbreviation (Sept.).
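To make the decision about periods concrete, here is a minimal sketch of a toy tokenizer in Java. It is not how Xelda works (Xelda uses finite-state technology); it simply assumes a tiny hand-written abbreviation list and peels punctuation off the end of each whitespace-separated chunk. The class ToyTokenizer and the contents of the abbreviation list are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: NOT the Xelda tokenizer.
class ToyTokenizer {

    // Assumption: a tiny hand-picked abbreviation list for this example.
    private static final Set<String> ABBREVIATIONS =
            new HashSet<>(Arrays.asList("Corp.", "Sept."));

    private static final String PUNCTUATION = ",.;:!?";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue;
            }
            String word = chunk;
            List<String> peeled = new ArrayList<>();
            // Peel trailing punctuation unless the chunk is a known abbreviation.
            // A period inside a chunk (as in $3.9) is never at the end,
            // so it is left alone.
            while (word.length() > 1
                    && !ABBREVIATIONS.contains(word)
                    && PUNCTUATION.indexOf(word.charAt(word.length() - 1)) >= 0) {
                peeled.add(0, String.valueOf(word.charAt(word.length() - 1)));
                word = word.substring(0, word.length() - 1);
            }
            tokens.add(word);
            tokens.addAll(peeled);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Clairson International Corp. expects profit of $3.9 "
                + "to $4 million, for its year ending Sept. 24.";
        for (String token : tokenize(text)) {
            System.out.println(token);
        }
    }
}
```

In this sketch 'Corp.', '$3.9' and 'Sept.' stay whole, while the comma and the final period become tokens of their own. A fixed abbreviation list breaks down quickly on real text, which is one reason to use a proper tokenizer.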

Now click on 'Morp'. This calls the morphological analyzer, which produces a set of hypotheses about the POS tags of the tokens in the 'Jxelda Result' window. For example, Xelda will notice that 'report' can be classified either as an NN or as a VB:

report

» report+NN

» report+VB

To get Xelda to assign each word only the most plausible tag in its context, click on 'Disam'. You'll see that Xelda correctly classifies 'report' as a VB in this context.
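The two steps (listing candidate tags, then choosing one in context) can be illustrated with a toy Java sketch. This is not Xelda's method: the mini-lexicon and the two hand-written rules below are assumptions made up for this example, and the class ToyDisambiguator is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: NOT how Xelda analyses or disambiguates.
class ToyDisambiguator {

    // Assumption: a hand-written mini-lexicon of candidate tags.
    static final Map<String, List<String>> LEXICON = new HashMap<>();
    static {
        LEXICON.put("it", Arrays.asList("PRP"));
        LEXICON.put("expects", Arrays.asList("VBZ"));
        LEXICON.put("to", Arrays.asList("TO"));
        LEXICON.put("the", Arrays.asList("DT"));
        LEXICON.put("report", Arrays.asList("NN", "VB"));
    }

    // "Morphological analysis": every tag the word could have.
    static List<String> candidates(String word) {
        return LEXICON.getOrDefault(word.toLowerCase(), Arrays.asList("UNKNOWN"));
    }

    // "Disambiguation": choose one tag by looking at the previous word.
    static String disambiguate(String previousWord, String word) {
        List<String> options = candidates(word);
        if (options.size() == 1) {
            return options.get(0);
        }
        List<String> previous = previousWord == null
                ? Collections.<String>emptyList()
                : candidates(previousWord);
        // Hand-written rules (assumptions): verb after "to", noun after a determiner.
        if (previous.contains("TO") && options.contains("VB")) {
            return "VB";
        }
        if (previous.contains("DT") && options.contains("NN")) {
            return "NN";
        }
        return options.get(0);
    }

    public static void main(String[] args) {
        String[] words = {"it", "expects", "to", "report"};
        String previous = null;
        for (String word : words) {
            System.out.println(word + " " + candidates(word)
                    + " -> " + disambiguate(previous, word));
            previous = word;
        }
    }
}
```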

Finally, clicking on 'NP' will activate Xelda's NP identifier, a partial parser that identifies all NPs in the text. (E.g., 'net loss' will be identified as an NP consisting of an adjective, JJ, and a noun, NN.)
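As a rough illustration of what a partial parser for NPs does, the Java sketch below scans a sequence of POS tags for the pattern "optional determiner or possessive pronoun, any number of adjectives, one or more nouns". The pattern, the class ToyNpChunker, and the hand-assigned tags in the example are all assumptions for illustration; they do not reproduce Xelda's grammar or its output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only: NOT Xelda's NP identifier.
class ToyNpChunker {

    // Finds word sequences whose tags match (DT|PRP$)? JJ* NN+
    static List<String> findNounPhrases(String[] words, String[] tags) {
        List<String> phrases = new ArrayList<>();
        int i = 0;
        while (i < tags.length) {
            int j = i;
            // Optional determiner or possessive pronoun.
            if (tags[j].equals("DT") || tags[j].equals("PRP$")) {
                j++;
            }
            // Any number of adjectives.
            while (j < tags.length && tags[j].equals("JJ")) {
                j++;
            }
            // One or more nouns (NN, NNS, NNP, ...).
            int nounStart = j;
            while (j < tags.length && tags[j].startsWith("NN")) {
                j++;
            }
            if (j > nounStart) {
                phrases.add(String.join(" ", Arrays.copyOfRange(words, i, j)));
                i = j;
            } else {
                i++; // no noun found, so no NP starts here
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Tags assigned by hand for this example, not taken from Xelda.
        String[] words = {"a", "net", "loss", "for", "its", "second", "quarter"};
        String[] tags = {"DT", "JJ", "NN", "IN", "PRP$", "JJ", "NN"};
        for (String phrase : findNounPhrases(words, tags)) {
            System.out.println(phrase);
        }
    }
}
```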

Exercise 2: analyze the results of the NP parser. Does it correctly identify all and only the NPs in the text? For example, has "its second quarter" been recognized as an NP?

1.3 Calling Xelda from a Java program: the Xelda API

It is also possible to use Xelda from a Java program using the Xelda API. Again, this involves calling a server. The crucial steps in doing this are:

creating an instance of the class XeldaApi:

XeldaApi xelda_api = new XeldaApi();

creating a buffer in which the results of the connection will be stored:

StringBuffer connection_result = new StringBuffer();

starting up a connection:

xelda_api.newService("socket", "localhost", 40002, "", connection_result);

Once the connection is created, the server can be called to perform various services, including tokenization, morphological analysis, disambiguation, and NP extraction. For example, the following code implements a request to tokenise the string "This is a test":

ResultReq resreq = xelda_api.tokenization("HTML",
                                          "ascii",
                                          "This is a test",
                                          "English",
                                          "FSM");

On the next page you’ll find a simple program that puts all of these steps together and prints the output of the tokenizer (the program is in the cscourse folder, accessible at , and is called XeldaTestOne.java).

Before running the program, make sure you start the server, as when using jxelda. You also have to make sure that the archive C:\xelda\lib\xeldaapi.jar is in your CLASSPATH variable.
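If the program fails because the Xelda classes cannot be found, the classpath is the first thing to check. The sketch below prints whether an archive named xeldaapi.jar appears in the classpath the JVM is actually using, which it exposes through the standard java.class.path system property. The class ClasspathCheck and its method are hypothetical helpers, not part of Xelda, and the check only compares file names; it does not verify that the archive exists or is readable.

```java
import java.io.File;

// Hypothetical helper, not part of Xelda.
class ClasspathCheck {

    // Returns true if some classpath entry has the given file name.
    static boolean classpathContains(String classpath, String jarName) {
        // Entries are separated by ';' on Windows and ':' on Unix-like systems.
        for (String entry : classpath.split(File.pathSeparator)) {
            if (new File(entry).getName().equalsIgnoreCase(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The classpath this JVM was started with.
        String classpath = System.getProperty("java.class.path");
        if (classpathContains(classpath, "xeldaapi.jar")) {
            System.out.println("xeldaapi.jar is on the classpath");
        } else {
            System.out.println("xeldaapi.jar is NOT on the classpath: " + classpath);
        }
    }
}
```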

There are more examples of calls to Xelda in the Xelda Java API User Guide (which can be reached from C:\xelda\doc\manuals\home.html).

Exercise 3: copy XeldaTestOne.java into your directory, and modify it so that it tokenizes the example text about Clairson International Corp. used to test jxelda earlier in the lab.