UNCLASSIFIED    Page 1    11/15/2018

AQUAINT PROGRAM

HITIQA Quarterly Report #5

Period covered by this report: December 2001 through March 31, 2003

Report Date: March 31, 2003

Classification Status: UNCLASSIFIED

Latest quarter events appear on shaded background

A. Project and Contact Information

ARDA Tracking number:

Project Title: HITIQA: High-Quality Interactive Question Answering

Technological Area: Information Exploitation. AQUAINT program. Building an automated question answering system for intelligence analysts.

Point of contact:

Prof. Tomek Strzalkowski, Department of Computer Science,

University at Albany, SUNY, 1400 Washington Avenue,

Albany, NY 12222.

Email:

Phone: 518-442-2608; Fax: 518-442-2606

Administrative point of contact:

Ms. Linda Donovan, Office of Sponsored Programs,

University at Albany, SUNY, Management Services Center, Rm. 312,

Albany, NY 12222.

Email:

Phone: 518-437-4555; Fax: 518-437-4560

B. Reportable & Tracking Information

Project Events:

  1. AQUAINT kickoff meeting held at the Xerox Conference Center on December 6-8, 2001
  2. ARDA site review held at SUNY Albany East Campus on April 18, 2002. Attending were representatives of ARDA, CIA and AFRL as well as SUNY Albany and Rutgers.
  3. In consultation with our COTR (John Donelan), we prepared a memo to ARDA explaining why participation in the main TREC QA task is not appropriate for HITIQA. Instead, we will participate in the dialogue evaluation pilot.
  4. AQUAINT mid-year workshop held at Monterey Hyatt on June 11-13, 2002.
  5. We have had a personnel change. Dr. Robert Erbacher has left the project and has been replaced by Professor Boris Yamrom. Prof Yamrom is a graphics and visualization expert. He joined HITIQA as of June 17, 2002.
  6. Mr. Peter LaMonica from AFRL Rome started a summer project. He will develop a “user-validated” interface for HITIQA.
  7. Mr. Tom Palen, an undergraduate in Computer Science Dept. has joined the project for the summer.
  8. Prepared guidelines for AQUAINT Dialogue evaluation pilot to be held as part of TREC QA evaluations. This was done jointly with Jean Scholz and Sanda Harabagiu.
  9. The PI is an editor (jointly with Prof. Harabagiu) of a new book “Advances in Open-Domain Question Answering” to be published by Kluwer in 2003.
  10. Several people attended ACL conference in Philadelphia.
  11. SUNY-Rutgers coordination meeting on information quality assessment held at Rutgers July 9-11.
  12. HITIQA all hands meeting was held at SUNY Albany on August 8.
  13. The PI and co-PI attended SIGIR conference in Tampere, Finland, August 12-14.
  14. SUNY has performed communication tests with NIST in preparation for AQUAINT Dialogue evaluations in October.
  15. Mr. Uzoma Enyinna, a graduate student in information science, joined the project for the summer to assist with information quality experiments.
  16. The ILS Institute has moved to new, larger space on the main SUNY Albany campus.
  17. Dialogue evaluations were conducted online with NIST analysts on October 15, 18 and 22. This was an evaluation pilot including a wizard in the loop (to allow intervention in difficult situations). The results appear to be very good. The next pilot is being prepared for April-May 2003, and will most likely involve fixed-domain scenarios based on CNS data.

  1. Incremental funding of $200K has arrived at SUNY on October 30th. This brings the current allocation to approx. 54% of the project total.
  2. Quarterly project review held at Reston, VA, on Nov. 22. Review attended by John Prange and Paul Matthews (ARDA), John Donelan, Jean-Michel Pomarede, Susan Viscuso (CIA), Tomek Strzalkowski, Sharon Small (SUNY Albany), Paul Kantor (Rutgers) and Boris Yamrom (CUNY). Progress on all research tasks was reported and a HITIQA system demonstration presented. Paul Matthews said HITIQA is the best-progressing, best-managed, and most promising project in the AQUAINT program.
  3. AAAI Spring Symposium paper on interactive question answering accepted for presentation (March 2003).
  4. Additional papers have been submitted to: ACL-03 and HLT-03.
  5. Robert Rittman (Rutgers) attended CNS workshop in Monterey.
  6. AQUAINT program meeting was held at Crystal City Marriott on Dec 3-5. This event was attended by Tomek Strzalkowski, Paul Kantor, Boris Yamrom, Rong Tang, Sharon Small, Ting Liu, and Robert Rittman. We have presented a preliminary system demo that has been well received.
  7. We have intensified contact with AFRL Rome Labs (Chuck Messenger). They will help us to evaluate HITIQA using Air Force environment. A meeting to discuss this has been set for Jan. 7th, 2003.
  8. We held a collaboration meeting at SUNY on January 7th with the AFRL Rome Labs (Chuck Messenger, Peter LaMonica). They will help us to evaluate HITIQA within the Air Force environment, using Early Bird data (to which we have no access). Peter LaMonica has developed a number of scenarios for this database.
  9. The PI (Tomek Strzalkowski) and two students will participate in NRRC summer workshop on document retrieval for QA. We have received notification that the IR-QA summer workshop has been approved by the Executive Committee. The organizing meeting is planned for March 7 at NIST.
  10. We have recruited a new student programmer, Sean Ryan. He will be responsible for maintaining user interface supporting information quality experiments, as well as organization and support of the second phase of experiments.
  11. We are in the process of negotiating a subcontract with City University (Boris’s institution) to hire a student programmer to help with visualization development. We are currently awaiting approval from the sponsor.
  12. Joint SUNY-Rutgers coordination meeting was held at SUNY on February 11. The agenda is attached to this report. We have established a timeline for implementation of the second HITIQA prototype by the April 10 review. We have also decided the details of the second round of quality experiments, and had a detailed discussion of preliminary results from this work thus far.
  13. We held a preliminary on-line evaluation session with AFRL Rome analysts accessing HITIQA via web-based interface. Several sessions were conducted in early March (10-13) using CNS data, with scenarios developed by AFRL.
  14. The second round of Dialogue Evaluations is planned for April 21-24. NIST has released data and scenarios to be used.
  15. The PI and Sharon Small have attended AAAI Spring Symposium on New Directions in Question Answering at Stanford University (March 24-26).
  16. HITIQA Quarterly review is planned for April 10 at SUNY Albany.

Project Accomplishments by task:

  1. Question Semantics (Task 1)

  2. Selected an initial subset of TREC data (Disks 4 and 5) and an initial subset of questions derived from TREC7 and TREC8 queries. These questions will be used in building the preliminary version of the system as well as in user experiments.
  3. Developed preliminary system architecture for end-to-end design. The architecture includes user interface, document search engine, document segmentation and clustering, cluster filtering and answer bootstrapping and selection. In addition, dialogue points have been identified, i.e., the points where the Dialogue Manager may ask clarification questions.
  4. The preliminary end-to-end system has been completed based on the above architecture. It was demonstrated to ARDA at the project review in April 2002.
  5. Integrated SMART search engine over Tipster disks 4 and 5 database.
  6. Clustering algorithm has been developed to sift through the data returned from document search. Cluster signatures support topic separation by automated means. This is also a dialogue point.
  7. We obtained the new AQUAINT text corpus and replaced TREC data. This corpus also supports most of the dialogue evaluation scenarios.
  8. We have developed a system of frames to represent the main attributes of text. Frames are assigned to small chunks of text that form topical clusters. Frames have attributes including: target (the main content element), location, organization, persons, time, monetary values, among others. Simple extraction procedures have been developed to populate attributes within frames. The small size of text chunks helps to control ambiguity. The frames impose a partial structure on text as well as user questions, and help the dialogue manager to conduct meaningful interaction with the user.
  9. We have developed initial integration code between the web interface and HITIQA. Peter LaMonica of AFRL has designed a Web interface to HITIQA. This interface has been integrated as a front end to the HITIQA system. At this time it is possible (in theory) to access HITIQA on-line; however, we will not make it available until we can secure the server against potential intrusions.
  10. We have completed the first version of the text framer that implements question semantics suitable for interactive dialogue. Given the open-domain nature of the data, we have opted thus far for a relatively generic frame model. In the next phase, we will be working with more domain-specific data (the CNS collection); therefore we anticipate a further specialization of the frame model. A summary of the data-driven semantics approach is in the chart below.
  11. We have built and implemented initial framing capabilities for nuclear domain based on CNS data. The goal is to provide more detailed tagging (framing) of text that is known to be within a specific domain. This will in turn allow for more informed dialogue to occur.
  12. We have extended text framing to include the top ten retrieved paragraphs (in addition to the key clusters). This covers potentially highly relevant items that may not have yet occurred often enough in the data stream to produce the signal reinforcement required to form a cluster.
  13. New software: we have installed and integrated InQuery into HITIQA. BBN has reported further software problems with the Linux version of IdentiFinder and has delayed delivery indefinitely. We are expanding GATE capabilities instead.
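As a rough illustration of the frame-based text representation described above, the sketch below shows how a frame with typed attributes, assigned to a small chunk of text, can be compared against a question frame, with attribute mismatches serving as potential dialogue points. The attribute names and the simple overlap score are illustrative assumptions, not HITIQA's actual code.

```python
# Illustrative sketch of the generic frame model (attribute set and scoring
# are assumptions for illustration, not the actual HITIQA implementation).
from dataclasses import dataclass, field

@dataclass
class Frame:
    target: str = ""                               # main content element of the chunk
    locations: list = field(default_factory=list)
    organizations: list = field(default_factory=list)
    persons: list = field(default_factory=list)
    times: list = field(default_factory=list)
    monetary_values: list = field(default_factory=list)

def match_score(question_frame: Frame, text_frame: Frame) -> int:
    """Count attribute overlaps between a question frame and a text frame.
    Attributes that fail to match are candidate clarification (dialogue) points."""
    score = 0
    for attr in ("locations", "organizations", "persons", "times", "monetary_values"):
        if set(getattr(question_frame, attr)) & set(getattr(text_frame, attr)):
            score += 1
    if question_frame.target and question_frame.target == text_frame.target:
        score += 1
    return score
```

Because both user questions and text chunks receive frames, a single comparison function like this can rank retrieved chunks and expose which attributes the dialogue manager should ask about.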


  1. Dialogue Management (Task 2)
  2. Developed a proposal for Wizard of Oz evaluation to test the system’s projected dialogue points.
  3. Developed the question processing architecture to support the dialogue manager, as shown in the figure below. This architecture expands and further specifies the initial end-to-end architecture.

  1. Started development of generic frames for dialogue maintenance. The following have been completed:
  2. Code to determine the “base concept” of a cluster
  3. GATE’s part-of-speech tagger adapted; interface written
  4. JWNL installed on Unix and Linux
  5. We have started development of “real” user interface to the dialogue system. The design is being prepared by Peter LaMonica at AFRL Rome who is working with potential users of the system.
  6. We have implemented a first version of Dialogue Manager for HITIQA. At present it asks clarification questions in yes/no question format and accepts yes/no (or equivalent) answers that lead to manipulation of answer set.
  7. We have developed a general Narrow-Broaden dialogue strategy for interacting with information. In this strategy, the Dialogue Manager alternates between a Content Narrowing Task and a Content Broadening Task. In the narrowing task, it asks clarification questions to narrow the scope of what the system perceives to be the answer space. In the broadening task, the clarification questions are aimed at expanding the scope of the current answer space to include highly related and complementary retrieved information. The order in which these tasks are performed depends upon the size and variability of the Retrieved Set and the Answer Space.
  8. We have completed the first version of the HITIQA Dialogue Manager. It was used in the Dialogue QA evaluation pilot. It has been demonstrated live at Reston review and at AQUAINT 12-month workshop. The current system supports clarification dialogue with the user (for analytical questions). We adopted the dialogue strategy whereby the system attempts to determine the most relevant information space. This is aided by visual interaction. At this point the response generation capabilities are rudimentary.
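The Narrow-Broaden strategy described above can be sketched as a simple control loop. The function names, size thresholds, and yes/no question templates below are hypothetical, chosen only to make the alternation between the two tasks concrete.

```python
# Hedged sketch of the Narrow-Broaden dialogue control described above.
# Thresholds and question templates are invented for illustration.
def choose_task(retrieved_set_size: int, answer_space_size: int,
                max_answer: int = 10, min_answer: int = 2) -> str:
    """Pick narrowing when the answer space is too large,
    broadening when it is too small relative to the retrieved material."""
    if answer_space_size > max_answer:
        return "narrow"
    if answer_space_size < min_answer and retrieved_set_size > answer_space_size:
        return "broaden"
    return "done"

def dialogue_turn(task: str, near_miss_topics: list) -> str:
    """Generate a yes/no clarification question for the chosen task."""
    if task == "narrow":
        return f"Are you interested specifically in {near_miss_topics[0]}? (yes/no)"
    if task == "broaden":
        return f"Should I also include material on {near_miss_topics[0]}? (yes/no)"
    return "Here is the current answer space."
```

Each yes/no answer would then shrink or grow the answer space, after which `choose_task` is consulted again, matching the clarification-question format the current Dialogue Manager supports.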

  1. Current status of the system is depicted in the figure below:

  1. We have completed initial framing capabilities for nuclear domain based on CNS data. The goal is to provide more detailed tagging (framing) of text that is known to be within a specific domain. This will in turn allow for more informed dialogue to occur. We are currently expanding this set to cover the WMD sub-domain.
  1. The new web interface to HITIQA has been completed and is being tested. This interface will be used in April evaluations conducted by NIST. Prior to this we have set up a series of tests to be performed by AFRL Rome analysts. For this purpose, Chuck Messenger has asked AFRL analysts to create a set of scenarios based on CNS data. These scenarios will be used in testing HITIQA in mid March.
  2. Additional settings selections have been added to HITIQA to allow more extensive testing with multiple options. This includes the number of near-miss topics considered, number of mismatches allowed, etc.
  3. Further, we added options for continuation questions after the initial question triaging is completed. This is a major expansion of HITIQA dialogue capabilities that will first be demonstrated at the April review. We expect tests to be conducted with AFRL Rome assistance in mid-March.
  4. The second version of HITIQA dialogue manager has been implemented with specialization for CNS domain. This is currently being tested in the lab. This includes construction of simple gazetteer for the WMD domain.
  1. Information Quality (Task 3)
  2. Completed literature review on information quality.
  3. IRB protocol submitted and approved. The protocol was revised and resubmitted in mid-February. It received approval on 2/25/02.
  4. Completed the first Focus Group study on March 8, 2002:
  5. The purpose was to learn the information quality criteria used by professional journalists in preparation of news stories based on sources such as newswire services, consulting services, and live interviews.
  6. Participants have been recruited from the SUNY-Albany Communication Department, Journalism Program, Northeast Public Radio (WAMC), Times Union, TV Channels 6 (WRGB), 10 (WTEN) and 13 (WNYT), SUNY-Albany Media Relations and Publications, and the Associated Press.
  7. Session conducted at Times Union in Albany from 2 to 4 pm. All recorded and transcribed.
  8. GUI for quality assessment experiments has been developed. This interface, implemented in tcl/tk, runs on a Windows client and accesses a database on Linux server. Users review documents and assess their quality along several dimensions, supporting their assessments by textual evidence.
  9. TREC-based document collection. Developed an initial document collection for quality assessment experiments based on 5 TREC queries converted into analytical questions, and approx. 200 documents per topic retrieved from TREC disks 4 and 5.
  10. The experimental lab (500 sq. ft) has been set up at SUNY. We are in process of installing necessary software and testing the GUI before the experimental users are invited.
  11. Developed statistical model for evaluating student judges against experts
  12. Ran expert quality assessment and annotation sessions at SUNY and Rutgers. A total of 100 documents were judged by 20 experts. Four expert sessions were conducted in May-early June.
  13. Student Judgment Sessions. A total of 60 SUNY Albany student participants were recruited for the training/testing sessions. Preliminary testing sessions were conducted in mid-June: 10 sessions with 45 participants. Of these, 35 participants were invited for further experiments. The main judgment sessions started on June 26 and will continue through July 19, three times a week.
  14. Quality Judgment Experiment Data analysis and Document Processing
  15. We have prepared the list of textual features to be collected from documents for alignment with quality indicators. Programming scripts for textual feature extraction have been completed and validated on a subset of 30 random documents. Subsequently, the feature-extraction programs were run over all 1000 documents for which human quality assessment has been performed.
  16. Factor analysis on the 9 quality scores has revealed two discernible dimensions. One is identified as the “style – content” axis; the other seems to be an axis related to content, with the extremes tentatively identified as “depth” and “breadth”.
  17. We are currently working to establish correlations between the quality factors (either the 9 original factors, or the 2 meta-factors) and the occurrence and density of certain textual features.
  18. We have conducted a series of experiments to align human quality judgments with textual features computed automatically from documents. Preliminary results suggest that such correlation may exist for some qualities, for example DEPTH and MULTIVIEW. In general, the prediction rates for quality factors are promising but not fully reliable yet.
  1. We have performed an analysis of the relative effectiveness of “massive” models, “statistically efficient” models, and “sensible” models. It is clear that the latter two outperform the brute-force massive “kitchen sink” model.
  2. We have also experimented with using Wordnet to improve recognition of authoritative sources.
  3. We have selected 1100 documents from CNS data and 600 documents from AQUAINT data set to be used for quality experiments in the second phase. The documents were selected based on 10 topics (see below). The documents have been screened for length balance and re-tagged uniformly using HTML tags similar to those used in TREC collection.
  4. In addition we have selected and tagged 500 documents from the Web and added these to the collection. Web documents are expected to display significantly different quality characteristics than well-formed news.
  5. We have created 10 search topics for information quality experiments: North Korea Missile; U.S. Policy China Taiwan; Nuclear Weapon Free Zone; Nuclear warheads and Iraq; Defense spending and national government; Peaceful Use of Nuclear Energy; Arms Race in Outer Space; International arms control; High tech export control; History of Chemical weapons.
  6. We have created 3 quality classifiers, for the qualities Depth, Multi-view, and Objectivity. These 3 classifiers perform significantly better than chance on TREC and AQUAINT data, and we plan to include them in our demonstration. The classifiers predict HIGH values of those qualities, which can be color coded in the interface display.
  7. The second round of information quality experiments started, with expert assessment sessions currently being conducted at SUNY. Student sessions are planned for April. Documents used include AQUAINT database, CNS database and documents harvested from the web (see above).
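As an illustration of the feature-based quality prediction described above, the sketch below pairs a toy textual feature extractor with a linear scoring rule that predicts a HIGH value of one quality (e.g. DEPTH). The features, weights, and threshold are invented for illustration; the project's actual classifiers are trained statistically on the human-judged documents.

```python
# Illustrative sketch only: toy features and hand-set weights standing in for
# the project's trained quality classifiers (e.g. for DEPTH).
def extract_features(text: str) -> dict:
    """Compute a few simple textual features from a document."""
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    return {
        "length": len(words),                                  # document length in words
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "quote_density": text.count('"') / max(len(words), 1), # quoted material per word
    }

def predict_high_depth(features: dict, weights: dict, threshold: float = 1.0) -> bool:
    """Linear score over features; True means the quality is predicted HIGH."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return score >= threshold
```

A HIGH prediction from a classifier like this is what would be color-coded in the interface display, as noted above.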
  1. Information Visualization (Task 4)
  2. Began construction of interface API for visualization environment to access document clustering.
  3. Created initial specification of data format to be passed to visualization environment.
  4. Started implementation of interface API to access the visualization environment through JNI.
  5. We have identified conceptual visualization for topical clusters. This was presented to ARDA at the site visit in April. We are proceeding with implementation. The first version is planned for the October review.
  6. Implemented 3D (using VTK Java wrappers) and 2D (Java only) visualizations of Frames relative to HITIQA requirements. Currently the remote response to 3D interactions is not acceptable (too slow over the network), so in the first stage we are integrating the 2D visualization into HITIQA. Later on, 3D will be introduced as an option for cases where the system is used locally.
  7. We are investigating the possibility of modifying the HITIQA design into a client/server application instead of a monolithic one. This would allow splitting off the GUI part, which could run as a remote client, from the document-processing server part running on a separate computer.
  8. The current 2-D visualization has been incorporated into HITIQA system. The user can navigate through the visual displays reviewing the Retrieved Set, the main Answer Space and related information. Values of selected frame attributes are shown for quick assessment of where to go next.
  9. The visualization module has been integrated into the main HITIQA system. At present 2-D visualization using color-coded maps is used. We are also exploring using 3-D visualization (although it is significantly more computationally expensive). The concept of 3-D version (including information quality dimension) has been demonstrated at AQUAINT 12-month workshop. Current 2-D visualization panel is shown below:
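The color-coded 2-D map idea can be sketched roughly as follows: each cluster or frame is rendered as a cell whose color reflects how closely it matches the question, so the user can see at a glance what belongs to the answer space, what is a near-miss worth a clarification question, and what is background. The three-color scale and the cutoffs are assumptions for illustration, not the actual HITIQA palette.

```python
# Hedged sketch of the 2-D color-coded map; scale and cutoffs are assumed.
def color_for_score(score: int, max_score: int = 5) -> str:
    """Map a frame match score to a rough heat color."""
    if max_score <= 0:
        return "gray"
    ratio = min(score, max_score) / max_score
    if ratio >= 0.8:
        return "red"      # on-topic: part of the answer space
    if ratio >= 0.4:
        return "orange"   # near-miss: candidate for clarification dialogue
    return "blue"         # off-topic / background

def render_map(scores):
    """Turn a 2-D grid of match scores into a grid of cell colors."""
    return [[color_for_score(s) for s in row] for row in scores]
```

Showing selected frame-attribute values alongside such a colored grid, as the report describes, gives the user a quick assessment of where to navigate next.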