CRAFT / Thursday, July 12, 10:15, Room INM200

Facilitating Reliable Content Analysis of Corpus Data with Automatic and Semi-automatic Text Classification Technology

Abstract

In this talk I introduce the objectives of the TagHelper project, an applied research project exploring the use of text classification technology to support corpus coding for behavioral research. A wide range of behavioral researchers, including social scientists, psychologists, learning scientists, and education researchers, collect, code, and analyze large quantities of natural language corpus data as an important part of their research. However, although text classification technology is readily available within the computational linguistics community, none of the tools that behavioral researchers commonly use for their coding efforts make use of it.

I will begin with an overview of the project, briefly discussing the alternative types of corpus coding we are experimenting with. I will then discuss an evaluation of the bias introduced when corpus analysts are exposed to imperfect coding predictions that they must correct. We evaluate this bias in terms of its impact on the speed, reliability, and validity of coding. Finally, I will present an in-depth discussion of our work on automating the application of a multi-dimensional process analysis. This analysis was used on corpus data collected from a series of investigations into the impact of collaboration scripts in a computer-supported collaborative learning setting (in collaboration with KMRC, Tuebingen). The result of this work in progress is a suite of techniques that can automate 80% to 100% of the coding on all 7 dimensions of this multi-dimensional coding scheme with an acceptable level of reliability.

Bio

Carolyn Rose is a Research Scientist with a 50/50 joint appointment between the Language Technologies Institute and the Human-Computer Interaction Institute at Carnegie Mellon University. She earned her PhD in Language and Information Technologies from Carnegie Mellon in 1997. She then worked as a Research Associate at the Learning Research and Development Center for 6 years, working on tutorial dialogue systems. She has been at Carnegie Mellon as a faculty member since Fall 2003.

Carolyn Rose's primary research objective is to develop and apply advanced interactive technology to enable effective computer-based and computer-supported instruction. A particular focus of her research is the role of explanation and language communication in learning. Thus, one major thrust of her research is developing and applying language technology to the problem of eliciting, responding to, and automatically analyzing student verbal behavior. However, many of the underlying HCI issues central to her work, such as influencing student expectations, motivation, and learning orientation, transcend any specific input modality.