CRAFT / Thursday, July 12, 10:15 :: Room INM200
Facilitating Reliable Content Analysis of Corpus Data with Automatic and Semi-automatic Text Classification Technology
Abstract
In this talk I introduce the objectives of the TagHelper project, an
applied research project exploring the use of text classification
technology to support corpus coding for behavioral research.
A wide range of behavioral researchers including social scientists,
psychologists, learning scientists, and education researchers collect, code,
and analyze large quantities of natural language corpus data as an important
part of their research. However, despite the availability of text
classification technology within the computational linguistics community,
none of the tools that are commonly used by behavioral researchers for their
coding efforts make use of it.
I will begin with an overview of the project, briefly discussing the
alternative types of corpus coding we are experimenting with. I then
discuss an evaluation of the bias introduced when corpus analysts are
exposed to imperfect coding predictions that they must correct. We
evaluate this bias in terms of its impact on the speed, reliability, and
validity of coding. Finally, I present an in-depth discussion of our work
on automating a multi-dimensional process analysis applied to corpus data
collected in a series of studies of the impact of collaboration scripts in
a computer-supported collaborative learning setting (in collaboration with
KMRC, Tuebingen). The result of this work in progress is a suite of
techniques that can automate 80% to 100% of the coding on each of the 7
dimensions of this multi-dimensional coding scheme with an acceptable
level of reliability.
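The abstract does not say which reliability statistic the project uses. Purely as an illustration, agreement between a human coder and automatic predictions is often reported with Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance. The sketch below assumes two coders who each assign exactly one label per segment; the function name `cohens_kappa` and the example labels are hypothetical and are not taken from TagHelper or from the seven-dimension coding scheme.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length sequences of category labels.

    Hypothetical helper for illustration; assumes one label per segment.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(coder_a)

    # Observed agreement: fraction of segments given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: computed from each coder's marginal label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both coders used one identical label everywhere; kappa is 0/0 here,
        # and this sketch treats it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, with a human coding of `["claim", "claim", "question", "other", "claim", "question"]` and automatic predictions of `["claim", "question", "question", "other", "claim", "claim"]`, the two agree on 4 of 6 segments (about 0.67 raw agreement), but chance agreement is 7/18, so kappa is 5/11, or about 0.45. This gap is why coding reliability is usually reported with a chance-corrected measure rather than percent agreement alone.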
Bio
Carolyn Rose is a Research Scientist with a 50/50 joint appointment
between the Language Technologies Institute and the Human-Computer
Interaction Institute at Carnegie Mellon University. She earned her PhD
in Language and Information Technologies from Carnegie Mellon in 1997.
She then spent 6 years as a Research Associate at the Learning Research
and Development Center, where she worked on tutorial dialogue systems.
She has been a faculty member at Carnegie Mellon since fall 2003.
Carolyn Rose's primary research objective is to develop and apply advanced
interactive technology to enable effective computer-based and
computer-supported instruction. A particular focus of her research is the
role of explanation and language communication in learning. Thus, one
major thrust of her research is developing and applying language
technology to the problem of eliciting, responding to, and automatically
analyzing student verbal behavior. However, many of the underlying HCI
issues central to her work, such as influencing student expectations,
motivation, and learning orientation, transcend the specific input
modality.