
A technique for automatically scoring open-ended concept maps

by

Ellen M. Taricani and Roy B. Clariana

The Pennsylvania State University

College of Education

Editorial contact:

Dr. Roy Clariana

Penn State University

1510 S. Quebec Way #42

Denver, CO 80231

303-369-7071

Educational Technology Research and Development, 54, 61-78.

Running Head: Scoring open-ended concept maps...


Abstract

This descriptive investigation seeks to confirm and extend a technique for automatically scoring concept maps. Sixty unscored concept maps from a published dissertation were scored using a computer-based technique adapted from Schvaneveldt and colleagues’ (1990) Pathfinder network approach. The scores were based on link lines drawn between terms and on the geometric distances between terms. These concept map scores were compared to terminology and comprehension posttest scores. Concept map scores derived from link data were more related to terminology, whereas concept map scores derived from distance data were more related to comprehension. A step-by-step description of the scoring technique is presented and the next steps in the development process are discussed.

______

Keywords: computer-based assessment, Pathfinder networks, technology tools


Currently there is considerable interest in the use of concept maps both to promote and to measure meaningful learning (Shavelson, Lang, & Lewin, 1994). Concept maps are sketches or diagrams that show the relationships among a set of terms by the positions of the terms and by labeled lines and arrows connecting some of the terms. Guided by Ausubel’s (1968) theory, teachers and researchers, mainly in science education, have considered concept hierarchy, proposition correctness, and cross-concept links as the most salient features of concept maps (Rye & Rubba, 2002).

Concept maps are most often scored by raters using rubrics to quantify content, meaning, and visual arrangement. Extensive empirical research has shown that scoring approaches with the highest reliability and criterion-related validity compare specific features in student concept maps to those in expert referent maps (Ruiz-Primo & Shavelson, 1996). Because of the nature of the information in concept maps and also the idiosyncrasies of individual maps, raters almost necessarily must be subject-matter experts in the content of the map. Also, scoring concept maps by hand takes time and experience.

McClure, Sonak, and Suen (1999) reported that raters can be overwhelmed by complex scoring rubrics. After comparing six concept map scoring approaches, they concluded that the reliability of concept map scores decreases substantially as the cognitive complexity of the scoring task increases. Taken together, these issues militate against the casual use of concept maps for classroom assessment with all but the simplest scoring rubrics.

Several large-scale projects are underway that seek to automate concept map scoring (Cañas, Coffey, Carnot, Feltovich, Hoffman, Feltovich, & Novak, 2003; Herl, O'Neil, Chung, & Schacter, 1999; Luckie, 2001). This investigation considers a concept map scoring technique, based on Pathfinder associative networks (Schvaneveldt, Dearholt, & Durso, 1988), that does not require raters.

Pathfinder Associative Networks

Pathfinder networks (PFNets) are a well-established method for representing knowledge that has been applied in a number of domains of interest to instructional designers (Jonassen, Beissner, & Yacci, 1993). PFNets are 2-dimensional graphic network representations of a matrix of relationship data in which concepts are represented as nodes and relationships as unlabeled links connecting the nodes. PFNets visually resemble concept maps, but without linking terms.

There are three steps in the Pathfinder approach. In Step 1, raw proximity data are collected, typically using a word-relatedness judgment task. Participants are shown a set of terms two at a time and judge the relatedness of each pair of terms on a scale from one (low) to nine (high). The number of pair-wise comparisons that participants must make is (n² – n)/2, with n equal to the total number of terms in the list; a list of 25 terms, for example, requires 300 comparisons. In Step 2, a software tool called Knowledge Network and Orientation Tool for the Personal Computer (KNOT, 1998) is used to reduce the raw proximity data into a PFNet representation. Pathfinder uses an algorithm to determine a least-weighted path that links all of the terms. The rules for calculating the least-weighted path can be adapted by adjusting parameters that reduce or prune the number of links in the resulting PFNet (refer to Dearholt & Schvaneveldt, 1990). The resulting PFNet is based on a data reduction approach that is purported to represent the most salient relationships in the raw proximity data. In Step 3, the similarity of the participant’s PFNet to an expert referent PFNet is calculated, also using KNOT software (Goldsmith & Davenport, 1990). The total number of derived links shared by two PFNets is called links in common. Common is a non-negative integer that ranges from zero to the maximum number of links in the referent PFNet.
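The three steps can be illustrated with a minimal Python sketch. It is not the KNOT software: the function names are hypothetical, the two small matrices are invented, and the entries are assumed to be distances (smaller means more related), so relatedness ratings would first have to be converted. The pruning rule shown corresponds to one common Pathfinder parameter setting (r = infinity, q = n – 1), under which a direct link is kept only if no indirect path has a smaller largest step.

```python
from itertools import combinations


def n_pairs(n):
    """Number of pair-wise comparisons for n terms: (n^2 - n) / 2."""
    return (n * n - n) // 2


def pfnet_links(dist):
    """Links kept by a Pathfinder network with r = infinity and q = n - 1.

    dist is a symmetric n x n list of lists of distances with zeros on the
    diagonal. The weight of a path is taken as its largest single step.
    """
    n = len(dist)
    # minimax[i][j]: smallest achievable "largest step" on any path i -> j
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # keep a direct link only if no indirect path beats it
    return {(i, j) for i, j in combinations(range(n), 2)
            if dist[i][j] <= minimax[i][j]}


def links_in_common(links_a, links_b):
    """Count the links shared by two PFNets (the Common score)."""
    return len(links_a & links_b)


# Invented 4-term distance matrices, for illustration only.
student = [[0, 1, 4, 6],
           [1, 0, 2, 5],
           [4, 2, 0, 3],
           [6, 5, 3, 0]]
expert = [[0, 1, 2, 7],
          [1, 0, 5, 3],
          [2, 5, 0, 6],
          [7, 3, 6, 0]]

student_links = pfnet_links(student)
expert_links = pfnet_links(expert)
print(n_pairs(4), sorted(student_links), sorted(expert_links),
      links_in_common(student_links, expert_links))
```

In this toy case each PFNet keeps three of the six possible links, and the two networks share one link, so Common equals 1.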

How can this Pathfinder approach be used to score concept maps? Several investigators have shown that concept map-like tasks capture some of the same types of relational information as word-relatedness judgment tasks (Jonassen, 1987; McClure et al., 1999; Rye & Rubba, 2002; Schau, Mattern, Zeilik, & Teague, 1999). Compared to word-relatedness judgment tasks, concept maps are relatively fast to complete and perhaps more authentic (Shavelson et al., 1994). In the present investigation, concept maps provide an alternative to word-relatedness judgment tasks in Step 1 for obtaining raw proximity data, while Steps 2 and 3 are conducted in the conventional way. Thus, the main contribution of this investigation is in clearly describing how components of a concept map can be converted into raw proximity data in Step 1, and then describing the validity and reliability of the scores obtained.

An Automatic Technique for Scoring Concept Maps

What information components of concept maps can be collected automatically? Concepts, links, and linking terms can be counted in various ways. In addition, following Kitchin (2000), Yin, Vanides, Ruiz-Primo, Ayala, and Shavelson (2004) have proposed that map structure complexity, as determined by examining the overall visual layout of the map, should also be considered. Automatically measuring these components of concept maps is easier with closed-ended concept mapping tasks, where the student is provided with a predefined list of concepts and linking terms, as with the concept map scoring software used by Herl et al. (1999). However, many investigators refer to open-ended concept mapping, where participants may use any concepts and linking terms in their maps, as the gold standard for capturing students’ knowledge structures (McClure et al., 1999; Ruiz-Primo, Schultz, Li, & Shavelson, 1999; Yin et al., 2004, p. 24). But automatically scoring open-ended concept maps is considerably more difficult.

Clariana, Koul, and Salehi (in press) piloted a technique for scoring open-ended concept maps. Practicing teachers enrolled in graduate courses constructed concept maps on paper while researching the topic, “the structure and function of the human heart and circulatory system” online. Participants were given the online addresses of five articles that ranged in length from 1,000 to 2,400 words but were encouraged to view additional resources. After completing their research, participants used their concept map as an outline to write a 250-word text summary of this topic (see Clariana, 2003). Computer software tools (Clariana, 2002) were used to measure the geometric distances between terms in the concept maps, referred to as distance data, and to count the link lines that connected terms, referred to as link data (see Figure 1). Using Pathfinder KNOT software, the raw distance and link data were converted into PFNets and then were compared to an expert’s PFNets to obtain similarity scores. Five pairs of raters using rubrics also scored all of the concept maps and text summaries. The rater-based concept map scores correlated (Pearson r) 0.36 with the concept map link-based scores, 0.54 with the concept map distance-based scores, and 0.49 with the rater-based text summary scores. The rater-based text summary scores correlated 0.76 with the concept map link-based scores and 0.71 with the concept map distance-based scores. The authors concluded that these “automatically derived concept map scores can provide a relatively low-cost, easy to use, and easy to interpret measure of students’ science content knowledge.”

Figure 1

The link and distance data arrays of a simple map.
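As a rough illustration of the two raw data arrays, the following Python sketch builds a distance array and a link array from a simple map. The term names, coordinates, links, and the lower-triangle layout are assumptions made for this example; the software tools cited above (Clariana, 2002) are not reproduced here.

```python
import math


def distance_array(positions):
    """Lower-triangle array of Euclidean distances between term positions.

    positions maps each term to its (x, y) location on the map.
    """
    terms = list(positions)
    return [[round(math.dist(positions[terms[i]], positions[terms[j]]), 1)
             for j in range(i)]
            for i in range(len(terms))]


def link_array(terms, links):
    """Lower-triangle array: 1 if a link line joins two terms, else 0."""
    linked = {frozenset(pair) for pair in links}
    return [[1 if frozenset((terms[i], terms[j])) in linked else 0
             for j in range(i)]
            for i in range(len(terms))]


# Invented three-term map, for illustration only.
positions = {"heart": (0, 0), "aorta": (3, 4), "lungs": (6, 8)}
links = [("heart", "aorta"), ("aorta", "lungs")]

print(distance_array(positions))
print(link_array(list(positions), links))
```

Each row holds the values for one term paired with every term listed before it, so the first row is empty and the last row has one entry per earlier term.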

It is important to note that “link line” does not correspond to the term “proposition” used in the concept map literature. A proposition is the combination of two concepts (e.g., subject–predicate) and a linking term (e.g., verb) that describes the relationship between the two concepts. For example, in the proposition, “the aorta is a blood vessel”, “aorta” and “blood vessel” are concepts and “is a” is the linking term. Typically, propositions are scored based on correctness, which includes deciding whether the linking term is valid and significant for that context. Recently, Harper, Hoeft, Evans, and Jentsch (2004) reported that the correlation between just counting link lines and actually scoring correct and valid propositions in the same set of maps was r = 0.97, suggesting that the substantial extra time and effort required to specify and hand-score all possible linking terms adds little information beyond counting link lines.

Thus, the approach used by Clariana et al. (in press) is a significant departure from most concept map scoring approaches because it considers link lines, not proposition correctness. In addition, their technique uses the relative spatial location of concepts to communicate hierarchical and coordinate concept relations (Robinson, Corliss, Bush, Bera, & Tomberlin, 2003, p. 26). This approach is founded on previous research on free association norms (Deese, 1965), on structural analyses of text propositions (Frase, 1969), on the Matrix Model of Memory (Humphreys, Bain, & Pike, 1989; Pike, 1984), and on current neural network models of cognition (McClelland, McNaughton, & O'Reilly, 1995).

For example, in an untrained neural network, concept units (e.g., subjects, predicates, and linking words) are randomly associated, but as the network learns by experiencing many propositions, structural relationships emerge. Elman (in press) has shown that verbs (linking words) and nouns (subjects–predicates) separate into two clusters, and then nouns sub-cluster based on their meaning (e.g., animate and inanimate). The present investigation assumes that the distances between concept terms in concept maps capture aspects of this underlying fundamental concept structure (e.g., subject–predicate associations). In addition, distance data may provide a direct measure of what Yin et al. (2004) call map structure complexity.

In a follow-up study, Clariana and Poindexter (2004) used the same Pathfinder scoring technique but asked participants to draw network maps rather than concept maps. Network maps are like concept maps, except that there are no linking terms. The mapping directions specifically instructed the participants to use spatial closeness to show relationships and intentionally deemphasized the use of link lines. Participants completed one of three print-based text lessons on the heart and circulatory system. The three lesson treatments were adjunct constructed response questions, scrambled sentences, and a reading-only control. Participants completed three multiple-choice posttests that assessed identification, terminology, and comprehension, and then were handed a list of 25 pre-selected terms and asked to draw a network map. The adjunct question treatment was significantly more effective than the other lesson treatments for the comprehension outcome, and no other treatment comparisons were significant. Scores based on network map link data (relative to distance data) were more related to terminology, with Pearson’s r = .77 compared to r = .69, while scores based on network map distance data (relative to link data) were more related to comprehension, with Pearson’s r = .71 compared to r = .53. Thus the geometric distances between terms related more to the broader processes and functions of the heart and circulatory system, while the links drawn to connect terms related more to verbatim knowledge from the lesson text covering facts, terminology, and definitions.

The purpose of the present descriptive investigation is to confirm and extend these two previous experimental studies involving this computer-based technique for scoring open-ended concept maps. The ultimate goal of this line of research is to develop a software tool that allows learners to create and save a concept map, automatically scores the map relative to an archived expert referent map, and gives learners specific feedback on their maps. One necessary component of this software tool is the mathematical approach used to convert raw data into scores in Step 2. Therefore, this investigation compares multiple data reduction approaches for forming PFNets. The resulting scores are compared to traditional multiple-choice posttests that measure terminology and comprehension of the lesson content as a measure of the concurrent criterion-related validity of the various concept map scores.
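The comparison of concept map scores with posttest scores rests on the Pearson correlation, which can be sketched as follows. The function name and the score lists are hypothetical; the values are made up for illustration and are not data from this or any cited study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # sum of cross-products and sums of squares of deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Made-up scores for four learners, for illustration only.
map_scores = [2, 4, 6, 8]        # e.g., links in common with the referent
posttest_scores = [1, 3, 2, 5]   # e.g., items correct on a posttest

print(round(pearson_r(map_scores, posttest_scores), 2))
```

A higher r for one type of map score than for another against the same posttest is the kind of evidence used here to judge which map score is more related to that outcome.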

Method

A recent dissertation that used concept maps as an instructional treatment provides an ideal and convenient existing data set for the purposes of the present investigation (Taricani, 2002). Though these are previously published data, the present investigation does not reexamine the original research questions. Further, this computer-based scoring method was not available at the time of the original publication, and the concept maps were not scored by raters in that study. In Taricani’s dissertation, undergraduate students were randomly assigned to one of five treatment conditions: a learner-generated concept map treatment with feedback, a learner-generated concept map treatment without feedback, a partially completed fill-in-the-blank concept map treatment with feedback, a partially completed fill-in-the-blank concept map treatment without feedback, and a reading-only control treatment with no map or feedback. Of these five, only the first two treatments involved creating a concept map, so only these two treatments are included in the present investigation.

Participants

Participants were freshman students at a large northeastern university (n = 60) recruited as volunteers from both science and non-science courses. They were randomly assigned to either the learner-generated concept map treatment with feedback (Feedback) or the learner-generated concept map treatment without feedback (No Feedback). The No Feedback treatment group had 17 males and 13 females and the Feedback group had 12 males and 18 females. Participants were rewarded with either extra course credit or pizza and ice cream for their participation, but not for their performance.

Materials

The print-based instructional text was a 1,900-word passage on the human heart developed by Dwyer (1972) called “The human heart: Parts of the heart, circulation of blood, and cycle of blood pressure.” The text dealt with the parts of the heart and the internal processes that occur during the systolic and diastolic phases. The complexity of the information provided was suitable for these participants' general knowledge and comprehension levels.

Following the approach used by Novak and Gowin (1984), a two-page lesson on how to create a concept map was developed (available from Taricani, 2002, pp. 122-124). This two-page lesson included a description of concept mapping and an example of a hierarchical concept map. A short paragraph was presented that contained a familiar scenario about a student who took a walk to the campus library. The walk scenario was selected as a metaphor for blood flow through the heart (e.g., the student moves from point to point going past various buildings along the way, and blood flows from point to point passing through various components along the way). After reading this paragraph, participants were asked to draw a concept map of the library walk scenario on a blank sheet of paper as practice. To foreshadow the lesson treatment, feedback in the form of an instructor-prepared hierarchical concept map of the library walk was given to the Feedback group after they had completed the drawing portion of the two-page concept map lesson. Feedback was not provided to the No Feedback group.

Posttests

The multiple-choice criterion posttest, originally developed and validated by Dwyer (1972), consisted of 20 questions that dealt with terminology and 20 questions that dealt with comprehension. The terminology test was designed to measure declarative knowledge of facts, terms, and definitions. The comprehension test was designed to measure a more thorough understanding of the processes of the human heart, with a specific focus on the functions of different parts of the heart. The KR-20 reliability for the posttest was 0.83.
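For readers unfamiliar with KR-20, the following Python sketch shows how the coefficient is computed from right/wrong item scores. The function name and the tiny response matrix are hypothetical, and the sketch assumes the population form of the total-score variance; it does not reproduce the posttest data.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item scores.

    responses is a list with one row per examinee, each row a list of
    0/1 scores, one per item. Uses the population variance of totals.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # sum over items of p * q, where p is the proportion answering correctly
    pq_sum = 0.0
    for item in range(k):
        p = sum(person[item] for person in responses) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Made-up responses of four examinees to three items, for illustration only.
toy_responses = [[1, 1, 1],
                 [1, 1, 0],
                 [1, 0, 0],
                 [0, 0, 0]]

print(kr20(toy_responses))
```

Values closer to 1 indicate that the items rank examinees more consistently; some texts use the sample variance of the totals instead, which changes the result slightly for small groups.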

Procedure

The procedure was similar for both the Feedback and No Feedback treatment groups. First, participants completed a demographic survey. Next they completed the 2-page training lesson on how to draw a concept map, with or without feedback. Then participants read the 3-page instructional text on the human heart and were asked to draw a concept map of that information on a blank piece of paper while reading the lesson text. Participants could use any terms and any number of terms in their maps.