Testing Writing Skills: A Selected Bibliography

The International Research Foundation for English Language Education

ASSESSING SECOND LANGUAGE WRITING SKILLS:
SELECTED REFERENCES

(last updated 14 May 2014)

Ali, K. A., & Mostafa, N. A. (2013). Errors in using past tense form in writing essays among Kurdish university learners. The Asian Journal of English Language and Pedagogy, 1, 176-189.

Ali, S. (2005). How effective is self-assessment in writing? In P. Davidson, C. Coombe, & W. Jones (Eds.), Assessment in the Arab world (pp. 307-322). Dubai: TESOL Arabia.

Allai, S. K., & Connor, U. (1991). Using performative assessment instruments with ESL student writers. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 227-240). Norwood, NJ: Ablex.

Amena, M. (2005). Identifying the processes of assessing writing using an analytic marking criteria. In P. Davidson, C. Coombe, & W. Jones (Eds.), Assessment in the Arab world (pp. 225-249). Dubai: TESOL Arabia.

Arthur, B. (1979). Short term changes in EFL composition skills. In C. A. Yorio, K. Perkins, & J. Schachter (Eds.), On TESOL ’79: The learner in focus (pp. 330-342). Washington, DC: TESOL.

Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater V.2. Journal of Technology, Learning, and Assessment (JTLA), 4(3).

Banerjee, J., Franceschina, F., & Smith, A. M. (2007). Documenting features of written language production typical at different IELTS band score levels. In P. McGovern & S. Walsh (Eds.), IELTS Research Reports (Vol. 7) (pp. 241-309). Canberra, Australia: IELTS Australia.

Barkaoui, K. (2011). Effects of marking method and rater experience on ESL essay scores. Assessment in Education: Principles, Policy & Practice, 18(3), 277-291.

Barkaoui, K. (2011). Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing, 28(1), 51-75.

Boldt, H., Valsecchi, M. I., & Cushing, S. C. (2001). Evaluation of student writing on text-responsible and non-text-responsible writing tasks. MEXTESOL Journal, 24, 13-33.

Brodkey, D., & Young, R. (1981). Composition correctness scores. TESOL Quarterly, 15(2), 159-168.

Brown, J. D. (1991). Do English and ESL faculties rate writing samples differently? TESOL Quarterly, 25(4), 587-603.

Brown, J. D., & Bailey, K. M. (1984). A categorical instrument for scoring second language writing skills. Language Learning, 34(4), 21-42.

Brown, J. D., Hilgers, T., & Marsella, J. (1991). Essay prompts and topics: Minimizing the effect of mean differences. Written Communication, 8, 533-556.

Burstein, J. (2003). The e-rater scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 113-121). Mahwah, NJ: Lawrence Erlbaum.

Calfee, R., & Perfumo, P. (Eds.). (1996). Writing portfolios in the classroom: Policy and practice, promise and peril. Mahwah, NJ: Lawrence Erlbaum.

Camp, R. (1993). Changing the model for direct assessment of writing. In M. Williamson & B. Huot (Eds.), Validating holistic scoring for writing assessment (pp. 56-69). Cresskill, NJ: Hampton Press.

Carlisle, R., & McKenna, E. (1991). Placement of ESL/EFL undergraduate writers in college-level writing programs. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 197-211). Norwood, NJ: Ablex.

Charney, D. (1984). The validity of using holistic scoring to evaluate writing. Research in the Teaching of English, 18, 65-81.

Cho, D. (1999). A study on ESL writing assessment: Intra-rater reliability of ESL compositions. Melbourne Papers in Language Testing, 8(1), 1–24.

Cizek, G. J., & Page, B. A. (2003). The concept of reliability in the context of automated essay scoring. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 125–145). Mahwah, NJ: Lawrence Erlbaum.

Connor, U., & Carrell, P. (1993). The interpretation of tasks by writers and readers in holistically rated direct assessment of writing. In J. G. Carson & I. Leki (Eds.), Reading in the composition classroom (pp. 159-175). Boston, MA: Heinle & Heinle.

Connor, U., & Mbaye, A. (2002). Discourse approaches to writing assessment. Annual Review of Applied Linguistics, 22, 263-278.

Connor-Linton, J. (1995a). Crosscultural comparison of writing standards: American ESL and Japanese EFL. World Englishes, 14, 99-115.

Connor-Linton, J. (1995b). Looking behind the curtain: What do L2 composition ratings really mean? TESOL Quarterly, 29, 762-765.

Cooper, C. R., & Odell, L. (Eds.). (1999). Evaluating writing: The role of teachers’ knowledge about text, learning, and culture. Urbana, IL: National Council of Teachers of English.

Cooper, T. C. (1976). Measuring written syntactic patterns of second language learners of German. Journal of Educational Research, 69, 176-183.

Cumming, A. (1989). Writing expertise and second language proficiency. Language Learning, 39, 81-141.

Cumming, A. (1990). Expertise in evaluating second language compositions. Language Testing, 7(1), 31-51.

Cumming, A. (1990). The thinking, interactions, and participation to foster in adult ESL literacy instruction. TESL Talk, 20, 34-51.

Cumming, A. (1997). The testing of second-language writing. In C. Clapham (Ed.), The encyclopedia of language and education: Volume 7. Language assessment (pp. 51-63). Dordrecht, The Netherlands: Kluwer.

Cumming, A. (1998). Theoretical perspectives on writing. Annual Review of Applied Linguistics, 18, 61-78.

Cumming, A. (2001). ESL/EFL instructors’ practices for writing assessment: Specific purposes or general purposes? Language Testing, 18(2), 207-224.

Cumming, A. (2001). Learning to write in a second language: Two decades of research. International Journal of English Studies, 1(2), 1-23.

Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M. (2005). Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10, 5-43.

Cumming, A., Kantor, R., & Powers, D. E. (2002). Decision making while rating ESL/EFL writing tasks: A descriptive framework. Modern Language Journal, 86, 67-96.

Cumming, A., Kantor, R., Powers, D., Santos, T., & Taylor, C. (2000). TOEFL 2000 writing framework: A working paper. Princeton, NJ: Educational Testing Service.

Daly, J. A., & Dickson-Markman, F. (1982). Contrast effects in evaluating essays. Journal of Educational Measurement, 19, 309–316.

Delaruelle, S. (1997). Text type and rater decision-making in the writing module. In G. Brindley & G. Wigglesworth (Eds.), Access: Issues in language test design and delivery (pp. 215–242). Sydney, Australia: National Centre for English Language Teaching and Research, Macquarie University.

DeRemer, M. (1998). Writing assessment: Raters’ elaboration of the rating task. Assessing Writing, 5, 7-29.

Diederich, P. B., French, J. W., & Carlton, S. T. (1961). Factors in judgements of writing ability. Research Bulletin, RB-61-15. Princeton, NJ: Educational Testing Service (ERIC Document Reproduction Service ED 002172).

di Gennaro, K. (2008). Assessment of Generation 1.5 learners for placement into college writing courses. Journal of Basic Writing, 27(1), 61-79.

di Gennaro, K. (2009). Investigating differences in the writing performance of international and generation 1.5 students. Language Testing, 26, 533-559.

Dobson, B. (2007). Designing effective writing assessments for classroom contexts. In C. Irvine-Niakaris & A. Nebel (Eds.), 2nd Language Testing & Evaluation Forum, Teaching and testing: Opportunities for learning (pp. 7-19). Athens, Greece: Hellenic American Union.

Douglas, D. (2000). Specific purpose tests of reading and writing. In Assessing languages for specific purposes (pp. 189-245). Cambridge, UK: Cambridge University Press.

East, M., & Young, D. (2007). Scoring L2 writing samples: Exploring the relative effectiveness of two different diagnostic methods. New Zealand Studies in Applied Linguistics, 13(1), 1-21.

Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25, 155–185.

Educational Testing Service. (1992). TOEFL Test of Written English guide. Princeton, NJ: Educational Testing Service.

Elbow, P. (1993). Ranking, evaluating and liking: Sorting out three forms of judgment. College English, 55(2), 187-206.

Elder, C., Barkhuizen, G., Knoch, U., & von Randow, J. (2007). Evaluating rater responses to an online training program for L2 writing assessment. Language Testing, 24, 37–64.

Elder, C., Knoch, U., & Zhang, R. (2009). Diagnosing the support needs of second language writers: Does the time allowance matter? TESOL Quarterly, 43(2), 351-360.

Engelhard, G. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31, 93–112.

Erdosy, M. U. (2004). Exploring variability in judging writing ability in a second language: A study of four experienced raters of ESL compositions (TOEFL Research Report No. 70, RR-03-17). Princeton, NJ: ETS. Retrieved from: http://www.ets.org/Media/Research/pdf/RR-03-17.pdf

Esmaeili, H. (2002). Integrating reading and writing tasks and ESL students’ reading and writing performance in an English language test. The Canadian Modern Language Review, 58(4), 599-622.

Evola, J., Mamer, E., & Lentz, B. (1980). Discrete point versus global scoring for cohesive devices. In J. W. Oller & K. Perkins (Eds.), Research in language testing (pp. 177-181). Rowley, MA: Newbury House.

Fahim, M., & Bijani, H. (2011). The effects of rater training on raters’ severity and bias in second language writing assessment. Iranian Journal of Language Testing, 1, 1–16.

Faigley, L., Cherry, R. D., Jolliffe, D. A., & Skinner, A. M. (1985). Assessing writers’ knowledge and processes of composing. Norwood, NJ: Ablex.

Fawcett, S., Sandberg, A., & Pittman, M. S. (1987). Test package. Grassroots: The writer’s workbook. Boston, MA: Houghton Mifflin.

Feak, C., & Dobson, B. (1996). Building on the impromptu: A source-based academic writing assessment. College ESL, 6(1), 73-84.

Flahive, D. E., & Snow, B. G. (1980). Measures of syntactic complexity in evaluating ESL compositions. In J. W. Oller & K. Perkins (Eds.), Research in language testing (pp. 171-176). Rowley, MA: Newbury House.

Freedman, S. W. (1993). Linking large-scale testing and classroom portfolio assessments of student writing. Educational Assessment, 1(1), 27-52.

Freedman, S. W., & Calfee, R. C. (1983). Holistic assessment of writing: Experimental design and cognitive theory. In P. Mosenthal, L. Tamor, & S. A. Walmsley (Eds.), Research on writing: Principles and methods (pp. 75–98). New York, NY: Longman.

Frodesen, J., & Starna, N. (1999). Distinguishing incipient and functional bilingual writers: Assessment and instructional insights gained through second-language writer profiles. In L. Harklau, K. Losey, & M. Siegal (Eds.), Generation 1.5 meets college composition: Issues in the teaching of writing to U.S.-educated learners of ESL (pp. 61-80). Mahwah, NJ: Lawrence Erlbaum.

Furneaux, C., & Rignall, M. (2007). The effect of standardization-training on rater judgements for the IELTS writing module. In L. Taylor & P. Falvey (Eds.), IELTS Collected Papers: Research in speaking and writing assessment (pp. 422–445). Cambridge, England: Cambridge University Press.

Gaies, S. J. (1980). T-unit analysis in second language research: Applications, problems and limitations. TESOL Quarterly, 14, 53-60.

Hamp-Lyons, L. (1987). Performance profiles for academic writing. In K. M. Bailey, T. L. Dale, & R. T. Clifford (Eds.), Language testing research: Selected papers from the 1986 Colloquium (pp. 78-92). Monterey, CA: Defense Language Institute.

Hamp-Lyons, L. (1989). Raters respond to rhetoric in writing. In H. W. Dechert & M. Raupach (Eds.), Interlingual processes (pp. 229-244). Tübingen, Germany: Gunter Narr.

Hamp-Lyons, L. (Ed.). (1991). Assessing second language writing in academic contexts. Norwood, NJ: Ablex.

Hamp-Lyons, L. (1991). Basic concepts. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 5-15). Norwood, NJ: Ablex.

Hamp-Lyons, L. (1991). Scoring procedures for ESL contexts. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 241-276). Norwood, NJ: Ablex.

Hamp-Lyons, L. (1994). Interweaving assessment and instruction in college ESL writing classes. College ESL, 4(1), 43-55.

Hamp-Lyons, L. (1995). Rating non-native writing: The trouble with holistic scoring. TESOL Quarterly, 29, 759-762.

Hamp-Lyons, L. (1996). The challenges of second-language writing assessment. In E. M. White, W. D. Lutz, & S. Kamusikiri (Eds.), Assessment of writing: Politics, policies, practices (pp. 226-240). New York, NY: Modern Language Association.

Hamp-Lyons, L. (2001). Fourth generation writing assessment. In T. Silva & P. K. Matsuda (Eds.), On second language writing (pp. 117–125). Mahwah, NJ: Lawrence Erlbaum.

Hamp-Lyons, L., & Condon, W. (1993). Questioning assumptions about portfolio-based assessment. College Composition and Communication, 44, 176-190.

Hamp-Lyons, L., & Condon, W. (2000). Assessing the portfolio: Principles for practice, theory, and research. Cresskill, NJ: Hampton Press.

Hamp-Lyons, L., & Henning, G. (1991). Communicative writing profiles: An investigation of the transferability of a multiple-trait scoring instrument across ESL writing assessment contexts. Language Learning, 41, 337-373.

Hamp-Lyons, L., & Kroll, B. (1996). Issues in ESL writing assessment. College ESL, 6(1), 52-72.

Hamp-Lyons, L., & Kroll, B. (1997). TOEFL 2000—Writing: Composition, community, and assessment. Princeton, NJ: Educational Testing Service.

Hamp-Lyons, L., & Mathias, S. P. (1994). Examining expert judgments of task difficulty on essay tests. Journal of Second Language Writing, 3, 49-68.

Hanania, E., & Shikhani, M. (1986). Interrelationships among three tests of language proficiency: Standardized ESL, cloze, and writing. TESOL Quarterly, 20(1), 97-109.

Hayes, J. R., Hatch, J. A., & Silk, C. M. (2000). Does holistic assessment predict writing performance? Estimating the consistency of student performance on holistically scored writing assignments. Written Communication, 17(1), 3-26.

Hayward, M. (1990). Evaluations of essay prompts by nonnative speakers of English. TESOL Quarterly, 24(4), 753-758.

Henning, G., & Davidson, F. (1987). Scalar analysis of composition ratings. In K. M. Bailey, T. L. Dale, & R. T. Clifford (Eds.), Language testing research: Selected papers from the 1986 Colloquium (pp. 24-38). Monterey, CA: Defense Language Institute.

Herman, J. L., Aschbacher, P. R., & Winters, L. (1992). A practical guide to alternative assessment. Alexandria, VA: Association for Supervision and Curriculum Development.

Herzog, M. (1988). Issues in writing proficiency assessment. Section 1: The Government Scale. In P. Lowe & C. W. Stansfield (Eds.), Second language proficiency assessment: Current issues (pp. 149-177). Englewood Cliffs, NJ: Prentice Hall Regents.

Homburg, T. J. (1984). Holistic evaluation of ESL compositions: Can it be validated objectively? TESOL Quarterly, 18(1), 87-109.

Huot, B. (1990). The literature of direct writing assessment: Major concerns and prevailing trends. Review of Educational Research, 60(2), 237-263.

Huot, B. (1990). Reliability, validity, and holistic scoring: What we know and what we need to know. College Composition and Communication, 41, 201-213.

Huot, B. (1993). The influence of holistic scoring procedures on reading and rating student essays. In M. Williamson & B. Huot (Eds.), Validating holistic scoring for writing assessment (pp. 206-236). Cresskill, NJ: Hampton Press.