ESL Writing Assessment
Introduction
While all educational evaluation challenges teachers and researchers, writing assessments of language ability are particularly challenging: the form and content of the test and the form and content of the student's response both rely on words. Writing has several unique features that sharpen the challenges present in all language tests. First, the learning of written language is more closely affiliated with schooling than listening, speaking, or reading (Weigle, 2002, p. 4). In addition, the ability to write well is closely associated with the work of schooling and with subsequent success inside and beyond the academy (National Commission on Writing, The Neglected "R"; Writing: A Ticket to Work…Or a Ticket Out: A Survey of Business Leaders). Writing assessments therefore carry high consequences not only for students, but also for teachers, administrators, and other educational stakeholders. The need for teachers and researchers to get writing assessment right is great.
This is especially true for ESL (English as a second language) students, who face particular challenges when it comes to placement at various institutions. The classroom setting poses a still wider challenge, since ESL students' writing proficiency is usually below the standard set by other students. Assessing ESL students for placement should therefore be done responsibly and should reflect the students' training. Writing placement tests have been criticized on several occasions for being strictly grammar-oriented or multiple-choice papers that do not capture a student's proficiency in written composition. However, using essays to assess ESL students introduces its own confounding factors: the absence of benchmark texts, the lack of flawless and reliable scoring criteria, and the lack of quantitative measures of inter-rater consistency.
The aim of this essay is to review and analyze the literature on writing assessment in the context of ESL student tests in relation to the cognitive writing model. The essay will also propose solutions for overcoming the problems ESL students face when taking writing assessment tests that arise from limitations and variability in scorer cognition.
Theories of Writing Placement Assessment
Writing assessment has long been among the most problematic areas of language evaluation, both for determining students' writing competence in a second language and for gauging their overall competence in the language. The main source of the problem turns out to be the scorer. Personal characteristics of different scorers lead to different evaluations of the same work, which is a problem because a written work receives different scores and accurate evaluation becomes nearly impossible (Godshalk, Swineford, & Coffman, 1966, pp. 1-5). In psychometric scoring, differences between the results given by two scorers are treated as potential errors in the overall evaluation of a written work. This means that if a scoring system produces errors in its evaluations, it needs to be improved, which in turn calls for further training and changes. When two scorers evaluate the same work differently, it is because personal differences lead them to think differently about the main features of the essay on which their scoring is based. The literature notes that it is difficult to determine why some scorers find it easier than others to reach agreement on scoring and evaluation. One hypothesis, however, is that "proficiency in a psychometric scoring task may manifest itself [in] the cognitive behaviors of scorers" (Wolfe, Kao, & Ranney, 1998, p. 466). Research by Pula and Huot determined that understanding of an essay's scoring rubrics, which depends directly on scorer cognition, can be crucial to why different scorers evaluate the same work differently (Wolfe, Kao, & Ranney, 1998, p. 467). Similarly, according to Vaughan (1991), individual differences can produce different scores for the same work. The studies of Pula and Huot (1993) and of Vaughan helped identify the main factors behind scorer differences that result in different scores when the same work is evaluated (Wolfe, Kao, & Ranney, 1998, p. 468).
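Inter-rater consistency of the kind these studies address is typically quantified with agreement statistics. The sketch below is illustrative only and is not drawn from any of the cited studies; the holistic scale, the scores, and the function names are hypothetical assumptions.

```python
# Minimal sketch of quantifying inter-rater consistency between two essay
# scorers. Scores are hypothetical and assume a 1-6 holistic scale.

def exact_agreement(scores_a, scores_b):
    """Proportion of essays to which both scorers gave the same score."""
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)

def cohen_kappa(scores_a, scores_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(scores_a)
    categories = sorted(set(scores_a) | set(scores_b))
    observed = sum(1 for a, b in zip(scores_a, scores_b) if a == b) / n
    # Agreement expected if the two scorers rated independently
    expected = sum(
        (scores_a.count(c) / n) * (scores_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

scorer_1 = [4, 5, 3, 6, 2, 4, 5, 3]   # hypothetical holistic scores
scorer_2 = [4, 4, 3, 5, 2, 5, 5, 2]

print(f"Exact agreement: {exact_agreement(scorer_1, scorer_2):.2f}")  # 0.50
print(f"Cohen's kappa:   {cohen_kappa(scorer_1, scorer_2):.2f}")      # 0.36
```

A kappa well below 1.0, as here, is the quantitative face of the scorer disagreement the studies above describe.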
Crusan (2002) states that there are two types of assessment: direct and indirect. A direct test seeks to evaluate a comprehensive command of the language through an essay; it tests students' use of language for communication within a given context. The results are evaluated against the rubrics of the writing assignment, such as maintenance of proper format, vocabulary, sentence structure, and grammar, with each rubric contributing a certain number of points to the writer's total. In indirect assessment, by contrast, students' proficiency in specific language areas, such as vocabulary, grammar, comprehension, or reading, is tested. A multiple-choice test is usually used for indirect assessment and is scored according to the number of correct answers.
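To make the contrast concrete, the following minimal sketch scores one hypothetical student both ways; the rubric categories, point values, and answer key are invented for illustration and are not taken from Crusan (2002).

```python
# Hypothetical illustration of direct (analytic rubric) versus indirect
# (multiple-choice) scoring. Rubric categories and the answer key are invented.

# Direct assessment: a rater assigns points per rubric category of an essay.
rubric_scores = {
    "format": 3,              # out of 5
    "vocabulary": 4,          # out of 5
    "sentence_structure": 3,  # out of 5
    "grammar": 4,             # out of 5
}
direct_score = sum(rubric_scores.values())  # 14 out of 20

# Indirect assessment: count correct answers against a fixed key.
answer_key = ["b", "a", "d", "c", "b"]
student_answers = ["b", "a", "c", "c", "b"]
indirect_score = sum(k == s for k, s in zip(answer_key, student_answers))  # 4 of 5

print(f"Direct: {direct_score}/20, indirect: {indirect_score}/5")
```

Note that the direct score passes through a human judgment at every rubric category, while the indirect score is fully determined by the key; this is exactly where scorer cognition enters the direct method.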
According to the theory of the cognitive validity of scoring, the two methods, direct and indirect, are combined in order to isolate specific components of a language and evaluate them at the same time (Wolfe, Kao, & Ranney, 1998, p. 468). Validity is recognized as one of the most important aspects of psychological assessment and testing. According to Messick's (1989) unified theory, the main issue is not whether a test is valid in itself, but whether the test scores and the inferences that test users draw from them are designed in such a way as to make the test valid or invalid.
However, the most practical alternative to methods in which students' writing is read by independent readers is the indirect testing of writing ability, which tests grammar and its use in writing (Diederich, 1974). A placement test can thus comprise various writing tasks: multiple-choice questions, open-ended questions, and written essays.
Literature Review: Case Studies
The literature abounds with arguments for and against different scoring approaches, specifically the direct and indirect methods. The following case studies examine why scoring differs with the cognition of the scorers, or correctors:
Vaughan's (1991) study, which employed a think-aloud task to examine the thinking patterns of experienced essay scorers, found that scorers take an individualistic approach. Because each scorer brings a different set of experiences, these differences of opinion can become a troubling source of test-irrelevant variance, leading both to unreliable measures and to invalid interpretations of test scores by creating a mismatch between the information the test manual provides to users of test results and the information the test scores themselves provide.
Pula and Huot (1993), in a think-aloud study, found that while novice and experienced raters considered the same criteria, their reading processes were quite different. To begin with, novice raters in their study tended to make more comments as they read, whereas expert raters made more comments after they had finished reading. Compared to novice raters, expert raters also made a greater percentage of personal comments. The reason for these differences, the study shows, is that expert raters already knew what to evaluate in a composition and had already developed a strategy for rating. These researchers identified three experiential factors that differentiate expert from novice scorers: (1) personal background; (2) professional training; and (3) work experience, that is, a scorer's previous experience.
Freedman and Calfee (1989) created an information-processing model of essay scoring. From the information-processing perspective, they present a model in which raters (1) read and comprehend the text, (2) evaluate the text, and (3) articulate their evaluation (Figure 1). It should be noted that in this model raters create a text image after reading and comprehending, and it is that text image, rather than the text itself, that raters evaluate and store impressions of. While acknowledging the possibility that rating could be a linear process, Freedman and Calfee believe it is more likely a recursive one, in which chunks of text are evaluated as they are read and comprehended. The monitor in their model allows raters to revise their evaluations as they read and evaluate further pieces of text.
Figure 1. Freedman and Calfee's (1989, p. 92) model of the rating process.
Regardless of which of these views best explains why scoring differs, these studies emphasize the importance of understanding how the thinking processes, and thus the cognition, that scorers bring to their evaluation decisions affect the task of scoring writing and the writing process itself.
Critical Analysis
For years, writing assessment has been a highly problematic area, and researchers have been trying to develop a method that would assure accurate evaluation of writing assessments (Godshalk, Swineford, & Coffman, 1966, pp. 1-5). Variability in evaluating writing assessments, especially variability due to corrector cognition and scoring, has been perceived and addressed differently by the measurement and composition communities (Broad, 2003; Huot, 2002; Moss, 1996; Weigle, 1998; White, 1993). The measurement community, for example, has often seen corrector cognition variability as a "source of measurement error" that lowers the reliability and, hence, the validity of essay tests.
Diederich, French, and Carlton (1961) tried to establish the suitability of direct assessment methods for testing the language proficiency of students. In their study, 300 essays were read and assessed by 53 judges. Surprisingly, the study established that 94% of the essays had been awarded seven or more different scores. This clearly shows that the quality of writing cannot always be determined objectively. If a single essay ends up being awarded several different scores by different markers, then relying on the essay as the sole assessment method is inadequate.
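The statistic reported by Diederich, French, and Carlton can be restated computationally. The sketch below uses simulated, hypothetical data (not the 1961 dataset) purely to show how such a figure is derived from a matrix of judges' scores.

```python
# Hypothetical restatement of the Diederich, French, and Carlton statistic:
# given a matrix of judges' scores per essay, find the share of essays that
# received at least seven distinct scores. The data here are simulated.
import random

random.seed(0)
NUM_ESSAYS, NUM_JUDGES = 300, 53

# Simulate judges scoring each essay on a 1-9 scale with wide disagreement.
scores = [[random.randint(1, 9) for _ in range(NUM_JUDGES)]
          for _ in range(NUM_ESSAYS)]

wide_spread = sum(1 for essay in scores if len(set(essay)) >= 7)
print(f"{100 * wide_spread / NUM_ESSAYS:.0f}% of essays got 7+ distinct scores")
```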
Studies reported in the literature (Bacha, 2001; Carr, 2000; Schoonen, 2005) have used quantitative methods to examine essay scores but did not consider if, and how, the rating scale influences corrector decision-making behavior and cognition (Frederiksen, 1992; Freedman & Calfee, 1983). In addition, the few models of essay rating processes that have been developed (Freedman & Calfee, 1983; Homburg, 1984; Ruth & Murphy, 1988) say little about how the scoring method mediates the rating process (Wolfe et al., 1997). As a result, little is known about how correctors arrive at judgments about writing quality and what part scorer cognition plays in this process. Such information is crucial for designing, selecting, and improving scoring methods and corrector cognition training, as well as for the validation of ESL writing assessments.
White (1993) argued that differences of opinion between correctors about the quality of essays are, like disagreements about the value of works of art, legitimate and more valuable than absolute agreement, because "they combine to bring us nearer to accurate evaluation than would simple agreement" (p. 99). White called instead for the use of a "consensus score," which, he argued, "can yield useful measurement, which reflects the social process of judgment and offers sound statistical data" (p. 99; Broad, 2003; Huot, 2002; Moss et al., 1992).
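White does not prescribe a single algorithm, but one common operationalization of consensus scoring, sketched here under assumed rules (average adjacent scores; adjudicate larger discrepancies with a third reading), looks like this:

```python
# One possible operationalization of a consensus score (not White's exact
# procedure): average two raters' scores, calling in a third rater to
# adjudicate when the first two differ by more than one point.

def consensus_score(rater_1, rater_2, adjudicator=None):
    if abs(rater_1 - rater_2) <= 1:
        return (rater_1 + rater_2) / 2
    if adjudicator is None:
        raise ValueError("Discrepant scores require a third reading")
    # Average the two closest of the three scores, discarding the outlier.
    s = sorted([rater_1, rater_2, adjudicator])
    pair = min([(s[0], s[1]), (s[1], s[2])], key=lambda p: p[1] - p[0])
    return sum(pair) / 2

print(consensus_score(4, 5))     # 4.5: adjacent scores, simple average
print(consensus_score(2, 5, 4))  # 4.5: adjudicator resolves the spread
```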
In addition, new measurement models such as generalizability theory (G-theory) and multi-faceted Rasch measurement (MFRM) may offer a partial answer to the conflict between reliability and validity requirements in ESL writing assessment, since these models expect variation in scores across correctors' cognition as well as across examinees (Brennan, 2001; Kozaki, 2004; North, 2000; Weigle, 1998). Weigle (1998), for example, pointed out that the multi-faceted Rasch model treats corrector cognition as beneficial rather than as a hindrance to writing assessment. The Rasch model, however, requires that raters be self-consistent so as to allow the mathematical modeling of, and compensation for, predictable variations in rater severity (Kozaki, 2004; Linacre, 1994; Weigle, 1998). As discussed in the introduction, this essay aims to provide solutions for overcoming the problems ESL students face in writing assessment tests that arise from variability in scorer cognition.
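As a sketch of the idea, the multi-faceted Rasch model is usually written (following Linacre's formulation) with an explicit rater-severity term, so that a severe corrector lowers the expected score in a predictable, correctable way rather than invalidating the measurement:

```latex
% Many-facet Rasch model (after Linacre, 1994), sketched for the case of
% examinee n rated on writing task i by rater j at score category k:
\[
  \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]
% B_n : ability of examinee n
% D_i : difficulty of writing task i
% C_j : severity of rater (corrector) j
% F_k : difficulty of moving from score step k-1 to step k
```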
To this end, I propose two solutions for combining direct and indirect assessment so that students receive a sound and reasonable score. The reasoning is as follows: with direct assessment alone, students cannot get a reliable score because of differences in corrector cognition, while indirect assessment alone cannot measure a student's ability, because the student may simply have memorized grammar and the rules of writing. The first solution, therefore, is to mix the two assessments, measuring the student's ability through a combination of indirect and direct tasks; the second is to obtain an accurate score from the corrector. This combination of direct and indirect methods of writing assessment is vital for obtaining sound results for ESL students, which in turn supports their proper placement. Moreover, the validity of the test format, both indirect and direct, helps in evaluating the language competence of ESL students properly and accurately.
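A minimal sketch of the first solution follows, assuming hypothetical weights and score ranges that a real placement program would have to set and validate for itself.

```python
# Minimal sketch of the proposed combination: a weighted composite of a
# direct (essay) score and an indirect (multiple-choice) score. The 60/40
# weighting is a hypothetical assumption, not an established standard.

def composite_score(essay_score, essay_max, mc_score, mc_max, essay_weight=0.6):
    """Blend direct and indirect scores on a common 0-100 scale."""
    direct = 100 * essay_score / essay_max
    indirect = 100 * mc_score / mc_max
    return essay_weight * direct + (1 - essay_weight) * indirect

# e.g. 14/20 on the essay rubric and 4/5 on the multiple-choice section
print(f"Placement score: {composite_score(14, 20, 4, 5):.1f}")  # 74.0
```

The design choice here is that the indirect component damps the corrector-driven noise in the essay score, while the essay component keeps the test from rewarding memorized grammar alone.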
Implications for Teaching
This essay has explored and compared the literature on writing assessment in the context of ESL student tests in relation to the cognitive writing model. Further, it has offered two solutions for overcoming the problems ESL students face in writing assessment tests that arise from variability in scorer cognition. Theoretically, findings from the essay can help clarify the role of scoring method and rater cognition in ESL writing assessments. Such information can add to our understanding of the factors contributing to variability in ESL writing test scores and suggest methods to enhance the reliability, validity, and fairness of inferences and decisions based on such scores.
At the teaching level, the essay has generated information that can enhance the usefulness of writing assessments. This information can inform decisions about corrector cognition training and about the design, selection, and improvement of scoring methods in large-scale and classroom ESL writing assessments. For instance, identifying scoring methods that require the least cognition training could significantly reduce the cost of large-scale writing rating. In addition, by examining the effects of different scoring methods on the performance of raters with different levels of experience, these findings can help clarify which scoring methods work better for different corrector populations. Finally, information about corrector cognition and corrector effects can help monitor and account for these effects so that the reliability, validity, and fairness of writing assessment results can be improved in the future.
Conclusion
ESL students encounter various challenges when it comes to writing assessment. The development of proper writing assessments, the training of correctors' cognition, and the modernization of the process, format, and scoring system of writing assessment together determine accurate evaluation. They also lead to improvement in the language abilities of ESL students once they are placed into appropriate language study groups.
References
Armstrong, W. B. (1995, May). Validating placement tests in the community college: The role of test scores, biographical data, and grading concerns. Paper presented at the 35th Annual Forum of the Association for Institutional Research, Boston, MA.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford, UK: Oxford University Press.
Bailey, K. M. (1998). Learning about language assessment: Dilemmas, decisions, and directions. Boston: Heinle & Heinle.
Breland, H. M. (1983). The direct assessment of writing skill: A measurement review (College Board Report No. 83-6, ETS RR No. 83-32). New York: College Entrance Examination Board.
Conlan, G. (1986). "Objective" measures of writing ability. In K. L. Greenberg, H. S. Wiener, & R. A. Donovan (Eds.), Writing assessment: Issues and strategies. New York: Longman.
Conrad, L. M., & Goldstein, S. M. (1990). Student input and negotiation of meaning in ESL writing conferences. TESOL Quarterly, 24(3), 443–460.
Crusan, D. (2002). An assessment of ESL writing placement assessment. Assessing Writing, 8, 17–30.
Devine, J. (1993). The role of metacognition in second language reading and writing. In J. G. Carson & I. Leki (Eds.), Reading in the composition classroom: Second language perspectives (pp. 105–127). Boston: Heinle & Heinle.
Devine, J., Railey, K., & Boshoff, P. (1993). The implications of cognitive models in L1 and L2 writing. Journal of Second Language Writing, 2, 203–225.
Diederich, P., French, J. W., & Carlton, S. (1961). Factors in judgments of writing ability. Princeton, NJ: Educational Testing Service.
Flavell, J. (1985). Cognitive development (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Gaudiani, C. (1981). Teaching writing in the foreign language curriculum. Washington, DC: Center for Applied Linguistics.
Haswell, R. H. (1998). Searching for Kiyoko: Bettering mandatory ESL writing placement. Journal of Second Language Writing, 7, 133–174.
Hudson, S. A. (1982). An empirical investigation of direct and indirect measures of writing. Report of the 1980–81 Georgia Competency Based Education Writing Assessment Project — 1981. ERIC: ED #205993.
Kellogg, R. T. (1994). The psychology of writing. New York: Oxford University Press.
McNenny, G. (2001). Writing instruction and the post-remedial university: Setting the scene for the mainstreaming debate in basic writing. In G. McNenny & S. H. Fitzgerald (Eds.), Mainstreaming basic writers: Politics and pedagogies of access (pp. 1–15). Mahwah, NJ: Lawrence Erlbaum.
Wolfe, E. W., Kao, C., & Ranney, M. (1998). Cognitive differences in proficient and nonproficient essay scorers. Written Communication, 15(4), 465–492.