Can a computer tell good prose from bad?
Some 200,000 prospective business school applicants are about to find out.
Starting Feb. 10, those who take the Graduate Management Admission Test will have their two essay questions judged -- at least in part -- by a computer.
Test administrators say a computerized grader will help them cut costs and the time it takes to return tests. But critics argue that computers, no matter how sophisticated, have no place judging writing.
"My question is: Why are you assigning essays in the first place if you're not going to read them?" said Dennis Baron, head of the English Department at the University of Illinois in Urbana-Champaign.
Until now, two human readers have scored each GMAT essay on a scale of zero to 6 points. If the scores disagreed by more than one point, the essay was given to a third reader for a decision.
Starting this month, the GMAT's new grading software, dubbed "e-rater," will serve as the second reader.
Developed by the Educational Testing Service in Princeton, N.J., e-rater took five years of research and testing to create. It is programmed to evaluate more than 50 elements of an essay, ranging from content to organization of ideas.
To grade each GMAT essay, e-rater analyzes its contents, and then, based on what it finds, assigns it a numeric score using grading criteria previously determined by human test administrators.
"We're not evaluating creative writing here," said Frederic McHale of the Graduate Management Admission Council, which represents business schools around the country and hired ETS to develop e-rater. "What we're judging is: Can a person organize their thoughts and ideas and express them coherently on a specific topic?"
McHale said some school administrators were skeptical about replacing man with machine. "Their first thought was, 'Are you going to score what Shakespeare would write?' " he said.
But researchers who tested e-rater's scoring ability found it was as likely to agree with the first scorer as a human second reader was, a conclusion that McHale says put university officials more at ease.
If e-rater's score differs by more than a point from that of its human partner, the essay will go to a second human reader and then to a final referee.
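The disagreement rule described above can be sketched in a few lines of Python. This is only an illustration of the threshold check as the article states it: the function name `needs_second_human` and the `max_gap` parameter are hypothetical, and how the final score is computed after a dispute is not described in the article, so it is left out.

```python
def needs_second_human(human_score: int, erater_score: int, max_gap: int = 1) -> bool:
    """Return True when the two scores differ by more than max_gap points.

    Scores are on the 0-to-6 scale described in the article. The name and
    the max_gap parameter are illustrative assumptions, not ETS's code.
    """
    for score in (human_score, erater_score):
        if not 0 <= score <= 6:
            raise ValueError(f"score out of range 0-6: {score}")
    return abs(human_score - erater_score) > max_gap


# A one-point difference is tolerated; a two-point difference is escalated.
print(needs_second_human(4, 5))
print(needs_second_human(4, 6))
```

Under this reading, scores of 4 and 5 stand as they are, while scores of 4 and 6 send the essay on to another human reader.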
"It's still not ready for use by itself," McHale concedes. "If I was highly creative or unique about the way I write down my ideas, it's possible that a human reader could see that and give a higher score to an applicant, while e-rater might not."
By using a computer, GMAT administrators expect to halve the 10 days it now takes to grade and return the test and, eventually, to lower the exam's $150 price tag.
The digital-age approach to grading essays has sent some test preparation services scrambling to revise their strategies.
Kaplan Educational Centers recently dispatched an updated battle plan to clients preparing for an encounter with e-rater.
"Use transitional phrases like 'therefore,' 'since,' and 'for example,' so that the computer can recognize that your essay contains a structured argument," reads one suggestion. "Use synonyms for more important terms. The computer rewards strong vocabulary," advises another.
"It's not a radical departure from the strategy we've been teaching for years," said Trent Anderson, executive director of graduate and professional programs at Kaplan, who added that the company has been tracking developments in computerized essay grading for some time.
But the notion of students making any change in style to satisfy a computer sets some professors' teeth on edge.
In a recent editorial in the Chronicle of Higher Education, Baron of the University of Illinois argued that forcing students to write for a computer cheapens the value of essay writing and said a computer can be used in better ways to assess knowledge.
"That's why we developed multiple choice," he declared.
Baron said his students had mixed reactions when he asked them how they would like to have their classroom essays graded by a computer.
One lamented that the computer would not be able to tell if a student's writing improved over time.
Another wondered whether the software would reward creativity or extra effort.
Like it or not, the GMAT may be just the beginning for robot readers. Some educators say it won't be long before students from elementary school through college will have their prose graded by machine.
Researchers at the University of Colorado and New Mexico State University are putting the final touches on their Intelligent Essay Assessor, a technology designed to appeal to overworked teachers who want to dig themselves out from mountains of paperwork.
Like e-rater, the Intelligent Essay Assessor can be "trained" to distinguish good writing from bad by feeding it pre-graded essays on the same subject, said Peter Foltz, a psychologist at New Mexico State who co-developed the technology.
When the software encounters a new essay, it analyzes the words, sentence structure and content and asks itself how similar this essay is to the pre-graded essays it has seen. "The computer really learns to grade in the same way as the teacher," said Foltz.
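The idea of scoring a new essay by its similarity to pre-graded ones can be illustrated with a toy example. The sketch below is not the Intelligent Essay Assessor's actual method, which the article does not specify. It assumes a plain word-count comparison (cosine similarity) and a similarity-weighted average of the grades of the closest essays; the names `word_counts`, `cosine` and `predict_score` are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text and count each word (a crude bag-of-words model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_score(new_essay: str, graded: list, k: int = 2):
    """Average the grades of the k most similar pre-graded essays,
    weighting each grade by its similarity to the new essay.
    `graded` is a list of (essay_text, grade) pairs.
    Returns None when the new essay shares no words with any graded essay."""
    new_vec = word_counts(new_essay)
    ranked = sorted(
        ((cosine(new_vec, word_counts(text)), grade) for text, grade in graded),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in ranked)
    if total == 0:
        return None
    return sum(sim * grade for sim, grade in ranked) / total


graded_essays = [
    ("The heart pumps blood through the body.", 6),
    ("Cats sleep all day.", 1),
]
print(predict_score("The heart pumps blood.", graded_essays, k=1))
```

A real system would need far more training essays and a richer representation than raw word counts; the point here is only the "how similar is this to essays I have already seen graded" step.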
The Essay Assessor is also programmed to tell when a student is way off topic or is trying to fool it.
A professor can feed the software a class textbook, which the Essay Assessor mathematically analyzes, "learning" the course vocabulary. If a Biology 101 student then writes "dissect the cardiac organ" or "cut into the heart," the computer recognizes he's talking about the same thing.
Or if a student tries to cheat by merely rattling off a series of key vocabulary words, the computer sees there are no complete sentences and flags the essay as one the instructor should read.
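One rough way to picture the keyword-list check is a heuristic that looks for the small connecting words ordinary sentences contain. The sketch below is an assumption made for illustration, not the Essay Assessor's actual test: the `FUNCTION_WORDS` set, the `min_ratio` threshold and the function name `looks_like_keyword_list` are all hypothetical.

```python
import re

# A small, hand-picked set of common function words (an illustrative assumption).
FUNCTION_WORDS = {
    "a", "an", "the", "is", "are", "was", "were",
    "of", "to", "in", "and", "that", "it", "by", "from",
}


def looks_like_keyword_list(essay: str, min_ratio: float = 0.15) -> bool:
    """Flag text in which function words make up less than min_ratio of all
    words, a crude sign that it is a string of terms rather than sentences.
    Empty text is flagged as well."""
    words = re.findall(r"[a-z']+", essay.lower())
    if not words:
        return True
    function_share = sum(w in FUNCTION_WORDS for w in words) / len(words)
    return function_share < min_ratio


print(looks_like_keyword_list("heart, ventricle, aorta, artery, valve, atrium"))
print(looks_like_keyword_list("The heart is a muscle that pumps blood to the body."))
```

An essay flagged this way would not be failed automatically; as the article describes, it would simply be set aside for the instructor to read.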
The program, said Foltz, isn't designed for creative writing courses, but for history, biology or other classes in which instructors mostly want to assess knowledge and argument-writing skills. "It's not a style checker," said Foltz, adding that it doesn't look for spelling or grammar mistakes. "I think of it more as a knowledge checker."
Foltz said the technology has applications outside grading.
Teachers, for example, could use it to give students more writing practice without overburdening themselves. The Essay Assessor could be programmed to tell students what is missing from their essays, where to find it in the textbook and how else to improve their scores. (GMAT administrators say they are considering offering just such a service to prospective test takers.)
"It can be used not only as a grading tool but a learning tool," said Foltz.
Pub Date: 2/01/99