The Johns Hopkins University will receive at least $48 million to develop computer systems that would help military and spy agencies process the huge amounts of intelligence data they collect.
The Department of Defense grant is for a new research center focused on improving technology that can automatically translate and analyze speech and text in multiple languages, school officials announced today. It would help overburdened intelligence analysts cope with the flood of information - often in Arabic - being gathered in Iraq and the war on terror, experts said.
The Human Language Technology Center of Excellence is being outfitted near Hopkins' Homewood campus, and the staff will include engineers, computer scientists, mathematicians, cognitive scientists and linguists.
"It's really supposed to be a fresh look at this problem," said Gary W. Strong, the center's executive director. "This technology has hit a wall at this point." Strong was previously a program manager at the National Science Foundation, focusing on language technology projects.
Experts from the University of Maryland, College Park and BBN Technologies, a Cambridge, Mass., software company, will also participate in the project.
The new center is likely to boost Maryland's reputation as a high-tech destination and may attract research dollars and talent to the state, Hopkins officials said.
In addition to Strong's appointment, Hopkins announced that James K. Baker will be the center's director of research. Baker founded Dragon Systems Inc., which in 1997 released Dragon NaturallySpeaking, a dictation program that can be trained to recognize a person's voice and turn it into written text. Baker is leaving a professorship at Carnegie Mellon University in Pittsburgh to take the Hopkins position.
"He is extremely well-known in the field," Strong said. "People are very interested in working with him."
BBN, which has offices in Columbia, was a prime force in development of the Defense Department's ARPAnet, the precursor to the Internet, in 1969.
Experts said the military and intelligence agencies need all the help they can get. Too few analysts are fluent in Arabic and other languages to translate and catalog information they collect, the experts said. This makes it more difficult and time-consuming to find important intelligence leads, such as those that might alert authorities to a terrorist plot.
To speed up the process, the government hopes to develop computer systems capable of screening speech and written documents for key intelligence leads.
Such data are gathered from television broadcasts, newspapers, the Internet and intercepted communications.
"That's an extremely important project," said defense analyst Loren B. Thompson, chief operating officer of the Lexington Institute, a conservative think tank in Arlington, Va. "The biggest single defect we have in our strategy in Iraq is the inability of our war fighters to understand the local language."
Thompson, who earlier this year co-wrote a report called "Hear No Evil," lamenting the government's lack of language specialists, said it's impractical to expect the average American soldier to be fluent in Arabic before he or she deploys to Iraq.
He points out that because the troops don't know the language, they are forced to rely on local translators, whose loyalties could be called into question. "We monitor 84,000 frequencies in Iraq, but yet we don't know what the guy across the street is yelling at us," he said.
While new technology will improve the government's ability to decipher the enormous amount of information coming from overseas, Thompson said it will not replace human interpreters, who can better pick up on local nuances and dialects.
Even the few hundred basic Arabic phrases stored in the translation machines troops carry are not commonly understood across the Middle East, he said, because the language varies.
Alan Black, a researcher at the Language Technologies Institute at Carnegie Mellon, said current systems do well when translating clearly spoken language. The problem is that people rarely speak clearly, which can make their speech difficult even for a human listener to understand.
The translation and analysis are also expensive and slow. "If you're processing a 30-minute TV show, for example, it's probably going to take about 24 hours to get all the information out," he said.
To the degree the center makes significant advances in machine translation of conversations, it would be a major boost for intelligence agencies, said Mark M. Lowenthal, a former senior intelligence official who oversaw language training across the intelligence agencies. "Everyone is busting their heads on machine translation," he said. "That's sort of the Holy Grail."
He cautioned that the center may be limited by the small pot of money allocated for it, which works out to about $5 million a year until 2015. Strong said officials at the Department of Defense have indicated that the agency will continue to fund the center beyond 2015.