Scientists aim to catalog proteins


With the international race to decipher human DNA now mostly history, some biologists are embarking on an even more sweeping molecular mission: to catalog all the proteins in the human body.

On Monday, 180 scientists will converge in Washington to hash out a battle plan for the titanic effort, already under way in many laboratories around the world.

"This is the next step in the evolution of our understanding of what life is all about," says George Kenyon, a University of Michigan biochemist who organized the National Academy of Sciences meeting. "If you know the genes alone, it only tells you the beginning of the story."

Driving this push to understand proteins is their role in basic biology and human disease. Although the Human Genome Project thrust genes into the scientific spotlight, proteins do the grunt work in the body.

These complex and versatile chemicals digest food, heal wounds, make babies and form the scaffolding for blood vessels, vocal cords and other structures. Genes serve as the blueprints for building proteins.

Sickle cell anemia, cystic fibrosis and other illnesses occur when a defective gene orders up a flawed protein. Harmful bacteria and viruses use proteins to sneak past the body's defenses and make us sick.

Most drugs also target proteins, so cataloging them could lead to new miracle medicines. Penicillin, for example, works by inhibiting a protein that bacteria use in their cell walls. Aspirin, the most widely used drug in the world, zaps a protein involved in inflammation.

"The bottom line is: To really solve medical problems, you've got to solve the protein problem," says Joshua LaBaer, director of the Institute of Proteomics at Harvard Medical School.

But an inventory of all the proteins in the human body - collectively dubbed the proteome - is a quest that could make the decade-long Human Genome Project look like something out of a high school science fair.

"It's really a much different beast," says Scott Patterson of Celera Genomics, the Rockville company that led the charge to map human DNA and is hunting for new protein-based drugs.

Human DNA, scientists now know, contains about 35,000 genes. But the number of proteins is expected to be far larger - the best scientific guesses put the count in the hundreds of thousands.

And proteins are more complex. DNA is essentially the same in every cell in the body. Proteins come in many shapes and sizes, and their three-dimensional structure is vital to understanding their function. Although scientists began studying proteins before World War II, they've pieced together the structure of only a few hundred.

Concurrent projects

Several projects are attempting to step up the pace. Among them is the Human Proteome Project, an international federation of scientists representing academic, government and corporate labs. The slogan of its kickoff meeting last year: "Genes were easy."

The National Institutes of Health in Bethesda, meanwhile, has launched the $150 million Protein Structure Initiative, whose goal is to catalog 10,000 proteins in 10 years.

Although some protein projects are drawing inspiration - and even their catchy names - from the hugely successful Human Genome Project, scientists attempting to pin down the proteome stress that key differences exist. The Human Genome Project had a clear finish line: decoding the 3.2 billion chemical building blocks of DNA and locating the genes. Protein hunters wonder if they'll ever finish.

'A very long way to go'

"We have a very long way to go," says Samir Hanash, a cancer researcher at the University of Michigan who heads the Human Proteome Project.

Many scientists think success will depend on breakthroughs in technology. DNA is composed of just four different chemical building blocks, strung together like pearls on a necklace. The simple parts list meant gene hunters were able to automate the process with fleets of robots and high-speed computers that could tear through millions of DNA building blocks a day.

Proteins, on the other hand, can be made from as many as 20 different chemicals, known as amino acids. The added complexity makes locating and analyzing them far more difficult. Using traditional techniques, scientists require weeks - even months - to figure out the structure of a single protein. The cost for that work can top $100,000, according to the NIH.

But researchers are making progress in finding new tricks to slash the time and cost of probing proteins and determining their structures. The Los Alamos National Laboratory, one of nine labs working on the NIH protein project, recently developed software to calculate protein structures, cutting the analysis from days to hours in some cases.

"A big challenge here is deciding which problems you want to go after and how you do it," says John Norvell, head of the NIH project.

Dividing up the work

To that end, many protein hunters are making the problem more manageable by focusing on specific pieces of the proteome rather than tackling the whole thing, as some groups initially announced they would do.

Members of the Human Proteome Project, for example, are considering first tackling proteins found in the blood because blood "is the thermostat of what's going on in the whole body," Hanash says.

New research shows the value of such a strategy. Using powerful pattern-recognition software, several scientists announced last week that they had pinpointed a cluster of proteins in the blood that signals ovarian cancer, a disease difficult to catch early and lethal when found late. The potentially lifesaving screening tool was reported in the British medical journal The Lancet.

Drug companies involved in the protein projects are further narrowing their search. Rather than attempting to compile encyclopedic lists of proteins, as some academic groups are doing, the companies are analyzing only proteins that might pay off in commercial products.

Celera scientists, for instance, spend their days sifting through tissue and blood samples from patients with lung and pancreatic cancers, hunting for proteins that might signal the presence of the diseases and thus serve as drug targets or diagnostic markers.

Scientists working on the protein projects point out that human DNA wasn't decoded overnight. First discussed in 1985, the Human Genome Project took more than six years to get off the ground. It won't officially finish until early next year.

"We always forget the past," Norvell says.

Copyright © 2019, The Baltimore Sun, a Baltimore Sun Media Group publication