Protein Data Bank Deposits Are Life’s Building Blocks
A four-decade project to catalog the basic structures used to build life pays dividends for everything from new drugs to Bjork’s performances.
Biology’s newest knowledge, fused with the special effects of The Hobbit or the Harry Potter films: that’s what’s in store from a stunning new cinematic field, biomedical animation. Catch a glimpse of it in The Inner Life of a Cell, a video that might have made biologists of us all had we seen it earlier in our lives. It offers an unprecedented, scientifically accurate dramatization of how cells function, sense their surroundings and respond to external stimuli, all in mind-blowing moving imagery. It is part of a continuing animation series created by Xvivo, a Connecticut scientific animation firm, for future biologists now studying at Harvard University. Expect more inspired animations as teaching tools, in video games and on Hollywood screens: the fruit of ever-improving software and our “golden age of biology.”
And although Helen M. Berman had no direct hand in Inner Life, this kind of biological animation is possible only because of a critical scientific accomplishment she helped create 41 years ago: the Protein Data Bank. Berman is currently director of the Research Collaboratory for Structural Bioinformatics at Rutgers University in New Jersey, which serves as a central custodian of the PDB, and she is the only one of the bank’s founders still professionally active.
Far more than a source of material for illuminating entertainment, the Protein Data Bank is the world’s single most important repository of virtually everything science has so far discovered about the structures of nucleic acids and proteins, the “tiny molecular machines” that carry out critical functions for cells, including simply keeping them alive.
Another way to think of it is as a kind of global root cellar where all the basic biological building blocks sit neatly on shelves. These building blocks are the three-dimensional atomic structures of biologically important molecules, ranging from bits of DNA to complex machines like the ribosome, which makes proteins from amino acids.
“If we know the structure, we will understand the function — that’s the paradigm. The sequence [of amino acids] goes to structure, goes to function — so, if you know [for example] the structure of hemoglobin, you can understand better how hemoglobin carries oxygen,” explains Berman. “These are the molecules of life that are found in all organisms including bacteria, yeast, plants, flies, other animals, and humans. Understanding the shape of a molecule helps to understand how it works.”
An indispensable source of raw data for furthering our understanding of biology, from cells to our bodies, the data bank has also been critical to practical advances. Think of drug discovery (pharmaceutical firms periodically download the entire data bank) or the launch of whole new fields of study, such as computational structural biology, also known as structural bioinformatics.
Berman is excited about what might be called “second generation use” of the bank’s data, especially its unexpected roles in the arts, such as cinema and live performance. One example is the Icelandic chanteuse Bjork’s Biophilia show (and its accompanying apps), currently in a 10-day North American debut run in New York City.
Writing in the Journal of Biocommunication, molecular biologist David S. Goodsell, who hosts the “Molecule of the Month” feature on the bank’s website, called the bank “an amazing resource that is waiting to be tapped for all manner of educational and artistic applications.”
The two most common methods used to determine molecular structures, which means visualizing them at a scale too small even for the most powerful microscope, are X-ray crystallography (also called X-ray diffraction) and nuclear magnetic resonance spectroscopy. The cost of working out some atomic structures, say a 3-D rendering of a protein (proteins are like necklaces of different-colored beads), has decreased as scientific techniques have improved. But while the average cost may lie between $50,000 and $250,000 per structure, complicated structures can still cost $1 million to $2 million each.
As for the value of the data bank’s “deposits,” current estimates range up to $8 billion. Meanwhile, the U.S. government contributes $6 million a year, the lion’s share of the data bank’s annual budget, which is slated to remain flat for the foreseeable future. That forces the bank to be inventive. “We have to figure out ways of improving our infrastructure so that we can actually keep up with the data,” Berman says matter-of-factly.
When crystallographers, scientists who study the arrangement of atoms in solids, started the PDB at Brookhaven National Laboratory and Cold Spring Harbor Laboratory (both on Long Island) in the 1970s, the bank held just seven atomic structures. Today it catalogs more than 78,000 structures, still just a fraction of the 20 million unique sequences estimated to exist. But it’s constantly growing. “In 1977, which is kind of when I started in this, there were 77 structures in total,” said University of California, San Diego’s Philip E. Bourne, the associate director of the PDB. “Now we get almost that many or twice that many in a month.”
Anyone in the world can browse the bank’s vaults and download whatever they want to work with on their own. Every month some 150,000 visitors come to the site; last year, those visitors hit “download” 250 million times. Reflecting an early commitment to free and unfettered access for all, the PDB charges nothing for its use. “People put data into the PDB, then they expect to have the data available for free,” says Berman. “After all, if it weren’t for the people who put the data in, there wouldn’t be any PDB.”
Open access is the prevailing scientific model today, but it was not the obvious choice in the 1970s. The chemistry community chose it nonetheless. “It’s a cultural-sociologic thing, the evolution of those rules; it’s completely community based,” says Berman, adding with understated pride: “We were ahead of our time.”
A critical phase in this evolution came with the onset of the AIDS epidemic 30 years ago, when respected figures in the scientific community argued it was morally indefensible not to release the findings of any publicly funded research related to the disease.
Today the major scientific journals in certain fields effectively require, as a condition of publication, that the basic data underlying a paper be deposited with the Protein Data Bank. This makes sense not only because the research behind the data is still largely publicly funded, but also because that basic data may prove useful to others.
“[The PDB] really set a precedent for data sharing and now there are depositories for all sorts of different information,” says Heather Carlson, a medicinal chemistry professor at the University of Michigan, Ann Arbor, who makes frequent use of the PDB. “No individual lab can have enough information on sequences or on proteins needed to solve the problems we face; a worldwide repository of information, like the PDB, is definitely needed.” Even the most well-intentioned informal sharing among colleagues scattered around the world, in which researchers request data and then wait for others to send it, would be impractical.
The United States, which for a long time generated most of the new knowledge in the data bank, now accounts for about half of the new research added. The data bank has gone global, with key collaborators at the European Bioinformatics Institute (UK) and Protein Data Bank Japan. Together with the U.S. group and the Biological Magnetic Resonance Data Bank at the University of Wisconsin-Madison, these collaborators form the Worldwide PDB, the organization that oversees the archive.
The swelling volume of data demands efficient curation. That means not only ensuring that the deposited structures are of high quality, but also providing a steadily improving set of tools that make the data usable by a widening base of users. “Our responsibility is to make sure the data are in good shape so that when people take [it] they can count on it,” says Berman. “People call this the gold standard of structures.”