

How to Hold a World of Tweets

April 23, 2010 • 4:04 PM

The U.S. Library of Congress is blazing a trail in determining how to store an ever-expanding trove of information that never had physical form.

After the Library of Congress announced last week that it had acquired the rights to Twitter’s entire archive, to be preserved as a lasting testament to how Americans lived and communicated in the early 21st century, the Idea Lobby called to ask if we could come by and see, well, where they were going to put it. Would tweets be housed amid Thomas Jefferson’s 6,487 tomes? Or on the shelf beneath the Federalist Papers?

“There isn’t really much to see,” conceded Beth Dulabahn, the library’s director of integration management. Sure, there’s a server that physically houses the LoC’s digital collection, although it looks pretty much like any other server.

But on both a practical and a philosophical level, micro-blogging’s arrival inside the nation’s oldest federal cultural institution raises a complex question, one that Dulabahn and her colleague Martha Anderson were happy to discuss.

How exactly do you preserve, archive and house for posterity things that don’t physically exist?


THE IDEA LOBBY
Miller-McCune's Washington correspondent Emily Badger follows the ideas informing, explaining and influencing government, from the local think tank circuit to academic research that shapes D.C. policy from afar.


The library has been mulling this question since 2000, when Congress created the National Digital Information Infrastructure and Preservation Program, which Anderson directs.

The library began working in digital formats (and in other non-paper media, like home movies) in the ’90s. At the time, it was trying to digitize physical objects unique to the LoC, like the Federalist Papers, to make them more accessible to people who weren’t old enough to get into the reading rooms or who lived too far from Washington to visit. The material was then distributed on videodiscs to remote locations like schools.

“There was serious consideration about putting this stuff out on CDs,” Anderson said. “I just kind of cringe when I think about that – by this time, we would have millions of CDs.”

Around 1996, it dawned on the LoC that it could archive and share the artifacts over the then little-used Internet. Then the information landscape outside the library shifted dramatically: As the Web exploded in popularity, the LoC’s initiative moved from digitizing physical objects to capturing born-digital objects that never existed in physical form in the first place.

“We’ve been reminded so much the last few years of all the things that didn’t even exist when our program began in 2000,” Anderson said. “There were no Google maps.”

“There was no Google!” Dulabahn added.

“There was no YouTube of course, there was no Facebook, no Twitter, no Flickr,” Anderson said. “All these things that are just commonplace today did not even exist when our program began. And we thought we had challenges then.”

The library spent the better part of a year trying to figure out how to capture video embedded in websites. Then YouTube suddenly arrived, and almost overnight the staff had to start over and learn the new universal format.

The library’s mandate — to chronicle information in the present to be available in perpetuity — bumps up against the very nature of the dynamic Internet. Digital photos today are created as JPEGs, but what happens when that format becomes obsolete, and 10 years from now, your computer can’t read it? If the library simultaneously preserves software such as a current JPEG reader, does it also need to preserve hardware, like a 2010 MacBook Pro?

And even if the library successfully captures everything — navigable links, text, images — on a website, what happens when that site is updated two hours, or two years, later?

“How often do you go out and try to grab them? What’s enough?” Dulabahn asked. “There are a lot of philosophical questions about, well, what are you actually trying to do? Are you trying to get blow-by-blow, or is once a month enough? You can’t make those decisions across the board; it’s not one size fits all.”

The LoC has focused on sites important for public policy, such as those related to elections and government, or on major historic events like Sept. 11 or Pope John Paul II’s death and succession. (The archive includes WhiteHouse.gov as it appeared on March 13, 2003, on the eve of the Iraq war.) Given this limited scope, and even with the coordinated efforts of national libraries across the globe, surely only a fraction of 1 percent of Web history has been captured by now, right?

“A fraction at what point in time?” Dulabahn countered, reinforcing the impression that any true understanding of the topic might require a grasp of quantum mechanics.

In more measurable terms, it took the LoC two years to accumulate its first terabyte of Web data. In 2005, it began collecting a terabyte a month. Last spring, that became a terabyte a week. This year, Anderson expects to double that rate. The library isn’t necessarily collecting more websites; rather, the same number of sites today yields exponentially more content.

In total, the collection last year surpassed 3 billion individual “objects” like text files or images – “which was kind of horrifying to us,” Anderson said.

Because how do you sort through all that information to make it useful, and unbundle it from the format restrictions of this moment in time? All the typical standards for how to organize books don’t apply.

To a certain extent, the LoC and its numerous partner organizations are just trying to grab as much content as they can before it disappears, and they hope one day someone will know how to use it.

“This is a reversal of a model we’ve had for decades at the library, where basically we select at the front end,” Dulabahn said. “What’s left you catalogue, and you bind, and you move to the shelves. With this kind of voluminous digital content — like Web archives, things like Twitter archives — it’s a completely different model. You have a chance to get it; it’s probably your only chance looking across the spectrum of time. You have a very small window, so you get it and say, ‘Then we’ll figure out over time how to find the good stuff.'”

But even as they bet on future innovation in digital archiving, such as Web crawlers that can tackle Facebook and search functions that can index time, digital information will likely continue to evolve faster than its would-be preservers can follow.

Twitter has created a revolution in communication that Dulabahn and Anderson could never have predicted 10 years ago. And so it’s possible something so new will arise in the future that it makes all of the LoC’s efforts at digital preservation obsolete.

“But I want to say this word of encouragement,” Anderson said. “Early video games are still alive (Pac-Man, those early Donkey Kongs) because people have cared enough about them to try to do something to keep them alive.”

People will also be motivated by more than just nostalgia as businesses become further invested in the enormous new marketplace of digital information.

Predicted Dulabahn: “I think we’ll be able to ride on the coattails of that.”

Emily Badger
Emily Badger is a freelance writer living in the Washington, D.C. area who has contributed to The New York Times, International Herald Tribune and The Christian Science Monitor. She previously covered college sports for the Orlando Sentinel and lived and reported in France.


Copyright © 2014 by Pacific Standard and The Miller-McCune Center for Research, Media, and Public Policy. All Rights Reserved.