Your Money

(Photo: ra2studio/Shutterstock)

You, Yes You, Can Analyze Data, Too

• April 21, 2014 • 8:00 AM

What does it take to break into the growing field of data analysis? To start, you’ll need the Internet, a computer, and some basic math skills.

Big data is everywhere. You probably hear people talking about data on the subway, or the radio, or by the water cooler. You turn on the TV, and the news is all about data. We’re using data to find missing airplanes, become better managers, fight disease, fight crime, fight hunger, fight fires, fight scammers, and fight each other over data. Before you know it, data is going to be all grown up and taking your son or daughter to the prom.

With all of these potential uses for data comes demand for people who can do something with it. Harvard Business Review described the emerging job of “data scientist” as “a high-ranking professional with the training and curiosity to make discoveries in the world of big data” and the “sexiest job of the 21st Century.” The distinction between data scientists and data analysts depends mostly on the industry: “Scientist” tends to be used for jobs that barely existed a decade ago in tech, start-ups, and social media circles, while “analyst” is a title that has existed for decades within government, economic analysis, and academia.

Data scientist salaries typically start in the low six figures. Normally, such lucrative careers require years of expensive formal training or some serious connections; past generations of data analysts required access to university-level supercomputers to crunch numbers on the big data scale. But data science is perhaps the most prominent example of a new industry that breaks from the higher education model and allows people to learn the necessary skills without years of classes.

Brian Burke, founder of Advanced NFL Stats, is a former Navy fighter pilot and military contractor turned NFL data scientist. He received four weeks of statistics and econometrics training while earning a master’s in “leadership” from the Navy, but other than that he’s entirely self-taught. Charles Pensig, a senior data analyst at Jawbone, tells me he “doesn’t have much in the way of formal training.” He studied statistics at the University of Pennsylvania’s Wharton School and then “taught myself most of the skills I need.” Carl Bialik, lead news writer at FiveThirtyEight, Nate Silver’s recently launched “data journalism” site, says his training was “mostly self-taught through Excel.”*

Of course, not every data scientist is an amateur gone pro. Sean Taylor, a research scientist on Facebook’s Data Science Team, says “working with data is all I know.” Likewise, Trey Causey, senior data scientist at Zulily and consultant for an unnamed NFL team (he signed a non-disclosure agreement), has incorporated his statistical analysis interests throughout his formal education, culminating in a minor in quantitative methods in his Ph.D. program.

Despite their different approaches, all the scientists I spoke to had virtually identical advice as to how someone could get started down this lucrative path. Not one of them suggested enrolling at the local university. In fact, the most expensive thing any of them recommended was to buy a book.

FIRST, THEY ALL SAY, stop relying on basic spreadsheets like Excel and learn a programming language. Excel’s formulas offer a variety of tools, but they’re often indirect ways of interacting with the data that can be easily misinterpreted or provide minimal insight. Think of it a bit like the “give a man a fish, he eats for a day” saying: Excel gives you data analysis one point at a time, but programming languages—where you write your own commands—teach you to interact with the data on a more meaningful level, understanding more than just the single formula.

“Break out of a point-and-click habit and get to know a language like R or Python,” Causey suggests. “It can be a slower start than something like Excel or Weka, but it forces you to think about the analyses you want to run and what you expect the output to look like.” It’s always tempting to go back to Excel when you need to do a hit-and-run analysis, but Excel’s ceiling will keep you from running serious analysis and you won’t truly understand the data. “I lean toward recommending R to people because it’s had a ton of time to mature and it’s easy to install and get started,” Taylor says. “This is important because you can’t really manipulate and understand data without a little bit of light programming. It also paves the way to more sophisticated analyses and fancier plotting.”
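To give a sense of what that “light programming” can look like, here is a minimal sketch in Python that uses only the standard library. The teams and scores are hypothetical, invented for illustration, and the function name is an assumption of this example rather than anything the interviewees described. It computes an average per group, roughly what a spreadsheet pivot table would do, but every step is written out.

```python
import csv
import io
import statistics

# Hypothetical data, invented for illustration: points scored by two made-up teams.
RAW = """team,points
Hawks,21
Hawks,17
Hawks,28
Owls,10
Owls,24
Owls,14
"""

def mean_points_by_team(raw_csv):
    """Return {team: mean points}, similar to what a pivot table would show."""
    by_team = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Collect each team's scores in a list, converting text to integers.
        by_team.setdefault(row["team"], []).append(int(row["points"]))
    return {team: statistics.mean(points) for team, points in by_team.items()}

summary = mean_points_by_team(RAW)
for team, avg in sorted(summary.items()):
    print(f"{team}: {avg:.1f}")
```

Writing the grouping and the averaging by hand is slower than clicking a menu, but it makes you state exactly which analysis you are running and what shape the output should take.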

Don’t worry: You don’t have to learn these programming languages on your own if you don’t want to. “There is a treasure trove of information available on the Web, most of which is far more gentle, user-friendly, and effective than a grad school course,” Burke says. “Free courses on Coursera or similar sites can be really great sources.” Likewise, Pensig says that “the best classes I took were Coursera’s data analysis in R and Codecademy’s Python.” For any specific problem, Google is your friend: “You can find answers to just about anything through a well-crafted Web search,” Bialik says.

Once you have the basics down, it’s time to get your hands dirty. “I’d recommend starting some kind of project for fun,” Burke says. The consensus hovered around politics, sports, and movies as vibrant areas for amateur analysis. “Start crunching some numbers from something you’re interested in. It forces you to truly understand the concepts and tools.”

Taylor agrees: “You should be curious to learn about the subject and you should have some idea of what the answers should look like in advance” so you can check your work more easily. “Torture [the data] to your heart’s content,” Causey says, but don’t get carried away seeking that counter-intuitive finding. On the same point, he provided a sports example: If your model says Mark Sanchez is a better quarterback than Tom Brady, there’s probably something wrong with the model.
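One way to put that advice into practice is to write your expectations down before running the model and then compare. The sketch below is a hypothetical illustration, not a method any of the interviewees described: the ratings are invented, the player labels are placeholders, and `check_expectations` is a name made up for this example.

```python
def check_expectations(ratings, expected_order):
    """Return (better, worse) pairs where the model contradicts a prior expectation."""
    surprises = []
    for better, worse in expected_order:
        # A surprise is any pair the model ranks the other way around (or ties).
        if ratings[better] <= ratings[worse]:
            surprises.append((better, worse))
    return surprises

# Hypothetical model output, invented for illustration (higher is better).
model_ratings = {"Veteran star": 61.0, "Journeyman": 74.5, "Rookie": 48.2}

# Beliefs held before running the model: (should rate higher, should rate lower).
prior_beliefs = [("Veteran star", "Journeyman"), ("Journeyman", "Rookie")]

surprises = check_expectations(model_ratings, prior_beliefs)
for better, worse in surprises:
    print(f"Check the model: it rates {worse} above {better}")
```

A flagged pair is not proof the model is wrong, only a prompt to look for a bug or a data problem before trusting a counter-intuitive result.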

Finally, you’ll want to learn how to make your data look pretty. Many websites are increasing their data visualization budgets, including the recently launched Vox venture with Ezra Klein, who calls beautiful data visualizations his version of clickbait. Visualizing data is also a central part of what is known as Exploratory Data Analysis, or EDA. “If you can’t look at the data, you won’t be able to understand the story it’s telling you,” Taylor cautioned. To this end, he recommended a free EDA course on Udacity produced by his Facebook colleagues.
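Looking at the data does not require fancy graphics to start. As a rough sketch, the Python below prints a text histogram using only the standard library; the scores are hypothetical, invented for illustration, and `text_histogram` is a name made up for this example. In practice a plotting library would do this better, but the idea is the same: see the shape of the data before modeling it.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one line per non-empty bin: a range label, then a bar of '#' marks."""
    # Map each integer value to the lower edge of its bin and count occurrences.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} | {'#' * counts[start]}")
    return lines

# Hypothetical scores, invented for illustration.
scores = [3, 7, 10, 13, 14, 17, 20, 21, 21, 24, 27, 28, 31, 45]
histogram = text_histogram(scores, 10)
print("\n".join(histogram))
```

Even this crude picture shows where most values cluster and whether any value sits far from the rest, which a single average would hide.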

This career path also indicates a possible, albeit much more limited, future for online education, a concept that was once hailed as the great democratizer but is now experiencing a bit of a backlash. The recommendations above are very limited in skill set and scope; this is no liberal arts education. But they also offer a kind of digital apprenticeship that requires no investment (other than time) and an earnings potential far above the mean. One of this generation’s great questions is whether higher education is still worth it. The data scientist might suggest a future where it isn’t.

This isn’t to say everyone should take these steps to become a data scientist, but that anyone with a computer, the Internet, and a basic understanding of math and statistics presumably could. And even if you’re lacking in the latter categories, there are online courses to help with that, too.


*UPDATE — April 21, 2014: We originally wrote that Charles Pensig is a senior data scientist at Jawbone. He is a senior data analyst.

Aaron Gordon
Aaron Gordon is a freelance writer living in Washington, D.C. He also contributes to Sports on Earth, The New Yorker, Deadspin, and Slate.

Copyright © 2014 by Pacific Standard and The Miller-McCune Center for Research, Media, and Public Policy. All Rights Reserved.