Big Data: A Revolution That Will Transform How We Live, Work, and Think

by Viktor Mayer-Schönberger, Kenneth Cukier

A revelatory exploration of emerging trends in "big data"—our newfound ability to gather and interpret vast amounts of information—and the revolutionary effects these developments are producing in business, science, and society at large.

  • Format: eBook
  • ISBN-13/EAN: 9780544002937
  • ISBN-10: 0544002938
  • Pages: 240
  • Publication Date: 03/05/2013
  • Carton Quantity: 1

About the Book
  • A revelatory exploration of the hottest trend in technology and the dramatic impact it will have on the economy, science, and society at large.

    Which paint color is most likely to tell you that a used car is in good shape? How can officials identify the most dangerous New York City manholes before they explode? And how did Google searches predict the spread of the H1N1 flu outbreak?

    The key to answering these questions, and many more, is big data. “Big data” refers to our burgeoning ability to crunch vast collections of information, analyze it instantly, and draw sometimes profoundly surprising conclusions from it. This emerging science can translate myriad phenomena—from the price of airline tickets to the text of millions of books—into searchable form, and uses our increasing computing power to unearth epiphanies that we never could have seen before. A revolution on par with the Internet or perhaps even the printing press, big data will change the way we think about business, health, politics, education, and innovation in the years to come. It also poses fresh threats, from the inevitable end of privacy as we know it to the prospect of being penalized for things we haven’t even done yet, based on big data’s ability to predict our future behavior.

    In this brilliantly clear, often surprising work, two leading experts explain what big data is, how it will change our lives, and what we can do to protect ourselves from its hazards. Big Data is the first big book about the next big thing.

  • 1


    IN 2009 A NEW FLU virus was discovered. Combining elements of the viruses that cause bird flu and swine flu, this new strain, dubbed H1N1, spread quickly. Within weeks, public health agencies around the world feared a terrible pandemic was under way. Some commentators warned of an outbreak on the scale of the 1918 Spanish flu that had infected half a billion people and killed tens of millions. Worse, no vaccine against the new virus was readily available. The only hope public health authorities had was to slow its spread. But to do that, they needed to know where it already was.

       In the United States, the Centers for Disease Control and Prevention (CDC) requested that doctors inform them of new flu cases. Yet the picture of the pandemic that emerged was always a week or two out of date. People might feel sick for days but wait before consulting a doctor. Relaying the information back to the central organizations took time, and the CDC only tabulated the numbers once a week. With a rapidly spreading disease, a two-week lag is an eternity. This delay completely blinded public health agencies at the most crucial moments.

       As it happened, a few weeks before the H1N1 virus made headlines, engineers at the Internet giant Google published a remarkable paper in the scientific journal Nature. It created a splash among health officials and computer scientists but was otherwise overlooked. The authors explained how Google could “predict” the spread of the winter flu in the United States, not just nationally, but down to specific regions and even states. The company could achieve this by looking at what people were searching for on the Internet. Since Google receives more than three billion search queries every day and saves them all, it had plenty of data to work with.

       Google took the 50 million most common search terms that Americans type and compared the list with CDC data on the spread of seasonal flu between 2003 and 2008. The idea was to identify people infected by the flu virus by what they searched for on the Internet. Others had tried to do this with Internet search terms, but no one else had as much data, processing power, and statistical know-how as Google.

       While the Googlers guessed that the searches might be aimed at getting flu information — typing phrases like “medicine for cough and fever” — that wasn’t the point: they didn’t know, and they designed a system that didn’t care. All their system did was look for correlations between the frequency of certain search queries and the spread of the flu over time and space. In total, they processed a staggering 450 million different mathematical models in order to test the search terms, comparing their predictions against actual flu cases from the CDC in 2007 and 2008. And they struck gold: their software found a combination of 45 search terms that, when used together in a mathematical model, had a strong correlation between their prediction and the official figures nationwide. Like the CDC, they could tell where the flu had spread, but unlike the CDC they could tell it in near real-time, not a week or two after the fact.
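The correlation search described above can be illustrated with a minimal sketch. It is not Google's actual method, which tested combinations of terms across many candidate models. It only scores each search term on its own by how closely its weekly query counts track reported flu cases. The function names `pearson` and `rank_terms` are hypothetical, and all numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def rank_terms(query_counts, flu_cases):
    """Sort search terms from most to least correlated with flu cases."""
    scores = {
        term: pearson(counts, flu_cases)
        for term, counts in query_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up weekly figures, for illustration only.
flu_cases = [10, 20, 40, 80, 60, 30]
query_counts = {
    "medicine for cough and fever": [100, 210, 390, 820, 590, 310],
    "cheap flights": [500, 480, 510, 495, 505, 490],
    "sunscreen": [90, 70, 40, 10, 30, 60],
}

ranked = rank_terms(query_counts, flu_cases)
for term, score in ranked:
    print(f"{term}: {score:+.2f}")
```

The sketch needs no knowledge of what a term means. A term rises to the top of the ranking only because its counts move with the case numbers, which is the sense in which the system "didn't care" why people searched.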

       Thus when the H1N1 crisis struck in 2009, Google’s system proved to be a more useful and timely indicator than government statistics with their natural reporting lags. Public health officials were armed with valuable information.

       Strikingly, Google’s method does not involve distributing mouth swabs or contacting physicians’ offices. Instead, it is built on “big data” — the ability of society to harness information in novel ways to produce useful insights or goods and services of significant value. With it, by the time the next pandemic comes around, the world will have a better tool at its disposal to predict and thus prevent its spread.


    Public health is only one area where big data is making a big difference. Entire business sectors are being reshaped by big data as well. Buying airplane tickets is a good example.

       In 2003 Oren Etzioni needed to fly from Seattle to Los Angeles for his younger brother’s wedding. Months before the big day, he went online and bought a plane ticket, believing that the earlier you book, the less you pay. On the flight, curiosity got the better of him and he asked the fellow in the next seat how much his ticket had cost and when he had bought it. The man turned out to have paid considerably less than Etzioni, even though he had purchased the ticket much more recently. Infuriated, Etzioni asked another passenger and then another. Most had paid less.

       For most of us, the sense of economic betrayal would have dissipated by the time we closed our tray tables and put our seats in the full, upright, and locked position. But Etzioni is one of America’s foremost computer scientists. He sees the world as a series of big-data problems — ones that he can solve. And he has been mastering them since he graduated from Harvard in 1986 as its first undergrad to major in computer science.

       From his perch at the University of Washington, he started a slew of big-data companies before the term “big data” became known. He helped build one of the Web’s first search engines, MetaCrawler, which was launched in 1994 and snapped up by InfoSpace, then a major online property. He co-founded Netbot, the first major comparison-shopping website, which he sold to Excite. His startup for extracting meaning from text documents, called ClearForest, was later acquired by Reuters.

       Back on terra firma, Etzioni was determined to figure out a way for people to know if a ticket price they see online is a good deal or not. An airplane seat is a commodity: each one is basically indistinguishable from others on the same flight. Yet the prices vary wildly, being based on a myriad of factors that are mostly known only by the airlines themselves.

       Etzioni concluded that he didn’t need to decrypt the rhyme or reason for the price differences. Instead, he simply had to predict whether the price being shown was likely to increase or decrease in the future. That is possible, if not easy, to do. All it requires is analyzing all the ticket sales for a given route and examining the prices paid relative to the number of days before the departure.

       If the average price of a ticket tended to decrease, it would make sense to wait and buy the ticket later. If the average price usually increased, the system would recommend buying the ticket right away at the price shown. In other words, what was needed was a souped-up version of the informal survey Etzioni conducted at 30,000 feet. To be sure, it was yet another massive computer science problem. But again, it was one he could solve. So he set to work.
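The buy-or-wait idea above can be sketched as a naive rule. This is an illustration under simple assumptions, not the Hamlet model itself, whose internals the text does not describe. The functions `average_price_by_days_out` and `recommend` are hypothetical names, the observations are invented, and defaulting to "buy" when no history exists is an assumption of this sketch.

```python
from collections import defaultdict


def average_price_by_days_out(observations):
    """Average observed price for each number of days before departure."""
    totals = defaultdict(lambda: [0.0, 0])  # days_out -> [price sum, count]
    for days_out, price in observations:
        totals[days_out][0] += price
        totals[days_out][1] += 1
    return {days: total / count for days, (total, count) in totals.items()}


def recommend(observations, days_out_now):
    """Return "wait" if prices closer to departure were lower on average."""
    averages = average_price_by_days_out(observations)
    if days_out_now not in averages:
        return "buy"  # assumption: no history for today, so do not gamble
    closer = [avg for days, avg in averages.items() if days < days_out_now]
    if not closer:
        return "buy"  # nothing closer to departure to compare against
    expected_later = sum(closer) / len(closer)
    return "wait" if expected_later < averages[days_out_now] else "buy"


# Invented (days before departure, price paid) pairs for one route.
observations = [
    (30, 300), (30, 320),
    (20, 280), (20, 260),
    (10, 250), (10, 230),
    (5, 350), (5, 370),
]

print(recommend(observations, 30))  # prices were lower later on average
print(recommend(observations, 10))  # prices rose in the final days
```

With these figures the rule suggests waiting at 30 days out and buying at 10 days out, using only past prices and no knowledge of how airlines set them.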

       Using a sample of 12,000 price observations that was obtained by “scraping” information from a travel website over a 41-day period, Etzioni created a predictive model that handed its simulated passengers a tidy savings. The model had no understanding of why, only what. That is, it didn’t know any of the variables that go into airline pricing decisions, such as number of seats that remained unsold, seasonality, or whether some sort of magical Saturday-night-stay might reduce the fare. It based its prediction on what it did know: probabilities gleaned from the data about other flights. “To buy or not to buy, that is the question,” Etzioni mused. Fittingly, he named the research project Hamlet.

       The little project evolved into a venture capital-...

  • "Every decade, there are a handful of books that change the way you look at everything. This is one of those books. Society has begun to reckon the change that big data will bring. This book is an incredibly important start."
    —Lawrence Lessig, Roy L. Furman Professor of Law, Harvard Law School, and author of Remix and Free Culture

    "This brilliant book cuts through the mystery and the hype surrounding big data. A must-read for anyone in business, information technology, public policy, intelligence, and medicine. And anyone else who is just plain curious about the future."
    —John Seely Brown, former Chief Scientist, Xerox Corp., and head of Xerox Palo Alto Research Center

    "Big Data breaks new ground in identifying how today’s avalanche of information fundamentally shifts our basic understanding of the world. Argued boldly and written beautifully, the book clearly shows how companies can unlock value, how policymakers need to be on guard, and how everyone’s cognitive models need to change."
    —Joi Ito, Director of the MIT Media Lab

    "Big Data is a must-read for anyone who wants to stay ahead of one of the key trends defining the future of business."
    —Marc Benioff, Chairman and CEO,

    "An optimistic and practical look at the Big Data revolution — just the thing to get your head around the big changes already underway and the bigger changes to come."
    —Cory Doctorow,

    "Just as water is wet in a way that individual water molecules aren’t, big data can reveal information in a way that individual bits of data can’t. The authors show us the surprising ways that enormous, complex, and messy collections of data can be used to predict everything from shopping patterns to flu outbreaks."
    —Clay Shirky, author of Cognitive Surplus and Here Comes Everybody

    "The book teems with great insights on the new ways of harnessing information, and offers a convincing vision of the future. It is essential reading for anyone who uses — or is affected by — big data."
    —Jeff Jonas, IBM Fellow & Chief Scientist, IBM Entity Analytics

    “What I’m certain about is that Big Data will be the defining text in the discussion for some time to come.”

    “The authors make clear that ‘big data’ is much more than a Silicon Valley buzzword… No other book offers such an accessible and balanced tour of the many benefits and downsides of our continuing infatuation with data.”
    Wall Street Journal

    "Plenty of books extol the technical marvels of our information society, but this is an original analysis of the information itself—trillions of searches, calls, clicks, queries and purchases....A fascinating, enthusiastic view of the possibilities of vast computer correlations and the entrepreneurs who are taking advantage of them."
    —STARRED Kirkus Reviews

    "This book offers important insights and information"

    "'big data' [is] one of the buzzwords of corporate executives, tech-savvy politicians, and worried civil libertarians. If you want to know what they’re all talking about, then Big Data is the book for you, a comprehensive and entertaining introduction to a very large topic....Mayer-Schönberger and Cukier offer up some sensible suggestions on how we can have the blessings of big data and our freedoms, too. Just as well; their lively book leaves no doubt that big data’s growth spurt is just beginning."
    —Boston Globe