Book Review: Big Data

It must be said that the current enthusiasm surrounding “big data” is not all hype. The concept is grounded in reality, and this book explains why. The authors note that between 2000 and 2007, the share of the world’s stored information that was digital rose from 25% to over 90%. This deluge of data has brought about a paradigm shift in how we think about data. In fact, the authors coin the term “datafication” to describe recent attempts to use data in new and creative ways.

The authors identify three central characteristics of big data. The first is that it is based on an analysis of all available data rather than just a subset. This is referred to as the “N = all” approach, in which analysts set aside the traditional statistical practice of drawing inferences about a population from a sample. As a corollary, the second characteristic is a willingness to give up the exactitude and certainty that working with small samples typically demands, and to accept what is termed “messy data.” With larger data sets, results are less precise, but the overall benefits are said to outweigh the costs.

The third characteristic is a move away from insisting on causality when developing predictions. In the world of big data, finding correlations is deemed sufficient. The analyst is not looking for carefully reasoned explanations of why phenomena occur; it is enough to understand what occurred and to establish relationships between variables. In one telling example of big data in action, the manager in charge of the project made sure not to hire any professional statisticians, on the grounds that their methods run contrary to big data analysis. Better suited to big data projects, in this view, is a new breed of “data scientists” who are less reluctant to use new methods.

Interestingly, the authors are not blind cheerleaders for the brave new world of big data. They elaborate extensively on its dark side and warn of the dangers of what they call a “data dictatorship.” Indeed, an entire chapter is devoted to the societal risks of over-reliance on big data. One significant risk discussed in detail is the ability of big data to make predictions about events that have not yet occurred, and the possibility that we might penalize individuals for criminal acts they have not yet committed. In our politically correct society, this is a worrisome trend that is already well under way.

Big data is not for the little guy. However, even if you’re not in a position to make immediate use of its principles, this book provides an enlightening look at what some organizations are doing with massive amounts of data, and at its potential, both positive and negative, for society at large.

It should be noted that the authors recently published an article in Foreign Affairs entitled “The Rise of Big Data” that nicely summarizes the theses in this book. The article can be found at:

Big Data: A Revolution That Will Transform How We Live, Work, and Think
by Viktor Mayer-Schönberger and Kenneth Cukier

Houghton Mifflin Harcourt, March 2013

242 pages, $27.00

