Big Data: Size Is Everything

Annotated Data, Noise and Simple Algorithms


As a complement to (or dissent from) my last post on Big Data, Google weighs in with “The Unreasonable Effectiveness of Data”. The paper anticipates some of the problems with big data and answers them by advocating simple rules paired with massive amounts of data. Google Translate operates on the same principle, analyzing millions of already-translated documents for patterns to apply. The approach is simple and avoids complex models; it works only because millions of documents are available. The question then becomes: how can we structure the web to create these huge datasets automatically? A good skim.
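To make "simple rules plus lots of data" concrete, here is a toy sketch. It is not how Google Translate actually works; it swaps in a very crude technique, the Dice coefficient, which scores a source/target word pair by how often the two words appear together in aligned sentence pairs. The four-sentence English/French corpus and the helper names (`dice_translation_table`, `best_translation`) are hypothetical, made up for illustration. The rule itself never gets smarter; the only lever is feeding it more aligned text.

```python
from collections import Counter
from itertools import product


def dice_translation_table(pairs):
    """Score (source word, target word) pairs by co-occurrence in aligned sentences.

    Dice(s, t) = 2 * c(s, t) / (c(s) + c(t)), where every count is the number
    of sentence pairs containing the word(s). A toy rule, not a real MT system.
    """
    src_counts, tgt_counts, joint_counts = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Every source word "might" translate to every target word in the pair.
        joint_counts.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * c / (src_counts[s] + tgt_counts[t])
        for (s, t), c in joint_counts.items()
    }


def best_translation(table, word):
    """Return the highest-scoring target word for `word`, or None if unseen."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical toy corpus of already-translated sentence pairs.
PAIRS = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the car", "la voiture"),
    ("a blue car", "une voiture bleue"),
]
TABLE = dice_translation_table(PAIRS)

for word in ["the", "house", "blue", "car", "a"]:
    print(word, "->", best_translation(TABLE, word))
```

Even with four sentences the rule pairs "house" with "maison" and "blue" with "bleue"; with only a handful of examples, though, it is easy to mislead. Adding more aligned text, rather than a cleverer rule, is what separates true translations from coincidental co-occurrences.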
