You have probably heard the term in tech circles this year, and you will definitely hear more of it in 2012. Big Data describes the development of faster and more scalable analytics for the vast volumes of unstructured data produced, in part, by the explosion of data sources such as web-connected devices. Gartner estimates that 85% of all data exists in semi-structured form, which means that firms will have to store and untangle many different types of data efficiently before they can obtain any business insight.
Here’s a sample of very exciting predictions for big data next year and beyond:
- Within two years, information on the Internet will double every 11 hours. (Sources: University of California at Berkeley School of Information Management and Systems; IBM)
- By 2015, movie downloads and peer-to-peer file sharing will explode to 100 exabytes, equivalent to 5 million Libraries of Congress (see the quick unit check after this list). (Source: www.humanproductivitylab.com)
- By 2020, a $1,000 personal computer will have the raw processing power of a human brain. (Sources: Hans Moravec, Robotics Institute, Carnegie Mellon University, 1998; Cisco IBSG, 2006-2009)
- By 2015, Google will index approximately 775 billion pages of content. (Source: Cisco IBSG, 2009)
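As a quick sanity check on the 100-exabyte prediction above, here is a back-of-the-envelope sketch. It assumes decimal (SI) units, and the per-library figure it derives is an inference, not a number from the sources:

```python
# Back-of-the-envelope check of the "100 exabytes = 5 million
# Libraries of Congress" prediction, assuming decimal (SI) units.
EXABYTE = 10**18   # bytes
TERABYTE = 10**12  # bytes

total_bytes = 100 * EXABYTE   # predicted download/file-sharing volume
libraries = 5 * 10**6         # "5 million Libraries of Congress"

per_library_tb = total_bytes / libraries / TERABYTE
print("Implied size of one Library of Congress: %.0f TB" % per_library_tb)
# -> Implied size of one Library of Congress: 20 TB
```

That works out to about 20 TB per Library of Congress, roughly in line with the 10–20 TB figures commonly cited for its digitized print collection, so the prediction's units at least hang together.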
Don’t confuse the two parts of Big Data: storage and analytics
Storage of such quickly multiplying data has been handled relatively well by existing IT technologies. The unknown is the analysis and presentation of the stored volumes of data. As if that were not complicated enough, a fast-globalizing world means faster and more data-driven decision-making, which increases the need for real-time analysis and insights. In turn, IT firms see an opportunity at the intersection of cloud computing and big data to serve their customers’ needs better. The bombardment of information on consumers and producers alike is brought to light by a 2009 Cisco estimate that more than 35 billion devices are now connected to the Internet. The pressure, and the opportunity, to manage big data effectively is real.
The big boys are in the house
IBM, EMC, SAP, Teradata, MapR, Oracle and Cloudera all offer their own versions of analytics, business intelligence and visualization tools. Hadoop, followed by NoSQL databases, is the current favourite for storing and processing large amounts of unstructured data. In fact, GigaOm has reported on the so-called “Hadoop Wars” that began after Yahoo open-sourced Hadoop: Cloudera, Oracle, IBM, Amazon Web Services (with its Elastic MapReduce service) and Microsoft have all been building on the open-source Apache Hadoop project since. Widespread reports of extremely high maintenance costs, at both the personnel and the architectural level, only underline how hungry businesses are for technology that gives them an edge in today’s environment.
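To make the programming model behind all this concrete, here is a minimal word-count sketch for Hadoop Streaming, the generic stdin/stdout interface Hadoop exposes to any language. The script name, jar location and HDFS paths are illustrative assumptions, not details from the article:

```python
#!/usr/bin/env python
"""Minimal word-count sketch for Hadoop Streaming (illustrative only).

Submit it as both mapper and reducer of a streaming job, e.g.:
  hadoop jar hadoop-streaming.jar \
      -input /data/raw -output /data/counts \
      -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
      -file wordcount.py
"""
import sys


def mapper():
    # Emit one "word<TAB>1" line per word; Hadoop shuffles and
    # sorts these lines by key before the reduce phase starts.
    for line in sys.stdin:
        for word in line.split():
            print("%s\t1" % word.lower())


def reducer():
    # Input arrives sorted by key, so all counts for a given word
    # are contiguous and can be summed in a single pass.
    current, count = None, 0
    for line in sys.stdin:
        word, _, n = line.rstrip("\n").partition("\t")
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = word, 0
        count += int(n)
    if current is not None:
        print("%s\t%d" % (current, count))


if __name__ == "__main__":
    {"map": mapper, "reduce": reducer}[sys.argv[1]]()
```

The same pipeline can be rehearsed locally without a cluster, since `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` mimics Hadoop’s shuffle-and-sort step; that low barrier to entry is part of why the model caught on for unstructured data.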
Moreover, most industries, whether they make tangible products or provide services, recognize the importance of storing their data and distilling it into intelligent chunks. Given customers’ shortened decision-making time and producers’ increased ability to customize their output, the importance of incisive big data analytics will only continue to rise. For example, as Ms. Barrow explains on GigaOm, the manufacturing industry generates “a ton of relational and nonrelational data from inventory systems… [traditional] operations, and product life cycle management”. This shows that even traditional manufacturing industries will have to grapple with storing, analyzing and presenting voluminous amounts of data in a technologically efficient and profitable manner. Wherever you may be in today’s business supply chain, big data will follow you.