Is “Big Data” same as “Big Analytics”?

March 26, 2012

Search Google for Big Data and you are most likely to find people talking about two things:

  • How Open Source solutions like Hadoop have pioneered this space
  • How some companies have used these solutions to build large-scale analytics and business intelligence systems.

Read further and you will find mention of Map Reduce and of how many NoSQL data stores support the useful “Data Locality” pattern – taking compute to where the data is.
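To make the data locality idea concrete, here is a minimal single-process sketch in Python. It is not how Hadoop or any particular NoSQL store implements Map Reduce; the node names, the `partitions` dictionary and the function names are all hypothetical and only stand in for data blocks held on separate machines. The point it tries to show is that the map step runs against the data where it lives, and only small partial results are combined afterwards.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each list stands in for the data block held on one node.
partitions = {
    "node-a": ["big data", "big analytics"],
    "node-b": ["data locality", "big data"],
}


def map_local(lines):
    """Map step, imagined as running on the node that stores `lines`."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merges two partial counts (the only data that would travel)."""
    return left + right


def word_count(parts):
    # One partial result per partition, then a single merge of the partials.
    partials = [map_local(lines) for lines in parts.values()]
    return reduce(reduce_counts, partials, Counter())
```

Calling `word_count(partitions)` returns a `Counter` of word frequencies across both partitions. In a real cluster the `map_local` calls would be scheduled on the machines holding each block, which is the part this sketch can only imitate.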

Hadoop users and the creators themselves acknowledge that the technology is good for “streaming reads” and supports high throughput at the cost of latency. This constraint, together with the fact that Map Reduce tasks are heavily I/O bound, makes it seem unsuitable for use cases where users wait for a response, such as OLTP applications.

While all of the above is relevant and mostly true, is it also leading to a certain stereotyping: that of equating Big Data with Big Analytics?

It might be useful to describe Big Data first. Gartner characterizes data growth in an enterprise along three dimensions: Volume, Variety and Velocity. Rapid growth in any of these, or in combinations thereof, results in Big Data. Note that none of these dimensions distinguishes transaction processing from analytics, which implies that Big Data is not just Big Analytics.

Big Data solutions need not be limited to Big Analytics and may extend to low latency data access workloads as well. A few random thoughts on patterns and solutions:

  • Data Sharding – useful for scaling low latency data stores such as an RDBMS to hold Big Data. Sharding may be built into application code, handled by an intermediary between the application and the data store, or supported natively by the data store through auto-sharding.
  • Data Stores by purpose – Big Data invariably means distribution and may result in data duplication, within a single store or across several. For example, data extracts from a DFS like Hadoop's may also be stored in a high-speed NoSQL store or a sharded RDBMS and accessed via secondary indices. This duplication could lead to the consistency and availability trade-offs described by the CAP theorem (http://en.wikipedia.org/wiki/CAP_theorem).
  • Data Stores that effectively leverage CPU, RAM and disk space – Moore’s Law has held up over the last few years, and data stores like Google's Bigtable (or HBase) successfully leverage the resulting abundance of commodity compute, memory and storage.
  • Optimized Compute Patterns – efforts like Peregrine (http://peregrine_mapreduce.bitbucket.org/) that support pipelined Map Reduce jobs.
  • Data-aware Grid topologies – a compute grid where worker participation in a compute task is influenced by the data available locally to the worker, usually in memory. Note that this is different from the data locality pattern implemented in most Map Reduce frameworks.
  • And more…
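As an illustration of the first pattern, application-level sharding can be sketched in a few lines of Python. This is only a sketch under stated assumptions: `ShardRouter`, `ShardedStore` and the shard names are hypothetical, and plain in-memory dictionaries stand in for what would be separate RDBMS instances. It uses simple hash-modulo routing, which is one of several possible schemes.

```python
import hashlib


class ShardRouter:
    """Maps a record key to one of a fixed list of shards (hypothetical names)."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # Use a stable hash so routing is identical across processes and restarts
        # (Python's built-in hash() is randomized per process for strings).
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]


class ShardedStore:
    """Key-value facade; each dict stands in for one database instance."""

    def __init__(self, router):
        self.router = router
        self.shards = {name: {} for name in router.shard_names}

    def put(self, key, value):
        self.shards[self.router.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.router.shard_for(key)].get(key)
```

A caller would build `ShardedStore(ShardRouter(["shard-0", "shard-1", "shard-2"]))` and use `put` and `get` without knowing where a key lives. One limitation worth noting: with modulo routing, changing the number of shards remaps most keys, which is why schemes such as consistent hashing are often preferred when shards are added or removed frequently.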

It may suffice to say that Big Analytics has been the most visible and most commonly deployed use case for Big Data. Newer companies, especially internet-based ones, have been using Big Data technologies to deliver content sharing, email, instant messaging and social platform services in near real time. Enterprises are slowly but surely warming up to this trend.


One Response to “Is “Big Data” same as “Big Analytics”?”

  1. mksanthosh Says:

    Generally agreed that Big Data and Big Analytics are not the same, but it helps to perform analytics on data that is really huge at the source rather than taking it to a data warehouse. Probably this is a factor which makes Big Data Analytics more visible.

