As per Wikipedia's definition, Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
There’s no clear cut definition for ‘big data’ - it is a very subjective term. Most people would consider a data set of terabytes or more to be ‘big data’. One reasonable definition is that it is data which can’t easily be processed on a single machine.
The big data challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
To understand big data more precisely, we should look into its characteristics, i.e. the 3 V's -
1. Volume:
As the name suggests, big data technology depends on massive amounts of data in order to derive better intelligence and more valuable information. The technology is only a few years old, and according to IBM, in 2012 the data gathered every day amounted to 2.5 exabytes (2.5 quintillion bytes). Such an enormous amount of data requires very advanced computational power as well as storage resources to be handled, stored, and analyzed in a reasonable amount of time. Moreover, the gathered information is rapidly increasing in detail, and thus in size.
According to the Harvard Business Review article “Big Data: The Management Revolution” by Andrew McAfee, the volume of data is expected to double roughly every 40 months, driven by the high penetration rate of the wireless technology market.
2. Velocity:
Big data technology requires very high computational resources as well as storage in order to handle large and complex sets of unstructured data. The data can be generated and stored in many ways, yet a company’s ability to store, retrieve, and process these data sets quickly determines its agility.
A famous example was demonstrated by a group of researchers from the MIT Media Lab on Black Friday (the start of Christmas shopping in the United States). In an experiment, the MIT Media Lab group collected information from location-based services on smartphones to detect how many cars entered Macy’s parking lots. Using that information, they were able to estimate the size of Macy’s sales before Macy’s itself could.
3. Variety:
Unlike traditional analytics, big data theoretically has an infinite number of forms. The data are collected in a tremendous number of ways, and every single operation or action represents value to the business. No one can count the number of operations carried out over the web and electronic devices at any given moment all over the globe. For instance, every Facebook post and interaction, tweet, shared image, text message, GPS signal, and many other forms of electronic interaction count and add valuable information.
This variety of data in most cases produces large amounts of unstructured data sets. The biggest issue that comes with such an enormous, unstructured store is the noise in the data. Consequently, extracting the proper information and superior value from big data requires much more mining.
Digital data is growing like a tsunami. As per the forecast by IDC, it is projected to grow to about 40 zettabytes by 2020.
(Figure courtesy: Hadoop Summit, April 2-3, 2014, Amsterdam, Netherlands)
The key Big Data technologies are as follows -
- Hadoop - MapReduce framework, including Hadoop Distributed File System (HDFS); a small word-count sketch follows this list
- NoSQL (Not Only SQL) data stores
- MPP (Massively Parallel Processing) databases
- In-memory database processing
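To make the MapReduce model behind Hadoop a bit more concrete, here is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer are ordinary scripts that read from stdin and write tab-separated key/value pairs to stdout. The script name (wordcount.py) and the local pipeline shown afterwards are illustrative assumptions of mine, not something defined by the tools themselves.

```python
#!/usr/bin/env python
# wordcount.py -- a single script usable as both mapper and reducer with
# Hadoop Streaming: "python wordcount.py map" / "python wordcount.py reduce".
# (The file name and invocation are illustrative assumptions.)
import sys

def mapper():
    # Emit "word<TAB>1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word.lower(), 1))

def reducer():
    # Hadoop sorts mapper output by key, so counts for the same word
    # arrive adjacent and can be summed in a single pass.
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Locally, `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` simulates what Hadoop does at scale: the sort step plays the role of the shuffle phase, while on a real cluster the same mapper and reducer would run on many nodes reading blocks from HDFS.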
Typical Big Data Problems -
- Perform sentiment analysis on 12 terabytes of daily Tweets (a toy sketch follows this list)
- Predict power consumption from 350 billion annual meter readings
- Identify potential fraud in a business's 5 million daily transactions
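To give a flavour of the first problem above, here is a toy, lexicon-based sentiment scorer for tweets. The word lists, the labels, and the one-tweet-per-line input format are all simplifying assumptions of mine; a production system would use a trained model, but the structure shows why this kind of work parallelizes so well.

```python
#!/usr/bin/env python
# toy_sentiment.py -- naive lexicon-based sentiment scoring for tweets.
# The word lists and input format are illustrative assumptions only.
import sys

POSITIVE = {"good", "great", "love", "happy", "awesome"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}

def score(tweet):
    # Score = (#positive words) - (#negative words); the sign gives the label.
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

if __name__ == "__main__":
    # Stream tweets one per line from stdin so the script can sit behind a
    # MapReduce mapper or any other parallel runner over terabytes of text.
    for line in sys.stdin:
        s = score(line)
        label = "positive" if s > 0 else "negative" if s < 0 else "neutral"
        print("%s\t%s" % (label, line.strip()))
```

Because each tweet is scored independently, the job is embarrassingly parallel: the same script could be dropped in as the mapper of a Hadoop Streaming job and spread across a cluster to cover terabytes of text per day.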