Wednesday, May 14, 2014

Hadoop at a glance

Apache Hadoop, at its core, consists of two components: the Hadoop Distributed File System (HDFS) and Hadoop MapReduce. HDFS is the primary storage system used by Hadoop applications. It creates multiple replicas of data blocks and distributes them across the compute nodes of a cluster to enable reliable, extremely rapid computation. Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process huge amounts of data in parallel on large clusters of compute nodes. Other Hadoop-related projects (collectively known as the Hadoop ecosystem) include Hive, Pig, HBase, YARN, Mahout, Oozie, Sqoop, Avro, Cascading, ZooKeeper, Flume, Drill, etc.
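
To make the MapReduce programming model more concrete, below is a minimal word-count sketch written against the Hadoop 2.x org.apache.hadoop.mapreduce API. It is only an illustration, not part of the original post: the class name and the command-line input/output paths are placeholders.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combiner reuses the reducer
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Using the reducer as a combiner pre-aggregates counts on the map side, a common optimization when the reduce function is associative and commutative.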

Competing technologies include Google Dremel, HPCC Systems, and Apache Storm.

Google Dremel is a distributed system developed at Google for interactively querying large datasets and powers Google's BigQuery service. 

HPCC (High Performance Computing Cluster) is a massively parallel processing computing platform designed to solve Big Data problems.

Apache Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language.
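
To give a feel for Storm's programming model, here is a minimal sketch of a topology that splits a stream of sentences into words. It assumes the backtype.storm.* package names used by the Storm 0.9.x releases of that era (later Apache releases moved to org.apache.storm.*); SentenceSpout and SplitBolt are illustrative stand-ins for a real stream source and real processing logic.

import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class SentenceTopology {

  // A toy spout that emits the same sentence forever, standing in for a real
  // stream source such as a message queue.
  public static class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
      this.collector = collector;
    }

    public void nextTuple() {
      Utils.sleep(100);
      collector.emit(new Values("hadoop does batch storm does streams"));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("sentence"));
    }
  }

  // A bolt that splits each sentence into words and emits one tuple per word.
  public static class SplitBolt extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
      for (String word : tuple.getString(0).split(" ")) {
        collector.emit(new Values(word));
      }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word"));
    }
  }

  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("sentences", new SentenceSpout(), 1);
    builder.setBolt("split", new SplitBolt(), 2).shuffleGrouping("sentences");

    // Run in-process for testing; a real deployment would use StormSubmitter.
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("sentence-topology", new Config(), builder.createTopology());
    Utils.sleep(10000);
    cluster.shutdown();
  }
}

On a real cluster the same topology would be submitted with StormSubmitter.submitTopology instead of LocalCluster, and the spout would pull from an external source rather than generating data itself.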


Hadoop distributions are provided by a growing number of companies. Their products include Apache Hadoop or a derivative work thereof, commercial support, and/or tools and utilities related to Hadoop. Some major Hadoop distribution vendors are Cloudera, Hortonworks, MapR, Amazon Web Services, Intel, EMC, IBM, etc.

5 comments:

  1. Hi Ambuj, nice article once again. I feel like adding some more points to this article through my comments.

    The Hadoop ecosystem can be further classified into the following parts:
    1) Data Analytics
    2) Management
    3) Data Access
    4) Data Processing
    5) Data Storage

    When data needs a solution through Hadoop, we usually have to work across most of the above ecosystem classifications.
    -> Under "Data Analytics" come BI & analytics tools.
    -> Under "Management" come Oozie (workflow), EMR (AWS workflow), Chukwa (monitoring), Flume (monitoring), ZooKeeper (management), and other such tools.
    -> Under "Data Access" come Hive (SQL), Pig (data flow), Avro (JSON), Mahout (machine learning), Sqoop (data connector), and other such tools.
    -> Under "Data Processing" comes the MapReduce framework.
    -> Under "Data Storage" come HDFS, S3 on AWS, HBase, CloudStore, NFS with MapR, and so on.

    For better understanding, some of the ecosystem components are explained below:
    1. Hive: A data warehouse infrastructure with SQL-like querying capabilities on Hadoop data sets (see the sketch after this list).
    2. Pig: A high-level data flow language and execution framework for parallel computation.
    3. ZooKeeper: A high-performance coordination service for distributed applications.
    4. Mahout: A scalable machine learning and data mining library.
    5. HBase: A scalable, distributed database that supports structured data storage for large tables.
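
    To illustrate point 1, here is a rough sketch of querying Hive from Java through the HiveServer2 JDBC driver. The connection URL, credentials, and the word_counts table are placeholders for illustration only, not part of any particular setup.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
      public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver and open a connection
        // (host, port, database, and credentials are placeholders).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = conn.createStatement();

        // A SQL-like aggregation over a (hypothetical) word_counts table stored in HDFS.
        ResultSet rs = stmt.executeQuery(
            "SELECT word, COUNT(*) AS freq FROM word_counts GROUP BY word ORDER BY freq DESC LIMIT 10");
        while (rs.next()) {
          System.out.println(rs.getString("word") + "\t" + rs.getLong("freq"));
        }
        rs.close();
        stmt.close();
        conn.close();
      }
    }

    Under the hood, Hive compiles such queries into MapReduce jobs, so the same HDFS data that batch jobs read can be explored with SQL-like statements.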

    Once again, thanks for your time and for sharing your knowledge about Hadoop.

  2. Thanks a lot, Vijju, for your wonderful feedback.
    Yes, you are right that categorization can provide better understanding, and I appreciate your explanation of some of the utilities related to the topic. In this post, I specifically aimed to give brief information about Hadoop and its ecosystem. In the very next post, I am going to shed light on a wider range of Hadoop ecosystem tools beyond the Apache projects as well. Please stay tuned.
