Hadoop

If anything below is unclear, email the Kung Fu Panda and we will get back to you soon.

Technologies in Hadoop Ecosystem

Processes in Hadoop

The Hadoop processes running on a node can be listed by running the command "jps".

Note that in Hadoop 2.0+, the Job Tracker and Task Tracker have been replaced by the YARN ResourceManager and NodeManager, together with a per-application ApplicationMaster.

Hadoop Setup

Namenode

Datanode

Job Tracker

Task Tracker

Secondary Namenode

InputSplit

Types of Inputformat

HDFS

Hadoop Overall flow

Hadoop Task flow

Mapper => Combiner(Optional) => Partitioner(Optional) => Reducer(Optional)
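
The flow above can be sketched in plain Python as a word count. This is only an illustrative simulation, not the Hadoop API: the function names (mapper, combiner, partitioner, reducer, run_job) and the sample splits are hypothetical, and the partitioner is a simple deterministic stand-in for hash partitioning.

```python
from collections import defaultdict

def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1

def combiner(pairs):
    """Pre-aggregate one map task's output locally to reduce shuffle traffic."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def partitioner(key, num_reducers):
    """Choose the reducer for a key (deterministic stand-in for a hash)."""
    return sum(key.encode("utf-8")) % num_reducers

def reducer(key, values):
    """Combine all values seen for one key into the final count."""
    return key, sum(values)

def run_job(splits, num_reducers=2, use_combiner=True):
    # One bucket per reducer, mapping key -> list of values routed to it.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each split is handled by one simulated map task
        pairs = [pair for line in split for pair in mapper(line)]
        if use_combiner:
            pairs = combiner(pairs)
        for key, value in pairs:
            buckets[partitioner(key, num_reducers)][key].append(value)
    result = {}
    for bucket in buckets:  # each bucket is handled by one simulated reducer
        for key in sorted(bucket):
            word, total = reducer(key, bucket[key])
            result[word] = total
    return result

# Hypothetical input: two splits, each a list of lines.
splits = [["the quick brown fox", "the lazy dog"], ["the fox jumps"]]
counts = run_job(splits)
print(counts["the"])  # 3
```

Turning the combiner off or changing the number of reducers changes how much data moves and where it lands, but not the final counts, which is why the combiner can be treated as optional.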

Mapper

Combiner

Partitioner

Reducer

A reducer has three phases: shuffle, sort, and reduce.
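
Assuming the three phases meant here are the usual shuffle, sort, and reduce, the sketch below walks through them for a single reducer in plain Python. The map outputs and function names are hypothetical, and this is a simulation rather than Hadoop code. In a real job the shuffle and sort overlap, because map outputs are merged while they are still being fetched.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical, already-sorted outputs of three map tasks,
# restricted to the partition that belongs to ONE reducer.
map_outputs = [
    [("apple", 1), ("cat", 2)],
    [("apple", 3), ("bee", 1)],
    [("cat", 1)],
]

def shuffle(outputs):
    """Phase 1: copy this reducer's partition from every map task."""
    return [list(run) for run in outputs]

def sort_phase(runs):
    """Phase 2: merge the sorted runs so that equal keys become adjacent."""
    return heapq.merge(*runs, key=itemgetter(0))

def reduce_phase(merged):
    """Phase 3: apply the reduce function once per key, over all its values."""
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, sum(value for _, value in group)

result = list(reduce_phase(sort_phase(shuffle(map_outputs))))
print(result)  # [('apple', 4), ('bee', 1), ('cat', 3)]
```

The merge step only works because each map task's output is already sorted by key, so the reducer never has to sort the whole data set from scratch.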

Number of Reducers

Hadoop Configuration Params

Hadoop Configuration files

Speculative Execution

YARN

Distributed Cache

Counters

Sqoop

Misc

Commands