What are the core components of Hadoop?
- Hadoop is an open-source framework for storing and processing big data in a distributed manner across clusters of commodity hardware.
Core Components of Hadoop
- HDFS (Hadoop Distributed File System)
- Hadoop MapReduce
- YARN
HDFS (Hadoop Distributed File System)
- HDFS is the primary storage system of Hadoop.
- It stores large data files across a cluster of commodity hardware by splitting them into blocks and replicating each block on several nodes.
- This replication lets HDFS keep data available and reliable even when individual machines fail (a read/write sketch follows this section).
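To make this concrete, here is a minimal sketch of writing to and reading from HDFS through Hadoop's Java FileSystem API. The NameNode URI hdfs://namenode:8020 and the path /user/demo/sample.txt are placeholder assumptions; on a real cluster, fs.defaultFS would normally come from core-site.xml rather than being set in code.

```java
// Minimal HDFS read/write sketch (illustrative, not production code).
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; usually configured in core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/sample.txt"); // placeholder path

        // Write a small file; HDFS splits large files into blocks and
        // replicates each block across DataNodes for fault tolerance.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
        fs.close();
    }
}
```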
Hadoop MapReduce
- MapReduce is the Hadoop layer responsible for data processing. Developers write MapReduce applications to process the structured and unstructured data stored in HDFS.
- It processes high volumes of data in parallel by dividing the work into independent tasks.
- The processing is done in two phases: Map and Reduce.
- Map is the first phase and usually carries the heavier transformation logic, emitting intermediate key-value pairs; Reduce is the second phase and performs lighter-weight aggregation or summarization of those pairs (see the word-count sketch below).
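As an illustration of the two phases, here is the classic word-count example written against the org.apache.hadoop.mapreduce API; the class and field names are just for this sketch.

```java
// Word count: Map emits (word, 1) pairs, Reduce sums the counts per word.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase: tokenize each input line and emit (word, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
```

Because the map output is partitioned and shuffled by key before the reduce phase, each reducer sees all counts for a given word, which is what makes the per-word sum correct.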
YARN
- YARN (Yet Another Resource Negotiator) is the processing framework of Hadoop.
- It handles cluster resource management and job scheduling, and it lets multiple data processing engines, such as batch processing, interactive processing, and real-time streaming, run on the same cluster and data (a job-submission sketch follows).
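For illustration, the sketch below shows a driver that submits the word-count job from the previous section. When mapreduce.framework.name is set to yarn (normally configured in mapred-site.xml rather than in code), YARN's ResourceManager schedules the map and reduce tasks as containers on the cluster. The input and output paths here are placeholders.

```java
// Minimal driver sketch: submit the WordCount job to a YARN-managed cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Usually set in mapred-site.xml; shown here only for illustration.
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Placeholder HDFS paths for job input and output.
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));

        // Submits the job and blocks until the cluster finishes running it.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```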