What are the features of Hadoop?
Hadoop supports the storage and processing of big data, and it is one of the most widely used frameworks for handling big data challenges. Some important features of Hadoop are –
Open Source
- Hadoop is an open-source framework, which means it is available free of cost.
- Also, users are allowed to modify the source code as per their requirements.
Distributed Processing
- Hadoop supports distributed processing of data, which results in faster processing.
- The data in Hadoop HDFS is stored in a distributed manner across the cluster, and MapReduce is responsible for the parallel processing of that data, as the sketch below shows.
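For illustration, here is a minimal Java sketch of a client writing a file through the HDFS FileSystem API; the NameNode address and file path are hypothetical. The client sees one logical file, while HDFS transparently splits it into blocks and distributes them across DataNodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally picked up from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);
        // Write a file; HDFS splits it into blocks and spreads the blocks
        // (and their replicas) across DataNodes automatically.
        try (FSDataOutputStream out = fs.create(new Path("/data/example.txt"))) {
            out.writeUTF("Hadoop distributes this file's blocks across the cluster.");
        }
        fs.close();
    }
}
```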
Fault Tolerance
- Hadoop is highly fault-tolerant. By default, it creates three replicas of each block and stores them on different nodes.
- This replication factor can be changed according to requirements, so the data can be recovered from another node if one node fails.
- The detection of node failure and recovery of data is done automatically.
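As a minimal sketch (the file path is hypothetical), the replication factor can be set cluster-wide via the dfs.replication property or changed per file through the FileSystem API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default replication factor for files created with this configuration.
        conf.set("dfs.replication", "2");
        FileSystem fs = FileSystem.get(conf);
        // Raise the replication factor of an existing file to 4;
        // the NameNode schedules the extra replicas in the background.
        boolean scheduled = fs.setReplication(new Path("/data/example.txt"), (short) 4);
        System.out.println("Replication change scheduled: " + scheduled);
        fs.close();
    }
}
```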
Reliability
- Hadoop stores data on the cluster in a reliable manner that is independent of any individual machine.
- So, the data stored in the Hadoop environment is not affected by the failure of a machine.
Scalability
- Another important feature of Hadoop is scalability. It is compatible with commodity hardware, and we can easily add new hardware to the cluster as nodes.
Economic
- Apache Hadoop is not very expensive, as it runs on a cluster of commodity hardware.
- Hadoop also provides huge cost savings, since it is easy to add more nodes on the fly. If requirements increase, you can add nodes without any downtime and without much pre-planning.
Easy to use
- The client does not need to deal with distributed computing; the framework takes care of all of it. This makes Hadoop easy to use.
Data Locality
- This is a unique feature of Hadoop that lets it handle big data efficiently. Hadoop works on the data locality principle, which states: move the computation to the data instead of moving the data to the computation.
- When a client submits a MapReduce job, the code is moved to the data in the cluster rather than bringing the data to the location where the job was submitted, as the sketch below illustrates.
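For illustration, here is a minimal sketch of a MapReduce job, modeled on the classic WordCount example (input and output paths are taken from the command line). The jar registered with setJarByClass is what gets shipped to the nodes holding the input blocks, so the map tasks run where the data already lives.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map tasks run on the nodes that hold the input blocks (data locality).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);   // this jar is shipped to the data
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```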
High Availability
- The data stored in Hadoop remains accessible even after a hardware failure. If a node fails, the data can be read from another node that holds a replica of the same block, as the sketch below shows.
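As a minimal sketch (the file path is hypothetical), the FileSystem API can list which hosts hold each block of a file; because every block has replicas on several DataNodes, a read can be served from another node when one fails:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/data/example.txt"));
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            // Each block reports every DataNode that stores a replica of it.
            System.out.println(String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}
```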