Difference between Flume and Kafka
Flume | Kafka |
---|---|
Distributed, reliable system for collecting, aggregating and moving large amounts of data to a centralized data store such as HDFS or HBase | General-purpose publish-subscribe messaging system |
Adding more consumers means changing the design of the Flume pipeline and replicating the channel to deliver messages to a new sink, which requires downtime | Easy to add more consumers without downtime |
Supports many built-in sources and sinks out of the box | Sometimes requires writing your own producer and consumer code, though Spark and Storm now provide built-in integrations with Kafka |
Flume pushes data into the sink, so consumers do not have to maintain an offset | Subscribers are responsible for pulling data and for maintaining a pointer to their offset |
Events can be lost if the agent goes down (for example, events held in a memory channel) | Provides fault tolerance through replication across brokers |
Does not support partitioning | Supports partitioning |
Flume pushes data to the sink, so writes to the sink can outpace the downstream system that reads from it | Since Kafka does not push data, writes from producer to broker and reads from broker to consumer can each happen at their own pace |
It is tightly integrated with Hadoop | General purpose |
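
The pull model, consumer-held offsets and partitioning from the Kafka column can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ToyLog` and `ToyConsumer` are hypothetical names invented for illustration, nothing here talks to a real broker, and real Kafka clients handle partition assignment and offset storage quite differently.

```python
from collections import defaultdict
import zlib


class ToyLog:
    """In-memory stand-in for a partitioned, append-only topic (hypothetical, not a Kafka API)."""

    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key always land in the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(value)
        return index

    def read(self, partition, offset, max_records=10):
        # The log never pushes; it only answers reads starting at a given offset.
        return self.partitions[partition][offset:offset + max_records]


class ToyConsumer:
    """Pulls from the log and keeps its own offset for each partition."""

    def __init__(self, log):
        self.log = log
        self.offsets = defaultdict(int)

    def poll(self, max_records=10):
        records = []
        for partition in range(len(self.log.partitions)):
            batch = self.log.read(partition, self.offsets[partition], max_records)
            self.offsets[partition] += len(batch)  # advance only past what was read
            records.extend(batch)
        return records


log = ToyLog(num_partitions=2)
for i in range(6):
    log.append(key=f"sensor-{i % 3}", value=f"event-{i}")

# A fast consumer drains everything that has been written so far.
fast = ToyConsumer(log)
first_batch = fast.poll(max_records=10)

# A second consumer is added later without touching the producer or the log.
# It starts from offset 0 and reads at its own, slower pace.
slow = ToyConsumer(log)
slow_batch = slow.poll(max_records=1)
```

Because each consumer tracks its own position, the slow reader neither blocks the producer nor affects the fast reader, and adding it needed no change to the rest of the pipeline.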