What is the difference between Flume and Kafka?

Difference between Flume and Kafka

Flume	Kafka
Flume is a distributed, reliable system for collecting, aggregating, and moving large amounts of data to a centralized data store such as HDFS or HBase	Kafka is a general-purpose publish-subscribe messaging system
Adding more consumers means changing the design of the Flume pipeline and replicating the channel to deliver messages to the new sink, which requires downtime	More consumers can be added easily, without downtime
Supports many built-in sources and sinks out of the box	You sometimes need to write your own producer and consumer code, although Spark and Storm now provide built-in Kafka integrations
Flume pushes data into the sink, so consumers do not have to maintain an offset	Subscribers are responsible for pulling data and for maintaining their own pointer to the offset
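Events buffered in a memory channel are lost if the agent goes down (a file channel persists them to disk, but they stay unavailable until the agent recovers)	Provides fault tolerance by replicating data across brokers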
Does not support partitioning	Supports partitioning
Flume pushes data to the sink, so writes to the sink can outpace and overwhelm reads from it	Kafka does not push data, so writes from producer to broker and reads from broker to consumer can each proceed at their own pace
It is tightly integrated with Hadoop	General purpose
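
The pull, offset, and partitioning rows above can be illustrated with a toy in-memory model. This is only a sketch: `ToyTopic` and `ToyConsumer` are hypothetical names invented for illustration, not part of the Kafka client API, and `crc32` is just a stand-in for whatever partitioner a real client uses. It shows a producer appending to a partitioned log without contacting any consumer, and consumers pulling at their own pace while tracking their own offsets.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a partitioned, append-only log (not the Kafka API)."""

    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Deterministic hash so the same key always maps to the same partition.
        # crc32 is an assumption made for this sketch, not Kafka's partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        # Producer side: write and return; no consumer is contacted.
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p


class ToyConsumer:
    """Pull-based reader that keeps its own offset per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition index -> next offset to read

    def poll(self, max_records=10):
        # Read up to max_records from each partition, starting at our own offset.
        records = []
        for p, log in enumerate(self.topic.partitions):
            start = self.offsets[p]
            batch = log[start:start + max_records]
            records.extend(batch)
            self.offsets[p] = start + len(batch)  # the consumer advances its own pointer
        return records


topic = ToyTopic(num_partitions=2)
for i in range(6):
    topic.append(f"user-{i % 3}", f"event-{i}")

fast = ToyConsumer(topic)
first = fast.poll()                  # reads everything written so far

late = ToyConsumer(topic)            # added later; producer and `fast` are untouched
partial = late.poll(max_records=1)   # reads slowly: at most one record per partition
```

Because each consumer owns its offsets, adding `late` required no change to the producer or to `fast`, which mirrors the "add more consumers without downtime" row. In a Flume-style push pipeline, by contrast, the agent decides when data is delivered to each sink.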
