Difference between Flume and Kafka

| Flume | Kafka |
| --- | --- |
| A distributed, reliable system for collecting, aggregating, and moving large amounts of data to a centralized data store such as HDFS or HBase | A general-purpose publish-subscribe messaging system |
| Adding more consumers means changing the design of the Flume pipeline and replicating the channel to deliver messages to the new sink, which requires downtime | Consumers can be added easily, without downtime |
| Supports many built-in sources and sinks out of the box | Sometimes requires writing your own producer and consumer code, although Spark and Storm now provide built-in Kafka integrations |
| Pushes data into the sink, so consumers do not have to maintain an offset | Subscribers are responsible for pulling data and for maintaining a pointer to their offset |
| Events held in a memory channel are lost if the agent goes down; a file channel persists them to disk, but events are not replicated to other agents | Provides fault tolerance by replicating partitions across brokers |
| Does not support partitioning | Supports partitioning |
| Pushes data to the sink, so writes into the sink can overwhelm downstream reads from it | Does not push data, so writes from producer to broker and reads from broker to consumer each proceed at their own pace |
| Tightly integrated with Hadoop | General purpose |
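
The pull model, in which each subscriber keeps its own offset, is what lets a new consumer join without touching the pipeline and lets slow and fast readers proceed independently. The following is a minimal sketch of that idea in plain Python. `ToyLog` and `ToyConsumer` are hypothetical names invented for illustration; they model a single in-memory log and are not the Kafka client API.

```python
class ToyLog:
    """In-memory stand-in for one append-only log (hypothetical, not Kafka's API)."""

    def __init__(self):
        self.records = []

    def append(self, record):
        """Producer side: append a record and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset, max_records):
        """Return up to max_records starting at the given offset."""
        return self.records[offset:offset + max_records]


class ToyConsumer:
    """A subscriber that pulls data and tracks its own offset."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # position of the next record to read

    def poll(self, max_records=2):
        """Pull the next batch and advance this consumer's offset."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = ToyLog()
for i in range(5):
    log.append(f"event-{i}")  # the producer writes at its own pace

analytics = ToyConsumer(log)
first = analytics.poll()      # a slow consumer reads two records at a time

audit = ToyConsumer(log)      # a consumer added later; nothing else changes
late = audit.poll(max_records=10)

second = analytics.poll()     # the first consumer resumes where it left off
```

Because the log never pushes, adding `audit` required no change to the producer or to `analytics`, and each consumer's progress is described entirely by its own `offset`.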
