Kafka – a simple explanation


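A message (record) consists of a key and a value, and is sent to a topic. A topic is split into one or more partitions. The number of partitions is chosen when the topic is created; it is not derived from the number of keys. By default the producer hashes the key to choose a partition. As a result, all messages with the same key are persisted in the same partition and keep their relative order, while many different keys can share one partition.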
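To make the key-to-partition mapping concrete, here is a minimal Python sketch. It assumes a topic with a fixed number of partitions and uses `zlib.crc32` as a stand-in hash. The Java client's default partitioner actually hashes the serialized key with murmur2, so real partition numbers will differ. The function name `partition_for` is hypothetical.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition (simplified: crc32 instead of Kafka's murmur2)."""
    # crc32 returns an unsigned 32-bit value, so the modulo result is never negative
    return zlib.crc32(key) % num_partitions

num_partitions = 3
records = [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "click"), (b"user-1", "logout")]

# Each partition is modelled as an append-only list; order is kept within a partition
partitions = {p: [] for p in range(num_partitions)}
for key, value in records:
    partitions[partition_for(key, num_partitions)].append((key, value))

p1 = partition_for(b"user-1", num_partitions)
# All user-1 events sit in one partition, in the order they were sent
print([v for k, v in partitions[p1] if k == b"user-1"])  # → ['login', 'click', 'logout']
```

Because the mapping depends on `num_partitions`, adding partitions to an existing topic changes where new records for a given key land.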

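A broker is a Kafka server (a node in the cluster) that hosts partitions. The replication factor is the total number of copies of each partition of a topic, counting the leader, with each copy kept on a different broker. With a replication factor of 3, for example, every partition exists on three brokers.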
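The following is a simplified sketch of how those copies could be spread across brokers. Each partition gets `replication_factor` replicas on distinct brokers, placed round-robin. Kafka's real assignment also randomizes the starting broker and can be rack-aware. This version and the name `assign_replicas` are hypothetical and only meant to illustrate the idea. The first broker in each list plays the leader role.

```python
def assign_replicas(num_partitions: int, brokers: list[int], replication_factor: int) -> dict[int, list[int]]:
    """Place replication_factor copies of each partition on distinct brokers (round-robin)."""
    if replication_factor > len(brokers):
        # Every copy must live on a different broker, so RF cannot exceed the broker count
        raise ValueError("replication factor larger than number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # Start at a different broker for each partition so leaders are spread out
        assignment[p] = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
    return assignment

print(assign_replicas(num_partitions=3, brokers=[101, 102, 103], replication_factor=2))
# → {0: [101, 102], 1: [102, 103], 2: [103, 101]}
```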

A producer publishes messages to topics. Each message is appended to one partition of the topic, on the broker that leads that partition.
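A toy model of what a write amounts to: the record is appended to the end of one partition's log, and the producer gets back the partition and the offset the record received. The Java client reports these through its `RecordMetadata`. The `Topic` class and `send` function here are hypothetical in-memory stand-ins. The partition is passed explicitly, which a real producer record also allows.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Toy in-memory topic: one append-only list (log) per partition."""
    name: str
    num_partitions: int
    logs: list = field(init=False)

    def __post_init__(self):
        self.logs = [[] for _ in range(self.num_partitions)]

def send(topic: Topic, partition: int, key: str, value: str) -> tuple[int, int]:
    """Append a record to one partition and return (partition, offset), like a broker ack."""
    log = topic.logs[partition]
    log.append((key, value))
    # The offset is the record's position in that partition's log
    return partition, len(log) - 1

orders = Topic("orders", num_partitions=2)
print(send(orders, 0, "o-1", "created"))  # → (0, 0)
print(send(orders, 0, "o-1", "paid"))     # → (0, 1)
print(send(orders, 1, "o-2", "created"))  # → (1, 0)
```

Note that offsets are counted per partition, so partition 1 starts again at offset 0.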

A consumer reads messages from one or more subscribed topics and belongs to a consumer group. Within a group, each partition is assigned to exactly one consumer, although one consumer can read several partitions. Several consumer groups can read the same topic. Each group tracks its own position and works independently of the others.
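A small sketch of the assignment rule: every partition goes to exactly one member of a group, and a second group gets its own, unrelated assignment over the same partitions. Kafka ships several assignors (range, round-robin, sticky) selected via `partition.assignment.strategy`. The plain round-robin function below is a hypothetical simplification, not any of those implementations.

```python
def assign_partitions(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer of the group (round-robin)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]
# Group "billing" has 3 consumers, so one of them reads two partitions
print(assign_partitions(partitions, ["billing-a", "billing-b", "billing-c"]))
# → {'billing-a': [0, 3], 'billing-b': [1], 'billing-c': [2]}

# Group "audit" reads the same topic independently with its own assignment
print(assign_partitions(partitions, ["audit-a"]))
# → {'audit-a': [0, 1, 2, 3]}
```

A consequence of the one-consumer-per-partition rule is that a group with more consumers than partitions leaves some consumers idle.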

Each consumer sends periodic heartbeats to its group coordinator. This is the broker responsible for the consumer group: the leader of the partition of the internal __consumer_offsets topic that the group's ID maps to. Older clients sent heartbeats as part of poll(); newer clients send them from a background thread. If the coordinator receives no heartbeat within the session timeout, it assumes the consumer is down and reassigns that consumer's partitions among the remaining members of the group. This process is called rebalancing.
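The heartbeat-and-rebalance cycle can be sketched with explicit timestamps instead of real clocks. The timeout plays the role of the consumer setting `session.timeout.ms`. The `GroupCoordinator` class and its methods are hypothetical. Real Kafka runs a join/sync protocol in which the assignment itself is computed on the client side; this sketch collapses all of that into one method.

```python
class GroupCoordinator:
    """Toy coordinator: tracks heartbeats and rebalances when members join or time out."""

    def __init__(self, partitions, session_timeout_ms):
        self.partitions = sorted(partitions)
        self.session_timeout_ms = session_timeout_ms
        self.last_heartbeat = {}  # consumer id -> time (ms) of its last heartbeat
        self.assignment = {}

    def heartbeat(self, consumer, now_ms):
        is_new = consumer not in self.last_heartbeat
        self.last_heartbeat[consumer] = now_ms
        if is_new:
            self._rebalance()  # a joining member also triggers a rebalance

    def check_sessions(self, now_ms):
        """Drop members whose last heartbeat is older than the session timeout."""
        expired = [c for c, t in self.last_heartbeat.items()
                   if now_ms - t > self.session_timeout_ms]
        for c in expired:
            del self.last_heartbeat[c]
        if expired:
            self._rebalance()
        return expired

    def _rebalance(self):
        # Spread all partitions round-robin over the members still alive
        members = sorted(self.last_heartbeat)
        self.assignment = {c: [] for c in members}
        if not members:
            return
        for i, p in enumerate(self.partitions):
            self.assignment[members[i % len(members)]].append(p)

coord = GroupCoordinator(partitions=[0, 1, 2], session_timeout_ms=10_000)
coord.heartbeat("c1", now_ms=0)
coord.heartbeat("c2", now_ms=0)
print(coord.assignment)                     # → {'c1': [0, 2], 'c2': [1]}
coord.heartbeat("c1", now_ms=8_000)         # c1 keeps heartbeating, c2 goes silent
print(coord.check_sessions(now_ms=12_000))  # → ['c2']
print(coord.assignment)                     # → {'c1': [0, 1, 2]}
```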

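After a consumer reads and processes a message, it commits the message's offset, either automatically or explicitly. Kafka stores committed offsets per group and partition, so after a restart or rebalance the group resumes from the next unread message instead of starting over. By convention the committed value is the offset of the next message to read, that is, the last processed offset plus one.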
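A minimal sketch of committing and resuming, assuming a single partition held in a list where the index is the offset. `OffsetStore` and `consume` are hypothetical stand-ins for the committed offsets Kafka keeps in its internal __consumer_offsets topic. Starting at 0 when nothing is committed mimics `auto.offset.reset=earliest`.

```python
class OffsetStore:
    """Toy stand-in for committed offsets, keyed by (group, topic, partition)."""

    def __init__(self):
        self.committed = {}

    def commit(self, group, topic, partition, next_offset):
        self.committed[(group, topic, partition)] = next_offset

    def position(self, group, topic, partition):
        # No commit yet: start from the beginning (like auto.offset.reset=earliest)
        return self.committed.get((group, topic, partition), 0)

log = ["m0", "m1", "m2", "m3", "m4"]  # one partition's records; list index = offset
store = OffsetStore()

def consume(group, max_records):
    """Read up to max_records from where the group left off, then commit."""
    start = store.position(group, "orders", 0)
    batch = log[start:start + max_records]
    # ... process each record in batch here ...
    if batch:
        # Commit the offset of the next record to read (last processed + 1)
        store.commit(group, "orders", 0, start + len(batch))
    return batch

print(consume("billing", 2))  # → ['m0', 'm1']
print(consume("billing", 2))  # → ['m2', 'm3']  (resumes after the commit)
print(consume("audit", 10))   # → ['m0', 'm1', 'm2', 'm3', 'm4']  (independent group)
```

Committing only after processing gives at-least-once delivery. If the consumer crashes between processing and committing, those records are read again after the restart.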

If the broker holding a partition's leader replica goes down, another in-sync replica (a follower that has caught up with the leader) is elected as the new leader. The remaining followers then replicate from the new leader.
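A sketch of the failover rule, assuming the new leader is the first replica in the partition's replica list that is still alive and in the in-sync set (ISR). This roughly mirrors Kafka's default election when `unclean.leader.election.enable` is false. The function `elect_leader` and the `partition_state` dictionary are hypothetical.

```python
from typing import Optional

def elect_leader(replicas: list[int], isr: set[int], failed_broker: int) -> Optional[int]:
    """Pick the first surviving in-sync replica, in replica-list order, as the new leader."""
    for broker in replicas:
        if broker != failed_broker and broker in isr:
            return broker
    # No in-sync replica left: the partition stays offline (unclean election disabled)
    return None

partition_state = {"replicas": [101, 102, 103], "leader": 101, "isr": {101, 102}}
# Broker 101 (the leader) goes down; 103 had fallen behind and is not in the ISR
new_leader = elect_leader(partition_state["replicas"], partition_state["isr"], failed_broker=101)
partition_state["leader"] = new_leader
partition_state["isr"].discard(101)
print(new_leader)  # → 102
```

Skipping out-of-sync replicas such as 103 is what prevents the new leader from missing records that the old leader had already acknowledged.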
