KRaft Overview for Confluent Platform

Starting with Confluent Platform version 8.0, KRaft (pronounced craft) mode is how metadata is managed in Apache Kafka®.

Kafka Raft (KRaft) is the consensus protocol that greatly simplifies Kafka’s architecture by consolidating responsibility for metadata into Kafka itself.

The following image provides a simple illustration of Kafka running with KRaft managing metadata for the cluster. Each KRaft controller is a node in a Raft quorum, and each node is a broker that can handle client requests.

KRaft isolated mode

The controller quorum

The KRaft controller nodes form a Raft quorum that manages the Kafka metadata log. This log records each change to the cluster metadata, including metadata about topics, partitions, in-sync replicas (ISRs), and configurations.

Using the Raft consensus protocol, the controller nodes maintain consistency and elect a leader without relying on any external system. The leader of the metadata log is called the active controller. The active controller handles all RPCs made by the brokers. The follower controllers replicate the data that is written to the active controller, and serve as hot standbys in case the active controller fails. Because metadata is kept in a log, brokers use offsets to keep track of the latest metadata stored in the KRaft controllers, which results in more efficient propagation of metadata and faster recovery from controller failovers.
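The offset-based propagation described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Kafka implements it: the MetadataLog and Broker classes and their methods are hypothetical names invented for this example, and the real protocol involves replication, epochs, and fetch requests that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataLog:
    """Hypothetical stand-in for the metadata log on the active controller."""
    records: list = field(default_factory=list)

    def append(self, record: str) -> int:
        """Append a metadata change and return the offset it was written at."""
        self.records.append(record)
        return len(self.records) - 1

    def fetch(self, from_offset: int) -> list:
        """Return (offset, record) pairs at or after from_offset."""
        return list(enumerate(self.records))[from_offset:]


@dataclass
class Broker:
    """Hypothetical broker that remembers the next offset it needs."""
    next_offset: int = 0
    applied: list = field(default_factory=list)

    def catch_up(self, log: MetadataLog) -> int:
        """Fetch and apply only the records this broker has not seen yet."""
        new_records = log.fetch(self.next_offset)
        for offset, record in new_records:
            self.applied.append(record)
            self.next_offset = offset + 1
        return len(new_records)
```

Because the broker tracks an offset, a second call to catch_up transfers only the changes appended since the first call instead of the full metadata state.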

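KRaft requires a majority of the controller nodes to be running. For example, a three-node controller cluster can survive one failure, a five-node controller cluster can survive two failures, and so on. In general, a controller quorum stays available as long as fewer than half of its nodes have failed.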
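The majority rule can be worked through as simple arithmetic. The following is a minimal sketch; the function names majority and tolerated_failures are made up for this example and are not part of any Kafka or Confluent API.

```python
def majority(voters: int) -> int:
    """Smallest number of controllers that forms a majority of the quorum."""
    if voters < 1:
        raise ValueError("a quorum needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Number of controllers that can fail while a majority keeps running."""
    return voters - majority(voters)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

Note that under this arithmetic a four-node quorum tolerates the same single failure as a three-node quorum, which is one reason odd quorum sizes are the usual choice.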

Periodically, the controllers write a snapshot of the metadata to disk. This is conceptually similar to compaction, but the state is read from memory rather than by re-reading the log from disk.
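The difference between a snapshot and a replay of the log can be sketched as follows. This is an illustration only: ControllerState, write_snapshot, and load_snapshot are hypothetical names, and the JSON file layout is an assumption made for this example rather than Kafka's actual snapshot format.

```python
import json
from pathlib import Path


class ControllerState:
    """Hypothetical controller: an ordered change log plus in-memory state."""

    def __init__(self):
        self.log = []    # every (key, value) change, in order
        self.state = {}  # latest value per key, held in memory

    def apply(self, key, value):
        """Record a change in the log and update the in-memory view."""
        self.log.append((key, value))
        self.state[key] = value

    def write_snapshot(self, path: Path) -> int:
        """Write the in-memory state to disk without re-reading the log."""
        end_offset = len(self.log)
        payload = {"end_offset": end_offset, "state": self.state}
        path.write_text(json.dumps(payload))
        return end_offset


def load_snapshot(path: Path, later_records: list) -> dict:
    """Rebuild state from a snapshot plus any records written after it."""
    snapshot = json.loads(path.read_text())
    state = dict(snapshot["state"])
    for key, value in later_records:
        state[key] = value
    return state
```

As with compaction, only the latest value for each key survives in the snapshot, so a node that loads it needs to replay only the records written after the snapshot's end offset.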

Scaling Kafka with KRaft

Two properties determine the number of partitions a Kafka cluster can support: the per-node partition count limit and the cluster-wide partition limit.

KRaft mode is designed to handle a large number of partitions per cluster. However, Kafka’s scalability still depends primarily on adding nodes to get more capacity, so the cluster-wide limit still defines the upper bound of scalability within the system.

In KRaft, the quorum controller reduces the time taken to move critical metadata in a controller failover scenario. The result of this change is a near-instantaneous controller failover. The following image shows the results of a Confluent lab experiment on a Kafka cluster running 2 million partitions, which is 10 times the maximum number of partitions for a cluster running ZooKeeper. The experiment shows that controlled shutdown time and recovery time after uncontrolled shutdown are greatly improved with a quorum controller versus ZooKeeper.

../_images/partition-scale-chart.png

Configure Confluent Platform with KRaft

For details on how to configure Confluent Platform with KRaft, see KRaft Configuration for Confluent Platform.

Client configurations are not impacted by Confluent Platform moving to KRaft to manage metadata.

Migrate from ZooKeeper to KRaft

If you haven’t already migrated to KRaft, see Migrate from ZooKeeper to KRaft on Confluent Platform. You must do this before you upgrade to Confluent Platform 8.0.

Limitations and known issues

  • Combined mode, where a Kafka node acts as a broker and also a KRaft controller, is not currently supported by Confluent. There are key security and feature gaps between combined mode and isolated mode in Confluent Platform.
  • Versions of Confluent Platform older than 7.9 do not support quorum reconfiguration, meaning you cannot add KRaft controllers to your cluster or remove existing ones. Confluent Platform 7.9 adds this feature. For more information, see KIP-853. Three or five controllers are generally recommended for production, and you should have at least three. For more information, see Hardware.
  • You cannot currently use Schema Registry Topic ACL Authorizer for Confluent Platform with Confluent Platform in KRaft mode. As an alternative, you can use Schema Registry ACL Authorizer for Confluent Platform or Configure Role-Based Access Control for Schema Registry in Confluent Platform.
  • Currently, Health+ reports KRaft controllers as brokers, so alerts may not function as expected.