Crack Interview — Spring Boot with Apache Kafka

A cup of JAVA coffee with NeeSri
Jul 5, 2024


Dear Readers,

In this article, I'm going to explain how to implement Apache Kafka in a Spring Boot application.

Apache Kafka is a distributed streaming platform that is used to build real-time data pipelines and streaming applications. It is designed to handle large volumes of data and provides features for publish-subscribe messaging, fault-tolerance, and high-throughput.

Real-Time Example: Online Shopping Platform

Imagine an online shopping platform like Amazon or eBay. Let’s see how Kafka could be used in this scenario.

Use Case 1: Order Processing

  1. Placing an Order:
  • When a customer places an order, the order details (like product ID, quantity, customer info) are sent to a Kafka topic named orders.
  • Here, the system that takes customer orders acts as a producer and sends the order details to the orders topic.

2. Processing the Order:

  • Multiple services need to process this order, such as payment processing, inventory management, and shipping.
  • Each of these services can be a consumer in a consumer group that subscribes to the orders topic.

For example:

  • The payment service reads the order details from the orders topic to process the payment.
  • The inventory service reads the same details to update the stock.
  • The shipping service reads the details to prepare the shipment.
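
As a rough sketch of how this fan-out could look with Spring Kafka (the topic and group names here are illustrative, not from any particular codebase), each service subscribes to the same orders topic under its own consumer group, so every service receives every order:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class OrderListeners {

    // Each listener has its own consumer group, so payment, inventory,
    // and shipping each receive an independent copy of every order.
    @KafkaListener(topics = "orders", groupId = "payment-service")
    public void processPayment(String order) {
        System.out.println("Payment service processing: " + order);
    }

    @KafkaListener(topics = "orders", groupId = "inventory-service")
    public void updateStock(String order) {
        System.out.println("Inventory service updating stock for: " + order);
    }

    @KafkaListener(topics = "orders", groupId = "shipping-service")
    public void prepareShipment(String order) {
        System.out.println("Shipping service preparing: " + order);
    }
}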

Key Concepts

1. Kafka Cluster
2. Kafka Broker
3. Kafka Producer
4. Kafka Consumer
5. Kafka Topic
6. Kafka Partitions
7. Kafka Offsets
8. Kafka Consumer Group

1. Kafka Cluster:

Kafka is a distributed system, so it runs as a cluster. A Kafka cluster consists of a set of brokers; a production cluster typically has at least three brokers so that data can be replicated for fault tolerance.

The following diagram shows a Kafka cluster with three Kafka brokers:

Kafka Cluster

2. Kafka Broker:

A Kafka broker is essentially a Kafka server. The name “broker” makes sense because the Kafka broker acts as a middleman (or agent) between producers and consumers.

  • Producer: Think of a producer as someone who writes a message.
  • Consumer: Think of a consumer as someone who reads a message.
  • Kafka Broker: This is the server that handles the messages. It takes messages from the producer and delivers them to the consumer.

The producer and consumer don’t talk to each other directly. Instead, they use the Kafka broker to send and receive messages. This way, the Kafka broker manages all the communication.

Example with a Simple Diagram

Imagine a producer wants to send a message to a consumer. Here’s how it works:

  1. Producer writes a message and sends it to the Kafka broker.
  2. Kafka Broker receives the message and stores it.
  3. Consumer reads the message from the Kafka broker.

3. Kafka Producer

A producer is an application that sends messages. It does not send messages directly to the recipient; it sends them only to the Kafka server.

The following diagram shows the producer sending messages to the Kafka broker:

4. Kafka Consumer

A Kafka consumer is an application that retrieves messages from the Kafka server.

When producers send data, they send it to the Kafka server, not directly to specific consumers. The Kafka server acts as a central hub that holds the messages.

How it Works

  • Producer: Sends messages to the Kafka server.
  • Kafka Broker: Stores the messages.
  • Consumer: Reads messages from the Kafka server.

Anyone interested in the data can request it from the Kafka server. So, any application that needs data can act as a consumer and request messages from the Kafka server, as long as it has the necessary permissions to read that data.

Example with a Simple Diagram

Let’s consider an example where producers are sending data about user activities to the Kafka server. Consumers, like analytics services or notification services, can read these messages from the Kafka server.

Diagram

+-----------+        +--------------+        +-----------+
| Producer  | -----> | Kafka Broker | <----- | Consumer  |
| (Sends    |        |   (Server)   |        | (Reads    |
| Messages) |        |              |        | Messages) |
+-----------+        +--------------+        +-----------+
                        ^        ^
                        |        |
             +-------------+  +-----------+
             | Consumer    |  | Consumer  |
             | (Analytics) |  | (Alerts)  |
             +-------------+  +-----------+

Explanation

  1. Producer: Sends messages about user activities to the Kafka broker.
  2. Kafka Broker: Stores these messages in the central server.
  3. Consumers:
  • Analytics Service: Reads the messages from the Kafka broker to analyze user behavior.
  • Notification Service: Reads the messages from the Kafka broker to send alerts.

In this setup:

  • Producers send data to the Kafka broker without worrying about who will read it.
  • Consumers request and read messages from the Kafka broker as needed.
  • Multiple consumers can read the same messages, allowing different services to use the data in various ways.

This design makes it easy to manage data flow in a system and to add new consumers without changing the producers.

5. Kafka Topic

When producers send data to the Kafka broker, consumers need a way to identify which data they want to read. This is where Kafka topics come in.

A topic in Kafka is like a table in a database or a folder in a file system. It’s a category or a feed name to which records are sent. Each topic is identified by a unique name. Producers send data to specific topics, and consumers read data from specific topics.

Key Points

  • Topic: A named category where messages are stored.
  • Producers: Send messages to specific topics.
  • Consumers: Request and read messages from specific topics.
  • Multiple Topics: You can create as many topics as you need for different types of data.

The following diagram shows two topics created in a Kafka broker:
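
Besides creating topics from the command line, a Spring Boot application can declare them as beans. Here is a minimal sketch using Spring Kafka's TopicBuilder (the topic name and settings are illustrative); Spring Boot's auto-configured KafkaAdmin creates the topic on startup if it does not already exist:

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class TopicConfig {

    // Declares the "orders" topic with 3 partitions and a single replica
    @Bean
    public NewTopic ordersTopic() {
        return TopicBuilder.name("orders")
                .partitions(3)
                .replicas(1)
                .build();
    }
}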

6. Kafka Partitions

In Kafka, each topic is divided into smaller parts called partitions. Partitions are important for scalability and fault tolerance.

Key Points

  • Partitions: Each topic is split into several partitions, which are ordered, immutable sequences of records (messages).
  • Sequence: Within a partition, records are stored in the exact order they arrive and each record has a unique identifier called an offset.
  • Distributed System: Partitions allow Kafka to distribute data across multiple computers (Kafka brokers) for better storage and management.

Why Partitions are Important

  1. Scalability: By dividing a topic into multiple partitions, Kafka can handle more data and more consumers simultaneously. Different partitions can be stored on different brokers, allowing for horizontal scaling.
  2. Fault Tolerance: If one broker fails, other brokers with copies of the partitions can continue to serve the data, ensuring high availability.

Example with a Simple Diagram

Let’s take a topic called orders and divide it into three partitions. These partitions will be distributed across three Kafka brokers.

Kafka Cluster:
+------------------------------+
| Kafka Broker 1               |
|  +------------------------+  |
|  | Partition 0            |  |
|  |  - Order 1             |  |
|  |  - Order 4             |  |
|  |  - Order 7             |  |
|  +------------------------+  |
|                              |
| Kafka Broker 2               |
|  +------------------------+  |
|  | Partition 1            |  |
|  |  - Order 2             |  |
|  |  - Order 5             |  |
|  |  - Order 8             |  |
|  +------------------------+  |
|                              |
| Kafka Broker 3               |
|  +------------------------+  |
|  | Partition 2            |  |
|  |  - Order 3             |  |
|  |  - Order 6             |  |
|  |  - Order 9             |  |
|  +------------------------+  |
+------------------------------+

Producers:
+-----------+        +--------------+
| Producer  | -----> | Kafka Broker |
+-----------+        +--------------+

Consumers:
+-----------+        +--------------+
| Consumer  | <----- | Kafka Broker |
+-----------+        +--------------+

By using partitions, Kafka efficiently manages large volumes of data, ensuring that the system is scalable and reliable. Each partition acts as a log, storing messages in the order they arrive, and enabling consumers to process data in parallel.

7. Kafka Offsets

In Kafka, an offset is a unique identifier assigned to each message within a partition. Offsets are crucial for maintaining the order and tracking the processing of messages.

Key Points

  • Offset: A sequence of IDs given to messages as they arrive at a partition.
  • Immutable: Once an offset is assigned to a message, it never changes.
  • Sequential: Offsets start from zero for the first message, and each subsequent message receives the next sequential number.

Why Offsets are Important

  1. Message Ordering: Offsets ensure that messages are read in the same order they were written.
  2. Consumer Progress: Consumers use offsets to keep track of which messages they have processed, allowing them to resume from the correct position after a failure or restart.
  3. Efficient Retrieval: Offsets enable efficient retrieval of messages by specifying which message to start from.

Example with a Simple Diagram

Let’s take a topic orders with one partition. Messages are assigned offsets as they arrive.

Diagram

Kafka Partition:
+------------------------+
| Partition 0            |
| +--------------------+ |
| | Offset | Message   | |
| |--------+-----------| |
| |   0    | Order 1   | |
| |   1    | Order 2   | |
| |   2    | Order 3   | |
| |   3    | Order 4   | |
| |   4    | Order 5   | |
| +--------------------+ |
+------------------------+

Producer:
+-----------+        +--------------+
| Producer  | -----> | Kafka Broker |
+-----------+        +--------------+

Consumer:
+-----------+        +--------------+
| Consumer  | <----- | Kafka Broker |
+-----------+        +--------------+

Explanation

  1. Producer:
  • Sends messages (orders) to the Kafka broker.
  • Messages are assigned sequential offsets as they are stored in the partition.

2. Kafka Broker:

  • Stores the messages in the partition with their offsets.
  • The orders partition contains messages with offsets 0, 1, 2, 3, and 4.

3. Consumer:

  • Reads messages from the Kafka broker.
  • Uses offsets to track which messages have been processed.

How Offsets Work

  1. Assigning Offsets:
  • When a new order (message) arrives, it is stored in the partition and assigned the next available offset.
  • Example: The first order gets offset 0, the second order gets offset 1, and so on.

2. Reading Messages:

  • A consumer reads messages from the partition starting from a specific offset.
  • Example: If the consumer has processed messages up to offset 2, it will start reading from offset 3 next.

3. Storing Offsets:

  • Consumers store the offset of the last processed message, usually in a separate storage system (like Kafka’s internal storage or an external database).
  • Example: If a consumer reads up to offset 3, it stores this offset to know where to resume from in case of a restart.

By using offsets, Kafka ensures that messages are processed in the correct order and that consumers can reliably keep track of their progress. This mechanism helps maintain the integrity and consistency of data processing in a distributed system.
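
To make offsets concrete, here is a small sketch (the topic and group names are illustrative) of a listener that receives the full ConsumerRecord, which exposes the partition and offset of each message:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class OffsetAwareConsumer {

    // Prints where each record came from; useful for seeing offsets in action
    @KafkaListener(topics = "orders", groupId = "offset-demo")
    public void listen(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}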

8. Kafka Consumer Group

In Kafka, a consumer group is a logical grouping of one or more consumers that work together to consume and process messages from Kafka topics.

Key Points

  • Consumer Group: A set of consumers that collaboratively consume messages from one or more Kafka topics.
  • Parallel Processing: Consumers within the same group divide the workload by consuming different partitions of a topic concurrently.
  • Load Balancing: Kafka ensures that partitions of a topic are evenly distributed across consumers in a group, allowing for efficient and scalable message processing.

Why Consumer Groups are Important

  1. Scalability: By distributing partitions across multiple consumers, Kafka can handle large volumes of data and support multiple consumers working in parallel.
  2. Fault Tolerance: If one consumer in a group fails, Kafka automatically rebalances partitions among the remaining consumers to ensure uninterrupted processing.
  3. Ordering: Consumers within the same group preserve message order within each partition they consume from, ensuring sequential processing of messages.

Example with a Simple Diagram

Let’s illustrate a consumer group Group-A with three consumers consuming messages from a topic orders with three partitions.

Kafka Topic (orders):
+-------------+-------------+-------------+
| Partition 0 | Partition 1 | Partition 2 |
|-------------|-------------|-------------|
| Consumer-A  | Consumer-B  | Consumer-C  |
+-------------+-------------+-------------+
          Consumer Group: Group-A

Producer:
+-----------+        +--------------+
| Producer  | -----> | Kafka Broker |
+-----------+        +--------------+

Consumers (Consumer Group: Group-A):
+------------+        +--------------+
| Consumer-A | <----- | Kafka Broker |
+------------+        +--------------+

+------------+        +--------------+
| Consumer-B | <----- | Kafka Broker |
+------------+        +--------------+

+------------+        +--------------+
| Consumer-C | <----- | Kafka Broker |
+------------+        +--------------+
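
One way to get this behavior in Spring Kafka is a single listener with its concurrency set to the partition count. This is a sketch (names follow the example above): Spring starts three consumers in group-a, and Kafka assigns one partition to each.

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class GroupAConsumer {

    // concurrency = "3" starts three consumer threads in the same group;
    // with three partitions, each thread is assigned exactly one partition
    @KafkaListener(topics = "orders", groupId = "group-a", concurrency = "3")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }
}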

Integrating Apache Kafka with Spring Boot can enable your applications to produce and consume messages efficiently.

Below is a step-by-step guide on how to integrate Apache Kafka with a Spring Boot application.

Step 1: Set Up Apache Kafka

  1. Download Apache Kafka: Download the latest version of Kafka from the Apache Kafka website.
  2. Extract the Kafka package: Extract the downloaded Kafka package to your desired location.
  3. Start Zookeeper: Kafka requires Zookeeper to manage its cluster metadata. Run the following command to start Zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties

4. Start Kafka Server: Once Zookeeper is up and running, start the Kafka server using:

bin/kafka-server-start.sh config/server.properties

Step 2: Create a Spring Boot Application

  1. Initialize a Spring Boot Project: Use Spring Initializr to create a new Spring Boot project with the following dependencies:
  • Spring Web
  • Spring for Apache Kafka

2. Add Dependencies: Ensure your pom.xml or build.gradle includes the necessary dependencies for Kafka.

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>
</dependencies>

Step 3: Configure Kafka Properties

  1. Create a configuration file: Add application.yml or application.properties with the necessary Kafka configurations.
spring:
  kafka:
    bootstrap-servers: localhost:9092
    consumer:
      group-id: my-group
      auto-offset-reset: earliest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer

Step 4: Create Kafka Producer

  1. Create a Kafka Producer Service: Implement a service to send messages to a Kafka topic.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducer {

    private static final String TOPIC = "my-topic";

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    public void sendMessage(String message) {
        kafkaTemplate.send(TOPIC, message);
    }
}

2. Expose an endpoint to produce messages: Create a REST controller to send messages to the Kafka topic.

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class KafkaController {

    @Autowired
    private KafkaProducer kafkaProducer;

    @GetMapping("/send")
    public String sendMessageToKafka(@RequestParam("message") String message) {
        kafkaProducer.sendMessage(message);
        return "Message sent to Kafka";
    }
}

Step 5: Create Kafka Consumer

  1. Create a Kafka Consumer Service: Implement a service to consume messages from a Kafka topic.
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumer {

    @KafkaListener(topics = "my-topic", groupId = "my-group")
    public void listen(String message) {
        System.out.println("Received message: " + message);
    }
}

Step 6: Run the Application

  1. Start your Spring Boot application: Run the Spring Boot application using your IDE or the command line.
mvn spring-boot:run

2. Send a message: Open a browser or use a tool like curl or Postman to send a message to the Kafka topic.

curl "http://localhost:8080/send?message=Hello, Kafka!"3.

3. Check the consumer logs: Verify that the message is received by the consumer. You should see the message printed in the console where the Spring Boot application is running.

Now let's look at some of the important interview questions.

1. What is Apache Kafka, and what are its primary use cases?

Answer: Apache Kafka is a distributed streaming platform designed to handle large volumes of real-time data. It is used for building real-time data pipelines and streaming applications. Primary use cases include:

  • Real-time data streaming for event sourcing.
  • Building data pipelines for ETL processes.
  • Log aggregation.
  • Real-time monitoring and analytics.

2. How do you integrate Apache Kafka with Spring Boot?

Answer: Integration involves several steps:

  • Adding Kafka dependencies in the Spring Boot project.
  • Configuring Kafka properties in application.properties or application.yml.
  • Implementing Kafka producers and consumers using Spring Kafka annotations like @KafkaListener for consumers.
  • Optionally, creating configuration classes to define producer and consumer factories.

3. What dependencies are required for integrating Kafka with a Spring Boot application?

Answer: You need to include the spring-boot-starter-web and spring-kafka dependencies in your pom.xml or build.gradle.

Maven

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>
</dependencies>

4. How do you configure Kafka properties in a Spring Boot application?

Answer: Kafka properties are configured in the application.properties or application.yml file. Here is an example configuration:

spring:
  kafka:
    bootstrap-servers: localhost:9092
    consumer:
      group-id: my-group
      auto-offset-reset: earliest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer

5. How do you create a Kafka producer in a Spring Boot application?

Answer:

  • Inject KafkaTemplate into a service.
  • Use the KafkaTemplate to send messages to a Kafka topic.

Example:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducer {

    private static final String TOPIC = "my-topic";

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    public void sendMessage(String message) {
        kafkaTemplate.send(TOPIC, message);
    }
}

6. How do you create a Kafka consumer in a Spring Boot application?

Answer:

  • Use the @KafkaListener annotation to create a method that listens to messages from a Kafka topic.

Example:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumer {

    @KafkaListener(topics = "my-topic", groupId = "my-group")
    public void listen(String message) {
        System.out.println("Received message: " + message);
    }
}

7. How can you ensure message ordering in Kafka?

Answer: Message ordering is ensured by Kafka at the partition level. All messages sent to a particular partition are ordered. To guarantee ordering, ensure that messages related to a particular key are sent to the same partition. This can be controlled by setting a partition key when producing messages.
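
For example, here is a minimal sketch (class and topic names are illustrative) that uses the order ID as the message key, so all events for one order land in the same partition and stay in order:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class OrderEventProducer {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public OrderEventProducer(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // The second argument is the message key; Kafka hashes it to pick
    // the partition, so one order's events always share a partition.
    public void publishOrderEvent(String orderId, String event) {
        kafkaTemplate.send("orders", orderId, event);
    }
}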

8. What are the benefits of using Kafka with Spring Boot?

Answer: The benefits include:

  • Simplified configuration and setup.
  • Seamless integration with Spring’s dependency injection and configuration management.
  • Built-in support for creating producers and consumers.
  • Enhanced productivity with annotations like @KafkaListener.

9. How do you handle Kafka serialization and deserialization in Spring Boot?

Answer: Kafka serializers and deserializers convert data to and from byte arrays. Spring Kafka provides out-of-the-box serializers and deserializers for common types like String and ByteArray. For custom types, you can implement your own serializer and deserializer.

Custom Serializer Example (one possible implementation, using Jackson to write JSON):

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serializer;

public class CustomSerializer implements Serializer<CustomObject> {
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public byte[] serialize(String topic, CustomObject data) {
        try {
            // Write the object as JSON bytes
            return objectMapper.writeValueAsBytes(data);
        } catch (Exception e) {
            throw new RuntimeException("Error serializing CustomObject", e);
        }
    }
}

Custom Deserializer Example (again using Jackson, reading the JSON back):

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Deserializer;

public class CustomDeserializer implements Deserializer<CustomObject> {
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public CustomObject deserialize(String topic, byte[] data) {
        try {
            // Read the JSON bytes back into a CustomObject
            return objectMapper.readValue(data, CustomObject.class);
        } catch (Exception e) {
            throw new RuntimeException("Error deserializing CustomObject", e);
        }
    }
}

Configuring Custom Serializer/Deserializer:

spring:
  kafka:
    consumer:
      value-deserializer: com.example.CustomDeserializer
    producer:
      value-serializer: com.example.CustomSerializer

10. How do you ensure fault-tolerance in a Kafka-Spring Boot integration?

Answer:

  • Use Kafka’s built-in replication feature to replicate messages across multiple brokers.
  • Handle exceptions and retries in your consumer logic (see the error-handler sketch after this list).
  • Use Kafka’s consumer group feature to distribute the load among multiple consumers.
  • Implement idempotent message processing to avoid duplicates in case of retries.
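
As a sketch of the retry point (this assumes Spring Kafka 2.8+ and Spring Boot's auto-configuration, which picks up a unique error-handler bean), a DefaultErrorHandler can retry a failed record a few times before giving up:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class ConsumerErrorConfig {

    // Retries each failed record 3 times, 1 second apart, before skipping it
    @Bean
    public DefaultErrorHandler errorHandler() {
        return new DefaultErrorHandler(new FixedBackOff(1000L, 3L));
    }
}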

11. Can a Single Consumer Read from Multiple Topics?

Answer: Yes, a single consumer can subscribe to multiple topics.
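
A quick sketch (topic names are illustrative):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class MultiTopicConsumer {

    // One listener method subscribed to two topics at once
    @KafkaListener(topics = {"orders", "payments"}, groupId = "multi-topic-group")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }
}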

12. How Many Consumer Groups Can a Kafka Topic Have?

Answer: There is no set limit on the number of consumer groups that can read from a Kafka topic. Each consumer group operates independently.

13. How Many Topics Can a Kafka Consumer Handle?

Answer: A Kafka consumer can handle as many topics as necessary, provided the consumer can manage the data load and the network and system resources are sufficient.

14. Can Two Consumers Consume from the Same Partition at the Same Time in Kafka?

Answer: No, within the same consumer group, only one consumer can consume from a partition at a time. Consumers in different consumer groups can read from the same partition independently.

15. What if the Partition is Full?

Answer: Kafka uses a retention policy based on time or size. If a partition is full, older messages are deleted based on the configured retention policy.

16. What Happens if There are Too Many Partitions?

Answer:

  • Increased management overhead.
  • Potential strain on Zookeeper.
  • More file descriptors and memory usage.
  • Higher latency due to increased leader election time and replication.

17. What if Kafka is Full?

Answer: If Kafka runs out of disk space, it stops accepting new messages. Producers will receive exceptions indicating that the broker is unable to handle the request.

18. What is the Maximum Partition per Topic?

Answer: There is no hard limit, but practical limits are determined by the broker’s capability and configuration. Managing very large numbers of partitions can strain resources.

19. What if Kafka Consumer is Down?

Answer: If a Kafka consumer is down, the group coordinator reassigns the partitions to other consumers in the same group. If no consumers are available, the partitions remain unconsumed until a consumer becomes available.

20. Difference Between Partition and Segment?

Answer:

  • Partition: Logical division of a topic, allowing for parallel processing and scalability.
  • Segment: Physical files within a partition, where messages are stored sequentially.

21. Difference Between Kafka Topic and Kafka Cluster?

Answer:

  • Kafka Topic: Logical channel to which messages are sent and from which messages are read.
  • Kafka Cluster: Group of Kafka brokers working together, typically managed with Zookeeper.

22. How to Handle Duplicates in Kafka?

Answer:

  • Idempotent Producers: Use enable.idempotence=true to ensure exactly-once semantics.
  • Deduplication on Consumer Side: Maintain a set of processed message keys.

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("enable.idempotence", "true");
// Key and value serializers are required to construct a producer
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

23. Can Multiple Consumers Read from the Same Partition in Kafka?

Answer: No, within the same consumer group, only one consumer can read from a partition at a time.

24. Can Two Consumers from the Same Consumer Group Read from the Same Partition of a Topic?

Answer: No, Kafka ensures that each partition is consumed by only one consumer within a consumer group.

25. What Happens if There Are More Consumers Than Partitions in Kafka?

Answer: Some consumers will be idle since there are not enough partitions to assign one to each consumer.

26. How Many Partitions Should a Kafka Topic Have?

Answer: It depends on the expected throughput and parallelism. More partitions increase parallelism but also increase overhead.

27. How Does Kafka Consumer Know Which Partition to Consume?

Answer: Kafka’s group coordinator assigns partitions to consumers in a group.

28. Can Kafka Consumer Read from Multiple Clusters?

Answer: Yes, but it requires setting up multiple consumers, each configured for a different cluster.

29. Can Kafka Producer Write to Multiple Partitions?

Answer: Yes, the producer can write to different partitions of a topic based on the partition key.

30. Can Two Producers Write to the Same Partition in Kafka?

Answer: Yes, multiple producers can write to the same partition.

31. Can Kafka Consumer Read the Same Message?

Answer: Yes, if they are in different consumer groups.

32. What Are the Disadvantages of Using Kafka?

Answer:

  • Complexity in setup and maintenance.
  • Latency due to replication and consistency settings.
  • Potential for data loss in certain configurations.

33. What is the Downside of Using Too Many Partitions?

Answer:

  • Increased management overhead.
  • Higher resource consumption (memory, file handles).
  • Potential performance degradation.

34. Use of Multiple Partitions in Kafka?

Answer:

  • Increased parallelism and throughput.
  • Load balancing across consumers.

35. Benefits of Partitioning in Kafka?

Answer:

  • Scalability: More consumers can process data in parallel.
  • Fault tolerance: Partition replicas are distributed across brokers, so data survives a broker failure.

36. Can the Number of Partitions Be Reduced for a Kafka Topic?

Answer: No, Kafka does not support reducing the number of partitions for a topic once it has been created.

37. Do Kafka Messages with the Same Key Go to the Same Partition?

Answer: Yes, messages with the same key are always sent to the same partition.

38. How Are Partitions Assigned to Consumers?

Answer: Kafka uses a partition assignment strategy (e.g., range, round-robin) to assign partitions to consumers.
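
The strategy can be changed through consumer configuration; for instance, this sketch (using the standard Kafka consumer property, passed through Spring Boot's configuration) switches to round-robin assignment:

spring:
  kafka:
    consumer:
      properties:
        "[partition.assignment.strategy]": org.apache.kafka.clients.consumer.RoundRobinAssignor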

Keep Learning

Be Happy :)
