In this episode of The Technical Interview, we break down Kafka — what it is, why it matters, and how it’s used in real-world applications. From its origins at LinkedIn to solving modern data and communication challenges, we’ll explain Kafka in simple terms and show you how producers, consumers, and topics all come together to power real-time systems. Whether you’re prepping for an interview or just curious about one of today’s hottest tech tools, this episode makes Kafka easy to understand.
Welcome to another episode of The Technical Interview, where we make complex tech topics easy to understand.
Today we are going to talk about one of the hottest tech topics out there. No, it's not AI; it's Kafka.
Kafka was first created at LinkedIn to fix their data logging problems. Logging needs to be super fast so it doesn't slow everything else down; if logging itself becomes a bottleneck, it's basically useless. And so Kafka was born.
To put it simply, Kafka stores a list of data. Think of it as a long list of entries, which we call events or messages, ordered by the time they were written. This list is append-only: you can only add to the end of it, and you can't modify or delete existing messages.
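The append-only log idea can be sketched in a few lines of Python. This is a toy in-memory model for illustration only, not real Kafka:

```python
import time

class AppendOnlyLog:
    """Toy model of Kafka's core abstraction: an ordered, append-only list."""

    def __init__(self):
        self._entries = []  # (offset, timestamp, value); never modified in place

    def append(self, value):
        """Add an event to the end of the log and return its offset."""
        offset = len(self._entries)
        self._entries.append((offset, time.time(), value))
        return offset

    def read(self, offset):
        """Events can be read at any offset, but never changed or deleted."""
        return self._entries[offset][2]

log = AppendOnlyLog()
log.append("user signed up")
log.append("user placed order")
print(log.read(0))  # prints "user signed up"
```

Notice there is no `update` or `delete` method; that restriction is what makes the log so fast and simple to replicate.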
The data stored in Kafka is called events. Every time you write to Kafka, you are writing an event. Events are stored within topics, which are a way to organize and categorize them. Let's look at a real-world example: a user creating an account is an event that can be written to a topic called "new users". In other words, a topic stores similar events together. The act of writing an event is called producing, and the act of reading an event and doing something with it is called consuming. So an event is written by a producer and read by a consumer.
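Producers, topics, and consumers fit together roughly like this. Again, this is a minimal in-memory sketch; a real application would use a client library such as kafka-python or confluent-kafka against a running broker:

```python
from collections import defaultdict

class ToyBroker:
    """Minimal model: each topic is its own append-only list of events."""

    def __init__(self):
        self._topics = defaultdict(list)

    def produce(self, topic, event):
        """A producer writes an event to the end of a topic."""
        self._topics[topic].append(event)

    def consume(self, topic, offset=0):
        """A consumer reads events from a topic, starting at an offset."""
        return self._topics[topic][offset:]

broker = ToyBroker()
broker.produce("new-users", {"user_id": 1, "name": "Ada"})
broker.produce("new-users", {"user_id": 2, "name": "Lin"})

# The consumer sees events in the order they were produced.
for event in broker.consume("new-users"):
    print(event["name"])
```

The key design point mirrored here is that producers and consumers never talk to each other directly; the topic sits in between, so either side can be added, removed, or scaled independently.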
The data stored in Kafka can be short-lived or long-lived; it can be retained for minutes, hours, or even decades.
Nowadays Kafka is one of the most popular tools for communicating between two or more applications in real time, letting many applications talk to each other as events happen.
You can also use Kafka to process data stored in traditional systems such as databases. This can be done using Kafka Connect. For example, a Connect source connector can stream data from a database into a Kafka topic, where it can be consumed and processed by a service or application. The processed data can then be written back to the database through Kafka Connect as well.
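A Kafka Connect connector is configured declaratively rather than coded by hand. The sketch below builds an illustrative configuration as a Python dict; the connector class and option names follow Confluent's JDBC source connector, but treat the connection URL, table, and connector name as hypothetical examples and check your connector's documentation for the exact settings:

```python
import json

# Illustrative config for a source connector that streams rows from a
# database table into a Kafka topic. Names and URLs are examples only.
connector = {
    "name": "orders-db-source",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/shop",  # hypothetical DB
        "table.whitelist": "orders",
        "mode": "incrementing",            # stream new rows as they appear
        "incrementing.column.name": "id",
        "topic.prefix": "db-",             # rows land in the topic "db-orders"
    },
}

# In a real deployment, this JSON would be sent to the Connect REST API
# (POST /connectors on the Connect worker) rather than printed.
print(json.dumps(connector, indent=2))
```

The point is that no application code changes: Connect itself does the reading and producing, driven entirely by this configuration.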
Many of you are wondering: why not just use a database? Well, the amazing thing about Kafka is that it can be distributed easily over a large number of machines. Unlike traditional databases, which are hard to scale, Kafka scales easily. Keep in mind that Kafka is not a replacement for databases; it's a way for multiple applications or services to communicate.
Let's look at a real-world example of Kafka.
One great use case for Kafka is an email service that notifies users about events. Take a shopping application as an example. The shopping platform acts as a producer, generating events that are written to Kafka topics; examples of these topics could be "new purchase" or "delivery update". An email service acts as a consumer, reading those events and sending thank-you or delivery-update emails to the customer.
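The shopping example can be sketched end to end. This is an in-memory simulation in Python with made-up names and addresses; in a real system, each service would connect to a Kafka cluster instead of a shared dict:

```python
# Toy simulation: the shop produces purchase events to a topic, and an
# email service consumes them and builds the emails it would send.
topics = {"new-purchase": [], "delivery-update": []}

def shop_checkout(email, item):
    """The shopping platform acts as a producer of purchase events."""
    topics["new-purchase"].append({"email": email, "item": item})

def email_service():
    """The email service acts as a consumer of purchase events."""
    sent = []
    for event in topics["new-purchase"]:
        sent.append(f"To {event['email']}: thanks for buying {event['item']}!")
    return sent

shop_checkout("ada@example.com", "keyboard")
for message in email_service():
    print(message)
```

Because the two sides only share a topic, the shop never waits on the email service: if emails are slow or the service is down, purchases keep flowing, and the consumer catches up later.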
Thanks for listening to this episode of The Technical Interview about Kafka. Please share this podcast with others who are preparing for their interviews or would like to learn more about various technology topics.
Thank you. Till next time.