Kafka: Getting Started

Posted on Thursday, December 22, 2016



In this article I will give a quick explanation of how Kafka works, then install it on an Ubuntu 16.04 server and run a few basic commands to make sure it's working.





What does Kafka do?


First what is Kafka and why would I want it?

From Wikipedia [1]:

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue architected as a distributed transaction log," making it highly valuable for enterprise infrastructures to process streaming data.



OK, fantastic. What does that mean?

A good place to start educating yourself on Kafka is the project's own documentation: http://kafka.apache.org/documentation/ [2]
But before you dive into that, let me go over a simple "How Kafka Works" example.




Records, Topics and Partitions


In Kafka information is stored as a Record.  A Record contains three pieces of information.

1. Value: the stored message (messages are typically small, ~10KB)
2. Key: an optional key that can be associated with a record
3. Timestamp: as of version 0.10.0, records also include a timestamp


Records are written sequentially to a Partition of a Topic.



This image shows the anatomy of a Topic that contains a single Partition.  As records come in they are appended to the end of the partition, forming an immutable sequence of records.  If I add one more record, it will be appended to the end.




This shows the newly added record appended to the end.  When a record is added to a partition it is assigned a sequential id number called the offset.  In this case the newly added record has an offset of 6.
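
Once Kafka is installed (we will do that below), you can peek at these offsets yourself; Kafka ships a small GetOffsetShell tool. This is just a sketch, assuming the install path and topic name used later in this article:


  > /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-run-class.sh \
kafka.tools.GetOffsetShell --broker-list localhost:9092 \
--topic topic-one --time -1


It prints topic:partition:offset, where the offset is the log-end offset (one past the last record written), so something like topic-one:0:7 for the seven records above.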



Lifecycle of a Record?


How long does a Record stay around?  That depends.

In the server.properties file there are a few settings that determine how long a record stays in a partition. 

Name                  Description                                                      Type   Default
log.retention.hours   The number of hours to keep a log file before deleting it;      int    168
                      tertiary to the log.retention.ms property
log.retention.bytes   The maximum size of the log before deleting it                  long   -1 (no size limit)

These are the basic properties that set the retention rules for a record.  The default setting is to remove a record once it is older than 168 hours (7 days).  You can also set a size limit; if you do, once the partition exceeds that size, records are removed from the front of the partition until the total is back under the limit.





For example…



In this Topic with one partition, 6 records have been written.  The first two records were written on day 1 and the rest on day 5.




Eight days later, if we look at the partition we will see that the first two records have been removed.  They were removed based on the log.retention.hours server setting: everything 7 days or older is removed.
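
These broker-wide defaults can also be overridden per topic.  As a rough sketch (run this once Kafka is installed, as covered below; the topic name and the one-day value are just examples), the kafka-configs tool can set retention.ms on a single topic:


  > /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-configs.sh \
--zookeeper localhost:2181 --alter \
--entity-type topics --entity-name topic-one \
--add-config retention.ms=86400000


Here 86400000 ms is 24 hours; a per-topic retention.ms takes precedence over the hour-based broker default.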



Reading from a Topic


When a consumer starts up it subscribes to a Topic.



Although a new consumer can read every message in a topic, it is more typical to subscribe to a topic and wait for new records to be sent to it.





When a new record is added to the Topic, it is sent to all Consumers attached to that Topic/Partition.  In this example the Consumer subscribed to this Topic after Record '4' was added.  No records were sent to the Consumer until the next record, Record '5', was added to the topic.

As long as this consumer is attached all records added to this Topic/Partition will be sent to it.





Multiple Consumers can be attached to the same Topic/Partition.

OK, that covers the basics.  With that in mind I am going to install Kafka on Ubuntu 16.04 and run a few tests.





Installing Kafka on Ubuntu 16.04

 I have a basic Ubuntu 16.04 server installed.

Install Oracle Java 1.8


You need Java installed on the machine, and I prefer installing Oracle's Java over OpenJDK.

Run the following commands to install it.


  > echo oracle-java8-installer \
shared/accepted-oracle-license-v1-1 select true | \
sudo /usr/bin/debconf-set-selections
  > echo \
"deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | \
sudo tee /etc/apt/sources.list.d/webupd8team-java.list
  > echo \
"deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | \
sudo tee -a /etc/apt/sources.list.d/webupd8team-java.list
  > sudo apt-key adv --keyserver \
hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886
  > sudo apt-get update
  > sudo apt-get -y install oracle-java8-installer



Now check the Java version.


  > java -version










Install Zookeeper


Now we need to install ZooKeeper.  I am not a ZooKeeper guy… yet, but it's required for a Kafka install.


  > sudo apt-get install zookeeperd



Test to make sure it's up


  > netstat -ant | grep :2181




This is the result you want.
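
Another quick check, if you have netcat handy, is ZooKeeper's four-letter-word health command; ruok should come back with imok:


  > echo ruok | nc localhost 2181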





Install Kafka


Here is Kafka's download page: https://kafka.apache.org/downloads.html [3]
This is where I found the URL to download.


  > wget \
http://apache.cs.utah.edu/kafka/0.10.1.0/kafka_2.11-0.10.1.0.tgz


Make a directory for Kafka and untar the download into it.



  > sudo mkdir /opt/kafka
  > sudo tar -xvf kafka_2.11-0.10.1.0.tgz -C /opt/kafka





Try it out real quick to make sure it runs.


  > sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh \
/opt/kafka/kafka_2.11-0.10.1.0/config/server.properties




Looks good.

Leave it running and use the Kafka console tools to talk to it.
These tools are located in
/opt/kafka/kafka_2.11-0.10.1.0/bin/
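
As a side note: if you would rather not dedicate a terminal to the broker, kafka-server-start.sh also accepts a -daemon flag that sends it to the background.


  > sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh -daemon \
/opt/kafka/kafka_2.11-0.10.1.0/config/server.properties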






I am going to set up some simple wrapper scripts to make these commands easier to run.



  > sudo vi /bin/kafka-topics


And place the following in it


#!/bin/bash
exec "/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-topics.sh" "$@"


Make it executable


  > sudo chmod 755 /bin/kafka-topics




Let me do the same thing for kafka-console-consumer


  > sudo vi /bin/kafka-console-consumer


And place the following in it


#!/bin/bash
exec "/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-console-consumer.sh" "$@"


Make it executable


  > sudo chmod 755 /bin/kafka-console-consumer




Let me do the same thing for kafka-console-producer


  > sudo vi /bin/kafka-console-producer


And place the following in it


#!/bin/bash
exec "/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-console-producer.sh" "$@"


Make it executable


  > sudo chmod 755 /bin/kafka-console-producer
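
If typing three nearly identical wrapper scripts feels tedious, the same result can be had with one loop.  This is just an alternative sketch, assuming the same install path:


  > for t in kafka-topics kafka-console-consumer kafka-console-producer; do \
      printf '#!/bin/bash\nexec "/opt/kafka/kafka_2.11-0.10.1.0/bin/%s.sh" "$@"\n' "$t" \
        | sudo tee /bin/$t > /dev/null; \
      sudo chmod 755 /bin/$t; \
    done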






Creating a Topic


In Kafka you post messages to topics.  Currently you have no topics set up.  To prove this, run this command.


  > kafka-topics --zookeeper localhost:2181 --list


You should get nothing returned


 


Now create a topic


  > kafka-topics --create \
--zookeeper localhost:2181 \
--replication-factor 1 \
--partitions 1 \
--topic "topic-one"





For this simple example I will not go into multiple partitions or the replication factor.
And now list all topics again.


  > kafka-topics --zookeeper localhost:2181 --list


 

There it is…
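
While we are here, --describe gives a bit more detail on the topic, showing the partition count, replication factor, and which broker leads each partition:


  > kafka-topics --zookeeper localhost:2181 \
--describe --topic topic-one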






Send Message to Topic


In another terminal start a producer


  > kafka-console-producer --broker-list \
localhost:9092 --topic topic-one


Then in yet another terminal start a consumer and listen to the topic


  > kafka-console-consumer --bootstrap-server \
localhost:9092 --topic topic-one


Now on the producer side type in some messages.  Each time you hit return it will send the line you typed.




Messages produced are consumed on the other side




While I am at it let me add another consumer


  > kafka-console-consumer --bootstrap-server \
localhost:9092 --topic topic-one
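
Note that these consumers only see records produced after they attach.  As mentioned in the overview, a new consumer can also read every message already in the topic; the console consumer does this with the --from-beginning flag:


  > kafka-console-consumer --bootstrap-server \
localhost:9092 --topic topic-one --from-beginning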





Also you can feed it an entire file.

Let me create a test file


  > vi /tmp/test.txt


And place the following in it


Line 01 This is line 1
Line 02 each line becomes a record
Line 03 That is how the console-producer works
Line 04 just to show you


Now run this command to feed the test.txt file into the Kafka Topic


  > kafka-console-producer --broker-list \
localhost:9092 --topic topic-one < /tmp/test.txt




Each line of the file becomes a record.  That is the way the console-producer works.




You could just pipe the info


  > cat /tmp/test.txt | kafka-console-producer --broker-list \
localhost:9092 --topic topic-one


Or you can use the producer to tail a file.


  > tail -f -n +1 /tmp/test.txt | kafka-console-producer \
--broker-list localhost:9092 --topic topic-one


Now just append to the /tmp/test.txt file and watch the new line get sent as a message.


  > echo "APPEND ME" >> /tmp/test.txt






There you go: a very basic overview of a very basic Kafka Topic with one partition.

(More to come as I do more research)



References


[1]        Kafka Wikipedia page
Accessed 12/2016
[2]        Kafka documentation page
Accessed 12/2016
[3]        Kafka download page
Accessed 12/2016

