I am still poking around Prometheus. So far I am liking it a lot.

In this article I am going to go over how to get it connected to JMX in Kafka/Zookeeper to start getting metrics out of those systems for monitoring.

I found some of the information for this article at https://blog.rntech.co.uk/2016/10/20/monitoring-apache-kafka-with-prometheus/ [1] and https://www.robustperception.io/monitoring-kafka-with-prometheus/ [2]

Be forewarned, this is going to be a long article as I poke at this tool to get it working the way I want it to.

Prometheus JMX Exporter

The JMX Exporter is located at https://github.com/prometheus/jmx_exporter/ [3] on GitHub. It is a lightweight HTTP server that exposes JMX data as Prometheus-compatible metrics, which Prometheus can then scrape.

First go at it

I have a single Kafka server running. On that server I am going to download and run the JMX exporter

Download the jar file (Link obtained from here https://github.com/prometheus/jmx_exporter/#running )

> cd

> mkdir jmx_exporter

> cd jmx_exporter

> wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.9/jmx_prometheus_javaagent-0.9.jar

Create a configuration file in yaml format

> vi prometheus_kafkfa.yml

And place the following in it. (taken from https://github.com/prometheus/jmx_exporter/#configuration )

---
hostPort: 127.0.0.1:1234
jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:1234/jmxrmi
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
blacklistObjectNames: ["org.apache.cassandra.metrics:type=ColumnFamily,*"]
rules:
  - pattern: "^org.apache.cassandra.metrics<type=(\w+), name=(\w+)><>Value: (\d+)"
    value: $3
    valueFactor: 0.001
    labels: {}
    help: "Cassandra metric $1 $2"
    type: GAUGE
    attrNameSnakeCase: false

Set the -javaagent option in the KAFKA_OPTS

In my kafka setup I am starting the kafka server with a script at /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh so I am going to tweak that and add a KAFKA_OPTS

> sudo vi /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh

And add the following to the top (change the JMX_DIR to your own location)

JMX_DIR="/home/patman/jmx_exporter"

export KAFKA_OPTS="$KAFKA_OPTS -javaagent:$JMX_DIR/jmx_prometheus_javaagent-0.9.jar=1234:$JMX_DIR/prometheus_kafkfa.yml"

Now start up Kafka (this is the command I use)

> sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties

And I get an error.

Looks like I have an issue with my JMX Exporter config file.

Let me fix that, open up the yaml config file again.

> vi ~/jmx_exporter/prometheus_kafkfa.yml

I am going to place the following in it, which is a copy of the example config given here: https://github.com/rama-nallamilli/kafka-prometheus-monitoring/blob/master/prometheus-jmx-exporter/confd/templates/kafka.yml.tmpl [4]

---
hostPort: 127.0.0.1:1234
ssl: false
lowercaseOutputLabelNames: false
lowercaseOutputName: true
rules:
- pattern : kafka.cluster<type=(.+), name=(.+), topic=(.+), partition=(.+)><>Value
  labels:
    topic: "$3"
    partition: "$4"
- pattern : kafka.log<type=Log, name=(.+), topic=(.+), partition=(.+)><>Value
  labels:
    topic: "$2"
    partition: "$3"
- pattern : kafka.controller<type=(.+), name=(.+)><>(Count|Value)
- pattern : kafka.network<type=(.+), name=(.+)><>Value
- pattern : kafka.network<type=(.+), name=(.+)PerSec, request=(.+)><>Count
  labels:
    request: "$3"
- pattern : kafka.network<type=(.+), name=(\w+), networkProcessor=(.+)><>Count
  labels:
    request: "$3"
  type: COUNTER
- pattern : kafka.network<type=(.+), name=(\w+), request=(\w+)><>Count
  labels:
    request: "$3"
- pattern : kafka.network<type=(.+), name=(\w+)><>Count
- pattern : kafka.server<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
  labels:
    topic: "$3"
- pattern : kafka.server<type=(.+), name=(.+)PerSec\w*><>Count
  type: COUNTER
- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>(Count|Value)
  labels:
    clientId: "$3"
    topic: "$4"
    partition: "$5"
- pattern : kafka.server<type=(.+), name=(.+), topic=(.+), partition=(.*)><>(Count|Value)
  labels:
    topic: "$3"
    partition: "$4"
- pattern : kafka.server<type=(.+), name=(.+), topic=(.+)><>(Count|Value)
  labels:
    topic: "$3"
  type: COUNTER
- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>(Count|Value)
  labels:
    clientId: "$3"
    broker: "$4:$5"
- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+)><>(Count|Value)
  labels:
    clientId: "$3"
- pattern : kafka.server<type=(.+), name=(.+)><>(Count|Value)
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
  labels:
    topic: "$4"
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+), partition=(.+)><>Count
  labels:
    topic: "$4"
    partition: "$5"
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+)><>(Count|Value)
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+), (\w+)=(.+)><>(Count|Value)
  labels:
    "$4": "$5"

Now start up Kafka again

> sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties

Now try to curl the /metrics endpoint

> curl localhost:1234/metrics

Now I have lots and lots of data, probably more than I want.

So I am going to tweak the config file again and build it up from the basics to get what I want.

VisualVM

To make my life simpler I am going to get VisualVM talking to the JMX directly so I can see what variables are available.

I am going to edit my kafka start up script to add a few variables to make JMX available on a port.

> sudo vi /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh

And place the following in it.

export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote"

#Should retrieve local IP address

IP_ADDR=`ip route get 8.8.8.8 | awk '{print $NF; exit}'`

export KAFKA_OPTS="$KAFKA_OPTS -Djava.rmi.server.hostname=$IP_ADDR"

export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote.port=9090"

export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote.authenticate=false"

export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote.ssl=false"

Now start up Kafka again

> sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties

Download Visual VM

Head over to https://visualvm.github.io/

And Download

Unzip it and start it up.

Accept the license

Add a JMX Connection.

Enter the IP address and port (in my case the kafka server lives at 192.168.0.140), then click OK.

Now you should have this.

Double Click on it.

Click on the monitor tab and you can now see stuff scrolling by.

Go to Tools -> Plugins

Select the Available Plugins tab and checkbox the VisualVM-MBeans and click Install.

Accept the license and click Install

Finish

Close the connection

Now you should have an MBeans tab

There are your beans :)

Simpler Config File

Let me start very basic and get one piece of data from the JVM and one from Kafka.

Here is a very simple yaml file.

---

lowercaseOutputName: true

But if I run with this it seems I get everything I can possibly get.

How many variables?

Run this quick command to check

> curl -s localhost:1234/metrics | grep -v "^#" | wc -l

In my case it is 2,609 variables. That is far more than I need to start with.
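The same count can be done in a few lines of Python, which is handy if I later want to script the check. This is only a minimal sketch: the SAMPLE text is made up and stands in for the real output of curl -s localhost:1234/metrics, and unlike the grep one-liner it also skips blank lines.

```python
def count_samples(metrics_text):
    """Count sample lines, skipping blank lines and '#' comment lines."""
    return sum(
        1
        for line in metrics_text.splitlines()
        if line.strip() and not line.startswith("#")
    )


# Hypothetical sample standing in for the real /metrics output.
SAMPLE = """\
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 67.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.5
"""

print(count_samples(SAMPLE))  # prints 2
```

To run it against the live exporter, the text could be fetched with urllib.request.urlopen("http://localhost:1234/metrics").read().decode() and passed to count_samples.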

Let me see if I can start simpler by using a whitelist.

Whitelist and rules

I think I found my process_cpu_seconds_total variable here in java.lang.OperatingSystem { processCpuTime }

Let me see if I can narrow it down to that.

Here is my first go at it.

---

lowercaseOutputName: true

whitelistObjectNames: ["java.lang.OperatingSystem:*"]

Restarting Kafka and checking…

> curl -s localhost:1234/metrics | grep -v "^#" | wc -l

OK that took it down a bit. Now I am at 48 variables.

Although looking at the data in java.lang.OperatingSystem…

> curl -s localhost:1234/metrics | grep -v "^#"

I probably want most if not all of this data.

But to learn this tool better I want to try and narrow it down to one variable if at all possible.

Wait something is up if I change the yaml file to this.

---

lowercaseOutputName: true

whitelistObjectNames: ["java2222.lang.OperatingSystem:*"]

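I get the same 48 metrics, so they cannot be coming from my whitelist. Are those default metrics I get no matter what? It looks that way: as far as I can tell the Java agent always exposes a set of default JVM and process metrics, whatever the whitelist says. (Also, a JMX object name is written domain:key=value, so this bean would be java.lang:type=OperatingSystem, not java.lang.OperatingSystem.)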
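One way to see what those 48 metrics are is to group the metric names by their leading prefix. If most of them start with jvm or process rather than kafka, that points at JVM-level defaults and not at whitelisted MBeans. Here is a minimal Python sketch of that idea. The SAMPLE lines are made up for illustration; the real input would be the output of the curl command above.

```python
from collections import Counter


def prefix_counts(metrics_text):
    """Count samples per leading name prefix (the part before the first '_')."""
    counts = Counter()
    for line in metrics_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name.split("_", 1)[0]] += 1
    return counts


# Hypothetical sample lines, not real output from my server.
SAMPLE = """\
# TYPE jvm_threads_current gauge
jvm_threads_current 67.0
jvm_memory_bytes_used{area="heap",} 1.0E8
process_cpu_seconds_total 12.5
kafka_server_brokertopicmetrics_count{name="BytesInPerSec",} 0.0
"""

for prefix, count in sorted(prefix_counts(SAMPLE).items()):
    print(prefix, count)
```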

Maybe, for now, I should focus on just filtering out other MBeans.

Let me try to narrow the kafka data down to this section. This is where it gets tricky if you are not a JMX expert, and I am not a JMX expert. :(

As a test all I want is the Count under kafka.server.BrokerTopic

To do this you need to pick apart four parts: the domain, the type, the name, and the attribute.

The first three parts can be used in the whitelist. The fourth, the attribute (Count in this case), has to be handled with rules.

1. Domain

In this case the domain = kafka.server

2. Type

In this case the type = BrokerTopicMetrics

3. Name

In this case the name = BytesInPerSec
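To make the three parts concrete, here is a small Python sketch that splits a JMX object name of the form domain:key=value,key=value into its domain and its properties. The helper name split_object_name is my own, and the parsing is deliberately naive (it does not handle quoted values or wildcards), so treat it as an illustration rather than a real ObjectName parser.

```python
def split_object_name(object_name):
    """Split 'domain:key=value,key=value' into (domain, {key: value}).

    Naive on purpose: quoted values and wildcards are not handled.
    """
    domain, _, props = object_name.partition(":")
    properties = {}
    for pair in props.split(","):
        key, _, value = pair.partition("=")
        properties[key.strip()] = value.strip()
    return domain, properties


domain, props = split_object_name(
    "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec"
)
print(domain)         # kafka.server
print(props["type"])  # BrokerTopicMetrics
print(props["name"])  # BytesInPerSec
```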

I can use these first three parts and create a whitelistObjectNames.

Here is a simple yaml file using that.

---

lowercaseOutputName: true

whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*"]

If I restart Kafka with this yaml file and run this curl command to grab the kafka data points.

> curl -s localhost:1234/metrics | grep -v "^#" | grep kafka

Here is what I got out

kafka_server_brokertopicmetrics_count{name="BytesInPerSec",} 0.0

kafka_server_brokertopicmetrics_oneminuterate{name="BytesInPerSec",} 0.0

kafka_server_brokertopicmetrics_meanrate{name="BytesInPerSec",} 0.0

kafka_server_brokertopicmetrics_fifteenminuterate{name="BytesInPerSec",} 0.0

kafka_server_brokertopicmetrics_fiveminuterate{name="BytesInPerSec",} 0.0

We get five data points. If you look at VisualVM (which you may need to restart), the five data points come from this section.

For example count => kafka_server_brokertopicmetrics_count{name="BytesInPerSec",} 0.0

How can I narrow it down further? Get it down to just the Count for example.

I need to use rules!

Here is an example of a rule that keeps only the Count. (It uses a regex to capture the type, name, and attribute.)

---

lowercaseOutputName: true

whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*"]

rules:

- pattern: kafka.server<type=(.+), name=(.+)><>(Count)

Here I have one rule, but you can have more. If you have multiple rules they are applied in order, and the first pattern that matches is used. If no pattern matches, the attribute is not collected.
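The first-match behaviour is easier to see in a small simulation. The Python sketch below is not the exporter's code. It assumes, based on my reading of the JMX exporter README, that each attribute is turned into a string like domain<key=value, key=value><>attrName: value and that the patterns are not anchored. It also uses Python's regex engine instead of Java's, which should behave the same for patterns this simple. The second rule is a made-up catch-all that is only there to show ordering.

```python
import re

# The first rule is the one from this article. The second is a hypothetical
# catch-all, added only to show that rule order matters.
RULES = [
    r"kafka.server<type=(.+), name=(.+)><>(Count)",
    r"kafka.server<type=(.+), name=(.+)><>(\w+)",
]


def first_match(rules, candidate):
    """Return (rule_index, captured_groups) for the first matching rule, else None."""
    for index, pattern in enumerate(rules):
        match = re.search(pattern, candidate)  # search = unanchored match
        if match:
            return index, match.groups()
    return None


bean = "kafka.server<type=BrokerTopicMetrics, name=BytesInPerSec><>"
print(first_match(RULES, bean + "Count: 0.0"))         # matched by rule 0
print(first_match(RULES, bean + "MeanRate: 0.0"))      # falls through to rule 1
print(first_match(RULES[:1], bean + "MeanRate: 0.0"))  # no rule matches: None
```

With only the first rule in place, MeanRate matches nothing, which is why the other four attributes drop out of the output.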

If I run with this config file I should get one single kafka data point.

> curl -s localhost:1234/metrics | grep -v "^#" | grep kafka

Here is what I got out

whiteboardcoder_kafka_server_brokertopicmetrics_bytesinpersec_count 0.0

I am still getting the other 48 default jvm data, not sure how to narrow that down, but as for kafka I narrowed it down to the one point!

Counters and gauges

If you look at the output for the kafka data point, you will see that it is set as a gauge.

What if I want it to be a counter? I can edit that in the yaml file, just add a counter type in a rule.

---

lowercaseOutputName: true

whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*"]

rules:

- pattern: kafka.server<type=(.+), name=(.+)><>(Count)

  type: COUNTER

Now it comes out as a counter.
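Note that the curl commands I have been using strip out the lines starting with #, and that is exactly where the type is reported. In the Prometheus text format it is on a line like # TYPE metric_name counter. Here is a minimal Python sketch that pulls those lines out. The SAMPLE text is made up to mirror my single kafka data point, not copied from my server.

```python
def metric_types(metrics_text):
    """Map metric name -> declared type, read from '# TYPE <name> <type>' lines."""
    types = {}
    for line in metrics_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


# Hypothetical sample mirroring the single kafka data point above.
SAMPLE = """\
# HELP whiteboardcoder_kafka_server_brokertopicmetrics_bytesinpersec_count Attribute exposed for management
# TYPE whiteboardcoder_kafka_server_brokertopicmetrics_bytesinpersec_count counter
whiteboardcoder_kafka_server_brokertopicmetrics_bytesinpersec_count 0.0
"""

for name, kind in metric_types(SAMPLE).items():
    print(name, kind)
```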

Multiple filters

Let me try something a little more fancy…

Every one of these has the same attributes.

I am going to use that to get data for Count and FifteenMinuteRate for every Name

Here is my yaml file.

---

lowercaseOutputName: true

whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=*,*"]

rules:

- pattern: kafka.server<type=(.+), name=(.+)><>(Count|FifteenMinuteRate)

  type: GAUGE

I should end up with 16 variables

> curl -s localhost:1234/metrics | grep -v "^#" | grep kafka

And I do! Nice it worked!

A good place to read up on what you should be monitoring is here in the Kafka docs https://kafka.apache.org/documentation/#monitoring [5]

That section lists the metrics they do graphing and alerting on.

I am going to try and create a yaml file to get just this data.

OK this one is close, but it is missing a few things. Here are the things it is missing.

· kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs

· kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs

· kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec

· kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)

The only reason I did not add these is that I cannot see them in VisualVM on my machine, probably because it is not set up to replicate (it is only a single kafka test instance).

With a single topic on the kafka cluster this gives me 123 kafka data points. That is a lot more manageable than what I had before with 2,500+.