Prometheus and JMX

Posted on Saturday, April 22, 2017


I am still poking around Prometheus.   So far I am liking it a lot. 

In this article I am going to go over how to get it connected to JMX in Kafka/Zookeeper to start getting metrics out of those systems for monitoring.



Be forewarned, this is going to be a long article as I poke at this tool to get it working the way I want it to.






Prometheus JMX Exporter


The JMX Exporter is located at https://github.com/prometheus/jmx_exporter/ [3] on GitHub.  It is a lightweight HTTP server that exposes JMX data as Prometheus-compatible metrics that Prometheus can then scrape.





First go at it


I have a single Kafka server running.  On that server I am going to download and run the JMX exporter.

Download the jar file  (Link obtained from here https://github.com/prometheus/jmx_exporter/#running )


  > cd
  > mkdir jmx_exporter
  > cd jmx_exporter
  > wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.9/jmx_prometheus_javaagent-0.9.jar


Create a configuration file in yaml format


  > vi prometheus_kafkfa.yml


And place the following in it.  (taken from https://github.com/prometheus/jmx_exporter/#configuration )


---
hostPort: 127.0.0.1:1234
jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:1234/jmxrmi
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
whitelistObjectNames: ["org.apache.cassandra.metrics:*"]
blacklistObjectNames: ["org.apache.cassandra.metrics:type=ColumnFamily,*"]
rules:
  - pattern: "^org.apache.cassandra.metrics<type=(\w+), name=(\w+)><>Value: (\d+)"
    name: cassandra_$1_$2
    value: $3
    valueFactor: 0.001
    labels: {}
    help: "Cassandra metric $1 $2"
    type: GAUGE
    attrNameSnakeCase: false







Set the -javaagent option in the KAFKA_OPTS

In my Kafka setup I start the Kafka server with the script at /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh, so I am going to tweak that script and add a KAFKA_OPTS setting.


  > sudo vi /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh


And add the following to the top  (change the JMX_DIR to your own location)


JMX_DIR="/home/patman/jmx_exporter"
export KAFKA_OPTS="$KAFKA_OPTS -javaagent:$JMX_DIR/jmx_prometheus_javaagent-0.9.jar=1234:$JMX_DIR/prometheus_kafkfa.yml"
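
The -javaagent value packs three things together: the path to the jar, the port the exporter listens on, and the path to the config file, in the form jar=port:config.  Here is a minimal Python sketch of how that string is assembled.  The helper name build_javaagent_opt is made up for illustration, and the paths are the ones used above.

```python
def build_javaagent_opt(jar_path, port, config_path):
    """Return a -javaagent option in the form <jar>=<port>:<config>."""
    return "-javaagent:{}={}:{}".format(jar_path, port, config_path)


jmx_dir = "/home/patman/jmx_exporter"
opt = build_javaagent_opt(
    jmx_dir + "/jmx_prometheus_javaagent-0.9.jar",
    1234,
    jmx_dir + "/prometheus_kafkfa.yml",
)
print(opt)
```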







Now start up Kafka (this is the command I use)


  > sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties




 

And I get an error.

 



Looks like I have an issue with my JMX Exporter config file.



Let me fix that, open up the yaml config file again.


  > vi ~/jmx_exporter/prometheus_kafkfa.yml


I am going to place the following in it (which is a copy of the example config given here https://github.com/rama-nallamilli/kafka-prometheus-monitoring/blob/master/prometheus-jmx-exporter/confd/templates/kafka.yml.tmpl [4])



---
hostPort: 127.0.0.1:1234
ssl: false
lowercaseOutputLabelNames: false
lowercaseOutputName: true
rules:
- pattern : kafka.cluster<type=(.+), name=(.+), topic=(.+), partition=(.+)><>Value
  name: kafka_cluster_$1_$2
  labels:
    topic: "$3"
    partition: "$4"
- pattern : kafka.log<type=Log, name=(.+), topic=(.+), partition=(.+)><>Value
  name: kafka_log_$1
  labels:
    topic: "$2"
    partition: "$3"
- pattern : kafka.controller<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_controller_$1_$2
- pattern : kafka.network<type=(.+), name=(.+)><>Value
  name: kafka_network_$1_$2
- pattern : kafka.network<type=(.+), name=(.+)PerSec, request=(.+)><>Count
  name: kafka_network_$1_$2_total
  labels:
    request: "$3"
- pattern : kafka.network<type=(.+), name=(\w+), networkProcessor=(.+)><>Count
  name: kafka_network_$1_$2
  labels:
    request: "$3"
  type: COUNTER
- pattern : kafka.network<type=(.+), name=(\w+), request=(\w+)><>Count
  name: kafka_network_$1_$2
  labels:
    request: "$3"
- pattern : kafka.network<type=(.+), name=(\w+)><>Count
  name: kafka_network_$1_$2
- pattern : kafka.server<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
  name: kafka_server_$1_$2_total
  labels:
    topic: "$3"
- pattern : kafka.server<type=(.+), name=(.+)PerSec\w*><>Count
  name: kafka_server_$1_$2_total
  type: COUNTER

- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
    topic: "$4"
    partition: "$5"
- pattern : kafka.server<type=(.+), name=(.+), topic=(.+), partition=(.*)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    topic: "$3"
    partition: "$4"
- pattern : kafka.server<type=(.+), name=(.+), topic=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    topic: "$3"
  type: COUNTER

- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
    broker: "$4:$5"
- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
- pattern : kafka.server<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_server_$1_$2

- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
  name: kafka_$1_$2_$3_total
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
  name: kafka_$1_$2_$3_total
  labels:
    topic: "$4"
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+), partition=(.+)><>Count
  name: kafka_$1_$2_$3_total
  labels:
    topic: "$4"
    partition: "$5"
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_$1_$2_$3_$4
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+), (\w+)=(.+)><>(Count|Value)
  name: kafka_$1_$2_$3_$6
  labels:
    "$4": "$5"


Now start up Kafka again


  > sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties


Now try and curl the /metrics


  > curl localhost:1234/metrics




 

Now I have lots and lots of data, probably more than I want.

So I am going to tweak the config file again and build it up from the basics to get what I want.
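
Before trimming anything, it can help to see which metric families make up the bulk of the output.  Here is a rough Python sketch that groups the sample lines by the first two parts of the metric name.  It assumes the /metrics text has already been saved into a string; the helper name count_by_prefix and the sample lines are made up for illustration.

```python
from collections import Counter


def count_by_prefix(metrics_text, depth=2):
    """Count sample lines by the first `depth` underscore-separated name parts."""
    counts = Counter()
    for line in metrics_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts["_".join(name.split("_")[:depth])] += 1
    return counts


# Illustrative sample only, much shorter than real exporter output.
sample = """# TYPE jvm_threads_current gauge
jvm_threads_current 42.0
kafka_server_brokertopicmetrics_count{name="BytesInPerSec",} 0.0
kafka_server_brokertopicmetrics_meanrate{name="BytesInPerSec",} 0.0
kafka_network_requestmetrics_count{name="RequestsPerSec",request="Produce",} 0.0
"""

for prefix, total in count_by_prefix(sample).most_common():
    print(prefix, total)
```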






Visual VM


To make my life simpler I am going to get VisualVM talking to the JMX directly so I can see what variables are available.

I am going to edit my Kafka startup script to add a few variables that make JMX available on a port.


  > sudo vi /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh


And place the following in it.


export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote"
#Should retrieve local IP address
IP_ADDR=`ip route get 8.8.8.8 | awk '{print $NF; exit}'`
export KAFKA_OPTS="$KAFKA_OPTS -Djava.rmi.server.hostname=$IP_ADDR"
export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote.port=9090"
export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
export KAFKA_OPTS="$KAFKA_OPTS -Dcom.sun.management.jmxremote.ssl=false"
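
With these options the JMX endpoint is reachable at the host and port given.  Some tools want the full JMX service URL instead, which has the same shape as the jmxUrl line in the first config file earlier in this article.  Here is a small Python sketch; jmx_service_url is a hypothetical helper name, and the address is the one my Kafka server uses.

```python
def jmx_service_url(host, port):
    """Build the RMI-based JMX service URL for host:port."""
    return "service:jmx:rmi:///jndi/rmi://{}:{}/jmxrmi".format(host, port)


# 9090 is the jmxremote.port set in the startup script.
print(jmx_service_url("192.168.0.140", 9090))
```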





Now start up Kafka again


  > sudo /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties







Download Visual VM






And Download

Unzip it and start it up.






Accept the license




Add a JMX Connection.








Enter the IP address and port (in my case the Kafka server lives at 192.168.0.140), then click OK.







Now you should have this.
Double Click on it.





Click on the monitor tab and you can now see stuff scrolling by.




 


Go to Tools -> Plugins

 


Select the Available Plugins tab, check the box for VisualVM-MBeans, and click Install.


 



Next



 


Accept the license and click Install


 


Finish






 
Close the connection



 
Now you should have an MBeans tab


 



There are your beans :)




Simpler Config File


Let me start very basic and get one piece of data from the JVM and one from Kafka.

Here is a very simple yaml file.


---
lowercaseOutputName: true


But if I run with this, it seems I get everything I can possibly get.



How many variables?


Run this quick command to check


  > curl -s localhost:1234/metrics | grep -v "^#" | wc -l





In my case it is 2,609 variables.  That is far more than I need to start with.
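
The same count can be done in Python if the output has been saved somewhere.  This is a minimal sketch that assumes the text was already fetched; the helper name count_samples and the sample string are made up, and unlike the grep version it also skips blank lines.

```python
def count_samples(metrics_text):
    """Count lines that are neither blank nor comments (# HELP / # TYPE)."""
    return sum(
        1
        for line in metrics_text.splitlines()
        if line.strip() and not line.startswith("#")
    )


# Illustrative sample only, far shorter than the real output.
sample = """# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 42.0
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.5
"""
print(count_samples(sample))
```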

Let me see if I can start simpler by using a whitelist.






Whitelist and rules





I think I found my process_cpu_seconds_total variable here in java.lang.OperatingSystem { ProcessCpuTime }

Let me see if I can narrow it down to that.



Here is my first go at it.


---
lowercaseOutputName: true
whitelistObjectNames: ["java.lang.OperatingSystem:*"]


Restarting Kafka and checking…


  > curl -s localhost:1234/metrics | grep -v "^#" | wc -l



OK that took it down a bit.  Now I am at 48 variables.



Although looking at the data in java.lang.OperatingSystem…


  > curl -s localhost:1234/metrics | grep -v "^#"




I probably want most if not all of this data.


But to learn this tool better I want to try and narrow it down to one variable if at all possible.


Wait, something is up.  Look what happens if I change the yaml file to this.


---
lowercaseOutputName: true
whitelistObjectNames: ["java2222.lang.OperatingSystem:*"]



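I get the same 48 metrics.  Are those default metrics I get no matter what?  (Most likely, yes.  The real object name of that MBean is java.lang:type=OperatingSystem, so neither whitelist entry above matches anything, and the 48 appear to be JVM metrics the agent exports by default.)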

Maybe, for now, I should focus on just filtering out other MBeans.
Let me try to narrow the Kafka data down to this section.  This is where it gets tricky if you are not a JMX expert, and I am not a JMX expert. :(




As a test all I want is the Count under kafka.server.BrokerTopic


To do this you need to pick apart four parts: the domain, the type, the name, and the attribute.

The first three parts can be used in the white list.

1.      Domain
2.      Type
3.      Name




1.      Domain




In this case the domain = kafka.server




2.      Type



In this case type = BrokerTopicMetrics 



3.      Name



In this case the Name = BytesInPerSec



I can use these first three parts and create a whitelistObjectNames.

Here is a simple yaml file using that.


---
lowercaseOutputName: true
whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*"]
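
The whitelist entry is just those three parts glued together in JMX ObjectName syntax: the domain, a colon, then comma-separated key=value pairs, with a trailing * so any extra keys are allowed.  Here is a sketch in Python; object_name_pattern is a hypothetical helper for illustration, not part of the exporter.

```python
def object_name_pattern(domain, type_, name=None):
    """Compose an ObjectName pattern like domain:type=T,name=N,*"""
    parts = ["type=" + type_]
    if name is not None:
        parts.append("name=" + name)
    parts.append("*")  # wildcard: match any additional key properties
    return domain + ":" + ",".join(parts)


print(object_name_pattern("kafka.server", "BrokerTopicMetrics", "BytesInPerSec"))
```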





Now I restart Kafka with this yaml file and run this curl command to grab the Kafka data points.


  > curl -s localhost:1234/metrics | grep -v "^#" | grep kafka


Here is what I got out

kafka_server_brokertopicmetrics_count{name="BytesInPerSec",} 0.0
kafka_server_brokertopicmetrics_oneminuterate{name="BytesInPerSec",} 0.0
kafka_server_brokertopicmetrics_meanrate{name="BytesInPerSec",} 0.0
kafka_server_brokertopicmetrics_fifteenminuterate{name="BytesInPerSec",} 0.0
kafka_server_brokertopicmetrics_fiveminuterate{name="BytesInPerSec",} 0.0




We get five data points.  Take a look at VisualVM (which you may need to restart).



The five data points are coming from this section.



For example count =>  kafka_server_brokertopicmetrics_count{name="BytesInPerSec",} 0.0
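
Judging from this output, when there are no rules the metric name is built from the domain, the type value, and the attribute, lowercased because of lowercaseOutputName: true, while the name key ends up as a label.  Here is a rough Python sketch of that observed behaviour.  It is my reading of the output above, not the exporter's actual code, and default_metric_line is a made-up helper name.

```python
import re


def default_metric_line(domain, type_value, labels, attribute, value):
    """Mimic the default output format seen above for one attribute."""
    raw = "_".join([domain, type_value, attribute])
    # Replace anything other than letters, digits and underscores with '_'.
    metric = re.sub(r"[^a-zA-Z0-9_]", "_", raw).lower()
    label_text = "".join('{}="{}",'.format(k, v) for k, v in labels.items())
    return "{}{{{}}} {}".format(metric, label_text, value)


print(default_metric_line(
    "kafka.server", "BrokerTopicMetrics", {"name": "BytesInPerSec"}, "Count", 0.0
))
```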


How can I narrow it down further?  Get it down to just the Count for example.

I need to use rules!

Here is an example of a rule that narrows the output down to just the Count.   (It uses regex to grab the type, name, and attribute variables)


---
lowercaseOutputName: true
whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*"]
rules:
  - pattern: kafka.server<type=(.+), name=(.+)><>(Count)
    name: WHITEBOARDCODER_kafka_server_$1_$2_$3




Here I have one rule, you can have more.    If you have multiple rules they are applied in order, the first pattern that matches is used.  If no pattern matches the attribute is not collected.
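
To make the first-match behaviour concrete, here is a small Python simulation.  It assumes each pattern is matched, unanchored, against a string of the form domain<type=..., name=...><>Attribute, which is what the patterns in this article suggest.  The second rule and the apply_rules helper are hypothetical, added only to show the ordering.

```python
import re

# Rules in order; the second one is hypothetical, added to show ordering.
RULES = [
    (r"kafka.server<type=(.+), name=(.+)><>(Count)",
     "WHITEBOARDCODER_kafka_server_$1_$2_$3"),
    (r"kafka.server<type=(.+), name=(.+)><>(\w+)",
     "fallback_kafka_server_$1_$2_$3"),
]


def apply_rules(bean_attribute, rules=RULES, lowercase=True):
    """Return the metric name from the first matching rule, or None."""
    for pattern, name_template in rules:
        match = re.search(pattern, bean_attribute)
        if match is None:
            continue
        # Substitute $1, $2, ... with the captured groups.
        name = re.sub(r"\$(\d+)",
                      lambda ref: match.group(int(ref.group(1))),
                      name_template)
        return name.lower() if lowercase else name
    return None  # no rule matched: the attribute is not collected


bean = "kafka.server<type=BrokerTopicMetrics, name=BytesInPerSec><>"
print(apply_rules(bean + "Count"))           # first rule wins
print(apply_rules(bean + "OneMinuteRate"))   # falls through to the second rule
print(apply_rules("java.lang<type=Memory><>HeapMemoryUsage"))  # no match
```

Both rules would match the Count attribute, but only the first one is used, so the order in the yaml file matters.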

If I run with this config file I should get one single kafka data point.




  > curl -s localhost:1234/metrics | grep -v "^#" | grep kafka


Here is what I got out

whiteboardcoder_kafka_server_brokertopicmetrics_bytesinpersec_count 0.0



 
I am still getting the other 48 default JVM data points and am not sure how to narrow those down, but as for Kafka I narrowed it down to the one point!




Counters and gauges


If you look at the output of the Kafka data point, you will see that it is set as a gauge.



What if I want it to be a counter?   I can edit that in the yaml file, just add a counter type in a rule.


---
lowercaseOutputName: true
whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*"]
rules:
  - pattern: kafka.server<type=(.+), name=(.+)><>(Count)
    name: WHITEBOARDCODER_kafka_server_$1_$2_$3
    type: COUNTER



Now it comes out as a counter.
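
The type shows up in the "# TYPE" comment line that precedes each metric in the /metrics output.  Here is a small Python sketch for pulling those lines out.  The helper name metric_types is made up, and the sample text (including the HELP wording) is illustrative rather than copied from my server.

```python
def metric_types(metrics_text):
    """Map metric name -> declared type from '# TYPE <name> <type>' lines."""
    types = {}
    for line in metrics_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "#" and fields[1] == "TYPE":
            types[fields[2]] = fields[3]
    return types


# Illustrative sample only.
sample = """# HELP whiteboardcoder_kafka_server_brokertopicmetrics_bytesinpersec_count Example help text
# TYPE whiteboardcoder_kafka_server_brokertopicmetrics_bytesinpersec_count counter
whiteboardcoder_kafka_server_brokertopicmetrics_bytesinpersec_count 0.0
"""
print(metric_types(sample))
```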





Multiple filters


Let me try something a little more fancy…



Every one of these has the same attributes.





I am going to use that to get data for Count and FifteenMinuteRate for every Name.


Here is my yaml file.


---
lowercaseOutputName: true
whitelistObjectNames: ["kafka.server:type=BrokerTopicMetrics,name=*,*"]
rules:
  - pattern: kafka.server<type=(.+), name=(.+)><>(Count|FifteenMinuteRate)
    name: WHITEBOARDCODER_kafka_server_$1_$2_$3
    type: GAUGE



I should end up with 16 variables
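
The (Count|FifteenMinuteRate) alternation is what picks two of the five attributes seen earlier.  Here is a quick Python check of that, using the five attribute names from the BytesInPerSec output; selected_attributes is a made-up helper, and the bean string format is the same assumption the rule patterns make.

```python
import re

PATTERN = r"kafka.server<type=(.+), name=(.+)><>(Count|FifteenMinuteRate)"
ATTRIBUTES = ["Count", "OneMinuteRate", "MeanRate",
              "FifteenMinuteRate", "FiveMinuteRate"]


def selected_attributes(name):
    """Return the attributes of one BrokerTopicMetrics bean the rule keeps."""
    kept = []
    for attribute in ATTRIBUTES:
        bean = "kafka.server<type=BrokerTopicMetrics, name={}><>{}".format(
            name, attribute)
        if re.search(PATTERN, bean):
            kept.append(attribute)
    return kept


print(selected_attributes("BytesInPerSec"))
```

That is two attributes per name, so 16 data points would be consistent with 8 different names under BrokerTopicMetrics.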


  > curl -s localhost:1234/metrics | grep -v "^#" | grep kafka



 
And I do!  Nice it worked!




A good place to read up on what you should be monitoring is here in the Kafka docs https://kafka.apache.org/documentation/#monitoring [5]





That section of the docs lists the metrics the Kafka developers do graphing and alerting on.

I am going to try and create a yaml file to get just this data.


OK this one is close, but it is missing a few things.  Here are the things it is missing.

·         kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
·         kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
·         kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
·         kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)

The only reason I did not add these is that I cannot see them in VisualVM on my machine, probably because it is not set up to replicate (it is only a single Kafka test instance).


With a single topic on the kafka cluster this gives me 123 kafka data points.  That is a lot more manageable than what I had before with 2,500+.




References


[1]        Monitoring Apache Kafka with Prometheus
[2]        Monitoring Kafka with Prometheus
[3]        JMX Exporter github page
[4]        JMX Exporter github page  (Example yaml file)
[5]        Kafka Docs Monitoring

