Getting Sensu to send alerts to pagerduty

Posted on Tuesday, November 25, 2014


This guide will go over setting up Sensu to trigger pagerduty alerts (services).

This guide assumes you have a pagerduty account and are familiar with API key service alerts.  If you don't know how to set up pagerduty, I have posted a few guides on the subject at http://www.whiteboardcoder.com/2014/08/pagerduty-getting-started.html and http://www.whiteboardcoder.com/2014/10/pagerduty-api-service-alert.html

It also assumes you have a Sensu Master/client set up.





Another guide I found on this same subject can be found at http://www.pagerduty.com/docs/guides/sensu-integration-guide/ [1]
This guide is quick and simple but a little outdated.  For example, it says Sensu will not resolve the pagerduty incident once triggered; that is no longer true (with the code they are using).





Installing redphone gem


There is a gem called redphone; its github page is at https://github.com/portertech/redphone [2].  It supports pagerduty, pingdom, loggly, and statusPage.  All the API requests are made over SSL.


To install this gem run


    > sudo gem install redphone







To list your installed gems run this command


    > gem list






Before I get too far…


Before I get too far into this I want to use the redphone gem to create a simple ruby script that triggers a pagerduty alert.

Head over to http://www.pagerduty.com/ and login to your account.







I am going to create a new temporary API service to use for this test.  Here is how to go about that.




Click on Services






Click on +Add New Service








Give it a name, select an Escalation Policy.   Set the Integration type to Use our API directly and click Add service.





Copy the Service API Key.  For the purposes of this write-up I will use 999XXXXXXXXXXb1 as my key.

 


A simple ruby Script


OK, I have my API key.  Now I want to write a simple ruby script to trigger the pagerduty alert.  This script does not use Sensu at all, but it does use the redphone gem I installed.




    > vi pagerduty_test.ruby


Here is my code  (change the api_key to your own)


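#!/usr/bin/ruby

# Standalone test: trigger a pagerduty alert through the redphone gem (no Sensu involved)
require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'redphone/pagerduty'

api_key = '999XXXXXXXXXXb1'

response = Redphone::Pagerduty.trigger_incident(
            :service_key => api_key,
            :description => "This is a Test Alert from Patrick"
          )

# The response carries a 'status' field; 'success' means the event was accepted
if response['status'] == 'success'
  puts "pagerduty Alert issued!"
else
  puts "Error issuing pagerduty alert: '#{response}'"
end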




Save it then make the file executable


    > chmod u+x pagerduty_test.ruby



Run the program


    > ./pagerduty_test.ruby





It successfully triggered my alert wahoo!
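
For anyone curious what a trigger call roughly looks like on the wire, here is a small sketch.  It is not the gem's source: it assumes the request goes to pagerduty's generic events API at the URL shown, and build_trigger_payload is a hypothetical helper name.  It only builds and prints the JSON body, it does not send anything.

```ruby
require 'json'

# Assumed endpoint of pagerduty's generic events API; check the pagerduty
# docs for the version your account uses.
EVENTS_URL = 'https://events.pagerduty.com/generic/2010-04-15/create_event.json'

# Hypothetical helper: builds the JSON body for a "trigger" event.
def build_trigger_payload(service_key, description, incident_key = nil)
  body = {
    'service_key' => service_key,
    'event_type'  => 'trigger',
    'description' => description
  }
  # The incident key is optional; pagerduty generates one when it is omitted
  body['incident_key'] = incident_key if incident_key
  JSON.generate(body)
end

payload = build_trigger_payload('999XXXXXXXXXXb1', 'This is a Test Alert from Patrick')
puts "POST #{EVENTS_URL}"
puts payload
```

Printing the payload like this is a cheap way to check what would be sent before spending a real alert on it.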

Now to get to work with Sensu…






Pagerduty Sensu Handler


Now that I have figured out how to use the RedPhone gem I need to figure out Sensu Handlers.

The Sensu doc page for this is located at http://sensuapp.org/docs/0.16/handlers

Looking over this page, there are several different types of handlers: pipe, TCP, UDP, AMQP, and sets.

For my near-term purposes I think I only really need pipe and sets.
Pipe handlers execute a script and pass the event in via STDIN.
Sets are used for grouping handlers.  A set is a way to send the same event to several handlers at the same time.  For example, if you want an event to go to two different pipe handlers, one which sends a message to HipChat and one which sends an email, you can use a set handler.  I am not going to use a set handler in this document, but I thought it worth mentioning.
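
To make the pipe idea concrete, here is a minimal sketch of what a pipe handler does with the event it receives.  The function name summarize_event and the sample event are made up for illustration; the field names (client name, check name, check output) are the ones the pagerduty handler in this post reads.  A StringIO stands in for STDIN so the snippet can be run on its own.

```ruby
require 'json'
require 'stringio'

# Reads one Sensu event (JSON) from the given IO and returns a one-line summary.
def summarize_event(io)
  event  = JSON.parse(io.read)
  client = event['client']['name']
  check  = event['check']['name']
  output = event['check']['output'].to_s.strip
  "#{client}/#{check}: #{output}"
end

# A real pipe handler would call summarize_event($stdin); this sample event
# is a hypothetical stand-in for what Sensu would pipe in.
sample = StringIO.new('{"client":{"name":"sensu-master"},"check":{"name":"check_file","output":"CRITICAL: file missing\n","status":2}}')
puts summarize_event(sample)
```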

I found this github repo https://github.com/sensu/sensu-community-plugins/tree/master/handlers/notification [3] which contains several Sensu Handlers you can just copy and use.  The pagerduty handler can be found at https://github.com/sensu/sensu-community-plugins/blob/master/handlers/notification/pagerduty.rb   [4]


Here is the code (as it was on 11/17/2014)



#!/usr/bin/env ruby
#
# This handler creates and resolves PagerDuty incidents, refreshing
# stale incident details every 30 minutes
#
# Copyright 2011 Sonian, Inc <chefs@sonian.net>
#
# Released under the same terms as Sensu (the MIT license); see LICENSE
# for details.
#
# Dependencies:
#
#   sensu-plugin >= 1.0.0
#

require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'sensu-handler'
require 'redphone/pagerduty'

class Pagerduty < Sensu::Handler

  def incident_key
    source = @event['check']['source'] || @event['client']['name']
    [source, @event['check']['name']].join('/')
  end

  def handle
    if @event['check']['pager_team']
      api_key = settings['pagerduty'][@event['check']['pager_team']]['api_key']
    else
      api_key = settings['pagerduty']['api_key']
    end
    begin
      timeout(10) do
        response = case @event['action']
        when 'create'
          Redphone::Pagerduty.trigger_incident(
            :service_key => api_key,
            :incident_key => incident_key,
            :description => event_summary,
            :details => @event
          )
        when 'resolve'
          Redphone::Pagerduty.resolve_incident(
            :service_key => api_key,
            :incident_key => incident_key
          )
        end
        if response['status'] == 'success'
          puts 'pagerduty -- ' + @event['action'].capitalize + 'd incident -- ' + incident_key
        else
          puts 'pagerduty -- failed to ' + @event['action'] + ' incident -- ' + incident_key
        end
      end
    rescue Timeout::Error
      puts 'pagerduty -- timed out while attempting to ' + @event['action'] + ' a incident -- ' + incident_key
    end
  end

end



For my first test I am going to create a notifications folder and use wget to retrieve the code from github


    > sudo mkdir -p /etc/sensu/handlers/notifications
    > cd /etc/sensu/handlers/notifications/
    > sudo wget https://raw.githubusercontent.com/sensu/sensu-community-plugins/master/handlers/notification/pagerduty.rb


Then make the file executable


    > sudo chmod a+x pagerduty.rb




Sensu still needs a handler definition and a "pagerduty" settings block in its json config.

Create the pagerduty handler


    > sudo mkdir -p /etc/sensu/conf.d/handlers
    > sudo vi /etc/sensu/conf.d/handlers/pagerduty.json


Replace the api_key with your own.


{
  "handlers": {
    "pagerduty": {
      "command": "/etc/sensu/handlers/notifications/pagerduty.rb",
      "type": "pipe",
      "severities": [
        "ok",
        "critical",
        "unknown"
      ]
    }
  },
  "pagerduty": {
     "api_key": "999XXXXXXXXXXb1"
  }
}
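
The severities list in the handler definition above decides which check results reach the handler.  The sketch below shows the idea, assuming the usual mapping of check exit status to severity (0 ok, 1 warning, 2 critical, anything else unknown).  The function names are hypothetical, and resolve events may be treated differently depending on the Sensu version.

```ruby
# Assumed mapping from a check's exit status to a Sensu severity name.
def severity_for(status)
  case status
  when 0 then 'ok'
  when 1 then 'warning'
  when 2 then 'critical'
  else 'unknown'
  end
end

# True when a handler with the given "severities" list would see the event.
def handles?(severities, status)
  severities.include?(severity_for(status))
end

severities = %w[ok critical unknown]
[0, 1, 2, 3].each do |status|
  puts "status #{status} (#{severity_for(status)}): handled=#{handles?(severities, status)}"
end
```

With the list used in this post, a warning (status 1) would not be passed to the pagerduty handler.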





Then I have to add this handler to a check.  In my case I already had a check defined in check_file.json, so I will edit that.



    > sudo vi /etc/sensu/conf.d/check_file.json




{
    "checks": {
        "check_file": {
            "handlers": [
                "default", "hipchat", "pagerduty"
            ],
            "command": "/etc/sensu/plugins/check-file.rb -f /home/patman/test.txt",
            "interval": 60,
            "occurrences": 3,
            "subscribers": [
               "check-from-sensu-master",
               "client-1",
               "client-2",
               "aws-client"
            ]
        }
    }
}


All I did was add the "pagerduty" handler in the handlers section.


Restart the Sensu master and its client with the following command


    > sudo service sensu-server restart && sudo service sensu-api restart && sudo service sensu-client restart





To trigger my check_file check I just need to remove a file from my home directory.



    > rm ~/test.txt


After 3 occurrences the handler triggers.

And it worked!  The pagerduty service was triggered!
I immediately acknowledged the issue.




    > touch  ~/test.txt


Bringing back the file resolves the issue in Sensu and Pagerduty.

That is not what I want Sensu to do.  I do not want Sensu to resolve any pagerduty service alarm, only to trigger them.  So that means I need to tweak some code.

Here is my tweaked code.


#!/usr/bin/env ruby

require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'sensu-handler'
require 'redphone/pagerduty'

class Pagerduty < Sensu::Handler

  def incident_key
    source = @event['check']['source'] || @event['client']['name']
    [source, @event['check']['name']].join('/')
  end

  def handle
    if @event['check']['pager_team']
      api_key = settings['pagerduty'][@event['check']['pager_team']]['api_key']
    else
      api_key = settings['pagerduty']['api_key']
    end
    begin
      timeout(10) do
        if @event['action'] == 'create'
          response = Redphone::Pagerduty.trigger_incident(
                       :service_key => api_key,
                       :incident_key => incident_key,
                       :description => event_summary,
                       :details => @event
                     )
          if response['status'] == 'success'
            puts 'pagerduty -- ' + @event['action'].capitalize + 'd incident -- ' + incident_key
          else
            puts 'pagerduty -- failed to ' + @event['action'] + ' incident -- ' + incident_key
          end
        end
      end
    rescue Timeout::Error
      puts 'pagerduty -- timed out while attempting to ' + @event['action'] + ' a incident -- ' + incident_key
    end
  end
end


This code worked: it triggers a pagerduty alert but does not resolve it.



Playing with the handler for a bit and looking at the logs I saw this message.



{"timestamp":"2014-11-17T19:47:58.930938-0700","level":"info","message":"handler output","handler":{"command":"/etc/sensu/handlers/notifications/pagerduty.rb","type":"pipe","severities":["ok","critical","unknown"],"name":"pagerduty"},"output":"only handling every 30 occurrences: sensu-master/check_file\n"}


Only handling every 30 occurrences.

I found this post https://github.com/sensu/sensu/issues/613 [5] which mentions that this is the expected behavior when you use sensu-handler.  By default the handler only runs again once every 30 minutes after the initial trigger (the initial trigger delay does not count).


I ran a little test.  I triggered a pagerduty alert and acknowledged it.  Then let my sensu alarm run for 30 minutes.  At the 30 minute mark the pagerduty handler triggered again.  All it did was add a "Triggered" event to the current open triggered alert.  So no new alert was created, which is exactly the behavior I was looking for.
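
Here is a rough sketch of that repeat filter.  It is loosely modelled on what sensu-handler appears to do, not copied from it: I am assuming it re-handles every refresh / interval occurrences and that the defaults are occurrences 1, interval 30 and refresh 1800.  The function name handle_occurrence? is hypothetical.

```ruby
# Returns true when the handler would run for the given occurrence count.
# The default values are assumptions about the sensu-handler library.
def handle_occurrence?(count, occurrences: 1, interval: 30, refresh: 1800)
  return false if count < occurrences   # not enough failures yet
  return true  if count == occurrences  # the initial trigger
  every = refresh / interval            # e.g. 1800 / 60 = 30
  every.zero? || (count % every).zero?
end

# My check: interval 60 seconds, occurrences 3, default refresh
handled = (1..65).select { |n| handle_occurrence?(n, occurrences: 3, interval: 60) }
puts handled.inspect
```

With a 60 second interval and the default refresh this gives the "every 30 occurrences" from the log message.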


Next I left the service alert in an acknowledged state and fixed the Sensu check (by creating the file again).  I waited for it to resolve, then removed the file again (to trigger the incident again).  At this point I still had an open, but acknowledged, incident.

The new trigger does not open a new alert; it just adds another "triggered" event to the currently open alert… Not quite what I want… I need to think this through and play with it.

Looking at pagerduty I can see that it has the same incident key… Maybe if I had a different incident key it would issue a different alert?

I have a second check defined in check_second_file.json.


    > sudo vi /etc/sensu/conf.d/check_second_file.json


It works the same way as my last check but looks for a different file.

Restart the Sensu master and its client with the following command


    > sudo service sensu-server restart && sudo service sensu-api restart && sudo service sensu-client restart




If I remove the first file to trigger the alert, acknowledge it, and then trigger the second Sensu alert, do I get two alerts in pagerduty?

Yes I do!  So, all it needs is a unique incident Key!

This actually works exactly like I want it to.  I want each type of Sensu Check to only have one open pagerduty alert at a time.  If an alert is open and it triggers again I want it to be absorbed into the last alert.
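
The key derivation is easy to try out on its own.  This sketch reuses the handler's incident_key logic as a plain function; the two sample events are made up, using the client and check names from this post.

```ruby
# Same derivation as the handler: "<source or client name>/<check name>".
def incident_key(event)
  source = event['check']['source'] || event['client']['name']
  [source, event['check']['name']].join('/')
end

# Hypothetical sample events for the two checks in this post
first  = { 'client' => { 'name' => 'sensu-master' }, 'check' => { 'name' => 'check_file' } }
second = { 'client' => { 'name' => 'sensu-master' }, 'check' => { 'name' => 'check_file_2' } }

puts incident_key(first)
puts incident_key(second)
```

The same check on the same client always produces the same key, so repeat triggers land in the open incident, while a different check gets its own key and its own incident.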

But if you don't want that and you want each one to trigger a new alert you could put a timestamp in the incident key.  The following code does exactly that.


#!/usr/bin/env ruby

require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'sensu-handler'
require 'redphone/pagerduty'

class Pagerduty < Sensu::Handler

  def incident_key
    source = @event['check']['source'] || @event['client']['name']
    # Append the epoch seconds so every trigger gets a unique incident key
    [source, @event['check']['name']].join('/') + Time.now.to_i.to_s
  end

  def handle
    if @event['check']['pager_team']
      api_key = settings['pagerduty'][@event['check']['pager_team']]['api_key']
    else
      api_key = settings['pagerduty']['api_key']
    end
    begin
      timeout(10) do
        if @event['action'] == 'create'
          response = Redphone::Pagerduty.trigger_incident(
                       :service_key => api_key,
                       :incident_key => incident_key,
                       :description => event_summary,
                       :details => @event
                     )
          if response['status'] == 'success'
            puts 'pagerduty -- ' + @event['action'].capitalize + 'd incident -- ' + incident_key
          else
            puts 'pagerduty -- failed to ' + @event['action'] + ' incident -- ' + incident_key
          end
        end
      end
    rescue Timeout::Error
      puts 'pagerduty -- timed out while attempting to ' + @event['action'] + ' a incident -- ' + incident_key
    end
  end
end


One problem with this is that it would trigger a new incident every 30 minutes when the default handler timer goes off.



To fix that you could override the default timer.  Add a "refresh" to your check.



    > sudo vi /etc/sensu/conf.d/check_file.json


And add a refresh variable.


{
    "checks": {
        "check_file": {
            "handlers": [
                "default", "hipchat", "pagerduty"
            ],
            "command": "/etc/sensu/plugins/check-file.rb -f /home/patman/test.txt",
            "interval": 60,
            "occurrences": 3,
            "refresh": 43200,
            "subscribers": [
               "check-from-sensu-master",
               "client-1",
               "client-2",
               "aws-client"
            ]
        }
    }
}


43200 seconds is 12 hours.   The handlers for this check will only re-run every 12 hours.
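
The arithmetic is simple enough to check in a few lines.  This sketch assumes, as the log message earlier suggests, that the handler is re-run every refresh / interval occurrences; occurrences_between_runs is a hypothetical helper name.

```ruby
# How many check runs pass between handler runs, assuming the handler
# library re-handles every (refresh / interval) occurrences.
def occurrences_between_runs(refresh_seconds, interval_seconds)
  refresh_seconds / interval_seconds
end

# Compare the default refresh with the two values considered in this post
[1800, 3600, 43_200].each do |refresh|
  hours = refresh / 3600.0
  puts "refresh=#{refresh}s (#{hours}h) -> every #{occurrences_between_runs(refresh, 60)} occurrences"
end
```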

That may help out.  But for me, I will probably set this to 3600 and not use the epoch timestamp.






Different Pagerduty alerts


What if you want to have different Sensu checks and you want those checks to trigger different pagerduty service alerts?

Luckily someone thought of that when writing the pagerduty.rb code.  You can designate a pager_team in your check and use that pager_team's API alert key.  This makes it easy to have specific pagerduty alerts triggered per Sensu Check.



    > sudo vi /etc/sensu/conf.d/check_file.json


Edit your check and designate a pager_team (you make up the team name)


{
    "checks": {
        "check_file": {
            "handlers": [
                "default", "hipchat", "pagerduty"
            ], 
            "command": "/etc/sensu/plugins/check-file.rb -f /home/patman/test.txt",
            "interval": 60,
            "occurrences": 3,
            "refresh": 3600,
            "pager_team": "alert_1",
            "subscribers": [
               "check-from-sensu-master",
               "client-1",
               "client-2",
               "aws-client"
            ]  
        }  
    }  
}




Edit the pagerduty.json file adding pager_teams


    > sudo vi /etc/sensu/conf.d/handlers/pagerduty.json


Replace the api_keys with your own, and use the team names you made up in your checks.


{
  "handlers": {
    "pagerduty": {
      "command": "/etc/sensu/handlers/notifications/pagerduty.rb",
      "type": "pipe",
      "severities": [
        "ok",
        "critical",
        "unknown"
      ]  
    }  
  }, 
  "pagerduty": {
     "api_key": "999XXXXXXXXXXb1",
     "alert_1": {
        "api_key": "33XXXXXXXXXXXee9"
     },   
     "alert_2": {
        "api_key": "999XXXXXXXXXXb1"
     }  
  }
}


Now I effectively have a default pagerduty alert and two specific alerts that a check can use if the check designates the pager_team.
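
The lookup the handler does against this config can be tried without Sensu.  The sketch below copies the handler's pager_team logic into a plain function (api_key_for is a hypothetical name) and runs it against the settings from this post.

```ruby
require 'json'

# Returns the API key for a check: the pager_team's key when the check names
# a team, otherwise the top-level default key.
def api_key_for(settings, check)
  team = check['pager_team']
  if team
    settings['pagerduty'][team]['api_key']
  else
    settings['pagerduty']['api_key']
  end
end

settings = JSON.parse(<<~JSON_TEXT)
  {
    "pagerduty": {
      "api_key": "999XXXXXXXXXXb1",
      "alert_1": { "api_key": "33XXXXXXXXXXXee9" },
      "alert_2": { "api_key": "999XXXXXXXXXXb1" }
    }
  }
JSON_TEXT

puts api_key_for(settings, { 'pager_team' => 'alert_1' })
puts api_key_for(settings, {})
```

Note that a misspelled team name raises an error here rather than falling back to the default key, and the handler code in this post would behave the same way, so it is worth double-checking the names.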


Restart the Sensu master and its client with the following command


    > sudo service sensu-server restart && sudo service sensu-api restart && sudo service sensu-client restart



Now I remove my file to trigger my Sensu alert


    > rm ~/test.txt


After 3 occurrences the handler triggers.

And it worked!  The pagerduty service from the pager_team 'alert_1' was triggered!  Cool, this is what I need.

Bringing back the file resolves the issue.


    > touch  ~/test.txt




As a test I am going to edit my second Sensu check and have it use a different alert and see how it works.


    > sudo vi /etc/sensu/conf.d/check_second_file.json




{
    "checks": {
        "check_file_2": {
            "handlers": [
                "default", "hipchat", "pagerduty"
            ], 
            "command": "/etc/sensu/plugins/check-file.rb -f /home/patman/test-2.txt",
            "interval": 60,
            "occurrences": 3,
            "refresh": 3600,
            "pager_team" : "alert_2",
            "subscribers": [
               "client-1", 
               "client-2",
               "aws-client"
            ]   
        }  
    }  
}

Restart the Sensu master and its client with the following command


    > sudo service sensu-server restart && sudo service sensu-api restart && sudo service sensu-client restart


Now let me see if I can trigger these two different Sensu alarms and have each one in turn trigger its own pagerduty service alert.


    > rm ~/test.txt



    > rm ~/test-2.txt



Perfect!  It created two different pagerduty alerts like it should!






A few more tweaks…


I like the Description, but it would be nice to be able to append a message to it.





And looking at the Message Itself I see






Which is not really what I want.

So I need to tweak the code a bit.


    > sudo vi /etc/sensu/handlers/notifications/pagerduty.rb



Here is the code I came up with.


#!/usr/bin/env ruby

require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'sensu-handler'
require 'redphone/pagerduty'

class Pagerduty < Sensu::Handler

  def incident_key
    source = @event['check']['source'] || @event['client']['name']
    [source, @event['check']['name']].join('/')
  end

  def handle
    if @event['check']['pager_team']
      api_key = settings['pagerduty'][@event['check']['pager_team']]['api_key']
    else
      api_key = settings['pagerduty']['api_key']
    end


    if @event['check']['pagerduty_desc']
      description = event_summary + ' ' + @event['check']['pagerduty_desc']
    else
      description = event_summary
    end
   

    if @event['check']['playbook']
      details = @event['check']['playbook']
    else
      details = @event
    end

    begin
      timeout(10) do
        if @event['action'] == 'create'
          response = Redphone::Pagerduty.trigger_incident(
                       :service_key => api_key,
                       :incident_key => incident_key,
                       :description => description,
                       :details => details
                     )  
          if response['status'] == 'success'
            puts 'pagerduty -- ' + @event['action'].capitalize + 'd incident -- ' + incident_key
          else
            puts 'pagerduty -- failed to ' + @event['action'] + ' incident -- ' + incident_key
          end
        end
      end
    rescue Timeout::Error
      puts 'pagerduty -- timed out while attempting to ' + @event['action'] + ' a incident -- ' + incident_key
    end
  end
end




Then edit the check json file


    > sudo vi /etc/sensu/conf.d/check_file.json


Edit your check and add the pagerduty_desc and playbook settings


{
    "checks": {
        "check_file": {
            "handlers": [
                "default", "hipchat", "pagerduty"
            ], 
            "command": "/etc/sensu/plugins/check-file.rb -f /home/patman/test.txt",
            "interval": 60,
            "occurrences": 3,
            "refresh": 3600,
            "pager_team": "alert_2",
            "pagerduty_desc": "This is a test for a message https://www.google.com",
            "playbook" : "Go get notes on how to fix this at https://www.yahoo.com",
            "subscribers": [
               "check-from-sensu-master",
               "client-1",
               "client-2",
               "aws-client"
            ]  
        }  
    }  
}

The pagerduty_desc will append to the description.  The playbook will replace the message sent to pagerduty. (I am using playbook because I have seen it used by other sensu handlers, I am not sure if it’s in standard use or not?)

If they are not present in the check the normal description and message will be sent.
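
That fallback can be checked without sending anything to pagerduty.  The sketch below pulls the two decisions out of the handler into plain functions; build_description and build_details are hypothetical names, and the summary string is just a stand-in for whatever event_summary would return.

```ruby
# Appends pagerduty_desc to the summary when the check defines it.
def build_description(check, summary)
  extra = check['pagerduty_desc']
  extra ? "#{summary} #{extra}" : summary
end

# Sends the playbook text when the check defines it, else the whole event.
def build_details(event)
  event['check']['playbook'] || event
end

event = {
  'check' => {
    'name' => 'check_file',
    'pagerduty_desc' => 'This is a test for a message https://www.google.com',
    'playbook' => 'Go get notes on how to fix this at https://www.yahoo.com'
  }
}
summary = 'sensu-master/check_file : CRITICAL' # stand-in for event_summary

puts build_description(event['check'], summary)
puts build_details(event)
```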


Restart the Sensu master and its client with the following command


    > sudo service sensu-server restart && sudo service sensu-api restart && sudo service sensu-client restart


Trigger an alert


    > rm ~/test.txt






In pagerduty the message is appended and the URL is clickable.




Looking at the incident details the Message shows up just fine and the URL is clickable.


Perfect!  That is exactly what I wanted.


I think that is enough for this write up, hope it helps someone.





References
[1]        How To Integrate Sensu with PagerDuty
                        http://www.pagerduty.com/docs/guides/sensu-integration-guide/
                Accessed 11/2014
[2]        Redphone, the monitoring service ruby library github page
                        https://github.com/portertech/redphone
                Accessed 11/2014
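[3]        sensu-community-plugins
                        https://github.com/sensu/sensu-community-plugins/tree/master/handlers/notification
                Accessed 11/2014
[4]        sensu-community-plugins pagerduty.rb
                        https://github.com/sensu/sensu-community-plugins/blob/master/handlers/notification/pagerduty.rb
                Accessed 11/2014
[5]        bugs in multiple handlers #613
                        https://github.com/sensu/sensu/issues/613
                Accessed 11/2014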




This post is a part of an epic, the pagerduty and twilio epic.

Epic Goal:   Set up a phone number, via twilio, that when called will set off a pagerduty event.


It's also part of my general Sensu Epic



This post is a part of an epic, the Sensu Epic.


Epic Goal:   My goal is to figure out how to use Sensu to moni
