Monitoring Cisco IP SLAs: syslog messages

by gorthx

Now that I have an SLA in place, and have some baseline data, I want some notifications in case my SLAs drop. I’ll cover basic syslog messages here, and SNMP traps in another post.

Cisco’s docs have a neat graph that shows how the reaction-config interprets the thresholds. There’s also a handy-dandy chart that tells us which reactions are available for each type of SLA.

I configured a udp-jitter SLA, so everything in the ‘UDP Jitter’ column that’s marked with a Y is something I can configure a reaction for. I’ll check RTT [1], jitter, MOS, and timeout for starters. Initially, I tried this out on just a few routers, with some numbers very close to my collected stats (see last week’s graphs), so I could make sure it was working. Here, I have adjusted them to some more realistic numbers; YMMV.

config t
!! send trap if we get no response from the other end
!! note there's no threshold-value, we don't need one for a timeout.  It either did or did not happen.
ip sla reaction-configuration 10 react timeout threshold-type immediate action-type trapOnly

!! send trap if rtt goes above 130;  send trap when it drops back below 110
!! cisco recs 130ms delay as acceptable for voice
ip sla reaction-configuration 10 react rtt threshold-value 130 110 threshold-type immediate action-type trapOnly

!! send trap if overall jitter goes above 2;  send trap when it drops back below 1
ip sla reaction-configuration 10 react jitterAvg threshold-value 2 1 threshold-type immediate action-type trapOnly 
!! same, src to dst
ip sla reaction-configuration 10 react jitterSDAvg threshold-value 2 1 threshold-type immediate action-type trapOnly 
!! same, dst to src
ip sla reaction-configuration 10 react jitterDSAvg threshold-value 2 1 threshold-type immediate action-type trapOnly

!! send trap when MOS goes above 400;  send trap if it drops below 380
ip sla reaction-configuration 10 react mos threshold-value 400 380 threshold-type immediate action-type trapOnly

!! send the traps to the log server
!! this is the most important piece :)
ip sla logging traps

Notes:
– the order of threshold-value, threshold-type, and action-type is not important; IOS will re-order it into the format shown above.
– you can delete single lines with ‘no ip sla reaction-configuration [id] react [reaction]’
– you can delete all reactions with ‘no ip sla reaction-configuration [id]’
– you can only set a falling threshold (the second value) that is less than or equal to the rising threshold (the first value).
– since MOS is usually expressed in a decimal value 1-5, multiply by 100 to get an integer value for the threshold
– another point about the MOS thresholds: the context-sensitive help says values between 1-60000 are valid, but if you try to enter anything other than 100-500, you get an error.
– the syntax for ‘ip sla logging traps’ is counter-intuitive to me. It’s a lot like ‘snmp-server enable traps syslog’, which reads like it would send SNMP traps to syslog but actually does the opposite: your syslog messages get sent to your SNMP trap host as traps.
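To make the threshold rules in these notes concrete, here is a minimal Python sketch that turns a decimal MOS into the integer IOS wants and builds the reaction line. The helper names (mos_to_threshold, reaction_line) are my own invention for illustration, not anything Cisco ships; the checks only encode the two rules noted above (MOS must land in 100-500, and the falling value can't exceed the rising one).

```python
def mos_to_threshold(mos: float) -> int:
    """Convert a decimal MOS (1.0-5.0) to the integer form IOS accepts (100-500)."""
    value = round(mos * 100)
    if not 100 <= value <= 500:
        raise ValueError(f"MOS threshold {value} is outside 100-500")
    return value


def reaction_line(sla_id: int, reaction: str, rising: int, falling: int) -> str:
    """Build a reaction-configuration line in the order IOS stores it.

    Hypothetical helper: it only covers the 'immediate' / 'trapOnly'
    combination used in this post.
    """
    if falling > rising:
        raise ValueError("falling threshold must be <= rising threshold")
    return (
        f"ip sla reaction-configuration {sla_id} react {reaction} "
        f"threshold-value {rising} {falling} "
        f"threshold-type immediate action-type trapOnly"
    )


if __name__ == "__main__":
    # MOS 4.0 rising, 3.8 falling, for SLA 10
    print(reaction_line(10, "mos", mos_to_threshold(4.0), mos_to_threshold(3.8)))
```

Something like this is mostly useful if you are generating the same reactions for a lot of routers and want the sanity checks to happen before the config hits the device.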

You can use a couple of commands to check your work:
sh run | section reaction
or
sh ip sla reaction-configuration [sla id]

The latter is in a format that’s a bit easier to read than the long-string-of-text you get from looking at the config; here’s a piece of it to give you an idea:

Entry number: 10
Index: 1
Reaction: timeout
Threshold Type: Immediate
Threshold CountX: 5
Threshold CountY: 5
Action Type: Trap only

With this configuration, we’ll get syslog messages that look like this:

Mar 22 09:29:12.979 MDT: %RTT-3-IPSLATHRESHOLD: IP SLAs(10): Threshold exceeded for jitterDSAvg
Mar 22 09:30:12.963 MDT: %RTT-3-IPSLATHRESHOLD: IP SLAs(10): Threshold below for jitterDSAvg
Mar 22 12:33:47.054 MST: %RTT-3-IPSLATHRESHOLD: IP SLAs(10): Threshold exceeded for rtt
Mar 22 12:35:17.057 MST: %RTT-3-IPSLATHRESHOLD: IP SLAs(10): Threshold below for rtt
Mar 28 08:51:07.048 MDT: %RTT-3-IPSLATHRESHOLD: IP SLAs(10): Threshold below for mos
Mar 29 15:40:01.565 MDT: %RTT-3-IPSLATHRESHOLD: IP SLAs(10): Threshold exceeded for mos

Thrilling, huh.

During the first week of data-gathering (read: sitting around waiting for something to happen & kick off a log message), I discovered that the timeout reaction isn’t necessary. With ‘ip sla logging traps’ enabled, you’ll get messages like this in case of a failure:

Mar 29 09:56:39.404 EDT: %RTT-4-OPER_CLOSS: condition occurred, entry number = 10

According to the docs, “This message displays connection loss conditions in the IP Service Level Agreement (IP SLA) operations. This message is enabled when the ip sla monitor logging trap command is entered.” So, one less thing to configure.

Problems I have with this system:
1. My user-friendly SLA tag isn’t included in the log message, so I need to remember which SLA “10” refers to.
2. The current rtt, jitter, and MOS values aren’t included in the message. That would be really useful.
3. Speaking of MOS, those last two are kind of interesting. MOS being below threshold is bad, whereas for RTT/jitter that same message would mean the situation is recovering, so I need to remember that MOS messages are the reverse of the jitter & RTT notices.
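Problems 1 and 3 can be papered over on the log-server side. Below is a rough Python sketch, assuming the message format shown in the samples above. The SLA_TAGS mapping and its tag name are made up for illustration, and the "below is bad for MOS" rule is just my reading of the messages as described in problem 3. It can't do anything about problem 2, since the values simply aren't in the message.

```python
import re

# Hypothetical lookup table: SLA entry number -> user-friendly tag.
# The tag name is an invented placeholder.
SLA_TAGS = {10: "example-udp-jitter"}

# Matches the threshold messages shown earlier, e.g.
# "%RTT-3-IPSLATHRESHOLD: IP SLAs(10): Threshold exceeded for rtt"
THRESHOLD_RE = re.compile(
    r"%RTT-3-IPSLATHRESHOLD: IP SLAs\((?P<sla>\d+)\): "
    r"Threshold (?P<state>exceeded|below) for (?P<reaction>\w+)"
)

# Reactions where a higher value is the healthy direction (assumption: MOS only).
HIGHER_IS_BETTER = {"mos"}


def interpret(line):
    """Return a dict describing a threshold message, or None if it doesn't match."""
    match = THRESHOLD_RE.search(line)
    if match is None:
        return None
    sla = int(match.group("sla"))
    reaction = match.group("reaction")
    exceeded = match.group("state") == "exceeded"
    # For RTT/jitter "exceeded" is the problem; for MOS it's the reverse.
    problem = (not exceeded) if reaction in HIGHER_IS_BETTER else exceeded
    return {
        "sla": sla,
        "tag": SLA_TAGS.get(sla, "unknown"),
        "reaction": reaction,
        "status": "PROBLEM" if problem else "RECOVERED",
    }


if __name__ == "__main__":
    sample = ("Mar 28 08:51:07.048 MDT: %RTT-3-IPSLATHRESHOLD: "
              "IP SLAs(10): Threshold below for mos")
    print(interpret(sample))
```

Of course, this means maintaining the tag mapping somewhere outside the routers, which is its own management headache.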

Verdict: these messages are not very useful. This whole thing feels sort of thrown-together, frankly. We could use the Cisco EEM to give us more descriptive messages, but a) it would be a management nightmare in a large-scale deployment and b) it’s a bit hit-or-miss whether we’d get a message about any given event anyway. So, I’m going to see what I can get with SNMP traps. It looks like an actual trap may have more information.


1 – If I’m to believe the chart, RTT may actually refer to RTTAvg. Seems to me like you’d get a more immediate reaction if you alerted on RTT (e.g. the “Latest RTT” value from “sh ip sla statistics”) instead of the average, but ok.
