We’ve been playing around with IP SLAs at work lately (here’s a quick overview of SLAs and what they can do for you). Right now we’re mainly interested in monitoring VoIP services, so we set up an SLA to monitor jitter.
First, we need to configure a responder. This is the machine that will field all of the SLA requests. For our tests, we used a tiny lab router (a stock 2921). We wanted to find out how much extra load SLAs would add, so we picked something small to maximize the opportunities for breakage.
This part is simple:
config t
ip sla responder
Then we set up probes on other routers, at several branch offices:
config t
 !! give it a meaningful ID, if you can
 ip sla 10
 !! measure jitter to the responder, on an arbitrarily chosen port, using an appropriate voice codec
 udp-jitter 10.10.10.1 3000 codec g711ulaw
 !! give it a meaningful name, if you like
 tag backwater-office-VoiceTest
 !! Type of Service 184 translates to DSCP EF (expedited forwarding), good for voice/video
 tos 184
 !! make sure the SLA packets aren't getting corrupted
 verify-data
 !! back out to global config for the scheduler
 exit
 !! schedule it to start now and never expire
 ip sla schedule 10 life forever start-time now
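The `tos 184` line is just arithmetic on the IP header: per RFC 2474, the DSCP occupies the upper six bits of the old ToS byte, and the lower two bits are now used for ECN. A quick Python sketch shows the conversion both ways (the helper names and the small DSCP name table are my own, not anything IOS provides):

```python
# ToS byte layout (RFC 2474 / RFC 3168): upper 6 bits = DSCP, lower 2 bits = ECN.
# Hypothetical helpers, just to show where "tos 184" comes from.

# A few well-known DSCP code points, for labelling only.
DSCP_NAMES = {0: "CS0", 26: "AF31", 34: "AF41", 40: "CS5", 46: "EF"}


def tos_to_dscp(tos: int) -> int:
    """Drop the two ECN bits to get the 6-bit DSCP value."""
    if not 0 <= tos <= 255:
        raise ValueError("ToS must fit in one byte")
    return tos >> 2


def dscp_to_tos(dscp: int) -> int:
    """Shift a 6-bit DSCP into ToS position, leaving the ECN bits at zero."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits")
    return dscp << 2


if __name__ == "__main__":
    dscp = tos_to_dscp(184)
    print(dscp, DSCP_NAMES.get(dscp, "unknown"))
```

By the same arithmetic, marking the probes AF41 instead would mean `tos 136` (34 << 2).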
Verify your work with:
sh run | section sla
– no frequency is specified, so it will run at the default interval of 60 seconds
– you can’t update the probes in place. You have to delete them (e.g. with “no ip sla 10”) and start over; this includes reconfiguring the scheduler, which seems odd because it looks like a global config parameter. :shrug:
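Since every change means delete, recreate and reschedule, it can help to generate that config text rather than retype it at 30 sites. Here's a minimal Python sketch under some assumptions: `JitterProbe` and `rebuild_config` are hypothetical names, the emitted commands are only the ones used above plus `frequency` (optional; leaving it out keeps the 60-second default), and pushing the text to the router is left to whatever tooling you already use:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JitterProbe:
    """Hypothetical record describing one udp-jitter probe, mirroring the config above."""
    sla_id: int
    target: str
    port: int
    tag: str
    codec: str = "g711ulaw"
    tos: int = 184
    frequency: Optional[int] = None  # None leaves the IOS default of 60 seconds


def rebuild_config(p: JitterProbe) -> str:
    """Return IOS config text that deletes the probe and recreates it, schedule included."""
    lines: List[str] = [
        "config t",
        # Deleting the operation also drops its schedule, so both get re-entered below.
        f"no ip sla {p.sla_id}",
        f"ip sla {p.sla_id}",
        f" udp-jitter {p.target} {p.port} codec {p.codec}",
        f" tag {p.tag}",
        f" tos {p.tos}",
        " verify-data",
    ]
    if p.frequency is not None:
        lines.append(f" frequency {p.frequency}")
    lines += [
        " exit",  # back to global config before scheduling
        f"ip sla schedule {p.sla_id} life forever start-time now",
        "end",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(rebuild_config(JitterProbe(10, "10.10.10.1", 3000, "backwater-office-VoiceTest")))
```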
We can check out stats from our SLA like so:
PROBE#sh ip sla statistics 10
Round Trip Time (RTT) for Index 10
        Type of operation: jitter
        Latest RTT: 39 ms
Latest operation start time: 12:16:32.012 EDT Mon Mar 12 2012
Latest operation return code: OK
RTT Values:
        Number Of RTT: 999
        RTT Min/Avg/Max: 39/39/59 milliseconds
Latency One-Way Time:
        Number of Latency one-way Samples: 999
        Source to Destination Latency one way Min/Avg/Max: 23/24/27 milliseconds
        Destination to Source Latency one way Min/Avg/Max: 15/15/35 milliseconds
Jitter Time:
        Number of SD Jitter Samples: 997
        Number of DS Jitter Samples: 997
        Source to Destination Jitter Min/Avg/Max: 0/1/3 milliseconds
        Destination to Source Jitter Min/Avg/Max: 0/1/20 milliseconds
Packet Loss Values:
        Loss Source to Destination: 0
        Loss Destination to Source: 1
        Out Of Sequence: 0      Tail Drop: 0
        Packet Late Arrival: 0  Packet Skipped: 0
Voice Score Values
        Calculated Planning Impairment Factor (ICPIF): 1
        Mean Opinion Score (MOS): 4.34
Number of successes: 47
Number of failures: 0
Operation time to live: Forever
– the wonderful user-friendly tag I put on my SLA isn’t shown in this output, so I’ll need to make the IDs meaningful in some other way (e.g. a VLAN ID, a subnet, whatever).
– the pieces I’m interested in are Latest RTT (compared against RTT Min/Avg/Max), Jitter Min/Avg/Max, MOS, and Latest operation return code (OK or Failed). It’s also interesting to compare the destination-to-source numbers against the source-to-destination ones.
– most of the rest of the output describes the raw samples behind those numbers.
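If you want to pull just those fields out of the output (say, to feed a graphing tool), a few regular expressions will do. This is a minimal Python sketch, assuming the field labels look exactly like the output above and that the Min/Avg/Max values are whole milliseconds; `parse_sla_stats` is a hypothetical helper, and since the tag isn't in the output you'd key the results by SLA ID yourself:

```python
import re
from typing import Dict, Tuple

# Excerpt of the "sh ip sla statistics 10" output above (only the lines we parse).
SAMPLE = """\
Latest RTT: 39 ms
Latest operation return code: OK
RTT Min/Avg/Max: 39/39/59 milliseconds
Source to Destination Jitter Min/Avg/Max: 0/1/3 milliseconds
Destination to Source Jitter Min/Avg/Max: 0/1/20 milliseconds
Mean Opinion Score (MOS): 4.34
"""


def _min_avg_max(label: str, text: str) -> Tuple[int, int, int]:
    """Find 'label: a/b/c' and return (a, b, c) as integers."""
    m = re.search(re.escape(label) + r":\s*(\d+)/(\d+)/(\d+)", text)
    if m is None:
        raise ValueError(f"field not found: {label}")
    lo, avg, hi = (int(g) for g in m.groups())
    return lo, avg, hi


def _single(pattern: str, text: str) -> str:
    """Return the first capture group of pattern, or fail loudly."""
    m = re.search(pattern, text)
    if m is None:
        raise ValueError(f"pattern not found: {pattern}")
    return m.group(1)


def parse_sla_stats(text: str) -> Dict[str, object]:
    """Extract the fields worth watching from one probe's statistics."""
    return {
        "return_code": _single(r"Latest operation return code:\s*(\S+)", text),
        "latest_rtt_ms": int(_single(r"Latest RTT:\s*(\d+)\s*ms", text)),
        "rtt_ms": _min_avg_max("RTT Min/Avg/Max", text),
        "sd_jitter_ms": _min_avg_max("Source to Destination Jitter Min/Avg/Max", text),
        "ds_jitter_ms": _min_avg_max("Destination to Source Jitter Min/Avg/Max", text),
        "mos": float(_single(r"Mean Opinion Score \(MOS\):\s*([\d.]+)", text)),
    }


if __name__ == "__main__":
    # Key by the numeric SLA ID, since the tag isn't available.
    print({10: parse_sla_stats(SAMPLE)})
```

If a probe is failing, some of these fields may not appear in the form expected here, so the sketch raises rather than guessing.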
Back to our responder, now that I’ve configured about 30 probes:
responder#sh proc cpu | include SLA
  23   873372000  27581444    31  3.91%  3.14%  3.09%   0 IP SLAs Responde
Let’s have a look at CPU over time:
Basically, it sat idle for a very long time (hence the low average usage) and is now holding steady at between 5 and 10%. So, make sure you keep track of the health of your responders!
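One low-effort way to keep track is to scrape that `sh proc cpu | include SLA` line and alert on it. A minimal Python sketch, assuming you already have the line as text (collecting it over SSH or SNMP isn't shown) and that it follows the usual IOS column order of PID, Runtime(ms), Invoked, uSecs, 5Sec, 1Min, 5Min, TTY, Process; `parse_proc_cpu_line`, `over_threshold` and the 10% limit are all hypothetical placeholders:

```python
import re
from typing import NamedTuple


class ProcCpu(NamedTuple):
    """One row of 'show processes cpu' output."""
    pid: int
    runtime_ms: int
    invoked: int
    usecs: int
    five_sec: float
    one_min: float
    five_min: float
    tty: int
    process: str


# PID, Runtime(ms), Invoked, uSecs, three percentages, TTY, then the (possibly truncated) name.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(.+?)\s*$"
)


def parse_proc_cpu_line(line: str) -> ProcCpu:
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    g = m.groups()
    return ProcCpu(int(g[0]), int(g[1]), int(g[2]), int(g[3]),
                   float(g[4]), float(g[5]), float(g[6]), int(g[7]), g[8])


def over_threshold(p: ProcCpu, limit_pct: float = 10.0) -> bool:
    """Flag the process if its 5-minute average exceeds an arbitrary limit."""
    return p.five_min > limit_pct


if __name__ == "__main__":
    row = parse_proc_cpu_line(
        "  23   873372000  27581444    31  3.91%  3.14%  3.09%   0 IP SLAs Responde")
    print(row.process, row.five_min, over_threshold(row))
```

Using the 5-minute average rather than the 5-second one is a deliberate choice here, so a single busy burst from 30 probes firing at once doesn't trip the alert.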
Stay tuned for Part 2, in which I set up some quick-and-dirty monitoring of the SLAs with rrdtool.