
monitoring services, the mom 2005 agent way…

this topic came up because of an email i received over the weekend about some monitoring i set up.  the user basically said … hey, this is a great alert!  now how do i tell when the service comes back up?

always wanting more.  never satisfied.  anyway, i started looking into this only to find that it wasn’t nearly as intuitive as it should have been.  if i wanted to deconstruct the entire thing and create it myself, that’d have probably worked…but i wanted to understand how the agent did this stuff.

if you look at the agent properties (under global settings), you’ll notice that there’s a “service monitoring” tab.  you can use this tab to specify the check and reporting intervals.  i’m not going to go into detail on those values since really, this post is about much cooler (boring?) stuff.


i’m not sure what it executes under the covers, but i do know that the event id is ALWAYS 21207.  so what is different?  what changes in the events to make the state switch from active to inactive or back to active again?  glad you asked.  here are a few tables to help you understand the values that come in as parameters (which, by the way, are not exposed in the events).


event & parameter details:

event parameter    value
event id           21207
source             microsoft operations manager
parameter 1        # of seconds from last sample
parameter 2        display name
parameter 3        old state (string)
parameter 4        new state (string)
parameter 5        service name (short name version)
parameter 6        start up type (string)
parameter 7        service context (user id)
parameter 8        old state (numeric)
parameter 9        new state (numeric)
parameter 10       start up type (numeric)


service state details:

state               value
stopped             1
start pending       2
stop pending        3
running             4
continue pending    5
pause pending       6
paused              7


service startup details:

start type    value
automatic     2
manual        3
disabled      4
unknown       -1
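
to make those tables a bit more concrete, here’s a minimal python sketch that takes the ten parameters of a 21207 event and turns them into named, readable fields.  the function name, the field names and the sample values are all made up for illustration (mom doesn’t expose anything like this), and i’m assuming the parameters arrive as strings in parameter order.

```python
# lookup tables copied from the state and startup tables above
SERVICE_STATES = {
    1: "stopped",
    2: "start pending",
    3: "stop pending",
    4: "running",
    5: "continue pending",
    6: "pause pending",
    7: "paused",
}

STARTUP_TYPES = {
    2: "automatic",
    3: "manual",
    4: "disabled",
    -1: "unknown",
}


def decode_service_event(params):
    """turn the 10 parameters of a 21207 event into a readable dict.

    params is a list of strings, parameter 1 first.
    hypothetical helper for illustration only, not a mom api.
    """
    if len(params) != 10:
        raise ValueError("expected 10 parameters, got %d" % len(params))
    old_state = int(params[7])   # parameter 8: old state (numeric)
    new_state = int(params[8])   # parameter 9: new state (numeric)
    startup = int(params[9])     # parameter 10: start up type (numeric)
    return {
        "seconds_since_last_sample": int(params[0]),  # parameter 1
        "display_name": params[1],                    # parameter 2
        "service_name": params[4],                    # parameter 5
        "service_context": params[6],                 # parameter 7
        "old_state": SERVICE_STATES.get(old_state, "unrecognized"),
        "new_state": SERVICE_STATES.get(new_state, "unrecognized"),
        "startup_type": STARTUP_TYPES.get(startup, "unrecognized"),
    }


# made-up sample: an automatic service going from running to stopped
sample = ["60", "Print Spooler", "Running", "Stopped", "Spooler",
          "Automatic", "LocalSystem", "4", "1", "2"]
decoded = decode_service_event(sample)
print(decoded["service_name"], decoded["old_state"], "->", decoded["new_state"])
```

the numeric parameters (8, 9 and 10) are the ones worth keying on, since the tables above give them fixed values.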


based on that, now you can build your own alert conditions by modifying the event rule.  this one, for example, was created by the management pack wizard.  here’s the formula for the state alert:

AttributeValue(Parameter 10) = 2 AND (AttributeValue(Parameter 9)= "1" OR AttributeValue(Parameter 9)= "3")

now, we deconstruct this to see what all of these mean:

  • parameter 9 – new state
    • 1 – stopped OR
    • 3 – stop pending
  • parameter 10 – start up type
    • 2 - automatic


so in human language (or so i purport!): if the service is an automatic service and its new state is “stopped” or “stop pending”, then raise a critical error alert.

don’t forget the rest of your criteria:

  • source: Microsoft Operations Manager
  • event id: 21207
  • parameter 5: short name of the service.
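
putting the pieces together, here’s a rough python sketch of the whole rule: the three criteria above plus the state formula.  i’ve also added the mirror-image check (new state = 4, running) as one way to answer the original “when does it come back up?” question.  that part is my assumption based on the state table, not something the wizard generated.  the function names, the event dict layout and the sample values are made up for illustration.

```python
EVENT_SOURCE = "Microsoft Operations Manager"
EVENT_ID = 21207


def _matches_service(event, service_name):
    """shared criteria: source, event id and parameter 5 (short name)."""
    return (
        event["source"] == EVENT_SOURCE
        and event["event_id"] == EVENT_ID
        and event["params"][5] == service_name
    )


def is_service_down(event, service_name):
    """the wizard's formula: automatic service, new state stopped or stop pending."""
    p = event["params"]
    return (
        _matches_service(event, service_name)
        and int(p[10]) == 2          # parameter 10: start up type = automatic
        and int(p[9]) in (1, 3)      # parameter 9: new state = stopped / stop pending
    )


def is_service_back_up(event, service_name):
    """assumed counterpart: same criteria, but new state = running."""
    p = event["params"]
    return _matches_service(event, service_name) and int(p[9]) == 4


# made-up events; params is keyed by parameter number (1 to 10)
down_event = {
    "source": "Microsoft Operations Manager",
    "event_id": 21207,
    "params": {1: "60", 2: "Print Spooler", 3: "Running", 4: "Stopped",
               5: "Spooler", 6: "Automatic", 7: "LocalSystem",
               8: "4", 9: "1", 10: "2"},
}
up_event = {
    "source": "Microsoft Operations Manager",
    "event_id": 21207,
    "params": {**down_event["params"], 3: "Stopped", 4: "Running",
               8: "1", 9: "4"},
}

print(is_service_down(down_event, "Spooler"))
print(is_service_back_up(up_event, "Spooler"))
```

in mom terms, the second function would correspond to a separate event rule with the same source, event id and parameter 5, keyed on parameter 9 being 4.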


