
understanding the “ad op master is inconsistent” alert

i use the term “understanding” loosely.  this is by no means a definitive guide to this particular alert, just a few things i have picked up in my attempt to understand it.

let’s look at the context of the alert:

The Domain Controller's Op Master is inconsitent. See additional alerts for details.

first of all, it gives very little information.  the only particularly useful detail is which server is having the issue.  other than that, all you get is a spelling error, since there are no additional critical alerts to look at for details.

this rule, as you know, comes from a sealed mp.  therefore, we can’t modify anything in it beyond what the overrides expose.  the two i’ve tinkered with are:

  • interval (sec)
  • log success event

to begin with, interval (sec) makes the rule run far too often.  the default is 60 seconds.  why on earth would anyone need to be told every minute that op master consistency may be off?  actually, i could think of a few reasons, but really, it’s overkill.  the way the script works (it carries a 2001 copyright, by the way, and was apparently lightly modified for opsmgr by stripping out the createevent calls and replacing them with property bag calls) is to check all replication partners for op master consistency.  on a bridgehead server, that’s quite a big number.  i ended up raising the value to 3600, or once an hour.
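to put rough numbers on that, here is a back-of-the-envelope sketch in python (an illustration only, not part of the mp).  the partner count of 20 is a made-up figure for a bridgehead server, so plug in your own.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def runs_per_day(interval_sec):
    """How many times the rule fires in a day at a given interval (sec)."""
    return SECONDS_PER_DAY // interval_sec


def partner_checks_per_day(interval_sec, partners):
    """Each run walks every replication partner, so the work scales with both."""
    return runs_per_day(interval_sec) * partners


# hypothetical bridgehead with 20 replication partners (an assumption)
PARTNERS = 20

for interval in (60, 3600):
    print(interval, runs_per_day(interval), partner_checks_per_day(interval, PARTNERS))
```

at 60 seconds that works out to 1440 runs a day; at 3600 it is 24.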

i mentioned earlier that the alert has no useful context.  here’s why.  if you look inside the script, every createevent call routes to this subroutine:

Sub CreateEvent(lngEventID, lngEventType, strMessage)
    ' write the detail to the operations manager event log on the agent
    oAPI.LogScriptEvent "AD Replication Partner Op Master Consistency", lngEventID, lngEventType, strMessage

    ' either failure condition flips the state to bad
    If lngEventID = EVENTID_CANT_DETERMINE_OP_MASTER Or lngEventID = EVENTID_OP_MASTERS_INCONSISTENT Then
        bState = 1
    End If
End Sub
 

depending on the condition, bState ends up as either 0 or 1.  this next snippet shows what the script does with it:

If bState = 0 Then
    Set oBag = oAPI.CreateTypedPropertyBag(StateDataType)
    oBag.AddValue "State", "GOOD"
    oBag.AddValue "EventID", EVENTID_GOOD
    oAPI.AddItem oBag
End If
If bState = 1 Then
    Set oBag = oAPI.CreateTypedPropertyBag(StateDataType)
    oBag.AddValue "State", "BAD"
    oBag.AddValue "EventID", EVENTID_BAD
    oAPI.AddItem oBag
End If

as you can see, only the state and the event id are captured.  no other detail.  for this reason, you can’t tell whether bad is really bad or just maybe bad.  i mean, is it really inconsistent, or can the agent simply not retrieve the required data?
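to make that loss of detail concrete, here is a small python model of the two snippets above.  it is only a sketch: the names are symbolic stand-ins, since the post never shows the numeric event ids, and the dictionaries merely imitate what goes into the property bag.

```python
# symbolic stand-ins: the numeric event ids are not shown in the post
CANT_DETERMINE = "EVENTID_CANT_DETERMINE_OP_MASTER"
INCONSISTENT = "EVENTID_OP_MASTERS_INCONSISTENT"


def property_bag(logged_event_ids):
    """Mimic the bState logic: either failure id flips the state to BAD."""
    b_state = 0
    for event_id in logged_event_ids:
        if event_id in (CANT_DETERMINE, INCONSISTENT):
            b_state = 1
    if b_state == 0:
        return {"State": "GOOD", "EventID": "EVENTID_GOOD"}
    return {"State": "BAD", "EventID": "EVENTID_BAD"}


# two very different situations, one identical bag
print(property_bag([CANT_DETERMINE]) == property_bag([INCONSISTENT]))
```

the comparison comes out equal, which is the whole problem: by the time the data reaches the alert, "couldn't ask" and "asked and got a different answer" look the same.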

this is where the other override i mentioned earlier comes into play: log success event.  if you turn it on, you can view the associated health service events to see the details it stores.  it will show you every consistent partner, which could potentially be useful.  this is what a “good” event looks like:

AD Replication Partner Op Master Consistency : Op Master PDC 'myDC1.MYDOMAIN.COM' consistent with replication partner 'myDC2.MYDOMAIN.COM'.

and the inverse (which is logged regardless of the log success event override):

AD Replication Partner Op Master Consistency : The script 'AD Replication Partner Op Master Consistency' failed to execute the following LDAP query: '<LDAP://myDC1.myDomain.com/CN=Schema,CN=Configuration,DC=myDomain,DC=com>;(&(objectClass=dMD)(fSMORoleOwner=*));fSMORoleOwner;Subtree'.
The error returned was 'Table does not exist.' (0x80040E37)



AD Replication Partner Op Master Consistency : Unable to determine schema Op Master on domain controller 'myDC1.myDomain.com'.


those are actually two events that i pulled out of the operations manager event log on the dc in question.  these events are not captured by opsmgr by default.

when you view the data, it tells you everything you need to know.  first of all, don’t panic.  there’s no inconsistency!  the error occurs because the partner dc can’t be reached.  (yes, that is obviously a problem, but not a data integrity problem.)
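if you export those events and want to triage them at a glance, something like the following python sketch could do the sorting.  it is not part of the mp; it only matches the phrases that appear in the sample events in this post, and since the post never shows the wording of a genuine inconsistency event, anything else lands in "unknown" rather than being guessed at.

```python
import re


def classify(message):
    """Sort an event message into consistent / undetermined / unknown."""
    # \b keeps a phrase like "inconsistent with ..." from matching the success pattern
    if re.search(r"\bconsistent with replication partner", message):
        return "consistent"
    # both phrases show up when the op master could not be looked up at all
    if "failed to execute" in message or "Unable to determine" in message:
        return "undetermined"
    return "unknown"


sample = ("AD Replication Partner Op Master Consistency : Unable to determine "
          "schema Op Master on domain controller 'myDC1.myDomain.com'.")
print(classify(sample))
```

"undetermined" here means the script could not get an answer, which is a reachability problem to chase rather than proof that the op masters disagree.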

i mentioned that there are no additional critical alerts to look at for details.  if you switch to the warning alerts, you will find detail on why the particular alert fired.  for example, here are two that relate to the event messages above:

AD Replication Partner Op Master Consistency : The script 'AD Replication Partner Op Master Consistency' failed to execute the following LDAP query: '<LDAP://myDC1.myDomain.com/CN=Configuration,DC=myDomain,DC=com>;(&(objectClass=crossRefContainer)(fSMORoleOwner=*));fSMORoleOwner;Subtree'.
The error returned was 'Table does not exist.' (0x80040E37)



AD Replication Partner Op Master Consistency : Unable to determine domain naming Op Master on domain controller 'myDC1'.


they look remarkably alike, right?

if you want to see it all in one view, i suggest the dc active alerts view, which shows all the alerts stacked up for easier correlation.

optionally, search for the computer name in question.  don’t expect to find it under dc events.  as stated earlier, these events are not picked up by default.

hope that helps!

Comments

  1. Hello,
    Thanks for the wonderful article, it explains well.

    I enabled the log success event counter and it fills the logs in no time.

    So i saved the events, and they say successful for all DCs for all the fsmo owners. However, when the failure occurs, it occurs once, twice, or maybe 3 times in 1 day. The server for which it throws an alert has about 14 replication partners which are located around the globe. So i was thinking that it might be related to the response time, as sometimes there might be some network issues.

    Could you please tell me that when the script runs every 5 minutes for one server it checks the FSMO consistency for all the replication partners, how much time does the script wait for the response from the partner?

    Appreciate any help :)
    Hari

  2. Hi,
    Could you tell me, when the script runs to check consistency from all replication partners, does it wait for a particular amount of time before it fails? i mean the default response time threshold?

  3. Hi there,
    Wonderful article, it explains well.

    Could you please tell me, when the script runs to check the FSMO consistency for all the replication partners, does it wait for any response time? i mean the response time threshold?

    Appreciate any help
    Hari

  4. Could someone summarize this article in plain English?

    Replies
    1. oh man, i wrote it in what i thought was plain english. should i try old english? thou art ... and that's about as far as i can get. ;)

    2. Ha ha.. I like your sense of humor..

  5. So how can i change interval (sec)? where can i find this to change? :)

  6. when you get an alert, right-click it, select overrides, and then change the interval there
    JB

  7. Very insightful and helpful post. Thank you. I am curious if you have any ideas on how to keep this alert from triggering on some new lag domain controllers that are only turned on once a week to replicate, then are turned back off. Because this is sealed, I don't see any way to edit beyond the available overrides, which are not much help. I thought that maybe the 'AD Replication Partner Op Master Consistency' script could be edited to "ignore" specific LDAP queries on these new lag Domain Controllers, but I am not seeing where that script resides. Any ideas?

    Replies
    1. Well, unfortunately, it doesn't work for lag sites out of the box. It's been a long time since I've looked at this, but if you wanted to make it work, you would need to turn off the one in the sealed MP and create your own custom one, then use conditional logic to strip out the DCs returned in your lag site.


