Skip to main content

sms: you couldn't f5 your way out of a paper bag!

otherwise known as "i've been waiting for async query to complete!" i'm not entirely sure at this point whether or not i love or hate it when you hear the words "thank god you're here!" while you're walking to your desk in the morning. this is before you set your things down, while your lids are still half-shut... and certainly before your first good cup of coffee.

i'm just glad most of the time it's someone cracking a joke. too bad it wasn't this morning. it's not that i'm saying it was a necessarily terrible thing. it's always great to have something new to discover, get frustrated with, and eventually conquer ... or most likely ... concede.

today's problem du jour was a sms issue. (i'm sure you caught on to that by the title.) collection evaluator was taking a tremendously long time trying to refresh certain collections. the collection evaluator log indicated a problem like this:

Preparing to refresh collection XYZ01234 12/11/2007 2:34:06 PM 26116 (0x6604) Refreshing results for collection XYZ01234 12/11/2007 2:34:06 PM 26116 (0x6604) Waiting for async query to complete, have waited 325 seconds already. 12/11/2007 2:37:06 PM Waiting for async query to complete, have waited 645 seconds already. 12/11/2007 2:42:26 PM Waiting for async query to complete, have waited 975 seconds already. 12/11/2007 2:47:56 PM Results refreshed for collection XYZ01234, 961 entries changed. 12/11/2007 2:53:38 PM

looking closely at it, there were a few odd things about it but definitely nothing out of the ordinary. just strange oddities like using subselect statements to make sure it had the sms client in add/remove programs. hmmmmmmm... since i'm using sms inventory for the collection criteria in the first place, doesn't that imply it had the sms client in arp at some point? :) anyway, the snippet is the modified version, without the subselect.

select SMS_R_System.ResourceID, SMS_R_System.ResourceType, SMS_R_System.Name,SMS_R_System.SMSUniqueIdentifier, SMS_R_System.ResourceDomainORWorkgroup, SMS_R_System.Client 
from SMS_R_System inner join SMS_G_System_SoftwareFile as Project on Project.ResourceID = SMS_R_System.ResourceId inner join SMS_G_System_ADD_REMOVE_PROGRAMS on SMS_G_System_ADD_REMOVE_PROGRAMS.ResourceID = SMS_R_System.ResourceId 
where (Project.FileName = "Winproj.exe" and Project.FileVersion < "11.0.2003" or SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName like "Microsoft Project% 2002")

i tried all manner of grouping the stuff together to make it run right, cycling collection evaluator over and over again because each failure meant run times that extended into torment! after a lot of slight query changes, nothing helped. i figured i'd dissect it to see where the problem was exactly.

this is what produced some very interesting results. after tearing the collection criteria apart, what i found was that each individual query would produce a very quick query execution. instead of the wql query above, i created these two separate ones.

Add/Remove Programs:

select SMS_R_System.ResourceID, SMS_R_System.ResourceType, SMS_R_System.Name,SMS_R_System.SMSUniqueIdentifier, SMS_R_System.ResourceDomainORWorkgroup, SMS_R_System.Client 
from SMS_R_System inner join SMS_G_System_SoftwareFile as Project on Project.ResourceID = SMS_R_System.ResourceId inner join SMS_G_System_ADD_REMOVE_PROGRAMS on SMS_G_System_ADD_REMOVE_PROGRAMS.ResourceID = SMS_R_System.ResourceId 
where SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName like "Microsoft Project% 2002"

Software Files:

select SMS_R_System.ResourceID, SMS_R_System.ResourceType, SMS_R_System.Name,SMS_R_System.SMSUniqueIdentifier, SMS_R_System.ResourceDomainORWorkgroup, SMS_R_System.Client 
from SMS_R_System inner join SMS_G_System_SoftwareFile as Project on Project.ResourceID = SMS_R_System.ResourceId inner join SMS_G_System_ADD_REMOVE_PROGRAMS on SMS_G_System_ADD_REMOVE_PROGRAMS.ResourceID = SMS_R_System.ResourceId 
where Project.FileName = "Winproj.exe" and Project.FileVersion < "11.0.2003"

as i said earlier, this had a dramatic effect on run time. it went from minutes to seconds. the answer was so simple. instead of using a single collection criteria, we'd add both. after running the new collection, i made sure coll-fullcollection evaluator showed me the same improved performance. the query finished its execution in 4 seconds. that's better!

so here you go. something like this produces much better results in this case. i'm still trying to figure out why.

for a fun exercise, i took some screenshots... but then realized, i can't use most of them since they don't really tell a story. ah well. i guess i have more to learn about this blogging stuff still. unfortunately for you, the story isn't over yet!

here i am again blogging about sms 2003 and mom 2005. yes, yes, i know... there are new technologies out which i unfortunately am not using yet. so for all of those less fortunate, yours truly included, there is still a great deal of value in what you've got running. the sms 2003 management pack for mom 2005 does not have any rules that look for long-running queries! even if you think you're a genius, you're bound to make a screw up sooner or later. if you have other people working with you, in the same environment, more opportunities abound... (and i don't mean people to blame).

well, that's a quick fix! all you need is a fairly easy regex expression and a new application log provider.

mom-app-prov

as you can see in the screenshot above, i've created a new provider and named it "sms 2003 collection evaluator log." the provider log type is "generic single-line log file". all you have to do next is point this to your sms log directory and specify the colleval.log file... and yes, just for you, i've got that screenshot too!

mom-app-dir

once you have this provider done, you'll need to create a new event rule and tie it to your sms computer group. in your event rule, your data provider should be the one you created above. just switch the provider name like this:

image

under the criteria section, switch to the advanced view since you won't have access to specifying regex matching. once in it, you'll need to populate it with this regular expression:

.*have[ ]waited[ ]([3-9][0-9][0-9]+|[1-9][0-9][0-9][0-9]+)[ ]seconds.*

it should look like this when you're done:

image

mom 2005 is a finicky little beast when it comes to regex statements. if you remember, the original statement in the log looks like the following:

Waiting for async query to complete, have waited 325 seconds already. 12/11/2007 2:37:06 PM

by finicky, i mean, mom will not handle spaces in a regex statement. if you put it in, you'll get a syntax error. obviously we need to put spaces in the text to look for... otherwise we'd be relying solely on the number of seconds. probably not a good idea since it's just a number value. to get around that just put a space inside of empty brackets like this -- [ ]. as far as the other brackets, this regex statement will pick up any value above 300 seconds. 300 seconds = 5 minutes which seems like a reasonable thing to look for. you can slide this out as necessary... but this isn't a tutorial on regex (nor am i qualified to give one, nor would you want to receive one from me). :)

you're set. now at least you'll know if sms is getting its face punched by badly written queries or some other oddity. at some point i'll look into why the original query took so long. it seems like this problem didn't crop up until after sp3... so maybe? i don't know. it's a stretch. if anyone knows, please leave a comment!

Comments

  1. This comment has been removed by a blog administrator.

    ReplyDelete

Post a Comment

Popular posts from this blog

using preloadpkgonsite.exe to stage compressed copies to child site distribution points

UPDATE: john marcum sent me a kind email to let me know about a problem he ran into with preloadpkgonsite.exe in the new SCCM Toolkit V2 where under certain conditions, packages will not uncompress.  if you are using the v2 toolkit, PLEASE read this blog post before proceeding.   here’s a scenario that came up on the mssms@lists.myitforum.com mailing list. when confronted with a situation of large packages and wan links, it’s generally best to get the data to the other location without going over the wire. in this case, 75gb. :/ the “how” you get the files there is really not the most important thing to worry about. once they’re there and moved to the appropriate location, preloadpkgonsite.exe is required to install the compressed source files. once done, a status message goes back to the parent server which should stop the upstream server from copying the package source files over the wan to the child site. anyway, if it’s a relatively small amount of packages, you can

How to Identify Applications Using Your Domain Controller

Problem Everyone has been through it. We've all had to retire or replace a domain controller at some point in our checkered collective experiences. While AD provides very intelligent high availability, some applications are just plain dumb. They do not observe site awareness or participate in locating a domain controller. All they want is the name or IP of one domain controller which gets hardcoded in a configuration file somewhere, deeply embedded in some file folder or setting that you are never going to find. How do you look at a DC and decide which applications might be doing it? Packet trace? Logs? Shut it down and wait for screaming? It seems very tedious and nearly impossible. Potential Solution Obviously I wouldn't even bother posting this if I hadn't run across something interesting. :) I ran across something in draftcalled Domain Controller Isolation. Since it's in draft, I don't know that it's published yet. HOWEVER, the concept is based off

sccm: content hash fails to match

back in 2008, I wrote up a little thing about how distribution manager fails to send a package to a distribution point . even though a lot of what I wrote that for was the failure of packages to get delivered to child sites, the result was pretty much the same. when the client tries to run the advertisement with an old package, the result was a failure because of content mismatch. I went through an ordeal recently capturing these exact kinds of failures and corrected quite a number of problems with these packages. the resulting blog post is my effort to capture how these problems were resolved. if nothing else, it's a basic checklist of things you can use.   DETECTION status messages take a look at your status messages. this has to be the easiest way to determine where these problems exist. unfortunately, it requires that a client is already experiencing problems. there are client logs you can examine as well such as cas, but I wasn't even sure I was going to have enough m