sms: you couldn't f5 your way out of a paper bag!
otherwise known as "i've been waiting for async query to complete!" i'm not entirely sure at this point whether or not i love or hate it when you hear the words "thank god you're here!" while you're walking to your desk in the morning. this is before you set your things down, while your lids are still half-shut... and certainly before your first good cup of coffee.
i'm just glad most of the time it's someone cracking a joke. too bad it wasn't this morning. it's not that i'm saying it was a necessarily terrible thing. it's always great to have something new to discover, get frustrated with, and eventually conquer ... or most likely ... concede.
today's problem du jour was a sms issue. (i'm sure you caught on to that by the title.) collection evaluator was taking a tremendously long time trying to refresh certain collections. the collection evaluator log indicated a problem like this:
Preparing to refresh collection XYZ01234 12/11/2007 2:34:06 PM 26116 (0x6604) Refreshing results for collection XYZ01234 12/11/2007 2:34:06 PM 26116 (0x6604) Waiting for async query to complete, have waited 325 seconds already. 12/11/2007 2:37:06 PM Waiting for async query to complete, have waited 645 seconds already. 12/11/2007 2:42:26 PM Waiting for async query to complete, have waited 975 seconds already. 12/11/2007 2:47:56 PM Results refreshed for collection XYZ01234, 961 entries changed. 12/11/2007 2:53:38 PM
looking closely at it, there were a few odd things about it but definitely nothing out of the ordinary. just strange oddities like using subselect statements to make sure it had the sms client in add/remove programs. hmmmmmmm... since i'm using sms inventory for the collection criteria in the first place, doesn't that imply it had the sms client in arp at some point? :) anyway, the snippet is the modified version, without the subselect.
select SMS_R_System.ResourceID, SMS_R_System.ResourceType, SMS_R_System.Name,SMS_R_System.SMSUniqueIdentifier, SMS_R_System.ResourceDomainORWorkgroup, SMS_R_System.Client from SMS_R_System inner join SMS_G_System_SoftwareFile as Project on Project.ResourceID = SMS_R_System.ResourceId inner join SMS_G_System_ADD_REMOVE_PROGRAMS on SMS_G_System_ADD_REMOVE_PROGRAMS.ResourceID = SMS_R_System.ResourceId where (Project.FileName = "Winproj.exe" and Project.FileVersion < "11.0.2003" or SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName like "Microsoft Project% 2002")
i tried all manner of grouping the stuff together to make it run right, cycling collection evaluator over and over again because each failure meant run times that extended into torment! after a lot of slight query changes, nothing helped. i figured i'd dissect it to see where the problem was exactly.
this is what produced some very interesting results. after tearing the collection criteria apart, what i found was that each individual query would produce a very quick query execution. instead of the wql query above, i created these two separate ones.
Add/Remove Programs:
select SMS_R_System.ResourceID, SMS_R_System.ResourceType, SMS_R_System.Name,SMS_R_System.SMSUniqueIdentifier, SMS_R_System.ResourceDomainORWorkgroup, SMS_R_System.Client from SMS_R_System inner join SMS_G_System_SoftwareFile as Project on Project.ResourceID = SMS_R_System.ResourceId inner join SMS_G_System_ADD_REMOVE_PROGRAMS on SMS_G_System_ADD_REMOVE_PROGRAMS.ResourceID = SMS_R_System.ResourceId where SMS_G_System_ADD_REMOVE_PROGRAMS.DisplayName like "Microsoft Project% 2002"
Software Files:
select SMS_R_System.ResourceID, SMS_R_System.ResourceType, SMS_R_System.Name,SMS_R_System.SMSUniqueIdentifier, SMS_R_System.ResourceDomainORWorkgroup, SMS_R_System.Client from SMS_R_System inner join SMS_G_System_SoftwareFile as Project on Project.ResourceID = SMS_R_System.ResourceId inner join SMS_G_System_ADD_REMOVE_PROGRAMS on SMS_G_System_ADD_REMOVE_PROGRAMS.ResourceID = SMS_R_System.ResourceId where Project.FileName = "Winproj.exe" and Project.FileVersion < "11.0.2003"
as i said earlier, this had a dramatic effect on run time. it went from minutes to seconds. the answer was so simple. instead of using a single collection criteria, we'd add both. after running the new collection, i made sure collection evaluator showed me the same improved performance. the query finished its execution in 4 seconds. that's better!
so here you go. something like this produces much better results in this case. i'm still trying to figure out why.
for a fun exercise, i took some screenshots... but then realized, i can't use most of them since they don't really tell a story. ah well. i guess i have more to learn about this blogging stuff still. unfortunately for you, the story isn't over yet!
here i am again blogging about sms 2003 and mom 2005. yes, yes, i know... there are new technologies out which i unfortunately am not using yet. so for all of those less fortunate, yours truly included, there is still a great deal of value in what you've got running. the sms 2003 management pack for mom 2005 does not have any rules that look for long-running queries! even if you think you're a genius, you're bound to make a screw up sooner or later. if you have other people working with you, in the same environment, more opportunities abound... (and i don't mean people to blame).
well, that's a quick fix! all you need is a fairly easy regex expression and a new application log provider.
as you can see in the screenshot above, i've created a new provider and named it "sms 2003 collection evaluator log." the provider log type is "generic single-line log file". all you have to do next is point this to your sms log directory and specify the colleval.log file... and yes, just for you, i've got that screenshot too!
once you have this provider done, you'll need to create a new event rule and tie it to your sms computer group. in your event rule, your data provider should be the one you created above. just switch the provider name like this:
under the criteria section, switch to the advanced view since you won't have access to specifying regex matching. once in it, you'll need to populate it with this regular expression:
.*have[ ]waited[ ]([3-9][0-9][0-9]+|[1-9][0-9][0-9][0-9]+)[ ]seconds.*
it should look like this when you're done:
mom 2005 is a finicky little beast when it comes to regex statements. if you remember, the original statement in the log looks like the following:
Waiting for async query to complete, have waited 325 seconds already. 12/11/2007 2:37:06 PM
by finicky, i mean, mom will not handle spaces in a regex statement. if you put it in, you'll get a syntax error. obviously we need to put spaces in the text to look for... otherwise we'd be relying solely on the number of seconds. probably not a good idea since it's just a number value. to get around that just put a space inside of empty brackets like this -- [ ]. as far as the other brackets, this regex statement will pick up any value above 300 seconds. 300 seconds = 5 minutes which seems like a reasonable thing to look for. you can slide this out as necessary... but this isn't a tutorial on regex (nor am i qualified to give one, nor would you want to receive one from me). :)
you're set. now at least you'll know if sms is getting its face punched by badly written queries or some other oddity. at some point i'll look into why the original query took so long. it seems like this problem didn't crop up until after sp3... so maybe? i don't know. it's a stretch. if anyone knows, please leave a comment!
This comment has been removed by a blog administrator.
ReplyDelete