opalis: controlling maintenance mode with opalis, sccm, and scom
UPDATE: found a problem in the "retrieve ads and updates from sccm" script that was causing the script to stop working under certain conditions. I've modified it slightly and posted it at the bottom of the blog post.
WARNING: this is a proof of concept. don't just load this in your production environment and kick it off. you'll be totally on your own (as if that weren't the case already). while it works in my test environment, it may not in yours. test, test, test.
I've been spending some time toying around with opalis. the first hurdle, if you've been reading my posts, was in setting up the thing. the second hurdle was actually getting something useful to work, and the third hurdle was to figure out how to export something for use by communities. fortunately, I've cleared them all. the particular proof of concept I wanted to try was using opalis to integrate scom with sccm.
whenever we go through patching cycles, we tend to spend more time than necessary scheduling maintenance mode for our servers. this goes for patches, software deployments, etc. since it's a proof of concept, I don't get to all the scenarios that are actually required to make this thing function in its entirety. I'll make sure to list the things I think are open items that we need to address. guess what? that's where you come in. I'm hoping those of you reading this will help out. it works in the exact scenario that I'll outline... but you know... there are always holes.
requirements
- system center operations manager 2007
- system center configuration manager 2007 or systems management server 2003 (should work)
- opalis integration server 6.x (yes, it must be configured AND working)
- opalis integration packs
  - microsoft sms
  - microsoft operations manager 2007
- windows powershell v2.0
scenario
to get a level set, I want to lay out exactly what this thing should do. this way, there are no lofty expectations or unexpected outcomes. :) that said, at a high level, the opalis action server will read out the upcoming advertisements and deployments from an sccm server, validate a few things (dates, times, exclusions, etc.), retrieve the collections involved, retrieve the members of those collections, and then put them into maintenance mode. if this is interesting to you, we'll move on to the finer details of what I'm talking about. but to make it a little clearer, I'm going to add in some screenshots from the opalis policy that controls all this stuff.
how it works
okay, I promised screenshots so here we go.
so if you'll follow along, I circled where we're starting since it may not be entirely clear.
1. Run Every 10 Minutes: this object simply schedules the policy to run every 10 minutes. it's configurable of course. you may want to modify it based on your needs.
2. Get Date from Start: we need to get the value from this object so that we can use it later for comparisons. trust me. it'll make sense.
3. Retrieve Ads and Updates from SCCM: this is a powershell script that goes out and pulls all open advertisements (software) or deployments (patches).
4. Write Discovered Advertisements: self-explanatory, I think.
5. Get Date from Script: this is another date object that we pull from the script in step 3. we want to hold on to this so we can compare the two dates.
6. Compare Dates: now we take the date from step 2 and the date from step 5 and compare them. if they match, that means the ad is scheduled to deploy on the same day. we'll discard anything that doesn't match so that we're not scheduling for things on a future date. I'm skipping the logging functions since that's self-explanatory as well.
7. Check Duration: on this step, we're looking to see if there's a time skew variable (we'll get to that soon). if not, we're just going to use the default of 30 minutes.
on to the second half of the policy...
- Log the New Duration: basically we're just writing to the log what the duration value is.
- Compare Time Skew: this step is kind of interesting so I'll outline it.*
  - get the time value from step 2 above and add the time skew figure to it. if the time value is 2:15 PM and the time skew is the default of 30 minutes, the new value is 2:45 PM.
  - get the time value from the ads/deployments in step 3.
  - determine if the skewed time is greater than the ad/deployment time. if the ad time is 2:30 PM, we check if the skewed time of 2:45 PM is greater. if it is, then we assume it's safe to start.
  - at the same time, determine if the current time is less than or equal to the ad time. after all, we don't need to set maintenance mode for the same advertisement every 10 minutes. if the ad time is 2:30 PM and the current time is 2:15 PM, cool. run it. if the current time is 2:35 PM, then don't run it again.
  - if we can start, write that to a log and move on. if not, write that to a log and skip that ad.
- Get Collection Members: this is where we go out and retrieve the members of each collection referenced in the eligible advertisements and deployments.
- Check Member Names: in this step, we're sending each of the collection members (server names) through a powershell script. if a name does not match a value in the variable RMS*, we send the server name through to the Maintenance Mode objects. the links labeled "safe" are where this evaluation occurs.
- Start Maintenance Mode for Windows Computer / Healthservice Watcher: these two steps are identical. we send each server into both of them because a healthservice watcher object and a windows computer object will exist for any server in scom.
- Junction: this is used as a waiting point so that both forks can finish placing machines in maintenance mode.
- Updated: and finally, here we write out the servers that were placed into maintenance mode.
* the reason I'm using time skew is so that opalis will catch advertisements before they're scheduled to start. let's say that you have the policy set to run every 10 minutes. you have an ad that's scheduled to start at 2:13 PM. your policy started its execution at 2:10 PM, found nothing, and is waiting to start again at 2:20 PM. with a time skew value, we're basically telling the opalis server that it should add 30 minutes to the execution time of the policy. so in this case, the opalis server believes it is 2:40 PM instead of 2:10 PM. since 2:40 PM is greater than 2:13 PM, it'll get started. the time skew value is also used to indicate how long the server should be placed into maintenance mode. a time skew of 30 minutes equals a maintenance mode duration of 30 minutes.
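to make the date and time skew checks a little more concrete, here's a rough powershell sketch of the decision. this is not what the policy actually runs (the policy does this with opalis objects). the function name Test-MaintenanceStart and its parameters are made up for illustration, and the dates are just sample values.

```powershell
# Hypothetical helper: decides whether an ad or deployment should trigger
# maintenance mode on this pass of the policy.
function Test-MaintenanceStart {
    param(
        [datetime]$Now,          # when this policy run started (Get Date from Start)
        [datetime]$AdTime,       # scheduled start of the ad or deployment
        [int]$SkewMinutes = 30   # time skew; falls back to 30 like the policy does
    )

    # Compare Dates: only consider ads scheduled for the same day.
    if ($Now.Date -ne $AdTime.Date) { return $false }

    # Compare Time Skew: the skewed time has to be later than the ad time ...
    $skewed = $Now.AddMinutes($SkewMinutes)
    if ($skewed -le $AdTime) { return $false }

    # ... and the ad must not have started already.
    return ($Now -le $AdTime)
}

$now = [datetime]'2010-06-01 14:10'
Test-MaintenanceStart -Now $now -AdTime ([datetime]'2010-06-01 14:13')   # True: 2:40 PM is later than 2:13 PM
Test-MaintenanceStart -Now $now -AdTime ([datetime]'2010-06-01 15:00')   # False: 3:00 PM is outside the skew window
Test-MaintenanceStart -Now ([datetime]'2010-06-01 14:35') -AdTime ([datetime]'2010-06-01 14:30')   # False: ad already started
```

the second check is what keeps the same ad from being picked up again once its start time has passed.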
* it's a bad idea for the rms server to be placed into maintenance mode. if the rms server is inadvertently added to a collection for deployment, this step will not send the rms server to the maintenance mode objects. if there are multiple values for the rms variable, it's looped through so even clustered rms environments are safe (assuming you list all the cluster members).
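here's a minimal sketch of that rms check, assuming the collection member names and the rms names are written in the same form (both short names or both fqdns). the function name Get-SafeServer and the sample server names are hypothetical. the script inside the policy may be written differently.

```powershell
# Hypothetical helper: returns only the servers that are safe to send on
# to the maintenance mode objects.
function Get-SafeServer {
    param(
        [string[]]$Members,   # collection members (server names)
        [string[]]$Rms        # every rms name, including each cluster member
    )

    foreach ($member in $Members) {
        # -notcontains is case-insensitive, so MYRMS1 and myrms1 both match.
        if ($Rms -notcontains $member) { $member }
    }
}

$rms = "myrms1","myrms2"
Get-SafeServer -Members "web01","MYRMS1","sql01" -Rms $rms   # returns web01 and sql01
```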
setting it all up
- if your opalis environment is already set up, make sure to load up the integration packs listed in the requirements section above. you'll want those in place before importing an ois_export file that calls objects that aren't there. trust me. I've been there. It's easier this way. :)
- when you import your ois_export file, remove all checkboxes except for the policies and the variables. though I promise you, I removed these on export, it's probably a good practice to do this anyway just in case. you'll probably get a dialog screen indicating that you're going to overwrite your "ops console" variables. whether you choose to or not is up to you. if you choose not to, it'll just create a harmless, empty folder that you can remove.
- the next thing we should talk about is the variables. when you import the ois_export file, it should import all the variables that you need and place them into a folder called maintenance mode as shown here:
  - advertisement prefix: this is used as a means of filtering for the advertisements or deployments that you want to look at. for example, if you started all of your ads or deployments with MM: the script will only pull back those ads. this is important because most environments will have ads that target workstations as well. there's no point in needlessly looping workstations through to opsmgr for maintenance.
  - package: the name used for the program or package doesn't really matter. what's important is that the program advertised has the right flag set. make sure that "disable operations manager alerts while this program runs" is set.
  - rms: in this variable, set the rms name to guard it from being added to maintenance mode. if you have a clustered rms environment, list each cluster member individually, in quotes, separated by a comma. ex: "myrms1","myrms2"
  - site code: this is the site code of your sccm environment.
  - site server: this variable holds the name of your sccm server.
  - time skew: this is where you can adjust the time skew value. it will default to 30 if none is supplied.
- once you have that squared away, there are some places you'll have to configure in the policy.
  - logging: anywhere that a log item exists, you may want to change the default logging location. it's set to c:\temp\datetime.log.
  - get collection members: this step will require your credentials to the sccm server. I went through the painstaking process of narrowing it down to just the necessary permissions for the action service account. I've outlined this below, if you're interested.
  - start maintenance mode: both of these objects will require you to define the connection to the opsmgr server. this is defined in the supporting documentation for the opsmgr integration pack if you need help. additionally, "mydomain.com" will need to be replaced with your domain suffix on the monitor line.
Security Permissions for SCCM
I mentioned having to correct permissions for the action account above in the get collection members step. this is what I had to do to make it work:
- dcom permissions adjustment (on the sccm server)
  - launch dcomcnfg.
  - navigate to component services \ computers \ my computer. right-click my computer, choose properties.
  - under the com security tab, click edit limits in both sections.
  - grant the following rights to the ois action account:
    - remote access
    - remote launch
    - remote activation
  - navigate to the dcom config section under my computer, locate windows management instrumentation.
  - right-click windows management instrumentation, choose properties.
  - under the security tab, click edit under the launch and activation permissions section.
  - grant the ois action account the following permissions:
    - remote launch
    - remote activation
- sccm permissions
  - in the configuration manager console, grant the ois action account the following permissions:
    - collection: read, read resource
    - advertisement: read
    - deployment: read
    - package: read
additional stuff to fix
- I think I put enough checks in place to keep the ad or deployment from needlessly getting reevaluated throughout the day and having maintenance mode set over and over again. however, if your time skew is large enough, say 1 hour, and your scheduling object is set to run every 10 minutes, you would be attempting to put the same machines into maintenance mode several times during the same hour until the current time passes the ad time.
- the time skew challenge really needs to be adjusted. I haven't figured out what the right formula is since the equipment I've been using to do all this is lab equipment. it's not exactly the fastest stuff. I also have the sql server running on the same server. in a production scenario, this would all be separated.
- I haven't done any timing tests to determine how fast machines actually go into maintenance mode. the scheduling (every 10 minutes) and the time skew are critical here based on how fast opalis can drop machines into maintenance mode and how many machines are expected in any given collection. obviously the schedules have to be far enough apart to allow for adequate processing.
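one possible way to tackle the first item is to remember which ads have already been handled and skip them on later passes. this is only a sketch of the idea, not something that's in the policy today. the function name Test-AlreadyHandled, the hashtable, and the sample ad name are all hypothetical, and since each policy run starts fresh, the hashtable would have to be saved somewhere (a file, for example) between runs.

```powershell
# Hypothetical helper: returns $true if this ad and start time were seen
# before, otherwise records them and returns $false.
function Test-AlreadyHandled {
    param(
        [hashtable]$Handled,   # state carried between passes
        [string]$AdName,
        [datetime]$AdTime
    )

    # One key per ad per scheduled start, so a rescheduled ad counts as new.
    $key = '{0}|{1:yyyyMMddHHmm}' -f $AdName, $AdTime

    if ($Handled.ContainsKey($key)) { return $true }

    $Handled[$key] = $true
    return $false
}

$handled = @{}
$adTime  = [datetime]'2010-06-01 15:00'
Test-AlreadyHandled -Handled $handled -AdName 'MM: patch web servers' -AdTime $adTime   # False on the first pass
Test-AlreadyHandled -Handled $handled -AdName 'MM: patch web servers' -AdTime $adTime   # True on the next pass
```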
where to get stuff
I've posted the file to my skydrive:
updated script
it's a real pain to do exports so for now, I'm putting in the modified script I mentioned at the top of the post below:
$mySCCMServer = "\`d.T.~Vb/{2E9411B1-8303-40F4-AF6F-0D914047D89A}\`d.T.~Vb/"
$myNamespace = "root\sms\site_\`d.T.~Vb/{D30862CE-1972-40CF-AAC5-B17FAE687E3C}\`d.T.~Vb/"
$myAdvPrefix = "\`d.T.~Vb/{4644F218-B446-4519-9E7F-E806969BD13D}\`d.T.~Vb/"
# pull the ads that match the prefix, plus the tables we look things up in
$myAds = Get-WmiObject -ComputerName $mySCCMServer -Namespace $myNamespace `
    -Query "select * from sms_advertisement where advertisementname like '$myAdvPrefix%'"
$myPrgs = Get-WmiObject -ComputerName $mySCCMServer -Namespace $myNamespace `
    -Query "select * from sms_program"
$myPkgs = Get-WmiObject -ComputerName $mySCCMServer -Namespace $myNamespace `
    -Query "select * from sms_package"
$myColls = Get-WmiObject -ComputerName $mySCCMServer -Namespace $myNamespace `
    -Query "select * from sms_collection"
# deployments only count if they match the prefix and have mom alerts disabled
$myDeployments = Get-WmiObject -ComputerName $mySCCMServer -Namespace $myNamespace `
    -Query "select * from sms_updatesassignment where assignmentname like '$myAdvPrefix%' and disablemomalerts = 1"

# results handed back to the policy: collection names, start dates, durations
$finalCollection = @()
$finalDate = @()
$finalDuration = @()

# advertisements (software)
if ( $myAds -ne $null ) {
    foreach ($ads in $myAds) {
        foreach ($prgs in $myPrgs) {
            # match the ad to its program
            if ($ads.packageid -eq $prgs.PackageID -and $ads.programname -eq $prgs.programname) {
                # only keep programs with bit 5 of ProgramFlags set
                if ($prgs.ProgramFlags -band [math]::pow(2,5)) {
                    # re-read the ad by its path to get at assignedschedule
                    $AdInst = Set-WmiInstance -Path $ads.__PATH
                    if ( $($adinst.assignedschedule).starttime -ne $null ) {
                        $AdDate = ([management.managementdatetimeconverter]::todatetime($($adinst.assignedschedule).starttime))
                        foreach ($colls in $myColls) {
                            if ( $ads.collectionid -eq $colls.collectionid ) {
                                $finalCollection += $colls.name
                            }
                        }
                        $finalDate += $AdDate
                        $finalDuration += $prgs.Duration
                    }
                }
            }
        }
    }
}

# deployments (patches)
if ( $myDeployments -ne $null ) {
    foreach ($Deployment in $myDeployments) {
        foreach ($colls in $myColls) {
            if ( $Deployment.TargetCollectionid -eq $colls.collectionid ) {
                $finalCollection += $colls.name
            }
        }
        $finalDate += ([management.managementdatetimeconverter]::todatetime($Deployment.EnforcementDeadline))
        # deployments get a fixed duration of 60
        $finalDuration += 60
    }
}

# flag that there is at least one ad or deployment to work on
if ( $finalDate -ne $null ) {
    $continue = "Y"
}
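a quick note on the ProgramFlags test in the script. [math]::pow(2,5) is 32 (0x20), so the check looks at bit 5 of the flags, which as far as I can tell is the bit behind the "disable operations manager alerts while this program runs" checkbox. here's a small standalone sketch of the same test. the function name Test-DisableAlertsFlag and the sample flag values are made up for illustration.

```powershell
# Hypothetical helper: reports whether bit 5 (0x20) is set in a ProgramFlags value.
function Test-DisableAlertsFlag {
    param([long]$ProgramFlags)

    $disableAlertsFlag = 0x20   # same value as [math]::pow(2,5) in the script

    # -band keeps only the bits set in both operands; non-zero means the bit is on.
    return (($ProgramFlags -band $disableAlertsFlag) -ne 0)
}

Test-DisableAlertsFlag -ProgramFlags 0x8420   # True: bit 5 is set
Test-DisableAlertsFlag -ProgramFlags 0x8400   # False: bit 5 is clear
```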
Hi Marcus,
This was just the kind of thing I was looking for....A step by step approach with pics explaining what to do. Thank you very much!!
John Bradshaw
glad you found it useful, john. :) thanks for commenting...
great post, wish I could see the pictures. work must be blocking that host.
-bryce
Nice post.
But I can't get it to work with our Maintenance Windows that we have set up on server Collections for software updates. The policy compares the value when the maintenance window was first active and not the actual next time it occurs. Do you have any tips on how I would solve this?
i'm only a year late on this reply. that's okay right? (sorry i didn't see it.)
how are you reading out the values from the maintenance window? i suppose it would be a matter of getting a hold of the right value.
Hi,
Can we put the servers into maintenance mode via email (using Opalis)? Is this feature supported?
Harry
yes, i suppose you could. i've been using the exchange mail community ip from ryan andorfer with pretty good success. here's a link: http://scorch.codeplex.com/
Hi,
By using Opalis exchange integration pack, we can monitor/read email when user sends an email with server info to put into SCOM maintenance mode. My question is how to create a runbook to pull those info and put the server in Maintenance mode.
Can you please put some light on this. As i am newbie in opalis
Thanks in Advance
Mwasaka
hi mwasaka -
drop by codeplex and get the exchange mailbox IP. it has some objects that let you read out emails. I would do that and send out the server names to the workflow that will put the server into maintenance mode.
that is also a great idea that I hadn't thought of. I think you'd have to use a form if you want to allow a specification of date/time. hmmmm. :-)
Hi Marcus,
Thanks for your prompt reply.
As i am a newbie in Opalis, can you give any reference to accomplish this task. This would be very helpful for me
Thanks in advance
Mwasaka
Hello Marcus,
That is a great post you got!!
I was wondering if it is possible to target all windows computers to be in Maintenance Mode for a particular monitor using the start maintenance mode activity in SCOM 2007 R2 integration pack.
In our environment everyday between 2-5 am, the servers are heavily loaded due to backup and archiving activites that keep running on them. Therefore, generating too many memory and CPU utilization alerts.
I have been trying to use the Start Maintenance Mode activity to put Microsoft.Windows.OperatingSystem:servername.mydomain.com which successfully puts the Operating System into Maintenance Mode for particular server.
I have more than 1000 servers we are monitoring in our environment. Is there a way I can put the "Available Megabytes of Memory" or "CPU Utilization" monitor in Maintenance Mode (which will target all the servers). I have tried everything that is available in the SCOM Integration Pack and for everything I try i get the error "Failed to get Monitor. The exception was "An object of type MonitoringObject with Id 00000000-0000-0000-0000-000000000000 was not found".
I have made the monitor field (inside the Start Maintenance Mode activity) read every line from the Read File activity (my file contains all the servers), and I have also tried the Get Monitor activity, where the Monitor field inside the Start Maintenance Mode activity reads the name of the Monitor.
Any help in this regard will be highly appreciated.
Thanks in Advance !!!
Abdul
the challenge with targeting is that you can only target the windows computer object and the health service watcher. things like you pointed would have to be chased back to windows computer object.
i would use the get computer object or something like that to pull the list that you're talking about... and then target them with that. of course, there are easier ways than using opalis or orchestrator to do this. tools exist that let you place whole sets of machines into maintenance mode. if that's the only thing you really want to achieve, you should look into that...
Hello Marcus,
Thank you for the prompt reply. It is not the machines that I want to target. It is basically the Memory and CPU monitors I want to target (Microsoft.Windows.OperatingSystem) so that all the machines in my environment don't alert me between 1-5 am.
Like I said, I have also tried using the Get Monitor and also made the Windows.Computer.OperatingSystem using a Read File Activity. However, in all the cases I get activity aborted with the exception mentioned above.
Cheers
Abdul
here's another variation done in orchestrator: http://www.systemcentercentral.com/tabid/143/indexid/93523/default.aspx
and anders' example:
http://contoso.se/blog/?p=2164
maybe you can glean something off of that. i think the primary issue is that you have to target the monitor properly by targeting the correct xpath. you could try using the ellipsis to see if that shows you the right path to try.
thank you for the link marcus. I will update you once I try it out in my lab.
Cheers!
Regards,
Abdul
Hello Marcus,
I have been co-ordinating with Matthew (from this systemcentercentral post) regarding Maintenance Mode for a monitor. As per his explanation, you cannot put a particular monitor in maintenance mode, but you can put objects of a particular class in Maintenance Mode. This is what I was able to achieve through my Runbook, and this is the closest you can get to placing a monitor in Maintenance Mode.
Thank You for your support and assistance!
Cheers!
Abdul
right on. i did a poor job of explaining that earlier. i hope to see your finished runbook at some point. sounds very cool :)