O R G A N I C / F E R T I L I Z E R: 09.08

Sep 30, 2008

dsmod bug when using the -c option?

UPDATE: thanks to some anonymous commenters, i have corrected my example in this post. it seems i left off the trailing %a in the for loop! oops. fixed now.

i was visiting up in roanoke extolling the boundless possibilities of command shells, scripting, etc. to the point of near liability.  in other words, i bored them nearly to death.  :)

to my surprise, it stuck.  i’ve been trading notes with one of the site admins and ran across this bug while walking through a sample scenario: listing the members of one group and adding them to another.  normally, you can do this quite easily with the dsquery tool set (dsquery, dsget, dsmod).

it looks something like this:

dsquery group -name "myGroup" | dsget group -members | dsmod group "cn=myNewGroup,ou=etc,dc=etc,dc=etc" -addmbr -c
 
so what are we doing here?
  1. dsquery group -name "myGroup" – retrieves the dn of the group
  2. dsget group -members – retrieves the membership list (as dns) of the group passed through the pipe
  3. dsmod group "cn=myNewGroup…" -addmbr -c – adds those members to the specified group, with -c meant to continue past errors

this works fine as long as there are no conflicts, i.e. none of the members already belong to the target group.  if even one does, the process bombs out with this error:

dsmod failed:CN=myNewGroup...:The specified account name is already a member of the local group.
 
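the -c option specified above should skip right past this condition and keep trying the other members.  it doesn’t, however, no matter where you place it on the command line.  this may not strictly be a bug: the dsmod documentation describes -c as continuing with the next target object when several are given, and here there is only one target (the group).  either way, you can get around it with a for loop.  :)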
 
for /f "delims=" %a in ('dsquery group -name "myGroup" ^| dsget group -members') do dsmod group "cn=myNewGroup..." -addmbr %a
 
so how is this different?
  1. for /f "delims=" %a in ('dsquery…') – runs the pipeline and assigns each line of output (one member dn) to %a.  "delims=" turns off token splitting, so a dn containing spaces still arrives whole.
  2. dsmod group "cn=myNewGroup…" -addmbr %a – runs once per member, adding each one individually to the group.

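in this case, a failure doesn’t matter: each member gets its own dsmod command, so a member that’s already in the group only fails its own command, and the loop moves on to the next one.  (if you put this in a batch file, double the percent signs: %%a.)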
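to make the difference concrete, here’s a toy model in python (not real dsmod code).  it assumes, as a guess rather than documented fact, that a single dsmod call applies its whole member list as one all-or-nothing change.  the group is just a python set, and AlreadyMember is a made-up exception standing in for the "already a member" error above.

```python
# Toy model only. Hypothesis: one dsmod call applies its whole member list
# as a single all-or-nothing change, so one duplicate sinks the batch.


class AlreadyMember(Exception):
    """Hypothetical stand-in for 'already a member of the local group'."""


def add_members_at_once(group, members):
    """Like piping the whole list into one dsmod: any duplicate rejects all."""
    duplicates = [m for m in members if m in group]
    if duplicates:
        raise AlreadyMember(duplicates[0])
    group.update(members)


def add_members_one_by_one(group, members):
    """Like the for /f loop: one call per member, failures stay isolated."""
    failed = []
    for member in members:
        try:
            add_members_at_once(group, [member])
        except AlreadyMember:
            failed.append(member)
    return failed


if __name__ == "__main__":
    source = ["CN=alice", "CN=bob", "CN=carol"]

    target = {"CN=bob"}
    try:
        add_members_at_once(target, source)
    except AlreadyMember as err:
        print("batch add failed on", err)
    print(sorted(target))  # still only bob

    target = {"CN=bob"}
    print("failed:", add_members_one_by_one(target, source))
    print(sorted(target))
```

the per-member version is the for loop in miniature: the duplicate (bob) gets reported, and alice and carol still get added.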

and of course, to do this in powershell with the quest ad cmdlets (which provide the QAD commands below), you’d execute a command like this:

Get-QADGroupMember "myGroupName" | foreach {Add-QADGroupMember -identity "CN=myNewGroup..." -member $_}

Sep 16, 2008

verifying replication failure with admp and mom 2005

you’ve no doubt seen this error message if you’re monitoring active directory replication.

The following DCs have not updated their MOMLatencyMonitor objects within the specified time period (8 hours). This is probably caused by either replication not occurring, or because the 'AD Replication Monitoring' script is not running on the DC.

Format: DC, Naming Context, Hours since last update

My-Site
myDCserver, NDNC:DC=DomainDnsZones,DC=myDomain,DC=com, 16
 

typically, this error is generated when a DC is no longer replicating.  the ADMP script watches changes to an attribute called adminDescription.  under the MOMLatencyMonitors container at the root of each watched naming context, there is one object for every DC that holds that naming context.  each DC’s mom agent updates its own object, and the alert fires when a DC’s object hasn’t been updated within the specified time period.

for example:

myDCserver, NDNC:DC=ForestDnsZones,DC=myDomain,DC=com, 9
 

this statement indicates that the domain controller myDCserver has not replicated the required value for 9 hours or more in the naming context of DC=ForestDnsZones,DC=myDomain,DC=com.  there are two places this can fail:

  1. the domain controller is having trouble replicating.
  2. the MOM agent is not writing to the adminDescription attribute correctly.

to narrow down the problem, follow the steps below.

the domain controller may be having trouble replicating.

to validate this condition, we can use repadmin.  issuing the following command gets us some usable data.

repadmin.exe /showrepl myDCserver
 
DC=ForestDnsZones,DC=myDomain,DC=com
    mySite\myDCserver2 via RPC
        DC object GUID: 67x4141y-x526-45xy-x32y-8x04yx041yx7
        Last attempt @ 2008-09-16 09:46:41 was successful. 
 

it's important to pay attention to the naming context named in the alert.  in this case, we see that the last attempt was successful and very recent.  this indicates that replication is not the issue.
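if you check this often, a quick script can do the eyeballing.  here’s a minimal python sketch that reads saved repadmin /showrepl text and reports whether a given naming context had a successful attempt within the last 8 hours (the alert’s time period).  it assumes the output layout shown above: unindented naming context lines, indented partners beneath them, and the "Last attempt @ ... was successful." wording.  the function names are mine, not part of repadmin or the management pack.

```python
import re
from datetime import datetime, timedelta

# Matches the success line from the sample output above. The exact wording
# is an assumption; adjust the pattern if your repadmin output differs.
LAST_SUCCESS = re.compile(
    r"Last attempt @ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) was successful"
)


def last_success_for_nc(showrepl_text, naming_context):
    """Return the newest successful attempt listed under one naming context.

    Assumes each naming context sits on an unindented line with its
    replication partners indented beneath it, as in the sample above.
    """
    current_nc = None
    newest = None
    for line in showrepl_text.splitlines():
        if line and not line[0].isspace():
            current_nc = line.strip().lower()  # a new top-level section
            continue
        match = LAST_SUCCESS.search(line)
        if match and current_nc == naming_context.lower():
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            if newest is None or stamp > newest:
                newest = stamp
    return newest


def replication_recent(showrepl_text, naming_context, now,
                       max_age=timedelta(hours=8)):
    """True if the naming context replicated successfully within max_age."""
    last = last_success_for_nc(showrepl_text, naming_context)
    return last is not None and now - last <= max_age


sample = """DC=ForestDnsZones,DC=myDomain,DC=com
    mySite\\myDCserver2 via RPC
        DC object GUID: 67x4141y-x526-45xy-x32y-8x04yx041yx7
        Last attempt @ 2008-09-16 09:46:41 was successful.
"""

print(replication_recent(sample, "DC=ForestDnsZones,DC=myDomain,DC=com",
                         now=datetime(2008, 9, 16, 10, 0)))
```

to feed it real data, you could save the output first (repadmin.exe /showrepl myDCserver > showrepl.txt) and read that file in.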

 

the mom agent is not writing to the adminDescription attribute correctly.

as stated above, the admp script uses the objects in these containers as a kind of synthetic replication test: the mom agent running the script writes to its own object’s adminDescription attribute, and that change then has to replicate out to the other DCs.  to see where the problem is, we can use dsquery to list the name and adminDescription of every object in the MOMLatencyMonitors container of the ForestDnsZones naming context.

dsquery * cn=momlatencymonitors,dc=forestdnszones,dc=mydomain,dc=com -scope onelevel -attr name admindescription
 

and receive the following results:

name admindescription
myDCserver 20080916.0301
myDCserver1 20080916.1301
myDCserver2 20080916.1401
myDCserver3 20080916.1501
myDCserver4 20080916.1301
myDCserver5 20080916.1101
myDCserver6 20080916.1301
myDCserver7 20080916.1301
myDCserver8 20080916.1201

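from the results, we can see that this time the replication alert is actually a mom problem: myDCserver’s stamp (03:01) trails every other DC’s by 8 to 12 hours, even though repadmin showed it replicating successfully.  the agent on myDCserver simply stopped updating its object.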
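the same delta can be computed rather than eyeballed.  below is a python sketch that parses the dsquery output above and flags any DC whose stamp trails the freshest one by more than 8 hours.  it assumes adminDescription holds a yyyymmdd.hhmm timestamp, which is what these values look like but isn’t confirmed against the management pack.  comparing to the newest entry rather than the clock mirrors the "delta between myDCserver and any other" reasoning above.

```python
from datetime import datetime, timedelta


def parse_latency_table(text):
    """Turn 'name admindescription' rows into {dc_name: datetime}.

    Assumes adminDescription is a yyyymmdd.hhmm stamp, as the values suggest.
    """
    stamps = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0].lower() == "name":
            continue  # skip the header row and lines without two columns
        name, value = parts
        stamps[name] = datetime.strptime(value, "%Y%m%d.%H%M")
    return stamps


def lagging_dcs(stamps, max_lag=timedelta(hours=8)):
    """(dc, lag) pairs trailing the newest stamp by more than max_lag, by name."""
    newest = max(stamps.values())
    return sorted(
        (name, newest - stamp)
        for name, stamp in stamps.items()
        if newest - stamp > max_lag
    )


output = """name admindescription
myDCserver 20080916.0301
myDCserver1 20080916.1301
myDCserver2 20080916.1401
myDCserver3 20080916.1501
myDCserver4 20080916.1301
myDCserver5 20080916.1101
myDCserver6 20080916.1301
myDCserver7 20080916.1301
myDCserver8 20080916.1201
"""

for name, lag in lagging_dcs(parse_latency_table(output)):
    print(name, "is behind by", lag)  # myDCserver is behind by 12:00:00
```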

Sep 15, 2008

restarting services and terminating processes with mom 2005

this particular example was written for softgrid, but i thought it might be useful to generalize it for any purpose.  you probably already have services that need a restart every now and then.  that’s pretty easy in mom: you can issue a simple net stop && net start command as illustrated in this post.

the general perception is that admins are lazy.  to help perpetuate this obvious lie, i tried the simple method above, but it failed.  it turns out that some services don’t terminate their processes when they stop, contrary to what you’d expect.  short of writing some ridiculously long for loop statements inside the batch response, you have to go with a script.

i really did consider going with a batch script but ended up needing a bit more flexibility.  for instance, instead of blindly going through the cycle, i wanted to make sure we were still in the given condition before going ahead with it.  to do that, we have to check the process’s cpu utilization.  anyway, the script does the following:

  • examines process(es) and the processor utilization rate
  • stops the service
  • terminates the running process(es)
  • starts the service
  • creates a log output
  • stamps the alert description with informative data
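the ordering of that cycle matters more than it looks, so here’s a rough python model of it (not the script itself).  FakeService is a hypothetical stand-in whose worker process survives a stop, which is the misbehavior described above, and "worker.exe" is a made-up process name.  the state strings mirror the values Win32_Service.State reports ("Stopped", "Running"), and wait would be time.sleep in real use.

```python
class FakeService:
    """Hypothetical service whose worker process survives a stop."""

    def __init__(self):
        self.state = "Running"
        self.processes = {"worker.exe"}  # hypothetical process name

    def stop(self):
        # Like the misbehaving services above: state changes, process stays.
        self.state = "Stopped"

    def start(self):
        self.state = "Running"
        self.processes.add("worker.exe")


def recycle(service, log, wait=lambda seconds: None):
    """Stop, terminate leftovers, start, verify. Returns True if running."""
    log.append("stop")
    service.stop()
    wait(5)
    if service.state == "Stopped":
        log.append("terminate %d process(es)" % len(service.processes))
        service.processes.clear()
    else:
        log.append("did not stop; skipping terminate")
    wait(5)
    log.append("start")
    service.start()
    wait(10)
    # Mirrors Win32_Service.State, where a started service reports "Running".
    running = service.state == "Running"
    log.append("started" if running else "failed to start")
    return running


events = []
print(recycle(FakeService(), events), events)
```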

of course, we need to give it the process and service name we want it to attack.  for that, you’ll need the following parameters when you set up this script in mom.

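  • Process – process name, matched as a prefix (stored in sProcess)
  • Threshold – average cpu percentage the process must exceed (stored in iThreshold)
  • Service – service name to restart (stored in sService)
  • LogName – name of the log file to generate under %windir%\temp (stored in sLogName)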

a bit more minutiae: the script samples each matching process’s utilization 10 times, one second apart, then divides by 10 for the average.  if the average is above the threshold, it goes through the cycle to reset the thing (the first process over the threshold is enough).  you can change all that crap around in the script, but none of it is exposed as a parameter.
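here’s the decision part of that in python, as a sketch with made-up numbers: samples_by_handle is a hypothetical mapping from a process handle to its 10 one-second cpu samples.  like the script, it uses a strict greater-than and stops at the first process that qualifies.

```python
def average_utilization(samples):
    """Mean of the per-second cpu samples (the script divides its sum by 10)."""
    return sum(samples) / len(samples)


def first_process_over_threshold(samples_by_handle, threshold):
    """Return the first handle whose average exceeds threshold, else None."""
    for handle, samples in samples_by_handle.items():
        if average_utilization(samples) > threshold:  # strictly above
            return handle
    return None


# Made-up samples: handle "4120" is pegged, "5388" is idle.
samples = {
    "5388": [2, 0, 1, 3, 0, 0, 1, 2, 0, 1],
    "4120": [97, 99, 95, 98, 100, 96, 99, 97, 98, 99],
}
print(first_process_over_threshold(samples, threshold=80))
```

a process sitting exactly at the threshold does not trigger the cycle, which matches the script’s If iProcTime/10 > iThreshold test.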

we’re in testing with opsmgr, so whenever we go live, i’ll have to convert these scripts.  i’ll post them in opsmgr format as i get them prepared.  for now, here’s the mom 2005 version:

'==========================================================================
' NAME: Service/Process Restart
'
' AUTHOR: Marcus C. Oh
' DATE  : 9/15/2008
'
' COMMENT: Recycles runaway processes and services based on a threshold
'          Logs to %windir%\temp directory
'
' VERSION: 1.0
'==========================================================================

' Standard event constants
Const EVENT_TYPE_SUCCESS = 0
Const EVENT_TYPE_ERROR   = 1
Const EVENT_TYPE_WARNING = 2
Const EVENT_TYPE_INFORMATION = 4

' Parameters for MOM
sProcess = ScriptContext.Parameters.Get("Process")
iThreshold = CInt(ScriptContext.Parameters.Get("Threshold"))
sService = ScriptContext.Parameters.Get("Service")
sLogName = ScriptContext.Parameters.Get("LogName")

sComputer = "."
bCycle = False

Set oAlert = ScriptContext.Alert


' Spin up the File System provider and create the log file
Set oShell = CreateObject("Wscript.Shell")
sWinDir = oShell.ExpandEnvironmentStrings("%WinDir%")
Set oFS = CreateObject("Scripting.FileSystemObject")
Set myLogFile = oFS.CreateTextFile(sWinDir & "\temp\" & sLogName,True)


' Spin up WMI
Set oWMIService = GetObject("winmgmts:\\" & sComputer & "\root\cimv2")


' Check the process from the parameter to see if the utilization 
' is currently above the indicated threshold.

myLog "[Starting process cycling...]"

'Set oPerfData = ScriptContext.Perfdata
myLog VbCrLf & vbTab & "Checking process(es) for: " & sProcess

Set cProcessNames = oWMIService.ExecQuery("Select handle from Win32_Process Where Name like '" & sProcess & "%'")
For Each oProcName In cProcessNames
    iLoop = 0
    iProcTime = 0
    myLog vbTab & "Examining process handle " & oProcName.handle
    While iLoop < 10
        Set cProcesses = oWMIService.ExecQuery("Select * From Win32_PerfFormattedData_PerfProc_Process Where IDProcess = '" & oProcName.handle & "'")
        For Each oProcess in cProcesses
            iProcTime = iProcTime + CInt(oProcess.PercentProcessorTime)
            myLog vbTab & oProcess.Name & " utilization aggregate - " & iProcTime & " (sample value - " & CInt(oProcess.PercentProcessorTime) & ")"
        Next
        iLoop = iLoop + 1
        mySleep(1000)
    Wend
    
    myLog vbTab & "Aggregate utilization for process handle " & oProcName.handle & " - " & iProcTime
    
    If iProcTime/10 > iThreshold Then
        myLog vbTab & "Process utilization matches criteria."
        myLog vbTab & "Divided by 10 - " & iProcTime/10
        bCycle = True
        Exit For
    Else
        myLog vbTab & "Process utilization at " & iProcTime/10 & " does not exceed threshold of " & iThreshold & VbCrLf
    End If
Next

If bCycle = True Then
    ' Stop the service.
    Call CommandService(sService,"Stop")
    mySleep(5000)
    
    
    ' Terminate all running processes.
    If VerifyService(sService,"Stopped") Then
        myLog VbCrLf & vbTab & sService & " has stopped successfully."
        myLog VbCrLf & vbTab & "Terminating process(es): " & sProcess
        Call TerminateProcess(sProcess)
    Else
        myLog VbCrLf & vbTab & sService & " did not stop; skipping process termination."
    End If
    mySleep(5000)


    ' Start the service.
    Call CommandService(sService,"Start")
    mySleep(10000)
    
    
    ' Verify the service started. Win32_Service.State reports "Running"
    ' for a started service; "Started" is never a State value.
    If VerifyService(sService,"Running") Then
        myLog vbTab & sService & " has started successfully."
    Else
        myLog vbTab & sService & " has failed to start."
    End If

    
    ' Rewrite the original description with additional data.
    oAlert.Description = oAlert.Description & VbCrLf & VbCrLf & _
        "Remediation script for runaway processes has been executed. " & _
        "Please review the following log for details: " & sWinDir & "\temp\" & sLogName
Else
    myLog vbTab & "Process utilization did not exceed threshold."
    
    ' Rewrite the original description with additional data.
    oAlert.Description = oAlert.Description & VbCrLf & VbCrLf &_
        "No remediation attempt required."
End If

myLog VbCrLf & "[Stopping process cycling...]"

' Close out the file
myLogFile.Close


' Subs and Functions ------------------------------------------------------

' Start/stop the service
Sub CommandService(sService,sAction)
    Set cServices = oWMIService.ExecQuery("Select * from Win32_Service where Name='" & sService & "'")
    For Each oService in cServices
        myLog VbCrLf & vbTab & sAction & " -- " & sService
        If sAction = "Stop" Then
            oService.StopService()
        ElseIf sAction = "Start" Then
            oService.StartService()
        End If
    Next
End Sub

' Verify the service state
Function VerifyService(sService,sState)
    ' Default when the service isn't found or isn't in the requested state
    VerifyService = False
    Set cServices = oWMIService.ExecQuery("Select * From Win32_Service Where Name ='" & sService & "'")
    For Each oService in cServices
        If oService.State = sState Then
            VerifyService = True
        End If
    Next
End Function

' Terminate the processes
Sub TerminateProcess(sSGProcess)
    Set cRunningProcesses = oWMIService.ExecQuery("Select * from Win32_Process Where Name like '" & sSGProcess & "%'")
    For Each oRunningProcess in cRunningProcesses
        oRunningProcess.Terminate()
    Next
End Sub

' General sleep sub to switch between MOM and cmd line
Sub mySleep(iSleep)
    ScriptContext.Sleep(iSleep)
End Sub

Sub myLog(sData)
    myLogFile.WriteLine(sData)
End Sub

' Standard Event creation subroutine
Sub CreateEvent(iEventNumber,iEventType,sEventSource,sEventMessage)
    Set oEvent = ScriptContext.CreateEvent()
    oEvent.EventNumber = iEventNumber
    oEvent.EventType = iEventType 
    oEvent.EventSource = sEventSource
    oEvent.Message = sEventMessage
    ScriptContext.Submit oEvent
End Sub

Sep 3, 2008

troubleshooting device drivers with dpc problems

another little gem.  here’s what you need and some highlights:

process explorer
kernrate
hklm\system\currentcontrolset\services

  1. run process explorer
    • open the properties of the DPCs entry
    • check its performance graph to see if it’s high
  2. if it is, run kernrate for 30-60 seconds
    • press ctrl-c to stop sampling and view the results
    • the offending driver should be at or near the top
  3. find the subkey associated with the offending driver
    • path is noted above
    • set its “start” value to 4 to disable it.  (at your own risk)

thanks to steven daugherty … read the full article in windows it pro.

resetting windows socket layer with netsh

this comes out of windows it pro. cool enough to save. here’s a link to the related article. the command you issue is:

netsh winsock reset
 
thanks to apostolos fotakelis