As a continuation of my Counting Instances of a Process saga from earlier this week, my client came back to me and said, "What if we wanted to know the total number of processes that were running for longer than 300 seconds?" Oy vey! Fortunately, my intense introduction to PowerShell over the past week or so had led me down many pathways that were dead ends, or so I thought! (Confucius say: "Many journeys in life appear to lead to the wrong place until we find ourselves in the right place, only because we had the wisdom of the more circuitous journey.") Going back to my notes (you do keep notes when you are troubleshooting a problem, right?), I found that I had attempted a query that looked something like this:
SELECT ElapsedTime FROM Win32_PerfFormattedData_PerfProc_Process WHERE Name like '%notepad%' AND ElapsedTime > 6000
This query, out of the box, failed on a Windows 2008 SP2 server. A quick search of the Thwack forum turned up a post by aLTeReGo from way back in 2010 that said a registry value had to be changed to allow access to the Win32_PerfFormattedData_PerfProc_Process class:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\PerfProc\Performance]
"Disable Performance Counters"=dword:00000000
After verifying that the value above was set to 1, I changed it to 0 and rebooted (don't forget to reboot). I then launched a bunch of notepad instances, let them run for a bit, and ran the query again. This time it worked. The entire script, as it stands now, is at the end of this post. For the detail on the query, see the Counting Instances of a Process post. Note that in this particular instance the customer only wanted to know if the count of processes running for over 300 seconds was > 1, as all of the processes being monitored were transient and should never hang. For that reason I commented out the check for the warning status, even though I still defined the warning threshold in the script arguments. I left it in for re-usability later, as we are making the shift to more PowerShell with the deprecation of SNMP in Windows 2012.
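For anyone repeating the registry step, here is a minimal sketch of checking and flipping that value from PowerShell rather than regedit. It assumes an elevated session on the server being monitored, and the variable names are mine, not from the forum post. The reboot is still up to you.

```powershell
# Registry path and value name as given in the forum post quoted above.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\PerfProc\Performance'
$name = 'Disable Performance Counters'

# Read the current value; 1 means the PerfProc counters are disabled.
$current = (Get-ItemProperty -Path $path -Name $name -ErrorAction SilentlyContinue).$name
Write-Host "Current value of '$name': $current"

# Only touch the value if it is explicitly set to 1.
if ($current -eq 1) {
    # Assumption: the session is elevated; otherwise this call fails with access denied.
    Set-ItemProperty -Path $path -Name $name -Value 0
    Write-Host "Value set to 0. Reboot before re-running the query."
}
```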
We built the associated alert that says if component_name is critical then send message. (OK, it's a bit more complex than that over here, but you get the gist.)
I'll cleanse our components of any identifying data, add in some more documentation, and then upload to the Content Exchange.
$process = Invoke-Command -ComputerName ${Node.DNS} -Credential ${CREDENTIAL} -ScriptBlock {get-wmiobject -query "SELECT ElapsedTime FROM Win32_PerfFormattedData_PerfProc_Process WHERE Name like '%notepad%' AND ElapsedTime > 300"};
$warn = $args[1]; #Sets the warning threshold to the second argument in the script arguments.
$crit = $args[2]; #Sets the critical threshold to the third argument in the script arguments.
$count = ($process | Measure-Object).count;
switch ($count)
{
{$count -ge $crit} {Write-Host 'Statistic:' $count; exit(3); break}
# If the count is greater than or equal to the Critical Threshold the script returns the count and exits as Critical.
# {$count -ge $warn} {Write-Host 'Statistic:' $count; exit(2); break}
# If the count is greater than or equal to the Warning Threshold the script returns the count and exits as Warning.
default {Write-Host 'Statistic:' $count; exit(0)}
# If the process is running and does not match the passed Warning or Critical thresholds the script returns the count and exits as Up.
}
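If the warning check ever gets switched back on, the threshold logic could also be pulled into a small function so it can be tried out without a monitor attached. This is only a sketch: Get-MonitorExitCode is a hypothetical helper name of my own, and it returns the exit code instead of calling exit so it is safe to run in a console. The codes themselves (0 Up, 2 Warning, 3 Critical) are the ones used in the script above.

```powershell
# Hypothetical helper: maps a process count to the exit code used in the script above.
function Get-MonitorExitCode {
    param(
        [int]$Count,  # number of processes returned by the query
        [int]$Warn,   # warning threshold
        [int]$Crit    # critical threshold
    )
    if ($Count -ge $Crit) { return 3 }  # Critical
    if ($Count -ge $Warn) { return 2 }  # Warning
    return 0                            # Up
}

# Example values only; in the monitor these would come from the query and $args.
$count = 2
$code  = Get-MonitorExitCode -Count $count -Warn 1 -Crit 2
Write-Host "Statistic: $count"
Write-Host "Exit code would be: $code"
```

In the real component you would finish with exit($code) so the monitor picks up the status.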