Using Workloads in Sightline

Workloads are a powerful tool in the Sightline software suite. But what exactly are workloads, and what can you do with them? We’d like to take this opportunity to explore workloads: how they are defined, what information the Sightline Power Agent provides for each workload, and some examples of how they are used. The examples shown here are from a Windows system, but the basic concepts apply to all platforms that can be monitored using a Sightline Power Agent.

Using workloads

Workloads allow you to combine processes into logical groups for reporting and tracking resource utilization. Workloads are defined in the Power Agent’s configuration file and are normally mutually exclusive, so each process is counted in only one workload. The Power Agent evaluates workloads when it collects the data.

One purpose for developing these groups is to evaluate performance. You may want to answer questions such as: How much CPU is a particular group of users using? Do the order entry (or other) people have enough memory? How much I/O is the database application really using? In many cases, you’ll find that the 80/20 rule applies; that is, 80% of the work is done by 20% of the users or applications. Your objective should be to define your workloads so that the Power Agent delivers meaningful data for the groups that have an impact on the system.
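
To make the 80/20 idea concrete, here is a minimal Python sketch that takes per-workload CPU totals and returns the fewest workloads that together account for 80% of the CPU. The workload names and numbers are invented for illustration; in practice the values would come from a workload metric such as WL-%CPU[].

```python
# Hypothetical per-workload CPU seconds for one interval (illustrative numbers only).
cpu_by_workload = {
    "Database": 5200.0,
    "Order Entry": 2300.0,
    "Java": 900.0,
    "Backup": 400.0,
    "Printing": 150.0,
    "Other": 50.0,
}

def top_consumers(usage, share=0.8):
    """Return the fewest workloads (largest first) whose combined usage reaches `share` of the total."""
    total = sum(usage.values())
    selected, running = [], 0.0
    for name, value in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        running += value
        if running >= share * total:
            break
    return selected

print(top_consumers(cpu_by_workload))
```

With these sample numbers, two of the six workloads carry more than 80% of the CPU, which is the kind of concentration worth reflecting in your workload definitions.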

Defining workloads

All Power Agents have default workloads. You may want to remove some of the defaults and add your own. The important thing to remember is that capturing and reporting workload measurements is essentially a two-step process. First, decide which processes fit into which logical bucket, or workload. Second, define those workloads in the Power Agent’s configuration file. A good way to learn about your system’s overall workload is to open the Task Manager on Windows, or use the ps command on a UNIX system.
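
For the first step, a simple tally of process names from a process listing shows which programs are running and how many instances of each exist. This Python sketch assumes you have saved one process name per line (for example, the image names from Task Manager, or the output of `ps -e -o comm=` on UNIX); the sample names below are hypothetical.

```python
from collections import Counter

# Hypothetical process names captured from a process listing, one per line.
snapshot = """\
java
java
agentmgr
datamgr
servd
protomgr
threshd
summarizer
sqlservr
svchost
svchost
svchost
"""

def process_counts(listing):
    """Count how many instances of each process name appear in a listing."""
    names = [line.strip() for line in listing.splitlines() if line.strip()]
    return Counter(names)

# Most frequent names first; ties keep the order in which they were first seen.
for name, count in process_counts(snapshot).most_common():
    print(f"{count:3d}  {name}")
```

Names that appear together or share a prefix are natural candidates for a single workload.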

Sample workload definition

For Power Agents that use an AGENT.XML configuration file, workloads are defined in COMP_CRITERIA statements. Note that COMP_CRITERIA statements are evaluated in order, and the last COMP_CRITERIA statement must be for the Other workload.
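
Because the statements are evaluated in order and each process is counted in only one workload, the order of COMP_CRITERIA statements matters whenever two patterns can match the same process. The Python sketch below models that rule under the assumption that a process is assigned to the first criterion it satisfies, with Other as the catch-all. The workload names and patterns are hypothetical, and Python’s `re` module only approximates the Power Agent’s own matching.

```python
import re

def classify(instance_name, criteria):
    """Return the first workload whose pattern matches; fall back to Other."""
    for workload, pattern in criteria:
        if re.search(pattern, instance_name):
            return workload
    return "Other"  # the catch-all, always evaluated last

# Hypothetical criteria: "servd" matches both patterns, so the order decides.
sightline_first = [("Sightline", r"servd"), ("Services", r"serv")]
services_first = [("Services", r"serv"), ("Sightline", r"servd")]

print(classify("servd", sightline_first))
print(classify("servd", services_first))
print(classify("notepad", sightline_first))
```

Placing the more specific pattern first keeps a broad pattern from absorbing processes that belong elsewhere.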

These sample workloads are defined for a Windows 2008 system that is deployed as a Sightline data collection server. It has three Sightline products: a Power Agent, EA/V, and EDM. EDM is a Java-based product, so we have a Java workload defined.

    <COMP_CRITERIA>
        <NAME>Sightline</NAME>
        <CRITERION>{ Instance Name } = /agentmgr/ OR
               { Instance Name } = /datamgr/ OR
               { Instance Name } = /servd/ OR
               { Instance Name } = /protomgr/ OR
               { Instance Name } = /threshd/ OR
               { Instance Name } = /summarizer/ OR
               { Instance Name } = /slaaListener/
        </CRITERION>
    </COMP_CRITERIA>

    <COMP_CRITERIA>
        <NAME>Expert Advisor Vision</NAME>
        <CRITERION>{ Instance Name } = /^[Ee][Aa][Vv]/</CRITERION>
    </COMP_CRITERIA>

    <COMP_CRITERIA>
        <NAME>Java</NAME>
        <CRITERION>{ Instance Name } = /^java/</CRITERION>
    </COMP_CRITERIA>
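
To see how these definitions group processes, the Python sketch below reads the sample statements with the standard `xml.etree.ElementTree` module, extracts each `/pattern/`, and treats the OR-ed patterns in a CRITERION as alternatives. This is an illustration only, not how the Power Agent parses AGENT.XML: the `<ROOT>` wrapper is a hypothetical addition so the fragment parses as one document, the process names in the loop are hypothetical, and first-match ordering with Other as the fallback is assumed.

```python
import re
import xml.etree.ElementTree as ET

# The sample COMP_CRITERIA statements, wrapped in a hypothetical <ROOT> element.
config = """<ROOT>
<COMP_CRITERIA>
  <NAME>Sightline</NAME>
  <CRITERION>{ Instance Name } = /agentmgr/ OR
             { Instance Name } = /datamgr/ OR
             { Instance Name } = /servd/ OR
             { Instance Name } = /protomgr/ OR
             { Instance Name } = /threshd/ OR
             { Instance Name } = /summarizer/ OR
             { Instance Name } = /slaaListener/
  </CRITERION>
</COMP_CRITERIA>
<COMP_CRITERIA>
  <NAME>Expert Advisor Vision</NAME>
  <CRITERION>{ Instance Name } = /^[Ee][Aa][Vv]/</CRITERION>
</COMP_CRITERIA>
<COMP_CRITERIA>
  <NAME>Java</NAME>
  <CRITERION>{ Instance Name } = /^java/</CRITERION>
</COMP_CRITERIA>
</ROOT>"""

PATTERN = re.compile(r"\{\s*Instance Name\s*\}\s*=\s*/([^/]*)/")

def load_workloads(xml_text):
    """Return (workload name, [regex patterns]) pairs in file order."""
    root = ET.fromstring(xml_text)
    workloads = []
    for comp in root.findall("COMP_CRITERIA"):
        name = comp.findtext("NAME").strip()
        patterns = PATTERN.findall(comp.findtext("CRITERION"))
        workloads.append((name, patterns))
    return workloads

def classify(instance_name, workloads):
    """First workload with any matching pattern wins; otherwise Other."""
    for name, patterns in workloads:
        if any(re.search(p, instance_name) for p in patterns):
            return name
    return "Other"

wl = load_workloads(config)
for proc in ["agentmgr", "EAVServer", "javaw", "sqlservr"]:
    print(proc, "->", classify(proc, wl))
```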

COMP_CRITERIA statements use special characters within regular expressions. The caret ^ indicates “begins with” and the dollar sign $ indicates “ends with.” No indicator means “contains.” Workload definitions are case-sensitive, so you can use the [Xx] syntax to indicate that a character can be either uppercase or lowercase.
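
The same anchor and case rules can be tried out with Python’s `re` module, which treats `^`, `$`, unanchored patterns, and `[Xx]` character classes the same way for simple patterns like these. This sketch assumes the Power Agent’s regular expressions behave like Python’s for these cases; the process names are hypothetical.

```python
import re

def matches(pattern, name):
    """True if `pattern` is found in `name`; ^ and $ anchor to the start and end."""
    return re.search(pattern, name) is not None

examples = [
    (r"^java", "javaw"),              # begins with "java"
    (r"^java", "myjava"),             # "java" is present but not at the start
    (r"mgr$", "agentmgr"),            # ends with "mgr"
    (r"data", "datamgr"),             # contains "data"
    (r"^eav", "EAVServer"),           # case-sensitive, so lowercase does not match
    (r"^[Ee][Aa][Vv]", "EAVServer"),  # each letter may be either case
]
for pattern, name in examples:
    print(f"{pattern!r:20} {name!r:12} {matches(pattern, name)}")
```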

Workload metrics on Windows systems

On Windows systems, the default workload metric list includes these metrics:

   —Baseline-Workloads—

   WL-%CPU[]        WL-PgFile Peak[]     WL-Total[]
   WL-%User[]       WL-PgFile Bytes[]    WL-IOWriteOps/s[]
   WL-%Priv[]       WL-Private Bytes[]   WL-IODataOps/s[]
   WL-VirtPeak[]    WL-Threads[]         WL-IOOtherOps/s[]
   WL-VirtBytes[]   WL-PoolPgd[]         WL-IOReadBytes/s[]
   WL-PgFlt/Sec[]   WL-PoolNonPd[]       WL-IOWriteBytes/s[]
   WL-WSPeak[]      WL-Handles[]         WL-IODataBytes/s[]
   WL-WS[]          WL-IOReadOps/s[]     WL-IOOtherBytes/s[]

The brackets at the end of each metric indicate that it’s an array metric; that is, it is reported for each member of the array. In this case, the array is the group of workloads that were defined in the Power Agent’s configuration file.
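
One way to picture an array metric is as a mapping from workload name (the subscript) to a value, with one mapping per metric for each collection interval. The Python sketch below is a hypothetical data model for illustration, not the Power Agent’s storage format; the timestamp and values are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Interval:
    """One collection interval: each array metric maps workload name -> value."""
    timestamp: str
    metrics: dict = field(default_factory=dict)

    def value(self, metric, workload):
        # value("WL-%CPU", "Java") reads the element WL-%CPU[Java]
        return self.metrics[metric][workload]

# Hypothetical values for illustration only.
sample = Interval(
    timestamp="10:35:00",
    metrics={
        "WL-%CPU": {"Sightline": 3.2, "Java": 12.5, "Other": 4.7},
        "WL-Threads": {"Sightline": 61, "Java": 88, "Other": 240},
    },
)
print(sample.value("WL-%CPU", "Java"))
```

Every array metric shares the same set of subscripts, because they all come from the same workload definitions.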

Note the metric called WL-Total. This is the number of processes that were found to be members of the workload.
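
As a sketch of what WL-Total represents, the Python function below counts member processes per workload from a process-to-workload assignment and reports zero for a defined workload that currently has no members. The process IDs and assignments are hypothetical.

```python
from collections import Counter

def wl_total(assignments, workload_names):
    """Count member processes per workload, reporting 0 for empty workloads."""
    counts = Counter(assignments.values())
    return {name: counts.get(name, 0) for name in workload_names}

# Hypothetical assignments keyed by process ID, since one program can run many times.
assignments = {
    101: "Sightline", 102: "Sightline", 103: "Sightline",
    201: "Java", 202: "Java",
    301: "Other",
}
print(wl_total(assignments, ["Sightline", "Expert Advisor Vision", "Java", "Other"]))
```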

Displaying workload metrics

All Power Agents report workload metrics, regardless of the platform. They are normally included in a specific Workloads metric group, as array metrics with the workload name as the subscript. Using either EA/V or EDM, look for the Workload group to display workload metrics. (On ClearPath OS 2200 systems, workload metrics are included in the System Log metric group.)

What can workloads tell us?

Looking at the Workload CPU Utilization chart, you can see that there is a lot of activity in the middle of the display. We’re actually more interested in the dip about a third of the way across the X axis, at about 10:35 am.

Memory utilization dropped as well. Who is using the most memory on this system? Java, shown in green in the Real Memory Consumption by Workload chart.

Looking deeper into the workloads, we can plot threads and processes per workload.

Notice that there was a drop in the number of processes for the Java workload, and also for the Sightline workload.

A close look at the processes active at 11:38:20 shows that there were six active Sightline processes and two active Java processes.

At 11:39:10, though, only five Sightline processes are active, and no Java processes. What happened? EDM (the two Java processes) was stopped, and the protomgr process that was feeding data to EDM stopped as well.
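
The comparison above can be sketched in Python by diffing two process snapshots per workload. The source gives only the counts (six to five Sightline processes, two to zero Java processes) and names protomgr as the Sightline process that stopped; the other instance names below are hypothetical.

```python
from collections import Counter

def compare_snapshots(before, after):
    """For each workload, return (count before, count after, names that disappeared)."""
    report = {}
    for workload, names in before.items():
        later = after.get(workload, [])
        # Counter subtraction keeps only names with more instances before than after.
        report[workload] = (len(names), len(later), Counter(names) - Counter(later))
    return report

# Hypothetical instance names at the two sample times.
at_11_38_20 = {
    "Sightline": ["agentmgr", "datamgr", "servd", "protomgr", "threshd", "summarizer"],
    "Java": ["java", "java"],
}
at_11_39_10 = {
    "Sightline": ["agentmgr", "datamgr", "servd", "threshd", "summarizer"],
    "Java": [],
}

for workload, (n_before, n_after, gone) in compare_snapshots(at_11_38_20, at_11_39_10).items():
    print(f"{workload}: {n_before} -> {n_after}, stopped: {dict(gone)}")
```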

Alerting on workload metrics

In both EA/V and EDM, you can create an alert on any workload metric. For example, we can use the WL-Total metric for the Sightline workload to be notified if one or more of the processes in the Sightline workload stops.
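
Alerts themselves are configured in EA/V or EDM. The Python sketch below only illustrates the underlying threshold logic: flag any sample where WL-Total for a workload falls below the expected number of processes. The sample data reuses the counts from the example above; the function name and data layout are hypothetical.

```python
def wl_total_alerts(samples, workload, expected):
    """Return (timestamp, count) for each sample where WL-Total[workload] is below `expected`."""
    alerts = []
    for timestamp, totals in samples:
        count = totals.get(workload, 0)  # a missing workload counts as zero processes
        if count < expected:
            alerts.append((timestamp, count))
    return alerts

# Hypothetical WL-Total samples, using the counts from the example above.
samples = [
    ("11:38:20", {"Sightline": 6, "Java": 2}),
    ("11:39:10", {"Sightline": 5, "Java": 0}),
]
for timestamp, count in wl_total_alerts(samples, "Sightline", expected=6):
    print(f"{timestamp}: WL-Total[Sightline] = {count}, expected 6")
```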

Watching for long-term trends

We might also want to see how a workload has behaved over time. In this plot, you can see CPU utilization of the Sightline workload over an 8-month period.

Both systems shown are production systems with 16 CPUs each. But what happened in July? The physical configuration did not change. However, a new version of the Sightline Power Agent with performance enhancements was released and installed on these systems. As you can see, the enhancements worked!
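
To quantify a shift like this rather than judging it by eye, you can compare the average of a workload metric before and after the change. The Python sketch below assumes monthly averages of WL-%CPU[] for the Sightline workload; the month labels and values are made up for illustration and are not read from the plot.

```python
from statistics import mean

# Hypothetical monthly averages of WL-%CPU[Sightline]; illustrative values only.
monthly_cpu = {
    "Jan": 6.1, "Feb": 6.3, "Mar": 6.0, "Apr": 6.2,
    "May": 6.4, "Jun": 6.2, "Jul": 3.1, "Aug": 3.0,
}

def before_after(series, change_point):
    """Split an ordered series at `change_point`; return (mean before, mean from the change on).

    `change_point` must not be the first key, or there is nothing to average before it.
    """
    keys = list(series)
    idx = keys.index(change_point)
    before = [series[k] for k in keys[:idx]]
    after = [series[k] for k in keys[idx:]]
    return mean(before), mean(after)

b, a = before_after(monthly_cpu, "Jul")
print(f"before: {b:.2f}%  after: {a:.2f}%  change: {100 * (a - b) / b:.1f}%")
```
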

In summary, workloads can be used for many purposes, both in real time and over time. The key, though, is to define your workloads appropriately and accurately, and to maintain the workload definitions as new applications are added to your systems.

Ask John: JBoss tmp directory maintenance

Are you running EDM? As part of its normal operation, the JBoss container that Sightline uses creates temporary files in the jboss-as-7.1.1.Final/standalone/tmp folder. Each time EDM (and therefore the JBoss container) restarts, the temporary files are created anew, in new folders. Over time, these folders can consume a large amount of space, potentially many gigabytes.
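
Before cleaning anything up, it can help to see how much space the folder is using. The Python sketch below only measures; it does not delete anything. It assumes the script is run from the directory that contains jboss-as-7.1.1.Final; adjust the path for your installation.

```python
import os

def dir_size_bytes(path):
    """Total size in bytes of the regular files under `path` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def subfolder_sizes(path):
    """Size in bytes of each immediate subfolder of `path`."""
    return {e.name: dir_size_bytes(e.path) for e in os.scandir(path) if e.is_dir()}

# Assumed location, relative to the directory where JBoss is installed.
TMP_DIR = os.path.join("jboss-as-7.1.1.Final", "standalone", "tmp")

if os.path.isdir(TMP_DIR):
    # Largest subfolders first.
    for name, size in sorted(subfolder_sizes(TMP_DIR).items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 1024**2:10.1f} MiB  {name}")
    print(f"total: {dir_size_bytes(TMP_DIR) / 1024**3:.2f} GiB")
```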