JAWS Performance Metrics

JAWS tracks performance metrics by analyzing HTCondor history logs collected across all compute sites. These logs are shipped via Filebeat to a centralized Elasticsearch backend, where they are indexed and made searchable. This document explains the key metrics shown on the JAWS Dashboard and how they can be used to assess compute and memory resource utilization.

Units

  1. Cores: Logical compute cores (or threads).

  2. Seconds: Real time that passes on the clock while the job runs.

  3. Core-seconds: Total time all cores combined spent actively working on the job.

    Example: A job running for 60 seconds using 2 cores = 120 core-seconds.

  4. GB: Gigabytes of memory used.

Metrics

  1. RequestCores

    • Unit: Cores (Integer)

    • Definition: Logical cores (or threads) requested by the user.

  2. CommittedSec

    • Unit: Seconds (Integer)

    • Definition: The total wall-clock time the job spent running successfully on a machine (i.e., it finished without failing).

    • Includes active compute time, idle time, I/O waits, etc.

    • Excludes time spent on failed attempts, retries, or time in the queue.

  3. ActiveComputeSec

    • Unit: Core-seconds (Integer)

    • Definition: Total time cores spent actively computing for the job, summed across all cores.

  4. AvgComputeCores

    • Unit: Cores (Float)

    • Definition: An estimate of how many cores were in use, on average, during a job’s final, successful run.

    • Formula: AvgComputeCores = ActiveComputeSec / CommittedSec

    Example 1: Mixed Concurrency Over Time

    • A job used 1 core for 60 seconds, then 2 cores for 60 seconds.

    • ActiveComputeSec = (1 core × 60 s) + (2 cores × 60 s) = 180 Core-seconds

    • CommittedSec = 120 seconds

    • AvgComputeCores = 180 / 120 = 1.5 cores

    • Accurate representation of concurrency

    Example 2: Consistent High Utilization

    • A job used 4 cores continuously for 90 seconds.

    • ActiveComputeSec = 4 × 90 = 360 Core-seconds

    • CommittedSec = 90 seconds

    • AvgComputeCores = 360 / 90 = 4.0 cores

    • Accurate representation of concurrency

    Example 3: Burst Followed by Idle

    • A job used 4 cores for 10 seconds, then was idle for 590 seconds.

    • ActiveComputeSec = 4 × 10 = 40 Core-seconds

    • CommittedSec = 600 seconds

    • AvgComputeCores = 40 / 600 ≈ 0.067 cores

    • Despite briefly using 4 cores concurrently, the AvgComputeCores was very low due to extended idle time

    • This does not reflect peak concurrency

    Example 4: Declining Core Usage

    • A job used 4 cores for 30 seconds, then 1 core for 90 seconds.

    • ActiveComputeSec = (4 × 30) + (1 × 90) = 120 + 90 = 210 Core-seconds

    • CommittedSec = 120 seconds

    • AvgComputeCores = 210 / 120 = 1.75 cores

    • Concurrency declined over time

    Example 5: Underutilization Despite High Request

    • A job requested 4 cores, but consistently used only 2 cores for 300 seconds.

    • ActiveComputeSec = 2 × 300 = 600 Core-seconds

    • CommittedSec = 300 seconds

    • AvgComputeCores = 600 / 300 = 2.0 cores

    • Half of the requested cores went unused, even though AvgComputeCores is greater than 1.
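The arithmetic in the examples above can be reproduced with a short Python sketch. The function name `avg_compute_cores` and the list of `(active_cores, seconds)` segments are illustrative assumptions and not part of JAWS, which takes ActiveComputeSec and CommittedSec from HTCondor accounting rather than from per-interval samples. Idle time is modeled here as a segment with 0 active cores.

```python
def avg_compute_cores(segments):
    """Return (ActiveComputeSec, CommittedSec, AvgComputeCores).

    segments: iterable of (active_cores, seconds) pairs. This input shape
    is a hypothetical convenience for illustration only.
    """
    active_compute_sec = sum(cores * secs for cores, secs in segments)
    committed_sec = sum(secs for _, secs in segments)
    if committed_sec == 0:
        raise ValueError("CommittedSec must be greater than zero")
    return active_compute_sec, committed_sec, active_compute_sec / committed_sec


# Segments taken from Examples 1, 3, and 4.
examples = {
    "mixed concurrency": [(1, 60), (2, 60)],
    "burst followed by idle": [(4, 10), (0, 590)],
    "declining core usage": [(4, 30), (1, 90)],
}

for name, segs in examples.items():
    active, committed, avg = avg_compute_cores(segs)
    print(f"{name}: {active} core-seconds / {committed} s = {avg:.3f} cores")
```

Because the result is an average over the whole run, a short burst on many cores and a long run on a fraction of one core can produce the same value.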

  5. ComputeUseFactor

    • Unit: Unitless

    • Definition: The fraction of requested cores that were actively used for computing (on average) during the job’s successful run.

    • Formula: ComputeUseFactor = AvgComputeCores / RequestCores
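As a minimal sketch of this formula, the Python function below combines it with the AvgComputeCores formula. The function name `compute_use_factor` is a hypothetical helper, not a JAWS API. The sample values come from Example 5 above (4 cores requested, 600 core-seconds of compute over 300 seconds).

```python
def compute_use_factor(active_compute_sec, committed_sec, request_cores):
    """ComputeUseFactor = (ActiveComputeSec / CommittedSec) / RequestCores."""
    if committed_sec <= 0 or request_cores <= 0:
        raise ValueError("CommittedSec and RequestCores must be positive")
    avg_compute_cores = active_compute_sec / committed_sec
    return avg_compute_cores / request_cores


# Example 5: 2 of the 4 requested cores were busy on average.
print(compute_use_factor(active_compute_sec=600, committed_sec=300, request_cores=4))
```

A value of 0.5 here means that, on average, half of the requested cores were doing compute work.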

  6. NonComputeSec

    • Unit: Seconds (Integer)

    • Definition: The portion of the job’s runtime not actively spent on computation, including time spent in I/O waits, sleeping, blocking, or other non-CPU-bound activities.

    • Formula: NonComputeSec = CommittedSec - ActiveComputeSec

    Note

    This metric is not currently calculated but may be added in a future update.

  7. Low ComputeUseFactor (i.e., low average core usage relative to RequestCores) does not necessarily imply low application code efficiency. However, a consistently low value may indicate one or more of the following:

    • The workload is I/O-bound or memory-bound

    • The job is not parallelized efficiently

    • Fewer cores should be requested (RequestCores may be over-provisioned).

  8. PeakMemoryGB

    • Unit: GB (Float)

    • Definition: The peak memory used by the job during its successful run.

  9. RequestMemoryGB

    • Unit: GB (Float)

    • Definition: The amount of memory requested by the job.

  10. MemoryUseFactor

    • Unit: Unitless

    • Definition: The ratio of peak memory used to memory requested during the job’s successful run.

    • Formula: MemoryUseFactor = PeakMemoryGB / RequestMemoryGB

  11. Low MemoryUseFactor suggests memory over-allocation. In such cases, users are encouraged to reduce their memory requests.
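The MemoryUseFactor formula can be sketched in Python as follows. The function name `memory_use_factor` and the sample values (2 GB peak against an 8 GB request) are hypothetical and chosen only to illustrate an over-allocated job.

```python
def memory_use_factor(peak_memory_gb, request_memory_gb):
    """MemoryUseFactor = PeakMemoryGB / RequestMemoryGB."""
    if request_memory_gb <= 0:
        raise ValueError("RequestMemoryGB must be positive")
    return peak_memory_gb / request_memory_gb


# Illustrative values: the job peaked at 2 GB but requested 8 GB.
print(memory_use_factor(peak_memory_gb=2.0, request_memory_gb=8.0))
```

A result of 0.25 would indicate that three quarters of the requested memory was never used at peak.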

HTCondor Attribute Mapping (Optional)

Note

This section is intended for users interested in understanding which HTCondor attributes JAWS metrics are based on. Most users can safely ignore this.

HTCondor Attribute Mapping

  JAWS Metric         HTCondor ClassAd Field
  ----------------    -------------------------------
  RequestCores        RequestCpus
  CommittedSec        CommittedTime
  ActiveComputeSec    RemoteUserCpu + RemoteSysCpu
  PeakMemoryGB        MemoryUsage (converted to GB)
  RequestMemoryGB     RequestMemory (converted to GB)
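To show how the mapping and the formulas fit together, here is a rough Python sketch that derives the JAWS metrics from a dictionary of ClassAd fields. The function name `jaws_metrics_from_classad`, the plain-dict input, and the sample values are all hypothetical. It assumes MemoryUsage and RequestMemory are reported in MB, and that conversion to GB divides by 1024; the exact conversion factor used by JAWS is not stated in this document.

```python
def jaws_metrics_from_classad(ad, mb_per_gb=1024):
    """Derive JAWS metrics from a dict of HTCondor ClassAd fields.

    mb_per_gb is an assumed conversion factor, not a documented JAWS value.
    """
    request_cores = ad["RequestCpus"]
    committed_sec = ad["CommittedTime"]
    active_compute_sec = ad["RemoteUserCpu"] + ad["RemoteSysCpu"]
    peak_memory_gb = ad["MemoryUsage"] / mb_per_gb
    request_memory_gb = ad["RequestMemory"] / mb_per_gb

    # Derived metrics are undefined when a denominator is zero.
    avg_compute_cores = active_compute_sec / committed_sec if committed_sec else None
    compute_use_factor = (
        avg_compute_cores / request_cores
        if avg_compute_cores is not None and request_cores
        else None
    )
    memory_use_factor = peak_memory_gb / request_memory_gb if request_memory_gb else None

    return {
        "RequestCores": request_cores,
        "CommittedSec": committed_sec,
        "ActiveComputeSec": active_compute_sec,
        "AvgComputeCores": avg_compute_cores,
        "ComputeUseFactor": compute_use_factor,
        "PeakMemoryGB": peak_memory_gb,
        "RequestMemoryGB": request_memory_gb,
        "MemoryUseFactor": memory_use_factor,
    }


# Hypothetical history record for illustration.
sample_ad = {
    "RequestCpus": 4,
    "CommittedTime": 300,
    "RemoteUserCpu": 550.0,
    "RemoteSysCpu": 50.0,
    "MemoryUsage": 2048,
    "RequestMemory": 8192,
}
for metric, value in jaws_metrics_from_classad(sample_ad).items():
    print(f"{metric}: {value}")
```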