GPU Usage Guide for WDL Workflows

This guide shows WDL users how to configure GPU resources in JAWS workflows using the runtime stanza.

Quick Start: Test GPU Access

Use this minimal WDL to verify GPU configuration before running production workflows:

version 1.0

workflow GPU_Quick_Test {
  call test_gpu
}

task test_gpu {
  command <<<
    nvidia-smi || echo "nvidia-smi not found"
    python3 -c "import torch; print('CUDA:', torch.cuda.is_available())"
  >>>

  output {
    File log = stdout()
  }

  runtime {
    docker: "pytorch/pytorch:latest"
    memory: "4GiB"
    cpu: 1
    gpu: true
    runtime_minutes: 10
  }
}

Expected output: nvidia-smi prints a table describing the allocated GPU, followed by the line CUDA: True.

If this fails, see the Troubleshooting section below.

Runtime Stanza: GPU Attributes

GPU Configuration Attributes

The runtime stanza supports two GPU-specific attributes:

runtime {
  gpu: true         # REQUIRED to enable GPU allocation
  gpuCount: 1       # OPTIONAL, defaults to 1 when gpu is true
}

Attribute Details

Attribute   Type      Default   Description
---------   -------   -------   -----------
gpu         Boolean   false     Set to true to request GPU resources. Required for any GPU access.
gpuCount    Int       1         Number of GPUs to allocate. Only applies when gpu: true. Most tasks should use 1.

Runtime Stanza Examples

Minimal GPU Runtime (Recommended Starting Point)

runtime {
  docker: "pytorch/pytorch:latest"
  memory: "16GiB"
  cpu: 4
  gpu: true                    # Enable GPU
  runtime_minutes: 60
}

This requests 1 GPU (default when gpu: true).

Explicit Single GPU

runtime {
  docker: "pytorch/pytorch:latest"
  memory: "16GiB"
  cpu: 4
  gpu: true
  gpuCount: 1                  # Explicit, same as omitting gpuCount
  runtime_minutes: 60
}

Multiple GPUs (Only if your code supports multi-GPU)

runtime {
  docker: "nvcr.io/nvidia/pytorch:24.01-py3"
  memory: "64GiB"
  cpu: 16
  gpu: true
  gpuCount: 4                  # Request 4 GPUs
  runtime_minutes: 240
}

⚠️ Warning: Requesting gpuCount > 1 does NOT automatically parallelize your code. Your application must explicitly use multi-GPU frameworks (e.g., PyTorch DDP, Horovod).
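As an illustration of what "explicitly use multi-GPU frameworks" can look like, the sketch below launches one PyTorch worker per requested GPU with torchrun. It is a minimal example under stated assumptions, not a JAWS-endorsed recipe: the task name train_ddp and the train_script input are hypothetical, and the script itself is assumed to already be written for PyTorch DDP.

```wdl
version 1.0

task train_ddp {
  input {
    File train_script      # hypothetical DDP-aware training script
    Int num_gpus = 4
  }

  command <<<
    set -euo pipefail
    # torchrun starts one worker process per GPU on this node
    torchrun --standalone --nproc_per_node=~{num_gpus} ~{train_script}
  >>>

  output {
    File log = stdout()
  }

  runtime {
    docker: "nvcr.io/nvidia/pytorch:24.01-py3"
    memory: "64GiB"
    cpu: 16
    gpu: true
    gpuCount: num_gpus     # keep the allocation and the worker count in sync
    runtime_minutes: 240
  }
}
```

Driving both gpuCount and --nproc_per_node from the same input keeps the allocation and the process count from drifting apart.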

Dynamic GPU Count from Inputs

You can parameterize gpuCount using WDL inputs:

task flexible_gpu {
  input {
    Int num_gpus = 1
    Boolean use_gpu = true
  }

  command <<<
    python3 train.py --gpus ~{num_gpus}
  >>>

  runtime {
    docker: "pytorch/pytorch:latest"
    memory: "32GiB"
    cpu: 8
    gpu: use_gpu
    gpuCount: if use_gpu then num_gpus else 0
    runtime_minutes: 120
  }
}

In your inputs.json (replace workflow with the name of your workflow):

{
  "workflow.flexible_gpu.num_gpus": 2,
  "workflow.flexible_gpu.use_gpu": true
}

Critical: Docker Container Requirements

GPU Support Depends on Your Container

The most common GPU failure is using a CPU-only container.

Setting gpu: true in the runtime stanza does NOT add GPU support to your container. Your container must already include:

  • NVIDIA CUDA drivers/runtime

  • GPU-accelerated libraries (PyTorch, TensorFlow, etc.)
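One way to check a container against these requirements before using it in production is a small diagnostic task. The sketch below assumes the image ships python3 and PyTorch; the task name check_container and the image input are illustrative only.

```wdl
version 1.0

task check_container {
  input {
    String image = "pytorch/pytorch:latest"   # image under test
  }

  command <<<
    # torch.version.cuda is None for CPU-only PyTorch builds
    python3 -c "import torch; print('Built with CUDA:', torch.version.cuda)"
    # Number of GPUs PyTorch can see inside the container
    python3 -c "import torch; print('Visible GPUs:', torch.cuda.device_count())"
  >>>

  output {
    File report = stdout()
  }

  runtime {
    docker: image
    memory: "4GiB"
    cpu: 1
    gpu: true
    runtime_minutes: 10
  }
}
```

A report showing "Built with CUDA: None" points to a CPU-only build, while a CUDA version paired with "Visible GPUs: 0" suggests the GPU was not allocated or not exposed to the container.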

Complete Runtime Stanza Reference

All Runtime Attributes with GPU

runtime {
  # Container (REQUIRED, must be GPU-enabled for GPU tasks)
  docker: "pytorch/pytorch:latest"

  # Compute Resources
  memory: "32GiB"         # RAM allocation
  cpu: 8                  # CPU threads (useful for data loading)

  # GPU Resources
  gpu: true               # Enable GPU (REQUIRED for GPU access)
  gpuCount: 1             # Number of GPUs (default: 1)

  # Time Limit
  runtime_minutes: 120    # Maximum runtime
}

Available GPU Hardware

JAWS provides GPU access at these sites:

Site                 GPU Model           Nodes   GPUs/Node   Memory/GPU
------------------   -----------------   -----   ---------   ----------
Perlmutter (NERSC)   NVIDIA A100         1536    4           40GB
Tahoma (EMSL)        NVIDIA Tesla V100   24      2           32GB
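The per-node GPU counts above suggest a simple guard when gpuCount comes from user input. The sketch below clamps the request to a per-node maximum. It rests on an assumption not stated in this guide, namely that all GPUs for a task come from a single node; the task name capped_gpu and the input max_gpus_per_node are hypothetical.

```wdl
version 1.0

task capped_gpu {
  input {
    Int requested_gpus = 1
    # Assumed per-node limit: 4 on Perlmutter, 2 on Tahoma (from the hardware table)
    Int max_gpus_per_node = 4
  }

  # Clamp the request so it never exceeds what one node offers
  Int gpus = if requested_gpus > max_gpus_per_node then max_gpus_per_node else requested_gpus

  command <<<
    echo "Requested ~{requested_gpus} GPU(s), using ~{gpus}"
    nvidia-smi -L
  >>>

  output {
    File log = stdout()
  }

  runtime {
    docker: "pytorch/pytorch:latest"
    memory: "16GiB"
    cpu: 4
    gpu: true
    gpuCount: gpus
    runtime_minutes: 10
  }
}
```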

Site Selection: Specify the site as the last argument when submitting:

jaws submit workflow.wdl inputs.json tahoma

Troubleshooting

Common Runtime Stanza Errors

Issue: Task runs on CPU instead of GPU

Symptom:

>>> torch.cuda.is_available()
False

Causes & Fixes:

  1. Missing gpu: true in runtime stanza

    ❌ Wrong:

    runtime {
      docker: "pytorch/pytorch:latest"
      memory: "16GiB"
      cpu: 4
      # gpu missing
    }
    

    ✅ Fix:

    runtime {
      docker: "pytorch/pytorch:latest"
      memory: "16GiB"
      cpu: 4
      gpu: true  # Add this
    }
    
  2. CPU-only container

    ❌ Wrong:

    runtime {
      docker: "ubuntu:22.04"  # No CUDA
      gpu: true
    }
    

    ✅ Fix:

    runtime {
      docker: "pytorch/pytorch:latest"  # Has CUDA
      gpu: true
    }
    

Issue: nvidia-smi command not found

Cause: Container does not include NVIDIA drivers.

Fix: Use a CUDA-enabled base image:

runtime {
  docker: "nvidia/cuda:12.0.0-runtime-ubuntu22.04"  # or pytorch/pytorch:latest
  gpu: true
}

Issue: Expected X GPUs but found Y

Symptom: Your code requests more GPUs than allocated.

Cause: Mismatch between code and runtime stanza.

Fix: Align your code with gpuCount:

task train {
  input {
    Int num_gpus = 2
  }

  command <<<
    python3 train.py --gpus ~{num_gpus}
  >>>

  runtime {
    docker: "pytorch/pytorch:latest"
    gpu: true
    gpuCount: num_gpus  # Match code expectation
  }
}
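To surface a mismatch early instead of deep inside a training run, the command can count the visible GPUs and stop if the number differs from gpuCount. This is a sketch under assumptions: the task name train_checked and the script train.py are illustrative, and nvidia-smi is assumed to be available in the container.

```wdl
version 1.0

task train_checked {
  input {
    Int num_gpus = 2
  }

  command <<<
    set -euo pipefail
    # nvidia-smi -L prints one "GPU <n>: ..." line per visible device
    found=$(nvidia-smi -L | grep -c '^GPU ' || true)
    if [ "$found" -ne ~{num_gpus} ]; then
      echo "Expected ~{num_gpus} GPUs but found $found" >&2
      exit 1
    fi
    python3 train.py --gpus ~{num_gpus}
  >>>

  output {
    File log = stdout()
  }

  runtime {
    docker: "pytorch/pytorch:latest"
    gpu: true
    gpuCount: num_gpus
    runtime_minutes: 120
  }
}
```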

FAQ

Q: What’s the minimum runtime stanza for GPU?

A: Two attributes are required, docker (set to a CUDA-enabled image) and gpu: true:

runtime {
  docker: "pytorch/pytorch:latest"
  gpu: true
}

Q: What happens if I omit gpuCount?

A: Defaults to 1 GPU when gpu: true. These are equivalent:

runtime { gpu: true }
runtime { gpu: true  gpuCount: 1 }

Q: Can I mix GPU and CPU tasks in one workflow?

A: Yes! Only add gpu: true to tasks that need GPUs:

task preprocess {
  runtime {
    docker: "ubuntu:22.04"
    memory: "16GiB"
    cpu: 4
    # No gpu → runs on CPU
  }
}

task train {
  runtime {
    docker: "pytorch/pytorch:latest"
    memory: "16GiB"
    cpu: 4
    gpu: true  # Runs on GPU
  }
}
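A complete version of this pattern might look like the sketch below, where a CPU task feeds its output to a GPU task. The workflow name mixed_cpu_gpu, the input raw_data, and the placeholder commands are assumptions made for illustration; a real workflow would substitute its own processing steps.

```wdl
version 1.0

workflow mixed_cpu_gpu {
  input {
    File raw_data
  }

  call preprocess { input: raw = raw_data }
  call train { input: prepared = preprocess.prepared }

  output {
    File train_log = train.log
  }
}

task preprocess {
  input {
    File raw
  }

  command <<<
    # Placeholder CPU step: pass the input through unchanged
    cp ~{raw} prepared.dat
  >>>

  output {
    File prepared = "prepared.dat"
  }

  runtime {
    docker: "ubuntu:22.04"
    memory: "16GiB"
    cpu: 4
    runtime_minutes: 30
  }
}

task train {
  input {
    File prepared
  }

  command <<<
    # Placeholder GPU step: confirm the GPU is visible, then touch the data
    nvidia-smi
    python3 -c "import torch; print('CUDA:', torch.cuda.is_available())"
    wc -c ~{prepared}
  >>>

  output {
    File log = stdout()
  }

  runtime {
    docker: "pytorch/pytorch:latest"
    memory: "16GiB"
    cpu: 4
    gpu: true
    runtime_minutes: 60
  }
}
```

Because train consumes preprocess.prepared, the two tasks run in order, and only train requests a GPU.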

Q: Does gpuCount: 4 automatically parallelize my code?

A: No. Your code must explicitly use multi-GPU frameworks (PyTorch DDP, Horovod, etc.).

Additional Resources