GPU Usage Guide for WDL Workflows

This guide shows WDL users how to configure GPU resources in JAWS workflows using the runtime stanza.

Quick Start: Test GPU Access 

Use this minimal WDL to verify GPU configuration before running production workflows:

version 1.0

workflow GPU_Quick_Test {
  call test_gpu
}

task test_gpu {
  command <<<
    nvidia-smi || echo "nvidia-smi not found"
    python3 -c "import torch; print('CUDA:', torch.cuda.is_available())"
  >>>

  output {
    File log = stdout()
  }

  runtime {
    docker: "pytorch/pytorch:latest"
    memory: "4GiB"
    cpu: 1
    gpu: true
    runtime_minutes: 10
  }
}

Expected output: nvidia-smi prints details of the allocated GPU, and the Python check prints CUDA: True.

If this fails, see the Troubleshooting section below.
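If you want to check the result programmatically, the sketch below inspects the text of the task's log output (its stdout). It is a heuristic, not a JAWS tool. It assumes that nvidia-smi's normal table header contains the string NVIDIA-SMI and that the Python one-liner above printed CUDA: True or CUDA: False. The function names are hypothetical.

```python
from pathlib import Path


def check_quick_test_log(text: str) -> dict:
    """Heuristically summarize the stdout of the GPU_Quick_Test task.

    Assumes nvidia-smi's normal header contains "NVIDIA-SMI" and that the
    Python one-liner printed "CUDA: True" or "CUDA: False".
    """
    # nvidia-smi can print "NVIDIA-SMI has failed ..." when it cannot reach
    # the driver, so the header string alone is not proof of success.
    smi_ok = "NVIDIA-SMI" in text and "NVIDIA-SMI has failed" not in text
    # The `|| echo` fallback in the task's command prints this line.
    smi_missing = "nvidia-smi not found" in text
    cuda_ok = "CUDA: True" in text
    return {"nvidia_smi_ok": smi_ok, "nvidia_smi_missing": smi_missing, "cuda_ok": cuda_ok}


def check_log_file(path: str) -> dict:
    """Read a downloaded stdout file and summarize it."""
    return check_quick_test_log(Path(path).read_text())
```

After downloading the task outputs, call check_log_file on the stdout file; both nvidia_smi_ok and cuda_ok should be True.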

Runtime Stanza: GPU Attributes 

GPU Configuration Attributes 

The runtime stanza supports two GPU-specific attributes:

runtime {
  gpu: true         # REQUIRED to enable GPU allocation
  gpuCount: 1       # OPTIONAL, defaults to 1 when gpu is true
}

Attribute Details

Attribute	Type	Default	Description
`gpu`	Boolean	`false`	Set to `true` to request GPU resources. Required for any GPU access.
`gpuCount`	Int	`1`	Number of GPUs to allocate. Only applies when `gpu: true`. Most tasks should use `1`.
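The defaults in this table can be summarized as a small function. This only illustrates the documented rules and is not JAWS's actual implementation. The dictionary keys mirror the runtime attribute names, and rejecting a non-positive gpuCount is an extra sanity check added in this sketch, not documented JAWS behavior.

```python
def effective_gpu_count(runtime: dict) -> int:
    """Number of GPUs a runtime stanza requests, per the documented defaults.

    - `gpu` defaults to False; without it no GPUs are requested and any
      `gpuCount` is ignored.
    - When `gpu` is True, `gpuCount` defaults to 1.
    """
    if not runtime.get("gpu", False):
        return 0
    count = runtime.get("gpuCount", 1)
    # Extra sanity check (an assumption of this sketch, not a JAWS rule).
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(count, bool) or not isinstance(count, int) or count < 1:
        raise ValueError(f"gpuCount must be a positive integer, got {count!r}")
    return count
```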

Runtime Stanza Examples 

Minimal GPU Runtime (Recommended Starting Point)

runtime {
  docker: "pytorch/pytorch:latest"
  memory: "16GiB"
  cpu: 4
  gpu: true                    # Enable GPU
  runtime_minutes: 60
}

This requests 1 GPU (default when gpu: true).

Explicit Single GPU

runtime {
  docker: "pytorch/pytorch:latest"
  memory: "16GiB"
  cpu: 4
  gpu: true
  gpuCount: 1                  # Explicit, same as omitting gpuCount
  runtime_minutes: 60
}

Multiple GPUs (Only if your code supports multi-GPU)

runtime {
  docker: "nvcr.io/nvidia/pytorch:24.01-py3"
  memory: "64GiB"
  cpu: 16
  gpu: true
  gpuCount: 4                  # Request 4 GPUs
  runtime_minutes: 240
}

⚠️ Warning: Requesting gpuCount > 1 does NOT automatically parallelize your code. Your application must explicitly use multi-GPU frameworks (e.g., PyTorch DDP, Horovod).
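With PyTorch, one common way to use several GPUs on one node is to start one process per GPU with the torchrun launcher. The helper below only builds that command line. It assumes torchrun is available in the container and that the hypothetical train.py is written for DistributedDataParallel; for a single GPU it falls back to plain python3.

```python
import shlex
from typing import List


def build_launch_command(script: str, gpu_count: int) -> List[str]:
    """Build the command that runs `script` on `gpu_count` GPUs on one node.

    gpu_count should equal the task's gpuCount. torchrun starts one worker
    process per GPU; the script itself must use DistributedDataParallel (or a
    similar framework) for the extra processes to do useful work.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    if gpu_count == 1:
        return ["python3", script]
    return ["torchrun", "--standalone", f"--nproc_per_node={gpu_count}", script]


def as_shell(cmd: List[str]) -> str:
    """Render the command as a single shell string."""
    return shlex.join(cmd)
```

In a WDL command section the four-GPU case corresponds to torchrun --standalone --nproc_per_node=~{num_gpus} train.py, with num_gpus also feeding gpuCount.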

Dynamic GPU Count from Inputs

You can parameterize gpuCount using WDL inputs:

task flexible_gpu {
  input {
    Int num_gpus = 1
    Boolean use_gpu = true
  }

  command <<<
    python3 train.py --gpus ~{num_gpus}
  >>>

  runtime {
    docker: "pytorch/pytorch:latest"
    memory: "32GiB"
    cpu: 8
    gpu: use_gpu
    gpuCount: if use_gpu then num_gpus else 0
    runtime_minutes: 120
  }
}

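In your inputs.json, each key is the workflow name, the call name, and the input name joined by dots (replace workflow with your workflow's actual name):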

{
  "workflow.flexible_gpu.num_gpus": 2,
  "workflow.flexible_gpu.use_gpu": true
}
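If you submit many variants, generating inputs.json from a script avoids typos in the fully qualified keys. This sketch assumes the key layout shown above; my_workflow is a placeholder for your workflow's name.

```python
import json


def flexible_gpu_inputs(workflow_name: str, num_gpus: int, use_gpu: bool) -> dict:
    """Build inputs for the flexible_gpu call using fully qualified keys."""
    prefix = f"{workflow_name}.flexible_gpu"
    return {
        f"{prefix}.num_gpus": num_gpus,
        f"{prefix}.use_gpu": use_gpu,
    }


def write_inputs(path: str, inputs: dict) -> None:
    """Write inputs as JSON (Python True is serialized as JSON true)."""
    with open(path, "w") as fh:
        json.dump(inputs, fh, indent=2)
```

For example, write_inputs("inputs.json", flexible_gpu_inputs("my_workflow", 2, True)) produces a file equivalent to the one above, with my_workflow as the prefix.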

Critical: Docker Container Requirements 

GPU Support Depends on Your Container 

The most common GPU failure is using a CPU-only container.

Setting gpu: true in the runtime stanza does NOT add GPU support to your container. Your container must already include:

NVIDIA CUDA runtime libraries (the GPU driver itself is supplied by the host node)
GPU-accelerated libraries (PyTorch, TensorFlow, etc.)
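Before wiring a new image into a workflow, you can run a small diagnostic inside it (for example as a task's command) to see which of these pieces are present. This is a sketch that needs only the Python standard library: it imports PyTorch only if it is installed, and it reports CUDA_VISIBLE_DEVICES purely for information, since whether that variable is set depends on the site's scheduler and container runtime.

```python
import importlib.util
import os
import shutil


def gpu_report() -> dict:
    """Report GPU-related capabilities visible from inside a container."""
    report = {
        # Path to nvidia-smi, or None if it is not on PATH.
        "nvidia_smi": shutil.which("nvidia-smi"),
        # May be unset; depends on the scheduler and container runtime.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_build": None,
        "cuda_available": False,
        "device_count": 0,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the script also runs without PyTorch

        # torch.version.cuda is None for CPU-only PyTorch builds.
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
        report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

A torch_cuda_build of None indicates a CPU-only PyTorch build, which is exactly the CPU-only container problem described here; a CUDA build with cuda_available False usually points to the runtime stanza or the allocation instead.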

Complete Runtime Stanza Reference 

All Runtime Attributes with GPU 

runtime {
  # Container (REQUIRED, must be GPU-enabled for GPU tasks)
  docker: "pytorch/pytorch:latest"

  # Compute Resources
  memory: "32GiB"         # RAM allocation
  cpu: 8                  # CPU threads (useful for data loading)

  # GPU Resources
  gpu: true               # Enable GPU (REQUIRED for GPU access)
  gpuCount: 1             # Number of GPUs (default: 1)

  # Time Limit
  runtime_minutes: 120    # Maximum runtime
}

Available GPU Hardware 

JAWS provides GPU access at these sites:

Site	GPU Model	Nodes	GPUs/Node	Memory/GPU
Perlmutter (NERSC)	NVIDIA A100	1536	4	40GB
Tahoma (EMSL)	NVIDIA Tesla V100	24	2	32GB

Site Selection: Specify site when submitting:

jaws submit workflow.wdl inputs.json tahoma
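The per-node GPU counts in the table bound what a single task can request, assuming (as this sketch does) that all of a task's GPUs come from one node. The helper checks a gpuCount against the table and then builds the jaws submit argument list shown above. The site keys are assumed to match the names passed to jaws submit; only tahoma appears in this guide, so confirm the Perlmutter site name in your JAWS configuration.

```python
# Values copied from the table above. Site keys are assumed to be the
# names passed to `jaws submit`.
GPUS_PER_NODE = {"perlmutter": 4, "tahoma": 2}
NODES = {"perlmutter": 1536, "tahoma": 24}


def total_gpus(site: str) -> int:
    """Total GPUs at a site: nodes times GPUs per node."""
    return GPUS_PER_NODE[site] * NODES[site]


def check_gpu_request(site: str, gpu_count: int) -> None:
    """Raise if gpu_count cannot fit on one node of `site` (single-node assumption)."""
    per_node = GPUS_PER_NODE[site]
    if not 1 <= gpu_count <= per_node:
        raise ValueError(
            f"gpuCount must be between 1 and {per_node} on {site}, got {gpu_count}"
        )


def submit_command(wdl: str, inputs: str, site: str, gpu_count: int) -> list:
    """Validate the GPU request, then return the jaws submit argument list."""
    check_gpu_request(site, gpu_count)
    return ["jaws", "submit", wdl, inputs, site]
```

For example, a gpuCount of 4 fits on a Perlmutter node but is rejected for Tahoma, whose nodes have 2 GPUs each.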

Troubleshooting 

Common Runtime Stanza Errors 

Issue: Task runs on CPU instead of GPU

Symptom:

>>> torch.cuda.is_available()
False

Causes & Fixes:

Missing gpu: true in runtime stanza

❌ Wrong:

runtime {
  docker: "pytorch/pytorch:latest"
  memory: "16GiB"
  cpu: 4
  # gpu missing
}

✅ Fix:

runtime {
  docker: "pytorch/pytorch:latest"
  memory: "16GiB"
  cpu: 4
  gpu: true  # Add this
}

CPU-only container

❌ Wrong:

runtime {
  docker: "ubuntu:22.04"  # No CUDA
  gpu: true
}

✅ Fix:

runtime {
  docker: "pytorch/pytorch:latest"  # Has CUDA
  gpu: true
}
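Both causes produce a quiet symptom: training code that picks the CPU whenever CUDA is unavailable keeps running, only much slower. One way to surface the problem early is to make the training script refuse to start without a GPU. The helper below is a hypothetical sketch; it takes the result of torch.cuda.is_available() as a plain boolean so the decision logic does not depend on PyTorch being installed.

```python
def select_device(cuda_available: bool, allow_cpu: bool = False) -> str:
    """Pick a torch device string, failing loudly if the GPU is missing.

    Pass torch.cuda.is_available() as `cuda_available`. Set allow_cpu=True
    only for deliberate CPU test runs.
    """
    if cuda_available:
        return "cuda"
    if allow_cpu:
        return "cpu"
    raise RuntimeError(
        "CUDA is not available: check that the runtime stanza sets gpu: true "
        "and that the docker image is GPU-enabled"
    )
```

In train.py this would be used as device = torch.device(select_device(torch.cuda.is_available())).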

Issue: nvidia-smi command not found

Cause: The container is not CUDA-enabled, so the NVIDIA utilities are not available inside it.

Fix: Use a CUDA-enabled base image:

runtime {
  docker: "nvidia/cuda:12.0.0-runtime-ubuntu22.04"  # or pytorch/pytorch:latest
  gpu: true
}

Issue: Expected X GPUs but found Y

Symptom: Your code fails because it expects more GPUs than the task was allocated.

Cause: The GPU count used by your command and gpuCount in the runtime stanza disagree.

Fix: Drive both from the same input so they always match:

task train {
  input {
    Int num_gpus = 2
  }

  command <<<
    python3 train.py --gpus ~{num_gpus}
  >>>

  runtime {
    docker: "pytorch/pytorch:latest"
    gpu: true
    gpuCount: num_gpus  # Match code expectation
  }
}
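A complementary check belongs at the start of train.py: compare the number of GPUs it was asked to use with the number it can actually see, and stop with a clear message instead of failing later. In PyTorch the visible count comes from torch.cuda.device_count(); this sketch takes it as an integer argument so it stays independent of PyTorch. The argument parser and function names are hypothetical.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the --gpus flag passed by the WDL command (~{num_gpus})."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py entry point")
    parser.add_argument("--gpus", type=int, default=1, help="should equal the task's gpuCount")
    return parser.parse_args(argv)


def check_gpu_count(requested: int, visible: int) -> None:
    """Exit with a clear message when the code expects more GPUs than it received."""
    if requested > visible:
        raise SystemExit(
            f"Expected {requested} GPUs but found {visible}; "
            "set gpuCount in the runtime stanza to match --gpus"
        )
```

Usage in train.py: args = parse_args(), then check_gpu_count(args.gpus, torch.cuda.device_count()) before building the model.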

FAQ 

Q: What’s the minimum runtime stanza for GPU?

A: Two attributes are required: docker (pointing to a CUDA-enabled image) and gpu: true.

runtime {
  docker: "pytorch/pytorch:latest"
  gpu: true
}

Q: What happens if I omit gpuCount?

A: It defaults to 1 GPU when gpu: true. These are equivalent:

runtime { gpu: true }
runtime { gpu: true  gpuCount: 1 }  # WDL separates runtime attributes with whitespace, not commas

Q: Can I mix GPU and CPU tasks in one workflow?

A: Yes! Only add gpu: true to tasks that need GPUs:

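task preprocess {
  command <<<
    echo "preprocessing on CPU"  # placeholder; WDL requires a command section
  >>>

  runtime {
    docker: "ubuntu:22.04"
    memory: "16GiB"
    cpu: 4
    # No gpu → runs on CPU
  }
}

task train {
  command <<<
    python3 -c "import torch; print('CUDA:', torch.cuda.is_available())"
  >>>

  runtime {
    docker: "pytorch/pytorch:latest"
    memory: "16GiB"
    cpu: 4
    gpu: true  # Runs on GPU
  }
}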

Q: Does gpuCount: 4 automatically parallelize my code?

A: No. Your code must explicitly use a multi-GPU framework (PyTorch DDP, Horovod, etc.).