====================================
GPU Usage Guide for WDL Workflows
====================================

.. role:: bash(code)
   :language: bash

This guide shows WDL users how to configure GPU resources in JAWS workflows using the ``runtime`` stanza.

.. contents:: Quick Links
   :local:
   :depth: 2

Quick Start: Test GPU Access
=============================

Use this minimal WDL to verify GPU configuration before running production workflows:

.. code-block:: text

    version 1.0

    workflow GPU_Quick_Test {
        call test_gpu
    }

    task test_gpu {
        command <<<
            nvidia-smi || echo "nvidia-smi not found"
            python3 -c "import torch; print('CUDA:', torch.cuda.is_available())"
        >>>

        output {
            File log = stdout()
        }

        runtime {
            docker: "pytorch/pytorch:latest"
            memory: "4GiB"
            cpu: 1
            gpu: true
            runtime_minutes: 10
        }
    }

**Expected output**: ``nvidia-smi`` prints the GPU details and the Python check prints ``CUDA: True``.

If this fails, see the Troubleshooting_ section below.

Runtime Stanza: GPU Attributes
===============================

GPU Configuration Attributes
-----------------------------

The ``runtime`` stanza supports two GPU-specific attributes:

.. code-block:: text

    runtime {
        gpu: true      # REQUIRED to enable GPU allocation
        gpuCount: 1    # OPTIONAL, defaults to 1 when gpu is true
    }

Attribute Details
^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 15 10 15 60

   * - Attribute
     - Type
     - Default
     - Description
   * - ``gpu``
     - Boolean
     - ``false``
     - Set to ``true`` to request GPU resources. **Required** for any GPU access.
   * - ``gpuCount``
     - Int
     - ``1``
     - Number of GPUs to allocate. Only applies when ``gpu: true``. Most tasks should use ``1``.

Runtime Stanza Examples
------------------------

**Minimal GPU Runtime** (Recommended Starting Point)

.. code-block:: text

    runtime {
        docker: "pytorch/pytorch:latest"
        memory: "16GiB"
        cpu: 4
        gpu: true              # Enable GPU
        runtime_minutes: 60
    }

This requests **1 GPU** (the default when ``gpu: true``).

**Explicit Single GPU**

.. code-block:: text

    runtime {
        docker: "pytorch/pytorch:latest"
        memory: "16GiB"
        cpu: 4
        gpu: true
        gpuCount: 1            # Explicit, same as omitting gpuCount
        runtime_minutes: 60
    }

**Multiple GPUs** (Only if your code supports multi-GPU)

.. code-block:: text

    runtime {
        docker: "nvcr.io/nvidia/pytorch:24.01-py3"
        memory: "64GiB"
        cpu: 16
        gpu: true
        gpuCount: 4            # Request 4 GPUs
        runtime_minutes: 240
    }

⚠️ **Warning**: Requesting ``gpuCount > 1`` does NOT automatically parallelize your code. Your application must explicitly use a multi-GPU framework (e.g., PyTorch DDP, Horovod).

Dynamic GPU Count from Inputs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can parameterize ``gpu`` and ``gpuCount`` using WDL inputs:

.. code-block:: text

    task flexible_gpu {
        input {
            Int num_gpus = 1
            Boolean use_gpu = true
        }

        command <<<
            python3 train.py --gpus ~{num_gpus}
        >>>

        runtime {
            docker: "pytorch/pytorch:latest"
            memory: "32GiB"
            cpu: 8
            gpu: use_gpu
            gpuCount: if use_gpu then num_gpus else 0
            runtime_minutes: 120
        }
    }

In your ``inputs.json``:

.. code-block:: json

    {
      "workflow.flexible_gpu.num_gpus": 2,
      "workflow.flexible_gpu.use_gpu": true
    }

Critical: Docker Container Requirements
========================================

GPU Support Depends on Your Container
--------------------------------------

**The most common GPU failure is using a CPU-only container.**

Setting ``gpu: true`` in the runtime stanza **does NOT** add GPU support to your container. Your container must already include:

- NVIDIA CUDA drivers/runtime
- GPU-accelerated libraries (PyTorch, TensorFlow, etc.)

Complete Runtime Stanza Reference
==================================

All Runtime Attributes with GPU
--------------------------------

.. code-block:: text

    runtime {
        # Container (REQUIRED, must be GPU-enabled for GPU tasks)
        docker: "pytorch/pytorch:latest"

        # Compute Resources
        memory: "32GiB"          # RAM allocation
        cpu: 8                   # CPU threads (useful for data loading)

        # GPU Resources
        gpu: true                # Enable GPU (REQUIRED for GPU access)
        gpuCount: 1              # Number of GPUs (default: 1)

        # Time Limit
        runtime_minutes: 120     # Maximum runtime, in minutes
    }

Available GPU Hardware
======================

JAWS provides GPU access at these sites:

.. list-table::
   :header-rows: 1
   :widths: 20 25 15 15 15

   * - Site
     - GPU Model
     - Nodes
     - GPUs/Node
     - Memory/GPU
   * - Perlmutter (NERSC)
     - NVIDIA A100
     - 1536
     - 4
     - 40 GB
   * - Tahoma (EMSL)
     - NVIDIA Tesla V100
     - 24
     - 2
     - 32 GB

**Site Selection**: Specify the site when submitting:

.. code-block:: bash

    jaws submit workflow.wdl inputs.json tahoma

Troubleshooting
===============

Common Runtime Stanza Errors
-----------------------------

**Issue: Task runs on CPU instead of GPU**

**Symptom**:

.. code-block:: python

    >>> import torch
    >>> torch.cuda.is_available()
    False

**Causes & Fixes**:

1. **Missing** ``gpu: true`` **in runtime stanza**

   ❌ Wrong:

   .. code-block:: text

       runtime {
           docker: "pytorch/pytorch:latest"
           memory: "16GiB"
           cpu: 4
           # gpu missing
       }

   ✅ Fix:

   .. code-block:: text

       runtime {
           docker: "pytorch/pytorch:latest"
           memory: "16GiB"
           cpu: 4
           gpu: true    # Add this
       }

2. **CPU-only container**

   ❌ Wrong:

   .. code-block:: text

       runtime {
           docker: "ubuntu:22.04"    # No CUDA
           gpu: true
       }

   ✅ Fix:

   .. code-block:: text

       runtime {
           docker: "pytorch/pytorch:latest"    # Has CUDA
           gpu: true
       }

**Issue: nvidia-smi command not found**

**Cause**: The container does not include the NVIDIA drivers.

**Fix**: Use a CUDA-enabled base image:

.. code-block:: text

    runtime {
        # Example tag; check the registry for currently published CUDA tags
        docker: "nvidia/cuda:12.0-runtime"    # or pytorch/pytorch:latest
        gpu: true
    }

**Issue: Expected X GPUs but found Y**

**Symptom**: Your code requests more GPUs than were allocated.

**Cause**: The GPU count your code expects does not match ``gpuCount`` in the runtime stanza.

**Fix**: Drive both the code and ``gpuCount`` from the same input:

.. code-block:: text

    task train {
        input {
            Int num_gpus = 2
        }

        command <<<
            python3 train.py --gpus ~{num_gpus}
        >>>

        runtime {
            docker: "pytorch/pytorch:latest"
            gpu: true
            gpuCount: num_gpus    # Match code expectation
        }
    }

FAQ
===

**Q: What's the minimum runtime stanza for GPU?**

A: Two attributes are required, ``docker`` (a CUDA-enabled image) and ``gpu: true``:

.. code-block:: text

    runtime {
        docker: "pytorch/pytorch:latest"
        gpu: true
    }

**Q: What happens if I omit** ``gpuCount``\ **?**

A: It defaults to 1 GPU when ``gpu: true``. These two stanzas are equivalent:

.. code-block:: text

    # Implicit count
    runtime {
        gpu: true
    }

    # Explicit count
    runtime {
        gpu: true
        gpuCount: 1
    }

**Q: Can I mix GPU and CPU tasks in one workflow?**

A: Yes! Only add ``gpu: true`` to tasks that need GPUs:

.. code-block:: text

    # command and output sections omitted for brevity
    task preprocess {
        runtime {
            docker: "ubuntu:22.04"
            memory: "16GiB"
            cpu: 4
            # No gpu → runs on CPU
        }
    }

    task train {
        runtime {
            docker: "pytorch/pytorch:latest"
            memory: "16GiB"
            cpu: 4
            gpu: true    # Runs on GPU
        }
    }

**Q: Does** ``gpuCount: 4`` **automatically parallelize my code?**

A: No. Your code must explicitly use a multi-GPU framework (PyTorch DDP, Horovod, etc.).

Additional Resources
====================

- JAWS Best Practices
- Compute Resources Reference
- PyTorch CUDA Documentation
- TensorFlow GPU Guide
- NVIDIA Container Toolkit
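
Appendix: Checking the Allocated GPU Count
==========================================

The "Expected X GPUs but found Y" issue in the Troubleshooting section comes down to the task seeing a different number of GPUs than the code expects. The workflow below is a minimal sketch of a pre-flight check, not an official JAWS utility. The names ``gpu_count_check`` and ``count_gpus`` are hypothetical. It assumes that ``nvidia-smi`` is available inside the container and that ``nvidia-smi -L`` prints one ``GPU N: ...`` line per visible device, which may not hold on MIG-partitioned nodes.

.. code-block:: text

    version 1.0

    # Hypothetical pre-flight workflow: fails fast if fewer GPUs are
    # visible inside the task than the runtime stanza requested.
    workflow gpu_count_check {
        input {
            Int num_gpus = 1
        }

        call count_gpus { input: num_gpus = num_gpus }

        output {
            File report = count_gpus.report
        }
    }

    task count_gpus {
        input {
            Int num_gpus
        }

        command <<<
            # Count the "GPU N: ..." lines; yields 0 if nvidia-smi is missing.
            visible=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ')
            echo "requested=~{num_gpus} visible=${visible}"

            if [ "${visible}" -lt ~{num_gpus} ]; then
                echo "Fewer GPUs visible than requested" >&2
                exit 1
            fi
        >>>

        output {
            File report = stdout()
        }

        runtime {
            docker: "pytorch/pytorch:latest"
            memory: "4GiB"
            cpu: 1
            gpu: true
            gpuCount: num_gpus    # Same input drives the request and the check
            runtime_minutes: 10
        }
    }

Running this once with the ``num_gpus`` value you plan to use for training should show whether the mismatch is in the allocation or in your code. If the check passes but training still reports too few GPUs, the training script's own device selection is the more likely cause.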