====================================
GPU Usage Guide for WDL Workflows
====================================

.. role:: bash(code)
   :language: bash

This guide shows WDL users how to configure GPU resources in JAWS workflows using the ``runtime`` stanza.

.. contents:: Quick Links
   :local:
   :depth: 2


Quick Start: Test GPU Access
=============================

Use this minimal WDL to verify GPU configuration before running production workflows:

.. code-block:: text

    version 1.0

    workflow GPU_Quick_Test {
      call test_gpu
    }

    task test_gpu {
      command <<<
        nvidia-smi || echo "nvidia-smi not found"
        python3 -c "import torch; print('CUDA:', torch.cuda.is_available())"
      >>>

      output {
        File log = stdout()
      }

      runtime {
        docker: "pytorch/pytorch:latest"
        memory: "4GiB"
        cpu: 1
        gpu: true
        runtime_minutes: 10
      }
    }

**Expected output**: ``nvidia-smi`` prints a table describing the allocated GPU, and the Python check prints ``CUDA: True``.

If this fails, see the Troubleshooting_ section below.

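The pass/fail criteria above can also be checked mechanically. The following is a minimal sketch and not part of JAWS: ``check_quick_test_log`` is a hypothetical helper that scans the task's stdout text for the ``NVIDIA-SMI`` banner that ``nvidia-smi`` prints on success, for the fallback message from the ``echo``, and for the ``CUDA: True`` line. How you retrieve the log file depends on your run and is not shown here.

.. code-block:: python

    import re


    def check_quick_test_log(log_text: str) -> dict:
        """Summarize the stdout captured from the test_gpu task."""
        return {
            # On success nvidia-smi prints a banner such as "NVIDIA-SMI 535.54.03".
            "nvidia_smi_ok": re.search(r"NVIDIA-SMI\s+\d", log_text) is not None,
            # Fallback message echoed by the task when nvidia-smi fails.
            "nvidia_smi_missing": "nvidia-smi not found" in log_text,
            # Line printed by the torch one-liner in the task.
            "cuda_available": re.search(
                r"^CUDA:\s*True\s*$", log_text, flags=re.MULTILINE
            ) is not None,
        }


    # Illustrative log text (not real output from a JAWS run).
    sample = "| NVIDIA-SMI 535.54.03 |\nCUDA: True\n"
    print(check_quick_test_log(sample))

A run counts as healthy only when ``nvidia_smi_ok`` and ``cuda_available`` are both true.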

Runtime Stanza: GPU Attributes
===============================

GPU Configuration Attributes
-----------------------------

The ``runtime`` stanza supports two GPU-specific attributes:

.. code-block:: text

    runtime {
      gpu: true         # REQUIRED to enable GPU allocation
      gpuCount: 1       # OPTIONAL, defaults to 1 when gpu is true
    }

Attribute Details
^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 15 10 15 60

   * - Attribute
     - Type
     - Default
     - Description
   * - ``gpu``
     - Boolean
     - ``false``
     - Set to ``true`` to request GPU resources. **Required** for any GPU access.
   * - ``gpuCount``
     - Int
     - ``1``
     - Number of GPUs to allocate. Only applies when ``gpu: true``. Most tasks should use ``1``.

Runtime Stanza Examples
------------------------

**Minimal GPU Runtime** (Recommended Starting Point)

.. code-block:: text

    runtime {
      docker: "pytorch/pytorch:latest"
      memory: "16GiB"
      cpu: 4
      gpu: true                    # Enable GPU
      runtime_minutes: 60
    }

This requests **1 GPU** (default when ``gpu: true``).

**Explicit Single GPU**

.. code-block:: text

    runtime {
      docker: "pytorch/pytorch:latest"
      memory: "16GiB"
      cpu: 4
      gpu: true
      gpuCount: 1                  # Explicit, same as omitting gpuCount
      runtime_minutes: 60
    }

**Multiple GPUs** (Only if your code supports multi-GPU)

.. code-block:: text

    runtime {
      docker: "nvcr.io/nvidia/pytorch:24.01-py3"
      memory: "64GiB"
      cpu: 16
      gpu: true
      gpuCount: 4                  # Request 4 GPUs
      runtime_minutes: 240
    }

⚠️ **Warning**: Requesting ``gpuCount > 1`` does NOT automatically parallelize your code. Your application must explicitly use multi-GPU frameworks (e.g., PyTorch DDP, Horovod).

Dynamic GPU Count from Inputs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can parameterize ``gpuCount`` using WDL inputs:

.. code-block:: text

    task flexible_gpu {
      input {
        Int num_gpus = 1
        Boolean use_gpu = true
      }

      command <<<
        python3 train.py --gpus ~{num_gpus}
      >>>

      runtime {
        docker: "pytorch/pytorch:latest"
        memory: "32GiB"
        cpu: 8
        gpu: use_gpu
        gpuCount: if use_gpu then num_gpus else 0
        runtime_minutes: 120
      }
    }

In your ``inputs.json`` (replace the ``workflow`` prefix with the name of your own workflow):

.. code-block:: json

    {
      "workflow.flexible_gpu.num_gpus": 2,
      "workflow.flexible_gpu.use_gpu": true
    }

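If you generate inputs files from a script, the same keys can be built programmatically. This is a minimal sketch: ``build_gpu_inputs`` is a hypothetical helper, and the ``<workflow>.<task>.<input>`` key layout simply mirrors the JSON example above.

.. code-block:: python

    import json


    def build_gpu_inputs(workflow: str, task: str, use_gpu: bool, num_gpus: int = 1) -> dict:
        """Build the inputs.json entries for the flexible_gpu-style task inputs."""
        if use_gpu and num_gpus < 1:
            raise ValueError("num_gpus must be at least 1 when use_gpu is true")
        prefix = f"{workflow}.{task}"
        return {
            f"{prefix}.num_gpus": num_gpus,
            f"{prefix}.use_gpu": use_gpu,
        }


    # Same values as the JSON example above.
    inputs = build_gpu_inputs("workflow", "flexible_gpu", use_gpu=True, num_gpus=2)
    print(json.dumps(inputs, indent=2))

Rejecting a zero or negative count up front avoids submitting a run whose GPU request cannot be satisfied.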

Critical: Docker Container Requirements
========================================

GPU Support Depends on Your Container
--------------------------------------

**The most common GPU failure is using a CPU-only container.**

Setting ``gpu: true`` in the runtime stanza **does NOT** add GPU support to your container. Your container must already include:

- NVIDIA CUDA drivers/runtime
- GPU-accelerated libraries (PyTorch, TensorFlow, etc.)

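A quick way to inspect a candidate image is to run a small probe inside it before wiring it into a workflow. The sketch below is an illustration only: ``probe_gpu_stack`` is a hypothetical helper that reports whether ``nvidia-smi`` is on the ``PATH`` and whether the listed Python packages can be located. Finding a package does not prove it is a CUDA-enabled build, so treat a positive result as necessary rather than sufficient.

.. code-block:: python

    import importlib.util
    import shutil


    def probe_gpu_stack(modules=("torch", "tensorflow")) -> dict:
        """Report which GPU-related tools and packages are present in this environment."""
        # shutil.which returns None when the executable is not on PATH.
        report = {"nvidia-smi": shutil.which("nvidia-smi") is not None}
        for name in modules:
            # find_spec returns None when a top-level package is not installed.
            report[name] = importlib.util.find_spec(name) is not None
        return report


    print(probe_gpu_stack())

On a CPU-only image such as ``ubuntu:22.04`` you would expect every entry to be ``False``.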

Complete Runtime Stanza Reference
==================================

All Runtime Attributes with GPU
--------------------------------

.. code-block:: text

    runtime {
      # Container (REQUIRED, must be GPU-enabled for GPU tasks)
      docker: "pytorch/pytorch:latest"

      # Compute Resources
      memory: "32GiB"         # RAM allocation
      cpu: 8                  # CPU threads (useful for data loading)

      # GPU Resources
      gpu: true               # Enable GPU (REQUIRED for GPU access)
      gpuCount: 1             # Number of GPUs (default: 1)

      # Time Limit
      runtime_minutes: 120    # Maximum runtime
    }


Available GPU Hardware
======================

JAWS provides GPU access at these sites:

.. list-table::
   :header-rows: 1
   :widths: 20 25 15 15 15

   * - Site
     - GPU Model
     - Nodes
     - GPUs/Node
     - Memory/GPU
   * - Perlmutter (NERSC)
     - NVIDIA A100
     - 1536
     - 4
     - 40 GB
   * - Tahoma (EMSL)
     - NVIDIA Tesla V100
     - 24
     - 2
     - 32 GB

**Site Selection**: Pass the site name as the final argument to ``jaws submit``:

.. code-block:: bash

    jaws submit workflow.wdl inputs.json tahoma

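The GPUs/Node column suggests a simple sanity check before submitting. The sketch below assumes that a single task is placed on one node, so ``gpuCount`` should not exceed that site's GPUs per node; this guide does not state that rule, so treat it as an assumption. ``check_gpu_request`` and the lowercase ``perlmutter`` key are hypothetical (only ``tahoma`` appears in the submit command above).

.. code-block:: python

    # GPUs per node, copied from the hardware table above.
    # The lowercase site keys are an assumption made for this example.
    GPUS_PER_NODE = {"perlmutter": 4, "tahoma": 2}


    def check_gpu_request(site: str, gpu_count: int) -> int:
        """Return gpu_count if it fits on one node at the site, else raise ValueError."""
        key = site.lower()
        if key not in GPUS_PER_NODE:
            raise ValueError(f"unknown site: {site!r}")
        limit = GPUS_PER_NODE[key]
        if not 1 <= gpu_count <= limit:
            raise ValueError(f"{site} nodes have {limit} GPUs; requested {gpu_count}")
        return gpu_count


    print(check_gpu_request("tahoma", 2))  # prints 2

Under the same assumption, ``check_gpu_request("tahoma", 4)`` raises ``ValueError``, because a Tahoma node has only two GPUs.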

Troubleshooting
===============

Common Runtime Stanza Errors
-----------------------------

**Issue: Task runs on CPU instead of GPU**

Symptom:

.. code-block:: pycon

    >>> torch.cuda.is_available()
    False

**Causes & Fixes**:

1. **Missing** ``gpu: true`` **in runtime stanza**

   ❌ Wrong:

   .. code-block:: text

       runtime {
         docker: "pytorch/pytorch:latest"
         memory: "16GiB"
         cpu: 4
         # gpu missing
       }

   ✅ Fix:

   .. code-block:: text

       runtime {
         docker: "pytorch/pytorch:latest"
         memory: "16GiB"
         cpu: 4
         gpu: true  # Add this
       }

2. **CPU-only container**

   ❌ Wrong:

   .. code-block:: text

       runtime {
         docker: "ubuntu:22.04"  # No CUDA
         gpu: true
       }

   ✅ Fix:

   .. code-block:: text

       runtime {
         docker: "pytorch/pytorch:latest"  # Has CUDA
         gpu: true
       }

**Issue: nvidia-smi command not found**

**Cause**: Container does not include NVIDIA drivers.

**Fix**: Use a CUDA-enabled base image:

.. code-block:: text

    runtime {
      docker: "nvidia/cuda:12.0.0-runtime-ubuntu22.04"  # or pytorch/pytorch:latest
      gpu: true
    }


**Issue: Expected X GPUs but found Y**

**Symptom**: Your code fails at startup because it expects more GPUs than were allocated.

**Cause**: The GPU count your code requests does not match ``gpuCount`` in the runtime stanza.

**Fix**: Align your code with ``gpuCount``:

.. code-block:: text

    task train {
      input {
        Int num_gpus = 2
      }

      command <<<
        python3 train.py --gpus ~{num_gpus}
      >>>

      runtime {
        docker: "pytorch/pytorch:latest"
        gpu: true
        gpuCount: num_gpus  # Match code expectation
      }
    }

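To catch this mismatch early, a script can count the visible devices before training starts. The following is a minimal sketch, assuming ``nvidia-smi`` is available in the container: it relies on ``nvidia-smi -L``, which prints one ``GPU <index>: ...`` line per device. ``count_gpus``, ``visible_gpus``, and ``require_gpus`` are hypothetical helper names.

.. code-block:: python

    import re
    import subprocess


    def count_gpus(listing: str) -> int:
        """Count devices in `nvidia-smi -L` output (one 'GPU <index>: ...' line each)."""
        return len(re.findall(r"^GPU \d+:", listing, flags=re.MULTILINE))


    def visible_gpus() -> int:
        """Run `nvidia-smi -L`; return 0 if the tool is missing or exits with an error."""
        try:
            result = subprocess.run(
                ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
            )
        except (FileNotFoundError, subprocess.CalledProcessError):
            return 0
        return count_gpus(result.stdout)


    def require_gpus(expected: int, found: int) -> None:
        """Fail fast when fewer GPUs are visible than the code expects."""
        if found < expected:
            raise RuntimeError(f"Expected {expected} GPUs but found {found}")

Calling ``require_gpus(num_gpus, visible_gpus())`` at the top of a training script turns a confusing framework error into an immediate, explicit failure.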

FAQ
===

**Q: What's the minimum runtime stanza for GPU?**

A: Two attributes are required: a CUDA-enabled ``docker`` image and ``gpu: true``:

.. code-block:: text

    runtime {
      docker: "pytorch/pytorch:latest"
      gpu: true
    }

**Q: What happens if I omit** ``gpuCount``?

A: Defaults to 1 GPU when ``gpu: true``. These are equivalent:

.. code-block:: text

    runtime { gpu: true }
    # Attributes are separated by whitespace, not commas
    runtime { gpu: true  gpuCount: 1 }


**Q: Can I mix GPU and CPU tasks in one workflow?**

A: Yes! Only add ``gpu: true`` to tasks that need GPUs:

.. code-block:: text

    task preprocess {
      runtime {
        docker: "ubuntu:22.04"
        memory: "16GiB"
        cpu: 4
        # No gpu → runs on CPU
      }
    }

    task train {
      runtime {
        docker: "pytorch/pytorch:latest"
        memory: "16GiB"
        cpu: 4
        gpu: true  # Runs on GPU
      }
    }


**Q: Does** ``gpuCount: 4`` **automatically parallelize my code?**

A: No. Your code must explicitly use a multi-GPU framework (PyTorch DDP, Horovod, etc.).


Additional Resources
====================

- :doc:`JAWS Best Practices </Resources/best_practices>`
- :doc:`Compute Resources Reference </Resources/compute_resources>`
- `PyTorch CUDA Documentation <https://pytorch.org/docs/stable/cuda.html>`_
- `TensorFlow GPU Guide <https://www.tensorflow.org/guide/gpu>`_
- `NVIDIA Container Toolkit <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/>`_