============
JAWS Backend
============

.. role:: bash(code)
   :language: bash

Summary
=======

In this section, we explain how tasks are processed and managed in JAWS through the integration of Cromwell, HTCondor, JAWS Pool Manager, and SLURM. We will provide an overview of the workflow submission process and the role each component plays.

Understanding JAWS Task Submission Workflow
===========================================

.. figure:: /Figures/HTCondor_Slurm.svg
   :scale: 100%
   :align: center

JAWS uses a distributed computing approach, leveraging multiple systems to ensure efficient task scheduling, management, and execution. Here’s a breakdown of the components and their roles in the workflow submission process:

**1. Cromwell: The Workflow Engine**

Cromwell acts as the workflow engine in JAWS. It is responsible for submitting tasks to the job scheduler (HTCondor) for execution. Cromwell manages workflows written in the Workflow Description Language (WDL).

**2. HTCondor: Job Scheduler**

HTCondor is responsible for managing and queuing the tasks submitted by Cromwell. It plays a crucial role in distributing the tasks to the appropriate resources based on availability and capacity.

**3. JAWS Pool Manager**

The JAWS Pool Manager monitors the HTCondor queue to determine how many SLURM nodes are required to process the queued tasks. Once determined, it requests SLURM nodes using the `--exclusive` flag via the :bash:`sbatch` command to reserve dedicated resources for task execution.

**4. Compute Pool: Execution Resources**

Once SLURM allocates the necessary resources, HTCondor submits tasks to the available compute nodes in the pool. After a task is completed, the next task in the HTCondor queue is submitted to the SLURM node. If no more tasks are available, the JAWS Pool Manager releases the SLURM nodes using the :bash:`scancel` command.

For example, if a workflow scatters into 300 tasks (the JAWS per-workflow concurrent-task limit; see :doc:`jaws_policies`) each with:

.. code-block:: text

   runtime {
     memory: "8G"
     cpu: 4
   }

HTCondor will put them in the queue and the JAWS Pool Manager will start requesting SLURM nodes. If the site grants 5 nodes with 64 CPUs each, the pool can run up to 80 of these tasks in parallel (``5 nodes × 64 CPUs ÷ 4 CPUs/task = 80``). The other 220 tasks wait in the HTCondor queue and start as earlier tasks finish and free up cores.

Files Generated During Execution
================================

During the task submission and execution process, various files are generated in the execution directory. These files can be helpful for monitoring task progress and troubleshooting errors. Below is a list of common files and their purposes:

- `script.submit`: The script that Cromwell passes to HTCondor. This file contains the instructions for submitting the task to HTCondor.
- `stdout.submit`: The standard output from `script.submit`, showing details about the task's submission process.
- `stderr.submit`: The standard error from `script.submit`, useful for debugging any errors during task submission.
- `submitFile`: Contains resource specifications (e.g., memory, CPU requirements) for the task and tells HTCondor how to handle the job.
- `execution.log`: A log file produced by HTCondor that contains details on the running resources and job status.
- `dockerScript`: Defines the Shifter or Apptainer command that runs `script`.
- `script`: Represents the code defined in your workflow’s `command{}` section.
- `stdout`: Standard output from the task being executed on the compute node.
- `stderr`: Standard error output from the task, useful for identifying issues that occurred during execution.
- `rc`: The return code from the task, indicating success or failure (typically `0` for success and non-zero for failure).
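
As a rough illustration of how the files listed above can be used for troubleshooting, the following Python sketch reads the `rc` file from a task's execution directory and, when the return code is non-zero, collects the last few lines of `stderr`. The function name ``task_status``, the returned dictionary, and the five-line tail are hypothetical choices made for this example; they are not part of JAWS or Cromwell. The sketch also assumes that a missing `rc` file means the task has not finished, which may not cover every failure mode (for instance, a submission error reported in `stderr.submit`).

```python
from pathlib import Path


def task_status(exec_dir):
    """Summarize a task from the `rc` and `stderr` files in its execution directory.

    Hypothetical helper for illustration only; not part of JAWS.
    """
    exec_dir = Path(exec_dir)
    rc_file = exec_dir / "rc"

    if not rc_file.exists():
        # Assumption: no `rc` file means the task is still queued or running,
        # or it stopped before a return code could be written.
        return {"state": "no rc file", "rc": None, "stderr_tail": []}

    rc = int(rc_file.read_text().strip())

    stderr_tail = []
    stderr_file = exec_dir / "stderr"
    if rc != 0 and stderr_file.exists():
        # Keep only the last five lines, where the error usually appears.
        lines = stderr_file.read_text(errors="replace").splitlines()
        stderr_tail = lines[-5:]

    state = "succeeded" if rc == 0 else "failed"
    return {"state": state, "rc": rc, "stderr_tail": stderr_tail}
```

Calling ``task_status("/path/to/execution")`` on a task directory returns the state, the numeric return code, and any collected `stderr` lines. If the result is ``"no rc file"``, `stderr.submit` and `execution.log` are the next places to look.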