==============
How JAWS Works
==============

.. role:: bash(code)
   :language: bash

JAWS is a multi-site workflow manager that uses the Cromwell workflow engine. Its main goals are to make bioinformatics workflows easier to run, to foster collaboration between users of the system, and to make it possible to move workloads across different DOE resources.

JAWS is composed of four main parts:

1) a command line interface: `Jaws Client`;
2) a centralized orchestration service: `Jaws Central`, which administers runs across multiple sites;
3) a site service that wraps the workflow engine (for example, Cromwell) and is installed on a compute site;
4) a job submission manager, such as HTCondor, which submits jobs to worker pools using SLURM.

################################
JAWS Components and Architecture
################################

Below is a diagram of the JAWS architecture. Some processes are duplicated in the diagram to show that the site service can be installed at multiple sites. The main takeaways are:

* All commands are issued from the command line and handled by :bash:`Jaws Client`;
* :bash:`Jaws Central` is a server that coordinates which compute site (e.g. LabIT or NERSC) the pipeline runs on;
* Globus transfers all your files from your data source to the compute site where Cromwell will actually run;
* Cromwell is the workflow engine that runs the pipeline at the compute site;
* HTCondor serves as the backend to Cromwell and handles running the jobs on an HPC cluster.

.. figure:: /Figures/jaws_architecture-Architecture.svg
   :scale: 100%

   Click on the image to enlarge

JAWS Overall Workflow Processing
--------------------------------

The user interfaces only with the :bash:`jaws-client`.
The :bash:`jaws-client` communicates with :bash:`jaws-central` to move data to the target site and hand the workflow execution over to the respective :bash:`jaws-site` service, which in turn runs the workflow to completion and relays the status back to :bash:`jaws-central`. Globus is used as the transfer mechanism between a central data storage location and the target sites. The execution of workflows by :bash:`jaws-site` is orchestrated by Cromwell.

jaws-client
-----------

:bash:`jaws-client` is a command-line interface for the user and interacts with the central service using defined APIs. :bash:`jaws-client` offers commands to submit and monitor workflows. :bash:`jaws-central` saves metadata about runs, for example, which version of the pipeline was run, runtime statistics, and which datasets were processed.

Cromwell
----------

Cromwell is responsible for executing the commands in a workflow. It takes a workflow, written in WDL, and creates instructions on how and when each task should be executed. In our case, the tasks are executed on a user-defined backend, HTCondor.

HTCondor
--------------

The main purpose of HTCondor is to receive tasks from Cromwell and execute them on a compute resource (e.g. an HPC cluster). It acts as an abstraction layer between :bash:`jaws-site` and different resources (different clusters and cloud resources).

Globus
------

Globus transfers all your files from your data source to the compute site where Cromwell runs, and back again.

Leverages Community Supported Tools
===================================

JAWS is built upon a foundation of robust, community-supported tools, ensuring reliability and widespread compatibility.

.. figure:: /Figures/technologies_used.png
   :align: center
   :scale: 80%

----------------

Technologies used:
++++++++++++++++++

- **Cromwell**: Executes workflows written in Workflow Description Language (WDL). For more details, see the WDL Specification.
- **Shifter, Apptainer**: These container platforms define the runtime environment for tasks. (Apptainer is the current name for what was previously called Singularity.)
- **HTCondor**: Distributes jobs across multiple compute clusters, such as "`dori`" and "`jgi`".
- **Globus**: Facilitates file transfers between various endpoints using GridFTP.
- **REST APIs**: Enable communication between different JAWS components through RESTful interfaces.
- **RabbitMQ**: Acts as a message broker, managing the communication of workflow tasks between Cromwell and the compute cluster workers.
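
To make the hand-offs described under "JAWS Overall Workflow Processing" concrete, here is a toy Python model of a run's lifecycle. It is only a sketch under stated assumptions: the classes ``Run``, ``Site`` and ``Central``, the function ``client_submit``, the status strings and the file name ``align.wdl`` are all hypothetical and are not part of the JAWS code base or its APIs. Nothing here talks to Globus, Cromwell or HTCondor; each step is just recorded in order.

.. code-block:: python

   """Toy model of a JAWS run lifecycle. All names are hypothetical."""
   from dataclasses import dataclass, field


   @dataclass
   class Run:
       workflow: str                 # name of a WDL file (hypothetical)
       site: str                     # target compute site, e.g. "nersc"
       status: str = "created"
       history: list = field(default_factory=list)

       def advance(self, status: str) -> None:
           # Record every state change so the client can report progress.
           self.status = status
           self.history.append(status)


   class Site:
       """Stands in for jaws-site: wraps the workflow engine on one site."""

       def __init__(self, name: str):
           self.name = name

       def execute(self, run: Run) -> None:
           # A real site service would call the workflow engine here;
           # this sketch only records the order of the hand-offs.
           run.advance("submitted to Cromwell")
           run.advance("tasks dispatched via HTCondor")
           run.advance("workflow finished")


   class Central:
       """Stands in for jaws-central: routes runs to sites, tracks status."""

       def __init__(self, sites):
           self.sites = {s.name: s for s in sites}
           self.runs = []

       def submit(self, workflow: str, site: str) -> Run:
           if site not in self.sites:
               raise ValueError(f"unknown site: {site}")
           run = Run(workflow, site)
           self.runs.append(run)
           run.advance("inputs transferred (Globus)")
           self.sites[site].execute(run)
           run.advance("outputs transferred (Globus)")
           return run


   def client_submit(central: Central, workflow: str, site: str) -> Run:
       """Stands in for jaws-client: the only component the user calls."""
       return central.submit(workflow, site)


   central = Central([Site("nersc"), Site("labit")])
   run = client_submit(central, "align.wdl", "nersc")
   for step in run.history:
       print(step)

The sketch keeps the user-facing entry point (``client_submit``) separate from routing (``Central``) and execution (``Site``), mirroring the split between the client, the central service and the site service described above.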