==================
How to build WDLs
==================

.. role:: bash(code)
   :language: bash

*******
Summary
*******

In this tutorial, we will create a WDL script for a common bioinformatics pipeline.

.. admonition:: Pre-requisites

   This tutorial assumes that you have an understanding of the basic structure of a WDL script. Some useful links:

   * Start with the official `OpenWDL Specification `_
   * `Real world examples `_
   * Re-usable subworkflow tasks: `WDL-tasks `_

*************
Our workflow
*************

The processing with `BBMap `_ consists of two steps:

* alignment of sequence files to a reference genome using `BBMap`, followed by
* SAM-to-BAM format conversion using `samtools `_.

The basic commands for the two steps are:

.. code-block:: text

   # align reads to reference contigs
   bbmap.sh in=reads.fq ref=reference.fasta out=test.sam

   # keep mapped reads only (-F0x4), write BAM (-b), and sort by coordinate
   samtools view -b -F0x4 test.sam | samtools sort - > test.sorted.bam

Setup your Working Environment
------------------------------

Download the example data repository:

.. code-block:: text

   git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
   cd jaws-tutorial-examples/data

In this folder, you will find the test data set:

* a sample single-end FASTQ file
* reference FASTA and index files

*****************************
Converting Each Task to a WDL
*****************************

If we have a workflow represented as a script (or a sequence of commands), we can split it into WDL tasks, one task per step.

.. note::

   Each script you create should execute in, and write its output to, the **current working directory**.

`BBMap`
-------

This task aligns the sample single-end FASTQ file to the reference genome using the `BBMap` aligner.

Here is the task skeleton definition:

.. code-block:: text

   task alignment {
       input {...}
       command {...}
       output {...}
       runtime {...}
   }

Now, we need to define the input variables, the alignment command line that will be executed, and the expected output files:

.. code-block:: text
   :linenos:
   :emphasize-lines: 8

   task alignment {
       input {
           File fastq
           File fasta
       }

       command <<<
           bbmap.sh in=~{fastq} ref=~{fasta} out=test.sam
       >>>

       output {
           File sam = "test.sam"
       }
   }

We are passing the FASTQ file for our sample and the reference FASTA as inputs to the task.

.. note::

   Notice how variables are referenced in the command, using `~{variable_name}`. Older WDL versions use `${variable_name}`; however, to avoid confusion with bash variables, it is recommended to use `~{variable_name}`.

.. hint::

   The `command` section is enclosed in either curly braces `{ ... }` or triple angle brackets `<<< ... >>>`. Expression placeholders differ depending on the command section style:

   +----------------------+----------------------------+
   | Command Body Style   | Placeholder Style          |
   +======================+============================+
   | `command { ... }`    | `~{}` (preferred) or `${}` |
   +----------------------+----------------------------+
   | `command <<< >>>`    | `~{}` only                 |
   +----------------------+----------------------------+

Next, we need to define the runtime attributes, i.e., the number of CPUs, the memory, and the time required for the task, as well as the Docker container used for execution.

.. code-block:: text

   task alignment {
       ...
       runtime {
           docker: "jfroula/aligner-bbmap@sha256:8a849019294cea0636d474d07f18e5f84e2b2b58cf50b104c04348db91cdabb4"
           cpu: 1
           memory: "5G"
           runtime_minutes: 10
       }
   }

Our task is complete and should look like this:

.. code-block:: text

   task alignment {
       input {
           File fastq
           File fasta
       }

       command <<<
           bbmap.sh in=~{fastq} ref=~{fasta} out=test.sam
       >>>

       output {
           File sam = "test.sam"
       }

       runtime {
           docker: "jfroula/aligner-bbmap@sha256:8a849019294cea0636d474d07f18e5f84e2b2b58cf50b104c04348db91cdabb4"
           cpu: 1
           memory: "5G"
           runtime_minutes: 10
       }
   }

`Samtools`
----------

This task takes the SAM output of the alignment step, converts it to BAM, and sorts it by coordinate using the `samtools` utility. The task skeleton is the same as the one used above.
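The `-F0x4` option in the `samtools view` command excludes every record whose FLAG has bit 0x4 ("segment unmapped") set. The FLAG is the second column of a SAM record. As a result, only mapped reads end up in the BAM file.

As an illustration only, the bash sketch below applies the same bit test to a few hypothetical, hand-made SAM records using shell arithmetic. The record names and the `keep_mapped` function are made up for this example and are not part of samtools:

.. code-block:: bash

   #!/usr/bin/env bash
   set -eo pipefail

   # Hypothetical SAM body lines (header omitted); only QNAME and FLAG are kept.
   # FLAG 0: mapped; 4: unmapped; 16: mapped to the reverse strand (0x10);
   # 20 = 16 + 4: reverse-strand bit plus unmapped bit.
   sam_records=$'read1\t0\nread2\t4\nread3\t16\nread4\t20'

   # Print the QNAME of every record whose FLAG does NOT have bit 0x4 set,
   # which mirrors the records that `samtools view -F0x4` keeps.
   keep_mapped() {
       local qname flag rest
       while IFS=$'\t' read -r qname flag rest; do
           if (( (flag & 0x4) == 0 )); then
               printf '%s\n' "$qname"
           fi
       done
   }

   mapped=$(keep_mapped <<< "$sam_records")
   echo "$mapped"

Running it prints `read1` and `read3`: the two records without the unmapped bit.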
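The samtools command pipes `samtools view` into `samtools sort`, which is why the task below starts with `set -eo pipefail`. By default, bash reports only the exit status of the last command in a pipeline. A failing `samtools view` could therefore go unnoticed as long as `samtools sort` exits successfully.

This minimal sketch makes the difference visible. `false` and `cat` stand in for the two samtools commands, so it runs without any bioinformatics tools installed:

.. code-block:: bash

   #!/usr/bin/env bash
   # Each case runs in a child bash, so its failure is recorded here
   # instead of stopping this script.

   # Default: the pipeline's status is that of its last command (`cat`, 0),
   # so the failure of `false` goes unnoticed.
   status_default=0
   bash -c 'false | cat' || status_default=$?

   # pipefail: the pipeline fails if any command in it fails.
   status_pipefail=0
   bash -c 'set -o pipefail; false | cat' || status_pipefail=$?

   # -e plus pipefail: the shell exits at the failing pipeline,
   # so "still running" is never printed.
   status_errexit=0
   output=$(bash -c 'set -eo pipefail; false | cat; echo "still running"') || status_errexit=$?

   echo "default: $status_default, pipefail: $status_pipefail, -e + pipefail: $status_errexit"

Running it prints `default: 0, pipefail: 1, -e + pipefail: 1`.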
The complete `Samtools` task definition should look like this:

.. code-block:: text
   :linenos:
   :emphasize-lines: 7,8

   task samtools {
       input {
           File sam
       }

       command <<<
           set -eo pipefail
           samtools view -b -F0x4 ~{sam} | samtools sort - > test.sorted.bam
       >>>

       output {
           File bam = "test.sorted.bam"
       }

       runtime {
           docker: "jfroula/aligner-bbmap@sha256:8a849019294cea0636d474d07f18e5f84e2b2b58cf50b104c04348db91cdabb4"
           cpu: 1
           memory: "5G"
           runtime_minutes: 10
       }
   }

.. dropdown:: Hint: `set -eo pipefail`
   :color: info
   :animate: fade-in

   This command is useful at the beginning of the `command` section of your WDL tasks. It makes the shell exit as soon as a command fails. It also makes a pipeline count as failed if any command in it fails. Errors are therefore caught where they occur, instead of the script running past them, which would make debugging more difficult.

********************
Workflow Definition
********************

Let's explore the workflow skeleton:

.. code-block:: text
   :linenos:
   :emphasize-lines: 1,3,6,8

   version 1.0

   workflow bbtools {
       input { }

       call alignment { input: }

       call samtools { input: }
   }

At the top level, we define a workflow named `bbtools`, within which we call a set of tasks, here `alignment` and `samtools`. The order of execution is determined by the dependencies between the tasks, not by the order in which they are written. A task that uses another task's output starts only after that task has finished. Tasks with no dependencies between them can be run in parallel by `cromwell` (the execution engine).

.. note::

   The very first line declares the version of the WDL specification being used. In this example, we use version 1.0 of the WDL spec, which is the version JAWS currently uses.

Now, we need to define the input variables for the tasks and, most importantly, tell `cromwell` how to link the tasks together:

.. code-block:: text

   version 1.0

   workflow bbtools {
       input {
           File reads
           File ref
       }

       call alignment {
           input: fastq=reads, fasta=ref
       }

       call samtools {
           input: sam=alignment.sam
       }
   }

The workflow calls two tasks.
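`cromwell` waits for a task's inputs to exist before starting it, while tasks that share no data may run at the same time. As a loose analogy only (plain bash, not how `cromwell` actually schedules tasks), the sketch below starts two hypothetical independent steps in the background. It then runs a third step, which reads both of their outputs, only after `wait` returns. The step names and files are made up for illustration:

.. code-block:: bash

   #!/usr/bin/env bash
   set -eo pipefail

   # Scratch directory for the files exchanged between the hypothetical steps.
   workdir=$(mktemp -d)

   # Two hypothetical steps that share no data, so they can run concurrently.
   step_a() { sleep 1; echo "a" > "$workdir/a.txt"; }
   step_b() { echo "b" > "$workdir/b.txt"; }

   # A hypothetical step that reads the outputs of both steps above,
   # so it must not start before they have finished.
   step_c() { cat "$workdir/a.txt" "$workdir/b.txt" > "$workdir/c.txt"; }

   step_a & pid_a=$!
   step_b & pid_b=$!

   # Block until both background steps are done, then run the dependent step.
   wait "$pid_a"
   wait "$pid_b"
   step_c

   cat "$workdir/c.txt"

Even though `step_a` is the slower of the two, `step_c` always sees both files, because it runs only after both `wait` calls have returned.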
The second task, `samtools`, uses the output of the previous task, `alignment`. How do we pass the output of one task as input to another?

Each of the two tasks has an `output` section that defines the name of its output. The output of the `alignment` task is named `sam` (``File sam = "test.sam"``). The second task, `samtools`, can access this output by referring to it as `alignment.sam`: the task name, a dot, and the output name. See the line ``input: sam=alignment.sam``.

Finally, combining all the top-level components (the workflow and the tasks) in the same file, we expect to have:

.. code-block:: text

   version 1.0

   workflow bbtools {
       input {
           File reads
           File ref
       }

       call alignment {
           input: fastq=reads, fasta=ref
       }

       call samtools {
           input: sam=alignment.sam
       }
   }

   task alignment {
       input {
           File fastq
           File fasta
       }

       command <<<
           bbmap.sh in=~{fastq} ref=~{fasta} out=test.sam
       >>>

       output {
           File sam = "test.sam"
       }

       runtime {
           docker: "jfroula/aligner-bbmap@sha256:8a849019294cea0636d474d07f18e5f84e2b2b58cf50b104c04348db91cdabb4"
           cpu: 1
           memory: "5G"
           runtime_minutes: 10
       }
   }

   task samtools {
       input {
           File sam
       }

       command <<<
           set -eo pipefail
           samtools view -b -F0x4 ~{sam} | samtools sort - > test.sorted.bam
       >>>

       output {
           File bam = "test.sorted.bam"
       }

       runtime {
           docker: "jfroula/aligner-bbmap@sha256:8a849019294cea0636d474d07f18e5f84e2b2b58cf50b104c04348db91cdabb4"
           cpu: 1
           memory: "5G"
           runtime_minutes: 10
       }
   }

Now, you can save this file as `align.wdl`.

.. note::

   The tasks are defined outside of the workflow block, while the call statements are placed inside it.

.. note::

   The commands in each task's `command` section run inside the container specified by that task's `docker` runtime attribute.

********
Validate
********

- Validate using JAWS

Next, we will validate our script to make sure there are no syntax errors, using the `jaws validate` command:

.. code-block:: text

   ## Log in to Dori
   ## Activate the environment
   module load jaws

   jaws validate align.wdl
   > Workflow is OK

- Validate locally

`jaws validate` uses `miniwdl `_. If miniwdl is installed on your machine, you can run the same check locally with `miniwdl check align.wdl`.
******
Inputs
******

- Create your input file

You can create an inputs file from scratch, following this skeleton:

.. code-block:: text

   {
       "<workflow_name>.<input_name>": "<value>"
   }

For our example in this tutorial, you will have:

.. code-block:: text

   {
       "bbtools.reads": "data/sample.fastq.bz2",
       "bbtools.ref": "data/sample.fasta"
   }

- Create your input file using JAWS

As an alternative, you can generate a skeleton template from the WDL using the following command:

.. code-block:: text

   jaws inputs align.wdl

This command outputs a template for the `inputs.json` file. You can then fill in the value of each key:

.. code-block:: text

   {
       "bbtools.reads": "File",
       "bbtools.ref": "File"
   }

***************
Execute Locally
***************

To run the workflow with your own Cromwell installation, make sure BBTools and samtools are installed in your environment. You can also use a conda environment, as demonstrated `here `_.

.. code-block:: text

   # run with your installed version
   cromwell run align.wdl -i inputs.json

   ## OR
   java -jar /path/to/cromwell/cromwell.jar run align.wdl -i inputs.json

*******
Outputs
*******

The outputs of the workflow are written to the `/call-/execution/` folder. Each task of your workflow runs inside its own execution directory. That directory is where you will find the task's output files, along with its `stderr`, `stdout`, and `script` files. Explore the directory structure to find the relevant files.

***********************
Visualize your Workflow
***********************

Create the Directed Acyclic Graph (DAG) of the WDL file using `WOMtool`:

.. code-block:: text

   java -jar womtool-87.jar graph align.wdl > align.dot

   # requires graphviz
   dot -Tpng align.dot -o align.png

Install the dependencies:

.. code-block:: text

   wget https://github.com/broadinstitute/cromwell/releases/download/87/womtool-87.jar

   brew install graphviz      # macOS
   sudo apt install graphviz  # Linux

.. figure:: /Figures/align.png
   :class: with-shadow
   :scale: 100%
   :align: center
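The `align.dot` file written by `womtool graph` is plain-text DOT, the input language of Graphviz. To show what such a file expresses, here is a hand-written DOT sketch of the same two-step dependency. It is not womtool's output, which uses its own node names and structure. The file names `align_sketch.dot` and `align_sketch.png` are arbitrary choices for this example:

.. code-block:: bash

   #!/usr/bin/env bash
   set -eo pipefail

   # Hand-written DOT graph (not womtool output): one node per call and one
   # edge for the data dependency samtools(sam) <- alignment.sam.
   cat > align_sketch.dot <<'EOF'
   digraph bbtools {
       alignment -> samtools [label="sam"];
   }
   EOF

   # Render it only if Graphviz's `dot` is installed.
   if command -v dot > /dev/null 2>&1; then
       dot -Tpng align_sketch.dot -o align_sketch.png
   fi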