==========================
Creating Docker Containers
==========================

.. role:: bash(code)
   :language: bash

*******
Summary
*******

This tutorial describes one way to create Docker images and use them in your WDL. If you are unfamiliar with Docker, see this `Docker tutorial <https://www.digitalocean.com/community/tutorials/getting-started-with-docker>`_ or one of the many tutorials on YouTube.

**Prerequisites**

This tutorial page relies on completing the previous tutorial, :doc:`Lesson 1: Development Environment <wdl_development>`.

.. note::

    As a prerequisite, you will need a computer with Docker installed (Docker Engine - Community). Installation instructions can be found at `docs.docker.com/install <https://docs.docker.com/install/>`_. If you have conda installed, :bash:`conda install -c conda-forge docker-py` adds the Docker SDK for Python, which is a client library and does not install the Docker Engine itself.

Here are the steps we're going to take for this tutorial:

1. Make a Docker image from the same commands you used for the conda environment (:doc:`Lesson 1: Development Environment <wdl_development>`).
2. Run a WDL that uses your Docker container.

****************************
Clone the Example Repository
****************************

For this tutorial, I will be using the example code from `jaws-tutorial-examples <https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git>`_.
To follow along, do:

.. code-block:: text

   git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
   cd jaws-tutorial-examples/5min_example

***********************
Create the Docker Image
***********************

Next we'll describe how to create a Dockerfile, build an image from it, and register the image with `hub.docker.com <https://docs.docker.com/docker-hub/>`_.
But first, create an account and click on "Create a Repository". In the space provided, enter a name for your image, like :bash:`aligner-bbmap`.
The image doesn't have to exist yet; you will push a Docker image to this name after you create it in the next steps.

To make the Dockerfile, you can use the same commands you used for the conda environment.
It is good practice to pin the versions of the software you install, as I have done in the example Dockerfile.
You can drop the versions to get the latest releases, but the Dockerfile may then stop working out-of-the-box in the future due to version conflicts.

.. note::
    When creating the Dockerfile, it is helpful to test each command (e.g. apt-get, wget, conda install) manually inside an empty Docker container.
    Once everything is working, you can copy the commands to a Dockerfile.

This docker command will create an interactive container from an Ubuntu base image, in which you can install software as root.

.. code-block:: text

    docker run -it ubuntu:latest /bin/bash


Here is an example Dockerfile (provided in `5min_example`). We will build an image from it.

.. code-block:: text

    FROM ubuntu:22.04

    # Install stuff with apt-get
    RUN apt-get update && apt-get install -y wget bzip2 \
        && rm -rf /var/lib/apt/lists/*

    # Point to all the future conda installations you are going to do
    ENV CONDAPATH=/usr/local/bin/miniconda3
    ENV PATH=$CONDAPATH/bin:$PATH

    # Install miniconda
    # There is a good reason to install miniconda in a path other than its default.
    # The default installation directory is /root/miniconda3 but this path will not be
    # accessible by shifter or singularity so we'll install under /usr/local/bin/miniconda3.
    RUN wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh \
        && bash ./Miniconda3*.sh -b -p $CONDAPATH \
        && rm Miniconda3*.sh

    # Install software with conda
    RUN conda install -y -c bioconda bbmap==38.84 samtools==1.11 \
        && conda clean -afy

    # This will give us a working directory within the container (e.g. a place we can mount data to)
    WORKDIR /bbmap

    # Move script into container.
    # Note that it is copied to a location in your $PATH
    COPY script.sh /usr/local/bin/script.sh


| **Build the image and upload to hub.docker.com**
| You need to use your docker hub user name to tag the image when you are building it (see below).

.. code-block:: text

   # Create a "build" directory and build the Docker image from there so that it stays small. It's good practice to always build an image in
   # a directory containing only the required files; otherwise every other file is sent to Docker as build context, which slows the build and can bloat the image if those files are copied in.
   mkdir build
   cp script.sh Dockerfile build/
   cd build
   docker build --tag <your_docker_hub_user_name>/aligner-bbmap:1.0.0 .
   cd ../
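
The staging step above can be wrapped in a small helper. This is only a sketch and is not part of the example repository: :bash:`stage_build_context` is a hypothetical function name. It copies just the files you list into a fresh directory, so nothing else can end up in the build context.

```shell
# Hypothetical helper (not part of the example repo): copy only the listed
# files into an empty build directory.
# Usage: stage_build_context <build_dir> <file> [<file> ...]
stage_build_context() {
    local build_dir="$1"
    shift
    # Refuse to reuse a non-empty directory so stale files never reach the build.
    if [ -d "$build_dir" ] && [ -n "$(ls -A "$build_dir")" ]; then
        echo "error: $build_dir exists and is not empty" >&2
        return 1
    fi
    mkdir -p "$build_dir" || return 1
    local f
    for f in "$@"; do
        cp "$f" "$build_dir/" || return 1
    done
}

# Example (run from 5min_example):
#   stage_build_context build script.sh Dockerfile
#   docker build --tag <your_docker_hub_user_name>/aligner-bbmap:1.0.0 build
```

Failing on a non-empty directory is a design choice for this sketch; you could instead delete and recreate the directory if you prefer.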


**Test that the example script runs in the docker container**

.. code-block:: text

   # use your image name
   docker run <your_docker_hub_user_name>/aligner-bbmap:1.0.0 script.sh

   # if you are in the root of the 5min_example directory, then try re-running the script with data.
   docker run --volume="$(pwd)/../data:/bbmap" <your_docker_hub_user_name>/aligner-bbmap:1.0.0 script.sh sample.fastq.bz2 sample.fasta

   # Notice script.sh is found because the Dockerfile copied it to /usr/local/bin, which is in PATH, and
   # the two inputs are found because the data directory is mounted to /bbmap (inside container) where the script runs.


When you are convinced the docker image is good, you can register it with `hub.docker.com <https://docs.docker.com/docker-hub/>`_ (remember to make an account first). When you run a WDL in JAWS, the docker images will be pulled from `hub.docker.com`.

.. code-block:: text

   docker login
   docker push <your_docker_hub_user_name>/aligner-bbmap:1.0.0

Now your image is available at any site (e.g. dori, jgi, tahoma, perlmutter).
JAWS will pull the image for you. If you are testing Cromwell locally, you will need to pull it manually using one of:

  - `shifterimg pull <https://docs.nersc.gov/development/containers/shifter/shifter-beginner-tutorial/#pull-your-docker-image-onto-nersc-via-shifter>`_;
  - `singularity pull <https://docs.sylabs.io/guides/3.2/user-guide/cli/singularity_pull.html>`_; or
  - `docker pull <https://docs.docker.com/engine/reference/commandline/pull/>`_.
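
For manual pulls, the three tools take the same image name but use slightly different syntax. The sketch below only prints the command for each tool; :bash:`pull_command` is a hypothetical helper name, and the image name is the placeholder used throughout this tutorial.

```shell
# Print (do not run) the pull command for an image under a given container tool.
# Usage: pull_command <docker|shifter|singularity> <image[:tag]>
pull_command() {
    local runtime="$1" image="$2"
    case "$runtime" in
        docker)      echo "docker pull $image" ;;
        shifter)     echo "shifterimg pull $image" ;;
        # Singularity addresses Docker Hub images through the docker:// URI scheme.
        singularity) echo "singularity pull docker://$image" ;;
        *)           echo "unknown runtime: $runtime" >&2; return 1 ;;
    esac
}

pull_command shifter "<your_docker_hub_user_name>/aligner-bbmap:1.0.0"
```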


*****************************
Test your image on Perlmutter
*****************************

Besides your Docker machine, it is useful to test your image on Perlmutter since you will likely be running your WDL there at some point. Some aspects of a Docker container that work on your Docker machine won't work at another site, like `dori`, because shifter and singularity behave differently than docker.

To test the Docker container on :bash:`perlmutter-p1.nersc.gov`, you'll need to use the shifter command instead of docker to run your workflow, but the image is the same.
More about `shifter at NERSC <https://docs.nersc.gov/development/containers/shifter/shifter-beginner-tutorial/>`_.

Example:

.. code-block:: text

   # pull image from hub.docker.com
   shifterimg pull <your_docker_hub_user_name>/aligner-bbmap:1.0.0

   # clone the repo on Perlmutter
   git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
   cd jaws-tutorial-examples/5min_example

   # run your wrapper script inside the container. Note that ./script.sh is the copy in the cloned repo; the copy saved inside the image is /usr/local/bin/script.sh
   shifter --image=<your_docker_hub_user_name>/aligner-bbmap:1.0.0 ./script.sh ../data/sample.fastq.bz2 ../data/sample.fasta


*******
The WDL
*******
The :bash:`script.sh` that is supplied with the repo has two essential commands:

.. code-block:: text

    # align reads to reference contigs
    bbmap.sh Xmx12g in=$READS ref=$REF out=test.sam

    # create a bam file from alignment
    samtools view -b -F0x4 test.sam | samtools sort - > test.sorted.bam

It would make sense to have both commands inside one task of the WDL because they logically belong together. However, as an exercise, we will split the two commands into two tasks. The output from the first command is used in the second command, so our WDL example shows how tasks pass information.

See an example of the finished WDL :bash:`align_final.wdl` and its :bash:`inputs.json` file:

.. dropdown:: align_final.wdl
    :color: info
    :animate: fade-in

    .. code-block:: text
    
        version 1.0
    
        workflow bbtools {
            input {
                File reads
                File ref
            }
    
            call alignment {
                input: fastq=reads,
                       fasta=ref
            }
            call samtools {
                input: sam=alignment.sam
            }
        }
    
        task alignment {
            input {
                File fastq
                File fasta
            }
    
            command {
                bbmap.sh Xmx12g in=~{fastq} ref=~{fasta} out=test.sam
            }
    
            runtime {
                docker: "jfroula/aligner-bbmap:2.0.2"
                runtime_minutes: 10
                memory: "5G"
                cpu: 1
            }
    
            output {
               File sam = "test.sam"
            }
        }
    
        task samtools {
            input {
                File sam
            }
    
            command {
               samtools view -b -F0x4 ~{sam} | samtools sort - > test.sorted.bam
            }
    
            runtime {
                docker: "jfroula/aligner-bbmap:2.0.2"
                runtime_minutes: 10
                memory: "5G"
                cpu: 1
            }
    
            output {
               File bam = "test.sorted.bam"
            }
        }

.. dropdown:: inputs.json
    :color: info
    :animate: fade-in
    
    .. code-block:: text
    
        {
            "bbtools.reads": "../data/sample.fastq.bz2",
            "bbtools.ref": "../data/sample.fasta"
        }

.. note::
    Singularity, docker, or shifter can be prepended to each command for testing (see align_with_shifter.wdl); however,
    this wouldn't be appropriate for a finished "JAWSified" WDL because you lose portability. The final WDL should have the docker image name put inside the :bash:`runtime {}` section.

Prepending can be helpful when testing & debugging, so I've included an example where shifter is prepended to each command.

.. dropdown:: align_with_shifter.wdl
    :color: info
    :animate: fade-in

    .. code-block:: text
    
        version 1.0
    
        workflow bbtools {
            input {
                File reads
                File ref
            }
    
            call alignment {
                input: fastq=reads,
                       fasta=ref
            }
            call samtools {
                input: sam=alignment.sam
            }
        }
    
        task alignment {
            input {
                File fastq
                File fasta
            }
    
            command {
                shifter --image=jfroula/aligner-bbmap:2.0.2 bbmap.sh Xmx12g in=~{fastq} ref=~{fasta} out=test.sam
            }
    
            output {
               File sam = "test.sam"
            }
        }
    
        task samtools {
            input {
                File sam
            }
    
            command {
               shifter --image=jfroula/aligner-bbmap:2.0.2 samtools view -b -F0x4 ~{sam} | shifter --image=jfroula/aligner-bbmap:2.0.2 samtools sort - > test.sorted.bam
            }
    
            output {
               File bam = "test.sorted.bam"
            }
        }

You would run this WDL on Perlmutter with the following command.

.. code-block:: text

    java -jar /global/cfs/cdirs/jaws/jaws-install/perlmutter-prod/lib/cromwell-84.jar run align_with_shifter.wdl -i inputs.json

*****************************************************
The Docker Image Should be in the `runtime{}` Section
*****************************************************

Everything in the :bash:`command{}` section of the WDL will run inside a docker container if you've added docker to the :bash:`runtime{}` section.
Now your WDL has the potential to run on a machine with shifter, singularity, or docker. 
JAWS will take your docker image and run it appropriately as singularity, docker or shifter.
If you run the WDL with the cromwell command on a shifter or singularity machine, you need to supply a :bash:`cromwell.conf` file, explained shortly.

See :bash:`align_final.wdl`:

.. code-block:: text

    runtime {
        docker: "jfroula/aligner-bbmap:2.0.2"
    }
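
To make "run it appropriately" a little more concrete, here is a rough illustration of how one image and one command look under each container tool. This is not how JAWS or a :bash:`cromwell.conf` file is implemented; :bash:`container_command` is a hypothetical helper that only prints the command line.

```shell
# Illustration only: prefix a command with the launcher for a given container tool.
# Usage: container_command <docker|shifter|singularity> <image> <command...>
container_command() {
    local runtime="$1" image="$2"
    shift 2
    case "$runtime" in
        docker)      echo "docker run $image $*" ;;
        shifter)     echo "shifter --image=$image $*" ;;
        singularity) echo "singularity exec docker://$image $*" ;;
        *)           echo "unknown runtime: $runtime" >&2; return 1 ;;
    esac
}

# The same image and command, shown once per tool.
for rt in docker shifter singularity; do
    container_command "$rt" jfroula/aligner-bbmap:2.0.2 samtools --version
done
```

Because the image name lives in :bash:`runtime{}` and not in the command itself, the WDL stays the same regardless of which launcher is used.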

.. _run with conf:

*******************************
Run the Final WDL with Cromwell
*******************************

On a Docker machine
-------------------

You can now run the final WDL:

.. code-block:: text

    conda activate bbtools  # you need this for the cromwell command only
    cromwell run align_final.wdl -i inputs.json


On Perlmutter
-------------
You'll have to include a `cromwell.conf` file in the command because this config file tells Cromwell whether to run the image supplied in the :bash:`runtime{}` section with docker, singularity, or shifter. You didn't need to supply a `cromwell.conf` file in the cromwell command above because docker is the default.

The cromwell.conf file is used to:

1. override Cromwell's default settings
2. tell Cromwell how to interpret the WDL (e.g. use shifter or singularity)
3. specify the backend to use (e.g. local, Slurm, HTCondor)

.. note::

    JAWS takes care of the `cromwell.conf` for you.


Here you can find the config files: `jaws-tutorial-examples/config_files <https://code.jgi.doe.gov/official-jgi-workflows/jaws-tutorial-examples/-/tree/master/config_files>`_.


.. code-block:: text

    java -Dconfig.file=<repository-root>/config_files/<cromwell_*.conf> \
         -Dbackend.providers.Local.config.dockerRoot=$(pwd)/cromwell-executions \
         -Dbackend.default=Local \
         -jar <path/to/cromwell.jar> run <wdl> -i <inputs.json>

where

|    :bash:`-Dconfig.file`
|    points to a cromwell conf file that is used to override the default configuration. There are versions for perlmutter, dori, etc.
|
|    :bash:`-Dbackend.providers.Local.config.dockerRoot`
|    this overrides the variable 'dockerRoot' in cromwell_perlmutter.conf so that cromwell will place its output in your current working directory.
|
|    :bash:`-Dbackend.default=[Local|Slurm]`
|    this allows you to choose between the Local and Slurm backends. With Slurm, each task will have its own sbatch command (and thus wait in the queue).
|
|    :bash:`cromwell.jar` can be what you installed or you can use these paths:
|        **dori:** /clusterfs/jgi/groups/dsi/homes/svc-jaws/jaws-install/dori-prod/lib/cromwell-84.jar
|        **perlmutter:** /global/cfs/cdirs/jaws/jaws-install/perlmutter-prod/lib/cromwell-84.jar
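
If you switch between sites or backends often, it can help to assemble the command from its parts. The sketch below prints the command without running it so you can inspect it first. :bash:`cromwell_command` is a hypothetical helper, and the conf path in the example comment assumes you are in the root of the cloned repository.

```shell
# Hypothetical helper: print (not run) the Cromwell command described above.
# Usage: cromwell_command <conf_file> <Local|Slurm> <cromwell_jar> <wdl> <inputs_json>
cromwell_command() {
    local conf="$1" backend="$2" jar="$3" wdl="$4" inputs="$5"
    # Only the two backends mentioned in this tutorial are accepted.
    case "$backend" in
        Local|Slurm) ;;
        *) echo "backend must be Local or Slurm, got: $backend" >&2; return 1 ;;
    esac
    # dockerRoot is pointed at the current working directory, as in the command above.
    printf 'java -Dconfig.file=%s -Dbackend.providers.Local.config.dockerRoot=%s/cromwell-executions -Dbackend.default=%s -jar %s run %s -i %s\n' \
        "$conf" "$(pwd)" "$backend" "$jar" "$wdl" "$inputs"
}

# Example for Perlmutter (conf path is an assumption about where you cloned the repo):
#   cromwell_command config_files/cromwell_perlmutter.conf Local \
#       /global/cfs/cdirs/jaws/jaws-install/perlmutter-prod/lib/cromwell-84.jar \
#       align_final.wdl inputs.json
```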

*********************************
Understanding the Cromwell Output
*********************************

Cromwell output is:

1. files created by the workflow
2. the stdout/stderr printed to screen

**1. Where to find the output files**

Cromwell saves the results under a directory called :bash:`cromwell-executions`. Under it, there is a uniquely named folder for each WDL run.

.. figure:: /Figures/crom-exec.svg
    :scale: 100%

Each task of your workflow gets run inside the :bash:`execution` directory so it is here that you can find any output files including the stderr, stdout & script file.
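
When a run has several tasks, it can be tedious to open each :bash:`execution` directory by hand. The following sketch lists the return code of every task it finds. It assumes only that each task wrote an :bash:`rc` file inside a directory named :bash:`execution`; :bash:`list_return_codes` is a hypothetical helper name.

```shell
# Print "<return code><TAB><execution dir>" for every task under a Cromwell
# output root. Assumes each task wrote its return code to <...>/execution/rc.
# Usage: list_return_codes [root_dir]   (root_dir defaults to cromwell-executions)
list_return_codes() {
    local root="${1:-cromwell-executions}"
    find "$root" -type f -path '*/execution/rc' | sort | while IFS= read -r rc_file; do
        printf '%s\t%s\n' "$(cat "$rc_file")" "$(dirname "$rc_file")"
    done
}

# Example: list_return_codes cromwell-executions
```

Any line that does not start with :bash:`0` points to an execution directory whose stderr and stdout are worth reading first.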

Explanation of Cromwell-generated files:

.. dropdown:: stderr
    :color: info
    :animate: fade-in
    
    The stderr from any of the commands/scripts in your task should be in this file.
    
.. dropdown:: stdout
    :color: info
    :animate: fade-in
    
    The stdout from all the commands/scripts in your task should be in this file. Not all scripts send errors to stderr as they should, so you may find error messages in here instead.
    
.. dropdown:: script
    :color: info
    :animate: fade-in
    
    The script file is run by the script.submit file. It contains all the commands that you supplied in the `command{}` section of the WDL, as well as cromwell generated code that creates the stderr, stdout, and rc files.
    
.. dropdown:: script.submit
    :color: info
    :animate: fade-in
    
    This file contains the actual command that cromwell ran. If the file was created by JAWS, there is one more step before "script" gets run.
    
    .. code-block:: text
    
        script.submit -> dockerScript -> script

.. dropdown:: rc
    :color: info
    :animate: fade-in
    
    This file contains the return code for the `command{}` section of the WDL.
    Remember that the return code written to the rc file comes from the last command that ran. If an earlier command fails but the last command succeeds, the return code will be :bash:`0`, unless you used :bash:`set -e`, which forces an exit upon the first error.
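
The effect of :bash:`set -e` on the return code is easy to see outside of Cromwell. In this small sketch, :bash:`task_rc` is a hypothetical helper that runs a snippet in its own bash process, much like a task script, and prints the exit status.

```shell
# Run a snippet in its own bash process and print its exit status.
task_rc() {
    local rc=0
    bash -c "$1" || rc=$?
    echo "$rc"
}

# Without `set -e`: `false` fails, but the last command succeeds, so rc is 0.
echo "without set -e: rc=$(task_rc 'false; true')"

# With `set -e`: the script stops at the first failure, so rc is 1.
echo "with set -e:    rc=$(task_rc 'set -e; false; true')"
```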

These files are only seen in JAWS:


.. dropdown:: stdout.submit
    :color: info
    :animate: fade-in
    
    This file is created by `script.submit` and not by the script file and the content is not useful for debugging your task.
    
.. dropdown:: stderr.submit
    :color: info
    :animate: fade-in
    
    This file is created by script.submit and not by the script file which means there may be some useful error messages. 
    If there was a problem upstream of the task even starting, the error should be in this file.
    
.. dropdown:: dockerScript
    :color: info
    :animate: fade-in
    
    This file is created by `script.submit` and runs the script file.
    
    .. code-block:: text
    
       script.submit -> dockerScript -> script

**2. Cromwell's stdout**

When you ran :bash:`align_with_shifter.wdl` with cromwell above, you should have seen the following in the output:

1. the bash bbmap.sh and samtools commands that were run
2. paths to the output files from the workflow
3. WorkflowSucceededState
4. :bash:`Call-to-Backend`, which shows that we are running on the Local backend (the default)

Copy the path of one of the output execution directories and list its contents. Notice that the cromwell generated files and your :bash:`.sam` or :bash:`.bam` output are there.

.. note::
  You won't have access to this same cromwell standard output when you run through JAWS. The same information can be found in different ways.


*********************************************
Key Considerations for Using Docker with JAWS
*********************************************
    
When using Docker with JAWS, it is essential to follow these key considerations to ensure a smooth workflow execution:

**1. One Docker Image Per Task**

Cromwell, the workflow engine used by JAWS, requires each task to specify a single Docker image. 
Although a single Docker image can be reused across multiple tasks, each task must explicitly declare its image.

**JAWS Optimization**: JAWS automatically checks whether an image has already been pulled on the file system and skips pulling it again, streamlining the process.

**2. Public Container Images**

All container images used in JAWS must be publicly accessible. If your image is private, please contact the JAWS team for assistance.

**JGI Access**: For JGI users, JAWS has access to a paid DockerHub organization, allowing it to authenticate and pull private images securely.
Reach out to the JAWS team for help setting up access to private images if necessary.

**3. Prefer sha256 Over Version Tags**

It is highly recommended to use the `sha256` digest instead of version tags (e.g., `v1.0.1`). 
Version tags can be overwritten or reused, while `sha256` provides a unique and consistent identifier for the image.

- Finding the `sha256` Digest:

    - On a Docker machine:

    .. code-block:: text

        docker images --digests | grep <your_docker_hub_user_name>
    
    - Using `Shifter`:

    .. code-block:: text

        # on a shifter-machine
        shifterimg lookup ubuntu:16.04

    Replace the version tag (`16.04`) with the appropriate `sha256` digest for your image.

    - Example usage in a WDL file:

    .. code-block:: text

        runtime {
            docker: "ubuntu@sha256:20858ebbc96215d6c3c574f781133ebffdc7c18d98af4f294cc4c04871a6fe61"
        }

- Running Containers Interactively (Shifter)

    If you need to interactively access the container using Shifter, you can specify either the `sha256` digest or the version `tag`.

        .. code-block:: text

            shifter --image=id:20858ebbc96215d6c3c574f781133ebffdc7c18d98af4f294cc4c04871a6fe61
            # or
            shifter --image=ubuntu:16.04
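
Digests are long and easy to mistype when you copy them into a WDL. As a minimal sketch, the hypothetical helper :bash:`digest_reference` below checks that a digest looks like a sha256 value and prints the :bash:`repository@sha256:...` form used in the :bash:`runtime{}` example above. It does not contact any registry, so it cannot tell you whether the digest actually exists.

```shell
# Build a digest-pinned image reference for the docker: runtime attribute.
# Usage: digest_reference <repository> <digest>   (digest with or without "sha256:")
digest_reference() {
    local repo="$1"
    local digest="${2#sha256:}"   # strip the prefix if it was supplied
    # A sha256 digest is exactly 64 lowercase hexadecimal characters.
    if ! printf '%s' "$digest" | grep -Eq '^[0-9a-f]{64}$'; then
        echo "error: not a sha256 digest: $2" >&2
        return 1
    fi
    echo "$repo@sha256:$digest"
}

digest_reference ubuntu sha256:20858ebbc96215d6c3c574f781133ebffdc7c18d98af4f294cc4c04871a6fe61
```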

By following these guidelines, you can avoid common pitfalls and ensure a more reliable workflow execution in JAWS.