Creating Docker Containers

Summary

This is Lesson 2 of the JAWS tutorial series. In Lesson 1 you ran the BLAST workflow as plain bash inside a conda environment on your laptop. This lesson takes those same commands and packages them into a Docker image so the workflow can run on any machine with a container runtime, without anyone having to install BLAST first.

This is still a local development step; you’re not submitting to JAWS yet. The goal is to develop your code and build the container so that you can run and test it wherever you happen to be working: your laptop (which typically has Docker), Dori (Apptainer), or NERSC (Shifter). Once the container works, you’ll be ready to submit it to JAWS in a later lesson. JAWS itself uses Apptainer and Shifter on its compute sites, so a container that runs locally will also run there.

A quick vocabulary check before you start: an image is the built artifact, a snapshot of an operating system plus your installed tools and scripts; a container is a running instance of that image. You build an image once and run many containers from it, each isolated from the others. Apptainer and Shifter are alternative container runtimes used on HPC sites where security policy doesn’t allow Docker to run as root; they pull and execute the same Docker images you build, so you don’t need separate images for each.

If you’re unfamiliar with Docker, please see the official Get started with Docker guide.

By the end of this lesson you’ll have:

A Docker image built from the Dockerfile in blast_example/.
That image pushed to a registry (Docker Hub or the JGI GitLab Container Registry).
A confirmed test run of the BLAST workflow inside the container.

Prerequisites

Completed Lesson 1: Local Development Environment. You have the jaws-tutorial-examples repo cloned and blast.sh working inside your conda env.
Docker installed and running on your machine. See docs.docker.com/install for instructions per platform. Confirm with:
```
docker --version
docker run --rm hello-world
```
The hello-world test pulls a tiny image, runs it, and prints a confirmation message. If this fails you can’t continue until you fix Docker.
A free account on Docker Hub if you want to use Option A of the push step. If you’ll only push to library.jgi.doe.gov:5050 (JGI internal), you don’t need a Docker Hub account.
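
Because later steps use Docker on a laptop, Apptainer on Dori, and Shifter at NERSC, a small preflight check can report which runtime the current machine offers. This is a minimal sketch: the function name detect_runtime is made up for illustration and is not part of the tutorial repo.

```shell
#!/usr/bin/env bash
# detect_runtime is a hypothetical helper, not part of the tutorial repo.
# It prints the first container runtime found on PATH, or "none".
detect_runtime() {
    local rt
    for rt in docker apptainer shifter; do
        if command -v "$rt" >/dev/null 2>&1; then
            echo "$rt"
            return 0
        fi
    done
    echo "none"
    return 1
}

runtime=$(detect_runtime) || true
echo "Container runtime on this machine: $runtime"
```

Finding docker on PATH only shows that the client is installed, not that the daemon is running, so the hello-world run above remains the real test.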

Automating this with CI/CD

This tutorial walks through the build-and-push steps by hand so you understand what’s happening. Once you’ve done it by hand, you can switch to jaws-docker-builder, a template repo maintained by the JAWS team that automates the whole build-and-push cycle via GitLab CI. Every time you push a change to your Dockerfile or scripts, the pipeline rebuilds the image and pushes a new tag.

See the jaws-docker-builder README for setup. It supports both Docker Hub and library.jgi.doe.gov:5050 as destinations.

Step 1: Look at the Dockerfile

From Lesson 1, you should already be inside the example directory:

cd jaws-tutorial-examples/blast_example

The Dockerfile in that directory is short:

FROM ubuntu:22.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ncbi-blast+ \
    && rm -rf /var/lib/apt/lists/*

COPY blast.sh /usr/local/bin/blast.sh
RUN chmod +x /usr/local/bin/blast.sh

WORKDIR /work

Three things to notice:

FROM ubuntu:22.04: every image starts from a base image. Here we use the stock Ubuntu 22.04 image from Docker Hub. The RUN apt-get instruction installs BLAST+ from the Ubuntu package archive (the ncbi-blast+ package), which is the simplest install path on a Debian/Ubuntu base. Other tools may need conda inside the image, or a build-from-source step; the pattern is the same.
COPY blast.sh /usr/local/bin/blast.sh copies your script into a directory on the container’s PATH, so running blast.sh inside the container Just Works.
WORKDIR /work sets the container’s working directory. When you docker run --volume your data into /work (Step 3 below), the script will find data/reference.fasta and data/query.fasta relative to /work.

That’s the entire image. No miniconda, no extra dependencies; BLAST ships as a self-contained Ubuntu package.
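
Once the image is built (Step 2), one way to confirm these points is to ask a throwaway container for its working directory and the location of each tool. The sketch below is illustrative only: inspect_image and the DOCKER override variable are hypothetical names, not part of the tutorial repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the working directory and tool locations inside an image.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead of running it.
DOCKER=${DOCKER:-docker}

inspect_image() {
    local image="$1"
    # The loop runs inside the container; each tool is looked up on the container's PATH.
    "$DOCKER" run --rm "$image" sh -c \
        'pwd; for t in blast.sh makeblastdb blastn; do command -v "$t" || echo "missing: $t"; done'
}
```

Calling inspect_image <your-username>/blast-example:1.0.0 should print /work followed by one path per tool; any "missing:" line points at a COPY or apt-get step that did not take effect.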

What if my workflow needs conda inside the container?

BLAST is easy because Ubuntu ships an apt-get-installable package. Most other bioinformatics tools (the ones you used in Lesson 1’s conda env) don’t have an apt-get equivalent and you’ll want to install them via conda inside the image instead. The pattern looks like:

FROM ubuntu:22.04

# Tools needed to fetch and install miniconda
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget bzip2 ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install miniconda to a world-readable path (NOT /root/miniconda3 — see note below)
ENV CONDAPATH=/usr/local/miniconda3
ENV PATH=$CONDAPATH/bin:$PATH
RUN wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash Miniconda3-latest-Linux-x86_64.sh -b -p $CONDAPATH \
    && rm Miniconda3-latest-Linux-x86_64.sh

# Install your bioinformatics tools.
# Example: seqkit (fast FASTA/FASTQ manipulation) + samtools (SAM/BAM/CRAM utilities).
# Swap in whatever your workflow actually needs.
# Pin versions so the image is reproducible.
RUN conda install -c conda-forge -c bioconda -y \
        seqkit=2.8.2 \
        samtools=1.19 \
    && conda clean -afy

# Quick sanity check at build time: the build fails loudly if either tool is broken.
# (No "| head -1" here: a pipe would hide a samtools failure behind head's exit status.)
RUN seqkit version && samtools --version

COPY your_script.sh /usr/local/bin/your_script.sh
RUN chmod +x /usr/local/bin/your_script.sh

WORKDIR /work

Please pay special attention to the following three things to avoid problems:

Don’t install miniconda to its default path (/root/miniconda3). Shifter and Apptainer run the container as a non-root user that can’t read root’s home directory, so your tools will appear “not found” on JAWS compute sites even though they’re inside the image. Install to a world-readable path like /usr/local/miniconda3 (as above) and put $CONDAPATH/bin on the PATH via ENV.
conda activate doesn’t reliably work inside a Dockerfile or a non-interactive container shell. Don’t try to run conda activate myenv in a RUN line. Either install everything into the base env (as above), or put the env’s bin/ directory directly on PATH. Cromwell’s command block also runs non-interactively, so the script can’t rely on conda activation either.
Conda images get fat fast. Always run conda clean -afy in the same RUN step that does the install (so the cleanup ends up in the same Docker layer). Without it, your image can easily balloon to 2–3 GB and pulls become painfully slow on every JAWS run.

For private-registry workflows, the same conda recipe applies; just remember the call-caching caveat from Step 4 below.
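
The first pitfall only shows up when the container runs as a non-root user, so it can help to simulate that locally before pushing. The sketch below runs the image under an arbitrary unprivileged UID with docker run --user and checks that the tools are still found. The helper name check_nonroot, the UID 1000, and the DOCKER override are assumptions for illustration. Apptainer and Shifter run under your own UID on the site, so this only approximates their behavior.

```shell
#!/usr/bin/env bash
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead of running it.
DOCKER=${DOCKER:-docker}

# Hypothetical helper. Usage: check_nonroot <image> <tool> [<tool> ...]
check_nonroot() {
    local image="$1"
    shift
    # The remaining arguments are passed through to the in-container shell as "$@".
    "$DOCKER" run --rm --user 1000:1000 "$image" sh -c \
        'for t in "$@"; do command -v "$t" || { echo "not found as non-root: $t" >&2; exit 1; }; done' \
        sh "$@"
}
```

For the conda image above, check_nonroot <your-username>/<image-name>:<version> seqkit samtools should print both paths under /usr/local/miniconda3/bin. A "not found as non-root" message suggests the install landed somewhere only root can read.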

Step 2: Build the Image

Build the image, tagging it with your Docker Hub (or other) username and a version:

docker build --tag <your-username>/blast-example:1.0.0 .

The --tag (or -t) flag gives the image a human-readable name. The <namespace>/<image-name>:<version> convention matters when you push:

For Docker Hub (the public registry most open-source projects publish to): <your-docker-hub-username>/<image-name>:<version>.
For library.jgi.doe.gov:5050 (the JGI’s private GitLab container registry — use it for images you don’t want public): include the registry hostname, e.g. library.jgi.doe.gov:5050/<your-gitlab-namespace>/<image-name>:<version>.

You can re-tag an image later (Step 4), so don’t worry about getting the registry name right on the first build.
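
The two naming forms differ only in the optional registry hostname prefix, which a small helper can make explicit. This is a sketch: image_ref is a hypothetical function and alice is a placeholder namespace, neither of which comes from the tutorial repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose an image reference from its parts.
# Usage: image_ref <namespace> <image-name> <version> [registry-host]
image_ref() {
    local namespace="$1" name="$2" version="$3" registry="${4:-}"
    if [ -n "$registry" ]; then
        # Private registry: the hostname (and port) comes first.
        echo "$registry/$namespace/$name:$version"
    else
        # Docker Hub: no hostname needed.
        echo "$namespace/$name:$version"
    fi
}

image_ref alice blast-example 1.0.0
# prints alice/blast-example:1.0.0
image_ref alice blast-example 1.0.0 library.jgi.doe.gov:5050
# prints library.jgi.doe.gov:5050/alice/blast-example:1.0.0
```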

Verify the image exists:

docker images | grep blast-example

You should see one row showing the repository name, the tag you used, and a shortened image ID.

Step 3: Run the Image Locally

This is the moment of truth: prove that the same workflow that ran in your conda env in Lesson 1 also runs inside the container.

Important

Make sure you’re inside jaws-tutorial-examples/blast_example for the rest of this step. The --volume "$(pwd)/data:/work/data" flag below uses $(pwd) (your current directory), so it only mounts the right files if your shell is in blast_example/.

cd jaws-tutorial-examples/blast_example
pwd
# .../jaws-tutorial-examples/blast_example   <-- should end in this

First, run the script with no arguments to confirm BLAST is installed inside the image:

docker run --rm <your-username>/blast-example:1.0.0 blast.sh

You should see the paths to makeblastdb and blastn (now under /usr/bin/ instead of your conda env). If they’re not found, the apt-get install step in the Dockerfile didn’t complete cleanly; rebuild.

Now try to run the actual workflow, without mounting anything yet. This will fail in an instructive way:

docker run --rm <your-username>/blast-example:1.0.0 \
    blast.sh data/reference.fasta data/query.fasta

You’ll see something like:

Building a new DB, current time: 05/25/2026 02:40:53
New DB name:   /work/blastdb/ref
New DB title:  data/reference.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: File data/reference.fasta does not exist

That’s the key lesson about containers: the container has its own filesystem and cannot see your laptop’s files unless you explicitly hand them in. Your data/ directory lives on your laptop; inside the container, /work/data doesn’t exist. makeblastdb started, printed its header, and then stopped because there is no data/reference.fasta in the container’s view of the world.

Fix it by mounting data/ into the container with --volume:

docker run --rm \
    --volume "$(pwd)/data:/work/data" \
    <your-username>/blast-example:1.0.0 \
    blast.sh data/reference.fasta data/query.fasta

What’s happening:

--rm removes the container after it exits. Good hygiene; otherwise stopped containers accumulate.
--volume "$(pwd)/data:/work/data" mounts your local data/ directory at /work/data inside the container. The WORKDIR /work from the Dockerfile means the script’s data/reference.fasta argument resolves to /work/data/reference.fasta inside the container.
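
With --volume, Docker creates a missing host directory as an empty directory rather than failing, so a run from the wrong directory produces the same "does not exist" error as no mount at all. A small guard can catch that before the container starts. This is a sketch under assumptions: data_mount is a hypothetical helper, not part of the tutorial repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a --volume spec for data/, or fail if the inputs are missing.
# Usage: data_mount [project-dir]   (defaults to the current directory)
data_mount() {
    local project_dir="${1:-$(pwd)}"
    local host_dir="$project_dir/data"
    if [ ! -f "$host_dir/reference.fasta" ] || [ ! -f "$host_dir/query.fasta" ]; then
        echo "error: expected reference.fasta and query.fasta in $host_dir" >&2
        return 1
    fi
    # Host path on the left, container path on the right.
    echo "$host_dir:/work/data"
}
```

One way to use it is mount=$(data_mount) && docker run --rm --volume "$mount" <your-username>/blast-example:1.0.0 blast.sh data/reference.fasta data/query.fasta, so the container only starts when the inputs are actually there.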

Expected output:

Building a new DB, current time: 05/25/2026 02:53:24
New DB name:   /work/blastdb/ref
New DB title:  data/reference.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 5 sequences in 0.00282598 seconds.


2 of 3 query sequences had at least one BLAST hit.

The last line is the one to look for: “2 of 3 query sequences had at least one BLAST hit.” It should match Lesson 1. The timestamp and the “0.00282598 seconds” figure will differ on your machine; everything else should match.

Because the output files (summary.txt, hits.tsv, blastdb/) are written inside the container’s /work, they disappear when the container exits and --rm removes it. If you want them on disk, mount the current directory too:

docker run --rm \
    --volume "$(pwd):/work" \
    <your-username>/blast-example:1.0.0 \
    blast.sh data/reference.fasta data/query.fasta
cat summary.txt
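
After a run with the whole directory mounted, the outputs named above should be sitting next to data/. The sketch below checks for them. check_outputs is a hypothetical helper, and it assumes the tutorial data, where hits.tsv is expected to be non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the run left its outputs in the given directory.
# File names are the ones this lesson lists: summary.txt, hits.tsv, blastdb/.
# Usage: check_outputs [dir]   (defaults to the current directory)
check_outputs() {
    local dir="${1:-.}" status=0 f
    for f in summary.txt hits.tsv; do
        # -s is true only for files that exist and are non-empty.
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    if [ ! -d "$dir/blastdb" ]; then
        echo "missing directory: $dir/blastdb" >&2
        status=1
    fi
    return "$status"
}
```

Running check_outputs && echo "outputs present" from blast_example/ gives a quick yes/no before you move on to pushing the image.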

Step 4: Push to a Registry

Other people (and JAWS’ compute sites) can’t use an image that exists only on your laptop. You need to push it to a registry. You have two options.

Option A: Docker Hub (public)

The simplest path. Works for images that can be public.

docker login                                       # prompts for your Docker Hub credentials
docker push <your-username>/blast-example:1.0.0

Anyone (including JAWS) can now pull this image by name.

Option B: JGI GitLab Container Registry (library.jgi.doe.gov:5050)

The JGI’s private GitLab container registry, introduced in Step 2. Use it for JGI-internal images, or when you’d rather not depend on Docker Hub. You authenticate with your code.jgi.doe.gov credentials.

echo "<your-gitlab-password>" | docker login library.jgi.doe.gov:5050 -u <username> --password-stdin

# Re-tag the image you built in Step 2 to use the JGI registry hostname.
docker tag <your-username>/blast-example:1.0.0 \
           library.jgi.doe.gov:5050/<your-namespace>/blast-example:1.0.0

docker push library.jgi.doe.gov:5050/<your-namespace>/blast-example:1.0.0

Once pushed, JAWS pulls the image automatically when your workflow runs. You won’t need to docker pull it manually unless you want to test it on another machine.
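
The re-tag and push for Option B can be wrapped in one function so that the remote name is derived from the local one rather than typed twice. This is a sketch under assumptions: push_to_jgi and the DOCKER override are hypothetical names, and it assumes docker login against the registry has already succeeded.

```shell
#!/usr/bin/env bash
# DOCKER can be overridden (e.g. DOCKER=echo) to print the commands instead of running them.
DOCKER=${DOCKER:-docker}
JGI_REGISTRY=library.jgi.doe.gov:5050

# Hypothetical helper. Usage: push_to_jgi <local-image> <gitlab-namespace>
push_to_jgi() {
    local local_image="$1" namespace="$2"
    # Keep only "<image-name>:<version>" from "<your-username>/<image-name>:<version>".
    local name_and_tag="${local_image##*/}"
    local remote_image="$JGI_REGISTRY/$namespace/$name_and_tag"
    # Push only if the tag step succeeded.
    "$DOCKER" tag "$local_image" "$remote_image" &&
        "$DOCKER" push "$remote_image"
}
```

Setting DOCKER=echo before calling push_to_jgi <your-username>/blast-example:1.0.0 <your-namespace> prints the two commands without touching the registry, which is a cheap way to check the names first.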

Important

Call-caching is a Cromwell feature that lets JAWS skip a task and reuse its previous output when neither the inputs nor the Docker image have changed — invaluable when you’re iterating and re-running a workflow after fixing a single task. It does not work for images served from a private registry, including library.jgi.doe.gov:5050, because Cromwell can’t perform the SHA256 digest lookup against a registry it doesn’t have read credentials for. If you expect to re-run this workflow, prefer Docker Hub.

Step 5: Test the Image on Another Site

The whole point of containerizing was portability. To confirm it, test the image on a machine that has neither your conda environment nor a local copy of the image.

Dori is a good place to do this. It runs Apptainer rather than Docker, but Apptainer pulls and runs Docker images directly from a registry, so the image you built and pushed in Steps 2-4 works as-is.

# SSH to Dori
ssh <username>@dori.jgi.doe.gov

# Grab the tutorial data
git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
cd jaws-tutorial-examples/blast_example

# Pull the image from the registry (Docker Hub in this example).
# Apptainer converts the Docker image into a local .sif file.
apptainer pull docker://<your-username>/blast-example:1.0.0

# Run the workflow. --bind is Apptainer's equivalent of docker's --volume:
# it makes your current directory visible inside the container at /work.
apptainer exec \
    --bind "$(pwd):/work" \
    --pwd /work \
    blast-example_1.0.0.sif \
    blast.sh data/reference.fasta data/query.fasta

If you see the same 2 of 3 query sequences had at least one BLAST hit. output as in Lesson 1 and in your local Docker run, you’ve successfully shipped a portable workflow.
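
The .sif file name used in the apptainer exec command follows Apptainer’s default naming for pulled images, <image-name>_<tag>.sif. A script that pulls and then runs can derive that name rather than hard-coding it. This is a minimal sketch: sif_name is a hypothetical helper and alice is a placeholder namespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: predict the file name that "apptainer pull docker://<ref>"
# writes by default, i.e. <image-name>_<tag>.sif in the current directory.
sif_name() {
    local ref="${1#docker://}"        # drop an optional docker:// prefix
    local name_and_tag="${ref##*/}"   # drop registry host and namespace
    case "$name_and_tag" in
        *:*) echo "${name_and_tag%%:*}_${name_and_tag#*:}.sif" ;;
        *)   echo "${name_and_tag}_latest.sif" ;;   # no tag given: "latest" is implied
    esac
}

sif_name docker://alice/blast-example:1.0.0
# prints blast-example_1.0.0.sif
```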

Note

You don’t normally invoke apptainer pull / apptainer exec by hand for JAWS runs; JAWS does it for you when the workflow lands on the site. The point of running it manually here is to prove the image works on a JAWS compute site before you wrap it in WDL (Lesson 3).

Key Considerations

Some guidelines to know before you start building images for production workflows.

One Docker Image per WDL Task

When you wrap your workflow in WDL (Lesson 3), each task’s runtime { docker: ... } block names exactly one image. A task cannot use multiple images. You can reuse the same image across many tasks though, which is the common pattern for a single-tool workflow.

JAWS optimization: JAWS checks whether an image has already been pulled on the file system and skips re-pulling it, so reusing one image across multiple tasks is efficient.

Public Images

All container images JAWS uses must be readable by the JAWS service identity. Public Docker Hub images work out of the box. Private Docker Hub repositories work if JAWS has been configured with credentials (the JAWS team maintains a paid Docker Hub organization for JGI users; ask in #jaws if you need access). The private JGI GitLab registry (library.jgi.doe.gov:5050) also works for pulling the image, but as noted in Step 4 above, call-caching doesn’t work for it.

Prefer SHA256 Digests Over Tags

When you reference the image in your WDL runtime block, use the SHA256 digest rather than a version tag:

runtime {
    # Good: digest is immutable
    docker: "<your-username>/blast-example@sha256:abc123…"

    # Less good: tag can be moved silently to a different image
    docker: "<your-username>/blast-example:1.0.0"
}

Tags are mutable; someone can push a new image under the same tag and silently break your reproducibility (and your call-caching). The digest is immutable and uniquely identifies a specific image build.

To find the digest of an image you just pushed:

docker images --digests | grep blast-example
# Or print only the repository digests (populated once the image has been pushed or pulled):
docker inspect --format='{{.RepoDigests}}' <your-username>/blast-example:1.0.0

On a Shifter-equipped host (NERSC):

shifterimg lookup <your-username>/blast-example:1.0.0
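
Before committing a WDL, a quick pattern check can tell whether an image reference is pinned by digest or only by tag. The sketch below is illustrative: is_digest_pinned is a hypothetical helper, alice is a placeholder namespace, and the check only looks at the shape of the string, not at whether the digest exists in a registry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if an image reference ends in a full SHA256 digest
# (the literal "@sha256:" followed by exactly 64 lowercase hex characters).
is_digest_pinned() {
    printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

ref="alice/blast-example:1.0.0"
if is_digest_pinned "$ref"; then
    echo "pinned: $ref"
else
    echo "not pinned (tag only): $ref"
fi
# prints: not pinned (tag only): alice/blast-example:1.0.0
```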

Troubleshooting

What’s Next

You now have a portable Docker image of your workflow. Next:

Lesson 3: Writing WDLs, which wraps the same containerized commands into a WDL workflow that Cromwell (and JAWS) can execute. The WDL’s runtime { docker: ... } block will reference the image you just pushed.