Creating Docker Containers
Summary
This is Lesson 2 of the JAWS tutorial series. In Lesson 1 you ran the BLAST workflow as plain bash inside a conda environment on your laptop. This lesson takes those same commands and packages them into a Docker image so the workflow can run on any machine with a container runtime, without anyone having to install BLAST first.
This is still a local development step; you’re not submitting to JAWS yet. The goal is to develop your code and build the container so that you can run and test it wherever you happen to be working: your laptop (which typically has Docker), Dori (Apptainer), or NERSC (Shifter). Once the container works, you’ll be ready to submit it to JAWS in a later lesson. JAWS itself uses Apptainer and Shifter on its compute sites, so a container that runs locally will also run there.
A quick vocabulary check before you start: an image is the built artifact, a snapshot of an operating system plus your installed tools and scripts; a container is a running instance of that image. You build an image once and run many containers from it, each isolated from the others. Apptainer and Shifter are alternative container runtimes used on HPC sites where security policy doesn’t allow Docker to run as root; they pull and execute the same Docker images you build, so you don’t need separate images for each.
If you’re unfamiliar with Docker, please see the official Get started with Docker guide.
By the end of this lesson you’ll have:
A Docker image built from the Dockerfile in blast_example/.
That image pushed to a registry (Docker Hub or the JGI GitLab Container Registry).
A confirmed test run of the BLAST workflow inside the container.
Prerequisites
Completed Lesson 1: Local Development Environment. You have the jaws-tutorial-examples repo cloned and blast.sh working inside your conda env.
Docker installed and running on your machine. See docs.docker.com/install for instructions per platform. Confirm with:
docker --version
docker run --rm hello-world
The hello-world test pulls a tiny image, runs it, and prints a confirmation message. If this fails, you can’t continue until you fix Docker.
A free account on Docker Hub if you want to use Option A of the push step. If you’ll only push to library.jgi.doe.gov:5050 (JGI internal), you don’t need a Docker Hub account.
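If you’d like to script the prerequisite check (handy when you move between your laptop, Dori, and Perlmutter), here is a minimal sketch. The function name require_tools is hypothetical, not something the tutorial repo provides; it relies only on the shell built-in command -v.

```shell
# Hypothetical helper (not part of jaws-tutorial-examples): report every named
# command that is not on PATH. Returns 0 only if all of them are found.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_tools docker git reports which of the two is missing before you try docker run --rm hello-world.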
Automating this with CI/CD
This tutorial walks through the build-and-push steps by hand so you understand what’s happening. After you’ve done it manually once, you can switch to jaws-docker-builder, a template repo maintained by the JAWS team that automates the whole build-and-push cycle via GitLab CI. Every time you push a change to your Dockerfile or scripts, the pipeline rebuilds the image and pushes a new tag.
See the jaws-docker-builder README for setup. It supports both Docker Hub and library.jgi.doe.gov:5050 as destinations.
Step 1: Look at the Dockerfile
From Lesson 1, you should already be inside the example directory:
cd jaws-tutorial-examples/blast_example
The Dockerfile in that directory is short:
FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ncbi-blast+ \
    && rm -rf /var/lib/apt/lists/*
COPY blast.sh /usr/local/bin/blast.sh
RUN chmod +x /usr/local/bin/blast.sh
WORKDIR /work
Three things to notice:
FROM ubuntu:22.04: every image starts from a base image. Here we use the stock Ubuntu 22.04 image from Docker Hub. The apt-get instruction installs BLAST+ from the Ubuntu package archive (the ncbi-blast+ package), which is the simplest install path on a Debian/Ubuntu base. Other tools may need conda inside the image, or a build-from-source step; the pattern is the same.
COPY blast.sh /usr/local/bin/blast.sh copies your script into a directory on the container’s PATH, so (together with the chmod +x on the next line) running blast.sh inside the container Just Works.
WORKDIR /work sets the container’s working directory. When you docker run --volume your data into /work (Step 3 below), the script will find data/reference.fasta and data/query.fasta relative to /work.
That’s the entire image. No miniconda, no extra dependencies; BLAST ships as a self-contained Ubuntu package.
What if my workflow needs conda inside the container?
BLAST is easy because Ubuntu ships an apt-get-installable package. Many other bioinformatics tools (like the ones you used in Lesson 1’s conda env) have no apt-get package, or only an outdated one, so you’ll want to install them via conda inside the image instead. The pattern looks like:
FROM ubuntu:22.04
# Tools needed to fetch and install miniconda
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget bzip2 ca-certificates \
    && rm -rf /var/lib/apt/lists/*
# Install miniconda to a world-readable path (NOT /root/miniconda3; see note below)
ENV CONDAPATH=/usr/local/miniconda3
ENV PATH=$CONDAPATH/bin:$PATH
RUN wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash Miniconda3-latest-Linux-x86_64.sh -b -p $CONDAPATH \
    && rm Miniconda3-latest-Linux-x86_64.sh
# Install your bioinformatics tools.
# Example: seqkit (fast FASTA/FASTQ manipulation) + samtools (SAM/BAM/CRAM utilities).
# Swap in whatever your workflow actually needs.
# Pin versions so the image is reproducible.
RUN conda install -c conda-forge -c bioconda -y \
        seqkit=2.8.2 \
        samtools=1.19 \
    && conda clean -afy
# Quick sanity check at build time: the build fails loudly if either tool is broken.
# (No "| head -1" here: under /bin/sh a pipeline returns only its last command's status.)
RUN seqkit version && samtools --version
COPY your_script.sh /usr/local/bin/your_script.sh
RUN chmod +x /usr/local/bin/your_script.sh
WORKDIR /work
Pay special attention to the following three things to avoid problems:
Don’t install miniconda to its default path (/root/miniconda3). Shifter and Apptainer run the container as your own non-root user, who can’t read root’s home directory, so your tools will appear “not found” on JAWS compute sites even though they’re inside the image. Install to a world-readable path like /usr/local/miniconda3 (as above) and put $CONDAPATH/bin on the PATH via ENV.
conda activate doesn’t reliably work inside a Dockerfile or a non-interactive container shell. Don’t try to conda activate myenv in a RUN line. Either install everything into the base env (as above), or put the env’s bin/ directory directly on PATH. Cromwell’s command block also runs non-interactively, so the script can’t rely on conda activation either.
Conda images get fat fast. Always run conda clean -afy in the same RUN step that does the install, so the cleanup lands in the same Docker layer; files deleted in a later layer still count toward the image size. Without it, your image can easily balloon to 2–3 GB, and pulls onto each compute site become painfully slow.
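Once you’ve built a conda-based image, you can check whether it has ballooned by reading its size from docker image inspect. The sketch below is hypothetical: check_image_size and its 1024 MiB default budget are illustrative choices, not JAWS limits.

```shell
# Hypothetical size check (not part of the tutorial repo). Reads the image size
# in bytes via `docker image inspect` and compares it with a budget in MiB.
# The 1024 MiB default is an arbitrary example, not a JAWS limit.
check_image_size() {
  local image="$1" budget_mib="${2:-1024}" bytes mib
  bytes=$(docker image inspect --format '{{.Size}}' "$image") || return 1
  mib=$(( bytes / 1024 / 1024 ))
  echo "$image: ${mib} MiB (budget ${budget_mib} MiB)"
  [ "$mib" -le "$budget_mib" ]
}
```

If the check fails, docker history <image> lists each layer with its size, which usually points straight at a RUN step that is missing conda clean -afy.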
For private-registry workflows, the same conda recipe applies; just remember the call-caching caveat from Step 4 below.
Step 2: Build the Image
Build the image, tagging it with your Docker Hub (or other) username and a version:
docker build --tag <your-username>/blast-example:1.0.0 .
The --tag (or -t) gives the image a human-readable name. The convention <namespace>/<image-name>:<version> matters when you push:
For Docker Hub (the public registry most open-source projects publish to): <your-docker-hub-username>/<image-name>:<version>.
For library.jgi.doe.gov:5050 (the JGI’s private GitLab container registry; use it for images you don’t want public): include the registry hostname, e.g. library.jgi.doe.gov:5050/<your-gitlab-namespace>/<image-name>:<version>.
You can re-tag an image later (Step 4), so don’t worry about getting the registry name right on the first build.
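Because the registry prefix is the part that most often goes wrong, it can help to build the tag in one place. A minimal sketch, using a hypothetical image_ref function that follows the <namespace>/<image-name>:<version> convention above (leave the registry empty for Docker Hub):

```shell
# Hypothetical helper (not part of the tutorial repo): assemble an image
# reference as [registry/]namespace/name:version.
image_ref() {
  local registry="$1" namespace="$2" name="$3" version="$4"
  if [ -n "$registry" ]; then
    printf '%s/%s/%s:%s\n' "$registry" "$namespace" "$name" "$version"
  else
    printf '%s/%s:%s\n' "$namespace" "$name" "$version"
  fi
}
```

With a hypothetical Docker Hub user alice, image_ref "" alice blast-example 1.0.0 prints alice/blast-example:1.0.0, so docker build --tag "$(image_ref "" alice blast-example 1.0.0)" . builds with that tag.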
Verify the image exists:
docker images | grep blast-example
You should see one row, with the tag you used and a truncated (12-character) image ID.
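grep also matches partial names, so in scripts a stricter check is to ask Docker about the exact tag. The helpers below are hypothetical; they only wrap docker image inspect, which exits non-zero when the image doesn’t exist and can print a single field with --format.

```shell
# Hypothetical helpers (not part of the tutorial repo).
# image_exists: succeed only if this exact image reference exists locally.
image_exists() {
  docker image inspect "$1" >/dev/null 2>&1
}

# image_workdir: print the image's configured working directory, which should
# be /work because of the WORKDIR line in the Dockerfile.
image_workdir() {
  docker image inspect --format '{{.Config.WorkingDir}}' "$1"
}
```

For example, image_workdir <your-username>/blast-example:1.0.0 should print /work, confirming that the WORKDIR line from Step 1 made it into the image.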
Step 3: Run the Image Locally
This is the moment of truth: prove that the same workflow that ran in your conda env in Lesson 1 also runs inside the container.
Important
Make sure you’re inside jaws-tutorial-examples/blast_example for the rest of this step. The --volume "$(pwd)/data:/work/data" flag below uses $(pwd) (your current directory), so it only mounts the right files if your shell is in blast_example/.
cd jaws-tutorial-examples/blast_example
pwd
# .../jaws-tutorial-examples/blast_example <-- should end in this
First, run the script with no arguments to confirm BLAST is installed inside the image:
docker run --rm <your-username>/blast-example:1.0.0 blast.sh
You should see the paths to makeblastdb and blastn (now under /usr/bin/ instead of your conda env). If they’re not found, the apt-get install step in the Dockerfile didn’t complete cleanly; rebuild.
Now try to run the actual workflow, without mounting anything yet. This will fail in an instructive way:
docker run --rm <your-username>/blast-example:1.0.0 \
blast.sh data/reference.fasta data/query.fasta
You’ll see something like:
Building a new DB, current time: 05/25/2026 02:40:53
New DB name: /work/blastdb/ref
New DB title: data/reference.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: File data/reference.fasta does not exist
That’s the key lesson about containers: the container has its own filesystem and cannot see your laptop’s files unless you explicitly hand them in. Your data/ directory lives on your laptop; inside the container, /work has no data/ directory at all. makeblastdb happily started, then failed with the error above, because there’s no data/reference.fasta in the container’s view of the world.
Fix it by mounting data/ into the container with --volume:
docker run --rm \
--volume "$(pwd)/data:/work/data" \
<your-username>/blast-example:1.0.0 \
blast.sh data/reference.fasta data/query.fasta
What’s happening:
--rm removes the container after it exits. Good hygiene; otherwise stopped containers accumulate.
--volume "$(pwd)/data:/work/data" mounts your local data/ directory at /work/data inside the container. The WORKDIR /work from the Dockerfile means the script’s data/reference.fasta argument resolves to /work/data/reference.fasta inside the container.
Expected output:
Building a new DB, current time: 05/25/2026 02:53:24
New DB name: /work/blastdb/ref
New DB title: data/reference.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 5 sequences in 0.00282598 seconds.
2 of 3 query sequences had at least one BLAST hit.
The last line is the one to look for: “2 of 3 query sequences had at least one BLAST hit.” It’s the same result as in Lesson 1. The timestamp and the “0.00282598 seconds” figure will differ on your machine; everything else should match.
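If you run this often, a small wrapper can fold in the directory check from the Important note above and a pass/fail check on the summary line. Both function names are hypothetical, and the sketch assumes the summary line appears somewhere in blast.sh’s combined stdout/stderr, as the expected output above suggests.

```shell
# Hypothetical wrappers (not part of the tutorial repo).
#
# run_blast: the docker run command above, plus a guard for the "wrong
# directory" mistake. With --volume, Docker typically creates a missing host
# path as an empty directory, which would otherwise surface as a BLAST error.
run_blast() {
  local image="$1"
  if [ ! -d data ]; then
    echo "error: no ./data here; cd into blast_example/ first" >&2
    return 1
  fi
  docker run --rm \
    --volume "$(pwd)/data:/work/data" \
    "$image" \
    blast.sh data/reference.fasta data/query.fasta
}

# check_blast_output: read a run's output on stdin and succeed only if it
# contains the summary line this lesson expects.
check_blast_output() {
  grep -qF '2 of 3 query sequences had at least one BLAST hit.'
}
```

Usage: run_blast <your-username>/blast-example:1.0.0 2>&1 | check_blast_output && echo PASS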
Because the output files (summary.txt, hits.tsv, blastdb/) are written inside the container’s own /work rather than to your disk, they disappear when the container exits; that’s the --rm flag at work. If you want them on disk, mount your whole current directory at /work instead:
docker run --rm \
--volume "$(pwd):/work" \
<your-username>/blast-example:1.0.0 \
blast.sh data/reference.fasta data/query.fasta
cat summary.txt
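On a Linux host, Docker runs the container as root by default, so summary.txt, hits.tsv, and blastdb/ can end up owned by root in your checkout. A sketch of the same command with Docker’s --user flag, wrapped in a hypothetical run_blast_as_me function; it assumes blast.sh only needs to write under the mounted /work.

```shell
# Hypothetical variant of the command above (not part of the tutorial repo).
# --user runs blast.sh as your own UID:GID instead of root, so the files it
# writes into the mounted directory are owned by you. Apptainer and Shifter
# also run containers as the invoking non-root user, so this is a closer
# rehearsal of what happens on JAWS compute sites.
run_blast_as_me() {
  docker run --rm \
    --user "$(id -u):$(id -g)" \
    --volume "$(pwd):/work" \
    "$1" \
    blast.sh data/reference.fasta data/query.fasta
}
```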
Step 4: Push to a Registry
Other people (and JAWS’ compute sites) can’t use an image that exists only on your laptop. You need to push it to a registry. You have two options.
Option A: Docker Hub (public)
The simplest path. Works for images that can be public.
docker login # prompts for your Docker Hub credentials
docker push <your-username>/blast-example:1.0.0
Anyone (including JAWS) can now pull this image by name.
Option B: JGI GitLab Container Registry (library.jgi.doe.gov:5050)
The JGI’s private GitLab container registry, introduced in Step 2. Use it for JGI-internal images, or when you’d rather not depend on Docker Hub. You authenticate with your code.jgi.doe.gov credentials.
echo "<your-gitlab-password>" | docker login library.jgi.doe.gov:5050 -u <username> --password-stdin
# Re-tag the image you built in Step 2 to use the JGI registry hostname.
docker tag <your-username>/blast-example:1.0.0 \
library.jgi.doe.gov:5050/<your-namespace>/blast-example:1.0.0
docker push library.jgi.doe.gov:5050/<your-namespace>/blast-example:1.0.0
Once pushed, JAWS pulls the image automatically when your workflow runs. You won’t need to docker pull it manually unless you want to test it on another machine.
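Typing your password into an echo command leaves it in your shell history. A sketch of the same login and re-tag/push steps that reads the secret from an environment variable instead; the function names and the JGI_REGISTRY_TOKEN variable are hypothetical. If your code.jgi.doe.gov account uses two-factor authentication, GitLab generally expects a personal access token here rather than your password.

```shell
# Hypothetical helpers (not part of the tutorial repo).
#
# jgi_login: pipe the secret from JGI_REGISTRY_TOKEN to --password-stdin so it
# never appears on the command line. $1 is your code.jgi.doe.gov username.
jgi_login() {
  if [ -z "${JGI_REGISTRY_TOKEN:-}" ]; then
    echo "set JGI_REGISTRY_TOKEN first" >&2
    return 1
  fi
  printf '%s' "$JGI_REGISTRY_TOKEN" |
    docker login library.jgi.doe.gov:5050 -u "$1" --password-stdin
}

# push_to_jgi: re-tag a local image under the JGI registry and push it, e.g.
# "alice/blast-example:1.0.0" + "mygroup" becomes
# "library.jgi.doe.gov:5050/mygroup/blast-example:1.0.0".
push_to_jgi() {
  local src="$1" dest
  dest="library.jgi.doe.gov:5050/$2/${src##*/}"
  docker tag "$src" "$dest" && docker push "$dest"
}
```

In bash, read -rs JGI_REGISTRY_TOKEN && export JGI_REGISTRY_TOKEN sets the variable without echoing what you type.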
Important
Call-caching is a Cromwell feature that lets JAWS skip a task and reuse its previous output when neither the inputs nor the Docker image have changed — invaluable when you’re iterating and re-running a workflow after fixing a single task. It does not work for images served from a private registry, including library.jgi.doe.gov:5050, because Cromwell can’t perform the SHA256 digest lookup against a registry it doesn’t have read credentials for. If you expect to re-run this workflow, prefer Docker Hub.
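To see the kind of value that digest lookup resolves a tag to, you can ask Docker for the digest recorded when you pushed or pulled the image. A minimal sketch with a hypothetical image_digest function; RepoDigests is empty for an image that has only ever been built locally, so run it after docker push.

```shell
# Hypothetical helper (not part of the tutorial repo): print the sha256 digest
# recorded for an image you have pushed or pulled. If RepoDigests is empty,
# the template lookup fails and this returns non-zero.
image_digest() {
  local ref
  ref=$(docker image inspect --format '{{index .RepoDigests 0}}' "$1") || return 1
  printf '%s\n' "${ref#*@}"   # "alice/blast-example@sha256:..." -> "sha256:..."
}
```

The digest identifies the exact image contents, whereas a tag like 1.0.0 can be re-pushed to point at different contents.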
Step 5: Test the Image on Another Site
The whole point of containerizing was portability. To confirm that the container is portable, test it on a machine that has neither your conda environment nor a local copy of the image.
You can test this on Dori. Dori runs Apptainer rather than Docker, but Apptainer pulls and runs Docker images directly from a registry, so the image you built and pushed in Steps 2–4 works as-is.
# SSH to Dori
ssh <username>@dori.jgi.doe.gov
# Grab the tutorial data
git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
cd jaws-tutorial-examples/blast_example
# Pull the image from the registry (Docker Hub in this example).
# Apptainer converts the Docker image into a local .sif file.
apptainer pull docker://<your-username>/blast-example:1.0.0
# Run the workflow. --bind is Apptainer's equivalent of docker's --volume:
# it makes your current directory visible inside the container at /work.
apptainer exec \
--bind "$(pwd):/work" \
--pwd /work \
blast-example_1.0.0.sif \
blast.sh data/reference.fasta data/query.fasta
If you see the same 2 of 3 query sequences had at least one BLAST hit. output as in Lesson 1 and in your local Docker run, you’ve successfully shipped a portable workflow.
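When you rerun this on Dori, you can skip the pull if the .sif is already there. A sketch under one assumption: that apptainer pull names its output <name>_<tag>.sif (as with blast-example_1.0.0.sif above) and uses latest for untagged references. Both function names are hypothetical.

```shell
# Hypothetical helpers (not part of the tutorial repo).
#
# sif_name: predict the file `apptainer pull docker://REF` writes by default,
# assuming <name>_<tag>.sif naming and "latest" for untagged references.
sif_name() {
  local ref="${1##*/}"   # drop registry/namespace: blast-example:1.0.0
  case $ref in
    *:*) printf '%s_%s.sif\n' "${ref%%:*}" "${ref#*:}" ;;
    *)   printf '%s_latest.sif\n' "$ref" ;;
  esac
}

# run_blast_apptainer: pull only if the .sif isn't already here, then run the
# same apptainer exec command as above.
run_blast_apptainer() {
  local sif
  sif=$(sif_name "$1")
  if [ ! -f "$sif" ]; then
    apptainer pull "docker://$1" || return 1
  fi
  apptainer exec --bind "$(pwd):/work" --pwd /work "$sif" \
    blast.sh data/reference.fasta data/query.fasta
}
```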
Note
You don’t normally invoke apptainer pull / apptainer exec by hand for JAWS runs; JAWS does it for you when the workflow lands on the site.
The point of running it manually here is to prove the image works on a JAWS compute site before you wrap it in WDL (Lesson 3).
Testing on NERSC (Perlmutter / Shifter) instead
NERSC machines use Shifter instead of Apptainer. The image is the same; only the commands change:
# On perlmutter-p1.nersc.gov
shifterimg pull <your-username>/blast-example:1.0.0
git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
cd jaws-tutorial-examples/blast_example
shifter --image=<your-username>/blast-example:1.0.0 blast.sh data/reference.fasta data/query.fasta
Same expected output. JAWS picks the right runtime (Shifter on NERSC sites, Apptainer on JGI/Tahoma) automatically, so you don’t have to know which is which when you submit a workflow.
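JAWS chooses the runtime for you, but when you’re hand-testing across machines a small dispatcher saves retyping. This is a hypothetical sketch, not part of JAWS: it picks the first of docker, apptainer, or shifter found on PATH (override with RUNTIME), and reuses the exact commands from this lesson. For Apptainer it expects the already-pulled .sif as the second argument.

```shell
# Hypothetical wrapper (not part of JAWS or the tutorial repo) for hand-testing
# the same image wherever you are. Set RUNTIME=docker|apptainer|shifter to
# override the automatic choice. For Apptainer, pass the .sif file as $2.
first_available() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      return 0
    fi
  done
  return 1
}

run_blast_anywhere() {
  local image="$1" sif="$2" runtime
  runtime="${RUNTIME:-$(first_available docker apptainer shifter)}"
  case $runtime in
    docker)
      docker run --rm --volume "$(pwd):/work" "$image" \
        blast.sh data/reference.fasta data/query.fasta ;;
    apptainer)
      apptainer exec --bind "$(pwd):/work" --pwd /work "$sif" \
        blast.sh data/reference.fasta data/query.fasta ;;
    shifter)
      shifter --image="$image" blast.sh data/reference.fasta data/query.fasta ;;
    *)
      echo "no container runtime found (or unknown RUNTIME=$runtime)" >&2
      return 1 ;;
  esac
}
```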
Key Considerations
Guidelines to keep in mind before you start building images for production workflows.
One Docker Image per WDL Task
When you wrap your workflow in WDL (Lesson 3), each task’s runtime { docker: ... } block names exactly one image. A task cannot use multiple images. You can reuse the same image across many tasks though, which is the common pattern for a single-tool workflow.
JAWS optimization: JAWS checks whether an image has already been pulled on the file system and skips re-pulling it, so reusing one image across multiple tasks is efficient.
Public Images
All container images JAWS uses must be readable by the JAWS service identity. Public Docker Hub images work out of the box. Private Docker Hub repositories work if JAWS has been configured with credentials (the JAWS team maintains a paid Docker Hub organization for JGI users; ask in #jaws if you need access). Private GitLab registries (library.jgi.doe.gov:5050) also work for pulling the image, but as noted in Step 4 above, call-caching doesn’t work for them.
Troubleshooting
docker build fails with Unable to locate package ncbi-blast+
Usually means the apt-get update step in the Dockerfile didn’t run, or you changed the base image away from Ubuntu. The ncbi-blast+ package is in the Ubuntu archive but not in some other distributions (e.g. Alpine). Re-run with --no-cache to force a full rebuild:
docker build --no-cache --tag <your-username>/blast-example:1.0.0 .
docker run fails with “No such file or directory” looking for data/reference.fasta
The container can only see files you explicitly mount with --volume. Confirm that:
You’re running docker run from inside blast_example/ (so $(pwd)/data resolves to the right path).
The --volume mount maps your local data/ to /work/data inside the container (the Dockerfile sets WORKDIR /work, so the script’s relative paths resolve from there).
To debug, start an interactive shell in the container and look around:
docker run --rm -it --volume "$(pwd):/work" <your-username>/blast-example:1.0.0 bash
# Inside the container:
ls /work
ls /work/data
docker push fails with “denied: requested access to the resource is denied”
Either you’re not logged in, or the image tag’s namespace doesn’t match the account you’re logged in as. Confirm:
docker login # for Docker Hub
docker images | grep blast-example # confirm the tag prefix matches your username
If the tag prefix is wrong, re-tag:
docker tag oldname/blast-example:1.0.0 <correct-username>/blast-example:1.0.0
What’s Next
You now have a portable Docker image of your workflow. Next:
Lesson 3: Writing WDLs, which wraps the same containerized commands into a WDL workflow that Cromwell (and JAWS) can execute. The WDL’s runtime { docker: ... } block will reference the image you just pushed.
See also
jaws-docker-builder, the CI/CD template repo for automating the build-and-push cycle.
JAWS Troubleshooting RoadMap, debugging guide for failed JAWS runs.