Setting up a Local Development Environment
Summary
This is Lesson 1 of the JAWS tutorial series. By the end of it you’ll have a working local environment in which you can iterate on a workflow before you ever submit anything to JAWS.
Three terms you’ll see throughout the series:
JAWS (JGI Analysis Workflow Service) is the workflow execution service you’ll ultimately submit your work to. It runs your workflows across multiple HPC sites (Dori, Perlmutter, Tahoma, and others).
WDL (Workflow Description Language) is the language you use to describe a workflow, that is, which commands to run, in what order, and what inputs and outputs each step has.
Cromwell is the workflow engine that executes WDL workflows. JAWS uses Cromwell internally, and you can also run Cromwell directly on your laptop while you’re developing.
The typical workflow when you’re building something new for JAWS:
Write and test the underlying code (a Python script, a bash pipeline, whatever it is) in your own development environment.
Wrap that code into a Docker image so it can run anywhere.
Wrap the image-running step into a WDL workflow and test the WDL locally with Cromwell.
Submit the same WDL to JAWS for the real run on HPC.
This lesson covers step 1: getting a local environment in which you can run the example workflow’s commands as plain bash, so you know your tools are healthy before you start adding Docker, WDL, or JAWS on top. The example you’ll use throughout the series is a small BLAST workflow (makeblastdb and blastn), because BLAST is universally recognized in biology, installs in seconds, and runs in seconds. Steps 2, 3, and 4 of the workflow above come in Lessons 2, 3, and 4.
Prerequisites
Before starting you’ll need:
A laptop or remote shell with the ability to install conda (macOS, Linux, or WSL on Windows).
Internet access to download Miniconda and conda packages.
git installed.
About 1 GB of free disk space.
You do not need access to a JAWS site yet. This lesson runs entirely on your own machine.
What You’ll Build
A conda environment named blast-tutorial containing the BLAST+ command-line tools (makeblastdb, blastn, and friends), plus a clone of the jaws-tutorial-examples repository, which contains a small reference workflow you’ll run to confirm everything is wired up.
Note
The example workflow uses BLAST because that’s a recognizable, fast, and easy-to-install tool. Your own workflows will install whatever their tools need. The pattern is the same; only the package list changes.
Step 1: Install Miniconda
Download the appropriate Miniconda installer for your platform. On macOS, pick arm64 if you’re on Apple Silicon (M1/M2/M3) and x86_64 on older Intel Macs; if you’re not sure, run uname -m and match the output.
# macOS example; pick the file that matches your OS and CPU architecture
bash Miniconda3-latest-MacOSX-x86_64.sh
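If you’d rather not match the file name by hand, you can script the uname-based choice described above. This is a minimal sketch based on our own assumptions. The function name miniconda_installer is hypothetical. The file names follow the Miniconda3-latest-<OS>-<arch>.sh pattern used at https://repo.anaconda.com/miniconda/.

```shell
# Hypothetical helper (not part of the tutorial repository): map `uname -s`
# and `uname -m` output to a Miniconda installer file name.
miniconda_installer() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Darwin/arm64)  echo "Miniconda3-latest-MacOSX-arm64.sh" ;;   # Apple Silicon Mac
    Darwin/x86_64) echo "Miniconda3-latest-MacOSX-x86_64.sh" ;;  # Intel Mac
    Linux/x86_64)  echo "Miniconda3-latest-Linux-x86_64.sh" ;;   # Linux or WSL on x86_64
    Linux/aarch64) echo "Miniconda3-latest-Linux-aarch64.sh" ;;  # 64-bit ARM Linux
    *) echo "no installer mapping for $os/$arch; pick one by hand" >&2; return 1 ;;
  esac
}
```

For example, curl -O "https://repo.anaconda.com/miniconda/$(miniconda_installer)" followed by bash "$(miniconda_installer)" downloads and runs the installer for the current machine. Any unlisted combination stops with an error, so you don't run the wrong installer.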
Answer yes when the installer asks: “Do you wish the installer to initialize Miniconda3 by running conda init?” That step adds conda to your shell’s PATH by editing your shell’s startup file: ~/.bashrc or ~/.bash_profile for bash, or ~/.zshrc for zsh (the default shell on recent macOS).
Open a new shell (or source that startup file, for example source ~/.bashrc) so the change takes effect. Then, optionally, turn off automatic activation of the base env:
conda config --set auto_activate_base false
Verify the install:
which conda
conda --version
You should see a path under your home directory (something like ~/miniconda3/bin/conda) and a version number.
Step 2: Create the blast-tutorial Conda Environment
Create the environment and activate it. We don’t pin a Python version here because BLAST is the only dependency and doesn’t need one; for a Python-based workflow you’d typically add python=3.11 (or whichever version you need) to the conda create line.
conda create --name blast-tutorial
conda activate blast-tutorial
Your shell prompt should now show (blast-tutorial) at the front, indicating the env is active.
Install BLAST+ from bioconda:
conda install -c conda-forge -c bioconda blast
Verify the install:
makeblastdb -version
blastn -version
Each should print a version number. If either fails, the most common cause is that conda activate blast-tutorial wasn’t run in the current shell.
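A script can catch the “wrong env” case before running anything by checking which env is active. The sketch below assumes you activated the env by name. In that case, conda activate sets CONDA_DEFAULT_ENV to the env’s name. require_env is a hypothetical helper, not something conda provides.

```shell
# Hypothetical helper: fail early unless the named conda env is the active one.
# Assumes the env was activated by name, so CONDA_DEFAULT_ENV holds that name.
require_env() {
  local want="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" != "$want" ]; then
    echo "expected conda env '$want' but the active env is '${CONDA_DEFAULT_ENV:-none}'; run: conda activate $want" >&2
    return 1
  fi
}
```

If you put require_env blast-tutorial || exit 1 at the top of your own scripts, a confusing “command not found” becomes an explicit message.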
Step 3: Clone the Example Workflow Repository
git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
cd jaws-tutorial-examples/blast_example
Inside blast_example/ you’ll find:
blast.sh: the plain bash version of the BLAST workflow. You’ll use it in this lesson.
Dockerfile: a recipe for building a Docker image with BLAST installed. You’ll use it in Lesson 2.
blast.wdl: the same workflow written as a WDL. You’ll use it in Lesson 3.
inputs.json: the inputs file Cromwell and JAWS read. You’ll use it in Lesson 4.
data/reference.fasta and data/query.fasta: small synthetic FASTA files used by every lesson.
Each subsequent lesson builds on the previous one’s artifact. For this lesson you only need blast.sh and data/.
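Before moving on, you can confirm that the clone contains what this lesson needs. This is a minimal pre-flight sketch. It assumes you run it from, or point it at, the blast_example/ directory. check_lesson1_files is a hypothetical name. It checks the executable bit because Step 4 runs the script as ./blast.sh.

```shell
# Hypothetical pre-flight check (not part of the repository): confirm the
# FASTA inputs exist and are non-empty, and that blast.sh is executable.
check_lesson1_files() {
  local dir="${1:-.}" ok=0 f
  for f in data/reference.fasta data/query.fasta; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      ok=1
    fi
  done
  if [ ! -x "$dir/blast.sh" ]; then
    echo "missing or not executable: $dir/blast.sh (try: chmod +x blast.sh)" >&2
    ok=1
  fi
  return "$ok"
}
```

Run check_lesson1_files . from inside blast_example/. It prints nothing and returns 0 when everything is in place.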
Step 4: Run the Bash Script
Before involving Docker, WDL, or Cromwell, run the plain bash version of the workflow to confirm BLAST is installed and your shell can find it.
First, run it with no arguments. That triggers a built-in tools check that just looks up makeblastdb and blastn on your PATH:
./blast.sh
You should see the paths to makeblastdb and blastn in the output. If either reports “NOT FOUND,” your conda env isn’t fully active or the install didn’t complete; go back to Step 2.
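The no-argument mode of blast.sh looks up each tool on your PATH. The sketch below reproduces that idea so you can reuse it in your own scripts. It is not the code inside blast.sh, and check_tools is a hypothetical name. It uses command -v, the POSIX way to ask the shell where a command resolves.

```shell
# Hypothetical tools check: print where each named command resolves,
# or "NOT FOUND"; return 1 if any command is missing.
check_tools() {
  local tool path missing=0
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      echo "$tool: $path"
    else
      echo "$tool: NOT FOUND"
      missing=1
    fi
  done
  return "$missing"
}
```

In the activated env, check_tools makeblastdb blastn should print two paths and return 0. Any NOT FOUND line makes it return 1.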
Now run the actual workflow:
./blast.sh data/reference.fasta data/query.fasta
This runs three commands in sequence:
makeblastdb: builds a small nucleotide BLAST database from reference.fasta, writing the database files into a blastdb/ directory.
blastn: searches each sequence in query.fasta against the database, writing tabular hits to hits.tsv.
A small awk one-liner: counts how many of the queries had at least one hit and writes a one-line summary to summary.txt.
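The third step is simple enough to sketch. Assume hits.tsv is BLAST’s tab-separated tabular output with the query ID in the first column, as in the expected rows in this lesson. Then the number of queries with a hit is the number of distinct IDs in that column. The total is the number of > header lines in query.fasta. summarize_hits is hypothetical, and its wording may not match what blast.sh actually writes to summary.txt.

```shell
# Hypothetical re-creation of the summary step: count distinct query IDs
# in column 1 of a tab-separated hits file, and the FASTA headers in the
# query file.
summarize_hits() {
  local hits="$1" queries="$2" total hit
  total=$(grep -c '^>' "$queries" || true)   # grep -c exits 1 when the count is 0
  hit=$(awk -F'\t' 'NF && !seen[$1]++ { n++ } END { print n + 0 }' "$hits")
  echo "$hit of $total queries had at least one hit"
}
```

For the bundled data, summarize_hits hits.tsv data/query.fasta should report 2 of 3. Deduplicating on column 1 matters because one query can hit several references.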
Verify the output:
cat summary.txt
head hits.tsv
The example data is constructed so that:
query_1 is identical to ref_01 and produces a hit at 100.000% identity over the full 1000 bp.
query_2 is a mutated copy of ref_02 and produces a hit at about 88.8% identity over 999 bp.
query_3 is unrelated random sequence and produces no significant hit.
The summary.txt file should therefore report 2 of 3 queries with at least one hit, and the first two lines of hits.tsv should look roughly like:
query_1 ref_01 100.000 1000 0 0 1 1000 1 1000 0.0 1847
query_2 ref_02 88.789 999 112 0 1 999 1 999 0.0 1225
(Tested against the bundled data/ files. The exact bit-score in the last column may vary slightly across BLAST versions.)
If you see that, your local BLAST environment works and you’ve successfully run the example workflow as plain bash. You’re done with Lesson 1.
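Optionally, you can decode the hits.tsv rows shown above. This assumes blast.sh uses BLAST’s default tabular format (-outfmt 6), which the 12-column rows suggest. In that format the columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. When an alignment has no gaps, percent identity is (length - mismatch) / length × 100. For query_2, that gives (999 - 112) / 999 × 100 ≈ 88.789. The hypothetical explain_hits helper below labels each column and repeats that arithmetic for gap-free rows.

```shell
# Hypothetical helper: label the 12 default -outfmt 6 columns of each row and,
# for rows with no gap openings, recompute percent identity from length and mismatch.
explain_hits() {
  awk -F'\t' '
    BEGIN { split("qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore", name, " ") }
    {
      for (i = 1; i <= 12; i++) printf "%s=%s%s", name[i], $i, (i < 12 ? " " : "\n")
      if ($6 == 0) printf "  recomputed pident=%.3f\n", ($4 - $5) / $4 * 100
    }' "$@"
}
```

Run explain_hits hits.tsv to annotate every row. Rows with gaps are labeled but skip the recomputation, because the simple formula doesn’t account for gapped positions.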
Troubleshooting
A few common stumbling blocks at this stage.
conda: command not found after installing Miniconda
The installer’s conda init step modifies your shell’s startup file (~/.bashrc, ~/.bash_profile, or ~/.zshrc), but the change only takes effect in new shells. Either open a new terminal or source the file it edited: for example, source ~/.bashrc, source ~/.bash_profile for bash on macOS, or source ~/.zshrc for zsh.
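If you’re not sure which file conda init edited, the answer mostly follows your login shell. This sketch encodes our understanding of conda init’s defaults: bash on macOS uses ~/.bash_profile, bash elsewhere uses ~/.bashrc, and zsh uses ~/.zshrc. shell_rc_file is a hypothetical helper. If in doubt, open the file and look for the “>>> conda initialize >>>” block that conda init writes.

```shell
# Hypothetical helper: guess the startup file conda init edited, from the
# login shell ($SHELL by default) and the OS (`uname -s` by default).
shell_rc_file() {
  local shell_name os
  shell_name=$(basename "${1:-${SHELL:-sh}}")
  os="${2:-$(uname -s)}"
  case "$shell_name/$os" in
    zsh/*)       echo "$HOME/.zshrc" ;;
    bash/Darwin) echo "$HOME/.bash_profile" ;;
    bash/*)      echo "$HOME/.bashrc" ;;
    *) echo "not sure which file conda init edits for: $shell_name" >&2; return 1 ;;
  esac
}
```

Source the result from the matching shell. For example, a ~/.zshrc path should be sourced from zsh, not bash.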
makeblastdb or blastn not found inside the activated env
Confirm the env is actually active:
conda env list
The active env has a * next to it. If blast-tutorial isn’t active, run conda activate blast-tutorial and try again.
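conda install -c conda-forge -c bioconda blast is slow or fails to resolve dependencies
Older conda releases (before 23.10) use the classic solver, which can be slow when resolving bioconda packages. If conda --version reports one of those releases, switch to the faster libmamba solver: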
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
Then re-run conda install -c conda-forge -c bioconda blast.
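Since conda 23.10, libmamba is the default solver, so switching only helps on installs older than that. Here is a small sketch for deciding, assuming conda --version prints the form conda X.Y.Z. predates_libmamba_default is a hypothetical name.

```shell
# Hypothetical helper: succeed (exit 0) if a `conda --version` string such as
# "conda 23.7.4" is older than 23.10, the release that made libmamba the default.
predates_libmamba_default() {
  printf '%s\n' "$1" | awk '{
    split($NF, v, ".")
    exit !((v[1] + 0 < 23) || (v[1] + 0 == 23 && v[2] + 0 < 10))
  }'
}
```

For example, predates_libmamba_default "$(conda --version)" && echo "consider switching to libmamba" prints the hint only on older installs.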
What’s Next
You now have a working local environment in which you can iterate on the example workflow’s bash commands. Next:
Lesson 2: Docker Containers, which packages the same BLAST commands into a Docker image so the workflow can run on machines that don’t have BLAST installed (which is the situation on every JAWS compute site).