==========================================
Setting up a Local Development Environment
==========================================

.. role:: bash(code)
   :language: bash

Summary
=======

This is **Lesson 1** of the JAWS tutorial series. By the end of it you'll have a working local environment in which you can iterate on a workflow before you ever submit anything to JAWS.

Three terms you'll see throughout the series:

- **JAWS** (JGI Analysis Workflow Service) is the workflow execution service you'll ultimately submit your work to. It runs your workflows across multiple HPC sites (Dori, Perlmutter, Tahoma, and others).
- **WDL** (Workflow Description Language) is the language you use to *describe* a workflow, that is, which commands to run, in what order, and what inputs and outputs each step has.
- **Cromwell** is the workflow engine that *executes* WDL workflows. JAWS uses Cromwell internally, and you can also run Cromwell directly on your laptop while you're developing.

The typical workflow when you're building something new for JAWS:

1. Write and test the underlying code (a Python script, a bash pipeline, whatever it is) in your own development environment.
2. Wrap that code into a Docker image so it can run anywhere.
3. Wrap the image-running step into a WDL workflow and test the WDL locally with Cromwell.
4. Submit the same WDL to JAWS for the real run on HPC.

This lesson covers **step 1**: getting a local environment in which you can run the example workflow's commands as plain bash, so you know your tools are healthy before you start adding Docker, WDL, or JAWS on top. The example you'll use throughout the series is a small **BLAST** workflow (``makeblastdb`` and ``blastn``), because BLAST is universally recognized in biology, installs in seconds, and runs in seconds.

Steps 2, 3, and 4 of the workflow above come in Lessons 2, 3, and 4.

Prerequisites
=============

Before starting you'll need:

- A laptop or remote shell with the ability to install conda (macOS, Linux, or WSL on Windows).
- Internet access to download Miniconda and conda packages.
- ``git`` installed.
- About 1 GB of free disk space.

You do **not** need access to a JAWS site yet. This lesson runs entirely on your own machine.

What You'll Build
=================

A conda environment named ``blast-tutorial`` containing the BLAST+ command-line tools (``makeblastdb``, ``blastn``, and friends), plus a clone of the `jaws-tutorial-examples `_ repository, which contains a small reference workflow you'll run to confirm everything is wired up.

.. note::

    The example workflow uses BLAST because that's a recognizable, fast, and easy-to-install tool. Your own workflows will install whatever *their* tools need. The pattern is the same; only the package list changes.

Step 1: Install Miniconda
=========================

Download the appropriate `Miniconda installer `_ for your platform. On macOS, pick ``arm64`` if you're on Apple Silicon (M1/M2/M3) and ``x86_64`` on older Intel Macs; if you're not sure, run ``uname -m`` and match the output.

.. code-block:: bash

    # macOS example; pick the file that matches your OS and CPU architecture
    bash Miniconda3-latest-MacOSX-x86_64.sh

Answer **yes** when the installer asks: *"Do you wish the installer to initialize Miniconda3 by running conda init?"* That step adds conda to your shell's ``PATH`` by editing ``~/.bash_profile`` or ``~/.bashrc`` (or ``~/.zshrc`` if your shell is zsh, the default on recent macOS versions).

Open a new shell (or ``source`` the file that was edited, for example ``source ~/.bashrc``) so the change takes effect, then optionally turn off the automatic activation of the ``base`` env:

.. code-block:: bash

    conda config --set auto_activate_base false

**Verify the install:**

.. code-block:: bash

    which conda
    conda --version

You should see a path under your home directory (something like ``~/miniconda3/bin/conda``) and a version number.

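
If you'd rather not match the ``uname`` output to an installer by eye, the choice can be scripted. The following is a minimal sketch, not part of the official install instructions: ``installer_name`` is a hypothetical helper, it assumes the ``Miniconda3-latest-<OS>-<arch>.sh`` naming used on the Miniconda download page, and it covers only macOS and Linux (including WSL).

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper: map an OS name and CPU architecture (as printed by
    # `uname -s` and `uname -m`) to a Miniconda installer file name.
    # Assumption: the file names follow Miniconda3-latest-<OS>-<arch>.sh.
    installer_name() {
        case "$1/$2" in
            Darwin/arm64)   echo "Miniconda3-latest-MacOSX-arm64.sh" ;;
            Darwin/x86_64)  echo "Miniconda3-latest-MacOSX-x86_64.sh" ;;
            Linux/x86_64)   echo "Miniconda3-latest-Linux-x86_64.sh" ;;
            Linux/aarch64)  echo "Miniconda3-latest-Linux-aarch64.sh" ;;
            *)
                # Unknown combination: report it and signal failure.
                echo "no installer mapping for $1/$2" >&2
                return 1
                ;;
        esac
    }

    # Print the installer file name for the machine this runs on.
    installer_name "$(uname -s)" "$(uname -m)" || true

The function takes the OS and architecture as arguments rather than calling ``uname`` itself, so you can also ask it about a machine other than the one you're on, for example ``installer_name Darwin arm64``.
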
Step 2: Create the ``blast-tutorial`` Conda Environment
========================================================

Create the environment and activate it. We don't pin a Python version here because BLAST is the only dependency and doesn't need one; for a Python-based workflow you'd typically add ``python=3.11`` (or whichever version you need) to the ``conda create`` line.

.. code-block:: bash

    conda create --name blast-tutorial
    conda activate blast-tutorial

Your shell prompt should now show ``(blast-tutorial)`` at the front, indicating the env is active.

Install BLAST+ from bioconda:

.. code-block:: bash

    conda install -c conda-forge -c bioconda blast

**Verify the install:**

.. code-block:: bash

    makeblastdb -version
    blastn -version

Each should print a version number. If either fails, the most common cause is that ``conda activate blast-tutorial`` wasn't run in the current shell.

Step 3: Clone the Example Workflow Repository
=============================================

.. code-block:: bash

    git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
    cd jaws-tutorial-examples/blast_example

Inside ``blast_example/`` you'll find:

- ``blast.sh``: the plain bash version of the BLAST workflow. You'll use it in this lesson.
- ``Dockerfile``: a recipe for building a Docker image with BLAST installed. You'll use it in Lesson 2.
- ``blast.wdl``: the same workflow written as a WDL. You'll use it in Lesson 3.
- ``inputs.json``: the inputs file Cromwell and JAWS read. You'll use it in Lesson 4.
- ``data/reference.fasta`` and ``data/query.fasta``: small synthetic FASTA files used by every lesson.

Each subsequent lesson builds on the previous one's artifact. For this lesson you only need ``blast.sh`` and ``data/``.

Step 4: Run the Bash Script
===========================

Before involving Docker, WDL, or Cromwell, run the plain bash version of the workflow to confirm BLAST is installed and your shell can find it.

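
Whether your shell can find a tool comes down to a ``PATH`` lookup, which you can also do by hand with the shell's ``command -v`` builtin. The sketch below shows one way such a lookup might be written. It is an illustration under that assumption, not the actual contents of ``blast.sh``, and ``check_tool`` is a hypothetical helper name.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper: print where a tool lives on PATH, or "NOT FOUND".
    check_tool() {
        if found="$(command -v "$1")"; then
            echo "$1: $found"
        else
            echo "$1: NOT FOUND"
            return 1
        fi
    }

    # Look up both BLAST tools and remember whether any lookup failed.
    missing=0
    for tool in makeblastdb blastn; do
        check_tool "$tool" || missing=1
    done
    echo "missing tools: $missing"

With the ``blast-tutorial`` env active, both lookups should resolve to a path inside that env's ``bin/`` directory.
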
First, run ``blast.sh`` with no arguments. That triggers a built-in tools check that just looks up ``makeblastdb`` and ``blastn`` on your ``PATH``:

.. code-block:: bash

    ./blast.sh

You should see the paths to ``makeblastdb`` and ``blastn`` in the output. If either reports "NOT FOUND," your conda env isn't fully active or the install didn't complete; go back to Step 2.

Now run the actual workflow:

.. code-block:: bash

    ./blast.sh data/reference.fasta data/query.fasta

This runs three commands in sequence:

1. ``makeblastdb`` builds a small nucleotide BLAST database from ``reference.fasta``, writing the database files into a ``blastdb/`` directory.
2. ``blastn`` searches each sequence in ``query.fasta`` against the database, writing tabular hits to ``hits.tsv``.
3. A small ``awk`` one-liner counts how many of the queries had at least one hit and writes a one-line summary to ``summary.txt``.

**Verify the output:**

.. code-block:: bash

    cat summary.txt
    head hits.tsv

The example data is constructed so that:

- ``query_1`` is identical to ``ref_01`` and produces a hit at **100.000%** identity over the full 1000 bp.
- ``query_2`` is a mutated copy of ``ref_02`` and produces a hit at about **88.8%** identity over 999 bp.
- ``query_3`` is unrelated random sequence and produces **no significant hit**.

The ``summary.txt`` file should therefore report **2 of 3** queries with at least one hit, and the first two lines of ``hits.tsv`` should look roughly like:

.. code-block:: text

    query_1  ref_01  100.000  1000  0    0  1  1000  1  1000  0.0  1847
    query_2  ref_02  88.789   999   112  0  1  999   1  999   0.0  1225

(Tested against the bundled ``data/`` files. The exact bit-score in the last column may vary slightly across BLAST versions.)

If you see that, your local BLAST environment works and you've successfully run the example workflow as plain bash. You're done with Lesson 1.

Troubleshooting
===============

A few common stumbling blocks at this stage.

.. dropdown:: ``conda: command not found`` after installing Miniconda :ref:`🔗 `
    :color: info
    :name: local-env-conda-command-not-found-after-installing-miniconda
    :animate: fade-in

    The installer's ``conda init`` step modifies ``~/.bashrc`` or ``~/.bash_profile``, but the change only takes effect in *new* shells. Either open a new terminal or ``source ~/.bashrc`` (or ``source ~/.bash_profile`` on macOS) to pick up the change.

.. dropdown:: ``makeblastdb`` or ``blastn`` not found inside the activated env :ref:`🔗 `
    :color: info
    :name: local-env-makeblastdb-or-blastn-not-found-inside-the
    :animate: fade-in

    Confirm the env is actually active:

    .. code-block:: bash

        conda env list

    The active env has a ``*`` next to it. If ``blast-tutorial`` isn't active, run ``conda activate blast-tutorial`` and try again.

.. dropdown:: ``conda install -c bioconda blast`` is slow or fails to resolve dependencies :ref:`🔗 `
    :color: info
    :name: local-env-conda-install-c-bioconda-blast-is-slow-or-fails
    :animate: fade-in

    Older ``conda`` releases (before 23.10, when ``libmamba`` became the default solver) can be slow when resolving bioconda packages. If you're on one of them, switch to the faster ``libmamba`` solver:

    .. code-block:: bash

        conda install -n base conda-libmamba-solver
        conda config --set solver libmamba

    Then re-run the ``conda install`` command from Step 2.

What's Next
===========

You now have a working local environment in which you can iterate on the example workflow's bash commands. Next:

- :doc:`Lesson 2: Docker Containers `, which packages the same BLAST commands into a Docker image so the workflow can run on machines that don't have BLAST installed (which is the situation on every JAWS compute site).