===============================
How to Define the Input Data
===============================

.. role:: bash(code)
   :language: bash

Summary
=======

This is **Lesson 4** of the JAWS tutorial series. In :doc:`Lesson 3 ` you wrote ``blast.wdl``. The WDL describes *what* the workflow does; the **inputs file** says *which* files to run it on.

.. admonition:: Pre-requisites

   - Completed :doc:`Lesson 3: Writing WDLs `. You have ``blast.wdl`` from ``jaws-tutorial-examples/blast_example/``.

The Basic Format
================

The inputs file is a JSON object. Each key is ``.``; each value is whatever that input expects. For the BLAST workflow:

.. code-block:: json

   {
     "blast_example.reference_fasta": "data/reference.fasta",
     "blast_example.query_fasta": "data/query.fasta"
   }

Keys are case-sensitive and must match the WDL exactly. You can generate the template (so the keys are typo-proof) with:

.. code-block:: bash

   jaws inputs blast.wdl > inputs.json

Then edit each value to point at a real file.

File Paths
==========

Input values of type ``File`` can be:

1. **A relative path.** Resolved against the **location of the** ``inputs.json`` **file itself**, not the directory you submit from.
2. **An absolute path.**
3. **A URL** (``http://``, ``https://``, ``ftp://``). JAWS downloads the file once and hands it to the task as a local file.

.. important::

   The relative-path rule above is **JAWS-specific**. Plain Cromwell resolves relative paths against the directory you submit from; JAWS resolves them against the directory containing ``inputs.json``. The JAWS community made this choice so an inputs file can live next to its WDL and stay portable.

For the BLAST example, ``inputs.json`` lives in ``blast_example/``, and the data lives in ``blast_example/data/``, so the relative path is just ``data/reference.fasta``. If ``inputs.json`` were one directory up, the path would be ``blast_example/data/reference.fasta``.

Reference Data (``/refdata``)
=============================

.. note::

   The BLAST tutorial example doesn't use refdata; its reference FASTA is small enough to ship in the repo's ``data/`` directory. The pattern below is what you'll reach for in your *own* workflows once you start pointing at large shared resources.

Reference data (BLAST databases, genome FASTAs, annotation files, anything large that gets reused across many runs) lives in **JAWS refdata**, not as an absolute path in your inputs.json. JAWS stores refdata centrally and syncs it to every compute site. Inside your task it's mounted at ``/refdata`` on whichever site the workflow lands on, so you write the same inputs.json regardless of where the run goes.

Two rules:

1. **Declare refdata inputs as** ``String``, **not** ``File``. Cromwell would otherwise try to stage the file out of the container and fail; ``/refdata`` only exists inside.
2. **Use the** ``/refdata//...`` **path** in inputs.json, never the underlying Perlmutter absolute path.

.. code-block:: text

   # In the WDL
   String reference_db   # NOT File

.. code-block:: json

   {
     "blast_example.reference_db": "/refdata/myteam/blast/swissprot",
     "blast_example.query_fasta": "data/query.fasta"
   }

For setup, syncing, group permissions, and manifest files, see :doc:`/jaws/jaws_refdata`.

Arrays and Maps
===============

WDL types map onto JSON directly. ``Array[File]`` becomes a JSON array; ``Map[String, String]`` becomes a JSON object:

.. code-block:: json

   {
     "blast_example.query_fastas": ["data/q1.fasta", "data/q2.fasta"],
     "blast_example.sample_info": {"name": "patient_42", "tissue": "liver"}
   }

A single string where the WDL expects ``Array[File]`` is a type error; wrap it as ``["..."]``.

Caching
=======

JAWS caches recently-used input files in a staging area on each site. If you re-run a workflow with the same input file, JAWS reuses the staged copy rather than re-transferring it, which matters most for large inputs (e.g. a 100 GB BLAST database transferred once and then reused).
The retention window for the staging area, and the related purge policy for Cromwell execution directories, are documented in :doc:`/jaws/jaws_policies`. Any input you use at least once within that window stays staged.

Temporary Files (``/tmp``)
==========================

If a task writes large intermediates and needs a guaranteed-writable temp location, use ``$TMPDIR`` inside the ``command`` block (JAWS wires it up to ``/tmp`` for you). See `Use of $JAWS_SITE Environment Variable <../jaws/jaws_guide.html#use-of-jaws-site-environment-variable>`_ in the JAWS Guide for the exact mechanism and an example.

Run It
======

With ``blast.wdl`` and ``inputs.json``, you can run the workflow locally with Cromwell to confirm everything is wired up:

.. code-block:: bash

   java -jar /path/to/cromwell-87.jar run blast.wdl --inputs inputs.json

For submitting the same workflow to JAWS (``jaws submit``, choosing a site with ``jaws info``, tracking a run with ``jaws status`` and ``jaws log``, and finding your output files), see :doc:`/jaws/jaws_quickstart` for the end-to-end walkthrough.

Troubleshooting
===============

.. dropdown:: ``File not found`` once the task starts running :ref:`🔗 `
   :color: info
   :name: inputs-file-not-found
   :animate: fade-in

   - **Relative path resolving against the wrong directory.** Remember the JAWS rule: relative paths in ``inputs.json`` are resolved against the directory ``inputs.json`` lives in, not the directory you submit from.
   - **Meant to use refdata but used a Perlmutter absolute path.** Refdata is accessed via ``/refdata//...`` with the input declared as ``String``. See :doc:`/jaws/jaws_refdata`.
   - **URL unreachable from the compute site.** Pre-download the file or check with the JAWS team.

.. dropdown:: ``Type mismatch``: JSON value doesn't match the WDL type :ref:`🔗 `
   :color: info
   :name: inputs-type-mismatch
   :animate: fade-in

   The most common version: a bare string where the WDL declares ``Array[File]``. Wrap it as ``["..."]``. Same rule for ``Map[K, V]`` (must be a JSON object) and ``Int`` (unquoted number, not a string).

You're Done with the Series
===========================

- :doc:`Lesson 1: Local Development Environment `
- :doc:`Lesson 2: Docker Containers `
- :doc:`Lesson 3: Writing WDLs `
- **Lesson 4 (this lesson)**

Next steps as you move to your own workflows:

- :doc:`/jaws/jaws_quickstart`, end-to-end submission walkthrough.
- :doc:`/jaws/jaws_usage`, full CLI reference.
- :doc:`/jaws/jaws_refdata`, the refdata workflow in detail.
- :doc:`/Resources/best_practices`, patterns for WDLs that scale.
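
Before submitting your own workflows, it can help to check the relative-path rule from this lesson locally. The sketch below is not part of JAWS; the function names ``iter_strings`` and ``missing_relative_paths`` are hypothetical, and the script uses only the Python standard library. It does not read the WDL, so it cannot know which inputs are declared ``File``. As a heuristic it treats any string value containing ``/`` as a path, skips URLs and absolute paths (including ``/refdata``), and resolves the rest against the directory containing ``inputs.json``.

.. code-block:: python

   import json
   from pathlib import Path
   from urllib.parse import urlparse

   # URL schemes this lesson lists as accepted File values.
   URL_SCHEMES = {"http", "https", "ftp"}


   def iter_strings(value):
       """Yield every string in a JSON value, descending into arrays and objects."""
       if isinstance(value, str):
           yield value
       elif isinstance(value, list):
           for item in value:
               yield from iter_strings(item)
       elif isinstance(value, dict):
           for item in value.values():
               yield from iter_strings(item)


   def missing_relative_paths(inputs_json):
       """Return (key, path) pairs whose relative path does not exist.

       Relative paths are resolved against the directory that contains
       the inputs file, not the current working directory.
       """
       inputs_json = Path(inputs_json).resolve()
       base_dir = inputs_json.parent
       with inputs_json.open() as handle:
           inputs = json.load(handle)

       missing = []
       for key, value in inputs.items():
           for text in iter_strings(value):
               if urlparse(text).scheme in URL_SCHEMES:
                   continue  # URL: not checked here
               if text.startswith("/"):
                   continue  # absolute path or /refdata: not checked here
               if "/" not in text:
                   continue  # heuristic (assumption): probably not a path
               if not (base_dir / text).exists():
                   missing.append((key, text))
       return missing

Calling ``missing_relative_paths("inputs.json")`` returns an empty list when every relative path resolves. A file that sits directly next to ``inputs.json`` (no ``/`` in its value) is skipped by the heuristic, so adjust that check if your layout differs.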