How to Define the Input Data

Summary

This is Lesson 4 of the JAWS tutorial series. In Lesson 3 you wrote blast.wdl. The WDL describes what the workflow does; the inputs file says which files to run it on.

Pre-requisites

The Basic Format

The inputs file is a JSON object. Each key is <workflow_name>.<input_name>; each value is whatever that input expects.

For the BLAST workflow:

{
    "blast_example.reference_fasta": "data/reference.fasta",
    "blast_example.query_fasta": "data/query.fasta"
}

Keys are case-sensitive and must match the WDL exactly.
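
Because key matching is exact, a quick check before submitting can catch a miscased or misspelled key. The following is a minimal sketch, not part of JAWS: check_input_keys is a hypothetical helper, and the list of declared input names is supplied by hand here rather than parsed from the WDL.

```python
import json


def check_input_keys(inputs_text, workflow_name, declared_inputs):
    """Return (unknown_keys, missing_keys) for an inputs JSON document.

    declared_inputs is a hand-maintained list of the input names declared
    in the WDL workflow block (an assumption: this sketch does not parse WDL).
    """
    provided = set(json.loads(inputs_text))
    expected = {f"{workflow_name}.{name}" for name in declared_inputs}
    # Set comparison is exact, so a case difference counts as a mismatch.
    unknown = sorted(provided - expected)
    missing = sorted(expected - provided)
    return unknown, missing


# Example document with one deliberately miscased key.
example = """{
    "blast_example.reference_fasta": "data/reference.fasta",
    "blast_example.Query_fasta": "data/query.fasta"
}"""

unknown, missing = check_input_keys(
    example, "blast_example", ["reference_fasta", "query_fasta"]
)
print("unknown keys:", unknown)
print("missing keys:", missing)
```

Optional WDL inputs that you deliberately leave out would also show up as missing in this sketch, so treat that list as a prompt to double-check rather than as an error.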

You can generate the template (so the keys are typo-proof) with:

jaws inputs blast.wdl > inputs.json

Then edit each value to point at a real file.

File Paths

Input values of type File can be:

  1. A relative path. Resolved against the directory that contains inputs.json, not the directory you submit from.

  2. An absolute path.

  3. A URL (http://, https://, ftp://). JAWS downloads the file once and hands it to the task as a local file.

Important

The relative-path rule above is JAWS-specific. Plain Cromwell resolves relative paths against the directory you submit from; JAWS resolves them against the directory containing inputs.json. The JAWS community made this choice so an inputs file can live next to its WDL and stay portable.

For the BLAST example, inputs.json lives in blast_example/, and the data lives in blast_example/data/, so the relative path is just data/reference.fasta. If inputs.json were one directory up, the path would be blast_example/data/reference.fasta.
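
The resolution rule can be illustrated with a few lines of Python. This is only a sketch of the rule as described above, not the code JAWS actually runs; resolve_input_path is a hypothetical helper and /home/user is a made-up location.

```python
from pathlib import PurePosixPath


def resolve_input_path(inputs_json_path, value):
    """Resolve a File value against the directory containing inputs.json."""
    if value.startswith(("http://", "https://", "ftp://")):
        return value  # URLs are passed through, not treated as paths
    path = PurePosixPath(value)
    if path.is_absolute():
        return str(path)  # absolute paths are used as written
    # Relative paths are joined to the inputs file's parent directory.
    return str(PurePosixPath(inputs_json_path).parent / path)


# inputs.json sits inside blast_example/
print(resolve_input_path("/home/user/blast_example/inputs.json",
                         "data/reference.fasta"))
# inputs.json sits one directory up
print(resolve_input_path("/home/user/inputs.json",
                         "blast_example/data/reference.fasta"))
```

Both calls point at the same file, which is the portability the rule is meant to give: the inputs file and its data can move together without edits.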

Reference Data (/refdata)

Note

The BLAST tutorial example doesn’t use refdata — its reference FASTA is small enough to ship in the repo’s data/ directory. The pattern below is what you’ll reach for in your own workflows once you start pointing at large shared resources.

Reference data (BLAST databases, genome FASTAs, annotation files, anything large that gets reused across many runs) belongs in JAWS refdata rather than behind an absolute path in your inputs.json.

JAWS stores refdata centrally and syncs it to every compute site. Inside your task it’s mounted at /refdata on whichever site the workflow lands on, so you write the same inputs.json regardless of where the run goes.

Two rules:

  1. Declare refdata inputs as String, not File. Cromwell would otherwise try to stage the file out of the container and fail, because /refdata only exists inside the container.

  2. Use the /refdata/<group>/... path in inputs.json, never the underlying Perlmutter absolute path.

# In the WDL
String reference_db    # NOT File

And in inputs.json:

{
    "blast_example.reference_db": "/refdata/myteam/blast/swissprot",
    "blast_example.query_fasta": "data/query.fasta"
}
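
A small pre-submission check can confirm that a refdata value follows the /refdata/<group>/... shape. This is a sketch under assumptions: looks_like_refdata_path is a hypothetical helper, it only inspects the string, and it cannot tell whether the group or the files actually exist. The non-refdata path in the example is invented for illustration.

```python
def looks_like_refdata_path(value):
    """Return True if value has the shape /refdata/<group>/<something>."""
    parts = value.split("/")
    # "/refdata/myteam/blast/swissprot"
    #   -> ["", "refdata", "myteam", "blast", "swissprot"]
    return (
        len(parts) >= 4
        and parts[0] == ""          # leading slash
        and parts[1] == "refdata"   # mount point named in the rules above
        and all(parts[2:])          # no empty components after it
    )


print(looks_like_refdata_path("/refdata/myteam/blast/swissprot"))
print(looks_like_refdata_path("/some/site/path/blast/swissprot"))
```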

For setup, syncing, group permissions, and manifest files, see Using Reference Data in Your WDLs.

Arrays and Maps

WDL types map onto JSON directly. Array[File] becomes a JSON array; Map[String, String] becomes a JSON object:

{
    "blast_example.query_fastas": ["data/q1.fasta", "data/q2.fasta"],
    "blast_example.sample_info": {"name": "patient_42", "tissue": "liver"}
}

A single string where the WDL expects Array[File] is a type error; wrap it as ["..."].
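
If you generate inputs.json from a script, Python lists and dicts serialize to the JSON shapes shown above. The sketch below assumes the same hypothetical input names; as_file_array is an illustrative helper that guards against the bare-string mistake.

```python
import json


def as_file_array(value):
    """Wrap a bare string in a list so it serializes as a JSON array."""
    return [value] if isinstance(value, str) else list(value)


inputs = {
    # A single path still has to be an array for an Array[File] input.
    "blast_example.query_fastas": as_file_array("data/q1.fasta"),
    # A Map[String, String] input is a plain dict, i.e. a JSON object.
    "blast_example.sample_info": {"name": "patient_42", "tissue": "liver"},
}

print(json.dumps(inputs, indent=4))
```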

Caching

JAWS caches recently-used input files in a staging area on each site. If you re-run a workflow with the same input file, JAWS reuses the staged copy rather than re-transferring it, which matters most for large inputs (e.g. a 100 GB BLAST database transferred once and then reused).

The retention window for the staging area, and the related purge policy for Cromwell execution directories, are documented in JAWS Policies. Any input you use at least once within that window stays staged.

Temporary Files (/tmp)

If a task writes large intermediates and needs a guaranteed-writable temp location, use $TMPDIR inside the command block (JAWS wires it up to /tmp for you). See Use of $JAWS_SITE Environment Variable in the JAWS Guide for the exact mechanism and an example.

Run It

With blast.wdl and inputs.json, you can run the workflow locally with Cromwell to confirm everything is wired up:

java -jar /path/to/cromwell-87.jar run blast.wdl --inputs inputs.json

For submitting the same workflow to JAWS — jaws submit, choosing a site with jaws info, tracking a run with jaws status and jaws log, and finding your output files — see JAWS Quickstart for the end-to-end walkthrough.

Troubleshooting

File not found once the task starts running

  • Relative path resolving against the wrong directory. Remember the JAWS rule: relative paths in inputs.json are resolved against the directory inputs.json lives in, not the directory you submit from.

  • Meant to use refdata but used a Perlmutter absolute path. Refdata is accessed via /refdata/<group>/... with the input declared as String. See Using Reference Data in Your WDLs.

  • URL unreachable from the compute site. Pre-download the file or check with the JAWS team.

Type mismatch: JSON value doesn’t match the WDL type

The most common version: a bare string where the WDL declares Array[File]. Wrap it as ["..."]. Same rule for Map[K, V] (must be a JSON object) and Int (unquoted number, not a string).
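
These rules can be checked mechanically before you submit. The sketch below is a hypothetical helper, not the validation Cromwell or JAWS performs, and it covers only String, File, Int, Array[...] and Map[...]; any other type name raises an error instead of guessing.

```python
def matches_wdl_type(value, wdl_type):
    """Check a decoded JSON value against a small subset of WDL type names."""
    if wdl_type in ("String", "File"):
        return isinstance(value, str)
    if wdl_type == "Int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if wdl_type.startswith("Array[") and wdl_type.endswith("]"):
        inner = wdl_type[len("Array["):-1]
        return isinstance(value, list) and all(
            matches_wdl_type(item, inner) for item in value
        )
    if wdl_type.startswith("Map[") and wdl_type.endswith("]"):
        # Only the value type is checked; JSON object keys are always strings.
        inner = wdl_type[len("Map["):-1].split(",", 1)[1].strip()
        return isinstance(value, dict) and all(
            matches_wdl_type(item, inner) for item in value.values()
        )
    raise ValueError(f"type not covered by this sketch: {wdl_type}")


print(matches_wdl_type("data/q1.fasta", "Array[File]"))    # bare string
print(matches_wdl_type(["data/q1.fasta"], "Array[File]"))  # wrapped in a list
print(matches_wdl_type("4", "Int"))                        # quoted number
```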

You’re Done with the Series

Next steps as you move to your own workflows: