How to Define the Input Data
Summary
This is Lesson 4 of the JAWS tutorial series. In Lesson 3 you wrote blast.wdl. The WDL describes what the workflow does; the inputs file says which files to run it on.
Pre-requisites
Completed Lesson 3: Writing WDLs. You have blast.wdl from jaws-tutorial-examples/blast_example/.
The Basic Format
The inputs file is a JSON object. Each key is <workflow_name>.<input_name>; each value is whatever that input expects.
For the BLAST workflow:
{
"blast_example.reference_fasta": "data/reference.fasta",
"blast_example.query_fasta": "data/query.fasta"
}
Keys are case-sensitive and must match the WDL exactly.
You can generate the template (so the keys are typo-proof) with:
jaws inputs blast.wdl > inputs.json
Then edit each value to point at a real file.
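After hand-editing, a quick check can confirm the keys still match the WDL. The sketch below is a hypothetical helper, not part of JAWS: it assumes you pass in the workflow name and the input names yourself (it does not parse the WDL), and it only looks at the keys that are present, so it will not notice a required input that is missing.

```python
import json


def check_keys(inputs, workflow_name, declared_inputs):
    """Return a list of problems found in the keys of a parsed inputs file.

    workflow_name and declared_inputs come from the caller; this sketch
    does not read the WDL itself.
    """
    expected = {f"{workflow_name}.{name}" for name in declared_inputs}
    by_lowercase = {key.lower(): key for key in expected}
    problems = []
    for key in inputs:
        if key in expected:
            continue
        if key.lower() in by_lowercase:
            # Same letters, different case: keys are case-sensitive.
            problems.append(
                f"{key}: case mismatch, expected {by_lowercase[key.lower()]}")
        elif not key.startswith(workflow_name + "."):
            problems.append(f"{key}: missing '{workflow_name}.' prefix")
        else:
            problems.append(f"{key}: not declared in the workflow")
    return problems


if __name__ == "__main__":
    text = """{
        "blast_example.reference_fasta": "data/reference.fasta",
        "blast_example.Query_fasta": "data/query.fasta"
    }"""
    for problem in check_keys(json.loads(text), "blast_example",
                              ["reference_fasta", "query_fasta"]):
        print(problem)
```

An empty list means every key carries the workflow prefix and matches a declared input name exactly.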
File Paths
Input values of type File can be:
A relative path. Resolved against the location of the inputs.json file itself, not the directory you submit from.
An absolute path.
A URL (http://, https://, ftp://). JAWS downloads the file once and hands it to the task as a local file.
Important
The relative-path rule above is JAWS-specific. Plain Cromwell resolves relative paths against the directory you submit from; JAWS resolves them against the directory containing inputs.json. The JAWS community made this choice so an inputs file can live next to its WDL and stay portable.
For the BLAST example, inputs.json lives in blast_example/, and the data lives in blast_example/data/, so the relative path is just data/reference.fasta. If inputs.json were one directory up, the path would be blast_example/data/reference.fasta.
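The resolution rule can be written down in a few lines. This is an illustrative sketch of the rule as described in this lesson, not JAWS's actual implementation; the function name resolve_file_input is made up for the example, and it works on path strings only without touching the filesystem.

```python
from pathlib import PurePosixPath

URL_PREFIXES = ("http://", "https://", "ftp://")


def resolve_file_input(inputs_json_path, value):
    """Apply the File path rule from this lesson to a single value."""
    if value.startswith(URL_PREFIXES):
        return value  # URLs are left untouched; JAWS downloads them
    path = PurePosixPath(value)
    if path.is_absolute():
        return str(path)
    # Relative paths are anchored at the directory that holds inputs.json,
    # not at the current working directory.
    return str(PurePosixPath(inputs_json_path).parent / path)


if __name__ == "__main__":
    print(resolve_file_input("blast_example/inputs.json", "data/reference.fasta"))
    print(resolve_file_input("inputs.json", "blast_example/data/reference.fasta"))
```

Both calls in the demo name the same file, blast_example/data/reference.fasta, which is the point of the two cases described above.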
Reference Data (/refdata)
Note
The BLAST tutorial example doesn’t use refdata — its reference FASTA is small enough to ship in the repo’s data/ directory. The pattern below is what you’ll reach for in your own workflows once you start pointing at large shared resources.
Reference data (BLAST databases, genome FASTAs, annotation files, anything large that gets reused across many runs) lives in JAWS refdata, not as an absolute path in your inputs.json.
JAWS stores refdata centrally and syncs it to every compute site. Inside your task it’s mounted at /refdata on whichever site the workflow lands on, so you write the same inputs.json regardless of where the run goes.
Two rules:
Declare refdata inputs as String, not File. Cromwell would otherwise try to stage the file out of the container and fail; /refdata only exists inside.
Use the /refdata/<group>/... path in inputs.json, never the underlying Perlmutter absolute path.
# In the WDL
String reference_db # NOT File
{
"blast_example.reference_db": "/refdata/myteam/blast/swissprot",
"blast_example.query_fasta": "data/query.fasta"
}
For setup, syncing, group permissions, and manifest files, see Using Reference Data in Your WDLs.
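The String-not-File rule is easy to check mechanically. A minimal sketch, assuming you keep a hand-written mapping from input key to WDL type; the declared_types dictionary and the refdata_problems helper are hypothetical and not something JAWS provides.

```python
def refdata_problems(inputs, declared_types):
    """Flag /refdata values whose input is not declared as String.

    declared_types maps each inputs.json key to its WDL type name and is
    written by hand for this sketch.
    """
    problems = []
    for key, value in inputs.items():
        if not (isinstance(value, str) and value.startswith("/refdata/")):
            continue  # only refdata paths are of interest here
        wdl_type = declared_types.get(key)
        if wdl_type != "String":
            problems.append(
                f"{key}: /refdata path declared as {wdl_type}, expected String")
    return problems


if __name__ == "__main__":
    inputs = {
        "blast_example.reference_db": "/refdata/myteam/blast/swissprot",
        "blast_example.query_fasta": "data/query.fasta",
    }
    types = {
        "blast_example.reference_db": "File",  # deliberately wrong
        "blast_example.query_fasta": "File",
    }
    for problem in refdata_problems(inputs, types):
        print(problem)
```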
Arrays and Maps
WDL types map onto JSON directly. Array[File] becomes a JSON array; Map[String, String] becomes a JSON object:
{
"blast_example.query_fastas": ["data/q1.fasta", "data/q2.fasta"],
"blast_example.sample_info": {"name": "patient_42", "tissue": "liver"}
}
A single string where the WDL expects Array[File] is a type error; wrap it as ["..."].
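To catch this kind of mismatch before submitting, the type mapping can be checked against the parsed JSON. The sketch below covers only a handful of type names (Array, Map, String, File, Int, Float, Boolean) and raises on anything else, including optional types. It follows the rules as stated in this lesson; a real engine may be more lenient and coerce some values.

```python
def matches_wdl_type(value, wdl_type):
    """Rough check of a parsed JSON value against a WDL type name."""
    wdl_type = wdl_type.strip()
    if wdl_type.startswith("Array[") and wdl_type.endswith("]"):
        inner = wdl_type[len("Array["):-1]
        return isinstance(value, list) and all(
            matches_wdl_type(item, inner) for item in value)
    if wdl_type.startswith("Map[") and wdl_type.endswith("]"):
        # Split "K, V" at the first comma; JSON object keys are always
        # strings, so only the value type is checked.
        _, value_type = wdl_type[len("Map["):-1].split(",", 1)
        return isinstance(value, dict) and all(
            matches_wdl_type(item, value_type) for item in value.values())
    if wdl_type in ("String", "File"):
        return isinstance(value, str)
    if wdl_type == "Int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if wdl_type == "Float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if wdl_type == "Boolean":
        return isinstance(value, bool)
    raise ValueError(f"type not covered by this sketch: {wdl_type}")


if __name__ == "__main__":
    print(matches_wdl_type(["data/q1.fasta", "data/q2.fasta"], "Array[File]"))
    print(matches_wdl_type("data/q1.fasta", "Array[File]"))
```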
Caching
JAWS caches recently-used input files in a staging area on each site. If you re-run a workflow with the same input file, JAWS reuses the staged copy rather than re-transferring it, which matters most for large inputs (e.g. a 100 GB BLAST database transferred once and then reused).
The retention window for the staging area, and the related purge policy for Cromwell execution directories, are documented in JAWS Policies. Any input you use at least once within that window stays staged.
Temporary Files (/tmp)
If a task writes large intermediates and needs a guaranteed-writable temp location, use $TMPDIR inside the command block (JAWS wires it up to /tmp for you). See Use of $JAWS_SITE Environment Variable in the JAWS Guide for the exact mechanism and an example.
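If the command block calls a Python script, the script can honor the same variable. A minimal sketch; the helper name make_scratch_dir and the fallback to /tmp when $TMPDIR is unset are assumptions of this example, not documented JAWS behavior.

```python
import os
import tempfile


def make_scratch_dir(prefix="blast_"):
    """Create a private scratch directory under $TMPDIR (or /tmp if unset)."""
    base = os.environ.get("TMPDIR", "/tmp")
    # mkdtemp creates a uniquely named directory readable only by the owner.
    return tempfile.mkdtemp(prefix=prefix, dir=base)


if __name__ == "__main__":
    scratch = make_scratch_dir()
    print(f"writing intermediates under {scratch}")
```

The caller is responsible for removing the directory when the task is done, for example with shutil.rmtree.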
Run It
With blast.wdl and inputs.json, you can run the workflow locally with Cromwell to confirm everything is wired up:
java -jar /path/to/cromwell-87.jar run blast.wdl --inputs inputs.json
For submitting the same workflow to JAWS — jaws submit, choosing a site with jaws info, tracking a run with jaws status and jaws log, and finding your output files — see JAWS Quickstart for the end-to-end walkthrough.
Troubleshooting
File not found once the task starts running
Relative path resolving against the wrong directory. Remember the JAWS rule: relative paths in inputs.json are resolved against the directory inputs.json lives in, not the directory you submit from.
Meant to use refdata but used a Perlmutter absolute path. Refdata is accessed via /refdata/<group>/... with the input declared as String. See Using Reference Data in Your WDLs.
URL unreachable from the compute site. Pre-download the file or check with the JAWS team.
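Most of these can be caught before submitting with a small pre-flight check. The sketch below is a hypothetical helper, not a JAWS command: it resolves relative paths against the directory containing inputs.json, skips URLs and /refdata paths (which cannot be checked from a login node), and reports everything else that does not exist. It treats every string value as a candidate path, so plain String inputs such as sample names will appear as false positives.

```python
import json
from pathlib import Path

# Values with these prefixes are not local files and are skipped.
SKIPPED_PREFIXES = ("http://", "https://", "ftp://", "/refdata/")


def missing_local_files(inputs_json_path):
    """Return (key, resolved_path) pairs for local paths that do not exist."""
    inputs_json = Path(inputs_json_path)
    base = inputs_json.parent
    missing = []

    def visit(key, value):
        if isinstance(value, str):
            if value.startswith(SKIPPED_PREFIXES):
                return
            path = Path(value)
            resolved = path if path.is_absolute() else base / path
            if not resolved.exists():
                missing.append((key, str(resolved)))
        elif isinstance(value, list):
            for item in value:
                visit(key, item)
        elif isinstance(value, dict):
            for item in value.values():
                visit(key, item)

    for key, value in json.loads(inputs_json.read_text()).items():
        visit(key, value)
    return missing


if __name__ == "__main__":
    import sys

    for key, path in missing_local_files(sys.argv[1]):
        print(f"{key}: {path} not found")
```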
Type mismatch: JSON value doesn’t match the WDL type
The most common version: a bare string where the WDL declares Array[File]. Wrap it as ["..."]. Same rule for Map[K, V] (must be a JSON object) and Int (unquoted number, not a string).
You’re Done with the Series
Next steps as you move to your own workflows:
JAWS Quickstart: end-to-end submission walkthrough.
JAWS Commands: full CLI reference.
Using Reference Data in Your WDLs: the refdata workflow in detail.
Best Practices for Creating WDLs: patterns for WDLs that scale.