updated: 2022-11-29
Snakefiles and Rules — Snakemake 7.14.2 documentation
shell and script are the directives that define how a rule generates its output files from its input files. shell specifies a shell command, i.e., what you would type on the command line. script specifies a script file, such as a Python script. For instance, the following two rules produce the same output:
rule shell_version:
    input:
        input_file = UNSORTED_FILE
    output:
        output_file = SORTED_FILE
    shell:
        "python main.py {input.input_file} {output.output_file}"

rule script_version:
    input:
        input_file = UNSORTED_FILE
    output:
        output_file = SORTED_FILE
    script:
        "main.py"
The difference is how the variables are passed to the script. With shell, all variables must be given as command-line arguments. With script, the variables are accessible from within the script itself. For instance, in Python, all variables are accessible via the snakemake.<directive name> object, e.g.,
input_file = snakemake.input["input_file"]
output_file = snakemake.output["output_file"]
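To make the script variant concrete, here is a minimal sketch of what main.py could look like. The original post does not show this file, so the sort_lines helper and the line-sorting behavior are assumptions inferred from the UNSORTED_FILE and SORTED_FILE names.

```python
# main.py (hypothetical contents; the original post does not show this file)


def sort_lines(input_file, output_file):
    """Read lines from input_file, sort them, and write them to output_file."""
    with open(input_file) as f:
        lines = f.read().splitlines()
    with open(output_file, "w") as f:
        f.write("\n".join(sorted(lines)) + "\n")


# Snakemake injects a global `snakemake` object when the file is run
# through the `script:` directive; the guard skips this call otherwise.
if "snakemake" in globals():
    sort_lines(snakemake.input["input_file"], snakemake.output["output_file"])
```

Keeping the actual work in a plain function means the file names are looked up by name in exactly one place.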
I favor script because it makes the workflow more readable and makes it easier to pass many variables. A shell command can get lengthy, especially when there are many variables (input, output, params, resources, etc.). With shell, the arguments are order-sensitive, which incurs an additional maintenance cost. With script, each parameter is referenced by name, which is a big plus for readability.
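The order sensitivity is easy to see from the script's side of the shell variant. The sketch below assumes main.py reads two positional arguments with argparse; the parse_args helper and the file names are hypothetical, not taken from the original post.

```python
import argparse


def parse_args(argv):
    """Parse the two positional arguments passed by the shell rule."""
    parser = argparse.ArgumentParser(description="Sort a file (shell-rule variant).")
    parser.add_argument("input_file")
    parser.add_argument("output_file")
    return parser.parse_args(argv)


# Correct order: input first, output second.
args = parse_args(["unsorted.txt", "sorted.txt"])

# Swapping the two placeholders in the shell command is silently accepted,
# so the script would read from the output path and write to the input path.
swapped = parse_args(["sorted.txt", "unsorted.txt"])
```

Named options such as --input-file would remove the ordering problem, but they make the shell command even longer, which is the trade-off described above.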
A drawback of script is that it makes the script non-standalone: you can run it only via snakemake because otherwise the snakemake object is not created. This is inconvenient when testing the script by hand. One way to make it standalone is to check whether snakemake is defined before accessing it:
import sys

# When run via Snakemake, the snakemake module is loaded and the snakemake object exists.
if "snakemake" in sys.modules:
    vector_data_file = snakemake.input["vector_data_file"]
    clustering_model_file = snakemake.input["clustering_model_file"]
    output_file = snakemake.output["output_file"]
else:
    # Standalone run: set the paths by hand for testing.
    vector_data_file = ""
    clustering_model_file = "models/clustering_model"
    output_file = "models/clustering_model"
This way, I retrieve the variables from snakemake only when the snakemake object has been created. Otherwise, I set the variables directly in the script so that I can run it without snakemake for testing.
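When a script has many variables, the if/else block can be folded into a small lookup helper. This is only a sketch of one possible variation, not the approach from the post: the resolve function, the DEFAULTS table, and its paths are all hypothetical, and it checks globals() for the injected object instead of sys.modules.

```python
from types import SimpleNamespace

# Hypothetical fallback paths for standalone runs; adjust to your project.
DEFAULTS = {
    "input": {"input_file": "data/unsorted.txt"},
    "output": {"output_file": "data/sorted.txt"},
}


def resolve(directive, name, smk=None):
    """Return smk.<directive>[name] if smk is given, else the default path."""
    if smk is not None:
        return getattr(smk, directive)[name]
    return DEFAULTS[directive][name]


# Under Snakemake's `script:` directive a global `snakemake` object exists;
# in a standalone run the lookup returns None and the defaults are used.
smk = globals().get("snakemake")
input_file = resolve("input", "input_file", smk)
output_file = resolve("output", "output_file", smk)

# A stand-in object with the same shape, useful for unit tests.
fake = SimpleNamespace(
    input={"input_file": "a.txt"},
    output={"output_file": "b.txt"},
)
```

Passing the stand-in, as in resolve("input", "input_file", fake), exercises the Snakemake code path in a test without running a workflow.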