updated: 2022-11-29
#tools/snakemake
Snakemake provides a rich set of functions to handle parameter spaces. Yet, it gets tedious when exploring many combinations of many types of parameters. Snakemake provides a helper called Paramspace to handle this situation.
Paramspace takes a pandas DataFrame in which each row represents a combination of parameter values. It then generates a wildcard pattern, with one placeholder per parameter, that can be used as part of a file name. For instance,
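from snakemake.utils import Paramspace
import pandas as pd
# declare a data frame to be a paramspace
# filename_params="*" puts every parameter into the file name, joined by "_"
paramspace = Paramspace(pd.read_csv("params.tsv", sep="\t"), filename_params="*")
input_file = f"results/simulations/{paramspace.wildcard_pattern}.tsv"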
Here, input_file looks like this:
"results/simulations/alpha~{alpha}_beta~{beta}_gamma~{gamma}.tsv"
where alpha, beta, and gamma are the parameter names, taken from the columns of the input DataFrame.
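To make the shape of the pattern concrete, here is a minimal sketch in plain Python. It only mimics the naming scheme described above and is not Snakemake's own code; the function name build_wildcard_pattern and the parameter values in row are hypothetical.

```python
# Illustration only: mimics the shape of the pattern, not Snakemake's internals.
def build_wildcard_pattern(columns, sep="_"):
    """Join one `name~{name}` fragment per column name."""
    return sep.join(f"{name}~{{{name}}}" for name in columns)


columns = ["alpha", "beta", "gamma"]  # column names of the DataFrame
pattern = build_wildcard_pattern(columns)
input_file = f"results/simulations/{pattern}.tsv"
print(input_file)

# Filling the placeholders with one row of (made-up) values gives a concrete path.
row = {"alpha": 1.0, "beta": 0.1, "gamma": 0.5}
print(input_file.format(**row))
```

Each row of the DataFrame corresponds to one such concrete path.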
It is convenient, though I don't like creating the DataFrame. So I wrote a simple utility function that makes parameter handling easier.
Utilities for Snakemake · GitHub
Download utils.smk and put it in the same folder as the Snakefile. Then, include it with
include: "./utils.smk"
The way it works is as follows. Suppose that I have five parameters and want to run a workflow for every combination of their values. I specify the names and values of the parameters in a dict as follows:
params_spherical_model = {
    "geometry": [True],
    "symmetric": [False, True],
    "aging": [False, True],
    "fitness": [True, False],
    "dim": [16, 64, 128],
}
Create a Paramspace helper by passing the dict to my utility function to_paramspace:
spherical_model_paramspace = to_paramspace(params_spherical_model)
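The actual to_paramspace lives in the gist and is not reproduced here. As a rough sketch of what such a helper presumably has to do, the dict of value lists can be expanded into one row per combination with itertools.product. The name expand_grid is hypothetical, and wrapping the rows in a Paramspace is left as a comment.

```python
from itertools import product


def expand_grid(param_dict):
    """Return one dict per combination of the listed parameter values."""
    names = list(param_dict)
    return [dict(zip(names, values)) for values in product(*param_dict.values())]


params_spherical_model = {
    "geometry": [True],
    "symmetric": [False, True],
    "aging": [False, True],
    "fitness": [True, False],
    "dim": [16, 64, 128],
}

rows = expand_grid(params_spherical_model)
print(len(rows))  # 1 * 2 * 2 * 2 * 3 = 24 combinations
print(rows[0])

# Assumption: in a Snakefile, the rows could then be wrapped as
# Paramspace(pd.DataFrame(rows), filename_params="*").
```

The rows play the role of the DataFrame that Paramspace expects, so no params.tsv has to be written by hand.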
Then, define the filename using wildcard_pattern, e.g.,
GEOMETRIC_MODEL_FILE = f"model_{spherical_model_paramspace.wildcard_pattern}.pt"
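To see which files this pattern stands for, the sketch below fills it in for every combination of the five parameters. The pattern is written out by hand here, assuming the parameters appear in the order of the dict and are joined by "_". In a real workflow, Paramspace's instance_patterns attribute serves this purpose.

```python
from itertools import product

# Assumed expansion of the wildcard pattern for the five parameters.
GEOMETRIC_MODEL_FILE = (
    "model_geometry~{geometry}_symmetric~{symmetric}_aging~{aging}"
    "_fitness~{fitness}_dim~{dim}.pt"
)

params_spherical_model = {
    "geometry": [True],
    "symmetric": [False, True],
    "aging": [False, True],
    "fitness": [True, False],
    "dim": [16, 64, 128],
}

# One concrete file name per combination of parameter values.
names = list(params_spherical_model)
targets = [
    GEOMETRIC_MODEL_FILE.format(**dict(zip(names, values)))
    for values in product(*params_spherical_model.values())
]
print(len(targets))
print(targets[0])
```

A target rule that requests all of these files makes Snakemake run the workflow once per combination.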
You can use it as an input/output of rules, e.g.,
rule model_fitting_spherical_model:
    input:
        paper_table_file = PAPER_TABLE,
        net_file = CITATION_NET,
    output:
        output_file = GEOMETRIC_MODEL_FILE
    params:
        dim = lambda wildcards : wildcards.dim,
        geometry = lambda wildcards : wildcards.geometry,
        aging = lambda wildcards : wildcards.aging,
        symmetric = lambda wildcards : wildcards.symmetric,
        fitness = lambda wildcards : wildcards.fitness,
        #in_out_coupling_strength = lambda wildcards : float(wildcards.couplingStrength)
    script:
        "workflow/fit-spherical-model/fitting.py"
Here, the parameter values are retrieved via wildcards, e.g.,
dim = lambda wildcards : wildcards.dim
which can be accessed from the script via the snakemake object, e.g.,
dim = snakemake.params["dim"]
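One thing to keep in mind: wildcard values are strings, so a parameter passed through as wildcards.dim arrives in the script as "64" rather than 64, and bool("False") is True in Python. Below is a minimal sketch of converting them inside the script. The dict params stands in for snakemake.params, and parse_bool is a hypothetical helper.

```python
def parse_bool(text):
    """Turn the strings "True"/"False" into booleans; reject anything else."""
    if text == "True":
        return True
    if text == "False":
        return False
    raise ValueError(f"expected 'True' or 'False', got {text!r}")


# Stand-in for snakemake.params inside the script (hypothetical values).
params = {"dim": "64", "geometry": "True", "aging": "False"}

dim = int(params["dim"])                   # "64" -> 64
geometry = parse_bool(params["geometry"])  # "True" -> True
aging = parse_bool(params["aging"])        # "False" -> False
print(dim, geometry, aging)
```

Alternatively, the conversion can happen in the rule itself, as the commented-out float(...) line in the rule does.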
See Directive - Powerful integration of python scripts into workflow - Snakemake.