9. Modularization
Defining module libraries simplifies the writing of complex data analysis workflows and makes it much easier to re-use processes.
Using the hello.nf example from earlier, we will convert the workflow's processes into modules, then call them within the workflow scope in a variety of ways.
9.1 Modules
Nextflow DSL2 allows for the definition of stand-alone module scripts that can be included and shared across multiple workflows. Each module can contain its own process or workflow definition.
9.1.1 Importing modules
Components defined in the module script can be imported into other Nextflow scripts using the include statement. This allows you to store these components in one or more separate files so that they can be re-used in multiple workflows.
Using the hello.nf example, we can achieve this by:
- Creating a file called modules.nf in the top-level directory.
- Copying and pasting the two process definitions for SPLITLETTERS and CONVERTTOUPPER into modules.nf.
- Removing the process definitions in the hello.nf script.
- Importing the processes from modules.nf within the hello.nf script anywhere above the workflow definition:
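The import step in the list above can be sketched as follows. This is a minimal sketch that assumes modules.nf sits in the same directory as hello.nf:

```nextflow
// One include statement per process; the ./ prefix makes the path relative
// to the script that contains the include statement.
include { SPLITLETTERS   } from './modules.nf'
include { CONVERTTOUPPER } from './modules.nf'
```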
Note
In general, you would use relative paths to define the location of the module scripts using the ./ prefix.
Exercise
Create a modules.nf file with the previously defined processes from hello.nf. Then remove these processes from hello.nf and add the include definitions shown above.
Solution
The hello.nf script should look similar to this:
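A sketch of what hello.nf might look like after the change. The greeting value and the channel names (greeting_ch, letters_ch, results_ch) are assumptions carried over from the earlier example, which is not reproduced in this section:

```nextflow
#!/usr/bin/env nextflow

// Assumed default greeting from the earlier hello.nf example
params.greeting = 'Hello world!'
greeting_ch = Channel.of(params.greeting)

// The process definitions now live in modules.nf
include { SPLITLETTERS   } from './modules.nf'
include { CONVERTTOUPPER } from './modules.nf'

workflow {
    letters_ch = SPLITLETTERS(greeting_ch)
    results_ch = CONVERTTOUPPER(letters_ch.flatten())
    results_ch.view { it }
}
```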
You should have the following in the file ./modules.nf:
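A sketch of modules.nf, assuming the two processes were defined in the earlier hello.nf roughly as below (splitting the greeting into 6-byte chunks, then upper-casing each chunk). If your earlier definitions differ, copy those instead:

```nextflow
// Splits the input string into files named chunk_aa, chunk_ab, ...
process SPLITLETTERS {
    input:
    val x

    output:
    path 'chunk_*'

    script:
    """
    printf '$x' | split -b 6 - chunk_
    """
}

// Prints the content of each chunk file in upper case to standard output
process CONVERTTOUPPER {
    input:
    path y

    output:
    stdout

    script:
    """
    cat $y | tr '[a-z]' '[A-Z]'
    """
}
```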
We now have modularized processes, which makes the code reusable.
9.1.2 Multiple imports
If a Nextflow module script contains multiple process definitions, they can also be imported using a single include statement, as shown in the example below:
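A minimal sketch of a combined import, assuming both processes are defined in ./modules.nf:

```nextflow
// Several components from the same module file, separated by semicolons
include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'
```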
9.1.3 Module aliases
When including a module component, it is possible to specify a name alias using the as declaration. This allows the inclusion and the invocation of the same component multiple times using different names:
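A sketch of aliased includes. The alias names match the process names visible in the run log further down; the channel names and the greeting value are assumptions:

```nextflow
#!/usr/bin/env nextflow

params.greeting = 'Hello world!'
greeting_ch = Channel.of(params.greeting)

// Each alias behaves like an independent process in the workflow
include { SPLITLETTERS as SPLITLETTERS_one } from './modules.nf'
include { SPLITLETTERS as SPLITLETTERS_two } from './modules.nf'

include { CONVERTTOUPPER as CONVERTTOUPPER_one } from './modules.nf'
include { CONVERTTOUPPER as CONVERTTOUPPER_two } from './modules.nf'

workflow {
    letters_ch1 = SPLITLETTERS_one(greeting_ch)
    results_ch1 = CONVERTTOUPPER_one(letters_ch1.flatten())
    results_ch1.view { it }

    letters_ch2 = SPLITLETTERS_two(greeting_ch)
    results_ch2 = CONVERTTOUPPER_two(letters_ch2.flatten())
    results_ch2.view { it }
}
```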
Exercise
Save the previous snippet as hello.2.nf, and try to guess what will be shown on the screen.
Solution
The hello.2.nf output should look something like this:
N E X T F L O W ~ version 23.04.1
Launching `hello.2.nf` [crazy_shirley] DSL2 - revision: 99f6b6e40e
executor > local (6)
[2b/ec0395] process > SPLITLETTERS_one (1) [100%] 1 of 1 ✔
[d7/be3b77] process > CONVERTTOUPPER_one (1) [100%] 2 of 2 ✔
[04/9ffc05] process > SPLITLETTERS_two (1) [100%] 1 of 1 ✔
[d9/91b029] process > CONVERTTOUPPER_two (2) [100%] 2 of 2 ✔
WORLD!
HELLO
HELLO
WORLD!
Tip
You can store each process in its own file within a separate sub-folder, or combine them all in one big file (both are valid). You can find examples of this on public repos such as the Seqera RNA-Seq tutorial or within nf-core workflows, such as nf-core/rnaseq.
9.2 Output definition
Nextflow allows the use of alternative output definitions within workflows to simplify your code.
In the previous basic example (hello.nf), we defined the channel names to specify the input to the next process:
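A sketch of the workflow with explicitly named channels, assuming the includes and params.greeting from the earlier example:

```nextflow
params.greeting = 'Hello world!'

include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'

workflow {
    greeting_ch = Channel.of(params.greeting)
    // Each process output is captured in a named channel and passed on
    letters_ch = SPLITLETTERS(greeting_ch)
    results_ch = CONVERTTOUPPER(letters_ch.flatten())
    results_ch.view { it }
}
```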
Note
We have moved the greeting_ch into the workflow scope for this exercise.
We can also pass the output of one process directly to the next using the .out attribute, removing the intermediate channel definitions completely:
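A sketch of the same workflow using the .out attribute, under the same assumptions as before (processes imported from ./modules.nf):

```nextflow
params.greeting = 'Hello world!'

include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'

workflow {
    greeting_ch = Channel.of(params.greeting)
    SPLITLETTERS(greeting_ch)
    // SPLITLETTERS.out refers to the output channel of the call above
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())
    CONVERTTOUPPER.out.view()
}
```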
If a process defines two or more output channels, each channel can be accessed by indexing the .out attribute, e.g., .out[0], .out[1], etc. In our example each process has only one output, at index [0]:
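A sketch of indexed access, assuming each process declares a single output so that only index 0 exists:

```nextflow
params.greeting = 'Hello world!'

include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'

workflow {
    greeting_ch = Channel.of(params.greeting)
    SPLITLETTERS(greeting_ch)
    // out[0] selects the first declared output of each process
    CONVERTTOUPPER(SPLITLETTERS.out[0].flatten())
    CONVERTTOUPPER.out[0].view()
}
```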
Alternatively, the process output definition allows the use of the emit statement to define a named identifier that can be used to reference the channel in the external scope.
For example, try adding the emit statement on the CONVERTTOUPPER process in your modules.nf file:
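A sketch of the modified process. The name upper comes from the text that follows; the rest of the process body is assumed to match the earlier definition:

```nextflow
process CONVERTTOUPPER {
    input:
    path y

    output:
    // emit assigns the name "upper" to this output channel
    stdout emit: upper

    script:
    """
    cat $y | tr '[a-z]' '[A-Z]'
    """
}
```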
Then change the workflow scope in hello.nf to call this specific named output (notice the added .upper):
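A sketch of the matching workflow scope, assuming CONVERTTOUPPER in modules.nf declares its output with emit: upper:

```nextflow
params.greeting = 'Hello world!'

include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'

workflow {
    greeting_ch = Channel.of(params.greeting)
    SPLITLETTERS(greeting_ch)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())
    // Access the output by the name given in the emit statement
    CONVERTTOUPPER.out.upper.view { it }
}
```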
9.2.1 Using piped outputs
Another way to deal with outputs in the workflow scope is to use pipes (|).
Exercise
Try changing the workflow script to the snippet below:
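A sketch of a piped workflow, assuming the same processes imported from ./modules.nf:

```nextflow
params.greeting = 'Hello world!'

include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'

workflow {
    // Each stage receives the output channel of the stage to its left
    Channel.of(params.greeting) | SPLITLETTERS | flatten | CONVERTTOUPPER | view
}
```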
Here we use a pipe, which passes the output as a channel to the next process without the need to apply .out to the process name.
9.3 Workflow definition
The workflow scope allows the definition of named components that invoke one or more processes or operators:
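A sketch of a named workflow and its invocation. The name my_workflow is taken from the text that follows; the body assumes the emit: upper version of CONVERTTOUPPER:

```nextflow
#!/usr/bin/env nextflow

params.greeting = 'Hello world!'

include { SPLITLETTERS   } from './modules.nf'
include { CONVERTTOUPPER } from './modules.nf'

// A named workflow is a reusable component; it does not run by itself
workflow my_workflow {
    greeting_ch = Channel.of(params.greeting)
    SPLITLETTERS(greeting_ch)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())
    CONVERTTOUPPER.out.upper.view { it }
}

// The unnamed workflow is the entry point and invokes the named one
workflow {
    my_workflow()
}
```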
For example, the snippet above defines a workflow named my_workflow, which can be invoked via another workflow definition.
Note
Make sure that your modules.nf file is the one containing the emit on the CONVERTTOUPPER process.
Warning
A workflow component can access any variable or parameter defined in the outer scope. In the running example, we can also access params.greeting directly within the workflow definition.
9.3.1 Workflow inputs
A workflow component can declare one or more input channels using the take statement. For example:
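A sketch of a workflow with a declared input. The input name greeting is an assumption chosen for illustration:

```nextflow
include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'

workflow my_workflow {
    take:
    greeting    // input channel supplied by the caller

    main:
    SPLITLETTERS(greeting)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())
    CONVERTTOUPPER.out.upper.view { it }
}
```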
Note
When the take statement is used, the body of the workflow needs to be declared within the main block.
The input for the workflow can then be specified as an argument:
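A sketch of the call, meant to be placed in the same script as a my_workflow definition that declares one take input (the definition itself is assumed and not repeated here):

```nextflow
// Assumes my_workflow (with a single take input) is defined earlier in this script
params.greeting = 'Hello world!'

workflow {
    // The channel is passed positionally, like a function argument
    my_workflow(Channel.of(params.greeting))
}
```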
9.3.2 Workflow outputs
A workflow can declare one or more output channels using the emit statement. For example:
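A sketch of a workflow that emits a single unnamed output, assuming the emit: upper version of CONVERTTOUPPER and an input named greeting:

```nextflow
params.greeting = 'Hello world!'

include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'

workflow my_workflow {
    take:
    greeting

    main:
    SPLITLETTERS(greeting)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())

    emit:
    CONVERTTOUPPER.out.upper    // exposed to the caller as my_workflow.out
}

workflow {
    my_workflow(Channel.of(params.greeting))
    my_workflow.out.view()
}
```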
As a result, we can use the my_workflow.out notation to access the outputs of my_workflow in the invoking workflow.
We can also declare named outputs within the emit block.
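A sketch of a named workflow output. The name my_data matches the text that follows; everything else carries the same assumptions as the earlier examples:

```nextflow
params.greeting = 'Hello world!'

include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'

workflow my_workflow {
    take:
    greeting

    main:
    SPLITLETTERS(greeting)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())

    emit:
    my_data = CONVERTTOUPPER.out.upper    // named output
}

workflow {
    my_workflow(Channel.of(params.greeting))
    my_workflow.out.my_data.view()
}
```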
The result of the above snippet can then be accessed using my_workflow.out.my_data.
9.3.3 Calling named workflows
A main.nf script (called hello.nf in our example) can also contain multiple workflows, in which case we may want to call a specific workflow when running the code. For this we use the -entry <workflow_name> command-line option.
The following snippet has two named workflows (my_workflow_one and my_workflow_two):
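A sketch of a script with two named workflows. The workflow names come from the text; the aliased includes and the bodies are assumptions modelled on the alias example in section 9.1.3:

```nextflow
#!/usr/bin/env nextflow

params.greeting = 'Hello world!'

include { SPLITLETTERS as SPLITLETTERS_one } from './modules.nf'
include { SPLITLETTERS as SPLITLETTERS_two } from './modules.nf'

include { CONVERTTOUPPER as CONVERTTOUPPER_one } from './modules.nf'
include { CONVERTTOUPPER as CONVERTTOUPPER_two } from './modules.nf'

workflow my_workflow_one {
    letters_ch1 = SPLITLETTERS_one(Channel.of(params.greeting))
    results_ch1 = CONVERTTOUPPER_one(letters_ch1.flatten())
    results_ch1.view { it }
}

workflow my_workflow_two {
    letters_ch2 = SPLITLETTERS_two(Channel.of(params.greeting))
    results_ch2 = CONVERTTOUPPER_two(letters_ch2.flatten())
    results_ch2.view { it }
}

// Default entry point: runs both named workflows
workflow {
    my_workflow_one()
    my_workflow_two()
}
```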
You can choose which workflow to run by using the -entry flag, for example nextflow run hello.nf -entry my_workflow_one.
9.3.4 Parameter scopes
A module script can define one or more parameters or custom functions using the same syntax as with any other Nextflow script. Consider the minimal examples below:
Module script (./modules.nf):
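A sketch of such a module script. The parameter names foo and bar and the function name SAYHELLO are hypothetical, chosen so that the defaults spell out Hello world!:

```nextflow
// Module-level defaults; used only if the including script has not set them
params.foo = 'Hello'
params.bar = 'world!'

// A custom function that can be included just like a process
def SAYHELLO() {
    println "$params.foo $params.bar"
}
```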
Main script (./hello.nf):
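A sketch of a matching main script. It assumes that ./modules.nf defines a function named SAYHELLO that prints params.foo and params.bar (both names are hypothetical):

```nextflow
#!/usr/bin/env nextflow

// Defined before the include, so these values are seen by the module
params.foo = 'Hola'
params.bar = 'mundo!'

include { SAYHELLO } from './modules.nf'

workflow {
    SAYHELLO()
}
```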
Running hello.nf should print Hola mundo! instead of Hello world!, because parameters are inherited from the including context: the values set in hello.nf before the include statement take precedence over the defaults defined in the module script.
Info
To ensure they are not ignored, workflow parameters should be defined at the beginning of the script, before any include declarations.
The addParams option can be used to extend the module parameters without affecting the external scope. For example:
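A sketch of an include with addParams. The names SAYHELLO, foo and bar and the override value are hypothetical, and ./modules.nf is assumed to define a SAYHELLO function that prints params.foo and params.bar:

```nextflow
#!/usr/bin/env nextflow

params.foo = 'Hola'
params.bar = 'mundo!'

// foo is overridden for this module only; params.foo in this script is unchanged
include { SAYHELLO } from './modules.nf' addParams(foo: 'Olá')

workflow {
    SAYHELLO()
}
```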
Executing the main script should then print the greeting using the value passed through addParams.
9.4 DSL2 migration notes
To view a summary of the changes introduced when Nextflow migrated from DSL1 to DSL2, please refer to the DSL2 migration notes in the official Nextflow documentation.