9. Modularization
The definition of module libraries simplifies the writing of complex data analysis workflows and makes re-use of processes much easier.
Using the hello.nf example from earlier, we will convert the workflow’s processes into modules, then call them within the workflow scope in a variety of ways.
9.1 Modules
Nextflow DSL2 allows for the definition of stand-alone module scripts that can be included and shared across multiple workflows. Each module can contain its own process or workflow definition.
9.1.1 Importing modules
Components defined in the module script can be imported into other Nextflow scripts using the include statement. This allows you to store these components in one or more separate files so that they can be re-used in multiple workflows.
Using the hello.nf example, we can achieve this by:
- Creating a file called modules.nf in the top-level directory.
- Copying and pasting the two process definitions for SPLITLETTERS and CONVERTTOUPPER into modules.nf.
- Removing the process definitions in the hello.nf script.
- Importing the processes from modules.nf within the hello.nf script anywhere above the workflow definition:
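The import step can be sketched as follows, assuming modules.nf sits next to hello.nf and contains both process definitions:

```nextflow
// One include statement per process; the ./ prefix makes the path
// relative to the script that contains the include.
include { SPLITLETTERS   } from './modules.nf'
include { CONVERTTOUPPER } from './modules.nf'
```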
Note
In general, you would use relative paths to define the location of the module scripts using the ./ prefix.
Exercise
Create a modules.nf file with the previously defined processes from hello.nf. Then remove these processes from hello.nf and add the include definitions shown above.
Solution
The hello.nf script should look similar to this:
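A possible version of the script is sketched here. The default greeting and the channel names are assumptions carried over from the usual hello.nf example, so adapt them to your own file:

```nextflow
#!/usr/bin/env nextflow

// Assumed default greeting; override with --greeting on the command line.
params.greeting = 'Hello world!'
greeting_ch = Channel.of(params.greeting)

// The process definitions now live in modules.nf.
include { SPLITLETTERS   } from './modules.nf'
include { CONVERTTOUPPER } from './modules.nf'

workflow {
    letters_ch = SPLITLETTERS(greeting_ch)
    results_ch = CONVERTTOUPPER(letters_ch.flatten())
    results_ch.view { it }
}
```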
You should have the following in the file ./modules.nf:
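The module file might look like the sketch below. The process bodies are an assumption (a greeting split into six-byte chunks and then upper-cased, which is consistent with the console output shown later in this section); use the definitions from your own hello.nf if they differ.

```nextflow
// modules.nf: process definitions only, no workflow block.

process SPLITLETTERS {
    input:
    val x

    output:
    path 'chunk_*'

    script:
    """
    printf '$x' | split -b 6 - chunk_
    """
}

process CONVERTTOUPPER {
    input:
    path y

    output:
    stdout

    script:
    """
    cat $y | tr '[a-z]' '[A-Z]'
    """
}
```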
We now have modularized processes, which makes the code reusable.
9.1.2 Multiple imports
If a Nextflow module script contains multiple process definitions, they can also be imported using a single include statement as shown in the example below:
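A minimal sketch, assuming both processes are defined in ./modules.nf:

```nextflow
// Several components from the same module script, separated by semicolons.
include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'
```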
9.1.3 Module aliases
When including a module component it is possible to specify a name alias using the as declaration. This allows the inclusion and the invocation of the same component multiple times using different names:
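A sketch of such a script is given here. The alias names match the process names in the console output of the exercise that follows; the default greeting and the channel names are assumptions.

```nextflow
#!/usr/bin/env nextflow

params.greeting = 'Hello world!'   // assumed default
greeting_ch = Channel.of(params.greeting)

// Each alias is a separate, independently callable copy of the process.
include { SPLITLETTERS as SPLITLETTERS_one } from './modules.nf'
include { SPLITLETTERS as SPLITLETTERS_two } from './modules.nf'

include { CONVERTTOUPPER as CONVERTTOUPPER_one } from './modules.nf'
include { CONVERTTOUPPER as CONVERTTOUPPER_two } from './modules.nf'

workflow {
    letters_ch1 = SPLITLETTERS_one(greeting_ch)
    results_ch1 = CONVERTTOUPPER_one(letters_ch1.flatten())
    results_ch1.view { it }

    letters_ch2 = SPLITLETTERS_two(greeting_ch)
    results_ch2 = CONVERTTOUPPER_two(letters_ch2.flatten())
    results_ch2.view { it }
}
```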
Exercise
Save the previous snippet as hello.2.nf, and try to guess what will be shown on the screen.
Solution
The hello.2.nf output should look something like this:
N E X T F L O W  ~  version 23.04.1
Launching `hello.2.nf` [crazy_shirley] DSL2 - revision: 99f6b6e40e
executor >  local (6)
[2b/ec0395] process > SPLITLETTERS_one (1)   [100%] 1 of 1 ✔
[d7/be3b77] process > CONVERTTOUPPER_one (1) [100%] 2 of 2 ✔
[04/9ffc05] process > SPLITLETTERS_two (1)   [100%] 1 of 1 ✔
[d9/91b029] process > CONVERTTOUPPER_two (2) [100%] 2 of 2 ✔
WORLD!
HELLO
HELLO
WORLD!
Tip
You can store each process in separate files within separate sub-folders or combined in one big file (both are valid). You can find examples of this on public repos such as the Seqera RNA-Seq tutorial or within nf-core workflows, such as nf-core/rnaseq.
9.2 Output definition
Nextflow allows the use of alternative output definitions within workflows to simplify your code.
In the previous basic example (hello.nf), we defined the channel names to specify the input to the next process:
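That version of the workflow scope can be sketched as follows, assuming the two processes are included from ./modules.nf and params.greeting is defined earlier in the script:

```nextflow
workflow {
    // Every intermediate channel gets an explicit name.
    greeting_ch = Channel.of(params.greeting)
    letters_ch = SPLITLETTERS(greeting_ch)
    results_ch = CONVERTTOUPPER(letters_ch.flatten())
    results_ch.view { it }
}
```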
Note
We have moved the greeting_ch into the workflow scope for this exercise.
We can also pass the output of one process directly to another using the .out attribute, removing the intermediate channel definitions completely:
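A sketch of the same workflow using .out, under the same assumptions (processes included from ./modules.nf, params.greeting defined above the workflow):

```nextflow
workflow {
    greeting_ch = Channel.of(params.greeting)
    SPLITLETTERS(greeting_ch)
    // SPLITLETTERS.out is the output channel of the call above.
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())
    CONVERTTOUPPER.out.view()
}
```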
If a process defines two or more output channels, each channel can be accessed by indexing the .out attribute, e.g., .out[0], .out[1], etc. In our example there is only one output, at index [0]:
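For illustration, the indexed form might look like this (same assumptions as the rest of this section):

```nextflow
workflow {
    greeting_ch = Channel.of(params.greeting)
    SPLITLETTERS(greeting_ch)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())
    // CONVERTTOUPPER declares a single output, so only index 0 exists.
    CONVERTTOUPPER.out[0].view()
}
```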
Alternatively, the process output definition allows the use of the emit statement to define a named identifier that can be used to reference the channel in the external scope.
For example, try adding the emit statement on the CONVERTTOUPPER process in your modules.nf file:
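The change could look like the sketch below. Only the output line differs; the rest of the process body is assumed to match your existing definition.

```nextflow
process CONVERTTOUPPER {
    input:
    path y

    output:
    stdout emit: upper   // the output can now be referenced as .out.upper

    script:
    """
    cat $y | tr '[a-z]' '[A-Z]'
    """
}
```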
Then change the workflow scope in hello.nf to call this specific named output (notice the added .upper):
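A sketch of the updated workflow scope, assuming CONVERTTOUPPER in modules.nf declares its output with emit: upper:

```nextflow
workflow {
    greeting_ch = Channel.of(params.greeting)
    SPLITLETTERS(greeting_ch)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())
    // Reference the output by the name given in the emit statement.
    CONVERTTOUPPER.out.upper.view { it }
}
```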
9.2.1 Using piped outputs
Another way to deal with outputs in the workflow scope is to use pipes |.
Exercise
Try changing the workflow script to the snippet below:
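One way to write the piped workflow, assuming the processes are included from ./modules.nf and params.greeting is defined:

```nextflow
workflow {
    // Each | passes the left-hand channel as input to the process or
    // operator on the right.
    Channel.of(params.greeting) | SPLITLETTERS | flatten | CONVERTTOUPPER | view
}
```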
Here we use a pipe, which passes the output of one step as a channel to the next process without the need to apply .out to the process name.
9.3 Workflow definition
The workflow scope allows the definition of components that define the invocation of one or more processes or operators:
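A sketch of a named workflow and its invocation, assuming the processes are included from ./modules.nf, CONVERTTOUPPER emits its output as upper, and the default greeting shown is an assumption:

```nextflow
#!/usr/bin/env nextflow

params.greeting = 'Hello world!'   // assumed default

include { SPLITLETTERS; CONVERTTOUPPER } from './modules.nf'

// A named workflow is a reusable component; it does not run on its own.
workflow my_workflow {
    greeting_ch = Channel.of(params.greeting)
    SPLITLETTERS(greeting_ch)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())
    CONVERTTOUPPER.out.upper.view { it }
}

// The unnamed workflow is the entry point and invokes the named one.
workflow {
    my_workflow()
}
```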
For example, the snippet above defines a workflow named my_workflow that can be invoked from another workflow definition.
Note
Make sure that your modules.nf file is the one containing the emit on the CONVERTTOUPPER process.
Warning
A workflow component can access any variable or parameter defined in the outer scope. In the running example, we can also access params.greeting directly within the workflow definition.
9.3.1 Workflow inputs
A workflow component can declare one or more input channels using the take statement. For example:
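A sketch of my_workflow with a declared input. The input name greeting is illustrative, and the processes are assumed to be included from ./modules.nf.

```nextflow
workflow my_workflow {
    take:
    greeting   // input channel supplied by the caller

    main:
    SPLITLETTERS(greeting)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())
    CONVERTTOUPPER.out.upper.view { it }
}
```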
Note
When the take statement is used, the body of the workflow must be declared within the main block.
The input for the workflow can then be specified as an argument:
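For instance, the entry workflow could pass a channel like this, assuming my_workflow is defined in the same script with a single take input:

```nextflow
workflow {
    // The argument is bound to the channel declared under take:.
    my_workflow(Channel.of(params.greeting))
}
```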
9.3.2 Workflow outputs
A workflow can declare one or more output channels using the emit statement. For example:
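A sketch under the same assumptions as before (processes included from ./modules.nf, CONVERTTOUPPER emitting upper):

```nextflow
workflow my_workflow {
    take:
    greeting

    main:
    SPLITLETTERS(greeting)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())

    emit:
    CONVERTTOUPPER.out.upper   // exposed to the caller as my_workflow.out
}

workflow {
    my_workflow(Channel.of(params.greeting))
    my_workflow.out.view()
}
```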
As a result, we can use the my_workflow.out notation to access the outputs of my_workflow in the invoking workflow.
We can also declare named outputs within the emit block.
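A sketch of a named workflow output, using the name my_data and the same assumptions about modules.nf:

```nextflow
workflow my_workflow {
    take:
    greeting

    main:
    SPLITLETTERS(greeting)
    CONVERTTOUPPER(SPLITLETTERS.out.flatten())

    emit:
    my_data = CONVERTTOUPPER.out.upper   // named output
}

workflow {
    my_workflow(Channel.of(params.greeting))
    my_workflow.out.my_data.view()
}
```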
The result of the above snippet can then be accessed using my_workflow.out.my_data.
9.3.3 Calling named workflows
A main.nf script (called hello.nf in our example) can also contain multiple workflows, in which case we may want to call a specific workflow when running the code. For this we use the entrypoint option -entry <workflow_name>.
The following snippet has two named workflows (my_workflow_one and my_workflow_two):
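One way such a script could be laid out is sketched here. The aliases, the default greeting, and the channel handling are assumptions chosen for illustration.

```nextflow
#!/usr/bin/env nextflow

params.greeting = 'Hello world!'   // assumed default

include { SPLITLETTERS as SPLITLETTERS_one } from './modules.nf'
include { SPLITLETTERS as SPLITLETTERS_two } from './modules.nf'

include { CONVERTTOUPPER as CONVERTTOUPPER_one } from './modules.nf'
include { CONVERTTOUPPER as CONVERTTOUPPER_two } from './modules.nf'

workflow my_workflow_one {
    letters_ch1 = SPLITLETTERS_one(Channel.of(params.greeting))
    results_ch1 = CONVERTTOUPPER_one(letters_ch1.flatten())
    results_ch1.view { it }
}

workflow my_workflow_two {
    letters_ch2 = SPLITLETTERS_two(Channel.of(params.greeting))
    results_ch2 = CONVERTTOUPPER_two(letters_ch2.flatten())
    results_ch2.view { it }
}

// Default entry point: runs both named workflows.
workflow {
    my_workflow_one()
    my_workflow_two()
}
```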
You can choose which workflow to run by using the -entry flag, for example: nextflow run hello.nf -entry my_workflow_one
9.3.4 Parameter scopes
A module script can define one or more parameters or custom functions using the same syntax as with any other Nextflow script. Consider the minimal examples below:
Module script (./modules.nf):
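A minimal module script might look like this. The parameter values follow the Hello world! default mentioned in this section; the function name SAYHELLO is a hypothetical choice.

```nextflow
// Defaults that apply only if the including script has not
// already set these parameters.
params.foo = 'Hello'
params.bar = 'world!'

// Custom function (hypothetical name) that prints the two parameters.
def SAYHELLO() {
    println "$params.foo $params.bar"
}
```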
Main script (./hello.nf):
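A matching main script could be sketched as follows, again assuming the module exposes a function named SAYHELLO (a hypothetical name):

```nextflow
#!/usr/bin/env nextflow

// Defined before the include, so the module inherits these values.
params.foo = 'Hola'
params.bar = 'mundo!'

include { SAYHELLO } from './modules.nf'

workflow {
    SAYHELLO()
}
```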
Running hello.nf should print Hola mundo! instead of Hello world!. The module inherits the parameters already defined in the including script, and those values take precedence over the defaults declared in the module script.
Info
To avoid being ignored, workflow parameters should be defined at the beginning of the script before any include declarations.
The addParams option can be used to extend the module parameters without affecting the external scope. For example:
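A sketch of the main script using addParams. The value 'Olá' and the function name SAYHELLO are illustrative assumptions. The option is available in the Nextflow release shown in this section's console output (23.04.1), but later releases may mark it as deprecated, so check the documentation for your version.

```nextflow
#!/usr/bin/env nextflow

params.foo = 'Hola'
params.bar = 'mundo!'

// foo is changed for the module only; params.foo in this script
// keeps its original value.
include { SAYHELLO } from './modules.nf' addParams(foo: 'Olá')

workflow {
    SAYHELLO()
}
```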
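Executing the main script above should print the greeting with the foo value supplied through addParams, while bar keeps the value inherited from the main script.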
9.4 DSL2 migration notes
To view a summary of the changes introduced when Nextflow migrated from DSL1 to DSL2, please refer to the DSL2 migration notes in the official Nextflow documentation.