For this exercise you have to write a Bash script to be submitted to the compute nodes of the HPC system. The script should include the necessary directives to specify the required resources on the cluster (see below). It needs to download the MINI-AC program from GitHub, fetch the required datasets and run some tests on the cluster. These steps are explained in detail below.
MINI-What?
If you want to know more about what MINI-AC actually does, feel free to skim through the README on GitHub. However, you do not need to fully understand the purpose of MINI-AC for this exercise. All the steps required to run the tests are described below. Your task is to compose a script that executes these steps on the cluster.
Include the necessary directives in your script to request the following resources on the cluster:
Since the script will produce too much data for the quota on your home directory, first navigate to your VSC scratch folder. The path to the scratch folder can be retrieved from the $VSC_SCRATCH environment variable.
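Changing directory can fail silently in a script, for example when a variable is empty, and every later command then runs in the wrong place. The sketch below shows one possible way to guard against that. The helper name enter_dir and the temporary demo directory are assumptions made for illustration; on the cluster you would pass "$VSC_SCRATCH" instead.

```shell
# Hypothetical helper: change into the given directory, or report why that is not possible.
enter_dir() {
    local target="$1"
    if [ -z "$target" ] || [ ! -d "$target" ]; then
        echo "Directory '$target' is not available" >&2
        return 1
    fi
    cd "$target"
}

# Demo with a throwaway directory (assumption: stands in for the scratch folder).
# On the cluster this could read: enter_dir "$VSC_SCRATCH" || exit 1
demo_dir="$(mktemp -d)"
enter_dir "$demo_dir" && echo "Now in $PWD"
```

Stopping the script as soon as the directory change fails prevents later remove or download commands from touching the wrong folder.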
In case there already is a MINI-AC folder in your scratch folder, e.g. from previous attempts to run the script, remove it (and all of its contents).
Your script should download the MINI-AC program from GitHub: https://github.com/VIB-PSB/MINI-AC.
Find out which git command you can use to easily download an existing remote repository into a “local” copy in your HPC scratch folder. Afterwards, change into the MINI-AC folder you created in this step.
To be able to execute the tests, we need to download the Arabidopsis and maize test data with the following commands, executed from the top-level directory of the MINI-AC repository:
curl -k -o tests/data/zma_v4_chr1/zma_v4_genome_wide_motif_mappings_chr1.bed https://floppy.psb.ugent.be/index.php/s/NekMYztyxEnsQiY/download/zma_v4_genome_wide_motif_mappings_chr1.bed
curl -k -o tests/data/zma_v4_chr1/zma_v4_locus_based_motif_mappings_5kbup_1kbdown_chr1.bed https://floppy.psb.ugent.be/index.php/s/r2wQmFjPy79qSp7/download/zma_v4_locus_based_motif_mappings_5kbup_1kbdown_chr1.bed
curl -k -o data/ath/ath_genome_wide_motif_mappings.bed https://floppy.psb.ugent.be/index.php/s/iaZPwdrRGe3YDdK/download/ath_genome_wide_motif_mappings.bed
curl -k -o data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed https://floppy.psb.ugent.be/index.php/s/qcQ7KndzHaSpd9e/download/ath_locus_based_motif_mappings_5kbup_1kbdown.bed
The test configuration file tests/nextflow.config needs to be adjusted to run on the HPC system.
Execute the following command, again from the root of the MINI-AC repo:
sed -i -e "s@%TMP%@${TMPDIR}@g" tests/nextflow.config
What did we just do?
Can you figure out what the above command actually did? Have a look at the manual page of the sed command to see what the applied options are used for. We are doing an in-place substitution of text in the config file. You have probably mostly seen the pattern "s/.../.../g" for sed substitution commands, but the / delimiter in the substitution pattern can be replaced with almost any other character of your choice, e.g. "s@...@...@g". If you choose a delimiter that doesn’t occur in the substituted texts, you don’t have to worry about any escaping. As ${TMPDIR} contains a path, the usual delimiter / doesn’t seem like a very good choice here!
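To see the effect of the delimiter for yourself, you could try the substitution on a throwaway file first. The sketch below is only an illustration: the file content and the path /tmp/demo are made up and merely stand in for tests/nextflow.config and ${TMPDIR}. It assumes GNU sed, as found on Linux systems.

```shell
# Throwaway stand-in for tests/nextflow.config (the content is invented for this demo)
demo_config="$(mktemp)"
printf 'workDir = "%%TMP%%/work"\n' > "$demo_config"
demo_tmp="/tmp/demo"   # stand-in for ${TMPDIR}

# With / as delimiter, the slashes inside the path end the pattern too early,
# so sed rejects the command and exits with an error.
if sed -e "s/%TMP%/${demo_tmp}/g" "$demo_config" > /dev/null 2>&1; then
    slash_ok="yes"
else
    slash_ok="no"
fi
echo "slash delimiter accepted: $slash_ok"

# With @ as delimiter, the path can be inserted without any escaping.
sed -i -e "s@%TMP%@${demo_tmp}@g" "$demo_config"
result="$(cat "$demo_config")"
echo "$result"
```

The first attempt should be rejected by sed, while the second one rewrites the placeholder in the file itself.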
MINI-AC is built with Nextflow and contains some automated test cases designed with nf-test.
To be able to run the tests, you need to install nf-test in the root folder of the MINI-AC repository:
curl -fsSL https://code.askimed.com/install/nf-test | bash
To be able to run MINI-AC you need to load the Nextflow module:
module load Nextflow
Finally, your script should run the automated tests by executing the nf-test command, again from the root directory of the MINI-AC repository:
./nf-test test
Executing the tests may take a while (10 to 15 minutes) so DON’T try this on the HPC login nodes. Always submit your script to the actual compute nodes of the cluster to try it out.
If the tests ran successfully, they should produce output similar to:
🚀 nf-test 0.8.4
https://code.askimed.com/nf-test
(c) 2021 - 2024 Lukas Forer and Sebastian Schoenherr
Found 1 files in test directory.
Test Workflow MINIAC
Test [771dd7b3] 'maize_v4 genome_wide' PASSED (154.802s)
Test [9b6193ca] 'maize_v4 locus_based' PASSED (116.447s)
Test [5319fea9] 'arabidopsis genome_wide' PASSED (147.802s)
Test [d312140b] 'arabidopsis locus_based' PASSED (258.858s)
SUCCESS: Executed 4 tests in 677.985s
If the tests failed, some of the steps above were not executed correctly. Refine your script and resubmit to the cluster until the tests succeed!
ANSI Color codes
If you look at the test output in a text editor (e.g. Visual Studio Code) you may be surprised to see odd characters, as in [1mTest Workflow MINIAC[0m. This is because nf-test (like many other Unix commands) uses ANSI color codes to produce colorful output on the terminal. Why not try it out yourself: echo -e "\e[32mGreen text\e[0m".
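If you would rather read the output without the color codes, one option is to filter them out with sed. This is a minimal sketch that assumes only simple color and style sequences of the form ESC [ ... m. The sample string is built by hand from the example above, and the variable names are made up for illustration.

```shell
# The escape character (octal 033) that starts every ANSI sequence
esc="$(printf '\033')"

# Hand-made sample line, modelled on the bold heading in the test output
colored="${esc}[1mTest Workflow MINIAC${esc}[0m"

# Remove every sequence of the form ESC [ <digits and semicolons> m
plain="$(printf '%s\n' "$colored" | sed -e "s/${esc}\[[0-9;]*m//g")"
echo "$plain"
```

The same sed expression could be applied to a whole output file instead of a single line.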
After executing the tests, remove the entire MINI-AC folder from your scratch storage.
Code duplication
This is exactly the same action that was already performed as the first step of your script. Think about how you can avoid code duplication for this repeated task.
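In Bash, a repeated task is usually wrapped in a function that you define once and call wherever you need it. The sketch below shows the general idea on a throwaway directory. The function name remove_dir and the demo paths are assumptions for illustration; adapting it to the MINI-AC folder is left to you.

```shell
# Hypothetical helper: delete a directory tree if it exists.
remove_dir() {
    local target="$1"
    # Refuse an empty argument, so an unset variable never turns into a dangerous path
    [ -n "$target" ] || return 1
    rm -rf -- "$target"
}

# Demo on a throwaway directory
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/demo-folder/sub"
remove_dir "$demo_root/demo-folder"   # first call: the folder exists and is removed
remove_dir "$demo_root/demo-folder"   # second call: nothing left to remove, still succeeds
```

Because rm -rf does not complain about a missing target, the same call works both before and after the folder has been created.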
Also remove the folders .apptainer, .nextflow and .nf-test that have been generated in your home directory when executing the tests. The contents of these folders (especially .apptainer) might become quite large and exceed your home folder storage quota.
The submitted solution is not automatically evaluated on Dodona, as it requires specific interaction with the UGent HPC system. Be sure to test your solution carefully on the cluster. Feedback on the submission will be provided after the deadline.