MINI-AC: Motif-Informed Network Inference based on Accessible Chromatin

For this exercise you have to write a Bash script to be submitted to the compute nodes of the HPC system. The script should include the necessary directives to specify the required resources on the cluster (see below). It needs to download the MINI-AC program from GitHub, fetch the required datasets and run some tests on the cluster. These steps are explained in detail below.

MINI-What?

If you want to know more about what MINI-AC actually does, feel free to skim through the README on GitHub. For this exercise, however, you do not need to fully understand the purpose of MINI-AC. All the steps required to run the tests are described below. Your task is to compose a script that executes these steps on the cluster.

Required resources

Include the necessary directives in your script to request the following resources on the cluster:

Go to scratch folder

Since the script will produce too much data for the quota on your home directory, first navigate to your VSC scratch folder. The path to the scratch folder can be retrieved from the $VSC_SCRATCH environment variable.

Clean up previous runs

In case there already is a MINI-AC folder in your scratch folder, e.g. from previous attempts to run the script, remove it (and all of its contents).
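A minimal sketch of why this cleanup step is safe to run unconditionally. It assumes a temporary directory as a stand-in for $VSC_SCRATCH (the name scratch_dir is illustrative only), so it can be tried anywhere without touching real data:

```shell
# Assumption: a throwaway directory plays the role of $VSC_SCRATCH.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir" || exit 1

mkdir -p MINI-AC/work   # simulate leftovers from a previous attempt

rm -rf MINI-AC          # -r removes the folder and its contents, -f ignores a missing folder
rm -rf MINI-AC          # a second call still succeeds, so no existence check is needed
```

Because of the -f option, the command does not fail when the folder does not exist yet, which is the situation on the very first run.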

Download MINI-AC

Your script should download the MINI-AC program from GitHub: https://github.com/VIB-PSB/MINI-AC. Find out which git command you can use to easily download an existing remote repository into a “local” copy on your HPC scratch folder. Afterwards, go inside the MINI-AC folder you created in this step.

Fetch test data

To be able to execute the tests, we need to download the Arabidopsis and maize test data with the following commands, executed from the top-level directory of the MINI-AC repository:

curl -k -o tests/data/zma_v4_chr1/zma_v4_genome_wide_motif_mappings_chr1.bed https://floppy.psb.ugent.be/index.php/s/NekMYztyxEnsQiY/download/zma_v4_genome_wide_motif_mappings_chr1.bed
curl -k -o tests/data/zma_v4_chr1/zma_v4_locus_based_motif_mappings_5kbup_1kbdown_chr1.bed https://floppy.psb.ugent.be/index.php/s/r2wQmFjPy79qSp7/download/zma_v4_locus_based_motif_mappings_5kbup_1kbdown_chr1.bed
curl -k -o data/ath/ath_genome_wide_motif_mappings.bed https://floppy.psb.ugent.be/index.php/s/iaZPwdrRGe3YDdK/download/ath_genome_wide_motif_mappings.bed
curl -k -o data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed https://floppy.psb.ugent.be/index.php/s/qcQ7KndzHaSpd9e/download/ath_locus_based_motif_mappings_5kbup_1kbdown.bed

Prepare test config

The test configuration file tests/nextflow.config needs to be adjusted to run on the HPC system. Execute the following command, again from the root of the MINI-AC repo:

sed -i -e "s@%TMP%@${TMPDIR}@g" tests/nextflow.config

What did we just do?

Can you figure out what the above command actually did? Have a look at the manual pages of the sed command to see what the applied options are used for. We are doing an in-place substitution of text in the config file. You have probably mostly seen the pattern "s/.../.../g" for sed substitution commands, but the / delimiter in the substitution pattern can be replaced with any other character of your choice, e.g. "s@...@...@g". If you choose a delimiter that doesn’t occur in the substituted texts, you don’t have to worry about any escaping. As ${TMPDIR} contains a path, the usual delimiter / doesn’t seem like a very good choice here!
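The effect of the alternative delimiter can be tried out on a scratch file. The sketch below is not the real tests/nextflow.config: the workDir line, the variable demo_tmpdir and its value are made up for illustration, and the -i option is used as GNU sed (the variant found on Linux clusters) understands it:

```shell
# Hypothetical stand-ins: demo_tmpdir plays the role of $TMPDIR,
# config_file the role of tests/nextflow.config.
demo_tmpdir="/tmp/job_12345"
config_file="$(mktemp)"

# Write one line containing the %TMP% placeholder (%% prints a literal %).
printf 'workDir = "%%TMP%%/work"\n' > "$config_file"

# Same substitution as in the exercise: @ as delimiter, so the slashes
# inside the path need no escaping.
sed -i -e "s@%TMP%@${demo_tmpdir}@g" "$config_file"

cat "$config_file"
```

With / as the delimiter, the slashes of the path would end the replacement part too early and sed would reject the command.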

Install nf-test

MINI-AC is built with Nextflow and contains some automated test cases designed with nf-test. To be able to run the tests, you need to install nf-test in the root folder of the MINI-AC repository:

curl -fsSL https://code.askimed.com/install/nf-test | bash

Load dependencies

To be able to run MINI-AC you need to load the Nextflow module:

module load Nextflow

Execute tests

Finally, your script should run the automated tests by executing the nf-test command, again from the root directory of the MINI-AC repository:

./nf-test test

Executing the tests may take a while (10 to 15 minutes) so DON’T try this on the HPC login nodes. Always submit your script to the actual compute nodes of the cluster to try it out.

If the tests ran successfully, they should produce output similar to:

🚀 nf-test 0.8.4
https://code.askimed.com/nf-test
(c) 2021 - 2024 Lukas Forer and Sebastian Schoenherr

Found 1 files in test directory.

Test Workflow MINIAC

  Test [771dd7b3] 'maize_v4 genome_wide' PASSED (154.802s)
  Test [9b6193ca] 'maize_v4 locus_based' PASSED (116.447s)
  Test [5319fea9] 'arabidopsis genome_wide' PASSED (147.802s)
  Test [d312140b] 'arabidopsis locus_based' PASSED (258.858s)


SUCCESS: Executed 4 tests in 677.985s

If the tests failed, some of the steps above were not executed correctly. Refine your script and resubmit to the cluster until the tests succeed!

ANSI Color codes

If you look at the test output in a text editor (e.g. Visual Studio Code) you may be surprised to see weird characters like in Test Workflow MINIAC. This is because nf-test (like many other Unix commands) uses ANSI color codes to produce colorful output on the terminal. Why not try it out yourself: echo -e "\e[32mGreen text\e[0m".
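If the color codes get in the way when reading a saved output file, they can be filtered out. The following is only a sketch, not part of the exercise: the variable names are illustrative and the pattern covers only color sequences of the form ESC [ ... m, not every possible terminal escape sequence:

```shell
# Build a string wrapped in the same color codes as the echo example above.
esc="$(printf '\033')"                      # the ESC character (what \e stands for)
colored="${esc}[32mGreen text${esc}[0m"

# Strip sequences of the form ESC [ digits-and-semicolons m.
plain="$(printf '%s\n' "$colored" | sed "s/${esc}\[[0-9;]*m//g")"

printf '%s\n' "$plain"
```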

Clean up

After executing the tests, remove the entire MINI-AC folder from your scratch storage.

Code duplication

This is exactly the same action that was already performed as the first step of your script. Think about how you can avoid code duplication for this repeated task.

Also remove the folders .apptainer, .nextflow and .nf-test that have been generated in your home directory when executing the tests. The contents of these folders (especially .apptainer) might become quite large and exceed your home folder storage quota.
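One possible way to avoid repeating the cleanup command is a small shell function that is called both at the start and at the end of the script. The sketch below is just one option: the function name remove_miniac is made up, and temporary directories stand in for $VSC_SCRATCH and $HOME so that nothing real gets deleted when trying it out:

```shell
# Hypothetical helper: removes the MINI-AC folder below the given directory.
remove_miniac() {
    rm -rf -- "${1:?scratch directory required}/MINI-AC"
}

# Assumption: throwaway directories replace the real $VSC_SCRATCH and $HOME.
demo_scratch="$(mktemp -d)"
demo_home="$(mktemp -d)"

remove_miniac "$demo_scratch"   # first step: nothing to remove yet, still succeeds

# Simulate what a test run leaves behind.
mkdir -p "$demo_scratch/MINI-AC" \
         "$demo_home/.apptainer" "$demo_home/.nextflow" "$demo_home/.nf-test"

remove_miniac "$demo_scratch"   # final step: same call, no duplicated command
rm -rf -- "$demo_home/.apptainer" "$demo_home/.nextflow" "$demo_home/.nf-test"
```

In the real script the function would be called with "$VSC_SCRATCH", and the last command would point at the folders in "$HOME".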

Evaluation

The submitted solution is not automatically evaluated on Dodona, as it requires specific interaction with the UGent HPC system. Be sure to test your solution carefully on the cluster. Feedback on the submission will be provided after the deadline.