Q: Can a repository's nextflow_schema.json support multiple input file mimetypes?
No. As of April 2022, it is not possible to configure an input field to support different mime types (e.g. a text/csv file during one execution, and a text/tab-separated-values file in a subsequent run).
Q: Why are my --outdir artefacts not available when executing runs in a cloud environment?
As of April 2022, Nextflow resolves relative paths against the current working directory. In a classic grid HPC, this normally corresponds to a subdirectory of the user's $HOME directory. However, in a cloud execution environment, the path will be resolved relative to the container file system. This means files will be lost when the container is terminated. See here for more details.
Tower users can avoid this problem by specifying the following configuration in the Advanced options > Nextflow config file textbox:

params.outdir = workDir + '/results'

This will ensure the output files are written to your stateful storage rather than ephemeral container storage.
Q: Can Nextflow be configured to ignore a Singularity cache?
Yes. To ignore the Singularity cache, add the following configuration item to your workflow: process.container = 'file:///some/singularity/image.sif'.
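For context, a minimal sketch of where this setting lives, assuming Singularity is your container engine (the .sif path is the placeholder from the answer above, not a real image):

```groovy
// nextflow.config — sketch; the image path below is a placeholder
singularity.enabled = true
process.container   = 'file:///some/singularity/image.sif'
```

Pointing process.container at a local file:// URI makes Nextflow use that image directly rather than pulling a remote image into the Singularity cache.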
Q: Why does Nextflow fail with a WARN: Cannot read project manifest ... path=nextflow.config error message?
This error can occur when executing a pipeline whose source git repository's default branch does not contain main.nf and nextflow.config files, regardless of whether the invoked pipeline uses a non-default revision/branch (e.g. dev).
As of May 16, 2022, there is no solution for this problem other than to create blank main.nf and nextflow.config files in the default branch. This allows the pipeline to run using the content of the main.nf and nextflow.config in your target revision.
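The workaround can be sketched with plain git commands. This is illustrative only: the commit message is an assumption, and in practice you would run the touch/add/commit steps in a clone of your pipeline repository's default branch and then push; here it is demonstrated in a throwaway local repository:

```shell
set -e
# Throwaway repository to demonstrate the workaround; in a real
# pipeline repo you would skip the init and push at the end.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Blank placeholder files are sufficient for Nextflow to read the manifest.
touch main.nf nextflow.config
git add main.nf nextflow.config
git -c user.email=you@example.com -c user.name=you \
    commit -q -m "Add placeholder manifest files to default branch"
git ls-files
```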
Q: Is it possible to maintain different Nextflow configuration files for different environments?
Yes. The main nextflow.config file will always be imported by default. Instead of managing multiple nextflow.config files (each customized for an environment), you can create a separate config file per environment and import each one as its own profile in the main nextflow.config.
Example:
// nextflow.config
<truncated>
profiles {
    test { includeConfig 'conf/test.config' }
    prod { includeConfig 'conf/prod.config' }
    uat  { includeConfig 'conf/uat.config' }
}
<truncated>
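Each included file then carries only the environment-specific settings. A sketch of what conf/test.config might contain (the parameter names and values here are illustrative, not taken from the article):

```groovy
// conf/test.config — illustrative environment-specific settings
params.max_cpus   = 2
params.max_memory = '6.GB'
process.executor  = 'local'
```

At launch time the environment is selected with the -profile option, e.g. nextflow run . -profile test.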
Q: Is there a file size limit for BAM files uploaded to the S3 bucket?
You may encounter this error in your log file: WARN: Failed to publish file: s3://[bucket-name]
AWS limits the size of objects that can be uploaded to S3 using the multipart upload feature. Refer to this documentation for more information. In this specific instance, the upload is hitting the maximum number of parts per upload: S3 allows at most 10,000 parts per multipart upload, so the largest object that can be published is 10,000 times the configured upload chunk size.
The following configuration is suggested to work with the above AWS limitation:
- Head Job CPUs = 16
- Head Job Memory = 60000
- Pre-run script = export NXF_OPTS="-Xms20G -Xmx40G"
- Update the nextflow.config to increase the chunk size and reduce the number of concurrent transfers:
aws {
    batch {
        maxParallelTransfers = 5
        maxTransferAttempts = 3
        delayBetweenAttempts = 30
    }
    client {
        uploadChunkSize = '200MB'
        maxConnections = 10
        maxErrorRetry = 10
        uploadMaxThreads = 10
        uploadMaxAttempts = 10
        uploadRetrySleep = '10 sec'
    }
}
Q: Why is Nextflow forbidden to retrieve a params file from Nextflow Tower?
Ephemeral endpoints can only be consumed once. Nextflow versions older than 22.04 may try to call the same endpoint more than once, resulting in an error similar to the following: Cannot parse params file: /ephemeral/example.json - Cause: Server returned HTTP response code: 403 for URL: https://api.tower.nf/ephemeral/example.json.
To resolve this problem, please upgrade Nextflow to version 22.04.x or later.
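If you cannot change the Nextflow version bundled with your compute environment directly, one option is to pin the version via the NXF_VER environment variable in the Pre-run script. The exact patch release below is an assumption; any 22.04.x or later version works:

```shell
# Ask the Nextflow launcher to fetch and run this version (sketch).
export NXF_VER=22.04.5
echo "Pinned Nextflow version: $NXF_VER"
```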
Q: How can I prevent Nextflow from uploading intermediate files from local storage to my S3 work directory?
Nextflow will only unstage files/folders that have been explicitly defined as process outputs. If your workflow has processes that generate folder-type outputs, please ensure that the process also purges any intermediate files that reside within. Failure to do so will result in the intermediate files being copied as part of the task unstaging process, resulting in additional storage costs and lengthened pipeline execution times.
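As an illustration, the process below (the process name, tool, and file names are all hypothetical) emits a folder-type output but deletes its scratch subdirectory before the task completes, so only the final result is unstaged to the S3 work directory:

```groovy
process ASSEMBLE {
    output:
    path 'results'   // folder-type output: everything left inside is uploaded

    script:
    """
    mkdir -p results/tmp
    my_assembler --reads input.fq --out results/assembly.fa --scratch results/tmp
    # Purge intermediates so they are not unstaged to S3
    rm -rf results/tmp
    """
}
```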
Q: Why do some values specified in my git repository's nextflow.config change when the pipeline is launched via Tower?
You may notice that some values specified in your pipeline repository's nextflow.config have changed when the pipeline is invoked via Tower. This occurs because Tower is configured with a set of default values that are superimposed on the pipeline configuration (with Tower default values superseding nextflow.config values).
Example: The following code block is specified in your nextflow.config:
aws {
    region = 'us-east-1'
    client {
        uploadChunkSize = 209715200 // 200 MB
    }
    ...
}
When the job instantiates on the AWS Batch Compute Environment, you will see that the uploadChunkSize has changed:
aws {
    region = 'us-east-1'
    client {
        uploadChunkSize = 10485760 // 10 MB
    }
    ...
}
This change occurred because Tower superimposes its 10 MB default value rather than using the value specified in the nextflow.config file.
To force the Tower-invoked job to use your desired value, please add the configuration setting in the Tower Workspace Launch screen's Advanced options > Nextflow config file textbox. In the case of our example above, you would simply need to add:
aws.client.uploadChunkSize = 209715200 // 200 MB
Nextflow configuration values that are affected by this behaviour include:
- aws.client.uploadChunkSize
- aws.client.storageEncryption