
mkdirs() makes local directories when base path is S3 #1185

@olgabot

Bug report

Expected behavior and actual behavior

Hello, I'm running the nf-core/rnaseq pipeline with the AWS Batch backend. Everything works fine except for one small issue: the pipeline_info/pipeline_report.{html,txt} files are written to a local directory instead of to S3.

Steps to reproduce the problem

Here is the command used:

 Wed 12 Jun - 09:14  ~/code/scSLAM_test 
  $ nextflow run nf-core/rnaseq \
                    --reads "s3://darmanis-group/shayanhoss/scSLAM/190610_M05295_0286_000000000-G43TH/fastqs/*{R1,R2}*.fastq.gz" \
                    --genome GRCh38 \
                    -profile czbiohub_aws \
                    --saveReference \
                    --saveTrimmed \
                    --saveAlignedIntermediates \
                    --outdir "s3://darmanis-group/shayanhoss/scSLAM/190610_M05295_0286_000000000-G43TH/results_nextflow" \
                    -resume \
                    -work-dir "s3://darmanis-group/shayanhoss/scSLAM/190610_M05295_0286_000000000-G43TH/nextflow-intermediates/"
                    --custom_config_base ~/code/nf-core/configs

Program output

Here is the output:

#   --custom_config_base https://github.com/czbiohub/configs --custom_config_version 'olgabot/czb-ignore-igenomes'
N E X T F L O W  ~  version 19.04.0
Launching `nf-core/rnaseq` [determined_turing] - revision: 37f260d360 [master]
WARN: It appears you have never run this project before -- Option `-resume` is ignored
Pipeline Release  : master
Run Name          : determined_turing
Reads             : s3://darmanis-group/shayanhoss/scSLAM/190610_M05295_0286_000000000-G43TH/fastqs/*{R1,R2}*.fastq.gz
Data Type         : Paired-End
Genome            : GRCh38
Strandedness      : None
Trimming          : 5'R1: 0 / 5'R2: 0 / 3'R1: 0 / 3'R2: 0
Aligner           : STAR
Fasta Ref         : s3://czbiohub-reference/gencode/human/v29/GRCh38.p12.genome.fa
GTF Annotation    : s3://czbiohub-reference/gencode/human/v29/gencode.vM19.annotation.gtf
Save prefs        : Ref Genome: Yes / Trimmed FastQ: Yes / Alignment intermediates: Yes
Max Resources     : 1.9 TB memory, 96 cpus, 10d time per job
Container         : docker - nfcore/rnaseq:1.3
Output dir        : s3://darmanis-group/shayanhoss/scSLAM/190610_M05295_0286_000000000-G43TH/results_nextflow
Launch dir        : /Users/olgabot/code/scSLAM_test
Working dir       : /darmanis-group/shayanhoss/scSLAM/190610_M05295_0286_000000000-G43TH/nextflow-intermediates
Script dir        : /Users/olgabot/.nextflow/assets/nf-core/rnaseq
User              : olgabot
Config Profile    : czbiohub_aws
Config Description: Chan Zuckerberg Biohub AWS Batch profile provided by nf-core/configs.
Config Contact    : Olga Botvinnik (@olgabot)
Config URL        : https://www.czbiohub.org/
Uploading local `bin` scripts folder to s3://darmanis-group/shayanhoss/scSLAM/190610_M05295_0286_000000000-G43TH/nextflow-intermediates/tmp/27/23a1934c5e988017a53a2eea25dd9d/bin
^C
^COne more CTRL+C to force exit
^CAdieu
zsh: command not found: --custom_config_base

Inspecting the local launch directory (~/code/scSLAM_test), these are the files we see:

(base) ll
Permissions Size User    Date Modified Name
drwxr-xr-x     - olgabot 12 Jun  9:15  .nextflow
.rw-r--r--   24k olgabot 12 Jun  9:15  .nextflow.log
.rw-r--r--   24k olgabot 12 Jun  9:13  .nextflow.log.1
.rw-r--r--  8.9k olgabot 12 Jun  9:12  .nextflow.log.2
.rw-r--r--   68k olgabot 12 Jun  9:08  .nextflow.log.3
.rw-r--r--  320k olgabot 11 Jun 20:48  .nextflow.log.4
.rw-r--r--  2.1k olgabot 11 Jun 20:45  .nextflow.log.5
.rw-r--r--  171k olgabot 11 Jun 16:48  .nextflow.log.6
.rw-r--r--   101 olgabot 12 Jun  9:15  execution_trace.txt
.rw-r--r--   101 olgabot 12 Jun  9:13  execution_trace.txt.1
.rw-r--r--   101 olgabot 12 Jun  9:12  execution_trace.txt.2
.rw-r--r--   101 olgabot 12 Jun  9:07  execution_trace.txt.3
.rw-r--r--   46k olgabot 11 Jun 20:48  execution_trace.txt.4
.rw-r--r--   101 olgabot 11 Jun 16:47  execution_trace.txt.5
drwxr-xr-x     - olgabot 11 Jun 16:48  s3:

The s3: folder contains the full output directory path:

 Wed 12 Jun - 09:27  ~/code/scSLAM_test 
  $ ll s3:/darmanis-group/shayanhoss/scSLAM/190610_M05295_0286_000000000-G43TH/results_nextflow/pipeline_info
Permissions Size User    Date Modified Name
.rw-r--r--@  14k olgabot 12 Jun  9:25  pipeline_report.html
.rw-r--r--  3.3k olgabot 12 Jun  9:25  pipeline_report.txt

Environment

  • Nextflow version: 19.04.0 (from the program output above)
  • Java version: [?]
  • Operating system: [macOS, Linux, etc]

Additional context

Tracking down where pipeline_report.{html,txt} are created, I found these lines which seem to be writing those files:

    // Write summary e-mail HTML to a file
    def output_d = new File( "${params.outdir}/pipeline_info/" )
    if( !output_d.exists() ) {
      output_d.mkdirs()
    }
    def output_hf = new File( output_d, "pipeline_report.html" )
    output_hf.withWriter { w -> w << email_html }
    def output_tf = new File( output_d, "pipeline_report.txt" )
    output_tf.withWriter { w -> w << email_txt }

It seems to me that output_d.mkdirs() is somehow a local operation instead of a cloud operation. Am I interpreting that correctly?
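
To illustrate why this would be a local operation: `java.io.File` has no notion of URI schemes, so a string such as `s3://bucket/...` is treated as an ordinary relative path whose first component is a directory named `s3:`. The sketch below is plain Java rather than the pipeline's Groovy, and the class name, method names, and bucket name are hypothetical. It assumes a Unix-like file system (Linux or macOS) and runs `mkdirs()` inside a scratch directory so that nothing is created in the working directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical demo class; not part of Nextflow or nf-core.
final class LocalS3PathDemo {

    // java.io.File treats the string as a plain file-system path.
    // On Unix-like systems the duplicate slash after "s3:" is collapsed
    // and the trailing slash is dropped.
    static String normalizedPath(String uriLikeString) {
        return new File(uriLikeString).getPath();
    }

    // A path that does not start with "/" is relative, so it resolves
    // against the current working directory (the launch directory in the report).
    static boolean isAbsolute(String uriLikeString) {
        return new File(uriLikeString).isAbsolute();
    }

    // Runs mkdirs() under a temporary scratch directory and reports whether
    // a local directory literally named "s3:" was created there.
    static boolean createsLocalS3Dir(String uriLikeString) {
        try {
            File scratch = Files.createTempDirectory("mkdirs-demo").toFile();
            File outputDir = new File(scratch, uriLikeString);
            outputDir.mkdirs();
            return new File(scratch, "s3:").isDirectory();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical bucket and prefix, standing in for params.outdir.
        String outdir = "s3://example-bucket/results_nextflow/pipeline_info/";
        System.out.println(normalizedPath(outdir));
        System.out.println(isAbsolute(outdir));
        System.out.println(createsLocalS3Dir(outdir));
    }
}
```

On Linux or macOS this should print `s3:/example-bucket/results_nextflow/pipeline_info`, then `false`, then `true`, which matches the local `s3:` folder shown in the listing above. In a Nextflow script, a likely way around this is to build the path with the `file()` helper, which returns a `java.nio.file.Path` that understands remote schemes, instead of constructing a `java.io.File`; whether that is the right change for this pipeline is an assumption, not something confirmed in this report.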
