Skip to content

created a new build_docker_remote.sh script for building the docker image remotely with google cloud build#7951

Merged
jamesemery merged 3 commits intomasterfrom
je_addRemoteDockerBuildScript
Jul 25, 2022
Merged

created a new build_docker_remote.sh script for building the docker image remotely with google cloud build#7951
jamesemery merged 3 commits intomasterfrom
je_addRemoteDockerBuildScript

Conversation

@jamesemery
Copy link

Fixes #7949

@codecov
Copy link

codecov bot commented Jul 19, 2022

Codecov Report

Merging #7951 (3bcf33a) into master (ace7821) will decrease coverage by 0.025%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #7951       +/-   ##
===============================================
- Coverage     87.089%   87.064%   -0.025%     
- Complexity     36993     37014       +21     
===============================================
  Files           2217      2218        +1     
  Lines         173847    173760       -87     
  Branches       18791     18769       -22     
===============================================
- Hits          151402    151282      -120     
- Misses         15816     15892       +76     
+ Partials        6629      6586       -43     
Impacted Files Coverage Δ
...er/tools/spark/sv/evidence/BreakpointEvidence.java 66.822% <0.000%> (-12.383%) ⬇️
...ute/hellbender/utils/reference/ReferenceUtils.java 73.529% <0.000%> (-5.418%) ⬇️
...on/FindBreakpointEvidenceSparkIntegrationTest.java 96.296% <0.000%> (-3.704%) ⬇️
...rg/broadinstitute/hellbender/utils/SVInterval.java 90.000% <0.000%> (-2.857%) ⬇️
...ute/hellbender/tools/spark/utils/IntHistogram.java 86.335% <0.000%> (-2.484%) ⬇️
.../sv/integration/SVIntegrationTestDataProvider.java 93.333% <0.000%> (-0.784%) ⬇️
...nder/utils/runtime/StreamingProcessController.java 66.818% <0.000%> (-0.455%) ⬇️
...tructuralVariationDiscoveryArgumentCollection.java 94.595% <0.000%> (-0.142%) ⬇️
...nstitute/hellbender/utils/haplotype/Haplotype.java 90.244% <0.000%> (ø)
...itute/hellbender/engine/spark/GATKRegistrator.java 100.000% <0.000%> (ø)
... and 18 more


## generate the tag if it wasn't explicitly specified
if [ -z "$DOCKER_IMAGE_TAG" ]; then
DOCKER_IMAGE_TAG=${GCR_REPO}:$(whoami)-${GITHUB_TAG}-${GIT_HASH_FOR_TAG}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a Github tag is used this ends up with a name ending in tag-tag. For example: us.gcr.io/broad-dsde-methods/broad-gatk-snapshots/gatk-remote-builds:mshand-4.2.6.1-4.2.6.1. I don't think that's a problem, but just fyi

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I decided against going down the rabbit hole of trying too hard to make a reasonable image name.... If you pull a specific commit then there isn't necessarily a branch associated and I don't think this is super high priority to fix given that most ofthe time people should give their own -T tags

``scripts/docker/gatkbase/build_docker_base.sh`` is a script to create the gatkbase docker image.
``scripts/docker/gatkbase/build_docker_base.sh`` is a script to create the gatkbase docker image.
``build_docker.sh`` is a script to create the full gatk4 docker image.
``build_docker_remote.sh`` is a script to create the full gatk4 docker image using google cloud build remotely. NOTE: this requires the user first specify their project with the command `gcloud config set project <PROJECT>` to a project that has access to google cloud build.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth mentioning somewhere that one potential motivation for using the remote version is if you have M1 hardware?

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery Back to you with my comments


REPO=broadinstitute
PROJECT=gatk
REPO_PRJ=${REPO}/${PROJECT}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete REPO_PRJ -- it is a reference to the dockerhub repo, which we can't push to in this script.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False, its used farther down git clone https://github.com/${REPO}/${PROJECT}.git when we do the git clone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not -- that's $REPO and $PROJECT. $REPO_PRJ is unused, and in the original script it's used for the dockerhub repo.


SUBMIT_COMMAND="gcloud builds submit --tag ${DOCKER_IMAGE_TAG} --timeout=24h --machine-type n1_highcpu_8"
echo "running the following gcloud command: ${SUBMIT_COMMAND}"
echo -n "" >> .gcloudignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this line do? Can you add a comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its from the Carrot version of this code. Without an explicit gcloudignore set it blacklists the .git directory which causes our script to crash since we need to download the lfs files from the intermediate docker image. I've added a comment


cd ${ORIGINAL_WORKING_DIRECTORY}
if [ -n "$STAGING_DIR" ] ; then
rm -Rf ${STAGING_DIR}/${STAGING_CLONE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the clone actually may get put into ${STAGING_DIR}/${STAGING_DIR}/${STAGING_CLONE_DIR} rather than ${STAGING_DIR}/${STAGING_CLONE_DIR}. Can you double-check this and make sure the script is cleaning up properly?

export GITHUB_TAG=4.2.6.1

# REPLACE with the directory where you'd like to clone the repo
export STAGING_DIR=~/tmp/tmp_build_docker_image/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't actually have to be exported, since you're passing them into the script directly below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm simply copying the syntax of the other dozen of these things in this folder

@jamesemery
Copy link
Author

@droazen i responded to your comments can this be merged?

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery Still a few issues here...

mkdir -p ${STAGING_DIR}
cd ${STAGING_DIR}
set +e
rm -Rf ${STAGING_DIR}/${STAGING_CLONE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cd into ${STAGING_DIR}, and then rm -Rf ${STAGING_DIR}/${STAGING_CLONE_DIR}. This still seems wrong...

build_docker.sh Outdated
@@ -93,7 +93,7 @@ if [ -n "$STAGING_DIR" ]; then
set +e
rm -Rf ${STAGING_DIR}/${STAGING_CLONE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here -- we've just cd'd into ${STAGING_DIR}, so why are we deleting ${STAGING_DIR}/${STAGING_CLONE_DIR}?

rm -Rf ${STAGING_DIR}/${STAGING_CLONE_DIR}
set -e
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/${REPO}/${PROJECT}.git ${STAGING_CLONE_DIR}
cd ${STAGING_DIR}/${STAGING_CLONE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this cd command even succeed? We already cd'd into ${STAGING_DIR}, and now we are cd'ing into ${STAGING_DIR}/${STAGING_CLONE_DIR}?


echo "Building image with the tag ${DOCKER_IMAGE_TAG}..."

SUBMIT_COMMAND="gcloud builds submit --tag ${DOCKER_IMAGE_TAG} --timeout=24h --machine-type n1_highcpu_8"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will an existing image with the same tag get overwritten by this command?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. but thats effectively impossible unless the user specifies the image themselves (in which case its their fault and problem) since we include the github tag/hash in the image name

@jamesemery
Copy link
Author

@droazen one more round

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Merge after doing a test run of the latest version with both relative and absolute paths for the staging dir.

@jamesemery
Copy link
Author

@droazen just checked and it works with both relative and absolute paths. I'm going to hit merge.

@jamesemery jamesemery merged commit 8c348aa into master Jul 25, 2022
@jamesemery jamesemery deleted the je_addRemoteDockerBuildScript branch July 25, 2022 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create a Gcloud build script to build docker images.

3 participants