We regularly run analyses that involve tens of thousands of genomes. Because such workloads are bursty, we use on-demand cloud resources, usually on Google Cloud or Microsoft Azure.
Our main workflow scheduling system is Hail Batch, for which we have set up a local deployment. It integrates directly with Hail Query, a set of scalable APIs designed specifically for genomics. For workflows like GATK-SV, we rely on Cromwell / Terra to run WDL.
About a dozen collaborating groups in Australia use our local deployment of seqr for rare disease analysis. Internally, we continue the development of Broad’s loss-of-function curation portal.
Our public data browsers typically use Django and React for the web application, with Elasticsearch or Hail as the data backend.
All our sample metadata is managed centrally with an extensive set of APIs, which allows us to automate our workflows and ingest new data regularly without incurring toil.
We like to define our infrastructure as code, through either Terraform or Pulumi, which helps us bring up consistent dev / prod namespaces across multiple clouds.
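As a rough illustration of why this gives consistent namespaces, the sketch below derives every environment from a single definition. It is plain Python rather than Terraform or Pulumi code, and the cloud labels, the `plan_resources` helper, and the bucket naming scheme are all assumptions made up for the example.

```python
from itertools import product

# Hypothetical labels; real deployments would use provider-specific settings.
CLOUDS = ("gcp", "azure")
NAMESPACES = ("dev", "prod")


def plan_resources(dataset, clouds=CLOUDS, namespaces=NAMESPACES):
    """Return one resource description per (cloud, namespace) pair.

    Every entry is generated from the same template, so dev and prod
    cannot drift apart by hand-editing one of them.
    """
    plan = []
    for cloud, namespace in product(clouds, namespaces):
        plan.append(
            {
                "cloud": cloud,
                "namespace": namespace,
                # Hypothetical naming scheme, for illustration only.
                "bucket": f"{dataset}-{namespace}-{cloud}",
            }
        )
    return plan


if __name__ == "__main__":
    for resource in plan_resources("example-dataset"):
        print(resource)
```

In an actual infrastructure-as-code tool, the same loop would declare real cloud resources instead of dictionaries.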
All our code is available on GitHub. We control production data access on a dataset level and enforce code reviews through an analysis runner wrapper, while allowing quick prototyping and exploration on subsets for testing.
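The access policy described above can be sketched roughly as follows. This is a minimal illustration, not the actual analysis runner: the `Submission` fields, the `"test"` and `"full"` access levels, and the membership table are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    user: str
    dataset: str
    access_level: str      # hypothetical levels: "test" (subset) or "full" (production)
    commit_reviewed: bool  # True if the submitted commit has passed code review


# Hypothetical dataset-level membership table.
DATASET_MEMBERS = {
    "example-dataset": {"alice", "bob"},
}


def is_allowed(submission, members=DATASET_MEMBERS):
    """Decide whether a submission may run.

    Access is granted per dataset. Test subsets are open to any member
    for quick prototyping; production data additionally requires that
    the code has been reviewed.
    """
    if submission.user not in members.get(submission.dataset, set()):
        return False
    if submission.access_level == "test":
        return True
    if submission.access_level == "full":
        return submission.commit_reviewed
    return False  # unknown access levels are rejected
```

Under these assumptions, a dataset member can explore a test subset with unreviewed code, but the same code is rejected against production data until it has been reviewed.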
The Centre for Population Genomics values diversity in our team and our work. We believe that including all human diversity in genomic research will empower medical care that benefits everyone.
We pay our respect to all Aboriginal and Torres Strait Islander cultures and to their Elders past and present. We gratefully accept the invitation in the Uluru Statement from the Heart “to walk with us in a movement of the Australian people for a better future”.