CrossConnects Bioinformatics


PRESENTATIONS from the workshop can be found HERE.

LOCATION: Lawrence Berkeley National Laboratory, Building 15, Room 253


Tuesday, April 12

Precision Medicine

Program Committee Chair: Bill Barnett, Indiana Clinical and Translational Sciences Institute and Regenstrief Institute


Continental Breakfast

Includes: Coffee/Tea Service, Assorted Baked Goods, Fruit Bowl


Welcome and Kickoff

(Inder Monga, ESnet CTO and Interim Director)


Keynote: The Promise of Precision Medicine

(Bill Barnett, Indiana Clinical and Translational Sciences Institute and Regenstrief Institute)

Precision Medicine, the science and practice of medicine through a more precise definition of the molecular, behavioral and environmental factors that contribute to an individual’s health and disease, is expected to transform biomedical research, health care innovations and the delivery of health interventions in the future. Hundreds of millions of dollars are being invested in pursuing precision medicine research, with much more to follow. Biomedical research programs are moving aggressively to integrate multiple lines of evidence to identify the right treatments, and new treatments, that can improve our health. All precision medicine approaches are grappling with the central issue of collecting, analyzing, integrating, understanding, and ultimately translating large and distributed volumes of sensitive, heterogeneous data into a reliable system for supporting clinical decisions. This presentation will provide an overview of precision medicine approaches and challenges, and lay out the implications for informatics research and practice.


Morning Break and Discussions


Morning Talks

Data Driven Translation of Research to Enable Precision Medicine

(Sean Mooney, PhD; Chief Research Information Officer, UW Medicine; Professor, Department of Biomedical Informatics and Medical Education, University of Washington)

The increased use of electronic and personal health records, coupled with clinical genome sequencing efforts, is creating many opportunities for personalized medicine. To this end, the US federal government is addressing these challenges with both the Cancer Moonshot and Precision Medicine Initiatives. The University of Washington is laying the groundwork to build the infrastructure to support research on personalized approaches, and is beginning to see the early successes of these efforts. There are many opportunities in precision and personalized medicine, from data management and big data science to engineering new approaches that support implementation of innovative research projects. In this presentation, Sean Mooney will discuss an IT perspective on our approaches to research on electronic medical record data and will describe efforts to translate innovative approaches, such as pharmacogenomics, to the point of care through the EMR. Enabling the next generation of research projects places many requirements on the development of data standards, data sharing and integration across sites. Dr. Mooney will also describe our collaborative research uses of clinical data and our future plans.

Genomic-based precision medicine at scale

(Robert Freimuth, Mayo Clinic)

Genomic medicine, one aspect of precision medicine, has grown rapidly due to advances in sequencing technology that have made it possible to collect ever-increasing amounts of patient genomic data at steadily decreasing cost. Data collection is only the first step in using genomic data clinically, however, and significant challenges remain in the analysis, interpretation, and management of genomic data and knowledge. In particular, current systems are limited in their ability to scale due to (semi-)manual processes that cannot keep up with either the generation of data or advances in understanding. Robust solutions to these challenges are required to enable the integration of genomic information and the sharing of knowledge among clinicians and investigators so that information can be applied at the point of care. In this presentation, Dr. Freimuth will present examples of these challenges as well as emerging efforts to develop more computable renderings of clinical genomic knowledge, which are necessary to realize genomic-based precision medicine at scale.

Precision Medicine and Exposomics in the Military

(Chris Bradburne, Johns Hopkins University, Applied Physics Lab)

Precision medicine has the potential to grow in utility when combined with longitudinal environmental exposure data. In the military, translating these advances to individualized medicine for service members raises unique research, informatics, ethical, and policy issues. This talk will discuss the advent and advancement of precision medicine and exposomics in the military, including progress, practice, and upcoming needs. I will discuss our work in this area on laboratory and data standardization, policy efforts, clinical research and implementation, and the landscape for precision military medicine informed by environmental data.




Panel and Discussion

(Moderator: Inder Monga, ESnet CTO and Interim Director)

Theme: Common data challenges in precision medicine. What commonalities exist in PM research? What one thing could dramatically improve PM advances? What can infrastructure providers (networks, HPC centers, etc.) do to assist in this space? What best practices can be applied from other science disciplines? What is the role of cloud computing?


Lunch (provided)



Afternoon Talks

Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Supercomputers and Supernetworks

(Larry Smarr, Calit2)

To truly understand the state of the human body in health or disease, we now realize that we must consider a much more complex system than medical science considered heretofore. This is because we now know that the human body is host to 100 trillion microorganisms, ten times the number of DNA-bearing cells in the human body, and these microbes contain 300 times the number of genes that our human DNA does. The microbial component of our “superorganism” is comprised of hundreds of species with immense biodiversity. Exponential decrease in the cost of genetic sequencing and supercomputing has enabled scientists to finally "read out" the nature of the changes in the microbial ecology in people in health and disease. We use the fiber optic network of the Pacific Research Platform to rapidly move these large datasets. To put a more personal face on the “patient of the future,” I have been collecting massive amounts of data from my own body over the last five years, which reveals detailed examples of the episodic excursions of my coupled immune-microbial system. As similar techniques become more widely applied, we can look forward to revolutionary changes in medical practice over the next decade.

Securing and Auditing HPC Workloads for Human Subjects’ Protection and HIPAA Compliance

(Joe Hesse, UCSF)

This talk will cover work underway at UCSF to develop reusable and agile methods for systems administration and service orchestration that address the security, policy, auditing, and data isolation requirements associated with human subjects protection (IRB) and health care privacy (HIPAA) for health sciences computation. The initial goal of this work is to implement a regulatory-compliant layer of computational services within the shared HPC cluster environment at UCSF. Secondary goals include 1) sharing these patterns and templates with the broader community as open-source software, and 2) exploring use of these methods to allow execution of compliant workloads on the largest scale computing platforms such as those at the national labs. A particular focus of this work is the desire to apply these templates and patterns in a just-in-time, and stepwise, manner within larger shared computational / high performance computing (HPC) environments such that the overhead and resource costs associated with these methods are applied solely to compliant workloads / jobs. This approach aims to provide an alternative to the “enclave” model of regulatory compliance.

It Takes Big Data to See Small Things

(Peter Denes, Kristofer Bouchard, National Center for Electron Microscopy)

Microscopes have improved in two significant ways over the past few centuries: better optics and probes (from visible light to X-rays to electrons) make it possible to now see single atoms and to take snapshots on ultrafast time scales. Similarly, the eyeball, brain, pencil and paper have been replaced by advanced detectors, high performance computing, and significant archival storage. The ability to obtain biological structures through cryo-electron microscopy, as well as advanced optical and X-ray microscopies, has been revolutionized because we can now make movies, rather than simply take pictures – but this means coping with (and analyzing) many gigabytes per second of data. These same concepts can now be applied to neuroscience, with all of the concomitant challenges.

This talk will review certain data challenges in current bio-imaging, and describe new data challenges in neuroscience. These include: data formats and description languages for a broad and diverse community; web-based portals for community access to centralized datasets and computational resources; data analytics and visualization codes that are able to ingest diverse data sets and be launched on HPC from web portals; and real-time (<10ms) analysis of incoming streams of data for closed-loop analysis. Future microscopies are poised to produce ever larger data sets, and the challenge is whether we can arrive at a model where all information is extracted on the fly, or whether we need to store everything: use it, lose it or move it?


Afternoon Break


Panel and Discussion

(Moderator: Brooklin Gore, Infrastructure Lead and Science Engagement Contributor, ESnet)

Theme: Infrastructure enabling precision medicine. What infrastructure capabilities and best practices (networks, HPC centers, data curation, hosting and transfer) can be more broadly applied to advance PM? What can we learn from other disciplines? What is the role of cloud computing? How do these approaches apply to protected data?


Wang Hall Data Center Tour (Host: Brent Draney)

Meet DOE’s latest NERSC supercomputer, Cori, and see the world’s first seismically stabilized data center floor.


End of Day 1

(Dinner on your own)

Wednesday, April 13

Data Challenges in Human and Plant Biomes

(Program Committee Chair: Dan Jacobson, Oak Ridge National Lab, Biosciences Division)


Continental breakfast

(Includes: Coffee/Tea Service, Frittata, Baked Goods, Fruit Bowl)


Welcome and Day 1 Recap

(Brooklin Gore, Infrastructure Lead and Science Engagement Contributor, ESnet)


Keynote: Data Challenges at the Intersection of Human and Plant Biome Discovery and Analysis

(Dan Jacobson¹,², Deborah Weighill¹,², Carissa Bleker¹,², Gerald Tuskan¹, Wellington Muchero¹, Timothy Tschaplinski¹)

1. Oak Ridge National Laboratory, Oak Ridge, Tennessee

2. University of Tennessee, Knoxville

Biological organisms are complex systems that are composed of pleiotropic functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants regulated by and related to biotic and abiotic signals. However, the effects of these variants are the result of historic selective pressure and current environmental as well as epigenetic interactions, and, as such, their co-occurrence can be seen as genome-wide associations in a number of different manners. In this context, a plant’s association with its microbiome is a complex set of interactions involving many genes and metabolites. We are using data derived from the re-sequenced genomes from over 1000 alternate Populus trichocarpa genotypes in combination with transcriptomics, metabolomics and phenomics data across this population in order to better understand the molecular interactions involved in plant-microbe interfaces. The resulting Genome-Wide Association Study networks, integrated with SNP correlations and co-expression networks, are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, some of the molecular underpinnings for plant-microbiome associations. We have also found that, although separated by great evolutionary distances, plants and humans share many fundamental biological mechanisms. Surprisingly, we have found that the same allelic variants can lead to similar phenotypes in both organisms. As a result of these observations, we are starting to systematically explore these highly conserved functions and their phenotypic relationships in both plants and humans.


Morning Break


Morning Talks

The Study of Complex Systems

(Zaid Abdo, Colorado State)

There are an estimated 10-fold more microbial cells than human cells within the human body, carrying an estimated 100-fold more genes than our own. Recent studies have shown strong associations between the human microbiome and the state of health and disease. Here, we study the temporal dynamics of the vaginal microbiome. We evaluate factors that might influence its stability and that could help in predicting the risk of disease.

Data Challenges in Distributed Microbiome Research

(Natalia Ivanova, DOE Joint Genome Institute)

In the past 20 years, the advent of new sequencing technologies has been a boon for microbiology, transforming it into a data-intensive modern science. The unprecedented volume of microbiome sequence data being generated worldwide poses significant computational challenges, preventing their efficient utilization, sharing, systematic review and meta-analysis. In this talk I will briefly discuss the major data-related bottlenecks in microbiome research as viewed through the lens of the JGI experience in processing sequence data from diverse sources.

Managing Reference Data Sets Within Europe

(Steven Newhouse, European Molecular Biology Laboratory / European Bioinformatics Institute)

EMBL-EBI curates unique reference data sets for the life science community. Many of these data sets are used when undertaking a range of user-driven data analytics. For this analysis to take place most effectively on cloud or computing resources away from EMBL-EBI, the reference data sets need to be available locally. Distributing reference data sets of such a size to multiple sites across Europe presents many technical and logistical challenges.

Within the ELIXIR and EUDAT 2020 projects, EMBL-EBI has been working to develop a reference data set distribution service that leverages existing file transfer services to simplify and automate the management of data sets across cloud and computing resources in Europe. The presentation will describe ongoing activity within the ELIXIR research infrastructure in Europe and the emerging ELIXIR Compute platform that will link cloud resources across Europe with reference data sets in the life sciences.


Lunch (provided)


Afternoon Talks

Computational Demands of the DOE Joint Genome Institute (JGI) Metagenome Program

(Kjiersten Fagnan, NERSC/JGI)

The National Energy Research Scientific Computing Center (NERSC) manages the computational resources for the DOE Joint Genome Institute (JGI). The data-intensive workload presents some interesting and unique computational challenges. In this talk I will focus on the infrastructure needed to support the metagenome and metatranscriptome pipelines that sometimes require handling TB+ sized data sets. I will also discuss the approach the JGI has taken to make these data sets and the corresponding analyses available to its global user community.


The Medical Science DMZ

(Eli Dart, ESnet)

The Science DMZ is a widely deployed network design pattern for data-intensive science cyberinfrastructure. While many science domains can deploy a Science DMZ without significant difficulty, medical environments (especially those which must process patient data) pose additional challenges due to the requirements for the protection of patient data. This talk will discuss the Science DMZ design pattern and its application to protected-data environments, as well as aspects of protected-data environments that pose additional challenges for deployment.


Managing Big Biomedical Data Using Globus

(Ravi Madduri, Globus)

The rapid growth of data in biomedical research is placing massive demands on research informatics cores, research labs and individual researchers. These groups must provide reliable analysis and data management services that can scale as the needs of researchers increase. Further, as collaborative research becomes commonplace, the ability to move and share large data sets between institutions and to scale analysis from one sample to hundreds of samples are fundamental requirements for many researchers. In this talk, we present the Globus Research Data Management platform that provides turnkey solutions to address the emerging challenges in handling and analyzing "big data" in the context of life sciences. In particular, we will present our experiences in delivering research data management and identity management capabilities to the European Bioinformatics Institute and the KBase project.


Afternoon Break


Panel and Discussion

(Moderator: Peter Nugent, Senior Scientist and Division Deputy for Science Engagement, Computational Research Division)

Theme: Common data challenges in metagenomics. What commonalities exist in metagenomics research? What one thing could dramatically improve metagenomic advances? What can infrastructure providers (networks, HPC centers, etc.) do to assist in this space? What best practices can be applied from other science disciplines? What is the role of cloud computing?


Berkeley Center for Structural Biology Tour (Host: Peter Zwart)

For over ten years, the Berkeley Center for Structural Biology has operated five protein crystallography beamlines at the Advanced Light Source at Lawrence Berkeley National Laboratory. Our vision is to provide state-of-the-art beamlines and outstanding service for crystallographers around the world, enabling structure resolution on even the most complex biological systems.


End of Day 2

(Dinner on your own)