Berkeley Lab to Partner with Jefferson Lab to Build $300+ Million High Performance Data Facility Hub

October 16, 2023

By Berkeley Lab Media Relations

Adapted from a news release by Jefferson Lab.

The U.S. Department of Energy (DOE) has announced the selection of Thomas Jefferson National Accelerator Facility (Jefferson Lab) as the lead for its new High Performance Data Facility (HPDF) Hub. Jefferson Lab will partner with Lawrence Berkeley National Laboratory (Berkeley Lab) to form a joint project team led by Jefferson Lab. The HPDF will be a $300-$500M computing and data infrastructure resource that will provide transformational capabilities for data analysis, networking, and storage for the nation’s research enterprise. It will provide researchers with tools, methods, and technologies to maximize the scientific value of data.

Vast amounts of research data are generated by major scientific user facilities, supercomputer simulations, and artificial intelligence tools every day. The mission of the HPDF will be to accelerate the pace of scientific discovery by giving researchers the ability to seamlessly access data from a wide range of sources and scientific facilities, even in real time, and to apply state-of-the-art computational capabilities on a high performance computing platform, all within a secure environment.

“Berkeley Lab is honored to partner with Jefferson Lab on the HPDF, which will accelerate scientific discovery by delivering state-of-the-art data management infrastructure, capabilities, and tools,” said Berkeley Lab Director Mike Witherell.

“Building on our extensive experience with large data sets and high performance computing, and our new and ongoing partnerships exploring state-of-the-art approaches to data and data science, we will build a new facility that will revolutionize the way we make scientific discoveries,” said Jefferson Lab Director Stuart Henderson. “Our partnership with Berkeley Lab will help ensure geographic resilience and innovative infrastructure for this unique facility in support of researchers across the United States.”

The HPDF will have a “hub-and-spoke” model in which Jefferson Lab and Berkeley Lab will host mirrored centralized resources and will also enable high priority DOE mission applications at “spoke” sites by deploying and orchestrating distributed infrastructure at the spokes or other DOE Office of Science (SC) locations.

HPDF will function as a new scientific user facility that specializes in advanced infrastructure for data-intensive science. This facility will provide unprecedented data analysis, networking, and storage resources. Scientists and engineers working in DOE Office of Science (SC) programs will have competitive access to these dedicated and geographically diverse resources as they address fundamental research problems that require swift, shared, and distributed access to large and complex data sets.

HPDF will also be a cornerstone of DOE’s Integrated Research Infrastructure (IRI) program, which focuses on the seamless integration of scientific facilities, data management, and computing to power scientific discovery. The facility will, for the first time, provide researchers with resources for harnessing large and complex data sets and meeting the need for dynamic and scalable data management infrastructure.

“We’re really excited to work with Jefferson Lab on the HPDF Hub, which will address many complex data lifecycle challenges in the DOE ecosystem,” said Jonathan Carter, Associate Lab Director for the Computing Sciences Area at Berkeley Lab. “We aim to build on the expertise in high-performance computing and networking, and data life-cycle management across many science domains, in our DOE user facilities NERSC and ESnet, and in our research programs.”

HPDF at Berkeley Lab

Berkeley Lab will build on decades of experience with data management and user facilities to jointly oversee the management and deployment of the Hub. This partnership with Jefferson Lab also aims to achieve both geographical diversity and operational resilience. Berkeley Lab will provide management and operational support for HPDF, focusing on integration of the HPDF Hub infrastructure with NERSC and ESnet, coordination of the national federated architecture operations of HPDF, and development of strategic directions for the HPDF and IRI efforts in coordination with DOE’s Advanced Scientific Computing Research (ASCR) program and other stakeholders.

“Today, we see many challenges with data at DOE user facilities and projects. Understanding the user needs and providing methods and tools to communities to access, curate, process, share, and analyze data through HPDF Hub will spearhead new scientific discoveries and innovations,” said Lavanya Ramakrishnan, Division Deputy in the Scientific Data Division at Berkeley Lab.

HPDF in the Research Ecosystem

HPDF is set to become the newest capability in high performance computing provided through DOE’s Advanced Scientific Computing Research (ASCR) program. ASCR operates three high performance computing user facilities: the National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab and two Leadership Computing Facilities at Oak Ridge National Laboratory and Argonne National Laboratory. ASCR’s high performance network user facility, the Energy Sciences Network (ESnet), operated by Berkeley Lab, delivers highly reliable data transport capabilities optimized for the requirements of large-scale science.