Superfacility Framework Advances Photosynthesis Research
Integrating experimental instruments with high-speed networking and computational resources yields real-time feedback
Keri Troutman, 510-486-5071
For more than a decade, a team of international researchers led by Berkeley Lab bioscientists has been studying Photosystem II (PSII), a protein complex in green plants, algae, and cyanobacteria that plays a crucial role in photosynthesis. They’re now moving more quickly toward an understanding of this three-billion-year-old biological system, thanks to an integrated superfacility framework of experimental instrumentation with computational and data facilities. PSII researchers working at the SLAC National Accelerator Laboratory’s Linac Coherent Light Source (LCLS) recently began using the Energy Sciences Network (ESnet) at Lawrence Berkeley National Laboratory (Berkeley Lab) to enable real-time processing of experimental data at the National Energy Research Scientific Computing Center (NERSC).
PSII is the only known biological system able to harness sunlight for the oxidation of water into molecular oxygen. Scientists have been seeking an atomic-scale understanding of how PSII splits a water molecule during photosynthesis for decades now. This key understanding would help advance the development of artificial photosynthesis, a promising source of abundant and clean energy.
To gain insight into how PSII works, the research team, which is led by Berkeley Lab bioscientists Vittal Yachandra, Junko Yano, and Jan Kern, uses X-ray free electron lasers (XFELs) at LCLS to capture images of PSII throughout the stages of its reaction cycle. At the core of PSII is an oxygen evolving complex (OEC; Mn4CaO5) that, when energized by solar photons, catalyzes a four photon-step cycle of oxidation states that ultimately yields molecular oxygen. Using XFELs to study the protein complex at specific time points during this cycle helps the researchers understand structural changes in PSII and, in turn, the mechanism of bond formation between two oxygen atoms.
Historically, the ability to capture these images has been hindered by the fact that most X-ray crystallography technology destroys the samples before meaningful data can be collected. Scientists need to observe X-ray diffraction of the intact Mn4CaO5 complex in action, but the molecule is highly sensitive to radiation. However, the advent of XFELs and sophisticated data processing methods has changed this; last year researchers were able to capture the most complete and highest-resolution picture to date of PSII (the results were published in Nature in 2018).
LCLS upgrades have led to faster and higher-resolution imaging results, which means the computational resources needed for data processing have also expanded. Concurrent developments at NERSC and ESnet have moved this research to the next level. With ESnet in place between SLAC and NERSC, the PSII researchers are now running their experiments with live data analysis feedback, which allows them to use their LCLS shift time more effectively.
“The high performance, reliable data placement service over ESnet is a fundamental building block of the superfacility model,” said Eli Dart, a network engineer in the ESnet Science Engagement Group. “This architectural construct is about removing the constraints of geography from the scientific process.”
The data analysis team can now tell the researchers whether they’re getting statistically significant results from a certain sample batch almost immediately, which means more samples can be tested and the beamtime is utilized efficiently. With LCLS beamtime data collection rates now at 20-30 images/second, the researchers are collecting 60-100 GB in every 5-minute data run. Each of these runs is transferred via ESnet within one minute, so that the data analysis team can use NERSC to process it immediately and give feedback within 5-10 minutes.
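The figures above imply a sustained transfer rate and an average image size that the article does not state directly. The short sketch below works them out. It is a back-of-the-envelope estimate only: it assumes decimal units (1 GB = 10^9 bytes) and reads "within one minute" as a full 60-second transfer, so the network rates are lower bounds. The function names are illustrative and not part of any LCLS, ESnet, or NERSC tooling.

```python
# Back-of-the-envelope check of the data volumes quoted in the article.
# Assumptions (not from the source): decimal gigabytes, and a transfer
# that takes the full 60 seconds, which gives a lower bound on throughput.

RUN_SECONDS = 5 * 60       # one 5-minute data run
TRANSFER_SECONDS = 60      # "within one minute", taken as an upper bound


def sustained_gbps(run_gigabytes: float, seconds: float = TRANSFER_SECONDS) -> float:
    """Network rate in gigabits per second needed to move one run in `seconds`."""
    return run_gigabytes * 8 / seconds


def megabytes_per_image(run_gigabytes: float, images_per_second: float) -> float:
    """Average image size implied by a run's total volume and frame rate."""
    images_in_run = images_per_second * RUN_SECONDS
    return run_gigabytes * 1000 / images_in_run


if __name__ == "__main__":
    for gigabytes in (60, 100):
        rate = sustained_gbps(gigabytes)
        print(f"{gigabytes} GB run: at least {rate:.1f} Gbit/s sustained")

    # Smallest implied image: least data at the highest frame rate.
    print(f"smallest image: {megabytes_per_image(60, 30):.1f} MB")
    # Largest implied image: most data at the lowest frame rate.
    print(f"largest image: {megabytes_per_image(100, 20):.1f} MB")
```

Under these assumptions, each run needs roughly 8 to 13 Gbit/s of sustained throughput between SLAC and NERSC, and the images average somewhere between about 7 and 17 MB each.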
“As computational staff, our responsibility is to give the PSII researchers feedback on how they are performing, because without our involvement they are just collecting data blindly,” says Asmit Bhowmick, a postdoctoral researcher in Berkeley Lab’s Molecular Biophysics and Integrated Bioimaging (MBIB) Division. “This is critical when you are trying to push the resolution of these structures.”
The LCLS data that’s processed at NERSC is used to create electron density maps, which allow the researchers to evaluate structural differences in PSII between different time points in the reaction cycle. The higher the resolution of the electron density maps, the closer researchers get to being able to see the oxygen bonds clearly. The bond length between oxygen atoms in the relevant intermediate state is about 2 angstroms. In 2016, PSII researchers published electron density maps with 2.25 angstrom resolution, which were considered unprecedented. Now, with NERSC computing power, researchers have been able to push the resolution down to 2.05 angstroms.
“More high-quality data is what allows us to push the resolution to the next level,” says Bhowmick, who works in the laboratory of MBIB senior scientist Nicholas Sauter. “Having NERSC available to process live data is what is getting us there.”
While LCLS produces data rates of 30-120 images per second, that rate will increase 10- to 10,000-fold when LCLS-II, the next generation of the LCLS, comes online in 2020. ESnet and NERSC will be absolutely necessary for LCLS-II. “Even now, at LCLS, we cannot do the speed and resolution of analysis we’re able to do at NERSC; the LCLS computing infrastructure just cannot keep up,” says Bhowmick. “With ESnet and NERSC, we have hit some major milestones in the past few months.”
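To put the LCLS-II projection in concrete terms, the sketch below multiplies the quoted frame rates by the quoted scaling factors. Pairing the lowest rate with the lowest factor and the highest with the highest is an assumption made here to bracket the range; the article does not say which combinations will occur in practice.

```python
# Rough bracket on LCLS-II frame rates, using only the ranges quoted above.
# Assumption (not from the source): the extremes of the two ranges can be
# combined freely, so the result is the widest possible bracket.

CURRENT_RATE = (30, 120)     # images per second at LCLS
SCALE_FACTOR = (10, 10_000)  # projected fold increase for LCLS-II


def projected_range(rate=CURRENT_RATE, scale=SCALE_FACTOR):
    """Return (lowest, highest) projected images per second."""
    return rate[0] * scale[0], rate[1] * scale[1]


if __name__ == "__main__":
    low, high = projected_range()
    print(f"LCLS-II: between {low:,} and {high:,} images per second")
```

Even the low end of that bracket, 300 images per second, is more than double the current maximum, and the high end reaches 1.2 million images per second.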
NERSC and SLAC are DOE Office of Science User Facilities.