For Over 10 Years, ESnet has Driven Development, Deployments of perfSONAR Network Measurement Tools
Contact: Jon Bashor, [email protected], 510-486-5849
(Editor’s note: This is the first of a series highlighting ESnet’s contributions to the global networking community as ESnet marks its 30th anniversary.)
Since it was first deployed as a prototype in December 2005, the perfSONAR toolkit has provided the research and education networking community with tools for end-to-end monitoring and troubleshooting of multi-domain network performance. Over the years, this ability to diagnose network problems has grown in importance as research has become more collaborative and more dependent on sharing large data sets.
Currently a joint effort between ESnet, Internet2, Indiana University and GEANT, the pan-European research network, perfSONAR is now deployed at more than 1,700 public sites around the world, as well as at many private sites. About 40 percent of the public sites are at educational institutions. Among the major users is the Large Hadron Collider collaboration, whose members work at hundreds of institutions.
The perfSONAR collaboration over the past 11 years has also included Fermilab, SLAC, Georgia Tech, the University of Delaware, the ATLAS Great Lakes Tier 2 team at the University of Michigan and RNP in Brazil.
As research and education institutions are increasingly reliant on networking, open-source tools like perfSONAR give network engineers the ability to test and measure network performance, as well as to archive data in order to pinpoint and solve service problems that may span multiple networks and international boundaries.
Now ESnet and the Network Startup Resource Center at the University of Oregon are looking to make it easier for organizations to learn about and use perfSONAR through a set of nearly 30 training videos. The videos range from a basic introduction to configuration, testing and network measurements. Each video covers a specific topic, allowing users to quickly drill down to areas of particular interest.
“Although it started out with just one or two metrics of network performance, it’s now a menagerie of tools in one easy-to-install package with not a lot of steps required to install and configure,” said Jason Zurawski of ESnet’s Science Engagement Team. “At its core, perfSONAR is software to fix your network, whether it’s broken or improperly tuned. We find that many new users are tired of poor performance and want to fix it, or they are looking to upgrade their network and want to benchmark the current system to document improvement.”
Zurawski was a student at the University of Delaware when he first began working on development of perfSONAR in 2004, working with Professor Martin Swany, one of the visionaries of the project. In 2007, Zurawski joined Internet2, where he provided first-line support to users. With ESnet since 2013, Zurawski is now a “perfSONAR evangelist.”
Improving the performance of networks is critical as the size and number of datasets carried over research and education networks are growing, placing greater demands on the infrastructure. For example, network traffic on ESnet doubles every 18 months. Such datasets are typically used by collaborators at multiple institutions, so the data often crosses a number of networks as it is shared.
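To put the 18-month doubling figure in perspective, the short Python sketch below works out the implied growth over longer periods. It assumes steady exponential growth, which is a simplification for illustration rather than an ESnet forecast, and the function name is hypothetical.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Factor by which traffic grows after `months`.

    Assumption (not from the article): growth is a smooth exponential
    with a constant doubling period.
    """
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    for years in (1.5, 3, 5, 10):
        factor = growth_factor(years * 12)
        print(f"{years:>4} years: traffic grows by a factor of about {factor:.1f}")
```

Under that assumption, traffic would grow roughly a hundredfold over a decade.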
As networks have become more critical and interconnected, performance issues can be more difficult to diagnose. To make its collected expertise and experience available to the broader community, ESnet maintains http://fasterdata.es.net/, a site with links to a number of tools, including perfSONAR.
According to Brian Tierney, leader of ESnet’s Advanced Network Technologies Group and originator of the Fasterdata site, many of the bottlenecks occur at the end sites, often with newly upgraded equipment. To help researchers overcome obstacles like firewalls set up to protect enterprise systems, ESnet has developed the Science DMZ approach, which takes science data out of the mainstream network and routes it directly to the scientists’ systems, allowing large datasets to flow freely in and out without posing security issues.
Tierney said that perfSONAR is one component of the Science DMZ, and as awareness of the Science DMZ has increased over the past two years, so has the use of perfSONAR. In fact, the National Science Foundation’s Campus Cyberinfrastructure Program to upgrade campus networks supports deployment of the Science DMZ and use of perfSONAR. This support has helped drive perfSONAR deployments from 1,000 in 2014 to the current 1,600-plus.
Coinciding with this growth is the continual improvement of the perfSONAR software, which has eliminated many of the bugs and made it easier to install.
The software can help identify such common problems as dirty fiber or a bad jumper connection, or the more common issue of under-powered switches with not enough packet buffering. Zurawski said the most unusual case he has seen was a router that suffered packet loss. The facility had cleaned the fiber and replaced the hardware, but there were still fluctuations in the power level of the router. Running perfSONAR on servers on both sides of the router pinpointed the problem: the hardware had two power units but only one was plugged in. Once the second was plugged in, the system performed at the expected level.
As the number of deployments grows, the software becomes more effective by offering better coverage across more network paths, including ESnet and the connectivity to Department of Energy and NSF-funded resources. With enough monitoring points, ESnet staff can triangulate problems and narrow down the possible bottlenecks.
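The idea of triangulating with many monitoring points can be sketched in a few lines of Python. This is an illustrative simplification, not perfSONAR code or its API: it assumes each test between two measurement hosts is recorded as the list of network segments it crosses plus a pass/fail flag, and all segment names are hypothetical.

```python
def suspect_segments(path_results):
    """Return segments shared by every failing path and absent from all passing paths.

    path_results: iterable of (segments, ok) pairs, where `segments` lists the
    hops a test crossed and `ok` is True if performance was acceptable.
    Assumes a single faulty segment; real diagnosis is rarely this clean.
    """
    results = list(path_results)
    failing = [set(segs) for segs, ok in results if not ok]
    if not failing:
        return []
    # Segments seen on any healthy path are presumed good.
    healthy = set()
    for segs, ok in results:
        if ok:
            healthy.update(segs)
    # A single fault must lie on every failing path.
    common = set.intersection(*failing)
    return sorted(common - healthy)


if __name__ == "__main__":
    tests = [
        (["siteA", "regional1", "backbone", "siteB"], False),
        (["siteA", "regional1", "backbone", "siteC"], True),
        (["siteD", "backbone", "siteB"], False),
    ]
    print(suspect_segments(tests))
```

In this made-up example, both failing tests end at siteB while the backbone also appears on a healthy path, so siteB is the only remaining suspect. Adding more measurement points adds more paths, which shrinks the suspect list further.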
“Often it is difficult to identify the exact cause of a problem, but the more perfSONAR nodes we have, the more useful it becomes,” Tierney said. “More deployments are better and help the toolkit and software get better.”
Tierney said that commercially produced diagnostic software is designed for internal business networks and intranets, not for sharing performance data with outside organizations. In fact, it was the need for standards for measuring and sharing network performance data that led to perfSONAR. The Global Grid Forum first discussed the issue in 2000 and issued its first document on network measurements in 2003. The first tests of the software began in 2004 at the University of Delaware and the first public software was released in 2005. Today, updates are issued about every six months and new versions are released about every year.
In early 2015, ESnet deployed one of the first public 40 Gbps production perfSONAR hosts directly connected to an R&E backbone network, allowing research organizations to test and diagnose the performance of network links up to 40 gigabits per second.
The host, located in Boston, Mass., is available to any organization in the R&E (research and education) networking community. More and more, organizations are setting up their own 40 Gbps data transfer nodes to help systems keep up with the increasing size of research datasets.
As the software use has grown, so has the perfSONAR community, which shares information on configurations, occasional bugs and more. “The community is also good about answering questions, too,” Tierney said, “and has really taken off in the past few years.”