Deployments of Network Monitoring Software perfSONAR Hit 1,000

This graph shows monitoring results and how making fixes immediately boosts network performance.

January 27, 2014

Contact: Jon Bashor, 510-486-5849, [email protected]

Whether a person is downloading the latest movie or trying to move 10 terabytes of data from a research experiment to a computing center, when the flow bogs down, the first thought is often “There’s something wrong with the network.”

A more accurate assessment would be that there’s likely some problem in some component on one or more of the multiple networks comprising the connection. Pinpointing the problem on large-scale networks can be difficult, so a collaboration of research and academic networking organizations has developed perfSONAR, a publicly available, easy-to-install software suite that takes the guesswork out of network diagnostics.

In January 2014, perfSONAR reached a milestone with 1,000 instances of the diagnostic software installed on networking hosts around the U.S. and in 25 other countries. The perfSONAR collaboration over the past 10 years has included the Department of Energy’s Energy Sciences Network (ESnet), Fermilab, SLAC, Georgia Tech, Indiana University, Internet2, University of Delaware, the GÉANT project in Europe and RNP in Brazil. Among the major users is the Large Hadron Collider collaboration, with users at hundreds of institutions.

Improving network performance is critical as the size and number of datasets carried over research and education networks grow, placing greater demands on the infrastructure. For example, network traffic on ESnet doubles every 18 months. Such datasets are typically used by collaborators at multiple institutions, so the data often crosses a number of networks as it is shared.
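To put the 18-month doubling figure in perspective, the short sketch below compounds it over longer periods. It assumes the doubling period stays constant, which the article does not claim; the function name is illustrative only.

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Multiplicative traffic growth after `months`, assuming a fixed doubling period."""
    return 2.0 ** (months / doubling_months)


# Print the compounded growth for a few horizons (in years).
for years in (1, 3, 5, 10):
    print(f"{years:>2} years: x{growth_factor(years * 12):.1f}")
```

Under that assumption, traffic grows by roughly 59 percent per year and by about a factor of 100 over a decade.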

According to Brian Tierney, leader of ESnet’s Advanced Network Technologies Group, many of the bottlenecks occur at the end sites, often at newly upgraded sites. To help researchers overcome obstacles like firewalls set up to protect enterprise systems, ESnet has developed the Science DMZ approach, which takes science data out of the mainstream network and routes it directly to the scientists’ systems, allowing large datasets to freely flow in and out without posing security issues.

Tierney said that perfSONAR is one component of the Science DMZ, and as awareness of the Science DMZ has increased over the past two years, so has the use of perfSONAR. In fact, the National Science Foundation’s Campus Cyberinfrastructure - Network Infrastructure and Engineering Program (CC-NIE) to upgrade campus networks supports deployment of the Science DMZ and use of perfSONAR.

Coinciding with this is the continual improvement of the perfSONAR software, which has eliminated many of the bugs and made it easier to install.

“Although it started out with just one or two metrics of network performance, it’s now a menagerie of tools in one easy-to-install package with not a lot of steps required to install and configure,” said Jason Zurawski of ESnet’s Science Engagement Team. “At its core, perfSONAR is software to fix your network, whether it’s broken or improperly tuned. We find that many new users are tired of poor performance and want to fix it, or they are looking to upgrade their network and want to benchmark the current system to document improvement.”

Zurawski was a student at the University of Delaware when he began working on the development of perfSONAR in 2004 with Martin Swany, one of the visionaries of the project. In 2007, Zurawski joined Internet2, where he provided first-line support to users. With ESnet since 2013, Zurawski is now a “perfSONAR evangelist.”

The software can help identify such common problems as dirty fiber or a bad jumper connection, or the more common issue of under-powered switches with not enough buffer space. Zurawski said the most unusual case he has seen was a router that suffered packet loss. The facility had cleaned the fiber and replaced the hardware, but there were still fluctuations in the power level of the router. Running perfSONAR on servers on both sides of the router pinpointed the problem: the hardware had two power units but only one was plugged in. Once the second was plugged in, the system performed at the expected level.

As the number of deployments grows, the software becomes more effective by offering better coverage across more network paths, including ESnet and the connectivity to Department of Energy and NSF funded resources. With enough monitoring points, ESnet staff can triangulate problems and narrow down the possible bottlenecks.
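The idea of narrowing down a bottleneck from several monitoring points can be illustrated with a minimal sketch. This is not perfSONAR's API or ESnet's actual procedure; the host names, paths and test outcomes are hypothetical, and the logic assumes a single faulty link whose effect shows up on every test that crosses it.

```python
def suspect_links(results):
    """Return links that lie on every failing path and on no healthy path.

    `results` is a list of (path, ok) pairs, where `path` is a list of hop
    names and `ok` says whether the test between the endpoints looked healthy.
    Links are treated as undirected.
    """
    healthy = set()
    failing = []
    for path, ok in results:
        links = {tuple(sorted(pair)) for pair in zip(path, path[1:])}
        if ok:
            healthy |= links
        else:
            failing.append(links)
    if not failing:
        return set()
    # A single fault must sit on all failing paths and on none of the healthy ones.
    return set.intersection(*failing) - healthy


# Hypothetical measurements between three monitored sites.
results = [
    (["siteA", "r1", "r2", "siteB"], False),
    (["siteA", "r1", "r3", "siteC"], True),
    (["siteC", "r3", "r2", "siteB"], False),
]
print(suspect_links(results))
```

With these made-up results, only the link between `r2` and `siteB` remains as a suspect. Each additional measurement host adds paths to the input, which is why wider deployment shrinks the set of candidates.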

“Often it is difficult to identify the exact cause of a problem, but the more perfSONAR nodes we have, the more useful it becomes,” Tierney said. “At ESnet, we get reports of about five network performance problems in any given week. Without perfSONAR, we often wouldn’t know where to start.”

Tierney said that commercially produced diagnostic software is designed for internal business networks and intranets, not designed to share performance data with outside organizations. In fact, it was that need to come up with standards for measuring and sharing network performance data that led to perfSONAR. The Open Grid Forum first discussed the issue in 2000 and issued its first document on network measurements in 2003. The first tests of the software began in 2005 at the University of Delaware and the first public software was released in 2006. Today, updates are issued about every six months and new versions released about every year.

To further help network engineers, ESnet has created the Fasterdata Knowledge Base, a site with information about performance tuning, tools and techniques. The information, intended for network experts, is freely available.

As use of the software has grown, so has the perfSONAR community, which shares information on configurations, occasional bugs and more. “The community is also good about answering questions, too,” Tierney said, “and has really taken off in the past year or so.”