DRAFT                                                        Chin Guok
August 2001                                    Energy Sciences Network


                   BGP Persistent Route Oscillation


1. Abstract

   It has recently been discovered that persistent BGP route
   oscillations can occur when BGP route reflection [1] is used in
   certain configurations [2]. This document discusses a variant of
   the Type I oscillation as addressed in "BGP Persistent Route
   Oscillation Condition" [2].

2. Introduction

   With the current tie breaking rules as defined in BGP-4 [3],
   competing route announcements may have different ranking orders
   when compared on different criteria (such as MULTI_EXIT_DISC
   value, external vs internal peering, interior cost, or BGP
   Identifier). It is this variation of ranking orders that can cause
   persistent route oscillations when used in conjunction with BGP
   route reflection.

   In a network where BGP route reflection is deployed, complete
   visibility of the available exit points is not disseminated to all
   internal routers. This lack of completeness results in path
   selections that are partial, which in turn can lead to
   instabilities when the selected paths are re-announced [3].

3. Discussion

   The following conditions are sufficient for route oscillation to
   occur:

   1) a network with single-level Route Reflection deployed AND

   2) the network accepts BGP MULTI_EXIT_DISC (MED) values from two
      or more ASs for a single prefix, and those MED values are
      unique [2].

   Although it is not part of the BGP-4 standard [4], many vendors
   have implemented a tie break on AS path length (shorter being
   better) that precedes the comparison of MEDs.

3.1. Route Oscillation Scenario

   Consider the topology as seen in Figure 1:

    --------------------------------------------------------------
   /                                     -------------------      \
   |                                    /                   \     |
   |  AS1                              |       Cluster       |    |
   |                                   |                     |    |
   |                        *5         |                     |    |
   |           Ra . . . . . . . . . . . . . . Rb(RR)         |    |
   |           ..                      |         .           |    |
   |          .  .                     |         .*12        |    |
   |          .    .                   |         .           |    |
   |         .      .                  |       Rc(C)         |    |
   |         .        .                |         .           |    |
   |        .          .                \        .          /     |
   |        .            .               --------.----------      |
   \   (10).              .(1)                (0).                /
    -------.----------------.--------------------.----------------
           .[192.168.0.1]    .                   .
         ------               .                  .
        /      \   [172.16.0.1].  ------------  .[172.16.0.2]
       |  AS10  |                /            \
        \      /                |     AS6      |
         ------                  \            /
            .                     ------------
              .                      .
                .  --------------  .
                  /              \
                 |     AS100      |- 10.0.0.0/8
                  \              /
                   --------------

            Figure 1: Example of Route Reflection Topology

   In Figure 1, AS1 contains a Route Reflector Cluster with one Route
   Reflector (RR) (i.e., Rb), and one Client (C) (i.e., Rc).

   The BGP peerings are represented in dotted lines.

   The number contained in parentheses on the AS1 EBGP peering
   sessions represents the MED value advertised by the peer to be
   associated with the 10.0.0.0/8 network reachability advertisement.

   The IP address contained in brackets on the AS1 EBGP peering
   sessions represents the BGP Identifier associated with the peering
   session.

   The number following each '*' on the IBGP peering sessions
   represents the additive IGP metrics that are to be associated with
   the BGP NEXT_HOP attribute for the concerned route.

   For the following steps the best path will be marked with a '*'.

   1) a) Ra receives route announcements from its external neighbors
         (i.e., [192.168.0.1] AS10, [172.16.0.1] AS6).

      b) Both routes are external and come from different ASs, so
         their MEDs are not compared. The tie break is done on the
         lowest BGP Identifier, resulting in the following BGP table.

                                   NEXT_HOP    BGP
           AS_PATH     MED         IGP Cost    Identifier
           -----------------------------------------------
         * 6 100       1           -           172.16.0.1
           10 100      10          -           192.168.0.1

      c) Ra sends an UPDATE to its internal neighbors (i.e., Rb) with
         '6 100, 1, -, 172.16.0.1' selected as the best route.

   2) a) Rc receives a route announcement from its external peer
         (i.e., [172.16.0.2] AS6).

      b) With no competing announcements, the route is marked as best
         and installed in the BGP table.

                                   NEXT_HOP    BGP
           AS_PATH     MED         IGP Cost    Identifier
           -----------------------------------------------
         * 6 100       0           -           172.16.0.2

      c) Rc sends an UPDATE (to Rb) with the route it has learned.

   3) a) Rb receives the UPDATE notification from both Ra and Rc
         ('6 100, 1, 5, 172.16.0.1' and '6 100, 0, 12, 172.16.0.2'
         respectively).
      b) With both route announcements coming from the same AS, the
         MED value is used for the tie break. The routes are
         installed in the BGP table with '6 100, 0, 12, 172.16.0.2',
         having the lower MED, marked as best.

                                   NEXT_HOP    BGP
           AS_PATH     MED         IGP Cost    Identifier
           -----------------------------------------------
         * 6 100       0           12          172.16.0.2
           6 100       1           5           172.16.0.1

      c) Rb sends out an UPDATE to its neighbors with the selected
         route.

   4) a) Ra receives the UPDATE message from Rb and compares it with
         its externally learned routes.

      b) Using the deterministic tie breaking rules in BGP-4 [3], the
         '6 100, 0, 17, 172.16.0.2' route is first compared to
         '6 100, 1, -, 172.16.0.1' and chosen due to a lower MED
         value. The comparison is then made between
         '6 100, 0, 17, 172.16.0.2' and '10 100, 10, -, 192.168.0.1'
         with the latter being selected, based on the preference of
         routes learned externally (over internal).

                                   NEXT_HOP    BGP
           AS_PATH     MED         IGP Cost    Identifier
           -----------------------------------------------
           6 100       0           17          172.16.0.2
           6 100       1           -           172.16.0.1
         * 10 100      10          -           192.168.0.1

      c) Ra sends an UPDATE message (back to Rb) with its new best
         route.

   5) a) Rb receives the UPDATE message from Ra, causing the
         '6 100, 1, 5, 172.16.0.1' BGP table entry to be replaced by
         '10 100, 10, 5, 192.168.0.1'.

      b) '10 100, 10, 5, 192.168.0.1' is marked as the best route due
         to a lower IGP metric.

                                   NEXT_HOP    BGP
           AS_PATH     MED         IGP Cost    Identifier
           -----------------------------------------------
           6 100       0           12          172.16.0.2
         * 10 100      10          5           192.168.0.1

      c) Rb sends an UPDATE/withdraw to notify its neighbors of the
         change.

   6) a) Ra receives the UPDATE/withdraw message from Rb, and
         withdraws the '6 100, 0, 17, 172.16.0.2' route from its BGP
         table.

      b) With a lower BGP Identifier, '6 100, 1, -, 172.16.0.1' is
         once again selected as the best route.

                                   NEXT_HOP    BGP
           AS_PATH     MED         IGP Cost    Identifier
           -----------------------------------------------
         * 6 100       1           -           172.16.0.1
           10 100      10          -           192.168.0.1

      c) Ra sends out a corresponding UPDATE to its neighbors (i.e.,
         Rb) with the new best route.
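
   The six steps above can be replayed with a small simulation. The
   following Python sketch is illustrative only and is not part of any
   BGP implementation: the Route record, best_path(), via_ibgp() and
   simulate() are hypothetical names, an IGP cost of 0 stands in for
   the '-' entries in the tables, and only the tie breaks exercised by
   the scenario are modeled (per-AS MED comparison, external over
   internal, IGP cost, BGP Identifier). Ra and Rb are assumed to
   exchange updates in lockstep, one round at a time.

```python
from collections import namedtuple
from ipaddress import IPv4Address

# Hypothetical route record; field names are illustrative, not from the draft.
Route = namedtuple("Route", "as_path med external igp_cost bgp_id")


def best_path(routes):
    """Select a best route using only the tie breaks the scenario exercises."""
    # Deterministic MED: within each neighbor AS, keep the lowest-MED route.
    by_as = {}
    for route in routes:
        neighbor_as = route.as_path[0]
        current = by_as.get(neighbor_as)
        if current is None or route.med < current.med:
            by_as[neighbor_as] = route
    # Across ASs: external over internal, then IGP cost, then BGP Identifier.
    return min(
        by_as.values(),
        key=lambda r: (not r.external, r.igp_cost, int(IPv4Address(r.bgp_id))),
    )


def via_ibgp(route, igp_cost):
    """Return the route as an IBGP peer sees it, igp_cost away from the exit."""
    return route._replace(external=False, igp_cost=igp_cost)


# EBGP-learned routes from Figure 1 (IGP cost 0 stands in for '-').
RA_EBGP = [
    Route((6, 100), 1, True, 0, "172.16.0.1"),
    Route((10, 100), 10, True, 0, "192.168.0.1"),
]
RC_EBGP = Route((6, 100), 0, True, 0, "172.16.0.2")


def simulate(rounds):
    """Return the neighbor AS of Ra's best route after each round."""
    history = []
    from_rb = None  # Rc's route as reflected to Ra by Rb, if advertised
    for _ in range(rounds):
        candidates = RA_EBGP + ([from_rb] if from_rb is not None else [])
        ra_best = best_path(candidates)
        history.append(ra_best.as_path[0])
        # Rb always hears Rc's route (cost 12) and, when Ra's best is an
        # EBGP-learned route, that route too (cost 5).
        rb_routes = [via_ibgp(RC_EBGP, 12)]
        if ra_best.external:
            rb_routes.append(via_ibgp(ra_best, 5))
        rb_best = best_path(rb_routes)
        # Rb reflects Rc's route to Ra (cost 5 + 12 = 17) only while it is
        # Rb's best; otherwise Ra sees a withdraw.
        if rb_best.bgp_id == RC_EBGP.bgp_id:
            from_rb = via_ibgp(RC_EBGP, 17)
        else:
            from_rb = None
    return history


print(simulate(6))  # prints [6, 10, 6, 10, 6, 10]
```

   Under these assumptions Ra's best route alternates between the AS6
   and AS10 announcements on every round and never settles.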

   At this point, we have looped back to step 1. This cycle then
   repeats itself, causing the BGP route oscillations.

3.1.1. Topology Variation

   Consider the topology as seen in Figure 2:

    --------------------------------------------------------------
   /         -----------------------------------------------      \
   |        /                                               \     |
   | AS1   |                                   Cluster       |    |
   |       |                                                 |    |
   |       |                  *5                             |    |
   |       |   Ra(C). . . . . . . . . . . . . Rb(RR)         |    |
   |       |   ..                                .           |    |
   |       |  .  .                               .*12        |    |
   |       |  .   .                              .           |    |
   |       | .      .                          Rc(C)         |    |
   |       | .       .                           .           |    |
   |        \.         .                         .          /     |
   |       . -----------.------------------------.----------      |
   \   (10).              .(1)                (0).                /
    -------.----------------.--------------------.----------------
           .[192.168.0.1]    .                   .
         ------               .                  .
        /      \   [172.16.0.1].  ------------  .[172.16.0.2]
       |  AS10  |                /            \
        \      /                |     AS6      |
         ------                  \            /
            .                     ------------
              .                      .
                .  --------------  .
                  /              \
                 |     AS100      |- 10.0.0.0/8
                  \              /
                   --------------

       Figure 2: Example of Route Reflection Topology Variation

   In Figure 1, Ra is an IBGP meshed peer to Rb, which in turn is a RR
   to Rc. In Figure 2, both Ra and Rc are RR Clients of Rb. Due to the
   operational similarities between IBGP meshed peers and IBGP
   RR/Client relationships, the topology depicted in Figure 2 will
   experience the same route oscillation as discussed for Figure 1.

3.2. Potential Workarounds

   Apart from those discussed in "BGP Persistent Route Oscillation
   Condition" [2], no additional workarounds have been identified.

4. References

   [1] Bates, T., Chandra, R., Chen, E., "BGP Route Reflection - An
       Alternative to Full Mesh IBGP", RFC 2796, April 2000.

   [2] McPherson, D., Gill, V., Walton, D., Retana, A., "BGP
       Persistent Route Oscillation Condition", Work in Progress
       (draft-ietf-idr-route-oscillation-00.txt), March 2001.

   [3] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
       Work in Progress (draft-ietf-idr-bgp4-12.txt), March 2001.

   [4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
       RFC 1771, March 1995.

5. Authors' Addresses

   Chin Guok
   Energy Sciences Network
   1 Cyclotron Road
   MS 50A-3111
   Berkeley, CA 94720
   Email: chin@es.net