Friday, January 28, 2011

Spanning Tree Protocol - 802.1D

My college campus used to have a gigantic problem with switching loops.  The reason: they didn't run STP.  All anyone had to do to bring the network to a complete standstill was plug a crossover cable from one port in the wall to another port in the wall.  This would cause a switching loop, leading to a broadcast storm and obliterating the network.  Why are network loops so bad?  Isn't redundancy good?  Yes, but if you have more than one active path, broadcasts will obliterate your network.  Each time a switch receives a broadcast, it sends it out every other port.  In a loop, that broadcast finds its way back around to the switch that sent it and gets forwarded again, and again, and again (Ethernet frames have no TTL to stop them) until there's no bandwidth left for legitimate traffic.
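To see why a loop never calms down on its own, here is a rough Python sketch of the flooding rule described above.  It is a toy model, not how a real switch is implemented: the switch names and the flood() helper are made up for illustration, and it assumes at most one cable between any pair of switches.

```python
def flood(links, source, hops):
    """Count broadcast copies in flight after each hop.

    Every switch floods a broadcast out every port except the one it
    arrived on. Nothing ever expires a frame (no TTL at layer 2).
    """
    # neighbors[s] = switches directly cabled to s
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    # Each in-flight copy is (switch it came from, switch it is arriving at).
    in_flight = [(source, n) for n in neighbors[source]]
    counts = [len(in_flight)]
    for _ in range(hops - 1):
        next_hop = []
        for came_from, at in in_flight:
            for n in neighbors[at]:
                if n != came_from:  # never send back out the arrival port
                    next_hop.append((at, n))
        in_flight = next_hop
        counts.append(len(in_flight))
    return counts


chain = [("A", "B"), ("B", "C")]                 # no loop
triangle = [("A", "B"), ("B", "C"), ("A", "C")]  # one loop
mesh = [("A", "B"), ("A", "C"), ("A", "D"),
        ("B", "C"), ("B", "D"), ("C", "D")]      # many loops

print(flood(chain, "A", 4))     # [1, 1, 0, 0]    the broadcast dies out
print(flood(triangle, "A", 5))  # [2, 2, 2, 2, 2] circles forever
print(flood(mesh, "A", 4))      # [3, 6, 12, 24]  doubles every hop
```

And that is a single broadcast.  Real hosts send them constantly, so every new one piles onto the copies already circling.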

This is why it's so important to prevent loops, and STP is the way you accomplish that.  STP, the Spanning Tree Protocol, eliminates loops in the network by essentially shutting down redundant links (paths).  Here's how it works.  All switches running STP send BPDUs (Bridge Protocol Data Units) as multicast frames to track down loops.  If these BPDUs find their way back to the originating switch, a loop has been detected.  BPDUs are also essential to electing the Root Bridge, the pillar of the network.  Every switch will find its best path to the root bridge and block all the other paths.  Now, STP is designed to work right out of the box, but without tweaking it can really slow down your network.  The reason for this is the way in which the root bridge is chosen, or elected.  Each switch has a Bridge ID, made of a priority and a MAC address.

Bridge ID = Priority.MACADDR.

It may seem counterintuitive, but lower is better when it comes to the bridge ID.  Out of the box, all switches have a default priority of 32768, so when the network is choosing the root bridge, the switch with the lowest MAC address will be chosen, which is generally the oldest (going by manufacture date).  To rectify this you have several options.  You can lower the priority (in increments of 4096, don't ask me why) to a lower number than the default, but if there is a tie, the switches will fall back to MAC addresses to break it.  You can also use the following command, which lowers the priority to 24576 (or lower still, if that isn't enough to beat the current root):

Switch(config)#spanning-tree vlan 1 root primary

Making sure your root bridge is placed in an optimal location on a powerful switch is very important, otherwise your network will be slowed down by an outdated switch.
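The election rule boils down to "compare priority first, then MAC address, lowest wins."  Here is a small Python sketch of that comparison.  The switch names and MAC addresses are invented for the example, and the helper functions are mine, not anything from IOS.

```python
def bridge_id(priority, mac):
    """Return the Bridge ID as a tuple so it sorts priority first, MAC second."""
    mac_as_number = int(mac.replace(":", ""), 16)
    return (priority, mac_as_number)


def elect_root(switches):
    """switches maps name -> (priority, mac). The lowest Bridge ID wins."""
    return min(switches, key=lambda name: bridge_id(*switches[name]))


# Hypothetical network, every switch at the default priority of 32768.
switches = {
    "old-closet-switch": (32768, "00:05:5e:00:00:01"),
    "core-switch":       (32768, "00:1b:54:00:00:02"),
    "lab-switch":        (32768, "00:17:94:00:00:03"),
}
print(elect_root(switches))  # old-closet-switch (lowest MAC wins the tie)

# Lower the core switch's priority, as "root primary" would.
switches["core-switch"] = (24576, "00:1b:54:00:00:02")
print(elect_root(switches))  # core-switch
```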

Once your root bridge has been elected, switches will start sending out BPDUs to find the best path to that switch.  Each switch judges a path by its total link cost.  The following link costs are assigned to these link speeds.
10 Mb/s   -  100
16 Mb/s   -  62
100 Mb/s  -  19
1 Gb/s    -  4
2 Gb/s    -  3
10 Gb/s   -  2

So if a switch has to go across three 100 Mb/s links to get to the root bridge, that path will have a cost of 3 x 19 = 57.  If that switch were also connected directly to the root bridge over a 10 Mb/s link (cost 100), it would still prefer the former, because 57 is lower than 100.
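If you want to check the arithmetic, here is a tiny Python sketch that adds up the cost of a path.  It only includes the three costs used in the example, and the path_cost() helper is just for illustration.

```python
# Link speed in Mb/s -> STP cost (values from the table above).
LINK_COST = {10: 100, 100: 19, 1000: 4}


def path_cost(link_speeds_mbps):
    """Total cost of a path is the sum of the cost of every link on it."""
    return sum(LINK_COST[speed] for speed in link_speeds_mbps)


three_fast_hops = path_cost([100, 100, 100])
one_slow_hop = path_cost([10])
print(three_fast_hops, one_slow_hop)  # 57 100
print("prefer three hops" if three_fast_hops < one_slow_hop else "prefer direct link")
```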

This leads us to the three different kinds of ports for STP.

Root Port - The port with the lowest-cost path to the root bridge.  Every switch except the root bridge has exactly one.
Designated Port - Port that forwards traffic.  There will be one of these per link.  It's either a port that leads to a host, a port on the root bridge, or a port opposite one that is being blocked.
Blocking/Nondesignated - Port that is blocked by STP.  On a link that is blocked, only one port will be put in blocking mode; the other will remain designated.  Can you guess which side will be blocked?  Yup, when both switches are the same cost away from the root, it's the side of the switch with the higher bridge ID (which usually means the higher MAC address), because in STP, lower is better.

Pop quiz: what are all the ports on the root bridge set to?  Answer: Designated.  Root ports are used to reach the root bridge, so it obviously won't need one to reach itself, and it won't put any ports in blocking mode since other switches need to get to that switch.
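For practice, here is a simplified Python sketch that works out the port roles for a small topology.  It is my own toy model, not the real 802.1D algorithm: it assumes one link between any two switches, ignores port IDs and host ports, and identifies each port as (switch, link number).  The switch names, priorities, and MAC values are made up.

```python
import heapq


def stp_roles(bridges, links):
    """bridges: {name: (priority, mac_as_int)}; links: [(a, b, cost), ...].

    Returns {(switch, link_index): "root" | "designated" | "blocking"}.
    """
    root = min(bridges, key=bridges.get)  # lowest Bridge ID wins

    # Dijkstra: cheapest total link cost from each switch to the root.
    cost = {root: 0}
    heap = [(0, root)]
    while heap:
        c, sw = heapq.heappop(heap)
        if c > cost.get(sw, float("inf")):
            continue
        for a, b, w in links:
            if sw in (a, b):
                other = b if sw == a else a
                if c + w < cost.get(other, float("inf")):
                    cost[other] = c + w
                    heapq.heappush(heap, (c + w, other))

    roles = {}
    # Root port: on each non-root switch, the link with the cheapest path
    # to the root. Ties go to the neighbor with the lower Bridge ID.
    for sw in bridges:
        if sw == root:
            continue
        candidates = []
        for i, (a, b, w) in enumerate(links):
            if sw in (a, b):
                other = b if sw == a else a
                candidates.append((cost[other] + w, bridges[other], i))
        roles[(sw, min(candidates)[2])] = "root"

    # Designated port: on each link, the side closer to the root
    # (lower cost, then lower Bridge ID). The other side keeps its
    # root role if it has one, otherwise it blocks.
    for i, (a, b, w) in enumerate(links):
        winner, loser = sorted((a, b), key=lambda s: (cost[s], bridges[s]))
        roles[(winner, i)] = "designated"
        roles.setdefault((loser, i), "blocking")
    return roles


# Triangle of 100 Mb/s links; A has been given the lower priority.
bridges = {"A": (24576, 0x1), "B": (32768, 0x2), "C": (32768, 0x3)}
links = [("A", "B", 19), ("A", "C", 19), ("B", "C", 19)]
for port, role in sorted(stp_roles(bridges, links).items()):
    print(port, role)
```

In this triangle, both ports on A come out designated, B and C each get one root port, and the B-to-C link is blocked on C's side because C has the higher bridge ID.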

It's a really good idea to go online and find some examples and sample problems to practice predicting which links will be shut down.

Let's say you have three switches, connected in a triangle.  Links 1 and 2 are active, and link 3 is in blocking.  Say someone unplugs link 2.  Well, STP is going to see this and switch link 3 to a forwarding state, but it's going to take a while (30 to 50 seconds).  The reason it takes so long is that the blocked port on link 3 has several states to go through before it can forward traffic.

Listening - Port listens to BPDUs to make sure it doesn't hear any loops on the network before it forwards frames.
Learning - Port keeps listening to BPDUs and starts populating the MAC address table from the frames it sees, but still doesn't forward anything.  The time a port spends in the listening state, and again in the learning state, is known as the Forward Delay (15 seconds by default), and is set on the switch.
Forwarding - Sends and receives frames.  If the port is still a root port or a designated port at the end of the learning state, it will be put in this mode.
Blocking - Port will not forward frames, but will listen for BPDUs.  When a switch is powered up, all ports are in this mode.
Disabled - Administratively shut down, does not forward frames or participate in STP.
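Here is where the 30 to 50 seconds comes from, as a quick Python sketch.  It assumes the default 802.1D timers (Max Age of 20 seconds, Forward Delay of 15 seconds); the function and its parameter are just names I picked for the example.

```python
MAX_AGE = 20        # seconds a blocked port waits before giving up on old BPDUs
FORWARD_DELAY = 15  # seconds spent in listening, and again in learning


def seconds_until_forwarding(link_failed_locally):
    """Rough time for a blocked port to start forwarding after a failure."""
    # If the switch sees its own link go down, it skips the Max Age wait.
    # If the failure is somewhere else, it has to wait for BPDUs to age out.
    waiting = 0 if link_failed_locally else MAX_AGE
    listening = FORWARD_DELAY
    learning = FORWARD_DELAY
    return waiting + listening + learning


print(seconds_until_forwarding(True))   # 30
print(seconds_until_forwarding(False))  # 50
```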

When all ports in a network have entered a forwarding or blocking state, the switched network has converged.  While STP is in the process of converging, no host data is forwarded on the ports that are still transitioning.  It usually takes about 50 seconds for the network to achieve convergence (though you can reduce timers to lower this, I wouldn't recommend it for basic STP).  So every time there is a topology change, parts of your switched network can go dark for up to 50 seconds!  Doesn't seem like a lot?  Well, it's a huge amount of time in the networking world.  Imagine you have VoIP set up: it only takes a few seconds of downtime to drop your calls.  Credit card transactions won't go through, a server backup fails, and your boss can't get on Facebook.
Yikes.  Luckily there have been some major improvements to STP to speed up the process and make STP more efficient.

Thanks for reading!
