IP Service


5.1 General Requirements

Like the Frame and ATM services, the IP layer handles internal packets. However, IP services also provide the interface to the outside world (NMS, student Xterms, etc.). All packets from the real world enter the simulation as IP packets on the ethernet device. Although the basic logic flow of MainIn() and MainOut() is the same as in the Frame and ATM services, there is a lot more to this service.

5.1.1 Background

Lucent switches can be configured as either regular switches or gateway switches. All switches have a Lucent internal IP address, but gateway switches also have an ethernet IP address. Gateway switches provide management paths for SNMP packets from the NMS to the non gateway switches.

In general, when the NMS needs to send an SNMP packet to a switch, the packet is built with the destination address of the Lucent Internal IP address (152.133.20.X in the above figure). For example, when the NMS sends a packet to SW2, he builds the packet for destination 152.133.20.2. However, the NMS has no visibility into the 152.133.20.X network. So he looks in his routing table and finds that the path to 152.133.20.2 is through the gateway 28.1.3.21. Therefore, he sends the packet to the MAC address of the device that responds to ARP for 28.1.3.21.

The gateway switch SW1 receives the packet with his MAC address but SW2's IP address as the destination. Since SW1 knows he is a gateway switch, he looks for a route to SW2, finds the cable and sends the packet on. The path back is similar.
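
The NMS-side lookup described above can be sketched as a tiny routing table. This is only an illustration in Python, not VTSwitch or NMS code. The /24 prefix lengths are assumed from the "X" notation in the text, and the `ROUTES` table and `next_hop` helper are hypothetical names.

```python
import ipaddress

# Hypothetical NMS routing table: (destination network, gateway).
# A gateway of None means the network is directly attached.
# The /24 masks are an assumption based on the 28.1.3.X / 152.133.20.X notation.
ROUTES = [
    (ipaddress.ip_network("28.1.3.0/24"), None),
    (ipaddress.ip_network("152.133.20.0/24"), ipaddress.ip_address("28.1.3.21")),
]

def next_hop(dest):
    """Return the address the NMS must ARP for in order to reach dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, gw) for net, gw in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {dest}")
    # Prefer the most specific matching network.
    net, gw = max(matches, key=lambda m: m[0].prefixlen)
    return gw if gw is not None else addr

print(next_hop("152.133.20.2"))  # 28.1.3.21
```

The packet still carries 152.133.20.2 as its destination IP address; only the MAC address it is sent to belongs to the gateway.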

5.1.2 The Problem

VTSwitch uses a feature of Unix kernels called IP Aliasing to emulate all switches on a single machine. IP Aliasing allows one machine to respond to many different IP addresses. The kernel network layer examines the destination IP address in each packet, and passes the packet on to the application with a socket bound to that IP. This presents a problem in our simulation.

Again, when the NMS builds a packet destined for SW3, IP 152.133.20.3, he will send it through the gateway 28.1.3.21. He sends out an ARP to see who responds to 28.1.3.21. The VTSwitch system responds with his MAC address, and the NMS sends the packet to the kernel IP layer of VTSwitch. The problem is that the kernel IP layer knows that SW3 has a socket open and listening on 152.133.20.3, and sends the packet directly to SW3.

Why is this a problem? In the world that we are simulating, that packet must pass through the gateway switch SW1 and the intermediate switch SW2 before getting to SW3. If we bypass these switches, the simulation will be wrong. For example, suppose that the cable between SW1 and SW2 is down or the Pport is misconfigured. Suppose the egress Lport on SW2 is in a state of congestion and is dropping packets. What if IP Record Route or Traceroute is used from the NMS to SW3? At the very least, the byte and packet statistics on the ingress and egress Lports of SW1 and SW2 will be incorrect.

So the problem is obvious. The kernel will want to deliver the packet directly to SW3, but for the simulation to be accurate, the packet must pass through SW1 and SW2.

5.1.3 IP Solution

The solution was a mixed use of RAW IP sockets, a firewall feature called IP Transparent Proxy, and creative binding of IP addresses to specific devices. Here's how it works.

SW3 is a non gateway switch. When he is started, he opens a RAW IP socket listening on 152.133.20.3, but binds it to the loopback network device. SW2 is also a non gateway switch and does the same thing for IP 152.133.20.2. SW1 is a gateway switch, and therefore needs to act as a router for all packets bound to SW2 and SW3. He creates a RAW IP socket for all IP addresses (INADDR_ANY), and binds it to the ethernet network device.

What this has accomplished so far is that the RAW socket for SW1 will hear all packets from the outside world. This includes packets for 152.133.20.2 (SW2) and 152.133.20.3 (SW3). The Raw sockets opened by SW2 and SW3 are bound to the loopback device and won't hear anything from the outside world.

Finally, the IP Transparent Proxy feature of the kernel firewall is configured to redirect all TCP and UDP packets inbound to 152.X.X.X to port 1313. Listening on port 1313 is UdpSink, which reads these packets and drops them in the bitbucket. If this program was not there, the kernel would return ICMP port unreachable error messages to the NMS, because it would fail to deliver 152.133.20.2 and 152.133.20.3 packets to SW2 and SW3 as they are Raw sockets listening on the wrong device.

So, to follow a packet through the system, the NMS creates an SNMP packet (UDP port 161) to send to SW3. He places the destination IP address of 152.133.20.3 into the header, finds the route through the gateway, ARPs to see who responds to 28.1.3.21, and sends the packet to VTSwitch.

The packet arrives at the kernel network layer. Since SW1 has a Raw socket listening for all packets, a copy of this packet is made and handed to SW1. The original packet continues to the IP Transparent Proxy, where the destination port is changed to 1313, and is then sent on to the UdpSink bitbucket. No port unreachable message is sent back to the NMS.
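
The delivery rules described so far can be modelled in a few lines. This is a toy model of the behaviour the text describes, not of the real kernel. The `RawListener` type, the `deliver` function and the device labels "eth" and "lo" are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass

ANY = "0.0.0.0"      # stands in for INADDR_ANY
SINK_PORT = 1313     # port UdpSink listens on, per the text

@dataclass(frozen=True)
class RawListener:
    owner: str       # switch that opened the raw socket
    addr: str        # bound IP address, or ANY
    device: str      # "eth" or "lo" (hypothetical device labels)

LISTENERS = [
    RawListener("SW1", ANY, "eth"),             # gateway: hears everything on ethernet
    RawListener("SW2", "152.133.20.2", "lo"),   # non-gateway: loopback only
    RawListener("SW3", "152.133.20.3", "lo"),
]

def deliver(dst_ip, dst_port, proto, device):
    """Return (raw listeners that get a copy, final destination port)."""
    heard = [l.owner for l in LISTENERS
             if l.device == device and l.addr in (ANY, dst_ip)]
    # Transparent proxy rule: TCP/UDP for 152.X.X.X is redirected to UdpSink.
    if proto in ("tcp", "udp") and dst_ip.startswith("152."):
        dst_port = SINK_PORT
    return heard, dst_port

print(deliver("152.133.20.3", 161, "udp", "eth"))  # (['SW1'], 1313)
```

In this model an SNMP packet for SW3 arriving on the ethernet device is copied only to SW1, while the original ends up at port 1313.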

The copy of the packet is delivered to SW1, who looks in his routing tables (if they were set up correctly by the student), and ships the packet out the correct Lport over the cable to SW2. SW2, if configured properly, passes the packet on to SW3, the destination switch. Mission accomplished. Kinda.

5.1.4 Destination Switch Trickery

When the packet arrives at the destination switch, something needs to be done with it. It still needs to be delivered to the process that needs it. In our example the UDP SNMP packet should be sent to SimAgent. If it was an ICMP Ping packet, or other type of packet, it would have to be handled appropriately.

Consider this. In the real world each switch has its own kernel IP layer. The IP service routines have to emulate this to some degree. There are separate components of the IP service routines to handle ICMP, TCP and UDP packets. ICMP packets are handled directly by IPGlue. If a response is required, it is sent directly. UDP and TCP packets require a little more.

Implementing an entire UDP or TCP layer in IPGlue would have been too time-consuming, so we pull a little trickery and ask the kernel for some help. SimAgent for SW3 is listening for all SNMP UDP packets on a Dot4 private IP address of 192.168.111.3. The NMS sent the packet to SimAgent for SW3 using IP 152.133.20.3. IPGlue masquerades the inbound packet by replacing the destination address with 192.168.111.3, replacing the source IP address with 152.133.20.3, and replacing the source port with a unique index into a masquerade table. It then uses a Raw socket write to send the packet directly to the IP layer in the kernel.

The kernel receives the masqueraded packet and knows to route it to whoever is listening on 192.168.111.3 port 161. SimAgent for SW3 wakes up, receives the UDP SNMP packet and processes it, then sends the response to whoever sent it, in this case 152.133.20.3 port 9001.

IPGlue has the raw listen on 152.133.20.3 bound to the loopback device. It picks up a copy of the outbound packet before the IP Transparent Proxy redirects it to port 1313 and the bitbucket. It takes this copy and using the destination port as an index into the masquerade table, changes the source IP address to 152.133.20.3, the destination IP address to 28.1.3.10, and the destination port to 234. It then uses its internal static routing tables to direct the packet out the cable to SW2.
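
The round trip above can be sketched with a small masquerade table. This is a minimal illustration, not the IPGlue implementation. The `Packet` and `MasqTable` names are hypothetical, and the base port of 9001 is assumed from the example port in the text.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class MasqTable:
    """Toy masquerade table; base_port=9001 is an assumed value."""

    def __init__(self, switch_ip, private_ip, base_port=9001):
        self.switch_ip = switch_ip      # Lucent internal IP of this switch
        self.private_ip = private_ip    # Dot4 private IP SimAgent listens on
        self.base_port = base_port
        self.entries = []               # index -> (original src IP, original src port)

    def inbound(self, pkt):
        """Rewrite a request so the local server replies to IPGlue."""
        self.entries.append((pkt.src_ip, pkt.src_port))
        index_port = self.base_port + len(self.entries) - 1
        return replace(pkt, src_ip=self.switch_ip, src_port=index_port,
                       dst_ip=self.private_ip)

    def outbound(self, pkt):
        """Restore the original requester on the server's reply."""
        orig_ip, orig_port = self.entries[pkt.dst_port - self.base_port]
        return replace(pkt, src_ip=self.switch_ip,
                       dst_ip=orig_ip, dst_port=orig_port)

table = MasqTable("152.133.20.3", "192.168.111.3")
request = Packet("28.1.3.10", 234, "152.133.20.3", 161)   # from the NMS
to_agent = table.inbound(request)                          # what SimAgent sees
reply = Packet(to_agent.dst_ip, 161, to_agent.src_ip, to_agent.src_port)
print(table.outbound(reply))
```

The reply that leaves IPGlue has source 152.133.20.3 and destination 28.1.3.10 port 234, matching the walk-through. A real table would also need to expire or reuse entries, which this sketch ignores.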

5.2 Overview IP Services

IP Service routines follow the same basic logic as the Frame Relay and ATM routines. However, since IP presents the interface to the real world, it relies more on the RawIn() and RawOut() routines. Also, since all packets passed in the sim are rooted in IP, the eventual destination of all internal and external simulation packets (Frame or ATM) is the IP layer.

5.2.1 IP RawIn()

IP RawIn() handles all packets that arrive over the Raw listen socket. On gateway switches, this includes all packets that enter the system over the ethernet network device for any destination. For non-gateway switches, this is any IP packet that has the switch IP as its destination and arrived over the loopback network device.

The packet is read into a local buffer while we decide if we need to deal with it. If we are a gateway switch, this can be fairly complex, as we 'hear' every packet. Basically, if the IP source address matches our ethernet network mask then it is a packet from the NMS or something on the student network (28.1.3.X in most cases). In that case, we see if we have a route to the destination IP address in our static route table. If so, then we handle the packet.

If we are not a gateway switch, we reject all packets that are not TCP or UDP and all packets with a destination port less than our masquerade base port. Basically, the only packets to arrive here should be masqueraded TCP or UDP responses from our SimAgent or telnet server. All other inbound ICMP, TCP and UDP packets should arrive over a cable (a different path).
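
The two acceptance rules can be written as a pair of predicates. This is a sketch under stated assumptions: the function names are hypothetical, the static route table is reduced to a list of networks, and the base port value of 9001 is assumed (the text only says "masquerade base port").

```python
import ipaddress

MASQ_BASE_PORT = 9001   # assumed value for illustration

def gateway_accepts(src_ip, dst_ip, eth_network, static_routes):
    """Gateway rule: source is on our ethernet network and we can route to dst."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src not in ipaddress.ip_network(eth_network):
        return False
    return any(dst in ipaddress.ip_network(net) for net in static_routes)

def non_gateway_accepts(proto, dst_port):
    """Non-gateway rule: keep only masqueraded TCP/UDP replies."""
    return proto in ("tcp", "udp") and dst_port >= MASQ_BASE_PORT

# A packet from the NMS (28.1.3.10) for SW3, seen by the gateway:
print(gateway_accepts("28.1.3.10", "152.133.20.3",
                      "28.1.3.0/24", ["152.133.20.0/24"]))   # True
# A SimAgent reply to masquerade port 9001, seen by a non-gateway switch:
print(non_gateway_accepts("udp", 9001))                      # True
```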

If we decide to keep this packet, then we allocate a DgramHdr buffer for it and encapsulate it in an internal packet header. This header will be kept with the packet for its entire journey through the simulation. This packet is then passed to MainIn().

5.2.2 IP RawOut()

IP RawOut() handles sending packets back out to the kernel IP layer. On gateway switches this could be either sending the packet out to the real world or sending a masqueraded packet to the switch SimAgent process. Non-gateway switches only send masqueraded packets out via RawOut().

RawOut() ensures that the packet is in the correct IP format, recalculates the IP Checksum and sends the packet to the kernel.
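
The checksum step can be illustrated with the standard Internet checksum (RFC 1071) over an IPv4 header. This is a generic sketch of the algorithm, not the RawOut() source. The helper names are hypothetical and the sample header is an arbitrary 20-byte example with its checksum field zeroed.

```python
import struct

def ip_checksum(header: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def refresh_checksum(header: bytes) -> bytes:
    """Zero the checksum field (bytes 10-11), recompute it, and store it."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return zeroed[:10] + struct.pack("!H", ip_checksum(zeroed)) + zeroed[12:]

hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(refresh_checksum(hdr).hex())
# 45000073000040004011b861c0a80001c0a800c7
```

Any change to the header, such as a rewritten address or a decremented TTL, invalidates the old checksum, which is why it is recomputed on the way out. Summing a header that already contains a correct checksum yields zero, which gives a quick self-check.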

5.2.3 IP MainIn()

All packets eventually get here, either from RawIn() or arriving cable packets from another switch. After decrementing the Time To Live (TTL) field, we process the IP Options. The only IP option we support is the IP Record Route option, where the current IP is inserted into a list in the IP header. (This is for ping -R.)

If the destination IP address is our switch IP or gateway IP, we invoke the protocol handler for the type of packet (ICMP, TCP, UDP, OSPF, etc). This will be discussed later.

If the destination IP is not our switch, and the TTL field is non-zero, then we see if we have a static route to the destination. If so, we will eventually invoke IP MainOut(). If not, or if the TTL is zero, then we will format and send an ICMP error message to the IP source address.
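
The MainIn() decision order can be sketched as a single function. This is a simplified model, not the real routine: packets are plain dictionaries, the static route table is reduced to a set of reachable destinations, and the `main_in` name and the action labels are hypothetical.

```python
def main_in(pkt, my_ips, routes):
    """Return (action, packet), following the order described in the text."""
    pkt = dict(pkt, ttl=pkt["ttl"] - 1)            # 1. decrement TTL
    if pkt.get("record_route") is not None:        # 2. IP Record Route option
        pkt["record_route"] = pkt["record_route"] + [my_ips[0]]
    if pkt["dst"] in my_ips:                       # 3. addressed to this switch
        return "protocol_handler", pkt
    if pkt["ttl"] > 0 and pkt["dst"] in routes:    # 4. forward toward MainOut()
        return "main_out", pkt
    return "icmp_error", pkt                       # 5. no route or TTL expired

sw2 = ["152.133.20.2"]
action, out = main_in({"dst": "152.133.20.3", "ttl": 5, "record_route": []},
                      sw2, {"152.133.20.3"})
print(action, out["ttl"], out["record_route"])
# main_out 4 ['152.133.20.2']
```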

5.2.4 MainOut()

MainOut() is simple. If the packet has our IP address as the destination, then it is a masqueraded packet for SimAgent and we invoke RawOut(). If not, we will invoke the static routing routines to send it out the correct cable.

5.2.5 Other Considerations

The static routing routines work closely with the IP layer. These routines decide where the packet is bound and will call IP MainOut() directly for whichever cable is required.

5.3 ICMP Emulation

Just like real switches, the simulated switches need to have a network layer. Since we can't depend on the kernel and masquerading to handle ICMP packets like we do with TCP and UDP, we have to implement a simple ICMP handler. There are many options to ICMP, but we only really care about ICMP Echo, Echo Reply, and a few of the ICMP error messages. Any other type is ignored.

5.3.1 ICMP echo and Response

Ping requests are sent as ICMP Echo type packets. We handle these by reusing the same buffer and switching the source and destination IP addresses. We then reset the ICMP protocol type to ECHO_REPLY, recalculate the checksum, and send the packet back to the IP MainOut() routine which will decide where to send it.
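
The echo handling above can be sketched on raw ICMP bytes. This is an illustration of the technique, not the IPGlue handler. The `echo_reply` and `checksum` names are hypothetical, and the IP addresses are passed separately rather than parsed from an IP header. ICMP type 8 is Echo and type 0 is Echo Reply.

```python
import struct

ICMP_ECHO, ICMP_ECHO_REPLY = 8, 0

def checksum(data: bytes) -> int:
    """Internet checksum (one's-complement sum of 16-bit words)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def echo_reply(src_ip, dst_ip, icmp: bytes):
    """Turn an echo request into a reply: swap addresses, retype, re-checksum."""
    if icmp[0] != ICMP_ECHO:
        raise ValueError("not an ICMP echo request")
    # Same identifier, sequence number and payload; type 0, checksum zeroed.
    body = bytes([ICMP_ECHO_REPLY, icmp[1], 0, 0]) + icmp[4:]
    body = body[:2] + struct.pack("!H", checksum(body)) + body[4:]
    return dst_ip, src_ip, body      # new source, new destination, new message

# Echo request: type 8, code 0, checksum 0, id 1, seq 1, payload "ping".
request = struct.pack("!BBHHH", ICMP_ECHO, 0, 0, 1, 1) + b"ping"
src, dst, reply = echo_reply("28.1.3.10", "152.133.20.3", request)
print(src, dst, reply[0])   # 152.133.20.3 28.1.3.10 0
```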

Echo Replies are a slightly different matter. If we receive an echo reply with our IP as the destination address, it arrived from somewhere inside the simulation, and is a response to a Ping request from the telnet server for this switch. Keep in mind that the telnet server is a separate process from IPGlue. It sends out its Ping echo request with the source and destination addresses 'pre-masqueraded' so that IPGlue will pick them up and route them correctly. This pre-masquerading is discussed in detail later.

The telnet server is listening for Ping echo responses on its Dot4 private IP address 192.168.111.X. The ICMP Echo handler puts this in the destination address and sends the packet to the RawOut() handler.

5.3.2 ICMP Error messages

Handling of ICMP error messages is similar to ICMP Echo responses. All error messages are responses to local requests from the telnet server or other switch component. The destination address is replaced with the Dot4 private switch IP address 192.168.111.X and the packet is sent to the RawOut() handler.

5.4 TCP/UDP Handling and Masquerading

Masquerading is discussed in detail above. In review, a packet arrives with a source and destination IP address, and a source and destination port. Masquerading replaces the destination address with a new destination, the source address with the IP address of the masquerading entity, and the source port with a unique index into a table. The original source IP and port are stored in this table. The packet is then sent on.

Whoever receives the packet will reply to its source IP address and source port, which belong to the masquerading entity. He receives the reply, uses its destination port (the unique index he assigned as the source port) to look up the table entry, and retrieves the original source IP address and port. These are then placed in the packet as the destination and the packet is sent back to the original system.

As you can see, this is inherently a client-server operation. The client request places the correct entries in the masquerading table that the server eventually responds to. The server cannot initiate a request to a masqueraded client unless that client has first requested something from the server and created an entry in the masquerade table. In most cases this meets the needs of the simulation. There are a couple of exceptions, of course.

5.4.1 Snmp Traps

Traps are an asynchronous event, not a response to a client request. Fortunately, a switch will not issue a trap before the NMS (SNMP client) has requested something from the switch (SNMP server). Therefore, an entry already exists in the masquerade table.

It is important that trap packets pass through IPGlue. Apart from keeping statistics and routes correct, if the trap is sent directly from SimAgent, the kernel will put some unpredictable IP address as the source IP in the packet (one of the many IP addresses that exist on the system, not necessarily the one for the switch that initiated the trap). The NMS needs the source IP address to be the correct one for the switch that initiated the trap. IPGlue handles this.

5.4.2 Pre-Masqueraded Ping and Traceroute Packets

The telnet server can send Ping and TraceRoute packets to any switch in the simulation. However, if it simply puts the IP address of the target switch as the destination of the packet, the kernel would deliver the packet directly to the destination switch, bypassing all the cables and routes that the packet should have taken to make the simulation real.

To fix this, the telnet server knows to pre-masquerade the packet. For example, the telnet server sends a Ping to 152.133.20.3. He will put his own Lucent internal IP address 152.133.20.1 in the destination IP address and the real destination address in the source IP address and send the packet to the kernel. IPGlue hears the packet on the raw listen socket and flips the source and destination addresses. The kernel flushes the original packet to UdpSink.
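
The address swap can be shown in a few lines. This is a toy sketch with packets reduced to source and destination fields. The `pre_masquerade` and `ipglue_flip` names are hypothetical.

```python
def pre_masquerade(own_ip, target_ip):
    """Telnet-server side: the real target goes in the source field."""
    return {"src": target_ip, "dst": own_ip}

def ipglue_flip(pkt):
    """IPGlue side: swap the fields so the packet is routed normally."""
    return {"src": pkt["dst"], "dst": pkt["src"]}

sent = pre_masquerade("152.133.20.1", "152.133.20.3")
print(sent)               # {'src': '152.133.20.3', 'dst': '152.133.20.1'}
print(ipglue_flip(sent))  # {'src': '152.133.20.1', 'dst': '152.133.20.3'}
```

Because the packet handed to the kernel is addressed to the sending switch itself, IPGlue on that switch hears it on its own raw listen socket. Only after the flip does the packet carry the real target as its destination.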

When the target switch responds, it does so to the 152.133.20.1 address. All intermediate switches know how to route the packet on this address. When the reply arrives at IPGlue on the switch that originated the Ping (the destination of the reply), he replaces the destination address with the Dot4 private IP address 192.168.111.1 and sends it out.

5.5 Static Routing

Static routing may need to be revisited eventually. The current implementation will suffice, as non-IP packets arrive over FRAD or ATM routers and those RawIn()s. The handling of routing needs to be separated further from the protocol requesting the route.

5.6 Discussion

Is all of this really necessary? Yup. This simulation deals with a mixture of real world packets from the NMS and student Xterms (SNMP, Ping, and traceroute), as well as internal simulated packets from the telnet server. All packets are required to follow the correct paths, either directly to gateway switches, or over cables, trunks and circuits to other switches.

Any number of variables can affect the path a packet takes: congestion, bad trunks on intermediate switches, misconfigured physical ports, unknown routes and incorrectly provisioned circuits, to name just a few. However, the path chosen for the packet, as well as the time it takes, has to be correct for the simulation to be useful. This applies equally to internally generated packets from the telnet server and externally generated packets from the NMS or student Xterm.

Not only do the paths chosen have to be correct for traceroute and ping -R, but the statistics on the destination switch, as well as all intermediate switches, must be accurate. Statistics generated by each switch are used by the student to determine if he or she has correctly provisioned the switch and the network. Without accurate network statistics, the simulation is not useful.