I have been looking at the P4 language for a while, and I find it really interesting how programmable a networking chip becomes with the flexibility that a language such as P4 provides. When I watched the AWS presentation “A Day in the Life of a Billion Packets”, which describes how AWS solves multi-tenant networking in its Virtual Private Cloud (VPC), I thought it would be interesting to try implementing the same concept with P4.

If you watch the presentation, you will see that Amazon ran into limitations in the routers that can be purchased from typical networking vendors, and needed to solve this “available box” problem in order to architect a solution that scales to the size of AWS. They do not describe all the technical details, but they provide enough information to understand the limitations of common networking deployments, the main building blocks, and where the problems they were trying to solve appeared.

The AWS VPC networking solution allows customers to have one or more networks in any private IP range, so different customers can use the same IP range while their networks remain isolated from each other. For example, two customers can both have a 192.168.0.0/24 network, and the two networks are independent of each other. There are many other features around the AWS networking solution, such as firewall rules and connecting private data centres.

Let’s go back to P4 now. I wanted to take the AWS VPC networking use case and write it in P4. As soon as a packet is generated by a customer resource, such as a virtual machine, and sent to the network, the solution has to identify the customer that created the packet. If the packet goes from a valid source to a valid destination for a valid customer, then it should be transmitted through the network.

I created my own L2 protocol (a new ethernet type, 0x0777) to carry the customer identifier, source switch, destination switch, source address and destination address. This is one of the great things about P4: you are not limited to existing protocols and can create your own. This also ties back to the ASIC designers who are developing PISA chips. This next generation of silicon promises to be free of the old pipeline limitations, which come from trying to design marketable chips with enough flexibility while reserving, and sometimes hard-coding, silicon for fixed protocol data structures that switch L1-L4 packets with only limited field reprogrammability. This does not mean that current standard protocols are not useful; it just means that we can use the standard ones, which are very useful, and custom ones when required.

All of my source code can be found in the following GitHub repository: https://github.com/joncastro/p4vpc.

This is how my new protocol header looks in P4.

header_type vpc_t {
  fields {
    srcSw: 48;
    dstSw: 48;
    customer: 32;
    srcAddr: 32;
    dstAddr: 32;
    etherType : 16;
  }
}
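
To make the header usable, it also has to be instantiated and reached from the parser. The following is a minimal P4_14 sketch of how that could look, assuming the vpc_t type above. The ethernet_t layout, the ETHERTYPE_VPC constant and the parser state names are assumptions for illustration and may differ from the code in the repository.

```p4
// Sketch only: everything except vpc_t and its fields is an assumption.
#define ETHERTYPE_VPC 0x0777

header_type ethernet_t {
  fields {
    dstAddr : 48;
    srcAddr : 48;
    etherType : 16;
  }
}

header ethernet_t ethernet;
header vpc_t vpc;  // instance of the header type defined above

parser start {
  return parse_ethernet;
}

parser parse_ethernet {
  extract(ethernet);
  return select(latest.etherType) {
    ETHERTYPE_VPC : parse_vpc;  // encapsulated tenant traffic
    default : ingress;
  }
}

parser parse_vpc {
  extract(vpc);
  // vpc.etherType carries the original ethertype, so a fuller parser
  // would select on it here to reach the inner IPv4 or ARP header.
  return ingress;
}
```

The six fields add up to 208 bits, so the encapsulation adds 26 bytes to every packet that crosses the network.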

Now that we have a mechanism to send packets through our network for multiple tenants, let’s resolve basic IP networking.

First, we need to know who the customer is for each packet that enters our network. We know where the customer resources are located (generally by their IP and MAC address). The following table matches the packet by source IP and MAC address on the switch and saves the customer ID in the ingress metadata. The ingress metadata acts as temporary context information while we process the packet.

action set_vpc_customer(customer) {
  modify_field(ingress_metadata.customer, customer);
}

table vpc_customer {
  reads {
      ethernet.srcAddr : exact;
      ingress_metadata.srcAddr : exact;
  }
  actions {
    _drop;
    set_vpc_customer;
  }
  size : 1024;
}
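
The table above reads ingress_metadata.srcAddr and lists a _drop action, neither of which is shown in this post. A minimal P4_14 sketch of how they could be declared follows, assuming the parser copies the IP addresses into the metadata. The ipv4_t layout is the standard IPv4 header; the type name ingress_metadata_t, the parser state parse_ipv4 and the body of _drop are assumptions for illustration and may differ from the repository.

```p4
// Sketch only: names and bodies below are assumptions, not the repository's code.
header_type ingress_metadata_t {
  fields {
    customer : 32;  // written by set_vpc_customer
    srcAddr : 32;   // source IP of the packet being processed
    dstAddr : 32;   // destination IP of the packet being processed
  }
}

metadata ingress_metadata_t ingress_metadata;

header_type ipv4_t {
  fields {
    version : 4;
    ihl : 4;
    diffserv : 8;
    totalLen : 16;
    identification : 16;
    flags : 3;
    fragOffset : 13;
    ttl : 8;
    protocol : 8;
    hdrChecksum : 16;
    srcAddr : 32;
    dstAddr : 32;
  }
}

header ipv4_t ipv4;

// Assumed to be reached from the ethernet parser state on ethertype 0x0800.
parser parse_ipv4 {
  extract(ipv4);
  set_metadata(ingress_metadata.srcAddr, latest.srcAddr);
  set_metadata(ingress_metadata.dstAddr, latest.dstAddr);
  return ingress;
}

// Packets from an unknown source are discarded.
action _drop() {
  drop();
}
```

With this layout, a packet whose source MAC and IP pair has no entry in vpc_customer keeps a customer value of 0, which is consistent with the "customer > 0" check used later in the ingress control.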

 

When a machine wants to send a packet to another machine using IP, it must first obtain the MAC address of the destination machine with an ARP request. In our solution, we do not want to broadcast the ARP request to the network, because we control which resources (virtual machines) are connected and who owns each resource. Besides, there may be security concerns that call for avoiding broadcasts of MAC addresses.

In the following example, the machine with the IP address 10.0.0.2 belongs to the red customer and wants to talk with the machine with the IP address 10.0.0.4.

The switch “P4 A” catches the ARP request, validates that the virtual machine is asking for a valid IP that belongs to its network, transforms the packet into an ARP reply with the proper information, such as the MAC address of machine 10.0.0.4, and finally sends the packet back.

The P4 code looks like the following. We define a table that allows the 10.0.0.0/24 network and associates the MAC address of 10.0.0.4 with the red customer.

table arp_reply {
  reads {
      ingress_metadata.customer : exact;
      ingress_metadata.srcAddr : lpm;
      ingress_metadata.dstAddr : exact;
  }
  actions {
    _drop;
    set_arp_reply;
  }
  size : 1024;
}
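
The set_arp_reply action is not shown here, so the following is a hedged P4_14 sketch of how a request could be rewritten into a reply. It assumes an arp header instance with the standard Ethernet/IPv4 ARP layout, an ethernet.dstAddr field, and that ingress_metadata.srcAddr and ingress_metadata.dstAddr hold the sender and target IPs of the request. The field names and the single mac parameter are assumptions for illustration.

```p4
// Sketch only: the arp_t field names and the action body are assumptions.
#define ARP_OPCODE_REPLY 2

header_type arp_t {
  fields {
    hwType : 16;
    protoType : 16;
    hwAddrLen : 8;
    protoAddrLen : 8;
    opcode : 16;
    hwSrcAddr : 48;
    protoSrcAddr : 32;
    hwDstAddr : 48;
    protoDstAddr : 32;
  }
}

header arp_t arp;

// mac is the address of the machine being asked for (10.0.0.4 in the example).
action set_arp_reply(mac) {
  // Ethernet: send the frame back to the requester, sourced from the answered MAC.
  modify_field(ethernet.dstAddr, ethernet.srcAddr);
  modify_field(ethernet.srcAddr, mac);

  // ARP: turn the request into a reply and swap sender and target.
  modify_field(arp.opcode, ARP_OPCODE_REPLY);
  modify_field(arp.hwDstAddr, arp.hwSrcAddr);
  modify_field(arp.hwSrcAddr, mac);
  modify_field(arp.protoDstAddr, ingress_metadata.srcAddr);
  modify_field(arp.protoSrcAddr, ingress_metadata.dstAddr);

  // Send the reply out of the port the request arrived on.
  modify_field(standard_metadata.egress_spec, standard_metadata.ingress_port);
}
```

Taking the IP addresses from the metadata rather than from the ARP header itself avoids depending on the order in which the two protocol address fields are overwritten.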

Now the virtual machine 10.0.0.2 knows the MAC address of 10.0.0.4 and is able to send an IP packet. Let’s try an ICMP request (ping).

The switch “P4 A” receives the packet and detects that the red customer machine 10.0.0.2 wants to send a packet to the red customer machine 10.0.0.4. “P4 A” knows that the 10.0.0.4 machine is connected to “P4 C”, so it encapsulates the traffic with our new protocol (ethernet type 0x0777). The switch in the middle transports the packet to “P4 C”, and “P4 C” then removes the encapsulation header and delivers the packet to 10.0.0.4.

In this case there is more code involved, because we have three different roles: the switch that receives and encapsulates the packet, the switches that transport the packet to the destination switch, and the destination switch that removes the encapsulation and delivers the packet to the machine.

For each of these roles we have tables that define the required behaviour. For example, the following code applies three tables when an IPv4 ethernet type is detected: one table to encapsulate the packet, another to set the source switch, and a third to set the destination switch. This logic is executed on the ingress switch (“P4 A” in this example).

} else if (ethernet.etherType == ETHERTYPE_IPV4) {
  if (ingress_metadata.customer > 0) {
    apply(encapsulate_vpc);
    apply(vpc_sw_id);
    apply(vpc_dst);
  }
}
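
The three tables applied above are not listed in this post. As a rough P4_14 sketch of what they could contain, assuming the vpc header instance and the ingress metadata fields used earlier: the action names, the match keys of vpc_dst and the use of tables without a reads block (which always run their default action) are assumptions for illustration, not necessarily how the repository does it.

```p4
// Sketch only: action names and match keys are assumptions.
#define ETHERTYPE_VPC 0x0777

// Insert the vpc header and remember the original ethertype inside it.
action set_vpc_header() {
  add_header(vpc);
  modify_field(vpc.customer, ingress_metadata.customer);
  modify_field(vpc.srcAddr, ingress_metadata.srcAddr);
  modify_field(vpc.dstAddr, ingress_metadata.dstAddr);
  modify_field(vpc.etherType, ethernet.etherType);
  modify_field(ethernet.etherType, ETHERTYPE_VPC);
}

table encapsulate_vpc {
  actions {
    set_vpc_header;  // installed as the default action at runtime
  }
  size : 1;
}

// Stamp the identifier of the switch that encapsulated the packet.
action set_vpc_src_sw(sw) {
  modify_field(vpc.srcSw, sw);
}

table vpc_sw_id {
  actions {
    set_vpc_src_sw;  // default action carries this switch's own id
  }
  size : 1;
}

// Pick the switch that hosts the destination and the port leading to it.
action set_vpc_dst_sw(sw, port) {
  modify_field(vpc.dstSw, sw);
  modify_field(standard_metadata.egress_spec, port);
}

table vpc_dst {
  reads {
    ingress_metadata.customer : exact;
    ingress_metadata.dstAddr : lpm;
  }
  actions {
    _drop;
    set_vpc_dst_sw;
  }
  size : 1024;
}
```

Because the tables are applied in sequence, the header added by the first table is already valid when the second and third tables fill in the switch identifiers.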

 

The intermediate switch just forwards the packet using a table named “routing_vpc”, whose action “route_vpc” sets the output port that keeps the packet moving towards the destination switch. The intermediate switch does not care about the content of the customer packet; it only looks at the destination switch field of our new protocol header.

action route_vpc(port) {
  modify_field(standard_metadata.egress_spec, port);
}

table routing_vpc {
  reads {
      vpc.dstSw : exact;
  }
  actions {
      route_vpc;
  }
  size : 1024;
}

Finally, the egress switch (“P4 C” in this example) uses a table named “deliver_vpc”, whose action “pop_route_vpc” removes the encapsulation and delivers the packet to the destination machine.

action pop_route_vpc(port) {
  modify_field(standard_metadata.egress_spec, port);
  modify_field(ethernet.etherType, vpc.etherType);
  remove_header(vpc);
}

table deliver_vpc {
  reads {
      vpc.dstSw : exact;
      vpc.customer : exact;
      vpc.dstAddr : lpm;
  }
  actions {
      pop_route_vpc;
  }
  size : 1024;
}
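
One way the transit and egress roles could be combined in a single program is sketched below in P4_14. It assumes every switch runs the same code and that deliver_vpc only holds entries for machines attached to the local switch, so a miss means the packet is still in transit. The control name process_vpc is hypothetical, and the repository may structure this differently.

```p4
// Sketch only: process_vpc is a hypothetical control function.
control process_vpc {
  if (valid(vpc)) {
    apply(deliver_vpc) {
      miss {
        // Not addressed to a machine on this switch: keep forwarding
        // towards vpc.dstSw.
        apply(routing_vpc);
      }
    }
  }
}
```

Such a control function would be invoked from the ingress control with a process_vpc(); statement.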

The demo covers more scenarios. For example, what happens when two machines belonging to different subnets want to talk to each other? They need a gateway in between. This solution lets both machines talk to each other through a “non-existing” gateway: the virtual machines think they are sending packets through a gateway, but the gateway is emulated by P4.

For example, when 10.0.0.2 wants to send a packet to the IP 192.168.0.2, it needs to use the gateway 10.0.0.1, which does not really exist in the network. So the switch responds with a fictitious MAC address when 10.0.0.2 asks for the MAC address of the 10.0.0.1 gateway.

Then machine 10.0.0.2 can send an IP packet (an ICMP request in this example) through the gateway to machine 192.168.0.2. Notice that the destination MAC address of the packet is the one belonging to the gateway.

This packet is encapsulated and sent to the destination switch using the same mechanism as before.

However, the destination switch overwrites the source and destination MAC addresses of the original packet, so that machine 192.168.0.2 thinks the packet is coming from 10.0.0.2 through the gateway 192.168.0.1.
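
The MAC rewrite at the destination switch could be expressed as a variant of the pop_route_vpc action shown earlier. This is only a P4_14 sketch: the action name pop_route_vpc_gw and its gw_mac and dst_mac parameters are hypothetical, and it assumes the ethernet header has a dstAddr field.

```p4
// Sketch only: hypothetical variant of pop_route_vpc for routed traffic.
// gw_mac  : fictitious MAC of the emulated gateway on the destination subnet
// dst_mac : real MAC of the destination machine
action pop_route_vpc_gw(port, gw_mac, dst_mac) {
  modify_field(standard_metadata.egress_spec, port);
  modify_field(ethernet.srcAddr, gw_mac);
  modify_field(ethernet.dstAddr, dst_mac);
  // Restore the original ethertype before dropping the encapsulation.
  modify_field(ethernet.etherType, vpc.etherType);
  remove_header(vpc);
}
```

An entry in deliver_vpc for the 192.168.0.0/24 destinations would then supply the MAC of the emulated 192.168.0.1 gateway as gw_mac and the MAC of 192.168.0.2 as dst_mac.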

 

In conclusion, P4 shows that a switch’s behaviour can be modified easily and that new protocols can be implemented using a common language. The P4 language is quite flexible in the context of networking and provides enough primitives to manipulate and forward packets through a network.

 

 

 
