Five Nanos and Fibre Taps
23rd November, 2020
Here at Options, when we say we are “on a mission to transform the financial sector”, we mean it. In recent times, the leading technology transforming the sector is Layer 1 switching, also known as multiplexers, FPGA switches or port replicators. These devices endeavour to replicate digital signals from one port to another on a single device. Additionally, they may also apply some form of logic to that signal, whether it be buffering, multiplexing, filtering or even timestamping. Over the last couple of years, we have seen several mainstream network vendors release their own Layer 1 offerings. These were aimed at the ultra-low latency trading space to provide the function of distributing market data or multiplexing order entry sessions.
Cutting through the noise of the Layer 1 offerings out there, it comes down to two primary offerings across the various vendors: Layer 1, with a 1:1 or 1:N replication of the bitstream at an electronic level with advertised single nanosecond latency; and Layer 1 with an FPGA. The FPGA supplements the Layer 1 technology and provides additional features such as N:1 packet forwarding, packet filtering or timestamping with typically less latency than the ASIC-based switches on the market. In short, Layer 1 is blindly pushing bits around, and Layer 1 with FPGA is using intelligence to determine how to forward or manipulate packets.
So why is Layer 1 technology transformative for the financial services sector? It all comes down to latency: the less, the better. These devices all have one thing in common: they operate mostly at Layer 1 of the OSI networking model, which means less of a packet needs to be inspected before it is forwarded. Much like the “cut-through” switching method, this results in less time spent moving a packet between two ports on a device.
Over the last year, Options has been working with different Layer 1 offerings. In addition to giving us a thorough, independent understanding of the technology, this allows us to advise clients from a vendor and technology agnostic viewpoint and bring to market the latest and greatest in Layer 1 solutions for our client base. As of today, we’ve been able to develop two specific client solution models: dedicated and shared. The shared model is a Layer 1 extension of our current low latency shared platform, allowing customers to leverage Layer 1 latency benefits at a reduced cost, while the dedicated model provides the ultimate Layer 1 non-filtered, one-hop connectivity solution.
The critical question, though, is: how do you prove latency? How do you know that you are getting what you are paying for? As a Managed Service Provider, we understand that transparency is key. Quoting vendor documentation statistics alone is not a responsible method when it comes to assessing something that claims to be “the best of the best”. Having the ability to prove to our clients exactly how much latency a device introduces to a specific packet flow is imperative. This area of trading technology is a little ambiguous, and there is a multitude of often contradictory information available, meaning that navigating these murky waters to discover the ‘real’ unbiased facts is no mean feat.
So, to counter this challenge, we’ve set out on a new mission: to build a test bench in our labs that can accurately measure the amount of latency any device or component introduces to a single packet flow. Interestingly, in doing so, we also managed to prove something else.
For several years now, we have been delivering low latency network solutions for our clients. Traditionally, we would look at reducing latency by minimising the length of fibre between a client server and a Top of Rack switch, minimising the length of fibre between switch hops, reducing the number of switch hops or even trying to be physically closer to an Exchange/Broker end system or patch entry point. Putting the latency of a switch aside, the longer a piece of fibre anywhere along the packet forwarding path, the higher the latency. According to vendors, technical writings and common wisdom, five nanoseconds per meter of fibre is a figure widely believed to hold water. Given where we are today, the ‘common wisdom’ just won’t do.
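The five-nanosecond figure can be sanity-checked from first principles, since light in glass travels at the vacuum speed of light divided by the fibre's refractive index. The sketch below assumes a group index of 1.468, a typical published value for silica fibre rather than anything measured on our bench; the real value varies with fibre type and wavelength. Under that assumption the delay works out at a little under five nanoseconds per meter.

```python
# Rough propagation delay of light in optical fibre.
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact, by definition)
ASSUMED_GROUP_INDEX = 1.468     # assumption: typical silica fibre, varies by type


def fibre_delay_ns_per_meter(group_index: float = ASSUMED_GROUP_INDEX) -> float:
    """Return the one-way propagation delay per meter of fibre, in nanoseconds."""
    if group_index < 1.0:
        raise ValueError("group index cannot be below 1.0")
    # Time for one meter = index / c, converted from seconds to nanoseconds.
    return group_index / C_VACUUM_M_PER_S * 1e9


if __name__ == "__main__":
    print(f"{fibre_delay_ns_per_meter():.2f} ns per meter")
```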
As we began to dig deeper, we settled on a method: derive a Baseline figure for the test bench itself, introduce the device or component being tested to the test bench, and then subtract the first figure from the second to give an accurate calculation of the latency introduced. Now, as far as lab environments go, there is always some form of trial and error involved in getting everything calibrated. Initially, the Baseline results we observed were very odd: packets were missing timestamps, delta times were zero or even negative, and several packet errors were present. The first step of trial and error was to introduce a 30-meter reel of fibre into the path to see if we could remove the negative delta values. In fact, it did nothing of the sort and just increased the resulting latency. After another round of calibration, we noticed that a script we used to post-process the packet captures needed further tweaking. Eventually, we reached a point where the resulting captures were providing clear, clean and error-free data which we could use to calculate latency.
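The kind of sanity check that exposes the symptoms described above (missing timestamps, zero or negative deltas) can be sketched as follows. This is not our post-processing script: the `Sample` layout and the `clean_deltas` helper are hypothetical names chosen for illustration, and the timestamps in the usage example are made-up values, not lab measurements.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    """One packet seen at both taps; timestamps in nanoseconds (hypothetical layout)."""
    tap_a_ns: Optional[int]
    tap_b_ns: Optional[int]


def clean_deltas(samples: list[Sample]) -> tuple[list[int], int]:
    """Return (valid deltas in ns, number of rejected samples).

    A sample is rejected if either timestamp is missing, or if the delta is
    zero or negative, since a packet cannot reach Tap B before (or at the
    same instant as) it passes Tap A.
    """
    deltas: list[int] = []
    rejected = 0
    for sample in samples:
        if sample.tap_a_ns is None or sample.tap_b_ns is None:
            rejected += 1
            continue
        delta = sample.tap_b_ns - sample.tap_a_ns
        if delta <= 0:
            rejected += 1
            continue
        deltas.append(delta)
    return deltas, rejected


if __name__ == "__main__":
    # Illustrative values only: two good samples and three suspect ones.
    capture = [
        Sample(1000, 1172),
        Sample(2000, None),   # missing timestamp
        Sample(3000, 3000),   # zero delta
        Sample(4000, 3990),   # negative delta
        Sample(5000, 5172),
    ]
    deltas, rejected = clean_deltas(capture)
    print(deltas, rejected)
```

On a calibrated bench the reject count should be zero. A non-zero count is a prompt to revisit the bench or the script, not something to filter away silently.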
But wait! We still had the 30-meter reel of fibre in place, and the test bench Baseline was coming out at 172 nanoseconds. We knew that we had to remove this reel and re-calibrate the Baseline figure, but before doing so, we wanted to check something else. If the test bench was benchmarking at 172 nanoseconds with the reel in place, then by removing the reel, deriving a new Baseline and subtracting one value from the other, we could accurately calculate the latency introduced by the 30-meter reel of fibre. We then removed the fibre reel and re-calibrated the test bench Baseline at 22 nanoseconds.
So, if we subtract the two figures, 172 minus 22, we are left with 150 nanoseconds. That is 150 nanoseconds of latency introduced by 30 meters of fibre. Furthermore, if we divide 150 by 30, we are left with the magic number five. That is five nanoseconds of latency introduced by every meter of fibre! And there you have it – we just proved the five-nanosecond rule!
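The same arithmetic, written out as a small helper so it can be reused for other reel lengths. The function name is ours for illustration; the inputs are the two Baseline figures and the reel length quoted above.

```python
def per_meter_latency_ns(with_reel_ns: float, without_reel_ns: float,
                         reel_length_m: float) -> float:
    """Latency per meter of fibre, from two Baselines that differ only by the reel."""
    if reel_length_m <= 0:
        raise ValueError("reel length must be positive")
    return (with_reel_ns - without_reel_ns) / reel_length_m


if __name__ == "__main__":
    # 172 ns with the 30-meter reel, 22 ns without it.
    print(per_meter_latency_ns(172, 22, 30))  # prints 5.0
```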
The test bench itself utilises a couple of fibre tap devices; these are the same devices we use today across the global platform, mostly positioned to serve us with telemetry data. We know the amount of latency that they introduce into the test bench is irrelevant, because we compare everything against our Baseline figure, but it did make us ask the question: how much latency does a tap device introduce? See the results below:
| Test Description | Ping Payload Size | Tap A Timestamp | Tap B Timestamp | Latency (ns) |
| 10M Single Fibre Baseline | Native (48 Bytes) | 743559086 | 743559138 | 52 |
| IXIA TAP – TPX-10-SR-50-50 | Native (48 Bytes) | 449410338 | 449410393 | 3 |
The results are clear: with the tap inline, the total latency of the test bench (the Tap B timestamp minus the Tap A timestamp) increased to 55 nanoseconds for both packet sizes, and the Baseline figure calibrated at the time was 52 nanoseconds, resulting in an overall latency introduction of 3 nanoseconds by the tap.
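For anyone who wants to retrace the numbers, the calculation from the raw timestamps in the table above can be sketched like this. It assumes the timestamps are in nanoseconds, which is consistent with the Latency column; the helper name is hypothetical.

```python
def path_latency_ns(tap_a_timestamp: int, tap_b_timestamp: int) -> int:
    """Latency between the two taps, assuming nanosecond timestamps."""
    return tap_b_timestamp - tap_a_timestamp


# Timestamps taken from the table above.
baseline_ns = path_latency_ns(743559086, 743559138)   # bench on its own
with_tap_ns = path_latency_ns(449410338, 449410393)   # bench plus the tap under test
tap_ns = with_tap_ns - baseline_ns                     # latency added by the tap

if __name__ == "__main__":
    print(baseline_ns, with_tap_ns, tap_ns)  # prints 52 55 3
```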
It is also worth noting that when these taps are used in a normal environment (i.e. not a test bench), a single fibre patch is also required to get them inline, and that fibre will also introduce latency. How much latency? Five nanoseconds for every meter!
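Putting the two measurements together gives a rough estimate of what an inline tap costs in production. The sketch below uses the 3 nanoseconds measured for the tap and the five nanoseconds per meter measured for fibre as defaults; the 2-meter patch length in the example is a hypothetical value, not one from our platform.

```python
TAP_LATENCY_NS = 3.0        # measured on the test bench for the tap
FIBRE_NS_PER_METER = 5.0    # measured with the 30-meter reel


def inline_tap_cost_ns(patch_length_m: float,
                       tap_ns: float = TAP_LATENCY_NS,
                       fibre_ns_per_meter: float = FIBRE_NS_PER_METER) -> float:
    """Estimated latency added by a tap plus the fibre patch needed to insert it."""
    if patch_length_m < 0:
        raise ValueError("patch length cannot be negative")
    return tap_ns + fibre_ns_per_meter * patch_length_m


if __name__ == "__main__":
    # Hypothetical 2-meter patch: 3 ns for the tap plus 10 ns for the fibre.
    print(inline_tap_cost_ns(2.0))  # prints 13.0
```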
So with a calibrated, stable and accurate test bench to measure latency and the ability to dissect the latency introduced by any component in the packet forwarding path, we’re all set to begin our journey reviewing the performance, nuances and key metrics of the top network vendor devices in this space.
Stay tuned for part two and further discussions around what we learned and our resulting whitepapers.
– Josh Patel, Network Engineer, Options