"characterized by the belief that the parts of
something are intimately interconnected and explicable only by reference to the
whole."
It's crucial to understand all the components that make up a trading system, as each has its own latency characteristics.
| Layer | Sample Factor | Mitigation | Effort |
|---|---|---|---|
| Software Program | Design | Key design patterns / extensible framework / efficient code | Hard |
| Language | Inherent: GC, JIT | Compile-time args / run-time args; object pooling; warmup code | Easy / Medium / High |
| Operating System | Scheduler; hard page fault; network stack | Thread affinity / core spinning; appropriate memory in server, process sizing; kernel bypass drivers and tuning, socket spinning | Easy / Easy / Easy |
| Hardware | CPU; memory; network | Disable H/T, get fastest CPU, overclock; buy memory with lowest latency and ensure enough; buy Solarflare NIC | Easy / Easy / Easy |
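To make the object-pooling mitigation in the Language row concrete, here is a minimal sketch. `ObjectPool` is a hypothetical name, and it is deliberately single-threaded: the point is simply that reusing pre-allocated instances on the hot path gives the GC nothing to collect.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Minimal object pool: reuse pre-allocated instances on the hot path so
// the GC has nothing to collect. Single-threaded by design; an
// illustration, not a production implementation.
final class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    ObjectPool(int size, Supplier<T> factory) {
        this.factory = factory;
        for (int i = 0; i < size; i++) {
            free.push(factory.get());
        }
    }

    T acquire() {
        // Fall back to allocation only if the pool is exhausted.
        T t = free.poll();
        return t != null ? t : factory.get();
    }

    void release(T t) {
        free.push(t);   // caller must reset the object's state first
    }
}
```

In a real hot path the pooled type would be the tick/event object itself, sized so the pool never falls back to allocation in steady state.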
I have built four rack servers with a box full of Chelsio, Mellanox and Solarflare NICs. By far the easiest to install, easiest to tune and best performing was Solarflare. I was really disappointed in the Mellanox cards. Solarflare OpenOnload provides one-sided acceleration suitable for colocation purposes, at no extra cost. This was several years ago, so maybe Mellanox have their own one-sided acceleration now, but for me it's come too late. I have preached Solarflare NICs to everyone I know.
Consider the following scenario:

1. Read the next packet from the socket
2. Decode the market data tick into an exchange-normalised event
3. Log the event
4. Place the event into a queue for async consumption
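The steps above can be sketched as a single receive loop. The packet source, decoder and event type here are hypothetical stand-ins; the point is the shape of the path, not a specific API.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

// Sketch of the receive path described above: decode, log, hand off.
final class ReceiveLoopSketch {
    record Tick(long instrumentId, double price) {}       // normalised event

    private final Queue<Tick> outbound = new ArrayBlockingQueue<>(1024);

    void onPacket(byte[] packet) {
        Tick tick = decode(packet);      // steps 1-2: read + decode
        log(tick);                       // step 3: log the event
        if (!outbound.offer(tick)) {     // step 4: hand off for async consumption
            // Bounded queue is full: this branch is itself a latency decision.
            throw new IllegalStateException("outbound queue full");
        }
    }

    int pending() { return outbound.size(); }

    private Tick decode(byte[] packet) {
        // Placeholder decode: real systems parse the exchange wire format here.
        return new Tick(packet.length, 0.0);
    }

    private void log(Tick tick) {
        // In a real hot path this would be an async, allocation-free logger.
    }
}
```

Every one of these steps has latency characteristics of its own, which is exactly why optimising only the queue is misguided.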
How much benefit will the end system see from saving 20 nanoseconds by switching the queue from a ConcurrentLinkedQueue to a RingBuffer (e.g. Disruptor)? Will it be twice as quick? No. What about how you read the packet off the socket? What about the log event? What about the queue size characteristics?
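For readers unfamiliar with what the swap actually means: a toy single-producer/single-consumer ring buffer is sketched below. A pre-allocated array plus sequence counters replaces linked nodes and their allocation. This is an illustration only, not the Disruptor, and it omits the cache-line padding a real implementation needs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy SPSC ring buffer: a fixed array indexed by wrapping sequence
// counters, so offering and polling allocate nothing.
final class SpscRingBuffer<T> {
    private final Object[] slots;
    private final int mask;                           // capacity must be a power of two
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    SpscRingBuffer(int capacityPowerOfTwo) {
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    boolean offer(T value) {
        long t = tail.get();
        if (t - head.get() == slots.length) return false; // full
        slots[(int) (t & mask)] = value;
        tail.lazySet(t + 1);
        return true;
    }

    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) return null;                 // empty
        T value = (T) slots[(int) (h & mask)];
        slots[(int) (h & mask)] = null;                   // drop the reference
        head.lazySet(h + 1);
        return value;
    }
}
```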
I have seen FX systems with man-years of effort put into latency optimisation when they didn't even use OpenOnload for their Solarflare cards! Which has higher risk: using OpenOnload, or the code a 10-man team has written over 3 years?
You must understand the key use cases for your system where latency is important, then create end-to-end repeatable bench tests in a fully controlled environment that is reflective of the production environment. I suggest wire-to-wire timings with PTP Solarflare NICs. Alternatively, use two servers (one simulation, one trading) with dual Solarflare NICs; in this case you don't need PTP (I will cover this in a later blog with the JNI code I put together).
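Short of wire-to-wire hardware timestamps, the reporting side of a repeatable end-to-end run might look like the sketch below: record one nanosecond sample per operation, then read off percentiles. `LatencyRecorder` is a hypothetical name, and software timestamps carry measurement error that NIC timestamps, as recommended above, avoid.

```java
import java.util.Arrays;

// Collect one latency sample per end-to-end operation across a bench
// run, then report percentiles rather than averages.
final class LatencyRecorder {
    private final long[] samples;
    private int count;

    LatencyRecorder(int capacity) { samples = new long[capacity]; }

    void record(long startNanos, long endNanos) {
        if (count < samples.length) samples[count++] = endNanos - startNanos;
    }

    long percentile(double p) {
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p / 100.0 * count) - 1;
        return sorted[Math.max(idx, 0)];
    }
}
```

Percentiles matter because latency distributions in trading systems are long-tailed; the 99th percentile, not the mean, is usually what hurts.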
Micro-benchmarks must be used with care. They can give good comparative performance against other implementations, but they won't necessarily produce production system gains, given all the variables in play. For example, in the above scenario, consider what happens when a fixed-size queue fills up. More on bench testing another day.