Sunday, 19 April 2015

Which Language is Best for Ultra Low Latency

A highly subjective and emotive subject. For me it’s Java.

A C++ expert who is not also an expert in Java has probably either dismissed Java or only played with it for a year or two. They simply don’t have the experience to compare fairly. I know, because that used to be me!

"Just because you don’t know how doesn’t mean it’s not possible."

A Java program can be 100 times faster than a functionally similar C++ program … fact.
The key is not the language: C, C++ and Java all end up as machine code. The biggest factor in performance is the application design, and in particular its threading model.

Note that many low latency patterns (core affinity, pooling, etc.) work just as well in Java as they do in C++.

Many people have asked me why I bother with Java: why not just use C++? For the same reason we don’t use assembler any more. The compilers are highly optimised, and using Java minimises production downtime, maximises productivity, reduces project costs and avoids the pain of trying to hire highly specialised and expensive expert developers and hardware experts.

Productivity is much higher with Java, which has much better tooling than C++. Java systems excel in ease of maintenance, graduate productivity and simpler code.

The argument against Java always comes back to jitter from GC. To be honest, that’s simple to avoid with object pooling: if the steady-state path allocates nothing, the collector has nothing to do. The real grief comes from the JIT compiler. It’s really hard to write good warmup code that exercises enough of the code base to get the best optimised code.
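A minimal sketch of the pooling idea, assuming a single-threaded event loop where every object is created at startup and recycled afterwards. The names ObjectPool, Reusable and OrderEvent are hypothetical; this is not the SubMicroTrading SuperPool implementation, just an illustration of why a pooled hot path produces no garbage.

```java
import java.util.function.Supplier;

/** Hypothetical contract: pooled objects must clear their own state before reuse. */
interface Reusable {
    void reset();
}

/**
 * Minimal single-threaded object pool (illustrative, not SubMicroTrading's SuperPool).
 * All instances are created up front, so steady-state acquire/release allocates nothing.
 */
final class ObjectPool<T extends Reusable> {
    private final Supplier<T> factory;
    private final Object[] free;   // stack of idle instances
    private int size;              // number of idle instances currently on the stack

    ObjectPool(Supplier<T> factory, int capacity) {
        this.factory = factory;
        this.free = new Object[capacity];
        for (int i = 0; i < capacity; i++) {
            free[i] = factory.get();   // pre-allocate during startup, not on the hot path
        }
        this.size = capacity;
    }

    @SuppressWarnings("unchecked")
    T acquire() {
        if (size == 0) {
            // Pool exhausted: allocating here reintroduces GC pressure, so size pools generously.
            return factory.get();
        }
        T obj = (T) free[--size];
        free[size] = null;             // do not keep a second reference to an in-use object
        return obj;
    }

    void release(T obj) {
        obj.reset();                   // clear stale state before the object is handed out again
        if (size < free.length) {
            free[size++] = obj;
        }
        // If the pool is already full the object is simply dropped and left to the GC.
    }

    int available() {
        return size;
    }
}

/** Hypothetical mutable event that is reused instead of reallocated per message. */
final class OrderEvent implements Reusable {
    long orderId;
    long price;
    int qty;

    @Override
    public void reset() {
        orderId = 0;
        price = 0;
        qty = 0;
    }
}
```

Resetting on release guards against stale fields leaking into the next message, but it cannot catch code that keeps using an object after releasing it; a pool shared between threads would also need a different (for example per-thread) design.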

On C++ and avoiding virtual methods: I have seen low latency C++ systems that avoid virtual methods entirely to avoid vtable lookups. A lot of the code ends up looking like C, at which point I wonder why bother. Object orientation is key in trading systems to avoid code bloat, maximise code reuse and significantly reduce development time while increasing quality. One nice feature of Java is that the JIT can determine at runtime that a method is not overridden and inline the virtual call. Of course, if you are messing about with dynamic class generation or custom class loaders, you may well end up with recompilation jitter (honestly, don’t do it: KISS).
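A small sketch of the kind of call site meant here, assuming the HotSpot JVM. TickHandler and TickLoop are hypothetical names. onTick is an ordinary virtual method, but because no loaded class overrides it, HotSpot’s class hierarchy analysis can treat the call as monomorphic and inline it once the loop is hot. Whether it actually inlines depends on JVM version, flags and method size, so treat this as illustrative.

```java
/**
 * Hypothetical handler. onTick is a virtual method (non-final, non-private, non-static),
 * but nothing overrides it, so HotSpot's class hierarchy analysis can treat calls to it
 * as monomorphic and inline them once the call site is hot, with no vtable lookup.
 */
class TickHandler {
    private long notional;

    long onTick(long price, int qty) {
        notional += price * qty;
        return notional;
    }
}

final class TickLoop {
    /** Hot loop containing a virtual call site: a candidate for devirtualisation and inlining. */
    static long run(TickHandler handler, long[] prices, int[] qtys) {
        long last = 0;
        for (int i = 0; i < prices.length; i++) {
            last = handler.onTick(prices[i], qtys[i]);
        }
        return last;
    }
}

// If a subclass overriding onTick is loaded later (for example via a custom class loader),
// the assumption recorded by the JIT is invalidated and the compiled code is deoptimised
// and recompiled, which shows up as recompilation jitter.
```

Rather than trusting the theory, the inlining decisions can be inspected on HotSpot with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`.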

KISS: thanks to David Straker for the best IT lesson I ever had. Back in 1988, in my first developer role, I wrote a really complex C installation program for HPWord. It was damn complex and I thought it was clever; David kicked it back into touch and told me 'Keep It Simple Stupid'. Many years on, I can honestly say that’s the most important principle I learnt in software development. If someone can’t understand the code, they can’t maintain it.

In my experience, Java systems are built in half the time, cost a third as much and are much more stable in production. That said, ultra low latency requires object reuse, which in itself brings risk regardless of language. I developed SubMicroTrading on an 11" Dell Adamo with a dual core 1.6GHz Intel CPU and 4GB RAM. On that I could run the exchange simulator, the client simulator and the complete SMT trading system! It wouldn’t have been possible with C++: the dev cycle (code, compile, run, rinse, repeat) would have been dramatically slower.

On other languages: Scala is a GC sink, and while functional code is great for prototyping strategies, its implicit nature generates a lot of pretty inefficient code. The argument that the same code in Scala compiles to the same code in Java misses the point. In my experience, an algo written in Scala can be rewritten much more efficiently in Java. Scala is not suitable for ultra low latency, in my opinion.

On Go: I haven’t used it, but I was asked recently about using it. Honestly, my reply was "why?" What is it going to give me, with regard to ultra low latency, that I don’t have with Java? If I didn’t have over 4 years of IP in SubMicroTrading I would consider it, but it’s a risk. Maybe it’s tomorrow’s C#?

FPGA hybrid systems: personally, over the years I have seen various claims and promises that didn’t materialise (details of which I can’t divulge due to the NDAs the banks signed). Beware the smoke and mirrors: understand the true latency cost of the hybrid system, with independent wire-to-wire metrics for key use cases. Also ensure you understand the edge risks, e.g. hidden latency, partial TCP stacks, FPGA to main memory latency and, for the strategy, atomic book reading and skipping of intermediate book updates.

Please do not poke me about C++ or FPGA; quite frankly, I am tired of all the arguing and hype. If you want to use C++, C, FPGA or whatever, that’s your choice, and I hope you enjoy it.

What is Ultra Low Latency


There is no single answer, and it changes over time. In 1997, a single trade in 1 second was normal; 1 trade in 100ms was fast.

Around seven years ago, low latency was considered sub-millisecond. At that time I was asked if I could design and build a trading system that would be the fastest on the street (then 500 microseconds). I said yes. When asked why I would succeed when others had failed, I replied that my background in compilers and real-time systems, mixed with IB experience, meant I knew what I was doing, and thus HotRod was born.

HotRod was a huge success, but when asked if I could get it down to 20 micros for FX, I said it would require a complete rewrite. You can’t take a 10 millisecond system and profile/change it down to 100 microseconds. And you can’t take a 100 microsecond system and profile/improve it down to under 10 micros, which is what I currently consider ultra low latency. I quit my job and started to write the SubMicroTrading framework from scratch. Every line of code has been written with latency in mind.

Low latency timings cannot be considered in isolation. You need to know the scope of the statistics: what is the use case (e.g. wire-to-wire tick to trade), what is the throughput, what are the min/max/average event rates, etc.?
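A minimal sketch of recording latency with its scope and throughput context attached, assuming nanosecond timestamps (for example from System.nanoTime() or NIC hardware timestamps). LatencyStats is a hypothetical class, not part of SubMicroTrading. It is allocation free on the recording path, but it only keeps min, max and mean plus the average event rate; tail percentiles would need a histogram (HdrHistogram is one well-known Java option).

```java
/**
 * Allocation-free latency recorder (illustrative). Each instance covers one explicitly
 * scoped measurement, e.g. "tick-to-trade, wire to wire", and keeps enough context
 * (count and observed event rate) to interpret the min/max/mean figures.
 */
final class LatencyStats {
    private final String scope;         // hypothetical label describing what was measured
    private long count;
    private long sumNanos;
    private long minNanos = Long.MAX_VALUE;   // only meaningful once count() > 0
    private long maxNanos = Long.MIN_VALUE;   // only meaningful once count() > 0
    private long windowStart;           // start timestamp of the first recorded event
    private long windowEnd;             // end timestamp of the most recent event

    LatencyStats(String scope) {
        this.scope = scope;
    }

    /** Records one event; both timestamps in nanoseconds from the same clock. */
    void record(long startNanos, long endNanos) {
        long latency = endNanos - startNanos;
        count++;
        sumNanos += latency;
        if (latency < minNanos) minNanos = latency;
        if (latency > maxNanos) maxNanos = latency;
        if (count == 1) windowStart = startNanos;   // nanoTime may be negative, so no sentinel value
        windowEnd = endNanos;
    }

    String scope() { return scope; }
    long count() { return count; }
    long minNanos() { return minNanos; }
    long maxNanos() { return maxNanos; }

    double meanNanos() {
        return count == 0 ? 0.0 : (double) sumNanos / count;
    }

    /** Average event rate over the recording window: the throughput context for the figures. */
    double eventsPerSecond() {
        long window = windowEnd - windowStart;
        return window <= 0 ? 0.0 : count * 1_000_000_000.0 / window;
    }
}
```

Reporting a mean of a few microseconds means little without knowing whether it was measured at 2 events per second or 200,000, which is why the rate is kept alongside the latency figures.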





About Me


Taught myself BASIC at age 12 on a Commodore PET; at 14 I was writing computer games in 6502 assembler on an Atari 400 (that was great fun, alas nothing published).

Worked commercially in compilers, O/S drivers, real-time control systems, distributed databases and telecoms. Worked for 3 years in Japan. Gomenasai (sorry), my Japanese is very rusty and poor; I blame my wife, as every time I spoke Japanese to her she replied in English!

Worked in investment banking since 1997 at Goldman Sachs, Morgan Stanley, Bank of America, UBS and others. Managed, designed and built various large trading frameworks, including HotRod and SubMicroTrading (written solely by me from scratch, with no third-party runtime libraries other than hwloc).

Specialist in ultra low latency and trading systems.

My regret is being lousy at marketing and never using SubMicroTrading commercially.


Blog List


This blog is about the journey of designing and developing SubMicroTrading (TM), an ultra low latency trading system, over the last 5 years. As I cover certain topics (from SuperPools to Book Conflation to Strategy Container), I will make some sections of the code freely available.

The goal of this technical blog is to raise awareness of the potential for Java in implementing Ultra Low Latency Systems and share the techniques I have used.

Please note that work on SubMicroTrading has halted. I have a full-time job at an IB (alas not in ultra low latency) and, after more than 4 years, I have no more spare time for it.

Please follow the blog. If I get enough interest I will start open sourcing components, including SubMicroFix, a nanosecond-level FIX engine in Java: possibly the fastest FIX engine in the world!

Blog List (from latest to oldest, click link to view)

Avoid Unnecessary Allocations and Memcpy's

SubMicroTrading Open Sourced on GIT

SubMicroTrading Ultra Low Latency Open Source Prep

Setting Thread Affinity and Priority from Java

Coding For Ultra Low Latency

Java Bytecode Latency Impact

Java JVM Tuning for Ultra Low Latency

Hardware and Linux Tuning for Ultra Low Latency

Measuring Latency in Ultra Low Latency Systems

Recommendations for Ultra Low Latency

Holistic Latency For Ultra Low Latency

Models & Generating Codecs






Future Blog Subjects (among others)

Application Design Techniques
Avoid GC with SuperPools 
ThreadLocal cost and Alternative
Threading Model in Ultra Low Latency System
Avoiding GC in Java
Utilising Unsafe Class
Java Class Field Offsets
Disruptor Anti-Pattern
Object Models in Ultra Low Latency systems
AntiSpring
Anti Reuse Pattern Immutable
Working Set Size
Run 24 * 7
Collection Sizing
SuperPools
Thread Multiplexing
Custom NIO : bypassing lock overhead and exception stack prep
Custom Maps with Reusable Nodes
SME Persistence vs Chronicle
Dates
Book Conflation & Atomic Book Reads (snapping)
Scaling Dynamic Market Data Sessions with CME
Strategy Container
T1 Strategy
Example Spread Strategy
JMX Admin Commands
High Availability & Resiliency
What Would I do Differently Now