Sunday, 19 April 2015

Which Language is Best for Ultra Low Latency

A highly subjective and emotive subject. For me it’s Java.

A C++ expert who is not also an expert in Java probably either doesn’t believe in Java or has only played with it for a year or two. They simply don’t have the experience to compare fairly. I know because that used to be me!

"Just because you don’t know how doesn’t mean it’s not possible"

A Java program can be 100 times faster than a functionally similar C++ program … fact.
The key is not the language: C, C++ and Java all end up as machine code. The biggest factor in performance is the application design, and in particular its threading model.

Note that many patterns for low latency (core affinity, pooling etc.) work just as well in Java as they do in C++.
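To make the pooling pattern concrete, here is a minimal sketch of a single-threaded object pool. The class name SimplePool and its methods are made up for illustration, it is not thread safe, and a real low latency pool would need more care around sizing and ownership.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical single-threaded pool: objects are created up front and
// recycled, so the hot path allocates nothing once the pool is sized right.
final class SimplePool<T> {
    private final ArrayDeque<T> free;
    private final Supplier<T> factory;

    SimplePool(int size, Supplier<T> factory) {
        this.factory = factory;
        this.free = new ArrayDeque<>(size);
        for (int i = 0; i < size; i++) {
            free.push(factory.get());
        }
    }

    // Hand out a pooled instance; fall back to allocation if the pool is empty.
    T acquire() {
        T obj = free.poll();
        return obj != null ? obj : factory.get();
    }

    // Return an instance to the pool. The caller must not use it afterwards.
    void release(T obj) {
        free.push(obj);
    }

    int available() {
        return free.size();
    }
}
```

The caller is responsible for resetting an object's state before reuse, which is exactly where the risk in object reuse comes from.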

I have been asked by many: why bother with Java, why not just use C++? For the same reason we don’t use assembler anymore. The compilers are highly optimised, and using Java will minimise production downtime and maximise productivity while reducing project costs and avoiding the pain of trying to hire highly specialised and expensive expert devs / hardware experts.

Productivity is much higher with Java; it has much better tooling than C++. Java systems excel at ease of maintenance, at making graduates productive quickly, and at keeping code simple.

The argument against Java always comes back to jitter from GC. To be honest that’s simple to avoid with object pooling. The really painful grief comes from the JIT compiler. It’s really hard to write good warmup code that exercises enough of the code base to get the best optimised code.
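As a rough sketch of what warmup code looks like in its simplest form: drive the hot path many times with representative inputs before going live, so the JIT has a chance to compile it. The Warmup class and handle method are hypothetical stand-ins, and the iteration count is an assumption; the real compile thresholds depend on the JVM and its flags.

```java
// Hypothetical warmup harness, not taken from any real trading system.
final class Warmup {
    // Stand-in for a real hot-path method such as order handling.
    static long handle(long price, long qty) {
        return price * qty;
    }

    // Call the hot path repeatedly and return an accumulated result so the
    // JIT cannot discard the loop as dead code.
    static long warm(int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += handle(i, 2);
        }
        return sink;
    }
}
```

The hard part is not the loop but coverage: every branch and message type that matters in production has to be exercised with realistic data, otherwise the first live occurrence pays the compilation cost.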

On C++ and avoiding virtual methods: I have seen low latency C++ systems which avoid any use of virtual methods to avoid the vtable lookup. A lot of the code ends up looking like C, at which point I wonder why bother. Object orientation is key in trading systems to avoid code bloat, maximise code reuse and significantly reduce development time while increasing quality. One nice feature in Java is the ability at runtime to determine that a method is not overridden, so that virtual method calls can be inlined. Of course if you are messing about with dynamic class generation or custom class loaders you may well end up with recompilation jitter (honestly don’t do it -> KISS).
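A small sketch of the kind of code this applies to. The names Handler, AddOne and Dispatch are made up for illustration. As I understand HotSpot's behaviour, while AddOne is the only loaded subclass the call to apply in the loop can be treated as a direct call and inlined; loading a second subclass later would force the JVM to deoptimise and recompile, which is where the recompilation jitter comes from.

```java
// Hypothetical class hierarchy with exactly one concrete implementation.
abstract class Handler {
    abstract long apply(long x);
}

final class AddOne extends Handler {
    @Override
    long apply(long x) {
        return x + 1;
    }
}

final class Dispatch {
    // In source this is a virtual call; with a single loaded override the
    // JIT is free to inline it into the loop.
    static long run(Handler h, int n) {
        long v = 0;
        for (int i = 0; i < n; i++) {
            v = h.apply(v);
        }
        return v;
    }
}
```

The code stays object oriented in source form, and the decision to drop the virtual dispatch is left to the runtime.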

KISS - thanks to David Straker for the best IT lesson I ever had. Back in 1988 in my first developer role, I wrote a really complex C installation program for HPWord. It was damn complex and I thought it was clever. David kicked it back into touch and told me 'Keep It Simple Stupid'. Many years on, I can honestly say that’s the most important principle I learnt in software development. If someone can’t understand the code they can’t maintain it.

In my experience Java systems are built in half the time, cost a third as much and are much more stable in production. That said, ultra low latency requires object reuse, which in itself brings risk regardless of language. I developed SubMicroTrading on a dual core Intel 1.6GHz, 4GB RAM, 11" Dell Adamo …. On that I could run the exchange simulator, the client simulator and the complete SMT Trading System! It wouldn’t have been possible with C++. The dev cycle (code, compile, run, rinse, repeat) would have been far slower.

On other languages: Scala is a GC sink, and while functional code is great for prototyping strategies, its implicit nature generates a lot of pretty inefficient code. The argument that Scala compiles to the same bytecode as Java misses the point. In my experience an algo written in Scala can be rewritten much more efficiently in Java. Scala is not suitable for ultra low latency in my opinion.

On Go, I haven’t used it but was asked recently about using it. Honestly my reply was "why?" … what’s it going to give me that I don’t have with Java with regard to ultra low latency? If I didn’t have over 4 years of IP in SubMicroTrading I would consider it …. But it’s a risk, maybe it’s tomorrow’s C#?

FPGA Hybrid Systems … personally over the years I have seen various claims and promises which didn’t materialise (details of which I can’t divulge due to the NDAs the banks signed). Beware the smoke and mirrors: understand the true cost of latency in the hybrid system, with independent wire to wire metrics for key use cases. Also ensure you understand the edge risks, e.g. hidden latency, partial TCP stacks, FPGA to main memory latency, and for the strategy, atomic book reading and skipping intermediate book updates.

Please do not poke me about C++ or FPGA. Quite frankly, I am tired of all the arguing and hype. If you want to use C++ / C / FPGA or whatever then that’s your choice, and I hope you enjoy it.
