Ultra Low Latency Trading Systems: 05/23/15

For a system to operate as fast as possible, every line of code needs to be optimal. If you take the approach of writing lazy code and then optimising, you will end up rewriting everything. A profiler won't help you at the nanosecond level: the overhead of running with profiler metrics will have you "chasing your tail"!

Writing optimal code from the start of the project is easy: set up coding standards and enforce them. Have a simple set of guidelines that everyone follows.

Minimise synchronisation	The synchronized keyword used to be really slow and was avoided, with more complex lock classes used in preference. With the advent of under-the-cover lock spinning this is no longer the case. That said, even if the lock is uncontended you still have the overhead of a read and write memory barrier. So use synchronized only where it's absolutely needed, i.e. where you have real concurrency. The key here is application design: you want components to be single threaded and to achieve throughput via concurrent instances which are independent and require no synchronisation.
Minimise use of volatile variables	Understand how your building blocks work, e.g. AtomicInteger, ConcurrentHashMap. Only use concurrent techniques for the code that needs to be concurrent.
Minimise use of CAS operations	CAS is an efficient atomic operation that bypasses the O/S and is implemented by a CPU instruction. However, making it atomic and consistent incurs a memory barrier, which hurts cache effectiveness. So use it where needed and not where not!
Avoid copying objects unnecessarily	I see this A LOT and the overhead can soon mount up. The same holds true for memcpy'ing buffer to buffer between API layers (especially in socket code).
Avoid statics	Can be a pain for unit tests, but the real issue comes from the concurrency required for state shared across instances running in separate threads.
Avoid maps	I have worked on several C++ and Java systems where, instead of a real object model, they used abstract concepts with object values stored in maps. Not only do these systems run slowly, but they lack compile-time safety and are simply a pain. Use maps where they are needed … e.g. a map of books or a map of orders. SMT has a goal of at most one map lookup for each event.
Presize collections	Understand the cost of growing collections, e.g. a HashMap has to create a new array double the size and then rehash its elements, an expensive operation when the map is growing into hundreds of thousands of entries. Make the initial size configurable.
Reuse heuristics	At the end of the day write out the size of all collections. The next time the process is bounced, resize to the previously stored max. Generate other metrics like number of orders created, hit percentage, max tick rate per second … figures that can be used to understand performance and give context to unexpected latency.
Use Object Orientation	Avoiding object orientation due to fear of the cost of vtable lookups seems wrong to me. I can understand it on a micro scale, but on a macro, end-to-end scale what's the impact? In Java instance methods are virtual by default, but the JIT compiler knows what classes are currently loaded and can not only avoid a vtable lookup but can also inline the code. The benefit of object orientation is huge. Component reuse and extensibility make it easy to extend and create new strategies without swathes of cut-and-paste code.
Use final keyword everywhere	Help the JIT compiler optimise. If in future a method or class needs extending then you can always remove the final keyword.
Small Methods	Keep methods small and easy to understand. Very big methods will never be compiled (by default HotSpot does not JIT-compile methods above roughly 8000 bytes of bytecode). Big complex methods may be compiled, but the compiler may end up recompiling the method again and again to try to optimise it. David Straker wrote "KISS" on the board and I never forgot it! If the code is easy to understand, that's GOOD.
Avoid Auto Boxing	Stick to primitives: use long over Long and thus avoid any auto boxing overhead (switch the auto boxing warning on).
Avoid Immutables	Immutable objects are fine for long-lived objects, but can cause GC for anything else … e.g. a trading system with market data would have GC every second if each tick created an immutable POJO.
Avoid String	String is immutable and is a big no-no for ultra low latency systems. In SMT I have ZString, an immutable "string-like" interface, with ViewString and ReusableString as concrete implementations.
Avoid Char	Use byte and byte[], and avoid translation between byte and char on every IO operation.
Avoid temp objects	Objects take time to construct and initialise. Consider using instance variables for reuse instead (if the instance is not used concurrently).
Facilitate object reuse by API	Where possible, pass into a method the object that needs to be populated. This allows invoking code to avoid object creation and reuse instances where appropriate:
    String str = order.toString(); // the API forces construction of a temporary String
versus
    _str.reset();            // a reusable "working" instance variable
    order.toString( _str );  // the buffer is passed into the method, so no temp objects are required
Don’t make everything reusable	Just where the objects would otherwise cause GC. Object reuse comes with the risk of corruption, and a key goal of Java was to avoid those nasty bugs. Unfortunately, for ultra low latency avoiding reuse is not an option: you have to reuse objects (remember there are places in the Java classes that already use pools and reuse).
Avoid finalize	Objects which hold resources such as files and sockets should all attempt to shut down cleanly and not rely on finalisers. Add explicit open and close methods, and add shutdown handlers to close cleanly if possible.
Avoid threadlocal	Every ThreadLocal call involves a map lookup for the current thread, so only use it where really needed.
24 * 7	Design your systems to run 24 * 7 … common in the 80's and 90's, less so now in finance.
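
To make the "Minimise synchronisation" guideline above concrete, here is a minimal sketch of single-threaded components that achieve throughput via independent instances. The names (ShardedCounters, Shard, runShards) are hypothetical and not taken from SMT. Each shard is written by exactly one thread through a plain field, and the only cross-thread hand-off is Thread.join(), which makes the shard's writes visible to the joining thread.

```java
/** Hypothetical example: independent single-threaded shards, no locks or volatiles. */
final class ShardedCounters {

    /** One shard is owned by exactly one thread for its whole life. */
    static final class Shard implements Runnable {
        private final int events;
        private long processed; // plain field: only the owning thread writes it

        Shard(int events) {
            this.events = events;
        }

        @Override
        public void run() {
            for (int i = 0; i < events; i++) {
                processed++; // stands in for handling one event
            }
        }

        long processed() {
            return processed;
        }
    }

    /** Starts one thread per shard and sums the results after each thread has finished. */
    static long runShards(int shardCount, int eventsPerShard) {
        Shard[] shards = new Shard[shardCount];
        Thread[] threads = new Thread[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new Shard(eventsPerShard);
            threads[i] = new Thread(shards[i], "shard-" + i);
            threads[i].start();
        }
        long total = 0;
        try {
            for (int i = 0; i < shardCount; i++) {
                threads[i].join(); // join() makes the shard's writes visible here
                total += shards[i].processed();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for shards", e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total processed = " + runShards(4, 1_000_000));
    }
}
```

In a real system the shards would be partitioned by something like instrument or session, which is an assumption here rather than something the guideline prescribes.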
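
The "Presize collections" and "Reuse heuristics" guidelines above could be combined along these lines. This is a sketch under stated assumptions: the class and method names are hypothetical, the previous run's maximum is passed in as a plain int rather than read from wherever a real system would persist it, and the Long/String key and value types are purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helpers for presizing a HashMap so that it never has to grow. */
final class PresizedMaps {
    private static final float LOAD_FACTOR = 0.75f; // HashMap's default load factor

    /** Initial capacity large enough for expectedEntries without a resize (small sizes assumed, no overflow check). */
    static int capacityFor(int expectedEntries) {
        return (int) (expectedEntries / LOAD_FACTOR) + 1;
    }

    /** Creates a map whose internal table is allocated once, up front. */
    static <K, V> Map<K, V> newPresizedMap(int expectedEntries) {
        return new HashMap<>(capacityFor(expectedEntries), LOAD_FACTOR);
    }

    /** Size to use after a bounce: the configured size or the previous run's max, whichever is larger. */
    static int nextExpectedSize(int configuredSize, int previousMaxSize) {
        return Math.max(configuredSize, previousMaxSize);
    }

    public static void main(String[] args) {
        int configured = 100_000;  // assumed to come from configuration
        int previousMax = 250_000; // assumed to have been written out at end of day
        int expected = nextExpectedSize(configured, previousMax);

        Map<Long, String> orders = newPresizedMap(expected);
        orders.put(1L, "order-1");
        System.out.println("presized for " + expected + " entries, holding " + orders.size());
    }
}
```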
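
For the "Avoid Auto Boxing" and "Avoid maps" guidelines above, one possible illustration is a per-symbol tick counter. A Map<Integer, Long> would box the key and the value on every update, whereas a primitive array indexed by a dense symbol id does neither. TickCounter and the idea of a dense integer symbol id are assumptions made for this sketch, not something taken from SMT.

```java
/** Hypothetical tick counter keyed by a dense int symbol id, with no boxing and no map lookup. */
final class TickCounter {
    private final long[] countsBySymbolId; // primitive storage, allocated once

    TickCounter(int maxSymbols) {
        this.countsBySymbolId = new long[maxSymbols];
    }

    /** Records one tick; no objects are created on this path. */
    void onTick(int symbolId) {
        countsBySymbolId[symbolId]++;
    }

    long count(int symbolId) {
        return countsBySymbolId[symbolId];
    }

    public static void main(String[] args) {
        TickCounter counter = new TickCounter(1024);
        counter.onTick(7);
        counter.onTick(7);
        counter.onTick(9);
        System.out.println("symbol 7 ticks: " + counter.count(7));
    }
}
```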
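
The "Facilitate object reuse by API", "Avoid Char" and "Avoid temp objects" guidelines above might look roughly like this in practice. ReusableBytes is a hypothetical stand-in written for this sketch; it is not SMT's ReusableString, whose API is not shown here. The Order fields and the "id=…,qty=…" layout are likewise invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical example of an API that fills a caller-supplied, reusable byte buffer. */
final class ReuseDemo {

    /** Growable byte buffer that is reset and refilled instead of being reallocated. */
    static final class ReusableBytes {
        private byte[] buf;
        private int len;

        ReusableBytes(int initialCapacity) {
            buf = new byte[initialCapacity];
        }

        void reset() { len = 0; }

        int length() { return len; }

        ReusableBytes append(byte b) {
            ensure(len + 1);
            buf[len++] = b;
            return this;
        }

        ReusableBytes append(byte[] src) {
            ensure(len + src.length);
            System.arraycopy(src, 0, buf, len, src.length);
            len += src.length;
            return this;
        }

        /** Appends a non-negative long as ASCII digits without creating a String. */
        ReusableBytes append(long value) {
            if (value == 0) return append((byte) '0');
            int digits = 0;
            for (long v = value; v > 0; v /= 10) digits++;
            ensure(len + digits);
            int pos = len + digits;
            for (long v = value; v > 0; v /= 10) buf[--pos] = (byte) ('0' + (v % 10));
            len += digits;
            return this;
        }

        private void ensure(int needed) {
            // Grows only if presizing was too small; steady state allocates nothing.
            if (needed > buf.length) buf = Arrays.copyOf(buf, Math.max(needed, buf.length * 2));
        }

        /** Allocates a String; meant for tests and logging, not the hot path. */
        @Override
        public String toString() {
            return new String(buf, 0, len, StandardCharsets.US_ASCII);
        }
    }

    static final class Order {
        private static final byte[] ID_TAG = "id=".getBytes(StandardCharsets.US_ASCII);
        private static final byte[] QTY_TAG = ",qty=".getBytes(StandardCharsets.US_ASCII);
        private final long id;
        private final long qty;

        Order(long id, long qty) {
            this.id = id;
            this.qty = qty;
        }

        /** Writes into the caller's buffer, so no temporary objects are created. */
        void writeTo(ReusableBytes out) {
            out.append(ID_TAG).append(id).append(QTY_TAG).append(qty);
        }
    }

    public static void main(String[] args) {
        ReusableBytes work = new ReusableBytes(64); // one "working" instance, reused per call
        Order order = new Order(42, 100);
        work.reset();
        order.writeTo(work);
        System.out.println(work);
    }
}
```

The trade-off is the one the guidelines already name: the caller must not use the buffer concurrently and must reset it before each use.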



Saturday, 23 May 2015

Coding for Ultra Low Latency