For a system to operate as fast as possible, every line of code needs to be optimal. If you take the approach of writing lazy code and then optimising, you will end up rewriting everything. A profiler won't help you at the nanosecond level: the overhead of running with profiler metrics will have you "chasing your tail"! Writing optimal code from the start of the project is easy: set up coding standards and enforce them. Have a simple set of guidelines that everyone follows.
Minimise synchronisation

The synchronized keyword used to be really slow and was avoided, with more complex lock classes used in preference. With the advent of under-the-cover lock spinning this is no longer the case. That said, even an uncontended lock still incurs the overhead of read and write memory barriers. So use synchronized only where it is absolutely needed, i.e. where you have real concurrency. The key here is application design: make components single threaded, and achieve throughput via concurrent instances which are independent and require no synchronisation.
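One way to sketch that design point: instead of one shared component guarded by locks, run several independent single-threaded workers and partition events by key, so each worker owns its state outright. The class and method names below are illustrative, not from SMT.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: N independent single-threaded workers; events for the same key
// always route to the same worker, so worker state needs no synchronisation.
public class ShardedProcessor {
    private final BlockingQueue<Long>[] queues;

    @SuppressWarnings("unchecked")
    public ShardedProcessor(int shards) {
        queues = new BlockingQueue[shards];
        for (int i = 0; i < shards; i++) {
            queues[i] = new ArrayBlockingQueue<>(1024);
            final BlockingQueue<Long> q = queues[i];
            Thread worker = new Thread(() -> {
                try {
                    for (;;) process(q.take()); // single consumer per queue
                } catch (InterruptedException e) { /* shutdown */ }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    /** Deterministic shard choice: same key, same worker, per-key ordering kept. */
    public static int shard(long key, int shards) {
        return (int) Math.floorMod(key, shards);
    }

    public void submit(long key) throws InterruptedException {
        queues[shard(key, queues.length)].put(key);
    }

    private void process(long event) { /* per-shard, single-threaded work */ }
}
```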
Minimise use of volatile variables

Understand how your building blocks work, e.g. AtomicInteger and ConcurrentHashMap. Only use concurrent techniques for the code that needs to be concurrent.
Minimise use of CAS operations

CAS is an efficient atomic operation that bypasses the O/S and is implemented by a CPU instruction. However, making it atomic and consistent incurs a memory barrier, hurting cache effectiveness. So use it where needed, and not where not!
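As an illustration of what a CAS operation looks like in Java, here is a hypothetical capped counter built on AtomicInteger's compareAndSet (which maps to a CPU CAS instruction); the class is an example, not from SMT:

```java
import java.util.concurrent.atomic.AtomicInteger;

// A lock-free counter capped at a maximum, using a classic CAS retry loop.
public class CappedCounter {
    private final AtomicInteger count = new AtomicInteger();
    private final int max;

    public CappedCounter(int max) { this.max = max; }

    /** Increment unless already at max; returns true if incremented. */
    public boolean tryIncrement() {
        for (;;) {
            int cur = count.get();
            if (cur >= max) return false;                       // at cap, give up
            if (count.compareAndSet(cur, cur + 1)) return true; // CAS won
            // CAS lost a race with another thread: loop and retry
        }
    }

    public int get() { return count.get(); }
}
```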
Avoid copying objects unnecessarily

I see this A LOT, and the overhead can soon mount up. The same holds true for memcpy'ing buffer to buffer between API layers (especially in socket code).
Avoid statics

Statics can be a pain for unit tests, but the real issue comes from the required concurrency of shared state across instances running in separate threads.
Avoid maps

I have worked on several C++ and Java systems where, instead of a real object model, they used abstract concepts with object values stored in maps. Not only do these systems run slowly, but they lack compile-time safety and are simply a pain. Use maps where they are needed, e.g. a map of books or a map of orders. SMT has a goal of at most one map lookup for each event.
Presize collections

Understand the cost of growing collections: e.g. a HashMap has to create a new array of double the size and then rehash its elements, an expensive operation when the map is growing into hundreds of thousands of entries. Make initial sizes configurable.
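A minimal sketch of presizing: HashMap resizes once size exceeds capacity times the load factor (0.75 by default), so dividing the expected entry count by the load factor yields an initial capacity that avoids any resize while loading. The helper class name is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public final class Maps {
    /** A HashMap sized so it will not resize while holding expectedEntries. */
    public static <K, V> Map<K, V> presized(int expectedEntries) {
        // capacity * 0.75 must exceed expectedEntries to avoid a rehash
        int capacity = (int) (expectedEntries / 0.75f) + 1;
        return new HashMap<>(capacity);
    }
}
```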
Reuse heuristics

At the end of the day, write out the size of all collections. The next time the process is bounced, resize to the previously stored max. Generate other metrics like number of orders created, hit percentage and max tick rate per second: figures that can be used to understand performance and give context to unexpected latency.
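The save-and-restore idea could be sketched like this, using a simple properties file; all names here are hypothetical and SMT's actual mechanism may differ:

```java
import java.io.*;
import java.util.Properties;

// Persist observed max collection sizes at clean shutdown and use them
// as initial sizes on the next run.
public class SizeHints {
    private final Properties props = new Properties();
    private final File file;

    public SizeHints(File file) throws IOException {
        this.file = file;
        if (file.exists()) {
            try (InputStream in = new FileInputStream(file)) { props.load(in); }
        }
    }

    /** Initial size for a named collection: last run's max, or the default. */
    public int hint(String name, int defaultSize) {
        return Integer.parseInt(props.getProperty(name, Integer.toString(defaultSize)));
    }

    /** Record the max size seen this run; call before save() at shutdown. */
    public void record(String name, int maxSeen) {
        props.setProperty(name, Integer.toString(maxSeen));
    }

    public void save() throws IOException {
        try (OutputStream out = new FileOutputStream(file)) {
            props.store(out, "collection size hints");
        }
    }
}
```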
Use Object Orientation

Avoiding object orientation due to fear of the cost of vtable lookups seems wrong to me. I can understand it on a micro scale, but on a macro, end-to-end scale, what's the impact? In Java all methods are virtual, but the JIT compiler knows what classes are currently loaded and can not only avoid a vtable lookup but can also inline the code. The benefit of object orientation is huge: component reuse and extensibility make it easy to extend and create new strategies without swathes of cut-and-paste code.
Use final keyword everywhere

Help the JIT compiler optimise. If in future a method or class needs extending, you can always remove the final keyword.
Small Methods

Keep methods small and easy to understand. Very big methods will never be compiled; big complex methods may be compiled, but the compiler may end up recompiling the method again and again trying to optimise it. David Straker wrote "KISS" on the board and I never forgot it! If the code is easy to understand, that's GOOD.
Avoid Auto Boxing

Stick to primitives and use long over Long, thus avoiding any auto boxing overhead (turn the auto boxing warning on).
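A small illustrative contrast between the two (class and method names are mine, for demonstration): the primitive version allocates nothing per element, while the boxed version auto boxes a Long on every add and unboxes on every read.

```java
import java.util.List;

public class BoxingDemo {
    // Primitive path: no per-element allocation.
    static long sumPrimitive(long[] values) {
        long total = 0;
        for (long v : values) total += v;
        return total;
    }

    // Boxed path: every element is a heap-allocated Long,
    // unboxed on each iteration.
    static long sumBoxed(List<Long> values) {
        long total = 0;
        for (Long v : values) total += v;
        return total;
    }
}
```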
Avoid Immutables

Immutable objects are fine for long-lived objects, but can cause GC for anything else; e.g. a trading system with market data would GC every second if each tick created an immutable POJO.
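The alternative is a mutable, reusable tick that is overwritten in place on each update, so the market data path allocates nothing per tick. This Tick class and its fields are a hypothetical example, not SMT's actual model:

```java
// One instance is reused per update instead of allocating an
// immutable POJO per tick, avoiding GC pressure on the hot path.
public class Tick {
    private long instrumentId;
    private double bid;
    private double ask;

    /** Overwrite in place with the latest values; no allocation. */
    public Tick set(long instrumentId, double bid, double ask) {
        this.instrumentId = instrumentId;
        this.bid = bid;
        this.ask = ask;
        return this;
    }

    public long getInstrumentId() { return instrumentId; }
    public double getBid()        { return bid; }
    public double getAsk()        { return ask; }
}
```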
Avoid String

String is immutable and is a big no-no for ultra low latency systems. In SMT I have ZString, an immutable "string-like" interface, with ViewString and ReusableString as concrete implementations.
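To show the idea behind a reusable string-like type, here is a hedged sketch of a mutable byte-backed buffer that can be reset and refilled, so hot-path code never allocates java.lang.String. This is an illustration only, not SMT's actual ReusableString:

```java
// A mutable byte buffer with string-like reset/append semantics.
public class ReusableBytes {
    private byte[] bytes = new byte[32];
    private int len;

    /** Clear for reuse; the backing array is kept, nothing is allocated. */
    public ReusableBytes reset() { len = 0; return this; }

    public ReusableBytes append(byte b) {
        if (len == bytes.length) {                  // grow rarely; amortised cost
            byte[] bigger = new byte[bytes.length * 2];
            System.arraycopy(bytes, 0, bigger, 0, len);
            bytes = bigger;
        }
        bytes[len++] = b;
        return this;
    }

    public int length() { return len; }
    public byte byteAt(int i) { return bytes[i]; }
}
```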
Avoid Char

Use byte and byte[] and avoid translation between byte and char on every IO operation.
Avoid temp objects

Objects take time to construct and initialise. Consider using instance variables for reuse instead (if the instance is not used concurrently).
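A minimal sketch of the instance-variable pattern (class name illustrative): a working StringBuilder is held as a field and reset on each call, so no new builder is allocated per invocation. Safe only because the instance is confined to one thread.

```java
// Reuse a per-instance buffer instead of allocating a temp per call.
public class LineFormatter {
    private final StringBuilder buf = new StringBuilder(64); // reused working buffer

    public String format(long id, double price) {
        buf.setLength(0);                           // reset: no new allocation
        buf.append(id).append('|').append(price);
        return buf.toString();
    }
}
```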
Facilitate object reuse by API

Where possible, pass into a method the object that needs to be populated. This allows invoking code to avoid object creation and reuse instances where appropriate.

    String str = order.toString();  // the API forces construction of a temporary string

Versus:

    _str.reset();                   // a reusable "working" instance var
    order.toString( _str );         // buffer passed into method, no temp objects required
Don’t make everything reusable

Just where otherwise the objects would cause GC. Object reuse comes with a risk of corruption; a key goal of Java was to avoid those nasty bugs. Unfortunately for ultra low latency it's not an option: you have to reuse objects (remember there are places in the Java class libraries that already use pools and reuse).
Avoid finalize

Objects which hold resources such as files and sockets should all attempt to shut down cleanly and not rely on finalisers. Add explicit open and close methods, and add shutdown handlers to close cleanly where possible.
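A minimal sketch of the pattern, with a hypothetical resource class: explicit open/close plus a JVM shutdown hook as a best-effort backstop, with no finalize() anywhere.

```java
import java.io.Closeable;

// Explicit lifecycle instead of relying on finalisers.
public class ManagedResource implements Closeable {
    private boolean open;

    public void open() { open = true;  /* e.g. open a socket or file here */ }

    @Override
    public void close() { open = false; /* release the resource */ }

    public boolean isOpen() { return open; }

    public static void main(String[] args) {
        ManagedResource res = new ManagedResource();
        res.open();
        // Best-effort clean close if the process is terminated.
        Runtime.getRuntime().addShutdownHook(new Thread(res::close));
        res.close(); // normal path: close explicitly when done
    }
}
```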
Avoid ThreadLocal

Every ThreadLocal call involves a map lookup for the current thread, so only use it where really needed.
24 * 7

Design your systems to run 24 * 7. This was common in the 80's and 90's, less so now in finance.