Sunday 26 April 2015

Holistic Latency for Ultra Low Latency Systems



Definition of holistic

"characterized by the belief that the parts of something are intimately interconnected and explicable only by reference to the whole."

It's crucial to understand all the components that make up a trading system, as each will have its own latency characteristics.

Layer            | Sample Factor   | Mitigation                                                   | Effort
Software Program | Design          | Key design patterns / extensible framework / efficient code  | Hard
Language         | Inherent        | Compile time args / run time args                            | Easy
                 | GC              | Object pooling                                               | Medium
                 | JIT             | Warmup code                                                  | High
Operating System | Scheduler       | Thread affinity / core spinning                              | Easy
                 | Hard Page Fault | Appropriate memory in server, process sizing                 | Easy
                 | Network Stack   | Kernel bypass drivers and tuning, socket spinning            | Easy
Hardware         | CPU             | Disable H/T, get fastest CPU, overclock                      | Easy
                 | Memory          | Buy memory with lowest latency and ensure enough             | Easy
                 | Network         | Buy Solarflare NIC                                           | Easy

I have built four rack servers with a box full of Chelsio, Mellanox and Solarflare NICs. By far the easiest to install, easiest to tune and best performing was Solarflare. I was really disappointed in the Mellanox cards. Solarflare OpenOnload provides one sided acceleration suitable for colocation purposes, at no extra cost. This was several years ago so maybe Mellanox have their own one sided acceleration now, but for me it's come too late. I have preached Solarflare NICs to everyone I know.

Consider the following scenario:

Read next packet from socket
Decode market data tick into exchange normalised event
Log event
Place event into queue for async consumption

How much benefit will the end system see from saving 20 nanoseconds by switching the queue from a ConcurrentLinkedQueue to a RingBuffer (eg Disruptor) ? Will it be twice as quick ? …. No. What about how you read the packet off the socket ? What about the log event ? What about the queue size characteristics ?
I have seen FX systems with man years of effort put into latency optimisation when they didn’t even use OpenOnload for their Solarflare cards ! What has higher risk .. using OpenOnload, or the code a 10 man team has written over 3 years ?
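To put the 20 nanosecond question in proportion, here is a minimal sketch of the arithmetic. The per-stage latencies are made-up illustrative figures (assumptions, not measurements of any system), and the class and method names are hypothetical.

```java
// Hypothetical stage latencies in nanoseconds. Every figure used here is an
// illustrative assumption, not a measurement of any real system.
final class PipelineBudget {

    /** Sum of all stage latencies, in nanoseconds. */
    static long total(long[] stageNanos) {
        long sum = 0;
        for (long n : stageNanos) {
            sum += n;
        }
        return sum;
    }

    /** Percentage of the end to end latency removed by saving savedNanos. */
    static double percentSaved(long[] stageNanos, long savedNanos) {
        return 100.0 * savedNanos / total(stageNanos);
    }

    public static void main(String[] args) {
        // Assumed costs for: socket read, decode, log, enqueue.
        long[] stages = { 2_000, 300, 500, 100 };
        System.out.printf("total=%dns, saving 20ns removes %.2f%% of it%n",
                          total(stages), percentSaved(stages, 20));
    }
}
```

With these assumed figures the queue swap removes well under 1% of the end to end time, which is why the socket read and the logging deserve attention first.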

You must understand the key use cases for your system where latency is important, then create end to end repeatable bench tests in a fully controlled environment that is reflective of the production environment. I suggest wire to wire timings with PTP Solarflare NICs. Alternatively use two servers (1 simulation, 1 trading) with dual Solarflare NICs … in this case you don’t need PTP (I will cover this in a later blog with the JNI code I put together).

Micro-benchmarks must be used with care. They can give good comparative performance against other implementations, but won't necessarily produce production system gains given all the variables in play. For example, in the above scenario consider what happens when a fixed sized queue fills up. More on bench testing another day.
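As a small illustration of the fixed sized queue case, the sketch below uses the JDK's ArrayBlockingQueue with no consumer running. It is only a demonstration of the failure mode, not a low latency queue (it boxes Integers, so it is not GC free), and the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;

final class BoundedQueueDemo {

    /**
     * Offers 'count' events to a queue of the given capacity while nothing
     * is consuming, and returns how many events were rejected.
     */
    static int rejectedWhenNoConsumer(int capacity, int count) {
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < count; i++) {
            // offer() returns false at once when the queue is full;
            // put() would block the producing thread instead.
            if (!queue.offer(i)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected=" + rejectedWhenNoConsumer(4, 10));
    }
}
```

Either outcome (dropped events or a stalled socket reader) is invisible to a micro-benchmark that never lets the queue fill, which is the point about end to end testing.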






Monday 20 April 2015

MODELs & Generated Code

Hand cranked codecs are a pain to write and a pain to upgrade. I have written many over the years !  The solution is to generate the code.

An XML model defines the internal model with POJOs that all components can work with, eg NewOrderSingle, NewOrderAck, TradeNew. It also defines external models, which can be client or exchange, FIX variants or binary protocols such as ETI, Millennium, UTP etc. Finally it defines codecs, which specify how to translate an external model to/from the internal model.

Sample Internal Event for a New Order Single

<Base id="BaseOrderRequest" src="client" extends="CommonClientHeader">
  <Attribute typeId="Instrument"                              name="instrument"   mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="ClientProfile"                           name="client"       mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="viewstring[CLORDID_LENGTH]"    tag="11"  name="clOrdId"      mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="viewstring[CLORDID_LENGTH]"    tag="41"  name="origClOrdId"  mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="viewstring[SECURITYID_LENGTH]" tag="48"  name="securityId"   mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="viewstring[SYMBOL_LENGTH]"     tag="55"  name="symbol"       mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="Currency"                      tag="15"  name="currency"     mandatory="N"     outbound="seperate"/>
  <Attribute typeId="SecurityIDSource"              tag="22"                      mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="UTCTimestamp"                  tag="60"  name="transactTime" mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="UTCTimestamp"                  tag="52"  name="sendingTime"                    outbound="seperate"/>
  <Attribute typeId="Side"                          tag="54"                      mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="viewstring[SRC_LINKID_LENGTH]"           name="srcLinkId"    mandatory="N"     outbound="delegate"/>
</Base>
   
<Base id="OrderRequest" src="client" extends="BaseOrderRequest">
  <Attribute typeId="viewstring[ACCOUNT_LENGTH]"        tag="1"   name="account"          mandatory="N" outbound="delegate"/>
  <Attribute typeId="viewstring[TEXT_LENGTH]"           tag="58"  name="text"             mandatory="N" outbound="delegate"/>
  <Attribute typeId="viewstring[EXDESTINATION_LENGTH]"  tag="100" name="exDest"           mandatory="N" outbound="delegate"/>
  <Attribute typeId="viewstring[SECURITYEXCH_LENGTH]"   tag="207" name="securityExchange" mandatory="N" outbound="delegate"/>
  <Attribute typeId="double"                        tag="44"  name="price"      mandatory="Y"     outbound="seperate"/>
  <Attribute typeId="int"                           tag="38"  name="orderQty"   mandatory="Y"     outbound="seperate"/>
  <Attribute typeId="ExecInst"                      tag="18"                                      outbound="delegate"/>
  <Attribute typeId="HandlInst"                     tag="21"                                      outbound="delegate"/>
  <Attribute typeId="OrderCapacity"                 tag="528"                                     outbound="seperate"/>
  <Attribute typeId="OrdType"                       tag="40"                    mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="SecurityType"                  tag="167"                                     outbound="delegate"/>
  <Attribute typeId="SecurityIDSource"              tag="22"                                      outbound="delegate"/>
  <Attribute typeId="TimeInForce"                   tag="59"                                      outbound="delegate"/>
  <Attribute typeId="BookingType"                   tag="775"                                     outbound="delegate"/>
  <Attribute typeId="long"                                    name="orderReceived" mandatory="Y"  outbound="delegate"/>
  <Attribute typeId="long"                                    name="orderSent"     mandatory="Y"  outbound="delegateGetAndSet"/>
</Base>

<Event id="NewOrderSingle" extends="OrderRequest" src="client">
  <Attribute typeId="viewstring[CLORDID_LENGTH]"    tag="11"  name="clOrdId"     mandatory="Y"     outbound="seperate"/>
</Event>

Sample external definition for a New Order in ETI … note each field must have a dictionary entry which defines its type in the external model.

<Message id="NewOrderRequestSimple"            msgType="10125">
    <Field id="msgSeqNum"               mand="Y"/>
    <Field id="senderSubID"             mand="Y"/>              
   
    <Field id="price"                   mand="Y"/>
    <Field id="senderLocationID"        mand="N"/>
    <Field id="clOrdId"                 mand="Y"/>
    <Field id="orderQty"                mand="Y"/>
    <Field id="filler1c"                len="4"/>   <!-- maxShow tag210 -->
    <Field id="simpleSecurityID"        mand="Y"/>
   
    <Field id="accountType"             mand="N"/>
    <Field id="side"                    mand="Y"/>
    <Field id="priceValidityCheckType"  mand="Y"/>
    <Field id="timeInForce"             mand="Y"/>
    <Field id="execInst"                mand="Y"/>
    <Field id="uniqueClientCode"        mand="N"/>
    <Field id="filler3"                 len="3" comment="pad3"/>     
</Message>

Sample CODEC for a New Order in BSE ETI

<MessageMap id="BaseBSEOrder" messageId="" ignore="true" extends="BaseRequest">
    <Map field="marketSegmentID"><Hook type="encode" code="encodeMarketSegmentID( msg.getInstrument() )"/></Map>

    <Map eventAttr="securityId" field="simpleSecurityID">
        <Hook type="encode" code="encodeSimpleSecurityId( msg.getInstrument() )"/>
        <Hook type="decode" code="_securityId = _builder.decodeUInt()"/>
    </Map>

    <Map field="priceValidityCheckType"><Hook type="encode" code="_builder.encodeByte( (byte)0 )"/></Map>
    <Map field="accountType"><Hook type="encode" code="_builder.encodeByte( (byte)20 )"/></Map>
    <Map field="maxPricePercentage"><Hook type="encode" code="_builder.encodePrice( 0.5 )"/></Map>
    <Map field="senderLocationID"><Hook type="encode" code="_builder.encodeLong( _locationId )"/></Map>
    <Map field="orderCapacity"><Hook type="encode" code="_builder.encodeByte( (byte)1 )"/></Map>
    <Map field="positionEffect"><Hook type="encode" code="_builder.encodeByte( (byte)'C' )"/></Map>
    <Map field="account"><Hook type="encode" code="_builder.encodeStringFixedWidth( _account, 2 )"/></Map>
    <Map field="applSeqIndicator"><Hook type="encode" code="_builder.encodeByte( (byte)0 )"/></Map>
    <Map field="execInst"><Hook type="encode" code="_builder.encodeByte( (byte)2 )"/></Map>
    <Map field="uniqueClientCode">
        <Hook type="encode" code="_builder.encodeStringFixedWidth( _uniqueClientCode, 12 )"/>
        <Hook type="decode" code="_builder.skip( 12 )"/>
    </Map>
</MessageMap>

<MessageMap id="NewLimitOrder"   eventId="NewOrderSingle" messageId="NewOrderRequestSimple" extends="BaseBSEOrder" encodeFunc="encodeNOS">
    <Map field="productComplex"><Hook type="encode" code="_builder.encodeByte( (byte)1 )"/></Map>
    <Hook type="postDecode" code="enrich( msg ) ; msg.setOrdType( OrdType.Limit )"/>
</MessageMap>

Note hooks allow overriding of the default code generation which is based on comparing the internal model dictionary entry with the external model dictionary entry. Map entries are only added for fields that don’t want the default behaviour.

Sample Generated Encoder for NOS :-

public final void encodeNewLimitOrder( final NewOrderSingle msg ) {
    final int now = _tzCalculator.getNowUTC();
    _builder.start( MSG_NewOrderRequestSimple );
    if ( _debug ) {
        _dump.append( "  encodeMap=" ).append( "NewOrderRequestSimple" ).append( "  eventType=" ).append( "NewOrderSingle" ).append( " : " );
    }

    if ( _debug ) _dump.append( "\nField: " ).append( "msgSeqNum" ).append( " : " );
    _builder.encodeUInt( (int)msg.getMsgSeqNum() );
    if ( _debug ) _dump.append( "\nHook : " ).append( "senderSubID" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeUInt( _senderSubID ); // senderSubID;
    if ( _debug ) _dump.append( "\nField: " ).append( "price" ).append( " : " );
    _builder.encodeDecimal( msg.getPrice() );
    if ( _debug ) _dump.append( "\nHook : " ).append( "senderLocationID" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeLong( _locationId );
    if ( _debug ) _dump.append( "\nField: " ).append( "clOrdId" ).append( " : " );
    _builder.encodeStringAsLong( msg.getClOrdId() );
    if ( _debug ) _dump.append( "\nField: " ).append( "orderQty" ).append( " : " );
    _builder.encodeQty( (int)msg.getOrderQty() );
    if ( _debug ) _dump.append( "\nField: " ).append( "filler1c" ).append( " : " );
    _builder.encodeFiller( 4 );
    if ( _debug ) _dump.append( "\nHook : " ).append( "simpleSecurityID" ).append( " : " ).append( "encode" ).append( " : " );
    encodeSimpleSecurityId( msg.getInstrument() );
    if ( _debug ) _dump.append( "\nHook : " ).append( "accountType" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeByte( (byte)20 );
    if ( _debug ) _dump.append( "\nField: " ).append( "side" ).append( " : " );
    _builder.encodeByte( transformSide( msg.getSide() ) );
    if ( _debug ) _dump.append( "\nHook : " ).append( "priceValidityCheckType" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeByte( (byte)0 );
    if ( _debug ) _dump.append( "\nField: " ).append( "timeInForce" ).append( " : " );
    final TimeInForce tTimeInForceBase = msg.getTimeInForce();
    final byte tTimeInForce = ( tTimeInForceBase == null ) ?  DEFAULT_TimeInForce : transformTimeInForce( tTimeInForceBase );
    _builder.encodeByte( tTimeInForce );
    if ( _debug ) _dump.append( "\nHook : " ).append( "execInst" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeByte( (byte)2 );
    if ( _debug ) _dump.append( "\nHook : " ).append( "uniqueClientCode" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeStringFixedWidth( _uniqueClientCode, 12 );
    if ( _debug ) _dump.append( "\nField: " ).append( "filler3" ).append( " : " );
    _builder.encodeFiller( 3 );
    _builder.end();
}


Sample debug log output … essential for debugging the model when testing exchange connectivity.

{  ENCODE msgType=10125  encodeMap=NewOrderRequestSimple  eventType=NewOrderSingle :
Field: msgSeqNum : uint 10,  bytes=4, offset=16, raw=[ 0A 00 00 00 ]
Hook : senderSubID : encode : uint 810701002,  bytes=4, offset=20, raw=[ CA 50 52 30 ]
Field: price : decimal 62.9,  bytes=8, offset=24, raw=[ 80 C8 E9 76 01 00 00 00 ]
Hook : senderLocationID : encode : long 1234567890123456,  bytes=8, offset=32, raw=[ C0 BA 8A 3C D5 62 04 00 ]
Field: clOrdId : stringAsLong 38470008  (len=8),  bytes=8, offset=40, raw=[ 78 01 4B 02 00 00 00 00 ]
Field: orderQty : qty 10,  bytes=4, offset=48, raw=[ 0A 00 00 00 ]
Field: filler1c : filler  len=4,  bytes=4, offset=52, raw=[ 00 00 00 00 ]
Hook : simpleSecurityID : encode : int 1000627,  bytes=4, offset=56, raw=[ B3 44 0F 00 ]
Hook : accountType : encode : byte ^T,  bytes=1, offset=60, raw=[ 14 ]
Field: side : byte ^A,  bytes=1, offset=61, raw=[ 01 ]
Hook : priceValidityCheckType : encode : byte ^@,  bytes=1, offset=62, raw=[ 00 ]
Field: timeInForce : byte ^@,  bytes=1, offset=63, raw=[ 00 ]
Hook : execInst : encode : byte ^B,  bytes=1, offset=64, raw=[ 02 ]
Hook : uniqueClientCode : encode : stringFixedWidth OWN  (len=12),  bytes=12, offset=65, raw=[ 4F 57 4E 00 00 00 00 00 00 00 00 00 ]
Field: filler3 : filler  len=3,  bytes=3, offset=77, raw=[ 00 00 00 ]
} bytes=80

16:15:12.445 [info]
 OUT [exchangeSession1]:
0000 P....'...............PR0...v.......<.b..x.K.......
0050 .......D.......OWN............
0100
     1        10        20        30        40        50
 OUT [exchangeSession1]:
0000 50 00 00 00 8D 27 00 00 00 00 00 00 00 00 00 00 0A 00 00 00 CA 50 52 30 80 C8 E9 76 01 00 00 00 C0 BA 8A 3C D5 62 04 00 78 01 4B 02 00 00 00 00 0A 00
0050 00 00 00 00 00 00 B3 44 0F 00 14 01 00 00 02 4F 57 4E 00 00 00 00 00 00 00 00 00 00 00 00
0100
     1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
byteCount=80

16:15:12.445 [info]   OUT [exchangeSession1]: NewOrderSingleImpl , clOrdId=38470008, account=, text=, exDest=, securityExchange=, price=62.9, orderQty=10, execInst=null, handlInst=null, orderCapacity=null, ordType=Limit, securityType=, securityIDSource=, timeInForce=Day, bookingType=null, orderReceived=[null], orderSent=[null], instrument=1000627, client=, origClOrdId=, securityId=, symbol=, currency=, transactTime=[null], sendingTime=[null], side=Buy, srcLinkId=, onBehalfOfId=, msgSeqNum=10, possDupFlag=N
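The raw bytes in the debug output above are little-endian. The sketch below reproduces a few of them with the JDK's ByteBuffer. It is not the generated `_builder` code; the class and method names are hypothetical, and the price scaling of 10^8 is inferred from the logged bytes rather than stated anywhere in the model.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LittleEndianSketch {

    /** Formats the bytes written so far as space separated upper case hex. */
    static String hex(ByteBuffer buf) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < buf.position(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", buf.get(i) & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes a 4 byte integer, least significant byte first. */
    static String encodeUInt(int value) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(value);
        return hex(buf);
    }

    /** Encodes an 8 byte integer, least significant byte first. */
    static String encodeLong(long value) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(value);
        return hex(buf);
    }

    /** Encodes a price as a long scaled by 10^8 (scaling inferred from the log). */
    static String encodePrice(double price) {
        return encodeLong(Math.round(price * 100_000_000L));
    }

    public static void main(String[] args) {
        System.out.println("msgSeqNum        " + encodeUInt(10));
        System.out.println("price            " + encodePrice(62.9));
        System.out.println("senderLocationID " + encodeLong(1234567890123456L));
    }
}
```

Being able to recompute a field by hand like this is a quick way to check the model when an exchange rejects a message.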



Because the code is generated for both encoding and decoding, the exchange simulator can simulate any exchange in the model.

I wrote the code generator from scratch in a couple of weeks. It's completely custom and not general purpose.

The resulting internal POJOs, interfaces and CODECs for external to internal translation come to over 100,000 lines of high quality generated code.



SubMicroTrading (SMT) Overview



I will relate a lot to the SubMicroTrading code base so an overview will help.

If enough people start following this blog I will share the "secrets" to building ultra low latency systems and start open sourcing code including the SubMicroFix engine ... possibly the fastest fix engine on the planet !

SubMicroTrading has been in development for over 4 years but is now on hold. It has been connected for testing to Eurex, CME and BSE, but it's not in production anywhere. Put that down to me being a technologist and a lousy marketer in what is, after all, a fairly small and difficult to approach market.


To summarise it’s a scalable, component based ultra low latency trading framework. It has Order Management, Market Data Sessions, Trading Sessions and all you need to build ultra low latency trading systems.

  1. AntiSpring . . . application bootstrapper, property file based, GC free, interceptor/proxy free, simple zero runtime overhead, component loader supporting dependency injection.
  2. Async Flexible Logger (copes with massive activity bursts 100,000+ log events per second mitigating GC).
  3. MemoryMapped Persistence with ISAM, with jitter avoidance when switching pages (can also free pages without GC finalisation) ... used for recoverable indexed session persistence.
  4. Low level utilities and GC free collections e.g. Int2IntMap, pooling pattern, java thread affinity, spinning thread multiplexors.
  5. SubMicroFix (sub microsecond fix engine, includes session state management, JMX admin, generated encoders/decoders). Tested against CME.
  6. Native binary trading session support (sub microsecond) for ETI (Eurex/BSE), FIX, Millennium, UTP.
  7. Native Market Data session support for FastFix tested with CME and BSE pcaps, SBE, ITCH.
  8. Session Drop copy support.
  9. Order Management System (sub-microsecond, support for in-line client limits, trade cancel/corrections and exchange normalisation .. excludes non latency sensitive trading functionality such as GTD orders).
  10. Basic Exchange simulator (can simulate any exchange in the model).
  11. Scalable Market Data Controller and Book conflation .. strategy snaps book to avoid mistrading due to concurrent book update during strategy cycle.
  12. Trading GUI shows price blotter, market orders, executions and supports multiple on-screen full depth books.
  13. Core strategy framework, include CME session loader to generate CME sessions on the fly as required by strats.
  14. Core strategy extensions with Spread Arbitrage Example and useful pipeline scalability example.
  15. Code Generator

Book Management - for an algo trading system this is the most critical component. The Book needs to allow updates asynchronously while minimising the potential for blockage by strats using the book.
SMT uses conflation so that strategies never have to process old ticks. When a strategy processes a tick update it snaps the book, taking a local copy (SMT optimises this so a snap is shareable with no contention), which allows the strategy to process the book atomically. After all, a strategy acting on a BBO where the ask changes just after it has read the bid is a high level risk.
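A minimal sketch of the snap idea follows. It is not SMT's implementation: it publishes an immutable BBO through an AtomicReference, which allocates on every update and so is not GC free, and all names are hypothetical. It only shows why a snapped copy keeps bid and ask consistent and why a slow reader naturally skips intermediate updates.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SnapBookSketch {

    /** Immutable best bid/offer pair: a holder can never see a half applied update. */
    static final class Bbo {
        final double bid;
        final double ask;

        Bbo(double bid, double ask) {
            this.bid = bid;
            this.ask = ask;
        }
    }

    private final AtomicReference<Bbo> latest =
            new AtomicReference<>(new Bbo(0.0, 0.0));

    /** Market data thread: publish both sides in one atomic step. */
    void update(double bid, double ask) {
        latest.set(new Bbo(bid, ask));
    }

    /** Strategy thread: take a consistent copy; older updates are simply skipped. */
    Bbo snap() {
        return latest.get();
    }

    public static void main(String[] args) {
        SnapBookSketch book = new SnapBookSketch();
        book.update(99.50, 100.00);
        Bbo snapped = book.snap();
        book.update(99.75, 100.25);   // arrives while the strategy is still working
        System.out.println("strategy sees " + snapped.bid + " / " + snapped.ask);
        System.out.println("next snap is  " + book.snap().bid + " / " + book.snap().ask);
    }
}
```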

SubMicroTrading can process over 800,000 market data tick updates per core per second. It can decode a fast fix message, update a book and encode an order to market, each in nanoseconds.

The SubMicroTrading system (with Intel servers from 2013) has been measured in a controlled lab with TipOff, wire to wire for "tick to trade", with an average of 4 microseconds at a rate of 800,000 ticks per second (that includes the UDP tick hop thru Solarflare on OpenOnload and the TCP order hop, a total overhead of more than 2 micros).

Redline, Chronicle, Onyx and FixNetix all have part of the above functionality. They are all currently used in production systems and have full time support. SubMicroTrading has not been proven in production, but it is provided with an exchange simulator and its performance can easily be proven. Don't forget also that the SOURCE CODE is available. Many of the techniques I used have been proven in production, eg object pooling, CPU spinning etc. As for which is quickest ... that I would expect to depend on the trading scenario ... peak tick rates, number of market data sessions, number of actively ticking books, number of orders generated per second, number of concurrent strategies triggered per tick.



Sunday 19 April 2015

Which Language is Best for Ultra Low Latency

A highly subjective and emotive subject. For me it's Java.

A C++ expert who is not also an expert in Java probably either doesn't believe in Java or has just played with it for a year or two. They simply don’t have the experience to compare fairly. I know because that used to be me !

"just because you don’t know how, doesn’t mean it's not possible"

A Java program can be 100 times faster than a functionally similar C++ program … fact.
The key is not the language: C, C++ and Java all end up as machine code. The biggest factor in performance is the application design, and in particular its threading model.

Note many patterns for low latency (core affinity, pooling etc) work just as well in Java as they do in C++

I have been asked by many why bother with Java, why not just use C++ ?  For the same reason we don’t use assembler anymore. The compilers are highly optimised, and using Java will minimise production downtime and maximise productivity while reducing project costs and avoiding the pain of trying to hire highly specialised and expensive expert devs / hardware experts.

Productivity is much higher with Java, it has much better tooling than C++.  Java systems excel with ease of maintenance, productive graduates, simpler code.

The argument against Java always comes back to jitter from GC. To be honest that’s simple to avoid with object pooling. The really painful grief comes from the JIT compiler. It's really hard to write good warmup code that exercises enough of the code base to get the best optimised code.
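For readers new to the pattern, here is a minimal single threaded object pool sketch. It is not SMT's pooling code (the SuperPools mentioned elsewhere on this blog are not shown here); the class names and the allocation counter are hypothetical and exist only to make the reuse visible.

```java
import java.util.ArrayDeque;

final class EventPool {

    /** A reusable, mutable event. reset() clears state before it is reused. */
    static final class Event {
        long seqNum;
        double price;

        void reset() {
            seqNum = 0;
            price = 0.0;
        }
    }

    private final ArrayDeque<Event> free;
    private int allocations;   // counts 'new Event()' calls, for illustration only

    /** Pre-allocates 'size' events up front, before any latency sensitive work. */
    EventPool(int size) {
        free = new ArrayDeque<>(size);
        for (int i = 0; i < size; i++) {
            free.push(new Event());
            allocations++;
        }
    }

    /** Takes an event from the pool, allocating only if the pool has run dry. */
    Event acquire() {
        Event e = free.poll();
        if (e == null) {
            e = new Event();
            allocations++;
        }
        return e;
    }

    /** Returns an event to the pool; the caller must not touch it afterwards. */
    void release(Event e) {
        e.reset();
        free.push(e);
    }

    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        EventPool pool = new EventPool(2);
        for (int i = 0; i < 1_000; i++) {
            Event e = pool.acquire();
            e.seqNum = i;
            pool.release(e);
        }
        System.out.println("allocations=" + pool.allocations());
    }
}
```

The risk mentioned later in this post is visible in release(): any code still holding a reference after release will see the event mutate under it.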

On C++ and avoiding virtual methods. I have seen low latency C++ systems which avoid any use of virtual methods to avoid vtable lookup. A lot of the code ends up looking like C, at which point I wonder why bother. Object Orientation is key in trading systems to avoid code bloat, maximise code reuse and significantly reduce development time while increasing quality. One nice feature in Java is the ability at runtime to determine that a method is not overridden, so that virtual methods can be inlined. Of course if you are messing about with dynamic class generation or custom class loaders you may well end up with recompilation jitter (honestly don’t do it -> KISS)

KISS - thanks to David Straker for the best IT lesson I ever had. Back in 1988 in my first developer role, I wrote a really complex C installation program for HPWord. It was damn complex and I thought it was clever. David kicked it back into touch and told me 'Keep It Simple Stupid'. Many years on I can honestly say that’s the most important principle I learnt in software development. If someone can't understand the code they can't maintain it.

In my experience Java systems are built in half the time, cost a third as much and are much more stable in production. That said, ultra low latency requires object reuse, which in itself brings risk regardless of language. I developed SubMicroTrading on a dual core Intel 1.6GHz, 4GB RAM, 11" Dell Adamo …. On that I could run the exchange simulator, client simulator and the complete SMT Trading System !  It wouldn’t have been possible with C++. The dev cycle, code + compile + run + rinse repeat, would have been many times slower.

On other languages, SCALA is a GC sink, and while functional code is great for prototyping strategies its implicit nature generates a lot of pretty inefficient code. The argument that the same code in SCALA compiles to the same code in Java misses the point. In my experience the algo written in SCALA can be much more efficiently rewritten in Java. SCALA is not suitable for ultra low latency in my opinion.

On GO, I haven't used it but was asked recently about using it. Honestly my reply was "why" … what's it going to give me that I don’t have with Java with regard to ultra low latency ? If I didn’t have over 4 years of IP in SubMicroTrading I would consider it …. But it’s a risk, maybe it's tomorrow's C# ?

FPGA Hybrid Systems … personally over the years I have seen various claims and promises which didn’t materialise (details of which I can't divulge due to the NDAs banks signed). Beware the smoke and mirrors: understand the true cost of latency in the hybrid system with wire to wire independent metrics for key use cases. Also ensure you understand the edge risks, eg hidden latency, partial TCP stacks, FPGA to main memory latency, and for the strategy, atomic book reading and skipping intermediate book updates.

Please do not poke me about C++ or FPGA, I quite frankly am tired of all the arguing and hype. If you want to use C++ / C / FPGA or whatever then that’s your choice, I hope you enjoy it.

What is Ultra Low Latency


There is no single answer, and it changes over time. In 1997 a single trade in 1 second was normal. 1 trade in 100ms was fast.

Around seven years ago low latency was considered sub millisecond. At that time I was asked if I could design and build a trading system that was the fastest on the street which was 500 microseconds. I said yes. When asked why I would succeed when others had failed, I replied that my background in compilers and real time mixed with IB experience meant I knew what I was doing and thus HotRod was born.

HotRod was a huge success, but when asked if I could get it down to 20 micros for FX I said it would require a complete rewrite. You can't take a 10 millisecond system and profile / change it to 100 microseconds. And you can't take a 100 microsecond system and profile / improve it down to under 10 micros, which is what I currently consider ultra low latency. I quit my job and started to write the SubMicroTrading framework from scratch. Every line of code has been written with latency in mind.

Low latency timings cannot be considered in isolation. You need to know the scoping of the statistics: what is the use case (eg wire to wire tick to trade), what is the throughput, what is the min/max/ave event rate etc.
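To show why a single number is not enough, the sketch below computes min, max and average over a handful of latency samples. The sample values are invented for illustration (assumptions, not measurements), and the class name is hypothetical.

```java
final class LatencyStats {

    static long min(long[] samples) {
        long m = Long.MAX_VALUE;
        for (long s : samples) {
            m = Math.min(m, s);
        }
        return m;
    }

    static long max(long[] samples) {
        long m = Long.MIN_VALUE;
        for (long s : samples) {
            m = Math.max(m, s);
        }
        return m;
    }

    static double average(long[] samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    public static void main(String[] args) {
        // Invented tick to trade samples in nanoseconds, with one outlier.
        long[] nanos = { 4_000, 4_100, 3_900, 50_000 };
        System.out.println("min=" + min(nanos)
                + " max=" + max(nanos)
                + " avg=" + average(nanos));
    }
}
```

One outlier in four samples pulls the average far away from the typical value, so a quoted latency only means something alongside the use case, the rate it was measured at and the spread.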





About Me


Taught myself BASIC at age 12 on a Commodore PET, at 14 I was writing computer games using 6502 Assembler on an Atari 400 (that was much fun, alas nothing published).

Worked commercially in compilers, O/S drivers, real time control systems, distributed databases as well as telecoms. Worked for 3 years in Japan …. Gomenasai, my Japanese is very rusty and poor …. I blame my wife as every time I spoke Japanese to her she replied in English !

Worked in Investment Banking since 1997 at Goldman Sachs, Morgan Stanley, Bank of America, UBS and others. Managed, designed and built various large trading frameworks including Hotrod and SubMicroTrading (written solely by me from scratch with no third party runtime libs other than hwloc).

Specialist in ultra low latency and trading systems.

My regret is being lousy at marketing and never using SubMicroTrading commercially.


Blog List


This blog is about the journey of designing and developing SubMicroTrading (TM) an ultra low latency trading system over the last 5 years. As I cover certain topics (from SuperPools to Book Conflation to Strategy Container) I will make some sections of the code freely available.

The goal of this technical blog is to raise awareness of the potential for Java in implementing Ultra Low Latency Systems and share the techniques I have used.

Please note that work on SubMicroTrading has halted. I have a full time job in an IB (alas not in ultra low latency) and after more than 4 years have no more spare time for it.

Please follow the blog. If I get enough interest I will start open sourcing components including SubMicroFix, a nanosecond level fix engine in Java. Possibly the fastest fix engine in the world !

Blog List (from latest to oldest, click link to view)

Avoid Unnecessary Allocations and Memcpy's

SubMicroTrading Open Sourced on GIT

SubMicroTrading Ultra Low Latency Open Source Prep

Setting Thread Affinity and Priority from Java

Coding For Ultra Low Latency

Java Bytecode Latency Impact

Java JVM Tuning for Ultra Low Latency

Hardware and Linux Tuning for Ultra Low Latency

Measuring Latency in Ultra Low Latency Systems

Recommendations for Ultra Low Latency

Holistic Latency For Ultra Low Latency

Models & Generating Codecs






Future Blog Subjects (among others)

Application Design Techniques
Avoid GC with SuperPools 
ThreadLocal cost and Alternative
Threading Model in Ultra Low Latency System
Avoiding GC in Java
Utilising Unsafe Class
Java Class Field Offsets
Disruptor Anti-Pattern
Object Models in Ultra Low Latency systems
AntiSpring
Anti Reuse Pattern Immutable
Working Set Size
Run 24 * 7
Collection Sizing
SuperPools
Thread Multiplexing
Custom NIO : bypassing lock overhead and exception stack prep
Custom Maps with Reusable Nodes
SME Persistence vs Chronicle
Dates
Book Conflation & Atomic Book Reads (snapping)
Scaling Dynamic Market Data Sessions with CME
Strategy Container
T1 Strategy
Example Spread Strategy
JMX Admin Commands
High Availability & Resiliency
What Would I do Differently Now