Ultra Low Latency Trading Systems

Monday, 4 May 2015

Linux Tuning for Ultra Low Latency

These are some of my notes from BIOS and Linux tuning for Ultra Low Latency with SubMicroTrading.

Don’t blindly copy ANY of the settings here; test each one for impact and pick the values that suit your system. I include them only as possible points of interest.

I have spent many weeks simply doing this; it’s tedious but necessary to determine the best settings for your system.

BIOS settings

Research every option. Remember to change one item at a time and run a benchtest to ascertain its impact.

Disable hyper-threading

Disable turbo mode if overclocking

Disable all options related to power saving (eg CPU C State support)

Set SATA Configuration to ENHANCED

ACPI Power Management Features : APIC ACPI SCI IRQ and High Precision Timer ENABLED

If using a PCIe 3 (PCI3) network card, ensure the slot holding the card HAS PCIe 3 enabled; the default may be PCIe 2

Operating System Tuning for Ultra Low Latency

It’s over 20 years since I worked at a low level with Unix/Linux, and the truth is I have forgotten more about the kernel than I now know. I am NOT an O/S specialist. For SubMicroTrading I didn’t have the luxury of paying someone to configure Linux for me, so I had to do it myself. I still have my trusty Stevens UNIX Network Programming book, which helped, and of course today we have Google! David Riddoch from Solarflare was also helpful in answering questions regarding Solarflare and OpenOnload tuning.

I started with Red Hat and dismissed the Realtime variant as it was slower for my benchtest. I currently recommend CentOS 5.10 (I am somewhat behind the later versions, but honestly, what do they have that helps with low latency?).

Read the Solarflare optimisation document; I only wish it had existed when I started! INSTALL OpenOnload!! I don’t understand why people working at microsecond-level latency still don’t use kernel bypass! OpenOnload is great as it’s non-intrusive and requires ZERO application code changes.

INSTALL SOLARFLARE (requires the Linux install to include a development environment) … note these notes are OLD, so versions will be well out of date

copy the Solarflare drivers to /INSTALL/SOLARFLARE

s1) rpmbuild --rebuild /INSTALL/SOLARFLARE/sfc-3.0.6.2199-1.src.rpm

==> CREATES /usr/src/redhat/RPMS/x86_64/kernel-module-sfc-RHEL5-2.6.18-194.el5-3.0.6.2199-1.x86_64.rpm

s2) Install the RPM

rpm -ivh /usr/src/redhat/RPMS/x86_64/kernel-module-sfc-RHEL5-2.6.18-194.el5-3.0.6.2199-1.x86_64.rpm

==> eth2 and eth3 are now available

==> use rpm -e first if an old version is installed

s3) install OpenOnload

tar -xvf openonload-20100923.tar

./scripts/onload_install

modprobe -r sfc

modprobe sfc

OpenOnload is now ready to use

s4) install BIOS update tools

gunzip SF-104451-LS-4_Solarstorm_Linux_and_VMware_ESX_Utilities_RPM.tgz

tar -xvf SF-104451-LS-4_Solarstorm_Linux_and_VMware_ESX_Utilities_RPM.tar

==> creates ==> sfutils-3.0.8.2216-1.rpm

rpm -ivh /INSTALL/SOLARFLARE/sfutils-3.0.8.2216-1.rpm

(if you get a clash with a previous version, use rpm -e {oldRpm})

s5) check if a BIOS update is required and, if so, apply it

sfupdate

sfupdate --write

onload_tool disable_cstates persist

s6) set up an env var with your required profile, e.g. latency :-

export PRERUN="onload --profile=latency "

s7) simply add $PRERUN to the start of your application invocation command

$PRERUN java …….

Beware O/S upgrades: CentOS/Red Hat 6 has some extra horrid latency which, after several days of tweaking, I still hadn’t eradicated … I went back to CentOS 5.10.

Here are my notes from CentOS / RedHat installation

Deselect virtualisation

Disable firewall

Disable SELinux

Delete SWAP partition (you don’t want swapping so ensure you have enough memory!)

Obviously disabling SELinux and the firewall is for benchtesting in a controlled environment. For colocation running you need to determine an appropriate security level for your org. If you must have a firewall between you and the exchange, then use a hardware one. Benchtest without security and then with security on, so you know the impact.

Protect cores against unwanted intrusion (will be discussed when I blog on using thread affinity)

Avoid millisecond latency impact by using a discrete threading model with core affinity via the kernel param isolcpus. The O/S scheduler won’t place general work on these cores, so you will need to use thread affinity to bind threads to the protected cores (code to follow in a later blog).

Edit the /boot/grub/grub.conf

kernel /vmlinuz-2.6.18-194.el5 ro root=LABEL=RH_ROOT nohz=off isolcpus=6,7,8,9,10,11 rhgb
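After rebooting it is worth confirming which cores are actually isolated before pinning threads to them. The sketch below is a minimal illustration, assuming the usual Linux cpu-list syntax (comma-separated ids and ranges such as 6-11) and reading the boot line from /proc/cmdline; the class and method names are hypothetical and not part of SubMicroTrading.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extracts the isolcpus core list from a kernel command line. */
final class IsolCpus {

    private IsolCpus() {}

    /** Expands a cpu-list such as "6,7,8" or "6-11" into individual core ids. */
    static List<Integer> parseCpuList(String list) {
        List<Integer> cores = new ArrayList<>();
        for (String raw : list.split(",")) {
            String part = raw.trim();
            if (part.isEmpty() || !Character.isDigit(part.charAt(0))) {
                continue; // skip empty entries and flags such as "domain" used by newer kernels
            }
            int dash = part.indexOf('-');
            if (dash < 0) {
                cores.add(Integer.parseInt(part));
            } else {
                int from = Integer.parseInt(part.substring(0, dash));
                int to = Integer.parseInt(part.substring(dash + 1));
                for (int c = from; c <= to; c++) cores.add(c);
            }
        }
        return cores;
    }

    /** Returns the isolated cores named on the command line, or an empty list if isolcpus is absent. */
    static List<Integer> fromCmdline(String cmdline) {
        for (String token : cmdline.trim().split("\\s+")) {
            if (token.startsWith("isolcpus=")) {
                return parseCpuList(token.substring("isolcpus=".length()));
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        try {
            String cmdline = String.join(" ",
                    Files.readAllLines(Paths.get("/proc/cmdline"), StandardCharsets.UTF_8));
            System.out.println("isolated cores: " + fromCmdline(cmdline));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```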

Kernel Params

There are many! Here are some to look at :-

transparent_hugepage=never

intel_idle.max_cstate=0

nohz=off

nosoftlockup

idle=poll

Disable unwanted services

/sbin/chkconfig --list | grep "5:on"

chkconfig irqbalance off

chkconfig anacron off

chkconfig atd off

chkconfig avahi-daemon off

chkconfig bluetooth off

chkconfig cups off

chkconfig hidd off

chkconfig isdn off

chkconfig pand off

chkconfig rhnsd off

chkconfig sendmail off

chkconfig cpuspeed off

chkconfig NetworkManager off

chkconfig iptables off

chkconfig ip6tables off

chkconfig libvirt-guests off

These are the services I disabled; obviously you need to ensure you don’t need a service before you disable it.

IRQBalance and CpuSpeed were the main services that I wished to disable … at the risk of sounding like a broken record: disable a single service, benchtest, rinse, repeat. Don’t disable ANY service without checking whether YOU need it first!

System Scripts : rc.local

Edit /etc/rc.local

ethtool -C eth2 adaptive-rx off

ethtool -C eth2 rx-usecs 0 rx-frames 0 rx-usecs-high 0 rx-usecs-low 0 pkt-rate-low 0 pkt-rate-high 0

ethtool -C eth2 rx-usecs-irq 60

ethtool -A eth2 rx off tx off

ethtool -C eth3 adaptive-rx off

ethtool -C eth3 rx-usecs 0 rx-frames 0 rx-usecs-high 0 rx-usecs-low 0 pkt-rate-low 0 pkt-rate-high 0

ethtool -C eth3 rx-usecs-irq 60

ethtool -A eth3 rx off tx off

echo 0 > /sys/class/net/eth2/device/lro

echo 0 > /sys/class/net/eth3/device/lro

Only use rx-usecs-irq 60 IF using OpenOnload …. You can experiment with this setting; I use spin reading so should never require an IRQ … but I found that if I set it lower or higher I could get nasty jitter.
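Spin reading means the application polls the socket in a tight loop rather than blocking and waiting to be woken by an interrupt. A minimal Java NIO sketch, assuming a non-blocking DatagramChannel on loopback; the class name and the maxSpins bound are hypothetical, and a real trading loop would spin indefinitely on a dedicated, isolated core (under OpenOnload the unchanged program would simply be launched with the $PRERUN prefix).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: busy-poll a non-blocking UDP channel instead of sleeping until an interrupt. */
final class SpinReader {

    private SpinReader() {}

    /** Opens a non-blocking channel bound to an ephemeral port on the loopback address. */
    static DatagramChannel openNonBlocking() {
        try {
            DatagramChannel ch = DatagramChannel.open();
            ch.bind(new InetSocketAddress("127.0.0.1", 0));
            ch.configureBlocking(false); // receive() now returns null instead of parking the thread
            return ch;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Polls receive() in a tight loop until a datagram arrives or maxSpins polls have been made.
     * On success the buffer is flipped ready for reading and the sender is returned; otherwise null.
     */
    static SocketAddress spinReceive(DatagramChannel ch, ByteBuffer buf, long maxSpins) {
        try {
            for (long i = 0; i < maxSpins; i++) {
                SocketAddress from = ch.receive(buf);
                if (from != null) {
                    buf.flip();
                    return from;
                }
                // nothing queued yet: poll again immediately (no sleep, no yield)
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (DatagramChannel rx = openNonBlocking(); DatagramChannel tx = DatagramChannel.open()) {
            tx.send(ByteBuffer.wrap("tick".getBytes(StandardCharsets.US_ASCII)), rx.getLocalAddress());
            ByteBuffer buf = ByteBuffer.allocate(1500);
            SocketAddress from = spinReceive(rx, buf, Long.MAX_VALUE);
            System.out.println("received " + buf.remaining() + " bytes from " + from);
        }
    }
}
```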

System Scripts : sysctl.conf

Edit /etc/sysctl.conf

kernel.sysrq = 0

kernel.core_uses_pid = 1

net.ipv4.tcp_low_latency=1

# Controls the default maximum size of a message queue, in bytes

kernel.msgmnb = 65536

# Controls the maximum size of a message, in bytes

kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes

kernel.shmmax = 68719476736

# Controls the total amount of shared memory, in pages

kernel.shmall = 4294967296

# isolcpus=6,7,8,9,10,11 is a kernel boot parameter (set on the grub.conf kernel line), not a sysctl key

kernel.vsyscall64 = 2

Some other useful CentOS "stuff"

Set Run Level

change /etc/inittab default run lvl from 5 to 3

id:3:initdefault:

Network config

/etc/sysconfig/network-scripts/ifcfg-eth*

After changing, run

/etc/init.d/network restart

Label Root Partition

Label the root partition, e.g. as RH_ROOT … so it’s not confused with any other later O/S installs

SET DEVICE LABEL READY FOR GRUB (** DON’T FORGET TO UPDATE fstab OR IT WON’T BOOT **)

e2label /dev/sda7 RH_ROOT

edit /etc/fstab

LABEL=RH_ROOT / ext3 defaults 1 1

Edit the /boot/grub/grub.conf

kernel /vmlinuz-2.6.18-194.el5 ro root=LABEL=RH_ROOT nohz=off isolcpus=6,7,8,9,10,11 rhgb

Check Kernel CPU Params

cat /sys/devices/system/cpu/cpuidle/current_driver

intel_idle.max_cstate=0 idle=poll transparent_hugepage=never processor.max_cstate=0
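A quick way to confirm these parameters actually reached the running kernel is to compare /proc/cmdline with the list you intended to set, and print the cpuidle driver at the same time. A minimal sketch; the class name is hypothetical and the EXPECTED list simply mirrors the parameters above, so edit it to match your own choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical checker: reports expected kernel boot parameters missing from a command line. */
final class KernelParamCheck {

    private KernelParamCheck() {}

    /** Mirrors the list above; adjust to the parameters you actually chose. */
    static final List<String> EXPECTED = Arrays.asList(
            "intel_idle.max_cstate=0", "idle=poll", "transparent_hugepage=never", "processor.max_cstate=0");

    /** Returns the entries of expected that do not appear as whole tokens in cmdline. */
    static List<String> missing(String cmdline, List<String> expected) {
        Set<String> tokens = new HashSet<>(Arrays.asList(cmdline.trim().split("\\s+")));
        List<String> absent = new ArrayList<>();
        for (String param : expected) {
            if (!tokens.contains(param)) absent.add(param);
        }
        return absent;
    }

    static String readFirstLine(Path p) throws IOException {
        List<String> lines = Files.readAllLines(p, StandardCharsets.UTF_8);
        return lines.isEmpty() ? "" : lines.get(0).trim();
    }

    public static void main(String[] args) throws IOException {
        String cmdline = readFirstLine(Paths.get("/proc/cmdline"));
        Path driver = Paths.get("/sys/devices/system/cpu/cpuidle/current_driver");
        if (Files.exists(driver)) {
            System.out.println("cpuidle driver: " + readFirstLine(driver));
        }
        System.out.println("missing params: " + missing(cmdline, EXPECTED));
    }
}
```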

Click here for my list of Ultra Low Latency Blogs and Future Topics

Sunday, 3 May 2015

Measuring Latency in Ultra Low Latency Systems

For years people have said: optimise last, don’t worry … you can fix it later.

For ultra low latency it’s simply not true. Every line of code should be written with optimisation in mind. When writing SubMicroTrading I wrote micro benchmarks to compare performance at a micro level. I tested pooling patterns, different queue implementations, and the impact of inheritance and generics. Some of the results were surprising. The one which stands out as misleading was the benchtest of generic pools versus concrete pools. Theoretically generics should have no runtime impact, but on a PC I had a micro benchmark which showed some latency spikes. Because of this I made my code generator generate a discrete pool and recycle class. Of course, when I reran the benchmarks on Linux with tuned JVM parameters there was zero difference!
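For readers unfamiliar with the pattern being benchmarked, here is a minimal sketch of a generic recycling pool, assuming single-threaded use (one pool per thread) and a caller-supplied factory. The names GenericPool and PooledOrder are hypothetical and are not SubMicroTrading’s generated pool/recycle classes; a concrete variant would be the same code with the element type hard-coded instead of T.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical single-threaded object pool: reuses instances instead of allocating per message. */
final class GenericPool<T> {

    private final ArrayDeque<T> free;
    private final Supplier<T> factory;

    GenericPool(Supplier<T> factory, int presize) {
        this.factory = factory;
        this.free = new ArrayDeque<>(presize);
        for (int i = 0; i < presize; i++) free.push(factory.get()); // allocate up front, before trading
    }

    /** Takes an instance from the pool, allocating only if the pool has run dry. */
    T get() {
        T item = free.poll(); // LIFO: the most recently recycled (cache-warm) instance comes back first
        return item != null ? item : factory.get();
    }

    /** Returns an instance for reuse; the caller must reset it and drop every other reference to it. */
    void recycle(T item) {
        free.push(item);
    }

    int available() {
        return free.size();
    }
}

/** Hypothetical pooled event with a reset method called before recycling. */
final class PooledOrder {
    long price;
    int qty;

    void reset() {
        price = 0;
        qty = 0;
    }
}
```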

It is absolutely essential to understand the key use cases and to have a controlled benchtest setup which you can guarantee is not in use when testing.

Colocation OMS

The first use case for SubMicroTrading was as a colocated, normalising order management system (OMS) with client risk checks. The OMS received client orders, validated them, ran risk checks, then sent a market order to the exchange. The OMS would normalise all exchange eccentricities and allow easy client customisation. This was before IB clients were allowed sponsored access. With sponsored access the need for ultra low latency order management systems disappeared … bad timing on my part.

With this use case I ran a client simulator and an exchange simulator on the sim server, and the OMS on the trade server. The client simulator stamped a nanosecond timestamp on each outgoing client order. When the OMS generated a market order, it stamped the order with the total time spent within the OMS as well as the original client order timestamp from the client simulator. The exchange simulator and client simulator both used thread affinity to the same core, so the nanosecond timestamps can be used to determine a realistic end-to-end time. You can guesstimate the time in the NICs and network by :-

TimeInNICsAndNetwork = ( timeMarketOrderReceived - clientOrderTimeStamp ) - timeInOMS

There were 4 NIC TCP hops, so roughly:

(Rough) NIC TCP hop = TimeInNICsAndNetwork / 4
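The arithmetic is simple, but it is easy to mix clocks, so here it is as a sketch. It assumes both timestamps are nanosecond readings from the same clock on the sim server, as described above, and that network time splits evenly across hops; the class and method names are hypothetical, and the numbers in main are illustrative only, not measurements.

```java
/** Hypothetical helper for the OMS benchtest: estimates NIC/network time from same-host timestamps. */
final class HopEstimate {

    private HopEstimate() {}

    /**
     * TimeInNICsAndNetwork = (timeMarketOrderReceived - clientOrderTimeStamp) - timeInOMS.
     * Both timestamps must come from the same clock (same host, ideally the same core).
     */
    static long timeInNicsAndNetwork(long clientOrderTimeStamp, long timeMarketOrderReceived, long timeInOms) {
        return (timeMarketOrderReceived - clientOrderTimeStamp) - timeInOms;
    }

    /** Rough per-hop cost, assuming the network time splits evenly across the hops. */
    static long roughNanosPerHop(long timeInNicsAndNetwork, int hops) {
        if (hops <= 0) throw new IllegalArgumentException("hops must be positive");
        return timeInNicsAndNetwork / hops;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 20us end to end, 4us inside the OMS, 4 NIC TCP hops.
        long network = timeInNicsAndNetwork(1_000_000L, 1_020_000L, 4_000L);
        System.out.println("network=" + network + "ns, perHop~" + roughNanosPerHop(network, 4) + "ns");
    }
}
```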

Because the timestamps are generated on one host (even on one core or one CPU), you don’t need expensive PTP time synchronisation. Using System.nanoTime to measure latency is not recommended: it reads the kernel clock source, which may be the HPET timer, and I saw aberrations of many milliseconds on long test runs. RDTSC seems to be more accurate but can have problems measuring across cores (I will include the code in a later blog on JNI).
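One way to see whether the timer itself is misbehaving on a given box is to call System.nanoTime back to back and record the smallest and largest gap between consecutive readings; on an otherwise idle, isolated core, large gaps point at the clock source or at interruptions rather than at your code. A minimal sketch; the class name and iteration count are arbitrary, and the result only describes the machine it runs on. The active clock source can be checked in /sys/devices/system/clocksource/clocksource0/current_clocksource.

```java
/** Hypothetical probe: measures the gap between back-to-back System.nanoTime() readings. */
final class NanoTimeProbe {

    private NanoTimeProbe() {}

    /** Minimum and maximum observed delta between consecutive readings, in nanoseconds. */
    static final class Result {
        final long minDelta;
        final long maxDelta;

        Result(long minDelta, long maxDelta) {
            this.minDelta = minDelta;
            this.maxDelta = maxDelta;
        }
    }

    /** Takes iterations readings (must be at least 1) and tracks the spread of the gaps. */
    static Result probe(int iterations) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long prev = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            long now = System.nanoTime();
            long delta = now - prev; // nanoTime values are only meaningful as differences
            if (delta < min) min = delta;
            if (delta > max) max = delta;
            prev = now;
        }
        return new Result(min, max);
    }

    public static void main(String[] args) {
        Result r = probe(10_000_000);
        System.out.println("min gap=" + r.minDelta + "ns, max gap=" + r.maxDelta + "ns");
    }
}
```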

To be honest, however, at this level all we are really concerned about is having a repeatable testbed. If we change some parameter, what’s the impact? Run the benchtest several times and measure the delta. Now we can see whether we made things better or worse. Now that Solarflare have added a packet capture facility, I would hope that capturing input/output would allow accurate perf measurements.

Algo Trading System : Tick To Trade

The second use case was for Tick to Trade. Here I wrote an algo container, optimal book manager and market data session handling using the SMT framework.

NIC1 was used for market data, and NIC2 for the trade connection.

In this use case, the simulation server replays captured market data using tcpreplay at various rates (from x1 to flat out, which was around 800,000 pps).

The trade server gets the market data through NIC1 into the market data session(s). CME has so many sessions that I wrote a dynamic session generator to lazily create sessions based on subscribed contracts. The market data is consumed by the algos, which can then send orders to the exchange session over NIC2. Each order is sent with a marker field identifying the tick that generated it. This allows accurate correlation of each order with the tick which really generated it.
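The marker field turns correlation into a lookup rather than a guess. A minimal sketch, assuming each tick carries a sequence number that the algo copies into the order’s marker field, and that tick receipt and order send are timestamped on the trade server with the same clock; the class and method names are hypothetical, and a production version would avoid boxing and evict old ticks (for example a primitive long-to-long map or a ring indexed by sequence).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tick-to-trade correlator keyed by the marker (tick sequence number) stamped on each order. */
final class TickToTrade {

    // Entries are never evicted in this sketch.
    private final Map<Long, Long> tickReceivedNanos = new HashMap<>();

    /** Called when a tick arrives: remember when the tick with this sequence number was received. */
    void onTick(long tickSeq, long receivedNanos) {
        tickReceivedNanos.put(tickSeq, receivedNanos);
    }

    /**
     * Called when an order is sent; marker is the tick sequence copied into the order.
     * Returns the tick-to-trade latency in nanos, or -1 if the tick is unknown.
     */
    long onOrderSent(long marker, long sentNanos) {
        Long received = tickReceivedNanos.get(marker);
        return received == null ? -1 : sentNanos - received;
    }
}
```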

Hybrid FPGA solutions, market data providers and exchange adapters all need to be considered holistically, within the trading system as a whole, for the KEY use cases of the system. Benchtest them not at the rate of a quiet day but at the maximum rate you may need … consider the highest market spike rates, then try normal market rate x1, x5, x10 and x100 to understand the different performance levels.

For really accurate figures we need to use NIC splitters on the trade server and capture input/output from both NICs. I have done this most recently with a TipOFF device (with a Grandmaster clock and Solarflare PTP 10GbE NICs, which have CPU-independent host clock adjustment). Here you measure reasonably accurate wire-to-wire latency. This is the best way to compare tick-to-trade performance for a system as a whole and thus bypass the smoke and mirrors from component service providers.

I have seen people working on ultra low latency FX systems state that P99.9999 is most important because arb opportunities are 1 in 1,000,000, and they believe that the opportunity at the exchange will occur at the time the trading system has any jitter … of course it was completely uncorroborated and personally I think it was tosh. Arb opportunities are likely to happen during peak loads, so it’s critical that the system has throughput that can cope with max load without, say, causing hidden packet backup in network buffers (beware dodgy partial TCP stack implementations that are likely to fail during network congestion / peaks).

Key measurements in my mind are the P50, P90, P95, P99 … with P95 being the most important. I am sure there will be plenty of people who disagree. But at the end of the day the only really important stat is how much money you make !!
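Computing these percentiles from a benchtest run is straightforward once the latencies are collected. A minimal sketch using the nearest-rank method on a sorted copy of the samples (other definitions interpolate between ranks and give slightly different values); the class name is hypothetical. Running it over each benchtest run gives the per-run figures whose deltas you compare.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical nearest-rank percentile helper for benchtest latency samples (e.g. nanos). */
final class Percentiles {

    private Percentiles() {}

    /** Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it. */
    static long percentile(long[] sortedSamples, double p) {
        if (sortedSamples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        int rank = (int) Math.ceil(p * sortedSamples.length / 100.0); // 1-based rank
        return sortedSamples[rank - 1];
    }

    /** Returns the P50/P90/P95/P99 summary; the input array is not modified. */
    static String summary(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return String.format(Locale.ROOT, "P50=%d P90=%d P95=%d P99=%d",
                percentile(sorted, 50), percentile(sorted, 90),
                percentile(sorted, 95), percentile(sorted, 99));
    }
}
```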

Ensure that any monitoring processes on trade servers are lightweight, do NO GC and do not impact the trading system’s performance. Run the testbed with and without them for a few days to see the impact.


Saturday, 2 May 2015

Recommendations for Ultra Low Latency (ULL)

OS -> CentOS 5.10

Use Red Hat if you can afford it. Don’t get Realtime: it’s slower, though more deterministic with finer-grained scheduling … not best for ULL.

Avoid O/S upgrades unless you NEED them … the devs are constantly adding new power-saving tweaks which introduce jitter …. Red Hat 5 to Red Hat 6 was horrendous. I spent a week trying to undo all the new switches they had put on and failed …. Given I had no need for v6, I went back to v5.10.

Don’t just turn on "Huge Pages" or other O/S or language settings. Test them in a testbed first … you may well be surprised; the key is the effect of the change on the system as a whole. Every tweak has pluses and minuses, and they differ per system … so test, measure, rinse, repeat.

Hardware -> Fastest Intel CPU with lowest latency RAM

Using overclocked CPUs with non-ECC memory running 24x7 brings a risk of crashing; if, however, the extra 10% to 30% performance boost is the difference between making money and not making money, then a certain level of risk will be acceptable. I have run overclocked X5680s with ECC RAM and i7s at 5GHz with overclocked memory for weeks under load without crashing, so it is possible to achieve stable overclocking.

Solarflare NIC with Open Onload

My first NICs were top Mellanox cards in 2010; installation was awful and performance was terrible at high throughput. Support was not great, and the one-sided TCP acceleration was useless for colocation trading. I signed an NDA so won’t say more, but after two months of pain I switched to Chelsio. Chelsio was just as bad to install and performance was even worse; for a top 10Gb NIC, very disappointing.

I got my first Solarflare card in 2010; installation was a breeze and the cards outperformed Chelsio and Mellanox with no tuning. With OpenOnload and the simplest tuning parameter ever (--profile=latency) they blew Mellanox completely away. My advice is to ignore all the perf stats the NIC providers quote and test yourself in a controlled environment with two servers having dual NICs connected directly together (no switch or anything else in the way).

Language -> Java 1.8

I use Sun … er sorry, Oracle standard Java 1.8 (don’t use Realtime Java) …. No real perf difference between 1.6 and 1.8 for SMT.

There will be future blogs on JVM args, on application threading models, and on API design and its latency impact.

Tools / Third Party Libs

In the world of ULL I avoid third party libs due to lack of control over GC, threading model and JIT jitter. That said, "hwloc" has been invaluable for its abstraction layer for core binding.

A future blog will show how to do thread affinity in Java.

hwloc, hwinfo, i7z


Sunday, 26 April 2015

Holistic Latency for Ultra Low Latency Systems

Definition of holistic

"characterized by the belief that the parts of something are intimately interconnected and explicable only by reference to the whole."

It’s crucial to understand all the components that make up a trading system, as each will have its own latency characteristics.

Layer	Sample Factor	Mitigation	Effort
Software Program	Design	Key design patterns / extensible framework / efficient code	Hard
Language	Inherent	Compile time args / run time args	Easy
	GC	Object Pooling	Medium
	JIT	Warmup Code	High
Operating System	Scheduler	Thread Affinity / Core Spinning	Easy
	Hard Page Fault	Appropriate memory in server, Process sizing	Easy
	Network Stack	Kernel Bypass Drivers and Tuning, Socket spinning	Easy
Hardware	CPU	Disable H/T, get fastest CPU, Overclock	Easy
	Memory	Buy memory with lowest latency and ensure enough	Easy
	Network	Buy Solarflare NIC	Easy
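The "Warmup Code" mitigation for JIT jitter amounts to driving the hot path with representative but non-live data before trading starts, so the JIT has compiled it before the first real message arrives. A minimal sketch; the iteration count is an assumption (HotSpot compile thresholds depend on JVM version and flags, so confirm with -XX:+PrintCompilation), and hotPath is a hypothetical stand-in for a real codec or algo method.

```java
/** Hypothetical warmup harness: exercises a hot-path method before live traffic so the JIT compiles it early. */
final class Warmup {

    private Warmup() {}

    /** Stand-in for a real hot path, e.g. encoding an order; here it just mixes the inputs. */
    static long hotPath(long price, int qty) {
        return price * 31 + qty;
    }

    /**
     * Calls hotPath with synthetic data. The returned checksum should be consumed by the caller
     * so the JIT cannot eliminate the loop as dead code.
     */
    static long warmUp(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            checksum += hotPath(1_000L + i, i & 0xFF);
        }
        return checksum;
    }

    public static void main(String[] args) {
        // Assumed iteration count; check with -XX:+PrintCompilation that hotPath compiles before go-live.
        long checksum = warmUp(50_000);
        System.out.println("warmup done, checksum=" + checksum);
    }
}
```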

I have built four rack servers with a box full of Chelsio, Mellanox and Solarflare NICs. By far the easiest to install, the easiest to tune and the best performing was Solarflare. I was really disappointed in the Mellanox cards. Solarflare OpenOnload provides one-sided acceleration suitable for colocation purposes at no extra cost. This was several years ago, so maybe Mellanox have their own one-sided acceleration now, but for me it’s come too late. I have preached Solarflare NICs to everyone I know.

Consider the following scenario

Monday, 20 April 2015

MODELs & Generated Code

Hand-cranked codecs are a pain to write and a pain to upgrade; I have written many over the years! The solution is to generate the code.

An XML model defines the internal model, with POJOs that all components can work with, e.g. NewOrderSingle, NewOrderAck, TradeNew. It also defines external models, which can be client or exchange, FIX variants or binary protocols such as ETS, Millennium, UTP etc. Finally it defines codecs, which specify how to translate the external model to/from the internal model.

Sample Internal Event for a New Order Single

<Base id="BaseOrderRequest" src="client" extends="CommonClientHeader">
  <Attribute typeId="Instrument" name="instrument" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="ClientProfile" name="client" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="viewstring[CLORDID_LENGTH]" tag="11" name="clOrdId" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="viewstring[CLORDID_LENGTH]" tag="41" name="origClOrdId" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="viewstring[SECURITYID_LENGTH]" tag="48" name="securityId" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="viewstring[SYMBOL_LENGTH]" tag="55" name="symbol" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="Currency" tag="15" name="currency" mandatory="N" outbound="seperate"/>
  <Attribute typeId="SecurityIDSource" tag="22" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="UTCTimestamp" tag="60" name="transactTime" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="UTCTimestamp" tag="52" name="sendingTime" outbound="seperate"/>
  <Attribute typeId="Side" tag="54" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="viewstring[SRC_LINKID_LENGTH]" name="srcLinkId" mandatory="N" outbound="delegate"/>
</Base>

<Base id="OrderRequest" src="client" extends="BaseOrderRequest">
  <Attribute typeId="viewstring[ACCOUNT_LENGTH]" tag="1" name="account" mandatory="N" outbound="delegate"/>
  <Attribute typeId="viewstring[TEXT_LENGTH]" tag="58" name="text" mandatory="N" outbound="delegate"/>
  <Attribute typeId="viewstring[EXDESTINATION_LENGTH]" tag="100" name="exDest" mandatory="N" outbound="delegate"/>
  <Attribute typeId="viewstring[SECURITYEXCH_LENGTH]" tag="207" name="securityExchange" mandatory="N" outbound="delegate"/>
  <Attribute typeId="double" tag="44" name="price" mandatory="Y" outbound="seperate"/>
  <Attribute typeId="int" tag="38" name="orderQty" mandatory="Y" outbound="seperate"/>
  <Attribute typeId="ExecInst" tag="18" outbound="delegate"/>
  <Attribute typeId="HandlInst" tag="21" outbound="delegate"/>
  <Attribute typeId="OrderCapacity" tag="528" outbound="seperate"/>
  <Attribute typeId="OrdType" tag="40" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="SecurityType" tag="167" outbound="delegate"/>
  <Attribute typeId="SecurityIDSource" tag="22" outbound="delegate"/>
  <Attribute typeId="TimeInForce" tag="59" outbound="delegate"/>
  <Attribute typeId="BookingType" tag="775" outbound="delegate"/>
  <Attribute typeId="long" name="orderReceived" mandatory="Y" outbound="delegate"/>
  <Attribute typeId="long" name="orderSent" mandatory="Y" outbound="delegateGetAndSet"/>
</Base>

<Event id="NewOrderSingle" extends="OrderRequest" src="client">
  <Attribute typeId="viewstring[CLORDID_LENGTH]" tag="11" name="clOrdId" mandatory="Y" outbound="seperate"/>
</Event>

Sample external definition for a New Order in ETI … note that each field must have a dictionary entry which defines its type in the external model.

<Message id="NewOrderRequestSimple" msgType="10125">
    <Field id="msgSeqNum" mand="Y"/>
    <Field id="senderSubID" mand="Y"/>
    <Field id="price" mand="Y"/>
    <Field id="senderLocationID" mand="N"/>
    <Field id="clOrdId" mand="Y"/>
    <Field id="orderQty" mand="Y"/>
    <Field id="filler1c" len="4"/>   <!-- maxShow tag210 -->
    <Field id="simpleSecurityID" mand="Y"/>
    <Field id="accountType" mand="N"/>
    <Field id="side" mand="Y"/>
    <Field id="priceValidityCheckType" mand="Y"/>
    <Field id="timeInForce" mand="Y"/>
    <Field id="execInst" mand="Y"/>
    <Field id="uniqueClientCode" mand="N"/>
    <Field id="filler3" len="3" comment="pad3"/>
</Message>

Sample CODEC for a New Order in BSE ETI

<MessageMap id="BaseBSEOrder" messageId="" ignore="true" extends="BaseRequest">
    <Map field="marketSegmentID"><Hook type="encode" code="encodeMarketSegmentID( msg.getInstrument() )"/></Map>
    <Map eventAttr="securityId" field="simpleSecurityID">
        <Hook type="encode" code="encodeSimpleSecurityId( msg.getInstrument() )"/>
        <Hook type="decode" code="_securityId = _builder.decodeUInt()"/>
    </Map>
    <Map field="priceValidityCheckType"><Hook type="encode" code="_builder.encodeByte( (byte)0 )"/></Map>
    <Map field="accountType"><Hook type="encode" code="_builder.encodeByte( (byte)20 )"/></Map>
    <Map field="maxPricePercentage"><Hook type="encode" code="_builder.encodePrice( 0.5 )"/></Map>
    <Map field="senderLocationID"><Hook type="encode" code="_builder.encodeLong( _locationId )"/></Map>
    <Map field="orderCapacity"><Hook type="encode" code="_builder.encodeByte( (byte)1 )"/></Map>
    <Map field="positionEffect"><Hook type="encode" code="_builder.encodeByte( (byte)'C' )"/></Map>
    <Map field="account"><Hook type="encode" code="_builder.encodeStringFixedWidth( _account, 2 )"/></Map>
    <Map field="applSeqIndicator"><Hook type="encode" code="_builder.encodeByte( (byte)0 )"/></Map>
    <Map field="execInst"><Hook type="encode" code="_builder.encodeByte( (byte)2 )"/></Map>
    <Map field="uniqueClientCode">
        <Hook type="encode" code="_builder.encodeStringFixedWidth( _uniqueClientCode, 12 )"/>
        <Hook type="decode" code="_builder.skip( 12 )"/>
    </Map>
</MessageMap>

<MessageMap id="NewLimitOrder" eventId="NewOrderSingle" messageId="NewOrderRequestSimple" extends="BaseBSEOrder" encodeFunc="encodeNOS">
    <Map field="productComplex"><Hook type="encode" code="_builder.encodeByte( (byte)1 )"/></Map>
    <Hook type="postDecode" code="enrich( msg ) ; msg.setOrdType( OrdType.Limit )"/>
</MessageMap>

Note that hooks allow overriding of the default code generation, which is based on comparing the internal model dictionary entry with the external model dictionary entry. Map entries are only added for fields that don’t want the default behaviour.
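The hook mechanism can be illustrated with a toy generator: for each field of the external message it emits a default encode call chosen by the field’s type, unless the codec supplies a hook, in which case the hook’s code is emitted verbatim. This is a minimal sketch under assumptions, not the SubMicroTrading generator: the type names (uint, price, qty) and their default calls are hypothetical, loosely modelled on the calls in the generated encoder, whereas the real defaults come from comparing internal and external dictionary entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical toy code generator: default encode call per field type, overridden by codec hooks. */
final class ToyEncoderGen {

    private ToyEncoderGen() {}

    /** Assumed default builder call per external type; "%s" is replaced by the getter suffix. */
    private static final Map<String, String> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put("uint", "_builder.encodeUInt( (int)msg.get%s() );");
        DEFAULTS.put("price", "_builder.encodeDecimal( msg.get%s() );");
        DEFAULTS.put("qty", "_builder.encodeQty( (int)msg.get%s() );");
    }

    /**
     * fields: external field name -> external type, in wire order.
     * hooks: field name -> encode code replacing the default (only for fields that need it).
     */
    static String generate(String method, String eventType, Map<String, String> fields, Map<String, String> hooks) {
        StringBuilder src = new StringBuilder();
        src.append("public final void ").append(method).append("( final ").append(eventType).append(" msg ) {\n");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            String name = f.getKey();
            String hook = hooks.get(name);
            if (hook != null) {
                src.append("    ").append(hook).append(";\n"); // hook code is used verbatim
            } else {
                String template = DEFAULTS.get(f.getValue());
                if (template == null) throw new IllegalArgumentException("no default for type " + f.getValue());
                String getter = Character.toUpperCase(name.charAt(0)) + name.substring(1);
                src.append("    ").append(String.format(template, getter)).append('\n');
            }
        }
        return src.append("}\n").toString();
    }
}
```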

Sample Generated Encoder for NOS :-

public final void encodeNewLimitOrder( final NewOrderSingle msg ) {

    final int now = _tzCalculator.getNowUTC();

    _builder.start( MSG_NewOrderRequestSimple );

    if ( _debug ) {
        _dump.append( "  encodeMap=" ).append( "NewOrderRequestSimple" ).append( "  eventType=" ).append( "NewOrderSingle" ).append( " : " );
    }

    if ( _debug ) _dump.append( "\nField: " ).append( "msgSeqNum" ).append( " : " );
    _builder.encodeUInt( (int)msg.getMsgSeqNum() );

    if ( _debug ) _dump.append( "\nHook : " ).append( "senderSubID" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeUInt( _senderSubID ); // senderSubID;

    if ( _debug ) _dump.append( "\nField: " ).append( "price" ).append( " : " );
    _builder.encodeDecimal( msg.getPrice() );

    if ( _debug ) _dump.append( "\nHook : " ).append( "senderLocationID" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeLong( _locationId );

    if ( _debug ) _dump.append( "\nField: " ).append( "clOrdId" ).append( " : " );
    _builder.encodeStringAsLong( msg.getClOrdId() );

    if ( _debug ) _dump.append( "\nField: " ).append( "orderQty" ).append( " : " );
    _builder.encodeQty( (int)msg.getOrderQty() );

    if ( _debug ) _dump.append( "\nField: " ).append( "filler1c" ).append( " : " );
    _builder.encodeFiller( 4 );

    if ( _debug ) _dump.append( "\nHook : " ).append( "simpleSecurityID" ).append( " : " ).append( "encode" ).append( " : " );
    encodeSimpleSecurityId( msg.getInstrument() );

    if ( _debug ) _dump.append( "\nHook : " ).append( "accountType" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeByte( (byte)20 );

    if ( _debug ) _dump.append( "\nField: " ).append( "side" ).append( " : " );
    _builder.encodeByte( transformSide( msg.getSide() ) );

    if ( _debug ) _dump.append( "\nHook : " ).append( "priceValidityCheckType" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeByte( (byte)0 );

    if ( _debug ) _dump.append( "\nField: " ).append( "timeInForce" ).append( " : " );
    final TimeInForce tTimeInForceBase = msg.getTimeInForce();
    final byte tTimeInForce = ( tTimeInForceBase == null ) ? DEFAULT_TimeInForce : transformTimeInForce( tTimeInForceBase );
    _builder.encodeByte( tTimeInForce );

    if ( _debug ) _dump.append( "\nHook : " ).append( "execInst" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeByte( (byte)2 );

    if ( _debug ) _dump.append( "\nHook : " ).append( "uniqueClientCode" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeStringFixedWidth( _uniqueClientCode, 12 );

    if ( _debug ) _dump.append( "\nField: " ).append( "filler3" ).append( " : " );
    _builder.encodeFiller( 3 );

    _builder.end();
}

Sample debug log output … essential for debugging the model when testing exchange connectivity.

{  ENCODE msgType=10125  encodeMap=NewOrderRequestSimple  eventType=NewOrderSingle :
Field: msgSeqNum : uint 10,  bytes=4, offset=16, raw=[ 0A 00 00 00 ]
Hook : senderSubID : encode : uint 810701002,  bytes=4, offset=20, raw=[ CA 50 52 30 ]
Field: price : decimal 62.9,  bytes=8, offset=24, raw=[ 80 C8 E9 76 01 00 00 00 ]
Hook : senderLocationID : encode : long 1234567890123456,  bytes=8, offset=32, raw=[ C0 BA 8A 3C D5 62 04 00 ]
Field: clOrdId : stringAsLong 38470008  (len=8),  bytes=8, offset=40, raw=[ 78 01 4B 02 00 00 00 00 ]
Field: orderQty : qty 10,  bytes=4, offset=48, raw=[ 0A 00 00 00 ]
Field: filler1c : filler  len=4,  bytes=4, offset=52, raw=[ 00 00 00 00 ]
Hook : simpleSecurityID : encode : int 1000627,  bytes=4, offset=56, raw=[ B3 44 0F 00 ]
Hook : accountType : encode : byte ^T,  bytes=1, offset=60, raw=[ 14 ]
Field: side : byte ^A,  bytes=1, offset=61, raw=[ 01 ]
Hook : priceValidityCheckType : encode : byte ^@,  bytes=1, offset=62, raw=[ 00 ]
Field: timeInForce : byte ^@,  bytes=1, offset=63, raw=[ 00 ]
Hook : execInst : encode : byte ^B,  bytes=1, offset=64, raw=[ 02 ]
Hook : uniqueClientCode : encode : stringFixedWidth OWN  (len=12),  bytes=12, offset=65, raw=[ 4F 57 4E 00 00 00 00 00 00 00 00 00 ]
Field: filler3 : filler  len=3,  bytes=3, offset=77, raw=[ 00 00 00 ]
} bytes=80

16:15:12.445 [info]  OUT [exchangeSession1]:
0000 P....'...............PR0...v.......<.b..x.K.......
0050 .......D.......OWN............

 OUT [exchangeSession1]:
50 00 00 8D 27 00 00 00 00 00 00 00 00 00 00 0A 00 00 00 CA 50 52 30 80 C8 E9 76 01 00 00 C0 BA 8A 3C D5 62 04 00 78 01 4B 02 00 00 00 00 0A 00
0050 00 00 00 00 00 00 B3 44 0F 00 14 01 00 00 02 4F 57 4E 00 00 00 00 00 00 00 00 00 00 00 00

byteCount=80

16:15:12.445 [info]   OUT [exchangeSession1]:
NewOrderSingleImpl , clOrdId=38470008, account=, text=, exDest=, securityExchange=, price=62.9, orderQty=10, execInst=null, handlInst=null, orderCapacity=null, ordType=Limit, securityType=, securityIDSource=, timeInForce=Day, bookingType=null, orderReceived=[null], orderSent=[null], instrument=1000627, client=, origClOrdId=, securityId=, symbol=, currency=, transactTime=[null], sendingTime=[null], side=Buy, srcLinkId=, onBehalfOfId=, msgSeqNum=10, possDupFlag=N
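The raw bytes in the debug log can be reproduced with a few lines of ByteBuffer code, which is handy when checking a model against an exchange spec. A minimal sketch, not the SMT builder: it assumes little-endian integers and a price scaled by 10^8 into a signed 64-bit value, both consistent with the log (msgSeqNum 10 gives 0A 00 00 00, price 62.9 gives 80 C8 E9 76 01 00 00 00); the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Locale;

/** Hypothetical little-endian field encoder reproducing the byte layout seen in the ETI debug log. */
final class LeEncoderSketch {

    private static final double PRICE_SCALE = 1e8; // assumed: 62.9 -> 6_290_000_000

    private final ByteBuffer buf;

    LeEncoderSketch(int capacity) {
        buf = ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** 4 bytes, low byte first; values up to 2^32-1 keep their low 32 bits. */
    void encodeUInt(long value) { buf.putInt((int) value); }

    /** 8 bytes, low byte first. */
    void encodeLong(long value) { buf.putLong(value); }

    /** Scaled fixed-point price in 8 bytes, rounding away floating-point noise. */
    void encodePrice(double price) { buf.putLong(Math.round(price * PRICE_SCALE)); }

    void encodeByte(byte b) { buf.put(b); }

    void encodeFiller(int len) {
        for (int i = 0; i < len; i++) buf.put((byte) 0);
    }

    /** Hex of the bytes written so far, formatted like the raw=[ ... ] entries in the log. */
    String hex() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < buf.position(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format(Locale.ROOT, "%02X", buf.get(i) & 0xFF));
        }
        return sb.toString();
    }
}
```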

Because the code is generated for both encoding and decoding, the exchange simulator can simulate any exchange in the model.

I wrote the code generator from scratch in a couple of weeks; it’s completely custom and not general purpose.

The resulting internal POJO interfaces and the CODECs for external-to-internal translation add up to over 100,000 lines of generated, high-quality code.
