Monday, 4 May 2015

Linux Tuning for Ultra Low Latency

These are some of my notes from BIOS and Linux tuning for Ultra Low Latency with SubMicroTrading.

Don’t blindly copy ANY settings here; test each one for impact and pick the values that suit your system. I include them only as possible points of interest.
I have spent many weeks doing exactly this. It’s tedious but necessary to determine the best settings for your system.

BIOS settings

Research every option. Remember to change one item at a time and run a benchtest to ascertain its impact.

Disable hyper-threading
Disable turbo mode if overclocking
Disable all options related to power saving (e.g. CPU C-state support)
Set SATA Configuration to ENHANCED
ACPI Power Management Features: APIC ACPI SCI IRQ and High Precision Timer ENABLED
If using a PCIe 3.0 network card, ensure the slot holding the card HAS PCIe 3.0 enabled; the default may be PCIe 2.0

Operating System Tuning for Ultra Low Latency

It’s over 20 years since I worked at a low level with Unix/Linux, and the truth is I have forgotten more about the kernel than I now know. I am NOT an O/S specialist. For SubMicroTrading I didn’t have the luxury of paying someone to configure Linux for me, so I had to do it myself. My trusty copy of Stevens’ UNIX Network Programming helped, and of course today we have Google! David Riddoch from Solarflare was also helpful in answering questions about Solarflare and OpenOnload tuning.

I started with RedHat and dismissed the Realtime variant as it was slower in my benchtest. I currently recommend CentOS 5.10 (I am somewhat behind the later versions, but honestly, what do they have that helps with low latency?).

Read the Solarflare optimisation document; I only wish it had existed when I started! INSTALL OpenOnload!! I don’t understand why people working at microsecond-level latency still don’t use kernel bypass. OpenOnload is great because it is non-intrusive and requires ZERO application code changes.

INSTALL SOLARFLARE (requires the Linux install to include a development environment) … note these notes are OLD, so the versions will be well out of date

Copy the Solarflare drivers to /INSTALL/SOLARFLARE

s1) rpmbuild --rebuild /INSTALL/SOLARFLARE/sfc-3.0.6.2199-1.src.rpm

==> CREATES  /usr/src/redhat/RPMS/x86_64/kernel-module-sfc-RHEL5-2.6.18-194.el5-3.0.6.2199-1.x86_64.rpm

s2) Install the RPM

rpm -ivh /usr/src/redhat/RPMS/x86_64/kernel-module-sfc-RHEL5-2.6.18-194.el5-3.0.6.2199-1.x86_64.rpm

==> eth2 and eth3 are now available
==> use rpm -e first if an old version is installed

s3) install OpenOnload

    tar -xvf openonload-20100923.tar
    ./scripts/onload_install
    modprobe -r sfc
    modprobe sfc

    OpenOnload is now ready to use

s4) install BIOS update tools
   
    gunzip SF-104451-LS-4_Solarstorm_Linux_and_VMware_ESX_Utilities_RPM.tgz
    tar -xvf SF-104451-LS-4_Solarstorm_Linux_and_VMware_ESX_Utilities_RPM.tar
    ==> creates ==> sfutils-3.0.8.2216-1.rpm
    rpm -ivh /INSTALL/SOLARFLARE/sfutils-3.0.8.2216-1.rpm
   
    (if you get a clash with a previous version, use rpm -e {oldRpm})
   
s5) check whether a NIC BIOS/firmware update is required and, if so, apply it
   
    sfupdate
    sfupdate --write
   
    onload_tool disable_cstates persist


s6) set up an env var with your required profile, e.g. latency:

export PRERUN="onload --profile=latency "


s7) simply add $PRERUN to the start of your application invocation command

$PRERUN java …….
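Since it is easy to forget the $PRERUN prefix and end up benchmarking the kernel stack by mistake, a cheap guard at startup can help. The onload wrapper works by preloading its library into the process via LD_PRELOAD, so a minimal sketch, assuming the preloaded library’s name contains "onload" on your install (do check), could look like this. The class name OnloadCheck is hypothetical.

```java
import java.util.Map;

/** Hypothetical startup guard: warns if the JVM was not launched via the onload wrapper. */
final class OnloadCheck {

    /** True if LD_PRELOAD in the given environment mentions onload (assumed library naming). */
    static boolean launchedUnderOnload(Map<String, String> env) {
        String preload = env.get("LD_PRELOAD");
        return preload != null && preload.contains("onload");
    }

    public static void main(String[] args) {
        if (!launchedUnderOnload(System.getenv())) {
            System.err.println("WARNING: LD_PRELOAD does not mention onload; "
                    + "sockets will use the kernel stack");
        }
    }
}
```

Wire the check into whatever startup logging you already have rather than letting it abort the process.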


Beware O/S upgrades: version 6 (RedHat/CentOS 6) has some extra horrid latency which, after several days of tweaking, I still hadn’t eradicated … I went back to CentOS 5.10

Here are my notes from CentOS / RedHat installation

Deselect virtualisation
Disable firewall
Disable SELinux
Delete SWAP partition (you don’t want swapping so ensure you have enough memory!)

Obviously disabling SELinux and the firewall is for benchtesting in a controlled environment. For colocation running you need to determine an appropriate security level for your organisation. If you must have a firewall between you and the exchange, use a hardware one. Benchtest without security and then with security on, so you know the impact.

Protect cores against unwanted intrusion (will be discussed when I blog on using thread affinity)

Avoid millisecond latency impact by using a discrete threading model with core affinity via the kernel parameter isolcpus. The O/S won’t schedule other work onto these cores, so you will need to use thread affinity to bind threads to the protected cores (code to follow in a later blog).

Edit the /boot/grub/grub.conf

kernel /vmlinuz-2.6.18-194.el5 ro root=LABEL=RH_ROOT    nohz=off  isolcpus=6,7,8,9,10,11    rhgb
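If threads are to be pinned only to the protected cores, the application needs to know which cores those are. A minimal sketch that pulls the isolcpus list back out of /proc/cmdline; the class name is hypothetical, and only the plain list and range syntax (6,7,8 or 6-11) is handled, with non-numeric flags accepted by newer kernels simply skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.BitSet;

/** Hypothetical helper: which cores were protected via the isolcpus boot parameter. */
final class IsolCpus {

    /** Parses isolcpus=6,7,8 or isolcpus=6-11 out of a kernel command line. */
    static BitSet parse(String cmdline) {
        BitSet cpus = new BitSet();
        for (String token : cmdline.trim().split("\\s+")) {
            if (!token.startsWith("isolcpus=")) continue;
            for (String part : token.substring("isolcpus=".length()).split(",")) {
                // Skip empty entries and non-numeric flags (e.g. "nohz") used by newer kernels.
                if (part.isEmpty() || !Character.isDigit(part.charAt(0))) continue;
                int dash = part.indexOf('-');
                if (dash < 0) {
                    cpus.set(Integer.parseInt(part));
                } else {
                    int lo = Integer.parseInt(part.substring(0, dash));
                    int hi = Integer.parseInt(part.substring(dash + 1));
                    cpus.set(lo, hi + 1); // BitSet.set(from, to) excludes 'to'
                }
            }
        }
        return cpus;
    }

    public static void main(String[] args) throws IOException {
        String cmdline = new String(Files.readAllBytes(Paths.get("/proc/cmdline")));
        System.out.println("isolated cores: " + parse(cmdline));
    }
}
```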

Kernel Params

There are many! Here are some to look at:

transparent_hugepage=never
intel_idle.max_cstate=0
nohz=off
nosoftlockup
idle=poll

Disable unwanted services

/sbin/chkconfig --list | grep "5:on"
 
chkconfig irqbalance off
chkconfig anacron off
chkconfig atd off
chkconfig avahi-daemon off
chkconfig bluetooth off
chkconfig cups off      
chkconfig hidd off      
chkconfig isdn off      
chkconfig pand off      
chkconfig rhnsd off     
chkconfig sendmail off  
chkconfig cpuspeed off  
chkconfig NetworkManager off
chkconfig iptables off
chkconfig ip6tables off
chkconfig libvirt-guests off

These are the services I disabled; obviously you need to ensure you don’t need a service before you disable it.
irqbalance and cpuspeed were the main services I wanted to disable … at the risk of sounding like a broken record: disable a single service, benchtest, rinse, repeat. Don’t disable ANY service without checking whether YOU need it first!

System Scripts : rc.local

Edit /etc/rc.local

ethtool -C eth2 adaptive-rx off
ethtool -C eth2 rx-usecs 0 rx-frames 0 rx-usecs-high 0 rx-usecs-low 0 pkt-rate-low 0 pkt-rate-high 0
ethtool -C eth2 rx-usecs-irq 60
ethtool -A eth2 rx off tx off

ethtool -C eth3 adaptive-rx off
ethtool -C eth3 rx-usecs 0 rx-frames 0 rx-usecs-high 0 rx-usecs-low 0 pkt-rate-low 0 pkt-rate-high 0
ethtool -C eth3 rx-usecs-irq 60
ethtool -A eth3 rx off tx off

echo 0 > /sys/class/net/eth2/device/lro
echo 0 > /sys/class/net/eth3/device/lro

Only use rx-usecs-irq 60 IF using OpenOnload … you can experiment with this setting. I use spin reading, so I should never require an IRQ … but I found that if I set it lower or higher I could get nasty jitter.


System Scripts : sysctl.conf

Edit /etc/sysctl.conf

kernel.sysrq = 0
kernel.core_uses_pid = 1

net.ipv4.tcp_low_latency=1

# Controls the maximum size of a message queue, in bytes
kernel.msgmnb = 65536

# Controls the maximum size of a single message, in bytes
kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the total amount of shared memory, in pages
kernel.shmall = 4294967296

# isolcpus is a kernel boot parameter, not a sysctl: set isolcpus=6,7,8,9,10,11 in grub.conf instead
kernel.vsyscall64 = 2
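Because every sysctl key maps to a file under /proc/sys (dots become path separators, so net.ipv4.tcp_low_latency is /proc/sys/net/ipv4/tcp_low_latency), an application can confirm at startup that the values the benchtest assumed are actually in force. A sketch with a hypothetical class name; kernel.shmmax is used only as an example key.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical startup check that a sysctl has the value the benchtest assumed. */
final class SysctlCheck {

    /** Maps a key such as "net.ipv4.tcp_low_latency" to /proc/sys/net/ipv4/tcp_low_latency. */
    static Path procPath(String key) {
        return Paths.get("/proc/sys", key.split("\\."));
    }

    /** Reads the live value, trimming the trailing newline the kernel appends. */
    static String read(String key) throws IOException {
        return new String(Files.readAllBytes(procPath(key))).trim();
    }

    public static void main(String[] args) throws IOException {
        String key = "kernel.shmmax";
        String expected = "68719476736"; // value from the sysctl.conf above
        String actual = read(key);
        if (!expected.equals(actual)) {
            System.err.println(key + " is " + actual + ", expected " + expected);
        }
    }
}
```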

Some other useful CentOS "stuff"

Set Run Level

Change the default run level in /etc/inittab from 5 to 3:

id:3:initdefault:

Network config

/etc/sysconfig/network-scripts/ifcfg-eth*

After changing, run:

/etc/init.d/network restart

Label Root Partition

Label the root partition, e.g. RH_ROOT … so it’s not confused with any later O/S installs

SET THE DEVICE LABEL READY FOR GRUB (** DON’T FORGET TO UPDATE fstab OR IT WON’T BOOT **)
e2label /dev/sda7 RH_ROOT

   edit /etc/fstab
LABEL=RH_ROOT          /                       ext3    defaults        1 1

Edit the /boot/grub/grub.conf
         kernel /vmlinuz-2.6.18-194.el5 ro root=LABEL=RH_ROOT    nohz=off  isolcpus=6,7,8,9,10,11    rhgb
         

Check Kernel CPU Params

     cat /sys/devices/system/cpu/cpuidle/current_driver
    
         intel_idle.max_cstate=0   idle=poll   transparent_hugepage=never processor.max_cstate=0
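The same checks can be made from the application at startup, so that a server booted with the wrong grub entry is spotted before it trades. A minimal sketch, assuming the parameter list above is the one you settled on after testing; the class and its WANTED list are illustrative, not a recommendation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical startup check of the cpuidle driver and the boot parameters listed above. */
final class BootParamCheck {

    static final List<String> WANTED = Arrays.asList(
            "intel_idle.max_cstate=0", "processor.max_cstate=0",
            "idle=poll", "transparent_hugepage=never");

    /** Returns the wanted parameters that do not appear verbatim on the command line. */
    static List<String> missing(String cmdline, List<String> wanted) {
        Set<String> present = new HashSet<>(Arrays.asList(cmdline.trim().split("\\s+")));
        List<String> result = new ArrayList<>();
        for (String w : wanted) {
            if (!present.contains(w)) result.add(w);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String cmdline = new String(Files.readAllBytes(Paths.get("/proc/cmdline")));
        String driver = new String(Files.readAllBytes(
                Paths.get("/sys/devices/system/cpu/cpuidle/current_driver"))).trim();
        System.out.println("cpuidle driver: " + driver);
        System.out.println("missing boot params: " + missing(cmdline, WANTED));
    }
}
```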


Sunday, 3 May 2015

Measuring Latency in Ultra Low Latency Systems


For years people have said optimise last; don’t worry … you can fix it later.

For ultra low latency it’s simply not true. Every line of code should be written with optimisation in mind. When writing SubMicroTrading I wrote micro benchmarks to compare performance at a micro level. I tested pooling patterns, different queue implementations, and the impact of inheritance and generics. Some of the results were surprising. The one that stands out as misleading was the benchtest of generic pools versus concrete pools. Theoretically generics should have no runtime impact, but on a PC I had a micro benchmark which showed some latency spikes. Because of this I made my code generator generate a discrete pool and recycle class for each type. Of course, when I reran the benchmarks on Linux with tuned JVM parameters there was zero difference!
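For reference, the two pool shapes being compared look roughly like this. It is a sketch only, not SMT’s generated code: GenericPool, NewOrderEventPool and NewOrderEvent are hypothetical names. Whether the generic version costs anything measurable depends on the JIT, which is exactly what the tuned Linux reruns answered; a harness such as JMH would be the usual way to compare them today.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Generic pool: one implementation shared by every pooled type. */
final class GenericPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Consumer<T> reset;

    GenericPool(Supplier<T> factory, Consumer<T> reset, int preallocate) {
        this.factory = factory;
        this.reset = reset;
        for (int i = 0; i < preallocate; i++) free.push(factory.get());
    }

    T get() {
        T t = free.poll();                      // null when the pool is empty
        return (t != null) ? t : factory.get(); // fall back to allocation
    }

    void recycle(T t) {
        reset.accept(t);
        free.push(t);
    }
}

/** Hypothetical pooled event type. */
final class NewOrderEvent {
    long price;
    int qty;

    void reset() {
        price = 0;
        qty = 0;
    }
}

/** Concrete pool: the kind of per-type class a code generator could emit instead. */
final class NewOrderEventPool {
    private final ArrayDeque<NewOrderEvent> free = new ArrayDeque<>();

    NewOrderEventPool(int preallocate) {
        for (int i = 0; i < preallocate; i++) free.push(new NewOrderEvent());
    }

    NewOrderEvent get() {
        NewOrderEvent e = free.poll();
        return (e != null) ? e : new NewOrderEvent();
    }

    void recycle(NewOrderEvent e) {
        e.reset();
        free.push(e);
    }
}
```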

It is absolutely essential to understand the key use cases and to have a controlled benchtest setup which you can guarantee is not in use when testing.

Colocation OMS

The first use case for SubMicroTrading was as a colocated, normalising order management system with client risk checks. The OMS received client orders, validated them, ran risk checks and then sent a market order to the exchange. The OMS would normalise all exchange eccentricities and allow for easy client customisation. This was before IB clients were allowed sponsored access. With sponsored access the need for ultra low latency order management systems disappeared … bad timing on my part.

With this use case I ran a client simulator and an exchange simulator on the sim server, and the OMS on the trade server. The client simulator stamped a nanosecond timestamp on each outgoing client order. When the OMS generated a market order, it stamped the order with the total time spent within the OMS as well as the original client order timestamp from the client simulator. The exchange simulator and client simulator both use thread affinity to the same core, so the nanosecond timestamps can be used to determine a realistic end-to-end time. You can estimate the time in the NICs and network by:

    TimeInNICsAndNetwork = ( timeMarketOrderReceived - clientOrderTimeStamp ) - timeInOMS

There were four NIC TCP hops, so roughly:

    (Rough) NIC TCP hop  = TimeInNICsAndNetwork / 4

Because the timestamps are generated on one host (even on one core or one CPU), you don’t need expensive PTP time synchronisation. Using System.nanoTime to measure latency is not recommended: it uses the HPET timer, which shows aberrations of many milliseconds on long test runs. RDTSC seems to be more accurate but can have problems measuring across cores (I will include the code in a later blog on JNI).
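The arithmetic above is trivial, but it is worth pinning down the units and the assumption that every timestamp comes from the same clock. A sketch with hypothetical names. If you do use System.nanoTime despite the caveats, note that its Javadoc only guarantees a common origin within one JVM instance, so both simulators would need to run in the same JVM.

```java
/** Hypothetical helper for the colocation OMS benchtest arithmetic; all values in nanoseconds. */
final class NetworkTimeEstimate {

    /** (marketOrderReceived - clientOrderTimestamp) - timeInOms; timestamps from one clock. */
    static long timeInNicsAndNetwork(long marketOrderReceivedNanos,
                                     long clientOrderTimestampNanos,
                                     long timeInOmsNanos) {
        return (marketOrderReceivedNanos - clientOrderTimestampNanos) - timeInOmsNanos;
    }

    /** Rough per-hop figure, assuming every NIC TCP hop costs about the same. */
    static long perHopNanos(long timeInNicsAndNetworkNanos, int hops) {
        return timeInNicsAndNetworkNanos / hops;
    }
}
```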

To be honest, however, at this level all we are really concerned about is having a repeatable testbed. If we change some parameter, what’s the impact? Run the benchtest several times and measure the delta. Now we can see whether we make things better or worse. Now that Solarflare have added a packet capture facility, I would hope it is possible to capture input/output and so obtain accurate performance measurements.
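One way to make "measure the delta" concrete is to take the median of a chosen statistic (say each run’s P95) across several runs before and after the change, so one noisy run cannot decide the outcome. Using the median is my assumption, not the only sensible choice, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: compare repeated benchtest runs before and after one change. */
final class RunDelta {

    /** Median of per-run results (e.g. each run's P95 in nanoseconds). */
    static double median(long[] runs) {
        long[] s = runs.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    /** Percentage change of the candidate versus the baseline; negative means faster. */
    static double percentChange(long[] baselineRuns, long[] candidateRuns) {
        double base = median(baselineRuns);
        return 100.0 * (median(candidateRuns) - base) / base;
    }
}
```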

Algo Trading System : Tick To Trade

The second use case was for Tick to Trade. Here I wrote an algo container, optimal book manager and market data session handling using the SMT framework.



NIC1 was used for market data, and NIC2 for the trade connection.

In this use case, the simulation server replays captured market data using tcpreplay at various rates (from x1 to flat out which was around 800,000pps).
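It helps to translate a replay rate into a time budget per packet. At the flat-out rate of roughly 800,000 pps, the mean gap between packets is 1e9 / 800,000 = 1,250 ns; if the market data thread averages more than that per packet, packets queue up in buffers and latency grows however fast any single step is. A small sketch of that arithmetic (hypothetical class name):

```java
/** Hypothetical helper: mean time available per packet at a steady replay rate. */
final class PacketBudget {

    /** Mean inter-arrival gap in nanoseconds for a rate in packets per second. */
    static double gapNanos(double packetsPerSecond) {
        return 1_000_000_000.0 / packetsPerSecond;
    }

    /** Gap when a base rate is replayed at a multiplier such as x5 or x10. */
    static double gapNanos(double basePacketsPerSecond, double multiplier) {
        return gapNanos(basePacketsPerSecond * multiplier);
    }

    public static void main(String[] args) {
        System.out.println(gapNanos(800_000) + " ns per packet at 800,000 pps");
    }
}
```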

The trade server gets the market data through NIC1 into the market data session(s). CME has so many sessions that I wrote a dynamic session generator to lazily create sessions based on subscribed contracts. The market data is consumed by the algos, which can then send orders to the exchange session over NIC2. Each order is sent with a marker field identifying the tick that generated it, which allows accurate correlation with the tick that really generated the order.
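The correlation via the marker field can be sketched as a map from tick id to arrival time, matched when an order carrying that id is sent. This is a hypothetical offline-analysis helper (for example over captured input/output): the boxing in HashMap allocates, so it does not belong on the trading hot path, and ticks that never generate an order simply stay in the map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tick-to-trade correlator keyed on the marker field carried from tick to order. */
final class TickToTrade {
    private final Map<Long, Long> tickArrivalNanos = new HashMap<>();

    /** Record when the tick with this id arrived. */
    void onTick(long tickId, long arrivalNanos) {
        tickArrivalNanos.put(tickId, arrivalNanos);
    }

    /** Tick-to-trade latency for an order carrying this tick id, or -1 if the tick is unknown. */
    long onOrderSent(long markerTickId, long sendNanos) {
        Long arrived = tickArrivalNanos.remove(markerTickId); // each tick is matched once
        return (arrived == null) ? -1 : sendNanos - arrived;
    }
}
```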

Hybrid FPGA, market data providers and exchange adapters all need to be considered holistically within the trading system as a whole, for the KEY use cases of the system. Benchtest them not at the rate of a quiet day but at the maximum rate you may need … consider the highest market spike rates, and then try the normal market rate at x1, x5, x10 and x100 to understand the different performance levels.

For really accurate figures we need to use NIC splitters on the trade server and capture input/output from both NICs. I have done this most recently with a TipOFF device (with a Grandmaster clock and Solarflare PTP 10GbE NICs, which have CPU-independent host clock adjustment). This lets you measure reasonably accurate wire-to-wire latency. It is the best way to compare tick-to-trade performance for a system as a whole, and so bypass the smoke and mirrors from component service providers.

I have seen people working on ultra low latency FX systems state that P99.9999 is most important, because arb opportunities are 1 in 1,000,000 and they believe the opportunity at the exchange will occur at the moment the trading system has any jitter … of course this was completely uncorroborated, and personally I think it was tosh. Arb opportunities are likely to happen during peak loads, so it’s critical that the system has the throughput to cope with max load without, say, causing hidden packet backup in network buffers (beware dodgy partial TCP stack implementations that are likely to fail during network congestion / peaks).

Key measurements in my mind are the P50, P90, P95, P99 … with P95 being the most important. I am sure there will be plenty of people who disagree. But at the end of the day the only really important stat is how much money you make !! 
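For completeness, a sketch of computing those percentiles from one run’s latency samples using the nearest-rank method. The choice of method is an assumption (other percentile definitions interpolate and give slightly different values), and the class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical post-run report of latency percentiles using the nearest-rank method. */
final class Percentiles {

    /** Smallest sample such that at least p% of the samples are <= it. Input must be sorted. */
    static long percentile(long[] sortedNanos, double p) {
        if (sortedNanos.length == 0) throw new IllegalArgumentException("no samples");
        int rank = (int) Math.ceil(p * sortedNanos.length / 100.0); // 1-based rank
        return sortedNanos[Math.max(rank, 1) - 1];
    }

    /** Sorts a copy of the samples and reports P50, P90, P95 and P99. */
    static String report(long[] samplesNanos) {
        long[] s = samplesNanos.clone();
        Arrays.sort(s);
        StringBuilder sb = new StringBuilder();
        for (double p : new double[] {50, 90, 95, 99}) {
            sb.append('P').append((int) p).append('=').append(percentile(s, p)).append("ns ");
        }
        return sb.toString().trim();
    }
}
```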

Note: ensure that any monitoring processes on trade servers are lightweight, do NO GC and do not impact the trading system’s performance. Run the testbed with and without them for a few days to see the impact.

Saturday, 2 May 2015

Recommendations for Ultra Low Latency (ULL)

OS -> CentOS 5.10

Use RedHat if you can afford it … Don’t get Realtime: it is more deterministic, with more fine-grained scheduling, but slower … not best for ULL

Avoid O/S upgrades unless you NEED them … the devs are constantly adding new power-saving tweaks which introduce jitter …. RedHat 5 to RedHat 6 was horrendous. I spent a week trying to undo all the new switches they had put in and failed …. Given I had no need for v6, I went back to v5.10.

Don’t just turn on "Huge Pages" or other O/S or language settings. Test them in a testbed first … you may well be surprised; the key is the effect of the change on the system as a whole. Every tweak has pluses and minuses, and they differ per system … so test, measure, rinse, repeat.

Hardware -> Fastest Intel CPU with lowest latency RAM

Using overclocked CPUs with non-ECC memory running 24*7 brings a risk of crashing; however, if the extra 10% to 30% performance boost is the difference between making money and not making money, then a certain level of risk will be acceptable. I have run overclocked X5680s with ECC RAM and i7s at 5GHz with overclocked memory for weeks under load without crashing, so it is possible to achieve stable overclocking.

Solarflare NIC with Open Onload

My first NICs were top Mellanox cards in 2010; installation was awful and performance was terrible at high throughput. Support was not great, and the one sided TCP acceleration was useless for colocation trading. I signed an NDA so won’t say more, but after two months of pain I switched to Chelsio. Chelsio was just as bad to install and performance was even worse; very disappointing for a top 10GbE NIC.

I got my first Solarflare card in 2010; installation was a breeze and the cards outperformed Chelsio and Mellanox with no tuning. With OpenOnload and the simplest tuning parameter ever (--profile=latency) they blew Mellanox completely away. My advice is to ignore all the perf stats the NIC providers quote and test for yourself in a controlled environment, with two servers having dual NICs connected directly together (no switch or anything else in the way).

Language -> Java 1.8

I use Sun … er, sorry, Oracle standard Java 1.8 (don’t use Realtime Java) … there is no real perf difference between 1.6 and 1.8 for SMT.

There will be future blogs on JVM args, on application threading models, and on API design and latency impact.

Tools / Third Party Libs

In the world of ULL I avoid third-party libs due to the lack of control over GC, threading model and JIT jitter. That said, "hwloc" has been invaluable for its abstraction layer for core binding.

A future blog will show how to do thread affinity in Java.


hwloc, hwinfo, i7z

Sunday, 26 April 2015

Holistic Latency for Ultra Low Latency Systems



Definition of holistic

"characterized by the belief that the parts of something are intimately interconnected and explicable only by reference to the whole."

It’s crucial to understand all the components that make up a trading system, as each will have its own latency characteristics.

Layer            | Sample Factor   | Mitigation                                                   | Effort
Software Program | Design          | Key design patterns / extensible framework / efficient code | Hard
Language         | Inherent        | Compile time args / run time args                            | Easy
                 | GC              | Object pooling                                               | Medium
                 | JIT             | Warmup code                                                  | High
Operating System | Scheduler       | Thread affinity / core spinning                              | Easy
                 | Hard page fault | Appropriate memory in server, process sizing                 | Easy
                 | Network stack   | Kernel bypass drivers and tuning, socket spinning            | Easy
Hardware         | CPU             | Disable H/T, get fastest CPU, overclock                      | Easy
                 | Memory          | Buy memory with lowest latency and ensure enough             | Easy
                 | Network         | Buy Solarflare NIC                                           | Easy

I have built four rack servers with a box full of Chelsio, Mellanox and Solarflare NICs. By far the easiest to install, the easiest to tune and the best performing was Solarflare. I was really disappointed in the Mellanox cards. Solarflare OpenOnload provides one-sided acceleration suitable for colocation purposes, at no extra cost. This was several years ago, so maybe Mellanox have their own one-sided acceleration now, but for me it’s come too late. I have preached Solarflare NICs to everyone I know.

Consider the following scenario:

Read next packet from socket
Decode market data tick into exchange-normalised event
Log event
Place event into queue for async consumption

How much benefit will your end system get from saving 20 nanoseconds by switching the queue from ConcurrentLinkedQueue to a RingBuffer (e.g. Disruptor)? Will it be twice as quick? … No. What about how you read the packet off the socket? What about the log event? What about the queue size characteristics?
I have seen FX systems with man-years of effort put into latency optimisation when they didn’t even use OpenOnload for their Solarflare cards! What has the higher risk … using OpenOnload, or the code a 10-man team has written over 3 years?
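The queue point can be made with simple arithmetic: if end-to-end time is the sum of the stages, a saving in one stage buys only that saving divided by the total. A hypothetical what-if helper; the stage timings must come from measuring your own system, and none are supplied here.

```java
/** Hypothetical what-if: how much does speeding up one stage change the end-to-end time? */
final class StageBudget {

    /** End-to-end time, assuming a serial pipeline where stage times simply add up. */
    static long total(long[] stageNanos) {
        long sum = 0;
        for (long s : stageNanos) sum += s;
        return sum;
    }

    /** Fraction of end-to-end time saved by cutting one stage by savingNanos (capped at the stage). */
    static double fractionSaved(long[] stageNanos, int stage, long savingNanos) {
        long saving = Math.min(savingNanos, stageNanos[stage]);
        return (double) saving / total(stageNanos);
    }
}
```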

You must understand the key use cases for your system where latency is important, then create end-to-end repeatable bench tests in a fully controlled environment that reflects the production environment. I suggest wire-to-wire timings with PTP Solarflare NICs. Alternatively, use two servers (1 simulation, 1 trading) with dual Solarflare NICs … in this case you don’t need PTP (I will cover this in a later blog with the JNI code I put together).

Micro-benchmarks must be used with care. They can give good comparative performance against other implementations, but won’t necessarily produce production system gains given all the variables in play. For example, in the above scenario consider what happens when a fixed-size queue fills up. More on benchtesting another day.
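On the fixed-size queue point, the producer needs a policy for when offer() fails. A minimal single-producer sketch using java.util.concurrent.ArrayBlockingQueue that counts and drops; dropping is just one option (blocking, spinning or conflating are others), and each changes the latency profile differently, which is exactly what an end-to-end benchtest should expose. The class name is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;

/** Hypothetical producer-side policy for a fixed-size queue between two threads. */
final class BoundedHandoff<E> {
    private final ArrayBlockingQueue<E> queue;
    private long dropped; // written only by the single producer thread

    BoundedHandoff(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side: never blocks; returns false (and counts a drop) when the queue is full. */
    boolean publish(E event) {
        if (queue.offer(event)) return true;
        dropped++;
        return false;
    }

    /** Consumer side: returns null when there is nothing to consume. */
    E poll() {
        return queue.poll();
    }

    long dropped() {
        return dropped;
    }
}
```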






Monday, 20 April 2015

MODELs & Generated Code

Hand-cranked codecs are a pain to write and a pain to upgrade; I have written many over the years! The solution is to generate the code.

An XML model defines the internal model with POJOs that all components can work with, e.g. NewOrderSingle, NewOrderAck, TradeNew. It also defines external models, which can be client or exchange, FIX variants or binary protocols such as ETS, Millennium, UTP etc. Finally it defines codecs, which specify how to translate the external model to/from the internal model.

Sample Internal Event for a New Order Single

<Base id="BaseOrderRequest" src="client" extends="CommonClientHeader">
  <Attribute typeId="Instrument"                              name="instrument"   mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="ClientProfile"                           name="client"       mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="viewstring[CLORDID_LENGTH]"    tag="11"  name="clOrdId"      mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="viewstring[CLORDID_LENGTH]"    tag="41"  name="origClOrdId"  mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="viewstring[SECURITYID_LENGTH]" tag="48"  name="securityId"   mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="viewstring[SYMBOL_LENGTH]"     tag="55"  name="symbol"       mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="Currency"                      tag="15"  name="currency"     mandatory="N"     outbound="seperate"/>
  <Attribute typeId="SecurityIDSource"              tag="22"                      mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="UTCTimestamp"                  tag="60"  name="transactTime" mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="UTCTimestamp"                  tag="52"  name="sendingTime"                    outbound="seperate"/>
  <Attribute typeId="Side"                          tag="54"                      mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="viewstring[SRC_LINKID_LENGTH]"           name="srcLinkId"    mandatory="N"     outbound="delegate"/>
</Base>
   
<Base id="OrderRequest" src="client" extends="BaseOrderRequest">
  <Attribute typeId="viewstring[ACCOUNT_LENGTH]"        tag="1"   name="account"          mandatory="N" outbound="delegate"/>
  <Attribute typeId="viewstring[TEXT_LENGTH]"           tag="58"  name="text"             mandatory="N" outbound="delegate"/>
  <Attribute typeId="viewstring[EXDESTINATION_LENGTH]"  tag="100" name="exDest"           mandatory="N" outbound="delegate"/>
  <Attribute typeId="viewstring[SECURITYEXCH_LENGTH]"   tag="207" name="securityExchange" mandatory="N" outbound="delegate"/>
  <Attribute typeId="double"                        tag="44"  name="price"      mandatory="Y"     outbound="seperate"/>
  <Attribute typeId="int"                           tag="38"  name="orderQty"   mandatory="Y"     outbound="seperate"/>
  <Attribute typeId="ExecInst"                      tag="18"                                      outbound="delegate"/>
  <Attribute typeId="HandlInst"                     tag="21"                                      outbound="delegate"/>
  <Attribute typeId="OrderCapacity"                 tag="528"                                     outbound="seperate"/>
  <Attribute typeId="OrdType"                       tag="40"                    mandatory="Y"     outbound="delegate"/>
  <Attribute typeId="SecurityType"                  tag="167"                                     outbound="delegate"/>
  <Attribute typeId="SecurityIDSource"              tag="22"                                      outbound="delegate"/>
  <Attribute typeId="TimeInForce"                   tag="59"                                      outbound="delegate"/>
  <Attribute typeId="BookingType"                   tag="775"                                     outbound="delegate"/>
  <Attribute typeId="long"                                    name="orderReceived" mandatory="Y"  outbound="delegate"/>
  <Attribute typeId="long"                                    name="orderSent"     mandatory="Y"  outbound="delegateGetAndSet"/>
</Base>

<Event id="NewOrderSingle" extends="OrderRequest" src="client">
  <Attribute typeId="viewstring[CLORDID_LENGTH]"    tag="11"  name="clOrdId"     mandatory="Y"     outbound="seperate"/>
</Event>

Sample external definition for a New Order in ETI … note each field must have a dictionary entry which defines its type in the external model.

<Message id="NewOrderRequestSimple"            msgType="10125">
    <Field id="msgSeqNum"               mand="Y"/>
    <Field id="senderSubID"             mand="Y"/>              
   
    <Field id="price"                   mand="Y"/>
    <Field id="senderLocationID"        mand="N"/>
    <Field id="clOrdId"                 mand="Y"/>
    <Field id="orderQty"                mand="Y"/>
    <Field id="filler1c"                len="4"/>   <!-- maxShow tag210 -->
    <Field id="simpleSecurityID"        mand="Y"/>
   
    <Field id="accountType"             mand="N"/>
    <Field id="side"                    mand="Y"/>
    <Field id="priceValidityCheckType"  mand="Y"/>
    <Field id="timeInForce"             mand="Y"/>
    <Field id="execInst"                mand="Y"/>
    <Field id="uniqueClientCode"        mand="N"/>
    <Field id="filler3"                 len="3" comment="pad3"/>     
</Message>

Sample CODEC for a New Order in BSE ETI

<MessageMap id="BaseBSEOrder" messageId="" ignore="true" extends="BaseRequest">
    <Map field="marketSegmentID"><Hook type="encode" code="encodeMarketSegmentID( msg.getInstrument() )"/></Map>

    <Map eventAttr="securityId" field="simpleSecurityID">
        <Hook type="encode" code="encodeSimpleSecurityId( msg.getInstrument() )"/>
        <Hook type="decode" code="_securityId = _builder.decodeUInt()"/>
    </Map>

    <Map field="priceValidityCheckType"><Hook type="encode" code="_builder.encodeByte( (byte)0 )"/></Map>
    <Map field="accountType"><Hook type="encode" code="_builder.encodeByte( (byte)20 )"/></Map>
    <Map field="maxPricePercentage"><Hook type="encode" code="_builder.encodePrice( 0.5 )"/></Map>
    <Map field="senderLocationID"><Hook type="encode" code="_builder.encodeLong( _locationId )"/></Map>
    <Map field="orderCapacity"><Hook type="encode" code="_builder.encodeByte( (byte)1 )"/></Map>
    <Map field="positionEffect"><Hook type="encode" code="_builder.encodeByte( (byte)'C' )"/></Map>
    <Map field="account"><Hook type="encode" code="_builder.encodeStringFixedWidth( _account, 2 )"/></Map>
    <Map field="applSeqIndicator"><Hook type="encode" code="_builder.encodeByte( (byte)0 )"/></Map>
    <Map field="execInst"><Hook type="encode" code="_builder.encodeByte( (byte)2 )"/></Map>
    <Map field="uniqueClientCode">
        <Hook type="encode" code="_builder.encodeStringFixedWidth( _uniqueClientCode, 12 )"/>
        <Hook type="decode" code="_builder.skip( 12 )"/>
    </Map>
</MessageMap>

<MessageMap id="NewLimitOrder"   eventId="NewOrderSingle" messageId="NewOrderRequestSimple" extends="BaseBSEOrder" encodeFunc="encodeNOS">
    <Map field="productComplex"><Hook type="encode" code="_builder.encodeByte( (byte)1 )"/></Map>
    <Hook type="postDecode" code="enrich( msg ) ; msg.setOrdType( OrdType.Limit )"/>
</MessageMap>

Note that hooks allow overriding of the default code generation, which is based on comparing the internal model dictionary entry with the external model dictionary entry. Map entries are only added for fields that don’t want the default behaviour.
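The hook-versus-default decision can be sketched as follows. This is not SMT’s generator (which is completely custom); the class, method and map names are hypothetical, and the default code for each field is assumed to have already been derived from the two dictionaries.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of choosing hook code vs. default code per field in an encoder body. */
final class EncoderGen {

    /**
     * fields: external message fields in wire order.
     * encodeHooks: field id -> code from a Map/Hook type="encode" entry in the codec XML.
     * defaultCode: field id -> code derived by comparing internal and external dictionary types.
     */
    static String generateBody(List<String> fields, Map<String, String> encodeHooks,
                               Map<String, String> defaultCode) {
        StringBuilder out = new StringBuilder();
        for (String field : fields) {
            String hook = encodeHooks.get(field);
            String code = (hook != null) ? hook : defaultCode.get(field); // hook wins
            if (code == null) throw new IllegalStateException("no mapping for field " + field);
            out.append("    ").append(code).append(";\n");
        }
        return out.toString();
    }
}
```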

Sample Generated Encoder for NOS :-

public final void encodeNewLimitOrder( final NewOrderSingle msg ) {
    final int now = _tzCalculator.getNowUTC();
    _builder.start( MSG_NewOrderRequestSimple );
    if ( _debug ) {
        _dump.append( "  encodeMap=" ).append( "NewOrderRequestSimple" ).append( "  eventType=" ).append( "NewOrderSingle" ).append( " : " );
    }

    if ( _debug ) _dump.append( "\nField: " ).append( "msgSeqNum" ).append( " : " );
    _builder.encodeUInt( (int)msg.getMsgSeqNum() );
    if ( _debug ) _dump.append( "\nHook : " ).append( "senderSubID" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeUInt( _senderSubID ); // senderSubID;
    if ( _debug ) _dump.append( "\nField: " ).append( "price" ).append( " : " );
    _builder.encodeDecimal( msg.getPrice() );
    if ( _debug ) _dump.append( "\nHook : " ).append( "senderLocationID" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeLong( _locationId );
    if ( _debug ) _dump.append( "\nField: " ).append( "clOrdId" ).append( " : " );
    _builder.encodeStringAsLong( msg.getClOrdId() );
    if ( _debug ) _dump.append( "\nField: " ).append( "orderQty" ).append( " : " );
    _builder.encodeQty( (int)msg.getOrderQty() );
    if ( _debug ) _dump.append( "\nField: " ).append( "filler1c" ).append( " : " );
    _builder.encodeFiller( 4 );
    if ( _debug ) _dump.append( "\nHook : " ).append( "simpleSecurityID" ).append( " : " ).append( "encode" ).append( " : " );
    encodeSimpleSecurityId( msg.getInstrument() );
    if ( _debug ) _dump.append( "\nHook : " ).append( "accountType" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeByte( (byte)20 );
    if ( _debug ) _dump.append( "\nField: " ).append( "side" ).append( " : " );
    _builder.encodeByte( transformSide( msg.getSide() ) );
    if ( _debug ) _dump.append( "\nHook : " ).append( "priceValidityCheckType" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeByte( (byte)0 );
    if ( _debug ) _dump.append( "\nField: " ).append( "timeInForce" ).append( " : " );
    final TimeInForce tTimeInForceBase = msg.getTimeInForce();
    final byte tTimeInForce = ( tTimeInForceBase == null ) ?  DEFAULT_TimeInForce : transformTimeInForce( tTimeInForceBase );
    _builder.encodeByte( tTimeInForce );
    if ( _debug ) _dump.append( "\nHook : " ).append( "execInst" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeByte( (byte)2 );
    if ( _debug ) _dump.append( "\nHook : " ).append( "uniqueClientCode" ).append( " : " ).append( "encode" ).append( " : " );
    _builder.encodeStringFixedWidth( _uniqueClientCode, 12 );
    if ( _debug ) _dump.append( "\nField: " ).append( "filler3" ).append( " : " );
    _builder.encodeFiller( 3 );
    _builder.end();
}


Sample debug log output … essential for debugging the model when testing exchange connectivity.

{  ENCODE msgType=10125  encodeMap=NewOrderRequestSimple  eventType=NewOrderSingle :
Field: msgSeqNum : uint 10,  bytes=4, offset=16, raw=[ 0A 00 00 00 ]
Hook : senderSubID : encode : uint 810701002,  bytes=4, offset=20, raw=[ CA 50 52 30 ]
Field: price : decimal 62.9,  bytes=8, offset=24, raw=[ 80 C8 E9 76 01 00 00 00 ]
Hook : senderLocationID : encode : long 1234567890123456,  bytes=8, offset=32, raw=[ C0 BA 8A 3C D5 62 04 00 ]
Field: clOrdId : stringAsLong 38470008  (len=8),  bytes=8, offset=40, raw=[ 78 01 4B 02 00 00 00 00 ]
Field: orderQty : qty 10,  bytes=4, offset=48, raw=[ 0A 00 00 00 ]
Field: filler1c : filler  len=4,  bytes=4, offset=52, raw=[ 00 00 00 00 ]
Hook : simpleSecurityID : encode : int 1000627,  bytes=4, offset=56, raw=[ B3 44 0F 00 ]
Hook : accountType : encode : byte ^T,  bytes=1, offset=60, raw=[ 14 ]
Field: side : byte ^A,  bytes=1, offset=61, raw=[ 01 ]
Hook : priceValidityCheckType : encode : byte ^@,  bytes=1, offset=62, raw=[ 00 ]
Field: timeInForce : byte ^@,  bytes=1, offset=63, raw=[ 00 ]
Hook : execInst : encode : byte ^B,  bytes=1, offset=64, raw=[ 02 ]
Hook : uniqueClientCode : encode : stringFixedWidth OWN  (len=12),  bytes=12, offset=65, raw=[ 4F 57 4E 00 00 00 00 00 00 00 00 00 ]
Field: filler3 : filler  len=3,  bytes=3, offset=77, raw=[ 00 00 00 ]
} bytes=80

16:15:12.445 [info]
 OUT [exchangeSession1]:
0000 P....'...............PR0...v.......<.b..x.K.......
0050 .......D.......OWN............
0100
     1        10        20        30        40        50
 OUT [exchangeSession1]:
0000 50 00 00 00 8D 27 00 00 00 00 00 00 00 00 00 00 0A 00 00 00 CA 50 52 30 80 C8 E9 76 01 00 00 00 C0 BA 8A 3C D5 62 04 00 78 01 4B 02 00 00 00 00 0A 00
0050 00 00 00 00 00 00 B3 44 0F 00 14 01 00 00 02 4F 57 4E 00 00 00 00 00 00 00 00 00 00 00 00
0100
     1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
byteCount=80

16:15:12.445 [info]   OUT [exchangeSession1]: NewOrderSingleImpl , clOrdId=38470008, account=, text=, exDest=, securityExchange=, price=62.9, orderQty=10, execInst=null, handlInst=null, orderCapacity=null, ordType=Limit, securityType=, securityIDSource=, timeInForce=Day, bookingType=null, orderReceived=[null], orderSent=[null], instrument=1000627, client=, origClOrdId=, securityId=, symbol=, currency=, transactTime=[null], sendingTime=[null], side=Buy, srcLinkId=, onBehalfOfId=, msgSeqNum=10, possDupFlag=N



Because the code is generated for both encoding and decoding, the exchange simulator can simulate any exchange in the model.

I wrote the code generator from scratch in a couple of weeks; it’s completely custom and not general purpose.

The resulting internal POJO interfaces and the codecs for external-to-internal translation amount to over 100,000 lines of generated high-quality code.