Low latency, high throughput and extreme scalability are imprinted into Tbricks by Itiviti’s DNA.
The principal tenet of Tbricks’ system architecture, and what we believe is one of its most important strengths, is: “Do the right thing, in the right place, at the right time.” The internal protocols and data flows have been designed with a server-based co-located system in mind, with the goal of minimizing the machine resources wasted on unnecessary work. This includes extensive use of source-side filtering for data streams whenever possible.
The implementation uses highly efficient data structures and algorithms, with the best possible time-complexity characteristics, all the way down to O(1) for performance-sensitive operations, like source-side filtering of data streams. We have minimized or outright eliminated all possible system calls, database accesses, context switches, mutex locks and other synchronization primitives in latency-sensitive critical paths.
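As an illustration of why source-side filtering can be a constant-time operation, here is a minimal sketch, assuming a hash-based subscription table. The `FilteringPublisher` class, its methods and the instrument names are hypothetical and are not part of the Tbricks API.

```python
from collections import defaultdict


class FilteringPublisher:
    """Hypothetical publisher that filters updates before anything is sent."""

    def __init__(self):
        # instrument id -> set of subscriber ids; lookups are O(1) on average
        self._subscribers = defaultdict(set)
        # (subscriber, instrument, price) tuples standing in for the wire
        self.sent = []

    def subscribe(self, subscriber, instrument):
        self._subscribers[instrument].add(subscriber)

    def publish(self, instrument, price):
        # A single hash lookup decides whether any work is needed at all.
        targets = self._subscribers.get(instrument)
        if not targets:
            return 0  # no subscriber: nothing is serialized or sent
        for subscriber in targets:
            self.sent.append((subscriber, instrument, price))
        return len(targets)


publisher = FilteringPublisher()
publisher.subscribe("strategy-1", "ERIC-B")
delivered = publisher.publish("ERIC-B", 81.5)  # forwarded to one subscriber
dropped = publisher.publish("VOLV-B", 250.0)   # filtered out at the source
```

The update nobody subscribed to costs one dictionary lookup and never reaches the transport layer.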
Sophisticated caching schemes are used where proven beneficial, for example for numerically intensive calculated instrument values such as options pricing. Platform-specific performance optimizations are likewise used wherever beneficial: for example, the scalable jemalloc memory allocator, tuned file system settings, and kernel, TCP/IP and NIC driver settings tuned for minimum latency, including kernel bypass using Solarflare® OpenOnload®. Tbricks also has built-in support for creating processor sets and easily assigning components to them.
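The caching idea can be sketched as follows, assuming a Black-Scholes European call as the numerically intensive calculation and Python's standard `functools.lru_cache` as the cache. Both choices are illustrative assumptions; the source does not describe the actual pricing models or caching scheme.

```python
import math
from functools import lru_cache


def _norm_cdf(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


@lru_cache(maxsize=4096)
def call_price(spot, strike, rate, vol, years):
    """Black-Scholes price of a European call (assumed model, for illustration)."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * _norm_cdf(d1) - strike * math.exp(-rate * years) * _norm_cdf(d2)


first = call_price(100.0, 100.0, 0.01, 0.2, 1.0)   # computed
second = call_price(100.0, 100.0, 0.01, 0.2, 1.0)  # served from the cache
info = call_price.cache_info()
```

A cache like this only pays off when the same inputs recur, which is why the text restricts caching to cases where it has proven beneficial.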
Performance is considered at every stage of the design and implementation process, and engineering resources are dedicated to improving it further with each release.
All performance-critical services can be run in multiple instances for true horizontal scalability, and transparent multiplexing for market data and trading is built right in. All services have been heavily optimized to perform their designated task quickly and robustly, and all apps are built with native development tools for no-compromise performance.
For excellent vertical scalability, the Tbricks services have been carefully multithreaded to ensure they can use all available processor cores. Multiple services running on the same machine will additionally benefit automatically from the multiprocessing provided by the operating system. To ensure efficient use of threading resources, Grand Central Dispatch has been integrated and is used throughout the system, allowing for lock-free operation of critical sections under load.
This consistent work to scale well on multi-core processors ensures excellent performance even when facing the ever more relevant challenges of Amdahl’s law.
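For readers unfamiliar with Amdahl’s law, the following sketch shows why the serial portion of a workload limits multi-core scaling. The 95% parallel fraction is an arbitrary illustrative figure, not a measurement of Tbricks.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Theoretical speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


speedup_8 = amdahl_speedup(0.95, 8)    # roughly 5.9x on 8 cores
speedup_64 = amdahl_speedup(0.95, 64)  # roughly 15.4x on 64 cores
upper_bound = 1.0 / (1.0 - 0.95)       # about 20x, however many cores are added
```

Even with 95% of the work parallelized, 64 cores yield well under a 20x speedup, which is why reducing serialized sections such as lock-protected code matters as core counts grow.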
Services in Tbricks typically run as separate processes using shared memory for interprocess communication. Tbricks inherently supports the fusing of latency-sensitive services into a single process using our Speedcore® technology. This allows for mimicking the deployment of a typical in-house application, while retaining a clear architectural separation of services.
Services can easily be moved into or out of a Speedcore®.
This innovative approach allows you to carefully control how services are deployed to ensure the best possible performance. The benefits of running in a Speedcore® configuration are the removal of the interprocess communication overhead between the services running in the Speedcore®, as well as an improved CPU cache hit rate when dedicated CPU resources are assigned to the Speedcore® using processor sets.
Tbricks includes a blazingly fast embedded transactional database — WiredTiger — that vastly outperforms conventional SQL databases. The embedded database resides in the same address space as the service, so there is zero IPC overhead for communicating with a database server.
The fact that each service has access to its own private storage also allows for highly parallelized I/O across the system. WiredTiger is consistently used for all storage in the system and requires virtually no configuration.
When performing inter-process communication on the same host, Tbricks uses shared memory transport for the best possible latency and throughput. For services running on different machines, TCP/IP is used to allow for the source-side filtering and throttling of data streams.
All inter-process communication in the system uses an efficient binary-encoded protocol, which is additionally compressed for traffic sent across the WAN. Partial message updates are fully supported and are used consistently throughout the system, so that only the actual delta changes are sent over the wire rather than full business objects each time. The extensive use of source-side filtering also avoids superfluous data transfers and unnecessary thread wake-ups. It allows trading apps to simply react when something of note has happened, thus avoiding repeated, inefficient ‘should I do something?’ checks.
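The idea behind partial message updates can be sketched as below, assuming business objects are represented as flat dictionaries whose fields are never removed. The `make_delta` and `apply_delta` helpers and the field names are hypothetical, and the actual Tbricks wire format is not shown here.

```python
_MISSING = object()  # sentinel so a field that is absent differs from any value


def make_delta(previous, current):
    """Return only the fields that were added or changed since `previous`."""
    return {
        field: value
        for field, value in current.items()
        if previous.get(field, _MISSING) != value
    }


def apply_delta(state, delta):
    """Rebuild the full object on the receiving side."""
    merged = dict(state)
    merged.update(delta)
    return merged


old = {"instrument": "ERIC-B", "bid": 81.4, "ask": 81.6, "bid_size": 500}
new = {"instrument": "ERIC-B", "bid": 81.5, "ask": 81.6, "bid_size": 500}

delta = make_delta(old, new)       # only the changed bid is sent
rebuilt = apply_delta(old, delta)  # receiver reconstructs the full object
```

Here one field out of four crosses the wire, and the receiver still ends up with the complete, current object.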
The Tbricks front-end is carefully implemented to use a minimum amount of bandwidth, as only the exact information that you see on screen is transferred. A typical front-end uses just 200 kbit/s on average, with a full-fidelity, truly responsive user experience. This removes the need for remote display solutions such as Citrix, which in any case do not solve the problem of connecting a single unified front-end to a fully distributed system running in multiple geographical locations.
It is also possible to further improve performance by dampening quickly oscillating data streams using throttling conditions. This is beneficial when you aren’t interested in, say, market data updates unless they deviate by more than a certain amount from the last update you received, or when you don’t need updates more often than a predefined maximum frequency. For instance, it’s possible to set up a throttling condition that limits the update rate to at most one update every X milliseconds, or that only sends an update for a currency rate when the bid or ask has changed by more than Y% since the last update received.
Such throttling conditions provide an additional performance boost, as trading strategies don’t have to react to smaller price movements, while the specified maximum update frequency still ensures that an up-to-date value is received periodically.
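A throttling condition of this kind can be sketched as follows. This is a minimal illustration, assuming both limits are combined, that timestamps are supplied by the caller, and that values are non-zero. The `ThrottledStream` class and its parameters are hypothetical names, not the Tbricks API.

```python
class ThrottledStream:
    """Hypothetical throttle: forward at most one update per interval, and
    only when the value moved by more than a given percentage."""

    def __init__(self, min_interval_ms, min_change_pct):
        self.min_interval_ms = min_interval_ms  # the "X milliseconds"
        self.min_change_pct = min_change_pct    # the "Y%"
        self._last_time_ms = None
        self._last_value = None

    def offer(self, now_ms, value):
        """Return True if this update should be forwarded to the subscriber."""
        if self._last_value is None:
            forward = True  # always deliver the first value
        else:
            waited = now_ms - self._last_time_ms >= self.min_interval_ms
            # Assumes the last forwarded value is non-zero.
            change_pct = abs(value - self._last_value) / abs(self._last_value) * 100.0
            forward = waited and change_pct > self.min_change_pct
        if forward:
            self._last_time_ms, self._last_value = now_ms, value
        return forward


stream = ThrottledStream(min_interval_ms=100, min_change_pct=0.5)
ticks = [(0, 10.00), (20, 10.20), (150, 10.01), (300, 10.10)]
decisions = [stream.offer(time_ms, value) for time_ms, value in ticks]
```

Of the four ticks, only the first and the last are forwarded: the second arrives too soon after the first, and the third deviates too little from the last forwarded value.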
The use of server-side filtering together with data stream throttling is a powerful combination that allows trading strategies as well as internal Tbricks services to eliminate unnecessary updates that are wasting processing power.
The key to understanding performance is measuring. We have designed Tbricks to make it possible to both measure and monitor many interesting performance aspects of production systems, including internal tick-to-trade latency as well as trading strategy runtime performance.
Tbricks includes a performance tracing framework, which makes it possible to measure and monitor data flows and corresponding latency metrics.
It is challenging to analyze the throughput and latency characteristics of a trading system, or even a specific trading strategy, without introducing a probe effect, especially when measuring events in the low microsecond range. Using log statements and comparing timestamps often proves misleading when measuring such short time spans.
To enable such analysis without a significant probe effect, Tbricks supports live export of latency correlation information, allowing external performance measurement tools to correlate e.g. a given market data update to a specific outbound quote or order. This allows for proper integration with external hardware-capturing and performance analytics solutions.
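What an external tool might do with such correlation information can be sketched as follows. This is a hypothetical illustration: the `LatencyCorrelator` class, the identifiers and the nanosecond timestamps are assumptions, and the actual format of the exported Tbricks correlation data is not described in this text.

```python
class LatencyCorrelator:
    """Hypothetical external tool that pairs inbound updates with outbound orders."""

    def __init__(self):
        self._inbound_ns = {}   # market data update id -> capture timestamp (ns)
        self.latencies_ns = {}  # order id -> tick-to-trade latency (ns)

    def on_market_data(self, update_id, timestamp_ns):
        self._inbound_ns[update_id] = timestamp_ns

    def on_order_sent(self, order_id, caused_by_update_id, timestamp_ns):
        # The correlation record tells us which update triggered the order.
        start_ns = self._inbound_ns.get(caused_by_update_id)
        if start_ns is None:
            return None  # the triggering update was never captured
        latency_ns = timestamp_ns - start_ns
        self.latencies_ns[order_id] = latency_ns
        return latency_ns


correlator = LatencyCorrelator()
correlator.on_market_data("md-1", 1_000_000)
latency = correlator.on_order_sent("ord-1", "md-1", 1_012_500)   # 12_500 ns
unmatched = correlator.on_order_sent("ord-2", "md-9", 1_020_000)  # no match
```

Because the timestamps come from capture points outside the trading process, the measurement itself adds no work to the latency-sensitive path.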