Performance Tuning Guide
========================

This guide covers advanced performance tuning techniques for achieving sub-microsecond latency and maximum throughput with FIX-FastTrade.

Performance Overview
--------------------

FIX-FastTrade is designed for ultra-low latency trading with the following performance characteristics:

* **Latency**: Sub-microsecond message processing (< 1μs)
* **Throughput**: 1M+ messages/second sustained
* **Memory**: Zero-copy operations with custom allocators
* **CPU**: Multi-core scaling with 90%+ efficiency

Platform-Specific Performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table:: Performance by Platform
   :header-rows: 1
   :widths: 20 15 20 20 25

   * - Platform
     - Latency (μs)
     - Throughput (msg/s)
     - Memory Usage
     - CPU Efficiency
   * - macOS ARM64
     - 0.8
     - 2.5M
     - Excellent
     - Native NEON
   * - macOS x86_64
     - 0.6
     - 4.0M
     - Excellent
     - AVX2 optimized
   * - Linux x86_64
     - 0.5
     - 5.0M
     - Excellent
     - Full SIMD
   * - Linux ARM64
     - 0.9
     - 2.2M
     - Excellent
     - NEON optimized

System-Level Optimizations
--------------------------

CPU Configuration
~~~~~~~~~~~~~~~~~

**CPU Affinity and Isolation**

Bind critical threads to specific CPU cores:

.. code-block:: bash

   # Isolate CPUs for FIX-FastTrade (add to kernel boot parameters)
   isolcpus=1,2,3,4

   # Set CPU governor to performance mode
   echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

   # Disable Turbo Boost for a stable clock frequency (intel_pstate driver only)
   echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

**NUMA Optimization**

For multi-socket systems:

.. code-block:: bash

   # Check NUMA topology
   numactl --hardware

   # Run FIX-FastTrade on a specific NUMA node
   numactl --cpunodebind=0 --membind=0 ./bin/fix-fasttrade --config config/fix-config.xml

Memory Optimization
~~~~~~~~~~~~~~~~~~~

**Huge Pages Configuration**

Enable huge pages for better memory performance:

.. code-block:: bash

   # Configure huge pages (add to /etc/sysctl.conf)
   vm.nr_hugepages = 1024

   # Apply settings
   sudo sysctl -p

   # Verify huge pages
   grep Huge /proc/meminfo

**Memory Locking**

Lock memory pages to prevent swapping:

.. code-block:: bash

   # Increase memory lock limits (add to /etc/security/limits.conf)
   * soft memlock unlimited
   * hard memlock unlimited

   # Or run with sudo for memory locking
   sudo ./bin/fix-fasttrade --config config/fix-config.xml --memory-lock

Network Optimization
~~~~~~~~~~~~~~~~~~~~

**Network Interface Tuning**

Optimize the network stack for low latency:

.. code-block:: bash

   # Increase network buffer sizes (tee -a appends with root privileges)
   echo 'net.core.rmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
   echo 'net.core.wmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
   echo 'net.ipv4.tcp_rmem = 4096 87380 134217728' | sudo tee -a /etc/sysctl.conf
   echo 'net.ipv4.tcp_wmem = 4096 65536 134217728' | sudo tee -a /etc/sysctl.conf

   # Disable TCP timestamps and SACK
   echo 'net.ipv4.tcp_timestamps = 0' | sudo tee -a /etc/sysctl.conf
   echo 'net.ipv4.tcp_sack = 0' | sudo tee -a /etc/sysctl.conf

   # Apply settings
   sudo sysctl -p

**Network Interface Card (NIC) Tuning**

.. code-block:: bash

   # Set interrupt affinity: bitmask 0x2 binds IRQ 24 to CPU 1
   # (look up the NIC's IRQ number in /proc/interrupts)
   echo 2 | sudo tee /proc/irq/24/smp_affinity

   # Increase ring buffer sizes
   sudo ethtool -G eth0 rx 4096 tx 4096

   # Show hardware timestamping capabilities (if supported)
   ethtool -T eth0

Application-Level Optimizations
-------------------------------

Configuration Tuning
~~~~~~~~~~~~~~~~~~~~

**High-Performance Configuration**

.. code-block:: xml

   true 0 1 2 3 true 2097152 true -15 true 8192 50000 true

**Network Configuration for Low Latency**

.. code-block:: xml

   true 131072 5 15 false false false

Compiler Optimizations
~~~~~~~~~~~~~~~~~~~~~~

**Build with Maximum Optimization**

.. code-block:: bash

   # Build with aggressive optimizations
   mkdir -p build && cd build
   cmake .. \
     -DCMAKE_BUILD_TYPE=Release \
     -DCMAKE_CXX_FLAGS="-O3 -march=native -mtune=native -flto -funroll-loops" \
     -DENABLE_SIMD=ON \
     -DENABLE_PROFILE_GUIDED_OPTIMIZATION=ON
   make -j$(nproc)

**Profile-Guided Optimization (PGO)**

.. code-block:: bash

   # Build with PGO instrumentation
   cmake .. -DCMAKE_CXX_FLAGS="-fprofile-generate"
   make -j$(nproc)

   # Run training workload
   ./bin/fix-fasttrade --config config/training-config.xml

   # Rebuild with PGO optimization
   cmake .. -DCMAKE_CXX_FLAGS="-fprofile-use"
   make -j$(nproc)

Runtime Optimizations
~~~~~~~~~~~~~~~~~~~~~

**Command Line Tuning**

.. code-block:: bash

   # Maximum performance configuration
   sudo ./bin/fix-fasttrade \
     --config config/high-performance.xml \
     --cpu-main 0 \
     --cpu-fix 1 \
     --cpu-order 2 \
     --cpu-market 3 \
     --memory-lock \
     --priority -15 \
     --stats-interval 300

**Environment Variables**

.. code-block:: bash

   # Disable address space randomization
   export ADDR_NO_RANDOMIZE=1

   # Set CPU affinity mask
   export CPU_AFFINITY_MASK=0x0F

   # Optimize memory allocation
   export MALLOC_ARENA_MAX=1
   export MALLOC_MMAP_THRESHOLD_=131072

Monitoring and Measurement
--------------------------

Performance Metrics
~~~~~~~~~~~~~~~~~~~

**Built-in Statistics**

FIX-FastTrade provides real-time performance metrics:

.. code-block:: text

   === Performance Statistics ===
   Uptime: 3600 seconds
   Messages processed: 18,000,000
   Orders processed: 1,200,000
   Message rate: 5,000 msg/sec
   Order rate: 333 orders/sec

   Latency Statistics:
   - Mean: 0.45μs
   - P50: 0.42μs
   - P95: 0.68μs
   - P99: 1.23μs
   - P99.9: 2.45μs

   Queue Statistics:
   - Order queue size: 0
   - Message queue size: 0
   - Memory pool utilization: 23%

**External Monitoring Tools**

.. code-block:: bash

   # CPU usage monitoring
   top -p $(pgrep fix-fasttrade)

   # Memory usage
   pmap -x $(pgrep fix-fasttrade)

   # Network statistics
   ss -i dst :9878

   # System-wide latency
   cyclictest -p 80 -t5 -m -n

Latency Measurement
~~~~~~~~~~~~~~~~~~~

**Timestamping**

Enable hardware timestamping for accurate measurements:

.. code-block:: bash

   # Check hardware timestamping support
   ethtool -T eth0

   # Enable hardware timestamping in application
   export ENABLE_HW_TIMESTAMPING=1

**Latency Testing**

.. code-block:: bash

   # Run latency benchmark
   ./bin/fix-fasttrade-benchmark \
     --config config/benchmark.xml \
     --test-duration 300 \
     --message-rate 10000

Troubleshooting Performance Issues
----------------------------------

Common Performance Problems
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**High Latency Symptoms**

.. code-block:: text

   P99 latency > 10μs
   Message processing rate < 100,000 msg/sec
   High CPU wait time

**Diagnostic Steps**

.. code-block:: bash

   # Check CPU frequency scaling
   cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq

   # Monitor context switches
   vmstat 1

   # Check for memory swapping
   free -h && cat /proc/swaps

   # Network interface statistics
   cat /proc/net/dev

**Memory Issues**

.. code-block:: bash

   # Check for memory leaks
   valgrind --tool=memcheck ./bin/fix-fasttrade --config config/test.xml

   # Monitor memory allocation
   strace -e trace=mmap,munmap,brk ./bin/fix-fasttrade --config config/test.xml

**Network Issues**

.. code-block:: bash

   # Check network latency
   ping -c 10 fix.exchange.com

   # Monitor network drops
   netstat -i

   # Check TCP retransmissions
   ss -i dst :9878

Performance Tuning Checklist
----------------------------

System Level
~~~~~~~~~~~~

* [ ] CPU governor set to performance mode
* [ ] CPU frequency scaling disabled
* [ ] Huge pages configured and enabled
* [ ] Memory locking limits increased
* [ ] Network buffers optimized
* [ ] Interrupt affinity configured
* [ ] NUMA topology optimized

Application Level
~~~~~~~~~~~~~~~~~

* [ ] CPU affinity configured for all threads
* [ ] Memory locking enabled
* [ ] Real-time thread priorities set
* [ ] Zero-copy message processing enabled
* [ ] Message validation disabled (if safe)
* [ ] Logging minimized for production
* [ ] Connection pooling optimized

Build Configuration
~~~~~~~~~~~~~~~~~~~

* [ ] Release build with maximum optimization
* [ ] Link-time optimization (LTO) enabled
* [ ] Profile-guided optimization applied
* [ ] SIMD instructions enabled
* [ ] Native CPU architecture targeting

Monitoring Setup
~~~~~~~~~~~~~~~~

* [ ] Real-time performance metrics enabled
* [ ] Latency percentiles tracked
* [ ] System resource monitoring active
* [ ] Network performance monitoring
* [ ] Alert thresholds configured

Expected Performance Targets
----------------------------

After applying these optimizations, you should achieve:

* **Latency**: P99 < 2μs, P99.9 < 5μs
* **Throughput**: > 1M messages/second
* **CPU Usage**: < 50% on dedicated cores
* **Memory**: < 1GB resident set size
* **Network**: < 0.1ms round-trip time to exchange

For specific performance requirements or advanced tuning, consult the :doc:`../reference/troubleshooting/index` section or contact support.
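
**Checking Percentile Targets**

The latency targets above are stated as percentiles, so it can help to verify them against raw samples rather than relying on the mean. The following is a minimal sketch, not part of FIX-FastTrade: it assumes latency samples have been exported one value per line in microseconds, and the function names ``percentile`` and ``check_target`` and the file ``latencies.txt`` are hypothetical. It uses the nearest-rank definition of a percentile.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helpers; FIX-FastTrade is not known to ship these.

   # percentile P: read one latency sample per line on stdin and print
   # the P-th percentile using the nearest-rank method.
   percentile() {
       sort -n | awk -v p="$1" '
           { sample[NR] = $1 }
           END {
               if (NR == 0) exit 1              # no samples: fail
               rank = p * NR / 100 - 1e-9       # epsilon absorbs float round-up
               idx = int(rank)
               if (idx < rank) idx++            # ceiling gives the nearest rank
               if (idx < 1) idx = 1
               print sample[idx]
           }'
   }

   # check_target P LIMIT: succeed if the P-th percentile of the samples
   # on stdin is strictly below LIMIT.
   check_target() {
       local value
       value=$(percentile "$1") || return 2
       awk -v v="$value" -v limit="$2" 'BEGIN { exit !(v < limit) }'
   }

For example, ``check_target 99 2.0 < latencies.txt && echo "P99 target met"`` would test the P99 target listed above, assuming ``latencies.txt`` holds the exported samples.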