Building Low-Latency Systems with GoLang and C++ • Sujal Gupta

From my experience building NSE/BSE exchange simulators at TCS, achieving sub-millisecond latency requires careful architecture, optimized code, and deep system understanding.

Architecture Overview

The exchange simulator processes millions of order events daily. Here's the high-level architecture:

// Worker pool for concurrent processing
type WorkerPool struct {
    workers int
    tasks   chan Task
}

func NewWorkerPool(workers int) *WorkerPool {
    return &WorkerPool{
        workers: workers,
        tasks:   make(chan Task, 10000),
    }
}

Key Optimization Techniques

1. Goroutine Pooling

Avoid goroutine creation overhead by reusing workers:

// Fixed pool of 1000 workers
pool := NewWorkerPool(1000)
for i := 0; i < 1000; i++ {
    go pool.worker()
}

2. Memory Pooling

Reduce GC pressure with sync.Pool:

var orderPool = sync.Pool{
    New: func() interface{} {
        return &Order{}
    },
}

func getOrder() *Order {
    return orderPool.Get().(*Order)
}

func putOrder(o *Order) {
    orderPool.Put(o)
}

3. Linux Kernel Tuning

TCP_NODELAY: Disable Nagle's algorithm
SO_REUSEPORT: Allow multiple sockets on same port
CPU affinity: Pin goroutines to specific cores
Hugepages: 2MB pages for memory-intensive operations

4. Zero-Copy Serialization

Using Protocol Buffers with zero-copy optimization:

message Order {
    bytes id = 1;
    int64 timestamp = 2;
    double price = 3;
    int32 quantity = 4;
    OrderType type = 5;
}

Performance Results

Metric	Before Optimization	After Optimization
Average Latency	5.2ms	0.8ms
Throughput	50K orders/sec	250K orders/sec
CPU Usage	85%	65%

Lessons Learned

Profile before optimizing - identify real bottlenecks
Batch operations when possible
Use appropriate data structures (arrays vs maps)
Monitor GC pauses and memory usage
Test with production-like load patterns

Note: These optimizations were implemented for TCS's order routing system handling real-time NSE/BSE market data.

Architecture Overview

Key Optimization Techniques

1. Goroutine Pooling

2. Memory Pooling

3. Linux Kernel Tuning

4. Zero-Copy Serialization

Performance Results

Lessons Learned

Related Writings

Advanced Data Structures for Competitive Programming

Kafka Integration Patterns for Real-Time Systems