How We Handle 2 Million Requests per Second Using a 100% Local (On-Prem) IT Stack

Most teams immediately think cloud when they hear “millions of requests per second.”

We deliberately chose no cloud at all.

This article explains step by step how we designed and operated a local, on-premise system capable of handling ~2 million requests per second, using C++, Angular, and ArangoDB.

1. Design Philosophy: Control Over Convenience

Our constraints were clear:

- No cloud providers
- No vendor lock-in
- Predictable latency
- Full data ownership
- Ability to scale horizontally using commodity hardware

So instead of abstracting problems away, we designed for physics: CPU, memory, network, and I/O.

2. High-Level Architecture

Stack overview:

- Frontend: Angular (served locally via Nginx)
- Backend: C++ (custom high-performance services)
- Database: ArangoDB (multi-model: key-value + graph + document)
- Networking: Kernel-tuned Linux + internal load balancing
- Deployment: Bare-metal servers, no virtualization overhead

Everything runs inside our own data center.

3. Frontend Strategy (Angular)

Angular was chosen not for raw speed, but for:

- Predictable UI state
- Clear separation of concerns
- Efficient API batching

Key optimizations:

- Aggressive API request batching
- Stateless frontend (no server sessions)
- CDN-like static asset caching inside the local network
- Strict change detection control

Angular never talks to the database directly — only to C++ gateway services.

4. Why C++ for the Backend

When targeting millions of requests per second, the per-request cost of every abstraction layer matters.

We used modern C++ (C++17/20) because:

- Near-zero runtime overhead
- Fine-grained memory control
- Predictable latency
- Excellent async/network libraries

Backend Structure

- Event-driven, non-blocking I/O
- No thread-per-request model
- Lock-free queues where possible
- Memory pools instead of heap allocations
- Binary internal protocols (not JSON)
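
To make "lock-free queues" concrete, here is a minimal sketch of a single-producer/single-consumer ring buffer, one common way to hand work between two threads without a mutex. It is an illustration only, not our production code: the name SpscQueue and the 64-byte cache-line alignment are assumptions, and the element type must be default-constructible and copyable.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Single-producer / single-consumer ring buffer (illustrative sketch).
// Capacity must be a power of two so index wrapping is a cheap bit mask.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Call from the producer thread only. Returns false when the queue is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        assert(head - tail <= Capacity);  // invariant: never over-full
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        // Release: the slot write becomes visible before the new head does.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Call from the consumer thread only. Returns nullopt when the queue is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        // Release: the slot read completes before the slot is handed back.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    // Separate cache lines (64 bytes assumed) to avoid false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

A full queue makes try_push return false instead of blocking, which gives the caller a natural point to apply backpressure.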

Each service is single-purpose, small, and fast.
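
As an illustration of what a binary internal protocol can look like, the sketch below encodes and decodes a fixed 8-byte frame header. The layout (type, flags, payload length, all big-endian) and the 1 MiB payload cap are hypothetical choices made for this example, not the wire format we actually run.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical frame header: 2-byte type, 2-byte flags, 4-byte payload length.
struct FrameHeader {
    std::uint16_t type;
    std::uint16_t flags;
    std::uint32_t payload_len;
};

constexpr std::size_t kHeaderSize = 8;
constexpr std::uint32_t kMaxPayload = 1u << 20;  // assumed 1 MiB cap

// Serialize the header in big-endian (network) byte order.
std::array<std::uint8_t, kHeaderSize> encode_header(const FrameHeader& h) {
    assert(h.payload_len <= kMaxPayload);
    return {
        static_cast<std::uint8_t>(h.type >> 8),
        static_cast<std::uint8_t>(h.type & 0xFF),
        static_cast<std::uint8_t>(h.flags >> 8),
        static_cast<std::uint8_t>(h.flags & 0xFF),
        static_cast<std::uint8_t>(h.payload_len >> 24),
        static_cast<std::uint8_t>((h.payload_len >> 16) & 0xFF),
        static_cast<std::uint8_t>((h.payload_len >> 8) & 0xFF),
        static_cast<std::uint8_t>(h.payload_len & 0xFF),
    };
}

// Parse a header; rejects short buffers and oversized payload lengths.
std::optional<FrameHeader> decode_header(const std::uint8_t* data, std::size_t len) {
    if (data == nullptr || len < kHeaderSize) return std::nullopt;
    FrameHeader h{};
    h.type = static_cast<std::uint16_t>((data[0] << 8) | data[1]);
    h.flags = static_cast<std::uint16_t>((data[2] << 8) | data[3]);
    h.payload_len = (static_cast<std::uint32_t>(data[4]) << 24) |
                    (static_cast<std::uint32_t>(data[5]) << 16) |
                    (static_cast<std::uint32_t>(data[6]) << 8) |
                    static_cast<std::uint32_t>(data[7]);
    if (h.payload_len > kMaxPayload) return std::nullopt;
    return h;
}
```

Parsing a header like this is a handful of shifts with no allocation, which is the main reason to prefer it over JSON on internal hops.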

5. Request Flow (Step by Step)

1) Client sends request (Angular)
2) Local load balancer routes it
3) C++ API gateway receives it
4) Gateway validates & routes internally
5) Business logic service processes request
6) ArangoDB queried using optimized AQL
7) Response returned through gateway
8) Angular updates UI

Average end-to-end latency: single-digit milliseconds.
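
To put that latency in context, Little's law (average requests in flight = arrival rate x average latency) gives a rough sizing figure. The sketch below assumes a 5 ms average as a stand-in for "single-digit milliseconds"; the function name is hypothetical.

```cpp
#include <cstdint>

// Little's law: requests in flight = arrival rate x average latency.
// Latency is given in microseconds to keep the arithmetic in integers.
constexpr std::uint64_t in_flight_requests(std::uint64_t requests_per_second,
                                           std::uint64_t latency_microseconds) {
    return requests_per_second * latency_microseconds / 1'000'000;
}

// ~2 million requests/sec at an assumed 5 ms average latency
// means roughly 10,000 requests are in the system at any instant.
static_assert(in_flight_requests(2'000'000, 5'000) == 10'000);
```

Under that assumption, queues, connection pools, and memory pools across the cluster need to hold on the order of ten thousand concurrent requests.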

6. ArangoDB at Scale

ArangoDB was a strategic choice.

Why?

- Multi-model database (no multiple engines)
- Native clustering
- Very fast key-value access
- Graph queries without separate graph DB
- AQL is expressive and optimizable

Database Optimizations

- Sharded collections by access pattern
- Heavy use of primary keys (O(1) lookups)
- Read-optimized replicas
- Writes separated from reads where possible
- Indexes designed per query, not per data model
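
The idea behind sharding by key can be sketched in a few lines: hash the key, then map the hash onto a shard. ArangoDB does this placement itself based on a collection's shard keys, so the code below is only a conceptual illustration. FNV-1a is a stand-in hash, not the one ArangoDB uses, and shard_for_key is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a: a simple hash that is stable across processes and machines.
constexpr std::uint64_t fnv1a(std::string_view key) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : key) {
        hash ^= c;
        hash *= 1099511628211ULL;  // FNV prime
    }
    return hash;
}

// Map a document key to one of shard_count shards (conceptual only).
std::size_t shard_for_key(std::string_view key, std::size_t shard_count) {
    assert(shard_count > 0);
    return static_cast<std::size_t>(fnv1a(key) % shard_count);
}
```

Because the mapping depends only on the key, a lookup by primary key touches exactly one shard, while a query that does not include the shard key has to fan out to all of them. That is why the shard key should follow the dominant access pattern.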

We avoided “generic schemas” completely.

7. Scaling Without the Cloud

Scaling was horizontal, not vertical.

- Add more C++ service nodes
- Add more ArangoDB shards
- Increase internal bandwidth
- Zero downtime rolling restarts

No auto-scaling magic — just predictable engineering.

8. Linux & Network Tuning

This is where most systems fail.

We tuned:

- Kernel TCP buffers
- File descriptor limits
- IRQ affinity
- NUMA awareness
- CPU pinning for services
- Zero-copy networking where possible
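
Two of these knobs can be set from inside a service at startup. The sketch below raises the process's file descriptor limit with getrlimit/setrlimit and pins the calling thread to one core with sched_setaffinity. It is Linux/glibc-specific, the function names are hypothetical, and any concrete values (such as a 65536 descriptor target) are illustrative. Kernel TCP buffers and IRQ affinity are configured outside the process and are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <sched.h>         // sched_setaffinity, cpu_set_t (Linux/glibc)
#include <sys/resource.h>  // getrlimit, setrlimit

// Raise the soft open-file limit toward `desired`, capped by the hard limit.
// Never lowers the current soft limit. Returns the soft limit now in effect.
std::optional<rlim_t> raise_fd_limit(rlim_t desired) {
    assert(desired > 0);
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return std::nullopt;
    const rlim_t target =
        std::max(lim.rlim_cur, std::min(desired, lim.rlim_max));
    lim.rlim_cur = target;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) return std::nullopt;
    return target;
}

// Pin the calling thread to a single CPU core. Returns false on failure,
// for example when the core is outside the process's allowed CPU set.
bool pin_current_thread_to_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // 0 = calling thread
}
```

A service might call raise_fd_limit(65536) once at startup and pin each event-loop thread to its own core, ideally on the same NUMA node as the network card that feeds it.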

Default Linux settings are chosen for general-purpose workloads and cannot handle this load.

9. Monitoring & Backpressure

To survive high traffic:

- Backpressure at every layer
- Requests rejected early if needed
- Circuit breakers between services
- Real-time metrics (latency, queue depth, memory)
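
Early rejection can be as simple as a counter that caps the number of requests in flight. The following is a minimal sketch under that assumption. AdmissionGate is a hypothetical name, and a real service would also export the counter as a metric.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Caps concurrent requests; callers that are refused should reject at once.
class AdmissionGate {
public:
    explicit AdmissionGate(std::size_t max_in_flight) : max_(max_in_flight) {
        assert(max_in_flight > 0);
    }

    // Returns true if the request was admitted, false if the service is full.
    bool try_enter() {
        std::size_t current = in_flight_.load(std::memory_order_relaxed);
        while (current < max_) {
            // On failure, `current` is reloaded and the bound is re-checked.
            if (in_flight_.compare_exchange_weak(current, current + 1,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;
    }

    // Must be called exactly once for every successful try_enter().
    void leave() {
        const std::size_t prev =
            in_flight_.fetch_sub(1, std::memory_order_release);
        assert(prev > 0);
        (void)prev;  // silence unused warning when asserts are disabled
    }

    std::size_t in_flight() const {
        return in_flight_.load(std::memory_order_relaxed);
    }

private:
    const std::size_t max_;
    std::atomic<std::size_t> in_flight_{0};
};
```

When try_enter() returns false, the gateway can answer with an overload error immediately instead of queueing work it cannot finish in time.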

Fail fast > fail late.
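
A circuit breaker applies the same principle between services: after repeated failures, stop calling the downstream service for a cooldown period instead of piling up timeouts. Here is a minimal, single-threaded sketch. The class name, the thresholds, and the three-state design are illustrative assumptions, and a shared instance would need synchronization.

```cpp
#include <cassert>
#include <chrono>

// Closed: calls flow. Open: calls are refused. HalfOpen: trial calls allowed.
class CircuitBreaker {
public:
    using Clock = std::chrono::steady_clock;

    CircuitBreaker(int failure_threshold, Clock::duration cooldown)
        : threshold_(failure_threshold), cooldown_(cooldown) {
        assert(failure_threshold > 0);
    }

    // Returns true if a call may be attempted at time `now`.
    bool allow(Clock::time_point now = Clock::now()) {
        if (state_ == State::Open && now - opened_at_ >= cooldown_) {
            state_ = State::HalfOpen;  // cooldown elapsed: allow trial calls
        }
        return state_ != State::Open;
    }

    // A success closes the breaker and clears the failure count.
    void record_success() {
        failures_ = 0;
        state_ = State::Closed;
    }

    // Opens the breaker on a failed trial call or once the threshold is hit.
    void record_failure(Clock::time_point now = Clock::now()) {
        if (state_ == State::HalfOpen || ++failures_ >= threshold_) {
            state_ = State::Open;
            opened_at_ = now;
        }
    }

    bool is_open() const { return state_ == State::Open; }

private:
    enum class State { Closed, Open, HalfOpen };

    int threshold_;
    Clock::duration cooldown_;
    int failures_ = 0;
    State state_ = State::Closed;
    Clock::time_point opened_at_{};
};
```

Callers check allow() before each downstream call and report the outcome afterwards. Passing the time explicitly keeps the logic easy to test without sleeping.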

10. Results

- ~2,000,000 requests/sec
- Stable under sustained load
- Predictable latency
- No cloud dependency
- Full data sovereignty
- Lower long-term cost

Final Thoughts

Handling millions of requests per second is not about tools. It is about:

- Understanding system limits
- Choosing the right language for the job
- Removing unnecessary abstractions
- Respecting hardware realities

Cloud platforms hide complexity.

Local systems force you to master it.

And that mastery is the real advantage.
