Most teams immediately think cloud when they hear “millions of requests per second.”
We deliberately chose no cloud at all.
This article explains, step by step, how we designed and operated an on-premises system capable of handling ~2 million requests per second using C++, Angular, and ArangoDB.
1. Design Philosophy: Control Over Convenience
Our constraints were clear:
- No cloud providers
- No vendor lock-in
- Predictable latency
- Full data ownership
- Ability to scale horizontally using commodity hardware
So instead of abstracting problems away, we designed around physical limits: CPU, memory, network, and I/O.
2. High-Level Architecture
Stack overview:
- Frontend: Angular (served locally via Nginx)
- Backend: C++ (custom high-performance services)
- Database: ArangoDB (multi-model: key-value + graph + document)
- Networking: kernel-tuned Linux + internal load balancing
- Deployment: bare-metal servers, no virtualization overhead
Everything runs inside our own data center.
3. Frontend Strategy (Angular)
Angular was chosen not for raw speed, but for:
- Predictable UI state
- Clear separation of concerns
- Efficient API batching
Key optimizations:
- Aggressive API request batching
- Stateless frontend (no server sessions)
- CDN-like static asset caching inside the local network
- Strict change detection control
Angular never talks to the database directly; it talks only to the C++ gateway services.
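To make the batching contract concrete from the gateway's side: Angular can pack several logical calls into one HTTP request, and the gateway fans them out and answers with one combined response. The following is a minimal C++ sketch under assumed names; `SubRequest`, `SubResponse`, `BatchDispatcher`, and any operation names are hypothetical, and HTTP parsing is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical shape of one logical call inside a batched HTTP request.
struct SubRequest {
    std::uint32_t id;       // client-chosen id so responses can be matched up
    std::string   op;       // operation name (illustrative)
    std::string   payload;  // already-decoded request body
};

struct SubResponse {
    std::uint32_t id;
    int           status;   // 200 on success, 404 for unknown operations
    std::string   body;
};

using Handler = std::function<std::string(const std::string&)>;

class BatchDispatcher {
public:
    void register_op(std::string op, Handler h) {
        assert(h);  // an empty std::function would throw when called
        handlers_.emplace(std::move(op), std::move(h));
    }

    // Runs every sub-request and returns one response per input, in order.
    // An unknown operation yields a 404 entry instead of failing the whole batch.
    std::vector<SubResponse> dispatch(const std::vector<SubRequest>& batch) const {
        std::vector<SubResponse> out;
        out.reserve(batch.size());
        for (const auto& req : batch) {
            const auto it = handlers_.find(req.op);
            if (it == handlers_.end()) {
                out.push_back({req.id, 404, "unknown op"});
            } else {
                out.push_back({req.id, 200, it->second(req.payload)});
            }
        }
        return out;
    }

private:
    std::unordered_map<std::string, Handler> handlers_;
};
```

Returning a per-item status keeps one bad call from costing the client a second round trip for all the others.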
4. Why C++ for the Backend
At millions of requests per second, every abstraction layer has a measurable cost.
We used modern C++ (C++17/20) because:
- Near-zero runtime overhead
- Fine-grained memory control
- Predictable latency
- Excellent async/network libraries
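Fine-grained memory control often means allocating storage once at startup and recycling it per request. The sketch below is a generic fixed-capacity object pool, not the authors' allocator; `ObjectPool` and its methods are hypothetical names, and it is single-threaded (giving each worker thread its own pool is one way to avoid locking).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Fixed-capacity object pool (single-threaded sketch, hypothetical names).
// All storage is allocated once in the constructor; acquire/release only
// pop/push a pointer on a free list and never call the general-purpose heap.
// Every acquired object must be released before the pool is destroyed.
template <typename T>
class ObjectPool {
public:
    explicit ObjectPool(std::size_t capacity) : storage_(new Slot[capacity]) {
        assert(capacity > 0);
        free_.reserve(capacity);  // release() never needs to grow this vector
        for (std::size_t i = 0; i < capacity; ++i) free_.push_back(&storage_[i]);
    }

    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    // Builds a T in a free slot, or returns nullptr when the pool is exhausted
    // so the caller can reject the request instead of allocating more memory.
    template <typename... Args>
    T* acquire(Args&&... args) {
        if (free_.empty()) return nullptr;
        Slot* slot = free_.back();
        T* obj = ::new (static_cast<void*>(slot->bytes)) T(std::forward<Args>(args)...);
        free_.pop_back();  // only after construction succeeded
        return obj;
    }

    // Destroys the object and puts its slot back on the free list.
    void release(T* obj) {
        obj->~T();
        free_.push_back(reinterpret_cast<Slot*>(obj));
    }

    std::size_t available() const { return free_.size(); }

private:
    struct Slot { alignas(T) unsigned char bytes[sizeof(T)]; };
    std::unique_ptr<Slot[]> storage_;
    std::vector<Slot*> free_;
};
```

Returning `nullptr` when the pool is empty turns memory exhaustion into an explicit rejection path rather than a silent fallback to the general-purpose allocator.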
Backend Structure
- Event-driven, non-blocking I/O
- No thread-per-request model
- Lock-free queues where possible
- Memory pools instead of per-request heap allocations
- Binary internal protocols (not JSON)
Each service is single-purpose, small, and fast.
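A lock-free queue between two pipeline stages (for example, an I/O thread handing decoded requests to a worker thread) can be as small as a single-producer/single-consumer ring buffer. This is a textbook SPSC design, offered as an assumption about what "lock-free queues" could look like here rather than the production code; it is only correct with exactly one producer thread and one consumer thread.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Bounded single-producer/single-consumer queue without locks.
// Exactly one thread may call try_push and exactly one other thread try_pop.
// Capacity must be a power of two so indices can wrap with a bit mask.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Returns false when full: the caller must shed or delay work (backpressure).
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;           // full
        buf_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);    // publish the slot
        return true;
    }

    // Returns std::nullopt when empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return std::nullopt;               // empty
        T value = buf_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);    // hand the slot back
        return value;
    }

private:
    std::array<T, Capacity> buf_{};                  // T must be default-constructible
    alignas(64) std::atomic<std::size_t> head_{0};   // written only by the producer
    alignas(64) std::atomic<std::size_t> tail_{0};   // written only by the consumer
};
```

`try_push` returning `false` on a full queue is where backpressure begins: the producer has to reject, drop, or slow down instead of letting memory grow without bound.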
5. Request Flow (Step by Step)
1) Client sends request (Angular)
2) Local load balancer routes it
3) C++ API gateway receives it
4) Gateway validates and routes internally
5) Business logic service processes the request
6) ArangoDB is queried using optimized AQL
7) Response is returned through the gateway
8) Angular updates the UI
Average end-to-end latency: single-digit milliseconds.
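The gateway steps in this flow (receive, validate, route internally), combined with the binary internal protocol mentioned earlier, can be sketched as a fixed-size header followed by a payload. The header layout, `kMaxPayload`, the message-type numbers, and the `Gateway` class are all hypothetical; the point is that validation is a few byte comparisons and routing is one table lookup, with no JSON parsing on the hot path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical 8-byte little-endian frame header:
//   byte 0: protocol version   byte 1: message type
//   bytes 2-3: reserved        bytes 4-7: payload length
constexpr std::uint8_t  kVersion    = 1;
constexpr std::size_t   kHeaderSize = 8;
constexpr std::uint32_t kMaxPayload = 64 * 1024;  // reject oversized frames early

std::vector<std::uint8_t> encode_frame(std::uint8_t type, const std::string& payload) {
    const auto len = static_cast<std::uint32_t>(payload.size());
    std::vector<std::uint8_t> out = {
        kVersion, type, 0, 0,
        static_cast<std::uint8_t>(len),       static_cast<std::uint8_t>(len >> 8),
        static_cast<std::uint8_t>(len >> 16), static_cast<std::uint8_t>(len >> 24)};
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

enum class Status { Ok, Malformed, BadVersion, TooLarge, UnknownType };

class Gateway {
public:
    using Handler = std::function<std::string(const std::string&)>;

    void route(std::uint8_t type, Handler h) { routes_[type] = std::move(h); }

    // Cheap checks first, then a single table lookup to find the target service.
    Status handle(const std::vector<std::uint8_t>& frame, std::string& response) const {
        if (frame.size() < kHeaderSize) return Status::Malformed;
        if (frame[0] != kVersion) return Status::BadVersion;
        const std::uint32_t len = std::uint32_t{frame[4]} | std::uint32_t{frame[5]} << 8 |
                                  std::uint32_t{frame[6]} << 16 | std::uint32_t{frame[7]} << 24;
        if (len > kMaxPayload) return Status::TooLarge;
        if (frame.size() - kHeaderSize != len) return Status::Malformed;
        const auto it = routes_.find(frame[1]);
        if (it == routes_.end()) return Status::UnknownType;
        response = it->second(std::string(frame.begin() + kHeaderSize, frame.end()));
        return Status::Ok;
    }

private:
    std::unordered_map<std::uint8_t, Handler> routes_;
};
```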
6. ArangoDB at Scale
ArangoDB was a strategic choice.
Why?
- Multi-model database (one engine instead of several)
- Native clustering
- Very fast key-value access
- Graph queries without a separate graph database
- AQL is expressive and optimizable
Database Optimizations
- Collections sharded by access pattern
- Heavy use of primary keys (fast point lookups via the primary index)
- Read-optimized replicas
- Writes separated from reads where possible
- Indexes designed per query, not per data model
We avoided “generic schemas” completely.
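Separating writes from reads can start with a routing decision in the calling C++ service. This sketch uses hypothetical endpoint names (8529 is ArangoDB's default port); whether replicas may serve reads, and with what consistency guarantees, depends on how the ArangoDB deployment is configured, so treat it as an illustration of the routing logic only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical read/write split: one write endpoint plus a list of read endpoints.
class ReadWriteRouter {
public:
    ReadWriteRouter(std::string writer, std::vector<std::string> readers)
        : writer_(std::move(writer)), readers_(std::move(readers)) {
        assert(!readers_.empty());
    }

    // Writes always go to the single write endpoint.
    const std::string& for_write() const { return writer_; }

    // Reads rotate round-robin across read endpoints; the atomic counter makes
    // this safe to call from many threads at once.
    const std::string& for_read() {
        const std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        return readers_[i % readers_.size()];
    }

private:
    std::string writer_;
    std::vector<std::string> readers_;
    std::atomic<std::size_t> next_{0};
};
```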
7. Scaling Without the Cloud
Scaling was horizontal, not vertical.
- Add more C++ service nodes
- Add more ArangoDB shards
- Increase internal bandwidth
- Zero-downtime rolling restarts
No auto-scaling magic, just predictable engineering.
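Adding service nodes and restarting them one at a time both come down to how the internal load balancer picks an upstream. A minimal single-threaded sketch, assuming a hypothetical `UpstreamPool` with round-robin selection and a draining flag; a real balancer would also health-check nodes and track in-flight requests.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical upstream pool for the internal load balancer. Nodes can be
// appended at runtime; a node marked draining receives no new requests.
class UpstreamPool {
public:
    void add(std::string address) {
        assert(!address.empty());
        nodes_.push_back({std::move(address), false});
    }

    void set_draining(const std::string& address, bool draining) {
        for (auto& n : nodes_)
            if (n.address == address) n.draining = draining;
    }

    // Round-robin over nodes that are not draining; nullopt if none are usable.
    std::optional<std::string> pick() {
        for (std::size_t tries = 0; tries < nodes_.size(); ++tries) {
            const Node& n = nodes_[cursor_++ % nodes_.size()];
            if (!n.draining) return n.address;
        }
        return std::nullopt;
    }

private:
    struct Node { std::string address; bool draining; };
    std::vector<Node> nodes_;
    std::size_t cursor_ = 0;
};
```

With this shape, a rolling restart could be: mark one node draining, wait for its in-flight requests to finish, restart it, clear the flag, and move on to the next node.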
8. Linux & Network Tuning
This is where most systems fail.
We tuned:
- Kernel TCP buffers
- File descriptor limits
- IRQ affinity
- NUMA awareness
- CPU pinning for services
- Zero-copy networking where possible
Default Linux settings cannot handle this load.
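Two of the items above, CPU pinning and file descriptor limits, can also be applied from inside a service at startup. This Linux-only sketch uses the standard `getrlimit`/`setrlimit` and `sched_getaffinity`/`sched_setaffinity` calls; the function names and the policy (one CPU per worker index, soft limit raised to the hard limit) are assumptions, and system-wide settings such as TCP buffers and IRQ affinity are still configured at the OS level (for example, TCP buffers through sysctl).

```cpp
#include <cassert>
#include <sched.h>
#include <sys/resource.h>

// Raises the soft open-file limit to the hard limit, the highest value an
// unprivileged process may set for itself. Returns the new limit, or 0 on failure.
rlim_t raise_fd_limit() {
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    return lim.rlim_cur;
}

// Pins the calling thread to the N-th CPU it is currently allowed to use
// (containers and cgroups may restrict that set), wrapping around if
// worker_index exceeds the CPU count. Returns the CPU id, or -1 on failure.
int pin_current_thread(int worker_index) {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    const int count = CPU_COUNT(&allowed);
    if (count == 0 || worker_index < 0) return -1;
    int target = worker_index % count;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) continue;
        if (target-- > 0) continue;  // skip allowed CPUs until the N-th one
        cpu_set_t one;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        // pid 0 means "the calling thread" for sched_setaffinity on Linux.
        return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
    }
    return -1;
}
```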
9. Monitoring & Backpressure
To survive high traffic:
- Backpressure at every layer
- Requests rejected early if needed
- Circuit breakers between services
- Real-time metrics (latency, queue depth, memory)
Fail fast > fail late.
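A circuit breaker is the inter-service form of failing fast: after repeated failures, callers stop waiting on a struggling dependency and get an immediate error instead. A minimal single-threaded sketch; the class name, threshold, and cooldown are hypothetical, and a breaker shared across threads would need atomics or a lock.

```cpp
#include <cassert>
#include <chrono>

// Closed: calls pass through. After `failure_threshold` consecutive failures the
// breaker opens and rejects calls immediately. Once `cooldown` has elapsed, one
// trial call is allowed (half-open); success closes it, failure reopens it.
class CircuitBreaker {
public:
    using Clock = std::chrono::steady_clock;
    enum class State { Closed, Open, HalfOpen };

    CircuitBreaker(int failure_threshold, Clock::duration cooldown)
        : threshold_(failure_threshold), cooldown_(cooldown) {
        assert(failure_threshold > 0);
    }

    // Ask before calling the downstream service; false means fail fast.
    bool allow(Clock::time_point now = Clock::now()) {
        if (state_ == State::Open && now - opened_at_ >= cooldown_) {
            state_ = State::HalfOpen;  // let exactly one trial request through
            return true;
        }
        return state_ == State::Closed;
    }

    void on_success() {
        state_ = State::Closed;
        failures_ = 0;
    }

    void on_failure(Clock::time_point now = Clock::now()) {
        if (state_ == State::Open) return;  // late result from an earlier call
        if (state_ == State::HalfOpen || ++failures_ >= threshold_) {
            state_ = State::Open;
            opened_at_ = now;
            failures_ = 0;
        }
    }

    State state() const { return state_; }

private:
    int threshold_;
    Clock::duration cooldown_;
    State state_ = State::Closed;
    int failures_ = 0;
    Clock::time_point opened_at_{};
};
```

A caller checks `allow()` before each downstream call, rejects the request immediately when it returns `false`, and reports the outcome with `on_success()` or `on_failure()`.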
10. Results
- ~2,000,000 requests/sec
- Stable under sustained load
- Predictable latency
- No cloud dependency
- Full data sovereignty
- Lower long-term cost
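Throughput and latency together also fix how much concurrency the system has to hold at once. By Little's law, the average number of requests in flight L equals the arrival rate λ times the average time W each request spends in the system. Taking W = 5 ms as an assumed midpoint of the "single-digit milliseconds" reported for end-to-end latency:

$$
L = \lambda W = \left(2 \times 10^{6}\ \text{req/s}\right) \times \left(5 \times 10^{-3}\ \text{s}\right) = 10^{4}\ \text{requests in flight}
$$

Across the full 1 to 9 ms range, the same formula gives roughly 2,000 to 18,000 concurrent requests, which is the scale that queue depths, memory pools, and backpressure limits would need to be sized for.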
Final Thoughts
Handling millions of requests per second is not about tools; it’s about:
- Understanding system limits
- Choosing the right language for the job
- Removing unnecessary abstractions
- Respecting hardware realities
Cloud platforms hide complexity.
Local systems force you to master it.
And that mastery is the real advantage.