BR833263
Jan 16, 2026
10 min read
Scaling for a 10× Traffic Surge When Infrastructure Has No Time to React
We are hosting our final demo day, an event that has attracted significant public and investor attention. As the live stream begins and announcements are distributed across multiple channels, a large number of users attempt to access the platform simultaneously.
Under normal conditions, the system operates under a predictable and well-understood load profile. However, within minutes of the broadcast going live, traffic increases sharply, reaching a 10× surge relative to baseline usage. This sudden shift pushes the system beyond its typical operating envelope.
Editorial context:
Build Roulette documents production-informed decisions based on a combination of direct experience and observed industry patterns. Specific details are representative, not exhaustive.
Monitoring dashboards begin to surface stress indicators. CPU utilization approaches 95%, database connection pools near exhaustion, and request latency steadily increases across critical application paths. A platform that was stable moments earlier is now operating close to failure thresholds.
This situation represents a real-world burst scenario. It is not a synthetic load test or a planned capacity exercise, but a genuine production event where user demand changes faster than infrastructure can react. In these moments, engineering decisions must be made quickly to prevent cascading failures and service downtime.
To respond effectively to this type of surge, engineering teams typically evaluate two primary scaling strategies:
Vertical Scaling (Scaling Up)
Horizontal Scaling (Scaling Out)
This article examines how each approach behaves under extreme, short-lived traffic spikes, the operational and cost tradeoffs involved, and how to determine the most appropriate strategy, or combination of strategies, to maintain system reliability during high-visibility events.
Assumptions
This evaluation assumes a cloud-native AWS environment with a stateless application tier, managed database services, existing observability and the ability to modify instance types or Auto Scaling policies during planned or unplanned traffic surges.
1. VERTICAL SCALING: THE IMMEDIATE CAPACITY EXPANSION
When a high-visibility event triggers a sudden increase in demand, the most direct engineering response is often vertical scaling, commonly referred to as scaling up. Rather than introducing additional servers, vertical scaling increases the capacity of existing infrastructure by upgrading CPU, memory, or storage resources.
This approach allows a single instance to process a higher volume of requests within the same architectural footprint. From an operational perspective, it is often the fastest way to create additional headroom during an active incident.
EVALUATING THE BURST RESPONSE
The primary advantage of vertical scaling during a live surge is its low operational complexity. Because the architecture remains unchanged, no new load-balancing logic, service discovery mechanisms, or inter-node coordination is required.
Speed of Relief: Scaling up provides immediate access to additional compute resources. A larger instance can absorb higher concurrency levels without waiting for new servers to be provisioned or warmed up.
Data Simplicity: For stateful components such as relational databases, vertical scaling is typically the most straightforward way to increase capacity without introducing data consistency risks or replication complexity.
At peak capacity, a vertically scaled instance can sustain substantial throughput on its own. However, this approach introduces important cost considerations and hard scalability limits.
COST CONSIDERATIONS: PERFORMANCE VS. ECONOMICS
When evaluating vertical scaling, cost analysis must go beyond the hourly price of a larger instance. For burst-heavy workloads, AWS Burstable Performance Instances (the T family) are commonly used; they operate on a CPU credit model.
The Breakeven Threshold: For instances such as t3.large, cost efficiency begins to degrade once sustained average CPU utilization approaches ~42.5%, depending on region, pricing model and workload characteristics.
Unlimited Mode Costs: When Unlimited Mode is enabled, AWS charges for surplus CPU credits (approximately $0.05 per vCPU-hour for Linux). Under sustained high utilization, this can quickly exceed the cost of fixed-performance instances such as M5 or M6i, eroding the economic advantage of burstable instances.
Efficiency Gains: In some cases, upgrading to a more powerful instance class allows computeintensive workloads to complete faster, resulting in fewer total instancehours consumed compared to a smaller, constantly saturated instance.
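The break-even arithmetic above can be sketched in a few lines. The prices below are illustrative on-demand Linux rates (assumptions, not authoritative figures; actual prices vary by region and pricing model), but the structure of the calculation shows why the crossover lands in the low 40s for a t3.large versus an m5.large.

```python
# Sketch: estimating the sustained-CPU break-even point where a burstable
# t3.large in Unlimited Mode stops being cheaper than a fixed-performance
# m5.large. All prices are assumed/illustrative, not authoritative.

T3_LARGE_HOURLY = 0.0832   # $/hr, assumed t3.large on-demand price
M5_LARGE_HOURLY = 0.0960   # $/hr, assumed m5.large on-demand price
SURPLUS_CREDIT = 0.05      # $/vCPU-hour for surplus credits (Linux)
T3_BASELINE_PCT = 30.0     # t3.large baseline utilization per vCPU
T3_VCPUS = 2

def hourly_cost_t3_unlimited(avg_cpu_pct: float) -> float:
    """Instance price plus surplus-credit charges above the baseline."""
    surplus_pct = max(0.0, avg_cpu_pct - T3_BASELINE_PCT)
    # Surplus vCPU-hours consumed per wall-clock hour across all vCPUs.
    surplus_vcpu_hours = (surplus_pct / 100.0) * T3_VCPUS
    return T3_LARGE_HOURLY + surplus_vcpu_hours * SURPLUS_CREDIT

def break_even_utilization() -> float:
    """Average CPU % at which the t3.large matches the m5.large price."""
    headroom = M5_LARGE_HOURLY - T3_LARGE_HOURLY   # $/hr available for credits
    extra_pct = headroom / (SURPLUS_CREDIT * T3_VCPUS) * 100.0
    return T3_BASELINE_PCT + extra_pct

print(f"Break-even sustained CPU: {break_even_utilization():.1f}%")
# → Break-even sustained CPU: 42.8%
```

With these assumed prices the crossover is ~42.8%, consistent with the ~42.5% figure cited above; the exact value shifts with regional pricing.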
PRODUCTION CONSTRAINTS
Despite its speed and simplicity, vertical scaling has clear limitations under extreme burst conditions:
Downtime Risk: Instance resizing often requires a stop-and-start operation. During a live event, even brief downtime can result in significant user drop-off and reputational impact.
The Hardware Ceiling: Vertical scaling is bounded by the largest available instance types, such as u-24tb1.metal (24 TB RAM) or c5d.metal (96 vCPUs). Once these limits are reached, no further scaling is possible without architectural changes.
Vertical scaling is highly effective for immediate relief, but if a burst transitions into sustained demand, cost inefficiencies and hardware ceilings will eventually necessitate scaling out.
2. HORIZONTAL SCALING: DISTRIBUTED CAPACITY
While vertical scaling focuses on strengthening a single node, horizontal scaling, or scaling out, takes a fundamentally different approach. Instead of relying on one powerful machine, the system adds more identical nodes and distributes traffic across them.
This strategy aligns closely with modern cloud-native design principles and is the foundation of high-availability architectures.
EVALUATING THE BURST RESPONSE
Horizontal scaling is designed to handle unpredictable, highvolume traffic spikes that exceed the capacity of any single machine.
Failure Resilience: Because the workload is distributed across multiple instances, the system significantly reduces single points of failure. If one node fails under load, the remaining healthy nodes continue serving traffic.
Elastic Growth: Unlike vertical scaling, which is constrained by hardware limits, horizontal scaling can expand as long as additional instances can be provisioned using Auto Scaling Groups (ASGs). This provides a much higher theoretical ceiling for growth.
BUSINESS PERSPECTIVE: COST AND ELASTICITY
From a financial standpoint, horizontal scaling offers a more granular approach to cost control.
Pay-As-You-Go Efficiency: With EC2 Auto Scaling, capacity can scale in automatically once traffic subsides. This prevents long-term overprovisioning after a short-lived event.
Use of Spot Instances: Distributed fleets can incorporate Spot Instances to reduce cost significantly. Because the workload is spread across many nodes, occasional interruptions have minimal impact when properly architected.
Operational Overhead: Horizontal scaling introduces additional complexity. Load balancers, health checks, networking, and distributed monitoring require more engineering effort than single-instance deployments.
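The Spot Instance approach described above is usually expressed as a MixedInstancesPolicy on the Auto Scaling Group: a small on-demand base for stability, with the rest of the fleet filled from Spot pools. The sketch below builds the request parameters as a plain dict (template names and the base size are illustrative assumptions); it maps onto the structure accepted by `boto3.client("autoscaling").create_auto_scaling_group`.

```python
# Sketch: an ASG MixedInstancesPolicy keeping a small on-demand core and
# filling the rest of the fleet with Spot capacity. Template name and the
# on-demand base size are placeholders for illustration.

def mixed_instances_policy() -> dict:
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "demo-day-c5-xlarge",  # hypothetical
                "Version": "$Latest",
            },
            # Allow close substitutes so Spot capacity pools stay deep and
            # occasional interruptions can be absorbed by other types.
            "Overrides": [
                {"InstanceType": "c5.xlarge"},
                {"InstanceType": "c5a.xlarge"},
                {"InstanceType": "m5.xlarge"},
            ],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 2,                 # always-on-demand core
            "OnDemandPercentageAboveBaseCapacity": 0,  # everything else Spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }
```

Setting `OnDemandPercentageAboveBaseCapacity` to 0 is what makes the surge capacity almost entirely Spot, which is where the large cost reduction comes from.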
PRODUCTION CONSTRAINTS
Horizontal scaling introduces its own challenges during sudden bursts:
"Cold-Start" Latency: New instances require time to launch, initialize and pass health checks. If the surge is extremely sudden, existing capacity may be overwhelmed before new instances become available.
Load Balancing Dependency: Effective scaling out requires a properly configured load balancer, adding a small but unavoidable layer of latency and cost.
Database Bottleneck: While application servers can scale horizontally, databases are significantly harder to scale without architectural changes. Without mitigations such as read replicas or connection pooling, scaling out can shift the bottleneck to the data tier.
Horizontal scaling is the preferred strategy for longterm reliability, but its effectiveness during a burst depends on how quickly infrastructure can react.
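The cold-start constraint above can be quantified with a back-of-the-envelope calculation: while new instances boot, the warm fleet alone must absorb whatever the surge adds during that window. All numbers below are hypothetical, purely for illustration.

```python
# Back-of-the-envelope sketch of the "reaction gap": how much standing
# headroom the warm fleet needs so traffic never outruns capacity while
# new instances launch. All figures are hypothetical.

def required_headroom(ramp_rps_per_s: float, boot_seconds: float) -> float:
    """Extra requests/sec that arrive while a new instance is still booting."""
    return ramp_rps_per_s * boot_seconds

def min_warm_capacity(baseline_rps: float, ramp_rps_per_s: float,
                      boot_seconds: float) -> float:
    """Throughput the pre-scaled fleet must sustain before scale-out lands."""
    return baseline_rps + required_headroom(ramp_rps_per_s, boot_seconds)

# Example: baseline 500 req/s, surge ramping at 30 req/s every second,
# and new instances taking 120 s to launch and pass health checks.
print(min_warm_capacity(500, 30, 120))  # → 4100.0
```

This is why the article's hybrid approach front-loads vertical headroom: a 120-second launch window against a steep ramp demands far more standing capacity than the baseline alone suggests.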
3. DIAGONAL SCALING: COMBINING SPEED AND RESILIENCE
In high-pressure scenarios such as live broadcasts, choosing between vertical and horizontal scaling is often a false dilemma. Many production systems rely on a hybrid approach commonly referred to as diagonal scaling.
Diagonal scaling combines the immediate relief of vertical scaling with the resilience and elasticity of horizontal scaling.
THE DIAGONAL WORKFLOW
Immediate Capacity Expansion (Vertical): Prior to or at the onset of the event, core instances are upgraded to a more powerful class (e.g., moving from a t3.medium to a c5.xlarge) to create instant headroom and avoid cold-start delays.
Fleet Replication (Horizontal): Once utilization reaches predefined thresholds, such as the ~42.5% break-even point, Auto Scaling replicates the optimized instance across multiple Availability Zones to absorb continued growth.
This approach provides rapid response while maintaining a high ceiling for sustained demand.
EXAMPLE CONFIGURATION (SIMPLIFIED)
Base instance: c5.xlarge, right-sized for sustained efficiency.
Auto Scaling Group: minimum 2, maximum 20, across multiple Availability Zones (AZs).
Scaling trigger: CPU utilization > 60% for 2 minutes.
Database protection: RDS Proxy enabled prior to scale-out.
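The configuration above can be expressed as boto3 request parameters. The sketch below builds them as plain dicts (the ASG name, launch template, and subnet IDs are placeholders); the "CPU > 60%" trigger is approximated here with a target-tracking policy at a 60% target, a common way to configure this in practice. Applying them would look like `boto3.client("autoscaling").create_auto_scaling_group(**asg_params())` followed by `put_scaling_policy(**cpu_policy_params())`, given AWS credentials.

```python
# Sketch: the simplified configuration as boto3 request parameters.
# ASG name, launch template, and subnet IDs are hypothetical placeholders.

def asg_params() -> dict:
    """Parameters for create_auto_scaling_group: min 2, max 20, multi-AZ."""
    return {
        "AutoScalingGroupName": "demo-day-web",               # placeholder
        "MinSize": 2,
        "MaxSize": 20,
        "LaunchTemplate": {"LaunchTemplateName": "demo-day-c5-xlarge"},
        # Spread across multiple AZs via their subnets (placeholder IDs).
        "VPCZoneIdentifier": "subnet-aaa,subnet-bbb,subnet-ccc",
    }

def cpu_policy_params() -> dict:
    """Parameters for put_scaling_policy: track 60% average fleet CPU."""
    return {
        "AutoScalingGroupName": "demo-day-web",
        "PolicyName": "cpu-60-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 60.0,
        },
    }
```

Target tracking also handles the scale-in side automatically once traffic subsides, which is what delivers the pay-as-you-go behavior described earlier.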
PROTECTING THE DATA TIER
Even the most resilient application tier can fail if the database is overwhelmed during a connection surge.
Amazon RDS Proxy: RDS Proxy maintains a pool of established connections, shielding the database from connection storms as new application instances spin up.
Edge Caching: Using services such as Amazon CloudFront to cache static and frequently accessed content can offload a significant portion of traffic from the application entirely.
These measures ensure that scaling strategies focus on dynamic workloads rather than unnecessary database pressure.
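The connection-multiplexing idea behind RDS Proxy can be illustrated with a toy pool: many application workers share a small, fixed set of database connections instead of each opening its own. This is a concept demo only; `FakeConnection` stands in for a real database driver, and RDS Proxy's actual implementation is far more sophisticated.

```python
# Concept sketch of connection multiplexing (the idea behind RDS Proxy):
# 100 concurrent "application instances" share 5 physical connections.
import queue
import threading

class FakeConnection:
    """Stand-in for a real DB connection object."""
    count = 0  # total connections ever created (set up single-threaded)
    def __init__(self):
        FakeConnection.count += 1
        self.id = FakeConnection.count

class ConnectionPool:
    def __init__(self, size: int):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(FakeConnection())

    def query(self, sql: str) -> int:
        conn = self._pool.get()      # blocks if the pool is exhausted
        try:
            return conn.id           # pretend to run the query
        finally:
            self._pool.put(conn)     # connection is reused, never closed

pool = ConnectionPool(size=5)

def worker():
    pool.query("SELECT 1")

# A "connection storm": 100 workers hammer the pool concurrently...
threads = [threading.Thread(target=worker) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# ...yet only 5 physical connections were ever created.
print(FakeConnection.count)  # → 5
```

The database sees a constant, bounded connection count no matter how fast the application fleet scales out, which is exactly the property the data tier needs during a surge.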
4. COMPARATIVE ANALYSIS: EVALUATING SCALING TRADEOFFS
The team evaluated these strategies against three core drivers: Latency, Resilience, and Complexity. While vertical scaling addresses latency through immediate power, horizontal scaling provides long-term resilience. The primary technical challenge identified was the "Reaction Gap": the time between a traffic spike and the infrastructure's ability to respond.
SCALING STRATEGY COMPARISON

| Strategy   | Speed of Relief                 | Resilience                                  | Complexity |
|------------|---------------------------------|---------------------------------------------|------------|
| Vertical   | Immediate (no re-architecture)  | Limited (single node, hardware ceiling)     | Low        |
| Horizontal | Delayed by cold-start latency   | High (multi-node, multi-AZ)                 | High       |
| Diagonal   | Immediate, then elastic         | High once the fleet replicates              | Moderate   |
5. THE STRATEGIC CHOICE
For the "Final Demo Day" event, Roulette Investment explicitly chose a Diagonal Scaling strategy. This decision was driven by the need for zero downtime during the broadcast and a lean engineering team that could not afford to manage complex manual interventions at the height of the surge.
How the Problem Was Solved
Phase 1
Proactive Vertical Headroom: Thirty minutes before the live stream, the core application nodes were vertically scaled from t3.medium to c5.xlarge instances. This provided immediate "compute insurance" against the initial wave of traffic, bypassing the cold-start latency of horizontal scaling.
Phase 2
Automated Horizontal Elasticity: An Auto Scaling Group (ASG) was configured with a minimum of 4 and a maximum of 20 instances, triggered when CPU utilization exceeded 60%. Once the vertical nodes reached their efficiency limit (the ~42.5% break-even point), the system automatically replicated the optimized fleet to absorb the millions of simultaneous hits.
Phase 3
Database Decoupling: To prevent a Database Bottleneck, Amazon RDS Proxy was enabled. This shielded the database from "connection storms" as the application fleet expanded, allowing the data tier to remain stable while the web tier grew.
Why This Balance Worked:
Operational Clarity: The team had a clear "preflight" checklist, reducing the risk of human error during the live event.
Cost Control: By using Spot Instances for the horizontal fleet, the company reduced its surge-related compute costs by nearly 70% compared to a purely on-demand vertical strategy.
User Experience: The combined approach resulted in zero downtime and consistent request latency throughout the 10× surge.
Engineering is the art of choosing the right tradeoffs at the right time. For Roulette Investment, the "rightsized" architecture was not the most complex one, but the one that guaranteed a successful launch under the most intense scrutiny.
When the spotlight is on, infrastructure should be the last thing engineers have to worry about.
Scalability
High Availability
Reliability Engineering