From $1,670 to $1,031/month on AWS in 6 weeks: the correct sequence of index → rightsizing → Reserved Instance
Published on May 4, 2026
The context: a payment application on AWS
A payment platform running on AWS us-east-1. Stack: EC2 Auto Scaling Group with c6g.2xlarge ARM instances, RDS MariaDB 10.11 on db.r6g.2xlarge, ALB, WAF, EFS, ElastiCache. February 2026 billing: $1,670/month.
The account had obvious optimization room — but the sequence of actions mattered more than the actions themselves. This post documents the 3 phases, the validation gates between them, and what happens when you skip steps.
The central lesson: buying a Reserved Instance before confirming the right RDS size locks capital into the wrong size. In our case, that would have meant a 1-year commitment of $330/month ($3,960/year) to a db.r6g.2xlarge RI for an instance that was downsized to xlarge 3 weeks later.
Phase 1 (weeks 1-2): indexes — CPU from 94% to 26%, zero cost
The orders table was the most queried in the application — reseller reports, tracking cron jobs, data exports — but had no supporting indexes. The index creation script left by a developer covered 13 secondary tables and completely omitted the most critical table in the system.
The diagnosis came from query patterns in the code: full scans on the orders table on every report request, every cron cycle (every 15 minutes), every export. RDS CPU peaked at 94.1% — on a db.r6g.2xlarge with 8 vCPUs. This was not capacity pressure, it was unindexed query pressure.
-- Critical missing indexes on the orders table:
CREATE INDEX idx_tx_reseller_status
ON orders (reseller, order_status) ALGORITHM=INPLACE LOCK=NONE;
CREATE INDEX idx_tx_vendor_status
ON orders (vendor, order_status) ALGORITHM=INPLACE LOCK=NONE;
-- Tracking cron: full scan every 15 minutes
CREATE INDEX idx_tx_tracking
ON orders (tracking_flag, tracking_last_updated) ALGORITHM=INPLACE LOCK=NONE;
-- Exports and reports
CREATE INDEX idx_sales_last_event_date
ON orders (last_event_date) ALGORITHM=INPLACE LOCK=NONE;
CREATE INDEX idx_sales_created_at
ON orders (created_at) ALGORITHM=INPLACE LOCK=NONE;
23 indexes were created in production in 3.91 seconds via ALGORITHM=INPLACE LOCK=NONE, with no downtime and no query blocking. Before/after benchmark:
Total time for 15 main queries: 624ms → 153ms (75% reduction)
12 of 13 full table scan queries eliminated
RDS CPU peak (following week): 94.1% → 26.0%
Cost of this phase: zero. No infrastructure change, no Reserved Instance, no resize. Just composite indexes on the right queries.
The composite index (reseller, order_status) is 43x faster than the equivalent full scan for report queries. This is the gain with the data still fitting in the InnoDB buffer pool, that is, on a relatively small table. As the table grows, the gain increases.
Gate 1: wait one full business week before rightsizing
After creating the indexes, the instruction was clear: do nothing for a week. Monitor the actual RDS behavior with the new indexes in production during a full business week (Monday to Friday) before any resize decision.
Why a full week? Because the application's load pattern was irregular — burst traffic spikes from social media campaigns (TikTok, Instagram). A single quiet Tuesday didn't represent the real load. A full week's data showed the absolute peak and the real distribution of spikes.
# CloudWatch metrics collection (week after indexes: 10-14/Mar)
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=myapp-db \
  --start-time 2026-03-10T00:00:00Z \
  --end-time 2026-03-15T00:00:00Z \
  --period 86400 \
  --statistics Average Maximum
# Result week after indexes:
# Average CPU (business days): 2.27%
# Average of peaks: 13.6%
# Absolute peak: 26.0% (was 94.1%)
Only after confirming these numbers, with 5 business days of data, was the rightsizing decision made.
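The three summary figures above can be derived from the `Datapoints` array that `get-metric-statistics` returns when called with `--period 86400 --statistics Average Maximum`. The following is a minimal Python sketch, assuming the JSON output has already been parsed into a list of dicts. The function name `summarize_week` and the sample values are illustrative assumptions, not the production data.

```python
from statistics import mean


def summarize_week(datapoints):
    """Summarize daily CloudWatch CPU datapoints (one dict per day).

    Each dict is assumed to carry the "Average" and "Maximum" keys that
    get-metric-statistics returns for --statistics Average Maximum.
    """
    if not datapoints:
        raise ValueError("no datapoints to summarize")
    return {
        "average_cpu": round(mean(d["Average"] for d in datapoints), 2),
        "average_of_peaks": round(mean(d["Maximum"] for d in datapoints), 2),
        "absolute_peak": max(d["Maximum"] for d in datapoints),
    }


# Hypothetical sample: five business days, values invented for illustration.
sample = [
    {"Average": 2.0, "Maximum": 10.0},
    {"Average": 3.0, "Maximum": 12.0},
    {"Average": 2.5, "Maximum": 26.0},
    {"Average": 1.5, "Maximum": 8.0},
    {"Average": 2.0, "Maximum": 14.0},
]
summary = summarize_week(sample)
print(summary)
```

Keeping the average of daily peaks separate from the absolute peak matters for bursty traffic: the first describes a typical day, the second is the number the resize has to survive.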
Phase 2 (week 3): RDS rightsizing — db.r6g.2xlarge → xlarge
With the absolute peak at 26% on the db.r6g.2xlarge (8 vCPU), the projection for the db.r6g.xlarge (4 vCPU) was clear: percentages would double, projected maximum peak at ~52%. Acceptable. Memory: 6.6 GB in use out of 64 GB — on the xlarge with 32 GB, ~25 GB would remain free.
# Schedule rightsizing for maintenance window
aws rds modify-db-instance --db-instance-identifier myapp-db --db-instance-class db.r6g.xlarge --no-apply-immediately
# Check pending modification
aws rds describe-db-instances --db-instance-identifier myapp-db --query 'DBInstances[0].PendingModifiedValues'
# Returns: { "DBInstanceClass": "db.r6g.xlarge" }
# Scheduled maintenance window: Mar 15 01:00-02:00 UTC
# Estimated downtime: 10-15 minutes
Rightsizing was applied automatically during the maintenance window. Immediate savings: $373/month (from $747 to $374 On-Demand).
Why not rightsize to db.r6g.large (2 vCPU)? The projection showed peaks of ~82% — no margin for new unindexed queries, no margin for burst traffic. The additional savings of $213/month did not justify the risk of a production incident.
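The projection logic in this phase is simple proportional scaling, sketched below in Python. It assumes CPU percentage scales linearly with the inverse of the vCPU count, which is a simplification; the function names are hypothetical. Applying the same rule to the 40.9% peak later observed on the xlarge gives roughly the ~82% quoted for db.r6g.large.

```python
def project_peak_cpu(observed_peak_pct, current_vcpus, target_vcpus):
    """Project peak CPU % on a target size.

    Assumption: the same workload spread over fewer vCPUs raises the
    utilization percentage proportionally (linear scaling).
    """
    if current_vcpus <= 0 or target_vcpus <= 0:
        raise ValueError("vCPU counts must be positive")
    return round(observed_peak_pct * current_vcpus / target_vcpus, 1)


def free_memory_gb(used_gb, target_total_gb):
    """Memory left on the target size if current usage stays the same."""
    return round(target_total_gb - used_gb, 1)


# db.r6g.2xlarge (8 vCPU, 64 GB) -> db.r6g.xlarge (4 vCPU, 32 GB)
xlarge_peak = project_peak_cpu(26.0, current_vcpus=8, target_vcpus=4)
xlarge_free = free_memory_gb(6.6, 32)

# db.r6g.xlarge (4 vCPU) -> db.r6g.large (2 vCPU), from the observed xlarge peak
large_peak = project_peak_cpu(40.9, current_vcpus=4, target_vcpus=2)

print(xlarge_peak, xlarge_free, large_peak)  # 52.0 25.4 81.8
```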
Gate 2: wait one week on xlarge before buying RI
After rightsizing, again: do nothing for a week. The reason is simple: you don't reserve capacity before confirming that the capacity is the right size.
A 1-year Reserved Instance is a commitment of $181/month regardless of usage. If the peaks had been higher than projected, or if a new feature had arrived with unindexed queries causing spikes above 80%, the next step would be another rightsizing — and the RI would be locked into the wrong size.
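One way to put a number on this risk is a break-even calculation: how many months the instance must stay at the reserved size before the commitment beats On-Demand. The sketch below uses the $181/month RI and $374/month On-Demand figures from this post, and assumes the reservation goes unused after a resize; `ri_break_even_months` is a hypothetical helper.

```python
def ri_break_even_months(ri_monthly, on_demand_monthly, term_months=12):
    """Months of On-Demand usage whose cost equals the full RI commitment.

    If the instance keeps this size for fewer months than the result,
    paying On-Demand would have been cheaper than the reservation.
    """
    if on_demand_monthly <= 0:
        raise ValueError("on_demand_monthly must be positive")
    total_commitment = ri_monthly * term_months
    return total_commitment / on_demand_monthly


months = ri_break_even_months(ri_monthly=181, on_demand_monthly=374)
print(f"{months:.1f}")  # 5.8
```

Under these assumptions, a resize within the first six months or so would turn the discount into a loss, which is why the size is validated first.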
Data from the first business week on xlarge (17-21/Mar):
Average CPU: 4.24% (projection was ~4.5%)
Absolute peak: 40.9% (projection was ~52%)
Memory: 30 GB free out of 32 GB
Incidents: zero
With these numbers confirmed, the decision to buy RI was made safely.
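The gate can be expressed as an explicit check rather than a judgment call. The criteria below are an assumption for illustration (zero incidents, observed peak no higher than the projection, and peak under the 80% mark mentioned above); they are not a rule stated by AWS, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WeekMetrics:
    average_cpu: float  # percent
    peak_cpu: float     # percent
    incidents: int


def gate_passed(observed, projected_peak, peak_limit=80.0):
    """Return True when the observed week supports locking in the size."""
    return (
        observed.incidents == 0
        and observed.peak_cpu <= projected_peak
        and observed.peak_cpu < peak_limit
    )


# First business week on db.r6g.xlarge, figures from the list above.
xlarge_week = WeekMetrics(average_cpu=4.24, peak_cpu=40.9, incidents=0)
print(gate_passed(xlarge_week, projected_peak=52.0))  # True
```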
Phase 3 (week 4): Reserved Instances — RDS + EC2
With the RDS size confirmed, the Reserved Instance simulation:
# RI simulation for RDS (via AWS CLI)
aws rds describe-reserved-db-instances-offerings \
  --db-instance-class db.r6g.xlarge \
  --product-description "mariadb" \
  --no-multi-az \
  --region us-east-1 \
  --query 'ReservedDBInstancesOfferings[?OfferingType==`No Upfront`].[Duration,FixedPrice,RecurringCharges]'
# 1-year No Upfront RI result:
# $0/upfront + $0.248/h = $181/month
# vs On-Demand: $0.519/h = $374/month
# Savings: $193/month ($2,316/year)
For EC2, the strategy differed from the obvious choice. The ASG ran c6g.2xlarge instances, so the direct option would be to reserve 1x c6g.2xlarge. The choice was 2x c6g.xlarge with Scope=Region.
The AWS Normalized Units trick
EC2 Reserved Instances with Scope=Region work with a Normalized Units system. Each instance size has a normalization weight:
xlarge = 8 normalized units
2xlarge = 16 normalized units
2x c6g.xlarge = 16 normalized units = exact equivalent of 1x c6g.2xlarge. With Scope=Region, these 2 RIs automatically cover the c6g.2xlarge ASG instance — no manual association needed, no instance replacement.
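Why reserve xlarge instead of 2xlarge? Future flexibility. If the ASG rightsizes to c6g.xlarge in the future, the 2 RIs map one-to-one onto 2 whole instances. Billing coverage would be equivalent with a single regional 2xlarge RI, because size flexibility also applies from larger to smaller sizes within the same family; holding the reservation as two xlarge units simply keeps the commitment in smaller pieces that match the likely future instance size.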
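The normalized-units arithmetic can be checked with a few lines of Python. This is a sketch: the factor table lists only four sizes, the helper names are hypothetical, and it assumes all instances belong to the same family and region so that size flexibility applies.

```python
# AWS normalization factors for a subset of instance sizes.
NORMALIZATION_FACTOR = {"large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32}


def units(size, count=1):
    """Normalized units for `count` instances (or RIs) of one size."""
    return NORMALIZATION_FACTOR[size] * count


def coverage(reserved, running):
    """Fraction of running capacity covered by regional RIs.

    Both arguments map size -> count and are assumed to be in the
    same instance family (for example c6g).
    """
    reserved_units = sum(units(s, n) for s, n in reserved.items())
    running_units = sum(units(s, n) for s, n in running.items())
    if running_units == 0:
        return 0.0
    return min(1.0, reserved_units / running_units)


reserved = {"xlarge": 2}
print(units("xlarge", 2) == units("2xlarge"))  # True
print(coverage(reserved, {"2xlarge": 1}))      # 1.0
print(coverage(reserved, {"2xlarge": 2}))      # 0.5
```

The last line shows what happens when the ASG scales out to a second instance: only half of the running capacity is covered, and the remainder is billed On-Demand.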
# Final RIs purchased (2026-03-22):
# RDS: db.r6g.xlarge, 1yr No Upfront
# $0/upfront, $0.248/h, $181/month — savings $193/month
#
# EC2: 2x c6g.xlarge, Scope=Region, 1yr No Upfront
# $0/upfront, $0.0857/h x2, $125/month — savings $73/month
# (covers 1x c6g.2xlarge via 16 normalized units)
The result: $1,670 → $1,031/month in 6 weeks
Consolidating the three phases:
Phase 1 — Indexes (weeks 1-2): CPU 94.1% → 26.0%. Cost: $0.
Phase 2 — RDS Rightsizing (week 3): $747 → $374/month On-Demand. Savings: $373/month.
Phase 3 — Reserved Instances (week 4): RDS $374 → $181, EC2 $199 → $125. Additional savings: $266/month.
Total: $639/month saved. $7,668/year.
# Billing comparison (Feb vs post-optimization):
# Resource             | Feb/2026 | Post-optimization | Reduction
# ---------------------|----------|-------------------|----------
# RDS (compute)        | $747     | $181 (RI)         | 76%
# EC2 Compute (1 inst) | $199     | $125 (RI)         | 37%
# Other                | ~$724    | ~$725             | ~0%
# Total                | ~$1,670  | ~$1,031           | 38%
Cost dropped 38%. RDS was the biggest contributor: a 76% reduction from combining rightsizing and RI. EC2 had a smaller reduction because the ASG still uses On-Demand for extra instances above the first; the RI covers only the base instance that runs 24/7.
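The totals and percentages in the comparison can be reproduced from the per-resource figures. A short Python sketch, using the rounded monthly amounts from the table (the "Other" line is approximate in the source, so the totals are approximate as well):

```python
# Monthly cost per resource in USD, taken from the billing comparison.
before = {"RDS": 747, "EC2": 199, "Other": 724}
after = {"RDS": 181, "EC2": 125, "Other": 725}


def reduction_pct(old, new):
    """Percentage reduction from old to new, rounded to a whole percent."""
    return round((old - new) / old * 100)


total_before = sum(before.values())
total_after = sum(after.values())
monthly_savings = total_before - total_after
annual_savings = monthly_savings * 12

print(total_before, total_after)                    # 1670 1031
print(monthly_savings, annual_savings)              # 639 7668
print(reduction_pct(before["RDS"], after["RDS"]))   # 76
print(reduction_pct(before["EC2"], after["EC2"]))   # 37
print(reduction_pct(total_before, total_after))     # 38
```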
What would have happened if we bought the RI first
This hypothetical scenario matters. If the Reserved Instance had been purchased in February, before the other phases:
RI for db.r6g.2xlarge 1-year No Upfront: $0.454/h = $330/month — savings of $417/month vs On-Demand.
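Looks good, until you realize that 3 weeks later the RDS would be downsized to xlarge. At that point, the 2xlarge RI keeps charging $330/month whether or not it is fully used. RDS reservations for MariaDB are size-flexible within an instance class family, so the xlarge would draw on the 2xlarge RI instead of generating a separate On-Demand charge, but it can consume only half of it: $330/month for capacity that a correctly sized RI provides for $181/month.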
The real outcome: you'd pay two and a half months of RI at the wrong size before being able to make the right decision — or stay locked in the 2xlarge out of fear of 'wasting' the RI. In both scenarios, the financial outcome is worse than following the correct sequence.
The practical rule: only buy Reserved Instance after confirming 2 weeks of stable metrics at the target size. Not before.
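The cost of the "stay locked in" branch can be estimated from the two RI prices quoted in this post. The sketch below compares the $330/month 2xlarge reservation against the $181/month xlarge reservation that the correct sequence ends with; `lock_in_cost` is a hypothetical helper, and the comparison ignores any resale or exchange options.

```python
def lock_in_cost(wrong_ri_monthly, right_ri_monthly, months_remaining):
    """Extra spend from holding the oversized reservation instead of
    the right-sized one for the given number of months."""
    if months_remaining < 0:
        raise ValueError("months_remaining must be non-negative")
    return (wrong_ri_monthly - right_ri_monthly) * months_remaining


extra_per_month = lock_in_cost(330, 181, months_remaining=1)
extra_per_year = lock_in_cost(330, 181, months_remaining=12)
print(extra_per_month, extra_per_year)  # 149 1788
```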
The sequence as methodology
The three phases weren't sequential by accident. Each one enables the next:
Indexes first: reveal the real required capacity. Without indexes, the server appears to need more CPU than it actually does.
Rightsizing second: with real load known, it is safe to reduce the instance.
RI third: with the size confirmed, it is safe to lock in the discount for 1 year.
Reversing any pair in this sequence creates one of three problems:
RI before rightsizing: RI locked into the wrong size.
Rightsizing before indexes: smaller instance under inefficient query load → incident.
RI before indexes: discount on an oversized instance that will be downsized.
Infrastructure optimization is not a list of actions — it is a sequence with validation gates. The gate is the data that confirms the next action is safe. Without the gate, each action is a risk; with the gate, each action is a logical consequence of the previous one.
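The idea of a sequence with validation gates can be sketched as a small runner: each phase performs its action, then its gate decides whether the next phase may start. This is an illustration only; `run_sequence`, the lambda actions, and the 80% threshold used in the gates are assumptions, not tooling used in the project.

```python
def run_sequence(phases):
    """Run (name, action, gate) triples in order.

    Stops at the first gate that fails, so later actions never run.
    Returns the names of the phases whose gate passed.
    """
    completed = []
    for name, action, gate in phases:
        action()
        if not gate():
            break
        completed.append(name)
    return completed


log = []
phases = [
    ("indexes", lambda: log.append("create indexes"), lambda: 26.0 < 80.0),
    ("rightsizing", lambda: log.append("resize to xlarge"), lambda: 40.9 < 80.0),
    ("reserved_instance", lambda: log.append("buy RI"), lambda: True),
]
print(run_sequence(phases))  # ['indexes', 'rightsizing', 'reserved_instance']

# If the rightsizing gate fails, the RI purchase is never attempted.
blocked_log = []
blocked = [
    ("indexes", lambda: blocked_log.append("create indexes"), lambda: True),
    ("rightsizing", lambda: blocked_log.append("resize to xlarge"), lambda: False),
    ("reserved_instance", lambda: blocked_log.append("buy RI"), lambda: True),
]
print(run_sequence(blocked))  # ['indexes']
print(blocked_log)            # ['create indexes', 'resize to xlarge']
```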
The indexes we created and the CPU metrics they generated are documented in detail in a separate post about the technical RDS diagnosis. This post focuses on the strategy and sequence — the 'how to do this at your company' playbook — not the technical detail of each index.
