
From $1,670 to $1,031/month on AWS in 6 weeks: the correct sequence of index → rightsizing → Reserved Instance

Published on May 4, 2026

The context: a payment application on AWS

A payment platform running on AWS us-east-1. Stack: EC2 Auto Scaling Group with c6g.2xlarge ARM instances, RDS MariaDB 10.11 on db.r6g.2xlarge, ALB, WAF, EFS, ElastiCache. February 2026 billing: $1,670/month.

The account had obvious optimization room — but the sequence of actions mattered more than the actions themselves. This post documents the 3 phases, the validation gates between them, and what happens when you skip steps.

The central lesson: buying a Reserved Instance before confirming the right RDS size locks capital into the wrong size. In our case, that would have meant a 1-year commitment of about $330/month ($3,960/year) to a db.r6g.2xlarge that was downsized to xlarge 3 weeks later.

Phase 1 (weeks 1-2): indexes — CPU from 94% to 26%, zero cost

The orders table was the most queried in the application — reseller reports, tracking cron jobs, data exports — but had no supporting indexes. The index creation script left by a developer covered 13 secondary tables and completely omitted the most critical table in the system.

The diagnosis came from query patterns in the code: full scans on the orders table on every report request, every cron cycle (every 15 minutes), every export. RDS CPU peaked at 94.1% — on a db.r6g.2xlarge with 8 vCPUs. This was not capacity pressure, it was unindexed query pressure.

-- Critical missing indexes on the orders table:
CREATE INDEX idx_tx_reseller_status
  ON orders (reseller, order_status) ALGORITHM=INPLACE LOCK=NONE;

CREATE INDEX idx_tx_vendor_status
  ON orders (vendor, order_status) ALGORITHM=INPLACE LOCK=NONE;

-- Tracking cron: full scan every 15 minutes
CREATE INDEX idx_tx_tracking
  ON orders (tracking_flag, tracking_last_updated) ALGORITHM=INPLACE LOCK=NONE;

-- Exports and reports
CREATE INDEX idx_sales_last_event_date
  ON orders (last_event_date) ALGORITHM=INPLACE LOCK=NONE;

CREATE INDEX idx_sales_created_at
  ON orders (created_at) ALGORITHM=INPLACE LOCK=NONE;

23 indexes created in production in 3.91 seconds via ALGORITHM=INPLACE LOCK=NONE — no downtime, no query blocking. Before/after benchmark:

Total time for 15 main queries: 624ms → 153ms (75% reduction)

12 of 13 full table scan queries eliminated

RDS CPU peak (following week): 94.1% → 26.0%

Cost of this phase: zero. No infrastructure change, no Reserved Instance, no resize. Just composite indexes on the right queries.

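The composite index (reseller, order_status) is 43x faster than the equivalent full scan for report queries. That figure was measured while the table was still small enough to fit in the InnoDB buffer pool. As the table grows, the gain increases, because the cost of a full scan grows with the number of rows while an index lookup reads only the matching entries.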

Gate 1: wait one full business week before rightsizing

After creating the indexes, the instruction was clear: do nothing for a week. Monitor the actual RDS behavior with the new indexes in production during a full business week (Monday to Friday) before any resize decision.

Why a full week? Because the application's load pattern was irregular — burst traffic spikes from social media campaigns (TikTok, Instagram). A single quiet Tuesday didn't represent the real load. A full week's data showed the absolute peak and the real distribution of spikes.

# CloudWatch metrics collection (week after indexes: 10-14/Mar)
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=myapp-db \
  --start-time 2026-03-10T00:00:00Z \
  --end-time 2026-03-15T00:00:00Z \
  --period 86400 \
  --statistics Average Maximum

# Result week after indexes:
# Average CPU (business days): 2.27%
# Average of peaks:            13.6%
# Absolute peak:               26.0% (was 94.1%)

Only after confirming these numbers — with 5 business days of data — was the rightsizing decision made.
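The three figures above can be derived from the raw CloudWatch output. The following is a minimal sketch, assuming the JSON `Datapoints` list returned by the command has been loaded into Python; the function name `summarize_cpu` and the sample values are hypothetical and are not the measurements from this post.

```python
from statistics import mean


def summarize_cpu(datapoints):
    """Summarize daily CPUUtilization datapoints (one per day at --period 86400)."""
    if not datapoints:
        raise ValueError("no datapoints returned for the requested window")
    averages = [dp["Average"] for dp in datapoints]
    peaks = [dp["Maximum"] for dp in datapoints]
    return {
        "days": len(datapoints),
        "average_cpu": round(mean(averages), 2),
        "average_of_peaks": round(mean(peaks), 2),
        "absolute_peak": max(peaks),
    }


# Hypothetical values, for illustration only (not the data from this post).
sample = [
    {"Average": 2.0, "Maximum": 10.0},
    {"Average": 3.0, "Maximum": 20.0},
    {"Average": 1.0, "Maximum": 30.0},
]
summary = summarize_cpu(sample)
print(summary)
```

The absolute peak, not the average, is the number that drives the resize decision, which is why it is reported separately from the average of daily peaks.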

Phase 2 (week 3): RDS rightsizing — db.r6g.2xlarge → xlarge

With the absolute peak at 26% on the db.r6g.2xlarge (8 vCPU), the projection for the db.r6g.xlarge (4 vCPU) was clear: percentages would double, projected maximum peak at ~52%. Acceptable. Memory: 6.6 GB in use out of 64 GB — on the xlarge with 32 GB, ~25 GB would remain free.
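The projection is simple enough to write down. This sketch assumes CPU load scales linearly with vCPU count, which is the same simplification used in the paragraph above; the function names are illustrative.

```python
def project_peak_cpu(peak_pct, current_vcpus, target_vcpus):
    """Linear projection: the same work spread over a different number of vCPUs."""
    return peak_pct * current_vcpus / target_vcpus


def projected_free_memory_gb(used_gb, target_total_gb):
    """Memory left on the target instance if usage stays the same."""
    return target_total_gb - used_gb


# db.r6g.2xlarge (8 vCPU, 64 GB) -> db.r6g.xlarge (4 vCPU, 32 GB)
xlarge_peak = project_peak_cpu(26.0, current_vcpus=8, target_vcpus=4)
xlarge_free = projected_free_memory_gb(6.6, target_total_gb=32)

print(f"projected peak on xlarge: {xlarge_peak:.0f}%")
print(f"projected free memory on xlarge: {xlarge_free:.1f} GB")
```

Linear scaling tends to be a conservative first estimate; the measured peak after the resize is what confirms or rejects it.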

# Schedule rightsizing for maintenance window
aws rds modify-db-instance \
  --db-instance-identifier myapp-db \
  --db-instance-class db.r6g.xlarge \
  --no-apply-immediately

# Check pending modification
aws rds describe-db-instances \
  --db-instance-identifier myapp-db \
  --query 'DBInstances[0].PendingModifiedValues'
# Returns: { "DBInstanceClass": "db.r6g.xlarge" }

# Scheduled maintenance window: Mar 15 01:00-02:00 UTC
# Estimated downtime: 10-15 minutes

Rightsizing was applied automatically during the maintenance window. Immediate savings: $373/month (from $747 to $374 On-Demand).

Why not rightsize to db.r6g.large (2 vCPU)? The projection showed peaks of ~82% (roughly double the 40.9% peak later measured on the xlarge), leaving no margin for new unindexed queries or for burst traffic. The additional savings of $213/month did not justify the risk of a production incident.

Gate 2: wait one week on xlarge before buying RI

After rightsizing, again: do nothing for a week. The reason is simple: you don't reserve capacity before confirming that the capacity is the right size.

A 1-year Reserved Instance is a commitment of $181/month regardless of usage. If the peaks had been higher than projected, or if a new feature had arrived with unindexed queries causing spikes above 80%, the next step would be another rightsizing — and the RI would be locked into the wrong size.

Data from the first business week on xlarge (17-21/Mar):

Average CPU: 4.24% (projection was ~4.5%)

Absolute peak: 40.9% (projection was ~52%)

Memory: 30 GB free out of 32 GB

Incidents: zero

With these numbers confirmed, the decision to buy RI was made safely.
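A gate like this can be expressed as a small check. The sketch below is only an illustration: the function name `gate_passed` is hypothetical, and the 80% ceiling is an assumption borrowed from the "spikes above 80%" scenario mentioned earlier, not a threshold the team documented.

```python
def gate_passed(business_days, absolute_peak_pct, incidents,
                min_days=5, max_peak_pct=80.0):
    """Return True only if the observation window is complete and healthy.

    min_days and max_peak_pct are assumed defaults, not values from the post.
    """
    return (
        business_days >= min_days
        and absolute_peak_pct < max_peak_pct
        and incidents == 0
    )


# First business week on xlarge, using the figures reported above.
ok = gate_passed(business_days=5, absolute_peak_pct=40.9, incidents=0)
print("safe to buy RI:", ok)
```

The point of writing it down is that the decision becomes a function of data rather than of confidence: an incomplete week or a single incident keeps the gate closed.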

Phase 3 (week 4): Reserved Instances — RDS + EC2

With the RDS size confirmed, the Reserved Instance simulation:

# RI simulation for RDS (via AWS CLI)
aws rds describe-reserved-db-instances-offerings \
  --db-instance-class db.r6g.xlarge \
  --product-description "mariadb" \
  --no-multi-az \
  --region us-east-1 \
  --query "ReservedDBInstancesOfferings[?OfferingType=='No Upfront'].[Duration,FixedPrice,RecurringCharges]"

# 1-year No Upfront RI result:
# $0/upfront + $0.248/h = $181/month
# vs On-Demand: $0.519/h = $374/month
# Savings: $193/month ($2,316/year)

For EC2, the strategy differed from the obvious choice. The ASG ran c6g.2xlarge instances — the direct option would be to reserve 1x c6g.2xlarge. The choice was 2x c6g.xlarge with Scope=Region.

The AWS Normalized Units trick

EC2 Reserved Instances with Scope=Region work with a Normalized Units system. Each instance size has a normalization weight:

xlarge = 8 normalized units

2xlarge = 16 normalized units

2x c6g.xlarge = 16 normalized units = exact equivalent of 1x c6g.2xlarge. With Scope=Region, these 2 RIs automatically cover the c6g.2xlarge ASG instance — no manual association needed, no instance replacement.
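The equivalence can be checked with a few lines of arithmetic. This is a sketch of the accounting only, using a subset of the published normalization factors; the helper name `units` is illustrative, and it assumes all instances belong to the same family, since size flexibility does not cross families.

```python
# Subset of the EC2 normalization factors (per instance size).
NORMALIZATION_FACTOR = {
    "large": 4,
    "xlarge": 8,
    "2xlarge": 16,
    "4xlarge": 32,
}


def units(instance_type, count=1):
    """Normalized units for `count` instances of a type such as 'c6g.xlarge'."""
    size = instance_type.split(".")[-1]
    return NORMALIZATION_FACTOR[size] * count


reserved = units("c6g.xlarge", count=2)   # the two RIs purchased
running = units("c6g.2xlarge", count=1)   # the ASG base instance
coverage = min(reserved / running, 1.0)

print(f"reserved={reserved} running={running} coverage={coverage:.0%}")
```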

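Why reserve xlarge instead of 2xlarge? Future flexibility. If the ASG rightsizes to c6g.xlarge in the future, the 2 RIs map one-to-one onto 2 whole instances. Note that with Scope=Region, size flexibility works in both directions, so a single c6g.2xlarge RI would also cover two c6g.xlarge instances; the coverage is equivalent, and the smaller reservations are simply the more granular unit to plan around.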

# Final RIs purchased (2026-03-22):
# RDS: db.r6g.xlarge, 1yr No Upfront
#   $0/upfront, $0.248/h, $181/month — savings $193/month
#
# EC2: 2x c6g.xlarge, Scope=Region, 1yr No Upfront
#   $0/upfront, $0.0857/h x2, $125/month — savings $73/month
#   (covers 1x c6g.2xlarge via 16 normalized units)

The result: $1,670 → $1,031/month in 6 weeks

Consolidating the three phases:

Phase 1 — Indexes (weeks 1-2): CPU 94.1% → 26.0%. Cost: $0.

Phase 2 — RDS Rightsizing (week 3): $747 → $374/month On-Demand. Savings: $373/month.

Phase 3 — Reserved Instances (week 4): RDS $374 → $181, EC2 $199 → $125. Additional savings: $266/month.

Total: $639/month saved. $7,668/year.
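The totals can be verified from the per-phase figures. This sketch uses only the monthly numbers reported in this post; the variable names are illustrative.

```python
feb_bill = 1670

phase2_savings = 747 - 374   # RDS rightsizing, On-Demand to On-Demand
rds_ri_savings = 374 - 181   # RDS Reserved Instance
ec2_ri_savings = 73          # as reported; the rounded figures ($199 -> $125) differ by $1

phase3_savings = rds_ri_savings + ec2_ri_savings
total_monthly = phase2_savings + phase3_savings
total_annual = total_monthly * 12
reduction_pct = 100 * total_monthly / feb_bill

print(f"phase 3: ${phase3_savings}/month")
print(f"total:   ${total_monthly}/month, ${total_annual}/year ({reduction_pct:.0f}%)")
print(f"new bill: ${feb_bill - total_monthly}/month")
```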

# Billing comparison (Feb vs post-optimization):
# Resource             | Feb/2026  | Post-optimization | Reduction
# ---------------------|-----------|-------------------|----------
# RDS (compute)        | $747      | $181 (RI)         | 76%
# EC2 Compute (1 inst) | $199      | $125 (RI)         | 37%
# Other                | ~$724     | ~$725             | ~0%
# Total                | ~$1,670   | ~$1,031           | 38%

Cost dropped 38%. RDS was the biggest contributor: a 76% reduction combining rightsizing + RI. EC2 had a smaller reduction because the ASG still uses On-Demand for extra instances above the first; the RI covers only the base instance that runs 24/7.

What would have happened if we bought the RI first

This hypothetical scenario matters. If the Reserved Instance had been purchased in February, before the other phases:

RI for db.r6g.2xlarge 1-year No Upfront: $0.454/h = $330/month — savings of $417/month vs On-Demand.

Looks good, until you realize that 3 weeks later the RDS would be downsized to xlarge. At that point, the 2xlarge RI keeps charging $330/month. RDS reserved instances for MariaDB are size-flexible, so the reservation would still apply to the xlarge instance, but only half of its normalized units would be in use: you would be paying $330/month for capacity that a correctly sized RI covers for $181/month.

The real outcome: you'd pay two and a half months of RI at the wrong size before being able to make the right decision — or stay locked in the 2xlarge out of fear of 'wasting' the RI. In both scenarios, the financial outcome is worse than following the correct sequence.
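A back-of-the-envelope comparison of the two commitments, using the monthly figures from this post. It assumes the 2xlarge RI runs for its full 12-month term and ignores any other charges, so treat the result as a rough lower bound rather than an exact bill.

```python
RI_2XLARGE_MONTHLY = 330   # RI bought first, at the wrong size
RI_XLARGE_MONTHLY = 181    # RI bought after rightsizing


def excess_commitment(months):
    """Extra spend from holding the larger RI for `months` months."""
    return (RI_2XLARGE_MONTHLY - RI_XLARGE_MONTHLY) * months


excess_per_month = excess_commitment(1)
excess_full_term = excess_commitment(12)  # assumption: full 1-year term

print(f"excess: ${excess_per_month}/month, ${excess_full_term} over 12 months")
```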

The practical rule: only buy a Reserved Instance after confirming at least one full business week of stable metrics at the target size. Not before.

The sequence as methodology

The three phases weren't sequential by accident. Each one enables the next:

Indexes first: reveal the real required capacity. Without indexes, the server appears to need more CPU than it actually does.

Rightsizing second: with real load known, it is safe to reduce the instance.

RI third: with the size confirmed, it is safe to lock in the discount for 1 year.

Reversing any pair in this sequence creates one of three problems:

RI before rightsizing: RI locked into the wrong size.

Rightsizing before indexes: smaller instance under inefficient query load → incident.

RI before indexes: discount on an oversized instance that will be downsized.

Infrastructure optimization is not a list of actions — it is a sequence with validation gates. The gate is the data that confirms the next action is safe. Without the gate, each action is a risk; with the gate, each action is a logical consequence of the previous one.

The indexes we created and the CPU metrics they generated are documented in detail in a separate post about the technical RDS diagnosis. This post focuses on the strategy and sequence — the 'how to do this at your company' playbook — not the technical detail of each index.