If you're running MySQL on AWS (whether RDS or Aurora), you've likely encountered performance problems that didn't exist when the database was smaller. Queries that ran in milliseconds now take seconds. Replication lag spikes during peak traffic. ALTER TABLE operations that used to be routine now require maintenance windows. These aren't signs that MySQL is the wrong choice. They're signs that your database has outgrown its default configuration and that your diagnostic tooling needs to catch up with your workload.
This guide covers the areas where MySQL operators at scale find the most leverage: understanding the real differences between Aurora and standard MySQL, using Performance Schema for serious query analysis, taking advantage of MySQL 8.x features that many teams haven't adopted, scaling writes with Vitess, and building a diagnostic methodology that catches problems before they become incidents.
Aurora MySQL vs standard MySQL: what actually matters
AWS markets Aurora as delivering "up to 5x the throughput of standard MySQL."[1] That claim is based on specific benchmark conditions. Independent testing by hackmysql.com found that with MySQL properly configured for modern hardware, Aurora's real-world advantage is closer to 1.7x to 3x.[2] That's still a meaningful improvement, but it changes the cost-benefit calculation for teams deciding between RDS MySQL and Aurora.
The architectural differences matter more than the headline throughput number. Aurora separates compute from storage, which means your database storage is distributed across a fleet of SSDs and auto-scales in 10GiB increments up to 128TiB.[3] Write amplification is reduced because I/O is offloaded to the storage layer.[4] Read replicas share the same underlying storage, so replication lag is dramatically lower than with standard MySQL replication.
| Dimension | Standard MySQL (RDS) | Aurora MySQL |
|---|---|---|
| Storage | EBS (provisioned IOPS) | Distributed SSD fleet, auto-scales to 128TiB[3] |
| Write amplification | Higher (EBS write model) | Reduced, I/O offloaded to storage layer[4] |
| Throughput | Baseline | 1.7-3x in real benchmarks (marketed as 5x)[2] |
| Parallel query | No | Available for analytical queries[3] |
| Temporary storage | On the allocated EBS volume | Local to the compute instance[5] |
| Read replicas | Standard replication | Shared storage, low-lag cluster replicas |
| Vertical scaling | Instance replacement | Serverless v2 autoscales within ACU bounds[6] |
The temporary storage trap
One Aurora-specific issue that catches teams off guard: while Aurora's database storage is distributed and auto-scaling, temporary storage for operations like ALTER TABLE is local to the compute instance.[5] Large schema changes that need significant temporary space can fail with insufficient local storage, even when the database itself has plenty of room. This is a well-documented issue in AWS community forums, and the workaround is either to use a larger instance class for the operation or to use online DDL tools like pt-online-schema-change that avoid the large temporary space requirement.
Serverless v2 cold starts
Aurora Serverless v2 can scale compute capacity up and down automatically within defined bounds. The scaling-to-zero capability, where the instance spins down completely when idle, works but comes with latency penalties on the first connections after a cold start.[6] For workloads with unpredictable traffic patterns, this trade-off may be acceptable. For workloads that need consistent low-latency responses, it's worth setting a minimum ACU that keeps the instance warm.
Finding your bottlenecks with Performance Schema
Performance Schema is MySQL's built-in instrumentation framework, and it's the primary analysis tool for RDS and Aurora environments where direct access to the slow query log may be limited.[7] It was introduced in MySQL 5.5 and gained statement digests in 5.6, yet many teams still rely on slow query logs or external monitoring tools without taking advantage of what Performance Schema offers natively.
The single most useful query for identifying performance problems is the query digest summary:
```sql
-- TIMER_WAIT columns are in picoseconds; divide by 1e12 for seconds.
SELECT DIGEST_TEXT, COUNT_STAR,
       ROUND(SUM_TIMER_WAIT/1000000000000, 4) AS total_time_sec,
       ROUND(AVG_TIMER_WAIT/1000000000000, 4) AS avg_time_sec,
       ROUND(SUM_ROWS_EXAMINED/NULLIF(SUM_ROWS_SENT,0), 2) AS exam_to_sent_ratio
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;
```
The exam_to_sent_ratio column is the signal worth paying attention to. A ratio of 1000:1, meaning the query examines 1,000 rows to return one, strongly suggests a missing index or a full table scan on a filtered query.[8] Ratios above 100:1 on frequently-executed queries are usually worth investigating.
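The same digest table can also be filtered directly on that ratio to surface the likely missing-index candidates. A sketch; the thresholds (100 executions, 100:1 ratio) are illustrative assumptions to tune against your workload:

```sql
-- Frequently-run digests whose examined-to-sent ratio suggests a
-- missing index or a full scan on a filtered query.
SELECT DIGEST_TEXT,
       COUNT_STAR,
       SUM_ROWS_EXAMINED,
       SUM_ROWS_SENT,
       ROUND(SUM_ROWS_EXAMINED / NULLIF(SUM_ROWS_SENT, 0), 2) AS exam_to_sent_ratio
FROM performance_schema.events_statements_summary_by_digest
WHERE COUNT_STAR > 100
  AND SUM_ROWS_EXAMINED / NULLIF(SUM_ROWS_SENT, 0) > 100
ORDER BY SUM_ROWS_EXAMINED DESC
LIMIT 10;
```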
pt-query-digest for deeper analysis
When you have access to the slow query log (which you may have on RDS, depending on your parameter group configuration), Percona's pt-query-digest provides normalized query fingerprints, frequency distributions, and P95/P99 execution times.[9] The tool was originally created by Baron Schwartz, co-author of "High Performance MySQL" and later founder of VividCortex (now part of SolarWinds Database Performance Monitor). It's particularly useful for identifying queries that are only slow occasionally. A query that averages 50ms but hits 2 seconds at P99 won't always show up in a top-10 list sorted by average time, but pt-query-digest surfaces these outliers.
The sys schema
MySQL's sys schema provides pre-built views on top of Performance Schema for common diagnostic tasks. The view sys.x$statements_with_runtimes_in_95th_percentile surfaces the worst-performing query patterns directly, without requiring you to write raw Performance Schema queries.[10] For teams that find Performance Schema's raw tables overwhelming, the sys schema is a practical middle ground between basic monitoring and full instrumentation.
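Both the 95th-percentile view and the unused-index view can be queried directly. A minimal sketch (the formatted `sys.statements_with_runtimes_in_95th_percentile` view returns human-readable latencies; the `x$` variant returns raw picoseconds):

```sql
-- Worst query patterns by 95th-percentile runtime.
SELECT query, db, exec_count, avg_latency
FROM sys.statements_with_runtimes_in_95th_percentile
LIMIT 10;

-- Indexes with no recorded reads since the last restart: candidates
-- for the invisible-index verification workflow covered below.
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes;
```

Note that `schema_unused_indexes` only reflects activity since the server last restarted, so check it against a representative window of traffic before acting on it.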
MySQL 8.x features worth adopting
MySQL 8.0 and 8.4 introduced several features that address long-standing operational pain points. Many teams running MySQL at scale haven't adopted these yet, either because they're on older versions or because the features aren't well publicised.
Invisible indexes
This is the feature that experienced MySQL operators tend to be most enthusiastic about. Invisible indexes let you hide an index from the query optimizer without actually dropping it.[11] You make the index invisible, observe whether any queries degrade, and if nothing breaks, you drop it with confidence. If something does break, you make it visible again instantly. Before MySQL 8.0, removing a potentially unused index meant either dropping it and hoping for the best, or running extensive query analysis to verify it wasn't needed. For production environments where an accidental index drop can cascade into a performance incident, invisible indexes remove the risk from what used to be a high-stakes operation.
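The workflow is three statements. A sketch, with hypothetical table and index names:

```sql
-- Step 1: hide the suspect index from the optimizer.
-- The index is still maintained on writes, just not used for reads.
ALTER TABLE orders ALTER INDEX idx_customer_created INVISIBLE;

-- Step 2: if anything regresses, restore visibility instantly.
ALTER TABLE orders ALTER INDEX idx_customer_created VISIBLE;

-- Step 3: after a clean observation window, drop it for real.
ALTER TABLE orders DROP INDEX idx_customer_created;
```

Visibility state can be inspected via the IS_VISIBLE column of `information_schema.statistics`.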
Descending indexes
Prior to MySQL 8.0, descending index specifications were silently ignored. The engine stored indexes in ascending order and simply scanned them in reverse when a descending sort was needed.[12] MySQL 8.0 introduced true descending index storage, which enables genuine mixed ASC/DESC multi-column index optimisation. This matters for queries that sort by multiple columns in different directions, such as ORDER BY created_at DESC, priority ASC, which are common in queue-processing and dashboard applications.
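A sketch of such an index, using hypothetical table and column names:

```sql
-- Mixed-direction index: stored descending on created_at, ascending on
-- priority, so the ORDER BY below can be satisfied by an in-order index
-- scan instead of a filesort.
CREATE INDEX idx_jobs_queue ON jobs (created_at DESC, priority ASC);

EXPLAIN
SELECT id FROM jobs
ORDER BY created_at DESC, priority ASC
LIMIT 100;
```

On 8.0+, the EXPLAIN output for a query like this should show the index being used without a "Using filesort" note; on 5.7 the same DDL would silently create an all-ascending index.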
Clone plugin
The clone plugin enables server cloning for provisioning new replicas without going through a full logical backup and restore cycle. MySQL 8.4 relaxed the version matching requirement so that only the major.minor version needs to match between source and clone.[13] For teams that maintain read replica fleets, this significantly reduces the time and operational overhead of spinning up new replicas.
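A minimal sketch of a remote clone, with hypothetical host and credentials; note that the recipient restarts itself when the clone completes:

```sql
-- On both donor and recipient (same major.minor version under
-- MySQL 8.4's relaxed matching rule):
INSTALL PLUGIN clone SONAME 'mysql_clone.so';

-- On the recipient: whitelist the donor, then pull a full copy of
-- its data directory over the network.
SET GLOBAL clone_valid_donor_list = 'donor-host:3306';
CLONE INSTANCE FROM 'clone_user'@'donor-host':3306
  IDENTIFIED BY 'clone_password';
```

The clone user on the donor needs the BACKUP_ADMIN privilege, and progress can be tracked in `performance_schema.clone_progress` on the recipient.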
Where MySQL bottlenecks hide at scale
Once the obvious issues (missing indexes, undersized instances, untuned buffer pool) are addressed, MySQL performance problems at scale tend to cluster in categories that are harder to diagnose with standard monitoring.
Lock contention and long transactions
InnoDB's row-level locking is one of MySQL's strengths, but long-running transactions holding locks create cascading contention that can bring a busy system to a standstill. A single slow UPDATE that holds a lock for 30 seconds can block dozens of subsequent transactions on the same rows, and those blocked transactions in turn block others. The result is a contention cascade that looks like a sudden, system-wide performance collapse rather than a single slow query.
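When a cascade is underway, the sys schema shows the blocking chain directly. A sketch for MySQL 8.0+; the 30-second threshold for "long-running" is an illustrative assumption:

```sql
-- Who is blocking whom right now.
SELECT waiting_pid, waiting_query,
       blocking_pid, blocking_query,
       wait_age_secs
FROM sys.innodb_lock_waits;

-- Transactions open for more than 30 seconds: the root cause of a
-- contention cascade usually sits at the head of this list.
SELECT trx_mysql_thread_id, trx_started, trx_query
FROM information_schema.innodb_trx
WHERE trx_started < NOW() - INTERVAL 30 SECOND;
```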
DDL operations are particularly dangerous in this regard. ALTER TABLE in MySQL acquires metadata locks that can block reads and writes on the affected table for the duration of the operation. Online DDL has improved this significantly in recent versions, but not all ALTER operations can be performed online. Tools like Percona's pt-online-schema-change or GitHub's gh-ost remain essential for schema changes on large, busy tables.
Write amplification on high-write workloads
MySQL's InnoDB storage engine writes data more than once: every change is recorded in the redo log, and modified pages pass through the doublewrite buffer on their way to the tablespace. On high-write workloads, this write amplification becomes a significant I/O bottleneck.[14] Write amplification is also the issue at the heart of Uber's well-known engine debate: Uber left PostgreSQL for MySQL largely because of Postgres's index write amplification, then built their Schemaless storage layer on top of MySQL when no stock write path was sufficient at their scale. Aurora's architecture mitigates this by offloading I/O to the storage layer, which is one of the areas where its throughput advantage over standard MySQL is most pronounced.
Replication lag under write pressure
Standard MySQL replication applied changes on a single thread by default for years (a multi-threaded applier only became the default in 8.0.27), meaning that a replica can fall behind the primary during sustained write bursts. MySQL 8.0 improved multi-threaded replication, but configuring it correctly (particularly the replica_parallel_type and replica_parallel_workers settings) requires understanding your write patterns. Aurora's shared-storage architecture avoids this problem entirely for its cluster replicas, which is a genuine operational advantage for read-heavy workloads that need consistent replica freshness.
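A sketch of enabling the parallel applier on a MySQL 8.0 replica (replica_parallel_type is removed in 8.4, where the LOGICAL_CLOCK behaviour is the only mode); the worker count is an assumption to tune against your write pattern:

```sql
STOP REPLICA;
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL replica_parallel_workers = 8;
-- Preserve commit order so parallel apply stays externally consistent.
SET GLOBAL replica_preserve_commit_order = ON;
START REPLICA;
```

Watch Seconds_Behind_Source (or, better, a heartbeat table) before and after the change; more workers only help when the primary's transactions actually commit concurrently.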
Scaling writes: sharding with Vitess
Read scaling in MySQL is relatively straightforward: add replicas. Write scaling is a harder problem, and it's one where the MySQL ecosystem has a more mature answer than many teams realise. Vitess, originally built at YouTube and now a CNCF graduated project, provides transparent horizontal sharding for MySQL without requiring application-level sharding logic.[15]
The production track record is substantial. Slack migrated all their databases to Vitess and serves 2.3 million queries per second at peak, with a median query latency of 2ms.[16] Shopify uses Vitess to horizontally scale the Rails backend of their Shop app.[17] GitHub and Square both run Vitess in production for their MySQL fleets. PlanetScale built its entire managed database platform on Vitess, offering automated horizontal sharding as a service.[18]
For teams hitting write throughput limits on a single MySQL primary, Vitess is the established path to horizontal write scaling. It handles shard routing, cross-shard queries, online resharding, and automated failover. The operational complexity is real, but it's a solved problem at organisations running some of the largest MySQL deployments in the world, and PlanetScale offers a managed version for teams that don't want to run Vitess infrastructure themselves.
The extension gap (and what's changing)
One area where MySQL has been at a structural disadvantage compared to PostgreSQL is the extension ecosystem. PostgreSQL's pgvector for vector similarity search, PostGIS for geospatial queries, and TimescaleDB for time-series data are engine-level extensions that run inside the database process. MySQL's plug-in and component system is more limited, and teams that need these capabilities typically bolt on external services, adding operational complexity and network latency.[14]
This gap may be narrowing. VillageSQL, an open-source tracking fork of MySQL that launched in early 2026 with $35M in Series A funding, introduces a new extension framework (VEF) designed to bring PostgreSQL-style extensibility to MySQL.[19] VillageSQL is a drop-in replacement for MySQL, meaning existing applications can adopt it without code changes, and its roadmap includes custom data types, custom functions, and vector indexing support for AI and RAG workloads. Percona and VillageSQL have also been collaborating on broader MySQL governance questions, including a joint proposal to Oracle for more transparent community stewardship of MySQL development.[20] It's early days, but VillageSQL represents a serious effort to address one of MySQL's longest-standing limitations.
A diagnostic methodology, not just diagnostic queries
The Performance Schema queries and bottleneck categories above give you tools and categories. What they don't give you is a systematic methodology for when to look at what. Two complementary frameworks are worth adopting.
The first is the USE method (Utilization, Saturation, Errors), developed by Brendan Gregg at Netflix.[21] USE is infrastructure-focused: for every resource (CPU, memory, disk, network), check how busy it is (utilization), whether work is queuing (saturation), and whether errors are occurring. Applied to MySQL, this means tracking InnoDB buffer pool utilization, checking for thread contention and lock wait queues, and monitoring disk I/O saturation on the storage backing your tablespace. USE gives you a systematic checklist for ruling out infrastructure bottlenecks before you dive into query-level analysis.
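Applied to MySQL, a couple of the USE checks reduce to simple status queries. A sketch; what counts as "saturated" depends on your instance size and workload:

```sql
-- Utilization: fraction of the buffer pool holding data pages.
SELECT ROUND(100 *
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status
   WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_data') /
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status
   WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_total'), 1)
  AS buffer_pool_pct_used;

-- Saturation: concurrently executing threads and row-lock waits.
SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_waits';
```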
The second is the RED method (Rate, Errors, Duration), which Peter Zaitsev and the Percona team have advocated for database-specific performance analysis.[22] RED is request-focused: for every query pattern, track how often it runs (rate), how often it fails (errors), and how long it takes (duration). Where USE tells you whether your infrastructure is constrained, RED tells you whether your queries are behaving. Together they cover both layers: is the hardware keeping up, and are the queries performing as expected?
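All three RED signals are already present in the digest table used earlier. A sketch that surfaces query patterns by error count, then duration:

```sql
-- Rate, errors, duration per normalized query pattern.
SELECT DIGEST_TEXT,
       COUNT_STAR AS executions,                            -- rate (normalize by uptime)
       SUM_ERRORS AS errors,                                -- errors
       ROUND(AVG_TIMER_WAIT/1000000000000, 4) AS avg_sec    -- duration
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_ERRORS DESC, AVG_TIMER_WAIT DESC
LIMIT 10;
```

Snapshotting this query on an interval and diffing COUNT_STAR between snapshots turns the cumulative counters into a true rate.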
This matters because the bottlenecks described in this guide don't announce themselves. Lock contention builds gradually. Write amplification degrades slowly. Replication lag appears in bursts that may not correlate with obvious events. Without a systematic framework for continuous monitoring, you're left diagnosing problems after they've become incidents. The diagnostic queries in this guide are useful for investigation once you know something is wrong. USE and RED are what tell you something is wrong before your users notice.
Disclosure: Percona and Baron Schwartz (creator of pt-query-digest, co-author of "High Performance MySQL," and founder of VividCortex) are both investors in SIXTA. We recommend their tools because they're the best available, and their involvement in SIXTA reflects a shared conviction that database reliability engineering needs better automation.
MySQL's installed base remains enormous, with the cloud MySQL market estimated at roughly $15 billion in 2025.[23] But net-new project selection has shifted toward PostgreSQL (55-65% of new open-source relational projects vs MySQL's 25-35%),[24] which means the operational knowledge for running MySQL well at scale is increasingly concentrated among experienced practitioners. The performance challenges don't solve themselves, and the skills for systematic MySQL performance analysis are more valuable than ever.
The bottlenecks described above are not the kind of problems you solve once and move on from. Query patterns change as applications evolve. Traffic shapes shift. Schema changes introduce new locking risks. What made your MySQL deployment fast six months ago may not be enough today, and the right diagnostic query for one workload may be irrelevant to another.
This is where continuous performance analysis earns its keep. The teams that maintain good MySQL performance at scale are the ones that monitor query digest trends over time, track lock contention patterns, and catch replication lag before it affects users. The investigative work is repetitive but essential, and it scales with the size of the fleet. One database, you can watch manually. Twenty databases across multiple environments, you need systematic analysis that runs continuously and surfaces problems before they become incidents that take hours to resolve.
The same tuning principles that apply to PostgreSQL, treating performance as an ongoing process, investing in observability, and progressively adopting automated analysis, apply equally to MySQL. The engines are different, but the operational discipline is the same.
References
1. Amazon Aurora Features — AWS. Marketing documentation for Aurora's throughput claims and feature set.
2. Are Aurora Performance Claims True? — hackmysql.com. Independent benchmark analysis showing Aurora's real-world advantage at 1.7-3x over properly configured standard MySQL.
3. AWS MySQL Showdown: RDS vs. Aurora vs. Serverless — Mydbops. Comparative analysis of architecture, storage, and parallel query capabilities.
4. Amazon Aurora MySQL vs RDS MySQL — House of Brick. Covers write amplification differences and I/O offloading in Aurora's storage layer.
5. AWS Aurora: What issues have you had? — Reddit r/aws. Community discussion documenting temporary storage locality issues and ALTER TABLE failures on Aurora.
6. Aurora Serverless scaling to 0 capacity — Reddit r/aws. Discussion of cold start latency and Serverless v2 scaling behaviour.
7. Understanding MySQL Query Digest with Performance Schema — Percona. Guide to using Performance Schema for query digest analysis in managed environments.
8. How to Implement MySQL Performance Schema Analysis — OneUptime. Covers the exam-to-sent ratio as a key diagnostic signal for missing indexes and full table scans.
9. The DBA's Toolkit: Mastering Percona Toolkit — dohost. Covers pt-query-digest for slow query log analysis and pt-online-schema-change for safe DDL operations.
10. Top 10 Tips for MySQL Performance Tuning — MySQL Summit 2024. Includes sys schema usage for identifying unused indexes and worst-performing query patterns.
11. MySQL 8.4 Indexing Explained: Invisible, Functional, and Descending Indexes — GenexDBS. Practical guide to MySQL 8.x index features including invisible indexes. See also: MySQL 8.0 Reference Manual: Invisible Indexes.
12. MySQL 8 New Features I Will Be Using Often — Shiv Iyer. Covers descending indexes, window functions, CTEs, and other MySQL 8.0 additions.
13. Changes in MySQL 8.4.0 — Oracle. Official release notes covering clone plugin version matching changes. See also: What Is New in MySQL 8.4 since MySQL 8.0.
14. Why Uber Ditched Postgres for MySQL — dev.to. Documents write amplification challenges and architectural decisions at scale. Context for MySQL's write path limitations and the Schemaless project.
15. Vitess Blog — CNCF. Vitess is a database clustering system for horizontal scaling of MySQL, originally built at YouTube and now used by Slack, Shopify, GitHub, and Square.
16. Scaling Datastores at Slack with Vitess — Slack Engineering. Documents Slack's migration to Vitess, serving 2.3 million QPS at peak with 2ms median query latency.
17. Horizontally Scaling the Rails Backend of Shop App with Vitess — Shopify Engineering. Covers Shopify's Vitess deployment for horizontal write scaling.
18. PlanetScale on Vitess — PlanetScale. Managed Vitess platform offering automated horizontal sharding for MySQL.
19. Introducing VillageSQL: A New Path for MySQL in the Agentic AI Era — VillageSQL. Open-source MySQL tracking fork with the VillageSQL Extension Framework (VEF) for custom data types, functions, and vector indexing. See also: FAUN: VillageSQL Launches.
20. An Open Letter to Oracle: Let's Talk About MySQL's Future — Percona. Joint Percona/VillageSQL initiative for more transparent community governance of MySQL development.
21. The USE Method — Brendan Gregg. Utilization, Saturation, Errors framework for systematic infrastructure performance analysis. Provides a checklist-based approach for identifying bottlenecks across CPU, memory, disk, and network resources.
22. RED Method for MySQL Performance Analyses — Percona. Applying the Rate, Errors, Duration framework to systematic MySQL query-level performance monitoring.
23. Cloud Database MySQL Market — Market Report Analytics. MySQL cloud market estimated at ~$15B in 2025, growing at 18% CAGR through 2033.
24. PostgreSQL Secures 55-65% of Net-New Open-Source Relational Projects — MTech Research. MySQL-family databases at 25-35% of net-new projects in 2026.