
Senior MySQL Interview Questions

Curated MySQL interview questions for developers targeting senior positions. 30 questions available.


MySQL Interview Questions & Answers


Welcome to our comprehensive collection of MySQL interview questions and answers. This page contains expertly curated interview questions covering all aspects of MySQL, from fundamental concepts to advanced topics. Whether you're preparing for an entry-level position or a senior role, you'll find questions tailored to your experience level.

Our MySQL interview questions are designed to help you:

  • Understand core concepts and best practices in MySQL
  • Prepare for technical interviews at all experience levels
  • Master both theoretical knowledge and practical application
  • Build confidence for your next MySQL interview

Each question includes detailed answers and explanations to help you understand not just what the answer is, but why it's correct. We cover topics ranging from basic MySQL concepts to advanced scenarios that you might encounter in senior-level interviews.

Use the filters below to find questions by difficulty level (Entry, Junior, Mid, Senior, Expert) or focus specifically on code challenges. Each question is carefully crafted to reflect real-world interview scenarios you'll encounter at top tech companies, startups, and MNCs.

Questions

30 questions
Q1:

How does MySQL’s InnoDB engine implement MVCC internally?

Senior

Answer

InnoDB uses undo logs to store old row versions. Each transaction reads through a consistent snapshot (read view), enabling non-blocking reads while writers modify the latest row version.
Quick Summary: InnoDB MVCC internals: each row carries DB_TRX_ID (the last modifying transaction) and DB_ROLL_PTR (a pointer into the undo log chain). At the transaction's first consistent read (under REPEATABLE READ), InnoDB creates a read view capturing which transaction IDs were active. For each row, if DB_TRX_ID is visible in the read view, that row version is used. Otherwise, InnoDB follows DB_ROLL_PTR into the undo log to find the newest visible version. Readers never block writers.
Q2:

How does MySQL determine the best execution plan for a query?

Senior

Answer

The optimizer evaluates table statistics, cardinality, join orders, and cost estimates. It chooses the lowest I/O and CPU cost plan, relying heavily on accurate table statistics.
Quick Summary: MySQL query optimizer builds an execution plan by: parsing the query, transforming it (subquery flattening, constant folding), generating candidate plans (different join orders, index choices), estimating costs using table statistics (row counts, cardinality, index size), and picking the lowest-cost plan. Run EXPLAIN FORMAT=JSON to see detailed cost estimates. ANALYZE TABLE updates the statistics the optimizer uses.
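To see the plan the optimizer chose, a sketch (the orders and customers tables and their columns are hypothetical):

```sql
-- Show the chosen plan with per-step cost estimates (MySQL 5.7+)
EXPLAIN FORMAT=JSON
SELECT o.id, o.total
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE c.country = 'DE';

-- Refresh the statistics the cost model depends on
ANALYZE TABLE orders, customers;
```

If the JSON output shows a row estimate that is wildly wrong, re-run ANALYZE TABLE before investigating further.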
Q3:

What is the difference between index scan and table scan?

Senior

Answer

Table scans read every row, while index scans use index trees to quickly locate matching rows. Index scans improve performance when the filter is selective.
Quick Summary: Index scan: MySQL uses the index B-Tree to find rows. Can be range scan (range of index values) or full index scan (all index values in order). Faster than table scan for selective queries. Table scan (full scan): reads every row in the table sequentially - used when no suitable index exists or when the optimizer estimates it's cheaper than the index (e.g., retrieving 70% of rows). Seen in EXPLAIN as type=ALL.
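The difference shows up in EXPLAIN's type column; a sketch against a hypothetical users table:

```sql
-- users(id PK, email indexed, bio unindexed)
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
-- type=ref: lookup via the email index

EXPLAIN SELECT * FROM users WHERE bio LIKE '%mysql%';
-- type=ALL: full table scan (the leading wildcard defeats any index on bio)
```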
Q4:

What causes index fragmentation and how do you fix it?

Senior

Answer

Frequent updates and deletes leave gaps inside index pages. Use OPTIMIZE TABLE or rebuild indexes to reorganize and compact index pages.
Quick Summary: Index fragmentation in MySQL: InnoDB uses B-Tree indexes. After many random inserts, updates, and deletes, index pages become partially filled (fragmented). This wastes space and slows reads. Fix: OPTIMIZE TABLE (rebuilds table and indexes, needs table lock) or ALTER TABLE ... ENGINE=InnoDB (online rebuild with ALGORITHM=INPLACE). Regular ANALYZE TABLE updates statistics but doesn't defragment.
Q5:

What is a B-Tree index and how does MySQL use it?

Senior

Answer

B-Tree indexes maintain sorted keys allowing fast lookups, prefix scans, and range queries. They underpin most MySQL indexing except FULLTEXT and spatial indexes.
Quick Summary: B-Tree (Balanced Tree) index organizes values in a sorted tree structure. All leaves are at the same depth (balanced). InnoDB uses B+Trees where all data is in leaf nodes and leaves are linked for range scans. Supports: equality (=), range (>, <, BETWEEN), prefix matching (LIKE "abc%"), ORDER BY, and GROUP BY optimization. Doesn't support: LIKE "%abc" (leading wildcard), full-text search.
Q6:

Why is column order important in composite indexes?

Senior

Answer

MySQL uses composite indexes left-to-right. If the first column is not used in filtering, the index cannot be used efficiently.
Quick Summary: Composite index (a, b, c): MySQL can use it for queries that filter on (a), (a, b), or (a, b, c) but not on (b) alone or (c) alone (leftmost prefix rule). Put the most selective and equality-matched column first. Range conditions should come last. An index on (user_id, created_at) helps WHERE user_id = 5 ORDER BY created_at - both filter and sort are covered.
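The leftmost-prefix rule, sketched on a hypothetical orders table:

```sql
CREATE INDEX idx_user_created ON orders (user_id, created_at);

-- Can use the index: equality on the leftmost column, sort on the next
SELECT * FROM orders WHERE user_id = 5 ORDER BY created_at;

-- Cannot use this index: the leftmost column (user_id) is not filtered
SELECT * FROM orders WHERE created_at > '2024-01-01';
```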
Q7:

How does MySQL detect and resolve deadlocks?

Senior

Answer

InnoDB builds a waits-for graph and checks for cycles. When detected, it rolls back the cheapest transaction to break the deadlock.
Quick Summary: InnoDB deadlock detection: a background thread checks for lock wait cycles. When it finds a cycle (A waits for B, B waits for A), it selects a victim transaction (usually the one with fewest row modifications) and rolls it back. The other transaction proceeds. Error 1213 is returned to the rolled-back transaction's client. Enable innodb_print_all_deadlocks=ON to log all deadlocks.
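Two statements that help when investigating deadlocks:

```sql
-- The LATEST DETECTED DEADLOCK section shows both transactions and the locks held
SHOW ENGINE INNODB STATUS\G

-- Write every deadlock to the error log, not just the most recent one
SET GLOBAL innodb_print_all_deadlocks = ON;
```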
Q8:

How does MySQL handle high-concurrency environments?

Senior

Answer

MySQL uses row-level locks, MVCC, adaptive hash indexing, and buffer pool optimizations. Proper schema design and indexing reduce contention.
Quick Summary: High-concurrency MySQL: use InnoDB (row-level locking, MVCC). Keep transactions short to minimize lock hold time. Use connection pooling (ProxySQL, application-level pools) to limit connection overhead. Read-write splitting: route SELECTs to replicas. Add read replicas for read scaling. Use Redis for caching frequently read data. Avoid long-running queries that hold locks and block others.
Q9:

What is the difference between shared locks and exclusive locks?

Senior

Answer

Shared locks permit concurrent reads, while exclusive locks block all other lock requests on the row. Writes always require exclusive locks.
Quick Summary: Shared lock (S lock): multiple transactions can hold shared locks on the same row simultaneously - used for locking reads. Exclusive lock (X lock): only one transaction can hold it - blocks other locking reads and all writes on that row (plain consistent reads via MVCC are not blocked). SELECT ... FOR SHARE (LOCK IN SHARE MODE before MySQL 8) acquires shared locks. SELECT ... FOR UPDATE acquires exclusive locks. INSERT, UPDATE, DELETE acquire exclusive row locks automatically.
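A sketch of both lock types inside one transaction (the accounts table is hypothetical; FOR SHARE is the MySQL 8 spelling of LOCK IN SHARE MODE):

```sql
START TRANSACTION;

-- Shared (S) lock: other transactions may also read-lock this row, none may modify it
SELECT balance FROM accounts WHERE id = 1 FOR SHARE;

-- Exclusive (X) lock: blocks other locking reads and writes on the row
SELECT balance FROM accounts WHERE id = 2 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 2;

COMMIT;
```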
Q10:

What is a redo log and why is it important?

Senior

Answer

The redo log records changes before the corresponding data pages are flushed to disk. It guarantees durability and enables crash recovery.
Quick Summary: The redo log records every change made to InnoDB data pages before those pages are written to the data files (write-ahead logging). On crash, MySQL replays the redo log from the last checkpoint to recover committed changes that had not yet reached the data files; uncommitted changes are then rolled back via the undo log. Written sequentially (fast). Sized by innodb_log_file_size (innodb_redo_log_capacity in 8.0.30+). A larger redo log improves write throughput (fewer checkpoints) but lengthens crash recovery. Critical for InnoDB's durability guarantee.
Q11:

What is a binlog and how does it differ from redo logs?

Senior

Answer

Binlogs store logical SQL changes for replication. Redo logs store physical page changes for recovery. They serve distinct purposes.
Quick Summary: Binlog (binary log) records all SQL changes (DDL + DML) for replication and point-in-time recovery. Written after transaction commits. Used by replicas to replay changes. Differs from redo log: binlog is at the SQL/logical level, redo log is at the physical page level. Redo log is for crash recovery; binlog is for replication and PITR. Both are needed for durability + replication.
Q12:

How do you design MySQL for horizontal scalability?

Senior

Answer

Use sharding, replicas, partitioning, and stateless application logic. Avoid cross-shard joins and rely on denormalized or distributed data models.
Quick Summary: Horizontal MySQL scaling: read replicas (distribute SELECT queries across multiple replicas). Sharding (split data across multiple MySQL instances by user_id range, geography, or hash). ProxySQL for intelligent query routing. Vitess for automated sharding and replication management at scale. Application-level sharding with consistent hashing. Sharding adds complexity - start with replicas and vertical scaling first.
Q13:

What is the purpose of GTID in MySQL replication?

Senior

Answer

GTIDs uniquely identify each transaction and simplify failover, resynchronization, and replication consistency.
Quick Summary: GTID (Global Transaction Identifier) assigns a unique ID to every committed transaction on the primary. Format: source_uuid:transaction_id. Replicas track which GTIDs they have applied. Benefits: precise failover (a replica can continue exactly from where it was), easy identification of which transactions a replica has applied, and support for multi-source replication. GTIDs replace the old file+offset-based replication tracking.
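Inspecting and using GTIDs, sketched (CHANGE REPLICATION SOURCE TO is the MySQL 8.0.23+ syntax; older versions use CHANGE MASTER TO ... MASTER_AUTO_POSITION = 1, and the connection settings are assumed to be configured already):

```sql
-- Inspect GTID state on a server
SHOW VARIABLES LIKE 'gtid_mode';
SELECT @@GLOBAL.gtid_executed;

-- Point a replica at the primary using GTID auto-positioning instead of file+offset
CHANGE REPLICATION SOURCE TO SOURCE_AUTO_POSITION = 1;
```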
Q14:

What is semi-synchronous replication?

Senior

Answer

Semi-sync replication requires at least one replica to acknowledge receipt of the transaction before the primary returns success to the client, improving durability.
Quick Summary: Semi-synchronous replication: primary waits for at least one replica to acknowledge receiving the transaction to its relay log before returning success to the client. Not the same as fully synchronous (which waits for commit on replica). Reduces data loss window compared to async replication. Latency increases by one network round-trip. Falls back to async if no replica acknowledges within rpl_semi_sync_master_timeout.
Q15:

What causes replication lag and how do you detect it?

Senior

Answer

Lag occurs when replicas apply changes more slowly than the primary produces them. Check Seconds_Behind_Source (Seconds_Behind_Master before MySQL 8.0.22) and look for slow queries or overloaded replica hardware.
Quick Summary: Replication lag causes: replica hardware slower than the primary, large transactions (a single big transaction blocks the replica for its full duration), missing indexes on the replica (DML applies slower), insufficient replica I/O or applier threads. Detect: SHOW REPLICA STATUS (SHOW SLAVE STATUS before 8.0.22) shows Seconds_Behind_Source. Monitor with Prometheus plus a replication exporter. Fix: optimize slow queries on the replica, use parallel replication (replica_parallel_workers, formerly slave_parallel_workers).
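Checking lag and enabling parallel apply, sketched (the replica_ names are the current spellings; older versions use the slave_ equivalents):

```sql
-- Seconds_Behind_Source estimates the replica's lag
SHOW REPLICA STATUS\G

-- Apply transactions in parallel; takes effect after restarting the applier
STOP REPLICA SQL_THREAD;
SET GLOBAL replica_parallel_workers = 4;
START REPLICA SQL_THREAD;
```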
Q16:

What is the difference between statement-based and row-based replication?

Senior

Answer

Statement-based logs SQL text while row-based logs changed values. Row-based avoids inconsistencies but increases log size.
Quick Summary: Statement-based replication (SBR): logs the SQL statement itself. Compact, but non-deterministic functions (NOW(), UUID(), RAND()) can cause replica divergence. Row-based replication (RBR): logs the actual row changes (before + after images). Larger binlog but accurate for all operations. Mixed: uses SBR when safe, RBR otherwise. MySQL recommends ROW format for most cases - it's safer and supports more operations correctly.
Q17:

How do you handle inconsistent replica data?

Senior

Answer

Use checksums, pt-table-sync, or resync using GTID. In severe cases, recreate the replica.
Quick Summary: Inconsistent replica data: prevent by using row-based replication and read_only=ON on replicas. Detect with pt-table-checksum (Percona Toolkit) - compares checksums of tables across primary and replicas. Fix inconsistencies with pt-table-sync. Rebuild from scratch: stop replica, take fresh backup from primary, restore and start replication from the backup's binlog position.
Q18:

How do you design indexes for large OLTP systems?

Senior

Answer

Use highly selective, narrow, composite indexes. Avoid indexing low-cardinality columns and review real query patterns regularly.
Quick Summary: Index design for OLTP: identify the most common query patterns and their WHERE, JOIN, and ORDER BY columns. Create indexes for each query pattern. Composite indexes for multi-column filters. Covering indexes for high-frequency queries. Avoid redundant indexes (index on (a,b) makes a separate index on (a) redundant). Use pt-duplicate-key-checker to find redundant indexes. Monitor with sys.schema_unused_indexes.
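A covering-index sketch for a hypothetical hot query:

```sql
-- Hot query:
--   SELECT status, total FROM orders
--   WHERE user_id = ? ORDER BY created_at DESC LIMIT 10;

-- All filtered, sorted, and selected columns live in the index, so the query
-- never touches the clustered index (EXPLAIN shows "Using index")
CREATE INDEX idx_orders_cover ON orders (user_id, created_at, status, total);
```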
Q19:

What is query profiling in MySQL?

Senior

Answer

Query profiling breaks execution into detailed stages. It helps locate time-consuming operations like sorting or copying.
Quick Summary: Query profiling in MySQL: enable slow query log (slow_query_log=ON, long_query_time=1). Use EXPLAIN and EXPLAIN ANALYZE (MySQL 8). Use performance_schema to find expensive queries: SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY SUM_TIMER_WAIT DESC. Or use sys.statements_with_runtimes_in_95th_percentile. Percona's pt-query-digest analyzes slow query logs.
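The performance_schema digest query mentioned above, written out (the orders table in the EXPLAIN ANALYZE example is hypothetical):

```sql
-- Top normalized statements by total execution time (timers are in picoseconds)
SELECT digest_text,
       count_star            AS executions,
       sum_timer_wait / 1e12 AS total_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 10;

-- MySQL 8: execute the query and report actual per-step timings and row counts
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 5;
```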
Q20:

How does MySQL handle sorting and temporary tables?

Senior

Answer

MySQL sorts using in-memory temp tables when possible and switches to disk-based files if data exceeds memory limits.
Quick Summary: MySQL uses in-memory sort buffers (sort_buffer_size) for ORDER BY. If result fits in buffer: in-memory quicksort. If too large: uses temporary files on disk (filesort with external merge sort). EXPLAIN shows "Using filesort" when sorting can't use an index. Temporary tables: created in memory (tmp_table_size, max_heap_table_size limit) or spill to disk. Increase these limits or add indexes to avoid disk operations.
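Spotting and removing a filesort, sketched on a hypothetical orders table:

```sql
-- "Using filesort" in the Extra column means the sort cannot come from an index
EXPLAIN SELECT * FROM orders WHERE user_id = 5 ORDER BY total;

-- An index matching the filter plus the sort delivers rows already ordered
CREATE INDEX idx_user_total ON orders (user_id, total);
```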
Q21:

What is the difference between HASH and B-Tree indexes?

Senior

Answer

HASH indexes support equality lookups and are used in MEMORY tables. B-Tree indexes support ranges, sorting, and prefix lookups.
Quick Summary: B-Tree index: sorted structure, supports range queries, ORDER BY, BETWEEN, prefix LIKE - versatile. HASH index: maps keys to hash buckets, O(1) exact lookup, no range queries, no sorting, no prefix matching. InnoDB only supports B-Tree (it simulates hash with adaptive hash index in memory automatically). MEMORY engine supports explicit HASH indexes. For most use cases, B-Tree is the right choice.
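An explicit HASH index is only available on engines such as MEMORY; a sketch with a hypothetical session cache:

```sql
-- O(1) equality lookups; no range scans, no ORDER BY support
CREATE TABLE session_cache (
  token   CHAR(32) NOT NULL,
  user_id INT,
  PRIMARY KEY (token) USING HASH
) ENGINE=MEMORY;

-- Served by the hash index
SELECT user_id FROM session_cache WHERE token = 'a1b2c3d4';
```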
Q22:

How do you optimize JOIN performance in multi-table queries?

Senior

Answer

Index the join keys, filter rows as early as possible, avoid functions on join columns, and use covering indexes to reduce table lookups.
Quick Summary: Optimize multi-table JOIN performance: index every join column on the driving table's side (and ensure the joined table's PK or indexed FK). Reduce the result set before joining with WHERE on the earliest possible table. Use STRAIGHT_JOIN to force join order if the optimizer chooses wrong. Use EXPLAIN to verify join type is ref or eq_ref (not ALL). Minimize columns in SELECT to reduce memory usage.
Q23:

Why can functions in WHERE clause hurt performance?

Senior

Answer

Functions applied to a column prevent MySQL from using that column's index: the index stores raw values, so the function must be evaluated for every row, forcing a full scan.
Quick Summary: Functions in WHERE clause prevent index usage on that column. WHERE YEAR(created_at) = 2024 causes a full scan because MySQL must evaluate YEAR() for every row. Rewrite to use range: WHERE created_at BETWEEN "2024-01-01" AND "2024-12-31" - this uses the index on created_at. Similarly: WHERE LOWER(email) = "alice" won't use an index. Use functional indexes (MySQL 8) or store pre-computed values.
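The range rewrite and the MySQL 8.0.13+ functional-index alternative, sketched on a hypothetical orders table:

```sql
-- Full scan: YEAR() must be evaluated for every row
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- Index range scan on created_at instead
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- Or index the expression itself (note the double parentheses)
CREATE INDEX idx_orders_year ON orders ((YEAR(created_at)));
```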
Q24:

How do you handle large DELETE operations efficiently?

Senior

Answer

Delete in batches, drop partitions instead of deleting rows, or disable constraints if safe. This avoids long locks and table bloat.
Quick Summary: Large DELETE operations lock rows (InnoDB row-level locks) and generate undo log entries. Delete millions of rows at once = huge lock contention and slow rollback if something fails. Best practice: delete in batches with a loop: DELETE FROM logs WHERE created_at < cutoff LIMIT 1000; repeat until 0 rows affected. Use pt-archiver for efficient large-scale deletion. Schedule during off-peak hours.
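The batched delete, sketched (the logs table and 90-day cutoff are illustrative; the loop itself lives in application code or a scheduled job):

```sql
-- Run repeatedly until it affects 0 rows; small batches keep lock time and
-- undo log growth bounded, and let replicas keep up
DELETE FROM logs
WHERE created_at < NOW() - INTERVAL 90 DAY
LIMIT 1000;
```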
Q25:

Why was the MySQL query cache removed in version 8?

Senior

Answer

Query cache caused severe contention and invalidation overhead under concurrency. MySQL 8 removed it for scalability reasons.
Quick Summary: MySQL 8 removed the query cache because it was a bottleneck under high concurrency (global mutex lock on every query/invalidation). The cache was invalidated on any write to the table - making it useless for write-heavy tables. Modern applications use application-level caching (Redis, Memcached) which is far more effective, scalable, and controllable. The query cache removal was the right call.
Q26:

What is the importance of ANALYZE TABLE?

Senior

Answer

ANALYZE TABLE refreshes statistics so the optimizer can choose efficient execution plans.
Quick Summary: ANALYZE TABLE updates the table statistics that the MySQL query optimizer uses to choose execution plans. Scans the table and updates cardinality estimates in information_schema. Run after: bulk data loads, large insertions/deletions, or when EXPLAIN shows clearly wrong row estimates. InnoDB auto-updates statistics but ANALYZE forces an immediate refresh. Essential after significant data distribution changes.
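Typical usage after a bulk load (table name hypothetical):

```sql
-- Force an immediate statistics refresh
ANALYZE TABLE orders;

-- The Cardinality column now reflects the refreshed estimates
SHOW INDEX FROM orders;
```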
Q27:

What happens internally during an UPDATE statement?

Senior

Answer

MySQL locks the row, updates memory pages, logs changes to redo logs, modifies undo logs for MVCC, and writes binlog entries.
Quick Summary: InnoDB UPDATE internals: 1) Find the row using index or full scan. 2) Acquire exclusive row lock. 3) Write old values to undo log (for rollback and MVCC old-version reads). 4) Write the change to redo log (WAL). 5) Modify the row in the buffer pool (in memory). 6) On COMMIT: flush redo log to disk. Data pages may be written to disk lazily (checkpoint). Secondary indexes also updated.
Q28:

What is a gap lock in InnoDB?

Senior

Answer

Gap locks block inserts in a range. They enforce REPEATABLE READ isolation and reduce phantom reads but can increase contention.
Quick Summary: Gap lock in InnoDB: locks the gap between two existing index values to prevent other transactions from inserting rows into that range. Example: with rows id 10 and 20, a gap lock on the 10-20 gap prevents inserts of id 11-19. Gap locks prevent phantom reads in REPEATABLE READ. READ COMMITTED disables gap locking for ordinary statements (it is still used for foreign-key and duplicate-key checks), trading phantom-read protection for less lock contention.
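A two-session sketch of a gap lock (the items table is hypothetical; assumes rows with id 10 and 20 exist and REPEATABLE READ isolation):

```sql
-- Session 1: locks the index range, including the gap between 10 and 20
START TRANSACTION;
SELECT * FROM items WHERE id BETWEEN 10 AND 20 FOR UPDATE;

-- Session 2: blocks until session 1 commits, because 15 falls inside the gap
INSERT INTO items (id) VALUES (15);
```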
Q29:

Why is REPEATABLE READ default in MySQL?

Senior

Answer

It provides strong consistency while relying on MVCC snapshots to avoid heavy locking and phantom reads.
Quick Summary: REPEATABLE READ is MySQL InnoDB's default isolation level because it provides a good balance: consistent reads (your snapshot doesn't change mid-transaction), prevents dirty reads and non-repeatable reads, and prevents most phantom reads (via gap locks). Higher than READ COMMITTED (which allows non-repeatable reads) but lower than SERIALIZABLE (which would lock everything). Suitable for most OLTP workloads.
Q30:

How do you tune MySQL for high-write workloads?

Senior

Answer

Increase buffer pool size, tune redo logs, minimize secondary indexes, optimize flush frequency, use batching, and partition large tables.
Quick Summary: Tune MySQL for high write throughput: increase innodb_buffer_pool_size (reduce disk writes), set innodb_flush_log_at_trx_commit=2 (flush every second instead of every commit - risk 1 second of data loss), use innodb_io_capacity tuned to SSD IOPS, enable innodb_flush_method=O_DIRECT (bypass OS cache for data files), batch inserts, disable binary logging for non-replicated servers, and use fast SSDs.
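An illustrative my.cnf fragment for a write-heavy server; every value here is a starting point to benchmark against your workload, not a recommendation:

```ini
[mysqld]
innodb_buffer_pool_size        = 16G      # commonly ~60-70% of RAM on a dedicated host
innodb_flush_log_at_trx_commit = 2        # flush redo ~once/second; risks ~1s of commits on crash
innodb_flush_method            = O_DIRECT # bypass the OS page cache for data files
innodb_io_capacity             = 4000     # match the storage's sustained IOPS
```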

