
Amazon MySQL Interview Questions

Curated MySQL interview questions for developers preparing for Amazon interviews. 110 questions available.

MySQL Interview Questions & Answers

Welcome to our comprehensive collection of MySQL interview questions and answers. This page contains expertly curated interview questions covering all aspects of MySQL, from fundamental concepts to advanced topics. Whether you're preparing for an entry-level position or a senior role, you'll find questions tailored to your experience level.

Our MySQL interview questions are designed to help you:

  • Understand core concepts and best practices in MySQL
  • Prepare for technical interviews at all experience levels
  • Master both theoretical knowledge and practical application
  • Build confidence for your next MySQL interview

Each question includes detailed answers and explanations to help you understand not just what the answer is, but why it's correct. We cover topics ranging from basic MySQL concepts to advanced scenarios that you might encounter in senior-level interviews.

Use the filters below to find questions by difficulty level (Entry, Junior, Mid, Senior, Expert) or focus specifically on code challenges. Each question is carefully crafted to reflect real-world interview scenarios you'll encounter at top tech companies, startups, and MNCs.

Questions

Q1:

What is MySQL?

Entry

Answer

MySQL is an open-source relational database management system that stores data in tables and uses SQL for managing and querying data.
Quick Summary: MySQL is an open-source relational database management system (RDBMS). It stores data in tables with rows and columns, uses SQL for querying, and supports ACID transactions (with InnoDB engine). Widely used for web applications - powers WordPress, Facebook (historically), and countless other apps. Owned by Oracle. MariaDB is the popular open-source fork.
Q2:

What is a database table?

Entry

Answer

A table is a structured set of rows and columns used to store related data, where each row is a record and each column has a data type.
Quick Summary: A database table organizes data into rows and columns. Each column has a defined name and data type (INT, VARCHAR, DATE). Each row is one record. Tables have a structure (schema) that all rows follow. A database contains multiple tables. Related tables are connected via foreign keys. Think of it as a spreadsheet with strict data types and constraints.
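As a minimal sketch, a hypothetical `users` table showing named, typed columns (table and column names are illustrative):

```sql
-- Each column has a name, a data type, and optional constraints;
-- each row that is inserted must follow this schema.
CREATE TABLE users (
    id         INT AUTO_INCREMENT PRIMARY KEY,  -- one record per row
    name       VARCHAR(100) NOT NULL,
    email      VARCHAR(255) UNIQUE,
    created_at DATE
);
```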
Q3:

What is a primary key?

Entry

Answer

A primary key uniquely identifies each row in a table and cannot be NULL. It ensures uniqueness and is used for indexing and relationships.
Quick Summary: Primary key uniquely identifies each row in a table. Must be unique and NOT NULL. A table can have only one primary key (but it can be a composite of multiple columns). MySQL creates a clustered index on the primary key (InnoDB) - rows are stored in primary key order. Use an integer auto_increment column as primary key for best performance in InnoDB.
Q4:

What is a foreign key?

Entry

Answer

A foreign key is a column that references a primary key in another table, enforcing referential integrity between related tables.
Quick Summary: Foreign key links one table to another, enforcing referential integrity. The FK column in the child table must match a value in the referenced (parent) table's primary/unique key or be NULL. InnoDB enforces FK constraints on INSERT, UPDATE, and DELETE. ON DELETE CASCADE, ON DELETE SET NULL, or ON DELETE RESTRICT control behavior when the parent row is deleted.
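A sketch of a parent/child relationship, assuming a hypothetical `users` table with an `id` primary key:

```sql
-- orders.user_id must match an existing users.id (or be NULL).
CREATE TABLE orders (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT,
    FOREIGN KEY (user_id) REFERENCES users(id)
        ON DELETE CASCADE   -- deleting a user also deletes their orders
        ON UPDATE CASCADE   -- changing users.id updates child rows too
);
```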
Q5:

What is SQL?

Entry

Answer

SQL stands for Structured Query Language and is used to insert, update, delete, and query data in relational databases.
Quick Summary: SQL (Structured Query Language) is the standard language for interacting with relational databases. DML: SELECT (read), INSERT, UPDATE, DELETE. DDL: CREATE, ALTER, DROP (schema changes). DCL: GRANT, REVOKE (permissions). TCL: COMMIT, ROLLBACK, SAVEPOINT (transactions). All major RDBMS support SQL with minor variations.
Q6:

What are the different types of SQL commands?

Entry

Answer

SQL commands include DDL (CREATE, ALTER, DROP), DML (INSERT, UPDATE, DELETE), DQL (SELECT), DCL (GRANT, REVOKE), TCL (COMMIT, ROLLBACK).
Quick Summary: SQL command types: DDL (Data Definition Language) - CREATE TABLE, ALTER TABLE, DROP TABLE, TRUNCATE. DML (Data Manipulation Language) - SELECT, INSERT, UPDATE, DELETE. DCL (Data Control Language) - GRANT, REVOKE permissions. TCL (Transaction Control Language) - COMMIT, ROLLBACK, SAVEPOINT. Each type serves a different purpose in database management.
Q7:

What is a constraint in MySQL?

Entry

Answer

A constraint is a rule applied to table columns to maintain data accuracy and consistency, such as PRIMARY KEY, UNIQUE, NOT NULL, CHECK, DEFAULT.
Quick Summary: Constraints enforce rules on data in tables. Types: PRIMARY KEY (unique + not null), FOREIGN KEY (referential integrity), UNIQUE (no duplicates), NOT NULL (required field), CHECK (custom condition on value - MySQL 8.0.16+), DEFAULT (fallback value on insert). Constraints protect data integrity at the database level, not just the application level.
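An illustrative table combining the constraint types listed above (note that CHECK is only enforced from MySQL 8.0.16; earlier versions parse it but ignore it):

```sql
CREATE TABLE products (
    id     INT PRIMARY KEY,                        -- unique + not null
    sku    VARCHAR(20) NOT NULL UNIQUE,            -- required, no duplicates
    price  DECIMAL(10,2) NOT NULL CHECK (price >= 0),  -- custom condition
    status VARCHAR(20) DEFAULT 'active'            -- fallback on insert
);
```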
Q8:

What is the difference between CHAR and VARCHAR?

Entry

Answer

CHAR is fixed-length storage while VARCHAR is variable-length. VARCHAR saves space and is preferred for dynamic text.
Quick Summary: CHAR(n): fixed-length string - always stores exactly n characters, right-padded with spaces (the byte size depends on the character set). Fast for fixed-length data (country codes, status flags). VARCHAR(n): variable-length string - stores actual length + 1-2 length bytes. More space-efficient for varying lengths. CHAR is slightly faster to read; VARCHAR saves space. Use CHAR for truly fixed-length values, VARCHAR for everything else.
Q9:

What is the purpose of the SELECT statement?

Entry

Answer

SELECT retrieves data from tables and supports filtering, sorting, grouping, and joining.
Quick Summary: SELECT retrieves data from one or more tables. Basic: SELECT col1, col2 FROM table_name WHERE condition. SELECT * fetches all columns. Use aliases: SELECT name AS full_name. Combine with JOIN for multiple tables, GROUP BY for aggregation, ORDER BY for sorting, LIMIT for pagination. It's the most used SQL statement and the foundation of all data retrieval.
Q10:

What is a WHERE clause used for?

Entry

Answer

WHERE filters records based on conditions, allowing retrieval of only relevant rows.
Quick Summary: WHERE filters rows before they are returned or affected. Only rows matching the condition are processed. Used with SELECT, UPDATE, and DELETE. Conditions: comparison (=, >, <, !=), logical (AND, OR, NOT), pattern (LIKE "%term%"), NULL checks (IS NULL, IS NOT NULL), ranges (BETWEEN x AND y), sets (IN (1,2,3)). WHERE runs before GROUP BY; use HAVING to filter after aggregation.
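A single query combining the condition types listed above, against an assumed `users` table:

```sql
SELECT name, email
FROM users
WHERE status = 'active'              -- comparison
  AND age BETWEEN 18 AND 65          -- range
  AND country IN ('US', 'CA')        -- set membership
  AND email IS NOT NULL              -- NULL check
  AND name LIKE 'A%';                -- pattern match
```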
Q11:

What is ORDER BY used for?

Entry

Answer

ORDER BY sorts query results in ascending or descending order on one or more columns.
Quick Summary: ORDER BY sorts query results. ASC (ascending, default) or DESC (descending). Sort by multiple columns: ORDER BY last_name ASC, first_name ASC. Without ORDER BY, MySQL makes no guarantee about row order - never assume rows come back in insert order. ORDER BY can sort by column position (ORDER BY 2) but this is fragile. Always use column names for clarity.
Q12:

What is LIMIT in MySQL?

Entry

Answer

LIMIT restricts the number of rows returned by a query, commonly used for pagination.
Quick Summary: LIMIT restricts how many rows are returned: SELECT * FROM users LIMIT 10. LIMIT with OFFSET for pagination: LIMIT 10 OFFSET 20 skips 20 rows and returns the next 10. Or shorthand: LIMIT 20, 10 (offset first, then count). Large OFFSET values are slow (MySQL scans and discards rows). For efficient deep pagination, use keyset/cursor-based pagination instead.
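A sketch contrasting the two pagination styles, assuming an indexed `id` column:

```sql
-- Offset pagination: MySQL scans and discards the first 10000 rows.
SELECT * FROM users ORDER BY id LIMIT 10 OFFSET 10000;

-- Keyset pagination: seeks directly via the index; 10000 is assumed
-- to be the last id seen on the previous page.
SELECT * FROM users WHERE id > 10000 ORDER BY id LIMIT 10;
```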
Q13:

What is a JOIN?

Entry

Answer

JOIN combines rows from multiple tables based on related columns such as INNER JOIN, LEFT JOIN, RIGHT JOIN, and CROSS JOIN.
Quick Summary: JOIN combines rows from two or more tables based on a related column. INNER JOIN: rows with matching values in both tables. LEFT JOIN: all rows from left table, matched rows from right (NULLs where no match). RIGHT JOIN: reverse. FULL OUTER JOIN: not native in MySQL (use UNION of LEFT + RIGHT JOINs). CROSS JOIN: Cartesian product (every combination of rows).
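A sketch on hypothetical `users`/`orders` tables, including the UNION workaround for the missing FULL OUTER JOIN:

```sql
-- Only users who have at least one order:
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id;

-- All users; total is NULL for users without orders:
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;

-- FULL OUTER JOIN emulation (UNION removes the duplicated matches):
SELECT u.name, o.total FROM users u LEFT JOIN orders o ON o.user_id = u.id
UNION
SELECT u.name, o.total FROM users u RIGHT JOIN orders o ON o.user_id = u.id;
```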
Q14:

What is an index in MySQL?

Entry

Answer

An index improves read performance by speeding up searches but adds slight overhead to write operations.
Quick Summary: An index is a data structure (B-Tree by default in InnoDB) that speeds up data retrieval. Without an index, MySQL scans every row (full table scan). With an index, it jumps to matching rows directly. Creates overhead on INSERT/UPDATE/DELETE (index must be maintained). Index frequently searched, joined, or sorted columns. Too many indexes slow down write operations.
Q15:

What is NULL in MySQL?

Entry

Answer

NULL represents missing or unknown data. It is not equal to zero or an empty string and requires IS NULL or IS NOT NULL for checks.
Quick Summary: NULL represents unknown or missing value - it's not zero or empty string. NULL != NULL (comparing NULL to anything yields NULL, not true/false). Use IS NULL or IS NOT NULL to check. In aggregate functions: COUNT(*) counts all rows, COUNT(col) skips NULLs. NULL values sort last in ASC order. COALESCE(col, "default") returns a fallback when the column is NULL.
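The pitfalls above in query form, assuming a `users` table with a nullable `phone` column:

```sql
SELECT * FROM users WHERE phone = NULL;     -- matches nothing: result is NULL
SELECT * FROM users WHERE phone IS NULL;    -- correct NULL check
SELECT COUNT(*), COUNT(phone) FROM users;   -- COUNT(phone) skips NULL rows
SELECT COALESCE(phone, 'n/a') FROM users;   -- fallback value when NULL
```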
Q16:

Difference between DELETE and TRUNCATE?

Entry

Answer

DELETE removes selected rows and is logged, while TRUNCATE removes all rows quickly and resets auto-increment but cannot be rolled back in many cases.
Quick Summary: DELETE: removes specific rows matching WHERE, logged in the transaction log, fires triggers, can be rolled back, allows WHERE filter. TRUNCATE: removes all rows instantly (no WHERE), minimal logging, resets AUTO_INCREMENT, cannot be rolled back (it is DDL in MySQL and causes an implicit commit). TRUNCATE is much faster for emptying a table; DELETE for selective removal.
Q17:

What does AUTO_INCREMENT do?

Entry

Answer

AUTO_INCREMENT automatically generates sequential numeric values, commonly used for primary key columns.
Quick Summary: AUTO_INCREMENT automatically generates a unique incrementing integer value for a column (usually the primary key) on each INSERT without specifying a value. MySQL tracks the next value to use. Gaps can occur (from failed inserts, rollbacks). The counter never decreases. Use BIGINT AUTO_INCREMENT for tables that will grow large, to avoid exhausting the signed INT range at roughly 2.1 billion values.
Q18:

What is a default value in MySQL?

Entry

Answer

A default value is used when no explicit value is provided during insertion, ensuring consistent data entry.
Quick Summary: DEFAULT sets a value used when no value is provided for a column during INSERT. Examples: created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, status VARCHAR(20) DEFAULT "active", score INT DEFAULT 0. If you INSERT without specifying the column, the default is used. Defaults can be constant values or a limited set of MySQL functions (CURRENT_TIMESTAMP, NOW(), etc.).
Q19:

What is the purpose of GROUP BY?

Entry

Answer

GROUP BY groups rows based on column values and is used with aggregate functions to summarize data.
Quick Summary: GROUP BY groups rows with the same value in specified columns into summary rows. Used with aggregate functions (COUNT, SUM, AVG, MAX, MIN). SELECT department, COUNT(*) FROM employees GROUP BY department. Each row in the result represents one group. Columns in SELECT must be either in GROUP BY or inside an aggregate function. HAVING filters groups after aggregation.
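A per-group summary on a hypothetical `employees` table; note WHERE filters individual rows before grouping, while HAVING filters the aggregated groups:

```sql
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
WHERE hired_at >= '2020-01-01'   -- row filter, runs before grouping
GROUP BY department
HAVING COUNT(*) >= 5             -- group filter, runs after aggregation
ORDER BY avg_salary DESC;
```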
Q20:

What is an aggregate function?

Entry

Answer

Aggregate functions calculate values over multiple rows, such as COUNT, SUM, AVG, MIN, and MAX.
Quick Summary: Aggregate functions compute a single value from multiple rows. COUNT(*): total row count. COUNT(col): non-NULL values count. SUM(col): total of values. AVG(col): average. MAX(col)/MIN(col): largest/smallest value. GROUP_CONCAT(col): combines values into a string. Used with GROUP BY to compute per-group summaries. NULL values are ignored by all aggregates except COUNT(*).
Q21:

How does MySQL’s InnoDB storage engine manage transactions internally?

Junior

Answer

InnoDB manages transactions using undo logs, redo logs, and a buffer pool. It maintains ACID compliance through MVCC, row-level locking, and transaction logs that support rollbacks and crash recovery.
Quick Summary: InnoDB transactions: each transaction writes changes to the redo log (on disk) before data pages are modified (write-ahead logging). Uncommitted changes are applied to data pages in the buffer pool. On COMMIT: the redo log is flushed to disk and the transaction becomes durable. On ROLLBACK: the undo log reverses the changes. MVCC lets readers see consistent snapshots without blocking writers, and vice versa.
Q22:

What is MVCC (Multi-Version Concurrency Control) in MySQL?

Junior

Answer

MVCC allows multiple transactions to read and write data without blocking. Readers view snapshot data while writers update the latest version using undo logs.
Quick Summary: MySQL InnoDB MVCC: each row has two hidden columns - DB_TRX_ID (transaction that last modified it) and DB_ROLL_PTR (pointer to undo log for old versions). When a transaction reads a row, it uses a read view (snapshot of active transactions at read start) to determine which row version is visible. Older versions are stored in the undo log and accessed via the rollback pointer chain.
Q23:

What is the purpose of the InnoDB buffer pool?

Junior

Answer

The buffer pool caches data and index pages in memory to reduce disk I/O and improve performance, keeping frequently used data readily available.
Quick Summary: InnoDB buffer pool is the main memory cache for data and index pages. MySQL reads pages from disk into the buffer pool and serves subsequent reads from memory. Sized by innodb_buffer_pool_size - set to 70-80% of available RAM for dedicated DB servers. The bigger it is, the more data fits in memory, fewer disk reads. Monitor with SHOW ENGINE INNODB STATUS for buffer pool hit rate.
Q24:

What is a covering index?

Junior

Answer

A covering index includes all columns required by a query, allowing MySQL to fetch results directly from the index without accessing the table.
Quick Summary: Covering index: the index contains all columns needed to satisfy a query without accessing the actual data rows. MySQL can fulfill the query entirely from the index B-Tree. Seen in EXPLAIN as "Using index" (no "Using where" or table access). Example: index on (user_id, email) covers SELECT email FROM users WHERE user_id = 1. Dramatically faster than regular index access for frequently-run queries.
Q25:

What is an execution plan and why is EXPLAIN important?

Junior

Answer

An execution plan describes how MySQL executes a query. Using EXPLAIN reveals join types, index usage, and potential performance problems.
Quick Summary: EXPLAIN shows the query execution plan without running the query. Key columns: type (ALL=full scan, index, range, ref, const - const is best), key (index used), rows (estimated rows examined), Extra (Using where, Using index, Using filesort, Using temporary). Using filesort and Using temporary signal potential performance issues. Always run EXPLAIN on slow queries before adding indexes.
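A typical workflow sketch, assuming a `users` table where `email` starts out unindexed (names are illustrative):

```sql
-- Plan before indexing: expect type=ALL (full scan), key=NULL.
EXPLAIN SELECT name FROM users WHERE email = 'a@example.com';

-- Add an index on the filtered column:
CREATE INDEX idx_users_email ON users (email);

-- Re-running EXPLAIN should now show type=ref, key=idx_users_email,
-- with a much smaller rows estimate.
EXPLAIN SELECT name FROM users WHERE email = 'a@example.com';
```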
Q26:

What is the difference between row-level and table-level locking?

Junior

Answer

Row-level locking locks only specific rows, improving concurrency. Table-level locking locks the entire table and blocks other write operations.
Quick Summary: Table-level locking: the entire table is locked for one operation. Simple but allows no concurrency on the same table (MyISAM). Row-level locking: only the specific rows being accessed are locked. Other rows (and thus other transactions) are unaffected (InnoDB). Row-level locking enables much higher concurrency on write-heavy workloads. InnoDB uses row-level locking with next-key locks for range queries.
Q27:

How does MySQL handle indexing on large tables?

Junior

Answer

Indexing large tables may involve rebuilding indexes and using memory buffers. Online DDL and partitioning help avoid downtime.
Quick Summary: Large table indexing considerations: index creation is slow (reads all rows) and blocks writes by default (MySQL 5.6+ has Online DDL for InnoDB). Indexes consume disk space and slow down writes. Use ONLINE algorithm for adding indexes: ALTER TABLE ... ADD INDEX ... ALGORITHM=INPLACE, LOCK=NONE. Analyze query patterns before adding indexes. More isn't better - each index has a write cost.
Q28:

What is a slow query log and when should you enable it?

Junior

Answer

The slow query log records queries that exceed a specified execution time. It helps identify unoptimized queries or missing indexes.
Quick Summary: Slow query log records queries that exceed long_query_time seconds. Enable: slow_query_log=ON, slow_query_log_file=/path/to/log, long_query_time=1.0. Also log queries without indexes: log_queries_not_using_indexes=ON. Use mysqldumpslow or pt-query-digest to analyze the log and find top offenders. Enable in production to catch performance regressions early.
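A runtime configuration sketch (values are illustrative; persist the settings in my.cnf so they survive restarts):

```sql
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1.0;                  -- threshold in seconds
SET GLOBAL log_queries_not_using_indexes = 'ON';   -- also catch unindexed queries
SHOW VARIABLES LIKE 'slow_query%';                 -- verify the settings
```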
Q29:

What is the difference between logical and physical backups?

Junior

Answer

Logical backups export SQL statements, while physical backups copy actual data files. Physical backups are faster for large datasets.
Quick Summary: Logical backup: exports data as SQL statements (INSERT) or CSV - readable, portable, slow to restore. Tools: mysqldump, mysqlpump. Physical backup: copies raw data files - fast to restore, same MySQL version required. Tools: MySQL Enterprise Backup, Percona XtraBackup (hot backup - no downtime). Use logical for small databases and portability, physical for large databases where restore speed matters.
Q30:

What is a stored routine and why use it?

Junior

Answer

Stored routines include procedures and functions that store logic on the server, reducing network usage and improving consistency.
Quick Summary: Stored routine (procedure or function) encapsulates SQL logic in the database. Benefits: network efficiency (multiple SQL statements executed in one call), reusability across applications, security (grant EXECUTE without table access), consistency (business rules enforced at DB level). Downsides: difficult to version control, unit test, and debug. Modern practice favors application-layer logic with ORMs.
Q31:

How does MySQL handle foreign key constraints during deletes or updates?

Junior

Answer

Foreign keys enforce referential integrity using actions like CASCADE, RESTRICT, and SET NULL for parent-child relationships.
Quick Summary: InnoDB foreign key on DELETE: CASCADE (delete child rows too), SET NULL (set FK to NULL), RESTRICT/NO ACTION (block delete if children exist - default). On UPDATE: CASCADE (update FK in children), SET NULL, RESTRICT. MySQL checks FK constraints on every DELETE and UPDATE of parent rows. InnoDB requires an index on the FK column in the child table and creates one automatically if none exists.
Q32:

What is partitioning and when is it useful?

Junior

Answer

Partitioning splits a large table into smaller parts to improve performance, maintenance, and query speed.
Quick Summary: Partitioning splits one logical table into multiple physical segments (partitions) based on a key. Types: RANGE (by date range), LIST (by specific values), HASH (evenly distributed), KEY (like hash using MySQL's hashing). Benefits: partition pruning (queries only scan relevant partitions) and near-instant removal of old data (DROP PARTITION instead of a large DELETE). Best for time-series data with range queries on the partition key.
Q33:

What are FULLTEXT indexes and when are they used?

Junior

Answer

FULLTEXT indexes speed up text searching and support natural language and boolean search modes.
Quick Summary: FULLTEXT indexes enable natural language text search. Supported on TEXT, VARCHAR, CHAR columns in InnoDB (MySQL 5.6+). Create: CREATE FULLTEXT INDEX idx ON articles(title, body). Query: WHERE MATCH(title, body) AGAINST("search terms" IN BOOLEAN MODE). Much faster than LIKE "%term%" for text search. Supports relevance ranking. Not suitable for partial word matching or complex linguistic analysis.
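A sketch assuming an `articles` table with `title` and `body` text columns:

```sql
CREATE FULLTEXT INDEX idx_articles_ft ON articles (title, body);

-- Boolean mode: must contain "mysql", must not contain "oracle".
SELECT id, title,
       MATCH(title, body) AGAINST('mysql indexing') AS relevance
FROM articles
WHERE MATCH(title, body) AGAINST('+mysql -oracle' IN BOOLEAN MODE)
ORDER BY relevance DESC;
```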
Q34:

What is a deadlock detector in InnoDB?

Junior

Answer

InnoDB automatically identifies deadlocks and rolls back one transaction to allow others to proceed.
Quick Summary: InnoDB's deadlock detector checks the lock wait-for graph for a cycle whenever a transaction has to wait for a lock (when innodb_deadlock_detect is ON, the default). When a cycle is found, it rolls back the transaction with the least amount of undo work. The rolled-back transaction receives error 1213 (Deadlock found). The application must catch this error and retry the transaction. You can view the last deadlock with SHOW ENGINE INNODB STATUS.
Q35:

How do prepared statements improve performance and security?

Junior

Answer

Prepared statements precompile SQL and prevent SQL injection by separating queries from data.
Quick Summary: Prepared statements: parse and plan the SQL once, execute many times with different parameters. Security: parameters are sent separately from the query (prevents SQL injection - no string interpolation). Performance: plan is cached in the session. In application code: most drivers use prepared statements by default when you pass parameters. At DB level: PREPARE stmt FROM "SELECT * FROM users WHERE id = ?"; EXECUTE stmt USING @id.
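The server-side form, assuming a `users` table; the `?` placeholder keeps the data separate from the SQL text:

```sql
PREPARE get_user FROM 'SELECT name FROM users WHERE id = ?';
SET @id = 42;
EXECUTE get_user USING @id;     -- reuses the already-parsed statement
DEALLOCATE PREPARE get_user;    -- release it when done
```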
Q36:

What are triggers and their performance implications?

Junior

Answer

Triggers automate actions but add overhead to write operations, making heavy trigger use slow and harder to debug.
Quick Summary: Triggers add overhead: they execute hidden SQL for every matching DML operation, increasing write latency. Cascading triggers (trigger fires another trigger) multiply the overhead. Triggers are not visible in EXPLAIN, making performance debugging harder. Alternatives: application-layer logic or scheduled jobs for audit/denormalization. Only use triggers for database-level enforcement that can't be enforced at the application layer.
Q37:

What is the difference between CHAR, VARCHAR, and TEXT in performance?

Junior

Answer

CHAR is fixed-length and fast. VARCHAR is variable-length and space-efficient. TEXT is stored separately and slower to query.
Quick Summary: CHAR(n): fixed-length, padded with spaces. Fast for fixed-size values (codes, flags). VARCHAR(n): variable-length with 1-2 byte length prefix. Best for most text columns. TEXT/MEDIUMTEXT/LONGTEXT: stored outside the row when large, slower to access. For short text (<255 chars): VARCHAR. For long documents: TEXT. CHAR is slightly faster for fixed-size data due to no length calculation. VARCHAR is almost always the right choice.
Q38:

How do you optimize JOIN performance in MySQL?

Junior

Answer

Optimizing joins involves indexing join columns, avoiding functions on indexed fields, and analyzing EXPLAIN output.
Quick Summary: Optimize MySQL JOINs: ensure join columns are indexed on both sides (especially the child table FK). Use EXPLAIN to check join type (ref is good, ALL is bad). Smaller table should drive the join (MySQL's optimizer usually handles this). Use covering indexes that include join and filter columns. Avoid functions on join columns (breaks index usage). Rewrite complex JOINs with derived tables or CTEs if needed.
Q39:

How does MySQL handle replication?

Junior

Answer

Replication sends changes from a primary server to replicas using async, semi-sync, or GTID-based methods.
Quick Summary: MySQL replication: primary (master) writes changes to the binary log (binlog). Replica (slave) has an I/O thread that copies binlog to its relay log, and a SQL thread that replays the relay log. Replication is asynchronous by default (replica may lag). Supports: read scaling (route SELECTs to replicas), high availability (promote replica if primary fails), and backups (take backup from replica). GTID replication makes failover easier.
Q40:

What are common reasons for slow MySQL performance?

Junior

Answer

Slow performance is often due to poor indexing, large table scans, insufficient memory, high disk I/O, or heavy locking.
Quick Summary: Common MySQL performance issues: missing indexes (causes full table scans), bad indexes (too many or low cardinality), N+1 queries from ORMs, large result sets fetched but only a few rows used (add LIMIT), SELECT * fetching unused columns, queries with functions on indexed columns (index not used), buffer pool too small (constant disk I/O), and connection overhead (use connection pooling).
Q41:

What is the difference between INNER JOIN and LEFT JOIN?

Mid

Answer

INNER JOIN returns only matching rows between tables. LEFT JOIN returns all rows from the left table and matching rows from the right, with NULL for non-matching rows.
Quick Summary: INNER JOIN: returns only rows where there is a match in BOTH tables - unmatched rows from either table are excluded. LEFT JOIN (LEFT OUTER JOIN): returns ALL rows from the left table, matched rows from the right table; unmatched right-side columns are NULL. Use LEFT JOIN when the right-side data is optional (e.g., users who may or may not have orders).
Q42:

What is a composite primary key?

Mid

Answer

A composite primary key uses multiple columns together to uniquely identify a row, useful when no single column is unique.
Quick Summary: Composite primary key is a primary key made up of two or more columns whose combined values must be unique. Example: PRIMARY KEY (order_id, product_id) for an order_items table. Neither column alone needs to be unique - only the combination. Used for many-to-many junction tables or when no single natural column uniquely identifies a row. InnoDB stores rows in composite PK order.
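The junction-table example from the summary as a sketch; the pair (order_id, product_id) must be unique, but each column alone may repeat:

```sql
CREATE TABLE order_items (
    order_id   INT NOT NULL,
    product_id INT NOT NULL,
    quantity   INT NOT NULL DEFAULT 1,
    PRIMARY KEY (order_id, product_id)  -- the combination identifies a row
);
```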
Q43:

What is the use of the DISTINCT keyword?

Mid

Answer

DISTINCT removes duplicate rows from the result set and is used to fetch unique values.
Quick Summary: DISTINCT removes duplicate rows from query results. SELECT DISTINCT country FROM users returns each country only once even if multiple users share the same country. Applied after the query runs - MySQL evaluates all rows then removes duplicates. Can be slow on large result sets. Use GROUP BY as an alternative when also performing aggregation. Cannot DISTINCT on a subset of selected columns.
Q44:

What is an alias in MySQL?

Mid

Answer

An alias temporarily renames a table or column to improve readability in complex queries.
Quick Summary: Aliases give a temporary name to a column or table in a query. Column alias: SELECT first_name AS name. Table alias: FROM employees e JOIN departments d ON e.dept_id = d.id. Aliases make queries shorter and more readable. Column aliases can be used in ORDER BY but not in WHERE (evaluated before aliases are applied). Table aliases are required when joining a table to itself (self join).
Q45:

What is the purpose of HAVING clause?

Mid

Answer

HAVING filters grouped records after GROUP BY, whereas WHERE filters rows before grouping.
Quick Summary: HAVING filters groups after GROUP BY aggregation - WHERE can't reference aggregate values. SELECT department, AVG(salary) AS avg_sal FROM employees GROUP BY department HAVING avg_sal > 50000. WHERE runs before grouping (filters individual rows), HAVING runs after grouping (filters groups). Use WHERE to filter rows, HAVING to filter aggregated results.
Q46:

What is the difference between UNION and UNION ALL?

Mid

Answer

UNION removes duplicates, UNION ALL returns all rows including duplicates. UNION requires extra processing.
Quick Summary: UNION combines results from two SELECT statements and removes duplicates (deduplication step). UNION ALL combines results and keeps all rows including duplicates (faster - no dedup). Both queries must have the same number of columns with compatible types. Use UNION ALL unless you specifically need duplicate removal - it's always faster since MySQL skips the sort/dedup step.
Q47:

What is a subquery?

Mid

Answer

A subquery is a query inside another query used for filtering, comparisons, or dynamic data selection.
Quick Summary: A subquery is a SELECT statement nested inside another SQL statement. Can appear in WHERE (filter with subquery result), FROM (inline view/derived table), SELECT (scalar subquery). Correlated subquery references the outer query's columns and executes once per outer row (can be slow). Non-correlated subquery executes once, result reused. Often replaceable with JOINs for better performance.
Q48:

What is a self join?

Mid

Answer

Self join is when a table is joined with itself, useful for hierarchical or relational comparisons.
Quick Summary: A self join joins a table to itself. Requires table aliases to distinguish the two instances. Common use: hierarchical data (employee and their manager in the same table). SELECT e.name AS employee, m.name AS manager FROM employees e LEFT JOIN employees m ON e.manager_id = m.id. The self join treats the same table as two logical tables with different roles.
Q49:

What are MySQL data types?

Mid

Answer

MySQL supports numeric, string, date/time, and spatial data types. Choosing correct types improves performance.
Quick Summary: MySQL data types: Numeric (TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL), String (CHAR, VARCHAR, TEXT, BLOB, ENUM, SET), Date/Time (DATE, TIME, DATETIME, TIMESTAMP, YEAR), Binary (BINARY, VARBINARY, BLOB types), Spatial types (POINT, GEOMETRY), JSON (native JSON type in MySQL 5.7+). Choose the smallest type that fits your data.
Q50:

What is a view?

Mid

Answer

A view is a virtual table based on a SELECT query, storing no physical data but simplifying data access.
Quick Summary: A view is a saved SELECT query presented as a virtual table. Doesn't store data itself - queries the underlying tables each time accessed. Benefits: simplify complex queries (hide JOINs), security (expose only certain columns), abstraction (insulate apps from schema changes). Updatable views allow INSERT/UPDATE/DELETE if they meet certain conditions (no GROUP BY, DISTINCT, UNION, etc.).
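A sketch hiding a JOIN behind a simple name, assuming hypothetical `users` and `orders` tables:

```sql
CREATE VIEW active_user_orders AS
SELECT u.name, o.id AS order_id, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.status = 'active';

-- Callers query the view like a table; the JOIN runs underneath.
SELECT * FROM active_user_orders WHERE total > 100;
```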
Q51:

What are the advantages of using views?

Mid

Answer

Views improve security, simplify complex queries, and provide logical abstraction of data.
Quick Summary: View advantages: security (hide sensitive columns, expose only what users need), simplification (encapsulate complex JOINs and filters behind a simple name), consistency (ensure all queries use the same logic), backward compatibility (rename a column but keep old view with old name), and reusability (write the join once, query the view many times).
Q52:

What is a stored procedure?

Mid

Answer

A stored procedure is precompiled SQL code stored in the database, reducing repetition and improving performance.
Quick Summary: Stored procedure is a set of SQL statements saved in the database and executed by calling its name. CALL procedure_name(params). Can contain loops, conditionals, error handling. Benefits: reduce network round-trips (logic runs on DB server), reusability, security (grant EXECUTE without exposing tables). Downside: harder to version, debug, and test than application code.
Q53:

What is a stored function?

Mid

Answer

A stored function returns a single value and can be used in SQL statements for transformations or calculations.
Quick Summary: Stored function is a routine that takes parameters, performs a computation, and returns a single value. Unlike procedures, functions can be used inside SQL expressions: SELECT get_full_name(first_name, last_name) FROM users. Must return a value. Cannot use CALL - invoked like built-in functions. Useful for computed fields and reusable calculations within queries.
Q54:

What is a trigger in MySQL?

Mid

Answer

A trigger executes automatically before or after INSERT, UPDATE, or DELETE, useful for logs or business rules.
Quick Summary: Triggers automatically execute SQL code BEFORE or AFTER INSERT, UPDATE, or DELETE on a table. Use cases: audit logging (record who changed what), maintaining denormalized data, enforcing complex business rules. Each table can have 6 trigger types (BEFORE/AFTER x 3 operations). Triggers can slow write performance - use sparingly. Access OLD (pre-change) and NEW (post-change) row values.
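An audit-logging sketch showing OLD and NEW row access - the `users` and `users_audit` tables are assumptions for illustration:

```sql
DELIMITER //
CREATE TRIGGER users_audit_update
AFTER UPDATE ON users
FOR EACH ROW
BEGIN
  -- OLD holds pre-change values, NEW holds post-change values
  INSERT INTO users_audit (user_id, old_email, new_email, changed_at)
  VALUES (OLD.id, OLD.email, NEW.email, NOW());
END //
DELIMITER ;
```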
Q55:

What is normalization?

Mid

Answer

Normalization organizes data to reduce redundancy and improve integrity by splitting into logical tables.
Quick Summary: Normalization organizes a database to reduce redundancy and improve data integrity. The process involves decomposing tables into smaller, more focused tables and defining relationships between them. This reduces update anomalies (updating one place instead of many), insertion anomalies, and deletion anomalies. Trade-off: more tables = more JOINs, which can impact query performance.
Q56:

What are the normal forms?

Mid

Answer

1NF removes repeating groups, 2NF removes partial dependencies, 3NF removes transitive dependencies.
Quick Summary: 1NF: atomic values (no repeating groups or arrays in columns). 2NF: 1NF + no partial dependency (every non-key column depends on the whole primary key). 3NF: 2NF + no transitive dependency (non-key columns depend only on the primary key, not on other non-key columns). BCNF: stricter 3NF. 4NF and 5NF handle multi-valued dependencies. Most production databases target 3NF.
Q57:

What is denormalization?

Mid

Answer

Denormalization adds redundancy to improve read performance, often used in analytics and reporting.
Quick Summary: Denormalization intentionally introduces redundancy for performance. Instead of joining tables, store computed/duplicated data in one table. Example: store total_order_amount in the orders table instead of computing it with a SUM JOIN every time. Trade-off: faster reads, slower and more complex writes (must update all copies). Used for read-heavy workloads, analytics, and reporting tables.
Q58:

What is a schema in MySQL?

Mid

Answer

A schema is another name for a database, containing tables, views, procedures, and triggers.
Quick Summary: In MySQL, schema and database are synonymous - CREATE SCHEMA and CREATE DATABASE do the same thing. A MySQL server hosts multiple databases/schemas. Each schema contains tables, views, stored procedures, functions, and triggers. Unlike PostgreSQL where schema is a namespace within a database, MySQL uses "database" and "schema" interchangeably as the top-level container.
Q59:

What is a transaction?

Mid

Answer

A transaction is a group of SQL operations treated as a single unit that can be committed or rolled back.
Quick Summary: A transaction is a group of SQL operations that execute as a single unit. Either all succeed (COMMIT) or all fail and roll back (ROLLBACK). Start: START TRANSACTION (or BEGIN). End: COMMIT or ROLLBACK. InnoDB supports transactions. MyISAM does not. Use transactions for operations that must be atomic: transferring money (debit + credit must both succeed or both fail).
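The money-transfer example mentioned above, sketched with an illustrative `accounts` table:

```sql
START TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- debit
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- credit

COMMIT;  -- or ROLLBACK; on error, both changes are undone together
```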
Q60:

What are ACID properties?

Mid

Answer

ACID ensures data reliability: Atomicity, Consistency, Isolation, Durability.
Quick Summary: ACID properties for reliable transactions: Atomicity (all or nothing - transaction either completes fully or not at all), Consistency (data moves from one valid state to another), Isolation (concurrent transactions don't interfere with each other), Durability (committed transactions survive crashes - written to disk/redo log). InnoDB provides full ACID compliance.
Q61:

What is a deadlock in MySQL?

Mid

Answer

A deadlock occurs when two transactions wait indefinitely for each other’s locks. MySQL resolves it by rolling back one.
Quick Summary: Deadlock: transaction A holds lock on row 1, waits for row 2. Transaction B holds lock on row 2, waits for row 1. Both wait forever. InnoDB detects deadlocks automatically and rolls back one transaction (usually the one with fewer changes). The application must catch the deadlock error and retry. Prevent by: acquiring locks in consistent order and keeping transactions short.
Q62:

What is index cardinality?

Mid

Answer

Cardinality measures uniqueness of indexed values. High cardinality indexes improve search speed.
Quick Summary: Index cardinality is the number of unique values in an indexed column. High cardinality (unique emails, user IDs) = highly selective = index is very useful. Low cardinality (boolean, status with few values) = not selective = MySQL may skip the index and prefer a full table scan. MySQL uses cardinality estimates to decide whether to use an index. Run ANALYZE TABLE to update cardinality stats.
Q63:

What is a clustered index?

Mid

Answer

In InnoDB, the primary key creates a clustered index, storing rows physically in key order.
Quick Summary: In InnoDB, the clustered index IS the table - data rows are stored in the B-Tree leaves sorted by the primary key. There is exactly one clustered index per table. All secondary indexes store the primary key value as a reference to find the full row (index lookup then PK lookup). Choosing a good primary key (sequential integers) is critical for InnoDB write performance.
Q64:

What is a non-clustered index?

Mid

Answer

A non-clustered index stores pointers to table rows and is used on non-primary key columns.
Quick Summary: In InnoDB, all indexes other than the primary key are non-clustered (secondary indexes). They store the indexed columns plus the primary key value for row lookup. A query using a secondary index often needs two lookups: find the PK in the secondary index, then look up the full row in the clustered index. Covering indexes avoid the second lookup by including all needed columns.
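A covering-index sketch, assuming a hypothetical `orders` table: because the query only reads columns stored in the secondary index, the second lookup into the clustered index is skipped.

```sql
CREATE INDEX idx_orders_user_total ON orders (user_id, total);

-- EXPLAIN's Extra column should show "Using index":
-- both selected columns come straight from the secondary index.
EXPLAIN SELECT user_id, total FROM orders WHERE user_id = 42;
```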
Q65:

What is the difference between DELETE and DROP?

Mid

Answer

DELETE removes rows, whereas DROP removes the table structure entirely.
Quick Summary: DELETE removes specific rows but keeps the table structure, can use WHERE, is transactional (rollback possible), fires triggers, logged in binlog. DROP TABLE removes the entire table (structure, data, indexes, triggers) immediately and permanently. DROP is DDL (auto-committed). You can't WHERE on a DROP - it deletes everything. Use DELETE for data removal, DROP to remove the table itself.
Q66:

What is ENUM data type?

Mid

Answer

ENUM restricts a column to a predefined list of values, ensuring consistent data input.
Quick Summary: ENUM defines a column that can only store one value from a predefined list. CREATE TABLE t (status ENUM("active", "inactive", "pending")). Stored as an integer internally (1-2 bytes) but displayed as the string. Compared as strings in queries. Advantage: storage efficient, enforces valid values. Downside: adding new values requires ALTER TABLE (which can lock the table).
Q67:

What is IN operator?

Mid

Answer

IN checks if a value exists within a list of values and simplifies conditional queries.
Quick Summary: IN checks if a value matches any value in a list. SELECT * FROM users WHERE country IN ("USA", "UK", "Canada"). Equivalent to multiple OR conditions but more readable. Can also use a subquery: WHERE id IN (SELECT user_id FROM orders WHERE total > 100). NOT IN excludes listed values. Caution: NOT IN with a subquery that returns NULL values can give unexpected results.
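The NOT IN / NULL caution above, sketched with illustrative `users` and `orders` tables:

```sql
-- If the subquery returns ANY NULL, NOT IN matches nothing:
-- "id <> NULL" evaluates to UNKNOWN, never TRUE.
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM orders);

-- Safer: NOT EXISTS is unaffected by NULLs in the subquery
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```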
Q68:

What is BETWEEN operator?

Mid

Answer

BETWEEN checks if a value lies within a range, commonly used with dates and numbers.
Quick Summary: BETWEEN checks if a value falls within a range (inclusive of both endpoints). SELECT * FROM orders WHERE amount BETWEEN 100 AND 500 is equivalent to amount >= 100 AND amount <= 500. Works for numbers, dates, and strings. NOT BETWEEN excludes the range. For dates: BETWEEN "2024-01-01" AND "2024-12-31" includes both boundary dates.
Q69:

What are aggregate functions?

Mid

Answer

Aggregate functions compute results from sets of rows such as COUNT, SUM, AVG, MIN, and MAX.
Quick Summary: Aggregate functions operate on sets of rows and return a single value. COUNT(*): total rows. COUNT(col): non-NULL values. SUM(col): total. AVG(col): mean. MAX/MIN: extreme values. GROUP_CONCAT(col): strings joined. STDDEV, VARIANCE for statistical analysis. Used in SELECT with or without GROUP BY. Excluding NULLs is default behavior for all except COUNT(*).
Q70:

What are scalar functions?

Mid

Answer

Scalar functions operate on individual values such as UPPER, LOWER, CONCAT, and LENGTH.
Quick Summary: Scalar functions return one value per row (unlike aggregate functions that return one value per group). Examples: string functions (UPPER, LOWER, LENGTH, CONCAT, SUBSTRING, TRIM), numeric (ROUND, FLOOR, CEIL, ABS, MOD), date (NOW, DATE_FORMAT, DATEDIFF, YEAR, MONTH, DAY), and control functions (IF, CASE, COALESCE, NULLIF). They transform individual column values.
Q71:

What is a join condition?

Mid

Answer

A join condition defines how tables relate, usually through primary and foreign key columns.
Quick Summary: A join condition specifies how tables are related in a JOIN. ON clause: explicit condition (e.g., ON orders.user_id = users.id). USING clause: when the column name is the same in both tables (USING(user_id)). Natural join condition: automatically matches same-named columns (implicit, avoid in production - fragile). Always specify explicit join conditions with ON for clarity and correctness.
Q72:

What is a cross join?

Mid

Answer

A cross join returns the Cartesian product of two tables.
Quick Summary: CROSS JOIN returns the Cartesian product - every row from table A combined with every row from table B. 10 rows x 10 rows = 100 result rows. No join condition needed (or specify ON 1=1). Rarely used in practice but useful for: generating combinations, creating a date/time grid, or pairing every item with every other item. Beware: cross joins on large tables create massive result sets.
Q73:

What is a natural join?

Mid

Answer

A natural join automatically matches columns with the same name, which can be risky if unintended matches occur.
Quick Summary: NATURAL JOIN automatically joins tables on all columns with the same name. MySQL matches column names implicitly - no ON clause needed. Fragile: adding a same-named column to either table silently changes the join behavior. Not recommended for production code. Use explicit JOIN with ON conditions instead - more readable, predictable, and not affected by future schema changes.
Q74:

How does MySQL handle NULL values in comparison?

Mid

Answer

NULL does not equal anything, even another NULL. Use IS NULL or IS NOT NULL for comparisons.
Quick Summary: NULL is not equal to anything, including itself. col = NULL always evaluates to UNKNOWN (not TRUE or FALSE). WHERE col = NULL never matches any rows. Use IS NULL or IS NOT NULL. In ORDER BY, MySQL sorts NULLs first in ASC and last in DESC. IFNULL(col, "default") and COALESCE(col, "default") provide substitutes. Most aggregate functions ignore NULLs.
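The comparison rules in one sketch, against a hypothetical `users` table:

```sql
SELECT * FROM users WHERE phone = NULL;      -- never matches: = NULL is UNKNOWN
SELECT * FROM users WHERE phone IS NULL;     -- correct NULL test
SELECT COALESCE(phone, 'n/a') FROM users;    -- substitute a default for NULL
```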
Q75:

What is a temporary table?

Mid

Answer

A temporary table exists only for the session and is deleted automatically when the session ends.
Quick Summary: Temporary tables exist only for the duration of a session. CREATE TEMPORARY TABLE temp_results (...). Automatically dropped when session closes. Invisible to other connections. Useful for: breaking complex queries into steps, staging intermediate results, reusing a computed result multiple times in a session. Note: "temp table" in EXPLAIN refers to internal query processing temp tables, not user-created ones.
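A staging sketch, assuming hypothetical `orders` data: compute an intermediate result once, then reuse it within the session.

```sql
CREATE TEMPORARY TABLE top_customers AS
SELECT user_id, SUM(total) AS lifetime_total
FROM orders
GROUP BY user_id
HAVING SUM(total) > 10000;

-- Reuse the staged result as many times as needed this session:
SELECT COUNT(*) FROM top_customers;
-- Dropped automatically when the session ends; invisible to other connections.
```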
Q76:

Difference between COUNT(*) and COUNT(column)?

Mid

Answer

COUNT(*) counts all rows including NULLs, while COUNT(column) counts only non-null values.
Quick Summary: COUNT(*): counts all rows in the result set including NULLs (counts the row, not a column value). COUNT(column): counts rows where the specified column is NOT NULL - skips NULL values. COUNT(DISTINCT col): counts unique non-NULL values. Example: if 10 rows, 2 have NULL email - COUNT(*) returns 10, COUNT(email) returns 8. Use COUNT(*) for total rows, COUNT(col) for non-null counts.
Q77:

Join vs Subquery — when to use which?

Mid

Answer

JOINs are faster and used to combine related tables. Subqueries help structure logic but may be slower.
Quick Summary: Use JOIN when: both tables are large (optimizer handles it better), you need rows from both tables in output, the relationship is well-indexed. Use subquery when: you need a result from one table to filter another, readability improves, or using EXISTS for existence checks. Correlated subqueries are generally slow. In most cases, JOINs are faster than equivalent subqueries - prefer JOINs and check with EXPLAIN.
Q78:

Difference between DATETIME and TIMESTAMP?

Mid

Answer

DATETIME stores exact values independent of timezone; TIMESTAMP converts values based on timezone and supports auto-update.
Quick Summary: DATETIME: stores date and time without timezone, range 1000-9999. Unchanged on retrieval. TIMESTAMP: stores as UTC internally, converts to server timezone on retrieval. Range: 1970-2038 (32-bit Unix timestamp). TIMESTAMP auto-updates on INSERT/UPDATE if defined with DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP. Use TIMESTAMP for "last modified" tracking, DATETIME for future dates or timezone-neutral storage.
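A common created/modified column pattern illustrating both types (table and column names are a convention, not from the source):

```sql
CREATE TABLE articles (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  title      VARCHAR(255),
  publish_at DATETIME,                           -- future date: timezone-neutral
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
             ON UPDATE CURRENT_TIMESTAMP         -- auto-touched on every UPDATE
);
```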
Q79:

Difference between MyISAM and InnoDB engines?

Mid

Answer

MyISAM is faster for reads but lacks transactions and foreign keys. InnoDB supports row-level locking and ACID transactions.
Quick Summary: MyISAM: older engine, table-level locking (only one writer at a time), no transactions, no foreign keys, faster for simple reads, supports FULLTEXT index in older MySQL. InnoDB: modern engine, row-level locking (many concurrent writers), full ACID transactions, foreign keys, MVCC for concurrent reads and writes, better crash recovery. InnoDB is the default since MySQL 5.5 and should be used for almost all cases.
Q80:

What is referential integrity?

Mid

Answer

Referential integrity ensures foreign keys always refer to valid rows, preventing orphaned records.
Quick Summary: Referential integrity ensures relationships between tables stay valid. A foreign key value in the child table must either match a primary key in the parent table or be NULL. InnoDB enforces this: you can't insert a child row with a non-existent parent, and you can't delete a parent row that child rows reference (unless CASCADE is configured). Maintains data consistency across related tables.
Q81:

How does MySQL’s InnoDB engine implement MVCC internally?

Senior

Answer

InnoDB uses undo logs to store old row versions. Each transaction reads based on its snapshot ID, enabling non-blocking reads while writers modify the latest row version.
Quick Summary: InnoDB MVCC internals: each row has DB_TRX_ID (last modifying transaction) and DB_ROLL_PTR (undo log chain pointer). At transaction start, MySQL creates a read view capturing which transaction IDs are active. For each row, if DB_TRX_ID is visible in the read view, use this row version. Otherwise, follow DB_ROLL_PTR to the undo log to find the last visible version. Readers never block writers.
Q82:

How does MySQL determine the best execution plan for a query?

Senior

Answer

The optimizer evaluates table statistics, cardinality, join orders, and cost estimates. It chooses the lowest I/O and CPU cost plan, relying heavily on accurate table statistics.
Quick Summary: MySQL query optimizer builds an execution plan by: parsing the query, transforming it (subquery flattening, constant folding), generating candidate plans (different join orders, index choices), estimating costs using table statistics (row counts, cardinality, index size), and picking the lowest-cost plan. Run EXPLAIN FORMAT=JSON to see detailed cost estimates. ANALYZE TABLE updates the statistics the optimizer uses.
Q83:

What is the difference between index scan and table scan?

Senior

Answer

Table scans read every row, while index scans use index trees to quickly locate matching rows. Index scans improve performance when filtering selective values.
Quick Summary: Index scan: MySQL uses the index B-Tree to find rows. Can be range scan (range of index values) or full index scan (all index values in order). Faster than table scan for selective queries. Table scan (full scan): reads every row in the table sequentially - used when no suitable index exists or when the optimizer estimates it's cheaper than the index (e.g., retrieving 70% of rows). Seen in EXPLAIN as type=ALL.
Q84:

What causes index fragmentation and how do you fix it?

Senior

Answer

Frequent updates and deletes leave gaps inside index pages. Use OPTIMIZE TABLE or rebuild indexes to reorganize and compact index pages.
Quick Summary: Index fragmentation in MySQL: InnoDB uses B-Tree indexes. After many random inserts, updates, and deletes, index pages become partially filled (fragmented). This wastes space and slows reads. Fix: OPTIMIZE TABLE or the equivalent ALTER TABLE ... ENGINE=InnoDB - for InnoDB both rebuild the table and its indexes online (ALGORITHM=INPLACE), taking only brief metadata locks. Regular ANALYZE TABLE updates statistics but doesn't defragment.
Q85:

What is a B-Tree index and how does MySQL use it?

Senior

Answer

B-Tree indexes maintain sorted keys allowing fast lookups, prefix scans, and range queries. They underpin most MySQL indexing except FULLTEXT and spatial indexes.
Quick Summary: B-Tree (Balanced Tree) index organizes values in a sorted tree structure. All leaves are at the same depth (balanced). InnoDB uses B+Trees where all data is in leaf nodes and leaves are linked for range scans. Supports: equality (=), range (>, <, BETWEEN), prefix matching (LIKE "abc%"), ORDER BY, and GROUP BY optimization. Doesn't support: LIKE "%abc" (leading wildcard), full-text search.
Q86:

Why is column order important in composite indexes?

Senior

Answer

MySQL uses composite indexes left-to-right. If the first column is not used in filtering, the index cannot be used efficiently.
Quick Summary: Composite index (a, b, c): MySQL can use it for queries that filter on (a), (a, b), or (a, b, c) but not on (b) alone or (c) alone (leftmost prefix rule). Put the most selective and equality-matched column first; range conditions should come last. An index on (user_id, created_at) helps WHERE user_id = 5 ORDER BY created_at - both the filter and the sort are covered.
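The leftmost-prefix rule in a sketch, on a hypothetical `orders` table:

```sql
CREATE INDEX idx_user_created ON orders (user_id, created_at);

SELECT * FROM orders WHERE user_id = 5;            -- uses index (prefix: a)
SELECT * FROM orders
WHERE user_id = 5 AND created_at > '2024-01-01';   -- uses index (prefix: a, b)
SELECT * FROM orders
WHERE created_at > '2024-01-01';                   -- cannot use it (skips a)
```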
Q87:

How does MySQL detect and resolve deadlocks?

Senior

Answer

InnoDB builds a waits-for graph and checks for cycles. When detected, it rolls back the cheapest transaction to break the deadlock.
Quick Summary: InnoDB deadlock detection: a background thread checks for lock wait cycles. When it finds a cycle (A waits for B, B waits for A), it selects a victim transaction (usually the one with fewest row modifications) and rolls it back. The other transaction proceeds. Error 1213 is returned to the rolled-back transaction's client. Enable innodb_print_all_deadlocks=ON to log all deadlocks.
Q88:

How does MySQL handle high-concurrency environments?

Senior

Answer

MySQL uses row-level locks, MVCC, adaptive hash indexing, and buffer pool optimizations. Proper schema design and indexing reduce contention.
Quick Summary: High-concurrency MySQL: use InnoDB (row-level locking, MVCC). Keep transactions short to minimize lock hold time. Use connection pooling (ProxySQL, application-level pools) to limit connection overhead. Read-write splitting: route SELECTs to replicas. Add read replicas for read scaling. Use Redis for caching frequently read data. Avoid long-running queries that hold locks and block others.
Q89:

What is the difference between shared locks and exclusive locks?

Senior

Answer

Shared locks permit concurrent reads, while exclusive locks block all reads and writes. Writes always require exclusive locks.
Quick Summary: Shared lock (S lock): multiple transactions can hold shared locks on the same row simultaneously - used for reads. Exclusive lock (X lock): only one transaction can hold an exclusive lock - blocks all other reads and writes. SELECT ... FOR SHARE acquires shared locks. SELECT ... FOR UPDATE acquires exclusive locks. INSERT, UPDATE, DELETE acquire exclusive row locks automatically.
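Both locking reads in one sketch, against an illustrative `accounts` table:

```sql
START TRANSACTION;

SELECT balance FROM accounts WHERE id = 1 FOR SHARE;   -- S lock: other readers OK
SELECT balance FROM accounts WHERE id = 2 FOR UPDATE;  -- X lock: writers and
                                                       -- locking readers blocked
UPDATE accounts SET balance = balance - 50 WHERE id = 2;

COMMIT;  -- all row locks released here
```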
Q90:

What is a redo log and why is it important?

Senior

Answer

Redo logs store committed changes before flushing to disk. They guarantee durability and allow crash recovery.
Quick Summary: Redo log records every change made to InnoDB data pages before the change is applied. On crash, MySQL replays the redo log from the last checkpoint to re-apply committed changes that hadn't yet been flushed to the data files. Written sequentially (fast). Controlled by innodb_log_file_size. Larger redo log = better write throughput (less checkpoint frequency) but longer crash recovery time. Critical for InnoDB's durability guarantee.
Q91:

What is a binlog and how does it differ from redo logs?

Senior

Answer

Binlogs store logical SQL changes for replication. Redo logs store physical page changes for recovery. They serve distinct purposes.
Quick Summary: Binlog (binary log) records all SQL changes (DDL + DML) for replication and point-in-time recovery. Written after transaction commits. Used by replicas to replay changes. Differs from redo log: binlog is at the SQL/logical level, redo log is at the physical page level. Redo log is for crash recovery; binlog is for replication and PITR. Both are needed for durability + replication.
Q92:

How do you design MySQL for horizontal scalability?

Senior

Answer

Use sharding, replicas, partitioning, and stateless application logic. Avoid cross-shard joins and rely on denormalized or distributed data models.
Quick Summary: Horizontal MySQL scaling: read replicas (distribute SELECT queries across multiple replicas). Sharding (split data across multiple MySQL instances by user_id range, geography, or hash). ProxySQL for intelligent query routing. Vitess for automated sharding and replication management at scale. Application-level sharding with consistent hashing. Sharding adds complexity - start with replicas and vertical scaling first.
Q93:

What is the purpose of GTID in MySQL replication?

Senior

Answer

GTIDs uniquely identify each transaction and simplify failover, resynchronization, and replication consistency.
Quick Summary: GTID (Global Transaction Identifier) assigns a unique ID to every committed transaction on the primary. Format: source_uuid:transaction_id. Replicas track which GTIDs they've applied. Benefits: precise failover (replica can continue exactly from where it was), easier to identify which transactions a replica has applied, and supports multi-source replication. Replaces the old file+offset based replication tracking.
Q94:

What is semi-synchronous replication?

Senior

Answer

Semi-sync replication requires at least one replica to acknowledge before the primary commits, providing improved durability.
Quick Summary: Semi-synchronous replication: primary waits for at least one replica to acknowledge receiving the transaction to its relay log before returning success to the client. Not the same as fully synchronous (which waits for commit on replica). Reduces data loss window compared to async replication. Latency increases by one network round-trip. Falls back to async if no replica acknowledges within rpl_semi_sync_master_timeout.
Q95:

What causes replication lag and how do you detect it?

Senior

Answer

Lag occurs when replicas apply changes slower than the primary. Use Seconds_Behind_Master and check slow queries or overloaded replica hardware.
Quick Summary: Replication lag causes: replica hardware slower than primary, large transactions (single big transaction blocks replica for its full duration), missing indexes on replica (applies DML slower), insufficient replica I/O or SQL threads. Detect: SHOW SLAVE STATUS shows Seconds_Behind_Master (SHOW REPLICA STATUS / Seconds_Behind_Source in MySQL 8.0.22+). Monitor with Prometheus + replication exporter. Fix: optimize slow queries on replica, use parallel replication (slave_parallel_workers).
Q96:

What is the difference between statement-based and row-based replication?

Senior

Answer

Statement-based logs SQL text while row-based logs changed values. Row-based avoids inconsistencies but increases log size.
Quick Summary: Statement-based replication (SBR): logs the SQL statement itself. Compact, but non-deterministic functions (NOW(), UUID(), RAND()) can cause replica divergence. Row-based replication (RBR): logs the actual row changes (before + after images). Larger binlog but accurate for all operations. Mixed: uses SBR when safe, RBR otherwise. MySQL recommends ROW format for most cases - it's safer and supports more operations correctly.
Q97:

How do you handle inconsistent replica data?

Senior

Answer

Use checksums, pt-table-sync, or resync using GTID. In severe cases, recreate the replica.
Quick Summary: Inconsistent replica data: prevent by using row-based replication and read_only=ON on replicas. Detect with pt-table-checksum (Percona Toolkit) - compares checksums of tables across primary and replicas. Fix inconsistencies with pt-table-sync. Rebuild from scratch: stop replica, take fresh backup from primary, restore and start replication from the backup's binlog position.
Q98:

How do you design indexes for large OLTP systems?

Senior

Answer

Use highly selective, narrow, composite indexes. Avoid indexing low-cardinality columns and review real query patterns regularly.
Quick Summary: Index design for OLTP: identify the most common query patterns and their WHERE, JOIN, and ORDER BY columns. Create indexes for each query pattern. Composite indexes for multi-column filters. Covering indexes for high-frequency queries. Avoid redundant indexes (index on (a,b) makes a separate index on (a) redundant). Use pt-duplicate-key-checker to find redundant indexes. Monitor with sys.schema_unused_indexes.
Q99:

What is query profiling in MySQL?

Senior

Answer

Query profiling breaks execution into detailed stages. It helps locate time-consuming operations like sorting or copying.
Quick Summary: Query profiling in MySQL: enable slow query log (slow_query_log=ON, long_query_time=1). Use EXPLAIN and EXPLAIN ANALYZE (MySQL 8). Use performance_schema to find expensive queries: SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY SUM_TIMER_WAIT DESC. Or use sys.statements_with_runtimes_in_95th_percentile. Percona's pt-query-digest analyzes slow query logs.
Q100:

How does MySQL handle sorting and temporary tables?

Senior

Answer

MySQL sorts using in-memory temp tables when possible and switches to disk-based files if data exceeds memory limits.
Quick Summary: MySQL uses in-memory sort buffers (sort_buffer_size) for ORDER BY. If result fits in buffer: in-memory quicksort. If too large: uses temporary files on disk (filesort with external merge sort). EXPLAIN shows "Using filesort" when sorting can't use an index. Temporary tables: created in memory (tmp_table_size, max_heap_table_size limit) or spill to disk. Increase these limits or add indexes to avoid disk operations.
Q101:

What is the difference between HASH and B-Tree indexes?

Senior

Answer

HASH indexes support equality lookups and are used in MEMORY tables. B-Tree indexes support ranges, sorting, and prefix lookups.
Quick Summary: B-Tree index: sorted structure, supports range queries, ORDER BY, BETWEEN, prefix LIKE - versatile. HASH index: maps keys to hash buckets, O(1) exact lookup, no range queries, no sorting, no prefix matching. InnoDB only supports B-Tree (it simulates hash with adaptive hash index in memory automatically). MEMORY engine supports explicit HASH indexes. For most use cases, B-Tree is the right choice.
Q102:

How do you optimize JOIN performance in multi-table queries?

Senior

Answer

Index join keys, simplify join order, avoid functions on columns, and use covering indexes to reduce table lookups.
Quick Summary: Optimize multi-table JOIN performance: index the join columns on the lookup (inner) side of each join - typically the joined table's primary key or an indexed foreign key. Reduce the result set before joining with WHERE on the earliest possible table. Use STRAIGHT_JOIN to force join order if the optimizer chooses wrong. Use EXPLAIN to verify join type is ref or eq_ref (not ALL). Minimize columns in SELECT to reduce memory usage.
Q103:

Why can functions in WHERE clause hurt performance?

Senior

Answer

Functions prevent MySQL from using indexes because they alter column values, forcing full scans.
Quick Summary: Functions in WHERE clause prevent index usage on that column. WHERE YEAR(created_at) = 2024 causes a full scan because MySQL must evaluate YEAR() for every row. Rewrite to use range: WHERE created_at BETWEEN "2024-01-01" AND "2024-12-31" - this uses the index on created_at. Similarly: WHERE LOWER(email) = "alice" won't use an index. Use functional indexes (MySQL 8) or store pre-computed values.
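The rewrites above as a sketch (table and column names are illustrative):

```sql
-- Bad: full scan - YEAR() must be evaluated for every row
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- Good: sargable range condition uses the index on created_at
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- MySQL 8 functional index, for expressions that can't be rewritten
-- (note the required double parentheses):
CREATE INDEX idx_email_lower ON users ((LOWER(email)));
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
```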
Q104:

How do you handle large DELETE operations efficiently?

Senior

Answer

Delete in batches, drop partitions instead of deleting rows, or disable constraints if safe. This avoids long locks and table bloat.
Quick Summary: Large DELETE operations lock rows (InnoDB row-level locks) and generate undo log entries. Delete millions of rows at once = huge lock contention and slow rollback if something fails. Best practice: delete in batches with a loop: DELETE FROM logs WHERE created_at < cutoff LIMIT 1000; repeat until 0 rows affected. Use pt-archiver for efficient large-scale deletion. Schedule during off-peak hours.
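The batch-delete loop above, wrapped in a stored procedure - batch size, table, and cutoff are assumptions:

```sql
DELIMITER //
CREATE PROCEDURE purge_old_logs(IN p_cutoff DATETIME)
BEGIN
  REPEAT
    -- Small batches keep lock hold time and undo log growth bounded
    DELETE FROM logs WHERE created_at < p_cutoff LIMIT 1000;
  UNTIL ROW_COUNT() = 0 END REPEAT;  -- stop when a batch deletes nothing
END //
DELIMITER ;

CALL purge_old_logs(NOW() - INTERVAL 90 DAY);
```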
Q105:

Why was the MySQL query cache removed in version 8?

Senior

Answer

Query cache caused severe contention and invalidation overhead under concurrency. MySQL 8 removed it for scalability reasons.
Quick Summary: MySQL 8 removed the query cache because it was a bottleneck under high concurrency (global mutex lock on every query/invalidation). The cache was invalidated on any write to the table - making it useless for write-heavy tables. Modern applications use application-level caching (Redis, Memcached) which is far more effective, scalable, and controllable. The query cache removal was the right call.
Q106:

What is the importance of ANALYZE TABLE?

Senior

Answer

ANALYZE TABLE refreshes statistics so the optimizer can choose efficient execution plans.
Quick Summary: ANALYZE TABLE updates the table statistics that the MySQL query optimizer uses to choose execution plans. Scans the table and updates cardinality estimates in information_schema. Run after: bulk data loads, large insertions/deletions, or when EXPLAIN shows clearly wrong row estimates. InnoDB auto-updates statistics but ANALYZE forces an immediate refresh. Essential after significant data distribution changes.
Q107:

What happens internally during an UPDATE statement?

Senior

Answer

MySQL locks the row, updates memory pages, logs changes to redo logs, modifies undo logs for MVCC, and writes binlog entries.
Quick Summary: InnoDB UPDATE internals: 1) Find the row using index or full scan. 2) Acquire exclusive row lock. 3) Write old values to undo log (for rollback and MVCC old-version reads). 4) Write the change to redo log (WAL). 5) Modify the row in the buffer pool (in memory). 6) On COMMIT: flush redo log to disk. Data pages may be written to disk lazily (checkpoint). Secondary indexes also updated.
Q108:

What is a gap lock in InnoDB?

Senior

Answer

Gap locks block inserts in a range. They enforce REPEATABLE READ isolation and reduce phantom reads but can increase contention.
Quick Summary: Gap lock in InnoDB: locks a gap between two existing index values to prevent other transactions from inserting rows in that range. Example: if you have rows with id 10 and 20, a gap lock on the gap 10-20 prevents inserts of id 11-19. Gap locks prevent phantom reads in REPEATABLE READ (and SERIALIZABLE). READ COMMITTED disables them for ordinary statements, trading phantom-read protection for less locking (they remain only for foreign-key and duplicate-key checks).
Q109:

Why is REPEATABLE READ default in MySQL?

Senior

Answer

It provides strong consistency while relying on MVCC snapshots to avoid heavy locking and phantom reads.
Quick Summary: REPEATABLE READ is MySQL InnoDB's default isolation level because it provides a good balance: consistent reads (your snapshot doesn't change mid-transaction), prevents dirty reads and non-repeatable reads, and prevents most phantom reads (via gap locks). Higher than READ COMMITTED (which allows non-repeatable reads) but lower than SERIALIZABLE (which would lock everything). Suitable for most OLTP workloads.
Q110:

How do you tune MySQL for high-write workloads?

Senior

Answer

Increase buffer pool size, tune redo logs, minimize secondary indexes, optimize flush frequency, use batching, and partition large tables.
Quick Summary: Tune MySQL for high write throughput: increase innodb_buffer_pool_size (reduce disk writes), set innodb_flush_log_at_trx_commit=2 (flush every second instead of every commit - risk 1 second of data loss), use innodb_io_capacity tuned to SSD IOPS, enable innodb_flush_method=O_DIRECT (bypass OS cache for data files), batch inserts, disable binary logging for non-replicated servers, and use fast SSDs.
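An illustrative my.cnf fragment tying these settings together - the values are assumptions that depend on available RAM, SSD performance, and durability requirements, not recommendations from the source:

```ini
[mysqld]
innodb_buffer_pool_size       = 16G   # ~60-70% of RAM on a dedicated DB host
innodb_flush_log_at_trx_commit = 2    # flush redo log once/second; risks ~1s of
                                      # committed data on a crash
innodb_flush_method           = O_DIRECT  # bypass OS page cache for data files
innodb_io_capacity            = 4000  # match the SSD's sustained IOPS
innodb_redo_log_capacity      = 4G    # MySQL 8.0.30+; larger = fewer checkpoints
```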
