PostgreSQL Interview Questions

Q1:

What is PostgreSQL?

Entry

Answer

PostgreSQL is an open-source relational database known for reliability, SQL compliance, JSON support, advanced indexing, and ACID transactions.

Permalink

Q2:

What is a database cluster in PostgreSQL?

Entry

Answer

A PostgreSQL cluster is a collection of databases managed by a single server instance, sharing one data directory.

Permalink

Q3:

What is a schema in PostgreSQL?

Entry

Answer

A schema is a logical container used to group tables, views, and functions. It helps organize objects and avoid name conflicts.

Permalink

Q4:

What is the difference between a table and a view?

Entry

Answer

A table stores data physically, while a view is a virtual table created from a SELECT query and stores no actual data.

Permalink

Q5:

What is a primary key in PostgreSQL?

Entry

Answer

A primary key uniquely identifies each row and automatically creates a unique index for fast lookups.

Permalink

Q6:

What is a foreign key?

Entry

Answer

A foreign key links a column to another table’s primary key to enforce referential integrity.

Permalink

Q7:

What data types does PostgreSQL support?

Entry

Answer

PostgreSQL supports numeric, text, date/time, Boolean, arrays, JSON/JSONB, UUID, geometric types, and custom types.

Permalink

Q8:

What is the purpose of the SELECT statement?

Entry

Answer

SELECT retrieves data from tables or views and supports filtering, sorting, grouping, and formatting.

Permalink

Q9:

What is the WHERE clause used for?

Entry

Answer

WHERE filters rows that meet specified conditions, returning only relevant records.

Permalink

Q10:

What is ORDER BY in PostgreSQL?

Entry

Answer

ORDER BY sorts query results in ascending or descending order.

Permalink

Q11:

What is the LIMIT clause?

Entry

Answer

LIMIT restricts the number of returned rows and is often used for pagination.

Permalink

Q12:

What is a UNIQUE constraint?

Entry

Answer

A UNIQUE constraint ensures all values in a column or column group are distinct.

Permalink

Q13:

What is the purpose of the INSERT statement?

Entry

Answer

INSERT adds new rows to a table, either individually or using INSERT SELECT.

Permalink

Q14:

What is the difference between DELETE and DROP?

Entry

Answer

DELETE removes rows but keeps the table structure. DROP removes the entire table.

Permalink

Q15:

What is a default value in PostgreSQL?

Entry

Answer

A default value is automatically applied when no explicit value is provided during INSERT.

Permalink

Q16:

What is the PostgreSQL psql tool used for?

Junior

Answer

psql is the interactive command-line tool for connecting to PostgreSQL, executing SQL queries, running scripts, and viewing metadata.

Permalink

Q17:

What is the purpose of the SERIAL data type?

Junior

Answer

SERIAL creates an auto-incrementing integer column using sequences that automatically generate unique values.

Permalink

Q18:

What is the difference between TEXT and VARCHAR?

Junior

Answer

TEXT stores unlimited variable-length text, whereas VARCHAR can enforce a maximum length. Performance difference is minimal.

Permalink

Q19:

What are ENUM types used for?

Junior

Answer

ENUM restricts a column to predefined string values, useful for static categories such as status fields.

Permalink

Q20:

What is a CHECK constraint?

Junior

Answer

CHECK ensures column values follow specific rules, enforcing data integrity at the database level.

Permalink

Q21:

What is the purpose of the RETURNING clause in PostgreSQL?

Junior

Answer

RETURNING returns columns from rows affected by INSERT, UPDATE, or DELETE, useful for retrieving generated IDs.

Permalink

Q22:

What is the difference between INNER JOIN and FULL OUTER JOIN?

Junior

Answer

INNER JOIN returns only matching rows, while FULL OUTER JOIN returns all rows from both tables with NULLs for non-matching data.

Permalink

Q23:

What is a sequence in PostgreSQL?

Junior

Answer

A sequence is an independent object that generates incremental numeric values, commonly used for primary keys.

Permalink

Q24:

What is the purpose of the COALESCE function?

Junior

Answer

COALESCE returns the first non-NULL value from a list, useful for handling missing data.

Permalink

Q25:

What are indexes in PostgreSQL and why are they important?

Junior

Answer

Indexes speed up reads by creating fast lookup structures. PostgreSQL supports B-tree, Hash, GIN, GiST, SP-GiST, and BRIN.

Permalink

Q26:

What is a partial index?

Junior

Answer

A partial index applies only to rows meeting a condition, reducing storage and improving performance.

Permalink

Q27:

What is a composite index?

Junior

Answer

A composite index includes multiple columns and is used left to right, making column order important for optimization.

Permalink

Q28:

What is a materialized view?

Junior

Answer

A materialized view stores query results physically and must be refreshed manually, improving performance for expensive queries.

Permalink

Q29:

What is the difference between DELETE and TRUNCATE in PostgreSQL?

Junior

Answer

DELETE removes rows with full logging. TRUNCATE removes all rows instantly, resets identities, and is minimally logged.

Permalink

Q30:

What is the VACUUM operation in PostgreSQL?

Junior

Answer

VACUUM removes dead tuples, reclaims storage, improves performance, and prevents transaction ID wraparound.

Permalink

Q31:

What is the ANALYZE command used for?

Junior

Answer

ANALYZE collects table statistics used by the query optimizer to choose efficient execution plans.

Permalink

Q32:

What is the difference between a function and a stored procedure in PostgreSQL?

Junior

Answer

Functions return values and can be used in queries, while procedures do not return values and can manage transactions.

Permalink

Q33:

What is PostgreSQL’s JSONB data type?

Junior

Answer

JSONB stores JSON in a binary format for fast searching and indexing, supporting advanced JSON operations.

Permalink

Q34:

What are foreign data wrappers (FDW)?

Junior

Answer

FDWs allow PostgreSQL to query external data sources like MySQL, MongoDB, APIs, or files as if they were local.

Permalink

Q35:

What is the purpose of the EXPLAIN command?

Junior

Answer

EXPLAIN shows how a query will execute, including index usage, join types, and cost estimates.

Permalink

Q36:

What is PostgreSQL’s MVCC and how does it enable concurrent reads and writes?

Mid

Answer

MVCC gives each transaction a snapshot of the database. PostgreSQL stores multiple row versions with transaction IDs, allowing readers and writers to work without blocking each other.

Permalink

Q37:

What is a tuple in PostgreSQL?

Mid

Answer

A tuple is a row stored in a PostgreSQL table. Updates create new tuple versions while old ones become dead tuples until vacuumed.

Permalink

Q38:

What is a transaction ID and why is wraparound dangerous?

Mid

Answer

Each transaction gets a unique XID. Because XIDs are 32-bit, they eventually wrap around, risking marking live rows as expired. VACUUM prevents wraparound by freezing tuples.

Permalink

Q39:

What is VACUUM FULL and how does it differ from regular VACUUM?

Mid

Answer

VACUUM FULL rewrites the table, reclaiming space and compacting rows but requires an exclusive lock. Regular VACUUM only removes dead tuples and updates visibility maps without heavy locking.

Permalink

Q40:

What is the visibility map in PostgreSQL?

Mid

Answer

The visibility map tracks pages containing only visible tuples, enabling index-only scans without reading table pages.

Permalink

Q41:

What is an index-only scan?

Mid

Answer

An index-only scan satisfies a query using only index data when all required columns are in the index, avoiding table reads.

Permalink

Q42:

What is the difference between B-Tree, GIN, and GiST indexes?

Mid

Answer

B-Tree handles equality and range queries. GIN is optimized for arrays, JSONB, and full-text search. GiST supports ranges, geometric types, and custom index logic.

Permalink

Q43:

When would you use a BRIN index?

Mid

Answer

BRIN is ideal for very large tables where data is naturally ordered, storing summaries of value ranges to minimize storage.

Permalink

Q44:

What is the purpose of pg_stat_activity?

Mid

Answer

pg_stat_activity displays running queries, wait events, and session details for debugging locks and performance issues.

Permalink

Q45:

What is a CTE and why use it?

Mid

Answer

A CTE organizes complex queries into readable blocks and supports recursion and reusable query logic.

Permalink

Q46:

What is the difference between a CTE and a subquery?

Mid

Answer

CTEs improve readability and reusability, while subqueries are nested directly. CTEs may materialize results depending on PostgreSQL version.

Permalink

Q47:

How do recursive CTEs work?

Mid

Answer

Recursive CTEs contain an anchor part and a recursive part that repeats until no new rows are returned.

Permalink

Q48:

What are dead tuples and why do they accumulate?

Mid

Answer

Dead tuples are obsolete row versions created by updates and deletes. They accumulate because PostgreSQL never overwrites rows.

Permalink

Q49:

How does PostgreSQL handle deadlocks?

Mid

Answer

PostgreSQL detects cycles in lock waits and cancels one transaction, usually the one with lowest cost.

Permalink

Q50:

What is a tablespace?

Mid

Answer

A tablespace maps database storage to filesystem locations, enabling distribution of data across disks.

Permalink

Q51:

What are PostgreSQL extensions?

Mid

Answer

Extensions add new features such as PostGIS, pg_stat_statements, and hstore without modifying core PostgreSQL.

Permalink

Q52:

What is pg_stat_statements used for?

Mid

Answer

pg_stat_statements collects query execution statistics like total time and frequency, useful for performance tuning.

Permalink

Q53:

What is WAL in PostgreSQL?

Mid

Answer

WAL logs all changes before writing data files to ensure durability and enable crash recovery.

Permalink

Q54:

What is a WAL checkpoint?

Mid

Answer

A checkpoint flushes dirty pages to disk, reducing recovery time but increasing I/O load if too frequent.

Permalink

Q55:

What is WAL archiving?

Mid

Answer

WAL archiving stores old WAL segments for point-in-time recovery and disaster recovery.

Permalink

Q56:

What are HOT updates?

Mid

Answer

HOT updates modify rows without updating indexes when non-indexed columns change, reducing index bloat.

Permalink

Q57:

What is index bloat and how is it fixed?

Mid

Answer

Index bloat occurs when indexes accumulate dead entries. Fix with REINDEX or VACUUM FULL.

Permalink

Q58:

What is logical replication?

Mid

Answer

Logical replication sends row-level changes to subscribers and supports selective table replication.

Permalink

Q59:

Difference between physical and logical replication?

Mid

Answer

Physical replicates data files, while logical replicates row changes. Physical is faster; logical is more flexible.

Permalink

Q60:

What is synchronous replication?

Mid

Answer

Synchronous replication waits for confirmation from at least one replica before committing.

Permalink

Q61:

What is asynchronous replication?

Mid

Answer

Asynchronous replication commits without waiting for replicas, improving performance at the cost of durability.

Permalink

Q62:

What is the role of autovacuum?

Mid

Answer

Autovacuum removes dead tuples, prevents wraparound, and updates visibility maps.

Permalink

Q63:

What are prepared statements?

Mid

Answer

Prepared statements compile SQL once and reuse execution plans, improving speed and preventing SQL injection.

Permalink

Q64:

What is query planning and cost estimation?

Mid

Answer

The planner evaluates possible execution paths and selects the lowest-cost plan using statistics.

Permalink

Q65:

Why does PostgreSQL choose sequential scans sometimes?

Mid

Answer

Sequential scans occur when tables are small, statistics are outdated, or index selectivity is poor.

Permalink

Q66:

What is ANALYZE and why is it important?

Mid

Answer

ANALYZE updates statistics used by the optimizer. Outdated stats lead to poor performance.

Permalink

Q67:

What is keyset pagination?

Mid

Answer

Keyset pagination uses indexed values to fetch the next rows efficiently, avoiding slow OFFSET scans.

Permalink

Q68:

What is constraint exclusion?

Mid

Answer

Constraint exclusion allows PostgreSQL to skip irrelevant partitions in partitioned tables.

Permalink

Q69:

What are generated columns?

Mid

Answer

Generated columns compute values from other columns, either stored physically or computed on demand.

Permalink

Q70:

Difference between BEFORE and AFTER triggers?

Mid

Answer

BEFORE triggers validate or modify data before writes. AFTER triggers run after data is written.

Permalink

Q71:

How does concurrent index creation work?

Mid

Answer

CREATE INDEX CONCURRENTLY builds indexes without blocking writes, though it takes longer to complete.

Permalink

Q72:

What is a partial index and when is it useful?

Mid

Answer

Partial indexes apply only to rows satisfying a condition, reducing index size and improving efficiency.

Permalink

Q73:

What is the Query Execution Engine?

Mid

Answer

The execution engine runs the query plan, performing scans, joins, filtering, and computations.

Permalink

Q74:

What are temporary tables?

Mid

Answer

Temporary tables exist only during a session and are used for intermediate data processing.

Permalink

Q75:

What causes buffer cache misses and how do you prevent them?

Mid

Answer

Buffer misses occur when needed pages are not in memory. Prevent with tuning shared_buffers, indexing correctly, and reducing full scans.

Permalink

Q76:

How does PostgreSQL internally implement MVCC without using locks?

Senior

Answer

PostgreSQL stores multiple row versions with transaction IDs. Readers check tuple visibility using snapshots, avoiding locks. Writers create new versions instead of overwriting data. VACUUM later removes dead tuples.

Permalink

Q77:

What is the role of hint bits in tuple visibility?

Senior

Answer

Hint bits mark tuples as committed or rolled back after checks, reducing future visibility lookups. These updates happen lazily and do not require WAL writes.

Permalink

Q78:

How do index-only scans work and when do they fail?

Senior

Answer

Index-only scans succeed when the visibility map confirms all tuples on a page are visible. If pages contain dead tuples, PostgreSQL must recheck the heap, making the scan not fully index-only.

Permalink

Q79:

How does PostgreSQL manage tuple freezing and why is it required?

Senior

Answer

Tuple freezing marks old tuples with special XIDs to prevent wraparound. Autovacuum freezes tuples in stages to avoid large rewrites and ensure data remains valid indefinitely.

Permalink

Q80:

What are HOT updates and why are they important?

Senior

Answer

HOT updates modify rows without touching indexes when only non-indexed columns change. This reduces index bloat and improves write performance.

Permalink

Q81:

What is index bloat and how is it handled in PostgreSQL?

Senior

Answer

Index bloat occurs due to dead entries. It is resolved using REINDEX, VACUUM FULL, or pg_repack for online index rebuilds without blocking writes.

Permalink

Q82:

How does PostgreSQL implement full-text search?

Senior

Answer

Full-text search uses tsvector and tsquery types with GIN or GiST indexes. Dictionaries handle stemming, tokenization, and stop-word removal.

Permalink

Q83:

Difference between GIN and GiST indexes and when to choose each?

Senior

Answer

GIN is optimized for discrete values like arrays and JSONB. GiST supports flexible structures like geospatial or range types. GIN is fast for reads; GiST handles approximate matching.

Permalink

Q84:

Why is autovacuum tuning important in large systems?

Senior

Answer

Autovacuum tuning prevents bloat, avoids XID wraparound, and keeps index efficiency high. Parameters like scale factors, cost delay, and worker count are critical.

Permalink

Q85:

How does PostgreSQL use multi-column statistics?

Senior

Answer

Multi-column statistics capture value correlations across columns to help the optimizer choose better join orders and filter strategies.

Permalink

Q86:

What is WAL and how does WAL archiving support disaster recovery?

Senior

Answer

WAL logs every change before writing to data files. WAL archiving stores old WAL segments enabling point-in-time recovery.

Permalink

Q87:

What are the performance implications of increasing shared_buffers?

Senior

Answer

Increasing shared_buffers improves caching but may cause slow checkpoints or reduce OS cache efficiency. A balanced configuration is required.

Permalink

Q88:

How do checkpoints affect PostgreSQL performance?

Senior

Answer

Frequent checkpoints cause I/O spikes, while infrequent ones increase crash recovery time. Proper tuning balances durability with performance.

Permalink

Q89:

What is synchronous_commit and its effect on durability?

Senior

Answer

synchronous_commit waits for WAL flush before acknowledging commits, improving durability but increasing latency.

Permalink

Q90:

How does PostgreSQL handle isolation levels, especially REPEATABLE READ?

Senior

Answer

REPEATABLE READ ensures consistent snapshots but not full phantom protection. SERIALIZABLE with SSI is required to prevent anomalies.

Permalink

Q91:

What is Serializable Snapshot Isolation (SSI)?

Senior

Answer

SSI tracks read-write dependencies to detect anomaly risks. PostgreSQL aborts conflicting transactions to guarantee serializability.

Permalink

Q92:

How does PostgreSQL detect lock contention?

Senior

Answer

Lock contention is detected using pg_locks, pg_stat_activity, and wait events. These tools reveal blocked queries and lock chains.

Permalink

Q93:

How can join performance be optimized in PostgreSQL?

Senior

Answer

Optimize joins using proper join orders, indexing, avoiding functions on indexed columns, and ensuring accurate statistics.

Permalink

Q94:

How does PostgreSQL choose between various scan types?

Senior

Answer

The optimizer evaluates selectivity, table size, and cost. Sequential scans are chosen for large portions of tables; bitmap scans help combine multiple indexes.

Permalink

Q95:

What are temp buffers and how do they differ from shared buffers?

Senior

Answer

Temp buffers store session-local temporary table data and bypass shared buffers, affecting performance based on temp_buffers.

Permalink

Q96:

How does PostgreSQL execute parallel queries?

Senior

Answer

Parallel workers perform scans, joins, and aggregates. Parallelism depends on cost thresholds, table size, and query shape.

Permalink

Q97:

How should you design table partitioning for large datasets?

Senior

Answer

Partition keys should have high cardinality. Range and hash partitioning distribute data and enable pruning for faster queries.

Permalink

Q98:

What is the role of FDW (Foreign Data Wrapper) in enterprise systems?

Senior

Answer

FDWs enable PostgreSQL to query external databases or APIs, supporting federated queries and hybrid architectures.

Permalink

Q99:

How does PostgreSQL optimize JSONB querying?

Senior

Answer

GIN indexes accelerate JSONB containment and key-based queries. Operators like @> and ? support flexible search patterns.

Permalink

Q100:

How are materialized view refresh strategies handled?

Senior

Answer

Materialized views require manual refresh. Concurrent refresh avoids locking but is slower. Used for analytics and reporting.

Permalink

Q101:

What is TOAST and how does PostgreSQL store large values?

Senior

Answer

TOAST stores oversized values externally, compressing or splitting them to keep rows small and efficient.

Permalink

Q102:

How does PostgreSQL handle high write throughput?

Senior

Answer

Use partitioning, tuned WAL settings, more autovacuum workers, and unlogged tables. Scaling writes often requires sharding or logical replication.

Permalink

Q103:

What is pg_repack and why is it used?

Senior

Answer

pg_repack rebuilds tables and indexes without blocking writes, unlike VACUUM FULL, making it ideal for production cleanup.

Permalink

Q104:

How do you handle long-running transactions?

Senior

Answer

Long transactions block vacuum cleanup and cause bloat. Use timeouts, break transactions into smaller pieces, and monitor stale sessions.

Permalink

Q105:

What is parallel query skew and how do you detect it?

Senior

Answer

Parallel skew occurs when workers process uneven workloads. Detected via EXPLAIN with parallel details and fixed with better partitioning or query rewrites.

Permalink

Q106:

How does PostgreSQL achieve Serializable isolation without using strict two-phase locking?

Expert

Answer

PostgreSQL uses Serializable Snapshot Isolation (SSI), which tracks read-write dependencies. If a dangerous pattern occurs, it aborts one transaction to maintain serializability. This avoids heavy 2PL locking while preserving correctness.

Permalink

Q107:

How do you design a PostgreSQL cluster that supports multi-region writes with minimal conflict?

Expert

Answer

Design uses logical replication with conflict-free partitioning where each region owns a separate writable key range. Use region-prefixed keys or GUID variants to avoid collisions. Tools like BDR support conflict handlers for true multi-master write safety.

Permalink

Q108:

How do you troubleshoot severe performance regression caused by the PostgreSQL planner choosing a wrong execution path?

Expert

Answer

Analyze EXPLAIN ANALYZE output to compare estimated vs actual row counts. Refresh statistics, increase default_statistics_target, create extended stats, add selective indexes, or rewrite queries. Correcting cardinality estimation is the core fix.

Permalink

Q109:

How does PostgreSQL internally manage WAL syncing for extreme durability requirements?

Expert

Answer

PostgreSQL flushes WAL buffers using wal_writer and synchronous_commit settings. Extreme durability requires synchronous_commit=on, full_page_writes=on, fast NVMe storage, tuned WAL-segmentation, and aggressive archiving.

Permalink

Q110:

How do you handle massive table bloat in mission-critical tables without downtime?

Expert

Answer

Use pg_repack to rebuild tables and indexes online with shadow copies. Alternatively rebuild tables on replicas via logical replication and fail over. VACUUM FULL is avoided due to locking.

Permalink

Q111:

How would you architect PostgreSQL for real-time analytics without overwhelming OLTP performance?

Expert

Answer

Use dual-layer architecture: OLTP writes to PostgreSQL, while logical decoding streams WAL changes to OLAP systems. Use incremental materialized views, partitioning, and move historical data to cold storage to isolate workloads.

Permalink

Q112:

How do you detect and fix snapshot-too-old errors in PostgreSQL?

Expert

Answer

Snapshot-too-old occurs when required row versions are vacuumed. Fix by avoiding long-running transactions, tuning autovacuum aggressively, running analytics on replicas, and monitoring vacuum lag via system views.

Permalink

Q113:

What is logical decoding and how is it used for event-driven PostgreSQL systems?

Expert

Answer

Logical decoding reads WAL and converts changes into logical events using plugins like pgoutput or wal2json. Used for CDC, microservice event sourcing, real-time pipelines, and incremental ETL.

Permalink

Q114:

How do you design partitioning for trillion-row datasets in PostgreSQL?

Expert

Answer

Use multi-level RANGE+HASH partitioning, align queries to pruning keys, distribute partitions across tablespaces, use BRIN indexes, automate partition lifecycle, and separate hot vs cold storage.

Permalink

Q115:

How do you ensure zero-data-loss failover (RPO=0) in PostgreSQL?

Expert

Answer

Enable synchronous replication with quorum commit so primary waits for WAL confirmation from replicas. Use low-latency networks, fast disks, aggressive lag monitoring, and tuned synchronous_standby_names settings to guarantee RPO=0.

Permalink

Top PostgreSQL Interview Questions