PostgreSQL Interview Cheat Sheet

1. What is PostgreSQL?

PostgreSQL is an open-source, enterprise-grade relational database known for reliability, standards compliance, and advanced features. It supports ACID transactions, complex queries, JSON, full-text search, extensions, and custom data types. Used widely in production for web apps, analytics, and geospatial data (PostGIS). Considered the most feature-rich open-source RDBMS.

Entry Full Answer →

2. What is a database cluster in PostgreSQL?

A database cluster in PostgreSQL is a collection of databases managed by a single PostgreSQL server instance, sharing the same data directory (PGDATA), configuration files, and WAL log. Not a "cluster" in the distributed sense - just one server managing multiple databases. Each cluster has its own port, processes, and system catalogs.

Entry Full Answer →

3. What is a schema in PostgreSQL?

A schema is a namespace inside a database that groups related tables, views, functions, and other objects. Default schema is "public". Multiple schemas in one database allow logical separation (like departments or tenants). Access with schema_name.table_name. search_path controls which schemas are searched without prefix. Useful for multi-tenant apps and organizing large databases.

Entry Full Answer →

4. What is the difference between a table and a view?

Table stores actual data rows on disk with persistent storage. View is a saved SELECT query - it looks like a table but computes results at query time (no stored data). Materialized view stores the query result on disk (like a cached table). Use views for simplifying complex queries and security (expose subset of columns). Use materialized views for performance when query is expensive and data can be slightly stale.

Entry Full Answer →

5. What is a primary key in PostgreSQL?

Primary key uniquely identifies each row in a table - must be unique and NOT NULL. PostgreSQL automatically creates a unique index on the primary key columns. You can define it inline (id SERIAL PRIMARY KEY) or as a table constraint. Composite primary keys use multiple columns together. The primary key is the main way to reference a row from foreign keys in other tables.

Entry Full Answer →

6. What is a foreign key?

Foreign key creates a referential integrity constraint - a column in one table must match a value in the primary key column of another table (or be NULL). Prevents orphaned records. ON DELETE/UPDATE: CASCADE (propagate change), SET NULL, SET DEFAULT, or RESTRICT (block the operation). FK lookups benefit from an index on the FK column in the child table.

Entry Full Answer →

7. What data types does PostgreSQL support?

PostgreSQL data types: Numeric (INTEGER, BIGINT, NUMERIC, REAL, DOUBLE PRECISION), Text (TEXT, VARCHAR(n), CHAR(n)), Boolean, Date/Time (DATE, TIME, TIMESTAMP, TIMESTAMPTZ, INTERVAL), JSON/JSONB (binary JSON), Arrays (any_type[]), UUID, ENUM (custom set of values), Geometric types, Network types (INET, CIDR), BYTEA (binary data), and custom composite types.

Entry Full Answer →

8. What is the purpose of the SELECT statement?

SELECT retrieves data from one or more tables. Basic: SELECT col1, col2 FROM table WHERE condition ORDER BY col LIMIT 10. SELECT * fetches all columns (avoid in production - fetches unnecessary data). Use column aliases: SELECT name AS full_name. Combine with JOIN for multi-table queries, GROUP BY for aggregation, HAVING to filter groups. The most used SQL statement.

Entry Full Answer →

9. What is the WHERE clause used for?

WHERE filters rows returned by SELECT, UPDATE, or DELETE - only rows matching the condition are included. Conditions use comparison operators (=, >, <, >=, <=, !=), logical operators (AND, OR, NOT), pattern matching (LIKE, ILIKE), NULL checks (IS NULL, IS NOT NULL), ranges (BETWEEN), and set membership (IN, NOT IN). WHERE is evaluated before GROUP BY.

Entry Full Answer →

10. What is ORDER BY in PostgreSQL?

ORDER BY sorts query results. ASC (ascending, default) or DESC (descending). ORDER BY multiple columns: ORDER BY last_name ASC, first_name ASC. NULL values sort last in ASC (NULLS LAST) by default. Use NULLS FIRST or NULLS LAST to control. Without ORDER BY, PostgreSQL makes no guarantee about result order - never assume rows come back in insert order.

Entry Full Answer →

11. What is the LIMIT clause?

LIMIT restricts the number of rows returned: SELECT * FROM users LIMIT 10. OFFSET skips rows: LIMIT 10 OFFSET 20 (page 3 of 10 results). Common for pagination. Caution: OFFSET-based pagination gets slower as offset grows (PostgreSQL must scan and discard rows). For large datasets, use keyset pagination (WHERE id > last_seen_id LIMIT 10) instead.

Entry Full Answer →

12. What is a UNIQUE constraint?

UNIQUE constraint ensures all values in a column (or combination of columns) are distinct. Unlike PRIMARY KEY, UNIQUE allows one NULL value. PostgreSQL automatically creates a unique index to enforce it. Add to single column: email TEXT UNIQUE. Composite unique: UNIQUE(user_id, product_id). Use to enforce natural business keys (email addresses, usernames, product codes).

Entry Full Answer →

13. What is the purpose of the INSERT statement?

INSERT adds new rows to a table. Single row: INSERT INTO users (name, email) VALUES ("Alice", "alice@example.com"). Multiple rows: INSERT INTO users VALUES (...), (...), (...). INSERT ... SELECT copies from another query. INSERT ... ON CONFLICT (upsert): handle duplicate key violations gracefully - DO NOTHING or DO UPDATE SET. Use RETURNING to get the inserted row back.

Entry Full Answer →

14. What is the difference between DELETE and DROP?

DELETE removes rows from a table but preserves table structure, can use WHERE, fires triggers, is logged in WAL, and is transactional (can be rolled back). DROP TABLE removes the entire table and its structure permanently - can't be rolled back after commit. TRUNCATE removes all rows fast (like DROP + recreate) but keeps the table structure - also transactional in PostgreSQL.

Entry Full Answer →

15. What is a default value in PostgreSQL?

A default value is used when no value is provided for a column during INSERT. Define with: age INTEGER DEFAULT 18. Common defaults: DEFAULT NOW() for timestamps, DEFAULT TRUE for boolean flags, DEFAULT gen_random_uuid() for UUID PKs, DEFAULT CURRENT_USER for audit fields. Defaults can be expressions or function calls. DEFAULT is applied column by column during INSERT.

Entry Full Answer →

16. What is the PostgreSQL psql tool used for?

psql is PostgreSQL's interactive command-line client. Connect: psql -h host -U user -d dbname. Run SQL interactively or from files (-f file.sql). Meta-commands: \l (list databases), \dt (list tables), \d tablename (describe table), \i file.sql (run file), \timing (show query time), \copy for CSV import/export. Essential tool for DBA work and debugging.

Junior Full Answer →

17. What is the purpose of the SERIAL data type?

SERIAL is a shorthand for creating an auto-incrementing integer column. SERIAL creates an INTEGER with a sequence. BIGSERIAL for BIGINT. Modern PostgreSQL prefers: id BIGINT GENERATED ALWAYS AS IDENTITY which is SQL-standard. Both create a sequence that provides the next value on each insert. Use BIGINT/BIGSERIAL for tables that might grow large to avoid integer overflow.

Junior Full Answer →

18. What is the difference between TEXT and VARCHAR?

TEXT and VARCHAR are functionally identical in PostgreSQL - both store unlimited length strings with the same performance. VARCHAR(n) enforces a maximum length (inserts longer strings fail). TEXT has no length limit. Unlike other databases, PostgreSQL TEXT is not slower than VARCHAR. Use TEXT unless you have a specific business reason to enforce a maximum length.

Junior Full Answer →

19. What are ENUM types used for?

ENUM types define a static set of allowed string values: CREATE TYPE mood AS ENUM ("happy", "sad", "neutral"). Use as column type: mood mood NOT NULL. Storage is efficient (1-4 bytes). Values have a defined order (useful for BETWEEN, ORDER BY). Downside: adding new values requires ALTER TYPE - needs exclusive lock in older PostgreSQL (non-blocking in newer versions).

Junior Full Answer →

20. What is a CHECK constraint?

CHECK constraint enforces a condition on column values: age INTEGER CHECK (age >= 0 AND age <= 150). Table-level CHECK for multi-column conditions: CHECK (end_date > start_date). Fails silently for NULL values (NULL is not checked - it passes). PostgreSQL validates CHECK on INSERT and UPDATE. NOT NULL is a special CHECK that prevents NULL. Use for business rule enforcement.

Junior Full Answer →

21. What is the purpose of the RETURNING clause in PostgreSQL?

RETURNING clause returns values from rows affected by INSERT, UPDATE, or DELETE - like a SELECT on modified rows. INSERT ... RETURNING id gets the auto-generated ID. UPDATE ... RETURNING new_value gets updated values. DELETE ... RETURNING * shows what was deleted. Eliminates a second SELECT query. More efficient and atomic than insert-then-select.

Junior Full Answer →

22. What is the difference between INNER JOIN and FULL OUTER JOIN?

INNER JOIN returns only rows with matching values in both tables (intersection). LEFT JOIN returns all rows from the left table plus matched rows from the right (unmatched right rows are NULL). RIGHT JOIN is the reverse. FULL OUTER JOIN returns all rows from both tables - matched where possible, NULL where not. Most commonly used: INNER JOIN for related data, LEFT JOIN to include optional relations.

Junior Full Answer →

23. What is a sequence in PostgreSQL?

A sequence is an independent database object that generates unique integer values in order. Used to auto-increment primary keys. CREATE SEQUENCE seq_name. Get next value: nextval("seq_name"). Current value: currval(). SERIAL/IDENTITY columns use sequences internally. Sequences are not transaction-aware - a rolled-back transaction still advances the sequence, creating gaps (this is expected and fine).

Junior Full Answer →

24. What is the purpose of the COALESCE function?

COALESCE returns the first non-NULL value from a list of arguments. COALESCE(col, "default") returns col if not NULL, else "default". Useful for: providing fallback values in queries, replacing NULLs for display. COALESCE(a, b, c) returns first non-NULL of a, b, c. Related: NULLIF(a, b) returns NULL if a equals b. Both are null-handling utilities used in SELECT columns and WHERE conditions.

Junior Full Answer →

25. What are indexes in PostgreSQL and why are they important?

Indexes speed up data retrieval by creating a sorted data structure (B-tree by default) on one or more columns. Without an index, PostgreSQL scans every row (seq scan). With an index, it jumps directly to matching rows (index scan). Tradeoff: faster reads but slower writes (index updated on every INSERT/UPDATE/DELETE). Index columns frequently used in WHERE, JOIN, and ORDER BY.

Junior Full Answer →

26. What is a partial index?

Partial index indexes only rows matching a WHERE condition. Example: CREATE INDEX ON orders (user_id) WHERE status = "active". Smaller than a full index (only active orders indexed), faster to maintain, and PostgreSQL can only use it when the query matches the WHERE condition. Use when most queries target a subset of rows (active records, recent data, non-null values).

Junior Full Answer →

27. What is a composite index?

Composite index covers multiple columns: CREATE INDEX ON orders (user_id, created_at). Useful when queries filter on multiple columns together. Index prefix rule: a composite index on (a, b, c) can be used for queries on (a), (a, b), or (a, b, c) but NOT (b) alone. Put the most selective and most frequently filtered column first.

Junior Full Answer →

28. What is a materialized view?

Materialized view stores the result of a query on disk - like a cached table. Faster to query than a regular view (no re-execution of the underlying query). Requires manual refresh: REFRESH MATERIALIZED VIEW viewname. REFRESH CONCURRENTLY allows reads during refresh (requires unique index). Use for expensive aggregate queries or reports that are expensive to recompute on every access.

Junior Full Answer →

29. What is the difference between DELETE and TRUNCATE in PostgreSQL?

DELETE: removes matching rows, logged in WAL (slower for large deletions), fires triggers, can use WHERE, transactional. TRUNCATE: removes all rows very fast (doesn't scan rows), minimal WAL logging, fires truncate triggers, can't use WHERE. Both are transactional in PostgreSQL (unlike MySQL MyISAM). TRUNCATE is the right choice when clearing entire tables for ETL or test data cleanup.

Junior Full Answer →

30. What is the VACUUM operation in PostgreSQL?

VACUUM reclaims storage from dead tuples created by PostgreSQL's MVCC - rows that were deleted or updated but not yet physically removed. Dead tuples occupy space and slow down queries. VACUUM marks space as reusable (doesn't shrink file). VACUUM ANALYZE also updates query planner statistics. Autovacuum runs automatically. Run manually if autovacuum is falling behind on high-traffic tables.

Junior Full Answer →

31. What is the ANALYZE command used for?

ANALYZE collects statistics about table column value distribution and stores them in pg_statistic. The query planner uses these statistics to estimate row counts and choose the best execution plan. Stale statistics lead to bad plan choices (wrong join order, wrong index). Run ANALYZE after bulk data loads or large changes. Autovacuum runs ANALYZE automatically when tables change significantly.

Junior Full Answer →

32. What is the difference between a function and a stored procedure in PostgreSQL?

Functions (CREATE FUNCTION) return a value and can be used in SELECT, WHERE, etc. They can't manage transactions. Stored procedures (CREATE PROCEDURE, PostgreSQL 11+) don't need to return a value and can COMMIT/ROLLBACK within their body - enabling transaction control inside procedures. Call procedures with CALL procedure_name(). Functions are more versatile for query use; procedures for transaction control.

Junior Full Answer →

33. What is PostgreSQL’s JSONB data type?

JSONB stores JSON as decomposed binary - faster to query than JSON (which stores as text). JSONB supports: indexing with GIN indexes, operators for key lookup (->), path extraction (#>), containment (@>), and key existence (?). Use JSONB when you need to query JSON fields. Use JSON only when you need to preserve key order or exact formatting. Most use cases should use JSONB.

Junior Full Answer →

34. What are foreign data wrappers (FDW)?

Foreign Data Wrappers (FDW) allow PostgreSQL to query external data sources as if they were local tables. Connect to: other PostgreSQL databases (postgres_fdw), MySQL, CSV files, Redis, S3, REST APIs. Install extension, create a server, define user mapping, then create foreign tables. Useful for data federation and migrating data across systems without ETL pipelines.

Junior Full Answer →

35. What is the purpose of the EXPLAIN command?

EXPLAIN shows the query execution plan PostgreSQL will use without actually running the query. EXPLAIN ANALYZE runs the query and shows actual vs estimated row counts and execution times. Look for: sequential scans on large tables (may need an index), hash joins vs nested loops, rows estimates vs actual rows (big gap = stale statistics). Essential for performance debugging.

Junior Full Answer →

36. What is PostgreSQL’s MVCC and how does it enable concurrent reads and writes?

MVCC (Multi-Version Concurrency Control) lets readers and writers proceed concurrently without blocking each other. When a row is updated, PostgreSQL creates a new version (tuple) with a new transaction ID, keeping the old version for concurrent readers. Each transaction sees a consistent snapshot of the data as of when it started. No read locks needed - readers never block writers and vice versa.

Mid Full Answer →

37. What is a tuple in PostgreSQL?

In PostgreSQL, a tuple is a row version in a table. Every UPDATE creates a new tuple (new row version) rather than modifying in place. Old tuples stay visible to concurrent transactions and are later cleaned up by VACUUM. Each tuple has system columns: xmin (transaction that created it), xmax (transaction that deleted it), ctid (physical location). Multiple tuple versions enable MVCC.

Mid Full Answer →

38. What is a transaction ID and why is wraparound dangerous?

Transaction ID (XID) is a 32-bit integer assigned to every transaction. PostgreSQL uses XIDs to determine tuple visibility in MVCC. Wraparound danger: after ~2 billion transactions, XID wraps around to 0. Old rows could suddenly appear to be "in the future" and become invisible - catastrophic data loss. Autovacuum prevents this by freezing old tuples (replacing real XIDs with FrozenXID that's always visible).

Mid Full Answer →

39. What is VACUUM FULL and how does it differ from regular VACUUM?

Regular VACUUM marks dead tuples as reusable space without compacting the file or returning space to the OS. VACUUM FULL rewrites the entire table to a new file, reclaiming disk space (table shrinks). VACUUM FULL requires an exclusive lock (all reads and writes block during operation) - not safe on high-traffic tables. Use VACUUM FULL rarely, or use pg_repack as a non-blocking alternative.

Mid Full Answer →

40. What is the visibility map in PostgreSQL?

The visibility map is a per-table file with one bit per page tracking if all tuples on that page are visible to all transactions. Index-only scans use the visibility map to avoid fetching the heap page to verify tuple visibility. VACUUM updates the visibility map. Pages with all-visible tuples allow index-only scans, dramatically improving read performance for covering indexes.

Mid Full Answer →

41. What is an index-only scan?

Index-only scan reads all needed columns directly from the index without accessing the heap (table). Possible when: all SELECT columns are in the index, and the visibility map shows the page as all-visible (so PostgreSQL doesn't need to check heap for tuple visibility). Much faster than regular index scan (no heap fetch). Create covering indexes: include all frequently queried columns.

Mid Full Answer →

42. What is the difference between B-Tree, GIN, and GiST indexes?

B-Tree: default, balanced tree, handles equality and range queries, sorted order - use for most queries. GIN (Generalized Inverted Index): multi-value items - use for JSONB, arrays, full-text search (indexes all keys/elements). GiST (Generalized Search Tree): geometric data, range types, full-text search - use for PostGIS, EXCLUDE constraints. Each handles different query patterns.

Mid Full Answer →

43. When would you use a BRIN index?

BRIN (Block Range INdex) stores min/max values per range of disk pages - very small index, very fast to create, low maintenance overhead. Only useful when column values correlate with physical storage order (sequential inserts of timestamps, monotonic IDs). Not selective for random data. Ideal for time-series tables where you INSERT rows in timestamp order - BRIN can eliminate large chunks of the table.

Mid Full Answer →

44. What is the purpose of pg_stat_activity?

pg_stat_activity is a system view showing one row per database connection with: pid (process ID), query (SQL being run), state (active, idle, idle in transaction), wait_event (what it's waiting on), application_name, client_addr. Essential for diagnosing: blocking queries (find locks), long-running transactions, idle-in-transaction connections holding locks, and overall connection usage.

Mid Full Answer →

45. What is a CTE and why use it?

CTE (Common Table Expression) is a named temporary result set defined with WITH, usable in the main query. WITH active_users AS (SELECT * FROM users WHERE active = true) SELECT * FROM active_users WHERE age > 18. Improves readability for complex queries. Recursive CTEs traverse hierarchical data. In PostgreSQL 12+, CTEs are inlined by default (the optimizer can push conditions inside).

Mid Full Answer →

46. What is the difference between a CTE and a subquery?

CTE: named, reusable within a query, can be recursive, executes once and result is referenced multiple times (in some cases). Subquery: embedded in the parent query, potentially re-executed for each row (correlated subquery). Pre-PostgreSQL 12, CTEs were optimization fences (result materialized). Now both can be optimized similarly. Use CTEs for readability; subqueries when optimizer needs more flexibility.

Mid Full Answer →

47. How do recursive CTEs work?

Recursive CTEs traverse hierarchical or graph-like data. Structure: WITH RECURSIVE cte AS (base_case UNION ALL recursive_case WHERE termination_condition). Base case: initial rows (root nodes). Recursive case: references the CTE to get next level. Use for: org charts (employee hierarchy), category trees, graph traversal, bill of materials. Add a depth counter or LIMIT to prevent infinite recursion.

Mid Full Answer →

48. What are dead tuples and why do they accumulate?

Dead tuples are old row versions left behind by UPDATE and DELETE operations in MVCC. When you update a row, PostgreSQL creates a new tuple version but keeps the old one visible for concurrent transactions. Once no active transaction needs the old version, it becomes a "dead tuple". Dead tuples accumulate, bloating tables and slowing queries. VACUUM removes dead tuples and marks space reusable.

Mid Full Answer →

49. How does PostgreSQL handle deadlocks?

PostgreSQL detects deadlocks by running a deadlock detection process after lock_timeout expires. It builds a wait-for graph and detects cycles. When a deadlock is found, PostgreSQL aborts one of the transactions (lowest cost or most recent) with "deadlock detected" error. Prevent deadlocks: always acquire locks in consistent order, keep transactions short, use NOWAIT or FOR UPDATE SKIP LOCKED.

Mid Full Answer →

50. What is a tablespace?

Tablespace defines the physical disk location where database objects are stored. CREATE TABLESPACE fast_ssd LOCATION "/mnt/ssd". Create tables in it: CREATE TABLE big_table (...) TABLESPACE fast_ssd. Use to: place hot tables on fast SSDs, put large archive tables on slower cheap storage, separate indexes from their tables for parallel I/O, and spread load across multiple disks.

Mid Full Answer →