Senior MongoDB Interview Questions

Curated MongoDB interview questions for developers targeting senior positions. 30 questions available.

MongoDB Interview Questions & Answers

Welcome to our comprehensive collection of MongoDB interview questions and answers. This page contains expertly curated interview questions covering all aspects of MongoDB, from fundamental concepts to advanced topics. Whether you're preparing for an entry-level position or a senior role, you'll find questions tailored to your experience level.

Our MongoDB interview questions are designed to help you:

  • Understand core concepts and best practices in MongoDB
  • Prepare for technical interviews at all experience levels
  • Master both theoretical knowledge and practical application
  • Build confidence for your next MongoDB interview

Each question includes detailed answers and explanations to help you understand not just what the answer is, but why it's correct. We cover topics ranging from basic MongoDB concepts to advanced scenarios that you might encounter in senior-level interviews.

Questions are labeled by difficulty (Entry, Junior, Mid, Senior, Expert); this page focuses on the Senior level. Each question is carefully crafted to reflect real-world interview scenarios you'll encounter at top tech companies, startups, and MNCs.

Questions

Q1: How does MongoDB handle concurrency using document-level locking? (Senior)

Answer: MongoDB's WiredTiger storage engine provides document-level concurrency control, so writes to different documents in the same collection proceed simultaneously without collection-level contention.

Quick Summary: WiredTiger uses document-level optimistic concurrency control. Multiple readers and writers proceed in parallel without blocking each other. Conflicts are detected at write time: if two operations modify the same document concurrently, one gets a WriteConflict and is retried (automatically for single-document writes; the application retries multi-document transactions). This is far better than the old MMAPv1 collection-level locking, which serialized all writes to a collection.

Q2: What is snapshot isolation and how does MongoDB achieve it? (Senior)

Answer: MongoDB provides snapshot isolation for transactions using timestamps, oplog ordering, and WiredTiger MVCC to maintain a consistent point-in-time view throughout the transaction.

Quick Summary: Snapshot isolation gives each transaction a consistent point-in-time view of the data. In MongoDB, WiredTiger creates a version snapshot at transaction start. Reads within the transaction see that snapshot, unaffected by concurrent writes from other transactions. Writes become visible only after commit. This prevents dirty reads and non-repeatable reads without locking readers.

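A minimal mongosh sketch of requesting snapshot reads inside a transaction; the database and collection names are hypothetical:

    // Start a session and a transaction with snapshot read concern
    const session = db.getMongo().startSession();
    session.startTransaction({
      readConcern: { level: "snapshot" },
      writeConcern: { w: "majority" }
    });
    const accounts = session.getDatabase("bank").accounts;  // hypothetical collection
    const a = accounts.findOne({ _id: 1 });  // both reads see the same snapshot,
    const b = accounts.findOne({ _id: 2 });  // even if other clients write in between
    session.commitTransaction();
    session.endSession();
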
Q3: What role does WiredTiger's write-ahead logging play in durability? (Senior)

Answer: WiredTiger writes operations to its write-ahead log (the journal) before flushing data pages. After a crash, MongoDB replays the journal to restore data, ensuring strong durability guarantees.

Quick Summary: WiredTiger writes every change to the journal (write-ahead log) before applying it to data files. Each journal entry records the operation fully. On crash recovery, MongoDB starts from the last checkpoint (a consistent data snapshot) and replays all journal entries written after it. This guarantees durability: a committed, journaled write is never lost, even in a sudden power failure.

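A write can explicitly wait for journaling via the j write-concern option; a short sketch (the collection and document are illustrative):

    // Acknowledge only after the write is replicated to a majority
    // AND flushed to the on-disk journal on those members
    db.orders.insertOne(
      { _id: 1, total: 99.5 },
      { writeConcern: { w: "majority", j: true } }
    )
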
Q4: How do you diagnose performance issues using MongoDB's profiler? (Senior)

Answer: The profiler captures slow queries, execution times, scan metrics, and index usage, helping identify unindexed operations, inefficient sorts, and pipeline bottlenecks.

Quick Summary: The MongoDB profiler records query execution details to the system.profile collection. Enable it with db.setProfilingLevel(1, {slowms: 100}) to capture queries over 100ms. When analyzing, look at millis (execution time), docsExamined vs nreturned (a high ratio means poor index selectivity), keysExamined, and planSummary (COLLSCAN indicates a missing index). Atlas provides the same data in its Query Profiler UI. Regularly review slow query logs in production.

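A minimal profiling session in mongosh; the threshold and filter values are illustrative:

    // Profile operations slower than 100ms on the current database
    db.setProfilingLevel(1, { slowms: 100 })

    // Later: inspect the slowest recent operations
    db.system.profile.find({ millis: { $gt: 100 } })
      .sort({ ts: -1 })
      .limit(5)

    // Turn profiling off when done (it adds overhead)
    db.setProfilingLevel(0)
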
Q5: Why do $lookup operations cause performance concerns in large systems? (Senior)

Answer: $lookup performs cross-collection joins. Without an index on the foreign field, each input document triggers a scan of the joined collection, increasing CPU and memory usage.

Quick Summary: $lookup (a left outer join) in a sharded cluster often can't be pushed down to the shards when the "from" collection's data lives on other shards, so data must be pulled together for merging. This causes heavy data movement and memory pressure. Fixes: embed frequently joined data, pre-aggregate with scheduled pipelines, index the foreignField, use Atlas Search when the join is really a search problem, or design shard keys so joined data lives on the same shard.

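A sketch of an indexed $lookup; the collections and field names are hypothetical:

    // Index the field $lookup matches on in the joined collection
    db.customers.createIndex({ custId: 1 })

    db.orders.aggregate([
      { $match: { status: "open" } },   // filter early to shrink the join input
      { $lookup: {
          from: "customers",
          localField: "customerId",
          foreignField: "custId",
          as: "customer"
      } }
    ])
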
Q6: How does MongoDB's balancer decide when to migrate chunks? (Senior)

Answer: The balancer monitors chunk distribution across shards via the config servers and migrates chunks when imbalance thresholds are exceeded.

Quick Summary: The balancer triggers migration when the difference between the most and least loaded shards exceeds a threshold (historically measured in chunk counts; since MongoDB 6.0 by data size). It migrates chunks from the most loaded to the least loaded shard until balanced. Migrations take locks during certain phases, so schedule a balancing window (sh.startBalancer()/sh.stopBalancer(), or the activeWindow setting) to avoid peak traffic hours.

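One way to restrict balancing to off-peak hours is an activeWindow in the config database; a sketch, with illustrative times:

    // Run the balancer only between 01:00 and 05:00 (server local time)
    db.getSiblingDB("config").settings.updateOne(
      { _id: "balancer" },
      { $set: { activeWindow: { start: "01:00", stop: "05:00" } } },
      { upsert: true }
    )
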
Q7: How do you avoid shard hotspots? (Senior)

Answer: Avoid monotonically increasing shard keys; choose high-cardinality keys or hashed keys for even write distribution.

Quick Summary: Avoid shard hotspots by choosing a shard key with an even write distribution. Don't use monotonically increasing values (timestamps, ObjectId) as the shard key: all new writes land in the last chunk on one shard. Solutions: use a hashed shard key (distributes writes randomly), use a compound shard key that combines a query field with a hashed component, or pre-split chunks before loading data.

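A sketch of sharding with a hashed key, and with a compound key carrying a hashed suffix; the namespaces and fields are hypothetical:

    // Hashed shard key: writes spread pseudo-randomly across chunks
    sh.shardCollection("app.events", { deviceId: "hashed" })

    // Compound key (MongoDB 4.4+): range on tenantId keeps per-tenant queries
    // targeted, hashed suffix spreads each tenant's writes
    sh.shardCollection("app.logs", { tenantId: 1, _id: "hashed" })
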
Q8: What are jumbo chunks and why are they problematic? (Senior)

Answer: Jumbo chunks grow too large to split or migrate, blocking balancing and degrading performance in sharded clusters.

Quick Summary: Jumbo chunks exceed the maximum chunk size (128MB by default in recent versions) and can't be split or migrated automatically. The usual cause is low shard-key cardinality: every document in the chunk shares the same shard key value, leaving no point to split at. The balancer can't move jumbo chunks, creating a permanent hotspot on one shard. Fix: choose a higher-cardinality shard key; for existing jumbo chunks, use refineCollectionShardKey to add suffix fields, or split manually.

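refineCollectionShardKey (MongoDB 4.4+) adds suffix fields to an existing shard key so oversized chunks become splittable; a sketch assuming a current key of { customerId: 1 } and hypothetical names:

    // The new key must extend the current one, and a matching index must exist
    db.orders.createIndex({ customerId: 1, orderId: 1 })

    db.adminCommand({
      refineCollectionShardKey: "shop.orders",
      key: { customerId: 1, orderId: 1 }
    })
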
Q9: How does MongoDB internally manage oplog entries during replication? (Senior)

Answer: The primary writes operations to the oplog; secondaries tail the oplog and apply changes in timestamp order for consistent replication.

Quick Summary: The oplog is a capped collection. Each entry records a write operation in an idempotent form so it can be replayed safely. Secondaries tail the primary's oplog and apply entries in order. If a secondary falls too far behind (the oplog wraps before it catches up), it becomes "too stale", enters RECOVERING, and needs a full resync. Size the oplog to cover expected lag and maintenance windows.

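Checking and resizing the oplog window in mongosh; the new size below is illustrative:

    // How much time the oplog currently covers
    rs.printReplicationInfo()

    // Replication lag per secondary
    rs.printSecondaryReplicationInfo()

    // Grow the oplog to 50GB (size is given in MB; run on each member)
    db.adminCommand({ replSetResizeOplog: 1, size: 51200 })
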
Q10: What is replication rollback and when does it occur? (Senior)

Answer: Rollback happens when a primary steps down before its latest writes replicate. When the node rejoins, it removes those unreplicated writes to match the majority view.

Quick Summary: Replication rollback occurs when a primary fails, a new primary is elected, and the old primary rejoins with writes that were never replicated to the majority. Those writes are rolled back (dumped to a rollback directory for manual recovery). To prevent rollback of acknowledged writes, use write concern "majority"; with w:1, a write not yet replicated to the new primary is lost.

Q11: How does MongoDB ensure consistency in a sharded cluster? (Senior)

Answer: Config servers maintain chunk metadata, mongos routes queries based on that metadata, and majority write concern provides durable, cluster-wide consistent writes.

Quick Summary: In a sharded cluster, per-shard consistency is handled by each shard's replica set. For cross-shard consistency: single-document operations are always atomic on their shard; multi-document cross-shard transactions use a two-phase commit protocol via a transaction coordinator; non-transactional operations that span shards get per-shard consistency but no global atomicity.

Q12: How does MongoDB handle distributed transactions across shards? (Senior)

Answer: MongoDB uses two-phase commit across shards to make multi-shard updates atomic, preventing partial writes.

Quick Summary: Distributed transactions across shards use a two-phase commit run by a transaction coordinator (a shard designated for that transaction). Phase 1: all participant shards prepare and hold their writes. Phase 2: all commit or all abort. This adds significant latency and lock contention, so keep cross-shard transactions short, or design the shard key so related data lives on one shard and transactions stay single-shard.

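The shell API is the same whether a transaction touches one shard or several; a mongosh sketch with hypothetical collections:

    const session = db.getMongo().startSession();
    try {
      session.startTransaction({ writeConcern: { w: "majority" } });
      const shop = session.getDatabase("shop");
      // These documents may live on different shards
      shop.orders.insertOne({ _id: 101, item: "book", qty: 1 });
      shop.inventory.updateOne({ item: "book" }, { $inc: { stock: -1 } });
      session.commitTransaction();   // two-phase commit if multiple shards took part
    } catch (e) {
      session.abortTransaction();
      throw e;
    } finally {
      session.endSession();
    }
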
Q13: What is the impact of large indexes on performance? (Senior)

Answer: Large indexes consume memory, slow writes, and increase disk I/O, so index design needs care.

Quick Summary: Large indexes consume RAM in the WiredTiger cache; for good performance the working set of indexes should fit in memory. Large indexes also slow writes, since every write must update all indexes on the collection. Monitor index size with db.collection.stats(), and find unused indexes with db.collection.aggregate([{ $indexStats: {} }]) so they can be dropped. Covered queries (answered entirely from an index) are fastest when that index stays in cache.

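A quick index audit in mongosh; the collection and index names are hypothetical:

    // Size of each index in bytes
    db.orders.stats().indexSizes

    // How often each index has been used since the last restart
    db.orders.aggregate([{ $indexStats: {} }])

    // Drop an index the stats show as unused
    db.orders.dropIndex("legacyField_1")
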
Q14: What are hidden indexes and when should you use them? (Senior)

Answer: Hidden indexes let you evaluate the impact of removing an index without the planner using it, which makes index tuning safe.

Quick Summary: Hidden indexes are still maintained on writes but are invisible to the query planner. Hide an index to test the impact of removing it without actually dropping it: if queries stay fast, the index was unused and can be dropped safely; if performance degrades, unhide it instantly, with no rebuild needed. Introduced in MongoDB 4.4 for safe index removal in production.

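The mongosh helpers, with a hypothetical index name:

    // Planner stops considering the index; it is still updated on writes
    db.orders.hideIndex("status_1")

    // ...observe query latency for a while...

    // Restore it instantly if anything regressed
    db.orders.unhideIndex("status_1")

    // Or drop it for good once proven unused
    db.orders.dropIndex("status_1")
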
Q15: How does MongoDB choose an execution plan when multiple indexes exist? (Senior)

Answer: MongoDB races candidate plans during a trial phase and caches the winner to avoid repeating plan selection.

Quick Summary: The query planner runs a "tournament" when multiple indexes could satisfy a query: candidate plans execute for a short trial period and the one that returns the first batch of results with the least work wins. The winner is cached in the plan cache for that query shape. You can force a specific index with db.collection.find(query).hint(index).

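Comparing the planner's choice against a forced index with explain; the query and index are hypothetical:

    // What the planner picked, with per-stage work counters
    db.orders.find({ status: "open", region: "EU" })
      .explain("executionStats")

    // Force a candidate index; compare totalKeysExamined / totalDocsExamined
    db.orders.find({ status: "open", region: "EU" })
      .hint({ region: 1, status: 1 })
      .explain("executionStats")
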
Q16: What is plan cache eviction and when does it happen? (Senior)

Answer: The plan cache evicts entries after index or collection changes, or when a cached plan's performance degrades, forcing plans to be re-evaluated.

Quick Summary: Plan cache entries are evicted when the collection's data distribution changes enough that the cached plan underperforms (triggering replanning), when an index is created or dropped, when the collection is rebuilt, or when MongoDB restarts. After eviction, the planner re-runs the plan tournament on the next execution. Unexpected evictions can cause sudden performance shifts as a previously good plan is re-evaluated.

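Inspecting and clearing the plan cache in mongosh, on a hypothetical collection:

    // List cached plans for the collection's query shapes
    db.orders.getPlanCache().list()

    // Clear all cached plans (forces re-planning on the next run)
    db.orders.getPlanCache().clear()
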
Q17: How do frequent updates cause document movement and why is it bad? (Senior)

Answer: When a document grows beyond its allocated space, the storage engine must rewrite it elsewhere, causing write amplification and fragmentation.

Quick Summary: Under the legacy MMAPv1 engine, a document that outgrew its allocated space was physically relocated, forcing every index entry pointing at it to be rewritten; padding was the mitigation. WiredTiger never updates in place: each update rewrites the document, so documents that grow on every update cause write amplification, cache churn, and fragmentation. The durable fix is the same either way: design the schema so documents don't grow unboundedly (for example, avoid ever-growing embedded arrays).

Q18: How does collation impact index usage? (Senior)

Answer: A query must use the same collation as an index to take advantage of it for string comparisons; otherwise MongoDB falls back to a collection scan.

Quick Summary: Collation defines language-specific string comparison rules (case sensitivity, accent sensitivity, sort order). Indexes are collation-aware: an index built with a specific collation only supports string comparisons from queries using the same collation, and a query that specifies a collation won't use an index built with a different one. Create indexes with the collation your queries actually use, or the planner falls back to a collection scan.

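A case-insensitive index and a matching query; the names are hypothetical, and strength: 2 means comparisons ignore case:

    // Case-insensitive English collation on the index
    db.users.createIndex(
      { name: 1 },
      { collation: { locale: "en", strength: 2 } }
    )

    // The query must declare the same collation to use the index
    db.users.find({ name: "alice" })
      .collation({ locale: "en", strength: 2 })
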
Q19: What is the effect of large aggregation pipelines on memory? (Senior)

Answer: Pipelines with large intermediate results can exceed memory limits and must spill to disk, which is drastically slower.

Quick Summary: Aggregation stages that hold large intermediate results use memory; blocking stages such as $sort and $group are limited to 100MB of RAM each by default. If a stage exceeds that, the pipeline fails unless allowDiskUse: true lets it spill to disk (much slower). Optimize by placing $match and $project early to shrink documents before expensive stages, backing $match and $sort with indexes, and avoiding $unwind of large arrays early in the pipeline.

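A sketch of a pipeline that filters early and permits disk spills; the collection and fields are hypothetical:

    db.events.aggregate([
      { $match: { day: ISODate("2024-05-01") } },  // indexed filter first
      { $project: { userId: 1, amount: 1 } },      // drop unneeded fields early
      { $group: { _id: "$userId", total: { $sum: "$amount" } } },
      { $sort: { total: -1 } }
    ], { allowDiskUse: true })  // lets $group/$sort spill instead of erroring
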
Q20: Why do unbounded array growth patterns degrade performance? (Senior)

Answer: Ever-growing arrays inflate document size, cause expensive rewrites, and bloat multikey indexes, slowing both reads and writes.

Quick Summary: Arrays that grow without bound eventually push documents toward the 16MB limit, hurt working-set efficiency (the whole document loads to read one element), make updates expensive (modifying one element of a 10,000-item array rewrites the array), and bloat multikey indexes. Design pattern: move array items into a separate collection with a reference back to the parent document.

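The "separate collection" pattern sketched in mongosh; the collections and fields are hypothetical:

    // Instead of $push-ing comments into the post document forever...
    db.comments.insertOne({
      postId: ObjectId("661f00000000000000000001"),  // reference to the parent post
      author: "alice",
      text: "Nice post",
      ts: new Date()
    })

    // Index makes "comments for a post, newest first" cheap
    db.comments.createIndex({ postId: 1, ts: -1 })
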
Q21: How does replication guarantee ordering? (Senior)

Answer: Oplog timestamps order every operation, and secondaries apply entries strictly in that order, matching the primary's sequence.

Quick Summary: Replication ordering comes from the oplog, a capped collection whose entries are ordered by an oplog timestamp (a timestamp plus counter). Secondaries apply oplog entries in strict order, so writes replay in exactly the order they happened on the primary. The oplog is the single source of truth for replication ordering across all secondary members.

Q22: What is the difference between majority and linearizable reads? (Senior)

Answer: Majority reads return data replicated to most members, while linearizable reads additionally guarantee you see the latest acknowledged write, confirmed through the primary.

Quick Summary: Read concern "majority" returns data committed to a majority of replica set members, guaranteeing it won't roll back. Read concern "linearizable" goes further: it guarantees you read the most recent majority-acknowledged write, waiting for in-flight writes to settle first. Linearizable applies only to single-document reads on the primary, is much slower, and provides the strongest consistency for critical reads.

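Both levels expressed in mongosh; the collection is hypothetical, and maxTimeMS is advisable with linearizable so a partitioned primary can't block the read forever:

    // Rollback-safe read
    db.accounts.find({ _id: 1 }).readConcern("majority")

    // Strictly latest committed value; single-document, primary only
    db.accounts.find({ _id: 1 })
      .readConcern("linearizable")
      .maxTimeMS(10000)
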
Q23: How does MongoDB handle versioned schema migrations? (Senior)

Answer: Migrations are performed safely with batch updates or tools like Mongock, with the application supporting both the old and new document shape during the transition.

Quick Summary: MongoDB's flexible document model allows schema evolution without migrating every document at once. Strategies: a schema version field (e.g. "schemaVersion: 2") with code that handles both versions; lazy migration (upgrade a document the first time it is read or written); bulk migration scripts for breaking changes. Schema validation rules can be updated live with collMod.

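A bulk migration using an update pipeline (MongoDB 4.2+); the field names are hypothetical:

    // Upgrade v1 documents: merge first/last into fullName, stamp the version
    db.users.updateMany(
      { schemaVersion: { $exists: false } },
      [ { $set: {
            schemaVersion: 2,
            fullName: { $concat: ["$first", " ", "$last"] }
      } } ]
    )
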
Q24: What is the role of the config server replica set? (Senior)

Answer: Config servers store the sharded cluster's metadata. If the config server replica set is unavailable, chunk management and metadata changes halt.

Quick Summary: Config servers store all sharded-cluster metadata: which collections are sharded, their shard keys, chunk ranges, and which shard owns each chunk. They run as a replica set (CSRS). mongos routers cache this metadata, so if the config servers go down, mongos can keep routing from cache, but chunk migrations, splits, and other metadata changes stop. Config servers must be highly available.

Q25: What are yield points during query execution? (Senior)

Answer: Yield points let MongoDB pause long-running operations so other operations can make progress, preventing stalls.

Quick Summary: Yield points are moments during a long-running query when MongoDB pauses to check for interrupts and let other operations run. Without yields, a long collection scan could hold resources indefinitely. MongoDB yields automatically at regular, configurable intervals (after a set number of documents processed or milliseconds elapsed). This prevents any single query from starving others, but it means a cursor can observe changes committed during its execution.

Q26: What is the role of index filters? (Senior)

Answer: Index filters restrict which indexes the query planner may consider for a query shape, useful for pinning a known-good index.

Quick Summary: Index filters specify which indexes the planner can use for a given query shape, overriding its automatic choice. Set them with the planCacheSetFilter command. They're useful when the planner consistently picks a suboptimal plan and per-query hints aren't practical. Note they are held in memory only and do not survive a restart; MongoDB 8.0 deprecates them in favor of persistent query settings. Use sparingly: fixing the query or the index design is usually better.

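Setting and listing an index filter; the collection, query shape, and index are hypothetical:

    // Restrict this query shape to one specific index
    db.runCommand({
      planCacheSetFilter: "orders",
      query: { status: "open", region: "EU" },
      indexes: [ { region: 1, status: 1 } ]
    })

    // Review filters currently in effect
    db.runCommand({ planCacheListFilters: "orders" })
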
Q27: What is a rolling index build? (Senior)

Answer: A rolling index build creates the index on one replica set member at a time, finishing with the primary after a stepdown, so the set stays available throughout.

Quick Summary: In a rolling index build, you take each secondary out of the replica set in turn (restart it as a standalone), build the index, and let it rejoin and catch up; finally you step down the primary and repeat the procedure there. This avoids the load of building on all members simultaneously and keeps the set serving traffic. It remains the standard approach for building indexes on very large collections in production without downtime (MongoDB 4.4+ also made simultaneous builds less disruptive).

Q28: What are the trade-offs of using $near for geospatial queries? (Senior)

Answer: $near returns distance-ordered results but is CPU-heavy on large result sets and requires a geospatial index.

Quick Summary: $near and $geoNear require a 2dsphere or 2d index. Trade-offs: results are sorted by distance, which is expensive for large result sets; the geospatial index is used first and other filters apply afterward, so they can't be combined efficiently with other indexes; and $near doesn't work in aggregation pipelines ($geoNear must be the first stage instead). Always bound the query with a reasonable maxDistance and result limit.

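A bounded $near query over a 2dsphere index; the collection, coordinates, and distances are illustrative:

    db.places.createIndex({ location: "2dsphere" })

    // Nearest cafes within 5km, capped at 20 results
    db.places.find({
      category: "cafe",
      location: {
        $near: {
          $geometry: { type: "Point", coordinates: [2.3522, 48.8566] },
          $maxDistance: 5000   // meters
        }
      }
    }).limit(20)
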
Q29: Why do sharded clusters struggle with scatter-gather queries? (Senior)

Answer: Scatter-gather queries hit every shard, increasing latency and limiting scalability. A good shard key minimizes this pattern.

Quick Summary: A query that omits the shard key can't be routed, so mongos fans it out to every shard and merges the results. For sorted queries, each shard sorts and returns its top N, and mongos re-sorts to pick the global top N. This is expensive and gets worse as the shard count grows. Fix: include the shard key in queries so mongos can target a single shard (a targeted query).

Q30: How do you design MongoDB schema for high write throughput systems? (Senior)

Answer: Use small documents, minimal indexes, an evenly distributed shard key, and bucketing patterns to reduce write amplification.

Quick Summary: For high write throughput: keep documents small (faster to write, more fit in cache), minimize the index count (every write updates every index), batch with bulk writes, avoid transactions on the hot path, and pick a shard key that spreads writes evenly. For time-series data, consider the bucket pattern (group many measurements into one document to cut insert volume) and pre-aggregate counters with $inc instead of inserting individual events.

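A bucket-pattern upsert in mongosh: one document per sensor per hour, appended in place. The names and hourly granularity are hypothetical:

    db.metrics.updateOne(
      { sensorId: 7, bucketStart: ISODate("2024-05-01T10:00:00Z") },
      {
        $push: { readings: { ts: new Date(), value: 21.4 } },
        $inc: { count: 1 },        // pre-aggregated counter
        $min: { minValue: 21.4 },  // running aggregates kept in the bucket
        $max: { maxValue: 21.4 }
      },
      { upsert: true }             // first reading of the hour creates the bucket
    )
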
