MongoDB Interview Questions

Q1:

What is MongoDB?

Entry

Answer

MongoDB is a NoSQL, document-oriented database that stores data as JSON-like documents. It is schema-flexible and designed for scalable modern applications.


Q2:

What is a document in MongoDB?

Entry

Answer

A document is a JSON-like object containing key-value pairs. It is the basic unit of data stored inside collections in MongoDB.


Q3:

What is a collection?

Entry

Answer

A collection is a group of documents similar to a table in relational databases but without a fixed schema.


Q4:

What is a database in MongoDB?

Entry

Answer

A database is a container for collections. Each application typically uses one or more databases within the MongoDB server.


Q5:

What is BSON?

Entry

Answer

BSON is a binary format used by MongoDB to store documents. It supports more data types than JSON, such as Date and ObjectId.


Q6:

What is an ObjectId?

Entry

Answer

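ObjectId is the default type for the _id field. It is a 12-byte value made up of a 4-byte timestamp, a 5-byte random value generated once per process, and a 3-byte incrementing counter, which makes collisions across machines extremely unlikely.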
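
As a rough illustration, the sketch below splits a 24-character hexadecimal ObjectId string into its parts using only the Python standard library. It assumes the current 12-byte layout (4-byte timestamp, 5-byte random value, 3-byte counter); the function names are made up for this example and are not part of any MongoDB driver.

```python
from datetime import datetime, timezone


def objectid_parts(oid_hex: str) -> dict:
    """Split a 24-character hex ObjectId into its three byte fields."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    return {
        "timestamp": raw[0:4],  # seconds since the Unix epoch, big-endian
        "random": raw[4:9],     # random value generated once per process
        "counter": raw[9:12],   # incrementing counter
    }


def objectid_created_at(oid_hex: str) -> datetime:
    """Return the creation time embedded in the ObjectId, in UTC."""
    seconds = int.from_bytes(objectid_parts(oid_hex)["timestamp"], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_created_at("507f1f77bcf86cd799439011")
print(created.isoformat())
```

Because the timestamp comes first, sorting by _id roughly sorts documents by creation time.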


Q7:

What is a schema in MongoDB?

Entry

Answer

MongoDB is schema-flexible, allowing documents with different structures. Schema rules can be enforced using validators when needed.


Q8:

What is the purpose of the find() method?

Entry

Answer

The find() method retrieves documents based on filters and supports projection, sorting, and pagination.


Q9:

What is the difference between find() and findOne()?

Entry

Answer

find() returns multiple documents as a cursor, while findOne() returns only the first matching document.


Q10:

What does the updateOne() function do?

Entry

Answer

updateOne() updates the first matching document using operators like $set, $inc, or $push.


Q11:

What is a deleteOne() operation?

Entry

Answer

deleteOne() removes the first document matching the filter condition.


Q12:

What is field projection in MongoDB?

Entry

Answer

Projection specifies which fields to include or exclude when fetching documents, improving efficiency.


Q13:

What is an index in MongoDB?

Entry

Answer

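An index is a data structure that stores the values of selected fields in sorted order, so MongoDB can locate matching documents without reading every document. Without a suitable index, MongoDB performs a collection scan.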


Q14:

What is a primary key in MongoDB?

Entry

Answer

Every document has a unique _id field, which acts as the primary key. MongoDB generates an ObjectId if not provided.


Q15:

What is a replica set?

Entry

Answer

A replica set is a group of mongod instances that maintain the same data set, providing redundancy and automatic failover. It consists of one primary and one or more secondaries.


Q16:

What is sharding in MongoDB?

Entry

Answer

Sharding distributes large datasets across multiple servers for horizontal scaling.


Q17:

What is MongoDB Atlas?

Entry

Answer

MongoDB Atlas is the fully managed cloud service for MongoDB, providing automated scaling, backups, and monitoring.


Q18:

What is the difference between MongoDB and a relational database?

Entry

Answer

MongoDB stores flexible JSON-like documents, while relational databases use structured tables and predefined schemas.


Q19:

What is the purpose of the $set operator?

Entry

Answer

$set updates or adds fields without replacing the entire document.


Q20:

What does the $inc operator do?

Entry

Answer

$inc increases or decreases a numeric value atomically, which makes it useful for counters or scores.
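
To make the effect of $set and $inc concrete, here is a toy pure-Python model of how an update document changes a stored document. It is only a sketch: apply_update is a hypothetical helper, it handles top-level fields only, and it does not model the atomicity the server provides.

```python
import copy


def apply_update(doc: dict, update: dict) -> dict:
    """Toy model of $set and $inc on top-level fields (not a real driver call)."""
    result = copy.deepcopy(doc)
    for field, value in update.get("$set", {}).items():
        result[field] = value  # create the field or overwrite its value
    for field, delta in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + delta  # a missing field starts at 0
    return result


player = {"_id": 1, "name": "ana", "score": 10}
update = {"$set": {"rank": "gold"}, "$inc": {"score": -3, "games": 1}}
updated = apply_update(player, update)
print(updated)
```

The same update document could be passed unchanged to a driver call such as updateOne; fields that are not named in it are left untouched.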


Q21:

What is a capped collection and when should it be used?

Junior

Answer

A capped collection is a fixed-size collection where MongoDB overwrites old documents when full. It maintains insertion order and supports high-speed writes, useful for logs and metrics.
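
The overwrite behaviour can be pictured with a bounded queue. This is only an analogy in plain Python: a real capped collection is limited by size in bytes (with an optional maximum document count), while the deque below is limited by item count alone.

```python
from collections import deque

# A bounded queue standing in for a capped collection that holds 3 documents.
capped_log = deque(maxlen=3)

for n in range(1, 6):
    capped_log.append({"seq": n, "msg": f"event {n}"})

# The two oldest entries were discarded; insertion order is preserved.
oldest_first = [entry["seq"] for entry in capped_log]
print(oldest_first)
```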


Q22:

What is the difference between $push and $addToSet?

Junior

Answer

$push adds an element to an array even if it already exists. $addToSet adds it only if it is not present, preventing duplicates.
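
A small pure-Python sketch of the two behaviours follows. The helper names are hypothetical and only mimic what the operators do to an array field.

```python
def push(array: list, value) -> list:
    """Mimic $push: always append, even if the value is already present."""
    return array + [value]


def add_to_set(array: list, value) -> list:
    """Mimic $addToSet: append only when the value is not yet in the array."""
    return array if value in array else array + [value]


tags = ["db", "nosql"]
pushed = push(tags, "db")        # duplicate is kept
unique = add_to_set(tags, "db")  # array is unchanged
added = add_to_set(tags, "json") # new value is appended
print(pushed, unique, added)
```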


Q23:

What is an embedded document and when is embedding recommended?

Junior

Answer

An embedded document stores related data inside a parent document. Embedding improves read performance and is recommended for one-to-few relationships.


Q24:

What is data referencing in MongoDB?

Junior

Answer

Referencing links documents across collections using IDs. It is used when datasets are large, loosely connected, or when avoiding duplication.


Q25:

What is the purpose of the aggregation pipeline?

Junior

Answer

The aggregation pipeline processes documents through stages such as $match, $group, $project, and $lookup for analytics and transformations.
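
As an illustration, the sketch below writes a three-stage pipeline as the list of stage documents a driver would accept, then reproduces the same steps in plain Python on sample data. The collection contents and field names (customer, status, total) are hypothetical.

```python
from collections import defaultdict

orders = [
    {"customer": "ana", "status": "paid", "total": 40},
    {"customer": "ben", "status": "paid", "total": 15},
    {"customer": "ana", "status": "paid", "total": 10},
    {"customer": "ben", "status": "cancelled", "total": 99},
]

# Pipeline in the form a driver would accept.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "spent": {"$sum": "$total"}}},
    {"$sort": {"spent": -1}},
]

# Plain-Python equivalent of the three stages, for illustration only.
matched = [o for o in orders if o["status"] == "paid"]  # $match
totals = defaultdict(int)
for order in matched:                                   # $group with $sum
    totals[order["customer"]] += order["total"]
result = sorted(                                        # $sort, descending
    ({"_id": customer, "spent": spent} for customer, spent in totals.items()),
    key=lambda row: row["spent"],
    reverse=True,
)
print(result)
```

Placing $match first keeps later stages working on fewer documents.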


Q26:

What is $lookup used for?

Junior

Answer

$lookup performs a left outer join between collections, enriching documents with related data.


Q27:

What is the difference between insertOne and insertMany?

Junior

Answer

insertOne inserts a single document. insertMany inserts multiple documents in one operation, which reduces network round trips compared with repeated insertOne calls.


Q28:

What is the purpose of TTL indexes?

Junior

Answer

TTL indexes automatically delete documents after a specified time, useful for sessions, logs, and temporary data.
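
The expiry rule can be sketched in plain Python. The index options below use the real expireAfterSeconds option name, but the createdAt field and the is_expired helper are assumptions made for this example; on a real server a background task removes expired documents periodically, so deletion is not instantaneous.

```python
from datetime import datetime, timedelta, timezone

# Index definition as it might be passed to a driver; "createdAt" is hypothetical.
ttl_index = {"key": {"createdAt": 1}, "expireAfterSeconds": 3600}


def is_expired(doc: dict, now: datetime) -> bool:
    """Return True once the indexed date plus the TTL lies in the past."""
    created = doc.get("createdAt")
    if not isinstance(created, datetime):
        return False  # documents without a date in the indexed field never expire
    lifetime = timedelta(seconds=ttl_index["expireAfterSeconds"])
    return created + lifetime <= now


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
old_session = {"createdAt": now - timedelta(hours=2)}
new_session = {"createdAt": now - timedelta(minutes=5)}
print(is_expired(old_session, now), is_expired(new_session, now))
```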


Q29:

What is the explain function and why is it useful?

Junior

Answer

explain() shows how a query is executed, including index usage and performance details. It helps diagnose slow queries.


Q30:

What is a write concern?

Junior

Answer

Write concern defines how strictly MongoDB should confirm a write, ranging from w:1 (primary only) to w:majority for higher durability.


Q31:

What is a read preference?

Junior

Answer

Read preference decides which nodes serve read requests, such as primary, secondary, or nearest, enabling load balancing.


Q32:

What is journaling in MongoDB?

Junior

Answer

Journaling writes operations to a journal file before applying them to data files, preventing data loss in crashes.


Q33:

What is $regex used for?

Junior

Answer

$regex performs pattern matching on string fields and is useful for partial text searches.


Q34:

What is the difference between save and update?

Junior

Answer

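save replaces an entire document if one with the same _id exists, or inserts it if not. update modifies only the specified fields using update operators. save is deprecated in the MongoDB shell; replaceOne with upsert: true, or insertOne, covers the same cases.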


Q35:

What is sharding key selection and why is it important?

Junior

Answer

A good sharding key ensures balanced data distribution, high cardinality, and avoids write hotspots, which affects scaling performance.


Q36:

How does MongoDB handle schema flexibility while still allowing schema validation?

Mid

Answer

MongoDB is schema-flexible but supports validation using $jsonSchema. This allows flexible documents while enforcing structure for critical fields.
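
A validator might look like the following sketch, written as a Python dictionary. The collection name and the email, createdAt, and age fields are hypothetical; only the required fields are enforced, so documents may still carry any additional fields.

```python
# A $jsonSchema validator expressed as a plain dictionary.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["email", "createdAt"],
        "properties": {
            "email": {"bsonType": "string", "pattern": "^.+@.+$"},
            "createdAt": {"bsonType": "date"},
            "age": {"bsonType": "int", "minimum": 0},
        },
    }
}

# With PyMongo and a running server, this could be applied roughly as:
#   db.create_collection("users", validator=user_validator)

schema = user_validator["$jsonSchema"]
required = schema["required"]
optional = sorted(set(schema["properties"]) - set(required))
print("required:", required, "optional:", optional)
```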


Q37:

What are the main differences between embedding and referencing in MongoDB?

Mid

Answer

Embedding stores related data in one document for fast reads, while referencing links documents across collections to reduce duplication and document size.


Q38:

How do compound indexes improve query performance?

Mid

Answer

Compound indexes index multiple fields together, allowing MongoDB to speed up queries and sorting based on index prefix rules.


Q39:

What is an index prefix and why does it matter in compound indexing?

Mid

Answer

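An index prefix is a beginning subset of the fields in a compound index. For an index on { a: 1, b: 1, c: 1 }, the prefixes are { a: 1 } and { a: 1, b: 1 }. MongoDB can use the index for queries on a prefix, but a query that filters only on b or c, without the leading field a, generally cannot use it efficiently.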
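
The prefix rule can be sketched as a small check in plain Python. This is a simplification with hypothetical helper and field names: it looks only at equality filters and ignores sorting and the other factors the real query planner weighs.

```python
def index_prefixes(index_fields: list) -> list:
    """Return the proper prefixes of a compound index, shortest first."""
    return [index_fields[:n] for n in range(1, len(index_fields))]


def usable_leading_fields(index_fields: list, query_fields: set) -> list:
    """Longest leading run of index fields that the query filters on."""
    usable = []
    for field in index_fields:
        if field not in query_fields:
            break  # a gap ends the usable part of the index
        usable.append(field)
    return usable


index = ["status", "createdAt", "userId"]
print(index_prefixes(index))
print(usable_leading_fields(index, {"status", "userId"}))  # only the first field helps
print(usable_leading_fields(index, {"createdAt"}))         # leading field is missing
```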


Q40:

What is the purpose of an aggregation pipeline’s $facet stage?

Mid

Answer

$facet runs multiple sub-pipelines within a single stage over the same set of input documents, returning one output field per sub-pipeline. It is useful for dashboards that need several metrics from one dataset.


Q41:

What is $unwind and why is it used?

Mid

Answer

$unwind expands array fields into multiple documents so pipeline stages can analyze individual elements.
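
A plain-Python sketch of the default behaviour is shown below. The unwind helper is hypothetical; it mirrors the default case in which documents whose field is missing, null, or an empty array produce no output.

```python
def unwind(docs: list, field: str) -> list:
    """Emit one copy of each document per element of the given array field."""
    out = []
    for doc in docs:
        value = doc.get(field)
        if value is None or value == []:
            continue  # default behaviour: such documents are dropped
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            out.append({**doc, field: element})
    return out


orders = [
    {"_id": 1, "items": ["pen", "ink"]},
    {"_id": 2, "items": []},
    {"_id": 3, "items": ["paper"]},
]
flat = unwind(orders, "items")
print(flat)
```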


Q42:

What is a covered query in MongoDB?

Mid

Answer

A covered query is satisfied entirely from an index without touching the collection, improving performance by reducing disk access.
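
The basic condition can be expressed as a set check, sketched here in plain Python. The is_covered helper is hypothetical and simplified: it handles inclusion projections only and ignores further restrictions, such as those on indexed array fields.

```python
def is_covered(index_fields: list, filter_fields: list, projection: dict) -> bool:
    """True when every filtered and returned field is part of the index."""
    returned = {field for field, flag in projection.items() if flag}
    # _id is returned by default unless the projection sets it to 0.
    if projection.get("_id", 1):
        returned.add("_id")
    needed = set(filter_fields) | returned
    return needed <= set(index_fields)


index = ["status", "total"]
print(is_covered(index, ["status"], {"total": 1, "_id": 0}))  # index alone suffices
print(is_covered(index, ["status"], {"total": 1}))            # _id forces a document fetch
```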


Q43:

What is index cardinality and how does it affect performance?

Mid

Answer

Higher cardinality means more unique values, making indexes more selective and improving query performance.


Q44:

What are multi-key indexes?

Mid

Answer

Multi-key indexes allow indexing array fields by indexing each element, enabling fast queries over arrays.


Q45:

What is the difference between $in and $nin in performance?

Mid

Answer

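$in can use an index efficiently because it looks up a known set of values. $nin can use an index too, but it is rarely selective, since it matches everything except the listed values, so it often performs no better than a collection scan.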


Q46:

What is write concern and why is it important?

Mid

Answer

Write concern specifies how many nodes must acknowledge a write. Higher levels improve durability but increase latency.


Q47:

What is read concern in MongoDB?

Mid

Answer

Read concern determines the consistency level of reads, such as local, majority, or snapshot for transactions.


Q48:

How does MongoDB ensure durability during crashes?

Mid

Answer

MongoDB uses journaling to write operations to journal files before applying them, ensuring recovery after crashes.


Q49:

What are write-ahead logs (journal files) and how do they work?

Mid

Answer

Journal files store operations sequentially for atomicity and crash recovery. MongoDB replays journals after restarts.


Q50:

What is a MongoDB transaction and when is it needed?

Mid

Answer

MongoDB transactions allow multi-document ACID operations, needed when updating related data across collections.


Q51:

What is $merge used for?

Mid

Answer

$merge writes aggregation results into a target collection, supporting upserts and replacements useful for ETL workflows.


Q52:

What challenges arise when using transactions?

Mid

Answer

Transactions add latency, reduce concurrency, and require replica set or sharded clusters. They must be used sparingly.


Q53:

How does sharding work in MongoDB?

Mid

Answer

Sharding distributes data across shards based on a shard key. mongos routes queries and config servers store metadata.


Q54:

What is the role of the mongos router?

Mid

Answer

mongos routes application queries to the correct shards and abstracts the distributed cluster from clients.


Q55:

What makes a good shard key?

Mid

Answer

A good shard key must offer high cardinality, distribute writes evenly, and match query patterns to avoid hotspots.


Q56:

What are chunk migrations in MongoDB?

Mid

Answer

Chunks are ranges of shard key values that move between shards to balance data. The balancer manages migrations.


Q57:

What is the purpose of the balancer?

Mid

Answer

The balancer ensures even data distribution across shards by moving chunks when imbalance occurs.


Q58:

What causes chunk migration performance issues?

Mid

Answer

Large documents, poor shard keys, heavy writes, and slow inter-shard networks can slow migrations.


Q59:

What is a change stream in MongoDB?

Mid

Answer

Change streams provide real-time events for inserts, updates, and deletes on replica sets and sharded clusters. They are useful for microservices and cache invalidation.


Q60:

What is $graphLookup and when is it useful?

Mid

Answer

$graphLookup performs recursive lookups, useful for hierarchical structures like org charts or categories.


Q61:

How do you detect slow queries in MongoDB?

Mid

Answer

Use slow query logs, profiler, and explain() to identify high scan ratios and missing indexes.


Q62:

What is the role of the WiredTiger storage engine?

Mid

Answer

WiredTiger provides document-level locking, compression, checkpoints, and high concurrency performance.


Q63:

How does WiredTiger compression improve storage?

Mid

Answer

Compression reduces disk usage and improves I/O performance by reading and writing fewer bytes.


Q64:

What are checkpoints in WiredTiger?

Mid

Answer

Checkpoints flush in-memory data to disk periodically, ensuring durable restart points.


Q65:

What causes collection-level locking and how to avoid it?

Mid

Answer

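With WiredTiger, ordinary reads and writes use document-level concurrency control, so exclusive collection-level locks come mainly from administrative operations such as dropping or renaming a collection. Long multi-document operations and unindexed writes can still cause contention; use indexes and smaller writes to reduce it.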


Q66:

What is a working set and why is it important?

Mid

Answer

The working set is frequently accessed data and indexes. Performance drops if it exceeds available RAM.


Q67:

What is index intersection?

Mid

Answer

MongoDB can combine multiple indexes to satisfy a query, useful when no single index covers all fields.


Q68:

Why do large documents degrade performance?

Mid

Answer

Large documents slow reads and writes, increase RAM usage, and reduce replication and migration performance.


Q69:

What is the difference between primary and secondary reads?

Mid

Answer

Primary reads are strongly consistent, while secondary reads are eventually consistent and used for load balancing.


Q70:

What is replication lag and why does it occur?

Mid

Answer

Lag occurs when secondaries apply changes slower than primary. Causes include heavy writes and slow hardware.


Q71:

What is the oplog and how does it support replication?

Mid

Answer

The oplog is a capped collection storing recent operations. Secondaries replay oplog entries to stay in sync.


Q72:

What is majority write concern and why use it?

Mid

Answer

Majority write concern acknowledges a write only after a majority of the replica set's voting, data-bearing members have applied it, so an acknowledged write is not rolled back after a failover.
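
The arithmetic behind "majority" is simple enough to sketch. The helpers below are hypothetical and assume every voting member is data-bearing (no arbiters), which keeps the calculation to a plain floor division.

```python
def majority(voting_members: int) -> int:
    """Smallest number of members that is more than half of the set."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """Members that can be down while a majority is still reachable."""
    return voting_members - majority(voting_members)


for size in (3, 4, 5):
    print(size, "members: majority", majority(size),
          "tolerates", tolerated_failures(size), "failure(s)")
```

Note that a four-member set tolerates no more failures than a three-member set, which is one reason odd member counts are common.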


Q73:

How do you optimize MongoDB for high write throughput?

Mid

Answer

Use good shard keys, bulk writes, avoid unnecessary indexes, keep documents small, and tune WiredTiger cache.


Q74:

What is $project in aggregation?

Mid

Answer

$project selects, removes, or transforms fields, helping control output structure and performance.


Q75:

How does MongoDB handle multi-document ACID transactions internally?

Mid

Answer

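MongoDB relies on WiredTiger snapshot isolation and the oplog to make multi-document operations atomic within a replica set. Transactions that span multiple shards additionally use a two-phase commit coordinated across the participating shards.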


Q76:

How does MongoDB handle concurrency using document-level locking?

Senior

Answer

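MongoDB uses WiredTiger’s document-level concurrency control: writes to different documents in the same collection proceed simultaneously, while intent locks at the database and collection level keep contention low. When two operations try to modify the same document, one hits a write conflict and is retried.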


Q77:

What is snapshot isolation and how does MongoDB achieve it?

Senior

Answer

MongoDB provides snapshot isolation for transactions using timestamps, oplog ordering, and WiredTiger MVCC to maintain a consistent point-in-time view throughout the transaction.


Q78:

What role does WiredTiger’s write-ahead logging play in durability?

Senior

Answer

WiredTiger writes operations to WAL before flushing data pages. After crashes, MongoDB replays the WAL to restore data, ensuring strong durability guarantees.


Q79:

How do you diagnose performance issues using MongoDB’s profiler?

Senior

Answer

The profiler captures slow queries, execution times, scan metrics, and index usage, helping identify unindexed operations, inefficient sorts, and pipeline bottlenecks.


Q80:

Why do $lookup operations cause performance concerns in large systems?

Senior

Answer

$lookup performs cross-collection joins. Without proper indexing, it triggers large scans and increases CPU and memory usage.


Q81:

How does MongoDB’s balancer decide when to migrate chunks?

Senior

Answer

The balancer monitors shard chunk distribution via config servers and migrates chunks when imbalance thresholds are exceeded.


Q82:

How do you avoid shard hotspots?

Senior

Answer

Avoid monotonically increasing keys and choose high-cardinality shard keys or hashed keys for even write distribution.
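
The hotspot effect can be simulated in a few lines of Python. Everything here is an assumption made for illustration: four shards, fixed range boundaries, and MD5 as a stand-in hash (MongoDB's hashed indexes use their own hash function).

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4
# Hypothetical range boundaries: shard 0 holds keys < 250, shard 1 < 500, and so on.
RANGE_BOUNDARIES = [250, 500, 750]


def ranged_shard(key: int) -> int:
    """Pick a shard by comparing the key against fixed range boundaries."""
    for shard, upper in enumerate(RANGE_BOUNDARIES):
        if key < upper:
            return shard
    return len(RANGE_BOUNDARIES)  # everything above the last boundary


def hashed_shard(key: int) -> int:
    """Pick a shard from a hash of the key (MD5 is only a stand-in here)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


new_keys = range(1000, 2000)  # monotonically increasing, like timestamps
ranged_load = Counter(ranged_shard(k) for k in new_keys)
hashed_load = Counter(hashed_shard(k) for k in new_keys)
print("ranged:", dict(ranged_load))
print("hashed:", dict(hashed_load))
```

With range-based placement every new key lands on the last shard, while hashing spreads the same keys across all four. The trade-off is that hashed keys turn range queries on that field into scatter-gather queries.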


Q83:

What are jumbo chunks and why are they problematic?

Senior

Answer

Jumbo chunks grow too large to split or migrate, blocking balancing and degrading performance in sharded clusters.


Q84:

How does MongoDB internally manage oplog entries during replication?

Senior

Answer

The primary writes operations to the oplog; secondaries tail the oplog and apply changes in timestamp order for consistent replication.


Q85:

What is replication rollback and when does it occur?

Senior

Answer

Rollback happens when a primary steps down or becomes unreachable before some of its oplog entries have replicated. When that node rejoins the replica set as a secondary, it reverts the unreplicated writes so its data matches the other members.


Q86:

How does MongoDB ensure consistency in a sharded cluster?

Senior

Answer

Config servers maintain chunk metadata; mongos routes queries based on metadata, and majority write concern ensures cluster-wide consistent writes.


Q87:

How does MongoDB handle distributed transactions across shards?

Senior

Answer

MongoDB uses two-phase commit across shards to ensure atomic multi-shard updates, preventing partial writes.


Q88:

What is the impact of large indexes on performance?

Senior

Answer

Large indexes consume memory, slow writes, and increase disk I/O, requiring careful index design for efficiency.


Q89:

What are hidden indexes and when should you use them?

Senior

Answer

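Hidden indexes are invisible to the query planner but are still maintained on writes. Hiding an index lets you evaluate the impact of dropping it without actually dropping it, and unhiding it restores it immediately if performance suffers.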


Q90:

How does MongoDB choose an execution plan when multiple indexes exist?

Senior

Answer

MongoDB tests candidate plans during a trial phase and caches the best plan to avoid repeated plan selection.


Q91:

What is a plan cache eviction and when does it happen?

Senior

Answer

Plan cache evicts entries after metadata changes or when query patterns deviate significantly, forcing re-evaluation of plans.


Q92:

How do frequent updates cause document movement and why is it bad?

Senior

Answer

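With the legacy MMAPv1 storage engine, a document that grew beyond its allocated space had to be relocated on disk, causing fragmentation and extra index maintenance. WiredTiger does not move documents in this way, but frequent updates that grow documents still increase write and cache overhead.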


Q93:

How does collation impact index usage?

Senior

Answer

Queries must match index collation; otherwise MongoDB cannot use the index and falls back to collection scans.


Q94:

What is the effect of large aggregation pipelines on memory?

Senior

Answer

Large pipelines may spill to disk when memory is insufficient, drastically slowing performance.


Q95:

Why do unbounded array growth patterns degrade performance?

Senior

Answer

Growing arrays push document size toward the 16 MB document limit, make each update rewrite more data, and enlarge multikey indexes, slowing reads and writes.


Q96:

How does replication guarantee ordering?

Senior

Answer

The primary records every write in the oplog with a timestamp, and secondaries apply oplog entries in that order, so each member ends up with the same sequence of changes as the primary.


Q97:

What is the difference between majority and linearizable reads?

Senior

Answer

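Majority reads return data that has been acknowledged by a majority of members and will not be rolled back. Linearizable reads go further: they run only on the primary, which confirms with a majority that it is still primary, so the read reflects all majority-acknowledged writes that completed before it started.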


Q98:

How does MongoDB handle versioned schema migrations?

Senior

Answer

Migrations are performed safely using batch updates or tools like Mongock, with applications supporting both old and new versions temporarily.


Q99:

What is the role of the config server replica set?

Senior

Answer

Config servers store cluster metadata. If the config server cluster fails, routing and chunk management halt.


Q100:

What are yield points during query execution?

Senior

Answer

Yield points allow MongoDB to pause long operations to let other operations acquire locks, preventing system stalls.


Q101:

What is the role of index filters?

Senior

Answer

Index filters restrict query planner index usage, helping force specific indexes for performance tuning.


Q102:

What is a rolling index build?

Senior

Answer

A rolling index build creates the index on one secondary at a time, then steps down the primary and builds the index on it last, so the replica set stays available throughout.


Q103:

What are the trade-offs of using $near for geospatial queries?

Senior

Answer

$near returns results sorted by distance and requires a geospatial index (2dsphere or 2d). Sorting by distance can be CPU-heavy on large result sets.


Q104:

Why do sharded clusters struggle with scatter-gather queries?

Senior

Answer

Scatter-gather queries hit all shards, increasing latency and limiting scalability. Good shard keys minimize this pattern.


Q105:

How do you design MongoDB schema for high write throughput systems?

Senior

Answer

Use small documents, minimize indexes, design evenly distributed shard keys, and use bucketing patterns to reduce write amplification.


Q106:

How does MongoDB guarantee global consistency in multi-shard, multi-region deployments?

Expert

Answer

MongoDB uses majority write concern, oplog ordering, causal consistency, and region-aware replica set tags to ensure global consistency across multi-region, multi-shard deployments.


Q107:

How does MongoDB internally manage oplog truncation and what risks exist if oplog is too small?

Expert

Answer

MongoDB truncates the oldest oplog entries automatically as the oplog fills. If the oplog is too small, a secondary that falls behind by more than the oplog window becomes stale and must perform a full initial sync, which increases downtime risk.


Q108:

What architectural patterns ensure minimal replication lag in high-write clusters?

Expert

Answer

Low-latency storage, optimized shard keys, small document writes, and region-aware replica placement minimize lag. Flow control tuning prevents primaries from overwhelming secondaries.


Q109:

How do you design a multi-shard transaction strategy to avoid large distributed rollbacks?

Expert

Answer

Keep transactions small, align operations with shard keys, avoid multi-collection writes, and reduce batch size to prevent distributed rollback overhead.


Q110:

How does WiredTiger’s checkpointing mechanism influence crash recovery?

Expert

Answer

Checkpoints flush modified in-memory pages to disk as a consistent snapshot. After a crash, MongoDB starts from the last checkpoint and replays only the journal entries written after it, which shortens recovery while preserving durability.


Q111:

What leads to cache pressure in WiredTiger and how do you alleviate it?

Expert

Answer

Cache pressure arises from oversized working sets. Solutions include increasing WT cache, reducing document size, removing heavy indexes, and archiving cold data.


Q112:

How do you detect and fix logical inconsistencies across replicas?

Expert

Answer

Use the dbHash command, db.collection.validate(), and CDC systems to detect mismatches. Fix them via initial sync, logical rebuild, or selective re-sync.


Q113:

Why is two-phase commit expensive in MongoDB, and when should you avoid it?

Expert

Answer

Two-phase commit requires cross-shard coordination and oplog tracking, increasing latency and resource usage. Avoid unless strict multi-document atomicity is required.


Q114:

What are resumable change streams and why are they critical for event-driven architectures?

Expert

Answer

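Change streams can resume from a resume token (the resumeAfter or startAfter option) or from a cluster time (startAtOperationTime), so a consumer that restarts can continue from its last processed event without missing changes, provided the matching oplog entries are still available. Consumers should still be idempotent, since an event may be redelivered if the token was not saved after processing.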
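
The resume pattern can be sketched with a toy event source in plain Python. This is a simulation, not driver code: real resume tokens are opaque documents rather than integers, and watch, consume, and the checkpoint dictionary are hypothetical names standing in for a change stream cursor and durable storage.

```python
# A fixed list of events standing in for the stream of changes on a collection.
EVENTS = [
    {"token": 1, "op": "insert"},
    {"token": 2, "op": "update"},
    {"token": 3, "op": "delete"},
    {"token": 4, "op": "insert"},
]


def watch(resume_after=None):
    """Yield events that come after the given token (all events if None)."""
    for event in EVENTS:
        if resume_after is None or event["token"] > resume_after:
            yield event


def consume(max_events: int, checkpoint: dict) -> list:
    """Process up to max_events, saving the last token after each one."""
    seen = []
    for event in watch(resume_after=checkpoint.get("token")):
        if len(seen) == max_events:
            break  # simulate the consumer stopping or crashing here
        seen.append(event["op"])
        checkpoint["token"] = event["token"]  # persist progress
    return seen


checkpoint = {}
first_run = consume(2, checkpoint)
second_run = consume(10, checkpoint)  # restarts from the saved token
print(first_run, second_run)
```

Saving the token only after an event is fully processed gives at-least-once delivery, which is why handlers are usually written to be idempotent.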


Q115:

How do you scale analytic workloads without impacting OLTP performance?

Expert

Answer

Use dedicated analytics secondaries, hidden nodes, or offload data to OLAP systems via CDC. Use $merge pipelines for incremental materialization.


Q116:

How does MongoDB internally manage write conflicts under snapshot isolation?

Expert

Answer

MongoDB uses timestamp-based MVCC. Conflicting writes trigger transaction abort to maintain isolation guarantees.


Q117:

What are the internals of the balancer decision algorithm in sharded clusters?

Expert

Answer

The balancer evaluates chunk counts, data size, and cluster history. It throttles migrations and uses metadata from config servers for safe relocation.


Q118:

How do you optimize heavy $lookup workloads in distributed clusters?

Expert

Answer

Use embedding, pre-joins, shard-aligned lookups, reduced cardinality, and foreign-key indexes. Denormalization often replaces expensive $lookup patterns.


Q119:

How does MongoDB prevent data loss during node failover?

Expert

Answer

With w: majority, the primary acknowledges a write only after a majority of members have replicated it. Elections favor members with the most recent oplog entries, so majority-acknowledged writes survive a failover.


Q120:

What are the biggest risks of sharding too early or too late?

Expert

Answer

Sharding early adds unnecessary complexity; sharding late causes heavy balancing, hotspots, and downtime. Ideal timing depends on write throughput and dataset size.


Q121:

How do bucket patterns optimize time-series data in MongoDB?

Expert

Answer

Buckets group time-series events into ranges, reducing document count and index overhead. Native time-series collections use similar internal bucketing.
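
A manual version of the pattern might look like the sketch below, which groups raw readings into one document per sensor per hour. The field names (sensor, ts, value, bucketStart) and the one-hour bucket width are assumptions for this example.

```python
from collections import defaultdict
from datetime import datetime


def bucket_by_hour(readings: list) -> list:
    """Group raw readings into one document per sensor per hour."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0, "readings": []})
    for reading in readings:
        start = reading["ts"].replace(minute=0, second=0, microsecond=0)
        bucket = buckets[(reading["sensor"], start)]
        bucket["count"] += 1
        bucket["sum"] += reading["value"]
        bucket["readings"].append({"ts": reading["ts"], "value": reading["value"]})
    return [
        {"sensor": sensor, "bucketStart": start, **body}
        for (sensor, start), body in buckets.items()
    ]


raw = [
    {"sensor": "s1", "ts": datetime(2024, 1, 1, 10, 5), "value": 20.5},
    {"sensor": "s1", "ts": datetime(2024, 1, 1, 10, 45), "value": 21.5},
    {"sensor": "s1", "ts": datetime(2024, 1, 1, 11, 2), "value": 22.0},
]
docs = bucket_by_hour(raw)
print(len(raw), "readings stored as", len(docs), "documents")
```

Keeping count and sum on each bucket lets averages be computed without unpacking the individual readings.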


Q122:

How does MongoDB optimize read patterns using index intersection?

Expert

Answer

MongoDB combines multiple indexes to satisfy a query when no compound index exists. This improves performance compared to full scans but is slower than a single optimal compound index.


Q123:

What strategies help prevent write amplification in high-throughput clusters?

Expert

Answer

Reduce document size, avoid unnecessary indexes, use targeted updates, limit large arrays, and distribute writes evenly with good shard keys.


Q124:

How do multi-threaded aggregation queries maintain correctness in MongoDB?

Expert

Answer

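Within a single mongod, the stages of an aggregation run in their declared order. Parallelism comes mainly from sharded clusters, where each shard runs the shard part of the pipeline on its own data and a merging node combines the partial results, preserving the pipeline semantics.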


Q125:

How do you handle schema evolution in long-lived MongoDB clusters?

Expert

Answer

Use additive schema changes, background migrations, versioned schemas, and applications that accept both old and new fields until migration completes.

