Mid MongoDB Interview Questions

Curated mid-level MongoDB interview questions for developers targeting mid-level positions. 40 questions available.

MongoDB Interview Questions & Answers

Welcome to our comprehensive collection of MongoDB interview questions and answers. This page contains expertly curated interview questions covering all aspects of MongoDB, from fundamental concepts to advanced topics. Whether you're preparing for an entry-level position or a senior role, you'll find questions tailored to your experience level.

Our MongoDB interview questions are designed to help you:

  • Understand core concepts and best practices in MongoDB
  • Prepare for technical interviews at all experience levels
  • Master both theoretical knowledge and practical application
  • Build confidence for your next MongoDB interview

Each question includes detailed answers and explanations to help you understand not just what the answer is, but why it's correct. We cover topics ranging from basic MongoDB concepts to advanced scenarios that you might encounter in senior-level interviews.

Use the filters below to find questions by difficulty level (Entry, Junior, Mid, Senior, Expert) or focus specifically on code challenges. Each question is carefully crafted to reflect real-world interview scenarios you'll encounter at top tech companies, startups, and MNCs.

Questions

Q1:

How does MongoDB handle schema flexibility while still allowing schema validation?

Mid

Answer

MongoDB is schema-flexible but supports validation using $jsonSchema. This allows flexible documents while enforcing structure for critical fields.
Quick Summary: MongoDB supports schema flexibility by default but lets you add validation via JSON Schema rules on a collection. You specify required fields, field types, value ranges, and patterns. Validation happens on insert and update. Use validationAction: "warn" during migration (logs violations without rejecting) or "error" to enforce strictly. This balances flexibility with data integrity.
Q2:

What are the main differences between embedding and referencing in MongoDB?

Mid

Answer

Embedding stores related data in one document for fast reads, while referencing links documents across collections to reduce duplication and document size.
Quick Summary: Embedding: store related data in one document. Pro: one read, atomic updates, no joins. Con: document size limit, data duplication. Referencing: store _id, fetch separately. Pro: no duplication, smaller documents, shared data. Con: requires extra query or $lookup. Rule: embed when data is accessed together and is one-to-few. Reference when data is shared, large, or frequently updated independently.
Q3:

How do compound indexes improve query performance?

Mid

Answer

Compound indexes index multiple fields together, allowing MongoDB to speed up queries and sorting based on index prefix rules.
Quick Summary: Compound indexes cover multiple fields in a specific order. db.orders.createIndex({userId: 1, createdAt: -1}) supports queries filtering by userId and sorting by createdAt descending. This is much faster than two separate indexes because MongoDB traverses one B-tree. The order of fields in the index matters - place equality fields first, then sort fields, then range fields.
Q4:

What is an index prefix and why does it matter in compound indexing?

Mid

Answer

MongoDB can only use a compound index starting from its leading (prefix) fields. If a query does not include the leading field, the index cannot be used for that query.
Quick Summary: Index prefix means a compound index {a, b, c} supports queries on {a}, {a, b}, or {a, b, c} but NOT on {b} or {c} alone. MongoDB can only use a compound index from the leftmost field forward. If you frequently query by {b} alone, you need a separate index. Designing indexes with the right field order avoids creating redundant indexes.
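A short mongosh sketch of the prefix rule (collection and field names are examples):

```javascript
// Compound index on {userId, createdAt, status}
db.orders.createIndex({ userId: 1, createdAt: -1, status: 1 })

db.orders.find({ userId: 7 })                                        // uses the index (prefix {userId})
db.orders.find({ userId: 7, createdAt: { $gt: ISODate("2024-01-01") } }) // uses the index
db.orders.find({ status: "shipped" })                                // cannot use it - the prefix is missing
```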
Q5:

What is the purpose of an aggregation pipeline’s $facet stage?

Mid

Answer

$facet allows running multiple aggregations in parallel on the same input, useful for dashboards requiring different metrics from one dataset.
Quick Summary: $facet runs multiple sub-pipelines on the same input in parallel, each producing a different result in the output. Useful for building faceted search results - one sub-pipeline counts by category, another by price range, another for the actual results. All computed in one aggregation pass instead of multiple queries.
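A sketch of a faceted-search pipeline in mongosh (the "products" collection and bucket boundaries are made up for illustration):

```javascript
// One pass over "products" producing counts by category, price buckets, and a results page
db.products.aggregate([
  { $match: { inStock: true } },
  { $facet: {
      byCategory:   [ { $sortByCount: "$category" } ],
      priceBuckets: [ { $bucket: {
          groupBy: "$price",
          boundaries: [0, 50, 100, 500],
          default: "500+"
      } } ],
      results:      [ { $sort: { price: 1 } }, { $limit: 20 } ]
  } }
])
```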
Q6:

What is $unwind and why is it used?

Mid

Answer

$unwind expands array fields into multiple documents so pipeline stages can analyze individual elements.
Quick Summary: $unwind deconstructs an array field - for each element in the array, it outputs a separate document. Example: a product document with a sizes array [S, M, L] becomes three documents, one per size. Necessary when you want to group, filter, or sort by individual array elements in an aggregation pipeline.
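A minimal example in mongosh (the "products" collection and "sizes" array are hypothetical):

```javascript
// Count how many products carry each size
db.products.aggregate([
  { $unwind: "$sizes" },                                  // one document per array element
  { $group: { _id: "$sizes", count: { $sum: 1 } } }
])
```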
Q7:

What is a covered query in MongoDB?

Mid

Answer

A covered query is satisfied entirely from an index without touching the collection, improving performance by reducing disk access.
Quick Summary: A covered query is satisfied entirely by the index - MongoDB never reads the actual documents. This is the fastest possible query execution. For a query to be covered: all queried fields must be in the index, all projected fields must be in the index, and _id must be excluded from the projection (unless also in the index). Verify with explain() - look for "totalDocsExamined: 0".
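A small mongosh sketch of a covered query (collection and field names are examples):

```javascript
db.users.createIndex({ email: 1, name: 1 })

// Covered: the filter and the projection both come from the index, and _id is excluded
db.users.find(
  { email: "ada@example.com" },
  { _id: 0, email: 1, name: 1 }
).explain("executionStats")
// expect totalDocsExamined: 0 in the output
```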
Q8:

What is index cardinality and how does it affect performance?

Mid

Answer

Higher cardinality means more unique values, making indexes more selective and improving query performance.
Quick Summary: Index cardinality is the number of distinct values an indexed field has. High cardinality (user email, userId) = index is very selective = fast queries. Low cardinality (boolean, status with 3 values) = index is not selective = MongoDB may skip it and prefer a collection scan. Always index high-cardinality fields. Low-cardinality fields work better as second fields in compound indexes.
Q9:

What are multi-key indexes?

Mid

Answer

Multi-key indexes allow indexing array fields by indexing each element, enabling fast queries over arrays.
Quick Summary: Multi-key indexes are created on array fields. MongoDB creates an index entry for every element in the array. This allows efficient queries on array contents: find all users where tags contains "mongodb". MongoDB automatically detects and creates a multi-key index when you index an array field. Limitation: a compound index can have at most one multi-key field.
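For example, in mongosh (the "articles" collection is hypothetical):

```javascript
// Indexing an array field automatically produces a multikey index
db.articles.createIndex({ tags: 1 })

// Matches any article whose tags array contains "mongodb"
db.articles.find({ tags: "mongodb" })
```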
Q10:

What is the difference between $in and $nin in performance?

Mid

Answer

$in can use indexes efficiently, while $nin generally scans far more index keys or the whole collection because negated conditions are not selective.
Quick Summary: $in queries documents where a field value is in a provided array. MongoDB uses the index efficiently if the array is small. $nin is "not in" - much slower because it can't use indexes effectively for negative conditions (has to scan all non-matching values). Avoid $nin on large collections. Use a whitelist ($in) approach instead of blacklist ($nin) when possible.
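A tiny mongosh illustration (collection and status values are examples):

```javascript
db.orders.find({ status: { $in: ["paid", "shipped"] } })   // index-friendly whitelist
db.orders.find({ status: { $nin: ["cancelled"] } })        // negation - typically examines far more keys
```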
Q11:

What is write concern and why is it important?

Mid

Answer

Write concern specifies how many nodes must acknowledge a write. Higher levels improve durability but increase latency.
Quick Summary: Write concern defines durability guarantees. w:1: primary wrote to memory. w:majority: majority of replica set persisted the write - survives primary failure. j:true: write is persisted to journal before acknowledging (survives crashes). For financial data or anything you can't lose: use {w: "majority", j: true}. Higher concern = higher latency.
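A minimal mongosh sketch of a durable write (the "payments" collection is hypothetical):

```javascript
// Acknowledged by a majority of the replica set and journaled before success is returned
db.payments.insertOne(
  { orderId: 123, amount: 99.5 },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
)
```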
Q12:

What is read concern in MongoDB?

Mid

Answer

Read concern determines the consistency level of reads, such as local, majority, or snapshot for transactions.
Quick Summary: Read concern controls data freshness and isolation for read operations. local: returns data that may not be majority-committed (default). majority: returns data acknowledged by majority of replicas - won't roll back. linearizable: guarantees reading the most recent majority-committed data (slowest). snapshot: consistent point-in-time view for transactions. Choose based on consistency requirements.
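For example, in mongosh (collection name assumed from the previous example):

```javascript
// Read only majority-committed data (won't be rolled back after a failover)
db.payments.find({ orderId: 123 }).readConcern("majority")
```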
Q13:

How does MongoDB ensure durability during crashes?

Mid

Answer

MongoDB uses journaling to write operations to journal files before applying them, ensuring recovery after crashes.
Quick Summary: MongoDB ensures durability through: journaling (write-ahead log persists writes before applying), WiredTiger checkpoints (periodic full data snapshots), and replica set replication (data copied to multiple nodes). On crash, MongoDB replays the journal from the last checkpoint to restore to a consistent state. With write concern majority + journaling enabled, committed writes survive node failures.
Q14:

What are write-ahead logs (journal files) and how do they work?

Mid

Answer

Journal files store operations sequentially for atomicity and crash recovery. MongoDB replays the journal on startup after an unclean shutdown.
Quick Summary: Write-ahead logging (WAL / journal): before MongoDB applies any data change, it first writes the operation to the journal file on disk. If the server crashes mid-write, MongoDB replays the journal on startup to complete or roll back the incomplete operation. This ensures the data files are never left in a partially-written, inconsistent state after a crash.
Q15:

What is a MongoDB transaction and when is it needed?

Mid

Answer

MongoDB transactions allow multi-document ACID operations, needed when updating related data across collections.
Quick Summary: MongoDB transactions provide ACID guarantees across multiple documents and collections (since MongoDB 4.0 for replica sets, 4.2 for sharded clusters). Use when you need to update multiple documents atomically - e.g., transfer money between two accounts. Transactions have performance overhead - they hold locks and use snapshot isolation. Design schemas to minimize transaction needs.
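A sketch of the money-transfer example in mongosh, assuming a replica set and a hypothetical "bank.accounts" collection:

```javascript
// Transfer funds atomically across two documents
const session = db.getMongo().startSession()
session.withTransaction(() => {
  const accounts = session.getDatabase("bank").accounts
  accounts.updateOne({ _id: "alice" }, { $inc: { balance: -100 } })
  accounts.updateOne({ _id: "bob" },   { $inc: { balance:  100 } })
})
session.endSession()
```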
Q16:

What is $merge used for?

Mid

Answer

$merge writes aggregation results into a target collection, supporting upserts and replacements useful for ETL workflows.
Quick Summary: $merge writes aggregation pipeline results to a collection (either inserting or merging into existing documents). More flexible than $out (which replaces the whole collection). You can specify what to do when a matching document exists: replace, merge, keep existing, fail, or run a custom pipeline. Use for building materialized views or pre-aggregated reports.
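A sketch of a materialized-view rollup in mongosh (collection names are examples; $dateTrunc assumes MongoDB 5.0+):

```javascript
// Nightly rollup: aggregate orders into a "dailySales" collection
db.orders.aggregate([
  { $group: { _id: { $dateTrunc: { date: "$createdAt", unit: "day" } },
              revenue: { $sum: "$total" } } },
  { $merge: { into: "dailySales", on: "_id",
              whenMatched: "replace", whenNotMatched: "insert" } }
])
```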
Q17:

What challenges arise when using transactions?

Mid

Answer

Transactions add latency, reduce concurrency, and require a replica set or sharded cluster. They should be used sparingly.
Quick Summary: Challenges with MongoDB transactions: performance overhead (locks held, snapshot maintained for duration), limited to 60 seconds by default, increased conflict and abort rate under high concurrency, cross-shard transactions add latency. Best practice: keep transactions short, minimize documents touched, prefer schema design that avoids transactions (embedding, atomic update operators).
Q18:

How does sharding work in MongoDB?

Mid

Answer

Sharding distributes data across shards based on a shard key. mongos routes queries and config servers store metadata.
Quick Summary: Sharding distributes collection data across shards based on the shard key. MongoDB splits data into chunks (default 128MB ranges). The config server replica set stores the chunk-to-shard mapping. mongos routers use this map to direct queries to the right shard(s). A query on the shard key hits one shard; a query without it hits all shards (scatter-gather).
Q19:

What is the role of the mongos router?

Mid

Answer

mongos routes application queries to the correct shards and abstracts the distributed cluster from clients.
Quick Summary: mongos is the routing layer for a sharded cluster. Client applications connect to mongos (not directly to shards). mongos queries the config servers for the chunk map, determines which shard(s) hold the relevant data, fans out queries to those shards, merges results, and returns to the client. It's stateless and you can run multiple mongos instances for high availability.
Q20:

What makes a good shard key?

Mid

Answer

A good shard key must offer high cardinality, distribute writes evenly, and match query patterns to avoid hotspots.
Quick Summary: A good shard key: high cardinality (many distinct values), writes distributed across all shards (no hotspot), frequently appears in queries (query isolation to one shard), not monotonically increasing. Hash shard keys distribute writes evenly but lose range query efficiency. Compound shard keys can balance writes and query isolation. Bad key = uneven distribution = one shard gets all the load.
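A short mongosh sketch of enabling sharding and choosing a key (database and collection names are examples):

```javascript
// A hashed key spreads monotonically increasing values evenly across shards
sh.enableSharding("shop")
sh.shardCollection("shop.orders", { userId: "hashed" })

// Alternatively, a compound ranged key that matches common query patterns:
// sh.shardCollection("shop.orders", { userId: 1, createdAt: 1 })
```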
Q21:

What are chunk migrations in MongoDB?

Mid

Answer

Chunks are ranges of shard key values that move between shards to balance data. The balancer manages migrations.
Quick Summary: When data becomes unevenly distributed across shards, the balancer moves chunks between shards to rebalance. A chunk migration copies the chunk data from source to destination shard, then updates the config server routing table, then removes the data from the source. Migrations happen in the background but consume I/O and network bandwidth - can impact performance.
Q22:

What is the purpose of the balancer?

Mid

Answer

The balancer ensures even data distribution across shards by moving chunks when imbalance occurs.
Quick Summary: The balancer is a background process that ensures chunks are distributed evenly across shards. When shard chunk counts are imbalanced (difference exceeds a threshold), the balancer migrates chunks from the most-loaded shard to the least-loaded. You can schedule balancing windows to avoid running during peak hours and minimize performance impact.
Q23:

What causes chunk migration performance issues?

Mid

Answer

Large documents, poor shard keys, heavy writes, and slow inter-shard networks can slow migrations.
Quick Summary: Chunk migration performance issues: migrations copy data over the network, consuming bandwidth. During migration, writes to migrating chunks are paused briefly for the final sync. If you have a poor shard key causing constant imbalance, the balancer migrates continuously. Jumbo chunks (too large to split) can't be migrated, causing permanent imbalance on one shard.
Q24:

What is a change stream in MongoDB?

Mid

Answer

Change streams provide real-time events for inserts, updates, and deletes. Useful for microservices and cache invalidation.
Quick Summary: Change streams provide real-time notifications of data changes in MongoDB (inserts, updates, deletes, DDL). They use the oplog under the hood. Consume with a watch() call. Resumable - you save a resume token and restart from a specific point after failure. Use for: triggering downstream actions (invalidate cache, send notification), event sourcing, real-time dashboards.
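A minimal mongosh sketch of consuming a change stream (the "orders" collection is hypothetical):

```javascript
// Watch inserts on the orders collection and react to each event
const watchCursor = db.orders.watch([ { $match: { operationType: "insert" } } ])
while (!watchCursor.isClosed()) {
  if (watchCursor.hasNext()) {
    const event = watchCursor.next()
    print(`new order: ${event.documentKey._id}`)
    // event._id is the resume token - persist it to resume after a restart
  }
}
```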
Q25:

What is $graphLookup and when is it useful?

Mid

Answer

$graphLookup performs recursive lookups, useful for hierarchical structures like org charts or categories.
Quick Summary: $graphLookup performs recursive lookups to traverse graph or tree-like data. Given a starting document, it recursively fetches documents connected via a specified field. Use for: org charts, friend-of-friend networks, category hierarchies, file system trees. More efficient than multiple application-side queries for graph traversal. Set maxDepth to limit recursion.
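A sketch of an org-chart traversal in mongosh (the "employees" collection and "managerId" field are assumptions):

```javascript
// Build the full reporting chain under each employee
db.employees.aggregate([
  { $graphLookup: {
      from: "employees",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "managerId",
      as: "reports",
      maxDepth: 5                    // cap recursion depth
  } }
])
```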
Q26:

How do you detect slow queries in MongoDB?

Mid

Answer

Use the slow query log, the database profiler, and explain() to identify high scan-to-return ratios and missing indexes.
Quick Summary: Detect slow queries with: MongoDB profiler (set db.setProfilingLevel(1, {slowms: 100}) to log queries slower than 100ms to system.profile collection), mongotop (shows per-collection read/write time), mongostat (server-wide stats), Atlas Performance Advisor, and the currentOp command to see queries running right now. Follow with explain() on slow queries to find missing indexes.
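For example, in mongosh:

```javascript
// Log every operation slower than 100 ms, then inspect the worst offenders
db.setProfilingLevel(1, { slowms: 100 })
db.system.profile.find({ millis: { $gt: 100 } })
  .sort({ millis: -1 })
  .limit(5)
```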
Q27:

What is the role of the WiredTiger storage engine?

Mid

Answer

WiredTiger provides document-level locking, compression, checkpoints, and high concurrency performance.
Quick Summary: WiredTiger is the default MongoDB storage engine since MongoDB 3.2. It provides: document-level concurrency control (multiple writers don't block each other), compression (snappy by default - saves 50-80% disk space), checkpointing (consistent snapshots every 60 seconds), and write-ahead logging for crash recovery. Replaced the old MMAPv1 engine which used collection-level locking.
Q28:

How does WiredTiger compression improve storage?

Mid

Answer

Compression reduces disk usage and improves I/O performance by reading and writing fewer bytes.
Quick Summary: WiredTiger compresses data using snappy (default - fast, moderate compression), zlib (slower, better compression), or zstd (MongoDB 4.2+ - best balance). Compression is applied to both data and indexes on disk. This reduces storage costs significantly and can improve I/O performance since less data is read from/written to disk. CPU cost of compression is usually worth the I/O savings.
Q29:

What are checkpoints in WiredTiger?

Mid

Answer

Checkpoints flush in-memory data to disk periodically, ensuring durable restart points.
Quick Summary: WiredTiger checkpoints write a consistent snapshot of all in-memory data to disk every 60 seconds (or when the journal reaches 2GB). Checkpoints create a new consistent data file state. On crash recovery, MongoDB restores from the last checkpoint and then replays the journal to apply changes made after that checkpoint. This limits recovery time to the last 60 seconds of journal data.
Q30:

What causes collection-level locking and how to avoid it?

Mid

Answer

Under WiredTiger, normal CRUD uses document-level locking; collection-level locks come mainly from DDL operations such as index builds and collMod. Schedule those off-peak, and keep writes small and indexed to reduce contention.
Quick Summary: WiredTiger uses document-level locking so multiple operations can write to the same collection concurrently without blocking each other. Collection-level locking only happens for certain operations like createIndex or collMod. Avoid these in production on large collections. In older MMAPv1, collection-level locking caused severe write contention under concurrent load.
Q31:

What is a working set and why is it important?

Mid

Answer

The working set is frequently accessed data and indexes. Performance drops if it exceeds available RAM.
Quick Summary: Working set is the data and indexes that MongoDB actively uses - what fits in RAM. When working set fits in WiredTiger's cache (60% of RAM by default), reads are served from memory. When working set exceeds RAM, MongoDB pages data to/from disk - causing I/O spikes and slow reads. Size your RAM so the frequently accessed working set fits. Monitor using serverStatus.wiredTiger.cache metrics.
Q32:

What is index intersection?

Mid

Answer

MongoDB can combine multiple indexes to satisfy a query, useful when no single index covers all fields.
Quick Summary: Index intersection allows MongoDB to use two separate indexes to satisfy a single query (instead of requiring a compound index). MongoDB ANDs the results from both indexes. In practice, a well-designed compound index almost always outperforms index intersection. Check explain() output - if you see "AND_HASH" or "AND_SORTED" stages, MongoDB is using index intersection.
Q33:

Why do large documents degrade performance?

Mid

Answer

Large documents slow reads and writes, increase RAM usage, and reduce replication and migration performance.
Quick Summary: Large documents hurt performance: they consume more cache space (fewer docs fit in RAM), take longer to transfer over network, and slow down reads even when you only need a few fields (unless you use projection). If you regularly read only part of a document, consider splitting into multiple documents or using projection to avoid fetching unused fields.
Q34:

What is the difference between primary and secondary reads?

Mid

Answer

Primary reads are strongly consistent, while secondary reads are eventually consistent and used for load balancing.
Quick Summary: Primary reads: always fresh, consistent, but all reads go to one node (can be bottleneck). Secondary reads: distributed across replica set members, reduces primary load, but data may be slightly behind (replication lag). Use secondaries for read-heavy analytics workloads or reporting where slight staleness is acceptable. Never read from secondaries for data that needs to be immediately consistent.
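A small mongosh sketch of routing a read to a secondary (collection name is an example):

```javascript
// Analytics query that tolerates slightly stale data
db.orders.find({ status: "shipped" }).readPref("secondaryPreferred")
```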
Q35:

What is replication lag and why does it occur?

Mid

Answer

Lag occurs when secondaries apply changes more slowly than the primary produces them. Causes include heavy write load, network latency, and slow hardware.
Quick Summary: Replication lag is the delay between a write on the primary and its application on a secondary. Caused by: secondary hardware being slower, heavy write load, network latency, large write operations. Monitor with rs.printSecondaryReplicationInfo(). High lag means secondaries are stale - reads from them return old data and they're slower to take over if primary fails.
Q36:

What is the oplog and how does it support replication?

Mid

Answer

The oplog is a capped collection storing recent operations. Secondaries replay oplog entries to stay in sync.
Quick Summary: The oplog (operations log) is a capped collection on each replica set member that records all write operations. Secondaries continuously tail the primary's oplog and apply operations in order. Replication lag grows when secondaries can't keep up. Change streams use the oplog. Oplog size matters: if a secondary falls too far behind, the oplog might not contain the missing entries.
Q37:

What is majority write concern and why use it?

Mid

Answer

Majority write concern ensures writes are replicated to most nodes, preventing data loss after failovers.
Quick Summary: Majority write concern (w: "majority") ensures the write is acknowledged by the majority of replica set members before returning success. If the primary fails and a new primary is elected, a majority-acknowledged write is guaranteed to be present on the new primary. Without majority concern, writes acknowledged only by the primary can be rolled back during failover.
Q38:

How do you optimize MongoDB for high write throughput?

Mid

Answer

Use good shard keys and bulk writes, avoid unnecessary indexes, keep documents small, and tune the WiredTiger cache.
Quick Summary: High write throughput optimization: use bulk writes (bulkWrite() - fewer round trips), avoid per-document indexes (indexes slow writes), use unordered bulk inserts (continue on error, parallel), distribute writes across shards with a good shard key, avoid transactions where possible (they add overhead), use write concern w:1 if you can tolerate some risk, and benchmark WiredTiger cache size.
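A sketch of an unordered bulk write in mongosh (the "events" collection and documents are made up for illustration):

```javascript
// One round trip; with ordered: false, a failure doesn't stop the remaining operations
db.events.bulkWrite(
  [
    { insertOne: { document: { type: "click", ts: new Date() } } },
    { insertOne: { document: { type: "view",  ts: new Date() } } },
    { updateOne: { filter: { _id: 1 }, update: { $inc: { hits: 1 } }, upsert: true } }
  ],
  { ordered: false, writeConcern: { w: 1 } }
)
```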
Q39:

What is $project in aggregation?

Mid

Answer

$project selects, removes, or transforms fields, helping control output structure and performance.
Quick Summary: $project in aggregation reshapes documents - include specific fields (field: 1), exclude fields (field: 0), rename fields (newName: "$oldName"), and add computed fields using expressions. Reduces document size early in the pipeline to minimize memory used by subsequent stages. Similar to SQL SELECT - define exactly what fields you want in the output.
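A minimal example in mongosh (collection and field names are hypothetical):

```javascript
db.orders.aggregate([
  { $project: {
      _id: 0,
      customer: "$userId",              // rename
      total: 1,                         // include as-is
      year: { $year: "$createdAt" }     // computed field
  } }
])
```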
Q40:

How does MongoDB handle multi-document ACID transactions internally?

Mid

Answer

MongoDB uses WiredTiger snapshot isolation, the oplog, and (for sharded clusters) two-phase commit to ensure atomic multi-document operations.
Quick Summary: MongoDB multi-document transactions use snapshot isolation (read your own writes, consistent view of data as of transaction start). Internally: WiredTiger takes a snapshot at transaction start, all reads see the snapshot, writes are buffered and committed atomically. On commit, WiredTiger checks for write-write conflicts - if another transaction modified the same document, one is aborted and must retry.
