MongoDB Interview Cheat Sheet

1. What is MongoDB?

MongoDB is a NoSQL document database that stores data as JSON-like documents (BSON). Unlike relational databases, there are no tables with fixed schemas - each document can have different fields. It's designed for flexibility, horizontal scaling, and developer productivity. Used widely for catalogs, user profiles, content, and real-time analytics.

Level: Entry

2. What is a document in MongoDB?

A document is a single record in MongoDB stored as BSON (Binary JSON). It contains key-value pairs like a JSON object. Documents can have nested objects and arrays. Example: a user document might have name, email, address (nested object), and orders (array of objects) - all in one document. Maximum document size is 16MB.

Level: Entry

3. What is a collection?

A collection is a group of documents in MongoDB - roughly equivalent to a table in SQL. But unlike SQL tables, collections have no enforced schema by default - documents in the same collection can have different fields. Collections are created automatically when you insert the first document. You query and index at the collection level.

Level: Entry

4. What is a database in MongoDB?

A database in MongoDB is a container for collections. One MongoDB server can host multiple databases. With the default WiredTiger storage engine, each collection and index is stored in its own file under the server's data directory. In the shell, you switch databases with "use dbname". Common practice: one database per application. Unlike SQL, no explicit CREATE DATABASE is needed - the database is created when you first insert data into it.

Level: Entry

5. What is BSON?

BSON (Binary JSON) is the binary format MongoDB uses to store documents. It extends JSON with additional types: Date, ObjectId, Binary data, 32/64-bit integers, Decimal128. BSON is faster to encode/decode than JSON and supports more data types. When you work with MongoDB drivers, you use JSON-like syntax but the data is stored as BSON internally.

Level: Entry

6. What is an ObjectId?

ObjectId is MongoDB's default primary key type - a 12-byte unique identifier automatically generated for the _id field. It encodes: 4-byte timestamp, 5-byte random value (unique per machine/process), 3-byte incrementing counter. This makes ObjectIds roughly sortable by creation time, unique across distributed systems, and generated client-side without DB round-trips.
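The byte layout described above can be checked with a few lines of plain JavaScript. This is a minimal sketch: objectIdToDate and splitObjectId are hypothetical helper names, not driver APIs, and the input is assumed to be the 24-character hex form of an ObjectId.

```javascript
// Recover the creation time stored in the first 4 bytes of an ObjectId.
// Hypothetical helper for illustration; not part of any MongoDB driver.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  // First 8 hex characters = 4-byte big-endian Unix timestamp in seconds.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Split the hex string into the three parts named in the answer above.
function splitObjectId(hex) {
  return {
    timestamp: hex.slice(0, 8), // 4 bytes
    random: hex.slice(8, 18),   // 5 bytes
    counter: hex.slice(18, 24), // 3 bytes
  };
}
```

In real code the drivers and mongosh already expose this as getTimestamp() on an ObjectId, so a helper like this is only useful for understanding the format.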

Level: Entry

7. What is a schema in MongoDB?

MongoDB does not require a schema by default - documents can be inserted without any schema definition. But you can enforce structure using Schema Validation (JSON Schema rules defined on the collection). This lets you have flexible schemas during development but add validation rules as the app matures. Many applications also define schemas at the application layer with an ODM such as Mongoose (Node.js).

Level: Entry

8. What is the purpose of the find() method?

find() queries a collection and returns a cursor of matching documents. Usage: db.users.find({age: {$gt: 18}}). The cursor is lazy - documents are fetched in batches as you iterate. You can chain .sort(), .limit(), and .skip() to shape the results, and pass a projection as the second argument to choose which fields come back (the Node.js driver also offers .project() on the cursor). Without arguments, find() returns all documents in the collection.
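A filtered, sorted, paged query might look like the sketch below. It assumes the Node.js driver, where `users` is a Collection handle and the projection is passed in the options object; the field names and page size are made up for illustration.

```javascript
// Sketch against the Node.js driver's Collection API.
// `users` is assumed to be a Collection; field names are illustrative.
function findAdults(users) {
  return users
    .find(
      { age: { $gt: 18 } },                              // filter
      { projection: { name: 1, email: 1, _id: 0 } }      // fields to return
    )
    .sort({ name: 1 })  // ascending by name
    .skip(20)           // skip the first two pages of 10
    .limit(10);         // returns a cursor; nothing is fetched until iteration
}
```

Because the cursor is lazy, the caller decides how to consume it, for example with `for await (const doc of findAdults(users))` or `.toArray()`.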

Level: Entry

9. What is the difference between find() and findOne()?

find() returns a cursor with all matching documents - you iterate through them. findOne() returns the first matching document directly (not a cursor), or null if none found. Use findOne() when you only need one result and don't want to deal with cursor iteration. It's slightly more efficient when you genuinely only need one document.

Level: Entry

10. What does the updateOne() function do?

updateOne() updates the first document matching a filter. Takes two args: filter (which docs to match) and update (what to change). Use $set to change specific fields without replacing the whole document. Returns an object with matchedCount and modifiedCount. If you want to update all matching documents, use updateMany() instead.

Level: Entry

11. What is a deleteOne() operation?

deleteOne() removes the first document matching a filter. db.users.deleteOne({_id: id}) deletes exactly one user. Returns deletedCount. If multiple documents match the filter, only the first found is deleted. For deleting all matching documents, use deleteMany(). Always double-check your filter before running delete operations in production.

Level: Entry

12. What is field projection in MongoDB?

Field projection controls which fields are returned in query results - reduces data transfer and memory usage. In find(), the second argument is the projection: {name: 1, email: 1} returns only name and email. {password: 0} excludes the password field. You can't mix include and exclude in the same projection (except for _id which can always be excluded).

Level: Entry

13. What is an index in MongoDB?

An index in MongoDB is a data structure (B-tree) that speeds up queries by allowing MongoDB to find documents without scanning the entire collection. Without a suitable index, a query does a full collection scan (COLLSCAN). Create an index on frequently queried fields: db.users.createIndex({email: 1}). Every index must be updated on each write, so too many indexes slow down writes.

Level: Entry

14. What is a primary key in MongoDB?

Every MongoDB document has an _id field that serves as the primary key - it must be unique within the collection. By default, MongoDB auto-generates an ObjectId for _id. You can provide your own _id value (string, int, etc.) but it must be unique. The _id field is always indexed automatically.

Level: Entry

15. What is a replica set?

A replica set is a group of MongoDB servers that maintain the same dataset. One is the primary (handles writes), the rest are secondaries (replicate from primary, can serve reads). If the primary fails, secondaries elect a new primary automatically (failover). Provides high availability and data redundancy. Minimum 3 nodes recommended for proper elections.

Level: Entry

16. What is sharding in MongoDB?

Sharding distributes data across multiple servers (shards) to handle datasets too large for one machine or write throughput too high for one server. Each shard holds a subset of the data determined by the shard key. A mongos router directs queries to the right shard(s). Config servers store the metadata about which chunks live on which shard.

Level: Entry

17. What is MongoDB Atlas?

MongoDB Atlas is MongoDB's fully managed cloud database service. It runs on AWS, Azure, or GCP. Atlas handles provisioning, backups, monitoring, scaling, security patches, and upgrades. It also provides Atlas Search (full-text), Atlas Data Lake, Atlas Charts, and Online Archive. MongoDB recommends it for production because it removes most of the operational overhead of running your own cluster.

Level: Entry

18. What is the difference between MongoDB and a relational database?

MongoDB vs relational: MongoDB stores data as flexible documents (no fixed schema), relational uses tables with fixed columns. MongoDB has no SQL-style JOIN; related data is either embedded or joined with $lookup in an aggregation pipeline. MongoDB scales horizontally via sharding; relational typically scales vertically. Relational is better for complex multi-table transactions and highly structured data. MongoDB wins for flexible, hierarchical, and rapidly evolving schemas.

Level: Entry

19. What is the purpose of the $set operator?

$set updates specific fields of a document without replacing the whole thing. db.users.updateOne({_id: id}, {$set: {name: "Alice", age: 30}}). Only the specified fields change; other fields stay intact. A plain object without operators means full replacement (losing all other fields): the legacy update() did this silently, replaceOne() does it explicitly, and updateOne() rejects such an update document. Always use $set for partial updates.
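The difference between a partial update and a full replacement can be pictured in plain JavaScript. This is an in-memory illustration only: applySet and applyReplace are hypothetical helpers that mimic the effect on one document with top-level fields, not MongoDB APIs.

```javascript
// In-memory illustration only; these helpers are not MongoDB APIs.

// Effect of {$set: fields}: named fields change, everything else is kept.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

// Effect of a replacement: only _id survives from the old document.
function applyReplace(doc, replacement) {
  return { _id: doc._id, ...replacement };
}

const before = { _id: 1, name: "Al", age: 29, email: "al@example.com" };
const afterSet = applySet(before, { name: "Alice", age: 30 });
const afterReplace = applyReplace(before, { name: "Alice", age: 30 });

console.log(afterSet);     // still has the email field
console.log(afterReplace); // email is gone
```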

Level: Entry

20. What does the $inc operator do?

$inc atomically increments (or decrements) a numeric field by the given amount. db.products.updateOne({_id: id}, {$inc: {stock: -1, views: 1}}). Decrements if the value is negative. Atomic - safe for concurrent updates (no read-modify-write race condition). Commonly used for counters, inventory tracking, and vote counts.
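A common inventory pattern combines $inc with a guard in the filter so that stock cannot go below zero. The sketch below assumes the Node.js driver; reserveOne, the `products` collection handle, and the `stock` field are illustrative names.

```javascript
// Sketch: atomically take one unit of stock, but only if any is left.
// `products` is assumed to be a Node.js driver Collection.
async function reserveOne(products, productId) {
  const result = await products.updateOne(
    { _id: productId, stock: { $gte: 1 } }, // guard: matches only if stock remains
    { $inc: { stock: -1 } }                 // decrement happens atomically on the server
  );
  // modifiedCount is 0 when the product is missing or out of stock.
  return result.modifiedCount === 1;
}
```

Because the check and the decrement happen in one server-side operation, two concurrent callers cannot both take the last unit.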

Level: Entry

21. What is a capped collection and when should it be used?

Capped collections have a fixed maximum size (in bytes) and optionally a max document count. When full, oldest documents are automatically overwritten by new ones (circular buffer). No deletes needed. Use for: logs, event streams, caches where only recent data matters. Insert order is maintained. Downside: can't delete individual documents, limited update operations.

Level: Junior

22. What is the difference between $push and $addToSet?

$push appends a value to an array even if it already exists - can create duplicates. $addToSet adds a value only if it doesn't already exist in the array - like a set in math. Use $addToSet when maintaining unique values (tags, categories, user IDs). Use $push when order matters or duplicates are allowed (event log entries).
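The two behaviours can be mimicked on a plain array. This is a simplified in-memory sketch: the helper names are made up, and equality here is JavaScript's includes() on primitive values, whereas MongoDB compares full BSON values.

```javascript
// In-memory stand-ins for the two operators; not MongoDB APIs.

// $push: always append, duplicates allowed.
function push(arr, value) {
  return [...arr, value];
}

// $addToSet: append only if the value is not already present.
function addToSet(arr, value) {
  return arr.includes(value) ? arr : [...arr, value];
}

const tags = ["mongodb", "nosql"];
console.log(push(tags, "mongodb"));     // duplicate is appended
console.log(addToSet(tags, "mongodb")); // array is unchanged
console.log(addToSet(tags, "indexes")); // new value is appended
```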

Level: Junior

23. What is an embedded document and when is embedding recommended?

Embedded documents store related data together in one document (address inside a user doc). Recommended when data is accessed together, relationship is one-to-one or one-to-few, and child data doesn't grow unboundedly. Referencing stores the related document's _id and uses $lookup for joins. Use referencing for many-to-many, frequently changing data, or data shared across documents.

Level: Junior

24. What is data referencing in MongoDB?

Data referencing stores the _id of a related document instead of embedding the data. Like a foreign key in SQL. Used when: data is large, shared across many documents, or independently accessed. Requires a separate query or $lookup to fetch the referenced data. Trade-off: two queries or slower $lookup vs embedded doc simplicity.

Level: Junior

25. What is the purpose of the aggregation pipeline?

The aggregation pipeline processes documents through a series of stages to transform and analyze data. Common stages: $match (filter), $group (aggregate by field), $sort, $project (reshape), $lookup (join), $unwind (flatten arrays), $limit, $skip. Each stage passes its output to the next. More powerful than find() for analytics and data transformation.
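A pipeline is just an array of stage documents, which makes it easy to read top to bottom. The sketch below assumes a hypothetical `orders` collection whose documents have a `status` field and an `items` array with `category`, `price`, and `qty`; none of these names come from a real schema.

```javascript
// Sketch: top 5 categories by revenue from paid orders.
// Collection shape and field names are assumptions for illustration.
const revenueByCategory = [
  { $match: { status: "paid" } },  // filter early so later stages see less data
  { $unwind: "$items" },           // one document per order item
  {
    $group: {
      _id: "$items.category",
      revenue: { $sum: { $multiply: ["$items.price", "$items.qty"] } },
    },
  },
  { $sort: { revenue: -1 } },
  { $limit: 5 },
];

// `orders` is assumed to be a collection handle (mongosh or Node.js driver).
function topCategories(orders) {
  return orders.aggregate(revenueByCategory);
}
```

Putting $match first is a deliberate choice: it lets the server use an index and shrinks the input for every stage that follows.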

Level: Junior

26. What is $lookup used for?

$lookup performs a left outer join between collections in an aggregation pipeline. It matches documents from the "from" collection based on a localField/foreignField pair and adds matched docs as an array in the output; input documents with no match get an empty array. Similar to a SQL LEFT OUTER JOIN. Performance tip: $lookup is expensive - consider embedding if you always access data together.
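The stage document and its left-outer-join behaviour can be sketched as follows. The collection and field names (`customers`, `customerId`, `customer`) are assumptions, and lookupInMemory is a simplified in-memory stand-in that only handles strict equality on scalar fields.

```javascript
// Stage document: join each order to its customer.
// Collection and field names are illustrative assumptions.
const customerLookupStage = {
  $lookup: {
    from: "customers",
    localField: "customerId",
    foreignField: "_id",
    as: "customer",
  },
};

// Simplified in-memory picture of what the stage produces; not a MongoDB API.
function lookupInMemory(docs, foreignDocs, { localField, foreignField, as }) {
  return docs.map((doc) => ({
    ...doc,
    // Every input document is kept; unmatched ones get an empty array.
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const orders = [{ _id: 1, customerId: "c1" }, { _id: 2, customerId: "c9" }];
const customers = [{ _id: "c1", name: "Alice" }];
const joined = lookupInMemory(orders, customers, customerLookupStage.$lookup);
```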

Level: Junior

27. What is the difference between insertOne and insertMany?

insertOne() inserts a single document and returns a result containing the inserted document's _id (insertedId). insertMany() inserts an array of documents in one operation - faster than calling insertOne() in a loop because it avoids a network round-trip per document. insertMany() by default stops on first error (ordered mode). Set {ordered: false} to continue inserting remaining documents even if some fail.

Level: Junior

28. What is the purpose of TTL indexes?

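TTL (Time To Live) indexes automatically delete documents a specified number of seconds after the time stored in the indexed field. Created with expireAfterSeconds: db.sessions.createIndex({createdAt: 1}, {expireAfterSeconds: 3600}) deletes documents 1 hour after their createdAt value. The indexed field must hold a date (or an array of dates); documents where it doesn't never expire. MongoDB runs a background cleanup process every 60 seconds, so deletion can lag expiry slightly. Use for: sessions, cache entries, temporary data, audit logs with retention policies.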
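The expiry rule is simple date arithmetic, sketched here in plain JavaScript. expiresAt and isExpired are hypothetical helpers that model when a document becomes eligible for deletion; the actual removal is done by the server's background task, not by application code.

```javascript
// Model of TTL eligibility; these helpers are not MongoDB APIs.

// Returns the moment a document becomes eligible for deletion,
// or null if the indexed field is not a Date (such documents never expire).
function expiresAt(doc, field, expireAfterSeconds) {
  const value = doc[field];
  if (!(value instanceof Date)) return null;
  return new Date(value.getTime() + expireAfterSeconds * 1000);
}

function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const deadline = expiresAt(doc, field, expireAfterSeconds);
  return deadline !== null && now >= deadline;
}

const session = { createdAt: new Date("2024-01-01T00:00:00Z") };
console.log(expiresAt(session, "createdAt", 3600).toISOString());
```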

Level: Junior

29. What is the explain function and why is it useful?

explain() shows how MongoDB executes a query - which index was used (IXSCAN vs COLLSCAN), how many documents were examined, execution time, and query plan. Use explain("executionStats") for detailed stats. Essential for performance debugging - if you see COLLSCAN on a frequently run query, you need an index. Always run explain() on new queries in development.

Level: Junior

30. What is a write concern?

Write concern controls how many replica set members must acknowledge a write before MongoDB considers it successful. w:1: only the primary acknowledges (the default before MongoDB 5.0). w:majority: a majority of members must acknowledge - safer, slower (the implicit default since MongoDB 5.0 for most deployments). w:0: fire and forget. Higher write concern = stronger durability guarantee but higher latency. Choose based on your data loss tolerance.

Level: Junior

31. What is a read preference?

Read preference controls which replica set member handles read operations. primary: all reads from primary (consistent, default). primaryPreferred: primary if available, else secondary. secondary: always read from secondaries (may be slightly stale). secondaryPreferred: secondaries when available. nearest: lowest network latency. Use secondaries to distribute read load but accept eventual consistency.

Level: Junior

32. What is journaling in MongoDB?

Journaling writes every write operation to an on-disk journal (write-ahead log) before applying it to data files. If MongoDB crashes mid-write, it replays the journal on restart to recover to a consistent state. Journaling is enabled by default. Without it, writes made after the last checkpoint would be lost in a crash.

Level: Junior

33. What is $regex used for?

$regex filters documents where a string field matches a regular expression. db.users.find({name: {$regex: "^alice", $options: "i"}}) finds users whose name starts with "alice" (case-insensitive). Performance warning: only case-sensitive patterns anchored to the start of the string (^) can use an index efficiently. Unanchored patterns (such as a leading wildcard) and case-insensitive patterns like the one above have to scan every index entry or the whole collection. Anchor patterns to the start (^) and keep them case-sensitive when possible.

Level: Junior

34. What is the difference between save and update?

save() is a deprecated legacy helper that modern shells and drivers have dropped in favor of explicit methods. Previously: if the document had an _id that matched an existing document, it replaced the whole document; otherwise it inserted. update() (now updateOne/updateMany) modifies specific fields. Always use insertOne/updateOne/replaceOne explicitly - they're clearer about intent and safer than the old save() which could silently replace entire documents.

Level: Junior

35. What is sharding key selection and why is it important?

The shard key determines how data is distributed across shards. A good shard key has high cardinality (many distinct values), even write distribution (avoid hotspots), and is included in most queries. Bad choices with range-based sharding: monotonically increasing keys (like timestamps or ObjectId) send all new writes to the shard holding the highest range. A hashed shard key distributes such values evenly across shards, at the cost of efficient range queries on that key.
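The hotspot problem can be seen in a small simulation. This is a toy model under stated assumptions: four shards, fixed split points, and FNV-1a as a stand-in hash chosen for brevity (MongoDB uses its own hash function for hashed shard keys). All function names are hypothetical.

```javascript
// Toy model of routing; not how mongos is implemented.

// Range sharding: a key goes to the chunk whose range contains it.
function rangeShard(key, splitPoints) {
  let shard = 0;
  while (shard < splitPoints.length && key >= splitPoints[shard]) shard++;
  return shard;
}

// Hashed sharding stand-in using 32-bit FNV-1a (an assumption for the sketch).
function hashedShard(key, shardCount) {
  let h = 0x811c9dc5;
  for (const ch of String(key)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return h % shardCount;
}

// Count how many keys each shard receives.
function distribution(keys, route) {
  const counts = {};
  for (const key of keys) {
    const shard = route(key);
    counts[shard] = (counts[shard] || 0) + 1;
  }
  return counts;
}

// 100 new, monotonically increasing keys, all above the last split point.
const newKeys = Array.from({ length: 100 }, (_, i) => 1000 + i);
const byRange = distribution(newKeys, (k) => rangeShard(k, [250, 500, 750]));
const byHash = distribution(newKeys, (k) => hashedShard(k, 4));

console.log(byRange); // every key lands on the last shard
console.log(byHash);  // keys are spread over several shards
```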

Level: Junior

36. How does MongoDB handle schema flexibility while still allowing schema validation?

MongoDB supports schema flexibility by default but lets you add validation via JSON Schema rules on a collection. You specify required fields, field types, value ranges, and patterns. Validation happens on insert and update. Use validationAction: "warn" during migration (logs violations without rejecting) or "error" to enforce strictly. This balances flexibility with data integrity.
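A validator is a document passed when the collection is created. The sketch below assumes a `users` collection with `name`, `email`, and `age` fields, all of which are illustrative; createCollection with validator, validationLevel, and validationAction is available in both mongosh and the Node.js driver.

```javascript
// Sketch: JSON Schema validator for a hypothetical users collection.
const userValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name", "email"],
    properties: {
      name: { bsonType: "string" },
      email: { bsonType: "string", pattern: "^.+@.+$" },
      age: { bsonType: "int", minimum: 0, maximum: 150 },
    },
  },
};

// `db` is assumed to be a database handle (mongosh `db` or driver Db).
function createUsersCollection(db) {
  return db.createCollection("users", {
    validator: userValidator,
    validationLevel: "strict",  // apply to all inserts and updates
    validationAction: "error",  // reject invalid documents; "warn" only logs
  });
}
```

Fields not listed under properties are still allowed here, which keeps the schema flexible while pinning down the parts that matter.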

Level: Mid

37. What are the main differences between embedding and referencing in MongoDB?

Embedding: store related data in one document. Pro: one read, atomic updates, no joins. Con: document size limit, data duplication. Referencing: store _id, fetch separately. Pro: no duplication, smaller documents, shared data. Con: requires extra query or $lookup. Rule: embed when data is accessed together and is one-to-few. Reference when data is shared, large, or frequently updated independently.

Level: Mid

38. How do compound indexes improve query performance?

Compound indexes cover multiple fields in a specific order. db.orders.createIndex({userId: 1, createdAt: -1}) supports queries filtering by userId and sorting by createdAt descending. This is much faster than two separate indexes because MongoDB traverses one B-tree. The order of fields in the index matters - place equality fields first, then sort fields, then range fields.

Level: Mid

39. What is an index prefix and why does it matter in compound indexing?

Index prefix means a compound index {a, b, c} supports queries on {a}, {a, b}, or {a, b, c} but NOT on {b} or {c} alone. MongoDB can only use a compound index from the leftmost field forward. If you frequently query by {b} alone, you need a separate index. Designing indexes with the right field order avoids creating redundant indexes.
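The prefix rule can be expressed as a small check. isIndexPrefix is a hypothetical helper that only asks whether the queried fields are exactly the first N fields of the index; it is a simplification, since a query on {a, c} can still use the {a} prefix for part of its work.

```javascript
// Sketch: do the queried fields form a prefix of the compound index?
// Hypothetical helper; the real query planner considers much more.
function isIndexPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size === 0 || wanted.size > indexFields.length) return false;
  // The first N index fields must be exactly the queried fields;
  // the order of fields inside the query itself does not matter.
  return indexFields.slice(0, wanted.size).every((f) => wanted.has(f));
}

const index = ["a", "b", "c"];
console.log(isIndexPrefix(index, ["a"]));      // prefix {a}
console.log(isIndexPrefix(index, ["b", "a"])); // prefix {a, b}
console.log(isIndexPrefix(index, ["b"]));      // not a prefix
```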

Level: Mid

40. What is the purpose of an aggregation pipeline’s $facet stage?

$facet runs multiple sub-pipelines on the same set of input documents within a single stage, each producing its own field in the output document. Useful for building faceted search results - one sub-pipeline counts by category, another by price range, another for the actual results. All computed in one aggregation pass instead of multiple queries.

Level: Mid

41. What is $unwind and why is it used?

$unwind deconstructs an array field - for each element in the array, it outputs a separate document. Example: a product document with a sizes array [S, M, L] becomes three documents, one per size. Necessary when you want to group, filter, or sort by individual array elements in an aggregation pipeline.
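The effect on a single field can be reproduced with flatMap. This is an in-memory sketch of the default behaviour only (without preserveNullAndEmptyArrays); unwind is a hypothetical helper, not a MongoDB API.

```javascript
// In-memory picture of {$unwind: "$<field>"} with default options.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return []; // missing or null: dropped
    if (!Array.isArray(value)) return [doc];              // scalar: passed through once
    // One output document per element; an empty array yields no output.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

const products = [
  { _id: 1, name: "T-shirt", sizes: ["S", "M", "L"] },
  { _id: 2, name: "Cap", sizes: [] },
];
console.log(unwind(products, "sizes")); // three T-shirt documents, no Cap
```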

Level: Mid

42. What is a covered query in MongoDB?

A covered query is satisfied entirely by the index - MongoDB never reads the actual documents. This is the fastest possible query execution. For a query to be covered: all queried fields must be in the index, all projected fields must be in the index, and _id must be excluded from the projection (unless also in the index). Verify with explain() - look for "totalDocsExamined: 0".

Level: Mid

43. What is index cardinality and how does it affect performance?

Index cardinality is the number of distinct values an indexed field has. High cardinality (user email, userId) = index is very selective = fast queries. Low cardinality (boolean, status with 3 values) = index is not selective = MongoDB may skip it and prefer a collection scan. Index the high-cardinality fields that your queries filter on. Low-cardinality fields work better as second fields in compound indexes.

Level: Mid

44. What are multi-key indexes?

Multi-key indexes are created on array fields. MongoDB creates an index entry for every element in the array. This allows efficient queries on array contents: find all users where tags contains "mongodb". MongoDB automatically detects and creates a multi-key index when you index an array field. Limitation: a compound index can have at most one multi-key field.

Level: Mid

45. What is the difference between $in and $nin in performance?

$in queries documents where a field value is in a provided array. MongoDB uses the index efficiently if the array is small. $nin is "not in" - much slower because it can't use indexes effectively for negative conditions (has to scan all non-matching values). Avoid $nin on large collections. Use a whitelist ($in) approach instead of blacklist ($nin) when possible.

Level: Mid

46. What is write concern and why is it important?

Write concern defines durability guarantees. w:1: primary wrote to memory. w:majority: majority of replica set persisted the write - survives primary failure. j:true: write is persisted to journal before acknowledging (survives crashes). For financial data or anything you can't lose: use {w: "majority", j: true}. Higher concern = higher latency.
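Write concern can be set per operation. The sketch below assumes the Node.js driver and a hypothetical `payments` collection; the writeConcern document uses the server's field names (w, j), and some driver versions prefer `journal` in place of `j`, so check the option names for the driver you use.

```javascript
// Sketch: insert that is acknowledged only after a majority of members
// have journaled it. `payments` is assumed to be a driver Collection.
function insertDurably(payments, doc) {
  return payments.insertOne(doc, {
    writeConcern: { w: "majority", j: true },
  });
}
```

Setting the concern per call keeps the stricter, slower guarantee limited to the writes that need it.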

Level: Mid

47. What is read concern in MongoDB?

Read concern controls data freshness and isolation for read operations. local: returns data that may not be majority-committed (default). majority: returns data acknowledged by majority of replicas - won't roll back. linearizable: guarantees reading the most recent majority-committed data (slowest). snapshot: consistent point-in-time view for transactions. Choose based on consistency requirements.

Level: Mid

48. How does MongoDB ensure durability during crashes?

MongoDB ensures durability through: journaling (write-ahead log persists writes before applying), WiredTiger checkpoints (periodic full data snapshots), and replica set replication (data copied to multiple nodes). On crash, MongoDB replays the journal from the last checkpoint to restore to a consistent state. With write concern majority + journaling enabled, committed writes survive node failures.

Level: Mid

49. What are write-ahead logs (journal files) and how do they work?

Write-ahead logging (WAL / journal): before MongoDB applies any data change, it first writes the operation to the journal file on disk. If the server crashes mid-write, MongoDB replays the journal on startup to complete or roll back the incomplete operation. This ensures the data files are never left in a partially-written, inconsistent state after a crash.

Level: Mid

50. What is a MongoDB transaction and when is it needed?

MongoDB transactions provide ACID guarantees across multiple documents and collections (since MongoDB 4.0 for replica sets, 4.2 for sharded clusters). Use when you need to update multiple documents atomically - e.g., transfer money between two accounts. Transactions have performance overhead - they hold locks and use snapshot isolation. Design schemas to minimize transaction needs.
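The money-transfer case might be sketched like this with the Node.js driver's session API (startSession, withTransaction, endSession). The database name `bank`, the `accounts` collection, and the `balance` field are assumptions for illustration, and `client` is assumed to be a connected MongoClient on a replica set.

```javascript
// Sketch: move `amount` between two accounts atomically.
// Database, collection, and field names are illustrative assumptions.
async function transfer(client, fromId, toId, amount) {
  const accounts = client.db("bank").collection("accounts");
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      // Debit only if the balance covers the amount.
      const debit = await accounts.updateOne(
        { _id: fromId, balance: { $gte: amount } },
        { $inc: { balance: -amount } },
        { session }
      );
      if (debit.modifiedCount !== 1) {
        // Throwing inside the callback aborts the transaction.
        throw new Error("insufficient funds or unknown account");
      }
      await accounts.updateOne(
        { _id: toId },
        { $inc: { balance: amount } },
        { session }
      );
    });
  } finally {
    await session.endSession();
  }
}
```

Every operation inside the callback must receive the same session, otherwise it runs outside the transaction.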

Level: Mid