Optimizing MongoDB Queries with Indexing in NestJS & TypeScript

How much data is a lot of data?

In databases, the answer often depends on the application and its use cases. For a startup, “a lot of data” might mean a few gigabytes, while for an enterprise, it could be terabytes or petabytes. Regardless of scale, as data grows, efficient queries become crucial, and that’s where indexing comes into play.

Indexes enable database servers to execute queries far more efficiently by avoiding the need to scan the entire collection of documents. Sounds impressive, doesn’t it? In this article, we’ll explore the advantages and potential trade-offs of indexing in MongoDB.

What are indexes?

Indexes in databases are analogous to indexes in a book. Imagine you’re looking for a specific chapter in a 500-page book. Without an index, you’d have to flip through every page to find what you’re looking for. With an index, however, you can jump directly to the right page. Similarly, indexes in databases provide a shortcut to data, allowing queries to find relevant information quickly.

In technical terms, an index is a data structure that improves the speed of data retrieval operations on a database table or collection. It works like a table of contents in a book, allowing the database to locate data without scanning the entire dataset.

How do they work?

Indexes are data structures that store a subset of data (like specific fields) in a way that makes it easier for the database engine to search through them. Instead of scanning the entire dataset, the database can look at the index to pinpoint the exact location of the required data.

Here’s a simple example of how indexes work:

User ID	Location	Pointer to Data
1	Gdansk	Document #1
2	Warsaw	Document #2
3	Krakow	Document #3
4	Gdansk	Document #4
5	Poznan	Document #5

In this table, the index is built on the “Location” field. It contains a mapping of locations to pointers that indicate where the actual data resides. For instance, if you query for users in “Gdansk,” the database uses this index to directly locate the relevant documents (#1 and #4) instead of scanning the entire dataset.
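The lookup described above can be sketched in a few lines of TypeScript. This is a toy model only: MongoDB indexes are B-tree structures, while this sketch uses a plain lookup table, and the names UserDoc, buildLocationIndex, findByScan, and findByIndex are made up for illustration.

```typescript
// Toy model of a secondary index; not how MongoDB stores indexes internally.
interface UserDoc {
  userId: number;
  location: string;
}

const users: UserDoc[] = [
  { userId: 1, location: 'Gdansk' },
  { userId: 2, location: 'Warsaw' },
  { userId: 3, location: 'Krakow' },
  { userId: 4, location: 'Gdansk' },
  { userId: 5, location: 'Poznan' },
];

// Map each location to the array positions ("pointers") of matching documents.
function buildLocationIndex(docs: UserDoc[]): Record<string, number[]> {
  const index: Record<string, number[]> = Object.create(null);
  docs.forEach((doc, position) => {
    if (!index[doc.location]) {
      index[doc.location] = [];
    }
    index[doc.location].push(position);
  });
  return index;
}

// Without an index: every document is inspected.
function findByScan(docs: UserDoc[], location: string): UserDoc[] {
  return docs.filter((doc) => doc.location === location);
}

// With an index: go straight to the stored positions.
function findByIndex(
  docs: UserDoc[],
  index: Record<string, number[]>,
  location: string,
): UserDoc[] {
  const positions = index[location] || [];
  return positions.map((position) => docs[position]);
}

const locationIndex = buildLocationIndex(users);
const gdanskUsers = findByIndex(users, locationIndex, 'Gdansk');
```

Both functions return the same documents; the difference is that findByScan touches all five entries, while findByIndex only touches the two positions stored under “Gdansk”.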

Setup guide

To follow along with this article, you’ll need the following:

Backend: NestJS with TypeScript and Mongoose.
Database: A dockerized MongoDB instance.

The repository with the code used in this article can be found here.

Indexing with Mongoose

Let’s say we want to retrieve users located in Gdańsk from a users collection. Here’s a simple controller and service for that:

users.controller.ts

@Get('location/search')
findByLocation(@Query('location') location: string) {
  return this.usersService.findByLocation(location);
}

users.service.ts

async findByLocation(location: string) {
  return this.userModel
    .find({ location })
    .explain('executionStats');
}

The .explain('executionStats') call makes MongoDB return the query plan and execution statistics for the query instead of the matching documents.

Here’s the result for our users collection, which contains 11 documents:

{
  "explainVersion": "1",
  "queryPlanner": {
    "namespace": "nestjs.users",
    "parsedQuery": {
      "location": {
        "$eq": "Gdansk"
      }
  ...
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 2,
    "executionTimeMillis": 4,
    "totalKeysExamined": 0,
    "totalDocsExamined": 11,
    ...
}

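As you can see, MongoDB examined all 11 documents (totalDocsExamined) to return just 2 (nReturned), and totalKeysExamined is 0, meaning no index entries were read. In other words, finding users with a specific location required scanning the entire collection. This isn’t efficient, especially with larger datasets.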
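To make these numbers easier to compare between runs, a small helper can condense them. The sketch below is a hypothetical utility, not part of Mongoose or the MongoDB driver; it only assumes the three executionStats fields shown in the output above.

```typescript
// Hypothetical helper; field names mirror the executionStats output above.
interface ExecutionStats {
  nReturned: number;
  totalKeysExamined: number;
  totalDocsExamined: number;
}

interface ScanSummary {
  examinedIndexKeys: boolean;
  docsExaminedPerReturned: number;
}

function summarizeExecutionStats(stats: ExecutionStats): ScanSummary {
  return {
    // totalKeysExamined counts index entries read while running the query.
    examinedIndexKeys: stats.totalKeysExamined > 0,
    // Close to 1 is ideal: almost every examined document ended up in the result.
    docsExaminedPerReturned:
      stats.nReturned === 0
        ? stats.totalDocsExamined
        : stats.totalDocsExamined / stats.nReturned,
  };
}

// Numbers taken from the explain output above.
const withoutIndex = summarizeExecutionStats({
  nReturned: 2,
  totalKeysExamined: 0,
  totalDocsExamined: 11,
});
// withoutIndex.examinedIndexKeys is false and
// withoutIndex.docsExaminedPerReturned is 5.5
```

A ratio well above 1 combined with no index keys examined is a reasonable hint that the query is scanning documents it does not need.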

Adding an index

Using Mongoose, we can easily create an index by adding the index: true option to a property in the schema:

user.schema.ts

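import { Prop, Schema } from '@nestjs/mongoose';

@Schema({ timestamps: true })
export class User {
  @Prop({ required: true })
  name: string;

  @Prop({ required: true, unique: true })
  email: string;

  // index: true asks Mongoose to create a single-field index on location
  @Prop({ required: true, index: true })
  location: string;
}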

This creates a secondary index on the location property. With Mongoose’s default autoIndex setting, the index is built when the application starts and connects to the database, and it covers both existing and new documents in the collection.

Now, running the same findByLocation query produces these results:

{
  "explainVersion": "1",
  "queryPlanner": {
    "namespace": "nestjs.users",
    "parsedQuery": {
      "location": {
        "$eq": "Gdansk"
      }
  ...
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 2,
    "executionTimeMillis": 1,
    "totalKeysExamined": 2,
    "totalDocsExamined": 2,
    ...
}

This time, MongoDB examined only 2 index keys and 2 documents instead of all 11. Although the dataset is small, the improvement is evident: the query now reads exactly as many documents as it returns.

Unique indexes

In the schema, you’ll notice we used both index: true and unique: true. It’s important to understand that the unique flag also creates an index, but with the added constraint of ensuring that all values in the field are unique.

This happens because MongoDB enforces uniqueness through the index: on every insert or update, it looks up the value in the index and rejects the write if that value already exists. This is particularly useful for fields like emails, where duplicates are not allowed.
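When a write violates a unique index, MongoDB rejects it with a duplicate key error, which carries error code 11000. The sketch below shows one way an application might recognize that case; the function names and the returned messages are hypothetical, and the shape of the error object is an assumption that should be checked against the driver version in use.

```typescript
// MongoDB reports unique index violations as "E11000 duplicate key" errors.
const DUPLICATE_KEY_ERROR_CODE = 11000;

// Hypothetical helper: assumes the thrown error exposes a numeric `code`.
function isDuplicateKeyError(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) {
    return false;
  }
  return (error as { code?: unknown }).code === DUPLICATE_KEY_ERROR_CODE;
}

// Hypothetical mapping from a failed save to a user-facing message.
function describeSaveError(error: unknown): string {
  return isDuplicateKeyError(error)
    ? 'A user with this email already exists.'
    : 'Unexpected error while saving the user.';
}
```

In a NestJS service, a check like this would typically sit in the catch block around the call that creates the user, so a duplicate email can be reported as a conflict rather than a generic server error.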

Downsides

While indexes improve query performance, they are not always a perfect solution.

Increased Disk Space Usage: Indexes consume additional disk space, which can grow significantly as you add more indexed fields and documents.

Overhead for Write Operations: Every insert and delete, and every update that touches an indexed field, requires the database to update the relevant indexes as well. This additional work can slow down write-heavy applications.
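The write overhead can be illustrated with a toy in-memory collection that counts how many index structures each insert has to touch. This is a simplified sketch: ToyCollection and its indexWrites counter are invented for illustration and say nothing about MongoDB’s actual storage engine.

```typescript
// Toy model: every indexed field adds one more structure to maintain per insert.
type ToyDoc = Record<string, string>;

class ToyCollection {
  private docs: ToyDoc[] = [];
  private indexes: Record<string, Record<string, number[]>> = Object.create(null);
  indexWrites = 0;

  constructor(indexedFields: string[]) {
    indexedFields.forEach((field) => {
      this.indexes[field] = Object.create(null);
    });
  }

  insert(doc: ToyDoc): void {
    // Store the document itself and remember its position.
    const position = this.docs.push(doc) - 1;
    // Then update every index, one extra write per indexed field.
    Object.keys(this.indexes).forEach((field) => {
      const entries = this.indexes[field];
      const value = doc[field];
      if (!entries[value]) {
        entries[value] = [];
      }
      entries[value].push(position);
      this.indexWrites += 1;
    });
  }
}

const usersWithIndexes = new ToyCollection(['email', 'location']);
usersWithIndexes.insert({ name: 'Anna', email: 'anna@example.com', location: 'Gdansk' });
usersWithIndexes.insert({ name: 'Piotr', email: 'piotr@example.com', location: 'Warsaw' });
// usersWithIndexes.indexWrites is now 4: two documents times two indexes.
```

The same two inserts into a collection created with no indexed fields would leave indexWrites at 0, which is the trade-off in miniature: faster reads are paid for on every write.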

Summary

Indexes are a powerful tool for improving query performance, but they’re not a silver bullet. A poorly designed query can still perform badly, even with indexes in place. Proper query design and an understanding of MongoDB’s query planner are crucial to getting the best performance.

When used strategically, indexes can significantly enhance your MongoDB application’s performance, but always consider the trade-offs before deciding to implement them. Let your application’s specific use case and workload guide you.