Guide to MongoDB Monitoring

Monitoring is a key aspect of database administration. It lets you gauge database health, measure its performance over time, determine and plan its capacity, diagnose issues in real-time, and prevent downtime.

MongoDB is an open-source document database powering thousands of production infrastructures. In this article, we share a comprehensive guide to MongoDB monitoring. We’ll discuss its architecture and explore using native tools to monitor critical performance metrics.

What is MongoDB?

MongoDB is a NoSQL database that stores data as collections of JSON documents. A JSON document contains a set of key-value pairs to represent information. Data inside NoSQL databases doesn’t have to conform to a rigid schema; documents can differ in size and content.

Flexible schemas allow you to quickly adapt your database according to changing business requirements. MongoDB has aligned its data structures to those of popular programming languages, enabling your developers to do more with less code.

JSON documents allow you to store related data together, which leads to faster queries, as you don’t have to join different collections to retrieve data. For example, consider the following JSON where you embed all the relevant details of an order:

{ 
"order_id": "1234",
"transaction_id": "1234",
"customer_id": "1234"
}

Storing the same data in a normalized SQL database may require three different tables: orders, transactions, and customers.

A quick look at MongoDB architecture

In the MongoDB world, a record is known as a document. A collection is a group of documents; it’s the NoSQL equivalent of a table. A database contains different collections. A collection can only be present in one database. A MongoDB instance can have multiple databases, and for each database, a new set of files is created on the file system.

Organizing data as documents allows you to store different kinds of data in the same collection. This gives your developers the luxury of implementing new features without making structural changes in the database. They can add, remove, or update fields in a JSON document without changing anything in the corresponding collection or affecting the existing documents.

If you want the documents of a collection to have the same structure, you can use MongoDB’s JSON schema validation. This allows you to enforce validation rules for your documents, such as which fields are allowed, value ranges, and more.
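To illustrate the idea, here is a minimal, hand-rolled validation sketch in Python. It is not MongoDB's $jsonSchema engine, and the schema shape and field names are hypothetical; it only mirrors the concept of enforcing required fields and value types before accepting a document:

```python
# Hand-rolled sketch of schema validation (NOT MongoDB's $jsonSchema engine).
# The schema shape and field names below are hypothetical.
schema = {
    "required": ["order_id", "customer_id"],   # fields every document must have
    "properties": {                            # expected Python types per field
        "order_id": str,
        "customer_id": str,
        "total": (int, float),
    },
}

def validate(doc: dict, schema: dict) -> bool:
    """Return True if doc has all required fields with the expected types."""
    for field in schema.get("required", []):
        if field not in doc:
            return False
    for field, expected in schema.get("properties", {}).items():
        if field in doc and not isinstance(doc[field], expected):
            return False
    return True

print(validate({"order_id": "1234", "customer_id": "9"}, schema))  # True
print(validate({"order_id": "1234"}, schema))                      # False (missing customer_id)
```

MongoDB applies the same principle server-side: documents that violate the collection's validation rules are rejected on insert or update.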

JSON documents enable you to structure your data as per your needs. You can structure data as key-value pairs, rich objects, time series data, or as the edges and nodes of a graph. MongoDB stores JSON documents as binary JSON (BSON), a binary representation of JSON, which makes it easier to traverse, compare, and sort data.

You can build a fault-tolerant, distributed MongoDB architecture using replica sets and native sharding. Replica sets allow you to create as many as 50 copies of your data. Native sharding allows you to distribute your data across different machines. It can be achieved in three ways:

  • Ranged sharding, which distributes documents based on the shard key value
  • Hashed sharding, in which documents are partitioned based on a hash of the shard key value
  • Zoned sharding, which allows developers to specify rules for data partitioning
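As a rough illustration of the first two strategies, the following Python sketch assigns documents to hypothetical shards by key range and by key hash. This is not MongoDB's internal routing logic; the shard names and key ranges are made up:

```python
import hashlib

# Illustrative sketch of ranged vs. hashed sharding. This is NOT MongoDB's
# internal routing logic; shard names and key ranges are made up.
SHARDS = ["shard0", "shard1", "shard2"]

def ranged_shard(order_id: int) -> str:
    """Ranged sharding: contiguous shard-key ranges map to shards."""
    if order_id < 1000:
        return "shard0"
    if order_id < 2000:
        return "shard1"
    return "shard2"

def hashed_shard(shard_key) -> str:
    """Hashed sharding: a hash of the shard key spreads adjacent keys apart."""
    digest = hashlib.md5(str(shard_key).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(ranged_shard(1500))                        # shard1 (falls in the 1000-1999 range)
print(hashed_shard(1500) == hashed_shard(1500))  # True: routing is deterministic
```

Note the trade-off: ranged sharding keeps adjacent keys together (good for range queries, but prone to hot spots), while hashed sharding spreads them evenly across shards.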

MongoDB vs. MySQL

MongoDB and MySQL represent two radically different ways of storing, processing, and reading data. The following table compares the two database technologies in detail.

| | MongoDB | MySQL |
| --- | --- | --- |
| Type | NoSQL document database | Relational database |
| How data is stored | As JSON documents | As rows in tables |
| Support for storing JSON documents | Available and optimized | Available, but with overhead |
| Schemas | Flexible | Rigid |
| How records are grouped | Collections | Tables |
| Query language | MongoDB Query Language (MQL) | Structured Query Language (SQL) |
| Dynamic schema with optional validation | Supported | Not supported |
| Typed data | Supported | Supported |
| Programmer-friendliness | Working with JSON makes feature building, debugging, and change management easier | Less friendly; programmers must stay conscious of table schemas while building features and making changes |
| Fit for complex, multi-object transactions | Workable; multi-document transactions are supported since version 4.0, though with some overhead | A good fit; multi-row transactions are a core feature |
| Joins | Typically not required | Typically required |
| Auto-sharding | Supported | Not supported |
| Data representation | A document such as { "name": "Adam", "age": 32, "profession": "doctor" } | A row (Adam, 32, doctor) in a table with columns name, age, and profession |
| Options for scalability | Plentiful, available out of the box | Limited |
| Security | Various security controls available | Various security controls available |

Why is it important to monitor MongoDB?

There are a few reasons why monitoring your MongoDB servers is crucial to the success of your organization:

Ensure high availability

Many applications rely on the database server for loading and storing mission-critical data. In many ways, the database acts as the backbone of your entire infrastructure. Monitoring different performance metrics, such as database health, CPU and memory usage, and available connections allows you to ensure that MongoDB is available and in a healthy state.

Detect and resolve bottlenecks

Long-running queries, idle connections, and misconfigurations can all cause a MongoDB instance to slow down, or even crash. Continuous monitoring will allow you to detect bottlenecks like unoptimized queries, unindexed collections, and more.

Predict malfunctions

Monitoring performance and health metrics, like disk space usage, query execution time, and pending queries, equips you to predict and fix potential malfunctions. For example, if a sudden spike in disk utilization coincides with an unexpected rise in query execution time, you can surmise that something is wrong with the database.

Perform capacity planning

By studying real-time and historical performance metrics, and analyzing how MongoDB’s workload changes over time, you can understand current capacity and scale as necessary. For example, if you observe that MongoDB performance regularly drops during peak hours, you can add more nodes to your cluster.

Assess optimization activities

Monitoring also enables you to assess the impact of optimization activities, such as increasing CPU cores, tweaking configuration parameters, adding indexes to collections, or optimizing queries. For example, you can see if the average query execution time improves after you increase the max connections limit.

Key metrics for monitoring MongoDB

Focus on the following metrics to get a clear idea of how a MongoDB instance is performing:

Database status and health metrics

Keeping an eye on the database status and health is a primary requirement of performance monitoring. If an instance is unresponsive or not accepting new requests, it warrants immediate investigation. MongoDB offers a few built-in functions and commands to check status in real time.

The db.serverStatus() function returns a JSON document containing detailed information on the current status of the database engine. It includes statistics related to uptime, connections, queues, active clients, and more. For example, the following section of a sample output shows metrics related to transactions:


"transactions" : {
"retriedCommandsCount" : 0,
"retriedStatementsCount" : 1,
"transactionsCollectionWriteCount" : 12,
"currentActive" : 2,
"currentInactive" : 0,
"currentOpen" : 2,
"totalAborted" : 12,
"totalCommitted" : 12,
"totalStarted" : 15,
"totalPrepared" : 13,
"totalPreparedThenCommitted" : 12,
"totalPreparedThenAborted" : 1,
"currentPrepared" : 0
},
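One way to use these counters is to derive simple health ratios from them. The following Python sketch computes the share of started transactions that ended in an abort, using the sample values above:

```python
# Derive a transaction abort ratio from the "transactions" section of
# db.serverStatus(). Values are copied from the sample output above.
transactions = {
    "totalStarted": 15,
    "totalCommitted": 12,
    "totalAborted": 12,
}

def abort_ratio(t: dict) -> float:
    """Share of started transactions that ended in an abort."""
    started = t["totalStarted"]
    return t["totalAborted"] / started if started else 0.0

print(f"abort ratio: {abort_ratio(transactions):.0%}")  # abort ratio: 80%
```

A persistently high abort ratio is worth investigating; it often points to write conflicts or transactions exceeding their lifetime limit.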

The db.stats() function returns several useful statistics regarding server state, including objects, indexes, total size, and cluster time. Consider the following sample output:

{ 
db: 'sample',
collections: 22,
views: 1,
objects: 1649,
avgObjSize: 51.5632741267,
dataSize: 811.70,
storageSize: 80,
freeStorageSize: 12,
indexes: 1,
indexSize: 100,
indexFreeStorageSize: 26,
totalSize: 226,
totalFreeStorageSize: 60,
scaleFactor: 1024,
fsUsedSize: 6015,
fsTotalSize: 612554,
ok: 1,
'$clusterTime': {
clusterTime: Timestamp({ t: 1664442195, i: 1 }),
signature: {
hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
keyId: Long("0")
}
},
operationTime: Timestamp({ t: 1664442195, i: 1 })
}

Client and connection metrics

Monitoring connection statistics is another great way of gauging performance in real time. The output of the db.serverStatus() function also returns a section on connections:


"connections" : {
"current" : 12,
"available" : 500,
"totalCreated" : 24,
"active" : 1,
"exhaustIsMaster" : 0,
"exhaustHello" : 0,
"awaitingTopologyChanges" : 0
},

The current field represents the number of clients currently connected, whereas available indicates how many additional connections the database can still accept. totalCreated is a count of all connections created since start-up. The active field shows the connected clients that are currently executing operations on the server.
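These fields lend themselves to a simple utilization check. The Python sketch below uses the sample values above; the 80% warning threshold is an arbitrary choice for illustration:

```python
# Connection utilization from the "connections" section of db.serverStatus().
# Values are copied from the sample output above; the 80% warning threshold
# is an arbitrary choice.
connections = {"current": 12, "available": 500, "totalCreated": 24, "active": 1}

def connection_utilization(conn: dict) -> float:
    """current + available equals the configured connection limit."""
    limit = conn["current"] + conn["available"]
    return conn["current"] / limit

usage = connection_utilization(connections)
print(f"{usage:.1%} of the connection limit in use")  # 2.3% of the connection limit in use
if usage > 0.80:
    print("warning: connection limit nearly exhausted")
```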

The globalLock section of the db.serverStatus() output also has information related to active clients.


"globalLock" : {
...
"activeClients" : {
"total" : 3,
"readers" : 2,
"writers" : 1
}
},

The readers field represents the number of active client connections performing read operations, whereas the writers field indicates active client connections performing write operations.

Database operation metrics

It’s necessary to monitor the rate at which different database operations are being performed. The db.serverStatus() function will help us here as well. It returns a section on opcounters.


"opcounters" : {
"insert" : 40,
"query" : 20,
"update" : 20,
"delete" : 11,
"getmore" : 9,
"command" : 100
},

Measuring and analyzing the counts of different database operations, like insert, query, update, and delete, allows you to spot bottlenecks and identify improvement avenues. For example, if database slowdown coincides with an exponential increase in inserts, you can deduce that inserts are creating a performance bottleneck.
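Because opcounters are cumulative since start-up, rates are derived from the difference between two samples. A minimal Python sketch, using assumed sample values (the first sample matches the output above):

```python
# opcounters are cumulative since start-up, so per-second rates come from
# the delta between two samples. Both samples below are assumed values;
# the first matches the sample output above.
sample_t0 = {"insert": 40, "query": 20, "update": 20, "delete": 11}
sample_t1 = {"insert": 220, "query": 95, "update": 32, "delete": 11}
interval_seconds = 60  # time elapsed between the two samples

def ops_per_second(t0: dict, t1: dict, seconds: int) -> dict:
    """Per-second rate of each operation type between two samples."""
    return {op: (t1[op] - t0[op]) / seconds for op in t0}

rates = ops_per_second(sample_t0, sample_t1, interval_seconds)
print(rates["insert"])  # 3.0 inserts per second
print(rates["delete"])  # 0.0
```

This is essentially what mongostat (covered later in this article) does for you at a configurable polling interval.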

System metrics

MongoDB, like any other database service, is resource-intensive. To optimize MongoDB for maximum performance, it’s vital to run it in an environment with substantial resources, regardless of whether it’s hosted on the cloud, running inside a Kubernetes cluster, or present on-site.

Monitoring system metrics, like disk utilization, disk latency, memory, and CPU usage will help you determine the optimal resource configuration. For example, if you are noticing high disk latency (> 500ms), it means that the storage layer (file system) is impacting performance.

Replication metrics

Replication is one of the main characteristics of a MongoDB architecture that sets it apart from other database engines. To ensure that your cluster is maintaining high levels of data integrity, it’s important to monitor replication metrics, like replication lag and replication headroom.

The rs.status() function returns the information on the node’s state, such as the time stamp of the last committed operation and the heartbeat interval.


"myState" : 1,
"term" : NumberLong(1),
"term" : NumberLong(1),
"heartbeatIntervalMillis" : NumberLong(2000),
"optimes" :
{ "lastCommittedOpTime" :
{ "ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
}

The rs.printReplicationInfo() and rs.printSecondaryReplicationInfo() functions also return useful replication metrics, like total oplog size, used size, and the time difference between the first and last operation in the oplog.
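At its core, replication lag is a timestamp difference: how far a secondary's last applied operation trails the primary's. The following Python sketch uses hypothetical optime values, not output from a real cluster:

```python
# Replication lag as a timestamp difference: how far a secondary's last
# applied operation trails the primary's. The optime values below are
# hypothetical Unix-epoch seconds, not output from a real cluster.
primary_optime = 1664442195
secondary_optime = 1664442180

def replication_lag(primary_ts: int, secondary_ts: int) -> int:
    """Lag, in seconds, of a secondary behind the primary (never negative)."""
    return max(primary_ts - secondary_ts, 0)

print(replication_lag(primary_optime, secondary_optime))  # 15
```

If the lag approaches the time window covered by the oplog (the "replication headroom" shrinks to zero), the secondary risks falling off the oplog and requiring a full resync.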

Metrics related to queues

All read and write operations waiting to acquire a lock are placed in queues. Long queues may indicate contention between write paths or unoptimized queries. Monitoring queue-related metrics can help you identify performance inhibitors and resolve contention.

The first part of the globalLock section of the output of db.serverStatus() has the currentQueue document. This document includes the readers and writers fields, which represent the currently queued read and write operations, respectively.


"globalLock" : {
"totalTime" : 945311000,
"currentQueue" : {
"total" : 3,
"readers" : 3,
"writers" : 0
},

},

Total indexes

Indexes are data structures that can speed up data retrieval from a database. Index-related metrics let you track performance and identify potential areas for improvement. For example, the metrics.queryExecutor.scanned attribute from the db.serverStatus() output shows a count of the index items that were scanned during query evaluation.

In a well-indexed database, this number should be high. If it isn't, or if it's zero, you are not leveraging indexes to improve performance. The db.stats() function also returns useful index-related information, including the total number of indexes and their collective size.
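As a rough way to reason about these counters, the following Python sketch compares index keys scanned (metrics.queryExecutor.scanned) against documents scanned (metrics.queryExecutor.scannedObjects); the counter values are assumed for illustration. A share close to zero would suggest queries are scanning documents without indexes:

```python
# Rough index-usage check built on the metrics.queryExecutor counters from
# db.serverStatus(). The counter values below are assumed for illustration.
query_executor = {
    "scanned": 5000,         # index keys examined
    "scannedObjects": 200,   # documents examined
}

def index_scan_share(qe: dict) -> float:
    """Share of examined items that came from index scans; a value close
    to zero suggests queries are scanning documents without indexes."""
    total = qe["scanned"] + qe["scannedObjects"]
    return qe["scanned"] / total if total else 0.0

print(f"{index_scan_share(query_executor):.0%}")  # 96%
```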

How to view currently running queries in MongoDB

If your database is slowing down, the first thing you should do is check the queries that are currently being executed. MongoDB offers a convenient way of viewing all in-progress operations: using the db.currentOp() function.

The function takes an optional argument, which can be either a Boolean or a document. If you pass true, the output will include system operations as well as operations being performed on idle connections. Alternatively, you can pass a document of query conditions to get only the in-progress operations that match those conditions.

For example, calling db.currentOp( { "$all": true } ) or db.currentOp(true) will have the same effect, which is to return all operations. A typical output of the function includes the operation, client name, connection id, and duration of execution.
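That output is easy to post-process. The Python sketch below filters a db.currentOp()-style document for operations running longer than a threshold; the inprog array and secs_running field are real fields of the currentOp output, but the sample operations themselves are made up:

```python
# Filter a db.currentOp()-style document for long-running operations.
# "inprog" and "secs_running" are real fields of the currentOp output;
# the operations listed here are made up.
current_op = {
    "inprog": [
        {"opid": 11, "op": "query",  "secs_running": 0,  "ns": "sample.orders"},
        {"opid": 12, "op": "update", "secs_running": 45, "ns": "sample.orders"},
        {"opid": 13, "op": "insert", "secs_running": 2,  "ns": "sample.users"},
    ]
}

def long_running(output: dict, threshold_secs: int = 10) -> list:
    """Return in-progress operations running longer than threshold_secs."""
    return [op for op in output["inprog"] if op["secs_running"] > threshold_secs]

for op in long_running(current_op):
    print(op["opid"], op["op"], op["ns"])  # 12 update sample.orders
```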

What is profiling on MongoDB?

Profiling is an approach to analyzing the performance of a database instance. The MongoDB profiler collects detailed information about database commands, including Create, Read, Update, and Delete (CRUD) operations, cursor operations, configuration commands, and administration commands, and writes the collected data to a collection named system.profile.

By default, profiling is disabled. To enable it, call the db.setProfilingLevel( <level> ) function with the appropriate level as an argument.

  • db.setProfilingLevel(1) will enable profiling for operations that take longer than slowms or that match a filter. The slowms setting allows you to set a threshold for slow operations.
  • db.setProfilingLevel(2) will enable profiling for all operations.

It’s worth mentioning that enabling the profiler will have a performance overhead, especially if you set the level to 2. It’s recommended to set the level to 1 and specify a suitable slowms value.

Query the system.profile collection to identify long-running queries or commands. For example, to get the last 100 entries added to the collection, use the following command:

db.system.profile.find().limit(100).sort( { ts : -1 } ).pretty()

To only get profiling data related to a specific collection, use this query:

db.system.profile.find( { ns : 'mydb.collection_name' } ).pretty()

Use the following query to show operations that took more than 10 milliseconds to complete:

db.system.profile.find( { millis : { $gt : 10 } } ).pretty()

Useful commands and queries for MongoDB performance tuning

Optimizing MongoDB using the correct configuration parameters enables you to achieve maximum throughput. The following cheat sheet contains several commands and queries that can help you tune your MongoDB instance.

| Command/query | Description |
| --- | --- |
| show profile | Show the last five operations that took 1 millisecond or more to complete |
| db.collection.createIndex() | Create an index on a collection |
| <command>.limit(10) | Limit the number of documents returned by a query to 10 |
| db.createCollection( "sample", {storageEngine:{wiredTiger:{configString:'block_compressor=zlib'}}} ) | Create a new collection with zlib compression enabled |
| mongod --wiredTigerCacheSizeGB <desired_size> | Start the MongoDB instance with the WiredTiger internal cache set to the desired size. Ideally, the size shouldn't exceed 50% of the RAM |
| numactl --interleave | Configure a memory interleave policy; to be used when running on non-uniform memory access (NUMA) hardware |
| db.setProfilingLevel(1, { slowms: 10 }) | Enable profiling for all operations that take more than 10 ms |
| blockdev --setra <value> | Set the read-ahead block size to between 8 and 32; read-ahead improves file reading performance |
| db.serverStatus().locks | Get detailed information regarding locks |
| db.serverStatus().mem | Get information regarding memory usage |

Introducing native MongoDB monitoring tools

MongoDB comes with two built-in utilities for performance monitoring: mongostat and mongotop. As of version 4.0, it also offers free cloud monitoring.

Mongostat periodically displays the count of database operations by type, including insert, update, query, and delete. It can be run directly from the system command line, and has the following syntax:

mongostat <options> <connection string> <polling interval, specified in seconds, defaults to 1> 

Mongotop provides a means to monitor the read and write activity of a MongoDB instance, on a per-collection basis, in real time. It can also be run directly from the system command line and has a similar syntax to mongostat:

mongotop <options> <connection string> <polling interval, specified in seconds, defaults to 1> 

The free monitoring module provides basic performance metrics for your architecture, including memory and CPU usage, operation counts, and execution times. Use the db.enableFreeMonitoring() and db.disableFreeMonitoring() functions to enable or disable it, respectively.

Monitoring MongoDB performance using native tools

In the following sections, we will learn how to track performance using the native monitoring tools we covered in the last section.

Using mongostat

Mongostat returns real-time statistics regarding core database functions, which are periodically refreshed. Some of the fields included in the output are: inserts, queries, updates, deletes, getmore, command, flushes, percentage of used cache (used), and resident memory (res).

To return data every 3 seconds, 30 times (covering 90 seconds in total), use the following command:

mongostat -n=30 3 

Consider the following sample output:

insert query update delete getmore ... time
    *3    *2     *0     *1       1 ... Sep 28 12:20:11.121

You can also add any fields from the output of the db.serverStatus() function to mongostat. For example, the following command adds two fields to the output of mongostat:

mongostat -O='connections.totalCreated,network.bytes=network bytes' 

Using mongotop

Mongotop is an excellent production monitoring tool that lets you visualize the read and write activities of all your collections. To refresh data every 15 seconds, use the following command:

mongotop 15 --uri='mongodb://mongodb1.sample.com:27000' 

Expect an output like the following:

ns                       total    read    write
test.myDb                100ms     0ms    100ms
admin.system.roles         0ms     0ms      0ms
admin.system.users         0ms     0ms      0ms
admin.system.version       0ms     0ms      0ms
config.system.sessions     0ms     0ms      0ms
local.startup_log          0ms     0ms      0ms
local.system.replset       0ms     0ms      0ms

Using free monitoring

When you enable free monitoring, MongoDB provides you with a unique URL where monitoring data can be viewed. If you forget the URL, you can retrieve it by calling the db.getFreeMonitoringStatus() function, which returns it in the url parameter.

The db.serverStatus() function also returns some free monitoring fields inside the freeMonitoring document.

Conclusion

MongoDB is a production-ready, cloud-first database technology trusted by thousands of organizations. Monitoring MongoDB instances is vital to keep CPU and memory usage in check, achieve high availability, and improve performance. This article aimed to equip you with all the knowledge needed to monitor a MongoDB instance; we hope it has served its purpose.
