Horizontal Scaling In MongoDB With Replication
- Date
- Authors
  - Mehdi Hadeli (@mehdihadeli)
Introduction
As our application grows, we add more features and, over time, gain more users and more load. As a result, the application stores a huge amount of data for these users, and we need more processing power to handle the queries hitting the database server. As the database grows, we can't keep all the data on one database server: after some time it becomes a bottleneck for the system because of the huge number of queries and the heavy load on it, and eventually it slows the application down or makes it unresponsive. It is also possible that the database server becomes unavailable because of a networking problem or a server crash, so relying on a single database server is not trustworthy for an application with a high request load. To solve this problem, we can scale our database vertically, horizontally, or both.
Scaling Database Vertically
When scaling the database vertically, we increase the capacity of a single server, i.e. the resources available to the database server, such as RAM, CPU, and disk. Scaling the database vertically is a quick and easy solution when the database is small, but this approach has some limitations:
- There are limits to how much we can increase resources like CPU and RAM on a single machine; most cloud providers also have hardware ceilings for a single server.
- It is more expensive than scaling the system horizontally at a large scale.
- Vertical scaling introduces a single point of failure, because we have just one server or machine; if that server becomes unavailable for any reason, such as a network issue or a crash, every request fails.
In the end, using vertical scaling is a tradeoff: it is simple and doesn't add infrastructure complexity, and it can be used for applications with a small database and low traffic.
Horizontal Scaling
In horizontal scaling, we add nodes or servers to our cluster and distribute the workload across all of them to reduce the pressure on any single server and remove the single point of failure. Each node has less resource power than a single high-speed, high-capacity server, but the system workload is spread between all nodes, and each node is responsible for handling a subset of the overall workload, which provides better efficiency than one high-speed, high-capacity server. Adding more nodes increases our capacity through less powerful servers and costs less than one high-speed, high-capacity server. But be careful: it increases the complexity of our database architecture, so only use it when you really need it and you have a heavily loaded database.
There are two commonly used horizontal database scaling techniques: replication
and horizontal partitioning
or sharding
. MongoDB is a document-based database that supports both of these.
In this blog post, I will talk about horizontal scaling with replication, and in the next blog post, I will talk about horizontal scaling with sharding.
Horizontal Scaling with Replication
In replication, we replicate
or copy
our database data to other database servers or nodes. Replication provides redundancy and increases data availability, adding fault tolerance to our system. If a server or node fails, the cluster will not be affected that much because other nodes can serve client requests.
Replication increases read speed for clients because we have different nodes, each holding a copy of the database, that can handle requests. Having multiple replicas of our data on different nodes also gives us more locality for different clients: with the nearest read preference configured, a read request sent to the cluster is routed to the nearest, most local node. By adding more nodes, we can serve more read requests and scale our database.
A set of replicated nodes or mongod processes that hold the same copy of the data set is called a replica set. In MongoDB, replication is performed through a replica set, and a replica set is also required if we need transactions, change streams, or access to the oplog. We can convert a standalone self-managed mongod to a replica set, which is useful for testing and development to get replica set features such as transactions. A standalone instance isn't a good choice for a production deployment because it is a single point of failure.
A replica set contains several data-bearing nodes and optionally one arbiter node. Among the data-bearing nodes in the replica set, there is exactly one primary node; the other nodes are secondary nodes.
The recommendation is to use a replica set with three members. A three-member replica set can have three data-bearing members, one primary node and two secondary nodes, which is the recommended setup. If we have a limitation, such as the cost of adding a third data-bearing member, we can use two data-bearing members and add a mongod instance to the replica set as an arbiter: one primary node, one secondary node, and one arbiter. An arbiter participates in elections but does not hold data.
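If we go the arbiter route, here is a minimal sketch (the arbiter hostname and port are placeholders, and depending on your MongoDB version you may need to adjust the cluster's default write concern before adding an arbiter): start a mongod for the arbiter with the same --replSet name, then, from mongosh connected to the primary:
// add the arbiter to the replica set; it votes in elections but stores no data
rs.addArb('mongod-arbiter.example.com:27017');
// verify: the new member should appear with stateStr: 'ARBITER'
rs.status().members.map(m => ({ name: m.name, stateStr: m.stateStr }));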
In a replica set, the primary node receives all write operations, and the primary confirms those write operations using a write concern such as w: "majority", acknowledging the write on the primary and secondary nodes according to the type of write concern (with the w: "majority" write concern we need an acknowledgement from a majority of the nodes, i.e. more than half of them, including the primary node). MongoDB applies write operations on the primary node and then records the operations in the primary's oplog. The oplog (operations log) is a special collection that stores all operations that modify the data in the database. For example, if a write that changes data succeeds, MongoDB stores oplog entries for that change in the primary's oplog collection, but if a write operation doesn't modify any data or fails, MongoDB doesn't create oplog entries.
After the operations are stored in the primary's oplog, the secondary nodes copy and apply the primary's operations (the oplog) to their own data sets in an asynchronous process, independently of the primary node. All replica set members contain a copy of the oplog, in the local.oplog.rs collection, which allows them to maintain the current state of the database. Replication to secondary nodes is asynchronous by default in MongoDB, so secondary nodes are eventually consistent and sync with the primary node after some time. Asynchronous replication to secondaries means that reads from secondaries with some read preferences, like secondary, may return data that does not reflect the state of the data on the primary.
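As a quick illustration (a sketch, separate from the setup steps later in this post), we can peek at the oplog from mongosh on any member; the exact entries depend on your workload and MongoDB version:
// the oplog lives in the "local" database, in the capped collection oplog.rs
const local = db.getSiblingDB('local');
// show the five most recent oplog entries (newest first)
local.oplog.rs.find().sort({ $natural: -1 }).limit(5).pretty();
// print a summary of the oplog size and the time window it covers
db.printReplicationInfo();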
MongoDB uses the write concern of a write operation on the primary node to determine how the write must be acknowledged by the nodes. The default write concern since MongoDB 5.0 is w: "majority", which means a write operation must be acknowledged by a majority of the nodes, including the primary node. In a replica set with n nodes, the majority is more than half of the nodes, i.e. ⌊n/2⌋ + 1; for example, in a three-member replica set the majority is 2.
On a primary node with the w: "majority" or w: "majority", j: false write concern (which is the default), the write concern uses the journal status (j) to decide when the majority of secondary nodes have acknowledged the write. If we set j: false in the write concern, or just use w: "majority" without setting the journal option (the default behaviour), the primary node doesn't wait for the majority of secondary nodes to completely persist the write in their own data sets and on disk; it gets an acknowledgement from a secondary node as soon as the write is present in its oplog, even if it is not yet fully persisted to disk.
If we want to ensure that data and writes are fully persisted to disk on the majority of secondary nodes, to achieve strong and immediate consistency between writes on the primary node and the majority of secondary nodes, we should set the journal option to true in the write concern, i.e. j: true. With this approach, after writing data on the primary node, the primary waits for the acknowledgement of the majority of secondary nodes, which makes the write operation effectively synchronous and much slower, but we gain consistency in return. Also, if we want to explicitly specify how many nodes should persist the data, we can use the w: <n> option of the write concern and pass the number of nodes explicitly, like w: 3.
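As a minimal sketch (the orders collection and the documents are placeholders for illustration), the write concern can be set per operation from mongosh:
// wait for acknowledgement from a majority of members, with the write persisted to the on-disk journal
db.orders.insertOne(
  { item: 'book', qty: 1 },
  { writeConcern: { w: 'majority', j: true, wtimeout: 5000 } }
);
// explicitly require acknowledgement from 3 members instead of "majority"
db.orders.insertOne(
  { item: 'pen', qty: 2 },
  { writeConcern: { w: 3 } }
);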
Note: The process of copying and replicating the oplog from the primary node by the secondary nodes is completely asynchronous and independent of the write operation itself, but the write concern, in order to acknowledge the write on the secondary nodes, checks the state of the write and the oplog on the secondary nodes.
In this three-member replica set, the primary node accepts all write operations; afterwards, the secondaries replicate and apply these oplog entries to their local oplog and update their current state of the database.
Although clients cannot write data to secondaries, all members in the replica set can accept client read requests; for example, secondary nodes can accept read requests. In MongoDB, however, the default read preference is primary, which means all read requests are served by the primary node. We can change this read preference based on our needs; for example, change the default to nearest, and with this configuration a client read is routed to the nearest node, whether that node is primary or secondary, based on a specified latency threshold for that client.
All read preference modes except primary may return stale data, because secondaries replicate operations from the primary in an asynchronous process by default.
Here is an example of reading from secondary nodes instead of the primary node:
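This is just a sketch (the orders collection is a placeholder): in mongosh connected to the replica set, we can set the secondary read preference for the whole connection or per query:
// set the read preference for the current connection to "secondary"
db.getMongo().setReadPref('secondary');
// this query is now served by one of the secondary members
db.orders.find({ status: 'shipped' });
// or set the read preference per query on the cursor
db.orders.find({ status: 'shipped' }).readPref('secondary');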
If the primary server or node goes down because of a crash or system failure, one of the secondary nodes becomes the new primary node through an election process in the replica set. If the failed primary node comes back online, it rejoins the replica set as a secondary node and helps the existing primary node.
For example, in this three-member replica set, when our primary node becomes unavailable because of a failure or crash, MongoDB holds an election in the replica set and one of the secondary nodes becomes primary.
Setup Replica Set
For most use cases, a replica set with three members is enough to keep our system alive during failures. A replica set should always have an odd number of members; this ensures that the election process works properly.
Besides achieving replication and creating redundancy of nodes, a replica set is also required for some MongoDB features such as transactions, change streams, or accessing the oplog.
To create a replica set for a cluster of nodes, we have several options:
- Self-managed replica set in the terminal: host the mongod instances separately on each machine, each acting as a node in the replica set, and connect the multiple mongod instances on different machines together in the replica set.
- Using Docker: create the replica set by hosting the different mongod instances as nodes in containers and use a Docker network to connect the nodes together.
- Using Docker Compose: the same as Docker but with an easier setup, hosting the different mongod instances as nodes and using the Docker Compose network to connect them together.
- Using Kubernetes: create the replica set cluster and connect the different mongod pods to the replica set.
Self-Managed Replica Set in the Terminal
Initialize Replica Set Nodes With Dedicated Mongod Instances In Self-Managed Replica Set
Before we can deploy a Self-Managed replica set in the terminal, hosting mongod instances separately on each machine, we need to install MongoDB on each machine that will be part of the replica set; after installing MongoDB, we have access to the mongod executable and can run and host a mongod instance on each machine. In production, we should host each mongod instance on a separate node of the replica set, in an isolated environment and machine, to increase the reliability of the system in case of a failure or crash of a node, because the nodes are on different machines. On our local system, we can host all of these mongod instances on the same machine just for development and test purposes.
After installing MongoDB on the production node machines, or on our local machine, we should verify that the mongod command exists. On Windows, we should add the mongod path to the Path environment variable:
# add this path to the Path environment variable
C:\Program Files\MongoDB\Server\7.0\bin
After that we have access to the mongod command, which we can verify with mongod -v. When running a mongod instance without specifying the --dbpath argument, the default --dbpath is /data/db/ on Linux and C:\data\db\ on Windows, but the C:\data\db\ folder must exist before running the mongod instance. Now, when running mongod, it is hosted on the default MongoDB port, which is 27017, and we can connect to this mongod host with a client like RoboMongo or JetBrains DataGrip.
For each node and its mongod instance in the replica set, we need a data folder to store the data sets. I want to create a replica set with three members, so for each node I create a data folder like this:
DB1_DATA_DIR="./data/mongodb1"
DB2_DATA_DIR="./data/mongodb2"
DB3_DATA_DIR="./data/mongodb3"
# Log directory
DB1_LOG_DIR="./log/mongodb1"
DB2_LOG_DIR="./log/mongodb2"
DB3_LOG_DIR="./log/mongodb3"
# Create data and log directories if they do not exist
mkdir -p "$DB1_DATA_DIR" "$DB2_DATA_DIR" "$DB3_DATA_DIR"
mkdir -p "$DB1_LOG_DIR" "$DB2_LOG_DIR" "$DB3_LOG_DIR"
Then we need to start our three mongod instances, on three nodes on different machines in production (for recovery and reliability of the system in case of failure or crash); on the local machine, I start the three mongod instances on different ports. The --fork option is not available for MongoDB on Windows, so Windows users must execute each mongod command in a separate terminal window.
First we create the primary node with a dedicated mongod instance; if we want to run mongod with a config file instead of command arguments, we can use mongod1.conf:
mongod --port 27018 --noauth --dbpath ./data/mongodb1 --replSet mongo-replicaset --bind_ip localhost,mongod1
# Or running mongod with passing config file instead of command arguments
mongod --config ./configs/mongod1.conf
# https://www.mongodb.com/docs/manual/reference/configuration-options/#use-the-configuration-file
storage:
dbPath: ./data/mongodb1
systemLog:
# destination: file # Log to the file
# path: ./logs/mongodb1/mongod.log
logAppend: true
net:
port: 27018
bindIp: localhost,mongod1
replication:
replSetName: mongo-replicaset
security:
authorization: disabled
After creating the primary node for our replica set, we should create the secondary nodes. Because we have a three-member replica set, which is the recommended setup, we create two secondary nodes with dedicated mongod instances.
Starting the first secondary node and the second mongod instance:
mongod --port 27019 --noauth --dbpath ./data/mongodb2 --replSet mongo-replicaset --bind_ip localhost,mongod2
# Or running mongod with passing config file instead of command arguments
mongod --config ./configs/mongod2.conf
And if we want to run mongod with a config file instead of command arguments, we can use the mongod2.conf config:
# https://www.mongodb.com/docs/manual/reference/configuration-options/#use-the-configuration-file
storage:
dbPath: ./data/mongodb2
systemLog:
# destination: file
# path: ./logs/mongodb2/mongod.log # Log to file
logAppend: true
net:
port: 27019
bindIp: localhost,mongod2
replication:
replSetName: mongo-replicaset
security:
authorization: disabled
Starting the second secondary node and the third mongod instance:
mongod --port 27020 --noauth --dbpath ./data/mongodb3 --replSet mongo-replicaset --bind_ip localhost,mongod3
# Or running mongod with passing config file instead of command arguments
mongod --config ./configs/mongod3.conf
And if we want to run mongod with a config file instead of command arguments, we can use the mongod3.conf config:
storage:
dbPath: ./data/mongodb3
systemLog:
# destination: file # Log to file
# path: ./logs/mongodb3/mongod.log
logAppend: true
net:
port: 27020
bindIp: localhost,mongod3
replication:
replSetName: mongo-replicaset
security:
authorization: disabled
Some of the mongod arguments we used as configuration when starting the mongod instances:
- --replSet: Specifies the replica set name, here mongo-replicaset. All hosts in the replica set must have the same set name.
- --bind_ip: The hostnames and/or IP addresses and/or full Unix domain socket paths on which mongod should listen for client connections. In this case, the mongod instance binds to both localhost and the hostname mongod1, which is associated with the IP address of the first mongod instance.
- --port: The TCP port on which the MongoDB instance (mongod) listens for client connections; 27017 is the default port for MongoDB.
- --noauth: Disables authentication (currently the default). For simplicity we used noauth, but for production you should use an auth-based mechanism.
- --dbpath: The directory where the mongod instance stores its data. The default path is /data/db on Linux and macOS and C:\data\db\ on Windows.
- --logpath: Sends all diagnostic logging information to a log file instead of to standard output or to the host's syslog system.
More information about the mongod arguments can be found here, and more information about the configuration options can be found here.
If we want to start all mongod instances from the same terminal window, I created this bash script to make running the three-member replica set easier:
#!/bin/bash
DB1_PORT=27018
DB2_PORT=27019
DB3_PORT=27020
# Data directory
DB1_DATA_DIR="./data/mongodb1"
DB2_DATA_DIR="./data/mongodb2"
DB3_DATA_DIR="./data/mongodb3"
# Log directory
DB1_LOG_DIR="./log/mongodb1"
DB2_LOG_DIR="./log/mongodb2"
DB3_LOG_DIR="./log/mongodb3"
REPLICA_SET="${REPLICA_SET_NAME:-mongo-replicaset}"
# Create data and log directories if they do not exist
mkdir -p "$DB1_DATA_DIR" "$DB2_DATA_DIR" "$DB3_DATA_DIR"
mkdir -p "$DB1_LOG_DIR" "$DB2_LOG_DIR" "$DB3_LOG_DIR"
mongod --noauth --dbpath ${DB1_DATA_DIR} --port ${DB1_PORT} --bind_ip "localhost,mongod1" --replSet "$REPLICA_SET" & MONGOD1_PID=$! # --logpath ${DB1_LOG_DIR}/mongod.log
mongod --noauth --dbpath ${DB2_DATA_DIR} --port ${DB2_PORT} --bind_ip "localhost,mongod2" --replSet "$REPLICA_SET" & MONGOD2_PID=$! # --logpath ${DB2_LOG_DIR}/mongod.log
mongod --noauth --dbpath ${DB3_DATA_DIR} --port ${DB3_PORT} --bind_ip "localhost,mongod3" --replSet "$REPLICA_SET" & MONGOD3_PID=$! # --logpath ${DB3_LOG_DIR}/mongod.log
# Wait for MongoDB processes to finish
wait $MONGOD1_PID
wait $MONGOD2_PID
wait $MONGOD3_PID
Initialize the Replica Set In Self-Managed Replica Set
After initializing the replica set nodes with their dedicated mongod instances and starting them, in production, from the same machine where the mongod instance we want to make the primary node is running, in this case mongod1, we should run mongosh; if it is not installed on the machine, install the mongosh tool first. Running mongosh without any arguments on a machine that has a running mongod instance connects to the mongod that is listening on localhost on the default port 27017. On the local machine, all of our mongod instances are hosted on the same machine, and all of them are listening on localhost but on different ports (27018, 27019, 27020).
Running mongosh without any options connects to the default mongod port, which is 27017:
mongosh
This is equivalent to the following command:
mongosh "mongodb://localhost:27017"
But our replica set nodes are on non-default ports, and our primary node's port is 27018, so to connect to this non-default port we can use one of these commands:
mongosh "mongodb://localhost:27018"
# OR
mongosh --port 27018
For connecting to a remote host in production, on another machine and a non-default port, you can use something like:
mongosh "mongodb://mongod2.example.com:27018"
# OR
mongosh --host mongod2.example.com --port 27018
After connecting to our primary node, with the mongod1 host address and port 27018, using mongosh, inside mongosh on the primary node (mongod1) we should run rs.initiate() to initialize the replica set. This method initiates a replica set with the passed configuration. We should run rs.initiate() just once, on the node that will become the primary of the replica set.
Now we should pass the correct configuration to the initiate method to initiate the replica set. Two important configuration fields are:
- _id: The name of the replica set. Here our replica set name is mongo-replicaset.
- members: An array of member configurations for the replica set. Here we have a replica set with three members.
  - members[n]._id: An integer identifier for the member, unique among all members. Values may be any integer greater than or equal to 0.
  - members[n].host: The hostname and, if specified, the port number of the set member. The hostname must be resolvable by every host in the replica set.
// for the local machine, with multiple mongod instances on the same machine
rs.initiate( {
_id : "mongo-replicaset",
members: [
{ _id: 1, host: "localhost:27018" },
{ _id: 2, host: "localhost:27019" },
{ _id: 3, host: "localhost:27020" }
]
})
// for production, with a separate machine for each node and its mongod instance
rs.initiate( {
_id : "mongo-replicaset",
members: [
{ _id: 1, host: "mongod1.example.com:27018" },
{ _id: 2, host: "mongod2.example.com:27019" },
{ _id: 3, host: "mongod3.example.com:27020" }
]
})
After running this method, MongoDB initiates the replica set for us, and if the replica set is initiated correctly, we get the following output:
{ ok: 1 }
After creating the replica set, we can see the details and status of the created replica set from inside the primary node's mongod that we are connected to with mongosh, and identify the primary in the replica set. We can check whether a member is primary or secondary with the member's stateStr property, which can be stateStr: 'PRIMARY' or stateStr: 'SECONDARY'.
rs.status()
rs.status() output:
{
set: 'mongo-replicaset',
date: ISODate('2024-09-04T14:39:55.321Z'),
myState: 1,
term: Long('1'),
syncSourceHost: '',
syncSourceId: -1,
heartbeatIntervalMillis: Long('2000'),
majorityVoteCount: 2,
writeMajorityCount: 2,
votingMembersCount: 3,
writableVotingMembersCount: 3,
optimes: {
lastCommittedOpTime: { ts: Timestamp({ t: 1725460792, i: 1 }), t: Long('1') },
lastCommittedWallTime: ISODate('2024-09-04T14:39:52.042Z'),
readConcernMajorityOpTime: { ts: Timestamp({ t: 1725460792, i: 1 }), t: Long('1') },
appliedOpTime: { ts: Timestamp({ t: 1725460792, i: 1 }), t: Long('1') },
durableOpTime: { ts: Timestamp({ t: 1725460792, i: 1 }), t: Long('1') },
lastAppliedWallTime: ISODate('2024-09-04T14:39:52.042Z'),
lastDurableWallTime: ISODate('2024-09-04T14:39:52.042Z')
},
lastStableRecoveryTimestamp: Timestamp({ t: 1725460772, i: 1 }),
electionCandidateMetrics: {
lastElectionReason: 'electionTimeout',
lastElectionDate: ISODate('2024-09-04T14:29:51.526Z'),
electionTerm: Long('1'),
lastCommittedOpTimeAtElection: { ts: Timestamp({ t: 1725460180, i: 1 }), t: Long('-1') },
lastSeenOpTimeAtElection: { ts: Timestamp({ t: 1725460180, i: 1 }), t: Long('-1') },
numVotesNeeded: 2,
priorityAtElection: 1,
electionTimeoutMillis: Long('10000'),
numCatchUpOps: Long('0'),
newTermStartDate: ISODate('2024-09-04T14:29:51.575Z'),
wMajorityWriteAvailabilityDate: ISODate('2024-09-04T14:29:52.098Z')
},
members: [
{
_id: 1,
name: 'localhost:27018',
health: 1,
state: 1,
stateStr: 'PRIMARY',
uptime: 858,
optime: { ts: Timestamp({ t: 1725460792, i: 1 }), t: Long('1') },
optimeDate: ISODate('2024-09-04T14:39:52.000Z'),
lastAppliedWallTime: ISODate('2024-09-04T14:39:52.042Z'),
lastDurableWallTime: ISODate('2024-09-04T14:39:52.042Z'),
syncSourceHost: '',
syncSourceId: -1,
infoMessage: '',
electionTime: Timestamp({ t: 1725460191, i: 1 }),
electionDate: ISODate('2024-09-04T14:29:51.000Z'),
configVersion: 1,
configTerm: 1,
self: true,
lastHeartbeatMessage: ''
},
{
_id: 2,
name: 'localhost:27019',
health: 1,
state: 2,
stateStr: 'SECONDARY',
uptime: 614,
optime: { ts: Timestamp({ t: 1725460792, i: 1 }), t: Long('1') },
optimeDurable: { ts: Timestamp({ t: 1725460792, i: 1 }), t: Long('1') },
optimeDate: ISODate('2024-09-04T14:39:52.000Z'),
optimeDurableDate: ISODate('2024-09-04T14:39:52.000Z'),
lastAppliedWallTime: ISODate('2024-09-04T14:39:52.042Z'),
lastDurableWallTime: ISODate('2024-09-04T14:39:52.042Z'),
lastHeartbeat: ISODate('2024-09-04T14:39:54.110Z'),
lastHeartbeatRecv: ISODate('2024-09-04T14:39:55.118Z'),
pingMs: Long('0'),
lastHeartbeatMessage: '',
syncSourceHost: 'localhost:27018',
syncSourceId: 1,
infoMessage: '',
configVersion: 1,
configTerm: 1
},
{
_id: 3,
name: 'localhost:27020',
health: 1,
state: 2,
stateStr: 'SECONDARY',
uptime: 614,
optime: { ts: Timestamp({ t: 1725460792, i: 1 }), t: Long('1') },
optimeDurable: { ts: Timestamp({ t: 1725460792, i: 1 }), t: Long('1') },
optimeDate: ISODate('2024-09-04T14:39:52.000Z'),
optimeDurableDate: ISODate('2024-09-04T14:39:52.000Z'),
lastAppliedWallTime: ISODate('2024-09-04T14:39:52.042Z'),
lastDurableWallTime: ISODate('2024-09-04T14:39:52.042Z'),
lastHeartbeat: ISODate('2024-09-04T14:39:54.109Z'),
lastHeartbeatRecv: ISODate('2024-09-04T14:39:55.149Z'),
pingMs: Long('0'),
lastHeartbeatMessage: '',
syncSourceHost: 'localhost:27018',
syncSourceId: 1,
infoMessage: '',
configVersion: 1,
configTerm: 1
}
],
ok: 1,
'$clusterTime': {
clusterTime: Timestamp({ t: 1725460792, i: 1 }),
signature: {
hash: Binary.createFromBase64('AAAAAAAAAAAAAAAAAAAAAAAAAAA=', 0),
keyId: Long('0')
}
},
operationTime: Timestamp({ t: 1725460792, i: 1 })
}
Use rs.conf() to display the replica set configuration object:
rs.conf()
rs.conf() output:
{
_id: 'mongo-replicaset',
version: 1,
term: 1,
members: [
{
_id: 1,
host: 'localhost:27018',
arbiterOnly: false,
buildIndexes: true,
hidden: false,
priority: 1,
tags: {},
secondaryDelaySecs: Long('0'),
votes: 1
},
{
_id: 2,
host: 'localhost:27019',
arbiterOnly: false,
buildIndexes: true,
hidden: false,
priority: 1,
tags: {},
secondaryDelaySecs: Long('0'),
votes: 1
},
{
_id: 3,
host: 'localhost:27020',
arbiterOnly: false,
buildIndexes: true,
hidden: false,
priority: 1,
tags: {},
secondaryDelaySecs: Long('0'),
votes: 1
}
],
protocolVersion: Long('1'),
writeConcernMajorityJournalDefault: true,
settings: {
chainingAllowed: true,
heartbeatIntervalMillis: 2000,
heartbeatTimeoutSecs: 10,
electionTimeoutMillis: 10000,
catchUpTimeoutMillis: -1,
catchUpTakeoverDelayMillis: 30000,
getLastErrorModes: {},
getLastErrorDefaults: { w: 1, wtimeout: 0 },
replicaSetId: ObjectId('66d86ed4adf6232f3e0eea38')
}
}
The connection string to access this three-member replica set
is:
mongodb://127.0.0.1:27018,127.0.0.1:27019,127.0.0.1:27020/?replicaSet=mongo-replicaset
In our replica set, we can stop the primary node and see how the replica set elects a new primary node from the existing secondary nodes.
Stopping the primary node:
mongosh "mongodb://localhost:27018"
use admin
db.shutdownServer()
Connecting to one of the remaining nodes, here mongod2 on port 27019, to get the current status of the replica set:
# Connect to one of the remaining nodes
mongosh "mongodb://localhost:27019"
# Get the current status of the replica set
rs.status()
rs.status()
output:
{
set: 'mongo-replicaset',
date: ISODate('2024-09-04T15:18:50.344Z'),
myState: 1,
term: Long('2'),
syncSourceHost: '',
syncSourceId: -1,
heartbeatIntervalMillis: Long('2000'),
majorityVoteCount: 2,
writeMajorityCount: 2,
votingMembersCount: 3,
writableVotingMembersCount: 3,
optimes: {
lastCommittedOpTime: { ts: Timestamp({ t: 1725463129, i: 1 }), t: Long('2') },
lastCommittedWallTime: ISODate('2024-09-04T15:18:49.708Z'),
readConcernMajorityOpTime: { ts: Timestamp({ t: 1725463129, i: 1 }), t: Long('2') },
appliedOpTime: { ts: Timestamp({ t: 1725463129, i: 1 }), t: Long('2') },
durableOpTime: { ts: Timestamp({ t: 1725463129, i: 1 }), t: Long('2') },
lastAppliedWallTime: ISODate('2024-09-04T15:18:49.708Z'),
lastDurableWallTime: ISODate('2024-09-04T15:18:49.708Z')
},
lastStableRecoveryTimestamp: Timestamp({ t: 1725463069, i: 1 }),
electionCandidateMetrics: {
lastElectionReason: 'stepUpRequestSkipDryRun',
lastElectionDate: ISODate('2024-09-04T15:17:19.591Z'),
electionTerm: Long('2'),
lastCommittedOpTimeAtElection: { ts: Timestamp({ t: 1725463033, i: 1 }), t: Long('1') },
lastSeenOpTimeAtElection: { ts: Timestamp({ t: 1725463033, i: 1 }), t: Long('1') },
numVotesNeeded: 2,
priorityAtElection: 1,
electionTimeoutMillis: Long('10000'),
priorPrimaryMemberId: 1,
numCatchUpOps: Long('0'),
newTermStartDate: ISODate('2024-09-04T15:17:19.616Z'),
wMajorityWriteAvailabilityDate: ISODate('2024-09-04T15:17:19.637Z')
},
electionParticipantMetrics: {
votedForCandidate: true,
electionTerm: Long('1'),
lastVoteDate: ISODate('2024-09-04T14:29:51.531Z'),
electionCandidateMemberId: 1,
voteReason: '',
lastAppliedOpTimeAtElection: { ts: Timestamp({ t: 1725460180, i: 1 }), t: Long('-1') },
maxAppliedOpTimeInSet: { ts: Timestamp({ t: 1725460180, i: 1 }), t: Long('-1') },
priorityAtElection: 1
},
members: [
{
_id: 1,
name: 'localhost:27018',
health: 0,
state: 8,
stateStr: '(not reachable/healthy)',
uptime: 0,
optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
optimeDate: ISODate('1970-01-01T00:00:00.000Z'),
optimeDurableDate: ISODate('1970-01-01T00:00:00.000Z'),
lastAppliedWallTime: ISODate('2024-09-04T15:17:29.633Z'),
lastDurableWallTime: ISODate('2024-09-04T15:17:29.633Z'),
lastHeartbeat: ISODate('2024-09-04T15:18:49.492Z'),
lastHeartbeatRecv: ISODate('2024-09-04T15:17:34.203Z'),
pingMs: Long('0'),
lastHeartbeatMessage: 'Error connecting to localhost:27018 (127.0.0.1:27018) :: caused by :: onInvoke :: caused by :: No connection could be made because the target machine actively refused it.',
syncSourceHost: '',
syncSourceId: -1,
infoMessage: '',
configVersion: 1,
configTerm: 1
},
{
_id: 2,
name: 'localhost:27019',
health: 1,
state: 1,
stateStr: 'PRIMARY',
uptime: 3077,
optime: { ts: Timestamp({ t: 1725463129, i: 1 }), t: Long('2') },
optimeDate: ISODate('2024-09-04T15:18:49.000Z'),
lastAppliedWallTime: ISODate('2024-09-04T15:18:49.708Z'),
lastDurableWallTime: ISODate('2024-09-04T15:18:49.708Z'),
syncSourceHost: '',
syncSourceId: -1,
infoMessage: '',
electionTime: Timestamp({ t: 1725463039, i: 1 }),
electionDate: ISODate('2024-09-04T15:17:19.000Z'),
configVersion: 1,
configTerm: 2,
self: true,
lastHeartbeatMessage: ''
},
{
_id: 3,
name: 'localhost:27020',
health: 1,
state: 2,
stateStr: 'SECONDARY',
uptime: 2949,
optime: { ts: Timestamp({ t: 1725463129, i: 1 }), t: Long('2') },
optimeDurable: { ts: Timestamp({ t: 1725463129, i: 1 }), t: Long('2') },
optimeDate: ISODate('2024-09-04T15:18:49.000Z'),
optimeDurableDate: ISODate('2024-09-04T15:18:49.000Z'),
lastAppliedWallTime: ISODate('2024-09-04T15:18:49.708Z'),
lastDurableWallTime: ISODate('2024-09-04T15:18:49.708Z'),
lastHeartbeat: ISODate('2024-09-04T15:18:50.033Z'),
lastHeartbeatRecv: ISODate('2024-09-04T15:18:48.572Z'),
pingMs: Long('0'),
lastHeartbeatMessage: '',
syncSourceHost: 'localhost:27019',
syncSourceId: 2,
infoMessage: '',
configVersion: 1,
configTerm: 2
}
],
ok: 1,
'$clusterTime': {
clusterTime: Timestamp({ t: 1725463129, i: 1 }),
signature: {
hash: Binary.createFromBase64('AAAAAAAAAAAAAAAAAAAAAAAAAAA=', 0),
keyId: Long('0')
}
},
operationTime: Timestamp({ t: 1725463129, i: 1 })
}
We can also try to stop any of the secondary nodes and see how the replica set continues to work; see the sketch below. This way we can test availability and fault tolerance in the replica set.
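As a minimal sketch (the ports follow the local setup above, where localhost:27019 became the new primary after the previous failover), we can stop one secondary and confirm from the primary that the set stays healthy:
# stop the secondary running on port 27020
mongosh "mongodb://localhost:27020" --eval "db.getSiblingDB('admin').shutdownServer()"
# from the current primary, check which members are still healthy and what their states are
mongosh "mongodb://localhost:27019" --eval "rs.status().members.map(m => ({ name: m.name, stateStr: m.stateStr }))"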
We can use a MongoDB single-node replica set for development and test purposes to reduce resource consumption while still getting access to transactions and change streams (because we have a replica set), but in production we want multiple replicas and nodes to achieve high availability and fault tolerance.
For a single-node replica set, the connection string looks like this:
mongodb://127.0.0.1:27018/?replicaSet=mongo-replicaset
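And as a rough sketch of how such a single-node replica set could be started locally (reusing the port, data path, and replica set name from the earlier examples):
# start a single mongod with a replica set name
mongod --port 27018 --noauth --dbpath ./data/mongodb1 --replSet mongo-replicaset --bind_ip localhost
# initiate a one-member replica set on it
mongosh --port 27018 --eval "rs.initiate({ _id: 'mongo-replicaset', members: [{ _id: 1, host: 'localhost:27018' }] })"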
To make initializing the replica set easier, I've created a bash script that you can use:
#!/bin/bash
DB1_PORT=27018
DB2_PORT=27019
DB3_PORT=27020
LOCAL_HOST="${HOST:-localhost}"
REPLICA_SET="${REPLICA_SET_NAME:-mongo-replicaset}"
MEMBER_1="{\"_id\": 1, \"host\": \"${LOCAL_HOST}:${DB1_PORT}\", \"priority\": 2 }"
MEMBER_2="{\"_id\": 2, \"host\": \"${LOCAL_HOST}:${DB2_PORT}\", \"priority\": 0 }"
MEMBER_3="{\"_id\": 3, \"host\": \"${LOCAL_HOST}:${DB3_PORT}\", \"priority\": 0 }"
# mongosh using eval
mongosh "mongodb://${LOCAL_HOST}:${DB1_PORT}" --eval "
rs.initiate({
_id: '${REPLICA_SET}',
members: [
${MEMBER_1},
${MEMBER_2},
${MEMBER_3}
]
});
"
Using Docker for Creating Replica Set
Initialize Replica Set Nodes In Docker
To create the replica set we can use Docker. First we should create a Docker network, because each node in the replica set needs access to the other nodes by their IP or hostname, and in Docker we can resolve a hostname by container name when the containers are on the same network. During the rs.initiate() process for creating the replica set inside the primary node, we need to be able to resolve all node hostnames to their IPs, and the Docker network resolves these addresses for us.
docker network create mongo-cluster-network
After creating the network, and before running the replica set's mongod instances, we don't need to create the data and log folders required by the mongod instances, because they are created automatically through the mapped volumes of the Docker containers.
Now we should run each mongod instance for the replica set nodes separately:
docker run -i --rm -d \
--name "mongod1" \
--network mongo-cluster-network \
-p 27018:27017 \
-v "${PWD}/data/mongodb1:/data/db" \
-v "${PWD}/log/mongodb1":"/var/log/mongodb" \
-v "${PWD}/scripts:/scripts" \
mongo:latest \
--noauth --replSet mongo-replicaset --bind_ip localhost,mongod1 --port 27017
Here I use WSL on Windows alongside Docker Desktop. To solve a permission problem with the current user in WSL, I added the line options = "metadata" to the WSL config to configure local WSL settings per distribution; we can also edit the global WSL config. You can edit the distribution-level config with sudo nano /etc/wsl.conf. Adding options = "metadata" fixes folder and file permissions on WSL mounts so everything isn't 777 all the time within the WSL mounts; for more information read this article. We can also pass the --user $(id -u):$(id -g) argument when running our Docker container; this option runs the container with the same user ID (UID) and group ID (GID) as the current user in WSL, and this user, which has the required read and write permissions, is used inside the container. By default, Docker containers run as the root user (UID 0) inside the container, which can lead to permission conflicts when accessing host-mounted files or directories.
To run all the mongod instances of our three-node replica set from the same terminal window, I've created a bash script that makes running the three-member replica set easier, and you can use it:
#!/bin/bash
DB1_PORT=27018
DB2_PORT=27019
DB3_PORT=27020
# Data directory
DB1_DATA_DIR="${PWD}/data/mongodb1"
DB2_DATA_DIR="${PWD}/data/mongodb2"
DB3_DATA_DIR="${PWD}/data/mongodb3"
# Log directory
DB1_LOG_DIR="${PWD}/log/mongodb1"
DB2_LOG_DIR="${PWD}/log/mongodb2"
DB3_LOG_DIR="${PWD}/log/mongodb3"
REPLICA_SET="${REPLICA_SET_NAME:-mongo-replicaset}"
# Create Docker network
NETWORK_NAME="mongo-cluster-network"
docker network create $NETWORK_NAME || echo "Network $NETWORK_NAME already exists."
# https://stackoverflow.com/questions/50608301/docker-mounted-volume-adds-c-to-end-of-windows-path-when-translating-from-linux
# https://nickjanetakis.com/blog/setting-up-docker-for-windows-and-wsl-to-work-flawlessly#ensure-volume-mounts-work
docker run -i --rm -d \
--name "mongod1" \
--network $NETWORK_NAME \
-p ${DB1_PORT}:27017 \
-v "${DB1_DATA_DIR}:/data/db" \
-v "${DB1_LOG_DIR}:/var/log/mongodb" \
-v "${PWD}/scripts:/scripts" \
mongo:latest \
--noauth --replSet "$REPLICA_SET" --bind_ip localhost,mongod1 --port 27017
docker run -i --rm -d \
--name "mongod2" \
--network $NETWORK_NAME \
-p ${DB2_PORT}:27017 \
-v "${DB2_DATA_DIR}:/data/db" \
-v "${DB2_LOG_DIR}:/var/log/mongodb" \
-v "${PWD}/scripts:/scripts" \
mongo:latest \
--noauth --replSet "$REPLICA_SET" --bind_ip localhost,mongod2 --port 27017
docker run -i --rm -d \
--name "mongod3" \
--network $NETWORK_NAME \
-p ${DB3_PORT}:27017 \
-v "${DB3_DATA_DIR}:/data/db" \
-v "${DB3_LOG_DIR}:/var/log/mongodb" \
-v "${PWD}/scripts:/scripts" \
mongo:latest \
--noauth --replSet "$REPLICA_SET" --bind_ip localhost,mongod3 --port 27017
# Stream logs from all containers
docker logs -f mongod1 &
docker logs -f mongod2 &
docker logs -f mongod3 &
# Wait for background log processes
wait
Initialize the Replica Set For Docker
After initializing the replica set nodes with their dedicated mongod instances and starting them, from the container where the mongod instance we want to make the primary node is running, in this case the mongod1 container, we should run mongosh. First we need to get inside the primary node container where that mongod instance is running; because we want to make the mongod1 container the primary, we go inside it with docker exec -it mongod1 and execute mongosh there without any arguments. Since this container has a running mongod instance, mongosh connects to the mongod that is listening on localhost on the default port 27017.
docker exec -it mongod1 mongosh
# Or specify mongod host and port explicitly
docker exec -it mongod1 mongosh "mongodb://localhost:27017"
After connecting to the primary node's mongod instance with mongosh, we can initialize our replica set with rs.initiate(); this method initiates a replica set with the passed configuration. We should run rs.initiate() just once, on the node that will become the primary of the replica set. Inside the primary node, during rs.initiate(), we need access to all node host addresses and their corresponding IP addresses to create the replica set, and because all of our container nodes use the mongo-cluster-network network, we can resolve every container node's host address by its container name.
Here is the full script for initializing the replica set on the primary node container:
REPLICA_SET="${REPLICA_SET_NAME:-mongo-replicaset}"
MEMBER_1="{\"_id\": 1, \"host\": \"mongod1\", \"priority\": 2 }"
MEMBER_2="{\"_id\": 2, \"host\": \"mongod2\", \"priority\": 0 }"
MEMBER_3="{\"_id\": 3, \"host\": \"mongod3\", \"priority\": 0 }"
# mongosh using eval
docker exec -it mongod1 mongosh --eval "
rs.initiate({
_id: \"${REPLICA_SET}\",
members: [
${MEMBER_1},
${MEMBER_2},
${MEMBER_3}
]
});
"
# mongosh using js
# docker exec -it mongod1 mongosh /scripts/init-replica-set.js
If the replica set is initiated successfully, we get the following output:
{ ok: 1 }
To check the status and details of our replica set, we can use the command below inside any of the nodes of the replica set, like mongod1:
docker exec -it mongod1 mongosh --eval "rs.status()"
The connection string to access this three-member replica set
is:
mongodb://127.0.0.1:27018,127.0.0.1:27019,127.0.0.1:27020/?replicaSet=mongo-replicaset
Using Docker Compose for Creating Replica Set
To make initializing the replica set easier, we can use a Docker Compose file. When we run the Compose file, all replica set nodes with their corresponding mongod instances are started, and Docker Compose also initializes the replica set automatically.
services:
mongod1:
image: mongo:latest
container_name: mongod1
ports:
- '${MONGOD1_PORT:-27018}:27017'
networks:
- mongo-cluster-network
# https://docs.docker.com/reference/compose-file/services/#healthcheck
# Getting health check logs --> docker inspect --format='{{ json .State.Health.Log}}' mongod1
healthcheck:
test: ['CMD', 'sh', '/scripts/init-replicaset.sh']
interval: 5s # Check every 5 seconds
retries: 20 # Retry up to 20 times
start_period: 0s # Wait 0 seconds before starting health checks
timeout: 30s # Timeout for each health check
start_interval: 1s
volumes:
- '${DB1_DATA_DIR:-${PWD}/data/mongodb1}:/data/db'
- '${DB1_LOG_DIR:-${PWD}/log/mongodb1}:/var/log/mongodb'
- '${PWD}/scripts:/scripts'
command: --noauth --replSet ${REPLICA_SET_NAME:-mongo-replicaset} --bind_ip localhost,mongod1 --port 27017
For our primary node container, which is mongod1, besides running the mongod instance, we use a healthcheck to initialize our replica set via the init-replicaset.sh script. Inside the init-replicaset.sh script, because the healthcheck and its test command run inside the mongo container, we don't need to use docker exec -it mongod1 to get inside the mongod1 container; we are already inside it, and because we are inside the MongoDB container we have access to mongosh. Using mongosh inside the MongoDB container, we can connect to the mongod instance running in this primary node container and then run rs.initiate() with the required configuration to create our replica set.
try {
rs.status();
print('Replica set already initialized');
} catch (err) {
print('Replica set not yet initialized, attempting to initiate...');
rs.initiate({
_id: \"${REPLICA_SET}\",
members: [
${MEMBER_1},
${MEMBER_2},
${MEMBER_3}
]
});
}
The healthcheck directive in the mongod1 container runs a command that tries to get the current status of the replica set using rs.status(); if the replica set is not initialized yet, rs.status() throws an exception and we go to the catch block. In the catch block we try to initialize the replica set with rs.initiate(), but this operation can fail while the MongoDB instance is starting up or while the other mongod instances in the replica set are not running yet, and such a failure puts our healthcheck into the unhealthy state, so we use the healthcheck functionality to retry the operation with some delay and interval until it succeeds.
Here is what the retry mechanism in the healthcheck does:
- interval: 5s: runs the healthcheck every 5 seconds.
- retries: 20: retries the check up to 20 times before considering it a failure.
- start_period: 0s: waits 0 seconds before starting health checks.
- timeout: 30s: 30-second timeout for each health check.
Using this approach, we encapsulate the replica set initialization process inside our mongod1 service: this container, besides running the mongod instance, is responsible for initializing the replica set, so we don't need a separate service for initializing the replica set, which makes our setup simpler.
Here is the Docker Compose configuration to run the replica set's mongod instances and initialize the replica set:
version: '3.9'
services:
mongod1:
image: mongo:latest
container_name: mongod1
ports:
- '${MONGOD1_PORT:-27018}:27017'
networks:
- mongo-cluster-network
# https://docs.docker.com/reference/compose-file/services/#healthcheck
# Getting health check logs --> docker inspect --format='{{ json .State.Health.Log}}' mongod1
healthcheck:
test: ['CMD', 'sh', '/scripts/init-replicaset.sh']
interval: 5s # Check every 5 seconds
retries: 20 # Retry up to 20 times
start_period: 0s # Wait 0 seconds before starting health checks
timeout: 30s # Timeout for each health check
start_interval: 1s
volumes:
- '${DB1_DATA_DIR:-${PWD}/data/mongodb1}:/data/db'
- '${DB1_LOG_DIR:-${PWD}/log/mongodb1}:/var/log/mongodb'
- '${PWD}/scripts:/scripts'
command: --noauth --replSet ${REPLICA_SET_NAME:-mongo-replicaset} --bind_ip localhost,mongod1 --port 27017
mongod2:
image: mongo:latest
container_name: mongod2
ports:
- '${MONGOD2_PORT:-27019}:27017'
networks:
- mongo-cluster-network
volumes:
- '${DB2_DATA_DIR:-${PWD}/data/mongodb2}:/data/db'
- '${DB2_LOG_DIR:-${PWD}/log/mongodb2}:/var/log/mongodb'
command: --noauth --replSet ${REPLICA_SET_NAME:-mongo-replicaset} --bind_ip localhost,mongod2 --port 27017
mongod3:
image: mongo:latest
container_name: mongod3
ports:
- '${MONGOD3_PORT:-27020}:27017'
networks:
- mongo-cluster-network
volumes:
- '${DB3_DATA_DIR:-${PWD}/data/mongodb3}:/data/db'
- '${DB3_LOG_DIR:-${PWD}/log/mongodb3}:/var/log/mongodb'
command: --noauth --replSet ${REPLICA_SET_NAME:-mongo-replicaset} --bind_ip localhost,mongod3 --port 27017
networks:
mongo-cluster-network:
driver: bridge
Here is our bash script for creating the replica set, used by the Docker Compose healthcheck process:
#!/bin/bash
echo "Executing init replica set script."
REPLICA_SET="${REPLICA_SET_NAME:-mongo-replicaset}"
MEMBER_1="{\"_id\": 1, \"host\": \"mongod1:27017\", \"priority\": 2 }"
MEMBER_2="{\"_id\": 2, \"host\": \"mongod2:27017\", \"priority\": 0 }"
MEMBER_3="{\"_id\": 3, \"host\": \"mongod3:27017\", \"priority\": 0 }"
# Because we use healthcheck and test for the container we already are inside of 'mongod' container
mongosh --quiet --eval "
try {
rs.status();
print('Replica set already initialized');
} catch (err) {
print('Replica set not yet initialized, attempting to initiate...');
rs.initiate({
_id: \"${REPLICA_SET}\",
members: [
${MEMBER_1},
${MEMBER_2},
${MEMBER_3}
]
});
}
"
# # mongosh using js configuration
# mongosh /scripts/init-replica-set.js
The connection string for connecting to this replica set is:
mongodb://127.0.0.1:27018,127.0.0.1:27019,127.0.0.1:27020/?replicaSet=mongo-replicaset
Replication Advantages and Disadvantages
Here are some of the advantages of replication:
- Increases data availability and reliability: Replication ensures data is always available. In case of a crash or a hardware failure on one node, because we have complete copies of our data set on the other nodes, requests are routed to the other active nodes, and our application isn't affected by the failure of one node.
- Disaster Recovery: In disaster scenarios where we have a hardware failure or a crash on a server node, replication provides disaster recovery. If the primary node goes down because of a failure, the secondary nodes in our replica set can be used as backups and allow the database server to be restored quickly. This decreases the downtime and data loss of our database server, since data can be restored from one of the secondary nodes, and in case of a failure of the primary node one of the secondary nodes can be elected as the new primary.
- Scalability and Load Balancing: Replication allows us to scale our database horizontally; read operations can be distributed between secondary nodes with a read preference mode like nearest or secondary, which lets us scale our read capacity, improve overall performance, and reduce load on the primary node, making it suitable for read-heavy applications. However, we still have to send all writes to the primary node, so replication doesn't have any effect on the scalability of write operations.
- Data Redundancy: By replicating data across different nodes and servers, replication provides a safeguard against data loss and system failures on the primary node. This redundancy ultimately increases data availability.
- No Downtime for Backup and Maintenance: Replication can be used for maintenance tasks like backups; we can perform backups on secondary nodes without any downtime and without affecting the performance of the primary node.
- Geographical Distribution: By replicating and distributing data across nodes in different geographical regions, besides increasing availability (because we have copies of the data in different regions), users in different locations can read data from a nearby node with lower latency, enhancing data locality; for this purpose MongoDB uses read preference and the nearest mode (see the sketch after this list).
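As a minimal sketch (the hostnames are placeholders), a client in a given region could prefer the closest member through the connection string; readPreference=nearest routes reads to the member with the lowest measured latency, and maxStalenessSeconds bounds how far behind the primary a secondary may be before it is excluded:
# connect with nearest read preference and a staleness bound of 120 seconds
mongosh "mongodb://mongod-eu.example.com:27017,mongod-us.example.com:27017,mongod-asia.example.com:27017/?replicaSet=mongo-replicaset&readPreference=nearest&maxStalenessSeconds=120"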
Besides its advantages, replication has some disadvantages that you should consider before using it:
- Increased Resource Usage: In replication, multiple copies of the same data are spread across different nodes, so we need multiple nodes or servers, which consume more disk space, CPU, and RAM for our replica set; this leads to higher infrastructure and node resource costs, plus some maintenance costs.
- Complexity in Maintenance: Managing a replica set can introduce more complexity in terms of configuration, monitoring, data synchronization, failure management and handling the election process, etc., compared to a single node setup.
- Eventual Consistency and Immediate Consistency Issues: MongoDB uses eventual consistency in the replication process and the replica set, which means there is some delay in the synchronization of data between the primary node and the secondary nodes, although all secondary nodes eventually reflect the same data. This may lead to scenarios where secondary nodes serve stale data if they are used for reads, until they catch up with the primary node; there is no guarantee that reads from secondary nodes immediately return the most recent writes, which can create inconsistency issues for read operations from secondary nodes and is not ideal for applications that need strong consistency.
- Increased Latency for Writes: Although replication increases read performance, it can introduce some latency for write operations in some scenarios, depending on the write concern. Write operations are processed by the primary node and then replicated to the secondary nodes based on the write concern settings, and the primary node has to wait for acknowledgement from the secondary nodes. For example, with the { w: "majority" } write concern, the primary node has to wait for write acknowledgement from a majority (more than half) of the nodes, so this waiting for acknowledgements on the primary node can slow down our write performance.
Source Code
All source code for the different approaches is available in this repository on GitHub.
References
- MongoDB Replication
- Replication in MongoDB
- Replica Set Members
- Read Preference
- Replica Set Elections
- Write Concern for Replica Sets
- Write Concern
- Read Preference Use Cases
- Deploy a Self-Managed Replica Set
- Convert a Standalone Self-Managed mongod to a Replica Set
- Connect to a Deployment
- The only local MongoDB replica set with Docker Compose guide you’ll ever need!