MongoDB Backup Data Directory
Dedicated to mongodump haters
MongoDB 3.6 brought us a number of improvements worth considering when deciding whether it is a more convenient, reliable and fast platform for NoSQL. A quick look at the website gives us the most important additions that facilitate our work:
Change streams enable you to create powerful data pipelines, moving data to wherever it’s needed using a time-ordered sequence of changes as they occur in the database (see the short sketch after this list).
New causal consistency enforces strict, sequential ordering of operations within a session, regardless of which node in the cluster is serving the request. Shard-aware secondary reads ensure data consistency from any secondary, even as data is balanced across the cluster.
Fully expressive array updates allow you to perform complex array manipulations against all matching elements in any array in a single atomic update.
Retryable writes reduce the error handling you have to implement in your code. The MongoDB drivers will now automatically retry write operations in the event of transient network errors or primary replica elections, while the server enforces exactly-once semantics.
MongoDB 3.6 now extends binding to localhost by default to all packages and platforms, denying all external connections to the database until permitted by you. Combined with new IP whitelisting support, you can now configure MongoDB to only accept external connections from approved IP addresses.
MongoDB Atlas clusters can now span multiple cloud provider regions, enabling you to build apps that maintain continuous availability in the event of geographic outages, and improve customer experience by collocating data closer to users.
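To give a feel for change streams, here is a minimal pymongo sketch; it assumes a replica set and a hypothetical testdb.orders collection (the names are placeholders, not part of this article's setup):
from pymongo import MongoClient

client = MongoClient('mongodb://127.0.0.1:27017/')
# watch() requires a replica set and yields a time-ordered
# stream of change events; this loop prints each one as it arrives
with client.testdb.orders.watch() as stream:
    for change in stream:
        print(change['operationType'], change.get('fullDocument'))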
Backup methods have not changed with the new version, and MongoDB still suggests a few options:
Back up with Atlas
Back up with MongoDB Cloud Manager or Ops Manager
Back up by copying underlying data files
Back up with filesystem snapshots
Back up with cp or rsync
Back up with mongodump
You can find a short description of each method in the MongoDB documentation.
Unfortunately, if we want to use MongoDB Community in a production environment, we cannot use the Atlas and Ops Manager options. But what if you have more than 200 GB of data in a replica set (I suppose you don't take any risks and really do run one in production) and don't have enough free disk space? And presumably hate any kind of dump?
When we have a replica set, the best approach is to run backups on a secondary server, and that's the case I want to describe. I've chosen a simple way to back up the data directory: copying the data files (without rsync or cp) and compressing them into a *.gz file. We can achieve this with a bash script or a Python script.
To do this we need to stop all writes on the MongoDB instance so that the backup is consistent. When the backup finishes, the script runs the unlock command and the secondary server continues working as usual.
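Under the hood this is MongoDB's fsync/fsyncUnlock pair. A minimal sketch of the pattern with pymongo (assuming a local instance on the default port):
from pymongo import MongoClient

client = MongoClient('mongodb://127.0.0.1:27017/')
# Flush pending writes to disk and block new writes
client.admin.command('fsync', lock=True)
try:
    pass  # copy the data directory here while writes are blocked
finally:
    # Always release the lock, even if the copy fails
    client.admin.command('fsyncUnlock')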
In the script below I've used a bash command to create the compressed tar file, but you could write a function using the zipfile or tarfile libraries instead (see the tarfile sketch after the script). The script also requires Python 3, so please run a few steps before executing it:
yum -y update
yum -y install yum-utils
yum -y groupinstall development
yum -y install https://centos7.iuscommunity.org/ius-release.rpm
yum -y install python36u
yum -y install python36u-pip
yum -y install python36u-devel
pip3.6 install pymongo
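To verify that pymongo is installed and can reach the local instance, a quick check like this should print {'ok': 1.0} (assuming mongod listens on the default port):
from pymongo import MongoClient

# Fail fast if the server is unreachable
client = MongoClient('mongodb://127.0.0.1:27017/', serverSelectionTimeoutMS=3000)
print(client.admin.command('ping'))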
Let's also create the backup directory and the backup script itself:
cd /root
mkdir mongodb_backup
mkdir mongodb_jobs
cd /root/mongodb_jobs
vim mongodb_directory_backup.py
And here is the simple code:
import os
import sys
import smtplib
import platform
import subprocess
from datetime import datetime
from pymongo import MongoClient
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
# Declaring global variables
timestamp = datetime.now().strftime('%d%m%Y')
host = platform.uname()[1]
output_filename = host + '_mongodb_' + timestamp
backup_dir = '/root/mongodb_backup/' + timestamp + '/'
work_dir = '/root/mongodb_backup/'
data_dir = '/opt/mongodb/data'
mongo_shell = '/usr/bin/mongo'
mongo_host = '127.0.0.1'
mongo_port = '27017'
conn_string = 'mongodb://' + mongo_host + ':' + mongo_port + '/'
zip_command = 'tar -czvf ' + backup_dir + output_filename + '.tar.gz ' + data_dir
remove_command = 'find ' + work_dir + '*' + ' -type d -ctime +2 -exec rm -rf {} \;'
from_address = '[email address]'
to_address = '[email address]'
# Connecting to the local MongoDB instance
client = MongoClient(conn_string)
# Connecting to an SMTP server; smtplib.SMTP() connects immediately,
# so no extra connect() call is needed after login
mail_server = smtplib.SMTP('[SMTPServerAddress]', [Port])
mail_server.starttls()
mail_server.login('[Username]', '[Password]')
# Get the replica set role of this instance
MongoDB = client.admin
serverStatus = MongoDB.command('isMaster')
isMasterStatus = serverStatus['ismaster']
# Check if the instance has the Primary replica role;
# 'ismaster' is a boolean, so don't compare it to the string 'True'
if not isMasterStatus:
    pass
else:
    message = MIMEMultipart()
    message['From'] = from_address
    message['To'] = to_address
    message['Subject'] = 'MongoDB backup failure on ' + host
    html = """\
    <html>
      <head></head>
      <body>
        The host """ + host + """ has a Primary role!<br>
        Cannot backup the data directory.<br>
        Please check replica set state.
      </body>
    </html>
    """
    body = MIMEText(html, 'html')
    message.attach(body)
    mail_server.sendmail(from_address, to_address, message.as_string())
    # A primary must not be locked for backup, so stop here
    sys.exit(1)
# Creating a backup directory locally (exist_ok avoids a race-prone try/except)
os.makedirs(backup_dir, exist_ok=True)
# Run command to block all writes
MongoDB.command("fsync", lock = True)
# Copying the data directory with tar/gzip.
# subprocess.call returns the shell's exit code; a non-zero code means
# tar failed, so check the return code instead of catching OSError
return_code = subprocess.call(zip_command, cwd = backup_dir, stdout = subprocess.DEVNULL, stderr = subprocess.STDOUT, shell = True)
if return_code != 0:
    # Release the write lock before reporting the failure
    MongoDB.command("fsyncUnlock")
    message = MIMEMultipart()
    message['From'] = from_address
    message['To'] = to_address
    message['Subject'] = 'MongoDB backup failure on ' + host
    html = """\
    <html>
      <head></head>
      <body>
        The script cannot create a compressed tar file.<br>
        Cannot backup the data directory.<br>
        Please check the reason!
      </body>
    </html>
    """
    body = MIMEText(html, 'html')
    message.attach(body)
    mail_server.sendmail(from_address, to_address, message.as_string())
    sys.exit(1)
# Unlock writes on the secondary
MongoDB.command("fsyncUnlock")
# Deleting backup directories older than 2 days
subprocess.call(remove_command, cwd = work_dir, stdout = subprocess.DEVNULL, stderr = subprocess.STDOUT, shell = True)
# Close the SMTP session and finish
mail_server.quit()
sys.exit(0)
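As mentioned above, the shelled-out tar and find commands can also be replaced with pure Python. Here is a minimal sketch using the standard tarfile and shutil libraries; the function names are mine and it reuses the variables from the script, so treat it as a starting point rather than a drop-in replacement:
import os
import time
import shutil
import tarfile

def compress_data_dir(data_dir, backup_dir, output_filename):
    # Create a gzip-compressed tar archive of the data directory
    archive_path = os.path.join(backup_dir, output_filename + '.tar.gz')
    with tarfile.open(archive_path, 'w:gz') as archive:
        archive.add(data_dir, arcname=os.path.basename(data_dir))

def remove_old_backups(work_dir, max_age_days=2):
    # Delete dated backup directories older than max_age_days
    cutoff = time.time() - max_age_days * 86400
    for entry in os.scandir(work_dir):
        if entry.is_dir() and entry.stat().st_ctime < cutoff:
            shutil.rmtree(entry.path)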
The last step is to create a cron job:
crontab -e
# MongoDB data directory backup
0 6 * * 0 /usr/bin/python3.6 /root/mongodb_jobs/mongodb_directory_backup.py
Good luck!