When you use MongoDB replication sets it can be difficult to determine if your secondary servers are up to date and functioning properly. An improperly formed Map/Reduce instruction can cause an error which stops replication. At this time there is no application error or any other notification for this scenario. Even under the best of circumstances it's nice to know how far behind your secondary servers are. The graph above shows a production replication set which hovers around 2-3 seconds lag. That doesn't mean that the Mongo servers take that long to replicate a change. Instead it means "how many seconds has it been since a replicable event occured?" If the time lag grows dramatically you know there hasn't been an insert into the primary DB or that the replication has failed.
#!/bin/bash -ex
#
# Description: Graph how old the last sync was between primary and secondary mongo instances.
#
# Base code and idea thanks to: Edward M. Goldberg
#
# any mistakes courtesy of: Dennis Faust
#
if [ $RS_DISTRO = ubuntu ]; then
plugin_dir="/etc/collectd/conf"
elif [ $RS_DISTRO = centos ]; then
plugin_dir="/etc/collectd.d"
fi
echo "*** This sets up a collectd plugin container"
cat <<"EOF" >$plugin_dir/MongoDB-lag.conf
LoadPlugin exec
<Plugin exec>
Exec "mongodb" "/usr/local/bin/plugin-mongolag.sh"
</Plugin>
EOF
echo "*** This is called by the container"
cat <<"EOF" >/usr/local/bin/plugin-mongolag.sh
#!/bin/bash
#
while sleep 5; do
VALUE=`echo "db.printSlaveReplicationInfo()" | mongo | grep secs | head -1 | cut -f2 -d'=' | cut -f1 -d's' | cut -f2 -d' ' `
echo "PUTVAL \"REPLACE/mongo/gauge-secondary_lag_secs\" interval=5 N:$VALUE"
done
EOF
sed -i "s/REPLACE/$SKETCHY_UUID/g" /usr/local/bin/plugin-mongolag.sh
chmod 777 /usr/local/bin/plugin-mongolag.sh
service collectd restart
exit 0 # leave with a smile...