What Happens to a MongoDB Cluster After the Linux Host Time Is Changed

This article walks through what happened to a MongoDB cluster after the system time on one of its Linux hosts was changed.

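The production environment is a sharded cluster with 3 shards, each of them a replica set with one primary, one secondary, and one arbiter. One node's clock turned out to be several days ahead of the actual time, so it was stepped back using NTP.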

The node's clock read May 3 while the actual date was April 26, so the clock was corrected as follows.

[root@cwdtest1 bin]# date

Fri May 3 13:20:31 CST 2019

[root@cwdtest1 bin]# ntpdate -u 10.205.34.171

26 Apr 12:39:23 ntpdate[14568]: step time server 10.205.34.171 offset -607507.747595 sec

[root@cwdtest1 bin]# hwclock --systohc

The time after the adjustment:

[root@cwdtest1 bin]# date

Fri Apr 26 12:39:31 CST 2019
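
As a quick sanity check on the size of the step, the offset reported by ntpdate can be broken into days, hours, and minutes. This is only a small arithmetic sketch that runs under Node; the helper name breakDown is made up for illustration and is not part of any tool used above.

```javascript
// Offset reported by ntpdate in the session above, in seconds.
// A negative value means the clock was stepped backwards.
const offsetSec = -607507.747595;

// Hypothetical helper: split an offset in seconds into whole
// days, hours, minutes, and leftover whole seconds.
function breakDown(seconds) {
  const total = Math.abs(seconds);
  const days = Math.floor(total / 86400);
  const hours = Math.floor((total % 86400) / 3600);
  const minutes = Math.floor((total % 3600) / 60);
  const secs = Math.floor(total % 60);
  return { days, hours, minutes, secs };
}

const parts = breakDown(offsetSec);
console.log(parts); // { days: 7, hours: 0, minutes: 45, secs: 7 }
```

So the clock was moved back by roughly seven days, which is consistent with the May 3 and April 26 dates shown by date.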

After the time adjustment, two problems showed up:

1. The secondary could not sync new oplog entries, so replication lag appeared

shard2:PRIMARY> db.printSlaveReplicationInfo();

source: 10.3.252.231:27002

syncedTo: Fri May 03 2019 13:24:23 GMT+0800 (CST)

8 secs (0 hrs) behind the primary

2. The oplog's tLast time is later than the current time

shard2:PRIMARY> db.getReplicationInfo()

{

"logSizeMB" : 1383.3396482467651,

"usedMB" : 154.49,

"timeDiff" : 17015711,

"timeDiffHours" : 4726.59,

"tFirst" : "Thu Oct 18 2018 14:49:20 GMT+0800 (CST)",

"tLast" : "Fri May 03 2019 13:24:31 GMT+0800 (CST)",

"now" : "Fri Apr 26 2019 13:57:01 GMT+0800 (CST)"

}

shard2:PRIMARY> db.printReplicationInfo()

configured oplog size: 1383.3396482467651MB

log length start to end: 17015711secs (4726.59hrs)

oplog first event time: Thu Oct 18 2018 14:49:20 GMT+0800 (CST)

oplog last event time: Fri May 03 2019 13:24:31 GMT+0800 (CST)

now: Fri Apr 26 2019 15:46:27 GMT+0800 (CST)

To find out where the tLast and now values come from, look at the source of db.getReplicationInfo:

shard2:PRIMARY> db.getReplicationInfo
function () {
        var localdb = this.getSiblingDB("local");
 
        var result = {};
        var oplog;
        var localCollections = localdb.getCollectionNames();
        if (localCollections.indexOf('oplog.rs') >= 0) {
            oplog = 'oplog.rs';
        } else if (localCollections.indexOf('oplog.$main') >= 0) {
            oplog = 'oplog.$main';
        } else {
            result.errmsg = "neither master/slave nor replica set replication detected";
            return result;
        }
 
        var ol = localdb.getCollection(oplog);
        var ol_stats = ol.stats();
        if (ol_stats && ol_stats.maxSize) {
            result.logSizeMB = ol_stats.maxSize / (1024 * 1024);
        } else {
            result.errmsg = "Could not get stats for local." + oplog + " collection. " +
                "collstats returned: " + tojson(ol_stats);
            return result;
        }
 
        result.usedMB = ol_stats.size / (1024 * 1024);
        result.usedMB = Math.ceil(result.usedMB * 100) / 100;
 
        var firstc = ol.find().sort({$natural: 1}).limit(1);
        var lastc = ol.find().sort({$natural: -1}).limit(1);
        if (!firstc.hasNext() || !lastc.hasNext()) {
            result.errmsg =
                "objects not found in local.oplog.$main -- is this a new and empty db instance?";
            result.oplogMainRowCount = ol.count();
            return result;
        }
 
        var first = firstc.next();
        var last = lastc.next();
        var tfirst = first.ts;
        var tlast = last.ts;
 
        if (tfirst && tlast) {
            tfirst = DB.tsToSeconds(tfirst);
            tlast = DB.tsToSeconds(tlast);
            result.timeDiff = tlast - tfirst;
            result.timeDiffHours = Math.round(result.timeDiff / 36) / 100;
            result.tFirst = (new Date(tfirst * 1000)).toString();
            result.tLast = (new Date(tlast * 1000)).toString();
            result.now = Date();
        } else {
            result.errmsg = "ts element not found in oplog objects";
        }
 
        return result;
    }

The relevant lines are:

var ol = localdb.getCollection(oplog);

var lastc = ol.find().sort({$natural: -1}).limit(1);

var last = lastc.next();

var tlast = last.ts;

result.tLast = (new Date(tlast * 1000)).toString();

result.now = Date();

tLast is the ts value of the last document in the oplog.rs collection.

now is the current time, obtained by calling Date().
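
To make the relationship between the two values concrete, here is a minimal sketch of the tail of that function that runs under plain Node, outside the mongo shell. It rests on several assumptions: timestamps are plain objects of the form {t, i} rather than BSON Timestamps, tsToSeconds is a simplified stand-in for DB.tsToSeconds, the counter value i: 1 is invented, the current time is passed in as a parameter instead of coming from Date(), and the field lastAheadOfNowSecs does not exist in the real helper. The dates are the ones from the output above, rewritten in ISO 8601.

```javascript
// Simplified stand-in for the shell's DB.tsToSeconds: a timestamp here is a
// plain object {t: <seconds since epoch>, i: <counter>}, not a BSON Timestamp.
function tsToSeconds(ts) {
  return ts.t;
}

// Mirrors the last part of db.getReplicationInfo. nowMs is injected so the
// result is repeatable (an assumption; the shell calls Date() instead).
function replicationWindow(first, last, nowMs) {
  const tfirst = tsToSeconds(first.ts);
  const tlast = tsToSeconds(last.ts);
  const timeDiff = tlast - tfirst;
  return {
    timeDiff: timeDiff,
    timeDiffHours: Math.round(timeDiff / 36) / 100,
    tLast: new Date(tlast * 1000).toISOString(),
    now: new Date(nowMs).toISOString(),
    // Extra illustrative field, not in the shell helper: how many seconds
    // the newest oplog entry is ahead of the wall clock.
    lastAheadOfNowSecs: tlast - Math.floor(nowMs / 1000),
  };
}

// First and last oplog entries, using the times printed above (UTC+8).
const first = { ts: { t: Date.parse("2018-10-18T14:49:20+08:00") / 1000, i: 1 } };
const last = { ts: { t: Date.parse("2019-05-03T13:24:31+08:00") / 1000, i: 1 } };
const nowMs = Date.parse("2019-04-26T13:57:01+08:00");

const info = replicationWindow(first, last, nowMs);
console.log(info.timeDiff, info.timeDiffHours, info.lastAheadOfNowSecs);
```

With these inputs the sketch reproduces the timeDiff of 17015711 seconds and the 4726.59 hours seen earlier. It also shows that tLast depends only on what is stored in the oplog, so it stays on May 3 even after the host clock, and therefore now, has moved back to April 26.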

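At this point I suspected that replication had stalled because the oplog held entries timestamped later than the current time. Oplog entries written after the clock was set back would carry earlier timestamps and so would not count as the newest ones. The secondary, when comparing, would then see that the latest entry never changed and would have nothing to sync.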

A rough outline of how MongoDB replication works (adapted from online sources):

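1. When a write statement is executed, the write is completed on the primary

2. The primary records an oplog entry that includes a ts field holding the time the write was executed; call it t in this example

3. The secondary pulls the oplog from the primary and receives the entry for that write

4. The secondary applies the write described by the entry it received

5. Once that is done, the secondary fetches new entries, pulling from the primary's oplog with the condition {ts:{$gt:t}}

6. When the primary receives this request, it sees that the secondary is asking for entries with a time greater than t, so it knows that everything up to t has been applied successfully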
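
The steps above, combined with the suspicion stated earlier, can be sketched as a small in-memory model that runs under Node. It is not MongoDB's actual implementation: the oplog is a plain array, timestamps are plain epoch seconds, the function names writeOnPrimary and pullNewEntries are invented, and the key assumption (the hypothesis being tested in this article) is that each new entry's ts is taken directly from the primary's wall clock.

```javascript
// In-memory stand-in for the primary's oplog, in insertion order.
// Real oplog ts values are BSON Timestamps; plain epoch seconds are used here.
const primaryOplog = [];

// Steps 1-2: the primary applies a write and records it with a ts.
// ASSUMPTION (the hypothesis under test): ts comes straight from the wall clock.
function writeOnPrimary(op, clockSec) {
  primaryOplog.push({ ts: clockSec, op: op });
}

// Steps 3-5: the secondary asks for entries matching {ts: {$gt: lastAppliedTs}}.
function pullNewEntries(lastAppliedTs) {
  return primaryOplog.filter((entry) => entry.ts > lastAppliedTs);
}

const may3 = Date.parse("2019-05-03T13:24:31+08:00") / 1000;
const apr26 = Date.parse("2019-04-26T13:57:01+08:00") / 1000;

// A write made while the clock was still ahead, then replicated.
writeOnPrimary("insert before the clock change", may3);
let lastApplied = 0;
const firstBatch = pullNewEntries(lastApplied);
lastApplied = firstBatch[firstBatch.length - 1].ts; // caught up to May 3

// A write made after the clock was set back to April 26.
writeOnPrimary("insert after the clock was set back", apr26);
const secondBatch = pullNewEntries(lastApplied);

console.log(firstBatch.length, secondBatch.length); // 1 0
```

In this model the second pull returns nothing, because the new entry's ts is smaller than the last ts the secondary has already applied. Whether a real primary behaves this way is exactly what the test insert is meant to check.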

So I ran a test insert on the primary to verify this suspicion.

shard2:PRIMARY> use shtest

switched to db shtest

shard2:PRIMARY> db.coll.insert( {x:3339876})

WriteResult({ "nInserted" : 1 })

Query the last operation written on the primary:

rs.debug.getLastOpWritten()