源码-redis-1

走读 Redis 源码时，发现一处尝试多次打开文件的操作，不知什么原因，就找了找历史记录，摘录如下。

while(maxtries--) {
    snprintf(tmpfile,256,
        "temp-%d.%ld.rdb",(int)server.unixtime,(long int)getpid());
    dfd = open(tmpfile,O_CREAT|O_WRONLY|O_EXCL,0644);
    if (dfd != -1) break;
    sleep(1);
}

摘录

github 提交记录<8c5abee892f28d9c19921971b86991b5091e8530>的信息为： Applied the replication bug patch provided by Jeremy Zawodny, removing temp file collision after the slave got the dump.rdb file in the SYNC stage.

antirez 的 blog 里详细讲了（Friday, 12 March 10）：

The replication bug

Well another very important issue of this week was a not easy to spot bug in the Redis replication code :)

The bug was found to Jeremy Zawodny from Craigslist (they are using a fairly large Redis farm!). Not only Jeremy found the bug and was extremely supportive while I was trying to fix the bug, he also finally fixed it :)

The story and the cause of this bug are interesting IMHO, so I'm going to briefly write what happened. Jeremy filed a bug report about replication failing: slaves were unable to parse the .rdb file received from the master.

My best guess was that this was related to a broken .rdb file generation, both I and Petern spent many hours trying to find a sorted set that would corrupt the generated .rdb file. What was really strange was that the file was corrupted in a bad way but still no crashes or alike in the masters. If it was a memory corruption bug, there were good chances of the master failing as well in random ways. Instead everything was fine.

Jeremy investigated more and finally found and fixed the bug: the .rdb was corrupting while it was transfered from master to slave. In fact at craigslist they run a number of slaves for every host sharing the same work dir, and this is the code Redis was using in order to generate the temp file used to save the dump:
c snprintf(tmpfile,256,"temp-%d.%ld.rdb",(int)time(NULL),(long int)random());

This looks random enough, instead it's lame. For a few reasons:

In redis there is no seeding of the PRNG via srand() because we want deterministic behavior, otherwise bug fixing is hard.

it's not so unlike that two slaves will sync with masters at the same time, especially in big environments where there are scripts handing replication.

why I did't used the pid that is ways more unique? two processes can't have the same pid at the same time, so the right thing was to use pid+time, and as Jeremy suggested, to use O_EXCL so even if there is a collision we are safe. This is how the new code looks like:
c while(maxtries--) { snprintf(tmpfile,256, "temp-%d.%ld.rdb",(int)time(NULL),(long int)getpid()); dfd = open(tmpfile,O_CREAT|O_WRONLY|O_EXCL,0644); if (dfd != -1) break; sleep(1); }

This should be much better ;) I'm going to release 1.2.6 today or in the week end at max, with this fix.

源码-redis-1

摘录

参考

Related Posts

Published

Category

Tags

Contact