监控目标
- 监测导致 Redis 变慢的事件
- 能监控并记录变慢的命令
本文介绍下 Redis 的延迟监控框架和慢命令日志的设计与实现
延迟监控框架的实现
Redis 在运行过程中,以下表格中给出的几类事件都会导致 Redis 变慢,我们通常也把这些事件称为延迟事件
说明如下:
- command: 常规命令
- fast-command: O(1) 和 O(log N) 命令
- fork: 系统调用 fork(2) .
- rdb-unlink-temp-file: 系统调用 unlink(2) .
- aof-write: 写入AOF - 捕获系统调用事件fsync(2).
- aof-fsync-always: 由appendfsync allways策略被调用时,系统调用 fsync(2) .
- aof-write-pending-fsync: 当挂起的写入时,系统调用执行fsync(2) .
- aof-write-active-child: 当子进程执行时,系统调用fsync(2) .
- aof-write-alone: 当主进程执行时,系统调用fsync(2) .
- aof-fstat: 系统调用 fstat(2).
- aof-rename: 完成BGREWRITEAOF之后,系统调用rename(2)以重命名临时文件.
- aof-rewrite-diff-write: 编写执行BGREWRITEAOF时累积的差异.
- active-defrag-cycle: 主动的碎片整理周期.
- expire-cycle: 到期周期.
- eviction-cycle: 驱逐周期.
- eviction-del: 在逐出周期中删除.
针对这些事件,Redis 实现了延迟监控框架,通过采样的方式来记录它们的执行情况
延迟监控框架是在latency.h和latency.c文件中实现的
记录事件执行情况的数据结构
struct latencySample {
int32_t time; //事件的采样时间
uint32_t latency; //事件的执行时长(以毫秒为单位)
};
Redis 又设计了 latencyTimeSeries 结构体,这个结构体使用了一个 latencySample 类型的数组,记录了针对某一类事件的一系列采样结果
struct latencyTimeSeries {
int idx; //采样事件数组的写入位置
uint32_t max; //当前事件的最大延迟
struct latencySample samples[LATENCY_TS_LEN]; //采样事件数组,记录LATENCY_TS_LEN个采样结果,LATENCY_TS_LEN默认为160
};
因为延迟监控框架要记录的延迟事件有很多种,所以 Redis 还进一步设计了一个哈希表 latency_events,作为全局变量 server 的一个成员变量,用来记录不同事件的采样结果数组
struct redisServer {
…
dict *latency_events;
…
}
void initServer(void) {
…
latencyMonitorInit();
}
void latencyMonitorInit(void) {
server.latency_events = dictCreate(&latencyTimeSeriesDictType,NULL);
}
如何实现延迟事件的采样?
记录延迟数据是调用latencyAddSample函数:
/* Add the specified sample to the specified time series "event".
* This function is usually called via latencyAddSampleIfNeeded(), that
* is a macro that only adds the sample if the latency is higher than
* server.latency_monitor_threshold. */
void latencyAddSample(const char *event, mstime_t latency) {
// 查找全局延迟事件哈希表
struct latencyTimeSeries *ts = dictFetchValue(server.latency_events,event);
time_t now = time(NULL);
int prev;
/* Create the time series if it does not exist. */
if (ts == NULL) {
ts = zmalloc(sizeof(*ts));
ts->idx = 0;
ts->max = 0;
memset(ts->samples,0,sizeof(ts->samples));
dictAdd(server.latency_events,zstrdup(event),ts);
}
if (latency > ts->max) ts->max = latency;
/* If the previous sample is in the same second, we update our old sample
* if this latency is > of the old one, or just return. */
// 如果上次更新跟当前更新是同一秒且当前延迟大于上次延迟则更新下,直接返回
prev = (ts->idx + LATENCY_TS_LEN - 1) % LATENCY_TS_LEN;
if (ts->samples[prev].time == now) {
if (latency > ts->samples[prev].latency)
ts->samples[prev].latency = latency;
return;
}
// 插入一个新的采样数据
ts->samples[ts->idx].time = time(NULL);
ts->samples[ts->idx].latency = latency;
ts->idx++;
// 重复使用采样结果数组 latencyTimeSeries
if (ts->idx == LATENCY_TS_LEN) ts->idx = 0;
}
latencyAddSample函数会被下面宏调用,即只有延迟数据大于等于阈值时才会调用进行记录
/* Add the sample only if the elapsed time is >= to the configured threshold. */
#define latencyAddSampleIfNeeded(event,var) \
if (server.latency_monitor_threshold && \
(var) >= server.latency_monitor_threshold) \
latencyAddSample((event),(var));
latencyAddSampleIfNeeded 函数,实际上会在刚才介绍的延迟事件(上面表中的事件,如fork事件,命令事件,AOF事件等)发生时被调用
比如,当 Redis 命令通过 call 函数(在 server.c 文件中)执行时,call 函数就会调用 latencyAddSampleIfNeeded 函数进行采样,如下所示:
if (flags & CMD_CALL_SLOWLOG) {
char *latency_event = (real_cmd->flags & CMD_FAST) ?
"fast-command" : "command";
// 记录常规命令和fast-command命令的延迟
latencyAddSampleIfNeeded(latency_event,duration/1000);
}
如果 AOF 文件刷盘的配置项是 AOF_FSYNC_ALWAYS,会记录aof-fsync-always事件的延迟:
/* Perform the fsync if needed. */
if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
/* redis_fsync is defined as fdatasync() for Linux in order to avoid
* flushing metadata. */
latencyStartMonitor(latency);
/* Let's try to get this data on the disk. To guarantee data safe when
* the AOF fsync policy is 'always', we should exit if failed to fsync
* AOF (see comment next to the exit(1) after write error above). */
if (redis_fsync(server.aof_fd) == -1) {
serverLog(LL_WARNING,"Can't persist AOF for fsync error when the "
"AOF fsync policy is 'always': %s. Exiting...", strerror(errno));
exit(1);
}
latencyEndMonitor(latency);
latencyAddSampleIfNeeded("aof-fsync-always",latency);
server.aof_fsync_offset = server.aof_current_size;
server.aof_last_fsync = server.unixtime;
}
延迟分析和提供应对措施建议
Redis 是提供了 latency doctor 命令,来给出延迟分析结果和应对方法建议的。当我们执行这条命令的时候,Redis 就会使用 latencyCommand 函数来处理。而在处理这个命令时,latencyCommand 函数会调用 createLatencyReport 函数,来生成延迟分析报告和应对方法建议。
createLatencyReport 函数会针对 latency_events 哈希表中记录的每一类事件,先调用 analyzeLatencyForEvent 函数,计算获得采样的延迟事件执行时长的均值、最大 / 最小值等统计结果。在 createLatencyReport 函数中,它定义了多个 int 变量,当这些变量的值为 1 时,就表示建议 Redis 使用者采用一种应对高延迟的措施,代码如下:
/* Create a human readable report of latency events for this Redis instance. */
sds createLatencyReport(void) {
sds report = sdsempty();
int advise_better_vm = 0; /* Better virtual machines. */
int advise_slowlog_enabled = 0; /* Enable slowlog. */
int advise_slowlog_tuning = 0; /* Reconfigure slowlog. */
int advise_slowlog_inspect = 0; /* Check your slowlog. */
int advise_disk_contention = 0; /* Try to lower disk contention. */
int advise_scheduler = 0; /* Intrinsic latency. */
int advise_data_writeback = 0; /* data=writeback. */
int advise_no_appendfsync = 0; /* don't fsync during rewrites. */
int advise_local_disk = 0; /* Avoid remote disks. */
int advise_ssd = 0; /* Use an SSD drive. */
int advise_write_load_info = 0; /* Print info about AOF and write load. */
int advise_hz = 0; /* Use higher HZ. */
int advise_large_objects = 0; /* Deletion of large objects. */
int advise_mass_eviction = 0; /* Avoid mass eviction of keys. */
int advise_relax_fsync_policy = 0; /* appendfsync always is slow. */
int advise_disable_thp = 0; /* AnonHugePages detected. */
int advices = 0;
...
}
慢命令日志的实现
Redis 是使用了一个较为简单的方法来记录慢命令日志,也就是用一个列表,把执行时间超出慢命令日志执行时间阈值的命令记录下来,记录在redisServer中的slowlog变量中:
struct redisServer {
…
list *slowlog;
…}
// slowlog中的元素数据结构如下:
/* This structure defines an entry inside the slow log list */
typedef struct slowlogEntry {
robj **argv; //日志项对应的命令及参数
int argc; //日志项对应的命令及参数个数
long long id; //日志项的唯一ID
long long duration; //日志项对应命令的执行时长(以微秒为单位)
time_t time; //日志项对应命令的执行时间戳
sds cname; //日志项对应命令的发送客户端名称
sds peerid; //日志项对应命令的发送客户端网络地址
} slowlogEntry;
实现慢命令日志记录功能的代码是在slowlog.c文件中。这里的主要函数是 slowlogPushEntryIfNeeded,函数的参数包含了当前执行命令及其参数 argv,以及当前命令的执行时长 duration
/* Push a new entry into the slow log.
* This function will make sure to trim the slow log accordingly to the
* configured max length. */
void slowlogPushEntryIfNeeded(client *c, robj **argv, int argc, long long duration) {
// 慢日志没有开启直接返回
if (server.slowlog_log_slower_than < 0) return; /* Slowlog disabled */
// 如果超过阈值,则插入一条慢日志
if (duration >= server.slowlog_log_slower_than)
listAddNodeHead(server.slowlog,
slowlogCreateEntry(c,argv,argc,duration));
// 超过长度则删除老的
/* Remove old entries if needed. */
while (listLength(server.slowlog) > server.slowlog_max_len)
listDelNode(server.slowlog,listLast(server.slowlog));
}
如果我们发现日志中记录的命令参数非常多,那么它就可能是一条操作 bigkey 的命令.
slowlogCreateEntry 函数在创建慢命令日志项时,也会判断命令参数个数。如果命令参数个数,超出了阈值 SLOWLOG_ENTRY_MAX_ARGC 这个宏定义的大小(默认 32)时,它就不会记录超出阈值的参数了,而是记录下剩余的参数个数。这样一来,慢命令日志项中就既记录了部分命令参数,有助于排查问题,也避免了记录过多参数,占用过多内存。
/* Create a new slowlog entry.
* Incrementing the ref count of all the objects retained is up to
* this function. */
slowlogEntry *slowlogCreateEntry(client *c, robj **argv, int argc, long long duration) {
slowlogEntry *se = zmalloc(sizeof(*se));
int j, slargc = argc;
if (slargc > SLOWLOG_ENTRY_MAX_ARGC) slargc = SLOWLOG_ENTRY_MAX_ARGC;
se->argc = slargc;
se->argv = zmalloc(sizeof(robj*)*slargc);
for (j = 0; j < slargc; j++) {
/* Logging too many arguments is a useless memory waste, so we stop
* at SLOWLOG_ENTRY_MAX_ARGC, but use the last argument to specify
* how many remaining arguments there were in the original command. */
if (slargc != argc && j == slargc-1) {
se->argv[j] = createObject(OBJ_STRING,
sdscatprintf(sdsempty(),"... (%d more arguments)",
argc-slargc+1));
} else {
/* Trim too long strings as well... */
if (argv[j]->type == OBJ_STRING &&
sdsEncodedObject(argv[j]) &&
sdslen(argv[j]->ptr) > SLOWLOG_ENTRY_MAX_STRING)
{
sds s = sdsnewlen(argv[j]->ptr, SLOWLOG_ENTRY_MAX_STRING);
s = sdscatprintf(s,"... (%lu more bytes)",
(unsigned long)
sdslen(argv[j]->ptr) - SLOWLOG_ENTRY_MAX_STRING);
se->argv[j] = createObject(OBJ_STRING,s);
} else if (argv[j]->refcount == OBJ_SHARED_REFCOUNT) {
se->argv[j] = argv[j];
} else {
/* Here we need to duplicate the string objects composing the
* argument vector of the command, because those may otherwise
* end shared with string objects stored into keys. Having
* shared objects between any part of Redis, and the data
* structure holding the data, is a problem: FLUSHALL ASYNC
* may release the shared string object and create a race. */
se->argv[j] = dupStringObject(argv[j]);
}
}
}
se->time = time(NULL);
se->duration = duration;
se->id = server.slowlog_entry_id++;
se->peerid = sdsnew(getClientPeerId(c));
se->cname = c->name ? sdsnew(c->name->ptr) : sdsempty();
return se;
}
记录函数slowlogPushEntryIfNeeded会被slowlogPushCurrentCommand调用
/* Log the last command a client executed into the slowlog. */
void slowlogPushCurrentCommand(client *c, struct redisCommand *cmd, ustime_t duration) {
/* Some commands may contain sensitive data that should not be available in the slowlog. */
if (cmd->flags & CMD_SKIP_SLOWLOG)
return;
/* If command argument vector was rewritten, use the original
* arguments. */
robj **argv = c->original_argv ? c->original_argv : c->argv;
int argc = c->original_argv ? c->original_argc : c->argc;
slowlogPushEntryIfNeeded(c,argv,argc,duration);
}
void call(client *c, int flags) {
...
elapsedStart(&call_timer);
c->cmd->proc(c); // 执行命令
const long duration = elapsedUs(call_timer); // 得到命令的执行时间
c->duration = duration;
...
/* Log the command into the Slow log if needed.
* If the client is blocked we will handle slowlog when it is unblocked. */
if ((flags & CMD_CALL_SLOWLOG) && !(c->flags & CLIENT_BLOCKED))
slowlogPushCurrentCommand(c, real_cmd, duration);
...
}
即call函数 -> slowlogPushCurrentCommand -> slowlogPushEntryIfNeeded进行记录慢命令日志
总结
本文介绍了事件延迟框架的实现和慢命令日志的实现,针对延迟框架,会记录像fork、AOF重写、过期key事件等事件的延迟事件,当然要延迟超过阈值才会记录,每类事件会记录到一个数组中。针对慢命令日志,针对超过阈值的慢命令,使用一个列表把它们记录下来,这里面包括了命令及参数,以及发送命令的客户端信息,这样可以方便运维人员查看分析。Redis 的 INFO 命令也提供了 Redis 运行时的监控信息,不过你要知道,INFO 命令的实现,主要是在全局变量 server 的成员变量中,用来记录 Redis 实例的实时运行状态或是资源使用情况的。