checkpoint_timeout
The meaning of this parameter is easy to understand. The official documentation describes it as follows:
Maximum time between automatic WAL checkpoints. If this value is specified without units, it is taken as seconds. The valid range is between 30 seconds and one day. The default is five minutes (5min). Increasing this parameter can increase the amount of time needed for crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line.
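Checkpoints are triggered in two ways. The first fires automatically at a fixed interval. The second fires when some condition is met and ckpt_flags in shared memory is set to a nonzero value. checkpoint_timeout governs the first case: it is the time interval, 5 minutes by default, so by default a checkpoint is performed every five minutes.
If we read the checkpointer source code, we see the following. When the checkpointer process starts, it records the current time in the variable last_checkpoint_time, whose type pg_time_t is an 8-byte signed integer.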
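The two triggers can be sketched in isolation. This is a minimal illustration, not PostgreSQL's code: the function name, the SKETCH_ constants, and the parameter names are all hypothetical, and the shared-memory flags are reduced to a plain integer argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cause bits for this sketch; not PostgreSQL's real flag values. */
#define SKETCH_CAUSE_REQUEST 0x01   /* the shared flags were set to nonzero */
#define SKETCH_CAUSE_TIME    0x02   /* checkpoint_timeout has elapsed */

/*
 * Decide whether a checkpoint is due.
 * Returns a bitmask of causes, or 0 when no checkpoint is needed yet.
 * All times are in seconds.
 */
int
sketch_checkpoint_causes(int64_t now, int64_t last_checkpoint_time,
                         int timeout_secs, int shared_flags)
{
    int causes = 0;

    assert(timeout_secs > 0);

    /* Second kind: some condition set the shared flags. */
    if (shared_flags != 0)
        causes |= SKETCH_CAUSE_REQUEST;

    /* First kind: enough time has passed since the last checkpoint. */
    if (now - last_checkpoint_time >= timeout_secs)
        causes |= SKETCH_CAUSE_TIME;

    return causes;
}
```

For example, with a 300-second timeout and the last checkpoint at second 800, a call at second 1100 with no flags set reports only the time cause, while a call at second 1000 with a nonzero flag reports only the request cause.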
typedef int64 pg_time_t;
static pg_time_t last_checkpoint_time;
static pg_time_t last_xlog_switch_time;
last_checkpoint_time = last_xlog_switch_time = (pg_time_t) time(NULL);
After the checkpointer process finishes its initialization, it enters an infinite loop, for (;;), which contains the following logic:
bool do_checkpoint = false;
pg_time_t now;
int elapsed_secs;
now = (pg_time_t) time(NULL); /// get the current time
elapsed_secs = now - last_checkpoint_time; /// seconds elapsed since the last checkpoint
if (elapsed_secs >= CheckPointTimeout) /// the timeout has expired, so trigger a checkpoint
{
    if (!do_checkpoint)
        chkpt_or_rstpt_timed = true;
    do_checkpoint = true;
    flags |= CHECKPOINT_CAUSE_TIME; /// set the flag bit that marks this checkpoint as caused by the timeout
}
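The logic above is straightforward. On each iteration, the loop takes the current time and subtracts last_checkpoint_time from it, giving the number of seconds, elapsed_secs. If this value is greater than or equal to CheckPointTimeout, do_checkpoint is set to true, meaning a checkpoint must be performed. The logic that follows is: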
/*
 * Do a checkpoint if requested.
 */
if (do_checkpoint) /// a checkpoint is needed
{
    bool ckpt_performed = false;
    bool do_restartpoint;
    /* Check if we should perform a checkpoint or a restartpoint. */
    do_restartpoint = RecoveryInProgress(); /// true while the server is in recovery (for example, on a standby), otherwise false
    /*
     * Do the checkpoint.
     */
    if (!do_restartpoint)
    {
        CreateCheckPoint(flags);
        ckpt_performed = true;
    }
    else
        ckpt_performed = CreateRestartPoint(flags);
    if (ckpt_performed) /// true if a checkpoint or restartpoint was actually performed
    {
        /*
         * Note we record the checkpoint start time not end time as
         * last_checkpoint_time. This is so that time-driven
         * checkpoints happen at a predictable spacing.
         */
        last_checkpoint_time = now;
        if (do_restartpoint)
            PendingCheckpointerStats.restartpoints_performed++;
    }
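Once the checkpoint has been performed, last_checkpoint_time is set to the value of now. As the source comment points out, now was read before the checkpoint began, so it records when this checkpoint started, not when it finished. On later iterations, the loop again checks whether the time elapsed since last_checkpoint_time is greater than or equal to CheckPointTimeout and decides whether to perform another checkpoint.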
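The effect of recording the start time rather than the end time can be seen in a small simulation. This is a sketch under simplifying assumptions, not PostgreSQL code: the function name and parameters are hypothetical, every checkpoint is assumed to take the same fixed duration (shorter than the timeout), and the loop is assumed to notice the timeout at the exact second it expires.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simulate n consecutive timed checkpoints, starting the clock at 0,
 * and return the start time (in seconds) of the n-th one.
 *
 * record_start != 0: remember each checkpoint's start time, as the
 *                    excerpt does with last_checkpoint_time = now.
 * record_start == 0: remember each checkpoint's end time instead, for
 *                    comparison.
 */
int64_t
sketch_nth_checkpoint_start(int n, int timeout_secs, int duration_secs,
                            int record_start)
{
    int64_t last = 0;   /* plays the role of last_checkpoint_time */
    int64_t start = 0;  /* start time of the most recent checkpoint */

    assert(n > 0 && timeout_secs > 0);
    assert(duration_secs >= 0 && duration_secs < timeout_secs);

    for (int i = 0; i < n; i++)
    {
        /* The timeout expires timeout_secs after the remembered time. */
        start = last + timeout_secs;

        /* Remember either when the checkpoint began or when it ended. */
        last = record_start ? start : start + duration_secs;
    }

    return start;
}
```

With a 300-second timeout and checkpoints that take 120 seconds, remembering the start time yields checkpoints that begin at 300, 600, and 900 seconds. Remembering the end time yields 300, 720, and 1140 seconds, so the spacing would drift with however long each checkpoint happens to take.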
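A checkpoint triggered by the timeout is called a timed checkpoint. The larger the share of checkpoints that are timed, as opposed to requested for other reasons, the better. Depending on the PG version, the number of timed checkpoints is reported in the pg_stat_checkpointer or pg_stat_bgwriter system view.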
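One way to read those counters is as a ratio of timed checkpoints to all checkpoints. The helper below is a hypothetical sketch, not part of PostgreSQL. It assumes the two inputs come from the timed and requested checkpoint counters, which to my knowledge are num_timed and num_requested in pg_stat_checkpointer (PostgreSQL 17 and later) and checkpoints_timed and checkpoints_req in pg_stat_bgwriter (earlier versions).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the fraction (0.0 to 1.0) of checkpoints that were started by
 * the timer. Returns -1.0 when no checkpoints have been counted yet.
 * Both arguments are assumed to be cumulative counters read from the
 * statistics view.
 */
double
sketch_timed_fraction(int64_t num_timed, int64_t num_requested)
{
    int64_t total;

    assert(num_timed >= 0 && num_requested >= 0);

    total = num_timed + num_requested;
    if (total == 0)
        return -1.0;    /* nothing to report yet */

    return (double) num_timed / (double) total;
}
```

For instance, 75 timed and 25 requested checkpoints give 0.75. A value close to 1.0 means most checkpoints were started by checkpoint_timeout rather than by other triggers.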