What Happens Behind the Scenes When a Standby Is Promoted to Primary? - PostgreSQL Database - 樱桃溪学院


xiaobu 5 months ago 237

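In PostgreSQL physical replication, the primary database is the server that accepts both reads and writes; that is its defining trait compared with a standby. A standby database, also a physical-replication concept, is read-only and holds a physical copy of the primary's contents.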

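Promoting a PostgreSQL standby to primary is simple: run "SELECT pg_promote()" on the standby that is to become the new primary, and that read-only standby becomes a read-write primary. But what happens behind the scenes? This post walks through what actually goes on.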

First, let's look at the source code of the pg_promote() function:

/// #define PROMOTE_SIGNAL_FILE		"promote"
/*
 * Promotes a standby server.
 *
 * A result of "true" means that promotion has been completed if "wait" is
 * "true", or initiated if "wait" is false.
 */
Datum
pg_promote(PG_FUNCTION_ARGS)
{
	bool		wait = PG_GETARG_BOOL(0);
	int			wait_seconds = PG_GETARG_INT32(1);
	FILE	   *promote_file;
	int			i;

	if (!RecoveryInProgress()) /// promotion can only be performed on a standby.
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("recovery is not in progress"),
				 errhint("Recovery control functions can only be executed during recovery.")));

	if (wait_seconds <= 0)
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("\"wait_seconds\" must not be negative or zero")));

	/* create the promote signal file */
	promote_file = AllocateFile(PROMOTE_SIGNAL_FILE, "w"); 
	if (!promote_file)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not create file \"%s\": %m",
						PROMOTE_SIGNAL_FILE)));

	if (FreeFile(promote_file))
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not write file \"%s\": %m",
						PROMOTE_SIGNAL_FILE)));

	/* signal the postmaster */
	if (kill(PostmasterPid, SIGUSR1) != 0) /// the promote file is written first, then SIGUSR1 is sent to the postmaster.
	{
		(void) unlink(PROMOTE_SIGNAL_FILE);
		ereport(ERROR,
				(errcode(ERRCODE_SYSTEM_ERROR),
				 errmsg("failed to send signal to postmaster: %m")));
	}

	/* return immediately if waiting was not requested */
	if (!wait)
		PG_RETURN_BOOL(true);

	/* wait for the amount of time wanted until promotion */
#define WAITS_PER_SECOND 10
	for (i = 0; i < WAITS_PER_SECOND * wait_seconds; i++)
	{
		int			rc;

		ResetLatch(MyLatch);

		if (!RecoveryInProgress()) /// once the standby has become a primary, leave the loop.
			PG_RETURN_BOOL(true);

		CHECK_FOR_INTERRUPTS();

		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					   1000L / WAITS_PER_SECOND,
					   WAIT_EVENT_PROMOTE);

		/*
		 * Emergency bailout if postmaster has died.  This is to avoid the
		 * necessity for manual cleanup of all postmaster children.
		 */
		if (rc & WL_POSTMASTER_DEATH)
			ereport(FATAL,
					(errcode(ERRCODE_ADMIN_SHUTDOWN),
					 errmsg("terminating connection due to unexpected postmaster exit"),
					 errcontext("while waiting on promotion")));
	}

	ereport(WARNING,
			(errmsg_plural("server did not promote within %d second",
						   "server did not promote within %d seconds",
						   wait_seconds,
						   wait_seconds)));
	PG_RETURN_BOOL(false);
}

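The logic is easy to follow. The function calls RecoveryInProgress() to tell whether the command is running on a standby or on a primary. A primary has nothing to promote, so the command is only allowed on a standby. It also rejects a wait_seconds value that is zero or negative. It then creates a signal file named promote in the database cluster directory. A signal file is a file whose mere existence is the signal; its contents do not matter. Once the file has been created, the function sends SIGUSR1 to the postmaster with kill(PostmasterPid, SIGUSR1); if that fails, it removes the file and raises an error. If wait is false, it returns true right away. Otherwise it polls RecoveryInProgress() ten times per second for up to wait_seconds seconds: as soon as the call returns false, the promotion has succeeded and pg_promote() returns true to the user; if the time runs out, it emits a WARNING and returns false.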
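The bounded polling at the end of pg_promote() can be sketched on its own. What follows is a minimal, self-contained sketch and not PostgreSQL code: wait_for_condition(), condition_fn, and countdown_done() are hypothetical names, and a plain nanosleep() stands in for WaitLatch(), so there is no interrupt or postmaster-death handling here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define WAITS_PER_SECOND 10

/* Hypothetical predicate type: returns true once the awaited state is reached. */
typedef bool (*condition_fn)(void *arg);

/*
 * Check `done` at most WAITS_PER_SECOND * wait_seconds times, sleeping
 * 1000 / WAITS_PER_SECOND milliseconds after each unsuccessful check.
 * Returns true if the condition held in time, false on timeout.
 */
static bool
wait_for_condition(condition_fn done, void *arg, int wait_seconds)
{
	/* 100 ms per pause, expressed in nanoseconds */
	struct timespec pause = {0, (1000L / WAITS_PER_SECOND) * 1000000L};

	assert(wait_seconds > 0);	/* mirrors the wait_seconds <= 0 check */

	for (int i = 0; i < WAITS_PER_SECOND * wait_seconds; i++)
	{
		if (done(arg))
			return true;
		nanosleep(&pause, NULL);
	}
	return false;
}

/* Example predicate (hypothetical): true once the counter has reached zero. */
static bool
countdown_done(void *arg)
{
	int		   *remaining = arg;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

Calling wait_for_condition(countdown_done, &n, 1) with n = 3 succeeds on the fourth check, while a counter that cannot reach zero within ten checks makes the call time out and return false, which corresponds to the WARNING path in pg_promote().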

Naturally, the next question is what the postmaster does when it receives SIGUSR1. The relevant code is:

	if (StartupPID != 0 && /// this means the startup process is running.
		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
		 pmState == PM_HOT_STANDBY) &&
		CheckPromoteSignal()) /// CheckPromoteSignal() checks whether the promote file exists and returns true if it does.
	{
		/*
		 * Tell startup process to finish recovery.
		 *
		 * Leave the promote signal file in place and let the Startup process
		 * do the unlink.
		 */
		signal_child(StartupPID, SIGUSR2);
	}

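When the postmaster receives SIGUSR1, it runs the logic above. CheckPromoteSignal() checks whether the promote file exists in the database cluster directory. If the file exists, StartupPID != 0, and the postmaster is in one of the three states PM_STARTUP, PM_RECOVERY, or PM_HOT_STANDBY, the postmaster sends SIGUSR2 to the startup process.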
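The signal-file idea is small enough to try outside PostgreSQL. The sketch below is an illustration under stated assumptions, not the server's implementation: create_signal_file(), signal_file_exists(), and should_forward_promote() are hypothetical helpers, and the two boolean parameters stand in for the StartupPID and pmState tests shown above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an empty signal file: only its existence matters, not its content. */
static bool
create_signal_file(const char *path)
{
	FILE	   *f;

	assert(path != NULL);
	f = fopen(path, "w");
	if (f == NULL)
		return false;
	return fclose(f) == 0;
}

/* Report whether the signal file is present, using stat(). */
static bool
signal_file_exists(const char *path)
{
	struct stat st;

	assert(path != NULL);
	return stat(path, &st) == 0;
}

/*
 * Decide whether the request should be forwarded to the recovery process.
 * startup_running stands in for "StartupPID != 0"; in_recovery_state stands
 * in for the three-way pmState test. Both are assumptions of this sketch.
 */
static bool
should_forward_promote(bool startup_running, bool in_recovery_state,
					   const char *path)
{
	return startup_running && in_recovery_state && signal_file_exists(path);
}
```

The file is deliberately left in place by the checker; as the comment in the postmaster code says, removing it is the job of the process that finally acts on the request.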

Next, we need to look at what the startup process does after it receives SIGUSR2.

Reply #2 by xiaobu, 5 months ago

When the startup process receives SIGUSR2, it runs StartupProcTriggerHandler(). Its source code is:

/* SIGUSR2: set flag to finish recovery */
static void
StartupProcTriggerHandler(SIGNAL_ARGS)
{
	promote_signaled = true;
	WakeupRecovery();
}

This function sets the promote_signaled flag and then calls WakeupRecovery(), whose source code is:

/*
 * Wake up startup process to replay newly arrived WAL, or to notice that
 * failover has been requested.
 */
void
WakeupRecovery(void)
{
	SetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
}
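The pattern in these two functions is "set a flag in the signal handler, then wake whoever is sleeping". A rough standalone sketch of that pattern follows. It swaps in the classic self-pipe technique for PostgreSQL's latch, so it only illustrates the idea and says nothing about how SetLatch() is implemented; promote_requested, on_sigusr2(), install_promote_handler(), and wait_for_wakeup() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Flag set by the handler; plays the role of promote_signaled. */
static volatile sig_atomic_t promote_requested = 0;

/* Self-pipe used for wakeups; a stand-in for the latch in this sketch. */
static int	wake_pipe[2] = {-1, -1};

/* SIGUSR2 handler: record the request, then wake any sleeper via the pipe. */
static void
on_sigusr2(int signo)
{
	int			saved_errno = errno;

	(void) signo;
	promote_requested = 1;
	if (write(wake_pipe[1], "x", 1) < 0)
	{
		/* pipe full: a wakeup is already pending, nothing more to do */
	}
	errno = saved_errno;
}

/* Create the non-blocking pipe and install the SIGUSR2 handler. */
static bool
install_promote_handler(void)
{
	struct sigaction sa;

	if (pipe(wake_pipe) != 0)
		return false;
	if (fcntl(wake_pipe[0], F_SETFL, O_NONBLOCK) != 0 ||
		fcntl(wake_pipe[1], F_SETFL, O_NONBLOCK) != 0)
		return false;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr2;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR2, &sa, NULL) == 0;
}

/* Sleep for up to timeout_ms; return true if a wakeup arrived. */
static bool
wait_for_wakeup(int timeout_ms)
{
	struct pollfd pfd = {.fd = wake_pipe[0], .events = POLLIN};
	char		buf[16];
	int			rc;

	assert(wake_pipe[0] >= 0);	/* install_promote_handler() must run first */

	do
	{
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);	/* retry if a signal interrupted us */

	if (rc <= 0)
		return false;
	while (read(wake_pipe[0], buf, sizeof(buf)) > 0)
		continue;				/* drain so the next wait blocks again */
	return true;
}
```

A loop built on this would call wait_for_wakeup() when it has nothing to do and check promote_requested each time it wakes, which is roughly the division of labour between the handler and the recovery loop described above.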
