What happens behind the scenes when a standby is promoted to primary?

xiaobu · 1 month ago · 58

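The primary database is a concept from PostgreSQL (PG) physical replication. Compared with a standby, its defining feature is that it is readable and writable. The standby database is also a physical-replication concept; its defining feature is that it is read-only, and its contents are a copy of the primary's (identical apart from any replication lag).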

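Promoting a PG standby database to a primary database is very simple: run "SELECT pg_promote()" on the standby that is about to become the primary, and that read-only standby becomes a read-write primary (by default the call waits up to 60 seconds for promotion to finish). But what actually happens behind the scenes? This article walks through it step by step.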

First, let's look at the source code of pg_promote():

/// #define PROMOTE_SIGNAL_FILE		"promote"
/*
 * Promotes a standby server.
 *
 * A result of "true" means that promotion has been completed if "wait" is
 * "true", or initiated if "wait" is false.
 */
Datum
pg_promote(PG_FUNCTION_ARGS)
{
	bool		wait = PG_GETARG_BOOL(0);
	int			wait_seconds = PG_GETARG_INT32(1);
	FILE	   *promote_file;
	int			i;

	if (!RecoveryInProgress()) /// promote can only be run on a standby.
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("recovery is not in progress"),
				 errhint("Recovery control functions can only be executed during recovery.")));

	if (wait_seconds <= 0)
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("\"wait_seconds\" must not be negative or zero")));

	/* create the promote signal file */
	promote_file = AllocateFile(PROMOTE_SIGNAL_FILE, "w"); 
	if (!promote_file)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not create file \"%s\": %m",
						PROMOTE_SIGNAL_FILE)));

	if (FreeFile(promote_file))
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not write file \"%s\": %m",
						PROMOTE_SIGNAL_FILE)));

	/* signal the postmaster */
	if (kill(PostmasterPid, SIGUSR1) != 0) /// The promote file was written first; now send SIGUSR1 to the postmaster.
	{
		(void) unlink(PROMOTE_SIGNAL_FILE);
		ereport(ERROR,
				(errcode(ERRCODE_SYSTEM_ERROR),
				 errmsg("failed to send signal to postmaster: %m")));
	}

	/* return immediately if waiting was not requested */
	if (!wait)
		PG_RETURN_BOOL(true);

	/* wait for the amount of time wanted until promotion */
#define WAITS_PER_SECOND 10
	for (i = 0; i < WAITS_PER_SECOND * wait_seconds; i++)
	{
		int			rc;

		ResetLatch(MyLatch);

		if (!RecoveryInProgress()) /// The standby has become a primary: stop waiting and return true.
			PG_RETURN_BOOL(true);

		CHECK_FOR_INTERRUPTS();

		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					   1000L / WAITS_PER_SECOND,
					   WAIT_EVENT_PROMOTE);

		/*
		 * Emergency bailout if postmaster has died.  This is to avoid the
		 * necessity for manual cleanup of all postmaster children.
		 */
		if (rc & WL_POSTMASTER_DEATH)
			ereport(FATAL,
					(errcode(ERRCODE_ADMIN_SHUTDOWN),
					 errmsg("terminating connection due to unexpected postmaster exit"),
					 errcontext("while waiting on promotion")));
	}

	ereport(WARNING,
			(errmsg_plural("server did not promote within %d second",
						   "server did not promote within %d seconds",
						   wait_seconds,
						   wait_seconds)));
	PG_RETURN_BOOL(false);
}

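The logic of this code is easy to follow. It first calls RecoveryInProgress() to determine whether the command is running on a standby or on a primary. A primary obviously does not need to be promoted, so the command can only be run on a standby. Next, it checks that wait_seconds is positive and creates a signal file named promote in the data directory (the database cluster directory). A signal file carries its signal simply by existing; its content does not matter. Once the file has been created, the function sends SIGUSR1 to the postmaster with kill(PostmasterPid, SIGUSR1). If wait is false, it returns true immediately. Otherwise it polls RecoveryInProgress() ten times per second for up to wait_seconds seconds: as soon as RecoveryInProgress() returns false, promotion has succeeded and pg_promote() returns true to the user; if time runs out first, it emits a WARNING and returns false.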
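The two ideas in this paragraph, a signal file whose mere existence is the message and a bounded poll at 10 checks per second, can be tried outside PostgreSQL. The following is a minimal standalone POSIX sketch, not PostgreSQL code: create_signal_file(), signal_file_exists(), signal_file_gone() and wait_until() are hypothetical names, signal_file_exists() only plays the role of CheckPromoteSignal(), and nanosleep() stands in for WaitLatch() with a timeout. The example predicate waits for the file to disappear, which is merely a convenient stand-in for pg_promote()'s real condition, !RecoveryInProgress().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

#define WAITS_PER_SECOND 10     /* same polling rate as pg_promote() */

/* Create an empty file; only its existence carries meaning, not its content. */
static bool create_signal_file(const char *path)
{
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return false;
    return fclose(f) == 0;
}

/* Hypothetical analogue of CheckPromoteSignal(): does the file exist? */
static bool signal_file_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0;
}

/* Example predicate for wait_until(): true once the file has been deleted
 * (e.g. by another process that consumed the signal). */
static bool signal_file_gone(const void *path)
{
    return !signal_file_exists((const char *) path);
}

/*
 * Check `done` WAITS_PER_SECOND times per second for at most wait_seconds
 * seconds. Returns true as soon as `done` succeeds, false on timeout.
 * nanosleep() stands in for WaitLatch(..., WL_TIMEOUT, ...).
 */
static bool wait_until(bool (*done)(const void *), const void *arg,
                       int wait_seconds)
{
    const struct timespec tick = {.tv_sec = 0,
                                  .tv_nsec = 1000000000L / WAITS_PER_SECOND};

    assert(wait_seconds > 0);   /* pg_promote() raises an ERROR instead */
    for (int i = 0; i < WAITS_PER_SECOND * wait_seconds; i++)
    {
        if (done(arg))
            return true;
        nanosleep(&tick, NULL);
    }
    return false;
}
```

For example, after create_signal_file(path), wait_until(signal_file_gone, path, 1) returns false after about one second unless something deletes the file in the meantime. A plain sleep keeps the sketch short; PostgreSQL waits on a latch instead, so the waiting backend can also be woken early.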

Naturally, the next question is what the postmaster does after it receives SIGUSR1. The relevant code in postmaster.c is:

	if (StartupPID != 0 && /// The startup process is running.
		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
		 pmState == PM_HOT_STANDBY) &&
		CheckPromoteSignal()) /// CheckPromoteSignal() returns true if the promote file exists.
	{
		/*
		 * Tell startup process to finish recovery.
		 *
		 * Leave the promote signal file in place and let the Startup process
		 * do the unlink.
		 */
		signal_child(StartupPID, SIGUSR2);
	}

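When the postmaster receives SIGUSR1, it runs the logic above. CheckPromoteSignal() checks whether a promote file exists in the data directory. If the file exists, StartupPID != 0 (that is, the startup process is running), and the postmaster is in one of the three states PM_STARTUP, PM_RECOVERY or PM_HOT_STANDBY, the postmaster sends SIGUSR2 to the startup process. As the code comment says, the postmaster leaves the promote file in place; deleting it is the startup process's job.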
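The postmaster's test can be isolated as a pure function, which makes it easy to see that all three parts must hold before SIGUSR2 is sent. This is a hypothetical sketch: SketchPMState and should_forward_promote() are invented names, and the enum lists only a few states for illustration (the SKETCH_ prefix marks them as stand-ins, not PostgreSQL's real PMState).

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical subset of postmaster states, named after those in the check. */
typedef enum
{
    SKETCH_PM_INIT,
    SKETCH_PM_STARTUP,
    SKETCH_PM_RECOVERY,
    SKETCH_PM_HOT_STANDBY,
    SKETCH_PM_RUN
} SketchPMState;

/*
 * True when the postmaster would forward the promote request to the startup
 * process (with SIGUSR2 in the real code): the startup process exists, the
 * postmaster is still in a recovery-related state, and the file is present.
 */
static bool should_forward_promote(pid_t startup_pid, SketchPMState state,
                                   bool promote_file_exists)
{
    bool in_recovery_state = state == SKETCH_PM_STARTUP ||
                             state == SKETCH_PM_RECOVERY ||
                             state == SKETCH_PM_HOT_STANDBY;

    return startup_pid != 0 && in_recovery_state && promote_file_exists;
}
```

Keeping the decision side-effect free also mirrors why the postmaster does not delete the file itself: it only decides whether to notify the startup process.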

Next, let's look at what the startup process does after it receives SIGUSR2.

    After receiving SIGUSR2, the startup process runs its signal handler, StartupProcTriggerHandler(), whose source code is:

    /* SIGUSR2: set flag to finish recovery */
    static void
    StartupProcTriggerHandler(SIGNAL_ARGS)
    {
    	promote_signaled = true;
    	WakeupRecovery();
    }
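StartupProcTriggerHandler() follows the usual rule for signal handlers: do almost nothing, just record that the signal arrived and let the main loop act on it later. Below is a minimal POSIX sketch of that pattern with hypothetical names (sketch_promote_signaled, sketch_trigger_handler(), consume_promote_request()); it is not PostgreSQL's startup.c code and it leaves out the WakeupRecovery() call. volatile sig_atomic_t is the type standard C guarantees can be safely written from a handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Written only by the handler, read and cleared by the main loop. */
static volatile sig_atomic_t sketch_promote_signaled = 0;

/* Like StartupProcTriggerHandler(): just remember that a request arrived. */
static void sketch_trigger_handler(int signo)
{
    (void) signo;
    sketch_promote_signaled = 1;
}

static bool install_trigger_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sketch_trigger_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL) == 0;
}

/* Main-loop side: report a pending request once, then clear the flag. */
static bool consume_promote_request(void)
{
    if (!sketch_promote_signaled)
        return false;
    sketch_promote_signaled = 0;
    return true;
}
```

raise(SIGUSR2) after install_trigger_handler() sets the flag before raise() returns, so the next consume_promote_request() returns true and the one after that returns false.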
    

    This function sets the promote_signaled flag and then calls WakeupRecovery(), whose source code is:

    /*
     * Wake up startup process to replay newly arrived WAL, or to notice that
     * failover has been requested.
     */
    void
    WakeupRecovery(void)
    {
    	SetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
    }
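WakeupRecovery() only sets a latch, so that a startup process which is sleeping (for example while waiting for new WAL) wakes up and notices that failover has been requested. One classic way to build such a wakeup on Unix is the self-pipe trick, which PostgreSQL's latch code has used on some platforms. The sketch below uses it as a stand-in: every sketch_* name is hypothetical, it covers a single process only, and it is not PostgreSQL's latch implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

static int wake_pipe[2] = {-1, -1};
static volatile sig_atomic_t wake_flag = 0;

/* Like SetLatch(): set the flag, then write a byte so a sleeping poll()
 * returns. write() is async-signal-safe; a full pipe means already awake. */
static void sketch_set_latch(void)
{
    int saved_errno = errno;
    ssize_t rc;

    wake_flag = 1;
    rc = write(wake_pipe[1], "x", 1);
    (void) rc;
    errno = saved_errno;
}

/* Signal handler playing the role of StartupProcTriggerHandler(). */
static void sketch_wakeup_handler(int signo)
{
    (void) signo;
    sketch_set_latch();
}

static bool sketch_latch_init(void)
{
    struct sigaction sa;

    if (pipe(wake_pipe) != 0)
        return false;
    for (int i = 0; i < 2; i++)
    {
        int flags = fcntl(wake_pipe[i], F_GETFL);

        if (flags < 0 || fcntl(wake_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return false;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sketch_wakeup_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL) == 0;
}

/* Like ResetLatch(): callers re-check their condition after this. */
static void sketch_reset_latch(void)
{
    wake_flag = 0;
}

/* Like WaitLatch(..., WL_LATCH_SET | WL_TIMEOUT, timeout_ms): true if the
 * latch is set; false on timeout (or early on a stale byte, since callers
 * loop and re-check anyway). */
static bool sketch_wait_latch(int timeout_ms)
{
    struct pollfd pfd = {.fd = wake_pipe[0], .events = POLLIN};
    char buf[64];

    /* A signal arriving after this check still leaves a byte for poll(). */
    if (!wake_flag)
        (void) poll(&pfd, 1, timeout_ms);   /* EINTR is fine: flag read below */
    while (read(wake_pipe[0], buf, sizeof(buf)) > 0)
        ;                                   /* drain pending wakeup bytes */
    return wake_flag != 0;
}
```

The intended calling pattern matches the wait loop in pg_promote(): reset the latch, check the condition, then wait; checking after the reset is what keeps a wakeup that races with the reset from being lost.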