使用pg_resetwal来改变WAL文件的大小

xiaobu 1月前 73

在使用initdb创建一个新的数据库集群时,我们可以通过指定参数“--wal-segsize=32”来设置WAL文件的大小为32MB。wal-segsize的合法值为1、2、4、8、16、32、64、128、256、512、1024共11种合法的值,其余的值都是非法的。如果我们不指定这个参数,WAL文件的大小是16MB。

数据库集群一旦创建完毕后,WAL文件的大小就不能轻易改变了。但是我们可以使用pg_resetwal这个工具来更改WAL文件的大小,不过这个重大操作需要关闭数据库集群才能进行,也就是说需要down time。下面的内容记录了如何使用pg_resetwal来改变WAL文件的大小的过程。

 

我们从零开始,首先使用initdb来创建一个新的数据库集群:

postgres@ip-172-31-29-179:~$ initdb -D walsize
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

creating directory walsize ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D walsize -l logfile start

让我们看一下新创建的数据库集群walsize的WAL文件的体积是多少:

postgres@ip-172-31-29-179:~$ pg_controldata -D walsize
pg_control version number:            1800
Catalog version number:               202506291
Database system identifier:           7540849576612943281
Database cluster state:               shut down
pg_control last modified:             Thu Aug 21 01:47:42 2025
Latest checkpoint location:           0/175CEF8
Latest checkpoint's REDO location:    0/175CEF8
Latest checkpoint's REDO WAL file:    000000010000000000000001
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:752
Latest checkpoint's NextOID:          13638
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        744
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Thu Aug 21 01:47:42 2025
Fake LSN counter for unlogged rels:   0/3E8
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                off
max_connections setting:              100
max_worker_processes setting:         8
max_wal_senders setting:              10
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float8 argument passing:              by value
Data page checksum version:           1
Default char data signedness:         signed
Mock authentication nonce:            b11d25d7c816c4ebf89d01df9077f2f17c400af3b335a4d10a820b8ae56b6d5e
postgres@ip-172-31-29-179:~$ ls -l walsize/pg_wal/
total 16392
-rw------- 1 postgres postgres 16777216 Aug 21 01:47 000000010000000000000001
drwx------ 2 postgres postgres     4096 Aug 21 01:47 archive_status
drwx------ 2 postgres postgres     4096 Aug 21 01:47 summaries
postgres@ip-172-31-29-179:~$ 

从上面的结果可以看到,WAL文件的体积是16777216字节,即16MB。下面我们启动这个数据库集群,启动后再创建一个测试数据库oracle,并在oracle数据库中创建一张测试表state:

postgres@ip-172-31-29-179:~$ pg_ctl start -D walsize -l W1.txt
waiting for server to start.... done
server started
postgres@ip-172-31-29-179:~$ psql
psql (18beta3)
Type "help" for help.

postgres=# CREATE DATABASE oracle;
CREATE DATABASE
postgres=# \c oracle
You are now connected to database "oracle" as user "postgres".
oracle=# CREATE TABLE state(id CHAR(2), name VARCHAR(32));
CREATE TABLE
oracle=# INSERT INTO state VALUES('CA', 'Los Angeles');
INSERT 0 1
oracle=# CHECKPOINT;
CHECKPOINT
oracle=# 

最后我们手工执行一个checkpoint,确保所有的数据都被可靠地写入到磁盘中了。在执行完CheckPoint以后,然后我们立刻关闭这个数据库集群:

postgres@ip-172-31-29-179:~$ pg_ctl stop -D walsize
waiting for server to shut down.... done
server stopped

我们使用pg_controldata命令查看一下控制文件中的内容:

postgres@ip-172-31-29-179:~$ pg_controldata -D walsize
pg_control version number:            1800
Catalog version number:               202506291
Database system identifier:           7540849576612943281
Database cluster state:               shut down
pg_control last modified:             Thu Aug 21 01:54:16 2025
Latest checkpoint location:           0/1BE85C8
Latest checkpoint's REDO location:    0/1BE85C8
Latest checkpoint's REDO WAL file:    000000010000000000000001
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:766
Latest checkpoint's NextOID:          16392
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        744
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Thu Aug 21 01:54:16 2025
Fake LSN counter for unlogged rels:   0/3E8
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                off
max_connections setting:              100
max_worker_processes setting:         8
max_wal_senders setting:              10
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float8 argument passing:              by value
Data page checksum version:           1
Default char data signedness:         signed
Mock authentication nonce:            b11d25d7c816c4ebf89d01df9077f2f17c400af3b335a4d10a820b8ae56b6d5e
postgres@ip-172-31-29-179:~$ 

从上面的结果看,数据库已经被干净地关闭了,因为数据库的状态是“shut down”,而且Latest checkpoint location和Latest checkpoint's REDO location的值是相等的。这些检查要点都是衡量一个数据库集群是否干净地关闭的要点。

postgres@ip-172-31-29-179:~$ pg_resetwal --wal-segsize=128 walsize
Write-ahead log reset

在上述命令中,我们使用了--wal-segsize=128这个参数设置WAL文件的尺寸是128M。现在我们查看一下WAL文件的体积:

postgres@ip-172-31-29-179:~$ pg_controldata -D walsize
pg_control version number:            1800
Catalog version number:               202506291
Database system identifier:           7540849576612943281
Database cluster state:               shut down
pg_control last modified:             Thu Aug 21 02:00:13 2025
Latest checkpoint location:           0/8000028
Latest checkpoint's REDO location:    0/8000028
Latest checkpoint's REDO WAL file:    000000010000000000000001
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:766
Latest checkpoint's NextOID:          16392
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        744
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Thu Aug 21 02:00:13 2025
Fake LSN counter for unlogged rels:   0/3E8
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    minimal
wal_log_hints setting:                off
max_connections setting:              100
max_worker_processes setting:         8
max_wal_senders setting:              10
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                134217728
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float8 argument passing:              by value
Data page checksum version:           1
Default char data signedness:         signed
Mock authentication nonce:            b11d25d7c816c4ebf89d01df9077f2f17c400af3b335a4d10a820b8ae56b6d5e
postgres@ip-172-31-29-179:~$ ls -l walsize/pg_wal/
total 131084
-rw------- 1 postgres postgres 134217728 Aug 21 02:00 000000010000000000000001
drwx------ 2 postgres postgres      4096 Aug 21 01:47 archive_status
drwx------ 2 postgres postgres      4096 Aug 21 01:47 summaries
postgres@ip-172-31-29-179:~$ 

我们把WAL文件的体积变成了128MB,PG有一个规定:min_wal_size的体积应该是WAL文件体积的2倍,但是这个参数的缺省值是80MB,我们把它设置为256MB,然后再启动数据库集群:

 

可以看到,WAL文件的体积已经变成了128MB。下面我们启动这个数据库集群:

postgres@ip-172-31-29-179:~$ pg_ctl start -D walsize -l W2.txt
waiting for server to start.... done
server started
postgres@ip-172-31-29-179:~$ psql
psql (18beta3)
Type "help" for help.

postgres=# \c oracle
You are now connected to database "oracle" as user "postgres".
oracle=# SELECT * FROM state;
 id |    name     
----+-------------
 CA | Los Angeles
(1 row)

我们可以看到,数据库集群可以正常启动。

这个过程的要点,我会后续进行分析。

 

 

最新回复 (1)
  • xiaobu 1月前
    引用 2

    pg_resetwal干了哪些工作?


    pg_resetwal的源代码非常简单,下面是它做的工作:

    	/*
    	 * Else, do the dirty deed.
    	 */
    	RewriteControlFile(); /// 重写控制文件
    	KillExistingXLOG();   /// 删除pg_wal目录下所有的WAL文件
    	KillExistingArchiveStatus(); /// 删除pg_wal/archive_status目录下的文件
    	KillExistingWALSummaries();  /// 删除pg_wal/summaries目录下的文件
    	WriteEmptyXLOG(); /// 创建一个新的WAL文件,它的编号比pg_wal目录下任何一个WAL文件的编号多一。
    	/// 在这个文件中写入一条CheckPoint的WAL记录。
    
    	printf(_("Write-ahead log reset\n"));
    	return 0;
    

    它第一步是重写控制文件,在这一步,它会做如下事情:

    	ControlFile.state = DB_SHUTDOWNED; /// 设置数据库的状态为干净地关闭。
    	ControlFile.checkPoint = ControlFile.checkPointCopy.redo;
    

    上述代码会把控制文件中的数据库状态设置为DB_SHUTDOWNED,而且让checkPoint和redo两个LSN是一致的,并且在最后一步的WriteEmptyXLOG()函数中执行如下设置:

    	record->xl_info = XLOG_CHECKPOINT_SHUTDOWN; 

    这个设置就是为了表示数据库是干净地被关闭的。函数KillExistingXLOG(),KillExistingArchiveStatus()和函数KillExistingWALSummaries()的工作非常简单,就是删除pg_wal目录下已经存在的老的WAL文件。

    经过上面几步操作以后,等下次启动该数据库集群时,PG会发现这个数据库集群是被干净地关闭的,所以它不会做数据库恢复,即startup进程基本上啥也不做就退出了。

    pg_resetwal这种操作,实际上就是保证数据库集群肯定能够被启动起来。如果数据库集群是被干净地关闭的,pg_resetwal实际上啥也没做,但是可以通过它来改变一些重要的参数,如WAL文件的体积等。

     

     

     

返回
发新帖