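Overview
- PostgreSQL + repmgr: automatic failure detection and failover between the primary and its standbys.
- PgBouncer: a connection pooler that hides backend changes from clients, so a failover is transparent to them.
- Dynamic configuration update: a script triggered through repmgr's promote_command rewrites the [databases] section of the PgBouncer configuration so that it points at the new primary.
- The application's database driver must accept several IP addresses in one connection string, so that it can connect to any of the PgBouncer instances.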
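The last requirement can be illustrated with a libpq-style URI. This is a minimal sketch, assuming a libpq-based client (libpq in PostgreSQL 10 or later accepts several host:port pairs in one URI and tries them in order). The helper name build_multihost_uri is hypothetical; the addresses, port 6432, user and database are the ones used later in this article. Other drivers have their own multi-host syntax.

```shell
#!/bin/sh
# Build a libpq-style multi-host URI from a list of PgBouncer addresses.
# Arguments: user, database, port, then one or more host addresses.
build_multihost_uri() (
    user=$1; db=$2; port=$3; shift 3
    hosts=""
    for h in "$@"; do
        # Join the entries as host:port,host:port,...
        hosts="${hosts:+${hosts},}${h}:${port}"
    done
    printf 'postgresql://%s@%s/%s\n' "$user" "$hosts" "$db"
)

build_multihost_uri erpuser erpdb 6432 10.0.0.41 10.0.0.42 10.0.0.43
# prints postgresql://erpuser@10.0.0.41:6432,10.0.0.42:6432,10.0.0.43:6432/erpdb
```

Because every PgBouncer instance is reconfigured to point at the current primary, the client only needs one reachable PgBouncer; it does not have to tell primaries from standbys itself.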
Node plan
Host | hostname | Role | Components |
---|---|---|---|
10.0.0.41 | repmgr01 | primary | PostgreSQL 15.5, repmgr 5.5.0, pgbouncer-1.24.0 |
10.0.0.42 | repmgr02 | standby1 | PostgreSQL 15.5, repmgr 5.5.0, pgbouncer-1.24.0 |
10.0.0.43 | repmgr03 | standby2 | PostgreSQL 15.5, repmgr 5.5.0, pgbouncer-1.24.0 |
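Directories and files
Path | Description |
---|---|
/data/pgsql/data | Data directory of the backend PostgreSQL |
/data/pgsql/log | Log directory of the backend PostgreSQL |
/data/pgsql/data/postgresql.conf | Configuration file of the backend PostgreSQL |
/data/pgsql/data/pg_hba.conf | Access control file of the backend PostgreSQL |
/data/repmgr | Home directory of the repmgr high-availability component |
/data/repmgr/repmgr.conf | Configuration file of repmgr |
/data/repmgr/promte_standby_pgbouncer.sh | Script that repmgr triggers after it detects a failure of the backend primary |
/data/pgbouncer/pgbouncer.template | PgBouncer configuration template, read by the trigger script |
/data/pgbouncer/pgbouncer.ini | PgBouncer configuration file |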
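1. Cluster preparation
Prepare a repmgr cluster with one primary and two standbys. For the deployment procedure, see my other article, "PostgreSQL高可用架构Repmgr部署流程" (PostgreSQL high-availability architecture: repmgr deployment procedure).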
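1.1 Install dependencies
# Run as root on every node in the cluster
# The trigger script later uses rsync to copy the new configuration file to all nodes
yum install -y rsync
# PgBouncer is built on Libevent, so install the Libevent development package first
yum install -y libevent-devel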
1.2 Install PgBouncer
# Run as root on every node in the cluster to install PgBouncer
[root@repmgr01 ~]# cd /opt
[root@repmgr01 opt]# wget http://www.pgbouncer.org/downloads/files/1.24.0/pgbouncer-1.24.0.tar.gz
[root@repmgr01 opt]# tar xvf pgbouncer-1.24.0.tar.gz
[root@repmgr01 opt]# cd pgbouncer-1.24.0
[root@repmgr01 pgbouncer-1.24.0]# ./configure
[root@repmgr01 pgbouncer-1.24.0]# make && make install
# By default PgBouncer is installed into /usr/local/bin
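Before continuing, it can help to confirm that the tools used in the later steps are on the PATH of the user who will run them. The following is a small sketch; missing_cmds is a hypothetical helper name, and the command list is taken from the steps in this article.

```shell
#!/bin/sh
# Print the names of the given commands that cannot be found on PATH,
# separated by spaces. An empty line means every command was found.
missing_cmds() (
    missing=""
    for c in "$@"; do
        command -v "$c" >/dev/null 2>&1 || missing="${missing:+${missing} }${c}"
    done
    printf '%s\n' "$missing"
)

# Run on each node as the postgres user.
missing_cmds rsync pgbouncer psql
```

If pgbouncer is reported as missing for the postgres user, /usr/local/bin is probably not on that user's PATH. Once it is found, pgbouncer --version prints the installed version.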
2. Configure PgBouncer
2.1 Create the application user in the backend database
# Run on the backend primary
[postgres@postgres-01 data]$ psql
psql (15.5)
Type "help" for help.
postgres=# create database erpdb;
CREATE USER erpuser WITH PASSWORD 'Erp@123';
ALTER USER erpuser WITH LOGIN;
GRANT CONNECT ON DATABASE erpdb TO erpuser;
\c erpdb
GRANT USAGE ON SCHEMA public TO erpuser;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO erpuser;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO erpuser;
# On every database node, append the following entries to pg_hba.conf
[postgres@repmgr01 ~]$ vim /data/pgsql/data/pg_hba.conf
host erpdb erpuser 127.0.0.1/32 scram-sha-256
host erpdb erpuser 10.0.0.0/24 scram-sha-256
# On every database node, reload the configuration
[postgres@postgres-01 data]$ psql
psql (15.5)
Type "help" for help.
postgres=# select pg_reload_conf();
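Because the pg_hba.conf entries have to be added on every node, an idempotent append avoids duplicate lines when the step is repeated. The following is a sketch only; append_once is a hypothetical helper, and the demo runs against a temporary file rather than the real pg_hba.conf.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
append_once() (
    file=$1; line=$2
    # -x matches whole lines, -F treats the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
)

# Demo on a scratch file: the second call is a no-op.
demo=$(mktemp)
append_once "$demo" "host erpdb erpuser 127.0.0.1/32 scram-sha-256"
append_once "$demo" "host erpdb erpuser 127.0.0.1/32 scram-sha-256"
cat "$demo"
rm -f "$demo"
```

On a real node the first argument would be /data/pgsql/data/pg_hba.conf, followed by select pg_reload_conf(); as shown above.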
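2.2 Edit the PgBouncer configuration file
This file is used to start the PgBouncer service when the cluster is first deployed.
# Run as the postgres user on every node
mkdir -p /data/pgbouncer
vim /data/pgbouncer/pgbouncer.ini
[databases]
# The key to the left of "=" (like "postgres" in "postgres=host=localhost...") is the database name that clients use when they connect to PgBouncer
# It does not have to match the actual name of the backend database
# PgBouncer → PostgreSQL: connections use the user/password configured here
erpdb = host=10.0.0.41 port=5432 dbname=erpdb user=erpuser password=Erp@123
[pgbouncer]
admin_users = admin
logfile = /data/pgbouncer/pgbouncer.log
pidfile = /data/pgbouncer/pgbouncer.pid
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /data/pgbouncer/userlist.txt
pool_mode = session
# Number of backend connections per (database, user) pair
# For example, user A on DB1 and user B on DB1 get separate pools
default_pool_size = 20
# Maximum number of client connections that PgBouncer accepts. Example calculation:
# 3 application users accessing 2 databases → 6 user/database pairs, with default_pool_size=20
# max_client_conn ≈ 1.2 × (20 × 6) = 144 (150 is a reasonable setting)
max_client_conn = 150
server_idle_timeout = 600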
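The sizing rule from the comments above can be written as a small calculation. This is only a sketch of that rule of thumb; suggest_max_client_conn is a hypothetical helper, and the 1.2 factor is the headroom used in this article's example.

```shell
#!/bin/sh
# max_client_conn ≈ 1.2 × (default_pool_size × users × databases)
# Arguments: number of users, number of databases, default_pool_size.
suggest_max_client_conn() (
    users=$1; dbs=$2; pool=$3
    # Integer arithmetic: multiply by 12 and divide by 10 for the 1.2 factor
    echo $(( users * dbs * pool * 12 / 10 ))
)

suggest_max_client_conn 3 2 20   # prints 144
```

Rounding the result up to a convenient value, such as 150, gives the setting used in the configuration file.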
2.3 Edit the PgBouncer configuration template
During a failover, this template supplies the [pgbouncer] section of the regenerated PgBouncer configuration file.
# Run as the postgres user on every node
[postgres@repmgr01 ~]$ mkdir -p /data/pgbouncer
[postgres@repmgr01 ~]$ vim /data/pgbouncer/pgbouncer.template
[pgbouncer]
admin_users = admin
logfile = /data/pgbouncer/pgbouncer.log
pidfile = /data/pgbouncer/pgbouncer.pid
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /data/pgbouncer/userlist.txt
pool_mode = session
# Maximum number of client connections that PgBouncer accepts. Example calculation:
# 3 application users accessing 2 databases → 6 user/database pairs, with default_pool_size=20
# max_client_conn ≈ 1.2 × (20 × 6) = 144 (150 is a reasonable setting)
max_client_conn = 150
# Number of backend connections per (database, user) pair
# For example, user A on DB1 and user B on DB1 get separate pools
default_pool_size = 20
server_idle_timeout = 600
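To make the role of the template concrete, the sketch below shows one way a complete configuration file could be assembled from a [databases] entry plus the template. It is not the trigger script itself; render_pgbouncer_ini is a hypothetical helper, and writing to a temporary file before renaming is an added assumption so that a reader of the target never sees a half-written file.

```shell
#!/bin/sh
# Assemble a PgBouncer config: "[databases]" + one entry + the template.
# Arguments: alias, backend host, backend port, backend dbname,
#            template path, output path.
render_pgbouncer_ini() (
    name=$1; host=$2; port=$3; dbname=$4; template=$5; out=$6
    tmp="${out}.tmp"
    {
        printf '[databases]\n'
        printf '%s = host=%s port=%s dbname=%s\n' "$name" "$host" "$port" "$dbname"
        printf '\n'
        cat "$template"
    } > "$tmp" && mv "$tmp" "$out"
)

# Example call (paths as used in this article):
# render_pgbouncer_ini erpdb 10.0.0.42 5432 erpdb \
#     /data/pgbouncer/pgbouncer.template /tmp/pgbouncer.ini
```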
2.4 Configure the PgBouncer authentication file
# Edit the authentication file on every node; admin is PgBouncer's administrative user
vim /data/pgbouncer/userlist.txt
"erpuser" "Erp@123"
"admin" "Admin@123"
2.5 Start PgBouncer
# On every node
# Start PgBouncer
# "-d" stands for "daemon": PgBouncer runs in the background
pgbouncer -d /data/pgbouncer/pgbouncer.ini
# Verify that the backend PostgreSQL database can be reached through PgBouncer
[postgres@repmgr01 ~]$ PGPASSWORD="Erp@123" psql -h 10.0.0.41 -p 6432 -U erpuser -d erpdb
psql (15.5)
Type "help" for help.
erpdb=> SELECT inet_server_addr() AS backend_host,inet_server_port() AS backend_port,current_database(),current_user;
3. Write the repmgr trigger script
# On every node
vim /data/repmgr/promte_standby_pgbouncer.sh
#!/usr/bin/env bash
set -u
set -o xtrace
# Hosts that run the PgBouncer service
PGBOUNCER_HOSTS="10.0.0.41 10.0.0.42 10.0.0.43"
# Location of the PgBouncer configuration file
PGBOUNCER_DATABASE_INI="/data/pgbouncer/pgbouncer.ini"
# Alias under which PgBouncer exposes the backend database
PGBOUNCER_DATABASE="erpdb"
# PgBouncer admin database
PGBOUNCER_DATABASE_ADMIN_DB="pgbouncer"
# PgBouncer admin user
PGBOUNCER_DATABASE_USER="admin"
PGBOUNCER_DATABASE_PASSWORD="Admin@123"
# PgBouncer port
PGBOUNCER_PORT=6432
# Backend PostgreSQL port
PORT=5432
# Name of the backend database that PgBouncer connects to
DBNAME="erpdb"
PG_HOME=/usr/local/pgsql
HOSTNAME=`hostname -i`
REPMGR_DB="repmgr"
REPMGR_USER="repmgr"
REPMGR_PASSWD="repmgr"
REPMGR_CONF="/data/repmgr/repmgr.conf"
STEP1="Promote ${HOSTNAME} from standby to primary"
STEP2="Recreate the pgbouncer config file on node ${HOSTNAME}"
STEP3="Resync the pgbouncer config file"
STEP4="Reload the pgbouncer config file"
PGBOUNCER_DATABASE_INI_NEW="/tmp/pgbouncer.ini"
PGBOUNCER_DATABASE_INI_TEMPLATE='/data/pgbouncer/pgbouncer.template'

# STEP1. Promote this node from standby to primary
${PG_HOME}/bin/repmgr standby promote -f ${REPMGR_CONF} --log-to-file
if [ $? -ne 0 ]; then
    echo "promte_standby_pgbouncer.sh: ${STEP1} on ${HOSTNAME} failed !!!"
    exit 1
fi
# Recovery flag: f means this node is now the primary, t means it is still a standby
standby_flg=`PGPASSWORD=${REPMGR_PASSWD} ${PG_HOME}/bin/psql -p ${PORT} -U ${REPMGR_USER} -d ${REPMGR_DB} -h localhost -At -c "SELECT pg_is_in_recovery();"`
if [ "${standby_flg}" == 'f' ]; then
    echo "promte_standby_pgbouncer.sh: ${STEP1} on ${HOSTNAME} successfully !!!"
elif [ "${standby_flg}" == 't' ]; then
    echo "promte_standby_pgbouncer.sh: ${STEP1} on ${HOSTNAME} failed !!!"
    exit 1
fi
# STEP2. Reconfigure pgbouncer instances
for HOST in $PGBOUNCER_HOSTS
do
    # Recreate the pgbouncer config file
    # Write the [databases] section header
    echo -e "[databases]\n" > $PGBOUNCER_DATABASE_INI_NEW
    # Write the conninfo of the backend primary (assumes host= is the first keyword in repmgr's conninfo)
    PGPASSWORD=${REPMGR_PASSWD} ${PG_HOME}/bin/psql -p ${PORT} -U ${REPMGR_USER} -d ${REPMGR_DB} -h localhost -At -c "SELECT '${PGBOUNCER_DATABASE} = '|| split_part(conninfo,' ',1) ||' port=${PORT}'||' dbname=${DBNAME}' ||' application_name=pgbouncer_${HOST}' FROM repmgr.nodes WHERE active = TRUE AND type='primary'" >> $PGBOUNCER_DATABASE_INI_NEW
    # Append the [pgbouncer] section from the template
    cat $PGBOUNCER_DATABASE_INI_TEMPLATE >> $PGBOUNCER_DATABASE_INI_NEW
    echo "promte_standby_pgbouncer.sh: ${STEP2} on ${HOSTNAME} successfully !!!"
    # STEP3. Resync the pgbouncer config file (requires passwordless SSH to each host)
    rsync $PGBOUNCER_DATABASE_INI_NEW $HOST:$PGBOUNCER_DATABASE_INI
    if [ $? -ne 0 ]; then
        echo "promte_standby_pgbouncer.sh: ${STEP3} on ${HOST} failed !!!"
    else
        echo "promte_standby_pgbouncer.sh: ${STEP3} on ${HOST} successfully !!!"
    fi
    # STEP4. Reload the pgbouncer config file
    PGPASSWORD=${PGBOUNCER_DATABASE_PASSWORD} ${PG_HOME}/bin/psql -tc "reload" -h $HOST -p $PGBOUNCER_PORT -d ${PGBOUNCER_DATABASE_ADMIN_DB} -U ${PGBOUNCER_DATABASE_USER}
    if [ $? -ne 0 ]; then
        echo "promte_standby_pgbouncer.sh: ${STEP4} on ${HOST} failed !!!"
    else
        echo "promte_standby_pgbouncer.sh: ${STEP4} on ${HOST} successfully !!!"
    fi
done
# Clean up generated file
rm -f $PGBOUNCER_DATABASE_INI_NEW
echo "Reconfiguration of pgbouncer complete"
# Make the script executable
chmod +x /data/repmgr/promte_standby_pgbouncer.sh
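The script takes the first space-separated field of repmgr's conninfo as the host= entry, which only works when host= comes first. A more tolerant variant could look for the host= keyword wherever it appears. The following is a sketch under that assumption; conninfo_host is a hypothetical helper, and it does not handle quoted values that contain spaces.

```shell
#!/bin/sh
# Print the value of the host= keyword from a space-separated conninfo string.
# Returns non-zero if no host= keyword is present.
conninfo_host() (
    set -f                      # disable globbing while word-splitting $1
    for kv in $1; do
        case $kv in
            host=*) printf '%s\n' "${kv#host=}"; exit 0 ;;
        esac
    done
    exit 1
)

conninfo_host "user=repmgr host=10.0.0.42 dbname=repmgr connect_timeout=2"
# prints 10.0.0.42
```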
4. Modify the repmgr configuration file
# Set promote_command so that it runs the new script
vim /data/repmgr/repmgr.conf
promote_command='/data/repmgr/promte_standby_pgbouncer.sh >> /data/repmgr/repmgrd.log'
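Since the same setting has to be present on every node, a quick check that promote_command is set and that the script it names is executable can catch mistakes before a real failover. This is a minimal sketch; conf_value is a hypothetical helper that assumes simple key='value' lines, as in the example above.

```shell
#!/bin/sh
# Print the value of a key from a key='value' style configuration file.
# Commented-out lines are ignored; the last matching line wins.
conf_value() (
    key=$1; file=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" \
        | tail -n 1 \
        | sed "s/^'\(.*\)'[[:space:]]*\$/\1/"
)

# Example check on a node:
# cmd=$(conf_value promote_command /data/repmgr/repmgr.conf)
# script=${cmd%% *}            # first word is the script path
# [ -x "$script" ] || echo "not executable: $script"
```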
5. Restart the repmgrd daemon
kill $(pgrep -f repmgrd)
repmgrd -f /data/repmgr/repmgr.conf --daemonize
6. Verify failover
Check the current cluster status
[postgres@repmgr01 ~]$ repmgr -f /data/repmgr/repmgr.conf cluster show
Check which backend database PgBouncer points to
[postgres@repmgr01 ~]$ PGPASSWORD="Erp@123" psql -p 6432 -U erpuser -h 127.0.0.1 -d erpdb
psql (15.5)
Type "help" for help.
erpdb=> SELECT inet_server_addr() AS backend_host,inet_server_port() AS backend_port,current_database(),current_user;
Stop the primary manually
[postgres@postgres-01 data]$ pg_ctl stop -D $PGDATA
waiting for server to shut down.... done
server stopped
Check the logs on each node after the primary has been stopped
[postgres@repmgr01 ~]$ tail -f /data/repmgr/repmgrd.log
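The repmgr01 node loses its connection.
On repmgr02 the trigger script runs, the node is promoted to primary, and repmgr03 becomes its standby.
The repmgr03 node reconnects as a STANDBY of repmgr02.
Check again which backend database PgBouncer points to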
[postgres@postgres-01 data]$ PGPASSWORD="Erp@123" psql -p 6432 -U erpuser -h 127.0.0.1 -d erpdb
psql (15.5)
Type "help" for help.
erpdb=> SELECT inet_server_addr() AS backend_host,inet_server_port() AS backend_port,current_database(),current_user;
The PgBouncer configuration file on every node now points at the new primary
[postgres@repmgr01 data]$ cat /data/pgbouncer/pgbouncer.ini | grep "erpdb"
erpdb = host=10.0.0.42 port=5432 dbname=erpdb application_name=pgbouncer_10.0.0.41
[postgres@repmgr02 data]$ cat /data/pgbouncer/pgbouncer.ini | grep "erpdb"
erpdb = host=10.0.0.42 port=5432 dbname=erpdb application_name=pgbouncer_10.0.0.42
[postgres@repmgr03 data]$ cat /data/pgbouncer/pgbouncer.ini | grep "erpdb"
erpdb = host=10.0.0.42 port=5432 dbname=erpdb application_name=pgbouncer_10.0.0.43
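The manual grep above can be turned into a small check that extracts the backend host from a configuration file, so the three nodes can be compared in a script. This is a sketch only; ini_backend_host is a hypothetical helper that assumes the single-line entry format produced by the trigger script.

```shell
#!/bin/sh
# Print the host= value of one [databases] entry in a pgbouncer.ini file.
# Arguments: database alias, path to the ini file.
ini_backend_host() (
    name=$1; file=$2
    sed -n "s/^${name}[[:space:]]*=.*host=\([^[:space:]]*\).*/\1/p" "$file" | head -n 1
)

# Example on a node after failover:
# ini_backend_host erpdb /data/pgbouncer/pgbouncer.ini
```

Running it on each node and comparing the results with the primary reported by repmgr cluster show confirms that every PgBouncer instance was redirected.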