Overview
A highly available PostgreSQL cluster can be built from PostgreSQL + etcd + Patroni + HAProxy + Keepalived. PostgreSQL serves the data; Patroni monitors the local PostgreSQL instance and writes its state into etcd, which stores the cluster state, so Patroni together with etcd provides failover (automatic or manual); HAProxy provides read/write splitting plus read load balancing (via separate ports); and Keepalived floats a VIP across the HAProxy nodes, keeping the entry point reachable even if an HAProxy instance goes down.
Introduction to Patroni
Patroni is a Python-based framework for building PostgreSQL HA solutions. For maximum compatibility it supports several distributed configuration stores: ZooKeeper, etcd, Consul, and Kubernetes. It aims to help database engineers, DBAs, DevOps engineers, and SREs deploy HA PostgreSQL quickly, in a data center or anywhere else.
It currently supports PostgreSQL versions 9.3 through 16. It provides automatic failover, physical and logical replication, and a RESTful API that lets external applications and operational tooling act on the cluster directly (start/stop, switchover, and so on), and it integrates with the Linux watchdog to guard against split-brain.
Project page: GitHub - patroni/patroni — A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes (https://github.com/patroni/patroni)
Introduction to etcd
etcd is a distributed key-value store with cross-platform support and a strong community. Its Raft-based consensus provides a reliable way to store the data a distributed cluster depends on. etcd is widely used in microservice architectures and Kubernetes clusters, serving both as a service registry/discovery backend and as a general key-value store; everything from web business systems to Kubernetes reads and writes data through it.
A complete etcd cluster needs at least three members, so that a leader can be elected and the loss of one member can be tolerated; the cluster stays available only while a majority (quorum) of members is up. etcd uses ports 2379 and 2380:
2379: the client API port, used by etcdctl
2380: peer-to-peer communication between cluster members
Project page: GitHub - etcd-io/etcd — Distributed reliable key-value store for the most critical data of a distributed system (https://github.com/etcd-io/etcd)
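The quorum requirement above is simple majority arithmetic: a cluster of n members tolerates floor((n-1)/2) failures, which is why three is the practical minimum. A quick sketch (the helper names are illustrative, not etcd commands):

```shell
# An etcd cluster stays available while a majority (quorum) of members is up.
# fault_tolerance: how many members a cluster of $1 nodes can lose.
fault_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

# quorum: the minimum number of members that must be up.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

echo "3-node cluster: quorum=$(quorum 3), tolerates $(fault_tolerance 3) failure(s)"
echo "5-node cluster: quorum=$(quorum 5), tolerates $(fault_tolerance 5) failure(s)"
```

Note that a 2-node cluster has quorum 2 and tolerates 0 failures, so it is no better than a single node for availability.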
Environment
Server information:

| Hostname     | IP address    | OS           | R/W port | Read-only port | Components                                     | Component port |
|--------------|---------------|--------------|----------|----------------|------------------------------------------------|----------------|
| k8s-master01 | 192.168.28.11 | ubuntu 24.10 | 15433    | 25433          | PostgreSQL, Patroni, etcd, HAProxy, Keepalived | 8008           |
| k8s-master02 | 192.168.28.12 | ubuntu 24.10 | 15433    | 25433          | PostgreSQL, Patroni, etcd, HAProxy, Keepalived | 8008           |
| k8s-master03 | 192.168.28.13 | ubuntu 24.10 | 15433    | 25433          | PostgreSQL, Patroni, etcd, HAProxy, Keepalived | 8008           |
| vip          | 192.168.28.10 |              |          |                | PostgreSQL, Patroni, etcd, HAProxy, Keepalived | 8008           |
Software versions:

| Software   | Version           |
|------------|-------------------|
| Patroni    | 4.0.6             |
| Etcd       | 3.5.16            |
| Keepalived | 2.3.1             |
| Haproxy    | 2.9.10-1ubuntu1.2 |
| PostgreSQL | 16.9              |
| watchdog   | 5.16              |
| python     | 3.12.7            |
Architecture
In this architecture, PostgreSQL provides the data service, Patroni handles primary/replica switchover, etcd provides consistent storage, HAProxy routes client access, Keepalived keeps the VIP highly available, and the watchdog provides node-liveness and split-brain protection. Together, the six components form an enterprise-grade highly available database cluster.
Preparation
Network and firewall: either disable the firewall (simpler, but insecure) or open only the required ports (secure).
Install PostgreSQL
Install PostgreSQL via apt (all nodes):
sudo apt install curl ca-certificates
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
. /etc/os-release
sudo sh -c "echo 'deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt $VERSION_CODENAME-pgdg main' > /etc/apt/sources.list.d/pgdg.list"
sudo apt update
sudo apt -y install postgresql
Set up the database directories (all nodes):
mkdir -p /data/postgresql/pgdata/
mkdir -p /data/postgresql/pg_archive/
chown -R postgres:postgres /data/postgresql/
postgres user setup
Create a home directory:
mkdir -p /home/postgres/
chown -R postgres:postgres /home/postgres/
Point the postgres account at the new home directory and a login shell:
vim /etc/passwd
postgres:x:114:113:PostgreSQL administrator,,,:/home/postgres:/bin/bash
Set environment variables:
vim /home/postgres/.bashrc
[ -f /etc/profile ] && source /etc/profile
export PATH=/usr/lib/postgresql/16/bin:$PATH
[ -f /var/lib/pgsql/.pgsql_profile ] && source /var/lib/pgsql/.pgsql_profile
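With the .bashrc above sourced, the PostgreSQL 16 binaries take precedence on PATH. A quick sanity check of the prepend (same path as the export above):

```shell
# Prepend the PostgreSQL bin dir exactly as .bashrc does,
# then confirm it is the first PATH entry consulted.
export PATH=/usr/lib/postgresql/16/bin:$PATH
first_entry=$(printf '%s' "$PATH" | cut -d: -f1)
echo "$first_entry"
```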
Grant passwordless sudo:
vim /etc/sudoers
# append at the end of the file
postgres ALL=(ALL) NOPASSWD: ALL

Install the etcd cluster (all nodes)
Download etcd:
wget https://github.com/etcd-io/etcd/releases/download/v3.6.4/etcd-v3.6.4-linux-amd64.tar.gz
tar -xf etcd-v3.6.4-linux-amd64.tar.gz --strip-components=1 -C /usr/local/bin etcd-v3.6.4-linux-amd64/etcd etcd-v3.6.4-linux-amd64/etcdctl
mkdir -p /etc/etcd/
mkdir -p /data/etcd/
Create the etcd configuration file:
touch /etc/etcd/etcd-pg.config.yml
# Node 1 configuration
vim /etc/etcd/etcd-pg.config.yml
# node name
name: pg-etcd01
# data directory
data-dir: /data/etcd
snapshot-count: 5000
# election and heartbeat parameters
heartbeat-interval: 100
election-timeout: 1000
# storage performance tuning
quota-backend-bytes: 8589934592
max-request-bytes: 10485760
max-concurrent-requests: 5000
# auto-compaction and defragmentation
auto-compaction-mode: periodic
auto-compaction-retention: "2h"
# cluster communication settings
listen-peer-urls: "http://192.168.28.11:12380"
listen-client-urls: "http://192.168.28.11:12379,http://127.0.0.1:12379"
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: "http://192.168.28.11:12380"
advertise-client-urls: "http://192.168.28.11:12379"
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: "pg-etcd01=http://k8s-etcd01:12380,pg-etcd02=http://k8s-etcd02:12380,pg-etcd03=http://k8s-etcd03:12380"
initial-cluster-token: 'etcd-cluster-pg'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
# Node 2 configuration
name: pg-etcd02
data-dir: /data/etcd
snapshot-count: 5000
# election and heartbeat parameters
heartbeat-interval: 100
election-timeout: 1000
# storage performance tuning
quota-backend-bytes: 8589934592
max-request-bytes: 10485760
max-concurrent-requests: 5000
# auto-compaction and defragmentation
auto-compaction-mode: periodic
auto-compaction-retention: "2h"
# cluster communication settings
listen-peer-urls: "http://192.168.28.12:12380"
listen-client-urls: "http://192.168.28.12:12379,http://127.0.0.1:12379"
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: "http://192.168.28.12:12380"
advertise-client-urls: "http://192.168.28.12:12379"
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: "pg-etcd01=http://k8s-etcd01:12380,pg-etcd02=http://k8s-etcd02:12380,pg-etcd03=http://k8s-etcd03:12380"
initial-cluster-token: 'etcd-cluster-pg'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
# Node 3 configuration
name: pg-etcd03
data-dir: /data/etcd
snapshot-count: 5000
# election and heartbeat parameters
heartbeat-interval: 100
election-timeout: 1000
# storage performance tuning
quota-backend-bytes: 8589934592
max-request-bytes: 10485760
max-concurrent-requests: 5000
# auto-compaction and defragmentation
auto-compaction-mode: periodic
auto-compaction-retention: "2h"
# cluster communication settings
listen-peer-urls: "http://192.168.28.13:12380"
listen-client-urls: "http://192.168.28.13:12379,http://127.0.0.1:12379"
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: "http://192.168.28.13:12380"
advertise-client-urls: "http://192.168.28.13:12379"
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: "pg-etcd01=http://k8s-etcd01:12380,pg-etcd02=http://k8s-etcd02:12380,pg-etcd03=http://k8s-etcd03:12380"
initial-cluster-token: 'etcd-cluster-pg'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
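The three node files above are identical except for the member name and the local IP. A hedged helper (gen_cfg is a hypothetical name, not part of etcd) that stamps the per-node lines out of a template, to keep the three copies from drifting apart:

```shell
# Emit the per-node portion of the etcd config from a template.
# $1 = member name, $2 = node IP
gen_cfg() {
    sed -e "s/{{NAME}}/$1/" -e "s/{{IP}}/$2/g" <<'EOF'
name: {{NAME}}
listen-peer-urls: "http://{{IP}}:12380"
listen-client-urls: "http://{{IP}}:12379,http://127.0.0.1:12379"
initial-advertise-peer-urls: "http://{{IP}}:12380"
advertise-client-urls: "http://{{IP}}:12379"
EOF
}

gen_cfg pg-etcd02 192.168.28.12
```

The shared settings (compaction, quotas, initial-cluster, and so on) can then be appended verbatim on every node.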
Create the etcd systemd service:
vim /etc/systemd/system/etcd-pg.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
ExecStart=/usr/local/bin/etcd --config-file=/etc/etcd/etcd-pg.config.yml
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Start the etcd-pg service:
systemctl daemon-reload
systemctl start etcd-pg.service
systemctl enable etcd-pg.service
Check the etcd cluster health. To simplify the command, configure an alias:
cd ~
vim .profile
alias etcdctlpg='etcdctl --endpoints=k8s-etcd01:12379,k8s-etcd02:12379,k8s-etcd03:12379'
source .profile
root@k8s-master01:~# etcdctlpg member list -w=table
+------------------+---------+-----------+-------------------------+----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-----------+-------------------------+----------------------------+------------+
| 2bb79737c88dd84d | started | pg-etcd03 | http://k8s-etcd03:12380 | http://192.168.28.13:12379 | false |
| 354b7a6aa8551f4a | started | pg-etcd02 | http://k8s-etcd02:12380 | http://192.168.28.12:12379 | false |
| 84101e54de967367 | started | pg-etcd01 | http://k8s-etcd01:12380 | http://192.168.28.11:12379 | false      |
+------------------+---------+-----------+-------------------------+----------------------------+------------+
etcd GUI tools
A powerful UI client for etcd v3 is available, provided as both a desktop application and web packages.
Install watchdog (all nodes)
The watchdog guards against split-brain. Patroni can use the Linux watchdog to supervise the Patroni process: if Patroni fails to write heartbeats to the watchdog device in time, the watchdog reboots the machine.
sudo apt install -y watchdog
sudo modprobe softdog
sudo chmod 666 /dev/watchdog
sudo chown postgres:postgres /dev/watchdog
sudo systemctl start watchdog
sudo systemctl enable watchdog
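Note that the chmod/chown on /dev/watchdog above do not survive a reboot, because the device node is recreated at boot. A hedged way to make the ownership persistent is a udev rule (the file name below is an assumption; verify the match key against your distribution):

```
# /etc/udev/rules.d/99-watchdog.rules  (hypothetical file name)
# Recreate /dev/watchdog with postgres ownership on every boot.
KERNEL=="watchdog", OWNER="postgres", GROUP="postgres", MODE="0600"
```

You will also want softdog loaded at boot, for example via an entry under /etc/modules-load.d/.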
Install Patroni (all nodes)
pip3 install --break-system-packages psycopg2-binary
pip3 install --break-system-packages patroni[etcd]
pip3 install --break-system-packages python-json-logger
mkdir -p /etc/patroni
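The original post does not include the contents of /etc/patroni/patroni-5433.yaml. Below is a minimal hedged sketch, assuming the cluster name, ports, paths, and replica/postgres credentials used elsewhere in this guide (adjust name, the connect_address entries, and the superuser password per your environment; this is one node's file):

```yaml
scope: pg_patroni_etcd
name: pg_patroni_5433_01            # unique per node

log:
  dir: /var/log/patroni

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.28.11:8008

etcd3:
  hosts: 192.168.28.11:12379,192.168.28.12:12379,192.168.28.13:12379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  initdb:
    - encoding: UTF8
    - data-checksums

postgresql:
  listen: 0.0.0.0:5433
  connect_address: 192.168.28.11:5433
  data_dir: /data/postgresql/pgdata
  bin_dir: /usr/lib/postgresql/16/bin
  authentication:
    replication:
      username: replica
      password: replica
    superuser:
      username: postgres
      password: postgres        # assumption; set your own

watchdog:
  mode: automatic
  device: /dev/watchdog
```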
Create the Patroni configuration file /etc/patroni/patroni-5433.yaml.
Create the Patroni systemd service:
cat /etc/systemd/system/patroni-5433.service
[Unit]
Description=Patroni high-availability PostgreSQL
After=syslog.target network.target etcd-pg.service
Requires=etcd-pg.service
[Service]
Type=simple
User=postgres
Group=postgres
PermissionsStartOnly=true
WorkingDirectory=/home/postgres/
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni-5433.yaml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=always
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
Configure passwordless replication login:
su postgres
cd ~
touch .pgpass
vim .pgpass

192.168.28.11:5433:*:replica:replica
192.168.28.12:5433:*:replica:replica
192.168.28.13:5433:*:replica:replica

chmod 600 .pgpass

Create the log directory:
sudo mkdir -p /var/log/patroni/
sudo chown -R postgres:postgres /var/log/patroni/
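libpq silently ignores a .pgpass file that is readable by group or others, so the file created above must be mode 0600. A quick self-contained check of the required permission bits on a throwaway copy:

```shell
# Demonstrate the required .pgpass permission bits on a temp file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
192.168.28.11:5433:*:replica:replica
EOF
chmod 600 "$tmp"
perms=$(stat -c '%a' "$tmp")
echo "$perms"
rm -f "$tmp"
```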
Start Patroni
Start Patroni on each node in turn:
sudo systemctl daemon-reload
sudo systemctl restart patroni-5433.service
sudo systemctl enable patroni-5433.service
sudo systemctl status patroni-5433.service
Check the service status. By default, driven by the initdb section of the configuration file, Patroni initializes the database, creates the users and configuration files, starts PostgreSQL, and establishes the primary/replica relationship with streaming replication.
Check the cluster state:
root@k8s-master01:~# patronictl -c /etc/patroni/patroni-5433.yaml list
+ Cluster: pg_patroni_etcd () ---+--------------------+---------+-----------+----+-----------+
| Member             | Host               | Role    | State     | TL | Lag in MB |
+--------------------+--------------------+---------+-----------+----+-----------+
| pg_patroni_5433_01 | 192.168.28.11:5433 | Leader  | running   | 17 |           |
| pg_patroni_5433_02 | 192.168.28.12:5433 | Replica | streaming | 17 |         0 |
| pg_patroni_5433_03 | 192.168.28.13:5433 | Replica | streaming | 17 |         0 |
+--------------------+--------------------+---------+-----------+----+-----------+
To simplify the command, configure an alias:
alias patr5433="patronictl -c /etc/patroni/patroni-5433.yaml"

Patroni maintenance commands (all nodes)
List cluster members:
patronictl -c /etc/patroni/patroni-5433.yaml list
Rebuild a replica:
reinit first removes the node's entire data directory, then restores it from the correct source node.
patronictl -c /etc/patroni/patroni-5433.yaml reinit [nodename]
Show the configuration:
patronictl -c /etc/patroni/patroni-5433.yaml show-config
Change parameters:
patronictl -c /etc/patroni/patroni-5433.yaml edit-config
patronictl -c /etc/patroni/patroni-5433.yaml reload [nodename]
Restart / stop nodes
Restart only the given node:
patronictl -c /etc/patroni/patroni-5433.yaml restart [clustername] [nodename]
Restart only members that are in pending-restart state:
patronictl -c /etc/patroni/patroni-5433.yaml restart [clustername] --pending
Restart all members:
patronictl -c /etc/patroni/patroni-5433.yaml restart [clustername]
Maintenance mode — detaching the cluster from Patroni management
patronictl pause
patronictl pause temporarily puts the Patroni cluster into maintenance mode and disables automatic failover. In some situations Patroni must step back from managing a running cluster while still keeping the cluster state in the DCS. Typical use cases are unusual activities on the cluster, such as a major version upgrade or corruption recovery. During such work, nodes are often started and stopped for reasons Patroni knows nothing about, and a node may even be promoted temporarily, violating the assumption that only one primary runs at a time. Patroni therefore needs to be able to "detach" from the running cluster — the equivalent of maintenance mode in Pacemaker.
patronictl resume
patronictl resume takes the Patroni cluster out of maintenance mode and re-enables automatic failover, bringing all databases back under management.
switchover — planned primary/replica switch
patronictl switchover
# Switchover
root@k8s-master01:~# patr5433 switchover
Current cluster topology
+ Cluster: pg_patroni_etcd (7540584003720074567) ---+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------------------+--------------------+---------+-----------+----+-----------+
| pg_patroni_5433_01 | 192.168.28.11:5433 | Leader | running | 22 | |
| pg_patroni_5433_02 | 192.168.28.12:5433 | Replica | streaming | 22 | 0 |
| pg_patroni_5433_03 | 192.168.28.13:5433 | Replica | streaming | 22 | 0 |
+--------------------+--------------------+---------+-----------+----+-----------+
Primary [pg_patroni_5433_01]:
Candidate ['pg_patroni_5433_02', 'pg_patroni_5433_03'] []: pg_patroni_5433_02
When should the switchover take place (e.g. 2025-08-26T12:26 ) [now]: now
Are you sure you want to switchover cluster pg_patroni_etcd, demoting current leader pg_patroni_5433_01? [y/N]: y
2025-08-26 11:26:32.31189 Successfully switched over to "pg_patroni_5433_02"
+ Cluster: pg_patroni_etcd (7540584003720074567) ---+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------------------+--------------------+---------+---------+----+-----------+
| pg_patroni_5433_01 | 192.168.28.11:5433 | Replica | stopped | | unknown |
| pg_patroni_5433_02 | 192.168.28.12:5433 | Leader | running | 22 | |
| pg_patroni_5433_03 | 192.168.28.13:5433 | Replica | running | 22 | 0 |
+--------------------+--------------------+---------+---------+----+-----------+
root@k8s-master01:~# patr5433 list
+ Cluster: pg_patroni_etcd (7540584003720074567) ---+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------------------+--------------------+---------+-----------+----+-----------+
| pg_patroni_5433_01 | 192.168.28.11:5433 | Replica | streaming | 23 | 0 |
| pg_patroni_5433_02 | 192.168.28.12:5433 | Leader | running | 23 | |
| pg_patroni_5433_03 | 192.168.28.13:5433 | Replica | streaming | 23 | 0 |
+--------------------+--------------------+---------+-----------+----+-----------+
REST API switchover — switch the primary from pg_patroni_5433_01 to pg_patroni_5433_02:
[root@pgtest1 ~]# curl -s http://192.168.28.11:8008/switchover -XPOST -d '{"leader":"pg_patroni_5433_01","candidate":"pg_patroni_5433_02"}'
Successfully switched over to "pg_patroni_5433_02"

failover
patronictl failover
# Failover
[postgres@pgtest1 ~]$ patronictl -c /etc/patroni/patroni-5433.yaml failover
Candidate ['pg_patroni_5433_01', 'pg_patroni_5433_02', 'pg_patroni_5433_03'] []: pg_patroni_5433_01
Current cluster topology
... ...
Are you sure you want to failover cluster pg_cluster, demoting current master pg_patroni_5433_02? [y/N]: y
2021-10-28 03:47:56.13486 Successfully failed over to "pg_patroni_5433_01"
... ...
Get the primary's DSN:
root@k8s-master01:~# patronictl -c /etc/patroni/patroni-5433.yaml dsn
host=192.168.28.12 port=5433

Install HAProxy (all nodes)
sudo apt install haproxy
Edit the HAProxy configuration file:
sudo vim /etc/haproxy/haproxy.cfg
global
maxconn 2000
ulimit-n 16384
log 127.0.0.1 local0 err
stats timeout 30s
defaults
log global
mode http
option httplog
timeout connect 5000
timeout client 50000
timeout server 50000
timeout http-request 15s
timeout http-keep-alive 15s
listen status_page
bind *:8888
stats enable
stats uri /haproxy-status
stats auth admin:admin
stats realm "Welcome to the haproxy load balancer status page of k8s-master"
frontend monitor-in
bind *:33305
mode http
option httplog
monitor-uri /monitor
# primary read/write port
listen master
bind *:15433
mode tcp
option tcplog
balance roundrobin
option httpchk OPTIONS /master
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server pgtest1 192.168.28.11:5433 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
server pgtest2 192.168.28.12:5433 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
server pgtest3 192.168.28.13:5433 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
# replica read-only port
listen replicas
bind *:25433
mode tcp
option tcplog
balance roundrobin
option httpchk OPTIONS /replica
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server pgtest1 192.168.28.11:5433 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
server pgtest2 192.168.28.12:5433 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
server pgtest3 192.168.28.13:5433 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
Start HAProxy:
sudo systemctl enable haproxy
sudo systemctl start haproxy
HAProxy status page: http://192.168.28.11:8888/haproxy-status (any node's IP plus the port also works)
Default credentials: admin/admin
Install Keepalived (all nodes)
sudo apt install -y keepalived
Configuration files
Primary node:
vim /etc/keepalived/keepalived.conf
global_defs {
    router_id LVS_DEVEL00
    script_user root
    enable_script_security
}
vrrp_script check_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
    interval 2
    weight 5
    fall 3
    rise 5
    timeout 2
}
vrrp_instance VI_1 {
    state MASTER
    interface ens18
    virtual_router_id 80
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 12345
    }
    virtual_ipaddress {
        192.168.28.10/24
    }
    track_script {
        check_haproxy
    }
}
Backup node 1:
vim /etc/keepalived/keepalived.conf
global_defs {
    router_id LVS_DEVEL01
    script_user root
    enable_script_security
}
vrrp_script check_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
    interval 2
    weight 5
    fall 3
    rise 5
    timeout 2
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens18
    virtual_router_id 80
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 12345
    }
    virtual_ipaddress {
        192.168.28.10/24
    }
    track_script {
        check_haproxy
    }
}
Backup node 2:
vim /etc/keepalived/keepalived.conf
global_defs {
    router_id LVS_DEVEL02
    script_user root
    enable_script_security
}
vrrp_script check_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
    interval 2
    weight 5
    fall 3
    rise 5
    timeout 2
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens18
    virtual_router_id 80
    priority 80
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 12345
    }
    virtual_ipaddress {
        192.168.28.10/24
    }
    track_script {
        check_haproxy
    }
}
Health-check script:
vim /etc/keepalived/check_haproxy.sh

#!/bin/bash
# Exit non-zero when no haproxy process is running; Keepalived uses
# this result (via vrrp_script) to adjust the node's VRRP priority.
if pgrep -x haproxy > /dev/null; then
    exit 0
else
    exit 1
fi

Make it executable:
chmod +x /etc/keepalived/check_haproxy.sh
Start Keepalived on each node in turn:
sudo systemctl start keepalived
sudo systemctl enable keepalived

📌 Repost information
Original author: xxdtb
Reposted: 2026/1/19 17:52:08