keepalived + nginx: a basic high-availability setup

0x01 About keepalived

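keepalived was originally designed for `LVS`, mainly to monitor the state of each node in an LVS cluster.
Internally it is built on `VRRP`, the `Virtual Router Redundancy Protocol`. As the name suggests, the protocol itself exists to keep routing nodes highly available.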

0x02 The VRRP protocol

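Put simply, N routers that provide the same function are grouped into one router group, which contains one master and several backups.
The master is normally chosen by an election algorithm. Note that only the master holds the virtual ip used to serve clients;
none of the backups have it. So what are the backups doing while the master is serving?
It is simple: while serving, the master also keeps sending VRRP advertisements to all the backups, `in plain terms, heartbeat packets`,
telling them: 'I am still alive, you can rest; take over once I am gone'. The backups simply sit idle and keep receiving these advertisements.
When at some point the backups stop receiving them, the master is considered dead.
The backups then run the election again and promote the one with the highest priority to master, which carries on serving. This is what keeps the service continuously available, i.e. high availability.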
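
The election described above can be sketched as a toy shell function. This is only a model of the "highest priority wins" rule, not how keepalived implements it; the node names and the `elect_master` helper are made up for illustration. Real VRRP also breaks priority ties by comparing ip addresses, which this sketch does not model (here the first node listed wins a tie).

```shell
#!/bin/bash
# Toy model of a VRRP-style election.
# Each argument is "name:priority"; the node with the highest priority wins.
# Names and priorities are illustrative, not taken from a real cluster.
elect_master() {
    local best_name="" best_prio=-1 node name prio
    for node in "$@"; do
        name=${node%%:*}   # text before the first colon
        prio=${node##*:}   # text after the last colon
        if [ "$prio" -gt "$best_prio" ]; then
            best_name=$name
            best_prio=$prio
        fi
    done
    echo "$best_name"
}

# All nodes alive: the priority-150 node becomes master and holds the virtual ip.
elect_master "lvs:150" "nginxhttp:100" "spare:50"

# The master stops advertising: the remaining backups elect among themselves.
elect_master "nginxhttp:100" "spare:50"
```

The first call prints `lvs`, the second prints `nginxhttp`, which mirrors the master/backup roles used in the demo environment further down.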

0x03 Using keepalived for web high availability

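First, deploy keepalived on every web node that needs high availability, then make one node the master and all the others backups.
Once a backup stops receiving heartbeats from the master, it considers the master dead and immediately takes over the master's resources (in this setup, the virtual ip).
When the master recovers, the backup hands those resources back to it. That is the simplest form of web high availability.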

Demo environment

Lvs        ip: 192.168.3.75   virtual ip: 192.168.3.2   domain: reverse.org   keepalived MASTER node
NginxHttp  ip: 192.168.3.49   domain: reverse.org   keepalived BACKUP node
OldLamp    ip: 192.168.3.45   domain: www.bwapp.cc
OldLnmp    ip: 192.168.3.42   domain: test.bwapp.org

0x04 Make sure the nginx configuration on the Lvs and NginxHttp machines is exactly the same and that nginx is already running on both, as follows

# cat /usr/local/nginx/conf/nginx.conf
worker_processes 1;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;
    upstream default_pools {
        server test.bwapp.org:80 weight=2;
        server www.bwapp.cc:80;
    }
    upstream static_pools {
        server test.bwapp.org:80;
    }
    upstream dynamic_pools {
        server www.bwapp.cc:80;
    }
    server {
        listen 80;
        server_name reverse.org;
        location / {
            proxy_pass http://default_pools;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
        }
        location /static/ {
            proxy_pass http://static_pools;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
        }
        location /dynamic/ {
            proxy_pass http://dynamic_pools;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
        }
    }
}
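
Since both nodes must carry the same nginx.conf, it can help to check that mechanically rather than by eye. The sketch below is one possible way, assuming the backup's copy has first been fetched to a local temporary path; the helper names `same_config` and `report_config_sync` and the `/tmp/nginx.conf.backup` path are made up for illustration.

```shell
#!/bin/bash
# same_config FILE_A FILE_B
# Succeeds only if the two files are byte-for-byte identical.
same_config() {
    cmp -s "$1" "$2"
}

# report_config_sync FILE_A FILE_B
# Prints a one-word verdict that is easy to spot in a terminal or a cron mail.
report_config_sync() {
    if same_config "$1" "$2"; then
        echo "in sync"
    else
        echo "DIFFERENT"
    fi
}

# Possible usage on the Lvs node (paths are assumptions, adjust to your layout):
#   scp root@192.168.3.49:/usr/local/nginx/conf/nginx.conf /tmp/nginx.conf.backup
#   report_config_sync /usr/local/nginx/conf/nginx.conf /tmp/nginx.conf.backup
#   /usr/local/nginx/sbin/nginx -t    # also confirm the local file parses
```

A byte comparison is deliberately strict: even a whitespace-only difference is reported, which is usually what you want when the two files are supposed to be copies of each other.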

0x05 First, install keepalived on both the Lvs and NginxHttp machines and do a test start

# yum install openssl openssl-devel -y    # normally already installed when nginx was built
# wget http://www.keepalived.org/software/keepalived-1.3.2.tar.gz
# tar xf keepalived-1.3.2.tar.gz
# cd keepalived-1.3.2
# ln -s /usr/src/kernels/2.6.32-642.el6.x86_64/ /usr/src/linux
# ll /usr/src/linux
# ./configure && make && make install
# cp /root/keepalived-1.3.2/keepalived/etc/init.d/keepalived /etc/init.d/
# cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/
# mkdir /etc/keepalived
# cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/
# cp /usr/local/sbin/keepalived /usr/sbin/
# /etc/init.d/keepalived start
# ps -ef | grep keepalived    # three keepalived processes running at once means the install worked
# /etc/init.d/keepalived stop    # stop keepalived again before continuing with the configuration

0x06 Next, configure keepalived.conf on the master node. Keep only the settings below; the rest of the default configuration can be deleted for now

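# man keepalived.conf    # detailed help for keepalived.conf
# cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
# vi /etc/keepalived/keepalived.conf
global_defs { # global settings
    notification_email { # alert recipients; this block can simply be commented out if unused
        klionsec@rootkit.org
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 192.168.200.1
    smtp_connect_timeout 30
    router_id LVS_01 # identifies this node, similar to mysql's server id
    vrrp_skip_check_adv_addr
    ! vrrp_strict # enforces strict VRRP compliance and does not support unicast peers; it is enabled in the sample config, and leaving it on is a common reason the virtual ip cannot be bound, so keep it commented out
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}
# one keepalived instance
vrrp_instance VI_1 { # instance name
    state MASTER # role of this instance: MASTER or BACKUP
    interface eth0 # interface used to send heartbeats
    virtual_router_id 51 # id of this instance; must be the same on MASTER and BACKUP, otherwise you get split brain
    priority 150 # the higher priority becomes MASTER; a step of 50 between nodes is suggested
    advert_int 1 # heartbeat interval in seconds, 1 by default
    authentication { # authentication between MASTER and BACKUP
        auth_type PASS # use PASS authentication
        auth_pass 1111 # 1111 is the password from the sample config
    }
    virtual_ipaddress { # bind the virtual ip, same effect as `ip addr add 192.168.3.2/24 dev eth0`
        192.168.3.2/24
    }
}
# scp /etc/keepalived/keepalived.conf root@192.168.3.49:/etc/keepalived/keepalived.conf
# /etc/init.d/keepalived start
# ip addr | grep "192.168.3.2/24"    # wait a moment, then check whether the virtual ip has been added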

0x07 Finally, configure keepalived.conf on the backup node. Apart from router_id, the instance state and the priority, everything stays the same as on the master, as follows

# vi /etc/keepalived/keepalived.conf
global_defs {
    notification_email {
        klionsec@rootkit.org
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 192.168.200.1
    smtp_connect_timeout 30
    router_id LVS_02 # must be unique per node, similar to mysql's server id
    vrrp_skip_check_adv_addr
    ! vrrp_strict
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}
vrrp_instance VI_1 { # both sides use the same instance
    state BACKUP # the role changes
    interface eth0
    virtual_router_id 51
    priority 100 # 50 lower than the master
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.3.2/24
    }
}
# /etc/init.d/keepalived start
# ip addr | grep "192.168.3.2/24"    # on the backup this ip only shows up once the master is down
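
The `ip addr | grep` check above can be wrapped into a small helper that reports which role a node currently plays. This is a minimal sketch: the function names `has_vip` and `node_role` are made up, and the `ip addr` lines fed in at the end are an illustrative sample written by hand, not captured output.

```shell
#!/bin/bash
# has_vip VIP
# Reads `ip addr` style text on stdin and succeeds if VIP (e.g. 192.168.3.2/24)
# appears as an inet address. -F matches the string literally, so the dots in
# the address are not treated as regex wildcards.
has_vip() {
    grep -qF "inet $1 "
}

# node_role VIP
# Prints "holds vip" if the address is bound locally, "standby" otherwise.
node_role() {
    if has_vip "$1"; then
        echo "holds vip"
    else
        echo "standby"
    fi
}

# On a real node:  ip addr | node_role "192.168.3.2/24"
# Here, with a hand-written sample of what the master might show:
node_role "192.168.3.2/24" <<'EOF'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    inet 192.168.3.75/24 brd 192.168.3.255 scope global eth0
    inet 192.168.3.2/24 scope global secondary eth0
EOF
```

Running the same one-liner on both nodes before and after stopping keepalived on the master is a quick way to watch the virtual ip move.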

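Below is the virtual ip floating over automatically in practice; with this in mind, the VRRP protocol should be easier to picture.

Below is keepalived working together with nginx to provide web high availability: when the MASTER machine goes down, the BACKUP takes over almost immediately, which keeps the web service available.

Note that by default keepalived only takes over on machine-level failures; it does not react when an ordinary service such as nginx dies. The fix is simple: write a script that ties the two together. All scripts here are demos only; harden them yourself if you are interested.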

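# vi keepalived_nginx_check.sh
#!/bin/bash
while true
do
    # count nginx processes; with worker_processes 1 a healthy nginx shows 2 (master + 1 worker)
    NginxCount=$(ps -C nginx --no-headers | wc -l)
    if [ "$NginxCount" -lt 2 ]
    then
        # nginx is down or has lost its worker: stop keepalived so the backup takes over
        /etc/init.d/keepalived stop &>/dev/null
    fi
    sleep 1 # keep the interval short, otherwise nginx can be down for a while before the other side reacts
done
# chmod +x keepalived_nginx_check.sh
# ./keepalived_nginx_check.sh &
# ps -ef | grep "keepalived_nginx_check"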
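
As an alternative to an external watchdog loop, keepalived has its own `vrrp_script` / `track_script` mechanism that runs a check command periodically and adjusts the instance priority when it fails. The sketch below only prints a config fragment for review; it is not part of the original setup. The helper name `render_track_fragment`, the `chk_nginx` label, the `killall -0 nginx` check (which needs the psmisc package) and the interval and weight values are all assumptions chosen for this demo.

```shell
#!/bin/bash
# render_track_fragment CHECK_CMD INTERVAL WEIGHT
# Prints a keepalived.conf fragment to stdout; nothing is written to /etc.
# The fragment has two parts that go in different places of keepalived.conf.
render_track_fragment() {
    local check_cmd=$1 interval=$2 weight=$3
    cat <<EOF
# top level, before the vrrp_instance block:
vrrp_script chk_nginx {
    script "$check_cmd"
    interval $interval
    weight $weight
}

# inside the existing vrrp_instance VI_1 { ... } block:
    track_script {
        chk_nginx
    }
EOF
}

# Check every second; on failure lower this node's priority by 60.
render_track_fragment "killall -0 nginx" 1 -60
```

With the priorities used in this article, a failing check would drop the master from 150 to 90, below the backup's 100, so the backup should take over the virtual ip; once nginx is back the master returns to 150 and reclaims it. Unlike stopping keepalived outright, this keeps keepalived running on both nodes.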

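0x08 The split-brain problem in high availability

Split brain works like this: as soon as a backup stops receiving status messages from the master, it assumes the master is dead. In reality the master may be fine, and something else is simply preventing the master and the backup from talking to each other. The backup still bluntly concludes that the master is gone and takes over its resources, and that is how split brain happens. Below is a small script to detect it. The idea is simple: if the master still answers ping while the virtual ip is also present locally on the backup, split brain has occurred. To use it, just run the script in the background on the backup node.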

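#!/bin/bash
# check keepalived status (run on the BACKUP node)
while true
do
    # number of lines in `ip addr` that carry the virtual ip (1 if it is bound locally)
    addr=$(ip addr | grep -c "192.168.3.2/24")
    # the master's real ip still answers ping AND the virtual ip is bound here
    if ping -c 5 -W 3 192.168.3.75 &>/dev/null && [ "$addr" -ge 1 ]; then
        echo "ha is split brain."
    else
        echo "ha is ok"
    fi
    sleep 2
done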

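0x09 By default keepalived logs to /var/log/messages, which is awkward to analyze, so in practice it is better to send keepalived's log to its own file to make troubleshooting easier, as follows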

# vi /etc/sysconfig/keepalived
KEEPALIVED_OPTIONS="-D -S 0 -d"
# vi /etc/rsyslog.conf
local0.* /var/log/keepalived.log
# /etc/init.d/rsyslog restart
# /etc/init.d/keepalived restart
# tail -f /var/log/keepalived.log



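Afterword:
    Space is limited, so only nginx high availability is covered here. Using keepalived together with LVS will get its own write-up. Other basic services can be made highly available in much the same way, and the tool is simple enough, so no more on that here. If you have time, VRRP itself is worth a deeper look ^_^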