首页手记 MySQL高可用架构之MHA

MySQL高可用架构之MHA

标签：

MySQL

可以看这篇文章的理论部分，写的不错：http://www.educity.cn/wenda/399310.html

操作可参考：http://www.178linux.com/15698

1、关于MHA

MHA(Master HA)是一款开源的MySQL的高可用程序，它为MySQL主从复制架构提供了automating master failover功能。MHA在监控到master节点故障时，会提升其中拥有的最新数据的slave节点成为新的master节点，在此期间，MHA会通过其它从节点获取额外信息来避免一致性方面的问题。MHA还提供了master节点的在线切换功能，即按需切换master/slave节点。

MHA服务有两种角色，MHA Manager(管理节点)和MHA Node(数据节点)：

MHA Manager：通常单独部署在一台独立机器上管理多个master/slave集群，每个master/slave集群称为一个application；

MHA node：运行在每台MySQL服务器上，它通过监控具备解析和清理log功能的脚本来加快故障转移

2、MHA组件说明

Manager节点：

-masterha_check_ssh：MHA依赖的SSH环境检测工具；

-masterha_check_repl:MySQL复制环境检测工具；

-masterha_manager:MHA服务主程序；

-masterha_check_status:MHA运行状态探测工具；

-masterha_master_monitor：MySQL master节点可用性检测工具；

-masterha_master_switch:master节点切换工具；

-masterha_conf_host:添加或删除配置的节点；

-masterha_stop：关闭MHA服务的工具；

Node节点：

-save_binary_logs：保存和复制master的二进制日志；

-apply_diff_relay_logs:识别差异的中继日志事件并用于其他slave；

-fiter_mysqlbinlog：去除不必要的ROLLBACK事件(MHA已不再使用这个工具)；

-purge_relay_logs：清除中继日志(不会阻塞SQL线程)；

自定义扩展：

-secondary_check_script：通过多条网络路由检测master的可用性；

-master_ip_failover_script:更新appliction使用的masterip；

-shutdown_script:强制关闭master节点；

-report_script:发送报告；

-init_conf_load_script：加载初始配置参数；

-master_ip_online_change_script:更新master节点ip地址

3、部署及测试

实验拓扑：

node1:192.168.150.137 MHA manager

node2:192.168.150.138 MHA node mariadb master

node3:192.168.150.139 MHA node mariadb slave candidate

node4:192.168.150.140 MHA node mariadb slave

配置过程：

1、修改每台服务器的hosts文件

/etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.150.137 node1.com node1

192.168.150.138 node2.com node2

192.168.150.139 node3.com node3

192.168.150.140 node4.com node4

2、node2-node4进行mariadb的yum安装

yum -y install mariadb-server

3、配置ssh互信通信环境

[root@node1 ~]# ssh-keygen -t rsa -P ''

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Created directory '/root/.ssh'.

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

a2:f2:10:28:cd:ea:7b:d8:f4:95:15:6e:73:a6:9d:4e root@node1.com

The key's randomart image is:

+--[ RSA 2048]----+

| |

| . |

| . . |

| + = o |

|o + . S * . |

|.. o . + . E |

|. * o . o |

|.. * . . |

| oo . |

+-----------------+

[root@node1 ~]# ls .ssh/id_rsa

id_rsa id_rsa.pub

[root@node1 ~]# cat .ssh/id_rsa.pub > .ssh/authorized_keys

[root@node1 ~]# ssh node1

The authenticity of host 'node1 (192.168.150.137)' can't be established.

ECDSA key fingerprint is 2a:e3:03:52:8c:84:02:59:a2:26:a3:b2:f6:74:6c:3c.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'node1,192.168.150.137' (ECDSA) to the list of known hosts.

Last login: Wed Mar 29 15:06:52 2017 from 192.168.150.1

[root@node1 ~]# ll .ssh/authorized_keys

-rw-r--r-- 1 root root 396 3月 29 15:35 .ssh/authorized_keys

[root@node1 ~]# chmod go= .ssh/authorized_keys

[root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys node2:/root/.ssh

root@node2's password:

id_rsa 100% 1675 1.6KB/s 00:00

authorized_keys 100% 396 0.4KB/s 00:00

[root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys node3:/root/.ssh

The authenticity of host 'node3 (192.168.150.139)' can't be established.

ECDSA key fingerprint is 2a:e3:03:52:8c:84:02:59:a2:26:a3:b2:f6:74:6c:3c.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'node3,192.168.150.139' (ECDSA) to the list of known hosts.

root@node3's password:

id_rsa 100% 1675 1.6KB/s 00:00

authorized_keys 100% 396 0.4KB/s 00:00

[root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys node4:/root/.ssh

root@node4's password:

Permission denied, please try again.

root@node4's password:

id_rsa 100% 1675 1.6KB/s 00:00

authorized_keys 100% 396 0.4KB/s 00:00

[root@node1 ~]# ssh node2

Last login: Wed Mar 29 15:07:05 2017 from 192.168.150.1

[root@node2 ~]# exit

登出

Connection to node2 closed.

[root@node1 ~]# ssh node3

Last login: Wed Mar 29 15:07:18 2017 from 192.168.150.1

[root@node3 ~]# exit

登出

Connection to node3 closed.

[root@node1 ~]# ssh node3

Last login: Wed Mar 29 15:40:05 2017 from node1.com

[root@node3 ~]# exit

登出

Connection to node3 closed.

[root@node1 ~]# ssh node4

Last failed login: Wed Mar 29 15:39:53 CST 2017 from node1.com on ssh:notty

There was 1 failed login attempt since the last successful login.

Last login: Wed Mar 29 15:39:30 2017 from node1.com

[root@node4 ~]# exit

登出

Connection to node4 closed.

[root@node1 ~]# ssh node4

Last login: Wed Mar 29 15:40:13 2017 from node1.com

[root@node4 ~]# exit

登出

Connection to node4 closed.

4、修改mysql参数

master：

[mysqld]

innodb_file_per_table = 1

skip_name_resolve = 1

log-bin = master-bin

relay-log = relay-bin

server_id = 1

slave：

[mysqld]

innode_file_per_table = 1

skip_name_resolve = 1

log-bin = master-bin

relay-log = relay-bin

server_id = 2

read_only = 1

relay_log_purge = 0

5、主库开启并创建授权账号

master:

MariaDB [(none)]> SHOW MASTER STATUS;

+-------------------+----------+--------------+------------------+

+-------------------+----------+--------------+------------------+

| master-bin.000003 | 245 | | |

+-------------------+----------+--------------+------------------+

1 row in set (0.00 sec)

MariaDB [(none)]> GRANT REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO 'repluser'@'192.168.%.%' IDENTIFIED BY 'replpass';

MariaDB [(none)]> GRANT ALL ON *.* TO 'mhauser'@'192.168.%.%' IDENTIFIED BY 'mhapass'; #此为mha的管理账号

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only'; #此时主库是可写可读

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| read_only | OFF |

+---------------+-------+

1 row in set (0.00 sec)

6、从库进行主从功能开启 node3、node4操作相同

[root@node3 ~]# mysql

Welcome to the MariaDB monitor. Commands end with ; or \g.

Your MariaDB connection id is 2

Server version: 5.5.52-MariaDB MariaDB Server

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.150.138',MASTER_USER='repluser',MSTER_PASSWORD='replpass',MASTER_LOG_FILE='master-bin.000003',MASTER_LOG_POS=245;

Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

Slave_IO_State:

Master_Host: 192.168.150.138

Master_User: repluser

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: master-bin.000003

Read_Master_Log_Pos: 245

Relay_Log_File: relay-bin.000001

Relay_Log_Pos: 4

Relay_Master_Log_File: master-bin.000003

Slave_IO_Running: No

Slave_SQL_Running: No

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 245

Relay_Log_Space: 245

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 0

1 row in set (0.00 sec)

MariaDB [(none)]> START SLAVE;

Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.150.138

Master_User: repluser

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: master-bin.000003

Read_Master_Log_Pos: 497

Relay_Log_File: relay-bin.000002

Relay_Log_Pos: 782

Relay_Master_Log_File: master-bin.000003

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 497

Relay_Log_Space: 1070

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 1

1 row in set (0.00 sec)

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only'; #此时从库是只读模式

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| read_only | ON |

+---------------+-------+

1 row in set (0.00 sec)

7、此时一主两从架构已经配置完成，安装MHA包

manager节点(node1)：

安装mha4mysql-manager-0.56-0.el6.noarch.rpm和mha4mysql-node-0.56-0.el6.noarch.rpm

yum install mha4mysql* -y

node节点(node2-node4):

安装mha4mysql-node-0.56-0.el6.noarch.rpm

yum -y iinstall mha4mysql-node-0.56-0.el6.noarch.rpm

8、初始化MHA

创建配置目录及配置文件(在node1上执行)

[root@node1 ~]# mkdir /etc/masterha

[root@node1 ~]# vim /etc/masterha/app1.cnf

[server default]

user=mhauser

password=mhapass

manager_workdir=/data/masterha/app1

master_log=/data/masterha/app1/manager.log

remote_workdir=/data/masterha/app1

ssh_user=root

repl_user=repluser

repl_password=replpass

ping_interval=1

[server1]

hostname=192.168.150.138

candidate_master=1

[server2]

hostname=192.168.150.139

candidate_master=1

[server3]

hostname=192.168.150.140

9、启动前检测

ssh互信配置是否OK

[root@node1 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf

......

Warning: Permanently added '192.168.150.139' (ECDSA) to the list of known hosts.

Wed Mar 29 17:03:03 2017 - [debug] ok.

Wed Mar 29 17:03:03 2017 - [info] All SSH connection tests passed successfully.

mysql复制集群的连接配置是否OK

[root@node1 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf

......

Wed Mar 29 17:04:40 2017 - [info]

192.168.150.138(192.168.150.138:3306) (current master)

+--192.168.150.139(192.168.150.139:3306)

+--192.168.150.140(192.168.150.140:3306)

Wed Mar 29 17:04:40 2017 - [info] Checking replication health on 192.168.150.139..

Wed Mar 29 17:04:40 2017 - [info] ok.

Wed Mar 29 17:04:40 2017 - [info] Checking replication health on 192.168.150.140..

Wed Mar 29 17:04:40 2017 - [info] ok.

Wed Mar 29 17:04:40 2017 - [warning] master_ip_failover_script is not defined.

Wed Mar 29 17:04:40 2017 - [warning] shutdown_script is not defined.

Wed Mar 29 17:04:40 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

10、启动MHA

[root@node1 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&

1 &[1] 16989

[root@node1 ~]# tail -f /data/masterha/app1/manager.log

192.168.150.138(192.168.150.138:3306) (current master)

+--192.168.150.139(192.168.150.139:3306)

+--192.168.150.140(192.168.150.140:3306)

Wed Mar 29 21:51:58 2017 - [warning] master_ip_failover_script is not defined.

Wed Mar 29 21:51:58 2017 - [warning] shutdown_script is not defined.

Wed Mar 29 21:51:58 2017 - [info] Set master ping interval 1 seconds.

Wed Mar 29 21:51:58 2017 - [warning] secondary_check_script is not defined. It is highly recommended setti

ng it to check master reachability from two or more routes.Wed Mar 29 21:51:58 2017 - [info] Starting ping health check on 192.168.150.138(192.168.150.138:3306)..

Wed Mar 29 21:51:58 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

11、启动后查看master节点状态

[root@node1 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf

app1 (pid:16623) is running(0:PING_OK), master:192.168.150.138

12、进行故障转移测试

(1)maser节点关闭mariadb

[root@node2 ~]# killall mysqld mysqld_safe

(2)此时在manager上可以看到转移的日志

----- Failover Report -----

app1: MySQL Master failover 192.168.150.138(192.168.150.138:3306) to 192.168.150.139(192.168.150.139:3306)

succeeded

Master 192.168.150.138(192.168.150.138:3306) is down!

Check MHA Manager logs at node1.com:/data/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.

The latest slave 192.168.150.139(192.168.150.139:3306) has all relay logs for recovery.

Selected 192.168.150.139(192.168.150.139:3306) as a new master.

192.168.150.139(192.168.150.139:3306): OK: Applying all logs succeeded.

192.168.150.140(192.168.150.140:3306): This host has the latest relay log events.

Generating relay diff files from the latest slave succeeded.

192.168.150.140(192.168.150.140:3306): OK: Applying all logs succeeded. Slave started, replicating from 19

2.168.150.139(192.168.150.139:3306)192.168.150.139(192.168.150.139:3306): Resetting slave info succeeded.

Master failover to 192.168.150.139(192.168.150.139:3306) completed successfully.

故障转移后，manager会自动停止，此时查看master状态

[root@node1 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf

app1 is stopped(2:NOT_RUNNING).

(3)查看其它两个库状态

node3 已成功接管master，并可读写

MariaDB [(none)]> SHOW MASTER STATUS;

+-------------------+----------+--------------+------------------+

+-------------------+----------+--------------+------------------+

| master-bin.000003 | 245 | | |

+-------------------+----------+--------------+------------------+

1 row in set (0.00 sec)

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| read_only | OFF |

+---------------+-------+

1 row in set (0.00 sec)

node4

MariaDB [(none)]> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.150.139

Master_User: repluser

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: master-bin.000003

Read_Master_Log_Pos: 245

Relay_Log_File: relay-bin.000002

Relay_Log_Pos: 530

Relay_Master_Log_File: master-bin.000003

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 245

Relay_Log_Space: 818

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 2

1 row in set (0.00 sec)

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| read_only | ON |

+---------------+-------+

1 row in set (0.00 sec)

(4)提供新的从节点已修复复制集群

master界定啊故障后，需要重新准备好一个新的MySQL节点。基于来自于master节点的备份恢复后，将其重新配置为mster的从节点即可。新加入节点IP为原master节点IP，否则还得修改appl.cnf中相应的设置，最后再次启动manager，并再次检查状态。

[root@node2 ~]# rm -rf /var/lib/mysql/*

[root@node2 ~]# vim /etc/my.cnf

添加从库两选项

read_only = 1

relay_log_purge = 0

[root@node2 ~]# systemctl start mariadb.service

[root@node2 ~]# mysql

Welcome to the MariaDB monitor. Commands end with ; or \g.

Your MariaDB connection id is 2

Server version: 5.5.52-MariaDB MariaDB Server

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> GRANT REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO 'repluser'@'192.168.%.%' IDENTIFIED

BY 'replpass';Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> GRANT ALL ON *.* TO 'mhauser'@'192.168.%.%' IDENTIFIED BY 'mhapass';

Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> FLUSH PRIVILEGES;

Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.150.139',MASTER_USER='repluser',MASTER_PASSWORD='r

eplpass',MASTER_LOG_FILE='master-bin.000003',MASTER_LOG_POS=245;

Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> SHOW SLAVE\G

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaD

B server version for the right syntax to use near '' at line 1MariaDB [(none)]> START SLAVE;

Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.150.139

Master_User: repluser

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: master-bin.000003

Read_Master_Log_Pos: 245

Relay_Log_File: relay-bin.000002

Relay_Log_Pos: 530

Relay_Master_Log_File: master-bin.000003

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 245

Relay_Log_Space: 818

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 2

1 row in set (0.00 sec)

再次开启manager查看状态，状态全部OK，主库变更为node3

[root@node1 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1 &

[root@node1 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf

app1 (pid:17229) is running(0:PING_OK), master:192.168.150.139

mysqlMYSQL

点击查看更多内容

为 TA 点赞

若觉得本文不错，就分享一下吧！

评论

评论

共同学习，写下你的评论

评论加载中...

展开查看更多评论

作者其他优质文章

正在加载中

holdtom

手记
篇

粉丝

240

获赞与收藏

992

关注作者，订阅最新文章

阅读免费教程

MySQL 进阶教程

32个小节 51568 963

MySQL 入门教程

32个小节 130169 1624

后端通用面试教程

41个小节 30940 346

推荐

评论

收藏

共同学习，写下你的评论



感谢您的支持，我会继续努力的～

扫码打赏，你说多少就多少

赞赏金额会直接到老师账户

支付方式

打开微信扫一扫，即可进行扫码打赏哦

今天注册有机会得

100积分直接送

付费专栏免费学

大额优惠券免费领

立即参与放弃机会

点击
抽奖

慕课手记新用户专享福利

恭喜你，你的运气太好了，居然抽中了 100个积分！

恭喜你，抽中了价值元的专栏！

太棒了，直接落到你账户里！

积分商城里的罗技鼠标、机械键盘、
Kindle 阅读器、小米平衡车
Apple iPad （10.2英寸）、大额优惠券
在等着你去兑换了噢

作者：

免费赠送

兑换码：1111222211 复制

优惠券可用于购买实战课、体系课
无门槛使用

先去看看，有什么好东西马上兑换我爱学习，选课去


热搜

最近搜索清空

MySQL高可用架构之MHA

阅读免费教程