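During automated installation and deployment of a ZCache server cluster, key-based SSH authorization is set up between the cluster user accounts on each host. A set of scripts built on that authorization handles the basic cluster management operations.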
redis-cli
This is the ZCache client tool, used to connect to a ZCache server. Run redis-cli -h to see the full list of options.
[email protected][/usr/local/redis/bin]#./redis-cli -h
redis-cli 3.2.10-2.1.0
Usage: redis-cli [OPTIONS] [cmd [arg [arg ...]]]
-h <hostname> Server hostname (default: 127.0.0.1).
-p <port> Server port (default: 6379).
-s <socket> Server socket (overrides hostname and port).
-a <password> Password to use when connecting to the server.
-r <repeat> Execute specified command N times.
-i <interval> When -r is used, waits <interval> seconds per command.
It is possible to specify sub-second times like -i 0.1.
-n <db> Database number.
-x Read last argument from STDIN.
-d <delimiter> Multi-bulk delimiter in for raw formatting (default: \n).
-c Enable cluster mode (follow -ASK and -MOVED redirections).
--raw Use raw formatting for replies (default when STDOUT is
not a tty).
--no-raw Force formatted output even when STDOUT is not a tty.
--csv Output in CSV format.
--stat Print rolling stats about server: mem, clients, ...
--latency Enter a special mode continuously sampling latency.
--latency-history Like --latency but tracking latency changes over time.
Default time interval is 15 sec. Change it using -i.
--latency-dist Shows latency as a spectrum, requires xterm 256 colors.
Default time interval is 1 sec. Change it using -i.
--lru-test <keys> Simulate a cache workload with an 80-20 distribution.
--slave Simulate a slave showing commands received from the master.
--rdb <filename> Transfer an RDB dump from remote server to local file.
--pipe Transfer raw Redis protocol from stdin to server.
--pipe-timeout <n> In --pipe mode, abort with error if after sending all data.
no reply is received within <n> seconds.
Default timeout: 30. Use 0 to wait forever.
--bigkeys Sample Redis keys looking for big keys.
--scan List all keys using the SCAN command.
--pattern <pat> Useful with --scan to specify a SCAN pattern.
--intrinsic-latency <sec> Run a test to measure intrinsic system latency.
The test will run for the specified amount of seconds.
--eval <file> Send an EVAL command using the Lua script at <file>.
--ldb Used with --eval enable the Redis Lua debugger.
--ldb-sync-mode Like --ldb but uses the synchronous Lua debugger, in
this mode the server is blocked and script changes are
are not rolled back from the server memory.
--help Output this help and exit.
--version Output version and exit.
Examples:
cat /etc/passwd | redis-cli -x set mypasswd
redis-cli get mypasswd
redis-cli -r 100 lpush mylist x
redis-cli -r 100 -i 1 info | grep used_memory_human:
redis-cli --eval myscript.lua key1 key2 , arg1 arg2 arg3
redis-cli --scan --pattern '*:12345*'
(Note: when using --eval the comma separates KEYS[] from ARGV[] items)
When no command is given, redis-cli starts in interactive mode.
Type "help" in interactive mode for information on available commands
and settings.
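The tool can be used in two ways: with a command included in the arguments, or without one, in which case it runs as an interactive terminal.
With a command in the arguments
redis-cli exits as soon as the command has been executed.
Example: run the cluster info command to check the cluster state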
[email protected][/usr/local/redis/bin]#redis-cli -h 10.45.82.64 -p 7670 cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:3
cluster_my_epoch:1
cluster_stats_messages_sent:22970
cluster_stats_messages_received:22970
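The cluster info output above is a list of key:value lines, so it is easy to check from a script. The following is a minimal sketch in Python, not part of the ZCache tooling: the function names and the health criteria (state ok, all 16384 slots assigned, no failed slots) are assumptions chosen for illustration.

```python
def parse_cluster_info(text):
    """Parse `cluster info` output (one key:value pair per line) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        # Numeric fields become ints, everything else stays a string.
        info[key] = int(value) if value.isdigit() else value
    return info


def cluster_is_healthy(info, total_slots=16384):
    """Hypothetical health rule: state ok, every slot assigned, none failing."""
    return (
        info.get("cluster_state") == "ok"
        and info.get("cluster_slots_assigned") == total_slots
        and info.get("cluster_slots_pfail") == 0
        and info.get("cluster_slots_fail") == 0
    )


# Sample copied from the transcript above.
SAMPLE = """cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:3
"""

if __name__ == "__main__":
    info = parse_cluster_info(SAMPLE)
    print(info["cluster_known_nodes"], cluster_is_healthy(info))
```

In practice the text would come from running redis-cli -h <host> -p <port> cluster info and capturing its standard output.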
Run the cluster nodes command to view cluster node information
[email protected][/usr/local/redis/bin]#redis-cli -h 10.45.82.64 -p 7670 cluster nodes
9fe944c229221823e4f50f3ce94f76c2c98d890b 10.45.82.64:7672 master - 0 1509429776801 3 connected 10923-16383
e169a287a75fee34b2739e11d64901d6b918b105 10.45.82.64:7671 master - 0 1509429777804 2 connected 5461-10922
259c81ba95cd00621855ffa6857152ef6d200426 10.45.82.64:7670 myself,master - 0 0 1 connected 0-5460
Reference node listing from the documentation (includes slave nodes)
579deae82267f8e1ef22603ad946c1b91a812be0 10.45.61.25:7380 myself,master - 0 0 10 connected 10928-16383
cabdb7d74b50f5752e8cf2bf5dc33fc99794285e 10.45.61.28:7381 slave 579deae82267f8e1ef22603ad946c1b91a812be0 0 1421196621852 10 connected
5c1a57c791ebe6b86a3ee250ad0c344852590f92 10.45.61.29:7380 master - 0 1421196619848 3 connected 6-5460
447d270fa311afa7bdcb4059ed00abc66f608938 10.45.61.28:7380 master - 0 1421196621852 8 connected 0-5 5461-10927
0519d80798dfea983033cc345cd583252cc75622 10.45.61.25:7381 slave 5c1a57c791ebe6b86a3ee250ad0c344852590f92 0 1421196620348 4 connected
f6db896b323339d96094f433f1138882d4c121fb 10.45.61.29:7381 slave 447d270fa311afa7bdcb4059ed00abc66f608938 0 1421196620850 8 connected
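Each line of cluster nodes output contains the following fields:
Node ID: for example 259c81ba95cd00621855ffa6857152ef6d200426.
ip:port: the node's IP address and port, for example 10.45.82.64:7670.
flags: the node's role (for example master, slave, myself) and state (for example fail).
If the node is a slave, the field after flags is the node ID of its master: for example, the master of 10.45.61.29:7381 has node ID 447d270fa311afa7bdcb4059ed00abc66f608938. For a master this field is -.
The time at which the PING currently awaiting a reply was sent to the node, as a Unix timestamp in milliseconds; 0 means no PING is pending.
The time at which the node's most recent PONG reply was received, as a Unix timestamp in milliseconds.
The node's configuration epoch.
The state of the network link to the node: for example connected.
The slots the node currently serves: for example, 10.45.61.25:7380 currently serves hash slots 10928 through 16383.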
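The field layout described above can be turned into a small parser. This is a sketch under stated assumptions: the Node type, the function names, and the decision to skip bracketed migration entries are illustrative and not taken from any ZCache script.

```python
from collections import namedtuple

Node = namedtuple(
    "Node",
    "node_id addr flags master_id ping_sent pong_recv epoch link_state slots",
)


def parse_node_line(line):
    """Split one `cluster nodes` line into the fields listed above."""
    parts = line.split()
    node_id, addr, flags, master_id, ping_sent, pong_recv, epoch, link = parts[:8]
    slots = []
    for token in parts[8:]:
        if token.startswith("["):
            # Entries for slots being migrated or imported; skipped in this sketch.
            continue
        start, _, end = token.partition("-")
        # A single slot such as "9102" has no "-", so end falls back to start.
        slots.append((int(start), int(end or start)))
    return Node(
        node_id,
        addr,
        flags.split(","),
        None if master_id == "-" else master_id,
        int(ping_sent),
        int(pong_recv),
        int(epoch),
        link,
        slots,
    )


def slot_count(node):
    """Number of hash slots covered by the node's inclusive ranges."""
    return sum(end - start + 1 for start, end in node.slots)


if __name__ == "__main__":
    line = ("447d270fa311afa7bdcb4059ed00abc66f608938 10.45.61.28:7380 "
            "master - 0 1421196621852 8 connected 0-5 5461-10927")
    node = parse_node_line(line)
    print(node.addr, node.flags, node.slots, slot_count(node))
```

Slot ranges are inclusive at both ends, which is why the count adds one per range.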
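Without a command in the arguments
redis-cli enters interactive mode, in which the operator can run commands one after another until leaving with the exit command.
Example: run set and get in interactive mode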
[email protected][/usr/local/redis/bin]#redis-cli -h 10.45.82.64 -p 7670 -c
10.45.82.64:7670> set keyxh valueXh
-> Redirected to slot [14963] located at 10.45.82.64:7672
OK
10.45.82.64:7672> get keyxh
"valueXh"
10.45.82.64:7672> exit
[email protected][/usr/local/redis/bin]#
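The redirection in the transcript above happens because each key belongs to one of 16384 hash slots and the client is sent to the node that serves that slot. Standard Redis Cluster computes the slot as CRC16 (XMODEM variant) of the key modulo 16384, using only the text between the first { and the next } when that text is non-empty. The sketch below implements that standard rule in Python; that ZCache keeps the mapping unchanged is an assumption, and the function name is illustrative.

```python
import binascii

SLOT_COUNT = 16384


def key_hash_slot(key):
    """Return the Redis Cluster hash slot for a key (standard algorithm)."""
    data = key.encode("utf-8") if isinstance(key, str) else key
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Use the hash tag only when it is non-empty, e.g. "{user1}.name".
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with initial value 0 is CRC16/XMODEM (polynomial 0x1021).
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


if __name__ == "__main__":
    for key in ("foo", "{foo}.bar", "keyxh"):
        print(key, key_hash_slot(key))
```

Keys that share a hash tag land in the same slot, which is what makes multi-key operations possible in cluster mode.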
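Collecting cluster runtime information: rc-getinfo.sh
Log in as the cache user that was created, that is, the user configured in the installation parameters when the ZCache server was created:
# System user that owns the cluster (the system user that owns the redis program); created automatically if it does not exist.
redis_cluster_user = cache
# Password of the user that owns the cluster (the redis program); set when the user is created automatically.
redis_cluster_user_password = cache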
localhost:/usr/local/redis/bin $ ./rc-getinfo.sh
OK
====================================================================================================
total used free
------------------------------------------------
[10.45.82.64]
Mem: 7994292 567496 5898040
Swap: 2097148 0 2097148
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Server node[10.45.82.64:7670]
Pid : 43214, Role : master, Uptime : 4.43(h), Version : 3.2.10-2.1.0
ClientNum : 1, AllKeys : 0, ExpireKeys: 0, AvgTtl : 0
ServerRss : 9792kB, SysMem : 7994292(kB), SysMemUse : 567496(kB), SysMemFree : 5898040(kB)
ServerSwap: 0(kB), SysSwap : 2097148(kB), SysSwapUse: 0(kB), SysSwapFree: 2097148(kB)
BgsaveTime: -1, AofEnable: 0, AofRewTime: -1, WriteStatus: ok
AllCommand: 1505, EvictKeys: 0, HitKeys : 0, MissKeys : 0
##############################################SLOWLOGS##############################################
ID Time Duration(us) Command
-------------------------------------------------------------------------------------
#############################################CONFIG DIFF############################################
Parameter Default Value
---------------------------------------------------------------------------------------------------
auto-aof-rewrite-percentage "200" "0"
lua-time-limit "5000" "1000"
node-fail-delay "300" "1000"
repl-diskless-sync-delay "5" "3"
dir "./" "/home/redis/data/7670"
client-output-buffer-limit "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60" "normal 0 0 0 slave 0 0 0 pubsub 33554432 8388608 60"
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Server node[10.45.82.64:7671]
Pid : 43279, Role : master, Uptime : 4.43(h), Version : 3.2.10-2.1.0
ClientNum : 1, AllKeys : 0, ExpireKeys: 0, AvgTtl : 0
ServerRss : 9796kB, SysMem : 7994292(kB), SysMemUse : 567496(kB), SysMemFree : 5898040(kB)
ServerSwap: 0(kB), SysSwap : 2097148(kB), SysSwapUse: 0(kB), SysSwapFree: 2097148(kB)
BgsaveTime: -1, AofEnable: 0, AofRewTime: -1, WriteStatus: ok
AllCommand: 2991, EvictKeys: 0, HitKeys : 0, MissKeys : 0
##############################################SLOWLOGS##############################################
ID Time Duration(us) Command
-------------------------------------------------------------------------------------
#############################################CONFIG DIFF############################################
Parameter Default Value
---------------------------------------------------------------------------------------------------
auto-aof-rewrite-percentage "200" "0"
lua-time-limit "5000" "1000"
node-fail-delay "300" "1000"
repl-diskless-sync-delay "5" "3"
dir "./" "/home/redis/data/7671"
client-output-buffer-limit "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60" "normal 0 0 0 slave 0 0 0 pubsub 33554432 8388608 60"
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Server node[10.45.82.64:7672]
Pid : 43344, Role : master, Uptime : 4.43(h), Version : 3.2.10-2.1.0
ClientNum : 1, AllKeys : 1, ExpireKeys: 0, AvgTtl : 0
ServerRss : 9804kB, SysMem : 7994292(kB), SysMemUse : 567496(kB), SysMemFree : 5898040(kB)
ServerSwap: 0(kB), SysSwap : 2097148(kB), SysSwapUse: 0(kB), SysSwapFree: 2097148(kB)
BgsaveTime: -1, AofEnable: 0, AofRewTime: -1, WriteStatus: ok
AllCommand: 2992, EvictKeys: 0, HitKeys : 1, MissKeys : 0
##############################################SLOWLOGS##############################################
ID Time Duration(us) Command
-------------------------------------------------------------------------------------
#############################################CONFIG DIFF############################################
Parameter Default Value
---------------------------------------------------------------------------------------------------
auto-aof-rewrite-percentage "200" "0"
lua-time-limit "5000" "1000"
node-fail-delay "300" "1000"
repl-diskless-sync-delay "5" "3"
dir "./" "/home/redis/data/7672"
client-output-buffer-limit "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60" "normal 0 0 0 slave 0 0 0 pubsub 33554432 8388608 60"
###########################################COMMANDS STATS###########################################
Command Calls Usec UsecPerCall
-----------------------------------------------------------------------------------
cluster 3028 645780 213.27 (!)
auth 4 14 3.50
ping 6 15 2.50
set 1 30 30.00
command 1 965 965.00 (!)
get 1 8 8.00
info 4489 97136 21.64
****************************************************************************************************
****************************************************************************************************
* SERVER ROLE STATUS KEYS CLIENTS USEMEM(kB) OPS NET(kBps) SLOTS *
* ------------------------------------------------------------------------------------------------ *
* 10.45.82.64:7672 master ok 1 1 9804 0 0 5460 *
* | *
* 10.45.82.64:7671 master ok 0 1 9796 0 0 5461 *
* | *
* 10.45.82.64:7670 master ok 0 1 9792 1 0.06 5460 *
* | *
****************************************************************************************************
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Please check the warning followed: !
! 0: There are too more masters on 10.45.82.64! !
! 1: Master[10.45.82.64:7672] has no slave ,or on the same host! !
! 2: Master[10.45.82.64:7671] has no slave ,or on the same host! !
! 3: Master[10.45.82.64:7670] has no slave ,or on the same host! !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
All statistic information of ZCache is in ZCache-cache.tar.gz !!!
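The UsecPerCall column in the COMMANDS STATS table above is the total time divided by the number of calls. Stock Redis exposes the same figures through INFO commandstats as lines of the form cmdstat_<name>:calls=...,usec=...,usec_per_call=.... The sketch below parses that format in Python; whether rc-getinfo.sh reads this source is an assumption, and the sample lines are reconstructed from the table rather than captured from a server.

```python
def parse_commandstats(text):
    """Parse `INFO commandstats` lines into {command: {calls, usec, usec_per_call}}."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue
        name, _, fields = line[len("cmdstat_"):].partition(":")
        values = dict(item.split("=", 1) for item in fields.split(","))
        calls = int(values["calls"])
        usec = int(values["usec"])
        stats[name] = {
            "calls": calls,
            "usec": usec,
            # Recomputed here instead of trusting the reported value.
            "usec_per_call": round(usec / calls, 2) if calls else 0.0,
        }
    return stats


# Reconstructed from the COMMANDS STATS table above (illustrative input).
SAMPLE = """# Commandstats
cmdstat_cluster:calls=3028,usec=645780,usec_per_call=213.27
cmdstat_set:calls=1,usec=30,usec_per_call=30.00
cmdstat_get:calls=1,usec=8,usec_per_call=8.00
"""

if __name__ == "__main__":
    for name, row in sorted(parse_commandstats(SAMPLE).items()):
        print(name, row["calls"], row["usec"], row["usec_per_call"])
```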
Starting the cluster
# Start from a cluster host: log in to any host in the cluster as the system user that owns the cluster
localhost:/usr/local/redis/bin $ ./rc-start.sh
2017-11-07 15:12:13 * Begin to start redis cluster on user [ cache ]
2017-11-07 15:12:13 * Start node [ 10.45.82.64:7672 ] by [ redis-server redis-7672.conf ]...
2017-11-07 15:12:13 * Start node [ 10.45.82.64:7671 ] by [ redis-server redis-7671.conf ]...
2017-11-07 15:12:13 * Start node [ 10.45.82.64:7670 ] by [ redis-server redis-7670.conf ]...
2017-11-07 15:12:14 * [ 10.45.82.64:7672 ] start ok .
2017-11-07 15:12:14 * [ 10.45.82.64:7671 ] start ok .
2017-11-07 15:12:14 * [ 10.45.82.64:7670 ] start ok .
- 100% Time elapsed: 1 (s)
2017-11-07 15:12:14 * ...................................Start done
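The script uses the logged-in system user to locate the cluster node configuration files on that host, derives from them the full list of cluster nodes and hosts, and then logs in to each host over key-based SSH to start the nodes.
rc-start.sh checks each node's startup status and returns only after startup has completed. If any node fails to start, it prints that node's startup log and asks for manual intervention.
# Start from a non-cluster host: the rcmanage.py script in the ZCache cluster installation package can start a ZCache server cluster remotely from a host outside the cluster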
[email protected][/home/rcinstall]#./rcmanage.py -a start -n 10.45.82.64 -u cache -p cache
Redis Cluster Installation Tool
Begining start for redis cluster by [email protected]...
Finish start: 2017-11-07 15:24:01 * ...................................Start done
rcmanage.py logs in to the specified cluster host using SSH password authentication and calls rc-start.sh to start the ZCache server cluster.
Stopping the cluster
# Stop from a cluster host: log in to any host in the cluster as the system user that owns the cluster
localhost:/usr/local/redis/bin $ ./rc-stop.sh
2017-11-07 15:32:20 * Begin to stop redis cluster on user [ cache ]
2017-11-07 15:32:20 * Stop automan
2017-11-07 15:32:20 * Shutdown redis node [ 10.45.82.64:7670 ]
2017-11-07 15:32:20 * Shutdown redis node [ 10.45.82.64:7671 ]
2017-11-07 15:32:20 * Shutdown redis node [ 10.45.82.64:7672 ]
2017-11-07 15:32:20 * ...................................Stop done
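The script uses the logged-in system user to locate the cluster node configuration files on that host, derives from them the list of cluster nodes, and then stops each node in the cluster through a remote service call.
# Stop from a non-cluster host: the rcmanage.py script in the ZCache cluster installation package can stop a ZCache server cluster remotely from a host outside the cluster.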
[email protected][/home/rcinstall]#./rcmanage.py -a stop -n 10.45.82.64 -u cache -p cache
Redis Cluster Installation Tool
Begining stop for redis cluster by [email protected]...
Finish stop: 2017-11-07 15:33:51 * ...................................Stop done
rcmanage.py logs in to the specified cluster host using SSH password authentication and calls rc-stop.sh to stop the ZCache server cluster.
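Uninstalling the cluster
# Uninstall from a cluster host: to uninstall the cluster, log in to any host in the cluster as the system user that owns the cluster.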
#rc-destroy.sh
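The script uses the logged-in system user to locate the cluster node configuration files on that host, derives from them the full list of cluster nodes and hosts, and then logs in to each host over key-based SSH to uninstall the nodes.
# Uninstall from a non-cluster host: the rcmanage.py script in the ZCache cluster installation package can uninstall a ZCache server cluster remotely from a host outside the cluster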
./rcmanage.py -a destroy -n 10.45.82.64 -u cache -p cache
rcmanage.py logs in to the specified cluster host using SSH password authentication and calls rc-destroy.sh to uninstall the ZCache server cluster.
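The three remote invocations of rcmanage.py shown in the preceding sections differ only in the value passed to -a. The sketch below builds that argument list in Python so it can be handed to subprocess.run; the helper name and the restriction to start, stop and destroy are assumptions based solely on the commands shown here, and rcmanage.py may accept other actions.

```python
# Only the actions that appear in this document; others may exist.
VALID_ACTIONS = ("start", "stop", "destroy")


def rcmanage_argv(action, host, user, password, script="./rcmanage.py"):
    """Build the argument list for one remote cluster operation."""
    if action not in VALID_ACTIONS:
        raise ValueError("unsupported action: %r" % (action,))
    # -a action, -n cluster host, -u cluster user, -p that user's password
    return [script, "-a", action, "-n", host, "-u", user, "-p", password]


if __name__ == "__main__":
    print(" ".join(rcmanage_argv("stop", "10.45.82.64", "cache", "cache")))
```

A caller would typically run it from the installation directory, for example subprocess.run(argv, cwd="/home/rcinstall", check=True). Passing the list form avoids shell quoting problems, although the password remains visible in the process list.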
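Cleaning up expired keys
Expired keys are cleaned up with the rc-cleanexpire.sh script, which is installed into $REDIS_HOME/bin on every host during cluster installation (REDIS_HOME defaults to /usr/local/redis).
To clean up expired keys, log in to any host in the cluster as the system user that owns the cluster and run rc-cleanexpire.sh with no arguments. The script uses the logged-in system user to locate the cluster node configuration files on that host, derives from them the full list of cluster nodes and hosts, and then logs in to each host over key-based SSH to clean up expired keys.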
localhost:/usr/local/redis/bin $ ./rc-cleanexpire.sh
2017-11-08 10:57:19 * [10.45.6.24:7670]: Start to clean expired keys...
2017-11-08 10:57:19 * [10.45.6.24:7671]: Start to clean expired keys...
2017-11-08 10:57:19 * [10.45.6.24:7672]: Start to clean expired keys...
2017-11-08 10:57:19 * [10.45.82.64:7670]: Start to clean expired keys...
2017-11-08 10:57:19 * [10.45.82.64:7671]: Start to clean expired keys...
2017-11-08 10:57:19 * [10.45.82.64:7672]: Start to clean expired keys...
2017-11-08 10:57:19 * [10.45.82.64:7680]: Start to clean expired keys...
2017-11-08 10:57:20 * [10.45.82.64:7681]: Start to clean expired keys...
2017-11-08 10:57:20 * [10.45.82.64:7682]: Start to clean expired keys...
2017-11-08 10:57:20 * [10.45.82.64:7690]: Start to clean expired keys...
2017-11-08 10:57:20 * [10.45.82.64:7692]: Start to clean expired keys...
<10.45.6.24:7670 > | [##################################################]100%
<10.45.6.24:7671 > | [##################################################]100%
<10.45.6.24:7672 > | [##################################################]100%
<10.45.82.64:7670 > | [##################################################]100%
<10.45.82.64:7671 > | [##################################################]100%
<10.45.82.64:7672 > | [##################################################]100%
<10.45.82.64:7680 > | [##################################################]100%
<10.45.82.64:7681 > | [##################################################]100%
<10.45.82.64:7682 > | [##################################################]100%
<10.45.82.64:7690 > | [##################################################]100%
<10.45.82.64:7692 > | [##################################################]100%
2017-11-08 10:57:19 * [10.45.6.24:7670]: Clean expired keys: 0
2017-11-08 10:57:19 * [10.45.6.24:7672]: Clean expired keys: 0
2017-11-08 10:57:19 * [10.45.6.24:7671]: Clean expired keys: 0
2017-11-08 10:57:20 * [10.45.82.64:7670]: Clean expired keys: 0
2017-11-08 10:57:21 * [10.45.82.64:7671]: Clean expired keys: 0
2017-11-08 10:57:21 * [10.45.82.64:7672]: Clean expired keys: 0
2017-11-08 10:57:21 * [10.45.82.64:7680]: Clean expired keys: 0
2017-11-08 10:57:21 * [10.45.82.64:7681]: Clean expired keys: 0
2017-11-08 10:57:21 * [10.45.82.64:7682]: Clean expired keys: 0
2017-11-08 10:57:21 * [10.45.82.64:7690]: Clean expired keys: 0
2017-11-08 10:57:21 * [10.45.82.64:7692]: Clean expired keys: 0
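Changing node parameters online
Node runtime parameters are changed online with the rc-setconfig.sh script, which is installed into $REDIS_HOME/bin on every host during cluster installation (REDIS_HOME defaults to /usr/local/redis).
To change node runtime parameters online, log in to any host in the cluster as the system user that owns the cluster and run rc-setconfig.sh. The script uses the logged-in system user to locate the cluster node configuration files on that host, derives from them the full list of cluster nodes and hosts, and then connects to the service nodes and changes the runtime parameters with the config command.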
localhost:/usr/local/redis/bin $ ./rc-setconfig.sh
Usage: rc-setconfig.sh [-m|-s|-n <IP:PORT>] [-wh] confname [confvalue]
# The -m, -s and -n options are mutually exclusive; if none of them is given, the runtime parameters of all nodes are changed
-m
only modify masters config [if not set -m&-s&-n,will modify all nodes]
-s
only modify slaves config [if not set -m&-s&-n,will modify all nodes]
-n IP:PORT
only modify the node[IP:PORT] config
-w
write the config file after modify config online
-h
show usage
Examples:
1. set all server nodes online: loglevel=warning :
$ rc-setconfig.sh loglevel warning
2. set all server nodes online & write to config file: loglevel=notice :
$ rc-setconfig.sh -w loglevel notice
3. set master nodes online: loglevel=notice :
$ rc-setconfig.sh -m loglevel notice
4. set node<10.45.43.200:7370> online: loglevel=notice :
$ rc-setconfig.sh -n 10.45.43.200:7370 loglevel notice
5. get loglevel on all server nodes:
$ rc-setconfig.sh loglevel
When no confvalue is given,rc-setconfig.sh just show the config.
localhost:/usr/local/redis/bin $ ./rc-setconfig.sh -m -w loglevel debug
2017-11-08 11:03:04 * Process config on master nodes ...
2017-11-08 11:03:04 * [10.45.6.24:7670]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * [10.45.6.24:7671]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * [10.45.6.24:7672]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * [10.45.82.64:7670]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * [10.45.82.64:7671]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * [10.45.82.64:7672]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * [10.45.82.64:7680]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * [10.45.82.64:7681]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * [10.45.82.64:7682]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * [10.45.82.64:7690]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * [10.45.82.64:7692]: CONFIG SET loglevel debug : [write config file: OK]
2017-11-08 11:03:04 * Get the value of <loglevel> :
================================================================================
SERVER | loglevel
--------------------------------------------------------------------------------
10.45.6.24:7670 | "debug"
10.45.6.24:7671 | "debug"
10.45.6.24:7672 | "debug"
10.45.82.64:7670 | "debug"
10.45.82.64:7671 | "debug"
10.45.82.64:7672 | "debug"
10.45.82.64:7680 | "debug"
10.45.82.64:7681 | "debug"
10.45.82.64:7682 | "debug"
10.45.82.64:7690 | "debug"
10.45.82.64:7692 | "debug"
================================================================================
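The option rules of rc-setconfig.sh described above (at most one of -m, -s, -n; optional -w; a value that turns a query into a change) can be modelled with a mutually exclusive argument group. This is a sketch in Python using argparse, not the script's actual implementation; the describe helper and the attribute names are hypothetical.

```python
import argparse


def build_parser():
    """Model the rc-setconfig.sh usage line with argparse."""
    parser = argparse.ArgumentParser(prog="rc-setconfig.sh")
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("-m", dest="masters", action="store_true",
                       help="only modify masters config")
    scope.add_argument("-s", dest="slaves", action="store_true",
                       help="only modify slaves config")
    scope.add_argument("-n", dest="node", metavar="IP:PORT",
                       help="only modify the given node")
    parser.add_argument("-w", dest="write", action="store_true",
                        help="write the config file after modifying online")
    parser.add_argument("confname")
    parser.add_argument("confvalue", nargs="?")
    return parser


def describe(args):
    """Return (target scope, action) for a parsed command line."""
    if args.masters:
        scope = "masters"
    elif args.slaves:
        scope = "slaves"
    elif args.node:
        scope = args.node
    else:
        scope = "all"
    # Without a value the script only shows the current setting.
    action = "get" if args.confvalue is None else "set"
    return scope, action


if __name__ == "__main__":
    args = build_parser().parse_args(["-m", "-w", "loglevel", "debug"])
    print(describe(args), args.write)
```

Combining two scope options, for example -m together with -s, makes argparse exit with a usage error, which mirrors the mutual exclusion stated in the usage text.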
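Rolling back data to a point in time
When AOF persistence is enabled on the cache cluster, this tool can be used for data recovery. For example, if a large amount of cached data has been deleted or modified by mistake, stop the business applications and use rc-recover.sh to restore ZCache to a specified point in time.
Log in to any host in the cluster as the system user that owns the cluster and run rc-recover.sh. It uses the logged-in system user to locate the cluster node configuration files on that host, derives from them the full list of cluster nodes and hosts, and then invokes the rollback on all nodes concurrently.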
localhost:/usr/local/redis/bin $ ./rc-recover.sh
Usage: rc-recover.sh [OPTIONS]
-g <groupid> Recover group id.
-t <timepoint> Recover to the timepoint.
-c Clear data before recover.
Examples:
$ rc-recover.sh -g 1 -t 20170101010101
$ rc-recover.sh -g 1 -t 20170101010101 -c
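Note: AOF persistence adds the configuration parameter aof-expire-time, the retention period (in seconds) for obsolete AOF files. The default is 0, meaning only the currently valid AOF file is kept and obsolete historical AOF files are deleted automatically. When the parameter is set to a value greater than 0, obsolete AOF files are kept for the specified duration before being deleted. (Obsolete historical AOF files are retained mainly for backup and point-in-time recovery.)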
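The retention rule for aof-expire-time and the -t argument of rc-recover.sh can be illustrated with a short Python sketch. Two things here are assumptions: the timepoint format YYYYMMDDHHMMSS is inferred from the example value 20170101010101, and the exact boundary at which a file becomes deletable is a guess. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_timepoint(text):
    """Parse a recovery timepoint, assuming the YYYYMMDDHHMMSS layout."""
    return datetime.strptime(text, "%Y%m%d%H%M%S")


def should_delete_obsolete_aof(obsoleted_at, now, aof_expire_time):
    """Apply the aof-expire-time rule to one obsolete AOF file.

    aof_expire_time == 0: obsolete files are removed right away.
    aof_expire_time  > 0: they are kept for that many seconds first.
    """
    if aof_expire_time <= 0:
        return True
    return now - obsoleted_at >= timedelta(seconds=aof_expire_time)


if __name__ == "__main__":
    obsoleted = parse_timepoint("20170101010101")
    now = obsoleted + timedelta(minutes=10)
    print(should_delete_obsolete_aof(obsoleted, now, 0))
    print(should_delete_obsolete_aof(obsoleted, now, 3600))
```

A timepoint can only be recovered if the AOF files covering it still exist, so the retention period bounds how far back a rollback can reach.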
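Cluster self-management
The self-management tool (rc-automan.sh) monitors the running state of the ZCache server cluster. When it detects that a master node is down and the cluster cannot fail over on its own, it actively repairs the cluster by completing the master/slave failover. When a node goes down, it can also restart the service node automatically, depending on the setting.
The setting added for this is automan:
# automan: cluster self-management parameter
# 0 - disable automatic restart and failover,
# 1 - enable automatic restart,
# 2 - enable automatic failover,
# 3 - enable automatic restart and failover
automan 3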
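The four automan values listed above read naturally as a two-bit mask: bit 1 for automatic restart and bit 2 for automatic failover. The Python sketch below decodes the value on that reading. How rc-automan.sh represents the setting internally is not documented here, so the bit-mask interpretation and all names are assumptions.

```python
AUTO_RESTART = 1   # value 1 enables automatic restart
AUTO_FAILOVER = 2  # value 2 enables automatic failover


def decode_automan(value):
    """Translate an automan setting (0 to 3) into the two behaviours."""
    if value not in (0, 1, 2, 3):
        raise ValueError("automan must be 0, 1, 2 or 3")
    return {
        "auto_restart": bool(value & AUTO_RESTART),
        "auto_failover": bool(value & AUTO_FAILOVER),
    }


if __name__ == "__main__":
    for value in range(4):
        print(value, decode_automan(value))
```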
Usage: rc-automan.sh [-d|-i interval]
-d
show debug information
-i interval
check cluster status interval [default 10 (s)]
-h
show usage
Examples:
1. manage zcache in login user
$ rc-automan.sh
2. check zcache status every 60 seconds
$ rc-automan.sh -i 60
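The self-management tool steps in to complete a master/slave switch only when a master failure in the ZCache server cluster cannot recover automatically. (This resembles a distributed system with a central coordinator, and it addresses the case where a decentralized system such as ZCache server cannot heal itself: when the number of surviving masters does not exceed half of the total, no election can take place.)
The tool runs under the cluster user on every host that runs ZCache server, but only one instance (the MASTER) performs self-management; the other instances stay in STANDBY state. Only when the MASTER process dies do the STANDBY instances produce a new MASTER to take over. The MASTER process also checks for and restarts any STANDBY instance that has died.
Note: the self-management tool does not need to be started manually; it is launched together with the cluster when rc-start.sh runs.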
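The condition under which the cluster cannot elect a new master by itself, described earlier in this section, is a simple majority test over the master nodes. The Python sketch below states that test and the resulting decision; it illustrates the rule only, and the function names are hypothetical rather than taken from rc-automan.sh.

```python
def masters_have_majority(alive_masters, total_masters):
    """True when strictly more than half of the masters are still alive."""
    return alive_masters * 2 > total_masters


def needs_external_failover(alive_masters, total_masters):
    """True when some master is down and no election is possible."""
    some_master_down = alive_masters < total_masters
    return some_master_down and not masters_have_majority(alive_masters, total_masters)


if __name__ == "__main__":
    for alive, total in ((3, 3), (2, 3), (1, 3), (2, 4)):
        print(alive, total, needs_external_failover(alive, total))
```

With three masters, losing one still leaves a majority and the cluster can fail over on its own; losing two does not, which is the situation the self-management tool is meant to cover.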
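Viewing cluster node information
Cluster node information is viewed with the rc-tool.sh script, which is installed into $REDIS_HOME/bin on every host during cluster installation (REDIS_HOME defaults to /usr/local/redis).
To view cluster node information, log in to any host in the cluster as the system user that owns the cluster and run rc-tool.sh with the nodes argument. The script uses the logged-in system user to locate the cluster node configuration files on that host, derives from them the full list of cluster nodes and hosts, and displays the result in a formatted table.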
localhost:/usr/local/redis/bin $ ./rc-tool.sh
Usage: ./rc-tool.sh <config|nodes|cmds> ip port
localhost:/usr/local/redis/bin $ ./rc-tool.sh nodes 10.45.6.24 7670
NODEID SERVER ROLE PING-TIME PONG-TIME EPOCH STATE SLOTS
-------------------------------------------------------------------------------------------------------------------
6568131fda... 10.45.6.24:7670 myself,master 0 0 11 connected 0-909 5461-6372
abdf90c344... 10.45.6.24:7671 master 0 1510203448712 13 connected 2731-3640 9102 13654-14563
ce7adb006f... 10.45.6.24:7672 master 0 1510203447711 12 connected 910 8193-9101 10923-11833
259c81ba95... 10.45.82.64:7670 master 0 1510203445706 1 connected 3641-5460
e169a287a7... 10.45.82.64:7671 master 0 1510203446709 2 connected 9103-10922
9fe944c229... 10.45.82.64:7672 master 0 1510203445706 3 connected 14564-16383
4aeb672cac... 10.45.82.64:7680 master 0 1510203448714 8 connected 11834-13653
6d604078c1... 10.45.82.64:7681 master 0 1510203447712 6 connected 6373-8192
1223f1c83c... 10.45.82.64:7682 master 0 1510203448714 7 connected 911-2730
718125f4d9... 10.45.82.64:7690 master 0 1510203444697 15 connected
2b7ea1255c... 10.45.82.64:7692 master 0 1510203448713 14 connected
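The tool formats the information for every node in the cluster, shows the master/slave relationships as a tree, and sorts the entries by master IP address so that a given node is easy to find.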
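The grouping and ordering described above can be sketched in Python: slaves are attached to their master, and masters are sorted by numeric IP address and port. The output layout, the dictionary keys and the function name are illustrative and do not reproduce rc-tool.sh's actual format; the sample uses the six nodes from the reference listing earlier in this document, with IDs shortened to ten characters.

```python
import ipaddress


def build_tree(nodes):
    """Return display lines with each slave indented under its master."""

    def addr_key(node):
        host, _, port = node["addr"].rpartition(":")
        # Compare IPs numerically so 10.45.100.1 sorts after 10.45.82.64.
        return (ipaddress.ip_address(host), int(port))

    slaves_of = {}
    for node in nodes:
        if node["master_id"] is not None:
            slaves_of.setdefault(node["master_id"], []).append(node)

    lines = []
    masters = [n for n in nodes if n["master_id"] is None]
    for master in sorted(masters, key=addr_key):
        lines.append(master["addr"] + " master")
        for slave in sorted(slaves_of.get(master["id"], []), key=addr_key):
            lines.append("  |-- " + slave["addr"] + " slave")
    return lines


# Nodes from the reference listing, IDs truncated for readability.
SAMPLE_NODES = [
    {"id": "579deae822", "addr": "10.45.61.25:7380", "master_id": None},
    {"id": "cabdb7d74b", "addr": "10.45.61.28:7381", "master_id": "579deae822"},
    {"id": "5c1a57c791", "addr": "10.45.61.29:7380", "master_id": None},
    {"id": "447d270fa3", "addr": "10.45.61.28:7380", "master_id": None},
    {"id": "0519d80798", "addr": "10.45.61.25:7381", "master_id": "5c1a57c791"},
    {"id": "f6db896b32", "addr": "10.45.61.29:7381", "master_id": "447d270fa3"},
]

if __name__ == "__main__":
    print("\n".join(build_tree(SAMPLE_NODES)))
```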