gh-ost使用

前言

生产上,有时需要对一些大表(几千万数据)的表中字段进行增、改操作.如果直接操作,很可能对线上有比较大的影响.很多时候,需要在非忙时(大多是凌晨)进行表修改.想想都折磨人.

这里记录使用工具gh-ost(go语言编写)进行表字段修改.

为什么用它

原理

  • 伪装成一个mysql的从节点,读取binary log.

    gh-ost-general-flow

场景

在阿里云上的一台mysql,一张2000w数据的表,需要增加一个int型字段.

使用gh-ost,耗时37分钟.

环境

  • mysql 5.7.15-log (阿里云)
  • gh-ost 1.0.46

使用

Requirements

  • If you mysql already using RBR, all is well for you
  • If not, convert one of your replicas to binlog_format='ROW', or let gh-ost do this for you.gh-ost will not convert back to STATEMENT (SBR)
  • Replica should support log-slave-updates.

所以,gh-ost建议连接到mysql的从节点上.原因如上:如果binary log为statement模式时,gh-ost会强制转换到row模式,且需要自己手动改回statement.(注: gh-ost参数中配置–switch-to-rbr)

limitations

  • 使用具有管理员账号的权限.
  • gh-ost对表名大小写不敏感,如待迁移的表名为:MyTable,且存在一张表为Mytable,则不能使用.
  • 表需要存在主键,且主键的值不是null,否则需要在gh-ost参数中添加--allow-nullable-unique-key.一般表的id都设置为自增主键,通用场景不用考虑这种问题.Read more

  • 如果gh-ost连接的是从节点,需要确保主从节点上两张表的表结构一直.

  • 使用google云数据库,参数添加 --gcp
  • 使用阿里云的RDS,参数添加 --aliyun-rds
  • 对表重命名不能使用

Install and Use

下载完成后解压后直接使用即可.

使用的关键是相关参数的设置及理解.参数的详细信息通过./gh-ost --help查看即可.这里我也罗列了详见附录gh-ost参数详解.

这里记录下我迁移中设置的相关参数.

阿里云RDS且无从节点

  • 阿里云的RDS,需要添加参数–aliyun-rds

  • 无从节点,直接在主节点上执行–allow-on-master

Check binlog_format

检查mysql的binary log格式(FULL也是支持的).如果是statement,需要添加–switch-to-rbr参数,且最后需要手动将binlog_format设置回statement格式.

1
2
3
4
5
$ show variables like 'binlog_format';
+-----------------------------------------+--------------------------------+
| Variable_name | Value |
+-----------------------------------------+--------------------------------+
| binlog_format | ROW |
Run migrate
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
./gh-ost \
--aliyun-rds=true \
--allow-on-master \
--host= --port= --database= --table= --conf=/opt/account.cnf \
--alter="ADD COLUMN platform int(2) DEFAULT NULL COMMENT '所属平台'" \
--chunk-size=5000 \
--concurrent-rowcount \
--cut-over=atomic \
--cut-over-lock-timeout-seconds=3 \
--initially-drop-ghost-table=true \
--initially-drop-old-table=true \
--initially-drop-socket-file=true \
--timestamp-old-table=true \
--verbose \
--serve-socket-file=/tmp/gh-ost.test.sock \
--switch-to-rbr \
--execute

# --conf设置登录mysql的用户名和密码文件的绝对路径.内容格式如下.
[client]
user=gh-ost
password=123456
# --aliyun-rds=true针对使用阿里云的RDS.
# --allow-on-master没有从节点时,直接连接主mysql时指定.
# --alter表结构操作,如我这里添加字段"ADD COLOUM platform int(2) DEFAULT NULL COMMENT '所属平台'"
# --chunk-size每次复制数据的数量,默认是1000,允许100~100,000之间.
# --concurrent-rowcount一边复制数据一边计算数据量
# --cut-over数据复制的最后一步,安全的对原表和新表进行切换.默认是atomic,而two-step则不安全.
# --cut-over-lock-timeout-seconds执行cut-over时间.默认3秒,可以不用配置.
# --cut-over-exponential-backoff执行cut-over失败后执行指数级等待.
# --exponential-backoff-max-interval执行指数级等待的最大时间间隔.默认是64.如:
# --exact-rowcount真实计算数据总量,执行count()操作.个人觉得没必要,比较耗时.
# --initially-drop-ghost-table(建议使用)清除以前操作遗留下的ghost表
# --initially-drop-old-table(建议使用)清除以前操作遗留下的旧表
# --initially-drop-socket-file(建议使用)清除以前遗留的socket文件
# --timestamp-old-table旧表加上时间戳
# --verbose日志界别debug
# --serve-socket-file=/tmp/gh-ost.test.sock
# --panic-flag-file=/tmp/gh-ost.panic.flag(暂时不清楚,先不使用)
# --switch-to-rbr(建议添加)这样不用关注mysql的bin log的格式.让gh-ost去执行设置bin-log格式操作.

附录

gh-ost参数详解

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
-aliyun-rds
set to 'true' when you execute on Aliyun RDS.
-allow-master-master
explicitly allow running in a master-master setup
-allow-nullable-unique-key
allow gh-ost to migrate based on a unique key with nullable columns. As long as no NULL values exist, this should be OK. If NULL values exist in chosen key, data may be corrupted. Use at your own risk!
-allow-on-master
allow this migration to run directly on master. Preferably it would run on a replica
-alter string
alter statement (mandatory)
-approve-renamed-columns ALTER
in case your ALTER statement renames columns, gh-ost will note that and offer its interpretation of the rename. By default gh-ost does not proceed to execute. This flag approves that gh-ost's interpretation is correct
-ask-pass
prompt for MySQL password
-assume-master-host string
(optional) explicitly tell gh-ost the identity of the master. Format: some.host.com[:port] This is useful in master-master setups where you wish to pick an explicit master, or in a tungsten-replicator where gh-ost is unable to determine the master
-assume-rbr
set to 'true' when you know for certain your server uses 'ROW' binlog_format. gh-ost is unable to tell, event after reading binlog_format, whether the replication process does indeed use 'ROW', and restarts replication to be certain RBR setting is applied. Such operation requires SUPER privileges which you might not have. Setting this flag avoids restarting replication and you can proceed to use gh-ost without SUPER privileges
-check-flag
Check if another flag exists/supported. This allows for cross-version scripting. Exits with 0 when all additional provided flags exist, nonzero otherwise. You must provide (dummy) values for flags that require a value. Example: gh-ost --check-flag --cut-over-lock-timeout-seconds --nice-ratio 0
-chunk-size int
amount of rows to handle in each iteration (allowed range: 100-100,000) (default 1000)
-concurrent-rowcount
(with --exact-rowcount), when true (default): count rows after row-copy begins, concurrently, and adjust row estimate later on; when false: first count rows, then start row copy (default true)
-conf string
Config file
-critical-load string
Comma delimited status-name=threshold, same format as --max-load. When status exceeds threshold, app panics and quits
-critical-load-hibernate-seconds int
When nonzero, critical-load does not panic and bail out; instead, gh-ost goes into hibernate for the specified duration. It will not read/write anything to from/to any server
-critical-load-interval-millis int
When 0, migration immediately bails out upon meeting critical-load. When non-zero, a second check is done after given interval, and migration only bails out if 2nd check still meets critical load
-cut-over string
choose cut-over type (default|atomic, two-step) (default "atomic")
-cut-over-exponential-backoff
Wait exponentially longer intervals between failed cut-over attempts. Wait intervals obey a maximum configurable with 'exponential-backoff-max-interval').
-cut-over-lock-timeout-seconds int
Max number of seconds to hold locks on tables while attempting to cut-over (retry attempted when lock exceeds timeout) (default 3)
-database string
database name (mandatory)
-debug
debug mode (very verbose)
-default-retries int
Default number of retries for various operations before panicking (default 60)
-discard-foreign-keys
DANGER! This flag will migrate a table that has foreign keys and will NOT create foreign keys on the ghost table, thus your altered table will have NO foreign keys. This is useful for intentional dropping of foreign keys
-dml-batch-size int
batch size for DML events to apply in a single transaction (range 1-100) (default 10)
-exact-rowcount
actually count table rows as opposed to estimate them (results in more accurate progress estimation)
-execute
actually execute the alter & migrate the table. Default is noop: do some tests and exit
-exponential-backoff-max-interval int
Maximum number of seconds to wait between attempts when performing various operations with exponential backoff. (default 64)
-force-named-cut-over
When true, the 'unpostpone|cut-over' interactive command must name the migrated table
-force-table-names string
table name prefix to be used on the temporary tables
-heartbeat-interval-millis int
how frequently would gh-ost inject a heartbeat value (default 100)
-help
Display usage
-hooks-hint string
arbitrary message to be injected to hooks via GH_OST_HOOKS_HINT, for your convenience
-hooks-path string
directory where hook files are found (default: empty, ie. hooks disabled). Hook files found on this path, and conforming to hook naming conventions will be executed
-host string
MySQL hostname (preferably a replica, not the master) (default "127.0.0.1")
-initially-drop-ghost-table
Drop a possibly existing Ghost table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists
-initially-drop-old-table
Drop a possibly existing OLD table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists
-initially-drop-socket-file
Should gh-ost forcibly delete an existing socket file. Be careful: this might drop the socket file of a running migration!
-master-password string
MySQL password on master, if different from that on replica. Requires --assume-master-host
-master-user string
MySQL user on master, if different from that on replica. Requires --assume-master-host
-max-lag-millis int
replication lag at which to throttle operation (default 1500)
-max-load string
Comma delimited status-name=threshold. e.g: 'Threads_running=100,Threads_connected=500'. When status exceeds threshold, app throttles writes
-migrate-on-replica
Have the migration run on a replica, not on the master. This will do the full migration on the replica including cut-over (as opposed to --test-on-replica)
-nice-ratio float
force being 'nice', imply sleep time per chunk time; range: [0.0..100.0]. Example values: 0 is aggressive. 1: for every 1ms spent copying rows, sleep additional 1ms (effectively doubling runtime); 0.7: for every 10ms spend in a rowcopy chunk, spend 7ms sleeping immediately after
-ok-to-drop-table
Shall the tool drop the old table at end of operation. DROPping tables can be a long locking operation, which is why I'm not doing it by default. I'm an online tool, yes?
-panic-flag-file string
when this file is created, gh-ost will immediately terminate, without cleanup
-password string
MySQL password
-port int
MySQL port (preferably a replica, not the master) (default 3306)
-postpone-cut-over-flag-file string
while this file exists, migration will postpone the final stage of swapping tables, and will keep on syncing the ghost table. Cut-over/swapping would be ready to perform the moment the file is deleted.
-quiet
quiet
-replica-server-id uint
server id used by gh-ost process. Default: 99999 (default 99999)
-replication-lag-query string
Deprecated. gh-ost uses an internal, subsecond resolution query
-serve-socket-file string
Unix socket file to serve on. Default: auto-determined and advertised upon startup
-serve-tcp-port int
TCP port to serve on. Default: disabled
-skip-foreign-key-checks
set to 'true' when you know for certain there are no foreign keys on your table, and wish to skip the time it takes for gh-ost to verify that
-skip-renamed-columns ALTER
in case your ALTER statement renames columns, gh-ost will note that and offer its interpretation of the rename. By default gh-ost does not proceed to execute. This flag tells gh-ost to skip the renamed columns, i.e. to treat what gh-ost thinks are renamed columns as unrelated columns. NOTE: you may lose column data
-stack
add stack trace upon error
-switch-to-rbr
let this tool automatically switch binary log format to 'ROW' on the replica, if needed. The format will NOT be switched back. I'm too scared to do that, and wish to protect you if you happen to execute another migration while this one is running
-table string
table name (mandatory)
-test-on-replica
Have the migration run on a replica, not on the master. At the end of migration replication is stopped, and tables are swapped and immediately swap-revert. Replication remains stopped and you can compare the two tables for building trust
-test-on-replica-skip-replica-stop
When --test-on-replica is enabled, do not issue commands stop replication (requires --test-on-replica)
-throttle-additional-flag-file string
operation pauses when this file exists; hint: keep default, use for throttling multiple gh-ost operations (default "/tmp/gh-ost.throttle")
-throttle-control-replicas string
List of replicas on which to check for lag; comma delimited. Example: myhost1.com:3306,myhost2.com,myhost3.com:3307
-throttle-flag-file string
operation pauses when this file exists; hint: use a file that is specific to the table being altered
-throttle-http string
when given, gh-ost checks given URL via HEAD request; any response code other than 200 (OK) causes throttling; make sure it has low latency response
-throttle-query string
when given, issued (every second) to check if operation should throttle. Expecting to return zero for no-throttle, >0 for throttle. Query is issued on the migrated server. Make sure this query is lightweight
-timestamp-old-table
Use a timestamp in old table name. This makes old table names unique and non conflicting cross migrations
-tungsten
explicitly let gh-ost know that you are running on a tungsten-replication based topology (you are likely to also provide --assume-master-host)
-user string
MySQL user
-verbose
verbose
-version
Print version & exit

参考

坚持原创技术分享,您的支持将鼓励我继续创作!
Fork me on GitHub