-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Open
Description
Hi,
When running gh-ost, the cut-over part often fails due to timeout (unfortunately our slaves often have some lags / spikes of lag). This is not a problem in itself but it does not seem to wait and retry at a later time which I think it is supposed to do.
I am running version 1.1.1
(redacted command and log)
This is what I am running (alter just adds some columns) :
./gh-ost \
--user="ghost" \
--password="********" \
--host="slave" \
--allow-master-master \
--assume-master-host="master" \
--database="db" \
--table="table" \
--alter="alter" \
--verbose \
--skip-foreign-key-checks \
--skip-strict-mode \
--initially-drop-ghost-table \
--initially-drop-old-table \
--exact-rowcount \
--ok-to-drop-table \
--max-lag-millis=10000 \
--execute
Here is the output at the cut-over phase :
Copy: 14327000/14362256 99.8%; Applied: 12041; Backlog: 0/1000; Time: 1h16m10s(total), 1h15m58s(copy); streamer: bin-log.005490:906474118; Lag: 2.68s, State: migrating; ETA: 11s
Copy: 14334000/14362256 99.8%; Applied: 12047; Backlog: 2/1000; Time: 1h16m11s(total), 1h15m59s(copy); streamer: bin-log.005490:920606200; Lag: 2.38s, State: migrating; ETA: 8s
Copy: 14342000/14362256 99.9%; Applied: 12053; Backlog: 0/1000; Time: 1h16m12s(total), 1h16m0s(copy); streamer: bin-log.005490:934403718; Lag: 1.78s, State: migrating; ETA: 6s
Copy: 14350000/14362256 99.9%; Applied: 12055; Backlog: 0/1000; Time: 1h16m13s(total), 1h16m1s(copy); streamer: bin-log.005490:947459950; Lag: 1.48s, State: migrating; ETA: 3s
Copy: 14359000/14362256 100.0%; Applied: 12058; Backlog: 0/1000; Time: 1h16m14s(total), 1h16m2s(copy); streamer: bin-log.005490:961146997; Lag: 1.18s, State: migrating; ETA: 1s
2021-05-18 18:18:18 INFO Row copy complete
Copy: 14362256/14362256 100.0%; Applied: 12060; Backlog: 0/1000; Time: 1h16m14s(total), 1h16m2s(copy); streamer: bin-log.005490:966837215; Lag: 1.08s, State: migrating; ETA: due
2021-05-18 18:18:18 INFO Grabbing voluntary lock: gh-ost.1024388992.lock
2021-05-18 18:18:18 INFO Setting LOCK timeout as 6 seconds
2021-05-18 18:18:18 INFO Looking for magic cut-over table
2021-05-18 18:18:18 INFO Creating magic cut-over table `db`.`_table_del`
2021-05-18 18:18:18 INFO Magic cut-over table created
2021-05-18 18:18:18 INFO Locking `db`.`table`, `db`.`_table_del`
2021-05-18 18:18:18 INFO Tables locked
2021-05-18 18:18:18 INFO Session locking original & magic tables is 1024388992
2021-05-18 18:18:18 INFO Writing changelog state: AllEventsUpToLockProcessed:1621354698945390596
2021-05-18 18:18:18 INFO Waiting for events up to lock
Copy: 14362256/14362256 100.0%; Applied: 12060; Backlog: 0/1000; Time: 1h16m15s(total), 1h16m2s(copy); streamer: bin-log.005490:970667197; Lag: 1.28s, State: migrating; ETA: due
2021-05-18 18:18:21 ERROR Timeout while waiting for events up to lock
2021-05-18 18:18:21 ERROR 2021-05-18 18:18:21 ERROR Timeout while waiting for events up to lock
2021-05-18 18:18:21 INFO Looking for magic cut-over table
2021-05-18 18:18:21 INFO Will now proceed to drop magic table and unlock tables
2021-05-18 18:18:21 INFO Dropping magic cut-over table
2021-05-18 18:18:21 INFO Dropping magic cut-over table
2021-05-18 18:18:21 INFO Dropping table `db`.`_table_del`
2021-05-18 18:18:24 INFO Intercepted changelog state AllEventsUpToLockProcessed
2021-05-18 18:18:24 INFO Handled changelog state AllEventsUpToLockProcessed
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 1/1000; Time: 1h16m20s(total), 1h16m2s(copy); streamer: bin-log.005490:989474686; Lag: 3.18s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h16m25s(total), 1h16m2s(copy); streamer: bin-log.005490:1014763533; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h16m30s(total), 1h16m2s(copy); streamer: bin-log.005490:1026013234; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h16m35s(total), 1h16m2s(copy); streamer: bin-log.005490:1036196578; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h16m40s(total), 1h16m2s(copy); streamer: bin-log.005490:1047207998; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h16m45s(total), 1h16m2s(copy); streamer: bin-log.005490:1057140917; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h16m50s(total), 1h16m2s(copy); streamer: bin-log.005490:1062986858; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h16m55s(total), 1h16m2s(copy); streamer: bin-log.005490:1064318221; Lag: 4.48s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h17m0s(total), 1h16m2s(copy); streamer: bin-log.005490:1073646385; Lag: 4.78s, State: migrating; ETA: due
[2021/05/18 18:19:04] [info] binlogsyncer.go:723 rotate to (bin-log.005491, 4)
[2021/05/18 18:19:04] [info] binlogsyncer.go:723 rotate to (bin-log.005491, 4)
2021-05-18 18:19:04 INFO rotate to next log from bin-log.005491:1073742500 to bin-log.005491
2021-05-18 18:19:04 INFO rotate to next log from bin-log.005491:0 to bin-log.005491
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h17m5s(total), 1h16m2s(copy); streamer: bin-log.005491:14353974; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h17m10s(total), 1h16m2s(copy); streamer: bin-log.005491:26928263; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h17m30s(total), 1h16m2s(copy); streamer: bin-log.005491:64743000; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h18m0s(total), 1h16m2s(copy); streamer: bin-log.005491:118334528; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h18m30s(total), 1h16m2s(copy); streamer: bin-log.005491:176850550; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h19m0s(total), 1h16m2s(copy); streamer: bin-log.005491:236619046; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h19m30s(total), 1h16m2s(copy); streamer: bin-log.005491:292654898; Lag: 0.08s, State: migrating; ETA: due
# Migrating `db`.`table`; Ghost table is `db`.`_table_gho`
# Migrating master:3306; inspecting slave:3306; executing on slave
# Migration started at Tue May 18 17:02:04 +0200 2021
# chunk-size: 1000; max-lag-millis: 10000ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# Serving on unix socket: /tmp/gh-ost.db.table.sock
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h20m0s(total), 1h16m2s(copy); streamer: bin-log.005491:359697150; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h20m30s(total), 1h16m2s(copy); streamer: bin-log.005491:427123115; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h21m0s(total), 1h16m2s(copy); streamer: bin-log.005491:498372220; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h21m30s(total), 1h16m2s(copy); streamer: bin-log.005491:564757429; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h22m0s(total), 1h16m2s(copy); streamer: bin-log.005491:627971561; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h22m30s(total), 1h16m2s(copy); streamer: bin-log.005491:700060664; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h23m0s(total), 1h16m2s(copy); streamer: bin-log.005491:762556638; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h23m30s(total), 1h16m2s(copy); streamer: bin-log.005491:830434663; Lag: 1.18s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h24m0s(total), 1h16m2s(copy); streamer: bin-log.005491:898773371; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h24m30s(total), 1h16m2s(copy); streamer: bin-log.005491:972940697; Lag: 0.08s, State: migrating; ETA: due
Copy: 14362256/14362256 100.0%; Applied: 12061; Backlog: 0/1000; Time: 1h25m0s(total), 1h16m2s(copy); streamer: bin-log.005491:1033621038; Lag: 0.08s, State: migrating; ETA: due
And it goes on like that even when lag is low and that cut-over could happen. I ctrl-c-ed it after 2h50m.
Any clue on what may be happening and how to fix it ?
Thanks
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels