
Caveats when upgrading from Tarantool 1.6

Serge Petrenko edited this page Jul 14, 2022 · 1 revision

This page explains some common issues that arise when upgrading a cluster from Tarantool 1.6 to 1.10, and how to solve them. It is relevant to anyone upgrading to a recent Tarantool version, such as 2.8.4 or 2.10.0, because upgrading through an intermediate step (1.6 -> 1.10 -> 2.x) is the only way to perform the upgrade without downtime.

A direct upgrade from 1.6 to 2.x is also possible, but only with downtime.

Let's first reiterate the upgrade procedure for any Tarantool version:

  1. Pick any replica in the cluster.
  2. Update this replica to the new Tarantool version.
  3. Make sure the replica has connected to the rest of the cluster: check box.info.replication[id].upstream and box.info.replication[id].downstream. The updated replica should follow everyone and should be followed by everyone.
  4. Update the remaining replicas by repeating steps 1-3 until only the master is still running the old Tarantool version.
  5. Switch the master to one of the updated replicas, and check that it continues following and being followed by everyone.
  6. Update the former master.
  7. On the new master, issue box.schema.upgrade().
  8. Issue box.snapshot() on every node in the cluster.
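
The check in step 3 can be sketched as a small helper. This is only an illustration: replication_problems is a hypothetical name, and it assumes that a healthy upstream reports status "follow" and that a downstream entry may lack a status field on some versions.

```lua
-- Sketch only: list the peers that this node does not follow, or that do not
-- follow this node. `replication` has the shape of box.info.replication and
-- `self_id` is the node's own id (box.info.id).
local function replication_problems(replication, self_id)
    local problems = {}
    for id, peer in pairs(replication) do
        if id ~= self_id then
            local up, down = peer.upstream, peer.downstream
            if up == nil or up.status ~= 'follow' then
                table.insert(problems, ('not following peer %d (upstream: %s)')
                    :format(id, up and up.status or 'missing'))
            end
            -- Assumption: a downstream without a status field is accepted,
            -- an absent or stopped downstream is reported.
            if down == nil or down.status == 'stopped' then
                table.insert(problems, ('not followed by peer %d (downstream: %s)')
                    :format(id, down and down.status or 'missing'))
            end
        end
    end
    return problems
end

-- Inside a configured Tarantool instance, print whatever looks wrong.
local box = rawget(_G, 'box')
if box ~= nil then
    for _, msg in ipairs(replication_problems(box.info.replication, box.info.id)) do
        print(msg)
    end
end
```

An empty result on the updated replica suggests it is safe to move on to the next node.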

What's different when upgrading from Tarantool 1.6:

Step 2: Tarantool 1.10+ fails to recover from 1.6 xlogs unless box.cfg{force_recovery = true} is set. A small difference in the xlog format makes 1.6 xlogs appear erroneous to 1.10+ instances, so the updated instance has to be started in force_recovery mode.
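
As a minimal sketch, the instance file of the updated replica might look like this. The listen port and the replication URI are hypothetical placeholders; only force_recovery comes from the description above.

```lua
-- Sketch of an instance file for a replica that has just been moved from 1.6 to 1.10+.
box.cfg{
    listen = 3301,                                             -- hypothetical port
    replication = {'replicator:password@192.168.0.101:3301'},  -- hypothetical peer URI
    force_recovery = true,  -- lets 1.10+ read the xlogs written by 1.6
}
```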

Step 3: New Tarantool nodes follow 1.6 nodes just fine, but some 1.6 nodes might disconnect from new nodes with an ER_LOADING error. This is not critical: the error goes away once replication is restarted on the 1.6 node:

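-- Run on the 1.6 node that reports ER_LOADING: drop its replication sources, then restore them.
-- Note: Tarantool 1.6 may expect this option under its older name, replication_source.
old_repl = box.cfg.replication
box.cfg{replication = ""}
box.cfg{replication = old_repl}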

Step 7: There was a breaking change between 1.6 and 1.10: in 1.6 the field type "num" was an alias for "number", while in 1.10 "num" is converted to "unsigned". This means that after box.schema.upgrade() is performed on the master, some spaces may have "unsigned" fields that contain non-unsigned values: double, int and so on. This makes the snapshot inconsistent unless an extra action is performed after box.schema.upgrade():

-- First find all spaces containing unsigned fields with non-unsigned values in them.
-- Say, we have one such space denoted problematic_space and the problem is in field problematic_field_no.
a = box.space.problematic_space:format()
a[problematic_field_no].type = 'number'
box.space.problematic_space:format(a)

Once this is performed on the master, it's safe to proceed to step 8 and make a snapshot on every node.
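
The "first find all spaces" part of the snippet above is left to the reader. One possible way to do it is sketched below. The helper names is_unsigned_value and bad_unsigned_fields are hypothetical, user spaces are assumed to have ids starting at 512, and the scan reads every tuple, so it can be slow on large spaces. A known limitation: a double with an integral value (such as 1.0) most likely looks the same as an integer from Lua, so this check may miss it.

```lua
-- Sketch only: report 'unsigned' fields that hold values which are not unsigned.

-- True when v, as seen from Lua, is acceptable in an 'unsigned' field.
local function is_unsigned_value(v)
    if type(v) == 'number' then
        return v >= 0 and v < math.huge and v == math.floor(v)
    end
    if type(v) == 'cdata' then
        -- Large 64-bit integers arrive as cdata; other cdata types fail the comparison.
        local ok, res = pcall(function() return v >= 0 end)
        return ok and res == true
    end
    return false
end

-- Positions of the 'unsigned' fields of `format` whose value in `row` is not unsigned.
local function bad_unsigned_fields(format, row)
    local bad = {}
    for no, field in ipairs(format) do
        local v = row[no]
        if field.type == 'unsigned' and v ~= nil and not is_unsigned_value(v) then
            table.insert(bad, no)
        end
    end
    return bad
end

-- Inside a configured Tarantool instance, scan every user space.
local box = rawget(_G, 'box')
if box ~= nil then
    for _, def in box.space._space:pairs(512, {iterator = 'GE'}) do
        local space = box.space[def[1]]
        local format = space and space:format() or {}
        local seen = {}
        for _, tuple in (space and #format > 0) and space:pairs() or pairs({}) do
            for _, no in ipairs(bad_unsigned_fields(format, tuple)) do
                if not seen[no] then
                    seen[no] = true
                    print(('space %s: field %d holds non-unsigned values')
                        :format(space.name, no))
                end
            end
        end
    end
end
```

Each reported pair of space and field number is a candidate for the format change shown above.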

Step 8: The snapshot size in 1.10 may look alarming: it is drastically smaller than the one created by 1.6 (for example, ~300 MB vs. 6 GB in some corner cases). There's nothing to worry about. Tarantool 1.6 didn't compress snapshots, while Tarantool 1.10 and above do.
