This section describes issues pertaining to Unicode support that you may face when upgrading to MySQL 5.5 from an older MySQL release. It also provides guidelines for downgrading from MySQL 5.5 to an older release.
In most respects, upgrading to MySQL 5.5 should present few problems with regard to Unicode usage, although there are some potential areas of incompatibility. These are the primary areas of concern:
For the variable-length character data types
(VARCHAR
and the
TEXT
types), the maximum
length in characters is less for utf8mb4
columns than for utf8
columns.
For all character data types
(CHAR
,
VARCHAR
, and the
TEXT
types), the maximum
number of characters that can be indexed is less for
utf8mb4
columns than for
utf8
columns.
Consequently, if you want to upgrade tables from
utf8
to utf8mb4
to take
advantage of supplementary-character support, it may be
necessary to change some column or index definitions.
Tables can be converted from utf8
to
utf8mb4
by using ALTER
TABLE
. Suppose that a table was originally defined as
follows:
CREATE TABLE t1 ( col1 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL, col2 CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL ) CHARACTER SET utf8;
The following statement converts t1
to use
utf8mb4
:
ALTER TABLE t1 DEFAULT CHARACTER SET utf8mb4, MODIFY col1 CHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL, MODIFY col2 CHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
In terms of table content, conversion from
utf8
to utf8mb4
presents
no problems:
For a BMP character, utf8
and
utf8mb4
have identical storage
characteristics: same code values, same encoding, same
length.
For a supplementary character, utf8
cannot store the character at all, while
utf8mb4
requires four bytes to store it.
Since utf8
cannot store the character at
all, you do not have any supplementary characters in
utf8
columns and you need not worry about
converting characters or losing data when upgrading
utf8
data from older versions of MySQL.
In terms of table structure, the catch when converting from
utf8
to utf8mb4
is that
the maximum length of a column or index key is unchanged in
terms of bytes. Therefore, it is smaller in
terms of characters because the maximum
length of a character is four bytes instead of three. For the
CHAR
,
VARCHAR
, and
TEXT
data types, watch for these
things when converting your MySQL tables:
Check all definitions of utf8
columns and
make sure they will not exceed the maximum length for the
storage engine.
Check all indexes on utf8
columns and
make sure they will not exceed the maximum length for the
storage engine. Sometimes the maximum can change due to
storage engine enhancements.
If the preceding conditions apply, you must either reduce the
defined length of columns or indexes, or continue to use
utf8
rather than utf8mb4
.
Here are some examples where structural changes may be needed:
A TINYTEXT
column can hold up
to 255 bytes, so it can hold up to 85 three-byte or 63
four-byte characters. Suppose that you have a
TINYTEXT
column that uses
utf8
but must be able to contain more
than 63 characters. You cannot convert it to
utf8mb4
unless you also change the data
type to a longer type such as
TEXT
.
Similarly, a very long
VARCHAR
column may need to be
changed to one of the longer
TEXT
types if you want to
convert it from utf8
to
utf8mb4
.
InnoDB
has a maximum index length of 767
bytes, so for utf8
or
utf8mb4
columns, you can index a maximum
of 255 or 191 characters, respectively. If you currently
have utf8
columns with indexes longer
than 191 characters, you will need to index a smaller number
of characters. In an InnoDB
table, these
column and index definitions are legal:
col1 VARCHAR(500) CHARACTER SET utf8, INDEX (col1(255))
To use utf8mb4
instead, the index must be
smaller:
col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191))
The preceding types of changes are most likely to be required
only if you have very long columns or indexes. Otherwise, you
should be able to convert your tables from
utf8
to utf8mb4
without
problems. You can do this by using ALTER
TABLE
as described earlier in this section after
upgrading in place to 5.5.
The following items summarize other potential areas of incompatibility:
Performance of four-byte UTF-8 (utf8mb4
)
is slower than for three-byte UTF-8
(utf8
). If you do not want to incur this
penalty, continue to use utf8
.
SET NAMES 'utf8mb4'
causes use of the
four-byte character set for connection character sets. As
long as no four-byte characters are sent from the server,
there should be no problems. Otherwise, applications that
expect to receive a maximum of three bytes per character may
have problems. Conversely, applications that expect to send
four-byte characters must ensure that the server understands
them.
Applications cannot send utf16
or
utf32
character data to an older server
that does not understand them.
For replication, if the character sets that support
supplementary characters are going to be used on the master,
all slaves must understand them as well. If you attempt to
replicate from a MySQL 5.5 master to an older
slave, utf8
data will be seen as
utf8
by the slave and should replicate
correctly. But you cannot send utf8mb4
,
utf16
, or utf32
data.
Also, keep in mind the general principle that if a table has
different definitions on the master and slave, this can lead
to unexpected results. For example, the differences in
limitations on index key length makes it risky to use
utf8
on the master and
utf8mb4
on the slave.
If you have upgraded to MySQL 5.5, and then decide to downgrade back to an older release, these considerations apply:
ucs2
and utf8
data
should present no problems.
Any definitions that refer to the
utf8mb4
, utf16
, or
utf32
character sets will not be
recognized by the older server.
For object definitions that refer to the
utf8mb4
character set, you can dump them
with mysqldump in MySQL 5.5,
edit the dump file to change instances of
utf8mb4
to utf8
, and
reload the file in the older server, as long as there are no
four-byte characters in the data. The older server will see
utf8
in the dump file object definitions
and create new objects that use the (three-byte)
utf8
character set.
User Comments
Add your own comment.