A key determinant of the efficiency of compression in reducing
the size of data files is the nature of the data itself.
Recall that compression works by identifying repeated strings
of bytes in a block of data. Completely randomized data is the
worst case. Typical data often has repeated values, and so
compresses effectively. Character strings often compress well,
whether defined in CHAR
,
VARCHAR
, TEXT
or
BLOB
columns. On the other hand, tables
containing mostly binary data (integers or floating point
numbers) or data that is previously compressed (for example
JPEG
or
PNG
images) may not generally compress well, significantly or at
all.
Compression is chosen on a table by table basis with the
InnoDB storage engine, and a table and all of its indexes use the
same (compressed) page size. It might be that the primary key
(clustered) index, which contains the data for all columns of
a table, compresses more effectively than the secondary
indexes. For those cases where there are long rows, the use of
compression may result in long column values being stored
“off-page”, as discussed in
Section 5.3, “DYNAMIC
Row Format”. Those overflow
pages may compress well. Given these considerations, for many
applications, some tables compress more effectively than
others, and you may find that your workload performs best only
with a subset of tables compressed.
Experimenting is the only way to determine whether or not to
compress a particular table. InnoDB compresses data in 16K
chunks corresponding to the uncompressed page size, and in
addition to user data, the page format includes some internal
system data that is not compressed. Compression utilities
compress an entire stream of data, and so may find more
repeated strings across the entire input stream than InnoDB
would find in a table compressed in 16K chunks. But you can
get a sense of how compression efficiency by using a utility
that implements LZ77 compression (such as
gzip
or WinZip) on your data file.
Another way to test compression on a specific table is to copy
some data from your uncompressed table to a similar,
compressed table (having all the same indexes) and look at the
size of the resulting file. When you do so (if nothing else
using compression is running), you can examine the ratio of
successful compression operations to overall compression
operations. (In the INNODB_CMP
table,
compare COMPRESS_OPS
to
COMPRESS_OPS_OK
. See
INNODB_CMP
for more information.) If a high percentage of compression
operations complete successfully, the table might be a good
candidate for compression.
This is the User’s Guide for InnoDB storage engine 1.1 for MySQL 5.5, generated on 2010-04-13 (revision: 19994) .