Storage engines collect statistics about tables for use by the optimizer. Table statistics are based on value groups, where a value group is a set of rows with the same key prefix value. For optimizer purposes, an important statistic is the average value group size.
MySQL uses the average value group size in the following ways:
To estimate how may rows must be read for each
ref access
To estimate how many row a partial join will produce; that is, the number of rows that an operation of this form will produce:
(...) JOINtbl_nameONtbl_name.key=expr
As the average value group size for an index increases, the index is less useful for those two purposes because the average number of rows per lookup increases: For the index to be good for optimization purposes, it is best that each index value target a small number of rows in the table. When a given index value yields a large number of rows, the index is less useful and MySQL is less likely to use it.
The average value group size is related to table cardinality,
which is the number of value groups. The
SHOW INDEX statement displays a
cardinality value based on
N/S, where
N is the number of rows in the table
and S is the average value group
size. That ratio yields an approximate number of value groups in
the table.
For a join based on the <=> comparison
operator, NULL is not treated differently
from any other value: NULL <=> NULL,
just as for any other
N <=>
NN.
However, for a join based on the = operator,
NULL is different from
non-NULL values:
is not true when
expr1 =
expr2expr1 or
expr2 (or both) are
NULL. This affects
ref accesses for comparisons
of the form : MySQL will not access
the table if the current value of
tbl_name.key =
exprexpr is NULL,
because the comparison cannot be true.
For = comparisons, it does not matter how
many NULL values are in the table. For
optimization purposes, the relevant value is the average size of
the non-NULL value groups. However, MySQL
does not currently allow that average size to be collected or
used.
For MyISAM tables, you have some control over
collection of table statistics by means of the
myisam_stats_method system
variable. This variable has three possible values, which differ
as follows:
When myisam_stats_method is
nulls_equal, all NULL
values are treated as identical (that is, they all form a
single value group).
If the NULL value group size is much
higher than the average non-NULL value
group size, this method skews the average value group size
upward. This makes index appear to the optimizer to be less
useful than it really is for joins that look for
non-NULL values. Consequently, the
nulls_equal method may cause the
optimizer not to use the index for
ref accesses when it
should.
When myisam_stats_method is
nulls_unequal, NULL
values are not considered the same. Instead, each
NULL value forms a separate value group
of size 1.
If you have many NULL values, this method
skews the average value group size downward. If the average
non-NULL value group size is large,
counting NULL values each as a group of
size 1 causes the optimizer to overestimate the value of the
index for joins that look for non-NULL
values. Consequently, the nulls_unequal
method may cause the optimizer to use this index for
ref lookups when other
methods may be better.
When myisam_stats_method is
nulls_ignored, NULL
values are ignored.
If you tend to use many joins that use
<=> rather than =,
NULL values are not special in comparisons
and one NULL is equal to another. In this
case, nulls_equal is the appropriate
statistics method.
The myisam_stats_method system
variable has global and session values. Setting the global value
affects MyISAM statistics collection for all
MyISAM tables. Setting the session value
affects statistics collection only for the current client
connection. This means that you can force a table's statistics
to be regenerated with a given method without affecting other
clients by setting the session value of
myisam_stats_method.
To regenerate table statistics, you can use any of the following methods:
Set myisam_stats_method,
and then issue a CHECK TABLE
statement
Change the table to cause its statistics to go out of date
(for example, insert a row and then delete it), and then set
myisam_stats_method and
issue an ANALYZE TABLE
statement
Some caveats regarding the use of
myisam_stats_method:
You can force table statistics to be collected explicitly,
as just described. However, MySQL may also collect
statistics automatically. For example, if during the course
of executing statements for a table, some of those
statements modify the table, MySQL may collect statistics.
(This may occur for bulk inserts or deletes, or some
ALTER TABLE statements, for
example.) If this happens, the statistics are collected
using whatever value
myisam_stats_method has at
the time. Thus, if you collect statistics using one method,
but myisam_stats_method is
set to the other method when a table's statistics are
collected automatically later, the other method will be
used.
There is no way to tell which method was used to generate
statistics for a given MyISAM table.
myisam_stats_method applies
only to MyISAM tables. Other storage
engines have only one method for collecting table
statistics. Usually it is closer to the
nulls_equal method.

User Comments
Avoid null values is the best, but, if you have a table with many rows with null values in the index column, regard this out the index. Null is most expensive for the database server in many times.
Add your own comment.