Certain optimizations are applicable to comparisons that use the
IN operator to test subquery results (or that
use =ANY, which is equivalent). This section
discusses these optimizations, particularly with regard to the
challenges that NULL values present.
Suggestions on what you can do to help the optimizer are given
at the end of the discussion.
Consider the following subquery comparison:
outer_exprIN (SELECTinner_exprFROM ... WHEREsubquery_where)
MySQL evaluates queries “from outside to inside.”
That is, it first obtains the value of the outer expression
outer_expr, and then runs the
subquery and captures the rows that it produces.
A very useful optimization is to “inform” the
subquery that the only rows of interest are those where the
inner expression inner_expr is equal
to outer_expr. This is done by
pushing down an appropriate equality into the subquery's
WHERE clause. That is, the comparison is
converted to this:
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereANDouter_expr=inner_expr)
After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery.
More generally, a comparison of N
values to a subquery that returns
N-value rows is subject to the same
conversion. If oe_i and
ie_i represent corresponding outer
and inner expression values, this subquery comparison:
(oe_1, ...,oe_N) IN (SELECTie_1, ...,ie_NFROM ... WHEREsubquery_where)
Becomes:
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereANDoe_1=ie_1AND ... ANDoe_N=ie_N)
The following discussion assumes a single pair of outer and inner expression values for simplicity.
The conversion just described has its limitations. It is valid
only if we ignore possible NULL values. That
is, the “pushdown” strategy works as long as both
of these two conditions are true:
When either or both of those conditions do not hold, optimization is more complex.
Suppose that outer_expr is known to
be a non-NULL value but the subquery does not
produce a row such that outer_expr =
inner_expr. Then
evaluates as follows:
outer_expr IN (SELECT
...)
In this situation, the approach of looking for rows with
is no longer
valid. It is necessary to look for such rows, but if none are
found, also look for rows where
outer_expr =
inner_exprinner_expr is
NULL. Roughly speaking, the subquery can be
converted to:
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereAND (outer_expr=inner_exprORinner_exprIS NULL))
The need to evaluate the extra IS
NULL condition is why MySQL has the
ref_or_null access method:
mysql>EXPLAIN->SELECT->outer_exprIN (SELECT t2.maybe_null_keyFROM t2, t3 WHERE ...)-> FROM t1; *************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ... *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: ref_or_null possible_keys: maybe_null_key key: maybe_null_key key_len: 5 ref: func rows: 2 Extra: Using where; Using index ...
The unique_subquery and
index_subquery
subquery-specific access methods also have or-null variants.
However, they are not visible in
EXPLAIN output, so you must use
EXPLAIN
EXTENDED followed by SHOW
WARNINGS (note the checking NULL in
the warning message):
mysql>EXPLAIN EXTENDED->SELECT*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ... *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: index_subquery possible_keys: maybe_null_key key: maybe_null_key key_len: 5 ref: func rows: 2 Extra: Using index mysql>outer_exprIN (SELECT maybe_null_key FROM t2) FROM t1\GSHOW WARNINGS\G*************************** 1. row *************************** Level: Note Code: 1003 Message: select (`test`.`t1`.`outer_expr`, (((`test`.`t1`.`outer_expr`) in t2 on maybe_null_key checking NULL))) AS `outer_expr IN (SELECT maybe_null_key FROM t2)` from `test`.`t1`
The additional OR ... IS NULL condition makes
query execution slightly more complicated (and some
optimizations within the subquery become inapplicable), but
generally this is tolerable.
The situation is much worse when
outer_expr can be
NULL. According to the SQL interpretation of
NULL as “unknown value,”
NULL IN (SELECT should evaluate to:
inner_expr
...)
For proper evaluation, it is necessary to be able to check
whether the SELECT has produced
any rows at all, so
cannot be pushed
down into the subquery. This is a problem, because many real
world subqueries become very slow unless the equality can be
pushed down.
outer_expr =
inner_expr
Essentially, there must be different ways to execute the
subquery depending on the value of
outer_expr. In MySQL 5.1
before 5.1.16, the optimizer chose speed over distinguishing a
NULL from FALSE result, so
for some queries, you might get a FALSE
result rather than NULL.
As of MySQL 5.1.16, the optimizer chooses SQL compliance over
speed, so it accounts for the possibility that
outer_expr might be
NULL.
If outer_expr is
NULL, to evaluate the following expression,
it is necessary to run the SELECT
to determine whether it produces any rows:
NULL IN (SELECTinner_exprFROM ... WHEREsubquery_where)
It is necessary to run the original
SELECT here, without any
pushed-down equalities of the kind mentioned earlier.
On the other hand, when outer_expr is
not NULL, it is absolutely essential that
this comparison:
outer_exprIN (SELECTinner_exprFROM ... WHEREsubquery_where)
be converted to this expression that uses a pushed-down condition:
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereANDouter_expr=inner_expr)
Without this conversion, subqueries will be slow. To solve the dilemma of whether to push down or not push down conditions into the subquery, the conditions are wrapped in “trigger” functions. Thus, an expression of the following form:
outer_exprIN (SELECTinner_exprFROM ... WHEREsubquery_where)
is converted into:
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereAND trigcond(outer_expr=inner_expr))
More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:
(oe_1, ...,oe_N) IN (SELECTie_1, ...,ie_NFROM ... WHEREsubquery_where)
and converts it to this expression:
EXISTS (SELECT 1 FROM ... WHEREsubquery_whereAND trigcond(oe_1=ie_1) AND ... AND trigcond(oe_N=ie_N) )
Each trigcond(
is a special function that evaluates to the following values:
X)
X when the “linked”
outer expression oe_i is not
NULL
TRUE when the “linked” outer
expression oe_i is
NULL
Note that trigger functions are not
triggers of the kind that you create with
CREATE TRIGGER.
Equalities that are wrapped into trigcond()
functions are not first class predicates for the query
optimizer. Most optimizations cannot deal with predicates that
may be turned on and off at query execution time, so they assume
any trigcond( to
be an unknown function and ignore it. At the moment, triggered
equalities can be used by those optimizations:
X)
Reference optimizations:
trigcond( can be
used to construct X=Y
[OR Y IS NULL])ref,
eq_ref, or
ref_or_null table
accesses.
Index lookup-based subquery execution engines:
trigcond(
can be used to construct
X=Y)unique_subquery or
index_subquery accesses.
Table-condition generator: If the subquery is a join of several tables, the triggered condition will be checked as soon as possible.
When the optimizer uses a triggered condition to create some
kind of index lookup-based access (as for the first two items of
the preceding list), it must have a fallback strategy for the
case when the condition is turned off. This fallback strategy is
always the same: Do a full table scan. In
EXPLAIN output, the fallback
shows up as Full scan on NULL key in the
Extra column:
mysql>EXPLAIN SELECT t1.col1,->t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ... *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: index_subquery possible_keys: key1 key: key1 key_len: 5 ref: func rows: 2 Extra: Using where; Full scan on NULL key
If you run EXPLAIN
EXTENDED followed by SHOW
WARNINGS, you can see the triggered condition:
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select `test`.`t1`.`col1` AS `col1`,
<in_optimizer>(`test`.`t1`.`col1`,
<exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2
on key1 checking NULL
where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having
trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS
`t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)`
from `test`.`t1`
The use of triggered conditions has some performance
implications. A NULL IN (SELECT ...)
expression now may cause a full table scan (which is slow) when
it previously did not. This is the price paid for correct
results (the goal of the trigger-condition strategy was to
improve compliance and not speed).
For multiple-table subqueries, execution of NULL IN
(SELECT ...) will be particularly slow because the
join optimizer doesn't optimize for the case where the outer
expression is NULL. It assumes that subquery
evaluations with NULL on the left side are
very rare, even if there are statistics that indicate otherwise.
On the other hand, if the outer expression might be
NULL but never actually is, there is no
performance penalty.
To help the query optimizer better execute your queries, use these tips:
A column must be declared as NOT NULL if
it really is. (This also helps other aspects of the
optimizer.)
If you don't need to distinguish a NULL
from FALSE subquery result, you can
easily avoid the slow execution path. Replace a comparison
that looks like this:
outer_exprIN (SELECTinner_exprFROM ...)
with this expression:
(outer_exprIS NOT NULL) AND (outer_exprIN (SELECTinner_exprFROM ...))
Then NULL IN (SELECT ...) will never be
evaluated because MySQL stops evaluating
AND parts as soon as the
expression result is clear.

User Comments
Yes, "IN" sub-query is really expensive. You can also use a dummy table like
select outer.f1,outer.f2 from outer_table outer, (select inner.f1 as inf1 from inner_table inner WHERE inner.f2="<<expression>>") as dummy where outer.f1= dummy.inf1;
It seems easy and works for me.
Everybody should look closely at Vimal's suggestion. It worked amazingly well (fast) for me.
Add your own comment.