Tips

NULL values

If there exists NULL values in some column, we should be careful when we specify conditions on this column. Always remember to deal with NULL besides other conditions.

+------+------+-----------+
| id   | name | referee_id|
+------+------+-----------+
|    1 | Will |      NULL |
|    2 | Jane |      NULL |
|    3 | Alex |         2 |
|    4 | Bill |      NULL |
|    5 | Zack |         1 |
|    6 | Mark |         2 |
+------+------+-----------+

# this code doesn't select the NULL values
select name
from customer
where referee_id != 2;

# this code does select the NULL values
select name
from customer
where referee_id is NULL or referee_id != 2;

JOIN vs Subquery

There are two general ways when we use subquery:

We only need one column of a table calculated using a subquery. In this case, this subquery can appear in any clause
We need more than one column from a table calculated using a subquery. In this case, we can JOIN the calculated table in FROM clause and refer to the columns in other clauses. (We may use aliases in this case for easy reading.)

JOIN 和 Subquery 基本上都可以完成某些操作，但是简易程度不同。一般搜索单一结果或有限个结果时用Subquery，而需要返回每行或每类对应的结果时用 JOIN (对应 correlated subquery).

Aggregate Function

当在SELECT中使用aggregate function时，除了aggregate function以外的其他所有变量都必须在GROUP BY中，否则会报错。唯一的例外是：如果对于某列，GROUP BY以后，每行对应的值唯一，那么这个值也可以放在SELECT中。例子 leet code 178 Rank Scores
COUNT 中如果需要唯一值，可以用DISINCT, 可以结合多个column做distinct, 比如count(distinct column1, column2)
When we want to count the number under some condition, use sum() function instead of count(). Inside sum() function, use case or if() to express the condition.

Nth largest/smallest

When we want the max or min value, we can use max() or min().
When we want the nth largest or smallest, sort the values first, then use limit 1 offset n-1.

IF() vs CASE

如果在不同条件下取不同值，那么可以用IF（）或者CASE
IF（）适用于只有有两种情况
CASE适用于两种或以上的情况

COALESCE() vs IFNULL()

这两个函数都用来处理NULL值
IFNULL() 适合于只有两种情况
COALESCE() 适合多种情况

Replace

当需要同时update和insert 表格的时候，就用replace.

REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEYor aUNIQUEindex, the old row is deleted before the new row is inserted.

MySQL uses the following algorithm forREPLACE(andLOAD DATA ... REPLACE):

Try to insert the new row into the table

While the insertion fails because a duplicate-key error occurs for a primary key or unique index:

Delete from the table the conflicting row that has the duplicate key value
Try again to insert the new row into the table

REPLACE [LOW_PRIORITY | DELAYED]
    [INTO] tbl_name
    [PARTITION (partition_name [, partition_name] ...)]
    [(col_name [, col_name] ...)]
    {VALUES | VALUE} (value_list) [, (value_list)] ...

REPLACE [LOW_PRIORITY | DELAYED]
    [INTO] tbl_name
    [PARTITION (partition_name [, partition_name] ...)]
    SET assignment_list

REPLACE [LOW_PRIORITY | DELAYED]
    [INTO] tbl_name
    [PARTITION (partition_name [, partition_name] ...)]
    [(col_name [, col_name] ...)]
    SELECT ...

Union vs OR

When there are several conditions to test, we can use OR, or we can use UNION. In theory, they should work at similar speed. But in practice, sometimes, UNION works much faster than OR.
If we want to merge data from 2 columns into 1 column, we can use UNION.

Pay special attention on corner cases:

If there exists NULL values in the table, usually we need to deal with NULL differently.
If the output is one row, but nothing is selected, then we usually should return NULL, which needs special care.

NextSQL

Last updated 5 years ago

Was this helpful?