Tuesday, May 26, 2015

DIFFERENCE in MS SQL

So, if I had two sets, A and B, and wanted to get all items that are in A but not B, as well as items in B but not A (i.e. the opposite of an intersection), I would have thought I could just do:

1
SELECT * FROM A
2
DIFFERENCE
3
SELECT * FROM B
Maybe my memory is failing me, but I thought there was a standard SQL difference operator, but after briefly looking around it seems there isn't.
It is possible though, using:
  1. EXCEPT
  2. FULL OUTER JOIN
I just tried it with table variables, set up as below (seems we get the same result with actual tables, but table variables are easier for demo purposes :B)
1
DECLARE @A TABLE (
2
    Number INT NOT NULL PRIMARY KEY
3
);
4
 
5
DECLARE @B TABLE (
6
    Number INT NOT NULL PRIMARY KEY
7
);
8
 
9
INSERT INTO @A (Number) VALUES (2), (4), (6), (8);
10
INSERT INTO @B (Number) VALUES (1), (2), (3), (5), (8);
EXCEPT way:
1
(
2
    (SELECT * FROM @A)
3
    EXCEPT
4
    (SELECT * FROM @B)
5
)
6
UNION
7
(
8
    (SELECT * FROM @B)
9
    EXCEPT
10
    (SELECT * FROM @A)
11
);
Quite verbose, but pretty easy to follow.
The FULL OUTER JOIN way is also pretty easy to follow:
1
SELECT
2
    CASE
3
        WHEN a.Number IS NULL THEN b.Number
4
        ELSE a.Number
5
    END
6
FROM
7
    @A a
8
FULL OUTER JOIN
9
    @B b
10
    ON a.Number = b.Number
11
WHERE
12
    (A.Number IS NULL AND B.Number IS NOT NULL)
13
    OR
14
    (A.Number IS NOT NULL AND B.Number IS NULL);
This is probably less easy to maintain, because you have to specifically reference the columns to match on, as well as choose which set to select values from, as is being done in the case statement. This could probably become a bit unwieldly for a table with lots of columns, and hopefully you don't mess up the NULL checks.
I did try this out on a query on a customers database though, and found that the full outer join method can be faster. Below are the execution plans for the two queries:
404 - Image Not Found
404 - Image Not Found
So, if I'm reading that right, it's having to go through each table more than once in the EXCEPT case, and only once in the FULL OUTER JOIN, which can make quite the... DIFFERENCE... (sorry couldn't help but make that pun :D).