Should I use OUTER JOIN instead of NOT EXISTS?

NOT EXISTS in a query can readily be replaced by an OUTER JOIN.
Here is how

CREATE TABLE Emp(EmpID INT, EmpName VARCHAR(20))
INSERT INTO Emp(EmpID, EmpName)
SELECT 1, 'Amit' UNION
SELECT 2, 'Sachin' UNION
SELECT 3, 'Sonam'

CREATE TABLE OnLeave(OnLeaveID INT, EmpID INT)
INSERT INTO OnLeave(OnLeaveID, EmpID)
SELECT 1, 2 UNION
SELECT 2, 3

--Using NOT EXISTS
SELECT Emp.EmpName FROM Emp
WHERE NOT EXISTS
	(SELECT 1 FROM OnLeave 
	WHERE OnLeave.EmpID = Emp.EmpID)

--Using LEFT OUTER JOIN
SELECT Emp.EmpName FROM Emp
LEFT OUTER JOIN OnLeave ON OnLeave.EmpID = Emp.EmpID
WHERE OnLeave.OnLeaveID IS NULL

This is because of the fact that in an LEFT OUTER JOIN every row in the left table is represented in the result set, whether or not there are any matching rows from the right table. If for a particular row of the left table there is no matching row from the right table, then the columns in that row which would have come from the right table will be NULL. The test for NULL should be made on a column that you are sure it wont contain a NULL value; for example primary key of the right table. Also it should be noted that a record will be repeated if there are more than one match in the right table. In that case we will have to use DISTINCT in the select line.

If proper indexes are in place, the OUTER JOIN method and WHERE EXISTS method are basically identical when it comes to performance. However there have been many case when I have replaced a NOT EXISTS with an OUTER JOIN and got reduced execution time. People keep on telling that both have almost identical query plan. And you should opt for NOT EXISTS as it is more readable. But I feel LEFT OUTER JOINT method is also equally readable. For example

UPDATE TblOne SET Errors = 1
FROM TblOne
WHERE NOT EXISTS
	(SELECT 1 FROM TblTwo
	WHERE TblTwo.TblOneId = TblOne.TblOneId
	AND TblTwo.Col1 = TblOne.Col1
	AND TblTwo.Chk = 'Y' )

UPDATE TblOne SET Errors = 1
FROM TblOne
LEFT OUTER JOIN TblTwo
	ON TblTwo.TblOneId = TblOne.TblOneId
	AND TblTwo.Col1 = TblOne.Col1
	AND TblTwo.Chk = 'Y'
WHERE TblTwo.TblTwoId IS NULL

What I do is, I keep both options handy. For queries with a very low execution time I go for NOT EXISTS just for the fact that it is easier for others to later understand what I am doing in the query. For heavier queries which have higher execution time I try both options and keep the one which gives better execution time. In general I have noticed that replacing NOT EXISTS with OUTER JOIN gives performance improvement when used in DELETE and UPDATE statements.

About these ads

About Amit Singh

Software engineer; crazy about music, astronomy and radio controlled aircrafts. Big science/technology buff!
This entry was posted in Programming, SQL and tagged , , , , , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s