I have the following query in MSSQL
SELECT TOP 50 CustomerID FROM Ratings
WHERE CustomerID != 915
AND MovieID IN (SELECT DISTINCT MovieID FROM Ratings WHERE CustomerID = 915)
GROUP BY CustomerID
ORDER BY count(*) DESC
It is super fast. When I try to use it in a subquery like this.
SELECT * FROM Ratings
WHERE MovieID = 1 AND
CustomerID IN (SELECT TOP 50 CustomerID FROM Ratings
WHERE CustomerID != 915
AND MovieID IN (SELECT DISTINCT MovieID FROM Ratings WHERE CustomerID = 915)
GROUP BY CustomerID
ORDER BY count(*) DESC)
Any ideas on why this is so slow and how I can speed it up? My primary key is (MovieID-CustomerID) and I added a index on CustomerID
From stackoverflow
-
You'll need to create an extra
UNIQUEindex on(CustomerID, MovieID)(in this order) to improve this query.See the article in my blog for performance details:
Since your subqueries return
UNIQUEsets of values, the query may be rewritten as aJOIN:SELECT r2.* FROM ( SELECT TOP 50 CustomerID FROM ( SELECT MovieID FROM Ratings WHERE CustomerID = 915 ) q JOIN Ratings r ON r.MovieID = q.MovieID AND CustomerID <> 915 GROUP BY CustomerID ORDER BY COUNT(*) DESC ) ro JOIN Ratings r2 ON r2.MovieID = 1 AND r2.CustomerID = ro.CustomerIDTo select
TOP 50customers of those who have ratedMovie 1, use:SELECT r2.* FROM ( SELECT TOP 50 CustomerID FROM ( SELECT MovieID FROM Ratings WHERE CustomerID = 915 ) q JOIN Ratings r ON r.MovieID = q.MovieID AND CustomerID <> 915 AND EXISTS ( SELECT 1 FROM Ratings re WHERE re.MovieID = 1 AND re.CustomerID = r.CustomerID ) GROUP BY CustomerID ORDER BY COUNT(*) DESC ) ro JOIN Ratings r2 ON r2.MovieID = 1 AND r2.CustomerID = ro.CustomerIDJames Van Boxtel : Awesome! Only thing is it sometimes returns slightly less than 50, probably cause some of the top people haven't rated MovieID 1. But I should be able to figure out the rest.James Van Boxtel : Sweet I got my thing fully working. BTW the query analyzer suggested the same index as you and it helped a lot.
0 comments:
Post a Comment