![]() This is the value of a code analysis tool like Prompt. Conversely, I’ve seen overly complex queries defended on the basis that doing it that way avoids some archaic performance issue that the programmer once encountered (like on SQL Server 7.0 or earlier). If it’s not the best choice, they find that out during performance testing, and tune it. Realistically, most programmers would stop when they found the answer that made sense to them, in the moment. Here we’ve set out our candidate solutions up front. I will only choose an alternative, less readable solution if it pays back significantly in terms of performance and scalability. The EXISTS operator is the most natural way to check for the existence of rows based on some criteria and, in our example, it answers the question in the most concise way and reads most like the requirements statement. ![]() Everything after that becomes performance tuning to deal with special cases. However, it holds true in enough cases that it is the best place to start. Sometimes one must contort what could have been a simple query to accommodate a wonky database design. Most of the time, this solution will perform the best too. ![]() Whatever the problem, write the query in the simplest, set-based way possible, so that someone else can read it like a normal, declarative sentence and understand it. My guiding principle is that SQL was always intended to be as close to real, written language as possible. So how do we choose which is the best, or most appropriate, solution? This boils down to readability and then performance, in that order. Which is better, EXISTS or COUNT (or something else)?Įach of these queries gave the same set of rows as output they all give the correct answer. I’d like to say that this was just as a contrived, “ what’s the wackiest idea I can think of” style of solution, but I have seen it in production code more than once (and it’s not even close to being the weirdest solution I’ve seen). However, it doesn’t answer the question in a straightforward way and using DISTINCT is often a code smell, indicating that more rows than necessary were processed, before removing duplicates at the end.Īnother way I solved this problem was to create a temp table of all customers, then delete rows that didn’t have a qualifying order. I’ve seen a lot of people tackle the problem like this, believing that it is the preferred way to do it. They all give you the right results, but which one is “best”, or most appropriate, solution? COUNT One developer suggests a solution that uses EXISTS with a subquery, another a solution that uses COUNT(*) with a subquery, yet another proposes one that uses just JOINs plus a DISTINCT clause in the SELECT. Our requirement is simply to devise a query for the WideWorldImporters sample database that returns a list of the names and email addresses of these customers. For our example, let’s say a client wants to run a special email promotion for anyone who has ever purchased an item from their shop that cost more than $500. The many ways to find correlating rowsĪs with most programming problems, there are several queries that will return the right answer, and finding correlated rows is no different. However, the superior readability bit, on its own, is worth fighting for. I’d rewrite that as “…for superior readability, and performance that will always be comparable and may be better in some complex cases“. Some programmers use COUNT(*) to check to see if there are any rows that match some criteria…it is recommended to use EXISTS() or NOT EXISTS() instead, for superior performance and readability. One of SQL Prompt’s built-in “performance” code analysis rules, PE013, states (paraphrased): This is a guest post from Louis Davidson.
0 Comments
Leave a Reply. |