SQL question (Basic)

Shambler · Jan 22, 2013

The following Where statement is basically listed twice with one variable changing. Specifically, DecisionStatusName is first listed as Approved and last listed as Rejected.

Can this Where statement be written without having to list almost everything twice? If so, does it matter?

WHERE
dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and dbo.decisionstatus.decisionstatusname='Approved'
or dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and dbo.decisionstatus.decisionstatusname='Rejected'

XOR != OR · Jan 22, 2013

WHERE dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and dbo.decisionstatus.decisionstatusname IN ( 'Approved', 'Rejected' )

I *think* that will work. I make no claims to performance impact of the modification, however.

Shambler · Jan 22, 2013

That worked! Thank you sir.

I assume that this should be done just to keep things clean and small. A best practice.
?

ShoeLace · Jan 22, 2013

The last time I checked IN was faster than OR but I'm not sure if that's the case every time.

Ex, IN version (supplied by XOR):

Code:

WHERE dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and dbo.decisionstatus.decisionstatusname IN ( 'Approved', 'Rejected' )

Ex, OR version:

Code:

WHERE dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and (dbo.decisionstatus.decisionstatusname = 'Approved' OR dbo.decisionstatus.decisionstatusname = 'Rejected')

Shambler · Jan 22, 2013

Nice, both work great.
It looks so much better using either of those.

Thanks peoples.

XOR != OR · Jan 22, 2013

Shambler said:
That worked! Thank you sir.

I assume that this should be done just to keep things clean and small. A best practice.
?

Best practice? *shrug* I do it to keep things readable. I am in the unenviable position of having to support my own code years after I last touch it, so I try to keep things as easy to read ( for myself ) as possible.

But again, my code doesn't need bleeding edge performance, so I haven't had to worry too much about that aspect of it.

mikeblas · Jan 23, 2013

Shambler said:
WHERE
dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and dbo.decisionstatus.decisionstatusname='Approved'
or dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and dbo.decisionstatus.decisionstatusname='Rejected'

This statement will never return any rows, since it contains a contradiction.

XOR != OR · Jan 23, 2013

mikeblas said:
This statement will never return any rows, since it contains a contradiction.

I'm missing it; where's the contradiction?

mikeblas · Jan 23, 2013

Code:

...
dbo.decisionstatus.decisionstatusname='Approved'
..
and ...
dbo.decisionstatus.decisionstatusname='Rejected'

DecisionStatusName can never be both 'Approved' and 'Rejected'. Depending on the implementation, OR may or may not bind the way the author (probably) intended it to. The query should be rewritten with parenthesis to indicate the intent.

Shambler · Jan 23, 2013

All 3 ways worked: My original ghetto statement, XOR and Shoelace's.

Going to stick with XOR's for now.

renixinq · Jan 23, 2013

Mike is correct that it may or may not work because the statement is not contained within parenthesis. Depending on multiple factors (statistics, indexing, etc.) the execution plan may interpret your statement differently than intended. I have no doubt that it works for you and while it's not technically wrong it doesn't lend itself to predictable results. Since you are not directly controlling how the data is being returned (remember SQL is declarative and the engine will handle the actual execution and return of data) you want to create a statement that will (hopefully) only function how you intend.

Here would be a better way to format your original statement

WHERE
(dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and dbo.decisionstatus.decisionstatusname='Approved')
or (dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and dbo.decisionstatus.decisionstatusname='Rejected')

You now have two clear criteria for the engine to work from now. Either one of shoelace's solutions would be better, in my opinion, when reading the SQL to troubleshoot. But, for something this simple it really doesn't matter and you should go with what you feel is best.

I will say my preference is to have all join criteria as explicit. The dbo.tasknote.WOID = maxresults.WOID clause looks like it should be a join criteria in your join instead of a filter in your where clause. I would suggest you move that statement to your join so your where clause just reads:

where (dbo.tasks.clsdby IS NULL) and (dbo.decisionstatus.decisionstatusname in ('Approved','Rejected'))

XOR != OR · Jan 23, 2013

renixinq said:
Mike is correct that it may or may not work because the statement is not contained within parenthesis. Depending on multiple factors (statistics, indexing, etc.) the execution plan may interpret your statement differently than intended. I have no doubt that it works for you and while it's not technically wrong it doesn't lend itself to predictable results. Since you are not directly controlling how the data is being returned (remember SQL is declarative and the engine will handle the actual execution and return of data) you want to create a statement that will (hopefully) only function how you intend.

Here would be a better way to format your original statement

WHERE
(dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and dbo.decisionstatus.decisionstatusname='Approved')
or (dbo.tasknote.WOID = maxresults.WOID and dbo.tasks.clsdby IS NULL and dbo.decisionstatus.decisionstatusname='Rejected')

You now have two clear criteria for the engine to work from now. Either one of shoelace's solutions would be better, in my opinion, when reading the SQL to troubleshoot. But, for something this simple it really doesn't matter and you should go with what you feel is best.

I will say my preference is to have all join criteria as explicit. The dbo.tasknote.WOID = maxresults.WOID clause looks like it should be a join criteria in your join instead of a filter in your where clause. I would suggest you move that statement to your join so your where clause just reads:

where (dbo.tasks.clsdby IS NULL) and (dbo.decisionstatus.decisionstatusname in ('Approved','Rejected'))

Ooooo, I get the distinction now.

Very nice. Thank you.

jimmyb · Jan 23, 2013

I don't know SQL - why doesn't the operator precedence avoid a contradiction?

renixinq · Jan 24, 2013

I doubt I'll explain this well but I'll give it a shot.

Remember that with SQL you are not determining how the data is accessed. This is a declarative language and you are basically writing a free form sentence that the engine interrupts to satisfy your request. When writing out filters it is "best" to enclose each filter in parenthesis so the engine will interrupt the filters separately and in the manner you intend. So depending on how the engine decides to access the data it could evaluate the and statements first and then evaluate the or statement last. So looking at his statement it could read something like this:

(dbo.tasknote.WOID = maxresults.WOID and dbo.decisionstatus.decisionstatusname='Rejected' and dbo.decisionstatus.decisionstatusname='Approved')
and (dbo.tasks.clsdby IS NULL and dbo.tasks.clsdby IS NULL)
or dbo.tasknote.WOID = maxresults.WOID

The reason being that the order in which you declare filters doesn't matter. A good example is you can place the dbo.tasks.clsdby IS NULL filters in the explicit join clause of the dbo.tasks table and it will be evaluated the same by the engine. One of the things that has been drilled into me as a BI developer is you model for predictable and consistent SQL. You don't want to be guessing HOW the SQL is being evaluated because my god that sucks when an issue comes up. You better know how to read an execution plan and understand WHY the engine is behaving in the way it is in those cases. That's why when you ask SSMS to produce a SQL script for you it encloses every filter in parenthesis.

Another mind blower is you aren't writing SQL against tables, your writing them against indexes. Tables are basically a logical definition while the indexes determine how the the data is stored and how the engine will access and return the data. So if you are writing SQL statement ALWAYS consider your indexes.

Hope that gives you an idea of why there COULD be a contradiction even though it is functioning as he hoped.

Shambler · Jan 24, 2013

Goodness I have a long road ahead of me!

jimmyb · Jan 24, 2013

renixinq said:
This is a declarative language and you are basically writing a free form sentence that the engine interrupts to satisfy your request.

Does the SQL specification not mandate an order of operations?

mikeblas · Jan 24, 2013

jimmyb said:
Does the SQL specification not mandate an order of operations?

I don't have time to research it, but I don't believe it does. It's possible that I'm right (really, it is!) and it's implementation-dependent. I feel like I remember that it didn't, then the standard said it should, then implementations finally caught up. I just don't remember at the moment. Vendors certainly do specify operator precedence.

But even if we stipulate that the standard does enforce order of operations, relying on that order without using parenthesis results in code that works but isn't very readable or maintainable. If you code while thinking about readability and maintainability, then you show your other team members that you give a damn about them and their time; and you don't have to spend time memorizing minutia from the standards or quirks of the implementation you happen to be using.

jimmyb · Jan 24, 2013

Yes I agree with that.

I think the relevant section of the spec is probably in part 2 of ISO/IEC 9075, which unfortunately I don't have access to.

renixinq · Jan 24, 2013

This goes back to it being a declarative language. So you need to separate your thought process from a procedural language. Filters in the where clause are not a calculation so there is no need for an order of operations.

If you are doing a calculation within your SQL statement there is the traditional order of operations (division multiplication prior to addition subtraction, etc.). The reason there is no order of operations for the application of filters is because you will screw up every single SQL statement you would ever write at that point. It would hamstring the SQL engine from being able to determine the best execution plan based on the structure of how the data is stored, distributed and accessed which goes back to you do not write SQL against tables but rather indexes since the data is either stored as a cluster index (sequential based on the index definition) or as a heap (see unordered). Again, you are not defining how the data is accessed you are defining the requested dataset and then the SQL engine will translate that request into an execution plan for gathering the data from disk or cache, creating temporary data structures in memory and then combining the data in the requested format to return back to the user. Think about the sequence that occurs for data to be returned and it should help clear up why you would not want a defined order of operations for applying filters. Same thing for joins.

SQL question (Basic)

Shambler

Supreme [H]ardness

XOR != OR

[H]F Junkie

Shambler

Supreme [H]ardness

ShoeLace

Gawd

Shambler

Supreme [H]ardness

XOR != OR

[H]F Junkie

mikeblas

[H]ard|DCer of the Month - May 2006

XOR != OR

[H]F Junkie

mikeblas

[H]ard|DCer of the Month - May 2006

Shambler

Supreme [H]ardness

renixinq

Limp Gawd

XOR != OR

[H]F Junkie

jimmyb

2[H]4U

renixinq

Limp Gawd

Shambler

Supreme [H]ardness

jimmyb

2[H]4U

mikeblas

[H]ard|DCer of the Month - May 2006

jimmyb

2[H]4U

renixinq

Limp Gawd