This first post in a series of posts on “Things That Make You Go ‘D’oh'” – those little “gotchas” in SQL Server’s inner workings that lead to epic facepalms when developers and DBAs finally discover the sources of the problems they cause – explains the issue that inspired me to write this series. I had written a fairly simple bit of T-SQL code that queried a table of real estate transactions and calculated the average price per month in given geographic areas, a basic SELECT AVG() . . . FROM . . . WHERE . . . GROUP BY construct. It ran without issue for many geographic areas, but for others, it returned a mysterious error message:
I couldn’t fathom how this code could generate a conversion error – there were no other numeric data types involved, so what was being converted to int (and failing the conversion)? To demonstrate the problem, I’ll first create a temp table with 100,000 rows of sample data (I use a table-valued function called dbo.randomNumberSeries, originally authored by Jeff Moden, to create the data – see this forum post for the code to create that function and the view it relies on):
(See the link at the end of this post to download the example code.)
Now, the query that causes the error (note that the error may not occur consistently – if you don’t get the error on the first try, just recreate and repopulate the temp table until you do (trust me, you will)):
As you can see, there’s no obvious source of an arithmetic overflow error in this code – all we’re doing is calculating an average that we know will be between 100,000 and 1,000,000, well within the range of the int datatype.
To get to the source of the error, we’ll have to look deep into the execution plan. Since the query won’t run due to the error, though, we’ll start with an estimated execution plan:
Looking a little more closely, we see that the AVG() function actually consists of two steps, a Stream Aggregate that produces a COUNT_BIG() value (Expr1011) and a SUM() value (Expr1012) and a Compute Scalar operator that divides Expr1012 by Expr1011:
This all looks innocuous, but the problem is lurking in the SUM() function. According to Books Online, the SUM() function returns a value of int datatype when the input value is of int datatype. In our sample data, the sum of the sale prices for any given month will quite likely be greater than 2,147,483,647, the upper limit of the int datatype. The arithmetic overflow error occurs when the SUM() function attempts to return that out-of-range value.
The solution to this issue is also the proof of its source. When the input of the SUM() function is a bigint, the return value is also a bigint. When we convert the sale price value to a bigint before applying the SUM() function, the return value will be within the range of the bigint data type and the arithmetic overflow error disappears: