“We are not using User Defined Table Types anymore! They are bad for performance.”
This is what one of my coworkers told me last week.
Whenever a person makes a statement like this, I first of all would like to know what made him/her say that. To me, a statement in the form of: “We do/don’t do X, because it is bad/good” makes me think about religion. Religion is all about rules, and in IT, there are no rules. In IT, the credo is, and should be: “It depends”.
However, an issue at work has made me decide to move forward with this one.
As most of you know, parameter sniffing can be one of the hardest problems to recognize. Most of the time, a parameter has a single value. I was not sure however how the optimizer would react to a User Defined Table Type when it was used to insert rows by using a stored procedure.
My hypothesis is that SQL Server will re-use the execution plan that was used the first time a stored procedure was called using a UDTT.
I tested it, and it became clear that the execution plan that was re-used was dependent on the first execution of the stored procedure. So: hypothesis confirmed.
Let me show you how I did this. First, I created a little Windows Forms app in C# that asks you how many rows you would like to insert:
Next I say I want it to insert 10 rows. This results in the following query plan:
As you can see, the query plan correctly reflects the right number of rows. In other words: SQL Server can work with a correct estimate, which benefits performance.
Let’s try it again, using a few more rows:
Now take a look at the query plan:
As you can see, the optimizer expected 10 rows because of the first query. Now we have 50 rows actual.
This does not have to be a bad thing if the number of rows you insert into your UDTT does not differ very much, but when it does, you are in for a lot of (performance) trouble. It means the SQL Server Query Optimizer cannot will not look at the number of rows you are handling, but will instead take the execution plan of the latest execution. So when you first insert was like.. 10 rows and now you are inserting 10k rows, SQL Server will use the same execution plan. You might be in trouble.
So what to do against this? Well.. You could tell SQL Server to treat each of the stored procedure executions as if it were the first one that executed. If you had an OLTP system with 10k transactions per second you would not want to have SQL Server to create a new execution plan each time, but for most any BI scenario, you would might. So, let’s add two words to this stored procedure:
So now we told SQL Server to evaluate each time again what would be the best plan to execute this query. Let’s see what happens.
Again, insert 10 rows:
Ok, so SQL Server estimated 10 correctly. No surprise here as this is the first execute. Based on what we saw previously, when inserting 50, we would expect SQL Server to estimate 10 again. Let’s take a look:
Wow! SQL Server used the RECOMPILE keyword, correctly estimating the number of rows that was inserted. This means SQL Server can optimize the execution of your workflow, ensuring your command takes as little time as possible to execute.
I have heard people say: “We are not using UDTT’s anymore because they are bad for performance.” I can understand these people, but I think you need to take time to investigate why things are performing badly. The accompanying Windows testing app I wrote took me less than 1 hour to write and test, and told me a lot about why sql server “messes up” sometimes.
Instead of deciding for or against a certain solution, please take a little time to do some tests. It will make your life a lot easier. You can use the project on my GitHub page to get you started.
If you have any comments or questions, please message me and I will get back to you. Promise.
Recommended reading on this topic:
Table variables do not have distribution statistics, they will not trigger recompiles. Therefore, in many cases, the optimizer will build a query plan on the assumption that the table variable has no rows. For this reason, you should be cautious about using a table variable if you expect a larger number of rows (greater than 100).(https://docs.microsoft.com/en-us/sql/t-sql/data-types/table-transact-sql?view=sql-server-2017)