Analyze an Estimated Execution Plan at RunTime

There are many reasons why one might want to analyze an Estimated Execution Plan for a query. Of course you can analyze it in the graphical Execution Plan display tab in SSMS, but I wanted to analyze the more-complete XML plan.
And, (here’s the kicker) I wanted to do it at RUNTIME.
This post outlines my journey in finding a way to do this. The eventual solution was implemented as a CLR procedure with a wrapper SQL procedure to do the analysis.

The Problem

I was working on an application whose frontend allowed an advanced user to write queries against underlying views.  Even thought the user was restricted to just SELECT access against certain views to prevent him/her from wreaking havoc, there were still actions we wanted to take based on how they were writing the query.  For example, we might want to prevent the query from executing if (a) the query were written in a way that would return too much data, or (b) if the query would have run slowly, or (c) if the user were SQL-experienced and was trying to execute something like sp_help, which in some cases we weren’t able to deny execute to for the user (for example: in SQLAzure, you’re currently not able to deny execute to this).

The (Possible) Solution

I thought that getting the Execution Plan in XML and analyzing it would be a good way to get the information we needed.  Indeed, we could query the XML for all we wanted.

For pedagogical purposes, let’s just use a select from sys.objects for an example of a statement to analyze:

SELECT * FROM sys.objects

If we just wanted to use the IDE, we could get the estimated plan by just clicking the “Display Estimated Execution Plan” button in the toolbar.

From there, you can get view the Plan.

Now Graphical Plans are nice to look through, but they mask detailed data for the plan. The XML plan is far more complete. You can right click and select “Show Execution Plan XML”. And that can give you an XML file you can view and save:

All this is well and good, but what if you wanted to do it at RUNTIME, like I want to?  This brings up a series of issues.

Options and Respective Issues

Obviously, the IDE isn’t going to work for runtime.  So the question becomes: how can we store this XML in a variable at runtime on which we can do an XQuery?  Can we do this programmatically?

Option:  Tracing

You could use a SQL trace to capture the showplan XML, as outlined in this post:

http://blogs.msdn.com/b/sqlqueryprocessing/archive/2007/06/01/store-statistics-xml-in-database-tables-using-sql-traces-for-further-analysis.aspx

The issue: the query would still have to be executed.  I want to get the estimated execution plan, and then decide whether or not to run the query.

Option:  Query DMVs

You could query dynamic management views and dynamic management functions, as in the snippet below.  (The example XQuery to analyze the XML from the previous link)

DECLARE @PlanXML xml;
DECLARE @SQL nvarchar(max)
SET @SQL = 'SELECT * FROM sys.objects';

SELECT @PlanXML = [QP].[query_plan]
FROM sys.dm_exec_cached_plans CP
CROSS APPLY sys.dm_exec_sql_text(CP.plan_handle) ST
CROSS APPLY sys.dm_exec_query_plan(CP.plan_handle) QP
WHERE [ST].[text] = @SQL;

SELECT @PlanXML AS PlanXML;

WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS sql)
SELECT
	 ro.relop.value('@NodeId', 'int')					AS NodeId
	,ro.relop.value('@PhysicalOp', 'nvarchar(200)')		AS PhysicalOp
	,ro.relop.value('@LogicalOp', 'nvarchar(200)')		AS LogicalOp
	,(ro.relop.value('@EstimateRows', 'float')
		* (ro.relop.value('@EstimateRewinds', 'float')
		+  ro.relop.value('@EstimateRebinds', 'float')
		+ 1.0))											AS EstimateRows
	,CAST(ro.relop.exist('*/sql:RelOp') AS bit)			AS IsNotLeaf
FROM @PlanXML.nodes('//sql:RelOp') AS ro(relop)

The issue: again, the query would have had to have been kicked off and cached first. If the plan were cached, we’d get this nice output:

However, we can’t count on the text of that query having been cached.

Option:  SET SHOWPLAN_XML

You’ll find that behind the scenes when you click “Display Estimated Execution Plan”, what Management Studio is actually doing is this:

SET SHOWPLAN_XML ON
GO

SELECT * FROM sys.objects
GO

SET SHOWPLAN_XML OFF
GO

Provided the user has proper rights,:

GRANT SHOWPLAN TO [username]

you’ll get what you want.

The issue:  good luck trying to save this off as an XML variable.  Because after you have SHOWPLAN_XML on, anything executed after that will not actually be executed.  The XML for the Plan of anything you try to do (including saving to a variable) will just be returned in the results tab.  We’re still getting nowhere.

Option: Dynamic SQL

Not a bad idea.  We could execute everything in dynamic SQL and insert it into a table.   Let’s try it:

IF object_id('tempdb..#EstimatedPlans') IS NOT NULL
BEGIN
   DROP TABLE #EstimatedPlans
END

CREATE TABLE #EstimatedPlans
(
   PlanXML xml
)

DECLARE @SQL nvarchar(max)
SET @SQL =
'
	SET SHOWPLAN_XML ON
	GO

	SELECT * FROM sys.objects
	GO

	SET SHOWPLAN_XML OFF
	GO
'

INSERT INTO #EstimatedPlans
EXEC sp_executesql @SQL

You may have already noticed the problem with this. “GO” is for separating batches in Management Studio. You can’t use it in a batch of dynamic SQL. You’ll get:

Msg 102, Level 15, State 1, Line 3
Incorrect syntax near ‘GO’.

Ok, let’s take out the “GO”s then:

IF object_id('tempdb..#EstimatedPlans') IS NOT NULL
BEGIN
   DROP TABLE #EstimatedPlans
END

CREATE TABLE #EstimatedPlans
(
   PlanXML xml
)

DECLARE @SQL nvarchar(max)
SET @SQL =
'
	SET SHOWPLAN_XML ON

	SELECT * FROM sys.objects

	SET SHOWPLAN_XML OFF
'

INSERT INTO #EstimatedPlans
EXEC sp_executesql @SQL

The issue:

Msg 1067, Level 15, State 1, Line 0
The SET SHOWPLAN statements must be the only statements in the batch.

Arrrg! We have to execute multiple batches on the same connection to turn SHOWPLAN_XML on and pass in our query. But we can’t execute multiple batches on the same connection in Dynamic SQL!

But wait! What about this:

DECLARE @SQL nvarchar(max)

SET @SQL = 'SET SHOWPLAN_XML ON'
EXEC sp_executesql @SQL

SET @SQL = 'SELECT * FROM sys.objects'
EXEC sp_executesql @SQL

SET @SQL = 'SET SHOWPLAN_XML OFF'
EXEC sp_executesql @SQL

Sorry, all you get now is the output of SELECT * FROM sys.objects, since each dynamic SQL execution is in its own scope that returns back to the calling scope. Results:

So how do we execute multiple batches on the same connection?

Option:  CLR to the rescue!

Perhaps if we created a CLR procedure and pass in the SQL, it could execute “SET SHOWPLAN_XML ON” first, and then the input SQL on the same open connection, and return the XML results.  Huzzah!

I’ll spare you the pain of the code samples from my CLR failures, and just summarize the issues I had to address along the way:

  1. You can usually return data in a dataset from a CLR procedure using
    SqlContext.Pipe.Send(sqlDataRecord);
    

    The issue with this is that this requires a SQLConnection with the connection string of “context connection=true” to run on the same database.  Unfortunately, there are some limitations to this connection that will prevent us from doing what we want to do:

    http://msdn.microsoft.com/en-us/library/ms131101.aspx

    Ultimately, we need to pass in the server and database name and open up a new connection.

  2. I tried to build a connection string and open up that connection, get the XML, and then open up a context connection to pass the data out.  This will work in some cases, but will error sometimes if you try to do an EXEC INTO (output the data of the CLR proc into a temp table)

    Long story short: I needed use an output parameter.

  3. Tried the output type of SQLString from the CLR procedure (since working with strings seemed easier than XML), but SQLString has a max length of 4000 and thus wouldn’t work.

    I had to have a SqlXml output parameter.

  4. Packaging up a SqlXml datatype took me a bit of searching to figure out.  Here’s how to do it:
//Package XML as output
System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
xmlDoc.InnerXml = //The XML in a string here//
System.Xml.XmlNodeReader xnr = new System.Xml.XmlNodeReader(xmlDoc);
PlanXML = new SqlXml(xnr);

The Final Solution

Here’s the final CLR procedure I created to solve my problem. Note I had to prepare a connection string from passed-in parameters. Next, I opened up a single connection and executed two batches: the first to set SHOWPLAN_XML ON, the second to run the input SQL. I then pulled out the result into a dataset, took the first column of the first row of the first table and packaged it up as a SqlXml type.

    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void GetShowPlanXML
    (
          SqlString SQL
        , out SqlXml PlanXML
        , SqlString server
        , SqlString database
        , SqlBoolean isIntegratedSecurity
        , SqlString loginIfNotIntegrated
        , SqlString passwordIfNotIntegrated
    )
    {
        //Prep connection
        string strConnectionString;

        if ((bool)isIntegratedSecurity)
        {
            strConnectionString = @"Data Source="
                + server.ToString()
                + @";Initial Catalog="
                + database.ToString()
                + @";Integrated Security=True";
        }
        else
        {
            strConnectionString = @"data source="
                + server.ToString()
                + @";initial catalog="
                + database.ToString()
                + @";Persist Security Info=True;User ID="
                + loginIfNotIntegrated.ToString()
                + @";Password="
                + passwordIfNotIntegrated.ToString();
        }

        SqlConnection cn = new SqlConnection(strConnectionString);

        //Set command texts
        SqlCommand cmd_SetShowPlanXml = new SqlCommand("SET SHOWPLAN_XML ON", cn);
        SqlCommand cmd_input = new SqlCommand(SQL.ToString(), cn);

        if (cn.State != ConnectionState.Open)
        {
            cn.Open();
        }

        //Run SET SHOWPLAN_XML ON
        cmd_SetShowPlanXml.ExecuteNonQuery();

        //Run input SQL
        SqlDataAdapter da = new SqlDataAdapter();
        DataSet ds = new DataSet();

        da.SelectCommand = cmd_input;
        ds.Tables.Add(new DataTable("Results"));

        ds.Tables[0].BeginLoadData();
        da.Fill(ds, "Results");
        ds.Tables[0].EndLoadData();

        if (cn.State != ConnectionState.Closed)
        {
            cn.Close();
        }

        //Package XML as output
        System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
        //XML is in 1st Col of 1st Row of 1st Table
        xmlDoc.InnerXml = ds.Tables[0].Rows[0][0].ToString();
        System.Xml.XmlNodeReader xnr = new System.Xml.XmlNodeReader(xmlDoc);
        PlanXML = new SqlXml(xnr);
    }

Below is the example wrapper TSQL procedure I created around the CLR procedure. The purpose of this wrapper is twofold. One, I wanted it to assign default connections for server and database. Two, I’ve put in an example XML analysis that I might do.

CREATE PROCEDURE [dbo].[GetEstimatedExecutionPlanInfo]
(
	 @SQL nvarchar(max)
	,@server nvarchar(4000)						= NULL
	,@database nvarchar(4000)					= NULL
	,@isIntegratedSecurity bit					= 1
	,@loginIfNotIntegrated nvarchar(4000)		= NULL
	,@passwordIfNotIntegrated nvarchar(4000)	= NULL
)
AS
BEGIN

	--Use current connection as default
	IF @server IS NULL
		SET @server = (SELECT @@SERVERNAME)

	IF @database IS NULL
		SET @database = (SELECT DB_NAME())

	--Get XML from CLR Proc
	DECLARE @PlanXML xml

	EXEC [dbo].[GetShowPlanXML]
		 @SQL
		,@PlanXML OUTPUT
		,@server
		,@database
		,@isIntegratedSecurity
		,@loginIfNotIntegrated
		,@passwordIfNotIntegrated

	--Return Plan
	SELECT @PlanXML AS PlanXML

	--Return info about Plan
	;WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS sql)
	SELECT
		 ro.relop.value('@NodeId', 'int')					AS NodeId
		,ro.relop.value('@PhysicalOp', 'nvarchar(200)')		AS PhysicalOp
		,ro.relop.value('@LogicalOp', 'nvarchar(200)')		AS LogicalOp
		,(ro.relop.value('@EstimateRows', 'float')
			* (ro.relop.value('@EstimateRewinds', 'float')
			+  ro.relop.value('@EstimateRebinds', 'float')
			+ 1.0))											AS EstimateRows
		,CAST(ro.relop.exist('*/sql:RelOp') AS bit)			AS IsNotLeaf
	FROM @PlanXML.nodes('//sql:RelOp') AS ro(relop)

END
GO

Here’s a test execution:

DECLARE @SQL nvarchar(max);
SET @SQL = 'SELECT * FROM sys.objects'

EXEC [dbo].[GetEstimatedExecutionPlanInfo] @SQL

Yay! The output we wanted, it’s during runtime, and we didn’t have to execute the query! Again, this is an example XML analysis that I might do; this could be used later on in code to decide whether or not to execute the user’s query.

Now CLR isn’t the only solution. I imagine the front end application could do the work that the CLR procedure is doing to get the plan. Indeed, this might have to be the option if we’re in SQLAzure, where CLR isn’t currently supported. And yes, there’d be a lot of analysis of the XML we’ll have to decide on and develop. But we’re on our way.

Aggregate to CSV in TSQL

Sometimes we need to rollup string values into lists of comma separated values. Using “FOR XML PATH” per column that you want to rollup can provide a way to do this.

Let’s say you have a data set like so:

id fruit city
101 Apple Cleveland
101 Apple Pittsburgh
101 Banana Pittsburgh
102 Grape Cleveland
102 Melon Cleveland
103 Melon Pittsburgh
103 Melon Cleveland

and for each distinct “id”, you want a list of its distinct fruits and cities.

If you put a FOR XML PATH statement for each column you want to rollup like so:

...
----------fruit---------
,STUFF((SELECT DISTINCT ', ' + NULLIF(fruit,'')
FROM tableToCSVRollup t2
WHERE t1.id = t2.id
FOR XML PATH(''),TYPE
).value('.','VARCHAR(MAX)')
,1,2,'') AS fruit
...

and then GROUP BY id, you’ll have a nice aggregated set. It will even put it in alphabetical order and remove NULL and empty string values for you.

Complete code example:

;WITH tableToCSVRollup (id,fruit,city)
AS 
(
			  SELECT 101, 'Apple', 'Cleveland'
	UNION ALL SELECT 101, 'Apple', 'Pittsburgh'
	UNION ALL SELECT 101, 'Banana','Pittsburgh'
	UNION ALL SELECT 102, 'Grape', 'Cleveland'
	UNION ALL SELECT 102, 'Melon', 'Cleveland'
	UNION ALL SELECT 103, 'Melon', 'Pittsburgh'
	UNION ALL SELECT 103, 'Melon', 'Cleveland'
)
SELECT
id
----------fruit---------
,STUFF((SELECT DISTINCT ', ' + NULLIF(fruit,'')
FROM tableToCSVRollup t2
WHERE t1.id = t2.id
FOR XML PATH(''),TYPE
).value('.','VARCHAR(MAX)')
,1,2,'') AS fruit
----------city----------
,STUFF((SELECT DISTINCT ', ' + NULLIF(city,'')
FROM tableToCSVRollup t2
WHERE t1.id = t2.id
FOR XML PATH(''),TYPE
).value('.','VARCHAR(MAX)')
,1,2,'') AS city
------------------------
FROM tableToCSVRollup t1
GROUP BY id
ORDER BY id

Results:

id fruit city
101 Apple, Banana Cleveland, Pittsburgh
102 Grape, Melon Cleveland
103 Melon Cleveland, Pittsburgh

——————————————

GROUP BY CUBE Example

The GROUP BY CUBE statement was introduced with SQL Server 2008 and is pretty sweet. Here’s an example.

First, let’s create our table and add data:

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[GroupByCubeExample]') AND type in (N'U'))
	DROP TABLE [dbo].[GroupByCubeExample]
GO

CREATE TABLE [dbo].[GroupByCubeExample]
(
	 [Region] [nvarchar](50) NOT NULL
	,[Product] [nvarchar](50) NOT NULL
	,[Salesman] [nvarchar](50) NOT NULL
	,[Amount] [money] NOT NULL
)
GO

INSERT INTO dbo.GroupByCubeExample
(Region,Product,Salesman,Amount)
		  SELECT 'America'	,'Bike'	,'Bean, Adam'		,10
UNION ALL SELECT 'America'	,'Bike'	,'Stanford, Matt'	,20
UNION ALL SELECT 'America'	,'Car'	,'Bean, Adam'		,10000.0000
UNION ALL SELECT 'America'	,'Car'	,'Stanford, Matt'	,20000.0000
UNION ALL SELECT 'Canada'	,'Bike'	,'Bean, Adam'		,10
UNION ALL SELECT 'Canada'	,'Bike'	,'Stanford, Matt'	,20
UNION ALL SELECT 'Canada'	,'Car'	,'Bean, Adam'		,10000.0000
UNION ALL SELECT 'Canada'	,'Car'	,'Stanford, Matt'	,20000.0000 

Then, we can SUM using GROUP BY CUBE for some nice automatic rollups.

SELECT
	 Region
	,Product
	,Salesman
	,SUM(Amount) AS Amount
FROM dbo.GroupByCubeExample
GROUP BY CUBE
(
	 Region
	,Product
	,Salesman
)
ORDER BY 
	 Region
	,Product
	,Salesman

Results (NULLs mean ALL):

Region Product Salesman Amount
NULL NULL NULL 60060.00
NULL NULL Bean, Adam 20020.00
NULL NULL Stanford, Matt 40040.00
NULL Bike NULL 60.00
NULL Bike Bean, Adam 20.00
NULL Bike Stanford, Matt 40.00
NULL Car NULL 60000.00
NULL Car Bean, Adam 20000.00
NULL Car Stanford, Matt 40000.00
America NULL NULL 30030.00
America NULL Bean, Adam 10010.00
America NULL Stanford, Matt 20020.00
America Bike NULL 30.00
America Bike Bean, Adam 10.00
America Bike Stanford, Matt 20.00
America Car NULL 30000.00
America Car Bean, Adam 10000.00
America Car Stanford, Matt 20000.00
Canada NULL NULL 30030.00
Canada NULL Bean, Adam 10010.00
Canada NULL Stanford, Matt 20020.00
Canada Bike NULL 30.00
Canada Bike Bean, Adam 10.00
Canada Bike Stanford, Matt 20.00
Canada Car NULL 30000.00
Canada Car Bean, Adam 10000.00
Canada Car Stanford, Matt 20000.00

And if you don’t like the NULLs, you can use a WHEN GROUPING() statement to better format:

SELECT
	 CASE
		WHEN GROUPING(Region) = 1	THEN 'ALL REGIONS'
		ELSE Region
	 END AS Region
	,CASE
		WHEN GROUPING(Product) = 1	THEN 'ALL PRODUCTS'
		ELSE Product
	 END AS Product
	,CASE
		WHEN GROUPING(Salesman) = 1	THEN 'ALL SALESMAN'
		ELSE Salesman
	 END AS Salesman
	,SUM(Amount) AS Amount
FROM dbo.GroupByCubeExample
GROUP BY CUBE
(
	 Region
	,Product
	,Salesman
)
ORDER BY 
	 Region
	,Product
	,Salesman

Results:

Region Product Salesman Amount
ALL REGIONS ALL PRODUCTS ALL SALESMAN 60060.00
ALL REGIONS ALL PRODUCTS Bean, Adam 20020.00
ALL REGIONS ALL PRODUCTS Stanford, Matt 40040.00
ALL REGIONS Bike ALL SALESMAN 60.00
ALL REGIONS Bike Bean, Adam 20.00
ALL REGIONS Bike Stanford, Matt 40.00
ALL REGIONS Car ALL SALESMAN 60000.00
ALL REGIONS Car Bean, Adam 20000.00
ALL REGIONS Car Stanford, Matt 40000.00
America ALL PRODUCTS ALL SALESMAN 30030.00
America ALL PRODUCTS Bean, Adam 10010.00
America ALL PRODUCTS Stanford, Matt 20020.00
America Bike ALL SALESMAN 30.00
America Bike Bean, Adam 10.00
America Bike Stanford, Matt 20.00
America Car ALL SALESMAN 30000.00
America Car Bean, Adam 10000.00
America Car Stanford, Matt 20000.00
Canada ALL PRODUCTS ALL SALESMAN 30030.00
Canada ALL PRODUCTS Bean, Adam 10010.00
Canada ALL PRODUCTS Stanford, Matt 20020.00
Canada Bike ALL SALESMAN 30.00
Canada Bike Bean, Adam 10.00
Canada Bike Stanford, Matt 20.00
Canada Car ALL SALESMAN 30000.00
Canada Car Bean, Adam 10000.00
Canada Car Stanford, Matt 20000.00