T-SQL Transpose

Recently I have been working on a clinical trial application and needed a way to export all of the data. The client wants the questions to be displayed horizontally, but the data is stored vertically. This is the structure of the data:

Pretty simple schema, the issue is here. This is what they want it to look like:

At first this was a little intimidating, because there are different types of questions in separate tables that would need to be combined and then transposed together. I decided to use the PIVOT function. Now there was an another requirement that needed to be addressed, the columns had to be sorted. This took a little bit of data massaging to get it in the correct pre-pivot format.

To set the stage I have a number of pre-processing steps that I won’t show here, but just know that those steps result in temp table #data. Now that I have all of the data in place, I need to order the questions so that they pivot correctly:

-- PRE-PIVOT PROCESSING
SELECT 
    [CaseId] ,    
    [Title],
    [Value]
INTO #ordered
FROM #data
ORDER BY    
    [CaseId],
    [Title]

Here comes the interesting part. I need to extract all of the columns into a comma delimited string for use in the pivot. For this I need a VARCHAR to hold the delimited fields. Now I can take the data from the #ordered table and generate the fields:

DECLARE @ColumnName AS NVARCHAR(MAX)
 
SELECT @ColumnName= ISNULL(@ColumnName + ',','') 
       + QUOTENAME([Title])
FROM (SELECT DISTINCT [Title]  FROM #ordered ) AS c ORDER BY [Title]

There is another function in there that I didn’t know about before I needed to do this: QUOTENAME. This will take the string and wrap it in square brackets. This allows for strings with spaces to be used for a column name. At this point, we have the @ColumnName variable set with a delimited set of question titles.

Now that we have our columns set, we need to generate the sql that will be executed. For this we need to get the data. We will take use a CTE to collect the data with the new list of columns and pump that into the PIVOT function. The PIVOT function take the aggregate that you want applied to the value field and the column that you want to PIVOT, in my case the @ColumnName array. Lastly, I need to have the rows grouped by the unique CaseId. In the end it looks like this:

DECLARE @query NVARCHAR(MAX);
 
SET @query = ';WITH p ([CaseId],'+ @ColumnName +') as (SELECT [CaseId],' + @ColumnName + ' from 
(
    SELECT  
        [CaseId], 
        [Title],
        [Value]
    FROM 
        #ordered
) x PIVOT (MAX([Value]) for [Title] in ('+@ColumnName+')) p  GROUP BY [CaseId],'+@ColumnName+') SELECT * FROM p '
EXECUTE (@query);

When we execute these queries, we end up with the result that we want:

One note, these values above are strings in the data, so an aggregate does not make sense, so you can use the MAX function to get the exact value.

T-SQL String Aggregation

Recently, I needed a way to take a list of name and concatenate them with a separator between them. There are many ways to this in C#, but I was feeling lazy so I wanted to see if there was a simpler way… in SQL. To set the stage, I needed to be able to take a table with the name of the object in rows. So let’s say we have two tables Group and Group Members:

CREATE TABLE [dbo].[Group]
(
    [Id] [BIGINT] IDENTITY(1,1) NOT NULL,
    [Name] VARCHAR(255) NULL
)

CREATE TABLE [dbo].[GroupMembers]
(
    [Id] [BIGINT] IDENTITY(1,1) NOT NULL,
    [GroupId] BIGINT NOT NULL ,
    [UserEmail] VARCHAR(255) NOT NULL 
)

Let’s create a group and some new members:

INSERT INTO [Group] (Name) SELECT 'My Group'

INSERT INTO [GroupMembers] ([GroupId], [UserEmail]) SELECT 1, 'test@test.com'
INSERT INTO [GroupMembers] ([GroupId], [UserEmail]) SELECT 1, 'test1@test.com'
INSERT INTO [GroupMembers] ([GroupId], [UserEmail]) SELECT 1, 'test2@test.com'
INSERT INTO [GroupMembers] ([GroupId], [UserEmail]) SELECT 1, 'test3@test.com'

The goal is to take a group and the first five users and return the name of the group and a concatenation of the emails. To start with, we can just pull the data directly, but it gives us all rows as opposed to just one:

SELECT 
    g.[Name],
    gm.[UserEmail] 
FROM
    [GROUP] g
    INNER JOIN [GroupMembers] gm ON gm.[GroupId] = g.[Id]

Reminder the goal is to have a single row with the emails concatenated and we want to do this in SQL, so this isn’t helpful. If I did choose to do this in code, I would have to iterate over the collection, insert the separator and combine the strings. I would also need to track if I was on the last item so that I didn’t add a trailing separator. Enter STRING_AGG. In their words:

Concatenates the values of string expressions and places separator values between them. The separator is not added at the end of string.

This is perfect, so let’s try it out. From the docs, we can see that we have two parameters: expression and separator.

This takes our previous query and with a few adjustments we get this:

SELECT 
    g.[Name],
    [Emails] = STRING_AGG(gm.[UserEmail], ', ') 
FROM
    [GROUP] g
    INNER JOIN [GroupMembers] gm ON gm.[GroupId] = g.[Id]
GROUP BY
     g.[Name]

When we run this, we get the exact results that we are looking for.

There are a couple of things to note. I added a space after the separator to make it look better and for it to work we have to apply the group by. The only thing that is left is to set the order of the aggregate strings. I mention this because it got logged as a bug. I should have known better, rookie mistake. Oh well. It is really simple, we just need to add the WITHIN statement at the end of the STRING_AGG function:

SELECT 
    g.[Name],
    [Emails] = STRING_AGG(gm.[UserEmail], ', ') WITHIN GROUP ( ORDER BY gm.[UserEmail] DESC) 
FROM
    [GROUP] g
    INNER JOIN [GroupMembers] gm ON gm.[GroupId] = g.[Id]
GROUP BY
     g.[Name]

Its really that simple. Good luck!