«

»

Mar 14

SSIS and Aggregations

I recently spotted a question about the aggregation task in SSIS, asking if there was an easy way to determine the maximum or minimum value of a group.  This had me confused… I distinctly recall having seen these options in the standard aggregate transformation.  I quickly setup a dataflow and found exactly that:

So, on further investigation I found out the full question was if this was possibly on string fields.  This time the question made sense, for string fields that list is reduced to Group By, Count, and Count Distinct.

So what to do?  The wonderful thing about SSIS is that it is extendable.  What is the simplest way to extend SSIS?  Use the built in script tasks.

This example was built in SSIS 2005 and as such is done in VB.Net.  SSIS 2008 and newer supports C#.

Now for this to work we need to setup a few things.  I am using the Northwind database for this example. I create a dataflow in SSIS with an OLE DB Source connecting to the Northwind Database.  The query retrieves the OrderId, the Discount and the ProductId (this field I converted to varchar for this example)

 

After adding a Script Transformation to the dataflow, I select these columns to be used in the script task.  Then go to the Inputs and Outputs page, select Output 0.  In the properties pane on the right, there is a value called SynchronousInputID.  Set this value to 0.  This is very important, without this option being set; the script task will create an output row for each input row.

Now add the output columns you want from this aggregation (remember to set the correct types).  Note that when the script task is set to run asynchronous, the input columns will not be added to the output.

Now, open the script page and click design script.  This will open up the Microsoft Visual Studio for Applications editor. In here you will have to write the code that will do the aggregations for you.  A sample of what this would look like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent
    Dim records As Records

    Public Overrides Sub PreExecute()
        records = New Records()
        MyBase.PreExecute()
    End Sub

    Public Overrides Sub FinishOutputs()
        For Each rec As Record In records
            Output0Buffer.AddRow()
            Output0Buffer.OrderId = rec.OrderId
            Output0Buffer.ProductId = rec.ProductId
            Output0Buffer.Discount = rec.Discount
        Next
        MyBase.FinishOutputs()
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim rec As Record

        If records.Contains(Row.OrderID) = True Then
            'Already in list
            rec = records.Item(records.IndexOf(Row.OrderID))
            Dim prdId As String
            Dim Discount As Single

            prdId = rec.ProductId
            If Row.ProdIdStr > prdId Then
                prdId = Row.ProdIdStr
            End If
            Discount = rec.Discount + Row.Discount

            records.Item(records.IndexOf(Row.OrderID)).ProductId = prdId
            records.Item(records.IndexOf(Row.OrderID)).Discount = Discount
        Else
            'New Item
            rec = New Record()
            rec.OrderId = Row.OrderID
            rec.ProductId = Row.ProdIdStr
            rec.Discount = Row.Discount
            records.Add(rec)
        End If
    End Sub
End Class

Public Class Records
    Inherits Collections.CollectionBase

    Public Sub Add(ByVal arecord As Record)
        List.Add(arecord)
    End Sub

    Public Function IndexOf(ByVal orderid As Integer) As Integer
        Dim index As Integer = 0
        For Each rec As Record In List
            If rec.OrderId <> orderid Then
                index = index + 1
            Else : Exit For
            End If
        Next
        Return index
    End Function

    Public Function Item(ByVal index As Integer) As Record
        Return CType(List.Item(index), Record)
    End Function

    Public Function Contains(ByVal orderid As Integer) As Boolean
        Dim found As Boolean = False
        For Each rec As Record In List
            If rec.OrderId = orderid Then
                found = True
            End If
        Next
        Return found
    End Function

End Class

Public Class Record
    Private _OrderId As Integer
    Private _ProductId As String
    Private _Discount As Single

    Public Property OrderId() As Integer
        Get
            Return _OrderId
        End Get
        Set(ByVal value As Integer)
            _OrderId = value
        End Set
    End Property

    Public Property ProductId() As String
        Get
            Return _ProductId
        End Get
        Set(ByVal value As String)
            _ProductId = value
        End Set
    End Property

    Public Property Discount() As Single
        Get
            Return _Discount
        End Get
        Set(ByVal value As Single)
            _Discount = value
        End Set
    End Property
End Class

Important bits to note from this example are the PreExecute Sub Override, and the FinishOutputs Sub Override.  These will perform the actions of setting up the task before processing rows and sending output rows once all input rows have been processed.  Also note that because output rows are only created once all input rows have been handled, this will be a fully blocking transformation so make sure when using this that you have assigned adequate memory to the dataflow task.

Starting with this, you can do all kinds of custom aggregations, including concatenating, statistical, and more

Leave a Reply

Your email address will not be published.


*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>