SSIS Script: convert UPPERCASE to Mixed-Case using TitleCase

Applies to:
  • Microsoft Business Intelligence Development Studio (BIDS) Visual Studio 2008
  • Microsoft Windows 7
  • Microsoft .NET Framework 3.5
  • Microsoft Visual C# 2008

What?
So like lots of people on the net, I've been given a datasource with names and addresses all in UPPERCASE. Initially, no one seemed bothered but now the request came through asking if this could be restored to a normal case, a mixture of mostly lowercase and some UPPERCASE.

Why?
I'm writing this article because other solutions on the net were either incomplete, did not work for me or simply did not match the requirements (do they ever?).
  1. The script must be able to be copied into other SSIS packages.
  2. Need the option to merely tick input column names rather than hardcode these.
  3. "It " should be an acronym for IT, so "IT Department" and "Director of IT".
  4. "O'REILLY" should become "O'Reilly" and not "O'reilly".
  5. "MCDONALDS" should become "McDonalds" but do not apply for "MacDonalds" because of names like "MACK" and "MACHURA".
  6. " and ", "la", "le", " van ", " de ", " of " and similar should remain lowercase.

How?
The SSIS Script component task has to be generic, it should only apply this to columns we specifically tick in a package and we should be able to use the same script in any package... eek!

Either of the two following script examples can be included in an SSIS Script Task but this article includes the full process of packaging the script into a reusable component.

Example #1: As an SSIS Script Task
My solution so far: The following script will convert to lowercase and then capitalize the first letter in addition to the above specification. I am storing it here so that I can work on improving it:
/* Microsoft SQL Server Integration Services Script Component
*  Write scripts using Microsoft Visual C# 2008.
*  ScriptMain is the entry point class of the script.*/

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Globalization;     // for cultureinfo and textinfo
using System.Reflection;        // for looping through properties
using System.Threading;         // for threading

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    //  Method that will be started for each record in your dataflow
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // C# alternative for VB.Net StrConv uses cultureinfo and threading
        // See: http://support.microsoft.com/kb/312890/
        CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
        TextInfo textInfo = cultureInfo.TextInfo;

        // Use Reflection to loop through all the properties of Row:
        foreach (PropertyInfo p in Row.GetType().GetProperties())
        {
            try
            {
                // Check type for all string properties
                if (object.ReferenceEquals(p.PropertyType, typeof(string)))
                {
                    // declare two arrays
                    string[] exceptionArray_this;
                    string[] exceptionArray_that;

                    // get the value we want to transform
                    string ValueToConvert = p.GetValue(Row, null).ToString();

                    // PRIMARY transformation: Applies to all words
                    // (lowercases and capitalizes first letter taking into account punctuation symbols)
                    ValueToConvert = textInfo.ToTitleCase(ValueToConvert.ToLower());

                    // add a trailing space to catch below exceptions (eg. "Director of It ")
                    ValueToConvert = ValueToConvert + " ";

                    // tranformation exceptions: words to be lowercased
                    // note the prefixed spaces so as not to lowercase words at the beginning of a line.
                    exceptionArray_this = new string[] { " And ", " At ", " De ", " For ", " Le ", " La ", " Of ", " Or ", " To ", " Van " };
                    foreach (string myValue in exceptionArray_this)
                    {
                        if (ValueToConvert.Contains(myValue))
                        {
                            ValueToConvert = ValueToConvert.Replace(myValue, myValue.ToLower());
                        }
                    }

                    // tranformation exceptions: words to be UPPERCASED
                    // note the first letter case and suffixed space indicating a whole word.
                    exceptionArray_this = new string[] { "Av ", "Gp ", "Hr ", "It ", "Pa ", "Pr ", "Uk ", "Usa " };
                    foreach (string myValue in exceptionArray_this)
                    {
                        if (ValueToConvert.Contains(myValue))
                        {
                            ValueToConvert = ValueToConvert.Replace(myValue, myValue.ToUpper());
                        }
                    }

                    // tranformation exceptions: McDonalds, O'Reilly, D'Artagnan, I'Annon
                    // Match with case a multi-character combo, append the next letter in UPPERCASE and then append the remainder.
                    // NOTE: This will work with "MacDonalds" as well (specify "Mac") but bear in mind "Mack"
                    exceptionArray_this = new string[] { "Mc", "O'", "D'", "I'" };
                    foreach (string myValue in exceptionArray_this)
                    {
                        if (ValueToConvert.Contains(myValue))
                        {
                            string McValue = ValueToConvert.Substring(0, ValueToConvert.IndexOf(myValue)) + myValue;
                            McValue += ValueToConvert.Substring((ValueToConvert.IndexOf(myValue) + myValue.Length), 1).ToUpper();
                            McValue += ValueToConvert.Substring((ValueToConvert.IndexOf(myValue) + myValue.Length) + 1);
                            ValueToConvert = McValue;
                        }
                    }

                    // tranformation exceptions: words to be Mixed-CASE (business specific)
                    // align the following two arrays, eg. first value in array 1 will be replaced with first value in array 2
                    exceptionArray_this = new string[] { "Hod ", "Po Box " };
                    exceptionArray_that = new string[] { "HoD ", "PO Box " };
                    for (int i = 0; i < exceptionArray_this.Length; i++)
                    {
                        if (ValueToConvert.Contains(exceptionArray_this[i]))
                        {
                            ValueToConvert = ValueToConvert.Replace(exceptionArray_this[i], exceptionArray_that[i]);
                        }
                    }

                    // trim to remove the trailing space
                    ValueToConvert = ValueToConvert.Trim();

                    // Use a method to set the value of each String type property
                    // Make sure the length of the new value doesn't exceed the column size
                    p.SetValue(Row, ValueToConvert, null);
                }
            }
            catch (Exception e)
            {
                // Only handle ColumnIsNull and NullReference Exceptions
                if (e.GetBaseException() is Microsoft.SqlServer.Dts.Pipeline.ColumnIsNullException
                || e.GetBaseException() is System.NullReferenceException)
                {
                }
                else throw new Exception(e.ToString());
            }
        }
    }
}
Allowing me to simply tick columns to apply this to (note that some of the images have been censored due to this project containing sensitive data):

Script Component Task: Select Type
Script Transformation Editor: Transformation Type


Script Component Task: Transformation: Data Viewers
Script Transformation Editor: Data Viewers

Script Transformation Editor: Input Columns
Script Transformation Editor: Input Columns


Additional "Features"
  • I have noted that letters immediately following a punctuation symbol, have the UPPERCASE applied: eg. "Andrew Lloyd-Webber", "Marketing Manager (Acting)", "Lipman***Test" or "R&D".

  • As well as alphanumeric values such as "Flat 4B", "Suite 4899A", "123A Myhouse".




Example #2: Using as a Custom Script Component
The above will work as a SSIS Script task. At work however, we needed to separate it out so that it could be used in other SSIS packages. If we included an SSIS Script in each package, any change to the script would need to be repeated across the board. The following example is an isolated Script held in a separate solution which other packages import as a Custom Component Script. (ie. if we use this script in many packages, we can change just this script instead of the ones for every package).

Here is the near enough final version of my script (or more like an experienced colleague's comprehensive script to which I have merely added my transform part) added as a data flow transformation held in an assembly Custom Component Script with debug logging:
using System;
using System.Linq;
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Pipeline.Localization;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Globalization;     // for cultureinfo and textinfo
using System.Reflection;        // for looping through all input fields without direct references
using System.Text;
using System.Threading;         // for threading

namespace JLDataFlowTransformations
{
    [DtsPipelineComponent(DisplayName = "JL - to Mixed case"
                        , Description = "Transforms selected text fields to Mixed case"
                        , ComponentType = ComponentType.Transform)]

    public class MyConvertToProperTransform : PipelineComponent
    {
        // Create a variable which is used to identify the data column being processed
        private int[] inputBufferColumnIdx;

        // Create variables used in debug logging (when it is enabled)
        private byte[] emptyBytes = null;
        private const string DebugLogEventName = "JL - Mixed Case Component Debug";
        private const string DebugLogEventDescription = "Log entry for debugging. This could produce extensive logs so normally have this disabled!";


        // This method is called to register a custom log event "JL - Mixed Case Component Debug" which can be enabled for debug purposes.
        // This could produce extensive logs so this event should normally be disabled within the package logging!
        public override void RegisterLogEntries()
        {
            this.LogEntryInfos.Add(DebugLogEventName, DebugLogEventDescription, DTSLogEntryFrequency.DTSLEF_PROPORTIONAL);
        }


        // This method is called during Design Time when the component is added to the data flow
        public override DTSValidationStatus Validate()
        {
            bool pbCancel = false;
            IDTSInput100 input = ComponentMetaData.InputCollection[0];

            string errorMsg1 = "Invalid datatype for {0}. Inputs must be DT_STR or DT_WSTR";
            string errorMsg2 = "Invalid usage type for {0}. Inputs must be READWRITE";

            // Check whether each column type is DT_STR/DT_WSTR
            for (int x = 0; x < input.InputColumnCollection.Count; x++)
            {
                if (!(input.InputColumnCollection[x].DataType == DataType.DT_STR
                   || input.InputColumnCollection[x].DataType == DataType.DT_WSTR))
                {
                    ComponentMetaData.FireError(
                                0
                                , ComponentMetaData.Name
                                , String.Format(errorMsg1, input.InputColumnCollection[x].Name)
                                , string.Empty
                                , 0
                                , out pbCancel);
                    return DTSValidationStatus.VS_ISCORRUPT;
                }

                // Check whether each usage type is READWRITE
                if (input.InputColumnCollection[x].UsageType != DTSUsageType.UT_READWRITE)
                {
                    ComponentMetaData.FireError(
                                0
                                , ComponentMetaData.Name
                                , String.Format(errorMsg2, input.InputColumnCollection[x].Name)
                                , string.Empty
                                , 0
                                , out pbCancel);
                    return DTSValidationStatus.VS_ISCORRUPT;
                }
            }

            return DTSValidationStatus.VS_ISVALID;
        }


        // This method repairs any errors identified during Design Time validation that caused the component to return VS_NEEDSNEWMETADATA at design time
        public override void ReinitializeMetaData()
        {
            ComponentMetaData.RemoveInvalidInputColumns();
            ReinitializeMetaData();
        }


        // This method blocks the addition of new output column from Advanced Editor as none are permitted.  Called during Design Time
        public override IDTSOutputColumn100 InsertOutputColumnAt(int outputID, int outputColumnIndex, string name, string description)
        {
            throw new Exception(string.Format("Outputs cannot be added.  The component only accepts inputs", ComponentMetaData.Name), null);
        }


        // This method is called once at Run Time
        // It creates an array (inputBufferColumnIdx) containing the index in the pipeline buffer of each input column passed into the component.
        public override void PreExecute()
        {

            // Debug Logging
            DebugLogEvent("PreExecute Started");

            IDTSInput100 input = ComponentMetaData.InputCollection[0];
            inputBufferColumnIdx = new int[input.InputColumnCollection.Count];

            Enumerable
            .Range(0, input.InputColumnCollection.Count)
            .ToList()
            .ForEach(i =>
            {
                IDTSInputColumn100 inputCol = input.InputColumnCollection[i];
                inputBufferColumnIdx[i] = BufferManager
                               .FindColumnByLineageID(input.Buffer, inputCol.LineageID);

                // Debug Logging
                DebugLogEvent(string.Format("inputBufferColumnIdx[{0}] contains bufferID {1} for column {2}", i, inputBufferColumnIdx[i], inputCol.Name));
            });

        }


        // This method is called repeatedly at Run Time when a PipelineBuffer from an upstream component is available to be processed
        // For each row in the pipeline buffer it loops through each of the input columns passed into the component
        // It locates the actual column value (including checking for null) using the index array created in the Pre-Execute
        public override void ProcessInput(int inputID, PipelineBuffer buffer)
        {
            string inputValue;
            string debugString;
            string outputValue;

            string[] exceptionArray_this;
            string[] exceptionArray_that;

            int rowCount=0;

            // C# alternative for VB.Net StrConv uses cultureinfo and threading
            // See: http://support.microsoft.com/kb/312890/
            CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
            TextInfo textInfo = cultureInfo.TextInfo;

            if (!buffer.EndOfRowset)
            {
                // loop through each row
                while (buffer.NextRow())
                {
                    rowCount++;

                    // loop through each column
                    for (int x = 0; x < inputBufferColumnIdx.Length; x++)
                    {
                        // reset default value of debug message
                        debugString = string.Format("ProcessInput - Row: {0} - Column: {1} was NULL - no transform applied", rowCount, x);

                        if (!buffer.IsNull(inputBufferColumnIdx[x]))
                        {
                            // get the value we want to transform
                            inputValue = buffer.GetString(inputBufferColumnIdx[x]);

                            // set the staging value
                            string ValueToConvert = inputValue;

                            // PRIMARY transformation (lowercases and capitalizes first letter taking into account punctuation symbols)
                            ValueToConvert = textInfo.ToTitleCase(ValueToConvert.ToLower());

                            // add a trailing space to catch below exceptions (eg. "Director of It ")
                            ValueToConvert = ValueToConvert + " ";

                            // tranformation exceptions: words to be lowercased
                            // note the prefixed spaces so as not to lowercase words at the beginning of a line.
                            exceptionArray_this = new string[] { " And ", " At ", " De ", " For ", " Le ", " La ", " Of ", " Or ", " To ", " Van " };
                            foreach (string myValue in exceptionArray_this)
                            {
                                if (ValueToConvert.Contains(myValue))
                                {
                                    ValueToConvert = ValueToConvert.Replace(myValue, myValue.ToLower());
                                }
                            }

                            // tranformation exceptions: words to be UPPERCASED
                            // note the first letter case and suffixed space indicating a whole word.
                            exceptionArray_this = new string[] { "Hr ", "It ", "Pa ", "Pr ", "Uk ", "Usa " };
                            foreach (string myValue in exceptionArray_this)
                            {
                                if (ValueToConvert.Contains(myValue))
                                {
                                    ValueToConvert = ValueToConvert.Replace(myValue, myValue.ToUpper());
                                }
                            }

                            // tranformation exceptions: McDonalds, O'Reilly, D'Artagnan, I'Annon
                            // Match with case a multi-character combo, append the next letter in UPPERCASE and then append the remainder.
                            exceptionArray_this = new string[] { "Mc", "O'", "D'", "I'" };
                            foreach (string myValue in exceptionArray_this)
                            {
                                if (ValueToConvert.Contains(myValue))
                                {
                                    string McValue = ValueToConvert.Substring(0, ValueToConvert.IndexOf(myValue)) + myValue;
                                    McValue += ValueToConvert.Substring((ValueToConvert.IndexOf(myValue) + myValue.Length), 1).ToUpper();
                                    McValue += ValueToConvert.Substring((ValueToConvert.IndexOf(myValue) + myValue.Length) + 1);
                                    ValueToConvert = McValue;
                                }
                            }

                            // tranformation exceptions: words to be Mixed-CASE (business specific)
                            // align the following two arrays, eg. first value in array 1 will be replaced with first value in array 2
                            exceptionArray_this = new string[] { "Hod ", "Po Box " };
                            exceptionArray_that = new string[] { "HoD ", "PO Box " };
                            for (int i = 0; i < exceptionArray_this.Length; i++)
                            {
                                if (ValueToConvert.Contains(exceptionArray_this[i]))
                                {
                                    ValueToConvert = ValueToConvert.Replace(exceptionArray_this[i], exceptionArray_that[i]);
                                }
                            }

                            // trim to remove the trailing space
                            outputValue = ValueToConvert.Trim();

                            // set debug message text (successful transform)
                            // debugString = string.Format("ProcessInput - SUCCESSFUL Transform: Column: {0} - Value Before: {1} - Value After: {2}", x, inputValue, outputValue);
                            debugString = "";  // blanking because I only need to know when it fails

                            // set input value to transformed value
                            buffer.SetString(inputBufferColumnIdx[x], outputValue);
                        }

                        // Log Debug Message (only display if not blank)
                        if(debugString!="")
                            DebugLogEvent(debugString);
                    }
                }
            }
        }


        private void DebugLogEvent(string DebugString)
        {
            DateTime now = DateTime.Now;
            this.ComponentMetaData.PostLogMessage(DebugLogEventName, this.ComponentMetaData.Name, DebugString, now, now, 0, emptyBytes);
        }
    }
}

Example #2: Refreshing a DLL
This has to be repeated every time the code changes in the script:
  1. Build CS file in solution
    1. Check-in file/change
    2. Build Project
  2. Check you have a compatible Global Assembly Cache (GAC) Utility
    1. C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin\gacutil.exe
    2. Note: Currently SSIS/BIDS only supporting assemblies compiled with .Net 3.5
  3. Take it to the GAC
    1. Run a command prompt as Administrator
    2. Change working directory to location of GACUtil
      CD C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin
    3. Run the GAC and add the DLL
      gacutil -i "C:\<full_path_to_dll>\MyDataFlowTransformations.dll"
      • You should get the response: Assembly successfully added to the cache
      • You can remove it with gacutil -u "TheNameOfTheAssembly"
  4. Copy the same DLL file into the appropriate SSIS folder (in this case "PipelineComponents")
    1. C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents
  5. Re-open BIDS and test.
the BAT I use to automate this (run as Administrator in Win7 from C:\temp):
@ECHO OFF

REM  This program is for Joel Lipman to run after he's made a change to some
REM  C# code for a SSIS Custom Script Component task.

REM Change the working directory to the GACUtility Folder
CD "C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin"

REM Add the assembly to the Global Assembly Cache (GAC)
gacutil -i "C:\<full_path_to_dll>\JLDataFlowTransformations.dll"

REM PipeLineComponentTask: Copy this DLL into the PipelineComponents folder
XCOPY "C:\<full_path_to_dll>\JLDataFlowTransformations.dll" "C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents" /Y

REM  Returning to directory where this BAT is held.
CD "C:\Temp"
ECHO Done.


Deployment Method #?: As a setup.exe
Package the visual studio project by adding a "setup wizard" project. This results in a setup.exe and msi file which can be installed on Windows Servers run by accounts which do not have access or which do not have the full Windows SDK (ie. access to the global assembly cache). The process of registering the driver etc can be avoided by using a setup program:
  1. Open Visual Studio 2010
  2. Create a new setup project - File -> New Project...
  3. Under Other Project Types | Setup and Deployment, select Setup Project
  4. Delete User's desktop and User's Project Menus folders from File System (setup) screen.
  5. On Application Folder, set the DefaultLocation property to the root DTS directory (eg. [ProgramFilesFolder]Microsoft SQL Server\100\DTS)
  6. Right-click on the Application folder and add the sub-folder you need to put your sample in (ie. Tasks, PipelineComponents, Connections)
  7. Right-click on the new folder, and add "Project Output" to the folder to place binaries from your sample.
  8. If you need to place files in the GAC, right-click on the File System on Target Machine -> Add Special Folder -> Global Assembly Cache folder. It will now show up as an additional folder, and you can add project output to it.
  9. Exclude all dependencies by highlighting all of the files under the Detected Dependencies folder and selecting "Exclude".
  10. If you'd like to add a readme.rtf file, place it in the Application Folder
  11. You'll need to edit the UI pages to display a readme after the install. Right-click on the Setup project, View -> User Interface.
  12. For both the install and administrative install, right-click on End and select "Add Dialog". Select the "read me" dialog. Drag the "read Me" page to be above the "Finished" page.



Related Articles

Joes Revolver Map

Accreditation

Badge - Certified Zoho Creator Associate
Badge - Certified Zoho Creator Associate

Donate & Support

If you like my content, and would like to support this sharing site, feel free to donate using a method below:

Paypal:
Donate to Joel Lipman via PayPal

Bitcoin:
Donate to Joel Lipman with Bitcoin - Valid till 8 May 2022 3QnhmaBX7LQSRsC9hh6Je9rGQKEGNQNfPb
© 2021 Joel Lipman .com. All Rights Reserved.