Friday, August 29, 2014
   
Text Size
Login

DataJumble - Shuffling characters in a data value

This was called a Data-Scrambling Function but it depends on what you mean by "scrambling". This is a function which merely uses the same characters but switches their order randomly, so I've renamed it DataJumble as opposed to my article on Data-Scrambling.

Why?
We want to scramble sensitive data that we send to suppliers for support or analysis. With inspiration from: "Obfuscating your SQL Server Data" by John Magnabosco but tweaked for our purposes. NOTE that the following has only ever been run on development environments and I would not recommend running this on a production system as I have not tested the performance and database load.


  1.  Before: 
  2.             StudentID   StudentName          DateOfBirth 
  3.             ----------- -------------------- ------------- 
  4.             1           John Smith           1990-03-21 
  5.             2           Fred Bloggs          1988-11-02 
  6.             3           Another User         1985-07-11 
  7.             4           Yet Another User     1977-06-25 
  8.   
  9.         After: 
  10.             StudentID   StudentName          DateOfBirth 
  11.             ----------- -------------------- ------------- 
  12.             1           mStnoh iJh           1926-01-02 
  13.             2           lgreg BFdos          1969-03-14 
  14.             3           onrUest ehAr         1968-05-05 
  15.             4           otentre AhYe Usr     1964-08-08 

Precursor
You will need to be able to create the following view to generate random numbers on SQL Server.

  1.  -- Used to reference RAND with in a function 
  2.  CREATE VIEW dbo.vwRandom 
  3.  AS 
  4.  SELECT RAND() as RandomValue; 
  5.  GO 

The Function
Again you need permission to create this function. Don't forget to GRANT permission to execute this function for whatever user/system account that will run it.

  1.  -- Drop the function if it already exists 
  2.  IF OBJECT_ID ('ufn_DataJumble''FN'IS NOT NULL 
  3.  DROP FUNCTION ufn_DataJumble ; 
  4.  GO 
  5.   
  6.  -- Create user defined function 
  7.  CREATE FUNCTION ufn_DataJumble 
  8.  ( 
  9.      @OrigVal varchar(max) 
  10.  ) 
  11.  RETURNS varchar(max) 
  12.  WITH ENCRYPTION 
  13.  AS 
  14.  BEGIN 
  15.   
  16.          -- Variables used 
  17.          DECLARE @NewVal varchar(max); 
  18.          DECLARE @OrigLen int; 
  19.          DECLARE @CurrLen int; 
  20.          DECLARE @LoopCt int; 
  21.          DECLARE @Rand int; 
  22.   
  23.          -- for dates/times 
  24.          DECLARE @MinYear int; 
  25.          DECLARE @MaxYear int; 
  26.          DECLARE @RandYear int; 
  27.          DECLARE @RandMonth int; 
  28.          DECLARE @RandDay int; 
  29.          DECLARE @RandHour int; 
  30.          DECLARE @RandMinute int; 
  31.          DECLARE @RandSeconds int; 
  32.          DECLARE @RandMSeconds int; 
  33.          DECLARE @DateVal date; 
  34.          DECLARE @TimeVal time; 
  35.          DECLARE @DateMatch int; 
  36.          DECLARE @TimeMatch int; 
  37.          DECLARE @ColumnTypeIsDate int; 
  38.   
  39.          -- Format given value (prevents CHAR datatype issue) 
  40.          SET @OrigVal = LTRIM(RTRIM(@OrigVal)); 
  41.   
  42.          -- Set variable default values 
  43.          SET @NewVal = ''; 
  44.          SET @OrigLen = DATALENGTH(@OrigVal); 
  45.          SET @CurrLen = @OrigLen
  46.          SET @LoopCt = 1; 
  47.          SET @DateVal = ''; -- 1900-01-01 
  48.          SET @TimeVal = ''; -- 00:00:00.000 
  49.          SET @DateMatch = 0; 
  50.          SET @TimeMatch = 0; 
  51.          SET @ColumnTypeIsDate = 0; 
  52.   
  53.          -- Determine if Given Value is date 
  54.          IF @OrigVal IS NULL 
  55.                  RETURN NULL; 
  56.   
  57.          -- Determine if Given Value is date 
  58.          IF ISDATE(@OrigVal)=1 
  59.                  SET @ColumnTypeIsDate = 1; 
  60.   
  61.          -- Determine if Given Value is date (ISDATE does not recognize CAST AS DATE) 
  62.          IF @OrigLen=10 
  63.                  AND (LEN(@OrigVal) - LEN(REPLACE(@OrigVal'-', '')))=2 
  64.                  AND ISNUMERIC(SUBSTRING(@OrigVal, 1, 4))=1 
  65.                  AND ISNUMERIC(SUBSTRING(@OrigVal, 6, 2))=1 
  66.                  AND ISNUMERIC(SUBSTRING(@OrigVal, 9, 2))=1 
  67.                                          SET @ColumnTypeIsDate = 1; 
  68.   
  69.   
  70.          -- Determine if this is a date/time value 
  71.          IF @ColumnTypeIsDate=1 
  72.          BEGIN 
  73.   
  74.                  -- DETERMINE IF THIS FIELD CONTAINS A DATE VALUE 
  75.                  SET @DateMatch = 
  76.                          CASE 
  77.                                  WHEN CAST(CAST(@OrigVal AS date) AS VARCHAR(10))<>'1900-01-01' THEN 1 
  78.                                  ELSE 0 
  79.                          END; 
  80.   
  81.   
  82.                  -- DETERMINE IF THIS FIELD CONTAINS A TIME VALUE 
  83.                  SET @TimeMatch = 
  84.                          CASE 
  85.                                  WHEN CAST(CAST(@OrigVal AS time) AS VARCHAR(10))<>'00:00:00.000' THEN 1 
  86.                                  ELSE 0 
  87.                          END; 
  88.   
  89.   
  90.                  -- DETERMINE TIME VALUE IF TIME WAS IN GIVEN VALUE 
  91.                  IF @DateMatch=1 
  92.                  BEGIN 
  93.                          -- SET YEAR RANGE (1912 - 1994: for Birthdate specifically: Older than 18 but younger than 100) 
  94.                          SET @MinYear = DATEPART(YEAR,GETDATE())-100;    -- 100 years ago from today 
  95.                          SET @MaxYear = DATEPART(YEAR,GETDATE())-18;             -- 18 years ago from today 
  96.   
  97.   
  98.                          -- GET RANDOM DATE VALUES 
  99.                          SELECT @RandYear = ((@MinYear + 1) - @MaxYear) * RandomValue + @MaxYear FROM dbo.vwRandom; 
  100.                          SELECT @RandMonth = (1 - 12) * RandomValue + 12 FROM dbo.vwRandom; 
  101.                          SELECT @RandDay = (1 - 28) * RandomValue + 28 FROM dbo.vwRandom;  -- prevent invalid dates (specifically 31 days in feb) 
  102.                          SET     @DateVal = 
  103.                                  CAST(@RandYear AS CHAR(4)) + '-' 
  104.                                  + STUFF(@RandMonth, 1, 0, REPLICATE('0', 2 - LEN(@RandMonth))) + '-' 
  105.                                  + STUFF(@RandDay, 1, 0, REPLICATE('0', 2 - LEN(@RandDay))); 
  106.                  END 
  107.   
  108.   
  109.                  -- DETERMINE TIME VALUE IF TIME WAS IN GIVEN VALUE 
  110.                  IF @TimeMatch=1 
  111.                  BEGIN 
  112.   
  113.                          -- SET RANDOM TIME VALUE (include Milliseconds) 
  114.                          SELECT @RandHour = (0 - 23) * RandomValue + 23 FROM dbo.vwRandom; 
  115.                          SELECT @RandMinute = (0 - 59) * RandomValue + 59 FROM dbo.vwRandom; 
  116.                          SELECT @RandSeconds = (0 - 59) * RandomValue + 59 FROM dbo.vwRandom; 
  117.                          SELECT @RandMSeconds = (0 - 999) * RandomValue + 999 FROM dbo.vwRandom; 
  118.                          SET @TimeVal = 
  119.                                  STUFF(@RandHour, 1, 0, REPLICATE('0', 2 - LEN(@RandHour))) 
  120.                                  + ':' + STUFF(@RandMinute, 1, 0, REPLICATE('0', 2 - LEN(@RandMinute))) 
  121.                                  + ':' + STUFF(@RandSeconds, 1, 0, REPLICATE('0', 2 - LEN(@RandSeconds))) 
  122.                                  + '.' + STUFF(@RandMSeconds, 1, 0, REPLICATE('0', 3 - LEN(@RandMSeconds))); 
  123.                          SET @TimeVal = CONVERT(time, @TimeVal, 14); 
  124.   
  125.                  END 
  126.   
  127.   
  128.                  -- DETERMINE WHICH DATE PARTS WERE SUBMITTED 
  129.                  IF @TimeMatch=1 AND @DateMatch=0 
  130.                          SET @NewVal = @TimeVal
  131.                  ELSE IF @TimeMatch=0 AND @DateMatch=1 
  132.                          SET @NewVal = @DateVal
  133.                  ELSE IF @TimeMatch=1 AND @DateMatch=1 
  134.                          SET @NewVal = CAST(@DateVal AS varchar(10)) + ' ' + CAST(@TimeVal AS varchar(16)); 
  135.                  ELSE 
  136.                          SET @NewVal = ''; 
  137.   
  138.   
  139.                  -- REDUCE THIS STRING TO THE SAME LENGTH AS SUBMITTED VALUE 
  140.                  SET @NewVal = SUBSTRING(LTRIM(RTRIM(CAST(@NewVal AS VARCHAR(30)))), 1, LEN(@OrigVal)); 
  141.   
  142.          END 
  143.          ELSE 
  144.          BEGIN 
  145.   
  146.                  -- Loop through the characters passed 
  147.                  WHILE @LoopCt <= @OrigLen 
  148.                          BEGIN 
  149.                                  -- Current length of possible characters 
  150.                                  SET @CurrLen = DATALENGTH(@OrigVal); 
  151.   
  152.                          -- Random position of character to use 
  153.                                  SELECT @Rand = Convert(int,(((1) - @CurrLen) * RandomValue + @CurrLen)) FROM dbo.vwRandom; 
  154.   
  155.                                  -- Assembles the value to be returned 
  156.                                  SET @NewVal = @NewVal + SUBSTRING(@OrigVal,@Rand,1); 
  157.   
  158.                                  -- Removes the character from available options 
  159.                                  SET @OrigVal = SUBSTRING(@OrigVal,1,(@Rand-1))+SUBSTRING(@OrigVal,(@Rand+1),LEN(@OrigVal)); 
  160.   
  161.                                  -- Advance the loop 
  162.                                  SET @LoopCt = @LoopCt + 1; 
  163.                          END 
  164.   
  165.                      -- Format new value 
  166.                          SET @NewVal = LTRIM(RTRIM(@NewVal)); 
  167.   
  168.          END 
  169.   
  170.          -- Return new value 
  171.          RETURN @NewVal
  172.  END 
  173.  GO 

Usage

  1.  SELECT  
  2.       Common.ufn_DataJumble(Forenames) AS Forenames 
  3.       , Common.ufn_DataJumble(Surname) AS Surnames 
  4.       , Common.ufn_DataJumble(AccountNumber) AS AccountNumber 
  5.       , Common.ufn_DataJumble(DateOfBirth) AS DateOfBirth  
  6.  FROM  
  7.       mySchema.mySensitiveTable; 
or

  1.  UPDATE Student.PersonExtract  
  2.  SET Surname=Common.ufn_DataJumble(Surname); 

Known Issues (fixed in above function)
  • "Invalid length parameter passed to the LEFT or SUBSTRING function"
    • Add LTRIM(RTRIM(Forenames)) if Forenames is a char datatype.
  • DateOfBirth is interpreted as string (isdate does not recognize datatype date)
    • CAST(Common.ufn_DataJumble(CAST(DateOfBirth AS DATETIME)) AS DATE) AS DateOfBirth

Still To Do
Create stored procedure that will take any table and return the table with scrambled data:
  • Needs to determine datatype
  • Cannot exceed allowed length (shouldn't happen if simply scrambling data)
  • Function needs to be reduced in size


Stored Procedure (using this function)
So that I can give it any table name and it will maintain structure and data types:

  1.  USE UnitTestDataStaging; 
  2.  GO 
  3.   
  4.   
  5.  -- Drop Stored Procedure if already exists 
  6.  IF OBJECT_ID ( 'Common.ufn_DataJumble''P' ) IS NOT NULL  
  7.      DROP PROCEDURE Common.ufn_DataJumble; 
  8.  GO 
  9.   
  10.   
  11.  -- Create Stored Procedure 
  12.  CREATE PROCEDURE Common.ufn_DataJumble 
  13.      @SchemaTableName nvarchar(100)  
  14.  AS  
  15.  BEGIN 
  16.    
  17.       -- Variables used 
  18.       DECLARE @TableName varchar(50); 
  19.       DECLARE @SqlToExecute nvarchar(4000); 
  20.   
  21.   
  22.       -- Set variable default values 
  23.       SET @TableName = SUBSTRING(@SchemaTableNameCHARINDEX('.'@SchemaTableName)+1, LEN(@SchemaTableName)); 
  24.       SET @SqlToExecute = ''; 
  25.   
  26.       -- Generate Select Query to extract data from given table 
  27.       SET @SqlToExecute='SELECT ' + CHAR(10) + STUFF( 
  28.            (  
  29.            -- Code to concatenate column names and data types into one string 
  30.            SELECT  
  31.                 ',Common.ufn_DataJumble(ltrim(rtrim(' + COLUMN_NAME+ '))) AS ' + COLUMN_NAME + CHAR(10) 
  32.            FROM  
  33.                 INFORMATION_SCHEMA.COLUMNS  
  34.            WHERE 
  35.                 TABLE_NAME = @TableName 
  36.            ORDER BY ORDINAL_POSITION   
  37.            FOR XML PATH ('') 
  38.            ),1,1,'' 
  39.       ) + ' FROM ' + @SchemaTableName;      
  40.   
  41.   
  42.       EXEC(@SqlToExecute); 
  43.  END      
  44.  GO 
  45.   
  46.  -- execute stored procedure on a table 
  47.  EXECUTE Common.ufn_DataJumble 'dbo.myTable'
Add Comment

Name:

Email:

Website:

Message:


Latest Posts

  • Joes Word Cloud (JWC)

    • Sat 23-Aug-14
      Salut Joe, Thank you again for helping me. I had installed V2.2 for Joomla 2.5. Finally, as you ...
      Nathalie  
    • Wed 20-Aug-14
      Salut Nathalie! I could only access the under construction page of your website, so I'm not 100% sure.
      Webmaster  
    • Tue 19-Aug-14
      Hi Joe, Thank you for your prompt reply. Below, an "image" of the cloud on my website : "pour ante ...
      Nathalie  
    • Mon 18-Aug-14
      Hi Nathalie, Thanks for the message. The module does not see Virtuemart text but that sounds like ...
      Webmaster  
    • Mon 18-Aug-14
      Hi Joel, Fist of all, thank you for the module. What such a work ! I have a problem with Virtuemart module.
      Nathalie