Tuesday, November 17, 2009

SQL Server and legacy systems

Sometimes it happenes that we have to format the data which is stored in an SQL Server Database for an older system which requires the data to have a specific length. To achieve the length different datatypes are filled differently (at least in my case).
Lets take this example (we assume a length of 8):
- numbers have to be filled with zeros on the left (e.g. 66 -> 000066)
- strings have to be filled with spaces on the right (e.g. 'abc' -> 'abc ')

How can we achieve this?
Well the first problem is that we have to measure the length of the data and SQL Server provides two functions for that: 'LEN' and 'DATALENGTH'
This is not a problem you would say but it is because they behave differently. Look at the following examples:
Note the leading and trailing spaces of the strings!
"SELECT LEN(' abc ')" => 4
"SELECT DATALENGTH(' abc ')" => 5

"SELECT LEN(123)" => 3
"SELECT LEN(123.4567)" => 8

"SELECT DATALENGTH(123)" => 4
"SELECT DATALENGTH(999999999999)" => 9

Well this "strange" behaviour is because of what these functions do. Let's see the definition from MSDN on this:
LEN (Transact-SQL)Returns the number of characters of the specified string expression, trailing blanks.

DATALENGTH (Transact-SQL)
Returns the number of bytes used to represent any expression.

So in principle I would say: if you want to know the number of characters of a value then you should use DATALENGTH for string values ( nchar, text, nvarchar,...) and LEN for numbers (int, float, decimal,...).

The next problem when filling values is the NULL value. According to the definition of the two functions is LEN(NULL) equal to NULL and DATALENGTH(NULL) also NULL. This may be a problem since I have to provide blanks or zeros if the value is NULL (e.g. ' ', 00000000).

The third problem is on how to fill the value with the requested fillers (zeros and blanks)? Well for this SQL Server provides a very nice function: REPLICATE
This function takes as input a character which should be replicated and a length. E.g.:
"SELECT REPLICATE('0', 10)" => 0000000000
"SELECT REPLICATE(' ', 10)" => ' '
This helps us to create what we want. So if we have the number '99' and we want it to fill on the left to a length of 8 we can do something like:
"SELECT REPLICATE('0', 8 - LEN(99)) + CAST(99 AS NCHAR)" => 00000099
"SELECT REPLICATE(' ', 8 - DATALENGTH(' abc ')) + ' abc '" => ' abc '
To fill on the right we have another possibility; it is casted to an NCHAR of the desired length; but this fills only with blanks:
"SELECT CAST(' abc ' AS NCHAR(8))" => ' abc    '
"SELECT CAST(99 AS NCHAR(8))" => '99 '
If you want to fill with something different then you have to use again the REPLICATE function as described above.

To handle the NULL value we use the case statement:
"SELECT CASE WHEN(null IS NOT NULL) THEN REPLICATE('0', 8 - LEN(null)) + CAST(null AS NCHAR) ELSE REPLICATE('0', 8) END" => '00000000'

"SELECT CASE WHEN(99 IS NOT NULL) THEN REPLICATE('0', 8 - LEN(99)) + CAST(99 AS NCHAR) ELSE REPLICATE('0', 8) END" => '00000099'

Of course written like this it does not make much sense since we know in advance which part of the case statement is executed; but imagine you write in the place of the "null" or "99" the name of a column, then it makes perfectly sense. You can now easily write a case statement for string values.

No comments: