Importing multiple text files using SSIS

We can import multiple flat files in one go using the MULTIFLATFILE connection manager. This can be used in scenarios where you have multiple files with the same format lying in a directory and you want to import records from all the files in one table.

First you need to make a connection for this.

New Connection
Right click on the connection managers window. Select New Connection.

MultiFlatFile Connection Manager
Select MULTIFLATFILE connection manager and click Add.

Connection Manager Editor
The connection manager editor window will open. Here you can specify the file path in the file names section. You can use wildcard characters to specify multiple files. The easiest way is to click on the Browse button and navigate to the folder where your files are located. Select any file from the folder and click Open and then replace the file name with wildcard character to specify multiple files, like *.* or in my case *.txt.

Connection Manager Editor Columns
Click on the Columns tab to specify your delimiters. You should get some records in the preview rows. Now click Ok to close this window.

Data Flow TaskFlat File Source
Drag a data flow task and double click on it to edit. In the Data Flow add a Flat File Source and double click on it to edit.

Flat File Source Editor
Select the connection manager we just defined and click Ok.

OLE DB Destination
Add a OLE DB Destination.

OLE DB Destination Editor
Point it to the table you want to import the data into.

OLE DB Destination Editor Column Mappings
Set up the column mapping.

Execute Package
That’s it! Now if you execute this package, you will see total records transfer to the table is equal to sum of number of records in each.

Adding a FileName column:
If you want to identify which file a particular record comes from, you can add a FileName column.

Advanced Editor for Flat File Source
Right click on the flat file source and select Show Advanced Editor. In the editor window click on Component Properties tab. Now specify a column name in the FileNameColumnName property. This will add an extra column to the Flat File Source with the file name for each row. This column will contain complete file path. We want only file name, also we will have to convert the data type of the column to DT_STR. We can do both by adding Derived Column.

Derived Column Task
Drag a Derived Column task and connect it between our source and destination.

Derived Column Transformation
Double click on the derived column task to edit. Set the properties as shown in the screenshot and click Ok. Here is the expression

SUBSTRING(FileName,LEN(FileName) - FINDSTRING(REVERSE(FileName),"\\",1) + 2,LEN(RIGHT(FileName,FINDSTRING(REVERSE(FileName),"\\",1) - 1)))

If you want complete file path then just enter FileName (which is our column name) as expression text.

OLE DB Destination Editor Column Mappings for FileName
Now just double double click on the OLE DB Destination and map our new column with file name to the destination table.

Download sample project

Advertisements
Posted in Programming, SSIS | Tagged , , , , , , , | 17 Comments

Should I use OUTER JOIN instead of NOT EXISTS?

NOT EXISTS in a query can readily be replaced by an OUTER JOIN.
Here is how

CREATE TABLE Emp(EmpID INT, EmpName VARCHAR(20))
INSERT INTO Emp(EmpID, EmpName)
SELECT 1, 'Amit' UNION
SELECT 2, 'Sachin' UNION
SELECT 3, 'Sonam'

CREATE TABLE OnLeave(OnLeaveID INT, EmpID INT)
INSERT INTO OnLeave(OnLeaveID, EmpID)
SELECT 1, 2 UNION
SELECT 2, 3

--Using NOT EXISTS
SELECT Emp.EmpName FROM Emp
WHERE NOT EXISTS
	(SELECT 1 FROM OnLeave 
	WHERE OnLeave.EmpID = Emp.EmpID)

--Using LEFT OUTER JOIN
SELECT Emp.EmpName FROM Emp
LEFT OUTER JOIN OnLeave ON OnLeave.EmpID = Emp.EmpID
WHERE OnLeave.OnLeaveID IS NULL

This is because of the fact that in an LEFT OUTER JOIN every row in the left table is represented in the result set, whether or not there are any matching rows from the right table. If for a particular row of the left table there is no matching row from the right table, then the columns in that row which would have come from the right table will be NULL. The test for NULL should be made on a column that you are sure it wont contain a NULL value; for example primary key of the right table. Also it should be noted that a record will be repeated if there are more than one match in the right table. In that case we will have to use DISTINCT in the select line.

If proper indexes are in place, the OUTER JOIN method and WHERE EXISTS method are basically identical when it comes to performance. However there have been many case when I have replaced a NOT EXISTS with an OUTER JOIN and got reduced execution time. People keep on telling that both have almost identical query plan. And you should opt for NOT EXISTS as it is more readable. But I feel LEFT OUTER JOINT method is also equally readable. For example

UPDATE TblOne SET Errors = 1
FROM TblOne
WHERE NOT EXISTS
	(SELECT 1 FROM TblTwo
	WHERE TblTwo.TblOneId = TblOne.TblOneId
	AND TblTwo.Col1 = TblOne.Col1
	AND TblTwo.Chk = 'Y' )

UPDATE TblOne SET Errors = 1
FROM TblOne
LEFT OUTER JOIN TblTwo
	ON TblTwo.TblOneId = TblOne.TblOneId
	AND TblTwo.Col1 = TblOne.Col1
	AND TblTwo.Chk = 'Y'
WHERE TblTwo.TblTwoId IS NULL

What I do is, I keep both options handy. For queries with a very low execution time I go for NOT EXISTS just for the fact that it is easier for others to later understand what I am doing in the query. For heavier queries which have higher execution time I try both options and keep the one which gives better execution time. In general I have noticed that replacing NOT EXISTS with OUTER JOIN gives performance improvement when used in DELETE and UPDATE statements.

Posted in Programming, SQL | Tagged , , , , , , , , , , | Leave a comment

SQL function to remove extra multiple spaces from string

Here is a very efficient solution for removing multiple spaces from a string. I mean replacing multiple spaces between words in a string with a single space, without using a loop. The technique is very simple, I will just run a small script to show how it works.

DECLARE @str AS VARCHAR(50)
SET @str = 'Lets   jot some  code'
PRINT @str

SET @str = REPLACE(@str,' ','{}')
PRINT @str
SET @str = REPLACE(@str,'}{','')
PRINT @str
SET @str = REPLACE(@str,'{}',' ')
PRINT @str

And here is the output.

Lets   jot some  code
Lets{}{}{}jot{}some{}{}code
Lets{}jot{}some{}code
Lets jot some code

Below is a function based on this technique.

CREATE FUNCTION RemoveSpaces
(
-- =============================================
-- Author:      Amit Singh
-- Description: Remove extra spaces from string
-- Usage:       SELECT dbo.RemoveSpaces('Code  Jotter')
-- =============================================
	@str AS VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
	RETURN
	REPLACE(REPLACE(REPLACE(@str,' ','{}'),'}{',''),'{}',' ')
END
GO
Posted in Programming, SQL | Tagged , , , , , , , , , , , | 11 Comments

Allow SQL Server to use more RAM

On a default 32 bit Windows Server setup, 2 GB of RAM is the most that a SQL Server instance can use. Let’s say a server has 10 GB of RAM that is running nothing but the OS and SQL. The operating system will allocate 2 GB to the OS and 2 GB for SQL. That means your server is wasting 6 GB of RAM.

We can change that by using AWE. Microsoft SQL Server uses the Microsoft Windows Address Windowing Extensions (AWE) API to support very large amounts of physical memory. SQL Server can access up to 64 gigabytes (GB) of memory on Microsoft Windows 2000 Server and Microsoft Windows Server 2003.

Support for AWE is available in SQL Server Enterprise, Standard, and Developer editions and only applies to 32-bit versions of SQL Server 2005 and SQL Sever 2008.

We won’t go in details about how it works; MSDN has lots of articles about it.
Here are the four steps needed to be able to use AWE on your SQL Server.
1. Add the /PAE (Physical Address Extension) switch to boot.ini to allow Windows 2000/2003 to access more than 4 GB of memory.
2. Add /3GB switch to boot.ini if server has more than 4GB but less than or equal to 16GB RAM.
3. Grant “Lock Pages in Memory” to the user account that runs the SQL Server process.
4. Check the checkbox in the SQL Instance properties to allow AWE.

To edit the boot.ini file on a Windows 2003 server, open Control Panel, open System, then click Advanced, then settings for the Startup and Recovery, then Edit under System startup. Then you will see a file whose contents are similar to the following:

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="WindowsServer 2003, Standard" /fastdetect /NoExecute=OptOut))

The /PAE switch or /3GB switch are added directly after operating systems line. Below is an example with both the switches:

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="WindowsServer 2003, Standard" /fastdetect /NoExecute=OptOut /PAE /3GB))

Now we needed to change the “Lock Pages in Memory” registry setting to not use the swap file but use physical memory. Here are the steps :
1. On the Start menu, click Run. In the Open box, type gpedit.msc.
2. The Group Policy dialog box opens.
3. On the Group Policy console, expand Computer Configuration, and then expand Windows Settings.
4. Expand Security Settings, and then expand Local Policies.
5. Select the User Rights Assignment folder.
6. The policies will be displayed in the details pane.
7. In the pane, double-click Lock pages in memory.
8. In the Local Security Policy Setting dialog box, click Add.
9. In the Select Users or Groups dialog box, add an account with privileges to run sqlservr.exe.

Finally to enable AWE:
1. In SQL Server Management Studio, right-click a server and select Properties.
2. Click the Memory node.
3. Under Server memory options, select Use AWE to allocate memory.

You can use the following query to check if your AWE settings are working fine or not.

SELECT * FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Target Server Memory (KB)','Total Server Memory (KB)'))

This will show you the amount of memory SQL Server is aiming to use (Target Server Memory), and how much it is currently using (Total Server Memory). If AWE is not enabled, these values won’t go above 1740800 KB i.e. 1700 MB.

Posted in Programming | Tagged , , , , , , , , | 1 Comment

Generate random and unique alphanumeric code in SQL

The key to generating large number of random and unique alpha numeric code is to generate all of it in one go. Store all the codes in a file or a table and server the requested amount of codes whenever required from there.
The SQL script below will generate 10 digit or rather 10 character alpha numeric codes. You can change the number of cross joins to change the number of digits in the code. For example to generate 8 digit code i will have only eight cross joins of the Chars table and eight concatenation of the Digit column in the select line.

SELECT
C1.Digit + C2.Digit + C3.Digit + C4.Digit + C5.Digit + C6.Digit
+ C7.Digit + C8.Digit + C9.Digit + C10.Digit Num
FROM Chars C1, Chars C2, Chars C3, Chars C4, Chars C5,
Chars C6, Chars C7, Chars C8, Chars C9, Chars C10
WHERE C1.Digit IN('A','B','C','D','E','F','G','H','2','3','4','5','6')
AND ASCII(c2.Digit) = ASCII(c3.Digit) + 6 --6th char after dependency char
AND C3.Digit IN('A','C','E','G','K','M','N','P','Q','R','S') --Dependency char
AND c4.Digit = 'R' --Fixed char
AND C5.Digit IN('B','J','K','L','M','N','P','Q','R','S','7','8','9')
AND ASCII(c6.Digit) = ASCII(c3.Digit) + 2
AND C7.Digit IN('T','U','V','W','X','Y','Z','3','4','5','6')
AND C8.Digit NOT IN('O','I','0','1')
AND ASCII(c9.Digit) = ASCII(c3.Digit) + 5
AND C10.Digit NOT IN('O','I','0','1')
ORDER BY NEWID()

Take note of the WHERE condition. I have defined a simple condition for each character of the code. You can use your own conditions. This solves two purposes, one is that it gives a meaning to the generated codes; also it limits the number of codes that will be generated. The script below generates codes in excess of 20 million. If I need more unique codes; ill just change the fixed character and regenerate. I have used ORDER BY NEWID() to randomize the codes as I am using the script to dump all generated codes to a text file using SSIS. If you want to store the codes in table, you can use an INTO clause and skip the ORDER BY clause. Of course then you will want to use ORDER BY NEWID() when you are fetching the codes from the table.

Below is the script for generating the characters table which you will be needing.

CREATE TABLE Chars(Digit VARCHAR(1))
GO
INSERT INTO Chars(Digit)
SELECT '1' UNION SELECT '2' UNION SELECT '3' UNION SELECT '4' UNION
SELECT '5' UNION SELECT '6' UNION SELECT '7' UNION SELECT '8' UNION 
SELECT '9' UNION SELECT '0' UNION SELECT 'A' UNION SELECT 'B' UNION 
SELECT 'C' UNION SELECT 'D' UNION SELECT 'E' UNION SELECT 'F' UNION 
SELECT 'G' UNION SELECT 'H' UNION SELECT 'I' UNION SELECT 'J' UNION
SELECT 'K' UNION SELECT 'L' UNION SELECT 'M' UNION SELECT 'N' UNION 
SELECT 'O' UNION SELECT 'P' UNION SELECT 'Q' UNION SELECT 'R' UNION 
SELECT 'S' UNION SELECT 'T' UNION SELECT 'U' UNION SELECT 'V' UNION 
SELECT 'W' UNION SELECT 'X' UNION SELECT 'Y' UNION SELECT 'Z'
GO
Posted in Programming, SQL | Tagged , , , , , , , , , , , | 8 Comments

Convert string to ASCII

A simple SQL function I created to convert a string to ASCII values.

-- =============================================
-- Author:		Amit Singh
-- Description:	Convert string to ASCII values
-- Usage: 		SELECT dbo.ToASCII('Amit')
-- =============================================
CREATE FUNCTION ToASCII
(@string AS VARCHAR(2000))
RETURNS VARCHAR(5000)
AS
BEGIN
	SET @string = UPPER(@string)

	DECLARE @Output VARCHAR(5000)
	DECLARE @position INT
	SET @position = 1

	WHILE @position <= DATALENGTH(@string)
	BEGIN
		SELECT @Output = COALESCE(@Output, '')
					+ CAST(ASCII(SUBSTRING(@string, @position, 1)) AS VARCHAR(3))
		SET @position = @position + 1
	END

	RETURN @Output
END
-- =============================================
GO

This function is build as per my requirements but you can modify it to get desired results.
For example you can replace

COALESCE(@Output, '')

with

COALESCE(@Output + ',', '')

to get a delimited output.

Posted in Programming, SQL | Tagged , , , , , | Leave a comment

Visual Flowchart Builder

This one was done for my sixth semester B.E in Computer Technology. Completely done in VC++, it is a visual tool for problem solving by drawing flow chart.

Visual Flowchart Builder is a tool which enables a person to draw flow chart for a particular problem using the standard symbols and then execute the flowchart to get the result of the problem. The flowcharts can be made very easily using the toolbar. These flowcharts can then be executed or saved to a file for later retrieval.

The above snapshot shows the flowchart for finding the factorial of a number. You can see a input block below the start block; execution starts from the start block and when the input block is executed, a input dialog pops up to ask for user input. Similarly for every output block a message is displayed in a console window.

Click the link below to download the executable and flowchart samples.
Download

Posted in Academic Projects, Programming | Tagged , , , , , | 2 Comments