SSIS 'Data Profiling Task’ Control for data validation

Using SSIS ‘Data Profiling Task’ Control for data validation

By Subhendu Das  •  April 30, 2014  •  6955 Views

Netwoven Blog

When you are involved in an SSIS project, you have to collect data from various data sources like from databases, flat files, excel files etc. Generally when you find that the data is not well formed, it’s of very poor quality for analyzing, lots of data are missing etc. etc. So before starting the project and designing a database structure if we can analyze the data properly it will be helpful for us to design our data ware house properly.

SSIS provides a wonderful control ‘Data Profiling Task’ to accomplish this task that helps us to provide the Meta information of how the data is organized.

Scenario:

Suppose you have received data in excel format as follows:

Using SSIS ‘Data Profiling Task’ Control for data validation

Now we need to analyze this data to get the insight and Meta information to design our data warehouse.

Firstly we need to import this excel data into Sql Server. The reason for this is currently ‘Data Profiling Task’ control only supports SQL Server ADO.NET connection type to import data. Hopefully in future Microsoft will give more option to connect excel file, text file etc. for profiling directly.

Steps:

It’s basically a three steps process:

Step 1: Drag the control ‘Data Profiling Task’ in ‘Control Flow’ tab.

Using SSIS ‘Data Profiling Task’ Control for data validation

Step 2: Select the path where the profiling information should be stored.

E.g. ‘ProfileInfo’ is the file name in this case.

Using SSIS ‘Data Profiling Task’ Control for data validation

Step 3: Select the Server and Database name on which the profiling will be performed.

3.1   Click ‘Quick Profile’ in ‘General’ tab.

Using SSIS ‘Data Profiling Task’ Control for data validation

3.2   Select Server and Database

Using SSIS ‘Data Profiling Task’ Control for data validation

3.3   Select the table name and check all the ‘Compute’ options.

Using SSIS ‘Data Profiling Task’ Control for data validation

3.4 Run the Package.

Now the profile data has been stored in the file ‘ProfileInfo’ in this case, which is basically a XML file and looks like as follows if you open in web explorer:

Using SSIS ‘Data Profiling Task’ Control for data validation

But, the above file is not in a readable format. To read this file Microsoft has provide us a DataProfile Viewer that is stored in the path ‘C:Program Files (x86)Microsoft SQL Server110DTSBinn’

Open the DataProfile Viewer and open the file ‘ProfileInfo’(in this case) in it that looks like as follows:

Using SSIS ‘Data Profiling Task’ Control for data validation

In the above snap if you unfold the ‘Tables’ and then [dbo].[Sheet 1$],  you will find seven profile that I will discuss one by one in coming Blogs elaborately.

  1. Candidate Key Profiles : Suggest which column has the unique values.
  2. Column Length Distribution Profiles : Suggest the minimum and maximum length of column.
  3. Column Null Ratio Profiles : Suggest the null percentage of each column value.
  4. Column Pattern Profiles : Suggest the pattern of column value.
  5. Column Statistics Profiles :  Statistic for numeric field such as Minimum value, Maximum Value,  Mean, Standard Deviation.
  6. Column Value Distribution Profiles : Give a repetition percentage of a column value.
  7. Functional Dependency Profiles : Imply if any column value is dependent on another column value.

Conclusion:

In this blog I basically tried to explain how you can use the SSIS Data Profiling Task Control to get the insight of the received data. I hope this is helpful.

Subhendu Das

Subhendu Das

Subhendu Das is a technically competent IT Professional offering a distinguished career donning leadership roles for over 18 years primarily in IT Infrastructure Services along with a 12 years’ experience in IT Education Industry as a lead Educationalist. Subhendu has been working as a Senior Manager – IT Infrastructure with Netwoven and he is driving a team of IT Administrators and building sound IT Infrastructure for developers and remote servers in US. He is also actively involved with various client infrastructure migration, SharePoint, Exchange and Office 365 projects. Subhendu holds a Bachelor of Science from Calcutta University and also is a graduate from National Institute of Information Technology. He is a Microsoft Certified professional with certifications in MCSE, MCITP, MOS, MCTS, MCSA.

1 comment

Leave a comment

Your email address will not be published. Required fields are marked *

Microsoft Partner
Microsoft Partner
Microsoft Partner
Microsoft Partner
Microsoft Partner
Microsoft Partner
Microsoft Fast Track
Microsoft Partner
MISA
MISA
Unravel The Complex
Stay Connected

Subscribe and receive the latest insights

Netwoven Inc. - Microsoft Solutions Partner

Get involved by tagging Netwoven experiences using our official hashtag #UnravelTheComplex