Data Acquisition and Cleansing using IS DataStage and QualityStage

Let's take each question in the series one by one.
1. How is the information generated?
As we have seen, data is generated over a period of time from various sources. Which sources are we looking at?

Let's take the example of CRM for a retail mart:
1. Data is generated when material for sale is acquired from vendors.
2. Data relating to the sales units (shops) and the materials to be dispatched to them, i.e. the quantity and type of materials.
3. Data generated by the sale of material.
4. Data generated by accounting.

The statements above show where the data is generated. This data is not yet information; it has to be cleansed before it becomes information.
How do we acquire the data or create a warehouse?
Various tools let you move data from an OLTP system to a warehouse. One such tool is IBM InfoSphere Information Server. The suite provides a complete set of tools that help the user extract data from different sources, transform it, load the warehouse, and give the customer the intelligence that turns the data into information.
Broadly, these are the tools available and what each one is used for:
Data extraction and transformation – InfoSphere DataStage
Data cleansing, standardization and analysis – InfoSphere QualityStage and InfoSphere Information Analyzer

Using these tools, we can move data from a source to a target warehouse. Along the way, the data is cleansed using QualityStage and Information Analyzer.
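The extract, transform and load steps can be illustrated without any IBM tooling. The minimal Python sketch below uses in-memory SQLite databases to stand in for the OLTP source and the warehouse; the table names, columns and cleansing rules are hypothetical, and a DataStage job would express the same steps as stages in a graphical job design rather than as code.

```python
import sqlite3

def build_source() -> sqlite3.Connection:
    """Hypothetical OLTP source: a sales table with untidy values."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE sales (shop TEXT, item TEXT, qty INTEGER)")
    src.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(" shop-1 ", "rice", 10), ("SHOP-1", "Rice ", 5), ("shop-2", "oil", None)],
    )
    return src

def extract(src: sqlite3.Connection):
    # Extract: read the raw rows from the source system.
    return src.execute("SELECT shop, item, qty FROM sales").fetchall()

def transform(rows):
    # Transform: trim and normalize case; drop rows with no quantity
    # instead of guessing a value.
    clean = []
    for shop, item, qty in rows:
        if qty is None:
            continue
        clean.append((shop.strip().upper(), item.strip().lower(), qty))
    return clean

def load(rows, dw: sqlite3.Connection):
    # Load: write the cleaned rows into a warehouse fact table.
    dw.execute("CREATE TABLE IF NOT EXISTS fact_sales (shop TEXT, item TEXT, qty INTEGER)")
    dw.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    dw.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract(build_source())), warehouse)
print(warehouse.execute(
    "SELECT shop, item, SUM(qty) FROM fact_sales GROUP BY shop, item").fetchall())
# [('SHOP-1', 'rice', 15)]
```

The two differently spelled rice rows for shop 1 collapse into one clean row, while the row with no quantity is dropped rather than loaded.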

The first step is to investigate the data, either through Information Analyzer or through QualityStage. Information Analyzer shows how the data in one column depends on other columns, with weightings and graphs. We can also define rules describing how the data is distributed or how it should be investigated.
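To make the idea of one column depending on another concrete, here is a minimal Python sketch. It is not Information Analyzer's algorithm; it computes a simple, assumed measure: for each value of a determinant column, take the most common value of the dependent column and count how many rows agree. A score of 1.0 means the determinant fully determines the dependent column, and lower scores flag rows worth investigating. The column names and data are hypothetical.

```python
from collections import Counter, defaultdict

def dependency_strength(rows, determinant, dependent):
    """Fraction of rows that agree with the most common `dependent` value
    for their `determinant` value (1.0 means a clean functional dependency)."""
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[determinant]][row[dependent]] += 1
    # For each determinant value, only the majority dependent value counts as consistent.
    consistent = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return consistent / len(rows)

# Hypothetical customer rows: does the ZIP code determine the city name?
customers = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
]
print(dependency_strength(customers, "zip", "city"))  # 0.75
```

Here the single NYC row pulls the score below 1.0, which marks it as a candidate for standardization.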

QualityStage has a stage called Investigate that provides similar analysis, with the difference that its results can be reused in later data-cleansing steps. Information Analyzer (IA) analysis is being integrated into DataStage to provide complete analysis and use of the data.
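One kind of investigation reports the character patterns that the values in a column follow. The sketch below imitates that idea in plain Python: each letter becomes `a`, each digit becomes `n`, any other character is kept, and the resulting patterns are counted. The notation and the function names are assumptions made for illustration, not the Investigate stage's actual output format.

```python
from collections import Counter

def char_pattern(value: str) -> str:
    # Letters -> 'a', digits -> 'n', every other character kept as-is.
    return "".join("a" if ch.isalpha() else "n" if ch.isdigit() else ch for ch in value)

def pattern_report(values):
    """Pattern frequencies, most common first."""
    return Counter(char_pattern(v) for v in values).most_common()

# Hypothetical phone-number column.
phones = ["555-1234", "555 1234", "5551234", "555-12x4"]
for pattern, count in pattern_report(phones):
    print(f"{pattern!r}: {count}")
```

An unexpected pattern such as `nnn-nnan` points straight at a record with a letter inside a phone number, and the list of distinct patterns suggests which formats the standardization rules must handle.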

Once the investigation is complete, the data is standardized according to the defined rules.
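Rules-based standardization can be pictured as a lookup from the variants found during investigation to one standard form. The minimal sketch below assumes a hypothetical rule table for street abbreviations; QualityStage rule sets are far richer, so treat this only as an illustration of the principle.

```python
import re

# Hypothetical standardization rules: variant token -> standard form.
RULES = {"ST": "STREET", "RD": "ROAD", "AVE": "AVENUE", "APT": "APARTMENT"}

def standardize_address(raw: str) -> str:
    # Upper-case, turn punctuation into spaces, and split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", raw.upper()).split()
    # Replace each token that has a rule; keep the others unchanged.
    return " ".join(RULES.get(tok, tok) for tok in tokens)

print(standardize_address("12 Main St., Apt 4"))  # 12 MAIN STREET APARTMENT 4
```

A single flat table is deliberately naive: it would also turn the ST in ST LOUIS into STREET, which is why practical rule sets take a token's context and position into account, not just its spelling.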

Some address standardization tools are certified by government postal authorities, and using them for address standardization can earn discounts on mail processing. Examples include CASS (US addresses), DPID (Australian address verification), SERP (Canadian addresses) and AddressDoctor.

Once standardization is done, the records are matched (compared with one another) to remove duplicates, and only unique records are passed on. Now we have clean data.
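Matching can be sketched as comparing normalized records and dropping any record that is too similar to one already kept. The example below swaps in Python's standard `difflib.SequenceMatcher` ratio with an assumed threshold of 0.85. This is a much simpler technique than the weighted matching a product such as QualityStage performs, and it compares every pair of records, so it only suits small data sets.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means identical strings.
    return SequenceMatcher(None, a, b).ratio()

def deduplicate(records, threshold=0.85):
    """Keep the first record of each group of near-duplicates."""
    kept = []  # (normalized key, original record)
    for rec in records:
        key = " ".join(rec.upper().split())  # normalize case and spacing
        if all(similarity(key, kept_key) < threshold for kept_key, _ in kept):
            kept.append((key, rec))
    return [rec for _, rec in kept]

# Hypothetical customer records.
customers = [
    "John Smith, 12 Main Street",
    "JOHN SMITH, 12 Main  Street",
    "Jon Smith, 12 Main Street",
    "Mary Jones, 4 Elm Road",
]
print(deduplicate(customers))
# ['John Smith, 12 Main Street', 'Mary Jones, 4 Elm Road']
```

Both variants of John Smith are folded into the first record, while Mary Jones survives as a distinct customer.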

In a typical customer dataset, at least 30 to 40% of the data is either duplicated or incomplete, and so is not fit to feed into the decision-making process.
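The share of unfit records in a dataset can be measured directly. This sketch, with hypothetical field names and data, counts a record as unfit if it is an exact duplicate of an earlier record or leaves a required field empty; catching near-duplicates would need a fuzzy matching step in addition.

```python
def unfit_share(records, required_fields):
    """Fraction of records that exactly duplicate an earlier record
    or leave a required field empty."""
    if not records:
        return 0.0
    seen = set()
    unfit = 0
    for rec in records:
        key = tuple(sorted(rec.items()))  # order-independent identity of the record
        is_duplicate = key in seen
        seen.add(key)
        is_incomplete = any(not rec.get(field) for field in required_fields)
        if is_duplicate or is_incomplete:
            unfit += 1
    return unfit / len(records)

# Hypothetical customer records.
records = [
    {"name": "John Smith", "phone": "555-1234"},
    {"name": "John Smith", "phone": "555-1234"},  # exact duplicate
    {"name": "Mary Jones", "phone": "555-9876"},
    {"name": "Ann Lee", "phone": "555-4321"},
    {"name": "Raj Patel", "phone": ""},           # missing phone
]
print(unfit_share(records, ["name", "phone"]))  # 0.4
```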


There are a few virtual sessions running in parallel at IBM:

Is Your Data Secure? A Conversation with Experts – June 17 / 2:30 p.m. ET
Join us for a live interactive virtual event to explore the issues and challenges facing organizations as they look to privatize and secure their most sensitive data. Register today – http://bit.ly/1qhkBMf






Information, Information and Information

Welcome all,

In today's world, information is the buzzword. The better informed one is, the better one's position to make decisions. But why do we need information, and where do we use it?

Let's look at a couple of scenarios where we need information.
1. I need to buy a car, so I research on the web the criteria I am interested in, i.e.:
a. Available budget (down payment, EMI, allowable funds)
b. Type of car
c. Miles travelled per day
d. Number of people driving and travelling
e. Fuel efficiency
f. Specifications
g. Alternatives available
h. Security features

2. A retailer wants to promote sales and identify the commodities that boost sales:
a. Which commodities/items the retailer sells
b. Which commodities are exclusive in terms of range, competitive pricing and stock
c. Profit margin
d. Fast-moving items and their periods of sale
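The retailer's questions about margin and fast-moving items reduce to simple aggregations over sales data. The sketch below assumes hypothetical sales lines of the form (item, units, unit price, unit cost) over a known number of weeks, computes units sold per week and gross margin percentage for each item, and ranks the items by sales velocity.

```python
from collections import defaultdict

def item_summary(lines, period_weeks):
    """Units sold per week and gross margin % for each item."""
    units = defaultdict(int)
    revenue = defaultdict(float)
    cost = defaultdict(float)
    for item, qty, price, unit_cost in lines:
        units[item] += qty
        revenue[item] += qty * price
        cost[item] += qty * unit_cost
    return {
        item: {
            "units_per_week": units[item] / period_weeks,
            # Assumes every item has non-zero revenue.
            "margin_pct": 100 * (revenue[item] - cost[item]) / revenue[item],
        }
        for item in units
    }

# Hypothetical two weeks of sales lines: (item, units, unit_price, unit_cost)
sales = [
    ("rice", 120, 1.50, 1.20), ("rice", 130, 1.50, 1.20),
    ("olive oil", 10, 8.00, 5.00), ("olive oil", 14, 8.00, 5.00),
    ("soap", 40, 0.90, 0.60),
]
summary = item_summary(sales, period_weeks=2)
fast_movers = sorted(summary, key=lambda i: summary[i]["units_per_week"], reverse=True)
print(fast_movers)  # ['rice', 'soap', 'olive oil']
```

Here rice moves fastest while olive oil earns the best margin (37.5%); it is the combination of such figures, not the raw sales lines, that gives the retailer information to act on.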

From these two scenarios, it is evident that information matters and is in fact required to make decisions. The decisions will not necessarily be correct, but they will be informed judgements.

But is it sufficient just to be informed?
Should the information be qualified?
How qualified is the information?
How confident are you on the information?
Who are the sources of the information?

Again, these questions all come back to the sources of the information and our confidence in it.

But how is the information generated?
What is information, and what is data?
Are any standards followed in data acquisition?
What standards are followed, and are there any certifications for data acquisition?
Is the data clean and standardized?

Let's discuss these questions in the following posts.