Let's take each question in the series one by one.
1. How is the information generated?
As we have seen, data is generated over a period of time from various sources. What are those sources?
Let's take the example of a CRM system in a retail mart:
1. Data is generated when material for sale is acquired from vendors.
2. Data relating to the sales units (shops) and the materials to be dispatched to them, i.e., the quantity and type of materials.
3. Data generated by the sale of material.
4. Data generated by accounting.
From the statements above, we can see where the data is generated. This data is not information yet; it has to be cleansed before it becomes information.
How do we acquire the data or create a warehouse?
Various tools allow you to move data from an OLTP system to a warehouse. One such tool is IBM InfoSphere Information Server. The suite provides a complete set of tools that help the user extract data from different sources, transform it, load it into the warehouse, and deliver it to the customer as information that supports business intelligence.
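The extract, transform, and load steps can be illustrated with a small sketch. It does not use Information Server or DataStage; it assumes two in-memory SQLite databases standing in for the OLTP system and the warehouse, and the table names `sales` and `fact_sales` and their columns are hypothetical.

```python
import sqlite3

# Hypothetical schemas, for illustration only:
#   OLTP:      sales(sale_id, shop, material, qty, unit_price)
#   Warehouse: fact_sales(shop, material, total_qty, revenue)


def extract(oltp: sqlite3.Connection) -> list[tuple]:
    """Extract: read raw sale rows from the operational (OLTP) database."""
    return oltp.execute("SELECT shop, material, qty, unit_price FROM sales").fetchall()


def transform(rows: list[tuple]) -> dict[tuple[str, str], tuple[int, float]]:
    """Transform: normalize names, then total quantity and revenue per (shop, material)."""
    totals: dict[tuple[str, str], tuple[int, float]] = {}
    for shop, material, qty, unit_price in rows:
        key = (shop.strip().upper(), material.strip().upper())
        prev_qty, prev_revenue = totals.get(key, (0, 0.0))
        totals[key] = (prev_qty + qty, prev_revenue + qty * unit_price)
    return totals


def load(warehouse: sqlite3.Connection, totals: dict[tuple[str, str], tuple[int, float]]) -> None:
    """Load: write the aggregated rows into the warehouse fact table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(shop TEXT, material TEXT, total_qty INTEGER, revenue REAL)"
    )
    warehouse.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(shop, material, qty, revenue) for (shop, material), (qty, revenue) in totals.items()],
    )
    warehouse.commit()


def run_etl(oltp: sqlite3.Connection, warehouse: sqlite3.Connection) -> None:
    """Run the three steps in order: extract -> transform -> load."""
    load(warehouse, transform(extract(oltp)))


if __name__ == "__main__":
    oltp = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE sales (sale_id INTEGER, shop TEXT, material TEXT, qty INTEGER, unit_price REAL)"
    )
    oltp.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
        [(1, "shop-01 ", "rice", 2, 1.5), (2, "SHOP-01", "Rice", 3, 1.5), (3, "shop-02", "soap", 1, 0.75)],
    )
    warehouse = sqlite3.connect(":memory:")
    run_etl(oltp, warehouse)
    # prints [('SHOP-01', 'RICE', 5, 7.5), ('SHOP-02', 'SOAP', 1, 0.75)]
    print(warehouse.execute("SELECT * FROM fact_sales ORDER BY shop").fetchall())
```

Note how the transform step already does a little cleansing ("shop-01 " and "SHOP-01" end up as one shop); a real job would do much more of this.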
Broadly, these are the tools available and what each one is for:
Data extraction and transformation – InfoSphere DataStage
Data cleansing, standardization, and analysis – InfoSphere QualityStage and InfoSphere Information Analyzer
Using the tools above, we can move data from a source to a target warehouse. Along the way, the data is cleansed using QualityStage and Information Analyzer.
The process starts by investigating the data. Information Analyzer shows how strongly the data in one column depends on other columns, with weightings and graphs. We can also define rules describing how the data is distributed or how it should be investigated.
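To give a feel for what "dependency of one column on another" means, here is a minimal sketch of a simple functional-dependency score. This is a swapped-in technique for illustration, not Information Analyzer's own algorithm, and the sample customer rows are made up. For each value of the determinant column, the most frequent value of the dependent column is taken as "expected"; the score is the share of rows that agree with it.

```python
from collections import Counter, defaultdict


def dependency_strength(rows: list[dict], determinant: str, dependent: str) -> float:
    """Share of rows consistent with the rule 'determinant -> dependent'.

    For each distinct determinant value, the most frequent dependent value is
    treated as expected; every other row in that group counts as a violation.
    1.0 means the determinant column fully determines the dependent column.
    """
    if not rows:
        return 1.0  # an empty table violates no dependency
    groups: dict[object, Counter] = defaultdict(Counter)
    for row in rows:
        groups[row[determinant]][row[dependent]] += 1
    consistent = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return consistent / len(rows)


def dependency_report(rows: list[dict], columns: list[str]) -> dict[tuple[str, str], float]:
    """Dependency strength for every ordered pair of distinct columns."""
    return {
        (a, b): dependency_strength(rows, a, b)
        for a in columns
        for b in columns
        if a != b
    }


if __name__ == "__main__":
    customers = [
        {"zip": "10001", "city": "New York", "state": "NY"},
        {"zip": "10001", "city": "New York", "state": "NY"},
        {"zip": "10001", "city": "NYC", "state": "NY"},
        {"zip": "94105", "city": "San Francisco", "state": "CA"},
    ]
    for (a, b), score in sorted(dependency_report(customers, ["zip", "city", "state"]).items()):
        print(f"{a} -> {b}: {score:.2f}")
```

Here zip -> state scores 1.0, while zip -> city scores only 0.75 because "NYC" and "New York" are spelled differently, which is exactly the kind of finding that feeds the later standardization step.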
QualityStage has an Investigate stage that provides similar analysis, with the advantage that its results can be reused in later data-cleansing steps. Information Analyzer (IA) analysis is being integrated into DataStage to provide complete analysis and use of the data.
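One common kind of investigation is a character-pattern frequency report: each value is masked into a pattern and the patterns are counted, so odd formats stand out. The sketch below is a generic illustration and not the Investigate stage's actual output; the masking convention (letters become `a`, digits become `n`, everything else is kept) and the sample phone numbers are assumptions.

```python
from collections import Counter


def char_pattern(value: str) -> str:
    """Mask a value: letters -> 'a', digits -> 'n', all other characters kept as-is."""
    return "".join(
        "a" if ch.isalpha() else "n" if ch.isdigit() else ch
        for ch in value
    )


def pattern_report(values: list[str]) -> list[tuple[str, int]]:
    """Frequency of each character pattern, most common first."""
    return Counter(char_pattern(v) for v in values).most_common()


if __name__ == "__main__":
    phones = ["555-1234", "555-9876", "5551234", "555-12a4"]
    for pattern, count in pattern_report(phones):
        print(pattern, count)
```

A rare pattern such as `nnn-nnan` points straight at a suspect value, and the same report can be kept and reused when writing the standardization rules.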
Once the investigation is complete, the data is standardized based on the rules that were defined.
Some address standardization tools are approved by government authorities, and using them for address standardization can earn discounts on mail processing. Examples include CASS (US addresses), DPID (Australian address verification), SERP, and AddressDoctor.
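Rule-based standardization can be sketched as a lookup table that maps variant spellings to one standard token. The rule table below is hypothetical and tiny; certified tools such as CASS or DPID rely on official postal reference data, which this sketch does not attempt to reproduce.

```python
import re

# Hypothetical rule table mapping variant spellings to one standard token.
# Certified tools (CASS, DPID, ...) use official postal reference data instead.
ADDRESS_RULES = {
    "STREET": "ST",
    "STR": "ST",
    "AVENUE": "AVE",
    "AV": "AVE",
    "ROAD": "RD",
    "NORTH": "N",
    "SOUTH": "S",
    "APARTMENT": "APT",
}


def standardize_address(raw: str, rules: dict[str, str] = ADDRESS_RULES) -> str:
    """Uppercase, replace punctuation with spaces, then apply token-level rules."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.upper())
    return " ".join(rules.get(token, token) for token in cleaned.split())


if __name__ == "__main__":
    for raw in ["12 North Main Street.", "12 N. Main St", "12, n main street"]:
        print(standardize_address(raw))  # each prints: 12 N MAIN ST
```

Standardizing first matters because the matching step that follows can only recognize two records as the same customer once their addresses are written the same way.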
Once standardization is done, the records are matched (compared with one another) to remove duplicates, and only unique records are passed on. At this point we have clean data.
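Matching can be sketched with fuzzy string comparison. This example uses Python's standard `difflib.SequenceMatcher` as a stand-in for QualityStage's matching; the match rule (same ZIP code and a name similarity of at least 0.85) and the record fields are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def is_match(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical match rule: same ZIP code and similar (fuzzy-matched) names."""
    if a["zip"] != b["zip"]:
        return False  # different ZIP codes are never compared further
    similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return similarity >= threshold


def deduplicate(records: list[dict], threshold: float = 0.85) -> list[dict]:
    """Keep the first record of each group of matching records (O(n^2) comparisons)."""
    unique: list[dict] = []
    for record in records:
        if not any(is_match(record, kept, threshold) for kept in unique):
            unique.append(record)
    return unique


if __name__ == "__main__":
    customers = [
        {"name": "John Smith", "zip": "10001"},
        {"name": "Jon Smith", "zip": "10001"},   # likely the same person, typo in name
        {"name": "John Smith", "zip": "94105"},  # same name, different area
        {"name": "Jane Doe", "zip": "10001"},
    ]
    for customer in deduplicate(customers):
        print(customer)
```

Comparing every record with every kept record is fine for a demo but does not scale; production matching usually groups records into blocks (here, by ZIP code) and compares only within a block.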
In a typical customer dataset, at least 30 to 40% of the data is either duplicated or incomplete and is not fit to feed into the decision-making process.
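To check a figure like this against your own data, you can count incomplete records and duplicates and express them as a percentage. The sketch below is an assumption-laden illustration: "incomplete" means a required field is empty, "duplicate" means an exact repeat after trimming and lowercasing (fuzzy duplicates would need the matching step), and the field names and sample records are invented.

```python
def _is_missing(value) -> bool:
    """A field counts as missing if it is None or blank."""
    return value is None or str(value).strip() == ""


def quality_summary(records: list[dict], required_fields: list[str]) -> dict[str, float]:
    """Count incomplete records and exact duplicates (after trimming/lowercasing)."""
    seen: set[tuple[str, ...]] = set()
    duplicates = incomplete = 0
    for record in records:
        if any(_is_missing(record.get(field)) for field in required_fields):
            incomplete += 1
            continue
        key = tuple(str(record[field]).strip().lower() for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    total = len(records)
    unfit = duplicates + incomplete
    return {
        "total": total,
        "duplicates": duplicates,
        "incomplete": incomplete,
        "unfit_pct": 100.0 * unfit / total if total else 0.0,
    }


if __name__ == "__main__":
    sample = [
        {"name": "Ann Lee", "email": "ann@example.com"},
        {"name": "ann lee ", "email": "ANN@example.com"},  # duplicate after normalizing
        {"name": "Bob Ray", "email": ""},                    # incomplete
        {"name": "Cy Hu", "email": "cy@example.com"},
        {"name": "Cy Hu", "email": "cy@example.com"},        # exact duplicate
    ]
    print(quality_summary(sample, ["name", "email"]))  # unfit_pct is 60.0 for this sample
```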
There are a few virtual sessions running in parallel at IBM:
Is Your Data Secure? A Conversation with Experts – June 17 / 2:30 p.m. ET
Join us for a live interactive virtual event to explore the issues and challenges facing organizations as they look to privatize and secure their most sensitive data. Register today – http://bit.ly/1qhkBMf