Metadata is data about data. It defines the architecture of the data being stored. Successfully executing your Information Agenda begins with getting your arms around what you have today. This is one of the most difficult challenges that companies continue to struggle with. It means understanding and confirming what data you have, learning how to use and structure that data to optimize your business, and working out how to implement a repeatable process to manage this information over its lifecycle and leverage it across your enterprise. It means creating a blueprint that accurately represents your information.
For successful governance of the data that is generated and fills up the warehouse, a thorough understanding of the metadata plays an important role. There are many aspects to the governance of the data warehouse.
1. Ensure the data is trusted and strategic.
2. Manage the metadata.
As cross-department information demands continue to grow, the need to address this key challenge becomes even more important. Every new IT project brings the daunting task of locating the information, validating its content, reconciling definitions between sources and ensuring proper usage. The problems are further exacerbated by the growing number of new IT projects that require information across siloed sources with less time to respond. CXOs today still suffer from not being able to get the right information at the right time. Just as information is often created in the context of specific projects and applications, so are the tools that understand and control the information.
Companies need a unified way to inventory, understand, define and optimize their information separate from applications and technologies. IBM’s unique set of InfoSphere Foundation Tools does exactly that. Only IBM has the complete set of foundational tools for your Information Agenda to help you do this across your existing data and content.
The InfoSphere Foundation Toolkit was created to work in any environment and can be deployed at any point, with multiple configuration options on any given project, for complete flexibility to match your organization’s needs. This unique industry-leading offering allows companies to understand disparate data spread across their heterogeneous systems, govern it as business information over time, and design trusted information structures for business optimization. Foundation Tools combine discovery and understanding, data modeling and mapping, creation of business rule specifications, data stewardship, business vocabulary management, lineage of information and metadata management, all with a shared repository. Foundation Tools work with any data integration, business intelligence, or data warehouse tools, or in conjunction with the comprehensive set of IBM Foundation engines for complete end-to-end data integration processing. The Foundation Tools are a perfect entry point for your Information Agenda.
Let's take each question in the series one by one.
1. How is the information generated?
As we have seen, data is generated over a period of time from various sources. Which sources are we looking at?
Let's take an example for a CRM in a MART:
1. Data is generated when material for sale is acquired from vendors.
2. Data is generated about the sales units (shops) and the materials to be dispatched to them, i.e. the quantity and type of materials.
3. Data is generated by the sale of material.
4. Data is generated by accounting.
Looking at the statements above, we know where the data is generated. This data is not information yet. It has to be cleansed before it becomes information.
How do we acquire the data or create a warehouse?
Various tools let you move data from an OLTP machine to a warehouse. One such tool is IBM InfoSphere Information Server. The suite provides a complete set of tools that help the user extract data from different sources, transform it, load it into the warehouse, and feed the customer with the intelligence that turns it into information.
Broadly, the tools and what they are used for:
Data extraction and transformation: Information Server DataStage
Data cleansing, standardization and analysis: Information Server QualityStage and Information Server Information Analyzer
Using the above tools, we can move data from a source to a target warehouse. While it is being moved, the data is cleansed using QualityStage and Information Analyzer.
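To make the extract, transform and load flow concrete, here is a minimal sketch in plain Python. It is not DataStage and does not use any Information Server API; the source layout, the `sales_fact` table and the cleanup rules are all assumptions made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical extract from an OLTP sales system (layout and values are illustrative).
SOURCE_CSV = """sale_id,shop,item,quantity,unit_price
1, Shop A ,soap,2,1.50
2,shop a,SOAP,1,1.50
3,Shop B,rice,,4.00
"""

def extract(text):
    """Extract: read the raw source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize text fields, derive the amount, skip incomplete rows."""
    clean = []
    for row in rows:
        if not row["quantity"].strip():
            continue  # no quantity recorded, so the row is not loaded
        quantity = int(row["quantity"])
        amount = quantity * float(row["unit_price"])
        clean.append((
            int(row["sale_id"]),
            row["shop"].strip().title(),
            row["item"].strip().lower(),
            quantity,
            amount,
        ))
    return clean

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact ("
        "sale_id INTEGER PRIMARY KEY, shop TEXT, item TEXT, "
        "quantity INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stands in for the target warehouse
load(conn, transform(extract(SOURCE_CSV)))
totals = conn.execute(
    "SELECT shop, SUM(amount) FROM sales_fact GROUP BY shop ORDER BY shop"
).fetchall()
print(totals)
```

The two spellings of "Shop A" only add up under one shop because the transform step normalized them before loading, which is the point of cleansing on the way into the warehouse.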
The process is to investigate the data through Information Analyzer to find how the data in one column depends on other columns, with weightage and graphs. We can also define rules on how the data is distributed or investigated.
QualityStage has a stage called Investigate that provides similar analysis, and its results can be reused in the later data cleansing steps. Information Analyzer (IA) analysis is being integrated into DataStage to provide complete analysis and use of the data.
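The idea of measuring how much one column depends on another can be sketched in a few lines of Python. This is only an illustration of the concept, not how Information Analyzer or the Investigate stage computes it; the function name `dependency_strength` and the sample rows are assumptions.

```python
from collections import Counter, defaultdict

def dependency_strength(rows, determinant, dependent):
    """Fraction of rows that agree with the rule 'determinant decides dependent'.

    For each value of the determinant column, the most frequent value of the
    dependent column is taken as the expected one; 1.0 means a perfect dependency.
    """
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[determinant]][row[dependent]] += 1
    agreeing = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return agreeing / len(rows)

# Illustrative customer rows; the third row spells the city differently.
rows = [
    {"zip": "10001", "city": "New York", "segment": "retail"},
    {"zip": "10001", "city": "New York", "segment": "wholesale"},
    {"zip": "10001", "city": "NYC", "segment": "online"},
    {"zip": "60601", "city": "Chicago", "segment": "retail"},
]

zip_to_city = dependency_strength(rows, "zip", "city")
zip_to_segment = dependency_strength(rows, "zip", "segment")
print(zip_to_city, zip_to_segment)
```

A high score that is not quite 1.0, as for zip and city here, usually points at rows worth standardizing, while a low score, as for zip and segment, suggests the columns are simply unrelated.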
Once the investigation is completed, the data is standardized based on the rules defined.
Some address standardization tools are approved by government authorities, and using approved tools for address standardization can earn discounts on mail processing. Examples include CASS (US addresses), DPID (Australian address verification), SERP and AddressDoctor.
Once standardization is done, matching (comparison) removes the duplicates so that only unique data is pushed forward. Now we have clean data.
In a typical customer dataset, at least 30-40% of the data is either duplicate or incomplete and not fit to support the decision-making process.
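The standardize-then-match sequence described above can be sketched as follows. This is a toy version in plain Python, not QualityStage: the abbreviation table, the exact-match key on name and address, and the sample customers are all assumptions, and real matching is usually fuzzy rather than exact.

```python
import re

# Illustrative abbreviation rules; a real rule set would be far larger.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}

def standardize(text):
    """Lowercase, drop punctuation and expand known abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

def cleanse(records):
    """Return unique standardized records plus duplicate and incomplete counts."""
    seen = set()
    unique = []
    duplicates = 0
    incomplete = 0
    for record in records:
        name = standardize(record.get("name", ""))
        address = standardize(record.get("address", ""))
        if not name or not address:
            incomplete += 1  # a required field is missing
            continue
        key = (name, address)
        if key in seen:
            duplicates += 1  # same customer after standardization
            continue
        seen.add(key)
        unique.append({"name": name, "address": address})
    return unique, duplicates, incomplete

customers = [
    {"name": "John Smith", "address": "12 Main St."},
    {"name": "JOHN SMITH", "address": "12 main street"},
    {"name": "Mary Jones", "address": ""},
    {"name": "Ravi Kumar", "address": "4 Lake Rd"},
]

unique, duplicates, incomplete = cleanse(customers)
rejected_share = (duplicates + incomplete) / len(customers)
print(len(unique), duplicates, incomplete, rejected_share)
```

The two John Smith rows only match because standardization runs first, which is why the order of the steps matters. The share computed at the end is the kind of figure the percentage above refers to, though the sample here is far too small to say anything about real datasets.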
A few virtual sessions are running in parallel at IBM:
Is Your Data Secure? A Conversation with Experts – June 17 / 2:30 p.m. ET
Join us for a live interactive virtual event to explore the issues and challenges facing organizations as they look to privatize and secure their most sensitive data. Register today: http://bit.ly/1qhkBMf
We have sourced the data from various tools into a warehouse. From the data warehouse, business analysts create data models that represent the data in a way suited to making decisions.
The analysts' work is based on the assumption that the sourced data is clean and free of duplicates.
What is Clean data?
The picture shows unclean data that needs to be cleaned.
We can discuss how to clean the data in the next post.
Here are some upcoming trainings:
2014 Global Cost of Data Breach Study
How much can a data breach cost? And are you doing enough to prevent it from happening to you? The 2014 Global Cost of Data Breach Study from Ponemon Institute, sponsored by IBM, provides benchmark data based on actual experiences of more than 300 organizations. Read this critical research to know more: http://bit.ly/1iXETUJ
How to Calculate Data Confidence
Register for this webinar to learn about compelling new research that identifies the critical criteria to measure and score confidence levels in customer data to make better business decisions: http://ubm.io/1mCCgQG
The two scenarios make the case for why data is required. But just providing data as it is will not be sufficient. It should be presented in a way that lets the end user draw conclusions. Data in that form is categorized as information, and decisions are made on it.
Various wiki sites give definitions of data and information.
Information, when it matures, leads to knowledge. This gives the end user the confidence on which a firm judgement is made.
A few tools help users reach decisions with confidence.
In the current world, information is the buzzword. The more one is informed, the better one's position to make decisions. But why do we need information? Where do I need to use it?
Let's see a couple of scenarios where we have to use information.
1. I need to buy a car. I research on the web the specifications I am interested in, i.e.
a. Budget available (down payment, EMI, allowable funds)
b. Type of car
c. Number of miles travelled per day
d. Number of people driving and travelling
e. Fuel efficiency
f. Alternatives available
g. Security features
2. A retailer wants to promote sales and wants to identify the commodities that boost sales.
a. What are all the commodities/items that the retailer sells?
b. What are the exclusive commodities in terms of range, competitive pricing and stock?
c. Margin of profit
d. Fast-moving items and period of sales
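The retailer's questions about margin, fast-moving items and period of sales can be answered from raw sales rows with a small aggregation. The sketch below is only an illustration: the row layout, the items and all the numbers are made up.

```python
from collections import defaultdict

# Illustrative sales rows: (item, month, units sold, unit price, unit cost).
sales = [
    ("soap", "2014-05", 120, 1.50, 1.00),
    ("soap", "2014-06", 150, 1.50, 1.00),
    ("rice", "2014-05", 40, 4.00, 3.50),
    ("umbrella", "2014-06", 30, 9.00, 5.00),
]

def summarize(rows):
    """Aggregate units, profit margin and best-selling month per item."""
    totals = defaultdict(
        lambda: {"units": 0, "revenue": 0.0, "profit": 0.0, "by_month": defaultdict(int)}
    )
    for item, month, units, price, cost in rows:
        entry = totals[item]
        entry["units"] += units
        entry["revenue"] += units * price
        entry["profit"] += units * (price - cost)
        entry["by_month"][month] += units
    report = {}
    for item, entry in totals.items():
        report[item] = {
            "units": entry["units"],
            "margin": round(entry["profit"] / entry["revenue"], 3),
            "peak_month": max(entry["by_month"], key=entry["by_month"].get),
        }
    return report

report = summarize(sales)
# Fast-moving items first: sort by total units sold, highest first.
fast_movers = sorted(report, key=lambda item: report[item]["units"], reverse=True)
print(fast_movers)
print(report)
```

The raw rows are data; the ranked list and the margins are what the retailer can actually decide on, for example that the fastest-moving item is not the one with the best margin.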
From the above two scenarios, it is evident that information makes sense and is in fact required to make decisions. The decisions will not necessarily be correct, but they will be informed judgements.
But is it sufficient to be just informed?
Should the information be qualified?
How qualified is the information?
How confident are you in the information?
Who are the sources of the information?
Again, these questions lead only to the sources of the information and the confidence in it.
But how is the information generated?
What is information and data?
Are any standards followed in the data acquisition?
What standards are followed, and are there any certifications for the data acquisition?
Is the data clean and standardized?
Let's discuss these things in the following posts.