Saturday, 28 April 2018

Planning for Big Data Success

Most organizations have either started or are about to start on their Big Data journey. Like most other technology hypes, Big Data has followed the hype cycle, and interest dropped in past years after the initial frenzy. What we are seeing now is a second wave of interest in Big Data after that initial peak. With increasing maturity in Big Data products and business use cases, we are inclined to believe that Big Data is here to stay and will prove to be a differentiator, providing the competitive advantage needed in this age.
With this shift, we should see more successful adoptions of Big Data technologies. There is a definite opportunity to cash in early on this technology and gain the early-bird advantage. But let's not get carried away! The industry-wide success rate for Big Data projects still sits between 22% and 27% (in 2017, Gartner put this figure at 40%, though most other studies found that estimate optimistic and pegged it closer to 15%). So even if you start your Big Data journey early, there is no guarantee of returns UNLESS there is a way to increase the probability of success.
This blog post lists some key takeaways from our experience with Big Data initiatives across several industry segments. These insights should assist in better planning and increased control over your Big Data initiative.

Tuesday, 17 April 2018

Informatica MDM – Fuzzy Matching

This blog touches upon the basics of Informatica MDM Fuzzy Matching.

Informatica MDM – SDP approach

A master data management (MDM) system is deployed so that an organization's core data is secure, is accessible by multiple systems as and when required, and does not exist as multiple copies floating around the system, the goal being a single source of truth. A solid Suspect Duplicate Process is required in order to achieve a 360-degree view of an entity.
The concept of Suspect Duplicate Processing represents the broad category of activities related to identifying entities that are likely duplicates of each other. Suspect duplicate processing is the process of searching for, matching, creating associations between, and, when appropriate, merging data for existing duplicate party records in the system.
To achieve this functionality, Informatica MDM has come up with its own Suspect Duplicate Processing (SDP) approach. Depending on its use case, an organization can opt for either of the following two approaches:
  1. Deterministic Matching Approach
  2. Fuzzy Matching Approach
Deterministic Matching Approach
Deterministic Matching uses a set of rules, much like nested if statements, to run logical tests on the data sets. This is how we determine relationships, hierarchies, and households within a dataset. Deterministic matching seeks a clear “Yes” or “No” result on each and every attribute, based on which we decide whether (a minimal sketch follows the list):
  • the two records are duplicates,
  • the pair should be resolved by a data steward, or
  • the records are two unique entities.
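To make the list concrete, here is a minimal deterministic sketch in Python. This is illustrative only: Informatica implements this through match rule configuration, not hand-written code, and the record fields and rules below are hypothetical.

    # Illustrative deterministic match: every test must return a hard Yes/No.
    def deterministic_match(rec_a: dict, rec_b: dict) -> bool:
        # Hypothetical rule set: all attributes must agree exactly.
        return (
            rec_a["national_id"] == rec_b["national_id"]
            and rec_a["last_name"].upper() == rec_b["last_name"].upper()
            and rec_a["birth_date"] == rec_b["birth_date"]
        )

    a = {"national_id": "123-45-6789", "last_name": "Smith", "birth_date": "1985-03-14"}
    b = {"national_id": "123-45-6789", "last_name": "SMITH", "birth_date": "1985-03-14"}
    print(deterministic_match(a, b))  # True: every test returned "Yes"
    # A single typo in last_name would flip this to False: no "close enough".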
Deterministic matching leaves no room for error and delivers clear results in an ideal scenario. But most data in organizations is far from ideal. These are the cases where the Fuzzy Matching Approach of Informatica comes in handy.
Fuzzy Matching Approach
A fuzzy matching approach is required when we are dealing with less-than-perfect data and want to improve the quality of results. Fuzzy matching measures the statistical likelihood that two records are the same. By rating the “matchiness” of two records, the fuzzy method is able to find non-obvious correlations in the data and express how close the two records are to each other.
Informatica MDM fuzzy matching offers the above in an easy-to-configure, flexible, repeatable, and probabilistic manner. It gives us the flexibility to define which attributes should be matched deterministically (such as country IDs) and which using fuzzy logic (such as names), as the sketch below illustrates.
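As a rough illustration of this mix, here is a Python sketch using the standard library's difflib; this is not Informatica's actual matching algorithm, and the field names and scores are hypothetical.

    from difflib import SequenceMatcher

    def fuzzy_score(a: str, b: str) -> float:
        # Crude "matchiness" rating between 0.0 (no match) and 1.0 (identical).
        return SequenceMatcher(None, a.upper(), b.upper()).ratio()

    def match_score(rec_a: dict, rec_b: dict) -> float:
        # Deterministic gate on an exact attribute (e.g. a country ID)...
        if rec_a["country_id"] != rec_b["country_id"]:
            return 0.0
        # ...then a fuzzy rating on a noisy attribute (e.g. a name).
        return fuzzy_score(rec_a["name"], rec_b["name"])

    a = {"country_id": "CA", "name": "Jon Smith"}
    b = {"country_id": "CA", "name": "John Smyth"}
    print(round(match_score(a, b), 2))  # 0.84 -- close, but not an exact "Yes"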
The fuzzy matching in Informatica works on different aspects of the data. The algorithm can be configured depending on whether we are matching an individual or a household, a contact person or an organization, and so on. This helps us handle different scenarios in the data. Also, based on our understanding of the data, we can choose the strictness of the algorithm, not only in terms of matching but in terms of searching as well.
The main strength of Informatica MDM fuzzy matching is that it is a rule-based matching system: unless the match criteria are met, no match is returned, which makes it a business-user-friendly matching system.
The match criteria can be defined under two categories:
  • Automatic Merge and
  • Manual Merge.
Automatic Merge covers the scenario where the system determines by itself that the two entities in question are duplicates, whereas Manual Merge covers the scenario where a Data Steward must decide whether the two parties in question are duplicates. Based on the rule (Automatic or Manual) that a suspect pair satisfies, the fate of the pair is decided: the records either merge automatically or a task is created for a Data Steward. If the suspect pair satisfies none of the defined rules, the two records are treated as two unique parties/entities. A sketch of this three-way outcome follows.
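A hedged Python sketch of the routing logic (the numeric thresholds below are hypothetical stand-ins; in Informatica these outcomes are driven by configured match rules, not raw scores):

    from enum import Enum

    class Outcome(Enum):
        AUTO_MERGE = "merge automatically"
        MANUAL_REVIEW = "create a Data Steward task"
        UNIQUE = "treat as two unique parties"

    def classify(pair_score: float) -> Outcome:
        # Hypothetical boundaries standing in for configured match rules.
        if pair_score >= 0.95:   # satisfies an Automatic Merge rule
            return Outcome.AUTO_MERGE
        if pair_score >= 0.80:   # satisfies a Manual Merge rule
            return Outcome.MANUAL_REVIEW
        return Outcome.UNIQUE    # no rule satisfied

    for score in (0.97, 0.86, 0.42):
        print(score, "->", classify(score).value)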
The rule-based approach of fuzzy logic makes it easy for Business Users and Data Stewards to identify which record patterns constitute a duplicate pair. This makes it a hit with Business Users, and the effect resonates with program sponsors by helping make the MDM implementation successful.
About the Author
Ripudaman Singh Dhaliwal, Manager at Mastech InfoTrellis, has considerable experience in Probabilistic (Fuzzy) Matching Algorithms.
http://www.infotrellis.com/informatica-mdm-fuzzy-matching/

Wednesday, 11 April 2018

Interfacing Virtual MDM through DataStage

The MDM Connector stage is the key that opens the door to IBM Virtual MDM. We can manipulate the data in MDM (MDM refers to IBM Virtual MDM in this post) using the MDM Connector stage, which was introduced in IBM DataStage v11.3.
We know that loading data into MDM is not an easy task, since it involves many tables whose relationships must be maintained properly; otherwise we end up dealing with junk, not with data. The MDM Connector stage makes this task simpler by allowing us to configure everything in a single configuration window.
This blog post details how the basic operations (read/write) on data can be performed using the Connector stage in v11.5.

Tuesday, 10 April 2018

Intelligent Data Management meets SMART MDM™ Methodology

Informatica MDM Multi Domain Edition (MDE) supports multiple business data domains with a flexible data model, which allows you to adapt the data model in line with your business requirements rather than conform to a fixed, vendor-defined model. Business rules can be reused across unified MDM, data quality, and data integration on a single platform. Granular web services are generated automatically, and high-level composite services are created for rapid integration. UI elements are generated automatically from data model definitions. Value can be delivered within weeks, which reduces the risk of delays and cancellations.

Learn more: http://www.infotrellis.com/intelligent-data-management-meets-smart-mdm-methodology/

Monday, 9 April 2018

Big Data Challenges and Solutions

Big Data has empowered organizations to inspect substantial volumes of structured and unstructured data. It augments decision making by delivering data and conclusions drawn from information projected to be valuable. Organizations are now in a position to consolidate their own data with acquired large data sets, such as geospatial data. Client sentiment can be observed, and changes in client opinion effectively detected, by scouring online information.
Data is an asset, but it becomes a liability when you are drowning in it. If an organization does not know how to leverage its data properly, its greatest resource can become a downside. One of the biggest challenges is to extract value from information resources, make better decisions, improve operations, and reduce risk. How do organizations add context to unstructured data to fuel better analytics and decision making? The challenges include capture, curation, storage, search, sharing, analysis, and visualization.
This blog post gives an overview of Big Data, the associated challenges, and the possible solutions we offer.
http://www.infotrellis.com/big-data-challenges-solutions/

Wednesday, 4 April 2018

Automate Data Quality with Informatica IDQ

Data quality is the process of understanding the quality of data attributes such as data types, data patterns, existing values, and so on. Data quality is also about capturing a score for an attribute based on specific constraints; for example, getting the count of records for which the attribute value is NULL, or finding the count of records for which a date attribute does not fit the specified date pattern. A small sketch of such checks follows.
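As an illustration of the kind of checks involved, here is a plain-Python sketch; in practice IDQ performs such profiling through its own mappings, and the column names and date pattern below are made up.

    from datetime import datetime

    records = [
        {"customer_id": "C1", "signup_date": "2018-04-04"},
        {"customer_id": "C2", "signup_date": None},
        {"customer_id": "C3", "signup_date": "04/04/2018"},  # wrong pattern
    ]

    def fits_pattern(value, pattern="%Y-%m-%d"):
        try:
            datetime.strptime(value, pattern)
            return True
        except (TypeError, ValueError):
            return False

    null_count = sum(1 for r in records if r["signup_date"] is None)
    bad_dates = sum(1 for r in records
                    if r["signup_date"] is not None
                    and not fits_pattern(r["signup_date"]))
    print(f"NULL signup_date: {null_count} of {len(records)}")   # 1 of 3
    print(f"Bad date pattern: {bad_dates} of {len(records)}")    # 1 of 3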

Managing your Data Quality

This means that we can measure the quality of data to any extent, irrespective of whether the available data is good or bad. The data quality report can capture complete data details at the record level or even at the attribute level; a sketch of an attribute-level score follows. Using this report, the business can gauge the quality of the data and work out how it can be used to help and benefit the customer. A plan can also be worked out to enhance the quality of the data by applying business rules and correcting the required information based on business needs.
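Continuing in the same hypothetical vein, an attribute-level completeness score might be computed like this (a sketch only, not IDQ's actual scorecard output):

    def completeness(records, attribute):
        # Percentage of records in which the attribute is populated.
        populated = sum(1 for r in records if r.get(attribute) not in (None, ""))
        return 100.0 * populated / len(records)

    records = [
        {"name": "Asha", "email": "asha@example.com"},
        {"name": "Ben", "email": None},
        {"name": "", "email": "carol@example.com"},
    ]
    for attr in ("name", "email"):
        print(f"{attr}: {completeness(records, attr):.1f}% complete")
    # name: 66.7% complete / email: 66.7% complete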
This blog post aims to bring out the significance of data quality, data quality report generation, and the steps involved in automating the data quality report using the scheduler feature of Informatica IDQ.
http://www.infotrellis.com/automate-data-quality-informatica-idq/