Monday, 27 August 2018

MDM for Regulatory Compliance in the Banking Industry

Banking Regulations – Overview
Managing regulatory issues and risk has never been more complex. Regulatory expectations continue to rise, with increased emphasis on an institution's ability to respond to the next potential crisis. Financial institutions continue to face challenges implementing a comprehensive enterprise-wide governance program that meets all current and future regulatory expectations. There has been a phenomenal rise in expectations related to data quality, risk analytics, and regulatory reporting.
The following are some of the US regulations for which MDM and customer 360 reports can be used to support compliance:
FATCA (Foreign Account Tax Compliance Act)
FATCA was enacted to target non-compliance by U.S. taxpayers using foreign accounts. The objective of FATCA is the reporting of foreign financial assets. The ability to align all key stakeholders, including operations, technology, risk, legal, and tax, is critical to successfully comply with FATCA.
OFAC (Office of Foreign Asset Control)
The Office of Foreign Assets Control (OFAC) administers a series of laws that impose economic sanctions against hostile targets to further U.S. foreign policy and national security objectives. The bank regulatory agencies should cooperate in ensuring financial institutions comply with the Regulations.
FACTA (Fair and Accurate Credit Transactions Act)
Its primary purpose is to reduce the risk of identity theft by regulating how consumer account information (such as Social Security numbers) is handled.
HMDA (Home Mortgage Disclosure Act)
This Act requires financial institutions to provide mortgage data to the public. HMDA data is used to identify probable housing discrimination in various ways.
Dodd Frank Regulations
The primary goal of the Dodd-Frank Wall Street Reform and Consumer Protection Act was to increase financial stability. This law places major regulations on the financial industry.
Basel III
A wide sweeping international set of regulations that many US banks must adhere to is Basel III. Basel III is a comprehensive set of reform measures, developed by the Basel Committee on Banking Supervision, to strengthen the regulation, supervision and risk management of the banking sector.
What do banks need to meet regulatory requirements?
To meet the regulatory requirements described in the previous section, banks need an integrated systems environment that provides enterprise-wide data access, a single source of truth for customer details, customer identification programs, data auditability and traceability, customer data synchronization across multiple heterogeneous operational systems, ongoing data governance, and risk and compliance reporting.
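To make the "single source of truth" idea a little more concrete, here is a minimal, hypothetical Java sketch of consolidating customer records from two operational systems into one golden record that a compliance extract (for example, a foreign-account report) could read from. The class names, fields, and survivorship rule are illustrative assumptions, not part of any particular MDM product.

import java.util.*;

// Hypothetical sketch of consolidating customer records from heterogeneous
// systems into a single golden record for compliance reporting.
// All names and the survivorship rule are illustrative assumptions.
public class Customer360Sketch {

    record SourceRecord(String sourceSystem, String taxId, String name,
                        String address, boolean foreignAccount) {}

    record GoldenRecord(String taxId, String name, String address,
                        boolean foreignAccount, Set<String> sourceSystems) {}

    // Merge records that share a tax ID into one golden record per customer.
    static Map<String, GoldenRecord> consolidate(List<SourceRecord> records) {
        Map<String, GoldenRecord> golden = new LinkedHashMap<>();
        for (SourceRecord r : records) {
            GoldenRecord g = golden.get(r.taxId());
            if (g == null) {
                g = new GoldenRecord(r.taxId(), r.name(), r.address(),
                        r.foreignAccount(), new TreeSet<>());
            } else {
                // Illustrative survivorship rule: keep the first name/address seen,
                // but a foreign-account flag from any source survives.
                g = new GoldenRecord(g.taxId(), g.name(), g.address(),
                        g.foreignAccount() || r.foreignAccount(), g.sourceSystems());
            }
            g.sourceSystems().add(r.sourceSystem());
            golden.put(r.taxId(), g);
        }
        return golden;
    }

    public static void main(String[] args) {
        List<SourceRecord> src = List.of(
            new SourceRecord("CoreBanking", "123-45-6789", "Jane Doe", "1 Main St", false),
            new SourceRecord("Wealth",      "123-45-6789", "Jane Doe", "1 Main St", true));
        consolidate(src).values().forEach(System.out::println);
    }
}

A real MDM hub would layer standardization, matching, and full survivorship rules on top of this, along with the audit trail needed for traceability.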

What You May Be Missing by Not Monitoring Your MDM Hub

Organizations spend millions of dollars to implement their MDM solution. They may have different approaches (batch vs. real time; integrated customer view vs. integrated supplier view etc.) – but in general they all expect to get a “one version of the truth” view by integrating different data sources and then providing that integrated view to a variety of different users.

After the completion and successful testing of the MDM implementation project, companies sit back and enjoy the benefits of their MDM hub, and more often than not they don't even think about looking under the hood. It rarely occurs to them that they could be gaining insights into what's happening inside that MDM hub by asking questions like:

– How is the data quality changing?

– What are the primary activities (in processing time) inside the MDM hub?

– How are service levels changing?

However, organizations change, people change, and requirements change, all of which affects what is happening inside the MDM Hub. Such changes can open up significant opportunities for an organization, but without some form of investigation those opportunities typically go unrecognized.

Here are two examples – diagnosed through the use of an MDM audit tool:

– The company's MDM Hub had approximately 100,000 incorrect customer addresses. These addresses were used for regular mailings, and the mailings generated incremental revenue when the address was correct. The impact on the business of just one mailing:

$400K wasted on mailing costs ($4 is a conservative mailing cost per person, covering postage, printing of the mailer, etc.)
$100K of immediately lost revenue (past data shows that one in 50 customers spends about $50 immediately following the mailing)
The longer-term revenue loss was not assessed, but was estimated to be well over $400K
The opportunity: a cost saving of $400K and a revenue increase of $500K or more
– At a different company, analyzing the data processed week by week showed that the number of new customers processed had been declining by 1-2% every week, starting about six weeks before the audit was conducted. A deeper review of the audit report suggested that:

Service levels for customer file changes had been getting steadily worse over that same time period

As customer file changes took over 85% of the total processing time (per the audit report), the slower processing led to less time being available for new customer processing

The client confirmed this initial diagnosis: they had a slowly growing backlog of new customer files

Ultimately the audit was able to highlight which input data source had been causing the slowdown, allowing the company to resolve the problem at its source

Business impact: a major risk (a very significant slowdown in new customer setup) was eliminated before it became a real problem
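As a rough illustration of the kind of week-over-week analysis the audit performed, the sketch below counts new-customer transactions per ISO week from a list of audit records and flags any week whose volume dropped versus the previous week. The record layout and field names are assumptions; a real MDM audit report would have its own schema.

import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch: detecting a week-over-week decline in new-customer volume
// from MDM audit records. The record layout and field names are assumptions.
public class AuditTrendSketch {

    record AuditRecord(LocalDate processedOn, String transactionType) {}

    public static void main(String[] args) {
        List<AuditRecord> audit = List.of(
            new AuditRecord(LocalDate.of(2018, 7, 2), "NEW_CUSTOMER"),
            new AuditRecord(LocalDate.of(2018, 7, 3), "NEW_CUSTOMER"),
            new AuditRecord(LocalDate.of(2018, 7, 9), "NEW_CUSTOMER"),
            new AuditRecord(LocalDate.of(2018, 7, 9), "CUSTOMER_FILE_CHANGE"));

        // Count new-customer transactions per ISO week, in week order.
        WeekFields wf = WeekFields.ISO;
        Map<Integer, Long> perWeek = audit.stream()
            .filter(r -> "NEW_CUSTOMER".equals(r.transactionType()))
            .collect(Collectors.groupingBy(
                r -> r.processedOn().get(wf.weekOfWeekBasedYear()),
                TreeMap::new, Collectors.counting()));

        // Flag any week that processed fewer new customers than the week before.
        Integer previousWeek = null;
        for (Map.Entry<Integer, Long> e : perWeek.entrySet()) {
            if (previousWeek != null && e.getValue() < perWeek.get(previousWeek)) {
                System.out.printf("Week %d: new-customer volume dropped from %d to %d%n",
                        e.getKey(), perWeek.get(previousWeek), e.getValue());
            }
            previousWeek = e.getKey();
        }
    }
}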

Read full story at https://bit.ly/2PTp0Ix

Monday, 20 August 2018

Blueprint for a successful Data Quality Program

Data Quality – Overview

Corporations have started to realize that the data accumulated over the years is proving to be an invaluable asset for the business. That data is analyzed, and business strategies are devised based on the outcome of the analytics. The accuracy of the predictions, and hence the success of the business, depends on the quality of the data upon which the analytics is performed. So it becomes all the more important for the business to manage data as a strategic asset so that its benefits can be fully exploited.
This blog aims to provide a blueprint for a highly successful Data Quality project, the practices to follow for improving data quality, and how companies can make the right data-driven decisions by adopting these best practices.

Source Systems and Data Quality Measurement

To measure the quality of the data, a third-party Data Quality tool should hook on to the source system and measure the data quality. A detailed discussion with the owners of the systems identified for Data Quality measurement needs to be undertaken at a very early stage of the project. Many system owners may not have an issue with allowing a third-party Data Quality tool to access their data directly.
But some systems are subject to regulatory compliance requirements, because of which the systems' owners will not permit other users or applications to access their systems directly. In such a scenario the system owner and the Data Quality architect will have to agree upon the format in which the data will be extracted from the source system and shared with the Data Quality measurement team for assessing the data quality.
Some of the Data Quality tools that lead the market are Informatica, IBM, SAP, SAS, Oracle, Syncsort, and Talend.
The Data Quality architecture should be flexible enough to absorb the data from such systems in any standard format, such as CSV files, APIs, or messages. Care should be taken that the data being made available for Data Quality measurement is extracted and shared with the measurement team in an automated way.
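As a minimal sketch of what an automated, file-based intake could look like, the snippet below reads a hypothetical CSV extract from a shared location and computes a simple completeness percentage for one column. The file path, delimiter, and column position are assumptions; commercial Data Quality tools provide this kind of profiling out of the box.

import java.io.IOException;
import java.nio.file.*;
import java.util.List;

// Minimal sketch: computing a completeness metric from a CSV extract shared
// by a source system. Path, delimiter and column index are assumptions.
public class CompletenessCheck {
    public static void main(String[] args) throws IOException {
        Path extract = Paths.get("/shared/dq/customer_extract.csv"); // hypothetical location
        List<String> lines = Files.readAllLines(extract);

        int emailColumn = 3;       // assumed position of the e-mail column
        long total = 0, populated = 0;
        for (String line : lines.subList(1, lines.size())) {   // skip the header row
            String[] fields = line.split(",", -1);              // keep trailing empty fields
            total++;
            if (fields.length > emailColumn && !fields[emailColumn].isBlank()) {
                populated++;
            }
        }
        System.out.printf("E-mail completeness: %.1f%% (%d of %d rows)%n",
                100.0 * populated / Math.max(total, 1), populated, total);
    }
}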

Environment Setup

If the Data Quality tool is going to connect directly to the source system, evaluating the source systems' metadata across the various environments is another important activity that should be carried out in the initial days of the Data Quality measurement program. The tables or objects that hold the source data should be identical across the different environments. If they are not identical, then decisions should be taken to sync them up across environments, and that work should be completed before the developers are on-boarded to the project.
If the Data Quality team is going to receive data in the form of files, then the location in which the files or data will be shared should be identified, and the shared location created with the help of the infrastructure team. The Data Quality tool should also be configured so that it can read the files available in that shared folder.
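One simple way to verify that the tables holding the source data are identical across environments is to compare their column metadata over JDBC, as in the hedged sketch below. The connection URLs, credentials, and table name are placeholders only.

import java.sql.*;
import java.util.*;

// Sketch: comparing a table's column metadata between two environments over JDBC.
// URLs, credentials and the table name are placeholders, not real endpoints.
public class MetadataCompare {

    static Map<String, String> columnsOf(String url, String user, String pwd,
                                         String table) throws SQLException {
        Map<String, String> cols = new TreeMap<>();
        try (Connection c = DriverManager.getConnection(url, user, pwd);
             ResultSet rs = c.getMetaData().getColumns(null, null, table, "%")) {
            while (rs.next()) {
                cols.put(rs.getString("COLUMN_NAME"), rs.getString("TYPE_NAME"));
            }
        }
        return cols;
    }

    static Set<String> diff(Map<String, String> a, Map<String, String> b) {
        Set<String> all = new TreeSet<>(a.keySet());
        all.addAll(b.keySet());
        all.removeIf(col -> Objects.equals(a.get(col), b.get(col)));
        return all;    // columns whose name or type differs between environments
    }

    public static void main(String[] args) throws SQLException {
        Map<String, String> dev  = columnsOf("jdbc:db2://dev-host:50000/DQDB",  "dquser", "***", "CUSTOMER");
        Map<String, String> prod = columnsOf("jdbc:db2://prod-host:50000/DQDB", "dquser", "***", "CUSTOMER");
        if (dev.equals(prod)) {
            System.out.println("CUSTOMER is identical in DEV and PROD");
        } else {
            System.out.println("Differences found in columns: " + diff(dev, prod));
        }
    }
}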

Monday, 13 August 2018

Connecting MongoDB using IBM DataStage

Introduction

MongoDB is an open-source, document-oriented, schema-less database system. It does not organize data using the rules of the classical relational data model. Unlike relational databases, where data is stored in columns and rows, MongoDB is built on an architecture of collections and documents. One collection holds different documents and functions. Data is stored in the form of JSON-style documents, and MongoDB supports dynamic queries on those documents using a document-based query language.

This blog post explains how MongoDB can be integrated with IBM DataStage with an illustration.

Why MongoDB?
For the past two decades we have been using relational databases as our data stores, as they were the only option available. With the introduction of NoSQL, we now have more options to choose from based on the requirement. MongoDB is predominantly used in the insurance and travel industries.

We can extract any semi-structured data and load it into MongoDB through any of the integration tools. Extraction from MongoDB is also easier and faster compared to relational databases.

MongoDB integration with IBM DataStage
Since we don’t have a specific external stage in IBM DataStage tool to integrate MongoDB, we are going with Java Integration stage to load or extract data from MongoDB.

Since MongoDB is a schema-free database, we can take structured or semi-structured data extracted through DataStage and load it into MongoDB.

Prerequisites

  • Make sure you have Java installed on your machine.
  • Install the Eclipse IDE.
  • The Java code requires the MongoDB jar below to be imported into the package in order to use MongoDB functions:
    • mongo-java-driver-2.11.3.jar, or a higher version if available (download it from the internet)
  • The Java code also requires the jar file below to be imported into the package in order to extract or load data from DataStage:
    • jar (it is available on the DataStage server; location: /opt/IBM/InformationServer/Server/DSEngine/java/lib)
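Inside the Java Integration stage, the class you write ultimately calls the 2.x mongo-java-driver API to open a connection and write documents. The fragment below is a standalone sketch of just that driver usage (host, database, and collection names are placeholders); wiring the class into the DataStage Java Integration stage API is not shown here.

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

// Standalone sketch of the 2.x mongo-java-driver calls used to load a record
// into MongoDB. Host, database and collection names are placeholders.
public class MongoLoadSketch {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        try {
            DB db = client.getDB("staging");                     // target database
            DBCollection customers = db.getCollection("customers");

            // Build a JSON-style document from field values (in the real job,
            // these values would come from the DataStage input link).
            BasicDBObject doc = new BasicDBObject("customerId", "C1001")
                    .append("name", "Jane Doe")
                    .append("city", "Toronto");
            customers.insert(doc);

            // Dynamic query by field value to confirm the load.
            System.out.println("Loaded: " + customers.findOne(
                    new BasicDBObject("customerId", "C1001")));
        } finally {
            client.close();
        }
    }
}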


Illustration of a DataStage job
Create a job in DataStage to parse the below sample XML

Read more steps at http://www.infotrellis.com/connecting-mongodb-using-ibm-datastage/

Tuesday, 7 August 2018

How to Match Tweets to Customer Records

Many organizations are analyzing Tweets for various purposes such as sentiment at an aggregate level.  For example, “generally what are people saying about us in the Twitter universe?”  This is a good baby step into Big Data Analytics but where organizations want to get to is “what is my customer John Smith saying about us?”  This customer-level analytics is much more valuable as it allows the organization to serve the customer better, identify “market of one” opportunities and so on.
You have to match Tweets to customer records as a prerequisite to such analytics.  So what are the considerations in doing so?  Matching customers together using structured data sourced from internal systems within the organization, with traditional deterministic and/or probabilistic matching techniques, is a key capability of MDM hubs.  But the problem shifts dramatically when trying to match Big Data.  You need to re-think the solution, given that the problem has changed.
Many are familiar with Twitter and Tweets.  What some don't know is that there is a set of metadata distributed with each Tweet.  Some of it is useful for matching purposes, such as the user's name, the Tweet timestamp, high-level location information, and so on.  This information, along with information in the text of the Tweet, triangulated with internal information, can yield high-quality matches.
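As a simplified illustration of that triangulation, the sketch below scores a Tweet's profile metadata against an internal customer record using the name and location fields. The fields, weights, and threshold are illustrative assumptions, not a production matching algorithm.

import java.util.Locale;

// Illustrative sketch: scoring a Tweet's profile metadata against an internal
// customer record. Fields, weights and threshold are assumptions.
public class TweetMatchSketch {

    record TweetProfile(String displayName, String screenName, String location) {}
    record Customer(String fullName, String city) {}

    // Deterministic-style score: normalized name match plus a location hint.
    static double score(TweetProfile t, Customer c) {
        double s = 0.0;
        if (normalize(t.displayName()).equals(normalize(c.fullName()))) s += 0.7;
        if (normalize(t.location()).contains(normalize(c.city())))      s += 0.3;
        return s;
    }

    static String normalize(String v) {
        return v == null ? "" : v.toLowerCase(Locale.ROOT).replaceAll("[^a-z ]", "").trim();
    }

    public static void main(String[] args) {
        TweetProfile tweet = new TweetProfile("John Smith", "@jsmith", "Toronto, Canada");
        Customer customer  = new Customer("John Smith", "Toronto");
        double s = score(tweet, customer);
        System.out.printf("Match score %.2f -> %s%n", s, s >= 0.8 ? "candidate match" : "no match");
    }
}

A real solution would also use the Tweet timestamp and text, fuzzy name comparison, and probabilistic weights tuned to the organization's data.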
So below are some considerations in matching Tweets to internal customer records.

http://www.infotrellis.com/how-to-match-tweets-to-customer-records/

Thursday, 2 August 2018

Approaching Data as an Enterprise Asset

If you walk into a meeting with all your senior executives and pose the question:
“Do you consider and treat your data as an Enterprise Asset?”
The response you will get is:
“Of course we do.”
The problem in most organizations, however, is that while it is recognized that data is a corporate asset, the practices surrounding the data do not support that automatic response of "Yes, we do."
What does it really mean to treat your data as an enterprise asset?

http://www.infotrellis.com/approaching-data-as-an-enterprise-asset/