Friday, 22 December 2017

Data Warehouse Vs Data Lake

Data Generation, Analysis, and Usage – Current Scenario

The last decade has seen an exponential increase in the data generated across traditional as well as non-traditional data sources. An International Data Corporation (IDC) report says that the data generated in the year 2020 alone will be a staggering 40 zettabytes, a 50-fold growth from 2010. The data generated per day has increased to 2.5 quintillion bytes, and with the advent of innovations like the Internet of Things, it is poised to grow even more rapidly. This increase in data generation, coupled with the growing ability to store the various types of data being generated, has resulted in a vast repository of data that is now available for scrutiny.

Unstructured Data

According to reports by wealth management firm Merrill Lynch, 80 percent of business-relevant information originates in unstructured form. Unstructured data refers to information that either does not conform to a pre-defined data model or is not organized in a pre-defined manner. These could be images, videos, emails, social media data, or even sonar readings. Essentially, these are data points that cannot be captured in our traditional relational databases.

Analysis of Unstructured Data

As the ability to store varied data increased, so did our ability to analyze it and derive actionable insights from it. Companies began to realize the significance of analyzing unstructured data alongside structured data and started investing more in it; as a result, the potential benefits that could be harnessed from this previously unused data became more apparent. The personalized loan offerings from banks, the customized offers from e-commerce sites, and the exclusive loyalty discounts offered by retail chains are just a few examples of how organizations have started deep diving into unstructured data to come up with tailored offerings.

This blog post brings out the significance of two data storage repositories, the Data Warehouse and the Data Lake, does a comparative analysis, and suggests different approaches to be adopted based on the implementation decision and architecture.

Traditional Data Warehouse Challenges
Storage and Performance:
A Data Warehouse is a conceptual architecture that stores structured, subject-oriented, time-variant, non-volatile data for decision making. Historical as well as real-time data from various sources is transformed and loaded into a structured form.

While a traditional data warehouse can act as a master repository for all the structured data across the organization, its inability to store unstructured data prevents it from acting as a unified data source for analytics, hampering its ability to garner value from such huge volumes of data. Because unstructured data constitutes such a large chunk of business-related information, enterprises can no longer afford to neglect it, and leaving this data out of the purview of analytics could prove detrimental for companies.

Also, with the exponential increase in the data being generated each day, storing this data in traditional databases could prove expensive for organizations. And as a result of such humongous volumes being stored, performance also suffers unless we invest more heavily in the hardware.

Data Quality:
From an implementation standpoint, one of the main challenges a data warehousing project poses pertains to data quality. When we try to combine data from disparate sources, the result is often duplicates, inconsistencies, missing data, and logical conflicts. Varied levels of standardization across different databases also add to the issue. These problems surface at a later stage as faulty reporting and analytics, thereby affecting optimal decision making.

Reporting:
By virtue of having data from across different databases, data warehouse projects often cater to varied reports and analytics as per user demand. Data warehouses being ‘schema on-write’, such reporting and analytics need to be taken into design considerations upfront, as we need to define the schema before loading data into the databases. However, envisioning all such reports at the onset might be difficult for business users who are not exposed to the capabilities of the tools, and this often results in rework for the technical team.
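
The ‘schema on-write’ constraint can be made concrete with a small sketch. This is illustrative Python only, not warehouse code; the schema, field names, and records are invented for the example. A warehouse validates records against a predefined schema before storing them, while a lake stores raw payloads and applies a schema only at read time.

```python
# Hypothetical sketch contrasting schema-on-write (warehouse) with
# schema-on-read (lake). Schema and record contents are illustrative only.
import json

SCHEMA = {"customer_id": int, "amount": float}  # assumed warehouse schema

def load_schema_on_write(record: dict) -> dict:
    """Warehouse-style: validate against the schema BEFORE storing."""
    for field, ftype in SCHEMA.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        record[field] = ftype(record[field])
    return record

raw_store = []  # lake-style: raw payloads stored as-is

def store_schema_on_read(payload: str) -> None:
    """Lake-style: store the raw payload; interpret it only at query time."""
    raw_store.append(payload)

def query_schema_on_read() -> list:
    """Apply a schema at read time; tolerate records that don't fit it."""
    results = []
    for payload in raw_store:
        rec = json.loads(payload)
        if "customer_id" in rec and "amount" in rec:
            results.append((int(rec["customer_id"]), float(rec["amount"])))
    return results

store_schema_on_read('{"customer_id": 1, "amount": "99.5"}')
store_schema_on_read('{"tweet": "unstructured, no schema"}')
print(query_schema_on_read())  # only the record matching the read-time schema
```

The report-rework problem described above follows directly: changing `SCHEMA` here forces reloading, whereas the read-time query can change freely.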

Change Management:
Because data warehouse projects are structure driven, they do not adapt easily to change. The effort and resources required to adapt to any such changes are invariably exorbitant and will most likely drive up costs significantly. For instance, if a new business requirement emerges at a later point that fundamentally changes the original data structure, it would necessitate remodeling the Data Warehouse, which can be extremely time-consuming.

Read more at http://www.infotrellis.com/data-warehouse-vs-data-lake/

Monday, 18 December 2017

How to access Informatica PowerCenter as a Web Service?

Web Services Overview:

Web Services are services available over the web that enable applications to communicate using a standard protocol. To enable this communication, we need a medium (HTTP) and a format (XML/JSON).

There are two parties to a web service, namely the Service Provider and the Service Consumer. The Service Provider develops/implements the application (the web service) and makes it available over the internet. The Service Provider also publishes an interface that describes all the attributes of the web service. The Service Consumer consumes the web service; to do so, the consumer has to know the services available, the request and response parameters, how to call the services, and so on.

Hence we can define a Web Service as a standardized way of integrating web-based applications using the XML, SOAP, WSDL, and UDDI open standards over an internet protocol backbone. XML is used to tag the data, SOAP is used to transfer the data, WSDL is used for describing the services available, and UDDI is used for listing what services are available.
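
As a rough illustration of these standards, the sketch below builds a SOAP 1.1 request envelope using only the Python standard library. The service namespace and the `getCustomer` operation are hypothetical, not part of any published WSDL:

```python
# Minimal sketch of a SOAP 1.1 request envelope. The service namespace
# and the getCustomer operation are made-up examples.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SVC_NS = "http://example.com/customer-service"  # assumed service namespace

def build_envelope(customer_id: str) -> str:
    """Wrap an operation call in the SOAP Envelope/Body structure that
    WSDL-described services expect."""
    ET.register_namespace("soapenv", SOAP_NS)
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}getCustomer")
    ET.SubElement(op, f"{{{SVC_NS}}}customerId").text = customer_id
    return ET.tostring(env, encoding="unicode")

envelope = build_envelope("42")
# An HTTP POST of this envelope (e.g. via urllib.request), plus a
# SOAPAction header, is all a consumer needs to invoke the service.
print(envelope)
```

The consumer learns the namespace, operation name, and parameters from the provider's WSDL; UDDI (where used) is only for discovering that the service exists.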

Why Web Services?
Web Services are used mainly for two reasons:

  • Platform-agnostic communication
  • Two different applications can talk to each other and exchange data

PowerCenter Web Hub and Web Services
Informatica PowerCenter has the ability to expose its jobs (workflows) as SOAP web services, which external applications can utilize to access its data integration functionality even outside Informatica PowerCenter.

This blog post gives an overview of web services, explains how to create web service sources and targets, create workflows, and test the functionality from the Web Services Hub.

Read more at http://www.infotrellis.com/how-to-access-informatica-powercenter-as-a-web-service/

Sunday, 10 December 2017

How to integrate Informatica Data Quality (IDQ) with Informatica MDM?

Overview

Data cleansing and standardization is an important aspect of any Master Data Management (MDM) project. Informatica MDM Multi-Domain Edition (MDE) provides a reasonable number of cleanse functions out-of-the-box. However, there are requirements the OOTB cleanse functions cannot meet, where more comprehensive functions are needed to achieve data cleansing and standardization, e.g. address validation or sequence generation. Informatica Data Quality (IDQ) provides an extensive array of cleansing and standardization options and can easily be used along with Informatica MDM.

This blog post describes the various options to integrate Informatica MDM and IDQ and explains the advantages and disadvantages of each approach, to aid in deciding the optimal approach based on the requirements.


Informatica MDM-IDQ Integration Options
There are three options through which IDQ can be integrated with Informatica MDM.

  • Informatica Platform staging
  • IDQ Cleanse Library
  • Informatica MDM as target

Option 1: Informatica Platform Staging


Starting with Informatica MDM Multi-Domain Edition (MDE) version 10.x, Informatica has introduced a new feature called “Informatica Platform Staging” within MDM to integrate with IDQ (the Developer tool). This feature enables staging/cleansing data directly into MDM’s Stage tables using IDQ mappings, bypassing the Landing tables.

Advantages

  • Stage tables are immediately available for use in the Developer tool after synchronization, eliminating the need to manually create physical data objects.
  • Changes to the synchronized structures are reflected in the Developer tool automatically.
  • Enables loading data into Informatica MDM’s Stage tables, bypassing the Landing tables.


Read more at http://www.infotrellis.com/integrate-informatica-data-quality-idq-informatica-mdm/


Wednesday, 29 November 2017

Interpreting your data graphically

Beyond a better understanding of the data itself, we need to pay attention to basic statistics, as they are the key to turning data into interactive visualizations and converting tables into pictures.
The rapid rise of visualization tools such as Spotfire, Tableau, QlikView, and Zoomdata has driven the immense use of graphics in the media. These tools can transform data into meaningful information while honoring the standard principles of statistical visualization, and they are very helpful in translating analysis into pixels once you are ready with the cleansed data.
In everyday life we quite often encounter massive, continuously growing amounts of data. The question is how to get better insights before making any decision on a business problem. Assume you are focused on a specific sector, say sales (or finance, marketing, or operations), that has billions of records, and you want to draw a graphical layer over the data set for a better decision. As a decision maker, you should probably think about a way to analyze the data: a data visualization tool. Now the question arises: what is data visualization, and why adopt it?
This blog post is about data visualization. It explains in detail how to convert massive databanks into statistical graphics in a pedagogically meaningful way.

Data Visualization in the Pharma Industry

The pharma industry faces unprecedented challenges that have an impact on the development, production, and marketing of medical products.
It has been facing declining success rates in research & development, patent expirations, pressure on global sales, medical bill review, reference-based reimbursement systems, drug testing/clinical trials, electronic trial master files, and hospital food, drug, and maintenance administration, due to huge volumes of databanks coupled with a lack of decision-making strategies, where the key element of the cure is big data and the analytics that go with it. Big Data helps in organizing your data for future analysis and in deriving new business logic from it. You can also change the current business logic as per the data trend to increase your business throughput.
http://www.infotrellis.com/interpreting-data-graphically/

Wednesday, 22 November 2017

Mastech InfoTrellis - Enterprise Data Integration

Using niche technologies, Mastech InfoTrellis enables customers to extract, transform and load data from disparate source systems to centralized data repositories like Master Data Management Hub, Big Data and Analytics Hub.

Mastech InfoTrellis can help your organization attain and manage consistent and transformed data throughout your organization through the use of state-of-the-art ETL tools. Starting from source data system analysis to performance centric data loading processes, we leverage our expertise and experience to get your data organized and available for data analysis by the business user community.

Poor data quality costs firms upwards of $8.8 million per year. Regardless of your business or data management initiative, regardless of whether you are a business or IT user, or where you are on your data quality journey, as a data user you must have ready access to trusted data. With Informatica Data Quality, organizations deliver business value by ensuring that all key initiatives and processes are fueled with relevant, timely and trustworthy data.

Visit our website at http://www.infotrellis.com/enterprise-data-integration/

Sunday, 12 November 2017

Thrive on IBM MDM CE

IBM InfoSphere Master Data Management Collaborative Edition provides a highly scalable, enterprise Product Information Management (PIM) solution that creates a golden copy of products and becomes the trusted system of record for all product-related information.

Performance is critical for any successful MDM solution, which involves complex design and architecture. Performance issues become an impediment to the smooth functioning of an application, obstructing the business from getting the best out of it. Periodic profiling, and optimizing the application based on the findings, is vital for a seamless application.

InfoTrellis has been providing services in the PIM space for over a decade now, for an esteemed clientele spread across the globe.

This blog post details how to optimize an IBM InfoSphere MDM Collaborative Edition application, based on the tacit knowledge acquired from implementations and upgrades carried out over the years.

Performance is paramount
Performance is one of the imperative factors that make an application more reliable. The performance of an MDM Collaborative Edition application is influenced by various factors such as solution design, implementation, infrastructure, data volume, DB configuration, WebSphere setup, application version, and so on. These factors play a huge role in affecting the business either positively or otherwise. Besides, even in a carefully designed and implemented MDM CE solution, performance issues creep in over a period of time owing to miscellaneous reasons.

Performance Diagnosis
The following questions might help you to narrow down a performance problem to a specific component.

What exactly is slow: only a specific component, or a general slowness that affects all UI interactions and scheduled jobs?
When did the problem manifest?
Did performance degrade over time, or was there an abrupt change in performance after a certain event?
Answers to the above queries may not be a panacea, but they provide a good starting point for improving performance.

Hardware Sizing and Tuning
The infrastructure for the MDM CE application is the foundation on which the superstructure rests.

IBM recommends a hardware configuration for a standard MDM CE production server. But that is just a pointer in the right direction, and MDM CE infrastructure architects should take it with a pinch of salt.

Some of the common areas which could be investigated to tackle performance bottlenecks are:


  • Ensure availability of physical memory (RAM) so that little or no memory swapping and paging occurs.
  • Watch the latency and bandwidth between the application server boxes and the database server. This gains prominence if the data centers hosting them are far apart; hosting the primary DB and app servers in the same data center could help.
  • Run MDM CE on a dedicated set of boxes so that all the hardware resources are up for grabs and isolating performance issues becomes a relatively simple process.
  • Keep an eye on disk reads, writes, and queues. Any of these rising beyond safe levels is not a good sign.

Clustering and Load Balancing


Clustering and load balancing are two prevalent techniques used by applications to provide high availability and scalability.


  • Horizontal clustering – add more firepower to the MDM CE application by adding more application servers
  • Vertical clustering – add more MDM CE services per app server box by taking advantage of MDM CE configuration, such as more Scheduler and AppServer services as necessary
  • Adding a load balancer (a software or hardware IP sprayer, or IBM HTTP Server) will greatly improve business users’ experience with the MDM CE GUI application

Go for High Performance Network File System
Typically, clients go with the NFS filesystem for MDM CE clustered environments, as it is free. For a highly concurrent MDM CE environment, opt for a commercial-grade, high-performance network file system such as IBM Spectrum Scale.

Database Optimization
The performance and reliability of MDM CE are highly dependent on a well-managed database. Databases are highly configurable and can be monitored to optimize performance by proactively resolving bottlenecks.

The following are a few ways to tweak database performance.


  • Optimize database lock waits, buffer pool sizes, table space mappings, and memory parameters to meet the system performance requirements
  • Go with the recommended configuration of a production-class DB server for the MDM CE application
  • Keep the DB server and client at the latest compatible versions to take advantage of bug fixes and optimizations
  • Ensure database statistics are up to date. Database statistics can be collected manually by running the shell script shipped with MDM CE at $TOP/src/db/schema/util/analyze_schema.sh
  • Check memory allocation to make sure that there are no unnecessary disk reads
  • Defragment on a need basis
  • Check long-running queries, optimize query execution plans, and index potential columns
  • Execute $TOP/bin/indexRegenerator.sh whenever the indexed attributes in the MDM CE data model are modified
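
As an illustration of the statistics point above, the Python sketch below flags tables whose statistics look stale, so an administrator knows when to rerun analyze_schema.sh. The 7-day threshold and the sample timestamps are assumptions for the example, not MDM CE defaults:

```python
# Illustrative sketch (not MDM CE code): flag tables whose statistics
# have not been refreshed recently. Threshold and dates are assumed.
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=7)  # assumed staleness threshold

def stale_tables(last_analyzed: dict, now: datetime) -> list:
    """Return table names whose statistics are older than STALE_AFTER."""
    return sorted(t for t, ts in last_analyzed.items()
                  if now - ts > STALE_AFTER)

now = datetime(2017, 11, 12)
history = {
    "TCTG_ITA_ITEM_ATTRIBUTES": datetime(2017, 11, 10),  # fresh
    "TCTG_ITM_ITEM": datetime(2017, 10, 1),              # stale
}
print(stale_tables(history, now))  # prints ['TCTG_ITM_ITEM']
```

A scheduled job built on this idea could email the DBA team or invoke analyze_schema.sh directly for the flagged tables.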


MDM CE Application Optimization
Performance of the MDM CE application can be overhauled at various components: the data model, server configuration, and so on. We have covered the best practices to be followed on the application side.

Data Model and Initial Load

  • Carefully choose the number of Specs. Discard attributes that will not be mastered or governed in MDM CE
  • Similarly, a larger number of views, attribute collections, items, and attributes slows down user interface performance. Tabbed views come in handy here
  • Try to off-load cleansing and standardization activities outside of the MDM solution
  • A workflow with many steps can result in multiple problems, ranging from an unmanageable user interface to very slow operations that manage and maintain the workflow, so it should be carefully designed

MDM CE Services configuration

The MDM CE application comprises the following services, which are highly configurable to provide optimal performance: Admin, App Server, Event Processor, Queue Manager, Workflow Engine, and Scheduler.

All the above services can be fine-tuned through the following configuration files found within the application.

$TOP/bin/conf/ini – Allocate sufficient memory to the MDM CE Services here
$TOP/etc/default/common.properties – Configure connection pool size and polling interval for individual services here
Docstore Maintenance

The Document Store is a placeholder for unstructured data in MDM CE, such as logs, feed files, and reports. Over a period of time the Document Store grows exponentially, as do the obsolete files in it. The document store maintenance reports can be used to check the document store size and purge documents that no longer hold significance.

Use the IBM® MDMPIM DocStore Volume Report and IBM MDMPIM DocStore Maintenance Report jobs to analyze the volume of the DocStore and to clean up documents that are beyond the data retention period configured in the IBM_MDMPIM_DocStore_Maintenance_Lookup lookup table.
Configure the IBM_MDMPIM_DocStore_Maintenance_Lookup lookup table with the data retention period for individual directories and the action to be performed once it has elapsed, such as Archive or Purge.
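
The retention logic that the maintenance lookup table encodes can be sketched as follows. This is illustrative Python, not product code; the directory names, retention periods, and actions are invented for the example:

```python
# Sketch of per-directory retention: each directory has a retention
# period and an action once it elapses. All values here are assumed.
from datetime import date, timedelta

RETENTION = {  # directory -> (retention period, action when elapsed)
    "/feed_files/": (timedelta(days=30), "Purge"),
    "/reports/":   (timedelta(days=90), "Archive"),
}

def maintenance_actions(docs: list, today: date) -> list:
    """Return (path, action) pairs for documents past their retention."""
    actions = []
    for path, created in docs:
        for directory, (period, action) in RETENTION.items():
            if path.startswith(directory) and today - created > period:
                actions.append((path, action))
    return actions

docs = [
    ("/feed_files/items_2017_09.xml", date(2017, 9, 1)),   # 72 days old
    ("/reports/export_report.txt", date(2017, 11, 1)),     # 11 days old
]
print(maintenance_actions(docs, date(2017, 11, 12)))
# prints [('/feed_files/items_2017_09.xml', 'Purge')]
```

The actual maintenance report jobs apply the same idea using the periods and actions stored in the lookup table.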
Cleaning up Old Versions

MDM CE does versioning in two ways.

Implicit versioning

This occurs when the current version of an object is modified during the export or import process.

Explicit versioning

This kind of versioning occurs when you manually request a backup.

Older versions of items, performance profiles, and job history need to be cleaned up periodically to reduce the load on the DB server and, in turn, improve application performance.

  • Use the IBM MDMPIM Estimate Old Versions Report and IBM MDMPIM Delete Old Versions Report in a scheduled fashion to estimate and clear out old entries, respectively
  • Configure the IBM MDMPIM Data Maintenance Lookup lookup table to hold the appropriate data retention periods for old versions, performance profiles, and job history


Best Practices in Application Development

MDM CE presents a couple of programming paradigms for application developers who are customizing the OOTB solution.


  • Scripting API – a proprietary scripting language which at runtime converts the scripts into Java classes and runs them in the JVM. Follow the documented best practices for better performance
  • Java API – always prefer the Java API over the Scripting API to yield better performance. Again, ensure the documented best practices are diligently followed


If the Java API is used for MDM CE application development or customization, then:


  • Use code analysis tools like PMD, FindBugs, or SonarQube as periodic checkpoints so that only optimized code is shipped at all times
  • Use profiling tools like JProfiler, XRebel, YourKit, or VisualVM to constantly monitor thread pool usage, memory pool statistics, the frequency of garbage collection, and so on. Using these tools during resource-intensive activities in MDM CE, like running heavyweight import or export jobs, will not just shed light on the inner workings of the JVM but also offer cues on candidates for optimization


Cache Management

Keeping frequently accessed objects in cache is a primary technique for improving performance. The cache hit percentage needs to be really high for smooth functioning of the application.


  • Check the cache hit percentage for various objects in the GUI menu System Administrator -> Performance Info -> Caches
  • The $TOP/etc/default/mdm-ehcache-config.xml and $TOP/etc/default/mdm-cache-config.properties files can be configured to hold a larger number of entries in cache for better performance
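
The hit percentage itself is simple arithmetic, sketched below in Python. The cache names, counts, and the 95% target are assumptions for illustration, not MDM CE defaults:

```python
# Illustrative cache-hit monitoring: hits as a percentage of total
# lookups, flagged when below an assumed 95% target.

def hit_percentage(hits: int, misses: int) -> float:
    """Cache hit percentage; 0.0 when the cache has seen no lookups."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

caches = {"spec_cache": (9800, 200), "item_cache": (700, 300)}  # assumed counts
for name, (hits, misses) in caches.items():
    pct = hit_percentage(hits, misses)
    status = "OK" if pct >= 95.0 else "tune cache size"
    print(f"{name}: {pct:.1f}% -> {status}")
```

A persistently low percentage for a given object type is the cue to raise the corresponding entry limits in the ehcache/cache configuration files mentioned above.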


Performance Profiling
Successful performance testing will surface most of the performance issues, whether related to the database, network, software, or hardware. Establish a baseline, identify targets, and analyze use cases to make sure that the performance of the application holds good for long.

You should identify areas of the solution that generally extend beyond the normal range; a few examples are a large number of items, lots of searchable attributes, and a large number of lookup tables.
Frameworks such as JUnit and JMeter can be used in an MDM CE engagement where the Java API is the programming language of choice.

About the author

Sruthi is an MDM Consultant at InfoTrellis and has worked on multiple IBM MDM CE engagements. She has over 2 years of experience in technologies such as IBM Master Data Management Collaborative Edition and BPM.

Selvagowtham is an MDM Consultant at InfoTrellis who has been plying his trade in Master Data Management for over 2 years. He is proficient in the IBM Master Data Management Collaborative Edition and Advanced Edition products.

Wednesday, 1 November 2017

Data Warehouse Migration to Amazon Redshift – Part 1

Traditional data warehouses require significant time and resources to administer, especially for large datasets. In addition, the financial cost associated with building, maintaining, and growing self-managed, on-premise data warehouses is very high. As your data grows, you constantly need to trade off which data to load into your data warehouse and which data to archive in storage, so you can manage costs, keep ETL complexity low, and deliver good performance.
This blog post details how Amazon Redshift can make a significant impact in lowering the cost and operational overhead of a data warehouse, how to get started with Redshift, the steps involved in migration, the prerequisites for migration, and post-migration activities.

Key business and technical challenges faced:

Business Challenges
  • What kind of analysis do the business users want to perform?
  • Do you currently collect the data required to support that analysis?
  • Where is the data?
  • Is the data clean?
  • What is the process for gathering business requirements?
Technical Challenges
Data Quality – Data comes from many disparate sources across an organization. When a data warehouse tries to combine inconsistent data from disparate sources, it runs into errors. Inconsistent data, duplicates, logic conflicts, and missing data all result in data quality challenges. Poor data quality undermines the reporting and analytics necessary for optimal decision making.
Understanding Analytics – When building a data warehouse, analytics and reporting have to be taken into design considerations. In order to do this, the business user will need to know exactly what analysis will be performed. Envisioning these reports upfront is a great challenge.
Quality Assurance – The end user of a data warehouse makes use of Big Data reporting and analytics to make the best decisions possible. Consequently, the data must be 100 percent accurate. This high reliance on data quality makes testing a high-priority issue that requires a lot of resources to ensure the information provided is accurate. A successful STLC process has to be completed, which is costly and time-intensive.
Performance – A data warehouse must be carefully designed to meet overall performance requirements. While the final product can be customized to fit the performance needs of the organization, the initial overall design must be carefully thought out to provide a stable foundation from which to start.
Designing the Data Warehouse – A lack of clarity in defining what the business users expect from a data warehouse results in miscommunication between the business users and the technicians building it. The expected end results are then not delivered to the user, which calls for fixes after delivery, adding to the existing development fees.
User Expectation – People are not keen on changing their daily routine, especially if the new process is not intuitive. There are many challenges to overcome to make a data warehouse that is quickly adopted by an organization. A comprehensive user training program can ease this hesitation but requires planning and additional resources.
Cost – Building a data warehouse in house to save money, though a great idea, has a multitude of hidden problems. Delivering effective results is not feasible with a few experienced professionals leading a team of non-BI-trained technicians, and the do-it-yourself efforts turn out costlier than expected.
Data Structuring and Systems Optimization – As you add more and more information to your warehouse, structuring data becomes increasingly difficult and can slow down the process significantly. In addition, it becomes difficult for the system manager to qualify the data for analytics. In terms of systems optimization, it is important to carefully design and configure data analysis tools.
Selecting the right type of Warehouse – Choosing the right type of warehouse from the variety available in the market is challenging. You can choose a pre-assembled or customized warehouse. A custom warehouse saves time when building from various operational databases, while a pre-assembled warehouse saves time on initial configuration. The choice has to be made depending on the business model and specific goals.
Data Governance and Master Data – Information, being one of the crucial assets, should be carefully monitored. Implementing data governance is mandatory because it allows organizations to clearly define ownership and ensures that shared data is both consistent and accurate.

Amazon Redshift

Redshift is a managed data warehousing and analytics service from AWS that makes it easy for developers and businesses to set up, operate, and scale a clustered relational database engine suitable for complex analytic queries over large data sets. It is fast, using columnar technology and compression to reduce I/O and spreading data across nodes and spindles to parallelize execution. It is disruptively cost-efficient, removing software licensing costs and supporting a pay-as-you-go, grow-as-you-need model. It is a managed service, greatly reducing the hassles of monitoring, backing up, patching, and repairing a parallel, distributed environment. And it is standards-based, using PostgreSQL as the basic query language and JDBC/ODBC interfaces, enabling a variety of tool integrations.
Amazon Redshift also includes Amazon Redshift Spectrum, allowing you to run SQL queries directly against exabytes of unstructured data in Amazon S3. No loading or transformation is required, and you can use open data formats.
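
As a hedged sketch of the Spectrum idea, the Python snippet below generates the kind of external-table DDL that lets Redshift query files in place on S3. The schema, table, columns, and bucket names are made-up examples:

```python
# Illustrative only: build Redshift Spectrum-style DDL for querying
# Parquet files in S3 without loading them. All names are assumed.

def external_table_ddl(schema: str, table: str, columns: dict,
                       s3_path: str) -> str:
    """Render a CREATE EXTERNAL TABLE statement for an S3 location."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} ({cols}) "
        f"STORED AS PARQUET LOCATION '{s3_path}';"
    )

ddl = external_table_ddl(
    "spectrum", "clickstream",
    {"event_time": "timestamp", "user_id": "varchar(64)"},
    "s3://example-bucket/clickstream/",
)
print(ddl)
# Any JDBC/ODBC SQL client could then run, for example:
# SELECT count(*) FROM spectrum.clickstream WHERE event_time > '2017-01-01';
```

Because the data stays in S3 in an open format, the same files remain usable by other tools while Redshift queries them.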

Read more here: http://www.infotrellis.com/data-warehouse-migration-amazon-redshift-part-1/

Tuesday, 24 October 2017

Big Data and Analytics Solutions - Mastech InfoTrellis

Mastech InfoTrellis offers a managed Big Data Analytics Hub solution centered on Hadoop, which enables customers to consolidate multi-channel data of various formats into a single source. The Big Data Analytics Hub enables self-service analytics across different business functions.



IBM Big Data Solutions combine open source Hadoop and Spark for the open enterprise to cost effectively analyze and manage big data. With BigInsights, you spend less time creating an enterprise-ready Hadoop infrastructure, and more time gaining valuable insights. IBM provides a complete solution, including Spark, SQL, Text Analytics and more to scale analytics quickly and easily.

Informatica provides the industry's only end-to-end big data management solution to deliver successful data lakes. Informatica enables you to integrate, govern and secure big data. Informatica’s self-service data preparation and information catalog, role-based tools, and comprehensive metadata management capabilities ensure that big data can be quickly turned into trusted data assets.

A fully managed cloud data warehouse built on a big data platform, Amazon Redshift is highly scalable and helps improve performance, enabling organizations to acquire new insights about their customers at optimized cost.

Visit Our website for more information at, http://www.infotrellis.com/big-data/

Tuesday, 17 October 2017

Big Data Analytics Solutions by Mastech InfoTrellis

Mastech InfoTrellis offers best-of-breed Master Data Management services, enabling customers to harness the power of their master data. Mastech InfoTrellis has successfully delivered Master Data Management projects time and again over the past decade.

Mastech InfoTrellis’ diverse expertise in the Big Data space has helped global enterprises with their Big Data initiatives.

Mastech InfoTrellis is a premier Data & Analytics company specializing in helping customers generate actionable insights from their enterprise data assets.

Over 7 years of resource and process rigor in the Data Management and Analytics space.

An experienced, fast, and laser-focused implementation team with a track record of client success.

Thursday, 12 October 2017

Interesting facts about Mastech InfoTrellis Big Data Analytics Solution

Interesting facts about Mastech InfoTrellis Big Data Analytics Solution
Interesting facts about Mastech InfoTrellis Big Data Analytics Solution
Big data is being generated by everything around us at all times. Every digital process and social media exchange produces it. Systems, sensors and mobile devices transmit it. Big data is arriving from multiple sources at an alarming velocity, volume and variety. To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills.

Big data is changing the way people within organizations work together. It is creating a culture in which business and IT leaders must join forces to realize value from all data. Insights from big data can enable all employees to make better decisions: deepening customer engagement, optimizing operations, preventing threats and fraud, and capitalizing on new sources of revenue. But escalating demand for insights requires a fundamentally new approach to architecture, tools, and practices.

Mastech InfoTrellis offers a managed Big Data Analytics Hub solution centered on Hadoop, which enables customers to consolidate multi-channel data of various formats into a single source. The Big Data Analytics Hub enables self-service analytics across different business functions.

Sunday, 24 September 2017

Big Data and Advanced Analytics Solution Provider


Mastech InfoTrellis offers a managed Big Data Analytics Hub solution centered on Hadoop, which enables customers to consolidate multi-channel data of various formats into a single source. The Big Data Analytics Hub enables self-service analytics across different business functions.

Over 7 years of resource and process rigor in the Data Management and Analytics space.
An experienced, fast, and laser-focused implementation team with a track record of client success.

Visit our website online at http://www.infotrellis.com

Sunday, 17 September 2017

Mastech InfoTrellis Master Data Management Solutions & Services

Mastech InfoTrellis offers best-of-breed Master Data Management services, enabling customers to harness the power of their master data. Mastech InfoTrellis has successfully delivered Master Data Management projects time and again over the past decade.

IBM InfoSphere Master Data Management (MDM) manages all aspects of your critical enterprise data, no matter the system or model, and delivers it to your application users in a single, trusted view. It provides actionable insight, instant business value alignment, and compliance with data governance rules and policies across the enterprise.

Informatica Cloud Customer 360 for Salesforce eradicates duplicate, inaccurate, and incomplete account and contact records. It provides clean, trusted data, increases Salesforce user adoption, and boosts ROI.

A complete master data management solution addresses the critical business objectives digital organizations face. Informatica MDM offers the only true end-to-end solution, with a modular approach to ensure better customer experience, decision making and compliance.

Sunday, 10 September 2017

Mastech InfoTrellis - Big Data Management Company

Mastech InfoTrellis’ diverse expertise in the Big Data space has helped global enterprises with their Big Data initiatives.

Mastech InfoTrellis offers a managed Big Data Analytics Hub solution centered on Hadoop, which enables customers to consolidate multi-channel data of various formats into a single source. The Big Data Analytics Hub enables self-service analytics across different business functions.

Tuesday, 29 August 2017

Data Integration Solutions by Mastech InfoTrellis

Mastech InfoTrellis can help your organization attain and manage consistent, transformed data across the enterprise through the use of state-of-the-art ETL tools. From source-system analysis to performance-centric data-loading processes, we leverage our expertise and experience to get your data organized and available for analysis by the business user community.



Data integration is the process of combining data from many different sources into a single application or unified view. You need to deliver the right data, in the right format, in the right timeframe to fuel great analytics and business processes. A data integration project usually involves accessing data, mapping records from one source to records in another, and delivering the integrated data to the business exactly when the business needs it.
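As a minimal illustration of that access, map, and deliver flow (a hypothetical sketch in plain Python, not any specific ETL tool; the sources and field names are invented), records from two systems can be matched on a shared key and delivered as one integrated view:

```python
# Hypothetical sketch of the access -> map -> deliver flow.
# Two "sources": a CRM extract and a billing extract, keyed by customer id.
crm = [
    {"cust_id": 1, "name": "Ada Lovelace", "email": "ada@example.com"},
    {"cust_id": 2, "name": "Alan Turing", "email": "alan@example.com"},
]
billing = [
    {"cust_id": 1, "balance": 120.50},
    {"cust_id": 2, "balance": 0.0},
]

def integrate(crm_rows, billing_rows):
    # Map records from one source to records in the other via the key,
    # then deliver a single combined record per customer.
    by_id = {row["cust_id"]: row for row in billing_rows}
    return [
        {**c, "balance": by_id.get(c["cust_id"], {}).get("balance")}
        for c in crm_rows
    ]

for rec in integrate(crm, billing):
    print(rec)
```

Production ETL tools add the pieces this sketch omits: scheduling, incremental loads, error handling, and data-quality rules.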

Visit our website at http://www.infotrellis.com/Informatica_partner_solutions.php

Sunday, 20 August 2017

Mastech InfoTrellis - Big Data Management Company

Mastech InfoTrellis offers best-of-breed Master Data Management services, enabling customers to harness the power of their master data. Mastech InfoTrellis has successfully delivered Master Data Management projects time and again over the past decade.



IBM InfoSphere Master Data Management

IBM InfoSphere Master Data Management (MDM) manages all aspects of your critical enterprise data, no matter the system or model, and delivers it to your application users in a single, trusted view. It provides actionable insight, instant business value alignment, and compliance with data governance rules and policies across the enterprise.

Cloud Customer 360 For Sales Force

Informatica Cloud Customer 360 for Salesforce eradicates duplicate, inaccurate, and incomplete account and contact records. It provides clean, trusted data, increases Salesforce user adoption, and boosts ROI.

Informatica Intelligent Master Data Management

A complete master data management solution addresses the critical business objectives digital organizations face. Informatica MDM offers the only true end-to-end solution, with a modular approach to ensure better customer experience, decision making and compliance.

Friday, 18 August 2017

Mastech InfoTrellis - Big Data Management Services

Mastech InfoTrellis’ diverse expertise in the Big Data space has helped global enterprises with their Big Data initiatives.

Mastech InfoTrellis offers a managed Big Data Analytics Hub solution centered on Hadoop, which enables customers to consolidate multi-channel data of various formats into a single source. The Big Data Analytics Hub enables self-service analytics across different business functions.

AllSight is a Customer Intelligence Management system that delivers an Enterprise Customer 360 by ingesting structured and unstructured data from disparate data sources across the organization.

IBM Big Data Solutions combine open-source Hadoop and Spark for the open enterprise to cost-effectively analyze and manage big data. With BigInsights, you spend less time creating an enterprise-ready Hadoop infrastructure and more time gaining valuable insights. IBM provides a complete solution, including Spark, SQL, Text Analytics, and more, to scale analytics quickly and easily.

Sunday, 13 August 2017

Mastech InfoTrellis - Big Data Management Company

Since 2007, Mastech InfoTrellis has been a pioneer in information management, helping clients realize full value from their investments in this space. InfoTrellis offers leading information management technology and expert consulting services to the world’s most recognized brands. The complete suite of products and services is highly-focused around Master Data Management, Big Data, Data Integration, Data Analytics and many more data-based processes enterprises use for Data Governance.

With rich experience in data management for more than a decade, InfoTrellis is pioneering big data management. Traditional data processing techniques are proving to be inadequate. We have the business acumen and technical expertise to provide the best-in-class solutions for handling big data.

Saturday, 5 August 2017

SMART Implementation Methodology Services by Mastech InfoTrellis





SMART Implementation Methodology, a common-sense, multi-phased approach based on parts of the Rational Unified Process (RUP) and Agile development methodologies, is the secret ingredient in our success story.

Mastech InfoTrellis' SMART delivery methodology is a common-sense approach based on parts of the Rational Unified Process (RUP) and Agile development methodologies.

It blends deliverables with accountability and requires customer focused teams to collaborate and deliver a high-quality information management solution.

It provides tried and tested methods to ensure success for information management programs.

Tuesday, 1 August 2017

Intelligent Master Data Management by Informatica

Master Data Management is used to ensure that master data is created and maintained for the enterprise as a system of record. It also ensures that master data is consistent, correct, and complete.



Drivers for Master Data Management
The key adoption drivers for organizations to implement MDM are regulatory compliance, cost savings, productivity improvements, increased revenues and strategic goals.

MDM is mostly deployed as part of a broader Data Governance program, which involves a combination of technology, process, policy, and people.

Informatica’s approach to MDM
The Informatica MDM software uniquely identifies all master data, and the relationships within it, stored across multiple systems in different formats. It reconciles duplicate, conflicting, and inconsistent data into a golden record for the enterprise: the best version of the truth.
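The reconciliation idea can be sketched in a few lines of plain Python. This is an illustration of the general golden-record technique, not Informatica's actual matching engine, and the records and survivorship rules are invented for the example: duplicates are grouped, and simple rules pick the best surviving value per attribute.

```python
# Illustrative golden-record sketch (not Informatica MDM): reconciling
# duplicate records into a single "best version of the truth".
duplicates = [
    {"name": "J. Smith",   "email": None,                 "updated": "2016-03-01"},
    {"name": "John Smith", "email": "jsmith@example.com", "updated": "2017-05-20"},
    {"name": "John Smith", "email": None,                 "updated": "2015-11-09"},
]

def golden_record(records):
    # Survivorship rules for this sketch: for 'name', prefer the most
    # complete (longest) variant; for 'email', prefer the most recently
    # updated non-null value; keep the latest update timestamp.
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    return {
        "name": max((r["name"] for r in records), key=len),
        "email": next(
            (r["email"] for r in newest_first if r["email"] is not None), None),
        "updated": newest_first[0]["updated"],
    }

print(golden_record(duplicates))
```

Real MDM platforms layer probabilistic matching, configurable trust scores, and stewardship workflows on top of this basic survivorship idea.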

Informatica MDM is remarkably flexible and scalable, unlike other MDM applications where the data models are fixed and you are forced to start with a single domain such as product or customer. You can start an MDM project with any domain, and as needs grow or change over time, domains can be added to the same data model and relationships between the data domains can be defined.

Informatica MDM’s data model, security, business rules, and data stewardship functions can be configured easily to support a client’s environment. It can be deployed quickly and maintained easily, which accelerates time to value.

Generally an MDM implementation starts with the UI, then moves to business logic, and ends with the data model. Informatica’s platform approach is the reverse: it starts with the data model, moves to business logic, and ends with the UI.

Informatica MDM delivers value straight to the organization’s bottom line by providing an accurate and complete view of business-critical master data. It creates trusted data about products, customers, and suppliers, and helps organizations optimize critical business processes, ensure regulatory compliance, and make smarter decisions.

Mastech InfoTrellis competency in Informatica MDM
Mastech InfoTrellis is an essential resource if you are looking for the right MDM partner for a successful MDM implementation, and can help you deliver a quicker return on investment.

Our Smart MDM™ methodology is a result of our many years of experience working with clients on complex MDM programs. This methodology offers a common sense, multi-phased approach based on parts of Rational Unified Process (RUP) and Agile development methodologies. It blends deliverables with accountability and requires customer-focused teams to collaborate to deliver a high-quality MDM solution. It provides tested methods to ensure success for MDM projects.

For more insights regarding Smart MDM™ methodology of Mastech InfoTrellis please visit our page, http://www.infotrellis.com/smart-methodology/

Advantages of Smart MDM™:

  • Value is delivered early and often
  • Greater ability to respond to change
  • Collaboration encouraged between business and IT
  • Tried and tested approach on multiple projects
  • Client satisfaction managed more closely

Sunday, 23 July 2017

MDM for Regulatory Compliance in the Banking Industry

Banking Regulations – Overview

Managing regulatory issues and risk has never been so complex. Regulatory expectations continue to rise with increased emphasis on the institution’s ability to respond to the next potential crisis. Financial Institutions continue to face challenges implementing a comprehensive enterprise-wide governance program that meets all current and future regulatory expectations. There has been a phenomenal rise in expectations related to data quality, risk analytics and regulatory reporting.

Following are some of the US regulations for which MDM and Customer 360 reports can be used to support compliance:

FATCA (Foreign Account Tax Compliance Act)

FATCA was enacted to target non-compliance by U.S. taxpayers using foreign accounts. The objective of FATCA is the reporting of foreign financial assets. The ability to align all key stakeholders, including operations, technology, risk, legal, and tax, is critical to successfully comply with FATCA.

OFAC (Office of Foreign Assets Control)

The Office of Foreign Assets Control (OFAC) administers a series of laws that impose economic sanctions against hostile targets to further U.S. foreign policy and national security objectives. The bank regulatory agencies should cooperate in ensuring financial institutions comply with the Regulations.

FACTA (Fair and Accurate Credit Transactions Act)

Its primary purpose is to reduce the risk of identity theft by regulating how consumer account information (such as Social Security numbers) is handled.

HMDA (Home Mortgage Disclosure Act)

This Act requires financial institutions to provide mortgage data to the public. HMDA data is used to identify probable housing discrimination in various ways.

Dodd Frank Regulations

The primary goal of the Dodd-Frank Wall Street Reform and Consumer Protection Act was to increase financial stability. This law places major regulations on the financial industry.

Basel III

A wide sweeping international set of regulations that many US banks must adhere to is Basel III. Basel III is a comprehensive set of reform measures, developed by the Basel Committee on Banking Supervision, to strengthen the regulation, supervision and risk management of the banking sector.

What do banks need to meet regulatory requirements?

To meet the regulatory requirements described in the previous section, banks need an integrated systems environment that addresses requirements such as enterprise-wide data access, a single source of truth for customer details, customer identification programs, data auditability and traceability, customer data synchronization across multiple heterogeneous operational systems, ongoing data governance, and risk and compliance reports.

How can MDM help?




Enterprise view of customer data

MDM solutions provide an enterprise view of all customer data to ensure that a customer is in compliance with government-imposed regulations (e.g., FATCA, Basel II/III, Dodd-Frank, HMDA, OFAC, AML) and facilitate data linking for easy access.

Compliance Users

Users who satisfy the compliance criteria will be able to retrieve customer information such as name, address, contact method, and demographics from the MDM solution. They will be able to ensure customer compliance while creating reports, performing reviews, and monitoring customers against watch lists.

Compliance Applications

FATCA supporting applications, Dodd-Frank reporting applications, HMDA compliance reporting applications, and Basel II & III compliance applications receive a data extract from the MDM solution containing detailed customer information such as names, addresses, contact methods, identifiers, demographics, and customer-to-account relationships, which enhances compliance reporting and customer analytics.

Compliance users can ensure compliance with all FATCA laws, create reports, link customer information to create HMDA reports, and provide a complete financial profile of all commercial customers to ensure compliance with Basel II & III regulations.

Regulatory Risk Users

Regulatory risk users will be able to use customer data from the MDM solution, create reports on an ad hoc basis, and perform annual reviews to ensure customers are compliant with risk regulations. These users will also be able to check whether customers are on existing watch lists through pre-configured alerts and update the MDM solution as required during annual reviews.

Regulatory Risk Applications

The MDM solution supplies detailed customer information such as names, addresses, identifiers, demographics, and customer-to-account relationships to applications supporting AML, OFAC, KYC, and fraud analysis, so that they can determine compliance with regulations such as AML and OFAC standards, determine whether the proper KYC data has been captured for all customers, and monitor fraudulent activity by any customer.

The MDM solution will receive a close-account transaction from the AML applications if the regulatory risk user determines that the customer relationship must be exited for AML non-compliance. OFAC applications update a customer’s watch-list status within the MDM solution and send add/update/delete customer alert transactions to monitor customers on OFAC watch lists.
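The watch-list screening step described above can be sketched in plain Python. This is purely illustrative; real OFAC screening relies on far more sophisticated fuzzy matching, and the names here are invented. The key point is that customer names are normalized before comparison so trivial formatting differences do not hide a match.

```python
# Illustrative watch-list check: normalize names so punctuation and case
# differences do not mask a match. Real screening adds fuzzy matching,
# aliases, and transliteration handling.
WATCH_LIST = {"john doe", "jane q public"}

def normalize(name):
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in name.lower())
    return " ".join(cleaned.split())

def on_watch_list(customer_name):
    return normalize(customer_name) in WATCH_LIST

print(on_watch_list("John  DOE"))   # prints True
print(on_watch_list("Johnny Doe"))  # prints False
```

In an MDM-backed flow, a positive hit would raise a pre-configured alert for the regulatory risk user and update the customer's watch-list status on the golden record.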

Conclusion

MDM solutions, when implemented properly, can provide critical information to banks that must comply with a number of regulations across many countries. At InfoTrellis, we have helped many organizations achieve these goals through IBM MDM implementations.

About the Author

Greg Pierce

Greg is a Senior MDM Business Architect at InfoTrellis. He has helped many clients across banking, insurance, and retail actualize value from their MDM investments.


Sunday, 16 July 2017

InfoTrellis to be Acquired by Mastech Digital

Toronto, Ontario, Canada – July 7, 2017: InfoTrellis, Inc., a Toronto-based  Data Management & Analytics company, today announced that it has entered into a definitive agreement under which Mastech Digital, Inc. (NYSE: MHH), a leading Digital Transformation Services provider, will acquire InfoTrellis’ services division. The demand for Digital Transformation consulting services is growing rapidly, as more Fortune 1000 companies are funding digital transformation initiatives to reinvent how they do business.  Data & analytics is at the heart of every digital transformation strategy and our combination will bring together two differentiated companies – both with global brand recognition and complementary service offerings – and a unique value proposition to offer expertise and scalability in digital transformation consulting.
The acquired business, to be branded as Mastech InfoTrellis, will offer project-based consulting services to customers in the areas of Master Data Management, Data Integration, Big Data and Customer Intelligence Management (Customer 360), and at the same time strengthen Mastech Digital’s digital transformation services capability. InfoTrellis’ industry-wide recognition for its thought leadership and depth in data management & analytics, as well as proven global delivery expertise, will enhance the growth opportunities of the combined entity.
“Mastech Digital has global brand recognition and is highly respected in the IT industry, and InfoTrellis is very excited to join them,” said Mahmood Abbas, CEO and Co-founder of InfoTrellis.  “I am confident that our strengths in delivering consulting services in data management & analytics, combined with Mastech Digital’s scale and proven experience in providing IT staffing and digital transformation services, will offer a unique value proposition in the market of scalable consulting teams with deep expertise.”
The transaction is valued at USD $55 million, with USD $35.7 million paid in cash at closing and USD $19.3 million deferred over an earn-out period.
InfoTrellis and Mastech Digital expect to realize several synergies from the combined enterprise:
  • Ability to participate in larger digital transformation deals
  • Opportunity to cross-sell broader range of service offerings to existing and future customers of both businesses
  • Larger scale, improved operational efficiencies and expanded global delivery model
  • Better economies of scale from shared support services integration
“We began transforming Mastech into a digital transformation services company a year ago when we launched our new name ‘Mastech Digital’ and recast our service offerings,” said Vivek Gupta, President and CEO of Mastech Digital. “The acquisition of InfoTrellis will be a significant milestone in that journey. I am delighted with the capabilities InfoTrellis brings to the Mastech Digital family and am excited about creating a combined organization that will deliver world-class services around Data Management & Analytics in its expanded portfolio of digital transformation services.”
The combined entity will have augmented scale, expanded global delivery capability, and deeper leadership strength. “The integration will advance our vision of addressing the wider challenges faced by our clients in adopting Big Data and Advanced Analytics,” said Sachin Wadhwa, COO and Co-founder of InfoTrellis. “We believe the combination of Mastech Digital and InfoTrellis will have a meaningful impact on the way Data Management and Analytics consulting is delivered by us.”
Mahmood Abbas and Sachin Wadhwa will continue to play the leadership role in the combined entity.
The product division of InfoTrellis has been spun off as a separate company called AllSight, Inc., and Zahid Naeem, one of the Co-founders of InfoTrellis, has moved to the AllSight company. “Our vision for InfoTrellis consulting has always been to provide premium data management and analytics consulting services for our clients,” said Zahid Naeem.  “This partnership with Mastech Digital is a major milestone in continuing with that vision. We are also looking forward to a strong partnership between the combined entity and AllSight Inc., as Customer Intelligence Management, or a Customer 360, is a critical software system to deliver upon a digital transformation strategy.”  
InfoTrellis has over 200 associates globally, a customer-base of more than 40 blue chip customers in North America, offices in Toronto (Canada) and Austin (Texas, U.S.), as well as a global delivery center in Chennai (India).  Mastech Digital has nearly 1,300 associates globally, a customer-base of over 300 companies, seven offices across the US, as well as a delivery center in New Delhi (India).
The acquisition is subject to customary closing conditions and is expected to close in July 2017.
About Mastech Digital
Mastech Digital (NYSE MKT: MHH) is a national provider of IT services focused on solving its customers’ digital transformation challenges. The Company’s IT staffing services span across digital and mainstream technologies while its digital transformation services include Salesforce.com, SAP HANA, and digital learning services. A minority-owned enterprise, Mastech Digital is headquartered in Pittsburgh, PA with offices across the U.S. and India. For more information, visit www.mastechdigital.com.
About InfoTrellis
InfoTrellis is a Canada-based Information Management Consulting and Technology Services Company that provides project and consulting services in Master Data Management, Data Integration, Customer Intelligence Management, and Big Data.