Tuesday, 11 September 2018

Big Data Management System

Mastech InfoTrellis’ diverse expertise in the Big Data space has helped global enterprises with their Big Data initiatives.

Our Solutions

Big Data Analytics Hub
Mastech InfoTrellis offers a managed Big Data Analytics Hub solution centered on Hadoop, which lets customers consolidate multi-channel data of various formats into a single source. The hub enables self-service analytics across different business functions.

AllSight – Customer Intelligence Management

The AllSight Customer Intelligence Management System delivers Enterprise Customer 360 by ingesting structured and unstructured data from disparate data sources across the organization.

IBM Big Data Solutions

IBM Big Data Solutions combine open source Hadoop and Spark for the open enterprise to analyze and manage big data cost-effectively. With BigInsights, you spend less time creating an enterprise-ready Hadoop infrastructure and more time gaining valuable insights. IBM provides a complete solution, including Spark, SQL, Text Analytics and more, to scale analytics quickly and easily.


Monday, 3 September 2018

IBM MDM BatchProcessor – Tips for better throughput

MDM BatchProcessor is a multi-threaded J2SE client application used in most MDM implementations to load large volumes of enterprise data into MDM during initial and delta loads. Processing large volumes of data often causes performance issues during the batch processing stage, bringing down the TPS (transactions per second).

Poor performance of the batch processor often disrupts the data load process and impacts the go-live plans. Unfortunately, there is no panacea available for this common problem. Let us help you by highlighting some of the potential root causes that influence the BatchProcessor performance. We will be suggesting remedies for each of these bottlenecks in the later part of this blog.

Infrastructure Concerns
Any complex, business-critical enterprise application needs careful planning, well ahead of time, to achieve optimal performance, and MDM is no exception. During the development phase it is perfectly fine to host MDM, the DB server and BatchProcessor all on one physical server. But the world doesn’t stop at development. The sheer volume of data MDM will handle in production calls for a carefully thought-out infrastructure plan. Besides, when these applications run in shared environments, profiling, benchmarking and debugging become a tedious affair.

CPU Consumption
BatchProcessor can consume a lot of precious CPU cycles on the most trivial of operations when it is not configured properly. Watching for persistently high CPU consumption and sporadic surges is vital to ensure that BatchProcessor uses the CPU optimally.

Deadlock
Deadlock is one of the most frequent issues encountered during batch processing in multi-threaded mode. Increasing the submitter thread count beyond the recommended value might lead to deadlocks.

Stale Threads
As discussed earlier, a poorly configured BatchProcessor might open up Pandora’s Box. Stale threads can be a side effect of the thread count configuration in BatchProcessor. Increasing the submitter, reader and writer threads beyond the recommended numbers may cause some of the threads to wait indefinitely, wasting precious system resources.
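One way to spot threads that are stuck waiting is to list the live threads that are in the WAITING state. The snippet below is only a minimal sketch using the standard java.lang.Thread API. The class name WaitingThreadReport is made up for illustration and is not part of BatchProcessor, and the check has to run inside the JVM being inspected. In practice a thread dump taken with a tool such as jstack gives the same information without code changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the BatchProcessor source.
public class WaitingThreadReport {

    /** Returns the names of live threads currently in the WAITING state (waiting with no timeout). */
    public static List<String> waitingThreadNames() {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a snapshot keyed by every live thread in this JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getState() == Thread.State.WAITING) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // JVM housekeeping threads are often WAITING too, so look for names
        // that belong to the application's own reader, writer or submitter threads.
        waitingThreadNames().forEach(System.out::println);
    }
}
```

A thread that appears in this list across several snapshots taken minutes apart is a candidate stale thread.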

100% CPU Utilization
“Cancel Thread” is one of the BatchProcessor daemon threads, designed to gracefully shut down BatchProcessor when the user intends to. Being a daemon thread, it stays alive for the natural lifecycle of the BatchProcessor. But the catch is that it hogs up to nearly 90% of CPU cycles for a trivial operation, bringing down performance.

Let us have a quick look at the UserCancel thread in the BatchProcessor client. The thread waits indefinitely for a user interruption and checks for it once every 2 seconds, while holding on to the CPU the whole time.
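The actual UserCancel source is not reproduced in this excerpt, so the following is only a hypothetical sketch of the general idea: a cancel-watcher thread can block on a latch with a 2-second timeout instead of spinning, so the JVM parks it between checks and it uses practically no CPU. The class CancelWatcher and its method names are assumptions made for illustration, not BatchProcessor APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual BatchProcessor UserCancel implementation.
public class CancelWatcher {
    private final CountDownLatch cancelRequested = new CountDownLatch(1);

    /** Called by whatever component detects the user's cancel request. */
    public void requestCancel() {
        cancelRequested.countDown();
    }

    /** Blocks until cancel is requested or the timeout elapses; returns true if cancel was requested. */
    public boolean awaitCancel(long timeout, TimeUnit unit) throws InterruptedException {
        return cancelRequested.await(timeout, unit);
    }

    /** Starts a daemon thread that runs onCancel once a cancel has been requested. */
    public Thread startDaemon(Runnable onCancel) {
        Thread t = new Thread(() -> {
            try {
                // Re-check every 2 seconds, but stay parked (off the CPU) in between.
                while (!awaitCancel(2, TimeUnit.SECONDS)) {
                    // nothing to do until the next check
                }
                onCancel.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status and exit
            }
        }, "UserCancel");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        CancelWatcher watcher = new CancelWatcher();
        Thread t = watcher.startDaemon(() -> System.out.println("Shutting down gracefully"));
        watcher.requestCancel();
        t.join(5000);
    }
}
```

The timed loop mirrors the 2-second check described above. If nothing else needs to happen between checks, a plain await() without a timeout would work just as well.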

Read full article at https://bit.ly/2Nvx3Nh