AML Data Quality: The Challenge of Fitting a Square Peg into a Round Hole
Editor's Note: This article originally appeared on the DataVisor blog on February 12, 2017.
As mentioned in my previous articles, traditional rule-based transaction monitoring systems (TMS) have architectural limitations that make them prone to false positives and false negatives:
- Naive rules create a plague of false positives that are expensive for investigators to sift through
- Sophisticated money launderers know how to circumvent rule-based systems, leading to false negatives and potential fines from regulators
This article focuses on the third drawback of existing TMS solutions: how their inflexible data models lead to poor data quality, resulting in additional false positives and false negatives.
I think many of us working in the anti-money laundering (AML) technology space have experienced the frustration of spending many hours retrofitting new data types to squeeze them into the rigid data model of a TMS. Unfortunately, the more effort we spend retrofitting data, the more likely we are to introduce data quality issues. Further, when we don’t complete this work in a timely fashion, we’re exposed to the risk of large fines from regulators. That said, there’s hope on the horizon from machine learning solutions that are more forgiving of disparate data formats.
Square peg in a round hole
Sending data from source systems to many of the existing TMS is like trying to fit a square peg into a round hole. There are two major reasons why this is the case.
First, TMS require a lot of data of many different types. Financial institutions typically have many disparate customer, account, and transaction systems that feed data into the TMS to satisfy monitoring requirements. Second, existing TMS have a monolithic data model that’s generally difficult to adjust without significant customization.
This forces the financial institution to change its data to conform. However, this is difficult because each source system has its own unique characteristics and ultimately serves a different business purpose. For example, a mortgage lending application may function quite differently from a system handling retail demand deposit accounts (DDA). Furthermore, each system has its own data model, or way of storing and updating information.
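To make this concrete, here is a purely illustrative sketch of how records from two different source systems might arrive in very different shapes. The systems, field names, codes, and formats are hypothetical, not drawn from any specific vendor:

```python
# Hypothetical illustration: two source systems describing account activity
# in completely different ways. All field names and codes are invented for
# this example and do not correspond to any particular product's schema.

# A retail DDA platform might emit a flat, terse record:
dda_transaction = {
    "acct_no": "004417823",
    "txn_cd": "DEP",            # deposit
    "amt": 9500.00,
    "post_dt": "20170212",      # YYYYMMDD, local time
    "branch": "0153",
}

# A mortgage servicing system might emit a nested record with its own
# vocabulary and date conventions:
mortgage_payment = {
    "loan": {"id": "ML-88-220145", "borrower_id": "C-99321"},
    "payment": {
        "principal": 1200.50,
        "interest": 830.25,
        "escrow": 310.00,
        "received_on": "2017-02-12T14:03:00Z",  # ISO 8601, UTC
    },
    "channel": "ACH",
}
```

A monolithic TMS data model forces both of these into a single canonical transaction shape up front, and every mapping decision along the way is a chance to introduce a subtle data quality issue.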
Unfortunately, these challenges result in a long, arduous process filled with subtle gotchas. Potential AML events get missed, leaving the institution exposed to huge fines from regulators. For example, imagine that a financial institution purchased a commercial loans company. The financial institution must integrate the acquired company’s data into its existing TMS, but the process takes longer than anticipated. During a regulatory exam, the regulator uncovers that the purchased company’s data is still not being monitored by the existing TMS. The regulator views the acquired firm’s lack of integration into the existing AML framework as a red flag and decides to probe the program more deeply than it had in the past.
Even worse, the more the data is reshaped to fit the TMS data model, the greater the likelihood of introducing additional data issues. And as you know, this will lead to false positives and false negatives down the line.
The best solution is to minimize data transformations. If the files are kept as close to the source system’s original format as possible, data integrity issues stay isolated to that system. While a certain degree of data transformation will be required before the detection algorithms are run, this can be accomplished within the TMS itself. However, this requires a TMS that is not based on a monolithic data model and has some flexibility and adaptability.
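As a minimal sketch of what "transform inside the TMS" could look like, assuming a hypothetical system that stores raw source records untouched and applies only a thin per-source adapter to derive the few normalized fields the detection algorithms need (all names and mappings here are invented for illustration):

```python
from datetime import datetime, timezone

# Minimal sketch of normalizing late, inside the TMS: the raw record is kept
# as delivered, and a thin per-source adapter derives only the handful of
# fields the detection algorithms actually need. Everything is hypothetical.

def normalize_dda(raw: dict) -> dict:
    return {
        "account_id": raw["acct_no"],
        "amount": float(raw["amt"]),
        "timestamp": datetime.strptime(raw["post_dt"], "%Y%m%d").replace(tzinfo=timezone.utc),
        "source": "dda",
        "raw": raw,  # original record kept alongside, untouched
    }

def normalize_mortgage(raw: dict) -> dict:
    pay = raw["payment"]
    return {
        "account_id": raw["loan"]["id"],
        "amount": pay["principal"] + pay["interest"] + pay["escrow"],
        "timestamp": datetime.fromisoformat(pay["received_on"].replace("Z", "+00:00")),
        "source": "mortgage",
        "raw": raw,
    }

ADAPTERS = {"dda": normalize_dda, "mortgage": normalize_mortgage}

def ingest(source: str, raw: dict) -> dict:
    """Keep the source file close to its original format; normalize late and lightly."""
    return ADAPTERS[source](raw)

# Example: the raw DDA record stays as delivered; normalization happens on ingest.
raw_dda = {"acct_no": "004417823", "txn_cd": "DEP", "amt": 9500.00, "post_dt": "20170212"}
print(ingest("dda", raw_dda)["amount"])  # 9500.0
```

The design choice is to normalize as late and as little as possible, so that any remaining data integrity issue can be traced straight back to a single source system.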
How unsupervised machine learning (UML) leads to a more flexible TMS
There are some promising AI-based TMS solutions that are designed to solve this data inconsistency problem. Using unsupervised machine learning (UML) allows the TMS to have flexible data requirements. (For more information about how UML works in the context of AML, read my first blog post on the subject.)
To understand why, consider their differences. A traditional rule-based TMS looks for specific scenarios and requires specific fields, structured in particular ways, so that they can be mapped to its internal data model. UML does not have a strict data model that inputs must adhere to; rather, it works with the data that it’s given.
Consider the scenario where an account was previously dormant and then suddenly began transacting very quickly. A rule would require several highly specific data fields and encode strict thresholds in order to try to match the scenario. However, the rigidity of the data fields makes the initial integration difficult, which increases the likelihood of data quality issues. A secondary issue is the strict thresholds, which lead to false positives and false negatives.
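For illustration only, a dormant-then-bursting rule in a rule-based TMS might look something like the sketch below. The field names and thresholds are hypothetical; the point is that every one of them is a hard requirement on the incoming data:

```python
# Illustrative only: a rigid "dormant account suddenly active" rule.
# Every field name and threshold below is hypothetical, not taken from
# any vendor's rule library.

DORMANCY_DAYS = 180          # account must have been inactive at least this long...
BURST_WINDOW_DAYS = 7        # ...then transact within a window this short...
BURST_TXN_COUNT = 10         # ...at least this many times...
BURST_TOTAL_AMOUNT = 25_000  # ...for at least this much in total

def dormant_burst_alert(account: dict) -> bool:
    """Fires only if every required field is present and every threshold is met."""
    return (
        account["days_since_prior_activity"] >= DORMANCY_DAYS
        and account["window_days"] <= BURST_WINDOW_DAYS
        and account["txn_count_last_window"] >= BURST_TXN_COUNT
        and account["txn_total_last_window"] >= BURST_TOTAL_AMOUNT
    )
```

If any one of those four fields is missing or mis-mapped during integration, the rule silently stops working; and a launderer who spreads the burst over eight days, or keeps the total just under the amount threshold, never trips it at all.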
On the other hand, a TMS that leverages UML can take in a variety of data fields to find hidden networks of accounts with anomalous behavior. For example, UML may uncover a network of accounts that were previously dormant and started transacting quickly.
Note that this example is simplified; in practice, the UML model would take into account hundreds to thousands of different data attributes to uncover the network.
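As a simplified, hypothetical sketch of the unsupervised idea, the snippet below applies a generic anomaly-detection technique (scikit-learn's IsolationForest) to a few per-account behavioral features. This is not any vendor's actual model, and a real UML system would use far richer features and also analyze the links between accounts, but it shows how unusual behavior can surface without hand-written thresholds:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Simplified sketch: score accounts by how anomalous their behavior is,
# rather than by a fixed rule. The feature set is invented for illustration.

rng = np.random.default_rng(0)

# Hypothetical per-account features:
# [days since prior activity, transactions in recent window, total amount in window]
normal_accounts = np.column_stack([
    rng.integers(0, 30, 1000),          # recently active
    rng.poisson(5, 1000),               # modest activity
    rng.gamma(2.0, 500.0, 1000),        # modest amounts
])
suspicious_accounts = np.array([
    [320, 40, 24_000.0],                # long-dormant, suddenly very busy
    [410, 55, 18_500.0],
])

model = IsolationForest(contamination=0.01, random_state=0)
model.fit(normal_accounts)

# Lower (negative) scores indicate more anomalous behavior.
print(model.decision_function(suspicious_accounts))
```

Here the dormant-then-bursting accounts stand out with low anomaly scores even though no rule ever named a dormancy period, a transaction count, or an amount threshold.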
There are three major benefits of using UML to power or supplement a TMS. First, with little data integration effort required, there are fewer chances to make mistakes that lead to data quality issues (and ultimately, false positives and false negatives). Second, it’s faster to get the TMS up and running. And third, it’s much easier to add new data fields or entirely new use cases over time. This covers both changes in business logic (for example, when new product offerings are launched) and relentless criminals adapting their methods.
The future of TMS technology
Ultimately, detecting money laundering is extremely complex. To make matters worse, customers, customer behaviors, product offerings, regulatory requirements, and even institutions themselves are in a constant state of change. We must consider that the tools we use to fight financial crime today not only limit our technical capabilities, but may actually influence the way we think about the problem itself. As Marshall McLuhan said, “We shape our tools and afterwards our tools shape us.” It’s time we got some better tools.