How machine learning is helping to prevent complex social engineering scams, with no impact on genuine customers. 

In this blog we document how LexisNexis® ThreatMetrix® works as a live deployment platform for advanced machine learning models, delivering easy implementation and high performance.

What makes our job challenging is also what makes it so interesting. Fraud is a constantly evolving world, and so we too must always be evolving and adapting to the ever more complex and inventive techniques fraudsters dream up to scam unfortunate individuals out of their hard-earned cash. Strong signals from session profiling tools, combined with a traditional scoring engine, already do a good job of detecting third-party fraud, and so defences against that particular attack type have improved dramatically in recent years across all industries.  

But in the fraud world, no sooner is one weakness addressed than fraudsters find a new modus operandi, and you’re back to square one. What is more, each evolution tends to be progressively harder to tackle from a data perspective.  

Scams are a great example. Most of the time, the fraudulent transactions are conducted by the victims themselves, from their own devices – making them much harder to spot. After all, how can you tell whether an individual is acting of their own free will or under duress? The fraud signals are often far more subtle than for traditional third-party fraud, hence the challenge posed to fraud prevention specialists such as our Professional Services Team.  

The Decision Tree 

Linear scorecards, such as the ThreatMetrix default scoring policy, have a strong track record of detecting fraud effectively while also being easy to use and 100% ‘clearbox’ – meaning decisions are transparent and explainable. However, they come with limitations. For one, they are based on binary features (also referred to as “rules”) which need to be designed manually from continuous variables – i.e. quantities that can be used to predict potential fraudulent activity, such as how many times in an hour a user logs in, or whether a new account beneficiary has been created in the past few minutes. Although rules can be combined, most of them must be independently checked, weighted, and summed in sequence (hence the term linear), which is not always appropriate for complex fraud scenarios.  
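To make the “checked, weighted, and summed” idea concrete, here is a minimal sketch of a linear scorecard. The rule names, thresholds and weights are invented for illustration and are not the ThreatMetrix default policy:

```python
# Illustrative linear scorecard: each rule is a binary check over a
# continuous variable; every rule that fires contributes a fixed weight.
# Rules, thresholds and weights are hypothetical.

def scorecard(session: dict) -> int:
    rules = [
        # (description, predicate, weight)
        ("many logins in the past hour",  lambda s: s["logins_last_hour"] > 5,      30),
        ("beneficiary created minutes ago", lambda s: s["mins_since_new_payee"] < 10, 40),
        ("device not seen before",        lambda s: s["device_seen_before"] is False, 25),
    ]
    # Each rule is checked independently, weighted, and summed -- the
    # "linear" part: no interactions between rules are modelled.
    return sum(weight for _, fires, weight in rules if fires(session))

risky = {"logins_last_hour": 8, "mins_since_new_payee": 3, "device_seen_before": False}
print(scorecard(risky))  # 95
```

Note that because each rule contributes independently, a combination such as “new payee *and* a new device” cannot score more than the sum of its parts – exactly the limitation the next section addresses.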

Machine learning, on the other hand, is a method of data analysis that automates the model-building process, doing away with the need for a linear set of rules. In contrast to scorecards, it consumes all the available continuous variables simultaneously and can capture subtle interactions between these fraud predictors. A classic example of a machine learning algorithm is the decision tree. By feeding continuous variables and fraud data into the tree, the algorithm determines the combination of predictors that most effectively authenticates the user and screens for potentially fraudulent behaviour, without adding delays to the process. 
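The core of that automation is the split search: the tree tries thresholds on every continuous variable and keeps the one that best separates fraud from genuine activity. A minimal sketch, on toy data with invented feature values (a real tree then recurses into each child node):

```python
# Minimal sketch of how a decision tree chooses a split: try thresholds on
# each continuous variable and keep the one with the lowest weighted Gini
# impurity. Toy data only; real implementations recurse to build the tree.

def gini(labels):
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # fraction of fraud (label 1)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    n = len(rows)
    best = (None, None, float("inf"))  # (feature index, threshold, impurity)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best

# Columns: logins in the past hour, minutes since a new payee was created.
X = [(1, 500), (2, 900), (8, 3), (7, 5)]
y = [0, 0, 1, 1]  # 1 = fraud
f, t, _ = best_split(X, y)
```

On this toy data the search lands on “logins in the past hour > 2”, which perfectly separates the two classes – no hand-designed rule required.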

A Forest of Boosted Trees 

Figure 1 – Decision Tree illustration. Depth 3 and 6 “child” nodes. 
Although standard scorecards can match these more advanced techniques on some traditional fraud types, machine learning models are, for the most part, far more effective at detecting subtle fraud risk scenarios, such as scams where the genuine user is manipulated into carrying out the transaction themselves.  

ThreatMetrix Dynamic Decision Platform (DDP) is designed for flexibility and supports most types of machine learning algorithm. Armed with this powerful tool, our data scientists have been able to build high-performing models to prevent scams for our banking and financial services customers. 

In a recent example, we developed a scam detection model to help a tier-one UK bank tackle authorised push payment (APP) fraud, a major challenge for them since 2021. We elected to use a forest of boosted trees: an ensemble of decision trees in which each tree is trained to correct the errors of those before it. The expertise of our data scientists, along with strong domain knowledge, a long-term customer relationship and an excellent understanding of the bank’s challenges, paved the way for a collaborative model-building process that delivered a high-performing model tailored to the business’ needs.  
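The boosting idea itself is simple to sketch: fit a small tree, look at what the ensemble still gets wrong, fit the next tree to those residuals, and repeat. A toy illustration with depth-1 “stumps” on a single risk feature (the data and learning rate are invented; production forests use many features and far deeper trees):

```python
# Sketch of gradient boosting: each new tree (here a depth-1 "stump") is
# fitted to the residual errors of the ensemble so far, then added with a
# small learning rate. Toy one-feature regression data.

def fit_stump(xs, residuals):
    # Best single-threshold split minimising squared error; leaves = means.
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

xs = [1, 2, 8, 9]               # toy risk feature
ys = [0.0, 0.0, 1.0, 1.0]       # toy fraud target
pred = [0.0] * len(xs)
ensemble, lr = [], 0.5
for _ in range(10):
    residuals = [y - p for y, p in zip(ys, pred)]
    stump = fit_stump(xs, residuals)
    ensemble.append(stump)
    pred = [p + lr * stump(x) for x, p in zip(xs, pred)]

mse = sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)
```

Each round shrinks the remaining error, so the ensemble converges on the target even though no single stump is a good model on its own.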

Working with the client, we focussed on: 

  • Augmenting fraud data – The long-standing customer relationship allowed us not only to link, clean and augment the fraud data but also to identify different populations to prioritise. For example, we analysed defrauded accounts to find missed scam cases and were able to remove some false positives using known device intelligence. We also prioritised payments to new mandates, which carry higher potential risk than payments to existing ones. 
  • Selecting the best predictors – 200 high-performing fraud predictors were carefully selected from the rich diversity of sources afforded by the ThreatMetrix decision engine – e.g. device, network, digital identities, behavioural biometrics, the two-party payment model and global entity reputations. The selection was made using a combination of expert judgement and robust feature selection methods. 
  • Tailoring to the business – By carefully considering the business objectives and focusing on the practical operational use of the model, we designed a custom approach that emphasises high-value frauds and produces an optimal solution for minimising financial losses. In technical terms, this is known as a cost function. 
  • Model stability – Emphasis was also placed on model stability, so that performance remains consistent over time without disrupting the customer’s operations. 
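As a concrete illustration of a value-weighted cost function, here is one hedged sketch of the idea: each transaction’s loss term is weighted by its monetary value, so missing a £10,000 fraud costs the model far more than missing a £50 one. The weighting scheme and numbers are illustrative, not the client’s actual cost function:

```python
# Sketch of a value-weighted cost function: each transaction's log-loss
# term is weighted by its value in GBP, pushing the model hardest towards
# catching high-value frauds. All figures are invented for illustration.
import math

def weighted_log_loss(y_true, p_pred, values_gbp):
    eps = 1e-12
    total_w = sum(values_gbp)
    loss = 0.0
    for y, p, v in zip(y_true, p_pred, values_gbp):
        p = min(max(p, eps), 1 - eps)          # clip for numerical safety
        loss += v * -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss / total_w

y_true = [1, 1, 0]                  # two frauds, one genuine payment
values = [10_000, 50, 100]          # the first fraud is the costly one
careless = [0.1, 0.9, 0.1]          # misses the high-value fraud
careful = [0.9, 0.1, 0.1]           # catches it, misses the small one
assert weighted_log_loss(y_true, careful, values) < weighted_log_loss(y_true, careless, values)
```

Under an unweighted loss the two models above would score identically; the value weighting is what steers training towards minimising financial losses rather than raw fraud counts.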

Using these best practices, the model was built and deployed in under one month and achieved top performance in production for this tier-one UK bank. ThreatMetrix alerted on ~50% of scams at a cost of one alert for every 3,500 online banking payments – around 0.03% of their daily traffic. By comparison, a traditional scorecard would capture three times fewer scams for a similar number of alerts. 

Figure 2 – ROC curve on the web channel – higher review rate (target of 0.3%) 
Figure 3 – ROC curve on the mobile channel – lower review rate (target of 0.03%) 

Note – VDR stands for value detection rate, defined as the percentage of fraud value (in GBP) that is captured. The graph on the right is a zoomed-in version of the one on the left. 
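The VDR metric from the note can be sketched in a few lines – the fraud amounts, scores and threshold below are invented purely to show the calculation:

```python
# Value detection rate (VDR), as defined in the note: the percentage of
# fraud value, in GBP, scored at or above the alert threshold.

def vdr(fraud_values_gbp, scores, threshold):
    captured = sum(v for v, s in zip(fraud_values_gbp, scores) if s >= threshold)
    return 100.0 * captured / sum(fraud_values_gbp)

values = [10_000, 2_000, 500, 7_500]   # per-fraud loss amounts (toy)
scores = [0.95, 0.40, 0.80, 0.90]      # model scores for those frauds
print(vdr(values, scores, threshold=0.75))  # 90.0
```

Here only one fraud slips under the threshold, but because it is a low-value one, 90% of the fraud *value* is still captured – which is why VDR, rather than a simple count, is the headline metric in the ROC curves above.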

Deployment and Interpretability 

Following the build process, our data scientists were able to deploy the model live in ThreatMetrix DDP with just a few clicks. Thanks to an extremely lightweight scoring language and a well-designed back end, the model scores live events without introducing any latency into the customer journey – around 3,000 complex calculations in less than 2 milliseconds, all deployable in a matter of hours. 

Scores are then used for live decisioning downstream in the customer journey. The decision engine provides the client with a wide range of configurable options for setting the decision threshold, enabling them to calibrate it easily to support their operational objectives. In addition to deploying a model, ThreatMetrix’s transparent policy engine can be used to create business rules around the model, or to combine the score with other models in an ensemble. 
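One common way to calibrate such a threshold is to work backwards from a target review rate – pick the score quantile that alerts on roughly the desired fraction of traffic (e.g. 0.03%, about one alert per 3,500 payments). A hedged sketch of that approach, not the decision engine’s actual mechanism:

```python
# Sketch of calibrating a decision threshold to a target review rate:
# choose the score cut-off so that roughly the desired fraction of daily
# traffic is alerted on. Illustrative only.

def threshold_for_review_rate(scores, target_rate):
    ranked = sorted(scores, reverse=True)
    n_alerts = max(1, round(len(scores) * target_rate))
    return ranked[n_alerts - 1]  # alert on scores >= this value

scores = [i / 9999 for i in range(10000)]  # stand-in for a day of scores
t = threshold_for_review_rate(scores, 0.0003)
alerts = sum(s >= t for s in scores)
```

In practice the threshold would be set on a historical score distribution and then monitored, since score drift over time shifts the effective review rate.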

Recognising that the better performance of machine learning often comes at the cost of interpretability, we also designed a graphical interpretation of the model outputs based on a technique called SHAP (SHapley Additive exPlanations, shown below) to help users explain how the model reaches a particular decision, even in the most complex scenarios. 
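SHAP is grounded in Shapley values from game theory: a feature’s contribution to a given score is its average marginal effect over all subsets of the other features. The brute-force version below makes that definition concrete on a toy two-feature “model” (the feature names and risk numbers are invented; the shap library approximates this efficiently for real tree ensembles):

```python
# Exact Shapley values by brute force over feature coalitions. Fine for a
# toy model with two features; real SHAP implementations approximate this.
from itertools import combinations
from math import factorial

def shapley(value_of, features):
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                # Shapley weight for a coalition of size k
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value_of(frozenset(subset) | {i}) - value_of(frozenset(subset)))
        phi[i] = total
    return phi

# Toy "model output for a feature coalition": a new payee adds 0.4 risk,
# an odd login pattern adds 0.2, and the two interact for 0.1 extra.
def v(coalition):
    score = 0.0
    if "new_payee" in coalition:
        score += 0.4
    if "odd_logins" in coalition:
        score += 0.2
    if {"new_payee", "odd_logins"} <= coalition:
        score += 0.1
    return score

phi = shapley(v, ["new_payee", "odd_logins"])
# Efficiency property: contributions sum to v(all features) - v(none).
```

The interaction term is shared evenly between the two features, and the per-feature contributions always sum back to the model’s output – the additivity that makes SHAP charts readable for fraud analysts.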

Conclusion 

ThreatMetrix DDP is a highly customisable platform. In addition to scorecards, it is a great platform for deploying machine learning models. These models deliver far superior performance on the more complex fraud scenarios currently posing a significant challenge to the financial services sector, using subtle fraud indicators to passively authenticate users in the blink of an eye, on live data and without introducing latency into the customer experience. And with a focus on interpretability, we stay true to our clearbox principles.  

Thanks to data science, machine learning and the unshakeable curiosity and ingenuity of our people, we’re better than ever at catching fraudsters’ latest dirty tricks. Now, we just need to wait for them to make their next move! 

Authored by Lancelot Pregniard, Data Scientist, Professional Services Team