“From black-box to white-box”
On 12 February 2018, Data Science Milan and Buildo opened the 2018 season of data science events at Buildo, with talks about online banking fraud detection.
Financial fraud is a broad term with several potential meanings; it can be defined as the intentional use of illegal methods to obtain financial gain. There are many types of financial fraud, from credit card fraud to automobile insurance fraud, and advances in technologies such as the internet and mobile devices have led to an increase in financial fraud.
“What is Banksealer” by Daniele Gallingani, Buildo
Daniele presented Banksealer, an online banking fraud and anomaly detection framework that analysts use as a decision support system. It started in 2016 from research by Politecnico di Milano, sponsored by Secure Network and by Buildo.
It serves IT security teams as a decision support system: by aggregating historical transaction data, it summarizes how each customer interacts with the e-banking system and, using advanced statistical and machine learning techniques, flags whether, and how, a transaction is atypical.
Typical fraud in a banking scenario ranges from phishing and compromised credential databases to more advanced techniques.
The tool ranks transactions in real time: a high score can block a transaction, while a low score lets it proceed, so the system has to be integrated with the bank's infrastructure.
Banksealer can be described as explainable machine learning with graphs: a dashboard visualizes useful information for the analyst, and a separate window shows a list of the top anomalous transactions.
“Banksealer Algorithms and Architecture” by Claudio Caletti, Buildo
In the second talk, Claudio Caletti discussed the software architecture and the algorithms implemented in Banksealer.
Banksealer is a system that moves transactions through different states. Its main entities are the transactions themselves: bank transfers, payments, prepaid card transactions, phone recharge transactions and so on.
The input consists of transactions coming from the bank, which are used to train the machine learning models. Transactions are then scored, and the scored transactions are forwarded as output to external systems.
The system can be divided into three blocks: a block exposed to external systems, which imports transactions from banks in a raw format and returns the scored transactions; a data block consisting of a relational database and Elasticsearch; and the Banksealer core, consisting of the machine learning models and the user interface (front end and back end).
All services in Banksealer are written in Scala because it is a type-safe language. Components are either generic or specific: the generic components are the core of the system and are the same for all banks, while the specific component is a bank driver built ad hoc for each bank.
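The split between generic components and a bank-specific driver can be sketched as follows. This is only an illustration, written in Python for brevity even though the real services are in Scala. The names `Transaction`, `BankDriver`, `ExampleCsvDriver` and `import_transactions`, as well as the semicolon-separated raw format, are hypothetical and do not come from the talk.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Transaction:
    """Generic transaction shape shared by all banks (hypothetical fields)."""
    customer_id: str
    amount: float
    kind: str


class BankDriver(ABC):
    """Bank-specific component: turns one raw record into a Transaction."""

    @abstractmethod
    def parse(self, raw: str) -> Transaction:
        ...


class ExampleCsvDriver(BankDriver):
    """Ad hoc driver for an imaginary bank that sends 'customer;amount;kind'."""

    def parse(self, raw: str) -> Transaction:
        customer_id, amount, kind = raw.strip().split(";")
        return Transaction(customer_id, float(amount), kind)


def import_transactions(driver: BankDriver, raw_records: list) -> list:
    """Generic component: works with any driver, knows nothing about the bank."""
    return [driver.parse(record) for record in raw_records]


imported = import_transactions(ExampleCsvDriver(), ["c42;120.50;bank_transfer"])
```

With this shape, supporting a new bank would only require writing a new driver, while the import logic stays untouched.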
Banksealer's approach is based on three main algorithms; the first, the local profile, is the most important.
The local profile works on single customers: it models each user’s individual spending pattern in order to evaluate how anomalous each new transaction is. During training, transactions are aggregated by customer and the distribution of each feature is approximated by a histogram.
The anomaly score of each new transaction is calculated using the HBOS (Histogram-Based Outlier Score) method, which computes the log-likelihood of a transaction according to the learned marginal distributions. The HBOS score is a weighted sum, over the features, of a term computed from the normalized frequency histogram of each feature. The weighting coefficients are tuned by the analyst; in the upgraded version they are calculated by a genetic algorithm.
HBOS assumes that the features are independent, which makes it much faster than multivariate approaches but less precise; in fact, it performs poorly on local outlier problems.
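To make the local profile more concrete, here is a minimal sketch of an HBOS-style score for a single customer. It assumes categorical (or already binned) features, a small `floor` frequency for values never seen in training, and equal weights. The function names, the feature names and the sample data are all made up for illustration and are not Banksealer's actual implementation.

```python
import math
from collections import Counter


def train_histograms(transactions, features):
    """Build one normalized frequency histogram per feature for one customer."""
    histograms = {}
    for feature in features:
        counts = Counter(t[feature] for t in transactions)
        total = sum(counts.values())
        histograms[feature] = {value: c / total for value, c in counts.items()}
    return histograms


def hbos_score(transaction, histograms, weights, floor=1e-3):
    """Weighted sum of log(1 / frequency) over features, treated as independent.

    `floor` (an assumption) is the frequency given to values never seen before.
    """
    score = 0.0
    for feature, histogram in histograms.items():
        frequency = histogram.get(transaction[feature], floor)
        score += weights.get(feature, 1.0) * math.log(1.0 / frequency)
    return score


# Hypothetical history: mostly small domestic transfers.
history = (
    [{"country": "IT", "amount_bin": "0-100"}] * 8
    + [{"country": "IT", "amount_bin": "100-1000"}] * 2
)
histograms = train_histograms(history, ["country", "amount_bin"])
weights = {"country": 1.0, "amount_bin": 1.0}

usual_score = hbos_score({"country": "IT", "amount_bin": "0-100"}, histograms, weights)
unusual_score = hbos_score({"country": "RU", "amount_bin": "100-1000"}, histograms, weights)
```

The higher the score, the rarer the transaction is for that customer. Because each feature contributes its own term, an analyst can see which feature made a transaction look anomalous.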
The second algorithm is the global profile, which is useful for new users: it defines “classes” of spending patterns and mitigates the undertraining problem. Each user is represented by six components: total number of transactions, average transaction amount, total amount, average time span between subsequent transactions, number of transactions executed from overseas countries, and number of transactions to overseas recipients.
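The six components listed above could be computed along these lines. The `Tx` record, its field names, and the rule that “overseas” means any country other than a `home_country` parameter are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tx:
    """Hypothetical transaction record used only for this example."""
    timestamp: datetime
    amount: float
    origin_country: str
    recipient_country: str


def global_profile(txs, home_country="IT"):
    """Return the six-component profile vector for one customer."""
    txs = sorted(txs, key=lambda t: t.timestamp)
    count = len(txs)
    total = sum(t.amount for t in txs)
    # Time spans (in seconds) between subsequent transactions.
    gaps = [(b.timestamp - a.timestamp).total_seconds() for a, b in zip(txs, txs[1:])]
    return [
        count,                                    # total number of transactions
        total / count if count else 0.0,          # average transaction amount
        total,                                    # total amount
        sum(gaps) / len(gaps) if gaps else 0.0,   # average time span
        sum(1 for t in txs if t.origin_country != home_country),
        sum(1 for t in txs if t.recipient_country != home_country),
    ]


profile = global_profile([
    Tx(datetime(2018, 1, 1, 10, 0), 100.0, "IT", "IT"),
    Tx(datetime(2018, 1, 2, 10, 0), 50.0, "IT", "DE"),
    Tx(datetime(2018, 1, 4, 10, 0), 150.0, "FR", "IT"),
])
```

Each customer then becomes a point in a six-dimensional space, which is what a clustering step can work on.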
Customer profiles are clustered with DBSCAN (Density-Based Spatial Clustering of Applications with Noise) using the Mahalanobis distance. For each global profile, the CBLOF (Cluster-Based Local Outlier Factor) anomaly score is calculated, which tells the analyst how uncommon a spending pattern is with respect to the closest customers. It measures how much the user profile deviates from the dense cluster of “normal” users; small clusters are considered outliers with respect to large clusters.
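A simplified, CBLOF-style score is sketched below. It assumes the cluster labels have already been produced by a clustering step such as DBSCAN (with `-1` marking noise). Several simplifications are assumptions of this sketch and differ from what the talk describes: it uses the Euclidean distance instead of the Mahalanobis distance, it splits large from small clusters with a plain `min_large_size` threshold, and it does not weight distances by cluster size.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equally sized vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def cblof_scores(profiles, labels, min_large_size=3):
    """Distance of each profile from the relevant large-cluster centroid.

    Profiles in a large cluster are compared with their own centroid;
    profiles in small clusters or noise (-1) with the nearest large centroid.
    """
    clusters = {}
    for profile, label in zip(profiles, labels):
        clusters.setdefault(label, []).append(profile)
    large = {
        label: centroid(points)
        for label, points in clusters.items()
        if label != -1 and len(points) >= min_large_size
    }
    if not large:
        raise ValueError("no large cluster found; lower min_large_size")
    scores = []
    for profile, label in zip(profiles, labels):
        if label in large:
            scores.append(math.dist(profile, large[label]))
        else:
            scores.append(min(math.dist(profile, c) for c in large.values()))
    return scores


# Toy two-dimensional profiles: four similar customers and one isolated one.
profiles = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0], [10.0, 10.0]]
labels = [0, 0, 0, 0, 1]
scores = cblof_scores(profiles, labels)
```

The isolated customer ends up with a much higher score than the four customers in the dense cluster, which is the behaviour described above for small clusters.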
The third algorithm is the temporal profile, which deals with frauds that exploit many transactions made within a time window, by comparing each customer's current spending profile with their history. During training, the mean and standard deviation of the following aggregated features are calculated for each customer: total amount, total number of transactions and maximum daily number of transactions. At runtime, the cumulative value of each feature is calculated for each user and compared against the previously computed metrics.
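The train-then-compare idea can be illustrated for a single feature, the total amount per time window. Expressing the comparison as the number of standard deviations above the mean is an assumption of this sketch, as are the function names and the sample figures; the talk did not specify the exact formula.

```python
import statistics


def train_temporal(window_totals):
    """Mean and sample standard deviation of a feature over past time windows."""
    return statistics.mean(window_totals), statistics.stdev(window_totals)


def temporal_score(current_total, mean, std):
    """How many standard deviations the current cumulative value exceeds the mean."""
    if std == 0:
        return 0.0 if current_total <= mean else float("inf")
    return max(0.0, (current_total - mean) / std)


# Hypothetical total amounts spent by one customer in five past windows.
past_totals = [100, 120, 80, 110, 90]
mean, std = train_temporal(past_totals)

normal_score = temporal_score(105, mean, std)
burst_score = temporal_score(400, mean, std)
```

A window in line with the customer's history scores close to zero, while a burst of spending far above the usual total gets a high score.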
The outputs of all these algorithms are merged into a single output ranking score.
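The talk did not detail how the three scores are merged, so the following is purely an illustrative assumption: a weighted average of scores that are taken to be already normalized to the range 0 to 1, followed by a descending sort. The weights, the transaction identifiers and the score values are invented.

```python
def combine(scores, weights):
    """Weighted average of the per-profile scores (an assumed merging rule)."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def rank(scored_transactions, weights):
    """Return transaction ids ordered from most to least anomalous."""
    combined = {
        tx_id: combine(scores, weights)
        for tx_id, scores in scored_transactions.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical weights and normalized scores.
weights = {"local": 0.6, "global": 0.2, "temporal": 0.2}
scored = {
    "A": {"local": 0.1, "global": 0.2, "temporal": 0.1},
    "B": {"local": 0.9, "global": 0.5, "temporal": 0.7},
    "C": {"local": 0.4, "global": 0.9, "temporal": 0.2},
}
ranking = rank(scored, weights)
```

Giving the local profile the largest weight mirrors the statement that it is the most important of the three algorithms, but the actual proportions are not known from the talk.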
Banksealer can be described as a white box, in contrast to similar tools (black boxes), because the analyst understands what is going on. It is not completely automated, and it is meant to be easy to deploy while keeping a good false positive ratio.
The tool relies mainly on the HBOS algorithm, which is based on histograms and is therefore easy for the analyst to understand. The ranking score also helps the analyst manage the number of reported transactions in the presence of false positives.
Author: Claudio Giancaterino
Actuary & Data Science Enthusiast