Evaluation

We identified four interdependent criteria for evaluating AI algorithms and tools in finance:

  1. Appropriate data management is a fundamental issue for every algorithm, as both performance and regulatory compliance depend on it. Ethical considerations, such as fairness of processing and the absence of discriminatory bias, must be taken into account in this regard.

  2. Performance of an ML algorithm can be assessed using a variety of metrics. The range of available metrics is sufficient for assessing the accuracy of virtually any ML algorithm used in finance, against both technical and functional criteria; a minimal metrics sketch is given after this list. It is, however, sometimes necessary to balance the selected criteria against the desired degree of explainability.

  3. Stability describes how robust and resilient an ML algorithm’s behaviour remains over its lifecycle. Due care must be taken to ensure that the model generalizes to production data and to continuously monitor the risk of model drift once the model is deployed; see the drift-monitoring sketch after this list.

  4. Explainability, a close cousin of algorithmic transparency and interpretability, has to be put in context in order to define its actual purpose. The “explanation” of a specific result or of the algorithm’s behaviour may prove necessary for end users (whether customers or internal users); in other cases, it will serve those tasked with the compliance or governance of the algorithm. The explanation provided thus aims either to inform the customer, to ensure the consistency of workflows in which humans make decisions, or to facilitate the validation and monitoring of ML models. We therefore introduce four levels of explanation (observation, justification, approximation, and replication) in order to clarify the expectations regarding explainability of AI in finance, depending on the target audience and the associated business risk. The approximation level is illustrated in the surrogate-model sketch after this list.
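To make criterion 2 concrete, the following minimal sketch (Python with scikit-learn; the outcomes, scores, and the 0.5 decision threshold are purely illustrative) computes several complementary performance metrics for a binary classifier such as a credit-scoring model:

    # Minimal sketch: complementary metrics for a binary classifier.
    # Data and the 0.5 threshold are illustrative, not recommendations.
    import numpy as np
    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, roc_auc_score)

    y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                     # observed outcomes
    y_score = np.array([0.1, 0.4, 0.8, 0.65, 0.3, 0.9, 0.2, 0.55])  # model scores
    y_pred = (y_score >= 0.5).astype(int)                           # decision threshold

    print("accuracy :", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))  # cost of false approvals
    print("recall   :", recall_score(y_true, y_pred))     # cost of missed defaults
    print("ROC AUC  :", roc_auc_score(y_true, y_score))   # threshold-independent

Which metric dominates is a functional choice: in credit decisions, for instance, the cost of a missed default and the cost of a wrongly refused customer are rarely symmetric.
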
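For criterion 3, the Population Stability Index (PSI) is one common way to quantify drift between a reference sample and production data. The sketch below is a generic implementation under illustrative assumptions (synthetic data, ten bins, and the conventional 0.25 alert level):

    # Minimal sketch: Population Stability Index (PSI) for one feature.
    import numpy as np

    def psi(expected, actual, bins=10):
        """PSI between a reference sample and a production sample."""
        edges = np.histogram_bin_edges(expected, bins=bins)
        e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
        a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
        e_frac = np.clip(e_frac, 1e-6, None)   # avoid log(0) and division by zero
        a_frac = np.clip(a_frac, 1e-6, None)
        return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

    rng = np.random.default_rng(0)
    reference = rng.normal(0.0, 1.0, 10_000)    # training-time distribution
    production = rng.normal(0.3, 1.0, 10_000)   # shifted production distribution
    print(f"PSI = {psi(reference, production):.3f}")  # > 0.25 often triggers review
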
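The “approximation” level of explanation mentioned in criterion 4 can be illustrated by fitting a small, interpretable surrogate to the outputs of an opaque model. In this sketch the gradient-boosting model merely stands in for a black box, and the synthetic data and depth-3 tree are illustrative choices:

    # Minimal sketch: a shallow decision tree approximating a black-box model.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
    black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X, black_box.predict(X))   # learn from the black box, not from y

    fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
    print(f"fidelity to the black box: {fidelity:.2%}")
    print(export_text(surrogate, feature_names=[f"x{i}" for i in range(6)]))

The fidelity figure matters: a surrogate is only a valid explanation of the black box to the extent that it reproduces its decisions.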

Governance

Incorporating AI into business processes in finance inevitably impacts their governance. We recommend focusing in particular, as early as the algorithm’s design phase, on the following aspects.

Integration into business processes. Does the AI component fulfil a critical function, owing to its operational role or to the associated compliance risk? Does the engineering process follow a well-defined methodology throughout the ML lifecycle (from algorithmic design to monitoring in production), in terms of reproducibility, quality assurance, architectural design, auditability, and automation? A minimal reproducibility sketch follows.
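As a minimal illustration of reproducibility and auditability, the sketch below records the artefacts needed to replay and audit a training run. The field names, the model identifier, and the file paths are hypothetical, not a standard:

    # Minimal sketch: logging what is needed to reproduce and audit a training run.
    import hashlib, json, platform
    from datetime import datetime, timezone

    def dataset_fingerprint(path: str) -> str:
        """SHA-256 of the raw training file, pinning the exact data used."""
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    record = {
        "model_id": "credit-scoring-v3",                  # hypothetical identifier
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "random_seed": 42,                                # fixed seed for reproducibility
        "data_sha256": dataset_fingerprint("train.csv"),  # hypothetical path
        "python_version": platform.python_version(),
        "metrics": {"roc_auc": 0.87},                     # filled in after evaluation
    }
    with open("model_registry.json", "w") as f:           # hypothetical registry file
        json.dump(record, f, indent=2)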

Human/algorithm interactions. These may require a particular kind of explainability, intended either for internal operators who need to confirm or reject an algorithm’s output, or for customers who are entitled to understand the decisions affecting them or the commercial offers made to them. Moreover, processes involving AI often leave room for human intervention, which is beneficial or even necessary, but also carries new risks. These include introducing bias into the explanation of an algorithm’s output, or operators feeling that they engage their responsibility more when contradicting the algorithm than when confirming its decisions.

Security and outsourcing. ML models are exposed to new kinds of attacks, for instance manipulation of their inputs or of their training data. Furthermore, strategies such as development outsourcing, skills outsourcing, and external hosting should undergo careful risk assessment. More generally, third-party risks should be evaluated.

Initial validation process. This process must often be re-examined when designing an AI algorithm intended for augmenting or altering an existing process. For instance, the governance framework applicable to a business line may in some cases be maintained, while in other cases it will have to be updated before putting the AI component into production.

Continuous validation process. The governance of an ML algorithm also presents challenges after its deployment in production. For example, its continuous monitoring requires technical expertise and ML-specific tools in order to ensure the aforementioned principles are followed over time (appropriate data management, predictive accuracy, stability, and availability of valid explanations).

Audit. As for the audit (both internal and external) of AI-based systems in finance, exploratory work led by the ACPR suggests adopting a dual approach:

  • The first facet is analytical. It combines analysis of the source code and of the data with methods, ideally standards-based, for documenting AI algorithms, predictive models, and datasets.

  • The second facet is empirical. It leverages methods that explain an individual decision or the algorithm’s overall behaviour, and also relies on two techniques for testing an algorithm as a black box: challenger models (compared against the model under test) and benchmarking datasets, both curated by the auditor; a challenger-model sketch is given at the end of this audit discussion.

Such a multi-faceted approach is suitable both for internal auditors and for a supervisory authority; however, the latter faces specific challenges owing to the scope of its mission. In order to audit AI systems effectively, it will need to build both theoretical and hands-on expertise in data science, while developing a toolkit dedicated to AI supervision.
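The black-box testing techniques mentioned above can be sketched as follows. The “model under test” is simulated here by a random forest, and the benchmark data and the logistic-regression challenger are illustrative assumptions standing in for artefacts curated by the auditor:

    # Minimal sketch: comparing an audited black box against a challenger model
    # on a benchmark dataset, querying both only through their prediction API.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=10, random_state=1)
    X_fit, X_bench, y_fit, y_bench = train_test_split(X, y, test_size=0.4,
                                                      random_state=1)

    model_under_test = RandomForestClassifier(random_state=1).fit(X_fit, y_fit)
    challenger = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)

    for name, model in [("model under test", model_under_test),
                        ("challenger", challenger)]:
        auc = roc_auc_score(y_bench, model.predict_proba(X_bench)[:, 1])
        print(f"{name}: AUC on benchmark = {auc:.3f}")
    # A large, unexplained gap in either direction signals deeper analysis is needed.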

Public consultation

The analysis presented in this document is subject to public consultation. The objective is to submit the guidelines sketched herein to financial actors and other interested parties (researchers, service and solution providers, supervisory authorities, etc.) for feedback, and more broadly to gather any useful comments, including on supervisory authorities’ best practices.

Updated on 28 February 2025