How to: Data Science & AI Centres of Excellence can accelerate digital transformation in banking

--

Der Wanderer über dem Nebelmeer — C. D. Friedrich, 1818

A Centre of Excellence can be a major accelerator for data science and AI to become a natural part of banking enterprise-wide and thereby pave the way for digital transformation. Read how.

Data science and AI in banking

By now, writing in 2020, all major banks have run numerous experiments with data science and AI for years. Most have run virtual assistant experiments, investing considerable effort to deliver an experience that is more ‘human like’ than regular chatbots can provide. Most have built data science teams as part of innovation hubs or departments with a more traditional structure.

Success stories are beginning to surface. Nonetheless, transformative initiatives of this type commonly stumble in banking over the challenge of having to integrate with structures built before data science and AI[1] became part of the common vocabulary. Even recent digital applications in the banking sector are often not analytics-friendly: they come in combinations of cloud-based, social and mobile, but do not invoke advanced analytics. Between the infrastructure transition to the cloud and the change of user interface to mobile, the world of banking is still very traditional with regard to data structures, processes and mind-sets.
This will change over the coming years, and not just organically. There are major transformative efforts in all areas of banks, and capable, determined leaders who will transition the banking ecosystem towards being much more analytics-friendly and make data science and AI differentiators in the industry. Customers are also demanding new agile digital services and products from their banks.

However, don’t just wait for perfect conditions to arise. A Data science and AI Centre of Excellence (DACOE) can be a major accelerator for these disciplines to become a natural part of banking enterprise-wide across all functions and thereby pave the way for digital transformation.

Centre of excellence activities and level of embeddedness

To accelerate both the data science and AI agenda and digital transformation successfully, the centre of excellence should:
· Not be constituted only of specialist resources running semi-scientific proofs of concept
· Not work with the rest of the organization on a needs basis, brought in to solve specific tasks
· Not be a standalone structure outside the rest of the organization
· Not provide specialised and scarce resources whose services are not required on a full-time basis in the line organization[2]

These characteristics are commonly used to describe a Centre of Excellence. In the case of data science and AI in banking, this ‘factory’ model is bound to fail, because it keeps advanced analytics at arm’s length from where real business is happening.

Instead, a data science and AI centre of excellence in banking should have continuous and close interaction with business and IT departments in the architecture and design phase, with analytics resources integral to the teams. Rather than focusing on thought leadership, best practice, support and staff augmentation, the following should be the main activities for a DACOE in banking:

1. Data science and AI vision for the organization
2. Business development: Business-driven use cases, objectives and key results
3. Build relationship with externals — vendors and universities
4. Communication: Maintain a network of senior promoters and share success stories
5. Develop AI literacy in the organization, and acquire and build talent
6. Common foundation: Target data and technology architecture for data science and AI
7. Common standards and governance: Ethics, model governance, explainability

Each of the main activities is described in more detail in the next section. The DACOE drives each of these activities, but they essentially happen ‘outside’ the centre. The only activities ‘inside’ the centre of excellence should be preparatory for the activities happening outside. This must be the level of integration.

Mission of a Data Science and AI Centre of Excellence

A Data Science & AI Centre of Excellence in banking should incorporate into its mission statement the activation of advanced analytics for the digital transformation of the organization.
All teams need to create their own mission, and so it is with every centre of excellence. In banking, however, the link to digital transformation is essential and must be explicit. Data science and AI are part of a larger movement in banking, and in society in general, to digitalize interactions and thereby open up a world of possibilities. DACOEs must make this affiliation with digital transformation explicit and sponsor it in the organization in order to look beyond natural entry barriers and become integral to banking’s business processes.[3] The DACOE essentially makes data science and AI easier for each business area to implement well.

In organizations with a long pre-digital history, such as banks, there are multiple barriers. Data is the fuel of analytics, and it can be hard to distil when a significant part of the core is still on mainframe, the data landscape is stitched together from multiple mergers and silos, and the urgency of regulatory requirements and business opportunities has created a spider’s web of shortcuts. Furthermore, product design workshops start with the processes, flows and interactions needed to generate the desired outcome. Data science and AI enter the scene late: when the IT and process architecture is drawn, the question is raised whether AI could sprinkle some magic over certain steps, for instance segmentation or scoring. This is a strong indicator that AI is still a detached discipline.

Data science and AI will never be at home within the traditional ecosystem of pre-transformation tools and mind-sets that still abound in banking. Acknowledge this by connecting to the digital transformation, which is their natural home.

The mission is to activate advanced analytics for the digital transformation of the organization. Very little data science and AI happens in the Centre of Excellence; business development, planning etc. are the main activities.

The next section describes each of the main activities in more detail. The final section returns to the overall design question described above and describes underlying key principles for designing a successful DACOE in banking.

Main activities for the Data science and AI centre of excellence

Data science and AI vision for the organization

The DACOE should drive the process for the organization to formulate its vision for data science and AI. Executives ultimately sponsor the vision and must therefore be part of the defining process.
Getting to this point, however, is not straightforward unless you have executives with a balanced understanding of what AI can do and of the natural path to success. The word ‘balanced’ is important. The advanced analytics industry, including high-profile consultancies, has adopted the software industry’s bad habit of projecting an image that everyone else is moving ahead without looking back. Data science and AI are excellent places to produce impressive demonstrations that do not scale.[4] AI experts with their feet on the ground should join the work group and interact with executive stakeholders.

A vision statement points to what the DACOE aims to achieve and is in the future tense. It is revised on a regular basis and discarded every two to three years, based on where the organization should aim to get next. Anything that stretches beyond that timeline is going to be too abstract and high-flying.
Vision statements are individual, based on where the organization is, where it wants to go and how data science and AI can contribute to this movement. However, to set the ambition for your organization’s next move, the following general pairs or trade-offs might be useful:

In general, data science and AI visions have trended towards the left column of this table. From 2021 onwards, the time has come to explore the possibilities in the right column and reduce emphasis on the left: for example, delivering data science products that are embedded in and integral to a banking product, improving the general analytics literacy in the organization, and buying and enabling decision intelligence products that attend to the full data science lifecycle[5] and the foundation for scalability.

Business development: Business-driven use cases, objectives and key results

A DACOE must be proactive in the identification of use cases. A DACOE with a certain set of capabilities will have plenty of incoming assignments and a full backlog, so the challenge is not to deliver AI solutions as such, but to find the use cases where data science and AI can deliver a differentiating impact in business areas. To get the right backlog to prioritize from, the DACOE needs to become proactive and explore projects that are in the early phases of design and have the potential to embed advanced analytics. Several areas should be continuously investigated and the conclusions logged:

· Often quoted and accepted success stories from the industry curriculum
· Digital transformation initiatives launched in the organization — social, mobile, cloud[6]
· Big data projects in core banking functions like risk, fraud and compliance

Only through proactive, early inquiries from the DACOE will a full backlog include these categories. List activities within the categories as early as possible so that data science and AI do not become a last-minute injection into any initiative.

The DACOE does not have its own AI projects. There might be foundational activities that it would have to sponsor directly, but even in this case it should be an injection into a business-driven project. If no one will sponsor it, or it does not tie into a prioritized business-driven project, then it is not worth doing.[7]

For any particular initiative prioritized, an ambition level for data science and AI should be set. This requires careful analysis of the prerequisites that need to be in place and the opportunity cost of a more ‘low-key’, less advanced solution. The general principle here would be to ‘think big, but start small’ and use an outcome-based business angle to filter and prioritize features.
The focus on business impact is central to execution, because it opens the opportunity to use less advanced analytics for the same outcome. For example, a focus on user needs and experience essentially eliminated the avatar-like virtual assistants that were the talk of the town a few years ago. These were trained to converse and sense general sentiments, but when users just want their question answered, these features are in the way. Complexity reduced.
The ‘think big, but start small’ principle, combined with the ultimate idea that applications should be analytics-native, i.e. built with analytics at the centre, leads to the necessary conclusion that work on a particular initiative can continue for some time after go-live. This should be accepted as long as the focus on business-driven use cases is maintained.
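As an illustrative sketch of that outcome-based filter, a backlog could be ranked by estimated business impact and feasibility, with use cases whose prerequisites are missing held back until the foundation exists. The use-case names, scores and scoring rule below are hypothetical, not a prescribed model:

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    business_impact: int    # 1-5, estimated business outcome
    feasibility: int        # 1-5, data and integration readiness
    prerequisites_met: bool # are the required foundations in place?

def prioritize(backlog):
    """Rank use cases by impact x feasibility; hold back those
    whose prerequisites are not yet in place ('start small')."""
    ready = [u for u in backlog if u.prerequisites_met]
    return sorted(ready, key=lambda u: u.business_impact * u.feasibility,
                  reverse=True)

backlog = [
    UseCase("Avatar-style virtual assistant", 3, 1, False),
    UseCase("FAQ intent routing", 3, 4, True),
    UseCase("Churn lead scoring", 5, 3, True),
]
top = prioritize(backlog)  # churn scoring ranks first; the avatar waits
```

The point of the sketch is the shape of the decision, not the numbers: a simple, transparent rule keeps the discussion with business owners about outcomes rather than about model sophistication.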

Common foundation: Target data and technology architecture for data science and AI

The practicalities of defining and managing a target data architecture and a target technology architecture for data science and AI are tricky. Departments generally share data and technology at some level. Building an independent data platform for data science at department level that can underpin both experimentation and execution is often not an option financially — and it would slow down the time to execute considerably.

At the data storage and processing level, the appropriate slicing and dicing is by domain, e.g. AML/compliance, risk and digital marketing, not ‘data science and AI’. For each of these, a data and technology architecture for data science and AI might make sense, but the ownership of this should sit with the respective groups or IT departments, not the DACOE.
The DACOE could own the data science and AI tooling. The choice of editor and programming language should not be free in the organization, and a central unit should stay on top of trends in this fast-moving field and continuously evaluate the need for new tooling.[8] Practitioners have different backgrounds and preferences, but a BYOT[9] strategy will make the field less collaborative and make support a challenge.
Hardly any serious data scientist can deny having occasionally stumbled across cool tooling and tried to fit a use case to the tool instead of the other way around, or simply tried different tools to solve a specific problem. The list of tooling needed is long even with tight control: relational databases (SQL, MySQL, PostgreSQL), big data frameworks (Hadoop, Spark), NoSQL databases (MongoDB, Cassandra, Redis), visualization tools (PowerBI, Tableau, QlikView), scraping tools (Octoparse, Content Grabber), data science workbenches (SAS, Cloudera, IBM, Dataiku, Databricks), labeling tools (MDS, Labelbox, Cloudfactory), data preparation (Alteryx, Seahorse), ML frameworks (CatBoost, TensorFlow), etc. With the growing cloud footprint, the list will grow. This needs to be managed for consistency, maintainability, and licensing and operating cost.
Allow room for experimentation with new tools, because the field is tool driven, but maintain a central overview to build a community of knowledge and support. It will improve delivery times, if teams have a central unit they can ask what tools are available and cleared by security.
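One lightweight way to keep that central overview is a tooling register that teams can query before adopting something new. The sketch below is hypothetical; the tool entries and field names are illustrative, not a recommended schema:

```python
# Hypothetical central tooling register maintained by the DACOE:
# each tool records its category and whether security has cleared it,
# so teams can check availability before adopting something new.
APPROVED_TOOLS = {
    "PostgreSQL":   {"category": "relational database", "cleared_by_security": True},
    "Spark":        {"category": "big data framework",  "cleared_by_security": True},
    "Tableau":      {"category": "visualization",       "cleared_by_security": True},
    "NewShinyTool": {"category": "experimentation",     "cleared_by_security": False},
}

def available_tools(category=None):
    """Return security-cleared tools, optionally filtered by category."""
    return sorted(
        name for name, meta in APPROVED_TOOLS.items()
        if meta["cleared_by_security"]
        and (category is None or meta["category"] == category)
    )
```

Even a register this simple makes the trade-off explicit: experimentation with uncleared tools stays visible, and delivery teams get an immediate answer to "what can I use?".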

Build relationship with externals — vendors, partners and universities

The data science and AI ecosystem maintained by the DACOE must stretch beyond organizational boundaries. Multiple external parties can contribute to innovation and the DACOE’s vision:

· Open source community: Data science and AI as a competence has close ties to the open source philosophy with its crowdsourced innovation, sharing and the natural chaos that comes with it. This is particularly relevant when the capabilities of AI are growing at the current pace.
· Decision intelligence vendors: Decision Intelligence is a maturing market with numerous strong vendors within disciplines such as compliance and digital marketing.
· Cloud and tooling vendors: High-profile digital natives such as Google, Amazon, Microsoft and IBM are important to collaborate with because they drive a data science and AI agenda with top talent.
· Start-ups: Both local and global start-ups should be explored in relation to specific business-driven use cases. In some cases, banks invest in and collaborate with start-ups that show promising relevant potential.
· Academia: Universities have started to offer training in data science disciplines. They are a main funnel for talent.

The DACOE must establish relationships with these externals to ensure a constant inflow of innovation, knowledge, intellectual property and talent. However, it cannot become an intermediary but must adopt a facilitating role of establishing, nurturing and supporting relationships between organizational units and externals.

The relationship with externals has some similarity with the need to manage tools. Data science and AI is a community, network based discipline. Delivery times can improve dramatically by finding the right partners and creating the right setting for data scientists.

Communication: Maintain a network of senior promoters and share success stories

To become successful in the organization, the DACOE must build a network of influential supporters and stakeholders across the value chain.

Just getting a model accepted or approved often requires multiple groups. For example, ethics and the ability to explain results might involve legal and presentations to financial authorities, data governance standards need to be met, and portfolio managers must accept any customer filtering and lead generation.
The most time-consuming involvement starts very early in the process: getting the model embedded into a business context where an executive leader is willing to take the projected benefits on their books. This is always a challenge. When money and budgets are involved, the confidence needs to be raised to a different level and the game changes. This is where many data science projects in banking currently fail, because they do not offer an end-to-end solution. Model execution is part of a value chain, and everyone who owns part of that value chain needs to be comfortable with the model or be brought into sync with it. Consider, for example, introducing a new churn model that the data science team believes will deliver more precise leads. Portfolio managers will not sign on to an improved top line if they are i) not comfortable with the predicted outcome and assumptions, ii) not convinced that the organization can (or will) take action on it, iii) not certain that the quality can be maintained, and iv) not assured that corrective action is possible if the forecast falls short.
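To make the comfort point concrete: a churn model does not need to be a black box to be useful. A deliberately simple, hand-weighted logistic score, where every term can be shown to a portfolio manager, might look like the sketch below. The features, weights and example lead are invented for illustration only:

```python
import math

# Hypothetical, hand-weighted logistic churn score. Every term is
# inspectable, so a lead consumer can sanity-check the drivers
# behind a prediction before acting on it.
WEIGHTS = {"months_inactive": 0.4, "products_held": -0.6, "complaints": 0.8}
BIAS = -1.0

def churn_probability(customer):
    """Logistic score: sigmoid of bias plus weighted features."""
    z = BIAS + sum(WEIGHTS[f] * customer[f] for f in WEIGHTS)
    return 1 / (1 + math.exp(-z))

def explain(customer):
    """Per-feature contribution to the score - the part a portfolio
    manager can challenge ('why is this customer flagged?')."""
    return {f: WEIGHTS[f] * customer[f] for f in WEIGHTS}

lead = {"months_inactive": 6, "products_held": 1, "complaints": 2}
p = churn_probability(lead)  # roughly 0.92 for this invented lead
```

The interesting property is `explain`: objections i) to iv) above become a conversation about visible feature contributions rather than a leap of faith in an opaque score.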

The missing link to cold top- or bottom-line figures is a significant challenge and a main reason why data science and AI are struggling even in areas where there are industry success stories. Data science must get out of the ivory tower or factory model for a centre of excellence to create the link. Communication, however, is also an essential component to bridge data science and AI to the business, and to all the groups that the discipline depends on for success.

Develop AI literacy in the organization, and acquire and build talent

Data literacy is the ability to read, write and communicate with data — in short, “Do you speak data?”[10] There is a growing recognition that it will become an essential enterprise-level capability of a data-driven organization, and Gartner predicts that “Data literacy will become an explicit and necessary driver of business value in 80% of data and analytics strategies”[11].

Data literacy programs must be extended to include classes specifically designed to increase literacy in analytics and AI: the ability to read, write and communicate AI and analytics, in short, to speak AI. The group in scope for analytics and AI literacy is a subset of the data literacy group, which might be a large percentage of the organization. Three groups in particular should be considered in scope of the analytics and AI curriculum:

· Owners and operators of AI flows: In fully automated flows involving advanced analytics and AI, the owners and operators of the flows should be comfortable and understand the business logic implemented in the analytical model.
· Consumers of advanced analytics results and mediators: for example, advisors or portfolio managers that rely on advanced analytics models to provide customer advice or take action on leads.
· Business functions that rely on traditional business intelligence and are prepared to increase the complexity or automation of their analytics — transitioning from data analysis to data engineering and citizen data science roles.

An inspiring education program that also captures the fact that different teams and people will need different levels of literacy is Airbnb’s Data University.[12] The program was started in 2016 with the vision to ‘empower every employee to make data-informed decisions’. It includes classes taught by data scientists, which motivates participants to go back and create their own localized solutions.

The DACOE should define a levels-based model for experts in the field, to ensure development paths for data scientists and to define the skills and knowledge that every worker in the field is expected to master at a certain level. The curriculum could be a set of Massive Open Online Courses (MOOCs), which would make it possible to start fast without a large investment.

Common standards and governance: Ethics, model governance, explainability

In general, the DACOE structure described here advocates a model that links data science and AI closely to the individual domains. The DACOE advocates and supports the distributed activities while ensuring that the data science and AI vision is realized and that local usage of the techniques is applied to the local context, but with a global and industry-wide perspective.

Whereas the common foundation was discussed from an efficiency and scalability angle, common standards and governance are necessary to maintain the level of centralized control and documentation required by external and internal authorities. Centralization of these processes should be justifiable in terms of an explicitly stated demand — otherwise remove them. Areas that belong to this category include:

· Ethics
· Data and Model governance and ownership criteria
· Model criticality levels
· Explainability requirements
· Model risk management framework

The DACOE should regulate adherence to these standards and conduct regular audits to ensure the quality of the documentation and the purpose of the frameworks.
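Adherence is easier to audit when the standards themselves are written down explicitly rather than living in slide decks. A minimal sketch, assuming hypothetical criticality levels, explainability requirements and review cycles (the actual levels would be set by the DACOE together with risk and regulators):

```python
# Hypothetical model-governance standards: each criticality level
# carries an explicit explainability requirement and review cycle,
# so audits can check adherence mechanically.
GOVERNANCE = {
    "low":    {"explainability": "global feature importance", "review_months": 12},
    "medium": {"explainability": "per-decision reason codes",  "review_months": 6},
    "high":   {"explainability": "full model documentation",   "review_months": 3},
}

def audit_due(level, months_since_review):
    """True if a model at this criticality level is overdue for review."""
    return months_since_review >= GOVERNANCE[level]["review_months"]
```

Encoding the standard this way turns "regulate adherence" from a periodic manual exercise into a check that can run against the model inventory continuously.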

Key principles for a successful DACOE in banking

The Centre of Excellence model proposed for data science and AI in banking is a version of hub and spoke, leaning towards a federated model. Data science and AI must be embedded in and applied to end-to-end flows, business processes and IT. This is the overarching success formula for driving growth: successful data science and AI is applied, and the level of sophistication is prioritized lower than applicability.

Returning to the opening remarks and the claim that a DACOE should deviate from the standard model for setting up centres of excellence: based on the key centre activities, the model described here clarifies why the standard model will not work, and what a contrasting model looks like.

A recurring point above is that data science and AI does not have to be advanced to be successful. This theme has recently also been highlighted as a success formula not just in banking: “Oddly enough, the AI that can drive the explosive growth of a digital firm often isn’t even all that sophisticated. … You need only a computer system to be able to perform tasks traditionally handled by people”[13]
An example of this from outside banking is Amazon’s recommender system — the system that suggests additional products based on your search or purchase history. Two possible approaches to recommender systems present themselves: the simpler similarity-based model, which compares your search or purchasing history to other customers’ and suggests products based on this comparison, and the more sophisticated model-based approach, which builds a profile of your preferences and determines which products match them. Amazon has been highly successful implementing the simpler approach; the similarity-based model does the work.
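A minimal version of the similarity-based approach fits in a few lines: compare users by cosine similarity of their purchase vectors and recommend what the nearest neighbour bought. The purchase data is invented, and Amazon's production system is of course far richer; the sketch only illustrates why the simpler approach already does useful work:

```python
import math

# Toy purchase histories: 1 = bought, 0 = not bought.
purchases = {
    "alice": {"book": 1, "lamp": 1, "pen": 0},
    "bob":   {"book": 1, "lamp": 1, "pen": 1},
    "carol": {"book": 0, "lamp": 0, "pen": 1},
}

def cosine(u, v):
    """Cosine similarity between two purchase vectors (same keys)."""
    dot = sum(u[k] * v[k] for k in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def recommend(user):
    """Suggest items the most similar other user bought that `user` has not."""
    others = {name: hist for name, hist in purchases.items() if name != user}
    nearest = max(others, key=lambda n: cosine(purchases[user], others[n]))
    return sorted(item for item, bought in others[nearest].items()
                  if bought and not purchases[user][item])
```

Here `recommend("alice")` finds bob as her nearest neighbour and suggests the pen. No preference profile is modelled anywhere, yet the output is exactly the kind of lead a business process can act on.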

I am convinced — because we have proved it repeatedly — that the factory or centralized version of a DACOE will fail against the model listed above in nearly all cases in banking. The most sought-after data science profiles for the DACOE should be data engineers and AI engineers, who are generally more versatile and solution-oriented, rather than pure data scientists. Sure, your organization will not develop the most advanced models in the market, but the model laid out here provides the best conditions for activating data science by embedding it into business designs throughout the organization.

Data science must get out of the ivory tower or factory model for a centre of excellence to connect directly to top or bottom line impact. A recurring point above is that data science and AI does not have to be advanced to be successful.

Notes

[1] Data science as machine learning and statistics applied to big data; AI as the broader set of techniques, including data science, that can be used to mimic or augment complex human capabilities.

[2] In the closing section I contrast this negative characteristic with a positive. Keep reading ;-)

[3] DACOE’s in digital native organizations will not need to highlight the connection to digital transformation because it is implicit.

[4] To be successful beyond their laptop, data scientists need a lot of friends in IT, Data, and business. The ‘demo paradox’ highlights a main shortcoming of a centralized model.

[5] Full lifecycle includes data analysis, data engineering, data modelling, ML and Data operations.

[6] Digital transformation here defined as the convergence of 5 megatrends: Social, Mobile, Cloud, Data & Analytics, Internet of Things (IoT). The latter is still unexplored in banking.

[7] This point goes against the ’science’ part of data science

[8] PyCharm, Visual studio code, Sublime Text, Atom, Jupyter notebook, Spyder, RStudio are popular editors in 2020 — wonder where the list is in 2 years.

[9] Bring Your Own Tool

[10] Gartner Virtual Briefing on Data & Analytics leadership and vision for 2021 (October 2020)

[11] ibid

[12] Stober: ‘How Airbnb is Boosting Data Literacy with ‘Data U Intensive’ training’ (Medium, 2018)

[13] Iansiti/Lakhani: ‘Competing in the age of AI’ (Harvard Business Review, 2020)
