The Enterprise Data Tool Portfolio: Self-service Business Intelligence and Analytics

Keld Stehr Nielsen
7 min read · Oct 18, 2020

10 times the value of Data Science?

Consider the experience when you join an organization as a data scientist in 2020: you become part of a team of like-minded professionals and have time to set up your own ‘lab’ at your desk with help from your assigned buddy. A few days later your manager introduces you to your project, the team and its folders. The next milestone is a few months out.
Now consider the experience when you join an organization as a data analyst in 2020: you become part of a team with many different profiles. On day one, your manager introduces you to your predecessor’s reports and asks you to update them immediately, as they are due by the end of the week. Over the next months, you operate these reports while urgent requests keep coming in.

Could the work of data analysts be 10 times the value of data science for some organizations? Organizations have to invest heavily in data science capabilities, but think about it: the scientist-to-analyst ratio is often 1:10 in organizations that are not digital natives.[1] The value of data science projects is difficult to extract because of the integrations required, irrespective of whether they create new market-differentiating products or add an advanced analytics layer to rule-based decision processes. On the other hand, data analysts are part of daily business and essential for keeping the lights on. They are already highly integrated and work at the edge together with the business.

Despite the importance of data analysis to the daily operations of an organization, there is a lack of focus on how to develop the data analyst discipline at enterprise level. This article seeks to partly remedy this shortcoming by providing tool portfolio guidelines for self-service Business Intelligence & Analytics.

Self-service data products that data analysts deliver

A data product is a product whose primary objective is to use data to facilitate an end goal.[2] Data analysts develop and operate a wide range of specific data products. Within self-service, we can operate with the following categories:

I. Global reports or dashboards distributed to a wide range of consumers across the organization or to customers. Non-functional requirements for reports in this category are determined by the fact that consumers are often at a distance from where the report is produced, and individually might only be allowed access to subsets of the information.

II. Local reports or dashboards in use within a specific business line or function. Producer and consumers share terminology and can communicate easily. Often, consumers can get customizations and adjustments approved and implemented fast, because little coordination is needed to keep everyone informed.

III. Analyses, which vary significantly along multiple dimensions. Gartner operates with four main levels: a) Descriptive, b) Diagnostic, c) Predictive, and d) Prescriptive. These types are part of the self-service analytics continuum, ranging from simpler descriptive analyses to advanced analyses involving statistics and machine learning.

IV. Rule-based or statistical models consumed by systems. Ideally, these should not be operated as self-service, but in reality it happens.
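To make category IV concrete, here is a minimal sketch of a rule-based model packaged as a plain function that another system could call. The rules, thresholds and field names are invented for illustration, not taken from any real product.

```python
# Hypothetical category IV data product: a rule-based model exposed as a
# function so a downstream system can consume its output directly.

def churn_risk(rules, customer):
    """Return the label of the first matching rule, or 'low' by default."""
    for condition, label in rules:
        if condition(customer):
            return label
    return "low"

# Illustrative rules; the thresholds are invented for this example.
RULES = [
    (lambda c: c["months_inactive"] >= 6, "high"),
    (lambda c: c["support_tickets"] > 3, "medium"),
]

print(churn_risk(RULES, {"months_inactive": 7, "support_tickets": 0}))  # high
```

The point of the sketch is the consumption pattern: once a system calls this function in production, changes to the rules need release management, which is why such models sit uneasily in a self-service portfolio.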

There are grey areas between the categories. A local report can function as a descriptive analysis, and a global report can be used locally.

The defining criterion is the consumer group. The first two categories have in common that producer and consumer are different people or groups, whereas in the analysis category, the third group, the producer is also part of the consumer group. The fourth category, models consumed by systems, has been included because some business units de facto develop and maintain this type. However, they are not self-service, as any integration with an IT system means that more roles need to be involved.[3]

The categorization of self-service data products here is silent about the medium through which the consumer receives the information. The delivery of a dataset via an API could be a report if, for instance, each user only needs a small set of information for action guidance. Likewise, a dashboard can be accessed via a dedicated tool or as a web element.[4]
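A dataset delivered per user can be sketched as follows. The example combines the API-as-report idea with the category I requirement that each consumer may only access a subset of the information; the report rows, permission table and user names are all invented for illustration.

```python
# Hypothetical sketch: one report, delivered as a per-user dataset where
# each consumer only sees the rows they are authorized for.

REPORT = [
    {"region": "EMEA", "revenue": 120},
    {"region": "APAC", "revenue": 95},
    {"region": "AMER", "revenue": 140},
]

# Invented permission table mapping users to the regions they may see.
PERMISSIONS = {"alice": {"EMEA"}, "bob": {"EMEA", "APAC"}}

def report_for(user):
    """Return only the report rows the given user is allowed to access."""
    allowed = PERMISSIONS.get(user, set())
    return [row for row in REPORT if row["region"] in allowed]

print(report_for("bob"))
```

Whether this filtering lives in the BI tool, the API layer or the database is exactly the kind of distribution-channel decision discussed under the selection criteria below.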

Selection criteria for data analyst tools

The flow of data from the producing systems to a finished report has many steps, involving tools that are not self-service as such from the perspective of a data analyst sitting in a business function. Data catalogues, integration hubs, warehouses, lakes and virtualization solutions are not self-service tools, because they do not allow business users to create their own products. The term ‘Self-Service Business Intelligence and Analytics tooling’ is fairly vague, but in this context it denotes the data engineering, analysis, visualization and monitoring tools that business users can use themselves to create self-service data products as defined above. Suggestions for other or better definitions are most welcome, as they could vary based on purpose, organizational setup and industry.

The table below lists the tool selection criteria and sample tools for each category. The sample tools list is not exhaustive, nor do I intend to recommend these tools specifically; they are included because many readers know them, which helps clarify the criteria in each category. There are many other powerful tools, and many of the tools listed can cover several categories.

Some of the tool selection criteria can be accommodated by deciding on a specific distribution channel and therefore do not have to be tool properties. For instance, access management built into a web user interface might make the same property redundant in the business intelligence tool.

How to create a self-service tool portfolio for data analysts

In many organizations, the self-service tool portfolio is the product of an organic process where tools are introduced locally, grow and are divested, until it is clear to everyone that more control or support is needed. In the next phase, tool operations are centralized, but growth still happens organically when functions share information, someone moves to a new department, etc. The result of this unmanaged process is suboptimal use of tools:

i) It is a significant budget item at enterprise level, even though the spend by each individual team is small; the sum of small numbers is not necessarily a small number.

ii) It is complex to maintain, because several units have been granted exceptions to procure and install tools warranted by business needs.

iii) It degrades value, because producers just use a tool they have heard can do the job, without knowing whether a better or more adequate tool already exists in the organization.

iv) It is non-compliant, because the tools have not been properly integrated into the wider IT ecosystem.

This article will not discuss in detail how to manage self-service business intelligence and analytics in large organizations. However, the following principles are useful to adopt in this field:

· Keep the tool portfolio simple and cheap

· Challenge any new tool request from an outcome perspective

· Align and formalize expectations between central teams and the data analysts’ teams

Keeping the tool portfolio simple and cheap is most easily respected by having one, perhaps two, tools in each category. Ideally they should be open source, but total cost of ownership needs to be taken into account, and very versatile open-source tools can be difficult to maintain.
Challenging any new tool request from an outcome perspective is essentially the way to keep the portfolio under control. This includes challenging a particular data analyst on why they need a specific tool from the existing portfolio, especially if it is a paid solution. It also includes challenging any request to procure or install a new tool. Expertise and a couple of fast gates can be applied to ensure that producers do not get their hands on a tool simply because the analyst, not the organization, prefers it or happened to see it at a conference.
Aligning and formalizing expectations between central teams and the data analysts’ teams is important because it is generally neither possible nor preferable to control everything centrally; this is, after all, a self-service domain. Especially in the local reporting and analysis categories, tools might be used because the team believes they are the best available solution, or because a new hire has experience with a tool and can get results fast if allowed to use it. It should be clear where central support and control are necessary, and which tools teams can install themselves if they agree to support them under certain agreed rules of conduct.

Applying these principles will help organizations find the sweet spot between centralized control and distributed flexibility. This requires understanding the role these tools play, at a level of detail appropriate for an informed decision on which tool portfolio is optimal for furthering the enterprise’s interests as a whole. The categories above offer a good, general vantage point to adjust to the particular situation of an organization.
Another indirect conclusion from the principles is that open source and Bring Your Own Tool (BYOT) solutions should be considered. The criteria related to global reports can pull in the direction of licensed, enterprise-ready versions, and some functions might benefit from a comprehensive workbench for their analytics. In general, however, there is money to be saved and particular needs to be met in the open source landscape, and many teams do not need enterprise-ready solutions for their local products.

Notes

[1] In my experience, the ratio is probably 1:20 or more in larger organizations.

[2] O’Regan, “Designing Data Products” (Medium, 2018). See also Patil, Data Jujitsu: The Art of Turning Data into Product (O’Reilly, 2012).

[3] A single individual might both develop, deploy and operate models but that only means that this person acts as both data analyst and IT engineer.

[4] See O’Regan, “Designing Data Products” (Medium, 2018).
