
Moving past gen AI’s honeymoon phase: Seven hard truths for CIOs to get from pilot to scale
Getting to scale requires CIOs to focus on fewer things but do them better.

This article is a collaborative effort by Aamer Baig, Douglas Merrill, and Megha Sinha, with Danesha Mead and
Stephen Xu, representing views from McKinsey Technology and QuantumBlack, AI by McKinsey.


May 2024
The honeymoon phase of generative AI (gen AI) is over. As most organizations are learning, it is relatively easy to build gee-whiz gen AI pilots, but turning them into at-scale capabilities is another story. The difficulty in making that leap goes a long way to explaining why just 11 percent of companies have adopted gen AI at scale, according to our latest tech trends research.1

This maturing phase is a welcome development because it gives CIOs an opportunity to turn gen AI’s promise into business value. Yet while most CIOs know that pilots don’t reflect real-world scenarios—that’s not really the point of a pilot, after all—they often underestimate the amount of work that needs to be done to get gen AI production ready. Ultimately, getting the full value from gen AI requires companies to rewire how they work, and putting in place a scalable technology foundation is a key part of that process.

We explored many of the key initial technology issues in a previous article.2 In this article, we want to explore seven truths about scaling gen AI for the “Shaper” approach, in which companies develop a competitive advantage by connecting large language models (LLMs) to internal applications and data sources (see sidebar “Three approaches to using gen AI” for more). Here are seven things that Shapers need to know and do:

1. Eliminate the noise, and focus on the signal. Be honest about what pilots have worked. Cut down on experiments. Direct your efforts toward solving important business problems.

2. It’s about how the pieces fit together, not the pieces themselves. Too much time is spent assessing individual components of a gen AI engine. Much more consequential is figuring out how they work together securely.

1
“McKinsey Technology Trends Outlook 2024,” forthcoming on McKinsey.com.
2
“Technology’s generational moment with generative AI: A CIO and CTO guide,” McKinsey, July 11, 2023.

Three approaches to using gen AI

There are three primary approaches to take in using gen AI:

— In “Taker” use cases, companies use off-the-shelf, gen AI–powered software from third-party vendors such as GitHub
Copilot or Salesforce Einstein to achieve the goals of the use case.

— In “Shaper” use cases, companies integrate bespoke gen AI capabilities by engineering prompts, data sets, and connections
to internal systems to achieve the goals of the use case.

— In “Maker” use cases, companies create their own LLMs by building large data sets to pre-train models from scratch.
Examples include OpenAI, Anthropic, Cohere, and Mistral AI.

Most companies will turn to some combination of Taker, to quickly access a commodity service, and Shaper, to build a
proprietary capability on top of foundation models. The highest-value gen AI initiatives, however, generally rely on the Shaper
approach.1

1
For more on the three approaches, see “Technology’s generational moment with generative AI: A CIO and CTO guide,” McKinsey, July 11, 2023.

3. Get a handle on costs before they sink you. Models account for only about 15 percent of the overall cost of gen AI applications. Understand where the costs lurk, and apply the right tools and capabilities to rein them in.

4. Tame the proliferation of tools and tech. The proliferation of infrastructures, LLMs, and tools has made scaled rollouts unfeasible. Narrow down to those capabilities that best serve the business, and take advantage of available cloud services (while preserving your flexibility).

5. Create teams that can build value, not just models. Getting to scale requires a team with a broad cross-section of skills to not only build models but also make sure they generate the value they’re supposed to, safely and securely.

6. Go for the right data, not the perfect data. Targeting which data matters most and investing in its management over time has a big impact on how quickly you can scale.

7. Reuse it or lose it. Reusable code can increase the development speed of generative AI use cases by 30 to 50 percent.

1. Eliminate the noise, and focus on the signal

Although many business leaders acknowledge the need to move past pilots and experiments, that isn’t always reflected in what’s happening on the ground. Even as gen AI adoption increases, examples of its real bottom-line impact are few and far between. Only 15 percent of companies in our latest AI survey say they are seeing use of gen AI have meaningful impact on their companies’ EBIT.3

Exacerbating this issue is that leaders are drawing misleading lessons from their experiments. They try to take what is essentially a chat interface pilot and shift it to an application—the classic “tech looking for a solution” trap. Or a pilot might have been deemed “successful,” but it was not applied to an important part of the business.

There are many reasons for failing to scale, but the overarching one is that resources and executive focus are spread too thinly across dozens of ongoing gen AI initiatives. This is not a new development. We’ve seen a similar pattern when other technologies emerged, from cloud to advanced analytics. The lessons from those innovations, however, have not stuck.

The most important decision a CIO will need to make is to eliminate nonperforming pilots and scale up those that are both technically feasible and promise to address areas of the business that matter while minimizing risk (Exhibit 1). The CIO will need to work closely with business unit leaders on setting priorities and handling the technical implications of their choices.

2. It’s about how the pieces fit together, not the pieces themselves

In many discussions, we hear technology leaders belaboring decisions around the component parts required to deliver gen AI solutions—LLMs, APIs, and so on. What we are learning, however, is that solving for these individual pieces is relatively easy and integrating them is anything but. This creates a massive roadblock to scaling gen AI.

The challenge lies in orchestrating the range of interactions and integrations at scale. Each use case often needs to access multiple models, vector databases, prompt libraries, and applications (Exhibit 2). Companies have to manage a variety of sources (such as applications or databases in the cloud, on-premises, with a vendor, or a combination), the degree of fidelity (including latency and resilience), and existing protocols (for example, access rights). As a new component is added to deliver a solution, it creates a ripple effect on all the other components in the system, adding exponential complexity to the overall solution.

3
That is, they attribute 5 percent or more of their organizations’ EBIT to gen AI use. McKinsey Global Survey on the state of AI in early 2024,
February 22 to March 5, 2024, forthcoming on McKinsey.com.

Exhibit 1

Focus on use cases that are feasible and where business impact is clear.

Criteria for determining business impact and technical feasibility (illustrative)

Business impact
— Value creation: Can we accurately quantify the value? Is it incremental or a step function in performance?
— Strategic alignment: How well does this align with or support the company’s primary strategic objectives?
— Ease of adoption: Are end users enthusiastic about adopting the solution? Is there a demand for more features or capabilities?
— Business readiness: Are we introducing this solution at an appropriate time, considering ongoing transformations or other projects?

Technical feasibility
— Data readiness: Is the data readily available, or do we need to create or synthesize it? Are there any special considerations for handling sensitive data?
— Solution readiness: Does the solution require proven or nascent techniques?
— Ability to scale: Will the proposed business model remain viable as number of users and cloud consumption increase?
— Reusability: Can the components of the solution be repurposed for other use cases?

[Chart: use cases plotted by business impact (low to high) against technical feasibility (low to high); use cases that score high on both axes are quick/high-impact wins, and the remainder are second priority.]

McKinsey & Company
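For teams that want to operationalize this kind of two-axis screen, a minimal Python sketch follows; the 0-to-1 scoring scale, the threshold, and the example use cases are illustrative assumptions, not McKinsey criteria or weightings.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    business_impact: float  # 0-1 composite of value, alignment, adoption, readiness
    feasibility: float      # 0-1 composite of data, solution, scalability, reusability

def quick_wins(pipeline: list[UseCase], threshold: float = 0.6) -> list[UseCase]:
    """Return use cases that score high on both axes, strongest candidates first."""
    wins = [u for u in pipeline
            if u.business_impact >= threshold and u.feasibility >= threshold]
    return sorted(wins, key=lambda u: u.business_impact * u.feasibility, reverse=True)

pipeline = [
    UseCase("Customer-service copilot", business_impact=0.8, feasibility=0.7),
    UseCase("Code documentation helper", business_impact=0.4, feasibility=0.9),
    UseCase("M&A scenario generator", business_impact=0.9, feasibility=0.3),
]
for u in quick_wins(pipeline):
    print(f"Scale up: {u.name}")  # the rest are second priority or cut
```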

The key to effective orchestration is embedding the organization’s domain and workflow expertise into the management of the step-by-step flow and sequencing of the model, data, and system interactions of an application running on a cloud foundation. The core component of an effective orchestration engine is an API gateway, which authenticates users, ensures compliance, logs request-and-response pairs (for example, to help bill teams for their usage), and routes requests to the best models, including those offered by third parties. The gateway also enables cost tracking and provides risk and compliance teams a way to monitor usage in a scalable way. This gateway capability is crucial for scale because it allows teams to operate independently while ensuring that they follow best practices (see sidebar “Main components for gen AI model orchestration”).

The orchestration of the many interactions required to deliver gen AI capabilities, however, is impossible without effective end-to-end automation. “End-to-end” is the key phrase here. Companies will often automate elements of the workflow, but the value comes only by automating the entire solution, from data wrangling (cleaning and integration) and data pipeline construction to model monitoring and risk review through “policy as code.” Our latest research has shown that gen AI high performers are more than three times as likely as their peers to have testing and validation embedded in the release process for each model.4 A modern MLOps platform is critical in helping to manage this automated flow and, according to McKinsey analysis, can accelerate production by ten times as well as enable more efficient use of cloud resources.
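As a rough illustration of the gateway pattern described above, the hedged sketch below authenticates a caller, routes the request to a model by task type, and logs the request-response metadata needed for chargeback and compliance review. All names here (route_request, MODEL_ROUTES, call_model) and the routing policy are illustrative assumptions, not a reference to any specific product.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-gateway")

# Illustrative routing table and key store; a real gateway would use an
# identity provider and a model registry instead of in-memory dicts.
MODEL_ROUTES = {"chat": "vendor-large-model", "doc_summary": "small-open-source-model"}
VALID_KEYS = {"team-alpha-key": "team-alpha"}

def call_model(model: str, prompt: str) -> str:
    """Placeholder for the actual third-party or self-hosted model call."""
    return f"[{model}] response to: {prompt[:40]}"

def route_request(api_key: str, task: str, prompt: str) -> str:
    team = VALID_KEYS.get(api_key)
    if team is None:
        raise PermissionError("unknown API key")  # authentication check
    model = MODEL_ROUTES.get(task, MODEL_ROUTES["chat"])  # pick a model per routing policy
    start = time.perf_counter()
    response = call_model(model, prompt)
    # Logging request-response metadata supports usage-based chargeback
    # and gives risk and compliance teams a scalable monitoring point.
    log.info("team=%s task=%s model=%s latency_ms=%.1f",
             team, task, model, (time.perf_counter() - start) * 1000)
    return response

print(route_request("team-alpha-key", "doc_summary", "Summarize the Q2 incident report."))
```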

4
We define gen AI high performers as those who attribute more than 10 percent of their organizations’ EBIT to their use of gen AI.
McKinsey Global Survey on the state of AI in early 2024, February 22 to March 5, 2024, forthcoming on McKinsey.com.

Exhibit 2

A gen AI solution needs to accommodate a complex set of integrations across the entire tech stack.

[Diagram: an illustrative tech stack with end-to-end automation, spanning data, gen AI capabilities, cloud, and models. It covers a front-end application with its user interface; source data feeding data enrichment and processing (unstructured- and structured-data ETL¹); an orchestration layer with query validation and intent routing, guardrails, security and access control, data retrieval and semantic/hybrid search, prompt engineering, a prompt library, prompt enrichment, conversation memory management, LLM flow, LLM agents, image search, structured-data query search, fallback, external runtime integration, and observability; databases (eg, vector stores); infrastructure and cloud services with an API gateway; and foundation models (eg, LLMs, multimodal models, embedding generation models).]

1
Extract, transform, load.

McKinsey & Company

Gen AI models can produce inconsistent results, due to their probabilistic nature or the frequent changes to underlying models. Model versions can be updated as often as every week, which means companies can’t afford to set up their orchestration capability and let it run in the background. They need to develop hyperattentive observing and triaging capabilities to implement gen AI with speed and safety. Observability tools monitor the gen AI application’s interactions with users in real time, tracking metrics such as response time, accuracy, and user satisfaction scores. If an application begins to generate inaccurate or inappropriate responses, the tool alerts the development team to investigate and make any necessary adjustments to the model parameters, prompt templates, or orchestration flow.
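To make this concrete, here is a minimal sketch of the kind of rolling-window check an observability tool might run. The class name, window size, thresholds, and alert behavior are illustrative assumptions, not a reference to any specific tool.

```python
from collections import deque

class GenAIMonitor:
    """Toy rolling-window monitor; real tools track many more signals."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.85):
        self.scores = deque(maxlen=window)  # most recent accuracy scores
        self.min_accuracy = min_accuracy

    def record(self, latency_ms: float, accuracy: float, satisfaction: float) -> None:
        # Only accuracy drives the alert in this sketch; latency and
        # satisfaction scores would feed dashboards in a real observability tool.
        self.scores.append(accuracy)
        if self.rolling_accuracy() < self.min_accuracy:
            self.alert()

    def rolling_accuracy(self) -> float:
        return sum(self.scores) / len(self.scores)

    def alert(self) -> None:
        # In production this would page the development team to adjust model
        # parameters, prompt templates, or the orchestration flow.
        print(f"ALERT: rolling accuracy {self.rolling_accuracy():.2f} below threshold")

monitor = GenAIMonitor(window=5, min_accuracy=0.8)
for acc in (0.9, 0.85, 0.7, 0.6, 0.65):
    monitor.record(latency_ms=420.0, accuracy=acc, satisfaction=4.1)
```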

5
“Generative AI in the pharmaceutical industry: Moving from hype to reality,” McKinsey, January 9, 2024.

Main components for gen AI model orchestration

Orchestration is the process of coordinating various data, transformation, and AI components to manage complex AI workflows.
The API (or LLM) gateway layer serves as a secure and efficient interface between users or applications and underlying gen AI
models. The orchestration engine itself is made up of the following components:

— Prompt engineering and prompt library: Prompt engineering is the process of crafting input prompts or queries that guide
the behavior and output of AI models. A prompt library is a collection of predefined prompts that users can leverage as best
practices/shortcuts when they invoke a gen AI model.

— Context management and caching: Context management highlights background information relevant to a specific task or
interaction. Caching relates to storing previously computed results or intermediate data to accelerate future computations.

— Information retrieval (semantic search and hybrid search): Information-retrieval logic allows gen AI models to search for
and retrieve relevant information from a collection of documents or data sources.

— Evaluation and guardrails: Evaluation and guardrail tools help assess the performance, reliability, and ethical
considerations of AI models. They also provide input to governance and LLMOps. This encompasses tools and processes for
evaluating model accuracy, robustness, fairness, and safety.
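To make the first of these components concrete, here is a minimal sketch of a prompt library: vetted, parameterized templates that teams fill in rather than writing prompts from scratch. The template names and wording are illustrative assumptions.

```python
# Vetted, parameterized prompt templates; names and contents are illustrative.
PROMPT_LIBRARY = {
    "summarize": "Summarize the following text in {n} bullet points:\n{text}",
    "classify_sentiment": "Classify the sentiment (positive/negative/neutral):\n{text}",
}

def render_prompt(name: str, **kwargs) -> str:
    """Fetch a predefined prompt and fill in its parameters."""
    return PROMPT_LIBRARY[name].format(**kwargs)

print(render_prompt("summarize", n=3, text="Gen AI pilots rarely scale without focus..."))
```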

3. Get a handle on costs before they sink you

The sheer scale of gen AI data usage and model interactions means costs can quickly spiral out of control. Managing these costs will have a huge impact on whether CIOs can manage gen AI programs at scale. But understanding what drives costs is crucial to gen AI programs. The models themselves, for example, account for only about 15 percent of a typical project effort.5 LLM costs have dropped significantly over time and continue to decline.

CIOs should focus their energies on four realities:

— Change management is the biggest cost. Our experience has shown that a good rule of thumb for managing gen AI costs is that for every $1 spent on developing a model, you need to spend about $3 for change management. (By way of comparison, for digital solutions, the ratio has tended to be closer to $1 for development to $1 for change management.6) Discipline in managing the range of change actions, from training your people to role modeling to active performance tracking, is crucial for gen AI. Our analysis has shown that high performers are nearly three times more likely than others to have a strong performance-management infrastructure, such as key performance indicators (KPIs), to measure and track the value of gen AI. They are also twice as likely to have trained nontechnical people well enough to understand the potential value and risks associated with using gen AI at work.7

Companies have been particularly successful in handling the costs of change management by focusing on two areas: first, involving end users in solution development from day one (too often, companies default to simply creating a chat interface for a gen AI application), and second, involving their best employees in training models to ensure the models learn correctly and quickly.

— Run costs are greater than build costs for gen AI applications. Our analysis shows that it’s much more expensive to run models than to build them. Foundation model usage and labor are the biggest drivers of that cost. Most of the labor costs are for model and data pipeline maintenance. In Europe, we are finding that significant costs are also incurred by risk and compliance management.

6
Eric Lamarre, Kate Smaje, and Rodney Zemmel, “Rewired to outcompete,” McKinsey, June 20, 2023.
7
McKinsey Global Survey on the state of AI in early 2024, February 22 to March 5, 2024, forthcoming on McKinsey.com.

— Driving down model costs is an ongoing process. Decisions related to how to engineer the architecture for gen AI, for example, can lead to cost variances of 10 to 20 times, and sometimes more than that. An array of cost-reduction tools and capabilities is available, such as preloading embeddings. This is not a one-off exercise. The process of cost optimization takes time and requires multiple tools, but done well, it can reduce costs from a dollar a query to less than a penny (Exhibit 3).

— Investments should be tied to ROI. Not all gen AI interactions need to be treated the same, and they therefore shouldn’t all cost the same. A gen AI tool that responds to live questions from customers, for example, is critical to customer experience and requires low latency, which is more expensive. But code documentation tools don’t have to be so responsive, so they can be run more cheaply. Cloud plays a crucial role in driving ROI because its prime source of value lies in supporting business growth, especially supporting scaled analytics solutions. The goal here is to develop a modeling discipline that instills an ROI focus on every gen AI use case without getting lost in endless rounds of analysis.

4. Tame the proliferation of tools and tech

Many teams are still pushing their own use cases and have often set up their own environments, resulting in companies having to support multiple infrastructures, LLMs, tools, and approaches to scaling. In a recent McKinsey survey, in fact, respondents cited “too many platforms” as the top technology obstacle to implementing gen AI at scale.8 The more infrastructures and tools, the higher the complexity and cost of operations, which in turn makes scaled rollouts unfeasible.

8
McKinsey survey on generative AI in operations, November 2023.

Exhibit 3

As solutions scale, organizations can optimize costs.

[Chart: cost per query by week,¹ in dollars, falling from about $1.00 in week 1 toward near zero by week 7 as successive optimizations take effect.]

— Week 1: Initial proof of concept
— Week 2: Add RAG,² maxing out prompt length
— Week 3: Add intent recognition and routing, reducing search space and adding LLM calls
— Week 4: Re-ranking and prompt optimization
— Week 5: Migrate embedding generation from paid GPT to an open-source model
— Week 6: Migrate risk guardrails and intent recognition to open-source models and regular expressions
— Week 7: Vendor price reduction, and semantic cache
— Backlog: Batching, reevaluate need for chatbot

1
Illustrative example pulling from multiple case studies.
2
Retrieval-augmented generation.

McKinsey & Company
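As one example of the optimizations listed above, the sketch below shows the idea behind a semantic cache: serve a stored answer when a new query lands close enough to an earlier one in embedding space, avoiding a paid model call. The toy embed function and the threshold are illustrative assumptions; a real system would use an embedding model and a vector store.

```python
import math

def embed(text: str) -> list[float]:
    """Toy character-count embedding; a real system would call an embedding model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # vectors are already normalized

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.entries: list[tuple[list[float], str]] = []
        self.threshold = threshold

    def lookup(self, query: str) -> str | None:
        q = embed(query)
        for vec, answer in self.entries:
            if cosine(q, vec) >= self.threshold:
                return answer  # cache hit: skip the paid model call
        return None

    def store(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.store("How do I reset my password?", "Use the self-service portal.")
print(cache.lookup("How can I reset my password?"))  # close enough: likely a hit
```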

This state of affairs is similar to the early days of cloud and software as a service (SaaS), when accessing the tech was so easy—often requiring no more than a credit card—that a “wild west” of proliferating tools created confusion and risk.

To get to scale, companies need a manageable set of tools and infrastructures. Fair enough—but how do you know which providers, hosts, tools, and models to choose? The key is to not waste time on endless rounds of analysis on decisions that don’t matter much (for example, the choice of LLMs is less critical as they increasingly become a commodity) or where there isn’t much of a choice in the first place. For example, if you have a primary cloud service provider (CSP) that has most of your data and your talent knows how to work with the CSP, you should probably choose that CSP’s gen AI offering. Major CSPs, in fact, are rolling out new gen AI services that can help companies improve the economics of some use cases and open access to new ones. How well companies take advantage of these services depends on many variables, including their own cloud maturity and the strength of their cloud foundations.

What does require detailed thinking is how to build your infrastructure and applications in a way that gives you the flexibility to switch providers or models relatively easily. Consider adopting standards widely used by providers (such as KFServing, a serverless solution for deploying gen AI models), Terraform for infrastructure as code, and open-source LLMs.

It’s worth emphasizing that overengineering for flexibility eventually leads to diminishing returns. A plethora of solutions becomes expensive to maintain, making it difficult to take full advantage of the services providers offer.

5. Create teams that can build value, not just models

One of the biggest issues companies are facing is that they’re still treating gen AI as a technology program rather than as a broad business priority. Past technology efforts demonstrate, however, that creating value is never a matter of “just tech.” For gen AI to have real impact, companies have to build teams that can take it beyond the IT function and embed it into the business. Past lessons are applicable here, too. Agile practices sped up technical development, for example. But greater impact came only when other parts of the organization—such as risk and business experts—were integrated into the teams along with product management and leadership.

There are multiple archetypes for ensuring this broader organizational integration. Some companies have built a center of excellence to act as a clearinghouse to prioritize use cases, allocate resources, and monitor performance. Other companies split strategic and tactical duties among teams. Which archetype makes sense for any given business will depend on its available talent and local realities. But what’s crucial is that this centralized function enables close collaboration between technology, business, and risk leads, and is disciplined in following proven protocols for driving successful programs. Those might include, for example, quarterly business reviews to track initiatives against specific objectives and key results (OKRs), and interventions to resolve issues, reallocate resources, or shut down poor-performing initiatives.

A critical role for this governing structure is to ensure that effective risk protocols are implemented and followed. Build teams, for example, need to map the potential risks associated with each use case; technical and “human-in-the-loop” protocols need to be implemented throughout the use-case life cycle. This oversight body also needs a mandate to manage gen AI risk by assessing exposures and implementing mitigating strategies.

One issue to guard against is simply managing the flow of tactical use cases, especially where the volume is large. This central organization needs a mandate to cluster related use cases to ensure large-scale impact and drive large ideas. This team needs to act as the guardian of value, not just a manager of work.

One financial services company put in place clearly defined governance protocols for senior management. A steering group, sponsored by the CIO and chief strategy officer, focused on enterprise governance, strategy, and communication, driving use-case identification and approvals. An enablement group, sponsored by the CTO, focused on decisions around data architecture, data science, data engineering, and building core enabling capabilities.

The CTO also mandated that at least one experienced architect join a use-case team early in the process to ensure the team used the established standards and tool sets. This oversight and governance clarity was crucial in helping the business go from managing just five to more than 50 use cases in its pipeline.

6. Go for the right data, not the perfect data

Misconceptions that gen AI can simply sweep up the necessary data and make sense of it are still widely held. But high-performing gen AI solutions are simply not possible without clean and accurate data, which requires real work and focus. The companies that invest in the data foundations to generate good data aim their efforts carefully.

Take the process of labeling, which often oscillates between seeking perfection for all data and complete neglect. We have found that investing in targeted labeling—particularly for the data used for retrieval-augmented generation (RAG)—can have a significant impact on the quality of answers to gen AI queries. Similarly, it’s critical to invest the time to grade the importance of content sources (“authority weighting”), which helps the model understand the relative value of different sources. Getting this right requires significant human oversight from people with relevant expertise.

Because gen AI models are so unstable, companies need to maintain their platforms as new data is added, which happens often and can affect how models perform. This is made vastly more difficult at most companies because related data lives in so many different places. Companies that have invested in creating data products are ahead of the game because they have a well-organized data source to use in training models over time.

At a materials science product company, for example, various teams accessed product information, but each one had a different version. R&D had materials safety sheets, application engineering teams (tech sales/support teams) developed their own version to find solutions for unique client calls, commercialization teams had product descriptions, and customer support teams had a set of specific product details to answer queries. As each team updated its version of the product information, conflicts emerged, making it difficult for gen AI models to use the data. To address this issue, the company is putting all relevant product information in one place.

7. Reuse it or lose it

Reusable code can increase the development speed of generative AI use cases by 30 to 50 percent.9 But in their haste to make meaningful breakthroughs, teams often focus on individual use cases, which sinks any hope for scale. CIOs need to shift the business’s energies to building transversal solutions that can serve many use cases. In fact, we have found that gen AI high performers are almost three times as likely as their peers to have gen AI foundations built strategically to enable reuse across solutions.10

In committing to reusability, however, it is easy to get caught in building abstract gen AI capabilities that don’t get used, even though, technically, reusing them would be easy. A more effective way to build up reusable assets is to do a disciplined review of a set of use cases, typically three to five, to ascertain their common needs or functions. Teams can then build these common elements as assets or modules that can be easily reused or strung together to create a new capability. Data preprocessing and ingestion, for example, could include a data-chunking mechanism, a structured data-and-metadata loader, and a data transformer as distinct modules. One European bank reviewed which of its capabilities could be used in a wide array of cases and invested in developing a synthesizer module, a translator module, and a sentiment analysis module.

CIOs can’t expect this to happen organically. They need to assign a role, such as the platform owner, and a cross-functional team with a mandate to develop reusable assets for product teams (Exhibit 4), which can include approved tools, code, and frameworks.
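As an illustration of what one such module might look like, here is a hedged sketch of a data-chunking mechanism; the chunk size and overlap defaults are assumptions, and a production module would also handle tokenization and document boundaries.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks sized for embedding and retrieval."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    step = chunk_size - overlap  # overlap preserves context across chunk boundaries
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# The same module can feed a vector store for customer support in one use
# case and R&D document search in another, rather than being rebuilt twice.
sample = "Materials safety data sheet, section 1: identification... " * 40
print(len(chunk_text(sample)), "chunks")
```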

9
Eric Lamarre, Alex Singla, Alexander Sukharevsky, and Rodney Zemmel, “A generative AI reset: Rewiring to turn potential into value in 2024,”
McKinsey, March 4, 2024.
10
McKinsey Global Survey on the state of AI in early 2024, February 22 to March 5, 2024, forthcoming on McKinsey.com.

Exhibit 4

A gen AI platform team needs an array of skills.

Cross-functional platform team roles and skills:

— DataOps: Manages and optimizes the data pipeline, ensuring the availability and quality of data; supports training and deployment of gen AI models
— Site reliability engineer: Ensures reliability, availability, and performance of software systems and applications
— DevOps engineer: Establishes the CI/CD¹ pipeline and other automation needed for teams to rapidly develop and deploy code (eg, chatbot, APIs) to production
— Cloud architect: Ensures scalability, security, and cost optimization of the cloud infrastructure; designs data storage and management systems; facilitates integration and deployment of the AI models
— Solution/data architect: Develops creative and efficient solutions using engineering practices and software/web development technologies
— Platform owner: Acts like a product owner; oversees the build of a gen AI platform
— Full-stack developer: Writes clean, scalable code (eg, front-end/back-end APIs) that can be easily deployed with CI/CD¹ pipelines
— Data scientist: Fine-tunes foundation models to support a RAG²-based approach; ensures alignment of LLM outputs with responsible AI guidelines
— Data engineer: Architects data models to ingest data into vector databases, creates and maintains automated pipelines, performs closed-loop testing to validate responses and improve performance

1
Continuous integration (CI) and continuous delivery (CD).
2
Retrieval-augmented generation.

McKinsey & Company

The value gen AI could generate is transformational. But capturing the full extent of that value will come only when companies harness gen AI at scale. That requires CIOs to not just acknowledge hard truths but be ready to act on them to lead their business forward.

Aamer Baig is a senior partner in McKinsey’s Chicago office, Douglas Merrill is a partner in the Southern California office,
Megha Sinha is a partner in the Bay Area office, Danesha Mead is a consultant in the Denver office, and Stephen Xu is
director of product management in the Toronto office.

The authors wish to thank Mani Gopalakrishnan, Mark Gu, Ankur Jain, Rahil Jogani, and Asin Tavakoli for their contributions
to this article.

Copyright © 2024 McKinsey & Company. All rights reserved.

