Want ROI from genAI? Rethink what both terms mean

When generative AI popularity and marketing hype went into overdrive last year, just about every enterprise launched a wide range of genAI projects. And for various reasons, very few of them delivered the kind of return on investment that CEOs and board members had expected.

As a result, 2024 has become the year of AI postmortems and recriminations about why projects went sour and who was to blame. What can IT leaders do now to make sure that genAI projects launched later this year and throughout 2025 fare better? Experts suggest a radical rethinking of how ROI should be measured in genAI deployments, and of the kinds of projects where generative AI belongs at all.

“We have an AI ROI paradox in our sector, and we have to overcome it,” said Atefeh “Atti” Riazi, CIO for media enterprise Hearst, which reported $12 billion in revenue last year. “Although we have [years of experience] measuring the ROI for IT on lots of other projects, AI is so disruptive that we don’t really yet understand its impacts. We don’t understand the implications of it long term.”

When boards push down genAI mandates — and LOBs go rogue

After OpenAI captured the attention of the industry when consumer fascination with ChatGPT surged in early 2023, Conor Twomey observed a “wave of euphoria and fear that swept over every boardroom.” AI vendors tried to take advantage of this euphoria by marketing their own version of FUD (fear, uncertainty, and doubt), said Twomey, head of AI strategy at data management firm KX.

“Every organization went down the same path and said, ‘We don’t know what this thing is capable of.’”

That sparked a flood of genAI deployments ordered by boards of directors and, to a lesser extent, CEOs, on a scale not seen since the early days of web euphoria around 1994.

“That was something different with generative AI, where a lot of the motion came top-down,” said Rajiv Shah, who manages AI strategy for Snowflake, a cloud data storage and analytics service provider. “Deep learning, for example, was certainly hyped up, but it didn’t have the same top-down pushing.”

Shah says this top-down approach colored and often complicated the traditional requirements for ROI analysis prior to major rollouts. Little wonder that those rollouts failed to meet expectations.

And mandates from above weren’t the only pressure on IT leaders to push through genAI projects. Many business units brought AI ideas to IT, and when IT pointed out why they were unlikely to succeed, those departments often replied, “Thanks for the input. We are doing it anyway.”

Such projects tend to shift focus away from companies’ true priorities, notes Kelwin Fernandes, CEO of AI consultancy NILG.AI.

“I see genAI being applied in non-core processes that won’t directly affect the core business, such as chatbots or support agents. These projects lack support and long-term engagement from the organization,” Fernandes said. “I see genAI not bringing the promised ROI because people moved their priorities from making better decisions to building conversational interfaces or chatbots.”

Inflated expectations, underestimated costs

Early genAI apps often delivered breathtaking results in small pilots, setting expectations that didn’t carry over to larger deployments. “One of the primary culprits of the cost versus value conundrum is lack of scalability,” said KX’s Twomey.

He points to an increasing number of startup companies using open-source genAI technology that is “sufficient for introductory deployments, meaning they work nicely with a couple hundred unstructured documents. Once enterprises feel comfortable with this technology and begin to scale it up to hundreds of thousands of documents, the open-source system bloats and spikes running costs,” he said.

“Same goes for usage,” he added. “When genAI is inserted into a workflow ideal for a subset of users and then exponentially more users are added, it doesn’t work as hoped.”

Patrick Byrnes, formerly senior consultant for AI at Deloitte and now an AI consultant for DataArt, attributes some of the inflated ROI expectations for generative AI projects to the impressive performance delivered by the earliest genAI applications.

“If you go into Gemini or ChatGPT and ask it something basic, you can get an incredible response right away,” he said. Expecting similar results on a larger scale, “some enterprises did not start small. Right out of the gate, they went with high-impact customer-facing efforts.”

Indeed, many of the ROI shortcomings with genAI deployments are a result of executives not thinking through the rollout implications sufficiently, according to an executive in the AI field who asked that her name and affiliation not be used.

“Automation driven by AI leads to productivity gains, but often the cost to enable it is overlooked,” she said. “Enterprises focus on model development, training, and system infrastructure but don’t accurately account for cost of data prep. They spin up massive data sets for AI, but small errors can make it useless, which also leads employees to mistrust outputs, leading to costs without ROI.”

Another overlooked factor, she noted, is that many AI vendors are currently focused on customer acquisition, keeping costs down in the short term. “Then they will ratchet up prices with an eye toward profitability, which will lead to higher costs for enterprise users in the future.”

Those costs are not likely to improve meaningfully by 2025. IDC has noted that the costs associated with generative AI efforts are extensive.

“Generative AI requires enormous levels of compute power. NVIDIA’s workhorse chip that powers the GPUs for datacenters and the AI industry costs ~$10,000 per chip,” the analyst firm said in a September 2023 report. “Operational costs are in the range of $4 million to $5 million monthly, and businesses expect model training costs to exceed $5 million. Added to this are electricity costs and datacenter management.”
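To see how quickly those line items compound, consider a simple back-of-envelope calculation. Every figure below is an illustrative assumption loosely anchored to the IDC ranges quoted above, not actual vendor pricing:

```python
# Back-of-envelope first-year cost sketch for a self-hosted genAI effort.
# All figures are illustrative assumptions based loosely on the IDC ranges
# quoted above, not actual vendor pricing.

gpu_unit_cost = 10_000     # ~$10,000 per datacenter GPU (IDC estimate)
gpu_count = 500            # hypothetical cluster size
monthly_ops = 4_500_000    # midpoint of IDC's $4M-$5M monthly ops range
training_cost = 5_000_000  # IDC: model training expected to exceed $5M
months = 12

first_year_cost = (gpu_unit_cost * gpu_count   # hardware
                   + training_cost             # one-time model training
                   + monthly_ops * months)     # ongoing operations

print(f"First-year cost estimate: ${first_year_cost:,}")  # $64,000,000
```

Even if the hardware line were halved, ongoing operations would still dominate the total, which is why the monthly bill, not the chips, tends to be what blows up a genAI budget.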

The hallucination challenge

On top of all this is the fact that genAI periodically hallucinates, meaning that the system makes things up. That will deliver a bitter surprise if the company is trusting it to analyze critical data in healthcare, finance, or aerospace — and even if it is simply relying on genAI to accurately summarize what happened during a meeting.

For business managers who are used to trusting the numbers generated by a spreadsheet projecting revenue growth, that can be unsettling. Those executives are used to the projections failing because an employee’s assumptions turned out to be too optimistic, but they are not used to Excel lying about the mathematical result of 800 numbers being multiplied.

And hallucination cuts into ROI because all generative AI output must be closely fact-checked by a human, erasing many of the expected productivity gains.

Hearst’s Riazi sees the genAI hallucination issue as temporary. “Hallucinations do not bother me. Eventually, it will address itself,” she said.

More importantly, she argues that business simply needs to apply the same supervision and oversight to genAI that it has for decades with its human employees, stressing that “people hallucinate as well” and coders have been known to write “buggy code.”

“Human error is already a big issue in medicine and patient care,” Riazi said. “There is a lot of bad data out there, but there is no difference [in managing hallucinations] from what we are already doing today. We see a lot of data cleansing going on.”

NILG.AI’s Fernandes doubts that genAI hallucinations will ever go away, but he says that shouldn’t necessarily be a dealbreaker for every application. It is simply a matter of enterprises adjusting their thinking to deal with an imperfect reality, something they already have experience doing.

“We have quality assurance to reduce production errors, but errors still exist, and that’s why we have return policies and warranties. We use the QA process as a fallback plan of the factory errors and the warranty as a fallback plan of the QA,” he said. “All those actions reduce the probability of failure to a certain point. They can still exist; we have learned to do business with those errors. We need to understand — on each application — what the right fallback action is for an AI error.”

Looking for ROI in all the wrong places

Even when genAI succeeds, its results are sometimes less valuable than anticipated. Generative AI is, for example, an effective tool for producing the kind of content usually handled by lower-level staffers or contractors: tweaking existing material for social media posts or e-commerce product descriptions. That output still needs human verification, but it has the potential to cut the cost of creating low-level content.

But precisely because that content is low-level, some question whether it will ever deliver meaningful financial advantage.

“Even before AI, the market for mediocre written and visual content was already fairly saturated, so it’s no surprise that some enterprises have discovered there is limited ROI in similar mediocre content generated by AI,” said Brian Levine, a managing director at consultancy Ernst & Young.

What ROI should look like for enterprise genAI

KX’s Twomey questioned whether many senior enterprise executives have a realistic handle on what ROI should mean in a generative AI rollout, especially in the first year, when it is mostly an experiment rather than a traditional deployment.

“Enterprise deployment of genAI has slowed down — and will continue to do so — as enterprises experience an increase in costs that exceeds the value they are getting,” Twomey said. “When this happens, it tells me that enterprises aren’t understanding the ROI and they’re not appropriately controlling TCO.”

And therein lies the conundrum: How can executives appropriately control the total cost of ownership and appropriately interpret the return on investment if they have no idea what either should look like in a generative AI reality?

This gets even more difficult when secondary ROI factors are considered, such as market and customer/prospect perceptions, Twomey points out.

“This complexity with [transitioning] — and scaling — AI workflows in production has been prohibitive for many enterprise deployments,” he said. “The repercussions are clear losses in time, money, and effort that can also result in competitive disadvantages, reputational damage, and stalled future innovation initiatives.”

It may even be premature to measure ROI monetarily for genAI. “The value for enterprises today is to practice, to experiment,” said DataArt’s Byrnes. “That is one of the things that people don’t really appreciate. There is a strong learning component to all of this.”

Focusing genAI

But while experimentation is important, it should be done intelligently. EY’s Levine notes that some companies are inclined to trust generative AI too much when it comes to methodology, allowing the software to figure out how to obtain the desired information. 

Consider the example of a large and growing retail chain that turned to genAI to figure out the best locations for its next 50 stores. Given insufficient guidelines, the AI went off the rails and returned completely unusable results, according to inside sources.

Instead of simply telling the AI to make recommendations for the best places to launch stores, Levine suggests that the retailer would be better served by codifying extensive, highly specific lists of how it currently evaluates new locations. That way the software can follow those instructions, and the chances of it making errors are reduced.

Would an enterprise ever tell a new employee, “Figure out where our next 50 stores should be. Bye!”? Unlikely. The business would spend days training that employee on what to look for and where to look, and the employee would be shown lots of examples of how it had been done before. If a manager wouldn’t expect a new employee to figure out how to answer the question without extensive training, why would that manager expect genAI to fare any better?
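Here is a rough sketch of what that codification might look like in practice. The criteria, thresholds, and candidate sites are entirely hypothetical, and call_llm() stands in for whatever model client an enterprise actually uses:

```python
# Sketch: constraining a site-selection request with explicit, codified
# criteria instead of an open-ended "pick our next 50 stores" prompt.
# The criteria, thresholds, and sites are hypothetical placeholders.

criteria = [
    ("Population within 5 miles", "minimum 75,000"),
    ("Median household income", "minimum $55,000"),
    ("Distance to nearest existing store", "minimum 8 miles"),
    ("Nearby anchor tenants", "at least one grocery or big-box anchor"),
]

candidate_sites = ["Site A: Maple Ave corridor", "Site B: Route 9 plaza"]

prompt = (
    "Evaluate each candidate site ONLY against the rules below. "
    "For every rule, cite the data point you used. If data is missing, "
    "mark the site 'insufficient data' rather than guessing.\n\n"
    + "\n".join(f"- {name}: {rule}" for name, rule in criteria)
    + "\n\nCandidate sites:\n"
    + "\n".join(candidate_sites)
)

# response = call_llm(prompt)  # hypothetical model client call
```

The point is not these specific rules but that every rule is the retailer’s own, stated explicitly, leaving the model far less room to improvise.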

Given that ROI simply means value delivered minus cost, the best way to improve value is to increase the accuracy and usability of the answers provided. Sometimes, that means not giving genAI broad requests and seeing what it chooses to do. That might work in machine learning, but genAI is a different animal.
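To put that definition in concrete terms, with purely hypothetical figures (note that the conventional ROI metric divides the difference by cost):

```python
# ROI as framed above: value delivered minus cost, with the conventional
# ratio form alongside it. All dollar figures are hypothetical.

value_delivered = 1_200_000  # e.g., reviewed output priced at labor saved
total_cost = 900_000         # licenses, data prep, human review, infra

net_return = value_delivered - total_cost  # $300,000
roi_ratio = net_return / total_cost        # ~0.33, i.e., 33%

print(f"Net return: ${net_return:,}  ROI: {roi_ratio:.0%}")
```

However the formula is sliced, the levers are the same: raise the value of the answers, or cut the cost of producing and verifying them.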

To be fair, there absolutely are situations where it makes sense to set genAI loose and see where it chooses to go. But for the overwhelming majority of situations, IT will see far better results if it takes the time to train genAI appropriately.

Reining in genAI projects

Now that the initial hype over genAI has died down, it’s important for IT leaders to protect their organizations by focusing on deployments that will bring true value to the company, say AI strategists.

One suggestion for better controlling generative AI efforts is for enterprises to create AI committees consisting of specialists in various AI disciplines, Snowflake’s Shah said. Every generative AI proposal originating anywhere in the enterprise would then have to be run past this committee, which could approve or veto any idea.

“With security and legal, there are so many things that can go wrong with a generative AI effort. This would make executives go in front of the committee and explain exactly what they wanted to do and why,” he said.

Shah sees these AI approval committees as short-term placeholders. “As we mature our understanding, the need for those committees will go away,” he said.

Another suggestion comes from NILG.AI’s Fernandes. Instead of flashy, large-scale genAI projects, enterprises should focus on smaller, more controllable objectives such as “analyzing a vehicle’s damage report and estimating costs, or auditing a sales call and identifying if the person follows the script, or recommending products in e-commerce based on the content/description of those products instead of just the interactions/clicks.”

And instead of implicitly trusting genAI models, “we shouldn’t use LLMs on any critical task without a fallback option. We shouldn’t use them as a source of truth for our decision-making but as an educated guess, just like you would deal with another person’s opinion.”
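What a fallback option might look like in code is sketched below. This is not Fernandes’ implementation; the validation rule, the call_llm() helper, and the human-review path are all hypothetical stand-ins:

```python
# Minimal sketch of an LLM call with a fallback path, per the advice above:
# treat the model's answer as an educated guess, never as a source of truth.
# call_llm(), validate(), and the review queue are hypothetical stand-ins.

def call_llm(question: str) -> str:
    """Stand-in for a real model client; swap in a provider's SDK call."""
    raise NotImplementedError

def send_to_human_review(question: str) -> str:
    """Stand-in escalation path, e.g., a ticket queue or analyst inbox."""
    return f"[escalated to human review] {question}"

def validate(answer: str) -> bool:
    # Application-specific check: a real system might verify citations,
    # output schema, or business rules before trusting the answer.
    return bool(answer.strip())

def answer_with_fallback(question: str) -> str:
    try:
        draft = call_llm(question)
    except Exception:
        return send_to_human_review(question)  # model failure: escalate
    if validate(draft):
        return draft  # still an educated guess, subject to downstream review
    return send_to_human_review(question)      # weak answer: escalate
```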

More by Evan Schuman:

Renegade business units trying out genAI will destroy the enterprise before they help

AI managing AI that is monitoring AI: What could possibly go wrong?

US makes new move to rein in China’s advanced chip manufacturing

GenAI might be the least-trustworthy software that exists. Yet IT is expected to trust it.

Privacy policies have gone insane. Doubt it? Consider Instacart