How to build an AI opportunity backlog

Every executive team I meet has a list of AI ideas. Almost none has a backlog.

The difference is not length. A list is what comes out of a brainstorm. A backlog is scored, ranked, and mostly dead: three to five things you are actually doing, plus a visible record of everything you decided not to do. The second part matters as much as the first.

This is the method I use in my assessments. I’m giving it away because you can run it yourself, and because most of the value is in the discipline, not the template.

Why the ideas list fails

The pattern is well documented by now. More than 80 percent of companies report no tangible enterprise-level EBIT impact from gen AI, per McKinsey’s Global Survey. BCG’s survey of 1,000 executives found 74 percent have yet to show tangible value from AI, and the companies that do generate value pursue, on average, about half as many opportunities as their less advanced peers.

Read that second finding again. The winners are not running more pilots. They are running fewer, chosen better.

An ideas list produces the opposite behavior. Twenty ideas, no scores, no owner, and whoever argues loudest in the meeting gets a pilot. Six months later you have four demos, zero production systems, and a leadership team that quietly believes AI doesn’t work here.

Inventory workflows by following the hours

Step one is not a brainstorm. Never ask a room “where could we use AI?” You’ll get answers shaped by LinkedIn posts, not by your business.

Instead, walk the departments and ask where expert hours actually go. What does this team do every week that takes hours and makes them groan? What documents do people wait on? What gets done at 9pm before a deadline? Where does someone senior spend a day doing something a checklist could describe?

For each candidate, I capture the same facts: what goes in (usually documents), what judgment gets applied, what comes out, how often it happens, and who owns it. That last one is not optional. A workflow without a single accountable owner is not a candidate, it’s a committee.
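
Those facts can be kept consistent with a small record type. What follows is a minimal Python sketch, not a prescribed template: the class name, the field names, the example values, and the rule that a blank or shared owner field disqualifies a workflow are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowCandidate:
    """One row of the raw inventory (hypothetical field names)."""
    name: str
    inputs: str             # what goes in (usually documents)
    judgment: str           # what judgment gets applied
    outputs: str            # what comes out
    runs_per_month: float   # how often it happens
    owner: Optional[str]    # the single accountable owner, if any

    def has_single_owner(self) -> bool:
        # Assumption: a blank owner, or several names joined by "," or "/",
        # means nobody is individually accountable.
        if not self.owner or not self.owner.strip():
            return False
        return "," not in self.owner and "/" not in self.owner


# Made-up entries for illustration only.
rfp = WorkflowCandidate(
    name="RFP response",
    inputs="past RFPs and proposals",
    judgment="select and adapt relevant material",
    outputs="draft response",
    runs_per_month=2,
    owner="Head of proposals",
)
committee = WorkflowCandidate(
    "Contract review", "contracts", "flag risky clauses", "redlines", 10, "Legal / Ops"
)

# Only workflows with one accountable owner stay in the inventory.
inventory = [c for c in (rfp, committee) if c.has_single_owner()]
```

A spreadsheet with the same six columns does the same job. What matters is that every candidate is described in the same terms before anyone scores it.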

A day of these conversations at a 100-person firm typically surfaces 15 to 25 candidates. That’s your raw inventory. It is worth more than any AI strategy deck you could buy, and it required zero technology.

Score on four axes

Now score every candidate. I use four axes, 1 to 5 each. Resist the urge to add more; a scoring model nobody maintains is just a slower opinion.

Expected hours saved or revenue effect. Frequency times hours times loaded cost. Use real numbers where you have them and honest ranges where you don’t. “High impact” is not a number.
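
The arithmetic is simple enough to write down. A minimal Python sketch follows. The function names are made up for this example, and the inputs (24 runs a year, 30 to 50 hours saved per run, a $100 loaded hourly cost) are hypothetical numbers, not figures from any engagement.

```python
def annual_value(runs_per_year: float, hours_saved_per_run: float,
                 loaded_cost_per_hour: float) -> float:
    """Frequency times hours times loaded cost."""
    return runs_per_year * hours_saved_per_run * loaded_cost_per_hour


def annual_value_range(runs_per_year, hours_saved_low, hours_saved_high,
                       loaded_cost_per_hour):
    """Return (low, high) when hours saved is only known as a range."""
    return (
        annual_value(runs_per_year, hours_saved_low, loaded_cost_per_hour),
        annual_value(runs_per_year, hours_saved_high, loaded_cost_per_hour),
    )


# Hypothetical inputs: 24 runs a year, 30 to 50 hours saved per run, $100/hour.
low, high = annual_value_range(24, 30, 50, 100)
print(f"${low:,.0f} to ${high:,.0f} per year")  # $72,000 to $120,000 per year
```

Carrying the range through, rather than collapsing it to a midpoint, keeps the uncertainty visible when the number is later turned into a 1 to 5 score.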

Feasibility given the data that actually exists. Not the data the org chart says exists. If the answer lives in the heads of two senior people, or in a CRM where nobody fills in the notes field, feasibility is a 2 no matter how good the demo looked.

Data sensitivity. What leaves the building if you build this, and under what controls? Client financials and protected health information (PHI) don’t disqualify a workflow, but they change the architecture and the timeline, so they change the score.

Adoption risk. Will the people doing the work actually use it? A tool that threatens someone’s sense of craft, or adds a review step to their day, will be politely ignored. The best candidates are workflows the team already hates.
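
The four axes can be held in one small structure. The Python sketch below rests on two assumptions the method itself does not dictate. First, every axis is oriented so that 5 is the most favorable score, so low sensitivity and low adoption risk both score high. Second, the axes are combined as an unweighted sum. The class and field names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AxisScores:
    value: int        # expected hours saved or revenue effect
    feasibility: int  # given the data that actually exists
    sensitivity: int  # assumption: 5 = least sensitive data exposure
    adoption: int     # assumption: 5 = lowest adoption risk

    def __post_init__(self):
        # Reject anything outside the 1 to 5 scale.
        for f in fields(self):
            score = getattr(self, f.name)
            if not 1 <= score <= 5:
                raise ValueError(f"{f.name} must be 1 to 5, got {score}")

    def total(self) -> int:
        # Assumption: unweighted sum, so totals run from 4 to 20.
        return sum(getattr(self, f.name) for f in fields(self))


# Made-up scores: big payoff, but the data lives in two people's heads.
example = AxisScores(value=5, feasibility=2, sensitivity=4, adoption=3)
print(example.total())  # 14
```

If one axis should count for more, weights are the obvious extension, but every added knob is one more thing to maintain.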

Here’s a real one, delivered with Last Rev, the platform engineering firm I co-founded. A construction services company was spending 40 to 60 hours per RFP response. Scored: hours enormous (frequent, expensive people, direct revenue linkage), feasibility high (years of past RFPs and proposals sitting in files), sensitivity manageable, adoption low-risk because nobody enjoyed assembling boilerplate at midnight. It scored at the top, we built it, and responses now take 8 to 12 hours. The scoring predicted the outcome. That’s the point of scoring.
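
The before and after figures in that example bound the gain. A short Python sketch of the arithmetic, using only the two ranges quoted above. How the low and high ends pair up in practice is not stated, so this computes the widest bounds.

```python
before_hours = (40, 60)  # hours per RFP response before
after_hours = (8, 12)    # hours per RFP response after

# Bounds on hours saved per response.
min_saved = before_hours[0] - after_hours[1]  # 40 - 12
max_saved = before_hours[1] - after_hours[0]  # 60 - 8

# Bounds on the fractional reduction in time.
min_reduction = 1 - after_hours[1] / before_hours[0]  # 1 - 12/40
max_reduction = 1 - after_hours[0] / before_hours[1]  # 1 - 8/60

print(f"{min_saved} to {max_saved} hours saved per response")
print(f"{min_reduction:.0%} to {max_reduction:.0%} less time")
```

Even at the conservative end, that is 28 hours back per response, a 70 percent reduction.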

The same McKinsey survey found that, of the 25 attributes it tested, redesigning workflows has the biggest effect on whether gen AI shows up in EBIT, and only 21 percent of companies using gen AI have fundamentally redesigned any workflow at all. The backlog is how you pick which workflows deserve that redesign.

Take the top three to five. Kill the rest visibly.

Rank by score. Take the top three to five. Then, and this is the part most teams skip, publish the kill list. Every idea below the line gets a one-sentence reason: data doesn’t exist, sensitivity too high for current controls, no owner. Killing ideas in public is what stops them from reincarnating in every quarterly meeting.
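
The cut and the kill list can be produced in one pass. A minimal Python sketch follows. The function name, the dictionary keys, and the candidates and scores are all made up for illustration, and refusing to cut an idea that has no stated reason is this sketch's own design choice.

```python
def split_backlog(candidates, keep=5):
    """Rank by total score; return (active names, kill list).

    `candidates` is a list of dicts with hypothetical keys:
    "name", "score", and "kill_reason" (one sentence, used if cut).
    """
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    active, cut = ranked[:keep], ranked[keep:]
    for c in cut:
        # Every idea below the line must carry its one-sentence reason.
        if not c.get("kill_reason"):
            raise ValueError(f"{c['name']} is below the line with no stated reason")
    kill_list = [f"{c['name']}: {c['kill_reason']}" for c in cut]
    return [c["name"] for c in active], kill_list


# Made-up candidates and scores, for illustration only.
candidates = [
    {"name": "Customer chatbot", "score": 11, "kill_reason": "no owner"},
    {"name": "RFP response", "score": 19, "kill_reason": ""},
    {"name": "Technical document search", "score": 17, "kill_reason": ""},
    {"name": "Sales forecasting", "score": 9, "kill_reason": "data doesn't exist"},
]
active, kill_list = split_backlog(candidates, keep=2)
for line in kill_list:
    print(line)
```

Raising an error on a missing reason is deliberate. An idea cut without an explanation is the one most likely to come back next quarter.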

Two heuristics I trust after three years of shipping production AI at AnswerAI, the AI product company I run:

The best first project is a painful, frequent, document-heavy workflow with a clear owner and a measurable before and after. Not the flashiest one. The research backs the instinct: the measured wins are on specific, well-scoped tasks. A controlled study of professional writing tasks found ChatGPT users finished about 40 percent faster with 18 percent higher-rated quality. GitHub’s controlled Copilot experiment found developers completed a scoped coding task 55.8 percent faster. Nobody has measured a 55 percent gain on “transform the company.” Pick at the task level, where gains are real and provable.

The most common failure I see is the CEO’s pet idea beating the workflow the ops team bleeds on. The pet idea is usually customer-facing, flashy, and low-frequency, and scored honestly it lands mid-pack. The unglamorous internal workflow (the RFP grind, the technical document search that takes 6 to 8 hours and should take 15 minutes) is where the first win lives. First wins buy you permission for everything after, so protect the scoring from seniority.

Re-score monthly

A backlog scored once is a snapshot that starts rotting immediately. Feasibility is the axis that moves: models get better, your document layer gets cleaner, a connector ships, a vendor adds the control whose absence made sensitivity a blocker. A workflow that scored a 2 on feasibility in January can be a 4 by June, and you’ll miss it if the spreadsheet is frozen.

Monthly re-scoring takes an hour once the inventory exists. It also does quiet governance work: pilots that stopped earning their rank get killed on schedule instead of shambling on because someone’s attached to them.
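
The monthly pass reduces to comparing this month's ranking with what is currently in flight. A minimal Python sketch follows. The function name, the workflow names, and the scores are hypothetical, and it assumes each workflow already has a single total score for the month.

```python
def monthly_review(active, scores, keep=5):
    """Compare this month's scores with the current active list.

    `active` is a list of workflow names in flight; `scores` maps every
    workflow name to this month's total score. Returns (promote, kill).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:keep]
    promote = [name for name in top if name not in active]  # newly earned a slot
    kill = [name for name in active if name not in top]     # stopped earning its rank
    return promote, kill


# Made-up scores: "Contract review" rose after its feasibility improved.
active = ["RFP response", "Meeting summaries"]
scores = {"RFP response": 19, "Meeting summaries": 10, "Contract review": 16}
promote, kill = monthly_review(active, scores, keep=2)
print("promote:", promote)  # promote: ['Contract review']
print("kill:", kill)        # kill: ['Meeting summaries']
```

The output is a proposal for the monthly hour, not a decision. Someone still has to own the call.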

That’s the whole method. A day of walking departments, a four-axis score, a short list, a public kill list, and a monthly hour to keep it honest. No platform purchase required. The companies stuck at 20 ideas and zero results aren’t missing technology. They’re missing the ranking, and someone willing to own it.
