Activity Died in 2024
Every Monday morning I open the same dashboard. Twenty reps, sorted by activity. Emails sent, calls logged, sequences started. The columns are green. Everybody's hitting their numbers. Most weeks, more than hitting them. The activity totals have actually gone up over the past year, pretty significantly, and if you looked at this dashboard without any other context you'd say the team is performing well.
Pipeline is down 30% from two quarters ago.
I've been staring at this gap for months. The dashboard says the team is busier than ever. The pipeline says something is wrong. Both of those things are true, and they don't contradict each other, which is the part that took me a while to understand. The team is busier than ever. The busyness just stopped correlating with the outcome sometime around the middle of 2024, and nobody noticed because the metric that was supposed to catch it is the same metric that's broken.
I manage sales ops for about 20 reps. I've been doing this full-time for a while now, and I also run a consultancy where I fix exactly these kinds of problems for other companies, which means I get to experience this particular frustration from two directions simultaneously. I see it in my own dashboard every morning and then I see it again in client systems every evening. The same gap. Healthy activity numbers. Declining pipeline. A manager somewhere asking "what changed?" and the answer is nothing changed. The numbers just stopped meaning what they used to mean.
01 What forty calls used to measure
Here's what activity counts used to measure, roughly. If a rep logged 40 calls in a day, that meant they spent somewhere between four and six hours on the phone. Dialing, waiting, talking, getting rejected, taking notes, dialing again. The number 40 wasn't just a count. It was a proxy for effort. You couldn't log 40 calls without actually working for most of the day, because the friction of making each call was high enough that the number naturally correlated with time spent doing the work.
Same thing with emails. If a rep sent 50 emails, they wrote 50 emails. Maybe some were templated, but they still had to open each one, customize the first line, check the recipient, hit send. The act of sending was slow enough that the count was meaningful. A rep with 50 sent emails had probably spent a couple of hours writing and sending, and you could reasonably infer from that number that they'd been doing outbound work for a decent portion of the day.
That correlation broke, and the thing that's strange about it is how fast it broke. We went from "activity count roughly equals effort" to "activity count is essentially meaningless as a measure of human engagement" in about 18 months. I watched it happen in my own team.
02 Two hundred contacts enrolled before lunch
A rep enrolled 200 contacts into an AI-personalized sequence last month. Took about three minutes. Each contact got a custom first line generated from their LinkedIn profile and company description. The emails looked good. They looked like someone had researched each prospect individually. The activity log showed 200 emails sent, and on the dashboard this rep was the top performer for the week. Reply rate was under half a percent. Zero meetings booked. But on Monday morning, in the column that says "emails sent," there's a big green number that looks great.
Meanwhile, a different rep spent that same day researching about 20 accounts. Read their websites. Looked at their recent hires. Figured out which ones were actually in a buying window. Wrote custom outreach to maybe 15 of them. Her dashboard for the day showed 15 emails sent. On the leaderboard she looked like she barely worked. She booked three meetings that week.
I want to be clear about something because this can sound like a criticism of the first rep and it isn't. The first rep used the tools exactly as they're designed to be used. Every sequence tool on the market is built to make bulk enrollment easy. Every AI writing feature is built to generate personalized messages at scale. Every parallel dialer is built to maximize call volume. The rep who enrolled 200 contacts in three minutes did exactly what the software told them to do, and the dashboard rewarded them for it, and their manager saw a green number and moved on. The system is working as designed. The design just doesn't account for the fact that the metric it's optimizing against stopped being useful.
The parallel dialer thing is even worse. A rep can run a parallel dialer that calls multiple numbers simultaneously, drops a pre-recorded voicemail on every number that goes to voicemail, and logs each one as a "call." Forty calls in an hour. Zero live conversations. The rep might have actually spoken to two humans during that entire session. On the dashboard it says 40 calls. There is no field in Pipedrive or Salesforce or HubSpot that distinguishes between "left a robot voicemail" and "had a five-minute conversation with a decision maker." The data model wasn't built for that distinction because when the data model was designed, every call required a human to physically dial and physically wait and physically speak.
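To make the gap concrete, here's roughly what the missing distinction would look like as data. To be clear, nothing below maps to a real Pipedrive, Salesforce, or HubSpot field; `CallOutcome` and `CallRecord` are made-up names for the schema those systems don't have:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical outcome taxonomy: the distinction that current CRM
# data models collapse into a single "call logged" event.
class CallOutcome(Enum):
    NO_ANSWER = "no_answer"                   # rang out; only the attempt happened
    VOICEMAIL_DROP = "voicemail_drop"         # dialer left a pre-recorded message
    LIVE_CONVERSATION = "live_conversation"   # a human spoke to a human

@dataclass
class CallRecord:
    rep_id: str
    contact_id: str
    duration_seconds: int
    outcome: CallOutcome

def real_conversations(calls: list[CallRecord]) -> list[CallRecord]:
    """The number a dashboard should show: live conversations only."""
    return [c for c in calls if c.outcome is CallOutcome.LIVE_CONVERSATION]
```

With a field like that, "40 calls" splits into its honest parts. Without it, the voicemail drop and the five-minute conversation are the same row.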
I keep coming back to this idea that the friction was a feature. Before these tools existed, being unselective was painful. If you wanted to contact 200 people, you had to spend days doing it, and the pain of that effort naturally forced reps to be choosy. You couldn't afford to email everyone because emailing everyone took forever. The selectivity wasn't a skill. It was enforced by the physics of the work. AI removed the pain, and when it removed the pain it also removed the filter, and nobody built anything to replace the filter because nobody realized the filter was load-bearing.
03 Ramp shut it down with five hundred salespeople
Ramp, the corporate card company, figured this out before most people, I think. They have something like 500 salespeople. They built an AI outbound system specifically designed for their sales org. Purpose-built, well-resourced, exactly the kind of implementation that should work if the concept works. They shut it down. Not the data layer. Not the targeting. Not the signal infrastructure. They shut down the automated sequence, the outbound activity layer, because the volume of automated outreach had gotten so high across the entire market that prospects were ignoring all of it. The sequence was disposable. The intelligence underneath it was the asset.
If a company with 500 reps and a custom-built AI system decided the activity layer wasn't worth running, I don't know why a 15-person staffing agency should be confident that their off-the-shelf version is working. But the dashboard says it is. The activity numbers are great. Everybody's hitting quota on touches.
The entire sales technology industry spent the last two years building better tools for the rep layer. Smarter sequences. Better personalization. Automated research. AI SDRs that can book meetings without human involvement. All of it genuinely impressive engineering. All of it made the activity number bigger. I can't find a single tool that was launched in that same period whose primary purpose was making the activity number more meaningful. The measurement layer is basically unchanged from 2018. We're counting the same things we counted when every touch required a human to execute it, and we're counting them in a world where most touches don't.
04 Thirty-five calls on the dashboard, eight that were real
So what replaces it? I got tired of waiting for someone else to figure that out.
I built a system at work that addresses the problem directly. It records every sales call, runs it through AI transcription, and then does something that no CRM dashboard does on its own: it filters out the voicemails. The parallel dialer drops, the unanswered rings, the "leave a message after the tone" recordings, all of that gets classified and stripped out before a manager ever sees it. What's left are the real conversations. The calls where a human spoke to another human and something actually happened.
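If you're curious what that filtering step looks like, here's a minimal sketch, assuming you already have transcripts, speaker counts, and durations from the transcription pass. The phrase list and thresholds are placeholders for whatever classifier you actually trust, not the production logic:

```python
# Illustrative voicemail filter over transcribed calls. The markers and
# cutoffs below are stand-ins; the real classification step would be
# whatever model or rules you've validated against your own calls.
VOICEMAIL_MARKERS = (
    "leave a message",
    "after the tone",
    "not available right now",
    "has a voice mailbox",
)

def is_voicemail(transcript: str, speaker_count: int, duration_seconds: int) -> bool:
    text = transcript.lower()
    if speaker_count < 2:       # nobody ever talked back
        return True
    if duration_seconds < 20:   # too short to be a conversation
        return True
    return any(marker in text for marker in VOICEMAIL_MARKERS)

def filter_real_conversations(calls: list[dict]) -> list[dict]:
    """Strip voicemail drops and dead air before a manager sees the count."""
    return [
        c for c in calls
        if not is_voicemail(c["transcript"], c["speakers"], c["duration"])
    ]
```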
The number gets smaller. That's the whole point. A rep who logged 35 calls might show 8 real conversations after the filter runs. That's not a worse number. That's an honest one. And those 8 conversations get scored against 12 criteria, things like whether the rep identified the decision maker, whether they handled the primary objection, whether they established concrete next steps. The system also redacts sensitive payment data automatically, which is its own compliance problem I had to solve, but the core function is this: take the inflated activity count, strip it down to what's real, and then measure whether the real interactions were any good.
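For a rough shape of the scoring pass: the real system checks 12 criteria, and only the three I named above appear here. How each yes/no judgment gets made (an LLM call, a human reviewer) sits behind the `judge` callable, which is deliberately abstract:

```python
from dataclasses import dataclass, field

# Sketch of the scoring step. Only the three criteria named in the post
# are listed; the remaining nine are omitted rather than invented.
CRITERIA = [
    "identified_decision_maker",
    "handled_primary_objection",
    "established_next_steps",
]

@dataclass
class Scorecard:
    call_id: str
    results: dict[str, bool] = field(default_factory=dict)

    @property
    def score(self) -> float:
        """Fraction of criteria the rep hit on this conversation."""
        return sum(self.results.values()) / len(self.results) if self.results else 0.0

def score_call(call_id: str, transcript: str, judge) -> Scorecard:
    """Run each criterion's judgment over one real conversation."""
    card = Scorecard(call_id)
    for criterion in CRITERIA:
        card.results[criterion] = judge(criterion, transcript)
    return card
```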
The reps see fewer calls on their dashboard now. Some of them didn't love that at first, which I understand, because the old number was bigger and bigger numbers feel like validation even when they're hollow. But the conversations that show up are the ones that matter. They're the ones worth reviewing, worth coaching against, worth using to figure out which reps are actually developing skills and which ones are just running the machine.
05 The Monday after the metric dies
I'm not pretending this solves the entire measurement problem. It solves one piece of it, the call piece, and it solves it in a way that I haven't seen anyone else do, which is telling. The broader question is still open: where in the entire sales process did a human make a decision that mattered? Not "did the system run," because the system always runs. Not "how many things happened," because things can happen without judgment being applied. The question is where did a person look at something, think about it, and do something different than what the default would have been. Where they skipped a prospect the sequence would have included. Where they wrote something the AI wouldn't have written. Where they called someone back a second time because the first conversation felt promising even though the lead score was low.
I can answer that question for calls now. I can't answer it for emails yet, or for sequence enrollments, or for most of the other places where the activity count is lying. You could probably build something that compares AI-generated draft emails against what the rep actually sent and measures the delta. You could track which automatic enrollments a rep manually overrode. Those pieces exist in fragments across different tools. Nobody has assembled them into a single metric that a sales manager can look at on Monday morning.
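For the email piece, one speculative sketch: Python's standard difflib can measure how far a sent email drifted from its AI draft, which is a crude proxy for "a human made a decision here." Nobody ships this as a metric today as far as I know; the function and example below are purely illustrative:

```python
import difflib

def human_edit_delta(ai_draft: str, sent_email: str) -> float:
    """Fraction of the sent email that differs from the AI draft.

    0.0 means the rep sent the draft untouched; values near 1.0 mean
    the rep effectively rewrote it.
    """
    similarity = difflib.SequenceMatcher(None, ai_draft, sent_email).ratio()
    return 1.0 - similarity

# A rep who only tweaks a word shows a small delta; a rep who rewrites
# the pitch around something they actually researched shows a large one.
draft = "Hi Sam, saw your company is hiring SDRs. Want to chat?"
sent = "Hi Sam, noticed you just opened a Denver office and are hiring SDRs there. Worth 15 minutes?"
print(round(human_edit_delta(draft, sent), 2))
```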
The teams that figure this out are going to look like their reps are just better, or their product is easier to sell. What will actually be happening is that they're measuring something real while everyone else is counting touches that stopped meaning anything two years ago.
I open the dashboard every Monday. The activity numbers are still there and they're still green. I look at them, and then I open the system I built and look at the real numbers underneath. The real numbers are smaller and uglier and more useful. The gap between the two is the gap between what the industry is measuring and what actually matters, and the gap is getting wider every quarter as the tools get better at generating activity that looks like work.
I don't know when the rest of the industry catches up to this. I don't know if it does. It's possible we keep counting activities the same way for another five years while the number becomes increasingly decorative and everyone quietly develops their own workarounds. I built mine. Most teams are still using gut feel and spot checks, which is basically what I was doing before I got frustrated enough to build something.
The metric died in 2024. Most people are still managing by it in 2026. The replacement is being built in pieces, by the people close enough to the problem to feel it every Monday morning.
If your dashboard looks healthy but pipeline doesn't: demo.outblox.com