More Rules, Less Confidence

Writing rules got easy. Running a program didn't.

AI changed detection engineering, just not the way most people expected.

Writing a detection used to take a skilled engineer real time: understand the threat, learn the query language, test it against data, tune out the noise. Now you can ask a model for a Sysmon rule and have a working draft in seconds. That part got easy.

Running a detection program did not. If anything, it got harder.

The bottleneck moved; it didn't disappear

For years, authoring was the constraint. Good detection engineers were scarce, rules were hand-built, and your coverage was limited by how fast a few experts could write. AI removed that limit. The cost of producing a rule fell to near zero, and the number of rules a team can generate jumped by an order of magnitude.

When you remove a bottleneck, you don't eliminate the work. You relocate it. Everything downstream of authoring now has to carry the load that scarcity used to hold back. The hard part of detection engineering was never typing the rule. It was everything that happens after.

What a flood of cheap rules breaks

Infrastructure and process that were sized for a trickle now take a firehose.

Execution. Can your SIEM or query engine actually run five times the rules, at the latency you need, without your bill running away from you? The rules are free to write. They are not free to run.
Quality. AI writes confidently, including when it's wrong. A rule that looks right and silently never fires is worse than no rule at all, because it counts as coverage you don't actually have. Nobody is hand-reviewing a flood, so the review has to be built in.
Noise. Five times the rules is five times the false positives unless you tune them. Analyst trust is the first casualty, and it doesn't come back easily.
Coverage. When rules multiply, you stop knowing what you have. Which of these forty detections overlap? What do they actually cover against ATT&CK? Where are the real gaps hiding behind the volume?
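
To make the quality point concrete, here is a minimal sketch of review that is built in rather than done by hand. It assumes a rule can be expressed as a Python predicate over an event, which is a simplification; real rules live in a SIEM query language. The Rule class, the validate function, and the sample events are all hypothetical, written for illustration. The second rule is the kind of draft that looks right and never fires.

```python
from dataclasses import dataclass
from typing import Callable

Event = dict[str, str]  # one log record, e.g. a Sysmon process-creation event


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Event], bool]


def validate(rule: Rule, known_bad: list[Event], known_good: list[Event]) -> list[str]:
    """Return the reasons a rule should not ship; an empty list means it passes."""
    if not known_bad:
        # Without a malicious sample, nothing proves the rule can fire at all.
        return [f"{rule.name}: no known-bad samples"]
    failures = []
    for i, event in enumerate(known_bad):
        if not rule.matches(event):
            failures.append(f"{rule.name}: missed known-bad sample {i}")
    for i, event in enumerate(known_good):
        if rule.matches(event):
            failures.append(f"{rule.name}: fired on known-good sample {i}")
    return failures


def comsvcs_dump(event: Event) -> bool:
    """Illustrative rule: rundll32 calling the MiniDump export of comsvcs.dll."""
    image = event.get("Image", "").lower()
    cmd = event.get("CommandLine", "").lower()
    return image.endswith(r"\rundll32.exe") and "comsvcs.dll" in cmd and "minidump" in cmd


def comsvcs_dump_typo(event: Event) -> bool:
    """Same idea with a wrong binary name: plausible to read, never matches."""
    image = event.get("Image", "").lower()
    cmd = event.get("CommandLine", "").lower()
    return image.endswith(r"\rundll.exe") and "comsvcs.dll" in cmd and "minidump" in cmd


# Hand-written sample events (hypothetical), using Sysmon's Image and CommandLine fields.
KNOWN_BAD = [{
    "Image": r"C:\Windows\System32\rundll32.exe",
    "CommandLine": r"rundll32.exe C:\Windows\System32\comsvcs.dll, MiniDump 624 C:\temp\out.dmp full",
}]
KNOWN_GOOD = [{
    "Image": r"C:\Windows\System32\rundll32.exe",
    "CommandLine": "rundll32.exe shell32.dll,Control_RunDLL",
}]

good_result = validate(Rule("comsvcs_dump", comsvcs_dump), KNOWN_BAD, KNOWN_GOOD)
broken_result = validate(Rule("comsvcs_dump_typo", comsvcs_dump_typo), KNOWN_BAD, KNOWN_GOOD)

for failure in good_result + broken_result:
    print(failure)
```

The gate only knows what the samples tell it. Choosing known-bad and known-good data that actually represent the threat and the environment is still the judgment call.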

Generating rules is not running a program

"Detection as code" was a step forward, and AI-assisted authoring is another. But both solve the same narrow problem: getting a rule written and into a repo. That was never the expensive part.

A detection program is the whole lifecycle: validation, execution, tuning, measurement, and the judgment to know what's worth running at all. AI makes the cheap part cheaper. It can take a swing at the rest too, but generating a test or a coverage report is not the same as knowing whether either is right. AI doesn't supply that judgment, and by multiplying the volume, it makes the need for it acute. Producing detections and operating detections are different disciplines, and the gap between them just widened.

Three pillars for the AI era

1. Rigor and validation. When a flood of generated rules hits, automated testing is the gate that keeps junk out of production. Every rule gets tested against known-good and known-bad data before it ships. This used to be a nice-to-have. Now it's the only thing standing between you and confident, silent failure at scale.

2. Infrastructure that scales. The execution layer has to keep up. That means an architecture that can run higher rule volume affordably, with the query patterns, partitioning, and cost controls to match. The constraint on coverage has moved from "how fast can we write" to "how much can we afford to run." Design for that.

3. Measurement and coverage. You cannot manage what you cannot see. That means dynamic coverage mapping against ATT&CK, deduplication, false-positive tracking, and per-rule performance. When volume explodes, measurement is what turns a pile of rules back into a program.
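
As a rough sketch of what the measurement pillar can look like in practice, the snippet below takes rules tagged with ATT&CK technique IDs plus analyst dispositions on their alerts, then reports overlaps, gaps, per-rule false-positive rates, and rules that never fired. The rule names, the priority list, the dispositions, and both function names are assumptions made up for illustration. The technique IDs are real ATT&CK identifiers.

```python
from collections import Counter, defaultdict


def coverage_report(rule_techniques: dict[str, set[str]],
                    priority_techniques: set[str]) -> tuple[dict[str, list[str]], set[str]]:
    """Return (techniques covered by more than one rule, priority techniques with no rule)."""
    rules_by_technique: dict[str, list[str]] = defaultdict(list)
    for rule, techniques in sorted(rule_techniques.items()):
        for technique in techniques:
            rules_by_technique[technique].append(rule)
    overlaps = {t: rules for t, rules in rules_by_technique.items() if len(rules) > 1}
    gaps = priority_techniques - set(rules_by_technique)
    return overlaps, gaps


def false_positive_rates(dispositions: list[tuple[str, bool]]) -> dict[str, float]:
    """Map each rule that alerted to the fraction of its alerts closed as false positive."""
    totals: Counter[str] = Counter()
    false_positives: Counter[str] = Counter()
    for rule, is_false_positive in dispositions:
        totals[rule] += 1
        if is_false_positive:
            false_positives[rule] += 1
    return {rule: false_positives[rule] / totals[rule] for rule in totals}


# Hypothetical rule inventory, tagged with ATT&CK technique IDs.
RULES = {
    "lsass_dump_comsvcs": {"T1003.001"},   # OS Credential Dumping: LSASS Memory
    "lsass_handle_access": {"T1003.001"},
    "encoded_powershell": {"T1059.001"},   # Command and Scripting Interpreter: PowerShell
}
# Techniques this hypothetical team has decided it must cover.
PRIORITY = {"T1003.001", "T1059.001", "T1547.001"}  # T1547.001: Registry Run Keys / Startup Folder

# (rule name, analyst marked it a false positive) for each closed alert.
DISPOSITIONS = [
    ("encoded_powershell", True),
    ("encoded_powershell", True),
    ("encoded_powershell", True),
    ("encoded_powershell", False),
    ("lsass_dump_comsvcs", False),
]

overlaps, gaps = coverage_report(RULES, PRIORITY)
rates = false_positive_rates(DISPOSITIONS)
never_fired = sorted(set(RULES) - set(rates))

print("overlaps:", overlaps)
print("gaps:", sorted(gaps))
print("false-positive rates:", rates)
print("never fired:", never_fired)
```

An overlap is not automatically a duplicate, and a rule that never fired is not automatically broken. The report tells you where to look; deciding what to merge, tune, or retire is the part that still takes experience.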

Authoring isn't a pillar anymore. It's the part that got easy.

The flywheel: AI runs the loop, experience steers it

Here's the turn. The same AI that can't supply the judgment can absolutely supply the labor, and that's exactly what makes a flywheel possible. AI drafts the rules. AI writes the tests for those rules. AI analyzes how they perform in production (what fires, what's noisy, what's covered, what's missing) and feeds that back into the next round of authoring. Done right, it compounds: each turn of the loop makes the next batch better than the last.

But the flywheel doesn't spin on its own. Every step needs someone who knows what good looks like: to steer the AI off confident nonsense, to judge which generated rules are worth keeping, to read the performance signal correctly. And someone has to build the glue, the connective tissue between authoring, testing, execution, and measurement that turns four separate AI tasks into one system. The AI does the labor. The experience and the glue are what make it a flywheel instead of four disconnected piles.

This is an expertise problem, not a product

You can't buy your way out of this with another platform. The teams that win in the AI era are the ones that build the discipline underneath the tooling, and put an experienced hand on the wheel.

That's the work I do. I'm Brian Concannon, and I've built and run detection at the FBI, CrowdStrike, and Expel. EchoTrail Solutions helps security teams build detection programs that hold up when authoring is free and volume is the new problem.

If your rule count is climbing faster than your confidence in it, let's talk.