When AI helps and when it doesn't
A year of using AI in daily work. Concrete cases where it earned its keep, concrete cases where it didn't, and the pattern that explains the difference. Worth being honest about both halves of the practitioner experience.
I've been using AI as a serious daily tool for about a year now, across code, writing, research, ops, and a fair amount of just thinking out loud with it. Long enough to have a real opinion about where it pays off and where it doesn't, separate from the marketing in either direction. The pattern is clearer than I expected it to be a year ago, and it's worth being plain about both halves of the experience, because the honest version of this conversation doesn't get had often enough.
The short version: AI helps a lot in some specific categories of work, helps almost not at all in others, and the line between the two is much sharper than the "AI augments everything" framing suggests.
Where it genuinely helps
The categories where I'd say AI has materially changed my daily output:
Drafting from a clear brief. When I know what I want to say and need to get the first version on the page, AI is a real accelerant. Cold emails, design docs, internal updates, the rough cut of a blog post when I already have the argument worked out. The model gets me to a 70% draft in minutes that would have taken me an hour or two before, and the 70% draft is a much better starting point for editing than a blank page. This compounds: if I write five things a day and each one is 30 minutes faster, that's real time back.
Coding tasks I already understand. Boilerplate, plumbing, the third version of a pattern I've written twice before. Translating between languages I know: a Python script to a shell script, a SQL query to a different dialect. Filling in the obvious parts of a test suite. The model is fast at the work where I know what right looks like and I just need it written down. It's a calculator for code I would have written by hand.
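To make "boilerplate" concrete, here's a minimal sketch of the kind of skeleton I mean, the sort of thing I've written enough times that I know exactly what right looks like before the model produces a line. Every name in it (process_file, the --dry-run flag) is invented for the illustration, not pulled from a real project:

```python
# A minimal CLI skeleton: the flavor of boilerplate I'd now describe in a
# sentence instead of typing out. All names here are illustrative.
import argparse
import sys


def process_file(path: str, dry_run: bool) -> int:
    """Placeholder for the actual work; returns a count of items handled."""
    print(f"{'Would process' if dry_run else 'Processing'} {path}")
    return 1


def main() -> int:
    parser = argparse.ArgumentParser(description="Process input files.")
    parser.add_argument("paths", nargs="+", help="files to process")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would happen without doing it")
    args = parser.parse_args()
    handled = sum(process_file(p, args.dry_run) for p in args.paths)
    print(f"Handled {handled} file(s)")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Nothing in that sketch is hard; that's the point. The value of delegating it is that I can verify the result in one read-through.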
Information triage. Reading through a long technical document and asking specific questions, summarizing a meeting transcript through a specific lens, pulling the structured bits out of an unstructured dump. The model is good at the "find the parts of this that matter for what I'm doing" job in a way that saves real time over reading everything carefully.
Rubber-ducking and structured thinking. Talking through a problem out loud with the model as a thinking partner. The model isn't always right (often it's wrong in interesting ways) but the act of articulating the problem clearly enough for the model to engage with it tends to clarify my own thinking. The output of the conversation matters less than the conversation itself. This one surprised me; I didn't expect the rubber-duck use case to be so durable.
Repetitive structured operations. Reformat this JSON, convert this CSV, generate variations of this template, run this transformation across all 200 rows. The kind of work that used to involve writing a quick script and now involves describing the transformation to the model. Faster, less error-prone if the cases are simple, and the model can usually flag the edge cases that don't fit.
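For scale, a rough sketch of the quick script this category used to mean, with the column names invented for the example. The part worth noticing is the misfit handling, since "flag the rows that don't fit instead of guessing" is exactly what I want from the model version too:

```python
# A sketch of the throwaway transformation script this category replaces.
# Column names ("amount") are invented for the example.
import csv


def transform(in_path: str, out_path: str) -> list[dict]:
    """Normalize amounts to cents; collect rows that don't fit the pattern."""
    misfits = []
    with open(in_path, newline="") as src, \
         open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            try:
                # The simple case: a well-formed decimal amount.
                row["amount"] = str(round(float(row["amount"]) * 100))
                writer.writerow(row)
            except (ValueError, TypeError, KeyError):
                # The edge case to surface for a human, not to guess at.
                misfits.append(row)
    return misfits


if __name__ == "__main__":
    leftovers = transform("input.csv", "output.csv")
    print(f"{len(leftovers)} row(s) need a human look")
```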
These are the categories where I'd say the productivity gain is real and durable. Across them I'd estimate I save somewhere between two and four hours a week: meaningful, but not the order-of-magnitude shift the most aggressive AI marketing implies.
Where it doesn't help
The categories where I keep trying and keep coming back with the same answer:
Strategic judgment under ambiguity. Should we take this contract. Is this hire the right one. Does this product direction make sense given what we know about the market. The model can lay out frameworks, list considerations, surface trade-offs, but the actual judgment call requires holding context the model doesn't have and integrating signals that aren't legible in the prompt. I've tried handing these decisions off and tried using the model as a structured input. The structured-input version is mildly useful; the hand-off version always ends with me re-deciding from scratch.
Anything that depends on knowing the people involved. Drafting feedback for a specific direct report, navigating a tense conversation with a customer, writing the right cover note for a particular hire. The model produces something competent and generic. The competent and generic version is actively worse than the slightly clumsier thing I'd write myself, because the slightly clumsier thing is calibrated to the actual relationship. Generic-good is the wrong target for relational work.
Original analysis on a topic I know well. When I'm writing about something I have a real opinion on, the model is mostly an obstacle. Its instinct is to give me the consensus view, and the consensus view is what I'm trying not to write. I can prompt around this, but it takes more effort than just writing the thing myself. AI helps with drafting from a clear brief; it gets in the way when the brief is "the take that nobody else has."
Verification of consequential outputs. AI-generated code that's going to production. AI-extracted data that's going to inform a decision. AI summaries that I'm going to act on. The cost of being wrong is high enough that I need to verify against primary sources anyway, and the verification work is usually most of the work. The model speeds up the generation step that wasn't the slow part to begin with. Net time savings on these tasks are smaller than they look.
Novel problems where I don't yet know what right looks like. Debugging an unfamiliar system, designing something I haven't built before, working in a domain that's new to me. The model is happy to produce confident-sounding output that I can't evaluate, because I lack the expertise to know whether it's right. This is the dangerous case: the one where it feels like AI is helping but is actually setting me up to ship something wrong. I've gotten more careful about this.
Creative work where the voice is the point. Writing in someone else's style, including my own. AI can hit a generic version of my voice now, well enough that it'd fool a casual reader. It cannot hit the version of my voice where the specifics make the writing mine. The closer the output needs to be to a specific human's actual voice, the less AI helps and the more it does measurable damage by averaging toward the median.
These are the categories where the productivity gain doesn't show up. Across them I'd estimate the model saves me zero time, and in some cases it costs me time because I end up correcting model output that I could have done right the first time.
The pattern that explains it
What separates the two columns is fairly simple.
AI helps when I already know what right looks like and the model is filling in the work. The "know what right looks like" part can come from expertise, from a clear brief, from a verifiable spec, or from a domain where the right answer is conventional. In these cases the model is a fast hand applied to work I've already specified well.
AI doesn't help when I don't yet know what right looks like, or when the right answer is specifically the non-conventional one. In these cases the model substitutes its consensus pattern for the actual judgment the work requires. The output looks finished and is structurally wrong.
The line moves with capability, but slower than the labs' marketing implies. Each generation of frontier models pushes the line a little; there's some work today that the model can do well that it couldn't a year ago. The line is moving in roughly the direction you'd expect. But the categories on the "doesn't help" side are not collapsing as fast as the demos suggest, and a lot of them (the relational work, the original-take work, the verify-the-consequential-thing work) may be structural rather than temporary. I'm not sure those move much regardless of capability gains.
What this means for daily practice
A few rules of thumb I've settled into:
Match the task to the column honestly. Most of my productivity gain comes from getting the categorization right. When I find myself struggling to make AI work on a task, the answer is usually that I've mis-classified the task as "AI helps" when it's actually in the other column. Reaching for the model isn't a productivity strategy in itself; matching the model to the right work is.
Build the verification habit before the productivity habit. Every category where AI ships consequential output benefits from a verification step. The verification has to be in the workflow, not retroactive. If I'm going to use AI for code that goes to production, the test suite has to run; if I'm going to use AI for data extraction, the spot-check has to be designed in. The productivity gain is real only after verification is automatic.
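What "designed in" can look like, as a hedged sketch rather than a prescription. The 10% sampling rate and the record shape are assumptions for the illustration; the transferable part is that the check runs before anything downstream consumes the batch:

```python
# A sketch of a designed-in spot check for AI-extracted data: sample a
# fixed fraction, require explicit human sign-off, fail closed otherwise.
# The 10% rate and the record shape are assumptions for illustration.
import random


def spot_check(records: list[dict], rate: float = 0.10,
               seed: int = 0) -> list[dict]:
    """Return the rows a human must verify before the batch is trusted."""
    rng = random.Random(seed)  # deterministic, so the sample is auditable
    k = min(len(records), max(1, round(len(records) * rate)))
    return rng.sample(records, k)


def accept_batch(records: list[dict], verified_ok: bool) -> list[dict]:
    """Gate: downstream code only sees batches that passed the check."""
    if not verified_ok:
        raise ValueError("spot check not signed off; batch rejected")
    return records


if __name__ == "__main__":
    extracted = [{"id": i, "value": i * 2} for i in range(50)]
    to_review = spot_check(extracted)
    print(f"Review these {len(to_review)} rows against the source documents")
    # accept_batch(extracted, verified_ok=True) only after the review is done.
```

The gate matters more than the sampler: downstream code should be structurally unable to consume a batch nobody signed off on, which is what makes the verification part of the workflow rather than a retroactive good intention.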
Keep the not-AI muscles strong. The categories where AI doesn't help are the categories where I still need to be sharp: the judgment work, the relational work, the original-take work. The temptation when AI helps with the rote work is to coast on the rest. The right move is the opposite: use the time AI saves to invest more in the work it can't do.
Be honest about what you're getting. Two-to-four hours a week is real and worth having; it's not the 10x productivity revolution that gets pitched. The honest framing produces better expectations, better adoption choices, and better conversations about where to invest next.
The honest summary
A year in. AI is a real and durable productivity tool for a specific set of categories, and a costly distraction in others. The categorization is the work. Getting it right is the difference between AI being a useful daily tool and being a perpetually-frustrating-thing-you-keep-trying-and-it-keeps-not-quite-working.
The marketing in both directions (AI will replace everyone, AI doesn't actually work) is wrong in the same way. Both miss that the technology is sharp in some places and blunt in others, and the boundary between the two is where the practitioner conversation should live. Anyone who tells you they've found a way to make AI work for everything they do is either doing easier work than they think they are or letting model output through that they shouldn't be.
I'd rather be honest about both halves. The half where it helps is genuinely valuable and worth investing in. The half where it doesn't is worth being clear-eyed about, because the cost of pretending otherwise (shipping wrong outputs, eroding the muscles you need for the judgment work, misallocating your own attention) adds up.
Worth checking your own list against this one. The categorization that's true for me probably isn't exactly the categorization that's true for you. The discipline of doing the categorization at all is the part that transfers.