Abstract

Agentic AI development is often framed as a shift from engineering to upfront natural-language specification: describe the system, let models build it. This paper argues that, for complex software, this is not a new paradigm but a return to an old failure mode: treating specification as prior knowledge rather than as an output of construction. One-shot generation and static-spec autonomy can be effective in bounded, low-uncertainty domains, but they become risky when unvalidated assumptions about architecture, users, data, security boundaries, and integrations are converted directly into working artifacts.

The failure is not merely imperfect code generation. It is the removal of the feedback loop through which engineering discovers what the system must be. AI can lower the cost of candidate implementations and expand exploration, including through tools, tests, live environments, and adversarial agents. But correctness still depends on whether those instruments expose the assumptions that matter, whether their environments are representative, and whether accountable engineering judgment is applied to the results. The core control surface for AI-mediated engineering is therefore not only the generated code, but the engineer-model-tool interaction where assumptions are surfaced, challenged, tested, and revised. System architecture, functional boundaries, and specification are not prerequisites to engineering. They are outputs of it.

1. A New Promise

“Come on in, the water’s fine.”

This is the implicit promise behind one-shot AI development and a broader class of workflows that can be described as static-spec autonomy: systems proceeding from an initial natural-language intent without a disciplined mechanism for surfacing, testing, revising, and owning assumptions.

Whether framed as a single prompt that generates a system, or as an autonomous agent operating over extended periods, the underlying assumption is the same: that intent can be sufficiently captured upfront to guide system formation without continuous discovery. On this view, systems can be described in natural language, handed to a model, and brought into existence in a single step. The role of the engineer appears to shift from builder to specifier. Natural language appears to displace formal design artifacts, and execution is framed as a function of model capability rather than engineering iteration.

By “specification,” this paper means the natural-language intent that actually drives implementation in AI-mediated workflows, not formal mathematical specification. The problem is not specification itself. The problem is premature narrative specification treated as discovered truth.

Discovery, in this paper, does not mean human intuition alone. It means the reduction of uncertainty through contact with external constraints, followed by interpretation accountable to correctness. AI can participate in this process when tools, tests, environments, integrations, and adversarial agents expose real friction. The question is whether the uncertainty being reduced is the uncertainty that actually matters.

The appeal is obvious: faster development, reduced need for deep technical involvement, and the collapse of complex engineering into a single act of specification. But the promise depends on one critical assumption: that the hard work of understanding the system has already been done.

2. The Waterfall Parallel

This assumption is not new. It mirrors the core premise of the waterfall model of software development.

Waterfall separates defining a system from building it. Requirements are specified upfront. Implementation is treated as a mechanical translation of those specifications into code.

The static-spec version of agentic workflows reproduces this structure:

  • The specification becomes a prompt or structured instruction set
  • Implementation becomes model-driven construction
  • The engineer is displaced from construction into specification and supervision

Both models rely on the same belief: that the system can be fully known before it is built. As a general model for complex, high-uncertainty software, this belief has repeatedly failed.

Even this history is often remembered too simply. The canonical source associated with waterfall, Winston Royce’s 1970 paper “Managing the Development of Large Software Systems,” itself treated first-pass specification as unsafe, and warned that real system constraints are only encountered once implementation and testing begin. What current AI hype risks forgetting is not just the failure of waterfall, but the corrective mechanisms that even its earliest formulations recognized as necessary. The irony is that AI does not resurrect waterfall as bureaucracy; it resurrects waterfall as convenience.

3. Software as Discovery

The failure of waterfall was not a lack of discipline, but a failure of epistemology: it assumed that a system could be fully understood before it was built.

But waterfall itself was not the root cause. It was one expression of a deeper organizational illusion, one that AI tooling is now resurrecting at scale: Completion Bias.

Completion Bias is the tendency to mistake a coherent artifact for resolved understanding. It begins as the “I know what I want” theory of engineering: once intent has been expressed clearly, the hard intellectual work is assumed to be largely complete and construction is treated as execution. In AI-mediated workflows, it becomes the “looks done” theory of engineering: once the model has rendered intent into code, tests, diagrams, or explanation, the system appears more understood than it is.

In waterfall, the artifact was the upfront specification. Those specifications were often wrong, but they were not trivial. They were the product of sustained effort to grapple with unknowns, contradictions, dependencies, and edge cases. Waterfall failed not because no one tried to understand the system, but because it overestimated how much could be understood before construction began.

Static-spec AI introduces a sharper inversion. The artifact may be a prompt, a generated codebase, a passing test suite, an architecture diagram, or the model’s explanation. These arrive already shaped like completion, but without necessarily passing through the discovery process that would normally expose what is missing. The visible tokens of engineering are present, but the uncertainty-reducing work those tokens normally imply may not have occurred.

Where waterfall mistook hard-won specification for understanding, static-spec AI can mistake generated completion for understanding. The system does not merely describe the desired outcome; it appears to instantiate it. Apparent completion becomes a substitute for discovery.

Modern AI tooling has democratized and accelerated Completion Bias. Because AI can instantly translate vague, high-level intent into plausible code, the weakness of the original specification can remain invisible at the point of declaration. The human is no longer forced to confront the incompleteness of their own assumptions. The translation from specification to design appears to happen automatically.

Specification does not complete understanding. It initiates discovery.

The dangerous version of the automation argument is not that AI will help engineers move faster. It is that engineering itself can be reduced to specification: that the remaining human role is to describe outcomes, supervise generation, and manage the resulting artifacts.

In other words, the engineer is replaced not by the AI alone, but by a role that manages intent and accepts complete-looking artifacts as progress.

That is Completion Bias at machine speed.

The mistake is not believing that AI can assist construction. It can. The mistake is believing that construction is the easy part because the system has already been described. In reality, construction is where much of the description becomes meaningful for the first time.

This introduces a form of epistemic overconfidence: the belief that one understands the system well enough to specify it, when in fact the understanding is incomplete. The tools amplify confidence without increasing depth of understanding. They make it easier to produce artifacts that look like progress while postponing the discovery of whether the underlying assumptions are correct.

Consider a generated multi-tenant support dashboard. The system compiles, deploys, and passes its tests. Users can log in, view tickets, update statuses, and export reports. But one generated endpoint fetches records by ticket ID without also scoping the query by tenant. Nothing fails, because the tests only verify that authenticated users can access their own fixtures.

The system works exactly as specified. The specification was wrong: it treated authentication as if it implied tenant authorization. The missing boundary was not a coding error in the narrow sense. It was an undiscovered system invariant. The agent implemented the visible workflow; the team had not yet discovered the boundary condition that made the workflow safe.
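The gap can be made concrete with a minimal sketch. Everything below is hypothetical and invented for illustration: the `Ticket` type, the in-memory `TICKETS` store, and both lookup functions stand in for whatever the generated dashboard would actually contain. The point is only that a test built on a user's own fixtures cannot distinguish the two implementations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    ticket_id: int
    tenant_id: str
    subject: str


# Hypothetical in-memory store standing in for a database table.
TICKETS = {
    1: Ticket(1, "acme", "Printer offline"),
    2: Ticket(2, "globex", "Invoice dispute"),
}


def get_ticket_unscoped(user_tenant: str, ticket_id: int) -> Optional[Ticket]:
    """Fetch by ticket ID only. The caller's tenant is accepted but never checked."""
    return TICKETS.get(ticket_id)


def get_ticket_scoped(user_tenant: str, ticket_id: int) -> Optional[Ticket]:
    """Fetch by ticket ID and require that the ticket belongs to the caller's tenant."""
    ticket = TICKETS.get(ticket_id)
    if ticket is None or ticket.tenant_id != user_tenant:
        return None
    return ticket


# The test the specification implied: a user reads their own fixture.
# Both implementations pass, so this test cannot tell them apart.
assert get_ticket_unscoped("acme", 1) is not None
assert get_ticket_scoped("acme", 1) is not None

# The test nobody wrote: a user requests another tenant's ticket.
leaked = get_ticket_unscoped("acme", 2)   # returns globex's ticket
blocked = get_ticket_scoped("acme", 2)    # returns None
```

The cross-tenant check is trivial to write once the invariant is known. The difficulty described above is that nothing in the original specification prompted anyone to write it.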

This is why security-sensitive failures often arise from incorrect models of the system boundary: who can access what, through which identity, under which state, across which tenant, and through which integration path. These are not minor details discovered after the real design work is complete. They are part of the design work itself.

4. Specification Collapse

While often discussed in terms of “one-shot” generation, the deeper failure mode is broader: static-spec autonomy. It appears whenever a system proceeds from an initial intent without a disciplined mechanism for surfacing, testing, revising, and owning assumptions. Long-horizon agentic execution can fall into this pattern when autonomy is governed by a static initial specification rather than by a discovery loop. Systems that are given an initial specification and allowed to operate autonomously over extended periods inherit the same core assumption: that the specification is sufficiently complete to guide correct behavior over time.

In long-horizon systems, the problem compounds. Even if a specification is initially sufficient, it degrades over time as the environment, data, and system context evolve. In a dynamic system, a static specification cannot be assumed to remain correct. Without continuous discovery, drift becomes likely.

The core issue is not delegation itself, but the loss of challenge that normally forces specifications to be refined. When work is delegated to an LLM without this challenge, the system continues to produce coherent outputs and implicit signals of success, even when the underlying assumptions have not been tested. The specification is no longer pressured by reality; it is stabilized by narrative.

In this sense, static-spec systems do not remove the need for discovery. They remove the signals that discovery has not yet happened.

This pattern can be understood as a form of Specification Collapse: provisional, narrative specification is treated as if it were already validated understanding, despite the absence of the discovery process required to test those assumptions against real conditions.

Tool augmentation does not by itself prevent Specification Collapse. If the tools expose only the constraints already implied by the original specification (tests, mocks, sandbox data, or narrow success criteria), the system may validate the map while leaving the territory untested.

Static-spec systems fail when they convert untested beliefs directly into artifacts. Specifications encode assumptions about users, data, boundaries, integrations, and failure modes. In iterative engineering, those assumptions are pressured by construction and revised. In static-spec systems, they can be embedded before they have been tested. The result is not merely error, but the freezing of incorrect assumptions into the system. Completion Bias makes this harder to see because the output looks finished before the underlying uncertainty has been resolved.

This does not imply that one-shot or static-spec generation is ineffective. In bounded, low-uncertainty work, it can be genuinely transformative. Scaffolding, migrations, internal tools, tests, boilerplate, reporting workflows, and repeated integration patterns often benefit from specification-driven generation. This is not a marginal use case; it is a large and growing portion of software work.

The failure arises when this success is extrapolated to complex, real-world systems where uncertainty dominates.

The boundary is not whether the model can generate code. The boundary is the uncertainty profile of the work.

Heuristic: Static-spec generation is safest when the work has low semantic uncertainty, low coupling, fast feedback, and low blast radius. As semantic uncertainty, coupling, or blast radius increases, or as feedback slows, the work moves out of the safe static-spec zone and back into discovery.

In practice, static-spec approaches are safest where:

  • the desired behavior can be stated precisely;
  • correctness can be checked quickly and objectively;
  • the work follows stable, repeated patterns;
  • dependencies and integrations are known;
  • representative feedback is available before deployment;
  • failure is cheap, reversible, and locally contained;
  • tests or acceptance criteria cover the important behavior.

They become risky when:

  • the workflow, user need, or business rule is still ambiguous;
  • the system crosses trust, tenant, data, or authorization boundaries;
  • correctness depends on tacit domain knowledge;
  • integrations, state, scale, concurrency, or operational drift dominate;
  • failure is expensive, irreversible, regulated, or security-sensitive;
  • feedback arrives late, indirectly, or only under real-world use.
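
One way to make the heuristic operational is to score a piece of work on the four dimensions it names. The sketch below is illustrative only: the `WorkProfile` type, the 0 to 3 scale, and the threshold of 2 are assumptions introduced here, not values proposed by this paper. It encodes a single idea from the heuristic, that one high dimension is enough to move the work back into discovery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkProfile:
    """Each field is scored 0 (low) to 3 (high). The scale is an assumption."""
    semantic_uncertainty: int  # how ambiguous the desired behavior still is
    coupling: int              # how many other components the change touches
    feedback_delay: int        # how late or indirect the feedback is
    blast_radius: int          # how costly or irreversible a failure would be


# Hypothetical threshold: any single dimension at or above this value
# moves the work out of the static-spec zone.
DISCOVERY_THRESHOLD = 2


def classify(profile: WorkProfile) -> str:
    factors = (
        profile.semantic_uncertainty,
        profile.coupling,
        profile.feedback_delay,
        profile.blast_radius,
    )
    if any(not 0 <= f <= 3 for f in factors):
        raise ValueError("each dimension must be scored from 0 to 3")
    # The worst dimension decides: the scores are deliberately not averaged,
    # so a high blast radius cannot be offset by fast feedback.
    if max(factors) >= DISCOVERY_THRESHOLD:
        return "discovery"
    return "static-spec candidate"


boilerplate = WorkProfile(semantic_uncertainty=0, coupling=1, feedback_delay=0, blast_radius=0)
tenant_auth = WorkProfile(semantic_uncertainty=2, coupling=2, feedback_delay=3, blast_radius=3)

boilerplate_zone = classify(boilerplate)  # "static-spec candidate"
tenant_auth_zone = classify(tenant_auth)  # "discovery"
```

Taking the maximum rather than a weighted sum is a design choice that mirrors the wording of the heuristic. A team could reasonably choose different scales or thresholds.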

In low-uncertainty work, AI can turn known patterns into working artifacts. In high-uncertainty work, the central task is still to discover the pattern.

This boundary is not fixed. As models improve, more work will move into the low-uncertainty category. But the boundary will not disappear, because some uncertainty is not execution uncertainty. It is uncertainty about meaning, boundaries, consequences, and risk. The practical question for engineering teams is therefore not “Can the model build it?” but “Do we understand the system well enough to know what a correct build would mean?”

5. The Role of the Engineer

The role of the engineer is frequently mischaracterized. It is tempting to see engineers primarily as implementers: people who take a specification and turn it into working code. Under this view, improvements in code generation naturally reduce the need for engineering effort.

But this framing misses where the real work happens.

Engineers are not valuable because they write code. They are valuable because they resolve uncertainty. The act of building is not just execution; it is where assumptions are surfaced, tested, and refined against reality.

In practice, this work is continuous. As systems take shape, engineers detect when assumptions break, identify mismatches between intended and actual behavior, and adjust abstractions in response to emerging constraints. The system is not simply being implemented; it is being understood.

This leads to a more important point: the hardest part of building software is not the act of implementation. It is determining what should be built in the first place.

System architecture, functional boundaries, and specification are not prerequisites to engineering. They are outputs of it.

Engineering is the work that occurs between an abstract goal and its realization in the real world. It is the process by which vague intent is transformed into something that actually works under real conditions.

AI reduces the cost of execution. It can make uncertainty cheaper to explore. But uncertainty is resolved only when assumptions are tested against reality. In practice, those tests still have to be interpreted by someone accountable for correctness.

This creates a dangerous illusion: reducing the role of the engineer appears to reduce effort.

In reality, removing the engineer from the construction loop removes the process by which teams discover what the system actually needs to be. It also removes the primary guardrail that prevents incorrect assumptions from being carried forward into the design.

6. Knowledge Decay and Causal Detachment

As engineers are distanced from the act of building, a second-order effect emerges: knowledge decay.

This decay is not simply a loss of familiarity with code. It is a loss of causal understanding. Engineers who no longer participate in the construction of a system become less able to explain how intent becomes implementation, how implementation produces observed behavior, and where that behavior may fail under real conditions.

They lose intuition about:

  • which boundaries are fragile,
  • which assumptions are carrying the design,
  • which parts of the system are merely plausible rather than understood,
  • and which failure modes are likely to emerge only under integration, scale, adversarial use, or operational drift.

The danger is not abstraction itself. Engineering has always advanced through abstraction. Better abstractions allow engineers to reason about larger systems without holding every implementation detail in memory.

The danger is abstraction without causal traceability.

A healthy abstraction preserves the ability to move between levels: from intent, to design, to implementation, to behavior, and back again. An unhealthy abstraction severs that path. It allows outcomes to be requested without preserving enough understanding of how those outcomes are produced.

AI-mediated development can create this severance when engineers move too far from construction and become primarily managers of descriptions. In this mode, the engineer’s role shifts from discovering the system through interaction to maintaining a narrative of what the system is supposed to do. The system may continue to grow, but the human understanding of the system does not grow with it.

This is not merely a skills problem. It is a governance problem.

AI can generate plausible system structure faster than teams can validate it. Interfaces, services, tests, policies, configuration, and documentation may accumulate quickly, while causal understanding grows slowly or not at all.

This creates an Assurance Gap: the distance between the rate at which AI-assisted workflows can produce plausible system artifacts and the rate at which teams can understand, validate, and take responsibility for them.

This imbalance is especially dangerous because AI-generated artifacts often arrive in coherent form: well named, well structured, and accompanied by explanations that make the design appear settled. Coherence becomes a proxy for correctness, making the embedded assumptions easier to accept.

Over time, this weakens Interaction Quality.

Interaction Quality is not a measure of how fluent or productive the model appears to be. It is a measure of whether the human-AI loop is actively reducing uncertainty. High-quality interaction challenges assumptions, tests boundaries, explores alternatives, and connects generated output to evidence. Low-quality interaction accepts plausible output as progress.

Knowledge decay makes low-quality interaction more likely. An engineer who no longer understands the causal structure of the system cannot reliably challenge the model’s assumptions. They cannot easily distinguish between a good abstraction and a convenient one, between a harmless simplification and a hidden defect, between a local fix and a structural regression.

This creates a compounding failure mode: weaker causal understanding makes engineers less able to challenge generated output; more accepted assumptions make the system harder to reason about; and the resulting opacity increases dependence on narrative specification and model-generated explanation.

The result is specification drift. The description of the system and the actual behavior of the system begin to diverge. Future changes are then specified against an increasingly inaccurate understanding of what already exists.

This is Completion Bias expressed in practice: treating apparent completion as understanding. The engineer remains present, but is no longer embedded in the discovery process by which the system is formed. They become responsible for a system whose behavior they can describe, but no longer fully explain.

The risk is not that AI raises the level of abstraction. The risk is that it raises the level of abstraction while weakening the engineer’s ability to trace, test, and challenge what lies beneath it.

When that happens, velocity increases while assurance declines.

7. Instrumented Discovery Is Not Full Discovery

A common counterargument is that modern agents do not merely iterate internally. They run code, inspect failures, execute tests, call APIs, deploy to sandboxes, and observe system behavior. This is an important objection. Once agents interact with tools and environments, they are no longer operating only inside a closed narrative context. They encounter friction from outside the model.

That friction is real, but it is bounded. The key distinction is among five kinds of discovery: mechanical, operational, semantic, architectural, and risk. Tool-augmented agents are increasingly effective at mechanical discovery: syntax errors, broken tests, API mismatches, missing dependencies, runtime exceptions, and schema violations. They can also participate in operational discovery by surfacing deployment failures, performance issues, environment mismatches, and integration errors.

But semantic and architectural discovery are different. A system can compile, deploy, and pass its tests while still encoding the wrong business rule, the wrong access model, the wrong user workflow, or the wrong architectural boundary. In these cases, the issue is not that the agent failed to respond to friction. The issue is that the environment did not expose the right friction.

Risk discovery is different again. A system may function exactly as designed while still violating the organization’s security posture, privacy expectations, compliance obligations, or tolerance for failure. These are not always visible as runtime errors. They require judgment about what kinds of behavior are acceptable, not merely whether the behavior works.

A test suite is not reality. It is an executable map of someone’s assumptions about reality. Passing it shows that the system conforms to the map; it does not prove that the map is complete. The same is true of sandboxes, mock data, staging environments, and synthetic user flows. They are essential instruments, but they are still instruments. They expose constraints through the frame their designers provide: the data, tools, permissions, threat model, success criteria, and stopping conditions.

Adversarial agents make this instrumentation more powerful. A second model tasked with breaking the system can probe authorization boundaries, generate unexpected inputs, search for exploit paths, fuzz workflows, and challenge assumptions that the builder did not explicitly enumerate. This is materially better than passive validation. It turns the map into an active search process.

But adversarial search is still framed. It depends on what the adversary is allowed to see, which tools it can use, which objectives it is given, which harms are in scope, which environment it runs in, and which oracle determines success. It can discover counterexamples inside that frame. It cannot prove that the frame is complete. Tool use validates within the frame. Engineering asks whether the frame is enough.

This means adversarial agents do not eliminate the need for engineering judgment. They shift the question from “Did we run tests?” to “What adversary did we simulate, what uncertainty did that adversary reduce, and what classes of failure remain outside the simulation?”

This creates a local-maximum trap. Tool-driven agents can optimize rapidly for the nearest visible constraint: make the test pass, resolve the stack trace, satisfy the linter, preserve existing behavior. But local correction is not the same as global understanding. A locally successful patch may move the system further from a coherent architecture. The right answer may not be to patch the function; it may be to rethink the boundary, the data model, or the workflow.

Tool-augmented agents therefore expand discovery, but they do not complete it. They can perform many of the experiments. They can surface failures faster and across a wider surface area. But the interpretation of those experiments (whether the right thing was tested, whether the signal is representative, whether a passing result matters, and whether the original specification was wrong) remains the work of accountable engineering.

8. Toward a Governed Construction Model

Specification improves starting conditions. It does not remove the need for discovery.

A viable model treats AI as a participant in construction, not a replacement for it. The implication is not that autonomy is impossible, but that autonomy must be coupled with mechanisms that continuously test and revise the assumptions it operates on.

In this model, the agent is not merely an implementer. It is an experimental apparatus. It can generate candidate designs, run code, exercise integrations, test failure modes, and surface traces of system behavior. This expands the engineer’s ability to explore the problem space. But the apparatus does not define the question by itself. The engineer remains responsible for deciding what must be tested, what the results mean, and whether the original assumptions should be revised.

In this model:

  • Systems are built iteratively, not one-shot
  • Engineers remain embedded in the loop of construction and evaluation
  • AI accelerates exploration rather than replacing it

The unit of progress shifts.

Progress is no longer defined by outputs alone, but by the quality of the interaction that produces them.

The interaction among engineer, model, and tools is where:

  • assumptions are proposed and tested
  • designs are explored and refined
  • errors are introduced, detected, and corrected

The collaboration is the system-forming process.

9. Governance of the Interaction Surface

If the interaction among engineer, model, and tools is where systems are formed, it becomes the primary target for governance. The need for governance arises precisely because this interaction can produce coherent artifacts even when assumptions remain unresolved.

In AI-mediated development workflows, more of the decisive work can occur pre-commit, before code exists as a stable shared artifact. Traditional governance focuses on outputs: code quality, test coverage, CI/CD results, security reviews, and production incidents. These remain essential, but they are lagging indicators. They evaluate artifacts after assumptions have already been encoded into the system.

The role of interaction-level governance is to surface potential issues before they are committed into architecture, code, tests, policies, or deployment paths. The cost of an incorrect assumption increases with how far it propagates. A mistaken access-control assumption caught during exploration is cheap. The same assumption embedded into service boundaries, database design, and production workflows is expensive, and may only be discovered by users or attackers.

This does not require prescribing how engineers should work. It requires making the assumption trail visible enough to diagnose whether the system-forming process is actively reducing uncertainty or merely converting intent into plausible artifacts. The detailed criteria for making those signals visible are set out in the next section.

Within this frame, Interaction Quality is best understood as a diagnostic signal rather than a performance metric. It is not a measure of model fluency, output volume, developer speed, or individual productivity. It asks whether the human-AI loop is reducing uncertainty or allowing coherent output to stand in for evidence. Section 10 describes the signals required to distinguish those cases.

Without this visibility, organizations are forced to reconstruct reasoning from downstream artifacts. That reconstruction is indirect and often ambiguous: similar code, tests, and documentation can result either from disciplined exploration or from passive acceptance. By contrast, observing interaction patterns gives earlier signals of whether a system is converging toward correctness or drifting into narrative specification.

Governance of the interaction surface is therefore not a speed bump. It is a diagnostic layer. It complements downstream validation rather than replacing it, restoring visibility into how systems are formed rather than simply what they produce. If governance is applied only at the output layer, the Assurance Gap can widen quickly: system velocity increases while the organization’s ability to understand, validate, and take responsibility for the system does not keep pace.

The immediate priority is not to prescribe a specific technical implementation. It is to recognize that the interaction surface is the earliest place where system formation can be observed, and therefore the earliest place where the Assurance Gap can be detected. The next question is what a governance layer must be able to see.

10. Criteria for Closing the Assurance Gap

Any serious approach to governing AI-mediated engineering must make a minimum set of signals visible. These criteria are not a proposed implementation; they describe what organizations must be able to observe if they are to govern system formation rather than merely inspect its outputs.

First, it must provide pre-commit observability. The exploratory phase before version control is where intent is translated into design direction, architectural boundaries, generated code, tests, and assumptions about behavior. Traditional governance sees the result of this process. It rarely sees the process itself. A viable governance layer must make this phase visible enough to diagnose whether uncertainty is being reduced or merely converted into plausible artifacts.

Second, it must distinguish active engineering judgment from passive acceptance. The important question is not whether a human was technically “in the loop,” but what the human did while there. Did the engineer challenge the model’s architectural choices? Did they probe edge cases? Did they clarify ambiguous boundaries? Did they ask what could go wrong? Or did they accept coherent generated output because it appeared complete? Human presence alone is not governance. The quality of the interaction matters.

Third, it must surface and track assumptions. AI-generated artifacts often contain implicit claims about user roles, trust boundaries, data models, error handling, authorization, tenant isolation, integration behavior, and operational context. A governance capability must help identify when a high-risk assumption enters the system-forming process and whether that assumption was acknowledged, challenged, tested, or carried forward unexamined. The important question is not simply what the model generated, but what the generation assumed.
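
As a rough illustration of what tracking an assumption could mean, the sketch below records each assumption with one of the four states named above. The `Assumption` and `Status` types, the field names, and the example entries are all hypothetical. This is not a proposed implementation, only a minimal data structure showing that "carried forward unexamined" can be made queryable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    UNEXAMINED = "carried forward unexamined"
    ACKNOWLEDGED = "acknowledged"
    CHALLENGED = "challenged"
    TESTED = "tested"


@dataclass
class Assumption:
    claim: str      # what the generated artifact implicitly asserts
    area: str       # e.g. "authorization", "data model", "integration"
    high_risk: bool
    status: Status = Status.UNEXAMINED


def unexamined_high_risk(register: List[Assumption]) -> List[Assumption]:
    """Return the high-risk assumptions that nobody has yet acknowledged, challenged, or tested."""
    return [a for a in register if a.high_risk and a.status is Status.UNEXAMINED]


# Hypothetical register entries for the support-dashboard example.
register = [
    Assumption("Authenticated users may read any ticket by ID", "authorization", high_risk=True),
    Assumption("Exports fit in memory", "operations", high_risk=False),
    Assumption("Ticket status changes are reversible", "workflow", high_risk=True, status=Status.TESTED),
]

flagged = unexamined_high_risk(register)  # only the authorization assumption
```

The hard part is not the data structure but populating it: someone, or something, still has to notice that a generated endpoint embeds the first claim.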

Fourth, it must connect validation to uncertainty. Tests, sandboxes, staging environments, deployment feedback, and adversarial agents are valuable only insofar as they reduce meaningful uncertainty. A passing test suite shows that the system conforms to the test suite. It does not show that the test suite represents reality. A green pipeline is evidence, not proof; governance must ask what uncertainty the green result actually reduced. For tool-augmented agents, governance must ask whether the instruments are representative of the real constraints the system will face. For adversarial agents, it must ask who defined the adversary, what attack surface it could observe, what goals it pursued, what tools it could use, what counted as success, and which classes of harm remained out of scope.

Fifth, it must distinguish convergence from accumulation. AI-mediated workflows can produce large volumes of code, tests, documentation, configuration, and explanation. This accumulation can look like progress while the system’s semantic uncertainty remains unresolved. A governance capability must provide a signal of whether the development loop is converging toward a validated architecture or merely accumulating coherent artifacts. The relevant signal is not output volume. It is whether assumptions are being resolved against evidence.
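The distinction can be made concrete with a deliberately crude sketch. It assumes a team can count, at two points in time, how many artifacts exist and how many assumptions are resolved against evidence or still open. The `Snapshot` type, the `classify` function, and the three labels are hypothetical, and real signals would need far more nuance than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    artifacts: int  # cumulative generated files, tests, docs, configs
    resolved: int   # assumptions resolved against evidence so far
    open: int       # assumptions still unresolved


def classify(prev: Snapshot, curr: Snapshot) -> str:
    """Label the interval between two snapshots (illustrative heuristic)."""
    new_artifacts = curr.artifacts - prev.artifacts
    newly_resolved = curr.resolved - prev.resolved
    open_delta = curr.open - prev.open

    # Converging: evidence resolved assumptions and the open set did not grow.
    if newly_resolved > 0 and open_delta <= 0:
        return "converging"
    # Accumulating: output grew but nothing was resolved against evidence.
    if new_artifacts > 0 and newly_resolved == 0:
        return "accumulating"
    return "indeterminate"
```

Note that the number of new artifacts never contributes to the "converging" label. Output volume only matters in the negative case, where it grows while no assumption is resolved.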

Sixth, it must be risk-weighted. Not every generated artifact requires the same level of scrutiny. Governance should concentrate where incorrect assumptions have high blast radius: authorization, tenant isolation, data handling, external integrations, irreversible workflows, regulated behavior, architectural boundaries, and security-sensitive decisions. The goal is not to slow all development. It is to create earlier signals where being wrong is expensive.
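As a minimal sketch of risk weighting, the areas listed above can be treated as a set that triggers deeper scrutiny when a change touches any of them. The function name `scrutiny_level` and the two labels `"deep_review"` and `"standard"` are hypothetical, and how a change gets tagged with the areas it touches is assumed to happen elsewhere.

```python
# High blast-radius areas, taken directly from the list in the text above.
HIGH_BLAST_RADIUS = {
    "authorization",
    "tenant isolation",
    "data handling",
    "external integrations",
    "irreversible workflows",
    "regulated behavior",
    "architectural boundaries",
    "security-sensitive decisions",
}


def scrutiny_level(areas_touched: list[str]) -> tuple[str, list[str]]:
    """Return a scrutiny label plus the high-risk areas that triggered it."""
    touched = {area.strip().lower() for area in areas_touched}
    hits = sorted(touched & HIGH_BLAST_RADIUS)
    if hits:
        return ("deep_review", hits)
    return ("standard", [])
```

A change tagged `["Authorization", "logging"]` would be routed to deep review because of authorization alone, while a change tagged only `["ui copy"]` would pass through at the standard level. That is the intended effect: most work keeps its speed, and scrutiny concentrates where being wrong is expensive.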

Finally, it must be diagnostic rather than surveillant. The purpose of interaction governance is system assurance, not individual productivity measurement. It should help engineering leaders identify where the Assurance Gap is widening: where generation has outpaced comprehension, where assumptions are being accepted without challenge, or where validation is too narrow for the risk being taken. The goal is to intervene before structural flaws are committed, not to police the style or speed of individual developers.

A governance layer that satisfies these criteria would not eliminate uncertainty. That is not the goal. Its purpose is to make uncertainty visible while it is still cheap to act on. It would help teams see when AI is accelerating discovery, and when it is merely accelerating the appearance of completion.

Conclusion

The appeal of static-spec autonomy is the illusion of collapsed complexity. It suggests that understanding can be replaced by specification. This is incorrect.

Software systems do not emerge from specification alone. They are discovered through interaction with partial implementations under real constraints. AI does not eliminate engineering. It relocates it. The shift is not only in how much implementation effort is required, but in where the work of determining what should be built takes place.

Knowing what to build remains the dominant difficulty. AI reduces the cost of execution and accelerates exploration, but it does not by itself resolve the uncertainty inherent in specification. The opportunity is real: where feedback is fast, failure is reversible, and correctness is externally checkable, AI-mediated generation can be transformative. The warning begins when success in those contexts is generalized to systems whose correctness cannot be validated by immediate feedback.

AI can make construction faster, but construction is not the same as engineering. Engineering is the disciplined reduction of uncertainty under real constraints. Any workflow that accelerates construction while weakening that reduction of uncertainty is not progress; it is waterfall with a better interface.

The risk is not that engineers disappear. The risk is that they become detached from the process by which systems are formed and, once detached, lose the ability to recognize when the system is wrong.

The water looks calm from the shore. Depth is only discovered by getting in.
