With the background perspective out of the way, we can get to the practical details. So far two stand out - they are by no means new issues, but they are often normalized in ways that could easily sow chaos.
One-off Branching
There is a range of content floating around about the evil of if statements - most of which amounts to preferring polymorphism over ifs. While practices such as polymorphism can feel cleaner, they're all branches under the hood, so why would one mechanism be deemed more dangerous than another?
While I've typically fallen into the anti-if camp in the interest of taming cyclomatic complexity, my recent experiences with Copilot clarified for me the significance of tightly controlling branching (and reminded me of past projects where that was not done).
This distinction certainly echoes previous debates about GOTO statements during the rise of structured programming, as does the underlying rationale. In addition to what may be more cosmetic benefits, approaches such as polymorphism guide you towards making sure that the system as a whole has a sound model and conceptual integrity, and that the resulting branching falls along the seams of that model. Any equivalent mechanism (including as many if statements as you can reliably reason through) can serve the same purpose.
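To make that concrete, here is a minimal sketch (in Python, using a hypothetical Discount model invented for illustration): both versions branch, and both are fine, because the branching falls along the seams of the model rather than around it.

```python
from abc import ABC, abstractmethod

class Discount(ABC):
    """Each subclass is a branch aligned with a seam of the model."""
    @abstractmethod
    def apply(self, price: float) -> float: ...

class NoDiscount(Discount):
    def apply(self, price: float) -> float:
        return price

class PercentageDiscount(Discount):
    def __init__(self, rate: float):
        self.rate = rate

    def apply(self, price: float) -> float:
        return price * (1 - self.rate)

# An equivalent dispatch with plain ifs - the same branches, still
# falling along the model's seams, just via a different mechanism:
def apply_discount(kind: str, price: float, rate: float = 0.0) -> float:
    if kind == "none":
        return price
    if kind == "percentage":
        return price * (1 - rate)
    raise ValueError(f"unknown discount kind: {kind}")
```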
What is not wanted is branching that allows for deviation from that model. Based on past experience this is incredibly common, and it is implied by code produced by Copilot (I would also not expect Copilot to be able to reason through better fixes). When a specific scenario comes up with undesirable behavior, a branch is added to set things back on course. This is far easier and more obvious than working to refine the system as a whole so that the model yields behavior which is consistent for that scenario, without requiring additional compensation. Outside of the current focus on GenAI, this may also be related to deficient requirements engineering practices that neglect such refinement (another topic I'll probably touch on at some point, though there's plenty of material from Bertrand Meyer and others). Over time such special cases (particularly as they compound on top of each other) yield a house of cards with unmaintainable complexity. This can lead to situations where systems rapidly accrete complexity as new logic incidentally compensates for past logic, while the option of making the system simpler by backing out those earlier decisions goes neglected.
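A hypothetical illustration of the pattern (the order fields and scenarios are invented for the sake of the sketch) - each branch exists to correct one observed scenario rather than to express the model:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Order:
    customer_id: int
    region: str
    weight_kg: float
    placed_on: date

def base_cost(order: Order) -> float:
    return 4.0 + 0.5 * order.weight_kg  # the model's answer

def shipping_cost(order: Order) -> float:
    cost = base_cost(order)
    # One-off patches steering specific scenarios back on course;
    # each one narrows what you can safely assume about the whole:
    if order.customer_id == 4721:  # a big account complained once
        cost *= 0.5
    if order.region == "EU" and order.weight_kg > 30:  # carrier bug workaround
        cost += 12.0
    if order.placed_on.weekday() == 6:  # compensates for a batch job quirk
        cost = max(cost, 5.0)
    return cost
```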
As a disclaimer, this is not to say that such approaches do not have their place; there may be times when the need for a fix is urgent and the more holistic adjustment is more time consuming or not forthcoming. Such scenarios are where actual technical debt comes into play (rather than the abuse of the phrase where it is used to refer to seemingly anything inconvenient). Introducing a quick if can allow you to gain something in the short term without fully paying for it, but if you don't manage those tradeoffs you end up insolvent.
In some of my recent sessions with Copilot, it introduced such branching in ways that fairly egregiously violated some of the component boundaries. It had also previously produced code local to a module that buried much of the intended semantics, which would make the system much harder to understand out of the gate, and with subsequent increments could make the overall flow inscrutable. Another closely related concept (which I may expand upon later) is that it seems worth differentiating code that is readable from code that is understandable. It is incredibly easy to look at code and think you know what it is doing, but without practices that extend beyond the low-hanging fruit of code conventions, that inferred behavior may not be accurate or complete (which brings us back to wrestling with edge and corner cases that could be designed out).
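As a small, invented sketch of that gap - this reads cleanly, but a quick scan produces an inference that is incomplete:

```python
def normalize(values: list[float]) -> list[float]:
    # Reads as: scale everything into [0, 1].
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

# What a quick read is unlikely to surface: this raises on an empty
# list and divides by zero when all values are equal - edge cases the
# "readable" surface gives no hint of, and which a sounder model of
# the inputs could have designed out.
```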
I've certainly created such systems early on in my career, and worked with such systems more recently. They are typically those where either no one wants to touch anything for fear of breaking things, or there is one resident SME, whom I'll call Fred in reference to Michael A. Jackson's Brilliance essay, to whom everyone goes in order to understand how to do their work. If Copilot assumes the role of Fred, it quickly flips from a tool of empowerment to one upon which we rely, and we could drive our front wheels off a cliff before we notice he's heading in that direction.
DWIMmery
An equally insidious but even more self-inflicted malady
is the tendency to sprinkle in some additional cleverness
so that the system will Do What I Mean (DWIM). I was
fairly surprised when Copilot did this: adding some
unsolicited sophistication to how data was being handled
to provide what seemed like reasonable
corrections. This was not done in response to any defined requirement, and therefore the resulting behavior was not specified in any way...but once added, it should be assumed that it will need to be supported, and so it sets the course for a future where users rely on unspecified and potentially poorly understood behavior.
My brief bewilderment was replaced with the hypothesis that
Copilot was simply (stochastically) parroting questionable
practices from its training data, practices that I've
dealt with (and almost certainly fallen into) in the past.
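I don't have the exact code to share, but the pattern looks something like this invented sketch - unsolicited "corrections" applied to input data, each of which becomes de facto behavior the moment it ships:

```python
def parse_quantity(raw: str) -> int:
    # DWIM-style "helpfulness": silently repairing input that no
    # requirement asked us to accept. Each repair is unspecified
    # behavior that callers will nonetheless come to depend on.
    cleaned = raw.strip().replace(",", "")    # "1,000" -> "1000"
    digits = "".join(ch for ch in cleaned if ch.isdigit())
    return int(digits) if digits else 0       # empty input quietly becomes 0
```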
Systems should consistently be built with defensive programming, but guarding against undesirable input does not imply that such inputs should be in any way improved. There's likely other supporting material, but there's a good talk from Greg Young floating around somewhere that goes into some of this. Returning an error for invalid input keeps the complexity down and enables forward-compatible support for such cases if they prove worth attention. Adding logic for these cases can lead to a nest of complexity which is perversely consolidated around rare and often lost causes. A telltale sign of such behavior is when a system seems very hard to understand, but it is then discovered that the happy path handling the majority of data remains simple and is eclipsed by code that is rarely exercised and provides nebulous (and often still not completely safe) value.
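The contrast, using the same hypothetical parse_quantity: guard against the bad input, but reject it rather than improve it. The happy path stays simple, and if a rejected case turns out to matter it can be specified and supported deliberately later:

```python
def parse_quantity(raw: str) -> int:
    # Defensive, but not "helpful": invalid input is rejected with an
    # error instead of being repaired into something unspecified.
    if not raw.isdigit():
        raise ValueError(f"invalid quantity: {raw!r}")
    return int(raw)
```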
As with most things, the danger here is in GenAI amplifying practices that are pernicious in any case. This is particularly concerning not just because of the volume, and the likelihood that GenAI may plough through logic that is structured in a way that people can't understand, but because this may be tucked into a larger code modification. When this is done manually it is likely to be discussed or otherwise get attention...in the best scenarios also being captured as a requirement/specification: but GenAI sprinkling in behaviors that have not been requested and have not been accounted for can leave systems which are not understood and difficult to support.