Intent is all you need

This post is about how, in product development, the key question is increasingly how we hand our intent to the LLM.

Code is the surface of intent

LLMs read a program's intent from its names

When Names Disappear: Revealing What LLMs Actually Understand About Code

Large Language Models (LLMs) achieve strong results on code tasks, but how they derive program meaning remains unclear. We argue that code communicates through two channels: structural semantics, which define formal behavior, and human-interpretable naming, which conveys intent. Removing the naming channel severely degrades intent-level tasks such as summarization, where models regress to line-by-line descriptions. Surprisingly, we also observe consistent reductions on execution tasks that should depend only on structure, revealing that current benchmarks reward memorization of naming patterns rather than genuine semantic reasoning. To disentangle these effects, we introduce a suite of semantics-preserving obfuscations and show that they expose identifier leakage across both summarization and execution. Building on these insights, we release ClassEval-Obf, an obfuscation-enhanced benchmark that systematically suppresses naming cues while preserving behavior. Our results demonstrate that ClassEval-Obf reduces inflated performance gaps, weakens memorization shortcuts, and provides a more reliable basis for assessing LLMs' code understanding and generalization.

arXiv.org

According to this paper, an LLM has two channels for understanding code:

naming channel
- Interprets each symbol through its natural-language meaning
structural channel
- Infers what the code does from control flow and operation patterns
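To make "removing the naming channel" concrete, here is a minimal Python sketch of the simplest semantics-preserving obfuscation: renaming user-defined identifiers to opaque ones such as `v0`, `v1`. This is my own illustration, not the paper's tooling (its obfuscation suite is broader). It assumes single-module code without `global`/`nonlocal`, keyword-argument call sites, or attribute renaming; the names `RenameIdentifiers` and `obfuscate` are hypothetical. Behavior is unchanged, so a model reading the result has only the structural channel to work with.

```python
import ast
import builtins


class RenameIdentifiers(ast.NodeTransformer):
    """Rename user-defined functions, arguments, and variables to opaque names.

    Simplified sketch: assumes one module, no global/nonlocal, no keyword
    arguments at call sites, and leaves attribute names untouched.
    """

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}

    def _opaque(self, name: str) -> str:
        # Builtins (sum, len, range, ...) keep their names so behavior is unchanged.
        if name in vars(builtins):
            return name
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._opaque(node.name)
        self.generic_visit(node)  # then rename arguments and body names
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._opaque(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._opaque(node.id)
        return node


def obfuscate(source: str) -> str:
    """Return source with the naming channel stripped (requires Python 3.9+)."""
    tree = RenameIdentifiers().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


original = """
def average_order_value(orders):
    total = sum(orders)
    return total / len(orders)
"""
obfuscated = obfuscate(original)
print(obfuscated)

# Both versions compute the same result; only the names are gone.
ns_a: dict = {}
ns_b: dict = {}
exec(original, ns_a)
exec(obfuscated, ns_b)
assert ns_a["average_order_value"]([10, 20, 30]) == ns_b["v0"]([10, 20, 30])
```

The obfuscated function keeps the same control flow and calls to `sum` and `len`, but nothing in it says "average order value" any more; that is the information a summarizing model loses.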

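Models usually rely on the naming channel. Put differently, training seems to teach them that reading code as natural language is the more reliable strategy, presumably because the code in their training data is mostly well named.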

This tendency is not specific to this paper; various related studies report it as well.

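What follows from these results is that if a coding agent happens to choose a bad name, its accuracy at interpreting intent drops in later tasks. Ask it to name things in that state and it produces even more bad names, so the code becomes progressively harder to read.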

Because a coding agent keeps generating code even when its reading of the existing code is shaky, it is hard to notice that this decay is under way.

How do we get good names?

Broken down further, there are two approaches: convey the intent up front, or check afterwards that the names actually match the intent.

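Up-front approaches include stating the requirements, that is, the why, clearly in a PRD; writing down design decisions that cannot be read from the code as a design doc and handing it over; having the agent read ADRs so it knows the criteria humans used when deciding; and writing unit tests as an executable specification that carries intent. Listing anti-patterns in the prompt, such as words you never want used, also helps.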
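As an illustration of a unit test that doubles as an executable specification carrying intent, here is a minimal Python sketch using the standard `unittest` module. The shipping rule, the threshold values, and the names `shipping_fee` and `ShippingFeeSpec` are all hypothetical. The point is that the comment records the why and the test names state the intended behavior, so both a human and a coding agent can pick up the domain vocabulary (here, "fee" rather than "tax").

```python
import unittest
from decimal import Decimal

# Hypothetical business rule, stated the way a PRD would state the "why":
# free shipping exists to raise average order value, so the threshold is
# compared against the subtotal *before* tax.
FREE_SHIPPING_THRESHOLD = Decimal("5000")
STANDARD_SHIPPING_FEE = Decimal("500")


def shipping_fee(subtotal_before_tax: Decimal) -> Decimal:
    """Return the shipping fee (a fee, not a tax) charged for an order."""
    if subtotal_before_tax >= FREE_SHIPPING_THRESHOLD:
        return Decimal("0")
    return STANDARD_SHIPPING_FEE


class ShippingFeeSpec(unittest.TestCase):
    """Executable spec: test names describe intent, not implementation steps."""

    def test_fee_is_waived_once_subtotal_reaches_threshold(self) -> None:
        self.assertEqual(shipping_fee(Decimal("5000")), Decimal("0"))

    def test_standard_fee_applies_just_below_threshold(self) -> None:
        self.assertEqual(shipping_fee(Decimal("4999")), STANDARD_SHIPPING_FEE)


if __name__ == "__main__":
    # exit=False so the runner does not call sys.exit after reporting.
    unittest.main(argv=["shipping_fee_spec"], exit=False)
```

A test named `test_fee_is_waived_once_subtotal_reaches_threshold` tells the agent what the rule is for; a name like `test_case_3` would leave it guessing.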

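For the after-the-fact approach, I suspect human review is the only real option. You can lighten the load somewhat by having an LLM look for naming mismatches against the docs and tests, or flag names that hit the anti-pattern list. Subtle nuances, though, still have to be judged by a human: for example, whether fee, tax, or charge is the right word.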
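Part of the anti-pattern check can also be done mechanically before any LLM or human review. The sketch below swaps in a plain deterministic check rather than an LLM review: it uses Python's standard `ast` module to collect the names a module defines and flags those containing banned words. The `BANNED_WORDS` list and the helper names are hypothetical placeholders; deciding between fee, tax, and charge is exactly what a filter like this cannot do.

```python
import ast
import re

# Hypothetical anti-pattern list; in practice it could be the same list
# you already put in the coding agent's prompt.
BANNED_WORDS = {"data", "info", "manager", "util", "tmp"}


def split_words(identifier: str) -> list[str]:
    """Split snake_case / camelCase identifiers into lowercase words."""
    words: list[str] = []
    for part in identifier.split("_"):
        words.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part))
    return [w.lower() for w in words]


def defined_names(source: str) -> set[str]:
    """Collect names the code defines: functions, classes, arguments, assigned variables."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


def naming_violations(source: str, banned: set[str] = BANNED_WORDS) -> dict[str, list[str]]:
    """Map each offending identifier to the banned words it contains."""
    report: dict[str, list[str]] = {}
    for name in sorted(defined_names(source)):
        hits = [w for w in split_words(name) if w in banned]
        if hits:
            report[name] = hits
    return report


sample = """
class OrderManager:
    def load_data(self, order_id):
        tmp = order_id
        return tmp
"""
print(naming_violations(sample))
```

For the sample it reports `{'OrderManager': ['manager'], 'load_data': ['data'], 'tmp': ['tmp']}`, leaving `order_id` and `self` alone. The flagged names are candidates for review, not automatic rejections.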

Intent debt is the root of it all

A recent paper raises the possibility that coding agents are accelerating how fast we take on intent debt.

From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI

Generative AI is accelerating software development, but may quietly shift where the most significant risks lie. As AI generates code faster than teams can understand it, two under appreciated forms of debt accumulate: cognitive debt, the erosion of shared understanding across a team, and intent debt, the absence of externalized rationale that developers and AI agents need to work safely with code. This article proposes a Triple Debt Model for reasoning about software health, built around three interacting debt types: technical debt in code, cognitive debt in people, and intent debt in externalized knowledge. Cognitive debt is a team-level, project-level property reflecting the erosion of shared understanding across a software system over time, leading to increasingly inadequate shared mental models for reasoning about and safely changing the system. Intent debt refers to the absence or erosion of explicit rationale, goals, and constraints that guide how humans and agents evolve the system. We discuss how generative AI changes the relative importance of these debt types, how each can be diagnosed and mitigated, and surface points of debate for practitioners.

arXiv.org

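Roughly put, intent debt here means that neither humans nor LLMs can access the information they need to write the code. Concretely, the requirements, specifications, architecture, tests, and so on either do not exist or cannot be reached, for example because of missing permissions or because they sit somewhere nobody would think to look. Have anyone write code in that state, a human as much as a coding agent, and of course the result is "not what I had in mind."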

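The paper also points out that intent debt feeds cognitive debt, that is, it erodes our understanding of the software. Intent debt, cognitive debt, and technical debt can each amplify the others, but I suspect the usual sequence starts with intent debt introduced through the LLM, which accumulates unnoticed and then turns into cognitive and technical debt. The "unnoticed" part connects to the discussion of cognitive surrender. Agile development had a loop of gaining information by actually building something and then making the next decision from there. When developing with a coding agent, however, one person can keep piling up debt alone and not notice until it is past the point of recovery.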

Write your intent down

Two separate papers point to the same lesson: conveying intent properly to the coding agent matters. Doing so should at least mitigate the problems they describe.

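Probably the most cost-effective intervention is to write documents in formats with a proven track record, such as PRDs, design docs, and ADRs, and have the LLM read them. Most spec-driven development approaches are still spec-first, and letting the LLM write those specs produces AI slop that becomes unmaintainable, so I would avoid that. If you do bring an LLM in here, limit it to explaining what each format is, adjusting templates, drafting examples, and discussing specific points while you write.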

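I think what matters is to start writing, covering only what you can and what you understand. It is fine to let the coding agent write code first and then add to the documents whenever you learn something from the result. Above all, first create a place where things can be added. ThoughtWorks has also said that making the inputs to the context explicit is effective to a degree, and that externalizing developers' tacit knowledge is important.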
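One low-effort way to create a place where things can be added is to scaffold ADRs so that starting one is a single call. The sketch below assumes a hypothetical `docs/adr` directory and a hypothetical helper `new_adr`; the section headings follow Michael Nygard's commonly used ADR format (Status, Context, Decision, Consequences). It only creates an empty skeleton with prompting comments; the content is still yours to write.

```python
import re
from datetime import date
from pathlib import Path

# Sections follow Michael Nygard's ADR format; the comments prompt for intent.
ADR_TEMPLATE = """# {number}. {title}

Date: {today}

## Status

Proposed

## Context

<!-- Why is this decision needed? What constraints and forces apply? -->

## Decision

<!-- What did we decide? -->

## Consequences

<!-- What becomes easier or harder as a result? -->
"""


def slugify(title: str) -> str:
    """Lowercase ASCII slug; non-alphanumeric runs become single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def new_adr(title: str, adr_dir: Path = Path("docs/adr")) -> Path:
    """Create the next numbered ADR file (0001-..., 0002-...) and return its path."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    numbers = [
        int(m.group(1))
        for p in adr_dir.glob("*.md")
        if (m := re.match(r"(\d{4})-", p.name))
    ]
    number = max(numbers, default=0) + 1
    slug = slugify(title) or "untitled"  # e.g. a title with no ASCII letters
    path = adr_dir / f"{number:04d}-{slug}.md"
    path.write_text(
        ADR_TEMPLATE.format(number=number, title=title, today=date.today().isoformat()),
        encoding="utf-8",
    )
    return path
```

In an empty directory, `new_adr("Use Decimal for money amounts")` creates `docs/adr/0001-use-decimal-for-money-amounts.md`, and the next call gets number 0002. Titles with no ASCII letters or digits, such as a Japanese title, fall back to the slug `untitled`.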

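However, avoid documents that duplicate what the code already says; they only make maintenance harder. You need to judge what counts as "necessary documentation" in the agile sense and write that.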