This idea is interesting because it lets us interpret AI hallucinations in metaphysical terms. For instance, a model might mistakenly claim that the composer of Rocky's score is John Williams. Both Rocky and John Williams are extremely famous, but the metaphysical link between the two is weak in the model's internal representation. So if we could compute the internal representations of Rocky and John Williams, along with the relation between them, we might be able to detect the model's hallucinations from the inside.
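A minimal sketch of this detection idea, with everything hypothetical: the embeddings below are toy vectors standing in for a model's internal representations, and cosine similarity merely stands in for whatever measure of representational association one would actually use. (Bill Conti, not John Williams, scored Rocky.)

```python
import numpy as np

def association_strength(e1: np.ndarray, e2: np.ndarray) -> float:
    # Cosine similarity as a toy proxy for how strongly the
    # model's representations link two entities.
    return float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))

# Hypothetical toy embeddings (not real model activations).
rocky = np.array([0.9, 0.1, 0.3])
john_williams = np.array([0.2, 0.8, 0.4])
bill_conti = np.array([0.85, 0.2, 0.35])

THRESHOLD = 0.9  # hypothetical cutoff below which a claimed link is suspect

for name, vec in [("John Williams", john_williams), ("Bill Conti", bill_conti)]:
    score = association_strength(rocky, vec)
    verdict = "suspect" if score < THRESHOLD else "plausible"
    print(f"Rocky -- {name}: {score:.3f} ({verdict})")
```

With these toy vectors, the Rocky–Bill Conti pair scores well above the threshold while Rocky–John Williams falls below it, so the claim "John Williams composed Rocky" would be flagged. In a real system, the hard part is precisely what this sketch assumes away: extracting entity representations and a meaningful relation score from the model's internals.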