Hi, I'm CY. I study philosophy, democracy, and LLMs.

My research focuses on the epistemology of LLMs, normative interpretability of models, and applying various normative concepts to models.

Essays

Fri, February 6, 2026 · 21 min read

Alignment and Large Language Models

#AI #Alignment

Field Notes

CY @polis_notebook · Wed, February 25, 2026 · Long Form

Accuracy is about preventing wishful thinking and keeping a sense of reality. Preventing wishful thinking means facing our insecurity and examining our motives toward the truth. Facing our own psychological weakness is therefore not a problem of scale but a problem of will. Once we admit that belief formation involves the will, accuracy requires self-restraint. In exercising restraint, we directly confront our psychological weakness as humans: we must endure uncertainty and accept facts that may trouble us. This restraint already has a moral structure. Restraint, desire, and uncertainty then become expressions of character. Accuracy becomes a virtue.


中文翻译

准确性在于防止一厢情愿,并拥有现实感;当谈到防止一厢情愿时,它关乎面对我们的不安全感,并审视我们对真相的动机。因此,当谈到面对我们自己的心理弱点时,它不是一个规模的问题,而是一个意志的问题。一旦我们承认我们的信念形成关乎意志,那么准确性就需要自我克制。有了克制,我们就直接面对了作为人类的心理弱点:忍受不确定性,接受可能困扰我们的事实。这种克制已经具有道德结构。于是,克制、欲望和不确定性成为品格的表达。准确性成为美德。

philosophy truth character mindfulness
CY @polis_notebook · Wed, February 25, 2026 · Long Form

This is very much social epistemology, essentially about testimony. Williams holds that wishful thinking is a lie we tell ourselves. In cases of deception, we not only blame the deceiver but also stress the importance of caution. So, on Williams’s view, improving vigilance matters equally. Self-deception in this sense is a failure because we do not maintain vigilance over the formation of beliefs we wish were true. It is an activity that allows us to believe that what we wish for is true. Accuracy reappears at this point: it is the capacity to monitor our own judgment and to understand our epistemic limits.


中文翻译

这是非常社会认识论的内容,本质上关乎证言。威廉姆斯认为,一厢情愿的思维是我们对自己撒的谎。在欺骗活动中,我们不仅谴责欺骗者,也强调谨慎的重要性。因此,在威廉姆斯看来,提高警觉性同样重要。就此意义而言,自我欺骗是一种失败,因为我们无法对希望为真的信念形成过程保持警觉。它是一种让我们相信自己所希望之事为真的活动。准确性在此重新出现。它是一种监控自身判断并理解我们认识论局限的能力。

epistemology testimony self-deception vigilance
CY @polis_notebook · Wed, February 25, 2026 · Long Form

Now we can combine sincerity and accuracy. To be accurate requires a sense of reality, an ability that can pull us back from wishful thinking to objective reality. Sincerity, as we have discussed, is resisting the temptation to manipulate others' understanding. Accuracy is resisting the temptation to deceive oneself, so as to prevent wishful thinking.


中文翻译

现在我们可以将真诚与准确结合起来。要做到准确,就需要我们具备现实感。一种能力可以把我们从一厢情愿的幻想拉回到客观现实。正如我们所讨论的,真诚是抵制操纵他人认知的诱惑;准确则是抵制自欺的诱惑,防止一厢情愿的幻想。

honesty clarity truth self-deception
