Hi, I'm CY. I study philosophy, democracy, and LLMs.

My research focuses on the epistemology of LLMs, normative interpretability of models, and applying various normative concepts to models.

Essays

Fri, February 6, 2026 · 7 min read

Alignment and Large Language Models

#AI #Alignment

Field Notes

CY @polis · Tue, February 24, 2026

Hallucination is interesting because we can understand it at a metaphysical level. For example, suppose a model says that the composer of Rocky is John Williams. Rocky is very famous, and so is John Williams, but the metaphysical relation between Rocky and John Williams is not stable in the model's internal representation. So if we can compute the internal representations of Rocky and of John Williams, and also of the relation between them, then we may be able to detect the model's hallucinations from the inside.
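
As a rough illustration of that idea, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the three-dimensional vectors are hand-made toys rather than real hidden states, the "composed by" relation is modeled as a simple vector offset estimated from pairs we already trust, and the 0.9 threshold is arbitrary. It is not a tested detection method, only one way the intuition could be written down.

```python
import math

def sub(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_offset(pairs):
    """Estimate a relation vector as the average (object - subject) offset."""
    offsets = [sub(obj, subj) for subj, obj in pairs]
    return [sum(col) / len(offsets) for col in zip(*offsets)]

def relation_stability(subject, obj, relation):
    """How well the subject -> object offset lines up with the relation vector."""
    return cosine(sub(obj, subject), relation)

# Hypothetical toy embeddings standing in for a model's internal representations.
EMB = {
    "Star Wars":     [1.0, 0.0, 0.2],
    "Jaws":          [0.8, 0.0, 0.1],
    "John Williams": [1.0, 1.0, 0.2],
    "Rocky":         [0.0, 0.0, 1.0],
    "Bill Conti":    [0.0, 1.0, 1.0],
}

# Assumed "composed by" relation, estimated from two pairs we trust.
COMPOSED_BY = mean_offset([
    (EMB["Star Wars"], EMB["John Williams"]),
    (EMB["Jaws"], EMB["John Williams"]),
])

THRESHOLD = 0.9  # arbitrary cut-off chosen for this toy example

def looks_hallucinated(film, composer):
    """Flag a claim whose offset does not fit the estimated relation."""
    score = relation_stability(EMB[film], EMB[composer], COMPOSED_BY)
    return score < THRESHOLD

stable = relation_stability(EMB["Rocky"], EMB["Bill Conti"], COMPOSED_BY)
unstable = relation_stability(EMB["Rocky"], EMB["John Williams"], COMPOSED_BY)
print(f"Rocky -> Bill Conti:    {stable:.3f}")
print(f"Rocky -> John Williams: {unstable:.3f}")
```

In this toy setup the true pair (Rocky, Bill Conti) fits the relation more closely than the hallucinated pair (Rocky, John Williams), so only the latter is flagged. With a real model, the vectors would have to come from its hidden states, and whether a relation is well described by a single offset is itself an open question.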

#AI #Hallucination #Metaphysics #NLP
CY @polis · Tue, February 24, 2026

It would be really interesting if we could somehow align models with accuracy and truthfulness. We could then implement truthfulness as a model's first priority.

#AI #Truthfulness #Accuracy #ModelAlignment
CY @polis · Tue, February 24, 2026

We can also come to understand that people who hold to accuracy and sincerity are sometimes not rational, at least in the utilitarian sense. For example, if a reporter seeks the truth, doing so may harm her and bring bad outcomes. On a utilitarian calculation, if people follow benefits only and pursue interests alone, then in many circumstances maintaining facts or maintaining illusions, and complying with the common sense of a certain culture, becomes the safer choice.

Yet we, as humans, still tend to pursue the truth and to feel terrible about lying, even when it costs us. This suggests that the pursuit of truth and accuracy is not a matter of instrumental rationality. It is a stable and autonomous function of agency. An honest person does not primarily consider her interests but instead speaks autonomously according to what she believes.

This is one reason we trust language: language concerns what we say and what we believe. It therefore seems that, on Bernard Williams's view, we often think honesty comes from rationality, while in fact rationality comes from honesty. The reason we, as humans, can conduct complex rational activities is that, at a fundamental level, we have a stable commitment to honesty, accuracy, and sincerity.

If all of us treated the pursuit of truth as a purely instrumental value, we could always adjust our truth-seeking to fit our interests and the social structures in which we are located. In that case, it would be difficult to explain how public knowledge systems accumulate over time. The histories of science and law would become merely strategic outcomes.

The pursuit of truth sometimes requires us to adhere strongly to what we believe. This may appear unwise at the local level, yet without such commitment we would lose the structure of rationality itself. In postmodern discussions of power structures and standpoint, we may reveal social structures, but this does not automatically provide normative instruction about the nature of accuracy and sincerity. Because of this fundamental tendency toward truth, we possess the freedom and intellectual structure necessary to understand social structures and history.

If we lose the pursuit of truth, we lose the freedom to criticize and the possibility that something can become more accurate. From this perspective, the pursuit of truth and accuracy does not arise from suddenly grasping a prior metaphysical structure of belief. It concerns understanding history, society, and humanity itself. Truth is something that survives within human history.

#truth #rationality #honesty #utilitarianism
