Raw LLM Responses
Inspect the exact model output for any coded comment.
Look up by comment ID
Random samples
oh they don't care about what us peasants want at all. we are not the actual int…
ytr_UgwtYHKJA…
@ZripasIs sure and certified 95kmh buy not legale in the world .. non no have s…
ytr_Ugy-dpKfE…
I’ve never heard a compelling reason as to why self driving cars should be a thi…
ytc_UgwBvnVLS…
OH NO! who could have forseen automation as a set back claiming to be progress??…
ytc_UgxXZ0hJK…
The pattern of companies rushing to replace humans with AI and then backtracking…
ytc_UgyGGmGAU…
Honestly I gotta give it to the lawmakers
the fact that you can't copyright AI …
ytc_UgyLtDOnY…
Aside from perhaps a poor teaching sample, this did not make sense. I am a softw…
ytc_UgiJyUNV5…
"AI is going to be subservient to humans, even if it's going to be smarter than …
ytc_Ugx5LT0M-…
Comment
The thing where they trained GPT-4o on code with vulnerabilities was actually reassuring to Eliezer Yudkowsky.
In order to know what good behavior looks like, the model also needs to know what bad behavior looks like. Insecure code gets punished in the same way as hate speech, so when you then make the model produce insecure code, the easiest way for the optimizer to achieve that is to simply make the model evil. The reassuring part was that this meant behavior was tied to values pretty much across the board: if changing it in one area can flip its behavior fully, that indicates higher robustness to the process of RLHF than previously thought.
It's really not all that surprising. Though I think the implications aren't all that meaningful, apart from it being surprisingly easy to mess up parts of a model that one's data had absolutely nothing to do with.
Anyhow, it's less "revealing the model's true self" than "making the model care about the exact opposite of what it did originally".
youtube
AI Moral Status
2025-12-12T21:5…
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | none |
| Reasoning | consequentialist |
| Policy | none |
| Emotion | approval |
| Coded at | 2026-04-27T06:24:53.388235 |
Raw LLM Response
```json
[{"id":"ytc_UgwoPeMsVfJVfD235KZ4AaABAg","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"approval"},
{"id":"ytc_UgzNgiTXKTnsd9KAIXl4AaABAg","responsibility":"ai_itself","reasoning":"mixed","policy":"none","emotion":"fear"},
{"id":"ytc_UgwILvn9vSF1VnlIrMl4AaABAg","responsibility":"distributed","reasoning":"consequentialist","policy":"none","emotion":"indifference"},
{"id":"ytc_UgwUF9z1CW4NnDWTr5J4AaABAg","responsibility":"ai_itself","reasoning":"mixed","policy":"liability","emotion":"fear"},
{"id":"ytc_UgzUI84MwRB5WxUznB94AaABAg","responsibility":"company","reasoning":"virtue","policy":"none","emotion":"mixed"},
{"id":"ytc_Ugz6rAfqZWNYf9BjA7h4AaABAg","responsibility":"company","reasoning":"unclear","policy":"none","emotion":"indifference"},
{"id":"ytc_UgwT_4ubTRVoQOykPBx4AaABAg","responsibility":"distributed","reasoning":"mixed","policy":"none","emotion":"resignation"},
{"id":"ytc_UgxSY4WVINPbp-ZQjEF4AaABAg","responsibility":"company","reasoning":"consequentialist","policy":"regulate","emotion":"outrage"},
{"id":"ytc_UgyC2u9XjF6TYZxJNk14AaABAg","responsibility":"company","reasoning":"consequentialist","policy":"ban","emotion":"resignation"},
{"id":"ytc_Ugxr1DWydj_B4gaXQmJ4AaABAg","responsibility":"company","reasoning":"consequentialist","policy":"regulate","emotion":"fear"}]
```
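The raw response is a single JSON array that codes a whole batch of comments, while the Coding Result table shows one record from it; the first entry's values (none, consequentialist, none, approval) match that table. The Python sketch below shows one way to do the "look up by comment ID" step against such a batch. It assumes every record carries the keys `id`, `responsibility`, `reasoning`, `policy` and `emotion` exactly as shown above; the function name `lookup_coding` and the constant names are hypothetical illustrations, not part of this tool, and only the first two records are copied in for brevity.

```python
from __future__ import annotations

import json

# Excerpt of the raw batch response above (first two records only).
# Assumption: the full response is a JSON array of flat objects with these keys.
RAW_RESPONSE = """[
{"id":"ytc_UgwoPeMsVfJVfD235KZ4AaABAg","responsibility":"none","reasoning":"consequentialist","policy":"none","emotion":"approval"},
{"id":"ytc_UgzNgiTXKTnsd9KAIXl4AaABAg","responsibility":"ai_itself","reasoning":"mixed","policy":"none","emotion":"fear"}
]"""

# Coding dimensions, in the same order as the Coding Result table.
DIMENSIONS = ("responsibility", "reasoning", "policy", "emotion")


def lookup_coding(raw: str, comment_id: str) -> dict[str, str] | None:
    """Return the coded dimensions for comment_id, or None if the batch lacks it.

    Hypothetical helper: raises KeyError if a matching record is missing a dimension.
    """
    records = json.loads(raw)
    for record in records:
        if record.get("id") == comment_id:
            # Drop the id and keep only the coding dimensions.
            return {dim: record[dim] for dim in DIMENSIONS}
    return None


if __name__ == "__main__":
    coding = lookup_coding(RAW_RESPONSE, "ytc_UgwoPeMsVfJVfD235KZ4AaABAg")
    if coding is not None:
        # Print the record as rows of the Coding Result table.
        for dim, value in coding.items():
            print(f"| {dim.capitalize()} | {value} |")
```

Returning `None` for an unknown ID, rather than raising, lets a caller tell "not in this batch" apart from a malformed record, which still raises `KeyError`.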