Raw LLM Responses
Inspect the exact model output for any coded comment.
Look up by comment ID
Random samples

- ytr_UgzQO3hIA…: "AI chatbots are trained on human written code so you can't expect it to outperfo…"
- ytc_Ugy5hgliJ…: "I’m calling my psychiatrist right now to increase my Xanax dosage, then I’m goin…"
- ytc_UgztnweNU…: "You’ll have to unplug the internet to stop AI, so better start a contingency pla…"
- ytc_UgwRpITR7…: "YouTube is SMART! They are protecting their own company, because IF AI-generated…"
- ytc_UgwJWOGw1…: "This kind of logical discussion about personhood is totally unhinged from the fa…"
- ytr_UgzRNMoZb…: "@crmd5336 yeah, I feel like if you use it as inspiration then that's fine. Just …"
- ytr_UgzRsNh9J…: "I dunno if you realize, but you kind of created an argument for why AI art is go…"
- ytc_UgwhbOYV0…: "Me: goes to jail for slapping a lady dressed as a robot on the boonky😂…"
Comment
5:13 <Krystal> "And he would keep asking it [for a diagnosis based on the exact same data, and the evaluations would change] You get a B [..] You get a D [..] You get an F"
Yes: this is a core "design feature" of LLM / GPT-based chat tools.
There are two inherent problems:
1) if you are asking for summary statistics of raw data - e.g. trend analysis, first and second derivative, etc - you might achieve good-enough results. However, as soon as you step into unbounded "future probabilities" prediction rather than historic analysis, your risk of a poor response increases substantially.
One way to reduce such problems might be to provide a verified set of known data profiles that result in a solid, expert-verified diagnosis that would act as known anchors or markers for your own analysis to be considered against.
2) all that said, you're essentially fighting against foundational design principles. If you attempt to eradicate response variation completely (forcing exact repetition of responses for a specific prompt and its associated inputs), these tools essentially stop working (they stop producing responses humans find appealing).
Although you can tune "Temperature" - which increases or decreases the variability, randomness, or "creativity" of responses - you can only adjust it so far before the results at either end of the scale become poor.
This parameter acts as a "weighting" mechanism on the probability distribution of the next predicted token (word or word part). Again, you can only tune it a little.
youtube
2026-02-10T21:5…
♥ 1
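The temperature mechanism the comment above describes can be sketched in a few lines of Python. This is an illustrative toy, not any particular model's implementation; the raw next-token scores (logits) and function name are hypothetical.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample a token index from raw model scores (logits).

    Temperature divides the logits before the softmax:
    values below 1.0 sharpen the distribution (more repetitive output),
    values above 1.0 flatten it (more "creative"/random output).
    """
    scaled = [score / temperature for score in logits]
    # Softmax with max-subtraction for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index according to the resulting distribution.
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

At very low temperature the highest-scoring token is chosen almost every time (near-deterministic, but repetitive); at high temperature low-probability tokens are picked often (varied, but incoherent). This is why, as the comment notes, the parameter can only be pushed so far in either direction.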
Coding Result
| Dimension | Value |
|---|---|
| Responsibility | none |
| Reasoning | unclear |
| Policy | unclear |
| Emotion | indifference |
| Coded at | 2026-04-27T06:26:44.938723 |
Raw LLM Response
[
{"id":"ytc_Ugy4ZsFeJBrwcIx7kiZ4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_Ugzraf-Jcx6fmEZc1Ad4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_UgxRQEijIaqAPMS-Dct4AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"none","emotion":"fear"},
{"id":"ytc_UgyZO_QLVDGzHcPAw914AaABAg","responsibility":"distributed","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"ytc_Ugw0VgCOin3q1KDQRG94AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"unclear","emotion":"resignation"},
{"id":"ytc_Ugzg-7JyTAzWpeuOMNF4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"indifference"},
{"id":"ytc_UgyXF3aM3c6sKh79EDx4AaABAg","responsibility":"government","reasoning":"deontological","policy":"regulate","emotion":"outrage"},
{"id":"ytc_UgxB35mhJyV5uGQxqV94AaABAg","responsibility":"ai_itself","reasoning":"consequentialist","policy":"none","emotion":"fear"},
{"id":"ytc_UgxmlsQAeRWPEpbI65V4AaABAg","responsibility":"none","reasoning":"unclear","policy":"unclear","emotion":"outrage"},
{"id":"ytc_UgxnuHhJTIp0ZhUAhHp4AaABAg","responsibility":"ai_itself","reasoning":"deontological","policy":"ban","emotion":"outrage"}
]