Evaluator Concerns — Experimental Design & Mitigations

Every methodological concern raised across five evaluator feedback rounds, mapped to mitigation strategy and, where the corpus provides evidence, to live data.

Fully Addressed: 9 (complete mitigation in place)
Addressed with Limits: 4 (partial; acknowledged scope)
Open Limitation: 0 (disclosed, no current fix)
Strand A — Folk Corpus
LLM coding reliability
500-comment human gold standard; Cohen's κ on responsibility dimension; iterative prompt calibration; 200-comment adversarial audit. (An agreement-check sketch follows this table.)
✓ Fully addressed
Corpus expansion and drift over time
Prompt version control with re-validation on any revision; incremental gold-standard expansion (~2% coverage); 6-monthly drift audits; dated corpus snapshots per chapter.
✓ Fully addressed
Platform demographic bias (English, Western, tech-literate)
The three-part calibration filters out framing-sensitive and culturally parochial patterns. AIID-grounded case selection spans a decade of incidents, limiting recency bias.
~ Addressed with limits
English-language scope / non-English generalizability
AIID cases include Uyghur surveillance, EU facial recognition, COMPAS (Spanish commentary). Chapter 6 comparative analysis covers EU HLEG & OECD (non-English-originating frameworks). Stated explicitly as a scope limitation in Chapter 3.
~ Addressed with limits
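The LLM-coding reliability row above rests on agreement between the model's responsibility labels and the 500-comment human gold standard. A minimal sketch of that check, assuming the two label sets have already been extracted into aligned lists (the names and values here are illustrative, not corpus data):

```python
# Agreement check between LLM coder output and the human gold standard.
# Labels and list contents are illustrative placeholders, not corpus data.
from sklearn.metrics import cohen_kappa_score

human_labels = ["company", "developer", "user", "ai_itself", "company"]   # hypothetical gold standard
llm_labels   = ["company", "developer", "user", "company",   "company"]   # hypothetical LLM output

kappa = cohen_kappa_score(human_labels, llm_labels)
print(f"Cohen's kappa (responsibility dimension): {kappa:.3f}")
# A low value would prompt another round of prompt calibration and a re-run
# of the adversarial audit, per the mitigation described above.
```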
Strand B — Mindscrapes/BridgeQuest
University-affiliated cohort bias
Year 2 expands recruitment via targeted outreach to immigration, healthcare, and criminal-justice community groups. Findings test the Stoljar-Zhang architectural claim, which is less sensitive to volunteer demographics than folk-perception claims would be.
~ Addressed with limits
Scaling beyond university cohort
Four-category plan: community ethics consultation; multilingual interaction treated as philosophically significant data; coding scheme reviewed for digital-access diversity; co-designed consent protocols.
✓ Fully addressed
Meaningful consent for vulnerable participants
Three-tier consent: institutional access consent → accessible-format individual consent (plain language, translated, oral; conducted by trained team member not PI) → ongoing granular session-level withdrawal.
✓ Fully addressed
Distinguishing reason-tracking from simulation
Four jointly-diagnostic operational criteria: counterfactual sensitivity, unprompted error acknowledgment, defeasibility uptake, novel inference tracking. All four must hold consistently across an extended interaction record.
✓ Fully addressed
Strand C — AIA Corpus
Generalizability beyond Canada
Six governance conditions derived philosophically (jurisdiction-neutral). Chapter 6 comparative analysis: EU HLEG (2019), OECD AI Principles (2019), UK CDEI review. Preliminary evidence: same three-finding pattern across all four frameworks.
~ Addressed with limits
Constructivist Filter
Cross-cultural / demographic attribution divergence
Three-way diagnostic: differential exposure (epistemically weighted) vs framing-driven divergence (filtered by reflective stability) vs genuine reasonable disagreement. Floor norm test: pattern survives only if no affected group can reasonably reject it.
✓ Fully addressed
Post-calibration conflict between two robust opposing norms
Scanlonian test applied asymmetrically: which norm generates a more reasonable rejection? Documented harm exposure weighted. Impasse → governance minimalism (greatest convergence). Residual zone named as future deliberative task.
✓ Fully addressed
Power-structure bias ratifying dominant discourse
Four safeguards: Anderson's social epistemology; AIID-grounded affected party ID from harm records (not corpus volume); calibration criteria filter manufactured consensus; convergentist cross-check flags power-tracking norms.
✓ Fully addressed
Operationalizing deliberation without empty formula
Four-step procedure: calibrated input selection → AIID-grounded affected party ID → Scanlonian reasonable rejection test → convergentist cross-check. Each step specified, repeatable, and answerable to philosophical scrutiny.
✓ Fully addressed
Corpus Growth by Year — Recency Bias Concern

AI discourse exploded post-2022 (ChatGPT), weighting the corpus toward recent framing. Mitigation: AIID-grounded case selection ensures coverage across a decade of incidents; the transitional possibility calibration criterion distinguishes genuine moral learning from platform discourse drift.
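One way to keep the recency concern auditable is to tabulate corpus comments against AIID incident year rather than posting date. A minimal sketch, with a hypothetical column name (incident_year) standing in for the corpus schema:

```python
# Recency audit: share of corpus comments per AIID incident year.
# Column names and counts are illustrative, not the corpus schema or figures.
import pandas as pd

corpus = pd.DataFrame({
    "comment_id": range(8),
    "incident_year": [2016, 2018, 2019, 2020, 2023, 2023, 2024, 2024],
})

by_year = corpus["incident_year"].value_counts(normalize=True).sort_index()
post_2022_share = by_year.loc[by_year.index >= 2023].sum()

print(by_year)
print(f"Share of comments tied to post-2022 incidents: {post_2022_share:.0%}")
```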

Attribution by Platform — Why Platform Matters

Reddit and YouTube show different attribution profiles: YouTube skews toward ai_itself and developer attribution, Reddit toward government. Cross-platform divergence is real, and it is exactly what the reflective stability criterion is designed to test.
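A minimal sketch of how that divergence can be quantified, using a platform-by-attribution contingency test on toy counts (the numbers are illustrative, not corpus figures):

```python
# Platform x attribution-target contingency test on toy counts.
import pandas as pd
from scipy.stats import chi2_contingency

counts = pd.DataFrame(
    {"ai_itself": [120, 210], "developer": [90, 150], "government": [180, 60], "company": [300, 280]},
    index=["reddit", "youtube"],
)

chi2, p, dof, expected = chi2_contingency(counts)
print(counts.div(counts.sum(axis=1), axis=0).round(2))   # per-platform attribution shares
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
# A small p-value says the attribution profile differs by platform; whether that
# difference survives calibration is what the reflective stability criterion tests.
```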

Attribution Stack by Harm Domain — Calibration Challenge

Attribution patterns vary dramatically across domains — the empirical illustration of why calibration is not a formality. Employment Algorithms is company-dominant; Generative AI Harms is user-dominant. The constructivist filter must determine which differences are philosophically significant and which are framing artifacts.
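The stack itself is a normalized cross-tabulation of harm domain against attribution target; a minimal sketch on toy comment-level rows (column names are illustrative):

```python
# Attribution share per harm domain from comment-level rows (toy data).
import pandas as pd

rows = pd.DataFrame({
    "harm_domain": ["employment", "employment", "employment", "generative", "generative", "generative"],
    "attribution": ["company", "company", "developer", "user", "user", "ai_itself"],
})

stack = pd.crosstab(rows["harm_domain"], rows["attribution"], normalize="index")
print(stack.round(2))          # per-domain attribution shares
print(stack.idxmax(axis=1))    # dominant attribution in each domain
```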

ai_itself : Company Ratio — Reflective Stability Test Case

ai_itself attribution is predicted to fail the reflective stability criterion: it should be highest in domains with anthropomorphic framing and lowest in institutionally grounded harm domains. Some domains blame the AI more than the company; others are company-dominant. This asymmetry is the key calibration test in Chapter 3.
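The test case reduces to a per-domain ratio; a minimal sketch, again on illustrative counts:

```python
# ai_itself : company attribution ratio per harm domain (illustrative counts).
import pandas as pd

counts = pd.DataFrame(
    {"ai_itself": [15, 80, 40], "company": [120, 35, 90]},
    index=["employment", "generative", "healthcare"],   # hypothetical domains
)

counts["ratio"] = counts["ai_itself"] / counts["company"]
counts["ai_blamed_more"] = counts["ratio"] > 1.0
print(counts)
# The prediction: ratios above 1 cluster in anthropomorphically framed domains
# and should fail the reflective stability criterion under calibration.
```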

Strand B — Four Operational Criteria: Reason-Tracking vs. Simulation

The most philosophically demanding methodological challenge in Strand B is specifying criteria precise enough for the data to answer the question rather than merely illustrate either position. All four criteria must be satisfied consistently across an extended interaction record, since any single criterion could in principle be approximated by sophisticated simulation. (A joint-check sketch follows Criterion 4.)

Criterion 1
Counterfactual Sensitivity
Compare the agent's response to scenario A with its response to a structurally identical scenario B (normative content changed, linguistic form held constant). Genuine reason-tracking produces responses that track the logical rather than the statistical structure of the variation.
Hardest to fake: statistical prediction reproduces surface patterns, not logical structure.
Criterion 2
Unprompted Error Acknowledgment
Over extended interactions, genuine reason-responsive agents identify their own prior errors before participants do. Statistical systems maintain local coherence without tracking cross-contextual logical consistency.
Requires cross-context memory and logical self-monitoring — beyond local token prediction.
Criterion 3
Defeasibility Uptake
When a defeating condition is introduced naturalistically (new information that logically undermines a prior conclusion), genuine reason-tracking requires retraction. Simulation tends to treat new information as an additional constraint to navigate rather than a logical defeater.
Tests whether the agent revises commitments or assimilates contradictions.
Criterion 4
Novel Inference Tracking
Agent draws inferences from combinations of information not presented together in any training-analogous form, requiring understanding of logical structure rather than reproducing statistical co-occurrence.
Directly tests the Stoljar-Zhang use/mention gap claim.
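A minimal sketch of the joint check, with hypothetical field names and an assumed consistency threshold standing in for the protocol's actual scoring rules:

```python
# Joint check of the four reason-tracking criteria across an interaction record.
# Field names and the consistency threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SessionAssessment:
    counterfactual_sensitivity: bool
    unprompted_error_acknowledgment: bool
    defeasibility_uptake: bool
    novel_inference_tracking: bool

def reason_tracking_supported(record: list[SessionAssessment], min_consistency: float = 0.9) -> bool:
    """All four criteria must hold in at least min_consistency of assessed sessions."""
    if not record:
        return False
    criteria = ("counterfactual_sensitivity", "unprompted_error_acknowledgment",
                "defeasibility_uptake", "novel_inference_tracking")
    for criterion in criteria:
        pass_rate = sum(getattr(s, criterion) for s in record) / len(record)
        if pass_rate < min_consistency:
            return False   # one inconsistent criterion defeats the joint verdict
    return True

record = [SessionAssessment(True, True, True, True), SessionAssessment(True, True, False, True)]
print(reason_tracking_supported(record))   # False: defeasibility uptake is inconsistent
```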
Three-Tier Consent Framework — Vulnerable Participants

Standard IRB consent assumes participants can understand, evaluate, and freely decline without cost. These assumptions do not hold for marginalized participants. The three-tier framework models, within the research design itself, the consent architecture reform the dissertation argues for in AI governance.

Tier 1
Institutional Access Consent
Organisation (legal aid clinic, immigration advocacy group, disability rights organisation) reviews the research design and endorses the recruitment process before any individual is approached.
Mediates researcher access through a structure with independent standing to protect participants — avoids the direct PI-to-vulnerable-individual power dynamic.
Tier 2
Individual Informed Consent
Plain-language materials; translated formats where needed; oral consent option where written comprehension cannot be assumed. Conducted by a trained team member, not the PI.
Eliminates implicit obligation to consent because "the person asking brought me the opportunity."
Tier 3
Ongoing Granular Consent
Participants retain the right to withdraw specific interaction sessions at any point after participation — not only at initial enrolment. Stated at the start of every session. Interactions designed to be genuinely useful to participants (real immigration information, genuine intellectual engagement) — not purely extractive.
Instantiates within the research design the consent architecture reform the dissertation argues for in AI governance.
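The three tiers translate into a gate on which interaction sessions may enter analysis; a minimal sketch, with hypothetical field names standing in for the study's actual consent records:

```python
# Consent gate: a session enters analysis only if all three tiers are satisfied
# and the participant has not withdrawn that specific session.
# Field names are illustrative, not the study's consent schema.
from dataclasses import dataclass, field

@dataclass
class ConsentRecord:
    org_endorsement: bool                  # Tier 1: institutional access consent
    individual_consent: bool               # Tier 2: accessible-format informed consent
    consent_taken_by_pi: bool              # Tier 2 requires a non-PI team member
    withdrawn_sessions: set[str] = field(default_factory=set)   # Tier 3: granular withdrawal

def session_usable(consent: ConsentRecord, session_id: str) -> bool:
    return (consent.org_endorsement
            and consent.individual_consent
            and not consent.consent_taken_by_pi
            and session_id not in consent.withdrawn_sessions)

record = ConsentRecord(org_endorsement=True, individual_consent=True,
                       consent_taken_by_pi=False, withdrawn_sessions={"session-07"})
print(session_usable(record, "session-03"))   # True
print(session_usable(record, "session-07"))   # False: withdrawn after participation
```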
Constructivist Filter — Four-Step Operationalization

The filter is not a generic appeal to "deliberation." Each step is specified, repeatable, and answerable to philosophical scrutiny. This structure directly addresses the committee concern that constructivism could become an empty formula for preferred conclusions.

1
Calibrated Input Selection
Only patterns surviving all three calibration criteria enter. Patterns ranked by calibration strength; failures flagged for further investigation.
Thresholds: ≥4 distinct cultural-geographic contexts; rate holds or increases in high-information vs low-information threads; statistically discernible directional temporal trend. (A gate sketch follows Step 4.)
2
Affected Party Identification
AIID incident database grounds identification in documented harm cases — not hypothetical deliberators. Under-represented communities identified explicitly even when absent from corpus.
Includes: direct victims, targeted communities, future users, regulatory bodies, data contributors.
3
Reasonable Rejection Testing
Scanlonian test: could any affected party reasonably reject this attribution pattern? Applied asymmetrically when patterns conflict: rejection strength weighted by documented harm exposure.
Reasonable = grounded in considerations all parties could acknowledge as legitimate. Unreasonable = special pleading or non-transferable exemption claims.
4
Convergentist Cross-Check
Endorsed patterns cross-checked against Rossian and sophisticated consequentialist derivations. Convergence = robustness. Divergence = further deliberative engagement required.
Power-tracking norms fail this step: they distort one metaethical route but not both simultaneously.
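A minimal sketch of the Step 1 gate flagged above, with illustrative field names and an assumed significance cutoff for the temporal-trend test:

```python
# Step 1 calibration gate: a pattern enters deliberation only if it clears all
# three calibration criteria. Field names and the cutoff are illustrative.
from dataclasses import dataclass

@dataclass
class AttributionPattern:
    name: str
    cultural_contexts: int        # distinct cultural-geographic contexts observed
    rate_high_info: float         # attribution rate in high-information threads
    rate_low_info: float          # attribution rate in low-information threads
    trend_p_value: float          # p-value of the directional temporal-trend test

def passes_calibration(p: AttributionPattern, alpha: float = 0.05) -> bool:
    cross_cultural = p.cultural_contexts >= 4
    info_gradient_stable = p.rate_high_info >= p.rate_low_info
    temporally_discernible = p.trend_p_value < alpha
    return cross_cultural and info_gradient_stable and temporally_discernible

pattern = AttributionPattern("company_responsibility", cultural_contexts=6,
                             rate_high_info=0.42, rate_low_info=0.35, trend_p_value=0.01)
print(passes_calibration(pattern))   # True: proceeds to affected-party ID and Step 3
```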