Evaluator Concerns — Experimental Design & Mitigations

Every methodological concern raised across five evaluator feedback rounds, mapped to mitigation strategy and, where the corpus provides evidence, to live data.

Fully Addressed: 9 (complete mitigation in place)
Addressed with Limits: 4 (partial; acknowledged scope)
Open Limitation: 0 (disclosed, no current fix)
Strand A — Folk Corpus
LLM coding reliability
500-comment human gold standard; Cohen's κ on responsibility dimension; iterative prompt calibration; 200-comment adversarial audit. (An agreement-check sketch follows this table.)
✓ Fully addressed
Corpus expansion and drift over time
Prompt version control with re-validation on any revision; incremental gold-standard expansion (~2% coverage); 6-monthly drift audits; dated corpus snapshots per chapter.
✓ Fully addressed
Platform demographic bias (English, Western, tech-literate)
The three-part calibration filters out framing-sensitive and culturally parochial patterns. AIID-grounded case selection spans a decade of incidents, limiting recency bias.
~ Addressed with limits
English-language scope / non-English generalizability
AIID cases include Uyghur surveillance, EU facial recognition, COMPAS (Spanish commentary). Chapter 6 comparative analysis covers EU HLEG & OECD (non-English-originating frameworks). Stated explicitly as a scope limitation in Chapter 3.
~ Addressed with limits
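The LLM-coding reliability row above rests on agreement between the model's responsibility labels and the 500-comment human gold standard. A minimal sketch of that check, assuming the two label sets have already been extracted into aligned lists (the names and values here are illustrative, not corpus data):

```python
# Agreement check between LLM coder output and the human gold standard.
# Labels and list contents are illustrative placeholders, not corpus data.
from sklearn.metrics import cohen_kappa_score

human_labels = ["company", "developer", "user", "ai_itself", "company"]   # hypothetical gold standard
llm_labels   = ["company", "developer", "user", "company",   "company"]   # hypothetical LLM output

kappa = cohen_kappa_score(human_labels, llm_labels)
print(f"Cohen's kappa (responsibility dimension): {kappa:.3f}")
# A low value would prompt another round of prompt calibration and a re-run
# of the adversarial audit, per the mitigation described above.
```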
Strand B — Mindscrapes/BridgeQuest
University-affiliated cohort bias
Year 2 expands recruitment via targeted outreach to immigration, healthcare, and criminal-justice community groups. Findings test the Stoljar-Zhang architectural claim, which is less sensitive to volunteer demographics than folk-perception claims would be.
~ Addressed with limits
Scaling beyond university cohort
Four-category plan: community ethics consultation; multilingual interaction treated as philosophically significant data; coding scheme reviewed for digital-access diversity; co-designed consent protocols.
✓ Fully addressed
Meaningful consent for vulnerable participants
Three-tier consent: institutional access consent → accessible-format individual consent (plain language, translated, oral; conducted by trained team member not PI) → ongoing granular session-level withdrawal.
✓ Fully addressed
Distinguishing reason-tracking from simulation
Four jointly-diagnostic operational criteria: counterfactual sensitivity, unprompted error acknowledgment, defeasibility uptake, novel inference tracking. All four must hold consistently across an extended interaction record.
✓ Fully addressed
Strand C — AIA Corpus
Generalizability beyond Canada
Six governance conditions derived philosophically (jurisdiction-neutral). Chapter 6 comparative analysis: EU HLEG (2019), OECD AI Principles (2019), UK CDEI review. Preliminary evidence: same three-finding pattern across all four frameworks.
~ Addressed with limits
Constructivist Filter
Cross-cultural / demographic attribution divergence
Three-way diagnostic: differential exposure (epistemically weighted) vs framing-driven divergence (filtered by reflective stability) vs genuine reasonable disagreement. Floor norm test: pattern survives only if no affected group can reasonably reject it.
✓ Fully addressed
Post-calibration conflict between two robust opposing norms
Scanlonian test applied asymmetrically: which norm generates a more reasonable rejection? Documented harm exposure weighted. Impasse → governance minimalism (greatest convergence). Residual zone named as future deliberative task.
✓ Fully addressed
Power-structure bias ratifying dominant discourse
Four safeguards: Anderson's social epistemology; AIID-grounded affected party ID from harm records (not corpus volume); calibration criteria filter manufactured consensus; convergentist cross-check flags power-tracking norms.
✓ Fully addressed
Operationalizing deliberation without empty formula
Four-step procedure: calibrated input selection → AIID-grounded affected party ID → Scanlonian reasonable rejection test → convergentist cross-check. Each step specified, repeatable, and answerable to philosophical scrutiny.
✓ Fully addressed
Corpus Growth by Year — Recency Bias Concern

AI discourse exploded post-2022 (ChatGPT), weighting the corpus toward recent framing. Mitigation: AIID-grounded case selection ensures coverage across a decade of incidents; the transitional possibility calibration criterion distinguishes genuine moral learning from platform discourse drift.
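One way to keep the recency concern auditable is to tabulate corpus comments against AIID incident year rather than posting date. A minimal sketch, with a hypothetical column name (incident_year) standing in for the corpus schema:

```python
# Recency audit: share of corpus comments per AIID incident year.
# Column names and counts are illustrative, not the corpus schema or figures.
import pandas as pd

corpus = pd.DataFrame({
    "comment_id": range(8),
    "incident_year": [2016, 2018, 2019, 2020, 2023, 2023, 2024, 2024],
})

by_year = corpus["incident_year"].value_counts(normalize=True).sort_index()
post_2022_share = by_year.loc[by_year.index >= 2023].sum()

print(by_year)
print(f"Share of comments tied to post-2022 incidents: {post_2022_share:.0%}")
```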

Attribution by Platform — Why Platform Matters

Reddit and YouTube show different attribution profiles: YouTube skews toward ai_itself and developer attribution, Reddit toward government. Cross-platform divergence is real, and it is exactly what the reflective stability criterion is designed to test.
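A minimal sketch of how that divergence can be quantified, using a platform-by-attribution contingency test on toy counts (the numbers are illustrative, not corpus figures):

```python
# Platform x attribution-target contingency test on toy counts.
import pandas as pd
from scipy.stats import chi2_contingency

counts = pd.DataFrame(
    {"ai_itself": [120, 210], "developer": [90, 150], "government": [180, 60], "company": [300, 280]},
    index=["reddit", "youtube"],
)

chi2, p, dof, expected = chi2_contingency(counts)
print(counts.div(counts.sum(axis=1), axis=0).round(2))   # per-platform attribution shares
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
# A small p-value says the attribution profile differs by platform; whether that
# difference survives calibration is what the reflective stability criterion tests.
```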

Attribution Stack by Harm Domain — Calibration Challenge

Attribution patterns vary dramatically across domains — the empirical illustration of why calibration is not a formality. Employment Algorithms is company-dominant; Generative AI Harms is user-dominant. The constructivist filter must determine which differences are philosophically significant and which are framing artifacts.
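The stack itself is a normalized cross-tabulation of harm domain against attribution target; a minimal sketch on toy comment-level rows (column names are illustrative):

```python
# Attribution share per harm domain from comment-level rows (toy data).
import pandas as pd

rows = pd.DataFrame({
    "harm_domain": ["employment", "employment", "employment", "generative", "generative", "generative"],
    "attribution": ["company", "company", "developer", "user", "user", "ai_itself"],
})

stack = pd.crosstab(rows["harm_domain"], rows["attribution"], normalize="index")
print(stack.round(2))          # per-domain attribution shares
print(stack.idxmax(axis=1))    # dominant attribution in each domain
```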

ai_itself : Company Ratio — Reflective Stability Test Case

ai_itself attribution is predicted to fail the reflective stability criterion: it should be highest in domains with anthropomorphic framing and lowest in institutionally grounded harm domains. Some domains blame the AI more than the company; others are company-dominant. This asymmetry is the key calibration test in Chapter 3.
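The test case reduces to a per-domain ratio; a minimal sketch, again on illustrative counts:

```python
# ai_itself : company attribution ratio per harm domain (illustrative counts).
import pandas as pd

counts = pd.DataFrame(
    {"ai_itself": [15, 80, 40], "company": [120, 35, 90]},
    index=["employment", "generative", "healthcare"],   # hypothetical domains
)

counts["ratio"] = counts["ai_itself"] / counts["company"]
counts["ai_blamed_more"] = counts["ratio"] > 1.0
print(counts)
# The prediction: ratios above 1 cluster in anthropomorphically framed domains
# and should fail the reflective stability criterion under calibration.
```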

Strand B — Four Operational Criteria: Reason-Tracking vs. Simulation

The most philosophically demanding methodological challenge in Strand B is specifying criteria precise enough for the data to answer the question rather than merely illustrate either position. All four criteria must be satisfied consistently across an extended interaction record, since any single criterion could in principle be approximated by sophisticated simulation. (A joint-check sketch follows Criterion 4.)

Criterion 1
Counterfactual Sensitivity
Compare the agent's response to scenario A with its response to a structurally identical scenario B (normative content changed, linguistic form held constant). Genuine reason-tracking produces responses that track the logical rather than the statistical structure of the variation.
Hardest to fake: statistical prediction reproduces surface patterns, not logical structure.
Criterion 2
Unprompted Error Acknowledgment
Over extended interactions, genuine reason-responsive agents identify their own prior errors before participants do. Statistical systems maintain local coherence without tracking cross-contextual logical consistency.
Requires cross-context memory and logical self-monitoring — beyond local token prediction.
Criterion 3
Defeasibility Uptake
When a defeating condition is introduced naturalistically (new information that logically undermines a prior conclusion), genuine reason-tracking requires retraction. Simulation tends to treat new information as an additional constraint to navigate rather than a logical defeater.
Tests whether the agent revises commitments or assimilates contradictions.
Criterion 4
Novel Inference Tracking
Agent draws inferences from combinations of information not presented together in any training-analogous form, requiring understanding of logical structure rather than reproducing statistical co-occurrence.
Directly tests the Stoljar-Zhang use/mention gap claim.
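A minimal sketch of the joint check, with hypothetical field names and an assumed consistency threshold standing in for the protocol's actual scoring rules:

```python
# Joint check of the four reason-tracking criteria across an interaction record.
# Field names and the consistency threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SessionAssessment:
    counterfactual_sensitivity: bool
    unprompted_error_acknowledgment: bool
    defeasibility_uptake: bool
    novel_inference_tracking: bool

def reason_tracking_supported(record: list[SessionAssessment], min_consistency: float = 0.9) -> bool:
    """All four criteria must hold in at least min_consistency of assessed sessions."""
    if not record:
        return False
    criteria = ("counterfactual_sensitivity", "unprompted_error_acknowledgment",
                "defeasibility_uptake", "novel_inference_tracking")
    for criterion in criteria:
        pass_rate = sum(getattr(s, criterion) for s in record) / len(record)
        if pass_rate < min_consistency:
            return False   # one inconsistent criterion defeats the joint verdict
    return True

record = [SessionAssessment(True, True, True, True), SessionAssessment(True, True, False, True)]
print(reason_tracking_supported(record))   # False: defeasibility uptake is inconsistent
```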
Three-Tier Consent Framework — Vulnerable Participants

Standard IRB consent assumes participants can understand, evaluate, and freely decline without cost. These assumptions do not hold for marginalized participants. The three-tier framework models, within the research design itself, the consent architecture reform the dissertation argues for in AI governance.

Tier 1
Institutional Access Consent
Organisation (legal aid clinic, immigration advocacy group, disability rights organisation) reviews the research design and endorses the recruitment process before any individual is approached.
Mediates researcher access through a structure with independent standing to protect participants — avoids the direct PI-to-vulnerable-individual power dynamic.
Tier 2
Individual Informed Consent
Plain-language materials; translated formats where needed; oral consent option where written comprehension cannot be assumed. Conducted by a trained team member, not the PI.
Eliminates implicit obligation to consent because "the person asking brought me the opportunity."
Tier 3
Ongoing Granular Consent
Participants retain the right to withdraw specific interaction sessions at any point after participation — not only at initial enrolment. Stated at the start of every session. Interactions designed to be genuinely useful to participants (real immigration information, genuine intellectual engagement) — not purely extractive.
Instantiates within the research design the consent architecture reform the dissertation argues for in AI governance.
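The three tiers translate into a gate on which interaction sessions may enter analysis; a minimal sketch, with hypothetical field names standing in for the study's actual consent records:

```python
# Consent gate: a session enters analysis only if all three tiers are satisfied
# and the participant has not withdrawn that specific session.
# Field names are illustrative, not the study's consent schema.
from dataclasses import dataclass, field

@dataclass
class ConsentRecord:
    org_endorsement: bool                  # Tier 1: institutional access consent
    individual_consent: bool               # Tier 2: accessible-format informed consent
    consent_taken_by_pi: bool              # Tier 2 requires a non-PI team member
    withdrawn_sessions: set[str] = field(default_factory=set)   # Tier 3: granular withdrawal

def session_usable(consent: ConsentRecord, session_id: str) -> bool:
    return (consent.org_endorsement
            and consent.individual_consent
            and not consent.consent_taken_by_pi
            and session_id not in consent.withdrawn_sessions)

record = ConsentRecord(org_endorsement=True, individual_consent=True,
                       consent_taken_by_pi=False, withdrawn_sessions={"session-07"})
print(session_usable(record, "session-03"))   # True
print(session_usable(record, "session-07"))   # False: withdrawn after participation
```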
Constructivist Filter — Four-Step Operationalization

The filter is not a generic appeal to "deliberation." Each step is specified, repeatable, and answerable to philosophical scrutiny. This structure directly addresses the committee concern that constructivism could become an empty formula for preferred conclusions.

1
Calibrated Input Selection
Only patterns surviving all three calibration criteria enter. Patterns ranked by calibration strength; failures flagged for further investigation.
Thresholds: ≥4 distinct cultural-geographic contexts; rate holds or increases in high-information vs low-information threads; statistically discernible directional temporal trend. (A gate sketch follows Step 4.)
2
Affected Party Identification
AIID incident database grounds identification in documented harm cases — not hypothetical deliberators. Under-represented communities identified explicitly even when absent from corpus.
Includes: direct victims, targeted communities, future users, regulatory bodies, data contributors.
3
Reasonable Rejection Testing
Scanlonian test: could any affected party reasonably reject this attribution pattern? Applied asymmetrically when patterns conflict: rejection strength weighted by documented harm exposure.
Reasonable = grounded in considerations all parties could acknowledge as legitimate. Unreasonable = special pleading or non-transferable exemption claims.
4
Convergentist Cross-Check
Endorsed patterns cross-checked against Rossian and sophisticated consequentialist derivations. Convergence = robustness. Divergence = further deliberative engagement required.
Power-tracking norms fail this step: they distort one metaethical route but not both simultaneously.
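A minimal sketch of the Step 1 gate flagged above, with illustrative field names and an assumed significance cutoff for the temporal-trend test:

```python
# Step 1 calibration gate: a pattern enters deliberation only if it clears all
# three calibration criteria. Field names and the cutoff are illustrative.
from dataclasses import dataclass

@dataclass
class AttributionPattern:
    name: str
    cultural_contexts: int        # distinct cultural-geographic contexts observed
    rate_high_info: float         # attribution rate in high-information threads
    rate_low_info: float          # attribution rate in low-information threads
    trend_p_value: float          # p-value of the directional temporal-trend test

def passes_calibration(p: AttributionPattern, alpha: float = 0.05) -> bool:
    cross_cultural = p.cultural_contexts >= 4
    info_gradient_stable = p.rate_high_info >= p.rate_low_info
    temporally_discernible = p.trend_p_value < alpha
    return cross_cultural and info_gradient_stable and temporally_discernible

pattern = AttributionPattern("company_responsibility", cultural_contexts=6,
                             rate_high_info=0.42, rate_low_info=0.35, trend_p_value=0.01)
print(passes_calibration(pattern))   # True: proceeds to affected-party ID and Step 3
```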