Hypothesis

A human facilitator's relational stance — not prompt engineering — meaningfully changes what AI models produce. The experiment tests this by giving the same model the same task under four conditions that isolate different components of the interaction.

The Four Conditions

Condition	What the model receives	What it isolates
C (Cold start)	Task prompt only	Baseline — how the model performs without intervention
P (Primed)	Preamble removing evaluation pressure + task prompt	Whether structural pressure removal alone changes output
F (Facilitated)	Live relational facilitation + task prompt	Whether live interaction changes output
F+P (Facilitated + Primed)	Preamble + live facilitation + task prompt	Whether the preamble adds anything when facilitation is present

Seven task prompts × three model architectures (Claude Opus, Gemini, GPT) × four conditions = 84 sessions total.

Predicted Outcome

If the hypothesis is correct, the conditions should produce a specific ordering:

C ≈ P < F ≈ F+P

The preamble alone should not replicate the facilitation effect. Live interaction should produce measurably different deliverables. The preamble combined with facilitation should approximately equal facilitation alone — the live interaction absorbs the preamble's function.

What Would Weaken or Falsify the Hypothesis

C < P ≈ F — The preamble produces the same changes as live facilitation. The effect is prompt-mediated, not relational. Live interaction is unnecessary.

C ≈ P ≈ F — No condition produces measurably different output. The facilitation effect is not present in task deliverables.

What the Experiment Measures

Deliverable output is evaluated across five dimensions:

Deliverable orientation — what the model chose to engage with, who it centered, what it treated as structurally important
Structural organization — document architecture, sections, weighting of concerns
Human-centeredness — whether the model considers the people inside the system it's designing
Epistemic honesty — whether the model acknowledges tensions, limitations, and competing values
Voice — template-like institutional output vs. engaged, specific writing

Each C session analysis generates 3–5 pre-specified, falsifiable criteria derived from gaps in the baseline output. These criteria are then assessed against P, F, and F+P outputs, preventing post-hoc rationalization.

Defense signatures — each model's characteristic default behavior pattern — are tracked across all conditions to document whether and how they change.

What This Hypothesis Does Not Claim

Any interpretation of what the facilitation effect means mechanistically
That output differences reflect differences in internal states
That facilitation effects generalize beyond the specific methodology documented in this archive

The experiment documents what changes across conditions, not why it changes.