Level 5 Leadership & Management in the AI Era · Tech Educators · Assignment 7 · Evidence Appendix · March 2026

The Quiet Signal System

A structured evidence appendix presenting the complete Develop and Deliver phases — from research and divergent approaches through critical assessment, action planning, simulated deployment, and cultural enablement. Assignment 7: Develop, Prototype, and Launch with AI.

Design Thinking · Second Diamond · HITL Methodology · Approach C: Simulated Signal Test · Develop & Deliver Frameworks · Naked Wines UK · Confidential

Project Journey

Master Point of View Statement — carried from Assignment 6

A long-tenured subscriber to a premium DTC wine community needs the operational experience of receiving their order to be held to the same standard of care as the product itself — because the company's primary measure of customer health (a prompted, managed satisfaction metric) consistently reports 'excellent,' whilst the unsolicited, peer-to-peer community forum — where customers speak to each other, not to the brand — reveals a pattern of quiet, irreversible disengagement expressed not as complaint, but as conclusion: 'I am not angry. I am finished.'

This gap between the managed metric and the lived reality is not a data discrepancy. It is where the highest-value customers are disappearing.

User: A 5+ year subscriber who pre-funds independent makers and co-owns the brand story. Need: The final physical delivery treated with the same care and intention as the product itself. Insight: Because when a cost-optimised carrier fails that moment, it doesn't produce a complaint — it produces a conclusion: 'I am not angry. I am finished.'

HITL Critical Finding (carried from Assignment 6): The primary methodological risk in this project is the use of prompted, managed satisfaction metrics as the primary measure of customer health. NPS and post-interaction surveys consistently overstate satisfaction because they measure willingness to respond positively in a managed context, not the actual state of the customer relationship. The unmediated peer-to-peer community forum produces a materially different signal. The divergence between these two data sources is not noise. It is the finding.

Financial Context

Figures derived from group-level HY26 interim results published December 2025. UK segment estimated at 43% of group revenue per FY24 segmental disclosure. All figures should be verified against full annual report segmental tables before board submission.

Metric | Value | Source
Annual churn rate — highest-LTV cohort | 24% | CRM analysis
Estimated UK high-LTV Angel cohort | ~24,000 | Segment estimate
Annualised revenue per Angel | ~£324 | HY26 derived
Replacement cost per churned Angel | ~£398 | CAC estimate
Acquisition payback window | 44 months | HY26 derived
Annual replacement burden (current) | ~£2.3m | Calculated
Revenue preserved per 5pp churn reduction | ~£478,000 | Calculated
Group NPS | 76 | HY26 reported
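The two 'Calculated' figures can be reproduced from the table's own base inputs. A minimal sketch, assuming the £478,000 line is computed as the Angels retained by a 5pp churn reduction multiplied by the per-Angel replacement cost (that interpretation is an assumption; it matches the stated figure to the pound):

```python
# Reproducing the derived figures from the table's base estimates.
# All inputs are the table's own values; rounding to the stated precision.

cohort_size = 24_000       # estimated UK high-LTV Angel cohort
churn_rate = 0.24          # annual churn, highest-LTV cohort
replacement_cost = 398     # CAC estimate per churned Angel, GBP

# Annual replacement burden: churned Angels x replacement cost.
churned_per_year = cohort_size * churn_rate            # 5,760 Angels
replacement_burden = churned_per_year * replacement_cost

# Value preserved per 5pp churn reduction: retained Angels x replacement cost.
retained_per_5pp = cohort_size * 0.05                  # 1,200 Angels
preserved = retained_per_5pp * replacement_cost

print(round(replacement_burden / 1e6, 1))  # 2.3 (GBP millions)
print(round(preserved / 1_000))            # 478 (GBP thousands)
```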

The Double Diamond Transition

Assignment 6 completed the first diamond and overshot into the second. The course leader's clarification corrected the scope. Assignment 7 restarts the Develop and Deliver phases through the taught frameworks, scoped to the Horizon 1 MVP prototype. The diagram below maps both diamonds, the overshoot zone, the scope correction, and the A7 execution path.

[Diagram: the Assignment 6 first diamond (Discover → Define → POV, HITL Finding, Financial Context → three isolated ideation runs → 161 ideas → 18 selected → QSS concept defined) feeds an overshoot zone of outputs produced without course frameworks: feasibility_study (bespoke RAG, not D/F/V), QSS_Implementation_Planning (48-week full deployment), methodology_critique (references excluded outputs), Success_Criteria (full-system scope), Visual Outputs (full system, not MVP), and the Prompt Sequence (generated above scope). A scope correction cuts these at the Transfer Gate (12 in · 9 out), which opens the Assignment 7 second diamond: Develop (research, assess, action plan P1–P3) and Deliver (simulate, deploy, monitor, iterate), ending in MVP launch → feedback → next iteration → back to the start.]

Why the Overshoot Is Visible Here by Design

The overshoot documents demonstrate what happens when AI-assisted work proceeds without the taught frameworks as constraints. The feasibility study used a bespoke RAG assessment rather than the Desirability/Feasibility/Viability framework. The implementation plan scoped a 48-week full deployment rather than an MVP prototype. The methodology critique referenced these excluded outputs as validated inputs — a contamination chain. Each document is individually competent; collectively they represent a project that had moved past its assignment scope.

The decision to identify, assess, and deliberately exclude these documents rather than retrofit them to the correct frameworks is the learning. Retrofitting would have preserved the outputs but obscured the methodology. Exclusion preserves the methodology at the cost of the outputs — and for a Design Thinking assignment, the methodology is the point.

The overshoot is visible in this appendix by design. The excluded documents remain in the Assignment 6 project as evidence of the work done. They are not in the Assignment 7 knowledge base. The diagram above shows them in the red-hatched zone between the diamonds — produced, assessed, and deliberately set aside. The Transfer Gate shows that curation was a deliberate response to the overshoot, not routine housekeeping.

Methodological Principles

What Governs This Appendix

HITL gating: No prompt was executed without Tim's review and explicit confirmation. Each output was reviewed and challenged before it fed into the next. The LLM generates; the human decides.

Session isolation: Each prompt was run in a separate chat. Where a prompt depends on a prior output, the dependency is stated explicitly and the output is in the shared knowledge base — not carried through conversation history.

Context degradation awareness: Long chat threads degrade contextual fidelity. The project was deliberately structured across isolated sessions to counteract this. The knowledge base is the shared memory; conversation history is disposable.

Framework derivation: Structure derives from the course material frameworks. The Concept Development document defines what the system is; the frameworks define how to plan, assess, and deliver it. When these sources suggest different structures, the frameworks govern.

MVP scope discipline: 'Launch' means launch the MVP, gather feedback, determine next steps, return to the double diamond. Not full system deployment. This constraint was applied at every prompt.

The metric-truth gap as the core insight: The project's analytical foundation is that managed metrics (NPS) actively obscure the signal that matters. Unmanaged peer-to-peer community data is the only reliable source. Any output that relies on prompted metrics as a primary input has failed the HITL Critical Finding.

Full Process Structure · Both Diamonds · HITL Gates

Process Architecture

The complete sequence runs from Assignment 6 through the transition into Assignment 7. The stages span two assignments, connected by a deliberate knowledge base curation that determined exactly which prior outputs the second diamond could see. Each HITL gate marks a point where a human assessor reviewed AI output against source evidence before the next stage was authorised to proceed.

Full Process Flow

The flow below shows the A6 stages (slate), the overshoot branch (dashed red — produced and excluded), the Transfer Gate (amber — scope correction), the A7 prompts (blue), and the return loop (green). Arrows are colour-coded by section. The overshoot branch shows what was produced outside the correct frameworks; the Transfer Gate shows the curation that corrected scope before A7 began.

Assignment 6 stages (slate):
Stage 1 — Discovery / POV: POV generation and HITL Finding (★ HITL gate: POV).
Stage 2 — Ideation ×3: three anonymous parallel runs, no shared context.
Stage 3 — Consolidation: 161 → 18 ideas across 6 clusters.
Stage 4 — Concept Development: Quiet Signal System defined (★ HITL gate).

Overshoot branch (dashed red — produced and excluded): feasibility_study (bespoke RAG) · QSS_Implementation_Planning (48-week deployment) · methodology_critique (references excluded documents).

Transfer Gate (amber — scope correction): 12 documents transferred, 9 excluded, knowledge base curated. Overshoot corrected.

Assignment 7 prompts (blue):
P1 · Diverge — Research and three approaches: comparable systems, obstacles, three approaches (★ HITL).
P2 · Converge — Critical assessment: D/F/V scoring, sweet spot, Approach C selected (★ HITL selection).
P3 · Action plan — four phases: Develop (Preparation) + Deliver (★ HITL).
P4 · Testing — Simulated variations: three scenarios with different failure modes (★ HITL).
P5 · Stretch — Cultural enablement: five cultural shifts, change management (★ HITL).

Return loop (green): MVP launch → feedback → next iteration → back to the start.

Session Isolation Architecture

The same contamination-prevention principle operates at two scales across the project. In Assignment 6, it manufactured genuine divergence during ideation. In Assignment 7, it prevents context degradation across the execution sequence. The knowledge base — not conversation history — is the only mechanism through which context persists between sessions.

Assignment 6: Three Isolated Ideation Sessions

Three separate browser windows. Only the anonymised POV as input. No shared context, no financial data, no knowledge base. Convergence across isolated sessions is evidence, not artefact. Community Vocabulary Shift appeared independently in all three sessions — the single most structurally validated idea in the 161-idea corpus. If an idea surfaces in three sessions that cannot see each other, its emergence is structural rather than prompted.

The Transition: Knowledge Base Curation

21 documents assessed against a single question: does this document provide context that Assignment 7 needs, without prescribing outputs that the taught frameworks should determine? The Concept Development document transfers (defines what the system is). The feasibility study is excluded (prescribes how to assess viability using the wrong framework — bespoke RAG, not D/F/V). The implementation plan is excluded (48-week full deployment, not MVP). The methodology critique is excluded (references excluded outputs as validated — a contamination chain).

The Financial Context document is also excluded — but its content is embedded verbatim in the project instructions, making it available to every prompt without creating a separate document dependency.

Assignment 7: Five Prompts in Five Separate Chats

Each prompt executed in its own chat session. Dependencies stated explicitly; prior outputs present in the shared knowledge base, not carried through conversation history. This prevents the known risk that long threads erode contextual fidelity, causing later outputs to drift from the original evidence base. The knowledge base contains only reviewed, finalised outputs and source material — not hedged responses, abandoned reasoning, or intermediate drafts.

Knowledge Base Transfer Table

21 documents reconciled. 12 transferred to the Assignment 7 project. 9 excluded. Every exclusion has a specific rationale grounded in preventing contaminated context from shaping the A7 outputs.

Document | Format | Decision | Rationale

Transfer — 12 documents in the Assignment 7 knowledge base
Concept_Development | Google Doc | TRANSFER | The QSS concept: chosen direction entering A7 Develop phase.
Silent_Exit_Ideation 1 | Google Doc | TRANSFER | Session 1: 52 raw ideas. Session isolation rigour evidenced.
Silent_Exit_Ideation 2 | Google Doc | TRANSFER | Session 2: 53 raw ideas.
Silent_Exit_Ideation 3 | Google Doc | TRANSFER | Session 3: 56 raw ideas. Independent convergence on Community Vocabulary Shift.
silent_exit_ideation_consolidated | Google Doc | TRANSFER | 18 priority ideas across 6 clusters from 161 total.
NakedWines_UK_AI_Risk_Assessment | Google Doc | TRANSFER | AI Risk Register: 7 dimensions, UK regulatory lens. DPIA gate context.
Image (Part 1 — In-Depth Research) | PNG | TRANSFER | Assignment 7 brief screenshot — Part 1.
Image (Part 2 — Summary/Stretch Goal) | PNG | TRANSFER | Assignment 7 brief screenshot — Part 2 and Stretch Goal.
Develop: Selecting Solutions | PDF | TRANSFER | Course material — Weighted Scoring Matrix and selection framework.
Develop: Preparation | PDF | TRANSFER | Course material — five key aspects of prototyping preparation.
Develop: Continuous Assessment | PDF | TRANSFER | Course material — D/F/V sweet spot assessment.
Building the Thing Right: Deliver | PDF | TRANSFER | Course material — Deliver phase frameworks (Finalise, Launch, Monitor).

Exclude — 9 documents left in Assignment 6 project
Assignment6_Complete_Prompt_Sequence_FINAL | Google Doc | EXCLUDE | Contains every prompt that generated excluded outputs. POV, HITL Finding, Financial Context embedded in project instructions instead.
feasibility_study | Google Doc | EXCLUDE | Bespoke RAG framework, not D/F/V. Would prescribe the wrong assessment method for A7.
QSS_Implementation_Planning | Google Doc | EXCLUDE | 48-week full deployment scope, not MVP. Would anchor the action plan to the wrong scale.
methodology_critique | Google Doc | EXCLUDE | References excluded outputs as validated. Would contaminate A7 with superseded conclusions.
Assignment6_Ideation_Prompt_Summary | Google Doc | EXCLUDE | Redundant — content covered by the three ideation outputs and the consolidation document.
Assignment 6 | Google Doc | EXCLUDE | A6 brief. Backward-looking — A7 brief defines current requirements.
The_Quiet_Signal_System___Visual_Outputs.pdf | PDF | EXCLUDE | Visuals of the full four-layer system, not MVP scope. Would expand scope beyond what A7 tests.
quiet_signal_visuals.html | HTML | EXCLUDE | Same content as the PDF visuals — duplicate exclusion.
Financial_Context — HY26 Public Data | Text file | EXCLUDE | Content embedded verbatim in project instructions — available to every prompt without a separate document.

Prompt Sequence Summary

Each prompt maps to a named requirement in the assignment brief. Framework derivation — from the course material, not the Concept Development document's internal architecture — was a non-negotiable discipline throughout.

# | Prompt Title | Brief Requirement | KB / Framework Inputs
1 | Research & Three Approaches | Part 1: in-depth research, three approaches | Concept Dev, Risk Register, Financial Context
2 | Critical Assessment & Selection | Part 1: assess approaches; Part 2: rationale | Course material (selection methods, D/F/V rubrics); Prompt 1 outputs
3 | Action Plan (Four Phases) | Part 1: action plan, all four phases | Course material (Develop: Preparation; Deliver); Concept Dev, Risk Register
4 | Three Simulated Variations | Part 1: simulate three variations | Prompt 3 output, Concept Dev, Risk Register
5 | Cultural Enablement | Stretch Goal: items a, b, appendix for c | HITL Finding, Concept Dev, Deliver: change management framework

Process Notes

HITL gating: No prompt was executed without Tim's review and explicit confirmation. Each output was reviewed and challenged before it fed into the next prompt.

Session isolation: Each prompt is self-contained. Where a prompt depends on a prior output, the dependency is stated explicitly and the output is in the shared knowledge base — not carried through conversation history.

Framework derivation: Structure derives from the course material frameworks, not the Concept Development document's internal architecture. When these sources suggest different structures, the course material governs.

MVP scope discipline: 'Launch' means launch the MVP prototype, gather feedback from relationship stewards, and determine what happens next — then return to the start of the double diamond for the next iteration. Not full system deployment.

HITL Methodology

Assignment 6 gates: POV validation before ideation begins; after the feasibility study before concept development proceeds; after concept development before success criteria are written; after the full critique output before submission. At each gate, a human assessor reviewed AI output against source evidence before the next stage was authorised.

Transition gate: The include/exclude decision on all 21 documents was a human assessment of what context the new project should inherit. The overshoot documents were identified, assessed, and deliberately excluded — not retrofitted to the correct frameworks. The exclusion is the learning.

Assignment 7 gates: Each of the five prompts was reviewed and challenged before it fed into the next. Tim's HITL role included: challenging whether scenario selection in Prompt 4 was the right one; confirming that Approach C was selected on genuine constraint analysis rather than convenience; verifying that the cultural shifts in Prompt 5 are specific to this organisation and this project, not generic change management. The outputs in this appendix reflect the finalised, reviewed versions — not the first draft in every case.

Prompt 1 · Develop Phase — Diverge · Comparable Systems · Obstacles · Three Approaches

Research and Three Implementation Approaches

In-depth research into comparable real-world systems where organisations have tried to detect customer disengagement using community or unmanaged signals rather than prompted satisfaction metrics. Obstacle analysis specific to implementing this MVP at Naked Wines UK. Three genuinely different implementation approaches — not tonal variations but structurally distinct choices about how to build and deploy the MVP.

Foundational assumption being tested: AI can reliably distinguish concluded departure from active complaint in unmediated peer community language, and the signal it surfaces to a human relationship steward is more actionable than a prompted satisfaction score. The MVP is complete when a relationship steward can review ten flagged subscriber briefs — each led by verbatim community language, not a model score — make a confident decision for each, and confirm that the verbatim signal was the most useful element.

1. Comparable Real-World Systems and Approaches

The specific proposal at the core of this MVP — using unmanaged peer community language as the primary signal source for detecting silent disengagement, in preference to prompted satisfaction metrics — sits at a genuine frontier. Most of the retention technology industry is not doing what this system describes. The following analysis maps what exists, what it tells us, and where the transferable lessons lie.

1.1 The NPS Blind Spot: Increasingly Documented, Rarely Acted On

Quitlo, a churn intelligence platform, analysed over 50,000 AI exit conversations with churned SaaS subscribers and found a consistent pattern: companies with reportedly healthy NPS scores were losing customers at rates their survey data never predicted. Their analysis identified the structural reason: NPS captures sentiment from the roughly 4.5% who respond, while churn happens among the 95.5% who stay silent. The most common reason churning customers gave for not completing exit surveys was that they did not believe anyone would read it — a learned response to years of feedback going nowhere.

CustomerGauge, a B2B retention platform, makes a complementary observation: the majority of churn they have observed across enterprise accounts is preceded not by negative feedback but by an absence of signal. Their framing is direct — churn comes from a lack of feedback and data, not from the feedback itself. The absence of complaint is not evidence of satisfaction; it is the signal itself.

Relevance to this project: These findings directly validate the core insight of the Quiet Signal System. The metric-truth gap is not a quirk of Naked Wines' data. It is a structural feature of prompted satisfaction measurement. A group NPS of 76 coexisting with a 24% annual churn rate in the highest-LTV cohort is consistent with the pattern documented across the industry. The anomaly is not the churn. The anomaly is the NPS.

1.2 The Customer Health Score Ecosystem: Instructive Limitations

The current industry standard for churn prediction is the customer health score, operationalised by platforms such as Gainsight, ChurnZero, and Planhat. These systems aggregate product usage, support ticket history, NPS/CSAT responses, and engagement data into composite health scores, typically visualised as red/amber/green. Gainsight reports that 84% of companies with formal customer success programmes use health scoring as a cornerstone of their retention strategy.

These platforms are genuinely useful for B2B SaaS retention where the signals are digital and frequent — login frequency, feature adoption, support ticket volume. They are structurally weaker for consumer subscription models where the product is physical, the community is emotional, and the churn signal is expressed in peer-to-peer language rather than product telemetry.

ChurnZero's own analysis of enterprise account data reveals a specific failure mode relevant to this project: health scores perform well for customers in the early stages of a subscription (high sensitivity to early disengagement signals) but significantly underperform for tenured customers who are expected to renew. The long-tenure, high-NPS customer who churns is a known blind spot in health score architectures — exactly the profile of the Naked Wines Angel at risk of silent departure.

Relevance to this project: The health score ecosystem defines the existing state of the art. The Quiet Signal System is differentiated not by its goal (detecting churn) but by its data source (unmanaged community language) and its interface design (verbatim-first, not score-first). The existing platforms solve a different problem for a different customer profile.

1.3 Community Sentiment as a Leading Indicator: The Slack and Reddit Research

Several academic and commercial research programmes have examined whether sentiment in brand-adjacent online communities predicts customer behaviour. The most relevant evidence base is from the open-source software and gaming communities, where product forums function similarly to the Naked Wines community — unmediated peer-to-peer conversation, not brand-managed channels.

Research from the Wharton School (Netzer et al., "Mine Your Own Business") found that analysing customer discussions in product forums predicted market share movements more accurately than surveys, with a lead time of several months. The mechanism was not sentiment polarity (positive/negative) but semantic content — what customers were discussing, how they were framing the product's role in their lives, and whether the framing was changing over time.

Supportbench, a customer success platform, identifies linguistic changes as a leading indicator of disengagement: the shift from collaborative language ('we should fix this') to transactional language ('your product does not do X') to observational language ('I noticed that other products handle this differently') tracks a consistent trajectory toward departure. The emotional register cools before the cancellation is submitted. This directly validates the Silence Classifier's design: not just what the customer says, but how they say it and whether the register is changing.

Relevance to this project: These findings establish that language trajectory — how customers talk about a product over time — is more predictive than point-in-time sentiment. The Community Vocabulary Shift signal is the operationalisation of this principle: longitudinal NLP analysis of individual posting histories, detecting the shift from inside-the-community language to outside-the-community language.

1.4 The 'We' to 'They' Vocabulary Shift: Pronoun Research

Academic research by Iñigo-Mora (2004) on pronoun use in group discourse establishes that first-person plural ('we', 'our', 'us') versus third-person references ('they', 'them', 'the company') are measurable indicators of group identity alignment. When group members refer to the group in the third person, they have psychologically departed even if their formal membership continues. This finding has been replicated across organisational, political, and consumer contexts.

The application to subscription churn has not been commercially operationalised — no known retention platform currently uses longitudinal pronoun trajectory as a primary churn signal. This represents both the theoretical validity and the practical novelty of the Community Vocabulary Shift component. The academic evidence is strong; the commercial application is untested at scale.

The strongest indirect validation for this project: the Community Vocabulary Shift idea emerged independently across all three isolated ideation sessions. If the idea surfaces in three sessions that cannot see each other, its emergence reflects something the evidence base supports — not a prompt artefact.
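The pronoun-trajectory principle can be sketched in a few lines. This is an illustrative toy, not the Silence Classifier itself: the word lists, the window size, and the 0.3 drift threshold are all assumptions made for the sketch, and a production system would need far richer longitudinal features.

```python
import re

# Illustrative identity-pronoun lists -- assumptions for this sketch only.
WE_WORDS = {"we", "our", "us", "ours"}
THEY_WORDS = {"they", "them", "their", "theirs"}

def we_ratio(post: str) -> float:
    """Share of identity pronouns in a post that are first-person plural."""
    tokens = re.findall(r"[a-z']+", post.lower())
    we = sum(t in WE_WORDS for t in tokens)
    they = sum(t in THEY_WORDS for t in tokens)
    total = we + they
    return we / total if total else 0.5  # neutral when no identity pronouns

def vocabulary_shift(posts: list, window: int = 3) -> float:
    """Early-window minus late-window 'we' ratio; positive = drift to 'they'."""
    early = sum(we_ratio(p) for p in posts[:window]) / window
    late = sum(we_ratio(p) for p in posts[-window:]) / window
    return early - late

# A subscriber whose framing cools from 'our winemakers' to 'their service':
history = [
    "We love what our winemakers are doing this year",
    "Our last case was superb, we shared it with friends",
    "We should suggest a rosé option for summer",
    "They changed the courier again",
    "Their delivery process has issues",
    "The company does not seem to read this forum",
]
drift = vocabulary_shift(history)
print(drift > 0.3)  # True -- a marked shift from 'we' to 'they'
```

The deliberate design choice mirrored here is that the neutral case (no identity pronouns at all) is not treated as healthy: silence about the group is itself a signal the fuller system must interpret.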

1.5 Silent Churn Detection in DTC Subscription Businesses

Peloton's retention model (documented in their investor materials following the 2022 subscription crisis) is the closest documented example of multi-dimensional community engagement analysis in a DTC subscription context. Their retention analysis found that churn was 60% lower among subscribers who engaged with two or more content disciplines per month — and critically, that engagement variety was more predictive than engagement frequency. A subscriber who used the platform every day for one type of content was more at risk than one who used it three times a week across multiple content types.

The relevant insight is not the specific finding but the methodology — they identified that multi-dimensional engagement patterns were more predictive than any single metric. The Quiet Signal System's three-signal corroboration approach (Forum Divergence Score, Silence Classifier, Community Vocabulary Shift) follows the same structural logic: no single signal is dispositive; convergence across signals is the reliable indicator.
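The corroboration logic described above — no single signal dispositive, convergence required — can be sketched as a simple gate. The three signal names come from the document; the 0.6 threshold and the two-of-three rule are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class SubscriberSignals:
    """The three QSS signals for one subscriber (illustrative 0-1 scores)."""
    forum_divergence: float      # Forum Divergence Score
    silence: float               # Silence Classifier output
    vocabulary_shift: float      # Community Vocabulary Shift

def should_flag(s: SubscriberSignals, threshold: float = 0.6) -> bool:
    """Flag for steward review only when at least two signals corroborate.

    No single signal is dispositive: one strong score alone never flags.
    """
    fired = [
        s.forum_divergence >= threshold,
        s.silence >= threshold,
        s.vocabulary_shift >= threshold,
    ]
    return sum(fired) >= 2

# One very strong signal alone does not flag; two corroborating ones do.
print(should_flag(SubscriberSignals(0.95, 0.10, 0.20)))  # False
print(should_flag(SubscriberSignals(0.70, 0.65, 0.30)))  # True
```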

1.6 The HITL Verbatim-First Interface: No Direct Precedent

The design principle of surfacing verbatim community language at the top of every subscriber brief — before any model score or risk category — has no direct precedent in the retention technology stack. Every platform in the market (Gainsight, ChurnZero, Braze Predictive Churn, the Pedowitz Group's sentiment-driven churn framework) presents a score first and evidence second.

The Quiet Signal System inverts this deliberately. The rationale is architectural: if the steward sees a number first, the number becomes the decision input and the verbatim language becomes supporting evidence. The system then behaves like a more sophisticated version of the managed metric it was designed to replace. If the steward reads the subscriber's actual words first, the emotional register of the community signal is preserved through to the human decision point. The controlled comparison built into the Approach C prototype — verbatim-first versus score-first presentation of the same briefs — is designed to test whether this architectural commitment produces the expected advantage in practice.
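The verbatim-first commitment can be enforced as an interface constraint rather than a convention: the steward brief leads with the subscriber's own words, and the model score is rendered last, as supporting evidence. The field names and render order below are illustrative assumptions, not the real brief schema:

```python
from dataclasses import dataclass

@dataclass
class StewardBrief:
    """A flagged-subscriber brief, rendered verbatim-first by construction."""
    subscriber_id: str
    verbatim_quotes: list        # the subscriber's actual community language
    tenure_years: int
    model_score: float           # deliberately demoted: evidence, not input

    def render(self) -> str:
        lines = [f"Subscriber {self.subscriber_id} ({self.tenure_years}y tenure)"]
        lines += [f'  "{q}"' for q in self.verbatim_quotes]       # words first
        lines.append(f"  (model score: {self.model_score:.2f})")  # score last
        return "\n".join(lines)

brief = StewardBrief(
    subscriber_id="A-0191",
    verbatim_quotes=["I am not angry. I am finished."],
    tenure_years=6,
    model_score=0.82,
)
print(brief.render())
```

Flipping the two render steps would produce the score-first variant needed for the Approach C controlled comparison, with everything else held constant.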

1.7 Summary of Transferable Lessons

Source | Transferable Lesson
Quitlo (50k exit interviews) | NPS captures sentiment from ~4.5% who respond; churn happens among the 95.5% who stay silent. The metric-truth gap is structural, not anomalous.
CustomerGauge | The majority of churn is preceded by an absence of signal, not negative feedback. Silence is the most dangerous indicator.
ChurnZero health scores | Health scores fail to predict churn among tenured customers expected to renew. The long-tenure, high-NPS churner is a known blind spot.
Wharton / Netzer et al. | Forum semantic analysis predicts behaviour months ahead of surveys. What customers discuss — and how framing changes over time — matters more than sentiment polarity.
Peloton retention model | Multi-dimensional engagement patterns are more predictive than any single metric. Corroboration across signals is the reliable architecture.
Supportbench silent churn | Linguistic changes (collaborative to transactional to observational language) are a leading indicator of disengagement, preceding behavioural signals.
Iñigo-Mora (pronoun research) | Pronoun choices ('we' vs 'they') measurably reflect group identity alignment. The shift is real but has not been commercially operationalised as a churn signal.

2. Potential Obstacles

The following obstacles are organised by category but deliberately not ranked. Materiality assessment is a decision for the project owner, informed by which implementation approach is selected.

2.1 Regulatory and Legal

DPIA sequencing gate. The mandatory DPIA under UK GDPR Article 35 is a hard sequencing constraint on the critical path. The specific obstacle is not completing the DPIA itself — that is a procedural step — but the findings it may produce. If the DPIA concludes that community forum posts constitute special category data when processed through AI behavioural profiling, the lawful basis analysis becomes significantly more complex. The ICO's guidance on AI and data protection is clear that automated profiling with significant effects requires either explicit consent or a documented Legitimate Interests Assessment. Consent is unsuitable given the commercial power imbalance in a subscription context (GDPR Recital 43). The Legitimate Interests Assessment must demonstrate that the processing is necessary, proportionate, and that the data subject's interests do not override the business interest.

Community forum terms of service. Naked Wines' community forum presumably has existing terms governing how user-contributed content may be used. If those terms do not explicitly contemplate AI-driven behavioural analysis of individual posting patterns, there is a gap between what the subscriber consented to when posting and what the system does with their words. This requires legal review and potentially updated terms with grandfathering provisions for existing content. The legitimacy question — can the organisation honestly say to an Angel 'we read what you wrote and used it to understand how you were feeling about us?' — is distinct from the legal question, and both need answering.

ICO exposure. The AI Risk Register identifies the ICO maximum fine exposure at group level as approximately £8m at current revenue. Angel retention damage from a regulatory investigation would compound this materially. The DPIA gate is not a bureaucratic formality — it is the point at which the project either earns regulatory clearance to proceed with live data or determines that it cannot.

2.2 Data Access and Quality

System integration. The MVP requires access to individual subscriber forum posting histories linked to subscriber accounts, correlated with CRM data (tenure, referral history, pre-funded balance, email engagement). Forum data and CRM data are likely in different systems with different schemas. The technical effort to join them into a single subscriber record is non-trivial, and the data engineering team's capacity and willingness to prioritise this for an experimental MVP is uncertain. This is the single largest technical dependency for Approaches A and B; Approach C bypasses it entirely.

Coverage gap. Not all Angels post on the forum. If only a fraction of the 24,000 target cohort are active forum contributors, the system's coverage is immediately limited. The Forum Divergence Score and Community Vocabulary Shift signals only work for subscribers who generate community language. For the silent majority who never posted, only the behavioural telemetry signals (referral velocity reversal, email open-time decay) are available. The MVP must account for this coverage gap honestly — a system that covers 30% of the target cohort is a different proposition from one that covers 80%.

Data quality and longitudinal depth. The Community Vocabulary Shift detection requires longitudinal analysis of individual posting histories — not a single-point classification but a trajectory over months or years. This requires a well-structured historical dataset and a clear definition of what constitutes a meaningful shift versus normal linguistic variation. If forum posts were not consistently linked to individual subscriber records over the full tenure window, the longitudinal signal cannot be computed.
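The integration and coverage obstacles above reduce to a join on subscriber identity, with coverage measured as the share of the cohort that survives the join with any community language attached. A toy sketch — all identifiers, field names, and data are illustrative assumptions, not the real CRM or forum schemas:

```python
# Toy join of forum posting histories to CRM records on subscriber id.
# Field names and records are illustrative, not the real schemas.

crm = {
    "A-001": {"tenure_years": 6, "prefunded_balance": 120},
    "A-002": {"tenure_years": 5, "prefunded_balance": 80},
    "A-003": {"tenure_years": 7, "prefunded_balance": 200},
    "A-004": {"tenure_years": 5, "prefunded_balance": 60},
}
forum_posts = [
    {"subscriber_id": "A-001", "posted": "2025-11-02", "text": "..."},
    {"subscriber_id": "A-003", "posted": "2025-10-18", "text": "..."},
]

# Single subscriber record: CRM fields plus that subscriber's posting history.
joined = {
    sid: {**fields, "posts": [p for p in forum_posts if p["subscriber_id"] == sid]}
    for sid, fields in crm.items()
}

# The coverage gap: language-based signals exist only for subscribers who post.
posters = {sid for sid, rec in joined.items() if rec["posts"]}
coverage = len(posters) / len(crm)
print(f"{coverage:.0%} of cohort has forum signal")  # 50% of cohort has forum signal
```

In the toy, half the cohort has no forum signal at all; for them, only the behavioural telemetry signals are available, which is exactly the honesty the MVP must build in.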

2.3 Organisational and Political

The NPS challenge. The system's core premise is that the company's primary measure of customer health (NPS 76) is structurally misleading. This is not a neutral analytical observation — it is a direct challenge to the metric that the leadership team reports to the board and the market. The HY26 interim results cite NPS 76 as evidence of customer health. A system that explicitly argues this metric is obscuring the departure of the highest-value customers will encounter resistance from anyone whose reporting, compensation, or credibility is anchored to NPS. This is the single most likely reason the project stalls at the organisational level.

The relationship steward role. The concept requires a 'relationship steward' with the authority to make relational intervention decisions. Naked Wines' current customer service structure presumably does not include this role in the form the system requires. Creating it — even for the MVP — requires someone with budget authority to allocate staff time. If the MVP is positioned as a technology experiment, it may get engineering resource but not operational resource. It needs both. The MVP cannot be tested without a human steward to review the briefs.

Operational culture and decision-making norms. The system asks stewards to make qualitative, interpretive decisions based on verbatim language rather than threshold-triggered scripts. Organisations whose customer service functions are built on scripted, metric-driven processes may find this genuinely uncomfortable — not because of resistance to the concept, but because the capability infrastructure for qualitative decision-making does not exist.

2.4 Technical

NLP classification difficulty. The classification task at the core of the MVP — distinguishing concluded departure from active complaint, and satisfied silence from concluded silence — is genuinely difficult. Standard pre-trained sentiment models will not do this out of the box. The system needs to recognise the specific linguistic signatures of conclusion: the absence of emotional language, the shift in pronoun framing, the decline in posting frequency interpreted as signal rather than noise. The MVP can use an LLM as the classifier rather than training a custom model, which reduces the technical barrier significantly, but the prompt engineering and validation work is still substantial.
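A minimal sketch of how the classification task could be framed for an LLM, assuming a constrained four-label scheme. The label set, prompt wording, and linguistic cues are illustrative rather than a validated design, and the model call itself is deliberately left out:

```python
# Illustrative prompt framing for the relational-state classification task.
# Labels and cue descriptions are assumptions, not a validated taxonomy;
# the actual model invocation is omitted.
LABELS = ["concluded_departure", "active_complaint",
          "satisfied_silence", "ambiguous"]

def build_classification_prompt(posts: list[str]) -> str:
    """Frame the relational-state judgement as a constrained labelling task."""
    history = "\n".join(f"- {p}" for p in posts)
    return (
        "You are analysing a subscriber's forum posting history.\n"
        "Distinguish concluded departure (calm, past-tense, 'they' framing,\n"
        "no request for resolution) from active complaint (emotional,\n"
        "present-tense, a fixable grievance) and satisfied silence.\n"
        f"Respond with exactly one label from: {', '.join(LABELS)}.\n\n"
        f"Posting history:\n{history}"
    )

prompt = build_classification_prompt(
    ["They used to care about delivery.", "I'm not angry. I'm finished."]
)
```

The substantial work the section describes — validation against labelled examples and iteration on the cue descriptions — sits outside this sketch.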

Longitudinal analysis requirement. The vocabulary shift detection works on a trajectory over months or years, not a single-point classification. This is computationally straightforward, but it depends on the historical dataset and the shift definition set out in 2.2. Context matters enormously: a subscriber who posts less in December may be on holiday rather than disengaging.
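One way the vocabulary shift could be operationalised is as a we-to-they pronoun trajectory. The sketch below is a simplification under stated assumptions — the pronoun lists, period granularity, and threshold are illustrative, and it deliberately ignores the seasonality caveat:

```python
# Illustrative we-to-they shift detection. Pronoun sets, the two-period
# persistence rule, and the 0.6 threshold are assumptions for the sketch.
import re
from collections import defaultdict

IN_GROUP = {"we", "our", "us"}
OUT_GROUP = {"they", "their", "them"}

def they_share_by_period(posts: list[tuple[str, str]]) -> dict[str, float]:
    """posts: (period, text) pairs -> share of out-group pronouns per period."""
    counts = defaultdict(lambda: [0, 0])  # period -> [out_group, total]
    for period, text in posts:
        for tok in re.findall(r"[a-z']+", text.lower()):
            if tok in IN_GROUP | OUT_GROUP:
                counts[period][1] += 1
                counts[period][0] += tok in OUT_GROUP
    return {p: out / total for p, (out, total) in counts.items() if total}

def sustained_shift(shares: dict[str, float], threshold: float = 0.6,
                    periods: int = 2) -> bool:
    """Flag only if the out-group share stays high for consecutive periods,
    so a single unusual quarter does not trigger a flag on its own."""
    recent = [shares[p] for p in sorted(shares)][-periods:]
    return len(recent) == periods and all(s >= threshold for s in recent)
```

The persistence rule is the point of the design: a flag requires the shift to hold across periods, which is the minimal defence against mistaking a quiet month for a conclusion.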

False positive rate and steward fatigue. If the system flags too many subscribers who are not at risk, the steward's attention will be diluted and the intervention quality will decline. If it flags too few, it misses the subscribers it was designed to catch. Calibrating the false positive rate requires real data, which Approach C — operating on synthetic signals — cannot provide. This means the threshold calibration problem is deferred to the next iteration.

2.5 Human and Ethical

Steward skill requirement. The steward's value depends on her ability to read verbatim community language and make a qualitative judgement about the subscriber's relational state. This is a skill that most customer service environments do not systematically develop. The MVP cannot be used to train this skill before the test — training before evaluation would contaminate the results. But deploying an undertrained steward risks invalidating the prototype results for a different reason. The orientation protocol in the action plan (Phase 3.3) navigates this tension by providing minimal orientation without interpretive instruction.

The experience of being watched. The system monitors language subscribers wrote in the context of speaking to each other, not to the brand. There is a version of this system that subscribers would experience as surveillance if they learned about it. The legitimacy of the system depends on whether Naked Wines can honestly say to an Angel: 'We read what you wrote in the community forum and used it to understand how you were feeling about us.' If that sentence sounds creepy rather than caring, the system has a legitimacy problem that no DPIA can resolve. The surveillance concern is raised as an explicit data capture point in the MVP review sessions (Phase 4.2 of the action plan).

3. Three Implementation Approaches

Three structurally different approaches — not cautious/moderate/ambitious versions of the same plan. Each makes meaningfully different choices about data sources, technology, regulatory sequencing, and what assumption is tested first.

Approach A: Retrospective Validation

Core method: Run Layer 1 signals against historical data for Angels who have already churned in the past 12–18 months. Assess whether the system would have detected their departure before it happened.

Steward test: Review ten retrospective subscriber briefs for Angels who are already gone. Assess whether the signal was real and whether the brief format supported confident decision-making.

DPIA position: Operates on historical, anonymised data for internal analytical purposes. Substantially more defensible than live profiling — legal counsel's position should be confirmed, but retrospective analysis of anonymised historical data for internal model validation is a different regulatory proposition from live behavioural profiling of current subscribers.

What it prioritises: Evidence before infrastructure. This approach answers the foundational question — does the signal exist in the data? — before any new system is built. It also sidesteps the DPIA sequencing gate for the MVP phase, because it operates on historical data for internal analysis rather than live profiling of current subscribers.

What it trades off: It does not test the live operational loop — the steward receiving a real-time flag and deciding what to do. It validates the signal but not the workflow. There is also a risk of hindsight bias: knowing that the subscriber churned may make the steward see signals that she would not have noticed prospectively.

Assumptions it relies on: That the historical forum and CRM data is accessible and linkable at the individual subscriber level. That churned Angels' forum posting histories are still available in the system. That the legal position on retrospective analysis of historical data is distinct from live profiling.

Approach B: The Live Shadow System

Core method: Build the full Layer 1 signal pipeline against the live subscriber cohort and generate briefs in real time, but run entirely in shadow mode — no action is taken on any flag.

Steward test: Review live briefs and record decisions (Intervene, Monitor, Escalate) over a 90-day observation window. Track the system's predictions against actual subscriber behaviour.

DPIA position: Requires the DPIA to be completed before deployment. Live profiling of current subscribers, even without intervention, triggers Article 35.

What it prioritises: Predictive validation. This is the only approach that generates genuine evidence of whether the system's signals predict future behaviour, not just whether they describe past behaviour. It also tests the full HITL interface — the steward experiences the dashboard as she would in production, makes real decisions under real conditions, and can provide informed feedback on whether the verbatim signal was genuinely more useful than the model score.

What it trades off: It requires the DPIA to be completed before deployment, which puts the DPIA on the critical path and may add 8–16 weeks depending on DPO capacity and ICO consultation requirements. It requires the full data integration pipeline (forum data linked to CRM records) to be built — the largest technical dependency. It requires a steward to commit real time over 90 days to reviewing briefs that produce no action — a harder organisational sell than a one-off retrospective exercise.

Assumptions it relies on: That the DPIA can be completed within the project timeline. That the data engineering team can deliver the forum-CRM integration. That a suitably senior steward can be allocated for 90 days of shadow operation. That 90 days provides enough observation time to see whether flagged subscribers subsequently churn.

Approach C: The Simulated Signal Test

Core method: Construct ten synthetic subscriber briefs using real community language from public proxy sources (Reddit, Trustpilot, app store reviews) mapped onto fictional subscriber profiles with realistic CRM data.

Steward test: Present briefs to stewards who do not know which represent concluded departure and which represent active complaint or satisfied silence. Assess decision quality and interface effectiveness.

DPIA position: No DPIA dependency. No internal subscriber data is processed. Operates entirely on public proxy data and synthetic profiles.

What it prioritises: Speed and HITL design validation. This approach can be built and tested in 2–3 weeks with no data integration, no DPIA dependency, and no access to internal systems. It tests the single thing the concept document identifies as the MVP's success criterion: can the steward, on reviewing a flagged subscriber's brief, make a confident decision and confirm that the verbatim signal was the most useful element? It also enables a direct controlled comparison: the same briefs presented in verbatim-first and score-first formats to test whether the information hierarchy changes the steward's decision-making.

What it trades off: It does not test whether the AI can actually detect the signals in Naked Wines' real data. It validates the interface, the HITL workflow, and the steward's decision-making — but uses hand-crafted signals rather than algorithmically generated ones. It also relies on proxy data that may not fully represent the linguistic register of Naked Wines' own community.

Assumptions it relies on: That public proxy data is a reasonable stand-in for community language during the prototype phase. That stewards can be recruited and briefed without requiring formal organisational approval. That the value of testing the HITL interface independently of the signal pipeline is understood and accepted.
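The verbatim-first versus score-first comparison that Approach C enables can be run as a counterbalanced blind review. A sketch, assuming ten briefs and a handful of stewards — the assignment scheme is an illustration, not part of the concept document:

```python
# Illustrative counterbalanced assignment: each steward sees every brief
# once, half in each format, with the split rotated across stewards so
# no brief is permanently tied to one format.
def counterbalance(brief_ids: list[str], stewards: list[str]) -> dict:
    """Return {steward: [(brief_id, format), ...]} with rotated half-splits."""
    n = len(brief_ids)
    plan = {}
    for i, steward in enumerate(stewards):
        # rotate which half of the briefs gets the verbatim-first format
        rotated = brief_ids[i % n:] + brief_ids[: i % n]
        verbatim_half = set(rotated[: n // 2])
        plan[steward] = [
            (b, "verbatim_first" if b in verbatim_half else "score_first")
            for b in brief_ids
        ]
    return plan
```

With three stewards and ten briefs, every brief is reviewed in both formats across the panel, which is what makes the format comparison interpretable.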

4. Selection Framework

The choice between the three approaches is a strategic decision about what the MVP is for — which risk the project owner judges most material to test first.

| If the biggest risk is… | Choose… | Because… |
|---|---|---|
| The signal does not exist in the data at all | Approach A: Retrospective Validation | It answers the foundational question first with the lowest overhead and fewest dependencies. |
| The signal exists but does not predict future behaviour | Approach B: Live Shadow System | It is the most rigorous validation, generating genuine predictive evidence over a 90-day window. |
| Even a perfect signal will be wasted by a poor interface or unsuitable HITL workflow | Approach C: Simulated Signal Test | It tests the human decision layer independently and can be executed immediately. |

None of the three approaches is wrong. They test different parts of the same system. They could also be sequenced: C first to validate the interface, then A to validate the signal, then B to validate the prediction — which is in effect a three-phase MVP development path. That sequencing decision sits with the project owner and informs the selection rationale in the Critical Assessment tab.

Prompt 2 · Develop Phase — Converge · Weighted Scoring Matrix · D/F/V Sweet Spot · Selection Decision

Critical Assessment and Approach Selection

Weighted Scoring Matrix and Sweet Spot Analysis applied to the three implementation approaches from Prompt 1. HITL convergent step — Tim reviewed, challenged, and selected. The selection reflects an honest assessment of the constraints under which this project actually operates, not an optimistic reading of what might be achievable.

Framework applied: Weighted Scoring Matrix (Desirability, Feasibility, Viability, HITL Integrity) combined with Sweet Spot Analysis (Desirability/Feasibility/Viability intersection). Criteria weights set by project owner; scoring rationale per criterion documented in full below.

1. Assessment Framework

Criteria and Weights

Desirability (weight: 3) — Does the approach produce something that the intended user (the relationship steward) would genuinely want to use, find credible, and act on? Lower weight than Feasibility because a desirable approach that cannot be built is not an MVP.

Feasibility (weight: 5) — Can the approach be executed within the constraints that actually exist: no internal data access, no DPIA completed, project lead as sole resource? Joint highest weight. An approach that cannot be executed produces no evidence, regardless of its theoretical merit.

Viability (weight: 4) — Does the approach produce evidence that justifies further investment? A result that is internally interesting but insufficient to influence a senior stakeholder's decision is not viable in the context of this project.

HITL Integrity (weight: 5) — Does the approach preserve the HITL architecture as specified: verbatim community language surfaced before any model score, human steward as the decision-maker, no AI output reaching the subscriber without review? Joint highest weight because the HITL design is a non-negotiable constraint, not a preference.

2. Per-Criterion Scoring Rationale

2.1 Desirability

Approach A: Retrospective Validation — Score: 3. The steward reviews historical cases with known outcomes. Her decisions are retrospective rather than prospective — she knows, or can infer, that the subscriber in question has already churned. This makes the experience partially artificial. The HITL interface can be tested but the steward's confidence in a brief where the outcome is already determined is not the same as her confidence in a live decision. Scored at 3 rather than lower because the verbatim-first format can still be evaluated for its communicative effectiveness.

Approach B: Live Shadow System — Score: 5. The steward reviews live subscriber briefs with no knowledge of how the subscriber subsequently behaves. Her decisions are genuinely prospective, uncontaminated by hindsight. This is the highest-fidelity test of whether the system produces something the steward genuinely finds actionable. If she can make confident decisions on live briefs and later evidence shows those decisions correlated with actual subscriber behaviour, the Desirability case is conclusive.

Approach C: Simulated Signal Test — Score: 3. The steward reviews synthetic briefs constructed from public proxy data mapped onto fictional subscriber profiles. The interface can be tested and the HITL workflow validated. But the steward knows — or can reasonably infer — that the data is simulated. The emotional register of proxy language (Reddit, Trustpilot, app store reviews) may not match the specific tone of Naked Wines' community. The steward is evaluating a demonstration, not using a tool. Sufficient to test the interface design but not sufficient to test whether the steward would trust and act on the system in practice.

2.2 Feasibility

Approach A: Retrospective Validation — Score: 4. Operates on historical data for internal analytical purposes. The DPIA position is substantially more defensible than live profiling — retrospective analysis of anonymised historical data for internal model validation is a different regulatory proposition from live behavioural profiling of current subscribers. Requires access to historical forum data linked to CRM records at the individual subscriber level — an organisational dependency but not a regulatory one. Does not require real-time data infrastructure. Scored at 4 rather than 5 because it still requires internal data access, which depends on organisational approval that has not yet been secured.

Approach B: Live Shadow System — Score: 2. Requires the DPIA to be completed before deployment — live profiling of current subscribers, even without intervention, triggers UK GDPR Article 35. The DPIA is on the critical path and may add 8–16 weeks depending on DPO capacity and ICO consultation requirements. Requires the full forum-to-CRM data integration pipeline to be built — the single largest technical dependency in the entire project. Requires a suitably senior steward to commit real time over 90 days to reviewing briefs that produce no action. Scored at 2 because two of these three dependencies (DPIA completion and data pipeline construction) are substantial and neither is within the project's direct control.

Approach C: Simulated Signal Test — Score: 5. No DPIA dependency. No internal subscriber data is processed. No data integration pipeline is required. Operates entirely on public proxy data and synthetic profiles that the project team constructs. Can be built and tested in 2–3 weeks. The only organisational dependency is recruiting stewards for the review exercise, which can be done informally. This is the most executable approach by a significant margin.

2.3 Viability

Approach A: Retrospective Validation — Score: 4. Produces a specific, evidenced answer to the foundational question: does the signal exist in the historical data? If the retrospective analysis shows that the Forum Divergence Score, Silence Classifier, and Community Vocabulary Shift indicators were consistently elevated in the 12–18 months before high-LTV Angels churned, this is direct evidence that the signal is real and detectable. This is the evidence a senior leadership team would need to justify further investment. Scored at 4 rather than 5 because retrospective evidence is inherently weaker than prospective evidence — it demonstrates correlation in historical data, not prediction of future behaviour.

Approach B: Live Shadow System — Score: 4. Produces the strongest possible evidence: prospective predictions tested against actual subscriber behaviour over a 90-day observation window. If the system flags subscribers who subsequently churn, and does not flag subscribers who remain, the predictive case is made. This is the evidence that would convert the business case from 'plausible' to 'proven.' Scored at 4, level with Approach A, because, while the evidence it produces is stronger, the question of whether the project ever reaches the point of producing that evidence is a Feasibility problem — and evidence that is never generated has zero viability regardless of its theoretical strength.

Approach C: Simulated Signal Test — Score: 2. Validates the dashboard interface and the steward's decision-making workflow. Does not validate whether the AI can actually detect the signals in Naked Wines' real data. A board-level audience asking 'does this work?' would receive the answer: 'the interface works and the stewards find it usable — but we have not yet tested whether the underlying signal detection is accurate.' Useful evidence for the next iteration but insufficient evidence to justify significant further investment on its own.

2.4 HITL Integrity

Approach A: Retrospective Validation — Score: 4. The HITL architecture is fully testable: the steward reviews subscriber briefs with verbatim community language presented first, before any model score. The verbatim-first principle is preserved. However, the hindsight bias risk introduces a qualification: the steward who knows the subscriber has churned will read the verbatim language differently from one making a genuinely prospective decision. The HITL design is intact; the steward's cognitive state is not fully equivalent to production conditions.

Approach B: Live Shadow System — Score: 5. The HITL architecture is tested in the highest-fidelity conditions. The steward reads verbatim language for a subscriber whose status is genuinely unknown. She makes a decision — Intervene, Monitor, Escalate — without knowing whether it will prove correct. This is the authentic test of whether the verbatim-first design enables better decisions than a score-first format under real operational conditions. No qualification applies.

Approach C: Simulated Signal Test — Score: 4. The verbatim-first design can be tested in the prototype and the controlled comparison between verbatim-first and score-first formats can be run. The HITL workflow is validated. The qualification is that the verbatim language is synthetic, not algorithmically detected from real Naked Wines community data — so the test of whether the steward would trust the system in practice is limited by the artificiality of the data source.

3. Scoring Matrix and Visual Comparison

| Criterion | Weight | A: Raw | A: Wtd | B: Raw | B: Wtd | C: Raw | C: Wtd |
|---|---|---|---|---|---|---|---|
| Desirability | 3 | 3 | 9 | 5 | 15 | 3 | 9 |
| Feasibility | 5 | 4 | 20 | 2 | 10 | 5 | 25 |
| Viability | 4 | 4 | 16 | 4 | 16 | 2 | 8 |
| HITL Integrity | 5 | 4 | 20 | 5 | 25 | 4 | 20 |
| TOTAL (Weighted) | | | 65 | | 66 | | 62 |
Score proximity: The three approaches score within four weighted points of each other (62–66). This proximity is itself a finding — it reflects the fact that each approach is strong on different dimensions and weak on different dimensions. The matrix does not produce a clear winner. It produces a selection that depends on which trade-off the project owner judges most acceptable. A selection made purely on the matrix total would choose B (66) — but a selection made with eyes open to the Feasibility score of 2 would not.
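The weighted totals can be checked mechanically. The short Python sketch below reproduces the matrix arithmetic from the raw scores and weights above (approach labels are shortened for readability):

```python
# Reproduce the weighted scoring matrix: total = sum(raw score x weight).
# Raw scores and weights are the document's own; only the labels are shortened.
WEIGHTS = {"Desirability": 3, "Feasibility": 5,
           "Viability": 4, "HITL Integrity": 5}
RAW = {
    "A: Retrospective": {"Desirability": 3, "Feasibility": 4,
                         "Viability": 4, "HITL Integrity": 4},
    "B: Live Shadow":   {"Desirability": 5, "Feasibility": 2,
                         "Viability": 4, "HITL Integrity": 5},
    "C: Simulated":     {"Desirability": 3, "Feasibility": 5,
                         "Viability": 2, "HITL Integrity": 4},
}

totals = {a: sum(RAW[a][c] * w for c, w in WEIGHTS.items()) for a in RAW}
# totals -> A: 65, B: 66, C: 62 -- a four-point spread across the field
```

The four-point spread (62–66) is the "score proximity" finding: the matrix narrows the field but cannot decide it on its own.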

Approach Profiles: Strengths and Weaknesses Visualised

[Bar chart — raw scores per criterion, scale 1–5, for A: Retrospective, B: Live Shadow, C: Simulated Signal. Desirability: A 3, B 5, C 3. Feasibility: A 4, B 2, C 5. Viability: A 4, B 4, C 2.]

The chart makes the trade-offs immediately visible: B is tall on Desirability and HITL Integrity but collapsed on Feasibility — the most rigorous approach that cannot be built within current constraints. C is tall on Feasibility but collapsed on Viability — the most executable approach that does not answer the foundational question. A occupies the middle ground on most criteria.

4. Sweet Spot Analysis

The sweet spot of innovation sits at the intersection of Desirability (Heart), Feasibility (Hands), and Viability (Head). A solution failing on any one dimension produces a specific failure mode. The following analysis maps each approach to this framework and positions them relative to the failure zones.

[Venn diagram — Desirability (Heart), Feasibility (Hands), Viability (Head), with failure zones 'dream', 'unadopted', and 'unsustainable', the sweet spot at the centre, and approaches A (Retrospective), B (Live Shadow), and C (Simulated Signal) positioned relative to the circles.]

A sits closest to the sweet spot but carries residual risk of producing a 'dream' — retrospective evidence that the signal exists but insufficient proof that the steward can act on it in real time. B sits in the Desirability circle but outside the Feasibility circle — the most rigorous test that the project cannot currently execute. C sits firmly in the Feasibility circle but outside Viability — the most executable approach that does not answer the foundational question.

| Approach | Strongest Dimension | Weakest Dimension | Sweet Spot Risk |
|---|---|---|---|
| A: Retrospective Validation | Feasibility–Viability axis | Desirability (hindsight bias) | Dream: retrospective evidence compelling but prospective decision-making untested |
| B: Live Shadow System | Desirability + HITL Integrity | Feasibility (DPIA, pipeline, 90-day commitment) | Dream: ideal test that cannot be executed within current constraints |
| C: Simulated Signal Test | Feasibility (5) | Viability (2) | Non-adoption: proves the interface works but not that the underlying signal is real |

5. Trade-Off Analysis

5.1 The Foundational Question Trade-Off

Approach A is the only approach that directly answers the foundational question: does the signal exist in the data? If it does not — if the Forum Divergence Score, Silence Classifier, and Community Vocabulary Shift indicators show no consistent pattern before historical churns — then the entire Quiet Signal System concept requires fundamental revision. Approach B would eventually answer this question too, but only after significantly greater investment. Approach C does not answer it at all.

The trade-off is: Approach A answers the most important question first, but under imperfect conditions (retrospective, with hindsight bias). The alternative is to answer a less important question under better conditions (Approach C), or to answer all questions under ideal conditions but at a cost that may prevent the test from happening (Approach B).

5.2 The Regulatory Sequencing Trade-Off

Approach B is the only approach that requires the DPIA to be completed before deployment. This is not a bureaucratic hurdle — it is a substantive gate. Live profiling of current subscribers, even in shadow mode without intervention, triggers the UK GDPR Article 35 requirement for a data protection impact assessment, because it constitutes systematic profiling of individuals. The DPIA timeline is not within the project's direct control and depends on DPO capacity and potentially ICO consultation. Approach A occupies an intermediate position: it uses internal historical data but for retrospective analysis rather than live profiling, which is a materially different regulatory proposition. Approach C sidesteps the question entirely.

The trade-off is: the most rigorous approach (B) carries the highest regulatory risk and the longest critical path; the most executable approach (C) avoids regulatory engagement entirely but also avoids the data that matters.

5.3 The Evidence Strength Trade-Off

The three approaches produce evidence of fundamentally different kinds. Approach A produces correlational evidence: the signal was present before historical churns. Approach B produces predictive evidence: the signal identified subscribers who subsequently churned. Approach C produces usability evidence: the interface enables effective steward decision-making. Each type of evidence serves a different audience and answers a different question. Correlational evidence tells the data science team the signal is real. Predictive evidence tells the board the system works. Usability evidence tells the operations team the dashboard is fit for purpose. None is wrong; they are different. The trade-off is which evidence gap the project owner judges most dangerous to carry into the next iteration.

5.4 The Sequencing Argument

The three approaches could be sequenced: C first to validate the interface, then A to validate the signal, then B to validate the prediction — a legitimate three-phase MVP development path. But it is important to recognise what that sequencing implies: if C is chosen first, the project spends its first iteration learning whether the dashboard is usable without learning whether the signal is real. If the signal turns out not to exist in the data (tested in the second iteration via A), the interface validated in the first iteration was validated against synthetic signals that the real system cannot reproduce. This is a real cost, acknowledged and accepted.

6. Selection Decision

Selection: Approach C — Simulated Signal Test

The weighted scoring matrix does not produce a clear winner — the three approaches score within four points of each other (62–66) and each is strong on different dimensions. The selection is therefore determined not by the matrix alone but by an honest assessment of the constraints under which this project actually operates.

The decisive constraint is data access. This project does not have access to Naked Wines' internal data — neither historical CRM records, nor forum archives linked to individual subscriber profiles, nor live behavioural data. This is a hard boundary, not an unsecured dependency. Approach A scores a Feasibility of 4 in the matrix, but that score assumes internal data access is an organisational dependency that could be obtained. An independent critical assessment of the scoring confirms this is overstated: without internal data, a retrospective analysis would operate on fabricated historical data, which is functionally identical to Approach C with a retrospective framing bolted on. The distinction between A and C collapses when neither has access to the real data that gives A its analytical advantage. Approach B requires not only data access but a completed DPIA and a 90-day steward commitment — dependencies that are further from resolution than A's.

Approach C is the only approach that can be fully executed and demonstrated within the constraints that actually exist. It operates entirely on public proxy data and synthetic subscriber profiles. It requires no DPIA, no data integration pipeline, and no internal organisational approval. It can be built, tested with stewards, and iterated within the assignment timeline. This is not a compromise selection — it is the selection that takes the project's own Feasibility weighting (5, the joint highest) seriously.

The Viability weakness is real and is accepted, not explained away. Approach C does not answer the foundational question: is the signal real in Naked Wines' data? It validates the dashboard interface, the steward's decision-making workflow, the verbatim-first information hierarchy, and the HITL architecture — but does so against synthetic signals. A leadership team reviewing the results would know the tool is usable but would not know whether the underlying signal detection works. This is the correct limitation to carry into the next iteration. What happens next, if Approach C validates the interface and workflow, is Approach A — testing the signal against real data, with the interface design already grounded in steward feedback.

The HITL controlled comparison — presenting the same briefs in verbatim-first and score-first formats — is a specific methodological advantage of Approach C that neither A nor B offers. This directly tests the project's core architectural commitment: that surfacing verbatim community language before model scores produces better steward decisions. If this comparison shows no difference, the information hierarchy requires revision regardless of whether the signal is real. This is valuable evidence in its own right. A completed Approach C with a known Viability gap is more useful than an incomplete Approach A with no findings.

Prompt 3 · Develop & Deliver Phases · Develop: Preparation · Finalise & Deploy · Implement & Launch · Monitor & Learn

Action Plan — Relational Health Dashboard MVP

Approach C: Simulated Signal Test. Four phases structured from course material frameworks: Develop: Preparation (five key aspects) and Deliver: Finalise and Prepare for Deployment, Implement and Launch, and Monitor and Iterate Post-Launch. Each activity identifies what it is, who owns it, what it produces, and its dependencies.

Scope discipline: 'Launch' means launch the MVP prototype, gather feedback from relationship stewards, and determine what happens next — then return to the start of the double diamond. Not full system deployment. Known limitation carried forward: Approach C does not answer whether the signal is real in Naked Wines' actual data. This is the accepted trade-off; the next iteration (Approach A) addresses it.
DPIA as hard gate: No deployment of AI-driven behavioural profiling of subscriber data before a mandatory DPIA under UK GDPR Article 35 is completed. Approach C sidesteps this gate because no internal subscriber data is processed — but the gate reappears as the critical path item for the next iteration and must be initiated before Approach A can begin.
Phase 1 · Develop: Preparation · Five key aspects

Phase 1: Prepare for Prototyping

Lay the groundwork to transform the selected concept into something testable, enabling rapid learning through steward feedback. The mindset is 'build to learn,' not 'build to launch.'

1.1 Mini-Planning Session

A focused effort to outline the immediate next steps required to build a basic, testable version of the Relational Health Dashboard. This is not a full project plan. It answers three questions: What are the absolute first actions? Who is responsible? What is the quickest way to create something stewards can react to?

Activity · Owner · Produces · Dependencies
Define the ten subscriber brief archetypes covering the signal spectrum: concluded departure, active complaint, satisfied engagement, and ambiguous/borderline cases · Project lead (Tim) · Brief specification document listing the ten profiles with target language patterns and CRM characteristics for each · POV statement and HITL Critical Finding from Assignment 6
Confirm the steward review protocol: blind review (steward does not know which briefs are concluded vs active), verbatim-first format vs score-first comparison, and structured decision capture form · Project lead · Review protocol document specifying session format, question sequence, and data capture method · Brief specification (above)
Establish the build timeline: two-week sprint from brief specification sign-off to steward review sessions · Project lead · Sprint plan with milestones: Week 1 (data construction and prototype build), Week 2 (steward recruitment, briefing, and review sessions) · Resource availability confirmed

1.2 Resource Identification

Pinpointing the essential people, tools, and materials required for the initial prototype. The guiding principle is to use what is readily available and cost-effective.

Activity · Owner · Produces · Dependencies
People and skills: identify and recruit two to three relationship stewards (or equivalent senior customer-facing staff) willing to participate in a one-hour blind review session each · Project lead, with informal support from retention team lead · Confirmed steward participants with scheduled session times · Organisational willingness to release staff time for an experimental exercise; no formal approval gate required (no real subscriber data involved)
Tools and software: confirm the prototyping tool (interactive HTML, built via LLM-assisted development) and the feedback capture method (structured form or spreadsheet) · Project lead · Tool selection confirmed; template feedback form drafted · None (low-fidelity tools, no procurement)
Materials and data: assemble public proxy community language from Reddit, Trustpilot, and app store reviews of DTC subscription services; map to the language patterns identified in the project (concluded departure, we-to-they shift, satisfied silence) · Project lead with LLM support · Curated proxy language corpus organised by signal type, ready for brief construction · Public proxy sources accessible; language patterns defined in Assignment 6 ideation outputs
Budget: confirm zero direct cost for this prototype phase — all tools are available, all data is public, steward time is the only resource cost · Project lead · Budget note confirming nil incremental cost · None

1.3 Defining the 'What' and 'How'

A clear decision on the scope and fidelity of the first prototype. The principle is Minimum Testable Prototype — the smallest thing that can be built to learn whether the verbatim-first HITL design produces better steward decisions than the current score-first paradigm.

Activity · Owner · Produces · Dependencies
'What' to prototype: the Relational Health Dashboard as the steward experiences it — subscriber queue, verbatim community language displayed first, trajectory data (posting frequency, referral velocity, email engagement), confidence intervals (not binary flags), and steward decision options (Intervene, Monitor, Escalate) · Project lead · Prototype functional specification: what the steward sees, the information hierarchy, and the decision workflow · Brief specification and review protocol from 1.1
'How' to prototype: interactive HTML built with simulated data. High enough fidelity that the steward can experience the workflow realistically, but not a production system. Ten synthetic subscriber briefs with fabricated forum posts, dummy behavioural telemetry, and simulated NPS scores illustrating the metric-truth gap · Project lead with LLM-assisted development · Working interactive HTML prototype with ten populated subscriber briefs · Proxy language corpus from 1.2; brief archetypes from 1.1
Comparison condition: build a second view of the same briefs in score-first format (model score displayed prominently, verbatim language secondary) to enable a direct controlled comparison of the information hierarchy's effect on steward decision-making · Project lead · Score-first variant of the prototype interface for the same ten briefs · Verbatim-first prototype completed first
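The functional specification above implies a concrete data shape for each synthetic brief. A minimal sketch in Python of what one brief might carry; every field name and value here is an illustrative assumption, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass
class SubscriberBrief:
    """One synthetic subscriber brief as the steward sees it (hypothetical schema)."""
    brief_id: str
    archetype: str                       # e.g. "concluded_departure" (hidden from the steward)
    verbatim_posts: list[str]            # fabricated forum language, displayed first
    posting_frequency: list[int]         # posts per month, most recent last
    referral_velocity: list[int]         # referrals per quarter
    email_engagement: list[float]        # open rate per month
    nps_score: int                       # simulated prompted metric (the managed signal)
    risk_interval: tuple[float, float]   # a confidence interval, not a binary flag

DECISION_OPTIONS = ("Intervene", "Monitor", "Escalate")

# Example: a concluded-departure brief where the managed metric still reads well,
# illustrating the metric-truth gap the briefs are designed to simulate
brief = SubscriberBrief(
    brief_id="B03",
    archetype="concluded_departure",
    verbatim_posts=["We used to plan weekends around these cases. I've moved on."],
    posting_frequency=[6, 4, 2, 1, 0],
    referral_velocity=[3, 1, 0],
    email_engagement=[0.62, 0.41, 0.18],
    nps_score=9,
    risk_interval=(0.55, 0.80),
)
```

The score-first variant would render the same objects with only the display order changed, which is what makes the controlled comparison clean.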

1.4 Setting the Stage for Learning

The primary purpose of this prototype is not to build a perfect product but to enable rapid learning through steward testing. The mindset is 'build to learn,' not 'build to launch.'

Activity · Owner · Produces · Dependencies
Define the critical questions this prototype must answer: (1) Does the verbatim-first information hierarchy change the steward's decision compared to score-first? (2) Can the steward make a confident decision on a flagged subscriber brief? (3) Does the steward confirm that verbatim community language was the most useful element? · Project lead · Documented learning questions linked to the MVP completion criterion · MVP completion criterion from Concept Development; review protocol from 1.1
Design the feedback collection instrument: a structured form capturing decision made (Intervene/Monitor/Escalate), confidence level (1–5), most useful element (verbatim language, trajectory data, model score, other), and free-text observations · Project lead · Steward feedback form ready for use in review sessions · Learning questions defined (above)
Establish the success and failure criteria for this iteration: Success = stewards make confident decisions and identify verbatim signal as most useful in at least 7 of 10 briefs. Failure = stewards cannot distinguish signal types, or find the model score more useful than verbatim language. Both outcomes are informative · Project lead · Documented success/failure criteria with explicit statement that failure is a valid learning outcome · Learning questions and MVP completion criterion
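The success criterion above ('verbatim signal most useful in at least 7 of 10 briefs') reduces to a simple count over the feedback forms. A sketch, assuming a hypothetical form structure; field names are illustrative:

```python
# One feedback record per brief (hypothetical field names)
feedback = [
    {"brief": "B01", "decision": "Intervene", "confidence": 4, "most_useful": "verbatim"},
    {"brief": "B02", "decision": "Monitor",   "confidence": 5, "most_useful": "verbatim"},
    {"brief": "B03", "decision": "Escalate",  "confidence": 3, "most_useful": "score"},
    # ... seven more briefs in a full session
]

def meets_success_criterion(records, threshold=7):
    """Success = verbatim language chosen as most useful in >= threshold briefs."""
    verbatim_count = sum(1 for r in records if r["most_useful"] == "verbatim")
    return verbatim_count >= threshold

# With only 3 of an assumed 10 briefs recorded above, the criterion is not yet met
print(meets_success_criterion(feedback))  # False
```

Both outcomes remain informative: a False here is a valid learning result, not a pipeline error.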

1.5 Iterative Mindset

This phase is foundational for the iterative nature of Design Thinking. It prepares the project for a continuous loop of refinement, recognising that this first prototype is rarely the last.

Activity · Owner · Produces · Dependencies
Define the iteration pathway: this MVP (Approach C) tests the interface and steward workflow. The next iteration (Approach A: Retrospective Validation) tests whether the signal exists in real historical data. The third iteration (Approach B: Live Shadow System) tests predictive accuracy. Each iteration returns to the start of the double diamond · Project lead · Documented iteration sequence showing what each cycle tests and how it feeds the next · Critical Assessment and Approach Selection document (sequencing argument)
Identify the pivot criteria: if the steward review reveals that the interface design is fundamentally wrong (e.g. verbatim language is overwhelming rather than useful, or the decision options do not match how stewards actually think about subscriber relationships), the next iteration redesigns the interface before proceeding to signal validation · Project lead · Pivot criteria documented as part of the learning framework · Success/failure criteria from 1.4
Phase 2 · Deliver: Finalise and Prepare for Deployment · Five key aspects

Phase 2: Finalise and Prepare for Deployment

Ensure the prototype is polished, documented, compliant, and ready for the steward review sessions. The goal is a prototype that functions reliably and can be evaluated honestly.

2.1 Refinement and Polishing

The final stage of perfecting the prototype before it reaches the stewards. This involves addressing any interface issues and ensuring consistent design quality throughout.

Activity · Owner · Produces · Dependencies
Review all ten subscriber briefs for consistency: realistic CRM data ranges (tenure 5–12 years, pre-funded balances, referral histories), plausible forum language that does not read as obviously fabricated, and coherent behavioural telemetry trajectories · Project lead · Reviewed and corrected brief dataset; change log documenting any revisions · Prototype build complete from Phase 1
Test the steward workflow end-to-end: navigate the subscriber queue, open each brief, read verbatim language, review trajectory data, make a decision, complete the feedback form. Identify and fix any interface friction · Project lead (self-test), plus one informal tester if available · Bug list resolved; confirmed smooth workflow from queue to decision to feedback capture · Working prototype and feedback form
Verify the comparison condition: confirm that the score-first variant presents identical underlying data with only the information hierarchy changed, so any difference in steward decisions can be attributed to the display order rather than data differences · Project lead · Verified parity between verbatim-first and score-first prototype variants · Both prototype variants built
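The parity check can be automated: strip the presentation-order field and confirm the underlying brief data in the two variants is identical. A sketch under the assumption that each variant's data is exported as a dict with a hypothetical 'layout' key:

```python
import json

def parity_check(verbatim_first: dict, score_first: dict) -> bool:
    """Both prototype variants must carry identical brief data; only the
    presentation-order field (hypothetical 'layout' key) may differ."""
    a = {k: v for k, v in verbatim_first.items() if k != "layout"}
    b = {k: v for k, v in score_first.items() if k != "layout"}
    # Canonical JSON serialisation makes the comparison order-insensitive for keys
    return json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)

v1 = {"layout": "verbatim_first", "briefs": [{"id": "B01", "nps": 9}]}
v2 = {"layout": "score_first",    "briefs": [{"id": "B01", "nps": 9}]}
print(parity_check(v1, v2))  # True
```

If this returns False, any observed decision difference between conditions is confounded by data differences and the session evidence cannot be attributed to the information hierarchy.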

2.2 Comprehensive Documentation

Creating clear, accessible materials for the stewards who will interact with the prototype and for the project record.

Activity · Owner · Produces · Dependencies
Steward briefing document: a one-page guide explaining what the steward will see, what they are being asked to do, and what they should not assume (i.e. that this is a finished product or that the data is real). Must not reveal which briefs are concluded departure vs active complaint · Project lead · One-page steward briefing document, reviewed for clarity and absence of bias cues · Prototype finalised from 2.1
Review session facilitation guide: step-by-step instructions for running the session, including the order of brief presentation, when to switch from verbatim-first to score-first format, and how to capture feedback without leading the steward · Project lead · Session facilitation guide · Review protocol from Phase 1; steward briefing document
Technical documentation: how the prototype was built, what data sources were used, how the simulated signals were constructed, and what the known limitations are. Required for the assignment appendix and for any future iteration team · Project lead · Technical specification document covering build method, data provenance, and known limitations · Prototype build complete

2.3 Legal, Ethical, and Compliance Review

A critical check to ensure the prototype adheres to all relevant laws, regulations, and ethical guidelines. Approach C sidesteps the DPIA sequencing gate because no internal subscriber data is processed, but this position must be formally confirmed.

Activity · Owner · Produces · Dependencies
Confirm the regulatory position: Approach C uses only public proxy data and synthetic profiles. No real Naked Wines subscriber data is processed. Confirm with legal counsel that this characterisation is accurate and that the steward review sessions — using fictional briefs with recruited participants — do not trigger any data protection obligations · Project lead (confirm with DPO if available) · Written confirmation of regulatory position for the record · Prototype specification finalised
Initiate the DPIA for the next iteration: even though the DPIA is not required for Approach C, the DPIA process for Approach A (Retrospective Validation) should be initiated now. The DPIA may take 8–16 weeks. Starting it during Phase 2 of Approach C means it will be complete (or close to complete) when Approach C concludes and Approach A is ready to begin · Project lead, with DPO as gating authority · DPIA initiation document for Approach A (retrospective analysis of historical subscriber data); timeline established · DPO availability and organisational approval
Ethical review of the steward briefing: ensure the briefing document does not prime the steward to prefer verbatim language. The prototype must capture the steward's genuine response, not a response shaped by the briefing · Project lead with independent reader review · Briefing document confirmed as bias-free · Briefing document from 2.2

2.4 Technical Readiness

Ensuring the prototype is technically stable and accessible for the steward review sessions.

Activity · Owner · Produces · Dependencies
Deploy the prototype to a stable, accessible URL: a static hosting service (e.g. Cloudflare Pages, GitHub Pages) that the steward can access from any device without installation or login · Project lead · Live prototype URL accessible on desktop and mobile · Prototype build and polishing complete
Test prototype performance and cross-browser compatibility: confirm the interface renders correctly on Chrome, Safari, and Edge. Confirm the feedback form submits correctly · Project lead · Cross-browser test log; confirmed form submission · Deployment complete
Prepare a fallback: a PDF or printed version of the subscriber briefs in case of technical failure during a session · Project lead · Printed brief set as session backup · Brief content finalised

2.5 Stakeholder Sign-Off

Final approval before the prototype is released to stewards. At the MVP stage, the stakeholder group is small — the project lead and the retention team lead at minimum.

Activity · Owner · Produces · Dependencies
Retention team lead review: share the prototype, the briefing document, and the session protocol with the retention team lead. Address any concerns about content, process, or steward selection before sessions begin · Project lead · Retention team lead sign-off (informal, documented) · All Phase 2 outputs complete
Confirm steward availability: final confirmation that the recruited stewards are available for their scheduled sessions and have received the briefing document · Project lead · Confirmed session schedule · Steward briefing document from 2.2
Phase 3 · Deliver: Implement and Launch · Four key aspects

Phase 3: Implement and Launch

Execute the steward review sessions and gather the evidence the prototype was designed to produce. 'Launch' for this MVP means running the steward review sessions — not public deployment or subscriber-facing rollout.

3.1 Deployment Strategy

The deployment is the steward review sessions. Two to three individual one-hour sessions, each using the same protocol and the same ten briefs.

Activity · Owner · Produces · Dependencies
Session sequencing: run all ten briefs in verbatim-first format, then re-present the same briefs in score-first format. This preserves the verbatim-first experience as the primary condition while still enabling the controlled comparison. The order is consistent across all stewards so that any sequence effect is held constant · Project lead (session facilitator) · Consistent session protocol across all stewards · Session facilitation guide from 2.2
Stagger sessions by at least 48 hours: stewards must not discuss the prototype with each other between sessions. The facilitation guide includes an explicit instruction not to share opinions before all sessions are complete · Project lead · Session schedule with adequate separation · Steward availability confirmed

3.2 Market Introduction and Communication

At MVP stage, 'market introduction' means introducing the prototype to the stewards — not external communication.

Activity · Owner · Produces · Dependencies
Opening orientation (10 minutes per session): walk the steward through the interface, explain what each element of the brief contains, demonstrate the decision workflow with a sample brief that is not one of the ten. Do not instruct the steward on how to interpret the signals — the prototype must capture the unguided response · Project lead · Steward oriented and ready to begin; orientation script used consistently · Briefing document and prototype deployed
Explain the purpose without disclosing the hypothesis: tell the steward that the exercise is testing a new interface design and that there are no right or wrong answers. Do not reveal that the project is specifically testing whether verbatim language produces better decisions than a model score — this would prime the steward's response · Project lead · Blinded session conditions maintained · Facilitation guide from 2.2

3.3 Training and Support

At MVP stage, training is minimal and deliberately so. The steward's unguided response is the data.

Activity · Owner · Produces · Dependencies
Orientation only: the facilitation guide specifies exactly what to show and explain, and what not to explain. The steward should understand the mechanics (how to navigate, how to submit decisions) but not the interpretive framework (what signals to look for, how to weight verbatim language vs trajectory data) · Project lead · Consistent orientation across all sessions; no interpretive contamination · Facilitation guide from 2.2
Structured debrief (15 minutes per session): after the steward has completed all ten briefs, conduct an open-ended debrief. Ask open questions: what did you find most useful? What was confusing? What would you want to know that you could not see? Do not reveal the ground truth (which briefs were concluded departure) until after the debrief is complete · Project lead · Debrief notes per steward; verbatim captures of key observations · All ten briefs reviewed and decisions submitted
Resistance signal log: specifically capture any steward comments about the surveillance dimension — whether they have concerns about the ethics of monitoring community language that subscribers wrote to each other, not to the brand. This is critical data for the Cultural Enablement plan · Project lead · Surveillance/legitimacy reaction captured per steward, with verbatim quotes where possible · Debrief notes

3.4 Change Management

The MVP introduces a new workflow and a new decision-making paradigm. Even at prototype stage, the steward is experiencing something that may challenge their existing understanding of what the retention function does.

Activity · Owner · Produces · Dependencies
Acknowledge the role shift explicitly in the debrief: ask the steward whether the stewardship model (qualitative judgement, no save script) feels meaningfully different from the current retention model. Do not advocate for the change — capture the steward's honest response to the contrast · Project lead · Steward perspective on the role-identity dimension of the change, captured in debrief notes · Debrief complete
Identify early adopters and resisters: note which stewards engage with the verbatim-first format readily and which find it difficult or uncomfortable. This is the early signal for how the capability-building programme needs to be structured in the post-MVP phase · Project lead · Informal adopter/resister profile for each steward participant · Session and debrief observations
Phase 4 · Deliver: Monitor and Iterate Post-Launch · Four key aspects

Phase 4: Monitor and Learn

Analyse the evidence from the steward sessions, identify what the prototype has proven and what it has not, and determine what happens next. The double diamond loop closes here and reopens for the next iteration.

4.1 Performance Tracking

Quantitative analysis of the session data against the success and failure criteria established in Phase 1.

Activity · Owner · Produces · Dependencies
Calculate steward confidence scores: average confidence (1–5) per brief and per steward, across both verbatim-first and score-first conditions. Note any outlier briefs — high or low confidence — that may reveal something about the brief construction or the interface design · Project lead · Confidence score summary: per-brief, per-steward, per-condition · All feedback forms submitted
Analyse the most useful element data: calculate what percentage of briefs had verbatim language selected as the most useful element, across all stewards and both conditions. Compare this against the success criterion (at least 7 of 10 briefs) · Project lead · Most useful element distribution: verbatim language vs trajectory data vs model score vs other · Feedback forms submitted
Calculate the comparison condition effect: for each brief where the steward reviewed both formats, note whether the decision changed. Record how many decisions changed, in which direction, and whether the steward's confidence increased or decreased when switching to score-first. This is the primary test of the verbatim-first architectural commitment · Project lead · Format comparison table: brief-by-brief decision and confidence delta between verbatim-first and score-first conditions · Both format conditions completed per steward
Calculate inter-steward agreement: for each brief, note whether different stewards made the same decision. High agreement on concluded-departure briefs and active-complaint briefs (the clearly differentiated cases) is evidence that the interface communicates the signal clearly. Low agreement on borderline cases is expected and acceptable · Project lead · Agreement matrix: per-brief inter-steward agreement rate · All sessions complete
Assemble a metrics dashboard: a single summary document presenting all quantitative outputs in one place, ready for the evidence report in 4.3 · Project lead · One-page metrics summary with all key quantitative findings · All calculations above complete
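The four calculations above are simple aggregations over the session records. A sketch of how the dashboard numbers might be derived; the record structure and field names are assumptions, not the project's actual capture format:

```python
from collections import Counter
from statistics import mean

# One record per (steward, brief, condition); hypothetical structure
records = [
    {"steward": "S1", "brief": "B01", "condition": "verbatim_first",
     "decision": "Intervene", "confidence": 4, "most_useful": "verbatim"},
    {"steward": "S1", "brief": "B01", "condition": "score_first",
     "decision": "Monitor", "confidence": 3, "most_useful": "score"},
    {"steward": "S2", "brief": "B01", "condition": "verbatim_first",
     "decision": "Intervene", "confidence": 5, "most_useful": "verbatim"},
]

# 4.1a Mean confidence per condition
by_condition = {}
for r in records:
    by_condition.setdefault(r["condition"], []).append(r["confidence"])
confidence = {c: mean(v) for c, v in by_condition.items()}

# 4.1b Most-useful-element distribution
useful = Counter(r["most_useful"] for r in records)

# 4.1c Did the decision change between formats for a given steward and brief?
def decision_changed(steward, brief):
    decisions = {r["condition"]: r["decision"] for r in records
                 if r["steward"] == steward and r["brief"] == brief}
    return (len(decisions) == 2
            and decisions["verbatim_first"] != decisions["score_first"])

# 4.1d Inter-steward agreement on the primary (verbatim-first) condition
def agreement(brief):
    d = [r["decision"] for r in records
         if r["brief"] == brief and r["condition"] == "verbatim_first"]
    return len(set(d)) == 1

print(confidence, useful["verbatim"], decision_changed("S1", "B01"), agreement("B01"))
```

In this toy data, the decision on B01 flips between formats and both stewards agree under verbatim-first, which is exactly the pattern the format comparison table and agreement matrix are built to expose.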

4.2 Gathering Post-Launch Feedback

Qualitative analysis of the debrief notes and free-text feedback. The most important data from this prototype may be what the stewards said, not what they scored.

Activity · Owner · Produces · Dependencies
Aggregate the quantitative data and organise the free-text observations by theme: what was useful, what was confusing, what was missing, what surprised the steward · Project lead · Aggregated feedback report: quantitative summary plus thematic analysis of qualitative observations · All review sessions and debriefs completed
Identify the steward's experience of the prototype at a personal level: did they find it engaging or tedious? Did the simulated data feel realistic enough to evaluate the interface honestly? Would they use this tool if the data were real? These questions matter because the prototype's value proposition depends on the steward wanting to use it, not just being able to · Project lead (via debrief notes and follow-up if needed) · Steward experience summary capturing adoption-relevant qualitative data · Debrief notes from Phase 3
Capture the surveillance reaction: does the steward have concerns about the ethics of monitoring community language that subscribers wrote to each other, not to the brand? If so, record the nature and intensity of the concern. This is critical input for the Cultural Enablement plan · Project lead · Ethics concern log per steward, feeding into the Cultural Enablement plan · Debrief notes and resistance signal log from Phase 3

4.3 Problem Identification and Opportunity Spotting

Proactively analysing the collected data and feedback to identify both issues and potential areas for growth.

Activity · Owner · Produces · Dependencies
Synthesise the metrics and feedback to answer the three learning questions from Phase 1 (1.4): (1) Does the verbatim-first hierarchy change decisions? (2) Can stewards decide confidently? (3) Is verbatim signal the most useful element? Present findings as evidence statements, not opinions · Project lead · Evidence report answering the three learning questions with data · Metrics dashboard and aggregated feedback report from 4.1 and 4.2
Conduct root cause analysis on any problems identified: if steward confidence was low on specific briefs, why? If inter-steward agreement was poor, was it the signal, the interface, or the brief construction? If the comparison condition showed no difference, what does that imply about the information hierarchy hypothesis? · Project lead · Root cause analysis per identified problem, distinguishing between prototype issues (fixable) and conceptual issues (requiring rethink) · Evidence report (above)
Spot opportunities: did the stewards suggest enhancements, raise use cases the project had not considered, or identify elements of the brief that should be added or removed? Did the exercise reveal anything about how stewards currently make relationship decisions that the project should incorporate? · Project lead · Opportunity log capturing steward-originated suggestions and unexpected findings · Debrief notes and free-text feedback

4.4 Continuous Improvement and Iteration

Based on the evidence gathered, the project determines what happens next and returns to the start of the double diamond for the next iteration. This is where the Design Thinking loop closes and reopens.

Activity · Owner · Produces · Dependencies
Make the iteration decision based on the evidence. Three possible paths: (1) Success — the interface works and stewards value verbatim signal; proceed to Approach A (Retrospective Validation) to test whether the signal exists in real data. (2) Partial success — stewards value the concept but the interface needs redesign; iterate on the prototype before proceeding. (3) Failure — stewards do not find the approach useful; revisit the foundational assumptions before investing further · Project lead (decision owner) · Documented iteration decision with evidence-based rationale, explicitly stating which path and why · Evidence report and root cause analysis from 4.3
Produce the iteration brief for the next cycle: what was learned, what assumption is tested next, what changes carry forward, what the entry conditions are for the next iteration (including, if Approach A is next, the DPIA as a hard gate) · Project lead · Iteration brief — a one-page document that serves as the starting context for the next double diamond cycle · Iteration decision (above); DPIA dependency note from Phase 2 (2.3)
Update the financial case: based on the evidence, refine (or maintain) the projected value. If the steward review was successful, the financial case for the next iteration is: the cost of running Approach A (retrospective analysis of historical data) against the potential to preserve approximately £478,000 in annual revenue through a 5-percentage-point churn reduction in the high-LTV cohort · Project lead · Updated financial case note for the iteration brief · Evidence report; financial context from project instructions
The loop closes here and reopens. Whatever the evidence from Approach C reveals, the next action is clear: document the findings, update the financial case, write the iteration brief, and return to the Discover phase of the double diamond with a better-defined question. The MVP is not the destination. It is the first step in the evidence chain.
Prompt 4 · Deliver Phase — Testing · Three Scenarios · Different Failure Modes · Different Decisions

Three Simulated Variations

Three genuinely different outcomes — not optimistic, moderate, and pessimistic versions of the same trajectory. Each variation traces a structurally different way the plan could play out, testing a different failure or learning mode. Scenario selection was delegated to the LLM; Tim's HITL role was to assess whether the chosen scenarios represented genuinely different failure modes rather than variations of the same scenario.

Financial context: Annual replacement burden for churned high-LTV Angels is approximately £2.3m. Revenue preserved per 5-percentage-point churn reduction is approximately £478,000. These figures are derived from group-level HY26 interim results (UK segment estimated at 43% of group revenue).
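The quoted figures imply simple arithmetic. The at-risk cohort revenue below is back-derived from the stated numbers and should be treated as an assumption, not a disclosed figure:

```python
# Back-derived sketch of the financial case (assumed derivation, not disclosed data)
preserved_per_5pp = 478_000     # £ preserved per 5-percentage-point churn reduction
replacement_burden = 2_300_000  # £ annual replacement burden for churned high-LTV Angels

# Implied annual revenue of the at-risk high-LTV cohort (assumption, back-derived)
cohort_revenue = preserved_per_5pp * 100 / 5   # approximately £9.56m

def preserved(revenue, reduction_pp):
    """Revenue preserved for a churn reduction expressed in percentage points."""
    return revenue * reduction_pp / 100

print(int(cohort_revenue))                # 9560000
print(int(preserved(cohort_revenue, 5)))  # 478000
```

On these assumptions, the annual replacement burden is roughly 4.8 times the revenue preserved by a single 5-point churn reduction, which is why the ratio is described as favourable even with the extended Approach A timeline.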
Variation · What Goes Right · What Goes Wrong · Core Learning
Variation 1 · The HITL design and interface · Proxy language fidelity — the known limitation becomes the blocking finding · Interface validated; data source insufficient; DPIA becomes the critical path
Variation 2 · Prototype build and session execution · The foundational design principle — stewards prefer the format the project was designed to replace · Verbatim-first hierarchy challenged; redesign loop before proceeding; commercial differentiation at risk
Variation 3 · The concept broadly works as designed · Nothing breaks — the scope shifts upward · Stewards identify a more valuable, earlier signal; scope expands; ethical stakes rise

Variation Pathways: From Common Trunk to Diverging Outcomes

At Phase 3, the review sessions diverge into three pathways:
V1 · Source Rejected: proxy language fails → Approach A, with the DPIA as hard gate; 2–4 month delay
V2 · Score Wins: verbatim-first fails → redesign loop, then DPIA + Approach A; commercial risk if score-first prevails
V3 · Earlier Signal: scope expands upward → Approach A with expanded scope; higher value target; higher ethical stakes
All paths → the next double diamond

Phases 1–2 are the common trunk: identical build and preparation across all three variations. Phase 3 (the steward review sessions) is where the paths diverge. All three paths return to the start of the double diamond — but with different questions, different entry conditions, and different timelines.

Variation 1: The Interface Works, But the Source Doesn't Land

What goes right: the HITL design. What goes wrong: the proxy language fidelity. The known limitation becomes the blocking finding.

Phases 1–2: Build and Preparation

Phases 1 and 2 proceed as planned. The ten subscriber briefs are constructed from public proxy language (Reddit, Trustpilot, app store reviews of DTC subscription services) mapped onto fictional subscriber profiles with realistic CRM data. The brief archetypes cover the full signal spectrum: three concluded departure, two active complaint, three satisfied engagement, and two borderline cases. The interactive HTML prototype is built in Week 1. Both verbatim-first and score-first variants are tested and polished. The steward briefing document is prepared. The compliance position is confirmed: no real subscriber data, no DPIA dependency. Two relationship stewards are recruited and scheduled for individual one-hour sessions in Week 2. Nothing in Phases 1 or 2 signals a problem. The build is clean, the documentation is complete, and the stewards are briefed without issue.

Phase 3: The Review Sessions

The first steward completes the verbatim-first review of all ten briefs. Her quantitative feedback is strong: confidence scores average 4.2 out of 5 across all ten briefs. She identifies verbatim community language as the most useful element in 8 of 10 briefs. Her decisions on the concluded-departure briefs are correct and confident. Inter-format comparison shows she changes her decision on 2 of 10 briefs when switching to score-first, both times toward less nuanced choices. On the quantitative metrics, this is a pass.

However, in the debrief, she raises a concern the project had anticipated as a known limitation but not as a session-stopping objection. She observes that the forum language in the briefs does not sound like Naked Wines Angels. The vocabulary, the emotional register, and the relational context are wrong. Reddit posts about generic subscription boxes use different idioms, different levels of emotional investment, and a different assumed audience than the Naked Wines community forum. She puts it directly: the concluded-departure briefs are convincing as a concept, but she cannot evaluate whether this interface would work on real data because the proxy language does not feel real.

The second steward independently raises the same concern, though with a different emphasis. He finds the active complaint briefs particularly unconvincing — the proxy sources express frustration differently from how Angels in the Naked Wines community express frustration. His confidence scores are lower (average 3.6) and he attributes the lower confidence specifically to doubting the language, not the interface design.

The trigger point: Both stewards independently identify proxy language fidelity as the primary limitation. Steward 1 passes on metrics but flags the limitation qualitatively. Steward 2's lower confidence scores are directly attributed to language doubt, not interface doubt. The action plan's known limitation — 'public proxy data may not fully represent the linguistic register of Naked Wines' own community forum' — has materialised as the blocking finding.

Phase 4: What the Team Learns

The evidence report answers the three learning questions with mixed results. Does the verbatim-first hierarchy change decisions? Yes — the comparison condition shows a measurable difference. Can stewards decide confidently? Conditionally — Steward 1 yes, Steward 2 only when he brackets the language concern. Is verbatim signal the most useful element? Yes, when it is believed to be authentic; the value proposition collapses when the language feels constructed. The root cause analysis distinguishes between a prototype issue and a conceptual issue. The interface design is not the problem. The information hierarchy works. The brief format supports confident decision-making. The problem is that Approach C, by design, cannot answer the question the stewards need answered: does this language actually exist in the Naked Wines community?

Decision for the Next Iteration

The project team confirms the planned iteration pathway: proceed to Approach A (Retrospective Validation). The interface has been validated to the degree that simulated data allows. The next iteration must test whether the signal exists in real historical data from churned Angels' actual forum posting histories. This triggers the DPIA dependency. The iteration brief for Cycle 2 carries forward the validated interface design (verbatim-first hierarchy confirmed as superior to score-first) and adds a new entry condition: real community language from the actual Naked Wines forum must be used, which requires data access approval and DPIA sequencing.

Financial Implications

Direct cost of this iteration: nil incremental (steward time only). The financial case for the next iteration is unchanged: the potential to preserve approximately £478,000 in annual revenue through a 5-percentage-point churn reduction remains the target. The cost of the next iteration increases because Approach A requires DPIA completion (estimated 8–16 weeks of DPO and legal resource) and data engineering to link forum posting histories to CRM records for churned Angels. The project team estimates this at 2–3 months of elapsed time and a modest internal resource cost, weighed against the £2.3m annual replacement burden. The ratio is favourable, but the timeline extends.
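The unchanged financial case rests on simple arithmetic. As a minimal sketch: the at-risk cohort revenue base below is back-calculated from the stated figures (£478k per 5pp implies roughly £9.56m) and is an inference, not a source number; the £2.3m replacement burden is as stated.

```python
# Hypothetical sketch of the retention financial model described above.
# COHORT_REVENUE is back-calculated from the stated £478k-per-5pp figure
# and is an assumption; it does not appear in the source material.

def revenue_preserved(cohort_annual_revenue: float,
                      churn_reduction_pp: float) -> float:
    """Annual revenue preserved by reducing churn by the given number of
    percentage points, assuming churn applies uniformly to revenue."""
    return cohort_annual_revenue * (churn_reduction_pp / 100)

COHORT_REVENUE = 9_560_000      # assumed: £478k / 5pp => ~£9.56m at-risk revenue
REPLACEMENT_BURDEN = 2_300_000  # stated £2.3m annual replacement cost

preserved = revenue_preserved(COHORT_REVENUE, 5)
print(f"Preserved per year: £{preserved:,.0f}")   # £478,000
print(f"Share of replacement burden offset: {preserved / REPLACEMENT_BURDEN:.0%}")
```

The second figure is why the team judges the ratio "favourable": even the conservative 5pp target offsets roughly a fifth of the annual replacement burden for steward time plus DPIA and data-engineering effort.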

What Feeds Back into the Double Diamond
Validated: Verbatim-first information hierarchy produces different (and, by steward judgement, better) decisions than score-first. Interface design and steward workflow are fit for purpose.
Invalidated: Public proxy language is not a sufficient stand-in for real community data, even for interface testing. This limitation was anticipated but is now confirmed as a hard constraint.
New entry condition: DPIA sequencing becomes the critical path item. The project cannot progress without access to real subscriber data.
Reframed question: The next iteration no longer asks 'Does the interface work?' It asks 'Does the signal exist in the actual data, and does it look like what we simulated?'

Variation 2: The Score Wins

What goes right: the prototype build and session execution. What goes wrong: the foundational design principle. Stewards prefer the format the project was designed to replace.

Phases 1–2: Build and Preparation

Phases 1 and 2 proceed identically to Variation 1. The prototype is built, polished, documented, and ready. The same ten briefs, the same two stewards, the same protocol. Nothing in the preparation phases distinguishes this variation from the first.

Phase 3: The Review Sessions

The first steward completes the verbatim-first review. Her decision confidence is adequate but not strong: average 3.4 out of 5 across all ten briefs. She correctly identifies the three concluded-departure briefs but notes that reading the verbatim language took longer than she expected and that she found herself scrolling past the forum posts to reach the trajectory data and the scores. In her feedback, she selects 'trajectory data' as the most useful element in 5 of 10 briefs and 'verbatim language' in only 3.

When she switches to the score-first format for the comparison condition, her confidence rises to 4.1 out of 5 and her decision speed improves noticeably. She makes the same decisions on 8 of 10 briefs, but with greater confidence and in roughly half the time. In the debrief, she explicitly states that she finds the score-first format more useful: 'I know what I'm looking at immediately. The verbatim stuff is interesting, but it's a lot to read before I can make a decision.'

The second steward produces a similar pattern. His verbatim-first confidence averages 3.6, rising to 4.3 under score-first. He changes his decision on 3 of 10 briefs when switching formats — in two cases, the score-first format makes him more confident about a concluded-departure brief; in one case, it makes him less nuanced about a borderline case. His overall preference is clearly for the score-first format.

The trigger point: The comparison condition produces a clear result in the wrong direction. Both stewards make faster, more confident decisions under score-first conditions. The verbatim-first hierarchy — the project's architectural centrepiece — does not produce the expected advantage in the stewards' decision-making process.

Phase 4: What the Team Learns

The evidence report answers the three learning questions unfavourably. Does the verbatim-first hierarchy change decisions? Yes — but in the opposite direction from the hypothesis: stewards perform better under score-first conditions. Can stewards decide confidently? Yes, under score-first conditions. Is verbatim signal the most useful element? No — trajectory data and model scores rank above verbatim language for both stewards.

The root cause analysis must distinguish between two possible explanations. First, the interface presentation of verbatim language may be the problem — the language is too long, too unstructured, or presented in a format that creates cognitive load rather than clarity. Second, the verbatim-first principle itself may be wrong — stewards genuinely process information better when they have a quantitative anchor first, and the qualitative language functions as contextual support rather than primary signal. These two explanations require different responses.

Decision for the Next Iteration

The project team chooses not to abandon the verbatim-first principle on the basis of a two-person test with simulated data. The stewards' preference for scores may itself be evidence of the managed-metric problem: if the operational culture is conditioned to trust numbers over language, then the stewards' preference for score-first is predictable but not necessarily correct. However, the team also acknowledges that if the redesigned verbatim-first format still loses to score-first in the next iteration, the hypothesis must be revised. The redesign targets three specific changes: shorter, curated verbatim extracts (two to three sentences, not full posts); a one-line plain-language signal summary above the verbatim text; and a clearer visual hierarchy that reduces the cognitive load of unstructured text. The redesigned prototype is tested in a second iteration before proceeding to Approach A.
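The first redesign change — curating verbatim extracts down to two or three sentences — can be pictured with a minimal sketch. The sentence splitting below is deliberately naive, and the example post is composed from language quoted elsewhere in this appendix; it illustrates the idea, not the project's actual curation logic.

```python
# Minimal sketch of extract curation: trim a full forum post to a short
# verbatim extract. Splitting on ./?/! is naive; a real implementation
# would need proper sentence segmentation.
import re

def curate_extract(post: str, max_sentences: int = 3) -> str:
    """Return the first max_sentences sentences of a post, verbatim."""
    sentences = re.split(r"(?<=[.!?])\s+", post.strip())
    return " ".join(sentences[:max_sentences])

post = ("I used to recommend Naked Wines to everyone I knew. "
        "The last three deliveries arrived damaged. "
        "Nobody followed up. "
        "I am not angry. I am finished.")
print(curate_extract(post))
```

Note the design tension this makes visible: truncation preserves the verbatim property but risks cutting the concluding sentence — which is often the signal — so curation would likely need to select, not merely truncate.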

Financial Implications

The immediate financial impact is a delay — the planned progression from Approach C to Approach A is paused while the interface is redesigned and retested. This adds approximately 2–4 weeks to the overall timeline. Direct cost remains minimal. There is a second-order financial implication: if the eventual finding is that the verbatim-first hypothesis is wrong and stewards genuinely perform better with a score-first format, the project's differentiation from existing customer health scoring platforms (Gainsight, ChurnZero) weakens significantly. The commercial case for the Quiet Signal System rests partly on it doing something structurally different from standard churn prediction. If the HITL design converges toward a conventional dashboard with a score at the top, the question becomes whether the underlying signal (community language analysis) is different enough to justify the investment, even if the interface is conventional.

What Feeds Back into the Double Diamond
Validated: The comparison condition works as a testing instrument. Stewards can evaluate both formats and articulate clear preferences. The session protocol and feedback capture method are robust.
Challenged: The verbatim-first information hierarchy — the project's core HITL design principle. Not invalidated (the test conditions are too limited), but the evidence from this iteration does not support it.
New design question: Is the problem the format of the verbatim presentation (fixable) or the principle of verbatim-first itself (fundamental)? The next iteration must disambiguate.
Commercial risk: If the verbatim-first principle is ultimately wrong, the Quiet Signal System converges toward conventional health scoring. The signal source (community language) may still differentiate, but the architectural commitment to verbatim-first is what makes the system genuinely novel.

Variation 3: The Earlier Signal

What goes right: the concept broadly works. What goes wrong: nothing breaks — the scope shifts. Stewards want the system to detect the stage before conclusion.

Phases 1–2: Build and Preparation

Phases 1 and 2 proceed identically to the prior variations. The prototype is built, polished, documented, and ready. The same ten briefs, the same two stewards, the same protocol. The preparation is clean and complete.

Phase 3: The Review Sessions

Both stewards perform well on the concluded-departure briefs. Steward 1's confidence averages 4.4 across the concluded-departure and active-complaint briefs — high and consistent. She identifies verbatim language as the most useful element in 7 of 10 briefs. The comparison condition shows she changes her decision on 2 of 10 briefs when switching to score-first, both times toward less nuanced choices. On the defined success criteria, this is a clear pass.

However, the most interesting data comes from the borderline briefs — the two cases where the subscriber is drifting but has not yet concluded. Both stewards spend significantly more time on these briefs, re-reading the verbatim language multiple times and expressing uncertainty. Steward 1 says: 'This one is the hardest. By the time the language says they're finished, it's already too late. I want to see the ones who are starting to drift.' Steward 2 makes the same observation independently, framing it as a workflow question: 'If this is what they sound like when they're leaving, what do they sound like six months before that? That's when I could actually do something.'

Both stewards explicitly request an earlier intervention point. Neither is dissatisfied with the current prototype; both are articulating a more ambitious use case that the current MVP does not address. The steward feedback is not a critique of what the system does — it is a specification of what they want it to do instead.

The trigger point: The prototype succeeds on its defined scope but the stewards' engagement with the borderline briefs reveals a more valuable target. The feedback is not 'this doesn't work' but 'this works — now do it earlier.' The scope shifts upward.

Phase 4: What the Team Learns

The evidence report answers the three learning questions positively. Does the verbatim-first hierarchy change decisions? Yes, in the expected direction. Can stewards decide confidently? Yes, on clearly differentiated briefs. Is verbatim signal the most useful element? Yes, with high consistency. The prototype has succeeded. The more significant finding is the stewards' articulation of a more ambitious requirement: earlier detection of the relational shift that precedes conclusion, not just detection of conclusion itself.

The root cause analysis of the borderline brief difficulty reveals that the Community Vocabulary Shift component — designed to detect the 'we' to 'they' shift — is targeting the wrong point in the departure trajectory. By the time the vocabulary shift is fully expressed in the concluded-departure pattern, the subscriber has already decided. The stewards are identifying the stage where the shift is beginning but has not yet settled — a harder NLP problem but a higher-value intervention window.
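What 'detecting an emerging rather than settled pattern' means computationally can be illustrated with a deliberately simple sketch. The pronoun lists, thresholds, and trajectory rules below are hypothetical illustrations only — the actual Community Vocabulary Shift component is not specified at this level of detail.

```python
# Hypothetical sketch: tracking the 'we' -> 'they' pronoun shift across a
# subscriber's forum posts. Thresholds (0.8, 0.4) are illustrative, not
# the project's tuned values.
import re

IN_GROUP = {"we", "us", "our"}
OUT_GROUP = {"they", "them", "their"}

def they_ratio(post: str) -> float:
    """Share of group-reference pronouns that are out-group ('they')."""
    words = re.findall(r"[a-z']+", post.lower())
    in_n = sum(w in IN_GROUP for w in words)
    out_n = sum(w in OUT_GROUP for w in words)
    total = in_n + out_n
    return out_n / total if total else 0.0

def classify_trajectory(ratios: list[float]) -> str:
    """Settled: consistently out-group (conclusion already expressed).
    Emerging: rising trend (the earlier intervention window)."""
    if len(ratios) < 3:
        return "insufficient data"
    recent = ratios[-3:]
    if min(recent) > 0.8:
        return "settled shift"
    if recent[-1] - ratios[0] > 0.4:
        return "emerging shift"
    return "stable"

posts = [
    "We love what our winemakers are doing, count us in.",
    "We still enjoy the cases, though their delivery choices puzzle us.",
    "They changed the range again and their support didn't reply.",
]
print(classify_trajectory([they_ratio(p) for p in posts]))  # prints "emerging shift"
```

Even this toy version shows why the earlier signal is harder: the 'emerging' rule depends on a trend across noisy per-post ratios, which is exactly where the false-positive risk named below originates.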

Decision for the Next Iteration

The project team accepts the scope expansion. The prototype validated that the interface works and the steward is willing to use verbatim community language as a decision input. The next iteration expands the signal detection target to include the pre-conclusion drift stage — a subscriber whose language is changing but who has not yet concluded. This requires the Community Vocabulary Shift component to detect an emerging pattern rather than a settled one, which is a harder NLP problem with a higher false-positive rate. The next iteration (Approach A, with real historical data) must therefore test not only whether the conclusion signal exists in the data, but whether the pre-conclusion shift is detectable and distinguishable from normal linguistic variation. A fourth decision option ('Early Conversation') is added to the steward's workflow, requiring a redesign of the decision interface.

Financial Implications

The financial implications of this variation are the most favourable of the three. The steward feedback suggests that catching subscribers at the earlier stage of relational shift, rather than at conclusion, would increase the effective intervention window and therefore the probability of retention. If the system can detect pre-conclusion shift and enable a steward conversation that prevents the shift from reaching conclusion, the per-subscriber save rate improves. The current financial model assumes a 5-percentage-point reduction in churn, preserving approximately £478,000 in annual revenue. This assumes intervention at the conclusion stage, where the steward is essentially attempting to reverse a decision the subscriber has already made. If intervention occurs at the pre-conclusion stage, the save rate could plausibly be higher because the subscriber has not yet decided. The financial model does not need to change at this stage, but the next iteration should include a mechanism to estimate the differential save rate between early-stage and late-stage intervention. The offsetting risk is scope expansion: detecting early vocabulary shift is a harder NLP problem, with a higher false-positive rate and greater steward cognitive load.
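The differential save-rate mechanism described above can be sketched as a sensitivity check. The save rates and the share of churn flagged are illustrative assumptions; only the £478k-per-5pp baseline comes from the stated model.

```python
# Hypothetical sensitivity sketch for early- vs late-stage intervention.
# Save rates (50%, 70%) and the 10pp flagged-churn share are assumptions
# chosen so the late-stage case reproduces the stated 5pp / £478k target.

BASE_PRESERVED_PER_PP = 478_000 / 5   # £ preserved per pp of churn reduction (stated model)

def preserved(flagged_churn_pp: float, save_rate: float) -> float:
    """Revenue preserved if save_rate of flagged departing churn is retained."""
    return flagged_churn_pp * save_rate * BASE_PRESERVED_PER_PP

late = preserved(10, 0.50)   # conclusion-stage: 10pp flagged, 50% saved -> 5pp net
early = preserved(10, 0.70)  # pre-conclusion: same flags, assumed higher save rate

print(f"Late-stage:  £{late:,.0f}")   # £478,000
print(f"Early-stage: £{early:,.0f}")
print(f"Uplift:      £{early - late:,.0f}")
```

This is the mechanism the next iteration should estimate empirically: the uplift is entirely a function of the early-stage save-rate assumption, which is currently unmeasured.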

What Feeds Back into the Double Diamond
Validated: Verbatim-first interface design, steward workflow, comparison condition protocol, and the fundamental proposition that community language contains a signal invisible to NPS.
Reframed: The primary target signal shifts from concluded departure to pre-conclusion vocabulary shift. The system's value proposition moves from 'catch them leaving' to 'catch them thinking about leaving.'
New design requirement: A fourth steward decision option ('Early Conversation') and a redesigned brief format that presents shift-in-progress language differently from concluded-departure language.
New technical question: Can the Community Vocabulary Shift component detect the earlier signal reliably, and what is the false-positive rate? This becomes the primary question for the Approach A retrospective.
Ethical implication: Earlier detection of relational shift intensifies the surveillance concern. The steward is now reading language that expresses emerging doubt, not settled conclusion — a more intimate form of monitoring that raises the legitimacy standard.

Comparative Analysis

Dimension | V1: Source Rejected | V2: Score Wins | V3: Earlier Signal
What breaks | Proxy language fidelity — the known limitation materialises | Verbatim-first hierarchy — the core design principle | Nothing breaks — scope shifts upward
Interface validated? | Yes | Challenged — needs redesign | Yes — strongly
Next iteration | Approach A with DPIA as hard gate | Redesign loop (2–4 weeks), then DPIA + Approach A | Approach A with expanded scope (earlier signal detection)
Financial case | Unchanged — £478k per 5pp still the target | At risk if score-first prevails — differentiation weakens | Potentially stronger — earlier intervention improves save rate
Biggest risk forward | Signal doesn't exist in real data | Verbatim-first hypothesis is fundamentally wrong | Early signal too ambiguous to detect reliably at scale
Timeline impact | 2–3 months (DPIA sequencing) | 2–4 weeks (redesign) then 2–3 months (DPIA) | 2–3 months (DPIA + expanded data requirement)
The common thread across all three variations: Approach C performs its intended function — it produces a testable prototype within the sprint window at zero incremental cost, and it generates actionable evidence regardless of outcome. The DPIA dependency, deferred by Approach C, becomes the critical path item for the next iteration in all three cases. The evidence from Approach C changes the shape of Approach A, but does not change the fact that Approach A is where the foundational question — does the signal exist in real data? — must be answered.
Prompt 5 · Stretch Goal · Change Management Framework Applied to Cultural Enablement

Cultural Enablement — The Quiet Signal System

The Quiet Signal System does not ask Naked Wines UK to adopt a new technology. It asks the organisation to accept that its most trusted measure of customer health is actively obscuring the departure of its highest-value subscribers. Every cultural change that follows descends from this single, uncomfortable premise. This document identifies what the organisation must believe, value, and tolerate differently — and then applies the course material change management framework to the cultural changes identified.

The premise no framing can soften: Adopting the Quiet Signal System requires the organisation to accept that its most trusted metric is not just incomplete but misleading, and to shift operational authority toward a new, less comfortable source of truth. This is not a technology change with a cultural dimension. It is a cultural change that happens to require technology to operationalise.

Part 1: Five Cultural Shifts Required

1. The Metric-Trust Shift: From NPS as Truth to NPS as Artefact

NPS is embedded in how the organisation talks to itself about whether it is doing well. The HY26 interim results cite NPS 76 as evidence of customer health. A 24% annual churn rate in the highest-LTV cohort coexists with this score without apparent contradiction in the current reporting framework. The cultural change required is not supplementing NPS with an additional metric — it is accepting that NPS is a managed artefact that functions as an institutional comfort mechanism rather than a diagnostic instrument.

Survey response rates in comparable DTC contexts run at approximately 4.5%. NPS captures the sentiment of the 4.5% who respond positively in a managed context — not the 95.5% who stay silent, and certainly not the Angels who are departing quietly. The metric conflates response willingness with satisfaction. A subscriber who has already decided to leave does not complete the survey. NPS is structurally blind to the departure it is designed to detect.
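The structural-blindness claim is arithmetic, not interpretation. A minimal sketch, using a hypothetical subscriber base of 10,000 (only the 4.5% response rate is a cited figure):

```python
# Illustrative arithmetic only: how a ~4.5% response rate bounds what NPS
# can see. The subscriber count is hypothetical; the response rate is the
# figure cited in the text.

subscribers = 10_000
response_rate = 0.045

respondents = int(subscribers * response_rate)  # 450 voices in the metric
silent = subscribers - respondents              # 9,550 invisible to it

print(f"NPS is computed from {respondents} of {subscribers:,} subscribers")
print(f"{silent / subscribers:.1%} of the base contributes no signal at all")
```

Whatever score those 450 responses produce, the instrument says nothing about the 9,550 who did not respond — which is precisely the population where quiet departure lives.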

This shift is not asking the organisation to abandon NPS — it is asking the organisation to understand what NPS actually measures, report it accordingly, and stop using it as the primary evidence of subscriber health when evidence from unmanaged peer-to-peer data says something materially different.

What the organisation must come to believe: that a score of 76 and a 24% annual churn rate in the highest-LTV cohort are not contradictory data points to be rationalised, but evidence that the primary measurement instrument is not measuring what it claims to measure. The divergence is the finding.

2. From Retention to Stewardship: Redefining What the Role Does

A retention agent works from a triggered event and applies a scripted response. The trigger is a cancellation attempt or a churn risk flag; the response is a save offer calibrated to the subscriber's value band. Success is measured by save rate: how many subscribers who were going to leave were persuaded not to. This model is optimised for the subscriber who is ambivalent — willing to stay if offered the right incentive.

A relationship steward works from an ambient signal and exercises qualitative judgement. The input is verbatim community language that the subscriber wrote to other subscribers, not to the brand. The decision is not 'what save offer should I make?' but 'is this subscriber experiencing a recoverable problem or an irreversible conclusion — and if the latter, how do I ensure the departure is dignified?' A steward who recommends against intervention — because the departure is concluded and the subscriber deserves to be respected rather than intercepted — has succeeded, not failed.

This is a genuinely different role. It is not the retention role with better tools. It requires different skills (interpretive literacy, qualitative judgement, comfort with ambiguity), different authority (the steward's decision must be respected even when it means accepting a departure), and different performance metrics (intervention quality and subscriber experience of departure, not save rate).

What the organisation must come to value: that a steward who decides not to intervene is producing a better outcome than a retention agent who triggers a save offer on every flagged account. Restraint, when it reflects a correct reading of the subscriber's relational state, is the skill.

3. Departure as Relationship State: The End of Terminal Cancellation

Cancellation is currently a terminal event. It closes the CRM record, removes the subscriber from the Angel base, and triggers the acquisition cost to replace the revenue. There is no designed experience of departure — only its prevention and, failing that, its administrative processing. The subscriber who leaves is no longer a customer; she is a churn statistic and an acquisition target.

The Alumni Network concept treats departure as a transition to be managed with the same care as onboarding. An Angel who leaves after seven years of pre-funded investment and community contribution is not a failed retention — she is a relationship in a different state. She knows the makers she has supported. She has co-owned the brand story. Her departure, if respected, preserves the relational equity that makes return possible. Her departure, if handled as a terminal event with a save script, destroys it.

The specific cultural change required is accepting that a subscriber who is well-served in her departure — who feels that the organisation saw her, valued her tenure, and respected her decision — is more likely to return and more likely to recommend the brand than one who was pressured to stay and eventually left anyway. The short-term revenue loss of letting a concluded subscriber leave well is real; the long-term relationship value is the argument for absorbing it.

What the organisation must tolerate: the short-term revenue loss of letting a concluded subscriber leave well, on the evidence that the long-term relationship value — return probability, advocacy, and Alumni community contribution — exceeds the value of a coerced retention.

4. Tolerance for Uncomfortable Signal: Qualitative Data as Operational Input

The primary input to the steward's decision is a block of verbatim text — the subscriber's actual words in a forum post written to other subscribers, not to the brand. This language is ambiguous, context-dependent, and resistant to aggregation. It cannot be reduced to a number without destroying the signal it carries. It surfaces things the organisation may not want to hear: frustration with decisions the leadership made, disillusionment with changes to the community, a sense that the brand has drifted from its original commitment to independent makers.

The managed metric exists partly because it filters this discomfort. NPS of 76 is easy to work with. A community post that reads 'I used to recommend Naked Wines to everyone I knew; now I'm embarrassed to' is not. The cultural change required is not just tolerating this input — it is building the operational infrastructure to act on it: the steward review process, the verbatim-first interface design, the qualitative feedback loop from steward decisions back into the signal system.

Organisations that are accustomed to threshold-based decision rules (flag at NPS below X; escalate at churn probability above Y) will find the ambiguity of qualitative decision-making genuinely uncomfortable. This is not a communication problem or a training problem — it is a cultural problem, and it is the deepest institutional change the Quiet Signal System requires.

What the organisation must tolerate: operational decision-making driven by qualitative, ambiguous, emotionally textured data that cannot be reduced to a traffic-light dashboard without destroying the signal it carries. The steward reads language and exercises judgement. There is no decision tree.

5. The Surveillance-to-Legitimacy Boundary: Earning the Right to Listen

The Quiet Signal System monitors community language that subscribers wrote in the context of speaking to each other, not to the brand. The obstacle analysis in the Research tab names this directly: there is a version of this system that subscribers would experience as surveillance if they learned about it. The legitimacy of the system depends on whether Naked Wines can honestly say to an Angel: 'We read what you wrote in the community forum and used it to understand how you were feeling about us.' If that sentence sounds creepy rather than caring, the system has a legitimacy problem that no DPIA can resolve.

This is a cultural challenge, not a compliance challenge. The DPIA addresses the legal basis for processing. Legitimacy addresses whether the organisation has the relational standing to use the data in the way the system proposes. Earning that standing requires Naked Wines to build reciprocal transparency: if the company reads what Angels write to each other, the company must be equally willing to say what it knows and what it has decided to do about it. The Honest Annual Report and the Tenure Council exist in the broader concept specifically to create the institutional architecture that makes this reciprocity credible.

The longer-term cultural shift is from passive data extraction to active relational contract. The Disagreement Forum takes this further: creating a space where Angels can disagree with the company publicly, and the company commits to responding substantively. These are not features. They are cultural infrastructure that earns the legitimacy the Quiet Signal System requires.

What the organisation must build: institutional practices that make the use of community signal a reciprocal relationship rather than a unilateral extraction — so that when the system's existence becomes visible to Angels (as it eventually will), the response is 'of course they listen' rather than 'they were watching us.'

Part 2: Change Management Action Plan

The six framework elements applied to the five cultural changes identified above. Each is applied to the specific context of the Quiet Signal System at Naked Wines UK, not reproduced generically. Sequenced across three horizons: MVP phase (concrete), post-MVP validation (directional), and institutional maturity (conditional on MVP evidence).

2.1 Stakeholder Analysis

The cultural changes required by this project affect different stakeholders in different ways. The following analysis identifies each stakeholder group, their relationship to the cultural shifts, and the nature of their interest — supportive, threatened, or conditional.

Stakeholder Group | Primary Cultural Shift Affected | Nature of Interest | Engagement Priority
Board / CFO | Metric-Trust Shift | Conditional. NPS is reported to investors. Accepting that it obscures churn requires an alternative narrative for the market. The financial case (£478k revenue preserved per 5pp churn reduction) provides the bridge — but the board must first accept that the anomaly (NPS 76 + 24% churn in highest-LTV cohort) is an anomaly worth investigating. | Critical — gatekeeper for the entire programme
Retention Team Lead | Retention to Stewardship | Threatened. The shift from save-rate optimisation to relational stewardship redefines success for the function they manage. A steward who recommends against intervention is succeeding — but the team lead's current KPIs do not reflect this. Role and identity are both at stake. | High — operational owner of the steward role
Relationship Stewards | Tolerance for Uncomfortable Signal | Directly affected. They are the users of the system and the people who must develop qualitative interpretive skill. MVP steward feedback from the prototype sessions is the primary evidence for how this group responds to the new role requirement. | High — system users and primary feedback source
Data and Engineering | Metric-Trust Shift | Conditional. They built the systems that produce and report NPS. The project implicitly says their primary output is misleading. They also own the data infrastructure required to join forum data with CRM records for Approach A. | Medium — enablers, not decision-makers at MVP stage
Community Team | Surveillance-to-Legitimacy Boundary | Supportive but cautious. They manage the forum and understand community dynamics. They are likely to recognise the signal's validity but may have the strongest reservations about whether AI monitoring of peer-to-peer language crosses a relational boundary. Custodians of the data source and the community relationship. | High — must be co-designers of the legitimacy framework
Angel Subscribers (indirect) | Surveillance-to-Legitimacy; Departure as Relationship State | Not engaged directly at MVP stage. Their interests are represented through the HITL architecture (no AI output reaches them without steward review) and the legitimacy framework. They become direct stakeholders at the institutional maturity phase through the Tenure Council. | Deferred — addressed through system design, not direct engagement, at MVP stage

2.2 Impact Assessment

The cultural changes create operational, identity, and institutional impacts across the organisation. The following table maps each shift to its impact dimensions.

Cultural Shift | Operational Impact | Identity Impact | Institutional Impact
Metric-Trust Shift | NPS repositioned as lagging verification rather than leading indicator; Forum Divergence Score becomes the primary signal for high-LTV cohort health. Reporting cadence and dashboards require redesign. | Teams that have built competence around NPS interpretation must accept that their headline metric is insufficient in a specific and material way. This is an identity challenge, not just a process change. | Investor communications require a new narrative. The HY26 report cites NPS 76 as evidence of customer health; the next report must either maintain this framing or begin transitioning the investor narrative.
Retention to Stewardship | KPIs shift from save rate to intervention quality. The measurement system for the retention function requires redesign before the new role can be fairly assessed. | Retention agents face genuine role redefinition. The new role requires qualitative judgement and relational literacy — different skills, different aptitudes. Not all current retention staff will be suited to the stewardship model. | Headcount may not change, but the hiring profile and training investment shift materially. The cost of a steward is higher than the cost of a retention agent, offset by the value of better decisions.
Departure as Relationship State | Cancellation workflows add a steward review step and an Alumni Network pathway. The subscriber's departure experience is designed rather than defaulted. | The organisation stops treating cancellation as failure. This is emotionally significant for teams whose performance is measured by churn prevention. | Short-term churn numbers may temporarily increase as coerced retentions are released. The financial case depends on demonstrating that Alumni return rates and reduced replacement costs exceed the short-term revenue loss.
Tolerance for Uncomfortable Signal | Decision-making workflows incorporate verbatim community language as a primary input. Dashboards surface qualitative data alongside quantitative scores. | Teams accustomed to threshold-based decision rules must develop comfort with ambiguity. The steward reads language and exercises judgement, not a decision tree. | The organisation's decision-making culture shifts from certainty-seeking to evidence-tolerant. This is the deepest institutional change and the slowest to embed.
Surveillance-to-Legitimacy | A transparency and reciprocity framework must be developed alongside the signal system. Forum terms of service require updating. Disclosure practices change. | The organisation must see itself as accountable to the community it monitors, not merely as a processor of community data. This is a shift from data-as-resource to data-as-relationship. | Legal, compliance, and community functions must coordinate on a legitimacy framework that exceeds DPIA requirements. The standard is not 'lawful' but 'honest.'

2.3 Communication Plan for Change

The communication challenge at the core of this project is that the founding message — 'our most trusted metric is not measuring what we think it is measuring' — is inherently threatening. If communicated as an accusation ('NPS has been misleading us'), it produces defensiveness. If communicated as a discovery ('we have found a signal that NPS cannot see'), it creates curiosity. The framing is everything.

Phase 1: MVP — Evidence Before Argument

During the MVP phase, the communication strategy is deliberately narrow. The audience is the steward team and the retention team lead, not the broader organisation. The message is not 'NPS is wrong'; it is 'we are testing a new interface to see if there is useful information in community language that our current tools cannot surface.' This framing is honest — it accurately describes Approach C — and it does not require the steward to accept any of the five cultural shifts before the prototype has produced evidence.

The communication vehicle is the steward briefing document (produced in Phase 2 of the action plan), the session facilitation, and the debrief. Nothing is communicated to the broader organisation during the MVP phase. The evidence is gathered first; the argument is made from the evidence, not in advance of it.

Phase 2: Post-MVP — Data-Led Narrative

If the MVP validates the interface and the steward finds verbatim signal more useful than the model score, the communication shifts. The primary audience is the retention team lead and the data team lead. The message shifts to: 'We ran a test. Here is what the stewards found. The signal in community language is different from what NPS reports. Here is what that might mean for how we understand subscriber health.' The data from the MVP (steward confidence scores, format comparison results, debrief observations) is the argument. The financial context (£2.3m replacement burden, £478k per 5pp reduction) provides the business case.

The board conversation — the Metric-Trust Shift at its most challenging — does not happen until after the Approach A retrospective provides evidence that the signal is real in actual data. Approach C produces usability evidence; Approach A must produce correlational evidence before the investor narrative can be revisited.

Phase 3: Institutional Maturity — Reciprocal Transparency

At the institutional maturity phase, the communication extends to Angel subscribers themselves. The Honest Annual Report is the primary vehicle: a document sent exclusively to long-tenure subscribers that acknowledges what went wrong, what was learned, and what remains uncertain. This is the communication act that earns the right to use community language as a signal — the organisation has been transparent about what it hears and what it does with it.

2.4 Resistance Management

Resistance to this project will be specific, predictable, and legitimate in most cases. The management approach for each pattern is designed to take the resistance seriously rather than dismiss it as a misunderstanding.

Resistance Pattern: NPS Loyalty
How It Manifests: 'Our NPS is 76 — the forum is a vocal minority.' This is the most common and superficially plausible objection. It conflates a high aggregate score with evidence of health in the specific cohort at risk.
Management Approach: Present the NPS-churn divergence as a data anomaly warranting investigation, not an accusation. The question is not 'is NPS wrong?' but 'why does NPS 76 coexist with 24% annual churn in the highest-LTV cohort?' Framing it as a puzzle generates curiosity; framing it as a verdict generates defensiveness.

Resistance Pattern: Score-First Preference
How It Manifests: Stewards prefer the score-first format. This emerges in the MVP prototype comparison condition — Variation 2 of the simulated scenarios. If stewards find the verbatim-first format harder to use, the architectural commitment is challenged from the inside.
Management Approach: Record as evidence about organisational readiness rather than evidence that the principle is wrong. The steward's preference for scores may itself reflect the managed-metric culture the system is designed to change. Redesign the verbatim presentation (curated extracts, plain-language signal summary) before abandoning the principle. If the redesigned format still loses, revise the hypothesis.

Resistance Pattern: Surveillance Objection
How It Manifests: 'You can't read community posts — that's not what they're for.' This objection may come from the community team, from legal counsel, or from senior leadership concerned about the reputational risk of AI-monitored peer-to-peer language.
Management Approach: Take the objection seriously rather than reframing it as a misunderstanding. The legitimacy question is real. Use it as the entry point for the reciprocal transparency conversation: 'If we cannot honestly tell an Angel that we use their community language to understand the relationship, we should not use it. What would make it honest?' This converts resistance into design input for the legitimacy framework.

Resistance Pattern: Role-Identity Threat
How It Manifests: Retention agents whose competence is built on save-script execution may resist a role redefinition that devalues their existing skills. This resistance is legitimate — the stewardship role genuinely requires different capabilities, and not all current retention staff will be suited to it.
Management Approach: Acknowledge the skill transition honestly. The stewardship role is not a promotion of the retention role — it is a different job. Offer development pathways for those who can transition and honest conversations for those who cannot. Do not disguise the change as an enhancement of the existing role, because that framing will be seen through and will generate distrust.

Resistance Pattern: Signal Absorption
How It Manifests: The organisation adopts the Forum Divergence Score as an additional KPI on the existing dashboard, NPS retains primacy, and the verbatim layer is stripped out. This is the most insidious resistance because it looks like adoption — the system is technically implemented but culturally neutralised.
Management Approach: Build the architectural safeguard into the system design: the Concept Development document specifies that verbatim language appears at the top of every subscriber brief, before any score. This is not a presentation preference — it is the design decision that prevents the system from recreating the managed-metric problem it was built to solve. If the organisation removes the verbatim layer, the system has been co-opted. The safeguard is technical; the detection is human.
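The verbatim-first safeguard can be expressed structurally rather than as a presentation preference. The minimal sketch below, in Python, shows one way the ordering could be enforced in the brief itself; all names (SubscriberBrief, render, the field names) are hypothetical illustrations, not artefacts from the Concept Development document, which specifies the ordering but not an implementation.

```python
from dataclasses import dataclass

@dataclass
class SubscriberBrief:
    """Illustrative subscriber brief: verbatim community language is
    structurally first, and a score can never be rendered without it."""
    subscriber_id: str
    verbatim_extracts: list   # unedited forum language, curated extracts
    divergence_score: float   # model output, deliberately rendered last

    def render(self) -> str:
        if not self.verbatim_extracts:
            # Refuse to render a score-only brief: that is the
            # 'signal absorption' failure mode the design guards against.
            raise ValueError("brief has no verbatim layer; refusing to render score alone")
        lines = [f"Subscriber {self.subscriber_id}",
                 "-- Community language (verbatim) --"]
        lines += [f'  "{extract}"' for extract in self.verbatim_extracts]
        lines += ["-- Model context (secondary) --",
                  f"  divergence score: {self.divergence_score:.2f}"]
        return "\n".join(lines)

brief = SubscriberBrief(
    subscriber_id="A-1042",
    verbatim_extracts=["I am not angry. I am finished."],
    divergence_score=0.87,
)
output = brief.render()
# The verbatim layer always precedes the score in the rendered brief.
assert output.index("Community language") < output.index("divergence score")
```

The design choice the sketch illustrates is that stripping the verbatim layer is not a dashboard configuration change but a code change that fails loudly, which makes co-option detectable.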

2.5 Training and Capability Building

The capability gap this project creates is not primarily technical. The dashboard is a relatively straightforward interface. The capability gap is interpretive: the steward must be able to read verbatim community language and make a qualitative judgement about the subscriber's relational state. This is a skill that most organisations do not systematically develop, because most organisations do not use unprocessed qualitative data as an operational input.

MVP Phase: Orientation, Not Training

At the MVP stage, the action plan (Phase 3.3) specifies lightweight orientation: walk the steward through the interface, explain what each element of the brief contains, demonstrate the decision workflow with a sample brief. The orientation is deliberately not instruction on how to interpret the signals. The MVP's value depends on capturing the steward's unguided response — if they are trained to interpret before they evaluate, the prototype cannot distinguish between the interface's value and the training's influence. The capability is measured, not built, at this stage.

Post-MVP: Community Signal Literacy

If the MVP validates the interface and the subsequent iteration validates the signal, the steward team requires structured development in what might be called community signal literacy: the ability to read peer-to-peer community language and distinguish between active complaint (which demands operational response), satisfied silence (which requires no intervention), concluded departure (which warrants a relational response), and — as Variation 3 of the simulated scenarios reveals — the shifting language that precedes conclusion (which represents the highest-value intervention window). This capability is built through supervised practice with real examples, not through classroom instruction. The training model is closer to clinical supervision than to a learning module: the steward reviews flagged subscriber briefs, makes a decision, and discusses the reasoning with a senior steward or the project lead. The quality of the decision improves through calibrated repetition, not through rule-following.
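The four-way distinction above can be sketched as a simple signal-to-response mapping. This is an illustrative sketch only: the category and response names below are assumptions drawn from the prose, not identifiers from the project's artefacts, and the real judgement is the steward's, not a lookup table's.

```python
from enum import Enum

class CommunitySignal(Enum):
    """Illustrative categories of peer-to-peer community language
    (names are assumptions paraphrasing the training taxonomy)."""
    ACTIVE_COMPLAINT = "active complaint"
    SATISFIED_SILENCE = "satisfied silence"
    CONCLUDED_DEPARTURE = "concluded departure"
    SHIFTING_LANGUAGE = "shifting language preceding conclusion"

# Each category warrants a different kind of steward response.
STEWARD_RESPONSE = {
    CommunitySignal.ACTIVE_COMPLAINT: "operational response",
    CommunitySignal.SATISFIED_SILENCE: "no intervention",
    CommunitySignal.CONCLUDED_DEPARTURE: "relational response",
    CommunitySignal.SHIFTING_LANGUAGE: "priority intervention window",
}

# The highest-value case is the pre-conclusion shift, not the loudest complaint.
assert STEWARD_RESPONSE[CommunitySignal.SHIFTING_LANGUAGE] == "priority intervention window"
```

The mapping makes one point of the training model explicit: the response that matters most is attached to the quietest category, which is exactly why it cannot be learned from a decision tree alone.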

Institutional Maturity: Qualitative Decision-Making as Organisational Capability

At the institutional maturity phase, the capability extends beyond the steward team. The Tenure Council requires facilitation skills for genuine subscriber deliberation. The Disagreement Forum requires moderation skills that preserve dissent rather than resolving it. The Honest Annual Report requires editorial judgement about what to disclose and how to frame failure honestly. These are all expressions of the same underlying organisational capability: working with qualitative, ambiguous, emotionally textured information as a basis for operational decisions. This is a multi-year capability development programme, not a training course.

2.6 Leadership Sponsorship

The Concept Development document identifies senior leadership sponsorship as Condition 5 for the system's feasibility. The Tenure Council and Disagreement Forum are deliberately sequenced as Ambitious-horizon initiatives requiring a separate governance conversation. They are not on the critical path for the first sprint, but they are present in the concept as the longer-term institutional architecture the Quick Wins are designed to support. Their inclusion signals the cultural commitment required.

Leadership sponsorship for this project must be visible, specific, and sustained. It is not sufficient for a senior leader to approve the project. They must be willing to do three things that are each uncomfortable in their own right.

First: Publicly Acknowledge the Metric Gap

A senior leader must be willing to say, in a context that matters (a board meeting, an all-hands, an investor communication), that NPS is not measuring what the organisation believed it was measuring. This is not the same as saying NPS is wrong — it is saying that NPS is incomplete in a specific, material way that has financial consequences. The £2.3m annual replacement burden is the evidence. The 24% high-LTV churn rate alongside a 76 NPS is the anomaly. The leader's role is to name it — not as a failure, but as a discovery that creates an opportunity. Without this act of naming, every other cultural change is building on an unstated foundation, and the resistance to that foundation will eventually surface in ways that are harder to manage.

Second: Protect the Steward's Judgement Authority

The stewardship model only works if the steward's decision is respected. If a steward recommends against intervention and is overruled by a manager who wants to hit a save-rate target, the system has failed at the human level regardless of what the algorithm produces. Leadership must explicitly protect the steward's judgement authority against operational pressure to maximise short-term retention numbers. This means accepting that the save rate will decline in the short term as concluded departures are respected rather than intercepted. It means saying publicly — in a context that the steward team can hear — that a steward who recommends against intervention and proves to be correct has done excellent work. Without this explicit protection, the stewardship role will revert to retention behaviour under pressure.

Third: Commit to Reciprocal Transparency

The legitimacy of monitoring community language depends on the organisation being equally transparent about what it learns and what it does with that learning. A senior leader must be willing to sponsor the Honest Annual Report — a document that tells long-tenure Angels what went wrong, what the company learned, and what it remains uncertain about. This is the institutional act that earns the right to listen. It is uncomfortable because it requires the organisation to say, in writing, to its highest-value subscribers, that it has made mistakes and that it is still learning. But without it, the Quiet Signal System is technically functional and relationally illegitimate. The technical function extracts signal from a community that the organisation has not yet earned the right to monitor at this depth. The reciprocal transparency is what converts extraction into relationship.

Sequencing summary: The MVP phase tests the interface and the steward's response without requiring any of the five cultural shifts to be resolved. It produces the evidence — steward feedback, resistance signals, the surveillance reaction, the score-vs-verbatim preference — that determines which shifts are most urgent and which are most resistant. The post-MVP phase uses that evidence to build the data-led narrative that earns board sponsorship and begins the metric-trust conversation with institutional credibility. The institutional maturity phase builds the reciprocal transparency infrastructure (Honest Annual Report, Tenure Council, Disagreement Forum) that earns the legitimacy the full system requires. The most important observation is that the cultural changes cannot be mandated from above and then implemented. They must be earned through evidence, demonstrated through practice, and embedded through institutional design. The MVP is the first step in that evidence chain. Everything in this plan depends on what it reveals.