Structured Output and Evidence Chain Design


Why evidence-backed output is needed

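The most common failure mode of AI user-research platforms is that the LLM can summarize, but the research team does not trust the summary.

AgentSurvey's core output should therefore not be just: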

{
  "main_pain_point": "Configuration is complex"
}

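It should instead include:

  • the field value;
  • a confidence score;
  • the original evidence;
  • the corresponding session;
  • the corresponding turn;
  • whether human review is needed;
  • the extraction model and schema version.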

Standard output object

// A single field extracted from one session, together with its supporting evidence.
export type ExtractedField = {
  id: string;
  session_id: string;
  study_id: string;
  schema_field_id: string;
  field_name: string;
  value: unknown;
  normalized_value?: unknown;
  confidence: number;
  evidence: EvidenceRef[];
  reasoning_summary?: string;
  status: "draft" | "accepted" | "rejected" | "needs_review";
  extracted_by: "agent" | "post_processor" | "human";
  model?: string;
  schema_version_id: string;
  created_at: string;
  updated_at: string;
};

// A pointer from an extracted field back to the material that supports it.
export type EvidenceRef = {
  evidence_id: string;
  session_id: string;
  turn_id?: string;
  tool_call_id?: string;
  quote?: string;
  start_char?: number;
  end_char?: number;
  evidence_type: "user_quote" | "tool_selection" | "system_observation" | "uploaded_artifact";
  strength: "strong" | "medium" | "weak";
};

Example output

{
  "id": "field_001",
  "session_id": "sess_123",
  "study_id": "study_feature_validation",
  "schema_field_id": "main_pain_point",
  "field_name": "main_pain_point",
  "value": "Manually organizing interview results is costly",
  "normalized_value": "manual_analysis_cost",
  "confidence": 0.88,
  "evidence": [
    {
      "evidence_id": "ev_001",
      "session_id": "sess_123",
      "turn_id": "turn_18",
      "quote": "My biggest hassle is that after the interviews I have to manually organize a pile of open-ended answers, and it is hard to turn them into comparable data quickly.",
      "evidence_type": "user_quote",
      "strength": "strong"
    },
    {
      "evidence_id": "ev_002",
      "session_id": "sess_123",
      "tool_call_id": "tool_07",
      "quote": "In the pain-point multi-select, the user chose: high cost of manual organization",
      "evidence_type": "tool_selection",
      "strength": "medium"
    }
  ],
  "reasoning_summary": "The user first described the difficulty of manually organizing interview results in an open-ended answer, then confirmed this pain point in the structured options.",
  "status": "accepted",
  "extracted_by": "agent",
  "model": "config.default_extraction_model",
  "schema_version_id": "schema_v3"
}
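
An evidence chain is only useful if each quote can be traced back to the transcript. The sketch below shows one way to check that. It assumes a hypothetical `TranscriptTurn` shape and assumes that `start_char` and `end_char` are offsets into the turn text with an exclusive end; the source does not specify either.

```typescript
// Hypothetical transcript shape; the source does not define one.
type TranscriptTurn = { turn_id: string; text: string };

// Subset of EvidenceRef that this check needs.
type QuoteRef = {
  evidence_id: string;
  turn_id?: string;
  quote?: string;
  start_char?: number;
  end_char?: number;
};

type QuoteCheck = "verified" | "quote_not_found" | "offset_mismatch" | "not_checkable";

// Confirms that a quoted piece of evidence really appears in the referenced turn.
function checkQuote(ref: QuoteRef, turns: Map<string, TranscriptTurn>): QuoteCheck {
  // Evidence such as tool selections may carry no turn or no quote.
  if (ref.turn_id === undefined || ref.quote === undefined) return "not_checkable";

  const turn = turns.get(ref.turn_id);
  if (turn === undefined) return "quote_not_found";

  if (ref.start_char !== undefined && ref.end_char !== undefined) {
    // Assumption: offsets index into the turn text, end exclusive.
    return turn.text.slice(ref.start_char, ref.end_char) === ref.quote
      ? "verified"
      : "offset_mismatch";
  }
  // Without offsets, fall back to a plain substring search.
  return turn.text.includes(ref.quote) ? "verified" : "quote_not_found";
}
```

A result other than `verified` or `not_checkable` would be a reasonable trigger for setting the field's status to `needs_review`.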

Output Schema Builder

Researchers should be able to configure fields:

export type OutputSchemaField = {
  id: string;
  name: string;
  label: string;
  description: string;
  type: "string" | "number" | "integer" | "boolean" | "enum" | "array" | "object";
  enum_options?: Array<{ id: string; label: string; description?: string }>;
  required: boolean;
  evidence_required: boolean;
  min_evidence_count?: number;
  confidence_threshold?: number;
  review_policy?: "always" | "low_confidence" | "conflict" | "never";
  extraction_hint?: string;
};

Field type examples

Pain point field

{
  "id": "main_pain_point",
  "name": "main_pain_point",
  "label": "Main pain point",
  "description": "The problem the user currently feels most strongly and that most affects their decisions or efficiency.",
  "type": "string",
  "required": true,
  "evidence_required": true,
  "min_evidence_count": 1,
  "confidence_threshold": 0.75,
  "review_policy": "low_confidence"
}

Intensity score field

{
  "id": "pain_intensity",
  "name": "pain_intensity",
  "label": "Pain intensity",
  "description": "A score from 1 to 5 indicating how much this pain point affects the user.",
  "type": "integer",
  "required": true,
  "evidence_required": true,
  "confidence_threshold": 0.7
}

Enum field

{
  "id": "segment_fit",
  "name": "segment_fit",
  "label": "Target user fit",
  "description": "Whether the respondent matches the target user persona.",
  "type": "enum",
  "enum_options": [
    {"id": "high", "label": "Strong match"},
    {"id": "medium", "label": "Partial match"},
    {"id": "low", "label": "No match"}
  ],
  "required": true,
  "evidence_required": true
}
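
With field definitions like these, an extracted value can be checked against the declared `type` before it is stored. The following is a minimal sketch, not a full validator. It assumes enum values are stored as option `id`s rather than labels, which the source does not state. It also cannot enforce the 1 to 5 range of `pain_intensity`, because that range exists only in the description text and not in the schema.

```typescript
type FieldType = "string" | "number" | "integer" | "boolean" | "enum" | "array" | "object";

// Subset of OutputSchemaField needed for a type check.
type SchemaFieldShape = {
  type: FieldType;
  enum_options?: Array<{ id: string; label: string; description?: string }>;
};

// Returns true when the value is compatible with the field's declared type.
function valueMatchesType(field: SchemaFieldShape, value: unknown): boolean {
  switch (field.type) {
    case "string":
      return typeof value === "string";
    case "number":
      return typeof value === "number" && Number.isFinite(value);
    case "integer":
      return typeof value === "number" && Number.isInteger(value);
    case "boolean":
      return typeof value === "boolean";
    case "enum":
      // Assumption: the stored value is an option id such as "high".
      return (
        typeof value === "string" &&
        (field.enum_options ?? []).some((option) => option.id === value)
      );
    case "array":
      return Array.isArray(value);
    case "object":
      return typeof value === "object" && value !== null && !Array.isArray(value);
  }
}
```

For the `segment_fit` field above, `"high"` would pass and `"Strong match"` would fail, since the latter is a label rather than an id.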

Extraction Timing

Three extraction timings are recommended:

1. Real-time extraction

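Attempt to update fields after every user answer.

Advantages:

  • the Agent knows which fields are already covered;
  • follow-up questions can be decided in real time;
  • contradictions can be caught early.

Disadvantages:

  • higher cost;
  • prone to transient errors.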

2. Checkpoint extraction

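Extract after each topic is completed.

Suitable when:

  • the research guide is divided into several topics;
  • each topic has clearly defined output fields.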

3. Post-session extraction

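Extract everything in one pass after the interview ends.

Suitable when:

  • cost is a concern;
  • the output needs to be more stable;
  • batch processing is required.

MVP recommendation:

  • use real-time extraction for coverage and follow-up questions;
  • use post-session extraction for the final results.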
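
One way to wire the three timings together is to map session events to the extraction they would trigger. The sketch below is illustrative only: the `SessionEvent` shape and all function names are assumptions, since the source does not define an event model.

```typescript
type ExtractionTiming = "real_time" | "checkpoint" | "post_session";

// Hypothetical session events; not defined in the source.
type SessionEvent =
  | { kind: "user_turn"; turn_id: string }
  | { kind: "topic_completed"; topic_id: string }
  | { kind: "session_ended" };

// Which timing each kind of event would trigger.
const TRIGGERED_BY: Record<SessionEvent["kind"], ExtractionTiming> = {
  user_turn: "real_time",
  topic_completed: "checkpoint",
  session_ended: "post_session",
};

// MVP setup from the recommendation: real-time plus post-session, no checkpoints.
const MVP_TIMINGS: ReadonlySet<ExtractionTiming> = new Set<ExtractionTiming>([
  "real_time",
  "post_session",
]);

// Returns the extraction to run for this event, or null if that timing is disabled.
function extractionFor(
  event: SessionEvent,
  enabled: ReadonlySet<ExtractionTiming>,
): ExtractionTiming | null {
  const timing = TRIGGERED_BY[event.kind];
  return enabled.has(timing) ? timing : null;
}

// Under the MVP recommendation only post-session results count as final;
// earlier passes feed coverage tracking and follow-up questions.
function producesFinalResult(timing: ExtractionTiming): boolean {
  return timing === "post_session";
}
```

With `MVP_TIMINGS`, a `topic_completed` event triggers nothing, while `user_turn` and `session_ended` trigger real-time and post-session extraction respectively.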

Evidence Quality

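Evidence quality can be graded:

| Level | Definition | Example |
| --- | --- | --- |
| strong | The user states or selects it explicitly | "My biggest pain is manually organizing interview results" |
| medium | Can be reasonably inferred from context | The user describes several organizing steps and complains that they take a long time |
| weak | Indirect or incomplete evidence | The user only says "it's kind of a hassle" |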
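
Because the three levels are ordered, evidence can be ranked so that an evidence panel shows the strongest items first. This is a small sketch. The numeric ranks and both function names are assumptions made for illustration.

```typescript
type Strength = "strong" | "medium" | "weak";

// Minimal slice of EvidenceRef: only the fields this sketch needs.
type RankedEvidence = { evidence_id: string; strength: Strength };

// Assumed numeric ordering of the three levels.
const STRENGTH_RANK: Record<Strength, number> = { weak: 1, medium: 2, strong: 3 };

// Returns a new array with the strongest evidence first; input order breaks ties.
function sortByStrength<T extends RankedEvidence>(evidence: T[]): T[] {
  return evidence
    .map((item, index) => ({ item, index }))
    .sort(
      (a, b) =>
        STRENGTH_RANK[b.item.strength] - STRENGTH_RANK[a.item.strength] ||
        a.index - b.index,
    )
    .map(({ item }) => item);
}

// Highest strength present, or null when there is no evidence at all.
function strongestStrength(evidence: RankedEvidence[]): Strength | null {
  const first = sortByStrength(evidence)[0];
  return first ? first.strength : null;
}
```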

Review Rules

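A field needs human review when:

  1. confidence is below the threshold.
  2. a required field has no evidence.
  3. every piece of evidence has strength weak.
  4. different turns clearly contradict each other.
  5. the field value does not conform to the schema.
  6. the user rejects the conclusion in the final summary.
  7. the field involves a sensitive judgment.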
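
The seven rules above can be sketched as a single function that returns every reason a field should be reviewed. Rules 4 to 7 depend on signals that have to be computed elsewhere, so they are passed in through a hypothetical `ReviewSignals` object. The reason names are also assumptions. How these rules interact with `review_policy` is not defined in the source and is left out here.

```typescript
type Strength = "strong" | "medium" | "weak";

// Subsets of ExtractedField and OutputSchemaField used by the rules.
type FieldUnderReview = { confidence: number; evidence: Array<{ strength: Strength }> };
type FieldPolicy = { required: boolean; confidence_threshold?: number };

// Hypothetical signals produced by other parts of the pipeline.
type ReviewSignals = {
  value_matches_schema: boolean;
  contradicting_turns: boolean;
  denied_in_summary: boolean;
  sensitive: boolean;
};

type ReviewReason =
  | "low_confidence"
  | "missing_evidence"
  | "only_weak_evidence"
  | "contradiction"
  | "schema_violation"
  | "denied_by_user"
  | "sensitive_field";

// Returns every rule that fires, in rule order; an empty array means no review is needed.
function reviewReasons(
  field: FieldUnderReview,
  policy: FieldPolicy,
  signals: ReviewSignals,
): ReviewReason[] {
  const reasons: ReviewReason[] = [];
  // Rule 1: skipped when no threshold is configured for the field.
  if (policy.confidence_threshold !== undefined && field.confidence < policy.confidence_threshold) {
    reasons.push("low_confidence");
  }
  // Rule 2: as stated in the text; a variant could test evidence_required instead.
  if (policy.required && field.evidence.length === 0) reasons.push("missing_evidence");
  // Rule 3: only meaningful when there is at least one piece of evidence.
  if (field.evidence.length > 0 && field.evidence.every((e) => e.strength === "weak")) {
    reasons.push("only_weak_evidence");
  }
  if (signals.contradicting_turns) reasons.push("contradiction"); // Rule 4
  if (!signals.value_matches_schema) reasons.push("schema_violation"); // Rule 5
  if (signals.denied_in_summary) reasons.push("denied_by_user"); // Rule 6
  if (signals.sensitive) reasons.push("sensitive_field"); // Rule 7
  return reasons;
}
```

Returning all reasons rather than a single boolean lets the review UI explain why a field was flagged.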

Cross-session Insight

A single-session output is an extracted field. A cross-session output is an insight.

export type StudyInsight = {
  id: string;
  study_id: string;
  title: string;
  summary: string;
  theme: string;
  severity?: "low" | "medium" | "high";
  frequency?: number;
  supporting_sessions: string[];
  evidence: EvidenceRef[];
  confidence: number;
  status: "draft" | "accepted" | "rejected" | "needs_review";
};

Insight example:

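{
  "title": "What researchers need most is trustworthy structured output, not just a conversation summary",
  "summary": "Several respondents said that an LLM summary cannot be used directly for decision-making because it lacks structured field results and references to the original evidence.",
  "theme": "evidence_backed_extraction",
  "severity": "high",
  "frequency": 0.62,
  "supporting_sessions": ["sess_123", "sess_128", "sess_131"],
  "confidence": 0.84
}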
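
The source does not define how `frequency` and `supporting_sessions` are derived. One plausible reading is that `frequency` is the share of sessions in the study that support a theme. The sketch below follows that assumption and groups accepted fields by their `normalized_value`. The function name and the grouping key are illustrative choices, not part of the source design.

```typescript
// Subset of ExtractedField, with normalized_value narrowed to string for grouping.
type SessionField = {
  session_id: string;
  normalized_value: string;
  status: "draft" | "accepted" | "rejected" | "needs_review";
};

type ThemeCount = { theme: string; supporting_sessions: string[]; frequency: number };

// Groups accepted fields by normalized value and computes the share of sessions per theme.
function countThemes(fields: SessionField[], totalSessions: number): ThemeCount[] {
  const sessionsByTheme = new Map<string, Set<string>>();
  for (const field of fields) {
    // Only accepted fields count as support for an insight.
    if (field.status !== "accepted") continue;
    const sessions = sessionsByTheme.get(field.normalized_value) ?? new Set<string>();
    sessions.add(field.session_id); // a Set counts each session once
    sessionsByTheme.set(field.normalized_value, sessions);
  }
  return Array.from(sessionsByTheme.entries())
    .map(([theme, sessions]) => ({
      theme,
      supporting_sessions: Array.from(sessions).sort(),
      // Assumption: frequency = supporting sessions / all sessions in the study.
      frequency: totalSessions > 0 ? sessions.size / totalSessions : 0,
    }))
    .sort((a, b) => b.frequency - a.frequency);
}
```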

Product implementation recommendations

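In the MVP, the Results page should show at least:

  1. A table of extracted fields for each session.
  2. The confidence and status of each field.
  3. An evidence panel that opens when a field is clicked.
  4. A link from the evidence panel to the original transcript turn.
  5. Manual correction of field values.
  6. Export to JSON/CSV.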
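
For the export requirement, JSON export is essentially `JSON.stringify` over the extracted fields. CSV needs a little more care with quoting. The sketch below flattens each field to one row. The column set and function names are assumptions, and the evidence is reduced to a count because nested evidence does not fit a flat row.

```typescript
// Subset of ExtractedField that ends up in the flat export.
type ExportRow = {
  session_id: string;
  field_name: string;
  value: unknown;
  confidence: number;
  status: string;
  evidence: unknown[];
};

// Serializes one cell; quotes it when it contains a comma, quote, or line break.
function csvCell(value: unknown): string {
  const json = JSON.stringify(value) as string | undefined;
  const text = typeof value === "string" ? value : json === undefined ? "" : json;
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Assumed column set for the export.
const CSV_HEADER = ["session_id", "field_name", "value", "confidence", "status", "evidence_count"];

// Builds a CSV document with a header line and one line per extracted field.
function toCsv(rows: ExportRow[]): string {
  const lines = rows.map((row) =>
    [row.session_id, row.field_name, row.value, row.confidence, row.status, row.evidence.length]
      .map(csvCell)
      .join(","),
  );
  return [CSV_HEADER.join(","), ...lines].join("\n");
}
```

Non-string values such as arrays or objects are written as JSON text inside a single quoted cell, so the CSV stays one row per field.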