Structured Output and Evidence Chain Design
Why Evidence-Backed Output Matters
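The most common problem for an AI user research platform is that the LLM can summarize, but the research team does not trust the summary.
Therefore, AgentSurvey's core output should not be just:

```json
{ "main_pain_point": "Complex configuration" }
```

but should include:
- the field value;
- a confidence score;
- the original evidence;
- the corresponding session;
- the corresponding turn;
- whether human review is needed;
- the extraction model and schema version.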
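These fields only build trust if each evidence reference actually resolves to what it cites. One inexpensive safeguard, sketched here under assumptions, is to check every turn-based reference against the stored transcript before a field is presented as evidence-backed. `TranscriptTurn`, `EvidenceRefLite`, and `verifyEvidence` are hypothetical names; the sketch uses a subset of the `EvidenceRef` fields defined in this section, assumes `start_char`/`end_char` are offsets into the turn text (this document does not say so explicitly), and skips tool-call and artifact evidence.

```typescript
// Hypothetical transcript record: raw text of one turn.
type TranscriptTurn = { turn_id: string; session_id: string; text: string };

// Subset of EvidenceRef needed for the check.
type EvidenceRefLite = {
  evidence_id: string;
  session_id: string;
  turn_id?: string;
  quote?: string;
  start_char?: number;
  end_char?: number;
};

type EvidenceCheck = { evidence_id: string; ok: boolean; problem?: string };

// Verifies that a turn-based reference points at a real turn in the same
// session and that its quote matches the turn text verbatim.
function verifyEvidence(ref: EvidenceRefLite, turns: Map<string, TranscriptTurn>): EvidenceCheck {
  const fail = (problem: string): EvidenceCheck => ({ evidence_id: ref.evidence_id, ok: false, problem });
  if (ref.turn_id === undefined) {
    // Tool-selection or artifact evidence: not checked by this sketch.
    return { evidence_id: ref.evidence_id, ok: true };
  }
  const turn = turns.get(ref.turn_id);
  if (!turn) return fail("turn_not_found");
  if (turn.session_id !== ref.session_id) return fail("session_mismatch");
  // Assumption: a turn-based reference needs a quote for the UI to highlight.
  if (ref.quote === undefined) return fail("missing_quote");
  if (ref.start_char !== undefined && ref.end_char !== undefined) {
    // With offsets, the exact slice must equal the quote.
    if (turn.text.slice(ref.start_char, ref.end_char) !== ref.quote) return fail("offset_mismatch");
  } else if (!turn.text.includes(ref.quote)) {
    // Without offsets, the quote must at least appear somewhere in the turn.
    return fail("quote_not_in_turn");
  }
  return { evidence_id: ref.evidence_id, ok: true };
}
```

A failed check could be surfaced as a review trigger rather than silently dropping the evidence.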
Standard Output Object
```typescript
export type ExtractedField = {
  id: string;
  session_id: string;
  study_id: string;
  schema_field_id: string;
  field_name: string;
  value: unknown;
  normalized_value?: unknown;
  confidence: number;
  evidence: EvidenceRef[];
  reasoning_summary?: string;
  status: "draft" | "accepted" | "rejected" | "needs_review";
  extracted_by: "agent" | "post_processor" | "human";
  model?: string;
  schema_version_id: string;
  created_at: string;
  updated_at: string;
};
```
```typescript
export type EvidenceRef = {
  evidence_id: string;
  session_id: string;
  turn_id?: string;      // set for evidence quoted from a transcript turn
  tool_call_id?: string; // set for evidence from a structured tool interaction
  quote?: string;
  start_char?: number;
  end_char?: number;
  evidence_type: "user_quote" | "tool_selection" | "system_observation" | "uploaded_artifact";
  strength: "strong" | "medium" | "weak";
};
```
Example Output
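```json
{
  "id": "field_001",
  "session_id": "sess_123",
  "study_id": "study_feature_validation",
  "schema_field_id": "main_pain_point",
  "field_name": "main_pain_point",
  "value": "High cost of manually organizing interview results",
  "normalized_value": "manual_analysis_cost",
  "confidence": 0.88,
  "evidence": [
    {
      "evidence_id": "ev_001",
      "session_id": "sess_123",
      "turn_id": "turn_18",
      "quote": "The most troublesome part for me is that after the interviews I have to manually sort through a pile of open-ended answers, and it's hard to quickly turn them into comparable data.",
      "evidence_type": "user_quote",
      "strength": "strong"
    },
    {
      "evidence_id": "ev_002",
      "session_id": "sess_123",
      "tool_call_id": "tool_07",
      "quote": "In the pain-point multi-select, the user chose: high cost of manual organization",
      "evidence_type": "tool_selection",
      "strength": "medium"
    }
  ],
  "reasoning_summary": "The user first described the difficulty of manually organizing interview results in an open-ended answer, then confirmed this pain point in a structured option.",
  "status": "accepted",
  "extracted_by": "agent",
  "model": "config.default_extraction_model",
  "schema_version_id": "schema_v3"
}
```

The example omits `created_at` and `updated_at`, which `ExtractedField` requires.
Output Schema Builder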
Researchers should be able to configure fields:
```typescript
export type OutputSchemaField = {
  id: string;
  name: string;
  label: string;
  description: string;
  type: "string" | "number" | "integer" | "boolean" | "enum" | "array" | "object";
  enum_options?: Array<{ id: string; label: string; description?: string }>; // used when type is "enum"
  required: boolean;
  evidence_required: boolean;
  min_evidence_count?: number;
  confidence_threshold?: number;
  review_policy?: "always" | "low_confidence" | "conflict" | "never";
  extraction_hint?: string;
};
```
Field Type Examples
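The `type` values of `OutputSchemaField` correspond directly to JSON Schema types, with `enum` expressible as a string restricted to the option ids, so one option is to compile the configured fields into a JSON Schema and pass it to whatever structured-output mode the extraction model supports. The sketch below assumes a hypothetical per-field wrapper containing `value`, `confidence`, and `evidence_quotes`; that layout, and the names `SchemaFieldLite`, `fieldToJsonSchema`, and `buildExtractionSchema`, are illustrations, not part of the schema defined in this document.

```typescript
// Subset of OutputSchemaField that determines the shape of an extracted value.
type SchemaFieldLite = {
  name: string;
  description: string;
  type: "string" | "number" | "integer" | "boolean" | "enum" | "array" | "object";
  enum_options?: Array<{ id: string; label: string }>;
  required: boolean;
};

type JsonSchema = Record<string, unknown>;

// One field becomes an object holding the value plus the model's confidence
// and verbatim quotes (hypothetical layout for illustration).
function fieldToJsonSchema(field: SchemaFieldLite): JsonSchema {
  // "enum" maps to a string limited to the option ids; "array" and "object"
  // stay unconstrained because OutputSchemaField does not describe their contents.
  const value: JsonSchema =
    field.type === "enum"
      ? { type: "string", enum: (field.enum_options ?? []).map((o) => o.id) }
      : { type: field.type };
  return {
    type: "object",
    description: field.description,
    properties: {
      value,
      confidence: { type: "number", minimum: 0, maximum: 1 },
      evidence_quotes: { type: "array", items: { type: "string" } },
    },
    required: ["value", "confidence", "evidence_quotes"],
    additionalProperties: false,
  };
}

// Combines all configured fields into one object schema for a session.
function buildExtractionSchema(fields: SchemaFieldLite[]): JsonSchema {
  const properties: Record<string, JsonSchema> = {};
  for (const f of fields) properties[f.name] = fieldToJsonSchema(f);
  return {
    type: "object",
    properties,
    required: fields.filter((f) => f.required).map((f) => f.name),
    additionalProperties: false,
  };
}
```

The model's quotes would still need to be matched back to turns to produce real `EvidenceRef` objects; the schema only constrains the shape of what the model returns.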
Pain point field
```json
{
  "id": "main_pain_point",
  "name": "main_pain_point",
  "label": "Main pain point",
  "description": "The user's most severe current problem, the one that most affects their decisions or efficiency.",
  "type": "string",
  "required": true,
  "evidence_required": true,
  "min_evidence_count": 1,
  "confidence_threshold": 0.75,
  "review_policy": "low_confidence"
}
```
Intensity score field
```json
{
  "id": "pain_intensity",
  "name": "pain_intensity",
  "label": "Pain intensity",
  "description": "A 1-5 score indicating how strongly this pain point affects the user.",
  "type": "integer",
  "required": true,
  "evidence_required": true,
  "confidence_threshold": 0.7
}
```
Enum field
```json
{
  "id": "segment_fit",
  "name": "segment_fit",
  "label": "Target user fit",
  "description": "Whether this respondent matches the target user persona.",
  "type": "enum",
  "enum_options": [
    { "id": "high", "label": "Strong match" },
    { "id": "medium", "label": "Partial match" },
    { "id": "low", "label": "No match" }
  ],
  "required": true,
  "evidence_required": true
}
```
Extraction Timing
We recommend supporting three extraction timings:
1. Real-time extraction
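Attempt to update the fields after every user answer.
Pros:
- lets the Agent know which fields are already covered;
- supports deciding on follow-up questions in real time;
- surfaces contradictions early.
Cons:
- higher cost;
- prone to transient errors.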
2. Checkpoint extraction
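Extract after each topic is completed.
Suitable when:
- the research guide is divided into several topics;
- each topic has clearly defined output fields.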
3. Post-session extraction
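Extract all fields in one pass after the interview ends.
Suitable when:
- cost is a concern;
- the output needs to be more stable;
- batch processing is needed.
MVP recommendation:
- use real-time extraction for coverage tracking and follow-up questions;
- use post-session extraction for the final results.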
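Since real-time extraction is meant to drive coverage and follow-up questions, the running drafts have to be reduced to a list of fields that still need work. A minimal sketch, assuming the interview loop keeps one hypothetical `DraftField` summary (latest confidence and evidence count) per schema field; it considers only required fields and reuses `confidence_threshold` and `min_evidence_count` from `OutputSchemaField`, defaulting to one piece of evidence when `evidence_required` is set. `CoverageField`, `DraftField`, and `uncoveredFields` are illustrative names.

```typescript
// Subset of OutputSchemaField relevant to coverage.
type CoverageField = {
  id: string;
  required: boolean;
  evidence_required: boolean;
  min_evidence_count?: number;
  confidence_threshold?: number;
};

// Hypothetical summary of the latest real-time draft for one field.
type DraftField = { confidence: number; evidence_count: number };

// Returns ids of required fields that the current drafts do not yet cover;
// an interviewer agent could prioritize follow-up questions on these.
function uncoveredFields(schema: CoverageField[], drafts: Map<string, DraftField>): string[] {
  return schema
    .filter((f) => f.required)
    .filter((f) => {
      const d = drafts.get(f.id);
      if (d === undefined) return true; // nothing extracted yet
      if (f.confidence_threshold !== undefined && d.confidence < f.confidence_threshold) {
        return true; // extracted, but not confident enough yet
      }
      const minEvidence = f.evidence_required ? (f.min_evidence_count ?? 1) : 0;
      return d.evidence_count < minEvidence; // evidence still too thin
    })
    .map((f) => f.id);
}
```

Post-session extraction would still produce the final values; this check only informs what to ask next.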
Evidence Quality
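Evidence quality can be graded as follows:
| Level | Definition | Example |
|---|---|---|
| strong | The user states it or selects it explicitly | "The most painful part for me is manually organizing interview results" |
| medium | Reasonably inferable from context | The user describes several organizing steps and complains about how long they take |
| weak | Indirect or incomplete evidence | The user only says "it's kind of a hassle" |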
Review Rules
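A field needs human review when:
- Its confidence is below the threshold.
- A required field has no evidence.
- All of its evidence is weak.
- Different turns clearly contradict each other.
- The field value does not conform to the schema.
- The user denies the conclusion in the closing summary.
- The field involves a sensitive judgment.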
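Most of these rules can be checked mechanically; contradictions between turns, denial in the closing summary, and sensitive judgments need semantic judgment, so this sketch takes them as externally supplied flags. All names are hypothetical. How `review_policy` values other than `"always"` should combine with the rules is not specified in this document, so the sketch honors only `"always"`; it also applies `min_evidence_count` (default 1) to the "no evidence" rule.

```typescript
type Strength = "strong" | "medium" | "weak";

// Subset of OutputSchemaField used by the rules.
type ReviewSchemaField = {
  type: "string" | "number" | "integer" | "boolean" | "enum" | "array" | "object";
  enum_options?: Array<{ id: string }>;
  required: boolean;
  evidence_required: boolean;
  min_evidence_count?: number;
  confidence_threshold?: number;
  review_policy?: "always" | "low_confidence" | "conflict" | "never";
};

type ReviewCandidate = { value: unknown; confidence: number; evidence: Array<{ strength: Strength }> };

// Signals that need semantic judgment, assumed to come from another check.
type ExternalSignals = { contradiction?: boolean; deniedInSummary?: boolean; sensitive?: boolean };

// Basic type conformance of a value against the field's declared type.
function conformsToType(value: unknown, field: ReviewSchemaField): boolean {
  switch (field.type) {
    case "string": return typeof value === "string";
    case "number": return typeof value === "number" && Number.isFinite(value);
    case "integer": return typeof value === "number" && Number.isInteger(value);
    case "boolean": return typeof value === "boolean";
    case "enum": return typeof value === "string" && (field.enum_options ?? []).some((o) => o.id === value);
    case "array": return Array.isArray(value);
    case "object": return typeof value === "object" && value !== null && !Array.isArray(value);
    default: return false;
  }
}

// Returns every rule the candidate trips; empty means no review is triggered.
function reviewReasons(candidate: ReviewCandidate, field: ReviewSchemaField, signals: ExternalSignals = {}): string[] {
  const reasons: string[] = [];
  if (field.review_policy === "always") reasons.push("policy_always");
  if (field.confidence_threshold !== undefined && candidate.confidence < field.confidence_threshold) {
    reasons.push("low_confidence");
  }
  const minEvidence = field.min_evidence_count ?? 1;
  if ((field.required || field.evidence_required) && candidate.evidence.length < minEvidence) {
    reasons.push("insufficient_evidence");
  }
  if (candidate.evidence.length > 0 && candidate.evidence.every((e) => e.strength === "weak")) {
    reasons.push("weak_evidence_only");
  }
  if (!conformsToType(candidate.value, field)) reasons.push("schema_violation");
  if (signals.contradiction) reasons.push("cross_turn_contradiction");
  if (signals.deniedInSummary) reasons.push("denied_in_summary");
  if (signals.sensitive) reasons.push("sensitive_judgment");
  return reasons;
}
```

A non-empty result would move the field to `needs_review`, and the reasons could be shown next to it in the review UI.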
Cross-session Insight
The output of a single session is a set of extracted fields; the output across sessions is an insight.
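One way to seed insights from single-session fields is to group accepted fields that share a `normalized_value` and count the sessions behind each group. This is a sketch under assumptions: it treats `normalized_value` as a string, builds a hypothetical theme key from the field id and value, and computes `frequency` as the share of the study's sessions that support the group (the insight example in this section uses a fractional frequency). Writing `title` and `summary` is left to a model or a researcher; `AcceptedField`, `InsightCandidate`, and `insightCandidates` are illustrative names.

```typescript
// Subset of ExtractedField used for aggregation.
type AcceptedField = {
  session_id: string;
  schema_field_id: string;
  normalized_value: string;
  status: "draft" | "accepted" | "rejected" | "needs_review";
};

type InsightCandidate = {
  theme: string; // hypothetical key: "<schema_field_id>:<normalized_value>"
  supporting_sessions: string[];
  frequency: number; // supporting sessions / total sessions in the study
};

// Groups accepted fields by (field, normalized value) and keeps groups
// supported by at least `minSupport` distinct sessions.
function insightCandidates(fields: AcceptedField[], totalSessions: number, minSupport = 2): InsightCandidate[] {
  const groups = new Map<string, Set<string>>();
  for (const f of fields) {
    if (f.status !== "accepted") continue; // skip drafts, rejections, and fields awaiting review
    const key = `${f.schema_field_id}:${f.normalized_value}`;
    const sessions = groups.get(key) ?? new Set<string>();
    sessions.add(f.session_id);
    groups.set(key, sessions);
  }
  const out: InsightCandidate[] = [];
  groups.forEach((sessions, theme) => {
    if (sessions.size < minSupport) return;
    out.push({
      theme,
      supporting_sessions: Array.from(sessions).sort(),
      frequency: totalSessions > 0 ? sessions.size / totalSessions : 0,
    });
  });
  // Most widespread candidates first.
  return out.sort((a, b) => b.frequency - a.frequency);
}
```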
```typescript
export type StudyInsight = {
  id: string;
  study_id: string;
  title: string;
  summary: string;
  theme: string;
  severity?: "low" | "medium" | "high";
  frequency?: number;
  supporting_sessions: string[];
  evidence: EvidenceRef[];
  confidence: number;
  status: "draft" | "accepted" | "rejected" | "needs_review";
};
```
Insight example:
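```json
{
  "title": "What researchers need most is trustworthy structured output, not just conversation summaries",
  "summary": "Several interviewees said LLM summaries cannot be used directly for decisions because they lack field-level results and citations to the original evidence.",
  "theme": "evidence_backed_extraction",
  "severity": "high",
  "frequency": 0.62,
  "supporting_sessions": ["sess_123", "sess_128", "sess_131"],
  "confidence": 0.84
}
```

Here `frequency` is a fraction (0.62) rather than a session count, and `id`, `study_id`, `evidence`, and `status` are omitted from the example.
Product Implementation Recommendations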
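In the MVP, the Results page should show at least:
- A table of extracted fields for each session.
- Each field's confidence and status.
- An evidence panel that opens when a field is clicked.
- Links from the evidence panel to the original transcript turn.
- Manual correction of field values.
- Export to JSON/CSV.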
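For the export, JSON can carry `ExtractedField` objects as they are, while CSV needs flattening and quoting. A minimal sketch, assuming a hypothetical `ExportRow` with one row per field, non-string values serialized as JSON, and evidence ids joined with `;`; the quoting follows RFC 4180 (wrap a cell in double quotes if it contains a comma, quote, or line break, and double any embedded quotes). `ExportRow`, `csvCell`, and `toCsv` are illustrative names.

```typescript
// Hypothetical flattened row: one extracted field per line.
type ExportRow = {
  session_id: string;
  field_name: string;
  value: unknown;
  confidence: number;
  status: string;
  evidence_ids: string[];
};

// RFC 4180 quoting: quote cells containing a comma, quote, or line break,
// doubling any embedded double quotes.
function csvCell(raw: string): string {
  return /[",\r\n]/.test(raw) ? `"${raw.replace(/"/g, '""')}"` : raw;
}

function toCsv(rows: ExportRow[]): string {
  const header = ["session_id", "field_name", "value", "confidence", "status", "evidence_ids"];
  const lines = [header.join(",")];
  for (const r of rows) {
    // Strings pass through; numbers, arrays, and objects are serialized as JSON.
    const value =
      r.value === undefined ? "" : typeof r.value === "string" ? r.value : JSON.stringify(r.value);
    const cells = [r.session_id, r.field_name, value, String(r.confidence), r.status, r.evidence_ids.join(";")];
    lines.push(cells.map(csvCell).join(","));
  }
  // RFC 4180 uses CRLF between records.
  return lines.join("\r\n");
}
```

Keeping evidence ids in the CSV lets a spreadsheet user trace a value back to the evidence panel.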