
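Options Vocabulary Library Functional Specification

Version: 1.0 | Date: 2025-09-29 | Project: DramaLing Intelligent English Learning System | Module: Quiz Option Generation System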


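📋 Feature Overview

Background

Quiz option generation in DramaLing currently has the following problems:

  • Frontend placeholders: the frontend uses the simple placeholders ["其他選項1", "其他選項2", "其他選項3"] ("Other option 1" through "Other option 3")
  • Random backend selection: distractors are picked at random from the user's own flashcards, with no matching logic
  • Unstable option quality: distractors may be far too easy or far too hard
  • No pedagogical basis: cognitive load theory for language learning is not taken into account

Goal

Build an intelligent options vocabulary library that automatically generates high-quality quiz distractors based on the characteristics of the target word.

Core Features

  • Three-parameter matching: CEFR level, word length, and part of speech
  • Smart filtering: avoid unsuitable options such as synonyms and near-identical spellings
  • Extensibility: support continuous vocabulary additions and algorithm improvements
  • Performance: indexes and caching keep responses fast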

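🎯 Functional Requirements

Core Requirements

| Requirement ID | Description | Priority |
|---|---|---|
| REQ-001 | Match words of similar difficulty by CEFR level | |
| REQ-002 | Match words of similar length by word length (character count) | |
| REQ-003 | Match words with the same part of speech | |
| REQ-004 | Generate 3 distinct distractors per question | |
| REQ-005 | Support multiple quiz types (vocabulary choice, listening, etc.) | |
| REQ-006 | Provide a vocabulary library management interface | |

Design simplification: to reduce maintenance cost and implementation complexity, advanced features such as synonym exclusion, quality scoring, and frequency rating have been removed. The system focuses on the core three-parameter matching so that it stays simple and practical.

Non-Functional Requirements

| Requirement ID | Description | Target |
|---|---|---|
| NFR-001 | Response time | < 100 ms |
| NFR-002 | Vocabulary library size | Initially ≥ 10,000 words |
| NFR-003 | Availability | 99.9% |
| NFR-004 | Scalability | Supports 100,000+ words |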

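🏗️ System Design

Overall Architecture

┌─────────────────────┐    ┌─────────────────────┐    ┌─────────────────────┐
│ Frontend quiz page  │────│ Option generation   │────│ Vocabulary library  │
│                     │    │ API                 │    │ service             │
└─────────────────────┘    └─────────────────────┘    └─────────────────────┘
                                      │                          │
                                      ▼                          ▼
                           ┌─────────────────────┐    ┌─────────────────────┐
                           │ Cache layer         │    │ Options vocabulary  │
                           │ (Redis/Memory)      │    │ library (Database)  │
                           └─────────────────────┘    └─────────────────────┘

Core Components

  1. OptionsVocabulary entity - vocabulary library data model
  2. OptionsVocabularyService - vocabulary library business logic
  3. DistractorGenerationService - distractor generation logic
  4. VocabularyMatchingEngine - vocabulary matching algorithm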

📊 Data Model Design

OptionsVocabulary Entity

using System;
using System.ComponentModel.DataAnnotations;
using Microsoft.EntityFrameworkCore;

namespace DramaLing.Api.Models.Entities;

// EF Core's [Index] attribute is applied to the class, not to individual properties.
[Index(nameof(Word), IsUnique = true, Name = "IX_OptionsVocabulary_Word")]
[Index(nameof(CEFRLevel), Name = "IX_OptionsVocabulary_CEFR")]
[Index(nameof(PartOfSpeech), Name = "IX_OptionsVocabulary_PartOfSpeech")]
[Index(nameof(WordLength), Name = "IX_OptionsVocabulary_WordLength")]
public class OptionsVocabulary
{
    /// <summary>
    /// Primary key
    /// </summary>
    public Guid Id { get; set; }

    /// <summary>
    /// The word itself
    /// </summary>
    [Required]
    [MaxLength(100)]
    public string Word { get; set; } = string.Empty;

    /// <summary>
    /// CEFR difficulty level (A1, A2, B1, B2, C1, C2)
    /// </summary>
    [Required]
    [MaxLength(2)]
    public string CEFRLevel { get; set; } = string.Empty;

    /// <summary>
    /// Part of speech (noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, idiom)
    /// </summary>
    [Required]
    [MaxLength(20)]
    [RegularExpression("^(noun|verb|adjective|adverb|pronoun|preposition|conjunction|interjection|idiom)$",
                      ErrorMessage = "PartOfSpeech must be a valid value")]
    public string PartOfSpeech { get; set; } = string.Empty;

    /// <summary>
    /// Word length (character count), derived from Word
    /// </summary>
    public int WordLength { get; set; }

    /// <summary>
    /// Whether the entry is active
    /// </summary>
    public bool IsActive { get; set; } = true;

    /// <summary>
    /// Creation time (UTC)
    /// </summary>
    public DateTime CreatedAt { get; set; } = DateTime.UtcNow;

    /// <summary>
    /// Last update time (UTC)
    /// </summary>
    public DateTime UpdatedAt { get; set; } = DateTime.UtcNow;
}
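The entity describes WordLength as derived from Word, but nothing in the model enforces that. A minimal sketch of one way to keep the two in sync is shown here; `WordLengthCalculator` is a hypothetical helper, not part of the spec, and it assumes the same convention as the bulk-import endpoint (every character counts, including spaces inside idioms), applied after trimming.

```csharp
using System;

// Hypothetical helper (not defined in the spec): derives WordLength from Word
// so that the stored length cannot drift from the actual text.
public static class WordLengthCalculator
{
    // Counts every character of the trimmed word, including inner spaces,
    // so "break the ice" counts as 13.
    public static int Compute(string word)
    {
        if (word is null) throw new ArgumentNullException(nameof(word));
        return word.Trim().Length;
    }
}
```

Callers such as a seeder or an import routine could then assign `WordLength = WordLengthCalculator.Compute(word)` instead of typing the number by hand.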

Composite Index Design

// Configured in the DbContext
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Core query index: CEFR + part of speech + word length
    modelBuilder.Entity<OptionsVocabulary>()
        .HasIndex(e => new { e.CEFRLevel, e.PartOfSpeech, e.WordLength })
        .HasDatabaseName("IX_OptionsVocabulary_Core_Matching");

    // Active-status index
    modelBuilder.Entity<OptionsVocabulary>()
        .HasIndex(e => e.IsActive)
        .HasDatabaseName("IX_OptionsVocabulary_Active");
}

🔧 Service Layer Design

IOptionsVocabularyService Interface

namespace DramaLing.Api.Services;

public interface IOptionsVocabularyService
{
    /// <summary>
    /// Generate distractors for the target word
    /// </summary>
    Task<List<string>> GenerateDistractorsAsync(
        string targetWord,
        string cefrLevel,
        string partOfSpeech,
        int count = 3);

    /// <summary>
    /// Add a word to the options library
    /// </summary>
    Task<bool> AddVocabularyAsync(OptionsVocabulary vocabulary);

    /// <summary>
    /// Bulk-import words
    /// </summary>
    Task<int> BulkImportAsync(List<OptionsVocabulary> vocabularies);

    /// <summary>
    /// Search words by criteria
    /// </summary>
    Task<List<OptionsVocabulary>> SearchVocabulariesAsync(
        string? cefrLevel = null,
        string? partOfSpeech = null,
        int? minLength = null,
        int? maxLength = null,
        int limit = 100);
}

QuestionGeneratorService Integration Design

public class QuestionGeneratorService : IQuestionGeneratorService
{
    private readonly DramaLingDbContext _context;
    private readonly IOptionsVocabularyService _optionsVocabularyService;
    private readonly ILogger<QuestionGeneratorService> _logger;

    public QuestionGeneratorService(
        DramaLingDbContext context,
        IOptionsVocabularyService optionsVocabularyService,
        ILogger<QuestionGeneratorService> logger)
    {
        _context = context;
        _optionsVocabularyService = optionsVocabularyService;
        _logger = logger;
    }

    /// <summary>
    /// Generate options for a vocabulary choice question (uses the options vocabulary library)
    /// </summary>
    private async Task<QuestionData> GenerateVocabChoiceAsync(Flashcard flashcard)
    {
        try
        {
            // Prefer the options vocabulary library when generating distractors
            var distractors = await _optionsVocabularyService.GenerateDistractorsAsync(
                flashcard.Word,
                flashcard.DifficultyLevel ?? "B1",
                flashcard.PartOfSpeech ?? "noun");

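            // If the vocabulary library returns too few options, fall back to the user's other flashcards
            if (distractors.Count < 3)
            {
                var fallbackDistractors = await GetFallbackDistractorsAsync(flashcard);
                // Skip the answer itself and anything already chosen so the options stay distinct (REQ-004)
                var extras = fallbackDistractors
                    .Where(w => !string.Equals(w, flashcard.Word, StringComparison.OrdinalIgnoreCase)
                             && !distractors.Contains(w, StringComparer.OrdinalIgnoreCase))
                    .Take(3 - distractors.Count)
                    .ToList();
                distractors.AddRange(extras);
            }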

            var options = new List<string> { flashcard.Word };
            options.AddRange(distractors.Take(3));

            // Shuffle the options into a random order
            var shuffledOptions = options.OrderBy(x => Guid.NewGuid()).ToArray();

            return new QuestionData
            {
                QuestionType = "vocab-choice",
                Options = shuffledOptions,
                CorrectAnswer = flashcard.Word
            };
        }
        catch (Exception ex)
        {
            _logger.LogWarning(ex, "Failed to generate options from vocabulary database, using fallback for {Word}", flashcard.Word);

            // Fall back entirely to the original logic (existing method, not shown here)
            return await GenerateVocabChoiceWithFallbackAsync(flashcard);
        }
    }

    /// <summary>
    /// Fallback option generation (uses the user's other flashcards)
    /// </summary>
    private async Task<List<string>> GetFallbackDistractorsAsync(Flashcard flashcard)
    {
        return await _context.Flashcards
            .Where(f => f.UserId == flashcard.UserId &&
                       f.Id != flashcard.Id &&
                       !f.IsArchived)
            .OrderBy(x => Guid.NewGuid())
            .Take(3)
            .Select(f => f.Word)
            .ToListAsync();
    }
}
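The service above shuffles options with `OrderBy(x => Guid.NewGuid())`, which works for four items but relies on sorting by random keys. As an alternative, here is a minimal sketch of an explicit Fisher-Yates shuffle; `OptionShuffler` is a hypothetical name, and passing the `Random` instance in is an assumption made so the behavior can be tested with a fixed seed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the spec): shuffles quiz options in place
// on a copy, leaving the caller's list untouched.
public static class OptionShuffler
{
    // Returns a shuffled copy of the items using the Fisher-Yates algorithm.
    public static T[] Shuffle<T>(IReadOnlyList<T> items, Random rng)
    {
        var result = new T[items.Count];
        for (var i = 0; i < items.Count; i++) result[i] = items[i];

        // Walk from the end, swapping each slot with a random earlier-or-equal slot.
        for (var i = result.Length - 1; i > 0; i--)
        {
            var j = rng.Next(i + 1); // 0 <= j <= i
            (result[i], result[j]) = (result[j], result[i]);
        }
        return result;
    }
}
```

In the service this could replace the `OrderBy` call, for example `OptionShuffler.Shuffle(options, Random.Shared)` on .NET 6 or later.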

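🌐 API Design

Integration into the Existing FlashcardsController

The options vocabulary library is integrated into the existing POST /api/flashcards/{id}/question endpoint.

// The existing FlashcardsController.GenerateQuestion method automatically uses the improved QuestionGeneratorService
// No additional API endpoints are required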

[HttpPost("{id}/question")]
public async Task<ActionResult> GenerateQuestion(Guid id, [FromBody] QuestionRequest request)
{
    try
    {
        // QuestionGeneratorService uses OptionsVocabularyService internally to produce better options
        var questionData = await _questionGeneratorService.GenerateQuestionAsync(id, request.QuestionType);

        return Ok(new { success = true, data = questionData });
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error generating question for flashcard {FlashcardId}", id);
        return StatusCode(500, new { success = false, error = "Failed to generate question" });
    }
}

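Vocabulary Library Management API (Optional)

Note: the management API below is optional and is mainly intended for administrators who manage the vocabulary library in bulk. Core option generation is built into the existing quiz API and does not depend on these endpoints.

/// <summary>
/// Vocabulary library management controller (optional).
/// Implement only if administrators need to manage the library in bulk.
/// </summary>
[ApiController]
[Route("api/admin/[controller]")]
[Authorize(Roles = "Admin")]
public class OptionsVocabularyController : ControllerBase
{
    private readonly IOptionsVocabularyService _vocabularyService;

    public OptionsVocabularyController(IOptionsVocabularyService vocabularyService)
    {
        _vocabularyService = vocabularyService;
    }

    /// <summary>
    /// Bulk-import vocabulary (admin only)
    /// </summary>
    [HttpPost("bulk-import")]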
    public async Task<ActionResult> BulkImport([FromBody] List<AddVocabularyRequest> requests)
    {
        var vocabularies = requests.Select(r => new OptionsVocabulary
        {
            Word = r.Word,
            CEFRLevel = r.CEFRLevel,
            PartOfSpeech = r.PartOfSpeech,
            WordLength = r.Word.Length
        }).ToList();

        var importedCount = await _vocabularyService.BulkImportAsync(vocabularies);
        return Ok(new { ImportedCount = importedCount });
    }

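    /// <summary>
    /// Get vocabulary library statistics (admin only)
    /// </summary>
    [HttpGet("stats")]
    public async Task<ActionResult> GetVocabularyStats()
    {
        // GetVocabularyStatsAsync is not declared on IOptionsVocabularyService above; add it before implementing this endpoint
        var stats = await _vocabularyService.GetVocabularyStatsAsync();
        return Ok(stats);
    }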
}

📁 DTO Definitions

QuestionOptionsResponse

namespace DramaLing.Api.Models.DTOs;

public class QuestionOptionsResponse
{
    public string QuestionType { get; set; } = string.Empty;
    public string[] Options { get; set; } = Array.Empty<string>();
    public string CorrectAnswer { get; set; } = string.Empty;
    public string TargetWord { get; set; } = string.Empty;
    public string? CEFRLevel { get; set; }
    public string? PartOfSpeech { get; set; }
    public DateTime GeneratedAt { get; set; } = DateTime.UtcNow;
}

AddVocabularyRequest

public class AddVocabularyRequest
{
    [Required]
    [MaxLength(100)]
    public string Word { get; set; } = string.Empty;

    [Required]
    [RegularExpression("^(A1|A2|B1|B2|C1|C2)$")]
    public string CEFRLevel { get; set; } = string.Empty;

    [Required]
    [MaxLength(20)]
    [RegularExpression("^(noun|verb|adjective|adverb|pronoun|preposition|conjunction|interjection|idiom)$",
                      ErrorMessage = "PartOfSpeech must be a valid value")]
    public string PartOfSpeech { get; set; } = string.Empty;
}
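Controllers marked `[ApiController]` validate these attributes automatically, but code paths outside MVC (a seeder or an import script, for instance) do not. The sketch below shows how the same data annotations could be checked by hand with `Validator.TryValidateObject`; `VocabularyRequestSketch` is a hypothetical, trimmed-down mirror of AddVocabularyRequest and `RequestValidation` is an illustrative name.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical mirror of AddVocabularyRequest, reduced to two fields for illustration.
public class VocabularyRequestSketch
{
    [Required, MaxLength(100)]
    public string Word { get; set; } = string.Empty;

    [Required, RegularExpression("^(A1|A2|B1|B2|C1|C2)$")]
    public string CEFRLevel { get; set; } = string.Empty;
}

public static class RequestValidation
{
    // Runs every data-annotation attribute on the object and collects the failures.
    public static bool IsValid(object request, out List<ValidationResult> errors)
    {
        errors = new List<ValidationResult>();
        return Validator.TryValidateObject(
            request, new ValidationContext(request), errors, validateAllProperties: true);
    }
}
```

With this sketch, a request with `CEFRLevel = "D1"` or an empty `Word` is rejected, while `Word = "cat"` with `CEFRLevel = "A1"` passes.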

💾 Database Migration

Migration File

public partial class AddOptionsVocabularyTable : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.CreateTable(
            name: "OptionsVocabularies",
            columns: table => new
            {
                Id = table.Column<Guid>(nullable: false),
                Word = table.Column<string>(maxLength: 100, nullable: false),
                CEFRLevel = table.Column<string>(maxLength: 2, nullable: false),
                PartOfSpeech = table.Column<string>(maxLength: 20, nullable: false),
                WordLength = table.Column<int>(nullable: false),
                IsActive = table.Column<bool>(nullable: false, defaultValue: true),
                CreatedAt = table.Column<DateTime>(nullable: false),
                UpdatedAt = table.Column<DateTime>(nullable: false)
            },
            constraints: table =>
            {
                table.PrimaryKey("PK_OptionsVocabularies", x => x.Id);
            });

        // Indexes
        migrationBuilder.CreateIndex(
            name: "IX_OptionsVocabulary_Word",
            table: "OptionsVocabularies",
            column: "Word",
            unique: true);

        migrationBuilder.CreateIndex(
            name: "IX_OptionsVocabulary_Core_Matching",
            table: "OptionsVocabularies",
            columns: new[] { "CEFRLevel", "PartOfSpeech", "WordLength" });

        migrationBuilder.CreateIndex(
            name: "IX_OptionsVocabulary_Active",
            table: "OptionsVocabularies",
            column: "IsActive");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.DropTable(name: "OptionsVocabularies");
    }
}

🔄 Use Cases

Case 1: Vocabulary Choice Question API Flow

Frontend request:
POST /api/flashcards/{id}/question
{
  "questionType": "vocab-choice"
}

Backend processing:
1. Look up the flashcard: "beautiful" (B1, adjective, 9 characters)
2. Filter distractors from the options vocabulary library:
   - CEFR: A2, B1, B2 (adjacent levels)
   - Part of speech: adjective
   - Word length: 7-11 characters
3. Select distractors: ["wonderful", "excellent", "attractive"]

API response:
{
  "success": true,
  "data": {
    "questionType": "vocab-choice",
    "options": ["beautiful", "wonderful", "excellent", "attractive"],
    "correctAnswer": "beautiful"
  }
}
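The filtering in step 2 above can be sketched as an in-memory query. This is only an illustration under stated assumptions: `VocabEntry` and `DistractorMatcher` are hypothetical names, "adjacent levels" is read as the target level plus one level on each side, and the length window is taken to be ±2 characters, which is what the 7-11 range for a 9-character word implies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for an OptionsVocabulary row.
public record VocabEntry(string Word, string CefrLevel, string PartOfSpeech);

public static class DistractorMatcher
{
    private static readonly string[] Levels = { "A1", "A2", "B1", "B2", "C1", "C2" };

    // Target level plus its immediate neighbours, e.g. B1 -> A2, B1, B2.
    public static IReadOnlyList<string> AdjacentLevels(string level)
    {
        var i = Array.IndexOf(Levels, level);
        if (i < 0) throw new ArgumentException($"Unknown CEFR level: {level}", nameof(level));
        return Levels.Skip(Math.Max(0, i - 1)).Take(i == 0 ? 2 : 3).ToList();
    }

    // Applies the three parameters (CEFR, part of speech, length) and drops the target itself.
    public static List<VocabEntry> Candidates(
        IEnumerable<VocabEntry> library, string targetWord,
        string cefrLevel, string partOfSpeech, int lengthTolerance = 2)
    {
        var levels = AdjacentLevels(cefrLevel);
        return library
            .Where(v => levels.Contains(v.CefrLevel))
            .Where(v => v.PartOfSpeech == partOfSpeech)
            .Where(v => Math.Abs(v.Word.Length - targetWord.Length) <= lengthTolerance)
            .Where(v => !string.Equals(v.Word, targetWord, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Against a database the same predicates would be expressed in the EF Core query so that the composite index on (CEFRLevel, PartOfSpeech, WordLength) can be used; the final pick of three would then be a random sample from the candidates.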

Case 2: Listening Quiz API Flow

Frontend request:
POST /api/flashcards/{id}/question
{
  "questionType": "sentence-listening"
}

Backend processing:
1. Look up the flashcard: "running" (A2, verb, 7 characters)
2. Filter distractors from the options vocabulary library:
   - CEFR: A1, A2, B1
   - Part of speech: verb
   - Word length: 5-9 characters
3. Select distractors: ["jumping", "walking", "playing"]

API response:
{
  "success": true,
  "data": {
    "questionType": "sentence-listening",
    "options": ["running", "jumping", "walking", "playing"],
    "correctAnswer": "running"
  }
}

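Case 3: Fallback Mechanism

Scenario: the vocabulary library does not contain enough matching options

Processing flow:
1. Try to fetch distractors from the options vocabulary library → only 1 is found
2. Trigger the fallback: fill the remaining 2 options from the user's other flashcards
3. Ensure that 3 distractors are always provided

Benefit: the system stays stable and keeps working even when the vocabulary library is incomplete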
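The fill-up step in this flow can be sketched as a small pure function. `DistractorFill` is a hypothetical name; the sketch assumes that library results take priority, that comparison is case-insensitive, and that neither the target word nor a repeated word may appear among the distractors. If the user has too few other flashcards, the result can still be shorter than requested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the spec): merges library and fallback distractors.
public static class DistractorFill
{
    // Takes library distractors first, then fallback words, skipping the
    // target word and duplicates, until `count` items are collected.
    public static List<string> Fill(
        IEnumerable<string> primary, IEnumerable<string> fallback,
        string targetWord, int count = 3)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { targetWord };
        var result = new List<string>(count);
        foreach (var word in primary.Concat(fallback))
        {
            if (result.Count == count) break;
            if (seen.Add(word)) result.Add(word); // Add returns false for repeats
        }
        return result;
    }
}
```

For the scenario above, one library hit plus a fallback list that happens to contain the answer and a repeat still yields three distinct distractors.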

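Performance Considerations

Query Optimization

  1. Composite index: (CEFRLevel, PartOfSpeech, WordLength)
  2. Covering index: include the columns that queries read most often
  3. Paged queries: avoid loading too much data at once

Caching Strategy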

public class CachedDistractorGenerationService
{
    private readonly IMemoryCache _cache;
    private readonly TimeSpan _cacheExpiry = TimeSpan.FromHours(1);

    public async Task<List<string>> GenerateDistractorsAsync(string targetWord, string cefrLevel, string partOfSpeech)
    {
        var cacheKey = $"distractors:{targetWord}:{cefrLevel}:{partOfSpeech}";

        if (_cache.TryGetValue(cacheKey, out List<string>? cachedResult) && cachedResult is not null)
        {
            return cachedResult;
        }

        var result = await GenerateDistractorsInternalAsync(targetWord, cefrLevel, partOfSpeech);

        _cache.Set(cacheKey, result, _cacheExpiry);
        return result;
    }
}
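Because the cache key above includes the target word, the same word returns the same three distractors until the entry expires. One possible variation, sketched here under that observation, caches the candidate pool per matching parameters and leaves the random pick to each request. `CandidatePoolCache`, its key format, and the injectable clock are all assumptions made for illustration; a real implementation would more likely build on IMemoryCache.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical cache (not part of the spec): stores candidate pools with a fixed time-to-live.
public sealed class CandidatePoolCache
{
    private readonly ConcurrentDictionary<string, (DateTime ExpiresAt, IReadOnlyList<string> Pool)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _now;

    public CandidatePoolCache(TimeSpan ttl) : this(ttl, () => DateTime.UtcNow) { }

    // The clock is injectable so expiry can be tested without waiting.
    public CandidatePoolCache(TimeSpan ttl, Func<DateTime> now)
    {
        _ttl = ttl;
        _now = now;
    }

    // Key by the matching parameters, normalised for case, instead of the target word.
    public static string KeyFor(string cefrLevel, string partOfSpeech, int wordLength) =>
        $"pool:{cefrLevel.ToUpperInvariant()}:{partOfSpeech.ToLowerInvariant()}:{wordLength}";

    // Returns the cached pool if it has not expired; otherwise loads and stores a fresh one.
    public IReadOnlyList<string> GetOrLoad(string key, Func<IReadOnlyList<string>> load)
    {
        var now = _now();
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresAt > now)
            return entry.Pool;

        var pool = load();
        _entries[key] = (now + _ttl, pool);
        return pool;
    }
}
```

Two concurrent misses for the same key may both call `load`; for a read-only vocabulary query that is usually acceptable, and the last write wins.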

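Performance Targets

| Metric | Target | Monitoring |
|---|---|---|
| API response time | < 100 ms | Application Insights |
| Database query time | < 50 ms | EF Core logs |
| Cache hit rate | > 80% | Custom counters |
| Concurrent requests | > 1000 req/s | Load testing |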

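📊 Initial Data Setup

Suggested Data Sources

  1. CEFR word lists

    • Cambridge English Vocabulary Profile
    • Oxford 3000/5000 word lists
    • Coursebook word lists for each level
  2. Part-of-speech tagging

    • WordNet database
    • English part-of-speech dictionaries
    • Corpus analysis results
  3. Frequency rating

    • Google Ngram Corpus
    • Brown Corpus
    • Modern English usage frequency statistics

Initial Data Script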

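public class VocabularySeeder
{
    private readonly DramaLingDbContext _context;

    public VocabularySeeder(DramaLingDbContext context)
    {
        _context = context;
    }

    public async Task SeedInitialVocabularyAsync()
    {
        var vocabularies = new List<OptionsVocabulary>
        {
            // A1 Level - nouns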
            new() { Word = "cat", CEFRLevel = "A1", PartOfSpeech = "noun", WordLength = 3 },
            new() { Word = "dog", CEFRLevel = "A1", PartOfSpeech = "noun", WordLength = 3 },
            new() { Word = "book", CEFRLevel = "A1", PartOfSpeech = "noun", WordLength = 4 },

            // A1 Level - verbs
            new() { Word = "eat", CEFRLevel = "A1", PartOfSpeech = "verb", WordLength = 3 },
            new() { Word = "run", CEFRLevel = "A1", PartOfSpeech = "verb", WordLength = 3 },
            new() { Word = "walk", CEFRLevel = "A1", PartOfSpeech = "verb", WordLength = 4 },

            // A1 Level - pronouns
            new() { Word = "he", CEFRLevel = "A1", PartOfSpeech = "pronoun", WordLength = 2 },
            new() { Word = "she", CEFRLevel = "A1", PartOfSpeech = "pronoun", WordLength = 3 },
            new() { Word = "they", CEFRLevel = "A1", PartOfSpeech = "pronoun", WordLength = 4 },

            // A2 Level - prepositions
            new() { Word = "under", CEFRLevel = "A2", PartOfSpeech = "preposition", WordLength = 5 },
            new() { Word = "above", CEFRLevel = "A2", PartOfSpeech = "preposition", WordLength = 5 },
            new() { Word = "behind", CEFRLevel = "A2", PartOfSpeech = "preposition", WordLength = 6 },

            // B1/B2 Level - adjectives
            new() { Word = "beautiful", CEFRLevel = "B1", PartOfSpeech = "adjective", WordLength = 9 },
            new() { Word = "wonderful", CEFRLevel = "B1", PartOfSpeech = "adjective", WordLength = 9 },
            new() { Word = "excellent", CEFRLevel = "B2", PartOfSpeech = "adjective", WordLength = 9 },

            // B1 Level - adverbs
            new() { Word = "quickly", CEFRLevel = "B1", PartOfSpeech = "adverb", WordLength = 7 },
            new() { Word = "carefully", CEFRLevel = "B1", PartOfSpeech = "adverb", WordLength = 9 },
            new() { Word = "suddenly", CEFRLevel = "B1", PartOfSpeech = "adverb", WordLength = 8 },

            // B2 Level - conjunctions
            new() { Word = "however", CEFRLevel = "B2", PartOfSpeech = "conjunction", WordLength = 7 },
            new() { Word = "therefore", CEFRLevel = "B2", PartOfSpeech = "conjunction", WordLength = 9 },
            new() { Word = "although", CEFRLevel = "B2", PartOfSpeech = "conjunction", WordLength = 8 },

            // Interjections
            new() { Word = "wow", CEFRLevel = "A1", PartOfSpeech = "interjection", WordLength = 3 },
            new() { Word = "ouch", CEFRLevel = "A2", PartOfSpeech = "interjection", WordLength = 4 },
            new() { Word = "alas", CEFRLevel = "C1", PartOfSpeech = "interjection", WordLength = 4 },

            // Idioms (WordLength counts spaces, matching Word.Length)
            new() { Word = "break the ice", CEFRLevel = "B2", PartOfSpeech = "idiom", WordLength = 13 },
            new() { Word = "piece of cake", CEFRLevel = "B1", PartOfSpeech = "idiom", WordLength = 13 },
            new() { Word = "hit the books", CEFRLevel = "B2", PartOfSpeech = "idiom", WordLength = 13 },

            // ... more words
        };

        await _context.OptionsVocabularies.AddRangeAsync(vocabularies);
        await _context.SaveChangesAsync();
    }
}

🔄 Service Registration

Startup.cs / Program.cs

// Register services
builder.Services.AddScoped<IOptionsVocabularyService, OptionsVocabularyService>();
builder.Services.AddScoped<DistractorGenerationService>();

// In-memory cache
builder.Services.AddMemoryCache();

// Background service (optional)
builder.Services.AddHostedService<VocabularyQualityScoreUpdateService>();

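📈 Quality Assurance

Algorithm Validation

  1. A/B testing: compare learning outcomes between the old and new option generation methods
  2. Expert review: language learning experts assess option quality
  3. User feedback: collect learners' feedback on option difficulty

Monitoring Metrics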

public class DistractorQualityMetrics
{
    public double AverageResponseTime { get; set; }
    public double OptionVariability { get; set; }      // Option diversity
    public double CEFRLevelAccuracy { get; set; }      // CEFR matching accuracy
    public double UserSatisfactionScore { get; set; }   // User satisfaction
    public int TotalDistractorsGenerated { get; set; }
    public DateTime MeasuredAt { get; set; }
}

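🚀 Implementation Phases

Phase 1: Basic Implementation (1-2 weeks)

  • Create the OptionsVocabulary entity and database migration
  • Implement the basic features of OptionsVocabularyService
  • Build the core API endpoints
  • Import the initial vocabulary data (1000-5000 words)

Phase 2: Algorithm Optimization (1 week)

  • Implement DistractorGenerationService
  • Add synonym exclusion logic
  • Implement the quality scoring system
  • Add the caching mechanism

Phase 3: Frontend Integration (1-2 days)

  • Test the improvement on the existing API endpoints
  • Verify option quality for each quiz type
  • Performance testing and tuning

Note: because option generation is integrated into the existing API, the frontend does not need any code changes. It is only necessary to confirm that the improved backend option generation behaves as expected.

Phase 4: Advanced Features (1-2 weeks)

  • Management interface development
  • Bulk import tool
  • Monitoring and analytics dashboard
  • A/B testing framework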

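📋 Acceptance Criteria

Functional Acceptance

  • Suitable distractors are generated based on CEFR level, part of speech, and word length
  • API response time < 100 ms
  • Generated options contain no duplicates
  • All quiz types are supported

Quality Acceptance

  • Distractors are of moderate difficulty (neither too easy nor too hard)
  • No obvious synonyms appear as distractors
  • Spelling differences are reasonable (options are not overly similar)

Technical Acceptance

  • Code coverage > 80%
  • All unit tests pass
  • API documentation is complete
  • Performance tests pass

🔒 Security Considerations

Data Protection

  • Vocabulary library data is not sensitive and needs no special encryption
  • The management API requires administrator authorization
  • Guard against SQL injection attacks

API Security

  • Implement rate limiting to prevent abuse
  • Validate and sanitize input
  • Error messages must not leak system information

📚 Related Documents


Specification completed: 2025-09-29 | Next update: after implementation is complete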