docs: add technical spec for sentence speaking practice integration

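- Detailed front-end and back-end integration plan for the sentence speaking practice feature
- Integration design for the Microsoft Azure Speech Services pronunciation assessment API
- Complete API interface spec and database schema design
- Web Audio API recording implementation spec
- Review system quizType extension (sentence-speaking)
- Multi-dimensional scoring design (accuracy/fluency/completeness/prosody)
- Cost analysis and deployment considerations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>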
2025-10-09 00:40:03 +08:00 · 2025-10-09 00:40:03 +08:00 · 99677fc014
parent fce5138c55
commit 99677fc014
1 changed files with 745 additions and 0 deletions
--- a/例句口說練習整合規格.md
+++ b/例句口說練習整合規格.md
@@ -0,0 +1,745 @@
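+# Sentence Speaking Practice Integration Spec
+
+## 📋 Overview
+
+This document specifies the full technical design for adding a "sentence speaking practice" feature to the DramaLing vocabulary learning system. It covers the front-end components, the back-end API, the Microsoft Azure Speech Services integration, and the overall system architecture.
+
+---
+
+## 🎯 Feature Goals
+
+### Learning Value
+- **Active practice**: move learners from passive recognition to active spoken output
+- **Pronunciation correction**: use AI to assess pronunciation accuracy and fluency
+- **Contextual use**: practice each word inside a complete example sentence
+
+### User Experience
+- **Visual guidance**: show the example-sentence image to convey the context
+- **Immediate feedback**: provide pronunciation scores and suggestions for improvement
+- **Seamless integration**: fit into the existing review system without changing its flow
+
+---
+
+## 🖥️ Front-End Spec
+
+### Existing Component Analysis
+
+**File location**: `note/archive/components/review/review-tests/SentenceSpeakingTest.tsx`
+
+**Component structure**: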
+```typescript
+interface SentenceSpeakingTestProps extends BaseReviewProps {
+  exampleImage?: string
+  onImageClick?: (image: string) => void
+}
+
+// Core features:
+// - Show the example-sentence image
+// - Record button (🎤 Start recording)
+// - Show the target sentence
+// - Result feedback area
+```
+
+### Front-End Upgrade Requirements
+
+#### 1. **Recording**
+```typescript
+// State to be added
+interface AudioRecordingState {
+  isRecording: boolean
+  audioBlob: Blob | null
+  recordingTime: number
+  isProcessing: boolean
+}
+
+// Recording via getUserMedia and MediaRecorder (MediaStream Recording API)
+const startRecording = async () => {
+  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
+  const mediaRecorder = new MediaRecorder(stream)
+  // Recording logic goes here
+}
+```
+
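+#### 2. **Score Display**
+```typescript
+interface PronunciationResult {
+  overallScore: number        // Overall score (0-100)
+  accuracyScore: number       // Accuracy
+  fluencyScore: number        // Fluency
+  completenessScore: number   // Completeness
+  prosodyScore: number        // Prosody (intonation/rhythm)
+  feedback: string[]          // Suggestions for improvement
+  transcribedText: string     // Speech-to-text result
+}
+```
+
+#### 3. **UI Interaction Flow**
+1. Show the example-sentence image and the target sentence
+2. The user clicks the record button → recording starts (a recording animation is shown)
+3. The user clicks again → recording stops → the audio is uploaded
+4. A loading animation is shown → the scores are displayed
+5. A confidence level is assigned automatically from the score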
+---
+
+## 🔧 Back-End Spec
+
+### Microsoft Azure Speech Services Integration
+
+#### 1. **NuGet Package Requirement**
+```xml
+<PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.38.0" />
+```
+
+#### 2. **Configuration**
+```csharp
+public class AzureSpeechOptions
+{
+    public const string SectionName = "AzureSpeech";
+    public string SubscriptionKey { get; set; } = string.Empty;
+    public string Region { get; set; } = "eastus";
+    public string Language { get; set; } = "en-US";
+    public bool EnableDetailedResult { get; set; } = true;
+    public int TimeoutSeconds { get; set; } = 30;
+}
+```
+
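+#### 3. **Core Service Implementation**
+```csharp
+public interface IPronunciationAssessmentService
+{
+    Task<PronunciationResult> EvaluatePronunciationAsync(
+        Stream audioStream,
+        string referenceText,
+        string language = "en-US"
+    );
+}
+
+public class AzurePronunciationAssessmentService : IPronunciationAssessmentService
+{
+    private readonly AzureSpeechOptions _options;
+
+    // Outline of the Azure Speech Services integration; steps 3-4 are still to be written
+    public async Task<PronunciationResult> EvaluatePronunciationAsync(
+        Stream audioStream, string referenceText, string language = "en-US")
+    {
+        // 1. Configure the Speech SDK
+        var config = SpeechConfig.FromSubscription(_options.SubscriptionKey, _options.Region);
+
+        // 2. Set the pronunciation assessment parameters
+        //    (the C# SDK exposes a constructor here, not a static Create method)
+        var pronunciationConfig = new PronunciationAssessmentConfig(
+            referenceText,
+            GradingSystem.HundredMark,
+            Granularity.Phoneme
+        );
+
+        // 3. Process the audio stream and obtain the assessment result
+        // 4. Convert it to the unified PronunciationResult format
+        throw new NotImplementedException();
+    }
+}
+```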
+
+---
+
+## 🌐 API Design Spec
+
+### Endpoint Design
+
+#### **POST `/api/speech/pronunciation-assessment`**
+
+**Request format**:
+```http
+Content-Type: multipart/form-data
+
+audio: [audio file] (WAV/WebM/MP3, max 10 MB)
+referenceText: "He overstepped the boundaries of acceptable behavior."
+flashcardId: "b2bb23b8-16dd-44b2-bf64-34c468f2d362"
+language: "en-US" (optional, defaults to en-US)
+```
+
+**Response format**:
+```json
+{
+  "success": true,
+  "data": {
+    "assessmentId": "uuid-here",
+    "flashcardId": "b2bb23b8-16dd-44b2-bf64-34c468f2d362",
+    "referenceText": "He overstepped the boundaries...",
+    "transcribedText": "He overstep the boundary of acceptable behavior",
+    "scores": {
+      "overall": 85,
+      "accuracy": 82,
+      "fluency": 88,
+      "completeness": 90,
+      "prosody": 80
+    },
+    "wordLevelResults": [
+      {
+        "word": "overstepped",
+        "accuracy": 75,
+        "errorType": "Mispronunciation"
+      }
+    ],
+    "feedback": [
+      "Overall pronunciation is good",
+      "Watch the stress placement in 'overstepped'",
+      "Speaking pace is moderate and intonation is natural"
+    ],
+    "confidenceLevel": 2,
+    "processingTime": "1.2s"
+  }
+}
+```
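
The response nests its scores under `data.scores`, while the front-end `PronunciationResult` interface defined earlier is flat, so the client needs a small mapping step. The sketch below is one possible way to do that; the field names mirror the JSON example above, but `AssessmentResponse` and `toPronunciationResult` are hypothetical names introduced for illustration.

```typescript
// Shape of the envelope shown in the JSON example (only the fields the UI needs).
interface AssessmentResponse {
  success: boolean
  data?: {
    transcribedText: string
    scores: {
      overall: number
      accuracy: number
      fluency: number
      completeness: number
      prosody: number
    }
    feedback: string[]
  }
}

// Same fields as the front-end interface in the "Score Display" section.
interface PronunciationResult {
  overallScore: number
  accuracyScore: number
  fluencyScore: number
  completenessScore: number
  prosodyScore: number
  feedback: string[]
  transcribedText: string
}

// Hypothetical helper: flattens the API envelope into the front-end shape.
function toPronunciationResult(response: AssessmentResponse): PronunciationResult {
  if (!response.success || !response.data) {
    throw new Error('Assessment response did not contain data')
  }
  const { scores, feedback, transcribedText } = response.data
  return {
    overallScore: scores.overall,
    accuracyScore: scores.accuracy,
    fluencyScore: scores.fluency,
    completenessScore: scores.completeness,
    prosodyScore: scores.prosody,
    feedback,
    transcribedText
  }
}
```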
+
+### Error Handling
+
+**Typical error response**:
+```json
+{
+  "success": false,
+  "error": "AUDIO_TOO_SHORT",
+  "message": "Recording is too short; please record at least 1 second",
+  "details": {
+    "minDuration": 1000,
+    "actualDuration": 500
+  }
+}
+```
+
+**Error types**:
+- `AUDIO_TOO_SHORT` - recording is too short
+- `AUDIO_TOO_LONG` - recording is too long (>30 seconds)
+- `INVALID_AUDIO_FORMAT` - audio format is not supported
+- `SPEECH_SERVICE_ERROR` - Azure service error
+- `NO_SPEECH_DETECTED` - no speech was detected
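
The duration-related codes can be checked on the client before any upload happens. The sketch below assumes the limits implied above (1,000 ms minimum from the `AUDIO_TOO_SHORT` example, 30 seconds maximum from `AUDIO_TOO_LONG`); `checkDuration` and the `maxDuration` detail field are hypothetical additions, not part of the spec.

```typescript
// The codes come from the list above; the helper names are hypothetical.
type AssessmentErrorCode =
  | 'AUDIO_TOO_SHORT'
  | 'AUDIO_TOO_LONG'
  | 'INVALID_AUDIO_FORMAT'
  | 'SPEECH_SERVICE_ERROR'
  | 'NO_SPEECH_DETECTED'

const MIN_DURATION_MS = 1000  // minDuration in the AUDIO_TOO_SHORT example
const MAX_DURATION_MS = 30000 // 30-second limit behind AUDIO_TOO_LONG

interface DurationError {
  success: false
  error: AssessmentErrorCode
  // maxDuration is an assumption added here; the spec's example only shows minDuration.
  details: { minDuration: number; maxDuration: number; actualDuration: number }
}

// Returns an error body when the duration is out of range, or null when it is acceptable.
function checkDuration(actualDuration: number): DurationError | null {
  let error: AssessmentErrorCode | null = null
  if (actualDuration < MIN_DURATION_MS) error = 'AUDIO_TOO_SHORT'
  else if (actualDuration > MAX_DURATION_MS) error = 'AUDIO_TOO_LONG'
  if (error === null) return null
  return {
    success: false,
    error,
    details: {
      minDuration: MIN_DURATION_MS,
      maxDuration: MAX_DURATION_MS,
      actualDuration
    }
  }
}
```

Running the same check on the server is still necessary, since client-side validation can be bypassed.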
+
+---
+
+## 📊 Database Design
+
+### New Assessment Record Table
+
+```sql
+CREATE TABLE PronunciationAssessments (
+    Id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),
+    UserId UNIQUEIDENTIFIER NOT NULL,
+    FlashcardId UNIQUEIDENTIFIER NOT NULL,
+    ReferenceText NVARCHAR(500) NOT NULL,
+    TranscribedText NVARCHAR(500),
+
+    -- Scores
+    OverallScore DECIMAL(5,2),
+    AccuracyScore DECIMAL(5,2),
+    FluencyScore DECIMAL(5,2),
+    CompletenessScore DECIMAL(5,2),
+    ProsodyScore DECIMAL(5,2),
+
+    -- Metadata
+    AudioDuration DECIMAL(8,3),
+    ProcessingTime DECIMAL(8,3),
+    AzureRequestId NVARCHAR(100),
+
+    CreatedAt DATETIME2 DEFAULT GETUTCDATE(),
+
+    -- Foreign key constraints
+    FOREIGN KEY (UserId) REFERENCES Users(Id),
+    FOREIGN KEY (FlashcardId) REFERENCES Flashcards(Id)
+);
+
+-- Indexes
+CREATE INDEX IX_PronunciationAssessments_UserId_CreatedAt
+ON PronunciationAssessments(UserId, CreatedAt DESC);
+
+CREATE INDEX IX_PronunciationAssessments_FlashcardId
+ON PronunciationAssessments(FlashcardId);
+```
+
+---
+
+## 🔄 System Integration Spec
+
+### 1. Review System Extension
+
+#### **quizType Extension**
+```typescript
+// hooks/review/useReviewSession.ts
+interface QuizItem {
+  quizType: 'flip-card' | 'vocab-choice' | 'sentence-speaking'
+  // ... other properties stay unchanged
+}
+```
+
+#### **Quiz Generation Logic Update**
+```typescript
+// Add inside generateQuizItemsFromFlashcards
+quizItems.push(
+  // existing flip-card and vocab-choice items...
+  {
+    id: `${card.id}-sentence-speaking`,
+    cardId: card.id,
+    cardData: cardState,
+    quizType: 'sentence-speaking',
+    order: order++,
+    isCompleted: false,
+    wrongCount: 0,
+    skipCount: 0
+  }
+)
+```
+
+### 2. Score Mapping
+
+**Azure score → system confidence level**:
+```typescript
+const mapAzureScoreToConfidence = (overallScore: number): number => {
+  if (overallScore >= 85) return 2      // Excellent (high confidence)
+  if (overallScore >= 70) return 1      // Good (medium confidence)
+  return 0                              // Needs improvement (low confidence)
+}
+```
+
+---
+
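+## ⚙️ Technical Implementation Spec
+
+### Front-End Implementation
+
+#### 1. **Audio Recording**
+```typescript
+// components/shared/AudioRecorder.tsx (new shared component)
+export class AudioRecorder {
+  private mediaRecorder: MediaRecorder | null = null
+  private audioChunks: Blob[] = []
+
+  async startRecording(): Promise<void> {
+    const stream = await navigator.mediaDevices.getUserMedia({
+      audio: {
+        echoCancellation: true,
+        noiseSuppression: true,
+        sampleRate: 16000  // sample rate recommended for Azure (a hint; browsers may ignore it)
+      }
+    })
+
+    this.audioChunks = []
+    this.mediaRecorder = new MediaRecorder(stream, {
+      // Not universal: check MediaRecorder.isTypeSupported() before relying on it
+      mimeType: 'audio/webm;codecs=opus'
+    })
+    // Collect encoded chunks as the recorder emits them
+    this.mediaRecorder.ondataavailable = (event) => {
+      if (event.data.size > 0) this.audioChunks.push(event.data)
+    }
+    this.mediaRecorder.start()
+  }
+
+  // Stops recording, releases the microphone, and resolves with the audio Blob
+  stopRecording(): Promise<Blob> {
+    return new Promise((resolve, reject) => {
+      const recorder = this.mediaRecorder
+      if (!recorder) {
+        reject(new Error('Recording has not been started'))
+        return
+      }
+      recorder.onstop = () => {
+        recorder.stream.getTracks().forEach((track) => track.stop())
+        resolve(new Blob(this.audioChunks, { type: recorder.mimeType }))
+      }
+      recorder.stop()
+    })
+  }
+}
+```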
+
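+#### 2. **API Client**
+```typescript
+// lib/services/speechAssessment.ts
+export const speechAssessmentService = {
+  async evaluatePronunciation(
+    audioBlob: Blob,
+    referenceText: string,
+    flashcardId: string
+  ): Promise<PronunciationResult> {
+    const formData = new FormData()
+    formData.append('audio', audioBlob, 'recording.webm')
+    formData.append('referenceText', referenceText)
+    formData.append('flashcardId', flashcardId)
+
+    const response = await fetch('/api/speech/pronunciation-assessment', {
+      method: 'POST',
+      body: formData
+    })
+
+    // The endpoint wraps its payload in { success, data } or { success, error, message }
+    const payload = await response.json()
+    if (!response.ok || !payload.success) {
+      throw new Error(payload.message ?? `Assessment failed (HTTP ${response.status})`)
+    }
+    // Note: data.scores must still be mapped onto the flat PronunciationResult fields
+    return payload.data
+  }
+}
+```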
+
+### Back-End Implementation
+
+#### 1. **Controller**
+```csharp
+[ApiController]
+[Route("api/speech")]
+public class SpeechController : BaseController
+{
+    private readonly IPronunciationAssessmentService _assessmentService;
+
+    public SpeechController(IPronunciationAssessmentService assessmentService)
+        => _assessmentService = assessmentService;
+
+    [HttpPost("pronunciation-assessment")]
+    public async Task<IActionResult> EvaluatePronunciation(
+        [FromForm] IFormFile audio,
+        [FromForm] string referenceText,
+        [FromForm] string flashcardId,
+        [FromForm] string language = "en-US")
+    {
+        // 1. Validate the request
+        if (audio == null || audio.Length == 0)
+            return BadRequest("Audio file must not be empty");
+
+        if (audio.Length > 10 * 1024 * 1024) // 10 MB limit
+            return BadRequest("Audio file is too large");
+
+        // 2. Open the audio stream
+        using var audioStream = audio.OpenReadStream();
+
+        // 3. Call Azure Speech Services
+        var result = await _assessmentService.EvaluatePronunciationAsync(
+            audioStream, referenceText, language);
+
+        // 4. Save the assessment record to the database
+        // 5. Return the result
+        return Ok(result);
+    }
+}
+```
+
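+#### 2. **Azure Speech Services Integration**
+```csharp
+using System.Diagnostics;
+using Microsoft.CognitiveServices.Speech;
+using Microsoft.CognitiveServices.Speech.Audio;
+using Microsoft.CognitiveServices.Speech.PronunciationAssessment;
+using Microsoft.Extensions.Options;
+
+public class AzurePronunciationAssessmentService : IPronunciationAssessmentService
+{
+    private readonly AzureSpeechOptions _options;
+
+    public AzurePronunciationAssessmentService(IOptions<AzureSpeechOptions> options)
+        => _options = options.Value;
+
+    public async Task<PronunciationResult> EvaluatePronunciationAsync(
+        Stream audioStream, string referenceText, string language = "en-US")
+    {
+        var stopwatch = Stopwatch.StartNew();
+
+        // 1. Configure Azure Speech
+        var speechConfig = SpeechConfig.FromSubscription(
+            _options.SubscriptionKey,
+            _options.Region
+        );
+        speechConfig.SpeechRecognitionLanguage = language;
+
+        // 2. Set the pronunciation assessment parameters
+        var pronunciationConfig = new PronunciationAssessmentConfig(
+            referenceText,
+            GradingSystem.HundredMark,
+            Granularity.Word,   // word-level assessment
+            enableMiscue: true  // flag omitted and inserted words
+        );
+        pronunciationConfig.EnableProsodyAssessment(); // needed for ProsodyScore
+
+        // 3. Feed the uploaded audio into a push stream.
+        //    The default push-stream format is 16 kHz, 16-bit, mono PCM, so
+        //    WebM/Opus uploads must be transcoded before this step.
+        using var pushStream = AudioInputStream.CreatePushStream();
+        using var audioConfig = AudioConfig.FromStreamInput(pushStream);
+        var buffer = new byte[4096];
+        int read;
+        while ((read = await audioStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
+            pushStream.Write(buffer, read);
+        pushStream.Close();
+
+        // 4. Create the recognizer and apply the assessment config
+        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
+        pronunciationConfig.ApplyTo(recognizer);
+
+        // 5. Recognize a single utterance
+        var result = await recognizer.RecognizeOnceAsync();
+
+        // 6. Parse the assessment result
+        var pronunciationResult = PronunciationAssessmentResult.FromResult(result);
+
+        // 7. Convert to the system format
+        return new PronunciationResult
+        {
+            OverallScore = pronunciationResult.PronScore, // Azure's aggregate score
+            AccuracyScore = pronunciationResult.AccuracyScore,
+            FluencyScore = pronunciationResult.FluencyScore,
+            CompletenessScore = pronunciationResult.CompletenessScore,
+            ProsodyScore = pronunciationResult.ProsodyScore,
+            TranscribedText = result.Text,
+            ProcessingTime = stopwatch.ElapsedMilliseconds
+        };
+    }
+}
+```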
+
+---
+
+## 🌍 Environment Configuration Spec
+
+### appsettings.json
+```json
+{
+  "AzureSpeech": {
+    "SubscriptionKey": "${AZURE_SPEECH_KEY}",
+    "Region": "eastus",
+    "Language": "en-US",
+    "EnableDetailedResult": true,
+    "TimeoutSeconds": 30,
+    "MaxAudioSizeMB": 10,
+    "SupportedFormats": ["audio/wav", "audio/webm", "audio/mp3"]
+  }
+}
+```
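
The `MaxAudioSizeMB` and `SupportedFormats` settings can also be enforced on the client before upload. The sketch below assumes those two values are mirrored as constants in the front end; `baseMimeType` and `isUploadAllowed` are hypothetical helpers. It strips MIME parameters first, because `MediaRecorder` reports types such as `audio/webm;codecs=opus` that would not match the configured list verbatim.

```typescript
// Values mirrored from the AzureSpeech section above (assumed to be duplicated client-side).
const MAX_AUDIO_SIZE_MB = 10
const SUPPORTED_FORMATS = ['audio/wav', 'audio/webm', 'audio/mp3']

// Drops parameters such as ";codecs=opus" and normalizes case.
function baseMimeType(mimeType: string): string {
  return mimeType.split(';')[0].trim().toLowerCase()
}

// Hypothetical pre-upload check: non-empty, within the size limit, and in a supported format.
function isUploadAllowed(mimeType: string, sizeBytes: number): boolean {
  const maxBytes = MAX_AUDIO_SIZE_MB * 1024 * 1024
  if (sizeBytes <= 0 || sizeBytes > maxBytes) return false
  return SUPPORTED_FORMATS.indexOf(baseMimeType(mimeType)) !== -1
}
```

Browsers usually report MP3 files as `audio/mpeg`, the registered media type, so the configured list may need that entry in addition to `audio/mp3`.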
+
+### Environment Variables
+```bash
+# Development
+AZURE_SPEECH_KEY=your_azure_speech_key_here
+AZURE_SPEECH_REGION=eastus
+
+# Production (uses Azure Key Vault)
+AZURE_SPEECH_KEY_VAULT_URL=https://dramaling-vault.vault.azure.net/
+```
+
+---
+
+## 📱 Review System Integration
+
+### 1. Quiz Type Extension
+
+**Update location**: `hooks/review/useReviewSession.ts`
+
+```typescript
+// Type definition update
+interface QuizItem {
+  quizType: 'flip-card' | 'vocab-choice' | 'sentence-speaking'
+}
+
+// Generation logic extension (Line 110-132)
+quizItems.push(
+  // existing quiz types...
+  {
+    id: `${card.id}-sentence-speaking`,
+    cardId: card.id,
+    cardData: cardState,
+    quizType: 'sentence-speaking',
+    order: order++,
+    isCompleted: false,
+    wrongCount: 0,
+    skipCount: 0
+  }
+)
+```
+
+### 2. Rendering Logic Extension
+
+**Update location**: `app/review/page.tsx` (Line 332-350)
+
+```typescript
+// Add a new conditional render
+{currentQuizItem.quizType === 'sentence-speaking' && (
+  <SentenceSpeakingQuiz
+    card={currentCard}
+    onAnswer={handleAnswer}
+    onSkip={handleSkip}
+  />
+)}
+```
+
+---
+
+## 🎨 User Interface Design
+
+### Recording State UI
+
+#### **Before Recording**
+```html
+<button class="bg-red-500 hover:bg-red-600">
+  🎤 Start recording
+</button>
+<p class="text-gray-600">Click to start recording the example sentence</p>
+```
+
+#### **Recording**
+```html
+<button class="bg-red-600 animate-pulse">
+  ⏹️ Stop recording
+</button>
+<div class="flex items-center gap-2">
+  <div class="w-2 h-2 bg-red-500 rounded-full animate-ping"></div>
+  <span>Recording... {recordingTime}s</span>
+</div>
+```
+
+#### **Processing**
+```html
+<div class="animate-spin rounded-full h-8 w-8 border-b-2 border-blue-600"></div>
+<p>AI is assessing your pronunciation... (about 2-3 seconds)</p>
+```
+
+#### **Result Display**
+```html
+<div class="bg-blue-50 border border-blue-200 rounded-lg p-6">
+  <h4 class="font-semibold text-blue-900 mb-3">Pronunciation Assessment Result</h4>
+
+  <!-- Overall score -->
+  <div class="flex items-center gap-3 mb-4">
+    <div class="text-3xl font-bold text-blue-600">{overallScore}</div>
+    <div class="text-gray-600">Overall score (out of 100)</div>
+  </div>
+
+  <!-- Detailed scores -->
+  <div class="grid grid-cols-2 gap-3 mb-4">
+    <div class="bg-white p-3 rounded border">
+      <div class="text-sm text-gray-600">Accuracy</div>
+      <div class="font-semibold text-lg">{accuracyScore}</div>
+    </div>
+    <!-- Other score items... -->
+  </div>
+
+  <!-- Speech-to-text result -->
+  <div class="bg-gray-50 p-3 rounded border mb-4">
+    <div class="text-sm text-gray-600 mb-1">Recognized text</div>
+    <div class="font-mono text-sm">{transcribedText}</div>
+  </div>
+
+  <!-- Suggestions for improvement -->
+  <div class="space-y-1">
+    {feedback.map(item => (
+      <div class="text-sm text-blue-700">• {item}</div>
+    ))}
+  </div>
+</div>
+```
+
+---
+
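+## 🔄 Data Flow Design
+
+### End-to-End Flow
+
+```mermaid
+graph TD
+    A[User clicks record] --> B[Front end starts recording]
+    B --> C[User finishes and clicks stop]
+    C --> D[Front end builds audio Blob]
+    D --> E[Upload to back-end API]
+    E --> F[Back end receives audio file]
+    F --> G[Call Azure Speech Services]
+    G --> H[Azure returns assessment result]
+    H --> I[Save to database]
+    I --> J[Return scores to front end]
+    J --> K[Front end shows result]
+    K --> L[Map to confidence level]
+    L --> M[Update review progress]
+```
+
+### Error Handling Flow
+
+```mermaid
+graph TD
+    A[API request] --> B{Validate audio}
+    B -->|Fail| C[Return validation error]
+    B -->|Pass| D[Call Azure API]
+    D -->|Success| E[Process result]
+    D -->|Failure| F{Error type}
+    F -->|Network| G[Return retry prompt]
+    F -->|Quota| H[Return quota error]
+    F -->|Other| I[Return generic error]
+```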
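
On the client, the branches of the error flow above translate into a decision about what to do next. The sketch below is illustrative only: the `FailureKind` and `ClientAction` names, the three-attempt cap, and the exponential backoff parameters are assumptions, not values from this spec.

```typescript
// Hypothetical categories matching the branches of the error-handling flow.
type FailureKind = 'validation' | 'network' | 'quota' | 'other'
type ClientAction = 'fix-input' | 'retry' | 'show-quota-message' | 'show-generic-error'

// Maps each failure branch to the next step in the UI.
function actionFor(kind: FailureKind): ClientAction {
  switch (kind) {
    case 'validation':
      return 'fix-input'
    case 'network':
      return 'retry'
    case 'quota':
      return 'show-quota-message'
    case 'other':
      return 'show-generic-error'
  }
}

// Only network failures are retried, and only up to maxAttempts (assumed to be 3).
function shouldRetry(kind: FailureKind, attempt: number, maxAttempts = 3): boolean {
  return actionFor(kind) === 'retry' && attempt < maxAttempts
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs (assumed values).
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt))
}
```

Quota errors are deliberately not retried, since repeating the request would only consume more of the limit that was already exceeded.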
+
+---
+
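+## 🚀 Implementation Phases
+
+### Phase 1: Core Infrastructure
+1. ✅ Back-end Azure Speech Services integration
+2. ✅ Basic API endpoint implementation
+3. ✅ Database schema update
+4. ✅ Environment configuration
+
+### Phase 2: Front-End Integration
+1. ✅ AudioRecorder shared component
+2. ✅ SentenceSpeakingQuiz component refactor
+3. ✅ API service client implementation
+4. ✅ Review system integration
+
+### Phase 3: Optimization and Testing
+1. ✅ Recording quality optimization
+2. ✅ Scoring accuracy tuning
+3. ✅ Error handling hardening
+4. ✅ Performance and stability testing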
+
+---
+
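+## 🔧 Development Tools and Configuration
+
+### Development Environment Requirements
+- **Azure Speech Services account** (free tier: 5,000 requests per month)
+- **Audio test environment** (a development device with a microphone)
+- **HTTPS environment** (`getUserMedia` requires a secure context)
+
+### Testing Strategy
+- **Unit tests**: mock the Azure service
+- **Integration tests**: end-to-end audio flow
+- **Load tests**: concurrent request handling
+- **User tests**: assessment accuracy on real speech
+
+### Deployment Considerations
+- **Temporary audio files**: delete immediately after processing
+- **Azure quota management**: monitor usage to stay within limits
+- **CDN configuration**: static asset optimization
+- **Load balancing**: handle highly concurrent recording requests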
+
+---
+
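+## 📈 Performance Metrics and Monitoring
+
+### Key Metrics
+- **Assessment latency**: target < 3 seconds
+- **Accuracy**: > 85% agreement with human assessment
+- **Success rate**: API request success rate > 99%
+- **User satisfaction**: track pronunciation improvement over time
+
+### Monitored Items
+- Azure API request count and latency
+- Audio file size distribution
+- Score distribution statistics
+- Error type statistics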
+
+---
+
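+## 💰 Cost Estimate
+
+### Azure Speech Services Pricing (2024)
+- **Free tier**: 5,000 requests per month
+- **Standard tier**: $1 USD per 1,000 requests
+- **Estimated usage**: 100 users × 10 requests/day × 30 days = 30,000 requests/month
+- **Monthly cost**: ~$25 USD (only the 25,000 requests beyond the free tier are billed)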
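
The estimate can be reproduced with a few lines of arithmetic. The sketch below takes the per-request figures listed above as given (5,000 free requests, $1 USD per 1,000 requests) and assumes a 30-day month; `estimateMonthlyCostUsd` is a hypothetical helper for re-running the numbers with a different user count.

```typescript
// Figures as stated in this spec; adjust if the actual pricing model differs.
const FREE_REQUESTS_PER_MONTH = 5000
const USD_PER_THOUSAND_REQUESTS = 1

// Hypothetical helper: cost of the requests that exceed the free allowance.
function estimateMonthlyCostUsd(
  users: number,
  requestsPerUserPerDay: number,
  daysPerMonth = 30 // assumed month length
): number {
  const totalRequests = users * requestsPerUserPerDay * daysPerMonth
  const billableRequests = Math.max(0, totalRequests - FREE_REQUESTS_PER_MONTH)
  return (billableRequests / 1000) * USD_PER_THOUSAND_REQUESTS
}

// 100 users x 10 requests/day x 30 days = 30,000 requests, of which 25,000 are billable.
const projectedCostUsd = estimateMonthlyCostUsd(100, 10) // 25
```

Usage at or below the free allowance costs nothing under this model, which is why per-user daily limits have a direct effect on the bill.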
+
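+### Suggested Cost Controls
+- Cache results to avoid re-assessing identical requests
+- Set a daily usage limit per user
+- Monitor for abnormal usage patterns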
+
+---
+
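+## 🔐 Security Spec
+
+### Audio Data Protection
+- **Encryption in transit**: HTTPS/TLS 1.3
+- **Temporary file cleanup**: delete audio files immediately after processing
+- **Access control**: users can only have their own recordings assessed
+
+### API Security
+- **Rate limiting**: at most 10 requests per user per minute
+- **File validation**: check audio format and content
+- **Input sanitization**: prevent injection attacks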
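
The per-user rate limit can be illustrated with a sliding-window counter. The sketch below is written in TypeScript for consistency with the front-end examples in this document and keeps state in memory; the production back end is ASP.NET Core and would more likely use its own middleware or a shared store. `SlidingWindowRateLimiter` is a hypothetical class, and the current time is passed in so the behavior is deterministic.

```typescript
// In-memory sliding-window limiter: at most `limit` requests per `windowMs` per user.
class SlidingWindowRateLimiter {
  private readonly hits = new Map<string, number[]>()

  constructor(
    private readonly limit = 10,       // 10 requests, as stated above
    private readonly windowMs = 60000  // per minute
  ) {}

  // Returns true and records the request if the user is under the limit.
  tryAcquire(userId: string, nowMs: number): boolean {
    const previous = this.hits.get(userId) ?? []
    // Keep only the timestamps that still fall inside the window.
    const recent = previous.filter((t) => nowMs - t < this.windowMs)
    const allowed = recent.length < this.limit
    if (allowed) recent.push(nowMs)
    this.hits.set(userId, recent)
    return allowed
  }
}
```

A rejected request is not recorded, so a client that keeps retrying while blocked does not extend its own lockout.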
+
+---
+
+## 📚 Technical References
+
+### Official Microsoft Documentation
+- [Azure Speech Services Pronunciation Assessment](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-pronunciation-assessment)
+- [Speech SDK for C#](https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech)
+- [Interactive Language Learning Tutorial](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-learning-with-pronunciation-assessment)
+
+### Sample Implementations
+- [GitHub Azure Speech Samples](https://github.com/Azure-Samples/cognitive-services-speech-sdk)
+- [Pronunciation Assessment Samples](https://github.com/Azure-Samples/azure-ai-speech/tree/main/pronunciation-assessment)
+
+---
+
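+## ✅ Acceptance Criteria
+
+### Functional Acceptance
+1. ✅ Users can record 1-30 seconds of audio
+2. ✅ The back end assesses pronunciation and returns multi-dimensional scores
+3. ✅ The front end clearly shows the scores and suggestions for improvement
+4. ✅ Scores map correctly to the review system's confidence levels
+
+### Performance Acceptance
+1. ✅ Audio processing latency < 5 seconds
+2. ✅ API response time < 10 seconds (including network latency)
+3. ✅ The system handles concurrent recording requests
+4. ✅ No memory leaks or accumulated audio files
+
+### User Experience Acceptance
+1. ✅ The recording process is intuitive
+2. ✅ Scores are meaningful and constructive
+3. ✅ Error messages are clear and helpful
+4. ✅ The feature integrates seamlessly with the existing review flow
+
+This spec adds speaking practice to DramaLing to improve learners' pronunciation and practical language skills.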