# Example Sentence Speaking Practice Integration Specification
## 📋 Overview
This document lays out the complete technical specification for the new "example sentence speaking practice" feature in the DramaLing vocabulary learning system, covering the frontend components, backend API, Microsoft Azure Speech Services integration, and overall system architecture.
---
## 🎯 Feature Goals
### Learning Value
- **Active practice**: move from passive recognition to active spoken output
- **Pronunciation correction**: use AI to assess pronunciation accuracy and fluency
- **Contextual usage**: practice the target word inside a complete example sentence
### User Experience
- **Visual guidance**: show the example sentence image to support comprehension of the context
- **Instant feedback**: provide pronunciation scores and concrete improvement suggestions
- **Seamless integration**: plug into the existing review system without disrupting its flow
---
## 🖥️ Frontend Specification
### Existing Component Analysis
**File location**: `note/archive/components/review/review-tests/SentenceSpeakingTest.tsx`
**Component structure**:
```typescript
interface SentenceSpeakingTestProps extends BaseReviewProps {
  exampleImage?: string
  onImageClick?: (image: string) => void
}

// Core features of the existing component:
// - Displays the example sentence image
// - Record button (🎤 start recording)
// - Shows the target example sentence
// - Result feedback area
```
### Frontend Upgrade Requirements
#### 1. **Recording implementation**
```typescript
// State to add
interface AudioRecordingState {
  isRecording: boolean
  audioBlob: Blob | null
  recordingTime: number
  isProcessing: boolean
}

// Recording via the MediaStream Recording API (MediaRecorder)
const startRecording = async (): Promise<MediaRecorder> => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  const mediaRecorder = new MediaRecorder(stream)
  const chunks: Blob[] = []
  mediaRecorder.ondataavailable = (event) => chunks.push(event.data)
  mediaRecorder.onstop = () => {
    // Combine the recorded chunks into a single Blob for upload
    const audioBlob = new Blob(chunks, { type: mediaRecorder.mimeType })
    // ...hand audioBlob to the upload / assessment step
  }
  mediaRecorder.start()
  return mediaRecorder
}
```
#### 2. **Score result display**
```typescript
interface PronunciationResult {
  overallScore: number       // overall score (0-100)
  accuracyScore: number      // accuracy
  fluencyScore: number       // fluency
  completenessScore: number  // completeness
  prosodyScore: number       // prosody (intonation / rhythm)
  feedback: string[]         // improvement suggestions
  transcribedText: string    // speech-to-text result
}
```
#### 3. **UI interaction flow**
1. Show the example sentence image and the target sentence
2. User taps the record button → recording starts (show a recording animation)
3. User taps again → recording stops → the audio is uploaded
4. Show a loading animation → show the score results
5. Derive the confidence level automatically from the score (a sketch of this flow follows below)
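
As a rough illustration of the flow above, here is a minimal sketch of the record → upload → score handling inside the speaking quiz. It assumes the `AudioRecorder` helper, `speechAssessmentService` client, and `mapAzureScoreToConfidence` helper described later in this document, plus an `onAnswer(confidence)` callback from the review system; the hook name, import paths, and wiring are illustrative, not final.

```typescript
// Hypothetical wiring inside the speaking quiz (names and paths are illustrative)
import { useRef, useState } from 'react'
import { AudioRecorder } from '@/components/shared/AudioRecorder'          // shared recorder from this spec
import { speechAssessmentService } from '@/lib/services/speechAssessment'  // API client from this spec
import type { PronunciationResult } from '@/lib/types/speech'              // assumed type location
import { mapAzureScoreToConfidence } from '@/lib/review/scoreMapping'      // assumed helper location

export function useSentenceSpeaking(
  referenceText: string,
  flashcardId: string,
  onAnswer: (confidence: number) => void
) {
  const recorderRef = useRef<AudioRecorder | null>(null)
  const [isRecording, setIsRecording] = useState(false)
  const [isProcessing, setIsProcessing] = useState(false)
  const [result, setResult] = useState<PronunciationResult | null>(null)

  // A single button drives steps 2-5 of the flow above
  const toggleRecording = async () => {
    if (!isRecording) {
      // Step 2: start recording
      recorderRef.current = new AudioRecorder()
      await recorderRef.current.startRecording()
      setIsRecording(true)
      return
    }
    // Step 3: stop recording and upload the audio
    setIsRecording(false)
    setIsProcessing(true)
    const audioBlob = await recorderRef.current!.stopRecording()
    // Step 4: wait for the assessment and show the result
    const assessment = await speechAssessmentService.evaluatePronunciation(
      audioBlob, referenceText, flashcardId)
    setResult(assessment)
    setIsProcessing(false)
    // Step 5: derive the confidence level from the overall score
    onAnswer(mapAzureScoreToConfidence(assessment.overallScore))
  }

  return { isRecording, isProcessing, result, toggleRecording }
}
```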
---
## 🔧 Backend Specification
### Microsoft Azure Speech Services Integration
#### 1. **NuGet package requirement**
```xml
<PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.38.0" />
```
#### 2. **Configuration management**
```csharp
public class AzureSpeechOptions
{
    public const string SectionName = "AzureSpeech";
    public string SubscriptionKey { get; set; } = string.Empty;
    public string Region { get; set; } = "eastus";
    public string Language { get; set; } = "en-US";
    public bool EnableDetailedResult { get; set; } = true;
    public int TimeoutSeconds { get; set; } = 30;
}
```
#### 3. **Core service implementation**
```csharp
public interface IPronunciationAssessmentService
{
    Task<PronunciationResult> EvaluatePronunciationAsync(
        Stream audioStream,
        string referenceText,
        string language = "en-US"
    );
}

public class AzurePronunciationAssessmentService : IPronunciationAssessmentService
{
    // Azure Speech Services integration (outline; the full version appears below)
    public async Task<PronunciationResult> EvaluatePronunciationAsync(...)
    {
        // 1. Configure the Speech SDK
        var config = SpeechConfig.FromSubscription(apiKey, region);
        // 2. Set up the pronunciation assessment parameters
        var pronunciationConfig = PronunciationAssessmentConfig.Create(
            referenceText,
            GradingSystem.HundredMark,
            Granularity.Phoneme
        );
        // 3. Process the audio stream and obtain the assessment result
        // 4. Convert it into the shared PronunciationResult format
    }
}
```
---
## 🌐 API Design Specification
### Endpoint Design
#### **POST `/api/speech/pronunciation-assessment`**
**Request format**:
```http
Content-Type: multipart/form-data
audio: [audio file] (WAV/MP3, max 10 MB)
referenceText: "He overstepped the boundaries of acceptable behavior."
flashcardId: "b2bb23b8-16dd-44b2-bf64-34c468f2d362"
language: "en-US" (optional, defaults to en-US)
```
**Response format**:
```json
{
  "success": true,
  "data": {
    "assessmentId": "uuid-here",
    "flashcardId": "b2bb23b8-16dd-44b2-bf64-34c468f2d362",
    "referenceText": "He overstepped the boundaries...",
    "transcribedText": "He overstep the boundary of acceptable behavior",
    "scores": {
      "overall": 85,
      "accuracy": 82,
      "fluency": 88,
      "completeness": 90,
      "prosody": 80
    },
    "wordLevelResults": [
      {
        "word": "overstepped",
        "accuracy": 75,
        "errorType": "Mispronunciation"
      }
    ],
    "feedback": [
      "Overall pronunciation is good",
      "Watch the stress placement in 'overstepped'",
      "Pace is appropriate and intonation sounds natural"
    ],
    "confidenceLevel": 2,
    "processingTime": "1.2s"
  }
}
```
### Error Handling
**Typical error response**:
```json
{
  "success": false,
  "error": "AUDIO_TOO_SHORT",
  "message": "Recording is too short; please record at least 1 second",
  "details": {
    "minDuration": 1000,
    "actualDuration": 500
  }
}
```
**Error type definitions** (a client-side handling sketch follows below):
- `AUDIO_TOO_SHORT` - recording is too short
- `AUDIO_TOO_LONG` - recording is too long (> 30 seconds)
- `INVALID_AUDIO_FORMAT` - unsupported audio format
- `SPEECH_SERVICE_ERROR` - Azure service error
- `NO_SPEECH_DETECTED` - no speech detected
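
To keep the frontend aligned with these codes, a small sketch of how the client might model and surface them is shown below. The `SpeechAssessmentError` shape mirrors the example error response above; the user-facing hint strings are illustrative assumptions, not part of the API contract.

```typescript
// Error codes returned by /api/speech/pronunciation-assessment
export type SpeechAssessmentErrorCode =
  | 'AUDIO_TOO_SHORT'
  | 'AUDIO_TOO_LONG'
  | 'INVALID_AUDIO_FORMAT'
  | 'SPEECH_SERVICE_ERROR'
  | 'NO_SPEECH_DETECTED'

// Assumed shape of the error body (mirrors the example response above)
export interface SpeechAssessmentError {
  success: false
  error: SpeechAssessmentErrorCode
  message: string
  details?: Record<string, unknown>
}

// Map each code to a user-facing hint (wording is illustrative)
const ERROR_HINTS: Record<SpeechAssessmentErrorCode, string> = {
  AUDIO_TOO_SHORT: 'The recording is too short. Please speak for at least 1 second.',
  AUDIO_TOO_LONG: 'The recording is too long. Please keep it under 30 seconds.',
  INVALID_AUDIO_FORMAT: 'This audio format is not supported. Please try again.',
  SPEECH_SERVICE_ERROR: 'The assessment service is temporarily unavailable. Please retry.',
  NO_SPEECH_DETECTED: 'No speech was detected. Check your microphone and try again.',
}

export function toUserMessage(error: SpeechAssessmentError): string {
  return ERROR_HINTS[error.error] ?? error.message
}
```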
---
## 📊 Database Design
### New Assessment Record Table
```sql
CREATE TABLE PronunciationAssessments (
    Id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),
    UserId UNIQUEIDENTIFIER NOT NULL,
    FlashcardId UNIQUEIDENTIFIER NOT NULL,
    ReferenceText NVARCHAR(500) NOT NULL,
    TranscribedText NVARCHAR(500),
    -- Score data
    OverallScore DECIMAL(5,2),
    AccuracyScore DECIMAL(5,2),
    FluencyScore DECIMAL(5,2),
    CompletenessScore DECIMAL(5,2),
    ProsodyScore DECIMAL(5,2),
    -- Metadata
    AudioDuration DECIMAL(8,3),
    ProcessingTime DECIMAL(8,3),
    AzureRequestId NVARCHAR(100),
    CreatedAt DATETIME2 DEFAULT GETUTCDATE(),
    -- Foreign key constraints
    FOREIGN KEY (UserId) REFERENCES Users(Id),
    FOREIGN KEY (FlashcardId) REFERENCES Flashcards(Id)
);

-- Index optimization
CREATE INDEX IX_PronunciationAssessments_UserId_CreatedAt
    ON PronunciationAssessments(UserId, CreatedAt DESC);
CREATE INDEX IX_PronunciationAssessments_FlashcardId
    ON PronunciationAssessments(FlashcardId);
```
---
## 🔄 系統整合規格
### 1. 複習系統擴展
#### **quizType 擴展**
```typescript
// hooks/review/useReviewSession.ts
interface QuizItem {
quizType: 'flip-card' | 'vocab-choice' | 'sentence-speaking'
// ... 其他屬性保持不變
}
```
#### **題目生成邏輯更新**
```typescript
// 在 generateQuizItemsFromFlashcards 中添加
quizItems.push(
// 現有的 flip-card 和 vocab-choice...
{
id: `${card.id}-sentence-speaking`,
cardId: card.id,
cardData: cardState,
quizType: 'sentence-speaking',
order: order++,
isCompleted: false,
wrongCount: 0,
skipCount: 0
}
)
```
### 2. Score Mapping Logic
**Azure score → system confidence level**:
```typescript
const mapAzureScoreToConfidence = (overallScore: number): number => {
  if (overallScore >= 85) return 2 // excellent (high confidence)
  if (overallScore >= 70) return 1 // good (medium confidence)
  return 0                         // needs improvement (low confidence)
}
```
---
## ⚙️ Technical Implementation Specification
### Frontend Implementation
#### 1. **Audio recording implementation**
```typescript
// components/shared/AudioRecorder.tsx (new shared component)
export class AudioRecorder {
  private mediaRecorder: MediaRecorder | null = null
  private audioChunks: Blob[] = []

  async startRecording(): Promise<void> {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        echoCancellation: true,
        noiseSuppression: true,
        sampleRate: 16000 // sample rate recommended by Azure
      }
    })
    this.mediaRecorder = new MediaRecorder(stream, {
      mimeType: 'audio/webm;codecs=opus' // supported by modern browsers
    })
    this.audioChunks = []
    this.mediaRecorder.ondataavailable = (event) => this.audioChunks.push(event.data)
    this.mediaRecorder.start()
  }

  stopRecording(): Promise<Blob> {
    // Stop recording, release the microphone, and return the combined audio Blob
    return new Promise((resolve, reject) => {
      if (!this.mediaRecorder) {
        reject(new Error('Recording has not been started'))
        return
      }
      this.mediaRecorder.onstop = () => {
        this.mediaRecorder?.stream.getTracks().forEach((track) => track.stop())
        resolve(new Blob(this.audioChunks, { type: 'audio/webm' }))
      }
      this.mediaRecorder.stop()
    })
  }
}
```
#### 2. **API client**
```typescript
// lib/services/speechAssessment.ts
export const speechAssessmentService = {
  async evaluatePronunciation(
    audioBlob: Blob,
    referenceText: string,
    flashcardId: string
  ): Promise<PronunciationResult> {
    const formData = new FormData()
    formData.append('audio', audioBlob, 'recording.webm')
    formData.append('referenceText', referenceText)
    formData.append('flashcardId', flashcardId)

    const response = await fetch('/api/speech/pronunciation-assessment', {
      method: 'POST',
      body: formData
    })
    if (!response.ok) {
      throw new Error(`Pronunciation assessment failed: ${response.status}`)
    }
    return response.json()
  }
}
```
### Backend Implementation
#### 1. **Controller implementation**
```csharp
[ApiController]
[Route("api/speech")]
public class SpeechController : BaseController
{
    private readonly IPronunciationAssessmentService _assessmentService;

    public SpeechController(IPronunciationAssessmentService assessmentService)
    {
        _assessmentService = assessmentService;
    }

    [HttpPost("pronunciation-assessment")]
    public async Task<IActionResult> EvaluatePronunciation(
        [FromForm] IFormFile audio,
        [FromForm] string referenceText,
        [FromForm] string flashcardId,
        [FromForm] string language = "en-US")
    {
        // 1. Validate the request
        if (audio == null || audio.Length == 0)
            return BadRequest("Audio file must not be empty");
        if (audio.Length > 10 * 1024 * 1024) // 10 MB limit
            return BadRequest("Audio file is too large");

        // 2. Open the audio stream
        using var audioStream = audio.OpenReadStream();

        // 3. Call Azure Speech Services
        var result = await _assessmentService.EvaluatePronunciationAsync(
            audioStream, referenceText, language);

        // 4. Persist the assessment record to the database
        // 5. Return the result
        return Ok(result);
    }
}
```
#### 2. **Azure Speech Services integration**
```csharp
public class AzurePronunciationAssessmentService : IPronunciationAssessmentService
{
    private readonly AzureSpeechOptions _options;

    public AzurePronunciationAssessmentService(IOptions<AzureSpeechOptions> options)
    {
        _options = options.Value;
    }

    public async Task<PronunciationResult> EvaluatePronunciationAsync(
        Stream audioStream, string referenceText, string language)
    {
        var stopwatch = Stopwatch.StartNew();

        // 1. Set up the Azure Speech config
        var speechConfig = SpeechConfig.FromSubscription(
            _options.SubscriptionKey,
            _options.Region
        );
        speechConfig.SpeechRecognitionLanguage = language;

        // 2. Set up the pronunciation assessment parameters
        var pronunciationConfig = PronunciationAssessmentConfig.Create(
            referenceText,
            GradingSystem.HundredMark,
            Granularity.Word,    // word-level assessment
            enableMiscue: true   // enable miscue (insertion/omission) detection
        );

        // 3. Audio input: push the uploaded audio into the SDK
        //    (the push stream expects PCM/WAV by default, so WebM/Opus uploads
        //     need to be converted before this point)
        var pushStream = AudioInputStream.CreatePushStream();
        var buffer = new byte[4096];
        int bytesRead;
        while ((bytesRead = await audioStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            pushStream.Write(buffer, bytesRead);
        }
        pushStream.Close();
        using var audioConfig = AudioConfig.FromStreamInput(pushStream);

        // 4. Create the speech recognizer
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
        pronunciationConfig.ApplyTo(recognizer);

        // 5. Process the audio and obtain the recognition result
        var result = await recognizer.RecognizeOnceAsync();

        // 6. Parse the pronunciation assessment result
        var pronunciationResult = PronunciationAssessmentResult.FromResult(result);

        // 7. Convert to the shared result format
        return new PronunciationResult
        {
            OverallScore = pronunciationResult.PronunciationScore, // overall score, not accuracy
            AccuracyScore = pronunciationResult.AccuracyScore,
            FluencyScore = pronunciationResult.FluencyScore,
            CompletenessScore = pronunciationResult.CompletenessScore,
            ProsodyScore = pronunciationResult.ProsodyScore,
            TranscribedText = result.Text,
            ProcessingTime = stopwatch.ElapsedMilliseconds
        };
    }
}
```
---
## 🌍 Environment Configuration Specification
### appsettings.json Configuration
```json
{
  "AzureSpeech": {
    "SubscriptionKey": "${AZURE_SPEECH_KEY}",
    "Region": "eastus",
    "Language": "en-US",
    "EnableDetailedResult": true,
    "TimeoutSeconds": 30,
    "MaxAudioSizeMB": 10,
    "SupportedFormats": ["audio/wav", "audio/webm", "audio/mp3"]
  }
}
```
### Environment Variables
```bash
# Development
AZURE_SPEECH_KEY=your_azure_speech_key_here
AZURE_SPEECH_REGION=eastus

# Production (via Azure Key Vault)
AZURE_SPEECH_KEY_VAULT_URL=https://dramaling-vault.vault.azure.net/
```
---
## 📱 Review System Integration
### 1. Quiz Type Extension
**Update location**: `hooks/review/useReviewSession.ts`
```typescript
// Type definition update
interface QuizItem {
  quizType: 'flip-card' | 'vocab-choice' | 'sentence-speaking'
}

// Generation logic extension (Line 110-132)
quizItems.push(
  // existing quiz types...
  {
    id: `${card.id}-sentence-speaking`,
    cardId: card.id,
    cardData: cardState,
    quizType: 'sentence-speaking',
    order: order++,
    isCompleted: false,
    wrongCount: 0,
    skipCount: 0
  }
)
```
### 2. Rendering Logic Extension
**Update location**: `app/review/page.tsx` (Line 332-350)
```typescript
// Add the new conditional rendering branch
{currentQuizItem.quizType === 'sentence-speaking' && (
  <SentenceSpeakingQuiz
    card={currentCard}
    onAnswer={handleAnswer}
    onSkip={handleSkip}
  />
)}
```
---
## 🎨 User Interface Design
### Recording State UI
#### **Before recording**
```html
<button class="bg-red-500 hover:bg-red-600">
  🎤 Start Recording
</button>
<p class="text-gray-600">Tap to start recording the example sentence</p>
```
#### **While recording**
```html
<button class="bg-red-600 animate-pulse">
  ⏹️ Stop Recording
</button>
<div class="flex items-center gap-2">
  <div class="w-2 h-2 bg-red-500 rounded-full animate-ping"></div>
  <span>Recording... {recordingTime}s</span>
</div>
```
#### **Processing**
```html
<div class="animate-spin rounded-full h-8 w-8 border-b-2 border-blue-600"></div>
<p>AI is assessing your pronunciation... (about 2-3 seconds)</p>
```
#### **Result display**
```html
<div class="bg-blue-50 border border-blue-200 rounded-lg p-6">
  <h4 class="font-semibold text-blue-900 mb-3">Pronunciation Assessment Result</h4>
  <!-- Overall score -->
  <div class="flex items-center gap-3 mb-4">
    <div class="text-3xl font-bold text-blue-600">{overallScore}</div>
    <div class="text-gray-600">Overall score (out of 100)</div>
  </div>
  <!-- Detailed scores -->
  <div class="grid grid-cols-2 gap-3 mb-4">
    <div class="bg-white p-3 rounded border">
      <div class="text-sm text-gray-600">Accuracy</div>
      <div class="font-semibold text-lg">{accuracyScore}</div>
    </div>
    <!-- other score items... -->
  </div>
  <!-- Speech-to-text result -->
  <div class="bg-gray-50 p-3 rounded border mb-4">
    <div class="text-sm text-gray-600 mb-1">Recognized text</div>
    <div class="font-mono text-sm">{transcribedText}</div>
  </div>
  <!-- Improvement suggestions -->
  <div class="space-y-1">
    {feedback.map(item => (
      <div class="text-sm text-blue-700">• {item}</div>
    ))}
  </div>
</div>
```
---
## 🔄 Data Flow Design
### End-to-End Flow
```mermaid
graph TD
    A[User taps record] --> B[Frontend starts recording]
    B --> C[User finishes speaking and taps stop]
    C --> D[Frontend builds the audio Blob]
    D --> E[Upload to the backend API]
    E --> F[Backend receives the audio file]
    F --> G[Call Azure Speech Services]
    G --> H[Azure returns the assessment result]
    H --> I[Persist to the database]
    I --> J[Return the scores to the frontend]
    J --> K[Frontend shows the result]
    K --> L[Map to a confidence level]
    L --> M[Update review progress]
```
### Error Handling Flow
```mermaid
graph TD
    A[API request] --> B{Validate audio}
    B -->|fail| C[Return validation error]
    B -->|pass| D[Call the Azure API]
    D -->|success| E[Process the result]
    D -->|failure| F{Error type}
    F -->|network| G[Return a retry hint]
    F -->|quota| H[Return a quota error]
    F -->|other| I[Return a generic error]
```
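
For the "network → retry hint" branch above, the client can retry transient failures a couple of times with a short backoff before surfacing an error. A minimal sketch follows; distinguishing retryable from non-retryable error codes is left out for brevity and would be a design decision during implementation.

```typescript
// Simple retry wrapper around the assessment call (sketch)
import { speechAssessmentService } from '@/lib/services/speechAssessment' // API client from this spec
import type { PronunciationResult } from '@/lib/types/speech'             // assumed type location

export async function evaluateWithRetry(
  audioBlob: Blob,
  referenceText: string,
  flashcardId: string,
  maxAttempts = 2
): Promise<PronunciationResult> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await speechAssessmentService.evaluatePronunciation(
        audioBlob, referenceText, flashcardId)
    } catch (error) {
      lastError = error
      // Back off briefly before the next attempt (500 ms, 1000 ms, ...)
      await new Promise((resolve) => setTimeout(resolve, 500 * attempt))
    }
  }
  throw lastError
}
```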
---
## 🚀 Implementation Phases
### Phase 1: Foundations
1. ✅ Backend Azure Speech Services integration
2. ✅ Basic API endpoint implementation
3. ✅ Database schema update
4. ✅ Environment configuration
### Phase 2: Frontend Integration
1. ✅ AudioRecorder shared component
2. ✅ SentenceSpeakingQuiz component refactor
3. ✅ API service client implementation
4. ✅ Review system integration
### Phase 3: Optimization and Testing
1. ✅ Recording quality tuning
2. ✅ Scoring accuracy calibration
3. ✅ Error handling hardening
4. ✅ Performance and stability testing
---
## 🔧 Development Tools and Configuration
### Development Environment Requirements
- **Azure Speech Services account** (free tier: 5,000 requests per month)
- **Audio test setup** (a development machine with a microphone)
- **HTTPS environment** (getUserMedia/MediaRecorder require a secure context)
### Testing Strategy
- **Unit tests**: mock the Azure service / assessment client (a frontend sketch follows below)
- **Integration tests**: end-to-end audio flow
- **Load tests**: concurrent request handling
- **User tests**: accuracy of real pronunciation assessments
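
On the backend, unit tests would mock `IPronunciationAssessmentService`; on the frontend, the equivalent is stubbing `speechAssessmentService`. Below is a minimal frontend-side sketch that stubs the client so the score-to-confidence logic can be tested without calling Azure. It assumes Vitest as the test runner; the file path, import locations, and fixture values are illustrative.

```typescript
// __tests__/speechAssessment.test.ts (hypothetical path, assuming Vitest)
import { describe, expect, it, vi } from 'vitest'
import { speechAssessmentService } from '@/lib/services/speechAssessment'
import { mapAzureScoreToConfidence } from '@/lib/review/scoreMapping' // assumed helper location

describe('pronunciation assessment (mocked client)', () => {
  it('maps a high overall score to the highest confidence level', async () => {
    // Stub the API client so no real network or Azure call happens
    vi.spyOn(speechAssessmentService, 'evaluatePronunciation').mockResolvedValue({
      overallScore: 92,
      accuracyScore: 90,
      fluencyScore: 94,
      completenessScore: 100,
      prosodyScore: 88,
      feedback: [],
      transcribedText: 'He overstepped the boundaries of acceptable behavior.',
    })

    const result = await speechAssessmentService.evaluatePronunciation(
      new Blob(), 'He overstepped the boundaries of acceptable behavior.', 'flashcard-id')

    expect(mapAzureScoreToConfidence(result.overallScore)).toBe(2)
  })
})
```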
### Deployment Considerations
- **Audio file staging**: delete audio immediately after processing
- **Azure quota management**: monitor usage to avoid exceeding limits
- **CDN configuration**: static asset optimization
- **Load balancing**: handle high concurrency in recording uploads
---
## 📈 Performance Metrics and Monitoring
### Key Metrics
- **Assessment latency**: target < 3 seconds
- **Accuracy**: > 85% agreement with human assessment
- **Success rate**: API request success rate > 99%
- **User satisfaction**: track pronunciation improvement over time
### Monitoring Items
- Azure API request counts and latency
- Audio file size distribution
- Score distribution statistics
- Error type statistics
---
## 💰 Cost Estimate
### Azure Speech Services Pricing (2024)
- **Free tier**: 5,000 requests per month
- **Standard tier**: US$1 per 1,000 requests
- **Estimated usage**: 100 users × 10 requests/day ≈ 30,000 requests/month
- **Monthly cost**: ~US$30 (for usage beyond the free tier)
### Recommended Cost Controls
- Cache requests to avoid re-assessing identical audio
- Set a per-user daily usage limit (a sketch follows below)
- Monitor for abnormal usage patterns
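
A minimal sketch of the per-user daily limit check. Enforcement ultimately belongs in the ASP.NET backend; the sketch is written in TypeScript only for consistency with the other examples in this document, and the storage interface and limit value are assumptions to be tuned against the cost model above.

```typescript
// Per-user daily quota check (sketch; the authoritative check lives server-side)
interface UsageStore {
  // how many assessments the user has run on the given UTC date
  getDailyCount(userId: string, dateKey: string): Promise<number>
  increment(userId: string, dateKey: string): Promise<void>
}

const DAILY_ASSESSMENT_LIMIT = 50 // assumed value; tune against the cost estimate above

export async function assertWithinDailyLimit(store: UsageStore, userId: string): Promise<void> {
  const dateKey = new Date().toISOString().slice(0, 10) // e.g. "2024-05-01"
  const used = await store.getDailyCount(userId, dateKey)
  if (used >= DAILY_ASSESSMENT_LIMIT) {
    throw new Error('DAILY_LIMIT_REACHED') // surface as a 429-style error to the client
  }
  await store.increment(userId, dateKey)
}
```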
---
## 🔐 Security Specification
### Audio Data Protection
- **Transport encryption**: HTTPS/TLS 1.3
- **Temporary file cleanup**: delete audio files as soon as processing completes
- **Access control**: users can only assess their own recordings
### API Security
- **Rate limiting**: at most 10 requests per user per minute
- **File validation**: check audio format and content (a client-side pre-check sketch follows below)
- **Input sanitization**: guard against injection attacks
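
The server remains the authority for these checks, but the client can pre-validate a recording before uploading it, avoiding wasted round trips and Azure quota. A minimal sketch, reusing the limits stated earlier in this document (1-30 s duration, 10 MB, WAV/WebM/MP3); the client-only error codes and the `audio/mpeg` alias are assumptions.

```typescript
// Client-side pre-validation before upload (sketch; the server re-validates everything)
export type RecordingPreCheckError =
  | 'AUDIO_TOO_SHORT'
  | 'AUDIO_TOO_LONG'
  | 'AUDIO_TOO_LARGE'       // client-only code; the server responds with its own error set
  | 'INVALID_AUDIO_FORMAT'

const MAX_AUDIO_BYTES = 10 * 1024 * 1024
// MP3 blobs commonly report 'audio/mpeg', so it is included alongside the configured formats
const ALLOWED_MIME_PREFIXES = ['audio/wav', 'audio/webm', 'audio/mp3', 'audio/mpeg']

export function preValidateRecording(
  audioBlob: Blob,
  durationSeconds: number
): RecordingPreCheckError | null {
  if (durationSeconds < 1) return 'AUDIO_TOO_SHORT'
  if (durationSeconds > 30) return 'AUDIO_TOO_LONG'
  if (audioBlob.size > MAX_AUDIO_BYTES) return 'AUDIO_TOO_LARGE'
  if (!ALLOWED_MIME_PREFIXES.some((prefix) => audioBlob.type.startsWith(prefix))) {
    return 'INVALID_AUDIO_FORMAT'
  }
  return null // looks fine: proceed with the upload
}
```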
---
## 📚 Technical References
### Official Microsoft Documentation
- [Azure Speech Services Pronunciation Assessment](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-pronunciation-assessment)
- [Speech SDK for C#](https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech)
- [Interactive Language Learning Tutorial](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-learning-with-pronunciation-assessment)
### Implementation Examples
- [GitHub Azure Speech Samples](https://github.com/Azure-Samples/cognitive-services-speech-sdk)
- [Pronunciation Assessment Samples](https://github.com/Azure-Samples/azure-ai-speech/tree/main/pronunciation-assessment)
---
## ✅ Acceptance Criteria
### Functional
1. ✅ Users can successfully record 1-30 seconds of audio
2. ✅ The backend assesses pronunciation accurately and returns multi-dimensional scores
3. ✅ The frontend clearly displays the scores and improvement suggestions
4. ✅ Scores map correctly to the review system's confidence levels
### Performance
1. ✅ Audio processing latency < 5 seconds
2. API response time < 10 seconds (including network latency)
3. The system handles concurrent recording requests
4. No memory leaks or accumulation of audio files
### User Experience
1. The recording flow is intuitive and easy to understand
2. Score results are meaningful and constructive
3. Error messages are clear and actionable
4. Integration with the existing review flow is seamless

This specification gives DramaLing a robust speaking practice capability, strengthening learners' pronunciation and their ability to apply the language in real contexts.