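# DramaLing Voice Feature Specification

## TTS Pronunciation & Speech Recognition System

---

## 📋 **Project Overview**

**Document version**: 1.0
**Created**: 2025-09-19
**Last updated**: 2025-09-19
**Owner**: DramaLing development team

### **Feature Goals**

Building on the existing DramaLing vocabulary learning platform, this feature integrates TTS (text-to-speech) and speech recognition to provide a complete spoken-language learning experience, including pronunciation playback, speaking practice, and scoring.

---
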
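## 🎯 **Core Functional Requirements**

### **1. TTS Pronunciation System**

#### **1.1 Basic Pronunciation**
- **Target word pronunciation**
  - Switch between American and British accents
  - High-quality audio output (16 kHz or higher)
  - Response time < 500 ms
  - Synchronized IPA transcription display

- **Example sentence pronunciation**
  - Full-sentence audio playback
  - Highlighting of key vocabulary
  - Adjustable speaking rate (0.5x - 2.0x)
  - Automatic sentence segmentation

#### **1.2 Advanced Playback**
- **Smart playback modes**
  - Word → example sentence → repeat loop
  - Adjustable auto-pause interval (1-5 seconds)
  - Background learning mode
  - Bedtime learning mode (volume fades out)

- **Personalization**
  - Default voice selection
  - Remembered playback speed
  - Volume control
  - Silent mode support
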
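The word → example sentence → repeat loop and the bedtime fade in 1.2 could be planned as a simple queue before any audio is played. The following is a minimal sketch under stated assumptions: `PlaybackStep`, `buildPlaybackQueue`, and `fadeOutVolume` are hypothetical names, and the linear fade with a 0.2 volume floor is an illustrative choice that this spec does not define.

```typescript
// One step in a playback session: speak some text, then pause.
interface PlaybackStep {
  kind: "word" | "sentence";
  text: string;
  pauseAfterMs: number;
}

// Clamp the pause to the 1-5 second range given in section 1.2.
function clampPauseSeconds(seconds: number): number {
  return Math.min(5, Math.max(1, seconds));
}

// Build the queue for the "word -> example sentence -> repeat" loop.
function buildPlaybackQueue(
  word: string,
  sentence: string,
  repeats: number,
  pauseSeconds: number
): PlaybackStep[] {
  const pauseAfterMs = clampPauseSeconds(pauseSeconds) * 1000;
  const rounds = Math.max(1, Math.floor(repeats));
  const queue: PlaybackStep[] = [];
  for (let i = 0; i < rounds; i++) {
    queue.push({ kind: "word", text: word, pauseAfterMs });
    queue.push({ kind: "sentence", text: sentence, pauseAfterMs });
  }
  return queue;
}

// Bedtime mode (assumption): fade linearly from full volume on the first
// step down to minVolume on the last step.
function fadeOutVolume(stepIndex: number, totalSteps: number, minVolume: number = 0.2): number {
  if (totalSteps <= 1) return 1;
  const progress = stepIndex / (totalSteps - 1);
  return 1 - (1 - minVolume) * progress;
}
```

A player would walk the queue in order, set the audio element's volume from `fadeOutVolume` when bedtime mode is on, and wait `pauseAfterMs` between steps.
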
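#### **1.3 Learning Mode Integration**
- **Flashcard mode**
  - Tap the play button to hear the pronunciation
  - Auto-play toggle
  - Separate playback for the front and back of the card

- **Quiz mode**
  - Audio playback for listening quizzes
  - Questions read aloud
  - Pronunciation of the correct answer for confirmation

---
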
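### **2. Speech Recognition & Speaking Practice**

#### **2.1 Pronunciation Practice**
- **Single-word practice**
  - Compare the recording against the reference pronunciation
  - Phoneme-level scoring (0-100)
  - Mispronounced phonemes flagged with suggestions
  - Repeat practice until the target score is reached

- **Sentence reading practice**
  - Full-sentence pronunciation assessment
  - Fluency scoring
  - Intonation assessment
  - Speaking-rate analysis

#### **2.2 Intelligent Scoring System**
- **Multi-dimensional scoring**
  - Accuracy: phoneme correctness
  - Fluency: speaking rate and pauses
  - Completeness: coverage of the target content
  - Prosody: intonation and stress

- **Grading scale**
  - Grade A (90-100): near-native
  - Grade B (80-89): good, slight accent
  - Grade C (70-79): understandable, needs improvement
  - Grade D (60-69): hard to understand
  - Grade F (0-59): needs major improvement
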
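The grading scale above maps directly onto a small lookup. This sketch assumes that fractional scores fall into the band whose lower bound they reach (so 89.5 is a B), which the spec does not state; `Grade` and `toGrade` are hypothetical names.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";

// Map a 0-100 pronunciation score to the letter grades listed in 2.2.
function toGrade(score: number): Grade {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}
```

For example, the sample response in 14.2 has an overall score of 85, which this function grades as B.
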
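#### **2.3 Progressive Learning**
- **Difficulty levels**
  - Beginner: single-syllable words
  - Intermediate: multi-syllable words and short sentences
  - Advanced: complex sentence patterns and connected speech

- **Personalized adjustment**
  - Scoring standards adjusted by CEFR level
  - Learning progress tracking
  - Weakness analysis and targeted practice

---
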
## 🏗️ **Technical Architecture**

### **3. Frontend Architecture**

#### **3.1 UI Component Design**
```typescript
// AudioPlayer component
interface AudioPlayerProps {
  text: string
  audioUrl?: string
  accent: 'us' | 'uk'
  speed: number
  autoPlay: boolean
  onPlayStart?: () => void
  onPlayEnd?: () => void
}

// VoiceRecorder component
interface VoiceRecorderProps {
  targetText: string
  onRecordingComplete: (audioBlob: Blob) => void
  onScoreReceived: (score: PronunciationScore) => void
  maxDuration: number
}

// PhonemeScore type (fields follow the API response example in 14.2)
interface PhonemeScore {
  phoneme: string
  score: number
  suggestion?: string
}

// PronunciationScore type
interface PronunciationScore {
  overall: number
  accuracy: number
  fluency: number
  completeness: number
  prosody: number
  phonemes: PhonemeScore[]
  suggestions: string[] // rendered by ScoreDisplay in 7.3
}
```

#### **3.2 State Management**
```typescript
// Zustand store
interface AudioStore {
  // TTS state
  isPlaying: boolean
  currentAudio: HTMLAudioElement | null
  playbackSpeed: number
  preferredAccent: 'us' | 'uk'

  // Speech recognition state
  isRecording: boolean
  recordingData: Blob | null
  lastScore: PronunciationScore | null

  // Actions
  playTTS: (text: string, accent?: 'us' | 'uk') => Promise<void>
  stopAudio: () => void
  startRecording: () => void
  stopRecording: () => Promise<Blob>
  evaluatePronunciation: (audio: Blob, text: string) => Promise<PronunciationScore>
}
```

### **4. Backend API Design**

#### **4.1 TTS API Endpoints**
```csharp
// Controllers/AudioController.cs
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class AudioController : ControllerBase
{
    [HttpPost("tts")]
    public async Task<IActionResult> GenerateAudio([FromBody] TTSRequest request)
    {
        // Generate the audio file
        // Return the audio URL or Base64 data
        throw new NotImplementedException();
    }

    [HttpGet("tts/cache/{hash}")]
    public async Task<IActionResult> GetCachedAudio(string hash)
    {
        // Return the cached audio file
        throw new NotImplementedException();
    }
}

// DTOs
public class TTSRequest
{
    public string Text { get; set; }
    public string Accent { get; set; } // "us" or "uk"
    public float Speed { get; set; } = 1.0f;
    public string Voice { get; set; }
}
```

#### **4.2 Pronunciation Assessment API**
```csharp
// Action method on AudioController
[HttpPost("pronunciation/evaluate")]
public async Task<IActionResult> EvaluatePronunciation([FromForm] PronunciationRequest request)
{
    // Handle the audio upload
    // Call the pronunciation assessment service
    // Return the scoring result
    throw new NotImplementedException();
}

public class PronunciationRequest
{
    public IFormFile AudioFile { get; set; }
    public string TargetText { get; set; }
    public string UserLevel { get; set; } // CEFR level
}

public class PronunciationResponse
{
    public int OverallScore { get; set; }
    public float Accuracy { get; set; }
    public float Fluency { get; set; }
    public float Completeness { get; set; }
    public float Prosody { get; set; }
    public List<PhonemeScore> PhonemeScores { get; set; }
    public List<string> Suggestions { get; set; }
}
```

### **5. Third-Party Service Integration**

#### **5.1 TTS Service Selection**
**Primary choice: Azure Cognitive Services Speech**
- **Advantages**: high quality, multilingual, reasonably priced
- **Voice options**:
  - American: `en-US-AriaNeural`, `en-US-GuyNeural`
  - British: `en-GB-SoniaNeural`, `en-GB-RyanNeural`
- **SSML support**: control over speaking rate, pitch, and pauses
- **Cost**: $4 per million characters

**Fallback choice: Google Cloud Text-to-Speech**
- **Advantages**: highly natural output, WaveNet technology
- **Cost**: $4-16 per million characters

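To make the SSML-based rate control concrete, here is a minimal sketch of how a request could be turned into SSML. It assumes the first voice listed for each accent is the default and that the 0.5x - 2.0x speed from 1.1 is expressed as a relative percentage in the `prosody` `rate` attribute; `buildSsml`, `escapeXml`, and the two lookup tables are hypothetical names, and the exact attribute format should be checked against the Azure SSML documentation.

```typescript
type Accent = "us" | "uk";

// Assumption: the first voice listed per accent in 5.1 is the default.
const DEFAULT_VOICE: Record<Accent, string> = {
  us: "en-US-AriaNeural",
  uk: "en-GB-SoniaNeural",
};

const LOCALE: Record<Accent, string> = { us: "en-US", uk: "en-GB" };

// Escape characters that would otherwise break the XML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build SSML for one utterance, clamping speed to the 0.5x - 2.0x range.
function buildSsml(text: string, accent: Accent, speed: number = 1.0): string {
  const clamped = Math.min(2.0, Math.max(0.5, speed));
  const percent = Math.round((clamped - 1) * 100); // 1.5x -> +50%, 0.5x -> -50%
  const rate = `${percent >= 0 ? "+" : ""}${percent}%`;
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${LOCALE[accent]}">` +
    `<voice name="${DEFAULT_VOICE[accent]}">` +
    `<prosody rate="${rate}">${escapeXml(text)}</prosody>` +
    `</voice></speak>`
  );
}
```

Escaping the text before interpolation matters because example sentences can contain `&`, quotes, or angle brackets, any of which would make the SSML invalid.
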
#### **5.2 Speech Recognition Service**
**Primary choice: Azure Speech Services Pronunciation Assessment**
- **Capabilities**: phoneme-level scoring, fluency analysis
- **Supported formats**: WAV, MP3, OGG
- **Scoring dimensions**: accuracy, fluency, completeness, prosody
- **Cost**: $1 per hour of audio

**Integration example**:
```csharp
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.PronunciationAssessment;

public class AzureSpeechService
{
    private readonly SpeechConfig _speechConfig;

    public async Task<string> GenerateAudioAsync(string text, string voice)
    {
        // A null AudioConfig keeps the audio in memory instead of playing it
        // on the server's default speaker
        using var synthesizer = new SpeechSynthesizer(_speechConfig, null);
        var ssml = CreateSSML(text, voice);
        var result = await synthesizer.SpeakSsmlAsync(ssml);

        // Store in Azure Blob Storage
        return await SaveAudioToStorage(result.AudioData);
    }

    public async Task<PronunciationScore> EvaluateAsync(byte[] audioData, string referenceText)
    {
        var pronunciationConfig = new PronunciationAssessmentConfig(
            referenceText,
            GradingSystem.HundredMark, // matches the 0-100 scale used in section 2
            Granularity.Phoneme);

        // Run the assessment...
        throw new NotImplementedException();
    }
}
```

---

## 💾 **Data Storage Design**

### **6. Database Schema**

#### **6.1 Audio Cache Table**
```sql
CREATE TABLE audio_cache (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    -- SHA-256 hex digest of the cache key; hash the text together with the
    -- accent (as in 8.2) so each accent gets its own row
    text_hash VARCHAR(64) UNIQUE NOT NULL,
    text_content TEXT NOT NULL,
    accent VARCHAR(2) NOT NULL, -- 'us' or 'uk'
    voice_id VARCHAR(50) NOT NULL,
    audio_url TEXT NOT NULL,
    file_size INTEGER,
    duration_ms INTEGER,
    created_at TIMESTAMP DEFAULT NOW(),
    last_accessed TIMESTAMP DEFAULT NOW(),
    access_count INTEGER DEFAULT 1
);

-- PostgreSQL has no inline INDEX clause, so indexes are created separately.
-- text_hash is already indexed through its UNIQUE constraint.
CREATE INDEX idx_last_accessed ON audio_cache (last_accessed);
```

#### **6.2 Pronunciation Assessment Records**
```sql
CREATE TABLE pronunciation_assessments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    flashcard_id UUID REFERENCES flashcards(id) ON DELETE CASCADE,
    target_text TEXT NOT NULL,
    audio_url TEXT,

    -- Scoring results
    overall_score INTEGER NOT NULL,
    accuracy_score DECIMAL(5,2),
    fluency_score DECIMAL(5,2),
    completeness_score DECIMAL(5,2),
    prosody_score DECIMAL(5,2),

    -- Detailed analysis
    phoneme_scores JSONB, -- phoneme-level scores
    suggestions TEXT[],

    -- Learning context
    study_session_id UUID REFERENCES study_sessions(id),
    practice_mode VARCHAR(20), -- 'word', 'sentence', 'conversation'

    created_at TIMESTAMP DEFAULT NOW()
);

-- PostgreSQL has no inline INDEX clause, so indexes are created separately
CREATE INDEX idx_user_flashcard ON pronunciation_assessments (user_id, flashcard_id);
CREATE INDEX idx_session ON pronunciation_assessments (study_session_id);
```

#### **6.3 Audio Preferences Table**
```sql
CREATE TABLE user_audio_preferences (
    user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,

    -- TTS preferences
    preferred_accent VARCHAR(2) DEFAULT 'us',
    preferred_voice_male VARCHAR(50),
    preferred_voice_female VARCHAR(50),
    default_speed DECIMAL(3,1) DEFAULT 1.0,
    auto_play_enabled BOOLEAN DEFAULT false,

    -- Speaking practice preferences
    pronunciation_difficulty VARCHAR(20) DEFAULT 'medium', -- 'easy', 'medium', 'strict'
    target_score_threshold INTEGER DEFAULT 80,
    enable_detailed_feedback BOOLEAN DEFAULT true,

    updated_at TIMESTAMP DEFAULT NOW()
);
```

---

## 🎨 **User Experience Design**

### **7. Interface Design Guidelines**

#### **7.1 TTS Playback Controls**
```jsx
// AudioControls component design.
// isPlaying, speed, setSpeed, and pronunciation are passed in as props;
// PauseIcon, PlayIcon, AccentButton, and SpeedSlider are defined elsewhere.
const AudioControls = ({
  text, accent, isPlaying, speed, setSpeed, pronunciation, onPlay, onStop
}) => (
  <div className="flex items-center gap-3 p-3 bg-gray-50 rounded-lg">
    {/* Play button */}
    <button
      onClick={isPlaying ? onStop : onPlay}
      className="flex items-center justify-center w-10 h-10 bg-blue-600 text-white rounded-full hover:bg-blue-700 transition-colors"
    >
      {isPlaying ? <PauseIcon /> : <PlayIcon />}
    </button>

    {/* Accent switch */}
    <div className="flex gap-1">
      <AccentButton accent="us" active={accent === 'us'} />
      <AccentButton accent="uk" active={accent === 'uk'} />
    </div>

    {/* Speed control */}
    <SpeedSlider
      value={speed}
      onChange={setSpeed}
      min={0.5}
      max={2.0}
      step={0.1}
    />

    {/* IPA transcription */}
    <span className="text-sm text-gray-600 font-mono">
      {pronunciation}
    </span>
  </div>
);
```

#### **7.2 Voice Recording Interface**
```jsx
import { useState } from 'react';

// Format elapsed seconds as m:ss
const formatTime = (seconds) =>
  `${Math.floor(seconds / 60)}:${String(seconds % 60).padStart(2, '0')}`;

const VoiceRecorder = ({ targetText, onScoreReceived }) => {
  const [isRecording, setIsRecording] = useState(false);
  const [recordingTime, setRecordingTime] = useState(0);
  const [lastScore, setLastScore] = useState(null);

  // Placeholder handlers: the real ones would drive MediaRecorder, update
  // recordingTime, upload the audio, then call setLastScore and onScoreReceived
  const startRecording = () => setIsRecording(true);
  const stopRecording = () => setIsRecording(false);

  return (
    <div className="voice-recorder p-6 border-2 border-dashed border-gray-300 rounded-xl">
      {/* Target text */}
      <div className="text-center mb-6">
        <h3 className="text-lg font-semibold mb-2">Please read the following aloud:</h3>
        <p className="text-2xl font-medium text-gray-800 p-4 bg-blue-50 rounded-lg">
          {targetText}
        </p>
      </div>

      {/* Recording controls */}
      <div className="flex flex-col items-center gap-4">
        <button
          onClick={isRecording ? stopRecording : startRecording}
          className={`w-20 h-20 rounded-full flex items-center justify-center transition-all ${
            isRecording
              ? 'bg-red-500 hover:bg-red-600 animate-pulse'
              : 'bg-blue-500 hover:bg-blue-600'
          } text-white`}
        >
          {isRecording ? <StopIcon size={32} /> : <MicIcon size={32} />}
        </button>

        {/* Recording time */}
        {isRecording && (
          <div className="text-sm text-gray-600">
            Recording... {formatTime(recordingTime)}
          </div>
        )}

        {/* Score result */}
        {lastScore && (
          <ScoreDisplay score={lastScore} />
        )}
      </div>
    </div>
  );
};
```

#### **7.3 Score Display**
```jsx
// getScoreColor and ScoreItem are assumed to be defined elsewhere
const ScoreDisplay = ({ score }) => (
  <div className="score-display w-full max-w-md mx-auto">
    {/* Overall score */}
    <div className="text-center mb-4">
      <div className={`text-4xl font-bold ${getScoreColor(score.overall)}`}>
        {score.overall}
      </div>
      <div className="text-sm text-gray-600">Overall score</div>
    </div>

    {/* Score breakdown */}
    <div className="grid grid-cols-2 gap-3 mb-4">
      <ScoreItem label="Accuracy" value={score.accuracy} />
      <ScoreItem label="Fluency" value={score.fluency} />
      <ScoreItem label="Completeness" value={score.completeness} />
      <ScoreItem label="Prosody" value={score.prosody} />
    </div>

    {/* Suggestions for improvement (guarded in case the list is missing) */}
    {score.suggestions?.length > 0 && (
      <div className="suggestions">
        <h4 className="font-semibold mb-2">💡 Suggestions:</h4>
        <ul className="text-sm text-gray-700 space-y-1">
          {score.suggestions.map((suggestion, index) => (
            <li key={index} className="flex items-start gap-2">
              <span className="text-blue-500">•</span>
              {suggestion}
            </li>
          ))}
        </ul>
      </div>
    )}
  </div>
);
```

---

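## 📊 **Performance & Optimization**

### **8. Caching Strategy**

#### **8.1 TTS Caching**
- **Local cache**: frequently used audio URLs stored in browser localStorage
- **Server-side cache**: TTS request results cached in Redis (24 hours)
- **CDN distribution**: audio files served through a CDN
- **Preloading**: audio for the next batch of vocabulary preloaded before a learning session starts
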
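The local cache in 8.1 could look roughly like the sketch below, which stores audio URLs with an expiry timestamp. It is written against a minimal subset of the Web Storage API so that `localStorage` can be passed in directly; the `AudioUrlCache` class, the `tts:` key prefix, and the injectable clock are assumptions made for illustration, not part of this spec.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface CachedAudio {
  url: string;
  expiresAt: number; // epoch milliseconds
}

// In-memory stand-in for localStorage, useful in tests or outside browsers.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => (data.has(key) ? (data.get(key) as string) : null),
    setItem: (key, value) => { data.set(key, value); },
    removeItem: (key) => { data.delete(key); },
  };
}

class AudioUrlCache {
  private store: KeyValueStore;
  private ttlMs: number;
  private now: () => number;

  constructor(store: KeyValueStore, ttlMs: number, now: () => number = () => Date.now()) {
    this.store = store;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached URL, or null when it is missing or expired.
  get(text: string, accent: string): string | null {
    const key = this.keyFor(text, accent);
    const raw = this.store.getItem(key);
    if (raw === null) return null;
    const entry = JSON.parse(raw) as CachedAudio;
    if (entry.expiresAt <= this.now()) {
      this.store.removeItem(key); // drop stale entries lazily
      return null;
    }
    return entry.url;
  }

  set(text: string, accent: string, url: string): void {
    const entry: CachedAudio = { url, expiresAt: this.now() + this.ttlMs };
    this.store.setItem(this.keyFor(text, accent), JSON.stringify(entry));
  }

  private keyFor(text: string, accent: string): string {
    return `tts:${accent}:${text}`;
  }
}
```

In the browser this would be constructed as `new AudioUrlCache(window.localStorage, 24 * 60 * 60 * 1000)`; on a miss the client calls the TTS API and stores the returned URL with `set`.
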
#### **8.2 Audio File Management**
```csharp
using System.Security.Cryptography;
using System.Text;
using Microsoft.Extensions.Caching.Distributed;

public class AudioCacheService
{
    // _cache (an IDistributedCache such as Redis), _ttsService, and
    // UpdateAccessTime are assumed to be provided elsewhere in the class.

    public async Task<string> GetOrCreateAudioAsync(string text, string accent)
    {
        var cacheKey = GenerateCacheKey(text, accent);

        // Check the cache
        var cachedUrl = await _cache.GetStringAsync(cacheKey);
        if (!string.IsNullOrEmpty(cachedUrl))
        {
            await UpdateAccessTime(cacheKey);
            return cachedUrl;
        }

        // Generate new audio
        var audioUrl = await _ttsService.GenerateAsync(text, accent);

        // Store in the cache. SetStringAsync takes entry options rather than a
        // bare TimeSpan; the 24-hour expiry follows 8.1.
        var options = new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(24)
        };
        await _cache.SetStringAsync(cacheKey, audioUrl, options);

        return audioUrl;
    }

    // Cache key: SHA-256 hex digest of "text|accent"
    private string GenerateCacheKey(string text, string accent)
    {
        var combined = $"{text}|{accent}";
        using var sha256 = SHA256.Create();
        var hash = sha256.ComputeHash(Encoding.UTF8.GetBytes(combined));
        return Convert.ToHexString(hash);
    }
}
```

### **9. Performance Metrics**

#### **9.1 TTS Performance Targets**
- **First-generation latency**: < 3 seconds
- **Cache-hit latency**: < 500 ms
- **Audio file size**: < 1 MB (for 30 seconds of content)
- **Cache hit rate**: > 85%

#### **9.2 Speech Recognition Performance**
- **Recording upload**: < 2 seconds (for 10 seconds of audio)
- **Assessment response**: < 5 seconds
- **Accuracy**: > 90% (compared with human assessment)

---

## 💰 **Cost Analysis**

### **10. Service Cost Estimates**

#### **10.1 TTS Cost** (based on Azure Speech)
- **Pricing**: $4 USD per million characters
- **Monthly estimate**:
  - 100 active users × 50 words/day × 30 days = 150,000 words/month
  - Average of 8 characters/word = 1,200,000 characters/month
  - **Monthly cost**: $4.8 USD

#### **10.2 Pronunciation Assessment Cost**
- **Pricing**: $1 USD per hour of audio
- **Monthly estimate**:
  - 100 users × 10 minutes of practice/day × 30 days = 500 hours/month
  - **Monthly cost**: $500 USD

#### **10.3 Storage Cost** (Azure Blob Storage)
- **Audio storage**: $0.02/GB/month
- **Estimate**: 10,000 audio files × 100 KB = 1 GB
- **Monthly cost**: $0.02 USD

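The three estimates above can be reproduced with a small calculator, which also makes it easy to rerun the numbers for a different user count. This is an illustrative sketch: the interface and function names are hypothetical, the inputs are the figures quoted in 10.1 to 10.3, and storage is computed in decimal units (1 GB = 1,000,000 KB).

```typescript
interface UsageAssumptions {
  activeUsers: number;
  wordsPerUserPerDay: number;
  avgCharsPerWord: number;
  practiceMinutesPerUserPerDay: number;
  daysPerMonth: number;
  storedAudioFiles: number;
  avgFileSizeKb: number;
}

interface Pricing {
  ttsUsdPerMillionChars: number;
  assessmentUsdPerAudioHour: number;
  storageUsdPerGbMonth: number;
}

interface MonthlyCost {
  ttsUsd: number;
  assessmentUsd: number;
  storageUsd: number;
  totalUsd: number;
}

// Figures taken from sections 10.1 - 10.3.
const SPEC_USAGE: UsageAssumptions = {
  activeUsers: 100,
  wordsPerUserPerDay: 50,
  avgCharsPerWord: 8,
  practiceMinutesPerUserPerDay: 10,
  daysPerMonth: 30,
  storedAudioFiles: 10000,
  avgFileSizeKb: 100,
};

const SPEC_PRICING: Pricing = {
  ttsUsdPerMillionChars: 4,
  assessmentUsdPerAudioHour: 1,
  storageUsdPerGbMonth: 0.02,
};

function estimateMonthlyCost(u: UsageAssumptions, p: Pricing): MonthlyCost {
  // TTS: users x words/day x days x chars/word, billed per million characters
  const chars = u.activeUsers * u.wordsPerUserPerDay * u.daysPerMonth * u.avgCharsPerWord;
  const ttsUsd = (chars / 1_000_000) * p.ttsUsdPerMillionChars;

  // Assessment: total practice minutes converted to hours of audio
  const hours = (u.activeUsers * u.practiceMinutesPerUserPerDay * u.daysPerMonth) / 60;
  const assessmentUsd = hours * p.assessmentUsdPerAudioHour;

  // Storage: files x KB per file, converted to decimal GB
  const gigabytes = (u.storedAudioFiles * u.avgFileSizeKb) / 1_000_000;
  const storageUsd = gigabytes * p.storageUsdPerGbMonth;

  return { ttsUsd, assessmentUsd, storageUsd, totalUsd: ttsUsd + assessmentUsd + storageUsd };
}
```

With the spec's inputs the total comes to roughly $504.82 per month, of which pronunciation assessment accounts for about 99%, so assessment minutes are the figure to watch when usage grows.
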
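#### **10.4 Cost Optimization Strategies**
1. **Smart caching**: cut repeated TTS requests by 80%
2. **Audio compression**: use MP3 to reduce storage cost
3. **Free tier**: offer basic TTS for free and unlock pronunciation assessment for paying users
4. **Batch processing**: merge short texts to reduce the number of API calls

---
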
## 🚀 **Development Plan**

### **11. Development Phases**

#### **Phase 1: Basic TTS (1 week)**
- ✅ Azure Speech Services integration
- ✅ Basic TTS API
- ✅ Frontend audio playback component
- ✅ American/British accent switching
- ✅ Caching

#### **Phase 2: Advanced TTS (1 week)**
- ⬜ Speaking-rate adjustment
- ⬜ Auto-play mode
- ⬜ Audio preloading
- ⬜ Personalization settings
- ⬜ Learning mode integration

#### **Phase 3: Basic Speech Recognition (1 week)**
- ⬜ In-browser recording
- ⬜ Audio upload and processing
- ⬜ Azure pronunciation assessment integration
- ⬜ Basic score display

#### **Phase 4: Speaking Practice Polish (1 week)**
- ⬜ Detailed score analysis
- ⬜ Phoneme-level feedback
- ⬜ Improvement suggestions
- ⬜ Practice history and tracking
- ⬜ UI/UX refinement

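### **12. Technical Debt & Risks**

#### **12.1 Known Limitations**
- **Browser compatibility**: Safari has limited Web Audio API support
- **Mobile challenges**: recording permission issues on iOS Safari
- **Network dependency**: voice features are unavailable offline
- **Cost control**: API usage must be monitored closely

#### **12.2 Mitigations**
1. **Graceful degradation**: show the IPA transcription when the API quota is exhausted
2. **Error handling**: show a friendly message when the network fails
3. **Permission management**: clear guidance for granting microphone access
4. **Monitoring and alerts**: automatic notification when costs are abnormal
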
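The first three mitigations amount to choosing a fallback for each kind of failure. One possible shape is sketched below; the failure categories, the `FallbackAction` fields, and the message wording are all hypothetical and only illustrate the idea.

```typescript
// Failure categories corresponding to mitigations 1-3 in 12.2 (hypothetical names).
type VoiceFailure = "quota_exceeded" | "network_error" | "mic_permission_denied";

interface FallbackAction {
  showIpa: boolean;   // fall back to the IPA transcription instead of audio
  retryable: boolean; // whether the UI should offer a retry button
  message: string;    // friendly text shown to the learner
}

const FALLBACKS: Record<VoiceFailure, FallbackAction> = {
  quota_exceeded: {
    showIpa: true,
    retryable: false,
    message: "Audio is temporarily unavailable. The IPA transcription is shown instead.",
  },
  network_error: {
    showIpa: true,
    retryable: true,
    message: "We could not reach the server. Please check your connection and try again.",
  },
  mic_permission_denied: {
    showIpa: false,
    retryable: true,
    message: "Microphone access is needed for speaking practice. Please allow it in your browser settings.",
  },
};

// Look up how the UI should degrade for a given failure.
function fallbackFor(failure: VoiceFailure): FallbackAction {
  return FALLBACKS[failure];
}
```

Keeping the mapping in one table means every failure type has an explicit, reviewable behavior, and the compiler flags any category that is added without a fallback.
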
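---

## 📋 **Acceptance Criteria**

### **13. Functional Testing**

#### **13.1 TTS Test Cases**
- ✅ Single-word pronunciation plays correctly
- ✅ Example sentences play completely and clearly
- ✅ American/British accent switching works
- ✅ Speaking rate adjustable from 0.5x to 2.0x
- ✅ Caching reduces repeated requests by 80%
- ✅ Cached audio plays correctly offline

#### **13.2 Speech Recognition Tests**
- ⬜ Recording works in mainstream browsers
- ⬜ Audio quality is sufficient for assessment
- ⬜ Scores differ from human assessment by < 10%
- ⬜ Assessment results returned within 5 seconds
- ⬜ Phoneme-level errors flagged accurately

#### **13.3 Performance Tests**
- ⬜ First TTS request < 3 seconds
- ⬜ Cache hit < 500 ms
- ⬜ Audio file < 1 MB (30 seconds)
- ⬜ 99% service availability
- ⬜ Support for 1,000 concurrent users

---
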
## 📚 **Appendix**

### **14. API Documentation Examples**

#### **14.1 TTS API**
```http
POST /api/audio/tts
Content-Type: application/json

{
  "text": "Hello, world!",
  "accent": "us",
  "speed": 1.0,
  "voice": "aria"
}

Response:
{
  "audioUrl": "https://cdn.dramaling.com/audio/abc123.mp3",
  "duration": 2.5,
  "cacheHit": false
}
```

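Before sending a request like the one above, the client (or the API itself) could validate the body against the constraints stated elsewhere in this spec: an accent of `us` or `uk`, and a speed between 0.5 and 2.0. The sketch below is an assumption about how that check might look; `validateTtsRequest` is a hypothetical helper, and treating a missing `speed` as 1.0 mirrors the DTO default in 4.1.

```typescript
// Validate a TTS request body; returns a list of problems (empty when valid).
function validateTtsRequest(body: Record<string, unknown>): string[] {
  const errors: string[] = [];

  const text = body["text"];
  if (typeof text !== "string" || text.trim() === "") {
    errors.push("text must be a non-empty string");
  }

  const accent = body["accent"];
  if (accent !== "us" && accent !== "uk") {
    errors.push('accent must be "us" or "uk"');
  }

  // speed is optional and defaults to 1.0, as in the TTSRequest DTO
  const speed = body["speed"] === undefined ? 1.0 : body["speed"];
  if (typeof speed !== "number" || !Number.isFinite(speed) || speed < 0.5 || speed > 2.0) {
    errors.push("speed must be a number between 0.5 and 2.0");
  }

  const voice = body["voice"];
  if (voice !== undefined && typeof voice !== "string") {
    errors.push("voice must be a string when provided");
  }

  return errors;
}
```

Returning every problem at once, rather than failing on the first, lets the UI highlight all invalid fields in a single pass.
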
#### **14.2 Pronunciation Assessment API**
```http
POST /api/audio/pronunciation/evaluate
Content-Type: multipart/form-data

audioFile: [audio file]
targetText: "Hello, world!"
userLevel: "B1"

Response:
{
  "overallScore": 85,
  "accuracy": 88.5,
  "fluency": 82.0,
  "completeness": 90.0,
  "prosody": 80.0,
  "phonemeScores": [
    {"phoneme": "/h/", "score": 95},
    {"phoneme": "/ɛ/", "score": 75, "suggestion": "Open your mouth wider"}
  ],
  "suggestions": [
    "Watch the /r/ sound in 'world'",
    "Overall intonation could sound more natural"
  ]
}
```

### **15. Related Resources**

#### **15.1 Technical Documentation**
- [Azure Speech Services documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/)
- [Web Audio API specification](https://www.w3.org/TR/webaudio/)
- [MediaRecorder API guide](https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder)

#### **15.2 Design References**
- [Duolingo pronunciation feature analysis](https://blog.duolingo.com/how-we-built-pronunciation-features/)
- [ELSA Speak UI/UX research](https://elsaspeak.com/en/)

---

**End of document**

> This specification covers the complete design and implementation plan for DramaLing's voice features. For questions or suggestions, please contact the development team.