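With the AnalyserNode from the Web Audio API, we can read the average magnitude of the sound in the frequency domain and compare it against a threshold to decide whether someone is speaking.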
Version
Vue 3.4
extendable-media-recorder 9.1.7
extendable-media-recorder-wav-encoder 7.0.102
Install Package
$ npm install extendable-media-recorder
$ npm install extendable-media-recorder-wav-encoder
- extendable-media-recorder: provides a new MediaRecorder with the same interface as the native MediaRecorder
- extendable-media-recorder-wav-encoder: adds support for the WAV format
Architecture
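- MediaStream: the stream obtained from the microphone
- AudioContext: produces the 16-bit stream
- AudioContext creates the SrcNode, AnalyserNode and DestNode; use connect() to link the nodes together
- AnalyserNode: converts the signal to the frequency domain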
Speech Detection
main.js
import { createApp } from 'vue'
import App from './App.vue'
import { register } from 'extendable-media-recorder'
import { connect } from 'extendable-media-recorder-wav-encoder'
await register(await connect())
createApp(App).mount('#app')
Line 7
await register(await connect())
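- Unlike the native MediaRecorder, the MediaRecorder provided by extendable-media-recorder must have its encoder registered before it can be used
- register() must not be called in each page's mounted() hook: whenever the route changes and the page is entered again, the encoder is registered a second time and an error is thrown. Call it in main.js so that it runs only once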
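If registration ever has to be triggered from somewhere other than main.js, one possible guard is to cache the first registration promise and reuse it. The sketch below is only an illustration: createRegisterOnce is a hypothetical helper, not part of extendable-media-recorder, and the doRegister callback stands in for the register(await connect()) call shown above.

```javascript
// Hypothetical helper (not part of extendable-media-recorder):
// wraps a registration function so that it runs at most once.
function createRegisterOnce(doRegister) {
  let pending = null // promise of the first registration, reused afterwards

  return function registerOnce() {
    if (pending === null) {
      pending = doRegister()
    }
    return pending
  }
}

// Assumed usage in a Vue app (names taken from main.js above):
//   const registerWavEncoder = createRegisterOnce(async () => register(await connect()))
//   await registerWavEncoder() // safe to call again after a route change
```

Keeping the call in main.js remains the simpler option; the guard only matters when the entry point cannot be used.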
App.vue
<template>
<div>Speech Detection Demo</div>
<div>
<button @click="onStart">Start</button>
<button @click="onStop">Stop</button>
</div>
<div>
Threshold:
<input type="text" v-model.number="threshold" />
</div>
<div>avgVolume: {{ avgVolume }}</div>
<div>isSpeaking: {{ isSpeaking }}</div>
</template>
<script setup>
import { ref } from 'vue'
import { MediaRecorder } from 'extendable-media-recorder'
const SAMPLE_RATE = 16000
const SAMPLE_INTERVAL = 250 // 250 ms = 0.25 s
let threshold = ref(40)
let avgVolume = ref(0)
let isSpeaking = ref(false)
let mediaRecorder = null
let onStart = async () => {
try {
let mediaStream = await navigator.mediaDevices.getUserMedia({
audio: true
})
let audioContext = new AudioContext({ sampleRate: SAMPLE_RATE })
let analyserNode = audioContext.createAnalyser()
analyserNode.fftSize = 256
let srcNode = new MediaStreamAudioSourceNode(audioContext, { mediaStream })
let destNode = new MediaStreamAudioDestinationNode(audioContext)
srcNode.connect(analyserNode)
analyserNode.connect(destNode)
mediaRecorder = new MediaRecorder(destNode.stream, {
mimeType: 'audio/wav'
})
mediaRecorder.ondataavailable = () => {
let bufferLength = analyserNode.frequencyBinCount
let bufferArray = new Uint8Array(bufferLength)
analyserNode.getByteFrequencyData(bufferArray)
avgVolume.value = bufferArray.reduce((sum, value) => sum + value, 0) / bufferLength
isSpeaking.value = avgVolume.value > threshold.value
}
mediaRecorder.start(SAMPLE_INTERVAL)
} catch (err) {
console.warn(err)
}
}
let onStop = () => {
mediaRecorder.stop()
}
</script>
Line 19
const SAMPLE_RATE = 16000
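const SAMPLE_INTERVAL = 250 // 250 ms = 0.25 s
Constants that can be adjusted:
- SAMPLE_RATE: the sample rate in Hz
- SAMPLE_INTERVAL: the time between two measurements, in milliseconds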
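To get a feeling for these two constants, the arithmetic can be sketched as follows. The function names are made up for this illustration, and the sample count per chunk is only the nominal value; the recorder may deliver slightly more or fewer samples in a given chunk.

```javascript
// Illustrative helpers (hypothetical names), using the units from App.vue:
// sampleRate in Hz, intervalMs in milliseconds.
function measurementsPerSecond(intervalMs) {
  return 1000 / intervalMs
}

function nominalSamplesPerChunk(sampleRate, intervalMs) {
  return Math.round((sampleRate * intervalMs) / 1000)
}

const perSecond = measurementsPerSecond(250)           // 4 measurements per second
const perChunk = nominalSamplesPerChunk(16000, 250)    // 4000 samples per chunk
```

A shorter interval makes isSpeaking react faster, at the cost of more frequent updates.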
Line 22
let threshold = ref(40)
let avgVolume = ref(0)
let isSpeaking = ref(false)
- threshold state: the threshold used to judge the volume
- avgVolume state: shows the current average volume
- isSpeaking state: shows whether speech is currently detected
Line 30
let mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true })
- Use the built-in Web API navigator.mediaDevices.getUserMedia() to obtain a MediaStream
Line 34
let audioContext = new AudioContext({ sampleRate: SAMPLE_RATE })
- To change the sample rate, an AudioContext is required
Line 35
let analyserNode = audioContext.createAnalyser()
analyserNode.fftSize = 256
- An AudioContext works through nodes
- Create analyserNode from the AudioContext; this node converts the signal to the frequency domain
- Set its fftSize to 256
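What fftSize = 256 means at a 16000 Hz sample rate can be worked out with a little arithmetic. The sketch below uses a hypothetical helper name; it relies only on the facts that frequencyBinCount is half of fftSize and that the bins span 0 Hz up to half the sample rate.

```javascript
// Hypothetical helper that derives the analyser's bin layout
// from the sample rate and fftSize used in App.vue.
function describeBins(sampleRate, fftSize) {
  return {
    binCount: fftSize / 2,            // equals AnalyserNode.frequencyBinCount
    binWidthHz: sampleRate / fftSize, // frequency range covered by one bin
    maxFrequencyHz: sampleRate / 2    // highest representable frequency (Nyquist)
  }
}

const layout = describeBins(16000, 256)
// layout.binCount === 128, layout.binWidthHz === 62.5, layout.maxFrequencyHz === 8000
```

So the array read from the analyser has 128 entries, each covering 62.5 Hz.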
Line 38
let srcNode = new MediaStreamAudioSourceNode(audioContext, { mediaStream })
let destNode = new MediaStreamAudioDestinationNode(audioContext)
- Create srcNode from the AudioContext and the MediaStream
- Create destNode from the AudioContext
Line 41
srcNode.connect(analyserNode)
analyserNode.connect(destNode)
- Use connect() to link the nodes together
Line 44
mediaRecorder = new MediaRecorder(destNode.stream, { mimeType: 'audio/wav' })
- Use the MediaRecorder provided by extendable-media-recorder, and pass it the MediaStream processed by destNode instead of the raw microphone stream
- Set mimeType to audio/wav
Line 60
mediaRecorder.start(SAMPLE_INTERVAL)
- Start the MediaRecorder to begin recording
- Passing SAMPLE_INTERVAL as the timeslice makes the dataavailable event fire once per interval
Line 66
let onStop = () => {
mediaRecorder.stop()
}
- Stop the MediaRecorder to end recording
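onStop above only stops the recorder. A slightly more defensive variant might also release the microphone and close the AudioContext. The following is a sketch under assumptions: stopRecording is a hypothetical function, it expects the caller to have kept references to mediaStream and audioContext (App.vue keeps them local to onStart), and it assumes the recorder exposes the standard state property.

```javascript
// Hypothetical cleanup helper; every argument is optional so it is
// safe to call even if Start was never clicked.
function stopRecording({ mediaRecorder, mediaStream, audioContext } = {}) {
  // stop() on an inactive recorder throws, so check the state first
  if (mediaRecorder && mediaRecorder.state !== 'inactive') {
    mediaRecorder.stop()
  }
  // stopping the tracks releases the microphone
  if (mediaStream) {
    mediaStream.getTracks().forEach((track) => track.stop())
  }
  // a closed AudioContext cannot be closed again
  if (audioContext && audioContext.state !== 'closed') {
    audioContext.close()
  }
}
```

In App.vue this would mean storing mediaStream and audioContext next to mediaRecorder, outside onStart.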
Line 48
mediaRecorder.ondataavailable = () => {
let bufferLength = analyserNode.frequencyBinCount
let bufferArray = new Uint8Array(bufferLength)
analyserNode.getByteFrequencyData(bufferArray)
avgVolume.value = bufferArray.reduce((sum, value) => sum + value, 0) / bufferLength
isSpeaking.value = avgVolume.value > threshold.value
}
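- Use analyserNode's frequencyBinCount property and getByteFrequencyData() to obtain the frequency-domain data array
- Compute the average magnitude across the array
- If the average is greater than threshold, treat the sound as speech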
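The averaging and threshold steps described above can be isolated into two small pure functions, which makes them easy to try without a microphone. The function names and the synthetic arrays are made up for this sketch; getByteFrequencyData() fills the array with values from 0 to 255, so the threshold lives on that same scale.

```javascript
// Average of all frequency bins (each value is 0..255).
function averageMagnitude(bins) {
  if (bins.length === 0) return 0
  let sum = 0
  for (const value of bins) sum += value
  return sum / bins.length
}

// Same comparison as in ondataavailable: strictly greater than the threshold.
function isAboveThreshold(bins, threshold) {
  return averageMagnitude(bins) > threshold
}

// Synthetic data standing in for analyser output (128 bins, as with fftSize 256).
const quietBins = new Uint8Array(128).fill(10)
const loudBins = new Uint8Array(128).fill(90)

const quietResult = isAboveThreshold(quietBins, 40) // false
const loudResult = isAboveThreshold(loudBins, 40)   // true
```

Note that this only measures loudness: any sound above the threshold counts, not just a human voice, so the threshold needs tuning for the environment.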
Conclusion
- The analyser is part of the Web Audio API, so no additional package has to be installed for it