📅  Last modified: 2023-12-03 15:18:44.435000             🧑  Author: Mango
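pyAudioAnalysis is a Python library for automated audio analysis. It lets you extract a range of basic features from audio signals, including MFCCs, chroma features, and spectral features. It also provides machine-learning algorithms for audio classification and audio segmentation.
pyAudioAnalysis can be installed with pip: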
pip install pyAudioAnalysis
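The core of pyAudioAnalysis is a set of basic feature-extraction routines for analyzing audio signals. Some usage examples follow.
Extracting short-term features (including MFCCs):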
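from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures
# Read the sampling rate (Fs) and the samples (x)
[Fs, x] = audioBasicIO.read_audio_file("example.wav")
# Downmix to mono, since the extractor expects a one-dimensional signal
x = audioBasicIO.stereo_to_mono(x)
# 50 ms windows with a 25 ms step, both given in samples
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)
# F has one row per feature and one column per frame
print(F[:10,:])
print(f_names)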
Output:
[[ -1.10771925e+01 7.17295623e+00 -3.89977187e+00 ..., -2.17746104e-03
1.36942204e-03 2.71679681e-03]
[ -1.11364494e+01 5.18616729e+00 -5.85428659e+00 ..., -4.89605435e-03
-2.21909682e-03 2.54545813e-03]
[ -1.10366182e+01 5.57838672e+00 -5.70060243e+00 ..., -6.66926808e-03
-5.15356017e-04 5.85036646e-03]
...,
[ -1.13403435e+01 2.16350754e+00 -7.29761665e+00 ..., 1.56649664e-04
-1.14944582e-04 7.59752915e-03]
[ -1.14604782e+01 3.08406352e+00 -7.25918131e+00 ..., 2.74320643e-04
-2.36708061e-04 7.54662143e-03]
[ -1.14644931e+01 5.86509310e+00 -5.08687071e+00 ..., -4.90877685e-05
-1.21164118e-04 6.01811142e-03]]
['zcr', 'energy', 'energy_entropy', 'spectral_centroid', 'spectral_spread', 'spectral_entropy', 'spectral_flux', 'spectral_rolloff', 'mfcc_1', 'mfcc_2', 'mfcc_3', 'mfcc_4', 'mfcc_5', 'mfcc_6', 'mfcc_7', 'mfcc_8', 'mfcc_9', 'mfcc_10', 'mfcc_11', 'mfcc_12', 'mfcc_13', 'chroma_1', 'chroma_2', 'chroma_3', 'chroma_4', 'chroma_5', 'chroma_6', 'chroma_7', 'chroma_8', 'chroma_9', 'chroma_10', 'chroma_11', 'chroma_12']
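The matrix F mixes every short-term feature, so isolating the MFCCs means selecting rows by name. The sketch below shows one way to do that. `select_features` is a hypothetical helper, not part of pyAudioAnalysis, and the small `F` and `f_names` used here are stand-in data laid out like the extractor's output (one row per feature, one column per frame).

```python
import numpy as np

def select_features(F, f_names, prefix):
    """Return the rows of F whose feature name starts with `prefix`."""
    rows = [i for i, name in enumerate(f_names) if name.startswith(prefix)]
    return F[rows, :], [f_names[i] for i in rows]

# Stand-in data: 2 non-MFCC features plus 13 MFCCs, over 4 frames
f_names = ["zcr", "energy"] + ["mfcc_%d" % i for i in range(1, 14)]
F = np.arange(len(f_names) * 4, dtype=float).reshape(len(f_names), 4)

mfcc, mfcc_names = select_features(F, f_names, "mfcc_")
print(mfcc.shape)      # (13, 4)
print(mfcc_names[:3])  # ['mfcc_1', 'mfcc_2', 'mfcc_3']
```

The same helper works for any other feature family in the name list, for example the prefix "chroma_".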
Extracting chroma features:
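from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures
[Fs, x] = audioBasicIO.read_audio_file("example.wav")
# The chroma coefficients are rows of the short-term feature matrix;
# here the window and the step are both 2048 samples
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 2048, 2048)
rows = [i for i, n in enumerate(f_names) if n.startswith("chroma_") and n != "chroma_std"]
# Transpose so that each row is one frame and each column one pitch class
chromagram = F[rows, :].T
print(chromagram[:10,:])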
Output:
[[ 0.35746601 0.35916775 0.35555493 ..., 0.35871076 0.35459993
0.35811543]
[ 0.29008742 0.29740401 0.32949263 ..., 0.30715914 0.28494144
0.28178235]
[ 0.31292294 0.31044274 0.30272153 ..., 0.29879928 0.3122395
0.33805101]
...,
[ 0.25374944 0.24315214 0.25733328 ..., 0.25040299 0.25257476
0.26881339]
[ 0.24059433 0.23410736 0.21756114 ..., 0.2330967 0.23929748
0.23507692]
[ 0.18224029 0.18272188 0.18520109 ..., 0.20528284 0.19423394
0.19477973]]
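For intuition about what a chroma vector holds, the sketch below folds the spectrum of a single frame into 12 pitch classes. It is a simplified NumPy illustration, not the library's implementation: `chroma_vector` and its `f_ref` parameter are hypothetical names, and the Hann window and energy normalization are choices made for this example.

```python
import numpy as np

def chroma_vector(frame, fs, f_ref=440.0):
    """Fold the power spectrum of one frame into 12 pitch classes.

    Index 0 is the pitch class of f_ref (A when f_ref is 440 Hz).
    """
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    chroma = np.zeros(12)
    # Skip the DC bin (0 Hz), whose pitch is undefined
    semitones = np.round(12 * np.log2(freqs[1:] / f_ref)).astype(int)
    # Accumulate the energy of every bin into its pitch class
    np.add.at(chroma, semitones % 12, magnitude[1:] ** 2)
    total = chroma.sum()
    return chroma / total if total > 0 else chroma

fs = 16000
t = np.arange(2048) / fs
frame = np.sin(2 * np.pi * 440.0 * t)  # a pure A4 tone
c = chroma_vector(frame, fs)
print(int(np.argmax(c)))  # 0, the pitch class of A
```

Applying this to successive frames and stacking the results gives a matrix with one row per frame and 12 columns.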
Classifying audio signals with an SVM:
from pyAudioAnalysis import audioTrainTest as aT

# One folder of WAV files per class, all under "genres/"
genres = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]
for i, d in enumerate(genres):
    print("Folder", i, ",Class label ", d)
print("---------------------------")
# Extract features from every folder (mid-term window and step: 1.0 s,
# short-term window and step: 0.5 s), tune and train an SVM, and save
# the model to the file "svm_genres". Progress and the evaluation
# results are printed during the call.
aT.extract_features_and_train(["genres/%s" % d for d in genres],
                              1.0, 1.0, 0.5, 0.5,
                              "svm", "svm_genres", False)
Output:
Folder 0 ,Class label blues
Analyzing file no. 1 of 100: genres/blues/blues.00007.wav ...
Analyzing file no. 2 of 100: genres/blues/blues.00005.wav ...
Analyzing file no. 3 of 100: genres/blues/blues.00086.wav ...
Analyzing file no. 4 of 100: genres/blues/blues.00008.wav ...
Analyzing file no. 5 of 100: genres/blues/blues.00009.wav ...
...
Folder 9 ,Class label rock
Analyzing file no. 1 of 100: genres/rock/rock.00006.wav ...
Analyzing file no. 2 of 100: genres/rock/rock.00070.wav ...
Analyzing file no. 3 of 100: genres/rock/rock.00063.wav ...
Analyzing file no. 4 of 100: genres/rock/rock.00007.wav ...
Analyzing file no. 5 of 100: genres/rock/rock.00006.wav ...
...
-----------------------------
Overall classification accuracy: 60.0 %
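A classifier needs one fixed-length vector per recording, whatever its duration. A common approach, which pyAudioAnalysis's mid-term features also appear to follow, is to summarize the short-term features over longer windows with their mean and standard deviation, then average those summaries over the file. The sketch below illustrates the idea under that assumption. `mid_term_stats` is a hypothetical helper, and the random matrix stands in for real short-term features.

```python
import numpy as np

def mid_term_stats(short_feats, mid_window, mid_step):
    """Summarize short-term features over sliding mid-term windows.

    short_feats: array of shape (n_features, n_frames)
    mid_window, mid_step: window length and hop, in short-term frames
    Returns an array of shape (2 * n_features, n_windows): the per-window
    means stacked on top of the per-window standard deviations.
    """
    n_frames = short_feats.shape[1]
    columns = []
    for start in range(0, n_frames, mid_step):
        chunk = short_feats[:, start:start + mid_window]
        columns.append(np.concatenate([chunk.mean(axis=1), chunk.std(axis=1)]))
    return np.array(columns).T

# Stand-in data: 3 features over 40 short-term frames
rng = np.random.default_rng(0)
short_feats = rng.normal(size=(3, 40))
mid = mid_term_stats(short_feats, mid_window=20, mid_step=20)
print(mid.shape)          # (6, 2)

# Averaging over the mid-term windows gives one vector per recording
file_vector = mid.mean(axis=1)
print(file_vector.shape)  # (6,)
```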
Segmenting an audio signal by removing silence:
from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import audioSegmentation

[Fs, x] = audioBasicIO.read_audio_file("example.wav")
# 20 ms short-term window and step; each segment is a [start, end] pair in seconds
segments = audioSegmentation.silence_removal(x, Fs, 0.020, 0.020,
                                             smooth_window=1.0,
                                             weight=0.3, plot=True)
for s in segments:
    print(s[0], s[1])
Output:
...
0.9548299319727891 2.4319274376417234
2.618140589569161 4.072108843537415
5.977777777777778 6.992190476190478
...
The audio signal has been segmented successfully!
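To see what kind of result the segmentation call produces, here is a much simpler stand-in that marks a frame as active when its energy exceeds a fraction of the loudest frame, then merges consecutive active frames into [start, end] pairs. This is not how the library decides (its method is more elaborate). `detect_active_segments` and `threshold_ratio` are hypothetical names chosen for this sketch.

```python
import numpy as np

def detect_active_segments(x, fs, win=0.020, step=0.020, threshold_ratio=0.1):
    """Return [start, end] times (seconds) of runs of high-energy frames.

    A frame counts as active when its mean energy exceeds
    threshold_ratio times the maximum frame energy.
    """
    win_n, step_n = int(win * fs), int(step * fs)
    starts = range(0, len(x) - win_n + 1, step_n)
    energy = np.array([np.mean(x[s:s + win_n] ** 2) for s in starts])
    active = energy > threshold_ratio * energy.max()

    segments, seg_start = [], None
    for i, is_active in enumerate(active):
        if is_active and seg_start is None:
            seg_start = i * step                      # a segment opens
        elif not is_active and seg_start is not None:
            segments.append([seg_start, (i - 1) * step + win])
            seg_start = None                          # the segment closes
    if seg_start is not None:                         # still open at the end
        segments.append([seg_start, (len(active) - 1) * step + win])
    return segments

# Synthetic test signal: 1 s of silence, 1 s of tone, 1 s of silence
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
x = np.concatenate([np.zeros(fs), tone, np.zeros(fs)])
for start, end in detect_active_segments(x, fs):
    print(round(start, 2), round(end, 2))  # 1.0 2.0
```

A fixed energy threshold like this is fragile on noisy recordings, which is one reason to prefer the library's smoothed, data-driven approach for real audio.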