📜  pyAudioAnalysis - Python (1)

📅  Last modified: 2023-12-03 15:18:44.435000             🧑  Author: Mango

pyAudioAnalysis - Python Audio Analysis

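pyAudioAnalysis is a Python library for automated audio analysis. It extracts basic features from audio signals, including MFCCs, chroma features, and spectral features. It also provides machine-learning routines for audio classification and audio segmentation.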

Installation

pyAudioAnalysis can be installed with pip:

pip install pyAudioAnalysis
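
One quick way to confirm that the installation worked is to ask the standard library whether the package can be found on the import path. This is only a minimal sketch; the helper name `is_installed` is made up for illustration and is not part of pyAudioAnalysis.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("pyAudioAnalysis installed:", is_installed("pyAudioAnalysis"))
```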

Usage

The core of pyAudioAnalysis is a set of feature-extraction routines for analyzing audio signals. Some usage examples follow:

Feature extraction

Extract the short-term features (MFCCs included):

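from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures

# Read the file: Fs is the sampling rate in Hz, x holds the samples
[Fs, x] = audioBasicIO.read_audio_file("example.wav")
x = audioBasicIO.stereo_to_mono(x)  # feature extraction expects one channel
# 50 ms window and 25 ms step, both expressed in samples
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)
print(F[:10,:])   # rows are features, columns are frames
print(f_names)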

Output:

[[ -1.10771925e+01   7.17295623e+00  -3.89977187e+00 ...,  -2.17746104e-03
    1.36942204e-03   2.71679681e-03]
 [ -1.11364494e+01   5.18616729e+00  -5.85428659e+00 ...,  -4.89605435e-03
   -2.21909682e-03   2.54545813e-03]
 [ -1.10366182e+01   5.57838672e+00  -5.70060243e+00 ...,  -6.66926808e-03
   -5.15356017e-04   5.85036646e-03]
 ...,
 [ -1.13403435e+01   2.16350754e+00  -7.29761665e+00 ...,   1.56649664e-04
   -1.14944582e-04   7.59752915e-03]
 [ -1.14604782e+01   3.08406352e+00  -7.25918131e+00 ...,   2.74320643e-04
   -2.36708061e-04   7.54662143e-03]
 [ -1.14644931e+01   5.86509310e+00  -5.08687071e+00 ...,  -4.90877685e-05
   -1.21164118e-04   6.01811142e-03]]
['zcr', 'energy', 'energy_entropy', 'spectral_centroid', 'spectral_spread', 'spectral_entropy', 'spectral_flux', 'spectral_rolloff', 'mfcc_1', 'mfcc_2', 'mfcc_3', 'mfcc_4', 'mfcc_5', 'mfcc_6', 'mfcc_7', 'mfcc_8', 'mfcc_9', 'mfcc_10', 'mfcc_11', 'mfcc_12', 'mfcc_13', 'chroma_1', 'chroma_2', 'chroma_3', 'chroma_4', 'chroma_5', 'chroma_6', 'chroma_7', 'chroma_8', 'chroma_9', 'chroma_10', 'chroma_11', 'chroma_12']
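
To make the window and step arguments more concrete, the following sketch frames a signal by hand and computes two of the listed features, `zcr` and `energy`, for each frame. It uses only the standard library and a synthetic sine wave; the helper names and the 8000 Hz sampling rate are assumptions for illustration, and the formulas are textbook definitions rather than a copy of the library's code.

```python
import math


def frame_signal(signal, window, step):
    """Split `signal` into frames of `window` samples, advancing by `step`."""
    window, step = int(window), int(step)
    frames = []
    start = 0
    while start + window <= len(signal):
        frames.append(signal[start:start + window])
        start += step
    return frames


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


fs = 8000  # assumed sampling rate in Hz
# One second of a 440 Hz sine wave as a stand-in for a real recording.
x = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]

frames = frame_signal(x, 0.050 * fs, 0.025 * fs)  # 50 ms window, 25 ms step
features = [(zero_crossing_rate(f), energy(f)) for f in frames]
print(len(frames), "frames of", len(frames[0]), "samples")
print("first frame (zcr, energy):", features[0])
```

Because the step is half the window, consecutive frames overlap by 50%, which is why one second of audio yields more frames than the number of 50 ms windows that fit side by side.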

Extract chroma features:

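from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures

[Fs, x] = audioBasicIO.read_audio_file("example.wav")
x = audioBasicIO.stereo_to_mono(x)
# Chroma values are rows of the short-term feature matrix; select them by name
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)
chroma_rows = [i for i, name in enumerate(f_names) if name.startswith("chroma_")]
chromagram = F[chroma_rows, :]
print(chromagram[:10,:])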

Output:

[[ 0.35746601  0.35916775  0.35555493 ...,  0.35871076  0.35459993
   0.35811543]
 [ 0.29008742  0.29740401  0.32949263 ...,  0.30715914  0.28494144
   0.28178235]
 [ 0.31292294  0.31044274  0.30272153 ...,  0.29879928  0.3122395
   0.33805101]
 ..., 
 [ 0.25374944  0.24315214  0.25733328 ...,  0.25040299  0.25257476
   0.26881339]
 [ 0.24059433  0.23410736  0.21756114 ...,  0.2330967   0.23929748
   0.23507692]
 [ 0.18224029  0.18272188  0.18520109 ...,  0.20528284  0.19423394
   0.19477973]]
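
The idea behind a chroma vector can be sketched in a few lines: each spectral component is assigned to one of 12 pitch classes regardless of octave, and the energy per class is accumulated. The code below is an illustrative sketch using only the standard library. The function names, the A4 = 440 Hz reference, and the list of spectral peaks are assumptions, and this is not how pyAudioAnalysis computes its chroma features internally.

```python
import math

PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency to one of 12 pitch classes (0 = A), ignoring octave."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / ref_a4))
    return semitones_from_a4 % 12


def chroma_vector(spectrum):
    """Fold (frequency_hz, magnitude) pairs into a normalised 12-bin vector."""
    bins = [0.0] * 12
    for freq_hz, magnitude in spectrum:
        if freq_hz > 0:
            bins[pitch_class(freq_hz)] += magnitude ** 2
    total = sum(bins)
    return [b / total for b in bins] if total > 0 else bins


# Hypothetical spectral peaks of an A major chord (A4, C#5, E5) plus A5.
peaks = [(440.0, 1.0), (554.37, 0.8), (659.25, 0.6), (880.0, 0.5)]
chroma = chroma_vector(peaks)
for name, value in zip(PITCH_CLASSES, chroma):
    print(f"{name:>2}: {value:.3f}")
```

A4 and A5 land in the same bin, so the A class collects the energy of both octaves.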

Audio classification

Classify audio signals with an SVM:

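import glob
import os
from pyAudioAnalysis import audioTrainTest as aT

# Assumed layout: genres/<genre>/*.wav for training and
# genres_test/<genre>/*.wav for held-out evaluation
folders = ["blues", "classical", "country", "disco", "hiphop",
           "jazz", "metal", "pop", "reggae", "rock"]
for i, d in enumerate(folders):
    print("Folder", i, ",Class label ", d)

# Extract mid-term features (1 s window and step, 50 ms short-term frames)
# from every folder, train an SVM, and save the model as "svm_genres"
aT.extract_features_and_train([os.path.join("genres", d) for d in folders],
                              1.0, 1.0, 0.050, 0.050,
                              "svm", "svm_genres", False)
print("---------------------------")

# Classify the held-out files with the saved model and count the hits
correct, total = 0, 0
for d in folders:
    for path in glob.glob(os.path.join("genres_test", d, "*.wav")):
        class_id, _, class_names = aT.file_classification(path, "svm_genres", "svm")
        correct += int(class_names[int(class_id)] == d)
        total += 1
print("Overall classification accuracy: ", round(100.0 * correct / max(total, 1), 2), "%")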

Output:

Folder 0 ,Class label  blues
Analyzing file no. 1 of 100: genres/blues/blues.00007.wav ...
Analyzing file no. 2 of 100: genres/blues/blues.00005.wav ...
Analyzing file no. 3 of 100: genres/blues/blues.00086.wav ...
Analyzing file no. 4 of 100: genres/blues/blues.00008.wav ...
Analyzing file no. 5 of 100: genres/blues/blues.00009.wav ...
...
Folder 9 ,Class label  rock
Analyzing file no. 1 of 100: genres/rock/rock.00006.wav ...
Analyzing file no. 2 of 100: genres/rock/rock.00070.wav ...
Analyzing file no. 3 of 100: genres/rock/rock.00063.wav ...
Analyzing file no. 4 of 100: genres/rock/rock.00007.wav ...
Analyzing file no. 5 of 100: genres/rock/rock.00006.wav ...
...
-----------------------------
Overall classification accuracy:  60.0 %
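
An overall accuracy figure like the one above is simply the share of test clips whose predicted label matches the true label. The sketch below shows that arithmetic, along with a per-pair breakdown that reveals which classes get confused. The labels are made up for illustration and the helper names are assumptions, not pyAudioAnalysis functions.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy_percent(true_labels, predicted_labels):
    """Percentage of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Made-up labels for five test clips, for illustration only.
y_true = ["blues", "blues", "rock", "rock", "jazz"]
y_pred = ["blues", "rock", "rock", "rock", "blues"]

print("Overall classification accuracy: ",
      round(accuracy_percent(y_true, y_pred), 2), "%")
for (truth, guess), n in sorted(confusion_counts(y_true, y_pred).items()):
    print(f"true={truth:<6} predicted={guess:<6} count={n}")
```

A single accuracy number hides which genres are mixed up, so the pair counts are often the more useful diagnostic.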

Audio segmentation

Remove silence and split the signal into its active segments:

from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import audioSegmentation

[Fs, x] = audioBasicIO.read_audio_file("example.wav")
# 20 ms short-term window and step; plot=True also draws the detected segments
segments = audioSegmentation.silence_removal(x, Fs, 0.020, 0.020,
                                            smooth_window=1.0,
                                            weight=0.3, plot=True)
# Each segment is a [start, end] pair in seconds
for s in segments:
    print(s[0], s[1])

Output:

...
0.9548299319727891 2.4319274376417234
2.618140589569161 4.072108843537415
5.977777777777778 6.992190476190478
...

The audio signal has been split into segments; each line gives one segment's start and end time in seconds.
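
For intuition about how start and end times like these can arise, here is a much simpler stand-in: compute the energy of each short frame, keep the frames above a fixed threshold, and merge neighbouring frames into segments. This is only a sketch under stated assumptions (a hand-picked threshold, a synthetic signal, a hypothetical `active_segments` helper). The library's `silence_removal` is more involved and does not rely on a fixed threshold like this.

```python
def active_segments(signal, fs, win_sec, step_sec, threshold):
    """Return [start, end] times in seconds where frame energy > threshold."""
    win = int(round(win_sec * fs))
    step = int(round(step_sec * fs))
    segments = []
    start = 0
    while start + win <= len(signal):
        frame = signal[start:start + win]
        frame_energy = sum(s * s for s in frame) / win
        if frame_energy > threshold:
            t0, t1 = start / fs, (start + win) / fs
            if segments and t0 <= segments[-1][1]:
                segments[-1][1] = t1       # extend the current segment
            else:
                segments.append([t0, t1])  # open a new segment
        start += step
    return segments


fs = 1000  # assumed sampling rate in Hz
# Synthetic signal: 1 s of silence, 1 s of constant amplitude, 1 s of silence.
x = [0.0] * fs + [0.5] * fs + [0.0] * fs

for s in active_segments(x, fs, 0.020, 0.020, threshold=0.01):
    print(s[0], s[1])
```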

References
  • https://github.com/tyiannak/pyAudioAnalysis