Benchmark

Base benchmark

Time required to compute the mel spectrogram of 1000 samples at TimeStep values of 1, 5, 10, 100, 500, 1000, 2000, and 3000, with fft_len=2048, slide_len=512, and sampling_rate=32000.
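As a rough illustration of how a measurement like this can be set up, the following is a minimal timing sketch and not the project's benchmark script. It assumes that TimeStep is the number of frames per sample and that frames are taken without padding, so a sample holds `fft_len + (TimeStep - 1) * slide_len` values. The names `sample_length`, `power_spectrogram`, and `time_feature` are hypothetical, and the NumPy power spectrum is only a stand-in workload for a real mel-spectrogram call.

```python
import time

import numpy as np

FFT_LEN = 2048
SLIDE_LEN = 512


def sample_length(time_step: int) -> int:
    """Values needed for `time_step` frames (assumption: no padding)."""
    return FFT_LEN + (time_step - 1) * SLIDE_LEN


def power_spectrogram(x: np.ndarray) -> np.ndarray:
    """Stand-in workload: Hann-windowed power spectrum per frame (no mel filter bank)."""
    n_frames = 1 + (len(x) - FFT_LEN) // SLIDE_LEN
    # Row i selects the indices of frame i: [i * SLIDE_LEN, i * SLIDE_LEN + FFT_LEN).
    idx = np.arange(FFT_LEN)[None, :] + SLIDE_LEN * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(FFT_LEN)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def time_feature(fn, time_step: int, n_samples: int = 1000) -> float:
    """Total wall-clock seconds to run `fn` once on each of `n_samples` random signals."""
    rng = np.random.default_rng(0)
    data = rng.standard_normal((n_samples, sample_length(time_step)), dtype=np.float32)
    start = time.perf_counter()
    for x in data:
        fn(x)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Only small TimeStep values here, to keep the random input data small.
    for ts in (1, 5, 10):
        print(f"TimeStep={ts}: {time_feature(power_spectrogram, ts):.5f}s")
```

To time a real library, pass its mel-spectrogram function as `fn` in place of the stand-in. Input generation is kept outside the timed region so that only feature extraction is measured.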

Linux - AMD

  • OS: Ubuntu 20.04.4 LTS

  • CPU: AMD Ryzen Threadripper 3970X 32-Core Processor

_images/linux_amd_1.png

TimeStep  audioflux  torchaudio  librosa
1         0.04294s   0.07707s    2.41958s
5         0.14878s   1.05589s    3.52610s
10        0.18374s   0.83975s    3.46499s
100       0.67030s   0.61876s    6.63217s
500       0.94893s   1.29189s    16.45968s
1000      1.43854s   2.23126s    27.78358s
2000      3.08714s   4.10869s    45.12714s
3000      4.90343s   5.86299s    51.62876s

Linux - Intel

  • OS: Ubuntu 20.04.4 LTS

  • CPU: Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz

_images/linux_intel_1.png

TimeStep  audioflux  torchaudio  librosa
1         0.08106s   0.11043s    5.51295s
5         0.11654s   0.16005s    5.77631s
10        0.29173s   0.15352s    6.13656s
100       1.18150s   0.39958s    10.61641s
500       2.23883s   1.58323s    28.99823s
1000      4.42723s   3.98896s    51.97518s
2000      8.73121s   8.28444s    61.13923s
3000      13.07378s  12.14323s   70.06395s

macOS - Intel

  • OS: macOS 12.6.1 (21G217)

  • CPU: 3.8GHz 8‑core 10th-generation Intel Core i7, Turbo Boost up to 5.0GHz

_images/mac_x86_1.png

TimeStep  audioflux  torchaudio  librosa
1         0.07605s   0.06451s    1.70139s
5         0.14946s   0.08464s    1.86964s
10        0.16641s   0.10762s    2.00865s
100       0.46902s   0.83551s    3.28890s
500       1.08860s   5.05824s    8.98265s
1000      2.64029s   9.78269s    18.24391s
2000      5.40025s   15.08991s   33.68184s
3000      7.92596s   24.84823s   47.35941s

macOS - M1

  • OS: macOS 12.4 (21F79)

  • CPU: Apple M1

_images/mac_arm_1.png

TimeStep  audioflux  torchaudio  librosa
1         0.06110s   0.06874s    2.22518s
5         0.23444s   0.07922s    2.55907s
10        0.20691s   0.11090s    2.71813s
100       0.68694s   0.63625s    4.74433s
500       1.47420s   3.37597s    13.83887s
1000      3.00926s   6.76275s    25.24646s
2000      5.99781s   12.69573s   47.84029s
3000      8.76306s   19.03391s   69.40428s

Benchmark Script

https://github.com/libAudioFlux/audioFlux/tree/master/benchmark

Other Test

Server performance

Server hardware:

  • CPU: AMD Ryzen Threadripper 3970X 32-Core Processor

  • Memory: 128 GB

Each sample is 128 ms of audio (sampling rate: 32000, data length: 4096).

Total time spent extracting features from 1000 samples:

Lib     audioFlux  librosa  pyAudioAnalysis  python_speech_features
Mel     0.777s     2.967s   -                -
MFCC    0.797s     2.963s   0.805s           2.150s
CQT     5.743s     21.477s  -                -
Chroma  0.155s     2.174s   1.287s           -
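The totals are easier to compare as per-sample latencies and as ratios against a baseline. The sketch below does that arithmetic for the audioFlux and librosa figures listed above. The helper names `per_sample_ms` and `speedup` are illustrative and do not belong to any of the libraries.

```python
# Total seconds for 1000 samples, copied from the audioFlux and librosa figures above.
SERVER_TIMES_S = {
    "Mel": {"audioFlux": 0.777, "librosa": 2.967},
    "MFCC": {"audioFlux": 0.797, "librosa": 2.963},
    "CQT": {"audioFlux": 5.743, "librosa": 21.477},
    "Chroma": {"audioFlux": 0.155, "librosa": 2.174},
}


def per_sample_ms(total_s: float, n_samples: int = 1000) -> float:
    """Average time per sample in milliseconds."""
    return total_s / n_samples * 1000.0


def speedup(feature: str, baseline: str = "librosa", target: str = "audioFlux") -> float:
    """How many times faster `target` is than `baseline` for one feature."""
    times = SERVER_TIMES_S[feature]
    return times[baseline] / times[target]


if __name__ == "__main__":
    for feature, times in SERVER_TIMES_S.items():
        print(
            f"{feature}: {per_sample_ms(times['audioFlux']):.3f} ms/sample, "
            f"{speedup(feature):.1f}x vs librosa"
        )
```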

Mobile performance

Each frame is 128 ms of audio (sampling rate: 32000, data length: 4096).

Time spent extracting features from one frame:

Mobile  iPhone 13 Pro  iPhone X  Honor V40  OPPO Reno4 SE 5G
Mel     0.249ms        0.359ms   0.313ms    0.891ms
MFCC    0.249ms        0.361ms   0.315ms    1.116ms
CQT     0.350ms        0.609ms   0.786ms    1.779ms
Chroma  0.354ms        0.615ms   0.803ms    1.775ms
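One way to read these per-frame times is against the duration of the frame itself: 4096 values at 32000 Hz last 128 ms, so any processing time well below that leaves headroom for real-time use. The sketch below computes that ratio for the Mel times above. The term "real-time factor" and the helper names are introduced here for illustration and do not come from the benchmark.

```python
def frame_duration_ms(n_samples: int, sampling_rate: int) -> float:
    """Duration of one frame of audio in milliseconds."""
    return 1000.0 * n_samples / sampling_rate


def real_time_factor(processing_ms: float, frame_ms: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_ms / frame_ms


# Mel extraction time per frame, in milliseconds, copied from the figures above.
MEL_MS = {
    "iPhone 13 Pro": 0.249,
    "iPhone X": 0.359,
    "Honor V40": 0.313,
    "OPPO Reno4 SE 5G": 0.891,
}

if __name__ == "__main__":
    frame_ms = frame_duration_ms(4096, 32000)
    for device, ms in MEL_MS.items():
        print(f"{device}: real-time factor {real_time_factor(ms, frame_ms):.5f}")
```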