BFT - Based Fourier Transform, similar to the short-time Fourier transform (STFT)
- class audioflux.BFT(num, radix2_exp=12, samplate=32000, low_fre=None, high_fre=None, bin_per_octave=12, window_type=WindowType.HANN, slide_length=None, scale_type=SpectralFilterBankScaleType.LINEAR, style_type=SpectralFilterBankStyleType.SLANEY, normal_type=SpectralFilterBankNormalType.NONE, data_type=SpectralDataType.MAG, is_reassign=False, is_temporal=False)
Based Fourier Transform, similar to the short-time Fourier transform (STFT).
- Parameters
- num: int
Number of frequency bins to generate, starting at low_fre.
- radix2_exp: int
fft_length=2**radix2_exp
- samplate: int
Sampling rate of the incoming audio.
- low_fre: float or None
Lowest frequency.
Linear/Linspace/Mel/Bark/Erb: low_fre>=0. Default: 0.0
Octave/Log: low_fre>=32.703. Default: 32.703 (C1)
- high_fre: float or None
Highest frequency. Default is 16000 (samplate/2).
For the Linear scale it does not need to be provided; it is derived from samplate / (2 ** radix2_exp).
For the Octave scale it does not need to be provided; it is derived from musical pitch.
- bin_per_octave: int
Number of bins per octave.
Required only for the Octave scale.
Usually set to 12, 24 or 36.
- window_type: WindowType
Window type for each frame.
See: type.WindowType
- slide_length: int or None
Window sliding length.
If slide_length is None, then slide_length = fft_length / 4.
- scale_type: SpectralFilterBankScaleType
Spectral filter bank type. It determines the type of spectrogram.
- style_type: SpectralFilterBankStyleType
Spectral filter bank style type. It determines the bank type of window.
- normal_type: SpectralFilterBankNormalType
Spectral filter normal type. It determines the type of normalization.
It does not need to be provided for the Linear scale.
- data_type: SpectralDataType
Spectrogram data type.
It can be set to mag or power. If you need the dB type, set the power type and then call the power_to_db method.
- is_reassign: bool
Whether to use reassignment.
- is_temporal: bool
Whether to get temporal data.
If True, you can call the get_temporal_data method to get the energy/rms/zeroCrossRate features.
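The derived quantities described in the parameter list above can be checked with plain arithmetic. The following is a minimal sketch, not audioflux code: the helper name `linear_bft_layout` is hypothetical, and it assumes the Linear scale, with bins spaced by samplate / (2 ** radix2_exp) starting at low_fre, and a default slide_length computed with integer division.

```python
def linear_bft_layout(num, radix2_exp=12, samplate=32000, low_fre=0.0, slide_length=None):
    """Hypothetical helper: derive FFT length, hop size and linear bin frequencies."""
    fft_length = 2 ** radix2_exp
    # Documented default: a quarter of the FFT length (integer division assumed here).
    if slide_length is None:
        slide_length = fft_length // 4
    # Assumed spacing of Linear-scale bins, in Hz.
    bin_width = samplate / fft_length
    freqs = [low_fre + i * bin_width for i in range(num)]
    return fft_length, slide_length, bin_width, freqs


fft_length, slide_length, bin_width, freqs = linear_bft_layout(num=2049)
print(fft_length, slide_length, bin_width, freqs[-1])
```

With the class defaults (radix2_exp=12, samplate=32000), num=2049 bins starting at 0 Hz end exactly at samplate/2, which matches the high_fre=16000. used in the example that follows.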
Examples
Read 220Hz audio data
>>> import audioflux as af
>>> audio_path = af.utils.sample_path('220')
>>> audio_arr, sr = af.read(audio_path)
Create a BFT object with the Linear scale (STFT)
>>> from audioflux.type import (SpectralFilterBankScaleType, SpectralFilterBankStyleType,
...                             WindowType, SpectralDataType)
>>> obj = af.BFT(num=2049, radix2_exp=12, samplate=sr, low_fre=0., high_fre=16000.,
...              window_type=WindowType.HANN, slide_length=1024,
...              scale_type=SpectralFilterBankScaleType.LINEAR,
...              style_type=SpectralFilterBankStyleType.SLANEY,
...              data_type=SpectralDataType.POWER)
Extract the spectrogram in dB
>>> import numpy as np
>>> from audioflux.utils import power_to_db
>>> spec_arr = obj.bft(audio_arr)
>>> spec_arr = np.abs(spec_arr)
>>> spec_dB_arr = power_to_db(spec_arr)
Show spectrogram plot
>>> import matplotlib.pyplot as plt
>>> from audioflux.display import fill_spec
>>> audio_len = audio_arr.shape[-1]
>>> fig, ax = plt.subplots()
>>> img = fill_spec(spec_dB_arr, axes=ax,
...                 x_coords=obj.x_coords(audio_len),
...                 y_coords=obj.y_coords(),
...                 x_axis='time', y_axis='log',
...                 title='BFT-Linear Spectrogram')
>>> fig.colorbar(img, ax=ax, format="%+2.0f dB")
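For reference, a power value is conventionally converted to decibels as 10 * log10(power / ref). The sketch below illustrates that textbook formula on scalars. It is not the implementation of audioflux.utils.power_to_db, whose reference level and clipping behaviour may differ; the name `power_to_db_sketch` and the `ref` and `floor` parameters are assumptions made for illustration.

```python
import math


def power_to_db_sketch(power, ref=1.0, floor=1e-10):
    """Illustrative textbook conversion from power to decibels (not audioflux's code)."""
    # Clamp to a small floor so that log10 never receives zero.
    return 10.0 * math.log10(max(power, floor) / ref)


db_unit = power_to_db_sketch(1.0)       # power equal to the reference level
db_hundred = power_to_db_sketch(100.0)  # power 100 times the reference level
db_silence = power_to_db_sketch(0.0)    # clamped to the floor instead of failing
print(db_unit, db_hundred, db_silence)
```

Because decibels are defined on power, the example above creates the BFT object with data_type=SpectralDataType.POWER before converting.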
Methods
bft(data_arr[, result_type]): Get spectrogram data.
cal_time_length(data_length): Calculate the length of a frame from audio data.
get_bin_band_arr(): Get bin band array.
get_fre_band_arr(): Get an array of frequency bands of different scales.
get_temporal_data(): Get energy/rms/zeroCrossRate feature.
set_data_norm_value(norm_value): Set data norm value.
set_result_type(result_type): Set result type.
x_coords(data_length): Get the X-axis coordinate.
y_coords(): Get the Y-axis coordinate.
- cal_time_length(data_length)
Calculate the time length (number of frames) for audio data of the given length:
fft_length = 2 ** radix2_exp
time_length = (data_length - fft_length) // slide_length + 1
- Parameters
- data_length: int
The length of the data to be calculated.
- Returns
- out: int
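The formula above can be reproduced in a few lines of plain Python. This is a minimal sketch rather than the library's code: the function name `frame_count` is hypothetical, the default slide_length is assumed to use integer division, and raising an error for inputs shorter than one FFT window is a choice made for this sketch, not documented behaviour.

```python
def frame_count(data_length, radix2_exp=12, slide_length=None):
    """Hypothetical re-implementation of the documented frame-count formula."""
    fft_length = 2 ** radix2_exp
    if slide_length is None:
        slide_length = fft_length // 4
    # Assumption of this sketch: reject inputs shorter than one FFT window.
    if data_length < fft_length:
        raise ValueError("data_length must be at least fft_length")
    return (data_length - fft_length) // slide_length + 1


one_second = frame_count(32000, radix2_exp=12, slide_length=1024)
single_window = frame_count(4096)
print(one_second, single_window)
```

One second of audio at the default sampling rate of 32000 therefore yields 28 frames with radix2_exp=12 and slide_length=1024, and an input of exactly one FFT window yields a single frame.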
- get_fre_band_arr()
Get an array of frequency bands of different scales. The scale is determined by the scale_type set at initialization.
- Returns
- out: np.ndarray [shape=(fre,)]
- get_bin_band_arr()
Get bin band array
- Returns
- out: np.ndarray [shape=(n_bin,)]
- set_result_type(result_type)
Set result type.
- Parameters
- result_type: int, 0 or 1
If 0, then the result is a matrix of complex numbers.
If 1, then the result is a matrix of real numbers.
- set_data_norm_value(norm_value)
Set data norm value
- Parameters
- norm_value: float
- bft(data_arr, result_type=0)
Get spectrogram data
- Parameters
- data_arr: np.ndarray [shape=(…, n)]
Input audio data
- result_type: int, 0 or 1
If 0, then the result is a matrix of complex numbers.
If 1, then the result is a matrix of real numbers.
- Returns
- m_data_arr: np.ndarray [shape=(…, fre, time), dtype=(np.complex or np.float32)]
The matrix of BFT
- get_temporal_data()
Get energy/rms/zeroCrossRate feature.
The bft method must be called first.
- Returns
- energy_arr: np.ndarray [shape=(…, time)]
energy feature
- rms_arr: np.ndarray [shape=(…, time)]
rms feature
- zero_cross_arr: np.ndarray [shape=(…, time)]
zero cross rate feature
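To make the three per-frame features concrete, the sketch below computes them with common textbook definitions: energy as the sum of squared samples, rms as the square root of the mean square, and zero-crossing rate as the fraction of sign changes per frame. These definitions, the absence of windowing, and the function name `temporal_features` are assumptions for illustration; audioflux's exact normalization may differ.

```python
import math


def temporal_features(samples, frame_length, slide_length):
    """Illustrative per-frame energy, rms and zero-crossing rate (assumed definitions)."""
    energy, rms, zero_cross = [], [], []
    for start in range(0, len(samples) - frame_length + 1, slide_length):
        frame = samples[start:start + frame_length]
        frame_energy = sum(x * x for x in frame)
        # Count adjacent sample pairs whose signs differ.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        energy.append(frame_energy)
        rms.append(math.sqrt(frame_energy / frame_length))
        zero_cross.append(crossings / frame_length)
    return energy, rms, zero_cross


signal = [1.0, -1.0] * 8  # 16 samples that alternate in sign
energy, rms, zero_cross = temporal_features(signal, frame_length=8, slide_length=4)
print(energy, rms, zero_cross)
```

Each returned list has one entry per frame, mirroring the (…, time) shape of the arrays documented above.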
- y_coords()
Get the Y-axis coordinate
- Returns
- out: np.ndarray [shape=(fre,)]
- x_coords(data_length)
Get the X-axis coordinate
- Parameters
- data_length: int
The length of the data to be calculated.
- Returns
- out: np.ndarray [shape=(time,)]