BFT - Based Fourier Transform, similar to the short-time Fourier transform (STFT)

class audioflux.BFT(num, radix2_exp=12, samplate=32000, low_fre=None, high_fre=None, bin_per_octave=12, window_type=WindowType.HANN, slide_length=None, scale_type=SpectralFilterBankScaleType.LINEAR, style_type=SpectralFilterBankStyleType.SLANEY, normal_type=SpectralFilterBankNormalType.NONE, data_type=SpectralDataType.MAG, is_reassign=False, is_temporal=False)

Based Fourier Transform, similar to the short-time Fourier transform (STFT).

Parameters

num: int

Number of frequency bins to generate, starting at low_fre.

radix2_exp: int

Exponent of the FFT length: fft_length = 2 ** radix2_exp

samplate: int

Sampling rate of the incoming audio.

low_fre: float or None

Lowest frequency. The valid range and the default depend on scale_type:

Linear/Linspace/Mel/Bark/Erb: low_fre >= 0. Default: 0.0
Octave/Log: low_fre >= 32.703. Default: 32.703 (C1)

high_fre: float or None

Highest frequency. Default is samplate / 2 (16000 for the default samplate).

Linear: does not need to be provided; it is derived from samplate / (2 ** radix2_exp).
Octave: does not need to be provided; it is derived from musical pitch.
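
To make the Linear case concrete, the sketch below evaluates samplate / (2 ** radix2_exp) for the default parameters. The helper name `linear_bin_spacing` is hypothetical and not part of audioflux. The last step assumes, purely for illustration, that the `num` bins are evenly spaced starting at low_fre; the library's exact rule may differ.

```python
def linear_bin_spacing(samplate: int, radix2_exp: int) -> float:
    """Frequency spacing in Hz between adjacent FFT bins (hypothetical helper)."""
    fft_length = 2 ** radix2_exp
    return samplate / fft_length


spacing = linear_bin_spacing(samplate=32000, radix2_exp=12)
print(spacing)  # 7.8125

# Assumption for illustration: num evenly spaced bins starting at low_fre.
num, low_fre = 2049, 0.0
highest_bin = low_fre + (num - 1) * spacing
print(highest_bin)  # 16000.0
```

Under this assumption, num = 2049 bins starting at 0 Hz reach exactly samplate / 2, which matches the parameters used in the Examples section.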

bin_per_octave: int

Number of bins per octave.

Must be provided only when scale_type is Octave.

Usually set to 12, 24 or 36.
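
As a rough illustration of what bin_per_octave controls, the sketch below generates band frequencies under the assumption of equal-tempered spacing, where each bin is a factor of 2 ** (1 / bin_per_octave) above the previous one. This formula and the name `octave_band_frequencies` are assumptions made for the example, not audioflux's confirmed implementation.

```python
def octave_band_frequencies(num, low_fre=32.703, bin_per_octave=12):
    """Assumed equal-tempered spacing: f_k = low_fre * 2 ** (k / bin_per_octave)."""
    return [low_fre * 2.0 ** (k / bin_per_octave) for k in range(num)]


# With 12 bins per octave, bin 12 sits one octave (a factor of 2) above bin 0.
freqs = octave_band_frequencies(num=13, low_fre=32.703, bin_per_octave=12)
octave_ratio = freqs[12] / freqs[0]
print(octave_ratio)  # 2.0
```

With bin_per_octave set to 24 or 36, the same octave is divided into finer steps, so more bins are needed to cover the same frequency range.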

window_type: WindowType

Window type for each frame.

See: type.WindowType

slide_length: int or None

Window sliding length.

If slide_length is None, then slide_length = fft_length / 4
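
The default can be worked out by hand, as in the following sketch. The helper `resolve_slide_length` is hypothetical, and the use of integer division is an assumption (the text above writes fft_length / 4).

```python
def resolve_slide_length(radix2_exp, slide_length=None):
    """Return the hop size in samples, applying the documented default."""
    fft_length = 2 ** radix2_exp
    if slide_length is None:
        # Assumption: integer division, since a hop size is a sample count.
        slide_length = fft_length // 4
    return slide_length


fft_length = 2 ** 12                        # 4096
hop = resolve_slide_length(radix2_exp=12)   # 1024
overlap = fft_length - hop                  # 3072 samples shared by adjacent frames
print(fft_length, hop, overlap)  # 4096 1024 3072
```

A smaller slide_length gives more frames and finer time resolution at the cost of more computation.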

scale_type: SpectralFilterBankScaleType

Spectral filter bank type. It determines the type of spectrogram.

See: type.SpectralFilterBankScaleType

style_type: SpectralFilterBankStyleType

Spectral filter bank style type. It determines the bank type of window.

See: type.SpectralFilterBankStyleType

normal_type: SpectralFilterBankNormalType

Spectral filter normal type. It determines the type of normalization.

Does not need to be provided for the Linear scale.

See: type.SpectralFilterBankNormalType

data_type: SpectralDataType

Spectrogram data type.

It can be set to mag or power. If you need dB values, set the power type and then pass the result to power_to_db.

See: type.SpectralDataType
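
For orientation, the standard power-to-decibel relation is 10 * log10(power / ref). The sketch below applies it to single values. It is not audioflux's power_to_db: the name `power_to_db_sketch`, the reference value, and the small floor `amin` are assumptions made for this example.

```python
import math


def power_to_db_sketch(power, ref=1.0, amin=1e-10):
    """Convert one power value to dB; amin (assumed) avoids log10(0)."""
    return 10.0 * math.log10(max(power, amin) / ref)


db_unit = power_to_db_sketch(1.0)
db_hundred = power_to_db_sketch(100.0)
print(db_unit)     # 0.0
print(db_hundred)  # 20.0
```

Because power is the square of magnitude, a mag spectrogram would need 20 * log10(mag) to reach the same dB scale, which is why the power type is the convenient starting point.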

is_reassign: bool

Whether to use reassignment.

is_temporal: bool

Whether to get temporal data.

If True, you can call the get_temporal_data method to get the energy/rms/zeroCrossRate features.

See also

NSGT
CWT
PWT

Examples

Read 220Hz audio data

>>> import audioflux as af
>>> audio_path = af.utils.sample_path('220')
>>> audio_arr, sr = af.read(audio_path)

Create a BFT object with the Linear scale (STFT)

>>> from audioflux.type import (SpectralFilterBankScaleType, SpectralFilterBankStyleType,
...                             WindowType, SpectralDataType)
>>> obj = af.BFT(num=2049, radix2_exp=12, samplate=sr, low_fre=0., high_fre=16000.,
...              window_type=WindowType.HANN, slide_length=1024,
...              scale_type=SpectralFilterBankScaleType.LINEAR,
...              style_type=SpectralFilterBankStyleType.SLANEY,
...              data_type=SpectralDataType.POWER)

Extract the spectrogram in dB

>>> import numpy as np
>>> from audioflux.utils import power_to_db
>>> spec_arr = obj.bft(audio_arr)
>>> spec_arr = np.abs(spec_arr)
>>> spec_dB_arr = power_to_db(spec_arr)

Show spectrogram plot

>>> import matplotlib.pyplot as plt
>>> from audioflux.display import fill_spec
>>> audio_len = audio_arr.shape[-1]
>>> fig, ax = plt.subplots()
>>> img = fill_spec(spec_dB_arr, axes=ax,
...                 x_coords=obj.x_coords(audio_len),
...                 y_coords=obj.y_coords(),
...                 x_axis='time', y_axis='log',
...                 title='BFT-Linear Spectrogram')
>>> fig.colorbar(img, ax=ax, format="%+2.0f dB")

Methods

`bft`(data_arr[, result_type])	Get spectrogram data
`cal_time_length`(data_length)	Calculate the number of frames for audio data of a given length.
`get_bin_band_arr`()	Get bin band array
`get_fre_band_arr`()	Get an array of frequency bands of different scales.
`get_temporal_data`()	Get energy/rms/zeroCrossRate feature.
`set_data_norm_value`(norm_value)	Set data norm value
`set_result_type`(result_type)	Set result type.
`x_coords`(data_length)	Get the X-axis coordinate
`y_coords`()	Get the Y-axis coordinate

cal_time_length(data_length)

Calculate the number of frames (the time length) produced for audio data of a given length.

fft_length = 2 ** radix2_exp
time_length = (data_length - fft_length) // slide_length + 1

Parameters

data_length: int: The length of the audio data, in samples.

Returns

out: int
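
The formula above can be checked by hand with a small stand-alone function. `cal_time_length_sketch` is a hypothetical re-implementation of the documented arithmetic, not the library method; the default slide_length of fft_length // 4 and the error raised for inputs shorter than one frame are assumptions.

```python
def cal_time_length_sketch(data_length, radix2_exp=12, slide_length=None):
    """Number of full frames that fit into data_length samples."""
    fft_length = 2 ** radix2_exp
    if slide_length is None:
        slide_length = fft_length // 4  # assumed default hop size
    if data_length < fft_length:
        # Assumption: input shorter than one frame is treated as an error.
        raise ValueError("data_length must be at least fft_length")
    return (data_length - fft_length) // slide_length + 1


one_second = cal_time_length_sketch(32000)   # (32000 - 4096) // 1024 + 1
single_frame = cal_time_length_sketch(4096)  # exactly one frame fits
print(one_second, single_frame)  # 28 1
```

One second of audio at the default 32000 Hz therefore yields 28 frames with the default FFT length and hop size.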

get_fre_band_arr()

Get an array of frequency bands. The scale of the bands is determined by the scale_type set at initialization.

Returns

out: np.ndarray [shape=(fre,)]

get_bin_band_arr()

Get bin band array

Returns

out: np.ndarray [shape=(n_bin,)]

set_result_type(result_type)

Set result type.

Parameters

result_type: int, 0 or 1

If 0, then the result is a matrix of complex numbers.
If 1, then the result is a matrix of real numbers.

set_data_norm_value(norm_value)

Set data norm value

Parameters

norm_value: float

bft(data_arr, result_type=0)

Get spectrogram data

Parameters

data_arr: np.ndarray [shape=(…, n)]

Input audio data

result_type: int, 0 or 1

If 0, then the result is a matrix of complex numbers.
If 1, then the result is a matrix of real numbers.

Returns

m_data_arr: np.ndarray [shape=(…, fre, time), dtype=(np.complex or np.float32)]: The matrix of BFT

get_temporal_data()

Get energy/rms/zeroCrossRate feature.

Requires is_temporal=True at initialization. The bft method must be called first.

Returns

energy_arr: np.ndarray [shape=(…, time)]: energy feature
rms_arr: np.ndarray [shape=(…, time)]: rms feature
zero_cross_arr: np.ndarray [shape=(…, time)]: zero cross rate feature
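
To give a feel for the three features, the sketch below computes them for a single frame using common textbook definitions: energy as the sum of squares, rms as the square root of the mean square, and zero-crossing rate as the fraction of adjacent sample pairs that change sign. These definitions and the name `frame_features_sketch` are assumptions; audioflux's exact formulas and normalization are not stated here and may differ.

```python
import math


def frame_features_sketch(frame):
    """Return (energy, rms, zero_cross_rate) for one frame of samples."""
    energy = sum(x * x for x in frame)
    rms = math.sqrt(energy / len(frame))
    # Count adjacent pairs whose signs differ (zero is treated as non-negative).
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zero_cross_rate = crossings / (len(frame) - 1)
    return energy, rms, zero_cross_rate


# An alternating frame changes sign between every pair of samples.
energy, rms, zcr = frame_features_sketch([1.0, -1.0, 1.0, -1.0])
print(energy, rms, zcr)  # 4.0 1.0 1.0
```

In the library, each returned array holds one such value per frame, which is why the shapes above end in the time dimension.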

y_coords()

Get the Y-axis coordinate

Returns

out: np.ndarray [shape=(fre,)]

x_coords(data_length)

Get the X-axis coordinate

Parameters

data_length: int: The length of the audio data, in samples.

Returns

out: np.ndarray [shape=(time,)]