This implementation is intended to be used as a loss function only.
It does not replicate the exact behavior of the original metrics,
but the results should be close enough for it to serve
as a training objective. See the Notes in the
NegSTOILoss class.
A quantitative comparison is coming soon, hopefully 🚀
Usage
import torch
from torch import nn
from torch_stoi import NegSTOILoss

sample_rate = 16000
loss_func = NegSTOILoss(sample_rate=sample_rate)
# Your nnet and optimizer definition here
nnet = nn.Module()

noisy_speech = torch.randn(2, 16000)
clean_speech = torch.randn(2, 16000)
# Estimate clean speech
est_speech = nnet(noisy_speech)
# Compute loss and backward (then step etc...)
loss_batch = loss_func(est_speech, clean_speech)
loss_batch.mean().backward()
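The "then step etc..." part of the snippet above can be sketched as a full optimization step. This is a minimal sketch, not part of the package: `ToyDenoiser`, `train_step` and `stand_in_loss` are hypothetical names introduced here for illustration. `stand_in_loss` is only a placeholder that returns one value per utterance so the example runs without `torch_stoi` installed; in practice you would pass `NegSTOILoss(sample_rate=sample_rate)` as `loss_func` instead.

```python
import torch
from torch import nn


class ToyDenoiser(nn.Module):
    """Hypothetical toy model: a single 1-D convolution over the waveform."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=9, padding=4)

    def forward(self, x):
        # (batch, time) -> (batch, 1, time) -> (batch, time)
        return self.conv(x.unsqueeze(1)).squeeze(1)


def stand_in_loss(est, target, eps=1e-8):
    """Placeholder loss (assumption): negative correlation per utterance."""
    est = est - est.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    corr = (est * target).sum(dim=-1) / (
        est.norm(dim=-1) * target.norm(dim=-1) + eps
    )
    return -corr  # shape (batch,)


def train_step(nnet, optimizer, loss_func, noisy_speech, clean_speech):
    """Run one optimization step and return the batch loss as a float."""
    optimizer.zero_grad()
    est_speech = nnet(noisy_speech)
    # One loss value per utterance, reduced to a scalar before backward
    loss = loss_func(est_speech, clean_speech).mean()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
nnet = ToyDenoiser()
optimizer = torch.optim.Adam(nnet.parameters(), lr=1e-3)

clean_speech = torch.randn(2, 16000)
noisy_speech = clean_speech + 0.5 * torch.randn(2, 16000)

# Swap stand_in_loss for NegSTOILoss(sample_rate=16000) in real training
losses = [
    train_step(nnet, optimizer, stand_in_loss, noisy_speech, clean_speech)
    for _ in range(3)
]
print(losses)
```

Keeping the loss as an argument of `train_step` makes it easy to swap the placeholder for the real loss, or to compare several objectives with the same loop.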
Comparing NumPy and PyTorch versions: the static test
Values obtained with the NumPy version are compared to
the PyTorch version in the following graphs.
8 kHz
[Figure: Classic STOI measure]
[Figure: Extended STOI measure]
16 kHz
[Figure: Classic STOI measure]
[Figure: Extended STOI measure]
The 16 kHz signals used to compare both versions contained a lot
of silence, which explains why the match is very poor without
voice activity detection (VAD).
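To illustrate how much of a signal an energy-based VAD can discard, here is a simplified NumPy sketch. It is not the exact procedure used by either implementation: the function name `active_frame_mask` is hypothetical, and the frame length, hop size and 40 dB dynamic range are assumptions chosen for illustration. It marks a frame as active when its energy lies within `dyn_range_db` of the loudest frame.

```python
import numpy as np


def active_frame_mask(clean, frame_len=256, hop=128, dyn_range_db=40.0, eps=1e-12):
    """Return a boolean mask with True for frames considered active.

    A frame is active if its energy is within `dyn_range_db` dB of the
    most energetic frame of the clean signal.
    """
    n_frames = 1 + (len(clean) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [clean[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Frame energy in dB; eps avoids log(0) on all-zero frames
    energies_db = 20.0 * np.log10(np.linalg.norm(frames, axis=1) + eps)
    return energies_db > energies_db.max() - dyn_range_db


# Toy signal: half a second of noise followed by half a second of silence at 16 kHz
rng = np.random.default_rng(0)
signal = np.concatenate([rng.standard_normal(8000), np.zeros(8000)])

mask = active_frame_mask(signal)
print(f"{mask.mean():.0%} of frames kept")
```

On a signal like this one, roughly half of the frames are dropped, so a metric computed with and without silent-frame removal is effectively computed on different material.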
Comparing NumPy and PyTorch versions: training a DNN
Coming in the near future
References
[1] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, 'A Short-Time
Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech',
ICASSP 2010, Dallas, Texas.
[2] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, 'An Algorithm for
Intelligibility Prediction of Time-Frequency Weighted Noisy Speech',
IEEE Transactions on Audio, Speech, and Language Processing, 2011.
[3] J. Jensen and C. H. Taal, 'An Algorithm for Predicting the
Intelligibility of Speech Masked by Modulated Noise Maskers',
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016.