AudioIO: add dnn speech recognition sample on C++ #21458

SinM9 · 2022-01-16T19:27:55Z

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
The PR is proposed to proper branch
There is reference to original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

samples/dnn/speech_recognition.cpp

Qannaf · 2022-01-24T20:19:39Z

samples/dnn/speech_recognition.cpp

+        return res;
+    }
+
+    std::vector<std::vector<double>> mel(int n_mels, double fmin, double fmax)


Suggested change

std::vector<std::vector<double>> mel(int n_mels, double fmin, double fmax)

vector<vector<double>> mel(int n_mels, double fmin, double fmax)
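
Note on the suggestion above: the unqualified `vector` spelling compiles only if the name has been brought into scope. Whether the sample already does that is not visible in this thread, so the following is a minimal sketch under that assumption. `makeGrid` is a hypothetical helper used only for illustration and is not part of the PR.

```cpp
#include <cstddef>
#include <vector>

// Assumption: a using-declaration like this one (or `using namespace std;`)
// is present in the file. Without it, `vector<...>` below would not compile.
using std::vector;

// Hypothetical helper: builds a rows x cols grid of zeros using the
// unqualified spelling proposed in the review.
vector<vector<double>> makeGrid(std::size_t rows, std::size_t cols)
{
    return vector<vector<double>>(rows, vector<double>(cols, 0.0));
}
```

A using-declaration for the single name keeps the rest of `std` out of scope, which is usually a smaller change than `using namespace std;`.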

Qannaf · 2022-01-24T20:20:41Z

samples/dnn/speech_recognition.cpp

+    }
+
+    // STFT preparation
+    std::vector<double> pad_window_center(std::vector<double>&data, int size)


Suggested change

std::vector<double> pad_window_center(std::vector<double>&data, int size)

vector<double> pad_window_center(vector<double>&data, int size)

Qannaf · 2022-01-24T20:21:02Z

samples/dnn/speech_recognition.cpp

+        // Pad the window out to n_fft size
+        int n = static_cast<int>(data.size());
+        int lpad = static_cast<int>((size - n) / 2);
+        std::vector<double> pad_array;


Suggested change

std::vector<double> pad_array;

vector<double> pad_array;
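
For context, the quoted lines compute a left pad of `(size - n) / 2` so that the window sits in the middle of an `n_fft`-sized buffer. A minimal sketch of that idea follows, assuming the padding value is zero (the thread does not show it). `padWindowCenter` is a hypothetical name and this is not the code from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch: centre `data` inside a zero-filled buffer of length `size`.
std::vector<double> padWindowCenter(const std::vector<double>& data, int size)
{
    const int n = static_cast<int>(data.size());
    assert(size >= n);                    // the target must be at least as long as the input
    const int lpad = (size - n) / 2;      // same left-pad arithmetic as in the quoted diff

    std::vector<double> padded(static_cast<std::size_t>(size), 0.0);
    std::copy(data.begin(), data.end(), padded.begin() + lpad);
    return padded;
}
```

When `size - n` is odd, integer division puts the extra element on the right-hand side.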

Qannaf · 2022-01-24T20:21:32Z

samples/dnn/speech_recognition.cpp

+        return pad_array;
+    }
+
+    std::vector<std::vector<double>> frame(std::vector<double>& x)


Suggested change

std::vector<std::vector<double>> frame(std::vector<double>& x)

vector<vector<double>> frame(vector<double>& x)

Qannaf · 2022-01-24T20:22:02Z

samples/dnn/speech_recognition.cpp

+    {
+        // Slices a data array into overlapping frames.
+        int n_frames = static_cast<int>(1 + (x.size() - n_fft) / hop_length);
+        std::vector<std::vector<double>> new_x(n_fft, std::vector<double>(n_frames));


Suggested change

std::vector<std::vector<double>> new_x(n_fft, std::vector<double>(n_frames));

vector<vector<double>> new_x(n_fft, vector<double>(n_frames));
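
The quoted `frame` lines size the result as `n_fft` rows by `n_frames` columns, with `n_frames = 1 + (x.size() - n_fft) / hop_length`. A rough sketch of how such a slicing could be filled in is below. It assumes `n_fft` and `hop_length` are passed as parameters (the quoted code takes them from enclosing scope), and `frameSignal` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: slice x into overlapping frames.
// Row i holds sample i of every frame; column j is frame j.
std::vector<std::vector<double>> frameSignal(const std::vector<double>& x,
                                             int n_fft, int hop_length)
{
    assert(n_fft > 0 && hop_length > 0);
    const std::size_t frameLen = static_cast<std::size_t>(n_fft);
    const std::size_t hop = static_cast<std::size_t>(hop_length);

    // x.size() is unsigned, so subtracting a larger frame length would wrap
    // around; return an empty result for signals shorter than one frame.
    if (x.size() < frameLen)
        return {};

    const std::size_t nFrames = 1 + (x.size() - frameLen) / hop;
    std::vector<std::vector<double>> frames(frameLen, std::vector<double>(nFrames));
    for (std::size_t i = 0; i < frameLen; ++i)
        for (std::size_t j = 0; j < nFrames; ++j)
            frames[i][j] = x[j * hop + i];   // frame j starts at sample j * hop
    return frames;
}
```

The early return is an addition of this sketch; the quoted code does not show how short inputs are handled.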

Qannaf · 2022-01-24T20:22:30Z

samples/dnn/speech_recognition.cpp

+    std::vector<double> hanning()
+    {
+        // https://en.wikipedia.org/wiki/Window_function#Hann_and_Hamming_windows
+        std::vector<double> window_tensor;


Suggested change

std::vector<double> window_tensor;

vector<double> window_tensor;
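
The body of `hanning()` is not quoted in this thread. For readers following the linked Wikipedia section, here is a minimal sketch of one common form, the symmetric Hann window w[n] = 0.5 - 0.5 cos(2πn / (N - 1)). The name `hannWindow` and the `win_length` parameter are assumptions, and the PR's implementation may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: symmetric Hann window of length win_length.
std::vector<double> hannWindow(int win_length)
{
    std::vector<double> w;
    if (win_length <= 0)
        return w;                       // nothing to generate
    if (win_length == 1)
    {
        w.push_back(1.0);               // avoid dividing by zero below
        return w;
    }

    const double pi = std::acos(-1.0);
    w.reserve(static_cast<std::size_t>(win_length));
    for (int n = 0; n < win_length; ++n)
        w.push_back(0.5 - 0.5 * std::cos(2.0 * pi * n / (win_length - 1)));
    return w;
}
```

This form is zero at both ends and peaks at 1 in the middle for odd lengths. A periodic variant divides by N instead of N - 1.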

Qannaf · 2022-01-24T20:23:16Z

samples/dnn/speech_recognition.cpp

+        return window_tensor;
+    }
+
+    std::vector<std::vector<double>> stft_power(std::vector<double>& y)


Suggested change

std::vector<std::vector<double>> stft_power(std::vector<double>& y)

vector<vector<double>> stft_power(vector<double>& y)

Qannaf · 2022-01-24T20:23:34Z

samples/dnn/speech_recognition.cpp

+        // https://en.wikipedia.org/wiki/Short-time_Fourier_transform
+
+        // Pad the time series so that frames are centered
+        std::vector<double> new_y;


Suggested change

std::vector<double> new_y;

vector<double> new_y;
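
The quoted comment says the time series is padded so that frames are centered, but the padding mode is not visible here. The sketch below assumes reflection padding, which is one common choice for centered STFT frames; the sample may pad differently. `reflectPad` and its `pad` parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: mirror `pad` samples around each end of y,
// without repeating the edge sample itself.
std::vector<double> reflectPad(const std::vector<double>& y, int pad)
{
    const int n = static_cast<int>(y.size());
    assert(pad >= 0 && pad < n);        // reflection needs at least pad + 1 samples

    std::vector<double> out;
    out.reserve(static_cast<std::size_t>(n + 2 * pad));
    for (int i = pad; i >= 1; --i)      // left side: y[pad], ..., y[1]
        out.push_back(y[static_cast<std::size_t>(i)]);
    out.insert(out.end(), y.begin(), y.end());
    for (int i = n - 2; i >= n - 1 - pad; --i)   // right side: y[n-2], ..., y[n-1-pad]
        out.push_back(y[static_cast<std::size_t>(i)]);
    return out;
}
```

With a pad of `n_fft / 2` on each side, the first frame is centered on the first original sample.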

alalek

Thank you for the update!

samples/dnn/speech_recognition.cpp

alalek

Well done 👍

AudioIO: add dnn speech recognition sample on C++
* add speech recognition cpp
* fix warnings
* fixes
* fix warning
* microphone fix

SinM9 added 2 commits January 16, 2022 22:24

add speech recognition cpp

3905c00

fix warnings

6ab836f

alalek reviewed Jan 18, 2022

Qannaf suggested changes Jan 24, 2022

SinM9 added 2 commits February 18, 2022 00:50

fixes

db730ba

fix warning

6194610

alalek reviewed Feb 22, 2022

samples/dnn/speech_recognition.cpp (outdated)

samples/dnn/speech_recognition.cpp (outdated)

microphone fix

4f4746d

alalek approved these changes Feb 28, 2022

alalek merged commit a332509 into opencv:4.x Feb 28, 2022

opencv-pushbot mentioned this pull request Apr 23, 2022

(5.x) Merge 4.x #21903

Merged
