speech recognition sample #20291
Please add a description at the beginning of the sample, as in opencv/samples/dnn/human_parsing.py (lines 2 to 40 in b74aae6).
samples/dnn/speech_recognition.py
Outdated
if __name__ == '__main__':

    # Computation backends supported by layers
    backends = (cv.dnn.DNN_BACKEND_DEFAULT, cv.dnn.DNN_BACKEND_OPENCV)
Could you try running the net's forward pass with OpenVINO (cv.dnn.DNN_BACKEND_INFERENCE_ENGINE)?
I tried. It gave this error: error: (-213:The function/feature is not implemented) Unknown backend identifier in function 'cv::dnn::dnn4_v20210301::wrapMat'
samples/dnn/speech_recognition.py
Outdated
    parser = argparse.ArgumentParser(description='This script runs Jasper Speech recognition model',
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--input_audio', type=str, help='Path to input audio file.')
Do we need to specify supported audio formats?
I think we need to add required=True
Finally, we need to use AudioIO. So, should I add the formats supported there? I suppose mp3, wav and mp4 are supported.
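The suggestions in this thread (marking the argument as required and documenting the accepted formats) could look roughly like the sketch below. It is only an illustration, not the code that was merged: `build_parser` is a hypothetical helper, `nargs='+'` is an assumption about how several files might be accepted at once, and the help text leaves the format list to the audio backend because the formats named above were only a guess.

```python
import argparse


def build_parser():
    """Build the command-line parser for the sample (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description='This script runs Jasper Speech recognition model',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # required=True makes argparse exit with a usage error when the option
    # is missing; nargs='+' (an assumption) accepts one or more paths.
    parser.add_argument('--input_audio', type=str, required=True, nargs='+',
                        help='Path(s) to input audio file(s). Supported '
                             'formats depend on the audio backend in use.')
    return parser


if __name__ == '__main__':
    # Parse an explicit argument list so the sketch runs without a shell.
    args = build_parser().parse_args(['--input_audio', 'a.wav', 'b.wav'])
    print(args.input_audio)
```

With `required=True`, running the script without `--input_audio` stops immediately with a usage message instead of failing later when the file is opened.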
@spazewalker Could you please check if the approach from #20558 works for this case?
@alalek Just tested it. It works for this case.
support for multiple files at once
Co-authored-by: Liubov Batanina <piccione-mail@yandex.ru>
fix whitespaces
Let's merge it with @spazewalker Please mark the PR as "Ready for review" if it is ready for merging.
@spazewalker Ping. Or let us know if you want to improve something else.
Thank you 👍
speech recognition sample
* speech recognition sample added.(initial commit)
* fixed typos, removed plt
* trailing whitespaces removed
* masking removed and using opencv for displaying spectrogram
* description added
* requested changes and add opencl fp16 target
* parenthesis and halide removed
* workaround 3d matrix issue
* handle multi channel audio support for multiple files at once
* suggested changes fix whitespaces
GSoC 2021: Speech Recognition using OpenCV AudioIO
Project details
PR details
Creating ONNX model
NVIDIA trained Jasper using FP16 precision, but OpenCV needs FP32, so the ONNX model's graph has to be converted. This is done with the script convert_jasper_to_FP32.py. The pre-trained converted ONNX model can be found here. The original pre-trained model by NVIDIA can be found here.
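The conversion script itself is not reproduced here. As a rough illustration of one step such a conversion involves, the sketch below widens a buffer of little-endian FP16 values (the layout ONNX uses for raw tensor data) to FP32, using only the Python standard library. The function name `fp16_raw_to_fp32_raw` is hypothetical and this is not the code from convert_jasper_to_FP32.py. A real conversion would presumably also have to update the element types recorded in the graph.

```python
import struct


def fp16_raw_to_fp32_raw(raw: bytes) -> bytes:
    """Widen little-endian float16 bytes to little-endian float32 bytes.

    Hypothetical helper for illustration only.
    """
    if len(raw) % 2 != 0:
        raise ValueError('float16 buffer length must be a multiple of 2')
    count = len(raw) // 2
    # 'e' is the struct format code for IEEE 754 half precision.
    values = struct.unpack('<%de' % count, raw)
    # Every float16 value is exactly representable as float32,
    # so this widening step loses no information.
    return struct.pack('<%df' % count, *values)


if __name__ == '__main__':
    half_bytes = struct.pack('<3e', 1.0, -2.5, 0.5)
    single_bytes = fp16_raw_to_fp32_raw(half_bytes)
    print(len(half_bytes), len(single_bytes))
```

The output buffer is twice the size of the input, which is why an FP32 model file is roughly double the size of its FP16 counterpart.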
Usage
Todo
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.