This is a sample application showing how to capture audio from a webpage,
send it to a Spring MVC controller, and submit it to OpenAI's Whisper API
to get a transcription. That transcription is then sent to OpenAI's
gpt-3.5-turbo model to get a response. Finally, the response is sent to
OpenAI's text-to-speech API to produce an audio response that is returned
to the client.
Note that the prompt sent to the gpt-3.5-turbo model includes text asking
for the response to be kept brief. At this time, no conversational memory
or RAG is applied to the prompt for context (though that could change in
the future). A sketch of the overall flow follows.
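To make that flow concrete, here is a minimal sketch of what the controller
side of the pipeline might look like. This is an illustration, not the repo's
actual code: the SpeechClient type and its transcribe/speak methods (sketched
under "A few notes..." below), the /ask endpoint shape, and the brief-response
prompt wording are all assumptions.

import org.springframework.ai.chat.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;

@RestController
public class AskControllerSketch {

  private final ChatClient chatClient;     // Spring AI client for gpt-3.5-turbo
  private final SpeechClient speechClient; // hypothetical Whisper/TTS wrapper

  public AskControllerSketch(ChatClient chatClient, SpeechClient speechClient) {
    this.chatClient = chatClient;
    this.speechClient = speechClient;
  }

  // Receives the browser's captured audio, transcribes it, asks the model
  // for a brief answer, and returns that answer as synthesized speech.
  @PostMapping(path = "/ask", produces = "audio/mpeg")
  public byte[] ask(@RequestParam("audio") MultipartFile audio) throws IOException {
    String transcription = speechClient.transcribe(audio.getBytes());
    String answer = chatClient.call("Keep the response brief. " + transcription);
    return speechClient.speak(answer);
  }
}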
Building and running the application
Before running the application, you'll need to obtain an OpenAI API
key and set it to the OPENAI_API_KEY environment variable. E.g.,
% export OPENAI_API_KEY=sk-...
This is a Spring Boot application built with Maven. Therefore, you can run
the application just like any other Spring Boot application using the Spring
Boot Maven plugin:
% ./mvnw spring-boot:run
Or you can build it to an executable JAR file and then run it:
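For example (the JAR filename below is a placeholder; the actual name
depends on the project's Maven artifact ID and version):

% ./mvnw package
% java -jar target/<artifact>-<version>.jar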
Once the application is running, open http://localhost:8080/ask in your
web browser. Press and hold one of the "Listen" buttons while talking.
After releasing the "Listen (Whisper API)" button, the audio captured by the
browser will be sent to the server and from there to OpenAI's Whisper API for
transcription. The transcription will then be sent to OpenAI's gpt-3.5-turbo
model to get a response. Finally, the response will be sent to the text-to-speech
API to get an audio response that is returned to the client.
A few notes...
At this time, Spring AI does not support integration with OpenAI's
Whisper or TTS APIs. Therefore, this application invokes those APIs directly
using Spring's RestClient in the SpeechClient class.
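For illustration, here is a minimal sketch of how such a class might call
the two endpoints with RestClient. This is not the repo's actual SpeechClient:
the request shapes follow OpenAI's documented transcription
(/v1/audio/transcriptions) and speech (/v1/audio/speech) endpoints, but
details such as the model names, voice, and filename are assumptions.

import org.springframework.core.io.ByteArrayResource;
import org.springframework.http.MediaType;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import org.springframework.web.client.RestClient;

import java.util.Map;

public class SpeechClient {

  private final RestClient restClient = RestClient.builder()
      .baseUrl("https://api.openai.com/v1")
      .defaultHeader("Authorization",
          "Bearer " + System.getenv("OPENAI_API_KEY"))
      .build();

  // Submits the captured audio to the Whisper transcription endpoint.
  public String transcribe(byte[] audio) {
    MultiValueMap<String, Object> form = new LinkedMultiValueMap<>();
    form.add("model", "whisper-1");
    form.add("file", new ByteArrayResource(audio) {
      @Override
      public String getFilename() {
        return "audio.webm"; // multipart file parts require a filename
      }
    });
    return restClient.post()
        .uri("/audio/transcriptions")
        .contentType(MediaType.MULTIPART_FORM_DATA)
        .body(form)
        .retrieve()
        .body(String.class);
  }

  // Sends text to the text-to-speech endpoint and returns the audio bytes.
  public byte[] speak(String text) {
    return restClient.post()
        .uri("/audio/speech")
        .contentType(MediaType.APPLICATION_JSON)
        .body(Map.of("model", "tts-1", "voice", "alloy", "input", text))
        .retrieve()
        .body(byte[].class);
  }
}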
I've tested this in the Chrome, Firefox, and Safari web browsers. It works in
all three, but for some reason Safari doesn't seem to capture everything said
while the "Listen" button is held. Most of my testing has been with Chrome,
where it works incredibly well. I welcome any insight into why it doesn't
work as well in Safari.
About
Sample using Spring AI, Whisper, and TTS to enable talking to a webpage