VCIVING is a project that aims to provide voice control for In-Vehicle Infotainment (IVI) systems. The user's speech is converted into internal commands that run on the system.
VCIVING operates in four sequential steps:
- Reading the user's speech through the microphone.
- Using a Speech-to-Text (STT) model / speech recognition engine to obtain the text version of the speech.
- Interpreting the recognized text using a trained model.
- Executing the task/process requested by the user, as expressed by the recognized text.
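
The four steps above can be read as a simple pipeline. The following is a minimal sketch of that flow, not VCIVING's actual code: the type aliases, `run_pipeline`, and the `fake_*` stages are all hypothetical names introduced for illustration, and the stand-in stages only exist so the example runs without a microphone or real models.

```python
from typing import Callable, Optional

# Hypothetical stage signatures; none of these names come from VCIVING itself.
Recognizer = Callable[[bytes], Optional[str]]  # audio -> text, or None if no speech
Interpreter = Callable[[str], str]             # text -> task name
Executor = Callable[[str, str], str]           # (task name, text) -> result


def run_pipeline(audio: bytes, recognize: Recognizer,
                 interpret: Interpreter, execute: Executor) -> Optional[str]:
    """Run one utterance through steps 2 to 4; step 1 supplies `audio`."""
    text = recognize(audio)
    if text is None:  # nothing recognizable was said
        return None
    task = interpret(text)
    return execute(task, text)


# Stand-in stages so the sketch runs without a microphone or real models.
def fake_recognize(audio: bytes) -> Optional[str]:
    return audio.decode("utf-8") or None


def fake_interpret(text: str) -> str:
    return "play_music" if "play" in text.lower() else "unknown"


def fake_execute(task: str, text: str) -> str:
    return f"{task}: {text}"
```

Passing the stages in as callables keeps each step replaceable, so a real recognizer or classifier could be swapped in without touching the pipeline function.
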
1. Reading the user's speech through the microphone
- This step captures all input arriving through the microphone.
- Additional techniques will be introduced later to determine whether the user's speech is actually addressed to VCIVING.
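
As a rough illustration of capturing input, the sketch below splits an incoming byte stream into fixed-size frames. It makes several assumptions that the text does not state: the audio format (16 kHz, 16-bit mono), the frame length, and the names `read_frames` and `FRAME_BYTES` are all hypothetical, and an in-memory buffer stands in for the microphone device.

```python
import io
from typing import BinaryIO, Iterator

FRAME_BYTES = 640  # assumption: 20 ms of 16 kHz, 16-bit mono audio


def read_frames(stream: BinaryIO, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Yield fixed-size frames until the stream ends; a short final frame is dropped."""
    while True:
        frame = stream.read(frame_bytes)
        if len(frame) < frame_bytes:
            return
        yield frame


# An in-memory buffer stands in for the microphone device here.
fake_microphone = io.BytesIO(bytes(640 * 3 + 100))
frames = list(read_frames(fake_microphone))
```
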
2. Using an STT model/speech recognizer to obtain the text version of the speech
- Speech from the microphone is obtained as an audio stream.
- This stream needs to be converted to text before it can be processed further.
- After noise is removed, pre-trained models are used to convert the audio to text.
- If the received audio contains any valid speech, its text version is returned by the model/recognizer.
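
The "text is returned only if there is valid speech" behavior can be sketched as a function that returns either a string or nothing. This is a hedged example: a simple energy threshold is used here as a stand-in for deciding whether audio contains speech (it is not noise removal, and not necessarily what VCIVING does), `SILENCE_RMS` is an arbitrary assumed value, and `fake_model` is a placeholder for a real pre-trained STT model.

```python
import array
import math
from typing import Callable, Optional

SILENCE_RMS = 500.0  # assumption: threshold for 16-bit samples, tune per device


def rms(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit signed PCM in native byte order."""
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # ignore a trailing odd byte
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def transcribe(pcm: bytes, model: Callable[[bytes], str]) -> Optional[str]:
    """Return recognized text, or None when the audio holds no usable speech."""
    if rms(pcm) < SILENCE_RMS:
        return None  # too quiet: skip the model entirely
    text = model(pcm).strip()
    return text or None  # an empty transcript also counts as no speech


# `fake_model` is a placeholder for a real pre-trained STT model.
def fake_model(pcm: bytes) -> str:
    return "turn on the radio"


silence = array.array("h", [0] * 320).tobytes()
speech = array.array("h", [4000, -4000] * 160).tobytes()
```

Returning `None` rather than an empty string makes the "no valid speech" case explicit for the interpretation step that follows.
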
3. Interpreting the recognized text using a trained model
- Once the text version of the speech is obtained, the next step is to interpret it.
The goal of interpretation is to capture the underlying meaning of the text.
- The text is passed through another pre-trained model, which was trained to classify phrases into the different tasks the system is expected to perform.
The set of tasks is defined before training, so the model selects the most suitable task for the phrase given as input.
- The retrieved task is then executed.
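
To make the classification idea concrete, here is a minimal sketch that maps a phrase to one of a fixed set of tasks. It deliberately swaps the trained model for plain word overlap against a few example phrases, so it only illustrates the input/output shape. The task names, example phrases, and the `"unknown"` fallback are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical task set with example phrases; a real model would learn from many more.
TASK_EXAMPLES: Dict[str, List[str]] = {
    "play_music": ["play some music", "play a song", "start the music"],
    "navigate": ["navigate to the airport", "take me home", "directions to work"],
    "call_contact": ["call mom", "phone my office", "dial a number"],
}


def tokens(text: str) -> Counter:
    """Lowercased bag of words for a phrase."""
    return Counter(text.lower().split())


def classify(phrase: str) -> str:
    """Pick the task whose example phrases share the most words with `phrase`."""
    words = tokens(phrase)
    scores = {
        task: max(sum((words & tokens(example)).values()) for example in examples)
        for task, examples in TASK_EXAMPLES.items()
    }
    best = max(scores, key=scores.get)
    # Assumption: fall back to "unknown" when no example shares a single word.
    return best if scores[best] > 0 else "unknown"
```

For instance, `classify("please play a song")` selects `play_music` because that phrase shares three words with one of its examples, more than with any other task.
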
4. Executing the task/process requested by the user, as expressed by the recognized text
- When the task is retrieved and executed, VCIVING interacts with the core IVI systems (and the core systems of the smart vehicle) to run the desired task.
- The commands expressed by the recognized text are passed to these tasks, which process them to extract the data that needs to be passed to the core IVI system.
- This step consists only of interfaces that connect VCIVING to the core of IVI.
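
One way to picture this step is a table of task handlers, each of which extracts what it needs from the recognized text and forwards a request to the IVI core. The sketch below is illustrative only: `FakeIVI`, the handler names, and the request strings such as `media.play:` are hypothetical and do not describe the real IVI interfaces.

```python
from typing import Callable, Dict, List


class FakeIVI:
    """Stand-in for the core IVI system; records the requests it receives."""

    def __init__(self) -> None:
        self.requests: List[str] = []

    def send(self, request: str) -> None:
        self.requests.append(request)


def play_music(text: str, ivi: FakeIVI) -> None:
    # Extract what the task needs from the recognized text.
    query = text.lower().replace("play", "", 1).strip()
    ivi.send(f"media.play:{query}")


def navigate(text: str, ivi: FakeIVI) -> None:
    # Everything after the first " to " is treated as the destination.
    destination = text.lower().split(" to ", 1)[-1].strip()
    ivi.send(f"nav.route:{destination}")


# Hypothetical registry mapping task names to their handlers.
TASKS: Dict[str, Callable[[str, FakeIVI], None]] = {
    "play_music": play_music,
    "navigate": navigate,
}


def execute(task: str, text: str, ivi: FakeIVI) -> bool:
    """Run the handler for `task`; return False if no handler is registered."""
    handler = TASKS.get(task)
    if handler is None:
        return False
    handler(text, ivi)
    return True
```

Keeping the handlers thin, with the IVI object passed in, reflects the idea that this layer is only an interface between VCIVING and the IVI core.
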