
Introduction and Overview

A new application can assume a standard interface to implement a speech dialog as well as to output speech. GENIVI application cores can rely on standard interfaces to speech and voice recognition systems.

Initial Scope

The initial scope is to identify requirements for a unified interface to the speech and voice components in the system: a vendor-agnostic abstraction layer, the integration of voice recognizers and TTS engines, and the identification of standards for resources (such as phonetic alphabets).

IVI Speech Subprojects

  • Speech Output Service (GENIVI Abstract Component -> Compliance on Interface Level)
  • Speech Input Service (GENIVI Placeholder Component -> Compliance on Requirement Level)
  • Speech Dialog Service (GENIVI Placeholder Component -> Compliance on Requirement Level)

Reasons for a GENIVI Speech API

Benefits of GENIVI Interfaces between Applications and Speech:

  • Reduces the effort to exchange application cores (e.g. navigation)
  • Reduces the number of PRs in a project thanks to a well-defined and robust interface

Benefits of a GENIVI API for the „Dialog Step“:

  • Reduces the effort to move from one dialog framework to another
  • Provides a good basis for OSS dialog framework implementations
  • Basis for App Development

Code

The ivi-speech project plans to use the GENIVI projects Git repository to publish the following content:

  1. Speech Output Service Proof Of Concept Source code
  2. VMware image with Speech Output Service PoC integration Demo

Proof Of Concept (PoC)

Introduction

In this chapter you will find the "Howto" for the GENIVI Speech Output Service PoC.
The aim of this PoC is to show the maturity of the Speech Output API. The Speech Output Service API is available in the /pub folder of the Speech Output Service PoC delivery.

  • ISpeechOutputService.h (interface declaration)
  • ISpeechOutputServiceCommon.h (data type declaration)

Prerequisites

This software is built with the GNU gcc toolchain and uses the espeak TTS project as its TTS engine. Please make sure that the gcc toolchain is installed correctly.

The Speech Output Service depends on the following component:

  • espeak-1.47.16 (TTS engine)

Please download the espeak-1.47.16 sources from the espeak homepage (http://sourceforge.net/projects/espeak/files/espeak/) and build the espeak shared libraries that the PoC requires to access the TTS engine functionality.

The build instructions for the shared libraries are part of the espeak source package.
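
Before building, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch only: the helper name check_tools is hypothetical, and the tool list in the usage comment (gcc, g++, make) is an assumption based on the toolchain and the 'make all' step mentioned on this page.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are on the PATH.
# Returns 0 if all are present, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (tool names are an assumption, adjust to your setup):
#   check_tools gcc g++ make
```

The function only inspects the PATH; it does not verify compiler versions.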

Instructions for building the PoC Project

To build the GENIVI Speech Output Service Proof of Concept you have two options:

  1. Qt Creator: Open project file 'GENIVI_SpcOutputSrv_PoC.pro' with Qt Creator and run 'Build Project'
  2. Make: Type 'make all' command in shell within project root directory
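
The Make option can be wrapped in a small script, for example for repeated builds. This is a sketch under assumptions: the helper name build_poc is hypothetical, and it assumes that the project root contains a file named Makefile.

```shell
#!/bin/sh
# Hypothetical helper: run 'make all' in the given project root directory.
# Assumes the build file is named "Makefile".
build_poc() {
    project_dir=$1
    if [ ! -f "$project_dir/Makefile" ]; then
        echo "no Makefile found in $project_dir" >&2
        return 1
    fi
    # -C changes into the directory before reading the Makefile.
    make -C "$project_dir" all
}

# Example:
#   build_poc /home/user/Documents/GENIVI_SpeechOutputService_PoC_v1.0
```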

PoC Environment

The Speech Output Service PoC is pre-integrated into a VMware image.
Both the PoC and the image can be downloaded from the GENIVI IVI Speech Git repository.

After you have opened the VMware image, you will find the Qt Creator link on the desktop. To open the project, load the Qt Creator .pro file
/home/user/Documents/GENIVI_SpeechOutputService_PoC_v1.0/GENIVI_SpcOutputSrv_PoC.pro

SpeechOutputPOC 22.png

 

Running the test

You can run the Proof of Concept code by pressing the "Run" button in Qt Creator. The following use cases are exercised:

  • Use-case 1: Simple prompt.
  • Use-case 2: Abort a prompt.
  • Use-case 3: Queue buffer overflow discards the prompts.
  • Use-case 4: Invalid prompt. Messages that are too long are ignored.

Software installation

If you want to set up the VM build environment on your own, please follow the instructions below.

Preconditions

VMware Player 7.0.0

https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/7_0

VMware image LUbuntu 12.04

lubuntu1204t.zip
http://www.trendsigma.net/vmware/lubuntu1204t.html#

Software configuration

Open the VM using the file Lubuntu.vmx, then start it.

Configure Virtual machine
Import files into the VM

Share a common folder between Host & Guest.

Follow the menu “Player -> Manage -> Virtual Machine Settings…”, then add a shared folder as follows:

SpeechOutputPOC 01.png

Update the VM tools

The VM will prompt you to update its tooling to enable advanced features (like drag&drop).

SpeechOutputPOC 02.png

If you accept, allow the installation of the Tools at the bottom of the window:

SpeechOutputPOC 03.png

To install the VMware Tools, run the following steps in a terminal:
sudo mkdir /mnt/cdrom
sudo mount /dev/cdrom /mnt/cdrom
cd /tmp/
rm -fr /tmp/vmware-tools-distrib
tar zxpf /mnt/cdrom/VMwareTools-9.9.0-2304977.tar.gz
--> “vmware-tools-distrib” is created
cd vmware-tools-distrib
sudo ./vmware-install.pl
--> previous version will be uninstalled first

SpeechOutputPOC 05.png

SpeechOutputPOC 07.png

SpeechOutputPOC 09.png

Configure Linux distribution

Note: all passwords are generally set to "password"

For more help about Lubuntu: https://help.ubuntu.com/community/Lubuntu/Setup

HINT: to switch mouse focus back to Host use “CTRL+ALT”

Most administration tasks need the superuser privileges provided by sudo on the command line. To avoid entering the password repeatedly, you might add the user as an automatic “sudoer”.
$ sudo visudo
At the bottom of the file add the following line:
user ALL=(ALL) NOPASSWD: ALL

Update current packages (optional)

This step is not strictly required for this PoC, but for your convenience you might first want to update the packages of the default installation.

SpeechOutputPOC 11.png

SpeechOutputPOC 12.png


Note: After the update, a restart of the VM is needed.

Localization (optional)

Change the keyboard layout if needed (the default is a US keyboard):
$ sudo dpkg-reconfigure keyboard-configuration
Or manually (https://help.ubuntu.com/community/Lubuntu/Keyboard#Keyboard_Mapping), e.g. to use the German layout:
$ sudo leafpad /etc/xdg/lxsession/Lubuntu/autostart
Append the following line:
@setxkbmap -layout "de"
Note: the latter method will also change the flag at the bottom of the desktop.

3rd party packages

In a terminal you can start the Synaptic package manager (apt-get is also an option):
$ sudo synaptic

In Synaptic, search for and install the following packages:

  • qt4-dev-tools
  • qtcreator
  • pulseaudio
  • pulseaudio-dev

Note: Do not install the espeak or espeak-dbg packages, to avoid conflicts with the version used in the PoC.
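
To check whether one of the conflicting packages is already installed, the package database can be queried. The sketch below uses dpkg-query; the function name list_conflicting_espeak_packages is hypothetical and not part of the PoC delivery.

```shell
#!/bin/sh
# Hypothetical helper: print the names of installed packages that would
# conflict with the espeak version shipped with the PoC.
# Returns 2 if dpkg-query is not available on this system.
list_conflicting_espeak_packages() {
    if ! command -v dpkg-query >/dev/null 2>&1; then
        echo "dpkg-query not found; cannot inspect installed packages" >&2
        return 2
    fi
    for pkg in espeak espeak-dbg; do
        # ${Status} is a dpkg-query format field, not a shell variable.
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || status=""
        case "$status" in
            "install ok installed") echo "$pkg" ;;
        esac
    done
    return 0
}

# Example: list_conflicting_espeak_packages
```

If the function prints a package name, that package can be removed with Synaptic or apt-get before continuing.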

Workspace

You should install the downloaded PoC source files into the following folder:

“/home/user/Documents/GENIVI_SpeechOutputService_PoC_v1.0”

For the sake of simplicity, the following configuration steps refer to this folder by default.

Qt Creator IDE Environment

Setup for debugging
Terminal setup

If you want to use the debug mode in Qt Creator, you are likely to encounter the following error:

SpeechOutputPOC 14.png

A countermeasure is to use an alternative terminal as follows.

First install the xterm terminal:
$ sudo apt-get install xterm

Then in the IDE follow the menu “Tools -> Options…” and edit the default terminal as follows:

SpeechOutputPOC 16.png

Replace the command “x-terminal-emulator -e” with “xterm -sb -rightbar -e”.

Ptrace setup

Debugging requires the system to permit ptrace. To enable it, edit the following file:
$ sudo vi /etc/sysctl.d/10-ptrace.conf
After changing the value of kernel.yama.ptrace_scope from 1 to 0, you need to reboot the VM for the change to take effect.
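
The manual edit can also be scripted. The following is a minimal sketch, assuming the file contains a line of the form "kernel.yama.ptrace_scope = 1"; the function name set_ptrace_scope is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set kernel.yama.ptrace_scope to the given value in a
# sysctl configuration file. Comment lines are left untouched because the
# pattern only matches lines that start with the setting name.
set_ptrace_scope() {
    conf_file=$1
    value=$2
    tmp_file=$(mktemp) || return 1
    if ! sed "s/^kernel\.yama\.ptrace_scope[[:space:]]*=.*/kernel.yama.ptrace_scope = $value/" \
            "$conf_file" > "$tmp_file"; then
        rm -f "$tmp_file"
        return 1
    fi
    # Write back through the existing file to keep its owner and permissions.
    cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
}

# Example (must run in a root shell, since the file is owned by root):
#   set_ptrace_scope /etc/sysctl.d/10-ptrace.conf 0
```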

espeak open source TTS

The TTS software espeak is available as package(s) in Lubuntu. The packaged version is 1.1.46, but we decided to use the newer version 1.1.47 in the PoC, since for some reason we were not able to make the PoC work with the default 1.1.46 package. For information, the following picture shows where the system would install the relevant libraries; we reuse the same location. It is therefore not possible to have both versions installed at the same time.

One benefit of reusing the default system location is that other tools such as Qt Creator do not need to be reconfigured. The libespeak location of the default package, which the PoC also uses, appears in the IDE in the LD_LIBRARY_PATH environment variable as follows:


Some plumbing is now needed to link the espeak libraries and data delivered with the PoC into your system. As root, go to /usr/lib/i386-linux-gnu and create links from the system location to the custom espeak library:
root@lubuntu:/usr/lib/i386-linux-gnu# ln -s /home/user/Documents/GENIVI_SpeechOutputService_PoC_v1.0/lib/libespeak.so.1.1.47 libespeak.so
root@lubuntu:/usr/lib/i386-linux-gnu# ln -s /home/user/Documents/GENIVI_SpeechOutputService_PoC_v1.0/lib/libespeak.so.1.1.47 libespeak.so.1
In the user's home directory, check that the lib folder contains the following links:
user@lubuntu:~/Documents/GENIVI_SpeechOutputService_PoC_v1.0/lib$ ll
total 808
drwxrwxr-x 2 user user 4096 Dec 10 08:25 ./
drwxrwxr-x 9 user user 4096 Dec 10 08:29 ../
-rwxrwxrwx 1 user user 459564 Dec 9 07:31 libespeak.a*
lrwxrwxrwx 1 user user 19 Dec 10 07:53 libespeak.so -> libespeak.so.1.1.47*
lrwxrwxrwx 1 user user 19 Dec 10 07:53 libespeak.so.1 -> libespeak.so.1.1.47*
-rwxrwxrwx 1 user user 345479 Dec 10 05:52 libespeak.so.1.1.47*
If the links do not exist (e.g. because the import from a Windows host broke them), you need to add them manually as follows:
$ ln -s libespeak.so.1.1.47 libespeak.so.1
$ ln -s libespeak.so.1.1.47 libespeak.so
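
The check and the repair of the links can be combined in one step. The sketch below creates the two links only when they are missing or dangling; the function name ensure_espeak_links is hypothetical, and it does not touch links that were turned into regular files.

```shell
#!/bin/sh
# Hypothetical helper: make sure libespeak.so and libespeak.so.1 exist in the
# given lib folder as relative links to libespeak.so.1.1.47.
ensure_espeak_links() {
    lib_dir=$1
    target=libespeak.so.1.1.47
    if [ ! -f "$lib_dir/$target" ]; then
        echo "missing library: $lib_dir/$target" >&2
        return 1
    fi
    for link in libespeak.so.1 libespeak.so; do
        # -e follows links, so it is false for missing and dangling entries.
        if [ ! -e "$lib_dir/$link" ]; then
            rm -f "$lib_dir/$link"
            ln -s "$target" "$lib_dir/$link" || return 1
            echo "created: $lib_dir/$link -> $target"
        fi
    done
}

# Example:
#   ensure_espeak_links /home/user/Documents/GENIVI_SpeechOutputService_PoC_v1.0/lib
```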

Additionally, you need to install the espeak data. Note that the data from v1.1.46 (available as a Lubuntu package) is not compatible with v1.1.47 (it sounds like a sampling rate difference).
As root, copy the data from your delivery location into “/usr/share/”:

# cp -r espeak-data /usr/share/
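
The copy step can be guarded so that it fails early when the data folder is not where it is expected. This is a sketch only; the function name install_espeak_data and its optional destination argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy the espeak-data folder into a destination
# directory (default: /usr/share), failing early if the source is missing.
install_espeak_data() {
    src_dir=$1
    dest_dir=${2:-/usr/share}
    if [ ! -d "$src_dir" ]; then
        echo "espeak data folder not found: $src_dir" >&2
        return 1
    fi
    if [ ! -d "$dest_dir" ]; then
        echo "destination directory not found: $dest_dir" >&2
        return 1
    fi
    cp -r "$src_dir" "$dest_dir/"
}

# Example (run as root from the delivery location):
#   install_espeak_data espeak-data
```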

Now everything should be in place to start the project in the IDE and listen to the tests.
