GPSS Speech Interface

Created 0900 Thursday 5th February 1998

Introduction
This page provides a simplified description of the GPSS Human-Computer Interface (HCI) when GPSS is used with speech recognition.
GPSS is primarily a speech-interactive application, but speech recognition technology is only just beginning to reach the high performance and low cost demanded for consumer products. Consequently, the Car-PC products using GPSS now might have speech recognition available, but they will also provide a more reliable HCI such as a touch-screen. The vast majority of existing GPSS users are running GPSS on conventional Notebook PCs, with neither speech recognition software nor touch-screens.

Technical Interface for Voice Recognition

GPSS gets its input from the speech recognition system by keyboard keystrokes. This is a practical interface, since the same functions in GPSS can be tested and used by hitting keys on a keyboard, and because most commercially available speech recognition systems can already emulate keyboard keystrokes. They have to do this if their vendors are to offer the capability of using the speech recognition product with established products such as MS Word.
So the speech recognition system must be set up with a vocabulary of words and phrases, each emulating a keystroke recognised by GPSS.
e.g. "where are we" emulates keystroke W, "yes" emulates Y, "be quiet" Q, etc.
The full list of GPSS commands is given in GPSS.HLP within the software.
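The phrase-to-keystroke idea can be sketched in a few lines of Python. This is only an illustration of the mapping described above, not code from GPSS or from any speech recognition product: the table name PHRASE_TO_KEY and the function phrase_to_keystroke are hypothetical, and only the three entries quoted above are included.

```python
# Hypothetical vocabulary table. Only these three entries are quoted in
# the text; the full command list lives in GPSS.HLP.
PHRASE_TO_KEY = {
    "where are we": "W",
    "yes": "Y",
    "be quiet": "Q",
}


def phrase_to_keystroke(phrase):
    """Return the keystroke a recognised phrase should emulate, or None."""
    # Normalise case and spacing so "Where  are we" matches the table.
    normalised = " ".join(phrase.lower().split())
    return PHRASE_TO_KEY.get(normalised)


print(phrase_to_keystroke("Where are we"))
print(phrase_to_keystroke("be quiet"))
```

Because the output is just a keystroke, the same table can be exercised from an ordinary keyboard when no recogniser is available.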

GPSS will work without any operator input

On startup, GPSS will default to providing spoken information regularly, without need for operator input or control - other than a volume control :-)
Every 60 seconds GPSS will describe where you are, and the direction in which the car is moving:
"we are 25 miles west of London, and 2 miles south east of Ascot, moving north"
Every 20 seconds - if there is a destination selected - it will provide guidance:
"destination Esso filling station, 550 yards to our right at your 2 o'clock"
- and if RGI is in operation:
"turn right after the school on your left"
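The timing above can be pictured with a small Python sketch. The 60 and 20 second intervals come from the text; everything else is an assumption made for illustration, including the function name announcements_due, the idea of counting whole seconds since startup, and the choice to say nothing at time zero. How GPSS actually schedules its speech is not described here.

```python
POSITION_INTERVAL_S = 60  # "where we are" interval, from the text
GUIDANCE_INTERVAL_S = 20  # destination guidance interval, from the text


def announcements_due(elapsed_s, has_destination):
    """List the announcements due at a whole number of seconds since startup.

    Assumption: nothing is spoken at time zero, and guidance is only
    spoken when a destination has been selected.
    """
    due = []
    if elapsed_s > 0 and elapsed_s % POSITION_INTERVAL_S == 0:
        due.append("position")
    if has_destination and elapsed_s > 0 and elapsed_s % GUIDANCE_INTERVAL_S == 0:
        due.append("guidance")
    return due


for second in (20, 40, 60):
    print(second, announcements_due(second, has_destination=True))
```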

Voice Interaction

Voice Recognition gives the Driver (or passenger) control over GPSS, and the ability to request information or give it commands.
Most commands result in a spoken response from the computer such as "OK", and some will provide a spoken answer to the question.
The simplest commands are those to control what is spoken by the computer:
"be quiet" - Q - GPSS will stop speaking and only speak when spoken to.
"OK to speak" - O - GPSS will then ask a series of questions starting with "shall I say the same things as before ?", expecting a Yes or No answer from the driver.
"yes" - Y - in answer to this first question, will enable all speech output.
"no" - N - will result in a series of questions to permit any combination of outputs to be inhibited. e.g. "shall I say where we are regularly ?", "shall I say direction to our destination regularly ?", "shall I say Route Guidance Instructions ?", etc.
Failure to get a "Yes" or "No" reply will result in GPSS saying "please answer yes or no".
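The "OK to speak" dialogue can be sketched as a short question-and-answer loop. This is a simplified model under stated assumptions, not the GPSS implementation: the names OUTPUT_QUESTIONS, ask_yes_no and ok_to_speak are hypothetical, only the three follow-up questions quoted above are included, and it is assumed that GPSS simply waits for another key after saying "please answer yes or no".

```python
# Hypothetical list of outputs; the text quotes these three questions
# and ends with "etc.", so the real list is longer.
OUTPUT_QUESTIONS = [
    ("position", "shall I say where we are regularly ?"),
    ("direction", "shall I say direction to our destination regularly ?"),
    ("rgi", "shall I say Route Guidance Instructions ?"),
]


def ask_yes_no(question, keys, spoken):
    """Speak a question, then read keys until Y or N arrives."""
    spoken.append(question)
    for key in keys:
        if key in ("Y", "N"):
            return key == "Y"
        spoken.append("please answer yes or no")
    raise ValueError("input ended before a yes or no answer")


def ok_to_speak(keys):
    """Run the dialogue; return (enabled outputs, phrases spoken)."""
    keys = iter(keys)  # one shared iterator, so each question consumes keys in turn
    spoken = []
    if ask_yes_no("shall I say the same things as before ?", keys, spoken):
        enabled = {name: True for name, _ in OUTPUT_QUESTIONS}
    else:
        enabled = {name: ask_yes_no(q, keys, spoken) for name, q in OUTPUT_QUESTIONS}
    return enabled, spoken


enabled, spoken = ok_to_speak(["N", "X", "Y", "N", "Y"])
print(enabled)
```

In the usage line, the stray "X" stands for any unrecognised reply, which triggers the "please answer yes or no" prompt once before the dialogue continues.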
It should be remembered that the actual words spoken by the computer can be configured within GPSS to be any phrase, in any language. This is done by simply recording sounds or editing text files. The same applies to the words or phrases spoken by the driver - for most existing speech recognition systems.
Examples of questions asked by the driver, and typical spoken replies are:
"Where are we ?" - W - "we are 25 miles west of London in Sunninghill, moving north, destination Ascot, 2 miles to our left at your 2 o'clock".
"Petrol Filling Station ?" - F - "the nearest filling station is Esso, 400 yards ahead at your 12 o'clock"
"search" - Enter - "the next nearest is BP, 2.2 miles to our left at your 4 o'clock"
After finding a place, the driver might ask that to be made the destination:
"destination" - D - "destination BP filling station, 26 miles west of London and 2.1 miles south west of Ascot".
From then on, the guidance will be automatically calculated and spoken, based upon the GPS position of the car.
Places may be found by searching in one of the GPSS 'search files' for a place based on its name, or part of its name. Future speech recognition systems may become sufficiently reliable for vocabularies of tens of thousands of words, in which case the dialogue might be:
"Find Stonehenge" - `stonehenge (enter) - resulting in a spoken description of where the nearest place with that name is, or an "I cannot find" message.
Current state of the art speech recognition systems can recognise only a few tens of utterances reliably, so for these the place would be spelt out by the driver:
"Find S T O N E H search" - `stoneh (enter) - note that GPSS will locate the nearest place which has part of its name matching the text. So HENG might find Stonehenge or Woodhenge - whichever is the nearer to the car.
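The "nearest place with a matching part of its name" rule can be sketched as follows. This is a guess at the behaviour described, not the GPSS search code: the function find_nearest is hypothetical, places are plain (name, x, y) tuples with made-up flat coordinates rather than real positions, and straight-line distance on a flat plane is assumed.

```python
import math


def find_nearest(places, text, car_xy):
    """Return the nearest place whose name contains text, or None.

    places is a list of (name, x, y) tuples in any consistent flat
    units; the match ignores case, as the typed text would.
    """
    wanted = text.lower()
    matches = [p for p in places if wanted in p[0].lower()]
    if not matches:
        return None  # the caller would speak an "I cannot find" message
    car_x, car_y = car_xy
    return min(matches, key=lambda p: math.hypot(p[1] - car_x, p[2] - car_y))


# Illustrative coordinates only, not real locations.
places = [("Stonehenge", 10.0, 0.0), ("Woodhenge", 3.0, 0.0), ("Ascot", 1.0, 0.0)]
print(find_nearest(places, "HENG", (0.0, 0.0)))
print(find_nearest(places, "stoneh", (0.0, 0.0)))
```

With the car at the origin, "HENG" matches both henges and the nearer one is chosen, while "stoneh" matches only one name.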

More Information is Available

For some places, more information will be available in addition to the name and geographic location. An example of this is given in the basic GPSS software, where there is a sound recording and photograph of the Berystede Hotel. The data could include text, sound, a picture, or even a full screen multimedia video clip. The way GPSS behaves is the same:
When the place is found, if more information is available, GPSS will add :
".. more information is available"
and in response to the driver, or passenger, saying "tell me more", will play the appropriate sounds, pictures, video, etc.
If the user asks "tell me more" for a place without more information, the computer will simply respond, "I am sorry, but more information is not available".
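The behaviour above amounts to one check on whether a place has any extra material attached. A minimal sketch, assuming each place carries a simple list of media file names; the functions found_message and tell_me_more and the file names in the usage lines are hypothetical, not taken from GPSS:

```python
def found_message(description, extras):
    """Append the hint when a found place has extra material."""
    if extras:
        return description + " .. more information is available"
    return description


def tell_me_more(extras):
    """Return (spoken reply or None, list of media items to play)."""
    if not extras:
        return "I am sorry, but more information is not available", []
    return None, list(extras)


# Hypothetical media file names, for illustration only.
hotel_extras = ["hotel.wav", "hotel.bmp"]
print(found_message("Berystede Hotel", hotel_extras))
print(tell_me_more(hotel_extras))
print(tell_me_more([]))
```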

Conclusion

The above is a very simple explanation of the GPSS interactive speech syntax. More commands are available and described in the GPSS.HLP and BOOK.TXT files within the product.
- and reading this is no substitute for running the software :-)