In an industry dominated by expensive proprietary tools, open-source software offers some enticing benefits. The SpeakRight Framework is a new open-source Java framework for writing VoiceXML applications. It offers a high-level API and re-usable classes. And since you have the source code, it’s totally extensible.
SpeakRight is code-based. You write your application in Java using SpeakRight classes, with all the power and convenience of Java debugging, refactoring, and unit tests. And unlike general-purpose web frameworks, SpeakRight is built specifically for VoiceXML and speech recognition applications. It understands prompt escalation, grammars, and confirmation.
When asked to write a speech application, many developers are tempted to output the raw VoiceXML tags themselves. This works in simple cases, but once you add all the elements of a voice dialog, even a simple question involves many tags. Speech applications must deal with the inherent inaccuracy of speech recognition: retrying if recognition fails and confirming when the recognition confidence score is low.


Let’s see SpeakRight in action. This example is part of a hotel application, where the user is asked for the number of rooms to be reserved.

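SRApp app = new SRApp();
// add other callflow elements...

// ask for the number of rooms; SRONumber validates the range 1 to 10
SRONumber question = new SRONumber("rooms", 1, 10);
// confirm with a yes/no question when the recognition score is low
question.setConfirmer(new SROConfirmYesNo("rooms"));
// keep the answer in the model variable numRooms
question.setModelVar("numRooms");
app.add(question);
// play the stored value back to the caller
app.add(new PromptFlow("Got it. ${M.numRooms} rooms"));
// etc.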

A question object is created using one of the built-in SpeakRight classes. SRONumber asks for a number and does range validation. We also attach a confirmation object, which executes if the recognition score is below a configurable threshold. We use a simple yes/no confirmation in this example.
When executed, the question object might interact with the user like this:

Computer: How many rooms would you like?
Human: two, please.
Computer: Got it.  Two rooms.

Or if the user has more trouble due to background noise, the same question object would add error prompts and confirmation:

C: How many rooms would you like?
H: (garbled)
C: I didn't get that. Please say the number of rooms you would like.
H: one (recognized "nine" but with a low confidence score)
C: I heard 'nine'.  Is that correct?
H: no
C: Let's try again. How many rooms would you like?
H: one
C: Got it. One room.

SpeakRight provides default prompts and grammars. This lets you build a bare-bones application quickly. Then you can customize prompts (in external XML files) and grammars. You can also change the retry, validation, and confirmation logic. Since it’s all Java code, you have the full power of an object-oriented framework to customize, override, and extend. Even the final rendering into VoiceXML is an extension point.
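To make the retry and confirmation behaviour in the dialogs above concrete, here is a minimal plain-Java sketch of that logic. It is not SpeakRight's implementation: the class names Recognition and NumberDialogSketch, the 0.5 confidence threshold, and the limit of three attempts are all assumptions made for illustration, and the recognizer is replaced by scripted inputs.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical recognition result: the heard number and a confidence in [0, 1].
// A null value stands for a failed recognition (a "nomatch").
final class Recognition {
    final Integer value;
    final double confidence;

    Recognition(Integer value, double confidence) {
        this.value = value;
        this.confidence = confidence;
    }
}

final class NumberDialogSketch {
    static final double CONFIRM_THRESHOLD = 0.5; // assumed value
    static final int MAX_ATTEMPTS = 3;           // assumed value

    // Asks for a number in [min, max] using scripted recognizer results.
    // Every computer prompt is appended to transcript.
    // Returns the accepted number, or -1 once the attempts are used up.
    static int ask(Iterator<Recognition> numbers, Iterator<Boolean> yesNo,
                   int min, int max, List<String> transcript) {
        String prompt = "How many rooms would you like?";
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            transcript.add(prompt);
            Recognition r = numbers.next();

            // Failed recognition or out-of-range value: escalate the prompt and retry.
            if (r.value == null || r.value < min || r.value > max) {
                prompt = "I didn't get that. Please say the number of rooms you would like.";
                continue;
            }

            // Low confidence: confirm with a yes/no question before accepting.
            if (r.confidence < CONFIRM_THRESHOLD) {
                transcript.add("I heard '" + r.value + "'. Is that correct?");
                if (!yesNo.next()) {
                    prompt = "Let's try again. How many rooms would you like?";
                    continue;
                }
            }
            return r.value;
        }
        return -1;
    }
}
```

Feeding this sketch a failed recognition, then a low-confidence 9 that the caller rejects, then a high-confidence 1 reproduces the second dialog: four computer prompts and an accepted value of 1.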
Architecture
A SpeakRight application generally lives in a servlet, serving up VoiceXML to a speech platform such as those offered by Voxeo, Nuance, and Microsoft. The platform renders the VoiceXML into audio. Upon completion of a page, it sends back user input and call events. SpeakRight uses this information to traverse to the next element in the callflow. Dialog state during a session is stored by SpeakRight, usually in a servlet session variable.
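The request cycle described above can be sketched without the servlet API. In this simplified illustration a plain Map stands in for the servlet session, and each request returns the next VoiceXML page. The class CallflowSketch, its method names, and the session keys stepIndex and lastInput are hypothetical and are not SpeakRight's API; prompt text is assumed to need no XML escaping.

```java
import java.util.List;
import java.util.Map;

final class CallflowSketch {
    // Hypothetical callflow: each step is reduced to a single prompt string.
    private final List<String> prompts;

    CallflowSketch(List<String> prompts) {
        this.prompts = prompts;
    }

    // Handles one request from the speech platform.
    // session stands in for the servlet session and holds the dialog state.
    // userInput is whatever the platform posted back for the previous page (may be null).
    String nextPage(Map<String, Object> session, String userInput) {
        if (userInput != null) {
            session.put("lastInput", userInput);
        }
        int index = (Integer) session.getOrDefault("stepIndex", 0);
        if (index >= prompts.size()) {
            return page("<exit/>"); // callflow finished
        }
        session.put("stepIndex", index + 1); // remember where to resume
        return page("<prompt>" + prompts.get(index) + "</prompt>");
    }

    // Wraps executable content in a minimal VoiceXML 2.1 document.
    static String page(String body) {
        return "<?xml version=\"1.0\"?>"
             + "<vxml version=\"2.1\" xmlns=\"http://www.w3.org/2001/vxml\">"
             + "<form><block>" + body + "</block></form>"
             + "</vxml>";
    }
}
```

In a real deployment the map would be replaced by the servlet session and the returned string written to the HTTP response. The point of the sketch is only that all dialog state lives on the server between pages.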
GUI frameworks are based around objects called controls (or widgets). A form is built out of a set of controls that manage presentation and take user input. SpeakRight takes a similar approach to VUIs (voice user interfaces). It is based around flow objects. A flow object manages presentation (prompts, grammars, and retry logic) and control flow. In Model-View-Controller terms, a flow object is both the view and the controller. Flow objects can be customized (by setting properties) or extended by subclassing. SpeakRight provides built-in objects for standard data types (time, date, alphanum) and for standard flow algorithms (forms, menus, list traversal, confirmation).
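The idea of flow objects as composable building blocks can be illustrated with a small sketch. The interface FlowSketch and the classes PromptStep, SequenceFlow, and RoomBooking are invented for this example and do not reflect SpeakRight's actual class hierarchy. They only show customization through a property, composition through nesting, and extension through subclassing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical contract for a flow object; SpeakRight's real interface may differ.
interface FlowSketch {
    // Prompts this flow would play, in callflow order.
    List<String> prompts();
}

// Leaf flow that plays one prompt. It is customized by setting a property.
class PromptStep implements FlowSketch {
    private String text;

    PromptStep(String text) {
        this.text = text;
    }

    void setText(String text) {
        this.text = text;
    }

    public List<String> prompts() {
        return Collections.singletonList(text);
    }
}

// Composite flow that runs its children in order.
// A whole application can itself be a flow built this way.
class SequenceFlow implements FlowSketch {
    private final List<FlowSketch> children = new ArrayList<>();

    SequenceFlow add(FlowSketch child) {
        children.add(child);
        return this;
    }

    public List<String> prompts() {
        List<String> all = new ArrayList<>();
        for (FlowSketch child : children) {
            all.addAll(child.prompts());
        }
        return all;
    }
}

// Extension by subclassing: a reusable sub-dialog packaged as its own flow.
class RoomBooking extends SequenceFlow {
    RoomBooking() {
        add(new PromptStep("How many rooms would you like?"));
        add(new PromptStep("Got it."));
    }
}
```

An application could then be assembled as new SequenceFlow().add(new PromptStep("Welcome.")).add(new RoomBooking()), which nests one flow inside another without the outer flow knowing what the inner one contains.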
Features

  • VoiceXML 2.1 (partial support currently, more to come)
  • Inline, built-in, and external grammars (GSL and GRXML).
  • Prompts can be TTS, audio, or rendered data values. External prompts in XML files for multi-lingual applications and post-deployment flexibility.
  • Built-in support for noinput, nomatch, and help events. Escalated prompts.
  • Built-in validation. A flow object’s ValidateInput method can validate user input and accept, ignore, or retry the input.
  • A library of re-usable “speech objects”, called SROs, is provided for common tasks such as times and dates, numbers, and currency.
  • Flow Objects. A flow object represents a dialog state such as asking for a flight number. Each flow object is rendered as one or more VoiceXML pages. Flow objects are fully object-oriented: you can use inheritance, composition, and nesting to combine flow objects. The speech application itself is a flow object.
  • Throw/catch used for errors such as max-retries-exceeded, validation-failed, or user-defined events. This simplifies callflow development because it encourages centralized error handling (although local error handling can be done when needed). Also, throw/catch increases software re-use because (unlike a ‘goto’) it decouples the part of the app throwing the error from the part of the app that handles it.
  • MVC architecture. Built-in model allows sharing of data between flow objects.
  • Flow objects can invoke business logic upon completion.
  • Extension points are available in the framework for customization.
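The throw/catch item in the list above can be illustrated with ordinary Java exceptions. This is only a sketch of the general pattern, not SpeakRight's event mechanism: FlowEvent, EventRouterSketch, and the handler prompts are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical callflow event, identified by name (for example "max-retries-exceeded").
class FlowEvent extends RuntimeException {
    final String name;

    FlowEvent(String name) {
        super(name);
        this.name = name;
    }
}

// Central place where the application registers what to do for each event.
final class EventRouterSketch {
    private final Map<String, String> handlers = new HashMap<>();

    // Registers the prompt to play when the named event is thrown.
    void onEvent(String name, String prompt) {
        handlers.put(name, prompt);
    }

    // Runs one dialog step. If the step throws an event, the centrally
    // registered prompt is returned; unregistered events propagate to the caller.
    String run(Supplier<String> step) {
        try {
            return step.get();
        } catch (FlowEvent e) {
            String prompt = handlers.get(e.name);
            if (prompt == null) {
                throw e;
            }
            return prompt;
        }
    }
}
```

The step that throws knows only the event name, and the router knows only how to respond to it. That separation is the decoupling the list item describes: the same step can be reused in another application with different handlers.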

On your next VoiceXML project, give SpeakRight a try.