SpeechObjects Specification V1.0
W3C Note 14 November 2000
- This Version:
- https://www.w3.org/TR/2000/NOTE-speechobjects-20001114
- Latest version:
- https://www.w3.org/TR/speechobjects
- Editors:
- Daniel C. Burnett, Nuance Communications <burnett@nuance.com>
Copyright © 2000 Nuance Communications, Inc. All rights reserved.
Abstract
This document describes SpeechObjects, a core set of reusable dialog components, callable through a dialog markup language such as VoiceXML, that perform specific dialog tasks such as obtaining a date or a credit card number. The major goal of SpeechObjects is to complement the capabilities of the dialog markup language and to leverage best practices and reusable component technology in the development of speech applications.
Status of this document
This document is a submission to the World Wide Web Consortium from Nuance Communications, Inc. (see Submission Request, W3C Staff Comment). For a full list of all acknowledged Submissions, please see Acknowledged Submissions to W3C.
This document is a Note made available by W3C for discussion only. This work does not imply endorsement by, or the consensus of the W3C membership, nor that W3C has, is, or will be allocating any resources to the issues addressed by the Note. This document is a work in progress and may be updated, replaced, or rendered obsolete by other documents at any time.
A list of current W3C technical documents can be found at the Technical Reports page.
Table of Contents
- Introduction
- Background
- Concepts
- SpeechObject specifications
- Parameters and return values common to all SpeechObjects
- Dialog
- Yes/No
- Quantity
- Simple Digit String
- Date
- Time
- Menu
- US Currency
- North American Telephone Number
- Alpha Digit String
- Confirm and Correct
- Browsable Selection
- Browsable Action
- US Zip Code
- Credit Card Info
- Sectioned Digit String
- Browsable List
- Appendix
1. Introduction
SpeechObjects are reusable software components that encapsulate discrete pieces of conversational dialog. SpeechObjects are based on an open architecture that can be deployed on any of the major server and IVR (interactive voice response) platforms. This paper describes a specification based on Nuance's Java implementation of SpeechObjects.
Simply stated, a SpeechObject is a reusable software component that implements a dialog flow and is packaged with the audio prompts and recognition grammars that support that dialog. A Java call to a SpeechObject can be as simple as
// Initialize the SpeechObject
SODate date = new SODate();
// Invoke the SpeechObject
SODate.Result result = date.invoke(sc, dc, cs);
// Look at the results
int month = result.getMonth();
int day = result.getDayOfMonth();
int year = result.getYear();
In this document we will present both the configuration parameters (JavaBean properties) and the return values for each of the SpeechObjects.
2. Background
2.1 Architectural model
The Java SpeechObjects architecture was designed to be portable and extensible, as well as easy to use. To this end SpeechObjects are all based on a primary interface, SpeechObject. This simple interface defines:
- A single method, invoke, that applications call to run a SpeechObject
- An inner class, Result, used to return the recognition results obtained during the dialog executed by the SpeechObject
From the SpeechObject interface and a set of supporting interfaces, SpeechObject developers can build objects of any complexity that can be run with a single method call. The invoke method for any given SpeechObject executes the entire dialog for that SpeechObject. A simple invoke method might just play a standard prompt, wait for speech, and return the results after recognition completes. A more complicated invoke method could include multiple dialog states, smart prompts, intelligent error handling for both user and system errors, context-sensitive help, and any other features built in by the SpeechObject developer.
To call a SpeechObject from your application, however, doesn't require you to know anything about how the invoke method is implemented. You only need to provide the correct arguments and know what information you want to extract from the results.
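The interfaces just described can be sketched in Java as follows. The class and method names mirror those used in this document; the stub implementations and the toy SOYesNo are illustrative assumptions, not the Nuance implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class SpeechObjectSketch {
    interface SpeechChannel { }      // recognition services (stub)
    static class DialogContext { }   // data shared between SpeechObjects (stub)
    static class CallState { }       // per-call counters and state (stub)

    // Result: the key/value pairs accumulated during the dialog.
    public static class Result {
        private final Map<String, Object> slots = new HashMap<>();
        public void put(String key, Object value) { slots.put(key, value); }
        public Object get(String key) { return slots.get(key); }
    }

    // The single-method interface that every SpeechObject implements.
    interface SpeechObject {
        Result invoke(SpeechChannel sc, DialogContext dc, CallState cs);
    }

    // A toy SpeechObject whose "dialog" simply fills one slot; a real one
    // would play prompts and run recognition inside invoke.
    static class SOYesNo implements SpeechObject {
        public Result invoke(SpeechChannel sc, DialogContext dc, CallState cs) {
            Result r = new Result();
            r.put("YesNo", "yes");
            return r;
        }
    }

    public static String demo() {
        Result r = new SOYesNo().invoke(null, new DialogContext(), new CallState());
        return (String) r.get("YesNo");
    }
}
```

However complex the dialog inside invoke becomes, the caller's view stays exactly this small.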
The SpeechChannel is the object that provides recognition functionality to a SpeechObject. When an application is launched, the environment allocates a SpeechChannel for each supported port. This SpeechChannel is passed to the application for each incoming call and persists until the application terminates. The SpeechObjects that make up the application use the SpeechChannel to interact with the caller: requesting recognition services, playing prompts, setting configuration parameters, and so on. Telephony control is an optional component of the SpeechChannel.
2.2 Goals and principles of Design
In short, SpeechObjects
- are components
- Components are software objects that are complex enough to provide useful functionality, but small enough to be broadly reusable. A well-designed component exposes the properties that allow users to tailor it for specific needs. As such, components help make application developers more efficient and significantly reduce the time to market while improving overall application quality.
- are uniform
- Both the simple Yes/No SpeechObject and the more complex Confirm and Correct SpeechObject implement the same SpeechObject interface. This characteristic allows them to be treated identically by someone who is using them to implement a higher-level dialog. For example, a "flowchart" SpeechObject might contain a set of SpeechObjects, and invoke them in some order determined by its property settings. The container SpeechObject doesn't need to know any of the specifics of the SpeechObjects it contains; it only cares about the attributes they have in common, such as the way they are invoked and the format of the results they return. Because this kind of uniformity encourages reuse and simplifies the programming model, all SpeechObjects are required to implement a core set of methods defined in the SpeechObject interface.
- are interoperable
- Not only do all SpeechObjects implement the same interface, but as mentioned earlier they rely on another
abstract interface, called a SpeechChannel, to
execute their dialogs. The SpeechChannel provides the interface to a
speech recognition subsystem, and is presented as an abstract
interface so that any speech recognition vendor can implement it.
This allows SpeechObjects written by any developer to run on top of
a SpeechChannel supporting any recognition engine, provided that
both providers adhere to the SpeechObject standard. Additionally,
SpeechObjects written by one vendor can refer to, invoke, extend, or
encapsulate SpeechObjects provided by another vendor. SpeechObjects
are also callable from the VoiceXML dialog markup language. The
Dialog Markup Language, presently being specified by the World Wide
Web Consortium, is based on VoiceXML and is therefore expected to be
able to call SpeechObjects as well.
Here is an example of how to call a SpeechObject from within VoiceXML:
<object name="appt_date"
        classid="speechobject://nuance.so.SODate">
  <param name="initialPrompt"
         type="nuance.common.prompts.URLPrompt">
    <param name="URL" type="java.net.URL"
           expr="../prompts/when.wav"/>
  </param>
</object>
- are object-oriented
- In addition to the fact that SpeechObjects are themselves objects, SpeechObjects support an object-oriented development method. Application development is SpeechObject-centric, encouraging speech application developers to develop and package audio prompts, recognition grammars, and any other required resources together with the SpeechObject. Because these dialog elements are interrelated and interdependent, keeping them together results in better dialogs. The object-based nature of SpeechObjects makes them more interoperable with other software development environments and easier to package, deploy and install.
- vary in power and complexity
- Any unit of dialog of any complexity can be packaged as a
SpeechObject. A common dialog example is a SpeechObject that knows
how to elicit a date from a caller. In an ideal scenario, this is an
almost trivial dialog: the application prompts for a date and the
caller responds. The SpeechObjects paradigm, however, allows much
more sophisticated functionality to be built in, including
error-handling, disambiguation, and reprompting that allow recovery
from speaker or recognizer errors. This is provided without making
the SpeechObject more complex to use, and turns even a "simple"
dialog, such as eliciting a date, into an extremely valuable
object.
At the other end of the complexity spectrum, domain-specific dialogs can also be packaged as SpeechObjects. For example, an application that works with a caller to perfect a travel itinerary can be packaged as a single SpeechObject. An application developer could use this SpeechObject as a turnkey application or embed it as a subsystem in a larger application, eliminating days or weeks of development that would have been spent on the travel dialog.
Nuance's Enterprise SpeechObjects demonstrate this flexibility, providing SpeechObjects for vertical domains such as Customer Relationship Management (CRM) and order entry.
- are modular
- The travel itinerary SpeechObject described above would probably not be written from scratch; it would use other SpeechObjects as necessary. For example, to ask the caller for the departure date, it would delegate that part of the interaction to a specialized "date" SpeechObject, and so on. In this way more complex SpeechObjects can be constructed using simpler SpeechObjects. This ability to leverage smaller components when building larger components is an important contributor to the power of component-based technology.
- exist in a universal namespace
- Java SpeechObjects adhere to the Java convention of naming classes in a globally unique way. For example, Nuance's SpeechObjects are part of the nuance.so Java package. Other vendors should follow the same convention, guaranteeing that SpeechObjects from different vendors have globally unique names and can coexist. The SpeechObject architecture uses a similar convention for naming audio prompt files to avoid naming collisions.
- leverage existing standards
- SpeechObjects make use of existing standards wherever possible. For example, the universal commands described in section 3.1 were proposed by AVIOS' Telephone Speech Standards Committee (TSSC).
2.3 Implementation Platform and Requirements
This section briefly describes the process of invoking a SpeechObject as a motivation for the runtime requirements and then presents the SpeechChannel.
2.3.1 Invocation process
To use a SpeechObject from your application, you simply instantiate it and call its invoke method.
The invoke method executes the dialog defined by the SpeechObject, and returns an instance of the Result class used by that SpeechObject. This result provides your application with the data that was accumulated during the dialog.
The invoke method takes several arguments:
- A SpeechChannel object, allocated to your application for the duration of a call. The SpeechChannel provides the bridge between your application and the Nuance recognition client, giving you access to recognition functionality and telephony control features. The SpeechChannel is described below in more detail.
- A DialogContext object that can be used to pass information between the SpeechObjects that make up a dialog.
- A CallState object that maintains information about the current call, such as the number of times the user has asked for help or the number of errors that have occurred, and the overall application state. The CallState object references an additional object, an AppState, that stores application state information that applies across all handled calls.
2.3.2 Runtime environment requirements
In order for the invocation described above to work, a platform must implement a SpeechChannel and provide a launcher that creates the DialogContext and CallState and invokes the object.
2.3.3 SpeechChannel
The SpeechChannel is an integral part of a SpeechObjects application. Application developers use SpeechObjects to implement the dialog flow, and SpeechObject developers use SpeechChannel methods to implement the recognition functionality of the dialog.
This section describes the abstract SpeechChannel architecture in more detail.
SpeechChannel interfaces
Functionality provided by the SpeechChannel is actually separated into five interfaces: the main speech channel interface that provides recognition functions, and four separate interfaces that define the functionality for:
- Prompt playback, that is playing out standard audio prompts - either pre-recorded or to a text-to-speech engine, and managing the prompt queue.
- Dynamic grammars, that is, the ability to create and modify grammars at runtime. This is typically used to build caller-specific grammars, for example, for personal address books. It can also be used to allow SpeechObjects to construct grammars on the fly.
- Speaker verification, that is, the ability to verify that a speaker is who they claim to be by analyzing their voice.
- Telephony features. This interface is optional, allowing SpeechChannel implementations to more easily be created for non-telephony platforms or for specialized telephony requirements.
The SpeechChannel is the primary object and provides access to the corresponding implementation of the other interfaces. SpeechObjects work with the single SpeechChannel object passed to them and can access the other interfaces when needed:
These interfaces can be implemented in the same class, or in separate classes, as appropriate for the platform. In either case, the SpeechChannel interface defines methods that return each of the other interfaces. For example, if a SpeechObject wanted to access dynamic grammar functionality, it would call the SpeechChannel getDynamicGrammarControl method and use the returned object to make dynamic grammar requests.
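The accessor pattern can be sketched as follows. Only getDynamicGrammarControl appears in the text; the other accessor and interface names are assumed analogues, and the all-in-one class is just one of the two layouts the text permits.

```java
public class SpeechChannelSketch {
    interface PromptPlayback { }          // prompt queue management
    interface DynamicGrammarControl { }   // runtime grammar creation
    interface SpeakerVerification { }     // voice-based identity checks
    interface TelephonyControl { }        // optional telephony features

    // The primary interface returns each of the supporting interfaces.
    interface SpeechChannel {
        PromptPlayback getPromptPlayback();
        DynamicGrammarControl getDynamicGrammarControl();
        SpeakerVerification getSpeakerVerification();
        TelephonyControl getTelephonyControl();  // may be absent (null)
    }

    // As the text notes, a single class may implement all of the interfaces.
    static class AllInOneChannel implements SpeechChannel, PromptPlayback,
            DynamicGrammarControl, SpeakerVerification {
        public PromptPlayback getPromptPlayback() { return this; }
        public DynamicGrammarControl getDynamicGrammarControl() { return this; }
        public SpeakerVerification getSpeakerVerification() { return this; }
        public TelephonyControl getTelephonyControl() { return null; }  // non-telephony platform
    }

    public static boolean demo() {
        SpeechChannel sc = new AllInOneChannel();
        return sc.getDynamicGrammarControl() != null
                && sc.getTelephonyControl() == null;
    }
}
```

A SpeechObject needing dynamic grammars simply calls getDynamicGrammarControl on whatever SpeechChannel it was handed, never caring which layout the platform chose.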
3. Concepts
This section briefly describes some concepts underlying SpeechObjects that may provide a context for understanding the specific parameters and return values described in Section 4.
3.1 Universal commands
A universal command is an utterance that the speaker can say at any point during any dialog. The framework includes a grammar allowing recognition of a small set of universal commands, and provides default handling when these utterances are recognized. The universal commands currently defined by the SpeechObjects framework are:
- Help
- This universal allows the speaker to get context-sensitive help at any point in the dialog simply by saying "help." The default behavior is to play the help prompt defined by the currently executing SpeechObject and reexecute the dialog loop.
- Operator
- This universal redirects the caller to a live agent if your application is so enabled. The "0" touch-tone is also interpreted as a request to transfer to the operator.
- Repeat
- This allows the speaker to ask the dialog to replay the last set of prompts.
- Go to the top
- This returns the speaker to the beginning of the application dialog.
Through method calls the application can substitute its own handling for any of the supported universals, including disabling them.
As mentioned earlier, these universal commands are based on standards proposed by the Telephone Speech Standards Committee (TSSC).
3.2 Error handling and recovery
All SpeechObjects provide default handling, including prompts and logic (and grammar adjustments, if necessary), for all of the common recognizer error conditions: rejection, no speech timeout, too much speech, spoke too early, recognizer too slow, and unexpected key.
The default error handlers for each of these error types play an error prompt and then attempt to reexecute the dialog. The error prompt is generated by combining an application-wide error prompt that is specific to the type of error with a generic prompt provided by the current SpeechObject. For example, if a no-speech timeout occurs while a Yes/No SpeechObject dialog is executing, the framework concatenates the application-wide error prompt "I'm sorry, I didn't hear you" and the Object error prompt "Please say 'yes' or 'no'."
The default error handling mechanism continually reexecutes the dialog until a valid response is generated or the error threshold for the object is reached. When the threshold is reached, an exception is thrown and you can implement whatever error handling behavior you prefer, such as transferring the caller to a live agent. You can also override the default error handlers for any of the defined error types through class method calls.
All prompts can be overridden through configuration parameters.
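The default loop can be sketched as follows; the prompt text comes from the example above, while the method shape and the MAX_ERROR_COUNT constant are illustrative assumptions (the recognizer is reduced to a list of success/failure outcomes).

```java
import java.util.ArrayList;
import java.util.List;

public class ErrorLoopSketch {
    static final int MAX_ERROR_COUNT = 3;  // assumed MaxErrorCount setting

    // Reexecute the dialog until recognition succeeds or the error threshold
    // is reached; each failure plays the concatenated error prompt.
    public static List<String> run(boolean[] recognitionSucceeds) {
        List<String> played = new ArrayList<>();
        int errors = 0;
        for (boolean ok : recognitionSucceeds) {
            if (ok) return played;  // valid response: done
            errors++;
            // application-wide error prompt + the object's generic prompt
            played.add("I'm sorry, I didn't hear you. Please say 'yes' or 'no'.");
            if (errors >= MAX_ERROR_COUNT)
                throw new RuntimeException("error threshold reached");
        }
        return played;
    }
}
```

The thrown exception is where an application would install its own fallback, such as transferring the caller to a live agent.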
3.3 Playables
All prompts used by SpeechObjects are encapsulated using classes that implement the core interface Playable, which defines the protocol for objects that can be played over an audio channel. The framework defines a set of classes that implement Playable that provide additional prompt behavior, including:
- Concatenated prompts let you easily assemble a prompt from multiple elements, for example, from multiple audio files. The resulting prompt object can be manipulated and added to the prompt queue just as if it were a single prompt, instead of having to consider each element individually.
- Escalating prompts let you define a sequence of prompts; each time you play the prompt, the next prompt in the sequence is played. This is especially useful for help or error prompts. If the prompt needs to be repeated because the user asks for help again or the dialog generates repeated errors, the prompt can provide additional information on later repetitions.
- Context-sensitive prompts encapsulate a set of prompts and select the correct prompt to play based on a given context.
- Random prompts let you add variety to a dialog by defining a set of prompts, one of which is selected at random each time the prompt is played. You can also assign relative probabilities to each prompt in the set.
- Text-to-speech prompts let you generate synthesized prompts from a text string via a third-party text-to-speech engine.
- Silence prompts provide an easy way to generate a prompt to be used to insert a pause into a dialog prompt sequence. You can use this prompt class to automatically generate a period of silence rather than having to record your own "silence" prompt. You define the length of the pause.
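As one illustration, an escalating prompt could be sketched as below. Playable is reduced here to a method returning text, and all names are assumptions rather than the framework's actual classes.

```java
public class EscalatingPromptSketch {
    interface Playable { String play(); }  // simplified: "playing" returns the text

    // Each play() yields the next prompt in the sequence, then sticks at the
    // last, most detailed prompt.
    public static class EscalatingPrompt implements Playable {
        private final String[] steps;
        private int next = 0;
        public EscalatingPrompt(String... steps) { this.steps = steps; }
        public String play() {
            String p = steps[next];
            if (next < steps.length - 1) next++;
            return p;
        }
    }

    public static EscalatingPrompt helpPrompt() {
        return new EscalatingPrompt(
                "Please say a date.",
                "You can say things like 'tomorrow' or 'December twelfth'.",
                "Say the month, the day, then the year.");
    }
}
```

Repeated help requests thus get progressively more detailed guidance without any extra logic in the SpeechObject itself.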
3.4 Grammars
SpeechObjects are designed to let you define grammars in a variety of ways, based on the requirements for each dialog and the need to customize the grammar at runtime. Most SpeechObjects use the Nuance dynamic grammar mechanism, meaning that the grammar for a given SpeechObject is compiled and loaded onto the recognition server when the SpeechObject is constructed. This allows SpeechObjects to be reused much more easily, as you don't have to compile the grammar for each SpeechObject into a recognition package before using it.
The SpeechObjects framework grammar classes let you build your grammars in a variety of ways:
- From a text file
- This format allows grammars to easily be updated as the application is developed and tuned. You don't need to recompile your application when you change the grammar, as the grammar is dynamically generated from the file at runtime.
- From a grammar you define programmatically
- This is harder to tune but is useful when a grammar's contents are determined by criteria only available at runtime. The Alpha Digit String SpeechObject, for example, generates its grammar at runtime based on how the object is configured, including the number of letters/digits in the string and where pauses or delimiter phrases such as "dash" might occur.
You can also initialize a grammar from a file and subsequently update it programmatically.
The framework also allows compound grammars. Compound grammars let you define a single grammar object comprising multiple grammars to be used in parallel. For example, in a corporate dialing application you might use a compound grammar containing a set of employee names and a set of employee extensions, to allow the speaker to dial either by name or number. The framework uses compound grammars to combine each SpeechObject's grammar with the grammar defining the set of universal commands.
3.5 Results
The Result class is a subclass of a utility class KVSet, which defines an object used to encapsulate a set of key/value pairs. This structure is analogous to natural language slots and the values they are filled in with during recognition. Because the value stored in a KVSet can be any type of object, SpeechObjects have the flexibility to populate Result objects with any set of values that are appropriate. For example:
- SOYesNo.Result contains a single String value, "yes" or "no"
- SODate.Result contains a Java Calendar object
- MyFlightInfo.Result contains a set of values, including String values providing the codes of the origin and destination airports, an integer value representing the flight time, a Calendar object representing the flight date, and an integer representing the flight number.
The value at any given key can also be another KVSet, providing the ability to nest result structures if appropriate.
Each Result class defined by the SpeechObjects includes convenience methods allowing easy access to the specific information it encapsulates.
Result subclasses share another characteristic: they can be played over the current audio output device. They implement the Playable interface, which allows objects to be appended to the prompt queue and then played by a SpeechChannel or other object that supports audio playback.
This lets you easily play the recognized information, for example, for confirmation dialogs or during testing.
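A minimal sketch of this structure, using the MyFlightInfo example above; KVSet is reduced to a map wrapper, and the accessor and key names are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class KVSetSketch {
    // KVSet: a set of key/value pairs; values may themselves be KVSets.
    public static class KVSet {
        private final Map<String, Object> kv = new HashMap<>();
        public void set(String key, Object value) { kv.put(key, value); }
        public Object get(String key) { return kv.get(key); }
    }

    // A Result-style subclass with convenience accessors over the raw keys.
    public static class FlightInfoResult extends KVSet {
        public String getOriginAirport() { return (String) get("origin"); }
        public int getFlightNumber() { return (Integer) get("flightNumber"); }
        public KVSet getReturnLeg() { return (KVSet) get("returnLeg"); }  // nested KVSet
    }

    public static FlightInfoResult demo() {
        FlightInfoResult r = new FlightInfoResult();
        r.set("origin", "SFO");
        r.set("flightNumber", 212);
        KVSet leg = new KVSet();
        leg.set("origin", "BOS");
        r.set("returnLeg", leg);  // nesting one result structure inside another
        return r;
    }
}
```

The convenience accessors hide the raw keys, so callers of a SpeechObject never need to know how its Result is keyed internally.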
3.6 Redo Objects
Many SpeechObjects are used in conjunction with ConfirmAndCorrect, which confirms all of the information obtained by those SpeechObjects and, upon a negative confirmation, identifies which information needs to be corrected (e.g., "Would you like to change the date, the time, or the telephone number?"). The SpeechObjects corresponding to the piece ("the date") or pieces ("the date and the time") of information that need correction are then re-invoked (e.g., re-invoke the Date object), prompting the user for the information again.
To promote better dialog, rather than simply re-invoking the same SpeechObject again during this error-correcting phase, a SpeechObject may offer a "RedoObject" which should be used to re-obtain the desired information. This RedoObject may simply ask for the information in a different manner, by changing prompts as appropriate ("Please say the date again."). Alternatively, the "RedoObject" may actually employ a different dialog strategy, perhaps breaking up the task into a set of smaller tasks in order to facilitate recognition of complex items. RedoObjects typically share the same SOKey ('object instance name') as their original SpeechObject in order to share n-best information from the original SpeechObject. SpeechObjects that do not employ a RedoObject may return "null" to indicate that the same instance should be used during this error-correction phase of a confirmation dialog.
3.7 Identifiable interface
Many of the SpeechObjects implement the Identifiable interface, which enables them to be used in the Confirm and Correct SpeechObject. The Identification phase of the Confirm and Correct process makes use of
- IdentifyPrompt - a Playable representing the Object in a spoken list, and
- IdentifyExpression - a grammar expression covering the ways in which the user might refer to the Object.
Here's a sample of how the prompt might be used by the system (with the prompt highlighted):
Which would you like to change - the departure city or the arrival city?
The corresponding grammar expression for a derived ArrivalCity SpeechObject might then accept phrases like "the arrival city", "the destination", or "destination city".
3.8 Inherited parameters and/or return values
Many SpeechObjects inherit parameters, return values, and behavior from other SpeechObjects. These relationships are helpful in understanding what parameters might possibly be common (in syntax and behavior) across a large number of Objects. A simplified inheritance diagram for all of the SpeechObjects in this document is shown below.
4. SpeechObject specifications
Although SpeechObjects as implemented in Java have method calls for setting and getting various values, the specification below is restricted to listing only JavaBean properties of the SpeechObjects, i.e. properties for which there are both "get" and "set" methods. While this restriction limits configuration to discrete parameters which may be changed but not added to (1), it also results in a cleaner interface for the users of the Objects - these properties may be edited in a GUI, set and retrieved in a scripting environment, etc.
Parameter type and return type descriptions can be found in the appendix.
Parameters and return values common to all SpeechObjects
Configuration parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| RedoObject | SpeechObject | New object to call in case the caller negatively confirms the result from the original object in a confirmation scenario |
| SOKey | String | Name for this instance's family (i.e., the object itself plus any redo objects for this object) |
Return values:
| Return value | Type | Description |
| --- | --- | --- |
| getNextResult | | Next Result in n-best list, or null if no more |
| requiredAdditionalInteraction | boolean | Boolean indicating whether or not additional interaction between the SO and the caller was required in obtaining this result. Typically, this means that the SO has already done any needed disambiguation |
| isAutoConfirmed | boolean | Boolean indicating whether or not this Result has already been confirmed |
Dialog
Description:
This SpeechObject does not implement a specific dialog -- it simply provides the framework for a dialog. The default behavior is:
- Append the SpeechObject's initial prompt to the current prompt buffer and play the buffer
- Wait for speech and send it to the recognizer for recognition, using the top-level grammar currently set by the SpeechObject
- If recognition was successful, pass the result to the SpeechObject's result processing methods and return the final result
- If recognition was not successful, perform the necessary error handling and attempt the dialog again
Configuration parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| Filter | | Used to examine n-best SpeechObject.Results and filter out invalid results |
| Grammar | | The grammar used for recognition |
| HelpPrompt | | This prompt is played if the user requests help |
| InitialPrompt | | Unless an error occurs or the user requests help, this is the prompt that is played before recognition |
| MaxErrorCount | int | The maximum number of errors (rejections, timeouts, or unexpected DTMF keypresses) permitted before the SpeechObject gives up |
| MaxHelpCount | int | The maximum number of help requests permitted before the SpeechObject gives up |
| NoResultFoundPrompt | | This prompt is played after a valid recognition but when none of the candidates in the n-best list are successfully processed into a SpeechObject.Result (e.g., if the entries fail to pass this Object's Filter) |
| NoSpeechTimeoutPrompt | | This prompt is played when a recognition error code of "no speech timeout" is returned by the recognizer |
| RecognitionErrorPrompt | | This prompt is played by default when a recognition error occurs unless a more specific error prompt is defined |
| RecognizerTooSlowTimeoutPrompt | | This prompt is played when a recognition error code of "recognizer too slow timeout" is returned by the recognizer |
| RejectedPrompt | | This prompt is played when a recognition error code of "rejected" is returned by the recognizer |
| ReturnAllPossibleResults | boolean | If true, this Object returns an entire n-best list of SpeechObject.Results. Otherwise, it will return only the first valid result it interprets and processes |
| SpeechTooEarlyPrompt | | This prompt is played when a recognition error code of "speech too early" is returned by the recognizer |
| TooMuchSpeechTimeoutPrompt | | This prompt is played when a recognition error code of "too much speech timeout" is returned by the recognizer |
| UnexpectedKeyPrompt | | This prompt is played when a recognition error code of "unexpected_key" is returned by the recognizer |
Return results:
| Return value | Type | Description |
| --- | --- | --- |
| toString | String | A String representation of this Object's recognized result |
Yes/No
Description:
This SpeechObject expects an answer to a yes-or-no question.
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| StrictGrammar | boolean | If true, loads and uses a limited (strict) grammar to maximize performance |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| YesNo | String | String indicating yes or no |
| saidYes | boolean | True if the user said yes, false otherwise |
| saidNo | boolean | True if the user said no, false otherwise |
Quantity
Description:
This SpeechObject recognizes quantities of items. By default, this SpeechObject recognizes 1-4 digit (0-9,999) quantities and has an absolute range of 1-8 digits (0-99,999,999). A developer can (and should) configure this SpeechObject to recognize quantities only within a certain range by setting the minDigits and maxDigits properties, as appropriate for a specific domain and application. The Quantity SpeechObject does not itself perform any confirmation or validity checking. The range of numbers that the speaker is allowed to say is limited by limiting the grammar used for recognition to that range. If the speaker says a number that is out of the current range, the utterance is rejected by the recognizer.
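The relationship between the digit-count properties and the numeric range can be sketched as follows (the helper names are illustrative; the real object constrains the recognition grammar rather than checking numbers after the fact).

```java
public class QuantityRangeSketch {
    // Largest quantity expressible in maxDigits digits: 10^n - 1.
    public static long maxFor(int maxDigits) {
        long v = 1;
        for (int i = 0; i < maxDigits; i++) v *= 10;
        return v - 1;
    }

    // Smallest minDigits-digit quantity: 10^(n-1); one digit allows 0.
    public static long minFor(int minDigits) {
        if (minDigits <= 1) return 0;
        long v = 1;
        for (int i = 0; i < minDigits - 1; i++) v *= 10;
        return v;
    }
}
```

With the defaults above, maxFor(4) gives 9,999 and the absolute limit maxFor(8) gives 99,999,999.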
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| MaxDigits | int | Maximum allowed number of digits for the quantity that will be recognized (e.g., 4 => '9999') |
| MinDigits | int | Minimum allowed number of digits for the quantity that will be recognized (e.g., 2 => '10') |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| Quantity | int | The quantity recognized |
Simple Digit String
Description:
This SpeechObject can be configured to recognize a string of digits of a fixed length. When NumberDigits is set, the SpeechObject automatically creates a grammar for recognizing that number of digits (without natural numbers). The Simple Digit String Speech Object does not itself perform any confirmation or validity checking. If there are specific constraints on what constitutes a valid number string for the controlling application, using the result filter mechanism to filter out inconsistent hypotheses is highly recommended.
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| NumberDigits | int | Number of digits to be recognized |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| DigitString | String | The recognized digit string |
?
Date
Description:
This SpeechObject prompts for and interprets a date. The date may be specified in one of many formats, including a day of the week, a day of the month, a relative date (today, tomorrow, yesterday, next Tuesday), and so forth. More complex expressions that specify the date in multiple ways are also allowed (tomorrow, December 12th); the consistency of such dates is checked, and if the date is inconsistent, the user is reprompted for the date with an appropriate error message. Invalid dates, such as April 31, are similarly disallowed, causing reprompting for a date.
The SpeechObject makes an effort to interpret the date intelligently:
If the day of week is given, such as Thursday, the SpeechObject interprets the date as if it were the upcoming Thursday. For example, if today is Monday, February 15, 1999, and the caller said "Thursday", the SpeechObject would interpret this as Thursday, February 18, 1999.
If the day of month is given, such as the 27th, and this day is later than the current day (for example, February 15), this SpeechObject assumes the date is in the same month as the current date. For example, the 27th would be interpreted as Saturday, February 27, 1999.
If the day of month is given, such as the 5th, and this day is a number less than the current date (for example, February 15), the SpeechObject assumes the day is for the next month. In this example, the 5th would be interpreted as Friday, March 5, 1999. If the next month is January, then the SpeechObject assumes the date is in the following year as well.
If the month is before the current month, the Date SpeechObject assumes the caller intends this date in the following year. For example, if the caller said January 3, this would be interpreted as January 3, 2000. If the caller says "today", the SpeechObject determines the current date unless specified by the developer.
When the caller says only a month, the SpeechObject will follow up by prompting the caller to specify the day of the month. This is implemented by invoking a default SODayOfMonth SpeechObject (the DayOfMonthSO property), which may be overridden.
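The inference rules above can be sketched with java.time (the class and method names here are illustrative, not the SpeechObjects API, which works with java.util.Calendar values):

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class DateInference {
    // Rule 1: a bare day of week means its next occurrence on or after today,
    // e.g. "Thursday" on Monday, February 15, 1999 -> February 18, 1999.
    static LocalDate fromDayOfWeek(LocalDate today, DayOfWeek dow) {
        return today.with(TemporalAdjusters.nextOrSame(dow));
    }

    // Rules 2-3: a bare day of month stays in the current month if it has not
    // yet passed; otherwise it rolls into the following month (and, for
    // December -> January, into the following year).
    static LocalDate fromDayOfMonth(LocalDate today, int day) {
        if (day >= today.getDayOfMonth()) {
            return today.withDayOfMonth(day);
        }
        return today.plusMonths(1).withDayOfMonth(day);
    }

    // Rule 4: a month earlier than the current month means next year,
    // e.g. "January 3" said in February 1999 -> January 3, 2000.
    static LocalDate fromMonthDay(LocalDate today, int month, int day) {
        int year = (month < today.getMonthValue()) ? today.getYear() + 1 : today.getYear();
        return LocalDate.of(year, month, day);
    }
}
```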
The SpeechObject performs the following validation checking of the recognized date:
When inconsistent information is provided by the caller, such as a conflicting day of month and day of week (for example, Tuesday, February 15, 1999), the Date SpeechObject plays a prompt that identifies the correct information (February 15th is a Monday) and then reprompts the caller.
Likewise, if the Modifier such as "today" is inconsistent with the day of month, the SpeechObject will play a prompt specifying what 'today's' date is and reprompt the caller.
Invalid date handling:
When the caller responds with an invalid date such as "April 31", the SpeechObject plays a prompt that explains why this date is invalid ("... there are only 30 days in April") and then reprompts the caller.
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| DateTooEarlyPrompt | | Prompt played if the stated date is before the lower DateLimit, e.g. "I'm sorry, I thought you said 'February 12th, 1985' but that day is too far in the past" |
| DateTooLatePrompt | | Prompt played if the stated date is after the upper DateLimit, e.g. "I'm sorry, I thought you said 'February 12th, 2085' but that day is too far in the future" |
| DayOfMonthSO | | SODayOfMonth instance used to obtain a day of month when just the month or just the month and year are specified |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| InconsistentDayOfWeekPrompt | | Prompt played if the stated date includes a day of week that doesn't match, e.g. "I'm sorry, I thought you said 'Tuesday, December 10th', but December 10th is a Friday." |
| InconsistentModifiedDayOfWeekPrompt | | Prompt played if the stated date includes a modified day of week that doesn't match, e.g. "I'm sorry, I thought you said 'next Tuesday, December 10th', but next Tuesday is December 14th." |
| InconsistentNamedTodayEtcPrompt | | Prompt played if the stated date includes a today expression as well as a day of week that actually refers to another date, e.g. "I'm sorry, I thought you said 'today, Tuesday, December 10th', but today is December 14th." |
| InconsistentTodayEtcPrompt | | Prompt played if the stated date includes a today expression that refers to another date, e.g. "I'm sorry, I thought you said 'today, December tenth', but today is December fourteenth." |
| InvalidDatePrompt | | Prompt played if the stated date is invalid (the day of month exceeds the number of days in the month) |
| LowerDate | java.util.Calendar or int or SODate.DateLimit | Earliest permissible date, represented by a Calendar object, an offset in days, or a DateLimit object |
| UpperDate | java.util.Calendar or int or SODate.DateLimit | Latest permissible date, represented by a Calendar object, an offset in days, or a DateLimit object |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| Calendar | java.util.Calendar | The date as a Calendar object |
| DayOfMonth | int | Day of month specified by the caller |
| DayOfWeek | int | Day of week specified by the caller |
| Month | int | Month represented as an integer between 1 and 12 |
| Year | int | Year represented as a four-digit integer |
Time
Description:
This SpeechObject defines a generic dialog for getting a time expression from the speaker.
The Time SpeechObject is generic and may be specialized (through modification of parameters, prompts, and/or grammars) for use in a range of applications, for example, flight information and reservation systems, personal agenda management, or package delivery/pickup scheduling.
In response to a prompt requesting the time, the caller speaks the time in a natural way (using natural expressions such as "in the morning" or "at night" as well as "am" or "pm"). The Time SpeechObject recognizes a clock time, for example, "three forty-five am". If the time is ambiguous (am/pm not specified), the SpeechObject conducts any additional dialog with the caller needed to ensure that an unambiguous time is obtained. This dialog is implemented by invoking an instance of the DisambiguateTime SpeechObject.
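A minimal sketch of the ambiguity check and a numeric time encoding (illustrative names only; the exact integer encoding of the ClockTime result is an assumption here, not stated by the specification):

```java
public class TimeSketch {
    // Illustrative only: encode hours/minutes/am-pm as a single integer
    // in hhmm form, e.g. 3:45 pm -> 1545, 12:00 am (midnight) -> 0.
    static int toClockTime(int hours, int minutes, String amPm) {
        int h = hours % 12;             // 12 am -> 0, 12 pm -> 12 below
        if ("pm".equals(amPm)) h += 12;
        return h * 100 + minutes;
    }

    // An utterance needs the DisambiguateTime dialog when no am/pm
    // modifier (or equivalent natural expression) was recognized.
    static boolean needsDisambiguation(int hours, String amPm) {
        return amPm == null && hours >= 1 && hours <= 12;
    }
}
```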
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| DisambigObject | | Sets the object that disambiguates ambiguous times, e.g. '10' => 10 am or 10 pm |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| InconsistentTimePrompt | | Prompt played when the user's response in the disambiguation dialog is inconsistent with the original time they said. For example, the user is asked to disambiguate whether 11 o'clock is in the morning or evening, and replies "in the afternoon". |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| AM_PM | String | Returns whether the time said by the caller was AM or PM |
| Calendar | java.util.Calendar | Returns the time in a Calendar representation |
| ClockTime | int | A numerical representation of the time |
| ClockTimePlayable | | The time as a Playable in the standard format (with trailing "am" or "pm") |
| Hours | int | The hour portion of the time said by the caller |
| Minutes | int | The minutes portion of the time said by the caller |
| SmartTimePlayable | | A time Playable in the "intelligent" (colloquial) format (for example, "7 in the evening", "5 in the morning", "noon") |
| UserStatedModifier | String | Any user-stated modifier that disambiguated the time |
Menu
Description:
The Menu SpeechObject does not itself define a default dialog. The dialog is generated dynamically based on the number of items defined. The dialog presents the list of menu items and allows the caller to choose one of them. It enables the developer to dynamically build menus from pairs of grammars and prompt atoms, and in addition it permits the developer to associate a listener with any of the items so that the listener's action is performed in response to selecting the item.
The menu may be defined dynamically by calling a method that adds menu items sometime before invocation. Each menu item is defined in terms of:
- an item name, which is the text string returned in the result if the item is selected
- an optional Playable for representing the item in the menu prompts if they are autogenerated
- an optional grammar expression to trigger selection of this item
Note that at this time the menu items cannot be set merely by setting JavaBean properties.
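A sketch of how such dynamically built menu items might be held and listed (all class, field, and method names here are hypothetical, chosen to mirror the item definition above; they are not the SpeechObjects API):

```java
import java.util.ArrayList;
import java.util.List;

public class MenuSketch {
    // Hypothetical menu item: the name returned on selection, the prompt
    // used in auto-generated listings, and a grammar expression that
    // triggers selection of this item.
    static class Item {
        final String name, prompt, grammarExpr;
        Item(String name, String prompt, String grammarExpr) {
            this.name = name; this.prompt = prompt; this.grammarExpr = grammarExpr;
        }
    }

    private final List<Item> items = new ArrayList<>();

    void addItem(String name, String prompt, String grammarExpr) {
        items.add(new Item(name, prompt, grammarExpr));
    }

    // An auto-generated item listing of the kind ItemListPrompt replaces.
    String itemListPrompt() {
        StringBuilder sb = new StringBuilder("Please choose one of:");
        for (Item i : items) sb.append(' ').append(i.prompt).append(',');
        sb.setLength(sb.length() - 1); // drop trailing comma
        return sb.toString();
    }
}
```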
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| ErrorPromptPostfix | | Audio to use as the postfix of the error prompt (if the prompt is being auto-generated) |
| ErrorPromptPrefix | | Audio to use as the prefix of the error prompt (if the prompt is being auto-generated) |
| HelpPromptPostfix | | Audio to use as the postfix of the help prompt (if the prompt is being auto-generated) |
| HelpPromptPrefix | | Audio to use as the prefix of the help prompt (if the prompt is being auto-generated) |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| InitialPromptPostfix | | Audio to use as the postfix of the initial prompt (if the prompt is being auto-generated) |
| InitialPromptPrefix | | Audio to use as the prefix of the initial prompt (if the prompt is being auto-generated) |
| ItemListPrompt | | Prompt that is an explicit listing of all the menu items |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| ItemName | String | The name of the selected item |
| RecResult | | The recognition result of the interaction |
US Currency
Description:
This SpeechObject prompts for and recognizes a dollar-and-cent amount in one utterance. If necessary, disambiguation is performed for utterances like "seven fifty". This disambiguation is performed by invoking a default instance of the DisambiguateCurrency SpeechObject (which may, of course, be overridden). This SpeechObject provides a DTMF backoff strategy if the caller encounters recognition problems.
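The ambiguity being resolved can be sketched as follows (illustrative names only; the real disambiguation is a spoken dialog, not a method call):

```java
public class CurrencySketch {
    // Illustrative expansion of an ambiguous "N fifty"-style utterance into
    // its two candidate amounts: dollars-and-cents vs. whole dollars,
    // e.g. "seven fifty" -> $7.50 or $750.
    static double[] interpretations(int first, int second) {
        return new double[] { first + second / 100.0, first * 100.0 + second };
    }
}
```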
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| DisambigObject | | Object that disambiguates ambiguous currencies, e.g. 'ten fifty' => $10.50 or $1050 |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| Range | | Sets the allowed value range (also propagated to the disambiguation object) |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| Amount | float | Floating point number indicating the recognized amount of dollars and cents |
| Cents | int | Integer indicating the recognized amount of cents |
| Dollars | int | Integer indicating the recognized dollar amount, not including cents |
North American Telephone Number
Description:
This SpeechObject prompts for and obtains a telephone number from the user, in the standard 10-digit format used in Canada, Mexico, and the USA.
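The relationship between the full number and the sectioned return values can be sketched as (illustrative helper, not the SpeechObjects API):

```java
public class PhoneSketch {
    // Illustrative split of the 10-digit PhoneNumber result into the
    // AreaCode, Exchange, and Subscriber fields described below.
    static String[] split(String tenDigits) {
        return new String[] {
            tenDigits.substring(0, 3),  // AreaCode: first 3 digits
            tenDigits.substring(3, 6),  // Exchange: next 3 digits
            tenDigits.substring(6)      // Subscriber: last 4 digits
        };
    }
}
```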
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| UseNatural | boolean | If true, allows natural numbers within each section, e.g. 'six five oh, eight four seven, eleven fifty five' |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| AreaCode | String | The first 3 digits of the 10-digit recognized phone number |
| Exchange | String | The second set of 3 digits of the 10-digit recognized phone number |
| Subscriber | String | The last 4 digits of the 10-digit recognized phone number |
| PhoneNumber | String | Entire phone number as a string |
Alpha Digit String
Description:
This SpeechObject can be configured to prompt for and recognize alphanumeric digit strings, which may consist of sections, such as an account number, credit card number, Social Security number, and the like. Natural numbers are optionally allowed within each section.
As with the Sectioned Digit String SpeechObject, each format of the sectioning is specified as a '-' delimited string. Each format should be of the form:
DDD-DD-DDDD or DDD-DD-AADD
and so forth. The first format specifies that the digit-string grammar should recognize a section of three digits, two digits, and four digits (e.g., a Social Security number). The letter D stands for a digit (0-9), and the letter A stands for any letter (A-Z).
One can also use a user-defined group to recognize a subset of the alphabet, optionally allowing digits in certain positions as well. For example, one could define "V" to correspond to "AEIOU", the vowels. This is useful when only certain letters are allowed in a position within the digit string; the automatically generated grammar can reflect this constraint directly.
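The group mechanism can be sketched by translating a format section into a regular-expression character class per position (illustrative class only; the real SpeechObject generates a recognition grammar, not a regex):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class AlphaFormatSketch {
    // Built-in groups: D = digit, A = any letter; callers may register
    // additional single-letter groups such as V = the vowels.
    private final Map<Character, String> groups = new HashMap<>();

    AlphaFormatSketch() {
        groups.put('D', "0-9");
        groups.put('A', "A-Z");
    }

    void defineGroup(char name, String letters) {
        groups.put(name, letters);
    }

    // Translate e.g. "AADD" into the pattern [A-Z][A-Z][0-9][0-9].
    Pattern toPattern(String section) {
        StringBuilder re = new StringBuilder();
        for (char c : section.toCharArray()) {
            re.append('[').append(groups.get(c)).append(']');
        }
        return Pattern.compile(re.toString());
    }
}
```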
Configuration parameters:
All configuration parameters of the Dialog and Sectioned Digit String SpeechObjects, plus
| Parameter | Type | Description |
| --- | --- | --- |
| Group | Group[] | Defines the groups |
| Group | (int, Group) | Defines a specific group |
Return results:
All return results of the Dialog and Sectioned Digit String SpeechObjects.
Confirm and Correct
Description:
The Confirm and Correct SpeechObject can be used to have the caller confirm one or more pieces of information together, and correct (by invoking suitable SpeechObjects) any pieces of information that are incorrect. The items that can be so confirmed, identified, and/or corrected must be SpeechObjects implementing the Identifiable interface. This is done in the three phases of Confirmation, Identification, and Correction:
- The first phase is the Confirmation phase. During the Confirmation phase, the inner Confirmation object is invoked to play the confirmation prompt ("...is this correct?"), so that the caller can indicate whether all the information is correct. If the caller answers in the affirmative, this SpeechObject is finished.
- If the caller indicates that information is not completely correct, Confirm and Correct moves on to the Identification phase, invoking the inner Identify object. During this phase, the caller identifies which piece(s) of information need to be corrected. The caller can respond by indicating up to two items that are incorrect -- or the caller can specify that all of the information is wrong; the caller can also answer that none of the items is incorrect, in which case execution returns to the Confirmation phase, and starts over.
If the caller identifies at least one incorrect piece of information during the Identification phase, execution moves on to the Correction phase. During the Correction phase, Confirm and Correct obtains the re-do object for each SpeechObject whose Result is wrong. After getting and invoking each re-do object, execution returns to the Confirmation phase, to confirm all of the contained SpeechObject.Results.
At the end of a successful invocation (i.e., after all results have been confirmed), the Confirm and Correct SpeechObject returns a Result that contains all the results of the contained SpeechObjects, with each contained SpeechObject Result stored under the contained SpeechObject's SO key. For example, if the contained SpeechObjects are SODate and SOTime, the Result instance returned by Confirm and Correct will contain an SODate.Result stored under SODate's SO key, and an SOTime.Result stored under SOTime's SOKey.
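The three-phase loop above can be sketched as a small driver (all interfaces and names below are hypothetical stand-ins for the Confirmation and Identify objects and the contained SpeechObjects; they are not the SpeechObjects API):

```java
import java.util.List;

public class ConfirmAndCorrectSketch {
    // Hypothetical stand-in for a contained SpeechObject: it exposes its
    // current result and can be re-invoked (its "re-do object") to fix it.
    interface Slot {
        String value();
        void redo();
    }

    // Hypothetical caller interactions for the first two phases.
    interface Caller {
        boolean confirm(List<Slot> slots);          // "...is this correct?"
        List<Slot> identifyWrong(List<Slot> slots); // "Which would you like to change?"
    }

    // Cycle Confirmation -> Identification -> Correction until the caller
    // confirms everything or the retry budget (MaxRetryCount) is exhausted.
    static void run(List<Slot> slots, Caller caller, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (caller.confirm(slots)) return;              // Confirmation phase
            List<Slot> wrong = caller.identifyWrong(slots); // Identification phase
            for (Slot s : wrong) s.redo();                  // Correction phase
        }
    }
}
```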
Configuration parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| Confirmation | | The object that performs confirmation |
| GetInitialResultsIfNeeded | boolean | If true, will initially invoke all contained SpeechObjects that have not yet obtained results |
| Identify | | The object that identifies which information needs to be corrected |
| MaxRetryCount | int | Maximum number of retries attempted by Confirm and Correct |
| SpeechObject | SpeechObject[] | SpeechObjects to be contained (confirmed/corrected) |
| SpeechObject | (int, SpeechObject) | Adds/sets a SpeechObject for confirmation/correction |
Return results:
| Return value | Type | Description |
| --- | --- | --- |
| SOKeysEnum | Enumeration | Enumeration of contained SpeechObjects' Result keys |
Browsable Selection
Description:
The Browsable Selection SpeechObject acts similarly to the Browsable List SpeechObject except that it also supports a "select" command the caller can use to select the current item being browsed.
Configuration parameters:
All configuration parameters of the Browsable List SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| SelectionExpression | | Application-specific grammar rule to specify that the current item is to be selected |
Return results:
All return results of the Browsable List SpeechObject.
Browsable Action
Description:
The Browsable Action List SpeechObject acts similarly to the Browsable List SpeechObject except that the user can add any custom command and an associated handler into the list. When a custom command is spoken, the corresponding handler is fired to handle it.
Note that at this time there is no way to specify these custom commands and handlers using JavaBean properties.
Configuration parameters:
All configuration parameters of the Browsable List SpeechObject.
Return results:
All return results of the Browsable List SpeechObject.
US Zip Code
Description:
This SpeechObject will collect either a 5- or 9-digit US ZIP code. The filter used to validate 5-digit codes is based on a list of currently existing codes issued by the U.S. Postal Service. The 4-digit extension, if spoken, is not validated. It is possible to disable the filter if you want to accept any 5-digit code.
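The filter behavior can be sketched as follows (class and method names are illustrative, and `issued` stands in for the U.S. Postal Service code list, which is bundled with the real SpeechObject):

```java
import java.util.Set;

public class ZipFilterSketch {
    // Illustrative validity check: the 5-digit code must appear in the
    // issued-code list unless the filter is disabled; a 4-digit extension,
    // if present, is not validated beyond its shape.
    static boolean accept(String zip, String ext, Set<String> issued, boolean filterDisabled) {
        if (!zip.matches("\\d{5}")) return false;
        if (ext != null && !ext.matches("\\d{4}")) return false;
        return filterDisabled || issued.contains(zip);
    }
}
```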
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| FilterDisabled | boolean | True prevents the recognized result from being validated |
| FiveDigitsOnly | boolean | True restricts recognition to 5 digits |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| Extension | String | Either a string representation of the last 4 digits (if a 9-digit zip code was recognized) or null (if a 5-digit zip code) |
| ZipCode | String | String representation of the 5-digit zip code (or first 5 digits, if a 9-digit zip code was recognized) |
Credit Card Info
Description:
This SpeechObject encapsulates the functionality of acquiring the credit card type, credit card number, and credit card expiration date.
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| AcceptExpiredCard | boolean | If true, credit cards that expired before today are accepted |
| AllTypesEnabled | boolean | If true, all eight built-in credit card types are acceptable |
| CardTypeEnabled | (int, boolean) | Sets whether a specific card type is accepted or not |
| CreditCardExpirationDateSpeechObject | SpeechObject | Internal expiration date speech object |
| CreditCardInfoCANDCSpeechObject | SpeechObject | Internal confirm and correct speech object |
| CreditCardNumberSpeechObject | SpeechObject | Internal card number speech object |
| CreditCardTypeSpeechObject | SpeechObject | Internal card type speech object |
| InitialState | String | Initial state for the call-flow |
| PreamblePrompt | | Prompt that is played at the beginning of the dialog |
| TypeQueryExplicit | boolean | If true, the credit card type is queried explicitly |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| CreditCardExpirationMonth | int | Credit card expiration month |
| CreditCardExpirationYear | int | Credit card expiration year |
| CreditCardNumber | String | Credit card number as digit string |
| CreditCardType | String | Credit card type as string |
| ResultStatus | int | Result status |
| isResultOk | boolean | Whether the result is Ok or not |
Sectioned Digit String
Description:
This SpeechObject can be configured to recognize a string of digits broken up into various sections. Alternate sectionings can be provided, and the use of natural numbers in the grammar can be enabled. The maximum length of a section is six digits.
Each format of the sectioning is specified as a '-' delimited string; the number of digits in each section of a given format is specified by a sequence of 'D' characters. For example,
"DDD-DDD-DDDD"
specifies a sectioning of three digits, three digits, and four digits.
"DDD-DD-DDDDD"
specifies a sectioning of three digits, two digits, and five digits.
Developers can also set the delimiter that may be spoken by callers when reading the sectioned digitstring. By default, this is a "dash", but this can be changed to any single word or valid GSL expression, such as "dot" or "[dash dot]" or null, etc.
Natural numbers can also be enabled through a simple property setting.
This SpeechObject does not itself perform any confirmation or validity checking. If there are specific constraints on what constitutes a valid digit string for your application, using the result filter mechanism to filter out inconsistent hypotheses is highly recommended.
The configuration of the digit string -- that is, the number of sections and the length of each section -- determines the construction of the grammar used for recognition. If the speaker says a digit string that does not match one of the defined patterns, the recognizer rejects the utterance.
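The grammar construction described above can be approximated with a regular expression per format (an illustrative sketch; the real SpeechObject builds a recognition grammar, and the "-" here stands in for the configurable spoken delimiter):

```java
import java.util.regex.Pattern;

public class SectionedFormatSketch {
    // Turn a format like "DDD-DD-DDDD" into a pattern that accepts the
    // digits with or without the delimiter between sections.
    static Pattern toPattern(String format) {
        StringBuilder re = new StringBuilder();
        for (String section : format.split("-")) {
            if (re.length() > 0) re.append("-?"); // delimiter is optional
            re.append("\\d{").append(section.length()).append("}");
        }
        return Pattern.compile(re.toString());
    }
}
```

For "DDD-DD-DDDD", both "123-45-6789" and "123456789" match, while a mis-sectioned string such as "12-345-6789" is rejected.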
Configuration parameters:
All configuration parameters of the Dialog SpeechObject, plus
| Parameter | Type | Description |
| --- | --- | --- |
| DelimiterExpression | String | The (optional) delimiter expression used in the grammar between sections of the digit string (dash, dot, etc.) |
| DelimiterPrompt | | Audio played between sections of the recognized digit string (e.g. 'dash.wav') |
| Format | String[] | Defines formats of all sections of the string |
| Format | (int, String) | Defines the format for the given section of the string (e.g. 'DDD-DD-DDDD') |
| Format | | Defines formats of all sections of the string |
| Format | (int, WeightedFormat) | Defines the format for the given section of the string |
| IdentifyExpression | | Identification expression for the Identifiable interface |
| IdentifyPrompt | | Identification prompt for the Identifiable interface |
| UseNatural | boolean | If true, allows natural numbers within each numeric section, e.g. 'three six two, fifty seven, eleven hundred' |
Return results:
All return results of the Dialog SpeechObject, plus
| Return value | Type | Description |
| --- | --- | --- |
| DigitString | String | The recognized string without any section delimiters |
| Section | String[] | The sections of the recognized string |
| SectionedDigitString | String | The recognized string with '-' between sections, e.g. "52-764" |
Browsable List
Description:
The Browsable List SpeechObject allows the caller to hear items in a list in sequence, and navigate through the list. The object provides methods through which the developer may dynamically define the list. The list of items to be browsed is encapsulated within a Browsable object.
When invoked, the list plays prompts associated with items, one after another. Depending on configuration, the list advances automatically or through a "next" command to the next item. Based on configuration, the list may terminate automatically at the end, or as the result of an "exit list" command.
The general dialog flow is as follows:
The dialog begins with a preamble, if enabled (a recognition state that accepts the relevant navigational commands), which automatically advances to the first item. Some commands are invalid in the preamble (for example, "previous" or application-specific commands like "delete"); invalid commands are handled as errors. For each item, the item prompt is played, with optional pre-pended and post-pended prompts, and navigation and application-specific commands are active. If enabled, a timeout automatically advances the list to the next item, and a timeout after the last item automatically exits the list.
The default navigation commands are: next, previous, last, first, and exit.
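The navigation behavior can be sketched as a pure index transition (illustrative names only; `returnAtEnd` mirrors the ReturnAtEnd property, and -1 stands for leaving the list):

```java
public class BrowseNavSketch {
    // Apply one default navigation command to the current position over a
    // list of `size` items; returns the new index, or -1 to exit the list.
    static int navigate(int index, int size, String command, boolean returnAtEnd) {
        switch (command) {
            case "next":
                if (index + 1 < size) return index + 1;
                return returnAtEnd ? -1 : index; // ReturnAtEnd exits past the last item
            case "previous": return Math.max(index - 1, 0);
            case "first":    return 0;
            case "last":     return size - 1;
            case "exit":     return -1;
            default:         return index;       // unrecognized commands are errors
        }
    }
}
```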
Configuration parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| AutoAdvance | boolean | If set to true, forces the list to advance automatically to the next item if there is no response from the user |
| Browsable | | Object providing access to items to be browsed |
| ExitPrompt | | Prompt played when the list exits |
| FirstPrePrompt | | The pre-prompt played before the first item |
| Grammar | | The grammar used for both preamble and list item recognitions |
| LastPrePrompt | | The pre-prompt played before the last item |
| ListNoSpeechTimeoutPrompt | | The prompt played if there is a "no-speech" timeout when the user says a command after the list item |
| ListRecognizerTooSlowTimeoutPrompt | | The prompt played if there is a "recognizer-too-slow" timeout when the user says a command after the list item |
| ListRejectedPrompt | | The prompt played if there is a rejection when the user says a command after the list item |
| ListSpeechTooEarlyPrompt | | The prompt played if there is a "speech-too-early" condition when the user says a command after the list item |
| ListTooMuchSpeechTimeoutPrompt | | The prompt played if there is a "too-much-speech" timeout when the user says a command after the list item |
| ListUnexpectedKeyPrompt | | The prompt played if the user presses a DTMF key instead of speaking a command after the list item |
| MultiItemListErrorPrompt | | The recognition error prompt when the list has more than one item |
| MultiItemListHelpPrompt | | The help prompt when the list has more than one item |
| NextPrePrompt | | The pre-prompt played when the user says "next", or when the list auto-advances to the next item |
| OnlyItemListErrorPrompt | | The recognition error prompt when the list has only one item |
| OnlyItemListHelpPrompt | | The help prompt when the list has only one item |
| OnlyItemPrePrompt | | The pre-prompt played when there is only one item |
| PreambleHelpPrompt | | The prompt played if the user asks for help during the preamble |
| PreamblePrompt | | The prompt that is played only once when the user first enters the list |
| PreambleRecognitionErrorPrompt | | The prompt played if there is a recognition error in the preamble |
| PreviousPrePrompt | | The pre-prompt played when the user says "previous" |
| ReturnAtEnd | boolean | If true, list will automatically exit when it reaches the end |
Return results:
| Return value | Type | Description |
| --- | --- | --- |
| exitedFromList | boolean | True if the user exited during the list portion |
| exitedFromPreamble | boolean | True if the user exited during the preamble |
| Index | int | The index of the item on which the list exited |
| toString | String | The index of the item on which the list exited, in string form. If the list exited in the preamble, returns the string "PREAMBLE" |
5. Appendix
This appendix describes the types of the configuration parameters and return values.
- Browsable
- One class implementing this interface is BrowsableVector, which includes methods to add to, set, and check items in the list of browsable items. An "item" consists of a Playable, which will be played as the name of the item when the list is browsed, and optional arbitrary user data (to be returned if the item is selected).
- Confirmation
- This class is used by Confirm and Correct during the Confirmation phase to confirm the values of the Speech Objects being managed by Confirm and Correct. It will ask a question like "I have you flying from Boston to New York on Friday, May 5th. Is that correct?" It is rare that a developer will need to override the default instance for this class.
- Expression
An Expression is basically an encoded representation of the right-hand side of an arbitrarily complex grammar production rule, including semantic tags and probabilities.
- Grammar
- Extensions of this class include both static and dynamic grammars, specified in code or in files, and combinations thereof. For all grammars it is possible to set the top-level Rulename (a String) that should be used for recognition. These grammars can include probabilities and semantic tags.
- Group
- This class allows you to define a group: a single-letter name that represents a collection of letters. This group name, along with others, can then be used in a format string for the Alpha Digit String SpeechObject. Each group can also enable or disable digit recognition.
- Identify
- This class is used by Confirm and Correct during the Identification phases to obtain the user's selection of which Object's value was incorrect. It will ask a question like "Which would you like to change - X, Y, or Z?" It is rare that a developer will need to override the default instance for this class.
- Playable
There are many classes that implement the Playable interface. Approximately speaking, a Playable can be any concatenation of silence, recorded audio, TTS, random prompts (2), and escalating prompts (3). See Section 3.3 for a longer explanation of Playables.
- Range
This class is a property of US Currency that captures the range as an order of magnitude. For example, Range[1,3] captures a range of $0 to $999. By default, the Range is [1,8].
- RecResult
- This is the standard result object from a single recognition and includes such things as a recognized text string, a confidence score, natural language interpretations and their scores.
- ResultFilter
- This class (actually an interface) provides one method, pass, which examines a SpeechObject.Result and returns a boolean indicating whether or not the specified result passes this filter. It is used internally by the Dialog SpeechObject and its subclasses to postprocess recognition results.
- SODate.DateLimit
- Base class allowing absolute and relative date specifiers to set a lower or upper date for use by the Date SpeechObject.
- SODayOfMonth
- The SpeechObject responsible for getting the date of the month. It is invoked by the Date SpeechObject if it is determined that the day of the month has not been specified and cannot be inferred.
- SODisambiguateCurrency
- The SpeechObject responsible for disambiguating an ambiguous currency expression. When invoked by the Currency SpeechObject, it lists all the possible interpretations of the recognition result and asks the caller to specify the actual amount in dollars and cents. For example, if the caller said "two fifty" it asks, "Did you mean two dollars fifty cents or two hundred and fifty dollars?". The caller inputs an amount in this state that will override the amount obtained by the Currency SpeechObject.
- SODisambiguateTime
- The SpeechObject responsible for disambiguating an ambiguous time expression. When invoked by the Time SpeechObject, it asks whether the time is "in the morning or in the evening", or "in the afternoon or in the morning", and so on, depending on the candidate time (i.e. the time to be disambiguated). For the value "12", for example, it asks "Is that twelve noon or midnight?"
- WeightedFormat
- This is a convenience class used by the Sectioned Digit String SpeechObject to represent a weighted format. It consists of a string representing the sectioning, for example "DDDD-DD-DD", and of an associated probability.
Footnotes
(1) e.g. one could set a parameter to have a linked list as a value but not to add an element to the end of the list - setting to a value is allowed, but executing a function on the value is not. Of course, the calling application is free to check the value, compute a new value using this value, and set the parameter to the new value.
(2) a set of Playables, one of which is selected at random each time the random prompt is to be played.
(3) an ordered set of Playables 1 ... n such that 1 is played the first time the escalating prompt is to be played, 2 the second, and so on.