NaturallySpeaking
From Wikipedia, the free encyclopedia
[Image: A sample dictation in DragonPad, the included text editor.]

NaturallySpeaking | |
---|---|
Developer: | Nuance Communications |
Latest release: | 9.0 / 2 July 2006 |
OS: | Microsoft Windows |
Use: | Voice recognition |
License: | Proprietary |
Website: | Nuance Communications website |
- For brevity, this article mostly uses the short name NaturallySpeaking.
Dragon NaturallySpeaking is a speech recognition software package produced by Nuance Communications for Windows PCs.
NaturallySpeaking runs on top of other software. Dictated words appear temporarily in a floating Results Box as they are spoken; when the speaker pauses for breath, the program transcribes the words at the cursor location.
Like other speech recognition software, NaturallySpeaking has three primary areas of functionality: dictation, in which spoken language is transcribed to written text; command and control, in which spoken language is recognized as a command to operate widgets (controls); and text-to-speech, in which written text is converted to a synthesized audio stream. Early versions of the software had to be trained for approximately 10 minutes to recognize the user's voice, though version 9 no longer requires this initial training. Nuance claims that dictating a 900-word essay with NaturallySpeaking takes 6 minutes, whereas typing the same essay at 40 words per minute takes 22 minutes.
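The arithmetic behind this comparison can be checked with a short sketch. The function names are illustrative only; the inputs are the figures quoted above (900 words, 6 minutes, 40 words per minute).

```python
def minutes_to_write(words, words_per_minute):
    """Minutes needed to produce `words` at a steady rate."""
    return words / words_per_minute


def implied_rate(words, minutes):
    """Words per minute implied by finishing `words` in `minutes`."""
    return words / minutes


ESSAY_WORDS = 900

# Typing at 40 words per minute.
typing_minutes = minutes_to_write(ESSAY_WORDS, 40)

# Dictation rate implied by the 6-minute claim.
dictation_rate = implied_rate(ESSAY_WORDS, 6)

print(typing_minutes, dictation_rate)
```

At 40 words per minute the essay takes 22.5 minutes, which the claim rounds to 22. The 6-minute figure implies sustained dictation at 150 words per minute and presumably excludes time spent on corrections.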
Common user profiles
- Health-care industry - This is likely to be the most profitable sector for speech recognition vendors. The high cost of labor, the specialized multisyllabic vocabulary of medicine, the formalized input, and the need to access a computer while using the hands for other tasks make speech recognition a compelling tool for health care.
- Legal industry - The needs are similar to those of health care.
- Accessibility - Speech recognition is the most effective means of using a computer for those with limited or no ability to use their hands. Many people start using speech recognition after suffering the symptoms of repetitive strain injury (RSI), although voice strain and RSI of the vocal cords are possible side effects.
Accuracy
Some users still find the software too finicky and cumbersome, and are frustrated by the constant need to correct recognition errors. The recently released version 9.0 is advertised as capable of up to 99% accuracy. For some users, a steady 1% rate of misrecognition is an acceptable price for overcoming serious barriers of time or typing ability; for others it is an arduous burden. Accuracy is the most discussed topic on user forums.
Initially, accuracy rates of 80-85% can reasonably be expected.[citation needed] According to Nuance Communications, an expert NaturallySpeaking user can expect 98-99% recognition accuracy, but such claims of near-perfect accuracy have never been independently substantiated. Moreover, the program itself does not report recognition rates, whereas DragonDictate provided recognition statistics. The 98-99% figures are unlikely to hold in general: speech recognition for transcription works far better on broadcast news (read by journalists chosen for their diction) than on speech produced by ordinary people in casual circumstances. Anecdotal evidence points to an accuracy of about 95% for most users.[citation needed]
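Because the program does not report recognition rates, a user who wants a figure has to measure it. One common approach, sketched below, compares the recognized text against a proofread reference using a word-level edit distance. This is a generic technique, not a Nuance feature, and the function names (`word_edit_distance`, `word_accuracy`, `expected_errors`) are hypothetical.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the `reference` word list into `hypothesis`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word was dropped
                current[j - 1] + 1,      # extra word was inserted
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference_text, recognized_text):
    """Fraction of reference words recognized correctly (1 - word error rate).
    Can fall below zero if the recognizer inserts many extra words."""
    reference = reference_text.lower().split()
    hypothesis = recognized_text.lower().split()
    if not reference:
        raise ValueError("reference text is empty")
    return 1.0 - word_edit_distance(reference, hypothesis) / len(reference)


def expected_errors(word_count, accuracy_percent):
    """Misrecognized words to expect in a document of `word_count` words."""
    return round(word_count * (100 - accuracy_percent) / 100)


print(word_accuracy("the quick brown fox jumps", "the quick brown box jumps"))
for percent in (85, 95, 99):
    print(percent, expected_errors(900, percent))
```

For the 900-word essay used as an example earlier, the quoted accuracy levels of 85%, 95%, and 99% correspond to roughly 135, 45, and 9 words needing correction.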
Highest accuracy is achieved with, in approximate order of effectiveness:
- A quality input signal.
- A powerful computer system.
- Adding phrases to NaturallySpeaking's vocabulary with the Vocabulary Editor.
- Using the Acoustic Optimizer.
- Correcting NaturallySpeaking's misinterpretations.
- Feeding NaturallySpeaking many proofread documents.
- Training.
Any noise in the path from the larynx to the sound chip can degrade signal quality. Causes of reduced signal quality include poor-quality microphones, too much ambient noise around the speaker, and excessive electrical noise inside the computer case. Integrated sound cards, found in laptops and in many Dell, Compaq, and Hewlett-Packard desktops, do not have dedicated shielding. Noise-canceling microphones are often considered preferable, and many inexpensive microphones offer excellent performance.
Speech recognition is still considered a processing-intensive task even on modern computers. Speech will be recognized on a system that meets Nuance's minimum requirements, but recognition can be more effective on stronger equipment. With smaller vocabularies, such as commands, recognition is generally faster and more accurate. Some tasks that take seconds on strong systems can take minutes on weak systems that barely qualify for installation, such as saving user files or opening the Command Browser, especially in versions prior to 9.0. In practice, users regularly double to quadruple the stated requirements for memory, processor, and free hard drive space. However, properly configuring the computer may be more important.
A key concept is that NaturallySpeaking learns when the user corrects its mistakes. Including adjacent words (context) in a correction helps it distinguish similar sounds.
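Why context helps can be illustrated with a toy word-pair (bigram) model that chooses among words that sound alike. This is only a sketch of the general idea and makes no claim about how NaturallySpeaking works internally; the training sentence and the name `pick_homophone` are invented for the example.

```python
from collections import Counter

# Hypothetical training text; real systems learn from far larger corpora.
CORPUS = (
    "i went to the store . i bought two apples . "
    "that is too much . we have two cats"
).split()

# Count how often each word follows each other word.
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))


def pick_homophone(previous_word, candidates):
    """Return the candidate seen most often after `previous_word`.
    Ties (including all-zero counts) go to the first candidate."""
    return max(candidates, key=lambda word: BIGRAMS[(previous_word, word)])


SOUND_ALIKE = ["to", "two", "too"]
for previous in ("went", "bought", "is"):
    print(previous, "->", pick_homophone(previous, SOUND_ALIKE))
```

The acoustic signal for "to", "two", and "too" is identical, so only the neighboring words separate them. A correction that includes those neighbors gives the system that kind of evidence.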
Editions and versions
The editions of NaturallySpeaking are sometimes called its flavors. They increase in both cost and functionality, from bare transcription in Standard to the greatest control in the Software Development Kit (SDK) editions.
The SDK editions can be considered the missing leftmost column of Nuance's Feature Comparison Matrix. They are marketed at a level above the other editions and do not appear in the often-cited matrix.
The Professional edition, and the related Legal and Medical editions, which come with specialized vocabularies, allow the user to create commands. Commands, also called macros, are programmed instructions for repetitive tasks. The Preferred edition, which as of 2006 costs roughly a fifth of the Professional edition, allows only macros that perform the single action of pasting some text or graphics into a document. The cheapest edition, Standard, has no programmable features and allows only transcription. Essentials was discontinued in 2003.
Total command and control requires a lot of research and support, and Nuance itself has chosen not to go down that road. Nuance provides the tools to create commands but charges for command support. This has led to a prevalence of value-added resellers (VARs), who develop commands to solve problems such as reducing a repetitive series of actions to a few spoken words.
NaturallySpeaking can be extended by other programs. NatLink, for instance, is a tool that allows NaturallySpeaking to interact with the Python programming language.
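The idea of a command that reduces a series of actions to a few spoken words can be sketched in plain Python. The example below does not use NatLink or any Nuance API; `CommandTable`, `type_text`, and `press_key` are hypothetical names, and actions are recorded in a list instead of being sent to an application.

```python
class CommandTable:
    """Maps spoken phrases to ordered lists of actions (a macro)."""

    def __init__(self):
        self._commands = {}

    @staticmethod
    def _normalize(phrase):
        # Ignore case and extra whitespace when matching phrases.
        return " ".join(phrase.lower().split())

    def add(self, phrase, steps):
        self._commands[self._normalize(phrase)] = list(steps)

    def run(self, utterance, log):
        """Run the macro for `utterance`; return False if none is defined."""
        steps = self._commands.get(self._normalize(utterance))
        if steps is None:
            return False
        for step in steps:
            step(log)
        return True


def type_text(text):
    """Action that records typing `text`."""
    def step(log):
        log.append(("type", text))
    return step


def press_key(key):
    """Action that records pressing `key`."""
    def step(log):
        log.append(("key", key))
    return step


table = CommandTable()
table.add("insert signature", [
    type_text("Kind regards,"),
    press_key("Enter"),
    type_text("J. Smith"),
])

actions = []
handled = table.run("Insert  Signature", actions)
print(handled, actions)
```

A real macro would replace the recording steps with calls into whatever automation interface the edition or add-on provides.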
Version | Release date | Editions |
---|---|---|
1.0 | June 1997 | Personal |
2.0 | November 1997 | Standard, Preferred, Deluxe |
3.0 | October 1998 | Point & Speak, Standard, Preferred, Professional (with optional Legal and Medical add-on products) |
3.01 | | Teens |
4.0 | August 4, 1999 | Essentials, Standard, Preferred, Professional, Legal, Medical, Mobile |
5.0 | July 2001 | Essentials, Standard, Preferred, Professional, Legal, Medical |
6.0 | November 15, 2001 | Essentials, Standard, Preferred, Professional, Legal, Medical |
7.0 | March 2003 | Essentials, Standard, Preferred, Professional, Legal, Medical |
8.0 | November 2004 | Standard, Preferred, Professional, Legal, Medical |
9.0 | July 2006 | Standard, Preferred, Professional, Legal, Medical, SDK client, SDK server |
History
NaturallySpeaking has passed through four companies and evolved considerably since its beginnings in the early 1980s as a research prototype called DRAGON. The married couple Dr. James Baker and Dr. Janet Baker founded Dragon Systems in 1982, deciding to commercialize DRAGON when their funding was cut by ARPA. Their first product, DragonDictate, was sold for a number of years. Dr. James Baker departed from conventional AI approaches and was a pioneer in hidden Markov models, a statistical method for recognizing speech. His wife developed the expert system named Hearsay.
In March 1990, Dragon Systems began selling DragonDictate (for DOS) at a cost of $9000 for a single-user license. As hardware became less expensive over the next several years the price decreased, and by the time NaturallySpeaking 1.0 was released, the price of DragonDictate for Windows was about $2000. The hardware of the time was not yet powerful enough to address the difficult problem of word segmentation: it could not determine the boundaries of words in the continuous signal that constitutes the human voice. Users had to pronounce one word at a time, each clearly separated from the next by a small pause. DragonDictate is based on a trigram model and is known as a discrete speech recognition engine.
In 1997, advances in hardware technology allowed NaturallySpeaking version 1.0 to launch as the first available continuous dictation system. During this time the speech recognition industry enthusiastically promoted the notion that speech input was "the" natural modality that would eventually supersede more "primitive" methods such as keyboards. Trying to reach a mass market, vendors dropped prices to levels that were unsustainable.
Dragon Systems was faltering in 2000, and Lernout & Hauspie bought the company. The dictation system bubble burst in 2001, and Lernout & Hauspie went through a spectacular bankruptcy. ScanSoft Inc. bought the rights to the Dragon products. In 2005, ScanSoft bought Nuance Communications and changed the name of the newly combined entity to Nuance, a sign of the company's drive to move further into the enterprise speech arena.
Today the software is advertised as up to 99% accurate.
Features missing since DragonDictate
Later versions of NaturallySpeaking include a feature to ignore some types of external noise. This is the Nothing But Speech (NBS) technology, originally ported over from the L&H product Voice Xpress. While individual noises cannot be trained as with DragonDictate, NBS runs in the background in NaturallySpeaking 8 and suppresses them.
Before version 8, it was impossible to have several language versions of Dragon NaturallySpeaking (DNS) installed on one system (for example, German and French), although all non-English versions of DNS also contain the functionality to dictate in English. This limitation was removed in DNS 8 Preferred, in which all languages can coexist and function fully on a single installation.
Competitors
ViaVoice and iListen
The main standalone competitor to NaturallySpeaking is IBM ViaVoice, which was licensed to Nuance (formerly ScanSoft) a few years ago. Control and development remain in the hands of IBM. Functionality is similar to that of NaturallySpeaking, but unlike NaturallySpeaking, ViaVoice is available on Linux and Mac OS X (although these versions are no longer maintained). iListen is the leading OS X speech recognition program, but it is generally regarded as inferior to NaturallySpeaking.
Microsoft Speech API (SAPI) in Office, Tablet PCs, and Windows Vista
Speech recognition functionality built on Microsoft's Speech API (SAPI) 5.1 is included free in Microsoft Office and on all Tablet PCs running Microsoft Windows XP Tablet PC Edition. It may also be downloaded as part of the Speech SDK 5.1 for Windows® applications, but because that SDK is aimed at developers building speech applications, it lacks any user interface and is thus unsuitable for end users.
Windows Vista will include version 7 of the Microsoft speech recognition engine along with an improved and expanded speech-recognition interface.
Public figures who are users
- David Pogue, technology writer for The New York Times, is a longtime Dragon user.
- Frank Peretti, Christian novelist.
- Richard Powers, novelist who won the National Book Award for The Echo Maker, used unspecified voice recognition software.
- Richard Dreyfuss, actor, has mentioned it on several "Tonight Show" appearances. He once visited the Dragon Systems corporate headquarters to, among other things, thank the employees for making the product.
- Ed McMahon, announcer and TV personality, bought an early version of the software in the late 1990s.
- Sinbad, actor and stand-up comedian, is a tech junkie and NaturallySpeaking user. He visited the Dragon Systems trade show booth at Comdex in Las Vegas regularly in the late 1990s.
- James Randi, stage magician and scientific skeptic.
- John Ousterhout, creator of the scripting language Tcl, former professor of computer science at the University of California, Berkeley, and founder of Electric Cloud. He started using DragonDictate in 1996 and switched to Dragon NaturallySpeaking in 1999. His notes on "Dealing with RSI" detail his use of the software to cope with repetitive stress injury, including tips on using it with Unix.
References
- Newquist, H. P. The Brain Makers. Sams Publishing, 1994. ISBN 0-672-30412-0
- Ousterhout, John K. Dealing with RSI. http://home.pacbell.net/ouster/wrist.html
External links
- Nuance
- Dragon NaturallySpeaking 9 Review
- All about Dragon NaturallySpeaking speechwiki.org
- http://www.voicerecognition.com/voice_article.html
- Dragon on Linux http://appdb.winehq.org/appview.php?versionId=3227
- Dragon Naturally Speaking - Not for kids...
Forums
- L&H Support and Updates/ScanSoft Public Forum: Nuance's official forum, but badly out of date.
- KnowBrainer Leading NaturallySpeaking technical support forum
- SpeechComputing Combination of forums and blogs
- VoiceRecognition.com Forum