Speech recognition: computers versus humans

One could define “speech recognition” as the conversion of the spoken word to the written word. Typically this means one person speaking, whether face to face, on tape or in a digital recording, to another person who then transcribes. Others think of software applications when they hear the term “speech recognition”, but who would win in a “transcription bake-off”?

At Document Direct we use humans to transcribe the spoken word because, ultimately, it is the end point that matters most: an accurately transcribed and formatted document produced from the spoken word.

It’s interesting to note that the key “selling point” from software vendors is improved turnaround time, not necessarily cost savings or better accuracy. That’s because all computer-recognised transcription must be proofread, edited in the application so that the speech recognition engine can learn, and then formatted (if required). Claims of 80%, 90% or 95% accuracy mean one thing to us: that 100% of most documents produced by speech recognition software probably contain errors! Turnaround time may improve because the editor can speed up the recorded voice while quickly proofing the text. However, if this task is left to a secretary, the downside is that work cannot be produced out of office hours.
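To put rough numbers on the accuracy point, here is a minimal sketch. It assumes a vendor's accuracy figure means per-word accuracy and that each word is right or wrong independently of the others. Both are simplifying assumptions of ours, and the 300-word document length is a hypothetical example, not a measured figure.

```python
def prob_document_has_error(word_accuracy: float, word_count: int) -> float:
    """Chance that at least one word in the document is wrong.

    Assumes every word is recognised independently with the same accuracy.
    """
    return 1.0 - word_accuracy ** word_count


def expected_errors(word_accuracy: float, word_count: int) -> float:
    """Average number of wrong words to find and fix when proofreading."""
    return word_count * (1.0 - word_accuracy)


# Hypothetical 300-word letter at the accuracy levels quoted above.
for accuracy in (0.80, 0.90, 0.95):
    chance = prob_document_has_error(accuracy, 300)
    errors = expected_errors(accuracy, 300)
    print(f"{accuracy:.0%} accuracy: {chance:.4%} chance of at least one error, "
          f"about {errors:.0f} words to correct")
```

Under these assumptions, even the 95% figure leaves a letter of that length all but certain to contain a mistake, which is why every computer-transcribed document still needs a proofreader.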

Software has its place, and plenty of people are clearly trialling the computer-driven process. Programming voice commands to operate windows (rather than using a mouse), dictating emails and watching the text appear on screen can be much faster than a lawyer typing the text themselves. But it takes time to train the speech recognition engine. The text must still be proofed and edited so that the engine can learn and improve, and, as IT budget holders admit with frustration, most authors give up after 20 days of effort. The software vendors are then fighting an unwinnable battle to offer consultancy and training to help authors get past the 20-day pain barrier. It’s an almost impossible task to gather a number of lawyers for a day’s training: never mind the cost of the training, there are problems co-ordinating diaries, lost productivity, and a general lack of willingness.

If you choose to trial speech recognition software, consider the different aspects of each individual author’s profile.

  • Is the author happy to work sitting at their desk, or are they out and about and using mobile dictation?  Software works best when background noise interferes little with the quality of the recording.
  • Does the author dictate volume work?  There seems little point in investing in licence fees, training and time if the author only dictates 10 minutes of recording a day. 
  • Does the author work out of office hours (or, more importantly, aspire to a more flexible working day)?  If so, free them from the office rather than have them rely on support staff who work only office hours.
  • Does the author produce mainly complex word processed documents? This type of work is best produced by humans who have advanced Microsoft Word skills. 

At the end of the day human nature will prevail: we take the path of least resistance. It’s far easier to ask another person for help, and that’s why outsourcing dictation to a typing service works so well for the majority of authors. Being able to talk human to human (rather than human to computer) removes the pain of investing in new systems and learning them, and frees authors from the office. Availability 24 hours a day, 7 days a week offers real flexibility to support the author’s goals.

Similarly, the IT department may not want to risk investing time and money in IT infrastructure, licensing and training. Speech recognition as a service (using the software engine) will provide the path of least resistance. Why invest in systems when a service provider can do that for you on a different cost model? It is another form of outsourcing, one which provides the desired win-win solution.

Document Direct understands author profiles, how lawyers prefer to work, how to format simple and complex documents, and how to help authors achieve their aspirations of being free from the office and able to work flexibly, outside the “9-5 box”.

In our view, the optimum speech recognition system is someone we know and love, who comes with a name and two ears!
