Communication support: Remote Captions using Respeaking vs Speech to Text Reporting

Communication: spoken words, intonation, written language, sometimes a visual expression or a sign. Communication is knowing what people are saying and feeling, and a way of expressing a want, a need, a thought or an order. It is being part of a professional and social circle where you are fully included.

Communication is taken for granted though. If you can’t communicate, it’s not because you don’t want to, it’s because it’s not accessible to you. It doesn’t mean lack of intelligence or ignorance. Having a hearing loss is one of those things that cuts you off from communication. It doesn’t mean you are dumb, it means you are excluded. If you give communication, you are entitled to receive a response in a way you can follow, at the level of language you have given it.

So here I am now, with my expertise and experience reporting on communication given in sounds and words and how you can access them if you have a hearing loss that is so severe you cannot make use of aids that help you hear.

I particularly love English. Reading it, writing it, and speaking it, too. Who doesn’t? I’m one of those who will correct any bad spelling or grammatical mistakes in English. I like to know EXACTLY what is said, at the TIME it is said, not afterwards, not a summary or afterthought. I am entitled to be treated equally and to get the same information as everyone else.


I’ve used Lipspeakers (“oral interpreters” they are called in the US) for many years as my preferred communication support. A Lipspeaker is a human being who listens to what is being said and relays it back to you, silently, with clear, lipreadable lip-patterns of their own. Lipreading is fine for short periods of chat, for meetings, maybe half a day, but not really for a full day. Lipreading is so tiring. You have to use your eyes and your intelligence to remember the message, but the part of your memory you use to do so is also the part that “decodes” the message, if you will. Most people can hear, so they use their auditory memory as an “aide-mémoire”, an extra bit of help to remember conversations. I do not have that. So lipreading takes a lot of effort, concentration and energy out of you, and strains your eyes. I’m also discovering that as I get older, lipreading becomes a little bit harder every day.

Let me introduce you to something a little bit different. It’s another way in which you can get access to communication. It’s called Captions. Or subtitles. Or Speech To Text Reporting, STTR in the UK. Or Communication Access Realtime Translation (CART) in the US. It is like watching subtitles of what’s being said at the time it’s said.

So when I go out to a conference, or meeting of many people for a full day and there are captions present to relay what is being said, I really enjoy myself. It is the easiest way to get full access with no stress.


There are not many STTRs in the UK, only around 30. There are a lot more in the US (CART). An STTR is a human being who uses a phonetic keyboard called a Stenograph or Palantype. They listen to what is being said and they “play” the sounds into the keyboard. This is picked up by a connected laptop, which matches the “song” to a particular word and displays that word on the screen within a second or two of it being played. STTRs can keep up with speech at a speed of up to 250 words per minute, and some particularly skilled STTRs can go up to 300 words per minute or more. It’s a highly practised skill to be an STTR. The equipment costs a lot of money, and it takes a long time to build up a dictionary of words that match the chords input to the Stenograph or Palantype keyboard. It also takes a lot of practice to build up speed and accuracy. To become qualified, an STTR has to pass a test which proves their accuracy to 99% or more.


Recently there has been a lot of publicity, in the world I live in, about respeaking and voice recognition as possible alternatives to STTR. If they were as good as STTR I’d welcome them. But they are not. Let me explain why.

Let’s start with VOICE RECOGNITION. Breaking it down to what it actually is: it is a piece of software that will take sound uttered by a human being and convert it into words. All well and good. But the software isn’t “trained” to different accents, dialect, discourse or slang. Every person speaks slightly differently. So the user has to correct that, and tell the software which word they mean when they are speaking. This means voice recognition has very poor accuracy, and each person who uses it has to correct it to their own ways of speaking.

Way back in the 1990s, we had voice recognition software, so it’s not new. It was about 80% accurate for someone who was speaking in a very clear, concise way using perfectly pronounced “Queen’s English”. For someone with natural speech that uses lots of intonation, accent and slang, it could be as bad as 50% accurate. Not good enough to be able to fully understand what is being said using it.

Today, it’s not much different. About 90% accurate for a very clear speaker who has spent a considerable amount of time updating the software to their particular voice. At 90%, one word in every ten is still wrong. And you STILL have to train the software to the words spoken in your particular sound. You can’t do it in one or two months. It takes at least 6 months and even then, the quality is variable. It is no good as a method of subtitling. You only have to see it in action when you click on “Autocaptions” on a YouTube video.

Click on here for Example of Autocaptions

Ok, so in order to make voice recognition more accurate, we bring in a human being. This human being has to train the voice recognition software to become familiar with THEIR voice. This is called RESPEAKING. Accuracy in respeaking will again rely on the perfectly spoken word that the software has recognised. So if someone is speaking very fast and the respeaker is in turn speaking very fast, the accuracy drops. If the respeaker has a cold, and the sound they are giving is muffled, the accuracy drops. If the respeaker is made to do it for a long time (i.e. more than 20 minutes on the trot), the accuracy drops.

Of course, the root words spoken by someone else, which the respeaker is listening to and relaying to the software, have to be understood too. How many of you have picked up a phone call and listened to, say, someone in India speaking with an accent and struggled to follow? How many of you who usually speak in a southern English accent can fully understand a Scottish accent in all its glory and be able to follow? Why do some people find a “Scouse” accent particularly difficult? You have to concentrate to be able to follow.

Now imagine you are a respeaker. You have to be 100% accurate in repeating what you are hearing and speak it, again, as it is coming to you, into your voice recognition software. At speed, under stress, with a perfect voice, for a long period of time. It will hurt. Your voice will falter over time, and you will need frequent breaks, with lots of water to keep your mouth moist. You’ll also get very tired mentally, because you have to concentrate, keep an eye on what is coming out of the system on the screen and make sure it’s the right word. Many words in English sound the same but have different spellings and meanings. They are called homophones. You have to make sure you’re using the right one.

Have a look at different homophones, click on a letter from this website:

Try some homophones

When the BBC decided to use respeakers instead of STTR for their subtitling, the accuracy and quality dropped. We all noticed too!


There’s a newer(ish) way to get communication support these days. It uses the internet to relay the output from a person at home, listening in to what is being said, and either using respeaking or STTR methods to change spoken language into captions. This is called REMOTE LIVE CAPTIONING or REMOTE REALTIME CAPTIONING. It’s fantastic!


But the output can be compromised by several different things.

The first one is to have a good internet connection. Some remote captioning companies will tell you that it’s perfectly fine to use WiFi. It is not. WiFi, whether it is mobile or in-house, will ALWAYS only be as good as the quality of the signal. You are best having your technology hard wired to the internet. That way you will be guaranteed a good strong signal for the captioner to hear the sound clearly. (The more people who use your in-house WiFi signal, the weaker it becomes, you see.)

The second one is to make sure you have a good microphone set up, and that everyone who is going to speak will be able to be heard by the captioner. So you will need to remind everyone to speak clearly into that microphone, or in the direction it is in. A bit like having a talking stick to control the conversation. The captioner isn’t in the room. They are miles away, sometimes at the other end of the world. I’ve used remote captions and Skype to present to a class of students in Karachi, using a captioner based in California, with the person controlling the whole thing based in London, UK. So the people in the room need deaf awareness. Everyone, including the deaf person, has a right to follow.

The third one is to make sure the captioner, too (whoever they are, respeaker, STTR or CART), has a good, checked working knowledge of deaf awareness. The people who book and use remote captions for deaf people may be deaf themselves. I need to be able to contact the person at the other end, both the person who is facilitating the communication and the captioner. It goes the other way too. If the captioner can’t hear what is going on, they need to talk to me. They can’t ring me, you see (I’m deaf), and perhaps the connection to the internet has been lost, so it is imperative to have a backup plan in case this happens.

The fourth one is to make sure the captioner can understand the voices being spoken over the internet. This includes accents, slang, the topic of conversation, the jargon, the dialect and the language. And to be able to relay EVERYONE’S voice. Including the voice of the deaf person themselves. For me, many people struggle with my voice on its own. Many deaf people DO have a “Deaf voice”. I have used respeakers in a remote captioning situation where the captioner cannot understand me, and I’ve spoken at conferences where I’ve been highly professionally embarrassed by the lack of captions from a particular company to my own speech. Everyone else has got their speech captioned on the screen, but not me. It means deaf people in the room who are relying on the captions can’t follow what I’m saying. It’s like committing professional suicide except that I have no control over it. This is why I flatly REFUSE to use anyone other than a qualified STTR / CART reporter with a checked Professional Registration proving they can work with deaf people and can understand deaf voices. So don’t assume all remote caption companies are going to be the same.

The fifth is to understand that if you are using respeakers, you will not get a word for word flowing script of captions. The output will be in blocks. So for the people you see communicating in the room, you will get the captions a lot later than if you were using STTR. It may not seem like much, but even 1 or 2 seconds behind is crucial. You will not be able to get a word in, because by the time someone has finished talking, you are still reading what they have said, and someone else will come in before you’ve even finished following the last person. This is important. You can’t interrupt a conference, and you can’t ask them to stop while you are following the last bit of information, so you can’t partake fully in what is going on.

So... whatever the provider says, respeaking will never be as fast or accurate as an STTR / CART. If you have used both, you will notice the difference straightaway. You won’t get the background “noises” like “mobile phone ringing”; “someone knocking on the door”; “fire alarm is going off”; “person is talking very softly”; “someone has just come in”; “traffic noises outside”; “inaudible”. That is what a trained STTR (or CART) will give you. And that is only from a captioner who is trained in deaf awareness. You also won’t get as much grammar, such as full stops, semicolons, proper sentences and strategically placed pauses in the dialogue.

I’m going to try to explain to those who think they know what I need as a deaf person. And it seems to me that all these people are hearing. They obviously haven’t used or relied on communication support. If they had nothing to help them follow a meeting but an output that mirrors the awful recent subtitling by the BBC using respeakers, they would all be up in arms. Like listening to a badly tuned radio perhaps? If they were given a programme where they had to listen very carefully and try to work out what was being said, and it was not clear at all, chances are they’d give up or change the channel. Unfortunately, if you’re deaf, you can’t do that. There’s no other option but to put up with a substandard service because you have no choice.

To be fair on the BBC, we are beginning to see some improvement in the accuracy of the subtitling output. But that is only because they have listened to deaf people and taken action. If we didn’t tell them, nothing would have changed. So here I am, now, a deaf person reporting on the do’s and don’ts of remote live captioning. It is the same sort of kettle of fish.

When you are using or booking live remote captions, remember to ask, and listen to, deaf people who have used them before. Don’t think that because this one is cheaper than the other, it’s better or that it’s the same thing. It’s not. There is a reason to look for accuracy and quality when booking. It will hugely affect the understanding and access the deaf person is getting, much more than you think.

I am a member of a vibrant deaf / HOH community on social media, “Pardon, I’m deaf. When will you listen? We need Access for All.”, and I have seen and read many bad reviews as well as good ones. Here are some of them:

Just to let you know I’ve had to put an appeal in to ATW about the cost they were allowing me for captions. I tried this Australian company for 15 minutes and could not follow the conversation. It was like trying to follow the news or a footie match, big chunks of text appearing at a time with no reference to sentence ending or phrases. We talked at normal conversation speed which was, according to this same company, “very fast”. But I can’t tell everyone to slow down, it’s not going to happen in a big conference call! So, no. Definitely not suitable!

I went to Uni of Westminster a few years back and the remote communication support for all was useless. I never use captions myself as I use interpreters but I do remember the problems involved at this event. Awful! Only worked in a few rooms (hit and miss though). Was a learning point for all I think as we all know technology can be so temperamental. Having said that it can work with correct set up and connections.

Once I was at a meeting and using onsite STTR. My manager was watching the output. She said to me “What’s going on? She is making it all up”. I didn’t know, obviously, I couldn’t hear what was being said and trusted the captioner 100%. That was scary.

One of my colleagues sat next to me on Wednesday morning, watching the remote captioner working. He was stunned and amazed at what he saw.

This is the reaction when you get access, my first ever “speeches” from my smartphone, with captions. The expressions say it all 


About Suzie

Mother, Wife, Teacher, Cook and Hearing dog owner. Passionate about Equality for deaf and deafblind people. Believes in communication for all and breaking down these barriers, real and perceived. Deafened.
This entry was posted in Communication Support, Deaf awareness.
