How do I transcribe audio manually?

Transcribing in practice, by Melanie Hanning

Putting conversations into writing, that is, transcribing, is a time-consuming and often laborious process. Students at university usually have to deal with transcription more or less intensively, and this is particularly true in linguistics. Many underestimate the process and do not allow enough time for it, which can quickly lead to stress and time pressure. Most students are also unaware that there are different forms of transcription, and different guidelines that must be followed depending on the research area and framework.

1. What types of transcription are there?

In practice, a distinction is usually made between two forms of transcription: verbatim transcription and smoothed transcription.

Verbatim transcription

Try to imagine what it would be like to put a conversation down on paper word for word. For many people the result would look rather strange, because normal conversations involve stuttering, repetition and colloquial speech. Verbatim transcription aims to capture exactly that: the way something is said. How interjections, repetitions, broken-off words, dialect, colloquial language, etc. are represented depends on the context and the guidelines in use. In linguistic research, for example, recording accents and dialects may be relevant, whereas in psychological research, stress, interjections and repetitions tend to matter more.

An example of a verbatim transcription:

Interviewer: How old are you?

Respondent: Um, I'm 30 years old.

Interviewer: All right, all right. And, um, where do you live at the moment?

Respondent: I currently live in Sp / Berlin. (...) But I also lived in Cologne once.

Because people often do not speak in grammatically correct sentences, and spoken sentences are often long, verbatim transcripts take longer to read and are more tedious. However, the context becomes clear, which makes it easier to analyze whether someone is hesitant or telling the truth, for example. Verbatim transcription is required in many cases, for example in the research areas mentioned above or in legal contexts.

Smoothed transcription

Smoothed transcription focuses on the content of a conversation. Interjections, repetitions, broken-off words, etc. are omitted. The transcriber writes the conversation in as grammatically correct a form as possible so that the text remains readable. A smoothed transcript reproduces the content of a conversation faithfully, but not the way in which something is said. Here, too, the transcription rules depend on the context and the guidelines in use.

An example of a smoothed transcription:

Interviewer: How old are you?

Respondent: I am 30 years old.

Interviewer: And where do you live at the moment?

Respondent: I currently live in Berlin. But I also lived in Cologne once.

This type of transcription is useful when what is said matters more than how it is said. Typical applications are interviews intended for publication and research areas that focus on the content of the conversation.

Depending on how simple or complex the transcription rules are, the boundaries between verbatim and smoothed transcription can blur.

Preparing to transcribe

Since transcribing is time-consuming, it is advisable to prepare well before collecting the data, be it an audio or a video recording.

Theory and method. First, this of course means having a well-thought-out interview guide or questionnaire. More generally, it means having a suitable theoretical and methodological basis for collecting the data.

Interview partners. Apart from addressing the research questions rigorously, it is essential to find suitable interview partners, not least to save time.

Audio or image quality. Before collecting data, make sure you have a suitable recording device. Test, for example, how loud background or ambient noise is in an audio recording and whether the frequency range of the voice is captured well. Transcribing is time-consuming in itself, but transcribing a poor recording can be extremely frustrating. It can distract you from the actual research project and, in the worst case, lead to the loss of important parts of the data.

Once you have data you can work with, whether you recorded it yourself or received it from other researchers, the next question is how to carry out the transcription.

What are the options for transcribing?

If you have to transcribe audio or video files, you basically have two options: Either you transcribe yourself, or you have the transcription done externally.

Transcribe yourself

Transcribing yourself takes a lot of time, but it also has the advantage that you immerse yourself in your own research. Every time you listen to the recordings, you engage consciously with your data, and an initial interpretation, evaluation and overview already start forming in the back of your mind. You understand exactly what the speakers mean and know the content of the conversations better, which saves valuable time in the subsequent analysis. In addition, transcribing yourself is usually cheaper than commissioning a transcription.

Transcribing a one-hour audio recording takes two to six hours, depending on the audio quality and how experienced you are. If you have recorded ten hours of interview material for a research question, you should therefore expect about 20 to 60 working hours for the transcription. These figures are only guidelines, but even with relatively little data it is always advisable to use tools. Both software and hardware tools can help you and speed up the process. But don't underestimate the effort!
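The rule of thumb above is simple multiplication. Here is a minimal Python sketch, assuming the two-to-six-hours-per-audio-hour range given in this article. The function name `estimate_transcription_hours` and its parameters are hypothetical and chosen only for illustration:

```python
def estimate_transcription_hours(
    audio_hours: float,
    min_factor: float = 2.0,  # assumed: working hours per audio hour (experienced, clear audio)
    max_factor: float = 6.0,  # assumed: working hours per audio hour (beginner, poor audio)
) -> tuple[float, float]:
    """Return a (low, high) estimate of working hours for manual transcription."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return audio_hours * min_factor, audio_hours * max_factor


# Ten hours of interview material, as in the example above.
low, high = estimate_transcription_hours(10)
print(f"Plan for roughly {low:.0f} to {high:.0f} working hours.")
```

Once you know your own typing speed, replacing the default factors with your measured values gives a more realistic estimate.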

Transcription hardware

Professional transcribers usually work with good headphones and a foot pedal. The pedal lets you play and stop the audio file while your hands stay on the keyboard. A one-hour interview fills between 20 and 35 A4 pages when typed, and you have to start and stop the audio file thousands of times. A foot pedal makes it easier to get into a working flow. If you have to transcribe regularly, the investment is worth it! It is also advisable to always transcribe with headphones so that you can make out indistinctly spoken words.

Transcription software

There are two types of transcription software: software without automatic speech recognition and software with it.

Software without automatic speech recognition lets you use keyboard shortcuts in a practical editor to play your audio files faster or slower, or to repeat the last few seconds. Usually you can also insert time codes or speaker labels the same way.

Don't underestimate how often you have to play and pause a recording while transcribing. Speeding this up with keyboard shortcuts can save a lot of time. However, you still have to do all the typing yourself.

Here are some providers of transcription software without automatic speech recognition:

oTranscribe is free and has an online editor into which you can upload audio and video files in various formats. All common transcription commands, such as pause, rewind and inserting timestamps, have keyboard shortcuts, so you can type while the audio plays.

Express Scribe is an inexpensive program that provides all the basic functions for transcribing. You download an editor that offers keyboard shortcuts for playing, pausing and speed control. You can also connect a foot pedal to it. The trial version is free.

Transcriva (Mac): Like Express Scribe, Transcriva lets you easily speed up or slow down audio recordings. In addition, Transcriva has its own recording function, so you can transcribe while recording. Transcriva also saves your work automatically as you go. The program also offers a free trial version.

What all software with automatic speech recognition has in common is that it creates transcripts automatically, so you no longer have to type out the recordings by hand.

Here are some providers of transcription software with automatic speech recognition:

Dragon NaturallySpeaking is dictation software that you can train on your voice. Once it has been trained, you can dictate text with almost no errors, so the transcription process consists of you repeating the interviews aloud. Commands such as 'period', 'new paragraph' and 'underline' are also recognized. Dictating can save a lot of time and works very well because the software adapts to the way you speak, but you will have to invest time in training it. You also need to learn certain dictation techniques. Dictation itself usually takes longer than the original audio. Depending on the Dragon product you want to try, trial versions are also available.

AmberScript has developed software with automatic speech recognition. You can easily upload audio and video files, and within a few minutes a transcript is created automatically, which you can then edit in the online editor if necessary. The editor also offers various practical keyboard shortcuts. Afterwards, you can download your transcript in various formats. The automatically generated transcripts are of course not perfect, and technical terms and names in particular have to be corrected. With suitable audio quality, AmberScript achieves an accuracy of up to 95%. Instead of four to six hours per hour of audio, transcribing this way takes only one to two hours.
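To see what these per-hour figures mean for a whole project, here is a small Python sketch. It simply assumes the four-to-six versus one-to-two working hours per audio hour quoted above, which are vendor figures and not measurements. The constant names and the `hours_saved` function are hypothetical and exist only for illustration:

```python
# Working hours per hour of audio, using the figures quoted in this article
# (assumed values; treat them as rough guidelines only).
MANUAL_HOURS_PER_AUDIO_HOUR = (4.0, 6.0)
ASSISTED_HOURS_PER_AUDIO_HOUR = (1.0, 2.0)


def hours_saved(audio_hours: float) -> tuple[float, float]:
    """Return the (min, max) working hours saved by correcting an automatic
    transcript instead of typing everything manually."""
    manual_low, manual_high = MANUAL_HOURS_PER_AUDIO_HOUR
    assisted_low, assisted_high = ASSISTED_HOURS_PER_AUDIO_HOUR
    # Smallest saving: fastest manual work compared with slowest correction.
    min_saved = audio_hours * (manual_low - assisted_high)
    # Largest saving: slowest manual work compared with fastest correction.
    max_saved = audio_hours * (manual_high - assisted_low)
    return min_saved, max_saved


print(hours_saved(10))  # (20.0, 50.0)
```

The comparison pairs the extremes on purpose, so the result is a range rather than a single number. Poor audio or many technical terms will push the correction time toward the upper bound.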

You can test AmberScript's transcription software free of charge with 30 minutes of audio.

Commission the transcription

If you do not have time to transcribe yourself, you can of course commission an agency to do it. Countless companies offer transcription on the German market. AmberScript, for example, offers full transcription as a service in addition to its transcription software: the automatic transcripts are then corrected by human transcribers within four to five days.

Outsourcing the transcription is, of course, more expensive than doing it yourself. The price is based on the length of the audio file. Many agencies raise the price per audio minute depending on factors such as the number of speakers, intelligibility, sound quality, accents, verbatim vs. smoothed transcription and the inclusion of time codes. You can also hire friends or freelancers, which is usually cheaper. However, you then risk that the transcription will not be finished on time or that its quality will be lower than expected.

The advantage of outsourcing the transcription is the time you save. Note, however, that third parties are usually not familiar with your research. Technical terms or product names may therefore be transcribed incorrectly, even by professional typists. You should therefore also plan time to correct externally produced transcripts!


Transcription is an essential part of academic work. Do not take it lightly, and expect it to take longer than you initially anticipate. Several tools can help you speed up the process and save time. Allow enough time for transcription, obtain the necessary resources and, above all, be patient, so that you lay a good foundation for your research.