The voice of an angel – or a devil: How linguistic information helps the audience decode a character


6 April 2026

From the raspy breathing of Darth Vader to Professor Snape’s slow drawl, movies and television are full of different voices. The most iconic ones stay with the audience long after the story ends, representing something fundamental about the characters. But ordinary voices are also vital for understanding a character. This article examines how and why voices are important for characterisation. It proposes a categorisation system based on the effects and aims of a character’s voice.

While written literary works can guide readers’ perception of a character by letting the narrator describe them as ‘sweet’, ‘suspicious’, or ‘unnerving’, movies and television (collectively, telecinema; Piazza, Bednarek, and Rossi 2011) are marked by their characters being directly perceivable to the audience. Telecinema characters must therefore be presented, both visually and audibly, in a way that allows the audience to draw these inferences themselves. Written literary works can conjure up a character simply by mentioning their name without providing any description, or fill a room with characters merely by stating that people are in it. In contrast, every character in telecinema takes a physical form, whether a main character or relegated to the background.

In telecinema, everything about a character (e.g. their build, clothes, hair colour, ethnicity, and, if they are given a speaking role, their voice) is directly perceived by the audience and used to form an opinion of the character. So are the framings they are filmed in, their placement in the setting, the camera angles, the cutting tempo of the scene, and the movement of the camera. Telecinema production teams navigate a vast range of choices in costuming, mise-en-scène, camera angles, movement, background music, framing, and voices to subtly guide their audience towards the desired interpretation of a character or scene.

A tool for characterisation

Take Dolores Umbridge and Professor McGonagall’s verbal fight about appropriate disciplinary practices in Harry Potter and the Order of the Phoenix: Both characters pause mid-conversation on the same step while walking up the stairs, leaving the tall McGonagall towering above Umbridge. However, immediately after a pointed remark aimed at McGonagall, Umbridge takes a deliberate step up the stairs, reversing their height difference so that she now looks slightly down at McGonagall. While defending herself, McGonagall follows, but when Umbridge questions her loyalty, McGonagall, defeated, takes a shocked step down the stairs. Immediately, Umbridge advances another step upwards, placing herself well above McGonagall before addressing the student body (fig. 1).

The characters’ relative placement, both in the frame and on the physical set, gives the audience more information about their relationship and their mental states. Umbridge’s first step up the stairs clearly signals her attempt to gain control of the situation, while McGonagall’s last step down marks her defeat.

Many of the tools that telecinema production teams have at their disposal for communicating information about a character or situation have been thoroughly studied and their effects mapped out: Scenes filmed with a handheld camera are likely to be filled with action, instability, or uncertainty. A character seen in a low-angle shot is in control, a figure of authority. Positioning a character so that their reflection is visible on screen suggests inner turmoil, a symbolic or actual duality of the character. However, one thing that has historically been given little attention is the voices of the characters. To adequately discuss the effects of and reasons behind a character’s voice, we must first understand why linguistic variation exists in telecinema at all.

Linguistic variation in the real world

Linguistic variation describes all differences between voices: Everything from pitch to slang and vocabulary to regional pronunciations of individual words. Consider, for example, the many different pronunciations of something as simple as the word “water”: There are varying vowel sounds, not to mention different realisations of the t in the middle of the word, perhaps most memorably the glottal stop of Cockney English, where the t (and the tongue!) almost gets swallowed.

Much linguistic variation can be attributed to differences between groups of speakers: Depending on, for example, their age, regional location, or class, people belong to certain groups whose speech is generally marked by certain features. For example, the double negative is typical (and grammatical!) in African American Vernacular English (AAVE; see fig. 2 for an example), while it is often frowned upon among other groups. Likewise, slang words are often restricted to specific age groups: Consider how frequently “6-7” was used by teens in 2025 and the utter confusion it caused among older generations.

In general, humans are great at spotting patterns. And as linguistic variation takes the form of various patterns, humans naturally notice differences in speech and attribute them to different groups. In this way, speech differences are not only noticed; they are used to categorise, evaluate, and judge speakers. Some of this is as simple as recognising where a person grew up, their native language, age, or gender. But listeners also form more ideologically potent ideas about a speaker based on their voice: Non-native speakers are perceived as less trustworthy (Lev-Ari and Keysar 2010; Kertesz et al. 2021), expert witnesses who use uptalk (that is, a rising intonation at the end of declarative sentences) are perceived as less confident (Levon and Ye 2020), and defendants with non-standard dialects are more often perceived as guilty (Dixon, Mahoney, and Cocks 2002). This all points to one fact: The voice is a determining factor in how we perceive the people around us, even if we are not aware of it.

Indexes and indexical fields

The association between linguistic features and specific groups and/or traits occurs through indexes. Two things are indexically linked when they co-occur so frequently that encountering only one of them brings the other to mind (Silverstein 2003; Johnstone 2016). For example, if you meet a person who uses the word wee instead of small, you are likely to assume that the person is Scottish. The word wee indexes Scottishness (Stein 2023), which is likely why it is used in the opening scene of Brave (fig. 3).

Indexes are not limited to speech features, and the concept may be more readily understood through another example: Thunder so often co-occurs with lightning, rain, and dark skies that a theatre production can conjure up an entire storm simply by playing the sound of thunder (Johnstone 2016). In this manner, thunder indexes a storm. The same trick can be used in telecinema to create a sound bridge between two scenes or to let the audience infer the weather outside an otherwise sealed room.

Indexes are not created in a vacuum; they are often part of indexical fields in which previously encountered indexes affect new ones (Eckert 2008). Returning to the hypothetical Scottish speaker from before: If the person is wearing a patterned kilt, has a beard and long reddish hair, and is carrying a bagpipe, all of this supports your notion that their use of wee means they are Scottish. In some cases, a speech feature evokes different indexes depending on which other features and indexes have been encountered previously. In many ways, this is not unlike how genres work in telecinema.

Imagine this: A woman walks around alone in her home. It is late in the evening, perhaps already night. At any rate, it is dark outside. The sound of rain can be heard, perhaps even a bit of thunder. Suddenly, there’s a knock at the door, and the woman, surprised (startled?), goes to answer it.

Who is on the other side of the door? Your answer likely depends on the genre of the movie you are watching. If it is a romantic comedy, the male love interest will be waiting on the other side, completely soaked and ready to declare his love for the woman. Just from hearing the knock, you, as the audience, know roughly what is about to happen, and you are immediately filled with hopeful anticipation (fig. 4). If, however, you are watching a horror movie, the knock on the door is likely to fill you with dread. What is on the other side of the door? A supernatural monster? A serial killer? Almost certain death, that is for sure!

Indexes related to speech work in much the same way: Just like audiences interpret scenes differently depending on the genre of telecinema they are watching, listeners interpret speech differently depending on the context in which it occurs.

Linguistic variation in telecinema

At this point, three important things have been highlighted:

Linguistic variation occurs naturally in the real world for a myriad of different reasons.
Speech indexes information about a speaker, such as where they are from, their gender, or even their sexual orientation. Other, more stereotypical and ideologically potent indexes also exist, for example, indexing assumptions about the speaker’s trustworthiness, their social standing, or their intelligence.
Unlike written fiction, characters in telecinema are directly perceivable to the audience and must necessarily have a physical appearance and, in many instances, a voice. To efficiently guide the audience towards correctly interpreting the characters, the conflicts, and the resolutions, telecinema uses a wide range of tools such as the characters’ placements on the screen, framing, and cutting tempo.

Combining these points, the way a character speaks must be counted as one of the tools of telecinema characterisation. Therefore, the linguistic variation in telecinema should be examined with the same dedication as other telecinematic tools.

To help guide such examinations, I propose classifying speech in telecinema into three categories based on the effect or motivation behind a character’s speech in individual scenes and/or in the narrative overall. The three categories are:

The actor’s influence, such as their own speech variety or linguistic limitations
A realistic correlation between variety and informational background
Subtle transmission of character traits, plot functions, or other non-factual information

The actor’s influence

Since linguistic variation occurs naturally in the real world and telecinema characters are played by real people, speech in telecinema would also vary if each actor spoke with their own voice and dialect, that is, their vernacular. However, even a limited data set quickly reveals that linguistic variation in telecinema cannot be fully attributed to naturally occurring variation among actors: In the 2021 movie Black Widow, multiple key characters speak with a Russian English dialect, but none of the actors involved are Russian or speak with a Russian English dialect (fig. 5). Instead, they are all native speakers of English, from either the UK or the US, without any of the markers of Russian English that are used for their characters. This is a clear example of a character’s speech not being motivated by the actor’s own speech variety.

However, in other instances, the voice of a character may be influenced to a larger degree by the vernacular of the actor. In animated films, the actor’s physical appearance is not shown on screen; their contribution is their voice. In many cases, actors alter their voices for their characters, but sometimes they simply lend their own voices. This is the case with Josh Gad’s Olaf in Frozen, whose speech is a near replica of Gad’s own vernacular (fig. 6).

Even when there are differences between the voice of a character and the voice of an actor, some aspects of the character’s voice may still be influenced, and perhaps even motivated, by the actor’s vernacular. In Black Widow, Florence Pugh’s character, Yelena, may speak English with a Russian accent while Pugh herself speaks with her native British accent, but their voice qualities are largely identical: A little hoarse and raspy, not stereotypically feminine. In contrast, Scarlett Johansson’s character, Natasha, shares Johansson’s American accent, but her voice quality is breathier than Johansson’s.

Traditionally, breathy female voices have been recognised as “sexy” (Hejná and Eaton 2025), making breathiness an obvious choice for a female superhero, introduced in the 2010s, who uses her attractiveness as a weapon. But when a new female superhero who is not created for the male gaze, such as Yelena, is introduced in the 2020s, a stereotypically “sexy” voice is not a priority, allowing Pugh to use her own voice quality in her portrayal of Yelena.

Black Widow is also an example of how an actor’s vernacular may shine through even when the character’s speech otherwise differs from it: David Harbour’s character Alexei speaks Russian English for most of the movie, but Harbour’s vernacular American vowel pronunciation can be heard from time to time. This is most likely due to Harbour slipping up when trying to produce Russian English speech sounds. Meanwhile, Harbour’s character Jim Hopper in Stranger Things shares Harbour’s own American dialect, though Hopper mumbles more and generally has a gruffer voice.

Logical correlations between variety and background

Linguistic variation in telecinema can also be motivated by a desire for realism, that is, for the voice and the background of a given character to correlate realistically. For telecinema set more or less in the real world (such as Heated Rivalry, Die Hard, and The Holiday, or even Iron Man and Superman), this means having Russian characters speak English with a Russian accent or giving scientists a larger and more specialised vocabulary when they discuss their field of study.

In general, the claim ‘linguistic variation can be motivated by a desire for a realistic correlation between background and voice’ refers to an intradiegetic correlation imitating that of real life: Characters are regarded as if they were real, and their dialects are evaluated by asking, ‘if this character were a real-life person with this upbringing and general background, how would they speak?’. Thus, ‘realistic’ refers to an attempt to make the narrative world and its inhabitants resemble real life.

Similarly, in cases where the telecinema universe diverges significantly from the real world (for instance, because of intergalactic life-forms in Star Wars, fantasy worlds such as The Lord of the Rings and Game of Thrones, or speaking animals as in Zootopia; see Kjeldgaard-Christiansen (2024) for an exploration of how dubbing without attention to the original distribution of voices can diminish a film’s message), linguistic variation may still be motivated by a desire for a realistic or logical use of voices and dialects. In such universes, the intradiegetic language logic of the story world needs to be considered: Is there a correlation between certain linguistic features and certain regions? Are there reasonable similarities between parents and offspring? Does a potential non-English native language influence the characters’ English phonetically, syntactically, and/or lexically in a way that makes linguistic sense? The inner logic of language and dialect use must then be judged as either intradiegetically sound (and thus ‘realistic’) or intradiegetically conflicting (and thus ‘unrealistic’).

An example of a telecinema universe that diverges significantly from the real world, but whose linguistic variation still seems to be at least partly motivated by a desire for an internal logic between voice and character background, is Game of Thrones. The main setting of Game of Thrones, Westeros, is somewhat reminiscent of Great Britain, with the North/South divide more distinct than the East/West divide. The series primarily uses British accents, and these follow a certain logical distribution pattern: Characters from the South primarily use southern or standard dialects, while characters with a strong connection to the North more frequently use northern English dialects (fig. 7). Even so, linguistic stereotypes have likely played a role as well: High-prestige characters, such as the Lannisters, always use standard, non-stigmatised dialects, while the speech of less fortunate, hard-working, and down-to-earth characters, such as Ned Stark and Samwell Tarly, contains features typical of non-standard Northern dialects (Lien 2016).

However, the most striking examples of linguistic variation motivated by a desire for a realistic correlation between character background and voice can be found in telecinema that portrays real people, particularly recent historical figures with known verbal mannerisms. In the television show The Crown (Morgan 2016), which follows Elizabeth II’s journey to become queen of the United Kingdom and her subsequent reign (fig. 8), virtually all actors try to match their speech to the recorded voice of the real person they portray. This is also the case in biopics such as Bohemian Rhapsody, in which Rami Malek plays Queen’s frontman Freddie Mercury, taking care to imitate his verbal mannerisms, not to mention matching the timing of concert scenes to recordings of the actual concerts. In such instances, linguistic productions need to adhere very strictly to real-life voices, just as visual appearances need to match the real person.

Transmission of non-factual information

The third category covers speech that lets the audience infer more about a character than merely where they grew up, their profession, or their native language. Here, the indexical nature of speech is particularly important, to the degree that producers and actors may not even be consciously aware of the indexes they themselves rely on when designing a character’s voice. Since telecinema characters, unlike real people, are fictional, their verbal patterns are not determined by a personal linguistic history. Instead, when we truly look behind the curtain, their linguistic productions have been chosen.

The clearest example of how important the voice is for creating a character that the audience can decode is when a character changes personality or ‘contains’ multiple personalities: Consider the vocal difference between the Green Goblin and Norman Osborn in Spider-Man: No Way Home. The Green Goblin’s voice is tense, angry, and generally rather deep, but at times also marked by almost scary switches to a higher pitch. Meanwhile, Norman Osborn still speaks with a rather deep voice, but all the tension and anger have disappeared, leaving only a tired and scared old man. Likewise, characters who have fooled their fellow characters and the audience into believing that they are good guys, only to be suddenly revealed as villains, often change their voice quality and pitch when the reveal happens. Among others, this is the case for Henry Creel in Stranger Things and Dawn Bellwether in Zootopia.

Once you start giving telecinematic speech the same attention that has traditionally been given to, for example, genre tropes, camera settings, and characters’ placement on the screen, patterns emerge. From these patterns, the underlying effects and indexes of certain speech features can be deduced. So far, research suggests that in American and British action telecinema, Russian characters who provide comic relief omit far more articles than serious Russian action characters (fig. 9; Knudsen 2024); that foreign dialects in general are mostly used by villains or comic relief characters (Minutella 2020; Lippi-Green 2012); and that the pragmatic marker like (as in “I always sort of think, like, what’s the worst that could happen?”) is primarily used by young female characters, such as Cordelia and Dawn from Buffy the Vampire Slayer, evoking associations with the “Valley Girl” persona (Reichelt 2018).

Examining voices in telecinema requires careful observation of the narrative, attention to language use both in and outside telecinema in general, and an awareness of attitudes towards speakers, since indexes are formed and changed through language use and exposure. Importantly, such examinations also benefit immensely from scrutiny of how the characters are portrayed visually, their intradiegetic relations, and the overarching ideological base of the film, as these things may be a part of the indexical field or otherwise reinforce the assumptions associated with the linguistic indexes. Within this web of related meanings and interactions, attentive audience members with substantial telecinema consumption will be able to unravel the subtle ways their perception of a character is being guided by the way they speak.

Facts

Films

Black Panther (2018), dir. Ryan Coogler
Black Widow (2021), dir. Cate Shortland
Bohemian Rhapsody (2018), dir. Bryan Singer
Brave (2012), dir. Mark Andrews and Brenda Chapman
Buffy the Vampire Slayer (1997-2003), created by Joss Whedon
Die Hard (1988), dir. John McTiernan
Friends (1994-2004), created by David Crane and Marta Kauffman
Frozen (2013), dir. Chris Buck and Jennifer Lee
Game of Thrones (2011-2019), created by David Benioff and D. B. Weiss
Harry Potter and the Order of the Phoenix (2007), dir. David Yates
Heated Rivalry (2025), created by Jacob Tierney
Iron Man (2008), dir. Jon Favreau
Spider-Man: No Way Home (2021), dir. Jon Watts
Star Wars: Episode IV – A New Hope (1977), dir. George Lucas
Stranger Things (2016-2025), created by Matt Duffer and Ross Duffer
Superman (2025), dir. James Gunn
The Crown (2016-2023), created by Peter Morgan
The Holiday (2006), dir. Nancy Meyers
The Lord of the Rings: The Fellowship of the Ring (2001), dir. Peter Jackson
Zootopia (2016), dir. Byron Howard, Rich Moore, and Jared Bush

Literature

Attiah, Karen. 2020. “Why Chadwick Boseman’s fight for African accents in ‘Black Panther’ was so important.” The Washington Post.
Dixon, John A., Berenice Mahoney, and Roger Cocks. 2002. “Accents of Guilt?: Effects of Regional Accent, Race, and Crime Type on Attributions of Guilt.” Journal of Language and Social Psychology 21 (2): 162-168.
Eckert, Penelope. 2008. “Variation and the indexical field.” Journal of Sociolinguistics 12 (4): 453-476.
Hejná, Míša, and Mark Eaton. 2025. “‘It’s chiefly your eyes I think, and that throb you get in your voice’: The place of creaky voice in the soundscape of attractive female voices in twentieth and twenty-first century American cinematography.” In Creak: Theories and Practices of Pulse Phonation, edited by Francesco Venturi. United Kingdom: Jenny Stanford Publishing.
Johnstone, Barbara. 2016. “Enregisterment: How linguistic items become linked with ways of speaking.” Language and Linguistics Compass 10 (11): 632-643.
Kertesz, Ajna F., Joseph Alvarez, Maya Afraymovich, and Jessica Sullivan. 2021. “The role of accent and speaker certainty in children’s selective trust.” Cognitive Development 60 (101114): 1-8.
Kjeldgaard-Christiansen, Jens. 2024. “Lost in Standardization: How the Danish Dubbing of Zootopia Diminishes the Film’s Message.” 16:9 filmtidsskrift.
Knudsen, Freja Hovgaard. 2024. “The good, the bad, and the funny Russian: linguistic variation in telecinema.” In Proceedings of the 2023 Aarhus International Conference on Voice Studies, edited by Jens Kjeldgaard-Christiansen, Mark Eaton, Mathias Clasen, Míša Hejná, Oliver Niebuhr and Zachary Christoper Boyd, 48-55. Warsaw, Poland: Sciendo.
Lev-Ari, Shiri, and Boaz Keysar. 2010. “Why don’t we believe non-native speakers? The influence of accent on credibility.” Journal of Experimental Social Psychology 46 (6): 1093-1096.
Levon, Erez, and Yang Ye. 2020. “Language, indexicality and gender ideologies: contextual effects on the perceived credibility of women.” Gender and Language 14 (2): 123-151.
Lien, Yngvild Audestad. 2016. “Game of Thrones: A game of accents? A sociolinguistic study of the representation of accents in HBO’s television series.” Master’s thesis, Norges teknisk-naturvitenskapelige universitet.
Lippi-Green, Rosina. 2012. English with an accent: language, ideology, and discrimination in the United States. 2nd ed. Abingdon, Oxon: Routledge.
Minutella, Vincenza. 2020. “Linguistic Variation in Animated Films from 2001 to 2017.” In (Re)Creating Language Identities in Animated Films: Dubbing Linguistic Variation, 123-216. Palgrave Studies in Translating and Interpreting. Switzerland: Springer International Publishing AG.
Piazza, Roberta, Monika Bednarek, and Fabio Rossi. 2011. “Introduction.” In Telecinematic Discourse: Approaches to the language of films and television series, edited by Roberta Piazza, Monika Bednarek and Fabio Rossi, 1-17. Amsterdam, The Netherlands: John Benjamins Publishing Company.
Reichelt, Susan. 2018. “The sociolinguistic construction of character diversity in fictional television series.” PhD thesis, Centre for Language and Communication Research & School of English, Communication and Philosophy, Cardiff University.
Silverstein, Michael. 2003. “Indexical order and the dialectics of sociolinguistic life.” Language & Communication 23 (3): 193-229.
Stein, Simon David. 2023. “Space Rednecks, Robot Butlers, and Feline Foreigners: Language Attitudes Toward Varieties of English in Videogames.” Games and Culture 18 (8): 1043-1070.

Topics:

Spring 2026

About the author:

Freja Hovgaard Knudsen

Freja Hovgaard Knudsen (b. 1997) holds an MA (cand.mag.) in English and Mathematics from Aarhus University. In recent years, she has researched voices in film and how they shape viewers’ perception of a character.