Critical media response to Sounds and the Brain, a Sawyer Seminar event with Vijay Iyer and Aniruddh Patel (March 11, 2014)

The talks by Iyer and Patel investigate recent cognitive scientific research on audio/visual perception suggesting that “mental processes are strongly influenced by the form and physical actions of the body” (Patel 2).  The neurological basis for these paradigms, the authors suggest, can be explained using a cognitive model called action understanding, which proposes “the existence of a ‘mirroring mechanism’ […] in which the perception of certain familiar actions in another body can trigger the activation of similar motor programs in the observer’s brain” (Iyer 7).  Iyer explores the intriguing “notion that this process could occur through sound – that we may undergo a kind of empathetic action understanding when we merely hear someone do something, without seeing that person do it – [which] offers quite radical implications for how we listen to music (and especially what happens when we hear music without seeing)” (Iyer 8).  Music can “recreate[] for us the sensation and emotional thrill of people in our midst” (Iyer 12), an effect Iyer links to the “foundations of rhythm perception” that Patel’s research explores, “since the sound of a human generated rhythm… can activate an analogous body motion in a listener” (Iyer 13).  Accordingly, when we “hear bodies but do not see them, we instead fantasize about them” (Iyer 4).

This process can lead to “some kind of empathy for the embodiment of the performer, or some kind of understanding of the effortfulness of real-time performance” (Iyer 12).  Iyer speculates that acoustically-inspired empathy might even be culturally constructive, for it could perhaps lead to improved cross-cultural understanding between performers and listeners from different racial backgrounds who, if they saw each other, might be antagonistic:

Is it possible that music-heard-and-not-seen[…] might have overridden the visual, racialized, culturally imposed constraints on empathy?  Could the essential humanity of African Americans be newly revealed for white American listeners in the twentieth century though the disembodied circulation of ‘race records,’ by activating in these listeners a neural ‘understanding’ of the actions of African American performers? […] Could a new kind of cross-racial empathy, or at least a new quasi-utopic racial imaginary, have been inaugurated through the introduction and sudden ubiquity of recorded sound? (Iyer 8).

Still, Iyer and Patel acknowledge that the actions that trigger empathetic understanding through sound should be understood not simply as “environmental events, but [as] culturally embedded modes of bodily movement and expression” (Iyer 2) – and for that reason, when sounds are paired with images, cognitive processes become more complicated.  Iyer points to recent scientific research that “actually suggests that the [visual] perception of racialized difference may inhibit or constrain empathy” (Iyer 7).  He describes a recent study by Gutsell & Inzlicht (2010):

It was observed that test subjects (all whites from North America) displayed a greater mirror neuron-type response to images of other whites than they did to non-whites.  In some cases, whites displayed practically zero empathy-like mental simulation of actions of non-whites.  This finding has been extended to more fluid ‘in-group/out-group’ affiliations, suggesting a profound neuroplasticity in this ‘mirroring’ mechanism associated with empathy.  (Iyer 7).

In this case, he argues, “there is no such thing as ‘clean’ mirroring: there is perhaps always some distortion of the metaphorical mirror, since the problematic visual ‘perception’ of racial difference can seemingly interfere with action understanding” (Iyer 8).  Visual information, in other words, complicates the neurological processes of action understanding, and thus threatens “empathy.”

Our project will approach this last suggestion from a different critical angle than Iyer and Patel’s: film studies.  We have collected five examples – four films and, as a “coda,” one audio clip – which illustrate just how complicated the processes of audio/visual perception can be.  One guide for this project has been Thomas Elsaesser’s Film Theory: An Introduction Through the Senses, which explores the practicality of embodied cognition paradigms as a critical approach to film.  Elsaesser suggests that films can feature an “ironically highlighted interaction or discrepancy between body and voice, and then the re-connection that follow[s] it,” pointing towards an “epistemologically problematic connection between body and voice.”  This problematic connection threatens to sever “the ontological bond between a sound and its origin that appears so self-evident to us in everyday life,” but which is “cancelled out and annihilated in the technological set-up of sound cinema.”  He suggests: “Thus, in modern film theory, despite the ‘turn’ to the body and to ‘embodied perception’, we need to be cautious not to presume that we have thus gained firmer ground than in the former days… we might in fact be walking on crushed bones and skulls, to invoke the landscape of Terminator 2.”  This should not be taken to mean that we ought to abandon the use of embodied cognition paradigms altogether; rather, as Iyer himself suggests, we should avoid applying too simplistic a model of “empathy” or action understanding when dealing with multisensory perception.

The scenes we have selected feature lip-syncing, post-dubbing, representations of ventriloquism, and so forth – all suggesting that Iyer is right when he argues against the existence of any such thing as “clean” mirroring.  We hope this project will point towards constructive ways in which the work of Iyer and Patel can be supplemented by critical humanistic discourses.

The Jazz Singer dramatizes precisely the sort of acoustically-inspired empathy that Iyer describes, in which “music-heard-and-not-seen[…]overrid[es] the visual, racialized, culturally imposed constraints on empathy” (Iyer 8).  Al Jolson stars as Jakie Rabinowitz, a jazz singer who runs away from home after his conservative father (Warner Oland), a cantor at the local synagogue, beats him for singing popular songs at a beer garden.  He returns to New York ten years later as Jack Robin, a successful cabaret singer who has just been spotted by the musical theater star Mary Dale (May McAvoy) and selected to star alongside her in the Broadway show April Follies.  He and his mother (Eugenie Besserer) are tearfully reunited, but his father casts him out a second time: “I never want to see you again – you jazz singer!”  Shortly afterwards, on the night of his Broadway premiere, his father falls gravely ill and his mother pleads with him to sing Kol Nidre at the Yom Kippur service in his father’s place.  Forced to choose between Broadway and the synagogue, his promising career and his religious tradition, he finally decides to sing Kol Nidre, and in this scene, his father hears him from his deathbed.  As his father speaks his final words – “Mama, we have our son again!” – Mary, listening in the next room, recognizes that Jakie can be true to his roots and a successful theatrical star at the same time: “a jazz singer – singing to his God.”

In this film, “visual perception of racialized difference,” to borrow Iyer’s phrase, leads to conflict.  This graphic portrayal of racial difference is what makes the film difficult to watch today, for practically every actor dresses up as a stereotyped caricature of another race: Al Jolson appears onstage in blackface, and Warner Oland – a Swedish-American actor who was also famous for his portrayal of Asian characters including Fu Manchu and Charlie Chan – dresses up as the father in a long beard, glasses, and rabbinic cap.  Cultural conflicts also generate most of The Jazz Singer’s plot: when Jakie’s father catches him singing in a bar outside of their Jewish community on the Lower East Side, he beats him furiously because he sees it as an affront to his cultural traditions; when Jakie’s mother comes to the theater to plead with him to sing for his father, she is unable to recognize him because he is in blackface, and she laments the loss of her son.

Jakie’s voice, however, has the power to unify these divisions and heal the scars left by cultural misunderstandings.  When his mother first hears him sing on stage, she finally understands that he is truly a jazz singer “in his heart”: “Here he belongs.  If God wanted him in His house, He would have kept him there.  He’s not my boy anymore – he belongs to the whole world now.”  When Jakie sings Kol Nidre, his father finally forgives him for all the sorrow he caused by running away, and Mary finally understands the depth of Jakie’s commitment to tradition.  In both scenes, these characters listen to Jakie from afar without seeing him: his mother stands backstage behind the curtain, while Mary and Jakie’s father both listen through the open windows of the synagogue next door to the Rabinowitz household.

This film suggests that the answer to Iyer’s provocative question may, in fact, be yes: “Is it possible that music-heard-and-not-seen[…] might have overridden the visual, racialized, culturally imposed constraints on empathy?”  The sight of race and culture leads to conflicts that only voice can resolve, for while the visual appearance of the body can lead to alienation and estrangement, the voice can communicate directly from the soul and transcend conflict by revealing the innermost truth of the identity of its possessor.  The Jazz Singer, of course, was also the first feature film with synchronized dialogue, and so one can only imagine the shock that audiences must have felt upon hearing a moving picture speak for the first time.  Jolson’s voice must have had as extraordinary an effect on the audience as it has on the characters on screen – for The Jazz Singer was one of the first films that opened, through its use of brand-new sonic technologies, a pathway towards acoustically-inspired “empathy.”

The Great Gabbo appeared only two years after The Jazz Singer, and similarly constitutes a reflection on the revolutionary effect that the introduction of sound technology had on the development of film.  This film tells the story of a vaudeville ventriloquist and Broadway star named “The Great Gabbo” (Erich von Stroheim), who has skills that no other ventriloquist could rival: he is able to smoke, drink, and eat while making his dummy “Otto” sing.  Gabbo is so captivated by his abilities that he is driven into self-delusion and egomania, and is unable to return the love of his mistress and personal assistant Mary (Betty Compson).  By the end of the film, he has descended into madness and is only able to communicate his inner emotions through Otto.  Mary is unable to bear Gabbo any longer, and leaves him for another performer.

In this scene, Gabbo performs live before an audience.  Mary enters on stage and nervously drops the tray carrying his props; Gabbo (and Otto) glare at her angrily.  Gabbo’s signature act is, of course, impossible outside of the cinematic world – no ventriloquist is capable of drinking water while throwing his or her voice; Otto’s voice was clearly added in post-production.  Stroheim makes no attempt to hide this; pay close attention to his mouth, which throughout the entire film remains tightly shut whenever Otto is speaking.  This caused concern for contemporary reviewers: Mordaunt Hall, a critic for the New York Times, praised Stroheim for his “intensely strong performance” but lamented that he “might perhaps have imbued it with a little more imagination, for when he is supposed, and only supposed to make the dummy talk there is never a sign of movement in his throat.”  Still, Hall remarks, this is of little concern: for “the audible screen is particularly well suited to ventriloquism.”

Hall had a point.  Fifty years later, Rick Altman proposed that cinema is itself a form of ventriloquism:

The sound track is a ventriloquist who, by moving his dummy (the image) in time with the words he secretly speaks, creates the illusion that the words are produced by the dummy/image whereas in fact the dummy/image is actually created in order to disguise the source of the sound.  Far from being subservient to the image, the sound track uses the illusion of subservience to serve its own ends (Altman 67).

Altman’s interpretation of cinema presupposes that there are in fact two bodies attached to every sound – the one we are shown on screen that “pretends” to be its source, and the “real” one that remains concealed but which is actually responsible for what we hear.  For Iyer, fantasy also plays a role in the process of listening: when we hear bodies that are not present before us “we instead fantasize about them” (Iyer 4).  Does this fantasy that the audio track of a film necessarily inspires, however, always correspond to the images that the film presents us with on screen, or to the real sources of the sound?  In other words, do we fantasize about the “dummy” or the “ventriloquist?”

The previous scene from The Great Gabbo features a ventriloquist “speaking” through his dummy, even though the impression of this process was actually created entirely in post-production.  This clip, from Singin’ in the Rain, features a similar situation in which an actress seems to “voice” another lip-syncing actress on stage, when in reality the whole scene is a series of complex cinematic tricks.  Hollywood stars Don Lockwood (Gene Kelly) and Lina Lamont (Jean Hagen) are in the middle of shooting The Dueling Cavalier, when Vitaphone releases the first “talkie” – The Jazz Singer (see clip 1) – and scores a major hit.  The film industry is transformed overnight.  R.F. (Millard Mitchell), the head of the studio producing Lockwood’s film, decides that the only way to compete at the box office will be to convert the film into a lavish musical: The Dancing Cavalier.  Lamont’s strong high-pitched New York accent is unsuitable for the film and threatens to derail the production, when Lockwood’s friend Cosmo Brown (Donald O’Connor) suggests that they dub the film with the more appropriate voice of Kathy Selden (Debbie Reynolds), the actress Lockwood has recently fallen in love with.  In this scene, following the film’s successful premiere, the opening-night audience calls for Lamont to sing the hit song “Singin’ in the Rain” live on stage; Lockwood, R.F., and Brown coerce Lamont into lip-syncing while Selden sings backstage.  Selden, like a ventriloquist, voices Lamont while keeping the motions of her own mouth concealed; Lamont, like a ventriloquist’s dummy, mouths the words that Selden voices for her.  In the middle of Lamont’s performance, however, Lockwood, R.F., and Brown raise the curtain to reveal that Selden is the “true” singer.  As both actresses run away humiliated, Lockwood calls after Selden: “That’s the girl whose voice you heard and loved tonight, she’s the real star of the picture!”  The actress who possesses a voice but whose body is hidden from the screen is thus privileged as “authentic” – the true star – whereas the actress who possesses a body but mimes the voice of another is “inauthentic.”

Elsaesser explains how this clip presents a situation in which “body and voice no longer fit together, or rather: the scene restores their technical separation in the film-making process, usually hidden but now made palpable for the diegetic audience (the one watching the performance) as well as for the film audience (watching the film). Throughout the entire film, the relation between body and voice is fundamentally called into question, as this problematic relation is staged time and again, on several levels, primarily for comic effect.”  Singin’ in the Rain dramatizes how filmic techniques sever the bond between sound and origin, thereby threatening to undermine the fundamental processes of embodied cognition.  We are pulled in two different directions: towards an “empathetic” relationship by the sounds themselves – which, if we follow Iyer and Patel, invite neurological mirroring – and in an “anempathetic” direction by the presentation of a pantomiming body that we know is not the true originator of these same sounds.

There are, however, two extra twists to this scene, neither of which Elsaesser discusses.  First: Jean Hagen was actually the voice in this recording of “Singin’ in the Rain,” not Debbie Reynolds.  Second: the whole scene is post-synchronized, meaning that we are not actually hearing the real sounds that either of them was making during the filming.  Post-synchronization tricks the viewer through an “involuntary” process Michel Chion describes as “synchresis,” which can only be explained through cognitive paradigms:

[Synchresis is] a spontaneous and reflexive psychophysiological phenomenon that is universal and works because of the makeup of our nervous system, not from cultural conditioning.  Synchresis consists in perceiving the concomitance of a discrete sound event and a discrete visual event as a single phenomenon.  There is synchresis when the audio and visual events occur simultaneously, and concomitance alone is the necessary and sufficient condition for synchresis.  The impression created is involuntary; it attributes a common cause to sound and image, even if their nature and source are completely different and even if they have little or no relation to each other in reality.

This suggests that further studies in cognitive science might be able to point towards new paradigms which, added to those of Iyer and Patel, may help us better understand the mechanisms of embodied cognition in multisensory media.  A large number of involuntary neurological processes – not just neural mirroring – affect our film-viewing experience.

In this clip, we think we know what is going on, since the camera takes us behind the stage curtain to show us the “true” singer.  But in reality, the combination of two audio/visual tricks (lip-syncing and synchresis) has led us totally astray: we are tricked into attaching Selden’s voice to Lamont’s body, when the audio track actually features Lamont’s voice captured at an entirely separate moment from what we are shown on screen.  We think we are in on the joke, but the joke is actually on us.

This clip also features a situation in which the performers onstage pantomime to a pre-recorded track, although this example is taken not from a film but from a manipulated excerpt of a live news broadcast of the first inauguration of Barack Obama as the 44th President of the United States.  This inauguration, held on 20 January 2009, was the first to feature an original composition for classical chamber ensemble.  John Williams was the composer selected for the commission; the resulting work, Air and Simple Gifts, was performed by Anthony McGill (clarinet), Itzhak Perlman (violin), Yo-Yo Ma (cello), and Gabriela Montero (piano).  Several days later, however, it was revealed that their performance had been pantomimed.  The weather conditions were too severe for the instruments, and the musicians decided not to use weather-resistant carbon fiber instruments because they would look too odd to most viewers and would “distract” from the solemnity of the event.  Instead, Perlman and Ma put soap on their bows so that they would not create friction with the strings; the piano technician decoupled the keys from the hammers so that, when depressed, they made no sound.  The revelation, however, that a classical ensemble was faking its performance at such a serious historical moment exposed the entire ensemble to ridicule – just as when the curtain was lifted on Lina Lamont in Singin’ in the Rain.

Only eight days later, YouTube user “boilingsand” posted the video “Yo-Yo Shreds at the Inauguration with Perlman et al.”  This parody pretended to be a leak from a TV station that revealed the “real” sounds of the performers as picked up directly by the microphones, before the pre-recorded sounds were dubbed on top.  “Yo-Yo Shreds” was the latest example of a popular internet meme (“X Shreds”) in which filmed performances of famous musicians were carefully post-dubbed so that it seemed as if they were making ludicrous, horrible sounds.  The clip was a sensation.  Most viewers who were familiar with the meme caught on quickly to the joke: most tellingly, the instrumental tracks cut out when the camera is not focused on the instrument it represents.  For these viewers, the humor came from the absurd incongruity that resulted from the matching of terrible performances to skillful musicians.  The coordination between the fake “real” sounds and the performers’ gestures was remarkable; had it been a film of complete novices playing, it could have seemed real.  But not everyone caught on to the joke: some viewers who were familiar with the story of what had happened, and so knew that a “real” audio track could exist, thought that the audio was real.

“Yo-Yo Shreds” ironically highlights the discrepancy between body and voice by forcing the viewer to make absurd connections between the audio and visual tracks, testing the limits of audio-visual synchresis – and therefore questioning how “involuntary” synchresis actually is.  For if synchresis were truly a completely involuntary process – one that did not depend at all on conscious cognition – would anyone have been able to catch the joke?

This final clip features the intro music to the hit reality-TV show RuPaul’s Drag Race, now in its seventh season, which features drag queens from across America competing for the title of “America’s Next Drag Superstar” and a cash prize of $100,000.  Each episode features a different challenge – sewing a brand-new “runway look,” celebrity impersonations, stand-up comedy, and more – after which one contestant is crowned winner and two are selected as the “bottom two.”  These contestants are forced to “lipsync for their lives” simultaneously in order to stay in the competition.  The loser of this final challenge leaves the show, and the remaining queens continue to the next episode.

Gender, performance, and identity are key themes of the show; all three themes are addressed in this short musical clip.  The music starts with an audio sample of RuPaul singing “RuPaul’s Drag Race” pitch-shifted up an octave, answered by a clip of RuPaul singing “start your engines!” in which the register remains unaltered.  At 0:08 RuPaul’s voice is electronically shifted down two octaves, as she sings the first syllable of her name – “Ru – Ru – Ru” – to land in a register one octave lower than her natural singing voice.  The process of digital manipulation is clearly and intentionally audible; we hear the voice as it is dragged up and down many octaves between different registers.

Each register, of course, has its own culturally determined associations with gender identity: a higher register is often associated with femininity, a lower register with masculinity.  Neither register dominates the audio track, and both have been digitally altered.  Since the process of manipulation is so clearly audible, we are discouraged from attaching RuPaul’s voice to either – for even her own “natural” singing voice, which interjects between the digitally manipulated cries of “RuPaul’s Drag Race,” is difficult to identify as “masculine” or “feminine.”  We are invited, therefore, to dissociate from traditional or historical cultural expectations about gender identity.  We are also encouraged by the transparent manipulation of RuPaul’s voice to pay attention to the processes by which drag queens create the illusion of another gender identity: indeed, a good portion of each episode shows the drag queens as they put on makeup, fit their dresses, and prepare themselves for dramatic entrances as another character.  If Iyer is correct, and the processes of embodied cognition invite us to “fantasize” about the bodies making the sounds we are listening to, what exactly are we supposed to picture when these very same sounds are also challenging us to question our own culturally- and historically-determined expectations about bodies and identity?

producers/editors: Michael Kushell and Daniel Walden