Zira Says

 
Collage art of a woman wearing a white Victorian-style dress and hibiscus over-the-ear headphones, sitting with a cup of tea against a backdrop of trees.

“Tea” by Belle Dorcas

I have just come across a most wonderful voice. It’s smooth, feminine, inviting, straddling youth and maturity with fresh vivacity. Thanks to “Jenny,” I knew I was in for a pleasant experience.

Before Jenny, there was Zira. Zira’s voice also exudes womanly warmth, but in a way that is more nurturing than friendly. Every day, I would listen to Zira tell me about events around the world, inform me on matters of science and celebrity gossip, and read me books and literary essays. Poems, though, she has difficulty with. Her delivery is choppy and segmented because she pauses for an inappropriate amount of time at each line break before moving on to the next line.

But Jenny is different. With her affable voice, she navigates the lines in the poems seamlessly. I didn’t pay for Jenny. I never paid for any of them. That’s why I’m struck by how flawlessly human the voice is. She’s even better at reading the poems and conveying emotion than some of the poets themselves, according to several audios I have come across on various websites.

I was introduced to Jenny through a literary magazine I had just downloaded. Since becoming disabled with an unidentified musculoskeletal condition about five years ago, I’ve been reading books and articles through a text-to-speech program, as reading physical books causes discomfort. But I can write on my laptop and read brief articles online in short intervals. So when I’m up and lumbering stiffly about the apartment, to distract myself from the pain I listen to audiobooks on my cell phone, or use a voice synthesizer to read articles from my laptop.

Over the last few years, I’ve tried on a few voices. They all have their unique intonation, cadence, and even quirks. Microsoft Zira has an endearing way of pronouncing the word spiderling, spelling out the first four letters: S-P-I-D-erling. My iPhone (Siri male) pronounces it spiderling, with the same short ‘i’ used for both i’s in the word instead of the long ‘i’ as in spy-derling. Now and then, Microsoft Jenny spells out each letter in the titles in caps, which can be irritating when they run long. At times, their habits become unpleasant, as when my iPhone reader replaced all the em dashes—which are long dashes—with “to the power of.” And one particular book I was reading had a lot of them, forcing me to abandon the book altogether. Once, this jarring phrase snuck into the middle of a sentence with no visible punctuation.

Sometimes, I switch from Zira to David when I feel that a male voice is better suited to the task, especially if the narrator of the book is male. David has a deep, mature voice and is my go-to male. Mark is more youthful, while Guy’s insistent reading of the text, as if trying to make a point, stresses me out greatly. One female voice on my iPhone reads with such a wide inflection, with sweeping highs and lows, that I get dizzy with vertigo. Zira has a deep, resonant voice that works well with my cheap Bluetooth earphones, which tend to make female voices sound higher in pitch and more nasal.

In addition to male and female, I can choose from several versions for English—Australian, American, UK, among others. Mine always defaults to Australian when I use a new setup because that’s where I live now. But having spent most of my life in the US after moving there from Korea as a child, I always choose American.

While exploring other English voices besides Jenny for the magazine, I decide to try on other voices: Deutsch, français, español, and italiano. It’s remarkable how the reading of the same text can sound so different. I try on Korean next. It’s uncanny. I could totally imagine my mother saying those words. While she worked at the dry cleaners, she would pronounce “starch” as stah-chi. One customer told her that he found it endearing. Australians have a knack for taking a noun, truncating it, and slapping a “y” sound at the end, and I find that endearing as well. Mosquitoes: mozzies. Sunglasses: sunnies. Breakfast: brekkie. Politicians: pollies.

Using text-to-speech isn’t as smooth an affair as I make it out to be. There’s a wide range of experiences across the text mediums that I encounter. On my laptop, I use an extension on the Chrome web browser called Read Aloud: A Text to Speech Voice Reader; on my prehistoric iPhone, I activate the voice generator by dragging two fingers from top down. The clutter of junk that the synthesizer has to wade through to get to the actual content varies depending on the layout of the web page: heading, menu items, social media links, captions for photos, taglines, and ads for other articles, not all of which are visible on the screen.

The ability to navigate around the web page on my laptop changes with the specific setup and text source, including Google Docs documents. With some combinations, I can select parts of the text to read or, while it’s reading, skip around by clicking elsewhere in the article or within the reader interface. I can also use the arrow keys on the interface to skip forward or backward, though it took me a long time to figure these things out. And how a voice handles formatting, such as line breaks and titles, differs across various setups. But unlike with audiobooks, I can’t pause or skip while I’m away from the laptop, which takes away from the experience.

Sometimes when I’m going over something I’ve written in my head to hear how it sounds, I’ll imagine it in Zira’s voice. It’s pretty difficult for me to envision it in my own voice. We’re always so surprised at hearing our voice in a recording, and some of us even become downright hostile, like me.

My own voice has changed a little in the last few years as a result of my condition. Before, I had the capacity to speak in the lower register. Now, the sound that is emitted when I’m talking with my husband is what leaks out with minimum effort, higher-pitched and shallow-breathed. When I shout and talk in anger, flushed with emotion and straining my muscles in the process, I’m truer to my original self, though the meaning of that becomes more outdated with science. What is our sense of self, and how does this change with technology?

When Stephen Hawking heard his new voice after his voice synthesizer was upgraded in 1988, according to Wired magazine, he asked for his old voice back. His original voice had been developed by Dennis Klatt, an MIT engineer and a pioneer of text-to-speech synthesis. Based on Klatt’s own voice, it was called “Perfect Paul.” So Stephen Hawking, an English theoretical physicist and cosmologist, had come to identify with a computerized American voice.

At first, when the brilliant scientist lost the ability to speak after contracting pneumonia, he used a hand clicker to operate a device to communicate. But after ALS (amyotrophic lateral sclerosis) took away the use of his hand muscles as well, he used his cheek muscle to type. Stephen Hawking’s famous voice is copyrighted now.

***

Upon listening to the last pages of the magazine, I no longer hear from Jenny. The voice must have been embedded into my PDF document, like a font. It’s just as well. As I was nearing the end, the sparkling newness began to lose its shine. In fact, her bright and cheery tone was inappropriate for the more somber pieces, so I had to switch to another voice. Not only that, the familiarity with which she spoke started to bother me. It bordered on impertinence.

Later in the day, I returned to my faithful Zira and her comforting voice. 

Then, while doing some research, I discovered that Jenny is not embedded in the downloaded PDF file as I had thought, but is actually part of Microsoft’s web browser, Edge. I had mistakenly believed that clicking the file opened a PDF reader, but instead it started the Windows web browser. That means that I can go back to Jenny to read articles and texts whenever I wish.

Yet, I’m happy to return to Zira, and her slightly artificial ways. Maybe I have an affinity for things imperfect, like me, things that hint at the original self peeking through, not in my slight foreign accent or my disability, but something else. Something non-human, almost feral. Growing up in a dysfunctional family and stumbling through my adolescence as a Korean immigrant in Brooklyn has screwed up the social wiring in my head and made me weird. Zira is a work-in-progress toward something resembling a “normal” human, as I am. But technology is changing who we are physically, emotionally, and neuronally.

Thanks to science, we’re able to enjoy a better life, from heart and cochlear implants to plastic surgery. For me, despite the obvious limitations of using an artificial voice to connect to the outside world, there are advantages. With Zira, once you accept her idiosyncrasies and shortcomings, she’s not unpredictable. People with abnormal levels of anxiety love predictability, and I’m definitely one of those. I don’t analyze the things she says in a neurotically obsessive manner, or wonder many years later if something she uttered was actually meant as an insult. She doesn’t judge me or say hurtful things or make racist remarks. Before I became confined to my apartment, the less I interacted with people in general, the less unpleasant life was for me—self-checkouts at stores and the library, and online shopping. We have come very far in reducing direct interactions with our species and engaging with the universe minimally, with addictive video games, bingeable streaming services, food deliveries, and social communities online.

As we get more narrowed in our interactions with the environment, our brain changes accordingly.

Researchers studying London cab drivers found that the posterior region of the hippocampus, a part of the brain associated with spatial navigation, was larger in these navigational wizards than in others. And after the drivers had been inactive for a while, their hippocampi shrank back to the normal size.

For sure, listening to a book is very different from reading it. I’m not able to linger over the text and let my imagination wander leisurely, or reread a delightful passage. Grazing over the pages of a book or a magazine with my eyes and processing the optical signals of abstract symbols into meaning work the brain differently than interpreting auditory stimuli. Especially with poems, with their artful use of words, punctuation, and space, much can be lost when they are only heard. Our experience of the written language changes as it goes digital, whether as Kindle books or online reads. Technology even shapes how art is created.

I rely heavily on word processing and Photoshop to create words and images, as well as other software for animation and video editing. Even with this essay I write, I’ll have Zira read it back to me for proofing, since talking is unpleasant for me. So Zira, a voice synthesizer, is reading me a story that I wrote, about a voice synthesizer.

There is already an artificial intelligence that composes music, called EMI, or Experiments in Musical Intelligence. According to The New York Times, in a performance test where discerning music lovers had to listen to three pieces in the style of Johann Sebastian Bach—composed by a human, a computer, and authentic Bach—they believed the piece created by the computer was more Bach than the one composed by Bach himself. Invented by David Cope, a composer and scientist, EMI, pronounced Emmy, scans works by famous composers and extracts their essence, then cranks out a new one in their likeness.

A deep-learning algorithm called GPT-3 was already being used to write articles and even poetry. And the AI even wrote an academic paper about whether it can write an academic paper about itself. To get a paper published in a peer-reviewed journal, an author has to give consent for publication, which it did. Recently, GPT-4 was released and promises to be even more powerful. Maybe one day, our favorite books will be read not by talented voiceover actors, but by high-level voice synthesizers, with many parameters that could be easily modified for each project to deliver a more personal approach.

“When we can communicate with a computer voice by moving our cheek muscles while paralyzed, or even paralyze our face muscles for beauty, what does it mean to be human?” Zira reads back to me. We’re all sitting here with our GPS-dependent brain and our lethargic hippocampus, pondering. It’s supremely ironic that we invent and shape technology, and it shapes us, in a feedback loop, like the gods we create. 

About the Author

Elia Anie Kim is the author of two dark-humor cartoon books, Evil Penguins: When Cute Penguins Go Bad and Evil Cats: When Fluffy Cats Get Mean. For the past five years, she has been confined to her house due to a musculoskeletal condition. Currently, she is writing a book about a female orb-weaving spider and has recently finished a hybrid memoir about the birds that visit her bird bath. She is also a nonfiction editor for The Hopper.

About the Artist

Belle Dorcas is a collage artist from Michigan, USA. Her world reflects a sort of fancy, relaxed surrealism. Her work has been used for album covers, film posters, and most recently, the covers for Juste Milieu Zine and Quibble Lit Review.

Peatsmoke