
Some Thoughts Regarding the Creation of Text-to-Speech Audiobooks Using Consumer-Grade Software, or One Possible Outcome for Academic Professionalism as a Result of Fully Digitized Text

Codex Books, Audiobooks, and Assistive Devices: A Tale of Near Misses

The last book I checked out of Alden Library while still on the faculty at Ohio University was Clay Shirky’s 2008 Here Comes Everybody: The Power of Organizing without Organizations. I had planned on scanning the book and turning it into an OCRed PDF file, but I ran out of time and had to tear down and pack my digitizing rig, a large-format flatbed scanner and the dual G5 it was connected to. When Pam and I made our way back to California from Ohio in July, I was without a workstation-class computer for over a month, using niblet and numbskull (respectively my smart phone and trusty MacBook Air) as a combined mobile computing solution. Good enough.

Two days ago, I was listening to an audiobook version of Clay Shirky’s book stored on eggplant, a black 160GB classic iPod.

I’ve never really been that into audiobooks. First, I was a scholar of paper print texts, primarily texts classifiable as postmodern American literature and post-psychoanalytic philosophy. I did lots of non-print media, such as film and digital print, but my primary training and identity was with codex books, the old-skool kind. In fact, until Pam and I drove from San Francisco, California, to Athens, Ohio, at the end of 2008 and early 2009, I had never listened to an audiobook. The first audiobook I ever listened to was Cormac McCarthy’s The Road narrated by Tom Stechschulte. The performance and story were gripping, and we hooted over our alternating renditions of Stechschulte’s version of “the man” and “the boy.” Still not hooked.

On Monday, 14 September, I walked to the main branch of the SFPL and applied for and received a library card. Shirky’s book, which was on the shelf in the main branch the Friday before (I think), was out. The nearest available copy was at the Golden Gate Valley Branch, a 1.8 mile walk.

On the morning of Tuesday, 15 September, I discovered there exist Federal regulations which do not allow insurers to pay for assistive devices if those devices have functionality beyond what is required for assistive services. Besides the regrettable and unsurprising waste forced by such regulations, I was on fire to find out more about the $150 text-to-speech software Ms. Lynn used on her iPhone to convert text to speech.

My search didn’t turn up the iPhone software, but I did see a number of text-to-speech programs for the desktop, most of which were clearly garbage. I remembered a piece of software that had caught my interest several months back, a program called Textcast. I had tried the software out, but considered it to be too narrow. I couldn’t imagine selecting text snippets from various web pages, sending those snippets to Textcast, and synchronizing them with iTunes and then my iPod. Besides the fact that the process involved four distinct steps from selection to iPod, many web-accessible articles span multiple pages. Getting those articles into a single podcast on an iPod would be too much work for too little payoff. I filed Textcast as interesting-idea-but-not-worth-the-effort.

The Pitfalls of Professional Narcissism

So two days ago on my walk to the main branch of the SFPL, I was listening to a self-generated audiobook version of Shirky’s Here Comes Everybody: The Power of Organizing without Organizations stored on eggplant. Shirky is a canny analyst of new media, one who understands the effect new media have on incumbents, especially professionals.

In “Everyone is a Media Outlet,” Shirky notes that newspaper publishers didn’t see websites such as Craigslist and eBay as threats to their business, even though anyone with eyes in 1998 could see that both sites (and others) were eating the newspapers’ lunch.

Shirky notes, “There was a kind of narcissistic bias in the [newspaper] profession; the only threats they tended to take seriously were from other professional media outlets, whether newspapers, TV, or radio stations” (56). Incumbents were blind to the threat posed by the Internet to their business, even though that threat had a direct effect on their bottom line. The threat posed by the Internet was not just that web audiences want their news for free, though that would have an effect on subscriber revenue. The existence of a site like eBay combined with a moratorium on the taxation of Internet commerce that crossed state lines meant that eBay, Amazon, Apple, and Craigslist could give the traditional customer base of classified ads more for their money (more reach, more buyers, more flexibility), in addition to the simple fact of more money (no taxes).

So it goes.

Shirky also traces the effect of the printing press on scribal culture, identifying the instructive hypocrisy of Johannes Trithemius’s decision to publish De Laude Scriptorum, which praises the kind of lives that scribes lead, by means of movable type and a printing press rather than by employing a group of scribes. The Abbot of Sponheim (Trithemius) provided the perfect example of why scribal culture was lost in the age of the printing press. Faced with the urgent need to save scribal culture, the Abbot of Sponheim could think of nothing better than to accelerate the obsolescence of scribal culture with a eulogy on the virtues of scribal life. Today we would call this scribal FAIL.

Shirky put it this way.

Scribes existed to increase the spread of the written word, but when a better, nonscribal way of accomplishing the same task came along, the Abbot of Sponheim stepped in to argue that preserving the scribes’ way of life was more important than fulfilling their mission by nonscribal means.

Here Comes Everybody: The Power of Organizing without Organizations (69)

To my eyes, this closely resembles arguments about the value of paper print books to the teaching of literature in a university setting.

Using paper print books, sitting with them in hand, cradling them in the bathtub, dragging them through airports, demanding students go to the library and handle “real” books: all these seem the restrictions of habit rather than the realization that the artifacts which will most forcefully and, henceforth, most lastingly shape our culture do not come between the covers of a paper print book.

I’m not arguing that literature is no longer as culturally relevant in the United States as it was in the 1950s, though I do believe there is a strong argument to be made along those lines. Rather, I am arguing that the lack of interest in new media on the part of established mainstream scholars of literature is the sure sign that literary studies is in a death spiral. Unquestionably, paper print culture is irreplaceable and worth preserving as what historically has been the primary medium for the transmission of literature and literary culture. However, this position should not be confused with the nostalgic assertion that you can’t curl up in bed with a computer screen, because that line of thinking misses the fact that what was most important about literary and cultural studies was that for half a century literature was the most established of the arts fostering self-identity. This is why literary studies was so important to identity politics in the late twentieth century, and it is why film, television, web sites, video games, and text messaging will be the crucial objects of study inside of a decade.

Continued Cultural Relevance? There’s an App for That.

So two days ago I read this article about text-to-speech software, and what had been a long-past interest in some consumer-grade software that can batch produce digital audio from digital text was reawakened. I scoured my RSS feed reader, my blog, and the Internet for that same software.

And though I have not been a listener of audiobooks, two days ago I did find myself listening to a personally-generated copy of Shirky’s Here Comes Everybody while walking to the library to return a paper print book.

Despite the errors and infelicities of the machine-generated audiobook, its undeniably robotic cadence and mispronunciations, I found myself absolutely absorbed by the digital audiobook while I walked to and from the library. In addition to listening to the audiobook stored on eggplant, I had a PDF version of the book on niblet, and whenever “Alex” read something fascinating, I opened the PDF and searched for the passage. For the second half of my walk, I stepped to the side of the sidewalk so I could stop and take more detailed notes using niblet’s Notes app.

Once I downloaded Textcast, I built a service to facilitate the process of sending digital text to Textcast for batch processing. That service is Text to Textcast and in order to use it you will need

The series of three brief tutorials overlap only slightly, and it might be worth your while to make your way through them all. These tutorials are meant for informational purposes only. I do not advocate violating copyright and encourage everyone to stay within the bounds of Fair Use concerning the transformation of copyrighted material for one’s personal use.

The audiobook I created was sourced from a PDF that itself was sourced from a paper print book.1 The PDF source contains numerous errors, but the text-to-speech framework is surprisingly robust even when given error-riddled text. Which is to say that, between the inaccuracies of the text-to-speech software framework and the flexibility of the algorithms which turn digital text into phonemes, the comprehensibility of the resulting audio files is good enough.
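
For readers without Textcast, the text-to-audio step can be roughly approximated with the say command-line tool that ships with OS X. What follows is a minimal sketch under stated assumptions, not a description of how Textcast works internally: the function names, the one-text-file-per-chapter layout, and the .aiff output names are all hypothetical, and the “Alex” voice is assumed to be installed.

```python
import subprocess
from pathlib import Path


def build_say_command(text_file, audio_file, voice="Alex"):
    """Return the argument list for OS X's `say` tool.

    -v selects the voice, -f reads the text from a file, and
    -o writes the speech to an audio file instead of the speakers.
    """
    return ["say", "-v", voice, "-f", str(text_file), "-o", str(audio_file)]


def speak_chapters(text_dir, audio_dir, voice="Alex"):
    """Turn every .txt file in text_dir into an .aiff file in audio_dir.

    Hypothetical helper: assumes one plain-text file per chapter and
    only works on a machine where `say` is available (OS X).
    """
    audio_dir = Path(audio_dir)
    audio_dir.mkdir(parents=True, exist_ok=True)
    for text_file in sorted(Path(text_dir).glob("*.txt")):
        audio_file = audio_dir / (text_file.stem + ".aiff")
        subprocess.run(build_say_command(text_file, audio_file, voice), check=True)
```

Calling speak_chapters("chapters", "audio") on a Mac would process the files in name order, one audio file per chapter. Getting the results into iTunes and onto an iPod is a separate step this sketch does not attempt.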

Compare the following sample text sent to Textcast

It's tempting to regard the bloggers writing about 'frent Lott or the people taking pictures ofthe Indian Ocean tsunami as a new crop of journalists. Tbe label has an obvious concep- tual appeal. Tbe problem, however, is that mass professional- ization is an oxymoron, since a professional class implies a specialized function. minimum tests for competence, and a minority ofmembers. None ofthose conditions exist with po? litical weblogs, photo sharing, or a host ofother self-publishing tools. Tbe individual weblogs are not merely alternate sites of publishing; they are alternatives to publishing itself, in the sense of publishers as a minority and professional class. In the same way you do not have to be a professional driver to drive, you no longer have to be a professional publisher to publish. Mass amateurization is a result ofthe radical spread of expressive capabilities, and the most obvious precedent is the one that gave birth to the modern world: the spread of the printing press five centuries ago. (Shirky 66)

with the audio file Textcast generated:

The digital text from which the audio file is generated has a number of errors. Words run together, words are misspelled, sentences are mispunctuated, and hyphens are retained. Still, the audio generated by OS X’s text-to-speech framework is for the most part comprehensible.
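
Some of these errors could be reduced before the text ever reaches the speech framework. The sketch below is not part of my workflow or of Textcast; it is a hedged illustration, with a hypothetical function name, of one cheap cleanup pass: rejoining words that OCR left split by an end-of-line hyphen and collapsing stray whitespace. It is a heuristic only. It would wrongly fuse a deliberately spaced hyphen, and it does nothing for run-together words such as “ofthe” or misread letters such as “Tbe.”

```python
import re


def clean_ocr_text(text):
    """Apply two simple repairs to OCRed text before text-to-speech.

    1. Rejoin words split by a hyphen followed by whitespace,
       e.g. "concep- tual" becomes "conceptual".
    2. Collapse runs of whitespace (including line breaks) into one space.
    """
    # A hyphen between two lowercase letters, followed by whitespace,
    # is treated as a leftover line-break hyphen and removed.
    text = re.sub(r"(?<=[a-z])-\s+(?=[a-z])", "", text)
    # Normalize all remaining whitespace to single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


sample = "Tbe label has an obvious concep- tual appeal."
print(clean_ocr_text(sample))
```

The misspelled “Tbe” passes through untouched, which is fine for my purposes: as the sample shows, the speech framework copes with a fair amount of noise on its own.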

Here is how that audio file was generated.

Duration: 1m 21s

This next video explains how to create the “Text to Textcast” service.

Duration: 2m 58s

This last video describes how to produce an audiobook and transfer it to an iPod.

Duration: 1m 49s

My claim that paper print books will lose their place to digital print books depends upon digital print providing affordances paper print books cannot. Some of these affordances are machine readability (machine searchable), portability (stored as electronic data), copyability (good for users), and transducibility (transformation from one form to another as illustrated above). For now and the next little bit of the future, digital print versions of literature are not gaining traction because they are digitally-encumbered, relatively expensive, and difficult to transfer. These drawbacks are grit in the machinery that would convert audiences from paper to digital print. Furthermore, the incumbent media continues to prevail to such a degree that even Google, the world leader in the production and management of digital print assets, understands that allowing for the reconversion of digital print assets into paper will prove to be a profitable market.

But things change quickly in this early part of the digital information age and I do believe that within a few years one of the most fundamental tools of scholarship—those venerable objects referred to as “books”—will be superseded by something better.


Notes
1 I used an Epson Expression 10000XL to scan the pages in. I then turned the TIFF images into PDFs using the version of Adobe Photoshop that comes bundled with Adobe CS4 Premium. I batch OCRed the PDFs using Adobe Acrobat Pro (also bundled with CS4) and concatenated the PDFs using the open source pdftk.
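
The concatenation step in note 1 can be sketched as a small script. This is an illustration rather than the exact command I ran: the function name and the file names are hypothetical, and the only thing taken from pdftk itself is its cat operation, which joins the input files in the order given and writes the result to the named output file.

```python
import subprocess


def build_pdftk_cat_command(page_pdfs, output_pdf):
    """Return the pdftk argument list that joins page_pdfs, in order, into output_pdf."""
    if not page_pdfs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *[str(p) for p in page_pdfs], "cat", "output", str(output_pdf)]


def concatenate_pdfs(page_pdfs, output_pdf):
    """Run pdftk to do the concatenation (requires pdftk to be installed)."""
    subprocess.run(build_pdftk_cat_command(page_pdfs, output_pdf), check=True)
```

For example, concatenate_pdfs(["page001.pdf", "page002.pdf"], "book.pdf") would run pdftk page001.pdf page002.pdf cat output book.pdf.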
Bibliography

Shirky, Clay. Here Comes Everybody: The Power of Organizing without Organizations. New York: Penguin Press, 2008.

Trithemius, Johannes. De Laude Scriptorum. Mainz: Petrus Friedberg, 1494.