Do Open-Source Books Work?
by Ben Crowell
How will the internet change book publishing? This article examines a new crop of math and science textbooks that are available for free over the internet, and discusses what they have to tell us about whether the open-source software model can be translated into book publishing.
This article is copyright 2000 by Benjamin Crowell, and is open-content licensed under the OPL license.

This article was discussed on the Slashdot forum 26 Sep 2000. Since then, I've started a web site called The Assayer for user-contributed book reviews, with an emphasis on free books. The numbers of free books and of open-source books have both grown, and The Assayer is a good place to find them. If you're interested in old public-domain books that are free on the web, check out The Assayer's links page. I wrote followup articles in 2002 and 2005.

Ben Franklin[1] figured out that information wants to be free, so in 1731 he invented the lending library. It was no Napster: this eighteenth-century information superhighway was meant for such serious purposes as education and fomenting revolution. Franklin wrote, "These libraries have improved the general conversation of the Americans, made the common tradesmen and farmers as intelligent as most gentlemen from other countries, and perhaps have contributed in some degree to the stand so generally made throughout the colonies in defense of their privileges." Words mattered. In the golden age of ink and wood pulp, Uncle Tom's Cabin and Zola's J'Accuse letter[2] were data that packed a punch.
We take the information revolution seriously, but how serious are we about serious information? Can we really free our minds if we power on, dial up, and log in? You wouldn't think so based on any changes in the U.S. education system. A young relative of mine brought home his grade-school science textbook, and one of its main modules was a detailed discussion of dinosaurs, yet it never mentioned evolution. Bad textbooks are the rule, not the exception. A recent critical survey of American history textbooks[3] is dedicated "to all American history teachers who teach against their textbooks," but the author might well have included the rest of the curriculum. Poor textbooks were probably already inspiring complaints back when they were scratched on clay tablets with a pointed stick, but I'll argue below that books are actually getting worse, and that both the problem and the possible solution have to do with technology and economics.
The Problem is Economics
Many e-businesses have found out that technology can make you broke as easily as it can make you rich; in publishing, it seems that technology has driven the profit out of textbooks. Color printing has been getting cheaper, and full color, though still fantastically expensive to set up for production, is now considered mandatory for high school and introductory college textbooks. At the same time, desktop publishing software and the increasing digitization of printing have made it possible to prepare new editions more rapidly. The confluence of these technologies has created a vicious circle. Rising production costs drive up bookstore prices, which makes more students buy used books, which reduces sales. To kill off the used book market, publishers bring out a new edition every few years, with just enough changes to make it impractical to use it side by side in the same classroom with the previous edition. To compensate for the added cost of tooling up for so many new editions, publishers raise their prices, which starts the whole cycle over again. After decades of merely keeping pace with inflation, textbook prices have recently headed through the roof.[5]
In this climate of vanishingly thin margins, the most successful textbook is little more than a loss leader, and one with more modest sales is a disaster. Every book has to be a home run. For fear of losing sales in socially conservative school districts, K-12 biology books often soft-pedal evolution, misrepresent it, or omit it entirely.[6] History books avoid controversy by propagating the myth that John Brown was insane, or by failing to mention that the Vietnam war began as a war of independence in a French colony.[3] The home-run syndrome's most consistent effect is to inflate the list of topics, so that no book will be rejected by anyone for leaving out a specific item. In my field, physics, it is commonly observed that each edition is worse than the previous one, as the pressure for more topics squeezes out the room for honest explanations, resulting in a cookbook of formulas.
Free Books, But No Open-Source Books
If bad books result from higher prices, free books would seem to be the solution. Textbooks, besides their intrinsic importance as gateways to industrial-strength information, are a good test bed for evaluating innovations in how books are written and distributed. The authors of math and science textbooks in particular are unlikely to be intimidated by technology, yet their goals and methods are more representative of the practical approach of authors in general than are those of the authors of computer manuals and computer science textbooks,[4] who may be willing to put up with a great deal of pain to be on the bleeding edge of information technology. When I set out to write my own free physics textbook, I found that it was quite hard to get any information on how a free book could be done in practice, and this article is the result of my attempt at a (completely unscientific) survey of how free textbooks have actually been done. Quite a few free math and science textbooks are on the web now,[7],[8],[9],[10],[11] but interestingly, none of them seems to have followed the successful, highly publicized, and legally solid open-source software approach.[12] In fact, the most highly publicized digital textbooks are based on a model that is to open source as antimatter is to matter: a dental school[13] has required its students to buy all their books on a single DVD, which expires and stops working if the students don't pay a hefty annual fee!
Does the neglect of the open-source book concept outside the computer arena mean that there is something intrinsically wrong with the idea of an open-source book? Or does the rest of the world just not "get it" yet? As we'll see, the reality is more complicated than either extreme point of view.
Among the free books I've studied, the one that comes closest to the collaborative spirit of the open-source movement is the Biophysics Textbook On-Line (BTOL),[9] in which each chapter has been written by a different author. The most important reason why the open-source software movement emphasizes collaboration-building is that the projects it tackles are often simply too big for the lone-wolf approach. Likewise, the BTOL was written because it had become apparent that the field was getting so large that the previously standard text was never going to be updated. When I wrote to one of the authors, Lou DeFelice, to ask how the BTOL folks had been so successful in their community-building, he replied, "The BTOL is tied to a Society that already has an established community, regular meetings, newsletters, etc. We tap into all of this structure. For example, when a new article is posted we announce it in the Biophysical Society Newsletter. I would think that other fields might benefit from endorsement by an established society that already serves the field."
The most surprising result of my survey, however, was that there were no books that were really open source in the sense in which the term is used in the open-source movement. The BTOL is collaborative but closed-source. Some authors have made their source code available, [8],[11] but none of the source-available books are collaborations, and they do not have licensing agreements of the type developed to make sure free software stays free.
Do We Need Open Source?
Maybe that sounds like a criticism, but I don't intend it that way. My own book, although free, isn't even source-available, much less open-source. (This is mainly because of certain technical and economic issues discussed below.) But the open-source software model is designed to solve some real problems. For example, open-source licenses and culture are designed to prevent the problems that can arise when different people's software has to be put together in one package, e.g. to make sure that Linux can't be stopped dead in its tracks because some critical part of it turns out to be patented. The BTOL, on the other hand, might be difficult to publish as a single, bound book, because the individual authors own the copyrights on their own chapters, and there is no licensing agreement. An important insight of the inventors of open source was that copyrighted information with a carefully designed licensing agreement (a "copyleft") is in some sense more free than either conventionally copyrighted information or uncopyrighted information.[12]
Do authors even want other people to be able to modify what they wrote? Although software and books are not perfectly analogous, I feel that this particular concern about applying open-source methods to books is based on a misunderstanding of what open source is. While open-source software licenses do guarantee anyone the right to modify the program, they do not guarantee that those modifications will become standard or widespread. I could, for example, fiddle around with the delicate inner workings of my own copy of the Linux kernel, most likely breaking it due to my deficient programming skills. But I simply would not be allowed to tinker with the version everyone else depends on until I had proven my transcendent programming talents to a very critical cadre of the world's most fanatical software geeks. Nobody was ever able to force Linus Torvalds to take his Linux project in a direction he didn't want, because he owned the copyrights to its vital parts. The open-source approach allows the project's originator to exert whatever degree of control she/he deems appropriate. If I want to limit other people's contributions to my book severely, so that they can only report errata and provide supplements and add-ons, I can do that (although an approach that strict would probably not inspire very many people to participate). When it comes to sharing the pen, "if" and "how much" are up to the author, but a more interesting question is "how?" What legal and cultural framework will work? Are open-source software methods directly applicable? The BTOL collaboration, for instance, has an original take on this. Writes Victor Bloomfield, "It is important, of course, to maintain the integrity of each author's chapter (closed source). However, the volume editor can choose to include more than one treatment of the same material (semi-open source)."
It's also not hard to imagine creative projects that would be impossible with a closed-source model. In my field, for example, the phenomenon of textbook bloat is particularly out of control when it comes to the number of homework problems at the end of each chapter. One of the main things that deterred me from shopping my book around to the traditional-style publishers was knowing that I would be expected to crank out roughly a thousand homework problems in addition to the few hundred I'd already written. Writing homework problems is an activity that can be done in parallel by many people, and a stockpile of problems on the web would be a valuable resource for every teacher in the field. In fact, quite a few physics teachers already have their own individual collections on the web. A more general collection would also fit well with the collaborative approach used in open-source software, since there is no need to maintain a consistent authorial voice, and the bug-finding philosophy of the open-source software movement is applicable: homework problems can have bugs, people can usually agree on what constitutes a bug, bugs are hard to find, and bug-finding can be done in parallel by many people. (Incidentally, when publishers kill off the used book market by bringing out gratuitous new editions, one of their standard techniques for creating incompatibility between editions is to fiddle with the homework problems. Having a public collection on the web might help to eliminate this particular dysfunctional behavior.)
Another possible application of the open-source paradigm to textbooks would be the creation of sets of notes on applications. In physics, for example, ideas about torque and angular momentum can be applied to martial arts and gymnastics, but I simply don't have the expertise to write anything interesting on these topics. The availability of such a set of resources online would help to reduce textbook bloat, and would also allow students to read about applications that truly interest them. Likewise, scientists who lament the sparseness of applications in math textbooks could be invited to contribute applications themselves.
Do Technical Problems Prevent Open-Source Books?
Unfortunately, going open-source isn't as simple as just adopting an open-source license. As I toyed with the idea of open-sourcing my own book, and then began to study how other people were doing things, it became clear that there were some serious technical hurdles. Imagine that Linus Torvalds was trying to get the Linux collaboration off the ground, but none of the prospective partners used the same computer language. This is pretty much the situation with desktop publishing software. QuarkXPress and PageMaker are the most popular packages for laying out books, at least among professionals, but they are very expensive and not fully interoperable. Quite a few physicists and mathematicians know LaTeX, but it's far from being a universal standard, and it does not allow the kind of control that is necessary for a book with a complex layout and lots of illustrations. (To be fair, many LaTeX users would consider this a feature, not a bug, since it results from the philosophy of separating form from content.) The true lingua francas are word-processor formats. Victor Bloomfield of the BTOL project writes, "Authors typically send word-processing (most commonly Word, but others as well) and graphics files. It is indeed a hassle..." The sheer amount of work involved in getting a book ready for open-sourcing has also deterred authors like Jim Hefferon and me.
A more subtle problem is that, except for LaTeX, none of these formats lends itself to communal editing. The open-source software community uses a program called CVS (Concurrent Versions System) to allow people within a trusted community to change and edit the files from a large software project, and to resolve conflicts that occur when two people are simultaneously working on the same file. CVS can be used for any kind of plain-text, human-editable files, not just computer programs, but it can't be used with files from any of the popular word processors or desktop publishing programs, since they're all in binary formats.
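To make the plain-text point concrete, here is a minimal sketch in Python of the kind of line-by-line comparison that version-control tools rely on. It uses the standard difflib module, not CVS itself, and it is not a description of how CVS works internally. The homework-problem text and the helper names changed_lines and conflicts are hypothetical, invented for this illustration.

```python
import difflib

# Hypothetical plain-text source for one homework problem.
base = [
    "A ball is dropped from a height of 10 m.",
    "Find the time it takes to reach the ground.",
    "Ignore air resistance.",
]

# Contributor A rewords the first line; contributor B rewords the last line.
edit_a = [
    "A ball is dropped from rest from a height of 10 m.",
    "Find the time it takes to reach the ground.",
    "Ignore air resistance.",
]
edit_b = [
    "A ball is dropped from a height of 10 m.",
    "Find the time it takes to reach the ground.",
    "Ignore air resistance and take g = 9.8 m/s^2.",
]


def changed_lines(old, new):
    """Return the indices of lines in `old` that `new` replaces or deletes.

    Pure insertions do not touch any existing line, so they are not
    counted in this simplified sketch.
    """
    matcher = difflib.SequenceMatcher(a=old, b=new)
    changed = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            changed.update(range(i1, i2))
    return changed


def conflicts(old, new_a, new_b):
    """Return the indices of lines in `old` that both edits modify."""
    return changed_lines(old, new_a) & changed_lines(old, new_b)


if __name__ == "__main__":
    # Show contributor A's change in the familiar unified-diff form.
    for line in difflib.unified_diff(
        base, edit_a, fromfile="base", tofile="edit_a", lineterm=""
    ):
        print(line)
    print("conflicting lines:", sorted(conflicts(base, edit_a, edit_b)))
```

Because the two contributors touched different lines, conflicts(base, edit_a, edit_b) comes back empty, and the two edits could be combined without human intervention. A binary word-processor file has no stable notion of a line, so a comparison like this has nothing meaningful to work with.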
No Paper, No Problem?
Nearly all the books I surveyed are distributed purely digitally. Author Warren Siegel[8] says, "...I'm trying to discourage printing as much as possible... I see a lot of printing/publishing as more habit than convenience, with dead trees rotting in people's offices rather than in the forests." A few authors (e.g. Jim Hefferon[11]) distribute bound, printed books to their own students and encourage instructors at other schools who use the book to do the same, but this may have the effect of discouraging adoption of the book, since professors may not want the hassle.
Students do want printed, bound books, and are willing to pay for them. I now have my own self-publishing business, but I originally distributed my book to students through print-to-order sales at Kinko's. Although Kinko's was expensive, roughly 90% of my students bought the books from Kinko's rather than downloading and printing them, which, after all, results in single-sided, unbound output. (I explained to them that I didn't get any royalties from Kinko's, so there was no personal motivation to buy the books rather than downloading them.) I have never had a student forgo dead-tree format completely and read the entire book from a computer monitor.
For my own book[7] I'm now using free digital distribution side by side with commercial distribution of printed copies by wholesale. The issue here is that printing has high startup costs, and running a business is, frankly, a lot less fun than teaching and writing. The big investment required to self-publish a book is also in conflict with openness; giving up the monopoly on selling printed copies would make it even more scary to try to make back my money.
Big booksellers such as Amazon.com and the bricks-and-mortar chains offer various options that let authors avoid the hassles and risks of setting up their own cottage industries, but their systems are not particularly attractive in my opinion. Amazon, for instance, offers a service in which they handle the retail ordering side of things while the author simply sends them wholesale shipments as needed. The problem is money. Amazon says they pay a "royalty" of 45%, which sounds generous, but is misleading. The author is responsible for production, so the 45% "royalty" is really a wholesale price: Amazon keeps the other 55% of the retail price, which, expressed as a percentage of the author's net, is a retail markup of about 122% (55/45). Considering how expensive short-run printing is, it's hard to imagine bringing a textbook to market at a reasonable price via this service. Other services handle both production and marketing, but are not able to do illustrated books.
What Next?
The solution to the difficulties of paper distribution is probably to limp along with the variety of approaches we've already been using, and wait for printing technology to solve the problem. The increasing digitization of the printing process and the emergence of efficient print-to-order systems are gradually making short-run print distribution cheaper and easier.
I don't see any general solution on the horizon to the technical problems involved in true open-source books. However, some of the interesting projects that require an open-source approach might be doable in HTML, which is plain text and can therefore be used with CVS. Although HTML is not printer-friendly enough to be suitable for a complete book, it might be fine for some of the more limited, modular applications such as homework sets and application notes.

History of Revisions