Publishing with XML: Structure, enter, publish
Ebook, 454 pages, 2 hours

About this ebook

XML is now at the heart of book publishing techniques: it provides the industry with a robust, flexible format which is relatively easy to manipulate. Above all, it preserves the future: the XML text becomes a genuine tactical asset enabling publishers to respond quickly to market demands. When new publishing media appear, it will be possible to very quickly make your editorial content available at a lower cost. On the downside, XML can become a bottomless pit for publishers attracted by its possibilities. There is a strong temptation to switch to audiovisual production and to add video and animation to what we currently call a book, i.e. a written, relatively linear discourse representing a series of ideas. Publishers cannot ignore technology, however. It is better to recognize the threats of innovation and to maintain your business and your convictions by boarding the e-publishing ship. But make sure you carry a life preserver, XML, to ride above the waves of modern times.

ABOUT LIGARAN PUBLISHING

LIGARAN publishes high-quality digital editions of great works of classic literature, as well as rare books in partnership with the BNF. Great care goes into these ebook editions to avoid the errors too often found in digital versions of these texts.

LIGARAN offers great classics in the following areas:

• Rare books
• Libertine literature
• History books
• Poetry
• First World War
• Children's books
• Crime fiction
Language: English
Publisher: Ligaran
Release date: Jun 19, 2015
ISBN: 9782335086522

    Book preview

    Publishing with XML - Ligaran

    Bernard Prost

    Publishing with XML

    Ligaran Publishing

    2015

    EAN : 9782335086522

    Copyright Ligaran 2015

    71100 Chalon-sur-Saône

    FRANCE

    Acknowledgments

    Summarizing the relation between XML and publishing in a short book is a difficult task, and I could never have carried it out on my own. First I wish to thank some key people at Editions Eyrolles (the publisher of the French edition of this book): my editor Stéphanie Poisson and her team, as well as Véronique Dürr who helped her with the proofreading. They have the art of giving meaning to my thoughts which occasionally get overwhelmed by technology.

    I also wish to thank all those who worked with me on XML:

    the shareholders of Ligaran: Alain Pierrot, a remarkable designer of advanced taxonomies, connoisseur of the Open Office suite, XSLT author, and an expert in book scanning; Xavier Maurin, the code and graphic wiz at MyBookForge.com, who has a brilliant view of the consumer digital world; Olivier Desnoux, a software developer with impeccable methodology, author of elegant (and legible!) code, co-designer of the MyBookForge transformation engine; Adrien Vieilleribière, talented researcher, major XSLT artist able to put just about anything online and make XML transformation to any format accessible to all, who also co-designed the MyBookForge transformation engine; Patrick Pierre, a talented engineer and one of the most advanced minds in publication technology—his mastery of IDML (barely discussed in this book) is remarkable; and Hugues Cochard, serial-creator of high-tech companies, currently in Tahiti but very present via the Web.

    all those who trusted me with their professional or scientific projects, notably Mai Nguyen and Lionel Ridoux who know everything about medication and XML.

    two friends met along the way: Christian Brugeron for his clever scripts designed to work around the limitations of just about any page layout software—starting with InDesign; and Benoît Leprince who provided various examples of InDesign layouts used to illustrate this book.

    Thanks to all those in the brand new e-book ecosystem which should take off at an astounding rate worldwide and perhaps in France as well: notably to Houriah Ghebalou (PREMICE, the regional business incubator in Burgundy) who financed the preliminary research for the Ligaran/Mybookforge project; the Burgundy region, which supported the project and its local set-up; and Nicéphore Cité, our home away from home in Chalon-sur-Saône which assists image and audio start-ups.

    Finally, I would like to thank Ray Charles, who understood that the medium influences the message: without the need to flip the 45 RPM record to listen to the other half, the famous break in What’d I Say would not exist!

    Foreword

    Wait a minute, wait a minute, oh hold it! Hold it!

    ---

    Hey (hey) ho (ho) hey (hey) ho (ho) hey (hey) ho (ho) hey

    Ray Charles (What’d I Say)

    If everything is under control, you are going too slow

    Mario Andretti

    The world of publishing is going through a sea change. Paper books are facing competition from an ever-expanding range of virtual devices: the Web (obviously); the compact, powerful, and aptly-named netbooks; and especially mobile phones and other nomadic devices like e-book readers and notepads which make complete professional and literary libraries available to all, urbi et orbi. Now publishers need to deliver content for these media, making use of their specific features while minimizing both costs and production lead times. At first publishers had to make several revisions of the same content for different target media. But today publishers are adopting a more industrial—yet also more standardized and restrictive—approach based on XML.

    The flexible and universal nature of XML has attracted publishers—first and foremost those specializing in legal publications, who are used to working with SGML—as well as programmers, who can use the language to exchange data between a wide range of computer systems.

    DEFINITION e-reader

    A portable device for reading electronic books (e-books). An e-reader is a hardware device using display technology called e-paper, the marketing term for a non-backlit screen requiring minimal energy and reputedly less tiring for the eyes. Along with e-paper, marketers have coined the term e-ink to describe a pixel...

    The Extensible Markup Language (XML), standardized in 1998, has reached maturity. An XML ecosystem has emerged, populated by specialized software (XML editors); on-shore, near-shore, and off-shore service providers specializing in the language; application developers able to use the Document Object Model (DOM) to create innovative electronic products; and industry-specific document models for various types of publications.

    DEFINITION DOM

    The Document Object Model (DOM) is a tree-based programming model for XML or HTML documents. It is independent of any particular taxonomy, and it enables programs to manipulate document components.
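As a minimal sketch of what manipulating document components through the DOM can look like, the snippet below uses `xml.dom.minidom` from the Python standard library. The sample article and its element names (`article`, `title`, `para`) are assumptions made for illustration; they are not taken from the book's DTD.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; element names are illustrative only.
SAMPLE = "<article><title>Ray Charles</title><para>Singer and pianist.</para></article>"

dom = parseString(SAMPLE)

# Walk the tree: the root element and its child elements.
root = dom.documentElement
child_names = [
    node.tagName for node in root.childNodes if node.nodeType == node.ELEMENT_NODE
]

# Read a component: the text inside <title>.
title_text = dom.getElementsByTagName("title")[0].firstChild.data

# Modify a component: append a new paragraph to the article.
new_para = dom.createElement("para")
new_para.appendChild(dom.createTextNode("Added through the DOM."))
root.appendChild(new_para)

para_count = len(dom.getElementsByTagName("para"))
print(child_names, title_text, para_count)
```

The same tree operations (walk, read, create, append) apply whatever tags the document uses, which is one way to read the statement that the DOM does not depend on a taxonomy.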

    Nevertheless, XML usage has not yet stabilized and practices vary among publishers. The purpose of this book is to provide a practical overview of how publishers can use XML, based on concrete, tested methods which, by nature, are limited to specific cases. Publishing with XML is neither a bible nor a dogmatic treatise on the subject, and readers can adapt the examples provided to suit their needs.

    How the book is organized

    This book includes three parts—Structure, Enter, and Publish—covering the entire XML cycle for publishing an e-book. Publishing with XML is mainly intended for publishers, editors/proofreaders, and production managers. But it also addresses managers wishing to understand the underlying techniques, and to comprehend how the medium influences the design and format of digital publications. Authors curious to learn more about XML's possibilities can also discover new ways to design their composition.

    The book frequently refers to a sample encyclopedia article, similar to those found in Wikipedia. The example is based on a structure developed specifically for this publication (article_v1.2.dtd). The example meets simple editorial requirements:

    be able to publish the article in paper format, on the Web, or on a smartphone.

    include interactive publication objects regarding authors, bibliographies, filmographies and discographies. The interactive features must be independent of target databases.

    For simplicity's sake, this book does not contain tables or mathematical formulas (except for a few included as images).

    Structuring with XML

    The first chapter focuses on document modeling and the XML markup method. The following chapter describes the main structures found in a publication, or more generally in a document. Chapter Three shows how to write a DTD, i.e. the simplest way of representing a taxonomy.

    DEFINITION Taxonomy

    A set of tags used for encoding a document in XML. The taxonomy is usually written in a specialized language (such as DTD, XML Schema, or Relax NG).
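To make the idea of a taxonomy as a set of tags concrete, here is a rough Python sketch that lists the tags a document uses and reports any that fall outside an allowed set. The set `ALLOWED_TAGS` and the sample documents are hypothetical. This is only a stand-in: a DTD, XML Schema, or Relax NG grammar also constrains nesting and order, and checking those requires a validating parser, which the Python standard library parsers are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy: the set of tags a document is allowed to use.
ALLOWED_TAGS = {"article", "title", "para", "emph"}


def tags_used(xml_text):
    """Return the set of element names that occur in the document."""
    return {element.tag for element in ET.fromstring(xml_text).iter()}


def unknown_tags(xml_text, allowed=ALLOWED_TAGS):
    """Return the tags in the document that are not part of the taxonomy."""
    return tags_used(xml_text) - set(allowed)


good = "<article><title>XML</title><para>A <emph>markup</emph> language.</para></article>"
bad = "<article><title>XML</title><blink>Not in the taxonomy</blink></article>"

print(sorted(tags_used(good)))
print(unknown_tags(bad))
```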

    Entering XML markup

    Chapter Four concerns the actual entry of XML tags. In most cases, this job is outsourced, but publishers increasingly need to be able to modify a document using an XML editor in-house in order to correct minor errors or to make last-minute changes. This chapter focuses on configuring a commercial XML editor and using it with a specific DTD.

    Chapter Five examines the relation with subcontractors: how to prepare the text to minimize errors when interpreting the structure, and how to create effective instructions.

    Chapter Six discusses a step rarely described in the production process: proofing XML. It shows how to make sure the XML provided by the subcontractor meets the publisher's needs. This chapter also covers the various XML production models used for XML entry either before, during, or after the paper page layout.

    Publishing

    Chapter Seven provides an overview of the techniques for transforming an XML document into a target format, including XML itself (e.g. input for InDesign), XHTML, or any other text format. Although highly technical, there is nothing mysterious about the XSLT transformation language. It is important for those involved in publishing to understand the mechanism in order to appreciate the impact of editorial decisions.
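The mechanism of a transformation can be hinted at without XSLT itself. The sketch below is plain Python, not XSLT: it walks a source tree and emits an XHTML-like tree through a tag mapping. The mapping `TAG_MAP`, the function `to_xhtml`, and the sample article are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source tags to XHTML tags.
TAG_MAP = {"article": "div", "title": "h1", "para": "p", "emph": "em"}


def to_xhtml(source):
    """Recursively copy a source element into its XHTML counterpart."""
    # Tags missing from the mapping fall back to a neutral <span>.
    target = ET.Element(TAG_MAP.get(source.tag, "span"))
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(to_xhtml(child))
    return target


source = ET.fromstring(
    "<article><title>XML</title><para>A <emph>markup</emph> language.</para></article>"
)
xhtml = ET.tostring(to_xhtml(source), encoding="unicode")
print(xhtml)
```

Changing only `TAG_MAP` changes the output format while the source document stays untouched, which is the kind of editorial decision whose impact is worth understanding.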

    Chapter Eight briefly describes publishing on electronic media, but limits the discussion to the Web, e-readers, and the iPhone (currently the most advanced phone-based e-reader).

    Finally, Chapter Nine investigates two approaches to paper-based publishing using an XML document:

    directly transforming XML into PDF using XSL-FO, a page layout language written in XML

    directly importing XML using a DTP tool (such as InDesign)

    This book provides the keys to using XML in the editing process, but presents only the bare essentials of this modern publishing method. Interested readers can find books dedicated to each of these techniques.

    NOTE

    XML terminology is relatively opaque. Many terms include references to SGML, style sheets, etc. but have lost their original meaning and the terms no longer reflect their actual role. You will need to apply them regardless of their usual meaning in English.

    Chapter 1

    Separating content from format

    The crucial challenge for publishers is how to build a methodology for publishing across a wide range of current or future media, with a single markup process performed either before or after publication, and at the lowest possible cost. The first step in this process is to separate content from format, far beyond the techniques of word processor style sheets.

    Modeling a document

    A book, or more generally any document in XML format, requires a sufficiently general model adapted to all likely publishing scenarios. You create an abstract model for a set or a class of documents and then submit them to a common computer process.

    Identifying the three aspects of a document

    Once you become familiar with XML, you will never look at a document the same way. Traditionally, a document is seen as the combination of its content (words juxtaposed without any typographical enrichment) and its form (which partially highlights the author's thoughts). The structure is a third component, providing features which depend specifically on the planned use of the paper and electronic editions.

    The content

    The content is the text, i.e. what you read; it is independent of the format. The version with the least amount of format is an audio recording: each word only has its semantic value and is not supported by any typographical variations, although a few audio variants can give a word more meaning.

    The format

    The format enhances the information. It is based on a highly cultural and linguistic graphical translation providing an implicit manner of interpreting the text.

    In our society, putting a character in bold highlights it, both for titles and within the body of the text. The character font and the position on the page reflect the level of importance: text which is bigger and farther to the left is usually the highest-level title.

    The structure

    Actually I should say structures: there is not just one structure, but an infinite number of structures depending on what you wish to identify for future use.

    For a novel to be published in both paper and electronic editions, you simply identify the chapters, chapter titles, paragraphs, and the text to be highlighted within each paragraph.

    For a journal article to be published on the Web with automatic search functions in Google Scholar or Google Books (or any other bibliographic database), you mark the entries in the bibliography, the authors' names, and the titles of the publications or journals cited.
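The novel case can be sketched as follows. The tag names (`novel`, `chapter`, `title`, `para`, `emph`) and the sample text are hypothetical; the point is only that once chapters, titles, paragraphs, and highlighted text are identified, each can be retrieved for a later use.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for a novel: chapters, titles, paragraphs, highlights.
NOVEL = """
<novel>
  <chapter>
    <title>Departure</title>
    <para>The train left at <emph>dawn</emph>.</para>
  </chapter>
  <chapter>
    <title>Arrival</title>
    <para>Nobody was waiting.</para>
    <para>She walked <emph>alone</emph> into town.</para>
  </chapter>
</novel>
"""

root = ET.fromstring(NOVEL)

# The identified structure can be reused, for example for a table of contents...
toc = [chapter.findtext("title") for chapter in root.findall("chapter")]

# ...or to list every highlighted passage, whatever its final appearance.
highlights = [emph.text for emph in root.iter("emph")]

# Count the paragraphs in each chapter.
para_counts = [len(chapter.findall("para")) for chapter in root.findall("chapter")]
print(toc, highlights, para_counts)
```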

    Figure 1-1 Content, Format, Structure

    The content (on the left) is made of the raw text—what can be read out loud (audio book).

    The format (on the right) provides additional information which is heavily influenced by culture and practices. A title appears larger and in bold. It acts both as a marker and a summary to help readers as they discover the text.

    The structure—shown here via callouts—is an abstract representation (in many cases guided by a pre-existing form) intended for multimedia use, without making any choices in principle regarding the final appearance.
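A small sketch may help separate the three aspects in practice. Starting from a structured source (the tag names and the function `render_plain` are assumptions), the content is what remains when the markup is dropped, and a format is chosen only when the output is produced.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured source; tag names are illustrative.
SOURCE = (
    "<section><title>The content</title>"
    "<para>The content is <emph>the text</emph>.</para></section>"
)
root = ET.fromstring(SOURCE)

# Content: the raw words, with no markup and no typography.
content = " ".join("".join(block.itertext()) for block in root)


def render_plain(section):
    """One possible format: title in capitals, body on the next line."""
    title = section.findtext("title").upper()
    body = "".join(section.find("para").itertext())
    return title + "\n" + body


print(content)
print(render_plain(root))
```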

    Identifying document classes

    There is no such thing as a generic document model able to represent any type of document. If one did exist, it would be so complex that it would be impossible to use. Therefore we try to define document classes that correspond to various ways of organizing information—such as a dictionary—or to natural groups such as the collections of a given publisher.

    The process of defining document classes, called document analysis, involves extracting the structural elements for future use from a set of similar documents. You usually start from a limited number of available and representative publications, and then gradually build a model meeting your multimedia editorial requirements.

    Structured documents

    The most basic structured document is a novel or a dissertation. This model is the simplest, the most widely used, the most intuitive, but also the most complex for there is an infinite number of structural variations to manage (even if it means ignoring or simplifying them for electronic editions).

    DEFINITION Label

    A graphical, textual, or numerical navigational indicator: numbering in a list, chapter numbers, etc.

    A dissertation is often (but not always) divided into parts, which in turn are divided into chapters. Each chapter has an (optional) title, preceded by an (optional) number or label for positioning it in the book's organization. When the composition has neither chapter numbers nor a title, it is difficult to mark the chapters in an electronic edition. There are solutions, of course...

    The most common structural component within a chapter is a paragraph: a semantic unit defined by the author and represented typographically by both an indentation on the first line—making it easy to see even if it appears at the start of a page—and a carriage return at the end. Within a paragraph, the author can highlight certain words or phrases using bold or italic font, for instance.

    Finally, typographical variants related to a paragraph (such as flush right) express various concepts such as a quotation, an excerpt, an epigraph, etc. The number of possible variations is unlimited.
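The optional nature of titles and labels described above can be modeled roughly as follows. The class names, the function `navigation_entry`, and in particular the positional fallback for chapters with neither number nor title are assumptions made for this sketch; the fallback is one conceivable workaround, not the solution the author has in mind.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chapter:
    paragraphs: List[str] = field(default_factory=list)
    title: Optional[str] = None  # optional title
    label: Optional[str] = None  # optional number or label


@dataclass
class Part:
    chapters: List[Chapter] = field(default_factory=list)
    title: Optional[str] = None


def navigation_entry(chapter, position):
    """Build a navigation string; the positional fallback is an assumption."""
    pieces = [piece for piece in (chapter.label, chapter.title) if piece]
    return " ".join(pieces) if pieces else f"[chapter {position}]"


part = Part(
    title="Part One",
    chapters=[
        Chapter(label="1", title="Modeling", paragraphs=["First paragraph."]),
        Chapter(title="Interlude"),
        Chapter(paragraphs=["No number and no title."]),
    ],
)
entries = [navigation_entry(c, i) for i, c in enumerate(part.chapters, start=1)]
print(entries)
```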

    Dictionaries

    Each dictionary has its own structure; hence it is not realistic to speak of THE dictionary class structure. You will find a structure specific to each dictionary, the goal being to publish different paper versions (for example a paperback dictionary) or electronic versions with advanced features (hypertext links, lookup functions, etc.).

    A dictionary is closer to a database structure than to a book structure. It has entries, often sorted alphabetically, organized in semantic units much like the records of a database.
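The database-like nature of a dictionary can be sketched in a few lines. The markup (`dictionary`, `entry`, `headword`, `definition`) is hypothetical, since each dictionary has its own structure; the sketch only shows entries being loaded as records, sorted alphabetically, and looked up.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary markup: each <entry> is a small self-contained unit.
DICTIONARY = """
<dictionary>
  <entry><headword>taxonomy</headword><definition>A set of tags.</definition></entry>
  <entry><headword>label</headword><definition>A navigational indicator.</definition></entry>
  <entry><headword>DOM</headword><definition>A tree-based document model.</definition></entry>
</dictionary>
"""

root = ET.fromstring(DICTIONARY)

# Load entries into a lookup table keyed by headword, as a database would.
entries = {
    entry.findtext("headword"): entry.findtext("definition")
    for entry in root.findall("entry")
}

# Alphabetical order, ignoring case, as in a printed dictionary.
headwords = sorted(entries, key=str.lower)


def lookup(word):
    """Return the definition for a headword, or None if it is absent."""
    return entries.get(word)


print(headwords)
print(lookup("label"))
```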

    Usually, entries are structured in XML and look more or less like micro-documents which are
