Text Encoding and the TEI: Modelling with TEI: Why use TEI?

Modelling with TEI

Why use TEI?

In the previous unit we learned how XML can be used to model, represent and describe a specific type of data. We could thus create an XML model for a poem as below:

<verse>
  <line>Mary had a little lamb,</line>
  <line>Its fleece was white as snow,</line>
  <line>And everywhere that Mary went</line>
  <line>The lamb was sure to go.</line>
</verse>

In the model above, the entire poem is represented by the <verse> element and each individual line of the poem by a <line> element. If we all invent our own custom markup languages, however, the results become confusing and interchange becomes virtually impossible. Somebody else might use <poem> and <LN> to encode poems, and somebody in Germany might use <Gedicht> for the poem and <Zeile> for a line.

This need for a common vocabulary is not new. By the mid-1980s it had become clear that a common format was needed. Academics, librarians and archivists from North America and Europe met and developed what was to become the Text Encoding Initiative, which is now the de facto standard for encoding texts in the humanities.

The acronym TEI stands both for an encoding standard for electronic texts, Text Encoding for Interchange, and for the consortium that releases and continuously develops this standard, the Text Encoding Initiative. The initiative was established in 1987 as an international research project to develop a standard to ‘facilitate the creation, exchange, and integration of textual data in machine-readable form’. The goal was to create a standard that would support the encoding of ‘all kinds of texts, in every human language, from every historical or social context’. A challenging goal!
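To make the idea of a shared vocabulary concrete, here is a sketch of how the same nursery rhyme might look with TEI element names in place of our home-made ones. It assumes the TEI elements <lg> (line group) for the stanza and <l> for each verse line; the type="stanza" attribute is one plausible choice for illustration, not a requirement. This is only a fragment: a complete TEI document would also need a header and the TEI namespace.

```xml
<!-- Fragment only: the poem encoded with TEI's shared element names -->
<lg type="stanza">
  <l>Mary had a little lamb,</l>
  <l>Its fleece was white as snow,</l>
  <l>And everywhere that Mary went</l>
  <l>The lamb was sure to go.</l>
</lg>
```

Anyone familiar with TEI can read this encoding without first having to learn what <verse>, <LN> or <Zeile> meant in a particular project.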

The main characteristics and benefits of the TEI markup language are that it was designed to encode meaning (it is a descriptive markup language), to be software-independent, and to be community-driven. The TEI recommendations are continuously updated, and occasionally major releases are published. These major releases are numbered incrementally, from TEI P1 (1990) to the latest major release, TEI P5 (2007). Since 2011, TEI has also been registered as its own media type, application/tei+xml (RFC 6129).

Since the first draft of the TEI guidelines was released in 1990, TEI has developed into one of the most important encoding standards within the humanities. The early guidelines, P1 to P3, are based on SGML, while the more recent versions, TEI P4 (June 2002) and TEI P5 (November 2007), use XML.