The xml:lang attribute

Darwin Information Typing Architecture (DITA) Version 1.3 Part 3: All-Inclusive Edition

Version
1.3
Author
OASIS DITA Technical Committee

The xml:lang attribute specifies the language and (optional) locale of the element content. The xml:lang attribute applies to all attributes and content of the element where it is specified, unless it is overridden with xml:lang on another element within that content.
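The inheritance rule above can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree parser; the sample topic and the helper name effective_languages are hypothetical and not part of the DITA specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample topic: English overall, with one French paragraph.
SAMPLE_TOPIC = """\
<topic id="greeting" xml:lang="en-US">
  <title>Greetings</title>
  <body>
    <p>Hello.</p>
    <p xml:lang="fr">Bonjour. <ph>Salut.</ph></p>
  </body>
</topic>
"""


def effective_languages(element, inherited=None):
    """Yield (element, language) pairs in document order.

    An element uses its own xml:lang if present; otherwise it
    inherits the language of its nearest ancestor that sets one.
    """
    lang = element.get(XML_LANG, inherited)
    yield element, lang
    for child in element:
        yield from effective_languages(child, lang)


topic = ET.fromstring(SAMPLE_TOPIC)
result = [(el.tag, lang) for el, lang in effective_languages(topic)]
for tag, lang in result:
    print(f"{tag}: {lang}")
```

In this sketch the second paragraph and its nested ph element resolve to "fr", while every other element resolves to the "en-US" value set on the root.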

The xml:lang attribute SHOULD be explicitly set on the root element of each map and topic.

Setting the xml:lang attribute in the DITA source ensures that processors handle content in a language- and locale-appropriate way. If the xml:lang attribute is not set, processors assume a default value, which might not be appropriate for the DITA content. When the xml:lang attribute is specified for a document, DITA processors MUST use the specified value to determine the language of the document.

Setting the xml:lang attribute in the source language document facilitates the translation process; it enables translation tools (or translators) to simply change the value of the existing xml:lang attribute to the value of the target language. Some translation tools support changing the value of an existing xml:lang attribute, but they do not support adding new markup to the document that is being translated. Therefore, if source language content does not set the xml:lang attribute, it might be difficult or impossible for the translator to add the xml:lang attribute to the translated document.
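As a rough illustration of the translation scenario, the following sketch models a tool that can change an existing xml:lang value but cannot add the attribute. The function name retarget_language and the sample topic are hypothetical; real translation tools work differently, and this only mirrors the constraint described above.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical source-language topic that already sets xml:lang.
SOURCE = '<topic id="intro" xml:lang="en-US"><title>Introduction</title></topic>'


def retarget_language(source_xml, target_lang):
    """Return the document with its root xml:lang changed to target_lang.

    Mirrors a tool that edits existing markup only: if the source has
    no xml:lang attribute, there is nothing to change, so it fails.
    """
    root = ET.fromstring(source_xml)
    if XML_LANG not in root.attrib:
        raise ValueError("source document has no xml:lang attribute to change")
    root.set(XML_LANG, target_lang)
    return ET.tostring(root, encoding="unicode")


translated = retarget_language(SOURCE, "de-DE")
print(translated)
```

A source document without xml:lang on its root raises ValueError here, which corresponds to the case where the translator cannot add the attribute.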

If the root element of a map or a top-level topic has no value for the xml:lang attribute, a processor SHOULD assume a default value. The default value of the processor can be fixed, configurable, or derived from the content itself, such as the xml:lang attribute on the root map.
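One way a processor might choose the language of a document is sketched below. The precedence order (explicit value, then root map, then a configured value, then a fixed fallback) is an assumption made for illustration; the specification allows any of these sources for the default and does not rank them. The function and parameter names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def document_language(topic_root, root_map=None,
                      configured_default=None, fixed_default="en"):
    """Pick the language for a topic.

    An explicit xml:lang on the topic root always wins, as required
    for specified values. The fallback order after that is assumed.
    """
    explicit = topic_root.get(XML_LANG)
    if explicit:
        return explicit
    # Derived from content: the xml:lang attribute on the root map.
    if root_map is not None and root_map.get(XML_LANG):
        return root_map.get(XML_LANG)
    # Configurable default, then a fixed processor default.
    return configured_default or fixed_default


topic_without_lang = ET.fromstring('<topic id="a"><title>A</title></topic>')
topic_with_lang = ET.fromstring('<topic id="b" xml:lang="fr"><title>B</title></topic>')
japanese_map = ET.fromstring('<map xml:lang="ja-JP"/>')

print(document_language(topic_with_lang, japanese_map))
print(document_language(topic_without_lang, japanese_map))
print(document_language(topic_without_lang, configured_default="de-DE"))
print(document_language(topic_without_lang))
```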

The xml:lang attribute is described in the XML Recommendation. The recommended style for xml:lang values is a lowercase language code, optionally followed by a hyphen and an uppercase country code, for example, "en-US", "es-ES", or "fr". According to RFC 5646, Tags for Identifying Languages, language tags are case-insensitive, so this style is a convention rather than a requirement.
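The recommended style and the case-insensitive comparison can be sketched as follows. This is a simplified illustration that handles only a language code or a language-country pair; longer tags (for example, those with script subtags) are returned unchanged. The function names are hypothetical.

```python
def recommended_style(tag):
    """Return tag with lowercase language and uppercase country code.

    Only 'language' and 'language-country' forms are restyled; any
    tag with more than two subtags is returned as given.
    """
    parts = tag.split("-")
    if len(parts) == 1:
        return parts[0].lower()
    if len(parts) == 2:
        return f"{parts[0].lower()}-{parts[1].upper()}"
    return tag


def same_language_tag(first, second):
    """Compare two tags without regard to case, as RFC 5646 describes."""
    return first.lower() == second.lower()


print(recommended_style("EN-us"))
print(recommended_style("FR"))
print(same_language_tag("en-US", "EN-us"))
```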