Site Resources
Internationalization: An Introduction [PDF] The Internationalization and Unicode Conference tutorial. This version is the one to be presented at IUC31.
[PPT of IUC29 version] Note that this version does not match the PDF.
Warning: these are very large files.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 License.
I have chosen this license to allow others to use these materials for training and learning purposes. A waiver will generally be
granted for commercial use or modification. However, I would like to know about modifications and especially improvements so that
these can be shared with the community.
Standards Work
- [RFC 4646bis] Internet-Draft: Tags for the Identification of Languages
Mark Davis and Addison Phillips, editors. A revision to RFC 4646 that will incorporate ISO 639-3 language codes into the language subtag registry.
HTML Version, draft-00 (TXT) (XML) (wdiff)
draft-01 (HTML) (TXT) (XML) (wdiff)
draft-02 [2006-12-18] (HTML) (TXT) (XML) (wdiff)
draft-03 [2007-03-28] (HTML) (TXT) (XML) (wdiff)
draft-04 [2007-04-05] (HTML) (TXT) (XML) (wdiff)
draft-05 [2007-04-22] (HTML) (TXT) (XML) (wdiff)
draft-06 [2007-05-10] (HTML) (TXT) (XML) (wdiff)
draft-07 [2007-07-17] (HTML) (TXT) (XML) (wdiff)
draft-08 [2007-08-24] (HTML) (TXT) (XML) (wdiff)
draft-09 [2007-11-14] (HTML) (TXT) (XML) (wdiff)
draft-10 [2007-12-03] (HTML) (TXT) (XML) (wdiff)
draft-11 [2007-12-14] (HTML) (TXT) (XML) (wdiff)
draft-12 [2008-03-14] (HTML) (TXT) (XML) (wdiff)
draft-13 [2008-04-29] (HTML) (TXT) (XML) (wdiff)
draft-14 [2008-05-16] (HTML) (TXT) (XML) (wdiff)
draft-15 [2008-06-09] (HTML) (TXT) (XML) (wdiff)
Editor’s Copy (HTML) (TXT) (XML) (wdiff)
4646 to 4646bis diff
- [RFC 4646] Tags for the Identification of Languages
Mark Davis and Addison Phillips, editors. Developed by the IETF Language Tag Registry Working Group (LTRU). The RFC that defines language tags used in various Internet standards and protocols, as well as in HTML, XML, locale standards (such as CLDR or .NET), and so forth. This RFC obsoletes RFC 3066 and RFC 1766.
An Explanation of the Design: Why RFC3066bis?
W3C I18N Article: Understanding the New Language Tags
by Richard Ishida, taken from my Multilingual article
- [RFC 4647] Matching of Language Tags
- [RFC 4645] Initial Language Subtag Registry
Doug Ewell, editor. This RFC defined the initial IANA Language Subtag Registry and contains the instructions for how to assemble that registry. Doug is also currently editing RFC 4645bis, and his personal website contains the various documents related to that effort.
- W3C Workshop: Constraints and Capabilities: Internationalization of Web Services
Constraints and Capabilities Position Paper: Internationalization of Web Services
[HTML version] A paper that examines the internationalization of Web services and how policy technologies might be affected. Those interested in the subject may also want to read the Web Services Internationalization Usage Scenarios and Requirements documents produced by the W3C Internationalization Working Group. These papers are linked from the WS Task Force page.
- [Internet-Draft] The record-jar Format
Addison Phillips, author. An Internet-Draft documenting the record-jar format familiar from Chapter 5 of Eric S. Raymond's The Art of Unix Programming (ISBN 0-13-142901-9). (Update: 2005-10-05)
(Draft-01 TXT) (HTML) (XML)
(Draft-02 TXT) (HTML) (XML)
Unique Content
- A Guide to Configuring Computers to Edit Unicode
aka: "Learn to Type Japanese (and other languages)"
A Guide to Configuring Computers to Display and Process Non-ASCII Text: what you need to know in order to configure your computers to display and type text in languages other than English. Includes screenshots and instructions for many flavors of Windows, Mac, and Unix.
- Learning to Test with non-ASCII Data
A short piece on how to plan internationalization testing, with a focus on the management of test matrices.
- It’s About Time
Examines issues that can arise when working with times, dates, and time zones in software. This paper is ever so slightly outdated. Other references:
Command Line Interfaces: Internationalizing Them.
C and Encodings: a short, not altogether complete primer on working with char*.
Plus: A Delphi Internationalization Cookbook talks about multibyte-enabling Delphi programs.
Character Sets in JSP: Demonstrates the use of page directives, taglibs, and other niceties with JSP and servlets. This demo is evolving.
Java Locales: Lightweight demo showing some of the data associated with a Java locale.
People's Names and Software: Under construction. Some information about how to handle personal names in software. A more extensive paper is linked elsewhere on this page.
Papers and Presentations
- IUC30: The Theory and Practice of Pseudo-Translation [pdf]
Discusses pseudo-translation and how it can be used, particularly in testing software for non-ASCII character support. [ppt]
10th Open Forum on Metadata Registries presentation: Making Sense of Language Tags (in PowerPoint format). Covers the history of language identification starting with ISO 639 and leading up to RFC 4646bis and the latest changes in BCP 47. Presented at the Metadata Forum in New York City, July 2007.
Language Standards for Global Business keynote: How Standards Happen (and why sometimes they don't) (in PowerPoint format). The last-second presentation I wrote for the LSGB conference in Barcelona, May 2006.
W3C I18N Article: Understanding the New Language Tags: Adapted from my Multilingual magazine article of the same name. Richard Ishida has a lot to do with it appearing here. May 2006.
Unicode 29: Language Tags and Locale Identifiers: A Status Report [PDF]. Presented at IUC29 in San Francisco, March 2006. [PPT]
W3C FAQ: xml:lang in XML Document Schemas
Discusses when to use the xml:lang attribute in your XML documents and when to use a different element or attribute to identify natural languages.
W3C Note: Working with Time Zones describes the problems you might encounter when working with the date and time types in XML Schema.
Web Services and Internationalization appeared in Multilingual #73, July 2005. This article was translated into German (external link).
Something for Nothing? (a version of this paper appeared in Multilingual #69): Is Translation Memory delivering on its ROI promises?
Unicode 27: Language Tags: A Status Report is a companion piece to the slide presentation that Mark Davis ended up delivering at IUC27.
Unicode 27: Are We Counting Bytes Yet? Writing encoding converters using Java NIO. This paper discusses how legacy encodings are structured and some of the problems with processing data in byte-counted contexts (such as those used in many flat-file text documents). Plus: Need to count bytes in JavaScript?
Unicode 26: Why Web Services Are Not Internationalized (yet...)
Unicode 26: Personal Names in Software reviews how to handle people's names in software. Slides (PPT). Name Games, meanwhile, is a very cursory look at two-dimensional resource problems when displaying personal names.
ESWC 2005: RFC 3066bis and the Semantic Web, a paper written with Jeremy Carroll, discusses how language tags that follow the structure of RFC 3066bis could be adapted to the Semantic Web to provide better searching and matching of content. This paper was published by Springer-Verlag in Lecture Notes in Computer Science and is available on-line here.
Unicode 25: Web Services and Internationalization explores the work of the W3C Internationalization Web Services Task Force.
Unicode 25: Language and Locale Tags gives a few of the reasons behind the RFC3066bis effort (see above). Look for the presentation here eventually.
Unicode 24: Approaches to Delivering Localized Software examines some of the different ways that localized (translated) software can be created, managed, and delivered to customers. [PDF Format]
Presentation: Managing Multi-Lingual Websites Presentation by Addison Phillips at the September 19, 2000 meeting of the Bay Area Publication Manager's Forum.
Presentation: Creating Multi-Lingual and Multi-Locale Databases International Unicode Conference 19 Presentation [PowerPoint]. SqlDriverConnect for ODBC OTN note on Unicode connections.
Whitepaper: Creating Multi-Lingual and Multi-Locale Databases International Unicode Conference 19 Whitepaper [PDF]. The content of this document expands on that of the presentation.
Whitepaper: Four ACEs, a Survey of ASCII Compatible Encodings International Unicode Conference 22 Whitepaper [PDF].
Presentation: ULocale Tags from IUC23. Slides discussing the need for locale tags and the ideas behind ULocales. [PDF format].
Links
I18N Gurus The resource for finding out about internationalization.
Unicode The Universal Character Set.
The UTF-8 and Unicode FAQ for Unix/Linux Contains a variety of useful information about using Unicode and especially the UTF-8 encoding of Unicode in a Unix environment.
MS Shell Font The Microsoft documentation on the Shell font, useful for certain kinds of "code page" programs on Windows.
Web Services Internationalization @ W3C Home page of the Task Force, where you'll find useful material on internationalization of Web services.
ICU International Components for Unicode. The IBM Open Source library for C and Java, which provides internationalization capabilities and Unicode support.
UTF-8 and Unicode FAQ for Unix by Markus Kuhn. This has a lot of useful information on dealing with Unicode in your C programs on UNIX.
Roman Czyborra's Alphabet Soup pages
PC Keyboard Where do you get a real, old-fashioned, clicky IBM keyboard? This is the company that bought the patents from Lexmark.

