Introduction

Making JSP pages that use a Unicode encoding is really easy. You only need to do a few things to be successful.

For starters, you need to understand what happens when you make a JSP page. JSPs are really Java programs: the JSP container reads the source file, generates a small Java class from it, and compiles that class into a servlet. Basically, all the generated program does is write a byte stream back to the browser, converting Java characters to bytes through an encoding writer (conceptually an OutputStreamWriter).
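To make that last point concrete, here is a minimal sketch in plain Java (no JSP container involved) of what an OutputStreamWriter does with the same character under two encodings. The class and method names are made up for illustration; a real container's generated code will look different.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical demo class: shows how one Java character becomes different bytes
// depending on the encoding given to the OutputStreamWriter.
final class EncodeDemo {

    // Writes text through an OutputStreamWriter and returns the bytes as hex.
    static String encodeHex(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, Charset.forName(charsetName))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sink.toByteArray()) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN, one UTF-16 code unit inside the JVM
        System.out.println(encodeHex(euro, "UTF-8"));      // E2 82 AC
        System.out.println(encodeHex(euro, "ISO-8859-1")); // 3F, i.e. '?'
    }
}
```

The euro sign has no place in Latin-1, so the writer silently substitutes a question mark. That is the kind of damage the steps on this page are meant to prevent.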

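Internally, all Java String objects are Unicode UTF-16, so you are working with Unicode within the JSP page regardless of the encoding used on the wire. To have a working page, you need to do three things: read any input using the right encoding, tell the JSP container what encoding the page source is written in, and control the encoding used when the page is sent to the browser.

Here's the code you need to do that.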

First, we need to read any input using the right encoding. This requires that you tell the ServletRequest object which encoding to use before you read any data from it. Once you have read data from the request, its encoding is fixed for the rest of that request.

request.setCharacterEncoding("UTF-8");

If all of your pages use UTF-8, then you can skip the above step. Also be careful of pages that link to yours but are not part of your application. If an external page submits a form to you and doesn't use UTF-8, you'll get null strings when you ask for parameters (since the UTF-8 conversion will silently fail).
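To see why the request encoding matters, here is a small sketch using the JDK's own URLDecoder rather than a servlet container. It decodes the same percent-encoded form value under different assumptions. The class name is hypothetical, and how a particular container reports a failed conversion may differ from what this plain decoder does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: decodes a form value the way a request parser might,
// once with the right encoding and twice with a mismatched one.
final class ParamDecodeDemo {

    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "caf" + e-acute, submitted by a UTF-8 page: the accent is two bytes, C3 A9.
        String utf8Form = "caf%C3%A9";
        // The same word submitted by a Latin-1 page: the accent is one byte, E9.
        String latin1Form = "caf%E9";

        String ok = decode(utf8Form, "UTF-8");            // four characters, as typed
        String garbled = decode(utf8Form, "ISO-8859-1");  // five characters, accent split in two
        String broken = decode(latin1Form, "UTF-8");      // E9 alone is not valid UTF-8

        System.out.println(ok.length());
        System.out.println(garbled.length());
        System.out.println(broken.indexOf('\uFFFD') >= 0); // replacement character present
    }
}
```

In this sketch the mismatches produce garbled or replacement characters, not an exception, which is why such problems are easy to miss until someone submits non-ASCII text.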

Next, we need to tell the system what encoding the page is written in. This is not necessarily the same as the encoding the page will be in when you serve it: this directive tells the JSP system how to read the bytes in the .jsp file. You may find that it is easier to use a legacy (non-Unicode) encoding to author your pages, since many IDEs and editors don't support UTF-8 or make it difficult to work with.

The pageEncoding directive controls this encoding:

<%@page pageEncoding="UTF-8"%>
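The difference between the source encoding and the served encoding can be sketched in plain Java. Assume a page authored in ISO-8859-1 and served as UTF-8: the bytes in the file are decoded with one charset, live in memory as a String, and are encoded again with another. The class and method names below are illustrative only and are not part of any JSP API.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: decode with the authoring encoding, encode with the
// response encoding. The String in the middle is always UTF-16.
final class TranscodeDemo {

    static byte[] transcode(byte[] source, String pageEncoding, String responseEncoding) {
        String text = new String(source, Charset.forName(pageEncoding));
        return text.getBytes(Charset.forName(responseEncoding));
    }

    public static void main(String[] args) {
        byte[] fileBytes = {(byte) 0xE9}; // e-acute as stored by a Latin-1 editor
        byte[] served = transcode(fileBytes, "ISO-8859-1", "UTF-8");
        for (byte b : served) {
            System.out.printf("%02X ", b & 0xFF); // C3 A9
        }
        System.out.println();
    }
}
```

If the declared page encoding does not match what the editor actually saved, the first step goes wrong and no response setting can repair it.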

Finally, we need to control the encoding that the page uses when it is sent to the browser. This has several parts to it, since you need to set things in several places. First, you should include a META tag in your HTML markup so that end users can see it. The other things we are doing to the page are more than adequate to make the page work correctly, but end users can sometimes debug page display problems (such as when they manually override the encoding and get junk) by looking at the tag. Here's what a META tag looks like:

<html>
<head>
  <META http-equiv="Content-Type" content="text/html;charset=UTF-8">

Then you need to tell the page compiler what encoding you want to use. This has two effects. First, it sets the actual encoding used for the response. Second, it sets the HTTP Content-Type header.

<%@page contentType="text/html;charset=UTF-8"%>
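As a rough illustration of how one string can drive both effects, the sketch below pulls the charset parameter out of a Content-Type value and falls back to Latin-1 when none is given. The helper charsetOf is hypothetical and deliberately simple. It is not how any particular container parses the directive.

```java
// Hypothetical demo class: extracts the charset from a Content-Type value.
final class ContentTypeDemo {

    // Returns the charset parameter, or ISO-8859-1 when the value names none.
    static String charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                return p.substring(8).replace("\"", "").trim();
            }
        }
        return "ISO-8859-1"; // assumed default for this sketch
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("text/html"));               // ISO-8859-1
    }
}
```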

Note that XML files are handled similarly. XML doesn't use a META tag, but you can (and should) set the encoding attribute in the document declaration:

<?xml version="1.0" encoding="UTF-8"?>
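The encoding attribute is not decoration: an XML parser uses it to decode the bytes that follow. Here is a short sketch using the JDK's built-in DOM parser, with a document that is deliberately stored and declared as ISO-8859-1. The class name and the sample element are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class: the parser reads the declaration to learn how to
// decode the rest of the byte stream.
final class XmlEncodingDemo {

    // Parses raw XML bytes and returns the text inside the root element.
    static String rootText(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\u00E9</word>";
        byte[] stored = xml.getBytes(StandardCharsets.ISO_8859_1);
        // Four characters come back, because the declaration matches the bytes.
        System.out.println(rootText(stored).length());
    }
}
```

When the declaration and the actual bytes disagree, parsers typically reject the document or return garbled text, so keep the two in step.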

There are some caveats about setting the encoding explicitly to UTF-8 using the contentType page directive. The big one is: prior to J2EE 1.4 (Servlet 2.4), the JSTL and other taglib directives related to getting and setting the page Locale caused the page to change encoding to one inferred from the Locale. In J2EE 1.4, this is fixed (so that the page directive takes precedence over any implicit encoding), but for now you need to be careful about setting the page Locale. We'll examine that on another page.

No Charset Specified: Demonstrates how not using the page directives results in a page that uses Latin-1.

Inferred Charset: Demonstrates how an inferred character encoding (from response.setLocale() in this case) overrides the contentType page directive on Servlet 2.3 and earlier. Note that using any of the fmt tags in JSTL in your page will give you this same result.

Includes: Demonstrates the <%@ include %> directive and the <jsp:include /> action.

Unicode Demo

This demo proves that this page really isn't just an ASCII file that happens to work. If you see hollow boxes for characters that are displayable (i.e. not control characters or unassigned values), you may need to install a font and/or configure your browser to display the text. The stylesheet specifies Arial Unicode MS and Code2000, two common Unicode fonts, but you might not have these installed.

To get Code2000, visit James Kass's site.

Currently viewing the 256 characters starting at U+20A0 (a dash marks a code point with no character shown):

      0 1 2 3 4 5 6 7 8 9 A B C D E F
20A0  ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯
20B0  ₰ ₱ - - - - - - - - - - - - - -
20C0  - - - - - - - - - - - - - - - -
20D0  ⃐ ⃑ ⃒ ⃓ ⃔ ⃕ ⃖ ⃗ ⃘ ⃙ ⃚ ⃛ ⃜ ⃝ ⃞ ⃟
20E0  ⃠ ⃡ ⃢ ⃣ ⃤ ⃥ ⃦ ⃧ ⃨ ⃩ ⃪ - - - - -
20F0  - - - - - - - - - - - - - - - -
2100  ℀ ℁ ℂ ℃ ℄ ℅ ℆ ℇ ℈ ℉ ℊ ℋ ℌ ℍ ℎ ℏ
2110  ℐ ℑ ℒ ℓ ℔ ℕ № ℗ ℘ ℙ ℚ ℛ ℜ ℝ ℞ ℟
2120  ℠ ℡ ™ ℣ ℤ ℥ Ω ℧ ℨ ℩ K Å ℬ ℭ ℮ ℯ
2130  ℰ ℱ Ⅎ ℳ ℴ ℵ ℶ ℷ ℸ ℹ ℺ ℻ - ℽ ℾ ℿ
2140  ⅀ ⅁ ⅂ ⅃ ⅄ ⅅ ⅆ ⅇ ⅈ ⅉ ⅊ ⅋ - - - -
2150  - - - ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞ ⅟
2160  Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ Ⅼ Ⅽ Ⅾ Ⅿ
2170  ⅰ ⅱ ⅲ ⅳ ⅴ ⅵ ⅶ ⅷ ⅸ ⅹ ⅺ ⅻ ⅼ ⅽ ⅾ ⅿ
2180  ↀ ↁ ↂ Ↄ - - - - - - - - - - - -
2190  ← ↑ → ↓ ↔ ↕ ↖ ↗ ↘ ↙ ↚ ↛ ↜ ↝ ↞ ↟
