Dealing with Files and File Attachments
A common task is uploading a file from the user's disk to the server. There are several ways to accomplish this, and each has its own problems.
One way to get a text file is to have the user paste the contents of the file into a large text area. The advantage of this approach is that the content is visible to the user, who can correct it before hitting the submit button. It also benefits the programmer, because the encoding of the submitted text is known.
The encoding is known because the browser converts the text in an input field to the character encoding applied to the page. For our Unicode UTF-8 pages, this means that the received file is written in UTF-8.
If the page encoding is not UTF-8, then you may experience some problems. The mojibake or conversion issues you saw elsewhere then apply to the material pasted into the form.
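As a rough illustration of the problem, the following Python sketch simulates a form submitted from a page that is not UTF-8 and a server that wrongly assumes UTF-8. The choice of Shift_JIS as the page encoding and the sample text are assumptions made for the example.

```python
# Text the user pastes into the form field.
original = "日本語"

# The browser encodes the field in the page's encoding.
# Assumption for this sketch: the page is served as Shift_JIS.
submitted = original.encode("shift_jis")

# A server that wrongly assumes UTF-8 cannot decode the bytes cleanly;
# errors="replace" substitutes U+FFFD for each undecodable sequence.
wrong = submitted.decode("utf-8", errors="replace")

# Decoding with the page's actual encoding recovers the text.
right = submitted.decode("shift_jis")

print(repr(wrong))
print(right)
```

The damaged string is what the user would later see as mojibake. The bytes themselves are intact; only the server's assumption about them is wrong.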
Another problem is with files that contain internal encoding indicators. XML files are a good example. They can contain an encoding declaration at the start of the file that looks like this: <?xml version="1.0" encoding="SJIS"?>
In this case, a file tagged internally as Shift-JIS would be uploaded as a sequence of UTF-8 bytes. The declaration no longer matches the actual bytes, which could result in parsing problems.
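One possible way to handle this on the server is to inspect the declaration and relabel it to match the bytes that actually arrived. The Python sketch below assumes the pasted text was received as UTF-8; the helper names `declared_encoding` and `relabel_as_utf8` are hypothetical, and the regular expression is deliberately simple (it ignores byte order marks and UTF-16 input).

```python
import re
from typing import Optional

# Matches an XML declaration at the very start of the data and captures
# the encoding label. A sketch only, not a full XML declaration parser.
_DECL_RE = re.compile(rb'^(<\?xml[^>]*?encoding=["\'])([A-Za-z0-9._-]+)(["\'])')


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding label in the XML declaration, if there is one."""
    m = _DECL_RE.match(data)
    return m.group(2).decode("ascii") if m else None


def relabel_as_utf8(data: bytes) -> bytes:
    """Rewrite the declared encoding to UTF-8, leaving the rest untouched."""
    return _DECL_RE.sub(rb"\g<1>UTF-8\g<3>", data, count=1)


# Text pasted into a form on a UTF-8 page arrives as UTF-8 bytes,
# whatever the declaration inside it claims.
pasted = '<?xml version="1.0" encoding="SJIS"?><doc>日本語</doc>'.encode("utf-8")

print(declared_encoding(pasted))
fixed = relabel_as_utf8(pasted)
print(fixed.decode("utf-8"))
```

Relabeling is only safe here because the transport encoding is known. For a file whose real encoding is uncertain, rewriting the declaration would just hide the mismatch.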
Dealing with File Select
The more common way to upload files is by using the HTML form tags that bring up the browser's file selection dialog box.
Submitting such a form makes the browser POST a Multipart MIME document to the server.
MIME? Isn't that email?
Multipart MIME documents are, in fact, the same format that email uses for attachments. There are some idiosyncrasies to deal with, though.
For one thing, the MIME header Content-Type sent from a browser never includes the charset parameter (for compatibility with, of all things, Netscape 2).
This means that you cannot assume anything about the encoding of the file and should solicit it from the user (probably with UTF-8 as the default). Detecting the character encoding automatically is risky business.
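To make the missing charset concrete, here is a minimal Python sketch that builds a multipart body of the kind described and parses it with the standard library's `email` package, which works because the format is shared with email. The boundary string, the file name `notes.txt`, and the file contents are invented for the example, and a real application would more likely rely on its web framework's own multipart parser.

```python
from email import message_from_bytes

BOUNDARY = "example-boundary"  # hypothetical; browsers generate their own

# Pretend the user selected a small text file saved as Shift_JIS.
file_bytes = "日本語".encode("shift_jis")

body = (
    f"--{BOUNDARY}\r\n".encode("ascii")
    + b'Content-Disposition: form-data; name="attachment"; filename="notes.txt"\r\n'
    + b"Content-Type: text/plain\r\n"  # note: no charset parameter
    + b"\r\n"
    + file_bytes + b"\r\n"
    + f"--{BOUNDARY}--\r\n".encode("ascii")
)

# The email parser needs the top-level Content-Type header, which in a
# real request comes from the HTTP headers rather than from the body.
raw = (
    f"Content-Type: multipart/form-data; boundary={BOUNDARY}\r\n\r\n".encode("ascii")
    + body
)

msg = message_from_bytes(raw)
part = msg.get_payload()[0]

print(part.get_filename())
print(part.get_content_type())
print(part.get_content_charset())  # nothing here to tell us the encoding

# The raw file bytes, still undecoded.
payload = part.get_payload(decode=True)
```

The part carries a file name and a media type, but nothing that says how to turn `payload` into text.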
Here's the HTML for a file upload:
<form method="post" enctype="multipart/form-data" ...>
...
File: <input type="file" name="attachment">
</form>
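On the server side, the uploaded bytes then have to be decoded using whatever encoding the user supplied. The Python sketch below assumes the form also carries a hypothetical extra field (for example a select box named `encoding`) whose value is passed in as `user_encoding`; the function name `decode_upload` is likewise made up for illustration.

```python
from typing import Optional

DEFAULT_ENCODING = "utf-8"  # used when the user leaves the field blank


def decode_upload(data: bytes, user_encoding: Optional[str] = None) -> str:
    """Decode uploaded bytes with the user-supplied label, defaulting to UTF-8.

    Raises ValueError for an unknown label and UnicodeDecodeError when the
    bytes are not valid in the chosen encoding.
    """
    label = (user_encoding or "").strip() or DEFAULT_ENCODING
    try:
        # Strict decoding: fail loudly instead of silently producing mojibake.
        return data.decode(label)
    except LookupError:
        raise ValueError(f"unknown encoding label: {label!r}") from None


# Usage: the label comes from the hypothetical form field.
print(decode_upload("日本語".encode("shift_jis"), "Shift_JIS"))
print(decode_upload("café".encode("utf-8")))
```

Decoding strictly means a wrong choice is reported as an error that can be shown to the user, who can then pick a different encoding and try again.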