
Unix Power Tools

21.12. Encoding "Binary" Files into ASCII

Email transport systems were originally designed to transmit characters with a seven-bit encoding -- like ASCII. This meant they could send messages with plain English text but not "binary" text, such as program files or graphics (or non-English text!), that used all of an eight-bit byte. Usenet (Section 1.21), the newsgroup system, was transmitted like email and had the same seven-bit limitations. The solution -- which is still used today -- is to encode eight-bit text into characters that use only the seven low bits.

The first popular solution on Unix-type systems was uuencoding. That method is mostly obsolete now (though you'll still find it used sometimes); it's been replaced by MIME encoding. The next two sections cover both of those -- though we recommend avoiding uuencode like the plague.

21.12.1. uuencoding

The uuencode utility encodes eight-bit data into a seven-bit representation for sending via email or on Usenet. The recipient can use uudecode to restore the original data. Unfortunately, there are several different and incompatible versions of these two utilities. Also, uuencoded data doesn't travel well through all mail gateways -- partly because uuencoding is sensitive to changes in whitespace (space and TAB) characters, and some gateways munge (change or corrupt) whitespace. So if you're encoding text for transmission, use MIME instead of uuencode whenever you can.

To create an ASCII version of a binary file, use the uuencode utility. For instance, a compressed file (Section 15.6) is definitely eight-bit; it needs encoding.

A uuencoded file (there's an example later in this article) starts with a begin line that gives the file's name; this name comes from the last argument you give the uuencode utility as it encodes a file. To make uuencode read a file directly, give that file's pathname as the first argument, before the name. uuencode writes the encoded file to its standard output. For example, to encode the file emacs.tar.gz from your ~/tarfiles directory and store it in a file named emacs.tar.gz.uu:

% uuencode ~/tarfiles/emacs.tar.gz emacs.tar.gz > emacs.tar.gz.uu

You can then insert emacs.tar.gz.uu into a mail message and send it to someone. Of course, the ASCII-only encoding takes more space than the original binary format. The encoded file will be about one-third larger.[64]

[64] If so, why bother gzipping? Why not forget about both gzip and uuencode? Well, you can't. Remember that tar files are binary files to start with, even if every file in the archive is an ASCII text file. You'd need to uuencode a file before mailing it, anyway, so you'd still pay the 33 percent size penalty that uuencode incurs. Using gzip minimizes the damage.
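Where does the one-third come from? uuencode packs every three input bytes (24 bits) into four 6-bit groups and writes each group as a printable character, so four characters carry three bytes. Here's a small shell sketch of that arithmetic. The function name uu_triplet is hypothetical, and it assumes the classic rule that a 6-bit value v becomes the character with code 32 + v. A full encoded line holds 45 bytes, which is why each full line of encoded data starts with M (32 + 45 = 77, the code for M).

```shell
#!/bin/sh
# uu_triplet B1 B2 B3: encode three byte values (0-255) as uuencode does.
# The 24 bits are split into four 6-bit groups, and each group v is
# written as the character with code 32 + v, a printable 7-bit character.
# (Many modern uuencode versions write a backquote instead of a space
# for v = 0; this sketch uses the plain 32 + v rule.)
uu_triplet() {
    n=$(( ($1 << 16) | ($2 << 8) | $3 ))
    for bits in 18 12 6 0; do
        v=$(( (n >> bits) & 63 ))
        # %b turns \0NNN into the character with octal code NNN
        printf '%b' "\\0$(printf '%o' $((32 + v)))"
    done
    echo
}

# One full uuencoded line carries 45 input bytes as a length character,
# 60 encoded characters (45 / 3 * 4), and a newline.
line_in=45
line_out=$(( 1 + line_in / 3 * 4 + 1 ))            # 62 bytes
growth=$(( (line_out - line_in) * 100 / line_in ))  # 37 percent, rounded down

uu_triplet 67 97 116     # the bytes of "Cat"; prints 0V%T
echo "$line_in bytes in, $line_out bytes out: about $growth% larger"
```

The length character and newline on each line push the real overhead a little above one-third, to roughly 38 percent for full lines.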

If you'd rather, you can combine the steps above into one pipeline. Given only one command-line argument (the name of the file for the begin line), uuencode will read its standard input. Instead of creating ~/tarfiles/emacs.tar.gz, making a second uuencoded file, then mailing that file, you can give tar the "filename" - (a single dash) so it writes to its standard output. That feeds the archive down the pipe:[65]

[65] With GNU tar, you can use tar czf - emacs | uuencode .... That's not the point of this example, though. We're just showing how to uuencode some arbitrary data.

(For more about the mail command, see Section 1.21.)

% tar cf - emacs | gzip | uuencode emacs.tar.gz | \
    mail -s "uuencoded emacs file" [email protected]

What happens when you receive a uuencoded, compressed tar file? The same thing, in reverse. You'll get a mail message that looks something like this:

From: [email protected]
To: [email protected]
Subject: uuencoded emacs file

begin 644 emacs.tar.gz
M+DQ0"D%L;"!O9B!T:&5S92!P<F]B;&5M<R!C86X@8F4@<V]L=F5D(&)Y(")L
M:6YK<RPB(&$@;65C:&%N:7-M('=H:6-H"F%L;&]W<R!A(&9I;&4@=&\@:&%V
M92!T=V\@;W(@;6]R92!N86UE<RX@(%5.25@@<')O=FED97,@='=O(&1I9F9E
M<F5N= IK:6YD<R!O9B!L:6YK<SH*+DQS($(*+DQI"EQF0DAA<F0@;&EN:W-<
   ...
end

So you save the message in a file, complete with headers. Let's say you call this file mailstuff. How do you get the original files back? Use the following sequence of commands:

% uudecode mailstuff
% gunzip emacs.tar.gz
% tar xf emacs.tar

The uudecode command searches through the file, skipping From:, etc., until it sees its special begin line; it decodes the rest of the file (until the corresponding end line) and creates the file emacs.tar.gz. Then gunzip recreates your original tar file, and tar xf extracts the individual files from the archive.
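If you want to see that scanning step for yourself, or hand only the encoded lines to another program, here's a minimal sketch with sed and awk. The helper names uu_extract and uu_name are hypothetical, and the patterns assume the begin line has the usual begin MODE NAME form with an octal mode.

```shell
#!/bin/sh
# uu_extract FILE: print only the uuencoded part of a saved mail message,
# from the "begin MODE NAME" line through the matching "end" line.
# Mail headers and any text before or after are skipped, roughly what
# uudecode does on its own before it starts decoding.
uu_extract() {
    sed -n '/^begin [0-7][0-7]* /,/^end$/p' "$1"
}

# uu_name FILE: print the mode and file name from the first begin line.
# Only the first word of a name containing spaces is printed.
uu_name() {
    awk '$1 == "begin" && $2 ~ /^[0-7]+$/ { print $2, $3; exit }' "$1"
}
```

For the message shown earlier, uu_name mailstuff should print 644 emacs.tar.gz, and uu_extract mailstuff | uudecode decodes just the extracted lines.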

Again, though, you'll be better off using MIME encoding whenever you can.

21.12.2. MIME Encoding

When MIME (Multipurpose Internet Mail Extensions) was designed in the early 1990s, one main goal was robust email communications. That meant coming up with a mail encoding scheme that would work on all platforms and get through all mail transmission paths.

Some text is "mostly ASCII": for instance, it's in a language like German or French that uses many ASCII characters plus some eight-bit characters (characters with an octal value greater than 177). The MIME standard allows that text to be minimally encoded so that it can still be read fairly well without decoding: the quoted-printable encoding. Other text is full binary -- either not designed for humans to read, or so far from ASCII that an ASCII representation would be pointless. In that case, you'll want to use the base64 encoding.
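How do you tell "mostly ASCII" from full binary? One rough way is to count the eight-bit bytes. The sketch below is only a heuristic: choose_encoding is a hypothetical helper, and its 10 percent cutoff is an arbitrary choice for illustration, not part of the MIME standard.

```shell
#!/bin/sh
# choose_encoding FILE: hypothetical helper that suggests a MIME encoding.
# It counts the bytes with an octal value above 177 (the eight-bit ones)
# and prints "quoted-printable" if they are under 10% of the file,
# "base64" otherwise. The 10% cutoff is an arbitrary assumption.
choose_encoding() {
    total=$(( $(wc -c < "$1") ))
    # Delete every seven-bit byte (octal 000-177); only eight-bit bytes remain.
    high=$(( $(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c) ))
    if [ "$total" -eq 0 ] || [ $(( high * 10 )) -lt "$total" ]; then
        echo quoted-printable
    else
        echo base64
    fi
}
```

Setting LC_ALL=C makes tr treat the input as plain bytes, so multibyte characters in your locale don't confuse the count.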

Go to http://examples.oreilly.com/upt3 for more information on: mimencode, mailto

Most modern email programs automatically MIME-encode files. Unfortunately, some aren't too smart about it. The Metamail package includes a utility called mimencode (also named mmencode) for encoding and decoding MIME formats. Another Metamail utility, mailto, encodes and sends MIME messages directly -- but let's use mimencode, partly because of the extra control it gives you.

By default, mimencode reads text from standard input, uses a base64 encoding, and writes the encoded text to standard output. If you add the -q option, mimencode uses quoted-printable encoding instead.
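Metamail isn't installed on many systems these days. Here's a minimal sketch, assuming the base64 command from GNU coreutils is available as a fallback; it produces the same base64 encoding, though its line lengths may differ from mimencode's. The wrapper names b64_encode and b64_decode are hypothetical, and the fallback covers only base64, since coreutils has no quoted-printable counterpart to mimencode -q.

```shell
#!/bin/sh
# b64_encode / b64_decode: hypothetical wrappers that use mimencode when
# it is installed and fall back to the base64 command from GNU coreutils
# (an assumption about your system) when it isn't. Both read standard
# input and write standard output, like mimencode's default mode.
b64_encode() {
    if command -v mimencode >/dev/null 2>&1; then
        mimencode          # base64 is mimencode's default
    else
        base64
    fi
}

b64_decode() {
    if command -v mimencode >/dev/null 2>&1; then
        mimencode -u       # -u decodes
    else
        base64 -d
    fi
}
```

For example, b64_encode < smallfile.tar.gz > body would create the same kind of body file as the mimencode step in the next example.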

Unlike uuencoded messages, which contain the filename in the message body, MIME-encoded messages need information in the message header (the lines "To:", "From:", etc.). The mail utility (except some older versions) doesn't let you write your own message header. So let's do it directly: create a mail header with cat > (Section 11.2), create a mail body with mimencode, and send it using a common system mail transfer agent, sendmail. (You could automate this with a script, of course, but we're just demonstrating.) The MIME standard header formats are still evolving; we'll use a simple set of header fields that should do the job. Here's the setup, in three steps that use temporary files:

$ cat > header
From: [email protected]
To: [email protected]
Subject: base64-encoded smallfile
MIME-Version: 1.0
Content-Type: application/octet-stream; name="smallfile.tar.gz"
Content-Transfer-Encoding: base64

CTRL-d
$ tar cf - smallfile | gzip | mimencode > body
$ cat header body | /usr/lib/sendmail -t

The cat > command lets me create the header file by typing it in at the terminal; I could have used a text editor instead. One important note: the header must end with a blank line. The second command creates the body file. The third command uses cat to output the header, then the body; the message we've built is piped to sendmail, whose -t option tells it to read the addresses from the message header. You should get a message something like this:
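As mentioned above, you could automate these three steps with a script. Here is one minimal sketch: a hypothetical mime_message function that writes the header, the blank line, and the base64 body to standard output, ready to pipe to sendmail -t. It assumes the base64 command from GNU coreutils as the encoder and uses the same simple header fields as the example.

```shell
#!/bin/sh
# mime_message FROM TO SUBJECT FILE: hypothetical helper that writes a
# single-part MIME message (header, blank line, base64 body) to
# standard output. Assumes GNU coreutils base64 as the encoder;
# swap in mimencode if you have it.
mime_message() {
    name=$(basename "$4")
    printf 'From: %s\n' "$1"
    printf 'To: %s\n' "$2"
    printf 'Subject: %s\n' "$3"
    printf 'MIME-Version: 1.0\n'
    printf 'Content-Type: application/octet-stream; name="%s"\n' "$name"
    printf 'Content-Transfer-Encoding: base64\n'
    printf '\n'            # the header must end with a blank line
    base64 < "$4"
}
```

Used from a Bourne-type shell (the addresses here are placeholders):

$ tar cf - smallfile | gzip > smallfile.tar.gz
$ mime_message you@example.com friend@example.com "base64-encoded smallfile" smallfile.tar.gz | /usr/lib/sendmail -t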

Date: Wed, 22 Nov 2000 11:46:53 -0700
Message-Id: <[email protected]>
From: [email protected]
To: [email protected]
Subject: base64-encoded smallfile
MIME-Version: 1.0
Content-Type: application/octet-stream; name="smallfile.tar.gz"
Content-Transfer-Encoding: base64

H4sIACj6GzoAA+1Z21YbRxb1c39FWcvBMIMu3A0IBWxDzMTYDuBgrxU/lKSSVHF3V6erGiGv
rPn22edU3wRIecrMPLgfEGpVV53LPvtcOktcW6au3dnZ2mrZcfTkb7g6G53O7vb2k06ns7G3
06HPzt7uDn/Sra1N/L+32dnd29ve3tjD+s3Nna0novN3CHP/yqyTqRBPfk+U+rpknUnlf0Oc
  ...

Your mail client may be able to extract that file directly. You also can use mimencode -u. But mimencode doesn't know about mail headers, so you should strip off the header first. The behead (Section 21.5) script can do that. For instance, if you've saved the mail message in a file msg:

$ behead msg | mimencode -u > smallfile.tar.gz
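If you don't have behead handy, the idea is simple enough to sketch: a mail header ends at the first empty line, so keep only what follows it. strip_header is a hypothetical name, and the sketch assumes Unix newlines (a message saved with CR-LF line endings would need the carriage returns removed first).

```shell
#!/bin/sh
# strip_header [FILE...]: hypothetical stand-in for behead. Prints only
# the lines after the first empty line of each file (or of standard
# input), which is where a mail header ends.
strip_header() {
    awk 'FNR == 1 { body = 0 } body { print } /^$/ { body = 1 }' "$@"
}
```

It slots into the same pipeline:

$ strip_header msg | mimencode -u > smallfile.tar.gz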

Extract (Section 39.2) smallfile.tar.gz and compare it to your original smallfile (maybe with cmp). They should be identical.
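Here's one way to script that check, as a minimal sketch: a hypothetical check_roundtrip function unpacks the archive into a scratch directory, so the extracted copy can't overwrite your original, and then runs cmp on the two files. It assumes mktemp -d is available and that the file was archived under a relative name, as smallfile is in the example.

```shell
#!/bin/sh
# check_roundtrip FILE ARCHIVE: hypothetical helper. Unpacks ARCHIVE
# (a gzipped tar file) into a scratch directory, then uses cmp to compare
# the extracted FILE with the original FILE in the current directory.
# FILE must be the relative name stored in the archive.
check_roundtrip() {
    scratch=$(mktemp -d) || return 1
    gzip -dc "$2" | (cd "$scratch" && tar xf -)
    if cmp -s "$1" "$scratch/$1"; then
        echo identical
    else
        echo different
    fi
    rm -rf "$scratch"
}
```

Running check_roundtrip smallfile smallfile.tar.gz in the directory that holds your original should print identical.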

If you're planning to do this often, it's important to understand how to form an email header and body properly. For more information, see relevant Internet RFCs (standards documents) and O'Reilly's Programming Internet Email by David Wood.

--JP and ML




Copyright © 2003 O'Reilly & Associates. All rights reserved.