The operations of sed's most difficult commands -- hold (h or H), get (g or G), and exchange (x) -- can be explained, somewhat fancifully, in terms of an extremely deliberate medieval scrivener or amanuensis toiling to make a copy of a manuscript. His work is bound by several spatial restrictions: the original manuscript is displayed in one room; the set of instructions for copying the manuscript are stored in a middle room; and the quill, ink, and folio are set up in yet another room. The original manuscript and the set of instructions are written in stone and cannot be moved about. The dutiful scrivener, being sounder of body than mind, is able to make a copy by going from room to room, working on only one line at a time. Entering the room where the original manuscript is, he removes from his robes a scrap of paper to take down the first line of the manuscript. Then he moves to the room containing the list of editing instructions. He reads each instruction to see if it applies to the single line he has scribbled down.
Each instruction, written in special notation, consists of two parts: a pattern and a procedure. The scrivener reads the first instruction and checks the pattern against his line. If there is no match, he doesn't have to worry about the procedure, so he goes to the next instruction. If he finds a match, the scrivener follows the action or actions specified in the procedure.
He makes the edit on his piece of paper before trying to match the pattern in the next instruction. Remember, the scrivener has to read through a series of instructions, and he reads all of them, not just the first instruction that matches the pattern. Because he makes his edits as he goes, he is always trying to match the latest version against the next pattern; he doesn't remember the original line.
When he gets to the bottom of the list of instructions, and has made any edits that were necessary on his piece of paper, he goes into the next room to copy out the line. (He doesn't need to be told to print out the line.) After that is done, he returns to the first room and takes down the next line on a new scrap of paper. When he goes to the second room, once again he reads every instruction from first to last before leaving.
This is what he normally does, that is, unless he is told otherwise. For instance, before he starts, he can be told not to write out every line (the -n option). In this case, he must wait for an instruction that tells him to print (p ). If he does not get that instruction, he throws away his piece of paper and starts over. By the way, regardless of whether or not he is told to write out the line, he always gets to the last instruction on the list.
Let's look at other kinds of instructions the scrivener has to interpret. First of all, an instruction can have zero, one, or two patterns specified:
If no pattern is specified, the same procedure is followed for each line.
If there is only one pattern, he will follow the procedure for any line matching the pattern.
If a pattern is followed by a !, the procedure is followed for all lines that do not match the pattern.
If two patterns are specified, the actions described in the procedure are performed on the first matching line and all succeeding lines until a line matches the second pattern.
The scrivener can work on only one line at a time, so you might wonder how he handles a range of lines. Each time he goes through the instructions, he tries to match only the first of two patterns. Now, after he has found a line that matches the first pattern, each time through with a new line he tries to match the second pattern. He interprets the second pattern as pattern!, so that the procedure is followed only if there is no match. When the second pattern is matched, he starts looking again for the first pattern.
Each procedure contains one or more commands or actions. Remember, if a pattern is specified with a procedure, the pattern must be matched before the procedure is executed. We have already shown many of the usual commands that are similar to other editing commands. However, there are several highly unusual commands.
For instance, the N command tells the scrivener to go, right now, and get another line, adding it to the same piece of paper. The scrivener can be instructed to "hold" on to a single piece of scrap paper. The h command tells him to make a copy of the line on another piece of paper and put it in his pocket. The x command tells him to exchange the extra piece of paper in his pocket with the one in his hand. The g command tells him to throw out the paper in his hand and replace it with the one in his pocket. The G command tells him to append the line he is holding to the paper in front of him. If he encounters a d command, he throws out the scrap of paper and begins again at the top of the list of instructions. A D command has effect when he has been instructed to append two lines on his piece of paper. The D command tells him to delete the first of those lines.
If you want the analogy converted back to computers, the first and last rooms in this medieval manor are standard input and standard output. Thus, the original file is never changed. The line on the scrivener's piece of scrap paper is in the pattern space ; the line on the piece of paper that he holds in his pocket is in the hold space. The hold space allows you to retain a duplicate of a line while you change the original in the pattern space.
Section 34.18 shows a practical application of the scrivener's work, a sed program that searches for a particular phrase that might be split across two lines.
-- DD
Copyright © 2003 O'Reilly & Associates. All rights reserved.