
Unix Power Tools

32.14. Regular Expressions: Potential Problems

Before I discuss the extensions that extended expressions (Section 32.15) offer, I want to mention two potential problem areas.

The first is portability. The \< and \> word anchors were introduced in the vi editor; other programs did not support them at the time. The \{min,max\} modifier is also a later addition that earlier utilities lacked. This makes regular expressions difficult for the novice, because each utility seems to follow a different convention. Sun has retrofitted the newest regular expression library to all of its programs, so they all have the same abilities. If you try to use these newer features on other vendors' machines, you might find that they don't work the same way.
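One rough way to find out what a particular machine supports is to probe its grep directly. The sketch below is only an illustration: it assumes a grep that accepts the -q option, and the variable names interval and word are made up for this example. A "no" means only that this grep did not treat the construct as the newer feature; it says nothing about sed, vi, or other utilities on the same system.

```shell
# Probe the local grep for the two newer features discussed above.
# grep -q prints nothing and exits 0 when the pattern matches.

# \{2,3\} should match a run of two or three a's.
if printf 'aaa\n' | grep -q 'a\{2,3\}' 2>/dev/null; then
    interval=yes
else
    interval=no
fi

# \<cat\> should match "cat" only as a whole word.
if printf 'the cat sat\n' | grep -q '\<cat\>' 2>/dev/null; then
    word=yes
else
    word=no
fi

printf '%s supported: %s\n' '\{min,max\}' "$interval"
printf '%s supported: %s\n' '\< and \>' "$word"
```

Running the same probe on each system you work with shows quickly whether a pattern written on one machine can be trusted on another.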

The other potential point of confusion is the extent of the pattern matches (Section 32.17). Regular expressions match the longest possible pattern. That is, the regular expression A.*B matches all of AAB, and it also matches all of AAAABBBBABCCCCBBBAAAB, from the first A to the last B. This doesn't cause many problems with grep, because an oversight in a regular expression will just match more lines than desired. With sed, if your patterns get carried away, you may end up deleting or changing more than you want to. Perl answers this problem by providing both "greedy" and "non-greedy" quantifiers, which let you specify which behavior you want. See the perlre(1) manual page for details.
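To see how far a pattern can get carried away in sed, here is a small sketch using the sample string from the paragraph above. The variable names s, greedy, and bounded are invented for the illustration. The second command is not a non-greedy match in the Perl sense; it is a common workaround that replaces .* with a negated character class, so the match cannot run past the first B.

```shell
s='AAAABBBBABCCCCBBBAAAB'

# A.*B is greedy: it runs from the first A to the last B,
# so the whole string is replaced.
greedy=$(printf '%s\n' "$s" | sed 's/A.*B/X/')

# A[^B]*B stops at the first B after the leading A's,
# so only "AAAAB" is replaced.
bounded=$(printf '%s\n' "$s" | sed 's/A[^B]*B/X/')

printf 'greedy:  %s\n' "$greedy"    # greedy:  X
printf 'bounded: %s\n' "$bounded"   # bounded: XBBBABCCCCBBBAAAB
```

The negated class works here because the terminator is a single character. When the terminator is a longer string, this trick does not apply directly, and a tool with non-greedy quantifiers, such as Perl, is usually easier.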

-- BB




Copyright © 2003 O'Reilly & Associates. All rights reserved.