
The GNU Awk User's Guide

This file documents awk, a program that you can use to select particular records in a file and perform operations upon them.

This is Edition 3 of GAWK: Effective AWK Programming: A User's Guide for GNU Awk, for the 3.1.0 version of the GNU implementation of AWK.

Foreword  Some nice words about this Web page.
Preface  What this Web page is about; brief history and acknowledgments.
2. Getting Started with awk  A basic introduction to using awk. How to run an awk program. Command-line syntax.
3. Regular Expressions  All about matching things using regular expressions.
4. Reading Input Files  How to read files and manipulate fields.
5. Printing Output  How to print using awk. Describes the print and printf statements. Also describes redirection of output.
6. Expressions  Expressions are the basic building blocks of statements.
7. Patterns, Actions, and Variables  Overviews of patterns and actions.
8. Arrays in awk  The description and use of arrays. Also includes array-oriented control statements.
9. Functions  Built-in and user-defined functions.
10. Internationalization with gawk  Getting gawk to speak your language.
11. Advanced Features of gawk  Stuff for advanced users, specific to gawk.
12. Running awk and gawk  How to run gawk.
13. A Library of awk Functions  
14. Practical awk Programs  Many awk programs with complete explanations.
A. The Evolution of the awk Language  The evolution of the awk language.
B. Installing gawk  Installing gawk under various operating systems.
C. Implementation Notes  Notes about gawk extensions and possible future work.
D. Basic Programming Concepts  A very quick introduction to programming concepts.
Glossary  An explanation of some unfamiliar terms.
GNU General Public License  Your right to copy and distribute gawk.
GNU Free Documentation License  The license for this Web page.
Index  Concept and Variable Index.

History of awk and gawk  The history of gawk and awk.
1.0 A Rose by Any Other Name  What name to use to find awk.
1.1 Using This Book  Using this Web page. Includes sample input files that you can use.
1.2 Typographical Conventions  
The GNU Project and This Book  Brief history of the GNU project and this Web page.
How to Contribute  Helping to save the world.
Acknowledgments  
2.1 How to Run awk Programs  How to run gawk programs; includes command-line syntax.
2.1.1 One-Shot Throw-Away awk Programs  Running a short throw-away awk program.
2.1.2 Running awk Without Input Files  Using no input files (input from terminal instead).
2.1.3 Running Long Programs  Putting permanent awk programs in files.
2.1.4 Executable awk Programs  Making self-contained awk programs.
2.1.5 Comments in awk Programs  Adding documentation to gawk programs.
2.1.6 Shell Quoting Issues  More discussion of shell quoting issues.
2.2 Data Files for the Examples  Sample data files for use in the awk programs illustrated in this Web page.
2.3 Some Simple Examples  A very simple example.
2.4 An Example with Two Rules  A less simple one-line example using two rules.
2.5 A More Complex Example  A more complex example.
2.6 awk Statements Versus Lines  Subdividing or combining statements into lines.
2.7 Other Features of awk  
2.8 When to Use awk  When to use gawk and when to use other things.
3.1 How to Use Regular Expressions  
3.2 Escape Sequences  How to write non-printing characters.
3.3 Regular Expression Operators  
3.4 Using Character Lists  What can go between `[...]'.
3.5 gawk-Specific Regexp Operators  Operators specific to GNU software.
3.6 Case Sensitivity in Matching  How to do case-insensitive matching.
3.7 How Much Text Matches?  How much text matches.
3.8 Using Dynamic Regexps  
4.1 How Input Is Split into Records  Controlling how data is split into records.
4.2 Examining Fields  An introduction to fields.
4.3 Non-Constant Field Numbers  Non-constant Field Numbers.
4.4 Changing the Contents of a Field  
4.5 Specifying How Fields Are Separated  The field separator and how to change it.
4.5.1 Using Regular Expressions to Separate Fields  Using regexps as the field separator.
4.5.2 Making Each Character a Separate Field  Making each character a separate field.
4.5.3 Setting FS from the Command Line  Setting FS from the command-line.
4.5.4 Field Splitting Summary  Some final points and a summary table.
4.6 Reading Fixed-Width Data  Reading constant width data.
4.7 Multiple-Line Records  Reading multi-line records.
4.8 Explicit Input with getline  Reading files under explicit program control using the getline function.
4.8.1 Using getline with No Arguments  Using getline with no arguments.
4.8.2 Using getline into a Variable  Using getline into a variable.
4.8.3 Using getline from a File  Using getline from a file.
4.8.4 Using getline into a Variable from a File  Using getline into a variable from a file.
4.8.5 Using getline from a Pipe  Using getline from a pipe.
4.8.6 Using getline into a Variable from a Pipe  Using getline into a variable from a pipe.
4.8.7 Using getline from a Coprocess  Using getline from a coprocess.
4.8.8 Using getline into a Variable from a Coprocess  Using getline into a variable from a coprocess.
4.8.9 Points About getline to Remember  Important things to know about getline.
4.8.10 Summary of getline Variants  
5.1 The print Statement  The print statement.
5.2 Examples of print Statements  Simple examples of print statements.
5.3 Output Separators  The output separators and how to change them.
5.4 Controlling Numeric Output with print  Controlling numeric output with print.
5.5 Using printf Statements for Fancier Printing  The printf statement.
5.5.1 Introduction to the printf Statement  Syntax of the printf statement.
5.5.2 Format-Control Letters  Format-control letters.
5.5.3 Modifiers for printf Formats  Format-specification modifiers.
5.5.4 Examples Using printf  Several examples.
5.6 Redirecting Output of print and printf  How to redirect output to multiple files and pipes.
5.7 Special File Names in gawk  File name interpretation in gawk. gawk allows access to inherited file descriptors.
5.7.1 Special Files for Standard Descriptors  Special files for I/O.
5.7.2 Special Files for Process-Related Information  Special files for process information.
5.7.3 Special Files for Network Communications  Special files for network communications.
5.7.4 Special File Name Caveats  Things to watch out for.
5.8 Closing Input and Output Redirections  Closing Input and Output Files and Pipes.
6.1 Constant Expressions  String, numeric and regexp constants.
6.1.1 Numeric and String Constants  Numeric and string constants.
6.1.2 Octal and Hexadecimal Numbers  What are octal and hex numbers.
6.1.3 Regular Expression Constants  Regular Expression constants.
6.2 Using Regular Expression Constants  When and how to use a regexp constant.
6.3 Variables  Variables give names to values for later use.
6.3.1 Using Variables in a Program  Using variables in your programs.
6.3.2 Assigning Variables on the Command Line  Setting variables on the command-line and a summary of command-line syntax. This is an advanced method of input.
6.4 Conversion of Strings and Numbers  The conversion of strings to numbers and vice versa.
6.5 Arithmetic Operators  Arithmetic operations (`+', `-', etc.)
6.6 String Concatenation  Concatenating strings.
6.7 Assignment Expressions  Changing the value of a variable or a field.
6.8 Increment and Decrement Operators  Incrementing the numeric value of a variable.
6.9 True and False in awk  What is "true" and what is "false".
6.10 Variable Typing and Comparison Expressions  How variables acquire types and how this affects comparison of numbers and strings with `<', etc.
6.11 Boolean Expressions  Combining comparison expressions using boolean operators `||' ("or"), `&&' ("and") and `!' ("not").
6.12 Conditional Expressions  Conditional expressions select between two subexpressions under control of a third subexpression.
6.13 Function Calls  A function call is an expression.
6.14 Operator Precedence (How Operators Nest)  How various operators nest.
7.1 Pattern Elements  What goes into a pattern.
7.1.1 Regular Expressions as Patterns  Using regexps as patterns.
7.1.2 Expressions as Patterns  Any expression can be used as a pattern.
7.1.3 Specifying Record Ranges with Patterns  Pairs of patterns specify record ranges.
7.1.4 The BEGIN and END Special Patterns  Specifying initialization and cleanup rules.
7.1.4.1 Startup and Cleanup Actions  How and why to use BEGIN/END rules.
7.1.4.2 Input/Output from BEGIN and END Rules  I/O issues in BEGIN/END rules.
7.1.5 The Empty Pattern  The empty pattern, which matches every record.
7.2 Using Shell Variables in Programs  How to use shell variables with awk.
7.3 Actions  What goes into an action.
7.4 Control Statements in Actions  Describes the various control statements in detail.
7.4.1 The if-else Statement  Conditionally execute some awk statements.
7.4.2 The while Statement  Loop until some condition is satisfied.
7.4.3 The do-while Statement  Do specified action while looping until some condition is satisfied.
7.4.4 The for Statement  Another looping statement, that provides initialization and increment clauses.
7.4.5 The break Statement  Immediately exit the innermost enclosing loop.
7.4.6 The continue Statement  Skip to the end of the innermost enclosing loop.
7.4.7 The next Statement  Stop processing the current input record.
7.4.8 Using gawk's nextfile Statement  Stop processing the current file.
7.4.9 The exit Statement  Stop execution of awk.
7.5 Built-in Variables  Summarizes the built-in variables.
7.5.1 Built-in Variables That Control awk  Built-in variables that you change to control awk.
7.5.2 Built-in Variables That Convey Information  Built-in variables where awk gives you information.
7.5.3 Using ARGC and ARGV  Ways to use ARGC and ARGV.
8.1 Introduction to Arrays  
8.2 Referring to an Array Element  How to examine one element of an array.
8.3 Assigning Array Elements  How to change an element of an array.
8.4 Basic Array Example  Basic Example of an Array
8.5 Scanning All Elements of an Array  A variation of the for statement. It loops through the indices of an array's existing elements.
8.6 The delete Statement  The delete statement removes an element from an array.
8.7 Using Numbers to Subscript Arrays  How to use numbers as subscripts in awk.
8.8 Using Uninitialized Variables as Subscripts  Using Uninitialized variables as subscripts.
8.9 Multidimensional Arrays  Emulating multidimensional arrays in awk.
8.10 Scanning Multidimensional Arrays  Scanning multidimensional arrays.
8.11 Sorting Array Values and Indices with gawk  Sorting array values and indices.
9.1 Built-in Functions  Summarizes the built-in functions.
9.1.1 Calling Built-in Functions  How to call built-in functions.
9.1.2 Numeric Functions  Functions that work with numbers, including int, sin and rand.
9.1.3 String Manipulation Functions  Functions for string manipulation, such as split, match and sprintf.
9.1.3.1 More About `\' and `&' with sub, gsub, and gensub  More than you want to know about `\' and `&' with sub, gsub, and gensub.
9.1.4 Input/Output Functions  Functions for files and shell commands.
9.1.5 Using gawk's Timestamp Functions  Functions for dealing with timestamps.
9.1.6 Using gawk's Bit Manipulation Functions  Functions for bitwise operations.
9.1.7 Using gawk's String Translation Functions  Functions for string translation.
9.2 User-Defined Functions  Describes User-defined functions in detail.
9.2.1 Function Definition Syntax  How to write definitions and what they mean.
9.2.2 Function Definition Examples  An example function definition and what it does.
9.2.3 Calling User-Defined Functions  Things to watch out for.
9.2.4 The return Statement  Specifying the value a function returns.
9.2.5 Functions and Their Effect on Variable Typing  How variable types can change at runtime.
10.1 Internationalization and Localization  
10.2 GNU gettext  How GNU gettext works.
10.3 Internationalizing awk Programs  Features for the programmer.
10.4 Translating awk Programs  Features for the translator.
10.4.1 Extracting Marked Strings  Extracting marked strings.
10.4.2 Rearranging printf Arguments  Rearranging printf arguments.
10.4.3 awk Portability Issues  awk-level portability issues.
10.5 A Simple Internationalization Example  A simple i18n example.
10.6 gawk Can Speak Your Language  gawk is also internationalized.
11.1 Allowing Non-Decimal Input Data  Allowing non-decimal input data.
11.2 Two-Way Communications with Another Process  Two-way communications with another process.
11.3 Using gawk for Network Programming  Using gawk for network programming.
11.4 Using gawk with BSD Portals  Using gawk with BSD portals.
11.5 Profiling Your awk Programs  Profiling your awk programs.
12.1 Invoking awk  How to run awk.
12.2 Command-Line Options  Command-line options and their meanings.
12.3 Other Command-Line Arguments  Input file names and variable assignments.
12.4 The AWKPATH Environment Variable  Searching directories for awk programs.
12.5 Obsolete Options and/or Features  Obsolete Options and/or features.
12.6 Undocumented Options and Features  
12.7 Known Bugs in gawk  
13.1 Naming Library Function Global Variables  How to best name private global variables in library functions.
13.2 General Programming  Functions that are of general use.
13.2.1 Implementing nextfile as a Function  Two implementations of a nextfile function.
13.2.2 Assertions  A function for assertions in awk programs.
13.2.3 Rounding Numbers  A function for rounding if sprintf does not do it correctly.
13.2.4 The Cliff Random Number Generator  
13.2.5 Translating Between Characters and Numbers  Functions for using characters as numbers and vice versa.
13.2.6 Merging an Array into a String  A function to join an array into a string.
13.2.7 Managing the Time of Day  A function to get formatted times.
13.3 Data File Management  Functions for managing command-line data files.
13.3.1 Noting Data File Boundaries  A function for handling data file transitions.
13.3.2 Rereading the Current File  A function for rereading the current file.
13.3.3 Checking for Readable Data Files  Checking that data files are readable.
13.3.4 Treating Assignments as File Names  Treating assignments as file names.
13.4 Processing Command-Line Options  A function for processing command-line arguments.
13.5 Reading the User Database  Functions for getting user information.
13.6 Reading the Group Database  Functions for getting group information.
14.1 Running the Example Programs  How to run these examples.
14.2 Reinventing Wheels for Fun and Profit  Clones of common utilities.
14.2.1 Cutting out Fields and Columns  The cut utility.
14.2.2 Searching for Regular Expressions in Files  The egrep utility.
14.2.3 Printing out User Information  The id utility.
14.2.4 Splitting a Large File into Pieces  The split utility.
14.2.5 Duplicating Output into Multiple Files  The tee utility.
14.2.6 Printing Non-Duplicated Lines of Text  The uniq utility.
14.2.7 Counting Things  The wc utility.
14.3 A Grab Bag of awk Programs  Some interesting awk programs.
14.3.1 Finding Duplicated Words in a Document  Finding duplicated words in a document.
14.3.2 An Alarm Clock Program  An alarm clock.
14.3.3 Transliterating Characters  A program similar to the tr utility.
14.3.4 Printing Mailing Labels  Printing mailing labels.
14.3.5 Generating Word Usage Counts  A program to produce a word usage count.
14.3.6 Removing Duplicates from Unsorted Text  Eliminating duplicate entries from a history file.
14.3.7 Extracting Programs from Texinfo Source Files  Pulling out programs from Texinfo source files.
14.3.8 A Simple Stream Editor  
14.3.9 An Easy Way to Use Library Functions  A wrapper for awk that includes files.
A.1 Major Changes Between V7 and SVR3.1  The major changes between V7 and System V Release 3.1.
A.2 Changes Between SVR3.1 and SVR4  Minor changes between System V Releases 3.1 and 4.
A.3 Changes Between SVR4 and POSIX awk  New features from the POSIX standard.
A.4 Extensions in the Bell Laboratories awk  New features from the Bell Laboratories version of awk.
A.5 Extensions in gawk Not in POSIX awk  The extensions in gawk not in POSIX awk.
A.6 Major Contributors to gawk  The major contributors to gawk.
B.1 The gawk Distribution  What is in the gawk distribution.
B.1.1 Getting the gawk Distribution  How to get the distribution.
B.1.2 Extracting the Distribution  How to extract the distribution.
B.1.3 Contents of the gawk Distribution  What is in the distribution.
B.2 Compiling and Installing gawk on Unix  Installing gawk under various versions of Unix.
B.2.1 Compiling gawk for Unix  Compiling gawk under Unix.
B.2.2 Additional Configuration Options  Other compile-time options.
B.2.3 The Configuration Process  How it's all supposed to work.
B.3 Installation on Other Operating Systems  
B.3.1 Installing gawk on an Amiga  
B.3.2 Installing gawk on BeOS  
B.3.3 Installation on PC Operating Systems  Installing and Compiling gawk on MS-DOS and OS/2.
B.3.3.1 Installing a Prepared Distribution for PC Systems  Installing a prepared distribution.
B.3.3.2 Compiling gawk for PC Operating Systems  Compiling gawk for MS-DOS, Win32, and OS/2.
B.3.3.3 Using gawk on PC Operating Systems  Running gawk on MS-DOS, Win32 and OS/2.
B.3.4 How to Compile and Install gawk on VMS  Installing gawk on VMS.
B.3.4.1 Compiling gawk on VMS  How to compile gawk under VMS.
B.3.4.2 Installing gawk on VMS  How to install gawk under VMS.
B.3.4.3 Running gawk on VMS  How to run gawk under VMS.
B.3.4.4 Building and Using gawk on VMS POSIX  Alternate instructions for VMS POSIX.
B.4 Unsupported Operating System Ports  Systems whose ports are no longer supported.
B.4.1 Installing gawk on the Atari ST  
B.4.1.1 Compiling gawk on the Atari ST  Compiling gawk on Atari.
B.4.1.2 Running gawk on the Atari ST  Running gawk on Atari.
B.4.2 Installing gawk on a Tandem  
B.5 Reporting Problems and Bugs  
B.6 Other Freely Available awk Implementations  Other freely available awk implementations.
C.1 Downward Compatibility and Debugging  How to disable certain gawk extensions.
C.2 Making Additions to gawk  Making Additions To gawk.
C.2.1 Adding New Features  Adding code to the main body of gawk.
C.2.2 Porting gawk to a New Operating System  Porting gawk to a new operating system.
C.3 Adding New Built-in Functions to gawk  Adding new built-in functions to gawk.
C.3.1 A Minimal Introduction to gawk Internals  A brief look at some gawk internals.
C.3.2 Directory and File Operation Built-ins  An example of new functions.
C.3.2.1 Using chdir and stat  What the new functions will do.
C.3.2.2 C Code for chdir and stat  The code for internal file operations.
C.3.2.3 Integrating the Extensions  How to use an external extension.
C.4 Probable Future Extensions  New features that may be implemented one day.
D.1 What a Program Does  The high level view.
D.2 Data Values in a Computer  A very quick intro to data types.
D.3 Floating-Point Number Caveats  Stuff to know about floating-point numbers.

To Miriam, for making me complete.
To Chana, for the joy you bring us.
To Rivka, for the exponential increase.
To Nachum, for the added dimension.
To Malka, for the new beginning.



Foreword

Arnold Robbins and I are good friends. We were introduced 11 years ago by circumstances--and our favorite programming language, AWK. The circumstances started a couple of years earlier. I was working at a new job and noticed an unplugged Unix computer sitting in the corner. No one knew how to use it, and neither did I. However, a couple of days later it was running, and I was root and the one-and-only user. That day, I began the transition from statistician to Unix programmer.

On one of many trips to the library or bookstore in search of books on Unix, I found the gray AWK book, a.k.a. Aho, Kernighan and Weinberger, The AWK Programming Language, Addison-Wesley, 1988. AWK's simple programming paradigm--find a pattern in the input and then perform an action--often reduced complex or tedious data manipulations to a few lines of code. I was excited to try my hand at programming in AWK.

Alas, the awk on my computer was a limited version of the language described in the AWK book. I discovered that my computer had "old awk" and the AWK book described "new awk." I learned that this was typical; the old version refused to step aside or relinquish its name. If a system had a new awk, it was invariably called nawk, and few systems had it. The best way to get a new awk was to ftp the source code for gawk from prep.ai.mit.edu. gawk was a version of new awk written by David Trueman and Arnold, and available under the GNU General Public License.

(Incidentally, it's no longer difficult to find a new awk. gawk ships with Linux, and you can download binaries or source code for almost any system; my wife uses gawk on her VMS box.)

My Unix system started out unplugged from the wall; it certainly was not plugged into a network. So, oblivious to the existence of gawk and the Unix community in general, and desiring a new awk, I wrote my own, called mawk. Before I was finished I knew about gawk, but it was too late to stop, so I eventually posted to a comp.sources newsgroup.

A few days after my posting, I got a friendly email from Arnold introducing himself. He suggested we share design and algorithms and attached a draft of the POSIX standard so that I could update mawk to support language extensions added after publication of the AWK book.

Frankly, if our roles had been reversed, I would not have been so open and we probably would have never met. I'm glad we did meet. He is an AWK expert's AWK expert and a genuinely nice person. Arnold contributes significant amounts of his expertise and time to the Free Software Foundation.

This book is the gawk reference manual, but at its core it is a book about AWK programming that will appeal to a wide audience. It is a definitive reference to the AWK language as defined by the 1987 Bell Labs release and codified in the 1992 POSIX Utilities standard.

On the other hand, the novice AWK programmer can study a wealth of practical programs that emphasize the power of AWK's basic idioms: data-driven control flow, pattern matching with regular expressions, and associative arrays. Those looking for something new can try out gawk's interface to network protocols via special `/inet' files.

The programs in this book make clear that an AWK program is typically much smaller and faster to develop than a counterpart written in C. Consequently, there is often a payoff to prototype an algorithm or design in AWK to get it running quickly and expose problems early. Often, the interpreted performance is adequate and the AWK prototype becomes the product.

The new pgawk (profiling gawk) produces program execution counts. I recently experimented with an algorithm that, for n lines of input, exhibited ~ C n^2 performance, while theory predicted ~ C n log n behavior. A few minutes poring over the `awkprof.out' profile pinpointed the problem to a single line of code. pgawk is a welcome addition to my programmer's toolbox.

Arnold has distilled over a decade of experience writing and using AWK programs, and developing gawk, into this book. If you use AWK or want to learn how, then read this book.

 
Michael Brennan
Author of mawk



Preface

Several kinds of tasks occur repeatedly when working with text files. You might want to extract certain lines and discard the rest. Or you may need to make changes wherever certain patterns appear, but leave the rest of the file alone. Writing single-use programs for these tasks in languages such as C, C++ or Pascal is time-consuming and inconvenient. Such jobs are often easier with awk. The awk utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs.
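
The two tasks just described can be sketched as follows. This is only a minimal illustration, assuming a POSIX shell with some version of awk on the search path; the sample data and the shell function names (sample_data, select_lines, change_matching) are hypothetical and do not come from this Web page.

```sh
#!/bin/sh
# Hypothetical sample data: one "name amount" record per line.
sample_data() {
    printf '%s\n' 'alice 12' 'bob 7' 'carol 30'
}

# Task 1: extract certain lines and discard the rest.
# The pattern $2 > 10 selects records whose second field exceeds 10;
# with no action given, awk prints each matching record unchanged.
select_lines() {
    awk '$2 > 10'
}

# Task 2: change lines where a pattern appears, leave the others alone.
# The first rule doubles field 2 only on records matching /^bob /;
# the second rule (the always-true pattern 1) prints every record.
change_matching() {
    awk '/^bob / { $2 = $2 * 2 } 1'
}

sample_data | select_lines
sample_data | change_matching
```

Each awk program here is a single line, which is typical of such throw-away jobs. The single quotes keep the shell from interpreting `$2' before awk sees it.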

The GNU implementation of awk is called gawk; it is fully compatible with the System V Release 4 version of awk. gawk is also compatible with the POSIX specification of the awk language. This means that all properly written awk programs should work with gawk. Thus, we usually don't distinguish between gawk and other awk implementations.

Using awk allows you to:

In addition, gawk provides facilities that make it easy to:

This Web page teaches you about the awk language and how you can use it effectively. You should already be familiar with basic system commands, such as cat and ls,(1) as well as basic shell facilities, such as Input/Output (I/O) redirection and pipes.

Implementations of the awk language are available for many different computing environments. This Web page, while describing the awk language in general, also describes the particular implementation of awk called gawk (which stands for "GNU awk"). gawk runs on a broad range of Unix systems, from 80386 PC-based computers up through large-scale systems, such as Crays. gawk has also been ported to Mac OS X, MS-DOS, Microsoft Windows (all versions) and OS/2 PCs, Atari and Amiga micro-computers, BeOS, Tandem D20, and VMS.




History of awk and gawk

Recipe For A Programming Language

1 part egrep      1 part snobol
2 parts ed        3 parts C

Blend all parts well using lex and yacc. Document minimally and release.

After eight years, add another part egrep and two more parts C. Document very well and release.

The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger and Brian W. Kernighan. The original version of awk was written in 1977 at AT&T Bell Laboratories. In 1985, a new version made the programming language more powerful, introducing user-defined functions, multiple input streams, and computed regular expressions. This new version became widely available with Unix System V Release 3.1 (SVR3.1). The version in SVR4 added some new features and cleaned up the behavior in some of the "dark corners" of the language. The specification for awk in the POSIX Command Language and Utilities standard further clarified the language. Both the gawk designers and the original Bell Laboratories awk designers provided feedback for the POSIX specification.

Paul Rubin wrote the GNU implementation, gawk, in 1986. Jay Fenlason completed it, with advice from Richard Stallman. John Woods contributed parts of the code as well. In 1988 and 1989, David Trueman, with help from me, thoroughly reworked gawk for compatibility with the newer awk. Circa 1995, I became the primary maintainer. Current development focuses on bug fixes, performance improvements, standards compliance, and occasionally, new features.

In May of 1997, Jürgen Kahrs felt the need for network access from awk, and with a little help from me, set about adding features to do this for gawk. At that time, he also wrote the bulk of TCP/IP Internetworking with gawk (a separate document, available as part of the gawk distribution). His code finally became part of the main gawk distribution with gawk version 3.1.

See section Major Contributors to gawk, for a complete list of those who made important contributions to gawk.



1.0 A Rose by Any Other Name

The awk language has evolved over the years. Full details are provided in The Evolution of the awk Language. The language described in this Web page is often referred to as "new awk" (nawk).

Because of this, many systems have multiple versions of awk. Some systems have an awk utility that implements the original version of the awk language and a nawk utility for the new version. Others have an oawk for the "old awk" language and plain awk for the new one. Still others only have one version, which is usually the new one.(2)

All in all, this makes it difficult for you to know which version of awk you should run when writing your programs. The best advice I can give here is to check your local documentation. Look for awk, oawk, and nawk, as well as for gawk. It is likely that you already have some version of new awk on your system, which is what you should use when running your programs. (Of course, if you're reading this Web page, chances are good that you have gawk!)
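
One quick way to see which of these names exist on a given system is sketched below. It assumes a POSIX shell; the function name find_awks is hypothetical, and the output naturally differs from system to system.

```sh
#!/bin/sh
# Report which of the usual awk command names are available.
# "command -v" is the POSIX way to ask the shell where a command lives;
# it fails (non-zero exit status) when the name cannot be found.
find_awks() {
    for name in awk oawk nawk gawk; do
        if path=$(command -v "$name" 2>/dev/null); then
            printf '%s: %s\n' "$name" "$path"
        else
            printf '%s: not found\n' "$name"
        fi
    done
}

find_awks
```

This only tells you which commands are present, not which version of the language each one implements, so the local documentation remains the authority on that point.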

Throughout this Web page, whenever we refer to a language feature that should be available in any complete implementation of POSIX awk, we simply use the term awk. When referring to a feature that is specific to the GNU implementation, we use the term gawk.



1.1 Using This Book

Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing.
Dick Brandon

The term awk refers to a particular program as well as to the language you use to tell this program what to do. When we need to be careful, we call the program "the awk utility" and the language "the awk language." This Web page explains both the awk language and how to run the awk utility. The term awk program refers to a program written by you in the awk programming language.

Primarily, this Web page explains the features of awk, as defined in the POSIX standard. It does so in the context of the gawk implementation. While doing so, it also attempts to describe important differences between gawk and other awk implementations.(3) Finally, any gawk features that are not in the POSIX standard for awk are noted.

This Web page has the difficult task of being both a tutorial and a reference. If you are a novice, feel free to skip over details that seem too complex. You should also ignore the many cross references; they are for the expert user and for the online Info version of the document.

There are subsections labelled as Advanced Notes scattered throughout the Web page. They add a more complete explanation of points that are relevant, but not likely to be of interest on first reading. All appear in the index, under the heading "advanced notes."

Most of the time, the examples use complete awk programs. In some of the more advanced sections, only the part of the awk program that illustrates the concept currently being described is shown.

While this Web page is aimed principally at people who have not been exposed to awk, there is a lot of information here that even the awk expert should find useful. In particular, the description of POSIX awk and the example programs in A Library of awk Functions, and in Practical awk Programs, should be of interest.

Getting Started with awk, provides the essentials you need to know to begin using awk.

Regular Expressions, introduces regular expressions in general, and in particular the flavors supported by POSIX awk and gawk.

Reading Input Files, describes how awk reads your data. It introduces the concepts of records and fields, as well as the getline command. I/O redirection is first described here.

Printing Output, describes how awk programs can produce output with print and printf.

Expressions, describes expressions, which are the basic building blocks for getting most things done in a program.

Patterns, Actions, and Variables, describes how to write patterns for matching records, actions for doing something when a record is matched, and the built-in variables awk and gawk use.

Arrays in awk, covers awk's one-and-only data structure: associative arrays. Deleting array elements and whole arrays is also described, as well as sorting arrays in gawk.

Functions, describes the built-in functions awk and gawk provide for you, as well as how to define your own functions.

Internationalization with gawk, describes special features in gawk for translating program messages into different languages at runtime.

Advanced Features of gawk, describes a number of gawk-specific advanced features. Of particular note are the abilities to have two-way communications with another process, perform TCP/IP networking, and profile your awk programs.

Running awk and gawk, describes how to run gawk, the meaning of its command-line options, and how it finds awk program source files.

A Library of awk Functions, and Practical awk Programs, provide many sample awk programs. Reading them allows you to see awk being used for solving real problems.

The Evolution of the awk Language, describes how the awk language has evolved from its first release to the present. It also describes how gawk has acquired features over time.

Installing gawk, describes how to get gawk, how to compile it under Unix, and how to compile and use it on different non-Unix systems. It also describes how to report bugs in gawk and where to get three other freely available implementations of awk.

Implementation Notes, describes how to disable gawk's extensions, as well as how to contribute new code to gawk, how to write extension libraries, and some possible future directions for gawk development.

Basic Programming Concepts, provides some very cursory background material for those who are completely unfamiliar with computer programming. Also centralized there is a discussion of some of the issues involved in using floating-point numbers.

The Glossary, defines most, if not all, of the significant terms used throughout the book. If you find terms that you aren't familiar with, try looking them up.

GNU General Public License, and GNU Free Documentation License, present the licenses that cover the gawk source code, and this Web page, respectively.


1.2 Typographical Conventions

This Web page is written using Texinfo, the GNU documentation formatting language. A single Texinfo source file is used to produce both the printed and online versions of the documentation. This section briefly documents the typographical conventions used in Texinfo.

Examples you would type at the command line are preceded by the common shell primary and secondary prompts, `$' and `>'. Output from the command is preceded by the glyph "-|". This typically represents the command's standard output. Error messages, and other output on the command's standard error, are preceded by the glyph "error-->". For example:

 
$ echo hi on stdout
-| hi on stdout
$ echo hello on stderr 1>&2
error--> hello on stderr

Characters that you type at the keyboard look like this. In particular, there are special characters called "control characters." These are characters that you type by holding down both the CONTROL key and another key, at the same time. For example, a Ctrl-d is typed by first pressing and holding the CONTROL key, next pressing the d key and finally releasing both keys.

Dark Corners

Dark corners are basically fractal -- no matter how much you illuminate, there's always a smaller but darker one.
Brian Kernighan

Until the POSIX standard (and The Gawk Manual), many features of awk were either poorly documented or not documented at all. Descriptions of such features (often called "dark corners") are noted in this Web page with "(d.c.)". They also appear in the index under the heading "dark corner."

As noted by the opening quote, though, any coverage of dark corners is, by definition, incomplete.


The GNU Project and This Book

Software is like sex: it's better when it's free.
Linus Torvalds

The Free Software Foundation (FSF) is a non-profit organization dedicated to the production and distribution of freely distributable software. It was founded by Richard M. Stallman, the author of the original Emacs editor. GNU Emacs is the most widely used version of Emacs today.

The GNU(4) Project is an ongoing effort on the part of the Free Software Foundation to create a complete, freely distributable, POSIX-compliant computing environment. The FSF uses the "GNU General Public License" (GPL) to ensure that their software's source code is always available to the end user. A copy of the GPL is included in this Web page for your reference (see section GNU General Public License). The GPL applies to the C language source code for gawk. To find out more about the FSF and the GNU Project online, see the GNU Project's home page. This Web page may also be read from their web site.

A shell, an editor (Emacs), highly portable optimizing C, C++, and Objective-C compilers, a symbolic debugger, and dozens of large and small utilities (such as gawk) have all been completed and are freely available. The GNU operating system kernel (the HURD) has been released but is still in an early stage of development.

Until the GNU operating system is more fully developed, you should consider using GNU/Linux, a freely distributable, Unix-like operating system for Intel 80386, DEC Alpha, Sun SPARC, IBM S/390, and other systems.(5) There are many books on GNU/Linux. One that is freely available is Linux Installation and Getting Started, by Matt Welsh. GNU/Linux distributions are often available in computer stores or bundled on CD-ROMs with books about Linux. (There are three other freely available, Unix-like operating systems for 80386 and other systems: NetBSD, FreeBSD, and OpenBSD. All are based on the 4.4-Lite Berkeley Software Distribution, and they use recent versions of gawk for their versions of awk.)

The Web page you are reading now is actually free--at least, the information in it is free to anyone. The machine readable source code for the Web page comes with gawk; anyone may take this Web page to a copying machine and make as many copies of it as they like. (Take a moment to check the Free Documentation License; see GNU Free Documentation License.)

Although you could just print it out yourself, bound books are much easier to read and use. Furthermore, the proceeds from sales of this book go back to the FSF to help fund development of more free software.

The Web page itself has gone through a number of previous editions. Paul Rubin wrote the very first draft of The GAWK Manual; it was around 40 pages in size. Diane Close and Richard Stallman improved it, yielding a version that was around 90 pages long and barely described the original, "old" version of awk.

I started working with that version in the fall of 1988. As work on it progressed, the FSF published several preliminary versions (numbered 0.x). In 1996, Edition 1.0 was released with gawk 3.0.0. The FSF published the first two editions under the title The GNU Awk User's Guide.

This edition maintains the basic structure of Edition 1.0, but with significant additional material, reflecting the host of new features in gawk version 3.1. Of particular note is Sorting Array Values and Indices with gawk, as well as Using gawk's Bit Manipulation Functions, Internationalization with gawk, and also Advanced Features of gawk, and Adding New Built-in Functions to gawk.

GAWK: Effective AWK Programming will undoubtedly continue to evolve. An electronic version comes with the gawk distribution from the FSF. If you find an error in this Web page, please report it! See section Reporting Problems and Bugs, for information on submitting problem reports electronically, or write to me in care of the publisher.


How to Contribute

As the maintainer of GNU awk, I am starting a collection of publicly available awk programs. For more information, see ftp://ftp.freefriends.org/arnold/Awkstuff. If you have written an interesting awk program, or have written a gawk extension that you would like to share with the rest of the world, please contact me (arnold@gnu.org). Making things available on the Internet helps keep the gawk distribution down to manageable size.


Acknowledgments

The initial draft of The GAWK Manual had the following acknowledgments:

Many people need to be thanked for their assistance in producing this manual. Jay Fenlason contributed many ideas and sample programs. Richard Mlynarik and Robert Chassell gave helpful comments on drafts of this manual. The paper A Supplemental Document for awk by John W. Pierce of the Chemistry Department at UC San Diego, pinpointed several issues relevant both to awk implementation and to this manual, that would otherwise have escaped us.

I would like to acknowledge Richard M. Stallman, for his vision of a better world and for his courage in founding the FSF and starting the GNU project.

The following people (in alphabetical order) provided helpful comments on various versions of this book, up to and including this edition. Rick Adams, Nelson H.F. Beebe, Karl Berry, Dr. Michael Brennan, Rich Burridge, Claire Coutier, Diane Close, Scott Deifik, Christopher ("Topher") Eliot, Jeffrey Friedl, Dr. Darrel Hankerson, Michal Jaegermann, Dr. Richard J. LeBlanc, Michael Lijewski, Pat Rankin, Miriam Robbins, Mary Sheehan, and Chuck Toporek.

Robert J. Chassell provided much valuable advice on the use of Texinfo. He also deserves special thanks for convincing me not to title this Web page How To Gawk Politely. Karl Berry helped significantly with the TeX part of Texinfo.

I would like to thank Marshall and Elaine Hartholz of Seattle and Dr. Bert and Rita Schreiber of Detroit for large amounts of quiet vacation time in their homes, which allowed me to make significant progress on this Web page and on gawk itself.

Phil Hughes of SSC contributed in a very important way by loaning me his laptop GNU/Linux system, not once, but twice, which allowed me to do a lot of work while away from home.

David Trueman deserves special credit; he has done a yeoman job of evolving gawk so that it performs well and without bugs. Although he is no longer involved with gawk, working with him on this project was a significant pleasure.

The intrepid members of the GNITS mailing list, and most notably Ulrich Drepper, provided invaluable help and feedback for the design of the internationalization features.

Nelson Beebe, Martin Brown, Scott Deifik, Darrel Hankerson, Michal Jaegermann, Jürgen Kahrs, Pat Rankin, Kai Uwe Rommel, and Eli Zaretskii (in alphabetical order) are long-time members of the gawk "crack portability team." Without their hard work and help, gawk would not be nearly the fine program it is today. It has been and continues to be a pleasure working with this team of fine people.

David and I would like to thank Brian Kernighan of Bell Laboratories for invaluable assistance during the testing and debugging of gawk, and for help in clarifying numerous points about the language. We could not have done nearly as good a job on either gawk or its documentation without his help.

Chuck Toporek, Mary Sheehan, and Claire Coutier of O'Reilly & Associates contributed significant editorial help for this Web page for the 3.1 release of gawk.

I must thank my wonderful wife, Miriam, for her patience through the many versions of this project, for her proof-reading, and for sharing me with the computer. I would like to thank my parents for their love, and for the grace with which they raised and educated me. Finally, I also must acknowledge my gratitude to G-d, for the many opportunities He has sent my way, as well as for the gifts He has given me with which to take advantage of those opportunities.

Arnold Robbins
Nof Ayalon
ISRAEL
March, 2001


2. Getting Started with awk

The basic function of awk is to search files for lines (or other units of text) that contain certain patterns. When a line matches one of the patterns, awk performs specified actions on that line. awk keeps processing input lines in this way until it reaches the end of the input files.

Programs in awk are different from programs in most other languages, because awk programs are data-driven; that is, you describe the data you want to work with and then what to do when you find it. Most other languages are procedural; you have to describe, in great detail, every step the program is to take. When working with procedural languages, it is usually much harder to clearly describe the data your program will process. For this reason, awk programs are often refreshingly easy to write and read.

When you run awk, you specify an awk program that tells awk what to do. The program consists of a series of rules. (It may also contain function definitions, an advanced feature that we will ignore for now. See section User-Defined Functions.) Each rule specifies one pattern to search for and one action to perform upon finding the pattern.

Syntactically, a rule consists of a pattern followed by an action. The action is enclosed in curly braces to separate it from the pattern. Newlines usually separate rules. Therefore, an awk program looks like this:

 
pattern { action }
pattern { action }
...
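
As a concrete instance of this layout, here is a minimal sketch of a two-rule program. The three input lines and the patterns /foo/ and /bar/ are made up for illustration; they are not among the sample data files used elsewhere in this Web page.

```shell
# Hypothetical three-line input, fed to awk on its standard input.
# Rule 1 acts on lines containing "foo"; rule 2 on lines containing "bar".
# The line "baz three" matches neither pattern, so nothing is printed for it.
out=$(printf 'foo one\nbar two\nbaz three\n' |
  awk '/foo/ { print "rule 1:", $0 }
       /bar/ { print "rule 2:", $0 }')
echo "$out"
```

Each rule sits on its own line, and awk tests every input line against both patterns in turn.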

2.1 How to Run awk Programs  How to run gawk programs; includes command-line syntax.
2.2 Data Files for the Examples  Sample data files for use in the awk programs illustrated in this Web page.
2.3 Some Simple Examples  A very simple example.
2.4 An Example with Two Rules  A less simple one-line example using two rules.
2.5 A More Complex Example  A more complex example.
2.6 awk Statements Versus Lines  Subdividing or combining statements into lines.
2.7 Other Features of awk  
2.8 When to Use awk  When to use gawk and when to use other things.


2.1 How to Run awk Programs

There are several ways to run an awk program. If the program is short, it is easiest to include it in the command that runs awk, like this:

 
awk 'program' input-file1 input-file2 ...

When the program is long, it is usually more convenient to put it in a file and run it with a command like this:

 
awk -f program-file input-file1 input-file2 ...

This section discusses both mechanisms, along with several variations of each.

2.1.1 One-Shot Throw-Away awk Programs  Running a short throw-away awk program.
2.1.2 Running awk Without Input Files  Using no input files (input from terminal instead).
2.1.3 Running Long Programs  Putting permanent awk programs in files.
2.1.4 Executable awk Programs  Making self-contained awk programs.
2.1.5 Comments in awk Programs  Adding documentation to gawk programs.
2.1.6 Shell Quoting Issues  More discussion of shell quoting issues.


2.1.1 One-Shot Throw-Away awk Programs

Once you are familiar with awk, you will often type in simple programs the moment you want to use them. Then you can write the program as the first argument of the awk command, like this:

 
awk 'program' input-file1 input-file2 ...

where program consists of a series of patterns and actions, as described earlier.

This command format instructs the shell, or command interpreter, to start awk and use the program to process records in the input file(s). There are single quotes around program so the shell won't interpret any awk characters as special shell characters. The quotes also cause the shell to treat all of program as a single argument for awk, and allow program to be more than one line long.
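
The effect of the single quotes can be seen in a small sketch. It assumes a scratch file created with mktemp (a widely available utility, though not one this Web page discusses) and a hypothetical two-line input. The program uses `$1', which awk reads as "the first field of the record"; fields are described in Reading Input Files.

```shell
# Create a throw-away input file with two records (hypothetical data).
tmp=$(mktemp)
printf 'alpha beta\ngamma delta\n' > "$tmp"

# Because the program is in single quotes, the shell passes $1 to awk
# untouched instead of substituting its own first positional parameter.
first=$(awk '{ print $1 }' "$tmp")
echo "$first"

rm -f "$tmp"
```

Had the program been written in double quotes, the shell would have expanded `$1' itself before awk ever saw it.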

This format is also useful for running short or medium-sized awk programs from shell scripts, because it avoids the need for a separate file for the awk program. A self-contained shell script is more reliable because there are no other files to misplace.

Some Simple Examples, later in this chapter, presents several short, self-contained programs.


2.1.2 Running awk Without Input Files

You can also run awk without any input files. If you type the following command line:

 
awk 'program'

awk applies the program to the standard input, which usually means whatever you type on the terminal. This continues until you indicate end-of-file by typing Ctrl-d. (On other operating systems, the end-of-file character may be different. For example, on OS/2 and MS-DOS, it is Ctrl-z.)

As an example, the following program prints a friendly piece of advice (from Douglas Adams's The Hitchhiker's Guide to the Galaxy) to keep you from worrying about the complexities of computer programming (BEGIN is a feature we haven't discussed yet):

 
$ awk "BEGIN { print \"Don't Panic!\" }"
-| Don't Panic!

This program does not read any input. The `\' before each of the inner double quotes is necessary because of the shell's quoting rules--in particular because it mixes both single quotes and double quotes.(6)

This next simple awk program emulates the cat utility; it copies whatever you type at the keyboard to its standard output. (Why this works is explained shortly.)

 
$ awk '{ print }'
Now is the time for all good men
-| Now is the time for all good men
to come to the aid of their country.
-| to come to the aid of their country.
Four score and seven years ago, ...
-| Four score and seven years ago, ...
What, me worry?
-| What, me worry?
Ctrl-d
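
Standard input does not have to be the keyboard. As a brief sketch, the same program can read from a pipe; the use of printf to supply the input is an assumption made for illustration. End-of-file arrives when printf finishes, so no Ctrl-d is needed.

```shell
# The pipe connects printf's output to awk's standard input.
copy=$(printf 'What, me worry?\n' | awk '{ print }')
echo "$copy"
```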


2.1.3 Running Long Programs

Sometimes your awk programs can be very long. In this case, it is more convenient to put the program into a separate file. In order to tell awk to use that file for its program, you type:

 
awk -f source-file input-file1 input-file2 ...

The `-f' instructs the awk utility to get the awk program from the file source-file. Any file name can be used for source-file. For example, you could put the program:

 
BEGIN { print "Don't Panic!" }

into the file `advice'. Then this command:

 
awk -f advice

does the same thing as this one:

 
awk "BEGIN { print \"Don't Panic!\" }"

This was explained earlier (see section Running awk Without Input Files). Note that you don't usually need single quotes around the file name that you specify with `-f', because most file names don't contain any of the shell's special characters. Notice that in `advice', the awk program did not have single quotes around it. The quotes are only needed for programs that are provided on the awk command line.

If you want to identify your awk program files clearly as such, you can add the extension `.awk' to the file name. This doesn't affect the execution of the awk program but it does make "housekeeping" easier.
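
The steps in this section can be sketched end to end as follows. The scratch directory (made with mktemp) and the file name `advice.awk' are assumptions chosen for illustration; any file name works.

```shell
# Work in a scratch directory so nothing is left behind.
dir=$(mktemp -d)

# Write the program to a file. The quoted here-document delimiter keeps
# the shell from interpreting anything inside, so the program text needs
# no extra quoting or backslashes.
cat > "$dir/advice.awk" <<'EOF'
BEGIN { print "Don't Panic!" }
EOF

# Run it with -f; no input files are needed for a BEGIN-only program.
msg=$(awk -f "$dir/advice.awk")
echo "$msg"

rm -rf "$dir"
```

Note that the program in the file is exactly what appeared inside the quotes on the command line, minus the backslashes that the shell required.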


2.1.4 Executable awk Programs

Once you have learned awk, you may want to write self-contained awk scripts, using the `#!' script mechanism. You can do this on many Unix systems(7) as well as on the GNU system. For example, you could update the file `advice' to look like this:

 
#! /bin/awk -f

BEGIN { print "Don't Panic!" }

After making this file executable (with the chmod utility), simply type `advice' at the shell and the system arranges to run awk(8) as if you had typed `awk -f advice':

 
$ chmod +x advice
$ advice
-| Don't Panic!

Self-contained awk scripts are useful when you want to write a program that users can invoke without their having to know that the program is written in awk.
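
Because awk is not installed as `/bin/awk' on every system, the following sketch looks up the location with the shell's `command -v' and writes it into the `#!' line. It assumes that awk is on your search path, that mktemp is available, and that the scratch directory permits executing files; all of these are illustrative choices rather than requirements of awk.

```shell
dir=$(mktemp -d)
awk_path=$(command -v awk)       # for example /bin/awk or /usr/bin/awk

# First line: the #! line with exactly one argument, -f.
printf '#! %s -f\n' "$awk_path" > "$dir/advice"

# Remaining lines: the awk program itself.
cat >> "$dir/advice" <<'EOF'
BEGIN { print "Don't Panic!" }
EOF

chmod +x "$dir/advice"
advice_out=$("$dir/advice")      # run the script by name, not via awk -f
echo "$advice_out"

rm -rf "$dir"
```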

Advanced Notes: Portability Issues with `#!'

Some systems limit the length of the interpreter name to 32 characters. Often, this can be dealt with by using a symbolic link.

You should not put more than one argument on the `#!' line after the path to awk. It does not work. The operating system treats the rest of the line as a single argument and passes it to awk. Doing this leads to confusing behavior--most likely a usage diagnostic of some sort from awk.

Finally, the value of ARGV[0] (see section 7.5 Built-in Variables) varies depending upon your operating system. Some systems put `awk' there, some put the full pathname of awk (such as `/bin/awk'), and some put the name of your script (`advice'). Don't rely on the value of ARGV[0] to provide your script name.
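
To see what your own system stores there, a one-line check such as the following sketch can help. Its output is deliberately not shown here, since it differs among awk implementations and operating systems.

```shell
# Print whatever this awk places in ARGV[0]; the value is system-dependent.
name=$(awk 'BEGIN { print ARGV[0] }')
echo "ARGV[0] is: $name"
```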


2.1.5 Comments in awk Programs

A comment is some text that is included in a program for the sake of human readers; it is not really an executable part of the program. Comments can explain what the program does and how it works. Nearly all programming languages have provisions for comments, as programs are typically hard to understand without them.

In the awk language, a comment starts with the sharp sign character (`#') and continues to the end of the line. The `#' does not have to be the first character on the line. The awk language ignores the rest of a line following a sharp sign. For example, we could have put the following into `advice':

 
# This program prints a nice friendly message.  It helps
# keep novice users from being afraid of the computer.
BEGIN    { print "Don't Panic!" }
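
As a short sketch of a comment that does not start at the beginning of a line, the following program places one after a statement. The message text is made up for illustration, and the comments intentionally contain no apostrophes because the program is enclosed in single quotes.

```shell
# awk ignores everything from each sharp sign to the end of that line.
greeting=$(awk '
# A comment on a line of its own.
BEGIN { print "hello" }   # A comment following a statement.
')
echo "$greeting"
```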

You can put comment lines into keyboard-composed throw-away awk programs, but this usually isn't very useful; the purpose of a comment is to help you or another person understand the program when reading it at a later time.

Caution: As mentioned in One-Shot Throw-Away awk Programs, you can enclose small to medium programs in single quotes, in order to keep your shell scripts self-contained. When doing so, don't put an apostroph