【免费】awk.cheat.sheet资源-CSDN文库

需积分: 0 66 浏览量更新于2009-08-30 收藏 63KB PDF 举报

### AWK 快速参考指南 #### 一、概述 AWK 是一款强大的文本处理工具，由 Alfred V. Aho、Peter J. Weinberger 和 Brian W. Kernighan 在 20 世纪 70 年代末共同开发。AWK 语言的主要功能在于能够方便地对数据进行提取、分析和报告生成。它特别适用于处理结构化的文本数据，例如日志文件、CSV 文件等。AWK 支持多种变量类型，并且具有丰富的内置函数集合，能够满足大多数文本处理需求。 #### 二、预定义变量概览在 AWK 中，预定义变量是系统自动设置的一些变量，用于控制或获取程序运行时的信息。下面将详细介绍 AWK 中常见的预定义变量及其作用： 1. **FS (Field Separator)**：输入字段分隔符，默认为空格。这个变量决定了 AWK 如何将一行文本分割成多个字段。 - 支持情况：AWK、NAWK、GAWK 2. **OFS (Output Field Separator)**：输出字段分隔符，默认为空格。这个变量决定了 AWK 在输出时如何连接不同的字段。 - 支持情况：AWK、NAWK、GAWK 3. **NF (Number of Fields)**：当前输入记录中的字段数量。该变量反映了当前行被分割后的字段总数。 - 支持情况：AWK、NAWK、GAWK 4. **NR (Number of Records)**：已读取的总记录数。该变量记录了自程序开始执行以来读取的所有行数。 - 支持情况：AWK、NAWK、GAWK 5. **RS (Record Separator)**：记录分隔符，默认为换行符。该变量定义了 AWK 将哪些字符视为记录的边界。 - 支持情况：AWK、NAWK、GAWK 6. **ORS (Output Record Separator)**：输出记录分隔符，默认为换行符。该变量定义了 AWK 在输出记录时使用的分隔符。 - 支持情况：AWK、NAWK、GAWK 7. **FILENAME**：当前输入文件的名称。如果没有指定文件，则该值为 "-"。需要注意的是，在 BEGIN 块中 FILENAME 是未定义的（除非通过 getline 设置）。 - 支持情况：AWK、NAWK、GAWK 8. **ARGC**：命令行参数的数量（不包括 gawk 选项或程序源）。通过动态改变 ARGV 的内容可以控制所使用的数据文件。 - 支持情况：仅 GAWK 支持 9. **ARGV**：一个包含命令行参数的数组。数组从 0 到 ARGC-1 进行索引。 - 支持情况：仅 GAWK 支持 10. **ARGIND**：当前正在处理的文件在 ARGV 数组中的索引。 - 支持情况：仅 GAWK 支持 11. **BINMODE**：在非 POSIX 系统上，用于指定所有文件 I/O 的“二进制”模式。数值 1、2 或 3 分别表示输入文件、输出文件或所有文件应使用二进制 I/O；字符串 "r" 或 "w" 分别表示输入文件或输出文件应使用二进制 I/O；字符串 "rw" 或 "wr" 表示所有文件应使用二进制 I/O；其他任何字符串值都将被视为 "rw"，但会生成警告消息。 - 支持情况：仅 GAWK 支持 12. **CONVFMT**：用于指定数字转换为字符串时的格式，默认为 "%.6g"。 - 支持情况：仅 GAWK 支持 13. **ENVIRON**：一个包含当前环境变量值的数组。 - 支持情况：仅 GAWK 支持 14. **ERRNO**：如果发生系统错误（如重定向 getline、读取 getline 或 close() 期间），则 ERRNO 将包含描述错误的字符串。 - 支持情况：仅 GAWK 支持 15. **FIELDWIDTHS**：字段宽度的空格分隔列表。设置后，gawk 将按照固定宽度解析输入，而不是使用 FS 变量作为字段分隔符。 - 支持情况：仅 GAWK 支持 16. **FNR (File Number of Records)**：包含了读取的行数，但对于每个读取的文件都会重置。 - 支持情况：NAWK、GAWK 17. **IGNORECASE**：控制所有正则表达式和字符串操作的大小写敏感性。如果 IGNORECASE 的值非零，则字符串比较和模式匹配将忽略大小写。 - 支持情况：AWK、NAWK、GAWK 以上这些预定义变量为 AWK 用户提供了极大的灵活性和便利性，使得用户能够在处理数据时更加精确地控制流程和结果。在实际使用过程中，了解并熟练掌握这些变量的用法是非常重要的。 ### 三、应用实例为了更好地理解这些预定义变量的应用场景，我们可以通过一些简单的示例来演示它们的实际使用方法。 #### 示例 1：统计文件中每行的字段数量假设有一个文件 `data.txt`，其内容如下： ``` John Doe 25 Jane Smith 30 ``` 我们可以使用以下命令来统计每行的字段数量： ```bash awk '{print NF}' data.txt ``` 输出结果将是： ``` 3 3 ``` 这里使用了预定义变量 `NF` 来表示每行的字段数量。 #### 示例 2：使用自定义分隔符输出记录假设 `data.csv` 文件内容如下： ``` name,age,email John Doe,25,john@example.com Jane Smith,30,jane@example.com ``` 我们可以使用以下命令来使用逗号作为分隔符并输出所有记录： ```bash awk -F',' '{print $1,$2,$3}' data.csv ``` 输出结果将是： ``` name age email John Doe 25 john@example.com Jane Smith 30 jane@example.com ``` 这里使用了 `-F` 选项来设置 `FS` 变量，并通过 `$1`, `$2`, `$3` 引用各个字段。 #### 四、总结 AWK 作为一种强大的文本处理工具，其预定义变量在实际应用中扮演着非常重要的角色。通过对这些变量的理解和掌握，用户可以更灵活地处理各种复杂的文本数据，并实现自动化脚本编写，从而大大提高工作效率。希望本文的介绍能帮助大家更好地理解和运用 AWK 预定义变量。

AWK (Aho, Kernighan, and Weinberger) Summary

Predefined Variable Summary:

Support:

Variable Description

AWK NAWK GAWK

Input Field Separator, a space by default.

  

OFS

Output Field Separator, a space by default.

  

The Number of Fields in the current input record.



The total Number of input Records seen so far.

  

Record Separator, a newline by default.

  

ORS

Output Record Separator, a newline by default.

  

FILENAME

The name of the current input file. If no files are specified on the command line,

the value of FILENAME is "-". However, FILENAME is undefined inside the

BEGIN block (unless set by getline).

 



ARGC

The number of command line arguments (does not include options to gawk, or the

program source). Dynamically changing the contents of

ARGV

control the files

used for data.

  

ARGV

Array of command line arguments. The array is indexed from 0 to ARGC - 1.

  

ARGIND

The index in ARGV of the current file being processed.

  

BINMODE

On non-POSIX systems, specifies use of "binary" mode for all file I/O.

Numeric values of 1, 2, or 3, specify that input files, output files, or all files,

respectively, should use binary I/O. String values of "r", or "w" specify that input

files, or output files, respectively, should use binary I/O. String values of "rw" or

"wr" specify that all files should use binary I/O. Any other string value is treated as

"rw", but generates a warning message.





CONVFMT

The CONVFMT variable is used to specify the format when converting a number

to a string. Default: "

%.6g

  

ENVIRON

An array containing the values of the current environment.

  

ERRNO

If a system error occurs either doing a redirection for getline, during a read for

getline, or during a close(), then ERRNO will contain a string describing the error.

The value is subject to translation in non-English locales.

  

FIELDWIDTHS

A white-space separated list of fieldwidths. When set, gawk parses the input into

fields of fixed width, instead of using the value of the FS variable as the field

separator.

  

FNR

Contains number of lines read, but is reset for each file read.

  

IGNORECASE

Controls the case-sensitivity of all regular expression and string operations. If

IGNORECASE has a non-zero value, then string comparisons and pattern

matching in rules, field splitting with

, record separating with RS, regular

expression matching with ~ and !~, and the gensub(), gsub(), index(), match(),

split(), and sub() built-in functions all ignore case when doing regular expression

operations. NOTE: Array subscripting is not affected. However, the asort() and

asorti() functions are affected.

  

LINT

Provides dynamic control of the --lint option from within an AWK program. When

true, gawk prints lint warnings.

  

OFMT

The default output format for numbers. Default: "%.6g"

  

PROCINFO

The elements of this array provide access to information about the running AWK

program.

PROCINFO["egid"] the value of the getegid(2) system call.

PROCINFO["euid"] the value of the geteuid(2) system call.

PROCINFO["FS"] "FS" if field splitting with FS is in effect,

or "FIELDWIDTHS" if field splitting with

FIELDWIDTHS is in effect.

PROCINFO["gid"] the value of the getgid(2) system call.

PROCINFO["pgrpid"] the process group ID of the current process.

PROCINFO["pid"] the process ID of the current process.

PROCINFO["ppid"] the parent process ID of the current process.

PROCINFO["uid"] the value of the getuid(2) system call.

  

The record terminator. Gawk sets RT to the input text that matched the character or

regular expression specified by RS.

  

RSTART

The index of the first character matched by match(); 0 if no match.

  

RLENGTH

The length of the string matched by match(); -1 if no match.

  

SUBSEP

The character used to separate multiple subscripts in array elements.

Default: "

\034

" (non-printable character, dec: 28, hex: 1C)

  

TEXTDOMAIN

The text domain of the AWK program; used to find the localized translations for the

program's strings.





Variable is supported: 

Variable is not supported:



http://www.catonmat.net good coders code, great reuse

下载后可阅读完整内容，剩余3页未读，立即下载

资源推荐

资源评论

winneryong

粉丝: 0
资源: 3

awk.cheat.sheet

思维导图-awk (开发你的大脑)

cheat-sheet

awk.cheat.sheet.pdf_awk_

english_cheat_sheet

numpy cheat sheet

sed-awk-cheatsheet：您可以使用sed和awk做的事情

开源项目-dufferzafar-cheat.zip

cheatsheets:非常有用的备忘单

Memento-CheatSheet

Quick-CheatSheet:快速命令备忘单，您可以直接将open导入到您的笔记本中

vim cheat sheet

auth_cheat_sheet

Google cheat sheet

javascript-cheat-sheet

XSS cheat sheet

Privilege-Escalation:此备忘单针对 CTF 玩家和初学者，通过示例帮助他们了解权限提升的基础知识

cheatsheets

备忘单

速查表：开发者速查表CheatSheets

plotly_r_cheat_sheet

chord-cheat-sheet

bash cheat sheet

R cheat sheet

pandas_cheat_sheet

shell cheat sheet

COST_cheat-sheet

Anaconda cheat sheet

最新资源