sort: Sort text files
sort sorts, merges, or compares all the lines from the given files, or standard input if none are given or for a file of `-'. By default, sort writes the results to standard output. Synopsis:
sort [option]... [file]...
sort has three modes of operation: sort (the default), merge, and check for sortedness. The `--check' (`-c') and `--merge' (`-m') options change the operation mode.
A pair of lines is compared as follows: if any key fields have been specified, sort compares each pair of fields, in the order specified on the command line, according to the associated ordering options, until a difference is found or no fields are left. Unless otherwise specified, all comparisons use the character collating sequence specified by the LC_COLLATE locale. (1)
If any of the global options `bdfgiMnr' are given but no key fields are specified, sort compares the entire lines according to the global options.
Finally, as a last resort when all keys compare equal (or if no ordering options were specified at all), sort compares the entire lines. The last resort comparison honors the `--reverse' (`-r') global option. The `--stable' (`-s') option disables this last-resort comparison so that lines in which all fields compare equal are left in their original relative order. If no fields or global options are specified, `--stable' (`-s') has no effect.
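The effect of the last-resort comparison can be seen on a small input. The following is a minimal sketch, assuming a sort that behaves as described above; the sample lines are made up for illustration, and `LC_ALL=C' is set only to make the byte ordering predictable.

```shell
# Two of the three lines have the same first field.
input='b 2
b 1
a 3'

# Without -s, the tie on field 1 is broken by comparing the entire lines.
last_resort=$(printf '%s\n' "$input" | LC_ALL=C sort -k 1,1)

# With -s, the tied lines keep their original relative order.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k 1,1)

printf '%s\n---\n%s\n' "$last_resort" "$stable"
```

In the first result `b 1' precedes `b 2'; in the second the input order `b 2', `b 1' is preserved.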
GNU sort (as specified for all GNU utilities) has no limits on input line length or restrictions on bytes allowed within lines. In addition, if the final byte of an input file is not a newline, GNU sort silently supplies one. A line's trailing newline is not part of the line for comparison purposes.
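As a quick check of the supplied newline, the sketch below counts newline bytes before and after sorting. It assumes the GNU behavior described above; the input is arbitrary.

```shell
# The input's last line lacks a trailing newline: one newline byte in total.
in_lines=$(printf 'b\na' | wc -l | tr -d ' ')

# After sorting, every output line is newline-terminated.
out_lines=$(printf 'b\na' | LC_ALL=C sort | wc -l | tr -d ' ')

echo "newlines in: $in_lines, newlines out: $out_lines"
```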
Upon any error, sort exits with a status of `2'.
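A script can test for this status. The sketch below provokes an error by naming an input file that is assumed not to exist; the path is hypothetical.

```shell
# Hypothetical path, assumed not to exist on this system.
missing=/nonexistent/sort-demo-input

# Capture the exit status without aborting the script.
status=0
LC_ALL=C sort "$missing" 2>/dev/null || status=$?

echo "exit status: $status"
```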
If the environment variable TMPDIR is set, sort uses its value as the directory for temporary files instead of `/tmp'. The `--temporary-directory' (`-T') option in turn overrides the environment variable.
The following options affect the ordering of output lines. They may be specified globally or as part of a specific key field. If no key fields are specified, global options apply to comparison of entire lines; otherwise the global options are inherited by key fields that do not specify any special options of their own. In pre-POSIX versions of sort, global options affect only later key fields, so portable shell scripts should specify global options first.
`--ignore-leading-blanks' (`-b'): The LC_CTYPE locale determines character types.
`--dictionary-order' (`-d'): The LC_CTYPE locale determines character types.
`--ignore-case' (`-f'): The LC_CTYPE locale determines character types.
`--general-numeric-sort' (`-g'): Sort numerically, using the standard C function strtod to convert a prefix of each line to a double-precision floating point number. This allows floating point numbers to be specified in scientific notation, like 1.0e-34 and 10e100. The LC_NUMERIC locale determines the decimal-point character. Do not report overflow, underflow, or conversion errors.
Use the following collating sequence:
Use this option only if there is no alternative; it is much slower than `--numeric-sort' (`-n') and it can lose information when converting to floating point.
`--ignore-nonprinting' (`-i'): The LC_CTYPE locale determines character types.
`--month-sort' (`-M'): The LC_TIME locale category determines the month spellings.
`--numeric-sort' (`-n'): The LC_NUMERIC locale specifies the decimal-point character and thousands separator.
Numeric sort uses what might be considered an unconventional method to compare strings representing floating point numbers. Rather than first converting each string to the C double type and then comparing those values, sort aligns the decimal-point characters in the two strings and compares the strings a character at a time. One benefit of using this approach is its speed. In practice this is much more efficient than performing the two corresponding string-to-double (or even string-to-integer) conversions and then comparing doubles. In addition, there is no corresponding loss of precision. Converting each string to double before comparison would limit precision to about 16 digits on most systems.
Neither a leading `+' nor exponential notation is recognized. To compare such strings numerically, use the `--general-numeric-sort' (`-g') option.
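The difference between the two numeric modes shows up as soon as the input contains exponential notation. This is a small sketch with made-up input, run in the C locale so that the decimal-point conventions are fixed.

```shell
input='1e3
20
3'

# -n stops reading the number at the "e", so 1e3 compares as 1.
numeric=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# -g converts with strtod, so 1e3 compares as 1000.
general=$(printf '%s\n' "$input" | LC_ALL=C sort -g)

printf '%s\n---\n%s\n' "$numeric" "$general"
```

With `-n' the line `1e3' sorts first; with `-g' it sorts last.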
Other options are:
`-o output-file': sort reads input before opening output-file, so you can safely sort a file in place by using commands like sort -o F F and cat F | sort -o F. On newer systems, `-o' cannot appear after an input file if POSIXLY_CORRECT is set, e.g., `sort F -o F'. Portable scripts should specify `-o output-file' before any input files.
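An in-place sort along these lines might look as follows. The file is a scratch file created with mktemp, and its contents are arbitrary sample data.

```shell
# Scratch file with unsorted sample data.
f=$(mktemp)
printf 'pear\napple\nfig\n' > "$f"

# -o precedes the input file, as recommended for portable scripts.
LC_ALL=C sort -o "$f" "$f"

sorted=$(cat "$f")
rm -f "$f"
printf '%s\n' "$sorted"
```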
`--buffer-size=size' (`-S size'): This option can improve the performance of sort by causing it to start with a larger or smaller sort buffer than the default. However, this option affects only the initial buffer size. The buffer grows beyond size if sort encounters input lines larger than size.
`--field-separator=separator' (`-t separator'): By default, given the input line ` foo bar', sort breaks it into fields ` foo' and ` bar'. The field separator is not considered to be part of either the field preceding or the field following. But note that sort fields that extend to the end of the line, as `-k 2', or sort fields consisting of a range, as `-k 2,3', retain the field separators present between the endpoints of the range.
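Because default field splitting leaves the leading blanks inside each field, they take part in the comparison unless the `b' modifier is used. The sketch below uses made-up input and the C locale, where a space sorts before any letter.

```shell
input='a  z
a b'

# Field 2 keeps its leading blanks: "  z" sorts before " b".
with_blanks=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2)

# The b modifier starts the key at the first nonblank: "b" sorts before "z".
without_blanks=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2b)

printf '%s\n---\n%s\n' "$with_blanks" "$without_blanks"
```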
`--temporary-directory' (`-T'): Use the given directory to store temporary files, overriding the TMPDIR environment variable. If this option is given more than once, temporary files are stored in all the directories given. If you have a large sort or merge that is I/O-bound, you can often improve performance by using this option to specify directories on different disks and controllers.
`--unique' (`-u'): Normally, output only the first of a sequence of lines that compare equal. For the `--check' (`-c') option, check that no pair of consecutive lines compares equal.
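Both uses can be tried on tiny inputs. This is a sketch with made-up data; it assumes, as GNU sort does, that a failed check is reported through a nonzero exit status.

```shell
# Sorting with -u: each distinct line is output once.
unique=$(printf 'b\na\nb\na\n' | LC_ALL=C sort -u)
printf '%s\n' "$unique"

# Checking: a repeated line passes plain -c ...
plain=0
printf 'a\na\n' | LC_ALL=C sort -c 2>/dev/null || plain=$?

# ... but fails -c combined with -u, which requires strictly increasing lines.
strict=0
printf 'a\na\n' | LC_ALL=C sort -c -u 2>/dev/null || strict=$?

echo "plain check: $plain, strict check: $strict"
```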
Historical (BSD and System V) implementations of sort have differed in their interpretation of some options, particularly `-b', `-f', and `-n'. GNU sort follows the POSIX behavior, which is usually (but not always!) like the System V behavior. According to POSIX, `-n' no longer implies `-b'. For consistency, `-M' has been changed in the same way. This may affect the meaning of character positions in field specifications in obscure cases. The only fix is to add an explicit `-b'.
A position in a sort field specified with the `-k' option has the form `f.c', where f is the number of the field to use and c is the number of the first character from the beginning of the field. In a start position, an omitted `.c' stands for the field's first character. In an end position, an omitted or zero `.c' stands for the field's last character. If the `-b' option was specified, the `.c' part of a field specification is counted from the first nonblank character of the field.
A sort key position may also have any of the option letters `Mbdfinr' appended to it, in which case the global ordering options are not used for that particular field. The `-b' option may be independently attached to either or both of the start and end positions of a field specification, and if it is inherited from the global options it will be attached to both. Keys may span multiple fields.
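To make the `f.c' notation concrete, the sketch below sorts on a single character inside a field. The input lines are made up, and `:' is chosen as the separator only for the example.

```shell
input='a:xc
b:xa
c:xb'

# The key is the second character of field 2 only: "c", "a", "b".
by_char=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2.2,2.2)

printf '%s\n' "$by_char"
```

The output order follows that one character, not the first field.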
On older systems, sort supports an obsolete origin-zero syntax `+pos1 [-pos2]' for specifying sort keys. POSIX 1003.1-2001 (see section 2.5 Standards conformance) does not allow this; use `-k' instead.
Here are some examples to illustrate various combinations of options.
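Sort in descending (reverse) numeric order.
sort -nr
Sort on a single key that begins at the third field and extends to the end of each line.
sort -k 3
Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five, using `:' as the field delimiter.
sort -t : -k 2,2n -k 5.3,5.4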
Note that if you had written `-k 2' instead of `-k 2,2', sort would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.
Also note that the `n' modifier was applied to the field-end specifier for the first key. It would have been equivalent to specify `-k 2n,2' or `-k 2n,2n'. All modifiers except `b' apply to the associated field, regardless of whether the modifier character is attached to the field-start and/or the field-end part of the key specifier.
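The way an open-ended key such as `-k 2' runs on to the end of the line can be observed with a non-numeric key, where the extra characters visibly change the result. This is a sketch on made-up input, not one of the manual's own examples.

```shell
input='b:x:1
a:x:2'

# -k 2 compares "x:1" with "x:2", so the later key is never consulted.
open_ended=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2 -k 1,1)

# -k 2,2 compares "x" with "x", a tie that the key on field 1 resolves.
bounded=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2,2 -k 1,1)

printf '%s\n---\n%s\n' "$open_ended" "$bounded"
```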
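Sort the password file on the fifth field, ignoring any leading white space, and sort lines with equal values in field five on the numeric user ID in field three.
sort -t : -k 5b,5 -k 3,3n /etc/passwd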
An alternative is to use the global numeric modifier `-n'.
sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
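To see what the `b' modifier contributes in the first password-file command above, it can be run against a few passwd-style lines. The entries below are invented sample data, not taken from a real /etc/passwd; one of them has a leading space in field five.

```shell
# Invented passwd-style sample; carol's fifth field starts with a blank.
input='carol:x:1002:1002: Ann:/home/carol:/bin/sh
bob:x:1001:1001:Bob:/home/bob:/bin/sh
alice:x:1000:1000:Ann:/home/alice:/bin/sh'

# With b, " Ann" and "Ann" tie, and the numeric user ID decides.
with_b=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 5b,5 -k 3,3n | cut -d : -f 1)

# Without b, the leading blank makes " Ann" sort before "Ann".
without_b=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 5,5 -k 3,3n | cut -d : -f 1)

printf '%s\n---\n%s\n' "$with_b" "$without_b"
```

Only the login names are printed, to keep the two orderings easy to compare.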
Generate a tags file in case-insensitive sorted order.
find src -type f -print0 | sort -t / -z -f | xargs -0 etags --append
The use of `-print0', `-z', and `-0' in this case means that pathnames that contain Line Feed characters will not get broken up by the sort operation.
Finally, returning to the password file example, to ignore both leading and trailing white space, you could have applied the `b' modifier to the field-end specifier for the first key:
sort -t : -n -k 5b,5b -k 3,3 /etc/passwd
or used the global `-b' modifier instead of `-n', with an explicit `n' on the second key specifier:
sort -t : -b -k 5,5 -k 3,3n /etc/passwd