The symbol sentence-end is bound to the pattern that marks the
end of a sentence. What should this regular expression be?
Clearly, a sentence may be ended by a period, a question mark, or an exclamation mark. Indeed, only clauses that end with one of those three characters should be considered the end of a sentence. This means that the pattern should include the character set:
     [.?!]
However, we do not want
forward-sentence merely to jump to a
period, a question mark, or an exclamation mark, because such a character
might be used in the middle of a sentence. A period, for example, is
used after abbreviations. So other information is needed.
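A minimal sketch of the problem just described, using the standard function string-match, which returns the index of the first match or nil. The helper name my-first-end-char is hypothetical, introduced only for illustration.

```elisp
;; Hypothetical helper: search TEXT for the character set [.?!] alone.
;; `string-match' returns the index where the match starts, or nil.
(defun my-first-end-char (text)
  "Return the index of the first end-of-sentence mark in TEXT, or nil."
  (string-match "[.?!]" text))

(my-first-end-char "Is it done?  Yes.")  ; => 10, the question mark
(my-first-end-char "Mr. Smith left.")    ; => 2, the period after "Mr"
```

The second call stops at the period of the abbreviation, which is exactly the false sentence end that the character set alone cannot rule out.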
According to convention, you type two spaces after every sentence, but only one space after a period, a question mark, or an exclamation mark in the body of a sentence. So a period, a question mark, or an exclamation mark followed by two spaces is a good indicator of an end of sentence. However, in a file, the two spaces may instead be a tab or the end of a line. This means that the regular expression should include these three items as alternatives.
This group of alternatives will look like this:
     \\($\\| \\|  \\)
            ^   ^^
           TAB  SPC
$ indicates the end of the line, and I have pointed out
where the tab and two spaces are inserted in the expression. Both are
inserted by putting the actual characters into the expression.
Two backslashes, \\, are required before the parentheses and
vertical bars: the first backslash quotes the following backslash in
the Lisp string; and the second indicates that the following character,
the parenthesis or the vertical bar, is special.
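The following sketch checks both points above: how many characters the Lisp reader actually puts into a string written with two backslashes, and what the group of alternatives matches. It writes the tab with the string escape \t instead of a literal tab character, which the reader treats the same way; the name my-sentence-gap is hypothetical.

```elisp
;; The Lisp reader turns "\\(" into the two characters \ and (.
(length "\\(")                          ; => 2

;; End of line, a tab (written \t), or two spaces, as alternatives.
(defconst my-sentence-gap "\\($\\|\t\\|  \\)"
  "Hypothetical name for the group of alternatives after a sentence.")

;; The period after "Mr" is followed by only one space, so it is skipped.
(string-match (concat "[.?!]" my-sentence-gap) "Mr. Smith left.  Then") ; => 14
(string-match (concat "[.?!]" my-sentence-gap) "Done.\tNext")           ; => 4
(string-match (concat "[.?!]" my-sentence-gap) "Done.")                 ; => 4
```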
Also, a sentence may be followed by any number of carriage returns, like this:
     [
     ]*
Like tabs and spaces, a carriage return is inserted into a regular expression by inserting it literally. The asterisk indicates that the <RET> is repeated zero or more times.
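A small check of the repetition, again with string-match. Here "\n" in the Lisp string stands for the newline character that the text inserts literally, and the pattern is only an illustrative fragment, not the full sentence-end value.

```elisp
;; [\n]* : a set containing only the newline, repeated zero or more times.
(string-match "[.?!][\n]*x" "end.\n\nx")  ; => 3
(match-end 0)                            ; => 7, both newlines consumed
(string-match "[.?!][\n]*x" "end.x")      ; => 3, zero newlines also match
```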
But a sentence end does not consist only of a period, a question mark, or an exclamation mark followed by appropriate space: a closing quotation mark or a closing parenthesis or brace may precede the space. Indeed, more than one such mark may precede the space. These require an expression that looks like this:
     []\"')}]*
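In this expression, the first ] is the first character inside the
brackets, which makes it stand for a literal closing bracket instead of
ending the set; the second character is ", which is preceded by a \ to
tell Emacs the " is not special (unescaped, it would end the Lisp
string). The last three characters inside the brackets are ', ), and }.
The asterisk allows any number of these marks, including none.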
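To see the closing-marks set at work, the sketch below combines it with the end-of-sentence character and two spaces. The constant my-closers and the helper my-end-with-closers are hypothetical names used only for illustration.

```elisp
;; `]' is literal because it comes first inside the brackets;
;; \" is the Lisp string escape for a double quote.
(defconst my-closers "[]\"')}]*"
  "Hypothetical name: zero or more closing quotes or brackets.")

(defun my-end-with-closers (text)
  "Return the end mark, any closers, and two spaces from TEXT, or nil."
  (when (string-match (concat "[.?!]" my-closers "  ") text)
    (match-string 0 text)))

(my-end-with-closers "He said \"Stop!\")  Then")  ; => "!\")  "
```

Both the quotation mark and the parenthesis are absorbed by the starred set before the two spaces are required.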
All this suggests what the regular expression pattern for matching the
end of a sentence should be; and, indeed, if we evaluate
sentence-end we find that it returns the following value:
     sentence-end
          => "[.?!][]\"')}]*\\($\\|	\\|  \\)[
     ]*"
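The assembled pattern can be tried with re-search-forward in a scratch buffer. One caveat, which is my understanding rather than something this text covers: newer Emacs versions (22 onward) leave the variable sentence-end nil by default and compute the regexp with a function of the same name. So this sketch keeps its own copy of the value shown above under the hypothetical name my-sentence-end, writing the tab and newline as \t and \n. The helper my-sentence-end-positions is also hypothetical.

```elisp
;; Hypothetical copy of the pattern built up above.
(defconst my-sentence-end "[.?!][]\"')}]*\\($\\|\t\\|  \\)[\n]*"
  "The sentence-end pattern from the text, under a hypothetical name.")

(defun my-sentence-end-positions (text)
  "Return the buffer positions just after each sentence end in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (positions)
      ;; With NOERROR t, `re-search-forward' returns nil when no match is left.
      (while (re-search-forward my-sentence-end nil t)
        (push (point) positions))
      (nreverse positions))))

(my-sentence-end-positions "Mr. Smith left.  He said \"Bye!\"\nDone.")
;; => (18 33 38)
```

The period after "Mr" is skipped because only one space follows it; the second match runs through the closing quotation mark and the newline, and the last one ends at the end of the buffer, where $ matches.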