

[The Programmer's Seven Weapons] Regular Expressions

Commonly used PCRE rules:

Modifiers:

  • m

    Treat string as multiple lines. That is, change “^” and “$” from matching the start or end of the string to matching the start or end of any line anywhere within the string.

  • s

    Treat string as single line. That is, change “.” to match any character whatsoever, even a newline, which normally it would not match.

    Used together, as /ms, they let the “.” match any character whatsoever, while still allowing “^” and “$” to match, respectively, just after and just before newlines within the string.

  • i

    Do case-insensitive pattern matching.

    If "use locale" is in effect, the case map is taken from the current locale. See perllocale.

  • x

    Extend your pattern’s legibility by permitting whitespace and comments.

  • p

    Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} are available for use after matching.

  • g and c

    Global matching, and keep the Current position after failed matching. Unlike i, m, s and x, these two flags affect the way the regex is used rather than the regex itself. See "Using regular expressions in Perl" in perlretut for further explanation of the g and c modifiers.
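The same modifiers can be tried out in Python's standard re module, used here as a stand-in for Perl (an assumption of these examples; the flag names are re.M, re.S, re.I and re.X). Python has no p, g or c flags, so the sketch shows the usual equivalents: slicing the subject around the match, re.findall, and a hypothetical scan() helper that passes the position explicitly instead of relying on Perl's pos().

```python
import re

text = "first line\nSecond line\nthird line"

# m (re.M): ^ and $ also match at the start/end of every line.
print(re.findall(r"^\w+", text))                    # ['first']
print(re.findall(r"^\w+", text, re.M))              # ['first', 'Second', 'third']

# s (re.S): '.' also matches a newline; combined with re.M this is Perl's /ms.
print(re.findall(r"^f.*line$", text, re.M))         # ['first line']
print(re.findall(r"^f.*line$", text, re.M | re.S))  # ['first line\nSecond line\nthird line']

# i (re.I): case-insensitive matching.
print(re.findall(r"^s\w+", text, re.M | re.I))      # ['Second']

# x (re.X): whitespace and # comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
""", re.X)
print(date.fullmatch("2009-07-15").groups())        # ('2009', '07', '15')

# p: a Python match object always knows its span, so the analogues of
# ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} are plain slices.
m = re.search(r"Second", text)
pre, match, post = text[:m.start()], m.group(0), text[m.end():]
print(repr(pre), repr(match), repr(post))  # 'first line\n' 'Second' ' line\nthird line'

# g and c: findall/finditer return every match (like /g). Python keeps no
# hidden pos(), so this hypothetical scanner tracks pos itself; a failed
# attempt leaves pos unchanged, which is what /c guarantees in Perl.
def scan(src):
    rules = [("NUM", re.compile(r"\d+")),
             ("WORD", re.compile(r"[a-z]+")),
             ("SPACE", re.compile(r"\s+"))]
    pos, tokens = 0, []
    while pos < len(src):
        for kind, rx in rules:
            m = rx.match(src, pos)  # anchored at pos, similar to Perl's \G
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens

print(scan("abc 123"))  # [('WORD', 'abc'), ('SPACE', ' '), ('NUM', '123')]
```

The scanner mirrors the common Perl idiom of looping over several /\G.../gc patterns; in Python the explicit pos argument to Pattern.match does the same job.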

Metacharacters:

    \	Quote the next metacharacter
    ^	Match the beginning of the line
    .	Match any character (except newline)
    $	Match the end of the line (or before newline at the end)
    |	Alternation
    ()	Grouping
    []	Character class
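A quick check of each metacharacter, again using Python's re as a stand-in for Perl; these seven behave the same way there, including $ matching just before a trailing newline.

```python
import re

# \ quotes a metacharacter: "\." is a literal dot, "." alone is any char.
print(re.findall(r"\d+\.\d+", "pi=3.14, e=2x71"))     # ['3.14']

# . matches any character except a newline.
print(re.findall(r"a.c", "abc a\nc a-c"))             # ['abc', 'a-c']

# ^ ... $ anchor the whole subject; () groups the | alternation.
pet = re.compile(r"^(cat|dog)s?$")
print([w for w in ["cat", "dogs", "cats!", "bird"] if pet.match(w)])  # ['cat', 'dogs']

# $ also matches just before a newline at the end of the string.
print(bool(re.search(r"end$", "the end\n")))          # True

# [] is a character class: any one of the listed characters.
print(re.findall(r"[aeiou]", "regex"))                # ['e', 'e']
```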

Quantifiers:

    *	   Match 0 or more times
    +	   Match 1 or more times
    ?	   Match 1 or 0 times
    {n}    Match exactly n times
    {n,}   Match at least n times
    {n,m}  Match at least n but not more than m times
    *?     Match 0 or more times, not greedily
    +?     Match 1 or more times, not greedily
    ??     Match 0 or 1 time, not greedily
    {n}?   Match exactly n times, not greedily
    {n,}?  Match at least n times, not greedily
    {n,m}? Match at least n but not more than m times, not greedily
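Greedy versus non-greedy is easiest to see side by side. Python's re (a stand-in for Perl here) uses the same quantifier syntax, so this sketch runs several of the quantifiers above against sample strings.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, from the first '<' to the last '>'.
print(re.findall(r"<.+>", html))   # ['<b>bold</b> and <i>italic</i>']
# Non-greedy: .+? stops at the first '>' that lets the match succeed.
print(re.findall(r"<.+?>", html))  # ['<b>', '</b>', '<i>', '</i>']

digits = "123456"
print(re.match(r"\d{2,4}", digits).group())   # 1234
print(re.match(r"\d{2,4}?", digits).group())  # 12
print(re.match(r"\d{3,}", digits).group())    # 123456
print(re.match(r"\d{3,}?", digits).group())   # 123
print(re.match(r"ab??", "ab").group())        # a
print(re.match(r"colou?r", "color").group())  # color
```

Note that laziness only changes the result when the rest of the pattern still allows a shorter match; a lazy quantifier at the end of a pattern takes the minimum.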

Escape sequences:

    \t		tab                   (HT, TAB)
    \n		newline               (LF, NL)
    \r		return                (CR)
    \f		form feed             (FF)
    \a		alarm (bell)          (BEL)
    \e		escape (think troff)  (ESC)
    \033	octal char            (example: ESC)
    \x1B	hex char              (example: ESC)
    \x{263a}	long hex char         (example: Unicode SMILEY)
    \cK		control char          (example: VT)
    \N{name}	named Unicode character
    \l		lowercase next char (think vi)
    \u		uppercase next char (think vi)
    \L		lowercase till \E (think vi)
    \U		uppercase till \E (think vi)
    \E		end case modification (think vi)
    \Q		quote (disable) pattern metacharacters till \E
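Not all of these escapes carry over to Python's re. It accepts \t, \n, \r, \f, \a, octal \033 and hex \x1B, spells long hex \x{263a} as \u263a, and supports \N{name} from Python 3.8; \e and \cK are rejected, since unknown escapes of ASCII letters are errors from Python 3.7 on. The case-modification escapes and \Q have no pattern equivalent (in re, \u is a Unicode escape), so the sketch below uses re.escape() and a replacement function for the common uses.

```python
import re

# Character escapes written inside the pattern itself.
print(bool(re.fullmatch(r"a\tb", "a\tb")))                      # True: tab
print(bool(re.fullmatch(r"\x1B\[0m", "\033[0m")))               # True: hex escape
print(bool(re.fullmatch(r"\033", "\x1b")))                      # True: octal escape
print(bool(re.fullmatch(r"\u263a", "\N{WHITE SMILING FACE}")))  # True: long hex
print(bool(re.fullmatch(r"\N{WHITE SMILING FACE}", "\u263a")))  # True: named (3.8+)

# \Q...\E: re.escape() quotes every metacharacter in a literal string.
needle = "1+1=2?"
print(re.findall(re.escape(needle), "is 1+1=2? yes, 1+1=2?"))  # ['1+1=2?', '1+1=2?']

# \U...\E and \u in a Perl replacement: use a function with str.upper().
print(re.sub(r"\w+", lambda m: m.group(0).upper(), "make me loud"))  # MAKE ME LOUD
print(re.sub(r"\b\w", lambda m: m.group(0).upper(), "make me loud")) # Make Me Loud
```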

Character Classes:

    \w	     Match a "word" character (alphanumeric plus "_")
    \W	     Match a non-"word" character
    \s	     Match a whitespace character
    \S	     Match a non-whitespace character
    \d	     Match a digit character
    \D	     Match a non-digit character
    \pP	     Match P, named property.  Use \p{Prop} for longer names.
    \PP	     Match non-P
    \X	     Match eXtended Unicode "combining character sequence",
             equivalent to (?:\PM\pM*)
    \C	     Match a single C char (octet) even under Unicode.
	     NOTE: breaks up characters into their UTF-8 bytes,
	     so you may end up with malformed pieces of UTF-8.
	     Unsupported in lookbehind.
    \1       Backreference to a specific group.
	     '1' may actually be any positive integer.
    \g1      Backreference to a specific or previous group,
    \g{-1}   number may be negative indicating a previous buffer and may
             optionally be wrapped in curly brackets for safer parsing.
    \g{name} Named backreference
    \k<name> Named backreference
    \K       Keep the stuff left of the \K, don't include it in $&
    \v       Vertical whitespace
    \V       Not vertical whitespace
    \h       Horizontal whitespace
    \H       Not horizontal whitespace
    \R       Linebreak
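Python's re supports \w \W \s \S \d \D and numbered backreferences such as \1 just as Perl does. It does not support \p{...}, \X, \C, \g, \k<name>, \K, \h or \R, and in re \v is only the vertical-tab character. Named groups use the (?P<name>...) / (?P=name) spelling, and \K can often be replaced by a lookbehind, which in re must be fixed-width. This sketch assumes the stdlib re module only.

```python
import re

s = "id_7  x-9\tY"
print(re.findall(r"\w+", s))   # ['id_7', 'x', '9', 'Y']
print(re.findall(r"\d", s))    # ['7', '9']
print(re.findall(r"\s+", s))   # ['  ', '\t']
print(re.findall(r"\S+", s))   # ['id_7', 'x-9', 'Y']

# \1: backreference to group 1, here used to spot doubled words.
print(re.findall(r"\b(\w+) \1\b", "this is is a test test"))  # ['is', 'test']

# Named backreference: Python writes (?P<name>...) and (?P=name).
m = re.search(r"(?P<q>['\"]).*?(?P=q)", "say \"hi\" or 'bye'")
print(m.group(0))  # "hi"

# \K has no re equivalent; a fixed-width lookbehind keeps "foo" out of the match.
print(re.sub(r"(?<=foo)bar", "BAZ", "foobar barbar"))  # fooBAZ barbar
```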

Groups:

\g{-1} refers to the capture group (buffer) opened most recently before the reference, \g{-2} to the one opened before that, and so on.
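Python's re has no \g{-N} syntax, so one way to see how relative numbers resolve is to rewrite them into absolute ones. The function below is a hypothetical helper, not part of any library: it counts the capture groups opened before each \g{-N} (plain "(" and named "(?P<name>"), then emits the absolute reference wrapped as (?:\N) so a following digit cannot merge into the number. It is a simplified sketch and ignores edge cases such as "]" as the first character of a class.

```python
import re

def resolve_relative_backrefs(pattern: str) -> str:
    # Hypothetical helper: rewrite Perl-style \g{-N} into an absolute \M.
    # Counts capture groups opened so far; escapes and [...] classes are skipped.
    rel = re.compile(r"\\g\{-(\d+)\}")
    out, groups, i, in_class = [], 0, 0, False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            m = rel.match(pattern, i)
            if m and not in_class:
                target = groups - int(m.group(1)) + 1  # -1 = last opened group
                if target < 1:
                    raise ValueError(f"{m.group(0)} refers to a group before group 1")
                out.append(f"(?:\\{target})")  # wrapped so a following digit can't merge
                i = m.end()
            else:
                out.append(pattern[i:i + 2])  # keep any other escape verbatim
                i += 2
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(" and (not pattern.startswith("(?", i) or pattern.startswith("(?P<", i)):
            groups += 1  # plain '(' or named '(?P<name>' captures
        out.append(ch)
        i += 1
    return "".join(out)

p = resolve_relative_backrefs(r"(\w)(\d)\g{-1}\g{-2}")
print(p)                              # (\w)(\d)(?:\2)(?:\1)
print(bool(re.fullmatch(p, "a11a")))  # True
```

Because the count includes groups that are still open, in (Y)((X)\g{-1}\g{-3}) the \g{-1} resolves to group 3 (X) and \g{-3} to group 1 (Y).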

Posted in Linux.

