Monday, July 28, 2008

regexp




Text Pattern Matching in Emacs

Common Patterns:


Pattern            Matches
------------------------------------------------------------
.                  any single character (except newline)
\.                 one period

[0-9]+             digit sequence
[A-Za-z]+          sequence of letters
[_A-Za-z0-9]+      sequence of alphanumeric characters and underscores
[-A-Za-z0-9]+      sequence of alphanumeric characters and hyphens
[[:blank:]]+       sequence of tabs and spaces

"\([^"]+\)"        capture text between double quotes

+                  means match the previous pattern 1 or more times
*                  means match the previous pattern 0 or more times
?                  means match the previous pattern 0 or 1 time
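
As a rough illustration, a few of the patterns above can be tried from Emacs Lisp with string-match. This is only a sketch: the helper name my-first-match and the sample strings are made up for the example. Note that inside a Lisp string every backslash has to be doubled and a literal double quote has to be escaped, so "\([^"]+\)" is written "\"\\([^\"]+\\)\"".

```elisp
;; Hypothetical helper, not part of Emacs: return the text matched by
;; REGEXP in STRING, or the text of capture group GROUP if given.
(defun my-first-match (regexp string &optional group)
  "Return the first match of REGEXP in STRING, or nil if there is none.
With GROUP, return that capture group instead of the whole match."
  (save-match-data
    (when (string-match regexp string)
      (match-string (or group 0) string))))

;; Sample strings below are invented for illustration.
(my-first-match "[0-9]+" "order 66 shipped")                  ; => "66"
(my-first-match "[_A-Za-z0-9]+" "  foo_bar2 = 1")             ; => "foo_bar2"
(my-first-match "\"\\([^\"]+\\)\"" "say \"hi there\" now" 1)  ; => "hi there"
```

Evaluating each call with C-x C-e in the *scratch* buffer shows the result in the echo area.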



Also:


M-x regexp-builder (an alias for M-x re-builder) lets you try out a regexp interactively against the current buffer.



3 comments:

doug said...

For trying out regular expressions, I'm a big fan of RegEx Tool, which is available here. It creates a new frame with three windows: one in which you edit your regex, one in which sample text is displayed and matches are highlighted, and one where the matching groups are identified by number. The last window is especially handy if you are using replace-regexp and you need help figuring out what \1, \2, etc. will be in the replacement pattern.
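
To make the \1, \2 numbering concrete, here is a small sketch using replace-regexp-in-string, which accepts the same back-references in its replacement string. The function name my-swap-date and the sample text are made up for the example.

```elisp
;; Hypothetical helper: reorder a YYYY-MM-DD date using back-references.
(defun my-swap-date (string)
  "Rewrite each YYYY-MM-DD date in STRING as DD/MM/YYYY."
  (replace-regexp-in-string
   "\\([0-9]+\\)-\\([0-9]+\\)-\\([0-9]+\\)"  ; groups 1, 2 and 3, left to right
   "\\3/\\2/\\1"                              ; \3 = day, \2 = month, \1 = year
   string))

(my-swap-date "posted 2008-07-28")   ; => "posted 28/07/2008"
```

Groups are numbered by the position of their opening \( in the regexp, counting from 1.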

Bill White said...

I picked this up from Sacha Chua: \< and \> - empty string at the beginning and end of a word.
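
A minimal sketch of what \< and \> buy you, assuming the default syntax table in which letters are word constituents. The helper name my-whole-word-p is made up for the example.

```elisp
;; Hypothetical helper: match WORD only when it stands alone as a word.
(defun my-whole-word-p (word string)
  "Return the match position if WORD occurs in STRING as a whole word, else nil."
  (let ((case-fold-search nil))           ; make the match case-sensitive
    (string-match-p (concat "\\<" (regexp-quote word) "\\>") string)))

(my-whole-word-p "cat" "the cat sat")   ; => 4
(my-whole-word-p "cat" "concatenate")   ; => nil
```

Without the \< and \> anchors, the second call would match the "cat" inside "concatenate".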

And to craft hairy regexps I use regexp-opt:

(regexp-opt
 '("January" "February" "March" "April" "May" "June" "July"
   "August" "September" "October" "November" "December")
 t)

"\\(A\\(?:pril\\|ugust\\)\\|December\\|February\\|J\\(?:anuary\\|u\\(?:ly\\|ne\\)\\)\\|Ma\\(?:rch\\|y\\)\\|\\(?:Novem\\|Octo\\|Septem\\)ber\\)"
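
One way the generated regexp might be used, sketched under the assumption that you want the month name itself back. The names my-month-regexp and my-find-month are made up for the example. Because the second argument to regexp-opt is t, the whole alternation is wrapped in capture group 1.

```elisp
;; Hypothetical variable holding the regexp built by regexp-opt.
(defvar my-month-regexp
  (regexp-opt '("January" "February" "March" "April" "May" "June" "July"
                "August" "September" "October" "November" "December")
              t)
  "Regexp matching any English month name, captured as group 1.")

;; Hypothetical helper: pull the first month name out of a string.
(defun my-find-month (string)
  "Return the first month name found in STRING, or nil if there is none."
  (let ((case-fold-search nil))           ; month names are capitalized
    (save-match-data
      (when (string-match my-month-regexp string)
        (match-string 1 string)))))

(my-find-month "Monday, July 28, 2008")   ; => "July"
(my-find-month "no month here")           ; => nil
```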

smitty1e said...

Look at Bill White's response, and see what I think is a genuine weakness in emacs' design: not being able to turn off the compiler when prudent.
Python offers a '''brilliant''' """lesson""" in making life easier on the programmer.
Someday, when I've had enough time to pore over the Emacs source code, I plan to hack it up a bit to make embedding DSL code for things like regexes and SQL statements just a bit less tedious.
End of rantlet.