Java Concise Tutorial
Java - Regular Expressions
Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to those of the Perl programming language and are easy to learn.
Regular Expressions in Java
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.
Java Regular Expressions (Regex) Classes
The java.util.regex package primarily consists of the following three classes −
- Pattern Class − A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile() methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.
- Matcher Class − A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher() method on a Pattern object.
- PatternSyntaxException − A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
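As a minimal sketch of how the Pattern and Matcher classes fit together, the following compiles a pattern once and obtains a Matcher for each input. The class name PatternMatcherSketch and the helper isAllDigits are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherSketch {
    // Compile once; a Pattern object can be reused for many inputs.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: true only if the whole input consists of digits.
    static boolean isAllDigits(String input) {
        Matcher m = DIGITS.matcher(input); // a Matcher is bound to one input
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("3000"));   // true
        System.out.println(isAllDigits("QT3000")); // false
    }
}
```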
Capturing Groups in Regular Expression
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups −
- ((A)(B(C)))
- (A)
- (B(C))
- (C)
To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern.
There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
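The numbering rule can be checked with a short sketch. It uses the nested pattern ((A)(B(C))) and prints groupCount along with each group for the input "ABC". The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountSketch {
    // Number of capturing groups in the pattern (group 0 is not counted).
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Groups 0..groupCount for a full match, or an empty array if no match.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String regex = "((A)(B(C)))";
        System.out.println("groupCount: " + countGroups(regex)); // groupCount: 4
        String[] groups = groupsOf(regex, "ABC");
        for (int i = 0; i < groups.length; i++) {
            // Prints ABC, ABC, A, BC, C for groups 0 through 4.
            System.out.println("group " + i + ": " + groups[i]);
        }
    }
}
```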
Example
The following example illustrates how to find a digit string in a given alphanumeric string −
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);

        // Now create a Matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
Output
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
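Note that group 2 in the output above holds only "0", not "3000". The leading (.*) is greedy and consumes as much as it can, leaving a single digit for (\d+). One way to capture the full number, sketched here with the hypothetical class GreedyVersusReluctant, is to make the first group reluctant with .*?.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVersusReluctant {
    static final String LINE = "This order was placed for QT3000! OK?";

    // Returns the text captured by group 2 (the digits), or null if no match.
    static String digitGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Greedy: (.*) takes everything up to the last digit.
        System.out.println(digitGroup("(.*)(\\d+)(.*)", LINE));  // 0
        // Reluctant: (.*?) stops at the first digit it can.
        System.out.println(digitGroup("(.*?)(\\d+)(.*)", LINE)); // 3000
    }
}
```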
Regular Expression Syntax
The following table lists the regular expression metacharacter syntax available in Java −
Subexpression | Matches
^ | Matches the beginning of the input; with the MULTILINE flag, the beginning of each line.
$ | Matches the end of the input (or just before a final line terminator); with the MULTILINE flag, the end of each line.
. | Matches any single character except a line terminator. With the DOTALL flag, (?s), it matches line terminators as well.
[…] | Matches any single character in brackets.
[^…] | Matches any single character not in brackets.
\A | Matches the beginning of the entire string.
\z | Matches the end of the entire string.
\Z | Matches the end of the entire string, except for an allowable final line terminator. If a newline exists at the end, it matches just before the newline.
re* | Matches 0 or more occurrences of the preceding expression.
re+ | Matches 1 or more occurrences of the preceding expression.
re? | Matches 0 or 1 occurrence of the preceding expression.
re{n} | Matches exactly n occurrences of the preceding expression.
re{n,} | Matches n or more occurrences of the preceding expression.
re{n,m} | Matches at least n and at most m occurrences of the preceding expression.
a|b | Matches either a or b.
(re) | Groups regular expressions and remembers the matched text.
(?: re) | Groups regular expressions without remembering the matched text.
(?> re) | Matches the independent pattern without backtracking.
\w | Matches the word characters.
\W | Matches the nonword characters.
\s | Matches the whitespace. Equivalent to [ \t\n\x0B\f\r].
\S | Matches the nonwhitespace.
\d | Matches the digits. Equivalent to [0-9].
\D | Matches the nondigits.
\G | Matches the point where the last match finished.
\n | Back-reference to capture group number "n", where n is a digit (for example, \1).
\b | Matches the word boundaries when outside the brackets. Matches the backspace (0x08) when inside the brackets.
\B | Matches the nonword boundaries.
\n, \t, etc. | Matches newlines, carriage returns, tabs, etc.
\Q | Escape (quote) all characters up to \E.
\E | Ends quoting begun with \Q.
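Two of the table entries are easier to see in code: the back-reference \1 and quoting with \Q…\E. The sketch below uses hypothetical helper names. It relies on Pattern.quote, which wraps a string so that its metacharacters are treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyntaxSketch {
    // A word, whitespace, then the same word again via back-reference \1.
    private static final Pattern REPEATED =
            Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns the first immediately repeated word, or null if there is none.
    static String firstRepeatedWord(String text) {
        Matcher m = REPEATED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Searches for the regex as written, metacharacters included.
    static boolean findsRaw(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Searches for the text literally by quoting it first.
    static boolean findsLiteral(String literal, String text) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("this is is a test")); // is
        // Unquoted, "1+" means "one or more 1s", so the text is not found.
        System.out.println(findsRaw("1+1=2", "1+1=2"));             // false
        System.out.println(findsLiteral("1+1=2", "1+1=2"));         // true
        // Writing \Q...\E by hand has the same effect for this literal.
        System.out.println(findsRaw("\\Q1+1=2\\E", "1+1=2"));       // true
    }
}
```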
Regular Expression - Matcher Class Methods
Here is a list of useful instance methods −
Index Methods
Index methods provide useful index values that show precisely where the match was found in the input string −
Sr.No. | Method & Description
1 | public int start() − Returns the start index of the previous match.
2 | public int start(int group) − Returns the start index of the subsequence captured by the given group during the previous match operation.
3 | public int end() − Returns the offset after the last character matched.
4 | public int end(int group) − Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
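The group variants of start and end can be sketched as follows. The class name, the helper span, and the sample input are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexSketch {
    // Returns {start, end} of the given group in the first match, or null.
    static int[] span(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String regex = "(\\w+)@(\\w+)";
        String input = "mail to user@example now";
        for (int g = 0; g <= 2; g++) {
            int[] s = span(regex, input, g);
            // Prints 8-20 for group 0, 8-12 for group 1, 13-20 for group 2.
            System.out.println("group " + g + ": " + s[0] + "-" + s[1]);
        }
    }
}
```

Group 0 covers the whole match, so start(0) and end(0) return the same values as start() and end().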
Study Methods
Study methods review the input string and return a boolean indicating whether or not the pattern is found −
Sr.No. | Method & Description
1 | public boolean lookingAt() − Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
2 | public boolean find() − Attempts to find the next subsequence of the input sequence that matches the pattern.
3 | public boolean find(int start) − Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
4 | public boolean matches() − Attempts to match the entire region against the pattern.
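Of the study methods, find(int start) is the least obvious, so here is a small sketch. The helper indexOfMatchFrom is a hypothetical name. It returns where the first match at or after a given index begins.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromSketch {
    // Start index of the first match at or after 'from', or -1 if none.
    static int indexOfMatchFrom(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // find(int) resets the matcher before searching from the index.
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String regex = "\\bcat\\b";
        String input = "cat cat cat cattie cat";
        System.out.println(indexOfMatchFrom(regex, input, 0));  // 0
        System.out.println(indexOfMatchFrom(regex, input, 1));  // 4
        System.out.println(indexOfMatchFrom(regex, input, 12)); // 19
        System.out.println(indexOfMatchFrom(regex, input, 20)); // -1
    }
}
```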
Regular Expression - Replacement Methods
Replacement methods replace text in an input string −
Sr.No. | Method & Description
1 | public Matcher appendReplacement(StringBuffer sb, String replacement) − Implements a non-terminal append-and-replace step.
2 | public StringBuffer appendTail(StringBuffer sb) − Implements a terminal append-and-replace step.
3 | public String replaceAll(String replacement) − Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
4 | public String replaceFirst(String replacement) − Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
5 | public static String quoteReplacement(String s) − Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class.
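In a replacement string, $ followed by a digit refers to a capturing group, which is why quoteReplacement exists. The sketch below, with hypothetical helper names, shows both a group reference and a literal replacement that contains a dollar sign.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementSketch {
    // $2 and $1 in the replacement refer to the two capturing groups.
    static String swapWords(String input) {
        return Pattern.compile("(\\w+) (\\w+)")
                .matcher(input)
                .replaceAll("$2 $1");
    }

    // quoteReplacement makes '$' and '\' in the replacement literal.
    static String replaceLiterally(String input, String regex, String literal) {
        return Pattern.compile(regex)
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(swapWords("John Smith"));                        // Smith John
        System.out.println(replaceLiterally("cost: PRICE", "PRICE", "$5")); // cost: $5
    }
}
```

Without quoteReplacement, the replacement "$5" would be read as a reference to group 5, which does not exist in this pattern.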
The start and end Methods
The following example counts the number of times the word "cat" appears in the input string −
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT = "cat cat cat cattie cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        int count = 0;

        // Each successful find() advances to the next match.
        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}
Output
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
You can see that this example uses word boundaries (\b) to ensure that the letters "c", "a", and "t" are not merely a substring of a longer word, which is why "cattie" is not counted. It also gives some useful information about where in the input string each match occurred.
The start method returns the start index of the previous match, and the end method returns the index of the last character matched, plus one.
The matches() and lookingAt() Methods
The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference, however, is that matches requires the entire input sequence to be matched, while lookingAt does not.
Both methods always start at the beginning of the input string. The following example illustrates the difference −
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(REGEX);
        Matcher matcher = pattern.matcher(INPUT);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);

        // lookingAt() needs only a prefix to match; matches() needs the whole input.
        System.out.println("lookingAt(): " + matcher.lookingAt());
        System.out.println("matches(): " + matcher.matches());
    }
}
Output
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false
The replaceFirst() and replaceAll() Methods
The replaceFirst and replaceAll methods replace the text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences.
The following example shows replaceAll in use −
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "dog";
    private static final String INPUT = "The dog says meow. " + "All dogs say meow.";
    private static final String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);

        // get a matcher object
        Matcher m = p.matcher(INPUT);

        // replaceAll returns a new string; the input itself is not modified.
        String result = m.replaceAll(REPLACE);
        System.out.println(result);
    }
}
Output
The cat says meow. All cats say meow.
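The example above only exercises replaceAll. For comparison, here is a minimal sketch of replaceFirst on the same input. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ReplaceFirstSketch {
    static final String INPUT = "The dog says meow. All dogs say meow.";

    // Replaces only the first match of the regex.
    static String first(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceFirst(replacement);
    }

    // Replaces every match of the regex.
    static String all(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // The cat says meow. All dogs say meow.
        System.out.println(first("dog", INPUT, "cat"));
        // The cat says meow. All cats say meow.
        System.out.println(all("dog", INPUT, "cat"));
    }
}
```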
The appendReplacement() and appendTail() Methods
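The Matcher class also provides appendReplacement and appendTail methods for text replacement. Used together in a find() loop, they build the result step by step: appendReplacement copies the text before each match and then the replacement, and appendTail copies whatever remains after the last match.
The following example shows how they work −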
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "a*b";
    private static final String INPUT = "aabfooaabfooabfoob";
    private static final String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);

        // get a matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();

        // Append the text before each match, followed by the replacement.
        while (m.find()) {
            m.appendReplacement(sb, REPLACE);
        }

        // Append whatever remains after the last match.
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}
Output
-foo-foo-foo-
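With a fixed replacement, the loop above does the same job as replaceAll. The append methods become useful when the replacement is computed from each match. The following sketch upper-cases every match instead. The helper name upperCaseMatches is hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementSketch {
    // Replaces each match with its own upper-cased text.
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String upper = m.group().toUpperCase(Locale.ROOT);
            // Quote the text so any '$' or '\' in it is taken literally.
            m.appendReplacement(sb, Matcher.quoteReplacement(upper));
        }
        m.appendTail(sb); // copy the remainder after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // AABfooAABfooABfooB
        System.out.println(upperCaseMatches("a*b", "aabfooaabfooabfoob"));
    }
}
```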
Regular Expression - PatternSyntaxException Class Methods
A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides the following methods to help you determine what went wrong −
Sr.No. | Method & Description
1 | public String getDescription() − Retrieves the description of the error.
2 | public int getIndex() − Retrieves the error index.
3 | public String getPattern() − Retrieves the erroneous regular expression pattern.
4 | public String getMessage() − Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern.
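As a rough sketch of how these methods might be used, the following catches the exception raised by a pattern with an unclosed group and assembles a one-line report. The class name and the describe helper are hypothetical, and the exact description text depends on the Java implementation, so it is not shown here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorSketch {
    // Returns "OK" for a valid regex, otherwise a short error report.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return "OK";
        } catch (PatternSyntaxException e) {
            return e.getDescription()
                    + " at index " + e.getIndex()
                    + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("(dog)")); // OK
        // The opening parenthesis is never closed, so compile() throws.
        System.out.println(describe("(dog"));
    }
}
```

Because the exception is unchecked, the compiler does not force you to catch it. Catching it is mainly worthwhile when the pattern comes from user input or configuration.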