Java Regex Concise Tutorial
Java Regex - Overview
Java provides the java.util.regex package for pattern matching with regular expressions. Java's regular expression syntax is very similar to Perl's and is easy to learn.
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.
The java.util.regex package primarily consists of the following three classes −
- Pattern Class − A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile() methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.
- Matcher Class − A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher() method on a Pattern object.
- PatternSyntaxException − A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
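As a minimal sketch of how the three classes fit together, the following compiles a pattern, obtains a matcher from it, and treats a PatternSyntaxException as the signal of an invalid expression. The class name RegexBasics and its helper methods are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexBasics {
    // Returns the first run of digits in the input, or null if there is none.
    static String firstNumber(String input) {
        Pattern pattern = Pattern.compile("\\d+");  // compiled representation
        Matcher matcher = pattern.matcher(input);   // engine bound to this input
        return matcher.find() ? matcher.group() : null;
    }

    // Returns true if the expression compiles, false on a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Order QT3000 shipped")); // 3000
        System.out.println(isValidRegex("(dog"));                // false
    }
}
```

Compiling once and reusing the Pattern for many inputs is usually preferable to recompiling the same expression on every call.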
Java Regex - Capturing Groups
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups −
- ((A)(B(C)))
- (A)
- (B(C))
- (C)
To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int giving the number of capturing groups present in the matcher's pattern.
There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
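The numbering rule can be checked with a small sketch. It assumes the nested expression ((A)(B(C))) matched against the input ABC; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Returns group 0 through groupCount() for a full match, or an empty list.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.matches()) {
            // Note the <=: group 0 is not counted by groupCount().
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [ABC, ABC, A, BC, C]
        System.out.println(groups("((A)(B(C)))", "ABC"));
    }
}
```

The list has five entries because group 0 (the whole match) is returned in addition to the four capturing groups.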
Example
The following example illustrates how to find a digit string in a given alphanumeric string −
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
   public static void main(String[] args) {
      // String to be scanned to find the pattern.
      String line = "This order was placed for QT3000! OK?";
      String pattern = "(.*)(\\d+)(.*)";

      // Create a Pattern object.
      Pattern r = Pattern.compile(pattern);

      // Now create a matcher object.
      Matcher m = r.matcher(line);

      if (m.find()) {
         System.out.println("Found value: " + m.group(0));
         System.out.println("Found value: " + m.group(1));
         System.out.println("Found value: " + m.group(2));
      } else {
         System.out.println("NO MATCH");
      }
   }
}
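This will produce the following result −
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
Because the first (.*) is greedy, it consumes as much of the input as it can and leaves only the final digit for (\d+), so group 2 is 0 rather than 3000.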
Java Regex - MatchResult Interface
Introduction
The java.util.regex.MatchResult interface represents the result of a match operation. This interface contains query methods used to determine the results of a match against a regular expression. The match boundaries, groups, and group boundaries can be seen but not modified through a MatchResult.
Interface declaration
Following is the declaration for the java.util.regex.MatchResult interface −
public interface MatchResult
Interface methods
Sr.No | Method & Description
1 | int end() − Returns the offset after the last character matched.
2 | int end(int group) − Returns the offset after the last character of the subsequence captured by the given group during this match.
3 | String group() − Returns the input subsequence matched by the previous match.
4 | String group(int group) − Returns the input subsequence captured by the given group during the previous match operation.
5 | int groupCount() − Returns the number of capturing groups in this match result's pattern.
6 | int start() − Returns the start index of the match.
7 | int start(int group) − Returns the start index of the subsequence captured by the given group during this match.
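One common way to obtain a MatchResult is Matcher.toMatchResult(), which returns a snapshot that later operations on the matcher do not change. The sketch below collects one snapshot per match; the class name, the helper, and the sample input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchResultDemo {
    // Collects a snapshot of every match of regex in input.
    static List<MatchResult> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<MatchResult> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.toMatchResult()); // unaffected by later find() calls
        }
        return results;
    }

    public static void main(String[] args) {
        for (MatchResult r : allMatches("(\\w+)@(\\w+)", "ann@home bob@work")) {
            // Query methods only: boundaries and groups can be read, not modified.
            System.out.println(r.group(1) + " at " + r.group(2)
                    + " [" + r.start() + ", " + r.end() + ")");
        }
    }
}
```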
Java Regex - Pattern Class
Introduction
The java.util.regex.Pattern class represents a compiled representation of a regular expression.
Class declaration
Following is the declaration for the java.util.regex.Pattern class −
public final class Pattern
extends Object
implements Serializable
Field
Following are the fields for the java.util.regex.Pattern class −
- static int CANON_EQ − Enables canonical equivalence.
- static int CASE_INSENSITIVE − Enables case-insensitive matching.
- static int COMMENTS − Permits whitespace and comments in pattern.
- static int DOTALL − Enables dotall mode.
- static int LITERAL − Enables literal parsing of the pattern.
- static int MULTILINE − Enables multiline mode.
- static int UNICODE_CASE − Enables Unicode-aware case folding.
- static int UNICODE_CHARACTER_CLASS − Enables the Unicode version of predefined character classes and POSIX character classes.
- static int UNIX_LINES − Enables Unix lines mode.
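These fields are bit flags, so several can be combined with the | operator and passed to compile(String regex, int flags). The following is a sketch under that assumption; the class name, helper names, and sample log text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternFlagsDemo {
    // Counts the lines of text that begin with word, ignoring letter case.
    static int countLinesStartingWith(String word, String text) {
        // MULTILINE lets ^ match after each line terminator;
        // CASE_INSENSITIVE ignores letter case for US-ASCII characters.
        Pattern p = Pattern.compile("^" + Pattern.quote(word),
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Reports whether the expression a.b matches the whole input,
    // with or without dotall mode.
    static boolean dotMatches(String input, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile("a.b", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String log = "Error: disk\nwarning: cpu\nERROR: net";
        System.out.println(countLinesStartingWith("error", log)); // 2
        System.out.println(dotMatches("a\nb", false));            // false
        System.out.println(dotMatches("a\nb", true));             // true
    }
}
```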
Class methods
Sr.No | Method & Description
1 | static Pattern compile(String regex) − Compiles the given regular expression into a pattern.
2 | static Pattern compile(String regex, int flags) − Compiles the given regular expression into a pattern with the given flags.
3 | int flags() − Returns this pattern's match flags.
4 | Matcher matcher(CharSequence input) − Creates a matcher that will match the given input against this pattern.
5 | static boolean matches(String regex, CharSequence input) − Compiles the given regular expression and attempts to match the given input against it.
6 | String pattern() − Returns the regular expression from which this pattern was compiled.
7 | static String quote(String s) − Returns a literal pattern String for the specified String.
8 | String[] split(CharSequence input) − Splits the given input sequence around matches of this pattern.
9 | String[] split(CharSequence input, int limit) − Splits the given input sequence around matches of this pattern.
10 | String toString() − Returns the string representation of this pattern.
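A short sketch of a few of these methods follows: split with and without a limit, the static matches shortcut, and quote for treating user text literally. The class name, helper names, and sample strings are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternMethodsDemo {
    // A comma with optional surrounding whitespace, compiled once and reused.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    static String[] splitAll(String line) {
        return COMMA.split(line);
    }

    // With a positive limit, the array has at most limit entries and the
    // last entry holds the unsplit remainder.
    static String[] splitAtMost(String line, int limit) {
        return COMMA.split(line, limit);
    }

    // Searches for literal as plain text, so "." or "*" lose their regex meaning.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAll("a , b,c")));       // [a, b, c]
        System.out.println(Arrays.toString(splitAtMost("a , b,c", 2))); // [a, b,c]
        System.out.println(Pattern.matches("\\d+", "123"));             // true
        System.out.println(containsLiteral("price: 1x50", "1.5"));      // false
    }
}
```

Pattern.matches recompiles the expression on every call, so a precompiled Pattern is usually the better choice inside loops.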
Java Regex - Matcher Class
Introduction
The java.util.regex.Matcher class acts as an engine that performs match operations on a character sequence by interpreting a Pattern.
Class declaration
Following is the declaration for the java.util.regex.Matcher class −
public final class Matcher
extends Object
implements MatchResult
Class methods
Sr.No | Method & Description
1 | Matcher appendReplacement(StringBuffer sb, String replacement) − Implements a non-terminal append-and-replace step.
2 | StringBuffer appendTail(StringBuffer sb) − Implements a terminal append-and-replace step.
3 | int end() − Returns the offset after the last character matched.
4 | int end(int group) − Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
5 | boolean find() − Attempts to find the next subsequence of the input sequence that matches the pattern.
6 | boolean find(int start) − Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
7 | String group() − Returns the input subsequence matched by the previous match.
8 | String group(String name) − Returns the input subsequence captured by the given named-capturing group during the previous match operation.
9 | int groupCount() − Returns the number of capturing groups in this matcher's pattern.
10 | boolean hasAnchoringBounds() − Queries the anchoring of region bounds for this matcher.
11 | boolean hasTransparentBounds() − Queries the transparency of region bounds for this matcher.
12 | boolean hitEnd() − Returns true if the end of input was hit by the search engine in the last match operation performed by this matcher.
13 | boolean lookingAt() − Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
14 | boolean matches() − Attempts to match the entire region against the pattern.
15 | Pattern pattern() − Returns the pattern that is interpreted by this matcher.
16 | static String quoteReplacement(String s) − Returns a literal replacement String for the specified String.
17 | Matcher region(int start, int end) − Sets the limits of this matcher's region.
18 | int regionEnd() − Reports the end index (exclusive) of this matcher's region.
19 | int regionStart() − Reports the start index of this matcher's region.
20 | String replaceAll(String replacement) − Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
21 | String replaceFirst(String replacement) − Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
22 | boolean requireEnd() − Returns true if more input could change a positive match into a negative one.
23 | Matcher reset() − Resets this matcher.
24 | Matcher reset(CharSequence input) − Resets this matcher with a new input sequence.
25 | int start() − Returns the start index of the previous match.
26 | int start(int group) − Returns the start index of the subsequence captured by the given group during the previous match operation.
27 | MatchResult toMatchResult() − Returns the match state of this matcher as a MatchResult.
28 | String toString() − Returns the string representation of this matcher.
29 | Matcher useAnchoringBounds(boolean b) − Sets the anchoring of region bounds for this matcher.
30 | Matcher usePattern(Pattern newPattern) − Changes the Pattern that this Matcher uses to find matches with.
31 | Matcher useTransparentBounds(boolean b) − Sets the transparency of region bounds for this matcher.
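Two of the less obvious entries in the table can be sketched briefly: the appendReplacement and appendTail loop, which computes each replacement from the match itself, and lookingAt combined with a named group. The class name, helper names, and the date format are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    // Doubles every integer in the text using the append-and-replace loop.
    static String doubleNumbers(String text) {
        Matcher m = Pattern.compile("\\d+").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Non-terminal step: copies the text before the match, then the replacement.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        m.appendTail(sb); // terminal step: copies whatever follows the last match
        return sb.toString();
    }

    // Returns the year if the input starts with yyyy-mm, otherwise null.
    static String leadingYear(String input) {
        Matcher m = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher(input);
        // lookingAt anchors at the start but, unlike matches, need not consume everything.
        return m.lookingAt() ? m.group("year") : null;
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("2 apples and 10 pears")); // 4 apples and 20 pears
        System.out.println(leadingYear("2024-05-17"));              // 2024
    }
}
```

When the replacement text comes from user input, wrapping it in Matcher.quoteReplacement keeps any $ or backslash characters from being interpreted as group references.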
Java Regex - PatternSyntaxException Class
Introduction
The java.util.regex.PatternSyntaxException class represents an unchecked exception thrown to indicate a syntax error in a regular-expression pattern.
Class declaration
Following is the declaration for the java.util.regex.PatternSyntaxException class −
public class PatternSyntaxException
extends IllegalArgumentException
Constructors
Sr.No | Constructor & Description
1 | PatternSyntaxException(String desc, String regex, int index) − Constructs a new instance of this class.
Class methods
Sr.No | Method & Description
1 | String getDescription() − Retrieves the description of the error.
2 | int getIndex() − Retrieves the error index.
3 | String getMessage() − Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular-expression pattern, and a visual indication of the error index within the pattern.
4 | String getPattern() − Retrieves the erroneous regular-expression pattern.
Methods inherited
This class inherits methods from the following classes −
- java.lang.Throwable
- java.lang.Object
Example
The following example shows the usage of java.util.regex.PatternSyntaxException class methods.
package com.tutorialspoint;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternSyntaxExceptionDemo {
   // An unclosed character class, used to trigger the exception.
   private static String REGEX = "[";
   private static String INPUT = "The dog says meow " + "All dogs say meow.";
   private static String REPLACE = "cat";

   public static void main(String[] args) {
      try {
         Pattern pattern = Pattern.compile(REGEX);

         // get a matcher object
         Matcher matcher = pattern.matcher(INPUT);
         INPUT = matcher.replaceAll(REPLACE);
      } catch (PatternSyntaxException e) {
         System.out.println("PatternSyntaxException: ");
         System.out.println("Description: " + e.getDescription());
         System.out.println("Index: " + e.getIndex());
         System.out.println("Message: " + e.getMessage());
         System.out.println("Pattern: " + e.getPattern());
      }
   }
}
Let us compile and run the above program. This will produce the following result −
PatternSyntaxException:
Description: Unclosed character class
Index: 0
Message: Unclosed character class near index 0
[
^
Pattern: [
Java Regex - Examples Matching Characters
Following are various examples of matching characters using regular expressions in Java.
Sr.No | Construct & Matches
1 | x − The character x
2 | \\ − The backslash character
3 | \0n − The character with octal value 0n (0 ≤ n ≤ 7)
4 | \0nn − The character with octal value 0nn (0 ≤ n ≤ 7)
5 | \0mnn − The character with octal value 0mnn (0 ≤ m ≤ 3, 0 ≤ n ≤ 7)
6 | \xhh − The character with hexadecimal value 0xhh
7 | \uhhhh − The character with hexadecimal value 0xhhhh
8 | \t − The tab character ('\u0009')
9 | \n − The newline (line feed) character ('\u000A')
10 | \r − The carriage-return character ('\u000D')
11 | \f − The form-feed character ('\u000C')
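The escapes above can be tried out with a small sketch. Keep in mind that inside a Java string literal each regex backslash has to be written as two backslashes. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class CharacterEscapesDemo {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\\\", "\\"));     // regex for one backslash: true
        System.out.println(matches("\\x41", "A"));     // hexadecimal 0x41: true
        System.out.println(matches("\\u0041", "A"));   // four-digit hexadecimal 0x0041: true
        System.out.println(matches("\\0101", "A"));    // octal 0101: true
        System.out.println(matches("a\\tb", "a\tb"));  // tab between a and b: true
        System.out.println(matches("\\x41", "B"));     // false
    }
}
```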
Java Regex - Matching Character Classes
Following are various examples of matching character classes using regular expressions in Java.
Sr.No | Construct & Matches
1 | [abc] − a, b, or c (simple class).
2 | [^abc] − Any character except a, b, or c (negation).
3 | [a-zA-Z] − a through z or A through Z, inclusive (range).
4 | [a-d[m-p]] − a through d, or m through p: [a-dm-p] (union).
5 | [a-z&&[def]] − d, e, or f (intersection).
6 | [a-z&&[^bc]] − a through z, except for b and c: [ad-z] (subtraction).
7 | [a-z&&[^m-p]] − a through z, and not m through p: [a-lq-z] (subtraction).
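Union, intersection, and subtraction are easiest to see by testing single characters against each class. The sketch below does that; the class and helper names are illustrative only.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // True if the single character c is a member of the bracketed class.
    static boolean inClass(String characterClass, char c) {
        return Pattern.matches(characterClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(inClass("[a-d[m-p]]", 'n'));    // union: true
        System.out.println(inClass("[a-d[m-p]]", 'e'));    // union: false
        System.out.println(inClass("[a-z&&[def]]", 'e'));  // intersection: true
        System.out.println(inClass("[a-z&&[def]]", 'a'));  // intersection: false
        System.out.println(inClass("[a-z&&[^bc]]", 'b'));  // subtraction: false
        System.out.println(inClass("[a-z&&[^m-p]]", 'q')); // subtraction: true
    }
}
```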
Matching Predefined Character Classes
Following are various examples of matching predefined character classes using regular expressions in Java.
Sr.No | Construct & Matches
1 | . − Any character (may or may not match line terminators).
2 | \d − A digit: [0-9].
3 | \D − A non-digit: [^0-9].
4 | \s − A whitespace character: [ \t\n\x0B\f\r].
5 | \S − A non-whitespace character: [^\s].
6 | \w − A word character: [a-zA-Z_0-9].
7 | \W − A non-word character: [^\w].
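The predefined classes often appear in everyday string clean-up. The sketch below uses the String convenience methods replaceAll, split, and matches, which accept the same regular-expression syntax; the class name, helper names, and sample inputs are assumptions.

```java
import java.util.Arrays;

class PredefinedClassesDemo {
    // Removes every non-digit character.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace after trimming the ends.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // True if s consists only of word characters (letters, digits, underscore).
    static boolean isWordOnly(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(555) 123-4567"));                // 5551234567
        System.out.println(Arrays.toString(words("  to be\tor\nnot "))); // [to, be, or, not]
        System.out.println(isWordOnly("user_42"));                       // true
        System.out.println(isWordOnly("user-42"));                       // false
    }
}
```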
Matching POSIX Character Classes
Following are various examples of matching POSIX character classes using regular expressions in Java.
Sr.No | Construct & Matches
1 | \p{Lower} − A lower-case alphabetic character: [a-z].
2 | \p{Upper} − An upper-case alphabetic character: [A-Z].
3 | \p{ASCII} − All ASCII: [\x00-\x7F].
4 | \p{Alpha} − An alphabetic character: [\p{Lower}\p{Upper}].
5 | \p{Digit} − A decimal digit: [0-9].
6 | \p{Alnum} − An alphanumeric character: [\p{Alpha}\p{Digit}].
7 | \p{Punct} − Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~.
8 | \p{Graph} − A visible character: [\p{Alnum}\p{Punct}].
9 | \p{Print} − A printable character: [\p{Graph}\x20].
10 | \p{Blank} − A space or a tab: [ \t].
11 | \p{XDigit} − A hexadecimal digit: [0-9a-fA-F].
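As a brief sketch of the POSIX classes in use, the following strips punctuation and validates hexadecimal strings. The class and helper names are hypothetical; by default these classes cover US-ASCII characters only.

```java
import java.util.regex.Pattern;

class PosixClassesDemo {
    private static final Pattern HEX = Pattern.compile("\\p{XDigit}+");

    // Removes ASCII punctuation characters.
    static String stripPunctuation(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // True if s is a non-empty string of hexadecimal digits.
    static boolean isHex(String s) {
        return HEX.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("Hello, world!"));  // Hello world
        System.out.println(isHex("1aF"));                       // true
        System.out.println(isHex("1aG"));                       // false
        System.out.println(Pattern.matches("\\p{Blank}", "\t")); // true
    }
}
```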
Matching Java Character Classes
Following are various examples of matching Java character classes using regular expressions in Java.
Sr.No | Construct & Matches
1 | \p{javaLowerCase} − Equivalent to java.lang.Character.isLowerCase().
2 | \p{javaUpperCase} − Equivalent to java.lang.Character.isUpperCase().
3 | \p{javaWhitespace} − Equivalent to java.lang.Character.isWhitespace().
4 | \p{javaMirrored} − Equivalent to java.lang.Character.isMirrored().
Matching Unicode Character Classes
Following are various examples of matching Unicode character classes using regular expressions in Java.
Sr.No | Construct & Matches
1 | \p{IsLatin} − A Latin script character.
2 | \p{InGreek} − A character in the Greek block.
3 | \p{Lu} − An uppercase letter.
4 | \p{IsAlphabetic} − An alphabetic character (binary property).
5 | \p{Sc} − A currency symbol.
6 | \P{InGreek} − Any character except one in the Greek block.
7 | [\p{L}&&[^\p{Lu}]] − Any letter except an uppercase letter.
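The Unicode classes can be checked one character at a time, as in this sketch. Non-ASCII characters are written as Java Unicode escapes so the source file stays ASCII; the class and helper names are illustrative assumptions.

```java
import java.util.regex.Pattern;

class UnicodeClassesDemo {
    static final String ALPHA = "\u03b1"; // Greek small letter alpha
    static final String EURO = "\u20ac";  // euro sign

    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{InGreek}", ALPHA));        // block: true
        System.out.println(matches("\\p{IsLatin}", ALPHA));        // script: false
        System.out.println(matches("\\p{IsLatin}", "a"));          // script: true
        System.out.println(matches("\\p{Lu}", "A"));               // category: true
        System.out.println(matches("\\p{Sc}", EURO));              // currency: true
        System.out.println(matches("\\P{InGreek}", "a"));          // negated block: true
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "A"));  // false
    }
}
```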
Examples of Boundary Matchers
Following are various examples of boundary matchers using regular expressions in Java.
Sr.No | Construct & Matches
1 | ^ − The beginning of a line.
2 | $ − The end of a line.
3 | \b − A word boundary.
4 | \B − A non-word boundary.
5 | \A − The beginning of the input.
6 | \G − The end of the previous match.
7 | \Z − The end of the input but for the final terminator, if any.
8 | \z − The end of the input.
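Boundary matchers consume no characters; they only assert a position. The sketch below counts whole-word occurrences with \b, counts numeric lines with ^ and $ in multiline mode, and contrasts \Z with \z. The class name, helper names, and inputs are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times the pattern is found in the input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Counts occurrences of word that are not part of a longer word.
    static int countWord(String word, String text) {
        return count(Pattern.compile("\\b" + Pattern.quote(word) + "\\b"), text);
    }

    // Counts lines made up only of digits; MULTILINE makes ^ and $ work per line.
    static int countNumericLines(String text) {
        return count(Pattern.compile("^\\d+$", Pattern.MULTILINE), text);
    }

    public static void main(String[] args) {
        System.out.println(countWord("cat", "cat concat cat."));  // 2
        System.out.println(countNumericLines("12\nab\n345"));     // 2
        // \Z tolerates a final line terminator, \z does not.
        System.out.println(Pattern.compile("cat\\Z").matcher("a cat\n").find()); // true
        System.out.println(Pattern.compile("cat\\z").matcher("a cat\n").find()); // false
    }
}
```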
Examples of Greedy Quantifiers
A greedy quantifier tells the regex engine to consume as much of the input as possible first, and then to give characters back one at a time only if the rest of the pattern fails to match. Following are various examples of greedy quantifiers using regular expressions in Java.
Sr.No | Construct & Matches
1 | X? − X, once or not at all.
2 | X* − X, zero or more times.
3 | X+ − X, one or more times.
4 | X{n} − X, exactly n times.
5 | X{n,} − X, at least n times.
6 | X{n,m} − X, at least n but not more than m times.
Examples of Reluctant Quantifiers
A reluctant quantifier tells the engine to start with the shortest possible piece of the input. If the rest of the pattern matches, the engine stops there; otherwise it extends the piece by one character and tries again. This continues until a match is found or the entire input has been used up. Following are various examples of reluctant quantifiers using regular expressions in Java.
Sr.No | Construct & Matches
1 | X?? − X, once or not at all.
2 | X*? − X, zero or more times.
3 | X+? − X, one or more times.
4 | X{n}? − X, exactly n times.
5 | X{n,}? − X, at least n times.
6 | X{n,m}? − X, at least n but not more than m times.
Examples of Possessive Quantifiers
A possessive quantifier is similar to a greedy quantifier: the engine starts by consuming as much of the input as possible. The difference is that it never backtracks. If the rest of the pattern then fails, the match attempt fails without giving any characters back. Following are various examples of possessive quantifiers using regular expressions in Java.
Sr.No | Construct & Matches
1 | X?+ − X, once or not at all.
2 | X*+ − X, zero or more times.
3 | X++ − X, one or more times.
4 | X{n}+ − X, exactly n times.
5 | X{n,}+ − X, at least n times.
6 | X{n,m}+ − X, at least n but not more than m times.
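The difference between greedy, reluctant, and possessive quantifiers shows up clearly when the same input is searched with .*foo, .*?foo, and .*+foo. This is a sketch with a hypothetical class, helper, and input string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModesDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: takes everything, then backs off just enough to match the last foo.
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: grows one character at a time and stops at the first foo.
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: takes everything and never gives it back, so foo cannot match.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```

Possessive quantifiers are mainly useful for performance, since they let the engine skip backtracking that could never succeed.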
Java Regex - Examples of Logical Operators
Following are various examples of logical operators using regular expressions in Java.
Sr.No | Construct & Matches
1 | XY − X followed by Y.
2 |
link:../javaregex/javaregex_logical_xory.html[X |