Java - Regular Expressions
- Java provides the java.util.regex package for matching text against regular expression patterns. Java regular expressions closely resemble those of the Perl programming language and are fairly easy to learn.
- In Java, a regular expression is a special sequence of characters that lets you match or find other strings, or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.
The java.util.regex package consists primarily of the following three classes:
- Pattern class - a Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you first call one of its public static compile() methods, which return a Pattern object. These methods accept the regular expression as their first argument.
- Matcher class - a Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by calling the matcher() method on a Pattern object.
- PatternSyntaxException - a PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
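The way the three classes fit together can be sketched as follows. The class name RegexBasics and the helpers containsMatch and isValidRegex are hypothetical names chosen for illustration; only Pattern.compile, Pattern.matcher, Matcher.find, and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexBasics {
    // Hypothetical helper: true if the input contains a match for the regex.
    static boolean containsMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // no public constructor, so use compile()
        Matcher matcher = pattern.matcher(input);  // a Matcher is obtained from a Pattern
        return matcher.find();
    }

    // Hypothetical helper: true if the regex compiles, false on a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("\\d+", "order 3000")); // true
        System.out.println(isValidRegex("(unclosed"));            // false
    }
}
```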
Capturing groups
- A capturing group is a way to treat multiple characters as a single unit. A group is created by placing the characters to be grouped inside a pair of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
Capturing groups are numbered by counting their opening parentheses from left to right. Thus, the expression ((A)(B(C))) contains four such groups:
- ((A)(B(C)))
- (A)
- (B(C))
- (C)
- To find out how many groups an expression contains, call the groupCount method on a Matcher object. The groupCount method returns an int giving the number of capturing groups in the matcher's pattern.
- There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
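The numbering described above can be checked with a short sketch. The class name GroupNumbering and the helper groupsOf are hypothetical; the regular expression is written without spaces, because a space inside a pattern is itself a literal character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Hypothetical helper: returns group 0 followed by every capturing group
    // of the first match, or an empty array if nothing matches.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        // groupCount() excludes group 0, hence the "+ 1".
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = groupsOf("((A)(B(C)))", "ABC");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("group " + i + ": " + groups[i]);
        }
        // group 0: ABC
        // group 1: ABC
        // group 2: A
        // group 3: BC
        // group 4: C
    }
}
```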
Sample Code
The following example shows how to find a run of digits in an alphanumeric string.
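A minimal sketch of such an example follows. The class DigitFinder, the helper firstDigits, and the sample sentence are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitFinder {
    // Hypothetical helper: returns the first run of digits, or null if none exists.
    static String firstDigits(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        System.out.println("Found value: " + firstDigits(line)); // Found value: 3000
    }
}
```

Because \d+ is greedy, the match extends over the whole run of digits rather than stopping at the first one.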
Regular Expression Syntax
In Java, regular expressions use special characters. The following table shows the metacharacters available in the regular expression syntax.
Subexpression | Matches |
^ | Matches the beginning of a line (of the whole input unless MULTILINE mode is enabled). |
$ | Matches the end of a line (of the whole input unless MULTILINE mode is enabled). |
. | Matches any single character except a line terminator. With the DOTALL flag (Pattern.DOTALL or (?s)), it matches line terminators as well. |
[...] | Matches any single character listed inside the square brackets. |
[^...] | Matches any single character not listed inside the square brackets. |
\A | Matches the beginning of the entire input. |
\z | Matches the end of the entire input. |
\Z | Matches the end of the entire input, except for a final line terminator, if any. |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression. |
re{n} | Matches exactly n occurrences of the preceding expression. |
re{n,} | Matches n or more occurrences of the preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of the preceding expression. |
a|b | Matches either a or b. |
(re) | Groups regular expressions and remembers the matched text. |
(?:re) | Groups regular expressions without remembering the matched text. |
(?>re) | Matches an independent pattern without backtracking. |
\w | Matches a word character. Equivalent to [a-zA-Z_0-9]. |
\W | Matches a non-word character. |
\s | Matches a whitespace character. Equivalent to [ \t\n\x0B\f\r]. |
\S | Matches a non-whitespace character. |
\d | Matches a digit. Equivalent to [0-9]. |
\D | Matches a non-digit character. |
\G | Matches the point where the previous match ended. |
\n | Back-reference to capturing group number n. |
\b | Matches a word boundary. |
\B | Matches a non-word boundary. |
\n, \t, etc. | Match the line feed, tab, and other control characters such as the carriage return (\r). |
\Q | Quotes (escapes) all characters up to \E. |
\E | Ends the quotation begun with \Q. |
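A few of the constructs from the table can be tried out with the sketch below. The class SyntaxSamples and the helper fullMatch are hypothetical names; Pattern.matches is the static convenience method that compiles a pattern and requires it to match the whole input.

```java
import java.util.regex.Pattern;

class SyntaxSamples {
    // Hypothetical helper: true if the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d{3}-\\d{4}", "555-1234")); // true: re{n}
        System.out.println(fullMatch("colou?r", "color"));          // true: re?
        System.out.println(fullMatch("(ab)+", "ababab"));           // true: (re) with re+
        System.out.println(fullMatch("(\\w)\\1", "aa"));            // true: back-reference \1
        System.out.println(fullMatch("[^aeiou]+", "rhythm"));       // true: negated class
        System.out.println(fullMatch("\\Q1+1\\E=2", "1+1=2"));      // true: \Q...\E quoting
        System.out.println(fullMatch("a.c", "a\nc"));               // false: . skips line terminators
        // With DOTALL, the dot matches the line terminator as well.
        System.out.println(Pattern.compile("a.c", Pattern.DOTALL)
                .matcher("a\nc").matches());                        // true
    }
}
```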
Matcher methods
- The following is a list of useful instance methods of the Matcher class.
Index Methods
Index methods provide index values that show precisely where a match was found in the input string.
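No. | Method and Description |
1 | public int start() Returns the start index of the previous match. |
2 | public int start(int group) Returns the start index of the subsequence captured by the given group during the previous match operation. |
3 | public int end() Returns the offset after the last character matched. |
4 | public int end(int group) Returns the offset after the last character of the subsequence captured by the given group during the previous match operation. |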
Study methods
Study methods examine the input string and return a boolean indicating whether the pattern was found.
No. | Method and Description |
1 | public boolean lookingAt() Attempts to match the input sequence, starting at the beginning of the region, against the pattern. |
2 | public boolean find() Attempts to find the next subsequence of the input sequence that matches the pattern. |
3 | public boolean find(int start) Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index. |
4 | public boolean matches() Attempts to match the entire region against the pattern. |
Replacement methods
Replacement methods are useful for replacing text in an input string.
No. | Method and Description |
1 | public Matcher appendReplacement(StringBuffer sb, String replacement) Implements a non-terminal append-and-replace step. |
2 | public StringBuffer appendTail(StringBuffer sb) Implements a terminal append-and-replace step. |
3 | public String replaceAll(String replacement) Replaces every subsequence of the input sequence that matches the pattern with the given replacement string. |
4 | public String replaceFirst(String replacement) Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string. |
5 | public static String quoteReplacement(String s) Returns a literal replacement String for the specified String. The returned string works as a literal replacement for s in the appendReplacement method of the Matcher class. |
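The need for quoteReplacement shows up when the replacement text contains a dollar sign or a backslash, which a replacement string otherwise treats as a group reference or an escape. The sketch below assumes a hypothetical helper replaceLiterally in a hypothetical class LiteralReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplacement {
    // Hypothetical helper: replaces every match with the replacement text taken
    // literally, so "$" and "\" in it have no special meaning.
    static String replaceLiterally(String regex, String input, String replacement) {
        return Pattern.compile(regex)
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Without quoting, "$5" would be read as a reference to group 5.
        System.out.println(replaceLiterally("PRICE", "Total: PRICE", "$5")); // Total: $5
    }
}
```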
The start and end methods
- The following example counts the number of times the word "cat" appears in the input string:
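A minimal sketch of such a counter, assuming a hypothetical class CatCounter, a hypothetical helper countCats, and an illustrative input string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CatCounter {
    // Hypothetical helper: counts occurrences of the whole word "cat"
    // and prints where each one starts and ends.
    static int countCats(String input) {
        Matcher m = Pattern.compile("\\bcat\\b").matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
            System.out.println("Match " + count + ": start=" + m.start() + ", end=" + m.end());
        }
        return count;
    }

    public static void main(String[] args) {
        int total = countCats("cat cat cat cattie cat");
        System.out.println("Total: " + total);
        // Match 1: start=0, end=3
        // Match 2: start=4, end=7
        // Match 3: start=8, end=11
        // Match 4: start=19, end=22
        // Total: 4
    }
}
```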
- You can see that this example uses word boundaries to ensure that the letters "c", "a", "t" are not merely a substring of a longer word. It also gives some useful information about where in the input string each match occurred.
- The start method returns the start index of the subsequence captured by the given group during the previous match operation, and the end method returns the index of the last character matched, plus one.
The matches and lookingAt methods
- The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference is that matches requires the entire input sequence to match, while lookingAt does not.
- Both methods always start at the beginning of the input string. Here is an example that shows the difference:
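The difference can be sketched as follows, assuming a hypothetical class MatchesVsLookingAt with two hypothetical helpers and an illustrative input:

```java
import java.util.regex.Pattern;

class MatchesVsLookingAt {
    // Hypothetical helper: true if the input starts with a match for the regex.
    static boolean startsWithMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // Hypothetical helper: true only if the regex matches the entire input.
    static boolean matchesEntirely(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String regex = "foo";
        String input = "fooooooooooooooooo";
        System.out.println("lookingAt(): " + startsWithMatch(regex, input)); // lookingAt(): true
        System.out.println("matches(): " + matchesEntirely(regex, input));   // matches(): false
    }
}
```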
The replaceFirst and replaceAll Methods
- The replaceFirst and replaceAll methods replace the text that matches the specified regular expression. As their names suggest, replaceFirst replaces only the first match, and replaceAll replaces all matches.
- Here is an example explaining their functionality.
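A minimal sketch, assuming a hypothetical class ReplaceDemo with two hypothetical helpers and an illustrative sentence:

```java
import java.util.regex.Pattern;

class ReplaceDemo {
    // Hypothetical helper: replaces only the first match.
    static String first(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceFirst(replacement);
    }

    // Hypothetical helper: replaces every match.
    static String all(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "The dog says meow. All dogs say meow.";
        System.out.println(first("dog", input, "cat")); // The cat says meow. All dogs say meow.
        System.out.println(all("dog", input, "cat"));   // The cat says meow. All cats say meow.
    }
}
```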
The appendReplacement and appendTail methods
- The Matcher class also provides the appendReplacement and appendTail methods for text replacement.
- Here is an example explaining their functionality.
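These two methods are typically used together in a find loop, which allows the replacement to be computed for each match. The sketch below assumes a hypothetical class AppendDemo and a hypothetical helper doubleNumbers. It also assumes that every number in the input fits in an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendDemo {
    // Hypothetical helper: doubles every integer found in the input.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String doubled = String.valueOf(Integer.parseInt(m.group()) * 2);
            // Appends the text between the previous match and this one,
            // followed by the replacement for this match.
            m.appendReplacement(sb, Matcher.quoteReplacement(doubled));
        }
        // Appends whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```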
Methods of the PatternSyntaxException class
- PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides the following methods to help you determine what went wrong.
No. | Method and Description |
1 | public String getDescription() Retrieves the description of the error. |
2 | public int getIndex() Retrieves the error index. |
3 | public String getPattern() Retrieves the erroneous regular expression pattern. |
4 | public String getMessage() Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern. |
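The methods above can be exercised with a deliberately malformed pattern. The class SyntaxErrorReport and the helper describe are hypothetical names; the exact description text and index depend on the Java runtime, so the sketch prints them without assuming particular values.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorReport {
    // Hypothetical helper: returns "valid" if the regex compiles,
    // otherwise a one-line summary built from the exception's accessors.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return "valid";
        } catch (PatternSyntaxException e) {
            return "index=" + e.getIndex()
                    + ", pattern=" + e.getPattern()
                    + ", description=" + e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("[a-z]+"));    // valid
        System.out.println(describe("(unclosed")); // summary of the syntax error
    }
}
```

When a multi-line report is wanted instead, printing e.getMessage() in the catch block shows the description, the pattern, and a marker under the error position.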