Category: 14. Java Regex

  • Examples Matching Characters

    The following constructs match individual characters in a Java regular expression.

    Sr.No  Construct  Matches
    1      x          The character x
    2      \\         The backslash character
    3      \0n        The character with octal value 0n (0 ≤ n ≤ 7)
    4      \0nn       The character with octal value 0nn (0 ≤ n ≤ 7)
    5      \0mnn      The character with octal value 0mnn (0 ≤ m ≤ 3, 0 ≤ n ≤ 7)
    6      \xhh       The character with hexadecimal value 0xhh
    7      \uhhhh     The character with hexadecimal value 0xhhhh
    8      \t         The tab character (‘\u0009’)
    9      \n         The newline (line feed) character (‘\u000A’)
    10     \r         The carriage-return character (‘\u000D’)
    11     \f         The form-feed character (‘\u000C’)
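
    As a rough illustration of the constructs listed above, the following sketch checks a few of them with Pattern.matches. The class name CharEscapeDemo and the helper matchesChar are made up for this example. Note that inside a Java string literal every regex backslash has to be written twice.

    ```java
    import java.util.regex.Pattern;

    public class CharEscapeDemo {

        // Hypothetical helper: true if the regex matches the entire input.
        static boolean matchesChar(String regex, String input) {
            return Pattern.matches(regex, input);
        }

        public static void main(String[] args) {
            // The regex for one backslash is two backslashes, i.e. four in Java source.
            System.out.println(matchesChar("\\\\", "\\"));    // true
            // Octal 0101 is decimal 65, the letter A.
            System.out.println(matchesChar("\\0101", "A"));   // true
            // Hexadecimal 0x41 is also the letter A.
            System.out.println(matchesChar("\\x41", "A"));    // true
            // Four-digit hexadecimal form of the same character.
            System.out.println(matchesChar("\\u0041", "A"));  // true
            // Tab and newline constructs against the real control characters.
            System.out.println(matchesChar("\\t", "\t"));     // true
            System.out.println(matchesChar("\\n", "\n"));     // true
            // A literal character only matches itself.
            System.out.println(matchesChar("x", "y"));        // false
        }
    }
    ```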
  • PatternSyntaxException 

    Introduction

    The java.util.regex.PatternSyntaxException class is an unchecked exception thrown to indicate a syntax error in a regular-expression pattern.

    Class declaration

    Following is the declaration for java.util.regex.PatternSyntaxException class −

    public class PatternSyntaxException
       extends IllegalArgumentException
    

    Constructors

    Sr.No  Method & Description
    1      PatternSyntaxException(String desc, String regex, int index)
           Constructs a new instance of this class.

    Class methods

    Sr.No  Method & Description
    1      String getDescription()
           Retrieves the description of the error.
    2      int getIndex()
           Retrieves the error index.
    3      String getMessage()
           Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular-expression pattern, and a visual indication of the error index within the pattern.
    4      String getPattern()
           Retrieves the erroneous regular-expression pattern.

    Methods inherited

    This class inherits methods from the following classes −

    • java.lang.Throwable
    • java.lang.Object

    Example

    The following example shows the usage of the java.util.regex.PatternSyntaxException class methods.

    package com.tutorialspoint;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;

    public class PatternSyntaxExceptionDemo {
       // "[" opens a character class that is never closed, so it is not a valid pattern
       private static final String REGEX = "[";
       private static final String INPUT = "The dog says meow " + "All dogs say meow.";
       private static final String REPLACE = "cat";

       public static void main(String[] args) {
          try {
             // compile() throws PatternSyntaxException for the invalid REGEX
             Pattern pattern = Pattern.compile(REGEX);

             // get a matcher object (not reached here, because compile() fails first)
             Matcher matcher = pattern.matcher(INPUT);
             String result = matcher.replaceAll(REPLACE);
             System.out.println(result);
          } catch (PatternSyntaxException e) {
             System.out.println("PatternSyntaxException: ");
             System.out.println("Description: " + e.getDescription());
             System.out.println("Index: " + e.getIndex());
             System.out.println("Message: " + e.getMessage());
             System.out.println("Pattern: " + e.getPattern());
          }
       }
    }

    Compiling and running the above program produces the following result −

    PatternSyntaxException: 
    Description: Unclosed character class
    Index: 0
    Message: Unclosed character class near index 0
    [
    ^
    Pattern: [
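
    The accessor methods shown above are often useful for checking a pattern that comes from user input before it is used. The following is only a minimal sketch of that idea; the class RegexValidator and the method syntaxError are hypothetical names, not part of java.util.regex.

    ```java
    import java.util.Optional;
    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;

    public class RegexValidator {

        // Hypothetical helper: empty for a valid regex, otherwise a one-line
        // summary built from getDescription() and getIndex().
        static Optional<String> syntaxError(String regex) {
            try {
                Pattern.compile(regex);
                return Optional.empty();
            } catch (PatternSyntaxException e) {
                return Optional.of(e.getDescription() + " at index " + e.getIndex());
            }
        }

        public static void main(String[] args) {
            // A well-formed character class compiles without error.
            System.out.println(syntaxError("[a-z]+").isPresent()); // false
            // The unclosed class from the example above is rejected.
            System.out.println(syntaxError("[").isPresent());      // true
            System.out.println(syntaxError("[").orElse("valid"));
        }
    }
    ```

    Because PatternSyntaxException is unchecked, the compiler does not force this try/catch; it is a choice made when the pattern is not under the program's control.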
    
  • Matcher Class

    Introduction

    The java.util.regex.Matcher class acts as an engine that performs match operations on a character sequence by interpreting a Pattern.

    Class declaration

    Following is the declaration for java.util.regex.Matcher class −

    public final class Matcher
       extends Object
       implements MatchResult

    Class methods

    Sr.No  Method & Description
    1      Matcher appendReplacement(StringBuffer sb, String replacement)
           Implements a non-terminal append-and-replace step.
    2      StringBuffer appendTail(StringBuffer sb)
           Implements a terminal append-and-replace step.
    3      int end()
           Returns the offset after the last character matched.
    4      int end(int group)
           Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
    5      boolean find()
           Attempts to find the next subsequence of the input sequence that matches the pattern.
    6      boolean find(int start)
           Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
    7      String group()
           Returns the input subsequence matched by the previous match.
    8      String group(String name)
           Returns the input subsequence captured by the given named-capturing group during the previous match operation.
    9      int groupCount()
           Returns the number of capturing groups in this matcher’s pattern.
    10     boolean hasAnchoringBounds()
           Queries the anchoring of region bounds for this matcher.
    11     boolean hasTransparentBounds()
           Queries the transparency of region bounds for this matcher.
    12     boolean hitEnd()
           Returns true if the end of input was hit by the search engine in the last match operation performed by this matcher.
    13     boolean lookingAt()
           Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
    14     boolean matches()
           Attempts to match the entire region against the pattern.
    15     Pattern pattern()
           Returns the pattern that is interpreted by this matcher.
    16     static String quoteReplacement(String s)
           Returns a literal replacement String for the specified String.
    17     Matcher region(int start, int end)
           Sets the limits of this matcher’s region.
    18     int regionEnd()
           Reports the end index (exclusive) of this matcher’s region.
    19     int regionStart()
           Reports the start index of this matcher’s region.
    20     String replaceAll(String replacement)
           Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
    21     String replaceFirst(String replacement)
           Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
    22     boolean requireEnd()
           Returns true if more input could change a positive match into a negative one.
    23     Matcher reset()
           Resets this matcher.
    24     Matcher reset(CharSequence input)
           Resets this matcher with a new input sequence.
    25     int start()
           Returns the start index of the previous match.
    26     int start(int group)
           Returns the start index of the subsequence captured by the given group during the previous match operation.
    27     MatchResult toMatchResult()
           Returns the match state of this matcher as a MatchResult.
    28     String toString()
           Returns the string representation of this matcher.
    29     Matcher useAnchoringBounds(boolean b)
           Sets the anchoring of region bounds for this matcher.
    30     Matcher usePattern(Pattern newPattern)
           Changes the Pattern that this Matcher uses to find matches with.
    31     Matcher useTransparentBounds(boolean b)
           Sets the transparency of region bounds for this matcher.
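
    To make a few of the methods above more concrete, here is a small sketch that uses find() with start()/end()/group(), the appendReplacement()/appendTail() pair, and the difference between lookingAt() and matches(). The class MatcherDemo and its helpers findAll and doubleNumbers are illustrative names only.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MatcherDemo {

        // Records "text@start-end" for every match that find() locates.
        static List<String> findAll(String regex, String input) {
            List<String> hits = new ArrayList<>();
            Matcher m = Pattern.compile(regex).matcher(input);
            while (m.find()) {
                hits.add(m.group() + "@" + m.start() + "-" + m.end());
            }
            return hits;
        }

        // Doubles every integer in the input. appendReplacement() copies the
        // text before each match plus the replacement; appendTail() copies
        // whatever follows the last match.
        static String doubleNumbers(String input) {
            Matcher m = Pattern.compile("\\d+").matcher(input);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                int doubled = Integer.parseInt(m.group()) * 2;
                m.appendReplacement(sb, Integer.toString(doubled));
            }
            m.appendTail(sb);
            return sb.toString();
        }

        public static void main(String[] args) {
            System.out.println(findAll("dog", "dog dog cat"));          // [dog@0-3, dog@4-7]
            System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears

            // lookingAt() only needs a match at the start of the input,
            // while matches() requires the whole input to match.
            Matcher m = Pattern.compile("foo").matcher("foobar");
            System.out.println(m.lookingAt()); // true
            System.out.println(m.matches());   // false
        }
    }
    ```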

    Methods inherited

    This class inherits methods from the following classes −

    • java.lang.Object
  • Pattern Class

    Introduction

    The java.util.regex.Pattern class is a compiled representation of a regular expression.

    Class declaration

    Following is the declaration for java.util.regex.Pattern class −

    public final class Pattern
       extends Object
       implements Serializable

    Fields

    Following are the fields for java.util.regex.Pattern class −

    • static int CANON_EQ − Enables canonical equivalence.
    • static int CASE_INSENSITIVE − Enables case-insensitive matching.
    • static int COMMENTS − Permits whitespace and comments in pattern.
    • static int DOTALL − Enables dotall mode.
    • static int LITERAL − Enables literal parsing of the pattern.
    • static int MULTILINE − Enables multiline mode.
    • static int UNICODE_CASE − Enables Unicode-aware case folding.
    • static int UNICODE_CHARACTER_CLASS − Enables the Unicode version of Predefined character classes and POSIX character classes.
    • static int UNIX_LINES − Enables Unix lines mode.
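
    The flags above are passed as the second argument to Pattern.compile and can be combined with the bitwise OR operator. The sketch below shows the effect of three of them; PatternFlagsDemo and the helper found are hypothetical names chosen for this example.

    ```java
    import java.util.regex.Pattern;

    public class PatternFlagsDemo {

        // Hypothetical helper: true if the regex, compiled with the given
        // flags, is found anywhere in the input.
        static boolean found(String regex, int flags, String input) {
            return Pattern.compile(regex, flags).matcher(input).find();
        }

        public static void main(String[] args) {
            String text = "first line\nSecond LINE";

            // Matching is case-sensitive unless CASE_INSENSITIVE is set.
            System.out.println(found("second", 0, text));                        // false
            System.out.println(found("second", Pattern.CASE_INSENSITIVE, text)); // true

            // Without MULTILINE, ^ matches only at the start of the input.
            System.out.println(found("^Second", 0, text));                       // false
            System.out.println(found("^Second", Pattern.MULTILINE, text));       // true

            // Without DOTALL, '.' does not match a line terminator.
            System.out.println(found("first.*LINE", 0, text));                   // false
            System.out.println(found("first.*LINE", Pattern.DOTALL, text));      // true

            // Flags are bit masks, so several can be combined with |.
            int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
            System.out.println(found("^second line$", flags, text));             // true
        }
    }
    ```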

    Class methods

    Sr.No  Method & Description
    1      static Pattern compile(String regex)
           Compiles the given regular expression into a pattern.
    2      static Pattern compile(String regex, int flags)
           Compiles the given regular expression into a pattern with the given flags.
    3      int flags()
           Returns this pattern’s match flags.
    4      Matcher matcher(CharSequence input)
           Creates a matcher that will match the given input against this pattern.
    5      static boolean matches(String regex, CharSequence input)
           Compiles the given regular expression and attempts to match the given input against it.
    6      String pattern()
           Returns the regular expression from which this pattern was compiled.
    7      static String quote(String s)
           Returns a literal pattern String for the specified String.
    8      String[] split(CharSequence input)
           Splits the given input sequence around matches of this pattern.
    9      String[] split(CharSequence input, int limit)
           Splits the given input sequence around matches of this pattern.
    10     String toString()
           Returns the string representation of this pattern.
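
    As an illustration of some of these methods, the following sketch uses split() with and without a limit, the static matches() shortcut, and quote() to search for text that contains regex metacharacters. The class PatternMethodsDemo and the helpers splitList and containsLiteral are assumptions made for this example.

    ```java
    import java.util.Arrays;
    import java.util.regex.Pattern;

    public class PatternMethodsDemo {

        // Commas with optional surrounding whitespace act as the delimiter.
        private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

        // Hypothetical helper: splits a comma-separated line into fields.
        static String[] splitList(String line) {
            return COMMA.split(line);
        }

        // Same, but the result has at most 'limit' entries.
        static String[] splitList(String line, int limit) {
            return COMMA.split(line, limit);
        }

        // quote() turns the needle into a literal pattern, so characters
        // such as '.' lose their special meaning.
        static boolean containsLiteral(String haystack, String needle) {
            return Pattern.compile(Pattern.quote(needle)).matcher(haystack).find();
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(splitList("a , b,c")));    // [a, b, c]
            System.out.println(Arrays.toString(splitList("a , b,c", 2))); // [a, b,c]

            // matches() compiles the regex and tests the whole input in one call.
            System.out.println(Pattern.matches("\\d{3}", "123"));  // true
            System.out.println(Pattern.matches("\\d{3}", "1234")); // false

            System.out.println(containsLiteral("price: 1.50", "1.50")); // true
            System.out.println(containsLiteral("price: 1250", "1.50")); // false
        }
    }
    ```

    Compiling the delimiter once into a constant, as done here, avoids recompiling the same expression on every call.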

    Methods inherited

    This class inherits methods from the following classes −

    • java.lang.Object
  • MatchResult Interface

    Introduction

    The java.util.regex.MatchResult interface represents the result of a match operation. This interface contains query methods used to determine the results of a match against a regular expression. The match boundaries, groups and group boundaries can be seen but not modified through a MatchResult.

    Interface declaration

    Following is the declaration for java.util.regex.MatchResult interface −

    public interface MatchResult
    

    Interface methods

    Sr.No  Method & Description
    1      int end()
           Returns the offset after the last character matched.
    2      int end(int group)
           Returns the offset after the last character of the subsequence captured by the given group during this match.
    3      String group()
           Returns the input subsequence matched by the previous match.
    4      String group(int group)
           Returns the input subsequence captured by the given group during the previous match operation.
    5      int groupCount()
           Returns the number of capturing groups in this match result’s pattern.
    6      int start()
           Returns the start index of the match.
    7      int start(int group)
           Returns the start index of the subsequence captured by the given group during this match.
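
    One way to obtain a MatchResult is Matcher.toMatchResult(), which returns a snapshot that is unaffected by later operations on the matcher. The sketch below collects one snapshot per match and then reads it through the interface methods listed above; MatchResultDemo and snapshots are hypothetical names.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.MatchResult;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MatchResultDemo {

        // Takes a snapshot of every match so it can be inspected after the
        // matcher has moved on.
        static List<MatchResult> snapshots(String regex, String input) {
            List<MatchResult> results = new ArrayList<>();
            Matcher m = Pattern.compile(regex).matcher(input);
            while (m.find()) {
                results.add(m.toMatchResult());
            }
            return results;
        }

        public static void main(String[] args) {
            // Group 1 is the key, group 2 is the numeric value.
            for (MatchResult r : snapshots("(\\w+)=(\\d+)", "a=1, bb=22")) {
                System.out.println(r.group() + " [" + r.start() + "," + r.end() + ")"
                    + " key=" + r.group(1) + " value=" + r.group(2));
            }
            // a=1 [0,3) key=a value=1
            // bb=22 [5,10) key=bb value=22
        }
    }
    ```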
  • Capturing Groups

    Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters “d”, “o”, and “g”.

    Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups −

    • ((A)(B(C)))
    • (A)
    • (B(C))
    • (C)

    To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher’s pattern.

    There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
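
    The numbering described above can be checked with a short sketch. It matches ((A)(B(C))) against the input "ABC" and lists group 0 followed by each numbered group; the class GroupNumberingDemo and the helper groups are made-up names for illustration.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class GroupNumberingDemo {

        // Returns group 0 through group groupCount() for a full match,
        // or an empty array when the input does not match.
        static String[] groups(String regex, String input) {
            Matcher m = Pattern.compile(regex).matcher(input);
            if (!m.matches()) {
                return new String[0];
            }
            String[] result = new String[m.groupCount() + 1];
            for (int i = 0; i <= m.groupCount(); i++) {
                result[i] = m.group(i);
            }
            return result;
        }

        public static void main(String[] args) {
            Matcher m = Pattern.compile("((A)(B(C)))").matcher("ABC");
            // groupCount() does not include group 0.
            System.out.println(m.groupCount()); // 4

            String[] all = groups("((A)(B(C)))", "ABC");
            for (int i = 0; i < all.length; i++) {
                System.out.println("group " + i + ": " + all[i]);
            }
            // group 0: ABC
            // group 1: ABC
            // group 2: A
            // group 3: BC
            // group 4: C
        }
    }
    ```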

    Example

    The following example illustrates how to find a digit string in a given alphanumeric string −

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexMatches {
       public static void main(String[] args) {
          // String to be scanned to find the pattern.
          String line = "This order was placed for QT3000! OK?";
          String pattern = "(.*)(\\d+)(.*)";

          // Create a Pattern object
          Pattern r = Pattern.compile(pattern);

          // Now create matcher object.
          Matcher m = r.matcher(line);

          if (m.find()) {
             System.out.println("Found value: " + m.group(0));
             System.out.println("Found value: " + m.group(1));
             System.out.println("Found value: " + m.group(2));
          } else {
             System.out.println("NO MATCH");
          }
       }
    }

    This will produce the following result −

    Output

    Found value: This order was placed for QT3000! OK?
    Found value: This order was placed for QT300
    Found value: 0
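
    In the output above, group 2 holds only the last digit because the first (.*) is greedy: it consumes as much input as it can and leaves just one digit for (\\d+). If the whole digit string is wanted, one possible variation (not part of the original example) is to make the first group reluctant with (.*?). The class ReluctantGroupDemo and the helper digits are hypothetical names.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ReluctantGroupDemo {

        // Returns the text captured by group 2, or null if nothing matches.
        static String digits(String regex, String line) {
            Matcher m = Pattern.compile(regex).matcher(line);
            return m.find() ? m.group(2) : null;
        }

        public static void main(String[] args) {
            String line = "This order was placed for QT3000! OK?";

            // Greedy first group: only the final digit is left for group 2.
            System.out.println(digits("(.*)(\\d+)(.*)", line));  // 0

            // Reluctant first group: it stops at the first digit, so
            // group 2 captures the complete number.
            System.out.println(digits("(.*?)(\\d+)(.*)", line)); // 3000
        }
    }
    ```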
    
  • Overview

    Java provides the java.util.regex package for pattern matching with regular expressions. The regular expression syntax in Java is very similar to that of the Perl programming language.

    A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.

    The java.util.regex package primarily consists of the following three classes −

    • Pattern Class − A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile() methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.
    • Matcher Class − A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher() method on a Pattern object.
    • PatternSyntaxException − A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
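
    The way these three classes fit together can be sketched as follows: a Pattern is obtained from the static compile() method, a Matcher is obtained from the pattern, and a PatternSyntaxException signals an invalid expression. The class RegexOverviewDemo, the helper countMatches, and the choice of -1 as an error value are assumptions made for this example.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;

    public class RegexOverviewDemo {

        // Counts the matches of regex in input; returns -1 (an arbitrary
        // sentinel chosen for this sketch) if the regex is malformed.
        static int countMatches(String regex, String input) {
            try {
                // Pattern has no public constructor; compile() is the factory.
                Pattern pattern = Pattern.compile(regex);
                // Likewise, a Matcher is created by the pattern itself.
                Matcher matcher = pattern.matcher(input);
                int count = 0;
                while (matcher.find()) {
                    count++;
                }
                return count;
            } catch (PatternSyntaxException e) {
                return -1;
            }
        }

        public static void main(String[] args) {
            // \b is a word boundary, so the "cat" inside "concat" is skipped.
            System.out.println(countMatches("\\bcat\\b", "cat concat cat")); // 2
            // An unclosed group is a syntax error.
            System.out.println(countMatches("(", "anything"));              // -1
        }
    }
    ```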