Basic Regex Patterns in Java/Perl

Below you’ll find the basic Regex patterns you can use to match, edit and replace strings. I am using the Java/Perl Regex flavor, so the patterns might be slightly different if you are using another programming language or platform.

Java Implementation

Here’s a simple Java sample that uses the Regex library. It will try to match the given pattern on a string as many times as possible.

import java.util.regex.*;

public class Text{

        public static void main(String[] args){
                boolean found = false;
                String str1 = "Washington is located in the United States.";
                Pattern pattern = Pattern.compile("[Ww]ashington");

                Matcher matcher = pattern.matcher(str1);

                // find() moves to the next match each time it is called
                while (matcher.find()){
                        found = true;
                        System.out.println("Found: "+matcher.group());
                        System.out.println("Start index: "+matcher.start());
                        System.out.println("End index: "+matcher.end());
                }

                // reached only when the loop never found a match
                if (!found){
                        System.out.println("No match found");
                }
        }
}



Regex Patterns

1. “string” (String literals)

The most basic pattern you can use is a string literal. It matches that exact sequence of characters in the target string, as many times as it occurs.

“test” will be matched twice on the string “testtest”
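To check that claim yourself, here is a minimal sketch that counts the matches of a literal pattern. The class name LiteralCount and the helper countMatches are names I made up for this illustration; they are not part of the Regex library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralCount {

    // Counts the non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("test", "testtest")); // prints 2
    }
}
```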

2. . (metacharacter)

The . will match any single character (by default, anything except a line terminator), so the pattern “box.” will match “boxe” as well as “boxp”. The pattern “.” will match “a”, “b” and “c” on the string “abc”.

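Remember that you can escape metacharacters with a backslash, so “\.” will match a literal dot in the target string. Inside a Java string literal the backslash itself has to be escaped, so the pattern is written as "\\.".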
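Here is a small sketch of the dot and of the escaped dot. DotDemo and fullMatch are hypothetical names used only for this example; fullMatch simply wraps Pattern.matches, which requires the whole string to match.

```java
import java.util.regex.Pattern;

public class DotDemo {

    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("box.", "boxe"));  // true
        System.out.println(fullMatch("box.", "boxp"));  // true
        System.out.println(fullMatch("box.", "box"));   // false: the dot needs one character
        // "a\\.b" in Java source is the regex a\.b (a literal dot)
        System.out.println(fullMatch("a\\.b", "a.b"));  // true
        System.out.println(fullMatch("a\\.b", "axb"));  // false
    }
}
```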

3. [] (character class)

You can use brackets to create a disjunction in your pattern (i.e. a set of alternatives for a single position), which is also called a character class. The matcher will look for any one of the characters inside the brackets. For instance, this can be used to match a string with or without a capital letter.

“[Ww]ashington” will match either “Washington” or “washington”

4. ^ (negation)

The ^ metacharacter, when it is the first character inside a character class, negates that class. So “[^abc]ice” will match “dice” but not “bice”.
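The sketch below tries the negated class from the example against both words. NegationDemo and fullMatch are illustrative names of my own, not library names.

```java
import java.util.regex.Pattern;

public class NegationDemo {

    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[^abc]ice", "dice")); // true: 'd' is not a, b or c
        System.out.println(fullMatch("[^abc]ice", "bice")); // false: 'b' is excluded
    }
}
```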

5. [a-d] (range)

If you want to include many letters or digits in your character class you can use the hyphen to form a range. For instance, “[a-d]” will match any character from a through d.
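Since ranges are handy for replacing text, here is a minimal sketch that masks every character in the range with an asterisk. RangeDemo and mask are hypothetical names for this example only.

```java
public class RangeDemo {

    // Replaces every character from a through d with an asterisk.
    static String mask(String input) {
        return input.replaceAll("[a-d]", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("a1b2z")); // prints *1*2z
    }
}
```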

6. Unions and Intersections

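In Java you can compose a character class from the union of two different classes. You achieve that by nesting the classes: “[a-c[d-f]]” will match any character from a through f. This nesting syntax belongs to the Java flavor, so don’t expect it to work unchanged elsewhere.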

For the intersection you use the && symbol before the nested element. For instance, “[a-c&&[d-f]]” won’t match anything because the intersection is empty.
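A short sketch of both operations follows. I added a third pattern, “[a-f&&[d-z]]”, which is not in the text above, to show an intersection that is not empty (it leaves d, e and f). SetOpsDemo and fullMatch are illustrative names.

```java
import java.util.regex.Pattern;

public class SetOpsDemo {

    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[a-c[d-f]]", "e"));   // true: union covers a through f
        System.out.println(fullMatch("[a-c&&[d-f]]", "b")); // false: the intersection is empty
        System.out.println(fullMatch("[a-f&&[d-z]]", "e")); // true: e is in both classes
        System.out.println(fullMatch("[a-f&&[d-z]]", "b")); // false: b is only in the first
    }
}
```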

7. Predefined Classes

There are several predefined classes that will make your job easier. The most used ones are:
. (any character, except line terminators by default)
\d (any digit)
\D (any character except digits)
\s (whitespace)
\S (anything except whitespace)
\w (any word character: a letter, a digit or the underscore)
\W (anything except word characters)
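Here is a rough sketch of three of these classes at work. The class and method names (PredefinedDemo, maskDigits, splitOnWhitespace, isWord) are mine, chosen for illustration. Note that each backslash is doubled because the patterns live inside Java string literals.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PredefinedDemo {

    // Replaces every digit with '#'.
    static String maskDigits(String input) {
        return input.replaceAll("\\d", "#");
    }

    // Splits on runs of whitespace (spaces, tabs, newlines).
    static String[] splitOnWhitespace(String input) {
        return input.split("\\s+");
    }

    // True if the input consists only of word characters.
    static boolean isWord(String input) {
        return Pattern.matches("\\w+", input);
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("Order 66 shipped"));                // prints Order ## shipped
        System.out.println(Arrays.toString(splitOnWhitespace("a b\tc")));  // prints [a, b, c]
        System.out.println(isWord("snake_case1"));                         // true: underscore counts
        System.out.println(isWord("two words"));                           // false: space is not \w
    }
}
```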

8. Quantifiers

You can use quantifiers to specify how many times the preceding character, class or group should appear.

“a*” means the character ‘a’ appearing zero or more times
“a+” means the character ‘a’ appearing one or more times
“a?” means the character ‘a’ appearing once or not at all
“a{5}” means the character ‘a’ appearing exactly five times
“a{2,}” means the character ‘a’ appearing at least twice
“a{2,3}” means the character ‘a’ appearing at least twice but at most three times
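The list above can be checked with a few whole-string matches. This is only a sketch; QuantifierDemo and fullMatch are names I picked for the example.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {

    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a*", ""));         // true: zero times is allowed
        System.out.println(fullMatch("a+", ""));         // false: needs at least one 'a'
        System.out.println(fullMatch("a?", "aa"));       // false: at most one 'a'
        System.out.println(fullMatch("a{5}", "aaaaa"));  // true: exactly five
        System.out.println(fullMatch("a{2,}", "a"));     // false: fewer than two
        System.out.println(fullMatch("a{2,3}", "aaa"));  // true: between two and three
        System.out.println(fullMatch("a{2,3}", "aaaa")); // false: more than three
    }
}
```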

9. Specifying Locations

If needed you can specify exactly where your pattern should be matched on the target string.

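^ means at the beginning of the line (in Java, the beginning of the whole input unless the Pattern.MULTILINE flag is set)
$ means at the end of the line (in Java, the end of the whole input unless the Pattern.MULTILINE flag is set)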
\b means word boundary
\G means the end of the previous match
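The following sketch exercises each of these anchors. AnchorDemo, found and replaceLeadingAs are hypothetical names; the sample strings are my own. The \G example replaces only the run of a characters at the start, because each match has to begin where the previous one ended.

```java
import java.util.regex.Pattern;

public class AnchorDemo {

    // True if the regex matches somewhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    // Replaces only the consecutive 'a' characters at the start of the input.
    static String replaceLeadingAs(String input) {
        return input.replaceAll("\\Ga", "x");
    }

    public static void main(String[] args) {
        System.out.println(found("^box", 0, "boxcar"));                  // true
        System.out.println(found("^box", 0, "inbox"));                   // false
        System.out.println(found("box$", 0, "inbox"));                   // true
        System.out.println(found("\\bcat\\b", 0, "a cat sat"));          // true
        System.out.println(found("\\bcat\\b", 0, "concatenate"));        // false
        // Without MULTILINE, ^ only matches at the start of the input.
        System.out.println(found("^b", 0, "a\nb"));                      // false
        System.out.println(found("^b", Pattern.MULTILINE, "a\nb"));      // true
        System.out.println(replaceLeadingAs("aaab a"));                  // prints xxxb a
    }
}
```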
