Friday 30 May 2014

Regular Expressions in Oracle

What Are Regular Expressions?

Regular expressions (normally abbreviated regex or regexp) are sequences of characters that form a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace” like operations. The concept of regular expressions came into existence around 1950s through an American mathematician named Stephen Kleene. These expressions were used first time in UNIX. In today’s world, regular expressions are widely supported in many programming languages like Oracle, Java, .NET, Perl, Python, C++, Ruby, AWK etc.

Regular Expressions in Oracle:

Regular Expressions came into Oracle with 10g database. Oracle Regular Expressions provide a simple yet powerful mechanism for rapidly describing patterns and greatly simplifies the way in which you search, extract, format, and otherwise manipulate text in the database. Oracle DB implements regular expression support with a set of Oracle Database SQL functions and conditions that enable you to search and manipulate string data. You can use these functions in any environment that supports Oracle Database SQL. You can use these functions on a text literal, bind variable, or any column that holds character data such as CHARNCHARCLOBNCLOBNVARCHAR2, and VARCHAR2 (but not LONG).
Here are those condition and functions supported by Oracle:
Oracle Condition/Function Description
REGEXP_LIKE It is similar to the LIKE condition, except REGEXP_LIKE performs regular expression matching instead of the simple pattern matching performed by LIKE. It searches a character column for a pattern. Use this function in the WHERE clause of a query to return rows matching a regular expression. The condition is also valid in a constraint or as a PL/SQL function returning a Boolean.
REGEXP_INSTR It extends the functionality of the INSTR function by letting you search a string for a regular expression pattern. It searches a string for a given occurrence of a regular expression pattern and returns an integer indicating the position in the string where the match is found. You can specify which occurrence you want to find and the start position.
REGEXP_REPLACE It extends the functionality of the REPLACE function by letting you search a string for a regular expression pattern. It searches for a pattern in a character column and replaces each occurrence of that pattern with the specified string.
REGEXP_SUBSTR It extends the functionality of the SUBSTR function by letting you search a string for a regular expression pattern. It returns the substring matching the regular expression pattern that you specify.
REGEXP_COUNT It complements the functionality of the REGEXP_INSTR function by returning the number of times a pattern occurs in a source string. The function evaluates strings using characters as defined by the input character set. It returns an integer indicating the number of occurrences of pattern. If no match is found, then the function returns 0. 

Here are the syntax for those condition and functions:
Oracle Condition/Function Syntax
REGEXP_LIKE REGEXP_LIKE(search_string, pattern [,match_option])
REGEXP_INSTR REGEXP_INSTR(search_string, pattern [, position [, occurrence [, return_option [, match_option]]]])
REGEXP_REPLACE REGEXP_REPLACE(search_string, pattern [,replace_string [, position [, occurrence [, match_option]]]])
REGEXP_SUBSTR REGEXP_SUBSTR(search_string, pattern [, position [, occurrence [, match_option]]])
REGEXP_COUNT REGEXP_COUNT(search_string, pattern [,position] [,match_option])

Where:
  • search_string: the string you are searching in.
  • pattern: A regular expression. (Max 512 bytes). 
  • position: The character position from which you want the search to begin.
  • occurrence: The occurrence you are searching for. If you want to return the second match for the regular expression, this is a 2.
  • return_option: 0 means return the pattern’s beginning position. 1 mean return the ending character position.
  • match_option: can have one of these values:
    ‘c’: case-sensitive (default);
    ‘i’: case-insensitive; 
    ‘n’: Allow match-any-character operator (.);
    ‘m’: Treat source string as multiple line. For every line ^ and $ will be used for the beginning and ending of the line.
Oracle Database follows the exact syntax and matching semantics for these operators as defined in the IEE POSIX (Portable Operating System Interface) standard for matching ASCII (English language) data.
Here is the list of POSIX Metacharacters that can be used Oracle Database Regular Expressions:
Syntax Description Example
. Matches any character in the supported character set except NULL. a.b matches abb, acb, adb etc.
+ Matches one or more occurrences a+ matches a, aa, aaa
? Matches zero or one occurrence ab?c matches abc and ac, but does not match abbc.
* Matches zero or more occurrences  ab*c matches acabc, and abbc.
^ Matches the beginning of a string by default. In multiline mode, it matches the beginning of any line anywhere within the source string. ^abc matches abc in the string abcxyz but does not match abc in xyzabc.
$ Matches the end of a string by default. In multiline mode, it matches the end of any line anywhere within the source string. $abc matches abc in the string xyzabc but does not match abc in abcxyz.
\ Escape character. Use a backslash to search for a character that is normally threated as a metacharacter. The expression \+ searches for the plus character (+). It matches the plus in the string abc+xyz.
[] Bracket expression for specifying a matching list that should match any one of the expressions represented in the list. [abc] matches the first character in the strings allbill, and cold, but does not match any characters in doll.
[^ ] Non matching character list. Matches none of the expressions represented in the list The expression [^abc] matches the character d in abcdef, but not the character ab, or c.
| Matches one of the alternatives. a|b matches character a or character b.
[:class:] Matches any character belonging to the specified POSIX character class.  [:alnum:] Alphanumeric characters 
[:alpha:] Alphabetic characters 
[:ascii:] ASCII characters 
[:blank:] Space and tab 
[:cntrl:] Control characters
[:digit:] Digits 
[:graph:] Visible characters 
[:lower:] Lowercase letters [a-z]
[:print:] Visible characters and spaces
[:punct:] Punctuation and symbols. 
[:space:] All whitespace characters, including line breaks 
[:upper:] Uppercase letters 
[:word:] Word characters
[:xdigit:] Hexadecimal digits
The expression [[:upper:]]+ searches for one or more consecutive uppercase characters. 
[.element.] Specifies a collating element to use in the regular expression.  The expression [[.ch.]] searches for the collating element ch and matches ch in string chabc, but does not match cdefg.
{m} Matches exactly m times a{3} matches the strings aaa, but does not match aa.
{m,} Matches at least m times a{3,} matches the strings aaa and aaaa, but does not match aa.
{m,n} Matches at least m times but no more than n times a{3,5} matches the strings aaaaaaa, andaaaaa, but does not match aa.
\n The backreference expression (n is a digit between 1 and 9) matches the nth subexpression enclosed between ‘(‘ and ‘)’ preceding the \n The expression (abc|def)xy\1 matches the strings abcxyabc and defxydef, but does not match abcxydef orabcxy.
( ) Grouping expression, treated as a single subexpression (abc)?def matches the optional string abc, followed by def. Thus, the expression matches abcdefghi and def, but does not match ghi.

Oracle Database also provides built-in support for some of the most heavily used Perl regular expression extensions that are not included in the POSIX standard but do not conflict with it. Here is the list that describes Perl-influenced metacharacters supported in Oracle Database regular expression functions and conditions.
Reg. Exp. Description
\d A digit character. It is equivalent to the POSIX class [[:digit:]].
\D A non-digit character. It is equivalent to the POSIX class [^[:digit:]].
\w A word character, which is defined as an alphanumeric or underscore (_) character. It is equivalent to the POSIX class [[:alnum:]_].
\W A non-word character. It is equivalent to the POSIX class [^[:alnum:]_].
\s A whitespace character. It is equivalent to the POSIX class [[:space:]].
\S A non-whitespace character. It is equivalent to the POSIX class[^[:space:]].
\A Only at the beginning of a string. In multi-line mode, that is, when embedded newline characters in a string are considered the termination of a line, \A does not match the beginning of each line.
\Z Only at the end of string or before a newline ending a string. In multi-line mode, that is, when embedded newline characters in a string are considered the termination of a line, \Z does not match the end of each line.
\z Only at the end of a string.
*? The preceding pattern element 0 or more times (non-greedy). Note that this quantifier matches the empty string whenever possible.
+? The preceding pattern element 1 or more times (non-greedy).
?? The preceding pattern element 0 or 1 time (non-greedy). Note that this quantifier matches the empty string whenever possible.
{n}? The preceding pattern element exactly n times (non-greedy). In this case{n}? is equivalent to {n}.
{n,}? The preceding pattern element at least n times (non-greedy).
{n,m}? At least n but not more than m times (non-greedy). Note that {0,m}?matches the empty string whenever possible.

Few Examples:

Demo Table: xx_demo_reg_exp

] Find the Employees where emp_num column contains only numbers:
 
select *
from xx_demo_reg_exp
where regexp_like(emp_num, '^-?[[:digit:],.]*$');
 
 regexp example 1
2] Put a space after every character in the column FIRST_NAME:

SELECT first_name,
REGEXP_REPLACE(first_name, '(.)', '\1 ') "First Name with Space"
FROM xx_demo_reg_exp;
 
 
regexp example 2 
 
3] Count how many m’s (ignore case) are there in the LAST_NAME column of the employees:
 
SELECT LAST_NAME,
REGEXP_COUNT(last_name, 'M', 1, 'i') "Count of M"
FROM xx_demo_reg_exp;
regexp example 3 
 
4] Find the employees whose LAST_NAME contain at-least one lower case character:
 
SELECT *
FROM xx_demo_reg_exp
WHERE REGEXP_LIKE(LAST_NAME, '[[:lower:]]');
 
 
regexp example 4 
 
 6] Find employees with valid phone numbers (10 digits with formats XXXXXXXXXX or XXX-XXX-XXXX or (XXX) XXX-XXXX)
 
select *
from xx_demo_reg_exp
WHERE REGEXP_LIKE(phone, '^\(?\d{3}\)?([[:blank:]|-])?\d{3}-?\d{4}$');
 
regexp example 6 
 
 7] Find employees with valid gmail id:
 
SELECT *
FROM xx_demo_reg_exp WHERE
REGEXP_LIKE (EMAIL, '[a-z0-9]+@gmail+\.[a-z]{2,4}');
 
 
 regexp example 7
 
 
 
 
 
 
 

1 comment:

  1. Regards,
    Harish
    Oracle technical and fusion cloud SCM online classes, if you are interested please whatsapp to this number +91 7382582893, thank you.

    ReplyDelete