Day 12. Regular Expressions

On Day 12, I learned about regular expressions. I had already learned the concept on Day 2, but in this lesson I could learn more deeply about how to use them. Let’s read the documents and examples to get used to the various expressions!



Regular Expressions

A regular expression, or RegExp, is a small programming language that helps to find patterns in data. A RegExp can be used to check whether some pattern exists in different types of data.

  • How to use RegExp in JavaScript

    1) Use RegExp constructor

    2) Declare a RegExp literal: a pattern between two forward slashes(/ /), optionally followed by a flag

  • To declare a string
    • single quote(‘ ‘)
    • double quote(“ “)
    • backtick(``)
  • To declare a regular expression
    • two forward slashes(/ /) and an optional flag
    • the flag could be: g, i, m, s, u, y

1. RegExp parameters

A regular expression takes two parameters: a search pattern(required) and a flag(optional).

Pattern

A pattern could be a text or any form of pattern that has similarity. For instance, the word ‘spam’ in an email could be a pattern. A pattern is something I’m interested in looking for. A pattern could be a word, a number, etc.

Flags

Flags are optional parameters in a regular expression which determine the type of searching.

  • g: global flag. It looks for a pattern in the whole text.

  • i: ignore case. It searches case insensitively, for both lowercase and uppercase.

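  • m: multiline. It makes ^ and $ match at the start and end of each line, not only at the start and end of the whole text.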
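To see what each flag changes, here is a small sketch I put together. The sample text and the variable names are my own assumptions, not from the lesson.

```javascript
const text = 'Love is kind.\nlove is patient.';

// g: collect every case-sensitive match in the whole text
const globalOnly = text.match(/love/g);
console.log(globalOnly); // ['love']

// g + i: collect every match regardless of case
const globalIgnoreCase = text.match(/love/gi);
console.log(globalIgnoreCase); // ['Love', 'love']

// without m, ^ only matches at the very start of the string
const startOfString = text.match(/^love/gi);
console.log(startOfString); // ['Love']

// with m, ^ also matches right after each newline
const startOfLine = text.match(/^love/gim);
console.log(startOfLine); // ['Love', 'love']
```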


2. Creating a pattern with RegExp Constructor

/pattern/flags
new RegExp(pattern[, flags])
RegExp(pattern[, flags])


1) Declaring regular expression without flags

let pattern = 'love';
let regEx = new RegExp(pattern);

2) Declaring regular expression with g flag and i flag

let pattern = 'love';
let flag = 'gi'
let regEx = new RegExp(pattern, flag);

3) Declaring regular expression with flags + writing the flag inside the RegExp constructor

let regEx = new RegExp('love', 'gi');


3. Creating a pattern without RegExp Constructor

To declare a regular expression, use two forward slashes(/ /) and an optional flag.

// without using RegExp constructor
let regEx = /love/gi;

// using RegExp constructor
let regEx = new RegExp('love', 'gi');
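As far as I understand, the constructor is most useful when the pattern is only known while the program runs. The sketch below assumes a hypothetical variable `word` that holds such a value. It also shows one thing to be careful about: inside a string, a backslash has to be doubled.

```javascript
// Hypothetical case: the word to search for is stored in a variable
const word = 'love';
const dynamicPattern = new RegExp(word, 'gi');
const found = 'I love JavaScript. LOVE it!'.match(dynamicPattern);
console.log(found); // ['love', 'LOVE']

// In a string, '\\d' is needed to pass \d to the RegExp constructor
const literalDigits = /\d+/;
const constructedDigits = new RegExp('\\d+');
const samePattern = literalDigits.source === constructedDigits.source;
console.log(samePattern); // true
```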


4. RegExp Object Methods

1) testing for a match : test()

regexp.test(string)

test() is used when I want to know if the pattern exists in a string. It returns true or false.

const str = 'I love JavaScript';
const pattern = /love/;  // must be a RegExp, because a string has no test() method
let result = pattern.test(str);
console.log(result);  // true

A string method search() returns the index of a match(or -1 if the pattern is not found).
A RegExp method test() returns a boolean.
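A quick side-by-side sketch of the two methods (the variable names are my own):

```javascript
const sentence = 'I love JavaScript';

// test() is called on the RegExp and answers yes or no
const hasLove = /love/.test(sentence);
const hasHate = /hate/.test(sentence);
console.log(hasLove, hasHate); // true false

// search() is called on the string and answers with a position
const loveIndex = sentence.search(/love/);
const hateIndex = sentence.search(/hate/);
console.log(loveIndex, hateIndex); // 2 -1
```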


match()

string.match(regexp)

Return value:

  • if the match is not found: null

  • with g flag: all results matching the complete regular expression will be returned. Capturing groups will not be returned.

  • without g flag: only the first complete match and its related capturing groups are returned. The returned item will have additional properties: index, input, groups.


  • groups :

an object of named capturing groups, whose keys are the names and values are the capturing groups.

undefined if no named capturing groups were defined.

  • index : the index where the result was found

If I search for a word, match() without the g flag will return the index of the first occurrence of the word.

  • input: a copy of the search string
// if the match is not found
const str = 'I love JavaScript';
const pattern = /art/;
const result = str.match(pattern);
console.log(result);  // null


// with 'g' flag
const str = 'I love JavaScript';
const pattern = /[A-Z]/g;
const result = str.match(pattern);
console.log(result);
// (3) ['I', 'J', 'S']


// without 'g' flag
const str = 'I love JavaScript';
const pattern = /love/;
const result = str.match(pattern);
console.log(result);
// ['love', index: 2, input: 'I love JavaScript', groups: undefined]
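In the results above, groups is undefined because none of the patterns has a named capturing group. Here is a minimal sketch of what groups looks like when names are defined with the (?<name>...) syntax. The date text and the group names are my own assumptions.

```javascript
const dateText = 'Today is 2022-05-11.';
// three named capturing groups: year, month, day
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = dateText.match(datePattern);

console.log(dateMatch[0]);           // '2022-05-11'
console.log(dateMatch.index);        // 9
console.log(dateMatch.groups.year);  // '2022'
console.log(dateMatch.groups.month); // '05'
console.log(dateMatch.groups.day);   // '11'
```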



search()

search() is used when I want to know whether a pattern is found, and also know its index within a string.

string.search(regexp)

Return value:

  • if found: the index of the first match between the regular expression and the given string.

  • if not found: -1

const str = 'I love JavaScript';
const pattern = /love/g;
const result = str.search(pattern);
console.log(result);  // 2


3) replacing a substring: replace()

string.replace(regexp, newSubstr)
string.replace(regexp, replacerFunction)

string.replace(substr, newSubstr)
string.replace(substr, replacerFunction)

replace() executes a search for a match in a string, and replaces the matched substring with a replacement substring.

const txt = 'If it rains, I will take my umbrella. The Umbrella Academy is so fun.';

const newTxt = txt.replace(/Umbrella|umbrella/, 'Sparrow');
console.log(newTxt);
// If it rains, I will take my Sparrow. The Umbrella Academy is so fun.

I can use the OR operator(|) to match both the lowercase and the uppercase word. Using the i flag does the same job more easily. Without the g flag, only the first match is replaced. One more thing: replace() does not change the original string. It returns a new string, so I store the result in another variable.

const txt = 'If it rains, I will take my umbrella. The Umbrella Academy is so fun.';

const newTxt = txt.replace(/Umbrella/gi, 'Sparrow');
console.log(newTxt);
// If it rains, I will take my Sparrow. The Sparrow Academy is so fun.

Since I used both g flag and i flag, replace() method will find the pattern in the whole text(g) case-insensitively(i).
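The syntax above also allows a replacer function in place of the new substring. As a rough sketch (the capitalisation rule here is just my own example), the function receives each matched text and returns what should replace it:

```javascript
const umbrellaText = 'If it rains, I will take my umbrella. The Umbrella Academy is so fun.';

// The replacer function is called once per match.
// Here it keeps the capitalisation of the first letter of the match.
const replaced = umbrellaText.replace(/umbrella/gi, (match) =>
  match[0] === 'U' ? 'Sparrow' : 'sparrow'
);

console.log(replaced);
// If it rains, I will take my sparrow. The Sparrow Academy is so fun.
```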


4) How to set a regular expression

[ ] : a set of characters

[a-c]: a or b or c

[a-z]: any letter between a and z(lowercase)

[A-Z]: any letter between A and Z(uppercase)

[0-3]: 0 or 1 or 2 or 3

[0-9]: any number between 0 and 9 (same as `\d`)

[A-Za-z0-9]: any character between A and Z, a and z, 0 and 9
  • Using square bracket([]) with/without g flag
// without 'g' flag
const pattern = /[Aa]pple/;
const txt = 'Apple is delicious. I like an apple juice.'
const matches = txt.match(pattern);

console.log(matches);
// ['Apple', index: 0, input: 'Apple is delicious. I like an apple juice.', groups: undefined]


// with 'g' flag
const pattern = /[Aa]pple/g;
const txt = 'Apple is delicious. I like an apple juice.'
const matches = txt.match(pattern);

console.log(matches);  // (2) ['Apple', 'apple']

In the code above, [Aa] means ‘either A or a’.


  • Using square bracket([]) and or operator(|)
const pattern = /[Aa]pple|[Oo]range/g
const txt = 'Apple is delicious. I like apple juice and orange juice. Orange has a nice flavor.'
const matches = txt.match(pattern);
console.log(matches);
// (4) ['Apple', 'apple', 'orange', 'Orange']


\ : escape from special characters

  • \d: one digit - matches any single digit(0-9)

  • \D: one non-digit - matches any single character that is not a digit, including space(‘ ‘)


  • Using escape character(\) with/without +
// without +
const pattern = /\d/g;
const txt = 'Today is May 11, 2022.';
const matches = txt.match(pattern);

console.log(matches); 
// (6) ['1', '1', '2', '0', '2', '2']


// with +(one or more times)
const pattern = /\d+/g;
const txt = 'Today is May 11, 2022.';
const matches = txt.match(pattern);

console.log(matches); // (2) ['11', '2022']


// \D
const pattern = /\D/g;
const txt = 'Today is May 11, 2022.';
const matches = txt.match(pattern);

console.log(matches);
// (16) ['T', 'o', 'd', 'a', 'y', ' ', 'i', 's', ' ', 'M', 'a', 'y', ' ', ',', ' ', '.']


// \D+
const pattern = /\D+/g;
const txt = 'Today is May 11, 2022.';
const matches = txt.match(pattern);

console.log(matches);
// (3) ['Today is May ', ', ', '.']

‘\d’ is a special sequence that means ‘a digit’. The backslash gives the letter d this special meaning.


. : any character except the newline character(\n)

[a]. finds a pattern that starts with ‘a’, followed by any one character except newline(\n).

// search aㅁ
const pattern = /[a]./g;
const txt = 'Apple and banana are fruits.';
const matches = txt.match(pattern);

console.log(matches);
// (5) ['an', 'an', 'an', 'a ', 'ar']


// search aㅁㅁ
const pattern = /[a]../g;
const txt = 'Apple and banana are fruits.'
const matches = txt.match(pattern);

console.log(matches);
// (3) ['and', 'ana', 'a a']


// search aㅁㅁ...
const pattern = /[a].+/g;
const txt = 'Apple and banana are fruits.'
const matches = txt.match(pattern);

console.log(matches);
// ['and banana are fruits.']



^ (Caret) : start of string / negation

  • Using ^ as ‘starts with’

^word : a sentence which starts with word

const str = 'Learning JavaScript is fun.';
// test if the string starts with 'l'
console.log(/^l/i.test(str));  // true


// without 'g' flag
const txt = 'This is Ramona. This lesson is interesting.'
const pattern = /^This/;
const matches =  txt.match(pattern);

console.log(matches);
// ['This', index: 0, input: 'This is Ramona. This lesson is interesting.', groups: undefined]


// with 'g' flag (no difference)
const txt = 'This is Ramona. This lesson is interesting.';
const pattern = /^This/g;
const matches =  txt.match(pattern);

console.log(matches);  // ['This']


  • Using ^ as negation(in a character set)

[^abc] : not a, not b, not c

const txt = 'This is Ramona. It is 11 in the morning.';
// not A-Z, not a-z, not ., not ,, not space( )
const pattern = /[^A-Za-z., ]+/g;
const matches =  txt.match(pattern);

console.log(matches);  // ['11']
// without '+', ['1', '1'] is returned


$ : end of string

word$ : a sentence which ends with word
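A short sketch of $ in action, reusing sentences from the examples above. Note that the dot has to be escaped(\.) to mean a real period.

```javascript
const line = 'Learning JavaScript is fun.';

// the string ends with 'fun.' (the period is part of the ending)
const endsWithFunDot = /fun\.$/.test(line);
console.log(endsWithFunDot); // true

// 'fun' is not the very end because a period follows it
const endsWithFun = /fun$/.test(line);
console.log(endsWithFun); // false

// $ can be combined with the i flag
const endsWithJs = /javascript$/i.test('I love JavaScript');
console.log(endsWithJs); // true
```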


^...$ : exact match

^ and $ can be used together like ^...$ to find a full match.

/* a pattern that starts with one uppercase letter(A-Z),
followed by 3 to 12 lowercase letters(a-z) and nothing else */
let pattern = /^[A-Z][a-z]{3,12}$/;
let name = 'Ramona';
let result = pattern.test(name);

console.log(result);  // true


// some experiments (the pattern has to be tested again for each name)
console.log(pattern.test('ramOnd'));  // false

console.log(pattern.test('RamONd'));  // false



* : zero or more times (same as {0,})

[word]* : word may not exist or occur many times

/* find a pattern that starts with 'a' 
followed by zero or more characters except newline(\n) */
const pattern = /[a].*/g
const txt = 'Apple and banana are fruits.'
const matches = txt.match(pattern);

console.log(matches);
// ['and banana are fruits.']


+ : one or more times

[word]+ : word may occur at least once or many times
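A tiny sketch of + with a letter (the sample sentence is made up by me):

```javascript
const noisy = 'Woooow, that is sooo cool';

// o+ matches each run of one or more consecutive 'o' characters
const runsOfO = noisy.match(/o+/g);
console.log(runsOfO); // ['oooo', 'ooo', 'oo']
```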


? : zero or one times (same as {0,1})

[word]? : word may not exist or occur only once, making the symbol (word) optional

th?at finds a pattern that starts with ‘t’, followed by an optional ‘h’(none or one) and ends with ‘at’.

const txt = 'hat that tat threat thhat';
// search tㅁat
const pattern = /th?at/g; 
const matches = txt.match(pattern);

console.log(matches);
// (2) ['that', 'tat']
// notice that 'thhat' isn't included


const txt = 'Do you write color or colour?'
// search coloㅁr
const pattern = /colou?r/g; 
const matches = txt.match(pattern);

console.log(matches);  // (2) ['color', 'colour']

In the code above, the pattern looks for ‘colo’ followed by zero or one ‘u’, and then ‘r’. It returns both ‘color’ and ‘colour’. ? cannot be placed at the front of a pattern, because it needs a preceding character to apply to.


{} : the quantifier

I can specify the length of the substring that I look for in a text, using curly braces.

  • {3} : exactly 3 characters

  • {3,} : at least 3 characters

  • {3,8} : 3 to 8 characters

// exact length
const txt = 'Today is May 11 of the year 2022.';
const pattern = /\d{4}/g;
const matches = txt.match(pattern);

console.log(matches);  // ['2022']


// range of length
const txt = 'Today is May 11 of the year 2022. I am learning coding for 48 days.';
const pattern = /\d{1,4}/g;
const matches = txt.match(pattern);

console.log(matches);  // (3) ['11', '2022', '48']
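The three forms of the quantifier can be compared on the same text. In this sketch I also use \b(word boundary), which is not covered above, so that only whole words are matched. The sample words are my own.

```javascript
const words = 'a an ant anti antic';

// {3}: whole words with exactly 3 letters
const exactlyThree = words.match(/\b[a-z]{3}\b/g);
console.log(exactlyThree); // ['ant']

// {3,}: whole words with at least 3 letters
const atLeastThree = words.match(/\b[a-z]{3,}\b/g);
console.log(atLeastThree); // ['ant', 'anti', 'antic']

// {2,4}: whole words with 2 to 4 letters
const twoToFour = words.match(/\b[a-z]{2,4}\b/g);
console.log(twoToFour); // ['an', 'ant', 'anti']
```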


| : either / or

a|b : either a or b


() : capture and group
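Here is a minimal sketch of what parentheses do, based on my understanding. They group part of a pattern, and match() returns the captured part after the complete match. The name 'Ramona Smith' is made up only for illustration.

```javascript
const fruitText = 'I like apple juice and orange juice.';

// (apple|orange) groups the alternatives and captures whichever one matched
const groupPattern = /(apple|orange) juice/;
const groupMatch = fruitText.match(groupPattern);
console.log(groupMatch[0]); // 'apple juice' (the complete match)
console.log(groupMatch[1]); // 'apple' (the first capturing group)

// In replace(), $1 and $2 refer to the first and second captured groups
const swapped = 'Ramona Smith'.replace(/([A-Za-z]+) ([A-Za-z]+)/, '$2, $1');
console.log(swapped); // 'Smith, Ramona'
```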

