# Word boundary: \b
A word boundary `pattern:\b` is a test, just like `pattern:^` and `pattern:$`.

When the regexp engine (the program module that implements searching for regexps) comes across `pattern:\b`, it checks whether the current position in the string is a word boundary.

There are three different positions that qualify as word boundaries:
- At the start of the string, if the first character is a word character `pattern:\w`.
- Between two characters in the string, where one is a word character `pattern:\w` and the other is not.
- At the end of the string, if the last character is a word character `pattern:\w`.

For instance, the regexp `pattern:\bJava\b` is found in `subject:Hello, Java!`, where `subject:Java` is a standalone word, but not in `subject:Hello, JavaScript!`, because there's no word boundary between `subject:a` and `subject:S`.
```js run
alert( "Hello, Java!".match(/\bJava\b/) ); // Java
alert( "Hello, JavaScript!".match(/\bJava\b/) ); // null
```
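The three rules can be folded into one check: a position is a boundary when exactly one of its two neighbours is a word character, with "outside the string" counting as a non-word character. The minimal sketch below assumes `pattern:\w` means `[A-Za-z0-9_]`, as it does in the regexps of this article; `isWordChar`, `isWordBoundary`, `boundaryPositions` and `engineAgrees` are hypothetical helper names, not built-ins. It compares the hand-written check with the engine's own `pattern:\b` at every position, using a sticky regexp.

```js
// Hypothetical helper: is `ch` a word character, i.e. one of [A-Za-z0-9_]?
// Reading str[-1] or str[str.length] gives undefined, which counts as "not \w".
function isWordChar(ch) {
  return ch !== undefined && /^[A-Za-z0-9_]$/.test(ch);
}

// Hypothetical helper: the three rules boil down to
// "exactly one neighbour of `pos` is a word character".
function isWordBoundary(str, pos) {
  const before = isWordChar(str[pos - 1]); // undefined at the string start
  const after = isWordChar(str[pos]);      // undefined at the string end
  return before !== after;
}

// All positions 0..str.length where the hand-written check holds
function boundaryPositions(str) {
  const result = [];
  for (let pos = 0; pos <= str.length; pos++) {
    if (isWordBoundary(str, pos)) result.push(pos);
  }
  return result;
}

// Cross-check with the engine: a sticky regexp (flag y) tests only at lastIndex
function engineAgrees(str) {
  for (let pos = 0; pos <= str.length; pos++) {
    const re = /\b/y;
    re.lastIndex = pos;
    if (re.test(str) !== isWordBoundary(str, pos)) return false;
  }
  return true;
}

console.log( boundaryPositions("Hello, Java!").join(",") ); // 0,5,7,11
console.log( engineAgrees("Hello, Java!") );       // true
console.log( engineAgrees("Hello, JavaScript!") ); // true
```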
In the string `subject:Hello, Java!`, the following positions correspond to `pattern:\b`: the very start (before `subject:H`), between `subject:o` and the comma, between the space and `subject:J`, and between `subject:a` and `subject:!`.

So, the string matches the pattern `pattern:\bHello\b`, because:

1. At the beginning of the string, the first test `pattern:\b` matches.
2. Then the word `pattern:Hello` matches.
3. Then the test `pattern:\b` matches again, as we're between `subject:o` and a comma.

The pattern `pattern:\bJava\b` would also match. But `pattern:\bHell\b` would not (there's no word boundary after `subject:l`, because it is followed by the word character `subject:o`), and neither would `pattern:\bJava!\b` (the exclamation mark is not a word character `pattern:\w`, and since it is the last character of the string, there's no word boundary after it).
```js run
alert( "Hello, Java!".match(/\bHello\b/) ); // Hello
alert( "Hello, Java!".match(/\bJava\b/) ); // Java
alert( "Hello, Java!".match(/\bHell\b/) ); // null (no match)
alert( "Hello, Java!".match(/\bJava!\b/) ); // null (no match)
```
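Because `pattern:\b` is a test that matches an empty string rather than a character, it can be passed to `replace` to make the boundary positions visible. A small sketch, where the `|` marker is an arbitrary choice:

```js
// \b matches an empty string at every boundary position,
// so inserting a marker there shows where the boundaries are
const marked = "Hello, Java!".replace(/\b/g, "|");
console.log(marked); // |Hello|, |Java|!

// No boundary between "a" and "S", so "Java" isn't a standalone word here
const markedJs = "Hello, JavaScript!".replace(/\b/g, "|");
console.log(markedJs); // |Hello|, |JavaScript|!
```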
We can use `pattern:\b` not only with words, but with digits as well.

For example, the pattern `pattern:\b\d\d\b` looks for standalone 2-digit numbers. In other words, it looks for 2-digit numbers surrounded by characters that are not `pattern:\w`, such as spaces or punctuation (or the start/end of the text).
```js run
alert( "1 23 456 78".match(/\b\d\d\b/g) ); // 23,78
alert( "12,34,56".match(/\b\d\d\b/g) ); // 12,34,56
```
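Remember that `pattern:\w` covers letters and the underscore too, so a 2-digit number glued to a letter or `_` is not standalone. A quick check on made-up sample strings:

```js
// "a" and "_" are \w, so there's no boundary after 12 or after 34;
// only 56 (between a space and the string end) is standalone
const glued = "12a 34_ 56".match(/\b\d\d\b/g);
console.log(glued); // only "56"

// 99 has letters on both sides, so there's no boundary around it at all
const inside = "x99y".match(/\b\d\d\b/g);
console.log(inside); // null

// "-" is not \w, so there IS a boundary after 99
const hyphen = "99-year-old".match(/\b\d\d\b/g);
console.log(hyphen); // only "99"
```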
```warn header="Word boundary `pattern:\b` doesn't work for non-Latin alphabets"
The word boundary test `pattern:\b` checks that there's a `pattern:\w` on one side of the position and "not `pattern:\w`" on the other side.

But `pattern:\w` means a Latin letter (`a-z`, `A-Z`), a digit or an underscore, so the test doesn't work for other characters, e.g. Cyrillic letters or hieroglyphs.
```
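The limitation is easy to reproduce. One possible workaround, sketched below as an assumption rather than an established idiom, is to replace `pattern:\b` with a lookbehind and a lookahead over Unicode property escapes (`\p{L}` for letters, `\p{N}` for numbers). This needs the `pattern:u` flag and an engine with lookbehind support (ES2018+), and it assumes that a "word character" means a letter, a number or `_`; the name `standaloneMir` is hypothetical.

```js
// \b finds no boundary around Cyrillic letters, because they are not \w
const plain = "Привет, мир!".match(/\bмир\b/);
console.log(plain); // null

// Hypothetical stand-in for \b: no letter, number or "_" right before or after the word
const standaloneMir = /(?<![\p{L}\p{N}_])мир(?![\p{L}\p{N}_])/u;

console.log( "Привет, мир!".match(standaloneMir)[0] ); // мир
console.log( "Привет, мирный!".match(standaloneMir) ); // null, because "н" follows "мир"
```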