
How to Tokenize a String in JavaScript

Some languages offer dedicated classes for tokenizing strings. JavaScript has no class or function dedicated to tokenizing, but it does offer an effective mechanism in the form of regular expressions. You can therefore combine a predefined JavaScript method with a regular expression to parse a string into tokens.

This article will illustrate the procedure for JavaScript string tokenization.

How to Tokenize a String in JavaScript?

To tokenize a string in JavaScript, use the built-in “split()” method. The split() method splits a string into an array of substrings and leaves the original string unchanged. It accepts two optional parameters that control how the split is performed.

How to Tokenize a String Using the split() Method?

Use the following syntax of the split() method to tokenize a string in JavaScript:

string.split(separator, limit);

 

    • Here, the “separator” is an alphanumeric or non-alphanumeric character, such as a space, or a regex pattern that specifies where to split the string.
    • “limit” is an integer that sets the maximum number of substrings in the returned array.
    • The method is invoked with dot notation on a variable that holds a string value.
    • It returns an array of substrings based on the arguments. If no argument is passed, it returns an array whose only element is the whole string.
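The parameters described above can be sketched as follows. This is a minimal illustration, not part of the original examples: the string “sample” and the variable names are hypothetical and chosen only to show how “separator” and “limit” affect the result.

```javascript
// Hypothetical sample string, used only for illustration.
const sample = "one two three four";

// Separator only: split at every space.
const allTokens = sample.split(" ");
console.log(allTokens); // → ["one", "two", "three", "four"]

// Separator and limit: keep at most the first two substrings.
const firstTwo = sample.split(" ", 2);
console.log(firstTwo); // → ["one", "two"]

// No arguments: a one-element array that holds the whole string.
const whole = sample.split();
console.log(whole); // → ["one two three four"]

// The original string is left unchanged.
console.log(sample); // → "one two three four"
```

Note that the limit truncates the result rather than merging the remainder of the string into the last token.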

Example 1

In the following example, first, create a variable “str” and store a string in it:

var str = "LinuxHint is the best website for learning skills";

 
Now, split the string into tokens using the “split()” method by passing a space (“ ”) as the argument. The space indicates that the string will be split wherever a space occurs:

var strToken = str.split(" ");

 
Finally, print the tokens on the console using the “console.log()” method:

console.log(strToken);

 
The output is an array of the eight words in the string, from “LinuxHint” to “skills”, split at each space (“ ”).


The split() method also accepts a regex pattern as the separator instead of a specific character:

var strToken = str.split(/\W+/);

 
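Here, in the regex pattern, the forward slashes (/) mark the start and end of the pattern, while (\W) is the metacharacter that matches any non-word character, that is, any character other than a-z, A-Z, 0-9, and the underscore. White space therefore counts as a match. The (+) quantifier matches one or more such characters in a row, so a run of separators is treated as a single split point.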

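Output: for this string, the call returns the same eight words as the space-based split, because spaces are its only non-word characters.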
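The difference between the two separators shows up once the string contains punctuation. The following is a small sketch under that assumption. The string “sentence” and the variable names are hypothetical and do not come from the examples above.

```javascript
// Hypothetical sentence with punctuation, used only for illustration.
const sentence = "Hello, world! How are you?";

// Splitting on runs of non-word characters removes the punctuation,
// but the trailing "?" leaves an empty string at the end of the array.
const rawTokens = sentence.split(/\W+/);
console.log(rawTokens); // → ["Hello", "world", "How", "are", "you", ""]

// Dropping empty strings gives just the words.
const wordTokens = rawTokens.filter(Boolean);
console.log(wordTokens); // → ["Hello", "world", "How", "are", "you"]
```

An empty token appears whenever the string starts or ends with a character that the pattern matches, so filtering the result is a common follow-up step.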


If you want to get tokens of a specific length from a string, follow the given section.

Example 2

Now, extract only the tokens of length three from a string. To do this, chain the “filter()” method after the “split()” method:

var strToken = str.split(" ").filter(function(token) {
    // Keep only the tokens that are exactly three characters long
    return token.length === 3;
});

 
Print the resulting tokens on the console:

console.log(strToken);

 
The output contains only the substrings of length 3, which for this string are “the” and “for”.
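The same idea can be wrapped in a small reusable helper. This is only a sketch: the function name “tokensOfLength” and its parameters are hypothetical and are not built into JavaScript.

```javascript
// Hypothetical helper: returns the space-separated tokens of `text`
// whose length is exactly `length`.
function tokensOfLength(text, length) {
  return text.split(" ").filter(function (token) {
    return token.length === length;
  });
}

const str = "LinuxHint is the best website for learning skills";

const threeLetterTokens = tokensOfLength(str, 3);
console.log(threeLetterTokens); // → ["the", "for"]

const twoLetterTokens = tokensOfLength(str, 2);
console.log(twoLetterTokens); // → ["is"]
```

Passing the length as a parameter avoids repeating the split() and filter() chain for each token size.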

Conclusion

To tokenize a string in JavaScript, you can use the “split()” method. The split() method divides the string at each occurrence of its “separator” argument, which can be a character or a regex pattern, and returns the pieces as an array. If the method receives no arguments, it returns an array that contains the entire string as its only element. If you want to get tokens of a specific length from a string, use the “filter()” method with the split() method. In this article, the process of tokenizing a string in JavaScript is illustrated with examples.

About the author

Farah Batool

I completed my master's degree in computer science. I am an academic researcher and love to learn and write about new technologies. I am passionate about writing and sharing my experience with the world.