Apache Pig - STRSPLIT()
What is STRSPLIT()?
- The STRSPLIT() function splits a given string around a given delimiter.
Syntax:
- STRSPLIT() accepts three arguments: the string to be split, a regular expression that acts as the delimiter, and an integer limit (the maximum number of substrings the string should be split into).
- The function scans the string and splits it wherever the regular expression matches, producing at most n substrings, where n is the value passed as limit. The last substring holds whatever remains of the string, and the substrings are returned together as a tuple.
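The description above corresponds to a call of the following general shape. This is only a sketch of the form: string, regex, and limit are placeholder names for the three arguments, not identifiers defined by Pig.

```pig
-- General form: string, regex and limit are placeholders for the three arguments.
-- string: the chararray to split
-- regex:  a regular expression matched against the delimiter
-- limit:  the maximum number of substrings in the resulting tuple
STRSPLIT(string, regex, limit)
```

Because the second argument is a regular expression, a delimiter that is a regex metacharacter (for example '.' or '|') would need to be escaped.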
Example:
Assume that there is a file named wikitechy_emp.txt in the HDFS directory /pig_data/. This file contains employee details: id, name, age, and city.
We have loaded this file into Pig as a relation named wikitechy_emp_data.
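A LOAD statement along the following lines would create that relation. It is a sketch under stated assumptions: the path, relation name, and field names come from the description above, while the comma delimiter passed to PigStorage and the field types in the schema are assumptions, since the file's contents are not reproduced here.

```pig
-- Assumption: fields in the file are comma-separated; change the
-- PigStorage argument if the file uses another separator.
-- Assumption: id and age are integers, name and city are strings.
wikitechy_emp_data = LOAD '/pig_data/wikitechy_emp.txt'
    USING PigStorage(',')
    AS (id:int, name:chararray, age:int, city:chararray);
```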
The following is an example of the STRSPLIT() function. In the name column of wikitechy_emp.txt, the name and surname of each employee are separated by the delimiter '_'. In this example, we split the name and surname of the employees using the STRSPLIT() function.
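Such a split could be written as the statement below. It is a minimal sketch, assuming that wikitechy_emp_data has an id field and a chararray field called name, and that each name contains a single '_'. The limit of 2 is chosen for illustration so that the resulting tuple holds at most two parts, the name and the surname.

```pig
-- Split each name on '_' into at most 2 parts,
-- keeping id and the original name alongside the result.
strsplit_data = FOREACH wikitechy_emp_data
    GENERATE id, name, STRSPLIT(name, '_', 2);
```

STRSPLIT() returns its parts as a single tuple, so wrapping the call in FLATTEN(...) is one way to turn them into separate fields if that is more convenient downstream.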
The result of the statement is stored in the relation named strsplit_data. Verify the contents of strsplit_data using the Dump operator.
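The verification step might look like the sketch below. The input row in the comments is hypothetical and is not taken from wikitechy_emp.txt; the output shape also assumes that strsplit_data was generated as id, name, STRSPLIT(name, '_', 2).

```pig
-- Print every tuple of the relation to the console.
DUMP strsplit_data;

-- Hypothetical input row (illustration only): 1,Robin_Smith,22,newyork
-- Under the assumptions above, the matching output row would be:
-- (1,Robin_Smith,(Robin,Smith))
```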