Apache Pig - STRSPLITTOBAG()
What is STRSPLITTOBAG()?
- This function is similar to the STRSPLIT() function. It splits a string around a given delimiter, but where STRSPLIT() returns the substrings in a tuple, STRSPLITTOBAG() returns them in a bag.
Syntax:
- The general form of the function is STRSPLITTOBAG(string, regex, limit).
- It accepts the string to be split, a regular expression to split on, and an integer limit (the maximum number of substrings to produce).
- The function scans the string and splits it wherever the regular expression matches, producing at most n substrings, where n is the positive value passed as limit. The last substring holds whatever remains of the input.
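- As a minimal sketch of the call in use, the function is typically applied inside a FOREACH ... GENERATE statement. The relation names A and B, the field s, and the file input.txt are hypothetical and not part of the original example.

```pig
-- Hypothetical input: one string per line, e.g. open_source_software
A = LOAD 'input.txt' AS (s:chararray);

-- Split each string on '_' into a bag holding at most 2 elements
B = FOREACH A GENERATE STRSPLITTOBAG(s, '_', 2);
```

- With a limit of 2, the string open_source_software would yield the bag {(open),(source_software)}: the split is applied once and the rest of the input stays in the last element.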
Example:
- Assume that there is a file named wikitechy_emp.txt in the HDFS directory /pig_data/, as shown below. This file contains employee details such as id, name, age, and city.
wikitechy_emp.txt
- We have loaded this file into Pig as a relation named wikitechy_emp_data, as shown below.
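- A load statement for this file might look like the following sketch. It assumes the file is comma-separated and that id and age are integers; the source only names the columns, so the delimiter, the field types, and the sample row in the comment are assumptions.

```pig
-- Assumes comma-separated rows such as the hypothetical: 1,Robin_Smith,22,newyork
-- Field types are assumptions; the tutorial only names the four columns.
wikitechy_emp_data = LOAD '/pig_data/wikitechy_emp.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, city:chararray);
```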
- The following is an example of the STRSPLITTOBAG() function. In the wikitechy_emp.txt file, the name column holds each employee's name and surname separated by the delimiter “_”.
- In this example, we split the name and surname of each employee and get the result in a bag using the STRSPLITTOBAG() function.
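- A statement along these lines would perform the split. It is a sketch that assumes the relation wikitechy_emp_data has chararray field name and a field id, as described above; the limit of 2 is chosen because each name has two parts.

```pig
-- Keep (id, name) as a tuple and split name on '_' into a bag of at most 2 parts
strsplittobag_data = FOREACH wikitechy_emp_data
    GENERATE (id, name), STRSPLITTOBAG(name, '_', 2);
```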
- The result of the statement is stored in the relation named strsplittobag_data. Verify the content of the relation strsplittobag_data using the Dump operator, as shown below.
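- The verification step can be sketched as a single Dump statement, assuming strsplittobag_data has been defined in the current Grunt session:

```pig
-- Print every record of the relation to the console
DUMP strsplittobag_data;
```

- Each record should contain the (id, name) tuple followed by a bag of the name parts. For a hypothetical name such as Robin_Smith, the bag would be {(Robin),(Smith)}.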