Apache Pig TOKENIZE() Function
What is the TOKENIZE() function in Apache Pig?
- The TOKENIZE() function splits a string held in a single tuple and returns a bag that contains the output of the split operation.
- The TOKENIZE() function breaks the input string into tokens at delimiter characters rather than at a regular expression pattern. By default the delimiters are space, double quote ("), comma (,), parentheses (()), and asterisk (*); an optional second argument supplies a custom field delimiter.
- Each substring found between delimiter matches becomes one token, and each token is placed in its own single-field tuple inside the returned bag. For example, the string 'Hello World' yields the bag {(Hello),(World)}.
- If the input string contains no delimiter, the TOKENIZE() function returns a bag with one token, which contains the whole input string.
Syntax
TOKENIZE(expression [, 'field_delimiter'])
Example
wikitechy_student_details.txt
We have loaded the file into Pig as the relation wikitechy_student_details.
Tokenizing a String
We can use the TOKENIZE() function to split a string into tokens.
grunt> student_name_tokenize = FOREACH wikitechy_student_details GENERATE TOKENIZE(name);
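To see the statement in context, here is a minimal end-to-end sketch. The contents of wikitechy_student_details.txt are not shown on this page, so the comma-separated layout and the schema (id, name, city) are assumptions made only for illustration, as are the relation names student_name_tokens and student_name_custom. Adjust the LOAD clause to match the real file.

```pig
-- Assumption: the file is comma-separated with fields id, name, city.
-- The actual layout of wikitechy_student_details.txt is not given here.
wikitechy_student_details = LOAD 'wikitechy_student_details.txt'
    USING PigStorage(',')
    AS (id:int, name:chararray, city:chararray);

-- TOKENIZE returns a bag holding one single-field tuple per token.
student_name_tokenize = FOREACH wikitechy_student_details
    GENERATE TOKENIZE(name);

-- FLATTEN un-nests the bag, so each token becomes its own output row.
-- student_name_tokens is a hypothetical relation name.
student_name_tokens = FOREACH wikitechy_student_details
    GENERATE id, FLATTEN(TOKENIZE(name)) AS token;

-- Optional second argument: split on a custom delimiter instead of the defaults.
-- student_name_custom is a hypothetical relation name.
student_name_custom = FOREACH wikitechy_student_details
    GENERATE TOKENIZE(name, '-');

DUMP student_name_tokens;
```

Wrapping TOKENIZE() in FLATTEN is the usual choice when the tokens are needed as separate rows, for example before a GROUP that counts words. Without FLATTEN, each input row keeps its tokens together in a single bag.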