Apache Pig SUBSTRING()
What is a substring in Apache Pig?
- A substring of a string is another string that occurs within it as a contiguous run of characters. For example, "the best of" is a substring of "It was the best of times".
- This is not to be confused with subsequence, which is a generalization of substring. For example, "Itwastimes" is a subsequence of "It was the best of times", but not a substring.
- The SUBSTRING() function returns a substring of the given string.
Syntax:
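- The syntax of the SUBSTRING() function is SUBSTRING(string, startIndex, stopIndex).
- The function accepts three parameters. The first is the string, typically a column name, from which the substring is taken.
- The other two are the start and stop indexes of the required substring. Indexes are zero-based; the start index is inclusive and the stop index is exclusive.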
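As a small illustration of how the start and stop indexes behave, here is a minimal Pig Latin sketch. The file name names.txt, the relation names, and the single-field schema are hypothetical and only serve the example.

```pig
-- Hypothetical input: one name per line, loaded as a single chararray field.
names = LOAD 'names.txt' AS (name:chararray);

-- Characters at indexes 0 and 1; the stop index 2 is not included.
first_two = FOREACH names GENERATE SUBSTRING(name, 0, 2);

-- Characters at indexes 0, 1, and 2; the stop index 3 is not included.
first_three = FOREACH names GENERATE SUBSTRING(name, 0, 3);
```

The number of characters returned is the stop index minus the start index, provided the string is long enough.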
Example:
- Assume that there is a file named wikitechy_emp.txt in the HDFS directory /pig_data/. This file contains employee details such as id, name, age, and city.
- We have loaded this file into Pig as a relation named wikitechy_emp_data.
- The following example uses the SUBSTRING() function to fetch, from each employee name, the substring that starts at index 0 and stops at index 2. Because the stop index is exclusive, this yields the first two characters of each name.
- The statement fetches the required substrings from the employee names and stores the result in a relation named substring_data.
- Verify the content of the relation substring_data using the DUMP operator.
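The steps above can be sketched as a single Pig Latin script. The file name, directory, field names, and relation names come from the text; the namenode address, the comma delimiter, the field types, and the alias name_prefix are assumptions made for illustration.

```pig
-- Load the employee file. The namenode address, the ',' delimiter, and the
-- field types are assumptions; adjust them to match your cluster and data.
wikitechy_emp_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt'
    USING PigStorage(',')
    AS (id:int, name:chararray, age:int, city:chararray);

-- For each employee, keep the id and name and add the substring of the name
-- from index 0 (inclusive) to index 2 (exclusive).
-- name_prefix is a hypothetical alias for the new field.
substring_data = FOREACH wikitechy_emp_data
    GENERATE id, name, SUBSTRING(name, 0, 2) AS name_prefix;

-- Print the contents of the resulting relation to the console.
DUMP substring_data;
```

With these arguments, a made-up name such as Robin would produce the prefix Ro. To include the character at index 2 as well, the stop index would need to be 3.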