[Solved-2 Solutions] In Pig, check if an element is present in a bag?
Problem:
How can I check in Pig Latin whether a bag contains an element?
Solution 1:
Use a nested FOREACH
- The FOREACH operator is used to generate specified data transformations based on the column data.
Syntax
- The syntax of the FOREACH operator:
grunt> Relation_name2 = FOREACH Relation_name1 GENERATE (required data);
- In Apache Pig, statements can be nested inside FOREACH. In the following example, A is a bag field of relation B; a non-zero count means 'xyz' is present in A.
X = FOREACH B {
    -- Keep only the tuples of bag A whose first field equals 'xyz'
    S = FILTER A BY $0 == 'xyz';
    GENERATE COUNT(S);
}
- Instead of COUNT, you can also use IsEmpty together with the bincond (?:) operator:
X = FOREACH B {
    S = FILTER A BY $0 == 'xyz';
    -- The whole bincond expression must be enclosed in parentheses
    GENERATE (IsEmpty(S) ? 'xyz NOT PRESENT' : 'xyz PRESENT') AS present, A;
}
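For a self-contained version of Solution 1, the bag can be built with GROUP instead of being loaded directly. This is a sketch under assumptions: the input file `items.txt` and the fields `group_id` and `item` are hypothetical, and GROUP is used here only to produce one bag per group.

```pig
-- Hypothetical flat input: one tab-separated (group_id, item) pair per line
items = LOAD 'items.txt' USING PigStorage('\t') AS (group_id:int, item:chararray);

-- GROUP yields one tuple per group_id, holding a bag named "items"
B = GROUP items BY group_id;

X = FOREACH B {
    -- Keep only the tuples of this group's bag whose item equals 'xyz'
    S = FILTER items BY item == 'xyz';
    GENERATE group AS group_id,
             COUNT(S) AS xyz_count,
             (IsEmpty(S) ? 'xyz NOT PRESENT' : 'xyz PRESENT') AS present;
}

DUMP X;
```

Both checks are computed side by side here so you can compare them; in practice you would keep whichever one fits the downstream logic.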
Solution 2:
One way to do this without any custom UDF code:
- Assume relation A has the schema my_bag:{(f1, f2, f3)}.
B = FOREACH A {
    -- Keep only the tuples of my_bag whose f1 equals 'my_element'
    X = FILTER my_bag BY f1 == 'my_element';
    -- COUNT(X) > 0 means my_element is present in my_bag (used by C below)
    GENERATE my_bag, COUNT(X) AS my_flag;
};
C = FILTER B BY my_flag > 0;
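A runnable sketch of Solution 2 end to end, assuming a hypothetical text file `bags.txt` with one bag per line (for example `{(my_element,a,b),(other,c,d)}`), loaded with a declared bag schema. It uses SPLIT to separate rows whose bag contains the element from rows whose bag does not; the file name and the aliases `with_elem` and `without_elem` are illustrative only.

```pig
-- Hypothetical input: one bag literal per line, parsed using the declared schema
A = LOAD 'bags.txt'
    AS (my_bag:bag{t:tuple(f1:chararray, f2:chararray, f3:chararray)});

B = FOREACH A {
    X = FILTER my_bag BY f1 == 'my_element';
    GENERATE my_bag, COUNT(X) AS my_flag;
}

-- Route each row by whether my_element was found in field f1 of its bag
SPLIT B INTO with_elem IF my_flag > 0, without_elem IF my_flag == 0;

DUMP with_elem;
DUMP without_elem;
```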