[Solved-1 Solution] What exactly does the double colon mean in Pig?
What is the disambiguate operator?
- Use the disambiguate operator ( :: ) to identify field names after JOIN, COGROUP, CROSS, or FLATTEN operators.
- In this example, to disambiguate y, use A::y or B::y. In cases where there is no ambiguity, such as z, the :: is not necessary but is still supported.
A = load 'data1' as (x, y);
B = load 'data2' as (x, y, z);
C = join A by x, B by x;
D = foreach C generate y; -- which y?
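The ambiguous reference in the last line can be resolved as in the following minimal sketch. It reuses the relations from the example above; the output names a_y and b_y are hypothetical and chosen only for illustration.

```pig
A = LOAD 'data1' AS (x, y);
B = LOAD 'data2' AS (x, y, z);
C = JOIN A BY x, B BY x;

-- After the join, fields carry the name of the relation they came from,
-- so DESCRIBE C should list them as A::x, A::y, B::x, B::y, B::z.
DESCRIBE C;

-- y exists in both inputs, so it must be prefixed with A:: or B::.
-- z exists only in B, so the short name is enough (B::z also works).
-- a_y and b_y are hypothetical names used for this sketch.
D = FOREACH C GENERATE A::y AS a_y, B::y AS b_y, z;
```

Renaming with AS in the final FOREACH is optional, but it keeps the prefixes from propagating into later statements.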
Problem:
- When you use DESCRIBE, you often see many double colons in the Pig output, and it is not obvious what they mean.
For example, after grouping and flattening:
key::observerId:chararray,key::endpoint:chararray,...
- At some point you had grouped by observerId and endpoint, renamed the group tuple to 'key', and then flattened it again. So what exactly does the double colon mean?
Solution 1:
- The :: in the question is the disambiguate operator. It is used to identify field names after JOIN, COGROUP, CROSS, or FLATTEN operators.
- An example of using the disambiguate operator is given in the first section above.
- Here there are two fields, observerId and endpoint, and both carry the prefix key, which means they both came from the alias "key". There could be other "observerId" and "endpoint" fields that belong to an alias other than "key", and the prefix is what tells them apart.
Here is an example (the right-hand sides are placeholders, and the last line is the schema reported by DESCRIBE):
key = some_statement_with_observerID_and_endpoint
otherkey = some_statement_with_observerID_and_endpoint
key::observerId:chararray,key::endpoint:chararray,...
The schema above means that the observerId and endpoint fields we see belong to the key alias, not to the otherkey alias.
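A schema like the one in the question can be reproduced with a minimal sketch along the following lines. The input path 'events', the latency field, the count column n, and the relation names are assumptions made for illustration; only observerId, endpoint, and key come from the question.

```pig
-- Hypothetical input; only observerId and endpoint appear in the question.
events = LOAD 'events' AS (observerId:chararray, endpoint:chararray, latency:int);

-- Grouping by two fields makes "group" a tuple (observerId, endpoint).
grouped = GROUP events BY (observerId, endpoint);

-- Rename the group tuple to "key", as described in the question.
renamed = FOREACH grouped GENERATE group AS key, COUNT(events) AS n;

-- Flattening the tuple prefixes its fields with the tuple's name,
-- so the schema is expected to contain key::observerId and key::endpoint.
flat = FOREACH renamed GENERATE FLATTEN(key), n;
DESCRIBE flat;

-- To drop the prefixes, name the flattened fields explicitly.
clean = FOREACH renamed GENERATE FLATTEN(key) AS (observerId, endpoint), n;
DESCRIBE clean;
```

As long as a short name such as observerId is unambiguous in the relation, it can be used directly without the key:: prefix; the explicit AS clause is useful mainly when the prefixed names would otherwise leak into stored output or later joins.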