[Solved-1 Solution] What exactly does the double colon mean in Pig?



What is the disambiguate operator?

  • Use the disambiguate operator ( :: ) to identify field names after JOIN, COGROUP, CROSS, or FLATTEN operators.
  • In the example below, y is ambiguous after the join, so refer to it as A::y or B::y. Where there is no ambiguity, as with z, the :: is not necessary but is still supported.
A = load 'data1' as (x, y);
B = load 'data2' as (x, y, z);
C = join A by x, B by x;
D = foreach C generate y; -- which y?
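
One way to resolve the ambiguity above is to qualify each field with the alias it came from. This is a minimal sketch assuming the same 'data1' and 'data2' inputs; the output names a_y and b_y are illustrative and not part of the original example.

```pig
A = load 'data1' as (x, y);
B = load 'data2' as (x, y, z);
C = join A by x, B by x;
-- Qualify y with its source relation; z exists only in B, so no prefix is needed.
D = foreach C generate A::y as a_y, B::y as b_y, z;
describe C; -- lists the joined fields with their A:: and B:: prefixes
describe D;
```

Renaming with "as" is optional, but it keeps later statements from having to carry the prefixes around.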

Problem:

  • When you use DESCRIBE, the Pig output can contain a lot of double colons, and it is not obvious what they mean.

For example, after grouping and flattening:

key::observerId:chararray,key::endpoint:chararray,...
  • Here the data was grouped by observerId and endpoint, the group tuple was renamed to 'key', and the result was flattened again. So what exactly does the double colon mean?

Solution 1:

  • In the question, :: is the disambiguate operator. It is used to identify field names after the JOIN, COGROUP, CROSS, or FLATTEN operators.
  • The Pig documentation includes an example of using the disambiguate operator.
  • The two fields observerId and endpoint both carry the key prefix, which means they belong to the alias key. Fields named observerId and endpoint could also exist under a different alias, and the prefix is what tells them apart.

Here is an example:

key      = some_statement_with_observerID_and_endpoint
otherkey = some_statement_with_observerID_and_endpoint

key::observerId:chararray,key::endpoint:chararray,...

The schema line above means that the observerId and endpoint fields shown belong to the key alias, not to the otherkey alias.
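
The scenario from the question can be reproduced with a sketch like the following. The input path 'events', the latency field, the count n, and the relation names are assumptions made for illustration; only observerId, endpoint, and the key alias come from the question.

```pig
-- Hypothetical input; the path and the latency field are assumed.
events  = LOAD 'events' AS (observerId:chararray, endpoint:chararray, latency:long);
grouped = GROUP events BY (observerId, endpoint);
-- Rename the group tuple to 'key'; its inner fields keep their names.
summary = FOREACH grouped GENERATE group AS key, COUNT(events) AS n;
-- Flattening 'key' prefixes its fields with the tuple's alias.
flat    = FOREACH summary GENERATE FLATTEN(key), n;
DESCRIBE flat; -- the flattened fields appear as key::observerId and key::endpoint
-- Optionally drop the prefix by renaming the fields.
clean   = FOREACH flat GENERATE key::observerId AS observerId, key::endpoint AS endpoint, n;
```

If the prefixes in the DESCRIBE output are unwanted, the final renaming step is one way to remove them.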

