[Solved-2 Solutions] Pig approach to pairing data fields in a data set?
Self-join
- Self-join is used to join a table with itself as if the table were two relations, temporarily renaming at least one relation.
- In Apache Pig, a self-join is generally performed by loading the same data more than once under different aliases (names). For example, the contents of the file customers.txt can be loaded as two relations, customers1 and customers2.
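The double load described above might look like the following. This is a minimal sketch: the comma delimiter and the field names (id, name, age, address, salary) are assumptions for illustration, since the layout of customers.txt is not given here.

```pig
-- Assumed layout: comma-separated rows with these hypothetical fields.
-- The same file is loaded twice so that each copy has its own alias.
customers1 = LOAD 'customers.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, address:chararray, salary:int);

customers2 = LOAD 'customers.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, address:chararray, salary:int);
```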
Syntax
- A self-join is written with the JOIN operator, joining the two aliases of the same data on a common key.
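The general shape of the statement can be sketched as follows. Relation1_name, Relation2_name, Relation3_name and key are placeholders, not real identifiers.

```pig
-- Placeholders only: Relation1_name and Relation2_name are the two aliases
-- of the same data, and key is the field they are joined on.
Relation3_name = JOIN Relation1_name BY key, Relation2_name BY key;
```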
Problem:
Is there a way to pair data fields in a data set in Pig?
Solution 1:
- Perform a self-join on the relation customers by joining the two relations customers1 and customers2.
The first approach is a self-join.
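A minimal sketch of this self-join, assuming customers.txt is comma-separated and has an id field to join on. The schema and the result alias customers3 are hypothetical.

```pig
-- Load the same file under two aliases (the schema is an assumption).
customers1 = LOAD 'customers.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, address:chararray, salary:int);
customers2 = LOAD 'customers.txt' USING PigStorage(',')
    AS (id:int, name:chararray, age:int, address:chararray, salary:int);

-- Join the two aliases on the assumed key field id.
customers3 = JOIN customers1 BY id, customers2 BY id;

-- Each output row holds the fields of customers1 followed by those of customers2.
DUMP customers3;
```

Fields in the result are disambiguated with the alias prefix, for example customers1::name and customers2::name.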
The other option is to use CROSS nested in a FOREACH after the GROUP.
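A sketch of the nested CROSS approach, assuming a Pig version that supports CROSS and FOREACH inside a nested block (0.10 or later). The input file data.txt and its (id, item) layout are hypothetical, chosen only to illustrate pairing items that share an id.

```pig
-- Hypothetical input: tab-separated (id, item) rows.
A = LOAD 'data.txt' AS (id:chararray, item:chararray);

-- Collect all items that share an id into one bag.
B = GROUP A BY id;

C = FOREACH B {
    -- Two projections of the same bag, with distinct field names.
    firsts  = FOREACH A GENERATE item AS first;
    seconds = FOREACH A GENERATE item AS second;
    -- Every combination of items within the group.
    crossed = CROSS firsts, seconds;
    -- Keep each unordered pair once and drop self-pairs.
    pairs   = FILTER crossed BY first < second;
    GENERATE group AS id, FLATTEN(pairs);
};

DUMP C;
```

The cross product runs inside each group, so it only pairs items that share an id. It can still be expensive when a single group is large.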
Solution 2:
- This can be done with a self-join and some simple filtering.
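A minimal sketch of the join-and-filter approach. The file data.txt, its (id, item) layout, and the alias names are assumptions for illustration.

```pig
-- Hypothetical input: tab-separated (id, item) rows, loaded under two aliases.
A = LOAD 'data.txt' AS (id:chararray, item:chararray);
B = LOAD 'data.txt' AS (id:chararray, item:chararray);

-- Self-join on id: yields every combination of items that share an id.
J = JOIN A BY id, B BY id;

-- Simple filtering: keep each unordered pair once and drop self-pairs.
F = FILTER J BY A::item < B::item;

-- Project the pair into a flat (id, first, second) row.
P = FOREACH F GENERATE A::id AS id, A::item AS first, B::item AS second;

DUMP P;
```

To keep both orderings of each pair, the filter could instead use A::item != B::item.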