[Solved-1 Solution] Pig - Get Max Count ?
What is MAX()?
- The Pig Latin MAX() function calculates the highest value of a column (numeric values or chararrays) in a single-column bag. Function names in Pig are case sensitive, so it must be written as MAX(). While calculating the maximum value, MAX() ignores NULL values.
- To get the global maximum value, we first perform a GROUP ALL operation and then calculate the maximum value using the MAX() function.
- To get the maximum value of each group, we first group the relation using the GROUP ... BY operator and then apply the MAX() function to each group.
Syntax
- Given below is the syntax of the MAX() function.
grunt> MAX(expression)
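The two uses of MAX() described above can be sketched as follows. This is only an illustration under stated assumptions: the file name readings.txt, its tab-separated layout, and the field names station and wind_speed are hypothetical and not part of the original problem.

```pig
-- Assumption: 'readings.txt' is a hypothetical tab-separated file of (station, wind_speed).
readings = LOAD 'readings.txt' USING PigStorage('\t')
           AS (station:chararray, wind_speed:double);

-- Global maximum: GROUP ALL puts every row into a single group.
all_rows   = GROUP readings ALL;
global_max = FOREACH all_rows GENERATE MAX(readings.wind_speed) AS max_speed;

-- Per-group maximum: one output row per station.
by_station  = GROUP readings BY station;
station_max = FOREACH by_station GENERATE group AS station,
                                          MAX(readings.wind_speed) AS max_speed;

DUMP global_max;
DUMP station_max;
```

In both cases MAX() receives a single-column bag (readings.wind_speed), which is why the grouping step has to come first.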
What is the GROUP operator?
- The GROUP operator is used to group the data in one or more relations. It collects the data having the same key.
Syntax
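- Given below is the syntax of the GROUP operator, where key is the field to group on (for example, age).
grunt> Group_data = GROUP Relation_name BY key;
- To get the maximum count, we can count the rows in each group and then apply the MAX() function to those counts.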
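To make the shape of the grouped data concrete, here is a minimal sketch. It assumes a hypothetical space-separated file weather.txt that holds rows of (date, wind direction) with no header line; the alias names are also assumptions.

```pig
-- Assumption: 'weather.txt' is a hypothetical space-separated file, no header line.
weather = LOAD 'weather.txt' USING PigStorage(' ')
          AS (obs_date:chararray, w_direction:chararray);

-- One output tuple per distinct wind direction.
by_dir = GROUP weather BY w_direction;

-- Print the schema, then the grouped tuples.
DESCRIBE by_dir;
DUMP by_dir;
```

Each tuple of by_dir has two fields: group, which holds the key value, and a bag named after the input relation (weather here) that contains every original row with that key. Aggregate functions such as COUNT() and MAX() are applied to that bag.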
Problem:
Sample Data
DATE WindDirection
1/1/2000 SW
1/2/2000 SW
1/3/2000 SW
1/4/2000 NW
1/5/2000 NW
Every date is unique, but wind directions repeat, so we are trying to get the COUNT of the most COMMON wind direction.
Query:
weather_data = FOREACH Weather GENERATE $16 AS Date, $9 AS w_direction;
e = FOREACH weather_data {
    unique_winds = DISTINCT weather_data.w_direction;
    GENERATE unique_winds, COUNT(unique_winds);
}
dump e;
The intended logic is to find the DISTINCT wind directions (there are about 7), then group by wind direction and apply a count.
However, this query appears to return the total number of wind directions rather than the count of the most common one.
Solution 1:
We have to GROUP BY wind direction and count the rows in each group, then order the counts in descending order and keep only the top row.
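-- Keep only the wind direction column.
wd = FOREACH Weather GENERATE $9 AS w_direction;
-- Collect the rows for each wind direction.
gwd = GROUP wd BY w_direction;
-- Count the rows in each group.
cwd = FOREACH gwd GENERATE group AS w_direction, COUNT(wd.w_direction) AS cnt;
-- Sort by count, highest first, and keep the top row.
owd = ORDER cwd BY cnt DESC;
mwd = LIMIT owd 1;
DUMP mwd;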
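As an alternative sketch, if only the highest count itself is needed (not the direction it belongs to), the per-direction counts can be grouped with GROUP ALL and passed to MAX(). The file name weather.txt, its space-separated layout without a header line, and the alias names below are assumptions made for illustration.

```pig
-- Assumption: 'weather.txt' is a hypothetical space-separated file holding the sample rows.
weather = LOAD 'weather.txt' USING PigStorage(' ')
          AS (obs_date:chararray, w_direction:chararray);

-- Count the rows for each wind direction.
by_dir = GROUP weather BY w_direction;
counts = FOREACH by_dir GENERATE group AS w_direction, COUNT(weather) AS cnt;

-- Put all the counts into one group and take the largest.
all_counts = GROUP counts ALL;
max_count  = FOREACH all_counts GENERATE MAX(counts.cnt) AS max_cnt;

DUMP max_count;
```

For the five sample rows above, the per-direction counts are 3 (SW) and 2 (NW), so this should print (3). The ORDER and LIMIT approach remains the simpler choice when the wind direction must be returned together with its count, although LIMIT 1 keeps only one row if two directions tie.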