Set the Number of Reducers in Pig:
- SET default_parallel XXX;
- Where XXX is the number of reducers.
- This command sets the number of reducers at the script level.
- Write this setting at the top of the Pig script.
- Alternatively, use the PARALLEL clause to set the number of reducers at the operator level.
- A value set with the PARALLEL clause overrides any value specified through SET default_parallel. You can include the PARALLEL clause with any operator that starts a reduce phase:
- COGROUP
- CROSS
- DISTINCT
- GROUP
- JOIN (inner)
- JOIN (outer)
- ORDER BY
For example:
In the following GROUP statement, the PARALLEL clause is used.
- A = LOAD 'myfile' AS (t, u, v);
- B = GROUP A BY t PARALLEL 18;
Here 18 is the number of reducers.
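The script-level and operator-level settings can be combined in one script. The following is a minimal sketch, not a tuned configuration: the default of 10 and the DISTINCT step are hypothetical values chosen for illustration, and 'myfile' is the same placeholder input used above.

```pig
-- Script-level default (hypothetical value): reduce-side operators
-- below use 10 reducers unless they carry their own PARALLEL clause.
SET default_parallel 10;

A = LOAD 'myfile' AS (t, u, v);

-- Operator-level override: this GROUP runs with 18 reducers.
B = GROUP A BY t PARALLEL 18;

-- No PARALLEL clause here, so the script-level default of 10 applies.
C = DISTINCT A;
```

Keeping a modest script-level default and overriding it only on the heaviest operators is one way to avoid repeating PARALLEL on every statement.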
- If neither SET default_parallel nor the PARALLEL clause is used, Pig sets the number of reducers based on the size of the input data.
Two properties control this estimate:
pig.exec.reducers.bytes.per.reducer
– Defines the number of input bytes per reducer; the default value is 1000*1000*1000 (1GB).
pig.exec.reducers.max
– Defines the upper bound on the number of reducers; the default is 999.
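Both properties can be changed with SET when the defaults do not suit a job. The sketch below uses hypothetical values (500 MB per reducer, a cap of 100) and assumes a 20 GB input. The arithmetic in the comments follows the estimate given in the Pig documentation, MIN(pig.exec.reducers.max, total input size in bytes / bytes per reducer), and should be read as an approximation.

```pig
-- Hypothetical tuning values for illustration only.
-- 500 MB of input per reducer instead of the 1 GB default.
SET pig.exec.reducers.bytes.per.reducer 500000000;
-- Cap the estimate at 100 reducers instead of 999.
SET pig.exec.reducers.max 100;

-- Assuming 'myfile' holds 20,000,000,000 bytes (20 GB), the estimate is
-- roughly MIN(100, 20000000000 / 500000000) = MIN(100, 40) = 40 reducers.
A = LOAD 'myfile' AS (t, u, v);
B = GROUP A BY t;
```

With the default values, the same 20 GB input would be estimated at about MIN(999, 20) = 20 reducers.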