Specifying the UDF output schema
- A UDF has input and output. Here are the different ways you can specify the output format of a Python UDF using the outputSchema decorator.
Sample Code:
The outputSchema decorator declares that a function outputs one basic type or a combination of basic types. Those types are:
- chararray: a character string
- bytearray: a raw sequence of bytes; like a string, but not human-readable
- long: 64-bit integer
- int: 32-bit integer
- double: double-precision floating point number
- datetime
- boolean
- If no schema is specified, Pig assumes that the UDF outputs a bytearray.
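
As a minimal sketch of the cases above, the following shows one UDF that declares a single basic type, one that declares a combination of types as a tuple, and one that declares no schema at all. The function names and field names (to_upper, word_with_length, passthrough, word, length) are hypothetical, chosen only for illustration. The fallback definition of outputSchema is an assumption added so the file can also be run outside Pig; it only records the schema string on the function and is not a description of how Pig itself processes the schema.

```python
# Sketch only: inside a Pig job the decorator is provided by Pig.
try:
    from pig_util import outputSchema  # available when Pig's Python helpers are on the path
except ImportError:
    # Assumed stand-in for running outside Pig: it just attaches
    # the schema string to the function and returns it unchanged.
    def outputSchema(schema_str):
        def wrap(func):
            func.outputSchema = schema_str
            return func
        return wrap


# A single basic type: the UDF returns one chararray field.
@outputSchema("word:chararray")
def to_upper(word):
    return word.upper()


# A combination of basic types: a tuple of a chararray and a long.
@outputSchema("pair:(word:chararray, length:long)")
def word_with_length(word):
    return (word, len(word))


# No schema declared: Pig would treat the result as a bytearray.
def passthrough(value):
    return value
```

The schema string follows the name:type form, with parentheses grouping fields into a tuple. The Python return value should match the declared shape, so word_with_length returns a two-element tuple.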