[Solved-2 Solutions] Manipulating a data structure in Pig/Hive ?
Problem:
We have a data structure like this:
Item | Year | Jan | Feb | Mar |
---|---|---|---|---|
A | 2000 | 3.2 | 4.1 | 2.8 |
B | 2000 | 4.3 | 5.0 | 3.9 |
A | 2001 | 7.2 | 6.8 | 9.2 |
We need the data look like this:
Item | Year | Month | Value |
---|---|---|---|
A | 2000 | Jan | 3.2 |
A | 2000 | Feb | 4.1 |
A | 2000 | Mar | 2.8 |
B | 2000 | Jan | 4.3 |
B | 2000 | Feb | 5.0 |
B | 2000 | Mar | 3.9 |
A | 2001 | Jan | 7.2 |
A | 2001 | Feb | 6.8 |
A | 2001 | Mar | 9.2 |
Here we would like to be able to do this in Pig, but knowing how to do it in Hive, R, Python, or Excel/LibreCalc ?
Solution 1:
You can try this:
CREATE TABLE myDatabase.newTable STORED AS TEXTFILE AS
SELECT item, year, 'jan' AS Month, jan AS Value FROM myDatabase.myTable UNION ALL
SELECT item, year, 'feb' AS Month, feb AS Value FROM myDatabase.myTable UNION ALL
SELECT item, year, 'mar' AS Month, mar AS Value FROM myDatabase.myTable;
Solution 2:
This solution will work in Hive. but it is pretty similar to SQL.
select item, year,
'Jan' as Month,
Jan as value
from yourtable
UNION
select item, year,
'Feb' as Month,
Feb as value
from yourtable
UNION
select item, year,
'Mar' as Month,
Mar as value
from yourtable