Tutorial big data analysis: Weather changes in the Carpathian-Basin from 1900 to 2014 – Part 6/9
Manipulating output data with the Linux SED command – SED example
This Kartograph tutorial uses JSON as its data format, so I needed my own data in the same format as the tutorial's. An example:
[{"Weather": "ARAD RO", "ll": [21.35, 46.1331], "1882": 742.0 … {"Weather": "MURSKA SOBOTA RAKICAN SI", "ll": [16.2, 46.7], … "1962": 931.0}]
The data coming out of Pig is not compatible with it, because it uses a different markup:
((ARAD RO,46.1331,21.35),{((1882,742)),((1883,680)),((1884,656)),((1885,656)),((1886,770)),((1887,718)),((1888,467)),((1889,893)),((1890,570)), … 92))}) ((DEVA RO,45.8667,22.9),…
So I used the Linux SED command to transform the dataset produced by Pig.
SED example
Sed command 1
sed 's/(\([A-Z ]*\),\([0-9.]*\),\([0-9.]*\))/[{"Weather": "\1", "ll": [\3, \2], /g' rain_orig.csv > rain_new.csv && cat rain_new.csv
This command looks for an opening parenthesis, a run of capital letters and spaces, then two comma-separated numbers made of the digits 0-9 and ".", and a closing parenthesis. It therefore matches
(ARAD RO,46.1331,21.35)
and changes this string to
[{"Weather": "ARAD RO", "ll": [21.35, 46.1331],
with the order of the two numbers swapped.
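The syntax:
sed 's/REGEX/REPLACEMENT/g' input > output
For each line of the input, SED finds the text matching the regular expression REGEX and replaces it with REPLACEMENT. The trailing g applies the substitution to every match on a line, not only the first one.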
\1, \2 and \3 in SED command 1 are back-references to the groups enclosed in \( and \) in the first part of the expression. In our case \1 refers to ARAD RO, which the replacement wraps in double quotes.
The command reads rain_orig.csv, the output of Pig, and writes the result to rain_new.csv.
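To make the back-references easier to see, here is a minimal sketch on a single hypothetical sample string (not the real file). The variable names sample and swapped are only for illustration; the pattern is the same three-group pattern as in SED command 1, with a simpler replacement.

```shell
# Hypothetical one-record sample in the shape of the Pig output.
sample='(ARAD RO,46.1331,21.35)'

# \( ... \) captures a group; \1, \2 and \3 reuse the captured text.
# The replacement prints the name, then the two numbers in swapped order.
swapped=$(printf '%s\n' "$sample" | sed 's/(\([A-Z ]*\),\([0-9.]*\),\([0-9.]*\))/\1: \3, \2/')

printf '%s\n' "$swapped"
```

The groups are numbered by the order of their opening \( in the pattern, so they can be reused in any order in the replacement.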
Sed command 2
sed 's/{((\([0-9]*\),/"\1": /g' rain_new.csv > rain_new2.csv && cat rain_new2.csv
The following commands are built the same way as the one above. This one looks for {((number, and changes it to "number": which converts the first year of each record.
Sed command 3
sed 's/)),((\([0-9]*\),/, "\1":/g' rain_new2.csv > rain_new3.csv && cat rain_new3.csv
This one looks for )),((number, and changes it to , "number": which converts the remaining years of each record.
So after running all three SED commands we have the needed results in the format of
[{"Weather": "ARAD RO", "ll": [21.35, 46.1331], "1882": 742.0, "1883": 680.0, …
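As a sketch, the three substitutions can also be chained in one sed call with -e, which avoids the intermediate files. The sample record below is shortened and hypothetical, and the variable names are only for illustration; the expressions are the three from above, written with a closing quote after ll. They rewrite only the parts they match, so any other Pig punctuation, such as the outermost parentheses, stays in place and may need extra cleanup.

```shell
# Shortened, hypothetical sample record in the shape of the Pig output.
sample='((ARAD RO,46.1331,21.35),{((1882,742)),((1883,680))})'

# Each -e adds one substitution; they run in order on every input line.
converted=$(printf '%s\n' "$sample" | sed \
  -e 's/(\([A-Z ]*\),\([0-9.]*\),\([0-9.]*\))/[{"Weather": "\1", "ll": [\3, \2], /g' \
  -e 's/{((\([0-9]*\),/"\1": /g' \
  -e 's/)),((\([0-9]*\),/, "\1":/g')

printf '%s\n' "$converted"
```

On the real data the same call would read rain_orig.csv instead of the sample string.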
One manual task is left: properly closing the array at the end of the JSON file by changing }, to }].
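This last step could also be scripted. The sketch below assumes the file really ends with }, on its last line, as described above; the two records are hypothetical stand-ins, not the real data.

```shell
# Hypothetical two-line stand-in for the end of the converted file.
tail_sample=$(printf '%s\n' '{"Weather": "A", "1882": 1},' '{"Weather": "B", "1882": 2},')

# The address "$" selects only the last line; its trailing "}," becomes "}]".
closed=$(printf '%s\n' "$tail_sample" | sed '$ s/},[[:space:]]*$/}]/')

printf '%s\n' "$closed"
```

Because the substitution is restricted to the last line, the }, separators between earlier records are left untouched.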
The JSON for the Map is available now, and I’ve saved it to the directory of the new map visualization example.