Weather data analysis and visualization – Big data tutorial Part 6/9 – SED example

Tutorial big data analysis: Weather changes in the Carpathian-Basin from 1900 to 2014 – Part 6/9

Manipulating output data with the Linux SED command – SED example

The Kartograph tutorial uses JSON as its data format, so I needed my own data in the same format. For example:

[{"Weather": "ARAD RO", "ll": [21.35, 46.1331], "1882": 742.0, ...}, {"Weather": "MURSKA SOBOTA RAKICAN SI", "ll": [16.2, 46.7], "1962": 931.0}]

The output from Pig is not compatible with this, as it uses a different markup:

((ARAD RO,46.1331,21.35),{((1882,742)),((1883,680)),((1884,656)),((1885,656)),((1886,770)),((1887,718)),((1888,467)),((1889,893)),((1890,570)),



92))})

((DEVA RO,45.8667,22.9),

So I used the Linux SED command to convert the resulting dataset from Pig.

SED example

Sed command 1

sed 's/(\([A-Z ]*\),\([0-9\.]*\),\([0-9.]*\))/[{"Weather": "\1", "ll": [\3, \2], /g' rain_orig.csv > rain_new.csv && cat rain_new.csv

This command looks for an opening parenthesis, any number of capital letters and spaces, a comma, then two blocks of digits and dots separated by a comma, and a closing parenthesis. It therefore matches the station tuple inside

((ARAD RO,46.1331,21.35)

and changes it to

[{"Weather": "ARAD RO", "ll": [21.35, 46.1331],
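To try the substitution without touching the real files, here is a minimal sketch that pipes a single station tuple through Sed command 1. The variable names tuple and converted are made up for illustration, and the replacement is written with the closing quote after ll so that it matches the target format.

```shell
# One station tuple in the shape shown in the Pig output above.
tuple='(ARAD RO,46.1331,21.35)'

# Sed command 1: capture name, latitude and longitude, then emit them
# as the start of a JSON object with the two coordinates swapped.
converted=$(printf '%s\n' "$tuple" |
  sed 's/(\([A-Z ]*\),\([0-9\.]*\),\([0-9.]*\))/[{"Weather": "\1", "ll": [\3, \2], /g')

printf '%s\n' "$converted"
```

The pattern only consumes the tuple itself, so any character around it, such as an extra outer parenthesis, is left in the line as it was.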

The syntax:

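sed 's/regular expression/replacement/g' input > output

For each line of the input, sed finds every match of the regular expression and changes it to the replacement text. The trailing g applies the change to all matches on a line, not only the first one.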

\1, \2 and \3 in Sed command 1 are back-references to the groups captured with \( and \) in the regular expression. In our case, \1 refers to ARAD RO, which the replacement wraps in double quotes.

The command reads rain_orig.csv, the output of Pig, and writes the result to rain_new.csv.
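Back-references are easier to see on a smaller input. The following is a minimal sketch on a made-up lat=...,lon=... string (not taken from the Pig output) that swaps two captured values, in the same way Sed command 1 swaps latitude and longitude.

```shell
# Hypothetical input, used only to illustrate \1 and \2.
pair='lat=46.1331,lon=21.35'

# \( ... \) captures a group; \1 and \2 paste the captured text back,
# here in reverse order so that the longitude comes first.
swapped=$(printf '%s\n' "$pair" |
  sed 's/lat=\([0-9.]*\),lon=\([0-9.]*\)/[\2, \1]/')

printf '%s\n' "$swapped"
```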

Sed command 2

sed 's/{((\([0-9]*\),/"\1": /g' rain_new.csv > rain_new2.csv && cat rain_new2.csv

The following commands are built the same way. This one looks for {((number, and changes it to "number": which handles the first year of each station.
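As a quick check, this sketch runs Sed command 2 on the start of one year list. The fragment is shortened from the Pig sample above, and the variable names are made up for illustration.

```shell
# Start of a station's year list, in the shape of the Pig output.
fragment='{((1882,742))'

# Sed command 2: turn the opening "{((year," into a quoted JSON key.
first_year=$(printf '%s\n' "$fragment" |
  sed 's/{((\([0-9]*\),/"\1": /g')

printf '%s\n' "$first_year"
```

The closing parentheses after the value are not part of the pattern, so they stay in place at this stage.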

Sed command 3

sed 's/)),((\([0-9]*\),/, "\1":/g' rain_new2.csv > rain_new3.csv && cat rain_new3.csv

This one looks for )),((number, and changes it to , "number": which handles each of the following years.
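The same kind of check for Sed command 3, again as a sketch on a shortened fragment with made-up variable names:

```shell
# A few consecutive (year,value) pairs, in the shape of the Pig output.
fragment='742)),((1883,680)),((1884,656))'

# Sed command 3: replace each ")),((year," separator with a comma
# followed by the year as a quoted JSON key.
later_years=$(printf '%s\n' "$fragment" |
  sed 's/)),((\([0-9]*\),/, "\1":/g')

printf '%s\n' "$later_years"
```

Only the separators between pairs are matched, so the parentheses after the last value of a record are left untouched.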

After running all three SED commands we have the results in the needed format:

[{"Weather": "ARAD RO", "ll": [21.35, 46.1331], "1882": 742.0, "1883": 680.0,
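The three substitutions can also be chained in a single sed call with -e, which avoids the intermediate files. This is only a sketch: the record below is a shortened, made-up stand-in for one line of the Pig output, and the replacement is written with the closing quote after ll. Characters that none of the three patterns match, such as the outermost parentheses, pass through unchanged, so the printed line is worth inspecting before the final cleanup.

```shell
# Shortened, made-up record in the shape of the Pig output.
record='((ARAD RO,46.1331,21.35),{((1882,742)),((1883,680))})'

# The same three substitutions, applied in order in one sed call.
chained=$(printf '%s\n' "$record" | sed \
  -e 's/(\([A-Z ]*\),\([0-9\.]*\),\([0-9.]*\))/[{"Weather": "\1", "ll": [\3, \2], /g' \
  -e 's/{((\([0-9]*\),/"\1": /g' \
  -e 's/)),((\([0-9]*\),/, "\1":/g')

printf '%s\n' "$chained"
```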

One manual task is left: closing the array properly at the end of the JSON file by changing the final }, to }]
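This last step could also be done with sed by restricting the substitution to the last line with the $ address. A minimal sketch, assuming every record sits on its own line and still ends with }, (the two records here are made up):

```shell
# Made-up two-record stand-in for the converted file.
closed=$(printf '%s\n' '[{"Weather": "A"},' '{"Weather": "B"},' |
  sed '$ s/},$/}]/')

printf '%s\n' "$closed"
```

The $ before the s command limits the change to the last line, and the $ inside the pattern anchors the match to the end of that line, so earlier records keep their trailing comma.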

The JSON for the Map is available now, and I’ve saved it to the directory of the new map visualization example.
