JSON Tables¶
The JSON serde is useful for parsing data stored as JSON. The JSON implementation has been borrowed from rcongiu.
Data¶
{"n_nationkey":"5", "n_name":"ETHIOPIA", "n_regionkey":"0", "n_comment":"ven packages wake quickly. regu" }
{"n_nationkey":"6", "n_name":"FRANCE", "n_regionkey":"3", "n_comment":"refully final requests. regular, ironi" }
{"n_nationkey":"7", "n_name":"GERMANY", "n_regionkey":"3", "n_comment":"l platelets. regular accounts x-ray: unusual, regular acco" }
Features¶
The features of JSON tables are:
- Entire JSON document must fit in a single line of the text file.
- Read the data stored in the JSON format.
- Convert the data to the JSON format when INSERT INTO table.
- Arrays and maps are supported.
- Nested data structures are also supported.
Nested JSON Elements¶
If your data contains nested JSON elements like this:
{"country":"Switzerland","languages":["German","French","Italian"],"religions":{"catholic":[10,20],"protestant":[40,50]}}
You can declare languages as an array<string> and religions as a map<string,array<int> like this (location omitted).
CREATE EXTERNAL TABLE json_nested_test (
country string,
languages array<string>,
religions map<string,array<int>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION '<scheme>..'
;
<scheme>
is the Cloud-specific URI and path: for example, gs://
is the URI for GCP Cloud Provider.
You can access a nested element like this
select religions['catholic'][0] from json_nested_test;
Which produces the result
10