How to convert a pandas DataFrame to JSON in Python?
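Data analysis is an extremely important tool in today's world. A key aspect of data analysis is the organized representation of data, and computer science offers many data structures for this task. This article discusses two of them, the pandas DataFrame and JSON, and shows how to convert a DataFrame to JSON format.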
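A pandas DataFrame is a tabular representation of data in which the columns hold the individual data points of an entry and each row is a unique data entry. JSON, by contrast, is text written in JavaScript Object Notation.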
Note: For more information, refer to Python | Pandas DataFrame
Converting a pandas DataFrame to JSON
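To convert a pandas DataFrame to JSON format, we use the DataFrame.to_json() function from the pandas library in Python. The to_json function offers several customizations for producing the desired JSON format. Let's first look at the parameters the function accepts and then explore the customizations.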
Parameters:
Parameter | Value | Use |
---|---|---|
path_or_buf | string or filename, optional | File path or object. If not specified, the result is returned as a string. |
orient | ‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’, default=’columns’ | Indication of expected JSON string format. For a DataFrame the default is ‘columns’ (‘index’ is the default for a Series). |
date_format | None, ‘epoch’, ‘iso’, default=’epoch’ | Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient=’table’, the default is ‘iso’. For all other orients, the default is ‘epoch’. |
double_precision | integer value, default=10 | The number of decimal places to use when encoding floating point values. |
force_ascii | boolean value, default=True | Force encoded string to be ASCII. |
date_unit | ‘s’, ‘ms’, ‘us’, ‘ns’, default=’ms’ | The time unit to encode to, governs timestamp and ISO8601 precision. The values represent second, millisecond, microsecond, and nanosecond respectively. |
default_handler | callable function | Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serializable object. |
lines | boolean value, default=False | If ‘orient’ is ‘records’ write out line delimited json format. Will throw ValueError if incorrect ‘orient’ since others are not list like. |
compression | ‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None, default=’infer’ | A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename. |
index | boolean value, default=True | Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’. |
indent | integer value, optional | Length of whitespace used to indent each record. |
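A few of the parameters in the table can be sketched as follows. This is a minimal illustration, not an exhaustive tour: the DataFrame contents and the variable names (`df`, `json_lines`, `rounded`, `pretty`, `no_index`) are made up for the example, and it assumes a reasonably recent pandas release in which `indent` is available.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["a", "b"], "score": [1.23456, 2.5]})

# lines=True (valid only with orient='records') writes one JSON object per line.
json_lines = df.to_json(orient="records", lines=True)

# double_precision limits the number of decimal places kept for floats.
rounded = df.to_json(orient="records", double_precision=2)

# indent pretty-prints the result instead of emitting a single compact line.
pretty = df.to_json(orient="records", indent=2)

# index=False drops the row labels; orient='split' supports this.
no_index = df.to_json(orient="split", index=False)

print(json_lines)
print(rounded)
print(pretty)
print(no_index)
```

Because path_or_buf is omitted, every call here returns the JSON as a string.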
Let's now look at a few examples to understand how DataFrame.to_json is used.
Example 1: Basic usage
import numpy as np
import pandas as pd
data = np.array([['1', '2'], ['3', '4']])
dataFrame = pd.DataFrame(data, columns = ['col1', 'col2'])
json = dataFrame.to_json()
print(json)
Output:
{"col1":{"0":"1", "1":"3"}, "col2":{"0":"2", "1":"4"}}
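To check that the string returned by to_json is ordinary JSON, it can be parsed back with Python's standard json module. The sketch below rebuilds the same two-column frame from a plain list (an assumption made for brevity, instead of the NumPy array used above) and uses the illustrative names `json_text` and `parsed`.

```python
import json

import pandas as pd

# Same data as Example 1, built from a list of rows.
df = pd.DataFrame([["1", "2"], ["3", "4"]], columns=["col1", "col2"])

# With no path given, to_json returns a str.
json_text = df.to_json()

# Parse the string into nested Python dictionaries.
parsed = json.loads(json_text)

# The integer row labels 0 and 1 appear as the string keys "0" and "1",
# because JSON object keys are always strings.
print(parsed["col1"]["0"])
print(parsed["col2"]["1"])
```

With the default orientation, the outer keys are the column names and the inner keys are the row labels.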
Example 2: Exploring the 'orient' parameter of the DataFrame.to_json function
import numpy as np
import pandas as pd
# Build a 2x2 DataFrame of strings with two named columns
data = np.array([['1', '2'], ['3', '4']])
dataFrame = pd.DataFrame(data, columns = ['col1', 'col2'])
json_split = dataFrame.to_json(orient ='split')
print("json_split = ", json_split, "\n")
json_records = dataFrame.to_json(orient ='records')
print("json_records = ", json_records, "\n")
json_index = dataFrame.to_json(orient ='index')
print("json_index = ", json_index, "\n")
json_columns = dataFrame.to_json(orient ='columns')
print("json_columns = ", json_columns, "\n")
json_values = dataFrame.to_json(orient ='values')
print("json_values = ", json_values, "\n")
json_table = dataFrame.to_json(orient ='table')
print("json_table = ", json_table, "\n")
Output:
json_split = {"columns":["col1", "col2"], "index":[0, 1], "data":[["1", "2"], ["3", "4"]]}
json_records = [{"col1":"1", "col2":"2"}, {"col1":"3", "col2":"4"}]
json_index = {"0":{"col1":"1", "col2":"2"}, "1":{"col1":"3", "col2":"4"}}
json_columns = {"col1":{"0":"1", "1":"3"}, "col2":{"0":"2", "1":"4"}}
json_values = [["1", "2"], ["3", "4"]]
json_table = {"schema":{"fields":[{"name":"index", "type":"integer"}, {"name":"col1", "type":"string"}, {"name":"col2", "type":"string"}], "primaryKey":["index"], "pandas_version":"0.20.0"}, "data":[{"index":0, "col1":"1", "col2":"2"}, {"index":1, "col1":"3", "col2":"4"}]}
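Any of these orientations can also be written straight to a file by passing a path as the first argument, and read back with pd.read_json using the same orient. The following is a minimal sketch under stated assumptions: the file name `frame.json`, the temporary directory, and the sample data are all made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data with one numeric and one text column.
df = pd.DataFrame({"col1": [1, 3], "col2": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative file name inside a throwaway directory.
    path = os.path.join(tmp, "frame.json")

    # When a path is given, to_json writes the file and returns None.
    result = df.to_json(path, orient="split")

    # Read the file back, telling read_json which layout to expect.
    restored = pd.read_json(path, orient="split")

print(result)
print(restored)
```

The 'split' orientation is a convenient choice for a round trip because it stores the column names, the index, and the data separately.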