How to Read Multiple JSON Files Efficiently: A Complete Guide from Basics to Advanced Techniques

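In data processing, web development, and automation scripts, you often need to handle many JSON files at once. Whether you are loading configuration files, processing log data, or combining API responses, reading multiple JSON files efficiently is a useful skill. This article walks through several approaches, from basic to advanced, to cover different scenarios.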

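Basic Method: Reading Files One by One

For a small number of JSON files, the most direct approach is to read and parse them one at a time. In Python, for example, the built-in json module offers a concise solution: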

import json
import os
# Assume several JSON files live in the same directory
file_dir = 'json_files'
json_files = ['data1.json', 'data2.json', 'data3.json']
all_data = []
for filename in json_files:
    file_path = os.path.join(file_dir, filename)
    with open(file_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
        all_data.append(data)
# all_data now holds the parsed content of every file, in list order

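Advantages:

Simple, easy-to-follow logic
Well suited to a small number of files

Disadvantages:

Files are read sequentially, so it gets slow as the number of files grows
The file list has to be maintained by hand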

Batch Reading All JSON Files in a Directory

To read every JSON file in a directory, filter by file extension:

import json
import glob
# Read every .json file in the current directory
json_files = glob.glob('*.json')
all_data = []
for file_path in json_files:
    with open(file_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
        all_data.append(data)
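
When the results need to be traced back to their source files, a dictionary keyed by file name can be more convenient than a plain list. The following is a minimal sketch of that idea; the function name `load_json_dir` is hypothetical and not part of any library, and sorting the paths is an added choice to make the order reproducible.

```python
import glob
import json
import os


def load_json_dir(directory):
    """Load every *.json file in `directory` into a dict keyed by file name."""
    pattern = os.path.join(directory, '*.json')
    results = {}
    # glob.glob makes no ordering guarantee, so sort the paths explicitly.
    for file_path in sorted(glob.glob(pattern)):
        with open(file_path, 'r', encoding='utf-8') as f:
            results[os.path.basename(file_path)] = json.load(f)
    return results
```

Calling `load_json_dir('json_files')` would return something like `{'data1.json': ..., 'data2.json': ...}`, so each parsed object stays associated with the file it came from.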

Advanced version (recursing into subdirectories):

import json
import glob
# Recursively read .json files in the current directory and all of its subdirectories
json_files = glob.glob('**/*.json', recursive=True)
all_data = []
for file_path in json_files:
    with open(file_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
        all_data.append(data)
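
The same recursive scan can also be written with the standard-library pathlib module, which some readers may find easier to follow. This is a sketch rather than a drop-in replacement; `load_json_tree` is a hypothetical helper name, and keying the results by path relative to the root is an assumption made so that files with the same name in different subdirectories do not collide.

```python
import json
from pathlib import Path


def load_json_tree(root):
    """Recursively load *.json files under `root`, keyed by relative path."""
    root = Path(root)
    results = {}
    # rglob('*.json') walks `root` and every subdirectory below it.
    for path in sorted(root.rglob('*.json')):
        if path.is_file():  # skip directories that happen to end in .json
            with path.open('r', encoding='utf-8') as f:
                key = path.relative_to(root).as_posix()
                results[key] = json.load(f)
    return results
```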

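Using Parallel Processing to Speed Up Reading

With a large number of JSON files, serial reading can take a long time. You can process the files in parallel using multiple processes or asynchronous I/O: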

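import json
import os
from multiprocessing import Pool
def read_json_file(file_path):
    # Must be defined at module level so worker processes can import it
    with open(file_path, 'r', encoding='utf-8') as f:
        return json.load(f)
# The __main__ guard is required on platforms that spawn worker processes (e.g. Windows)
if __name__ == '__main__':
    file_dir = 'json_files'
    json_files = [os.path.join(file_dir, f) for f in os.listdir(file_dir) if f.endswith('.json')]
    with Pool(processes=4) as pool:  # process the files with 4 worker processes
        all_data = pool.map(read_json_file, json_files)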
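
As a lighter-weight alternative to separate processes, a thread pool from the standard library's concurrent.futures module can overlap the file reads. Note that this is a different technique from the multiprocessing example: threads tend to help when most of the time is spent waiting on disk or network storage, whereas in CPython the JSON parsing itself generally does not run in parallel across threads. The sketch below assumes a hypothetical helper name `read_json_files_threaded` and an arbitrary default of 8 workers.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def read_json_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        return json.load(f)


def read_json_files_threaded(file_paths, max_workers=8):
    """Read files concurrently; results keep the order of `file_paths`."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, not completion order.
        return list(executor.map(read_json_file, file_paths))
```

Because no worker processes are started, this version does not need a `__main__` guard, and results never have to be pickled between processes. Whether it is actually faster depends on file sizes and storage, so it is worth measuring on real data.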

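Handling Nested JSON and Complex Structures

When JSON files have a complex nested structure, you may need recursive processing or a dedicated library: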

import json
from collections import defaultdict
def flatten_json(y, parent_key='', sep='_'):
    items = []
    for k, v in y.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten_json(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)
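# Flatten several nested JSON files and merge their values key by key
# (json_files is a list of file paths, built as in the earlier sections)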
merged_data = defaultdict(list)
for file_path in json_files:
    with open(file_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
        flattened = flatten_json(data)
        for key, value in flattened.items():
            merged_data[key].append(value)
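
To make the behavior of flatten_json concrete, here is a small worked example that applies the same function to two in-memory records instead of files. The sample records are invented purely for illustration.

```python
from collections import defaultdict


def flatten_json(y, parent_key='', sep='_'):
    items = []
    for k, v in y.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten_json(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)


# Hypothetical sample data standing in for the parsed content of two files
records = [
    {"user": {"name": "Ann", "address": {"city": "Oslo"}}, "tags": ["a", "b"]},
    {"user": {"name": "Bob", "address": {"city": "Rome"}}, "tags": []},
]

merged = defaultdict(list)
for record in records:
    for key, value in flatten_json(record).items():
        merged[key].append(value)

print(dict(merged))
# {'user_name': ['Ann', 'Bob'], 'user_address_city': ['Oslo', 'Rome'], 'tags': [['a', 'b'], []]}
```

Two limitations are visible here: lists are kept as they are rather than flattened, and the function expects the top level of each file to be a JSON object, since it calls `.items()` on its input.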

Error Handling and Best Practices

In real applications, you need to handle exceptional cases such as missing files and malformed JSON:

import json
import os
def safe_read_json(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"File not found: {file_path}")
        return None
    except json.JSONDecodeError:
        print(f"Invalid JSON: {file_path}")
        return None
    except Exception as e:
        print(f"Error while reading {file_path}: {e}")
        return None
file_dir = 'json_files'
json_files = [os.path.join(file_dir, f) for f in os.listdir(file_dir) if f.endswith('.json')]
all_data = []
for file_path in json_files:
    data = safe_read_json(file_path)
    if data is not None:
        all_data.append(data)
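
One subtlety of returning None on failure is that a valid file containing only `null` also parses to None, so it would be silently dropped by the `is not None` check. A possible variation, sketched below, returns successes and failures separately and reports problems through the standard logging module instead of print. The function name `read_json_files_safely` and the shape of its return value are assumptions made for this example.

```python
import json
import logging

logger = logging.getLogger(__name__)


def read_json_files_safely(file_paths):
    """Return (loaded, failed): parsed data by path, and error text by path."""
    loaded, failed = {}, {}
    for file_path in file_paths:
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                loaded[file_path] = json.load(f)
        except (OSError, json.JSONDecodeError, UnicodeDecodeError) as e:
            # OSError covers FileNotFoundError and PermissionError.
            logger.warning("Skipping %s: %s", file_path, e)
            failed[file_path] = str(e)
    return loaded, failed
```

Keeping the failures in a dictionary lets the caller decide afterwards whether a partial result is acceptable or whether the whole batch should be retried.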

Simplifying the Work with Specialized Libraries

For more complex needs, consider a specialized library such as pandas:

import pandas as pd
import json
# Read several JSON files and combine them into a single DataFrame
json_files = ['data1.json', 'data2.json', 'data3.json']
dfs = []
for file_path in json_files:
    with open(file_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    df = pd.json_normalize(data)  # flatten nested JSON into columns
    dfs.append(df)
combined_df = pd.concat(dfs, ignore_index=True)

There are many ways to read multiple JSON files, and the right choice depends on your specific needs: