ERA5 Data Module
The get_era5_data module provides functionality for downloading ERA5 atmospheric reanalysis data from the Copernicus Climate Data Store (CDS). It handles authentication, data requests, and automatic file management for ATMOS-BUD workflows.
Key Features
Modern CDSAPI Integration: Uses the latest CDSAPI 0.7.6 with updated authentication
Flexible Data Selection: Configure variables, pressure levels, and time periods
Automatic File Management: Handles output file naming and organization
Error Handling: Robust error handling for network and API issues
Progress Tracking: Built-in progress monitoring for large downloads
Dependencies
The module requires specific versions for compatibility:
cdsapi >= 0.7.6: Latest version with updated API syntax
requests: HTTP request handling
os, logging: Standard Python libraries for file and logging operations
Authentication Setup
Before using this module, you must set up CDS API credentials:
Create CDS Account: Register at https://cds.climate.copernicus.eu/
Generate API Key: Get your key from your CDS profile page
Setup Credentials: Create
~/.cdsapircfile:url: https://cds.climate.copernicus.eu/api/v2 key: <your-uid>:<your-api-key>
Functions Overview
- src.get_era5_data.download_era5_data(variables: List[str], pressure_levels: List[int], start_date: str, end_date: str, area: List[float], output_file: str, hours: List[str] | None = None, logger: Logger | None = None) None[source]
Downloads ERA5 reanalysis data from the Copernicus Climate Data Store using the modern CDSAPI.
This function provides a flexible interface for downloading ERA5 atmospheric data with configurable variables, pressure levels, time periods, and spatial domains. Uses the modern CDSAPI 0.7.6+ syntax with list-based parameters.
- Parameters:
variables (List[str]) – List of ERA5 variable names to download. Examples: - ‘temperature’ - ‘u_component_of_wind’ - ‘v_component_of_wind’ - ‘geopotential’ - ‘vorticity’ - ‘specific_humidity’
pressure_levels (List[int]) – List of pressure levels in hPa. Examples: [1000, 925, 850, 700, 500, 300]
start_date (str) – Start date in ‘YYYY-MM-DD’ format (e.g., ‘2023-01-01’)
end_date (str) – End date in ‘YYYY-MM-DD’ format (e.g., ‘2023-01-31’)
area (List[float]) – Spatial domain as [North, West, South, East] in decimal degrees. Example: [20, -80, -60, -20] for South America region
output_file (str) – Output filename for the downloaded NetCDF file
hours (Optional[List[str]], default None) – List of hours in ‘HH:MM’ format. If None, uses [‘00:00’, ‘06:00’, ‘12:00’, ‘18:00’]
logger (Optional[logging.Logger], default None) – Logger object for progress tracking and error reporting
- Returns:
Downloads file directly to specified output path
- Return type:
None
- Raises:
Exception – If download fails due to authentication, network, or API errors
Notes
Requires valid CDS API credentials in ~/.cdsapirc
Uses modern CDSAPI syntax (version 0.7.6+)
Automatically handles date range conversion to required format
Includes progress monitoring and comprehensive error handling
Examples
>>> # Basic usage for atmospheric budget analysis >>> variables = ['temperature', 'u_component_of_wind', 'v_component_of_wind', 'geopotential'] >>> levels = [850, 700, 500, 300] >>> download_era5_data( ... variables=variables, ... pressure_levels=levels, ... start_date='2023-01-01', ... end_date='2023-01-31', ... area=[10, -80, -40, -30], ... output_file='era5_january2023.nc' ... )
>>> # Custom time selection >>> download_era5_data( ... variables=['vorticity', 'geopotential'], ... pressure_levels=[850], ... start_date='2023-06-15', ... end_date='2023-06-15', ... area=[-10, -70, -30, -40], ... output_file='era5_single_day.nc', ... hours=['00:00', '12:00'] ... )
Downloads ERA5 reanalysis data from the Copernicus Climate Data Store using the modern CDSAPI.
Key Features:
Modern CDSAPI syntax with list-based parameters
Flexible variable and level selection
Automatic temporal range handling
Progress monitoring and error handling
Standard NetCDF output format
Usage Example:
from src.get_era5_data import download_era5_data import logging # Setup logging logger = logging.getLogger(__name__) # Define download parameters variables = ['temperature', 'u_component_of_wind', 'v_component_of_wind'] pressure_levels = [850, 700, 500, 300] download_era5_data( variables=variables, pressure_levels=pressure_levels, start_date='2023-01-01', end_date='2023-01-31', area=[20, -80, -60, -20], # North, West, South, East output_file='era5_data_january2023.nc', logger=logger )
Data Request Configuration
Variable Selection
The module supports all ERA5 atmospheric variables:
Temperature: temperature, potential_temperature
Wind Components: u_component_of_wind, v_component_of_wind, w_component_of_wind
Geopotential: geopotential, geopotential_height
Vorticity: vorticity, absolute_vorticity, potential_vorticity
Humidity: specific_humidity, relative_humidity
Surface Variables: surface_pressure, mean_sea_level_pressure
Pressure Level Configuration
Standard pressure levels in hPa:
# Common atmospheric levels
standard_levels = [1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100]
# Tropospheric focus
tropospheric_levels = [1000, 850, 700, 500, 300]
# Single level analysis
single_level = [850] # Must be list format for CDSAPI 0.7.6
Temporal Configuration
Flexible time period specification:
# Single day
start_date = end_date = '2023-01-15'
# Month-long period
start_date, end_date = '2023-01-01', '2023-01-31'
# Specific hours (if supported)
hours = ['00:00', '06:00', '12:00', '18:00']
Spatial Domain Configuration
Area specification follows [North, West, South, East] format:
# South America focus
south_america = [15, -85, -60, -30]
# Brazil region
brazil_region = [10, -75, -35, -30]
# Custom analysis domain
analysis_domain = [max_lat, min_lon, min_lat, max_lon]
Integration Examples
Complete ERA5 Download Workflow
"""
Complete workflow for downloading ERA5 data for atmospheric budget analysis
"""
import logging
from datetime import datetime, timedelta
from src.get_era5_data import download_era5_data
# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Define analysis period
start_date = '2023-01-01'
end_date = '2023-01-31'
# Analysis domain (South America)
domain = [15, -85, -60, -30] # [North, West, South, East]
# Required variables for budget analysis
variables = [
'temperature',
'u_component_of_wind',
'v_component_of_wind',
'w_component_of_wind',
'geopotential',
'vorticity',
'specific_humidity'
]
# Atmospheric levels for analysis
levels = [1000, 925, 850, 700, 600, 500, 400, 300, 250, 200]
# Download data
output_file = f'era5_budget_analysis_{start_date}_{end_date}.nc'
try:
download_era5_data(
variables=variables,
pressure_levels=levels,
start_date=start_date,
end_date=end_date,
area=domain,
output_file=output_file,
logger=logger
)
logger.info(f"✅ Successfully downloaded ERA5 data: {output_file}")
except Exception as e:
logger.error(f"❌ Failed to download ERA5 data: {e}")
Multi-Case Download with Error Handling
"""
Download ERA5 data for multiple analysis cases with robust error handling
"""
import time
from src.get_era5_data import download_era5_data
# Define multiple analysis cases
analysis_cases = [
{
'name': 'Summer_Case_2023',
'start': '2023-01-15',
'end': '2023-01-20',
'domain': [-10, -70, -30, -40],
'variables': ['temperature', 'geopotential', 'vorticity']
},
{
'name': 'Winter_Case_2023',
'start': '2023-07-10',
'end': '2023-07-15',
'domain': [-15, -65, -35, -35],
'variables': ['temperature', 'u_component_of_wind', 'v_component_of_wind']
}
]
pressure_levels = [850, 700, 500, 300]
for case in analysis_cases:
output_file = f"era5_{case['name']}.nc"
try:
logger.info(f"🌍 Downloading data for case: {case['name']}")
download_era5_data(
variables=case['variables'],
pressure_levels=pressure_levels,
start_date=case['start'],
end_date=case['end'],
area=case['domain'],
output_file=output_file,
logger=logger
)
logger.info(f"✅ Completed: {case['name']}")
# Add delay between requests to be respectful to CDS
time.sleep(10)
except Exception as e:
logger.error(f"❌ Failed to download {case['name']}: {e}")
continue # Continue with next case
Custom Variable Selection
"""
Custom variable selection for specific research needs
"""
from src.get_era5_data import download_era5_data
# Vorticity and wind analysis
vorticity_variables = [
'vorticity',
'u_component_of_wind',
'v_component_of_wind',
'geopotential'
]
# Thermodynamic analysis
thermal_variables = [
'temperature',
'potential_temperature',
'specific_humidity',
'relative_humidity'
]
# Download for different analysis types
analysis_types = {
'vorticity_analysis': vorticity_variables,
'thermal_analysis': thermal_variables
}
for analysis_name, variables in analysis_types.items():
output_file = f'era5_{analysis_name}_202301.nc'
download_era5_data(
variables=variables,
pressure_levels=[850, 700, 500],
start_date='2023-01-01',
end_date='2023-01-31',
area=[10, -80, -40, -30],
output_file=output_file,
logger=logger
)
Technical Notes
API Limitations
Be aware of CDS API limitations:
Request Size: Large requests may be queued or rejected
Rate Limits: Respect API rate limits to avoid blocking
Concurrent Requests: Limit simultaneous downloads
Data Volume: Monitor your CDS quota usage
Error Handling
The module handles various error conditions:
Authentication Errors: Invalid or missing API credentials
Network Issues: Connection timeouts and interruptions
API Errors: Server-side processing failures
File System Errors: Disk space and permission issues
Performance Optimization
For optimal performance:
Request Sizing: Balance between large requests and API limits
Sequential Downloads: Process requests sequentially to avoid conflicts
Local Caching: Avoid re-downloading existing data
Progress Monitoring: Use logging to track download progress
Data Quality
ERA5 data characteristics:
Temporal Resolution: Hourly data available
Spatial Resolution: Approximately 31 km (0.25° × 0.25°)
Vertical Levels: 37 pressure levels from 1000 to 1 hPa
Data Format: NetCDF with CF conventions
Quality Control: Extensive quality assurance in reanalysis
ERA5 Data Download Module
This module provides functionality for downloading ERA5 atmospheric reanalysis data from the Copernicus Climate Data Store (CDS). It handles authentication, data requests, and automatic file management for ATMOS-BUD workflows.
Author: Danilo Couto de Souza Date: 2024
- src.get_era5_data.download_era5_data(variables: List[str], pressure_levels: List[int], start_date: str, end_date: str, area: List[float], output_file: str, hours: List[str] | None = None, logger: Logger | None = None) None[source]
Downloads ERA5 reanalysis data from the Copernicus Climate Data Store using the modern CDSAPI.
This function provides a flexible interface for downloading ERA5 atmospheric data with configurable variables, pressure levels, time periods, and spatial domains. Uses the modern CDSAPI 0.7.6+ syntax with list-based parameters.
- Parameters:
variables (List[str]) – List of ERA5 variable names to download. Examples: - ‘temperature’ - ‘u_component_of_wind’ - ‘v_component_of_wind’ - ‘geopotential’ - ‘vorticity’ - ‘specific_humidity’
pressure_levels (List[int]) – List of pressure levels in hPa. Examples: [1000, 925, 850, 700, 500, 300]
start_date (str) – Start date in ‘YYYY-MM-DD’ format (e.g., ‘2023-01-01’)
end_date (str) – End date in ‘YYYY-MM-DD’ format (e.g., ‘2023-01-31’)
area (List[float]) – Spatial domain as [North, West, South, East] in decimal degrees. Example: [20, -80, -60, -20] for South America region
output_file (str) – Output filename for the downloaded NetCDF file
hours (Optional[List[str]], default None) – List of hours in ‘HH:MM’ format. If None, uses [‘00:00’, ‘06:00’, ‘12:00’, ‘18:00’]
logger (Optional[logging.Logger], default None) – Logger object for progress tracking and error reporting
- Returns:
Downloads file directly to specified output path
- Return type:
None
- Raises:
Exception – If download fails due to authentication, network, or API errors
Notes
Requires valid CDS API credentials in ~/.cdsapirc
Uses modern CDSAPI syntax (version 0.7.6+)
Automatically handles date range conversion to required format
Includes progress monitoring and comprehensive error handling
Examples
>>> # Basic usage for atmospheric budget analysis >>> variables = ['temperature', 'u_component_of_wind', 'v_component_of_wind', 'geopotential'] >>> levels = [850, 700, 500, 300] >>> download_era5_data( ... variables=variables, ... pressure_levels=levels, ... start_date='2023-01-01', ... end_date='2023-01-31', ... area=[10, -80, -40, -30], ... output_file='era5_january2023.nc' ... )
>>> # Custom time selection >>> download_era5_data( ... variables=['vorticity', 'geopotential'], ... pressure_levels=[850], ... start_date='2023-06-15', ... end_date='2023-06-15', ... area=[-10, -70, -30, -40], ... output_file='era5_single_day.nc', ... hours=['00:00', '12:00'] ... )
- src.get_era5_data.download_era5_data_legacy()[source]
Legacy download function for backward compatibility.
Downloads specific ERA5 data for the 2005-08-08 to 2005-08-14 case study over the South America region. This function maintains compatibility with existing workflows while the main download_era5_data function provides more flexibility.
- Returns:
Downloads data to ‘system-20050808_ERA5.nc’
- Return type:
None
Notes
This function is maintained for backward compatibility. For new workflows, use the main download_era5_data function which provides more flexibility.