ERA5 Data Module

The get_era5_data module provides functionality for downloading ERA5 atmospheric reanalysis data from the Copernicus Climate Data Store (CDS). It handles authentication, data requests, and automatic file management for ATMOS-BUD workflows.

Key Features

  • Modern CDSAPI Integration: Requires CDSAPI 0.7.6 or newer and its token-based authentication

  • Flexible Data Selection: Configure variables, pressure levels, and time periods

  • Automatic File Management: Handles output file naming and organization

  • Error Handling: Robust error handling for network and API issues

  • Progress Tracking: Built-in progress monitoring for large downloads

Dependencies

The module requires specific versions for compatibility:

  • cdsapi >= 0.7.6: Required for the current CDS API request syntax and authentication

  • requests: HTTP request handling

  • os, logging: Standard Python libraries for file and logging operations

Authentication Setup

Before using this module, you must set up CDS API credentials:

  1. Create CDS Account: Register at https://cds.climate.copernicus.eu/

  2. Generate API Key: Copy the personal access token shown on your CDS profile page

  3. Setup Credentials: Create a ~/.cdsapirc file:

    url: https://cds.climate.copernicus.eu/api
    key: <your-personal-access-token>
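A missing or incomplete credentials file is easy to catch before starting a long download by parsing ~/.cdsapirc yourself. The helper below, read_cdsapirc, is a hypothetical sketch and not part of src.get_era5_data; it only checks that url and key entries are present and does not validate the token against the CDS.

```python
from __future__ import annotations

from pathlib import Path


def read_cdsapirc(path: Path | None = None) -> dict[str, str]:
    """Parse a .cdsapirc file and check that it defines 'url' and 'key'.

    Hypothetical helper for illustration; not part of src.get_era5_data.
    """
    path = path if path is not None else Path.home() / ".cdsapirc"
    if not path.is_file():
        raise FileNotFoundError(f"CDS credentials file not found: {path}")

    config: dict[str, str] = {}
    for line in path.read_text().splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so 'https://...' stays intact
        name, value = line.split(":", 1)
        config[name.strip()] = value.strip()

    missing = sorted({"url", "key"} - set(config))
    if missing:
        raise ValueError(f"{path} is missing entries: {missing}")
    return config
```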
    

Functions Overview

src.get_era5_data.download_era5_data(variables: List[str], pressure_levels: List[int], start_date: str, end_date: str, area: List[float], output_file: str, hours: List[str] | None = None, logger: Logger | None = None) -> None

Downloads ERA5 reanalysis data from the Copernicus Climate Data Store using the modern CDSAPI.

This function provides a flexible interface for downloading ERA5 atmospheric data with configurable variables, pressure levels, time periods, and spatial domains. It uses the list-based request syntax of CDSAPI 0.7.6 and later.

Parameters:
  • variables (List[str]) – List of ERA5 variable names to download, for example ‘temperature’, ‘u_component_of_wind’, ‘v_component_of_wind’, ‘geopotential’, ‘vorticity’, ‘specific_humidity’

  • pressure_levels (List[int]) – List of pressure levels in hPa, for example [1000, 925, 850, 700, 500, 300]

  • start_date (str) – Start date in ‘YYYY-MM-DD’ format (e.g., ‘2023-01-01’)

  • end_date (str) – End date in ‘YYYY-MM-DD’ format (e.g., ‘2023-01-31’)

  • area (List[float]) – Spatial domain as [North, West, South, East] in decimal degrees, for example [20, -80, -60, -20] for the South America region

  • output_file (str) – Output filename for the downloaded NetCDF file

  • hours (Optional[List[str]], default None) – List of hours in ‘HH:MM’ format. If None, uses [‘00:00’, ‘06:00’, ‘12:00’, ‘18:00’]

  • logger (Optional[logging.Logger], default None) – Logger object for progress tracking and error reporting

Returns:

Nothing; the data are written directly to output_file

Return type:

None

Raises:

Exception – If download fails due to authentication, network, or API errors

Notes

  • Requires valid CDS API credentials in ~/.cdsapirc

  • Uses modern CDSAPI syntax (version 0.7.6+)

  • Converts the start_date/end_date range into the date fields required by the CDS request

  • Includes progress monitoring and comprehensive error handling
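The exact request format the function builds is not shown here. As a sketch, assuming it follows the year/month/day lists used on the CDS ERA5 download form, a date range could be expanded like this (date_range_fields is a hypothetical name):

```python
from __future__ import annotations

from datetime import date, timedelta


def date_range_fields(start_date: str, end_date: str) -> dict[str, list[str]]:
    """Expand 'YYYY-MM-DD' bounds into CDS-style year/month/day lists.

    Hypothetical helper; the real function's internal format may differ.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")

    # Every calendar day in the closed interval [start, end]
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

    # Zero-padded, de-duplicated, sorted string lists
    return {
        "year": sorted({f"{d.year:04d}" for d in days}),
        "month": sorted({f"{d.month:02d}" for d in days}),
        "day": sorted({f"{d.day:02d}" for d in days}),
    }
```

The CDS combines the year, month, and day lists as a cartesian product, so a range that crosses a month boundary (for example 2023-01-30 to 2023-02-02) would also request 2023-01-01 and 2023-01-02. Splitting such ranges into one request per month avoids fetching the extra days.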

Examples

>>> # Basic usage for atmospheric budget analysis
>>> variables = ['temperature', 'u_component_of_wind', 'v_component_of_wind', 'geopotential']
>>> levels = [850, 700, 500, 300]
>>> download_era5_data(
...     variables=variables,
...     pressure_levels=levels,
...     start_date='2023-01-01',
...     end_date='2023-01-31',
...     area=[10, -80, -40, -30],
...     output_file='era5_january2023.nc'
... )
>>> # Custom time selection
>>> download_era5_data(
...     variables=['vorticity', 'geopotential'],
...     pressure_levels=[850],
...     start_date='2023-06-15',
...     end_date='2023-06-15',
...     area=[-10, -70, -30, -40],
...     output_file='era5_single_day.nc',
...     hours=['00:00', '12:00']
... )

Key Features:

  • Modern CDSAPI syntax with list-based parameters

  • Flexible variable and level selection

  • Automatic temporal range handling

  • Progress monitoring and error handling

  • Standard NetCDF output format
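"List-based parameters" means that each request field is passed as a list. The sketch below assembles such a request dictionary. The key names (product_type, variable, pressure_level, year, month, day, time, area, data_format, download_format) are assumed to follow the current CDS form for ERA5 pressure-level data; compare them with the API request snippet on the dataset's download page before relying on them. build_request is hypothetical and not part of src.get_era5_data.

```python
from __future__ import annotations

from typing import Sequence

DEFAULT_HOURS = ["00:00", "06:00", "12:00", "18:00"]


def build_request(
    variables: Sequence[str],
    pressure_levels: Sequence[int],
    date_fields: dict[str, list[str]],
    area: Sequence[float],
    hours: Sequence[str] | None = None,
) -> dict:
    """Assemble a list-based ERA5 pressure-level request (hypothetical helper).

    date_fields holds 'year', 'month' and 'day' lists of zero-padded strings.
    Key names are assumed to match the current CDS form; verify before use.
    """
    return {
        "product_type": ["reanalysis"],
        "variable": list(variables),
        # Levels are sent as strings, e.g. "850"
        "pressure_level": [str(level) for level in pressure_levels],
        "year": list(date_fields["year"]),
        "month": list(date_fields["month"]),
        "day": list(date_fields["day"]),
        "time": list(hours) if hours is not None else list(DEFAULT_HOURS),
        "area": list(area),  # [North, West, South, East]
        "data_format": "netcdf",
        "download_format": "unarchived",
    }
```

Such a dictionary would then be submitted with cdsapi, e.g. cdsapi.Client().retrieve('reanalysis-era5-pressure-levels', request, output_file).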

Usage Example:

from src.get_era5_data import download_era5_data
import logging

# Setup logging (basicConfig makes INFO messages visible)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Define download parameters
variables = ['temperature', 'u_component_of_wind', 'v_component_of_wind']
pressure_levels = [850, 700, 500, 300]

download_era5_data(
    variables=variables,
    pressure_levels=pressure_levels,
    start_date='2023-01-01',
    end_date='2023-01-31',
    area=[20, -80, -60, -20],  # North, West, South, East
    output_file='era5_data_january2023.nc',
    logger=logger
)

Data Request Configuration

Variable Selection

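Variables are given by their CDS names. Commonly used ERA5 pressure-level variables include:

  • Temperature: temperature

  • Wind Components: u_component_of_wind, v_component_of_wind, vertical_velocity (pressure vertical velocity, Pa s⁻¹)

  • Geopotential: geopotential (divide by g ≈ 9.80665 m s⁻² to obtain geopotential height)

  • Vorticity and Divergence: vorticity, potential_vorticity, divergence

  • Humidity: specific_humidity, relative_humidity

  • Surface Variables: surface_pressure and mean_sea_level_pressure are single-level ERA5 fields rather than pressure-level variables, so they are not part of a pressure-level request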

Pressure Level Configuration

Standard pressure levels in hPa:

# Common atmospheric levels
standard_levels = [1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100]

# Tropospheric focus
tropospheric_levels = [1000, 850, 700, 500, 300]

# Single level analysis
single_level = [850]  # Must be list format for CDSAPI 0.7.6
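A mistyped level is only rejected once the request reaches the CDS, so checking levels locally can save a round trip. The hypothetical helper below compares a list against the 37 standard ERA5 pressure levels, rejects a bare integer (the list requirement noted above), and returns the levels sorted by decreasing pressure.

```python
from __future__ import annotations

# The 37 standard ERA5 pressure levels, in hPa
ERA5_PRESSURE_LEVELS = frozenset({
    1, 2, 3, 5, 7, 10, 20, 30, 50, 70, 100, 125, 150, 175, 200, 225, 250,
    300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 775, 800, 825, 850,
    875, 900, 925, 950, 975, 1000,
})


def check_pressure_levels(levels: list[int]) -> list[int]:
    """Validate levels; return them unique and sorted surface-first."""
    if isinstance(levels, (int, str)):
        # A bare 850 (or "850") instead of [850] is a common mistake
        raise TypeError("pressure_levels must be a list, e.g. [850]")
    unknown = sorted(set(levels) - ERA5_PRESSURE_LEVELS)
    if unknown:
        raise ValueError(f"Not standard ERA5 pressure levels: {unknown}")
    # Decreasing pressure, i.e. from the surface upward
    return sorted(set(levels), reverse=True)
```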

Temporal Configuration

Flexible time period specification:

# Single day
start_date = end_date = '2023-01-15'

# Month-long period
start_date, end_date = '2023-01-01', '2023-01-31'

# Specific hours, passed via the hours argument
hours = ['00:00', '06:00', '12:00', '18:00']
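ERA5 is hourly, so valid times are whole hours from 00:00 to 23:00. A minimal sketch of validating the hours argument, assuming the same default the function documents (normalize_hours is a hypothetical name):

```python
from __future__ import annotations

import re

DEFAULT_HOURS = ["00:00", "06:00", "12:00", "18:00"]
# Whole hours only: 00:00 through 23:00
_HOUR = re.compile(r"([01]\d|2[0-3]):00")


def normalize_hours(hours: list[str] | None = None) -> list[str]:
    """Return validated, sorted hours, or the documented default if None."""
    if hours is None:
        return list(DEFAULT_HOURS)
    bad = [h for h in hours if not _HOUR.fullmatch(h)]
    if bad:
        raise ValueError(f"Hours must be whole hours in 'HH:00' format: {bad}")
    return sorted(set(hours))
```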

Spatial Domain Configuration

Area specification follows [North, West, South, East] format:

# South America focus
south_america = [15, -85, -60, -30]

# Brazil region
brazil_region = [10, -75, -35, -30]

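# Custom analysis domain built from latitude/longitude bounds (example values)
min_lat, max_lat = -35.0, 10.0
min_lon, max_lon = -75.0, -30.0
analysis_domain = [max_lat, min_lon, min_lat, max_lon]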
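Swapping North and South is an easy mistake with the [North, West, South, East] order. A sketch of a pre-flight check (check_area is hypothetical); it deliberately does not require West < East, so boxes crossing the antimeridian are not rejected, and it only bounds longitudes loosely to [-180, 360].

```python
from __future__ import annotations


def check_area(area: list[float]) -> list[float]:
    """Validate a [North, West, South, East] box (hypothetical helper)."""
    if len(area) != 4:
        raise ValueError("area must have four values: [North, West, South, East]")
    north, west, south, east = (float(v) for v in area)
    # Catches the common mistake of swapping North and South
    if not -90.0 <= south < north <= 90.0:
        raise ValueError(
            f"Expected -90 <= South < North <= 90, got North={north}, South={south}"
        )
    # West > East is allowed so antimeridian-crossing boxes are not rejected
    for lon in (west, east):
        if not -180.0 <= lon <= 360.0:
            raise ValueError(f"Longitude out of range: {lon}")
    return [north, west, south, east]
```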

Integration Examples

Complete ERA5 Download Workflow

"""
Complete workflow for downloading ERA5 data for atmospheric budget analysis
"""
import logging
from src.get_era5_data import download_era5_data

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Define analysis period
start_date = '2023-01-01'
end_date = '2023-01-31'

# Analysis domain (South America)
domain = [15, -85, -60, -30]  # [North, West, South, East]

# Required variables for budget analysis
variables = [
    'temperature',
    'u_component_of_wind',
    'v_component_of_wind',
    'vertical_velocity',  # ERA5 pressure vertical velocity (omega, Pa s-1)
    'geopotential',
    'vorticity',
    'specific_humidity'
]

# Atmospheric levels for analysis
levels = [1000, 925, 850, 700, 600, 500, 400, 300, 250, 200]

# Download data
output_file = f'era5_budget_analysis_{start_date}_{end_date}.nc'

try:
    download_era5_data(
        variables=variables,
        pressure_levels=levels,
        start_date=start_date,
        end_date=end_date,
        area=domain,
        output_file=output_file,
        logger=logger
    )

    logger.info(f"✅ Successfully downloaded ERA5 data: {output_file}")

except Exception as e:
    logger.error(f"❌ Failed to download ERA5 data: {e}")

Multi-Case Download with Error Handling

"""
Download ERA5 data for multiple analysis cases with robust error handling
"""
import logging
import time
from src.get_era5_data import download_era5_data

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Define multiple analysis cases
analysis_cases = [
    {
        'name': 'Summer_Case_2023',
        'start': '2023-01-15',
        'end': '2023-01-20',
        'domain': [-10, -70, -30, -40],
        'variables': ['temperature', 'geopotential', 'vorticity']
    },
    {
        'name': 'Winter_Case_2023',
        'start': '2023-07-10',
        'end': '2023-07-15',
        'domain': [-15, -65, -35, -35],
        'variables': ['temperature', 'u_component_of_wind', 'v_component_of_wind']
    }
]

pressure_levels = [850, 700, 500, 300]

for case in analysis_cases:
    output_file = f"era5_{case['name']}.nc"

    try:
        logger.info(f"🌍 Downloading data for case: {case['name']}")

        download_era5_data(
            variables=case['variables'],
            pressure_levels=pressure_levels,
            start_date=case['start'],
            end_date=case['end'],
            area=case['domain'],
            output_file=output_file,
            logger=logger
        )

        logger.info(f"✅ Completed: {case['name']}")

        # Add delay between requests to be respectful to CDS
        time.sleep(10)

    except Exception as e:
        logger.error(f"❌ Failed to download {case['name']}: {e}")
        continue  # Continue with next case
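The loop above gives each case a single attempt. Transient CDS or network failures can often be absorbed by retrying with an increasing delay. A sketch of a generic retry helper with exponential backoff; retry is hypothetical and not part of src.get_era5_data, and its sleep function is injectable so it can be tested without waiting.

```python
from __future__ import annotations

import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 30.0,
    logger: logging.Logger | None = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff (hypothetical helper).

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    logger = logger or logging.getLogger(__name__)
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d/%d failed (%s); retrying in %.0f s",
                           attempt, attempts, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

Inside the loop, the call could be wrapped as retry(lambda: download_era5_data(...), logger=logger). Retrying does not help with authentication errors, which will fail again, so a narrower except clause may be preferable.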

Custom Variable Selection

"""
Custom variable selection for specific research needs
"""
import logging
from src.get_era5_data import download_era5_data

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Vorticity and wind analysis
vorticity_variables = [
    'vorticity',
    'u_component_of_wind',
    'v_component_of_wind',
    'geopotential'
]

# Thermodynamic analysis
thermal_variables = [
    'temperature',  # potential temperature is not an ERA5 pressure-level variable; derive it from this
    'specific_humidity',
    'relative_humidity'
]

# Download for different analysis types
analysis_types = {
    'vorticity_analysis': vorticity_variables,
    'thermal_analysis': thermal_variables
}

for analysis_name, variables in analysis_types.items():
    output_file = f'era5_{analysis_name}_202301.nc'

    download_era5_data(
        variables=variables,
        pressure_levels=[850, 700, 500],
        start_date='2023-01-01',
        end_date='2023-01-31',
        area=[10, -80, -40, -30],
        output_file=output_file,
        logger=logger
    )

Technical Notes

API Limitations

Be aware of CDS API limitations:

  • Request Size: Large requests may be queued or rejected

  • Rate Limits: Respect API rate limits to avoid blocking

  • Concurrent Requests: Limit simultaneous downloads

  • Data Volume: Monitor your CDS quota usage
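One way to keep individual requests small is to split a long period into calendar months and issue one request per month. monthly_chunks is a hypothetical helper; each (start, end) pair it returns can be passed as start_date and end_date.

```python
from __future__ import annotations

from datetime import date, timedelta


def monthly_chunks(start_date: str, end_date: str) -> list[tuple[str, str]]:
    """Split [start_date, end_date] into per-calendar-month (start, end) pairs."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")

    chunks: list[tuple[str, str]] = []
    current = start
    while current <= end:
        # First day of the month after `current`
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        chunk_end = min(end, next_month - timedelta(days=1))
        chunks.append((current.isoformat(), chunk_end.isoformat()))
        current = next_month
    return chunks
```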

Error Handling

The module handles various error conditions:

  • Authentication Errors: Invalid or missing API credentials

  • Network Issues: Connection timeouts and interruptions

  • API Errors: Server-side processing failures

  • File System Errors: Disk space and permission issues
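Disk-space and permission problems can be detected before a request is submitted. A hypothetical pre-flight check using only the standard library; the 1 GB default threshold is an arbitrary placeholder and should be sized to the expected file.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def check_output_location(output_file: str, min_free_bytes: int = 1_000_000_000) -> Path:
    """Fail early if output_file cannot be written (hypothetical helper).

    min_free_bytes is an arbitrary placeholder; size it to the expected file.
    """
    target = Path(output_file).resolve()
    directory = target.parent
    if not directory.is_dir():
        raise FileNotFoundError(f"Output directory does not exist: {directory}")
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"No write permission for: {directory}")
    free = shutil.disk_usage(directory).free
    if free < min_free_bytes:
        raise OSError(f"Only {free} bytes free in {directory}; need {min_free_bytes}")
    return target
```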

Performance Optimization

For optimal performance:

  • Request Sizing: Split long periods or many variables into several smaller requests (for example, one per month) rather than one very large request

  • Sequential Downloads: Process requests sequentially to avoid conflicts

  • Local Caching: Skip requests whose output file already exists to avoid re-downloading data

  • Progress Monitoring: Use logging to track download progress
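A sketch of the caching idea: skip the request when a non-empty output file already exists. download_if_missing is hypothetical. It cannot tell a truncated file from an interrupted download, or a file produced with different request parameters, from a good one, so such files still need to be deleted by hand.

```python
from __future__ import annotations

import logging
from pathlib import Path
from typing import Callable


def download_if_missing(
    output_file: str,
    download: Callable[[], None],
    logger: logging.Logger | None = None,
) -> bool:
    """Run download() unless a non-empty output_file already exists.

    Returns True if a download was attempted, False if it was skipped.
    Hypothetical helper; it cannot detect truncated or outdated files.
    """
    logger = logger or logging.getLogger(__name__)
    path = Path(output_file)
    if path.is_file() and path.stat().st_size > 0:
        logger.info("Skipping download, %s already exists", path)
        return False
    download()
    return True
```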

Data Quality

ERA5 data characteristics:

  • Temporal Resolution: Hourly data available

  • Spatial Resolution: Approximately 31 km native grid, distributed by the CDS on a 0.25° × 0.25° latitude/longitude grid

  • Vertical Levels: 37 pressure levels from 1000 to 1 hPa

  • Data Format: NetCDF with CF conventions

  • Quality Control: Extensive quality assurance in reanalysis
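To judge whether a request is likely to be large, the number of values can be estimated from the 0.25° grid spacing above. This rough sketch (estimate_values is hypothetical) assumes a regular grid that includes both boundary rows and columns and an area that does not cross the antimeridian.

```python
from __future__ import annotations

import math


def estimate_values(n_variables: int, n_levels: int, n_times: int,
                    area: list[float], resolution: float = 0.25) -> int:
    """Rough count of values in a request on a regular lat/lon grid.

    Assumes grid rows/columns on both edges of the box are included and that
    the box does not cross the antimeridian.
    """
    north, west, south, east = area
    n_lat = math.floor((north - south) / resolution) + 1
    n_lon = math.floor((east - west) / resolution) + 1
    return n_variables * n_levels * n_times * n_lat * n_lon
```

For the complete workflow example (7 variables, 10 levels, 31 days × 4 times over [15, -85, -60, -30]) this gives 577,402,280 values, roughly 2.3 GB at 4 bytes per value before any packing or compression.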

ERA5 Data Download Module

Author: Danilo Couto de Souza
Date: 2024

src.get_era5_data.download_era5_data_legacy()

Legacy download function for backward compatibility.

Downloads ERA5 data for the 2005-08-08 to 2005-08-14 case study over South America. This function keeps existing workflows working, while the main download_era5_data function provides more flexibility.

Returns:

Downloads data to ‘system-20050808_ERA5.nc’

Return type:

None

Notes

This function is maintained for backward compatibility. For new workflows, use the main download_era5_data function which provides more flexibility.