Architecture¶

This page explains the internal design of colombia-hydrodata — how the three main layers relate to each other and how a single API call eventually hands you a pandas / geopandas DataFrame ready for analysis.

The Three-Layer Hierarchy¶

flowchart TD
    User(["👤 Your code"])

    subgraph Layer1["Layer 1 — Entry point"]
        Client["<b>Client</b>\nCoordinates all I/O\nHolds base URLs & catalog"]
    end

    subgraph Layer2["Layer 2 — Hydrological station"]
        Station["<b>Station</b>\nMetadata + variable catalogue\nGives access to time series"]
    end

    subgraph Layer3["Layer 3 — Measurement record"]
        Dataset["<b>Dataset</b>\nSingle variable time series\nReturns a DataFrame"]
    end

    subgraph Sources["External data sources"]
        CNE["CNE Catalog\ndatos.gov.co / SODA API"]
        AQ["Aquarius WebPortal\nTime-series API"]
    end

    User -->|"fetch_stations()\nfetch_bbox()\nfetch_region()"| Client
    Client -->|"HTTP GET — station list"| CNE
    Client -->|"builds"| Station
    Station -->|"HTTP GET — variable list"| AQ
    Station -->|"builds"| Dataset
    Dataset -->|"HTTP GET — time series"| AQ
    Dataset -->|"returns"| DataFrame(["pd.DataFrame"])

Mermaid diagrams

Material for MkDocs renders Mermaid diagrams automatically when you add mermaid to the markdown_extensions list in mkdocs.yml. See the Material docs for setup instructions.

Layer 1 — `Client`¶

Client is the single object your code instantiates. It is responsible for:

Holding the base URLs for the CNE catalog (datos.gov.co) and the Aquarius WebPortal.
Fetching and caching the CNE catalog as a GeoDataFrame at construction time.
Performing spatial and list-based station lookups, returning either GeoDataFrame slices (lightweight helpers) or fully hydrated Station objects.

# Simplified internal flow of Client.fetch_station()

# 1. Look up station_id in client.catalog (already in memory)
# 2. Call station_data(), station_location_data(), station_hydrographic_data()
# 3. Call station_datasets() via Aquarius WebPortal
# 4. Construct and return a frozen Station dataclass

Lightweight vs full fetch

Client.stations_in_list() and Client.stations_in_region() skip Step 2–4 entirely — they return a GeoDataFrame slice directly from client.catalog without making any additional network calls. Use these when you only need station locations or metadata.

Layer 2 — `Station`¶

Station is a frozen dataclass populated once at construction time. When Client.fetch_station() is called, it immediately fetches all metadata from the CNE catalog, the Aquarius variable list, and stores everything in a single immutable object — there is no lazy loading.

Attribute / method	What it gives you	Network call?
`.id`, `.name`, `.category`, …	CNE metadata fields	❌
`.location`	`Location` (altitude, lon, lat)	❌
`.hydrographic`	`Hydrographic` (area, zone, subzone)	❌
`.variables`	`dict[str, Variable]` of `PARAM@LABEL` keys	❌
`key in station`	Membership test for a variable key	❌
`station.fetch(key)` / `station[key]`	A `Dataset` for one variable	✅

# Station is populated in one shot when fetch_station() is called
station = client.fetch_station("21017010")

print(station.variables)
# {
#   'NIVEL@NV_MEDIA_D':    Variable(param='NIVEL', label='NV_MEDIA_D', id=123),
#   'CAUDAL@HIS_Q_MEDIA_D': Variable(param='CAUDAL', label='HIS_Q_MEDIA_D', id=456),
#   ...
# }

Why does `Station` exist?¶

Without this layer, your code would need to know Aquarius dataset IDs up front. Station hides that complexity: it translates the human-friendly PARAM@LABEL key format (derived from CNE + Aquarius metadata) into the internal numeric dataset ID that the Aquarius API requires.

Layer 3 — `Dataset`¶

A Dataset is a plain dataclass that links a Station, a Variable, and the fetched time-series DataFrame. The actual data is retrieved when station.fetch(key) or station[key] is called.

ds = station["CAUDAL@HIS_Q_MEDIA_D"]

print(ds.data.head())
# Returns a pd.DataFrame with 'timestamp' and 'value' columns

Attribute	Description
`.station`	The parent `Station` object
`.variable`	`Variable` descriptor (`param`, `label`, `id`)
`.data`	`pd.DataFrame` with `timestamp` and `value` columns

Data Flow End-to-End¶

The sequence below traces a single call from user code all the way to a DataFrame you can plot.

Step-by-step narrativeCode walkthroughASCII diagram (no Mermaid)

You call Client.fetch_station("21017010").
Client looks up station 21017010 in client.catalog (the GeoDataFrame loaded at construction time).
Calls station_data(), station_location_data(), and station_hydrographic_data() to gather CNE metadata fields.
Calls station_datasets() on the Aquarius WebPortal to discover all available variables for the station.
All gathered data is bundled into a frozen Station dataclass and returned.
You call station["CAUDAL@HIS_Q_MEDIA_D"]. This resolves the key, calls the Aquarius dataset endpoint, and returns a Dataset wrapping the resulting pd.DataFrame.

from colombia_hydrodata import Client

# Layer 1 — Client (fetches CNE catalog on instantiation)
client = Client()

# Layer 1 → Layer 2: fetch_station fetches all metadata in one call
station = client.fetch_station("21017010")
print(station.name, station.department)

# Layer 2: inspect available variables (already in memory)
if "CAUDAL@HIS_Q_MEDIA_D" in station:
    # Layer 2 → Layer 3: fetches time series from Aquarius
    ds = station["CAUDAL@HIS_Q_MEDIA_D"]

    # Layer 3 — Dataset
    print(ds.data.head())

Your code
    │
    ▼
┌─────────────────────────────────────────────┐
│  Client                                     │
│  • fetch_stations() / fetch_bbox() / …      │
│  • Queries CNE via SODA API                 │
└───────────────────┬─────────────────────────┘
                    │  builds N×
                    ▼
┌─────────────────────────────────────────────┐
│  Station                                    │
│  • Stores CNE metadata                      │
│  • Lazily fetches variable list (Aquarius)  │
└───────────────────┬─────────────────────────┘
                    │  builds 1×
                    ▼
┌─────────────────────────────────────────────┐
│  Dataset                                    │
│  • One variable / one station               │
│  • get_data() → pd.DataFrame                │
└─────────────────────────────────────────────┘

Design Principles¶

Why two external sources?

The CNE catalog (datos.gov.co) is the authoritative register of Colombian hydrological stations — it has location, administrative region, and operational status. Aquarius holds the actual measurements. Neither source alone gives you both. See Data Sources for more detail.

Why a Filters object instead of plain keyword arguments?

Wrapping filter criteria in a dedicated Filters dataclass makes filters composable and reusable: build one Filters instance and pass it to fetch_bbox, fetch_region, and filter_stations without repeating yourself.

Architecture¶

The Three-Layer Hierarchy¶

Layer 1 — Client¶

Layer 2 — Station¶

Why does Station exist?¶

Layer 3 — Dataset¶

Data Flow End-to-End¶

Design Principles¶

Layer 1 — `Client`¶

Layer 2 — `Station`¶

Why does `Station` exist?¶

Layer 3 — `Dataset`¶