Dataset

Dataset is a plain dataclass that bundles a Station reference, a Variable descriptor, and the fetched time-series data. It is returned by station.fetch(key) or station[key]. Access the measurements directly via dataset.data, which is a pandas.DataFrame with timestamp (datetime) and value (float64) columns. Plotting helpers are available through the dataset.plot property.
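As a rough illustration of the layout described above, the sketch below builds a stand-in DataFrame with the same two columns and shows common ways to read from it. The sample values are invented for illustration and no library call is made; with a real dataset you would start from dataset.data instead.

```python
import pandas as pd

# Hypothetical stand-in for dataset.data: the two columns the page describes.
data = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
        "value": [1.25, 1.40, 1.32],
    }
)

# Index the observations by time to get a plain value series.
series = data.set_index("timestamp")["value"]

# Simple read-only summaries of the series.
latest = series.iloc[-1]
mean_value = series.mean()
```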

colombia_hydrodata.dataset.Dataset dataclass

Holds time-series data for a single variable measured at a station.

Attributes:

Name Type Description
station Station

The station at which the variable was measured.

variable Variable

The hydrological or meteorological variable being recorded.

data DataFrame

A DataFrame containing the time-series observations for the variable, as retrieved from the Aquarius data source.

plot property

Return a plotting helper bound to this dataset.

Provides convenient access to the plotting API via dataset.plot.<method>() without storing plotting logic directly on the dataset class itself.

Returns:

Type Description
DatasetPlot

A colombia_hydrodata.plot.DatasetPlot instance linked to the current dataset.

from_variable(station, variable) classmethod

Construct a Dataset by fetching data for the given variable from Aquarius.

Parameters:

Name Type Description Default
station Station

The station associated with the variable.

required
variable Variable

The variable whose time-series data should be fetched.

required

Returns:

Type Description
Self

A new Dataset instance populated with the fetched data.

__str__()

Return a human-readable summary of the dataset.

Returns:

Type Description
str

A comma-separated string containing the station name, station ID, municipality, department, and variable description.

sight_level(level)

Adjust observed stage values by subtracting the sight level reference.

The sight level is the difference between the observable (staff gauge) reading and the absolute sea-level elevation of the zero mark. Applying it converts raw gauge readings into elevation-referenced stage values, enabling meaningful comparison of water levels across stations along the same river reach.

Note

This method is intended exclusively for stage datasets, i.e. variables whose key starts with 'NIVEL'. Applying it to other variable types produces meaningless results.

Parameters:

Name Type Description Default
level float

The sight level offset (in the same units as the observed values, typically metres) to subtract from every observation.

required

Returns:

Type Description
Self

A new Dataset instance with an adjusted value column, leaving the original unchanged.
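The adjustment amounts to a constant subtraction applied to a copy of the data. The following is a minimal sketch of that idea on a bare DataFrame, not the library's implementation; the helper name apply_sight_level and the sample readings are hypothetical.

```python
import pandas as pd


def apply_sight_level(data: pd.DataFrame, level: float) -> pd.DataFrame:
    """Return a copy of data with level subtracted from the value column."""
    adjusted = data.copy()  # work on a copy so the input stays untouched
    adjusted["value"] = adjusted["value"] - level
    return adjusted


# Invented stage readings, in metres.
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2021-03-01", "2021-03-02"]),
        "value": [2.5, 3.0],
    }
)

adjusted = apply_sight_level(raw, level=0.5)
```

Returning a copy mirrors the documented behaviour of leaving the original dataset unchanged, which keeps the raw gauge readings available for later comparison.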

rescale(scale)

Convert observed values from one measurement unit to another.

Multiplies every value in the series by a conversion factor, allowing unit transformations without altering the underlying data source.

Example

To convert stage readings from centimetres to metres:

dataset.rescale(1 / 100)

Parameters:

Name Type Description Default
scale float

The multiplicative conversion factor to apply to all observed values. Must be set by the caller according to the desired unit transformation (e.g. 1/100 for cm → m, 1/1000 for mm → m).

required

Returns:

Type Description
Self

A new Dataset instance with a rescaled value column, leaving the original unchanged.

interpolate(time_precision=None, **kwargs)

Resample the time series to a regular frequency and interpolate missing values.

Resamples the dataset to a uniform time grid, introducing NaN at any timestamps where no measurement was recorded, then fills those gaps using pandas.DataFrame.interpolate.

The target frequency can be supplied explicitly or derived automatically from the variable label. Variable labels follow the convention "<PARAM>_<FREQ>", where <FREQ> is a single-character code ('A' annual, 'M' monthly, 'D' daily, 'H' hourly) that is mapped to the corresponding pandas offset alias via time_precision_options.

Parameters:

Name Type Description Default
time_precision str | None

A pandas offset alias (e.g. 'D', 'ME', 'H') that defines the target resampling frequency. If None, the frequency is inferred from the trailing segment of self.variable.label.

None
**kwargs Any

Additional keyword arguments forwarded to pandas.DataFrame.interpolate (e.g. method='linear', limit=3).

{}

Returns:

Type Description
Self

A new Dataset instance with a regularly spaced timestamp index and interpolated value column, leaving the original unchanged.

Raises:

Type Description
ValueError

If time_precision is None and the variable label does not contain a recognised time-precision code.
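The resample-then-fill behaviour described for interpolate can be approximated with plain pandas. This is a sketch under assumptions, not the library's code: the helper names, the one-entry FREQ_BY_CODE mapping (standing in for time_precision_options), and the label "NIVEL_D" are all hypothetical.

```python
import pandas as pd

# Hypothetical subset of a code-to-alias mapping; the real
# time_precision_options may contain different entries.
FREQ_BY_CODE = {"D": "D"}


def infer_frequency(label: str) -> str:
    """Map the trailing '<FREQ>' segment of '<PARAM>_<FREQ>' to an offset alias."""
    code = label.rsplit("_", 1)[-1]
    if code not in FREQ_BY_CODE:
        raise ValueError(f"no recognised time-precision code in {label!r}")
    return FREQ_BY_CODE[code]


def resample_and_fill(data: pd.DataFrame, freq: str, **kwargs) -> pd.DataFrame:
    """Put the series on a regular grid, then interpolate the gaps."""
    series = data.set_index("timestamp")["value"]
    regular = series.resample(freq).mean()  # empty bins become NaN
    filled = regular.interpolate(**kwargs)  # defaults to linear interpolation
    return filled.reset_index()


# Invented daily readings with two missing days (3 and 4 January).
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-05"]),
        "value": [1.0, 2.0, 5.0],
    }
)

filled = resample_and_fill(raw, infer_frequency("NIVEL_D"))
```

Passing limit=1 through kwargs, for instance, would fill only the first missing day of each gap and leave the rest as NaN.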

detrend(**kwargs)

Remove the trend component from the dataset's value series.

Delegates to colombia_hydrodata.utils.tsa.detrend. The resulting trend and detrended columns are added to a copy of the underlying DataFrame.

Parameters:

Name Type Description Default
**kwargs Any

Keyword arguments forwarded to colombia_hydrodata.utils.tsa.detrend (e.g. trend='ma', window=12).

{}

Returns:

Type Description
Self

A new Dataset instance with trend and detrended columns appended to the data, leaving the original unchanged.
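One common way to produce trend and detrended columns is a centred moving average. The sketch below assumes that trend='ma' refers to such an average, which this page does not confirm; the helper name and sample data are hypothetical, and the library's estimator may differ (for example in how it treats the series edges).

```python
import pandas as pd


def detrend_moving_average(data: pd.DataFrame, window: int) -> pd.DataFrame:
    """Append trend and detrended columns to a copy of data."""
    out = data.copy()
    # Centred rolling mean; min_periods=1 keeps values at the series edges.
    out["trend"] = out["value"].rolling(window, center=True, min_periods=1).mean()
    # Additive detrending: what is left after removing the trend.
    out["detrended"] = out["value"] - out["trend"]
    return out


# Invented, steadily rising daily series.
data = pd.DataFrame(
    {
        "timestamp": pd.date_range("2020-01-01", periods=5, freq="D"),
        "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

result = detrend_moving_average(data, window=3)
```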

seasonal()

Compute the seasonal component from the detrended series.

Delegates to colombia_hydrodata.utils.tsa.seasonal_series. Must be called after detrend().

Returns:

Type Description
Self

A new Dataset instance with a seasonal column appended to the data, leaving the original unchanged.

Raises:

Type Description
KeyError

If the detrended column is not present in the data.

anomalies()

Compute anomalies by removing the seasonal component.

Delegates to colombia_hydrodata.utils.tsa.anomalies_series. Must be called after seasonal().

Returns:

Type Description
Self

A new Dataset instance with an anomalies column appended to the data, leaving the original unchanged.

Raises:

Type Description
KeyError

If the seasonal column is not present in the data.
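The ordering requirement (detrend, then seasonal, then anomalies) follows from each step reading a column produced by the previous one. The sketch below illustrates this with a calendar-month climatology; that choice of seasonal estimator, the helper name, and the sample values are assumptions, and the library's seasonal_series and anomalies_series may compute these columns differently.

```python
import pandas as pd


def seasonal_and_anomalies(data: pd.DataFrame) -> pd.DataFrame:
    """Append seasonal and anomalies columns derived from a detrended column."""
    if "detrended" not in data.columns:
        # Mirrors the documented KeyError when detrending was skipped.
        raise KeyError("detrended")
    out = data.copy()
    month = out["timestamp"].dt.month
    # Seasonal component: mean detrended value for each calendar month.
    out["seasonal"] = out["detrended"].groupby(month).transform("mean")
    # Anomalies: what remains once the seasonal cycle is removed.
    out["anomalies"] = out["detrended"] - out["seasonal"]
    return out


# Invented detrended values for two Januaries and two Februaries.
detrended = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-01-15", "2020-02-15", "2021-01-15", "2021-02-15"]
        ),
        "detrended": [1.0, -1.0, 3.0, -3.0],
    }
)

result = seasonal_and_anomalies(detrended)
```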

deconstruction(**kwargs)

Fully decompose the value series in a single step.

Delegates to colombia_hydrodata.utils.tsa.deconstruction, running detrending, seasonal estimation, and anomaly extraction at once. Replaces the entire DataFrame with the decomposition result.

Parameters:

Name Type Description Default
**kwargs Any

Keyword arguments forwarded to colombia_hydrodata.utils.tsa.deconstruction (e.g. trend='ma', window=12).

{}

Returns:

Type Description
Self

A new Dataset instance whose data contains columns: timestamp, value, trend, detrended, seasonal, and anomalies.
