Data model

This section presents the data model of DBnomics.

The data model is implemented with classes defined in dbnomics_toolbox.model.

These classes represent data in memory and have no methods to read or write data; see the data storage page for that.

Overview

[Figure: DBnomics data model overview]

Provider metadata

The ProviderMetadata class represents metadata about a provider.

Example:

from dbnomics_toolbox.model import ProviderMetadata

ProviderMetadata.create(
    code="INSEE",
    name="Institut national de la statistique et des études économiques",
    # Cf https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
    region="FR",
    terms_of_use="https://www.insee.fr/fr/information/2381863",
    website="https://www.insee.fr/",
)

On the DBnomics website, this metadata is displayed on each provider page, in this case https://db.nomics.world/INSEE.

See also: BaseStorage.load_provider_metadata and BaseStorage.save_provider_metadata.

Category tree

The CategoryTree class represents the category tree of datasets.

The tree has no root node: it starts with a list of nodes.

Each node is either a reference to a dataset, represented by the DatasetReference class, or a category, represented by the Category class.

Example:

from dbnomics_toolbox.model import Category, CategoryTree, DatasetReference

CategoryTree(
    children=[
        Category.create(
            children=[
                DatasetReference.create(
                    "POP",
                    name="Population",
                )
            ],
            code="KEI",
            name="Key economic indicators",
        )
    ]
)
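
To make the shape of this structure concrete, here is a minimal sketch that does not use the toolbox classes at all. DatasetRef, CategoryNode and iter_dataset_codes are hypothetical stand-ins written with plain dataclasses, and the "GDP" dataset is made up for illustration. The sketch assumes that categories can be nested, and shows how a tree without a root node can be walked to collect the dataset codes it references.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRef:
    """Hypothetical stand-in for a dataset reference node."""

    code: str
    name: str | None = None


@dataclass
class CategoryNode:
    """Hypothetical stand-in for a category node, which can hold other nodes."""

    code: str
    name: str | None = None
    children: list[CategoryNode | DatasetRef] = field(default_factory=list)


def iter_dataset_codes(nodes):
    """Yield dataset codes depth-first from a list of nodes."""
    for node in nodes:
        if isinstance(node, DatasetRef):
            yield node.code
        else:
            # A category: descend into its children.
            yield from iter_dataset_codes(node.children)


# The tree is a plain list of top-level nodes: there is no single root.
tree = [
    CategoryNode(
        code="KEI",
        name="Key economic indicators",
        children=[DatasetRef("POP", name="Population")],
    ),
    DatasetRef("GDP", name="Gross domestic product"),  # illustrative dataset
]

dataset_codes = list(iter_dataset_codes(tree))
print(dataset_codes)
```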

See also: BaseStorage.load_category_tree and BaseStorage.save_category_tree.

Dataset metadata

The DatasetMetadata class represents metadata about a dataset.

The sample dataset detailed in the concepts page can be represented like this:

from dbnomics_toolbox.model import DatasetMetadata
from dbnomics_toolbox.model.factories.model_factory import ModelFactory

model_factory = ModelFactory()

dataset_metadata = DatasetMetadata.create(
    "PRODUCT_PRICES",
    dimensions=[
        model_factory.create_dimension(
            "SKU",
            label="Stock keeping unit",
            values=[
                model_factory.create_dimension_value("111", label="Computer"),
                model_factory.create_dimension_value("222", label="Smartphone"),
                model_factory.create_dimension_value("333", label="Television"),
            ],
        ),
        model_factory.create_dimension(
            "COUNTRY",
            label="Country",
            values=[
                model_factory.create_dimension_value("DE", label="Germany"),
                model_factory.create_dimension_value("FR", label="France"),
            ],
        ),
    ],
    name="Product prices",
)

Representing the actual time series is detailed in the next section.

See also: BaseStorage.load_dataset_metadata and BaseStorage.save_dataset_metadata.

Series

The Series and Observation classes represent series metadata and observations.

See also: BaseStorage.load_series, BaseStorage.iter_dataset_series and BaseStorage.save_series.

Auto-generated series code

To represent one of the series of the sample dataset detailed in the concepts page:

from dbnomics_toolbox.model.factories.model_factory import ModelFactory

model_factory = ModelFactory()

series = model_factory.create_series(
    dataset_dimensions=dataset_metadata.dimensions,
    dimensions={
        "COUNTRY": "FR",
        "SKU": "111",
    },
    observations=[
        model_factory.create_observation(period="2000", value=12),
        model_factory.create_observation(period="2001", value=13),
        model_factory.create_observation(period="2002", value=11),
    ],
)
print(series)
Series(
    code='111.FR',
    dimensions={'COUNTRY': 'FR', 'SKU': '111'},
    observations=[
        Observation(period=YearPeriod(year_num=2000), value=12, attributes={}),
        Observation(period=YearPeriod(year_num=2001), value=13, attributes={}),
        Observation(period=YearPeriod(year_num=2002), value=11, attributes={}),
    ],
)

Note that the series code 111.FR was auto-generated as it was not given to the create_series method. Dataset dimensions were used to determine the order of dimensions.
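
The ordering rule can be illustrated with a small standalone sketch. The generate_series_code function below is hypothetical and is not the toolbox implementation; it assumes that the code is built by joining the dimension value codes with a dot, which is consistent with the 111.FR output shown above.

```python
def generate_series_code(dataset_dimension_codes, series_dimensions, separator="."):
    """Join the series' dimension value codes, ordered like the dataset dimensions.

    Hypothetical helper: the "." separator is an assumption based on the
    example output, not a documented rule.
    """
    return separator.join(series_dimensions[code] for code in dataset_dimension_codes)


# The dataset declares its dimensions as SKU first, then COUNTRY.
dataset_dimension_codes = ["SKU", "COUNTRY"]

# The series dict lists COUNTRY first, but the dataset order is the one used.
series_dimensions = {"COUNTRY": "FR", "SKU": "111"}

series_code = generate_series_code(dataset_dimension_codes, series_dimensions)
print(series_code)  # 111.FR
```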

Arbitrary series code

An arbitrary series code can also be given:

from dbnomics_toolbox.model.factories.model_factory import ModelFactory

model_factory = ModelFactory()

series = model_factory.create_series(
    code="foo",
    dimensions={
        "COUNTRY": "FR",
        "SKU": "111",
    },
)
print(series)
Series(
    code='foo',
    dimensions={'COUNTRY': 'FR', 'SKU': '111'},
    observations=[],
)

Periods

There are different period types, each one being represented by a model class.

Those classes can parse str periods and their instances can be serialized to str.

| Period type | Text representation | Example | Model class |
|---|---|---|---|
| year | {YYYY} | 2025 | YearPeriod |
| semester (6 months) | {YYYY}-S{S:1-2} | 2025-S1 | SemesterPeriod |
| quarter | {YYYY}-Q{Q:1-4} | 2025-Q1 | QuarterPeriod |
| bimester (2 months) | {YYYY}-B{N:1-6} | 2025-B1 | BimesterPeriod |
| month | {YYYY}-{MM} | 2025-01 | MonthPeriod |
| week | {YYYY}-W{WW} | 2025-W01 | WeekPeriod |
| day | {YYYY}-{MM}-{DD} | 2025-01-01 | DayPeriod |
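
As a rough illustration of the text representations listed above, the following sketch detects the period type of a string with regular expressions. It does not use the period model classes: PERIOD_PATTERNS and detect_period_type are hypothetical names, and the accepted ranges for months (01-12), weeks (01-53) and days (01-31) are assumptions that go beyond what is stated here.

```python
import re

# One pattern per period type, following the text representations listed above.
# The month, week and day ranges are assumptions made for this sketch.
PERIOD_PATTERNS = {
    "year": re.compile(r"\d{4}"),
    "semester": re.compile(r"\d{4}-S[1-2]"),
    "quarter": re.compile(r"\d{4}-Q[1-4]"),
    "bimester": re.compile(r"\d{4}-B[1-6]"),
    "month": re.compile(r"\d{4}-(0[1-9]|1[0-2])"),
    "week": re.compile(r"\d{4}-W(0[1-9]|[1-4]\d|5[0-3])"),
    "day": re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"),
}


def detect_period_type(text):
    """Return the period type name matching the whole string, or None.

    This only checks the shape of the string: it does not validate the
    calendar, so an impossible date such as 2025-02-31 is still a "day".
    """
    for period_type, pattern in PERIOD_PATTERNS.items():
        if pattern.fullmatch(text):
            return period_type
    return None


for example in ["2025", "2025-S1", "2025-Q1", "2025-B1", "2025-01", "2025-W01", "2025-01-01"]:
    print(example, "->", detect_period_type(example))
```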

NA values

To represent an NA (not available) value, create an Observation with a value of None:

from dbnomics_toolbox.model.factories.model_factory import ModelFactory

model_factory = ModelFactory()

series = model_factory.create_series(
    code="foo",
    observations=[
        model_factory.create_observation(period="2000", value=12),
        model_factory.create_observation(period="2001", value=None),
        model_factory.create_observation(period="2002", value=11),
    ],
)
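
Code that consumes observations then has to handle None explicitly. The sketch below is only an illustration with plain (period, value) tuples rather than Observation instances; it computes a mean over the available values and skips the NA ones.

```python
# Plain tuples standing in for observations; None marks an NA value.
observations = [("2000", 12), ("2001", None), ("2002", 11)]

# Keep only the values that are actually available.
available = [value for _, value in observations if value is not None]

# Guard against a series that contains only NA values.
mean = sum(available) / len(available) if available else None
print(mean)  # 11.5
```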

Creating model instances

There are several ways to create a model class instance, each offering a different trade-off between convenience and performance.

Each model class defines a standard __init__ method which expects already parsed valid values.

In the following example using ProviderMetadata.__init__, the code kwarg must be given a parsed ProviderCode, and the terms_of_use and website kwargs must be given a parsed PublicUrl:

from dbnomics_toolbox.model.identifiers.types import ProviderCode
from dbnomics_toolbox.model.provider_metadata import ProviderMetadata
from dbnomics_toolbox.model.url import PublicUrl

ProviderMetadata(
    code=ProviderCode.parse("INSEE"),
    name="Institut national de la statistique et des études économiques",
    # Cf https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
    region="FR",
    terms_of_use=PublicUrl.parse("https://www.insee.fr/fr/information/2381863"),
    website=PublicUrl.parse("https://www.insee.fr/"),
)

Some model classes define a create classmethod acting as an alternative constructor that offers more relaxed arguments.

In the following example using ProviderMetadata.create, the code, terms_of_use and website kwargs can be given str values that will be parsed by the create classmethod before calling the corresponding __init__ method:

from dbnomics_toolbox.model import ProviderMetadata

ProviderMetadata.create(
    code="INSEE",
    name="Institut national de la statistique et des études économiques",
    # Cf https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
    region="FR",
    terms_of_use="https://www.insee.fr/fr/information/2381863",
    website="https://www.insee.fr/",
)
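
The relationship between the strict __init__ and the relaxed create classmethod can be sketched with plain dataclasses. Url and Provider below are hypothetical stand-ins, not the toolbox classes, and the URL validation rule is an assumption made for the example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Url:
    """Hypothetical parsed URL type."""

    value: str

    @classmethod
    def parse(cls, text: str) -> "Url":
        # Assumed rule for this sketch: only absolute http(s) URLs are valid.
        parts = urlparse(text)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Invalid URL: {text!r}")
        return cls(text)


@dataclass(frozen=True)
class Provider:
    """Hypothetical model whose __init__ expects an already parsed Url."""

    code: str
    website: Url

    @classmethod
    def create(cls, code: str, *, website: "str | Url") -> "Provider":
        # Relaxed constructor: accept a str and parse it before calling __init__.
        if isinstance(website, str):
            website = Url.parse(website)
        return cls(code=code, website=website)


strict = Provider(code="INSEE", website=Url.parse("https://www.insee.fr/"))
relaxed = Provider.create("INSEE", website="https://www.insee.fr/")
print(strict == relaxed)  # True
```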

The create classmethods make life easier for the caller, but parsing the same values again and again can be inefficient, especially for very frequent values like dimension codes or periods.

That’s why the ModelFactory exists: it offers the same convenience as the create classmethods, but avoids parsing the same value twice by maintaining a cache of already parsed values. This is particularly useful during the data conversion process, where many datasets are processed sequentially: the same ModelFactory instance is kept for the whole lifetime of the converter.
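
The caching idea can be sketched independently of the toolbox. CachingParser and parse_dimension_code below are hypothetical names, and the validation rule is a placeholder; the real ModelFactory may organize its cache differently.

```python
class CachingParser:
    """Hypothetical helper that parses each distinct input only once."""

    def __init__(self, parse):
        self._parse = parse
        self._cache = {}
        self.parse_calls = 0  # counts real parsing work, for illustration

    def __call__(self, text):
        if text in self._cache:
            return self._cache[text]
        self.parse_calls += 1
        parsed = self._parse(text)
        self._cache[text] = parsed
        return parsed


def parse_dimension_code(text):
    # Placeholder validation, not the toolbox's actual rules.
    if not text or " " in text:
        raise ValueError(f"Invalid dimension code: {text!r}")
    return text


cached_parse = CachingParser(parse_dimension_code)

# The same codes come back many times while converting series.
parsed = [cached_parse(code) for code in ["FR", "DE", "FR", "FR", "DE"]]
print(cached_parse.parse_calls)  # 2
```

Keeping one such object alive for a whole conversion run is what makes the cache pay off: the more often a value repeats, the more parsing is skipped.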

The following example uses ModelFactory.create_dimension:

from dbnomics_toolbox.model import Dimension
from dbnomics_toolbox.model.factories.model_factory import ModelFactory

model_factory = ModelFactory()
model_factory.create_dimension(
    "COUNTRY",
    label="Country",
    values=[
        model_factory.create_dimension_value("DE", label="Germany"),
        model_factory.create_dimension_value("FR", label="France"),
    ],
)

Some classes, such as Dimension, DimensionValue, Attribute and AttributeValue, deliberately do not define a create classmethod, to avoid encouraging users to write inefficient code in which the same codes would be parsed too often.

Note: the ModelFactory is used internally by the _create_* methods of the BaseDatasetConverter class.