Data model¶
This section presents the data model of DBnomics.
The data model is implemented with classes defined in dbnomics_toolbox.model.
Those classes represent data in memory and have no methods to read or write data. Read the data storage page for that.
Overview¶
Provider metadata¶
The ProviderMetadata class represents metadata about a provider.
Example:
from dbnomics_toolbox.model import ProviderMetadata
ProviderMetadata.create(
    code="INSEE",
    name="Institut national de la statistique et des études économiques",
    # Cf https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
    region="FR",
    terms_of_use="https://www.insee.fr/fr/information/2381863",
    website="https://www.insee.fr/",
)
On the DBnomics website, this metadata is displayed on each provider page, in this case https://db.nomics.world/INSEE.
See also: BaseStorage.load_provider_metadata and BaseStorage.save_provider_metadata.
Category tree¶
The CategoryTree class represents the category tree of datasets.
The tree has no root node: it starts with a list of nodes.
Each node can be either a reference to a dataset by using the DatasetReference class, or a category by using the Category class.
Example:
from dbnomics_toolbox.model import Category, CategoryTree, DatasetReference
CategoryTree(
    children=[
        Category.create(
            children=[
                DatasetReference.create(
                    "POP",
                    name="Population",
                )
            ],
            code="KEI",
            name="Key economic indicators",
        )
    ]
)
See also: BaseStorage.load_category_tree and BaseStorage.save_category_tree.
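To make the "list of nodes, each either a category or a dataset reference" structure more concrete, here is a minimal sketch that walks such a tree and collects the dataset codes it references. It uses simplified stand-in dataclasses so that it runs on its own: they are not the real classes of dbnomics_toolbox.model, whose attributes may differ, and the iter_dataset_codes helper and the GDP dataset are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


# Simplified stand-ins for illustration only; the real model classes
# in dbnomics_toolbox.model may expose different attributes.
@dataclass
class DatasetReference:
    code: str
    name: str | None = None


@dataclass
class Category:
    code: str
    name: str | None = None
    children: list[Category | DatasetReference] = field(default_factory=list)


def iter_dataset_codes(nodes: list[Category | DatasetReference]) -> Iterator[str]:
    """Yield dataset codes depth-first, in tree order (hypothetical helper)."""
    for node in nodes:
        if isinstance(node, DatasetReference):
            yield node.code
        else:
            yield from iter_dataset_codes(node.children)


# The tree has no root node: it is a plain list of top-level nodes.
tree = [
    Category(
        code="KEI",
        name="Key economic indicators",
        children=[DatasetReference("POP", name="Population")],
    ),
    # Hypothetical extra dataset, referenced directly at the top level.
    DatasetReference("GDP", name="Gross domestic product"),
]

dataset_codes = list(iter_dataset_codes(tree))
print(dataset_codes)
```

Because a category can itself contain categories, the traversal is recursive; a dataset reference is always a leaf.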
Dataset metadata¶
The DatasetMetadata class represents metadata about a dataset.
The sample dataset detailed in the concepts page can be represented like this:
from dbnomics_toolbox.model import DatasetMetadata
from dbnomics_toolbox.model.factories.model_factory import ModelFactory
model_factory = ModelFactory()
dataset_metadata = DatasetMetadata.create(
    "PRODUCT_PRICES",
    dimensions=[
        model_factory.create_dimension(
            "SKU",
            label="Stock keeping unit",
            values=[
                model_factory.create_dimension_value("111", label="Computer"),
                model_factory.create_dimension_value("222", label="Smartphone"),
                model_factory.create_dimension_value("333", label="Television"),
            ],
        ),
        model_factory.create_dimension(
            "COUNTRY",
            label="Country",
            values=[
                model_factory.create_dimension_value("DE", label="Germany"),
                model_factory.create_dimension_value("FR", label="France"),
            ],
        ),
    ],
    name="Product prices",
)
Representing the actual time series is detailed in the next section.
See also: BaseStorage.load_dataset_metadata and BaseStorage.save_dataset_metadata.
Series¶
The Series and Observation classes represent series metadata and observations.
See also: BaseStorage.load_series, BaseStorage.iter_dataset_series and BaseStorage.save_series.
Auto-generated series code¶
To represent one of the series of the sample dataset detailed in the concepts page:
from dbnomics_toolbox.model.factories.model_factory import ModelFactory
model_factory = ModelFactory()
series = model_factory.create_series(
    dataset_dimensions=dataset_metadata.dimensions,
    dimensions={
        "COUNTRY": "FR",
        "SKU": "111",
    },
    observations=[
        model_factory.create_observation(period="2000", value=12),
        model_factory.create_observation(period="2001", value=13),
        model_factory.create_observation(period="2002", value=11),
    ],
)
print(series)
Output:
Series(
    code='111.FR',
    dimensions={'COUNTRY': 'FR', 'SKU': '111'},
    observations=[
        Observation(period=YearPeriod(year_num=2000), value=12, attributes={}),
        Observation(period=YearPeriod(year_num=2001), value=13, attributes={}),
        Observation(period=YearPeriod(year_num=2002), value=11, attributes={}),
    ],
)
Note that the series code 111.FR was auto-generated as it was not given to the create_series method.
Dataset dimensions were used to determine the order of dimensions.
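The ordering rule can be sketched in a few lines of plain Python. This is only an illustration of the behaviour described above, not the toolbox's actual implementation: the generate_series_code function is hypothetical, and the "." separator is inferred from the 111.FR code shown in the output.

```python
def generate_series_code(dataset_dimension_codes, series_dimensions, separator="."):
    """Join the series' dimension value codes in dataset dimension order.

    Hypothetical helper; the "." separator is an assumption.
    """
    return separator.join(series_dimensions[code] for code in dataset_dimension_codes)


# Dimension order as declared in the sample dataset metadata: SKU, then COUNTRY.
dataset_dimension_codes = ["SKU", "COUNTRY"]
# The series dimensions may be given in any order.
series_dimensions = {"COUNTRY": "FR", "SKU": "111"}

series_code = generate_series_code(dataset_dimension_codes, series_dimensions)
print(series_code)  # 111.FR
```

The series dimensions are given as COUNTRY then SKU, but the resulting code starts with the SKU value because the dataset declares SKU first.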
Arbitrary series code¶
An arbitrary series code can also be given:
from dbnomics_toolbox.model.factories.model_factory import ModelFactory
model_factory = ModelFactory()
series = model_factory.create_series(
    code="foo",
    dimensions={
        "COUNTRY": "FR",
        "SKU": "111",
    },
)
print(series)
Output:
Series(
    code='foo',
    dimensions={'COUNTRY': 'FR', 'SKU': '111'},
    observations=[],
)
Periods¶
There are different period types, each one being represented by a model class.
Those classes can parse str periods and their instances can be serialized to str.
| Period type | Text representation | Example | Model class |
|---|---|---|---|
| year | | 2000 | YearPeriod |
| semester (6 months) | | | |
| quarter (3 months) | | | |
| bimester (2 months) | | | |
| month | | | |
| week | | | |
| day | | | |
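The parse and serialize round trip can be illustrated with a minimal stand-in for the year period class. Only the class name YearPeriod and its year_num field come from the series output shown earlier; the parse classmethod name, the four-digit format and the validation rule are assumptions made for this sketch, and the real class may behave differently.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class YearPeriod:
    """Stand-in for illustration, not the real dbnomics_toolbox class."""

    year_num: int

    @classmethod
    def parse(cls, value: str) -> "YearPeriod":
        # Assumption: a year period is written as exactly four digits.
        if re.fullmatch(r"[0-9]{4}", value) is None:
            raise ValueError(f"Invalid year period: {value!r}")
        return cls(year_num=int(value))

    def __str__(self) -> str:
        # Serialize back to the same text representation.
        return f"{self.year_num:04d}"


period = YearPeriod.parse("2000")
round_tripped = str(period)
print(repr(period), round_tripped)
```

Parsing once into a typed value means invalid strings are rejected early, and the rest of the code can rely on the period being well-formed.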
NA values¶
To represent a NA (non-available) value, create an Observation with a value of None:
from dbnomics_toolbox.model.factories.model_factory import ModelFactory
model_factory = ModelFactory()
series = model_factory.create_series(
    code="foo",
    observations=[
        model_factory.create_observation(period="2000", value=12),
        model_factory.create_observation(period="2001", value=None),
        model_factory.create_observation(period="2002", value=11),
    ],
)
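Code that consumes observations therefore has to handle None explicitly. The following sketch shows one way to do so, using plain (period, value) tuples as stand-ins for Observation instances so that it runs without the toolbox; the variable names are illustrative only.

```python
from statistics import mean

# (period, value) pairs standing in for observations; None marks a NA value.
observations = [("2000", 12), ("2001", None), ("2002", 11)]

# Keep only the available values before doing arithmetic on them.
available_values = [value for _, value in observations if value is not None]
# Keep track of the periods for which no value is available.
na_periods = [period for period, value in observations if value is None]

average = mean(available_values)
print(average, na_periods)
```

Testing with "is not None" rather than truthiness matters here: a value of 0 is a legitimate observation and must not be treated as NA.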
Creating model instances¶
There are several ways to create a model class instance, each offering a different trade-off between convenience and performance.
Each model class defines a standard __init__ method which expects already parsed valid values.
In the following example using ProviderMetadata.__init__, the code kwarg must be given a parsed ProviderCode, and the terms_of_use and website kwargs must be given a parsed PublicUrl:
from dbnomics_toolbox.model.identifiers.types import ProviderCode
from dbnomics_toolbox.model.provider_metadata import ProviderMetadata
from dbnomics_toolbox.model.url import PublicUrl
ProviderMetadata(
    code=ProviderCode.parse("INSEE"),
    name="Institut national de la statistique et des études économiques",
    # Cf https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
    region="FR",
    terms_of_use=PublicUrl.parse("https://www.insee.fr/fr/information/2381863"),
    website=PublicUrl.parse("https://www.insee.fr/"),
)
Some model classes define a create classmethod acting as an alternative constructor that offers more relaxed arguments.
In the following example using ProviderMetadata.create, the code, terms_of_use and website kwargs can be given str values that will be parsed by the create classmethod before calling the corresponding __init__ method:
from dbnomics_toolbox.model import ProviderMetadata
ProviderMetadata.create(
    code="INSEE",
    name="Institut national de la statistique et des études économiques",
    # Cf https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
    region="FR",
    terms_of_use="https://www.insee.fr/fr/information/2381863",
    website="https://www.insee.fr/",
)
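The relationship between the strict __init__ and the relaxed create classmethod can be sketched generically as follows. The Code and Provider classes below are hypothetical stand-ins written for this example, and the validation rule is an assumption; they only illustrate the pattern and do not reproduce the toolbox's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    """Hypothetical parsed identifier type (stand-in for something like ProviderCode)."""

    value: str

    @classmethod
    def parse(cls, value: str) -> Code:
        # Assumption for this sketch: codes are non-empty and contain no whitespace.
        if not value or any(char.isspace() for char in value):
            raise ValueError(f"Invalid code: {value!r}")
        return cls(value)


@dataclass(frozen=True)
class Provider:
    """Hypothetical model class: __init__ expects already parsed values."""

    code: Code
    name: str

    @classmethod
    def create(cls, code: str | Code, *, name: str) -> Provider:
        # Relaxed alternative constructor: parse str input, then delegate to __init__.
        parsed_code = Code.parse(code) if isinstance(code, str) else code
        return cls(code=parsed_code, name=name)


strict = Provider(code=Code.parse("INSEE"), name="INSEE")
relaxed = Provider.create("INSEE", name="INSEE")
print(strict == relaxed)
```

Both calls produce equal instances; the only difference is where the parsing happens, and therefore who pays for it when the same value is passed many times.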
The create classmethods make life easier for the caller, but parsing the same values again and again can be inefficient, especially for very frequent values like dimension codes or periods.
That’s why the ModelFactory exists: it offers the same convenience as the create classmethods, but avoids parsing the same value twice by maintaining a cache of already parsed values.
This is particularly useful during the data conversion process, where many datasets are processed sequentially: the same ModelFactory instance is kept for the whole lifetime of the converter.
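The caching idea can be sketched with a small generic memoizing wrapper. This page does not show how ModelFactory is implemented internally, so the CachingParser class and the parse_dimension_code function below are hypothetical and serve only to show why a long-lived cache pays off when a handful of codes recur across many series.

```python
class CachingParser:
    """Hypothetical helper: parse each distinct raw value only once."""

    def __init__(self, parse):
        self._parse = parse
        self._cache = {}
        self.parse_count = 0  # number of real parse calls, for demonstration

    def __call__(self, raw):
        if raw not in self._cache:
            self.parse_count += 1
            self._cache[raw] = self._parse(raw)
        return self._cache[raw]


def parse_dimension_code(raw):
    # Stand-in for a costly validation step (the rule itself is an assumption).
    if not raw.isidentifier():
        raise ValueError(f"Invalid dimension code: {raw!r}")
    return raw


cached_parse = CachingParser(parse_dimension_code)

# The same few codes come up again and again while converting many series.
raw_codes = ["COUNTRY", "SKU"] * 1000
parsed_codes = [cached_parse(raw) for raw in raw_codes]

print(len(parsed_codes), cached_parse.parse_count)
```

Here 2000 values are requested but only 2 are actually parsed, which is the benefit of keeping a single cache alive for the whole conversion.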
The following example uses ModelFactory.create_dimension:
from dbnomics_toolbox.model.factories.model_factory import ModelFactory
model_factory = ModelFactory()
model_factory.create_dimension(
    "COUNTRY",
    label="Country",
    values=[
        model_factory.create_dimension_value("DE", label="Germany"),
        model_factory.create_dimension_value("FR", label="France"),
    ],
)
Some classes, like Dimension, DimensionValue, Attribute and AttributeValue, deliberately do not define a create classmethod, to avoid encouraging users to write inefficient code in which the same codes would be parsed over and over.
Note: the ModelFactory is used internally by the _create_* methods of the BaseDatasetConverter class.