# Data model

This section presents the data model of DBnomics.

The data model is implemented with classes defined in [`dbnomics_toolbox.model`](dbnomics_toolbox.model). Those classes represent data in memory and have no methods to read or write data. For that, see the [data storage](data-storage.md) page.

## Overview

![DBnomics data model overview](_static/data-model-overview.drawio.svg)

## Provider metadata

The [`ProviderMetadata`](ProviderMetadata) class represents metadata about a provider.

Example:

```python
from dbnomics_toolbox.model import ProviderMetadata

ProviderMetadata.create(
    code="INSEE",
    name="Institut national de la statistique et des études économiques",
    # Cf https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
    region="FR",
    terms_of_use="https://www.insee.fr/fr/information/2381863",
    website="https://www.insee.fr/",
)
```

On the DBnomics website, this metadata is displayed on each provider page, in this case .

See also: [`BaseStorage.load_provider_metadata`](BaseStorage.load_provider_metadata) and [`BaseStorage.save_provider_metadata`](BaseStorage.save_provider_metadata).

## Category tree

The [`CategoryTree`](CategoryTree) class represents the category tree of datasets.

The tree has no root node: it starts with a list of nodes. Each node is either a reference to a dataset, using the [`DatasetReference`](DatasetReference) class, or a category, using the [`Category`](Category) class.

Example:

```python
from dbnomics_toolbox.model import Category, CategoryTree, DatasetReference

CategoryTree(
    children=[
        Category.create(
            children=[
                DatasetReference.create(
                    "POP",
                    name="Population",
                )
            ],
            code="KEI",
            name="Key economic indicators",
        )
    ]
)
```

See also: [`BaseStorage.load_category_tree`](BaseStorage.load_category_tree) and [`BaseStorage.save_category_tree`](BaseStorage.save_category_tree).

## Dataset metadata

The [`DatasetMetadata`](DatasetMetadata) class represents metadata about a dataset.
The [sample dataset](concepts.md#multi-dimensional-datasets) detailed in the concepts page can be represented like this:

```python
from dbnomics_toolbox.model import DatasetMetadata
from dbnomics_toolbox.model.factories.model_factory import ModelFactory

model_factory = ModelFactory()

dataset_metadata = DatasetMetadata.create(
    "PRODUCT_PRICES",
    dimensions=[
        model_factory.create_dimension(
            "SKU",
            label="Stock keeping unit",
            values=[
                model_factory.create_dimension_value("111", label="Computer"),
                model_factory.create_dimension_value("222", label="Smartphone"),
                model_factory.create_dimension_value("333", label="Television"),
            ],
        ),
        model_factory.create_dimension(
            "COUNTRY",
            label="Country",
            values=[
                model_factory.create_dimension_value("DE", label="Germany"),
                model_factory.create_dimension_value("FR", label="France"),
            ],
        ),
    ],
    name="Product prices",
)
```

Representing the actual time series is detailed in the next section.

See also: [`BaseStorage.load_dataset_metadata`](BaseStorage.load_dataset_metadata) and [`BaseStorage.save_dataset_metadata`](BaseStorage.save_dataset_metadata).

## Series

The [`Series`](Series) and [`Observation`](Observation) classes represent series metadata and observations.

See also: [`BaseStorage.load_series`](BaseStorage.load_series), [`BaseStorage.iter_dataset_series`](BaseStorage.iter_dataset_series) and [`BaseStorage.save_series`](BaseStorage.save_series).
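As a rough illustration of how a series code can be derived from dimension values, here is a minimal sketch in plain Python. It is not the `ModelFactory.create_series` implementation: the `build_series_code` helper is hypothetical, and both the `.` separator and the rule of following the dataset dimension order are assumptions inferred from the `111.FR` example of the next subsection.

```python
from collections.abc import Mapping, Sequence


def build_series_code(
    dataset_dimension_codes: Sequence[str],
    series_dimensions: Mapping[str, str],
    separator: str = ".",  # assumption: inferred from the `111.FR` example
) -> str:
    """Hypothetical helper: join dimension value codes in dataset dimension order."""
    missing = [code for code in dataset_dimension_codes if code not in series_dimensions]
    if missing:
        raise ValueError(f"Missing dimension values for: {missing}")
    # The dataset dimension order wins over the order of the series dict keys.
    return separator.join(series_dimensions[code] for code in dataset_dimension_codes)


# Dimension order of the PRODUCT_PRICES sample dataset: SKU first, then COUNTRY.
code = build_series_code(["SKU", "COUNTRY"], {"COUNTRY": "FR", "SKU": "111"})
print(code)  # 111.FR
```

Relying on the dataset dimension order, rather than on the order of the `dimensions` dict, keeps codes stable whatever order the caller writes the dimensions in.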
### Auto-generated series code

To represent one of the series of the [sample dataset](concepts.md#multi-dimensional-datasets) detailed in the concepts page (the `dataset_metadata` variable comes from the dataset metadata example above):

```python
from dbnomics_toolbox.model.factories.model_factory import ModelFactory

model_factory = ModelFactory()

series = model_factory.create_series(
    dataset_dimensions=dataset_metadata.dimensions,
    dimensions={
        "COUNTRY": "FR",
        "SKU": "111",
    },
    observations=[
        model_factory.create_observation(period="2000", value=12),
        model_factory.create_observation(period="2001", value=13),
        model_factory.create_observation(period="2002", value=11),
    ],
)
print(series)
```

Output:

```python
Series(
    code='111.FR',
    dimensions={'COUNTRY': 'FR', 'SKU': '111'},
    observations=[
        Observation(period=YearPeriod(year_num=2000), value=12, attributes={}),
        Observation(period=YearPeriod(year_num=2001), value=13, attributes={}),
        Observation(period=YearPeriod(year_num=2002), value=11, attributes={}),
    ],
)
```

Note that the series code `111.FR` was auto-generated because no code was given to the `create_series` method. The order of the dataset dimensions (`SKU`, then `COUNTRY`) determined the order of the dimension values in the code.

### Arbitrary series code

An arbitrary series code can also be given:

```python
from dbnomics_toolbox.model.factories.model_factory import ModelFactory

model_factory = ModelFactory()

series = model_factory.create_series(
    code="foo",
    dimensions={
        "COUNTRY": "FR",
        "SKU": "111",
    },
)
print(series)
```

Output:

```python
Series(
    code='foo',
    dimensions={'COUNTRY': 'FR', 'SKU': '111'},
    observations=[],
)
```

### Periods

There are different period types, each one represented by a model class. Those classes can parse `str` periods, and their instances can be serialized back to `str`.
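To illustrate the parse/serialize round trip, here is a minimal sketch for the month format `{YYYY}-{MM}`. The `SketchMonthPeriod` class is hypothetical and deliberately simplified; it is not the toolbox's [`MonthPeriod`](MonthPeriod), whose actual parsing API may differ.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified stand-in for a period model class.
_MONTH_RE = re.compile(r"([0-9]{4})-([0-9]{2})")


@dataclass(frozen=True)
class SketchMonthPeriod:
    year_num: int
    month_num: int

    @classmethod
    def parse(cls, value: str) -> "SketchMonthPeriod":
        """Parse a `{YYYY}-{MM}` string such as "2025-01"."""
        match = _MONTH_RE.fullmatch(value)
        if match is None:
            raise ValueError(f"Invalid month period: {value!r}")
        year_num, month_num = int(match.group(1)), int(match.group(2))
        if not 1 <= month_num <= 12:
            raise ValueError(f"Month out of range: {value!r}")
        return cls(year_num=year_num, month_num=month_num)

    def __str__(self) -> str:
        # Serialize back to the same `{YYYY}-{MM}` text representation.
        return f"{self.year_num:04d}-{self.month_num:02d}"


period = SketchMonthPeriod.parse("2025-01")
print(period)  # 2025-01
print(period.year_num, period.month_num)  # 2025 1
```

The same pattern (one strict pattern per format, plus a `__str__` that emits that format) would extend to the other text representations listed in the table of period types.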
| Period type | Text representation | Example | Model class |
| ----------- | ------------------- | ------- | ----------- |
| year | `{YYYY}` | `2025` | [`YearPeriod`](YearPeriod) |
| [semester](https://en.wikipedia.org/wiki/Academic_term) (6 months) | `{YYYY}-S{S:1-2}` | `2025-S1` | [`SemesterPeriod`](SemesterPeriod) |
| [quarter](https://en.wikipedia.org/wiki/Academic_quarter_(year_division)) | `{YYYY}-Q{Q:1-4}` | `2025-Q1` | [`QuarterPeriod`](QuarterPeriod) |
| [bimester](https://en.wikipedia.org/wiki/Academic_term) (2 months) | `{YYYY}-B{N:1-6}` | `2025-B1` | [`BimesterPeriod`](BimesterPeriod) |
| month | `{YYYY}-{MM}` | `2025-01` | [`MonthPeriod`](MonthPeriod) |
| week | `{YYYY}-W{WW}` | `2025-W01` | [`WeekPeriod`](WeekPeriod) |
| day | `{YYYY}-{MM}-{DD}` | `2025-01-01` | [`DayPeriod`](DayPeriod) |

### NA values

To represent an NA (not available) value, create an [`Observation`](Observation) with a value of `None`:

```python
from dbnomics_toolbox.model.factories.model_factory import ModelFactory

model_factory = ModelFactory()

series = model_factory.create_series(
    code="foo",
    observations=[
        model_factory.create_observation(period="2000", value=12),
        model_factory.create_observation(period="2001", value=None),
        model_factory.create_observation(period="2002", value=11),
    ],
)
```

## Creating model instances

There are several ways to create a model class instance, each offering a different trade-off between convenience and performance.

Each model class defines a standard `__init__` method that expects already parsed, valid values.
In the following example using [`ProviderMetadata.__init__`](ProviderMetadata.__init__), the `code` kwarg must be given a parsed [`ProviderCode`](ProviderCode), and the `terms_of_use` and `website` kwargs must be given a parsed [`PublicUrl`](PublicUrl):

```python
from dbnomics_toolbox.model.identifiers.types import ProviderCode
from dbnomics_toolbox.model.provider_metadata import ProviderMetadata
from dbnomics_toolbox.model.url import PublicUrl

ProviderMetadata(
    code=ProviderCode.parse("INSEE"),
    name="Institut national de la statistique et des études économiques",
    # Cf https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
    region="FR",
    terms_of_use=PublicUrl.parse("https://www.insee.fr/fr/information/2381863"),
    website=PublicUrl.parse("https://www.insee.fr/"),
)
```

Some model classes define a `create` classmethod, an alternative constructor that accepts more relaxed arguments.

In the following example using [`ProviderMetadata.create`](ProviderMetadata.create), the `code`, `terms_of_use` and `website` kwargs can be given `str` values, which the `create` classmethod parses before calling the corresponding `__init__` method:

```python
from dbnomics_toolbox.model import ProviderMetadata

ProviderMetadata.create(
    code="INSEE",
    name="Institut national de la statistique et des études économiques",
    # Cf https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
    region="FR",
    terms_of_use="https://www.insee.fr/fr/information/2381863",
    website="https://www.insee.fr/",
)
```

The `create` classmethods make life easier for the caller, but parsing the same values again and again can be inefficient, especially for very frequent values like dimension codes or periods.

That's why the [`ModelFactory`](ModelFactory) exists: it offers the same convenience as the `create` classmethods, but avoids parsing the same value twice by maintaining a cache of already parsed values.
This is particularly useful during the data conversion process, where many datasets are processed sequentially: the same [`ModelFactory`](ModelFactory) instance is kept for the whole lifetime of the converter.

The following example uses [`ModelFactory.create_dimension`](ModelFactory.create_dimension):

```python
from dbnomics_toolbox.model.factories.model_factory import ModelFactory

model_factory = ModelFactory()

model_factory.create_dimension(
    "COUNTRY",
    label="Country",
    values=[
        model_factory.create_dimension_value("DE", label="Germany"),
        model_factory.create_dimension_value("FR", label="France"),
    ],
)
```

Some classes, like [`Dimension`](Dimension), [`DimensionValue`](DimensionValue), [`Attribute`](Attribute) and [`AttributeValue`](AttributeValue), do not define a `create` classmethod at all, to avoid encouraging inefficient code in which the same codes would be parsed over and over.

Note: the `ModelFactory` is used internally by the `_create_*` methods of the [`BaseDatasetConverter`](BaseDatasetConverter) class.
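The caching idea behind `ModelFactory` can be sketched in plain Python. The `CachingFactory` and `DimensionCode` classes below are hypothetical stand-ins, not the toolbox's API: the real `ModelFactory` may organize its caches differently. The sketch only shows how a per-factory cache ensures each distinct string is parsed once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionCode:
    """Hypothetical parsed identifier, standing in for a real model type."""

    value: str

    @classmethod
    def parse(cls, value: str) -> "DimensionCode":
        if not value.strip():
            raise ValueError("A dimension code must not be empty")
        return cls(value)


class CachingFactory:
    """Hypothetical sketch of a factory that parses each distinct string once."""

    def __init__(self) -> None:
        self._dimension_codes: dict[str, DimensionCode] = {}
        self.parse_count = 0  # counts real parses, for illustration only

    def dimension_code(self, value: str) -> DimensionCode:
        cached = self._dimension_codes.get(value)
        if cached is not None:
            return cached  # cache hit: no parsing at all
        self.parse_count += 1
        parsed = DimensionCode.parse(value)
        self._dimension_codes[value] = parsed
        return parsed


factory = CachingFactory()
codes = [factory.dimension_code(c) for c in ["COUNTRY", "SKU", "COUNTRY", "COUNTRY"]]
print(factory.parse_count)  # 2
print(codes[0] is codes[2])  # True
```

In this sketch, keeping a single factory instance alive, as a converter does, maximizes cache hits; the trade-off is that the cache grows with the number of distinct values seen.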