Concepts
Multi-dimensional datasets
In the context of DBnomics, a multi-dimensional dataset is a group of time series where each one is categorized using dimensions.
Let’s start from the following hypothetical CSV file named product_prices.csv that tracks the evolution of the price of different products in different countries:
| sku | country | year | price |
|---|---|---|---|
| 111 | FR | 2000 | 12 |
| 111 | FR | 2001 | 13 |
| 111 | FR | 2002 | 11 |
| 111 | DE | 2001 | 9 |
| 111 | DE | 2002 | 11 |
| 111 | DE | 2003 | 14 |
| 222 | FR | 2000 | 87 |
| 222 | FR | 2001 | 88 |
| 222 | FR | 2002 | 90 |
| 222 | FR | 2003 | 79 |
| 333 | FR | 2000 | 23 |
| 333 | FR | 2001 | 22 |
| 333 | FR | 2002 | 23 |
| 333 | FR | 2003 | 21 |
This CSV file can be turned into a multi-dimensional dataset with the code PRODUCT_PRICES.
We can infer 2 dimensions: SKU={111,222,333} and COUNTRY={DE,FR}.
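As an illustration, the dimensions can be inferred by collecting the distinct values of each categorical column. The following is a minimal sketch, not DBnomics code: the `infer_dimensions` helper and the inline `CSV_TEXT` constant are hypothetical names introduced for this example, and it assumes the file uses commas as delimiters.

```python
import csv
import io

# Content of the hypothetical product_prices.csv (comma-delimited: an assumption).
CSV_TEXT = """\
sku,country,year,price
111,FR,2000,12
111,FR,2001,13
111,FR,2002,11
111,DE,2001,9
111,DE,2002,11
111,DE,2003,14
222,FR,2000,87
222,FR,2001,88
222,FR,2002,90
222,FR,2003,79
333,FR,2000,23
333,FR,2001,22
333,FR,2002,23
333,FR,2003,21
"""


def infer_dimensions(csv_text, dimension_columns):
    """Return the sorted distinct values found in each dimension column."""
    values = {column: set() for column in dimension_columns}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in dimension_columns:
            values[column].add(row[column])
    # Dimension codes are upper-cased here to match SKU and COUNTRY in the text.
    return {column.upper(): sorted(found) for column, found in values.items()}


dimensions = infer_dimensions(CSV_TEXT, ["sku", "country"])
print(dimensions)
# {'SKU': ['111', '222', '333'], 'COUNTRY': ['DE', 'FR']}
```

The `year` and `price` columns are left out on purpose: they hold the period and the value of each observation, not a category.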
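The dataset is composed of 4 time series, one for each combination of product and country that appears in the file, because a series sets each dimension to a single value. Only 4 of the 6 possible combinations occur, since products 222 and 333 have no prices for DE: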
Series 111.FR:
| period | value |
|---|---|
| 2000 | 12 |
| 2001 | 13 |
| 2002 | 11 |
Series 111.DE:
| period | value |
|---|---|
| 2001 | 9 |
| 2002 | 11 |
| 2003 | 14 |
Series 222.FR:
| period | value |
|---|---|
| 2000 | 87 |
| 2001 | 88 |
| 2002 | 90 |
| 2003 | 79 |
Series 333.FR:
| period | value |
|---|---|
| 2000 | 23 |
| 2001 | 22 |
| 2002 | 23 |
| 2003 | 21 |
Note: because the dimensions of a dataset are ordered, we can infer the series codes by concatenating the codes of the values of the dimensions, separated by a . character.
For example, the series for SKU=111 and COUNTRY=FR has the code 111.FR.
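The split into series and the construction of series codes can be sketched as follows. This is only an illustration under stated assumptions, not the way DBnomics itself processes data: `build_series`, `DIMENSION_ORDER` and `CSV_TEXT` are hypothetical names, the file is assumed to be comma-delimited, and periods are kept as strings.

```python
import csv
import io
from collections import defaultdict

# Content of the hypothetical product_prices.csv (comma-delimited: an assumption).
CSV_TEXT = """\
sku,country,year,price
111,FR,2000,12
111,FR,2001,13
111,FR,2002,11
111,DE,2001,9
111,DE,2002,11
111,DE,2003,14
222,FR,2000,87
222,FR,2001,88
222,FR,2002,90
222,FR,2003,79
333,FR,2000,23
333,FR,2001,22
333,FR,2002,23
333,FR,2003,21
"""

# Dimensions are ordered, so the series code is always built as SKU then COUNTRY.
DIMENSION_ORDER = ["sku", "country"]


def build_series(csv_text, dimension_order):
    """Group CSV rows into time series keyed by their series code."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Concatenate the dimension value codes, separated by a "." character.
        code = ".".join(row[dimension] for dimension in dimension_order)
        # Each observation is a (period, value) pair.
        series[code].append((row["year"], float(row["price"])))
    return dict(series)


series = build_series(CSV_TEXT, DIMENSION_ORDER)
for code, observations in series.items():
    print(code, observations)
```

Reversing `DIMENSION_ORDER` would produce codes such as `FR.111` instead of `111.FR`, which is why the order of the dimensions has to be fixed for a dataset.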