Concepts

Multi-dimensional datasets

In the context of DBnomics, a multi-dimensional dataset is a group of time series where each one is categorized using dimensions.

Let’s start from the following hypothetical CSV file named product_prices.csv that tracks the evolution of the price of different products in different countries:

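| sku | country | year | price |
|-----|---------|------|-------|
| 111 | FR      | 2000 | 12    |
| 111 | FR      | 2001 | 13    |
| 111 | FR      | 2002 | 11    |
| 111 | DE      | 2001 | 9     |
| 111 | DE      | 2002 | 11    |
| 111 | DE      | 2003 | 14    |
| 222 | FR      | 2000 | 87    |
| 222 | FR      | 2001 | 88    |
| 222 | FR      | 2002 | 90    |
| 222 | FR      | 2003 | 79    |
| 333 | FR      | 2000 | 23    |
| 333 | FR      | 2001 | 22    |
| 333 | FR      | 2002 | 23    |
| 333 | FR      | 2003 | 21    |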

This CSV file can be turned into a multi-dimensional dataset with the code PRODUCT_PRICES.

We can infer 2 dimensions: SKU={111,222,333} and COUNTRY={DE,FR}.
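As a minimal sketch of how these dimensions could be inferred, the following Python snippet collects the distinct values found in the sku and country columns. The inline CSV text mirrors product_prices.csv above. The helper name infer_dimensions and the choice of which columns are treated as dimensions are assumptions made for illustration; they are not part of DBnomics.

```python
import csv
import io

# Inline copy of the hypothetical product_prices.csv shown above.
CSV_TEXT = """\
sku,country,year,price
111,FR,2000,12
111,FR,2001,13
111,FR,2002,11
111,DE,2001,9
111,DE,2002,11
111,DE,2003,14
222,FR,2000,87
222,FR,2001,88
222,FR,2002,90
222,FR,2003,79
333,FR,2000,23
333,FR,2001,22
333,FR,2002,23
333,FR,2003,21
"""


def infer_dimensions(csv_text, dimension_columns):
    """Return {DIMENSION_CODE: sorted distinct values} for the given columns.

    Hypothetical helper: the caller decides which columns are dimensions.
    """
    values = {column: set() for column in dimension_columns}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in dimension_columns:
            values[column].add(row[column])
    # Dimension codes are upper-cased to match the SKU / COUNTRY notation.
    return {column.upper(): sorted(found) for column, found in values.items()}


dimensions = infer_dimensions(CSV_TEXT, ["sku", "country"])
print(dimensions)
# {'SKU': ['111', '222', '333'], 'COUNTRY': ['DE', 'FR']}
```

The year and price columns are left out on purpose: in this example they hold the period and the observed value of each series rather than a category.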

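The dataset is composed of 4 time series, one for each combination of product and country that actually appears in the data. A series fixes exactly one value for each dimension, so 111.FR and 111.DE are distinct series, while combinations with no rows (such as 222.DE) produce no series: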

Series 111.FR:

| period | value |
|--------|-------|
| 2000   | 12    |
| 2001   | 13    |
| 2002   | 11    |

Series 111.DE:

| period | value |
|--------|-------|
| 2001   | 9     |
| 2002   | 11    |
| 2003   | 14    |

Series 222.FR:

| period | value |
|--------|-------|
| 2000   | 87    |
| 2001   | 88    |
| 2002   | 90    |
| 2003   | 79    |

Series 333.FR:

| period | value |
|--------|-------|
| 2000   | 23    |
| 2001   | 22    |
| 2002   | 23    |
| 2003   | 21    |

Note: because the dimensions of a dataset are ordered, we can infer the series codes by concatenating the codes of the values of the dimensions, separated by a . character. For example, the series for SKU=111 and COUNTRY=FR has the code 111.FR.
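The splitting of the CSV rows into series, and the construction of series codes from the ordered dimensions, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name split_into_series and the DIMENSION_ORDER list are hypothetical, and the dimension order (SKU first, then COUNTRY) is taken from the 111.FR example above.

```python
import csv
import io
from collections import defaultdict

# Inline copy of the hypothetical product_prices.csv shown above.
CSV_TEXT = """\
sku,country,year,price
111,FR,2000,12
111,FR,2001,13
111,FR,2002,11
111,DE,2001,9
111,DE,2002,11
111,DE,2003,14
222,FR,2000,87
222,FR,2001,88
222,FR,2002,90
222,FR,2003,79
333,FR,2000,23
333,FR,2001,22
333,FR,2002,23
333,FR,2003,21
"""

# Assumed dimension order: SKU first, then COUNTRY.
DIMENSION_ORDER = ["sku", "country"]


def split_into_series(csv_text, dimension_order):
    """Group rows into series keyed by a code such as '111.FR'.

    Each series is a list of (period, value) pairs in file order.
    """
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Series code: dimension value codes joined by "." in dimension order.
        code = ".".join(row[dimension] for dimension in dimension_order)
        series[code].append((row["year"], int(row["price"])))
    return dict(series)


series = split_into_series(CSV_TEXT, DIMENSION_ORDER)
for code, observations in series.items():
    print(code, observations)
```

Because the code is built from the dimension order, swapping the two entries of DIMENSION_ORDER would yield codes such as FR.111 instead, which is why the order of the dimensions has to be fixed for a dataset.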