Getting started

Evaluate feasibility

Before writing a new fetcher, please check that one does not already exist by looking at the providers page on DBnomics or the dbnomics-fetchers GitLab group.

Also verify that the provider's legal terms of use don’t forbid data redistribution. If in doubt, ask on the forum.

Then look at the source data and try to answer key questions:

Is data distributed with an HTTP API, static files, etc.? What are the URLs?
What are the formats? (CSV, XML, JSON, XLSX, custom, etc.)
Does the source data fit the multi-dimensional dataset model?
How to get the list of datasets? By an HTTP API call, or by scraping the website? Or using a hard-coded list?
Is there a category tree to organize datasets?
How do we know when datasets are updated? Is a calendar of updates available? Or does each dataset define a custom attribute, for example “updated_at”?
How to get data and metadata for each dataset?
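
The question about update detection can be made concrete with a small sketch. It assumes that the provider exposes, for each dataset, an "updated_at" attribute formatted as an ISO 8601 string (the attribute name comes from the example above; the "code" key, the function name and the sample values are hypothetical and only serve the illustration). Under those assumptions, a fetcher could select the datasets that changed since its previous run roughly like this:

```python
from datetime import datetime


def select_updated_datasets(datasets, last_run):
    """Return the codes of datasets whose "updated_at" is later than last_run.

    `datasets` is a list of dicts with hypothetical "code" and "updated_at" keys.
    Datasets without an "updated_at" value are always selected, because
    nothing proves that they are unchanged.
    """
    updated = []
    for dataset in datasets:
        raw = dataset.get("updated_at")
        if raw is None or datetime.fromisoformat(raw) > last_run:
            updated.append(dataset["code"])
    return updated


# Made-up sample metadata, for illustration only.
sample_datasets = [
    {"code": "DS1", "updated_at": "2024-01-10T08:00:00"},
    {"code": "DS2", "updated_at": "2024-03-02T12:30:00"},
    {"code": "DS3"},
]
print(select_updated_datasets(sample_datasets, datetime(2024, 2, 1)))
```

If the provider publishes a calendar of updates instead, the same comparison would apply to the calendar dates rather than to a per-dataset attribute.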

If everything seems OK, start a new fetcher by following the next sections.

Requirements

This documentation uses uv, a popular Python project manager that makes the developer's life easier than the standard pip tool does.

Please install it beforehand to follow the instructions, or use pip and adapt the commands accordingly.

Initialize a new project

All the fetchers are more or less based on the same boilerplate code. To avoid laborious initial copy-paste operations, DBnomics provides a fetcher template to create a new Python project and customize it by answering some questions.

For the sake of the example, we will call this one “ABC fetcher”, for a hypothetical institution named ABC, and name the source code directory of that fetcher abc-fetcher, according to the provider code. Here are some real examples: oecd-fetcher for OECD, insee-fetcher for INSEE, etc.

First, create a working directory or, if you already have one, go to it:

mkdir dbnomics-fetchers
cd dbnomics-fetchers

Run the fetcher template with the copier tool and answer the questions:

$ uvx copier copy https://git.nomics.world/dbnomics/dbnomics-fetcher-template.git abc-fetcher

No git tags found in template; using HEAD as ref
🎤 What is the name of the provider? (e.g. "World Bank")
   Authority for Banking and Credit
🎤 What is the code of the provider? (e.g. "WB" for World Bank)
   ABC
🎤 What is the slug of the provider? (e.g. "wb" for World Bank)
   abc
🎤 What is the URL of the provider? (e.g. "https://www.worldbank.org/" for World Bank)
   https://abc-provider.com
🎤 What is the maximum line length for Python source code?
   120
🎤 What is the Python version used?
   3.13
🎤 What is the tag of the Python container image?
   3.13-slim-bookworm

Copying from template version 0.0.0.post179.dev0+5e22b6f
    create  .copier-answers.yml
    create  tests
    create  tests/test_placeholder.py
    create  tests/__init__.py
    create  src
    create  src/abc_fetcher
    create  src/abc_fetcher/source_data_repo.py
    create  src/abc_fetcher/py.typed
    create  src/abc_fetcher/downloader.py
    create  src/abc_fetcher/converter.py
    create  src/abc_fetcher/constants.py
    create  src/abc_fetcher/__init__.py
    create  pyproject.toml
    create  download.py
    create  convert.py
    create  README.md
    create  LICENSE
    create  Dockerfile
    create  .vscode
    create  .vscode/settings.json
    create  .vscode/extensions.json
    create  .python-version
    create  .pre-commit-config.yaml
    create  .gitlab-ci.yml
    create  .gitignore
    create  .editorconfig
    create  .dockerignore

Note: the provider slug is a lower-case version of the provider code that is compatible with a URL and a directory name. For example, if the provider code is INSEE, then the slug is insee, and the source code directory of the fetcher is named insee-fetcher.
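
The naming convention described in the note can be sketched in a few lines of Python. The helper names below are hypothetical and are not part of the fetcher template. The sketch assumes that the provider code only contains characters that are already safe in a URL and a directory name once lower-cased, as in the INSEE and ABC examples.

```python
def provider_slug(provider_code: str) -> str:
    """Lower-case the provider code (assumes it is already URL-safe)."""
    return provider_code.strip().lower()


def fetcher_dir_name(provider_code: str) -> str:
    """Build the fetcher directory name from the provider code."""
    return f"{provider_slug(provider_code)}-fetcher"


print(fetcher_dir_name("INSEE"))  # insee-fetcher
print(fetcher_dir_name("ABC"))  # abc-fetcher
```

A provider code containing spaces or punctuation would need extra normalization, which this sketch deliberately leaves out.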

Install the dependencies:

cd abc-fetcher
uv sync

Create a first commit with git:

git init
git add -A
git commit -m "Add initial files"

The next step is to implement data download, then data conversion.
