Getting started
Evaluate feasibility
Before writing a new fetcher, please check that it does not already exist by looking at the providers page on DBnomics or the dbnomics-fetchers GitLab group.
Also verify that the provider's legal terms of use don’t forbid data redistribution. In case of doubt, try asking on the forum.
Then look at the source data and try to answer key questions:
Is the data distributed through an HTTP API, static files, etc.? What are the URLs?
What are the formats? (CSV, XML, JSON, XLSX, custom, etc.)
Does the source data fit the multi-dimensional dataset model?
How to get the list of datasets? By an HTTP API call, or by scraping the website? Or using a hard-coded list?
Is there a category tree to organize datasets?
How do we know when datasets are updated? Is a calendar of updates available? Or does each dataset define a custom attribute, for example “updated_at”?
How to get data and metadata for each dataset?
If everything seems OK, start a new fetcher by following the next sections.
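As a small aid for the format question above, here is a minimal sketch that guesses the format of a source file from its URL. The function name `guess_source_format` and the `FORMAT_BY_SUFFIX` mapping are hypothetical and only serve as an illustration; they are not part of DBnomics. The sketch only looks at the file extension, so an HTTP API endpoint without an extension is reported as "unknown" and still needs a manual check.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping, for illustration only; extend it to match the provider's files.
FORMAT_BY_SUFFIX = {
    ".csv": "CSV",
    ".json": "JSON",
    ".xml": "XML",
    ".xlsx": "XLSX",
}


def guess_source_format(url: str) -> str:
    """Guess the data format from the file extension found in the URL path."""
    # Only the path is inspected, so query strings such as "?lang=en" are ignored.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return FORMAT_BY_SUFFIX.get(suffix, "unknown")
```

For example, `guess_source_format("https://abc-provider.com/data/gdp.csv?lang=en")` returns `"CSV"`, where the URL is a made-up one for the hypothetical ABC provider.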
Requirements
This documentation uses uv, a popular Python project manager that makes the developer's life easier than the standard pip tool does.
Please install it beforehand to follow the instructions, or use pip and adapt the commands accordingly.
Initialize a new project
All the fetchers are more or less based on the same boilerplate code. To avoid laborious initial copy-paste operations, DBnomics provides a fetcher template to create a new Python project and customize it by answering some questions.
For the sake of the example, we will call this one “ABC fetcher”, for a hypothetical institution named ABC, and name the directory of the source code of that fetcher after the provider code: abc-fetcher.
Here are some real examples: oecd-fetcher for OECD, insee-fetcher for INSEE, etc.
First, create a working directory or, if you already have one, go to it:
mkdir dbnomics-fetchers
cd dbnomics-fetchers
Run the fetcher template with the copier tool and answer the questions:
$ uvx copier copy https://git.nomics.world/dbnomics/dbnomics-fetcher-template.git abc-fetcher
No git tags found in template; using HEAD as ref
🎤 What is the name of the provider? (e.g. "World Bank")
Authority for Banking and Credit
🎤 What is the code of the provider? (e.g. "WB" for World Bank)
ABC
🎤 What is the slug of the provider? (e.g. "wb" for World Bank)
abc
🎤 What is the URL of the provider? (e.g. "https://www.worldbank.org/" for World Bank)
https://abc-provider.com
🎤 What is the maximum line length for Python source code?
120
🎤 What is the Python version used?
3.13
🎤 What is the tag of the Python container image?
3.13-slim-bookworm
Copying from template version 0.0.0.post179.dev0+5e22b6f
create .copier-answers.yml
create tests
create tests/test_placeholder.py
create tests/__init__.py
create src
create src/abc_fetcher
create src/abc_fetcher/source_data_repo.py
create src/abc_fetcher/py.typed
create src/abc_fetcher/downloader.py
create src/abc_fetcher/converter.py
create src/abc_fetcher/constants.py
create src/abc_fetcher/__init__.py
create pyproject.toml
create download.py
create convert.py
create README.md
create LICENSE
create Dockerfile
create .vscode
create .vscode/settings.json
create .vscode/extensions.json
create .python-version
create .pre-commit-config.yaml
create .gitlab-ci.yml
create .gitignore
create .editorconfig
create .dockerignore
Note: the provider slug is a lower-case version of the provider code that is compatible with a URL and a directory name.
For example: if the provider code is INSEE, then the slug is insee, so that we name the directory of the source code of the fetcher insee-fetcher.
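The naming convention described in this note can be sketched in a few lines of Python. The helper names below are hypothetical, and the sketch assumes that the provider code only contains characters that are already safe in a URL and a directory name, so that lower-casing is enough. The package name follows the src/abc_fetcher directory created by the template; replacing hyphens with underscores is an extra assumption for slugs that contain a hyphen.

```python
def provider_slug(provider_code: str) -> str:
    """Lower-case the provider code (assumption: the code is already URL-safe)."""
    return provider_code.lower()


def fetcher_dir_name(provider_code: str) -> str:
    """Name of the source code directory of the fetcher, e.g. insee-fetcher."""
    return f"{provider_slug(provider_code)}-fetcher"


def fetcher_package_name(provider_code: str) -> str:
    """Name of the Python package under src/, e.g. abc_fetcher."""
    # Assumption: hyphens are replaced because they are not valid in package names.
    return f"{provider_slug(provider_code).replace('-', '_')}_fetcher"


print(fetcher_dir_name("INSEE"))  # insee-fetcher
print(fetcher_package_name("ABC"))  # abc_fetcher
```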
Install the dependencies:
cd abc-fetcher
uv sync
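Before going further, it may be worth checking that the template produced the files you will work with. The following is a minimal sketch, not part of the template: the `missing_paths` helper is hypothetical, and the list of paths is a subset of the copier output shown above for the ABC example, so adapt it to your own provider slug.

```python
from pathlib import Path

# Subset of the paths listed in the copier output, relative to the project root.
EXPECTED_PATHS = [
    "pyproject.toml",
    "download.py",
    "convert.py",
    "src/abc_fetcher/downloader.py",
    "src/abc_fetcher/converter.py",
]


def missing_paths(project_dir: Path) -> list[str]:
    """Return the expected paths that do not exist under project_dir."""
    return [path for path in EXPECTED_PATHS if not (project_dir / path).exists()]
```

Calling `missing_paths(Path("."))` from the abc-fetcher directory should return an empty list if the project was generated as shown above.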
Create a first commit with git:
git init
git add -A
git commit -m "Add initial files"
The next step is to implement data download, then data conversion.