# Getting started

## Evaluate feasibility

Before writing a new fetcher, please check that it does not already exist by looking at the [providers](https://db.nomics.world/providers) page on DBnomics or the [dbnomics-fetchers](https://git.nomics.world/dbnomics-fetchers/) GitLab group.

Also verify that the provider's legal terms of use don't forbid data redistribution. If in doubt, try asking on the [forum](https://forum.db.nomics.world/).

Then look at the source data and try to answer these key questions:

- Is data distributed with an HTTP API, static files, etc.? What are the URLs?
- What are the formats? (CSV, XML, JSON, XLSX, custom, etc.)
- Does the source data fit the [multi-dimensional datasets](/concepts.md#multi-dimensional%20datasets) model?
- How to get the list of datasets? By an HTTP API call, or by scraping the website? Or using a hard-coded list?
- Is there a category tree to organize datasets?
- How do we know when datasets are updated? Is a calendar of updates available? Or does each dataset define a custom attribute, for example "updated_at"?
- How to get data and metadata for each dataset?

If everything seems OK, start a new fetcher by following the next sections.

## Requirements

This documentation makes use of [`uv`](https://docs.astral.sh/uv/), a popular Python project manager that makes developers' lives easier (compared to the standard `pip` tool). Please install it beforehand to follow the instructions, or use `pip` and adapt the commands.

## Initialize a new project

All the fetchers are more or less based on the same boilerplate code. To avoid laborious initial copy-paste operations, DBnomics provides a [fetcher template](https://git.nomics.world/dbnomics/dbnomics-fetcher-template) to create a new Python project and customize it by answering some questions.


For the sake of the example, we will call this one "ABC fetcher" for a hypothetical institution named ABC, and name the source code directory of that fetcher after the provider code: `abc-fetcher`. Here are some real examples: [`oecd-fetcher`](https://git.nomics.world/dbnomics-fetchers/oecd-fetcher) for OECD, [`insee-fetcher`](https://git.nomics.world/dbnomics-fetchers/insee-fetcher) for INSEE, etc.

First, create a working directory or, if you already have one, go to it:

```bash
mkdir dbnomics-fetchers
cd dbnomics-fetchers
```

Run the fetcher template with the [`copier`](https://copier.readthedocs.io/en/latest/) tool and answer the questions:

```bash
$ uvx copier copy https://git.nomics.world/dbnomics/dbnomics-fetcher-template.git abc-fetcher
No git tags found in template; using HEAD as ref
🎤 What is the name of the provider? (e.g. "World Bank")
   Authority for Banking and Credit
🎤 What is the code of the provider? (e.g. "WB" for World Bank)
   ABC
🎤 What is the slug of the provider? (e.g. "wb" for World Bank)
   abc
🎤 What is the URL of the provider? (e.g. "https://www.worldbank.org/" for World Bank)
   https://abc-provider.com
🎤 What is the maximum line length for Python source code?
   120
🎤 What is the Python version used?
   3.13
🎤 What is the tag of the Python container image?
   3.13-slim-bookworm

Copying from template version 0.0.0.post179.dev0+5e22b6f
create .copier-answers.yml
create tests
create tests/test_placeholder.py
create tests/__init__.py
create src
create src/abc_fetcher
create src/abc_fetcher/source_data_repo.py
create src/abc_fetcher/py.typed
create src/abc_fetcher/downloader.py
create src/abc_fetcher/converter.py
create src/abc_fetcher/constants.py
create src/abc_fetcher/__init__.py
create pyproject.toml
create download.py
create convert.py
create README.md
create LICENSE
create Dockerfile
create .vscode
create .vscode/settings.json
create .vscode/extensions.json
create .python-version
create .pre-commit-config.yaml
create .gitlab-ci.yml
create .gitignore
create .editorconfig
create .dockerignore
```

Note: the provider [slug](https://en.wikipedia.org/wiki/Clean_URL#Slug) is a lower-case version of the provider code that is compatible with a URL and a directory name. For example, if the provider code is INSEE, then the slug is `insee`, and the source code directory of the fetcher is named `insee-fetcher`.

Install the dependencies:

```bash
cd abc-fetcher
uv sync
```

Create a first commit with [`git`](https://git-scm.com/):

```bash
git init
git add -A
git commit -m "Add initial files"
```

The next step is to implement [data download](downloading-data.md), then [data conversion](converting-data.md).
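
As a side note on the naming convention described above (provider code, then slug, then directory name), here is a minimal Python sketch. The helpers `provider_slug` and `fetcher_dir_name` are hypothetical and not part of the template, which simply asks you to type the slug. The rule of replacing non-alphanumeric characters with hyphens is an assumption made for illustration; the documentation only states that the slug is a lower-case, URL-compatible version of the provider code.

```python
import re


def provider_slug(provider_code: str) -> str:
    """Hypothetical helper: derive a URL- and directory-safe slug from a provider code."""
    # Lower-case the code, then turn any run of characters other than a-z and 0-9
    # into a single hyphen (an assumption for illustration, not a documented DBnomics rule).
    slug = re.sub(r"[^a-z0-9]+", "-", provider_code.lower())
    # Drop hyphens left at either end.
    return slug.strip("-")


def fetcher_dir_name(provider_code: str) -> str:
    """Hypothetical helper: build the source directory name, e.g. 'insee-fetcher'."""
    return f"{provider_slug(provider_code)}-fetcher"


print(fetcher_dir_name("INSEE"))  # insee-fetcher
print(fetcher_dir_name("ABC"))    # abc-fetcher
```

For provider codes made only of letters and digits, this reduces to lower-casing the code, which matches the INSEE and ABC examples in this page.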