# Getting started

## Evaluate feasibility

Before writing a new fetcher, please check that one does not already exist by looking at the [providers](https://db.nomics.world/providers) page on DBnomics or the [dbnomics-fetchers](https://git.nomics.world/dbnomics-fetchers/) GitLab group.

Also verify that the provider's legal terms of use do not forbid data redistribution.
If in doubt, ask on the [forum](https://forum.db.nomics.world/).

Then look at the source data and try to answer key questions:

- Is data distributed with an HTTP API, static files, etc.? What are the URLs?
- What are the formats? (CSV, XML, JSON, XLSX, custom, etc.)
- Does the source data fit the [multi-dimensional datasets](/concepts.md#multi-dimensional%20datasets) model?
- How to get the list of datasets? By an HTTP API call, or by scraping the website? Or using a hard-coded list?
- Is there a category tree to organize datasets?
- How do we know when datasets are updated? Is a calendar of updates available? Or does each dataset define a custom attribute, for example "updated_at"?
- How to get data and metadata for each dataset?
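
To illustrate the question about update detection, here is a minimal sketch, assuming the provider exposes a per-dataset `updated_at` attribute as an ISO 8601 string. The `datasets_to_refresh` helper, the `code` key and the sample dataset codes are hypothetical and only serve the example; a real provider may expose this information differently, or not at all.

```python
from datetime import datetime


def datasets_to_refresh(datasets: list[dict[str, str]], last_run: datetime) -> list[str]:
    """Return the codes of the datasets updated after the previous fetcher run."""
    return [
        dataset["code"]
        for dataset in datasets
        if datetime.fromisoformat(dataset["updated_at"]) > last_run
    ]


# Hypothetical metadata, shaped as a provider API might return it.
datasets = [
    {"code": "GDP", "updated_at": "2024-03-01T00:00:00"},
    {"code": "CPI", "updated_at": "2024-01-15T00:00:00"},
]

# Only datasets modified since the last run need to be downloaded again.
print(datasets_to_refresh(datasets, last_run=datetime(2024, 2, 1)))
```

If the provider publishes no such attribute or calendar, the fetcher may have to download everything on each run, which is worth knowing before starting.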

If everything seems OK, start a new fetcher by following the next sections.

## Requirements

This documentation uses [`uv`](https://docs.astral.sh/uv/), a popular Python project manager that makes developers' lives easier (compared to the standard `pip` tool).

Please install it beforehand to follow the instructions, or use `pip` and adapt the commands.

## Initialize a new project

All the fetchers are more or less based on the same boilerplate code.
To avoid laborious initial copy-paste operations, DBnomics provides a [fetcher template](https://git.nomics.world/dbnomics/dbnomics-fetcher-template) to create a new Python project and customize it by answering some questions.

For the sake of the example, we will call this one "ABC fetcher", for a hypothetical institution named ABC, and name the source code directory of that fetcher `abc-fetcher`, according to the provider code.
Here are some real examples: [`oecd-fetcher`](https://git.nomics.world/dbnomics-fetchers/oecd-fetcher) for OECD, [`insee-fetcher`](https://git.nomics.world/dbnomics-fetchers/insee-fetcher) for INSEE, etc.

First, create a working directory, or go to it if you already have one:

```bash
mkdir dbnomics-fetchers
cd dbnomics-fetchers
```

Run the fetcher template with the [`copier`](https://copier.readthedocs.io/en/latest/) tool and answer the questions:

```bash
$ uvx copier copy https://git.nomics.world/dbnomics/dbnomics-fetcher-template.git abc-fetcher

No git tags found in template; using HEAD as ref
🎤 What is the name of the provider? (e.g. "World Bank")
   Authority for Banking and Credit
🎤 What is the code of the provider? (e.g. "WB" for World Bank)
   ABC
🎤 What is the slug of the provider? (e.g. "wb" for World Bank)
   abc
🎤 What is the URL of the provider? (e.g. "https://www.worldbank.org/" for World Bank)
   https://abc-provider.com
🎤 What is the maximum line length for Python source code?
   120
🎤 What is the Python version used?
   3.13
🎤 What is the tag of the Python container image?
   3.13-slim-bookworm

Copying from template version 0.0.0.post179.dev0+5e22b6f
    create  .copier-answers.yml
    create  tests
    create  tests/test_placeholder.py
    create  tests/__init__.py
    create  src
    create  src/abc_fetcher
    create  src/abc_fetcher/source_data_repo.py
    create  src/abc_fetcher/py.typed
    create  src/abc_fetcher/downloader.py
    create  src/abc_fetcher/converter.py
    create  src/abc_fetcher/constants.py
    create  src/abc_fetcher/__init__.py
    create  pyproject.toml
    create  download.py
    create  convert.py
    create  README.md
    create  LICENSE
    create  Dockerfile
    create  .vscode
    create  .vscode/settings.json
    create  .vscode/extensions.json
    create  .python-version
    create  .pre-commit-config.yaml
    create  .gitlab-ci.yml
    create  .gitignore
    create  .editorconfig
    create  .dockerignore
```

Note: the provider [slug](https://en.wikipedia.org/wiki/Clean_URL#Slug) is a lower-case version of the provider code that is compatible with a URL and a directory name.
For example: if the provider code is INSEE, then the slug is `insee`, so that we name the directory of the source code of the fetcher `insee-fetcher`.
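
The naming convention can be sketched in a few lines of Python. This is only an illustration, not code from the template: the function names are hypothetical, and replacing non-alphanumeric characters with hyphens is an assumption about what "compatible with a URL and a directory name" could mean for codes that contain other characters. The `_fetcher` package name mirrors the `src/abc_fetcher` directory shown in the output above.

```python
import re


def provider_slug(provider_code: str) -> str:
    """Lower-case the provider code and keep only URL-safe characters."""
    # Assumption: any run of characters outside a-z and 0-9 becomes one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", provider_code.lower())
    return slug.strip("-")


def fetcher_directory(provider_code: str) -> str:
    """Name of the source code directory, for example `insee-fetcher`."""
    return f"{provider_slug(provider_code)}-fetcher"


def fetcher_package(provider_code: str) -> str:
    """Name of the Python package, which cannot contain hyphens."""
    return fetcher_directory(provider_code).replace("-", "_")


print(provider_slug("INSEE"))
print(fetcher_directory("ABC"))
print(fetcher_package("ABC"))
```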

Install the dependencies:

```bash
cd abc-fetcher
uv sync
```

Create a first commit with [`git`](https://git-scm.com/):

```bash
git init
git add -A
git commit -m "Add initial files"
```

The next step is to implement [data download](downloading-data.md), then [data conversion](converting-data.md).