HTTP resources

The HttpResource class is a concrete implementation of the FileResource class described in the file resources section of the documentation.

This class uses requests to fetch URLs and tenacity to implement retrying.

Retry strategy

The HttpResource class comes with sensible defaults for HTTP retries.

For example, an HTTP request whose response has a status code of 500 (Internal Server Error) is automatically retried, but one whose response has a status code of 404 (Not Found) is not.

The HttpResource._default_retrying property calls the create_requests_retrying function.

This strategy retries the HTTP requests when the response status code is:

{
    408,  # Request Timeout
    425,  # Too Early
    429,  # Too Many Requests
    500,  # Internal Server Error
    502,  # Bad Gateway
    503,  # Service Unavailable
    504,  # Gateway Timeout
    524,  # A Timeout Occurred (Cloudflare-specific)
    522,  # Connection Timed Out (Cloudflare-specific)
}
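As an illustration of how such a set can drive the retry decision, here is a minimal sketch. The names RETRYABLE_STATUS_CODES and is_retryable_status are hypothetical and are not taken from the library; only the status codes come from the list above.

```python
# Hypothetical names for illustration; the status codes are those listed above.
RETRYABLE_STATUS_CODES = frozenset({408, 425, 429, 500, 502, 503, 504, 522, 524})


def is_retryable_status(status_code: int) -> bool:
    """Return True when a response with this status code should be retried."""
    return status_code in RETRYABLE_STATUS_CODES


print(is_retryable_status(500))  # retried
print(is_retryable_status(404))  # not retried
```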

This strategy waits between retry attempts. When the response includes a Retry-After HTTP header (cf. RFC 6585), the delay is taken from that header. Otherwise, the strategy falls back to an exponential delay following the formula (2 ** (attempt_index - 1)) * 1.5: 1.5 seconds before the second attempt (the first attempt has no reason to wait), then 3, 6, etc., limited to a maximum of 5 minutes.
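The delay computation can be sketched as follows. This is not the library's code: the function name retry_delay is hypothetical, attempt_index is assumed to count the attempts already made (so 1 gives the wait before the second attempt), only the "number of seconds" form of Retry-After is handled, and the 5-minute cap is assumed to apply to the exponential fallback only.

```python
from typing import Optional

BASE_DELAY_SECONDS = 1.5
MAX_DELAY_SECONDS = 300.0  # 5 minutes


def retry_delay(attempt_index: int, retry_after: Optional[str] = None) -> float:
    """Return the number of seconds to wait before the next attempt.

    attempt_index is assumed to be the number of attempts already made.
    retry_after is the raw value of the Retry-After header, if any.
    """
    if retry_after is not None:
        try:
            # Only the delay-in-seconds form is handled in this sketch.
            return max(0.0, float(int(retry_after.strip())))
        except ValueError:
            pass  # e.g. an HTTP date: fall back to the exponential delay
    exponential = (2 ** (attempt_index - 1)) * BASE_DELAY_SECONDS
    return min(exponential, MAX_DELAY_SECONDS)


for index in (1, 2, 3):
    print(index, retry_delay(index))
```

With these assumptions the fallback yields 1.5, 3 and 6 seconds for the first three waits, which matches the sequence described above.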

The retry strategy can be customized by passing the retrying kwarg to the constructor of the HttpResource class, or by creating a child class that overrides the HttpResource._default_retrying property.

Validate the response

By default, the HttpResource._validate_response method calls the Response.raise_for_status method, which raises an exception if the response status code indicates a client or server error (4xx or 5xx).

The response validation can be customized by passing the validate_response kwarg to the constructor of the HttpResource class, or by creating a child class that overrides the HttpResource._validate_response method.
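A custom validator could look like the sketch below. It assumes that the validate_response callable receives the response object and signals a problem by raising an exception; that signature is an assumption, not something confirmed here. The names validate_json_response and UnexpectedContentTypeError are hypothetical.

```python
class UnexpectedContentTypeError(Exception):
    """Hypothetical error raised when the response is not JSON."""


def validate_json_response(response) -> None:
    """Keep the default status check, then require a JSON content type.

    The response is expected to behave like a requests Response:
    it has a raise_for_status() method and a headers mapping.
    """
    response.raise_for_status()
    content_type = response.headers.get("Content-Type", "")
    if "application/json" not in content_type:
        raise UnexpectedContentTypeError(
            f"Expected a JSON response, got {content_type!r}"
        )
```

Under that assumption, the function would be passed as validate_response=validate_json_response when constructing the HttpResource.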

Proxies

If you need to use a proxy, pass the proxies argument to the HttpResource class:

from dbnomics_toolbox.fetcher_utils.resources.http_resource import HttpResource

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}

HttpResource(
    proxies=proxies,
    request="http://example.org",
    target_file="test.txt",
)

Alternatively, you can configure the proxies once for an entire Session:

from dbnomics_toolbox.fetcher_utils.resources.http_resource import HttpResource
from requests import Session

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}

with Session() as session:
    session.proxies.update(proxies)
    resource = HttpResource(
        request="http://example.org",
        session=session,
        target_file="test.txt",
    )

Proxies can also be configured by using the standard environment variables http_proxy, https_proxy, no_proxy, and all_proxy, as documented by the Requests library.

See also: https://requests.readthedocs.io/en/latest/user/advanced/#proxies
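For the environment variable approach, a minimal sketch of setting the variables from Python before any request is made is shown below. The proxy addresses reuse the placeholder values from the examples above, the no_proxy value is an assumption for illustration, and the standard library helper getproxies_environment is used only to show how such variables are read back.

```python
import os
from urllib.request import getproxies_environment

# Placeholder proxy addresses, as in the examples above.
os.environ["http_proxy"] = "http://10.10.1.10:3128"
os.environ["https_proxy"] = "http://10.10.1.10:1080"
# Hosts that should bypass the proxy (illustrative value).
os.environ["no_proxy"] = "localhost,127.0.0.1"

# The standard library maps each "<scheme>_proxy" variable to its scheme.
proxies = getproxies_environment()
print(proxies["http"])
print(proxies["https"])
```

In practice these variables are usually exported in the shell or in the process manager configuration rather than set in code.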