HTTP resources
The HttpResource class is a concrete implementation of the FileResource class described in the file resources section of the documentation.
It uses the Requests library to fetch URLs and the Tenacity library to retry failed requests.
Retry strategy
Out of the box, HttpResource comes with sensible HTTP retry settings.
For example, HTTP requests whose response has a status code of 500 (Internal Server Error) are retried automatically, but those whose response has a status code of 404 (Not Found) are not.
The HttpResource._default_retrying property builds this strategy by calling the create_requests_retrying function.
This strategy retries HTTP requests when the response status code is one of the following:
{
    408,  # Request Timeout
    425,  # Too Early
    429,  # Too Many Requests
    500,  # Internal Server Error
    502,  # Bad Gateway
    503,  # Service Unavailable
    504,  # Gateway Timeout
    524,  # A Timeout Occurred (Cloudflare-specific)
    522,  # Connection Timed Out (Cloudflare-specific)
}
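The membership test this list implies can be sketched as follows. RETRYABLE_STATUS_CODES and should_retry_status are hypothetical names used only for illustration, not part of the toolbox, and the sketch only looks at status codes.

```python
# Hypothetical copy of the retryable status codes listed above.
RETRYABLE_STATUS_CODES = frozenset({408, 425, 429, 500, 502, 503, 504, 522, 524})


def should_retry_status(status_code: int) -> bool:
    """Return True if a response with this status code should be retried."""
    return status_code in RETRYABLE_STATUS_CODES


print(should_retry_status(500))  # True: Internal Server Error is retried
print(should_retry_status(404))  # False: Not Found is not retried
```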
Between attempts, this strategy waits for the delay given by the Retry-After HTTP header of the response (cf. RFC 6585). If that header is absent, it falls back to an exponential delay following the formula (2 ** (attempt_index - 1)) * 1.5: 1.5 seconds before the second attempt (the first attempt has no reason to wait), then 3, 6, and so on, up to a maximum of 5 minutes.
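The delay computation described above can be sketched in plain Python. parse_retry_after and compute_wait are hypothetical helpers, not toolbox functions. The sketch assumes that attempt_index is the 1-based number of the attempt that just failed (which is what makes the first wait 1.5 seconds), accepts both forms of Retry-After (a number of seconds or an HTTP-date), and applies the 5-minute cap only to the exponential fallback.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_WAIT_SECONDS = 300.0  # 5-minute ceiling for the exponential fallback


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds requested by a Retry-After header, or None.

    Retry-After holds either a number of seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT").
    """
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # Malformed header: let the caller fall back.
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now".
    return max(0.0, (retry_at - now).total_seconds())


def compute_wait(attempt_index: int, retry_after: Optional[str] = None) -> float:
    """Delay before the next attempt, attempt_index being the 1-based failed attempt."""
    delay = parse_retry_after(retry_after)
    if delay is not None:
        return delay  # Assumption: Retry-After values are not capped.
    return min((2 ** (attempt_index - 1)) * 1.5, MAX_WAIT_SECONDS)


print([compute_wait(i) for i in range(1, 5)])  # [1.5, 3.0, 6.0, 12.0]
print(compute_wait(1, retry_after="120"))  # 120.0
```

With no Retry-After header, the delay doubles after each failure until it reaches the 300-second ceiling, from the ninth failed attempt onward.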
The retry strategy can be customized by passing the retrying kwarg to the constructor of the HttpResource class, or by creating a child class that overrides the HttpResource._default_retrying property.
Validate the response
By default, the HttpResource._validate_response method calls the Response.raise_for_status method, which raises a requests.HTTPError if the response status code indicates a client or server error (4xx or 5xx).
The response validation can be customized by passing the validate_response kwarg to the constructor of the HttpResource class, or by creating a child class that overrides the HttpResource._validate_response method.
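As an illustration, here is a hypothetical validator that keeps the default behavior and additionally rejects empty bodies. It assumes that the validate_response callable receives the requests.Response object of the fetched URL; validate_non_empty_response and EmptyResponseError are names made up for this example.

```python
class EmptyResponseError(Exception):
    """Hypothetical exception for successful responses that carry no body."""


def validate_non_empty_response(response) -> None:
    """Hypothetical validator, e.g. for HttpResource(validate_response=...).

    Assumes it is called with the requests.Response of the fetched URL.
    """
    # Keep the default check: raises requests.HTTPError for 4xx and 5xx codes.
    response.raise_for_status()
    # Additional rule: a successful but empty body is treated as invalid too.
    if not response.content:
        raise EmptyResponseError(f"Empty response body from {response.url}")
```

Under that assumption, it would be passed as validate_response=validate_non_empty_response when constructing the HttpResource.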
Proxies
If you need to use a proxy, pass the proxies argument to the HttpResource class:
from dbnomics_toolbox.fetcher_utils.resources.http_resource import HttpResource
# Map each URL scheme to the proxy that should handle it.
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}
resource = HttpResource(
    proxies=proxies,
    request="http://example.org",
    target_file="test.txt",
)
Alternatively, you can configure the proxies once for an entire Session:
from dbnomics_toolbox.fetcher_utils.resources.http_resource import HttpResource
from requests import Session
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}
with Session() as session:
    # Every request sent through this session uses the proxies above.
    session.proxies.update(proxies)
    resource = HttpResource(
        request="http://example.org",
        session=session,
        target_file="test.txt",
    )
Proxies can also be configured by using the standard environment variables http_proxy, https_proxy, no_proxy, and all_proxy, as documented by the Requests library.
See also: https://requests.readthedocs.io/en/latest/user/advanced/#proxies
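To check which proxies will be picked up from these environment variables, one can query the standard library's urllib.request.getproxies function, which builds a scheme-to-proxy mapping from the *_proxy variables and which, to our understanding, Requests relies on for this lookup. A minimal sketch, reusing the hypothetical proxy addresses from the examples above (in practice these variables are usually exported in the shell before running the fetcher):

```python
import os
import urllib.request

# Hypothetical proxy addresses, reused from the examples above.
os.environ["http_proxy"] = "http://10.10.1.10:3128"
os.environ["https_proxy"] = "http://10.10.1.10:1080"

# getproxies() returns a {scheme: proxy_url} mapping built from the *_proxy variables.
detected = urllib.request.getproxies()
print(detected["http"])   # http://10.10.1.10:3128
print(detected["https"])  # http://10.10.1.10:1080
```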