HTTPChannel

class HTTPChannel

Bases: TypedReferenceCount

A single channel of communication from an HTTPClient. This is similar to the concept of a ‘connection’, except that HTTP is technically connectionless; in fact, a channel may represent one unbroken connection or it may transparently close and reopen a new connection with each request.

A channel is conceptually a single thread of I/O. One document at a time may be requested using a channel; a new document may (in general) not be requested from the same HTTPChannel until the first document has been fully retrieved.

enum StatusCode

get_status_code() will either return an HTTP-style status code >= 100 (e.g. 404), or one of the following values. In general, these are ordered from less-successful to more-successful.

enumerator SC_incomplete = 0

enumerator SC_internal_error = 1

enumerator SC_no_connection = 2

enumerator SC_timeout = 3

enumerator SC_lost_connection = 4

enumerator SC_non_http_response = 5

enumerator SC_invalid_http = 6

enumerator SC_socks_invalid_version = 7

enumerator SC_socks_no_acceptable_login_method = 8

enumerator SC_socks_refused = 9

enumerator SC_socks_no_connection = 10

enumerator SC_ssl_internal_failure = 11

enumerator SC_ssl_no_handshake = 12

enumerator SC_http_error_watermark = 13: This code is never returned itself, but StatusCode values higher than this are deemed more successful than any generic HTTP response.

enumerator SC_ssl_invalid_server_certificate = 14

enumerator SC_ssl_self_signed_server_certificate = 15

enumerator SC_ssl_unexpected_server = 16

enumerator SC_download_open_error = 17: This error and the download errors that follow are only generated after a download_to_*() call has been issued.

enumerator SC_download_write_error = 18

enumerator SC_download_invalid_range = 19
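
As a rough illustration of how a caller might interpret the value returned by get_status_code(), the sketch below separates HTTP-style codes (100 and above) from the internal enumerators. The enum here is a local mirror of a few values from the list above so that the example compiles on its own, and describe_status() is a hypothetical helper, not part of the library.

```cpp
#include <string>

// Local mirror of a few StatusCode enumerators from the list above, so that
// this sketch compiles without the library headers.
enum StatusCode {
  SC_incomplete = 0,
  SC_timeout = 3,
  SC_http_error_watermark = 13,
  SC_download_invalid_range = 19,
};

// Hypothetical helper: reports which kind of value a status code is.
std::string describe_status(int code) {
  if (code >= 100) {
    // HTTP-style status codes start at 100; the 200 range means success.
    if (code >= 200 && code < 300) {
      return "http success";
    }
    return "http status";
  }
  if (code > SC_http_error_watermark) {
    // Deemed more successful than any generic HTTP response.
    return "internal code above watermark";
  }
  return "internal code at or below watermark";
}
```

Note that this sketch does not undo the 1000 offset applied to proxy CONNECT errors; such a value simply falls into the "http status" bucket.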

HTTPChannel(HTTPChannel const&) = default

void begin_connect_to(DocumentSpec const &url)

Begins a non-blocking request to establish a direct connection to the server and port indicated by the URL. No HTTP requests will be issued beyond what is necessary to establish the connection. When run() has finished, you may call is_connection_ready() to determine if the connection was successfully established.

If successful, the connection may then be taken to use for whatever purposes you like by calling get_connection().

This establishes a nonblocking I/O socket. Also see connect_to().

void begin_get_document(DocumentSpec const &url)

Begins a non-blocking request to retrieve a given document. This method will return immediately, even before a connection to the server has necessarily been established; you must then call run() from time to time until the return value of run() is false. Then you may check is_valid() and get_status_code() to determine the status of your request.

If a previous request had been pending, that request is discarded.
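
The polling pattern described above might look roughly like the following. PollingStub is a hypothetical stand-in that only imitates the method names begin_get_document(), run(), and is_valid(); it takes a plain string where the real method takes a DocumentSpec, and it performs no networking. fetch_nonblocking() is likewise an illustrative helper rather than a library function.

```cpp
#include <string>

// Hypothetical stand-in for a channel: pretends a request needs three calls
// to run() before it completes. No real I/O takes place.
struct PollingStub {
  int steps_left = 0;
  bool valid = false;

  void begin_get_document(const std::string &) {
    steps_left = 3;
    valid = false;
  }

  // Returns true while the task is still pending, false when it is complete.
  bool run() {
    if (steps_left > 0) {
      --steps_left;
    }
    if (steps_left == 0) {
      valid = true;
      return false;
    }
    return true;
  }

  bool is_valid() const { return valid; }
};

// Starts a request, polls run() until it returns false, then reports
// is_valid(). 'polls' counts how many times run() asked to be called again.
template <class Channel>
bool fetch_nonblocking(Channel &channel, const std::string &url, int &polls) {
  channel.begin_get_document(url);
  polls = 0;
  while (channel.run()) {
    ++polls;  // A real application would do other per-frame work here.
  }
  return channel.is_valid();
}
```

In a real application the loop body would normally be one iteration of the main loop or a task, so that other work continues while the request is pending.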

void begin_get_header(DocumentSpec const &url): Begins a non-blocking request to retrieve a given header. See begin_get_document() and get_header().

void begin_get_subdocument(DocumentSpec const &url, std::size_t first_byte, std::size_t last_byte): Begins a non-blocking request to retrieve only the specified byte range of the indicated document. If last_byte is 0, it stands for the last byte of the document. When a subdocument is requested, get_file_size() and get_bytes_downloaded() will report the number of bytes of the subdocument, not of the complete document.
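
For orientation, a byte range request of this kind corresponds to the standard HTTP Range header. The sketch below shows how the first_byte and last_byte parameters could map to a Range value, with last_byte == 0 producing an open-ended range. Whether HTTPChannel formats its request exactly this way is an assumption; make_range_header_value() is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds the value of an HTTP Range header from a
// first/last byte pair, treating last_byte == 0 as "through the end".
std::string make_range_header_value(std::size_t first_byte,
                                    std::size_t last_byte) {
  std::string value = "bytes=" + std::to_string(first_byte) + "-";
  if (last_byte != 0) {
    value += std::to_string(last_byte);
  }
  return value;
}
```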

void begin_post_form(DocumentSpec const &url, std::string const &body)

Posts form data to a particular URL and retrieves the response, all using non-blocking I/O. See begin_get_document() and post_form().

It is important to note that you must call run() repeatedly after calling this method until run() returns false, and you may not call any other document posting or retrieving methods using the HTTPChannel object in the interim, or your form data may not get posted.

void clear_extra_headers(void): Resets the extra headers that were previously added via calls to send_extra_header().

void close_read_body(std::istream *stream) const: Closes a file opened by a previous call to open_read_body(). This really just deletes the istream pointer, but it is recommended to use this interface instead of deleting it explicitly, to help work around compiler issues.

bool connect_to(DocumentSpec const &url)

Establish a direct connection to the server and port indicated by the URL, but do not issue any HTTP requests. If successful, the connection may then be taken to use for whatever purposes you like by calling get_connection().

This establishes a blocking I/O socket. Also see begin_connect_to().

bool delete_document(DocumentSpec const &url): Requests the server to remove the indicated URL.

bool download_to_file(Filename const &filename, bool subdocument_resumes = true)

Specifies the name of a file to download the resulting document to. This should be called immediately after get_document() or begin_get_document() or related functions.

In the case of the blocking I/O methods like get_document(), this function will download the entire document to the file and return true if it was successfully downloaded, false otherwise.

In the case of non-blocking I/O methods like begin_get_document(), this function simply indicates an intention to download to the indicated file. It returns true if the file can be opened for writing, false otherwise, but the contents will not be completely downloaded until run() has returned false. At this time, it is possible that a communications error will have left a partial file, so is_download_complete() may be called to test this.

If subdocument_resumes is true and the document in question was previously requested as a subdocument (i.e. get_subdocument() with a first_byte value greater than zero), this will automatically seek to the appropriate byte within the file for writing the output. In this case, the file must already exist and must have at least first_byte bytes in it. If subdocument_resumes is false, a subdocument will always be downloaded beginning at the first byte of the file.

bool download_to_ram(Ramfile *ramfile, bool subdocument_resumes = true)

Specifies a Ramfile object to download the resulting document to. This should be called immediately after get_document() or begin_get_document() or related functions.

In the case of the blocking I/O methods like get_document(), this function will download the entire document to the Ramfile and return true if it was successfully downloaded, false otherwise.

In the case of non-blocking I/O methods like begin_get_document(), this function simply indicates an intention to download to the indicated Ramfile. It returns true if the file can be opened for writing, false otherwise, but the contents will not be completely downloaded until run() has returned false. At this time, it is possible that a communications error will have left a partial file, so is_download_complete() may be called to test this.

If subdocument_resumes is true and the document in question was previously requested as a subdocument (i.e. get_subdocument() with a first_byte value greater than zero), this will automatically seek to the appropriate byte within the Ramfile for writing the output. In this case, the Ramfile must already have at least first_byte bytes in it.

bool download_to_stream(std::ostream *strm, bool subdocument_resumes = true)

Specifies an ostream to download the resulting document to. This should be called immediately after get_document() or begin_get_document() or related functions.

In the case of the blocking I/O methods like get_document(), this function will download the entire document to the stream and return true if it was successfully downloaded, false otherwise.

In the case of non-blocking I/O methods like begin_get_document(), this function simply indicates an intention to download to the indicated file. It returns true if the file can be opened for writing, false otherwise, but the contents will not be completely downloaded until run() has returned false. At this time, it is possible that a communications error will have left a partial file, so is_download_complete() may be called to test this.

If subdocument_resumes is true and the document in question was previously requested as a subdocument (i.e. get_subdocument() with a first_byte value greater than zero), this will automatically seek to the appropriate byte within the file for writing the output. In this case, the file must already exist and must have at least first_byte bytes in it. If subdocument_resumes is false, a subdocument will always be downloaded beginning at the first byte of the file.

bool get_allow_proxy(void) const: If this is true (the normal case), the HTTPClient will be consulted for information about the proxy to be used for each connection via this HTTPChannel. If this has been set to false by the user, then all connections will be made directly, regardless of the proxy settings indicated on the HTTPClient.

bool get_blocking_connect(void) const: If this flag is true, a socket connect will block even for nonblocking I/O calls like begin_get_document(), begin_connect_to(), etc. If false, a socket connect will not block for nonblocking I/O calls, but will block for blocking I/O calls (get_document(), connect_to(), etc.).

std::size_t get_bytes_downloaded(void) const: Returns the number of bytes downloaded during the last (or current) download_to_file() or download_to_ram() operation. This can be used in conjunction with get_file_size() to report the percent complete (but be careful, since get_file_size() may return 0 if the server has not told us the size of the file).
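
A progress calculation that guards against the unknown-size case could be sketched as follows. percent_complete() is a hypothetical helper that takes the two values as plain integers (get_file_size() actually returns a std::streamsize), and the choice of -1.0 to signal "unknown" is an assumption of this example.

```cpp
#include <cstddef>

// Hypothetical helper: returns the percentage downloaded, clamped to 100,
// or -1.0 when the total size is not known (reported as 0).
double percent_complete(std::size_t bytes_downloaded, std::size_t file_size) {
  if (file_size == 0) {
    return -1.0;  // Avoid dividing by zero when the size is unknown.
  }
  double pct = 100.0 * static_cast<double>(bytes_downloaded) /
               static_cast<double>(file_size);
  return pct > 100.0 ? 100.0 : pct;
}
```

The clamp covers the case where the reported size grows or was only an estimate, so that a progress bar never exceeds 100 percent.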

std::size_t get_bytes_requested(void) const

When download throttling is in effect (set_download_throttle() has been set to true) and non-blocking I/O methods (like begin_get_document()) are used, this returns the number of bytes “requested” from the server so far: that is, the theoretical maximum value for get_bytes_downloaded(), if the server has been keeping up with our demand.

If get_bytes_downloaded() is less than this number, then the server has not been supplying bytes fast enough to meet our own download throttle rate.

When download throttling is not in effect, or when the blocking I/O methods (like get_document(), etc.) are used, this returns 0.
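
Treating get_bytes_requested() as the theoretical ceiling for get_bytes_downloaded(), the gap between the two indicates how far the server is lagging behind the throttle rate. A minimal sketch, using a hypothetical helper that takes both values as arguments:

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes by which the server is behind the
// throttle rate. Returns 0 when nothing is outstanding, which includes the
// unthrottled and blocking cases where bytes_requested is reported as 0.
std::size_t throttle_shortfall(std::size_t bytes_requested,
                               std::size_t bytes_downloaded) {
  if (bytes_requested <= bytes_downloaded) {
    return 0;
  }
  return bytes_requested - bytes_downloaded;
}
```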

static TypeHandle get_class_type(void)

HTTPClient *get_client(void) const: Returns the HTTPClient object that owns this channel.

double get_connect_timeout(void) const: Returns the length of time, in seconds, to wait for a new nonblocking socket to connect. See set_connect_timeout().

SocketStream *get_connection(void)

Returns the connection that was established via a previous call to connect_to() or begin_connect_to(), or NULL if the connection attempt failed or if those methods have not recently been called.

This stream has been allocated from the free store. It is the user’s responsibility to delete this pointer when finished with it.

std::string get_content_type(void) const: Returns the value of the Content-Type header.

bool get_document(DocumentSpec const &url): Opens the named document for reading, if available. Returns true if successful, false otherwise.

DocumentSpec const &get_document_spec(void) const

Returns the DocumentSpec associated with the most recent document. This includes its actual URL (following redirects) along with the identity tag and last-modified date, if supplied by the server.

This structure may be saved and used to retrieve the same version of the document later, or to conditionally retrieve a newer version if it is available.

bool get_download_throttle(void) const: Returns whether the nonblocking downloads will be bandwidth-limited. See set_download_throttle().

std::streamsize get_file_size(void) const

Returns the size of the file, if it is known. Returns the value set by set_expected_file_size() if the file size is not known, or 0 if this value was not set.

If the file is dynamically generated, the size may not be available until a read has started (e.g. open_read_body() has been called); and even then it may increase as more of the file is read due to the nature of HTTP/1.1 requests which can change their minds midstream about how much data they’re sending you.

std::size_t get_first_byte_delivered(void) const: Returns the first byte of the file (that will be) delivered by the server in response to the current request. Normally, this is the same as get_first_byte_requested(), but some servers will ignore a subdocument request and always return the whole file, in which case this value will be 0, regardless of what was requested to get_subdocument().

std::size_t get_first_byte_requested(void) const: Returns the first byte of the file requested by the request. This will normally be 0 to indicate that the file is being requested from the beginning, but if the file was requested via a get_subdocument() call, this will contain the first_byte parameter from that call.

bool get_header(DocumentSpec const &url): Like get_document(), except only the header associated with the document is retrieved. This may be used to test for existence of the document; it might also return the size of the document (if the server gives us this information).

std::string get_header_value(std::string const &key) const: Returns the HTTP header value associated with the indicated key, or an empty string if the key was not defined in the message returned by the server.

double get_http_timeout(void) const: Returns the length of time, in seconds, to wait for the HTTP server to respond to our request. See set_http_timeout().

HTTPEnum::HTTPVersion get_http_version(void) const: Returns the HTTP version number returned by the server, as one of the HTTPClient enumerated types, e.g. HTTPClient::HV_11.

std::string const &get_http_version_string(void) const: Returns the HTTP version number returned by the server, formatted as a string, e.g. “HTTP/1.1”.

double get_idle_timeout(void) const: Returns the amount of time, in seconds, in which a previously established connection is allowed to remain open and unused. See set_idle_timeout().

std::size_t get_last_byte_delivered(void) const: Returns the last byte of the file (that will be) delivered by the server in response to the current request. Normally, this is the same as get_last_byte_requested(), but some servers will ignore a subdocument request and always return the whole file, in which case this value will be 0, regardless of what was requested to get_subdocument().

std::size_t get_last_byte_requested(void) const: Returns the last byte of the file requested by the request. This will normally be 0 to indicate that the file is being requested to its last byte, but if the file was requested via a get_subdocument() call, this will contain the last_byte parameter from that call.

double get_max_bytes_per_second(void) const: Returns the maximum number of bytes per second that may be consumed by this channel when get_download_throttle() is true.

double get_max_updates_per_second(void) const: Returns the maximum number of times per second that run() will do anything at all, when get_download_throttle() is true.

int get_num_redirect_steps(void) const: If the document automatically followed one or more redirects, this will return the number of redirects that were automatically followed. Use get_redirect_step() to retrieve each URL in sequence.

bool get_options(DocumentSpec const &url): Sends an OPTIONS message to the server, which should query the available options, possibly in relation to a specified URL.

bool get_persistent_connection(void) const: Returns whether the HTTPChannel should try to keep the connection to the server open and reuse that connection for multiple documents, or whether it should close the connection and open a new one for each request. See set_persistent_connection().

std::string const &get_proxy_realm(void) const: If the document failed to connect because of a 407 (Proxy Authentication Required), this method will return the “realm” returned by the proxy. This string may be presented to the user to request an associated username and password (which should then be passed to HTTPClient::set_username()).

bool get_proxy_tunnel(void) const: Returns true if connections always tunnel through a proxy, or false (the normal case) if we allow the proxy to serve up documents. See set_proxy_tunnel().

URLSpec const &get_redirect(void) const: If the document failed with a redirect code (300 series), this will generally contain the new URL the server wants us to try. In many cases, the client will automatically follow redirects; if these are successful the client will return a successful code and get_redirect() will return empty, but get_url() will return the new, redirected URL.

URLSpec const &get_redirect_step(int n) const: Use in conjunction with get_num_redirect_steps() to extract the chain of URLs that the channel was automatically redirected through to arrive at the final document.

std::size_t get_skip_body_size(void) const: Returns the maximum number of bytes in a received (but unwanted) body that will be skipped past, in order to reset to a new request. See set_skip_body_size().

int get_status_code(void) const

Returns the HTTP status code from the document retrieval request. This will be in the 200 range if the document is successfully retrieved, or some other value in the case of an error.

Some proxy errors during an https-over-proxy request would return the same status code as a different error that occurred on the host server. To differentiate these cases, status codes that are returned by the proxy during the CONNECT phase (except code 407) are incremented by 1000.
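
A caller that wants to tell the two cases apart could undo the offset as sketched below. DecodedStatus and decode_status() are hypothetical names; the threshold of 1100 assumes that ordinary HTTP status codes stay below 1000 and that proxy codes, like all HTTP-style codes, start at 100.

```cpp
// Hypothetical result type: the underlying HTTP code, plus whether it was
// reported by the proxy during the CONNECT phase.
struct DecodedStatus {
  int code;
  bool from_proxy_connect;
};

// Hypothetical helper: strips the 1000 offset described above, if present.
DecodedStatus decode_status(int raw_status) {
  if (raw_status >= 1100) {
    return {raw_status - 1000, true};
  }
  // Includes 407, which is never offset, and the internal enumerators.
  return {raw_status, false};
}
```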

std::string get_status_string(void) const: Returns the string as returned by the server describing the status code for humans. This may or may not be meaningful.

bool get_subdocument(DocumentSpec const &url, std::size_t first_byte, std::size_t last_byte): Retrieves only the specified byte range of the indicated document. If last_byte is 0, it stands for the last byte of the document. When a subdocument is requested, get_file_size() and get_bytes_downloaded() will report the number of bytes of the subdocument, not of the complete document.

bool get_trace(DocumentSpec const &url): Sends a TRACE message to the server, which should return back the same message as the server received it, allowing inspection of proxy hops, etc.

URLSpec const &get_url(void) const: Returns the URL that was used to retrieve the most recent document: whatever URL was last passed to get_document() or get_header(). If a redirect has transparently occurred, this will return the new, redirected URL (the actual URL at which the document was located).

std::string const &get_www_realm(void) const: If the document failed to connect because of a 401 (Unauthorized), this method will return the “realm” returned by the server in which the requested document must be authenticated. This string may be presented to the user to request an associated username and password (which should then be passed to HTTPClient::set_username()).

bool is_connection_ready(void) const: Returns true if a connection has been established to the named server in a previous call to connect_to() or begin_connect_to(), false otherwise.

bool is_download_complete(void) const

Returns true when a download_to_file() or download_to_ram() has executed and the file has been fully downloaded. If this still returns false after processing has completed, there was an error in transmission.

Note that simply testing is_download_complete() does not prove that the requested document was successfully retrieved: you might have just downloaded the “404 not found” stub (for instance) that a server would provide in response to some error condition. You should also check is_valid() to prove that the file you expected has been successfully retrieved.

bool is_file_size_known(void) const: Returns true if the size of the file we are currently retrieving was reported to us by the server and thus is reliably known, or false if the size reported by get_file_size() represents an educated guess (possibly as set by set_expected_file_size(), or as inferred from a chunked transfer encoding in progress).

bool is_valid(void) const: Returns true if the last-requested document was successfully retrieved and is ready to be read, false otherwise.

std::istream *open_read_body(void)

Returns a newly-allocated istream suitable for reading the body of the document. This may only be called immediately after a call to get_document() or post_form(), or after a call to run() has returned false.

Note that, in nonblocking mode, the returned stream may report an early EOF, even before the actual end of file. When this happens, you should call stream->is_closed() to determine whether you should attempt to read some more later.

The user is responsible for passing the returned istream to close_read_body() later.

bool post_form(DocumentSpec const &url, std::string const &body): Posts form data to a particular URL and retrieves the response.

void preserve_status(void)

Preserves the previous status code (presumably a failure) from the previous connection attempt. If the subsequent connection attempt also fails, the returned status code will be the better of the previous code and the current code.

This can be called to daisy-chain subsequent attempts to download the same document from different servers. After all servers have been attempted, the final status code will reflect the attempt that most nearly succeeded.
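
The daisy-chaining idea might be sketched as follows. MirrorStub is a hypothetical stand-in exposing only preserve_status() and get_document() (with a string in place of a DocumentSpec), and try_mirrors() is an illustrative helper. Calling preserve_status() before every attempt after the first is this example's reading of the description above, not a documented requirement.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a channel: succeeds only for one known URL and
// counts how often preserve_status() was called. No real I/O takes place.
struct MirrorStub {
  int preserve_calls = 0;
  std::string good_server;

  void preserve_status() { ++preserve_calls; }
  bool get_document(const std::string &url) { return url == good_server; }
};

// Tries each URL in turn, preserving the previous failure status before
// every retry. Returns the index of the first URL that succeeded, or -1.
template <class Channel>
int try_mirrors(Channel &channel, const std::vector<std::string> &urls) {
  for (std::size_t i = 0; i < urls.size(); ++i) {
    if (i > 0) {
      channel.preserve_status();
    }
    if (channel.get_document(urls[i])) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

If every mirror fails, get_status_code() on the real channel would then reflect the attempt that came closest to succeeding.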

bool put_document(DocumentSpec const &url, std::string const &body): Uploads the indicated body to the server to replace the indicated URL, if the server allows this.

void reset(void)

Stops whatever file transaction is currently in progress, closes the connection, and resets to begin anew. You shouldn’t ever need to call this, since the channel should be able to reset itself cleanly between requests, but it is provided in case you are an especially nervous type.

Don’t call this after every request unless you set set_persistent_connection() to false, since calling reset() rudely closes the connection regardless of whether we have told the server we intend to keep it open or not.

bool run(void)

This must be called from time to time when non-blocking I/O is in use. It checks for data coming in on the socket and writes data out to the socket when possible, and does whatever processing is required towards completing the current task.

The return value is true if the task is still pending (and run() will need to be called again in the future), or false if the current task is complete.

void send_extra_header(std::string const &key, std::string const &value)

Specifies an additional key: value pair that is added into the header sent to the server with the next request. This is passed along with no interpretation by the HTTPChannel code. You may call this repeatedly to append multiple headers.

This is persistent for one request only; it must be set again for each new request.

void set_allow_proxy(bool allow_proxy): If this is true (the normal case), the HTTPClient will be consulted for information about the proxy to be used for each connection via this HTTPChannel. If this has been set to false by the user, then all connections will be made directly, regardless of the proxy settings indicated on the HTTPClient.

void set_blocking_connect(bool blocking_connect)

If this flag is true, a socket connect will block even for nonblocking I/O calls like begin_get_document(), begin_connect_to(), etc. If false, a socket connect will not block for nonblocking I/O calls, but will block for blocking I/O calls (get_document(), connect_to(), etc.).

Setting this true is useful when you want to use non-blocking I/O once you have established the connection, but you don’t want to bother with polling for the initial connection. It’s also useful when you don’t particularly care about non-blocking I/O, but you need to respect timeouts like connect_timeout and http_timeout.

void set_connect_timeout(double timeout_seconds)

Sets the maximum length of time, in seconds, that the channel will wait before giving up on establishing a TCP connection.

At present, this is used only for the nonblocking interfaces (e.g. begin_get_document(), begin_connect_to()), but it is used whether set_blocking_connect() is true or false.

void set_content_type(std::string content_type): Specifies the Content-Type header, useful for applications that require different types of content, such as JSON.

void set_download_throttle(bool download_throttle)

Specifies whether nonblocking downloads (via download_to_file() or download_to_ram()) will be limited so as not to use all available bandwidth.

If this is true, when a download has been started on this channel it will be invoked no more frequently than get_max_updates_per_second(), and the total bandwidth used by the download will be no more than get_max_bytes_per_second(). If this is false, downloads will proceed as fast as the server can send the data.

This only has effect on the nonblocking I/O methods like begin_get_document(), etc. The blocking methods like get_document() always use as much CPU and bandwidth as they can get.
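
To get a feel for how the two throttle settings interact, one can estimate the byte budget available on each update by dividing the bandwidth cap by the update rate. This is a back-of-the-envelope sketch; how the channel computes its internal budget is an assumption here, and bytes_per_update() is a hypothetical helper.

```cpp
#include <cstddef>

// Hypothetical helper: approximate number of bytes that may be transferred
// on each update, given the two throttle settings. Returns 0 for
// non-positive inputs rather than dividing by zero.
std::size_t bytes_per_update(double max_bytes_per_second,
                             double max_updates_per_second) {
  if (max_bytes_per_second <= 0.0 || max_updates_per_second <= 0.0) {
    return 0;
  }
  return static_cast<std::size_t>(max_bytes_per_second /
                                  max_updates_per_second);
}
```

For example, a cap of 10000 bytes per second at 10 updates per second leaves roughly 1000 bytes per update.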

void set_expected_file_size(std::size_t file_size): This may be called immediately after a call to get_document() or some related function to specify the expected size of the document we are retrieving, if we happen to know. This is used as the return value to get_file_size() only in the case that the server does not tell us the actual file size.

void set_http_timeout(double timeout_seconds)

Sets the maximum length of time, in seconds, that the channel will wait for the HTTP server to finish sending its response to our request.

The timer starts counting after the TCP connection has been established (see set_connect_timeout(), above) and the request has been sent.

At present, this is used only for the nonblocking interfaces (e.g. begin_get_document(), begin_connect_to()), but it is used whether set_blocking_connect() is true or false.

void set_idle_timeout(double idle_timeout): Specifies the amount of time, in seconds, in which a previously-established connection is allowed to remain open and unused. If a previous connection has remained unused for at least this number of seconds, it will be closed and a new connection will be opened; otherwise, the same connection will be reused for the next request (for this particular HTTPChannel).
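
The reuse rule stated above reduces to a single comparison. The helper below is hypothetical and only restates that rule: a connection that has been idle for at least the timeout is closed, and anything fresher is reused.

```cpp
// Hypothetical helper: true if a connection that has been idle for
// 'seconds_idle' may still be reused under the given idle timeout.
bool should_reuse_connection(double seconds_idle, double idle_timeout) {
  // "At least this number of seconds" means the boundary value closes.
  return seconds_idle < idle_timeout;
}
```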

void set_max_bytes_per_second(double max_bytes_per_second): When bandwidth throttling is in effect (see set_download_throttle()), this specifies the maximum number of bytes per second that may be consumed by this channel.

void set_max_updates_per_second(double max_updates_per_second): When bandwidth throttling is in effect (see set_download_throttle()), this specifies the maximum number of times per second that run() will attempt to do any downloading at all.

void set_persistent_connection(bool persistent_connection)

Indicates whether the HTTPChannel should try to keep the connection to the server open and reuse that connection for multiple documents, or whether it should close the connection and open a new one for each request. Set this true to keep the connections around when possible, false to recycle them.

It makes most sense to set this false when the HTTPChannel will be used only once to retrieve a single document, true when you will be using the same HTTPChannel object to retrieve multiple documents.

void set_proxy_tunnel(bool proxy_tunnel)

Normally, a proxy is itself asked for ordinary URLs, and the proxy decides whether to hand the client a cached version of the document or to contact the server for a fresh version. The proxy may also modify the headers and transfer encoding on the way.

If this is set to true, then instead of asking for URLs from the proxy, we will ask the proxy to open a connection to the server (for instance, on port 80); if the proxy honors this request, then we contact the server directly through this connection to retrieve the document. If the proxy does not honor the connect request, then the retrieve operation fails.

SSL connections (e.g. https), and connections through a Socks proxy, are always tunneled, regardless of the setting of this flag.

void set_skip_body_size(std::size_t skip_body_size)

Specifies the maximum number of bytes in a received (but unwanted) body that will be skipped past, in order to reset to a new request.

That is, if this HTTPChannel requests a file via get_document(), but does not call download_to_ram(), download_to_file(), or open_read_body(), and instead immediately requests a new file, then the HTTPChannel has a choice whether to skip past the unwanted document, or to close the connection and open a new one. If the number of bytes to skip is more than this threshold, the connection will be closed; otherwise, the data will simply be read and discarded.
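
The choice described above can be written as a small decision function. The enum and helper names are hypothetical; the comparison follows the text, in which only a body larger than the threshold causes the connection to be closed.

```cpp
#include <cstddef>

// Hypothetical names for the two outcomes described above.
enum class UnwantedBodyAction { read_and_discard, close_connection };

// Hypothetical helper: decides what to do with an unwanted response body of
// 'bytes_to_skip' bytes, given the configured skip_body_size threshold.
UnwantedBodyAction choose_unwanted_body_action(std::size_t bytes_to_skip,
                                               std::size_t skip_body_size) {
  if (bytes_to_skip > skip_body_size) {
    return UnwantedBodyAction::close_connection;
  }
  return UnwantedBodyAction::read_and_discard;
}
```

A larger threshold favors keeping the connection alive at the cost of reading data that is thrown away; a smaller one favors reconnecting.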

bool will_close_connection(void) const: Returns true if the server has indicated it will close the connection after this document has been read, or false if it will remain open (and future documents may be requested on the same connection).

void write_headers(std::ostream &out) const: Outputs a list of all headers defined by the server to the indicated output stream.