HTTPChannel
-
class HTTPChannel
Bases: TypedReferenceCount
A single channel of communication from an HTTPClient. This is similar to the concept of a ‘connection’, except that HTTP is technically connectionless; in fact, a channel may represent one unbroken connection or it may transparently close and reopen a new connection with each request.
A channel is conceptually a single thread of I/O. One document at a time may be requested using a channel; a new document may (in general) not be requested from the same HTTPChannel until the first document has been fully retrieved.
-
enum StatusCode
get_status_code() will either return an HTTP-style status code >= 100 (e.g. 404), or one of the following values. In general, these are ordered from less-successful to more-successful.
-
enumerator SC_incomplete = 0
-
enumerator SC_internal_error = 1
-
enumerator SC_no_connection = 2
-
enumerator SC_timeout = 3
-
enumerator SC_lost_connection = 4
-
enumerator SC_non_http_response = 5
-
enumerator SC_invalid_http = 6
-
enumerator SC_socks_invalid_version = 7
-
enumerator SC_socks_no_acceptable_login_method = 8
-
enumerator SC_socks_refused = 9
-
enumerator SC_socks_no_connection = 10
-
enumerator SC_ssl_internal_failure = 11
-
enumerator SC_ssl_no_handshake = 12
-
enumerator SC_http_error_watermark = 13
No one returns this code, but StatusCode values higher than this are deemed more successful than any generic HTTP response.
-
enumerator SC_ssl_invalid_server_certificate = 14
-
enumerator SC_ssl_self_signed_server_certificate = 15
-
enumerator SC_ssl_unexpected_server = 16
-
enumerator SC_download_open_error = 17
These errors are only generated after a download_to_*() call has been issued.
-
enumerator SC_download_write_error = 18
-
enumerator SC_download_invalid_range = 19
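The ordering rules stated for StatusCode can be illustrated with a small standalone sketch. It copies a few enumerator values from this page into a local enum rather than including the real header, and the two helpers, is_http_response() and outranks_any_http_response(), are hypothetical names that are not part of HTTPChannel. They only encode the two statements made above: codes >= 100 are HTTP-style codes, and values above SC_http_error_watermark are deemed more successful than any generic HTTP response.

```cpp
#include <cassert>

// Local copy of a few enumerator values listed on this page.
// This is NOT the real header; it exists so the sketch is self-contained.
enum StatusCode {
  SC_incomplete = 0,
  SC_timeout = 3,
  SC_http_error_watermark = 13,
  SC_ssl_invalid_server_certificate = 14,
  SC_download_invalid_range = 19,
};

// Hypothetical helper: values >= 100 are HTTP-style status codes (e.g. 404).
inline bool is_http_response(int code) {
  return code >= 100;
}

// Hypothetical helper: true for internal codes that rank above the
// watermark, i.e. those deemed more successful than any HTTP response.
inline bool outranks_any_http_response(int code) {
  return !is_http_response(code) && code > SC_http_error_watermark;
}
```

How two HTTP-style codes compare with each other is not described on this page, so the sketch deliberately leaves that case out.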
-
HTTPChannel(HTTPChannel const&) = default
-
void begin_connect_to(DocumentSpec const &url)
Begins a non-blocking request to establish a direct connection to the server and port indicated by the URL. No HTTP requests will be issued beyond what is necessary to establish the connection. When run() has finished, you may call is_connection_ready() to determine if the connection was successfully established.
If successful, the connection may then be taken to use for whatever purposes you like by calling get_connection().
This establishes a nonblocking I/O socket. Also see connect_to().
-
void begin_get_document(DocumentSpec const &url)
Begins a non-blocking request to retrieve a given document. This method will return immediately, even before a connection to the server has necessarily been established; you must then call run() from time to time until the return value of run() is false. Then you may check is_valid() and get_status_code() to determine the status of your request.
If a previous request had been pending, that request is discarded.
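The polling contract described here (call run() until it returns false, then check is_valid()) can be sketched without the real library. FakeChannel below is a stand-in invented for this illustration, not the real HTTPChannel; it pretends a request needs a fixed number of run() calls. pump_until_done() is likewise a hypothetical helper name.

```cpp
#include <cassert>

// Stand-in for a channel; NOT the real HTTPChannel. It pretends the
// pending request needs a fixed number of run() calls to complete.
struct FakeChannel {
  int steps_remaining;
  bool valid_when_done;

  // Mirrors the documented contract: returns true while the task is
  // still pending, false once it is complete.
  bool run() {
    if (steps_remaining > 0) {
      --steps_remaining;
    }
    return steps_remaining > 0;
  }

  bool is_valid() const {
    return steps_remaining == 0 && valid_when_done;
  }
};

// Calls run() until it returns false, counting the calls, and then
// reports is_valid(). A real application would typically do other
// work (e.g. render a frame) between successive run() calls.
template <class Channel>
bool pump_until_done(Channel &channel, int &num_calls) {
  num_calls = 0;
  do {
    ++num_calls;
  } while (channel.run());
  return channel.is_valid();
}
```

The helper is a template only so that it can be exercised against the stand-in; with the real class the same loop shape would apply after a begin_get_document() call.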
-
void begin_get_header(DocumentSpec const &url)
Begins a non-blocking request to retrieve a given header. See begin_get_document() and get_header().
-
void begin_get_subdocument(DocumentSpec const &url, std::size_t first_byte, std::size_t last_byte)
Begins a non-blocking request to retrieve only the specified byte range of the indicated document. If last_byte is 0, it stands for the last byte of the document. When a subdocument is requested, get_file_size() and get_bytes_downloaded() will report the number of bytes of the subdocument, not of the complete document.
-
void begin_post_form(DocumentSpec const &url, std::string const &body)
Posts form data to a particular URL and retrieves the response, all using non-blocking I/O. See begin_get_document() and post_form().
It is important to note that you must call run() repeatedly after calling this method until run() returns false, and you may not call any other document posting or retrieving methods using the HTTPChannel object in the interim, or your form data may not get posted.
-
void clear_extra_headers(void)
Resets the extra headers that were previously added via calls to send_extra_header().
-
void close_read_body(std::istream *stream) const
Closes a file opened by a previous call to open_read_body(). This really just deletes the istream pointer, but it is recommended to use this interface instead of deleting it explicitly, to help work around compiler issues.
-
bool connect_to(DocumentSpec const &url)
Establishes a direct connection to the server and port indicated by the URL, but does not issue any HTTP requests. If successful, the connection may then be taken to use for whatever purposes you like by calling get_connection().
This establishes a blocking I/O socket. Also see begin_connect_to().
-
bool delete_document(DocumentSpec const &url)
Requests the server to remove the indicated URL.
-
bool download_to_file(Filename const &filename, bool subdocument_resumes = true)
Specifies the name of a file to download the resulting document to. This should be called immediately after get_document() or begin_get_document() or related functions.
In the case of the blocking I/O methods like get_document(), this function will download the entire document to the file and return true if it was successfully downloaded, false otherwise.
In the case of non-blocking I/O methods like begin_get_document(), this function simply indicates an intention to download to the indicated file. It returns true if the file can be opened for writing, false otherwise, but the contents will not be completely downloaded until run() has returned false. At this time, it is possible that a communications error will have left a partial file, so is_download_complete() may be called to test this.
If subdocument_resumes is true and the document in question was previously requested as a subdocument (i.e. get_subdocument() with a first_byte value greater than zero), this will automatically seek to the appropriate byte within the file for writing the output. In this case, the file must already exist and must have at least first_byte bytes in it. If subdocument_resumes is false, a subdocument will always be downloaded beginning at the first byte of the file.
-
bool download_to_ram(Ramfile *ramfile, bool subdocument_resumes = true)
Specifies a Ramfile object to download the resulting document to. This should be called immediately after get_document() or begin_get_document() or related functions.
In the case of the blocking I/O methods like get_document(), this function will download the entire document to the Ramfile and return true if it was successfully downloaded, false otherwise.
In the case of non-blocking I/O methods like begin_get_document(), this function simply indicates an intention to download to the indicated Ramfile. It returns true if the file can be opened for writing, false otherwise, but the contents will not be completely downloaded until run() has returned false. At this time, it is possible that a communications error will have left a partial file, so is_download_complete() may be called to test this.
If subdocument_resumes is true and the document in question was previously requested as a subdocument (i.e. get_subdocument() with a first_byte value greater than zero), this will automatically seek to the appropriate byte within the Ramfile for writing the output. In this case, the Ramfile must already have at least first_byte bytes in it.
-
bool download_to_stream(std::ostream *strm, bool subdocument_resumes = true)
Specifies an ostream to download the resulting document to. This should be called immediately after get_document() or begin_get_document() or related functions.
In the case of the blocking I/O methods like get_document(), this function will download the entire document to the file and return true if it was successfully downloaded, false otherwise.
In the case of non-blocking I/O methods like begin_get_document(), this function simply indicates an intention to download to the indicated file. It returns true if the file can be opened for writing, false otherwise, but the contents will not be completely downloaded until run() has returned false. At this time, it is possible that a communications error will have left a partial file, so is_download_complete() may be called to test this.
If subdocument_resumes is true and the document in question was previously requested as a subdocument (i.e. get_subdocument() with a first_byte value greater than zero), this will automatically seek to the appropriate byte within the file for writing the output. In this case, the file must already exist and must have at least first_byte bytes in it. If subdocument_resumes is false, a subdocument will always be downloaded beginning at the first byte of the file.
-
bool get_allow_proxy(void) const
If this is true (the normal case), the HTTPClient will be consulted for information about the proxy to be used for each connection via this HTTPChannel. If this has been set to false by the user, then all connections will be made directly, regardless of the proxy settings indicated on the HTTPClient.
-
bool get_blocking_connect(void) const
If this flag is true, a socket connect will block even for nonblocking I/O calls like begin_get_document(), begin_connect_to(), etc. If false, a socket connect will not block for nonblocking I/O calls, but will block for blocking I/O calls (get_document(), connect_to(), etc.).
-
std::size_t get_bytes_downloaded(void) const
Returns the number of bytes downloaded during the last (or current) download_to_file() or download_to_ram() operation. This can be used in conjunction with get_file_size() to report the percent complete (but be careful, since get_file_size() may return 0 if the server has not told us the size of the file).
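The percent-complete calculation mentioned above needs a guard for the case where the file size is reported as 0. The following standalone sketch shows one way to write that guard; percent_complete() is a hypothetical helper, not part of HTTPChannel, and it takes the two values as plain arguments so that it does not depend on the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>

// Hypothetical helper, not part of HTTPChannel.
// bytes_downloaded: the value reported by get_bytes_downloaded().
// file_size:        the value reported by get_file_size().
// Returns false (leaving percent untouched) when the size is unknown,
// which this page says is reported as 0.
inline bool percent_complete(std::size_t bytes_downloaded,
                             std::streamsize file_size,
                             double &percent) {
  if (file_size <= 0) {
    return false;
  }
  percent = 100.0 * static_cast<double>(bytes_downloaded) /
            static_cast<double>(file_size);
  return true;
}
```

Returning a flag instead of a sentinel percentage keeps the "size unknown" case distinct from a download that has genuinely made no progress yet.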
-
std::size_t get_bytes_requested(void) const
When download throttling is in effect (set_download_throttle() has been set to true) and non-blocking I/O methods (like begin_get_document()) are used, this returns the number of bytes “requested” from the server so far: that is, the theoretical maximum value for get_bytes_downloaded(), if the server has been keeping up with our demand.
If get_bytes_downloaded() is less than this number, then the server has not been supplying bytes fast enough to meet our own download throttle rate.
When download throttling is not in effect, or when the blocking I/O methods (like get_document(), etc.) are used, this returns 0.
-
static TypeHandle get_class_type(void)
-
HTTPClient *get_client(void) const
Returns the HTTPClient object that owns this channel.
-
double get_connect_timeout(void) const
Returns the length of time, in seconds, to wait for a new nonblocking socket to connect. See set_connect_timeout().
-
SocketStream *get_connection(void)
Returns the connection that was established via a previous call to connect_to() or begin_connect_to(), or NULL if the connection attempt failed or if those methods have not recently been called.
This stream has been allocated from the free store. It is the user’s responsibility to delete this pointer when finished with it.
-
std::string get_content_type(void) const
Returns the value of the Content-Type header.
-
bool get_document(DocumentSpec const &url)
Opens the named document for reading, if available. Returns true if successful, false otherwise.
-
DocumentSpec const &get_document_spec(void) const
Returns the DocumentSpec associated with the most recent document. This includes its actual URL (following redirects) along with the identity tag and last-modified date, if supplied by the server.
This structure may be saved and used to retrieve the same version of the document later, or to conditionally retrieve a newer version if it is available.
-
bool get_download_throttle(void) const
Returns whether the nonblocking downloads will be bandwidth-limited. See set_download_throttle().
-
std::streamsize get_file_size(void) const
Returns the size of the file, if it is known. Returns the value set by set_expected_file_size() if the file size is not known, or 0 if this value was not set.
If the file is dynamically generated, the size may not be available until a read has started (e.g. open_read_body() has been called); and even then it may increase as more of the file is read due to the nature of HTTP/1.1 requests which can change their minds midstream about how much data they’re sending you.
-
std::size_t get_first_byte_delivered(void) const
Returns the first byte of the file (that will be) delivered by the server in response to the current request. Normally, this is the same as get_first_byte_requested(), but some servers will ignore a subdocument request and always return the whole file, in which case this value will be 0, regardless of what was requested to get_subdocument().
-
std::size_t get_first_byte_requested(void) const
Returns the first byte of the file requested by the request. This will normally be 0 to indicate that the file is being requested from the beginning, but if the file was requested via a get_subdocument() call, this will contain the first_byte parameter from that call.
-
bool get_header(DocumentSpec const &url)
Like get_document(), except only the header associated with the document is retrieved. This may be used to test for existence of the document; it might also return the size of the document (if the server gives us this information).
-
std::string get_header_value(std::string const &key) const
Returns the HTTP header value associated with the indicated key, or an empty string if the key was not defined in the message returned by the server.
-
double get_http_timeout(void) const
Returns the length of time, in seconds, to wait for the HTTP server to respond to our request. See set_http_timeout().
-
HTTPEnum::HTTPVersion get_http_version(void) const
Returns the HTTP version number returned by the server, as one of the HTTPClient enumerated types, e.g. HTTPClient::HV_11.
-
std::string const &get_http_version_string(void) const
Returns the HTTP version number returned by the server, formatted as a string, e.g. “HTTP/1.1”.
-
double get_idle_timeout(void) const
Returns the amount of time, in seconds, in which a previously-established connection is allowed to remain open and unused. See set_idle_timeout().
-
std::size_t get_last_byte_delivered(void) const
Returns the last byte of the file (that will be) delivered by the server in response to the current request. Normally, this is the same as get_last_byte_requested(), but some servers will ignore a subdocument request and always return the whole file, in which case this value will be 0, regardless of what was requested to get_subdocument().
-
std::size_t get_last_byte_requested(void) const
Returns the last byte of the file requested by the request. This will normally be 0 to indicate that the file is being requested to its last byte, but if the file was requested via a get_subdocument() call, this will contain the last_byte parameter from that call.
-
double get_max_bytes_per_second(void) const
Returns the maximum number of bytes per second that may be consumed by this channel when get_download_throttle() is true.
-
double get_max_updates_per_second(void) const
Returns the maximum number of times per second that run() will do anything at all, when get_download_throttle() is true.
-
int get_num_redirect_steps(void) const
If the document automatically followed one or more redirects, this will return the number of redirects that were automatically followed. Use get_redirect_step() to retrieve each URL in sequence.
-
bool get_options(DocumentSpec const &url)
Sends an OPTIONS message to the server, which should query the available options, possibly in relation to a specified URL.
-
bool get_persistent_connection(void) const
Returns whether the HTTPChannel should try to keep the connection to the server open and reuse that connection for multiple documents, or whether it should close the connection and open a new one for each request. See set_persistent_connection().
-
std::string const &get_proxy_realm(void) const
If the document failed to connect because of a 407 (Proxy authorization required), this method will return the “realm” returned by the proxy. This string may be presented to the user to request an associated username and password (which then should be stored in HTTPClient::set_username()).
-
bool get_proxy_tunnel(void) const
Returns true if connections always tunnel through a proxy, or false (the normal case) if we allow the proxy to serve up documents. See set_proxy_tunnel().
-
URLSpec const &get_redirect(void) const
If the document failed with a redirect code (300 series), this will generally contain the new URL the server wants us to try. In many cases, the client will automatically follow redirects; if these are successful the client will return a successful code and get_redirect() will return empty, but get_url() will return the new, redirected URL.
-
URLSpec const &get_redirect_step(int n) const
Use in conjunction with get_num_redirect_steps() to extract the chain of URLs that the channel was automatically redirected through to arrive at the final document.
-
std::size_t get_skip_body_size(void) const
Returns the maximum number of bytes in a received (but unwanted) body that will be skipped past, in order to reset to a new request. See set_skip_body_size().
-
int get_status_code(void) const
Returns the HTTP return code from the document retrieval request. This will be in the 200 range if the document is successfully retrieved, or some other value in the case of an error.
Some proxy errors during an https-over-proxy request would return the same status code as a different error that occurred on the host server. To differentiate these cases, status codes that are returned by the proxy during the CONNECT phase (except code 407) are incremented by 1000.
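The 1000 offset described above can be undone on the caller's side. The sketch below is a standalone illustration under the assumption that any value of 1000 or more is an offset proxy code; decode_status() and DecodedStatus are hypothetical names, not part of HTTPChannel.

```cpp
#include <cassert>

// Hypothetical result type for this illustration.
struct DecodedStatus {
  bool from_proxy_connect;  // true if the code carried the 1000 offset
  int code;                 // the code with any offset removed
};

// Hypothetical helper, not part of HTTPChannel. Assumes that values
// >= 1000 are proxy CONNECT-phase codes incremented by 1000, as
// described above. A 407 is documented as NOT being incremented, so
// it passes through unflagged here.
inline DecodedStatus decode_status(int status_code) {
  if (status_code >= 1000) {
    return DecodedStatus{true, status_code - 1000};
  }
  return DecodedStatus{false, status_code};
}
```

With this reading, a 503 from the proxy during CONNECT would arrive as 1503, while a 503 from the host server would arrive unchanged.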
-
std::string get_status_string(void) const
Returns the string as returned by the server describing the status code for humans. This may or may not be meaningful.
-
bool get_subdocument(DocumentSpec const &url, std::size_t first_byte, std::size_t last_byte)
Retrieves only the specified byte range of the indicated document. If last_byte is 0, it stands for the last byte of the document. When a subdocument is requested, get_file_size() and get_bytes_downloaded() will report the number of bytes of the subdocument, not of the complete document.
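To make the byte-range convention concrete, the following standalone sketch computes how many bytes a subdocument request would cover. It assumes that last_byte is inclusive (as in an HTTP Range header) and that both offsets lie within the document; neither assumption is stated on this page. expected_subdocument_size() is a hypothetical helper, not part of HTTPChannel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of HTTPChannel.
// Assumptions (not confirmed by this page): last_byte is inclusive,
// first_byte < document_size, and last_byte < document_size.
inline std::size_t expected_subdocument_size(std::size_t first_byte,
                                             std::size_t last_byte,
                                             std::size_t document_size) {
  if (last_byte == 0) {
    // 0 stands for the last byte of the document.
    return document_size - first_byte;
  }
  return last_byte - first_byte + 1;
}
```

Under these assumptions, a request for bytes 100 through 199 of a 1000-byte document covers 100 bytes, and a request with first_byte 100 and last_byte 0 covers the remaining 900.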
-
bool get_trace(DocumentSpec const &url)
Sends a TRACE message to the server, which should return back the same message as the server received it, allowing inspection of proxy hops, etc.
-
URLSpec const &get_url(void) const
Returns the URL that was used to retrieve the most recent document: whatever URL was last passed to get_document() or get_header(). If a redirect has transparently occurred, this will return the new, redirected URL (the actual URL at which the document was located).
-
std::string const &get_www_realm(void) const
If the document failed to connect because of a 401 (Authorization required), this method will return the “realm” returned by the server in which the requested document must be authenticated. This string may be presented to the user to request an associated username and password (which then should be stored in HTTPClient::set_username()).
-
bool is_connection_ready(void) const
Returns true if a connection has been established to the named server in a previous call to connect_to() or begin_connect_to(), false otherwise.
-
bool is_download_complete(void) const
Returns true when a download_to() or download_to_ram() has executed and the file has been fully downloaded. If this still returns false after processing has completed, there was an error in transmission.
Note that simply testing is_download_complete() does not prove that the requested document was successfully retrieved; you might have just downloaded the “404 not found” stub (for instance) that a server would provide in response to some error condition. You should also check is_valid() to prove that the file you expected has been successfully retrieved.
-
bool is_file_size_known(void) const
Returns true if the size of the file we are currently retrieving was told to us by the server and thus is reliably known, or false if the size reported by get_file_size() represents an educated guess (possibly as set by set_expected_file_size(), or as inferred from a chunked transfer encoding in progress).
-
bool is_valid(void) const
Returns true if the last-requested document was successfully retrieved and is ready to be read, false otherwise.
-
ISocketStream *open_read_body(void)
Returns a newly-allocated istream suitable for reading the body of the document. This may only be called immediately after a call to get_document() or post_form(), or after a call to run() has returned false.
Note that, in nonblocking mode, the returned stream may report an early EOF, even before the actual end of file. When this happens, you should call stream->is_closed() to determine whether you should attempt to read some more later.
The user is responsible for passing the returned istream to close_read_body() later.
-
bool post_form(DocumentSpec const &url, std::string const &body)
Posts form data to a particular URL and retrieves the response.
-
void preserve_status(void)
Preserves the previous status code (presumably a failure) from the previous connection attempt. If the subsequent connection attempt also fails, the returned status code will be the better of the previous code and the current code.
This can be called to daisy-chain subsequent attempts to download the same document from different servers. After all servers have been attempted, the final status code will reflect the attempt that most nearly succeeded.
-
bool put_document(DocumentSpec const &url, std::string const &body)
Uploads the indicated body to the server to replace the indicated URL, if the server allows this.
-
void reset(void)
Stops whatever file transaction is currently in progress, closes the connection, and resets to begin anew. You shouldn’t ever need to call this, since the channel should be able to reset itself cleanly between requests, but it is provided in case you are an especially nervous type.
Don’t call this after every request unless you set set_persistent_connection() to false, since calling reset() rudely closes the connection regardless of whether we have told the server we intend to keep it open or not.
-
bool run(void)
This must be called from time to time when non-blocking I/O is in use. It checks for data coming in on the socket and writes data out to the socket when possible, and does whatever processing is required towards completing the current task.
The return value is true if the task is still pending (and run() will need to be called again in the future), or false if the current task is complete.
-
void send_extra_header(std::string const &key, std::string const &value)
Specifies an additional key: value pair that is added into the header sent to the server with the next request. This is passed along with no interpretation by the HTTPChannel code. You may call this repeatedly to append multiple headers.
This is persistent for one request only; it must be set again for each new request.
-
void set_allow_proxy(bool allow_proxy)
If this is true (the normal case), the HTTPClient will be consulted for information about the proxy to be used for each connection via this HTTPChannel. If this has been set to false by the user, then all connections will be made directly, regardless of the proxy settings indicated on the HTTPClient.
-
void set_blocking_connect(bool blocking_connect)
If this flag is true, a socket connect will block even for nonblocking I/O calls like begin_get_document(), begin_connect_to(), etc. If false, a socket connect will not block for nonblocking I/O calls, but will block for blocking I/O calls (get_document(), connect_to(), etc.).
Setting this true is useful when you want to use non-blocking I/O once you have established the connection, but you don’t want to bother with polling for the initial connection. It’s also useful when you don’t particularly care about non-blocking I/O, but you need to respect timeouts like connect_timeout and http_timeout.
-
void set_connect_timeout(double timeout_seconds)
Sets the maximum length of time, in seconds, that the channel will wait before giving up on establishing a TCP connection.
At present, this is used only for the nonblocking interfaces (e.g. begin_get_document(), begin_connect_to()), but it is used whether set_blocking_connect() is true or false.
-
void set_content_type(std::string content_type)
Specifies the Content-Type header, useful for applications that require different types of content, such as JSON.
-
void set_download_throttle(bool download_throttle)
Specifies whether nonblocking downloads (via download_to_file() or download_to_ram()) will be limited so as not to use all available bandwidth.
If this is true, when a download has been started on this channel it will be invoked no more frequently than get_max_updates_per_second(), and the total bandwidth used by the download will be no more than get_max_bytes_per_second(). If this is false, downloads will proceed as fast as the server can send the data.
This only has effect on the nonblocking I/O methods like begin_get_document(), etc. The blocking methods like get_document() always use as much CPU and bandwidth as they can get.
-
void set_expected_file_size(std::size_t file_size)
This may be called immediately after a call to get_document() or some related function to specify the expected size of the document we are retrieving, if we happen to know. This is used as the return value to get_file_size() only in the case that the server does not tell us the actual file size.
-
void set_http_timeout(double timeout_seconds)
Sets the maximum length of time, in seconds, that the channel will wait for the HTTP server to finish sending its response to our request.
The timer starts counting after the TCP connection has been established (see set_connect_timeout(), above) and the request has been sent.
At present, this is used only for the nonblocking interfaces (e.g. begin_get_document(), begin_connect_to()), but it is used whether set_blocking_connect() is true or false.
-
void set_idle_timeout(double idle_timeout)
Specifies the amount of time, in seconds, in which a previously-established connection is allowed to remain open and unused. If a previous connection has remained unused for at least this number of seconds, it will be closed and a new connection will be opened; otherwise, the same connection will be reused for the next request (for this particular HTTPChannel).
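The reuse rule stated above reduces to a single comparison. The standalone sketch below spells it out; connection_expired() is a hypothetical helper, not part of HTTPChannel, and how the channel actually measures the unused interval is not described on this page.

```cpp
#include <cassert>

// Hypothetical helper, not part of HTTPChannel.
// seconds_unused: how long the existing connection has sat idle.
// idle_timeout:   the value passed to set_idle_timeout().
// Returns true when the connection would be closed and reopened,
// i.e. when it has been unused for at least idle_timeout seconds.
inline bool connection_expired(double seconds_unused, double idle_timeout) {
  return seconds_unused >= idle_timeout;
}
```

Note the boundary: "at least this number of seconds" means a connection idle for exactly the timeout is treated as expired.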
-
void set_max_bytes_per_second(double max_bytes_per_second)
When bandwidth throttling is in effect (see set_download_throttle()), this specifies the maximum number of bytes per second that may be consumed by this channel.
-
void set_max_updates_per_second(double max_updates_per_second)
When bandwidth throttling is in effect (see set_download_throttle()), this specifies the maximum number of times per second that run() will attempt to do any downloading at all.
-
void set_persistent_connection(bool persistent_connection)
Indicates whether the HTTPChannel should try to keep the connection to the server open and reuse that connection for multiple documents, or whether it should close the connection and open a new one for each request. Set this true to keep the connections around when possible, false to recycle them.
It makes most sense to set this false when the HTTPChannel will be used only once to retrieve a single document, true when you will be using the same HTTPChannel object to retrieve multiple documents.
-
void set_proxy_tunnel(bool proxy_tunnel)
Normally, a proxy is itself asked for ordinary URLs, and the proxy decides whether to hand the client a cached version of the document or to contact the server for a fresh version. The proxy may also modify the headers and transfer encoding on the way.
If this is set to true, then instead of asking for URLs from the proxy, we will ask the proxy to open a connection to the server (for instance, on port 80); if the proxy honors this request, then we contact the server directly through this connection to retrieve the document. If the proxy does not honor the connect request, then the retrieve operation fails.
SSL connections (e.g. https), and connections through a Socks proxy, are always tunneled, regardless of the setting of this flag.
-
void set_skip_body_size(std::size_t skip_body_size)
Specifies the maximum number of bytes in a received (but unwanted) body that will be skipped past, in order to reset to a new request.
That is, if this HTTPChannel requests a file via get_document(), but does not call download_to_ram(), download_to_file(), or open_read_body(), and instead immediately requests a new file, then the HTTPChannel has a choice whether to skip past the unwanted document, or to close the connection and open a new one. If the number of bytes to skip is more than this threshold, the connection will be closed; otherwise, the data will simply be read and discarded.
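The choice described above can be written as a small decision function. This is a standalone sketch of the stated rule only; UnwantedBodyAction and choose_unwanted_body_action() are hypothetical names, not part of HTTPChannel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for the two outcomes described above.
enum class UnwantedBodyAction {
  read_and_discard,  // skip past the unwanted body on the same connection
  close_connection,  // drop the connection and open a new one
};

// Hypothetical helper, not part of HTTPChannel.
// bytes_to_skip:  size of the unwanted body still to be read.
// skip_body_size: the threshold passed to set_skip_body_size().
inline UnwantedBodyAction choose_unwanted_body_action(
    std::size_t bytes_to_skip, std::size_t skip_body_size) {
  if (bytes_to_skip > skip_body_size) {
    return UnwantedBodyAction::close_connection;
  }
  return UnwantedBodyAction::read_and_discard;
}
```

A body exactly equal to the threshold is read and discarded, since the connection is closed only when the size is more than the threshold.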
-
bool will_close_connection(void) const
Returns true if the server has indicated it will close the connection after this document has been read, or false if it will remain open (and future documents may be requested on the same connection).
-
void write_headers(std::ostream &out) const
Outputs a list of all headers defined by the server to the indicated output stream.