/var/lib/sorcery/modules/liburl

This file contains functions for downloading and verifying urls. It does this by extracting information from a url with url handlers specific to that url type. The url handler also determines a download handler to actually download the url. For example, the request to download the following url is made through the generic url_download function:

  http://machinename.com/path/to/file.tar.bz2

The url_download function parses the url prefix (in this case, http) and passes the url to the http download handler. A similar approach is used for url verification.

This file provides an infrastructure that makes it relatively easy to add new url handlers. To add a new handler, all that has to be done is to add a new file to the sorcerer library directory with the new url handler functions defined in it. This new file will automatically be discovered and used by the sorcerer scripts. The following section describes how to add new url handlers in a little more detail.

WRITING NEW URL HANDLERS

This section describes the steps needed to write new url handlers.

Decide on the Url Format

Urls must be of the form <prefix>://<address>. The prefix should be something unique. Only the prefix is used by this script; the address is not parsed or used, it is simply passed to the appropriate url handler.
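
For example, in the hypothetical url below, xyz is the prefix and everything after :// is the address that would be handed, unchanged, to the xyz url handler:

  xyz://server.example.org/pub/file.tar.bz2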

Create a File to Hold the New Url Handling Functions

In the SGL library directory (i.e., the directory pointed to by the SGL_LIBRARY variable), create a new file called url_<prefix>. For example, if your new url prefix is xyz, you should create a new file called url_xyz. The file should be executable.
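
As a small sketch, assuming the hypothetical xyz prefix from above and an SGL_LIBRARY variable that points at the library directory, the file could be created like this:

  touch "$SGL_LIBRARY/url_xyz"
  chmod +x "$SGL_LIBRARY/url_xyz"    # the file should be executable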

Implement Url Handlers

The next step is to write the actual functions that will handle url requests and put them in the new file you just created. The functions that must be implemented are:
url_<prefix>_bucketize
url_<prefix>_crack
url_<prefix>_expand
url_<prefix>_hostname
url_<prefix>_is_valid
url_<prefix>_netselect
url_<prefix>_verify
The easiest way to figure out what to do is to look at one of the existing files (e.g., url_http handles http requests).
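
The sketch below shows what a skeleton url_xyz file might look like. It is only an illustration: the xyz prefix is hypothetical, the function bodies are placeholders, and the exact output each function is expected to produce is described by the corresponding url_* wrappers documented below.

  # url_xyz -- handlers for urls of the form xyz://<address>

  # Echo the name of the download handler for this url type.
  # Which handler names are valid depends on the download handlers
  # available on the system; "xyz" here is only a placeholder.
  url_xyz_bucketize() {
    echo "xyz"
  }

  # Break the url into whatever fields the matching download handler
  # expects (the format is handler specific).
  url_xyz_crack() {
    echo "${1#xyz://}"
  }

  # Expand the url into a list of candidate urls; most handlers
  # simply echo the input back.
  url_xyz_expand() {
    echo "$@"
  }

  # Echo the hostname portion of the url.
  url_xyz_hostname() {
    local address="${1#xyz://}"
    echo "${address%%/*}"
  }

  # Return 0 if the url looks like something this handler understands.
  url_xyz_is_valid() {
    [[ $1 == xyz://?* ]]
  }

  # Echo netselect output for the url's host.
  url_xyz_netselect() {
    netselect "$(url_xyz_hostname "$1")" 2>/dev/null
  }

  # Return 0 if the url appears to be good.
  url_xyz_verify() {
    url_xyz_is_valid "$1"
  }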

Handling Multiple Url Types in a Single File

It's perfectly valid for a file to handle multiple types of urls. The url_http file actually handles ftp, http, and https urls. Take a look at the file to see how it's done.

Synopsis

Functions that download and verify urls.


function url_download_expand_sort()

Parameters:

  • $1: see url_download

Description

This is simply a wrapper around url_download which provides the additional functionality of expanding and ranking urls. Processes like summon should use this; more narrowly defined processes like scribe and sorcery update will probably call url_download directly.


function url_download()

Parameters:

  • $1: target Expected target file or tree; this is only a suggestion, and the download handler may ignore it.
  • $2: url_list List of urls to get the target from.
  • $3: hints Hints; these help the function determine what type of thing it is downloading, as well as other tunables.
  • $4: udl_target Name of the variable in which to store the name of the resulting directory or file.
  • $5: udl_type Name of the variable in which to store the type of thing downloaded.

Returns:

  • 0 if file could be downloaded
  • 1 otherwise

Description

Downloads the specified url. Returns true if the file could be downloaded, false otherwise.
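
A hedged usage sketch follows; the target name, hints, and result variable names are hypothetical, and note that $4 and $5 are the names of variables that url_download fills in for the caller:

  url_list="http://machinename.com/path/to/file.tar.bz2"
  # hints left empty in this sketch
  if url_download "file.tar.bz2" "$url_list" "" dl_result dl_type; then
    # dl_result and dl_type were filled in by url_download
    echo "downloaded $dl_result (type: $dl_type)"
  else
    echo "download failed" >&2
  fi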


function url_get_valid_urls()

Parameters:

  • $1: urllist

function url_expand_urls()

Parameters:

  • $1: urllist

Description

Frontend to expand urls into a nice list.


function url_sort_urls()

Parameters:

  • $1: urllist

function url_rank()

Parameters:

  • $1: urllist

Stdout

new list

Description

Ranks the urls in order of speed from fastest to slowest using netselect. Makes use of the url_<prefix>_hostname functions. If multiple urls from the same hostname are passed in, their ordering is preserved, although that group of urls may move as a whole in the list.
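
As an illustration of that guarantee (hostnames hypothetical), suppose netselect measures mirror-b as faster than mirror-a; the mirror-a urls move as a group but keep their relative order:

  input list:
    http://mirror-a.example.org/pkg.tar.bz2
    http://mirror-a.example.org/pkg.tar.bz2.sig
    http://mirror-b.example.org/pkg.tar.bz2

  possible ranked output:
    http://mirror-b.example.org/pkg.tar.bz2
    http://mirror-a.example.org/pkg.tar.bz2
    http://mirror-a.example.org/pkg.tar.bz2.sig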


function url_get_prefix()

Parameters:

  • $1: url

Returns:

  • 0 valid url
  • 1 otherwise

Stdout

url prefix

Type

Private

Description

Takes a url and echoes the url prefix. Returns true if a valid url could be found, returns false otherwise. This is the only place parsing of a url should take place outside of a url_handler. Doing so elsewhere is bugworthy.
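
A small usage sketch (the url is the example from the introduction):

  url="http://machinename.com/path/to/file.tar.bz2"
  if prefix=$(url_get_prefix "$url"); then
    echo "prefix is $prefix"    # prints: prefix is http
  else
    echo "not a recognized url" >&2
  fi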


function url_strip_prefix()

Parameters:

  • $1: url
  • $2: url prefix

Type

Private

Stdout

url sans prefix
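
A hedged sketch of how the two private helpers combine; the exact form of the stripped output is determined by the library, so the comment only restates what is documented:

  prefix=$(url_get_prefix "$url") &&
    address=$(url_strip_prefix "$url" "$prefix")
  # address now holds the url with its prefix removed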


function url_bucketize()

Parameters:

  • $1: url

Stdout

dl handler

Description

Gets the download handler for this url.
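
A small usage sketch; which download handler names exist is not documented here, so the result is only captured, not interpreted:

  dl_handler=$(url_bucketize "$url")
  # dl_handler now names the download handler responsible for this url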


function url_crack()

Parameters:

  • $1: url

Description

Parses the url; the results are url specific right now. This is usually only called by dl handlers that know what url types they can handle and thus understand the return value.


function url_expand()

Parameters:

  • $1: url(s)

Stdout

urls

Description

Attempts to find more urls similar to the given one, based on the sorcery mirrors files. Most url types simply expand to the input.
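
A small usage sketch:

  expanded=$(url_expand "$url")
  # for most url types $expanded is just $url; mirror-backed types
  # may print additional, similar urls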


function url_verify()

Parameters:

  • $1: url

Description

Verifies the url; this usually means going out to the internet and somehow determining whether the link is good.
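
A hedged usage sketch, assuming success is signalled through the return status as with the other checks in this file:

  if url_verify "$url"; then
    echo "url appears to be good"
  else
    echo "could not verify url" >&2
  fi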


function url_hostname()

Parameters:

  • $1: url

Returns:

  • 0 valid url
  • 1 otherwise

Stdout

url hostname

Description

Takes a url and echoes the url hostname. Returns true if a hostname could be found, returns false otherwise.
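
A small usage sketch (url taken from the introduction):

  host=$(url_hostname "http://machinename.com/path/to/file.tar.bz2")
  echo "$host"    # prints: machinename.com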


function url_netselect()

Parameters:

  • $1: url

Returns:

  • 0 valid url
  • 1 otherwise

Stdout

url netselect output

Description

Prints the netselect output from the url handler's attempt at running netselect.


function url_is_valid()

Parameters:

  • $1: url

Returns:

  • 0 valid url
  • 1 otherwise

Description

Returns true if the given url is a valid url understood by the url library, returns false otherwise.
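
A small usage sketch with a hypothetical xyz url:

  if url_is_valid "xyz://server.example.org/file.tar.bz2"; then
    echo "a url handler for the xyz prefix is available"
  fi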


function url_generic_apifunc()

Parameters:

  • $1: function name
  • $2: url(s)

Description

This implements the common code for simple url handler inheritance. The url api functions above call this, which then calls the handler-specific function if it exists, or the default version otherwise. This allows url handlers to override only the functions they need to. If multiple urls are given, the prefix of all of them is assumed to be the same.
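
The following is only a rough sketch of how such dispatch could look, not the actual implementation; the url_default_* naming and the internals are assumptions, while url_get_prefix is the documented helper above:

  url_generic_apifunc() {
    local func=$1 first prefix
    shift
    # all urls are assumed to share the same prefix, so only the
    # first one is inspected
    first=${1%%[[:space:]]*}
    prefix=$(url_get_prefix "$first") || return 1
    if declare -f "url_${prefix}_${func}" > /dev/null; then
      "url_${prefix}_${func}" "$@"    # handler-specific override
    else
      "url_default_${func}" "$@"      # assumed name for the default version
    fi
  }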