wget(1)


NAME
       wget - a utility to retrieve files from the World Wide Web

SYNOPSIS
       wget [options] [URL-list]

WARNING
       The information in this man page is an  extract  from  the
       full  documentation  of  Wget.   It  is  well out of date.
       Please refer to the info page for full,  up-to-date  docu-
       mentation.   You  can view the info documentation with the
       Emacs info subsystem or the standalone info program.

DESCRIPTION
       Wget is a utility designed for retrieving binary documents
       across the Web, through the use of HTTP (Hyper Text Trans-
       fer Protocol) and FTP (File Transfer Protocol), and saving
       them to disk.  Wget is non-interactive, which means it can
       work in the background while the user is not logged in,
       unlike most web browsers (thus you may start the program
       and log off, letting it do its work).  Analysing server
       responses, it distinguishes between correctly and
       incorrectly retrieved documents, and retries retrieving
       them as many times as necessary, or until a user-specified
       limit is reached.  The FTP REST (restart) command is used
       on hosts that support it, so that interrupted transfers
       can be resumed.  Proxy servers are supported to speed up
       retrieval and lighten network load.

       Wget supports a full-featured recursion mechanism, through
       which  you  can  retrieve large parts of the web, creating
       local copies of remote directory hierarchies.  Of course,
       the maximum recursion level and other parameters can be
       specified.  Infinite recursion loops are always avoided by
       hashing the retrieved data.  All of this works for both
       HTTP and FTP.

       The retrieval is conveniently traced by printing dots,
       each dot representing one kilobyte of received data.
       Built-in features offer mechanisms to tune which links you
       wish to follow (cf. -L, -D and -H).

URL CONVENTIONS
       Most  of the URL conventions described in RFC1738 are sup-
       ported. Two alternative syntaxes are also supported, which
       means  you  can  use  three  forms of address to specify a
       file:

       Normal URL (recommended form):
       http://host[:port]/path
       http://fly.cc.fer.hr/
       ftp://ftp.xemacs.org/pub/xemacs/xemacs-19.14.tar.gz
       ftp://username:password@host/dir/file

       FTP only (ncftp-like): hostname:/dir/file

       HTTP only (netscape-like):
       hostname(:port)/dir/file

       You may encode your username and/or password in the URL
       using the form:

       ftp://user:password@host/dir/file

       If  you  do  not  understand  these syntaxes, just use the
       plain ordinary syntax with which you would  call  lynx  or
       netscape.  Note that the alternative forms are deprecated,
       and may cease being supported in the future.

OPTIONS
       There are quite a few command-line options for wget.  Note
       that you do not have to know or use them unless you wish
       to change the default behaviour of the program.  For
       simple operations you need no options at all.  It is also
       a good idea to put frequently used command-line options in
       .wgetrc, where they can be stored in a more readable form.

       This is the complete list of  options  with  descriptions,
       sorted in descending order of importance:

       -h --help
              Print  a help screen. You will also get help if you
              do not supply command-line arguments.

       -V --version
              Display version of wget.

       -v --verbose
               Verbose output, with all the available data.  The
               default output consists only of saving updates and
               error messages.  Verbose output is the default
               when the output goes to stdout.

       -q --quiet
              Quiet mode, with no output at all.

       -d --debug
               Debug output; this works only if wget was compiled
               with -DDEBUG.  Note that when the program is
               compiled with debug support, the debug output is
               not printed unless you specify -d.

       -i filename --input-file=filename
              Read URL-s from filename, in which  case  no  URL-s
               need to be on the command line.  If there are
               URL-s both on the command line and in an input
               file, those on the command line are retrieved
               first.  The file need not be an HTML document (but
               no harm if it is); it is enough if the URL-s are
               just listed sequentially.
              However, if you specify --force-html, the  document
              will be regarded as HTML. In that case you may have
              problems with relative links, which you  can  solve
              either  by adding <base href="url"> to the document
              or by specifying --base=url on the command-line.
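
               For example, to retrieve every URL listed one per
               line in a file (the filename urls.txt is only
               illustrative):

                    wget -i urls.txt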

       -o logfile --output-file=logfile
               Log messages to logfile, instead of the default
               stdout.  Verbose output is the default when
               logging to a file.  If you do not wish it, use -nv
               (non-verbose).

       -a logfile --append-output=logfile
               Append to logfile - the same as -o, but appending
               to the logfile (or creating a new one if it does
               not exist) instead of overwriting the old log
               file.

       -t num --tries=num
              Set number of retries to num.  Specify 0 for  infi-
              nite retrying.

       --follow-ftp
              Follow FTP links from HTML documents.

       -c --continue-ftp
               Continue retrieval of FTP documents from where it
               was left off.  If you specify "wget -c ftp://sun-
               site.doc.ic.ac.uk/ls-lR.Z", and there is already a
               file named ls-lR.Z in the current directory, wget
               will continue retrieval from the offset equal to
               the length of the existing file.  Note that you do
               not need to specify this option if the only thing
               you want is for wget to continue retrieving where
               it left off when the connection is lost - wget
               does this by default.  You need this option only
               to continue retrieval of a file that is already
               halfway retrieved, saved by other FTP software, or
               left behind by a killed wget.

       -g on/off --glob=on/off
               Turn FTP globbing on or off.  By default, globbing
               will be turned on if the URL contains globbing
               characters (an asterisk, for example).  Globbing
               means that you may use special characters (wild-
               cards) to retrieve several files from the same
               directory at once, like wget
               ftp://gnjilux.cc.fer.hr/*.msg.  Globbing currently
               works only on UNIX FTP servers.

       -e command --execute=command
              Execute command, as if it were a  part  of  .wgetrc
              file.  A  command invoked this way will take prece-
              dence over the same command in .wgetrc, if there is
              one.

       -N --timestamping
               Use so-called time-stamps to determine whether to
               retrieve a file.  If the last-modification date of
               the remote file is equal to or older than that of
               the local file, and the sizes of the two files are
               equal, the remote file will not be retrieved.
               This option is useful for weekly mirroring of HTTP
               or FTP sites, since it will not permit downloading
               the same file twice.

       -F --force-html
              When input is read from a  file,  force  it  to  be
              HTML.  This  enables you to retrieve relative links
              from existing HTML files on  your  local  disk,  by
              adding <base href> to HTML, or using --base.

       -B base_href --base=base_href
              Use  base_href  as base reference, as if it were in
              the file, in the form <base href="base_href">. Note
              that the base in the file will take precedence over
              the one on the command-line.

       -r --recursive
               Recursive web-suck.  According to the protocol of
               the URL, this can mean two things.  Recursive
               retrieval of an HTTP URL means that Wget will
               download the URL you want, parse it as an HTML
               document (if it is one), and retrieve the files
               this document refers to, down to a certain depth
               (default 5; change it with -l).  Wget will create
               a hierarchy of directories locally, corresponding
               to the one found on the HTTP server.
              This option is ideal for presentations, where  slow
              connections should be bypassed. The results will be
              especially good if relative links were used,  since
              the  pages will then work on the new location with-
              out change.
              When using this option with an  FTP  URL,  it  will
              retrieve  all the data from the given directory and
              subdirectories,   similar   to    HTTP    recursive
              retrieval.
               You should be warned that invoking this option may
               cause a heavy load on your connection.  The load
               can be minimized by lowering the maximum recursion
               level (see -l) and/or by lowering the number of
               retries (see -t), as in the example below.
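
               For example, a cautious recursive retrieval that
               combines these limits might look like this (the
               URL is only illustrative):

                    wget -r -l2 -t3 http://fly.cc.fer.hr/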

       -m --mirror
              Turn  on mirroring options. This will set recursion
              and time-stamping, combining -r and -N.
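
               For example, a periodic mirroring job could simply
               run the following (the URL is illustrative):

                    wget -m http://fly.cc.fer.hr/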

       -l depth --level=depth
               Set the recursion depth to the specified level.
               The default is 5.  After the given recursion level
               is reached, the sucking will proceed from the par-
               ent.  Thus -r -l1 should be equivalent to a recur-
               sion-less retrieval of the file.  Setting the
               level to zero makes the recursion depth (theoreti-
               cally) unlimited.  Note that the number of
               retrieved documents can grow exponentially with
               the depth level.

       -H --span-hosts
              Enable  spanning  across hosts when doing recursive
              retrieving. See -r and -D. Refer to FOLLOWING LINKS
              for a more detailed description.

       -L --relative
              Follow only relative links. Useful for retrieving a
              specific homepage  without  any  distractions,  not
              even  those  from the same host. Refer to FOLLOWING
              LINKS for a more detailed description.

       -D domain-list --domains=domain-list
              Set domains to be accepted and DNS looked-up, where
              domain-list is a comma-separated list. Note that it
              does not turn on -H. This speeds things up, even if
              only  one host is spanned. Refer to FOLLOWING LINKS
              for a more detailed description.

       -A acclist / -R rejlist --accept=acclist /
              --reject=rejlist
              Comma-separated     list     of    extensions    to
              accept/reject. For example, if you wish to download
              only  GIFs and JPEGs, you will use -A gif,jpg,jpeg.
              If you wish to download everything  except  cumber-
              some   MPEGs   and  .AU  files,  you  will  use  -R
              mpg,mpeg,au.

        -X list --exclude-directories=list
              Comma-separated list of directories to exclude from
              FTP fetching.

       -P prefix --directory-prefix=prefix
              Set  directory  prefix  ("." by default) to prefix.
              The directory prefix is  the  directory  where  all
              other files and subdirectories will be saved to.

       -T value --timeout=value
              Set the read timeout to a specified value. Whenever
              a read is issued, the file  descriptor  is  checked
              for a possible timeout, which could otherwise leave
              a  pending  connection  (uninterrupted  read).  The
              default timeout is 900 seconds (fifteen minutes).

       -Y on/off --proxy=on/off
               Turn proxy support on or off.  The proxy is on by
               default if the appropriate environment variable is
               defined.

       -Q quota[KM] --quota=quota[KM]
               Specify the download quota, in bytes (default),
               kilobytes or megabytes.  This is more useful in
               the rc file; see quota below.

       -O filename --output-document=filename
               The documents will not be written to the appropri-
               ate files; instead, all of them will be appended
               to the single file named by this option.  The num-
               ber of tries will be automatically set to 1.  If
               this filename is `-', the documents will be writ-
               ten to stdout, and --quiet will be turned on.  Use
               this option with caution, since it turns off all
               the diagnostics Wget can otherwise give about var-
               ious errors.

       -S --server-response
              Print  the  headers  sent by the HTTP server and/or
              responses sent by the FTP server.

       -s --save-headers
              Save the headers sent by the  HTTP  server  to  the
              file, before the actual contents.

       --header=additional-header
               Define an additional header.  You can define more
               than one additional header.  Do not try to termi-
               nate the header with CR or LF.
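
               For example, to send two extra headers (the header
               values here are only illustrative):

                    wget --header='Accept-Language: hr' \
                         --header='Pragma: no-cache' \
                         http://fly.cc.fer.hr/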

       --http-user --http-passwd
               Use these two options to set the username and
               password Wget will send to HTTP servers.  Wget
               supports only the basic WWW authentication scheme.
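
               A minimal sketch, assuming these options take
               =value arguments like the other long options here
               (the username, password and URL are illustrative):

                    wget --http-user=luka --http-passwd=secret \
                         http://host.example.com/protected/file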

        -nc    Do not clobber existing files when saving to a
               directory hierarchy within recursive retrieval of
               several files.  This option is extremely useful
               when you wish to continue where you left off with
               a retrieval.  If the files are .html or (yuck)
               .htm, they will be loaded from disk and parsed as
               if they had been retrieved from the Web.

       -nv    Non-verbose  -  turn off verbose without being com-
              pletely quiet (use -q for that), which  means  that
              error  messages  and  basic  information  still get
              printed.

       -nd    Do not  create  a  hierarchy  of  directories  when
              retrieving recursively. With this option turned on,
              all files will get saved to the current  directory,
              without  clobbering  (if  a name shows up more than
              once, the filenames will get extensions .n).

       -x     The opposite of -nd -- Force creation of a  hierar-
              chy  of  directories even if it would not have been
              done otherwise.

       -nh    Disable time-consuming DNS  lookup  of  almost  all
              hosts. Refer to FOLLOWING LINKS for a more detailed
              description.

       -nH    Disable  host-prefixed  directories.  By   default,
              http://fly.cc.fer.hr/   will  produce  a  directory
              named fly.cc.fer.hr in which everything  else  will
              go. This option disables such behaviour.

       --no-parent
              Do not ascend to parent directory.

       -k --convert-links
              Convert  the  non-relative  links  to relative ones
              locally.
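
               For example, to make a retrieved tree browsable
               locally, this option is typically combined with -r
               (the URL is illustrative):

                    wget -r -k http://fly.cc.fer.hr/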

FOLLOWING LINKS
       Recursive retrieving has a mechanism that  allows  you  to
       specify which links wget will follow.

       Only relative links
               When only relative links are followed (option -L),
               recursive retrieving will never span hosts.
               gethostbyname will never get called, and the pro-
               cess will be very fast, with minimum strain on the
               network.  This will suit your needs most of the
               time, especially when mirroring the output of
               *2html converters, which generally produce only
               relative links.

       Host checking
               The drawback of following only the relative links
               is that humans often tend to mix them with abso-
               lute links to the very same host, and the very
               same page.  In this mode (which is the default),
               all URL-s that refer to the same host will be
               retrieved.
               The problem with this option is host and domain
               aliases.  There is no way for wget to know that
               regoc.srce.hr and www.srce.hr are the same host,
               or that fly.cc.fer.hr is the same as
               fly.cc.etf.hr.  Whenever an absolute link is
               encountered, gethostbyname is called to check
               whether we are really on the same host.  Although
               the results of gethostbyname are hashed, so that
               it will never get called twice for the same host,
               it is still a nuisance, e.g. with large indexes of
               different hosts, when each of them has to be
               looked up.  You can use -nh to prevent such com-
               plex checking, in which case wget will just com-
               pare the hostnames.  Things will run much faster,
               but also much less reliably.

       Domain acceptance
               With the -D option you may specify the domains
               that will be followed.  The nice thing about this
               option is that hosts that are not in those domains
               will not get DNS-looked up.  Thus you may specify
               -Dmit.edu just to make sure that nothing outside
               .mit.edu gets looked up.  This is very important
               and useful.  It also means that -D does not imply
               -H (it must be explicitly specified).  Feel free
               to use this option, since it will speed things up
               greatly, with almost all the reliability of check-
               ing all hosts.
               Of course, domain acceptance can be used to limit
               the retrieval to particular domains while spanning
               hosts freely within those domains, but then you
               must explicitly specify -H, as shown below.
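
               For example, to recurse across hosts but stay
               within one domain (the starting URL is illustra-
               tive):

                    wget -r -H -Dmit.edu http://web.mit.edu/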

       All hosts
               When -H is specified without -D, all hosts are
               spanned.  It is useful to set the recursion level
               to a small value in such cases.  This combination
               is rarely useful.

        FTP    The rules for FTP are somewhat specific, since
               they have to be.  To have FTP links followed from
               HTML documents, you must specify --follow-ftp
               (follow_ftp in .wgetrc).  If you do specify it,
               FTP links will be able to span hosts even if
               span_hosts is not set.  The relative_only option
               (-L) has no effect on FTP.  However, domain accep-
               tance (-D) and suffix rules (-A/-R) still apply.

STARTUP FILE
       Wget supports the use of an initialization file, .wgetrc.
       First a system-wide init file will be looked for
       (/usr/local/lib/wgetrc by default) and loaded.  Then the
       user's file will be searched for in two places: the
       environment variable WGETRC (which is presumed to hold
       the full pathname) and $HOME/.wgetrc.  Note that the
       settings in the user's startup file may override the
       system settings, which includes the quota settings (he
       he).

       The syntax of each line of startup file is simple:

            variable = value

       Valid values are different for different variables.  The
       complete set of commands is listed below, the notation
       after the equals sign denoting the kind of value the
       command takes: on/off for on or off (which can also be 1
       or 0), string for any string, or N for a positive
       integer.  For example, you may specify "use_proxy = off"
       to disable use of proxy servers by default.  You may use
       inf for an infinite value (the role of 0 on the command
       line), where appropriate.  The commands are case-insensi-
       tive and underscore-insensitive, thus DIr__Prefix is the
       same as dirprefix.  Empty lines, lines consisting of
       spaces, and lines beginning with '#' are skipped.

       Most  of  the  commands have their equivalent command-line
       option, except some more obscure or rarely  used  ones.  A
       sample  init  file  is provided in the distribution, named
       sample.wgetrc.
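
       For example, a minimal .wgetrc might read (the values are
       only illustrative):

            # Retry each URL ten times.
            num_tries = 10
            # Keep output terse and skip proxies.
            verbose = off
            use_proxy = off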

       accept/reject = string
              Same as -A/-R.

       add_hostdir = on/off
              Enable/disable host-prefixed  hostnames.  -nH  dis-
              ables it.

       always_rest = on/off
              Enable/disable  continuation  of the retrieval, the
              same as -c.

       base = string
              Set base for relative URL-s, the same as -B.

        convert_links = on/off
              Convert non-relative links locally. The same as -k.

       debug = on/off
              Debug mode, same as -d.

       dir_mode = N
              Set  permission  modes  of  created  subdirectories
              (default is 755).

       dir_prefix = string
              Top of directory tree, the same as -P.

       dirstruct = on/off
              Turning dirstruct on or off, the same as -x or -nd,
              respectively.

       domains = string
              Same as -D.

       follow_ftp = on/off
              Follow  FTP  links from HTML documents, the same as
              -f.

       force_html = on/off
              If set to  on,  force  the  input  filename  to  be
              regarded as an HTML document, the same as -F.

       ftp_proxy = string
              Use  the  string  as  FTP proxy, instead of the one
              specified in environment.

       glob = on/off
              Turn globbing on/off, the same as -g.

       header = string
              Define an additional header, like --header.

       http_passwd = string
              Set HTTP password.

       http_proxy = string
              Use the string as HTTP proxy, instead  of  the  one
              specified in environment.

       http_user = string
              Set HTTP user.

       input = string
              Read the URL-s from filename, like -i.

       kill_longer = on/off
               Consider data longer than specified in the Con-
               tent-Length header as invalid (and retry getting
               it).  The default behaviour is to save as much
               data as there is, provided the amount is at least
               equal to the value in Content-Length.

       logfile = string
              Set logfile, the same as -o.

       login = string
              Your  user  name  on  the  remote machine, for FTP.
              Defaults to "anonymous".

       mirror = on/off
              Turn mirroring on/off. The same as -m.

       noclobber = on/off
              Same as -nc.

       no_parent = on/off
              Same as --no-parent.

       no_proxy = string
              Use the  string  as  the  comma-separated  list  of
              domains  to  avoid in proxy loading, instead of the
              one specified in environment.

       num_tries = N
              Set number of retries per URL, the same as -t.

       output_document = string
              Set the output filename, the same as -O.

       passwd = string
              Your password  on  the  remote  machine,  for  FTP.
              Defaults to username@hostname.domainname.

       quiet = on/off
              Quiet mode, the same as -q.

       quota = quota
              Specify  the download quota, which is useful to put
              in /usr/local/lib/wgetrc. When  download  quota  is
              specified,  wget  will  stop  retrieving  after the
              download sum has become  greater  than  quota.  The
              quota  can  be specified in bytes (default), kbytes
              ('k'  appended)  or  mbytes  ('m'  appended).  Thus
              "quota  =  5m" will set the quota to 5 mbytes. Note
              that the user's startup file overrides system  set-
              tings.

       reclevel = N
              Recursion level, the same as -l.

       recursive = on/off
              Recursive on/off, the same as -r.

       relative_only = on/off
              Follow  only relative links (the same as -L). Refer
              to section FOLLOWING  LINKS  for  a  more  detailed
              description.

       robots = on/off
              Use (or not) robots.txt file.

       server_response = on/off
              Choose  whether  or  not  to print the HTTP and FTP
              server responses, the same as -S.

       simple_host_check = on/off
              Same as -nh.

       span_hosts = on/off
              Same as -H.

       timeout = N
              Set timeout value, the same as -T.

       timestamping = on/off
              Turn timestamping on/off. The same as -N.

       use_proxy = on/off
              Turn proxy support on/off. The same as -Y.

       verbose = on/off
              Turn verbose on/off, the same as -v/-nv.

SIGNALS
       Wget will catch the SIGHUP (hangup signal) and ignore  it.
       If  the  output  was on stdout, it will be redirected to a
       file named wget-log. This is also convenient when you wish
       to redirect the output of Wget interactively.

       $ wget http://www.ifi.uio.no/~larsi/gnus.tar.gz &
       $ kill -HUP %%       # to redirect the output

       Wget will not try to handle any signals other than SIGHUP.
       Thus you may interrupt Wget using ^C or SIGTERM.

EXAMPLES
       Get URL http://fly.cc.fer.hr/:
       wget http://fly.cc.fer.hr/

       Force non-verbose output:
       wget -nv http://fly.cc.fer.hr/

       Retry an unlimited number of times:
       wget -t0 http://www.yahoo.com/

       Create a mirror image of fly's web (with the same directory structure
       the original has), up to six recursion levels, with only one try per
       document, saving the verbose output to log file 'log':
       wget -r -l6 -t1 -o log http://fly.cc.fer.hr/

       Retrieve from yahoo host only (depth 50):
       wget -r -l50 http://www.yahoo.com/
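
       Continue an interrupted FTP retrieval (the filename is
       illustrative):
       wget -c ftp://ftp.xemacs.org/pub/xemacs/xemacs-19.14.tar.gz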

ENVIRONMENT
       http_proxy, ftp_proxy, no_proxy, WGETRC, HOME

FILES
       /usr/local/lib/wgetrc, $HOME/.wgetrc

UNRESTRICTIONS
       Wget is free; anyone may redistribute copies  of  Wget  to
       anyone  under  the  terms  stated  in  the  General Public
       License, a copy of which accompanies each copy of Wget.

SEE ALSO
       lynx(1), ftp(1)

AUTHOR
       Hrvoje Niksic <hniksic@srce.hr> is the author of Wget.
       Thanks  to  the  beta testers and all the other people who
       helped with useful suggestions.

