LOCATEDB(5L)

LOCATEDB(5L)

locale Home Page File Formats Index log


NAME
       locatedb - front-compressed file name database

DESCRIPTION
       This  manual  page  documents  the  format  of  file  name
       databases for the GNU version of locate.   The  file  name
       databases  contain  lists of files that were in particular
       directory trees when the databases were last updated.

       There can be multiple databases.  Users can  select  which
       databases locate searches using an environment variable or
       command line option; see locate(1L).  The system  adminis-
       trator  can  choose the file name of the default database,
       the frequency with which the databases  are  updated,  and
       the directories for which they contain entries.  Normally,
       file name databases are updated by  running  the  updatedb
       program periodically, typically nightly; see updatedb(1L).

       updatedb runs a program called frcode to compress the list
       of  file  names using front-compression, which reduces the
       database size by a factor of 4  to  5.   Front-compression
       (also known as incremental encoding) works as follows.

       The  database  entries  are  a  sorted list (case-insensi-
       tively,  for  users'  convenience).   Since  the  list  is
       sorted,  each  entry  is likely to share a prefix (initial
       string) with the  previous  entry.   Each  database  entry
       begins  with  an  offset-differential count byte, which is
       the additional number of characters of prefix of the  pre-
       ceding  entry  to use beyond the number that the preceding
       entry is using of its predecessor.   (The  counts  can  be
       negative.)  Following the count is a null-terminated ASCII
       remainder -- the part of the name that follows the  shared
       prefix.

       If  the  offset-differential  count  is larger than can be
       stored in a byte (+/-127), the byte has the value 0x80 and
       the  count  follows  in  a 2-byte word, with the high byte
       first (network byte order).

       Every database begins with a dummy entry for a file called
       `LOCATE02',  which  locate  checks  for to ensure that the
       database file has the correct format; it ignores the entry
       in doing the search.

       Databases  can  not  be concatenated together, even if the
       first (dummy) entry is trimmed  from  all  but  the  first
       database.   This  is because the offset-differential count
       in the first entry of the second and  following  databases
       will be wrong.

       There  is also an old database format, used by Unix locate
       and find programs and earlier releases of  the  GNU  ones.
       updatedb  runs  programs called bigram and code to produce

       old-format databases.  The old  format  differs  from  the
       above  description in the following ways.  Instead of each
       entry starting with an offset-differential count byte  and
       ending with a null, byte values from 0 through 28 indicate
       offset-differential counts from -14 through 14.  The  byte
       value  indicating  that  a  long offset-differential count
       follows is 0x1e (30),  not  0x80.   The  long  counts  are
       stored  in  host byte order, which is not necessarily net-
       work byte order, and host integer word size, which is usu-
       ally  4  bytes.   They also represent a count 14 less than
       their value.  The database lines have no termination byte;
       the  start of the next line is indicated by its first byte
       having a value <= 30.

       In addition, instead of starting with a dummy  entry,  the
       old  database format starts with a 256 byte table contain-
       ing the 128 most common  bigrams  in  the  file  list.   A
       bigram is a pair of adjacent bytes.  Bytes in the database
       that have the high bit set are indexes (with the high  bit
       cleared)  into  the  bigram table.  The bigram and offset-
       differential count coding  makes  these  databases  20-25%
       smaller  than  the  new  format,  but makes them not 8-bit
       clean.  Any byte in a file name that is in the ranges used
       for  the  special  codes  is replaced in the database by a
       question mark, which not coincidentally is the shell wild-
       card to match a single character.

EXAMPLE
       Input to frcode:
       /usr/src
       /usr/src/cmd/aardvark.c
       /usr/src/cmd/armadillo.c
       /usr/tmp/zoo

       Length of the longest prefix of the preceding entry to share:
       0 /usr/src
       8 /cmd/aardvark.c
       14 rmadillo.c
       5 tmp/zoo

       Output  from  frcode,  with trailing nulls changed to new-
       lines and count bytes made printable:
       0 LOCATE02
       0 /usr/src
       8 /cmd/aardvark.c
       6 rmadillo.c
       -9 tmp/zoo

       (6 = 14 - 8, and -9 = 5 - 14)

SEE ALSO
       find(1L) locate(1L) locatedb(5L) xargs(1L) Finding 
       Files(on-line in Info, or printed) 

locale Home Page File Formats Index log