Notes on the blat package transfer system
-----------------------------------------

Reason for its creation
-----------------------
Blat was created to transfer packages from dpkg_archive to remote sites in a
timely manner. Rsync was considered, but it has several problems:

1) It does not handle symlinks in a suitable manner
2) It works with all the files in the repository. Experience has
   shown that this can be very slow
3) It still requires significant scripting in order to be useful

Unlike rsync, Blat can make several assumptions about the package system.
Blat will:
    Support multiple transfer target destinations
    Allow rapid detection of new packages that need to be transferred
    Allow multiple Releases to be synchronized
    Allow all (not-closed) releases in a Project to be synchronized
    Be easily configured, including on the fly
    Transfer packages atomically
    Transfer a PackageList for future cleanup operations
    Provide logging and debug facilities

Overview of Blat
----------------
There are three main components in Blat:
    Daemon supervisor
        Responsible for starting and restarting the configured daemons
    Transfer Daemons
        Responsible for the package sync operations for one target
        Multiple daemons (one per target) are supported
    On Target utilities
        A set of scripts that support Blat
        These are transferred to the target machine

Each Blat Daemon performs the following main operations:
    1) Fast package transfer
    2) Repository synchronization
    3) PackageList creation
    4) Package aging (Optional)

Each Blat target can perform the following:
    1) Package aging
    2) dpkg_archive content indexing

Fast package transfer
===============================
This is the mechanism by which Blat detects that a newly built package needs
to be transferred to the target system.

It works by monitoring a directory of tags. It is the responsibility of Release
Manager to populate the directory.

The responsiveness of the detection can be configured, but a period of 5
seconds is suggested.
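The polling described above can be sketched as follows (in Python, for illustration only; Blat itself is written in Perl). It assumes that each file in the tag directory is one tag and that a tag only needs to be reported once while it exists. The helper names 'scan_tags' and 'poll_forever' are hypothetical, and the real tag format is not described here.

```python
import os
import time


def scan_tags(tag_dir, seen):
    """Return tags in tag_dir that were not present at the previous scan.

    Hypothetical helper: assumes every file in tag_dir is one tag
    written by Release Manager.
    """
    current = set(os.listdir(tag_dir))
    new_tags = sorted(current - seen)
    # Remember exactly what is present now, so a tag that is removed
    # and later re-created is reported again.
    seen.clear()
    seen.update(current)
    return new_tags


def poll_forever(tag_dir, period=5.0, handle=print):
    """Poll the tag directory every 'period' seconds (5 s suggested)."""
    seen = set()
    while True:
        for tag in scan_tags(tag_dir, seen):
            handle(tag)  # e.g. start a fast transfer of the tagged package
        time.sleep(period)
```

A short period keeps detection responsive; the cost per poll is a single directory listing, independent of the size of dpkg_archive.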

Repository synchronization
===============================
The daemon will request a list of the packages that are present on the target
and determine the list of packages that should be on the target. Packages that
are missing or out of date will be transferred to the target. Excess packages
are left on the target.

Blat will request the target to create and transfer a list of packages.
This is done by invoking a small program on the target to perform the work.

Blat will interrogate the Release Manager database for Releases to be processed
and packages in those Releases.

A package will be transferred to the target if:
    * The package is required, but not present on the target
    * The time-stamps of the descpkg files differ
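A minimal sketch of the transfer rule above, assuming both sides have been reduced to a mapping from a hypothetical 'package/version' key to the modification time of that version's descpkg file:

```python
def packages_to_transfer(required, on_target):
    """Return the sorted package-versions that must be sent to the target.

    required  -- {package/version: descpkg mtime on the master}
    on_target -- {package/version: descpkg mtime reported by the target}
    """
    todo = []
    for pkg, mtime in required.items():
        # Required but missing, or the descpkg time-stamps differ.
        if pkg not in on_target or on_target[pkg] != mtime:
            todo.append(pkg)
    # Packages present only on the target are deliberately ignored.
    return sorted(todo)
```

Only the descpkg time-stamp is compared, which is what makes a full repository check cheap compared with examining every file.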

Package transfer may be delayed while the source package is writable (i.e. not
yet fully released), unless it has been writable for longer than a configured
time period.
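The writable hold-off could look like this. It assumes that "writable" means any write permission bit is set on the package directory, and that the daemon remembers when it first saw each package writable. 'may_transfer', 'first_writable' and 'max_writable' are hypothetical names.

```python
import os
import stat
import time

# Assumption: any write permission bit counts as "writable".
_WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def may_transfer(pkg_path, first_writable, max_writable, now=None):
    """Return True if the package at pkg_path may be transferred now.

    first_writable -- dict remembering when each path was first seen writable
    max_writable   -- seconds a package may stay writable before it is sent anyway
    """
    now = time.time() if now is None else now
    if not os.stat(pkg_path).st_mode & _WRITE_BITS:
        # Read-only: treated as fully released, so it can go immediately.
        first_writable.pop(pkg_path, None)
        return True
    # Still writable: hold it back until the grace period has expired.
    since = first_writable.setdefault(pkg_path, now)
    return now - since > max_writable
```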

The frequency of the Repository synchronization can be configured. A time of
several hours is suggested.


PackageList creation
===============================
Blat will create and send to the target a list of the package-versions in the
current set. This list may be used to clean out the package archive, but this
functionality has not yet been implemented.

Package aging
=============
Blat can be configured to delete packages that are no longer a part of the
current package-version set. There are 4 methods:

1) None
   Packages will never be deleted by Blat on the target.
   The target file system will need to be managed to prevent it filling up.

2) Immediate
   Packages will be deleted as soon as they are not a part of the current
   package-version set.

3) Aged by blat master
   Packages will be marked for deletion and the blat master will delete
   the packages after a configured number of days.

4) Aged by blat target
   Packages will be marked for deletion and the blat target will delete
   the packages after a configured number of days. This operation requires
   that a cron job be configured on the target machine.
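The four methods could be modelled as below. The 'AgingMode' names are hypothetical (the actual config values are not given here), and the function only answers the question for the blat master, since method 4 leaves deletion to the target's cron job.

```python
from enum import Enum

SECONDS_PER_DAY = 86400


class AgingMode(Enum):
    NONE = "none"            # 1) never deleted by Blat
    IMMEDIATE = "immediate"  # 2) deleted as soon as it leaves the set
    MASTER = "master"        # 3) aged by blat master
    TARGET = "target"        # 4) aged by blat target (cron job)


def master_should_delete(mode, marked_at, max_age_days, now):
    """Should the blat master delete a package that left the current set?

    marked_at -- time (seconds) at which the package was marked for deletion
    """
    if mode is AgingMode.IMMEDIATE:
        return True
    if mode is AgingMode.MASTER:
        return now - marked_at >= max_age_days * SECONDS_PER_DAY
    # NONE: never deleted; TARGET: left for the target's cron job.
    return False
```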

dpkg_archive content indexing
=============================
Blat provides a utility that can be run on the transfer target, as a cron job,
to maintain a list of the files and folders in the package archive.

This list greatly simplifies the process of locating a file in the archive.
The user simply greps the file list, rather than searching the directory tree.

The file list is stored in .../dpkg_archive/.dpkg_archive/dpkg_archive_list.txt
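A sketch of such an indexer, assuming the list holds one path per line, relative to dpkg_archive, with folders marked by a trailing '/'. The exact format written by Blat's utility is not described here, so treat the layout and the function names as hypothetical.

```python
import os

LIST_DIR = ".dpkg_archive"
LIST_FILE = "dpkg_archive_list.txt"


def build_archive_list(archive_root):
    """Return sorted relative paths of all folders and files in the archive.

    Folders get a trailing '/' so they can be told apart from files.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(archive_root):
        # Skip the hidden directory that holds the list itself.
        dirnames[:] = [d for d in dirnames if d != LIST_DIR]
        rel = os.path.relpath(dirpath, archive_root)
        for name in dirnames:
            entries.append(os.path.normpath(os.path.join(rel, name)) + "/")
        for name in filenames:
            entries.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(entries)


def write_archive_list(archive_root):
    """Write the list to <archive_root>/.dpkg_archive/dpkg_archive_list.txt."""
    out_dir = os.path.join(archive_root, LIST_DIR)
    os.makedirs(out_dir, exist_ok=True)
    final = os.path.join(out_dir, LIST_FILE)
    tmp = final + ".tmp"
    with open(tmp, "w") as fh:
        for entry in build_archive_list(archive_root):
            fh.write(entry + "\n")
    os.replace(tmp, final)  # readers never see a half-written list
    return final
```

Writing to a temporary file and renaming it means a user grepping the list during a rebuild still sees a complete copy.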
   

Host System Requirements
========================
1) Unix
   It has been designed for a Unix environment - not Windows
2) Perl
   Blat is written in Perl
3) Java
   Required for the Database interface
4) Shell
   Start and stop scripts are in shell
5) Utilities
    ssh
    gtar
    gzip

Target System Requirements
==========================
1) Unix
   It has been designed for a Unix environment - not Windows
2) Perl
3) Shell
   Blat will execute a number of scripts on the target in order
   to control the process. These are written in shell and Perl.
4) Utilities
    ssh
    gtar
    gunzip
5) A user with write access to dpkg_archive (pkgadmin)
6) A link from the user's home directory to the package archive
   This link is called 'dpkg_archive'

Shared requirements
===================
Blat uses ssh for the transfer process. It uses an 'identity' file to allow
passwordless authentication with the target. The public part of the identity
file must be appended to the target user's .ssh/authorized_keys file.

The private part of the identity file is held by the Blat Daemon.
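For illustration, this is how a daemon might build the ssh command line for a passwordless call to the target. The key path, host name and helper name are hypothetical. '-i' selects the identity file, and 'BatchMode=yes' makes ssh fail rather than prompt if key authentication does not work.

```python
import shlex


def ssh_command(identity_file, user, host, remote_args):
    """Build an ssh argv that runs remote_args on the target using key auth."""
    # The remote side is interpreted by the target user's shell, so quote it.
    remote = " ".join(shlex.quote(arg) for arg in remote_args)
    return [
        "ssh",
        "-i", identity_file,    # private key held by the Blat daemon
        "-o", "BatchMode=yes",  # fail rather than prompt for a password
        f"{user}@{host}",
        remote,
    ]


cmd = ssh_command("/opt/blat/ssh/id_rsa_pkg_admin", "pkgadmin",
                  "target.example.com", ["bin/get_plist.pl"])
```

The resulting list can be passed directly to Python's subprocess.run(cmd, check=True, capture_output=True, text=True).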

Design assumptions
================================================================================
Blat is designed to transfer dpkg_archive packages in one direction.

Blat makes the following assumptions about the structure of a package:
    - It contains a descpkg file
    - It is read-only when fully released
    - Its contents do not change
    - It is not necessary to check every file in the package

The Blat master is designed to run in a single directory tree.
The config file should be in a 'config' directory under the location
of the blat master program.

Installation :: Target System
=============================
1) Create or acquire a user that has write access to the package archive

2) Create or acquire a passwordless identity file and its associated public
   key. One set is available in the 'ssh' subdirectory.

   Append the public part of the identity file (id_rsa_pkg_admin.pub) to
   ~/.ssh/authorized_keys

   I suggest using 'ssh-copy-id'.

3) Create a link from the user's home directory to dpkg_archive
   The link must be called dpkg_archive

4) Transfer the blat receiver scripts to a directory accessible to the
   transfer user, e.g. ~/bin
   The required receiver files are:
        get_plist.pl
        receive_file
        receive_package
        delete_package
        pkg_mon.pl
        pkg_purge.pl
   Ensure the programs are executable by the transfer user.
   Only get_plist.pl is strictly required. The others will be transferred
   automatically when Blat detects that they are missing.

5) Set up cron jobs (optional)
   These are used to maintain package information.
   Suggested crontab entries (these may vary for each installation):

   0 3 * * * /home/pkgadmin/bin/pkg_mon.pl
   0 6 * * 1 /home/pkgadmin/bin/pkg_purge.pl

Installation :: Host System
=============================
This section deals with the configuration of a new target.

1) Create a new config file in Blat's config directory - with a .conf
   suffix. This is best done by cloning an existing entry.

   Note: The blat master will automatically spawn a daemon as soon
   as a new config file is seen. It's best to create the file elsewhere
   and copy it into the directory when it is ready.

   Note: The Blat daemon will detect changes to its own config file and
   re-read it on the fly.
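A sketch of how the master might notice new and modified config files by polling the directory, assuming it records each file's modification time. The function name and the polling approach are assumptions, not a description of the actual implementation.

```python
import os


def scan_configs(config_dir, known):
    """Return (added, changed) .conf file names and refresh 'known'.

    known -- {file name: mtime} recorded at the previous scan
    """
    current = {
        name: os.stat(os.path.join(config_dir, name)).st_mtime
        for name in os.listdir(config_dir)
        if name.endswith(".conf")  # files without the suffix are ignored
    }
    added = sorted(n for n in current if n not in known)        # spawn a daemon
    changed = sorted(n for n in current if n in known and known[n] != current[n])
    known.clear()
    known.update(current)
    return added, changed
```

Because a file is reported as soon as its name appears, a file copied in slowly may be seen half-written; moving a finished file into the directory with a rename on the same file system avoids this.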

Useful Tricks
=============

kill -USR1 pid_of_daemon
    Will force the daemon to perform a repository sync check.

kill -HUP pid_of_daemon
    Will force the daemon to roll its own log files.

kill pid_of_daemon
    Will force the daemon to exit. It will be restarted.

Remove the daemon's pid file
    Will force the daemon to exit. It will be restarted.
    Useful for debugging on a live system.

kill -USR1 pid_of_master
    Will send USR1 to all daemons, forcing all daemons to perform
    a repository sync check.

kill -HUP pid_of_master
    Will send HUP to all daemons, forcing all daemons to roll
    their own log files.

kill pid_of_master
    Will shut down the system gracefully by sending a kill to all
    children.
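The signal handling above maps naturally onto flags that the daemon's main loop checks between operations. A minimal sketch, assuming a flag-based design; the class and function names are hypothetical.

```python
import os
import signal


class DaemonFlags:
    """Flags set by signal handlers and polled by the daemon's main loop."""

    def __init__(self):
        self.sync_requested = False
        self.roll_logs = False


def install_handlers(flags):
    def on_usr1(signum, frame):
        flags.sync_requested = True  # USR1: force a repository sync check

    def on_hup(signum, frame):
        flags.roll_logs = True       # HUP: roll the log files

    signal.signal(signal.SIGUSR1, on_usr1)
    signal.signal(signal.SIGHUP, on_hup)


def forward_to_children(signum, child_pids):
    """Master side: relay a signal to every daemon it supervises."""
    for pid in child_pids:
        try:
            os.kill(pid, signum)
        except ProcessLookupError:
            pass  # daemon already gone; the supervisor will restart it
```

Setting a flag in the handler and doing the work in the main loop keeps the handler trivial, so a sync check or log roll never starts in the middle of a transfer.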

Debug verbosity is controlled via the 'verbose' config item

The pkg.xxxx config items are handled specially.
If the named package-version is a symlink, then both the
link and the package version it addresses will be transferred.
The link MUST address another version of the same package.
This is intended to support the 'jats2_current' link.
When a new version of JATS is released, the new package
will be transferred, as well as the new link.
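A sketch of the symlink rule, assuming the archive layout is <archive>/<package>/<version>; 'versions_to_transfer' is a hypothetical name.

```python
import os


def versions_to_transfer(archive, package, version):
    """Return the version entries to send for one pkg.xxxx config item.

    If <archive>/<package>/<version> is a symlink, the version it addresses
    is returned first and the link itself second. The link must resolve to
    another version of the same package, otherwise ValueError is raised.
    """
    path = os.path.join(archive, package, version)
    if not os.path.islink(path):
        return [version]
    pkg_dir = os.path.realpath(os.path.join(archive, package))
    resolved = os.path.realpath(path)
    # The resolved target must sit directly inside the same package directory.
    if os.path.dirname(resolved) != pkg_dir:
        raise ValueError(f"{package}/{version} does not address a version of {package}")
    return [os.path.basename(resolved), version]
```

The addressed version is listed before the link, so if the transfers are performed in order the link never points at a missing package on the target.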

Config items that control a time period allow the following suffixes:
    s - Seconds. Same as no suffix
    m - Minutes
    h - Hours
    d - Days
Multiple suffixes may be combined, e.g. 1h10m
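A possible parser for these periods, returning seconds. The name 'parse_period' is hypothetical, and only the lowercase suffixes listed above are accepted.

```python
import re

# Seconds per suffix; no suffix means seconds.
_PERIOD_UNITS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400}
_PERIOD_TERM = re.compile(r"(\d+)([smhd]?)")


def parse_period(text):
    """Convert a time period such as '90', '5m' or '1h10m' to seconds."""
    text = text.strip()
    if not re.fullmatch(r"(?:\d+[smhd]?)+", text):
        raise ValueError(f"bad time period: {text!r}")
    # Each number/suffix term is converted and the terms are summed.
    return sum(int(num) * _PERIOD_UNITS[unit]
               for num, unit in _PERIOD_TERM.findall(text))
```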

Config items that control a file size in blocks allow the following suffixes:
    k - Kilobytes (Same as no suffix)
    b - Blocks    (Same as no suffix)
    m - Megabytes
    g - Gigabytes
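A matching sketch for sizes, returning kilobytes (blocks). The name 'parse_size_kb' is hypothetical, and treating 'm' and 'g' as multiples of 1024 is an assumption.

```python
import re

# Multipliers to kilobytes; a block is treated as 1 kilobyte, as listed above.
# Assumption: m and g are binary multiples of 1024.
_SIZE_UNITS_KB = {"": 1, "k": 1, "b": 1, "m": 1024, "g": 1024 * 1024}


def parse_size_kb(text):
    """Convert a size such as '500', '64m' or '2g' to kilobytes (blocks)."""
    match = re.fullmatch(r"(\d+)([kbmg]?)", text.strip())
    if not match:
        raise ValueError(f"bad size: {text!r}")
    return int(match.group(1)) * _SIZE_UNITS_KB[match.group(2)]
```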


ToDo
======================
1) Better handling of soft-links for core_devl
   This works, but it's prone to error.
   There is no test to ensure the link exists. If the link
   is deleted, then it won't be recreated.