Notes on the blat package transfer system
-----------------------------------------

Reason for its creation
-----------------------
There is a need to transfer packages from dpkg_archive to remote sites in a
timely manner.

Rsync was considered but it has several problems:
    1) Does not handle symlinks in a suitable manner
    2) Works with all the files in the repository.
       Experience has shown that this can be very slow
    3) Still requires significant scripting in order to be useful

Blat can make several assumptions about the package system.

Blat will:
    Support multiple transfer target destinations
    Allow for rapid detection of new packages that need to be transferred
    Allow for multiple Releases to be synchronized
    Allow for all (not-closed) releases in a Project to be synchronized
    Be easily configured - and can be configured on the fly
    Atomically transfer packages
    Transfer a PackageList for future cleanup operations
    Provide logging and debug facilities

Overview of Blat
---------------
There are three main components in Blat
    Daemon supervisor
        Responsible for starting and restarting configured daemons
    Transfer Daemons
        Responsible for the package sync operations for one target
        Multiple Daemons ( targets ) are supported
        Multiple Daemon types are supported
            dpkg_archive sync (original)
            s3Sync (AWS S3 bucket sync for CI/CD)
    On Target utilities
        A set of scripts that support Blat
        These are transferred to the target machine.

Each Blat Daemon performs three main operations, plus one optional operation
    1) Fast package transfer
    2) Repository synchronization
    3) PackageList creation
    4) Package aging (Optional)

Each Blat target can perform the following:
    1) Package aging
    2) dpkg_archive content indexing

Fast package transfer
===============================
This is the mechanism whereby Blat will detect the need to transfer a newly
built package to the target system. It works by monitoring a directory of tags.
It is the responsibility of Release Manager to populate the directory.
The responsiveness of the detection can be configured, but a period of
5 seconds is suggested.

Repository synchronization
===============================
The daemon will request a list of packages that are present on the target
and determine the list of packages that should be on the target.
Packages that are missing or out of date on the target will be transferred.
Excess packages are left on the target.

Blat will request the target to create and transfer a list of packages.
This is done by invoking a small program on the target to perform the work.

Blat will interrogate the Release Manager database for Releases to be
processed and packages in those Releases.

A package will be transferred to the target if:
    * The package is required, but not present on the target
    * The time-stamps of the descpkg files differ

Package transfer may be delayed if the source package is writable, unless it
has been writable for longer than a configured time period.

The frequency of the Repository synchronization can be configured.
A time of several hours is suggested.

PackageList creation
===============================
Blat will create and send to the target a list of the package-versions that
are in the current set. This list may be used to clean out the package
archive, but this functionality has not yet been implemented.

Package aging
=============
Blat can be configured to delete packages that are no longer a part of the
current package-version set. There are 4 methods:
    1) None
       Packages will never be deleted by Blat on the target.
       The target file system will need to be managed to prevent it
       filling up.
    2) Immediate
       Packages will be deleted as soon as they are not a part of the
       current package-version set.
    3) Aged by blat master
       Packages will be marked for deletion and the blat master will delete
       the packages after a configured number of days.
    4) Aged by blat target
       Packages will be marked for deletion and the blat target will delete
       the packages after a configured number of days.
       This operation requires that a cron job be configured on the target
       machine.

dpkg_archive content indexing
=============================
Blat provides a utility that can be run by the transfer target, as a cron job,
that will maintain a list of files and folders in the package archive.

This list greatly simplifies the process of locating a file in the archive.
The user simply greps the list, rather than searching the directory tree.

The file list is in a file
    .../dpkg_archive/.dpkg_archive/dpkg_archive_list.txt

S3 Bucket Delivery
===============================
Blat has been extended to provide CI/CD support via an S3 bucket.
The s3Sync task will maintain a single S3 bucket with ZIP files of packages
from Releases that support S3Sync.

Host System Requirements
========================
    1) Unix
       It has been designed for a Unix environment - not Windows
    2) Perl
       Blat is written in Perl
    3) Java
       Required for the Database interface
    4) Shell
       Start and stop scripts are in shell
    5) Utilities
       ssh
       gtar
       gzip
       aws cli (for s3Sync)

Target System Requirements - dpkg_archive sync
========================
    1) Unix
       It has been designed for a Unix environment - not Windows
    2) Perl
    3) Shell
       Blat will execute a number of scripts on the target in order to
       control the process. These are in Shell and Perl
    4) Utilities
       ssh
       gtar
       gunzip
    5) User with write access to the dpkg_archive - (pkgadmin)
    6) Link from the user's home directory to the package archive
       This link is called 'dpkg_archive'

Shared requirements
===================
Blat uses ssh for the transfer process. It uses an 'identity' file to allow
passwordless authentication with the target.

The public part of the identity file must be appended to the target user's
.ssh/authorized_keys file.
The private part of the identity file is held by the Blat Daemon.

Design assumptions
================================================================================
Blat is designed to transfer dpkg_archive packages in one direction.

Blat makes assumptions on the structure of a package
    - They contain a descpkg file
    - They are read-only when fully released
    - The contents of packages do not change
    - It is not necessary to check every file in the package

The Blat master is designed to run in a single directory tree.
The config file should be in a 'config' directory under the location of the
blat master program.

Installation :: Target System
=============================
    1) Create or acquire a user that has write access to the package archive

    2) Create or acquire a passwordless identity file and the associated
       public key of the identity file.
       One set is available in the 'ssh' subdirectory.
       Append the public part of the identity file (id_rsa_pkg_admin.pub)
       to ~/.ssh/authorized_keys
       I suggest using 'ssh-copy-id'.

    3) Create a link from the user's home directory to dpkg_archive
       The link must be called dpkg_archive

    4) Transfer the blat receiver scripts to a directory accessible to the
       transfer user. e.g. ~/bin
       The required receiver files are:
            get_plist.pl
            receive_file
            receive_package
            delete_package
            pkg_mon.pl
            pkg_purge.pl
       Ensure the programs are executable by the transfer user.
       Only get_plist.pl is really needed. The others will be transferred
       when detected missing.

    5) Set up cron jobs (optional)
       Will be used to maintain package information
       Suggested crontab entries - may vary for each installation
            0 3 * * * /home/pkgadmin/bin/pkg_mon.pl
            0 6 * * 1 /home/pkgadmin/bin/pkg_purge.pl

Installation :: Host System
=============================
This section really deals with the configuration of a new target.

    1) Create a new config file in Blat's config directory - with a .conf
       suffix. This is best done by cloning an existing entry.

Note: The blat master will automatically spawn a daemon as soon as a new
      config file is seen. It's best to create the file elsewhere and copy
      it to the directory when ready.

Note: The Blat daemon will detect changes to its own config file and re-read
      it on the fly.

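The advice above, to prepare a config file elsewhere and only then place it in
the config directory, could be scripted roughly as follows. This is a minimal
sketch, not part of Blat: the function name install_conf and the example paths
are hypothetical, and it assumes the blat master only reacts to files that
carry the .conf suffix.

```shell
# Sketch only: stage a finished config file, then move it into place.
# install_conf and all paths used with it are hypothetical names.
install_conf() {
    src=$1          # finished config file, prepared outside the config dir
    conf_dir=$2     # Blat's 'config' directory
    name=$(basename "$src")

    # Blat expects config files to have a .conf suffix
    case "$name" in
        *.conf) ;;
        *) echo "config file must have a .conf suffix: $name" >&2
           return 1 ;;
    esac

    # Copy under a temporary name without the .conf suffix (assumed to be
    # ignored by the blat master), then rename. A rename within a single
    # directory is atomic, so a half-written .conf file is never visible.
    tmp="$conf_dir/.$name.tmp.$$"
    cp "$src" "$tmp" && mv "$tmp" "$conf_dir/$name"
}
```

Usage might look like: install_conf /tmp/newsite.conf /path/to/blat/config
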
Useful Tricks
=============
    kill -usr1 pid-of-daemon
        Will force the daemon to perform a repository sync check.

    kill -hup pid-of-daemon
        Will force the daemon to roll its own log files

    kill pid-of-daemon
        Will force the daemon to exit. It will be restarted.

    Remove the daemon's pid file
        Will force the daemon to exit. It will be restarted.
        Useful for debugging on a live system

    kill -usr1 pid_of_master
        Will signal -usr1 to all daemons
        Will force all daemons to perform a repository sync check.

    kill -hup pid_of_master
        Will signal -hup to all daemons
        Will force all daemons to roll their own log files

    kill pid_of_master
        Will shut down the system gracefully by sending kill to all children.

    ssh-to
        Will ssh to the target machine as the pkgadmin user

    ssh-copy-id -i ssh/id_rsa_pkg_admin pkgadmin@
        Will copy the public part of the ssh identity file to the target
        machine
        You will need the password of the 'pkgadmin' user as configured on
        the target machine

Debug verbosity is controlled via the 'verbose' config item

The pkg.xxxx config items are very special. If the named package-version is a
symlink, then both the link and the package it addresses will be transferred.
The link MUST address another version of the same package.

This is intended to support the 'jats2_current' link. When a new version of
JATS is released, the new package will be transferred, as well as the new
link.

Config items that control a time period allow the following suffixes:
    s - Seconds. Same as no suffix
    m - Minutes
    h - Hours
    d - Days
Multiple are allowed. e.g. 1h10m

Config items that control a file size in blocks allow the following suffixes:
    k - Kilobytes (Same as no suffix)
    b - Blocks (Same as no suffix)
    m - Megabytes
    g - Gigabytes

ToDo
======================
    1) Better handling of soft-links for core_devl
       Works, but it's prone to error
       There is no test to ensure the link exists.
       If the link is deleted, then it won't be recreated.
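
As an illustration of the time period suffixes listed under Useful Tricks, the
following sketch converts a value such as 1h10m to seconds. It is not Blat's
own parser (Blat is written in Perl). The function name period_to_seconds is
hypothetical, and the sketch assumes that multiple components are simply
added together.

```shell
# Sketch only: convert a period such as "90", "5s" or "1h10m" to seconds.
# period_to_seconds is a hypothetical name, not part of Blat.
period_to_seconds() {
    rest=$1
    total=0
    [ -n "$rest" ] || return 1
    while [ -n "$rest" ]; do
        # Take the leading run of digits
        num=${rest%%[!0-9]*}
        [ -n "$num" ] || return 1
        rest=${rest#"$num"}
        # Drop leading zeros so the shell does not read the number as octal
        while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
            num=${num#0}
        done
        # Take one optional suffix character; no suffix means seconds
        unit=${rest%"${rest#?}"}
        case "$unit" in
            ''|s) mult=1 ;;
            m)    mult=60 ;;
            h)    mult=3600 ;;
            d)    mult=86400 ;;
            *)    return 1 ;;    # unknown suffix
        esac
        rest=${rest#"$unit"}
        total=$((total + num * mult))
    done
    echo "$total"
}
```

For example, period_to_seconds 1h10m prints 4200, and period_to_seconds 5
prints 5. An unknown suffix makes the function return a non-zero status.
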