Notes on the blat package transfer system
-----------------------------------------

Reason for its creation
-----------------------
Need to transfer packages from dpkg_archive to remote sites in a timely manner.
Rsync was considered, but it has several problems:
1) It does not handle symlinks in a suitable manner
2) It works with all the files in the repository. Experience has
   shown that this can be very slow
3) It still requires significant scripting in order to be useful

Blat can make several assumptions about the package system.

Blat will:
    Support multiple transfer target destinations
    Allow for rapid detection of new packages that need to be transferred
    Allow for multiple Releases to be synchronized
    Allow for all (not-closed) Releases in a Project to be synchronized
    Be easily configured, including on the fly
    Transfer packages atomically
    Transfer a PackageList for future cleanup operations
    Provide logging and debug facilities

Overview of Blat
----------------
There are three main components in Blat:
    Daemon supervisor
        Responsible for starting and restarting the configured daemons
    Transfer Daemons
        Responsible for the package sync operations for one target
        Multiple Daemons (targets) are supported
        Multiple Daemon types are supported:
            dpkg_archive sync (original)
            s3Sync (AWS S3 bucket sync for CI/CD)
    On Target utilities
        A set of scripts that support Blat
        These are transferred to the target machine.

Each Blat Daemon performs the following operations:
1) Fast package transfer
2) Repository synchronization
3) PackageList creation
4) Package aging (optional)

Each Blat target can perform the following:
1) Package aging
2) dpkg_archive content indexing

Fast package transfer
===============================
This is the mechanism by which Blat detects the need to transfer a newly built
package to the target system.

It works by monitoring a directory of tags.
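The tag-directory mechanism can be illustrated with a short polling loop. This is a Python sketch for illustration only, not Blat's Perl implementation. It assumes that each tag is a plain file whose name identifies a package-version to send, and that a tag is deleted once its transfer succeeds; the names scan_tags, poll_once, monitor and the transfer callback are hypothetical.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical callback: given a tag name, send that package to the target
# and return True on success.
Transfer = Callable[[str], bool]


def scan_tags(tag_dir: Path) -> List[Path]:
    """Return the tag files currently in tag_dir, oldest first.

    Assumption: one regular file per package-version awaiting transfer.
    """
    tags = [p for p in tag_dir.iterdir() if p.is_file()]
    # Sort by modification time, then by name, so the order is deterministic.
    return sorted(tags, key=lambda p: (p.stat().st_mtime, p.name))


def poll_once(tag_dir: Path, transfer: Transfer) -> List[str]:
    """Handle every tag found in one scan and return the names sent.

    A tag is deleted only after its transfer succeeds, so a failed
    transfer is retried on the next poll.
    """
    sent = []
    for tag in scan_tags(tag_dir):
        if transfer(tag.name):
            tag.unlink()
            sent.append(tag.name)
    return sent


def monitor(tag_dir: Path, transfer: Transfer,
            period: float = 5.0, cycles: Optional[int] = None) -> None:
    """Poll tag_dir every `period` seconds (5 seconds, as these notes suggest).

    cycles=None loops forever; a number limits the loop, e.g. for testing.
    """
    done = 0
    while cycles is None or done < cycles:
        poll_once(tag_dir, transfer)
        done += 1
        time.sleep(period)
```

In this sketch the tag directory itself is the only state: a tag that is still present has not been delivered yet, so a daemon restart simply resumes where it left off.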
It is the responsibility of ReleaseManager to populate the directory.
The responsiveness of the detection can be configured, but a period of 5
seconds is suggested.

Repository synchronization
===============================
The daemon will request a list of the packages that are present on the target
and determine the list of packages that should be on the target. Required
packages that are missing or out of date will be transferred to the target.
Excess packages are left on the target.

Blat will request the target to create and transfer a list of its packages.
This is done by invoking a small program on the target to perform the work.

Blat will interrogate the Release Manager database for the Releases to be
processed and the packages in those Releases.

A package will be transferred to the target if:
* The package is required, but not present on the target
* The time-stamps of the descpkg files differ

Package transfer may be delayed if the source package is writable, unless it
has been writable for longer than a configured time period.

The frequency of the Repository synchronization can be configured. A period of
several hours is suggested.

PackageList creation
===============================
Blat will create, and send to the target, a list of the package-versions that
are in the current set. This list may be used to clean out the package archive,
but this functionality has not yet been implemented.

Package aging
=============
Blat can be configured to delete packages that are no longer a part of the
current package-version set. There are four methods:

1) None
   Packages will never be deleted by Blat on the target.
   The target file system will need to be managed to prevent it filling up.

2) Immediate
   Packages will be deleted as soon as they are no longer a part of the current
   package-version set.

3) Aged by blat master
   Packages will be marked for deletion and the blat master will delete
   the packages after a configured number of days.

4) Aged by blat target
   Packages will be marked for deletion and the blat target will delete
   the packages after a configured number of days.
   This operation requires that a cron job be configured on the target machine.

dpkg_archive content indexing
=============================
Blat provides a utility that can be run on the transfer target, as a cron job,
to maintain a list of the files and folders in the package archive.
This list greatly simplifies the process of locating a file in the archive:
the user simply greps the file list, rather than searching the directory tree.

The file list is in the file .../dpkg_archive/.dpkg_archive/dpkg_archive_list.txt

S3 Bucket Delivery
===============================
Blat has been extended to provide CI/CD support via an S3 bucket.
The s3Sync task will maintain a single S3 bucket with ZIP files of
packages from Releases that support S3Sync.

Host System Requirements
========================
1) Unix
   It has been designed for a Unix environment - not Windows
2) Perl
   Blat is written in Perl
3) Java
   Required for the Database interface
4) Shell
   Start and stop scripts are in shell
5) Utilities
   ssh
   gtar
   gzip
   aws cli (for s3Sync)

Target System Requirements - dpkg_archive sync
==============================================
1) Unix
   It has been designed for a Unix environment - not Windows
2) Perl
3) Shell
   Blat will execute a number of scripts on the target in order
   to control the process. These are in Shell and Perl
4) Utilities
   ssh
   gtar
   gunzip
5) A user with write access to the dpkg_archive (pkgadmin)
6) A link from the user's home directory to the package archive
   This link is called 'dpkg_archive'

Shared requirements
===================
Blat uses ssh for the transfer process. It uses an 'identity' file to allow
passwordless authentication with the target.
The public part of the identity file must be appended to the target user's
.ssh/authorized_keys file.
The private part of the identity file is held by the Blat Daemon.

Design assumptions
================================================================================
Blat is designed to transfer dpkg_archive packages in one direction.

Blat makes the following assumptions about the structure of a package:
- It contains a descpkg file
- It is read-only when fully released
- Its contents do not change
- It is not necessary to check every file in the package

The Blat master is designed to run in a single directory tree.
The config file should be in a 'config' directory under the location
of the blat master program.

Installation :: Target System
=============================
1) Create or acquire a user that has write access to the package archive
2) Create or acquire a passwordless identity file and its associated
   public key. One set is available in the 'ssh' subdirectory.
   Append the public part of the identity file (id_rsa_pkg_admin.pub) to
   ~/.ssh/authorized_keys
   I suggest using 'ssh-copy-id'.
3) Create a link from the user's home directory to dpkg_archive
   The link must be called dpkg_archive
4) Transfer the blat receiver scripts to a directory accessible to the
   transfer user, e.g. ~/bin
   The required receiver files are:
       get_plist.pl
       receive_file
       receive_package
       delete_package
       pkg_mon.pl
       pkg_purge.pl
   Ensure the programs are executable by the transfer user.
   Only get_plist.pl is really needed. The others will be transferred
   when they are detected as missing.
5) Set up cron jobs (optional)
   These are used to maintain package information.
   Suggested crontab entries (these may vary for each installation):
       0 3 * * * /home/pkgadmin/bin/pkg_mon.pl
       0 6 * * 1 /home/pkgadmin/bin/pkg_purge.pl

Installation :: Host System
=============================
This section deals with the configuration of a new target.

1) Create a new config file in Blat's config directory, with a .conf
   suffix.
   This is best done by cloning an existing entry.

   Note: The blat master will automatically spawn a daemon as soon
   as a new config file is seen. It's best to create the file elsewhere
   and copy it to the directory when ready.

   Note: The Blat daemon will detect changes to its own config file and
   re-read it on the fly.

Useful Tricks
=============
kill -usr1 pid-of-daemon
    Will force the daemon to perform a repository sync check.

kill -hup pid-of-daemon
    Will force the daemon to roll its own log files.

kill pid-of-daemon
    Will force the daemon to exit. It will be restarted.

Remove the daemon's pid file
    Will force the daemon to exit. It will be restarted.
    Useful for debugging on a live system.

kill -usr1 pid_of_master
    Will send -usr1 to all daemons.
    Will force all daemons to perform a repository sync check.

kill -hup pid_of_master
    Will send -hup to all daemons.
    Will force all daemons to roll their own log files.

kill pid_of_master
    Will shut down the system gracefully by sending kill to all
    children.

ssh-to <name or ip address>
    Will ssh to the target machine as the pkgadmin user.

ssh-copy-id -i ssh/id_rsa_pkg_admin pkgadmin@<name or ip address>
    Will append the public part of the identity file to the pkgadmin
    user's ~/.ssh/authorized_keys on the target machine.
    You will need the password of the 'pkgadmin' user as configured on
    the target machine.

Debug verbosity is controlled via the 'verbose' config item.

The pkg.xxxx config items are very special.
    If the named package-version is a symlink, then both the
    link and the package it addresses will be transferred.
    The link MUST address another version of the same package.
    This is intended to support the 'jats2_current' link.
    When a new version of JATS is released, the new package
    will be transferred, as well as the new link.

Config items that control a time period allow the following suffixes:
    s - Seconds (same as no suffix)
    m - Minutes
    h - Hours
    d - Days
    Multiple suffixes are allowed.
    e.g. 1h10m

Config items that control a file size in blocks allow the following suffixes:
    k - Kilobytes (same as no suffix)
    b - Blocks (same as no suffix)
    m - Megabytes
    g - Gigabytes

ToDo
======================
1) Better handling of soft-links for core_devl
   Works, but it's prone to error.
   There is no test to ensure the link exists. If the link
   is deleted, then it won't be recreated.
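The time-period and file-size suffixes accepted by config items (listed under Useful Tricks) can be illustrated with a small parser. This Python sketch is for illustration only and is not Blat's own Perl code. The function names parse_period and parse_size are hypothetical, and several details are assumptions: only whole numbers and lowercase suffixes are accepted, embedded spaces are rejected, only time periods may combine several terms, and m and g are taken as 1024 and 1024 * 1024 blocks (one block = one kilobyte, since k and b both mean "same as no suffix").

```python
import re

# Seconds per time-period suffix; a bare number means seconds.
TIME_UNITS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400}

# Blocks per size suffix; k and b both equal one block.
# Assumption: m and g use binary (1024) multipliers.
SIZE_UNITS = {"": 1, "k": 1, "b": 1, "m": 1024, "g": 1024 * 1024}

# One term: a whole number followed by an optional single-letter suffix.
_TERM = re.compile(r"(\d+)([a-z]?)")


def parse_period(text: str) -> int:
    """Convert a period such as '90', '5m' or '1h10m' to seconds.

    Several terms may be combined; their values are summed.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty time period")
    pos, total = 0, 0
    while pos < len(text):
        m = _TERM.match(text, pos)
        if m is None or m.group(2) not in TIME_UNITS:
            raise ValueError(f"bad time period: {text!r}")
        total += int(m.group(1)) * TIME_UNITS[m.group(2)]
        pos = m.end()
    return total


def parse_size(text: str) -> int:
    """Convert a size such as '100', '4k' or '10m' to blocks (single term only)."""
    m = _TERM.fullmatch(text.strip())
    if m is None or m.group(2) not in SIZE_UNITS:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * SIZE_UNITS[m.group(2)]
```

For example, parse_period("1h10m") gives 4200 seconds and parse_size("10m") gives 10240 blocks. The two lookup tables are kept separate because the same letter means different things in each context: m is minutes for a time period but megabytes for a size.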