Extract System

In this chapter, we shall discuss in detail the design of the extract system. For information on how to use the extract system, please see: FCM System User Guide > The Extract System.

The extract system extracts source directories from different branches of Subversion repositories and combines them with source directories from the local file system to give a source directory tree suitable for feeding into the build system. The system is implemented as a set of Perl modules. Its interface is similar to that of the build system, and it shares the same command line interface and many other utilities with the code management system and the build system.

Input and Output

The extract system should provide the following outputs:

The following inputs are required by the extract system:

Components

The extract system uses the following commands, modules and tools:

Name              Category           Description
fcm               Perl executable    Top level command line interface of the FCM system.
Fcm::CfgFile      Perl module        A class for reading from and writing to configuration files.
Fcm::Config       Perl module        A class that contains the configuration settings shared by all FCM components.
Fcm::Extract      Perl module        Main class that controls the running of the extract system.
Fcm::ReposBranch  Perl module        A class that stores and processes information about a repository branch.
Fcm::SrcDirLayer  Perl module        A class that stores and processes information about a "layer" in the extraction sequence of a source directory.
Fcm::Util         Perl module        A collection of utilities shared by all FCM components.
svn               Subversion client  The following sub-commands are used: "info", "list", "export" and "cat".
ksh               Unix shell         The following shell commands are used: "cp", "rm" and "mkdir".
rdist             Unix utility       A remote distribution tool for mirroring the extracted source directory to a remote host.
rsync             Unix utility       A remote synchronisation tool for mirroring the extracted source directory to a remote host.
remsh             Unix command       A command to invoke a shell on a remote host.

Task

To do its job, the extract system executes the following tasks in order:

The extract configuration

When we invoke the FCM command, it creates a new instance of Fcm::Config, which reads, processes and stores information from the central and user configuration files. Configuration settings in Fcm::Config are then accessible to all other modules used by the extract system.

When we invoke the extract command, it creates a new instance of Fcm::Extract, which automatically creates a new instance of Fcm::CfgFile. If an argument is specified on the command line, the argument is used as the "basis". Otherwise, the current working directory is taken as the basis. If the basis is a directory, Fcm::CfgFile will attempt to locate a file called "ext.cfg" under this directory. If such a file is not found, it will attempt to locate "cfg/ext.cfg" under the same directory. If the basis is a regular file, the file itself is used.
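The lookup order described above can be sketched as follows. This is an illustration in Python, not the Perl code in Fcm::CfgFile; the function name `locate_ext_cfg` is hypothetical, and returning `None` when nothing is found is an assumption about error handling.

```python
from pathlib import Path
from typing import Optional


def locate_ext_cfg(basis: Optional[str] = None) -> Optional[Path]:
    """Return the extract configuration file for a basis, or None if not found."""
    # No argument on the command line: fall back to the current working directory.
    path = Path(basis) if basis else Path.cwd()
    if path.is_dir():
        # Look for "ext.cfg" first, then "cfg/ext.cfg", under the basis directory.
        for candidate in (path / "ext.cfg", path / "cfg" / "ext.cfg"):
            if candidate.is_file():
                return candidate
        return None
    # A regular file is used as the configuration file itself.
    return path if path.is_file() else None
```

Calling `locate_ext_cfg()` with no argument mirrors running the extract command without a basis argument.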

Once a file is located, Fcm::CfgFile will attempt to parse it. This is done by reading each line of the configuration file and splitting it into separate label, value and comment fields. Each line is then pushed into an array that can be fetched as a property of Fcm::CfgFile. If an INC declaration is encountered, a new instance of Fcm::CfgFile is created to read the included file, and the included lines are added to the current array. Internally, each line is recorded as a reference to a hash table with the following keys:

The information given by each line is "deciphered" by Fcm::Extract. The information is processed in the following ways:

If a full extraction is required, Fcm::Extract will attempt to remove any sub-directories created by previous extractions in the same location. Destination directories are (re-)created as they are required.

For each repository branch, if the REPOS declaration is a file system path, the VERSION declaration will be set automatically to the word "USER". If the REPOS declaration matches an FCM URL keyword pattern, it is expanded to the full URL. If REPOS is not in the local file system and the VERSION declaration is not a number, the system will attempt to convert the keyword to a revision number. If the keyword is "HEAD", the system will use "svn info" to determine the revision number. Otherwise, it will attempt to match the keyword with a pre-defined FCM revision keyword. If there are any expanded source directory (EXPSRC) declarations, the system will use "svn ls -R" to search recursively for all normal source directories containing regular files. These directories are then added to the "dir" property of the Fcm::ReposBranch instance.
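The VERSION rules above amount to a short decision chain, sketched below in Python for illustration. The function and parameter names are hypothetical. The "svn info" call is replaced by an injected `head_revision` function so that the sketch runs without a repository, and raising an error for an unknown keyword is an assumption about how failure is reported.

```python
from typing import Callable, Mapping


def resolve_version(
    repos: str,
    version: str,
    is_local: bool,
    revision_keywords: Mapping[str, int],
    head_revision: Callable[[str], int],
) -> str:
    """Return "USER" for local paths, otherwise a revision number as a string."""
    if is_local:
        # A file system path always gets the version "USER".
        return "USER"
    if version.isdigit():
        # Already a revision number: nothing to convert.
        return version
    if version == "HEAD":
        # The real system runs "svn info" here; this sketch uses the injected lookup.
        return str(head_revision(repos))
    # Otherwise the keyword must match a pre-defined revision keyword.
    if version not in revision_keywords:
        raise ValueError(f"{version}: unknown revision keyword for {repos}")
    return str(revision_keywords[version])
```

Injecting the lookup keeps the decision logic testable on its own; a real caller would pass a function that wraps the Subversion client.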

The extraction sequence

In the next step, the extract system converts the information given in the list of repository branches into a list of source directory sub-packages. Each source directory sub-package has a destination and an extraction sequence, held as a "stack". The sequence is a list used to locate the source directory in each of the repository branches. The order of the sequence is based on the order in which the repository branches are declared. The logic has already been discussed in the user guide.

The sequence is implemented by a list of Fcm::SrcDirLayer instances. For each Fcm::SrcDirLayer instance in an extraction sequence of a source directory, the system will attempt to find out its "last commit" revision, using the "svn info" command on the particular revision on the given URL. This information is normally cached in a file called ".config" in the cache sub-directory of the extraction destination root. For an incremental extraction, the system will consult the cache to obtain the list of "last commit" revisions for the source directories, instead of having to go through a large number of "svn info" commands again. The cache file is read/written using a temporary instance of Fcm::CfgFile. The label in each line consists of the package name of the sub-package, its URL and a revision number. The corresponding value is the "last commit" revision at the given revision number.
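The caching idea can be sketched as a lookup keyed on the three parts of the label. This Python fragment is an illustration only: the in-memory dictionary stands in for the ".config" cache file, the names are hypothetical, and `query` stands in for the "svn info" call.

```python
from typing import Callable, Dict, Tuple

# (sub-package name, URL, revision), mirroring the three parts of the cache label.
CacheKey = Tuple[str, str, int]


def last_commit_revision(
    cache: Dict[CacheKey, int],
    package: str,
    url: str,
    revision: int,
    query: Callable[[str, int], int],
) -> int:
    """Return the "last commit" revision, querying the repository only on a cache miss."""
    key = (package, url, revision)
    if key not in cache:
        # Cache miss: ask the repository once and remember the answer.
        cache[key] = query(url, revision)
    return cache[key]
```

On an incremental extraction the dictionary would be populated from the cache file first, so most lookups never reach `query`.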

The extraction

With the extraction sequence in place for each source directory, the extraction itself can now take place. There are two steps in this process.

For each "layer" in the extraction sequence of each source directory, if the "layer" contains a repository URL, the system extracts the source directory from that URL and places the resulting source files in a cache. Within the cache sub-directory of the destination root, the cache for each source directory is placed under a relative path that reflects the sub-package name of the source directory. Underneath this path is a list of directories whose names consist of the name of the branch and the "last commit" revision, separated by a double underscore ("__"). These directories hold the cached source files for the "layers" of the source directory.

It is also worth noting that source files from the local file system are not cached. They will be taken directly from their locations.
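The cache layout described above can be sketched as a path-building function. This is a Python illustration with hypothetical names. It assumes that the levels of a sub-package name are separated by "::" and that each level maps to one directory; the location of the cache root is passed in by the caller rather than assumed.

```python
from pathlib import PurePosixPath
from typing import Optional


def layer_cache_dir(
    cache_root: str,
    package: str,
    branch: str,
    last_commit: int,
    is_local: bool = False,
) -> Optional[PurePosixPath]:
    """Return the cache directory for one layer, or None for a local (uncached) layer."""
    if is_local:
        # Source files from the local file system are taken directly, not cached.
        return None
    # Assumption: "::" separates the levels of a sub-package name.
    relative = PurePosixPath(*package.split("::"))
    # Branch name and "last commit" revision are joined by a double underscore.
    return PurePosixPath(cache_root) / relative / f"{branch}__{last_commit}"
```

Because the directory name includes the "last commit" revision, a layer whose source has not changed between two requested revisions maps to the same cache directory.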

Once we have the cached "layers" (or branches) of the source directories, the system will select the source files from the correct cache before updating the destinations. The selection logic has already been discussed in the user guide.

At the end of this stage, we should have a directory tree in the destination source sub-directory, with the relative paths to the extracted files reflecting the sub-package names of those files.

The extract/build configuration generator

If extraction completes without any error, the system will attempt to write an expanded extract configuration file, where all revision keywords are expanded into numbers, and all source directory packages are declared. Subsequent dependent extractions will be able to re-use this configuration without having to invoke the Subversion client for repository and revision information.

The system will also attempt to produce a build configuration file for feeding to the build system. The following "conversions" are performed:

The mirror interface

The system uses "rdist" or "rsync" to mirror the extracted source code and the generated configuration files to a remote machine.
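As an illustration of the rsync route, the sketch below builds a command line without running it. It is written in Python with hypothetical names and does not reproduce the options the extract system actually passes. The use of "--delete", and of "remsh" as the remote shell for rsync, are assumptions made for the example.

```python
from typing import List


def rsync_mirror_command(
    source_dir: str,
    host: str,
    dest_dir: str,
    remote_shell: str = "remsh",
) -> List[str]:
    """Build (but do not run) an rsync command that mirrors source_dir to host:dest_dir."""
    # Trailing slash: copy the contents of source_dir, not the directory itself.
    src = source_dir.rstrip("/") + "/"
    return [
        "rsync",
        "-a",                      # archive mode: recurse and preserve file attributes
        "--delete",                # assumption: remove remote files that no longer exist locally
        f"--rsh={remote_shell}",   # assumption: remsh is used as the remote shell
        src,
        f"{host}:{dest_dir}",
    ]
```

The resulting list could be passed to `subprocess.run` by a caller that wants to execute the mirror.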