What are hsmget and hsmput?
As part of the FRE Overhaul we described a three-level storage model
that allows data coming from archive to be cached in an area called
ptmp (long-term shared scratch) while being copied to vftmp (fast
scratch) at runtime.
hsmget and hsmput are the scripts that implement this model. They are
loosely based on Tim Yeager's proposal.
What hsmget and hsmput do
hsmget and hsmput assume three levels of storage:
- an archive: permanent storage, possibly remote, reached over a slow
connection and held on linear media. This is /archive on the HPCS.
- a long-term shared scratch, accessible across a cluster of
supercomputing nodes such as the HPCS. Here we have two filesystems,
/ptmp and /work, that serve this purpose. You could also use
/vftmp/$user as a long-term scratch between jobs on the same node.
Data in long-term scratch is not guaranteed to stay forever:
anything requiring permanent storage must be archived.
- a fast scratch, with no guarantee of continuity beyond a single batch
job. This is the $TMPDIR created on /vftmp for a qlogin/qsub session.
On each of these we define a root directory, below which the directory
tree is identical for the files being retrieved. That is to say, given
variables called $ARCHROOT (e.g. "/archive/$USER"), $PTMPROOT (e.g.
"/ptmp/$USER") and $WORKROOT (e.g. "$TMPDIR"), a request to retrieve
$WORKROOT/foo/bar will look for a newer file called $PTMPROOT/foo/bar,
which in turn will look for a newer file called $ARCHROOT/foo/bar.
The remote file might also be in a tar or cpio container, i.e.
$ARCHROOT/foo.cpio which contains a file called bar. This is done to
reduce the number of individual files in archive.
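As a rough illustration of the container case, the sketch below builds a throwaway tar file standing in for $ARCHROOT/foo.tar and pulls the single member bar out of it into a ptmp-like directory. The temporary directories, the payload, and the choice of tar rather than cpio are assumptions made for the demo; this is not the code hsmget itself runs.

```shell
#!/bin/sh
# Sketch only: temporary directories stand in for the real storage roots.
DEMO=$(mktemp -d)
ARCHROOT="$DEMO/archive"   # hypothetical stand-in for /archive/$USER
PTMPROOT="$DEMO/ptmp"      # hypothetical stand-in for /ptmp/$USER
mkdir -p "$ARCHROOT" "$PTMPROOT/foo" "$DEMO/src"

# Build the container: foo.tar holding a single member called bar.
echo "payload" > "$DEMO/src/bar"
tar -cf "$ARCHROOT/foo.tar" -C "$DEMO/src" bar

# Retrieve only the requested member into the matching ptmp directory.
tar -xf "$ARCHROOT/foo.tar" -C "$PTMPROOT/foo" bar

cat "$PTMPROOT/foo/bar"
```

Keeping many small files inside one container means the archive holds one entry, foo.tar, instead of one entry per member.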
Note that transfers are only initiated when the source file is newer:
the underlying code is actually a Makefile.
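The lookup chain and the "only when newer" rule can be sketched in plain shell. This is a stand-in for the make-driven logic, not the actual hsmget code: the function names stage_if_newer and fetch_sketch are hypothetical, the three roots are temporary directories, and the `-nt` file test (supported by bash, dash and ksh, though not required by POSIX) replaces make's timestamp comparison.

```shell
#!/bin/sh
# Sketch only: a stand-in for the make-driven logic, not the real hsmget code.

# Copy src to dst when dst is missing or older than src.
# Prints "copy" or "skip" so the decision is visible.
stage_if_newer() {
    src=$1
    dst=$2
    if [ ! -e "$dst" ] || [ "$src" -nt "$dst" ]; then
        mkdir -p "$(dirname "$dst")"
        cp "$src" "$dst"
        echo "copy"
    else
        echo "skip"
    fi
}

# Walk the chain archive -> ptmp -> work for one relative path.
fetch_sketch() {
    rel=$1
    stage_if_newer "$ARCHROOT/$rel" "$PTMPROOT/$rel"
    stage_if_newer "$PTMPROOT/$rel" "$WORKROOT/$rel"
}

DEMO=$(mktemp -d)
ARCHROOT="$DEMO/archive"   # hypothetical stand-in for /archive/$USER
PTMPROOT="$DEMO/ptmp"      # hypothetical stand-in for /ptmp/$USER
WORKROOT="$DEMO/work"      # hypothetical stand-in for $TMPDIR
mkdir -p "$ARCHROOT/foo"
echo "data" > "$ARCHROOT/foo/bar"

fetch_sketch foo/bar   # first request: both levels are populated
fetch_sketch foo/bar   # second request: both targets are up to date
```

The first call copies at both levels and the second skips both, which is the behaviour a make rule gives when each target depends on the copy one level up.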
hsmget syntax
hsmget retrieves files from remote storage.
Usage: /home/vb/fre/bin/hsmget [options] file [file...]
Options:
  -a|--archroot <dir>   anchor point on remote storage
  -p|--ptmproot <dir>   anchor point on long-term scratch
  -w|--workroot <dir>   anchor point on local fast scratch
  -c|--checksum         run checksums on transfers
  -s|--sum <file>       external file of prior checksums (if known)
  -t|--time             turn on timers
  -f|--force            force transfer even if local file up-to-date
  -e|--extra            extra copy of target file saved in $workroot
  -n|--nocopy           dry run, no actual data transfer
  -m|--makefile         makefile that this invokes
  -q|--quiet            minimal output.
  -v|--verbose          verbose output.
  -h|--help             print help message.
Arguments must be files.
(No recursive gets because of the danger of huge retrievals)
Files are specified by listing the target path relative to workroot.
Container directory may be a tar/cpio archive on remote storage.
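A call might look like the sketch below, which uses only options from the list above. The root values follow the examples given earlier, foo/bar is a placeholder path, and run_or_show is a hypothetical helper added only so the snippet can run on a machine where hsmget is not installed.

```shell
#!/bin/sh
# Sketch only: foo/bar is a placeholder path, given relative to the work root.

# Hypothetical helper: run the command if it is installed, otherwise print it.
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

ARCHROOT="/archive/${USER:-user}"   # root values follow the examples in the text
PTMPROOT="/ptmp/${USER:-user}"
WORKROOT="${TMPDIR:-/tmp}"

# Dry run (-n) with verbose output (-v); drop -n to transfer for real.
run_or_show hsmget -a "$ARCHROOT" -p "$PTMPROOT" -w "$WORKROOT" -n -v foo/bar
```

Starting with -n is a cheap way to check which transfers would be triggered before any data moves.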
hsmput syntax
hsmput puts files to remote storage.
Usage: /home/vb/fre/bin/hsmput [options] path [path...]
Options:
  -a|--archroot <dir>   anchor point on remote storage
  -p|--ptmproot <dir>   anchor point on long-term scratch
  -w|--workroot <dir>   anchor point on local fast scratch
  -s|--store            remote store type (cpio, tar or directory)
  -c|--checksum         run checksums on transfers
  -t|--time             turn on timers
  -f|--force            force transfer even if local file up-to-date
  -n|--nocopy           dry run, no actual data transfer
  -m|--makefile         makefile that this invokes
  -q|--quiet            minimal output.
  -v|--verbose          verbose output.
  -h|--help             print help message.
Arguments must be files or directories.
Container directory may be a tar/cpio archive on remote storage.
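The two invocations below sketch the difference between a plain put and one that names a remote store type with -s. Only options from the list above are used; the directory name foo is a placeholder, the root values follow the earlier examples, and run_or_show is a hypothetical helper that prints the command when hsmput is not installed.

```shell
#!/bin/sh
# Sketch only: foo is a placeholder directory under the work root.

# Hypothetical helper: run the command if it is installed, otherwise print it.
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

ARCHROOT="/archive/${USER:-user}"   # root values follow the examples in the text
PTMPROOT="/ptmp/${USER:-user}"
WORKROOT="${TMPDIR:-/tmp}"

# Plain put, as a dry run (-n): no remote store type is requested.
run_or_show hsmput -a "$ARCHROOT" -p "$PTMPROOT" -w "$WORKROOT" -n foo

# Same put, explicitly requesting a tar container on remote storage.
run_or_show hsmput -a "$ARCHROOT" -p "$PTMPROOT" -w "$WORKROOT" -s tar -n foo
```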
Notes
- Note that hsmget arguments must be files. This is to avoid the risk
of excessive transfers, filling disks and so on, by inadvertently
retrieving large file trees. However, you can hsmput an entire
directory, which is less risky.
- hsmput only puts data to ptmp unless you explicitly request storing
to archive using hsmput -s cpio or equivalent.
- To repeat, the underlying code uses make: transfers in either
direction are only initiated when the target is out of date.
- The idea is that this can be generalized to cases where the archive
and compute are not collocated, i.e. where the archive is remote and
compute is local, or vice versa.
Where to get it
<example>
cvs co -r testing_fre hsm
</example>
created by v. balaji (balaji princeton.edu) in emacs using the emacs-muse
mode.