hsmget and hsmput


What are `hsmget` and `hsmput`?

As part of the FRE Overhaul we described a three-level storage model that allows data coming from archive to be cached in an area called ptmp (long-term shared scratch) when it is copied to vftmp (fast scratch) at runtime.

hsmget and hsmput are the scripts that achieve this. They are loosely based on Tim Yeager's proposal.

What `hsmget` and `hsmput` do

hsmget and hsmput assume three levels of storage:

an archive: permanent storage, possibly remote, reached over a slow connection and held on linear media. This is /archive on the HPCS.
a long-term shared scratch, accessible across a cluster of supercomputing nodes such as the HPCS. Here two filesystems, /ptmp and /work, serve this purpose. You could also use /vftmp/$user as a long-term scratch between jobs on the same node. Data in long-term scratch is not guaranteed to persist: anything requiring permanent storage must be archived.
a fast scratch, with no guarantee of continuity beyond a single batch job. This is the $TMPDIR created on /vftmp for a qlogin/qsub session.

On each of these we define a root directory, below which the directory tree is identical for the files being retrieved. That is, given variables called $ARCHROOT (e.g. "/archive/$USER"), $PTMPROOT (e.g. "/ptmp/$USER") and $WORKROOT (e.g. "$TMPDIR"), a request to retrieve $WORKROOT/foo/bar will look for a newer file called $PTMPROOT/foo/bar, and that file is in turn refreshed from $ARCHROOT/foo/bar if the archive copy is newer.
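The path convention above can be sketched in Python. This only illustrates the mapping and is not the scripts' own code; the function name `candidate_paths`, the user `alice` and the job directory are hypothetical.

```python
from pathlib import PurePosixPath


def candidate_paths(relpath, workroot, ptmproot, archroot):
    """Return the same relative path anchored under each storage root.

    The order is fast scratch, long-term scratch, archive, which is the
    order in which a retrieval looks for the file.
    """
    rel = PurePosixPath(relpath)
    if rel.is_absolute():
        raise ValueError("path must be relative to the roots")
    roots = (workroot, ptmproot, archroot)
    return [str(PurePosixPath(root) / rel) for root in roots]


# Hypothetical roots for a user "alice" inside a batch job.
paths = candidate_paths(
    "foo/bar", "/vftmp/alice/job.123", "/ptmp/alice", "/archive/alice"
)
for p in paths:
    print(p)
```

Because the tree below each root is identical, a single relative path is enough to name the file at all three levels.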

The remote file might also be in a tar or cpio container, i.e. $ARCHROOT/foo.cpio, which contains a file called bar. Containers are used to reduce the number of individual files in archive.
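As an illustration of the container case, the following Python sketch pulls a single member out of a tar file. It assumes the member is stored under the bare name `bar`; how the real scripts name members is not stated here. The helper `read_member` is hypothetical, and cpio is left out because the Python standard library has no cpio reader.

```python
import io
import os
import tarfile
import tempfile


def read_member(container, member):
    """Return the bytes of `member` stored inside the tar file `container`."""
    with tarfile.open(container, "r") as tar:
        extracted = tar.extractfile(member)  # raises KeyError if absent
        if extracted is None:
            raise FileNotFoundError(f"{member} is not a regular file")
        return extracted.read()


# Build a throwaway container standing in for $ARCHROOT/foo.tar.
with tempfile.TemporaryDirectory() as archroot:
    container = os.path.join(archroot, "foo.tar")
    payload = b"model output\n"
    with tarfile.open(container, "w") as tar:
        info = tarfile.TarInfo(name="bar")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    data = read_member(container, "bar")

print(data)
```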

Note that transfers are only initiated when the source file is newer: the underlying code is actually a Makefile.
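The make-style rule can be sketched in Python: a target is refreshed only if it is missing or older than its source. This is an illustration under that assumption, not the Makefile itself; `is_out_of_date` and `fetch` are hypothetical names.

```python
import os
import shutil
import tempfile


def is_out_of_date(target, source):
    """Make-style test: True if target is missing or older than source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(target) < os.path.getmtime(source)


def fetch(source, target):
    """Copy source to target only when the target is out of date."""
    if not is_out_of_date(target, source):
        return False
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copy(source, target)  # the copy gets a fresh modification time
    return True


with tempfile.TemporaryDirectory() as top:
    ptmp_file = os.path.join(top, "ptmp", "foo", "bar")
    work_file = os.path.join(top, "work", "foo", "bar")
    os.makedirs(os.path.dirname(ptmp_file))
    with open(ptmp_file, "w") as f:
        f.write("data\n")
    first = fetch(ptmp_file, work_file)   # target missing: copies
    second = fetch(ptmp_file, work_file)  # target up to date: skips

print(first, second)
```

Applying the same test first between archive and long-term scratch, then between long-term scratch and fast scratch, gives the chained behaviour described above.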

`hsmget` syntax

hsmget retrieves files from remote storage.

Usage: /home/vb/fre/bin/hsmget [options] file [file...]
  Options:
      -a|--archroot <dir> anchor point on remote storage
      -p|--ptmproot <dir> anchor point on long-term scratch
      -w|--workroot <dir> anchor point on local fast scratch
      -c|--checksum       run checksums on transfers
      -s|--sum <file>     external file of prior checksums (if known)
      -t|--time           turn on timers
      -f|--force          force transfer even if local file up-to-date
      -e|--extra          extra copy of target file saved in $workroot
      -n|--nocopy         dry run, no actual data transfer
      -m|--makefile       makefile that this invokes
      -q|--quiet          minimal output.
      -v|--verbose        verbose output.
      -h|--help           print help message.
  Arguments must be files.
  (No recursive gets because of the danger of huge retrievals)
  Files are specified by listing the target path relative to workroot.
  Container directory may be a tar/cpio archive on remote storage.
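A dry-run request can be assembled from the options listed above. The Python sketch below only builds the argument list and does not run anything; the helper `hsmget_command`, the user `alice` and the job directory are hypothetical, and `hsmget` is assumed to be on the search path.

```python
import shlex


def hsmget_command(files, archroot, ptmproot, workroot,
                   dry_run=False, checksum=False):
    """Assemble an hsmget argument list from the options in the usage text."""
    if not files:
        raise ValueError("hsmget needs at least one file argument")
    argv = ["hsmget", "-a", archroot, "-p", ptmproot, "-w", workroot]
    if checksum:
        argv.append("-c")  # run checksums on transfers
    if dry_run:
        argv.append("-n")  # no actual data transfer
    argv.extend(files)     # paths are relative to workroot
    return argv


argv = hsmget_command(
    ["foo/bar"], "/archive/alice", "/ptmp/alice", "/vftmp/alice/job.123",
    dry_run=True,
)
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run` once the dry run shows the expected transfers.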

`hsmput` syntax

hsmput puts files to remote storage.

Usage: /home/vb/fre/bin/hsmput [options] path [path...]
  Options:
      -a|--archroot <dir> anchor point on remote storage
      -p|--ptmproot <dir> anchor point on long-term scratch
      -w|--workroot <dir> anchor point on local fast scratch
      -s|--store          remote store type (cpio, tar or directory)
      -c|--checksum       run checksums on transfers
      -t|--time           turn on timers
      -f|--force          force transfer even if local file up-to-date
      -n|--nocopy         dry run, no actual data transfer
      -m|--makefile       makefile that this invokes
      -q|--quiet          minimal output.
      -v|--verbose        verbose output.
      -h|--help           print help message.
  Arguments must be files or directories.
  Container directory may be a tar/cpio archive on remote storage.

Notes

Note that hsmget arguments must be files. This avoids the risk of excessive transfers, full disks and so on, caused by inadvertently retrieving large file trees. However, you can hsmput an entire directory, which is less risky.
hsmput only puts data to ptmp unless you explicitly request storing to archive using hsmput -s cpio or equivalent.
To repeat, the underlying code uses make: transfers in either direction are only initiated when the target is out of date.
The idea is that this can be generalized to cases where the archive and compute are not collocated, i.e. where the archive is remote and compute local, or vice versa.
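The second note, on when hsmput writes to archive, can be illustrated with a small Python sketch that builds the argument list with or without a store type. The helper `hsmput_command` and the example roots are hypothetical; the store types are those listed in the usage text.

```python
STORE_TYPES = ("cpio", "tar", "directory")


def hsmput_command(paths, archroot, ptmproot, workroot, store=None):
    """Assemble an hsmput argument list.

    With store=None no -s option is passed, so per the notes the data
    only goes to long-term scratch. Passing a store type requests
    storing to archive.
    """
    if not paths:
        raise ValueError("hsmput needs at least one path argument")
    if store is not None and store not in STORE_TYPES:
        raise ValueError(f"store must be one of {STORE_TYPES}")
    argv = ["hsmput", "-a", archroot, "-p", ptmproot, "-w", workroot]
    if store is not None:
        argv += ["-s", store]
    argv.extend(paths)
    return argv


roots = ("/archive/alice", "/ptmp/alice", "/vftmp/alice/job.123")
to_ptmp = hsmput_command(["foo"], *roots)
to_archive = hsmput_command(["foo"], *roots, store="cpio")
print(" ".join(to_ptmp))
print(" ".join(to_archive))
```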

Where to get it

cvs co -r testing_fre hsm

created by v. balaji (balaji at princeton.edu) in emacs using the emacs-muse mode.