Extracting files from SAM
by Greg Landsberg - Last update on 22-Nov-2005
Introduction
Here are instructions on how to extract (stage) files from SAM. They are applicable to data, MC, or any generic SAM query. In order to extract files you will need to know the following:
- The name of the SAM project that was used to create the particular dataset. This may be either your private query or a standard query from the Common Sample Group. For example, to extract the CAF root files for a particular MC run corresponding to the fixed p17.08.01 data, use CSG_CAF-MCv1-XXXXX as the project name, where XXXXX is the MC request ID. To stage files of a particular CAF skim fixed with reco version p17.07.00, use CSG_CAF_SKIM_v3, where SKIM is the name of the skim, e.g. 2EMhighpt.
- The working directory to which you would like to copy the staged files. Note that on some systems (e.g., clued0) the SAM cache is distributed locally among the machines in the cluster, so it is not visible cluster-wide. You are therefore best off copying files to your working directory as soon as they are staged on a particular node and placed in its local cache. Make sure that there is enough free space in the working directory. If you omit the name of the directory, files will be staged but not copied.
- The maximum number of files you want to stage. If the skim you are working with is very large and you only need the first few files, this is a good way to reduce the output and processing time. If this parameter is not set, a maximum of 1000 files will be staged.
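As a rough illustration of the preparation described in the list above, the following POSIX sh sketch builds project names from the two naming patterns quoted there and reports the free space in a candidate working directory. The function names (mc_project_name, skim_project_name, free_kb) are hypothetical helpers, not part of sam_query, and the sketch assumes a df that supports the POSIX -P and -k options.

```shell
#!/bin/sh
# Hypothetical helpers; none of these are part of sam_query itself.

# CAF root files for an MC request: CSG_CAF-MCv1-<MC request ID>
mc_project_name() {
    printf 'CSG_CAF-MCv1-%s\n' "$1"
}

# CAF skim fixed with reco version p17.07.00: CSG_CAF_<skim name>_v3
skim_project_name() {
    printf 'CSG_CAF_%s_v3\n' "$1"
}

# Free space, in kilobytes, on the file system holding the given directory.
# df -P prints a one-line header, then one line per file system; the fourth
# column is the available space.
free_kb() {
    df -Pk "${1:-.}" | awk 'NR==2 {print $4}'
}

skim_project_name 2EMhighpt   # prints CSG_CAF_2EMhighpt_v3
free_kb .                     # prints the free space of the current directory
```

How much free space counts as "enough" depends on the size of the dataset, which this sketch does not try to estimate.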
Usage
The script, sam_query, is located on the clued0 cluster in the ~gll/sam/ directory. You can run it from your own work space. The script submits a SAM batch job; upon its completion your files will be copied to the working directory. Once the SAM job is finished, you will have to manually clean up the SAM batch job log files in your home directory and the getroot_*.py* files in your working directory (or the current directory if no working directory was given).
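The manual clean-up of the getroot_*.py* files could be done along the following lines. This is only a sketch in POSIX sh: clean_getroot_files is a hypothetical helper name, and the SAM batch job log files in the home directory are deliberately left out because their names are not given here.

```shell
#!/bin/sh
# Hypothetical helper: remove leftover getroot_*.py* files from a directory
# (default: the current directory). Run it only after the SAM job has finished.
clean_getroot_files() {
    dir=${1:-.}
    for f in "$dir"/getroot_*.py*; do
        # An unmatched glob stays literal, so skip anything that is not a regular file.
        [ -f "$f" ] || continue
        echo "removing $f"
        rm -f -- "$f"
    done
}
```

Calling clean_getroot_files with the working directory as its argument (or with no argument for the current directory) removes only files whose names start with getroot_, so the auxiliary files _getroot.py and __getroot.py are not matched by this pattern.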
The syntax is as follows:
% csh ~gll/sam/sam_query project_name [working_directory] [max_files]
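To make the syntax above concrete, here is a small dry-run sketch in POSIX sh that checks the arguments and prints the command line it would run, without submitting anything. The wrapper name sam_query_cmd and the directory /work/mydir are hypothetical; the project name follows the CSG_CAF_SKIM_v3 pattern with the 2EMhighpt skim. The sketch assumes the arguments are positional, as the syntax line suggests, so max_files can only be given together with a working directory.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: validates the arguments and prints the
# sam_query command line instead of executing it.
sam_query_cmd() {
    if [ "$#" -lt 1 ] || [ "$#" -gt 3 ]; then
        echo "usage: sam_query_cmd project_name [working_directory] [max_files]" >&2
        return 1
    fi
    # max_files, when given, must consist of digits only.
    if [ "$#" -eq 3 ]; then
        case "$3" in
            ''|*[!0-9]*)
                echo "max_files must be a whole number" >&2
                return 1
                ;;
        esac
    fi
    # "$*" joins the arguments with single spaces.
    printf 'csh ~gll/sam/sam_query %s\n' "$*"
}

# Stage at most 10 files of the 2EMhighpt skim into a hypothetical directory.
sam_query_cmd CSG_CAF_2EMhighpt_v3 /work/mydir 10
# prints: csh ~gll/sam/sam_query CSG_CAF_2EMhighpt_v3 /work/mydir 10
```

Once the printed line looks right, it can be pasted at the prompt on clued0 to submit the actual job.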
Limitations
The script should be portable to other clusters, with appropriate changes to the script file locations in the script code. The script uses two auxiliary files, _getroot.py and __getroot.py, located in the same directory as the script. You may have to set the environment variable SAM_STATION on your local cluster if it is not set by the setup sam command.
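Before running the script on another cluster, it may help to check whether SAM_STATION is already defined. The POSIX sh sketch below does only that; check_sam_station is a hypothetical helper name, and the correct station name is site-specific and not given here.

```shell
#!/bin/sh
# Hypothetical helper: report whether SAM_STATION is set in the environment.
check_sam_station() {
    if [ -n "${SAM_STATION:-}" ]; then
        echo "SAM_STATION=$SAM_STATION"
    else
        echo "SAM_STATION is not set" >&2
        return 1
    fi
}
```

If the check fails after setup sam, the variable can be set by hand: with setenv in csh, or with export in sh-like shells, using the station name appropriate for your cluster.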