Condor has a host of features for staging files for a process, but it isn't immediately obvious where everything lands.
"Submitting A Job without a shared filesystem" is the official reference, and it has multiple examples showing how to specify input and output paths.
This is a minimal condor submit script that sets up a job using condor's file transfer features. A bit below is the example python script I used to understand how the file transfer system works.
    universe=vanilla
    output=output
    error=output
    log=running.log
    executable=file_example.py

    # this is what turns this feature on
    should_transfer_files=yes

    # you can specify if you want to transfer when the program finishes successfully,
    # or even if it only partially runs (ON_EXIT_OR_EVICT)
    WhenToTransferOutput=on_exit

    # You need to list what input files you want to send
    # condor will copy the output back for you
    # (the job reads test.file, so that is the file to send)
    transfer_input_files=test.file

    # you can limit which files to copy out if you want
    # transfer_output_files=output.file

    arguments="--input test.file --output output.file"
    queue
The following is a small script that simulates an actual job: it reads some input and makes a user-specified output file, its own output file, and a temporary file. In addition, it's quite chatty and prints what it's doing to standard output.
All it's going to do is uppercase the input file. If you're going to use these files for your own testing, name the python file "file_example.py".
    #!/opt/local/bin/python2.7
    import optparse
    import os
    from pprint import pprint
    import sys
    import tempfile
    import time

    def main(cmdline=None):
        parser = optparse.OptionParser()
        parser.add_option('--input', default=None)
        parser.add_option('--output', default=None)
        opts, args = parser.parse_args(cmdline)

        print "starting in", os.getcwd()
        # dump the environment
        pprint(sorted(os.environ.items()))
        time.sleep(3)
        process_data(opts)
        make_a_file()
        make_tempfile()
        print "stopping"

    def process_data(opts):
        """Read a small datafile, uppercase it, and write it out
        """
        if opts.input is not None:
            print "Reading", os.path.abspath(opts.input)
            instream = open(opts.input,'r')
            contents = instream.read()
            instream.close()
            contents = contents.upper()
            if opts.output is not None:
                print "Writing", os.path.abspath(opts.output)
                outstream = open(opts.output, 'w')
                outstream.write(contents)
                outstream.close()
            else:
                sys.stdout.write(contents)

    def make_tempfile():
        """Create a temporary file that is deleted on exit"""
        stream = tempfile.NamedTemporaryFile()
        stream.write("I'm a temp file")
        print "Created a tempfile:", os.path.abspath(stream.name)
        stream.close()

    def make_a_file():
        """Create a file that is not deleted on exit"""
        filename = os.path.abspath("condor_ancestors.txt")
        print "Writing to", os.path.abspath(filename)
        stream = open(filename, 'a')
        for var in os.environ.keys():
            if var.startswith("_CONDOR_ANCESTOR"):
                # one variable per line
                stream.write("{0}={1}\n".format(var, os.environ[var]))
        stream.close()

    if __name__ == "__main__":
        main()
How it works
When condor launches a job, it creates a scratch directory on the execute host, usually named something like /var/lib/condor/hostname/execute/dir_$(pid). That is your job's default working directory: if you just open a file without providing a path, that's where it'll look, unless you do something to change your working directory.
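Here is a small sketch of how a job could check this for itself. It only relies on `os.getcwd()` and on the `_CONDOR_SCRATCH_DIR` variable that shows up in the environment dump later on this page. The helper name `where_am_i` is made up for illustration, and unlike the python 2.7 script above it's written for Python 3.

```python
import os


def where_am_i(environ=None, cwd=None):
    """Report the working directory and condor's scratch directory.

    `where_am_i` is a hypothetical helper, not part of condor.
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    return {
        "cwd": cwd,
        # None when the variable is not set, e.g. outside of condor
        "scratch": environ.get("_CONDOR_SCRATCH_DIR"),
        # a bare filename is resolved against the working directory
        "test.file resolves to": os.path.join(cwd, "test.file"),
    }


if __name__ == "__main__":
    for key, value in where_am_i().items():
        print(key, "=", value)
```

Run outside of condor, the scratch entry should simply come back as None.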
transfer_input_files will copy the listed files into that directory. In our environment, anything that's left in that directory at the end of a job will be copied back out.
(Look at the condor docs for a description of how to specify paths for where you want your input files to come from and where your output files should go, relative to your condor_submit file.)
Example
The reason for my example script was that I wanted to see what environment variables condor sets, and what directory temporary files would end up in.
Condor sets all the usual environment variables for specifying a temporary directory (TMPDIR, TMP, and TEMP), so python (and most C programs) should happily create files in condor's suggested directory.
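To see python honoring that setting without going through condor, here's a rough Python 3 sketch. It points TMPDIR at a directory of your choosing and asks the `tempfile` module what it would use. `tempdir_for` is a made-up helper name; clearing `tempfile.tempdir` is only there because the module caches its answer after the first lookup.

```python
import os
import tempfile


def tempdir_for(tmpdir):
    """Return the directory tempfile picks when TMPDIR is set to `tmpdir`."""
    saved_env = os.environ.get("TMPDIR")
    saved_cache = tempfile.tempdir
    os.environ["TMPDIR"] = tmpdir
    tempfile.tempdir = None  # drop the cached answer so TMPDIR is re-read
    try:
        return tempfile.gettempdir()
    finally:
        # put everything back so the demo has no lasting effect
        tempfile.tempdir = saved_cache
        if saved_env is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = saved_env


if __name__ == "__main__":
    fake_scratch = tempfile.mkdtemp()  # stand-in for condor's scratch directory
    print("tempfile would use:", tempdir_for(fake_scratch))
    os.rmdir(fake_scratch)
```

The directory has to exist and be writable, otherwise `tempfile` falls through to its next candidate.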
In my example I had python print the full path where the script, running under condor, thought its files were. Without the transfer_output_files option, condor_ancestors.txt and output.file both end up being copied back to where condor_submit was run.
If you uncomment the example transfer_output_files line, the ancestor file that was created won't be transferred. Also, since several of python's tempfile functions delete the tempfile when the process exits, that file is never transferred back.
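If you actually want a temporary file to survive, one option is `NamedTemporaryFile`'s `delete=False` argument. The following is a Python 3 sketch with a made-up helper name, `make_kept_tempfile`. Whether condor then copies the file back is an assumption that depends on the behaviour described above for our environment, and on transfer_output_files not being set, since the random filename can't be listed in advance.

```python
import os
import tempfile


def make_kept_tempfile(directory):
    """Create a temp file in `directory` that survives being closed."""
    # delete=False stops NamedTemporaryFile from removing the file on close
    with tempfile.NamedTemporaryFile(
        mode="w", dir=directory, suffix=".txt", delete=False
    ) as stream:
        stream.write("I'm a temp file that sticks around\n")
    return stream.name


if __name__ == "__main__":
    # os.getcwd() is the scratch directory when condor is transferring files
    path = make_kept_tempfile(os.getcwd())
    print("Kept tempfile:", path, "exists:", os.path.exists(path))
```

Cleaning the file up is then your job, not python's.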
Directory before running:
    $ ls
    simple.py  test.condor  test.file

Directory after a run:

    $ ls
    condor_ancestors.txt  output.file  simple.py    test.file
    output.0.output       running.log  test.condor

Output of a run (output.0.output):
    starting in /usr/local/condor/local.dhcp-34-148/execute/dir_85041
    [('TEMP', '/usr/local/condor/local.dhcp-34-148/execute/dir_85041'),
     ('TMP', '/usr/local/condor/local.dhcp-34-148/execute/dir_85041'),
     ('TMPDIR', '/usr/local/condor/local.dhcp-34-148/execute/dir_85041'),
     ('_CONDOR_ANCESTOR_79794', '79798:1332184343:2093136950'),
     ('_CONDOR_ANCESTOR_79798', '85041:1332281852:999993202'),
     ('_CONDOR_ANCESTOR_85041', '85045:1332281853:3386081725'),
     ('_CONDOR_JOB_AD', '/usr/local/condor/local.dhcp-34-148/execute/dir_85041/.job.ad'),
     ('_CONDOR_JOB_IWD', '/usr/local/condor/local.dhcp-34-148/execute/dir_85041'),
     ('_CONDOR_JOB_PIDS', ''),
     ('_CONDOR_MACHINE_AD', '/usr/local/condor/local.dhcp-34-148/execute/dir_85041/.machine.ad'),
     ('_CONDOR_SCRATCH_DIR', '/usr/local/condor/local.dhcp-34-148/execute/dir_85041'),
     ('_CONDOR_SLOT', '1'),
     ('__CF_USER_TEXT_ENCODING', '0x1F5:0:0')]
    Reading /usr/local/condor/local.dhcp-34-148/execute/dir_85041/test.file
    Writing /usr/local/condor/local.dhcp-34-148/execute/dir_85041/output.file
    Writing to /usr/local/condor/local.dhcp-34-148/execute/dir_85041/condor_ancestors.txt
    Created a tempfile: /usr/local/condor/local.dhcp-34-148/execute/dir_85041/tmpl2981S
    stopping
If should_transfer_files is undefined, you may still get the scratch directory for temporary files, but the process's working directory is the initialdir:
    [('TEMP', '/usr/local/condor/local.dhcp-34-148/execute/dir_85071'),
     ('TMP', '/usr/local/condor/local.dhcp-34-148/execute/dir_85071'),
     ('TMPDIR', '/usr/local/condor/local.dhcp-34-148/execute/dir_85071'),
     ('_CONDOR_ANCESTOR_79794', '79798:1332184343:2093136950'),
     ('_CONDOR_ANCESTOR_79798', '85071:1332281894:999993203'),
     ('_CONDOR_ANCESTOR_85071', '85072:1332281894:819081569'),
     ('_CONDOR_JOB_AD', '/usr/local/condor/local.dhcp-34-148/execute/dir_85071/.job.ad'),
     ('_CONDOR_JOB_IWD', '/Users/diane/tmp/example'),
     ('_CONDOR_JOB_PIDS', ''),
     ('_CONDOR_MACHINE_AD', '/usr/local/condor/local.dhcp-34-148/execute/dir_85071/.machine.ad'),
     ('_CONDOR_SCRATCH_DIR', '/usr/local/condor/local.dhcp-34-148/execute/dir_85071'),
     ('_CONDOR_SLOT', '1'),
     ('__CF_USER_TEXT_ENCODING', '0x1F5:0:0')]
    Reading /Users/diane/tmp/example/test.file
    Writing /Users/diane/tmp/example/output.file
    Writing to /Users/diane/tmp/example/condor_ancestors.txt
    Created a tempfile: /usr/local/condor/local.dhcp-34-148/execute/dir_85071/tmphrwVKe
    stopping
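The two dumps differ in one telling way: with file transfer on, _CONDOR_JOB_IWD matches _CONDOR_SCRATCH_DIR, and without it _CONDOR_JOB_IWD is the submit directory. A job could use that to guess which mode it's in. This is a Python 3 sketch based only on the two runs shown here, not on anything condor promises, and `runs_in_scratch` is a made-up name.

```python
import os


def runs_in_scratch(environ=None):
    """Guess whether the job's working directory is condor's scratch directory.

    Hypothetical helper: it just compares two variables from the dumps.
    Returns None when either variable is missing (e.g. outside condor).
    """
    environ = os.environ if environ is None else environ
    iwd = environ.get("_CONDOR_JOB_IWD")
    scratch = environ.get("_CONDOR_SCRATCH_DIR")
    if iwd is None or scratch is None:
        return None
    return os.path.normpath(iwd) == os.path.normpath(scratch)


# values copied from the two runs on this page
with_transfer = {
    "_CONDOR_JOB_IWD": "/usr/local/condor/local.dhcp-34-148/execute/dir_85041",
    "_CONDOR_SCRATCH_DIR": "/usr/local/condor/local.dhcp-34-148/execute/dir_85041",
}
without_transfer = {
    "_CONDOR_JOB_IWD": "/Users/diane/tmp/example",
    "_CONDOR_SCRATCH_DIR": "/usr/local/condor/local.dhcp-34-148/execute/dir_85071",
}

if __name__ == "__main__":
    print("with transfer:", runs_in_scratch(with_transfer))
    print("without transfer:", runs_in_scratch(without_transfer))
```

Treat the result as a hint rather than a guarantee; other condor configurations may set these variables differently.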