BlueOx: A Java Framework for Distributed Data Analysis
Abstract
High-energy physics experiments, including those at the Tevatron and the upcoming LHC, require the analysis of large data sets that are best handled by distributed computation. We present the design and development of a distributed data analysis framework based on Java. Analysis jobs run through three phases: discovery of the available data sets, brokering (assignment of data sets to analysis servers), and job execution. Each phase is represented by a set of abstract interfaces, which allow different techniques to be used without modifying the framework. For example, the communications interface has been implemented both by a packet protocol and by a SOAP-based scheme, and user authentication can be provided either through simple passwords or through a GSI certificate system. Data from the CMS HCAL test beams, the L3 experiment at LEP, and a hypothetical high-energy linear collider experiment have been interfaced with the framework.
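The three-phase, interface-based design described above can be illustrated with a minimal Java sketch. It is not taken from the BlueOx source: the names ThreePhaseSketch, DataSetDiscovery, Broker, JobExecutor, RoundRobinBroker, and runJob are all hypothetical, and round-robin assignment is assumed here only as one example of a brokering technique. The point is that the driver depends on the interfaces alone, so an implementation of any phase can be swapped without touching it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the BlueOx code base.
final class ThreePhaseSketch {

    /** Phase 1: discover which data sets are available. */
    interface DataSetDiscovery {
        List<String> discover();
    }

    /** Phase 2: assign each discovered data set to an analysis server. */
    interface Broker {
        Map<String, String> assign(List<String> dataSets, List<String> servers);
    }

    /** Phase 3: run the analysis for one data set on its assigned server. */
    interface JobExecutor {
        String execute(String dataSet, String server);
    }

    /** Framework-side driver: it depends only on the three interfaces. */
    static List<String> runJob(DataSetDiscovery discovery, Broker broker,
                               JobExecutor executor, List<String> servers) {
        List<String> dataSets = discovery.discover();
        Map<String, String> assignment = broker.assign(dataSets, servers);
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> entry : assignment.entrySet()) {
            results.add(executor.execute(entry.getKey(), entry.getValue()));
        }
        return results;
    }

    /** One possible brokering technique (an assumption): round-robin. */
    static final class RoundRobinBroker implements Broker {
        @Override
        public Map<String, String> assign(List<String> dataSets, List<String> servers) {
            if (servers.isEmpty()) {
                throw new IllegalArgumentException("no analysis servers available");
            }
            // LinkedHashMap keeps the data sets in discovery order.
            Map<String, String> assignment = new LinkedHashMap<>();
            for (int i = 0; i < dataSets.size(); i++) {
                assignment.put(dataSets.get(i), servers.get(i % servers.size()));
            }
            return assignment;
        }
    }

    public static void main(String[] args) {
        // Placeholder data set and server names, invented for the demo.
        DataSetDiscovery discovery = () -> Arrays.asList("run-001", "run-002", "run-003");
        JobExecutor executor = (dataSet, server) -> dataSet + "@" + server;
        List<String> results = runJob(discovery, new RoundRobinBroker(), executor,
                                      Arrays.asList("server-a", "server-b"));
        results.forEach(System.out::println);
    }
}
```

In this sketch, replacing RoundRobinBroker with another Broker implementation changes the assignment policy while runJob stays unchanged. The same pattern would apply to the communications and authentication layers mentioned above, although their actual interfaces are not specified here.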