José Rufino
Instituto Politécnico Nacional
Publication
Featured research published by José Rufino.
geographic information retrieval | 2005
José Exposto; Joaquim Macedo; António Manuel Silva Pina; Albano Alves; José Rufino
This paper evaluates scalable distributed crawling by means of the geographical partitioning of the Web. The approach relies on multiple distributed crawlers, each responsible for the pages belonging to one or more previously identified geographical zones. The work considers a distributed crawler where the assignment of pages to visit is based on the geographical scope of page content. For the initial assignment of a page to a partition, we use a simple heuristic that places a page within the same scope as the geographical location of its hosting web server. During download, if the analysis of a page's contents indicates a different geographical scope, the page is forwarded to the appropriately located server. A sample of Portuguese Web pages, extracted during 2005, was used to evaluate a) page download communication times and b) the overhead of exchanging pages among servers. The evaluation results allow our approach to be compared with conventional hash partitioning strategies.
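As an illustration of the assignment heuristic described in the abstract, the following Python sketch assigns a page to a zone based on its hosting server and forwards it when content analysis suggests a different scope; the zone names and the geolocation/content-analysis helpers are hypothetical, not taken from the paper.

```python
# Illustrative sketch of geography-based crawl partitioning (names are hypothetical).
from typing import Optional
from urllib.parse import urlparse

ZONES = ["norte", "centro", "sul"]                 # example geographical zones
CRAWLERS = {zone: f"crawler-{zone}" for zone in ZONES}

def zone_of_host(hostname: str) -> str:
    """Assumed geolocation lookup: map a hosting server to a zone."""
    return ZONES[hash(hostname) % len(ZONES)]      # stand-in for a real geo lookup

def zone_of_content(html: str) -> Optional[str]:
    """Assumed content analysis: infer the geographical scope of a page, if any."""
    for zone in ZONES:
        if zone in html.lower():
            return zone
    return None

def assign_and_maybe_forward(url: str, html: str) -> str:
    # Initial assignment: same scope as the hosting server's location.
    zone = zone_of_host(urlparse(url).hostname or "")
    # After download: if content analysis suggests another scope, forward the page.
    content_zone = zone_of_content(html)
    if content_zone and content_zone != zone:
        zone = content_zone
    return CRAWLERS[zone]
```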
international conference on parallel processing | 2012
Albano Alves; José Rufino; António Manuel Silva Pina; Luís Paulo Santos
Clusters that combine heterogeneous compute device architectures, coupled with novel programming models, have created a true alternative to traditional (homogeneous) cluster computing, making it possible to leverage the performance of parallel applications. In this paper we introduce clOpenCL, a platform that supports the simple deployment and efficient running of OpenCL-based parallel applications that may span several cluster nodes, expanding the original single-node OpenCL model. clOpenCL is deployed through user-level services, thus allowing OpenCL applications from different users to share the same cluster nodes and their compute devices. Data exchanges between distributed clOpenCL components rely on Open-MX, a high-performance communication library. We also present extensive experimental data and discuss key conditions that must be addressed when exploiting clOpenCL with real applications.
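The paper describes an architecture rather than an API, so the following is only a conceptual Python sketch of the single-image idea behind clOpenCL: devices exposed by per-node, user-level services are flattened into one cluster-wide list and work is dispatched to them as if they were local. All names are illustrative.

```python
# Conceptual sketch of a cluster-wide device view (illustrative only; not the clOpenCL API).
from dataclasses import dataclass
from itertools import cycle

@dataclass(frozen=True)
class RemoteDevice:
    node: str      # cluster node running the user-level service
    device: str    # local OpenCL device exposed by that service

def enumerate_devices(nodes: dict[str, list[str]]) -> list[RemoteDevice]:
    """Flatten per-node device lists into one cluster-wide device list."""
    return [RemoteDevice(n, d) for n, devs in nodes.items() for d in devs]

def dispatch(tasks: list[str], devices: list[RemoteDevice]) -> dict[str, RemoteDevice]:
    """Round-robin placement of kernel launches over all cluster devices."""
    rr = cycle(devices)
    return {task: next(rr) for task in tasks}

if __name__ == "__main__":
    cluster = {"node01": ["gpu0", "gpu1"], "node02": ["gpu0"], "node03": ["cpu0"]}
    placement = dispatch([f"kernel-{i}" for i in range(6)], enumerate_devices(cluster))
    for task, dev in placement.items():
        print(task, "->", dev.node, dev.device)
```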
european conference on parallel processing | 2003
Albano Alves; António Manuel Silva Pina; José Exposto; José Rufino
RoCL is a communication library that aims to exploit the low-level communication facilities of today's cluster networking hardware and to merge, via the resource-oriented paradigm, those facilities with the high degree of parallelism achieved on SMP systems through multithreading.
international parallel and distributed processing symposium | 2004
José Rufino; Albano Alves; José Exposto; António Manuel Silva Pina
Summary form only given. We refine previous work on a model for a distributed hash table (DHT) with support for dynamic balancing across a set of heterogeneous cluster nodes. We present new high-level entities, invariants and algorithms developed to increase the level of parallelism and to globally reduce memory utilization. In contrast to a global distribution mechanism, which relies on complete knowledge of the current distribution of the hash table, we adopt a local approach, based on the division of the DHT into separate regions that possess only partial knowledge of the global hash table. Simulation results confirm the hypothesis that increased parallelism comes at the cost of degrading the quality of the balancing achieved with the global approach. However, when compared with consistent hashing and with our global approach, the same results clarify the relative merits of the extension, showing that, when properly parameterized, the model remains competitive, both in terms of the quality of the distribution and in terms of scalability.
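A minimal Python sketch of the local, region-based idea, with a made-up placement policy (the paper's actual entities, invariants and algorithms are not reproduced here): each region balances keys using only its own partial view, so balance within a region can be good even though regions never coordinate globally.

```python
# Illustrative sketch of region-local balancing in a DHT (names and policy are hypothetical).
import hashlib

REGIONS = 4                      # the DHT hash space is split into independent regions
NODES_PER_REGION = {r: [f"r{r}-n{i}" for i in range(3)] for r in range(REGIONS)}
load = {n: 0 for nodes in NODES_PER_REGION.values() for n in nodes}

def region_of(key: str) -> int:
    digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return digest % REGIONS

def place(key: str) -> str:
    """Each region places keys using only its own (partial) view of the load."""
    region = region_of(key)
    node = min(NODES_PER_REGION[region], key=lambda n: load[n])  # local least-loaded
    load[node] += 1
    return node

for i in range(1000):
    place(f"key-{i}")
print(load)   # balance is good within regions, but regions never coordinate globally
```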
parallel processing and applied mathematics | 2005
José Rufino; António Manuel Silva Pina; Albano Alves; José Exposto
This paper presents a high-level description of Domus, an architecture for cluster-oriented Distributed Hash Tables. As a data management layer, Domus supports the concurrent execution of multiple and heterogeneous DHTs, which may be accessed simultaneously by different distributed/parallel client applications. At the system level, a load balancing mechanism allows for the (re)distribution of each DHT over the cluster nodes, based on the monitoring of their resources, including CPU, memory, storage and network. Two basic units of balancing are supported: vnodes, a coarse-grain unit, and partitions, a fine-grain unit. The design also takes advantage of the strict separation of object lookup and storage, at each cluster node and for each DHT. Lookup follows a distributed strategy that benefits from the joint analysis of routing information from multiple partitions to shorten routing paths. Storage is accomplished through different kinds of data repositories, according to the specificity and requirements of each DHT.
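The following Python sketch illustrates, with hypothetical names, the two balancing units and the per-node separation of lookup and storage roles mentioned above; it is a conceptual aid, not the Domus data model.

```python
# Conceptual sketch of Domus-style balancing units and role separation
# (data layout and names are illustrative, not taken from the paper).
from dataclasses import dataclass, field

@dataclass
class Partition:              # fine-grain balancing unit: a slice of a DHT's hash space
    dht: str
    index: int

@dataclass
class VNode:                  # coarse-grain balancing unit: a group of partitions
    partitions: list[Partition] = field(default_factory=list)

@dataclass
class ClusterNode:
    name: str
    lookup_vnodes: list[VNode] = field(default_factory=list)    # routing duties only
    storage_vnodes: list[VNode] = field(default_factory=list)   # storage duties only

# A node may take routing duties for one DHT and storage duties for another,
# which is the kind of per-node, per-DHT configuration the architecture allows.
n = ClusterNode("node07")
n.lookup_vnodes.append(VNode([Partition("dht-A", 3), Partition("dht-A", 9)]))
n.storage_vnodes.append(VNode([Partition("dht-B", 1)]))
print(n)
```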
parallel, distributed and network-based processing | 2007
José Rufino; Albano Alves; José Exposto; António Manuel Silva Pina
The Domus architecture for distributed hash tables (DHTs) is specially designed to support the concurrent deployment of multiple and heterogeneous DHTs in a dynamic shared-all cluster environment. The execution model is compatible with the simultaneous access of several distributed/parallel client applications to the same or different running DHTs. Support for distributed routing and storage is dynamically configurable per node, as a function of application requirements, node base resources and the overall cluster communication, memory and storage usage. pDomus is a prototype of Domus that creates an environment in which to evaluate the concepts embedded in the model and its planned features. In this paper, we present a series of experiments conducted to obtain figures of merit i) for the performance of basic dictionary operations and ii) for the storage overhead resulting from several storage technologies. We also formulate a ranking formula that takes into account the access patterns of clients to DHTs, in order to objectively select the most adequate storage technology, as a valuable metric for a wide range of application scenarios. Finally, we evaluate the scalability of client applications and services for a selected dictionary operation. The results of the overall evaluation are promising and a motivation for further work.
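The paper's actual ranking formula is not reproduced here; the sketch below merely illustrates the idea of weighting per-technology cost figures by the clients' access pattern, with made-up numbers and names.

```python
# Hypothetical ranking sketch: weight per-technology costs by the observed access pattern.

def rank(technologies: dict[str, dict[str, float]], pattern: dict[str, float]) -> list[str]:
    """
    technologies: per-technology cost figures, e.g. {"disk-btree": {"get": 2.5, "put": 3.0, "overhead": 1.1}}
    pattern:      fractions of client operations, e.g. {"get": 0.8, "put": 0.2}
    Lower weighted cost ranks first.
    """
    def cost(t: dict[str, float]) -> float:
        op_cost = sum(pattern[op] * t[op] for op in pattern)   # access-pattern weighting
        return op_cost * t.get("overhead", 1.0)                # penalize storage overhead
    return sorted(technologies, key=lambda name: cost(technologies[name]))

techs = {
    "in-memory-hash": {"get": 1.0, "put": 1.0, "overhead": 1.5},
    "disk-btree":     {"get": 2.5, "put": 3.0, "overhead": 1.1},
}
print(rank(techs, {"get": 0.9, "put": 0.1}))   # read-heavy workloads favour the faster reader
```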
symposium on computer architecture and high performance computing | 2004
Albano Alves; António Manuel Silva Pina; José Exposto; José Rufino
The increasing complexity of high-demand, long-running applications has confronted programmers with the need to take into account both development effort and execution time. meμ provides the flexibility to control the amount of computational and communication power being used, in order to maximize resource utilization and to deliver high performance. In this paper we focus on the aspects of the paradigm that go beyond traditional message passing approaches, promoting the idea that, by raising the abstraction level of programming models, programmers will make better use of the available resources, with a clear impact on both productivity and performance. We introduce the resource as the abstraction used to represent and manage both physical resources - nodes, memory, processors and communication technologies - and logical resources - modules, processes, tasks, threads, groups, etc. We also concentrate on the task of specifying, locating and aggregating resources in order to support the mapping of applications onto the target cluster hardware and the explicit management of the memory hierarchy.
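A small Python sketch of the "resource" abstraction as described above, where physical and logical resources share one hierarchy and can be located by attribute queries; the attribute names and the query helper are hypothetical.

```python
# Illustrative sketch of a unified "resource" abstraction (attribute names are hypothetical).
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    kind: str                      # e.g. "node", "processor", "process", "thread", "group"
    attrs: dict = field(default_factory=dict)
    children: list["Resource"] = field(default_factory=list)

def locate(root: Resource, **criteria) -> list[Resource]:
    """Find resources whose attributes match the given criteria (simple aggregation query)."""
    hits = [root] if all(root.attrs.get(k) == v for k, v in criteria.items()) else []
    for child in root.children:
        hits += locate(child, **criteria)
    return hits

# Physical resources (a node with SMP processors) and logical resources (tasks, groups)
# live in the same hierarchy, so mapping one onto the other is a matter of queries.
node = Resource("node01", "node", {"memory_gb": 8},
                [Resource(f"cpu{i}", "processor", {"arch": "x86"}) for i in range(4)])
print([r.name for r in locate(node, arch="x86")])
```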
computational science and engineering | 2009
Albano Alves; António Manuel Silva Pina; José Exposto; José Rufino
The effective exploitation of multi-SAN SMP clusters and the use of generic clusters to support complex information systems require new approaches: multi-SAN SMP clusters introduce new levels of parallelism, while traditional environments are mainly used to run scientific computations. In this paper we present a novel approach to the exploitation of clusters that integrates, under a single metaphor, the representation of physical resources, the modelling of applications and the mapping of applications onto physical resources. The proposed abstractions favoured the development of an API that allows combining, and benefiting from, the shared memory, message passing and global memory paradigms.
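As a rough illustration of combining paradigms under one handle, the sketch below exposes message-passing and global-memory style operations side by side; it is not the API developed in the paper.

```python
# Conceptual sketch only: a single handle offering message-passing and global-memory
# style operations (this is not the API described in the paper).
import queue

class CommResource:
    def __init__(self) -> None:
        self._mailbox: queue.Queue = queue.Queue()   # message-passing view
        self._segment: dict[int, bytes] = {}         # global-memory view

    # message passing paradigm
    def send(self, msg: bytes) -> None:
        self._mailbox.put(msg)

    def recv(self) -> bytes:
        return self._mailbox.get()

    # global memory paradigm
    def put(self, offset: int, data: bytes) -> None:
        self._segment[offset] = data

    def get(self, offset: int) -> bytes:
        return self._segment[offset]

r = CommResource()
r.send(b"hello"); print(r.recv())
r.put(0, b"shared"); print(r.get(0))
```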
international conference on computational science | 2003
Albano Alves; António Manuel Silva Pina; José Exposto; José Rufino
In this paper we present ToCL, a thread-oriented communication library specially designed to fully exploit multithreading in a multi-networked cluster environment. ToCL provides a basic set of primitives to handle zero-copy message passing between application threads spread among cluster nodes. Large messages are fragmented, sent using multiple low-level communication subsystems, and delivered to remote threads as single messages. The current implementation supports both Myrinet through GM and Gigabit Ethernet through VIA, but we plan to extend it to other communication subsystems.
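A Python sketch of the fragmentation idea, assuming two hypothetical channels: a large message is striped over the available communication subsystems and reassembled so the receiving thread sees a single message.

```python
# Illustrative sketch of fragmenting a large message over several channels and
# reassembling it at the receiver (channel names and sizes are hypothetical).

FRAGMENT_SIZE = 4096
CHANNELS = ["gm0", "via0"]       # e.g. one Myrinet/GM path and one Gigabit/VIA path

def fragment(payload: bytes) -> list[tuple[str, int, bytes]]:
    """Split a message into (channel, sequence, chunk) triples, striping over channels."""
    chunks = [payload[i:i + FRAGMENT_SIZE] for i in range(0, len(payload), FRAGMENT_SIZE)]
    return [(CHANNELS[seq % len(CHANNELS)], seq, chunk) for seq, chunk in enumerate(chunks)]

def reassemble(fragments: list[tuple[str, int, bytes]]) -> bytes:
    """The receiving thread sees one message, regardless of how the fragments travelled."""
    return b"".join(chunk for _, _, chunk in sorted(fragments, key=lambda f: f[1]))

msg = b"x" * 10_000
assert reassemble(fragment(msg)) == msg
```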
ieee international conference on high performance computing data and analytics | 2002
José Rufino; António Manuel Silva Pina; Albano Alves; José Exposto
In this paper we present the design and implementation of DPH, a storage layer for cluster environments. DPH is a Distributed Data Structure (DDS) based on the distribution of a paged hash table. It combines main memory with file system resources across the cluster in order to implement a distributed dictionary that can be used to store very large data sets with key-based addressing techniques. The DPH storage layer is supported by a collection of cluster-aware utilities and services. Access to the DPH interface is provided by a user-level API. A preliminary performance evaluation shows promising results.
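To illustrate the paged-hash idea, the sketch below maps keys to pages, pages to nodes, and spills a page from memory to a stand-in file-system store when it grows; all parameters and names are hypothetical, not taken from DPH.

```python
# Illustrative sketch of a distributed paged hash table (parameters are hypothetical).
import hashlib

NUM_PAGES = 64
NODES = ["node01", "node02", "node03", "node04"]
PAGE_MEMORY_LIMIT = 1024          # entries kept in RAM before a page spills to the file system

def page_of(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PAGES

def node_of(page: int) -> str:
    return NODES[page % len(NODES)]     # pages are the unit of distribution across the cluster

class Page:
    def __init__(self) -> None:
        self.memory: dict[str, bytes] = {}
        self.disk: dict[str, bytes] = {}          # stand-in for a per-page file

    def put(self, key: str, value: bytes) -> None:
        if len(self.memory) >= PAGE_MEMORY_LIMIT:  # combine RAM with file-system storage
            self.disk.update(self.memory)
            self.memory.clear()
        self.memory[key] = value

    def get(self, key: str):
        return self.memory.get(key) or self.disk.get(key)

key = "user:42"
print(key, "-> page", page_of(key), "on", node_of(page_of(key)))
```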