Behram Khan
University of Manchester
Publications
Featured research published by Behram Khan.
Microprocessors and Microsystems | 2014
Roberto Giorgi; Rosa M. Badia; François Bodin; Albert Cohen; Paraskevas Evripidou; Paolo Faraboschi; Bernhard Fechner; Guang R. Gao; Arne Garbade; Rahulkumar Gayatri; Sylvain Girbal; Daniel Goodman; Behram Khan; Souad Koliai; Joshua Landwehr; Nhat Minh Lê; Feng Li; Mikel Luján; Avi Mendelson; Laurent Morin; Nacho Navarro; Tomasz Patejko; Antoniu Pop; Pedro Trancoso; Theo Ungerer; Ian Watson; Sebastian Weis; Stéphane Zuckerman; Mateo Valero
Improvements in semiconductor technologies are gradually enabling extreme-scale systems such as teradevices (i.e., chips composed of 1,000 billion transistors), most likely by 2020. Three major challenges have been identified: programmability, manageable architecture design, and reliability. TERAFLUX is a Future and Emerging Technology (FET) large-scale project funded by the European Union, which addresses all of these challenges at once by leveraging dataflow principles. This paper presents an overview of the research carried out by the TERAFLUX partners and some preliminary results. Our platform comprises 1000+ general-purpose cores per chip in order to properly explore the above challenges. An architectural template has been proposed and applications have been ported to the platform. Programming models, compilation tools, and reliability techniques have been developed. The evaluation is carried out using modifications of the HP Labs COTSon simulator.
high performance computing and communications | 2008
Behram Khan; Matthew Horsnell; Ian Rogers; Mikel Luján; Andrew Dinn; Ian Watson
Transactional memory (TM) is receiving attention as a way of expressing parallelism for programming multi-core systems. As a parallel programming model it avoids the complexity of conventional locking. TM can enable multi-core hardware that dispenses with conventional bus-based cache coherence, resulting in simpler and more extensible systems. This is increasingly important as we move into the many-core era. Within TM, however, the processes of conflict detection and committing still require synchronization and the broadcast of data. Raising the granularity at which synchronization is required reduces the demands on communication. Software implementations of TM have taken advantage of the fact that the object structure of data can be employed to further raise the level at which interference is observed. The contribution of this paper is the first hardware TM approach where the object structure is recognized and harnessed. This leads to novel commit and conflict detection mechanisms, and also to an elegant solution to the virtualization of version management, without the need for additional software TM support. A first implementation of the proposed hardware TM system is simulated. The initial evaluation is conducted with three benchmarks derived from the STAMP suite and a transactional version of Lee's routing algorithm.
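To make the object-granularity idea concrete, here is a minimal software sketch of conflict detection over per-transaction object read and write sets, assuming lazy (commit-time) detection as described in the abstract; the class and method names are illustrative, not the paper's hardware interface.

```scala
import scala.collection.mutable

final class ObjectTM {
  // Per-transaction read and write sets, tracked at object granularity
  // rather than per cache line.
  final class Txn(val id: Int) {
    val readSet  = mutable.Set.empty[AnyRef]
    val writeSet = mutable.Set.empty[AnyRef]
    def recordRead(obj: AnyRef): Unit  = readSet += obj
    def recordWrite(obj: AnyRef): Unit = writeSet += obj
  }

  // Lazy conflict detection at commit time: a committing transaction
  // conflicts with a concurrent one if its write set intersects the
  // other's read or write set.
  def conflicts(committing: Txn, other: Txn): Boolean =
    committing.writeSet.exists(o =>
      other.readSet.contains(o) || other.writeSet.contains(o))
}
```

Tracking whole objects rather than cache lines keeps the sets small, which is what makes the commit-time broadcast cheap in the hardware design.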
Journal of Parallel and Distributed Computing | 2013
Daniel Goodman; Behram Khan; Salman Khan; Mikel Luján; Ian Watson
Transactional memory is an alternative to locks for handling concurrency in multi-threaded environments. Instead of providing critical regions that only one thread can enter at a time, transactional memory records sufficient information to detect and correct for conflicts if they occur. This paper surveys the range of options for implementing software transactional memory in Scala. Where possible, we provide references to implementations that instantiate each technique. As part of this survey, we document for the first time several techniques developed in the implementation of Manchester University Transactions for Scala. We order the implementation techniques on a scale moving from the least to the most invasive in terms of modifications to the compilation and runtime environment. This shows that, while the less invasive options are easier to implement and more common, they are more verbose and invasive in the code that uses them, often requiring changes to syntax and program structure throughout the code.
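As an illustration of the least invasive end of that scale, the sketch below shows a closure-based atomic block using the ScalaSTM library, one existing instance of this approach; MUTS provides comparable constructs, but this is not its API.

```scala
import scala.concurrent.stm._

object Account {
  // Transactional data must be held in an explicit Ref wrapper.
  val balance: Ref[Int] = Ref(100)

  // Reads and writes go through the Ref inside an atomic block. The
  // wrapper type and the implicit transaction parameter are exactly the
  // verbosity the survey attributes to library-only implementations.
  def withdraw(amount: Int): Boolean =
    atomic { implicit txn =>
      if (balance() >= amount) { balance() = balance() - amount; true }
      else false
    }
}
```

The approach needs no compiler modification, but transactional types and syntax spread through every piece of code that touches shared state.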
applications of natural language to data bases | 2014
Iqra Javed; Hammad Afzal; Awais Majeed; Behram Khan
This paper presents an approach towards bilingual sentiment analysis of tweets. As one of the most advanced and popular communication media, social networks can help in designing better government and business strategies. A number of studies have been reported that use data from social networks; however, most of them are based on the English language. In this research, we have focused on sentiment analysis of a bilingual dataset (English and Roman Urdu) on a topic of national interest (general elections). Our experiments produced encouraging results, with 76% of tweets' sentiment strength classified correctly. We have also created a bilingual lexicon that stores the sentiment strength of English and Roman Urdu terms. Our lexicon is available at: https://sites.google.com/a/mcs.edu.pk/codteem/biling_senti
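A hypothetical sketch of lexicon-based scoring in the spirit of the paper: a single lexicon maps both English and Roman Urdu terms to sentiment strengths, summed over a tweet's tokens. The terms and weights here are invented placeholders, not entries from the authors' lexicon.

```scala
object BilingualSentiment {
  // Illustrative entries only; a real lexicon would carry graded strengths.
  val lexicon: Map[String, Double] = Map(
    "good" ->  1.0, // English
    "bad"  -> -1.0,
    "acha" ->  1.0, // Roman Urdu: "good"
    "bura" -> -1.0  // Roman Urdu: "bad"
  )

  // Sum the sentiment strength of every known token in the tweet.
  def score(tweet: String): Double =
    tweet.toLowerCase.split("\\s+").flatMap(lexicon.get).sum

  def main(args: Array[String]): Unit =
    println(score("election result acha but turnout bad")) // 0.0
}
```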
2012 Data-Flow Execution Models for Extreme Scale Computing | 2012
Daniel Goodman; Salman Khan; Chris Seaton; Yegor Guskov; Behram Khan; Mikel Luján; Ian Watson
In this paper we present DFScala, a library for constructing and executing dataflow graphs in the Scala language. Through the use of Scala this library allows the programmer to construct coarse-grained dataflow graphs that take advantage of functional semantics for the dataflow graph and both functional and imperative semantics within the dataflow nodes. This combination allows for very clean code that exhibits the properties of dataflow programs but which, we believe, is more accessible to imperative programmers. We first describe DFScala in detail, before using a number of benchmarks to evaluate both its scalability and its absolute performance relative to existing codes. DFScala has been constructed as part of the TERAFLUX project and is being used extensively as a basis for further research into dataflow programming.
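The following is a sketch of coarse-grained dataflow in plain Scala, illustrating the programming model DFScala provides; it uses standard-library Futures and is not the DFScala API itself. Each node fires once all of its inputs are available.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DataflowSketch {
  // A node is a pure function from its input token to one output token.
  def node[A, B](input: Future[A])(f: A => B): Future[B] = input.map(f)

  def main(args: Array[String]): Unit = {
    val source = Promise[Int]()
    // Two independent nodes consume the same token and may run in parallel.
    val doubled = node(source.future)(_ * 2)
    val squared = node(source.future)(x => x * x)
    // A join node fires only when both upstream tokens have arrived.
    val joined = doubled.zip(squared).map { case (d, s) => d + s }

    source.success(3)                       // inject the initial token
    println(Await.result(joined, 1.second)) // 15
  }
}
```

The graph itself is purely functional (nodes connect through single-assignment tokens), while a node body is free to use imperative Scala internally, which is the combination the paper argues is accessible to imperative programmers.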
high performance embedded architectures and compilers | 2010
Mohammad Ansari; Behram Khan; Mikel Luján; Christos Kotselidis; Chris C. Kirkham; Ian Watson
The optimistic nature of Transactional Memory (TM) systems can lead to the concurrent execution of transactions that are later found to conflict. Conflicts degrade scalability, and may lead to aborts that increase wasted work and degrade performance. A promising approach to reducing conflicts at runtime is dynamically, and transparently, reordering the execution of transactions upon discovery of conflicts. This approach has been explored in Software TMs (STMs), but not in Hardware TMs (HTMs). Furthermore, STM implementations of this approach cannot be ported to HTMs easily. This paper investigates the feasibility of such reordering in HTMs, and presents two designs that are scalable, independent of the on-chip interconnect, require only minor modifications to each core, and add no execution overhead if no conflicts occur. The evaluation takes LogTM-SE as a baseline and considers benchmarks with different levels of contention (transactional conflicts). The results show that the preferred design increases HTM performance by up to 17% when contention is low, 57% when contention is high, and never degrades performance. Finally, the designs are orthogonal to LogTM-SE; they require no modification to cache structures, and continue to support transaction virtualization, open and closed unbounded nesting, paging, thread suspension, and thread migration.
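A hedged software sketch of the reordering idea: when a transaction aborts on a conflict, it is parked behind the transaction it conflicted with instead of retrying immediately, so repeat offenders serialize. The names and queue structure are ours for illustration; the paper's two designs are implemented in hardware.

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue}

final class ReorderScheduler {
  // One retry queue per "winner" core; an aborted transaction parks here
  // instead of spinning in a livelock-prone immediate retry.
  private val queues =
    new ConcurrentHashMap[Int, ConcurrentLinkedQueue[Runnable]]()

  def onAbort(loserTxn: Runnable, winnerCore: Int): Unit =
    queues
      .computeIfAbsent(winnerCore, _ => new ConcurrentLinkedQueue[Runnable]())
      .add(loserTxn)

  // Called after a core's transaction commits: release one parked
  // transaction so it now runs after, not alongside, its past conflictor.
  def onCommit(core: Int): Unit = {
    val q = queues.get(core)
    if (q != null) {
      val next = q.poll()
      if (next != null) next.run()
    }
  }
}
```

If no conflicts occur, nothing is ever enqueued, matching the paper's property that the mechanism adds no overhead on conflict-free execution.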
networks on chips | 2012
Javier Navaridas; Behram Khan; Salman Khan; Paolo Faraboschi; Mikel Luján
Architectural simulation is an essential tool when it comes to evaluating the design of future many-core chips. However, reproducing all the components of such complex systems precisely would require unreasonable amounts of computing power. Hence, a trade-off between accuracy and compute time is needed. For this reason most state-of-the-art tools do not have accurate models for the network-on-chip, and rely on timing models that permit fast simulation. Generally, these models are very simplistic and disregard contention for the use of network resources. As the number of nodes in the network-on-chip grows, contention and related effects can considerably degrade the accuracy of such models. In this paper we present and evaluate a collection of timing models based on a reservation scheme that accounts for contention for the use of network resources. These models provide results quickly while being more accurate than simple no-contention approaches.
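A minimal sketch of one plausible reservation-based timing model, assuming the general idea rather than the paper's exact scheme: each link records the cycle at which it next becomes free, and a packet traversing a route waits on busy links, so contention shows up in the predicted latency. The hop latency and route representation are illustrative parameters.

```scala
import scala.collection.mutable

final class ReservationModel(hopLatency: Long = 1L) {
  // Cycle at which each link (src, dst) next becomes free.
  private val linkFreeAt =
    mutable.Map.empty[(Int, Int), Long].withDefaultValue(0L)

  /** Returns the packet's arrival cycle, reserving each link it crosses. */
  def send(route: Seq[(Int, Int)], injectCycle: Long): Long = {
    var t = injectCycle
    for (link <- route) {
      t = math.max(t, linkFreeAt(link)) // wait if the link is reserved
      t += hopLatency                   // traverse the link
      linkFreeAt(link) = t              // reserve it until traversal ends
    }
    t
  }
}

object Demo extends App {
  val model = new ReservationModel()
  val route = Seq((0, 1), (1, 2))
  println(model.send(route, injectCycle = 0)) // 2: uncontended
  println(model.send(route, injectCycle = 0)) // 3: waits behind packet one
}
```

A no-contention model would return 2 for both packets; the reservation table is what lets the second packet's delay appear without simulating routers cycle by cycle.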
acm symposium on parallel algorithms and architectures | 2008
Behram Khan; Matthew Horsnell; Ian Rogers; Mikel Luján; Andrew Dinn; Ian Watson
The contribution of this paper is the first Hardware Transactional Memory (HTM) where the object structure is recognized and harnessed. Our approach is similar to hardware support of paged virtual memory using a virtually addressed cache and a TLB, and is based on a cache hierarchy that allows the addressing of objects by unique object identifiers. The object-aware HTM allows cache overflows of uncommitted data. It also enables a novel commit and conflict detection mechanism. In this preliminary evaluation, the Lee-TM application exhibits overflows that in most previous HTMs would have had to be handled by software, impacting performance. The simulation provides an insight into the scalability characteristics of the proposed HTM, which uses object and field granularity, lazy versioning and lazy conflict detection. For example, with 32 cores the broadcast of write sets uses under 5% of the bus bandwidth, showing the potential of object-aware HTM systems.
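An illustrative software model of the addressing idea: a cache keyed by (object identifier, field index) rather than by physical address, mirroring the virtual-memory analogy in the abstract. The structure and names are ours, not the hardware design.

```scala
import scala.collection.mutable

final class ObjectCache {
  // Cache lines indexed by (oid, field) instead of physical address.
  private val lines = mutable.Map.empty[(Long, Int), Any]
  // Backing "memory": the object table maps an OID to its field storage,
  // playing roughly the role a page table plays for virtual memory.
  private val objectTable = mutable.Map.empty[Long, Array[Any]]

  def allocate(oid: Long, size: Int): Unit =
    objectTable(oid) = new Array[Any](size)

  def read(oid: Long, field: Int): Any =
    lines.getOrElseUpdate((oid, field), objectTable(oid)(field)) // fill on miss

  // Uncommitted speculative data lives in the cache keyed by OID; because
  // the key is stable, overflowing it to lower levels needs no software help.
  def write(oid: Long, field: Int, v: Any): Unit =
    lines((oid, field)) = v
}
```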
Advances in Adaptive Data Analysis | 2017
Mehreen Ahmed; Hammad Afzal; Awais Majeed; Behram Khan
Information-based prediction models using machine learning techniques have gained massive popularity during the last few decades. Such models have been applied in a number of domains such as medical diagnosis, crime prediction, movie ratings, etc. The trend is similar in the telecom industry, where prediction models have been applied to predict the dissatisfied customers who are likely to change their service provider. Due to the immense financial cost of customer churn in telecom, companies from all over the world have analyzed various factors (such as call cost, call quality, customer service response time, etc.) using several learners such as decision trees, support vector machines, neural networks, and probabilistic models such as Bayes. This paper presents a detailed survey of models from 2000 to 2015, describing the datasets used in churn prediction, the influential features in those datasets, and the classifiers used to implement the prediction models. A total of 48 studies related to churn prediction in telecom…
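As a toy illustration of the surveyed task, the sketch below classifies churn with a single decision stump over one feature. The feature names and threshold are invented; the studies surveyed use full learners (decision trees, SVMs, neural networks) over many features.

```scala
final case class Customer(monthlyCallCost: Double, complaintsLastYear: Int)

object ChurnStump {
  // A decision stump: predict churn if the customer complained more than
  // twice last year. Real models learn such thresholds from labeled data.
  def predictChurn(c: Customer): Boolean = c.complaintsLastYear > 2

  def main(args: Array[String]): Unit = {
    val customers = Seq(Customer(30.0, 0), Customer(55.0, 4))
    customers.foreach(c => println(s"$c -> churn=${predictChurn(c)}"))
  }
}
```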
DFM '13 Proceedings of the 2013 Data-Flow Execution Models for Extreme Scale Computing | 2013
Daniel Goodman; Behram Khan; Mikel Luján; Ian Watson
In pure dataflow applications, scheduling can have a huge effect on the memory footprint and the number of active tasks in the program. However, in impure programs, scheduling not only affects system resources, but can also affect the overall time complexity and accuracy of the program. To address both of these aspects, this paper describes and analyses effective extensions to a dataflow scheduler that allow programmers to provide priority information describing the preferred execution order of a dataflow graph. We demonstrate that even very crude task priority metrics can be extremely effective, providing an average saving of 91% over the worst-case scenario and 60% over the best-case naive scenario. We also note that by specifying the scheduling information explicitly based on the algorithm, not the hardware, we provide portability to the application.
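A minimal sketch of the priority extension, under the assumption that ready tasks are simply drawn from a priority queue instead of a FIFO, so programmer-supplied priorities steer execution order; the Task shape and priority metric are placeholders, not the paper's scheduler.

```scala
import scala.collection.mutable

final case class Task(priority: Int, body: () => Unit)

object PriorityScheduler {
  // Higher priority runs first; even a crude, algorithm-derived metric can
  // bound the number of live tasks and the memory footprint.
  private val ready =
    mutable.PriorityQueue.empty[Task](Ordering.by[Task, Int](_.priority))

  def submit(t: Task): Unit = synchronized { ready.enqueue(t) }

  def runAll(): Unit =
    while (synchronized { ready.nonEmpty }) {
      val t = synchronized { ready.dequeue() }
      t.body() // a task body may submit further tasks as its tokens resolve
    }
}
```

Because priorities are stated in terms of the algorithm's preferred order rather than core counts or memory sizes, the same annotations remain meaningful when the program moves to different hardware, which is the portability point the paper makes.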