Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Cem Tekin is active.

Publication


Featured research published by Cem Tekin.


IEEE Transactions on Information Theory | 2012

Online Learning of Rested and Restless Bandits

Cem Tekin; Mingyan Liu

In this paper, we study the online learning problem involving rested and restless bandits, in both a centralized and a decentralized setting. In the centralized setting, the system consists of a single player/user and a set of K finite-state discrete-time Markov chains (arms) with unknown state spaces (rewards) and statistics. The objective of the player is to decide in each step which M of the K arms to play over a sequence of trials so as to maximize its long-term reward. In the decentralized setting, multiple uncoordinated players each make their own decision about which arm to play in each step; if two or more players select the same arm simultaneously, a collision results and none of the players selecting that arm receives a reward. The objective of each player is again to maximize its long-term reward. We first show that logarithmic-regret algorithms exist for both the centralized rested and restless bandit problems. For the decentralized setting, we propose an algorithm with logarithmic regret with respect to the optimal centralized arm allocation. Numerical results and extensive discussion are also provided to highlight insights obtained from this study.
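
To make the decentralized collision model concrete, here is a minimal simulation sketch (not the paper's algorithm): several uncoordinated players each independently pick one of K arms modeled as two-state Markov chains, and a reward is delivered only when a player's chosen arm is not chosen by anyone else. The number of arms, the transition probabilities, and the random arm-selection rule are all illustrative assumptions.

```python
import random

K = 5                 # number of arms (illustrative)
NUM_PLAYERS = 3       # uncoordinated players (illustrative)
ROUNDS = 10

# Each arm is a two-state Markov chain: state 1 pays 1, state 0 pays 0.
# p_stay[k] is the probability that arm k keeps its current state (assumed values).
p_stay = [0.9, 0.8, 0.7, 0.6, 0.5]
state = [random.randint(0, 1) for _ in range(K)]

for t in range(ROUNDS):
    # Every arm evolves each round (restless model); under the rested model
    # only the played arms would change state.
    for k in range(K):
        if random.random() > p_stay[k]:
            state[k] = 1 - state[k]

    # Each player picks an arm without coordination (uniformly at random here,
    # purely to illustrate the collision rule).
    choices = [random.randrange(K) for _ in range(NUM_PLAYERS)]

    for p, k in enumerate(choices):
        collided = choices.count(k) > 1
        outcome = "collision, no reward" if collided else f"reward {state[k]}"
        print(f"round {t}: player {p} chose arm {k} -> {outcome}")
```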


IEEE/ACM Transactions on Networking | 2012

Atomic congestion games on graphs and their applications in networking

Cem Tekin; Mingyan Liu; Richard Southwell; Jianwei Huang; Sahand Haji Ali Ahmad

In this paper, we introduce and analyze the properties of a class of games, the atomic congestion games on graphs (ACGGs), which generalize the classical congestion games. In particular, an ACGG captures spatial information that is often ignored in a classical congestion game. This is useful in many networking problems, e.g., wireless networks, where interference among the users depends heavily on spatial information. In an ACGG, a player's payoff for using a resource is a function of the number of players who interact with it and use the same resource; such spatial information can be captured by a graph. We study fundamental properties of ACGGs: under what conditions these games possess a pure-strategy Nash equilibrium (PNE) or the finite improvement property (FIP), which is sufficient for the existence of a PNE. We show that a PNE may not exist in general, but that it does exist in many important special cases, including tree, loop, and regular bipartite networks. The FIP holds for important special cases, including systems with two resources or identical payoff functions for each resource. Finally, we present two wireless network applications of ACGGs: power control and channel contention under IEEE 802.11.
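
As a rough illustration of how the finite improvement property can be exercised in practice, the sketch below runs best-response dynamics on a toy ACGG: players sit on an interference graph, each selects one of two resources, and a player's payoff decreases with the number of graph neighbors using the same resource. The graph, the payoff function, and the iteration cap are assumptions for illustration; convergence to a PNE is only guaranteed in the special cases identified in the paper.

```python
# Toy atomic congestion game on a graph: best-response dynamics.
# Payoff for a player using resource r: -(number of neighbors also using r).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # assumed interference graph
RESOURCES = [0, 1]                                        # two resources (assumed)
choice = {i: 0 for i in neighbors}                        # initial strategy profile

def payoff(player, resource, profile):
    congestion = sum(1 for n in neighbors[player] if profile[n] == resource)
    return -congestion

for step in range(100):                # iteration cap (assumed)
    improved = False
    for player in neighbors:
        best = max(RESOURCES, key=lambda r: payoff(player, r, choice))
        if payoff(player, best, choice) > payoff(player, choice[player], choice):
            choice[player] = best      # unilateral improvement move
            improved = True
    if not improved:                   # no player can improve: a PNE
        print("Pure Nash equilibrium:", choice)
        break
else:
    print("Best-response dynamics did not converge (FIP may fail).")
```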


Allerton Conference on Communication, Control, and Computing | 2010

Online algorithms for the multi-armed bandit problem with Markovian rewards

Cem Tekin; Mingyan Liu

We consider the classical multi-armed bandit problem with Markovian rewards. When played, an arm changes its state in a Markovian fashion, while it remains frozen when not played. The player receives a state-dependent reward each time it plays an arm. The number of states and the state transition probabilities of an arm are unknown to the player. The player's objective is to maximize its long-term total reward by learning the best arm over time. We show that, under certain conditions on the state transition probabilities of the arms, a sample-mean-based index policy achieves logarithmic regret uniformly over the total number of trials. The result shows that sample-mean-based index policies can be applied to learning problems under the rested Markovian bandit model without loss of optimality in the order. Moreover, a comparison between Anantharam's index policy and UCB shows that, by choosing a small exploration parameter, UCB can achieve a smaller regret than Anantharam's index policy.
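
The sample-mean-based index referred to above can be sketched as a UCB-style rule: play the arm maximizing its empirical mean reward plus an exploration bonus scaled by a constant. The snippet below is a generic illustration of such an index, not the exact constants or conditions from the paper; `L` stands in for the exploration parameter whose size the abstract discusses, and the reward draw is a placeholder for a state-dependent Markovian reward.

```python
import math
import random

K = 3                        # number of arms (illustrative)
L = 2.0                      # exploration parameter (assumed value)
counts = [0] * K             # times each arm has been played
sums = [0.0] * K             # accumulated rewards per arm

def draw_reward(arm):
    # Placeholder for a state-dependent Markovian reward (assumption).
    return random.random()

for t in range(1, 1001):
    if t <= K:
        arm = t - 1          # play every arm once first
    else:
        # Sample-mean-based index: empirical mean + sqrt(L * ln t / n_k).
        arm = max(range(K),
                  key=lambda k: sums[k] / counts[k]
                  + math.sqrt(L * math.log(t) / counts[k]))
    r = draw_reward(arm)
    counts[arm] += 1
    sums[arm] += r

print("plays per arm:", counts)
```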


International Conference on Computer Communications | 2012

Approximately optimal adaptive learning in opportunistic spectrum access

Cem Tekin; Mingyan Liu

In this paper, we develop an adaptive learning algorithm with polynomial complexity that is approximately optimal for an opportunistic spectrum access (OSA) problem. In this OSA problem, each channel is modeled as a two-state discrete-time Markov chain with a bad state, which yields no reward, and a good state, which yields a reward. This is known as the Gilbert-Elliott channel model and represents variations in the channel condition due to fading, primary user activity, etc. There is a user who can transmit on one channel at a time and whose goal is to maximize its throughput. Without knowing the transition probabilities and only observing the state of the currently selected channel, the user faces a partially observable Markov decision process (POMDP) with unknown transition structure. In general, learning the optimal policy in this setting is intractable. We propose a computationally efficient learning algorithm which is approximately optimal for the infinite-horizon average reward criterion.
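
To make the channel model concrete, here is a minimal sketch of a Gilbert-Elliott channel together with the belief update a user can perform when it only observes the channel it selects. The transition probabilities and observation schedule are assumed values, and this is not the approximately optimal learning algorithm itself.

```python
import random

# Gilbert-Elliott channel: state 1 = good (reward 1), state 0 = bad (no reward).
p01 = 0.2   # P(bad -> good), assumed
p11 = 0.8   # P(good -> good), assumed

def step(state):
    # One transition of the two-state Markov chain.
    if state == 1:
        return 1 if random.random() < p11 else 0
    return 1 if random.random() < p01 else 0

def propagate_belief(b):
    # One-step update of P(channel is good) when the channel is NOT observed.
    return b * p11 + (1 - b) * p01

state, belief = 1, 1.0
for t in range(10):
    state = step(state)
    if t % 3 == 0:                 # observe the channel only occasionally (assumed)
        belief = float(state)      # an observation resets the belief
    else:
        belief = propagate_belief(belief)
    print(f"t={t} state={state} belief={belief:.3f}")
```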


IEEE Journal of Selected Topics in Signal Processing | 2014

Distributed Online Learning in Social Recommender Systems

Cem Tekin; Simpson Zhang; Mihaela van der Schaar

In this paper, we consider decentralized sequential decision making in distributed online recommender systems, where items are recommended to users based on their search query as well as their specific background, including purchase history, gender, and age, all of which comprise the user's context information. In contrast to centralized recommender systems, in which a single centralized seller has access to the complete inventory of items as well as the complete record of sales and user information, in decentralized recommender systems each seller/learner only has access to the inventory of items and user information for its own products, not to those of other sellers, but can earn a commission by selling another seller's item. Therefore, the sellers must determine in a distributed manner which items to recommend to an incoming user (from their own inventory or that of another seller) in order to maximize the revenue from their own sales and commissions. We formulate this problem as a cooperative contextual bandit problem, analytically bound the performance of the sellers compared to the best recommendation strategy given the complete realization of user arrivals, the inventory of items, and the context-dependent purchase probabilities of each item, and verify our results via numerical examples on a distributed data set adapted from Amazon data. We evaluate how the performance of a seller depends on its inventory of items, the number of connections it has with other sellers, and the commissions it earns by selling other sellers' items to its users.
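
A crude way to picture the seller's decision is to compare the expected revenue of recommending one of its own items (full price times estimated purchase probability) with recommending another seller's item (commission times estimated purchase probability). The sketch below hard-codes illustrative prices, a commission rate, and purchase-probability estimates; in the paper these probabilities are context-dependent and learned online.

```python
# Illustrative revenue comparison for one incoming user (all numbers assumed).
own_items = {"own_A": (20.0, 0.10), "own_B": (35.0, 0.04)}   # price, est. purchase prob.
other_items = {"partner_X": (50.0, 0.08)}                    # partner's price, est. prob.
COMMISSION = 0.15                                            # commission rate (assumed)

def expected_revenue(item):
    if item in own_items:
        price, prob = own_items[item]
        return price * prob                  # full price for an own-item sale
    price, prob = other_items[item]
    return COMMISSION * price * prob         # only the commission for a partner sale

candidates = list(own_items) + list(other_items)
best = max(candidates, key=expected_revenue)
for item in candidates:
    print(f"{item}: expected revenue {expected_revenue(item):.2f}")
print("recommend:", best)
```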


IEEE Transactions on Services Computing | 2016

Online Learning in Large-Scale Contextual Recommender Systems

Linqi Song; Cem Tekin; Mihaela van der Schaar

In this paper, we propose a novel large-scale, context-aware recommender system that provides accurate recommendations, scalability to a large number of diverse users and items, and differential services, and that does not suffer from “cold start” problems. Our proposed recommendation system relies on a novel algorithm which learns online the item preferences of users based on their click behavior and constructs online item-cluster trees. The recommendations are then made by choosing an item-cluster level and then selecting an item within that cluster as a recommendation for the user. This approach significantly improves the learning speed when the number of users and items is large, while still providing high recommendation accuracy. Each time a user arrives at the website, the system makes a recommendation based on estimates of item payoffs, exploiting past context arrivals in a neighborhood of the current user's context. It exploits the similarity of contexts to learn how to make better recommendations even when the number and diversity of users and items is large. This also addresses the cold start problem by using the information gained from similar users and items to make recommendations for new users and items. We theoretically prove that the proposed algorithm for item recommendations converges to the optimal item recommendations in the long run. We also bound the probability of making a suboptimal item recommendation for each user arriving to the system while the system is learning. Experimental results show that our approach outperforms the state-of-the-art algorithms by over 20 percent in terms of click-through rates.
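
The item-cluster-tree idea can be pictured as follows: group items into coarser clusters at higher levels of a tree, keep a payoff estimate per cluster, and recommend by first choosing a level (coarser when data are scarce) and then the best cluster or item under it. The sketch below uses a fixed two-level tree, hand-set estimates, and a simple observation threshold purely for illustration; the paper's algorithm builds the tree and chooses the level adaptively with guarantees.

```python
# Two-level item-cluster tree (structure and estimates are illustrative).
tree = {
    "electronics": {"phone": 0.12, "laptop": 0.09},   # item -> estimated payoff
    "books":       {"novel": 0.05, "textbook": 0.02},
}
observations = 40        # amount of feedback gathered so far (assumed)
COARSE_THRESHOLD = 100   # below this, recommend at the cluster level (assumed)

if observations < COARSE_THRESHOLD:
    # Few observations: pick the best cluster on average, then its best item so far.
    def cluster_score(c):
        return sum(tree[c].values()) / len(tree[c])
    cluster = max(tree, key=cluster_score)
    item = max(tree[cluster], key=tree[cluster].get)
else:
    # Enough data: pick the single best item across all clusters.
    cluster, item = max(
        ((c, i) for c in tree for i in tree[c]),
        key=lambda ci: tree[ci[0]][ci[1]],
    )
print(f"recommend item '{item}' from cluster '{cluster}'")
```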


Allerton Conference on Communication, Control, and Computing | 2013

Distributed online Big Data classification using context information

Cem Tekin; Mihaela van der Schaar

Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework in which data is gathered by distributed data sources and processed by a heterogeneous set of distributed learners which learn online, at run-time, how to classify the different data streams, either by using their locally available classification functions or by helping each other by classifying each other's data. Importantly, since the data is gathered at different locations, sending data to another learner for processing incurs additional costs such as delays, and hence this is only beneficial if the gain from better classification exceeds the costs. We model the problem of joint classification by the distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem in which each data instance is characterized by a specific context. We develop a distributed online learning algorithm for which we can prove sublinear regret. Compared to prior work in distributed online data mining, our work is the first to provide analytic regret results characterizing the performance of the proposed algorithm.
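
The local-versus-remote decision described above can be summarized as: forward a data instance to another learner only if that learner's estimated accuracy, minus the cost of forwarding (delay, bandwidth), beats the local estimate. The accuracy estimates and cost below are assumptions; the paper learns these quantities online with a contextual bandit algorithm rather than fixing them.

```python
# Decide whether to classify locally or forward to a peer learner (illustrative numbers).
local_accuracy = {"image": 0.72, "text": 0.91}     # estimated local accuracy by context
peer_accuracy = {"image": 0.88, "text": 0.83}      # estimated accuracy of a peer learner
FORWARD_COST = 0.10                                # penalty for delay/bandwidth (assumed)

def decide(context):
    local = local_accuracy[context]
    remote = peer_accuracy[context] - FORWARD_COST  # the peer is only worth it net of cost
    return "forward to peer" if remote > local else "classify locally"

for ctx in ("image", "text"):
    print(ctx, "->", decide(ctx))
```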


IEEE Transactions on Signal Processing | 2015

Distributed Online Learning via Cooperative Contextual Bandits

Cem Tekin; Mihaela van der Schaar

In this paper, we propose a novel framework for decentralized, online learning by many learners. At each moment of time, an instance characterized by a certain context may arrive at each learner; based on the context, the learner can select one of its own actions (which gives a reward and provides information) or request assistance from another learner. In the latter case, the requester pays a cost and receives the reward, while the provider learns the information. In our framework, learners are modeled as cooperative contextual bandits. Each learner seeks to maximize the expected reward from its arrivals, which involves trading off the reward received from its own actions, the information learned from its own actions, the reward received from the actions requested of others, and the cost paid for these actions, taking into account what it has learned about the value of assistance from each other learner. We develop distributed online learning algorithms and provide analytic bounds to compare their efficiency with that of the complete-knowledge (oracle) benchmark, in which the expected reward of every action in every context is known by every learner. Our estimates show that regret, the loss incurred by the algorithm, is sublinear in time. Our theoretical framework can be used in many practical applications, including Big Data mining, event detection in surveillance sensor networks, and distributed online recommendation systems.
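
One round of the cooperative contextual bandit framework can be sketched as follows: for the observed context, the learner compares the estimated rewards of its own actions against the estimated value of asking each other learner minus the cost of that request, plays the best option, and updates the corresponding estimate. The contexts, the request cost, and the running-average update below are illustrative simplifications, not the paper's algorithm or its confidence terms.

```python
import random

# Estimated mean reward per context for own actions and for requesting a peer (assumed).
estimates = {
    "ctx0": {"own_a1": 0.4, "own_a2": 0.3, "ask_learner2": 0.7},
    "ctx1": {"own_a1": 0.6, "own_a2": 0.2, "ask_learner2": 0.5},
}
counts = {c: {a: 1 for a in acts} for c, acts in estimates.items()}
REQUEST_COST = 0.2   # cost paid when requesting assistance (assumed)

def net_value(context, action):
    cost = REQUEST_COST if action.startswith("ask_") else 0.0
    return estimates[context][action] - cost

for t in range(5):
    context = random.choice(list(estimates))
    action = max(estimates[context], key=lambda a: net_value(context, a))
    reward = random.random()                      # stand-in for the observed reward
    # Running-average update of the chosen option's estimate.
    n = counts[context][action] + 1
    estimates[context][action] += (reward - estimates[context][action]) / n
    counts[context][action] = n
    print(f"t={t} context={context} action={action} reward={reward:.2f}")
```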


Allerton Conference on Communication, Control, and Computing | 2011

Adaptive learning of uncontrolled restless bandits with logarithmic regret

Cem Tekin; Mingyan Liu

In this paper, we consider the problem of learning the optimal policy for the uncontrolled restless bandit problem, in which only the state of the selected arm can be observed, the state transitions are independent of the control, and the transition law is unknown. We propose a learning algorithm which achieves logarithmic regret uniformly over time with respect to the optimal finite-horizon policy with known transition law, under some assumptions on the transition probabilities of the arms and on the structure of the optimal stationary policy for the infinite-horizon average reward problem.
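
Since only the selected arm's state is observed and transitions do not depend on the control, a natural ingredient of such a learning algorithm is an empirical estimate of each arm's transition matrix built from consecutive observations. The sketch below shows only that estimation step, on a made-up observation sequence; it is not the full policy from the paper.

```python
from collections import defaultdict

# Consecutive state observations of one arm while it was being selected (made-up data).
observed_states = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]

# Count transitions s -> s' and normalize into an estimated transition matrix.
counts = defaultdict(lambda: defaultdict(int))
for s, s_next in zip(observed_states, observed_states[1:]):
    counts[s][s_next] += 1

estimate = {
    s: {s_next: c / sum(row.values()) for s_next, c in row.items()}
    for s, row in counts.items()
}
print("estimated transition probabilities:", estimate)
```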


Military Communications Conference | 2009

Enhancing Cognitive Radio dynamic spectrum sensing through adaptive learning

Cem Tekin; Steven Hong; Wayne E. Stark

Cognitive Radio (CR) networks present a difficult set of challenges due to the fluctuating nature of the available spectrum and a wide range of applications, each having different Quality of Service (QoS) requirements. This paper studies the key enabling technologies of Cognitive Radio and makes contributions in two key areas: sensing and learning. We first present the software testbed developed to implement the Cognitive Radio spectrum sensing system. Next, we derive the mathematical relationship between the varying parameters and the QoS and test it on our system to verify the overall performance. Novel learning techniques which determine the statistics of primary user (PU) channel usage over time are proposed to enhance the cognitive radio's dynamic spectrum sensing ability. Using our testbed, we demonstrate the feasibility of the adaptive learning algorithms and their ability to increase spectrum sensing efficiency and improve performance over time without feedback from the receiver. We then proceed to the setting in which multiple non-cooperative cognitive users (secondary users) selfishly apply the learning algorithms to increase their data rates in channels with varying primary user activity. Finally, we conclude with a discussion of our results and future work.
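
A minimal version of the learning idea, estimating primary-user activity per channel from past sensing results and preferring the channel observed idle most often, is sketched below. The sensing history is fabricated for illustration, and the simple frequency estimate stands in for the adaptive learning algorithms developed in the paper.

```python
# Sensing history per channel: 1 = primary user busy, 0 = idle (fabricated data).
history = {
    "ch1": [1, 1, 0, 1, 1, 0, 1],
    "ch2": [0, 0, 1, 0, 0, 0, 1],
    "ch3": [1, 0, 1, 1, 0, 1, 1],
}

# Estimate the probability that each channel is busy and pick the least busy one.
busy_prob = {ch: sum(obs) / len(obs) for ch, obs in history.items()}
best = min(busy_prob, key=busy_prob.get)

for ch, p in sorted(busy_prob.items()):
    print(f"{ch}: estimated busy probability {p:.2f}")
print("sense/transmit on:", best)
```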

Collaboration


Dive into Cem Tekin's collaborations.

Top Co-Authors

Mingyan Liu (University of Michigan)

Onur Atan (University of California)

Simpson Zhang (University of California)

Jianwei Huang (The Chinese University of Hong Kong)