
Publication


Featured research published by Kai-Wei Chang.


International Conference on Machine Learning | 2008

A dual coordinate descent method for large-scale linear SVM

Cho-Jui Hsieh; Kai-Wei Chang; Chih-Jen Lin; S. Sathiya Keerthi; S. Sundararajan

In many applications, data appear with a huge number of instances as well as features. Linear Support Vector Machines (SVMs) are one of the most popular tools to deal with such large-scale sparse data. This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. The proposed method is simple and reaches an ε-accurate solution in O(log(1/ε)) iterations. Experiments indicate that our method is much faster than state-of-the-art solvers such as Pegasos, TRON, SVMperf, and a recent primal coordinate descent implementation.
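
As a rough illustration of the method described above (not the authors' implementation), the sketch below runs dual coordinate descent for the L1-loss linear SVM: it sweeps over the examples, updates one dual variable at a time with a projected single-variable step, and maintains the primal weights so each update stays cheap. The function name, stopping rule, and defaults are assumptions made for this example.

```python
import numpy as np

def dual_coordinate_descent_svm(X, y, C=1.0, max_epochs=50, tol=1e-6):
    """Minimal sketch of dual coordinate descent for the L1-loss linear SVM.

    Dual problem: min_a 0.5 * a^T Q a - e^T a, s.t. 0 <= a_i <= C,
    with Q_ij = y_i y_j x_i^T x_j.  The primal weights w = sum_i a_i y_i x_i
    are maintained so each coordinate update only touches one example.
    Assumes y is a vector of +/-1 labels and X is a dense (n, d) array.
    """
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    Qii = np.einsum("ij,ij->i", X, X)            # diagonal of Q (y_i^2 = 1)

    for epoch in range(max_epochs):
        max_change = 0.0
        for i in np.random.permutation(n):        # random update order
            if Qii[i] == 0:
                continue
            G = y[i] * w.dot(X[i]) - 1.0          # partial gradient of the dual
            new_alpha = min(max(alpha[i] - G / Qii[i], 0.0), C)  # project to [0, C]
            delta = new_alpha - alpha[i]
            if abs(delta) > 1e-12:
                w += delta * y[i] * X[i]          # keep w consistent with alpha
                alpha[i] = new_alpha
                max_change = max(max_change, abs(delta))
        if max_change < tol:                      # crude stopping rule (assumption)
            break
    return w, alpha
```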


Knowledge Discovery and Data Mining | 2008

A sequential dual method for large scale multi-class linear SVMs

S. Sathiya Keerthi; S. Sundararajan; Kai-Wei Chang; Cho-Jui Hsieh; Chih-Jen Lin

Efficient training of direct multi-class formulations of linear Support Vector Machines is very useful in applications such as text classification with a huge number of examples as well as features. This paper presents a fast dual method for this training. The main idea is to sequentially traverse the training set and optimize the dual variables associated with one example at a time. The speed of training is enhanced further by shrinking and cooling heuristics. Experiments indicate that our method is much faster than state-of-the-art solvers such as bundle, cutting-plane, and exponentiated gradient methods.


Empirical Methods in Natural Language Processing | 2014

Typed Tensor Decomposition of Knowledge Bases for Relation Extraction

Kai-Wei Chang; Wen-tau Yih; Bishan Yang; Christopher Meek

While relation extraction has traditionally been viewed as a task relying solely on textual data, recent work has shown that by taking as input existing facts in the form of entity-relation triples from both knowledge bases and textual data, the performance of relation extraction can be improved significantly. Following this new paradigm, we propose a tensor decomposition approach for knowledge base embedding that is highly scalable, and is especially suitable for relation extraction. By leveraging relational domain knowledge about entity type information, our learning algorithm is significantly faster than previous approaches and is better able to discover new relations missing from the database. In addition, when applied to a relation extraction task, our approach alone is comparable to several existing systems, and improves the weighted mean average precision of a state-of-the-art method by 10 points when used as a subcomponent.
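
The abstract above centers on two ideas: a bilinear (tensor) decomposition of the knowledge base and the use of entity types to restrict which triples are considered. The toy sketch below illustrates only that combination, with made-up entities, relations, and scores; it is not the paper's model, training objective, or data.

```python
import numpy as np

# Hypothetical toy setup: entity embeddings E (n_entities x k) and one k x k
# matrix per relation, scored bilinearly in the style of RESCAL-like
# tensor decompositions.  All names and numbers here are invented.
rng = np.random.default_rng(0)
n_entities, k = 5, 4
E = rng.normal(size=(n_entities, k))
W = {"works_for": rng.normal(size=(k, k))}

# Illustrative type information: which entity types may appear as the subject
# or object of each relation (the "typed" part of the decomposition).
entity_type = {0: "person", 1: "person", 2: "org", 3: "org", 4: "city"}
relation_signature = {"works_for": ("person", "org")}

def score(subj, rel, obj):
    """Bilinear score e_s^T W_r e_o of a candidate triple."""
    return E[subj] @ W[rel] @ E[obj]

def typed_candidates(rel):
    """Only consider entity pairs whose types match the relation signature."""
    s_type, o_type = relation_signature[rel]
    subs = [e for e, t in entity_type.items() if t == s_type]
    objs = [e for e, t in entity_type.items() if t == o_type]
    return [(s, o) for s in subs for o in objs]

# Rank only type-compatible pairs for a relation; incompatible pairs are
# skipped entirely, which is what makes the typed variant faster.
ranked = sorted(typed_candidates("works_for"),
                key=lambda p: score(p[0], "works_for", p[1]), reverse=True)
print(ranked[:3])
```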


Knowledge Discovery and Data Mining | 2010

Large linear classification when data cannot fit in memory

Hsiang-Fu Yu; Cho-Jui Hsieh; Kai-Wei Chang; Chih-Jen Lin

Recent advances in linear classification have shown that for applications such as document classification, the training can be extremely efficient. However, most of the existing training methods are designed by assuming that data can be stored in the computer memory. These methods cannot be easily applied to data larger than the memory capacity due to the random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data is loaded from the disk and handled by certain learning methods. We investigate two implementations of the proposed framework for primal and dual SVMs, respectively. As data cannot fit in memory, many design considerations are very different from those for traditional algorithms. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.
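
The block-minimization framework itself is simple to picture. The sketch below shows only the outer loop, assuming the data have already been split into .npy blocks on disk with the label in the first column; a hinge-loss SGD learner stands in for the primal and dual SVM sub-solvers analyzed in the paper, so this is a structural illustration rather than the authors' method.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def block_minimization(block_paths, classes, n_passes=5):
    """Sketch of the outer block-minimization loop: data are pre-split into
    blocks on disk, and at each step one block is loaded and handed to a
    learner that updates the current model in place.

    block_paths: list of .npy files, each an array with the label in
    column 0 and features in the remaining columns (an assumption made
    for this example).
    """
    model = SGDClassifier(loss="hinge")
    for _ in range(n_passes):
        for path in np.random.permutation(block_paths):   # one block at a time
            block = np.load(path)
            y, X = block[:, 0], block[:, 1:]
            model.partial_fit(X, y, classes=classes)       # update on this block only
    return model
```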


ACM Transactions on Knowledge Discovery from Data | 2012

Large Linear Classification When Data Cannot Fit in Memory

Hsiang-Fu Yu; Cho-Jui Hsieh; Kai-Wei Chang; Chih-Jen Lin

Recent advances in linear classification have shown that for applications such as document classification, the training process can be extremely efficient. However, most of the existing training methods are designed by assuming that data can be stored in the computer memory. These methods cannot be easily applied to data larger than the memory capacity due to the random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data is loaded from the disk and handled by certain learning methods. We investigate two implementations of the proposed framework for primal and dual SVMs, respectively. Because data cannot fit in memory, many design considerations are very different from those for traditional algorithms. We discuss and compare with existing approaches that are able to handle data larger than memory. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.


Knowledge Discovery and Data Mining | 2011

Selective block minimization for faster convergence of limited memory large-scale linear models

Kai-Wei Chang; Dan Roth

As the size of data sets used to build classifiers steadily increases, training a linear model efficiently with limited memory becomes essential. Several techniques deal with this problem by loading blocks of data from disk one at a time, but usually take a considerable number of iterations to converge to a reasonable model. Even the best block minimization techniques [1] require many block loads since they treat all training examples uniformly. As disk I/O is expensive, reducing the amount of disk access can dramatically decrease the training time. This paper introduces a selective block minimization (SBM) algorithm, a block minimization method that makes use of selective sampling. At each step, SBM updates the model using data consisting of two parts: (1) new data loaded from disk and (2) a set of informative samples already in memory from previous steps. We prove that, by updating the linear model in the dual form, the proposed method fully utilizes the data in memory and converges to a globally optimal solution on the entire data. Experiments show that the SBM algorithm dramatically reduces the number of blocks loaded from disk and consequently obtains an accurate and stable model quickly on both binary and multi-class classification.
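
A minimal sketch of the SBM idea follows, assuming binary labels in {-1, +1} and pre-split .npy blocks as in the previous sketch. It keeps a cache of margin-violating examples in memory and trains on "new block plus cache" at each step; both the inner learner and the cache heuristic are stand-ins, not the paper's dual-form update or its convergence guarantees.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def selective_block_minimization(block_paths, classes, cache_size=10000, n_passes=5):
    """Sketch of selective block minimization: each update uses (1) the newly
    loaded block and (2) a cache of informative samples kept in memory from
    previous steps.  Here "informative" is approximated by the examples with
    the smallest margins under the current model.
    """
    model = SGDClassifier(loss="hinge")
    cache_X = np.empty((0, 0))
    cache_y = np.empty(0)
    for _ in range(n_passes):
        for path in block_paths:
            block = np.load(path)                     # label in column 0 (assumption)
            y, X = block[:, 0], block[:, 1:]
            if cache_y.size:                          # new data + cached samples
                X = np.vstack([X, cache_X])
                y = np.concatenate([y, cache_y])
            model.partial_fit(X, y, classes=classes)
            # Retain the samples closest to (or inside) the margin for reuse;
            # assumes labels are +/-1 so margin = decision * label.
            margins = model.decision_function(X) * y
            keep = np.argsort(margins)[:cache_size]
            cache_X, cache_y = X[keep], y[keep]
    return model
```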


International Conference on Data Mining | 2012

Efficient Pattern-Based Time Series Classification on GPU

Kai-Wei Chang; Biplab Deka; Wen-mei W. Hwu; Dan Roth

The time series shapelet discovery algorithm finds subsequences from a set of time series for use as primitives for time series classification. This algorithm has drawn a lot of interest because of the interpretability of its results. However, its computational requirements prevent the algorithm from dealing with large data sets and may limit its application in many domains. In this paper, we address this issue by redesigning the algorithm for implementation on highly parallel Graphics Processing Units (GPUs). We investigate several concepts of GPU programming and propose a dynamic programming algorithm that is suitable for implementation on GPUs. Results show that the proposed GPU implementation significantly reduces the running time of the shapelet discovery algorithm. For example, on the largest sample dataset from the original authors, the running time is reduced from half a day to two minutes.
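
For orientation, the sketch below shows the primitive that dominates the computation, in plain CPU Python: the minimum distance between a candidate shapelet and a time series, plus a brute-force way to score candidates. The median-threshold scoring is a simplification (the original algorithm uses information gain), and the paper's contribution is a GPU dynamic-programming reformulation of exactly this workload, which is not reproduced here.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and any same-length
    subsequence of the series; series and shapelet are 1-D NumPy arrays."""
    m = len(shapelet)
    dists = [np.linalg.norm(series[i:i + m] - shapelet)
             for i in range(len(series) - m + 1)]
    return min(dists)

def shapelet_features(dataset, shapelet):
    """One feature per series: its distance to the candidate shapelet."""
    return np.array([shapelet_distance(s, shapelet) for s in dataset])

def separation_score(dataset, labels, shapelet):
    """Brute-force candidate scoring by how well a simple threshold on the
    distance separates two classes (labels assumed 0/1).  The original
    algorithm uses information gain over an optimal split point instead."""
    d = shapelet_features(dataset, shapelet)
    thresh = np.median(d)
    pred = d <= thresh
    return max(np.mean(pred == labels), np.mean(pred != labels))
```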


Conference on Computational Natural Language Learning | 2014

The Illinois-Columbia System in the CoNLL-2014 Shared Task

Alla Rozovskaya; Kai-Wei Chang; Mark Sammons; Dan Roth; Nizar Habash

The CoNLL-2014 shared task is an extension of last year’s shared task and focuses on correcting grammatical errors in essays written by non-native learners of English. In this paper, we describe the Illinois-Columbia system that participated in the shared task. Our system ranked second on the original annotations and first on the revised annotations. The core of the system is based on the University of Illinois model that placed first in the CoNLL-2013 shared task. This baseline model has been improved and expanded for this year’s competition in several respects. We describe our underlying approach, which relates to our previous work, and describe the novel aspects of the system in more detail.


Conference on Computational Natural Language Learning | 2015

A Joint Framework for Coreference Resolution and Mention Head Detection

Haoruo Peng; Kai-Wei Chang; Dan Roth

In coreference resolution, a fair amount of research treats mention detection as a preprocessing step and focuses on developing algorithms for clustering coreferred mentions. However, there are significant gaps between the performance on gold mentions and the performance on the real problem, when mentions are predicted from raw text via an imperfect Mention Detection (MD) module. Motivated by the goal of reducing such gaps, we develop an ILP-based joint formulation of coreference resolution and mention head detection that is shown to yield significant improvements on coreference from raw text, outperforming existing state-of-the-art systems on both the ACE-2004 and the CoNLL-2012 datasets. At the same time, our joint approach is shown to improve mention detection by close to 15% F1. One key insight underlying our approach is that identifying and co-referring mention heads is not only sufficient but is more robust than working with complete mentions.
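
To make the "ILP-based joint formulation" concrete, here is a toy sketch using the pulp solver with invented scores: binary variables select candidate mention heads and coreference links, and a link may only be active when both of its mentions are selected. The paper's actual constraints, features, and decoding are more involved; this only shows the general shape of such a joint ILP.

```python
import pulp

# Invented scores (not the paper's features): mention_score[i] says how likely
# candidate head i is a real mention; link_score[(i, j)] says how likely heads
# i and j corefer.  Higher is better; negative means the model prefers "no".
mention_score = {0: 1.2, 1: -0.3, 2: 0.8}
link_score = {(0, 2): 0.9, (0, 1): -0.5, (1, 2): 0.1}

prob = pulp.LpProblem("joint_mention_and_coref", pulp.LpMaximize)
x = {i: pulp.LpVariable(f"mention_{i}", cat="Binary") for i in mention_score}
z = {p: pulp.LpVariable(f"link_{p[0]}_{p[1]}", cat="Binary") for p in link_score}

# Objective: pick mentions and coreference links maximizing the total score.
prob += (pulp.lpSum(mention_score[i] * x[i] for i in x)
         + pulp.lpSum(link_score[p] * z[p] for p in z))

# Consistency: a link can be active only if both of its mentions are kept.
# This coupling is what makes mention (head) detection and coreference joint.
for (i, j), var in z.items():
    prob += var <= x[i]
    prob += var <= x[j]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({v.name: int(v.value()) for v in prob.variables()})
```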


European Conference on Machine Learning | 2013

Multi-core structural SVM training

Kai-Wei Chang; Vivek Srikumar; Dan Roth

Many problems in natural language processing and computer vision can be framed as structured prediction problems. Structural support vector machines (SVMs) are a popular approach for training structured predictors, where learning is framed as an optimization problem. Most structural SVM solvers alternate between a model update phase and an inference phase (which predicts structures for all training examples). As structures become more complex, inference becomes a bottleneck and thus slows down learning considerably. In this paper, we propose a new learning algorithm for structural SVMs, called DEMI-DCD, that extends the dual coordinate descent approach by decoupling the model update and inference phases into different threads. We take advantage of multi-core hardware to parallelize learning with minimal synchronization between the model update and the inference phases. We prove that our algorithm not only converges but also fully utilizes all available processors to speed up learning, and we validate our approach on two real-world NLP problems: part-of-speech tagging and relation extraction. In both cases, we show that our algorithm utilizes all available processors to speed up learning and achieves competitive performance. For example, it achieves a relative duality gap of 1% on a POS tagging problem in 192 seconds using 16 threads, while a standard implementation of a multi-threaded dual coordinate descent algorithm with the same number of threads requires more than 600 seconds to reach a solution of the same quality.
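
The decoupling idea can be pictured as a producer-consumer pattern. The sketch below is a structural illustration only (in CPython the GIL limits true parallelism for pure-Python work): an inference thread keeps predicting with the latest model snapshot while a separate learning thread consumes those predictions and updates the model, so neither phase waits for the other. The toy model, the "inference" step, and the update rule are placeholders, not DEMI-DCD's actual dual coordinate descent updates.

```python
import threading
import queue

# Shared state: a queue of (example, prediction) pairs produced by inference
# and consumed by the learner, plus a lock guarding the model snapshot.
work_queue = queue.Queue()
model_lock = threading.Lock()
model = {"w": 0.0}                        # placeholder model (assumption)
stop = threading.Event()

def inference_worker(examples):
    """Repeatedly run (stand-in) inference with the latest model snapshot."""
    while not stop.is_set():
        for ex in examples:
            if stop.is_set():
                break
            with model_lock:              # read a consistent snapshot
                w = model["w"]
            prediction = ex * w           # stand-in for structured inference
            work_queue.put((ex, prediction))

def update_worker(n_updates):
    """Consume inference results and update the model; never blocked by
    inference except when the queue is momentarily empty."""
    for _ in range(n_updates):
        ex, prediction = work_queue.get()
        with model_lock:                  # stand-in for a dual coordinate update
            model["w"] += 0.01 * (ex - prediction)
    stop.set()                            # tell the inference thread to finish

threads = [threading.Thread(target=inference_worker, args=([1.0, 2.0, 3.0],)),
           threading.Thread(target=update_worker, args=(200,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(model)
```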

Collaboration


Dive into Kai-Wei Chang's collaborations.

Top Co-Authors

Chih-Jen Lin, National Taiwan University
Cho-Jui Hsieh, University of California
Hsiang-Fu Yu, University of Texas at Austin