Publication


Featured research published by Alvin Cheung.


International Database Engineering and Applications Symposium | 2006

Towards Traceability across Sovereign, Distributed RFID Databases

R. Agrawal; Alvin Cheung; Karin Kailing; S. Schönauer

Tracking and tracing individual items is a new and emerging trend in many industries. Driven by maturing technologies such as radio-frequency identification (RFID) and upcoming standards such as the electronic product code (EPC), a rapidly increasing number of enterprises are collecting vast amounts of tracking data. To enable traceability over the entire life-cycle of items, data has to be shared across independent and possibly competing enterprises. The need to simultaneously compete and cooperate requires a traceability system design that allows companies to share their traceability data while maintaining complete sovereignty over what is shared and with whom. Based on an extensive study of traceability applications, we introduce the formal concept of traceability networks and highlight the technical challenges involved in sharing data in such a network. To address these challenges, we present an innovative combination of query processing techniques from P2P networks and distributed as well as parallel databases with confidentiality enforcement techniques.


ACM Special Interest Group on Data Communication | 2016

Packet Transactions: High-Level Programming for Line-Rate Switches

Anirudh Sivaraman; Alvin Cheung; Mihai Budiu; Changhoon Kim; Mohammad Alizadeh; Hari Balakrishnan; George Varghese; Nick McKeown; Steve Licking

Many algorithms for congestion control, scheduling, network measurement, active queue management, and traffic engineering require custom processing of packets in the data plane of a network switch. To run at line rate, these data-plane algorithms must be implemented in hardware. With today's switch hardware, algorithms cannot be changed, nor new algorithms installed, after a switch has been built. This paper shows how to program data-plane algorithms in a high-level language and compile those programs into low-level microcode that can run on emerging programmable line-rate switching chips. The key challenge is that many data-plane algorithms create and modify algorithmic state. To achieve line-rate programmability for stateful algorithms, we introduce the notion of a packet transaction: a sequential packet-processing code block that is atomic and isolated from other such code blocks. We have developed this idea in Domino, a C-like imperative language to express data-plane algorithms. We show with many examples that Domino provides a convenient way to express sophisticated data-plane algorithms, and show that these algorithms can be run at line rate with modest estimated chip-area overhead.
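The packet-transaction idea can be illustrated with a small sketch. The Python model below is hypothetical (Domino itself is a C-like language compiled to switch microcode): each packet is handled by a code block that reads and updates per-flow state atomically, isolated from every other packet's block.

```python
# Hypothetical model of a packet transaction: the block below runs
# once per packet, atomically and in isolation, reading and updating
# per-flow state with no interleaving from other packets.
state = {"count": 0, "last_time": 0}

def packet_transaction(pkt, state):
    # Reset the counter if the flow has been idle for more than 5 units.
    if pkt["time"] - state["last_time"] > 5:
        state["count"] = 0
    state["count"] += 1
    state["last_time"] = pkt["time"]

for pkt in [{"time": 1}, {"time": 2}, {"time": 10}]:
    packet_transaction(pkt, state)

print(state)  # the idle gap before time 10 reset the counter to 1
```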


Programming Language Design and Implementation | 2013

Optimizing database-backed applications with query synthesis

Alvin Cheung; Armando Solar-Lezama; Samuel Madden

Object-relational mapping libraries are a popular way for applications to interact with databases because they provide transparent access to the database using the same language as the application. Unfortunately, using such frameworks often leads to poor performance, as modularity concerns encourage developers to implement relational operations in application code. Such application code does not take advantage of the optimized relational implementations that database systems provide, such as efficient implementations of joins or push down of selection predicates. In this paper we present QBS, a system that automatically transforms fragments of application logic into SQL queries. QBS differs from traditional compiler optimizations as it relies on synthesis technology to generate invariants and postconditions for a code fragment. The postconditions and invariants are expressed using a new theory of ordered relations that allows us to reason precisely about both the contents and order of the records produced by complex code fragments that compute joins and aggregates. The theory is close in expressiveness to SQL, so the synthesized postconditions can be readily translated to SQL queries. Using 75 code fragments automatically extracted from over 120k lines of open-source code written using the Java Hibernate ORM, we demonstrate that our approach can convert a variety of imperative constructs into relational specifications and significantly improve application performance, in some cases asymptotically by orders of magnitude.
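The kind of transformation QBS performs can be sketched as follows; the tables, columns, and data here are made-up illustrations. An imperative nested loop in application code is equivalent to a relational join that the database could execute far more efficiently.

```python
# Made-up tables illustrating the kind of fragment QBS rewrites:
# an imperative nested-loop join living in application code.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [{"user_id": 1, "total": 10}, {"user_id": 1, "total": 5}]

result = []
for u in users:                      # O(|users| * |orders|) in the app
    for o in orders:
        if u["id"] == o["user_id"]:
            result.append((u["name"], o["total"]))

# The equivalent relational specification, which the database can run
# with an optimized join instead:
#   SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id
print(result)  # [('ann', 10), ('ann', 5)]
```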


Meeting of the Association for Computational Linguistics | 2016

Summarizing Source Code using a Neural Attention Model

Srinivasan Iyer; Ioannis Konstas; Alvin Cheung; Luke Zettlemoyer

High quality source code is often paired with high level summaries of the computation it performs, for example in code documentation or in descriptions posted in online forums. Such summaries are extremely useful for applications such as code search but are expensive to manually author, hence only done for a small fraction of all code that is produced. In this paper, we present the first completely data-driven approach for generating high level summaries of source code. Our model, CODE-NN, uses Long Short-Term Memory (LSTM) networks with attention to produce sentences that describe C# code snippets and SQL queries. CODE-NN is trained on a new corpus that is automatically collected from StackOverflow, which we release. Experiments demonstrate strong performance on two tasks: (1) code summarization, where we establish the first end-to-end learning results and outperform strong baselines, and (2) code retrieval, where our learned model improves the state of the art on a recently introduced C# benchmark by a large margin.
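A toy illustration of the attention step the model relies on (the scores below are invented; in CODE-NN they come from the trained decoder's LSTM state): each generated summary word weighs every source-code token via a softmax and conditions on the weighted context.

```python
import math

# Hypothetical relevance scores of each code token for the summary
# word currently being generated.
code_tokens = ["SELECT", "name", "FROM", "users"]
scores = [0.1, 2.0, 0.1, 1.0]

exps = [math.exp(s) for s in scores]
weights = [e / sum(exps) for e in exps]  # softmax: weights sum to 1

# Here the word being generated attends mostly to "name".
best = code_tokens[weights.index(max(weights))]
print(best)  # name
```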


Asia Pacific Workshop on Systems | 2012

Undefined behavior: what happened to my code?

Xi Wang; Haogang Chen; Alvin Cheung; Zhihao Jia; Nickolai Zeldovich; M. Frans Kaashoek

System programming languages such as C grant compiler writers freedom to generate efficient code for a specific instruction set by defining certain language constructs as undefined behavior. Unfortunately, the rules for what is undefined behavior are subtle and programmers make mistakes that sometimes lead to security vulnerabilities. This position paper argues that the research community should help address the problems that arise from undefined behavior, and not dismiss them as esoteric C implementation issues. We show that these errors do happen in real-world systems, that the issues are tricky, and that current practices to address the issues are insufficient.


International Conference on Data Engineering | 2007

Theseos: A Query Engine for Traceability across Sovereign, Distributed RFID Databases

Alvin Cheung; Karin Kailing; Stefan Schönauer

The ability to trace the history of individual products, especially their movement through supply and distribution chains, is key to many solutions such as targeted recalls and counterfeit detection. In most traceability applications a number of independent organizations have to work together. EPCglobal has proposed an architecture for a network of RFID databases where each database provides a standardized query interface. That architecture facilitates simple retrieval of traceability data from individual repositories, but it does not support complex traceability queries or cross-organizational query processing. Theseos (R. Agrawal, 2006) provides traceability applications with the ability to execute complex traceability queries that may span multiple RFID databases.


Meeting of the Association for Computational Linguistics | 2017

Learning a Neural Semantic Parser from User Feedback

Srinivasan Iyer; Ioannis Konstas; Alvin Cheung; Jayant Krishnamurthy; Luke Zettlemoyer

We present an approach to rapidly and easily build natural language interfaces to databases for new domains, whose performance improves over time based on user feedback, with minimal intervention. To achieve this, we adapt neural sequence models to map utterances directly to SQL with its full expressivity, bypassing any intermediate meaning representations. These models are immediately deployed online to solicit feedback from real users, who flag incorrect queries. Finally, the popularity of SQL facilitates gathering annotations for incorrect predictions using the crowd, which is directly used to improve our models. This complete feedback loop, without intermediate representations or database specific engineering, opens up new ways of building high quality semantic parsers. Experiments suggest that this approach can be deployed quickly for any new target domain, as we show by learning a semantic parser for an online academic database from scratch.


International Conference on Management of Data | 2014

Sloth: being lazy is a virtue (when issuing database queries)

Alvin Cheung; Samuel Madden; Armando Solar-Lezama

Many web applications store persistent data in databases. During execution, such applications spend a significant amount of time communicating with the database to retrieve and store persistent data over the network. These network round trips represent a significant fraction of the overall execution time for many applications and as a result increase their latency. While prior work aims to eliminate round trips by batching queries, it is limited by 1) a requirement that developers manually identify batching opportunities, or 2) the fact that it employs static program analysis techniques that cannot exploit many opportunities for batching. In this paper, we present Sloth, a new system that extends traditional lazy evaluation to expose query batching opportunities during application execution, even across loops, branches, and method boundaries. We evaluated Sloth using over 100 benchmarks from two large-scale open-source applications, and achieved up to a 3x reduction in page load time by delaying computation.
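The core idea, deferring queries as thunks and flushing them in one batched round trip when a value is actually needed, can be sketched in a few lines. The class and method names below are hypothetical illustrations, not Sloth's API.

```python
# Minimal sketch of lazy query batching: queries are deferred as
# thunks and issued together in one round trip when first forced.
class LazyDB:
    def __init__(self):
        self.pending = []       # queries deferred so far
        self.round_trips = 0

    def query(self, sql):
        thunk = {"sql": sql, "result": None}
        self.pending.append(thunk)   # defer instead of executing
        return thunk

    def force(self, thunk):
        if thunk["result"] is None:
            # One batched round trip executes every pending query.
            self.round_trips += 1
            for t in self.pending:
                t["result"] = f"rows for: {t['sql']}"  # stand-in for real I/O
            self.pending = []
        return thunk["result"]

db = LazyDB()
a = db.query("SELECT * FROM users")
b = db.query("SELECT * FROM orders")
db.force(a)            # both queries go out in a single batch
db.force(b)            # already materialized: no extra round trip
print(db.round_trips)  # 1 round trip instead of 2
```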


Programming Language Design and Implementation | 2017

Synthesizing highly expressive SQL queries from input-output examples

Chenglong Wang; Alvin Cheung; Rastislav Bodik

SQL is the de facto language for manipulating relational data. Though powerful, many users find it difficult to write SQL queries due to its highly expressive constructs. While using the programming-by-example paradigm to help users write SQL queries is an attractive proposition, as evidenced by online help forums such as Stack Overflow, developing techniques for synthesizing SQL queries from given input-output (I/O) examples has been difficult, due to the large space of SQL queries as a result of its rich set of operators. In this paper, we present a new scalable and efficient algorithm for synthesizing SQL queries based on I/O examples. The key innovation of our algorithm is the development of a language for abstract queries, i.e., queries with uninstantiated operators, that can be used to express a large space of SQL queries efficiently. Using abstract queries to represent the search space nicely decomposes the synthesis problem into two tasks: 1) searching for abstract queries that can potentially satisfy the given I/O examples, and 2) instantiating the found abstract queries and ranking the results. We have implemented this algorithm in a new tool called Scythe and evaluated it using 193 benchmarks collected from Stack Overflow. Our evaluation shows that Scythe can efficiently solve 74% of the benchmarks, most in just a few seconds, and the queries range from simple ones involving a single selection to complex queries with 6 nested subqueries.
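The two-phase search can be illustrated at toy scale (everything below is a hypothetical miniature, not Scythe's implementation): phase 1 fixes an abstract query skeleton with uninstantiated holes, and phase 2 enumerates instantiations of those holes and keeps the ones that reproduce the I/O example.

```python
import itertools

# An I/O example: the input table and the expected output rows.
table = [(1, "a"), (2, "b"), (3, "a")]
expected = [(2, "b"), (3, "a")]

# Phase 1: one abstract query -- a selection with three holes
# (column, operator, constant). The real abstract-query language
# also covers joins, aggregates, and nesting.
ops = {"==": lambda x, y: x == y,
       ">":  lambda x, y: type(x) == type(y) and x > y}

def instantiate(col, op, const):
    return [row for row in table if ops[op](row[col], const)]

# Phase 2: fill the holes and keep instantiations consistent with
# the I/O example (a real synthesizer would also rank them).
solutions = [(col, op, const)
             for col, op, const in itertools.product(
                 [0, 1], ops, [1, 2, 3, "a", "b"])
             if instantiate(col, op, const) == expected]

print(solutions)  # only "column 0 > 1" reproduces the output
```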


Foundations of Software Engineering | 2011

Partial replay of long-running applications

Alvin Cheung; Armando Solar-Lezama; Samuel Madden

Bugs in deployed software can be extremely difficult to track down. Invasive logging techniques, such as logging all non-deterministic inputs, can incur substantial runtime overheads. This paper shows how symbolic analysis can be used to re-create path equivalent executions for very long running programs such as databases and web servers. The goal is to help developers debug such long-running programs by allowing them to walk through an execution of the last few requests or transactions leading up to an error. The challenge is to provide this functionality without the high runtime overheads associated with traditional replay techniques based on input logging or memory snapshots. Our approach achieves this by recording a small amount of information about program execution, such as the direction of branches taken, and then using symbolic analysis to reconstruct the execution of the last few inputs processed by the application, as well as the state of memory before these inputs were executed. We implemented our technique in a new tool called bbr. In this paper, we show that it can be used to replay bugs in long-running single-threaded programs starting from the middle of an execution. We show that bbr incurs low recording overhead (avg. of 10%) during program execution, which is much less than existing replay schemes. We also show that it can reproduce real bugs from web servers, database systems, and other common utilities.
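The record/replay split can be sketched as follows; the code is a hypothetical miniature, and bbr's symbolic reconstruction of memory state is far beyond it. During the original run only branch directions are logged, and a later re-execution consumes that log to follow the same control-flow path without the original inputs.

```python
# Hypothetical miniature of branch-direction recording and replay.
def run(xs, log=None, replay=None):
    out, path = 0, []
    for i, x in enumerate(xs):
        if replay is not None:
            taken = replay[i]      # follow the recorded direction
        else:
            taken = x > 0          # real predicate on real input
            if log is not None:
                log.append(taken)  # record one bit per branch
        if taken:
            out += 1
        path.append(taken)
    return out, path

log = []
original, path1 = run([3, -1, 2], log=log)
# Replay without the concrete inputs: the branch log alone fixes the
# path the program takes.
replayed, path2 = run([0, 0, 0], replay=log)
print(original == replayed, path1 == path2)  # True True
```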

Collaboration


Alvin Cheung's top co-authors:

Samuel Madden, Massachusetts Institute of Technology
Armando Solar-Lezama, Massachusetts Institute of Technology
Chenglong Wang, University of Washington
Cong Yan, University of Washington
Dan Suciu, University of Washington
Luis Ceze, University of Washington