Helen Balinsky | Researchain

Publication

Featured research published by Helen Balinsky.

international conference on emerging security technologies | 2012

On the Helmholtz Principle for Data Mining

Boris Dadachev; Alexander Balinsky; Helen Balinsky; Steven J. Simske

Unusual behaviour detection and information extraction in streams of short documents and files (emails, news, tweets, log files, messages, etc.) are important problems in security applications. In [1], [2], a new approach to rapid change detection and automatic summarization of large documents was introduced. This approach is based on a theory of social networks and ideas from image processing and especially on the Helmholtz Principle from the Gestalt Theory of human perception. In this article we modify, optimize and verify the approach from [1], [2] to unusual behaviour detection and information extraction from small documents.

document engineering | 2010

APEX: automated policy enforcement eXchange

Steven J. Simske; Helen Balinsky

The changing nature of document workflows, document privacy and document security merit a new approach to the enforcement of policy. We propose the use of automated means for enforcing policy, which provides advantages for compliance and auditing, adaptability to changes in policy, and compatibility with a cloud-based exchange. We describe the Automated Policy Enforcement eXchange (APEX) software system, which consists of: (1) a policy editor, (2) a policy server, (3) a local daemon on every PC/laptop to maintain local secure up-to-date storage and policy, and (4) local (policy-enforcing) wrappers to capture document-handling user actions such as document export, e-mail, print, edit and save. During the performance of relevant incremental change, or other user-elicited action, on a composite document, the document and its metadata are scanned for salient policy eliciting terms (PETs). The document is then partitioned based on relevant policies and the security policy for each part is determined. If the document contains no PETs, then the user-initiated actions are allowed; otherwise, alternative actions are suggested, including: (a) encryption, (b) redirecting to a secure printer and requiring authorization (e.g. PIN) for printing, and (c) disallowing printing until specific sensitive data is removed.
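
The scan-then-decide step described above can be sketched as follows. This is only an illustration, not the APEX implementation: the `POLICY` table, the `check_action` function and the substring matching are all hypothetical, and the partitioning of a composite document is assumed to have already produced a list of text parts.

```python
# Hypothetical policy table: policy-eliciting term (PET) -> suggested alternatives.
POLICY = {
    "confidential": ["encrypt", "redirect to secure printer with PIN"],
    "salary": ["block printing until sensitive data is removed"],
}


def check_action(parts, policy=POLICY):
    """Scan each document part for PETs and decide on the user-initiated action."""
    hits = []
    for index, text in enumerate(parts):
        lowered = text.lower()  # case-insensitive substring match (an assumption)
        for term, alternatives in policy.items():
            if term in lowered:
                hits.append((index, term, alternatives))
    if not hits:
        return "allow", []  # no PETs found: the action proceeds
    return "suggest", hits  # PETs found: offer alternative actions per part


clean = check_action(["Meeting agenda", "Lunch menu"])
flagged = check_action(["Meeting agenda", "CONFIDENTIAL: merger plan"])
```

Here `clean` carries the decision `"allow"`, while `flagged` carries `"suggest"` together with the index of the offending part and the alternatives attached to the matched term.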

electronic imaging | 2006

Evaluating interface aesthetics: measure of symmetry

Helen Balinsky

Symmetry is one of the most fundamental principles in design. The choice between symmetry and asymmetry affects the layout and feeling of a design. A symmetrical page gives a feeling of permanence and stability, while informal or asymmetrical balance creates interest. The aim of this paper is to solve the problem of automatic detection of axial and radial symmetry, or the lack of it, in published documents. Previous approaches to this problem gave only a necessary condition for symmetry. We present a necessary and sufficient criterion for automatic symmetry detection and also introduce a Euclidean-type distance from any layout to the closest symmetrical one [3]. We present mathematical proof that the measure of symmetry we introduce is exact and accurate. It coincides with intuition and can be effectively calculated. Moreover, any other symmetry criterion will be a derivative of this measure.
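
To make the idea of a distance to the closest symmetrical layout concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: the layout is a rectangular grid of numeric values (for example ink coverage per cell), and only left-right mirror symmetry is considered. In that setting the nearest symmetric grid in the Euclidean sense is the average of the grid and its mirror image, so the distance is half the norm of their difference. The name `mirror_distance` is hypothetical.

```python
from math import sqrt


def mirror_distance(layout):
    """Euclidean distance from a grid to its nearest left-right symmetric grid."""
    total = 0.0
    for row in layout:
        # Compare each cell with its mirror image across the vertical axis.
        for a, b in zip(row, reversed(row)):
            total += (a - b) ** 2
    # The nearest symmetric grid is (A + mirror(A)) / 2, so the distance is
    # ||A - mirror(A)|| / 2.
    return sqrt(total) / 2


symmetric = [[1, 0, 1], [0, 1, 0]]
lopsided = [[1, 0, 0], [0, 0, 0]]
d_symmetric = mirror_distance(symmetric)
d_lopsided = mirror_distance(lopsided)
```

A value of zero is both necessary and sufficient for mirror symmetry in this simplified setting, and larger values indicate layouts further from symmetry.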

document engineering | 2010

On Helmholtz's principle for documents processing

Alexander Balinsky; Helen Balinsky; Steven J. Simske

Keyword extraction is a fundamental problem in text data mining and document processing. A large number of document processing applications directly depend on the quality and speed of keyword extraction algorithms. In this article, a novel approach to rapid change detection in data streams and documents is developed. It is based on ideas from image processing and especially on the Helmholtz Principle from the Gestalt Theory of human perception. Applied to the problem of keyword extraction, it delivers fast and effective tools to identify meaningful keywords using parameter-free methods. We also define a level of meaningfulness of the keywords which can be used to modify the set of keywords depending on application needs.
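
The abstract gives no formula, so the following is a hedged sketch of how a Helmholtz-style test might look rather than the authors' exact method. It assumes a word occurs `K` times in a corpus split into `N` equal-sized containers and `m` times inside one container. If the `K` occurrences were placed uniformly at random, the expected number of `m`-tuples landing in the same container would be C(K, m) / N^(m-1); a small value signals an unlikely, hence meaningful, concentration. The function names and the threshold of 1 are assumptions for illustration.

```python
from math import comb, log


def nfa(m, K, N):
    """Expected number of m-tuples of a word's K occurrences sharing one of N containers."""
    return comb(K, m) / N ** (m - 1)


def meaning(m, K, N):
    """Assumed meaningfulness score: larger when the concentration is less likely by chance."""
    return -log(nfa(m, K, N)) / m


def is_meaningful(m, K, N, threshold=1.0):
    """Flag the word as a keyword of the container when its NFA falls below the threshold."""
    return nfa(m, K, N) < threshold


concentrated = is_meaningful(m=4, K=5, N=100)  # 4 of 5 occurrences in one container
scattered = is_meaningful(m=1, K=5, N=100)     # a single occurrence in the container
```

Raising or lowering a cut-off on the `meaning` score is one way a level of meaningfulness could be used to grow or shrink the keyword set.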

systems, man and cybernetics | 2011

Document sentences as a small world

Helen Balinsky; Alexander Balinsky; Steven J. Simske

In this paper we describe the possibility of constructing the well-known small world topology for an ordinary document, based on the actual document structure. Sentences in such a graph are represented by nodes, which are connected if and only if the corresponding sentences are neighbors or share at least one common keyword. This graph is built using a carefully selected one-parameter set of keywords. By varying this parameter - the level of meaningfulness - we transition the document-representing graph from a trivial path graph into a large random graph. During such a conversion, as the parameter is varied over its range, the graph becomes a small world. This in turn opens the possibility of applying many well-established ranking algorithms to the problem of ranking sentences and paragraphs in text documents. These rankings are, in turn, crucial for document understanding, summarization and information extraction. These graphs can also serve as a source of interesting small world graphs for the theory of complex networks.
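
The graph construction described above can be sketched in a few lines. The edge rule (neighbouring sentences, or sentences sharing a keyword) follows the abstract; the whitespace tokenization, the function name `sentence_graph` and the sample sentences are illustrative assumptions, and the keyword set is simply passed in rather than selected by a level of meaningfulness.

```python
from itertools import combinations


def sentence_graph(sentences, keywords):
    """Return the edge set of a graph whose nodes are sentence indices."""
    keywords = {k.lower() for k in keywords}
    # Keywords present in each sentence (naive whitespace tokenization).
    found = [keywords & set(s.lower().split()) for s in sentences]
    edges = set()
    for i, j in combinations(range(len(sentences)), 2):
        # Connect neighbours, or any pair sharing at least one keyword.
        if j - i == 1 or found[i] & found[j]:
            edges.add((i, j))
    return edges


sentences = [
    "graphs model documents",
    "sentences become nodes",
    "keywords link distant nodes",
    "ranking uses graphs",
]
path_only = sentence_graph(sentences, [])
with_keywords = sentence_graph(sentences, ["graphs", "nodes"])
```

With an empty keyword set the result is the trivial path graph; adding keywords introduces shortcut edges between distant sentences, which is the mechanism that shortens path lengths as the keyword set grows.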

document engineering | 2011

Automatic text summarization and small-world networks

Helen Balinsky; Alexander Balinsky; Steven J. Simske

Automatic text summarization is an important and challenging problem. Over the years, the amount of text available electronically has grown exponentially. This growth has created a huge demand for automatic methods and tools for text summarization. We can think of automatic summarization as a type of information compression. To achieve such compression, better modelling and understanding of document structures and internal relations is required. In this article, we develop a novel approach to extractive text summarization by modelling texts and documents as small-world networks. Based on our recent work on the detection of unusual behavior in text, we model a document as a one-parameter family of graphs with its sentences or paragraphs defining the vertex set and with edges defined by Helmholtz's principle. We demonstrate that for some range of the parameters, the resulting graph becomes a small-world network. Such a remarkable structure opens the possibility of applying many measures and tools from social network theory to the problem of extracting the most important sentences and structures from text documents. We hope that documents will also be a new and rich source of examples of complex networks.

document engineering | 2009

Aesthetic measure of alignment and regularity

Helen Balinsky; Anthony J. Wiley; Matthew C. Roberts

To be effective as communications or sales tools, documents that are personalized and customized for each customer must be visually appealing and aesthetically pleasing. Producing perhaps millions of unique versions of essentially the same document not only presents challenges to the printing process but also disrupts the standard quality control procedures. The quality of the alignment in each document can easily distinguish professional-looking documents from amateur designs and some computer-generated layouts. A multicomponent measure of document alignment and regularity, derived directly from designer knowledge, is developed and presented in computable form. The measure includes: edge quality, page connectivity, grid regularity and alignment statistics. It is clear that these components may have different levels of importance, relevance and acceptability for various document types and classes; thus the proposed measure should always be evaluated against the requirements of the desired class of documents.

document engineering | 2009

Aesthetically-driven layout engine

Helen Balinsky; Jonathan R. Howes; Anthony J. Wiley

A novel Aesthetically-Driven Layout (ADL) engine for automatic production of highly customized, non-flow documents is proposed. In a non-flow document, where each page is composed of separable images and text blocks, aesthetic considerations may take precedence over the sequencing of the content. Such layout methods are most suitable for the construction of personalized catalogues, advertising flyers and sales and marketing material, all of which rely heavily on their aesthetics in order to successfully reach their intended audience. The non-flow algorithm described here permits the dynamic creation of page layouts around pre-existing static page content. Pages pre-populated with static content may include reserved areas which are filled at run-time. The remainder of a page, which is neither convex nor simply connected, is automatically filled with customer-relevant content by following the professional manual design strategy of multiple levels of layout resolution. The page designer's preferences, style and aesthetic rules are taken into account at every stage, with the highest-scoring layout being selected.

document engineering | 2014

On automatic text segmentation

Boris Dadachev; Alexander Balinsky; Helen Balinsky

Automatic text segmentation, which is the task of breaking a text into topically-consistent segments, is a fundamental problem in Natural Language Processing, Document Classification and Information Retrieval. Text segmentation can significantly improve the performance of various text mining algorithms, by splitting heterogeneous documents into homogeneous fragments and thus facilitating subsequent processing. Applications range from screening of radio communication transcripts to document summarization, from automatic document classification to information visualization, from automatic filtering to security policy enforcement - all rely on, or can largely benefit from, automatic document segmentation. In this article, a novel approach for automatic text and data stream segmentation is presented and studied. The proposed automatic segmentation algorithm takes advantage of feature extraction and unusual behaviour detection algorithms developed in [4, 5]. It is entirely unsupervised and flexible enough to allow segmentation at different scales, such as short paragraphs and large sections. We also briefly review the most popular and important algorithms for automatic text segmentation and present detailed comparisons of our approach with several of those state-of-the-art algorithms.

trust security and privacy in computing and communications | 2011

Publicly Posted Composite Documents in Variably Ordered Workflows

Helen Balinsky; Liqun Chen; Steven J. Simske

Recently introduced Publicly Posted Composite Documents (PPCDs) address the problem of composite documents with different formats and differential access control participating in cross-organizational workflows distributed over potentially non-secure channels. An early version of the PPCD considered only the two simplest workflow types: ordered and unordered. In real life, however, fragments of such pure types are likely to be combined into mixed workflows with interleaved ordered and unordered workflow steps. In the current paper, we introduce a payload matrix for a generic mixed workflow. We provide a computationally beneficial solution for enforcing access in ordered workflows by reducing the volume of data that needs to be encrypted and then subsequently decrypted. Furthermore, we address the problem of enforcing order in transitions between different types of workflow steps: from an unordered step to ordered steps and vice versa, and from one unordered step to another. This prevents the participants of any workflow step from being able to access a PPCD document prior to their workflow step.