El-Sayed Atlam | Researchain

Publication

Featured research published by El-Sayed Atlam.

Information Processing and Management | 2003

Documents similarity measurement using field association terms

El-Sayed Atlam; Masao Fuketa; Kazuhiro Morita; Jun-ichi Aoe

Conventional approaches to text analysis and information retrieval measure document similarity by considering all of the information in the texts, which is relatively inefficient for processing large text collections in heterogeneous subject areas. This paper outlines a new text manipulation system, FA-Sim, that is useful for retrieving information in large heterogeneous texts and for recognizing content similarity in text excerpts. FA-Sim is based on flexible text matching procedures carried out in various contexts and various field ranks. FA-Sim measures text similarity by using specific field association (FA) terms instead of comparing all text information. Measuring similarity between texts with FA-Sim is faster and gives better results than two other analysis methods, with Recall and Precision improving significantly, by 39% and 37% respectively, over these two traditional methods.
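The idea of comparing texts through FA terms only, rather than through every word, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tiny `FA_TERMS` dictionary, the function names, and the use of cosine similarity over per-field counts are not taken from the paper, and FA-Sim's contexts and field ranks are not modelled.

```python
import math

# Hypothetical FA-term dictionary (term -> field); not taken from the paper.
FA_TERMS = {
    "pitcher": "sports",
    "inning": "sports",
    "election": "politics",
    "parliament": "politics",
    "vaccine": "medicine",
}

def fa_profile(text):
    """Count, per field, how many FA terms of that field occur in the text."""
    counts = {}
    for word in text.lower().split():
        field = FA_TERMS.get(word.strip(".,;:!?"))
        if field is not None:
            counts[field] = counts.get(field, 0) + 1
    return counts

def fa_similarity(text_a, text_b):
    """Cosine similarity of the two field-count profiles (an assumed measure)."""
    a, b = fa_profile(text_a), fa_profile(text_b)
    dot = sum(a[f] * b.get(f, 0) for f in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text contains no FA terms
    return dot / (norm_a * norm_b)

sports_1 = "The pitcher struck out three in the first inning."
sports_2 = "A late inning rally decided the game."
politics = "The election will reshape parliament."
same_field = fa_similarity(sports_1, sports_2)
cross_field = fa_similarity(sports_1, politics)
```

Because only dictionary hits are counted, the cost per document depends on the number of FA terms found rather than on the full vocabulary, which is the efficiency argument the abstract makes.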

Information Processing and Management | 2002

A new method for selecting English field association terms of compound words and its knowledge representation

El-Sayed Atlam; Kazuhiro Morita; Masao Fuketa; Jun-ichi Aoe

This paper presents a strategy for building a morphological machine dictionary of English that infers the meaning of derivations by considering morphological affixes and their semantic classification. Derivations are grouped into a frame that is accessible to semantic stem and knowledge base. This paper also proposes an efficient method for selecting compound Field Association (FA) terms from a large pool of single FA terms for some specialized fields. For single FA terms, five levels of association and two ranks are defined, based on stability and inheritance. About 85% of redundant compound FA terms can be removed effectively by using the levels and ranks proposed in this paper. Recall averages of 60-80% are achieved, depending on the type of text. The proposed methods are applied to 22,000 relationships between verbs and nouns extracted from a large tagged corpus.

Information Processing and Management | 2004

Word classification and hierarchy using co-occurrence word information

Kazuhiro Morita; El-Sayed Atlam; Masao Fuketa; Kazuhiko Tsuda; Masaki Oono; Jun-ichi Aoe

With the development of computers in recent years, complex, advanced processing at high speed has become possible. Natural language processing (NLP) systems also use a great deal of linguistic knowledge to improve their performance. As a result, the need for co-occurrence word information in NLP systems continues to grow, and various studies using co-occurrence word information have been carried out. A dictionary is likewise indispensable in NLP, because the capability of the entire system is governed by the size and quality of the dictionary. This paper describes the importance of co-occurrence word information in NLP systems. A technique for classifying co-occurrence words (receiving words) and co-occurrence frequency is described, and the classified groups are expressed hierarchically. The paper also proposes a technique for an automatic construction system and a complete thesaurus. Experimental operation of the system verifies the effectiveness of the proposed technique.

Information Processing and Management | 2006

Automatic building of new field association word candidates using search engine

El-Sayed Atlam; Ghada Elmarhomy; Kazuhiro Morita; Masao Fuketa; Jun-ichi Aoe

With the increasing popularity of the Internet and the tremendous amount of on-line text, automatic document classification is important for organizing huge amounts of data. Readers can identify the subject of many document fields by reading only some specific Field Association (FA) words. Document fields can be decided efficiently if there are many FA words and if their frequency rate is high. This paper proposes a method for automatically building new FA words. A WWW search engine is used to extract FA word candidates from document corpora. New FA word candidates in each field are automatically compared with previously determined FA words, and new FA words are then appended to an FA word dictionary. According to the experimental results, the new system can automatically append around 44% new FA words to the existing FA word dictionary. Moreover, a concentration ratio of 0.9 is effective for extracting the relevant FA words needed by the system design to build FA words automatically.
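The candidate-filtering step can be sketched as follows. The abstract does not define the concentration ratio, so this sketch assumes it is the share of a word's occurrences that fall in its most frequent field; the function names and the sample counts are hypothetical, and the search-engine extraction step is not modelled.

```python
def concentration_ratio(freq_by_field, field):
    """Assumed definition: occurrences in `field` divided by all occurrences."""
    total = sum(freq_by_field.values())
    if total == 0:
        return 0.0
    return freq_by_field.get(field, 0) / total

def select_new_fa_words(candidates, known_words, threshold=0.9):
    """Return {word: field} for candidates that are not yet in the dictionary
    and whose dominant field reaches the concentration threshold."""
    selected = {}
    for word, freq_by_field in candidates.items():
        if word in known_words or not freq_by_field:
            continue  # already in the dictionary, or no evidence at all
        best_field = max(freq_by_field, key=freq_by_field.get)
        if concentration_ratio(freq_by_field, best_field) >= threshold:
            selected[word] = best_field
    return selected

# Hypothetical per-field frequencies for three candidate words.
candidates = {
    "inning": {"sports": 19, "politics": 1},   # ratio 0.95, kept
    "game": {"sports": 5, "politics": 5},      # ratio 0.50, rejected
    "pitcher": {"sports": 10},                 # already known, skipped
}
new_words = select_new_fa_words(candidates, known_words={"pitcher"})
```

A high threshold such as 0.9 keeps only words that occur almost exclusively in one field, at the cost of rejecting words that are informative but shared across fields.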

International Journal of Computer Mathematics | 2005

A sentence classification technique using intention association expressions

Yuki Kadoya; Kazuhiro Morita; Masao Fuketa; Masaki Oono; El-Sayed Atlam; Toru Sumitomo; Jun-ichi Aoe

Although there are many text classification techniques depending on the vector space, it is difficult to detect the meaning related to the user’s intention (complaint, encouragement, request, invitation, etc.). The approach discussed in this paper is very useful for understanding focus points in conversation. We present a technique for determining the speaker’s intention for sentences in conversation. Intention association expressions are introduced, and formal descriptions with weights are defined using these expressions to construct an intention classification. A deterministic multi-attribute pattern-matching algorithm is used to determine the intention class efficiently. In simulation results for 681 email messages of 5859 sentences, the multi-attribute pattern-matching algorithm is about 44.5 times faster than the Aho and Corasick method. The precision and recall of intention classification of sentences are 91% and 95%, respectively. The precision and recall of extraction of unnecessary sentences are 98% and 96%, respectively. The precision and recall of the classification of each email are 88% and 89%, respectively.

Information Processing and Management | 2007

Improvement of building field association term dictionary using passage retrieval

Uddin Sharif; Ghada Elmarhomy; El-Sayed Atlam; Masao Fuketa; Kazuhiro Morita; Jun-ichi Aoe

Field Association (FA) terms are a limited set of discriminating terms that can specify document fields. Document fields can be decided efficiently if a document contains many relevant FA terms. An earlier approach built an FA term dictionary using a WWW search engine, but that dictionary contained irrelevant FA terms because the approach extracted FA terms from whole documents. This paper proposes a new approach that extracts FA terms from passages (portions of a document text) rather than from whole documents. This approach extracts FA terms more accurately than the earlier approach. The proposed approach is evaluated on 38,372 articles from a large tagged corpus. According to the experimental results, the new approach appends about 24% more relevant FA terms to the earlier FA term dictionary and deletes around 32% irrelevant FA terms. Moreover, the new approach achieves precision and recall of 98% and 94%, respectively.

Information Processing and Management | 2000

Similarity measurement using term negative weight and its application to word similarity

El-Sayed Atlam; Masao Fuketa; Kazuhiro Morita; Jun-ichi Aoe

Term weighting is a useful technique for keyword extraction and document classification. The traditional approach depends on high-frequency terms and is called the positive weight (PW) function. This paper presents a new weighting method that depends on low-frequency terms, called the negative weight (NW) function. Word similarity for typical verbs and objects is used as an example application. A negative weighted inverse verb frequency (NWIVF) function is defined, and a new similarity measurement is presented that combines the NWIVF and PWIVF (positive weighted inverse verb frequency) functions. The proposed method is applied to 11,000 relationships between verbs and nouns extracted from a large tagged corpus. With this new method, recall and precision improve by 33% and 18%, respectively, over the positive weight method.

Information Sciences | 2004

Fast and compact updating algorithms of a double-array structure

Kazuhiro Morita; El-Sayed Atlam; Masao Fuketa; Kazuhiko Tsuda; Jun-ichi Aoe

In many information retrieval applications, it is necessary to adopt a trie search that examines the input character by character. The double-array has been presented as a fast and compact data structure for a trie. However, its insertion time is not faster than that of other dynamic retrieval methods, because the double-array is a semi-static retrieval method that cannot handle highly frequent updates. Further, the space efficiency of the double-array degrades with the number of deletions because it keeps the empty elements produced by deletion. This paper presents a fast insertion algorithm that links empty elements to find insertion positions quickly, and a compression algorithm that reallocates empty elements on each deletion. Simulation results for 100 thousand keys show that the improved insertion time and space efficiency are achieved.
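For readers unfamiliar with the structure, the following is a minimal sketch of a double-array trie, built statically from a key set. It is not the paper's algorithm: the names `build_double_array` and `contains`, the `"\0"` end marker, and the first-fit search for a free base are assumptions chosen for brevity. The linear scan for a free base is the step that the paper's linked empty elements are meant to speed up, and that linking is not shown here.

```python
END = "\0"  # assumed end marker, so a key may be a prefix of another key

def build_double_array(keys):
    """Build BASE/CHECK arrays for a static key set (first-fit placement)."""
    # 1) Build an ordinary trie as nested dicts.
    root = {}
    for key in keys:
        node = root
        for ch in key + END:
            node = node.setdefault(ch, {})
    # 2) Map characters to codes starting at 1, so slot 0 stays the root.
    alphabet = sorted({ch for key in keys for ch in key} | {END})
    code = {ch: i + 1 for i, ch in enumerate(alphabet)}

    base, check = [0], [0]  # CHECK value -1 marks an empty element
    stack = [(0, root)]
    while stack:
        s, node = stack.pop()
        if not node:
            continue  # node reached through END has no children
        codes = [code[ch] for ch in node]
        b = 0
        while True:  # first-fit: smallest base where all child slots are empty
            while len(check) < b + max(codes) + 1:
                base.append(0)
                check.append(-1)
            if all(check[b + c] == -1 for c in codes):
                break
            b += 1
        base[s] = b
        for ch, child in node.items():
            t = b + code[ch]
            check[t] = s  # element t now belongs to parent s
            stack.append((t, child))
    return base, check, code

def contains(double_array, key):
    """Follow t = BASE[s] + code(c) and accept only if CHECK[t] == s."""
    base, check, code = double_array
    s = 0
    for ch in key + END:
        c = code.get(ch)
        if c is None:
            return False
        t = base[s] + c
        if t >= len(check) or check[t] != s:
            return False
        s = t
    return True

da = build_double_array(["baby", "bachelor", "badge", "jar"])
found = [contains(da, k) for k in ["baby", "bachelor", "badge", "jar"]]
missing = [contains(da, k) for k in ["ba", "bad", "jars", ""]]
```

Each transition costs one addition and one comparison, which is why retrieval is fast; the elements left with CHECK equal to -1 are the empty elements whose management the abstract discusses.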

Software - Practice and Experience | 2003

A fast and compact elimination method of empty elements from a double‐array structure

Masaki Oono; El-Sayed Atlam; Masao Fuketa; Kazuhiro Morita; Jun-ichi Aoe

A double‐array is a well‐known data structure for implementing a trie. However, the space efficiency of the double‐array degrades with the number of key deletions because the double‐array keeps the empty elements produced by key deletion. This paper presents a fast and compact method for eliminating empty elements that uses properties of trie nodes that have no siblings. The elimination method is implemented in C. In simulation results for large sets of keys, the elimination method is about 30–330 times faster than the conventional elimination method and maintains high space efficiency.

Information Processing and Management | 2010

New methods for compression of MP double array by compact management of suffixes

Tshering C. Dorji; El-Sayed Atlam; Susumu Yata; Mahmoud Rokaya; Masao Fuketa; Kazuhiro Morita; Jun-ichi Aoe

Minimal Prefix (MP) double array is an efficient data structure for a trie. However, its space efficiency is degraded by the non-compact management of suffixes. This paper presents three methods to compress the MP double array. The first two methods compress the MP double array by accommodating short suffixes inside the leaf nodes, and pruning leaf nodes corresponding to the end marker symbol. These methods achieve size reduction of up to 20%, making insertion and deletion faster at the same time while maintaining the retrieval time of O(1). The third method eliminates empty spaces in the array that holds suffixes, and improves the maximum size reduction further by about 5% at the cost of increased insertion time. Compared to a Ternary Search Tree, the key retrieval of the compressed MP double array is 50% faster and its size is 3-5 times smaller.
