IEEE Access | 2021

A Theoretical Foundation for Syntactico-Semantic Pattern Recognition


Abstract


Conventionally, syntactic pattern recognition tasks have been driven by grammars that define a syntactic structure. These tasks relied primarily on the ability of parsing algorithms to recognize patterns in the input data, and those algorithms in turn depended on the syntactic grammars defining the patterns. Context-free grammars in particular have been well studied as a way for computers to solve pattern recognition tasks efficiently, and several key pattern recognition tasks have applications in Natural Language Processing (NLP). Although context-free grammars are well suited to capturing rigid and unambiguous patterns, some pattern recognition processes also require the uncertainty they involve to be represented. Probabilistic context-free grammars can capture uncertainty in these processes, but they cannot truly capture the uncertainty associated with the semantic context of the domain in which pattern recognition is attempted. This paper formally puts forth an approach for syntactico-semantic pattern recognition, which attempts to capture the semantic context and its associated uncertainties along with probabilistic reasoning. The approach consists of an integration mapping between a probabilistic context-free grammar (PCFG) and Multi-Entity Bayesian Networks (MEBN), a first-order logic for modeling probabilistic knowledge bases. Additionally, the paper outlines a modified version of the CYK parsing algorithm for the defined mapping between PCFG and MEBN, a method to ensure the properness and consistency of such a PCFG, and its key application: disambiguation of prepositional phrase (PP) attachment.
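The abstract names properness and consistency without defining them. As a minimal sketch using the standard definitions (not necessarily the paper's method): a PCFG is proper when the probabilities of each nonterminal's rules sum to 1, and consistent (tight) when the derivations that terminate carry total probability 1. The check below computes each nonterminal's termination probability as the least fixed point of a polynomial system, iterated from zero. The names `is_proper`, `termination_probs`, and `is_consistent` are hypothetical, and near-critical grammars converge slowly, so borderline cases may be misreported.

```python
from collections import defaultdict
from math import isclose, prod

# A rule is (lhs, rhs, probability); any symbol that never appears as a
# left-hand side is treated as a terminal.
Rule = tuple[str, tuple[str, ...], float]


def is_proper(rules: list[Rule], tol: float = 1e-9) -> bool:
    """Proper: the rules of every nonterminal have probabilities summing to 1."""
    totals = defaultdict(float)
    for lhs, _rhs, p in rules:
        totals[lhs] += p
    return all(isclose(t, 1.0, abs_tol=tol) for t in totals.values())


def termination_probs(rules: list[Rule], max_iter: int = 10_000) -> dict[str, float]:
    """Least fixed point of q[A] = sum over rules A -> rhs of p * prod(q[B] for B in rhs).

    q[A] is the probability that a derivation starting from A is finite.
    Iterating from q = 0 approaches the least fixed point from below.
    """
    nonterminals = {lhs for lhs, _, _ in rules}
    q = dict.fromkeys(nonterminals, 0.0)
    for _ in range(max_iter):
        new_q = dict.fromkeys(nonterminals, 0.0)
        for lhs, rhs, p in rules:
            # Terminals contribute a factor of 1; prod() of nothing is 1.
            new_q[lhs] += p * prod(q[s] for s in rhs if s in nonterminals)
        delta = max(abs(new_q[a] - q[a]) for a in nonterminals)
        q = new_q
        if delta < 1e-15:
            break
    return q


def is_consistent(rules: list[Rule], start: str, tol: float = 1e-6) -> bool:
    """Consistent (tight): finite derivations from `start` have total mass 1."""
    return termination_probs(rules)[start] > 1.0 - tol
```

For example, `S -> S S (0.6) | a (0.4)` is proper but not consistent: a derivation from `S` terminates with probability 2/3, so `is_consistent` returns `False`, whereas swapping the two probabilities yields a grammar that passes both checks.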
The proposed theoretical foundation has been validated by a proof-of-concept Java implementation of the modified CYK algorithm for syntactico-semantic reasoning, which shows a promising ability to disambiguate PP attachment use cases drawn from New York Times and Wikipedia corpus samples.
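The abstract does not describe how the modified CYK algorithm consults MEBN, so what follows is only a minimal sketch under stated assumptions: a Viterbi-style probabilistic CYK parser over a toy PCFG in Chomsky normal form, with a hypothetical `hook` function standing in for the semantic component by multiplying each candidate constituent's score by a factor. The grammar, `toy_hook`, and its "hat" rule are illustrative assumptions, not the paper's grammar, its Java implementation, or its MEBN mapping.

```python
from typing import Callable

# Toy PCFG in Chomsky normal form (proper: each nonterminal's rules sum to 1).
BINARY = [  # (lhs, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),  # PP attaches to the verb phrase
    ("NP", "Det", "N", 0.5),
    ("NP", "NP", "PP", 0.2),  # PP attaches to the noun phrase
    ("PP", "P", "NP", 1.0),
]
LEXICAL = {  # word -> [(preterminal, probability)]; NP -> "I" has p = 0.3
    "I": [("NP", 0.3)], "saw": [("V", 1.0)], "the": [("Det", 1.0)],
    "man": [("N", 0.4)], "telescope": [("N", 0.3)], "hat": [("N", 0.3)],
    "with": [("P", 1.0)],
}

# Hypothetical semantic hook: a multiplicative factor for building `lhs`
# from `rhs` over the words in `span`; 1.0 means "no semantic preference".
Hook = Callable[[str, tuple[str, str], list[str]], float]


def no_semantics(lhs: str, rhs: tuple[str, str], span: list[str]) -> float:
    return 1.0


def cyk(words: list[str], hook: Hook = no_semantics):
    n = len(words)
    # chart[i][j] maps a label to (best score, backpointer) for words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for label, p in LEXICAL.get(w, []):
            chart[i][i + 1][label] = (p, w)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i][j]
            for k in range(i + 1, j):  # split point
                for lhs, b, c, p in BINARY:
                    if b in chart[i][k] and c in chart[k][j]:
                        score = (p * chart[i][k][b][0] * chart[k][j][c][0]
                                 * hook(lhs, (b, c), words[i:j]))
                        if score > cell.get(lhs, (0.0, None))[0]:
                            cell[lhs] = (score, (k, b, c))
    return chart


def tree(chart, i: int, j: int, label: str) -> str:
    """Rebuild the best bracketed parse for `label` over words[i:j]."""
    back = chart[i][j][label][1]
    if isinstance(back, str):  # lexical entry: back is the word itself
        return f"({label} {back})"
    k, b, c = back
    return f"({label} {tree(chart, i, k, b)} {tree(chart, k, j, c)})"


def toy_hook(lhs: str, rhs: tuple[str, str], span: list[str]) -> float:
    # Toy stand-in for a semantic query: a PP mentioning "hat" is assumed
    # to describe a person, so noun attachment is boosted.
    if "hat" in span and rhs[1] == "PP":
        return 3.0 if lhs == "NP" else 0.1
    return 1.0
```

For "I saw the man with the hat", `cyk(words)` prefers verb attachment purely from rule probabilities (`VP -> VP PP` outscores `VP -> V NP` with `NP -> NP PP`), while `cyk(words, toy_hook)` flips the analysis to noun attachment. Because the hook rescales scores, chart values are no longer probabilities once it is applied.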

Volume 9
Pages 135879-135889
DOI 10.1109/ACCESS.2021.3115445
Language English
Journal IEEE Access
