Zhuoren Jiang
Dalian Maritime University
Publication
Featured research published by Zhuoren Jiang.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015
Xiaozhong Liu; Zhuoren Jiang; Liangcai Gao
Scientific publication retrieval/recommendation has been investigated in the past decade. However, to the best of our knowledge, few efforts have been made to help junior scholars and graduate students understand and consume the essence of those scientific readings. This paper proposes a novel learning/reading environment, OER-based Collaborative PDF Reader (OCPR), that incorporates innovative scaffolding methods that can: 1. auto-characterize students' emerging information needs while they read a paper; and 2. enable students to readily access open educational resources (OER) based on their information needs. By using metasearch methods, we pre-indexed 1,112,718 OERs, including presentation videos, slides, algorithm source code, and Wikipedia pages, for 41,378 STEM publications. Based on the computed information need, we use text mining and heterogeneous graph mining algorithms to recommend high-quality OERs that help students better understand the scientific content in the paper. Evaluation results and exit surveys for an information retrieval course show that the OCPR system, along with the recommended OERs, can effectively help graduate students better understand complex STEM publications. For instance, 78.42% of participants believe the OCPR system and recommended OERs can provide the precise and useful information they need, while 78.43% of them believe the recommended OERs are close to exactly what they need when reading the paper. From the OER ranking viewpoint, MRR, MAP and NDCG results show that learning to rank and cold start solutions can efficiently integrate different text and graph ranking features.
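The ranking metrics named in this abstract (MRR, MAP, NDCG) can be illustrated with a minimal Python sketch. This is not the paper's evaluation code: the function names and the toy relevance lists are assumptions made for illustration, and average precision here is normalized by the number of relevant items found in the ranked list, which assumes every relevant item appears in that list.

```python
import math


def reciprocal_rank(relevances):
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ranked_lists):
    """MRR: mean of the reciprocal ranks over all queries."""
    return sum(reciprocal_rank(r) for r in ranked_lists) / len(ranked_lists)


def average_precision(relevances):
    """AP for one query; assumes all relevant items are present in the list."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """MAP: mean of the per-query average precision."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


def dcg(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance scores."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg(relevances, k):
    """NDCG@k: DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical toy data: one relevance list per query, in ranked order.
toy_rankings = [[0, 1, 0], [1, 0, 1]]
mrr = mean_reciprocal_rank(toy_rankings)
map_score = mean_average_precision(toy_rankings)
```

Each list holds the relevance labels of the recommended OERs in the order the system ranked them, so a better ranker moves the nonzero entries toward the front and raises all three scores.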
Exploiting Semantic Annotations in Information Retrieval | 2014
Zhuoren Jiang; Miao Chen; Xiaozhong Liu
Concepts have been used extensively in semantic annotation. Explicit Semantic Analysis (ESA) is a concept feature generator, which represents text by a concept-level vector, such as a vector of Wikipedia concepts. It is also considered a human-friendly way to annotate text: it generates a concept vector that can be easily interpreted by humans. We propose an approach, RescoredESA, based on ESA, that targets two aspects in which ESA can be improved: 1) sometimes the output vectors do not assign high scores to concepts relevant to the text; 2) it considers the words in the text when representing the text as a concept-level vector, but not the concepts explicitly occurring in the text, which can be an important source for assigning scores to ESA vector dimensions. We evaluate it on the 20 Newsgroups classification task, and the result shows a slight improvement when combining vectors from RescoredESA and bag-of-words.
Association for Information Science and Technology | 2016
Zhuoren Jiang; Xiaozhong Liu; Yan Chen
The citation relationships between publications, which are significant for assessing the importance of scholarly components within a network, have been used for various scientific applications. Missing citation metadata in scholarly databases, however, create problems for classical citation‐based ranking algorithms and challenge the performance of citation‐based retrieval systems. In this research, we utilize a two‐step citation analysis method to investigate the importance of publications for which citation information is partially missing. First, we calculate the importance of the author, and then we use that importance to estimate the publication importance for some selected articles. To evaluate this method, we designed a simulation experiment, “random citation‐missing”, to test the two‐step citation analysis, which we carried out with the Association for Computing Machinery (ACM) Digital Library (DL). In this experiment, we simulated different scenarios in a large‐scale scientific digital library, from high‐quality citation data to very poor quality data. The results show that a two‐step citation analysis can effectively uncover the importance of publications in different situations. More importantly, we found that the optimized impact from the importance of an author (first step) increases exponentially as the quality of citation data decreases. The findings from this study can further enhance citation‐based publication‐ranking algorithms for real‐world applications.
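The two-step idea in this abstract (score authors first, then use author scores to estimate publication importance where citation data is missing) can be sketched roughly as follows. The abstract does not give the formulas, so everything here is an assumption made for illustration: author importance is taken as the mean known citation count, the blend weight `alpha` is hypothetical, and the record layout is invented.

```python
from collections import defaultdict


def author_importance(papers):
    """Step 1 (assumed form): mean known citation count per author."""
    totals, counts = defaultdict(float), defaultdict(int)
    for paper in papers:
        if paper["citations"] is None:  # citation metadata is missing
            continue
        for author in paper["authors"]:
            totals[author] += paper["citations"]
            counts[author] += 1
    return {author: totals[author] / counts[author] for author in totals}


def publication_importance(paper, author_scores, alpha=0.5):
    """Step 2 (assumed form): blend the author-based estimate with the
    paper's own citation count; fall back to authors alone if it is missing."""
    scores = [author_scores.get(a, 0.0) for a in paper["authors"]]
    author_part = sum(scores) / len(scores) if scores else 0.0
    if paper["citations"] is None:
        return author_part
    return alpha * author_part + (1 - alpha) * paper["citations"]


# Hypothetical records; None marks a paper whose citation count is unknown.
toy_papers = [
    {"authors": ["A", "B"], "citations": 10},
    {"authors": ["A"], "citations": 20},
    {"authors": ["A", "C"], "citations": None},
]
toy_author_scores = author_importance(toy_papers)
estimate = publication_importance(toy_papers[2], toy_author_scores)
```

In this sketch, raising `alpha` shifts weight toward the author-based estimate, which loosely mirrors the abstract's finding that the author step matters more as citation quality drops.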
Conference on Information and Knowledge Management | 2017
Dong An; Liangcai Gao; Zhuoren Jiang; Runtao Liu; Zhi Tang
Citation metadata extraction plays an important role in academic information retrieval and knowledge management. Current work on this task generally uses rule-based, template-based or learning-based approaches, but these methods usually either rely on handcrafted features or are limited to specific domains. Recently, neural networks have shown strong ability in addressing sequence labeling tasks. In this paper, we propose a sequence labeling model for citation metadata extraction, called segment sequence labeling. Instead of inferring at the word level, the input sequence is first divided into segments, and then features of the segments are computed to infer the label sequence of the segments. We first run experiments to validate the effectiveness of different parts of the model by comparing it with a CRF-based model and a neural network-based model. Experimental results show that our model beats both models on most fields. In addition, our model is evaluated on the public datasets UMass and Cora and achieves significant performance improvement. Our model was trained on data that were generated from BibTeX files collected on the Web and annotated automatically.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2018
Zhuoren Jiang; Yue Yin; Liangcai Gao; Yao Lu; Xiaozhong Liu
While the volume of scholarly publications has increased at a frenetic pace, accessing and consuming useful candidate papers in very large digital libraries is becoming an essential and challenging task for scholars. Unfortunately, because of language barriers, some scientists (especially junior ones or graduate students who have not mastered other languages) cannot efficiently locate the publications hosted in a foreign language repository. In this study, we propose a novel solution, cross-language citation recommendation via Hierarchical Representation Learning on Heterogeneous Graph (HRLHG), to address this new problem. HRLHG can learn a representation function by mapping the publications, from multilingual repositories, to a low-dimensional joint embedding space from various kinds of vertices and relations on a heterogeneous graph. By leveraging both global (task specific) and local (task independent) information as well as a novel supervised hierarchical random walk algorithm, the proposed method can optimize the publication representations by maximizing the likelihood of locating the important cross-language neighborhoods on the graph. Experimental results show that the proposed method can not only outperform state-of-the-art baseline models, but also improve the interpretability of the representation model for the cross-language citation recommendation task.
ACM/IEEE Joint Conference on Digital Libraries | 2018
Ke Yuan; Liangcai Gao; Zhuoren Jiang; Zhi Tang
With the ever-increasing volume of formulae on the Web, formula retrieval has drawn much attention from researchers. However, most existing research on formula retrieval treats each formula within an article equally, while different formulae in the same article have different importance to the article. In this paper, we address the problem of ranking formulae within an article based on their importance. To evaluate the importance of each formula within an article, a formula citation graph is first built from a large-scale corpus, and the inter-article features of formulae are extracted by link topology analysis of formulae based on the graph. Then a word embedding model is explored to extract the inner-article features by mining the semantic relevance between a formula and the corresponding article. Finally, we leverage a learning-to-rank technique to rank formulae within an article based on those features. The experimental results demonstrate that the proposed features are helpful for formula ranking and that our approach yields better performance compared with other state-of-the-art methods.
ACM/IEEE Joint Conference on Digital Libraries | 2018
Zhuoren Jiang; Yao Lu; Xiaozhong Liu
While citation recommendation can be important for scholars, unfortunately, because of language barriers, some scientists cannot efficiently retrieve and consume the publications hosted in a foreign language repository. In this study, we propose a novel solution, cross-language citation recommendation via Publication Content and Citation Representation Fusion (PCCRF). PCCRF can learn a representation function by mapping the publications, from various languages and repositories, to a low-dimensional joint embedding space from both content semantic and citation relation viewpoints. The proposed method can optimize the publication representations by maximizing the likelihood of observing network neighborhoods (which are generated by a semi-supervised random walk algorithm) of publications. Experimental results show that the proposed method is promising for cross-language citation recommendation.
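To make the notion of "network neighborhoods generated by a random walk" concrete, here is a small Python sketch. It deliberately swaps in a plain uniform random walk, not the semi-supervised walk the abstract describes, and the graph, node names, and function names are all hypothetical.

```python
import random


def random_walk(graph, start, length, rng):
    """Walk up to `length` nodes from `start`, choosing neighbors uniformly."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def sample_neighborhoods(graph, walks_per_node, length, seed=0):
    """Generate several walks from every node; each walk is one neighborhood."""
    rng = random.Random(seed)
    return [random_walk(graph, node, length, rng)
            for node in graph
            for _ in range(walks_per_node)]


# Hypothetical toy citation graph mixing two repositories (en_* and zh_*).
toy_graph = {
    "en_p1": ["en_p2", "zh_p1"],
    "en_p2": ["en_p1"],
    "zh_p1": ["en_p1", "zh_p2"],
    "zh_p2": ["zh_p1"],
}
walks = sample_neighborhoods(toy_graph, walks_per_node=2, length=4)
```

Walks like these are typically fed to a skip-gram style objective so that publications appearing in the same walk end up close together in the embedding space; a supervised or semi-supervised variant would bias the neighbor choice instead of sampling uniformly.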
National CCF Conference on Natural Language Processing and Chinese Computing | 2017
Runtao Liu; Liangcai Gao; Dong An; Zhuoren Jiang; Zhi Tang
Metadata information extraction from academic papers is of great value to many applications such as scholar search and digital libraries. This task has attracted much attention from researchers in the past decades, and many extraction methods based on templates or statistical machine learning (e.g., SVM, CRF) have been proposed, yet the task remains challenging because of the variety and complexity of page layouts. To address this challenge, we introduce deep learning networks to this task, since deep learning has shown great power in many areas such as computer vision (CV) and natural language processing (NLP). First, we employ deep learning networks to model the image information and the text information of paper headers respectively, which allows our approach to perform metadata extraction with little information loss. Then we formulate the problem, metadata extraction from a paper header, as two typical tasks from different areas: object detection in CV and sequence labeling in NLP. Finally, the two deep networks generated from these two tasks are combined to give extraction results. Preliminary experiments show that our approach achieves state-of-the-art performance on several open datasets. At the same time, this approach can process both image data and text data, and does not require designing any classification features.
Proceedings of the American Society for Information Science and Technology | 2014
Joanna Landrum; Liangcai Gao; Zhuoren Jiang; Noriko Hara; Xiaozhong Liu
Helping students better understand scientific publications is one of the essential tasks of higher education. This paper presents a novel learning/reading environment that incorporates innovative scaffolding methods that aim to enable students to readily access open educational resources (OER) and read and comprehend the paper in a collaborative reading environment within a course. The new system, OER-based Collaborative PDF Reader (OCPR), captures and characterizes students’ emerging information needs when they read a paper, while also auto-recommending high quality OERs, e.g., presentation videos, slides, or Wikipedia pages, when students ask for clarifying content (OER-based Scaffolding). The study was conducted in a course for students to read an article, ask questions, and rate resources suggested by the OCPR system. The preliminary results show that the students tend to ask elementary questions and rate video resources higher than other resources. The results will inform future improvement of the OCPR system.
Conference on Information and Knowledge Management | 2015
Zhuoren Jiang; Xiaozhong Liu; Liangcai Gao