Title: W-TextCNN: A TextCNN model with weighted word embeddings for Chinese address pattern classification
Authors: Zhang, C (Zhang, Chen); Guo, RZ (Guo, Renzhong); Ma, XY (Ma, Xiangyuan); Kuai, X (Kuai, Xi); He, B (He, Biao)
Source: COMPUTERS ENVIRONMENT AND URBAN SYSTEMS Volume: 95 Article Number: 101819 DOI: 10.1016/j.compenvurbsys.2022.101819 Published: JUL 2022
Abstract: Geocoding is crucial to support location-based services and has become a widely accessible technique in geographic information systems (GIS). In a geocoding system, addresses are one of the main geographical reference texts used as input. Address patterns refer to the organizational rules for combining address components into an address. In China, intricate rules and backward address planning make address patterns unsystematic and difficult to recognize, which creates significant challenges for database construction and address standardization. Inspired by deep learning methods, this paper provides a convolutional neural network for text with weighted word embeddings (W-TextCNN) for Chinese address pattern classification. Specifically, we define eight address patterns to represent the structures of addresses, considering the characteristics of address components. To process addresses in the neural network, word embeddings with a weighted strategy are used to transform address texts into real-valued vectors. The vectors are fed into a convolutional neural network for text (TextCNN), which is trained to classify address patterns automatically. Furthermore, we apply W-TextCNN to the address corpus after fine-tuning the hyperparameters and compare it with several methods commonly used in text classification. We also design two tasks, address segmentation and address matching, to explore the effect of address pattern classification. The model achieves a classification accuracy of 97.45% and an F1 score of 96%, and W-TextCNN outperforms TextCNN because of the weighted word embeddings. Additionally, the results reveal the positive impact of address pattern classification on improving segmentation precision and address quality. The proposed model is expected to expand the toolkit of computational address study with deep learning methods.
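The pipeline summarized in the abstract (weighted word embeddings, then a TextCNN, then one of eight pattern labels) can be illustrated with a minimal pure-Python forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say how the weights are derived, so the per-token weights, the embedding size, the filter widths, the placeholder address tokens, and all function names below are hypothetical, and the parameters are random rather than trained.

```python
import math
import random

NUM_PATTERNS = 8   # the abstract defines eight address patterns
EMBED_DIM = 4      # hypothetical embedding size, kept small for illustration


def weighted_embeddings(tokens, embeddings, weights):
    """Scale each token's embedding by a per-token weight (assumed weighting scheme)."""
    return [[weights.get(tok, 1.0) * x for x in embeddings[tok]] for tok in tokens]


def conv_max_pool(matrix, kernel):
    """Slide one filter over consecutive token windows, apply ReLU, max-pool over time."""
    window = len(kernel)
    activations = []
    for start in range(len(matrix) - window + 1):
        total = sum(
            kernel[i][j] * matrix[start + i][j]
            for i in range(window)
            for j in range(len(kernel[i]))
        )
        activations.append(max(0.0, total))  # ReLU
    return max(activations)


def softmax(scores):
    """Turn raw class scores into probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, embeddings, weights, kernels, dense):
    """Forward pass: weighted embeddings -> convolution + pooling -> linear layer -> softmax."""
    matrix = weighted_embeddings(tokens, embeddings, weights)
    features = [conv_max_pool(matrix, k) for k in kernels]
    scores = [sum(w * f for w, f in zip(row, features)) for row in dense]
    return softmax(scores)


rng = random.Random(0)
# Placeholder, pre-segmented address components ("some city", "some district", ...).
tokens = ["某市", "某区", "某路", "1号"]
embeddings = {t: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for t in tokens}
# Made-up weights; the source does not specify how they are computed.
weights = {"某市": 0.5, "某区": 0.8, "某路": 1.2, "1号": 1.5}
# Two filters spanning 2 tokens and two spanning 3 tokens, randomly initialised (untrained).
kernels = [
    [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(h)]
    for h in (2, 2, 3, 3)
]
dense = [[rng.uniform(-1, 1) for _ in kernels] for _ in range(NUM_PATTERNS)]

probs = classify(tokens, embeddings, weights, kernels, dense)
print(len(probs), round(sum(probs), 6))
```

Because the parameters are untrained, the resulting probabilities carry no meaning; the sketch only shows where the weighting step sits relative to the convolution and pooling layers.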
Author Keywords: Address patterns; Address components; Address structure; Geocoding; Weighted word embeddings; Convolutional neural network
Addresses: [Zhang, Chen; Guo, Renzhong; Ma, Xiangyuan] Wuhan Univ, Sch Resource & Environm Sci, Wuhan 430079, Peoples R China.
[Guo, Renzhong; Kuai, Xi; He, Biao] Shenzhen Univ, Res Inst Smart Cities, Sch Architecture & Urban Planning, Shenzhen 518060, Peoples R China.
Corresponding Address: Guo, RZ (corresponding author), Wuhan Univ, Sch Resource & Environm Sci, Wuhan 430079, Peoples R China.
E-mail Addresses: firstname.lastname@example.org; email@example.com; firstname.lastname@example.org; email@example.com