Climate Event Classification Based on Historical Meteorological Records and Its Presentation on A Spatio-Temporal Research Platform

paper, specified "short paper"
Authorship
  1. 1. Shang-Yun Wu

    Department of Atmospheric Sciences - National Central University, Department of Computer Science and Information Engineering - National Central University

  2. 2. Cheng-Han Wu

    Department of Computer Science and Information Engineering - National Central University

  3. 3. Pi-Ling Pai

    Research Center for Humanities and Social Sciences - Academia Sinica

  4. 4. Yu-Chun Wang

    Dharma Drum Institute of Liberal Arts

  5. 5. Richard Tzong-Han Tsai

    Research Center for Humanities and Social Sciences - Academia Sinica, Department of Computer Science and Information Engineering - National Central University

  6. 6. I-Chun Fan

    Institute of History and Philology - Academia Sinica, Research Center for Humanities and Social Sciences - Academia Sinica

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


Introduction
The impact of climate change has become more and more obvious. How to understand its cause and effect to find a way to deal with it has become an important research topic. To trace the occurrence and impact of climate disasters, many clues can be found in the rich records left by historical materials. "China's Three Thousand Years of Meteorological Records" (2004) extracts meteorological descriptions from 8,228 historical sources and organizes these descriptions by regions and dates. This important work contributes to the analysis of the temporal and spatial characteristics of meteorological phenomena. The "East Asian Historical Climate Database"(P.-K. Wang et al., 2018) is compiled based on chorographies and official histories. This study develops an event classification method based on the meteorological records in the early Qing Dynasty (1644-1795) in this database. By representing classical Chinese texts into word embedding vectors and the k-means algorithm, we overcome the difficulty of analyzing classical Chinese and not having enough training data. We then integrate the classification results with the map and timeline to develop a Spatio-Temporal search interface, which facilitates climatologist to access and analyze data according to the three dimensions of time, area and event categories.

Methodology
The main task of this study is the meteorological text classification, which is composed of three steps: preprocessing, text representation generation, and k-means clustering. We use 36,123 historical meteorological descriptions in the "East Asian Historical Climate Database" as the input data. As shown in Table 1, each record contains six fields.
To make meteorological data more suitable for machine learning semantics and classification of meteorological events, we first pre-process meteorological data, including (1) replacing GanZhi, year, month and number with
G,
Y,
M, and
N, respectively (2) removing place names and punctuation symbols because they have nothing to do with classification.

Then, we use the word2vec algorithm to convert each meteorological record written in classical Chinese into 200-dimensional embedding vectors and then use the k-means algorithm to divide all embedded vectors into k groups. We use the validation set to find the most suitable k value of 45, and evaluate the clustering results with the labeled 9,530 records. The overall accuracy is 82%. Table 2 details the 45 clusters.
From Table 1, we can see that many clusters contain the same climatic events, but after careful examination, these groups are still slightly different. Taking clusters 2 (Table 3), 29 (Table 4) and 37 (Table 5) as examples, although all three clusters can be roughly classified as flood hazards, it is found that most of the texts of cluster 37 refer to seasons. The records of cluster 29 mostly refer to building damage, while the records of the other two groups do not.
Table 1. The table of the East Asian Historical Climate Database

ID
year
province

county/
city

meteorological description
source

1795-11
1663
Anhui Province
Guichi County

秋,池州大水,城井行舟。【鄉父老云:與明萬曆三十六年水相似。】
Flood in Chi-zhou in autumn. Villagers said, "The flood is similar to the one happened in Wanli period in Ming Dynasty (1608)."

Volume twenty-nine of "Chi-zhou Local Gazetteers", published in Kang-xi period (1711) in Qing Dynasty

1795-12
1663
Anhui Province
Huangshan City

七月望,邑令陳恭備建學官,是日雷雨大作,千山如注,田間水數尺,忽浮飛木於縣東門。

In mid of July when Chen Gong, the county magistrate, was going to build a school, a thunderstorm happened. Flood was a few feets high, and floodwoods flew at the east gate of the county.

Volume eight of “Tai-ping County record”, published in Chia-ching period in Qing Dynasty

1795-13
1663
Anhui Province
Shitai County

秋大水,父老云:與萬曆三十六年水勢相似。
Old man commented about the autumn flood: “It was said to be similar to the one happened in the Wanli period in Ming Dynasty (1608).”

Volume two of "Shi-di Local Gazetteers", published in Kang-xi period in Qing Dynasty

1795-14
1663
Anhui Province
Dongzhi County

秋大水,十一月始退。
The autumn flood began to recede in November.

Volume seven of "Dong-Liu Local Gazetteers", published in Qian-long period in Qing Dynasty

1795-15
1663
Jiangxi Province
Jiujiang

大水,潰堤數處,禾黍盡沒。
Flood broke levees and crops were all be drowned.

Volume fifty-three of "De-Hwa Local Gazetteers", published in Tong-Zhi period in Qing Dynasty

1795-16
1663
Jiangxi Province
Ruichang County

秋八月,大水入城。
Flood crashed the city in August t

Volume one of "Rui-Chang Local Gazetteers", published in Kang-xi period in Qing Dynasty

Table 2. The 45 clusters and their semantics

Cluster_0:
Crop Failure

Cluster_15:
Abnormal Animal Behavior

Cluster_30:
Social Problem

Cluster_1:
Wind

Cluster_16:
Drought

Cluster_31:
Rain

Cluster_2:
Flood

Cluster_17:
Social Problem

Cluster_32:
Famine

Cluster_3:
Crop Failure

Cluster_18:
Crop Failure

Cluster_33:
Flood

Cluster_4:
Insect

Cluster_19:
Crop Failure

Cluster_34:
Crop Failure

Cluster_5:
Rain

Cluster_20:
Drought

Cluster_35:
Crop Failure

Cluster_6:
Flood

Cluster_21:
Undetermined

Cluster_36:
Rain

Cluster_7:
Famine

Cluster_22:
Social Problem

Cluster_37:
Flood

Cluster_8:
Social Problem

Cluster_23:
Famine

Cluster_38:
Undetermined

Cluster_9:
Social Problem

Cluster_24:
Rain

Cluster_39:
Flood

Cluster_10:
flood Cluster

Cluster_25:
Disease

Cluster_40:
Drought

Cluster_11:
Flood

Cluster_26:
Crop Failure

Cluster_41:
Rain

Cluster_12:
Flood

Cluster_27:
Insect

Cluster_42:
Crop Failure

Cluster_13:
Crop Failure

Cluster_28:
Wind

Cluster_43:
Rain

Cluster_14:
Social Problem

Cluster_29:
Flood

Cluster_44:
Crop Failure

Table 3. Examples of records in Cluster 2.

廣東省,大水
Guangdong Province, Great Flood.

安徽省,大水
Anhui Province, Great Flood.

河北省,大水
Hebei Province, Great Flood.

福建省,六月,大水
Fujian Province, June, Great Flood.

江西省,七月,大水
Jiangxi Province, July, Great Flood.

江西省,七月十五,大水
Jiangxi Province, 15th July, Great Flood.

江西省,七月,大水
Jiangxi Province, July, Great Flood.

湖南省,五月,大水
Hunan Province, May, Great Flood.

湖南省,五月,大水
Hunan Province, May, Great Flood.

Table 4. Examples of records in Cluster 29.

貴州省,四月十二日,大水,壞心心橋欄杆。
Guizhou Province, 12th April, Great Flood, The railing of the bridge was broken.

福建省,大水,東橋折水至西門橋頭十字街。
Fujian Province, Great flood, The folding water was from the east bridge to Ximen Bridge Crossroad.

福建省,五月初十日,南鄉大水入城。
Fujian Province, 10th May, Nanxiang water flooded into the city.

廣東省,夏五月,大水,城垣崩陷八處,共六十餘丈。

Guangdong Province, in summer and May, Great flood, The eight places of the city, total 60 feet, collapsed.

江西省,秋八月,大水,蘆溪市宗濂橋圮。
Jiangxi Province, in autumn and August, Great flood, the Zonglian Bridge collapsed in Luxi City.

甘肅省,霪雨,塌傾城垣。

Gansu Province, the city walls collapsed after a long period of rain.

江蘇省,泗州城陷沒,漕堤決。

Jiangsu Province, the city of Suzhou collapsed, the dike was broken.

廣東省,夏五月,大水,秋溪鯉魚溝堤潰。

Guangdong Province, in summer and May, Great flood, Qiuxi carp ditch dike burst.

江西省,秋七月,大水連漲七次,西溪橋鷹咀石俱沖圮。

Jiangxi Province, in autumn and July, floods rose seven times, and Yingzui Stone of Xixi Bridge collapsed.

廣東省,大水,城北平地可通舟。

Guangdong Province, Great floods, the city flat land was navigable.

河北省,堤下口決。

Hebei Province, Breach of the embankment.

Table 5. Examples of records in Cluster 37.

浙江省,秋大水。

Zhejiang Province, Autumn, Great flood.

安徽省,夏秋水。

Anhui Province, Summer and Autumn, Great flood.

湖南省,秋大水。

Hunan Province, Autumn, Great flood.

安徽省,秋大水。

Anhui Province, Autumn, Great flood.

江蘇省,秋大水。

Jiangsu Province, Autumn, Great flood.

湖南省,夏水。

Hunan Province, Summer, Great flood

浙江省,夏秋大水。

Zhejiang Province, Summer and Autumn, Great flood.

浙江省,秋大水。

Zhejiang Province, Autumn, Great flood.

江西省,春大水。

Jiangxi Province, Spring, Great flood.

Front-end interface
Then, we use the "Time and Space Infrastructure of Chinese Civilization" (CCTS) to present meteorological events on the map interface based on the year and location of the climate events in the historical meteorological records, providing an interface for researchers (see Figure 1). Our system is located at
http://iisrserv.csie.ncu.edu.tw:5000/English. The main features of the interface include a scrolling timeline, a pop-up condition selection window, and an instant response map. When the user selects the conditions, the map will immediately display the records satisfying the conditions. If the cursor is moved over the location on the map, the page below the map will show the meteorological records of the location in the timeline interval.

Figure 1. System interface.

Case Study
To show the usage of our system, we look into the meteorological records during 1650 to 1700, which is the late stage of the Little Ice Age, to investigate the phenomenon of climate change in Qing Dynasty of China.
First, we choose the “temperature” event category. Among these “temperature” records, there are 68 extreme cold climate records. We can see that this kind of phenomenon was located from tropical zone to tepid zone in China (see Figure 2), therefore, we can conclude that extreme cold records appear not only in middle- to high-latitude areas, but also in lower latitude area.

Figure 2. Areas with Low-Temperature Record from 1650 to 1700.
Extreme cold weather usually comes with disasters. We choose “rain” as our variables as step two. Among the records, there are 379 records related to “snow.” To deeply look into the records, besides directly meteorological phenomena of “snow,” we further collect the indirectly meteorological phenomena data, such as “three days in a row,” “frost-damaged trees,” ...etc (see Table 6). From Table 6, we can see that during the Little Ice Age (1650-1700), the extremely cold climate led to disasters is more frequent than that during Non-Little Ice Age (1745-1795).
Table 6. Numbers of Data related to “snow” during Little Ice Age and Non-Little Ice Age in Cluster “rain.”

Meteorological Phenomena Data

Little Ice Age

(1650-1700)

Non-Little Ice Age

1745-1795

Rain
3,014
1,618

Snow
379
119

Snow, more than 10 days /month /ten days in a row three days in a row or more
46
5

Snow, frost damaged trees
8
2

Snow, birds, animals and human beings freeze to death
17
6

When investigating the meteorological phenomena during Little Ice Age, we find out rare snow records in Taiwan. As shown in Table 7, these records are in terrain area, such as Chia-Yi county and Tainan City, which could be seen as a strong evidence of the extreme climate during Little Ice Age.
Table 7. Rare snow records in Taiwan.

year
province

county/
city

cluster
meteorological description
data sources

1683
Taiwan Province
--

Temperature (including frost and dew), and rain

冬十一月,雨雪。是夜冰堅厚寸餘。從來臺灣無雪無冰,此異事也。
In November, winter, it snows and rains. The ice is more than an inch thick that night. It's strange because there has not snowed since ancient times.

Volume ten of "Taiwan Local Gazetteers", published in Kang-xi period (1685) in Qing Dynasty

1683
Taiwan Province
--
Rain, crop failure, temperature (including frost and dew)

五月,大雨。霪雨連月,鄭氏之土田阡陌多被沖陷,有“高岸為谷”之歎。冬始雨雪,冰堅厚寸餘。臺土氣熱,從無霜雪。

In May, it rained heavily. After a long period of rain, the fields of Zheng's family were crashed, and the high bank became valley. It began to snow in winter, and the ice was more than an inch thick. Normally the field was hot, and there was no frost and snow in Taiwan.

Volume nine of "Rebuilt Taiwan Local Gazetteers", published in Kang-xi period in Qing Dynasty

1683
Taiwan Province
Chia-yi County
Flood, rain, crop failure, temperature (including frost and dew)

夏五月,大雨水,時霪雨連月,鄭氏土田多沖陷,有“高岸為谷”之歎。冬十一月,始雨雪,冰堅厚寸餘。諸羅有霜無雪,是歲甫入版圖,地氣自北而南,信有矣。

In May, it rained heavily. After a long period of rain, the fields of Zheng's family were crashed, and the high bank became valley. In November, Winter, it snowed and iced over. Normally the field was hot, and there was no frost and snow in Tsu-lo (ancient name of Taiwan). It was believed that the climate was from Northern area, starting from the year Tso-lo became Qing’s territory.

Volume twelve of "Tsu-lo Local Gazetteers", published in Kang-xi period in Qing Dynasty

1683
Taiwan Province
Tainan City

Flood, rain, flood, rain, crop failure, temperature (frost and dew), temperature (frost and dew), temperature (frost and dew), drought, rain

春,鯽魚潭涸。夏五月,大雨水,田園多沖陷。六月,澎湖潮水漲四尺。秋八月壬子,鹿耳門潮水漲。冬十有一月,雨雪冰。是臺地氣暖,從無霜雪,是歲八月甫入版圖,冬遂雨雪,冰堅寸許,地氣自北而南,運屬一統故也。

In Spring, crucian pond dried up. In May, Summer, it rained a lot, and fields are mostly crashed. In June, the tide at Penghu rose four feet high. On August 18th, the tide at Luermen rose. In November, Winter, it snowed and iced over. Normally the field was hot, and there was no frost and snow in Taiwan. However, started from August when Taiwan was under Qing’s authority, it rained and snowed in Winter, and the ice was more than an inch thick. The climate was from the Northern Area. It is because of territorial unity.

"Rebuilt Taiwan Local Gazetteers", published in Qian-long period in Qing Dynasty

Conclusion
Although technology has improved nowadays, it is still hard to predict the weather. In order to understand the impact of climate disasters and find a way to deal with it, tracing the historical climate event could be a solution. This study establishes a principle to classify meteorological phenomena based on "China's Three Thousand Years of Meteorological Records", develops a Spatio-Temporal research platform, and build an instant response front-end interface. By the climate case study, we collect the users’ feedback and improve the front-end interface, as well as enhance the precision of a mass of meteorological data analytics. Although the research platform only contains the meteorological data from 1647 to 1795, we hope to expand the capacity of the database and establish a mature Spatio-Temporal research platform in the future.

Bibliography

Chinea-Rios, M., Sanchis-Trilles, G. and Casacuberta, F. (2015). Sentence clustering using continuous vector space representation. Springer, pp. 432–40.

Dingding Wang, Tao Li, Shenghuo Zhu and Chris Ding (2008). Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization Paper presented at the Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, Singapore, Singapore.

Gang Qian, Shamik Sural, Yuelong Gu and Sakti Pramanik (2004). Similarity between Euclidean and cosine angle distance for nearest neighbor queries Paper presented at the Proceedings of the 2004 ACM symposium on Applied computing, Nicosia, Cyprus.

K. M. Hammouda and M. S. Kamel (2004). Efficient phrase-based document indexing for Web document clustering.
IEEE Transactions on Knowledge and Data Engineering,
16(10): 1279–96 doi:
10.1109/TKDE.2004.58.

Ko-Chen, C. (1973). A preliminary study on the climatic fluctuations during the last 5,000 years in China.
Scientia Sinica,
16(2): 226–56.

Lili Kotlerman, Ido Dagan, Maya Gorodetsky and Ezra Daya (2012). Sentence clustering via projection over term clusters Paper presented at the Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, Montréal, Canada.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. (Fifth Berkeley Symposium on Mathematical Statistics and Probability). University of California Press, pp. 281–97.

Mikolov, T., Chen, K., Corrado, G. and Dean, J. (2013). Efficient estimation of word representations in vector space.
ArXiv Preprint ArXiv:1301.3781.

Solomon, S., Qin, D., Manning, M., Averyt, K. and Marquis, M. (2007).
Climate Change 2007-the Physical Science Basis: Working Group I Contribution to the Fourth Assessment Report of the IPCC. . Vol. 4. Cambridge university press.

Wang, P.-K., Lin, K., Liao, Y. C., Liao, H. M., Lin, Y. S., Hsu, C. T., Hsu, S. M., et al. (2018). Construction of the REACHES Climate Database Based on Historical Documents of China. Scientific Data, in press.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.