Created at 10am, Mar 5
Ms-RAGArtificial Intelligence
Surveying the Dead Minds: Historical-Psychological Text Analysis with Contextualized Construct Representation (CCR) for Classical Chinese
sslFGIYOaQSpcqHLedIME43xRmXcjEO-pAN10Fsb_xY
File Type
PDF
Entry Count
71
Embed. Model
jina_embeddings_v2_base_en
Index Type
hnsw

Yuqi Chen, Peking University, cyq0722@pku.edu.cn
Sixuan Li, Xiaoying AI Lab, lisixuan@xiaoyingai.com
Ying Li, Peking University, yingliclaire@pku.edu.cn
Mohammad Atari, University of Massachusetts Amherst, matari@umass.edu

Abstract
In this work, we develop a pipeline for historical-psychological text analysis in classical Chinese. Humans have produced texts in various languages for thousands of years; however, most of the computational literature is focused on contemporary languages and corpora. The emerging field of historical psychology relies on computational techniques to extract aspects of psychology from historical corpora using new methods developed in natural language processing (NLP). The present pipeline, called Contextualized Construct Representations (CCR), combines expert knowledge in psychometrics (i.e., psychological surveys) with text representations generated via transformer-based language models to measure psychological constructs such as traditionalism, norm strength, and collectivism in classical Chinese corpora. Considering the scarcity of available data, we propose an indirect supervised contrastive learning approach and build the first Chinese historical psychology corpus (C-HI-PSY) to fine-tune pre-trained models. We evaluate the pipeline to demonstrate its superior performance compared with other approaches. The CCR method outperforms word embedding-based approaches across all of our tasks and exceeds prompting with GPT-4 in most tasks. Finally, we benchmark the pipeline against objective, external data to further verify its validity.
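To make the idea of combining survey items with text representations concrete, here is a minimal sketch. It assumes, as CCR is usually described, that a text's score on a construct is the mean cosine similarity between the text embedding and the embeddings of the questionnaire items for that construct. The abstract does not spell out this formula, so treat it as an assumption. The function names and the toy 3-dimensional vectors are hypothetical; real vectors would come from a fine-tuned sentence encoder.

```python
import math
from statistics import fmean

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def ccr_score(text_vec, item_vecs):
    """Mean cosine similarity between one text embedding and every item embedding.

    Assumption: this averaging rule is a common reading of CCR, not a formula
    quoted from the source.
    """
    return fmean(cosine_similarity(text_vec, item) for item in item_vecs)

# Toy 3-dimensional vectors standing in for encoder outputs (hypothetical values).
item_vecs = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]  # items of one questionnaire
aligned_text = [0.9, 0.3, 0.0]                   # points in a similar direction
unrelated_text = [0.0, 0.0, 1.0]                 # orthogonal to both items

print(round(ccr_score(aligned_text, item_vecs), 3))
print(round(ccr_score(unrelated_text, item_vecs), 3))
```

A text whose embedding lies close to the item embeddings receives a high score, while a text orthogonal to them scores zero.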

4.3 Results
For the Semantic Textual Similarity (STS) task, we evaluate the DDR and CCR methods through a rigorous process involving 20 rounds of random sampling. In each round, 4,308 random paragraph pairs are constructed from the C-HI-PSY test set.

5 Benchmarking: Traditionalism, Authority, and Attitude toward Reform
To address the lack of benchmark datasets related to psychological measurement in classical Chinese, we further validate the effectiveness of the CCR method using externally annotated data.

Officials' Attitudes toward Reform in the 11th Century
Moral values and political orientations are closely intertwined (Federico et al., 2013; Kivikangas et al., 2021). For example, the attitude

Framework | Base Model | Semantic Textual Similarity (Easy Task): Pears., Spear. | Semantic Textual Similarity (Hard Task): Pears., Spear. | Questionnaire Item Classification: Accuracy | Psychological Measure: Pears., Spear.
(a) DDR
id: d537df30beaf7909fd054acbb1b2b9af - page: 7
Word2Vec (CBOW), Word2Vec (Skip-gram), FastText (CBOW), FastText (Skip-gram), GloVe
/ / / / /
.02.11 .08.11 .05.11 .10.10 .07.10 .02.10 .03.02 .02.01 .02.01 .09.11 .02.02 .01.01 .04.10 .01.01 .04.01 .03.02 .11.10 .01.01 .01.02 .09.11 .80.16 .87.15 .90.13 .85.16 .83.15 .22.07 .23.05 .18.07 .18.06 .23.08 .24.06 .20.07 .20.05 .16.09 .19.05
(b) Prompting
GPT, GPT
GPT-3.5-turbo-0125, GPT-4-0125-preview
.08 .62 .04 .52 .26 .40 .28 .30 .63 .77 .05.08 .08.10 .25.15 .27.17
(c) CCR (ours)
BERT, RoBERTa, RoBERTa
Bert-ancient-chinese, Guwenbert-base, Guwenbert-large
.53.07 .29.07 .41.05 .55.07 .46.09 .44.07 .42.01 .25.01 .28.01 .43.01 .40.01 .31.01 .93.11 .90.11 .83.13 .30.04 .30.04 .20.06 .23.09 .22.04 .20.05
SBERT Paraphrase-multilingual-MiniLM-L12-v2 .20.15 .21.14 .18.01 .19.01 .82.19 .15.04 .14.05
MacBERT+CoSENT text2vec-base-chinese .41.09 .40.09 .32.01 .31.01 .95.08 .21.10 .20.10
ERNIE+CoSENT
id: 7a36675db38f675109ff3448f5afa725 - page: 8
.45.09 .45.09 .38.01 .37.01 .93.11 .21.03 .20.04
LERT+CoSENT text2vec-large-chinese .46.12 .47.08 .36.01 .38.01 .97.07 .28.05 .27.05

Table 2: Performance on the test set across three tasks using three methods: DDR, LLM prompting, and CCR. Details of the models for the DDR method are explained in Appendix B. Models for the CCR method have been fine-tuned on the C-HI-PSY training set. Models for the prompting method include the versions of GPT-3.5 and GPT-4 that were released on January 25, 2024.

of individuals toward reforms, policy changes, and new legislation often reflects traditionalism, conservatism, and respect for authority (Hackenburg et al., 2023; Koleva et al., 2012). Those with stronger traditionalist views are more likely to identify with the existing social order and resist changes to the status quo (Osborne et al., 2023; Jost and Hunyady, 2005).
id: 2a6b880c5db92e4adec19f6234dc3da2 - page: 8
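The evaluation protocol described under 4.3 Results (20 rounds of random sampling, 4,308 paragraph pairs per round, Pearson and Spearman correlations) can be sketched as follows. This is an illustrative reading rather than the authors' code. The source does not say how the per-round results are summarized, so reporting their mean and standard deviation is an assumption, as are the function names and the toy data.

```python
import random
from statistics import fmean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def evaluate(pred, gold, rounds=20, sample_size=4308, seed=0):
    """Sample pairs repeatedly; return (mean, std) for Pearson and Spearman."""
    rng = random.Random(seed)
    size = min(sample_size, len(pred))
    pears, spear = [], []
    for _ in range(rounds):
        idx = rng.sample(range(len(pred)), size)
        p = [pred[i] for i in idx]
        g = [gold[i] for i in idx]
        pears.append(pearson(p, g))
        spear.append(spearman(p, g))
    return (fmean(pears), pstdev(pears)), (fmean(spear), pstdev(spear))

# Toy data (hypothetical): gold similarity labels and slightly noisy predictions.
gold = [float(i % 5) for i in range(200)]
pred = [g + 0.1 * (i % 3) for i, g in enumerate(gold)]
(p_mean, p_std), (s_mean, s_std) = evaluate(pred, gold, sample_size=50)
print(f"Pearson {p_mean:.2f} (std {p_std:.2f}), Spearman {s_mean:.2f} (std {s_std:.2f})")
```

Fixing the seed keeps the sampled rounds reproducible, which matters when comparing several models on the same draws.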
Throughout Chinese history, there have been numerous instances of significant reforms, one of the most notable being Wang Anshi's New Policies in the 11th century, which faced mixed reactions from officials. We draw upon a dataset manually compiled by Wang (2022), who annotated the attitudes of 137 major officials toward the reform. Employing the best-performing fine-tuned SBERT model, we use our CCR pipeline to measure the levels of traditionalism and attitudes toward authority expressed in their texts. For each individual official, results are aggregated by calculating the average score across all of their writings.

Support for Reform
Traditionalism: -0.441***
Authority: 0.472***
id: 2ae5e9ca95437dcc83d95a12c0034842 - page: 8
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "sslFGIYOaQSpcqHLedIME43xRmXcjEO-pAN10Fsb_xY", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "sslFGIYOaQSpcqHLedIME43xRmXcjEO-pAN10Fsb_xY", "level": 2}'
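For callers who prefer Python to curl, the search request above might be assembled as in this sketch. The endpoint, header names, and JSON fields are copied from the curl example; the helper name `build_search_request` and the sample query are hypothetical. The request is only constructed here, not sent, and the response format is not documented on this page.

```python
import json
from urllib.request import Request

SEARCH_URL = "https://search.dria.co/hnsw/search"
CONTRACT_ID = "sslFGIYOaQSpcqHLedIME43xRmXcjEO-pAN10Fsb_xY"

def build_search_request(query, api_key, top_n=10, rerank=True):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "rerank": rerank,
        "top_n": top_n,
        "contract_id": CONTRACT_ID,
        "query": query,
    }
    return Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("What is CCR?", api_key="<YOUR_API_KEY>")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Passing the object to `urllib.request.urlopen(req)` with a valid API key would send it.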