Definition
In deep learning, large language models are typically trained on a corpus of data treated as representative of current knowledge. However, natural language is not an ideal form for the reliable communication of concepts. Formal logical statements are preferable, since they offer verifiability, reliability, and applicability. Another reason for this preference is that natural language was not designed for an efficient and reliable flow of information and knowledge, but instead emerged as an evolutionary adaptation shaped by a prior set of natural constraints. As a formally structured language, logical statements are also more interpretable. They may be constructed informally as natural language statements, but a formalized logical statement is expected to follow a stricter set of rules, such as the use of symbols for the logical operators that connect simple statements into verifiable propositions.
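As an illustration of this contrast (ours, not from the original entry), the minimal sketch below encodes the informal statement "if it rains, the ground is wet" as a propositional formula and checks it exhaustively against every truth assignment; the helper names `implies` and `proposition` are illustrative assumptions.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

# Informal statement: "If it rains, then the ground is wet."
# Formalized with an explicit logical operator: rains -> wet.
def proposition(rains: bool, wet: bool) -> bool:
    return implies(rains, wet)

# Verifiability: the formal statement can be evaluated over every
# assignment of truth values, which a natural language sentence
# does not directly support.
for rains, wet in product([False, True], repeat=2):
    print(f"rains={rains!s:5} wet={wet!s:5} -> {proposition(rains, wet)}")
```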
A second hypothesis, not exclusive to the first one, is that meaning is recognized by processes that rely on spatial scale across natural language samples. It is presumed that the higher-scale representations encoded by natural language are captured in the many layers of the neural network in deep learning, but it is unknown whether tokenization of natural language samples by the use of subwords is completely efficient. It may be that these neural networks have neither the data nor the network scale to robustly compute these higher-order representations from current tokenization practices for natural language. In this case, a model that can process a hierarchy of tokens at different spans across samples may be considered. An analogy is the construction of the monumental pyramids in ancient Egypt. If they had been constructed of small blocks, the assembly process would have been more difficult than the actual construction, which was based on assembly from very large blocks. It follows that it would likewise be more efficient for a model to assemble meaning from larger blocks of text than from subwords alone.
Subword tokenization does capture meaning at least at the subword level, however, since these tokens correspond to units that carry meaning, and the models are capable of building higher-level representations from this procedure (Figure 1).
Figure 1. Tokenization of natural language samples. Each token may be assigned to a word or a string of words in a document. Generation of a text sequence by deep learning is dependent on the tokenization procedure. (A) The contiguous line represents a sequence of words as they appear in a document. Above this line are dashed lines, which are tokens that correspond to individual words in the document. (B) Same as (A), except that each longer dashed line represents a larger sequence of text rather than a subword. Therefore, each dash is a token that corresponds to many words in a document.
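To make the two panels of Figure 1 concrete, the sketch below tokenizes the same sentence at two granularities: single-word tokens (panel A) and larger multi-word spans (panel B). The `greedy_tokenize` function, the toy vocabulary, and the greedy longest-match procedure are our illustrative assumptions, not the method of any particular model.

```python
def greedy_tokenize(words: list[str], vocab: set[str]) -> list[str]:
    """Greedily match the longest span of words present in vocab,
    falling back to a single word when no longer span matches."""
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate span first.
        for span in range(len(words) - i, 0, -1):
            candidate = " ".join(words[i:i + span])
            if span == 1 or candidate in vocab:
                tokens.append(candidate)
                i += span
                break
    return tokens

sentence = "large language models are trained on natural language".split()

# Panel A: every token corresponds to a single word.
word_level = greedy_tokenize(sentence, vocab=set())

# Panel B: some tokens correspond to multi-word spans (higher-scale units).
span_vocab = {"large language models", "natural language"}
span_level = greedy_tokenize(sentence, vocab=span_vocab)

print(word_level)  # ['large', 'language', 'models', 'are', 'trained', 'on', 'natural', 'language']
print(span_level)  # ['large language models', 'are', 'trained', 'on', 'natural language']
```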
However, it is not certain that complex logic is easily captured by the above method. It is possible that training on translations of natural language samples into a logical format would improve the performance of large language models; a sketch of such a training pair appears below. Another possibility is that the tokenization process should include the higher-order features of natural language that correspond to logic, along with other features that carry meaning in the context of cognition.
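One hedged sketch of the first possibility: building parallel (natural language, logical form) pairs for supervised fine-tuning. The example pairs, the `nl_to_logic.jsonl` filename, and the prompt/completion schema are our illustrative assumptions; the entry does not specify a dataset or format.

```python
import json

# Hypothetical parallel corpus: each natural language sample is paired
# with a hand-written translation into a formal logical statement.
pairs = [
    {
        "text": "All humans are mortal, and Socrates is a human.",
        "logic": "forall x (Human(x) -> Mortal(x)) & Human(Socrates)",
    },
    {
        "text": "If it rains, the ground is wet.",
        "logic": "Rains -> Wet",
    },
]

# Serialize as prompt/target records in the JSON-lines layout commonly
# used for supervised fine-tuning of language models.
with open("nl_to_logic.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        record = {"prompt": pair["text"], "completion": pair["logic"]}
        f.write(json.dumps(record) + "\n")
```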
4. Large Language Models and Society