Lectures at a Glance

2019 International Symposium on Corpus Linguistics: Interdisciplinary Perspectives

2019-10-27

 

  

Lecture Series:

1. Multifactorial Analysis in the Study of Language

Time: October 25, 2019, 09:00-11:00

Venue: Lecture Hall, School of Foreign Languages

Speaker: Professor Xu Jiajin, Beijing Foreign Studies University

Abstract:

This lecture introduces statistical methods used in corpus linguistics, including cluster analysis, principal component analysis, correspondence analysis, binary logistic regression, and decision trees.

  

2. Exploratory Corpus Research: Data Preparation and Visualization Analysis

Time: October 25, 2019, 13:00-15:00

Venue: Lecture Hall, School of Foreign Languages

Speaker: Professor Li Wenzhong, Zhejiang Gongshang University

Abstract:

Exploratory corpus research helps us grasp the usage patterns, diachronic trends, and effects of particular language features by analyzing the frequencies, distributions, and relationships of corpus text data and categories, thereby gaining insight into the language phenomena under observation and designing better research questions or hypotheses. Starting from basic retrieval of corpus text data and working through real cases, this lecture uses regular expressions and R to further organize, transform, and clean the data; applies different statistical methods and visualization techniques according to specific research aims; examines the difficulties of research-question design in depth; and explores ways of solving them.
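The talk's workflow pairs regular expressions with R; as a minimal stand-in, the sketch below shows the same first step (regex retrieval plus a frequency table) in Python. The corpus sample is invented for illustration; real exploratory work would read corpus files from disk before any statistics or plotting.

```python
import re
from collections import Counter

# A tiny hypothetical corpus sample.
corpus = (
    "The model was tested. The models were compared. "
    "A modeling approach was adopted, and the model performed well."
)

# Regex retrieval: all word forms sharing the stem "model".
hits = re.findall(r"\bmodel\w*\b", corpus, flags=re.IGNORECASE)
freq = Counter(w.lower() for w in hits)

# Frequency table, sorted for inspection before visualization.
for form, n in freq.most_common():
    print(f"{form}\t{n}")
```

From such a table one can decide whether to lemmatize, widen the pattern, or move on to distributional statistics.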

  

3. The BYU Corpora: from Beginning to Advanced in Just Two Hours!

Time: October 25, 2019, 15:30-17:30

Venue: Lecture Hall, School of Foreign Languages

Speaker: Professor Mark Davies, Brigham Young University

Abstract:

We have previously modelled the life-cycle of a word in text (Renouf, 2013), identifying the stages of activity and change which it undergoes in a large corpus of UK news text across time. Our objective in this paper is to observe the evolution of a term from a similar perspective. A term is a special case, a word or phrase which names a concept or object within a specialised domain of study or activity. It is a semantically fixed entity at the time of coinage (Sager, 1994). Terms vary in kind, from formulae such as H2N2, to classical formations such as infectious mononucleosis, to scientific discoveries like event horizon, to ‘lay’ medical terms like Asian flu, to terms closer to general language, such as mouse or desktop in computing. Within a mainstream newspaper, a term occurs in two main locations: in the publication’s sections of specialised text; or in the general news section if, like HS2, it reflects a real world event or, like PVC, has established its place there over time.

The approach is based on the theory that the life-cycle of a term in diachronic news text will actually differ from that of a word, for a number of reasons. Firstly, terms come into being somewhat differently. A word is typically created by general word formation rules, whereas a term is typically based on a formula, classical roots or linguistic components within a specialised domain, or it undergoes “terminologisation”, combining pre-existing general words to create a specialised meaning, such as standalone or downsize. Secondly, a word typically resides in general text, whereas a term may take one of several paths. If highly specialised, it will probably remain in its rarefied domain, quoted occasionally in general text. If it is less rarefied, a layman’s term, it may move into general use, where its underlying concept will probably acquire a “conceptual fuzziness” (Halskov, 2005). Alternatively, a popular term may undergo a process of “determinologisation” (Meyer & Mackintosh, 2000) in everyday use, gaining a new metonymic or metaphorical sense, as with water-cooler, in water-cooler conversation.

In the paper, we shall study a selection of terms, partly informed by the work of Condamines & Picton (2014), and Meyer & Mackintosh (ibid.), and explore their paths through the corpus, observing features of terminological and determinological evolution, including the co-existence of evolving meanings and uses, ‘primary and secondary’ determinologisation, inflectional specificity versus lemmatisation, the creative use of layman’s and specialist terms, and the issue of grammatical change. The description of these features forms the structure of the paper’s findings. The corpus in question is a 1.5 billion-word diachronic collection of Independent and Guardian texts, from 1984-2015, processed by the WebCorp Linguist’s Search Engine (http://wse1.webcorp.org.uk/). It is hoped that this corpus-based, diachronic perspective will help to clarify the evolution of terms, and throw new light on the nature of semantic change in terminology for corpus linguists generally.

  

4. Making Large Corpora Useful: Creating Virtual Corpora in the BYU Corpora

Time: October 26, 2019, 09:40-10:20

Venue: Lecture Hall, Comprehensive Building

Speaker: Professor Mark Davies, Brigham Young University

Abstract:

Corpora such as the BNC, COCA, or many of the corpora from Sketch Engine are “general purpose” corpora. But many users want data related to a particular topic, such as engineering, astronomy, biology, photography, or basketball, or even something as specific as endocrinology, pasta, dams, refugees in Europe, or China’s “Belt and Road” initiative. The BYU corpora (now www.english-corpora.org) allow users to quickly and easily create these “Virtual Corpora”. Using either keywords in the corpus texts or metadata referring to the texts (e.g. country, date, or title), users can create a Virtual Corpus containing hundreds of thousands or even millions of words in just 2-3 seconds. Once they have created the Virtual Corpus, they can then search within it (as though it were its own standalone corpus), compare different Virtual Corpora, and (in just 1-2 seconds) generate a list of keywords from a Virtual Corpus (which can be very useful for teaching). The end result is that even in very large corpora (such as the 14 billion word iWeb corpus, or the 8+ billion word, ever-growing NOW corpus), teachers, learners, and researchers can quickly and easily extract useful data on their particular topic of interest.
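The underlying idea (filter a large collection down to a topical sub-corpus, then extract its keywords) can be sketched in a few lines. This is not the BYU implementation, only a toy illustration: the documents are invented, and the keyword score is a simple smoothed frequency ratio where a real system would use log-likelihood or a similar measure.

```python
from collections import Counter

# Hypothetical mini-collection of (topic metadata, text) pairs.
docs = [
    ("astronomy", "the telescope observed the black hole event horizon"),
    ("astronomy", "the star collapsed into a black hole"),
    ("cooking",   "boil the pasta and drain the water"),
    ("cooking",   "the sauce simmered while the pasta cooked"),
]

def virtual_corpus(keyword):
    # Keep only the texts containing the keyword -- the "virtual corpus".
    return [text for _, text in docs if keyword in text.split()]

def keywords(sub, whole, top=3):
    # Rank words by how much more frequent they are in the sub-corpus
    # than in the whole collection (add-one smoothing).
    sub_f = Counter(w for t in sub for w in t.split())
    all_f = Counter(w for t in whole for w in t.split())
    scored = {w: (sub_f[w] + 1) / (all_f[w] + 1) for w in sub_f}
    return sorted(scored, key=scored.get, reverse=True)[:top]

vc = virtual_corpus("pasta")
kws = keywords(vc, [t for _, t in docs])
print(len(vc), kws)
```

Topic-specific words rise to the top of the keyword list, while words spread evenly across the whole collection (such as "the") are ranked down.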

  

5. A Contrastive Study of Local Grammar Patterns of Evaluation in Research Articles by Chinese and Western Law Scholars

Time: October 26, 2019, 10:20-11:00

Venue: Lecture Hall, Comprehensive Building

Speaker: Professor Wei Naixing, Beihang University

Abstract:

Local grammar is a new approach to the grammatical and functional description of language. Previous studies indicate that local grammars can offer more specific, more transparent, and more precise information about language in actual use. This talk reports the findings of a contrastive study of the local grammar patterns of evaluation in research articles by Chinese and Western law scholars, using data from the self-built Beijing Collection of Academic Research Essays corpus (Beijing CARE). The Pattern Grammar approach (Hunston & Francis 2000) and a local grammar method (Hunston & Sinclair 2000; Barnbrook 2002) were adopted as analytical instruments. The data reveal substantial differences in the use of evaluative patterns by the two groups of scholars. Firstly, Western law scholars use evaluative patterns significantly more frequently than their Chinese counterparts, suggesting that Chinese law academics are relatively weak at expressing subjective attitudinal meanings. Secondly, our analyses have brought to the fore two sets of different characteristic local grammar patterns for the two groups of scholars, with distinctive functional constituents. Thirdly, and more interestingly, the two sets of characteristic local grammar patterns embody distinctive semantic parameters. Whilst Western scholars frequently evaluate objects in the physical world, their Chinese counterparts strongly tend to comment on law-related and research-related objects. In a similar vein, Western scholars’ patterns focus on the meanings of “likelihood” and “reasonability”, while Chinese scholars’ patterns feature the meanings of “importance”, “ability/responsibility”, “difficulty”, etc. These differences are discussed in terms of disciplinary culture, epistemology and discourse strategies.

  

6. Variation of Metadiscursive Verb Patterns in Medical English: An Intradisciplinary Probe

Time: October 26, 2019, 11:00-11:40

Venue: Lecture Hall, Comprehensive Building

Speaker: Professor Xu Jiajin, Beijing Foreign Studies University

Abstract:

As a cover term for the negotiation of propositional information and reader engagement, metadiscourse has gained considerable attention from scholars of academic discourse. The recent proposal of metadiscursive nouns (Jiang & Hyland 2018) has shed new light on the well-researched field of metadiscourse. Following Jiang & Hyland (2018) and studies on reporting verbs, we attempt to examine the metadiscursive verb patterns (MVPs, e.g. see + <ENDOPHORIC>, {=STUDY} + [SUGGEST], {=WRITER} + [OBSERVE]) in the self-built MedDEAP corpus, a five-million-word corpus of clinical medicine English research articles covering 18 sub-disciplines. An intradisciplinary investigation into MVPs will be conducted to analyze their structural and functional variation across the sub-disciplines of medical academic English. The MVP variability will be accounted for by the disciplinary plurality of medicine per se, in that the established field is by no means monolithic. Many MVPs exhibit a preference for some sub-disciplines over others. Moreover, medical sub-disciplines are characterized by methodological and conceptual cross-fertilisation. Hence, the variation of MVPs is a natural linguistic representation of interdisciplinary synergy.
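To make the MVP notation concrete, the sketch below matches toy regex versions of the three pattern types named in the abstract against invented example sentences. The patterns and sentences are illustrative only; the actual study works with a tagged five-million-word corpus, not hand-written regexes.

```python
import re

# Hypothetical sentences in the style of a medical research article.
sentences = [
    "This study suggests that early screening is beneficial.",
    "We observed a significant reduction in symptoms.",
    "See Figure 2 for the full distribution.",
    "The patients were randomized into two groups.",
]

# Toy regex stand-ins for three MVP types from the abstract:
#   {=STUDY} + [SUGGEST], {=WRITER} + [OBSERVE], see + <ENDOPHORIC>
patterns = {
    "STUDY+SUGGEST":  r"\bthis (study|paper|article) (suggests|shows|indicates)\b",
    "WRITER+OBSERVE": r"\bwe (observed|found|noted)\b",
    "see+ENDOPHORIC": r"\bsee (figure|table|section) \d+\b",
}

matches = {
    name: [s for s in sentences if re.search(rx, s, re.IGNORECASE)]
    for name, rx in patterns.items()
}
for name, hits in matches.items():
    print(name, len(hits))
```

The last sentence, a plain report of method, matches no MVP, which is the point of the notation: it isolates metadiscursive uses of verbs from propositional ones.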

  

7. Rethinking Corpus Linguistics: the Issue of Meaning

Time: October 27, 2019, 08:30-09:10

Venue: Lecture Hall, Comprehensive Building

Speaker: Professor Wolfgang Teubert, University of Birmingham

Abstract:

There is no meaning without language. Therefore for me the core remit of linguistic research is meaning. Discourse is made up of recurrent elements, lexical items, longer phrases, even text segments. To understand a text we must know what the elements constituting this text mean.

Corpus linguistics took off in the 1960s, inspired by the computer, which promised us clear, unambiguous, objective answers to many unsolved problems, one of them the ‘true’ meaning of what a text says. But while computers are good at processing data, they cannot deal with meaning. This is why corpus linguistics, following J.R. Firth, has developed tools delivering for each lexical item ‘an abstraction on the syntagmatic level’. This is not the same as its meaning, defined by the Cobuild dictionary as ‘the thing or idea that it refers to or represents, and which can be explained using other words’.

This is why I have introduced the paraphrase principle. It is based on the recognition that a lexical item means only what is said about it. If I don’t know what the item friendly fire means, people will paraphrase it for me, e.g. “Friendly fire is an attack by a military force on one’s own troops.” Depending on what they themselves have been told, they will give me different paraphrases. Every new text in which friendly fire occurs adds something to its meaning, contributing to its paraphrastic content. So we read in the weekly The Statesman of 28 May 2019 that “Admittedly friendly fire is not terribly uncommon,” which adds to what we already know that it is not a rare incident. Thus the meaning of a lexical item (type) is the sum of the paraphrases we find in a discourse for that item. The paraphrase principle repudiates the existence of a fixed meaning. Each new occurrence will add something to its meaning. It can endorse or reject what others have said, add a new feature, or present a new perspective. The meaning of a lexical item is never stable; it evolves as long as it is used.

To deal with the paraphrase principle, corpus linguistics has to study the diachronic dimension of discourse, and it has to identify what is new in each occurrence of a lexical item. Based on an explicit methodology of computational data processing, it has to extract, organise and present the paraphrastic content for each lexical item (type) under consideration. For each lexical item (token), it has to show how it differs from other occurrences. But this evidence is not yet the meaning of the lexical item. It is the result of data processing. It is data, nothing else.

To find out what friendly fire means, I have to interpret this data. Interpretation, though, is free. It is not a science; it is a craft, or even an art. My interpretation will be different from yours. There is no ‘true’ and no final interpretation. All our interpretations of friendly fire will contribute to its meaning, requiring ever more interpretations. In the end, friendly fire may mean something different to each of us. By comparing our interpretations, we now know much better what it can mean.

  

8. On “New Corpus-Driven” Approach

Time: October 27, 2019, 09:10-09:50

Venue: Lecture Hall, Comprehensive Building

Speaker: Professor He Anping, South China Normal University

Abstract:

Frequency-driven quantitative analysis has always been at the core of corpus linguistics. This speech is about the third phase of quantitative research in recent years, which shifts its focus to the “way that information in the corpus is organised” and is called a “new corpus-driven” approach (Hunston 2016, 2018). I will first introduce the background and definition of this new term.

Then, I will summarize some key features and strengths of the approach by demonstrating some research cases.

Finally, I will share some reflections on this new approach, regarding it as one of the trends in corpus linguistics that is challenging traditional corpus researchers.

  

9. Automated Evaluation of Translation Scripts in a Large-Scale Translation Contest

Time: October 27, 2019, 10:20-11:00

Venue: Lecture Hall, Comprehensive Building

Speaker: Professor Liang Maocheng, Beihang University

Abstract:

Translation is a common performance task for foreign language learners. The evaluation of translation scripts, however, is not only labor-intensive and time-consuming, but also rather subjective and prone to low reliability. The study of the automated evaluation of translation is therefore of immediate practical significance.

This study draws on the state-of-the-art Doc2vec technology in NLP to construct a model for the automated evaluation of translation. The model was then used to generate a score for each of the 11,049 translation scripts collected from the Han Suyin Translation Contest. When the machine-generated scores were compared with human-generated scores, it was found that the Doc2vec-based model can produce scores with high reliability and validity. The model can efficiently identify features of good translation, and can therefore be reliably used as a second rater in large-scale translation tests.

Some of the limitations in the automated evaluation of translation are also discussed.
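Doc2vec itself requires a trained model (e.g. via the gensim library); as a self-contained stand-in for the scoring step, the sketch below rates candidate translations by the cosine similarity of simple bag-of-words vectors against a reference translation. The sentences are invented, and this is only an illustration of the similarity-based scoring idea, not the study's actual model.

```python
import math
from collections import Counter

def vec(text):
    # Bag-of-words vector: word -> count (a crude stand-in for a
    # learned Doc2vec document embedding).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

reference = vec("the committee approved the proposal after a long debate")
candidates = {
    "good": "the committee approved the proposal following a long debate",
    "poor": "a cat sat quietly on the warm windowsill",
}
scores = {name: cosine(vec(t), reference) for name, t in candidates.items()}
print(scores)
```

A faithful candidate scores close to 1.0 against the reference while an unrelated one scores near 0.0; a Doc2vec model replaces the count vectors with dense embeddings so that paraphrases with little word overlap can still score highly.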