Creating and Digitizing Language Corpora, Volume 1: Synchronic Databases

By Joan C. Beal, Karen P. Corrigan, Hermann L. Moisl

Many digital corpora have become increasingly available through the WWW and CD-ROM. This development coincided with improvements in the standards governing the collection, encoding and archiving of such data. Less attention, however, has been paid to making other types of digital data available. This is especially true of data that one might describe as 'unconventional', namely dialect, child language and bilingual databases. This book is a first step towards developing similar standards for enriching and preserving these neglected resources.



Similar linguistics books

A Companion to the Philosophy of Language (Blackwell Companions to Philosophy)

Written by an international assembly of leading philosophers, this volume provides a survey of contemporary philosophy of language. As well as providing a synoptic view of the key issues, figures, concepts and debates, each essay makes new and original contributions to ongoing debate. Topics covered include: rule following, modality, realism, indeterminacy of translation, inscrutability of reference, names and rigid designation, Davidson's program, meaning and verification, intention and convention, radical interpretation, tacit knowledge, metaphor, causal theories of semantics, objects and criteria of identity, theories of truth, force and pragmatics, essentialism, demonstratives, reference and necessity, identity, meaning and privacy of language, vagueness and the sorites paradox, holisms, propositional attitudes, analyticity.

The Phonology of Catalan (Phonology of the World's Languages)

This is the most comprehensive account of Catalan phonology ever published. Catalan is a Romance language, occupying a position somewhere between French, Spanish, and Italian. It is the first language of six and a half million people in northeastern Spain and of the peoples of Andorra, French Catalonia, the Balearic Islands, and a small region of Sardinia.

Heraldic Dictionary-Atlas in 6 Languages (Геральдический словарь-атлас на 6 языках)

A heraldic dictionary presenting images and names of 530 heraldic figures in 6 languages: French, English, German, Spanish, Italian, and Dutch.

Extra resources for Creating and Digitizing Language Corpora, Volume 1: Synchronic Databases

Example text

1 Types of data

All documents in the corpus are of one or more of the following types:

Text: document originally conceived as written work
Audio footage: recording of live speech
Audio transcription: transcription of speech
Video footage: recording of live speech with visuals
Video transcription: transcription of video

In addition, we have comprehensive information about the individuals involved:

Author: author of a document
Participant: person appearing in audio or video

We also hold associated administrative information, used, for example, in tracking documents from initial contact with the contributor to full copyright clearance, including clearance of third-party copyright.
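The document types and person roles listed above could be modelled roughly as follows. This is only a minimal sketch, not the project's actual schema: the class names, field names and the sample record are all hypothetical. A flag enumeration is used because a document may be of more than one type at once.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import List


class DocType(Flag):
    """The five document types named in the text; values can be combined."""
    TEXT = auto()
    AUDIO_FOOTAGE = auto()
    AUDIO_TRANSCRIPTION = auto()
    VIDEO_FOOTAGE = auto()
    VIDEO_TRANSCRIPTION = auto()


@dataclass
class Person:
    name: str  # hypothetical field; the real records hold far more detail


@dataclass
class Document:
    title: str
    types: DocType                                   # one or more DocType flags
    authors: List[Person] = field(default_factory=list)
    participants: List[Person] = field(default_factory=list)

    def has_type(self, doc_type: DocType) -> bool:
        """Return True if this document includes the given type."""
        return bool(self.types & doc_type)


# Hypothetical record: a recording together with its transcription.
interview = Document(
    title="Example interview",
    types=DocType.AUDIO_FOOTAGE | DocType.AUDIO_TRANSCRIPTION,
    participants=[Person("Speaker A")],
)
```

Calling `interview.has_type(DocType.AUDIO_FOOTAGE)` checks a single type without caring which other types the document also carries.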

PostgreSQL enables us to make use of built-in advanced text indexing, and we also have the capacity to extend or modify the way this facility works, possibly using word stemming, and so on. For security reasons, and to allow more flexibility in the future, the online database is separate from the administrative one. The online database holds only publicly accessible information, which means that a potential security exploit would not release private information. (Jean Anderson, Dave Beavan and Christian Kay, p. 30)
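To illustrate what text indexing with word stemming means in general, here is a toy inverted index in plain Python. It is not PostgreSQL's implementation and not the project's code: the suffix rules, the `TextIndex` class and the sample documents are assumptions made purely for illustration.

```python
import re
from collections import defaultdict

# Toy suffix rules (an assumption); real stemmers are far more careful.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip at most one common English suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, split it into words, and stem each word."""
    return [stem(token) for token in re.findall(r"[a-z']+", text.lower())]


class TextIndex:
    """Inverted index mapping each stemmed term to the documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every (stemmed) query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


# Hypothetical sample documents.
index = TextIndex()
index.add(1, "Recording of live speech")
index.add(2, "Transcriptions of recorded speech")
```

Because "recording" and "recorded" both reduce to the stem "record", a query such as `index.search("recorded speech")` matches both sample documents. The same toy rules also show why stemming needs tuning: they reduce "recordings" only to "recording", which would not match.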

This method is normally faster than rekeying, but the lack of a Scots dictionary for the software means that it can generate wrong presumptions about words and the result requires careful proofreading. For example, scanning of a page of text from a story written in the Doric (north-eastern) variety produced for ‘hot’, for ‘out’, and <0> (zero) for ‘of’. The Scots past tense verb ending <-it> was separated from its stem, producing rather than the correct ‘pumped’.

