Tuesday, January 30, 2007

University of Bielefeld

Department of English

M.Ed. British and American Studies

Lecture: “How to Make a Dictionary”

BM 2: Introduction to English Linguistics

Lecturer: Prof. Dr. Dafydd Gibbon

Contact: gibbon@uni-bielefeld.de

Tuesday: 8.00- 10.00, H 14

Winter term 2006/ 2007


Welcome to Melanie Zahn’s Web log
“How to Make a Dictionary”

Web Portfolio by: Melanie Zahn

-------------------------------------




Please note: On a web log, the entries of a learner’s portfolio appear in reverse order. This means that the latest entries are at the top of this portfolio, whereas the oldest entries are at the bottom of the web page. To access all entries, please consult the ARCHIVES from October to January, because not every entry is shown on the “Current Posts” page.

If you want to consult my Web log “Introduction to Linguistics”, please click on my profile and use the link called “Introduction to Linguistics” :-)

Contact: Melanze@web.de

Registration Number: 1666313


Internet Sources:


http://wwwhomes.uni-bielefeld.de/~gibbon/Classes/Classes2006WS/HTMD/

http://www.spectrum.uni-bielefeld.de/~ttrippel/htmd/index.html

http://www.sandtomatoes.com/tutorial/tutorial.html

www.wikipedia.com

http://www.etymonline.com/

http://dict.leo.org/



Further Reading:

Cruse, D. A.: Lexical semantics / D. A. Cruse . - Cambridge [u.a.] : Cambridge Univ. Press , 1986 .

Crystal, David: The Cambridge encyclopedia of the English language. - 2. ed. . - Cambridge [u.a.] : Cambridge Univ. Press , 2003 .


Eco, Umberto: Semiotics and the philosophy of language, London: Macmillan, 1984

Horst M. Müller (Hrsg.): Arbeitsbuch Linguistik. - Paderborn [u.a.] : Schöningh , 2002 .

Ilson, Robert (ed.): Lexicography : an emerging internat. profession. - Manchester : Manchester Univ. Pr. [u.a.] , 1986 .

James R. Hurford and Brendan Heasley: Semantics : a coursebook. - Cambridge [u.a.] : Cambridge Univ. Pr. , 1983 .

Katamba, Francis: Morphology - New York : St. Martins Press , 1993 .


Landau, Sidney I.: Dictionaries : the art and craft of lexicography. - Reprint, orig. publ.: New York, Scribner, 1984 . - Cambridge [u.a.] : Cambridge Univ. Pr. , 1989 .

Lyons, John: Semantics / John Lyons . - Cambridge

Matthews, Peter H.: Morphology / P. H. Matthews . - 2. ed. . - Cambridge [u.a.] : Cambridge Univ. Pr. , 1991 .

Mugglestone, Lynda (ed.): Lexicography and the OED : pioneers in the untrodden forest. - Oxford [u.a.] : Oxford Univ. Press , 2000 .

O'Grady, William and Michael Dobrovolsky: Contemporary linguistics : an introduction . - U.S. ed. / prepared by Mark Aronoff . - New York : St. Martin's Pr. , 1989 .

Ooi, Vincent B. Y.: Computer corpus lexicography. - Edinburgh : Edinburgh Univ. Press , 1998 .

Pinker, Steven: The language instinct. - London [u.a.] : Lane, Penguin Pr. , 1994 .


Saeed, John Ibrahim: Semantics - 2. ed. . - Oxford [u.a.] : Blackwell , 2003 .

Stephan Gramley and Kurt-Michael Pätzold: A survey of modern English . - 2. ed. . - London [u.a.] : Routledge , 2004 .


Van Eynde, Frank and Dafydd Gibbon (eds.): Lexicon Development for Speech and Language Processing. - Dordrecht : Kluwer Academic Publishers , 2000 .

Walker, Donald E. (ed.): Automating the lexicon : research and practice in a multilingual environment. - Oxford [u.a.] : Oxford Univ. Press , 1995 .

Saturday, January 27, 2007

How to Make a Dictionary; Session 12, Tuesday, 2007-01-23


Computational Lexicography

Review of Lexicography Principles

It is important to know about the theory and the function of computational lexicography in order to understand how lexical information is related to a larger context.

Linguists use computational lexicography to learn more about new vocabulary and its relations within a given context.

Words are located in a text corpus, isolated, regrouped and reintegrated into their immediate context.

The following summary shows exactly how this is done with KWIC (KeyWord In Context) and what the basic notion of a concordance means.


Criteria for Good Lexicography

Quantity:
-Completeness of coverage:
--- extensional coverage: number of entries
--- intensional coverage: amount of lexical information

Quality:
-Correctness of information:
-Types of lexical information

Consistency of structure:
-Macrostructure
-Mesostructure
-Microstructure



Lexicographic workflow cycle:

1. Data acquisition
----- Recordings
----- Text collection
----- Concordance
----- Dictionaries
→ 2. Lexicon construction
----- Metadata
----- Information retrieval
----- Linguistic analyses
→ 3. Access to data
----- Traditional print media
----- Hyperlexicon: CD, internet
----- Software with lexicon component: word processing, speech processing
→ 4. Lexical evaluation
----- Internal: consistency, completeness
----- External: utility for users
→ back to 1. Data acquisition




1.) Lexical Data Acquisition

From Corpus to lexicon


LEXICON
----- Layer 4: LEXICON WITH GENERALISATION HIERARCHIES (general type, default inheritance)
----- Layer 3: LEXICON WITH SELECTED GENERALISATION (procedurally optimised: semasiologically, onomasiologically)
----- Layer 2: LEXICON MATRIX (entries x data categories, no generalisation)
----- Layer 1: CORPUS LEXICON (wordlist, concordance, HMM, ...)
___________________________________________________________________________
CORPUS
----- Layer 2: SECONDARY DATA (transcription, annotation, metadata)
----- Layer 1: PRIMARY DATA (audio/ video recording)





From Corpus to Lexicon...

Concordance

A KWIC (Key Word In Context) concordance is a special kind of preliminary, corpus-based dictionary: each word in a text corpus is paired with its contexts of occurrence in this corpus!
(Google, for instance, is a very special form of KWIC concordance!)

Example of the process:

Bill Bryson: Notes from a Small Island

"My first sight of England was on a foggy March night in 1973 when I arrived on the midnight ferry from Calais."



ALPHABETICALLY ORDERED KWIC

Keywords with right- hand contexts

Alphabetic order    Keywords in context (right-hand context, up to three words)
1973                when i arrived
a                   foggy march night
arrived             on the midnight
calais
england             was on a
ferry               from calais
first               sight of england
foggy               march night in
from                calais
i                   arrived on the
in                  1973 when i
march               night in 1973
midnight            ferry from calais
my                  first sight of
night               in 1973 when
of                  england was on
on                  a foggy march
on                  the midnight ferry
sight               of england was
the                 midnight ferry from
was                 on a foggy
when                i arrived on
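The table of right-hand contexts can also be produced mechanically. The following is a minimal sketch in Python (my own illustration, not the lecture's code); the function name `kwic_right` and the parameter `width` are assumptions made for this example.

```python
def kwic_right(tokens, width=3):
    """Pair every token with up to `width` tokens to its right, sorted by keyword."""
    rows = []
    for i, keyword in enumerate(tokens):
        context = " ".join(tokens[i + 1:i + 1 + width])
        rows.append((keyword, context))
    return sorted(rows)


sentence = ("my first sight of england was on a foggy march night "
            "in 1973 when i arrived on the midnight ferry from calais")

for keyword, context in kwic_right(sentence.split()):
    # Left-align the keyword in a fixed-width column, then print its context.
    print(f"{keyword:<10}{context}")
```

The last word of the sentence gets an empty context, because nothing follows it.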




CONCORDANCING ON THE WEB:
The first:
--------- HyprLex
--------- VerbMobil HyprLex
Some more:
-------- General information on concordancing
-------- Corpus Linguistics



A KWIC CONCORDANCE ENGINE

KWIC concordance construction





1. CORPUS CREATION → 2. TOKENISATION → 3. KEYWORD LIST EXTRACTION → 4. CONTEXT COLLATION → 5. KEYWORD SEARCH → 6. OUTPUT FORMATTING



SIMPLEST KWIC PROCEDURE

1.) Corpus creation: make a corpus of texts in electronic format
2.) Tokenisation (pre-process each text):
----- 1. process punctuation marks
----- 2. break the text into context units (lines/sentences)
3.) Keyword list extraction (all words in the text)
4.) Context collation (for each keyword)
5.) Search for each keyword in its contexts in the corpus
6.) Format and store the output (for printing, hypertext [CD, web])



SIMPLE KWIC CONCORDANCE


KWIC procedure: 1. Corpus creation

My first sight of England was on a foggy March night in 1973 when I arrived on the midnight ferry from Calais.


KWIC procedure: 2. Tokenisation

In the text:
My first sight of England was on a foggy March night in 1973 when I arrived on the midnight ferry from Calais.

Process
- upper case (capital) letters
-punctuation marks

To produce:

my first sight of england was on a foggy march night in 1973 when i arrived on the midnight ferry from calais
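The tokenisation step can be sketched as follows. This is a minimal Python illustration of my own (the function name `normalise` is hypothetical); it assumes that every ASCII punctuation mark, including hyphens and apostrophes, is simply replaced by a space, which is cruder than what a real tokeniser would do.

```python
import string


def normalise(text):
    """Lower-case the text and replace punctuation marks with spaces."""
    # Translation table: every ASCII punctuation character maps to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    # split() + join() collapses runs of whitespace into single spaces.
    return " ".join(text.lower().translate(table).split())


print(normalise("My first sight of England was on a foggy March night in 1973 "
                "when I arrived on the midnight ferry from Calais."))
```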


KWIC procedure: 3. Keyword List

-Replace each SP (space) sequence by a LF (linefeed) / NL (newline)
-Sort the list alphabetically
-Remove duplicate words

1.
my
first
sight
of
england
was
on
a
foggy
march
night
in
1973
when
i
arrived
on
the
midnight
ferry
from
calais


2.+ 3.
1973
a
arrived
calais
england
ferry
first
foggy
from
i
in
march
midnight
my
night
of
on
sight
the
was
when
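The three operations of this step (split at spaces, sort, remove duplicates) fit into one line of Python. This is a minimal sketch of my own; the function name `keyword_list` is an assumption.

```python
def keyword_list(normalised_text):
    """Split on whitespace, remove duplicates and sort alphabetically."""
    return sorted(set(normalised_text.split()))


text = ("my first sight of england was on a foggy march night "
        "in 1973 when i arrived on the midnight ferry from calais")
for word in keyword_list(text):
    print(word)
```

The sentence has 22 word tokens but only 21 different keywords, because "on" occurs twice.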



KWIC procedure: 4. Contexts

-Pick context unit
- left and right contexts
- m words at beginning
- n words at end
-Add m boundary marks at beginning and n at end
-Split into units of length m + 1 + n

With m = 1 and n = 1 (one boundary mark # at each end, units of length 3), the context units are:

# my first              night in 1973
my first sight          in 1973 when
first sight of          1973 when i
sight of england        when i arrived
of england was          i arrived on
england was on          arrived on the
was on a                on the midnight
on a foggy              the midnight ferry
a foggy march           midnight ferry from
foggy march night       ferry from calais
march night in          from calais #
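The context step is a sliding window over the token list. The sketch below is my own Python illustration; the function name `context_units` and its parameters are assumptions, with m and n used as in the description of the procedure.

```python
def context_units(tokens, m=1, n=1, boundary="#"):
    """Return all windows of m + 1 + n tokens, padded with boundary marks."""
    padded = [boundary] * m + list(tokens) + [boundary] * n
    size = m + 1 + n
    # One window per original token: the keyword sits at index m of its window.
    return [padded[i:i + size] for i in range(len(padded) - size + 1)]


tokens = ("my first sight of england was on a foggy march night "
          "in 1973 when i arrived on the midnight ferry from calais").split()
for unit in context_units(tokens):
    print(" ".join(unit))
```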



KWIC procedure: 5. Search

For example:

-on is found in the middle of the following context units:
------ was on a
------ arrived on the

-arrived is found in the middle of the following context units:
------i arrived on




KWIC procedure: 6. Output (which has to be stored in a professional layout)

1973:       in          1973        when
a:          on          a           foggy
arrived:    i           arrived     on
calais:     from        calais      #
england:    of          england     was
ferry:      midnight    ferry       from
first:      my          first       sight
foggy:      a           foggy       march
from:       ferry       from        calais
i:          when        i           arrived
in:         night       in          1973
march:      foggy       march       night
midnight:   the         midnight    ferry
my:         #           my          first
night:      march       night       in
of:         sight       of          england
on:         arrived     on          the
on:         was         on          a
sight:      first       sight       of
the:        on          the         midnight
was:        england     was         on
when:       1973        when        i




COMPUTING A KWIC CONCORDANCE

From Text Corpus to KWIC Concordance

NOW THE SAME PROCESS IN PERL!


KWIC procedure: 1. Preprocess

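$wordlist = "";
while (<>) {
    chomp;                              # remove the line ending
    s/e\.g\./EG/g;                      # protect abbreviations before stripping full stops
    s/M\.A\./MA/g;
    tr/.,;:"()\-/ /;                    # replace punctuation marks by spaces
    tr/A-Z/a-z/;                        # lower case
    tr/\t/ /;                           # tabs to spaces
    s/ +/ /g;                           # collapse runs of spaces
    $wordlist = $wordlist . $_ . " ";   # the added space keeps lines from running together
}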

NORMALISED TEXT



KWIC procedure: 2. Contexts

# @wordlist is the token array made with split in the keyword list step;
# OUTPUT is assumed to be an already opened filehandle.
$contextlength = 5;
@contextlist = ();
for ($i = 0; $i <= @wordlist - $contextlength; $i++) {
    print OUTPUT $wordlist[$i];
    $contextlist[$i] = $wordlist[$i];
    for ($j = 1; $j < $contextlength; $j++) {
        print OUTPUT " " . $wordlist[$i + $j];
        $contextlist[$i] = $contextlist[$i] . " " . $wordlist[$i + $j];
    }
    print OUTPUT "\n";
}

CONTEXTS



KWIC procedure: 3. Keyword List

@wordlist = split(' ', $wordlist);     # split on whitespace, ignoring leading spaces
@sortedwordlist = sort { $a cmp $b } @wordlist;
$prev = "";
$count = 0;
@uniquewordlist = ();
for ($i = 0; $i < @sortedwordlist; $i++) {
    $w = $sortedwordlist[$i];
    if ($w ne $prev) {                 # a new word: print it and store it once
        $prev = $w;
        print OUTPUT $w . "\n";
        $uniquewordlist[$count] = $w;
        $count++;
    }
}

KEYWORDLIST




KWIC procedure: 4. Search

for ($i = 0; $i < @uniquewordlist; $i++) {
    $w = $uniquewordlist[$i];
    for ($j = 0; $j < @contextlist; $j++) {
        @context = split(/ /, $contextlist[$j]);
        if ($w eq $context[2]) {       # keyword in the middle of a five-word context
            $context = $context[0] . " " . $context[1] . " " . $context[2] . " " . $context[3] . " " . $context[4];
            print OUTPUT $context . "\n";
            # ....
        }
    }
}

CONCORDANCE



KWIC procedure: 5. Format

1.) Design a page layout with text objects:
----- 1. Title
----- 2. Headings
----- 3. Body text
----- 4. Tables
2.) Implement it in HTML in order to test the algorithm
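A page layout with a title, a heading and a table can be generated directly from the concordance rows. The following Python sketch is my own illustration, not the layout used in the lecture; the function name `concordance_to_html` and the column labels are assumptions.

```python
from html import escape


def concordance_to_html(title, rows):
    """Render (left, keyword, right) rows as a minimal HTML page with one table."""
    lines = [
        "<html><head><title>%s</title></head><body>" % escape(title),
        "<h1>%s</h1>" % escape(title),
        "<table>",
        "<tr><th>Left</th><th>Keyword</th><th>Right</th></tr>",
    ]
    for left, keyword, right in rows:
        # escape() protects characters such as < and & inside the table cells.
        lines.append("<tr><td>%s</td><td><b>%s</b></td><td>%s</td></tr>"
                     % (escape(left), escape(keyword), escape(right)))
    lines.append("</table></body></html>")
    return "\n".join(lines)


page = concordance_to_html("KWIC concordance",
                           [("was", "on", "a"), ("arrived", "on", "the")])
print(page)
```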



KWIC procedure: Source

-The Perl implementation follows the procedure exactly
-However, the code is for demonstration purposes only, because it does not allow for:
----- flexible handling of contexts and filenames
----- treatment of more than one text
----- modularity of organisation
----- format scalability and search efficiency
Project: re-write the code to do these things
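One possible direction for this project, sketched in Python rather than Perl: the context widths become parameters, several texts can be processed at once, and the concordance is stored in a dictionary so that looking up a keyword does not require looping over all contexts. All names here (`tokenise`, `build_concordance`, `left`, `right`) are my own assumptions, and reading the texts from files is left out.

```python
import re
from collections import defaultdict


def tokenise(text):
    """Lower-case the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_concordance(texts, left=2, right=2):
    """Map each keyword to a list of (source, left context, right context).

    `texts` is a dict {source name: text}; `left` and `right` are context widths.
    """
    concordance = defaultdict(list)
    for source, text in texts.items():
        tokens = tokenise(text)
        for i, keyword in enumerate(tokens):
            before = " ".join(tokens[max(0, i - left):i])
            after = " ".join(tokens[i + 1:i + 1 + right])
            concordance[keyword].append((source, before, after))
    return concordance


conc = build_concordance({"t1": "A foggy March night.", "t2": "One night in 1973."},
                         left=1, right=1)
print(conc["night"])
```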

PERL SOURCE CODE



KWIC: Scaling UP

-The Ibibio Concordance was made using exactly the same procedure, but:
----- using UNIX (Linux) shell scripting, not Perl
→ because this is much more flexible
→ the Toolbox system uses the RTF format for output:
----- Multi-Dictionary Formatter (MDF), or
----- Lexique Pro
-Today one could also use XML stylesheets


IBIBIO CONCORDANCE




DICTIONARY MAKING:
Why KWIC is used....

KWIC: Dictionary Making

-The function of a KWIC is to make searching for lexical information more efficient by putting context information about words in one place, for example:
----- for making "Word Sketches" (Adam Kilgarriff)
----- for grammatical descriptions: part of speech
----- for dictionaries: examples of use, collocations, ...
Project: Make concordances from your text corpora and use them to collect lexical information for your Toolbox lexical databases
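As a small illustration of how contexts yield lexical information, the following Python sketch counts which words directly follow a keyword; frequent right-hand neighbours are candidates for collocations. This is my own simplified example (the function name `right_collocates` is an assumption) and far simpler than a real Word Sketch.

```python
from collections import Counter


def right_collocates(tokens, keyword):
    """Count which words directly follow `keyword` in the token list."""
    return Counter(nxt for word, nxt in zip(tokens, tokens[1:]) if word == keyword)


tokens = "a poodle hybrid is a cross between a poodle and some other breed of dog".split()
print(right_collocates(tokens, "a").most_common())
```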



THE STATUS OF DICTIONARIES

Remember that the dictionary is:
-one of the three main components of language documentation:
----→ corpus of recordings and texts
----→ dictionary
----→ sketch grammar
-the central component of any linguistic description
-the most useful linguistic product for use by the speech community, or non- linguists in general



THE IBIBIO DICTIONARY

-The Ibibio Dictionary
--uses information from Elaine Kaufmann’s Ibibio Dictionary
-the information was re-typed into an Office table format
-this was converted into:
→ Toolbox format for further lexicographic extension
→ LaTeX for formatting (cf. the Ibibio Concordance)
-Project: extend the Ibibio corpus and concordance in scope and content




CONCLUSION!

-It is faster to do it this way if you
----- have a large text corpus
----- want to make a detailed syntagmatic or morphological description or a large dictionary
----- have little time to do this

Monday, January 22, 2007

How to Make a Dictionary; Session 11, Tuesday, 2007-01-16



TYPES OF LEXICAL INFORMATION:

FOCUS ON SEMANTICS, THE STUDY OF MEANING




SEMANTICS is the study of MEANING, i.e. of the interpretation of linguistic forms and signs.



Revision: Main types of definition


A definition generally consists of two parts: a definiendum (the item that has to be defined) and a definiens (the actual definition, which typically consists of genus proximum and differentia specifica).

Other types of definitions:

-Componential definition
splits the meaning of a lexical item into components
e.g. standard dictionary definition by genus proximum and differentia specifica

-Syntagmatic definition
contextual definition (illustrates the meaning in a larger context with similar and different words): definition by text examples

-Paradigmatic definition (typical of onomasiological dictionaries, e.g. a thesaurus)
presents word fields (e.g. in a thesaurus or synonym dictionary)
and gives semantic relations: - hyponyms, hyperonyms
------------------------------ co-hyponyms: synonyms, antonyms




ANALYSING A CORPUS

For example: on Poodles

A Poodle hybrid is a cross (hybrid) between a Poodle and some other breed of dog.
Poodle hybrids have become very popular as pets.
They play a big role in the current designer dog trend.
The Poodle’s nonshedding coat is the usual impetus behind such experimentation, where potential pet owners are looking for a nonshedding version of a breed for health or hygienic reasons.
Some of these crosses have been developed deliberately, while others have happened accidentally.


Definitions for a small number of words from these texts:

Poodle: /’pu:dl/ n a breed of dog with thick curling hair which is often cut into an elaborate pattern.
hybrid: /’haIbrId/ n an animal or a plant that has parents of different species or varieties: A mule is a hybrid of a male donkey and a female horse.
breed: /bri:d/ n a particular type of animal or plant. Its members have a similar appearance and are usually developed by deliberate selection.
pet: /pet/ n an animal or a bird kept as a companion and treated with care and affection.


Antonyms to a number of words from these texts:

Antonyms in the text: deliberately vs. accidentally
Supplementary antonyms: Poodle vs. Terrier
------------------------ hybrid vs. purebred
------------------------ pet vs. beast/ brute


* Annotation on antonyms: there are different kinds of antonyms.
Some antonyms have directly opposite meanings: simple opposites follow the scheme
either...or,
whereas with complementary opposites you cannot decide whether there is an either...or
relation!
You cannot say what the opposite of a poodle is, for instance, because there is a
whole set of possible opposites!

Inversive (converse) opposites are antonyms of the following relational type:
-parent/ child
-father/ son
-to buy/ to sell (one buys, the other sells)


Synonyms in the text: hybrid ↔ cross

Semantic word field: a set of related words
Related words: hybrid, cross, breed, designer dog

Poodle (specific term) → dog (more general term) → pet (most general term)
hyponym → hyperonym → hyperonym





REVISION: MICROSTRUCTURE

The lexicon microstructure contains the properties of lexical entries, such as types of lexical information and lexical data categories (DATCATS).

Properties of words as lexical items are MODALITY/ APPEARANCE, which can be realised orthographically (written language) or phonemically (spoken language, broad transcription); STRUCTURE, which operates on an internal (morphology) and an external (syntax) level; and last but not least MEANING (semantics: signified ↔ signifier).



MODEL: TYPES OF LEXICAL INFORMATION

----- MEANING/ CONTENT (Semantics, Pragmatics)
----- STRUCTURE (Organisation, syntax)
----- MODALITY/ APPEARANCE/ RENDERING





MICROSTRUCTURE: INFORMATION TYPES


Information type              Data category             Example
Modality, appearance, form    Spelling                  Poodle
                              Pronunciation             pu:dl
Structure                     Internal (Morphology)
                              External (POS)            noun
Meaning (LEXICAL SEMANTICS)   Components                dog with a haircut
                              Word fields, relations    antonym: terrier...
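In a lexical database, such a microstructure corresponds to a record with one field per data category. The following Python sketch is my own illustration; the class name `LexicalEntry` and the field names are assumptions, not Toolbox data categories.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    # Modality / appearance
    spelling: str
    pronunciation: str
    # Structure
    morphology: str = ""      # internal structure
    pos: str = ""             # external structure (part of speech)
    # Meaning
    components: str = ""      # componential definition
    relations: dict = field(default_factory=dict)   # word fields, semantic relations


poodle = LexicalEntry(
    spelling="poodle",
    pronunciation="pu:dl",
    pos="noun",
    components="dog with a haircut",
    relations={"antonym": ["terrier"]},
)
print(poodle)
```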




REVISION: Define definition

A standard dictionary definition consists of two main elements:

X is a Y kind of Z

Definitio per genus proximum et differentiam specificam, which means: definition by the nearest kind and the specific differences!

The definition consists of two parts: the definiendum and the definiens.
The definiendum is the word that has to be defined. The definiens is the explanation, which in the standard case is given via genus proximum and differentia specifica.
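The pattern "X is a Y kind of Z" can be written down as a small data structure. This Python sketch is my own illustration (the class `Definition` and its field names are assumptions); the example entry follows the DCE definition of "baby" in shortened form.

```python
from dataclasses import dataclass


@dataclass
class Definition:
    definiendum: str   # X: the word to be defined
    differentia: str   # Y: the specific differences
    genus: str         # Z: the nearest kind (genus proximum)

    def render(self):
        """Return the definition as 'X: a Y Z'."""
        return f"{self.definiendum}: a {self.differentia} {self.genus}"


baby = Definition(definiendum="baby", differentia="very young", genus="child")
print(baby.render())   # baby: a very young child
```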

The definiens can also take the form of a list of examples, a sample text corpus, a model (picture: ostensive definition) or a real example.

Examples (DCE 1987)

babble: to say or talk quickly and foolishly or in a way that is hard to understand.
→ Definiendum: babble
→ Definiens: definition via genus proximum and differentia specifica

baby: a very young child, especially one who has not yet learned to speak or walk
→ Definiendum: baby
→ Definiens: definition via genus proximum and differentia specifica

bad: not good; unpleasant, unwanted, or unacceptable
→ Definiendum: bad
→ Definiens: by giving a list of co-hyponyms, specifically of synonyms

blue: of the colour of the sky or of the deep sea on a fine day
→ Definiendum: blue
→ Definiens: an example of several objects of the defined colour


The genus proximum is a superordinate term to the definiendum.
It can therefore be described as a hyperonym.

The definiendum is a subordinate term, a hyponym.

Hyperonyms and hyponyms can be arranged in a hierarchy, a tree structure of terms.
Such a tree structure is called a TAXONOMY.
It expresses a generalisation-specialisation relation, which is a paradigmatic relation.
In semasiological dictionaries paradigmatic relations are often expressed in terms of hyponym-hyperonym relations (definiendum and its genus proximum), but they also work with
CO-HYPONYMS (synonyms or antonyms).
The part-whole relation, which belongs to the syntagmatic relations, is called MERONOMY. A meronomy defines a syntagmatic hierarchy: how to build up larger units from smaller units.

Example of a TAXONOMY (relations of hyponyms and hyperonyms)

....
----- living creature
---------- animal
--------------- dog
-------------------- poodle

Conclusion from this specific kind of taxonomy: a poodle is an animal/ a living creature
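Such a taxonomy can be stored as a simple table from each term to its hyperonym, and conclusions like the one above follow by walking up the tree. A minimal Python sketch of my own (the names `HYPERONYM`, `hyperonyms` and `is_a` are assumptions):

```python
# hyponym -> hyperonym, following the poodle example
HYPERONYM = {
    "poodle": "dog",
    "dog": "animal",
    "animal": "living creature",
}


def hyperonyms(term):
    """Walk up the taxonomy and return all superordinate terms, nearest first."""
    chain = []
    while term in HYPERONYM:
        term = HYPERONYM[term]
        chain.append(term)
    return chain


def is_a(term, kind):
    """True if `kind` is a (direct or indirect) hyperonym of `term`."""
    return kind in hyperonyms(term)


print(hyperonyms("poodle"))
print(is_a("poodle", "living creature"))
```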



Taxonomies are used in many contexts:

In traditional lexicography they are used for:
-cross-references in standard definitions
-thesaurus construction

In Artificial Intelligence and Text Technology they are used in:
-ISA hierarchies (inheritance hierarchies)
-ontologies

In theories of the lexicon they define:
-type hierarchies (e.g. Head-driven Phrase Structure Grammar [HPSG])
-default hierarchies (e.g. ILEX theory; DATR implementations)
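The idea of default inheritance can be illustrated with a few lines of Python: a node inherits every property from its parent unless it states its own, more specific value. This is only my own sketch of the principle, not DATR syntax; the class `Node` and the example properties (`pos`, `plural_suffix`) are hypothetical.

```python
class Node:
    """A node in an ISA hierarchy; properties not set locally are inherited."""

    def __init__(self, name, parent=None, **properties):
        self.name = name
        self.parent = parent
        self.properties = properties

    def get(self, key):
        node = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]   # the most specific value wins
            node = node.parent
        raise KeyError(key)


noun = Node("noun", pos="noun", plural_suffix="s")          # default for all nouns
dog = Node("dog", parent=noun)                              # inherits everything
sheep = Node("sheep", parent=noun, plural_suffix="")        # overrides the default

print(dog.get("plural_suffix"), repr(sheep.get("plural_suffix")), sheep.get("pos"))
```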



Example of a text that contains semantic components, relations, fields and definitions.

GINGER BEER
Fermentation has been used by mankind for thousands of years for raising bread, fermenting wine and brewing beer.
The products of the fermentation of sugar by baker’s yeast Saccharomyces cerevisiae (a fungus) are ethyl alcohol and carbon dioxide.
Carbon dioxide causes bread to rise and gives effervescent drinks their bubbles.
This action of yeast on sugar is used to "carbonate" beverages, as in the addition of bubbles to champagne.


Semantic/ paradigmatic relations:
Co-hyponyms: Synonyms: baker’s yeast ↔ Saccharomyces cerevisiae
---------------------------- raising, fermenting, brewing, rise
Hyperonym, hyponym: ---- fungus ← yeast


Semantic fields (co- hyponyms): wine, beer, champagne, alcohol, product, beverages

Definitions
Fermentation: Sugar is converted into alcohol through the process of fermentation.
Product: A thing that is grown or produced, usually for sale.
Fungus, pl. fungi: Any of various types of plant without leaves or flowers and containing no green colouring. Fungi usually grow on other plants or on decaying matter. Mildew and mushrooms are examples of fungi.
Bubble: A floating ball formed of liquid and containing air or gas.
Carbon dioxide: The gas breathed out by people and animals from the lungs or produced by burning carbon.
Yeast: A type of fungus used in making beer or wine, or to make bread rise.

Saturday, January 20, 2007

How to Make a Dictionary, Session 10, Tuesday 2006-12-19


Types of lexical information: grammar
(Parts of speech categories & subcategories)



Types of lexical information: Focus on SYNTAX

Types of lexical information are given in the microstructure of a dictionary!
Grammar in the narrow sense is about the order of words in a sentence, whereas syntax in the broad sense is about the structure of single words (word syntax: morphology), sentences (phrasal syntax, which is what is generally analysed), texts (text syntax) and dialogues.
Syntactic categories can therefore be parts of speech (e.g. lexical words [nouns, verbs, adjectives] vs. function words), subcategories and phrasal categories.


The structure of language

Language consists of constitutive relations, which can be either structural or semiotic.
Structural relations are either syntagmatic or paradigmatic.
Semiotic relations can be defined as interpretation relations and realisation relations.



SYNTAX
Start off with SENTENCE SYNTAX!


A sentence consists of words that are related to each other. Words therefore do not only have an internal structure, but also an external, context-bound structure.
Words can occur in contexts like this:

Mr. Bush accepted Mr Rumsfeld’s resignation after November mid-term elections in which the Republicans lost control of both the House of Representatives and the Senate.

Public discontent over the conduct of the Iraq war was seen as a major factor in the defeat.


The individual words which occur in this context belong to very different parts of speech. Only a variety of words from different POS can provide a coherent text.

Examples of words that belong to different POS categories:

Mr. Bush: noun (proper noun: name), subject of the main clause
Accepted: verb in past tense form (the tense is marked by the suffix morpheme –ed)
Resignation: noun (in object position, being part of the predicate)
After: preposition with temporal meaning
In: preposition
Which: relative pronoun
The: definite article (POS: determiners)
Lost: verb (irregular past tense form of "to lose")
Of: (multifunctional) preposition
Both: dual (quantifier: pronoun category)
Over: preposition
Was seen: irregular past form of the verb "to be", which serves as an auxiliary, plus the past participle of the verb "to see"; together they form the past tense passive.
A: indefinite article (POS: determiners)



PART OF SPEECH
The different word categories!


I) NOUN CATEGORIES

1.) DETERMINERS

● Articles
definite article: the
indefinite article: a

● Possessives
my, your, his, her, its, our, their

● Demonstratives
proximal (to the speaker): this
distal (to the speaker): that

● Quantifiers
cardinal numbers: one, two, etc.
existential: some, several, few, many, etc.
dual: both
universal: each, every, all, etc.


2.) NOUNS

Proper nouns
names:
-personal
-place
-product, etc.

Common nouns
countable nouns such as: knife, fork, spoon, etc.

Mass nouns (uncountable nouns)
bread (a slice of bread)
butter (a piece of butter)
jam (a spoonful of jam)


3.) ADJECTIVES

There are different types of adjectives:
-Scalar adjectives
Can be arranged on a kind of scale:
→ small... big
→ cold... hot
→ hairless... hairy
→ Special feature of scalar adjectives:
They can occur with different adverbs of degree, whereas polar adjectives cannot!
For instance: very, highly, extremely, incredibly

-Polar adjectives
Cannot occur in combination with adverbs of degree. They describe either one feature or the other!
→ alive vs. dead
→ married vs. unmarried
→ pregnant vs. not pregnant

-Appraisive adjectives
→ good, great, wonderful, etc.

-Ordinal adjectives
→ first, second, etc.


4.) PRONOUNS

● Personal pronouns
I/me, you, he/him, she/her, it, we/us, they/them

● Possessive pronouns
mine, yours, his, hers, its, ours, theirs

● Demonstrative pronouns
proximal: this
distal: that, yonder (an archaic form)

● Quantifier pronouns
cardinal numbers: one, two, etc.
existential: some, several, few, many, etc.
dual: both
universal: each, every, all, etc.

● Relative pronouns
→ are more like conjunctions!


II) VERB CATEGORIES

1.) VERBS

Main verbs:
-finite forms:
person (1st, 2nd, 3rd)
number (singular, plural)
tense (The English language has two tenses: present, past)
-non- finite forms
infinitive
participle: present, perfect

Periphrastic verbs (auxiliary verb + non-finite main verb):
-modal: can, may, will, shall, ought, etc.
-aspectual: be + present participle (continuous)
------------- have + past participle (perfect)
-passive: be + past participle

Example: It might have been being repaired
----- might: modal
----- have: perfect
----- been: continuous
----- being: passive
----- repaired: main verb
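The fixed order of the auxiliaries, and the way each one determines the form of the next verb, can be sketched in Python. This is my own illustration; the table `FORMS` and the function `verb_group` are assumptions, and only verb groups that start with a modal are covered.

```python
# Forms of the auxiliaries and (as an illustration) one main verb.
FORMS = {
    "have":   {"base": "have",   "pp": "had",      "ing": "having"},
    "be":     {"base": "be",     "pp": "been",     "ing": "being"},
    "repair": {"base": "repair", "pp": "repaired", "ing": "repairing"},
}


def verb_group(modal, main, perfect=False, continuous=False, passive=False):
    """Build modal (+ have) (+ be) (+ be) + main verb in the fixed English order."""
    # Each element is (lemma, form that it imposes on the NEXT element).
    chain = []
    if perfect:
        chain.append(("have", "pp"))     # have + past participle
    if continuous:
        chain.append(("be", "ing"))      # be + present participle
    if passive:
        chain.append(("be", "pp"))       # be + past participle
    chain.append((main, None))

    words = [modal]
    required = "base"                    # a modal is followed by a base form
    for lemma, imposes in chain:
        words.append(FORMS[lemma][required])
        required = imposes
    return " ".join(words)


print(verb_group("might", "repair", perfect=True, continuous=True, passive=True))
```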


2.) ADVERBS

Deictic
here, there; now, then

Time
soon

Place
source
path
goal

Direction
into, etc.
towards

● Manner

Degree
→ better dealt with in connection with adjectives




III) GLUE CATEGORIES/ FUNCTION WORDS

1.) PREPOSITIONS

● Glue categories basically make nominal expressions into adverbial expressions
→ they transform many categories into adverbs, except for the "all-purpose preposition" of
The range of meanings of the preposition of is in fact very wide and differentiated:

For example:
- a bottle of water (relation)
- because of... (of in combination with a certain conjunction)
- to think of... (fixed expression: a verb is used in combination with a certain preposition in order to express a certain idea)
- in front of... (of is part of a fixed expression: a complex preposition which indicates a certain position)


2.) CONJUNCTIONS

● Co-ordinating conjunctions
and, but

● Subordinating conjunctions
basically: make sentences (clauses) into adverb-like verb modifiers
conjunction-like relative pronouns: make sentences (clauses) into adjective-like noun modifiers



3.) INTERJECTIONS

Interjections link parts together
Examples: "Hi!", "er", "huh?"

● They may also be expressions of subjective reactions
"Ouch!", "Wow!"

Examples:
-"yeah"
-"mmmm" (delicious)
-"mhm?"




THE STRUCTURE OF LANGUAGE
The sign hierarchy: RANKS

Signs are structured in terms of their position in a size hierarchy; the positions in the hierarchy are sometimes referred to as RANKS.

The MAIN RANKS are:
-dialogue
-monologue/ text
-sentence
-word
-morpheme
-phoneme

Signs at each of these ranks have an internal and an external structure and possess semiotic relations (functions and realisations).




RANKS

SIGN rank    Internal structure      External structure                         Interpretation                        Realisation
Dialogue     turns, texts            social interaction                         communication                         prosody, gesture
Text         sentences               dialogue components                        speech acts                           prosody, gesture
Sentence     phrases, words          parts of narrative, argumentative texts    propositions                          prosody, gesture
Word         stems, affixes          functional parts of sentences              complex states, properties, events    phonemes, word prosody
Morpheme     phonemes, syllables     parts of words                             simple states
Phoneme      distinctive features    syllables                                  encoding of morphemes into sounds     phonetic segments, allophones





STRUCTURE AND CONSTITUTIVE RELATIONS

What is structure?

Language structure is determined by the following kinds of constitutive relations:

Structural relations:
-Syntagmatic relations
→ Function words are the "glue" between lexical words
→ Combinatory relations which create larger signs (and their realisations and
interpretations) from smaller signs (and their realisations and interpretations)

-Paradigmatic relations
→ there is a choice of words
→ classificatory relations of similarity and difference between signs

Semiotic relations:
-realisation: the visual appearance or acoustic representation of signs (other senses may also be involved)
-interpretation: the assignment of meaning to a sign




SYNTAGMATIC RELATIONS

Syntagmatic relations can be defined as linguistic "glue":
-combinatory relations which create larger signs (and their realisations and interpretations) from smaller signs (and their realisations and interpretations)

Examples of relations on different ranks:
Phonology:
- Consonants and vowels are glued together as core and periphery of syllables.
Morphology:
- lexical morphemes and affixes are glued together into stems
- stems are glued together into compound stems
- stems and inflections are glued together into words
Syntax:
- nouns and verbs are glued together as the subjects and verbs of sentences



STRUCTURES AND SYNTAGMATIC RELATIONS

Phonological rank:


SYLLABLE
    ONSET: s t r
    RHYME
        NUCLEUS: ɛ
        CODA: ŋ θ s

Example: / s t r ɛ ŋ θ s / (strengths)
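The syllable tree above can also be written down as a small program. The following is only a minimal sketch: the vowel list is a hypothetical, incomplete inventory, and the function simply assumes that everything before the first vowel is the onset and everything after the last vowel is the coda.

```python
# Hypothetical, incomplete vowel inventory used only for this sketch.
VOWELS = {"ɛ", "e", "i", "ɪ", "æ", "a", "ɑ", "ɒ", "ɔ", "o", "ʊ", "u", "ʌ", "ə"}

def split_syllable(phonemes):
    """Split one syllable (a list of phoneme symbols) into onset and rhyme."""
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not vowel_positions:
        raise ValueError("a syllable needs a vocalic nucleus")
    first, last = vowel_positions[0], vowel_positions[-1]
    return {
        "onset": phonemes[:first],
        "rhyme": {
            "nucleus": phonemes[first:last + 1],
            "coda": phonemes[last + 1:],
        },
    }

# The example from the diagram: "strengths"
strengths = split_syllable(["s", "t", "r", "ɛ", "ŋ", "θ", "s"])
```

The nested dictionary mirrors the hierarchy of the diagram: the rhyme is a unit of its own that contains nucleus and coda.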







OTHER SYNTAGMATIC RELATIONS

Syntagmatic relations are very often hierarchical.
Therefore to some extent, structures in phonology, morphology and syntax can be similar, if they are hierarchical.





MORPHOLOGICAL SYNTAGMATIC RELATIONS



stem
    c-stem: day to day
    predicate
        verbal: bath room
        object: clean er






SYNTACTIC SYNTAGMATIC RELATIONS



sentence
    subject: The loud smoker
    predicate
        verbal: is being
        object: a nuisance





PARADIGMATIC RELATIONS

Paradigmatic relations can be defined as classificatory relations of similarity and difference between signs.
The similarity and difference concerns the:
-internal structure
-external structure
-meaning
-appearance

Tuesday, January 16, 2007

How to Make a Dictionary, Session 9, Tuesday 2006-12-12


Introduction to a Field Linguist's Toolbox
Guest Lecturer: Sascha Griffiths


The linguistic association SIL (compare: www.sil.org) documents little-documented languages all over the world with the help of a specific software system called TOOLBOX.
TOOLBOX was developed in order to help linguists manage the data of their field work on foreign languages. With TOOLBOX, new vocabulary and information on grammar, morphology, syntax and phonology can be recorded and used to create new dictionaries.

The name Toolbox goes back to "Shoebox", the name of its predecessor, which recalls the older method of collecting information on foreign languages: in times when modern computer systems were not yet available, linguists had to carry their information on foreign languages in ordinary shoeboxes. During their field work, they noted the information they got from interviews with native speakers on cards, which they collected in ordinary boxes.
Since modern computer systems, laptops, hardware and software have become available and easily transportable, the old shoeboxes have been replaced by computer toolboxes that provide modern (dictionary) databases.

Toolbox is a computer program that allows us to enter and review (new) lexical entries easily. The main page consists of two windows: the left one contains the ordinary dictionary microstructure, and the right one shows the specific dictionary entry.



Concordance
An important aspect a linguist has to consider is the concordance of new terms/ new lexical entries entered into Toolbox.
This means, for instance, that the linguist has to count the number of times a new word occurs.
When it appears very often, it is likely to be an important lexical or functional word. It may be part of the basic vocabulary of a specific language (fundamental vocabulary) or may be essential to grammar or syntax (for instance: it may be essential to the formation of a tense and act as an auxiliary or as a marker of mood or aspect).
Apart from the frequency of a word, the linguist also has to consider its environment or context. Therefore, it is important to know where a specific word "normally" appears.
Are there any preferences of appearance, or are there even specific conditions that have to be met in order for a specific word to appear?
The answers to these questions can tell a lot about the usage of words, their importance and their relation to larger contexts in general.
Since spoken (and written) language consists of combinations of words on the basis of specific grammatical rules and usage limitations, unknown languages can be observed, described and finally explored by the concordance methods described above.
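The two steps described above, counting how often a word occurs and looking at the contexts in which it appears, can be sketched in a few lines of Python. This is only an illustration with an invented sample text; it is not the way Toolbox itself works, and the function names are my own.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Lower-case the text, split it on whitespace and strip punctuation."""
    stripped = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in stripped if word]

def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for every occurrence."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((" ".join(left), token, " ".join(right)))
    return lines

sample = "The dog saw the cat. The cat saw the bird."
tokens = tokenize(sample)
freq = Counter(tokens)            # frequency of every word form
kwic = concordance(tokens, "cat") # "keyword in context" lines
```

In this tiny sample the function word "the" is the most frequent item, which matches the observation that very frequent words are often grammatical words.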

The recorded data can easily be exported via Toolbox. Once the data have been entered into the database system, it is relatively easy to create a dictionary database.
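To get an idea of what such a database can look like, here is a small sketch that reads entries written with backslash markers. The markers used here (\lx for the headword, \ps for the part of speech, \ge for an English gloss) follow a convention that is commonly used with Toolbox, but they were not part of the lecture, so the markers, the sample entries and the function name should be read as assumptions.

```python
def parse_records(text, record_marker="lx"):
    """Group marker lines into records; a new record starts at each \\lx line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and anything that is not a marker line
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            records.append({})
        if records:
            # a marker may occur several times in one record (e.g. two senses)
            records[-1].setdefault(marker, []).append(value.strip())
    return records

# Invented sample data for illustration only.
sample = r"""
\lx eddy
\ps n
\ge small whirlpool

\lx run
\ps v
\ge move fast on foot
\ps n
\ge act of running
"""
records = parse_records(sample)
```

Each record is a dictionary of lists, so that an entry such as "run" can keep both its verb and its noun reading.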



Inflection and Compounding

Inflection
A word consists of a stem and an inflection (a stem is either a root or a derived stem!).
The inflection is related to the external structure, i.e. to the syntax of the phrase/ utterance. The inflection a word takes has to fit the environment of the word; it has to be embedded into the context. Even if an inflection is totally missing, this absence carries morphological information: a stem + a zero inflection can mark the singular or the indefinite form of a word!

In Latin and in German the inflection system is more complicated than in English. English nouns do not distinguish subject and object case forms. Normally, the noun phrase before the verb is the subject, while noun phrases that follow the verb take the function of the object(s).
In Latin and German the word order is less fixed. Objects can be distinguished from the subject by their inflectional form and can therefore also appear at the beginning of the sentence. In English this is not possible without changing the meaning or the voice (f. ex.: active vs. passive) of the utterance.

Example:

German

Ich sehe den Mann. (Accusative)

English

I see the man. (No inflection, the subject has to be in first position)

German

Den Mann sehe ich. (Possible sentence/ variation)

English

*The man I see. (This word order is not acceptable as a neutral statement in English)


Whereas German in many cases marks inflection only on its articles (determiners), Latin possesses a very complex inflection on the noun itself, with 6 cases (Nominative, Genitive, Dative, Accusative, Ablative, Vocative).

For example:

ara, arae (Nom., sing., pl.)
arae, ararum (Gen., sing., pl.)
arae, aris (Dat., sing., pl.)
aram, aras (Acc., sing., pl.)
ara, aris (Abl., sing., pl.)
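The paradigm above can be stored as a simple lookup table. The sketch below (names are my own) contains only the five cases listed and shows that one and the same form can stand for several case/number combinations, so that the context has to decide.

```python
# Paradigm of "ara" as listed above (case, number) -> form
ARA = {
    ("nom", "sg"): "ara",  ("nom", "pl"): "arae",
    ("gen", "sg"): "arae", ("gen", "pl"): "ararum",
    ("dat", "sg"): "arae", ("dat", "pl"): "aris",
    ("acc", "sg"): "aram", ("acc", "pl"): "aras",
    ("abl", "sg"): "ara",  ("abl", "pl"): "aris",
}

def analyse(form, paradigm=ARA):
    """Return every (case, number) reading that the given form can have."""
    return sorted(key for key, value in paradigm.items() if value == form)

readings_arae = analyse("arae")
readings_aram = analyse("aram")
```

The form "aram" has exactly one reading, whereas "arae" is ambiguous between three.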


With the help of derivation, words can even change their part of speech:

Example:
to run (verb) → runner (noun)

In a few cases, even zero derivation (the absence of a suffix) can lead to a POS shift of words:

Example:
to run (verb) → a run (noun)



Compounding

Compounds normally consist of a binary division (2 items that can be identified by drawing an internal tree structure).
But very long compounds can also consist of more than just two items (divisions).
A compound stem can contain a derived stem, which in turn contains a root.


To sum up, there are three basic ways of creating new words in a particular language:
1.) Creating words by the invention of new forms of roots.
2.) Creating words by deriving already existing linguistic material.
3.) Creating new words by compounding two or more already existing terms.

Monday, January 08, 2007

How to Make a Dictionary, Session 8, Tuesday 2006-12-05

Types of lexical information: MORPHOLOGY
Introduction to Inflection and Word Formation


New word formation
New concepts, objects and inventions require new words/ new vocabulary.
New words can be invented or derived from already existing linguistic material.
New words can potentially be invented by everybody.
But they are more likely to spread within a speech community if they are invented/ used by people who hold political power or enjoy a certain popularity/ celebrity, such as scientists, engineers, product branding companies or poets.


The poem "Jabberwocky" by Lewis Carroll, from his book Through the Looking-Glass, is a famous poem in which the author mainly uses terms he invented himself. It is full of vocabulary that does not exist in English but has been derived from English language material. Because of this, the reader is able to understand the broad content of the poem.
The poem has been translated into German by Christian Enzensberger, who calls it "Der Zipferlake".
It is famous for its interesting word-building. The author invents new roots and morphemes, which leads to the creation of new words with recognisable parts of speech and meanings. Lewis Carroll forms new words by putting parts of two or more existing stems together, f.ex.: chortle, galumph. He also creates compound words from at least two existing stems, f.ex.: snicker-snack

Original version by Lewis Carroll:


Jabberwocky
by Lewis Carroll

Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.
Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!
He took his vorpal sword in hand:
Long time the manxome foe he sought
So rested he by the Tumtum tree,
And stood awhile in thought.
And as in uffish thought he stood,
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!
One, two! One, two! And through and through
The vorpal blade went snicker-snack!
He left it dead, and with its head
He went galumphing back.
And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
O frabjous day! Callooh! Callay!
He chortled in his joy.
Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.




German translation of the "Jabberwocky" by Christian Enzensberger

Der Zipferlake
von Christian Enzensberger

Verdaustig war's und glasse Wieben
rotterten gorkicht im Gemank;
Gar elump war der Pluckerwank,
Und die gabben Schweisel frieben.
»Hab acht vorm Zipferlak, mein Kind!
Sein Maul ist beiß, sein Griff ist bohr!
Vorm Fliegelflagel sieh dich vor,
Dem mampfen Schnatterrind!«
Er zückt' sein scharfbefifftes Schwert,
Den Feind zu futzen ohne Saum;
Und lehnt' sich an den Dudelbaum,
Und stand da lang in sich gekehrt.
In sich gekeimt, so stand er hier,
Da kam verschnoff der Zipferlak
Mit Flammenlefze angewackt
Und gurgt in seiner Gier!
Mit eins! Mit zwei! und bis aufs Bein!
Die biffe Klinge ritscheropf!
Trennt er vom Hals den toten Kopf,
Und wichernd springt er heim.
»Vom Zipferlak hast uns befreit?
Komm an mein Herz, aromer Sohn!
O blumer Tag! O schlusse Fron!«
So kröpfte er vor Freud.
Verdaustig war's und glasse Wieben
rotterten gorkicht im Gemank;
Gar elump war der Pluckerwank,
Und die gabben Schweisel frieben.




Morphological Structure
Branches of Morphology



Morphology deals with:

MORPHOLOGY
    INFLECTION
    WORD FORMATION
        DERIVATION
        COMPOUNDING


The processes of inflection and derivation have in common that one stem is used and that certain affixes are added, mainly in the form of suffixes, but also in the form of prefixes, infixes or circumfixes. The main difference between inflection and derivation lies in their function. Inflection serves syntagmatic adaptation: it fits a word to its context in the text. Derivation, as a kind of word formation, serves paradigmatic creativity: it creates new language material and expands the vocabulary.
The process of compounding, in contrast, consists of putting two existing stems together and creating one new word meaning.



Reminder: What are linguistic SIGNS?


DIALOGUE → Intonation → Social relations

TEXT → Intonation → Description

SENTENCE → Accent/ Intonation → State/ Event

WORD → Phonemes/ Stress → Entity/ Prop




Morphology sketch
The function of Inflection

Inflection has got an internal structure and an external function. Its external function is to mark the relation of words to their context. In this sense it does not change the basic meaning of words.
The internal structure concerns the forms words can take. Affixes (prefix, suffix, infix) and superfixes can be added to stems. Stems can also undergo a vowel change.
In word formation, morphology has slightly different functions. It aims at creating new words, shifting words to another part of speech or creating new meanings. In principle the lexicon is infinitely extendable.
As far as internal structure is concerned, new words can be created by inventing new roots or morphemes (blending, abbreviation,...). But inventing new roots or morphemes is very difficult and rather unlikely to occur in everyday life (unless scientists or companies, for instance, are deliberately searching for new names). A more common technique is derivation, meaning that an existing stem undergoes a vowel change or receives a new affix in the form of a prefix, suffix or infix.
Compounding is also a very popular mechanism of creating new vocabulary. Two stems are put together, possibly with an interfix or an inflection-like affix.



The internal structure of words

MORPHEMES are the smallest meaningful parts of words!

There are two main morpheme types:

Lexical morphemes (content morphemes, roots) which have got an open set of possible words (f. ex.: girl, boy, car, box, spoon, grass, sky)

Grammatical morphemes (structural morphemes) which can be defined as a closed set. There are free grammatical morphemes, which are independent words (prepositions, conjunctions, auxiliary verbs), and bound grammatical morphemes, which occur in word formation and inflection (affixes; most often in the form of suffixes).




Morphemes and allomorphs
Morphemes can be realised differently in different contexts (depending on the phonological environment of the morpheme in question). The variant realisations of one morpheme are called ALLOMORPHS.
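A standard textbook illustration of allomorphs, which was not part of my lecture notes, is the regular English plural: it is pronounced /s/, /z/ or /ɪz/ depending on the last sound of the stem. The following sketch encodes this rule; the two sound classes are simplified assumptions and the function name is my own.

```python
# Simplified sound classes for this sketch (not a complete inventory).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}

def plural_allomorph(stem_phonemes):
    """Choose the allomorph of the regular plural from the stem-final sound."""
    final = stem_phonemes[-1]
    if final in SIBILANTS:
        return "ɪz"   # buses, dishes
    if final in VOICELESS:
        return "s"    # cats, books
    return "z"        # dogs, cars, all vowels

cats = plural_allomorph(["k", "æ", "t"])
dogs = plural_allomorph(["d", "ɒ", "g"])
buses = plural_allomorph(["b", "ʌ", "s"])
```

The order of the checks matters: sibilants have to be tested first, because /s/ is voiceless but still takes /ɪz/.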



The function of morphemes
How are words built?

Inflection marks the syntagmatic relation of words to their contexts. Syntactic agreement can be in person, number and case. In English there is mainly subject-verb agreement, whereas in German there is subject-verb, determiner-adjective-noun and preposition-nominal agreement.
Inflection can also depend on situational contexts: verbs relate to time and space, nominals to quantity and definiteness.



The internal structure of words

English words consist of a stem and an inflection. Stems carry a lexical meaning and inflections have got grammatical meanings. Inflections relate words to their syntactic (person, case, number agreement) and semantic (tense/time, quantity, speaker-addressee) context.

For example: cats = cat (stem) + -s (inflection)


STEMS of English words can be SIMPLE (i.e. ROOTS, lexical morphemes) such as red, table, run, car etc. or COMPLEX.
Complex stems can be derivations (a stem and a derivational affix; f.ex.: beauty + ful = beautiful) or compounds (the combination of at least two different stems written together or separated by a hyphen; f.ex.: armchair, red-head).
Both can also be combined: a compound that contains a derived stem is called a synthetic compound (f.ex.: bus-driver, steam-roller).



A hierarchy of words and their parts

WORDS consist of 1 STEM and an INFLECTION

STEM/ BASE:
-COMPOUND STEM: 2 stems
-DERIVED STEM: 1 stem + affix
-ROOT (lexical morpheme)

INFLECTION: affix
-prefix
-suffix
-infix
-circumfix
-superfix
-ablaut



Remember: Words as signs
Inflected Words: → Phrase semantics; → Stress

Compound Word: → Lexical semantics; → Stress

Derived Word: → Lexical semantics; → Stress

Morpheme: → Lexical semantics; → Phonemes, Stress




WHAT IS...?

A WORD is: a stem + an inflection
An INFLECTION is: a suffix or an ablaut
A STEM is: either a ROOT (lexical morpheme)
    or a DERIVED STEM (i.e. stem + affix)
    or a COMPOUND STEM (stem + stem)
A DERIVED STEM is: either a ROOT (zero derivation)
    or a DERIVED STEM with an affix
A COMPOUND STEM is: a derived stem/ word + a derived stem/ word
    or a compound stem + a compound stem
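These definitions refer to each other (a stem can contain stems again), which is why they can be written as recursive data types. The following is only a sketch of that idea; the class names are my own, and an empty inflection is assumed to stand for a zero inflection.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Root:
    form: str                 # a lexical morpheme, e.g. "bus"

@dataclass
class DerivedStem:
    base: "Stem"              # stem + affix
    affix: str

@dataclass
class CompoundStem:
    left: "Stem"              # stem + stem
    right: "Stem"

Stem = Union[Root, DerivedStem, CompoundStem]

@dataclass
class Word:
    stem: Stem
    inflection: str = ""      # empty string = zero inflection (assumption)

def roots(stem):
    """Collect all roots contained in a stem, from left to right."""
    if isinstance(stem, Root):
        return [stem.form]
    if isinstance(stem, DerivedStem):
        return roots(stem.base)
    return roots(stem.left) + roots(stem.right)

# "bus-drivers": compound of the root "bus" and the derived stem "drive" + "-er"
bus_drivers = Word(CompoundStem(Root("bus"), DerivedStem(Root("drive"), "-er")), "-s")
found = roots(bus_drivers.stem)
```

Because the function calls itself for every embedded stem, it works for compounds of any depth.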




Simple and complex words

Simple words consist of a single root (one lexical morpheme) and are often short:
f. ex.: car, star, cat

Complex words can be:
-blends and abbreviations (new simplex roots built from parts of more than one stem):
such as: brunch; NATO
-derivations (based on one root):
f.ex.: unable, impossible, happiness, antidisestablishmentarianism
-compounds (based on more than one root/ stem):
f.ex.: tatpurusa (endocentric): jam-jar, honeypot, harddisk, bus-stop
dvandva (bicentric): whisky-soda, gentleman-farmer
bahuvrihi (exocentric): red-head, redskin, blue-stocking



Questions???

What is SANSKRIT?
Sanskrit is a classical language of India, a liturgical language of Hinduism, Buddhism, and Jainism, and one of the 23 official languages of India.
Dating back to at least 1500 B.C., its position in the cultures of South and Southeast Asia is akin to that of Latin and Greek in Europe. It appears in pre-Classical form as Vedic Sanskrit (appearing in the Vedas), with the language of the Rigveda being the oldest and most archaic stage preserved. This fact and comparative studies in historical linguistics show that it is one of the earliest attested members of the Indo-European language family.
Today, Sanskrit is spoken by a very small group of people, but continues to be widely used as a ceremonial language in Hindu religious rituals in the forms of hymns and mantras. The vast literary tradition of Sanskrit in the form of the Hindu scriptures and the philosophical writings are also studied. Scholarly discussions on various topics in Indian philosophy continue to be held in the Sanskrit language in a few traditional institutions in India. The corpus of Sanskrit literature encompasses a rich tradition of poetry and literature, as well as scientific, technical, philosophical and religious texts.
Classical Sanskrit is the form of the language laid out in the grammar of Panini, around 500 BC.
(Source: http://www.wikipedia.com/)


Who was PANINI?
Panini was an ancient Indian grammarian from Gandhara (traditionally 520-460 BC, but estimates range from the 7th to 5th centuries BC). He is most famous for his Sanskrit grammar, particularly for his formulation of the 3,959 rules of Sanskrit morphology in the grammar known as Ashtadhyayi (meaning "eight chapters"). It is the earliest known grammar of Sanskrit (though scholars agree it likely built on earlier works), and the earliest known work on descriptive linguistics, generative linguistics, and perhaps linguistics as a whole. Panini's comprehensive and scientific theory of grammar is conventionally taken to mark the end of the period of Vedic Sanskrit, by definition introducing Classical Sanskrit.
(Source: http://www.wikipedia.com/)




The internal structure of words

Examples:

1.) Bus-driver

bus-driver
    bus
    driver
        drive
        -er


2.) Data-base

data-base
    data
    base


3.) Newspaperman

newspaperman
    news
    paper
    man


4.) Newsreader

newsreader
    news
    reader
        read
        -er


5.) Nevertheless

nevertheless
    never
    the
    less

Sunday, January 07, 2007

How to Make a Dictionary, Session 7


Types of Lexical Information: PRONUNCIATION

Dictionaries are written in metalanguage which is a language used to talk about language itself.
One item of metalanguage a dictionary contains is a broad or phonemic transcription of the terms listed in the lexicon database. Phonemic transcriptions provide information about the correct pronunciation of a word; they are written between slashes.
For example:
eddy → spelling: visual surface structure
/ˈedi/ → pronunciation: another type of surface structure

eddy vs. /ˈedi/
Orthography ↔ Spelling vs. Pronunciation ↔ Phonology

Whereas the written form of words is relatively stable, speech sounds are often shortened, reduced or left out in order to allow faster speaking.


Besides the phonemic transcription shown in the preceding example, there is a second possibility of representing speech sounds in dictionaries, namely a phonetic transcription. The two correspond to two disciplines:

Phonology (phonemics) is the study of phonemes, i.e. abstract speech sounds which serve as symbols, whereas phonetics is the study of phones, meaning concrete speech sounds in concrete utterances.
A PHONEME can be defined as the smallest word-distinguishing sign of oral speech.

Each phoneme has got its own INTERNAL STRUCTURE, meaning that different phonemes have got distinctive features concerning their place of articulation, their manner of articulation and their voicing (voiced versus voiceless).

A PHONEME has also got an EXTERNAL STRUCTURE, because it is related to other phonemes with which it forms larger syllables and words. In Phonology, a vowel normally forms the nucleus of a syllable, whereas consonants can be found at the margins of syllables.


Phonemes are subject to certain rendering rules such as:
Pronunciation rules (acoustic modality)
Spelling (visual modality)
Sound-spelling rules (inter-modality conversion)


Representation of sounds in dictionaries

Sounds are represented by phonemic symbols and written in IPA. They have got an internal structure (configurations of distinctive features) and an external structure (syllables). Larger combinations of phonemes are called syllables. Syllables also have got an internal and an external structure. Their internal structure can be defined as "configurations of sequential features" (consonantal, vocalic; voiced, unvoiced etc.) and simultaneous features (f.ex.: tone, accent). Their external structure is the combination of syllables into words.
The basic English syllable structure is: CCCVVCCC, having vowels as its nucleus and consonants at its margins. Although affricates consist of two phonetic parts (a plosive and a fricative), they only count as one phoneme.
Syllable structures can be illustrated in a kind of map that is called a transition network or state diagram. When transcribing words, each phoneme can be integrated into this network, whose positions are represented by circles (nodes or states). The position within the diagram describes the position of the phoneme within the syllable/ word. (The phonemes themselves are classified separately, f. ex. consonants by place of articulation, i.e. the position of the tongue or of the vocal tract obstruction in general, and vowels by the position of the tongue measured in frontness or backness.)
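A transition network of this kind can be written as a table of states and transitions. The sketch below accepts consonant/vowel skeletons that fit the pattern CCCVVCCC (up to three consonants, one or two vowels, up to three consonants). The state names are my own invention, and real English syllables are subject to many more restrictions than this network checks.

```python
# state -> {input symbol -> next state}; "C" = consonant, "V" = vowel
TRANSITIONS = {
    "start":  {"C": "onset1", "V": "nucl1"},
    "onset1": {"C": "onset2", "V": "nucl1"},
    "onset2": {"C": "onset3", "V": "nucl1"},
    "onset3": {"V": "nucl1"},
    "nucl1":  {"V": "nucl2", "C": "coda1"},
    "nucl2":  {"C": "coda1"},
    "coda1":  {"C": "coda2"},
    "coda2":  {"C": "coda3"},
    "coda3":  {},
}
# A syllable may end as soon as it has a nucleus.
FINAL_STATES = {"nucl1", "nucl2", "coda1", "coda2", "coda3"}

def accepts(skeleton):
    """Walk through the network; reject if a transition is missing."""
    state = "start"
    for symbol in skeleton:
        state = TRANSITIONS[state].get(symbol)
        if state is None:
            return False
    return state in FINAL_STATES

strengths_ok = accepts("CCCVCCC")   # s t r ɛ ŋ θ s
too_many_onset = accepts("CCCCV")
```

Each circle of the diagram corresponds to one key of the table, and each arrow to one entry in the inner dictionaries.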


Trying to define the term "PHONEME"

There are several ways of defining phonemes, depending on which of the four sign components the linguist focuses on:

THE CONTRASTIVE FUNCTION OF PHONEMES: In this sense, a phoneme can be defined as the smallest word-distinguishing sound segment

THE EXTERNAL SOUND STRUCTURE: A phoneme is the smallest unit of a syllable

THE INTERNAL SOUND STRUCTURE: A phoneme incorporates distinctive features

THE RENDERING OF PHONEMES: Phonemes provide a set of allophones
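The contrastive function can be tested with minimal pairs, i.e. pairs of words that differ in exactly one sound. Here is a small sketch that searches a word list for such pairs; the tiny lexicon with its simplified transcriptions is invented for illustration.

```python
from itertools import combinations

# Invented mini lexicon: spelling -> phoneme sequence (simplified transcription)
LEXICON = {
    "pin":  ("p", "ɪ", "n"),
    "bin":  ("b", "ɪ", "n"),
    "pen":  ("p", "e", "n"),
    "thin": ("θ", "ɪ", "n"),
}

def minimal_pairs(lexicon):
    """Return (word1, word2, (sound1, sound2)) for words differing in one segment."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue
        differences = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(differences) == 1:
            pairs.append((w1, w2, differences[0]))
    return pairs

pairs = minimal_pairs(LEXICON)
```

Every pair that is found is evidence that the two differing sounds are separate phonemes, because exchanging them changes the word.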



Description of sounds

As we already said, transcriptions can be phonetic or phonemic. In dictionaries and lexicons word transcriptions are nearly always phonemic because they give a broad representation of how the word is normally pronounced in the standard language.
Nevertheless, if the linguist aims at representing actual speech sounds in detail, he can also use a phonetic transcription. For this he draws on his knowledge of articulatory phonetics, a branch of phonetics that deals with the production of speech sounds. There are also two other dimensions to the description of speech in phonetics. Acoustic phonetics is about how speech is transmitted from the mouth to the ear by sound waves that travel through the air, described in terms of time, amplitude and frequency. Auditory phonetics is about how speech sounds are perceived and transformed in the ear: from sound waves in the outer ear to mechanical movements in the middle ear (transmitted by hammer, anvil and stirrup) and, after passing through the oval window to the cochlea in the inner ear, to neural signals.



English and German

English and German differ in their pronunciation and spelling rules. There are some phonemes in German that do not exist in English and vice versa.

Some examples of German VOWELS that do not occur in standard English:
The rounded close-mid back vowel: [o]
The rounded open-mid front vowel: [œ]
The rounded close-mid front vowel: [ø]
The rounded close/close-mid front vowel: [ʏ]

Example of an English VOWEL that does not occur in standard German:
The unrounded open-mid/ open front vowel: [æ]


Some examples of German CONSONANTS that do not occur in standard English:
The palatal fricative [ç]
The velar fricative [x]

Some examples of English CONSONANTS that do not occur in standard German:
The voiced dental fricative: [ð]
The unvoiced dental fricative: [θ]



Spelling

Although the Latin alphabet used in many languages all over the world was originally meant to be phonographic, our spelling often does not reflect how words are really pronounced.
To express the phoneme /ʃ/ for instance, German normally uses the letter combination sch, whereas English uses sh.
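Such sound-spelling rules can be sketched as small lookup tables. The following example contains only the two spellings of /ʃ/ mentioned above; all other letters are simply passed through unchanged, which is a strong simplification and not a claim about real pronunciation. The names are my own.

```python
# Deliberately tiny rule sets: only the spellings of /ʃ/ mentioned above.
GERMAN_RULES = {"sch": "ʃ"}
ENGLISH_RULES = {"sh": "ʃ"}

def spell_to_sound(word, rules):
    """Convert a spelling into sound symbols, trying the longest grapheme first."""
    graphemes = sorted(rules, key=len, reverse=True)
    result, i = [], 0
    while i < len(word):
        for grapheme in graphemes:
            if word.startswith(grapheme, i):
                result.append(rules[grapheme])
                i += len(grapheme)
                break
        else:
            # no rule matched at this position: copy the letter unchanged
            result.append(word[i])
            i += 1
    return result

german = spell_to_sound("fisch", GERMAN_RULES)
english = spell_to_sound("fish", ENGLISH_RULES)
```

The two different spellings end up as the same sequence of sounds, which is exactly the point of an inter-modality conversion rule.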

There are even some German letters that do not exist in standard English orthography, like the German "Umlaute":
ö, ü and ä and the German "Scharfes S": ß

Thursday, January 04, 2007

How to Make a Dictionary, Session 6, Tuesday 2006-11-21


Lexicon data and their structure


In the last lessons we learned about the different lexical database structures:
-Microstructure (Single lexical articles/ entries)
-Mesostructure (Interrelation of lexicon entries and relation to external information)
-Macrostructure (The order of all lexical entries)
-Megastructure (The whole dictionary/lexicon and its metadata)

They can be defined as different structuring elements within a dictionary:

MICROSTRUCTURE: is the smallest part of a dictionary and deals with lexical entries.
The lexicon microstructure contains information on lexical entries (single words) such as grammatical information (syntax, part of speech (POS), inflectional class, valence of verbs etc.), information on spelling and pronunciation, information on the representation of meaning (semantics, definitions) and corpus references (usage examples and words in context).
These types of information can be defined as DatCats (data categories).

MESOSTRUCTURE: deals most of all with cross-references and relations between lexical entries and their definitions. Single entries may not only be linked to the rest of the dictionary (interrelation of lexicon entries) but also to external information (external references).

MACROSTRUCTURE: contains the organisation and order of the content and body of a dictionary, i.e. the list of lexical entries.

MEGASTRUCTURE: is the overall structure of the dictionary. It contains the metadata and can be defined as the complete document.
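The four structures can be mirrored in a small data model. This is only a sketch under my own assumptions: the field names, the sample entries and the choice of alphabetical order as macrostructure are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """Microstructure: the data categories stored for one lexical entry."""
    headword: str
    pos: str
    pronunciation: str
    definitions: list
    examples: list = field(default_factory=list)
    see_also: list = field(default_factory=list)  # mesostructure: cross-references

@dataclass
class Dictionary:
    """Megastructure: metadata plus the complete body of entries."""
    metadata: dict
    entries: dict = field(default_factory=dict)

    def add(self, entry):
        self.entries[entry.headword] = entry

    def macrostructure(self):
        """Macrostructure: here simply the alphabetical order of the headwords."""
        return sorted(self.entries)

    def dangling_references(self):
        """Cross-references pointing to headwords that are not in the dictionary."""
        return [(entry.headword, ref)
                for entry in self.entries.values()
                for ref in entry.see_also
                if ref not in self.entries]

lexicon = Dictionary(metadata={"title": "Sample lexicon", "language": "English"})
lexicon.add(Entry("whirlpool", "n", "/ˈwɜːlpuːl/", ["rapidly rotating body of water"]))
lexicon.add(Entry("eddy", "n", "/ˈedi/", ["small whirlpool"],
                  see_also=["whirlpool", "vortex"]))
```

Checking for dangling references is one simple way of testing the mesostructure: every cross-reference should lead to an existing entry.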



*Detour: information on the CORPUS

The term CORPUS can be defined as a collection of language material. It can consist of written texts, as found in newspapers or books for instance, or of transcriptions of oral speech, which are often written in IPA.
Moreover, it may contain additional information on the part of speech of each word, on its lemma (the base form under which the inflected variants of a word, f.ex. the form with the -ed suffix expressing the past, are grouped), as well as phonetic transcriptions or other kinds of annotations.


Problematic issues in lexicography

A lexicographer meets various problems while working with lexical databases:

1.) The problem of the ambiguity of terms:
-some words have got synonyms, two different word forms that have the same meaning
-there is also polysemy, meaning that one word form has got two or more different meanings

2.) The problem of finding and searching for words:
-how to find related words in a language with inflectional prefixes?
-how to cope with orthographic ambiguity?
-how to structure picture lexicons?

3.) The problem of constant language change
-how to integrate "new" words and/ or new word meanings?

How the lexicographer solves some of these problems:
-The problem of ambiguity and polysemy can be solved by enumeration and linkage within the entire dictionary
-In order to cope with constant language change, the dictionary has to be reprinted regularly. One needs new editions.



Different methods of creating lexicons

There are three main methods to the creation of dictionaries and lexicons:
-Introspection based lexicon creation
-Questionnaire based lexicon creation
-Corpus based lexicon creation

1.) Introspection based lexicon creation
-a trained linguist takes a look inside the language and reflects on his own language use
-he considers the fact that language acts as a social filter. Utterances have to be relevant, important and adequate

2.) Questionnaire based lexicon creation
Questionnaires are mainly used in comparative linguistics.
They are very useful in order to explore unknown languages, meaning languages the linguist does not speak/ know himself. In order to explore the unknown language in detail, the linguist asks different native speakers to explain certain language features and phenomena and to complete questionnaires.
They are asked for translations and explanations. In this context the fact that language acts as a social filter should be considered: the interviewed native speakers should come from various social classes within the society.
Questionnaires are used to do research in morphology, to collect translations, and to create language software programs/ computer systems that can be used by other experts, linguists and translators.

___________________________________________________________________________
*Detour: Detailed information on the use of questionnaires (from: http://www.spectrum.uni-bielefeld.de/~ttrippel/htmd/questionnaire_short.html):

An example Questionnaire used for the exploration of Australian Languages:

Questionnaire on Motion in Australian Languages (modified)
David Wilkins, David Nash and Jane Simpson (used with permission)
April 1998


Introduction
The purpose of this questionnaire is to gain a first comparative picture of the lexical resources Australian languages draw on for the expression of motion, and the manner in which motion descriptions are "packaged". In the nature of our design, and our discussion, we rely heavily on Talmy's (1985) notion of lexicalization patterns, in particular his cross-linguistic discussion of systems of motion description. We are interested, for instance, in patterns of semantic conflation (that is, what other semantic information besides 'motion' may be encoded in a verb root) and patterns of semantic distribution (that is, what types of information are encoded in the different morphemes that come together to build a description of a motion event).
We will assume a "pretheoretical" understanding of what constitutes a motion event and a motion description. In this questionnaire, the primary focus is on "translocational motion" (i.e. change of location of an entity along a path from one place to another). We further restrict our focus to motion descriptions in which the Subject argument of a verb (in an active clause) is the entity ('figure') in motion (an accompanying entity may also be in motion, but that is not our focus of interest). In narrowing our focus in this way, we depart from Talmy's own manner of investigation, since he was also interested in patterns of location, causative location and causative motion.

The questionnaire
This questionnaire is designed in a "modular fashion". There are four independent modules, and we would be glad to receive answers to any of the "modules". A researcher should not feel that they need to answer the whole questionnaire if that seems too daunting. Where you do not know the answer to a question, please say so (rather than leaving a part of a module blank). The ordering of modules reflects our own sense of which types of information are more important to enable us to do some cross-language comparison.
Name of Researcher:
Name of Language:
Primary Place of Research:
Primary Data Resources:
May we distribute your filled in questionnaire?: YES NO
How many inflecting, unanalysable, mono-morphemic verb roots does the language possess: (tick one of the following)
LESS THAN 50_________ 50 to 200 __________MORE THAN 200________
Can you give us a more precise figure? (If so, what source(s) is the figure based on?):

MODULE I : Motion Verbs and Patterns of Motion Expression
Below we present 26 English motion verbs or descriptions. We would like you to provide any (and all) expressional equivalents for the language under discussion. We are not only interested in mono-morphemic verb roots, we are also interested in more complex expressions. For instance, in Arrernte, there is no monomorphemic root for 'to fly'. However, Arrernte speakers do commonly talk about the motion of birds, airplanes and insects by combining a general motion verb and the locative phrase alkere-le (sky-LOC) 'in the sky' in the same clause - e.g. alkere-le alhe-me ('in sky going') = 'flying'; alkere-le unthe-me ('in sky wandering') = 'flying around'; alkere-le apetye-me ('in sky coming') = 'flying this way', and so on.
(N.B. While it would be nice to know translation equivalents, it is more important for us to know what expressions people actually use, no matter how infrequently.)
We do not assume that the following will provide a one-to-one list of equivalents. In some cases the same verb or expression may cover several notions we have distinguished on the list, and in other cases the distinctions won't be fine-grained enough and you'll need to provide several equivalents, detailing the distinctions. We simply ask you to give us as much detail as is feasible.
Please include the following information in any response:
- the transitivity of the verb in the expression (in relation to the meaning expressed)
- a morphemic break down and gloss of each morpheme in all complex expressions
- where relevant, an indication of any animacy or category constraints which apply to the moving entity in the expression (e.g. does the moving entity have to be a liquid?)

The List
a. "to go"
b. "to come"
c. "to return" ("to go back")
d. "to take to" ("take along"; "carry")
e. "to bring"
f. "to move" (from one place to another e.g. they shifted into the shade; they moved camp)
g. "to leave behind" ("to abandon"; "to leave something somewhere and go off")
h. "to move" (with no overall change of location; move on the spot or about a fixed point e.g. the bush is moving, his eyes/hair moved)
i. "to move quickly" ("hurry away"; "hurry off")
j. "to walk"
k. "to run"
l. "to crawl (of baby)"
m. "to fly (of bird)"
n. "to hover" ("to flutter" - e.g. of hawk; butterfly)
o. "to swim" (of fish? of person?)
p. "to roll" (e.g. of ball or boulder or tumbleweed)
q. "to creep up on" ("to sneak along"; "sneak up on")
r. "to follow someone/something"
s. "to track someone/something"
t. "ascend" ("get up on to"; "to climb up")
u. "to descend" ("get down off/out of")
v. "to fall" (down from a height) (does this contrast with "to fall over"?; "collapse"?)
w. "emerge" ("exit"; "appear"; "come out"; "rise (of sun)")
x. "to enter" ("to go into" (e.g. a house, a camp))
y. "to cross over" ("go across")
z. "to pass by"

MODULE II : Motion-Rich 'Textlet' or Text Fragment
So that one can get a feel about how motion description really works in the language, could you please provide a piece of natural continuous text which is rich in motion expression, and which you feel is representative. All that is needed is a small text or text fragment of between 5 and 20 clauses in length, in which the focus is the motion of one or more of the "protagonists". Of course, we need you to provide morphemic breaks, interlinear glosses, and a free translation. It would also be useful if you could provide notes, as you go along, to any specific motion related features that the 'outsider' should attend to. (An example will be provided. - Note that, we'd prefer it if you did not rely on a translation from English, but instead used a small text that was generated directly from the mind and mouth (or pen) of a native speaker.)

MODULE III: Grammatical Marking of Ground and Path
In Talmy's (1985:61) terms the basic components of a motion event are:
Figure = the entity that is in motion
Ground = the entity or entities that the Figure is moving in relation to
Path = the course followed (and trajectory) of the Figure (often deduced from the Ground which is specified)
Motion = the actual predication of a motion act.
So, in the sentence 'the baby crawled up the hill', the Figure is 'the baby', the Ground is 'the hill', the Path is specified with 'up', and the assertion of Motion is encoded in the verb 'crawl' .
This module of the questionnaire is particularly concerned with the way in which Grounds and Paths (including direction) may be grammatically coded. We would appreciate it if you used some of the expressions from the list in Module I of this questionnaire in glossed example sentences to illustrate the types of marking asked about below.

A. Marking of grounds
a) How are "goals" of motion marked? (i.e. what cases, adpositions, or other means are used to mark ground NPs functioning as "goals" of motion?) (e.g. The child crawled to(wards) the tree.; They returned to camp; The lizard got up onto the rock.;)
b) Can one make a distinction between 'to X' and 'towards X'? For all motion verbs? How? (e.g. The leaf fell towards the ground. vs. The leaf fell to the ground.)
c) How are "sources" of motion marked? (e.g. The woman moved away from the fire.; They travelled from Sydney.; The baby bird fell out of the tree.; The dog fell off of the truck.)
d) How are ground NPs which refer to the route or path along/on which motion takes place marked? (e.g. He's walking along the track.; The horse wandered along the sides of the fence.)
e) How are ground NPs which refer to the medium in which motion takes place marked? (e.g. The bird is flying through the air.; The children are running through the sand?)
f) How does one mark a ground NP which refers to a place through (or via) which the figure travels in order to get to another place? (e.g. They travelled from Alice Springs to Elliott via Tennant Creek ; She came through here on her way to church.)
g) With expressions like "enter" (or "go into") and "exit" (or "come out of"), how are the ground NPs which refer to the space "entered" and "exited" marked? (e.g. The snake entered its burrow.; The owl came out from the hollow of the tree.)
h) With expressions of "crossing" and "passing" how are grounds indicating the entity 'crossed' and 'passed' marked? (e.g. Those people ran past our house; A dingo crossed the road.)
i) Languages like English can string several Grounds together with one motion verb (e.g. The dog carried the meat from the creek along the path to the tree.). Other languages have strong restrictions, preferring one Ground per motion verb. Do you have a sense of how many grounds can occur naturally with a motion verb? Is it possible (natural) to say things like:
- He went from the tree to the rock.
- He went into the house through the rear door.
- He came along the road towards our car.
- The dog carried the meat from the creek along the path to the tree.
j) If you use adpositions or case endings to express these ideas, can they occur independently as the main predicate in a sentence as in? (If they are possible, what do they mean? Can they have motion readings or only static spatial readings?)
- The dog (is) from the tree
- The dog (is) to the tree
- The dog (is) along the road
- The dog (is) into the house
- The rabbit (is) out of its burrow

B. Path Direction
Is there any form of directionals (i.e. grammaticised directional elements like Warlpiri -rni 'hither, to here', -rra 'thither, to there', -mpa 'past, by, across')? If so, what part of speech class do they attach to, or co-occur with? If they combine with verbs, are they restricted to motion verbs or can they, for instance, occur with perception verbs or speech act verbs (or all verbs)?
Does the language have anything akin to the 'associated motion' category discussed by Koch (1984); Tunbridge (1988); and Wilkins (1989, 1991)? If a language has anything like this, it is usually some form of verb affix, verb compounding or fixed construction, and the most commonly coded notions tend to be 'do verb action while going along' ('she cried all along the way') or 'go/come and do verb action' ('she came and told me'; 'she went and hit him'). Please describe any phenomena that seem to be relevant. (In a language like Adnyamathanha (Tunbridge 1988), where this category is very elaborate, you find the following verb affixes: -mana- 'come and V', -namana- 'quickly come and V', -vara- 'go and V', -navara- 'quickly go and V', -ndhena- 'V once while coming'; -nali- 'V continuously while coming';-ndheli- 'V once while going', -nangga- 'V all the way along', -enhi- 'V while keeping moving'; and -wandha- 'V and leave'. In origin such suffixes (or compounding elements) are very often general motion verbs)
MODULE IV: What Element of the Clause Encodes Path?:
The verb-framed vs. satellite-framed typology
Talmy (1985) observed that, in motion descriptions, a language like English differs typologically from a language like Spanish, by virtue of the fact that Spanish tends to conflate 'motion' and 'path' together in the verb root, while English tends to code path in a separate (adverbial/prepositional) element which functions as a satellite to the verb. He judges patterns of expression to be characteristic for a language if they are (i) colloquial in style (rather than formal or stilted), (ii) frequent; and (iii) pervasive (rather than limited) in application. Thus, in English, the characteristic mode of expression is to say "go up", "go down", "go in", "go out" and so on, while it is less characteristic to say "ascend", "descend", "enter", "exit", and so on. The former pattern exemplifies "satellite-framing" (i.e. 'go' provides the motion concept, while 'up', 'down', 'in', 'out' realizes the path). For languages like Spanish, verbs like "enter" and "ascend" are the characteristic mode of expression, and the verb roots can be seen to simultaneously code "motion" and "path" (i.e. "verb-framing"). (Note: Satellites to the verb-root may be affixes on the motion verb root; or clitics; or path adverbs; or particles; or preverbs)
Please try to assess whether the language you are working on is verb-framed or satellite framed (or somewhere in between or something else), by answering the following 'diagnostic' questions:
Are verb roots meaning 'enter', 'exit', 'descend', 'climb up' a more characteristic form of expression, in Talmy's terms, than more analytic counterparts such as "go into", "go out of", "go down", "go up"?
How common is it for verbs in the language to conflate both 'motion' and 'manner' (that is, are there a rich class of verb roots like 'run', 'swim', 'slither', 'hop', 'limp', 'crawl', 'stroll', etc.)? According to Talmy, if a language characteristically conflates 'motion' with 'manner' in verb roots, it is NOT common for the same language to also characteristically conflate 'motion' with 'path'.
When both manner and path notions appear in a motion description, how does information get distributed among elements? To answer this question we list sentences below which try to elicit some of the relevant distinctions. Again, don't go for word-for-word translations. Give us what you think would be the normal ("characteristic") way of expressing the idea (or something close to it). And, please include the following information:
- the transitivity of the verb in question in relation to the meaning expressed (including the expected case on the subject of the sentence)
- an interlinear morpheme-by-morpheme gloss
i) The child ran to the other side of the street/path/creek.
ii) The child ran across the street.
iii) The baby crawled into the house/shed/camp. (Where the "into" path is to be stressed, is the form of expression done more like: "crawlingly enter" or "crawl into" or "crawl to the inside of"?)
iv) The baby crawled up the rock (Can one distinguish "crawl to the top of the rock" and "ascend the rock by crawling"?).
v) The snake slithered into the string bag.
vi) The boy fell to the ground. (while standing on the ground? vs from out of a tree?)
vii) The rock/boy fell down into the water. (where entry into the water is stressed)
viii) The girl climbed up onto the branch of the tree.
Can one "accumulate" path notions with just one verb? In English, one is not only able to string a number of different Grounds together, one can also accumulate a string of simple Path-satellites. As an example, Slobin (1996:83) notes that it is quite normal for English speakers to say things like "The bird flew down from out of the hole in the tree" (where down-from-out-of specifies the trajectory). In this English sentence, there is only one specified ground ('the hole in the tree'), but a complex of three units of Path information ('down', 'from', and 'out of'). The closest Spanish approximation would be "El pájaro salió del agujero del árbol volando hacia abajo" which translates literally as 'The bird exited of the hole of the tree flying towards below'. Thus, in contrast to English, Spanish, like other verb-framed languages, tends to render complex Path information through multiple clauses, since they do not allow for the accumulation of path expressions. So, what about the language under investigation?

OTHER INFORMATION
Please provide any other information on the language that you feel is relevant to this research endeavour. In particular, if there are publications or sections of publications concerning the language which deal directly with motion description, we would be grateful if you brought this to our attention (and we will collate and share all such references).
THANKS FOR ALL YOUR HELP

References cited in questionnaire:
Koch, Harold. 1984.
'The Category of "Associated Motion" in Kaytej', Language in Central Australia, 1 23-34
Slobin, Dan. 1996.
'From "thought and language" to "thinking for speaking"' in Gumperz and Levinson eds. Rethinking Linguistic Relativity. CUP. 70-96
Talmy, Leonard. 1985.
'Lexicalization patterns: semantic structure in lexical forms'. in Shopen ed. Language Typology and Syntactic Description III: Grammatical categories and the lexicon. CUP. 57-149
Tunbridge, Dorothy. 1988.
'Affixes of Motion and Direction in Adnyamathanha' in Austin ed. Complex Sentence Constructions in Australian Languages. John Benjamins. 267-283
Wilkins, David P. 1989.
Mparntwe Arrernte (Aranda): Studies in the structure and semantics of grammar. Unpublished PhD dissertation. A.N.U.
Wilkins, David P. 1991.
'The Semantics, Pragmatics and Diachronic Development of "Associated Motion" in Mparntwe Arrernte'. Buffalo Papers in Linguistics, 207-257.
___________________________________________________________________________

3.) Corpus based lexicon creation
Corpus based lexicon creation mainly deals with the form of words and the contexts in which they are found.
It is based on various corpora, mainly on words in context (texts), which show the concordance of words very well, but also on wordlists and distribution analysis.
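
To make the terms "wordlist" and "concordance" more concrete, here is a minimal sketch of how both could be derived from a small text. The toy corpus, the function names and the context width are assumptions chosen for this illustration; real corpus tools are far more elaborate.

```python
import re
from collections import Counter

# Toy corpus; the sentences are invented for illustration only.
corpus = (
    "The dog ran to the tree. The bird flew out of the tree. "
    "A dog barked at the bird."
)


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def wordlist(tokens):
    """Return (word, frequency) pairs, most frequent first, ties alphabetical."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def concordance(tokens, keyword, width=2):
    """Return keyword-in-context lines with `width` words on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


tokens = tokenize(corpus)
print(wordlist(tokens)[:3])
for line in concordance(tokens, "tree"):
    print(line)
```

The wordlist shows which forms occur and how often, while the concordance lines show each occurrence of a word together with its neighbours, which is the "words in context" view described above.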


Hierarchy of lexicon and corpus types
The Complexity of Lexicography

LEXICON

4th ORDER LEXICON (abstract lexicon):
maximally declarative generalisation network

3rd ORDER LEXICON (optimised lexicon):
procedurally optimised local generalisations

2nd ORDER LEXICON (protolexicon):
flat tabular lexicon

1st ORDER LEXICON (corpus lexicon):
wordlist, concordance, HMM


CORPUS
Tertiary corpus:
classificatory markup annotation

Secondary corpus:
transcription, symbol-signal labelling annotation

Primary corpus:
recorded audio-visual corpus; manuscript


The corpus based lexicon creation application:
The Summer Institute of Linguistics is famous for its fieldwork tools and for the creation of language databases. Before high tech computer programs were available, these databases were called "shoebox", because fieldwork data were collected on cards and arranged in ordinary boxes. Later, data such as base texts, morphology, phonology, syntax, grammar, part of speech, tense and aspect, valence of verbs and translations were collected in online databases. These lexicon databases consist of lists and tables.
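
A lexicon database made of "lists and tables" can be pictured as a flat table with one row per card. The sketch below is only an illustration under that assumption: the column names and the English-German rows are invented here and do not reproduce the format of any actual fieldwork tool.

```python
import csv
import io

# Hypothetical lexicon table: field names and rows are illustrative assumptions.
LEXICON_CSV = """headword,pos,translation_de
dog,noun,Hund
run,verb,laufen
tree,noun,Baum
walk,verb,gehen
"""


def load_lexicon(text):
    """Read the table into a list of row dictionaries (one per 'card')."""
    return list(csv.DictReader(io.StringIO(text)))


def lookup(lexicon, headword):
    """Return the first row whose headword matches, or None."""
    for row in lexicon:
        if row["headword"] == headword:
            return row
    return None


def by_pos(lexicon, pos):
    """Return all headwords with the given part of speech."""
    return [row["headword"] for row in lexicon if row["pos"] == pos]


lexicon = load_lexicon(LEXICON_CSV)
print(lookup(lexicon, "tree"))
print(by_pos(lexicon, "verb"))
```

Because every row has the same fields, such a table can be searched by headword or filtered by part of speech, which roughly corresponds to the "flat tabular lexicon" level in the hierarchy above.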