Developing a web-based dictionary database

paper
Authorship
  1. Jonathan J. Webster

    City University of Hong Kong

  2. Martin S.P. Chiu

    City University of Hong Kong

Work text

Jonathan J. Webster
City University of Hong Kong
ctjjw@cityu.edu.hk
Martin S.P. Chiu
City University of Hong Kong
ctmartin@cityu.edu.hk
Keywords: WWW, OODB (object-oriented database), lexicography

Introduction
City University recently established the CityU-B.A.(Hons) Language Information Science (BALIS) Website Management Programme to give our students hands-on experience in setting up and managing a web server dedicated to publishing information on the Chinese language and topics in Chinese linguistics, a major research emphasis of the Department of Chinese, Translation and Linguistics at City University. Through their participation in the programme, students will learn how to perform all the tasks related to web design and development, including HTML authoring and editing, internet programming, and server management and security. The following information will be made available from the website: (a) an on-line bilingual glossary of linguistics terminology (Chinese/English); (b) an archive of recorded speech samples from Chinese dialects; (c) an archive of papers on Chinese linguistics; (d) an on-line bilingual Chinese/English language dictionary.
The CityU-BALIS Website, to be publicly launched in 1997, will provide a point of access on the Web for those interested in finding out more about the Chinese language. Moreover, the expertise our students acquire through their active participation in managing and promoting the CityU-BALIS Website will better prepare them for careers requiring skills in website publishing, programming and management. To meet our objectives, both producing an innovative, informative and interactive website and better training our students, we must constantly experiment with the latest technological advances for communicating information over the Web. The on-line bilingual dictionary, the last item in the list above, illustrates our ongoing efforts to merge advances in language information science with the dynamic technology associated with the World Wide Web.

The on-line bilingual Chinese/English language dictionary will provide web access to lexical information for Chinese and English. Our goal is to provide a dictionary data retrieval tool for language learners and language professionals, backed by a database intended to give comprehensive coverage of Chinese and English words. Users will be able to (a) retrieve dictionary data for Chinese/English words and phrases as they would with a conventional dictionary; and (b) retrieve entries matching syntactic and/or semantic criteria that they supply. The application has three main components, or tiers: (1) the back-end database for serving lexical information; (2) a Web-based client interface for front-end query input; and (3) a middle tier for handling the exchange between the client's queries and the database.
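The two retrieval modes and the three tiers can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the project's implementation: the `Entry` fields, the sample entries, the query parameter names (`word`, `pos`, `domain`) and the in-memory list standing in for the database are all hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qs


@dataclass
class Entry:
    """One dictionary entry; the field names are illustrative assumptions."""
    headword: str
    translation: str
    pos: str                                   # syntactic category, e.g. "noun"
    domains: set = field(default_factory=set)  # semantic labels


# Tier 1: back-end store (an in-memory stand-in for the real database).
ENTRIES = [
    Entry("book", "書", "noun", {"publication"}),
    Entry("read", "讀", "verb", {"communication"}),
    Entry("letter", "信", "noun", {"communication"}),
]


def lookup(headword):
    """(a) Conventional lookup by headword."""
    return [e for e in ENTRIES if e.headword == headword]


def search(pos=None, domain=None):
    """(b) Retrieval by syntactic and/or semantic criteria."""
    return [
        e for e in ENTRIES
        if (pos is None or e.pos == pos)
        and (domain is None or domain in e.domains)
    ]


# Tier 3 (middle tier): turn the query string sent by the web client
# (tier 2) into a call on the back end and return plain results.
def handle_query(query_string):
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    if "word" in params:
        hits = lookup(params["word"])
    else:
        hits = search(params.get("pos"), params.get("domain"))
    return [(e.headword, e.translation) for e in hits]
```

Here `handle_query("word=book")` performs a conventional lookup, while `handle_query("pos=noun&domain=communication")` combines a syntactic and a semantic criterion and returns only the entry for "letter".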

Accessing dictionary data on the internet
How will our WWW Bilingual Chinese-English language dictionary database compare with other attempts at providing internet access to lexical/dictionary data? There are, for example, several 'Webster Servers' available on the internet. Examples (as reported by Steeve McCauley, http://www.eps.mcgill.ca/~steeve) include webster.cs.indiana.edu (restricted access), citi.umich.edu (open), webster.cs.mcgill.ca (restricted), and webster.eps.mcgill.ca (experimental). McCauley's 'Windows Webster Client' is an example of software which facilitates downloading dictionary data from such servers. These servers generally return the kind of data one would expect to find in an ordinary dictionary.
In a recent paper, Webster (1995) reports on the development of an application in which HTML forms serve as the front-end to a lexical database. Lexical information and data retrieval strategies are based on the Longman Language Activator (LLA). A Visual Basic CGI application connects a front-end HTML form with the back-end relational database implemented in MS Access. The LLA was chosen for its unique organization of dictionary information, which is intended to make it easier for the user to find the right word or phrase for a particular context. The LLA adopts three access strategies: first and foremost by concept using the Key Words, second by entry word, and third by what the LLA calls 'access maps'. The home page for the application in fact consists of several forms, the first and topmost containing the field into which the user enters a word or phrase to search for and a submit button for transmitting a URL request to the web server. The URL request invokes a VB CGI executable, which queries the Access database and returns the word list information for the search word. The design of the program was modelled after examples of VB/Access CGI programming provided by R. Denny, the designer of WinHTTPD and WinCGI. Basic CGI initializing operations are handled by Denny's CGI.BAS module. This application was experimental only: no attempt was made to enter all the information contained in the LLA. The primary objective was to demonstrate the potential of rendering a particular approach to lexical information retrieval in the form of a hypermedia presentation for easy web access. The application was implemented in Windows 95 using 32-bit VB 4.0 and Access 7. The web server was O'Reilly's 32-bit WebSite 1.0.
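The round trip from form field to database and back to the browser can be sketched roughly as follows. This is not the VB/Access code from Webster (1995): Python's standard-library `sqlite3` stands in for MS Access, a plain function stands in for the CGI executable, and the table name, column names, form field name (`word`) and sample rows are invented for illustration.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs


def build_db():
    """Create a tiny in-memory word-list table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (key_word TEXT, entry TEXT)")
    conn.executemany(
        "INSERT INTO words VALUES (?, ?)",
        [("walk", "stroll"), ("walk", "stride"), ("say", "mention")],
    )
    return conn


def handle_request(conn, query_string):
    """Play the role of the CGI program: parse form data, query, return HTML."""
    # The submitted form arrives as a URL-encoded query string.
    term = parse_qs(query_string).get("word", [""])[0]
    # A parameterised query keeps user input out of the SQL text.
    rows = conn.execute(
        "SELECT entry FROM words WHERE key_word = ? ORDER BY entry", (term,)
    ).fetchall()
    items = "".join(f"<li>{escape(entry)}</li>" for (entry,) in rows)
    return f"<h1>{escape(term)}</h1><ul>{items}</ul>"
```

Calling `handle_request(build_db(), "word=walk")` yields an HTML fragment listing the entries stored under the search word, which is the kind of word list the form-based application returned to the browser.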

Dictionary Database
As discussed in Webster and Ning (1996), if we are to duplicate the human lexicon for computational purposes, we must endeavour to duplicate the human approach to defining words, i.e. defining word meaning based on associations or relations with surrounding words and phrases. The human lexicon is best described as an inventory of words as objects. Corresponding to each word are natural classes of (a) other words or objects in thematic relations, e.g. Agent-Goal; (b) other words in taxonomic relations, whether superordinate or subordinate; and (c) other words in synonym/antonym relations. The dictionary database is being implemented as an object-oriented database using POET, which will be made accessible on the World Wide Web using remote ActiveX controls.
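The idea of words as objects linked by the three kinds of relation can be illustrated with a small sketch. It uses plain Python objects rather than POET, whose API is not described here; the class, attribute and method names and the sample words are all assumptions made for illustration.

```python
class Word:
    """A lexical object; attribute names are illustrative assumptions."""

    def __init__(self, form):
        self.form = form
        self.thematic = {}        # role name (e.g. "Agent") -> list of Word
        self.superordinates = []  # taxonomic parents
        self.subordinates = []    # taxonomic children
        self.synonyms = []
        self.antonyms = []

    def add_thematic(self, role, other):
        """(a) Record a word that fills a thematic role for this word."""
        self.thematic.setdefault(role, []).append(other)

    def add_subordinate(self, other):
        """(b) Link two words taxonomically, in both directions."""
        self.subordinates.append(other)
        other.superordinates.append(self)

    def add_synonym(self, other):
        """(c) Synonymy is symmetric, so store it on both objects."""
        self.synonyms.append(other)
        other.synonyms.append(self)

    def add_antonym(self, other):
        """(c) Antonymy is symmetric as well."""
        self.antonyms.append(other)
        other.antonyms.append(self)

    def ancestors(self):
        """All superordinates, nearest first, following taxonomic links upward."""
        seen, queue = [], list(self.superordinates)
        while queue:
            word = queue.pop(0)
            if word not in seen:
                seen.append(word)
                queue.extend(word.superordinates)
        return seen


# Hypothetical sample data.
animal, dog, puppy = Word("animal"), Word("dog"), Word("puppy")
animal.add_subordinate(dog)
dog.add_subordinate(puppy)

bark = Word("bark")
bark.add_thematic("Agent", dog)

hot, cold = Word("hot"), Word("cold")
hot.add_antonym(cold)
```

Because relations are held as direct object references, a query such as `puppy.ancestors()` walks from object to object instead of joining tables, which is the kind of navigation an object-oriented design makes natural.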
The primary and immediate objectives of this proposed research into developing a web-based dictionary database are (a) to provide access to the information contained in our database over the Internet, and (b) to demonstrate not only the feasibility but also the advantages of an object-oriented database design modelled after the human lexicon.

References
Jonathan Webster. 1995. "Web Access to a Lexical Database using VB/Access CGI Programming," in Proceedings of the 10th Pacific Asia Conference on Language Information and Computation, City University of Hong Kong, pp. 249-254.
Jonathan Webster and C.Y. Ning. 1996. "The CityU-B.A.(Hons) in Language Information Science Website Management Programme: equipping Hong Kong tertiary students with website management skills," in Collaboration via The Virtual Orient Express. Social Science Research Centre, The University of Hong Kong, pp. 379-388.
Jonathan Webster and C.Y. Ning. 1996. "WWW Bilingual Chinese-English Language Dictionary Database," in Euralex '96 Proceedings, pp. 189-196.


Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1997

Hosted at Queen's University

Kingston, Ontario, Canada

June 3, 1997 - June 7, 1997

Series: ACH/ALLC (9), ACH/ICCH (17), ALLC/EADH (24)

Organizers: ACH, ALLC
