Consistency Checking of Newly Added Index Terms using Inverted File and Heading Format in a CDS/ISIS Database

A library that maintains a database in ISIS should regularly check the consistency of index terms in newly added records in order to improve retrieval efficiency. This is normally done by printing all the terms in alphabetical or some other convenient order and checking them for spelling and other errors. Libraries use either a thesaurus or local authority lists to control the vocabulary, but spelling or entry errors can still be made during data entry. A printed list contains all the terms that occur in the newly added records. In special libraries the number of index terms assigned to a document record is normally quite high. If all the terms are printed, the number of terms to be checked is large, and terms with errors may be overlooked.

Several methods exist to check the list, or the terms straight from the database, before or after data entry. The list can also be checked using a word processor. The method explained below creates a list of only those terms that are new, misspelt, or wrongly entered. More than 60% of the terms assigned to the new records are likely to be available in the database already, where they have been checked and verified. Printing them again when they occur in the newly added records is a waste of time and also makes the list difficult to check. The proposed method uses the terms that already exist in the database as a filter, so that only new index terms and wrongly entered terms appear in the generated list.

The fields that usually require checking are keyword and author. They are normally entered as repeatable fields. Each term must be accessed individually to check whether it exists in the database. ISIS does not have a facility to access each occurrence in a repeatable field, except by using a Pascal programme or by manually entering each term as a search expression. This problem is solved using the heading format in ISIS. The keyword field is taken as an example below.

FDT (part)

Tag Number   Field Name           Length   Repeatable
005          Date of Entry        10
300          Primary keyword      200      R
301          Controlled keyword   200      R

005-Date of Entry

This field is used to group the newly added records for the convenience of processing. For records added before the 14th of May, the date of entry is 1995-05-01, and for those added after the 14th it is 1995-05-15. This field is inverted as 5 0 "0D-"v5 (it produces a term such as 0D-1995-05-01; the zero makes the term appear at the beginning of the dictionary list).
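The grouping rule can be illustrated with a minimal Python sketch. The function names `entry_date` and `date_term` are hypothetical, and the treatment of records added on the 14th itself (placed in the first group here) is an assumption, since the text does not specify it.

```python
from datetime import date

def entry_date(added_on: date) -> str:
    """Return the batch date used in field 005 for a record added on `added_on`.

    Assumption: the 14th itself is counted with the first half of the month.
    """
    day = 1 if added_on.day <= 14 else 15
    return added_on.replace(day=day).isoformat()

def date_term(entry: str) -> str:
    """Mimic the FST line 5 0 "0D-"v5: prefix the date so it sorts first."""
    return "0D-" + entry

print(date_term(entry_date(date(1995, 5, 9))))
print(date_term(entry_date(date(1995, 5, 20))))
```

Searching for one such term then retrieves the whole batch of records entered in that period.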

300-Primary Keywords

This field is used to enter keywords that are not available in the index tool used by the library. In other words, these keywords are local. The field is indexed as:

300 0 (|KW = |v300/) (produces a term KW = Satellite Communication)

301-Controlled Keywords

This field is used to enter keywords assigned using a thesaurus. It is indexed just like the previous keyword field as 301 0 (|KW = |v301/).
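As a rough illustration of what the three FST lines produce, the following Python sketch builds a small in-memory stand-in for the inverted file. The record layout, the function name `build_inverted_file`, and the sample records are hypothetical; terms are upper-cased only because the dictionary listing in the sample output shows them in capitals.

```python
from collections import defaultdict

def build_inverted_file(records):
    """Map each dictionary term to the sorted list of MFNs posted to it.

    `records` maps an MFN to a dict of field tag -> list of occurrences.
    """
    inverted = defaultdict(set)
    for mfn, fields in records.items():
        # 5 0 "0D-"v5 : one prefixed date term per record
        for entry in fields.get(5, []):
            inverted["0D-" + entry].add(mfn)
        # 300 0 (|KW = |v300/) and 301 0 (|KW = |v301/) :
        # one prefixed term per occurrence of the repeatable field
        for tag in (300, 301):
            for keyword in fields.get(tag, []):
                inverted[("KW = " + keyword).upper()].add(mfn)
    return {term: sorted(mfns) for term, mfns in inverted.items()}

# Hypothetical records, not taken from any real database.
records = {
    1: {5: ["1995-05-01"], 301: ["Communication Satellites"]},
    2: {5: ["1995-05-15"], 300: ["Adhesives"],
        301: ["Communication Satellites"]},
}
inverted = build_inverted_file(records)
for term in sorted(inverted):
    print(term, inverted[term])
```

Because both keyword fields share the prefix KW = , a keyword is found under the same dictionary term whichever of the two fields it was entered in.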

To generate the output, search for the current date of entry and save the records in a save file. Move to the print menu. Select the print worksheet, enter the save file name and provide the other parameters to print the output in single or multiple columns. Enter 'Y' to sort the output file. Move to the sort worksheet and enter the details as follows:

Number of Headings : 1
Heading Format : if s(mfn) = ref(l("KW = "v1), s(mfn)) then v1/&mf(v1) fi
Length of the First Sort Key : 50
Heading Processing Indicator : 1
FST for First Sort Key : 1 0 (v300/)+1 0 (v301/)

Exit the worksheet to generate the list of keywords in alphabetical order. The output will contain only keywords that are new or wrongly entered. Because the heading format calls the format exit mf, the list shows every MFN posted for each keyword that is new in this set. Without the format exit, the data entry operator would have to search for each term to find all the new records that contain it. The mfn command, used as in the compact index example in the ISIS manual, would print all the MFNs, including those of terms that are not printed in the list, which would make the list a mess. The format exit 'mf' is therefore used so that only the MFNs of newly added terms appear.

How does the Heading Format Work?

if s(mfn) = ref(l("KW = "v1), s(mfn)) then v1/&mf(v1) fi

The first part, s(mfn), returns as a string the MFN of the record to which the current sorted heading belongs. The second part, ref(l("KW = "v1), s(mfn)), takes the first heading (v1), which is a keyword from the new records, locates the corresponding term in the inverted file and returns the MFN of its first posting, also as a string. It is assumed that the inverted file has been updated after the new records were added. The format compares the two strings. If they are the same, the term comes from the new set and the keyword is printed (the lookup returns only the first MFN, even when the term has additional postings). If they differ, the keyword was first posted by a record in one of the previous sets; it already exists in the database and is not printed. The same steps can be used to generate a list of authors, or of any other field that requires a consistency check. Remember to change the field tag and format in the sort FST accordingly.
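The comparison can be mimicked outside ISIS with a short Python sketch. This is only an analogue of the heading format, not ISIS code: the inverted file is modelled as a dict of term to ascending MFN list, and the names `first_posting`, `new_keywords` and all the postings shown are hypothetical.

```python
def first_posting(inverted, term):
    """Return the first MFN posted for `term`, or None if it is absent."""
    postings = inverted.get(term)
    return postings[0] if postings else None

def new_keywords(new_records, inverted, prefix="KW = "):
    """Return, in alphabetical order, the keywords first posted by a new record.

    `new_records` maps an MFN to the list of keywords in that record.
    """
    # One (heading, mfn) pair per occurrence, sorted like the sort file.
    pairs = sorted((kw.upper(), mfn)
                   for mfn, kws in new_records.items() for kw in kws)
    result = []
    for keyword, mfn in pairs:
        # Same test as s(mfn) = ref(l("KW = "v1), s(mfn))
        if first_posting(inverted, prefix + keyword) == mfn:
            result.append(keyword)
    return result

# Hypothetical postings: records 1 and 2 are old, 3 and 4 are new.
inverted = {
    "KW = ADHESIVES": [3, 4],
    "KW = CHEMICAL REACTIONS": [3],
    "KW = COMMUNICATION SATELLITES": [1, 3, 4],
    "KW = SPACE COMMERCIALISATION": [2, 4],
}
new_records = {
    3: ["Adhesives", "Chemical Reactions", "Communication Satellites"],
    4: ["Adhesives", "Communication Satellites", "Space Commercialisation"],
}
print(new_keywords(new_records, inverted))
```

A keyword that occurs in several new records is listed once only, because the test succeeds only for the record that holds its first posting. A misspelt keyword is listed for the same reason as a genuinely new one: its first posting lies in the new set.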

Programme Listing

program mf (s1 : string; lw, occ : real; s2 : string) [FORMAT];
var urs, sep : string;
    un, f, urn : real;
{Program to put mfn numbers for a selected heading from the INV file}
begin
    {look up the heading, with its prefix, in the inverted file}
    f := find('KW = '|s1);
    if f = 0 then
    begin
        un := nxtpost;
        {append the MFN of each posting to s2, separated by commas}
        repeat
            urn := posting('MFN');
            urs := encint(urn, 0);
            if size(s2) > 0 then sep := ',' else sep := '';
            s2 := s2|sep|urs;
            un := nxtpost;
        until un < 0;
    end;
    urs := ''; urn := 0; sep := '';
end.
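For readers without access to ISIS Pascal, the loop in the listing can be approximated in Python as follows. This is a sketch under the assumption that the inverted file is represented as a dict of term to MFN list; the function name `mf` is reused only for readability, and the postings are hypothetical.

```python
def mf(inverted, heading, prefix="KW = "):
    """Return the MFNs posted for prefix + heading as a comma-separated string."""
    out = ""
    # Walk the postings of the term, as the find/nxtpost loop does.
    for mfn in inverted.get(prefix + heading, []):
        sep = "," if out else ""      # no separator before the first MFN
        out = out + sep + str(mfn)
    return out

# Hypothetical postings for illustration.
inverted = {"KW = ADHESIVES": [3, 4], "KW = CHEMICAL REACTIONS": [3]}
print(mf(inverted, "ADHESIVES"))
print(mf(inverted, "CHEMICAL REACTIONS"))
```

A heading that is not in the dictionary yields an empty string, which corresponds to the Pascal programme leaving s2 untouched when the term is not found.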

Sample Output

Full listing of dictionary terms of keywords in a database with 4 records :

    KW = ADHESIVES
    KW = CHEMICAL REACTIONS
    KW = COMMUNICATION SATELLITES
    KW = SPACE COMMERCIALISATION

Although the last two records contain all of the above index terms, the output generated is as follows:

    Adhesives
    3, 4
    Chemical Reactions
    3

- N. Narayanan Kutty & K. Mohana Kumar
VSSC Library, ISRO Post, Kerala, India - 695 022
Email : root%vssct@sirnetm.ernet.in