AIR FORCE HUMAN RESOURCES LABORATORY
FEASIBILITY OF COMPUTER PROCESSING OF TECHNICAL
INFORMATION ON THE DESIGN OF INSTRUCTIONAL SYSTEMS
By
F.L. Scheffler
F.J. DaPolito
R.L. McAdams
M.J. Gee
University of Dayton Research Institute
300 College Park Avenue
Dayton, Ohio 45469
ADVANCED SYSTEMS DIVISION
Wright-Patterson Air Force Base, Ohio 45433
January 1974
Final Report for the Period 1 July 1972 - 31 March 1973
Approved for public release; distribution unlimited.
AIR FORCE SYSTEMS COMMAND
BROOKS AIR FORCE BASE, TEXAS 78235
NOTICE
AFHRL-TR-73-40
4. TITLE (and Subtitle): Feasibility of Computer Processing of Technical
   Information on the Design of Instructional Systems
5. TYPE OF REPORT & PERIOD COVERED: Final Report, 1 July 1972 - 31 March 1973
6. PERFORMING ORG. REPORT NUMBER: UDRI-TR-73-23
7. AUTHOR(s): F. L. Scheffler, F. J. DaPolito, R. L. McAdams, and M. J. Gee
8. CONTRACT OR GRANT NUMBER(s): F33615-72-C-2091
9. PERFORMING ORGANIZATION NAME AND ADDRESS: University of Dayton Research
   Institute, 300 College Park Ave., Dayton, Ohio 45469
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: 62703F; 1710 03 33
11. CONTROLLING OFFICE NAME AND ADDRESS: HQ Air Force Human Resources
    Laboratory, Brooks Air Force Base, Texas 78235
12. REPORT DATE: January 1974
13. NUMBER OF PAGES: 114
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
    Advanced Systems Division of Air Force ..
15. SECURITY CLASS. (of this report)
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
    Document Retrieval Systems; Information Retrieval Systems; On-line
    Information Systems; Instructional Systems Design; Feasibility; Training
    Research; User Needs Study; Evaluation; Training Research Information;
    ISD Information Systems (cont'd)
20. ABSTRACT (Continue on reverse side if necessary and identify by block number)
UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
SUMMARY
PURPOSE
APPROACH
A study was made of the current and projected information needs of both
researchers and practitioners by conducting a series of structured interviews
with typical potential users. Questions were asked to elicit characteristics of
information need requirements. However, discussions in response to the
questions were permitted to range to include each individual's expressions of
information needs and his perception of a suitable information environment.
A survey was made of various available information storage and retrieval
systems which could meet the apparent requirements. Ten different systems
were studied from the standpoints of retrieval capability, input and updating
requirements, data manipulation, accessibility by the user, hardware require-
ments and availability, and costs.
A review was made of question answering interactive retrieval systems
based on artificial intelligence concepts. The purpose of this review was to
project the likely state of development of such systems within a three- to five-
year period with consideration of the need for consultative or prescriptive
guidance in instructional system design.
RESULTS
The users' needs study revealed that four types of data bases could be
considered: (1) a literature-derived data base, (2) a data base for interactive
prescriptive guidance, (3) a data base listing experts in various phases of
training and human performance, and (4) a data base consisting of a subject-
index catalog of already designed and existing instructional systems. The need
for cross-referencing between data bases was clearly seen as a desirable
feature.
The investigation of available information storage and retrieval systems
indicated that nearly all systems investigated were suitable for document re-
trieval. However, the techniques and software for an interactive prescriptive
or question-answering system are beyond the present state of the art. Of the
information retrieval systems surveyed, the Avionics Central system operated
by the Air Force Avionics Laboratory at Wright-Patterson AFB, Ohio, is con-
sidered the most suitable. This system is a natural language interactive
information retrieval system which provides for automatic indexing, file
maintenance, and rapid retrieval of abstracts and related items.
The results of investigating question-answering systems based on
artificial intelligence concepts show that recent theoretical breakthroughs
have brought the state of the art to the threshold of developing practicable
generalized semantic information systems. Nevertheless, further signifi-
cant advances are yet necessary before a validated and useful prescriptive
system can be developed. Because of rapid developments in this field, an
active current awareness should be maintained. The only type of prescriptive
guidance system which appears practical within the three- to five-year time
period is the "computerized handbook" concept or human-derived extractions
from available literature and experts' knowledge.
CONCLUSIONS
PREFACE
This is a final technical report and covers the work accomplished from
1 July 1972 through 31 March 1973.
The authors acknowledge the assistance and cooperation of other
organizations and personnel. In particular we are grateful to the Avionics
Laboratory for permitting us to establish a model data base with the Avionics
Central software system, to the Foreign Technology Division for converting
MT/ST data to computer-readable form, to the Office for Computing
Activities of the University of Dayton for reformatting data into the proper
input format for the model data base, and to Applied Science Associates
for cooperation in providing suitable abstracts.
Mr. Robert R. Roalef, Air Force Avionics Laboratory
Mr. Frank G. Jonr.ts, Air Force Avionics Laboratory
Mr. William R. Mace, Air Force Foreign Technology Division
Mr. Jack A. Pugh, University of Dayton
Mr. Sanford P. Schumacher, Applied Science Associates
Mrs. Jacqueline F. March, University of Dayton Research Institute
Miss Deborah M. Rekart, University of Dayton
TABLE OF CONTENTS
SECTION   TITLE
2.1       Methodology
3         INVESTIGATION OF INFORMATION STORAGE AND RETRIEVAL SYSTEMS
4         NATURAL LANGUAGE QUESTION-ANSWERING SYSTEMS
5.4       Costs
          REFERENCES
B-1       Introduction
B-1.1     Description
B-1.2     Term Definitions
B-2.1     Starting Up
B-2.2     Communicating
B-2.3     Requesting a Search
B-2.4     Modifying a Search
B-2.5     Displaying Search Results
SECTION 1
INTRODUCTION
The project was conducted in four phases to accomplish the purpose
of the feasibility study. These phases are as follows:
1. Study of users' needs
2. Survey of ten information systems
3. Review of the state of the art of question-answering systems
4. Design, establishment, and testing of a model data base
Each of these phases is treated in a separate section in the balance
of the report.
SECTION 2
2.1 METHODOLOGY
important areas of interest. A real need exists for reliable criteria for
measuring and evaluating training effectiveness.
Practitioners are interested in the actual development of training
courses for specific training situations. They are less concerned with the
reasons for the effectiveness of training techniques; rather they need to know
techniques which will work for their particular training situation. The re-
liability of training techniques is of concern to the practitioner. The
practitioner needs to be assured that the procedures he uses in designing a
training course will be able to develop effectively the skills needed by the
trainees.
2.3.2 Current Sources of Information
Some interesting observations were made regarding current sources
of information. AFHRL personnel expressed the opinion that numerous de-
ficiencies exist in current literature sources. One severe problem is the
lack of standardization and constancy of terminology. Often new phrases or
terms are created to refer to already existing concepts described by other
terminology. Another serious problem is that author-generated or informa-
tion center-generated abstracts of technical literature are often inadequate.
Thus, to screen articles or reports of possible interest, the researcher can-
not depend on the abstracts alone, but he must usually read a major portion
of the document, often, only then, to discover that the article is of no value.
If more pertinent information and data were provided in a convenient format,
the user could determine the probable relevance of the documents with far
greater efficiency. With current literature sources and formats, the frustra-
tion factor often becomes so great that the researcher frequently resorts to
his own methods with inadequate information. AFHRL personnel are litera-
ture conscious and would make more extensive use of literature if the litera-
ture itself were formatted and described more adequately. Retrieval of ,
A valid taxonomic scheme could prove useful in organizing techni-
cal literature in training and instructional system design, but the probability
of developing a sound taxonomy of training and instructional concepts which
would be acceptable among the experts in the field seems very low.
Both researchers and practitioners tended to rely on current
information sources in the following order of importance:
(1) Personal knowledge and experience
(2) Contacts with colleagues and personnel in appropriate
disciplines
(3) Personal library collection of training documents
(4) Government Technical Reports
(5) Key scientific and technical journals
(6) Selective dissemination of information (SDI) bibliographies
and abstract journals, such as Research in Education
(7) Air Force manuals, state-of-the-art reviews, handbooks,
technical books, and bibliographies
2.3.3 Identification of Information Needs
The question concerning identification of information needs which
must be satisfied to perform the work of AFHRL personnel was difficult to
answer. Both practitioners and researchers indicated that information needs
vary widely depending on the specific assignments on which they are working.
The identification of information needs is basically an analytical
problem, i.e., to determine what constitutes the essential information one
needs to answer a question. If one could determine the basic information
units or elements he needs, the problem then arises of how to acquire re-
liable data and information relating to these needs and how to integrate the
informational units into a proper answer. One primary problem is to correlate
available theoretical and experimental studies and results with applied practi-
cal problems in a reliable fashion.
Air Force training requirements are usually complex and consist
of many task-oriented steps. The trainees have a very diverse range of
skills and knowledge prior to commencing a training program. At the Ad-
vanced Systems Division, researchers are concerned with future long-range
efforts of a multi-disciplinary nature. Therefore, information needs are
difficult to state and can cover a wide range. An item of information which
was of little or no interest previously may suddenly become very significant.
Because of the dynamic nature of training information needs, an information
storage and retrieval system must be highly flexible in order to accommo-
date these needs.
2.3.4 Situations and Information Needs
The practitioner, in the Air Training Command (ATC), a highly
experienced person in applied training, suggested two types of queries
which must be handled. The first type is a question concerning general
procedures. For example, one requester wanted to know how he could
most effectively establish a five-day middle management course in manage-
ment procedures and policies. In answering a query of this nature, the
practitioner first attempted to find out what specific training courses were
available on the topic. Had they been available, a listing of any such
courses would have been obtained and examined. If such a course had been
discovered in this way, the relevant course descriptions would have been
obtained directly. Other suggestions for administering the course might
also have been obtained.
The second type of query indicated by the practitioner is of a
more specific nature. For example, he may be asked how important color
is in training films. To answer this type of query, he would search technical
reports and state-of-the-art reviews. The bibliographies contained therein
often guide him to more appropriate specific documents. He provides the
requester with appropriate document references and pertinent excerpts from
documents on hand.
Discussions were held concerning the information activities in
which the Advanced Systems Division of AFHRL is currently engaged, and
future information requirements. Although queries from outside organiza-
tions are received, the more prevalent need is in planning research
activities on more generalized training/instructional problems which the
Division has predicted or detected or which have been suggested by some
organization. However, with a good information system there could be
considerably more information activities involving responses to queries.
The questions exist, but there is no focal point to which training-related
questions can be directed. It is expected that the quantity of such questions
would increase greatly if a good computer-based technical information sys-
tem were available.
2.3.5 Concept of an Information System
To focus on the desirability and feasibility of a computer-based
information storage and retrieval system, it was important to derive from
potential users their concept of an information system with specific con-
sideration of the functions to be fulfilled by the system. Some responses to
this question were quite specific regarding particular access or retrieval
points which should be provided. Other responses were more concerned
with the levels of sophistication which would be possible, and the significance
of such levels of sophistication to the user. The user's ability to interact
effectively with the system in accomplishing his information seeking tasks
was emphasized.
Specific retrieval parameters which should be provided include:
subject area; author; title; independent and dependent variables studied; date;
corporate author; sponsoring organization; journal source; type of study
(lab experiment, field study, correlation study); instrumentation, stimuli,
media or hardware used; type of subjects; methodology or experimental de-
sign; modality; training environment; overall quality rating of the study.
It would be highly desirable for the system to be able to display on request
the bibliography or list of references contained in the retrieved documents.
In addition to retrieval, it was suggested that the ideal system would also
manipulate stored information such that areas of needed training research
would be pinpointed, contradictory results would be indicated, and needs
for additional research would be suggested.
The practitioner would like to see a computer-based information
system with a data base both for applicable research literature and for exist-
ing instructional programs on various subjects. The system should be up-
datable so that invalid material could be deleted and replaced and so that new
programs and modifications of existing programs in the data base could be
added. The following descriptive and/or retrieval information should be
supplied with a narrative description of the instructional system.
(1) Date that the course was developed.
(2) Time required to complete the course; total instructional
hours.
(3) Objectives of the training course; include aircraft/weapon
system if appropriate.
(4) Location (responsible military organization) of training
program.
(5) Name of the organization which developed the training
course.
(6) Equipment required.
(7) Trainee characteristics (previous training, average
military rank, educational level, etc.).
(8) Evaluation of effectiveness.
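
The eight descriptive items above can be pictured as the fields of one catalog record. The following is only a minimal sketch under stated assumptions: the class name, the field names, and the helper find_by_equipment are illustrative inventions and do not come from the report, which specifies the items but not a storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InstructionalSystemRecord:
    """One catalog entry. Field names are hypothetical; the numbers in the
    comments refer to the eight items listed in the text."""
    narrative: str                     # narrative description of the system
    date_developed: str                # (1) date the course was developed
    instructional_hours: float         # (2) total instructional hours
    objectives: str                    # (3) objectives of the training course
    responsible_organization: str      # (4) location of the training program
    developer: str                     # (5) organization that developed it
    equipment: List[str] = field(default_factory=list)  # (6) equipment required
    trainee_characteristics: str = ""  # (7) previous training, rank, education
    effectiveness_evaluation: Optional[str] = None       # (8) evaluation
    weapon_system: Optional[str] = None  # (3) aircraft/weapon system, if any


def find_by_equipment(records, item):
    """Return the records whose equipment list mentions `item`
    (case-insensitive substring match)."""
    wanted = item.lower()
    return [r for r in records if any(wanted in e.lower() for e in r.equipment)]
```

Keeping the record updatable, as the practitioner requested, then amounts to replacing or deleting individual records of this kind.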
It was suggested that a correlation of job specialty codes with
training courses would be useful. AF Manual 39-1 and AF Manual 36-1
were cited as examples. A typical question which might be addressed to
the system would be: "What self-instruction courses are available for the
maintenance of certain electronics equipment?"
In discussing the more general characteristics of information
systems, three levels of sophistication were indicated. The most sophisti-
cated system would consist of an interactive on-line system which could be
used by the practitioner in the field to assist him in an instructional system
development problem. The interaction would occur such that the system would
lead him like a consultant in natural or near-natural English to refine his query
until a well-defined search strategy had been formulated. In many respects,
the interactive program would perform similarly to a Computer-Aided Instruc-
tion (CAI) program. The computer could suggest additional keywords and
phrases to the user which he could select at his option to provide as precise
a search strategy as appropriate. The search of the system would result in
prescriptive guidelines to the user. For example, in posing the problem of
aircraft recognition, the ideal prescription would provide detailed advice
such as: "Show seven shapes for 0.2 seconds at five-second intervals". What-
ever the prescriptive statements or principles, an indication of the level of
confidence for the statement should be given, and appropriate source ma-
terials from which the statement is derived should be provided. Another
feature of this most-sophisticated level would be the ability to manipulate
specific data and information such that new concepts could be generated.
The second level of sophistication would permit an expert in in-
structional systems to obtain guidance to aid him in solving instructional
problems by retrieving prescriptive statements with considerably less
prompting from the computer in formulating his query or search strategy.
The results of the search would be basically the same with either system,
i.e., principles or prescriptive statements. Preferably, an indicated confi-
dence level and references to bibliographic source material would be provided.
One way of conceptualizing this second level of sophistication is as a "com-
puterized handbook" of instructional technology. A handbook is basically a
distillation and extraction of facts and principles derived from extensive re-
search and application-oriented studies. The computer-based prescriptive
system would present the user with this type of information in response to
queries of the system.
The third level of sophistication would be similar to document
storage and retrieval systems in widespread use today. With such a system,
an expert in instructional research and development would retrieve biblio-
graphic references in response to his search. For example, if the user re-
quested all references on "PROFICIENCY MEASUREMENT", the system
might respond, "250 documents satisfy your request." By successive inter-
actions, additional search restrictions can be applied by requiring additional
search terms and/or by limiting the output by date, by type of document, or
by some other specifiable retrieval parameters until a reasonable number
of retrievals is obtained. Display of document titles, authors, abstracts,
and other records would be possible. The end result of a search would be
a bibliographic listing. Detailed study on the part of the user would require
him to refer to the original document.
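
The successive narrowing described for this third level can be sketched as repeated filtering of a result set. This is an illustration only: the Document class, the narrow function, and the four sample entries are hypothetical and are not taken from any of the systems surveyed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Document:
    title: str
    terms: FrozenSet[str]   # index terms, stored in upper case
    year: int
    doc_type: str


def narrow(docs: Iterable[Document],
           required_terms: Iterable[str] = (),
           since: Optional[int] = None,
           doc_type: Optional[str] = None) -> List[Document]:
    """Keep only documents that carry every required term and satisfy
    the optional date and document-type restrictions."""
    required = {t.upper() for t in required_terms}
    return [d for d in docs
            if required <= d.terms
            and (since is None or d.year >= since)
            and (doc_type is None or d.doc_type == doc_type)]


# Hypothetical collection, for illustration only.
COLLECTION = [
    Document("Report A", frozenset({"PROFICIENCY MEASUREMENT", "PILOT TRAINING"}), 1968, "report"),
    Document("Report B", frozenset({"PROFICIENCY MEASUREMENT", "SIMULATORS"}), 1971, "report"),
    Document("Article C", frozenset({"PROFICIENCY MEASUREMENT", "PILOT TRAINING"}), 1972, "journal"),
    Document("Article D", frozenset({"TRAINING FILMS"}), 1970, "journal"),
]

step1 = narrow(COLLECTION, ["proficiency measurement"])  # broad first request
step2 = narrow(step1, ["pilot training"])                # add a search term
step3 = narrow(step2, since=1970)                        # restrict by date
```

Each step operates on the previous result, which mirrors the interactive negotiation between user and system until a manageable number of retrievals remains.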
2.3.6 Use of Information, Response Time, and Current
Awareness Needs
The use of information generated by an information system may
vary depending on whether the user is a practitioner or a researcher. The
practitioner generally would use information directly, to apply it to designing
instructional courses. Both researchers and practitioners indicated that re-
sults from an information system would be used to assist in consulting.
Researchers would use the information system to identify areas
needing research, to obtain research information related to a topic or pro-
ject, and to learn of research procedures used on similar projects. In
performing a search of the data base, the researcher may be able to con-
firm the need for a research project by the fact that the system did not
reveal material on the topic being considered. The search results may
show that research in the area of consideration is uncoordinated and iso-
lated. Such results from the search of the system could offer guidelines to
the researcher in defining his project, so that past research can be utilized
and gaps in research can be filled in a systematic manner.
The response time required also would vary depending on the use
to be made of the information. A practitioner designing a course of instruc-
tion needs immediate response. A researcher or practitioner in a consulting
role needs a response preferably of less than one day. Researchers develop-
ing and planning new areas of research can tolerate much longer response
times, perhaps as long as two weeks. However, in any case, the ability to
search an information system interactively was strongly indicated as being
highly desirable. Interactive capability would permit negotiation of the search
request with the system until appropriate results were obtained. This capa-
bility would be especially useful in preventing the researcher from pursuing
literature searching which ultimately would prove to be of no value.
Current awareness needs, while important, do not appear to be
crucial in terms of an information system. Selective Dissemination of In-
formation (SDI) as a spin-off from updating an information system on topics
related to training research would be a desirable, but not essential, function.
Both researchers and practitioners feel that sources they now employ --
reading of selected journals, receipt of current awareness announcements,
and daily interaction with colleagues and others -- are generally sufficient.
2.4 SUMMARY OF USERS' NEEDS
SECTION 3
Name                          Vendor/User
Document Processing System    IBM/Foreign Technology Division, USAF
(4) A text processing automatic indexing system would be desirable.
(5) The feasibility of making the system available to AFHRL
personnel must be seriously considered.
(6) The economic factors involved, including telephone line charges,
hardware required to operate the system, storage requirements,
and operational costs should be reasonable.
3.3 RESULTS OF THE INFORMATION SYSTEMS SURVEY
Review of the IS&R systems studied showed that any one of the ten
systems could be used. To arrive at the most suitable information system,
factors such as the various system features, costs, and availability to AFHRL
were carefully considered in selecting a specific system. The information
system characteristics were reviewed in terms of data base creation, user
interaction, special features, hardware and software requirements, costs,
and advantages and disadvantages. The advanced state of development of
most of the systems for document retrieval operations indicated that they
met the criteria established for selection. Thus the final selection required
close scrutiny of trade-offs in system performance, costs and availability
to AFHRL, practitioners, and other users.
The Avionics Central system operated by the Avionics Laboratory with
Mead Data Central software satisfies the criteria established. The informa-
tion that would be desirable to put into a data base can be composed of the
full text of the document in machine-readable form. The text processing
automatic indexing feature provides that every word of every document
(except for defined common words such as THE, OR, BUT, IT, etc.) be-
comes a retrieval parameter. Thus, a powerful search capability is avail-
able to the user. He can retrieve by word phrases as well as by single words
and logical combinations thereof. The internal file structure of this system
is a completely controlled indexed sequential structure wherein the data con-
tent of the user-defined data base is stored in both a sequential (serial-indexed)
and an inverted (word-indexed) mode. The structure is oriented toward terminal
inquiry response, and the user has no need for physical control over the inter-
nal structure.
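
The combination of a sequential store and an inverted (word-indexed) store described above can be illustrated with a small sketch. It is not the Mead Data Central implementation, whose internals the report does not give: the class TextIndex, its method names, and the tokenizing rule are assumptions made for illustration, and the common-word list contains only the four examples named in the text.

```python
import re
from collections import defaultdict

# Only the report's four examples; a production list would be longer.
STOP_WORDS = {"THE", "OR", "BUT", "IT"}


def tokenize(text):
    """Split text into upper-case words (a simplifying assumption)."""
    return re.findall(r"[A-Za-z0-9]+", text.upper())


class TextIndex:
    def __init__(self):
        self.documents = []                # sequential store: doc id -> words
        self.postings = defaultdict(dict)  # inverted store: word -> {doc id: positions}

    def add(self, text):
        """Index every word of the document except the common words."""
        words = tokenize(text)
        doc_id = len(self.documents)
        self.documents.append(words)
        for position, word in enumerate(words):
            if word not in STOP_WORDS:
                self.postings[word].setdefault(doc_id, []).append(position)
        return doc_id

    def word(self, w):
        """Ids of the documents containing the single word `w`."""
        return set(self.postings.get(w.upper(), {}))

    def phrase(self, text):
        """Ids of the documents containing the words of `text` consecutively.
        Candidates come from the inverted store; the sequential store
        confirms the word order."""
        target = tokenize(text)
        keyed = [w for w in target if w not in STOP_WORDS]
        if not keyed:
            return set()
        candidates = set.intersection(*(self.word(w) for w in keyed))
        n = len(target)
        return {d for d in candidates
                if any(self.documents[d][i:i + n] == target
                       for i in range(len(self.documents[d]) - n + 1))}
```

Because each lookup returns a set of document ids, logical combinations follow from ordinary set operations: `index.word("color") & index.word("training")` for AND, `|` for OR, and `-` for AND NOT.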
When adding, deleting, or changing documents, the update programs
essentially completely rebuild the system. Therefore, it is sometimes more
efficient to treat massive files as separate data bases rather than to merge
*The Avionics Central System operates primarily with the Mead Data Central
Software. However, additional capabilities have been added to the system
by Avionics Laboratory personnel.
them with smaller, more rapidly changing files. Additional segments can
be added to existing files at any time, whenever it is determined that they
are needed. This system is open-ended and extremely flexible.
The Avionics Central System is available presently from 0745-1045
and 1315-1600 EST. The system is accessible through a dial-up terminal
on local phone lines. Both 10 character per second and 30 character per
second transmission rates are available. In testing this system, we found
response times to be less than 30 seconds, and usually in the 5-10 second
range. The local supplier has provided a very comprehensive system with
on-line retrieval and batch updating capabilities. The documentation,
including a section on definition of terms and the provision of examples, is
well done.
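
The practical effect of the two transmission rates can be gauged with simple arithmetic. The sketch below is illustrative only: the 1,500-character abstract length is an assumed figure, not one reported for the model data base, and the function name is hypothetical.

```python
def transmission_seconds(characters, chars_per_second):
    """Time to print `characters` at a terminal running at the given rate."""
    return characters / chars_per_second


ABSTRACT_LENGTH = 1500  # assumed length of one abstract, in characters

slow = transmission_seconds(ABSTRACT_LENGTH, 10)  # 150.0 seconds
fast = transmission_seconds(ABSTRACT_LENGTH, 30)  # 50.0 seconds
```

Under this assumption, displaying a full abstract takes considerably longer than the 5-10 second search response itself, so the terminal rate rather than the retrieval time would dominate the user's wait.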
This recommendation is based on information that is currently
available. However, if requirements and circumstances change, consider-
ation should be given to evaluating again Control Data's Query/Update,
IBM's STAIRS, and Battelle's BASIS 70.
3.5 ANTICIPATED DEVELOPMENTS IN THE STATE OF THE ART OF
COMPUTER-BASED IS&R SYSTEMS IN 3-5 YEARS
3.5.1 Storage
The most significant advance in the state of the art in three to
five years will be the reduction in the cost per byte of mass random access
storage. Devices that will make this possible include the TERA-BIT* 20-
inch tape memory system now marketed by Ampex and the SCROLL 2-inch
tape memory system which is under development at Control Data Corporation.
Bubble memories and laser driven memories will also compete for their
share of the mass storage market. In addition, devices commonly known as
disc storage systems will continue to improve in cost/performance
and in storage capacity; however, the limit of such devices probably will
soon be reached in terms of storage density as well as cost/performance
ratio. Until all of the bugs are eliminated from the 20-inch and 2-inch
magnetic tape systems mentioned above, it is anticipated that the standard
1/2 inch tape storage systems will continue to be used with densities of
2300 bits per inch (bpi) becoming more common.
Within the next five years the cost/performance ratio of mass
storage devices is expected to decrease by a factor somewhere between 10
and 100. By the late 1970's the increased use of Metal Oxide Semiconductor,
Large Scale Integration (MOS, LSI) will continue to decrease the cost for
main memory. It is anticipated that by the late 1970's, main memories will
be of the 2-4 million byte size for a medium-priced computing system.
* This product name comes from the term TERA (for trillion), used to
identify the capacity of the system.
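
The projected improvement in the cost/performance ratio of mass storage, a factor between 10 and 100 within five years, can be restated as a yearly rate. The sketch below assumes a constant compound rate of improvement, which is an assumption of this illustration and not a claim of the report; the function names are hypothetical.

```python
def annual_factor(total_factor, years):
    """Yearly improvement factor that compounds to `total_factor`."""
    return total_factor ** (1.0 / years)


def annual_decline(total_factor, years):
    """Fraction by which the cost/performance ratio falls each year."""
    return 1.0 - 1.0 / annual_factor(total_factor, years)


low = annual_decline(10, 5)    # about 0.37, i.e. roughly 37 percent per year
high = annual_decline(100, 5)  # about 0.60, i.e. roughly 60 percent per year
```

On this reading, the forecast implies that the cost/performance ratio would fall by somewhere between about one third and three fifths every year of the period.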
3.5.2 Input Devices
The cost of data entry also will decrease significantly by the late
1970's, as the cost for video display devices which will be connected to disc
or tape storage systems decreases. Continued research on cathode ray
tubes (CRT's) and the new "plasma" display device will result in some im-
provement. However, the decrease in cost of the LSI logic and memory will
reduce the cost of data entry most dramatically.
For information storage and retrieval of literature and other pub-
lished reports, an alternative data entry procedure has already been developed
and will be much more commonly practiced in 3-5 years. This technique in-
volves the preparation of the article prior to publication in machine-readable
form. As the various editing processes proceed for a given publication, this
machine-readable code will also be edited and a machine-processable form
of the final journal article will be distributed to appropriate centers for
computer storage and retrieval. This technique will eliminate the most
costly aspect of preparing data bases of journal articles and technical reports
which exists at the present time, namely, the need for converting hardcopy
printed form to machine-readable form. Updates, supplementary data and in-
formation, and editing corrections to the data bases will be made by the
input devices already mentioned. In addition, it is anticipated that within
the next five years audio response input devices may be perfected sufficiently
for large-scale use. The use of such devices would completely eliminate the
need for keying information for input.
3.5.3 Output Devices
3.5.4 Data Processing
As has already been mentioned, the decrease in costs for logic
and integrated circuits used in memory will play a significant role in de-
creasing the cost of computing systems in the late 70's. More important
than costs however, is the fact that new file structures and file access pro-
cedures will be available in packaged form which will allow the use of con-
tent addressable memory. With such a package, memory can be addressed
by content rather than by location. The impact this development will have
on information storage and retrieval is of vital significance. Information
contained within main memory can be addressed immediately; thus, the re-
sponse times corresponding to information storage requirements will be
greatly reduced without a significant increase in cost over current serial
processing machines. In order to achieve this cost reduction for information
storage and retrieval, machines with very much larger processing capacity
will be required. The trillion byte memories that will be available, whether
laser driven or on magnetic tape, will have extremely high transfer rates
and will require very large processors in order to support them.
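
The difference between addressing by location and addressing by content can be shown in miniature. The sketch below is a software analogy only: a true content addressable memory compares all cells in parallel in hardware, whereas the dictionary used here merely emulates the effect. The sample records and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical memory contents; the list position is the "location".
MEMORY = [
    {"title": "Report A", "subject": "PROFICIENCY MEASUREMENT"},
    {"title": "Report B", "subject": "TRAINING FILMS"},
    {"title": "Report C", "subject": "PROFICIENCY MEASUREMENT"},
]


def serial_search(memory, field, value):
    """Location-addressed memory: every cell must be read in turn."""
    return [addr for addr, rec in enumerate(memory) if rec[field] == value]


def build_content_index(memory, field):
    """Emulate content addressing: map each value to the cells holding it."""
    index = defaultdict(list)
    for addr, rec in enumerate(memory):
        index[rec[field]].append(addr)
    return index


SUBJECT_INDEX = build_content_index(MEMORY, "subject")
by_scan = serial_search(MEMORY, "subject", "PROFICIENCY MEASUREMENT")
by_content = SUBJECT_INDEX["PROFICIENCY MEASUREMENT"]
```

Both approaches return the same cells; the serial search reads the whole memory for each query, while the content lookup goes directly to the matching cells, which is the source of the reduced response times anticipated in the text.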
In order to make these extremely large processors cost effective,
it is anticipated that national networks will be utilized. Network technology
is reasonably well understood at the present time, and the growth of many
such networks is anticipated. Large special purpose data bases will be
accessible at reasonable costs from any point in the country. Already there
are commercially available nationwide networks for special purpose on-line
information storage and retrieval. Examples are the MEDLINE system, which
contains information derived from medical periodical literature, and the
Science Information Association, which utilizes Battelle's BASIS 70 for such
data bases as the National Technical Information Service bibliographic refer-
ence data of unclassified unlimited technical reports and the Chemical
Abstracts Condensates data base of journal references in chemistry and
chemical engineering.
Due to the rapidly advancing state of the art in information storage
and retrieval, and because of the increased emphasis on developing sophisti-
cated equipment for this application, it is anticipated that information storage
and retrieval costs in the next three to five years will decrease dramatically.
SECTION 4
veloped by McCawley.3 These neo-transformationalists see semantics
rather than syntax as the focal point of investigation. According to these
investigators, it is unnecessary and undesirable for a grammar to have both
a syntactic categorical component and a semantic interpretation component.
A single component was proposed which would generate directly semantic
representations expressing meaning and containing sufficient information for
the operation of transformations. Thus, as a possible move in the right
direction, the "generative semanticists" have emphasized the investigation
of the nature of human language and de-emphasized the goal of mechanically
generating strings with the correct order and the appropriate constituent
structure.
Given the actual expressed goals of Chomsky's standard theory, it is
not surprising that meaning and semantic representation is the aspect of
human language which has been given the least consideration. Even the
generative semanticists, who place major emphasis on semantic representations
as underlying surface structures, regard meaning only from the point
of view of descriptive linguistics. Winograd4 states that meaning must
be seen, not as a function of words and sentences (e.g., of phonological or
grammatical entities), but rather of the intention of the speaker and the
ability of the hearer to reconstruct this intended meaning. And if one is to
discuss meaning and semantic representation with cognitive import, it must
be discussed in the context of a theory which attempts to explain human
thought processes. Just as one cannot describe syntax in the abstract and
hope to achieve a model of how syntactic structures are stored in the mind,
one cannot isolate a "linguistic" meaning for sentences which can be de-
scribed apart from a concern with cognitive structures and expect this
linguistic meaning to reflect the actual representation of meaning in the
mind.
Thus, in spite of the claims which Chomsky has made about his theory
seeking to give it cognitive status, neither he nor the theory of generative
semantics make hypotheses concerning the structure of linguistic knowledge
in the mind. Due to the lack of such a cognition-based theory to serve as a
foundation, significant progress is not likely to occur in the near future in
developing an artificial natural language conversational man-machine dialog
question-answering system.
Considerations of a less theoretical nature are significant for the
selection of an IS&R system which can serve nearer-term needs of AFHRL.
Storing a piece of information so that it can be retrieved is the basic language
problem in information processing today. Advantages and disadvantages of
the various approaches available are considered below.
At one extreme, index terms can be chosen from a relatively small
list of descriptive words. This allows the searcher to query the file readily,
and its simplicity is appealing. A problem arises as the collection increases
in size, however, since the amount of information cited under any one term
can grow to unmanageable proportions. In addition, since the type of vocabu-
lary involved is of a "static nature", constant revisions of the authorized list
are required due to the "dynamic nature" of natural human language. It is
known that language is in constant flux: terms are constantly introduced and
discarded. This is especially true of scientific and technical literature where
the occurrences of newly-coined terms introduced by definitions, or the change
of existing terms by means of explication, is a readily observable phenomenon.
The basic drawback to the fixed vocabulary indexing system described
above is that converting from one authorized vocabulary to another usually
involves a great expenditure of time and effort. Large computer runs are
necessary as well as much manual indexing. Furthermore, since a com-
plete conversion from one vocabulary to another is usually impossible, a
searcher must know old as well as new terms in order to locate all documents
in which he is interested.
A second approach to information retrieval, at the opposite extreme,
relies on text-word indexing from actual text of documents or selected por-
tions thereof. This provides great flexibility in meeting new demands of
technology but creates for the searcher the task of exhausting alternative
means of expressing concepts as they may have occurred in the author's
natural language.
Most organizations employ a system that lies somewhere between the
two extremes cited above. One problem is that many reference tools neces-
sary to search collections at present are not normally available to the user
(such tools as thesauri, identifier lists, open-ended term lists, frequency
counts, etc.).
The implementation of on-line search techniques has prompted interest
in the use of natural language for information retrieval. Today, many
people making queries via on-line systems are not information processing
specialists. These users, therefore, have no interest in learning a
structured artificial language to query a data base. Thus, widespread use
of computers by non-computer users makes the idea of using natural
language for information storage and retrieval very appealing.
The review of the literature on natural language question-answering
systems in conjunction with our own findings of user needs and available
IS&R systems has confirmed our conviction that a text-processing automatic
indexing system indeed represents the best choice for an information stor-
age and retrieval system to meet the needs of AFHRL personnel. Although
a true artificially-intelligent computer-based interactive information system
does not seem feasible in the near term, pending further advances in linguistic
theory, some features of a prescriptive guidance system as envisioned by
AFHRL are already possible with the computer and IS&R technology available
today.
SECTION 5
TABLE 1

Seg. No.  Acronym    Segment Name           Definition                               Typical Specific Data
108       SUBJPOPN   SUBJECT                Describes the characteristics of         100 Normal Mixed College Students
                     POPULATION             the subject population                   Aged 20-30
112       INDVAR     INDEPENDENT            Describes the independent                Type of Instructional Materials
                     VARIABLE               variables or stimuli utilized
                                            in the experiment
116       DEPVAR     DEPENDENT              Describes the dependent variables        Effect on Learning
                     VARIABLE               or responses studied
120       MEAS/STAT  MEASUREMENT/           Describes the measurement techniques     ANOVA; Chi-Square Analysis
                     STATISTICAL METHODS    and/or statistical methods
                                            used in the study
124       ADDNLRSCH  ADDITIONAL RESEARCH    Describes the abstractor's suggestions
                     SUGGESTIONS            for additional research based
                                            on the document
128       CROSSREF   CROSS REFERENCES       Provides cross references to other
                                            items, either in the data base at
                                            hand or to items in other data bases
132       QUALINDEX  OVERALL QUALITY        Abstractor's assessment of the           1
                     INDEX                  quality of the document derived
                                            from the summary and evaluation
                                            form accompanying the comprehensive
                                            index. Values are 1, 2, or 3.
with additional specialized segments to meet the needs of AFHRL. The
specific segments, segment definitions, and typical specific data are
presented in Table 1.
5.2 RETRIEVAL FROM THE MODEL DATA BASE
Retrieval from the data base is accomplished using the Avionics
Central system. Specific user instructions are provided as Appendix B
to this report. There are two basic operations in retrieving information:
the specification to the system of the retrieval requirements and the display
to the user of the records or specified portions of the records. Because of
the full-text index/retrieval capability of the system and the file design by
segments, the user has a vast range of capabilities available to him. The
comprehensive abstracts prepared as input to the model data base are
particularly amenable to providing a wide scope of information to the user.
Since the user can display such items as the conclusions, the independent
variable, the dependent variable, the subject population characteristics,
and other elements, the system has available many desirable features.
Certainly the system designed is far superior to a system which would merely
generate a bibliography of access numbers, perhaps in conjunction with
titles and authors.
Indeed, the system design is entirely adequate to accommodate the
computerized handbook concept described in Section 2. The computerized
handbook concept could be implemented by incorporating prescriptions as an
additional data base. These prescriptions would represent human-derived
extractions and syntheses from textbooks, technical reports, and other
source material which could easily be cross-referenced. Retrieval of
prescriptions could be effected by searching the appropriate segment for
that data base. Corresponding descriptive keywords could also be incorporated
to aid in retrieving the appropriate prescription.
5.2.1 Computer-User Dialog
The first step in searching the data base is to "sign on". The
signing on is accomplished by dialing the computer's telephone number.
The sign-on dialog is indicated by showing computer messages in all capital
letters and user responses in all lower-case letters. Control is returned to
the computer (transmit) with a teletype terminal by depressing the control
key and the letter 's' simultaneously. This is indicated by (cntrl s). As
soon as the signal is established the computer message is: YOU ARE NOW
IN COMMUNICATION WITH AVIONICS CENTRAL (000). PLEASE ENTER
10 CHARACTER IDENTIFICATION.
(002)
END OF OUTPUT FOR THIS QUERY. (227)
REPLY/ (cntrl s)
AFHRL IS THE CURRENT FILE, MESSAGE OPTION. ANSWER YES TO
CONTINUE, OR ENTER FILE AND MESSAGE OPTION. (047)
REPLY/ yes (cntrl s)
ENTER REQUEST. (048)
At this point, the user has executed a complete search cycle and is
now ready to proceed to the next query. When he is finished, it is only
necessary to hang up the phone, and the user is automatically logged out.
Further details of search and display procedures are given in the following
paragraphs. A more complete description of user procedures is given in
Appendix B.
5.2.2 Search Procedures
The user initiates a search of the data base by using the $ sign as a
"look up" operator. The segment is then indicated by acronym or segment
number. An operator is then specified which tells the system how to search.
The argument represents the particular value or text components of the
segment to be matched in the search process. Logical AND and OR express-
ions are used to connect individual search arguments. The expression
"$ANY" instructs the system to look in all segments for a match. Examples
of search specifications are shown below.
Example 1. Give me documents concerned with the subject
ADJUNCT PROGRAMMING
Search specification:
$any equ 'adjunct programming'
The search specification tells the system to search the data base
in all segments for documents containing the search phrase
ADJUNCT PROGRAMMING. Note that search phrases are
enclosed in single quote marks.
Example 2. Give me documents on PROGRAMMED INSTRUCTION
authored by L. BRIGGS
$any equ 'programmed instruction' and $author equ 'briggs,
Example 3. Give me documents in which a TACHISTOSCOPE
was used to study AIRCRAFT RECOGNITION
$any equ (aircraft or airplane) and recogni****,
and $appar/medi equ tachistoscope
Note that the asterisks permit a word stem search to include
any word whose first seven characters are RECOGNI; thus
documents containing recognize, recognition, and recognized
would all qualify. This search is broader in nature than if
the phrase 'AIRCRAFT RECOGNITION' had been specified.
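The search specifications in the examples above can be illustrated with a minimal sketch in Python. This is not the Avionics Central implementation: the record layout (a dictionary keyed by segment acronym), the two sample records, and the function names `words` and `equ` are assumptions made only for illustration. The sketch treats trailing asterisks as a word stem search that accepts any word beginning with the stem, as the note to Example 3 describes; whether the number of asterisks also limits word length is not stated in the text and is not modeled.

```python
import re

def words(text):
    """Return the lower-case alphabetic words of a text segment."""
    return re.findall(r"[a-z]+", text.lower())

def equ(record, segment, argument):
    """True if `argument` occurs in the named segment of `record`.

    segment "any" searches every segment (the $ANY case). An argument
    ending in asterisks is treated as a word stem; anything else is
    matched as a whole word or whole-word phrase.
    """
    texts = list(record.values()) if segment == "any" else [record.get(segment, "")]
    arg = argument.lower()
    if arg.endswith("*"):
        stem = arg.rstrip("*")
        return any(w.startswith(stem) for t in texts for w in words(t))
    phrase = " " + " ".join(words(arg)) + " "
    return any(phrase in " " + " ".join(words(t)) + " " for t in texts)

# Hypothetical records, keyed by segment acronym.
records = [
    {"title": "Aircraft recognition training",
     "appar/medi": "tachistoscope; slides"},
    {"title": "Programmed instruction in recognizing airplanes",
     "appar/medi": "programmed text"},
]

# Example 3: $any equ (aircraft or airplane) and recogni****
#            and $appar/medi equ tachistoscope
hits = [
    r for r in records
    if (equ(r, "any", "aircraft") or equ(r, "any", "airplane"))
    and equ(r, "any", "recogni****")
    and equ(r, "appar/medi", "tachistoscope")
]
```

Only the first hypothetical record qualifies, since the second contains neither of the whole words "aircraft" or "airplane" and does not list a tachistoscope in its apparatus/media segment.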
At the completion of the computer search, the user is notified of
the number of documents that satisfy his request. If the number appears
reasonable, he can continue with the display operation. However, the
number of documents may be rather large. In this case, the user can modify
the original request to narrow its scope. For example, by searching
$ANY EQU 'PROGRAMMED INSTRUCTION', 130 documents qualify. There
are several means of narrowing the request. One means is to add logical
AND words or phrases. Another method would be to require 'PROGRAMMED
INSTRUCTION' to be in the title. Still another method would be to require
the date to be 1972 or 1973. In the particular instance cited, the search was
modified as follows:
REPLY/ $any equ 'programmed instruction' (cntrl s)
130 ANSWERS, HOW DO YOU WANT TO PROCESS THEM:
taining the word phrase 'programmed instruction' in the title should be more
relevant than those documents merely mentioning programmed instruction in
the abstract or in some other segment. The reply 00,p (print all segments
at the printer) directed that the complete records of the qualifying documents
be printed off-line and forwarded to the user; the address of the user is
maintained by Avionics Central.
Another possibility is that no documents qualify in response to the
request. In Example 3 above, on tachistoscopes used for aircraft recognition,
there were no documents which qualified. The original request was modi-
fied as follows:
NO ANSWER FILLS REQUEST. (150)
REPLY/ (cntrl s)
ENTER FILE, MESSAGE OPTION. (040)
REPLY/afhrl,1kwid (cntrl s)
ENTER REQUEST. (048)
REPLY/$appar/medi equ tachistoscope (cntrl s)
YOUR REQUEST IS BEING PROCESSED (109)
ing personnel in aircraft recognition. It should also be noted, of course,
that the model data base searched consists only of about 500 abstracts,
which can account for a low number of retrievals.
Keywords can be very useful for searching. Keywords used in
manual indexing are derived from a controlled vocabulary with a fixed
format, and therefore represent uniformity both in format and with regard
to the semantic representations of the keywords.
5.2.3 Display
One of the chief advantages of the Avionics Central model data
base system is the wide number of display options available to the user.
Since the Avionics Central system is interactive, the display features can be
used to great advantage in screening the retrieved materials for relevance.
By successive search modifications, the user can sharpen the precision of
results until he achieves exactly that subset of documents from the data base
which he wants. This technique of retrieval, display, modification, display,
etc., is called the interactive iterative (I2) retrieval technique.
For the first screening, only access numbers and titles may be
displayed. From this screening, obviously nonrelevant documents often
can be spotted. Also recurring factors causing nonrelevant retrievals
may suggest terms or phrases which can be negated in a search modification.
For example, a user interested in computers used in education or training
for nonteaching purposes, e.g., student records, may formulate a
search with $ANY EQU COMPUTERS AND EDUCATION OR TRAINING AND
$ANY NEQ CAI OR COMPUTER-ASSISTED INSTRUCTION. The expression
"neq" means "not equal to". As the I2 technique derives the set of documents
which looks promising, the researcher can request more segments to be
displayed for more detailed screening or, indeed, to provide the information
he wants directly. If desired, the complete records can be ordered off-
line and sent to the user. The user is cautioned that complete records or
lengthy segments such as the abstract do require considerable time to be
printed or displayed on-line, especially if there are very many documents.
With CRT display terminals, the "paging" feature permits the user to browse
through records very conveniently. The paging feature refers to the capa-
bility of displaying records at various intervals, for example, every tenth
document record.
In addition to displaying whole records or segments of records,
the user can enter a display command for only those segments in which the
search words were found (HITS), segments in which the words were found
plus additional segments (HITS-N-PIC), words in context around the search
words in whatever segment the search words appear (KWIC-IT), and words
in context around the search words in whatever segment the words appear
plus additional segments (KWIC-N-PIC). Further explanation is provided
in Appendix B. An example is given below for the phrase "teaching machine".
AFHRL LIBRARY LOCATION
AC 20283 2 (1 of 2 ans.)
ACCESSION NUMBER:
00000283
REPORT TITLE:
HIERARCHICAL PREVIEW VS PROBLEM ORIENTED REVIEW IN LEARN-
ING AN IMAGINARY SCIENCE
REPORT AUTHOR:
MERRILL, M D; STOLUROW, L M
CITATION:
MERRILL, M.D., & STOLUROW, L. M, HIERARCHICAL PREVIEW VS
PROBLEM ORIENTED REVIEW IN LEARNING AN IMAGINARY SCIENCE.
"AMERICAN EDUCATIONAL RESEARCH JOURNAL," NOVEMBER 1966,
"3"(4), 251-261.
ABSTRACT:
THE MATERIALS WERE PRESENTED BY MEANS OF SOCRATES, A
COMPUTER-BASED TEACHING MACHINE USING AUTOTUTOR TEACHING
MACHINES AS INTERFACE UNITS (STOLUROW & DAVIS, 1965; MERRILL,
1964). SIX HUNDRED ...
KEYWORDS:
FRAME; LEARNING; PROBLEM SOLVING; RESEARCH; RETENTION;
REVIEW; TEACHING MACHINE; TESTING
APPARATUS/MEDIA USED:
CAI/CMI; PROGRAMMED TEXT; TEACHING MACHINES; SELF-INSTRUCTION
SUBJECT POPULATION:
675 NORMAL MALE COLLEGE FRESHMEN
list. This concept was described under the paragraph above under search-
ing. Keywords are handled by the system just as any other textual data,
such as abstract words.
Updating with Avionics Central is a batch mode operation and a time-
consuming one. In a computer system such as Mead Data Central there
is a trade-off between rapid updating and rapid response on retrieval.
Avionics Central has elected to bias the system heavily in favor of rapid
response. For a system such as the model data base and for the majority
of IS&R systems, this situation is ideal, since updating is performed rather
infrequently and rarely are existing records edited.
Updating is accomplished by supplying additional increments of abstract
record data to Avionics Central. The mechanics of updating are accommo-
dated automatically by the software. In essence, the inverted files are
regenerated, and the serial file is added onto. A very convenient feature
is the ability to edit previously existing records to correct mistakes, to
insert additional data, or to update with new validated data which supersedes
that data which previously existed in the record.
The significant aspect of the updating process is that computer-readable
tape in the correct format must be supplied. In the creation of the model
data base, MT/ST tapes were converted to machine-readable form, and
supplementary data available only in hardcopy were keypunched and merged.
The format to be provided to Avionics Central is 80-column card image
records as follows:
Col. 1 - 60     Data/text
Col. 61         Blank
Col. 62 - 69    Acc. No. with leading zeroes (right justified)
Col. 70 - 74    Segment No. (right justified)
Col. 75 - 77    File No. (003)
Col. 78 - 80    Sequence No. (line no. within the segment for
                multiline items; right justified)
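The card image layout above can be sketched in Python as follows. The function names `card_image` and `segment_cards` are hypothetical, and two details are assumptions not stated in the text: that the segment and sequence numbers are padded with blanks rather than zeroes, and that long text is broken at word boundaries into 60-character lines.

```python
import textwrap

def card_image(text, acc_no, segment_no, file_no=3, seq_no=1):
    """Build one 80-column card image record for the layout above."""
    if len(text) > 60:
        raise ValueError("data/text field holds at most 60 characters")
    return (
        f"{text:<60}"        # col. 1-60: data/text, left justified
        " "                  # col. 61: blank
        f"{acc_no:08d}"      # col. 62-69: accession no. with leading zeroes
        f"{segment_no:>5}"   # col. 70-74: segment no., right justified
        f"{file_no:03d}"     # col. 75-77: file no. (003)
        f"{seq_no:>3}"       # col. 78-80: sequence no., right justified
    )

def segment_cards(text, acc_no, segment_no, file_no=3):
    """Split a multiline item into numbered card images."""
    lines = textwrap.wrap(text, 60)
    return [
        card_image(line, acc_no, segment_no, file_no, n)
        for n, line in enumerate(lines, start=1)
    ]

card = card_image("TEACHING MACHINE", 283, 132)
```

Each record produced this way is exactly 80 characters long, so a fixed-column reader can recover every field by position.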
In initially processing data into the model data base, inconsistencies
in input data were present which created many problems in applying
corrections. Some guidelines for preparing input data in MT/ST form
which should help in future updating are as follows:
1. The MT/ST codes should be applied to permit easy separation
into segments as desired.
2. Fields or segments should be separated by clearly identifiable
delimiters (something other than spaces). Special characters
such as @ would work.
3. The access number should be carried (entered) with each
separate field.
4. The MT/ST tape must correspond to the hardcopy. Erasing
hardcopy does not change the tape. The MT/ST tape must be
corrected.
5. Avoid using lower case l and alpha O to represent numeric 1
and 0. The numeric values must be used when intended.
6. Within the text of abstracts, use a consistent means of indicating
points to be made. Inter- as well as intratext consistency is
important. As an example, "The following performance criteria
were applied:
(1)
(2)
(3)
etc.
7. The identical format must be used in providing data for the
various segments; again inter- as well as intratext consistency
is required. An example follows:
correct format - AV Communication Review
incorrect format - A V Communication Review
A-V Communication Re View
Maintenance of the data base is a function of the operation of the
computer system. For the model data base, maintenance is accomplished
by Avionics Central. The files are maintained with random access disk
storage of data. File integrity is ensured by locking of files through the
identification number.
5.4 COSTS
Assuming a data base of comprehensive abstracts similar to the model
data base, estimated annual costs are broken down assuming the commercial
rates for Mead Data Central which are known. To fit within the three to five
year time period it is assumed that about 20,000 abstracts may be available
for the data base.
Software/hardware leasing,          $1500/mo. x 12 mos.              $18,000
  including user-system
  interactions
Storage of data on discs            $10/10^6 characters/mo.          $12,000
                                    x 100 x 10^6 char. x 12 mos.
Conversion of MT/ST tapes           $1000/update x 4 updates/yr.     $ 4,000
Preparation of computer-                                             $ 2,000
  readable update tapes
  and updating of data base
  through Avionics Central,
  assuming quarterly updates
Total annual cost                                                    $36,000
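The arithmetic behind the cost breakdown can be checked with a short Python sketch. The variable names are illustrative only; the rates and quantities are those given in the table.

```python
# Annual cost estimate, using the rates quoted in the table above.
leasing = 1500 * 12            # $1500 per month for 12 months
storage = 10 * 100 * 12        # $10 per 10^6 characters per month,
                               # 100 x 10^6 characters, 12 months
conversion = 1000 * 4          # $1000 per update, 4 updates per year
update_preparation = 2000      # quoted as a lump sum

total = leasing + storage + conversion + update_preparation
print(f"Total annual cost: ${total:,}")
```

The four items sum to the $36,000 total shown in the table.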
It should be noted that the costs of the information system do not include the
costs of preparing the comprehensive abstracts. It should also be pointed
out that the comprehensive abstracts represent a data base of document
surrogates from which one actually can retrieve facts and principles as well
as evaluative and reference data. As such the data base is unique, since
other IS&R systems do not provide the depth of analysis and comprehensive-
ness of coverage in the field of training and educational research as well as
actual instructional system design.
Considering the value of such a data base, not only to the Air Force
but to many potential users in the training community, it seems reasonable
to consider amortizing the cost of maintaining and updating the system.
This could be accomplished by making it available not only to other Air
Force and military organizations, but also to the aerospace industry,
psychologists, universities, the Department of Health, Education, and Welfare,
and other organizations. By charging for services, cost sharing
would be achieved, just as Avionics Central is cost sharing Mead Data
Central software and hardware among many users.
REFERENCES
evelo mers.
Psychology Volume 3 No. 2, January 1972.
5. Schumacher, S.
Design and211e2f rz.ucth1Lauyi2ler...2 USAF Human
Resources Laboratory, Technical Report. AFHRL-TR-73-41,
December 1973.
APPENDIX A
IS&R System
Document Processing System
BASIS-70
DIALOG
ORBIT II
NASIS
Query/Update
STAIRS
DOCUMENT PROCESSING SYSTEM
1.0 DESCRIPTION
1.1 INTRODUCTION
The IBM Document Processing System (DPS) operates under control
of Operating System/360. The system is designed to process narrative and
bibliographic data into interrelated data sets. Searching is done in an on-line
interactive mode, and positional or word phrase logic as well as normal Boolean
logic is available.
with Boolean and/or positional logic. For example:
$1 fiber, fibre
$2 glass
$3 $2 & $1 (+1)
$4 polymer & composite (sen)
$5 $3 & $4
This search requires that the phrase "glass fiber" or "glass fibre"
occur in the same document as a sentence containing the words "polymer" and
"composite". This search could be made broader by using different positional
operators, e.g., (+2) in line $3. The (+2) operators would retrieve
the terms "glass" and "fiber" or "fibre" separated by a maximum of one word in
either direction. If a narrower search is desired, (+1) could be used in line $4.
After the search is executed, the number of retrievals is printed
on-line; the user then has the option of the qualification mode. If he wants
only the very recent unclassified material he may add these lines:
if date ge 71
and classif lt 1
or
$6 if date ge 71
$7 if classif lt 1
$8 if $6 and $7
By labelling his qualifying statements, the user can use multiple levels of
Boolean logic among qualifiers.
Output is available either on-line or off-line. The document titles
and all bibliographic elements can be listed on-line. Abstracts are not available
on-line because of the time needed for printing. However, complete output
(title, bibliographic data, abstract text) can be ordered off-line.
1.4 SPECIAL FEATURES
DPS provides for truncation of terms by entering the notation ($),
after the word root, e.g., produc($). The truncation feature can only be used in
a logical OR string, but by labelling one can readily incorporate a truncated word
stem in a positional logic statement as follows:
$1 Manufactur($)
$2 Facility
$3 $1 & $2 (+1)
DPS also provides the option of establishing a Synonym/Equivalent
data set. These lists contain related search terms to be searched in addition to
the requested Dictionary entry. The system designer must decide what words
(if any) are synonymous or equivalent to each Dictionary term, and must specify
these relationships in the file.
In the Foreign Technology Division (FTD) application for Centralized
Information Reference and Control On-Line (CIRCOL), auxiliary files have been
established called the Nonsignificant Word File and the Word Form Conversion
File. The Word Form Conversion File permits alternative forms to be converted
to the standard Dictionary form and the Nonsignificant Word File extends the
function of the Common Word File, thus permitting the rejection of more than
255 different words and permitting nonsignificant words longer than eight characters
to be rejected.
The Limit Range (LRANGE) feature incorporated in CIRCOL permits
one to specify the portion of the file to be searched. Unless the LRANGE is
specified, the entire file is automatically searched. By restricting the search
to a portion of the file, the turnaround time in the interactive mode is improved
considerably. The feature is particularly useful in "negotiating" a search request.
For example, if a search strategy applied to 100,000 documents retrieves 200 documents
and the total file consists of more than 800,000 documents, a search of the entire
file will retrieve about 1500 to 1600 documents. Therefore the search strategy
should be made more restrictive before searching the entire file.
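The extrapolation in the LRANGE example can be sketched in Python. The function name `estimate_total_hits` is hypothetical, and the estimate rests on the assumption that qualifying documents are spread evenly through the file, which the text implies but does not state.

```python
def estimate_total_hits(sample_hits, sample_size, file_size):
    """Scale the hit count from a partial search up to the whole file,
    assuming hits are evenly distributed."""
    return sample_hits * file_size / sample_size

# 200 hits in a 100,000-document range, scaled to an 800,000-document file.
estimate = estimate_total_hits(200, 100_000, 800_000)
print(estimate)
```

For an 800,000-document file the estimate is 1600 documents, in line with the figure quoted above.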
2.0 HARDWARE
DPS is operating on the IBM 360/65 at FTD. It uses OS and supports
teletype speed devices. Response time is rather slow, largely due to the size
of the data base.
3.0 COSTS
The costs for DPS would be dependent on arrangements that could be
made for installing and operating software. The FTD system is currently unavailable
for other data bases outside of FTD.
4.0 CONCLUSIONS
Advantages:
1. The system is an interactive natural language-based system.
2. The system is well capable of accommodating the data base
envisioned for the AFHRL applications.
3. The cost would probably be fairly low.
Disadvantages:
1. The software program is fairly old; it was written for second
generation hardware and does not operate efficiently on third
generation equipment.
2. The turnaround time is slow.
3. Output options are limited; customized output formats must
be designed.
4. The Dictionary File, NSW, CW, and WFC Files must be
continually maintained by human lexicographic decision-
making.
5. Only batch mode updating is possible.
6. Previous records are not editable.
In view of the above, DPS does not warrant further consideration.
MEAD DATA CENTRAL
1.0 DESCRIPTION
1.1 INTRODUCTION
1.3 USER INTERACTION
On-line use is available for interactive interrogation and display
of information. A user accesses the on-line system through an "access key"
which determines those data bases and files of the accessible data bases which he
may enter. The on-line interrogation is done with free form (quasi-English)
with high level operators [equals (=); less than (<); between (I); greater than
(>)] and Boolean connectors (AND, OR, NOT).
In conducting a search there are two basic comparative conditions
available: logical conditions and arithmetic conditions. Both require the speci-
fication of segment name, operator and argument. The argument is the actual
data item. For example a logical condition : $AUTHOR EQU MILLER means
that the word "MILLER" (argument) is to be found in the segment (field) known
as "AUTHOR". The operator may be indicated either by the = sign or EQU.
For example an arithmetic condition:
$LENGTH > 1.0 YD, 1 FT, 8.1 IN
means that the arithmetic value of "1.0 YD, 1 FT, 8.1 IN" (or "56.1 IN" or
"4 FT, 8.1 IN") must be exceeded in the segment (field) known as "LENGTH".
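The equivalence of the three ways of writing the length argument can be checked with a small Python sketch. The function `to_inches` and the conversion table are illustrative assumptions; the text does not describe how Data Central normalizes units internally.

```python
import math
import re

# Inches per unit for the three units used in the example.
INCHES_PER_UNIT = {"YD": 36.0, "FT": 12.0, "IN": 1.0}

def to_inches(length):
    """Convert a string such as '1.0 YD, 1 FT, 8.1 IN' to inches."""
    parts = re.findall(r"(\d+(?:\.\d+)?)\s*(YD|FT|IN)", length.upper())
    return sum(float(value) * INCHES_PER_UNIT[unit] for value, unit in parts)

a = to_inches("1.0 YD, 1 FT, 8.1 IN")
b = to_inches("56.1 IN")
c = to_inches("4 FT, 8.1 IN")
same = math.isclose(a, b) and math.isclose(b, c)
```

All three forms reduce to 56.1 inches (36 + 12 + 8.1), which is why any of them can serve as the argument of the condition.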
The use of Boolean connectors with the system allows for two
levels of conjunctivity and one level of disjunctivity. They are defined as follows:
A and B or C means A & (B or C)
A & B or C means (A & B) or C
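The two grouping rules above amount to three precedence levels: "and" binds most loosely, "or" more tightly, and "&" most tightly. A minimal Python sketch of a parser that follows these rules is given below. The function name `parse`, the nested-tuple output, and the restriction to single-word operands are assumptions made for illustration, not a description of the Data Central software.

```python
import re

# Connectors ordered from loosest to tightest binding.
LEVELS = ["and", "or", "&"]

def parse(query):
    """Parse a query into nested (connector, left, right) tuples."""
    tokens = re.findall(r"&|\w+", query)
    pos = 0

    def parse_level(depth):
        nonlocal pos
        if depth == len(LEVELS):
            # Below the tightest connector: a single operand.
            term = tokens[pos]
            pos += 1
            return term
        node = parse_level(depth + 1)
        while pos < len(tokens) and tokens[pos].lower() == LEVELS[depth]:
            pos += 1
            node = (LEVELS[depth], node, parse_level(depth + 1))
        return node

    return parse_level(0)

first = parse("A and B or C")
second = parse("A & B or C")
```

Here `first` groups as A and (B or C), while `second` groups as (A & B) or C, matching the two definitions.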
The use of more than one operator to connect arguments for the same segment
name is allowed. For example:
$DATE BTN JUN 70 AND JAN 72
$NR EQU X, X, X, . . . , X   or   $NR BTN X, X, X, . . . , X
(where X stands for the known
accession numbers). If the operator is "EQU" then each specified accession
number is set for retrieval. If the operator is "BTN" then the numbers are
expected in pairs and each pair represents a range of accession numbers to be
retrieved.
Output from Data Central is available in several forms. One
feature of the output is the ability to sort the output information as desired.
The primary report capability of Data Central lies in the ability of Data Central
to interface formatting subroutines written in any of the existing procedural
languages or in one generalized subroutine. The system has the capability to
insert, at appropriate points in the data, codes to effect color display on the currently
available color CRT devices. The system also allows the user to skip the
remaining data in the report for one entry and move immediately to another
entry report (paging).
a) Single quotation marks about a word phrase requires the word
phrase to appear in the text searched to effect retrieval. For
example: $ TEXT EQU 'ALUMINUM ALLOY'.
b) - (Wn) - this is exemplified by:
$ PROJECT-STATUS EQU BALLISTIC (W6) MISSILE
With this specified search condition, the two logical components
must occur within six words of each other.
c) (WMn) - This is exemplified by:
$PROJECT-STATUS EQU BALLISTIC (WM6) MISSILE
In this condition, the first phrase component must appear within
the specified number of words in front of (Minus) the position
of the second specified phrase component.
d) (WPn) - this is exemplified by:
$PROJECT-STATUS EQU MISSILE (WP6) BALLISTIC
The first phrase component must appear within the specified number
of words behind the second phrase component.
e) (WPnMn) - or (WMnPn) This is exemplified by:
$PROJECT-STATUS EQU BALLISTIC (WP2M4) MISSILE
In this condition the length of each directional distance is
separately specified (e.g., BALLISTIC must appear within
Plus 2 words or Minus 4 words of MISSILE).
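The positional operators (Wn), (WMn), (WPn), and (WPnMn) can be illustrated with one small Python function. This is a sketch under stated assumptions rather than the Data Central algorithm: the name `proximity` is hypothetical, distance is taken as the difference in word positions, "Minus" is read as the first component preceding the second, and "Plus" as the first component following it.

```python
def proximity(tokens, first, second, plus=0, minus=0):
    """True if `first` occurs within `plus` words after, or `minus`
    words before, some occurrence of `second` in the token list."""
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [j for j, t in enumerate(tokens) if t == second]
    return any(
        (0 < i - j <= plus) or (0 < j - i <= minus)
        for i in first_pos
        for j in second_pos
    )

tokens = "the ballistic missile program".split()

w6 = proximity(tokens, "ballistic", "missile", plus=6, minus=6)      # (W6)
wm6 = proximity(tokens, "ballistic", "missile", minus=6)             # (WM6)
wp6 = proximity(tokens, "missile", "ballistic", plus=6)              # (WP6)
wp2m4 = proximity(tokens, "ballistic", "missile", plus=2, minus=4)   # (WP2M4)
```

With these conventions, all four conditions are satisfied by the phrase "ballistic missile", whereas BALLISTIC (WP6) MISSILE alone would not be, because BALLISTIC does not follow MISSILE.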
The use of a universal character (*) in the argument allows for
variants in spelling, e.g., SM*TH* implies SMITH, SMYTHE, etc. The use
of multiple universal characters appended to a root word allows for root/stem
expansion, e.g., TAX***** implies TAX, TAXABLE, TAXPAYER, TAXPAYERS,
TAXATION, etc.
An on-line tutorial or Computer Aided Instruction (CAI) capability
is available to the user for assistance at the user's option.
2.0 HARDWARE
Data Central operates on any IBM 360/370 equipment under OS or DOS
and supports the following terminal types: TTY, 1050, 2740, 2741, 2260, CC-30.
The recommended minimum hardware configuration to run Data Central is an
IBM 360/40 with 128K core and multiprogramming capabilities. Response time
has been in the neighborhood of 5 to 30 seconds depending upon the load.
3.0 COSTS
THE AMIC SYSTEM
1.0 DESCRIPTION
1.1 INTRODUCTION
The AMIC system is an inverted file information system which is
keyword code oriented. The system operates only in a batch mode, although
on-line capabilities could be added through additional programming. The
system is readily adaptable to numerous applications by appropriate coding
techniques.
Access number-code number units are passed through an auto-
matic hierarchical/synonym generating program which automatically creates
broader hierarchical classes and equivalent synonyms for the original input
keyword, depending on the system designer's specifications for this file.
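The automatic hierarchical/synonym posting step described above can be sketched in Python as follows. The BROADER and SYNONYMS tables stand in for the system designer's specifications; their contents and all function names are hypothetical, since the report does not reproduce the actual AMIC tables or program.

```python
# Hypothetical designer tables: each keyword maps to its next broader
# class, and some keywords have equivalent synonyms.
BROADER = {"FIGHTER": "AIRCRAFT", "AIRCRAFT": "VEHICLE"}
SYNONYMS = {"AIRCRAFT": ["AIRPLANE"]}


def expand_keyword(keyword):
    """Return the keyword followed by its synonyms, its broader classes,
    and the synonyms of those classes, without duplicates."""
    posted = []
    seen = set()
    term = keyword
    while term is not None and term not in seen:
        seen.add(term)
        for candidate in [term] + SYNONYMS.get(term, []):
            if candidate not in posted:
                posted.append(candidate)
        term = BROADER.get(term)  # climb one level in the hierarchy
    return posted


def post(access_number, keyword):
    """Create one (keyword, access number) unit per generated term."""
    return [(term, access_number) for term in expand_keyword(keyword)]


if __name__ == "__main__":
    for unit in post(1041, "FIGHTER"):
        print(unit)
```

Because the broader terms are posted at input time, a later search on AIRCRAFT retrieves a document indexed only under FIGHTER without any extra work at search time.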
An inverted file is established for the particular update run. This
file is in the same format as the main file and can be searched just as the main
file. Since the newly-established inverted file consists only of update material,
this file can be searched to provide current awareness or Selective Dissemination
of Information (SDI) output for the users; this can be done automatically.
The inverted file of update data is added to the previously existing
data base by a sort/merge procedure. Any duplicate entries are automatically
eliminated. To conserve search time, the search file can be partitioned into
large segments, usually by date, so that older material is "semi-retired" and
is searched only on demand. In the AMIC system, five-year increments of the
data base are maintained in the "active" search mode.
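The sort/merge update with duplicate elimination can be sketched as below. The representation of an inverted file as a sorted list of (keyword, access number) pairs and the function names are assumptions for illustration; the actual AMIC file layout is not given in this report.

```python
import heapq


def merge_inverted(main, update):
    """Merge two inverted files, each a sorted list of
    (keyword, access_number) pairs, dropping duplicate entries."""
    merged = []
    for entry in heapq.merge(main, update):
        # Inputs are sorted, so duplicates always arrive back to back.
        if not merged or merged[-1] != entry:
            merged.append(entry)
    return merged


def search(inverted, keyword):
    """Return the access numbers posted under one keyword."""
    return [number for term, number in inverted if term == keyword]


if __name__ == "__main__":
    main_file = [("AIRCRAFT", 101), ("TRAINING", 102)]
    update_file = [("AIRCRAFT", 101), ("AIRCRAFT", 205), ("SIMULATOR", 205)]
    # Searching the update file alone gives current-awareness (SDI) output.
    print(search(update_file, "AIRCRAFT"))
    print(merge_inverted(main_file, update_file))
```

Since the update file has the same format as the main file, the same search routine serves both, which is what makes the SDI output essentially free.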
2.0 HARDWARE
The AMIC system operates on the CDC 6600 utilizing the System
Indexed Sequential (SIS) file structure and the CDC 6600 operating system
(Scope 3.3). This system is maintained at the Aeronautical Systems Division
(ASD) computer facility. The minimum core capacity would consist of about
165K (Octal).
3.0 COSTS
The costs of using the AMIC system would be minimal if used on
the ASD Computer Center. Arrangements would have to be made with the
Center to operate the system. It is estimated that an initial investment of
about $10,000-$25,000 would be required to design the system and to initiate
the file structures required; the cost would depend on the complexity of the
system. It is assumed that the system design and start-up could be accom-
plished under contract. System maintenance, updating and searching would
cost about $500 per month.
4.0 CONCLUSION
We find the following advantages with the AMIC system:
1. This system is undoubtedly the lowest cost automated system
available.
2. There is a competent local supplier.
3. The system has proved highly effective, even for rather complex
strategies.
4. Thesaurus generating software is provided with the system.
5. The system is one of the few available with automatic
hierarchical posting capabilities.
6. SDI output can be provided directly with each update.
7. The files are completely editable.
8. The cut-off feature simulates to some degree on-line
interaction.
We find the following disadvantages:
1. Only batch mode is currently available.
2. Manual assignment of authorized keywords (indexing) is
required.
3. There is a limited number of keyword/data item elements
available, although this restriction is not severe (10 million).
4. The searching must be performed by manual selection of
keywords from the authorized keyword/data item listings;
a subsequent search term linking by Boolean operators for
a basically one-time search is then required.
5. Only Boolean logic is available.
BASIS-70
1.0 DESCRIPTION
1.1 INTRODUCTION
it is imperative that the user have rapid access to them. The LIST option assures
this capability. The format of this command is to type the word LIST followed by
ALL or the line numbers desired separated by commas.
For example:
/LIST 1,3,4
or
/LIST ALL
The RESTART command is used to switch to a different information
or data base without going through the LOGIN procedure a second time.
For example:
/RESTART
The result is that all previous statements, index terms, logic combinations,
and commands are erased and the user may start with Line Number 1
for whichever file is desired.
Current developments that Battelle is reviewing with regard to
BASIS-70 are:
a) On-line updating, purging, and editing of files so that qualified
users can conduct their own file maintenance.
b) Development of an interface which will permit simultaneous
interaction with computer files and a microform storage device
from a single CRT terminal. Random access to indexes stored
on high-speed disks and linked to archival data stored on micro-
form media will significantly reduce the overall costs of operating
and maintaining massive information files in an on-line mode.
c) On-line generation of a display of graphic data.
2.0 HARDWARE
BASIS-70 is operated on Battelle's CDC 6400 computer via the INTERCOM
Timesharing Operating System and supports teletype speed devices or CRT
devices. Response time has usually been less than 5 seconds.
3.0 COSTS
The rental costs of BASIS-70 are $1,350/mo. for 4 hours per day service
and $2,000/mo. for all-day service. These prices assume a 25 million character
data base.
The system could be installed on the WPAFB CDC-6600 for
approximately $25,000-$35,000. Costs to load a data base range from $10,000 to
$12,000, and updating the data base costs between $250 and $300 per million characters.
4.0 CONCLUSIONS
We find the following advantages with BASIS-70:
1. There is a near-local competent supplier.
2. The system has on-line and batch capability.
3. Response times are good.
RIQS
1.0 DESCRIPTION
1.1 INTRODUCTION
( 7) HARDER E. L.
( 8) WESTINGHOUSE ELECTRIC CORP.
( 9) MANSFIE
(10) THE ONWARD SWEEP OF AUTOMATIC PROCESSING
OF INFORMATION IS IMPEDED BY NINE
PRINCIPAL BARRIERS
%STOP-stops the current job and displays CONTROL CARDS ?; addi-
tional cards then can be typed in, and they will be appended to the control
card record.
%START-is used in conjunction with %STOP to reinstate job processing
at the point where the job was stopped.
?BEFORE SEARCH PRINT "please return this output to F. Scheffler"
BEGIN SEARCH
IF #8 OR #13 ("LANGUAGE OR LANGUAGES") THEN PRINT RECORD.
2.0 HARDWARE
3.0 COST
4.0 CONCLUSION
3. The SPSS package is a very impressive feature, but we
question whether or not AFHRL would find it useful for
its particular applications.
4. Free text searching on a variety of data types is provided.
We find the following disadvantages:
1. The main disadvantage with RIQS is the fact that it is
now only available on the Northwestern University ma-
chine and has been developed specifically for their soft-
ware; it would be very difficult to convert to some other
system.
2. The system does not permit very large data bases to be
handled.
3. The system does not provide inverted file structures and
therefore large data bases are not conveniently handled.
4. The supplier is not local.
5. Updating is done only in the batch or background mode.
(This could be an advantage by precluding unauthorized
individuals.)
Although RIQS appears to be a good system, in view of the
above factors, we feel this system does not warrant further consideration.
DIALOG
1.0 DESCRIPTION
1.1 INTRODUCTION
1.3 USER INTERACTION
a Boolean expression, which requires that all retrieved items contain the
key words in the relationship desired. For example, the user may ask for
the index term to be equal to sets No. 1 and No. 3 (IT=1*3), where the * is the
logical connector AND. The other connectors are +, which is the logical OR,
and - which is the logical NOT.
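The set combination described above can be sketched in Python. The sketch assumes that numbered sets hold accession numbers and that an expression is evaluated strictly left to right with no operator precedence; the report does not state how DIALOG orders mixed operators, and the function name is hypothetical.

```python
import re


def combine(expression, sets):
    """Evaluate an expression such as '1*3' over numbered result sets.

    * is AND (intersection), + is OR (union), - is NOT (difference).
    Assumption: evaluation is left to right with no precedence.
    """
    tokens = re.findall(r"\d+|[*+-]", expression)
    result = set(sets[int(tokens[0])])
    for operator, number in zip(tokens[1::2], tokens[2::2]):
        other = sets[int(number)]
        if operator == "*":
            result &= other
        elif operator == "+":
            result |= other
        else:
            result -= other
    return result


if __name__ == "__main__":
    # Hypothetical sets of accession numbers built by earlier selections.
    numbered_sets = {1: {10, 11, 12}, 2: {12, 13}, 3: {11, 12, 14}}
    print(sorted(combine("1*3", numbered_sets)))
    print(sorted(combine("1+2", numbered_sets)))
    print(sorted(combine("1-3", numbered_sets)))
```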
Once the desired literature references are displayed on the
terminal, the user may desire to PRINT the information, or he can modify
his search expression and continue the search.
2.0 HARDWARE/SOFTWARE
3.0 COST
4.0 CONCLUSION
ORBIT II
1.0 DESCRIPTION
1.1 INTRODUCTION
1.3 USER INTERACTION
The PRINT command causes the program to print out information
in either the on-line or off-line mode with format options.
Although ORBIT II could be used for retrieval in a batch processing
mode, SDC does not recommend such use, since it does not utilize all
of the special interactive features carefully designed into the system.
2.0 HARDWARE/SOFTWARE
3.0 COST
4.0 CONCLUSION
NASIS
1.0 DESCRIPTION
1.1 INTRODUCTION
1.3 USER INTERACTION
1.4 SPECIAL FEATURES
The SAVE command is used to save the current screen image
appearing in the output data of the terminal display screen. This information
is stored in another special set and the user can at any time request output
of the contents of this set.
The user can use the RESTART command to restore the re-
trieval system to the point in his strategy that was being executed either
when the system crashed or when he was forced off the system. The com-
mand strings comprising the then-current strategy are saved. Each com-
mand string is retrieved and re-executed individually until the 'save current
strategy' is exhausted; the user can then continue his strategy.
The NASIS system provides a means of generating reports
which allows the user to format column headings, report titles, report pagination,
arithmetic sums for any field name, and the numeric tally for any
field name. This special feature is called the report generator and allows
the user to produce an overall report in a very flexible manner.
The statistical capability provided within the NASIS system
serves multiple purposes. Retrieval statistics consist of the connect time
and central processing unit (cpu) time, the total strategy length, the number
of strategies currently stored, the data set names of the stored strategies,
the total number of terminal sessions and the date of the first and last
terminal session.
The retrieval of statistics consists of two reports. The first
report contains the activity of the EXPLEX, the average length of the strategy
per session, the total connect time and the total cpu time, the number of
strategies currently stored, the average number of times COMMAND was
invoked per session and the total number of terminal sessions. The second
report contains the total number of transactions per maintenance run, the
total records currently in the dataplex anchor file, the average percentage
of the file affected by the maintenance runs, the frequency of maintenance
runs, and statistics per maintenance run of add, delete and update
transactions.
The most significant special feature of the NASIS system is
that it is a highly sophisticated data base manipulation language which
encompasses the realm of information storage and retrieval as well as
management information systems. Therefore it has great potential for any general
purpose functions. The system is more sophisticated than is required for the
envisioned AFHRL applications that we are investigating.
2.0 HARDWARE
The NASIS system currently operates on an IBM 360/67 under
the Time Sharing System (TSS). However, a conversion of the system is
being made to operate on any 360/370 under OS by January 1, 1973.
The system currently supports the CC-30 and IBM 2741's as
remote terminals to the system. At this time it is estimated that the present
conversion to OS will require one megabyte of core. It is being written
in PL/1 and assembly language and will support 20 simultaneous users.
3.0 COST
4.0 CONCLUSION
2. To build a data base, a trained PL/1 programmer is
required to get the job done.
3. The system may be too comprehensive and therefore
the cost to operate it may be high. The system might
be somewhat cumbersome to operate.
4. Updating is done only in batch mode. (This could be an
advantage by precluding unauthorized individuals.)
QUERY/UPDATE
1.0 DESCRIPTION
1.1 INTRODUCTION
1.3 USER INTERACTION
2.0 HARDWARE/SOFTWARE
QU is designed for the Control Data CYBER 70/Models 72,
73 and 74 and 6000 Series Computer Systems. The control system program
named INTERCOM is required in order to permit time sharing use of the
system and to provide terminal access. The amount of core required
to use QU has not yet been determined since the software is currently under
development.
3.0 COST
STAIRS
1.0 DESCRIPTION
1.1 INTRODUCTION
beginning of each document in the Text data set, identifies the document by
number, and contains, in formatted fields, text extracted from the document.
Words or paragraphs that are considered insignificant are referred to as
stop words or stop paragraphs. They are furnished to the program through
the STOP data sets.
For each significant word, the Inverted data set requires a
list of its occurrences. The Text File Build program generates the Inverted
data set and the Dictionary data set and puts the output onto disk storage.
In summary, for each data base in the system, four files are
automatically created: a Dictionary file, which contains every unique significant
word (i.e., words other than "and," "the," etc.) in that data base, along
with synonyms, the number of occurrences, and the number of documents;
an Inverted file, containing pointers (document number, paragraph number,
sentence number, and word number) to every occurrence of each word; a
Text Index file, which contains control and security information
and a pointer to the text; and the Text file itself.
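A small Python sketch of how a Dictionary file and an Inverted file of this kind might be built from text is shown below. It is not the STAIRS Text File Build program; the stop-word list, the sentence-splitting rule, the decision to count stop words when numbering word positions, and all names are assumptions made for illustration. Synonyms, the Text Index file and security information are omitted.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; STAIRS reads its own from the STOP data sets.
STOP_WORDS = {"A", "AND", "OF", "THE"}


def build_files(documents):
    """Build (dictionary, inverted) from {doc_number: [paragraph, ...]}.

    inverted maps each significant word to a list of pointers
    (document, paragraph, sentence, word); dictionary records the number
    of occurrences and the number of documents for each word.
    """
    inverted = defaultdict(list)
    for doc_no, paragraphs in documents.items():
        for para_no, paragraph in enumerate(paragraphs, start=1):
            sentences = [s for s in re.split(r"[.!?]", paragraph) if s.strip()]
            for sent_no, sentence in enumerate(sentences, start=1):
                words = re.findall(r"[A-Za-z]+", sentence)
                for word_no, word in enumerate(words, start=1):
                    word = word.upper()
                    if word not in STOP_WORDS:
                        pointer = (doc_no, para_no, sent_no, word_no)
                        inverted[word].append(pointer)
    dictionary = {
        word: {
            "occurrences": len(pointers),
            "documents": len({doc for doc, *_ in pointers}),
        }
        for word, pointers in inverted.items()
    }
    return dictionary, dict(inverted)


if __name__ == "__main__":
    sample = {
        1: ["The pilot trains. Training of the pilot continues."],
        2: ["A pilot flies."],
    }
    dictionary, inverted = build_files(sample)
    print(dictionary["PILOT"])
    print(inverted["PILOT"])
```

Storing the sentence and word numbers in each pointer is what allows word-distance conditions to be checked from the Inverted file alone, without reading the text.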
STAIRS data bases can be updated off-line by means of the
Data Base Merge program. Up to four data bases can be merged by com-
bining corresponding data sets and reorganizing them. A special feature of
this facility provides for the removal of documents in each of the data bases.
This group of data base utility programs that accepts the output of TEXT-
PAC is known as A Query and Retrieval Interactive Utility System (AQUARIUS).
AQUARIUS precludes unauthorized access to data bases. Privacy is
protected for authorized users at the data base level and at the
document, paragraph and field levels.
1.3 USER INTERACTION
1.4 SPECIAL FEATURES
The HELP function prompts the user whenever he is unsure of
functions or command format. This function provides tutorial information
when the user signs on to the system. If the user requests HELP at this
time the system provides a description of the available functions and commands.
The user is prompted to ask for more information or to proceed
under normal mode.
During processing, the user can invoke the HELP function in
any situation. The HELP processor shows the user the area in which he
requires assistance.
2.0 HARDWARE/SOFTWARE
3.0 COST
4.0 CONCLUSION
2. The system is written in assembler and PL/1 and is
therefore not transferrable to any system other than
an IBM 360/370 without conversion.
3. It must run under CICS, also making it not transferrable.
4. The minimum hardware requirement is a 360/40 with
256K main memory. This appears to be expensive.
The STAIRS system provides many capabilities that AFHRL
would probably not use at this time; however, this certainly is a system
that should be considered again.
APPENDIX B
USER'S MANUAL FOR THE MODEL DATA BASE
USER'S MANUAL FOR THE AFHRL MODEL DATA BASE
TABLE OF CONTENTS
Section
B-1 INTRODUCTION
B-1.1 DESCRIPTION
B-1.2 TERM DEFINITIONS
B-2.2 COMMUNICATING
B-2.3 REQUESTING A SEARCH
B-1 INTRODUCTION
B-1.1 DESCRIPTION
MESSAGE OPTION - The choice of communication with the
computer which is selected by the user. The long form (which is used
throughout this manual) is for less experienced users. More experienced
users normally will select the cryptic short form to expedite the query.
ACRONYM - Abbreviations used for the names assigned to
individual files and segments of files.
SEGMENT - A section of a file; corresponds to a block or
field on a standard form. Sometimes called a 'data element'.
OPERATOR - A computer term which tells the computer the
manner in which a query is to be made, i.e., find documents which contain
the search subject, or discard documents which contain specified terms to be
negated.
ARGUMENT - The search subject, i.e., the term, phrase or
value within a segment to be searched. The specific argument should
correspond to data which could be expected to be found in the segment(s),
e.g., an author's name within the AUTHOR segment. For segments containing
text, words or word phrases serve as arguments. In most cases, the
argument is entered in plain English. The computer has a subprogram which
converts plurals to singulars and searches on both variations of a word, i.e.,
'studies' will retrieve both 'study' and 'studies'.
$ - A computer instruction which tells the computer that the
word immediately following (segment acronym) is where to look for the search
subject (argument).
CONNECTOR - A Boolean AND (intersection) or OR (union)
which instructs the computer how the elements of the search are to be com-
bined.
OUTPUT - Refers to those sections of the documents identified
as the result of a search which will be presented on the display or printed at the
central facility. Six options are currently available, as indicated in the
instructions, Paragraph 2.5.1. For all options the Avionics Central Accession
number is always presented.
SEQUENCE - The process of arranging the retrieved documents
in some prescribed order prior to display. The Avionics Central system
currently permits five hierarchical levels of sequencing, which is adequate
for most purposes.
ACCESSION NUMBER - A unique number assigned to each
individual document stored in the system. Used internally by the computer
to identify documents, and by the user to identify specific documents.
SORT LENGTH - In the sequencing or sorting process, the
computer arranges the retrieved documents in order, character by
character, on the information in the selected segment. In general, the sort
length is chosen by the user as the minimum number of characters which will
put the documents in the proper sequence. For short segments, a low number
is specified; for long segments, a large number is provided.
MODE - In context with this manual, mode refers to whether
the user wants the documents sequenced in ascending or descending order.
For alphabetical segments, they will be in alphabetical order. For numerical
segments, they will be in numerical order.
QUERY - Synonymous with 'Request'.
RECURSIVE COMMAND - Commands by which the user may
interrupt the normal sequence of interaction. They permit him to skip
forward or backward, change style of output, change the span of keywords-
in-context, etc., and other 'goodies' which give him extreme flexibility in
manipulating the response of the computer. These commands are available
only with CRT terminals.
DIAGNOSTIC MESSAGES - Whenever the user makes an error,
or some system limit is exceeded, the computer presents a diagnostic message
to the user. All messages are followed by a 3-digit number in parentheses,
i.e., (xxx). In the event that the user does not understand the
diagnostic message, he can invoke a tutorial message which explains the
diagnostic message; the user can usually then determine what action to take
next.
TUTORIAL MESSAGES - The Avionics Central system has a
built-in series of tutorial messages whereby, if a user is confused as to
what he should do in response to a given message from the computer, all
he has to do is enter the Recursive Command $$WHAT and the computer
will display a complete explanation of what he is required to do. By
pressing the End, Reset, and Transmit keys or the control key and
key simultaneously for a teletype terminal, the computer returns to the
previous point at which a user response was required and permits him to
respond.
CENTRAL (000)
RESPONSE
B-2.2 COMMUNICATING
MESSAGE
ENTER FILE, MESSAGE OPTION. (040)
RESPONSE
afhrl, lkwic (cntrl s)
B-2.3 REQUESTING A SEARCH
B-2.3.1 General Procedure
MESSAGE
EXAMPLE
$author equ eckstrand, g (cntrl s)
(look in author segment) (find) (the author name Eckstrand, G)
B-2.3.2 AFHRL Segments
For the AFHRL file, the segments are listed as
follows:
SEGMENT NO.  SEGMENT ACRONYM  FULL HEADING
1            ACNUM            AVIONICS CENTRAL NUMBER
2            UPDATE           LAST DATE OF UPDATE
4            ORIGACT          ORIGINATING ACTIVITY
8            CLASS            REPORT SECURITY CLASSIFICATION
12           DECLASS          REPORT DECLASSIFICATION CODE
16           TITLE            REPORT TITLE
20           DNOTES           DESCRIPTIVE NOTES
24           AUTHOR           REPORT AUTHOR
28           DATE             REPORT DATE
32           PAGES            NUMBER OF PAGES
36           REFERENCES       NUMBER OF REFERENCES
40           CONTR/PROJ       CONTRACT, PROJECT, TASK, WORK UNIT
62           CITN             CITATION
63           ABSTRTYP         ABSTRACT TYPE AND DESCRIPTION
64           ABSTRACT         ABSTRACT
65           CONCL            CONCLUSIONS
66           EVAL             EVALUATION
68           KEYWORDS         KEYWORDS
69           UNIQWDS          UNIQUE KEYWORDS
72           SYMBOL           LOCATION SYMBOL
76           SAFE/CAB         LOCATION FILE
80           COMMENTS         COMMENTS
100          RSCHMETH         RESEARCH METHOD
104          APPAR/MEDI       APPARATUS/MEDIA USED
108          SUBJPOPN         SUBJECT POPULATION
112          INDVAR           INDEPENDENT VARIABLE
116          DEPVAR           DEPENDENT VARIABLE
120          MEAS/STAT        MEASUREMENT/STATISTICAL METHODS
EXAMPLE
$title equ instruction
THE COMMAND $ANY MEANS TO LOOK IN ALL SEGMENTS
EXAMPLE
B-2.3.5 Connectors
CONNECTORS PERMIT VARIOUS MULTIPLE
LEVELS OF SEARCHING FOR A GIVEN REQUEST
Example
$title equ instructor and $abstract equ motivation (cntrl s)
(THE WORD INSTRUCTOR MUST APPEAR IN TITLE,
AND THE WORD MOTIVATION MUST APPEAR IN
ABSTRACT SEGMENT)
Example
$title equ training or instruction or education (cntrl s)
IF ANY ONE OF THE THREE SPECIFIED WORDS
TRAINING, INSTRUCTION, OR EDUCATION APPEARS
IN THE TITLE, RETRIEVAL WILL RESULT
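The way connectors combine segment searches can be sketched in Python. This is only an illustration of the logic in the two examples above, not the Avionics Central implementation: the function names are hypothetical, clauses are assumed to be evaluated left to right, an argument without its own $segment is assumed to reuse the previous segment, and the plural-to-singular conversion is left out.

```python
import re


def parse_query(query):
    """Split a query into (connector, segment, argument) clauses."""
    clauses = []
    segment = None
    connector = "or"  # the first clause is simply OR-ed onto an empty result
    for part in re.split(r"\s+(and|or)\s+", query.strip().lower()):
        if part in ("and", "or"):
            connector = part
            continue
        match = re.fullmatch(r"\$(\S+)\s+equ\s+(.+)", part)
        if match:
            segment, argument = match.group(1), match.group(2)
        else:
            argument = part  # assumption: reuse the previous segment
        clauses.append((connector, segment, argument))
    return clauses


def document_matches(document, query):
    """Return True if the document (segment acronym -> text) satisfies
    the query, combining clauses left to right."""
    result = False
    for connector, segment, argument in parse_query(query):
        hit = argument in document.get(segment, "").lower().split()
        result = (result and hit) if connector == "and" else (result or hit)
    return result


if __name__ == "__main__":
    report = {
        "title": "Instructor Training Methods",
        "abstract": "Effects of motivation on learning",
    }
    print(document_matches(
        report, "$title equ instructor and $abstract equ motivation"))
    print(document_matches(
        report, "$title equ training or instruction or education"))
```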
Example
B-2.4 MODIFYING A SEARCH
MESSAGE
modify (cntrl s)
MESSAGE
RESPONSE
EXAMPLE
XXX ANSWERS, HOW DO YOU WANT TO PROCESS THEM:
REPLY
RESPONSE
RESPONSE
99, c (cntrl s)
ENTER SEGMENTS TO BE DISPLAYED IN ANSWERS
1,16 (cntrl s)
EXAMPLE
00, P (cntrl s)
ALL SEGMENTS FOR ALL RETRIEVED DOCUMENTS WILL
BE PRINTED OFFLINE AND MAILED TO THE USER
B-2.5.2 Activating Display
PRIOR TO DISPLAY, THE COMPUTER SENDS THIS MESSAGE:
MESSAGE
SET PAPER
PRESS SPACE BAR TWICE
PRESS CONTROL AND X-OFF(S) KEYS SIMULTANEOUSLY
WHEN ALL DATA HAVE BEEN DISPLAYED, COMPUTER
SENDS THIS MESSAGE
MESSAGE
MESSAGE
AFHRL, LKWIC
IS THE CURRENT FILE, MESSAGE OPTION. ANSWER
YES TO CONTINUE OR ENTER FILE, MESSAGE OPTION
RESPONSE