- Ingwersen, P. (2002). Cognitive perspectives of document
representation. In: Fidel, R., Bruce, H., Ingwersen, P. and Vakkari, P. (eds.)
Emerging Frameworks and Methods - Proceedings of the Fourth International
Conference on Conceptions of Library and Information Science (CoLIS 4),
Seattle, July, 2002. Colorado: Libraries Unlimited, 2002, 285-300.
The paper reviews and analyses the cognitive conception of polyrepresentation or multi-evidence applied to information retrieval. Three types of aboutness are discussed, i.e., author, indexer, and user aboutness, as well as isness of information objects, that is, other forms of metadata, also serving as document features. The assumption that highly relevant objects are found in the retrieval overlaps of cognitively and functionally different origin is analysed with reference to performed empirical tests, and the utility of clustering of objects by complex representations for navigation or visualisation purposes is briefly analysed.
- Larsen, B. (2002). Exploiting citation overlaps for Information Retrieval:
generating a boomerang effect from the network of scientific papers. Scientometrics,
2002, 54(2), p. 155-178.
A new citation search strategy is proposed for Information Retrieval (IR)
based on the principle of polyrepresentation (Ingwersen, 1992, 1996). The
strategy exploits logical overlaps between a range of cognitively different
interpretations of the same documents in a structured manner, i.e. so-called
cognitive overlaps of representations. The strategy is essentially a ‘cycling
strategy’ starting with documents retrieved by a subject search, wherefrom
new documents are identified automatically by following the network of
citations in scientific papers backwards and forwards in time. In contrast
to earlier citation search strategies the proposed strategy does not require
known relevant documents (seed documents) as a starting point, but may be
based on a subject search. A pilot study is reported where the ability of
the strategy to retrieve additional relevant documents is analysed. Results
show that a very large amount of documents can be retrieved by the strategy,
and that these may be segmented in a number of distinct ‘overlap levels’.
It is demonstrated that the combined core of the higher-level overlaps
contains higher relevance density than found in the original retrieval
results. Based on these results it is suggested that the documents be
displayed in order of their presence in higher-level overlaps, so as to
maximise the chances that as many relevant documents as possible will be
presented first to a user.
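The suggested display order can be illustrated with a minimal sketch. It assumes, purely for illustration, that a document's overlap level is the number of retrieved sets it occurs in; the paper's own definition of the levels may differ, and the function name and sample document ids are hypothetical.

```python
from collections import Counter


def rank_by_overlap_level(result_sets):
    """Order documents by how many result sets they occur in.

    Assumption: 'overlap level' is taken to be this simple count,
    which is an illustrative stand-in for the levels in the paper.
    """
    counts = Counter()
    for docs in result_sets:
        counts.update(set(docs))
    # Highest overlap level first; ties broken by document id for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sets, e.g. a subject search plus two citation-based sets.
sets = [
    {"d1", "d2", "d3"},
    {"d2", "d3", "d4"},
    {"d3", "d5"},
]
ranking = rank_by_overlap_level(sets)
```

Here `ranking` lists each document id together with the number of sets that contain it, with the documents found in the most sets placed first.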
- Schneider, J. & Borlund, P. (2002).
Preliminary study of the
potentiality of bibliometric methods for the construction of thesauri. In: Fidel, R., Bruce, H., Ingwersen, P. and Vakkari, P. (eds.)
Emerging Frameworks and Methods - Proceedings of the Fourth International
Conference on Conceptions of Library and Information Science (CoLIS 4),
Seattle, July, 2002. Colorado: Libraries Unlimited, 2002, 151-165.
The paper presents the overall idea of how bibliometric methods may be
applied to thesaurus construction as a supplement to intellectual and manual
construction and maintenance processes.
The paper reports on the initial experiment of the bibliometric based
creation of a text corpus from which candidate thesaurus terms can be
extracted and relationships uncovered. The results are promising as to
the possibility of creating a valid sample of overlapping documents by use
of the data set isolation method (Ingwersen & Christensen, 1997).
- Björneborn, L. and Ingwersen, P. (2001).
webometrics. Scientometrics, 50(1), p. 65-82.
Since the mid-1990s a new research field, webometrics, has emerged, investigating the nature and properties of the Web by drawing on modern informetric methodologies. The article attempts to point to selected areas of webometric research that demonstrate interesting progress and space for development as well as to some currently less promising areas. Recent investigations of search engine coverage and performance are reviewed as a frame for selected quality and content analyses. Problems with measuring Web Impact Factors (Web-IF) are discussed. Concluding the article, new directions of webometrics are outlined for performing knowledge discovery and issue tracking on the Web, partly based on bibliometric methodologies used in bibliographic and citation databases. In this framework graph theoretic approaches, including path analysis, transversal links, 'weak ties' and 'small-world' phenomena are integrated.
- Ingwersen, P. (2001). Cognitive Information
Review of Information Science and Technology, vol. 34, p. 3-52.
This chapter reviews and discusses critically the development during the last decade of the cognitive approach to information retrieval (IR) research and theory. The focus is analytic and empirical research on the complex nature of information need formation and situation, their inherent association with the concept of relevance, and the development of cognitive and related IR theory and evaluation methods. The time span is largely 1992-2000 (references to earlier works are provided as needed). Thus, the review complements and extends the previous ARIST chapters on cognitive research (ALLEN, 1991) and the user-oriented perspectives of IR research and analysis methods (SUGAR).
Since its start in 1977, the cognitive approach to information science has developed in two periods. The first covers 1977-1991 and can briefly be characterized as user- and intermediary-oriented. The second period is 1992-2000 (the major concentration of this chapter), when the approach turns into a holistic view of all the interactive communication processes that occur during information transfer.
Following the introduction, the review falls into five major sections. The first section highlights the scientific developments, characteristics, and substantial results of the cognitive approach in the first period. This section also includes drawbacks and criticisms of the approach, that is, the lack of realism, theory integration, and holistic perspective. This is followed by a section covering the development in the second period about the focus shift into a holistic cognitive view, with a subsection on views of information processing. The third section covers information structures, with a subsection on information need. The fourth section looks into the dimensions of cognitive IR theory in a holistic perspective. Subsections concern polyrepresentation of information objects, the cognitive space and IR interactions, and relevance and evaluation issues. This section includes work task conceptions and issues concerned with feedback and query modification. The fifth section approaches the integration of cognitive models of information seeking, IR, and scientific communication, including a discussion of critical issues. A concluding section ends the review.
- Ingwersen, P. (2001). Users in
context. In: Agosti, M., Crestani,
F. and Pasi, G. (eds.) Lectures on Information Retrieval. Bonn: Springer
Verlag, 2001, p. 157-178 (Lecture Notes in Computer Science: 1980).
Users as actors in interactive information retrieval (IIR) are seen in the contexts of their perceived work tasks and information seeking behaviour. The paper models IIR processes by demonstrating a variety of approaches, ranging from Ingwersen's cognitive communication model for IR interaction, through Saracevic's stratified model, which includes a typology of relevance conceptions, to Borlund's model of work task perception, information need development and relevance assessments. Other associated models and perspectives of IIR are discussed when appropriate to the major focus points of the contribution: information need development and typology; understanding of relevance in IIR; and experimental problems in IIR.
- Ingwersen, P., Noyons, E. and
Larsen, B.: Mapping
national research profiles in social science disciplines. Journal of
Documentation, 2001, 57(6), p. 715-740.
The paper investigates the advantages of graphical mapping of national
research publication and citation profiles from scientific fields in order
to provide additional information with respect to research performance. By
means of multi-dimensional scaling techniques national social science
profiles from seventeen OECD countries and two periods, 1989-1993 and
1994-1998, are mapped, each profile represented by a vector of either
publication volumes or citation values for nine social science fields. Aside
from demonstrating the developments of publication volumes and citedness
ranges as well as patterns, the graphical maps display clusters and
similarities of national profiles over time. Combined with international
rankings of averaged national impact factors (NIF) relative to the average
world impact of field (WIF) for the same number of fields and periods, the
graphical display supplies additional otherwise concealed information of the
differences in research patterns between countries - even when the NIFs are
quite similar. The analyses show that low Pearson correlation coefficients
can be applied to flag extraordinary instances of either high or low
national citation impacts during a period. Most importantly, the graphical
maps make a strong case for adjusting or tuning the baseline impact to the
actual national publication profiles when comparing NIFs of different
countries. A new indicator, the Tuned Citation Impact Index (TCII) is
proposed. It is constructed from the amount of expected citations a country
ought to have received in each research field aggregated over its true
profile. Common baseline profiles, like those of the world or EU, are
consequently not regarded as the ideal benchmark. In the case illustrated by
the journal publications of the social sciences the paper verifies the
hypothesis that a dominant central cluster exists consisting of the large
Anglo-American countries: USA, Canada and the UK. A further hypothesis, that
the smaller northern EU countries with English as the second language are
located together and close to the central cluster on the publication maps is
only partly satisfied in the second period. A third hypothesis, that
countries located near the central cluster on the citation maps may hold
high(er) NIFs is falsified.
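The idea of tuning the baseline to a country's own publication profile can be sketched as follows. The abstract only states that the TCII is constructed from expected citations aggregated over the true national profile; treating the index as the ratio of observed to expected citations is an assumption made here for illustration, as are the function names, field names, and all numbers.

```python
def expected_citations(pubs_by_field, world_cites_per_pub):
    """Citations a country would receive if each of its papers were cited
    at the world average rate of its own field."""
    return sum(n * world_cites_per_pub[field] for field, n in pubs_by_field.items())


def tuned_impact(pubs_by_field, cites_by_field, world_cites_per_pub):
    """Observed citations divided by profile-weighted expected citations.

    Assumption: the observed/expected ratio is an illustrative reading of
    the abstract, not the paper's confirmed formula.
    """
    expected = expected_citations(pubs_by_field, world_cites_per_pub)
    if expected == 0:
        raise ValueError("expected citations are zero; index is undefined")
    return sum(cites_by_field.values()) / expected


# Hypothetical two-field profile.
pubs = {"economics": 100, "sociology": 50}
cites = {"economics": 220, "sociology": 30}
world_rate = {"economics": 2.0, "sociology": 1.0}

index = tuned_impact(pubs, cites, world_rate)
```

In this made-up case the country receives exactly the 250 citations its profile would predict, so the index equals 1; under this reading, a value above 1 would indicate impact above what the profile predicts.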
- Larsen, B.: Polyrepræsentation som princip for indeksering og
genfinding af videnskabelige fuldtekstdokumenter [Polyrepresentation as
principle for indexing and retrieving scientific full text documents]. In: Biblioteksarbejde,
2001, no. 62: 15-26
The article describes the inspiration and motivation behind the author's
PhD project and the methodology to be used. The main purpose of the project
is to carry out empirical tests of the principle of polyrepresentation as
put forward by Ingwersen in 1996. With the cognitive viewpoint in
Information Retrieval as theoretical framework the idea is to work with a
large number of representations of the same documents. By utilizing overlaps
between these poly-representations it is hoped that the uncertainty and
inconsistency inherent in IR can be reduced and better performing IR systems
designed. The article outlines the research questions as well as the
methodology to be used in the project.
- Larsen, B. and Ingwersen,
Synchronous and diachronous
citation analysis for Information Retrieval - generating a boomerang effect
from the network of scientific papers. In: Davis, M. and Wilson, C.S.
(eds.), Proceedings of the 8th International Conference on Scientometrics
& Informetrics, ISSI2001, Sydney, July 16-20, 2001. Sydney: The
University of New South Wales, 2001, Vol. 1: 355-368.
The paper describes a new set of citation search strategies for
Information Retrieval (IR) purposes based on the principle of
polyrepresentation (Ingwersen, 1996) exploiting overlaps between a range of
representations of the same documents in a structured manner. In contrast to
earlier strategies these do not require known relevant documents (seed
documents) as a starting point, but are based on a subject search. A pilot
study is reported where the performance of the strategy is compared to a
normal subject search. The proposed strategy does not outperform the regular
search, but results are good enough to indicate that it would be interesting
to carry out further investigations of the approach.
- Ingwersen, P. and Christensen,
F.H. (1997). Data set isolation for bibliometric online analyses of
research publications: Fundamental Methodological Issues. Journal of the
American Society for Information Science, 48(3), 205-217.
The aim of the article is to emphasize and illustrate the retrieval dimensions of data collection activity online and their influence on the research evaluation outcome. The attempt is to reinforce the link between online retrieval and bibliometrics. Given that various forms of publication counts and citation analyses provide a valuable and revealing quantitative starting point for more qualitative indications and assessments of Science and Technology (S&T) performance, it is evident that their reliability and objectivity must be undisputed as far as possible. The article discusses the basic problems and limitations inherent in online bibliometric data collection and analyses, and points to possible solutions by means of illustrative case studies and examples. The reason for performing local publication analyses online often arises because of the increased use of external research assessments made by centralized bodies. For small institutions in small countries, like the North European ones, such self-analyses may in addition provide valuable and inexpensive insights into novel S&T niches to explore. The major concern is the extent to which online bibliographic and domain dependent databases, as a supplement to the Institute for Scientific Information (ISI) citation files, are suitable for quantitative analysis and mapping of R&D outcome. By merging these two different types of databases into a single cluster, the method of duplicate removal becomes crucial.
The article introduces a novel removal procedure by describing and exemplifying the principle of Reversed Duplicate Removal (RDR). RDR enables the analyst to take control of the location of the duplicates and to perform tailored analyses of the overlap of identical documents between files. It is well known that the databases themselves present obstacles directly associated with the process of performing online retrieval of the information necessary for further analysis. Problems encountered are, for instance, poor or inconsistent subject indexing within a single database or among several databases. Name form inconsistencies as to authors, institutions, and journals, the lack or inaccessibility of vital data in the database structures, etc., also present obstacles. On the other hand, comprehensive online bibliometric analyses are in many ways easier, faster, and less expensive to perform locally than those made using the independent CD-ROM versions of the relevant databases. In contrast to the online versions, the CD-ROM systems demonstrate a vital shortage of robust data processing and manipulation facilities. The downloading of records from a variety of CD-ROM files, the cleaning-up process, and the ensuing data processing activities become cumbersome and resource demanding. Regardless of database versioning, the degree of awareness of these retrieval and set isolation factors, such as the relevant search commands, syntax, and the analysis assumptions on the part of the analyst, plays an important role for the quality of the analysis outcome.
- Borlund, P. (2000a). Evaluation
of interactive information retrieval systems. Åbo: Åbo Akademi
University Press, 2000. 276 p. Doctoral dissertation: Åbo Akademi University.
The present dissertation work concerns the development of an alternative
approach to the evaluation of interactive information retrieval systems (IIR
systems). The approach is alternative with respect to the experimental Cranfield model,
which is still the dominant approach to the evaluation of IR
and IIR systems. The three revolutions (the cognitive, the relevance, and
the interactive revolution) put forward by Robertson and Hancock-Beaulieu
(1992) are used as the framework to explain the current demand for
alternative approaches to IIR systems evaluation. The three revolutions
point to requirements that are not fulfilled by the Cranfield model. The
Cranfield model does not deal with dynamic information needs but treats
information needs as a static concept entirely reflected by the user request
and search statement. Further, this model uses only binary, topical
relevance ignoring the fact that relevance is a multidimensional and dynamic
concept. The conclusion is that the batch-driven mode of the Cranfield model
is not suitable for the evaluation of IIR systems which, if carried out as
realistically as possible, requires human interaction, potentially dynamic
information need interpretations, and the assignment of multidimensional and
The main contribution of the work is the proposal of the three-part package
to the evaluation of IIR systems - the so-called 'IIR evaluation package'.
The aim of the package is two-fold: 1) to facilitate evaluation of IIR
systems as realistically as possible with reference to actual information
seeking and retrieval processes, though still in a relatively controlled
evaluation environment; and 2) to calculate the IIR system performance
taking into account the non-binary nature of the assigned relevance
assessments and respecting the existing and different types of
The essential elements of the package are the sub-component of a simulated
work task situation and the performance measures of Relative Relevance (RR)
and Ranked Half-Life (RHL). The simulated work task situation, which is a
short 'cover story', serves two main functions: 1) it triggers and develops
a simulated information need by allowing for user interpretations of the
situation, leading to cognitively individual information need
interpretations as in real life; and 2) it is the platform against which
situational relevance is judged. Further, because it is the same for all test
persons, experimental control is provided. As such the concept of a simulated
work task situation gives the experiment both realism and control. The
performance measures of RR and RHL are capable of bridging the
interpretative distance between the objective and subjective types of
relevance involved in the evaluation of IIR systems, as well as managing
non-binary relevance assessments. The RR measure bridges horizontally across
the applied types of relevance. The RHL indicates the vertical position of
the median value of the assigned relevance values for one type of
The package is validated throughout the chapters of the dissertation. This
is done analytically in Chapters 2, 3, and 4 with the introductions and
discussions of the cognitive viewpoint, on which the present dissertation is
based; the concept of relevance; and the tradition of IR evaluation.
Empirically, the concept of a simulated work task situation is tested in
Chapter 6, which consequently reports on the applicability of the concept to
IIR evaluation. Chapter 7 demonstrates the performance measures of RR and
RHL based on data collected for the purposes of Chapter 6.
With the package being anchored in the holistic nature of the cognitive
viewpoint, as a hybrid of the system-driven and the cognitive user-oriented
approaches to IR evaluation, it is seen as a first instance of a cognitive
approach to the evaluation of IIR systems.
- Borlund, P. (2000b). Experimental
Components for the evaluation of interactive information retrieval systems.
In: Journal of Documentation, Vol. 56, no. 1, 2000, 71-90.
This paper presents a set of basic components which constitute the
experimental setting intended for the evaluation of interactive information
retrieval (IIR) systems, the aim of which is to facilitate evaluation of IIR
systems in a way which is as close as possible to realistic IR processes.
The experimental setting consists of three components: (1) the involvement
of potential users as test persons; (2) the application of dynamic and
individual information needs; and (3) the use of multidimensional and
dynamic relevance judgements. Hidden under the information need component is
the essential central sub-component, the simulated work task situation, the
tool that triggers the (simulated) dynamic information needs.
This paper also reports on the empirical findings of the meta-evaluation of
the application of this sub-component, the purpose of which is to discover
whether the application of simulated work task situations to future
evaluation of IIR systems can be recommended. Investigations are carried out to
determine whether any search behavioural differences exist between test
persons' treatment of their own real information needs versus simulated
information needs. The hypothesis is that if no difference exists one can
correctly replace real information needs with simulated information needs
through the application of simulated work task situations.
The empirical results of the meta-evaluation provide positive evidence for
the application of simulated work task situations to the evaluation of IIR
systems. The results also indicate that tailoring work task situations to
the group of test persons is important in motivating them. Furthermore, the
results of the evaluation show that different versions of semantic openness
of the simulated situations make no difference to the test persons' search
- Borlund, P. and Ingwersen, P. (1998).
Measures of Relative
Relevance and Ranked Half-Life: Performance Indicators for Interactive IR.
In: Wilkinson, R., Croft, B., and van Rijsbergen, C., eds. Proceedings of
the 21st ACM SIGIR Conference on Research and Development of Information
Retrieval. Melbourne, 1998. Melbourne: ACM Press. 1998, 324-331.
This paper introduces the concepts of the relative relevance (RR)
measure and a new performance indicator of the positional strength of the
retrieved and ranked documents. The former is seen as a measure of
associative performance computed by the application of the Jaccard formula.
The latter is named the Ranked Half-Life (RHL) indicator and denotes the
degree to which relevant documents are located on the top of a ranked
retrieval result. The measures are proposed to be applied in addition to the
traditional performance parameters such as precision and/or recall in
connection with evaluation of interactive IR systems. The RR measure
describes the degree of agreement between the types of relevance applied in
evaluation of information retrieval (IR) systems in a non-binary assessment
context. It is shown that the measure has potential to bridge the gap
between subjective and objective relevance, as it makes it possible to
understand and interpret the relation between these two main classes of
relevance used in interactive IR experiments. The relevance concepts are
defined, and the application of the measures is demonstrated by
interrelating three types of relevance assessments: algorithmic,
intellectual topicality, and situational assessments. Further, the paper
shows that for a given set of queries at given precision levels the RHL
indicator adds to the understanding of comparisons of IR performance.
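The Jaccard formula mentioned above can be sketched in a few lines. This is a simplified, binary illustration: it measures agreement between two sets of documents judged relevant under two different types of relevance, whereas the paper applies the measure in a non-binary assessment context. The function name and document ids are hypothetical.

```python
def jaccard(a, b):
    """Jaccard association between two collections of document ids:
    size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here: two empty sets have zero association.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical judgements for one query.
algorithmic = {"d1", "d2", "d3", "d4"}   # documents the system ranked as relevant
situational = {"d2", "d3", "d5"}         # documents the user judged useful

agreement = jaccard(algorithmic, situational)
```

With two shared documents out of five distinct ones, `agreement` is 0.4; a value of 1 would mean the two types of relevance single out exactly the same documents.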
- Borlund, P. and Ingwersen, P. (1997).
The development of a
method for the evaluation of interactive information retrieval systems.
In: Journal of Documentation, Vol. 53, no. 3, 1997, 225-250.
The paper describes the ideas and assumptions underlying the development of
a new method for the evaluation and testing of interactive Information
Retrieval (IR) systems, and reports on the initial tests of the proposed
method. The method is designed to collect different types of empirical data,
i.e. cognitive data as well as traditional systems performance data. The
method is based on the novel concept of a 'simulated work task situation' or
scenario and the involvement of real end users. The method is also based on
a mixture of simulated and real information needs, and involves a group of
test persons as well as assessments made by individual panel members. The
relevance assessments are made with reference to the concepts of topical as
well as situational relevance. The method takes into account the dynamic
nature of information needs which are assumed to develop over time for the
same user, a variability which is presumed to be strongly connected to the
processes of relevance assessment.