Text Access Potentials for Interactive Information Retrieval



[Drawing © by Robert A. Wilson]

  • Ingwersen, P. (2002). Cognitive perspectives of document representation. In: Fidel, R., Bruce, H., Ingwersen, P. and Vakkari, P. (eds.) Emerging Frameworks and Methods - Proceedings of the Fourth International Conference on Conceptions of Library and Information Science (CoLIS 4), Seattle, July, 2002. Colorado: Libraries Unlimited, 2002, 285-300.

    The paper reviews and analyses the cognitive conception of polyrepresentation, or multi-evidence, applied to information retrieval. Three types of aboutness are discussed, i.e., author, indexer, and user aboutness, as well as the isness of information objects, that is, other forms of metadata that also serve as document features. The assumption that highly relevant objects are found in the retrieval overlaps of cognitively and functionally different origin is analysed with reference to empirical tests already performed, and the utility of clustering objects by complex representations for navigation or visualisation purposes is briefly analysed.

  • Larsen, B. (2002). Exploiting citation overlaps for Information Retrieval: generating a boomerang effect from the network of scientific papers. Scientometrics, 2002, 54(2), p. 155-178.

    A new citation search strategy is proposed for Information Retrieval (IR) based on the principle of polyrepresentation (Ingwersen, 1992, 1996). The strategy exploits logical overlaps between a range of cognitively different interpretations of the same documents in a structured manner, i.e., so-called cognitive overlaps of representations. The strategy is essentially a ‘cycling strategy’ starting with documents retrieved by a subject search, from which new documents are identified automatically by following the network of citations in scientific papers backwards and forwards in time. In contrast to earlier citation search strategies, the proposed strategy does not require known relevant documents (seed documents) as a starting point, but may be based on a subject search. A pilot study is reported where the ability of the strategy to retrieve additional relevant documents is analysed. Results show that a very large number of documents can be retrieved by the strategy, and that these may be segmented into a number of distinct ‘overlap levels’. It is demonstrated that the combined core of the higher-level overlaps contains a higher relevance density than found in the original retrieval results. Based on these results it is suggested that the documents be displayed in order of their presence in higher-level overlaps, so as to maximise the chances that as many relevant documents as possible will be presented first to a user.
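
    The idea of segmenting retrieved documents into overlap levels and displaying the higher levels first can be illustrated with a minimal sketch. It assumes, purely for illustration, that a document's overlap level is the number of distinct result sets containing it; the set names, the document ids and the functions `overlap_levels` and `rank_by_overlap` are hypothetical and are not taken from the paper, whose actual cycling procedure is more elaborate.

```python
from collections import Counter


def overlap_levels(result_sets):
    """Map each document id to the number of result sets that contain it.

    Assumption: 'overlap level' is simplified here to a plain count of the
    representations (result sets) in which a document occurs.
    """
    counts = Counter()
    for docs in result_sets.values():
        counts.update(set(docs))  # set() guards against duplicate ids in a set
    return dict(counts)


def rank_by_overlap(result_sets):
    """Order documents so that those in higher-level overlaps come first."""
    levels = overlap_levels(result_sets)
    # Ties are broken by document id only to keep the output deterministic.
    return sorted(levels, key=lambda doc: (-levels[doc], doc))


# Hypothetical result sets: a subject search plus documents reached by
# following citations backwards and forwards in time.
result_sets = {
    "subject_search": {"d1", "d2", "d3"},
    "backward_citations": {"d2", "d3", "d4"},
    "forward_citations": {"d3", "d5"},
}

ranking = rank_by_overlap(result_sets)
print(ranking)
```

    In this toy data the document found by all three routes is listed first, followed by the one found by two routes, which mirrors the suggested display order.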

  • Schneider, J. & Borlund, P. (2002). Preliminary study of the potentiality of bibliometric methods for the construction of thesauri. In: Fidel, R., Bruce, H., Ingwersen, P. and Vakkari, P. (eds.) Emerging Frameworks and Methods - Proceedings of the Fourth International Conference on Conceptions of Library and Information Science (CoLIS 4), Seattle, July, 2002. Colorado: Libraries Unlimited, 2002, 151-165.

    The paper presents the overall idea of how bibliometric methods may be applied to thesaurus construction as a supplement to intellectual and manual construction and maintenance processes. 
    The paper reports on the initial experiment of the bibliometric-based creation of a text corpus from which candidate thesaurus terms can be extracted and relationships uncovered. The results are promising as to the possibility of creating a valid sample of overlapping documents by use of the data set isolation method (Ingwersen & Christensen, 1997).


  • Björneborn, L. and Ingwersen, P. (2001). Perspectives of webometrics. Scientometrics, 50(1), p. 65-82.

    Since the mid-1990s a new research field, webometrics, has emerged, investigating the nature and properties of the Web by drawing on modern informetric methodologies. The article attempts to point to selected areas of webometric research that demonstrate interesting progress and space for development as well as to some currently less promising areas. Recent investigations of search engine coverage and performance are reviewed as a frame for selected quality and content analyses. Problems with measuring Web Impact Factors (Web-IF) are discussed. Concluding the article, new directions of webometrics are outlined for performing knowledge discovery and issue tracking on the Web, partly based on bibliometric methodologies used in bibliographic and citation databases. In this framework graph theoretic approaches, including path analysis, transversal links, 'weak ties' and 'small-world' phenomena are integrated.

  • Ingwersen, P. (2001). Cognitive Information Retrieval. Annual Review of Information Science and Technology, vol. 34, p. 3-52.

    This chapter reviews and discusses critically the development during the last decade of the cognitive approach to information retrieval (IR) research and theory. The focus is analytic and empirical research on the complex nature of information need formation and situation, their inherent association with the concept of relevance, and the development of cognitive and related IR theory and evaluation methods. The time span is largely 1992-2000 (references to earlier works are provided as needed). Thus, the review complements and extends the previous ARIST chapters on cognitive research (ALLEN, 1991) and the user-oriented perspectives of IR research and analysis methods (SUGAR). 
    Since its start in 1977, the cognitive approach to information science has developed in two periods. The first covers 1977-1991 and can briefly be characterized as user- and intermediary-oriented. The second period is 1992-2000 (the major concentration of this chapter), when the approach turns into a holistic view of all the interactive communication processes that occur during information transfer. 
    Following the introduction, the review falls into five major sections. The first section highlights the scientific developments, characteristics, and substantial results of the cognitive approach in the first period. This section also includes drawbacks and criticisms of the approach, that is, the lack of realism, theory integration, and holistic perspective. This is followed by a section covering the development in the second period about the focus shift into a holistic cognitive view, with a subsection on views of information processing. The third section covers information structures, with a subsection on information need. The fourth section looks into the dimensions of cognitive IR theory in a holistic perspective. Subsections concern polyrepresentation of information objects, the cognitive space and IR interactions, and relevance and evaluation issues. This section includes work task conceptions and issues concerned with feedback and query modification. The fifth section approaches the integration of cognitive models of information seeking, IR, and scientific communication, including a discussion of critical issues. A concluding section ends the review.

  • Ingwersen, P. (2001). Users in context. In: Agosti, M., Crestani, F. and Pasi, G. (eds.) Lectures on Information Retrieval. Bonn: Springer Verlag, 2001, p. 157-178 (Lecture Notes in Computer Science: 1980).

    Users as actors in interactive information retrieval (IIR) are seen in the contexts of their perceived work tasks and information seeking behaviour. The paper models IIR processes by demonstrating a variety of approaches, ranging from Ingwersen's cognitive communication model for IR interaction, through Saracevic's stratified model, which includes a typology of relevance conceptions, to Borlund's model of work task perception, information need development and relevance assessments. Other associated models and perspectives of IIR are discussed when appropriate to the major focus points of the contribution: information need development and typology; understanding of relevance in IIR; and experimental problems in IIR.

  • Ingwersen, P., Noyons, E. and Larsen, B. (2001). Mapping national research profiles in social science disciplines. Journal of Documentation, 57(6), p. 715-740.


    The paper investigates the advantages of graphical mapping of national research publication and citation profiles from scientific fields in order to provide additional information with respect to research performance. By means of multi-dimensional scaling techniques national social science profiles from seventeen OECD countries and two periods, 1989-1993 and 1994-1998, are mapped, each profile represented by a vector of either publication volumes or citation values for nine social science fields. Aside from demonstrating the developments of publication volumes and citedness ranges as well as patterns, the graphical maps display clusters and similarities of national profiles over time. Combined with international rankings of averaged national impact factors (NIF) relative to the average world impact of field (WIF) for the same number of fields and periods, the graphical display supplies additional, otherwise concealed information on the differences in research patterns between countries - even when the NIFs are quite similar. The analyses show that low Pearson correlation coefficients can be applied to flag extraordinary instances of either high or low national citation impacts during a period. Most importantly, the graphical maps make a strong case for adjusting or tuning the baseline impact to the actual national publication profiles when comparing NIFs of different countries. A new indicator, the Tuned Citation Impact Index (TCII), is proposed. It is constructed from the number of expected citations a country ought to have received in each research field aggregated over its true profile. Common baseline profiles, like those of the world or EU, are consequently not regarded as the ideal benchmark. In the case illustrated by the journal publications of the social sciences the paper verifies the hypothesis that a dominant central cluster exists consisting of the large Anglo-American countries: USA, Canada and the UK.
    A further hypothesis, that the smaller northern EU countries with English as the second language are located together and close to the central cluster on the publication maps, is only partly satisfied in the second period. A third hypothesis, that countries located near the central cluster on the citation maps may hold high(er) NIFs, is falsified.
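
    The notion of expected citations aggregated over a country's own publication profile can be sketched as follows. The abstract does not give the formula, so this is an assumption: expected citations are taken as the sum over fields of the national publication count times the world impact of that field, and the index as observed citations divided by that expectation. The function names, field names and figures are all hypothetical.

```python
def expected_citations(publications, world_impact):
    """Citations a country would receive if each of its papers attracted
    the world average impact of its field (assumed definition)."""
    return sum(count * world_impact[field] for field, count in publications.items())


def tuned_citation_impact(publications, citations, world_impact):
    """Observed citations relative to the profile-tuned expectation.

    Assumption: the index is a simple observed/expected ratio; the paper
    may define the Tuned Citation Impact Index differently.
    """
    expected = expected_citations(publications, world_impact)
    if expected == 0:
        raise ValueError("expected citations are zero; index is undefined")
    return sum(citations.values()) / expected


# Hypothetical national profile over two fields.
publications = {"economics": 100, "sociology": 50}
citations = {"economics": 250, "sociology": 50}
world_impact = {"economics": 2.0, "sociology": 1.0}  # citations per paper

index = tuned_citation_impact(publications, citations, world_impact)
print(round(index, 3))
```

    Because the baseline is weighted by the country's own distribution of papers over fields, a value above 1 in this sketch would mean the country is cited more than a world-average producer with the same profile.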

  • Larsen, B. (2001). Polyrepræsentation som princip for indeksering og genfinding af videnskabelige fuldtekstdokumenter [Polyrepresentation as principle for indexing and retrieving scientific full text documents]. In: Biblioteksarbejde, no. 62, 15-26.

    The article describes the inspiration and motivation behind the author's PhD project and the methodology to be used. The main purpose of the project is to carry out empirical tests of the principle of polyrepresentation as put forward by Ingwersen in 1996. With the cognitive viewpoint in Information Retrieval as theoretical framework the idea is to work with a large number of representations of the same documents. By utilizing overlaps between these poly-representations it is hoped that the uncertainty and inconsistency inherent in IR can be reduced and better performing IR systems designed. The article outlines the research questions as well as the methodology to be used in the project.

  • Larsen, B. and Ingwersen, P. (2001). Synchronous and diachronous citation analysis for Information Retrieval - generating a boomerang effect from the network of scientific papers. In: Davis, M. and Wilson, C.S. (eds.), Proceedings of the 8th International Conference on Scientometrics & Informetrics, ISSI2001, Sydney, July 16-20, 2001. Sydney: The University of New South Wales, 2001, Vol. 1: 355-368.

    The paper describes a new set of citation search strategies for Information Retrieval (IR) purposes based on the principle of polyrepresentation (Ingwersen, 1996), exploiting overlaps between a range of representations of the same documents in a structured manner. In contrast to earlier strategies, these do not require known relevant documents (seed documents) as a starting point, but are based on a subject search. A pilot study is reported where the performance of the strategy is compared to a normal subject search. The proposed strategy does not outperform the regular search, but results are good enough to indicate that it would be interesting to carry out further investigations of the approach.

Theoretical framework

  • Ingwersen, P. and Christensen, F.H. (1997). Data set isolation for bibliometric online analyses of research publications: Fundamental Methodological Issues. Journal of the American Society for Information Science, 48(3), 205-217.

    The aim of the article is to emphasize and illustrate the retrieval dimensions of data collection activity online and their influence on the research evaluation outcome. The attempt is to reinforce the link between online retrieval and bibliometrics. Given that various forms of publication counts and citation analyses provide a valuable and revealing quantitative starting point for more qualitative indications and assessments of Science and Technology (S&T) performance, it is evident that their reliability and objectivity must be undisputed as far as possible. The article discusses the basic problems and limitations inherent in online bibliometric data collection and analyses, and points to possible solutions by means of illustrative case studies and examples. The reason for performing local publication analyses online often arises because of the increased use of external research assessments made by centralized bodies. For small institutions in small countries, like the North European ones, such self-analyses may in addition provide valuable and inexpensive insights into novel S&T niches to explore. The major concern is the extent to which online bibliographic and domain dependent databases, as a supplement to the Institute for Scientific Information (ISI) citation files, are suitable for quantitative analysis and mapping of R&D outcome. By merging these two different types of databases into a single cluster, the method of duplicate removal becomes crucial.
    The article introduces a novel removal procedure by describing and exemplifying the principle of Reversed Duplicate Removal (RDR). RDR enables the analyst to take control of the location of the duplicates and to perform tailored analyses of the overlap of identical documents between files. It is well known that the databases themselves present obstacles directly associated with the process of performing online retrieval of the information necessary for further analysis. Problems encountered are, for instance, poor or inconsistent subject indexing within a single database or among several databases. Name form inconsistencies as to authors, institutions, and journals, the lack or inaccessibility of vital data in the database structures, etc., also present obstacles. On the other hand, comprehensive online bibliometric analyses are in many ways easier, faster, and less expensive to perform locally than those made using the independent CD-ROM versions of the relevant databases. In contrast to the online versions, the CD-ROM systems demonstrate a vital shortage of robust data processing and manipulation facilities. The downloading of records from a variety of CD-ROM files, the cleaning-up process, and the ensuing data processing activities become cumbersome and resource demanding. Regardless of database versioning, the degree of awareness of these retrieval and set isolation factors, such as the relevant search commands, syntax, and the analysis assumptions on the part of the analyst, plays an important role for the quality of the analysis outcome. 

  • Borlund, P. (2000a). Evaluation of interactive information retrieval systems. Åbo: Åbo Akademi University Press, 2000. 276. Doctoral dissertation: Åbo Akademi University.

    The present dissertation work concerns the development of an alternative approach to the evaluation of interactive information retrieval systems (IIR systems). It is alternative with respect to the experimental Cranfield model, which is still the dominant approach to the evaluation of IR and IIR systems. The three revolutions (the cognitive, the relevance, and the interactive revolution) put forward by Robertson and Hancock-Beaulieu (1992) are used as the framework to explain the current demand for alternative approaches to IIR systems evaluation. The three revolutions point to requirements that are not fulfilled by the Cranfield model. The Cranfield model does not deal with dynamic information needs but treats information needs as a static concept entirely reflected by the user request and search statement. Further, this model uses only binary, topical relevance, ignoring the fact that relevance is a multidimensional and dynamic concept. The conclusion is that the batch-driven mode of the Cranfield model is not suitable for the evaluation of IIR systems which, if carried out as realistically as possible, requires human interaction, potentially dynamic information need interpretations, and the assignment of multidimensional and dynamic relevance.

    The main contribution of the work is the proposal of the three-part package to the evaluation of IIR systems - the so-called 'IIR evaluation package'. The aim of the package is two-fold: 1) to facilitate evaluation of IIR systems as realistically as possible with reference to actual information seeking and retrieval processes, though still in a relatively controlled evaluation environment; and 2) to calculate the IIR system performance taking into account the non-binary nature of the assigned relevance assessments and respecting the existing and different types of relevance. 

    The essential elements of the package are the sub-component of a simulated work task situation and the performance measures of Relative Relevance (RR) and Ranked Half-Life (RHL). The simulated work task situation, which is a short 'cover story', serves two main functions: 1) it triggers and develops a simulated information need by allowing for user interpretations of the situation, leading to cognitively individual information need interpretations as in real life; and 2) it is the platform against which situational relevance is judged. Further, by being the same for all test persons experimental control is provided. As such the concept of a simulated work task situation ensures the experiment both realism and control. The performance measures of RR and RHL are capable of bridging the interpretative distance between the objective and subjective types of relevance involved in the evaluation of IIR systems, as well as managing non-binary relevance assessments. The RR measure bridges horizontally across the applied types of relevance. The RHL indicates the vertical position of the median value of the assigned relevance values for one type of relevance. 

    The package is validated throughout the chapters of the dissertation. This is done analytically in Chapters 2, 3, and 4 with the introductions and discussions of the cognitive viewpoint, on which the present dissertation is based; the concept of relevance; and the tradition of IR evaluation. Empirically, the concept of a simulated work task situation is tested in Chapter 6, which consequently reports on the applicability of the concept to IIR evaluation. Chapter 7 demonstrates the performance measures of RR and RHL based on data collected for the purposes of Chapter 6.

    With the package being anchored in the holistic nature of the cognitive viewpoint, as a hybrid of the system-driven and the cognitive user-oriented approaches to IR evaluation, it is seen as a first instance of a cognitive approach to the evaluation of IIR systems.

  • Borlund, P. (2000b). Experimental Components for the evaluation of interactive information retrieval systems. In: Journal of Documentation, Vol. 56, no. 1, 2000, 71-90.

    This paper presents a set of basic components which constitutes the experimental setting intended for the evaluation of interactive information retrieval (IIR) systems, the aim of which is to facilitate evaluation of IIR systems in a way which is as close as possible to realistic IR processes. The experimental setting consists of three components: (1) the involvement of potential users as test persons; (2) the application of dynamic and individual information needs; and (3) the use of multidimensional and dynamic relevance judgements. Hidden under the information need component is the essential central sub-component, the simulated work task situation, the tool that triggers the (simulated) dynamic information needs.
    This paper also reports on the empirical findings of the meta-evaluation of the application of this sub-component, the purpose of which is to discover whether the application of the simulated work task situations to future evaluation of IIR systems can be recommended. Investigations are carried out to determine whether any search behavioural differences exist between test persons' treatment of their own real information needs versus simulated information needs. The hypothesis is that if no difference exists one can correctly substitute simulated information needs for real information needs through the application of simulated work task situations.
    The empirical results of the meta-evaluation provide positive evidence for the application of simulated work task situations to the evaluation of IIR systems. The results also indicate that tailoring work task situations to the group of test persons is important in motivating them. Furthermore, the results of the evaluation show that different versions of semantic openness of the simulated situations make no difference to the test persons' search treatment.

  • Borlund, P. and Ingwersen, P. (1998). Measures of Relative Relevance and Ranked Half-Life: Performance Indicators for Interactive IR. In: Wilkinson, R., Croft, B., and van Rijsbergen, C., eds. Proceedings of the 21st ACM SIGIR Conference on Research and Development of Information Retrieval. Melbourne, 1998. Melbourne: ACM Press. 1998, 324-331.

    This paper introduces the concepts of the relative relevance (RR) measure and a new performance indicator of the positional strength of the retrieved and ranked documents. The former is seen as a measure of associative performance computed by the application of the Jaccard formula. The latter is named the Ranked Half-Life (RHL) indicator and denotes the degree to which relevant documents are located at the top of a ranked retrieval result. The measures are proposed to be applied in addition to the traditional performance parameters such as precision and/or recall in connection with evaluation of interactive IR systems. The RR measure describes the degree of agreement between the types of relevance applied in evaluation of information retrieval (IR) systems in a non-binary assessment context. It is shown that the measure has potential to bridge the gap between subjective and objective relevance, as it makes it possible to understand and interpret the relation between these two main classes of relevance used in interactive IR experiments. The relevance concepts are defined, and the application of the measures is demonstrated by interrelating three types of relevance assessments: algorithmic, intellectual topicality, and situational assessments. Further, the paper shows that for a given set of queries at given precision levels the RHL indicator adds to the understanding of comparisons of IR performance.
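
    The two measures can be illustrated with a simplified sketch. The Jaccard formula is named in the abstract; applying it to plain sets of documents judged relevant is a binary simplification of the non-binary setting the paper addresses. The half-life function is only an approximation of the idea behind RHL: it returns the rank at which half of the total assigned relevance has been accumulated, and is not the paper's exact indicator. Function names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard association between two sets: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def half_life_rank(relevance_by_rank):
    """Smallest rank at which cumulative relevance reaches half the total.

    relevance_by_rank lists the relevance value assigned to the document at
    rank 1, 2, 3, ... A lower result means relevant documents sit nearer the
    top. Returns None when no relevance was assigned at all.
    """
    total = sum(relevance_by_rank)
    if total <= 0:
        return None
    running = 0.0
    for rank, value in enumerate(relevance_by_rank, start=1):
        running += value
        if running >= total / 2:
            return rank
    return len(relevance_by_rank)


# Hypothetical judgements: documents deemed relevant under two relevance types.
algorithmic = {"d1", "d2", "d3", "d4"}
situational = {"d2", "d3", "d5"}
rr = jaccard(algorithmic, situational)

# Hypothetical non-binary relevance values for ranks 1 to 5.
rhl = half_life_rank([1.0, 0.5, 0.0, 1.0, 0.5])
print(rr, rhl)
```

    Comparing the half-life rank of two systems at the same precision level shows which of them places its relevant documents higher, which is the kind of additional insight the abstract attributes to RHL.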

  • Borlund, P. and Ingwersen, P. (1997). The development of a method for the evaluation of interactive information retrieval systems. In: Journal of Documentation, Vol. 53, no. 3, 1997, 225-250.

    The paper describes the ideas and assumptions underlying the development of a new method for the evaluation and testing of interactive Information Retrieval (IR) systems, and reports on the initial tests of the proposed method. The method is designed to collect different types of empirical data, i.e. cognitive data as well as traditional systems performance data. The method is based on the novel concept of a 'simulated work task situation' or scenario and the involvement of real end users. The method is also based on a mixture of simulated and real information needs, and involves a group of test persons as well as assessments made by individual panel members. The relevance assessments are made with reference to the concepts of topical as well as situational relevance. The method takes into account the dynamic nature of information needs which are assumed to develop over time for the same user, a variability which is presumed to be strongly connected to the processes of relevance assessment.

Modified 03-05-2004 by DBIT\blar