Knowledge based collection selection for distributed information retrieval

  • File type: Book
  • Language: English
  • Publisher: Elsevier
  • Edition and year / country: 2018

Details

Related fields: Computer Engineering
Related specializations: Software Engineering
Journal: Information Processing & Management
University: College of Computer Science and Technology, Zhejiang University, China

Published in an Elsevier journal
Keywords: collection selection, distributed information retrieval, knowledge base, query expansion

Description

1. Introduction

Distributed Information Retrieval (DIR), also known as Federated Search (FS) or Federated IR (FIR), is concerned with aggregating multiple searchable sources of information under a single interface (Crestani & Markov, 2013). DIR consists of four main phases: collection (server/resource) description, collection selection, results merging, and results presentation. Given a query and a set of collection descriptions, collection selection ranks the available collections by their computed scores and then determines which collections to search (Callan, 2002). In a given search, users are usually interested only in the top-ranked results, and not all collections contain the information they need. If a search engine can search only a small number of collections and still achieve results comparable to searching all of them, the efficiency of the retrieval system improves significantly. Collection selection therefore plays an important role in reducing computational overhead and improving retrieval efficiency.

Recent years have seen a great deal of work on collection selection, which can be divided according to the mechanism used to describe a collection: dictionary-based methods (Aly, Hiemstra, & Demeester, 2013; Callan, Lu, & Croft, 1995; Gravano & Garcia-Molina, 1995; Xu & Croft, 1999; Yuwono & Lee, 1997) and sampling-based methods (Baillie, Carman, & Crestani, 2011; Kulkarni, Tigelaar, Hiemstra, & Callan, 2012; Mendoza, Marín, Gil-Costa, & Ferrarotti, 2016; Paltoglou, Salampasis, & Satratzemi, 2011; Shokouhi, 2007; Shokouhi, Zobel, Tahaghoghi, & Scholer, 2007; Si & Callan, 2003; Thomas & Shokouhi, 2009; Wauer, Schuster, & Schill, 2011).

Dictionary-based methods use the word statistics of all documents as the collection description and then apply a scoring function that reflects the similarity between a collection and a query. However, it is infeasible to acquire the word statistics of all collections in an uncooperative distributed information retrieval environment. Another problem is that a scoring function based on word statistics loses a large amount of semantic information when calculating the collection score, e.g., synonymy, polysemy, and word order. These methods also have low effectiveness when collection sizes are skewed.
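
To make the dictionary-based idea concrete, the following is a minimal sketch and not the method of any of the works cited above. It assumes each collection is described by hypothetical per-term document frequencies plus a document count, scores a collection by summing the fraction of its documents that contain each query term, and keeps the top k collections. The function names, the scoring formula, and the toy data are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Assumed description format: (term -> number of documents containing it, total documents).
Description = Tuple[Dict[str, int], int]


def score_collection(query: List[str], description: Description) -> float:
    """Sum, over query terms, the fraction of documents that contain the term."""
    doc_freq, num_docs = description
    if num_docs == 0:
        return 0.0
    return sum(doc_freq.get(term, 0) / num_docs for term in query)


def select_collections(
    query: List[str], descriptions: Dict[str, Description], k: int
) -> List[Tuple[str, float]]:
    """Rank all collections by score and return the k highest-scoring ones."""
    scored = [(name, score_collection(query, desc)) for name, desc in descriptions.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data, invented for illustration only.
descriptions: Dict[str, Description] = {
    "news": ({"election": 40, "football": 10}, 100),
    "sports": ({"football": 80, "league": 50}, 100),
    "recipes": ({"soup": 30}, 60),
}

top_two = select_collections(["football", "league"], descriptions, 2)
synonym_query = select_collections(["soccer"], descriptions, 3)
```

With this toy data, the query "football league" ranks the "sports" collection ahead of "news". The query "soccer" gives every collection a score of zero, because pure word statistics cannot relate "soccer" to "football"; this is the kind of lost semantic information the passage describes.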