Apache Lucene
Lucene has also been used to implement recommendation systems.[13] For example, Lucene's
'MoreLikeThis' class can generate recommendations for similar documents. In a comparison of the term
vector-based similarity approach of 'MoreLikeThis' with citation-based document similarity measures, such
as co-citation and co-citation proximity analysis, Lucene's approach excelled at recommending documents
with very similar structural characteristics and narrower relatedness.[14] In contrast, citation-based
document similarity measures tended to be better suited to recommending more broadly related
documents,[14] which suggests that citation-based approaches may be more suitable for generating
serendipitous recommendations, provided the documents to be recommended contain in-text citations.
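The idea of term vector-based similarity can be illustrated with a minimal sketch. It is not Lucene's 'MoreLikeThis' implementation or API: the function names (term_vector, cosine_similarity, more_like_this) and the sample documents are hypothetical, and plain term counts with cosine similarity stand in for whatever weighting a real system would use.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector by lowercasing and splitting on whitespace."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(query_id, docs, top_n=2):
    """Rank the other documents by similarity to docs[query_id] (hypothetical helper)."""
    vectors = {doc_id: term_vector(text) for doc_id, text in docs.items()}
    query = vectors[query_id]
    scored = [
        (cosine_similarity(query, vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id != query_id
    ]
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]


# Illustrative data only.
docs = {
    "a": "lucene index search library",
    "b": "lucene search index engine",
    "c": "cooking pasta recipes",
}
print(more_like_this("a", docs, top_n=1))
```

Because the score depends only on shared vocabulary, documents that use nearly the same terms rank highest, which is consistent with the narrow relatedness reported for the term vector-based approach.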
Lucene-based projects
Lucene itself is only an indexing and search library and does not provide crawling or HTML parsing
functionality. However, several projects extend Lucene's capability:
See also
Free and open-source software portal
Enterprise search
Information extraction
List of information retrieval libraries
Text mining
References
1. "Welcome to Apache Lucene" (https://lucene.apache.org/). Lucene™ News section.
Archived (https://web.archive.org/web/20210212123326/https://lucene.apache.org/) from the
original on 12 February 2020. Retrieved 12 February 2020.
2. Kamphuis, Chris; de Vries, Arjen P.; Boytsov, Leonid; Lin, Jimmy (2020), Jose, Joemon M.;
Yilmaz, Emine; Magalhães, João; Castells, Pablo (eds.), "Which BM25 Do You Mean? A
Large-Scale Reproducibility Study of Scoring Variants", Advances in Information Retrieval,
Cham: Springer International Publishing, 12036: 28–34, doi:10.1007/978-3-030-45442-5_4
(https://doi.org/10.1007%2F978-3-030-45442-5_4), ISBN 978-3-030-45441-8,
PMC 7148026 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148026)
3. Grand, Adrien; Muir, Robert; Ferenczi, Jim; Lin, Jimmy (2020), Jose, Joemon M.; Yilmaz,
Emine; Magalhães, João; Castells, Pablo (eds.), "From MAXSCORE to Block-Max Wand:
The Story of How Lucene Significantly Improved Query Evaluation Performance", Advances
in Information Retrieval, Cham: Springer International Publishing, 12036: 20–27,
doi:10.1007/978-3-030-45442-5_3 (https://doi.org/10.1007%2F978-3-030-45442-5_3),
ISBN 978-3-030-45441-8, PMC 7148045
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148045)
4. Azzopardi, Leif; Moshfeghi, Yashar; Halvey, Martin; Alkhawaldeh, Rami S.; Balog, Krisztian;
Di Buccio, Emanuele; Ceccarelli, Diego; Fernández-Luna, Juan M.; Hull, Charlie; Mannix,
Jake; Palchowdhury, Sauparna (2017-02-14). "Lucene4IR: Developing Information Retrieval
Evaluation Resources using Lucene" (https://dl.acm.org/doi/10.1145/3053408.3053421).
ACM SIGIR Forum. 50 (2): 58–75. doi:10.1145/3053408.3053421
(https://doi.org/10.1145%2F3053408.3053421). ISSN 0163-5840 (https://www.worldcat.org/issn/0163-5840).
S2CID 212416159 (https://api.semanticscholar.org/CorpusID:212416159).
5. "LuceneImplementations" (http://wiki.apache.org/lucene-java/LuceneImplementations).
apache.org. Archived
(https://web.archive.org/web/20151006021755/http://wiki.apache.org/lucene-java/LuceneImplementations)
from the original on 6 October 2015. Retrieved 23 September 2015.
6. KeywordAnalyzer "Better Search with Apache Lucene and Solr"
(https://web.archive.org/web/20120131154001/http://trijug.org/downloads/TriJug-11-07.pdf) (PDF).
19 November 2007. Archived from the original (http://trijug.org/downloads/TriJug-11-07.pdf) (PDF)
on 31 January 2012.
7. Cutting, Doug (2019-06-07). "I wrote a couple of search engines at Xerox PARC, then V-Twin
at Apple, then re-wrote Excite's search, then Lucene. So, Lucene might be considered
V-Twin 3.0? Almost 25 years later, V-Twin still lives on as Mac OS X Search Kit!"
(https://twitter.com/cutting/status/1137030687003774976). @cutting. Retrieved 2019-06-19.
8. Barker, Deane (2016). Web Content Management. O'Reilly. p. 233. ISBN 978-1491908105.
9. "Apache Lucene - Welcome to Apache Lucene" (https://lucene.apache.org/). apache.org.
Archived (https://web.archive.org/web/20160204002101/https://lucene.apache.org/) from the
original on 4 February 2016. Retrieved 4 February 2016.
10. McCandless, Michael; Hatcher, Erik; Gospodnetić, Otis (2010). Lucene in Action, Second
Edition (https://archive.org/details/luceneactionseco00hatc). Manning. p. 8
(https://archive.org/details/luceneactionseco00hatc/page/n46). ISBN 978-1933988177.
11. "GNU/Linux Semantic Storage System"
(https://web.archive.org/web/20100601210729/http://www.glscube.org/downloads/glscube_design.pdf)
(PDF). glscube.org. Archived from the
original (http://www.glscube.org/downloads/glscube_design.pdf) (PDF) on 2010-06-01.
12. "Apache Lucene - Query Parser Syntax"
(https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Fuzzy+Searches). lucene.apache.org.
Archived
(https://web.archive.org/web/20170502011748/http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Fuzzy+Searches)
from the original on 2017-05-02.
13. J. Beel, S. Langer, and B. Gipp, “The Architecture and Datasets of Docear’s Research Paper
Recommender System,” in Proceedings of the 3rd International Workshop on Mining
Scientific Publications (WOSP 2014) at the ACM/IEEE Joint Conference on Digital Libraries
(JCDL 2014), London, UK, 2014
14. M. Schwarzer, M. Schubotz, N. Meuschke, C. Breitinger, V. Markl, and B. Gipp,
"Evaluating Link-based Recommendations for Wikipedia"
(https://www.gipp.com/wp-content/papercite-data/pdf/schwarzer2016.pdf), in Proceedings of the
16th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), New York, NY, USA, 2016, pp. 191–200.
15. Wayner, Peter. "11 cutting-edge databases worth exploring now"
(http://www.infoworld.com/article/2984469/database/11-cutting-edge-databases-worth-exploring-now.html).
InfoWorld. Archived
(https://web.archive.org/web/20150921214828/http://www.infoworld.com/article/2984469/database/11-cutting-edge-databases-worth-exploring-now.html)
from the original on 21 September 2015. Retrieved 21 September 2015.
16. "Elasticsearch: RESTful, Distributed Search & Analytics - Elastic"
(https://www.elastic.co/products/elasticsearch). elastic.co. Archived
(https://web.archive.org/web/20151008055359/https://www.elastic.co/products/elasticsearch)
from the original on 8 October 2015. Retrieved 23 September 2015.
17. "The Future of Compass & Elasticsearch"
(https://web.archive.org/web/20151015021211/http://thedudeabides.com/articles/the_future_of_compass/).
the dude abides. Archived from the
original (http://thedudeabides.com/articles/the_future_of_compass/) on 2015-10-15.
Retrieved 2015-10-14.
18. Natividad, Angela. "Socialtext Updates Search, Goes Kino"
(http://www.cmswire.com/cms/enterprise-20/socialtext-updates-search-goes-kino-001037.php).
CMS Wire. Archived
(https://web.archive.org/web/20120929122221/http://www.cmswire.com/cms/enterprise-20/socialtext-updates-search-goes-kino-001037.php)
from the original on 2012-09-29. Retrieved 2011-05-31.
19. Marvin Humphrey. "KinoSearch - Search engine library. - metacpan.org"
(http://p3rl.org/KinoSearch#DESCRIPTION). p3rl.org. Retrieved 23 September 2015.
20. Diment, Kieren; Trout, Matt S (2009). "Catalyst Cookbook". The Definitive Guide to Catalyst
(https://archive.org/details/definitiveguidet00dime_868). Apress. p. 280
(https://archive.org/details/definitiveguidet00dime_868/page/n343). ISBN 978-1-4302-2365-8.
21. Wishart, D. S.; et al. (January 2009). "HMDB: a knowledgebase for the human metabolome"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2686599). Nucleic Acids Res. 37 (Database
issue): D603–10. doi:10.1093/nar/gkn810 (https://doi.org/10.1093%2Fnar%2Fgkn810).
PMC 2686599 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2686599). PMID 18953024
(https://pubmed.ncbi.nlm.nih.gov/18953024).
22. Lim, Emilia; Pon, Allison; Djoumbou, Yannick; Knox, Craig; Shrivastava, Savita; Guo, An
Chi; Neveu, Vanessa; Wishart, David S. (January 2010). "T3DB: a comprehensively
annotated database of common toxins and their targets"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808899). Nucleic Acids Res. 38 (Database issue): D781–6.
doi:10.1093/nar/gkp934 (https://doi.org/10.1093%2Fnar%2Fgkp934). PMC 2808899
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808899). PMID 19897546
(https://pubmed.ncbi.nlm.nih.gov/19897546).
Bibliography
Gospodnetic, Otis; Erik Hatcher; Michael McCandless (28 June 2009). Lucene in Action
(2nd ed.). Manning Publications. ISBN 978-1-9339-8817-7.
Gospodnetic, Otis; Erik Hatcher (1 December 2004). Lucene in Action (1st ed.). Manning
Publications. ISBN 978-1-9323-9428-3.
External links
Official website (https://lucene.apache.org/)