Building a simple RESTful search engine service with Lucene

Published: 2023-09-06 01:42 | Editor: 董明明 | Keywords: none

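From my own blog: Building a simple RESTful search engine service with Lucene


My blog is now switching to Lucene for full-text search as well, so I am posting the code here to share with everyone.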

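I. Directory structure:

II. Configuration files:

There are four configuration files in total: bonecp-config.xml, IKAnalyzer.cfg.xml, log4j.properties, and system-config.xml.

1. bonecp-config.xml configures the JDBC connection pool. It is optional, because the BoneCP library ships with a default configuration.

2. IKAnalyzer.cfg.xml is the dictionary configuration file used by the IKAnalyzer tokenizer.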

This file is optional as well:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IKAnalyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionaries here -->
    <entry key="ext_dict">/data/lucene/dict/1_dict.txt;/data/lucene/dict/2_dict.txt;/data/lucene/dict/3_dict.txt;/data/lucene/dict/4_dict.txt;/data/lucene/dict/5_dict.txt;/data/lucene/dict/6_dict.txt;/data/lucene/dict/7_dict.txt;/data/lucene/dict/8_dict.txt;</entry>
    <!-- Users can configure their own extension stop-word dictionaries here
    <entry key="ext_stopwords">/data/lucene/dict/stopword.dic</entry>
    -->
</properties>
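Because the file uses the standard Java properties DTD, its entries can also be read with `java.util.Properties.loadFromXML`. The following is only a rough sketch of how the semicolon-separated `ext_dict` value breaks down into individual dictionary paths. The class name `ExtDictReader` and the method `parseExtDict` are hypothetical and not part of IKAnalyzer, which does its own loading internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of IKAnalyzer or the original project.
class ExtDictReader {

    // Reads a properties-DTD XML stream and returns the non-empty paths listed under "ext_dict".
    static List<String> parseExtDict(InputStream xml) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(xml);
        List<String> paths = new ArrayList<String>();
        String value = props.getProperty("ext_dict", "");
        for (String part : value.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">"
                + "<properties>"
                + "<entry key=\"ext_dict\">/data/lucene/dict/1_dict.txt;/data/lucene/dict/2_dict.txt;</entry>"
                + "</properties>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // Prints the two dictionary paths as a list.
        System.out.println(parseExtDict(in));
    }
}
```

The trailing semicolon in the configured value produces no empty entry, because blank segments are skipped.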

3. log4j.properties needs no further explanation.

4. system-config.xml holds some system configuration parameters.


<?xml version="1.0" encoding="UTF-8"?>
<configs>
    <mysql>
        <port>3306</port>
        <user>test</user>
        <password>test</password>
        <partitionCount>6</partitionCount>
        <maxWait>3600</maxWait>
        <driverClass>com.mysql.jdbc.Driver</driverClass>
        <idleMaxAge>1800</idleMaxAge>
        <idleConnectionTestPeriod>300</idleConnectionTestPeriod>
        <host>jdbc:mysql://localhost/blog?characterEncoding=UTF-8</host>
    </mysql>
    <search>
        <!-- The paths here can be changed as needed -->
        <indexPath>/data/lucene/index</indexPath>
        <recommendNetIndexPath>/data/lucene/index/recommendNet</recommendNetIndexPath>
        <searcNum>10</searcNum>
        <resultNum>10000</resultNum>
    </search>
</configs>
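To make the layout of the `<search>` section concrete, here is a small sketch that reads its child elements into a map. It uses the DOM parser bundled with the JDK rather than dom4j, purely so that it runs without extra libraries. The class name `SearchConfigReader` and the method `readSearchSection` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads the children of <search> into a map with the JDK's DOM parser.
class SearchConfigReader {

    static Map<String, String> readSearchSection(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList sections = doc.getDocumentElement().getElementsByTagName("search");
        if (sections.getLength() == 0) {
            throw new IllegalStateException("missing <search> section");
        }
        Map<String, String> settings = new LinkedHashMap<String, String>();
        NodeList children = sections.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes and XML comments; keep only real elements.
            if (child instanceof Element) {
                settings.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return settings;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configs><search>"
                + "<!-- paths can be changed -->"
                + "<indexPath>/data/lucene/index</indexPath>"
                + "<searcNum>10</searcNum>"
                + "<resultNum>10000</resultNum>"
                + "</search></configs>";
        Map<String, String> settings = readSearchSection(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(settings.get("indexPath"));
        System.out.println(Integer.parseInt(settings.get("searcNum")));
    }
}
```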

III. The listener SystemStartupListener, which implements ServletContextListener

package com.blog.listener;

import java.io.File;
import java.net.URL;
import java.sql.SQLException;
import java.util.List;

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import org.apache.log4j.Logger;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

import com.blog.db.DBFactory;
import com.blog.search.BlogSearch;
import com.blog.search.index.BlogIndex;

public class SystemStartupListener implements ServletContextListener {
    private static Logger log = Logger.getLogger(SystemStartupListener.class);

    public void contextDestroyed(ServletContextEvent arg0) {
        DBFactory.shutDown();
    }

    public void contextInitialized(ServletContextEvent arg0) {
        SAXReader reader = new SAXReader();
        try {
            URL url = this.getClass().getClassLoader().getResource("system-config.xml");
            String path = url.getFile();
            Document doc = reader.read(new File(path));
            Element rootEle = doc.getRootElement();

            // Database settings
            List list = rootEle.elements("mysql");
            if (list.size() > 0) {
                Element mysqlEle = (Element) list.get(0);
                if (null != mysqlEle) {
                    String host = mysqlEle.elementText("host");
                    String port = mysqlEle.elementText("port");
                    String user = mysqlEle.elementText("user");
                    String password = mysqlEle.elementText("password");
                    Integer partitionCount = Integer.parseInt(mysqlEle.elementText("partitionCount"));
                    Integer maxWait = Integer.parseInt(mysqlEle.elementText("maxWait"));
                    String driverClass = mysqlEle.elementText("driverClass");
                    Integer idleMaxAge = Integer.parseInt(mysqlEle.elementText("idleMaxAge"));
                    Integer idleConnectionTestPeriod = Integer.parseInt(mysqlEle.elementText("idleConnectionTestPeriod"));
                    DBFactory.init(driverClass, host, user, password, partitionCount, maxWait, idleMaxAge, idleConnectionTestPeriod);
                }
            } else {
                throw new RuntimeException("Initialization failed....");
            }

            // Search settings
            list = rootEle.elements("search");
            if (list.size() > 0) {
                Element searchEle = (Element) list.get(0);
                String indexPath = searchEle.elementText("indexPath"); // where the index files are stored
                String searcNum = searchEle.elementText("searcNum");   // number of results per search
                String resultNum = searchEle.elementText("resultNum");
                String recommendNetIndexPath = searchEle.elementText("recommendNetIndexPath");
                System.setProperty("searcNum", searcNum);
                System.setProperty("resultNum", resultNum);
                System.setProperty("indexFilePath", indexPath);
                System.setProperty("recommendNetIndexPath", recommendNetIndexPath);
                BlogIndex.buildIndex(recommendNetIndexPath);
            } else {
                throw new RuntimeException("Initialization failed....");
            }

            log.info("Initializing search.....");
            BlogSearch.init();
        } catch (DocumentException e) {
            log.error("Error parsing the configuration file.....", e);
        } catch (Exception e) {
            log.error("An unknown error occurred....", e);
        }
    }
}

IV. The Constant class in the util package

package com.blog.util;

public class Constant {
    public static final Integer searcNum = Integer.parseInt(System.getProperty("searcNum"));
    public static final Integer resultNum = Integer.parseInt(System.getProperty("resultNum"));
}
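Note that these fields are initialized when the Constant class is first loaded, so the listener must already have set the `searcNum` and `resultNum` system properties by then. If a property is missing, `Integer.parseInt` receives null and class initialization fails. A minimal sketch of a more forgiving variant follows, assuming that falling back to 10 and 10000 (the values from system-config.xml) is acceptable. The class name `SearchSettings` is hypothetical.

```java
// Hypothetical alternative to Constant: reads the same system properties on each call
// and falls back to a default when a property is missing or is not a valid number.
class SearchSettings {

    static int searcNum() {
        return Integer.getInteger("searcNum", 10);
    }

    static int resultNum() {
        return Integer.getInteger("resultNum", 10000);
    }

    public static void main(String[] args) {
        System.clearProperty("searcNum");
        System.out.println(searcNum()); // property absent, so the default is used
        System.setProperty("searcNum", "25");
        System.out.println(searcNum()); // value now comes from the system property
    }
}
```

Reading the property on each call trades a small lookup cost for independence from class-loading order.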

The DataToJson class in the util package:

package com.blog.util;

import java.util.List;

import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

public class DataToJson {

    // Builds a JSON string containing the total hit count and the ids of the matched entries
    public static String parseDataToJson(List<Long> ids, int totalCount) {
        JsonObject json = new JsonObject();
        json.addProperty("totalCount", totalCount);
        JsonArray array = new JsonArray();
        if (ids.size() > 0) {
            for (Long id : ids) {
                JsonObject obj = new JsonObject();
                obj.addProperty("id", id);
                array.add(obj);
            }
        }
        json.add("data", array);
        return json.toString();
    }
}
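For reference, the string returned by parseDataToJson should have the shape `{"totalCount":…,"data":[{"id":…},…]}`. The sketch below builds that same shape with a plain StringBuilder, only so that it can be run without Gson on the classpath. The class name `JsonShapeDemo` is hypothetical, and the real code should keep using Gson.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Gson-free illustration of the response shape; not a replacement for DataToJson.
class JsonShapeDemo {

    static String toJson(List<Long> ids, int totalCount) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"totalCount\":").append(totalCount).append(",\"data\":[");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"id\":").append(ids.get(i)).append('}');
        }
        sb.append("]}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints {"totalCount":2,"data":[{"id":3},{"id":7}]}
        System.out.println(toJson(Arrays.asList(3L, 7L), 2));
    }
}
```

A client receives only ids and the total count, so it is expected to load the full entries from the database itself.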

V. Entity classes in the entity package:

Dashboard:

package com.blog.search.entity;

public class Dashboard {
    private Long id;
    private String content;
    private String title;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }
}

VI. Lucene indexing and search classes:

BlogIndex in the index package:

package com.blog.search.index;

import java.io.File;
import java.io.IOException;

import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

import com.blog.search.entity.Dashboard;

public class BlogIndex {
    private static final String indexFilePath = System.getProperty("indexFilePath");
    private static Logger log = Logger.getLogger(BlogIndex.class);

    public BlogIndex() {
    }

    // Call this during initialization when no index exists yet
    public static void buildIndex(String path) {
        File file = new File(path);
        if (file.isDirectory() && file.listFiles().length == 0) {
            Directory dir;
            try {
                dir = FSDirectory.open(new File(path));
                Analyzer analyzer = new IKAnalyzer(true);
                // Writer configuration
                IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43, analyzer);
                iwc.setOpenMode(OpenMode.CREATE);
                IndexWriter writer = new IndexWriter(dir, iwc);
                writer.deleteAll();
                writer.close();
            } catch (IOException e) {
                log.error("Error creating the empty index.....", e);
            }
        }
    }

    // title is stored and analyzed, content is analyzed only, id is stored as an exact-match term
    @SuppressWarnings("deprecation")
    private Document getDocument(Dashboard dashboard) throws Exception {
        Document doc = new Document();
        doc.add(new Field("title", dashboard.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("content", dashboard.getContent(), Field.Store.NO, Field.Index.ANALYZED));
        Field idField = new StringField("id", dashboard.getId().toString(), Field.Store.YES);
        doc.add(idField);
        return doc;
    }

    public void writeToIndex(Dashboard dashboard) throws Exception {
        Document doc = getDocument(dashboard);
        IndexWriter writer = null;
        try {
            Directory dir = FSDirectory.open(new File(indexFilePath));
            // Analyzer
            Analyzer analyzer = new IKAnalyzer(true);
            // Writer configuration
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43, analyzer);
            writer = new IndexWriter(dir, iwc);
            writer.addDocument(doc);
            writer.commit();
        } finally {
            // Close the writer even on failure so the index lock is released
            if (writer != null) {
                writer.close();
            }
        }
    }

    public void deleteIndex(Long id) {
        IndexWriter writer = null;
        try {
            Directory dir = FSDirectory.open(new File(indexFilePath));
            Analyzer analyzer = new IKAnalyzer(true);
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43, analyzer);
            writer = new IndexWriter(dir, iwc);
            writer.deleteDocuments(new Term("id", id.toString()));
            writer.commit();
        } catch (Exception e) {
            log.error("Error deleting from the index.....", e);
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException e) {
                    log.error("Error closing the index writer.....", e);
                }
            }
        }
    }

    public void updateIndex(Dashboard dashboard) throws Exception {
        Document doc = getDocument(dashboard);
        IndexWriter writer = null;
        try {
            Directory dir = FSDirectory.open(new File(indexFilePath));
            // Analyzer
            Analyzer analyzer = new IKAnalyzer(true);
            // Writer configuration
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43, analyzer);
            // iwc.setOpenMode(OpenMode.CREATE);
            writer = new IndexWriter(dir, iwc);
            // Replaces any existing document that has the same id
            writer.updateDocument(new Term("id", dashboard.getId().toString()), doc);
            writer.commit();
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
    }
}

VII. The BlogSearch class in the search package:

package com.blog.search;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.queryparser.classic.QueryParser.Operator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

import com.blog.util.Constant;
import com.blog.util.DataToJson;

public class BlogSearch {
    private static Logger log = Logger.getLogger(BlogSearch.class);
    private static final String indexFilePath = System.getProperty("indexFilePath");
    private static String[] field = {"title", "content"};
    private IndexSearcher searcher;

    // Cache the initialized IndexReader to avoid the cost of reopening the index files on every search
    private static Map<String, IndexReader> readers = new ConcurrentHashMap<String, IndexReader>();
    private static Object lock = new Object();

    public static void init() {
        try {
            IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(indexFilePath)));
            readers.put("blogsearch", reader);
            log.info(readers.toString());
        } catch (IOException e) {
            log.error("Error initializing the searcher.......", e);
        }
    }

    public TopDocs search(String keyword) {
        try {
            Analyzer analyzer = new IKAnalyzer(true);
            QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_43, field, analyzer);
            parser.setDefaultOperator(Operator.AND);
            // Wrap the keyword in a Query object
            Query query = parser.parse(keyword);
            // Lock so that the index cannot change after one thread has fetched the IndexReader
            // but before it has run the query; otherwise the IndexReader could be closed and
            // recreated in between, which may cause errors on close
            synchronized (lock) {
                IndexReader reader = readers.get("blogsearch");
                IndexReader newReader = DirectoryReader.openIfChanged((DirectoryReader) reader);
                if (newReader == null) {
                    // null means the index has not changed
                    newReader = reader;
                } else {
                    readers.put("blogsearch", newReader);
                    reader.close();
                }
                searcher = new IndexSearcher(newReader);
            }
            // newReader = DirectoryReader.open(FSDirectory.open(new File(indexFilePath)));
            TopDocs results = searcher.search(query, Constant.resultNum);
            return results;
        } catch (Exception e) {
            log.error("Error searching for keyword......", e);
            return null;
        }
    }

    public String getResult(String keyword, int pageSize) {
        TopDocs td = search(keyword);
        int totalCount = td.totalHits;
        ScoreDoc[] h = td.scoreDocs;
        List<Long> ids = new ArrayList<Long>(h.length);
        if (h.length == 0) {