I looked through the Solr indexing source code and am writing it down here. (I wrote this yesterday; it read as very disorganized today, so I have reorganized it.)

First, the flow through the index-creation source code.

Source classes:
1. CommonsHttpSolrServer (subclass of SolrServer)
2. SolrServer (abstract class)
3. SolrRequest (base class)
4. AbstractUpdateRequest (abstract class, subclass of SolrRequest)
5. UpdateRequest (subclass of AbstractUpdateRequest)
6. SolrInputDocument (holds the field names and values to index; arguably this one should come first)

Index-creation code:
- Query the database (or read other document data) and index it:
private void updateBook(String sql, String url, String idColumn,
        String timeColumn, BufferedWriter dataFile) throws Exception {
    long start = System.currentTimeMillis();
    SolrUtil solrUtil = new SolrUtil(url); // initialize the index client
    SolrDocument doc = SqlSh.getSolrMaxDoc(solrUtil, idColumn, timeColumn);
    if (doc == null) {
        CommonLogger.getLogger().error("solr no data.");
        return;
    }
    int maxId = Integer.parseInt(doc.get(idColumn).toString());
    long maxTime = Long.parseLong(doc.get(timeColumn).toString()) * 1000;
    Date maxDate = new Date(maxTime);
    DateFormat dateFormat2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    // Fetch the database rows that need indexing
    ResultSet rs = stmt_m.executeQuery(String.format(sql,
            dateFormat2.format(maxDate)));
    // Read the column names (keys) that will be indexed
    initColumeMeta(rs.getMetaData());
    // Parse the rows and index them
    parseRs(rs, solrUtil);
    rs.close();
    // Optimize the index
    solrUtil.server.optimize();
    CommonLogger.getLogger().info(
            "update book time:" + (System.currentTimeMillis() - start)
                    / 1000 + "s");
}
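One detail worth calling out: the time column stores epoch seconds, so the method multiplies by 1000 to get milliseconds before constructing a Date. A minimal self-contained sketch of just that conversion (the class and method names here are mine, not from the code above):

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Sketch of the timestamp handling in updateBook(): the column holds epoch
// seconds, so it must be multiplied by 1000 before building a java.util.Date.
public class MaxTimeFormat {
    public static String format(long epochSeconds, TimeZone tz) {
        Date maxDate = new Date(epochSeconds * 1000L); // seconds -> milliseconds
        DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(tz);
        return fmt.format(maxDate);
    }
}
```

Forgetting the multiplication silently produces dates near 1970, which makes the incremental SQL query re-fetch everything.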
2. Now let's look at the parseRs method used above:
// A simple method that parses the result set and writes the index
private void parseRs(ResultSet rs, SolrUtil solrUtil) throws Exception {
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    SolrInputDocument doc = null;
    int locBk = 0; // number of rows processed
    try {
        while (rs.next()) {
            doc = new SolrInputDocument();
            for (int i = 0; i < ToolMain.columnNames.length; i++) {
                doc.addField(ToolMain.columnNames[i],
                        getColumnValue(rs.getObject(ToolMain.columnNames[i]),
                                ToolMain.columnTypes[i]));
                // addField sets one field; an extra parameter can set a boost (weight)
            }
            docs.add(doc);
            locBk++;
            if (docs.size() >= 1000) { // flush in batches of 1000
                solrUtil.addDocList(docs); // adding and committing both happen in here
                docs.clear();
            }
        }
        if (docs.size() > 0) { // flush the final partial batch
            solrUtil.addDocList(docs);
            docs.clear();
        }
    } finally {
        docs.clear();
        docs = null;
    }
}
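The batching in parseRs (flush every 1000 documents, then a final flush for the partial batch) is a general pattern worth isolating. A self-contained sketch, where the flush callback stands in for solrUtil.addDocList and all names are illustrative rather than SolrJ types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of the "flush every N items" pattern used in parseRs.
public class BatchFlusher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flush;      // stands in for addDocList(docs)
    private final List<T> buffer = new ArrayList<>();
    private int total = 0;

    public BatchFlusher(int batchSize, Consumer<List<T>> flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    public void add(T item) {
        buffer.add(item);
        total++;
        if (buffer.size() >= batchSize) {       // same check as docs.size() >= 1000
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public void finish() {                      // mirrors the final "if (docs.size() > 0)" flush
        if (!buffer.isEmpty()) {
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public int getTotal() { return total; }
}
```

Flushing in bounded batches keeps memory usage flat no matter how many rows the ResultSet returns.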
Related reading:
Setting up Solr 3.6.1 under Tomcat 6: http://www.linuxidc.com/Linux/2013-01/77664.htm
Solr 3.5 cluster deployment on Tomcat: http://www.linuxidc.com/Linux/2012-12/75297.htm
Load-balancing a Solr cluster with Nginx on Linux: http://www.linuxidc.com/Linux/2012-12/75257.htm
Installing and using Solr on Linux: http://www.linuxidc.com/Linux/2012-10/72029.htm
Deploying Solr 4 on Tomcat under Ubuntu 12.04 LTS: http://www.linuxidc.com/Linux/2012-09/71158.htm
Low-level query parsing (QParser) in Solr: http://www.linuxidc.com/Linux/2012-05/59755.htm
Building a search server on Solr 3.5: http://www.linuxidc.com/Linux/2012-05/59743.htm
Solr 3.5 development and application tutorial (PDF): http://www.linuxidc.com/Linux/2013-10/91048.htm
Solr 4.0 deployment tutorial: http://www.linuxidc.com/Linux/2013-10/91041.htm
3. Next, the SolrUtil class, which mainly wraps CommonsHttpSolrServer:
import java.util.Collection;
import log.CommonLogger;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrUtil {
    public CommonsHttpSolrServer server = null;
    public String url = ""; // url is the address of the Solr service
    public String shards = "";

    public SolrUtil(String url) {
        this.url = url;
        initSolr();
    }

    public SolrUtil(String url, String shards) {
        this.url = url;
        this.shards = shards;
        initSolr();
    }

    // Initialize the server
    private void initSolr() {
        try {
            server = new CommonsHttpSolrServer(url);
            server.setSoTimeout(60 * 1000);
            server.setConnectionTimeout(60 * 1000);
            server.setDefaultMaxConnectionsPerHost(1000);
            server.setMaxTotalConnections(1000);
            server.setFollowRedirects(false);
            server.setAllowCompression(true);
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(-1);
        }
    }

    // Wraps add + commit
    public void addDocList(Collection<SolrInputDocument> docs) {
        try {
            server.add(docs);
            server.commit();
            docs.clear(); // release the batch
        } catch (Exception e) {
            CommonLogger.getLogger().error("addDocList error.", e);
        }
    }

    public void deleteDocByQuery(String query) throws Exception {
        try {
            server.deleteByQuery(query);
            server.commit();
        } catch (Exception e) {
            CommonLogger.getLogger().error("deleteDocByQuery error.", e);
            throw e;
        }
    }
}
4. Now let's look at the Solr source code that actually builds the index.
What the source code does, at bottom, is build a request and return a response.
1. What the SolrInputDocument class used above is:
public class SolrInputDocument implements Map<String,SolrInputField>, Iterable<SolrInputField>, Serializable // implements the Map and Iterable interfaces (and their methods); the central type it holds is SolrInputField
public class SolrInputField implements Iterable<Object>, Serializable // this class has just three fields: a String key, an Object value, and a score, float boost = 1.0f, which defaults to 1.0f (set it if you want field weighting)
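A toy re-creation may make that shape concrete. This MiniInputField is purely illustrative, not the real SolrJ class; it only mirrors the three-field structure described above:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;

// Illustrative re-creation of the shape of SolrInputField:
// one key, one value, and a boost that defaults to 1.0f.
public class MiniInputField implements Iterable<Object>, Serializable {
    private final String key;
    private Object value;
    private float boost = 1.0f; // default weight; raise it to boost this field

    public MiniInputField(String key) { this.key = key; }

    public void setValue(Object v, float boost) {
        this.value = v;
        this.boost = boost;
    }

    public String getKey() { return key; }
    public Object getValue() { return value; }
    public float getBoost() { return boost; }

    // Iterating yields the stored values: each element of a collection value,
    // or the single value itself.
    @Override
    public Iterator<Object> iterator() {
        Collection<Object> vals = new ArrayList<>();
        if (value instanceof Collection) {
            vals.addAll((Collection<?>) value);
        } else if (value != null) {
            vals.add(value);
        }
        return vals.iterator();
    }
}
```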
Now let's look at what the CommonsHttpSolrServer class does when it executes (this is what addDocList in SolrUtil ends up invoking).
2. The add-documents method:
public UpdateResponse add(Collection<SolrInputDocument> docs) throws SolrServerException, IOException {
    UpdateRequest req = new UpdateRequest(); // create a request
    req.add(docs); // call UpdateRequest's add method to append the documents to index
    return req.process(this); // the important call: this is what returns the response
}
// Now the add method of UpdateRequest
private List<SolrInputDocument> documents = null;
public UpdateRequest add(final Collection<SolrInputDocument> docs) {
    if (documents == null) {
        documents = new ArrayList<SolrInputDocument>(docs.size() + 1);
    }
    documents.addAll(docs);
    return this;
}
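The pattern here is worth noting: the backing list is allocated lazily on first use, pre-sized to the first batch, and add returns this so calls can be chained. A sketch in isolation (DocAccumulator is an illustrative name, not a SolrJ type):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Sketch of UpdateRequest-style lazy accumulation.
public class DocAccumulator<T> {
    private List<T> documents = null; // stays null until something is added

    public DocAccumulator<T> add(Collection<T> docs) {
        if (documents == null) {
            documents = new ArrayList<>(docs.size() + 1); // pre-size to first batch
        }
        documents.addAll(docs);
        return this; // allows chaining, like UpdateRequest.add
    }

    public int size() { return documents == null ? 0 : documents.size(); }
}
```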
3. The commit method, which is defined in the SolrServer class:
public UpdateResponse commit(boolean waitFlush, boolean waitSearcher) throws SolrServerException, IOException {
    return new UpdateRequest().setAction(UpdateRequest.ACTION.COMMIT, waitFlush, waitSearcher).process(this); // see?
}
// setAction exists to populate a ModifiableSolrParams object (which is eventually used in
// CommonsHttpSolrServer's request method); committing the index therefore also goes
// through the same process method.
4. The optimize method:
public UpdateResponse optimize(boolean waitFlush, boolean waitSearcher, int maxSegments) throws SolrServerException, IOException {
    return new UpdateRequest().setAction(UpdateRequest.ACTION.OPTIMIZE, waitFlush, waitSearcher, maxSegments).process(this);
    // Same pattern: it calls process, and setAction populates the ModifiableSolrParams object
    // (used later in CommonsHttpSolrServer's request method); for this action the handling in
    // request() mainly merges and compacts index segments.
}
5. Since process keeps coming up, let's look at it:
@Override
public UpdateResponse process(SolrServer server) throws SolrServerException, IOException {
    long startTime = System.currentTimeMillis();
    UpdateResponse res = new UpdateResponse();
    res.setResponse(server.request(this)); // the crucial call: this invokes CommonsHttpSolrServer's request method
    res.setElapsedTime(System.currentTimeMillis() - startTime);
    return res;
}
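The timing pattern in process (wrap the request call, record elapsed wall-clock time alongside the result) can be sketched on its own. TimedResult here is an illustrative type, not part of SolrJ:

```java
import java.util.function.Supplier;

// Sketch of the timing pattern in UpdateRequest.process(): run the request,
// record elapsed wall-clock time, and return both together.
public class Timed {
    public static class TimedResult<T> {
        public final T value;
        public final long elapsedMillis;
        TimedResult(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    public static <T> TimedResult<T> time(Supplier<T> request) {
        long start = System.currentTimeMillis();
        T value = request.get(); // stands in for the server.request(this) call
        return new TimedResult<>(value, System.currentTimeMillis() - start);
    }
}
```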
6. Everything ultimately lands in the request method of CommonsHttpSolrServer (the SolrServer subclass), so let's see how it works:
public NamedList<Object> request(final SolrRequest request, ResponseParser processor) throws SolrServerException, IOException {
    HttpMethod method = null;
    InputStream is = null;
    SolrParams params = request.getParams();
    Collection<ContentStream> streams = requestWriter.getContentStreams(request);
    String path = requestWriter.getPath(request);
    // Index updates come in as /update; /select is for queries
    if (path == null || !path.startsWith("/")) {
        path = "/select";
    }
    ResponseParser parser = request.getResponseParser();
    if (parser == null) {
        parser = _parser;
    }
    // The parser 'wt=' and 'version=' params are used instead of the original params
    ModifiableSolrParams wparams = new ModifiableSolrParams();
    wparams.set(CommonParams.WT, parser.getWriterType());
    wparams.set(CommonParams.VERSION, parser.getVersion());
    if (params == null) {
        params = wparams;
    }
    else {
        params = new DefaultSolrParams(wparams, params);
    }
    if (_invariantParams != null) {
        params = new DefaultSolrParams(_invariantParams, params);
    }
    int tries = _maxRetries + 1;
    try {
        while (tries-- > 0) {
            // Note: since we aren't doing intermittent time keeping
            // ourselves, the potential non-timeout latency could be as
            // much as tries-times (plus scheduling effects) the given
            // timeAllowed.
            try { // reading the Solr source shows that an UpdateRequest is automatically sent as a POST
                if (SolrRequest.METHOD.GET == request.getMethod()) {
                    if (streams != null) {
                        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "GET can't send streams!");
                    }
                    method = new GetMethod(_baseURL + path + ClientUtils.toQueryString(params, false));
                }
                else if (SolrRequest.METHOD.POST == request.getMethod()) { // so we go straight to this branch
                    String url = _baseURL + path;
                    boolean isMultipart = (streams != null && streams.size() > 1);
                    if (streams == null || isMultipart) {
                        PostMethod post = new PostMethod(url); // set up the POST: request headers, body, parameters, and so on
                        post.getParams().setContentCharset("UTF-8");
                        if (!this.useMultiPartPost && !isMultipart) {
                            post.addRequestHeader("Content-Type",
                                    "application/x-www-form-urlencoded; charset=UTF-8");
                        }
                        List<Part> parts = new LinkedList<Part>();
                        Iterator<String> iter = params.getParameterNamesIterator();
                        while (iter.hasNext()) {
                            String p = iter.next();
                            String[] vals = params.getParams(p);
                            if (vals != null) {
                                for (String v : vals) {
                                    if (this.useMultiPartPost || isMultipart) {
                                        parts.add(new StringPart(p, v, "UTF-8"));
                                    } else {
                                        post.addParameter(p, v);
                                    }
                                }
                            }
                        }
                        if (isMultipart) {
                            int i = 0;
                            for (ContentStream content : streams) {
                                final ContentStream c = content;
                                String charSet = null;
                                PartSource source = new PartSource() {
                                    public long getLength() {
                                        return c.getSize();
                                    }
                                    public String getFileName() {
                                        return c.getName();
                                    }
                                    public InputStream createInputStream() throws IOException {
                                        return c.getStream();
                                    }
                                };
                                parts.add(new FilePart(c.getName(), source,
                                        c.getContentType(), charSet));
                            }
                        }
                        if (parts.size() > 0) {
                            post.setRequestEntity(new MultipartRequestEntity(parts
                                    .toArray(new Part[parts.size()]), post.getParams()));
                        }
                        method = post;
                    }
                    // If it has one stream, it is the post body; put the params in the URL
                    else {
                        String pstr = ClientUtils.toQueryString(params, false);
                        PostMethod post = new PostMethod(url + pstr);
                        // Single stream as body
                        // Using a loop just to get the first one
                        final ContentStream[] contentStream = new ContentStream[1];
                        for (ContentStream content : streams) {
                            contentStream[0] = content;
                            break;
                        }
                        if (contentStream[0] instanceof RequestWriter.LazyContentStream) {
                            post.setRequestEntity(new RequestEntity() {
                                public long getContentLength() {
                                    return -1;
                                }
                                public String getContentType() {
                                    return contentStream[0].getContentType();
                                }
                                public boolean isRepeatable() {
                                    return false;
                                }
                                public void writeRequest(OutputStream outputStream) throws IOException {
                                    ((RequestWriter.LazyContentStream) contentStream[0]).writeTo(outputStream);
                                }
                            });
                        } else {
                            is = contentStream[0].getStream();
                            post.setRequestEntity(new InputStreamRequestEntity(is, contentStream[0].getContentType()));
                        }
                        method = post;
                    }
                }
                else {
                    throw new SolrServerException("Unsupported method: " + request.getMethod());
                }
            }
            catch (NoHttpResponseException r) {
                // This is generally safe to retry on
                method.releaseConnection();
                method = null;
                if (is != null) {
                    is.close();
                }
                // If out of tries then just rethrow (as normal error).
                if (tries < 1) {
                    throw r;
                }
                //log.warn("Caught: " + r + ". Retrying...");
            }
        }
    }
    catch (IOException ex) {
        throw new SolrServerException("error reading streams", ex);
    }
    method.setFollowRedirects(_followRedirects);
    method.addRequestHeader("User-Agent", AGENT);
    if (_allowCompression) {
        method.setRequestHeader(new Header("Accept-Encoding", "gzip,deflate"));
    }
    try {
        // Execute the method.
        //System.out.println("EXECUTE:" + method.getURI());
        // Execute the request, check the status code, then assemble and return the response
        int statusCode = _httpClient.executeMethod(method);
        if (statusCode != HttpStatus.SC_OK) {
            StringBuilder msg = new StringBuilder();
            msg.append(method.getStatusLine().getReasonPhrase());
            msg.append("\n\n");
            msg.append(method.getStatusText());
            msg.append("\n\n");
            msg.append("request: " + method.getURI());
            throw new SolrException(statusCode, java.net.URLDecoder.decode(msg.toString(), "UTF-8"));
        }
        // Read the contents
        String charset = "UTF-8";
        if (method instanceof HttpMethodBase) {
            charset = ((HttpMethodBase) method).getResponseCharSet();
        }
        InputStream respBody = method.getResponseBodyAsStream();
        // Jakarta Commons HTTPClient doesn't handle any
        // compression natively. Handle gzip or deflate
        // here if applicable.
        if (_allowCompression) {
            Header contentEncodingHeader = method.getResponseHeader("Content-Encoding");
            if (contentEncodingHeader != null) {
                String contentEncoding = contentEncodingHeader.getValue();
                if (contentEncoding.contains("gzip")) {
                    //log.debug("wrapping response in GZIPInputStream");
                    respBody = new GZIPInputStream(respBody);
                }
                else if (contentEncoding.contains("deflate")) {
                    //log.debug("wrapping response in InflaterInputStream");
                    respBody = new InflaterInputStream(respBody);
                }
            }
            else {
                Header contentTypeHeader = method.getResponseHeader("Content-Type");
                if (contentTypeHeader != null) {
                    String contentType = contentTypeHeader.getValue();
                    if (contentType != null) {
                        if (contentType.startsWith("application/x-gzip-compressed")) {
                            //log.debug("wrapping response in GZIPInputStream");
                            respBody = new GZIPInputStream(respBody);
                        }
                        else if (contentType.startsWith("application/x-deflate")) {
                            //log.debug("wrapping response in InflaterInputStream");
                            respBody = new InflaterInputStream(respBody);
                        }
                    }
                }
            }
        }
        return processor.processResponse(respBody, charset);
    }
    catch (HttpException e) {
        throw new SolrServerException(e);
    }
    catch (IOException e) {
        throw new SolrServerException(e);
    }
    finally {
        method.releaseConnection();
        if (is != null) {
            is.close();
        }
    }
}
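The retry loop at the top of request (try up to _maxRetries + 1 times, retry only on the "safe to retry" exception, rethrow once the attempts are exhausted) can be sketched on its own. Here IOException stands in for NoHttpResponseException, and the helper names are mine:

```java
import java.io.IOException;

// Sketch of the retry pattern in CommonsHttpSolrServer.request().
public class RetryLoop {
    public interface Attempt<T> { T run() throws IOException; }

    public static <T> T withRetries(int maxRetries, Attempt<T> attempt) throws IOException {
        int tries = maxRetries + 1;          // total attempts, as in the source
        while (tries-- > 0) {
            try {
                return attempt.run();
            } catch (IOException e) {        // stands in for NoHttpResponseException
                if (tries < 1) {
                    throw e;                 // out of tries: rethrow as a normal error
                }
                // otherwise fall through and retry
            }
        }
        throw new IllegalStateException("unreachable for maxRetries >= 0");
    }
}
```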
A few closing notes:
1. Query the database, read files, and so on, and load the data into SolrInputDocument objects however fits your case; internally SolrInputDocument keeps a map whose values are really SolrInputField objects.
2. Initialize CommonsHttpSolrServer with the service url (the Solr address), timeouts, maximum connection counts, and so on (the SolrUtil class).
3. The add/commit/optimize methods of SolrServer all ultimately go through the process method in AbstractUpdateRequest.
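Finally, the Content-Encoding handling in request (step 6) boils down to choosing a wrapper stream for the response body. A self-contained sketch using only the JDK (ResponseDecoder is an illustrative name, not SolrJ code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Sketch of the compression handling in request(): wrap the response body
// according to the encoding the server reported in Content-Encoding.
public class ResponseDecoder {
    public static InputStream wrap(InputStream respBody, String contentEncoding) throws IOException {
        if (contentEncoding == null) return respBody;
        if (contentEncoding.contains("gzip")) return new GZIPInputStream(respBody);
        if (contentEncoding.contains("deflate")) return new InflaterInputStream(respBody);
        return respBody;
    }

    // Demo helper: gzip-compress a byte array in memory.
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }
}
```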