Let’s move on to putting the Lucene and Azure work into the project, starting with a couple of callouts on things we will be using.
1. The AzureDirectory project from CodePlex.com (if you do not want to be on Azure, replace this class with one of the directory classes that ship with Lucene.Net)
2. We will also use a few helper functions I built in a previous post: http://chrisrizzuto.wordpress.com/2011/06/20/f-to-parse-userids-urls-and-hash-tags-from-text/
The code below writes to the index; AzureDirectory manages access to the index files in Blob Storage on Azure.
let WriteDoc(doc:string, catalog:string) =
    let dir = new AzureDirectory(acct, catalog)
    let analyzer = new StandardAnalyzer()
    let luceneDoc = new Document()
    let jsonDoc = JsonConvert.DeserializeObject<JObject>(doc)
    // Read a property as text, dropping its first and last character
    // (assumed to be the surrounding quotes); returns "" on any failure
    let getVal(obj:JObject, key:string) =
        try
            let s = obj.[key].ToString()
            s.Substring(1, s.Length-2)
        with
        | exp -> ""
    // If the document has a body, index the URLs, hash tags, and user IDs found in it
    try
        if jsonDoc.Property("body") <> null then
            let (urls, tags, usrs) = m_regex.parseTxtForTokens(getVal(jsonDoc, "body"))
            luceneDoc.Add(new Field("AZIndex.urls", urls.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))
            luceneDoc.Add(new Field("AZIndex.tags", tags.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))
            luceneDoc.Add(new Field("AZIndex.usrs", usrs.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))
    with
    | exp -> exp |> ignore
    // Assign an id when the incoming document does not carry one
    let newID = getDocID(getVal(jsonDoc, "docType"))
    if jsonDoc.Property("id") = null then
        jsonDoc.["id"] <- JToken.FromObject(newID)
    // Store the original JSON text in "content" so a search can return it as-is
    let fld = new Field("content", doc, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES)
    luceneDoc.Add(fld)
    // Add one field per top-level JSON property, whatever the schema
    for prop in jsonDoc.Properties() do
        let fldname = prop.Name
        let fldval = prop.Value.Value<string>()
        let fld = new Field(fldname, fldval, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES)
        luceneDoc.Add(fld)
    // Note: passing true creates a new index, replacing any existing one in this catalog;
    // pass false to append to an index that already exists
    let index = new IndexWriter(dir, analyzer, true)
    index.AddDocument(luceneDoc)
    index.Close()
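To make the schema-free mapping in WriteDoc easier to picture, here is a minimal sketch of the same idea without Lucene or Azure. It only lists the field name/value pairs that a document would produce: the whole document under "content", then one pair per top-level property. The helper name fieldsFor is hypothetical, and it uses System.Text.Json instead of the Json.NET types in the post, so treat it as an illustration rather than the project's code.

```fsharp
open System.Text.Json

// Hypothetical helper (not part of the original project): returns the
// (field name, field value) pairs that a schema-free indexer along the
// lines of WriteDoc would add for a JSON document.
let fieldsFor (doc: string) : (string * string) list =
    use parsed = JsonDocument.Parse(doc)
    let props =
        [ for prop in parsed.RootElement.EnumerateObject() do
            // String values are used as-is; any other kind keeps its raw JSON text.
            let value =
                if prop.Value.ValueKind = JsonValueKind.String then prop.Value.GetString()
                else prop.Value.GetRawText()
            yield (prop.Name, value) ]
    // The whole document goes first, under "content", so it can be returned verbatim.
    ("content", doc) :: props

// Usage: print the fields for a small sample document.
for (name, value) in fieldsFor """{"docType":"note","body":"hello #fsharp"}""" do
    printfn "%s = %s" name value
```

Because every top-level property becomes a field, two documents with entirely different shapes can sit in the same catalog and still be searched by property name.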
The function below accesses the index files and performs a Lucene search.
let query(parms:string, catalog:string) =
    let azDir = new AzureDirectory(acct, catalog)
    let index = new IndexSearcher(azDir, true)
    // Search the "content" field by default; parms uses Lucene query syntax
    let parser = new QueryParser("content", new StandardAnalyzer())
    let q = parser.Parse(parms)
    let results:Hits = index.Search(q)
    // Collect the stored JSON of every hit, separated by commas
    let sb = new StringBuilder("")
    for x in 0 .. results.Length()-1 do
        sb.Append(results.Doc(x).Get("content") + ",") |> ignore
    index.Close()
    let s = sb.ToString()
    // Drop the trailing comma; an empty result set yields an empty array instead of throwing
    let body = if s.Length > 0 then s.Substring(0, s.Length-1) else ""
    let output = JsonConvert.DeserializeObject<seq<JObject>>("[" + body + "]")
    JsonConvert.SerializeObject(output)
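The query function turns the stored "content" strings into a single JSON array by separating them with commas and wrapping the result in brackets. A small alternative sketch of that step, assuming the hits are already available as a sequence of JSON strings, is to let String.Join place the separators, which also covers the case of no hits. The name toJsonArray is hypothetical.

```fsharp
open System

// Hypothetical helper (not part of the original project): wraps
// already-serialized JSON documents in a single JSON array.
let toJsonArray (docs: seq<string>) : string =
    // String.Join only puts separators between items, so there is
    // no trailing comma to trim and an empty sequence gives "[]".
    "[" + String.Join(",", docs) + "]"

printfn "%s" (toJsonArray [ """{"id":1}"""; """{"id":2}""" ])  // prints [{"id":1},{"id":2}]
printfn "%s" (toJsonArray [])                                  // prints []
```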
The nice thing is that we now have a mechanism for storing any JSON object in Lucene, regardless of schema, and an easy, predictable way to search the index files and return JSON. The total project consists of the HttpHandler from Part 1 of this post and the regex functions. It is pretty small, and it was easily added to an Azure Web Role project.