Daily Archives: July 6, 2011

fDB: F# and JSON Database on Lucene (and Azure) – Part 2

Let’s move on now to putting the Lucene and Azure work into the project, starting with a couple of callouts on things we will be using.
1. The AzureDirectory project from CodePlex.com (if you do not want to be on Azure, replace this class with one of the directory classes that ship with Lucene.Net)
2. We will also use a few helper functions I built in a previous post: http://chrisrizzuto.wordpress.com/2011/06/20/f-to-parse-userids-urls-and-hash-tags-from-text/
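
For readers who do not have the earlier post at hand, here is a minimal sketch of what a token parser of this kind might look like. The function name `parseTokens` and the three regular expressions are assumptions made for illustration; they are not the code from the linked post, and the real `m_regex.parseTxtForTokens` may return different types.

```fsharp
open System.Text.RegularExpressions

// Hypothetical stand-in for m_regex.parseTxtForTokens.
// The patterns are deliberately simple and are assumptions,
// not the regexes from the original helper post.
let matchesOf (pattern: string) (text: string) : string list =
    Regex.Matches(text, pattern)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

// Returns (urls, tags, usrs), in the same order the post destructures them.
let parseTokens (text: string) =
    let urls = matchesOf @"https?://[^\s]+" text
    let tags = matchesOf @"#\w+" text
    let usrs = matchesOf @"@\w+" text
    (urls, tags, usrs)

let (urls, tags, usrs) =
    parseTokens "Thanks @chris, see http://example.com/fdb #lucene #fsharp"
```

A real parser would need to be stricter, for example so that a `#fragment` inside a URL is not also reported as a hash tag.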

The code below writes to the index; AzureDirectory manages access to the index files in Blob Storage on Azure.

    let WriteDoc(doc:string, catalog:string) =
        let dir = new AzureDirectory(acct, catalog)
        let analyzer = new StandardAnalyzer()
        let luceneDoc = new Document()
        let jsonDoc = JsonConvert.DeserializeObject<JObject>(doc)

        let getVal(obj:JObject, key:string) =
            try
                let s = obj.[key].ToString()
                s.Substring(1, s.Length-2)
            with
                | exp -> ""
        
        try
            if jsonDoc.Property("body") <> null then
                let (urls, tags, usrs) = m_regex.parseTxtForTokens(getVal(jsonDoc, "body"))
                luceneDoc.Add(new Field("AZIndex.urls", urls.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))       
                luceneDoc.Add(new Field("AZIndex.tags", tags.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))      
                luceneDoc.Add(new Field("AZIndex.usrs", usrs.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))
        with
            | exp -> exp |> ignore

        let newID = getDocID(getVal(jsonDoc, "docType"))
        if jsonDoc.Property("id") = null then
            jsonDoc.["id"] <- JToken.FromObject(newID)

        let fld = new Field("content", doc, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES)
        
        luceneDoc.Add(fld)

        for prop in jsonDoc.Properties() do
            let fldname = prop.Name
            let fldval = prop.Value.Value<string>()
            let fld = new Field(fldname, fldval, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES)
            luceneDoc.Add(fld)

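        // Create the index only when it does not exist yet; passing true on every call would overwrite it
        let index = new IndexWriter(dir, analyzer, not (IndexReader.IndexExists(dir)))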
        index.AddDocument(luceneDoc)
        index.Close()
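
The property loop in the function turns every top-level JSON property into its own Lucene field. The sketch below isolates just that flattening step so it can be tried without Lucene or Azure. It is an illustration under assumptions: it uses `System.Text.Json` (bundled with modern .NET) instead of the Json.NET `JObject` used in this post, and the name `flattenTopLevel` is made up here. Non-string values are kept as raw JSON text, which is one possible choice for nested objects and arrays that may not convert cleanly to a string.

```fsharp
open System.Text.Json

// Sketch only: System.Text.Json is swapped in for Json.NET so the
// example runs without extra packages. flattenTopLevel is a hypothetical name.
let flattenTopLevel (json: string) : (string * string) list =
    use doc = JsonDocument.Parse(json)
    [ for prop in doc.RootElement.EnumerateObject() do
        let value =
            match prop.Value.ValueKind with
            | JsonValueKind.String -> prop.Value.GetString() // plain text, no quotes
            | _ -> prop.Value.GetRawText()                   // numbers, objects, arrays as raw JSON
        yield (prop.Name, value) ]

let fields = flattenTopLevel """{"docType":"post","body":"hello #fdb","likes":3}"""
```

Each `(name, value)` pair corresponds to one `Field` added to the Lucene document, which is what later makes field-scoped queries such as `docType:post` possible.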

The function below opens the index files and performs a Lucene search.

    let query(parms:string, catalog:string) =
        let azDir = new AzureDirectory(acct, catalog)
        let index = new IndexSearcher(azDir, true)
        let parser = new QueryParser("content", new StandardAnalyzer())
        let q = parser.Parse(parms)
        let results:Hits = index.Search(q)
        let sb = new StringBuilder("")

        for x in 0 .. results.Length()-1 do
            sb.Append(results.Doc(x).Get("content") + ",") |> ignore
            
        index.Close()
        let s = sb.ToString()
        
        // TrimEnd drops the trailing comma and also copes with an empty result set
        let output = JsonConvert.DeserializeObject<seq<JObject>>("[" + s.TrimEnd(',') + "]")
        JsonConvert.SerializeObject(output)    
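
The array-assembly step at the end of the function can also be isolated into a small helper that covers the zero-hit case. This is a sketch, not part of the project; `toJsonArray` is a hypothetical name, and it assumes each stored `content` value is already a valid JSON document.

```fsharp
// Hypothetical helper: wraps already-serialized JSON documents in an array.
// String.concat puts separators only between items, so an empty
// sequence yields "[]" instead of failing on a missing trailing comma.
let toJsonArray (docs: seq<string>) : string =
    "[" + String.concat "," docs + "]"

let none = toJsonArray []
let two  = toJsonArray [ """{"id":1}"""; """{"id":2}""" ]
```

Joining with a separator avoids having to trim a trailing comma at all.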

The nice thing is that we now have a mechanism for storing any JSON object in Lucene, regardless of schema, and an easy, predictable way to search the index files and return JSON. The total project consists of the HttpHandler from Part 1 of this post and the regex functions. It is pretty small and was easily added to an Azure Web Role project.

 
Posted by on July 6, 2011 in Uncategorized

 
 