fDB: F# and JSON Database on Lucene (and Azure) – Part 2

06 Jul

Let’s move on to wiring Lucene and Azure into the project, starting with a couple of callouts on things we will be using.
1. The AzureDirectory project from CodePlex.com (if you do not want to be on Azure, replace this class with one of the directory classes that ship with Lucene.Net)
2. We will also use a few helper functions I built in a previous post: http://chrisrizzuto.wordpress.com/2011/06/20/f-to-parse-userids-urls-and-hash-tags-from-text/
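
For readers who have not seen that earlier post, here is a rough sketch of the kind of helper that returns URLs, hash tags, and user IDs from a piece of text. The patterns and the names `matchesOf` and `parseTokens` are made up for illustration; they are not the code from the linked post and only approximate what a real tokenizer would need.

```fsharp
open System.Text.RegularExpressions

// Hypothetical patterns: rough approximations, not the ones from the earlier post.
let urlPattern = @"https?://\S+"
let tagPattern = @"(?<!\w)#\w+"
let usrPattern = @"(?<!\w)@\w+"

// Collect every match of a pattern as a list of strings.
let matchesOf (pattern: string) (text: string) =
    Regex.Matches(text, pattern)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

// Return (urls, tags, usrs) found in the text.
let parseTokens (text: string) =
    let urls = matchesOf urlPattern text
    // Blank out URLs first so a fragment such as "/#top" is not read as a hash tag.
    let rest = Regex.Replace(text, urlPattern, " ")
    let tags = matchesOf tagPattern rest
    let usrs = matchesOf usrPattern rest
    (urls, tags, usrs)

// Example: urls = ["http://example.com/a"], tags = ["#fsharp"; "#lucene"], usrs = ["@chris"]
let (urls, tags, usrs) =
    parseTokens "Thanks @chris, see http://example.com/a #fsharp #lucene"
```

The lookbehind `(?<!\w)` keeps an email address like bob@example.com from being counted as a user ID.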

The code below writes to the index; AzureDirectory manages access to the index files in Blob Storage on Azure.

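    open System.Text
    open Lucene.Net.Analysis.Standard
    open Lucene.Net.Documents
    open Lucene.Net.Index
    open Lucene.Net.QueryParsers
    open Lucene.Net.Search
    open Lucene.Net.Store.Azure
    open Newtonsoft.Json
    open Newtonsoft.Json.Linq

    // acct, m_regex and getDocID come from elsewhere in the project (not shown in this post).
    let WriteDoc(doc:string, catalog:string) =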
        let dir = new AzureDirectory(acct, catalog)
        let analyzer = new StandardAnalyzer()
        let luceneDoc = new Document()
        let jsonDoc = JsonConvert.DeserializeObject<JObject>(doc)

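        // Returns the property's text with its first and last characters stripped
        // (intended to drop the surrounding quotes), or "" if the property is missing.
        let getVal(obj:JObject, key:string) =
            try
                let s = obj.[key].ToString()
                s.Substring(1, s.Length-2)
            with
                | exp -> ""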
        
        try
            if jsonDoc.Property("body") <> null then
                let (urls, tags, usrs) = m_regex.parseTxtForTokens(getVal(jsonDoc, "body"))
                luceneDoc.Add(new Field("AZIndex.urls", urls.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))       
                luceneDoc.Add(new Field("AZIndex.tags", tags.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))      
                luceneDoc.Add(new Field("AZIndex.usrs", usrs.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))
        with
            | exp -> exp |> ignore

        let newID = getDocID(getVal(jsonDoc, "docType"))
        if jsonDoc.Property("id") = null then
            jsonDoc.["id"] <- JToken.FromObject(newID)

        let fld = new Field("content", doc, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES)
        
        luceneDoc.Add(fld)

        for prop in jsonDoc.Properties() do
            let fldname = prop.Name
            let fldval = prop.Value.Value<string>()
            let fld = new Field(fldname, fldval, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES)
            luceneDoc.Add(fld)

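        // Passing true for the create flag builds a new index and overwrites any existing one in this catalog;
        // pass false to append to an index that already exists.
        let index = new IndexWriter(dir, analyzer, true)
        index.AddDocument(luceneDoc)
        index.Close()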
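
The property loop in WriteDoc turns every top-level JSON property into a Lucene field by reading it as a string, which suits simple values but may not convert cleanly for nested objects or arrays. Here is a rough sketch of one way to flatten top-level properties into name/text pairs before they become fields. Note the assumptions: it uses System.Text.Json (needs .NET Core 3.0 or later) rather than the Json.NET used in this post, and `topLevelFields` is a made-up name.

```fsharp
open System.Text.Json

// Turn each top-level property of a JSON object into a (name, text) pair.
// Strings are returned unquoted; every other value keeps its raw JSON text.
let topLevelFields (json: string) =
    use doc = JsonDocument.Parse(json)
    [ for prop in doc.RootElement.EnumerateObject() do
        let text =
            match prop.Value.ValueKind with
            | JsonValueKind.String -> prop.Value.GetString()
            | _ -> prop.Value.GetRawText()
        yield (prop.Name, text) ]

// Example: a string, a number, and a nested object.
let fields = topLevelFields """{"docType":"note","count":3,"meta":{"a":1}}"""
```

Each pair could then be handed to a Field constructor in the same way the loop in WriteDoc does.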

The function below opens the index files and performs a Lucene search.

    let query(parms:string, catalog:string) =
        let azDir = new AzureDirectory(acct, catalog)
        let index = new IndexSearcher(azDir, true)
        let parser = new QueryParser("content", new StandardAnalyzer())
        let q = parser.Parse(parms)
        let results:Hits = index.Search(q)
        let sb = new StringBuilder("")

        for x in 0 .. results.Length()-1 do
            sb.Append(results.Doc(x).Get("content") + ",") |> ignore
            
        index.Close()
        let s = sb.ToString()
        
        // TrimEnd drops the trailing comma and, unlike Substring, does not throw when there are no hits.
        let output = JsonConvert.DeserializeObject<seq<JObject>>("[" + s.TrimEnd(',') + "]")
        JsonConvert.SerializeObject(output)    
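
The query function builds its JSON array by concatenating the stored documents with commas. As a minimal sketch of that step on its own, String.Join does the same job and also copes with an empty result set. The name `toJsonArray` is made up for this example.

```fsharp
open System

// Wrap already-serialized JSON documents in a JSON array.
// With no documents this yields "[]".
let toJsonArray (docs: seq<string>) =
    "[" + String.Join(",", docs) + "]"

// Example: two stored documents, then an empty result set.
let twoDocs = toJsonArray [ """{"a":1}"""; """{"b":2}""" ]
let noDocs = toJsonArray []
```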

The nice thing is that we now have a mechanism for storing any JSON object in Lucene, regardless of schema, and an easy, predictable way to search the index files and return JSON. The total project consists of the HttpHandler from Part 1 of this post and the regex functions. It is pretty small, and was easily added to an Azure Web Role project.

About chrisriz

I love technology, especially technology that promotes value in the form of entertainment and/or education. Also, I love watching UFC, eating healthy, and exercising.

Posted on July 6, 2011 in Uncategorized