Monthly Archives: July 2011

fDB: F# and JSON Database on Lucene (and Azure) – Part 2

Let’s move on to bringing Lucene and Azure into the project, starting with a couple of callouts on what we will be using.
1. The AzureDirectory project from CodePlex.com (if you do not want to be on Azure, replace this class with one of the directory classes that ship with Lucene.Net).
2. A few helper functions I built in a previous post: http://chrisrizzuto.wordpress.com/2011/06/20/f-to-parse-userids-urls-and-hash-tags-from-text/

The code below writes to the index; AzureDirectory manages access to the index files in Blob Storage on Azure.

    let WriteDoc(doc:string, catalog:string) =
        // AzureDirectory stores the index files for this catalog in blob storage
        let dir = new AzureDirectory(acct, catalog)
        let analyzer = new StandardAnalyzer()
        let luceneDoc = new Document()
        let jsonDoc = JsonConvert.DeserializeObject<JObject>(doc)

        // Every field is stored, analyzed, and keeps term vectors
        let mkField(name:string, value:string) =
            new Field(name, value, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES)

        // Returns the property's text minus its first and last character
        // (the quotes of the serialized string), or "" if the property is missing
        let getVal(obj:JObject, key:string) =
            try
                let s = obj.[key].ToString()
                s.Substring(1, s.Length-2)
            with
                | _ -> ""

        // Best effort: index the URLs, hash tags, and user IDs found in the body
        try
            if jsonDoc.Property("body") <> null then
                let (urls, tags, usrs) = m_regex.parseTxtForTokens(getVal(jsonDoc, "body"))
                luceneDoc.Add(mkField("AZIndex.urls", urls.ToString()))
                luceneDoc.Add(mkField("AZIndex.tags", tags.ToString()))
                luceneDoc.Add(mkField("AZIndex.usrs", usrs.ToString()))
        with
            | _ -> ()

        // Assign an ID if the document does not carry one
        let newID = getDocID(getVal(jsonDoc, "docType"))
        if jsonDoc.Property("id") = null then
            jsonDoc.["id"] <- JToken.FromObject(newID)

        // Keep the raw JSON in "content" so a query can return it as-is
        luceneDoc.Add(mkField("content", doc))

        // Index each top-level property under its own name
        for prop in jsonDoc.Properties() do
            luceneDoc.Add(mkField(prop.Name, prop.Value.Value<string>()))

        // Note: create = true starts a new index, replacing any existing one for this catalog
        let index = new IndexWriter(dir, analyzer, true)
        index.AddDocument(luceneDoc)
        index.Close()
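
WriteDoc leans on parseTxtForTokens from the earlier post, which is not reproduced here. As a rough idea of what such a helper could look like, here is a minimal sketch. The function name parseTokens and all three regex patterns are illustrative assumptions, not the ones from that post.

```fsharp
open System.Text.RegularExpressions

// Hypothetical stand-in for the helper from the earlier post.
// Returns (urls, tags, usrs) as lists of the matched strings.
let parseTokens (text: string) =
    // Collects every match of the pattern in the text
    let find (pattern: string) =
        Regex.Matches(text, pattern)
        |> Seq.cast<Match>
        |> Seq.map (fun m -> m.Value)
        |> List.ofSeq
    // Assumed patterns: http(s) URLs, #hashtags, @userids
    (find @"https?://\S+", find @"#\w+", find @"@\w+")
```

Calling parseTokens on a tweet-like string gives three lists that can each be turned into a field value, in the same way WriteDoc does with urls, tags, and usrs.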

The function below opens the index files and performs a Lucene search.

    let query(parms:string, catalog:string) =
        let azDir = new AzureDirectory(acct, catalog)
        let index = new IndexSearcher(azDir, true)   // true = open the index read-only
        let parser = new QueryParser("content", new StandardAnalyzer())
        let q = parser.Parse(parms)
        let results:Hits = index.Search(q)
        let sb = new StringBuilder("")

        // Each hit's "content" field holds the original JSON document
        for x in 0 .. results.Length()-1 do
            sb.Append(results.Doc(x).Get("content") + ",") |> ignore

        index.Close()

        // TrimEnd drops the trailing comma and, unlike Substring, is safe when there are no hits
        let output = JsonConvert.DeserializeObject<seq<JObject>>("[" + sb.ToString().TrimEnd(',') + "]")
        JsonConvert.SerializeObject(output)
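
The trailing-comma handling in query can also be avoided altogether by joining the stored documents instead of appending to a StringBuilder. A minimal sketch, assuming the hits have already been collected into a sequence of JSON strings; toJsonArray is a hypothetical helper name, not part of the project:

```fsharp
open System

// Hypothetical helper: joins already-serialized JSON documents into one
// JSON array string. An empty sequence yields "[]".
let toJsonArray (docs: seq<string>) : string =
    "[" + String.Join(",", docs) + "]"
```

With this shape there is no separator to strip, so a search with no hits needs no special case.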

The nice thing is that we now have a mechanism for storing any JSON object in Lucene, regardless of schema, and an easy, predictable way to search the index files and return JSON. The total project consists of the HttpHandler from Part 1 of this post and the regex functions. It is pretty small, and it was easily added to an Azure Web Role project.

 
Posted on July 6, 2011 in Uncategorized

 

fDB: F# and JSON Database on Lucene – Part 1

I am experimenting with F# a bit and decided to bring in Lucene, something I have used extensively in the past, and vet the result by building it out on Azure.

A couple of things about the approach:

1. An HTTP interface through a custom HTTP handler written in F#
2. Wrappers for Lucene to save data, manage the index, and query
3. The open source project on CodePlex for the “AzureDirectory” extension to Lucene

First, let’s take a look at what the input mechanism will be:

http://domain.com/<catalog>/<action>?(<timeout>)<query>

We need a function that takes the relevant parts of the HTTP call, decodes them, and has the data ready for the core services of the platform to act on. The method below returns a four-item tuple (TimeOut, Feature, Action, Parms) for use in the rest of the application. Feature is the <catalog> segment of the URL, and the timeout defaults to 0 when none is given.

type HttpHandler() =
    let GetHttpParts(rqst:HttpRequest) =
        let items = rqst.Url.AbsolutePath.Split('/')

        let ftr =
            if items.GetValue(1).ToString().Trim().Length = 0 then
                "index"
            else
                items.GetValue(1).ToString().ToLower()

        let (parms, timeOut) =
            match rqst.HttpMethod with
                | "GET" ->
                    if rqst.Url.Query.StartsWith("?(") then
                        // "?(<timeout>)<query>": split the timeout from the query
                        let s = rqst.Url.Query.Substring(2)
                        let pos = s.IndexOf(")")
                        let sTO = s.Substring(0, pos)
                        (HttpUtility.UrlDecode(s.Substring(pos+1)), Int32.Parse(sTO))
                    elif rqst.Url.Query.Length > 0 then
                        // "?<query>": no timeout given
                        (HttpUtility.UrlDecode(rqst.Url.Query.Substring(1)), 0)
                    else
                        ("", 0)
                | _ ->
                    // Other verbs carry the data in the request body
                    let rdr = new StreamReader(rqst.InputStream)
                    let body = rdr.ReadToEnd()
                    if rqst.Url.Query.StartsWith("?(") then
                        let s = rqst.Url.Query.Substring(2)
                        let pos = s.IndexOf(")")
                        (body, Int32.Parse(s.Substring(0, pos)))
                    else
                        // No "?(<timeout>)" prefix: run without a timeout instead of throwing
                        (body, 0)

        match items.Length with
            | 2 -> (timeOut, ftr, "", parms)
            | 3 | 4 -> (timeOut, ftr, items.GetValue(2).ToString().ToLower(), parms)
            | _ -> (timeOut, "index", "", parms)
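
To make the "?(<timeout>)<query>" convention concrete, here is a small standalone sketch of just the query-string split, separate from HttpRequest so it can be tried in F# Interactive. The name splitTimeout is hypothetical, and it uses Uri.UnescapeDataString rather than HttpUtility.UrlDecode to avoid the System.Web dependency (one difference: it does not turn "+" into a space).

```fsharp
open System

// Hypothetical helper (not part of the handler above): splits a raw query
// string such as "?(500)body%3Alucene" into (parms, timeOut).
let splitTimeout (query: string) : string * int =
    if String.IsNullOrEmpty query then
        ("", 0)
    elif query.StartsWith("?(") then
        let s = query.Substring(2)
        let pos = s.IndexOf(')')
        if pos < 0 then
            // No closing parenthesis: treat everything after "?" as the query
            (Uri.UnescapeDataString(query.Substring(1)), 0)
        else
            let parms = Uri.UnescapeDataString(s.Substring(pos + 1))
            // A non-numeric timeout falls back to 0 (no timeout)
            match Int32.TryParse(s.Substring(0, pos)) with
            | true, timeOut -> (parms, timeOut)
            | false, _ -> (parms, 0)
    else
        (Uri.UnescapeDataString(query.Substring(1)), 0)
```

For example, splitTimeout "?(500)body%3Alucene" yields the decoded query together with a timeout of 500, while a query without the parenthesized prefix comes back with a timeout of 0.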

Now we need to start building out the ProcessRequest method of the HTTP handler. Keep in mind that we want the timeout to be respected. To do this, I am going to use the async capabilities available in the Task Parallel Library and the F# constructs built on them: wrap the work in an async computation, then call RunSynchronously with the timeout set.

    interface IHttpHandler with
        // IHttpHandler also requires IsReusable
        member this.IsReusable = false

        member this.ProcessRequest(ctx:HttpContext) =
            let (timeOut, feature, action, parms) = GetHttpParts(ctx.Request)

            let operation = async {
                // Placeholder: do the work here based on the feature and action passed in to the HTTP call
                let results = ""
                return results
            }

            let result =
                if timeOut > 0 then
                    try
                        Async.RunSynchronously(operation, timeout=timeOut)
                    with
                        // RunSynchronously raises TimeoutException when the timeout elapses
                        | :? TimeoutException ->
                            "ERROR: TIMEOUT " + timeOut.ToString()
                else
                    Async.RunSynchronously operation

            ctx.Response.Write(JsonConvert.SerializeObject(result))
            ctx.Response.End()
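
The timeout behavior is easy to try outside of a web request. Below is a minimal sketch that can be pasted into F# Interactive; runWithTimeout, slow, and fast are hypothetical names used only for illustration, and the timeout is in milliseconds.

```fsharp
open System

// Hypothetical wrapper: runs `work` and returns its result, or an error
// string if it does not finish within timeOutMs milliseconds.
// A timeout of 0 or less means "wait as long as it takes".
let runWithTimeout (timeOutMs: int) (work: Async<string>) : string =
    if timeOutMs > 0 then
        try
            Async.RunSynchronously(work, timeout = timeOutMs)
        with
        | :? TimeoutException -> "ERROR: TIMEOUT " + string timeOutMs
    else
        Async.RunSynchronously work

// Stand-in for a slow operation (sleeps for two seconds)
let slow = async {
    do! Async.Sleep 2000
    return "done" }

// Stand-in for an operation that completes immediately
let fast = async { return "done" }
```

Running runWithTimeout 100 slow returns the error string, while runWithTimeout 100 fast returns the result of the computation.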

We will come back and write the code to go do some work based on the Feature and Action passed in. First let’s focus on building out the F# module and classes for the Lucene and Azure work in Part 2.

 
Posted on July 5, 2011 in Uncategorized

 
 