fDB: F# and JSON Database on Lucene (and Azure) – Part 2

Let’s move on to bringing the Lucene and Azure work into the project, starting with a couple of callouts on things we will be using.
1. The AzureDirectory project from CodePlex.com (if you do not want to be on Azure, replace this class with one of the directory classes that ship with Lucene.Net)
2. We will also use a few helper functions I built in a previous post: http://chrisrizzuto.wordpress.com/2011/06/20/f-to-parse-userids-urls-and-hash-tags-from-text/

The code below writes to the index; AzureDirectory manages access to the index files in Blob Storage on Azure.

    let WriteDoc(doc:string, catalog:string) =
        let dir = new AzureDirectory(acct, catalog)
        let analyzer = new StandardAnalyzer()
        let luceneDoc = new Document()
        let jsonDoc = JsonConvert.DeserializeObject<JObject>(doc)

        let getVal(obj:JObject, key:string) =
            try
                let s = obj.[key].ToString()
                s.Substring(1, s.Length-2)
            with
                | exp -> ""
        
        try
            if jsonDoc.Property("body") <> null then
                let (urls, tags, usrs) = m_regex.parseTxtForTokens(getVal(jsonDoc, "body"))
                luceneDoc.Add(new Field("AZIndex.urls", urls.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))       
                luceneDoc.Add(new Field("AZIndex.tags", tags.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))      
                luceneDoc.Add(new Field("AZIndex.usrs", usrs.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES))
        with
            | exp -> exp |> ignore

        let newID = getDocID(getVal(jsonDoc, "docType"))
        if jsonDoc.Property("id") = null then
            jsonDoc.["id"] <- JToken.FromObject(newID)

        let fld = new Field("content", doc, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES)
        
        luceneDoc.Add(fld)

        for prop in jsonDoc.Properties() do
            let fldname = prop.Name
            let fldval = prop.Value.Value<string>()
            let fld = new Field(fldname, fldval, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES)
            luceneDoc.Add(fld)

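        // Create the index only when it does not exist yet; passing true every time would overwrite it on each write
        let index = new IndexWriter(dir, analyzer, not (IndexReader.IndexExists(dir)))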
        index.AddDocument(luceneDoc)
        index.Close()

The function below accesses the index files and performs a Lucene search.

    let query(parms:string, catalog:string) =
        let azDir = new AzureDirectory(acct, catalog)
        let index = new IndexSearcher(azDir, true)
        let parser = new QueryParser("content", new StandardAnalyzer())
        let q = parser.Parse(parms)
        let results:Hits = index.Search(q)
        // Collect the stored JSON of every hit
        let docs = [| for x in 0 .. results.Length()-1 -> results.Doc(x).Get("content") |]

        index.Close()

        // Joining avoids trimming a trailing comma and still yields "[]" when nothing matched
        let output = JsonConvert.DeserializeObject<seq<JObject>>("[" + System.String.Join(",", docs) + "]")
        JsonConvert.SerializeObject(output)

Now, the nice thing here is that we have a mechanism for storing any JSON object in Lucene, regardless of schema, and an easy, predictable way to interface with the index files for searching and returning JSON. The total project consists of the HttpHandler from Part 1 of this post and the regex functions. It is pretty small, and was easily added to an Azure Web Role project.

 

Posted by on July 6, 2011 in Uncategorized

 

fDB: F# and JSON Database on Lucene – Part 1

I am experimenting with F# a bit and decided to bring in Lucene, something I have used extensively in the past, and vet the result by building it out on Azure.

A couple of things about the approach:

1. HTTP Interface through a custom HTTP Handler written in F#
2. Wrappers for Lucene to save data, manage index, query
3. The open source “AzureDirectory” extension to Lucene, hosted on CodePlex

First, let’s take a look at what the input mechanism will be:

http://domain.com/<catalog>/<action>?(<timeout>)<query>

We will need to write a function to take the relevant parts from the HTTP call, decode them, and have the data ready for the core services of the platform to execute on. The method below returns a 4-item tuple (TimeOut, Feature, Action, Parms) to use in the rest of the application.

type HttpHandler() =
    let GetHttpParts(rqst:HttpRequest) =
        let items = rqst.Url.AbsolutePath.Split('/')

        let ftr =
            if items.GetValue(1).ToString().Trim().Length = 0 then
                "index"
            else
                items.GetValue(1).ToString().ToLower()

        let (parms, timeOut) =
            match rqst.HttpMethod with
                | "GET" ->
                    if rqst.Url.Query.Length > 0  && (rqst.Url.Query.StartsWith("?(")=false) then
                        (HttpUtility.UrlDecode(rqst.Url.Query.Substring(1)), 0)
                    else
                        if rqst.Url.Query.StartsWith("?(") then
                            let s = rqst.Url.Query.Substring(2)
                            let pos = s.IndexOf(")")
                            let sTO = s.Substring(0, pos)
                            (HttpUtility.UrlDecode(s.Substring(pos+1)), Int32.Parse(sTO))
                        else
                            ("", 0)
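                | _ ->
                    let rdr = new StreamReader(rqst.InputStream)
                    let body = rdr.ReadToEnd()
                    // The "(timeout)" prefix is optional here as well, so guard against a missing query
                    if rqst.Url.Query.StartsWith("?(") then
                        let s = rqst.Url.Query.Substring(2)
                        let pos = s.IndexOf(")")
                        let sTO = s.Substring(0, pos)
                        (body, Int32.Parse(sTO))
                    else
                        (body, 0)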

        match items.Length with
            | 2 -> (timeOut, ftr, "", parms)
            | 3 | 4 -> (timeOut, ftr, items.GetValue(2).ToString().ToLower(), parms)
            | _ -> (timeOut, "index", "", parms)
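
To make the query-string convention easier to poke at outside of ASP.NET, here is a minimal sketch of just the "?(timeout)query" split. The `splitTimeout` name is mine and not part of the handler, and it uses `Uri.UnescapeDataString` rather than `HttpUtility.UrlDecode` so it runs without System.Web (one difference: it does not turn "+" into a space).

```fsharp
open System

// Hypothetical helper: splits "?(500)name%3Achris" into ("name:chris", 500).
// A query without the "(timeout)" prefix gets a timeout of 0.
let splitTimeout (query:string) =
    if query.StartsWith("?(") then
        let s = query.Substring(2)
        let pos = s.IndexOf(")")
        // Only accept the prefix when something parseable sits between the parentheses
        let parsed =
            if pos > 0 then Int32.TryParse(s.Substring(0, pos))
            else (false, 0)
        match parsed with
        | (true, t) -> (Uri.UnescapeDataString(s.Substring(pos + 1)), t)
        | _ -> (Uri.UnescapeDataString(query.Substring(1)), 0)
    elif query.Length > 1 then
        (Uri.UnescapeDataString(query.Substring(1)), 0)
    else
        ("", 0)
```

Calling `splitTimeout "?(500)name%3Achris"` gives back the decoded query together with the timeout, and a malformed prefix falls back to treating the whole thing as the query instead of throwing.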

Now we need to start building out the ProcessRequest method of the HTTP handler. Keep in mind that we want the timeout to be respected. To do this, I am going to use F# asynchronous workflows: wrap the work in an async computation and call Async.RunSynchronously, passing the timeout.

    interface IHttpHandler with
        member this.ProcessRequest(ctx:HttpContext) =
            let (timeOut, feature, action, parms) = GetHttpParts(ctx.Request)

            let result =
                let operation = async {
                    // Placeholder: do work here based on the Feature and Action passed in to the HTTP call
                    let results = ""
                    return results
                }
                if timeOut > 0 then
                    try
                        // The timeout is in milliseconds
                        Async.RunSynchronously(operation, timeout=timeOut)
                    with
                        | :? TimeoutException ->
                            "ERROR: TIMEOUT " + timeOut.ToString()
                else
                    Async.RunSynchronously operation

            ctx.Response.Write(JsonConvert.SerializeObject(result))
            ctx.Response.End()

        // IHttpHandler also requires IsReusable
        member this.IsReusable = false

We will come back and write the code to go do some work based on the Feature and Action passed in. First let’s focus on building out the F# module and classes for the Lucene and Azure work in Part 2.
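
If you want to try the timeout behavior on its own before wiring it into a handler, a small sketch like the one below should do. `slowWork` and `runWithTimeout` are made-up names for illustration, and the sleep simply stands in for real work.

```fsharp
open System

// Hypothetical stand-in for real work: sleeps for the given number of milliseconds
let slowWork (ms:int) = async {
    do! Async.Sleep ms
    return "done"
}

// Runs the operation, giving up after timeOut milliseconds (0 means wait until it finishes)
let runWithTimeout (timeOut:int) (operation:Async<string>) =
    if timeOut > 0 then
        try
            Async.RunSynchronously(operation, timeout = timeOut)
        with
        | :? TimeoutException -> "ERROR: TIMEOUT " + timeOut.ToString()
    else
        Async.RunSynchronously operation
```

Here `runWithTimeout 50 (slowWork 2000)` comes back with the error string, while `runWithTimeout 2000 (slowWork 10)` returns the result of the work. Catching only TimeoutException keeps other failures from being reported as timeouts.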

 

Posted by on July 5, 2011 in Uncategorized

 

C# “Light Syntax”, F#, lol…

OK, I saw this on MSDN and started reading…  I actually found it quite humorous :)   Definitely check it out: http://www.trelford.com/blog/post/LighterCSharp.aspx

Also…  check out the comments. I think some people missed the intent of the post and are busy debugging something that is pretty clearly (IMO) intended to be lighthearted dev humor.

 

Posted by on June 23, 2011 in Software Development

 

F# to Parse UserIDs, Urls, and Hash Tags from Text

Here is a quick bit of code that returns a tuple holding unique lists of Twitter-style user IDs, hash tags, and URLs from a block of text. It also takes advantage of the async workflow constructs in F# and active patterns.

 
    let (|Matches|_|) (pat:string) (inp:string) =
        let m = Regex.Matches(inp, pat)
        if m.Count > 0 then
            // Keep each value once, in order of first appearance
            Some ([ for g in m -> g.Value ] |> Seq.distinct |> Seq.toList)
        else
            None

    let getUrls txt =
        // Regex for URLs
        // Verbatim string so the backslash reaches the regex engine untouched; also accepts https
        let linkPat = @"(https?://\S+)"
        match txt with
        | Matches linkPat urls -> urls
        | _ -> []

    let getTags txt =
        // Regex for Hash Tags
        let linkPat = "[#]+([A-Za-z0-9-_]+)"
        match txt with
        | Matches linkPat tags -> tags
        | _ -> []

    let getUsers txt =
        // Regex for Users
        let linkPat = "[@]+([A-Za-z0-9-_]+)"
        match txt with
        | Matches linkPat users -> users
        | _ -> []

    let parseTxtForTokens txt =
        let opUrl = async {
                let urls = getUrls txt
                return urls
        }

        let opTags = async {
                let tags = getTags txt
                return tags
        }

        let opUsers = async {
                let usrs = getUsers txt
                return usrs
        }

        let items = Async.Parallel [opUrl; opTags; opUsers] |>  Async.RunSynchronously |> Array.toList
        (items.[0], items.[1], items.[2])
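
To see roughly what the three patterns pick up, here is a condensed, self-contained sketch run against a made-up tweet. The `tokens` helper, the sample text, and the slightly simplified patterns are mine, not part of the module above, and the async part is left out to keep it short.

```fsharp
open System.Text.RegularExpressions

// Hypothetical helper: every distinct match of pat in txt, in order of first appearance
let tokens (pat:string) (txt:string) =
    Regex.Matches(txt, pat)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> Seq.distinct
    |> Seq.toList

// Made-up sample text with a repeated hash tag
let sample = "@chris check http://example.com/a #fsharp #lucene #fsharp"

let urls  = tokens @"https?://\S+" sample
let tags  = tokens @"#[A-Za-z0-9_-]+" sample
let users = tokens @"@[A-Za-z0-9_-]+" sample
```

Note that the matched values keep their leading "#" and "@", and the repeated "#fsharp" shows up only once in `tags`.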

 

Posted by on June 20, 2011 in Software Development

 

F# Windows Service Template

OK, I certainly love F#, but templates are lacking out of the gate for some types of apps, one example being a Windows service.  I saw a few that folks have made, but IMO they make the same mistake seen in the C# Windows service template.

When it comes down to it, I would prefer to have my Windows service run as a console app while I am debugging, and only as a true Windows service when it is getting the final finishing touches.

See the code below, and note the last two public methods, “InteractiveStart” and “InteractiveStop”.

type AppHost() as this =
    inherit ServiceBase()

    do
        this.ServiceName <- "Some Cool F# Service"
        this.EventLog.Log <- "Application"

    override this.OnStart(args:String[]) =        
                        
        Console.WriteLine("It is working! Yeah!")
        base.OnStart(args)

    override this.OnStop() = 
        base.OnStop()

    member this.InteractiveStart(args:String[]) = 
        this.OnStart(args)

    member this.InteractiveStop() =
        this.OnStop()

 
Next, let’s be sure to add an installer class so when we do want to run this as a true windows service, it is ready to go.
[<RunInstaller(true)>] 
type MyInstaller() as this = 
    inherit Installer() 
    do 
        let spi = new ServiceProcessInstaller() 
        let si = new ServiceInstaller() 
        spi.Account <- ServiceAccount.NetworkService

        si.DisplayName <- "Computing Service" 
        si.StartType <- ServiceStartMode.Automatic 
        // ServiceName should match the ServiceName set in AppHost
        si.ServiceName <- "Some Cool F# Service"

        this.Installers.Add(spi) |> ignore 
        this.Installers.Add(si) |> ignore

 
Now we will lay out the entry point of the program. This code decides whether the app should be started as a console app (InteractiveStart) by looking at the Environment.UserInteractive property. If it is true, the app is being run as a console app as opposed to a Windows service. As you can see below, it is very simple, and it makes life a lot easier during early debugging. I use the same technique in C# and it works great.
open System
open System.ServiceProcess

module PROGRAM = 
    [<EntryPoint>] 
    let Main(args:String[]) = 
        
        let host = new AppHost()

        if Environment.UserInteractive then            
            host.InteractiveStart(args)
            Console.ReadLine() |> ignore
            host.InteractiveStop()
            0
        else
           ServiceBase.Run(host) 
           0

 

Posted by on June 20, 2011 in Software Development

 

Http Requests in F# using a TCPClient

Here I am going to show how to use F# to write an HTTP request using the TcpClient class.  This is a nice way to get in a little deeper, tune and optimize further, and implement just the bare minimum you need.

First, let’s get the method setup and include the namespaces we will need.  In the method we will create 3 mutable variables to hold the HttpStatus code, header text, and body text – these values will be returned as a tuple at the end of the method.

open System
open System.IO
open System.Net
open System.Net.Sockets
open System.Text
open System.Collections.Generic

module mod_http =

    let SendHTTPCall (url:string, httpMethod:string, userAgent:string, body:string) =
        let mutable httStatusCode = HttpStatusCode.OK
        let mutable headerText = ""
        let mutable bodyText = ""

Now let’s get the code in place to send out the request.  Note the call at the end of this block waiting for “DataAvailable” to become true.
        let URL = new Uri(url)
        let sblog = new StringBuilder()
        use log = new StringWriter(sblog)
        let sb = new StringBuilder()
        sb.AppendFormat("{0} {1}{2} HTTP/1.1", httpMethod, URL.LocalPath, URL.Query) |> ignore
        sb.Append(Environment.NewLine) |> ignore
        sb.AppendFormat("User-Agent: {0}", userAgent) |> ignore
        sb.Append(Environment.NewLine) |> ignore
        sb.AppendFormat("Host: {0}", URL.Host) |> ignore
        sb.Append(Environment.NewLine) |> ignore
        sb.AppendFormat("Connection: {0}", "Close") |> ignore
        sb.Append(Environment.NewLine) |> ignore
        sb.AppendFormat("Date: {0}", DateTime.Now.ToString("r")) |> ignore
        sb.Append(Environment.NewLine) |> ignore     

        sb.Append(Environment.NewLine) |> ignore       

        let message = sb.ToString()
        use client = new TcpClient(URL.Host, URL.Port)
        let stream = client.GetStream()
        let bytes = Encoding.ASCII.GetBytes(message)
        stream.Write(bytes, 0, bytes.Length)
        stream.Flush()        
        // Wait for the response to start arriving, yielding the CPU between checks
        while not stream.DataAvailable do
            System.Threading.Thread.Sleep(10)

Now we need to parse out the output and return the results.  Again, this will return the HttpStatusCode and the strings for the header and body text.
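        use rdr = new StreamReader(stream)
        let sOut = rdr.ReadToEnd()
        let sbheaders = new StringBuilder()
        use sr = new StringReader(sOut)

        // Header lines run up to the first blank line; everything after that is the body
        let mutable line = sr.ReadLine()
        while line <> null && line.Length > 0 do
            sbheaders.AppendLine(line) |> ignore
            line <- sr.ReadLine()

        bodyText <- sr.ReadToEnd()

        headerText <- sbheaders.ToString()

        if headerText.StartsWith("HTTP/1.1 200 OK") then
            httStatusCode <- HttpStatusCode.OK
        else
            httStatusCode <- HttpStatusCode.ServiceUnavailable

        client.Close()
        (httStatusCode, headerText, bodyText)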

Now this is complete and ready to use. I would take this approach again when I have very specific work to do over HTTP that requires a certain level of output logging, or specific control over the request/response actions, where this grain of control makes things simpler.
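
If you want the actual status code rather than a 200-or-not check, the response parsing can be pulled out into a pure function that is easy to test without a socket. This is only a sketch under my own assumptions: `splitResponse` is a made-up name, it expects CRLF line endings, and it does not handle chunked transfer encoding.

```fsharp
open System

// Hypothetical helper: splits a raw HTTP/1.1 response into (status code, header text, body text)
let splitResponse (raw:string) =
    let sep = "\r\n\r\n"
    let pos = raw.IndexOf(sep, StringComparison.Ordinal)
    // Everything before the first blank line is headers; the rest is the body
    let (headerText, bodyText) =
        if pos >= 0 then (raw.Substring(0, pos), raw.Substring(pos + sep.Length))
        else (raw, "")
    // The status line looks like "HTTP/1.1 200 OK"; the code is the second token
    let statusLine = headerText.Split([| "\r\n" |], StringSplitOptions.None).[0]
    let parts = statusLine.Split(' ')
    let status =
        if parts.Length >= 2 then
            match Int32.TryParse(parts.[1]) with
            | (true, code) -> code
            | _ -> 0
        else
            0
    (status, headerText, bodyText)
```

A status of 0 here means the status line could not be read, which the caller can treat however it likes (the method above maps anything that is not 200 to ServiceUnavailable).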

 

Posted by on January 21, 2010 in Software Development

 
