Friday, December 07, 2007

SAS and LINQ - Part 1

So, I am just getting my feet wet with LINQ, the new integrated query language within .NET. There are loads of articles on LINQ elsewhere. What I want to do is to illustrate LINQ with SAS and this is my starting point.

Look at the following code:


using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using System.Windows.Forms;

namespace WindowsFormsApplication2
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}

private void btnGetData_Click(object sender, EventArgs e)
{
XElement table =XElement.Parse(
@"<TABLE>
<CLASS>
<Name> Alfred </Name>
<Sex> M </Sex>
<Age> 14 </Age>
<Height> 69 </Height>
<Weight> 112.5 </Weight>
</CLASS>
<CLASS>
<Name> Henry </Name>
<Sex> M </Sex>
<Age> 14 </Age>
<Height> 63.5 </Height>
<Weight> 102.5 </Weight>
</CLASS>
<CLASS>
<Name> James </Name>
<Sex> M </Sex>
<Age> 12 </Age>
<Height> 57.3 </Height>
<Weight> 83 </Weight>
</CLASS>
<CLASS>
<Name> Jeffrey </Name>
<Sex> M </Sex>
<Age> 13 </Age>
<Height> 62.5 </Height>
<Weight> 84 </Weight>
</CLASS>
<CLASS>
<Name> John </Name>
<Sex> M </Sex>
<Age> 12 </Age>
<Height> 59 </Height>
<Weight> 99.5 </Weight>
</CLASS>
<CLASS>
<Name> Philip </Name>
<Sex> M </Sex>
<Age> 16 </Age>
<Height> 72 </Height>
<Weight> 150 </Weight>
</CLASS>
<CLASS>
<Name> Robert </Name>
<Sex> M </Sex>
<Age> 12 </Age>
<Height> 64.8 </Height>
<Weight> 128 </Weight>
</CLASS>
<CLASS>
<Name> Ronald </Name>
<Sex> M </Sex>
<Age> 15 </Age>
<Height> 67 </Height>
<Weight> 133 </Weight>
</CLASS>
<CLASS>
<Name> Thomas </Name>
<Sex> M </Sex>
<Age> 11 </Age>
<Height> 57.5 </Height>
<Weight> 85 </Weight>
</CLASS>
<CLASS>
<Name> William </Name>
<Sex> M </Sex>
<Age> 15 </Age>
<Height> 66.5 </Height>
<Weight> 112 </Weight>
</CLASS>
</TABLE>");


var obs = (from o in table.Elements("CLASS")
where (string)o.Element("Sex") == " M "
select new
{
Name = (string) o.Element("Name"),
Age = (string) o.Element("Age")
}
);
List observations = new List();
foreach (var ob in obs)
{
observations.Add(ob.Name) ;
}
}
}
}



The above is C# code.

Notice that the input data is a SAS XML representation (I am doing it inline right now as a demo. It could easily come from a file)? Notice the SQL-like code toward the bottom? That is LINQ.

One of the oft-used criticisms of low-level languages is that they were fine for a lot of things except processing data. For that, we needed languages such as SAS or SQL. LINQ doesn't substitute for the database but what it does do is enable us to combine data processing with the full-blown power of a language like C#. The syntax isn't as elegant as SAS (by any means) but what it does provide is a way to juice up the power of data processing against SAS data when needed.

SAS is OleDb and ODBC compliant. As of yet, Microsoft has not provided support for either one within LINQ. When it does, I will post examples of using LINQ against a SAS dataset directly rather than via XML.

The above represents a fraction of the power of LINQ and it is in its first iteration. Using the same query language against any data source (SAS, XML, SQL Server, etc.) and then combine that with the power of a language such as C# holds a lot of potential. Power SAS users, especially on the ETL side, should keep an eye on LINQ because it offers up a great way to do data processing.

SAS throwing RPC error

If you are doing code in C#  and get this error when creating a LanguageService: The RPC server is unavailable. (Exception from HRESULT:...