Monday, January 23, 2006

Convert SAS datasets to Excel

I've recently posted code samples of using VSTO and some other means of getting SAS data into Excel. I thought I would compile a list of various techniques to move data from SAS to Excel.

These are the means (outside of SAS) that I am aware of using as well as pros and cons:

XML (for Office 2000 and better)

Pros

  • Easily written
  • Works pretty fast
  • Excel not required on machine


Cons

  • Creates enormous files. Have to open and then save as
    old Excel using COM to reduce file size
  • Hard to work with the XML model due to its top to right
    formatting. No real random cell access that I can find.

VSTO (Visual Studio Tools for Office)

Pros

  • Built-in support in Visual Studio (ie easy editing)
  • Microsoft 'direction'
  • Does not require Excel

Cons

  • COM based (slow, 1 instance only)
  • Requires Visual Studio
  • Requires coding in a different language
    (VB.NET or C#)
  • Only supported on Windows



VBA in Excel

Pros

  • Well-documented. Fairly easy to use.

Cons

  • VBA will be deprecated, probably in Office 12
    coming next year
  • COM based (see above)
  • Requires Excel to run and build
  • Possibly opens up security concerns
  • Only supported on Windows

.NET using 3rd party tools (Aspose is an example)

Pros

  • Fastest generation method seen. Faster than
    COM by probably 1000x
  • Able to run simultaneous threads
  • Easy to code and edit in Visual Studios due to
    intellisense support
  • Object model is simple and easy to use
  • Random cell access

Cons

  • Requires 3rd party product ($400) plus
    Visual Studios
  • Requires a non-SAS language
  • Only supported on Windows

Next up: web services and SAS data.

Friday, January 13, 2006

SAS and Processes

Most SAS folks that I have dealt think in terms of the language and how to do things easier or 'niftier' using traditional SAS. No issues there. What has intrigued me the most, though, over the past few years is to think in terms of the processes to write SAS code and whether those processes can be made easier. Also, whether the processes can be done outside of SAS and then incorporated. Well, why go out of SAS is the question. Well, oftentimes it is easier to build a GUI or to write an easy to maintain program in another language. For example, build a SAS program that reads in XML using XPath or loop through the process threads on a system looking for a file name. Well, not going to happen easily hence other languages and approaches are needed either before or after it gets into SAS.

On SAS-L today was a debate over macros vs whatever else or whether a user should learn macros early, late, never, sometime. I have nothing against macros but I do have an issue with the idea that SAS code generation is exclusively the domain of macros and SCL. I also don't think users should ever be held back on what they should or should not learn. No bounds in programming is my motto and the point of this blog entry.

Awhile back I wrote a program that was entirely in C# and just generated SAS format code. This provided the users with a great little graphical utility that generated valid format statements from various data sources (http://savian.net/utilities). Why? To do this in SAS would require Access engines, would not be graphically rich, and wouldn't allow the drag and drop stuff that I like.

This is where I think the SAS community can benefit: asking "how do I do this today and is there a better way?". I'm not saying it's right but that it provides an alternative view for how to accomplish a given task. The more we know the more we will embrace macros, embrace perl, embrace C#, embrace wild stuff like LINQ. The more you play, the more you will be able to help the end users.

Ok, I'm heading back to left field now...

Tuesday, January 10, 2006

Approach to Excel and SAS

Oftentimes on SAS-L, the topic of how to integrate with Excel comes up. Most SAS users take very traditional approaches to this and use DDE, ODS, Access engines, etc. I feel, though, that is is best approached from the Microsoft side of the equation. The reason is that you achieve far more control and power over the resulting sheet.

The approach I typically advocate is:

1. Use SAS to get your data in shape.

2. Write a .NET program and use the SAS OleDB provider to read in the dataset. This sample code will turn your SAS dataset into a .NET datatable.

internal DataTable LoadSasDataSet(string sasLibrary, string sasDataSet)
{
DataTable dt = new DataTable();
OleDbConnection sas = new OleDbConnection("Provider=sas.LocalProvider; Data Source=" + sasLibrary);
sas.Open();
OleDbCommand sasCommand = sas.CreateCommand();
sasCommand.CommandType = CommandType.TableDirect;
sasCommand.CommandText = sasDataSet;
OleDbDataReader sasRead = sasCommand.ExecuteReader();
dt.Load(sasRead);
endTime = DateTime.Now;
return dt;
}

3. Download the Aspose Excel .NET model. It can be found at www.aspose.com . While this ultimately will cost around $400, it is worth every penny if you do this a lot.

4. Write your .NET code. There are loads of examples on the Aspose site.


This approach is simply fast, fast, fast and is very powerful. You have full formatting control over the sheet, random access to anywhere on the sheet, and it is simple to code once you get started. Depending on the level of formatting, sheets should be created in less than 1ms plus Excel is not needed to create the sheets.

While learning 2 languages and keeping up on them has its complexities, I think you will find the above to be worth the time. What you will get out is a powerful tool to create Excel worksheets from your SAS data.

Visual Studio Tools for Office (VSTO) and SAS Integration

Software Used

Microsoft Office 2003 SP1
Visual Studios 2005
SAS version 9.13

Excel Project

1. Start Visual Studios 2005 and create a new Office project, specifying Excel
2. Enable VS2005 to allow access to the VBA engine (security)

3. On the right hand side of the screen, right-click ThisWorkbook.cs, select View Code

Here is the complete code listing. Replace the existing code in ThisWorkbook.cs with the following:

using System;
using System.Data;
using System.Data.OleDb;
using System.Drawing;
using System.Windows.Forms;
using Microsoft.VisualStudio.Tools.Applications.Runtime;
using Excel = Microsoft.Office.Interop.Excel;
using Office = Microsoft.Office.Core;

namespace SasToExcel
{
public partial class ThisWorkbook
{
private Excel.Worksheet xlSheet = null;
private Excel.Chart xlChart = null;

//The @ sign in front of a string means ignore special characters
string sasLibrary = @"C:\Program Files\SAS\SAS 9.1\core\sashelp";
string sasDataSet = "retail";

private void ThisWorkbook_Startup(object sender, System.EventArgs e)
{
ResetWorkbook();
DataTable dt = LoadSasDataSet();
CreateWorksheet();
LoadAndFormatData(dt);
CreateChart();
}

private void ThisWorkbook_Shutdown(object sender, System.EventArgs e)
{
}

#region VSTO Designer generated code

///
/// Required method for Designer support - do not modify
/// the contents of this method with the code editor.
///

private void InternalStartup()
{
this.Startup += new System.EventHandler(ThisWorkbook_Startup);
this.Shutdown += new System.EventHandler(ThisWorkbook_Shutdown);
}

#endregion

private void ResetWorkbook()
{
// Get rid of all but the original worksheet.

try
{
ThisApplication.DisplayAlerts = false;

foreach (Excel.Worksheet ws in Globals.ThisWorkbook.Worksheets)
if (ws != ThisApplication.ActiveSheet)
{
ws.Delete();
}
foreach (Excel.Chart cht in Globals.ThisWorkbook.Charts)
cht.Delete();
}
finally
{
ThisApplication.DisplayAlerts = true;
}
}

private DataTable LoadSasDataSet()
{
DataTable dt = new DataTable();
OleDbConnection sas = new OleDbConnection("Provider=sas.LocalProvider; Data Source=" + sasLibrary);
sas.Open();
OleDbCommand sasCommand = sas.CreateCommand();
sasCommand.CommandType = CommandType.TableDirect;
sasCommand.CommandText = sasDataSet;
OleDbDataReader sasRead = sasCommand.ExecuteReader();
dt.Load(sasRead);
return dt;
}

private void CreateWorksheet()
{
xlSheet = (Excel.Worksheet)Globals.ThisWorkbook.Worksheets.Add(Type.Missing, Globals.ThisWorkbook.ActiveSheet,
Type.Missing, Type.Missing);

xlSheet.Name = "Sas Retail Sample";

// Copy field names to Excel.
// Bold the column headings.
Excel.Range rng = (Excel.Range)xlSheet.Cells[1, 1];
rng.Formula = "Year";
rng.Font.Bold = true;

rng = (Excel.Range)xlSheet.Cells[1, 2];
rng.Formula = "Sales";
rng.Font.Bold = true;
}

private void LoadAndFormatData(DataTable dt)
{
int row;
// Copy the data in from the SqlDataReader.
// Start at row 2.
row = 1;
foreach (DataRow dr in dt.Rows)
{
row += 1;
xlSheet.Cells[row, 1] = dr["Year"];
xlSheet.Cells[row, 2] = dr["Sales"];
}

// Format the columns.
((Excel.Range)xlSheet.Columns[1, Type.Missing]).AutoFit();

Excel.Range rng = (Excel.Range)xlSheet.Columns[2, Type.Missing];
rng.NumberFormat = "0.00";
rng.AutoFit();
}

private void CreateChart()
{
// Now create the chart.
Excel.Chart xlChart = (Excel.Chart) Globals.ThisWorkbook.Charts.
Add(Type.Missing, xlSheet, Type.Missing, Type.Missing);

Excel.Range cellRange = (Excel.Range)xlSheet.Cells[1, 1];
xlChart.ChartWizard(cellRange.CurrentRegion,
Excel.Constants.xl3DBar, Type.Missing,
Excel.XlRowCol.xlColumns, 1, 2, false,
xlSheet.Name, Type.Missing, Type.Missing,
Type.Missing);

// Apply some formatting to the chart.
xlChart.Name = xlSheet.Name + " Chart";

Excel.ChartGroup grp = (Excel.ChartGroup)xlChart.ChartGroups(1);
grp.GapWidth = 20;
grp.VaryByCategories = true;
xlChart.ChartTitle.Font.Size = 16;
xlChart.ChartTitle.Shadow = true;
xlChart.ChartTitle.Border.LineStyle = Excel.Constants.xlSolid;
}

}
}

4. Press F5 to build and see the results

Creating Excel spreadsheets using Visual Studio Express for C# and SAS datasets

These steps show you how to make an Excel sheet using the Excel COM object, SAS, and C#. There are other ways to do this but it is a good way to get started.

Here are the steps needed to create the sheets:

Download Visual Studios Express for C# from Microsoft’s website: it is free.

http://lab.msdn.microsoft.com/express/vcsharp/default.aspx


Download and install the OLEDB Providers for SAS from the SAS website:

http://www.sas.com/apps/demosdownloads/oledbproviders_PROD_9.1.3_sysdep.jsp?packageID=000366

Create a directory c:\temp. Copy the shoes dataset from the sashelp files (this is for testing only) and place it in temp. You can find it where sas is installed under core\sashelp.

Once installed, create a new project (Fileà Newà Project). Select console type for now. After you become familiar with the steps, you can build a windows app.


You need to add in a reference to Excel. This basically is telling the project that the Excel classes will be available to your code. Look at the Solution Explorer on the right, right-click References and select Add Reference….Click on the COM tab at the top and then scroll down until you see Microsoft Excel 11.0 Object Library (you may have one slightly different such as 9.0, etc. Just select the Excel library you have).

Paste the following code into the default class that appears:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.IO;
using System.Text;
// using Excel = Microsoft.Office.Interop.Excel;


namespace SASToExcel
{
class Program
{
static DataTable dt;
static Excel.Application app = new Microsoft.Office.Interop.Excel.Application();
static Excel.Workbook wb;
static Excel.Worksheet ws;
static List regions = new List();

static void Main(string[] args)
{
app.DisplayAlerts = false;
dt = LoadSasDataSet(@"c:\temp", "shoes");
DetermineDistinctValuesForWorksheets();
CreateWorkbooks();
System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
app = null;
}

public static void DetermineDistinctValuesForWorksheets()
{
Excel.Workbooks wbs = app.Workbooks;
Excel.Workbook wb = wbs.Add(Type.Missing);

foreach (DataRow dr in dt.Rows)
{
if (!regions.Contains(dr["region"].ToString()))
{
regions.Add(dr["region"].ToString()) ;
}
}
}

public static void CreateWorkbooks()
{
Excel.Workbooks wbs = app.Workbooks;
wb = wbs.Add(Type.Missing);

foreach (string region in regions)
{
wb.Worksheets.Add(Type.Missing, Type.Missing, Type.Missing, Type.Missing);
ws = (Excel.Worksheet)wb.ActiveSheet;
//Create the worksheet name but clean up invalid values at the same time
ws.Name = region.Replace("/", " ");
Console.WriteLine("Wroksheet added: " + region);
DataRow[] rows = GetSelectedRegion(region);
AddDataToSheets(rows, region);
}

wb.SaveAs(@"c:\temp\SASToExcelSample.xls", Excel.XlFileFormat.xlWorkbookNormal,
Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Excel.XlSaveAsAccessMode.xlNoChange, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing);

wb.Close(Type.Missing, Type.Missing, Type.Missing);
wbs.Close();
}

static DataRow[] GetSelectedRegion(string region)
{
string selectQuery = "region='" + region + "'";
DataRow[] subset = dt.Select(selectQuery);
return subset;
}

public static void AddDataToSheets(DataRow[] rows, string region)
{
int row = 1 ;
//Write Header records
WriteRow(row, 1, "Product");
WriteRow(row, 2, "Subsidiary");
WriteRow(row, 3, "Stores");
WriteRow(row, 4, "Sales");
row++;
foreach(DataRow dr in rows)
{
WriteRow(row, 1, dr["product"].ToString());
WriteRow(row, 2, dr["subsidiary"].ToString());
WriteRow(row, 3, dr["stores"].ToString());
WriteRow(row, 4, dr["sales"].ToString());
row++;
}
}

public static void WriteRow(int row, int col, string value)
{
Excel.Range rng = (Excel.Range)ws.Cells[row, col];
rng.Value2 = value;
}

public static DataTable LoadSasDataSet(string sasLibrary, string sasDataSet)
{
Console.WriteLine("Loading SAS data at library " + sasLibrary + ", dataset " + sasDataSet);
DataTable dt = new DataTable();
OleDbConnection sas = new OleDbConnection("Provider=sas.LocalProvider; Data Source=" + sasLibrary);
sas.Open();
OleDbCommand sasCommand = sas.CreateCommand();
sasCommand.CommandType = CommandType.TableDirect;
sasCommand.CommandText = sasDataSet;
OleDbDataReader sasRead = sasCommand.ExecuteReader();
dt.Load(sasRead);
Console.WriteLine("Records read from SAS: " + dt.Rows.Count.ToString());
return dt;
}

}
}

7. Press F5 to build and run the code. You will also have a new program called SasToExcel.exe in the project’s bin\debug folder

8. View your new workbook in the temp directory. It is called SasToExcelSample.xls

SAS throwing RPC error

If you are doing code in C#  and get this error when creating a LanguageService: The RPC server is unavailable. (Exception from HRESULT:...