Rails star schema query using vanilla active record

In a recent project I was faced with generating SQL to query a star schema, i.e. a fact table with surrounding dimension tables. For the sake of this post let's presume we have the following tables in PostgreSQL:

create table calendars (
 id serial,
 calendar_date date
);

create table locations (
 id serial,
 name varchar(100)
);

create table products (
 id serial,
 name varchar(100)
);

create table sales (
 id serial,
 calendar_id integer,
 location_id integer,
 product_id integer,
 quantity integer
);

and the following models setup:

class Calendar < ActiveRecord::Base
  has_many :sales, :foreign_key => "calendar_id"
end

class Location < ActiveRecord::Base
  has_many :sales, :foreign_key => "location_id"
end

class Product < ActiveRecord::Base
  has_many :sales, :foreign_key => "product_id"
end

class Sale < ActiveRecord::Base
  belongs_to :calendar, :foreign_key => "calendar_id"
  belongs_to :location, :foreign_key => "location_id"
  belongs_to :product, :foreign_key => "product_id"
end

now you can query the star schema like so:

@sales = Sale.find(
  :all, :select => "calendars.calendar_date, locations.name, products.name, sales.quantity",
  :joins => [:location, :calendar, :product]
)

which generates SQL like so:

select calendars.calendar_date,
       locations.name,
       products.name,
       sales.quantity
from   "sales"
       inner join "locations"
         on locations.id = sales.location_id
       inner join "calendars"
         on calendars.id = sales.calendar_id
       inner join "products"
         on products.id = sales.product_id
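Conceptually, the inner joins just resolve each fact row's foreign keys against the dimension tables. A minimal in-memory sketch of that idea in Ruby (the table data here is made up for illustration):

```ruby
# Hypothetical dimension tables, keyed by id.
calendars = { 1 => "2009-06-01" }
locations = { 1 => "Sydney", 2 => "Melbourne" }
products  = { 1 => "Widget" }

# Fact rows referencing the dimensions by foreign key.
sales = [
  { calendar_id: 1, location_id: 1, product_id: 1, quantity: 5 },
  { calendar_id: 1, location_id: 2, product_id: 1, quantity: 3 },
]

# The inner join: look up each foreign key in its dimension table.
report = sales.map do |s|
  [calendars[s[:calendar_id]], locations[s[:location_id]],
   products[s[:product_id]], s[:quantity]]
end
```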

Compressing a file using GZipStream

 // requires System.IO and System.IO.Compression
 string file = @"C:\windows\temp\file.txt";

using (FileStream inFile = File.OpenRead(file))
using (FileStream outFile = File.Create(
    Path.GetFullPath(file) + ".gz"))
using (GZipStream compress = new GZipStream(outFile,
    CompressionMode.Compress))
{
    byte[] buffer = new byte[4096];
    int numRead;
    while ((numRead = inFile.Read(buffer, 0, buffer.Length)) != 0)
        compress.Write(buffer, 0, numRead);
}
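The same buffered stream-copy pattern, sketched in Ruby with the standard zlib library (the file name and contents are just placeholders):

```ruby
require "zlib"
require "tmpdir"

file = File.join(Dir.tmpdir, "file.txt")
File.write(file, "some sample content") # sample input for illustration

# Copy the source into a gzip stream in 4 KB chunks.
Zlib::GzipWriter.open(file + ".gz") do |gz|
  File.open(file, "rb") do |input|
    while (chunk = input.read(4096))
      gz.write(chunk)
    end
  end
end
```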

SQL Server: Quickly search for objects in all databases

Use the code below to quickly search for database objects across all databases on the instance. Just change the @filter text to match your needs.

declare @sqlstr nvarchar(1000)
declare @filter nvarchar(50)

select @filter = '%DTS%'

create table #tbldbobjects (
     dbname    sysname,
     objname   varchar(200),
     objtype   char(2),
     objuserid int
)

select @sqlstr = 'sp_msforeachdb ''IF DB_ID(''''?'''') > 4 
                    Insert into #tblDBObjects 
                    select ''''?'''' as DBName, name, xtype, uid 
                    From ?..sysobjects where name like ''''' + @filter + ''''''''

exec sp_executesql @sqlstr

select *
from   #tbldbobjects
order  by dbname,
       objname

drop table #tbldbobjects 

CLR User-Defined Aggregates: Concatenate strings with separator

This is basically the MSDN sample with the addition of a user-supplied separator. For some reason I couldn’t get this to deploy successfully from within Visual Studio, so I had to create the aggregate manually inside SQL Server (2008).

The code for the aggregate

using System;
using System.Data;
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.IO;
using System.Text;

[Serializable]
[SqlUserDefinedAggregate(
 Format.UserDefined, //use custom serialization to serialize the intermediate result
 IsInvariantToNulls = true, //optimizer property
 IsInvariantToDuplicates = false, //optimizer property
 IsInvariantToOrder = false, //optimizer property
 MaxByteSize = -1)] //maximum size in bytes of persisted value
public class Concatenate : IBinarySerialize
{
 /// <summary>
 /// The variable that holds the intermediate result of the concatenation
 /// </summary>
 private StringBuilder intermediateResult;

 /// <summary>
 /// Initialize the internal data structures
 /// </summary>
 public void Init()
 {
  this.intermediateResult = new StringBuilder();
 }

 /// <summary>
 /// Accumulate the next value, skipping it if the value is null
 /// </summary>
 /// <param name="value"></param>
 /// <param name="separator"></param>
 public void Accumulate(SqlString value, SqlString separator)
 {
  if (value.IsNull)
  {
   return;
  }

  //prepend the separator for every value after the first,
  //so no trailing separator needs trimming in Terminate
  if (this.intermediateResult.Length > 0 && !separator.IsNull)
  {
   this.intermediateResult.Append(separator.Value);
  }

  this.intermediateResult.Append(value.Value);
 }

 /// <summary>
 /// Merge the partially computed aggregate with this aggregate.
 /// </summary>
 /// <param name="other"></param>
 public void Merge(Concatenate other)
 {
  //note: no separator is re-inserted between the two partial results here
  this.intermediateResult.Append(other.intermediateResult);
 }

 /// <summary>
 /// Called at the end of aggregation, to return the results of the aggregation.
 /// </summary>
 /// <returns></returns>
 public SqlString Terminate()
 {
  string output = string.Empty;

  if (this.intermediateResult != null
   && this.intermediateResult.Length > 0)
  {
   output = this.intermediateResult.ToString();
  }

  return new SqlString(output);
 }

 public void Read(BinaryReader r)
 {
  intermediateResult = new StringBuilder(r.ReadString());
 }

 public void Write(BinaryWriter w)
 {
  w.Write(this.intermediateResult.ToString());
 }
}
The SQL to create the assembly and aggregate

create assembly SqlClassLibrary
 from N'C:\temp\SqlClassLibrary.dll'
 with permission_set = safe

create aggregate concatenate (
 @value nvarchar(max),
 @separator nvarchar(10)
) returns nvarchar(max)
external name SqlClassLibrary.Concatenate


Sample usage

select module_name, table_name,
 concatenate(column_name, ', ')
from load_file_mapping
group by module_name, table_name
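In plain Ruby terms, the aggregate is doing the equivalent of a group-and-join (the table data here is made up for illustration):

```ruby
# Hypothetical rows from load_file_mapping: [module_name, table_name, column_name]
rows = [
  ["sales", "staging_sales", "id"],
  ["sales", "staging_sales", "quantity"],
  ["hr",    "staging_staff", "name"],
]

# GROUP BY module_name, table_name; concatenate(column_name, ', ') per group.
result = rows.group_by { |m, t, _| [m, t] }
             .map { |key, grp| key + [grp.map { |_, _, c| c }.join(", ")] }
```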

The PDI, the Bad and the SSIS

I was recently tasked with creating a SQL Server Integration Services (SSIS) package to load a flat file that contained records of varying lengths destined for separate tables. Having just come out of a 16 month project where I was using Pentaho’s PDI toolset for ETL, I had happy thoughts, thinking I would have this wrapped up in time for arvo beers.

I was sorely disappointed though, as what should have been a simple task turned out to be a laborious mess. In all fairness to SSIS there may be a better way to do this, and if there is please let me know, as I failed to find anything. So back to the statement of work: load a flat file containing records of various lengths into a staging table that has as many columns as the longest record.

Using Pentaho this is a simple, pleasant experience. It requires a transformation with a “CSV file input” step that loads the tab-delimited file, and a “Table output” step that takes the records and loads them into a table. There ya’ go, 2 minutes later and I’m done. Hell, it even scanned the file, suggested data types and generated SQL for the destination table… awesome.

PDI Transformation

Using SSIS 2008, it turns out, it’s not that nice. SSIS has a “Flat File Source” step that you use to pull in data from flat file types. The only caveat is it doesn’t support files that have different record lengths. At first I thought I could get past this, as it was just scanning the first row to get the number of columns, but it turns out it just carries on eating chars, including the specified record delimiter char(s), until it has enough values for however many columns the first record had, and then stuffs the leftovers into the last column. What this means is that for records that are “short” it starts eating into the next record’s values. (I was however pleased to see that Microsoft’s Data Transformation Services (DTS), yes, that previous generation product by a few years, handles this scenario perfectly.) So how do you do this in SSIS? Well, you read the data into one large string column, then slap on a “Script Component” and define output columns and their data types one by one (I had 220 output cols, yay). Then you write a script to split the rows and populate values for all your output columns, and finally a “SQL Server Destination” step to load the data.
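The splitting the script has to do boils down to something like this (a Ruby sketch of the idea; the column count and sample rows are made up):

```ruby
NUM_COLUMNS = 6 # the widest record's column count (220 in my case)

# Each record arrives as one large string; split on the delimiter and
# pad short records with nils so every row has the full column count.
def split_record(line, num_columns = NUM_COLUMNS)
  values = line.chomp.split("\t")
  values.fill(nil, values.length...num_columns)
end

rows = ["a\tb\tc", "a\tb\tc\td\te\tf"].map { |line| split_record(line) }
```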

Needless to say I’m doing my bit to get Pentaho PDI in here.

Determining size of PostgreSQL database

Always handy, from the PostgreSQL Online Journal:

How to determine the size of a database on disk

SELECT pg_size_pretty(pg_database_size('somedatabase')) As fulldbsize;

How to determine the size of a database table on disk

NOTE: There are two functions in PostgreSQL – pg_relation_size and pg_total_relation_size. The pg_relation_size just measures the size of the actual table, whereas pg_total_relation_size includes both the table and all its toast tables and indexes.

SELECT pg_size_pretty(pg_total_relation_size('someschema.sometable')) As fulltblsize, pg_size_pretty(pg_relation_size('someschema.sometable')) As justthetblsize;

How to determine the size of a database schema

When you are doing a lot of data loading for later massaging into your production tables, you will often generate junk tables in the process, and worse yet other people may generate junk tables, period. As a general practice, we like to create a schema just for junk. The reason is that it is extremely easy to exclude schemas from being backed up and to load individual schemas. So with this approach we create or move all tables we are not quite sure are needed, or that we only need temporarily, into a schema called junk, scratch, possiblejunk or whatever you want to call it, and exclude that schema from our backups.

Something of the form:

ALTER TABLE someschema.joeyplaying SET SCHEMA scratch;

After saving the world from junk, you will want to know how much space you have spared your backups. To do that you would employ a query something of the form:

SELECT pg_size_pretty(pg_database_size('mycurrentdb')) As fullprod,
       pg_size_pretty(CAST(pg_database_size('mycurrentdb') - (SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name))
                                                              FROM information_schema.tables
                                                              WHERE table_schema = 'scratch') As bigint)) As tobebackedup_size,
       pg_size_pretty(CAST((SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name))
                            FROM information_schema.tables
                            WHERE table_schema = 'scratch') As bigint)) As junk_size;