The PDI, the Bad and the SSIS

I was recently tasked with creating a SQL Server Integration Services (SSIS) package to load a flat file that contained records of varying lengths destined for separate tables.  Having just come out of a 16 month project where I was using Pentaho’s PDI toolset for ETL I had happy thoughts thinking I would I have this wrapped up in time for arvo beers.

I was sorely disappointed thought as what should have been a simple task turned out to be laborious mess.  In all fairness to SSIS there may be a better way to do this and if there is please let me know as I failed to find anything.  So back to the statement of work, load a flat file containing records of various lengths into a staging table that has as many columns as the longest record.

Using Pentaho this is a simple pleasant experience. It requires  a transformation with a “CSV file input” step that loads the file which is tab delimited, and table output step that gets the records and loads them into a table . There ya’ go 2 minutes later and I’m done, hell it even scanned the file and suggested datatype’s and generated SQL for the destination table… awesome.

pdi transformation

PDI Transformation

Using SSIS 2008 it turns out it’s not that nice. SSIS has a “Flat File  Source” step that you use to pull in data from flat file types. The only caveat is it doesn’t support files that have different record lengths. At first I thought I could get past this as it was just scanning the first row to get the number of columns, but it turns out it the just carries on eating chars, including the specified record delimiter char(s), until it has enough values for how many columns the first record had, and then just stuffs in leftovers in the last column. What this means is that for records that are “short” it starts eating into the next records values. (I was however pleased to see that Microsoft’s Data Transformations Services (DTS), yes that previous generation product by a few years, handles this scenario perfectly). So how do you this in SSIS well you have read the data into one large string column, then slap on a “Script Component” and define output columns and their data types one-by-one  (I had 220 output cols yay). Then you write a script to split the rows and populate values for all you output columns, and finally a “SQL Server destination” step to load the data.

Needless to say I’m doing my bit to get Penthao PDI in here.

15 thoughts on “The PDI, the Bad and the SSIS

  1. Hi Angus,

    I wish I had time to try out some other tools – sounds like Pentaho has a lot going for it. However, time is one thing I never seem to have – except to mess around with MSFT stuff 🙂 And SSIS does have a learning curve, for sure.

    SSIS doesn’t provide a great experience working with flat files – and the team doesn’t seem to get enough funding to help fix that. But luckily, some of the same open source motives are at work on SSIS as well – because it’s extensible, and a lot of people have taken advantage of that to help patch functionality holes.

    For your case in particular, there are two open source projects on CodePlex: the Delimited File Reader Source and the Delimited File Source. Either will handle the scenario you’re describing. Check out a crapload of other extensions for SSIS at http://ssisctc.codeplex.com/

    Cheers!

  2. hi angus,
    I don’t think you did this right. In ssis you define the delimiter and end of line character and it works pretty much like pentaho. It does not guess the data type like pentaho does and makes everything varchar though. But you don’t have to script a line of code for parsing fields for structured files – variable length or fixed length.
    I do think pentaho is superior to ssis for text file processing with more features than ssis. I see this is an old post and you probably already are an ssis expert. I am just starting to use pentaho and really like it after few years of ssis.

  3. Well I have good news, there have been many advances made in the
    past few years especially when it comes to getting close to an acne cure.
    Though there is no perfect health care therapy for this affliction, Tinnitus-Miracle critique demonstrates that this method can remedy Tinnitus in a natural way.
    As I said before, you can apply it topically to treat
    gums, teeth, itches, infections on the skin, and
    just about anywhere in the body.

  4. Whatever you look for be it blinds curtains you will find everything in these stores.
    Planning and careful consideration of the entire flow of your salon or spa will impact the
    experience for your clients. You may be able to land a great deal that will
    allow you to save a significant amount of money.

  5. Several years went by, each more rapid than the previous one.
    They have also been used as a popular traditional Indian medicine that is extensively used to treat several kinds of diseases.
    The next in the 20 greatest moments in sports history happened
    in the Mexico Olympics in 1968, when Bob Beamon created
    history by bettering the previous long jump record by a
    whopping two feet.

  6. It is a good idea to join online groups that
    interest you, but know that what you choose will be judged by your public.
    One thing you can do is employ free and low cost online
    marketing tactics in your plan. While activism is a great way to
    ensure control on authority, at times it can be misused
    by power hungry activists. Article Source: Michelle is the chief web designer at
    Chloe Design ( a Singapore-based design house specializing
    in E-commerce Web Design and Corporate Web
    Development, Search Engine Optimization and Graphics and Logo Design. To exploit
    the business opportunities Facebook offers, first create a company profile and a company network.
    Social media networking working sites have put an ending to
    fan sites because social media sites assist
    the fans to get access to their favorite fans and they will even be
    able to interact with one another. I’m a big believer in your
    ability to hear your inner voice too. Instagram offers you
    the ability to remove a deceased member’s account by submitting proof much like Facebook and Twitter.
    This is one small investment you could make as a
    marketer and a business men. long-term passive
    income businesses need social media strategies.

  7. Increase your business contacts and enhance personal relationships.
    All you need to do is to set it up and you are ready to market your business.
    Cost – Social media marketing is extremely cost-effective when compared to other marketing methods.
    Develop and deliver valuable content for your target audience.
    Go out and listen to people who you’re trying to reach.
    I didn’t really have time to call so I just tweeted
    them asking if it was possible to get the driver to come early,
    if not I would have to delay my parcel (which I needed that day).
    The Inactive Account manager lets you specify
    a period of time after which your account will be marked inactive.
    That will help you get attention from people all over the world and make
    more money. It impresses people that you take the time to actually connect one on one.
    A speakerphone is also provided which ensures hands-free communication if
    desired.

  8. We will provide you free quotation budgetary rates so that you may have an idea about
    the cost to be incurred on your planned masonry works.

    However, screws should be countersunk and covered to create a neat exterior appearance.
    Another potential problem with refacing kitchen cabinets is that you will
    not be changing the kitchen layout.

  9. Moving day can feel more like chaos than order, especially when there are half a dozen movers running
    around your home, packing up your belongings, and carrying out your
    furniture. Narrow down your choices to a maximum of three companies
    based on their answers to your questions and your gut feeling.
    Again, the credibility and market reputation of Indianapolis movers
    ensure that everything, from packing to transportation, will be simple and easy.

  10. Hey very cool web site!! Guy .. Excellent .. Superb ..
    I will bookmark your blog and take the feeds additionally?
    I am satisfied to seek out so many helpful info
    right here in the post, we’d like work out more strategies on this regard,
    thanks for sharing. . . . . .

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s