So, recently I find myself heavily involved with SAP Data Services and data migrations, I know what you’re thinking… that's not real code! And you’re damn straight …. it isn’t, but I have always leaned heavily towards the data end of the spectrum rather than the UX end which is filled with sorcery, witch craft and more importantly users who are like little pixies with no respect for the sanctity of code; data is data though and it doesn’t press buttons marked ‘Nuclear Button, Do Not Press’ just for kicks. I thus don’t complain too much (ahem) when ETL jobs come my way. One of the biggest challenges with ETL in the SAP world is to keep everything down at the database level, it’s very easy to slip up and watch your execution times rattle up to unusable very quickly all because you used a feature for which there is no direct database equivalent and at times it is frustrating at how the simplest functions are not covered; bygones.
The idea for this blog came about when I was trying to establish how to delete (archive) a load of data which is of course easy if you use the Map Operation transform, the issue with that of course is that none of this gets pushed down to the database server. To my knowledge there is currently no way of achieving this (without recourse to the SQL transforms which are of course frowned upon). I needed to think outside the box a little, luckily as I always use intermediary tables to compile my data prior to completion that answer was fairly simple:-
- Compile data as normal in intermediary table
- Mark any candidate records in intermediary table as ‘Archived’ using a database field
- Omit ‘Archived’ records when inserting records from intermediary table into main target table
And that led me into using a feature that (until you know the toolset a little better) appears to be missing, namely how to execute an Update statement (using FULL pushdown) against existing data. Oh sure, again it’s very easy to just add a Map Operation and convert a flow of data into an ‘Update’ but this of course results in a RBAR operation as the BODS engine will iterate over each row of data and then change its op code so that it is an ‘update’ not a ‘insert’ . The end result is that the BODS engine then creates individual update statements and we quickly have a non -performant data transformation; furthermore each row of data will compound the problem and before very long we will have a drawing board to which we will have returned.