
Friday, May 15, 2015

Incremental data loads in Microsoft Power Query for Excel

Power Query

Power Query is a great and flexible tool for getting and transforming data from different sources into Excel or Power Pivot. The standard procedure for Power Query is to read a full table and to replace the destination with the result of the current query. This is exactly what we need for most use cases of Power Query. However, if you want to add new data to existing data, you can still use Power Query, but you have to follow a slightly different approach.

For example, let’s say we want to create a list of the blog posts while the RSS feed only delivers the most recent posts. In this case we would need to add the query results to the existing data.

Let’s start with my OLAP blog. The RSS feed delivers only the last 25 entries of the blog. In order to load this with Power Query, I’m using the load-from-web function:

[Screenshot: “From Web” function in the Power Query ribbon]

The dialog asks for the URL (I’m using my blog’s mirror here at http://ms-olap.blogspot.com/feeds/posts/default)

[Screenshot: dialog asking for the URL]

If Power Query does not automatically detect this as an XML table, you can insert the Xml.Tables(…) function as shown below:

[Screenshot: query with the Xml.Tables function applied]
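For reference, the manually entered step looks roughly like this in the formula bar (a sketch; the step Power Query generates may be named differently):

= Xml.Tables(Web.Contents("http://ms-olap.blogspot.com/feeds/posts/default"))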

The items are stored in the last column (the “entry” table), which can be expanded by clicking the expand button right beside “entry”. For my example I’m only interested in the publication date and the title.

[Screenshot: expanding the “entry” column to “published” and “title”]

By expanding this table we get one row per blog post. In order to get the title, I’m expanding the last column (in my case “title.1”, since there is already a column “title” for the title of the blog) to its text value:

[Screenshot: expanding “title.1” to its text value]

Finally, I changed the type of the first column to DateTime and I renamed the columns as shown below:

[Screenshot: query result with renamed “Date” and “Title” columns]

The full script generated by Power Query so far looks like this:

let
    Source = Xml.Tables(Web.Contents("http://ms-olap.blogspot.com/feeds/posts/default")),
    #"Expand entry" = Table.ExpandTableColumn(Source, "entry", {"published", "title"}, {"published", "title.1"}),
    #"Expand title.1" = Table.ExpandTableColumn(#"Expand entry", "title.1", {"Element:Text"}, {"Element:Text"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expand title.1",{"id", "updated", "category", "title", "subtitle", "link", "author", "generator", "http://a9.com/-/spec/opensearchrss/1.0/"}),
    #"Split Column by Position" = Table.SplitColumn(#"Removed Columns","published",Splitter.SplitTextByPositions({0, 19}, false),{"published.1", "published.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Position",{{"published.1", type datetime}, {"published.2", type text}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Changed Type",{"published.2"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns1",{{"published.1", "Date"}, {"Element:Text", "Title"}})
in
    #"Renamed Columns"

We can now load this result to the workbook by clicking the “Close & Load” button. Now, here comes the interesting part. The RSS feed only delivers the last 25 posts, so with a full load, whenever there is a new post the oldest post would be removed from the table. Let’s say we’d like to add new posts to the table instead.

To do so, I’m creating a new query named “Existing data” from the data we just loaded into the workbook:

[Screenshot: creating the “Existing data” query from the loaded table]

Using the “Close & Load To …” function in the ribbon, we can choose to only create a connection (no data is actually moved here):

[Screenshot: “Load To” dialog with “Only Create Connection” selected]
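Behind the scenes, this connection-only query simply reads the loaded table back from the workbook. A minimal sketch of the script, assuming the table created by the first query is named “BlogPosts” (the name in your workbook may differ):

let
    // Read the table that the first query loaded into the workbook
    Source = Excel.CurrentWorkbook(){[Name="BlogPosts"]}[Content]
in
    Source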

The workbook queries pane now looks like this:

[Screenshot: workbook queries pane showing both queries]

We can now edit the first query (“BlogPosts”) again. On the Home tab of the Power Query ribbon you can find the options for combining queries.

[Screenshot: options for combining queries on the Home tab]

In my case, I simply decide to append the new data to the existing data. If you have a key to match rows, you could also do a merge based on the key column (for example, if there are updated rows).
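For the merge variant, here is a minimal sketch using a hypothetical key column “Id” (the query built above has no such column):

    // Match existing rows against the freshly loaded rows on the key column;
    // the "NewData" column can then be expanded to pick up updated values
    Merged = Table.NestedJoin(#"Existing data", {"Id"}, BlogPosts, {"Id"}, "NewData", JoinKind.FullOuter)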

[Screenshot: appending the queries]

The result is that we now have the 25 rows from the existing table plus the 25 rows recently read from the blog. At this point the entries are identical, so we have 25 duplicates. I remove the duplicated rows using the “Remove duplicates” function.

[Screenshot: “Remove Duplicates” function]
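In the script, the append and the duplicate removal boil down to two steps, roughly like the following sketch (#"Renamed Columns" being the last step of the query built above; the step names the UI generates may differ):

    // Append the connection-only query to the current query's result
    #"Appended Query" = Table.Combine({#"Renamed Columns", #"Existing data"}),
    // Rows delivered by both inputs appear twice; keep each row only once
    #"Removed Duplicates" = Table.Distinct(#"Appended Query")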

We can now close & load the query again. In order to demonstrate the effect, I’m pasting the remaining blog posts (which the RSS feed no longer delivers) into the Excel table:

[Screenshot: Excel table with the remaining blog posts pasted in]

Summary
While full loads are usually the most useful approach for loading data with Power Query, you can also build incremental load procedures by merging or appending the new data to the existing data.

Sunday, April 12, 2015

Importing data from SAP using the SAP BW Delta Queue and Theobald Software Xtract IS

SQL Server 2005-2014

 

Importing data from SAP is a common task in many BI projects. Since Integration Services (SSIS) doesn’t have out-of-the-box support for SAP as a data source, third-party components should be considered to simplify this task. The components we’re using, and which I am writing about here, are supplied by Theobald Software and are called Xtract IS. A fully functional trial version is available, and the examples in this blog post can be reproduced using this trial version. Xtract IS contains many different options to access an SAP system, for example:

  • Read tables (through RFC_READ_TABLE or a custom function included in Xtract IS that also supports compression) or SAP queries
  • Call remote functions (RFCs) and BAPIs
  • Retrieve data from legacy reports
  • Connect to BW via OHS
  • Connect to SAP extractors (aka “delta queue” or “generic data source”)
  • much more…

For this short blog series I’d like to concentrate on the extractor interface, which is optimal for reading large tables from SAP. The support for delta detection is especially important when loading tables or structures with potentially billions of rows. To avoid misunderstandings: Theobald’s delta queue component does not require SAP BW to be in place, but it basically acts like BW and uses the same interface to push the data to SSIS in a very efficient way.

Since this post got a little longer, let’s start with an agenda:

 

Agenda

  • How do I get access to a SAP system?
  • Required customizations in SAP to use BW extractors (Delta Queue)
  • How to use existing (pre-defined) extractors in Integration Services
  • Create and use a simple custom extractor based on SAP query
  • Create and use a delta-enabled custom extractor based on SAP query

 

How do I get access to a SAP system?

Before we start, of course we need some access to an existing SAP system. If you have a development system in house the administrators might give you the required access. If not, there are some other options to get started and to create a test environment for connecting SAP to Integration Services. 

Most of the examples I’m using in this blog post work fine with the SAP ABAP NetWeaver trial version (the so-called “mini-SAP”). This edition runs for 90 days and can easily be extended for whatever time is needed. Since the download link changes from time to time, it’s best to search for the download using your favorite search engine or to go to the SAP Community website (registration needed) and search for “SAP NetWeaver Trial Version ABAP”. There is an excellent blog series by Manfred Lutz about the ABAP trial version covering the installation process as well as many aspects of configuration. In addition, if you don’t want to install the trial version, I can also recommend having a look at the Consolut website as they’re also offering free-of-charge access to a mini-SAP. Until recently they also offered free-of-charge access to a full SAP IDES system, which is somewhat like the AdventureWorks for Microsoft SQL Server (however, IDES covers a lot more real-life business processes). Unfortunately they cannot offer this service any longer, so you might want to take a look at the SAP offerings: https://training.sap.com/shop/live-access.

 

Required customizations in SAP to use BW extractors (Delta Queue)

Theobald Software has very detailed instructions on its web site about how to use and configure the SAP system. You can find the detailed step-by-step instructions here. For this post, I exactly followed those steps naming all of the created objects exactly as shown in the documentation.

 

How to use existing (pre-defined) extractors in Integration Services

In SSIS we first need to create a connection to the SAP system. With Theobald Software’s Xtract IS installed, you can add a new connection of type “XTRACT”:

[Screenshot: adding a new connection of type “XTRACT”]

SAP has some specific settings that need to be entered for making a connection (for example the client or the system number). The following dialog box is used to enter these settings which are usually provided by your administrator.

[Screenshot: SAP connection settings dialog]

In order to use the source component in a data flow, we first have to create a data flow (obviously) and then drag the Xtract DeltaQ component to this data flow:

[Screenshot: Xtract DeltaQ component in the data flow]

When opening the component’s settings, you first fill out the four boxes on the right. The search buttons help you find the correct information.

[Screenshot: Xtract DeltaQ configuration dialog]

If you’re using the Xtract IS components for the first time or if you are not sure if the customization of the SAP system was correct, it is a good idea to use the “Customization Check” link (marked with the red arrow in the screenshot). This function checks the security settings and customizations of the SAP system and finally gets back with a checklist:

[Screenshot: customization check results]

If any of the tests fail, the message usually gives administrators enough information to fix the issue. The next task is to find the extractor that delivers the data to the data flow. If you’ve installed the SAP trial system, not much content is available. But as mentioned above, you could also connect to an IDES demo system containing a lot of pre-defined extractors. Now, how do you find the extractor for your data? The easiest approach is to search the web for “sap extractor for <<your topic>>”, for example “sap extractor for profit centers”. You could also go to http://help.sap.com directly or check the content available on the Consolut website.

For this demonstration, I’m using one of the profit center transaction extractors. The search function makes it easy to find the extractor by name or description. Please note that if you are using the mini-SAP installation, this extractor is not available, so you would need to pick one of the extractors that exist there.

[Screenshot: extractor search dialog]

I’m using the first of the extractors in the list (0EC_PCA_1). After hitting ok, we’re back in the main configuration dialog now showing all the fields provided by the extractor:

[Screenshot: fields provided by the 0EC_PCA_1 extractor]

If filters are defined for the columns in the data source/extractor you will notice an “Edit” link right beside the checkbox for including the column in the output. Filters can be defined following the typical way SAP handles filters (include/exclude, cascaded) and package variables may be used for filter values.

Some important settings can be found in the “Settings” dialog (link right below the dialog window header). Pay special attention to the “automatic data type conversion” setting:

[Screenshot: settings dialog with the automatic data type conversion option]

With automatic type conversion enabled, Xtract IS derives the correct type for many of the columns. You can see this effect in the data flow (metadata):

[Screenshots: data flow metadata without automatic type conversion (left) and with automatic type conversion (right)]

A detailed explanation of all the settings in this dialog can be found on the Theobald Software website, for example here. The settings dialog also allows you to name a package variable for passing the delta mode, as shown below:

[Screenshot: package variable for the delta mode in the settings dialog]

Another great feature is the ability to generate documentation for all fields that are used in the extractor (button “generate documentation”). The output is in Excel format and looks like this:

[Screenshot: generated field documentation in Excel]

 

Create and use a simple custom extractor based on SAP query

If the data you want to load is not covered by an existing extractor, you can still use the extractor functions by creating a custom extractor. You can do so by using an SAP view, a query, or custom ABAP code. For this example, I’m using SAP Query to create a query based on the simple flight model data that is included in SAP systems (also in the SAP trial version) for educational purposes.

First we need to create an InfoSet for the query using transaction code SQ02:

[Screenshot: creating the InfoSet in transaction SQ02]

After clicking “Create” further information has to be provided:

[Screenshot: InfoSet creation dialog with further information]

We reference one of the tables of the flight model here (SBOOK, the table for flight bookings). The other tables to join are defined in the next step. If you are unsure which tables are needed, a good starting point is to view the table definition using transaction SE11 (ABAP Dictionary). On the “Entry help/check” tab you can see which tables are used to validate entry values. This is shown in the following dialog:

[Screenshot: “Entry help/check” tab in transaction SE11]

Double-clicking the entries in the domain or search help column also shows more information on the given structure. For this example, we will need at least the following additional tables:

  • SCARR (airline)
  • SFLIGHT (flight details)
  • SPFLI (flight schedule)

 

From the InfoSet editor you have to click the icon marked in red below in order to add more tables:

[Screenshot: InfoSet editor icon for adding more tables]

For tables that are required more than once (in this case the airport, which is needed as both departure and arrival airport), you may create aliases before adding the table to the diagram:

[Screenshots: creating an alias and adding the aliased table to the diagram]

SAP Query tries to automatically detect the links between the tables. However, in my case it didn’t work well, so I had to manually delete and recreate some of the links:

[Screenshots: deleting and recreating table links]

After adjusting the table links, we click the “-> InfoSet” button to continue with the field selection. Here, the available fields of the query are defined. As a default, we choose to start with the key fields:

[Screenshot: field selection defaulting to key fields]

Next you can choose the fields you like by placing them in the right pane of the InfoSet editor (via drag and drop). Field groups are mainly used to give the fields some structure:

[Screenshot: placing fields into field groups]

Clicking “Check” validates the structure. In my case, a wrong link had been created in the diagram:

[Screenshot: check result reporting a wrong link]

Here is how the full diagram looks after correcting it:

[Screenshot: corrected InfoSet diagram]

The next thing to do is to activate the InfoSet:

[Screenshot: activating the InfoSet]

We can now create a new extractor (or “generic data source”) using transaction code RSO2. For my example I used “ZTESTHB” as the name for the new extractor. The name will be needed later on to refer to the extractor:

[Screenshot: creating the extractor ZTESTHB in transaction RSO2]

The entry box for the InfoSet becomes available once you hit the “Extraction by Query” button. If our InfoSet is not in the list, it might be a visibility issue. In this case, go back to transaction SQ02 and click the “Role/User Group Assignment” button for the InfoSet.

[Screenshot: “Role/User Group Assignment” in transaction SQ02]

You can now assign the InfoSet to all required user groups. Back in RSO2, you may also need an application component, which can be created using transaction code RSA6:

[Screenshot: application component in transaction RSA6]

You may now save the extractor (aka generic data source). Depending on the fields you’ve chosen, you might get a warning about text fields:

[Screenshot: warning about text fields]

Especially with large tables (the main reason why we’re using the extractor), it is good advice to transfer only the keys, not the text fields. Text fields and other descriptive information can be loaded separately (for example using the Xtract IS Table component). To remove text fields from the extractor, you can use this menu entry:

[Screenshot: menu entry for removing text fields]

When saving the extractor you need to provide a package (or local object) and a workbench request (which you can also create from this dialog).

We’re now ready to use the new extractor from within SSIS. As shown above, we need an XTRACT connection to the SAP system and an Xtract DeltaQ component in the data flow. We can then configure this component:

[Screenshot: Xtract DeltaQ configuration for the custom extractor]

The data source/extractor should be easy to find using the search function:

[Screenshot: searching for the data source/extractor]

I simply select all fields here and activate the source:

[Screenshot: selecting all fields and activating the source]

A good way to check if everything works fine is the “Preview” functionality:

[Screenshot: preview of the extracted data]

From within SAP you can use the gateway monitor (transaction code SMGW) to see all active connections:

[Screenshot: gateway monitor with active connections]

In the data flow, the Xtract DeltaQ component has two outgoing connections: one for the data retrieved, the other for the log. Writing the log data to a table can also be helpful for tracking errors:

[Screenshots: data flow with data and log outputs]

 

Create and use a delta-enabled custom extractor based on SAP query

So far, the generated data source/extractor does not support any delta mechanism. In order to change this, we open the InfoSet again (using transaction code SQ02) and add the booking date to the output (if it is not already there).

[Screenshot: adding the booking date to the InfoSet output]

After re-generating the InfoSet we can edit the data source/extractor again (using transaction code RSO2). Clicking on “Generic Delta” opens the following dialog to define the delta mechanism:

[Screenshot: “Generic Delta” dialog]

For the delta field we select the order_date field that we just added to the field list. The safety interval results in an overlapping data delivery, so we can be sure we don’t miss any rows. However, it also means that each delta contains some rows from the previous batch, so we need to take care of these duplicates in our data flow.

We can now switch the update mode to “C – Delta Initialization” (I will show how this is done using a package variable in a minute). The next run of the package fetches all rows from the table.

[Screenshot: delta initialization run fetching all rows]

Switching the update mode to “D – Delta Update” now gives no new rows on the next run:

[Screenshot: delta update run returning no new rows]

The request log shows the detailed extraction:

[Screenshot: request log showing the extraction details]

In order to control the update mode from a package variable, we first need to create the variable and then enter the variable name in the “Update Mode Variable” box of the settings dialog. Also, don’t forget to change the update mode of the Xtract DeltaQ component to “V – SSIS variable” for the magic to happen:

[Screenshot: update mode variable in the Xtract DeltaQ settings]

At the end of this post I’d like to add some more tips on using the delta queue. First, you may test each extractor using the Extractor Checker (transaction code RSA3).

[Screenshot: Extractor Checker, transaction RSA3]

The delta queue maintenance (transaction code RSA7) may be used to check the current value of the delta field. Also, if something fails because of missing rights, a trace (transaction code ST01) may help (look for failed authority checks), and transaction code SU53 may be used to display the most recently failed authority check. Finally, the Delta Queue Monitor (transaction code ODQMON) can be helpful for monitoring the state of delta-enabled extractors.