Connecting to data stored in Excel spreadsheets or other file sources is great for prototyping Power BI dashboards, gaining quick insight and showing others what the tool can do.
There comes a time when point proved, value established, it’s beneficial to automate the data coming into the dashboard, to save the time and potential for errors with manually updating the data.
There are myriad ways to connect to corporate data with Power BI, the most common are illustrated below to give a flavour of the options whether you are wanting to connect direct to source for the first time or looking to overcome current connectivity issues.
From Power BI desktop we can connect to an on-premise database, be it data warehouse / reporting database or operational system database, create reports and dashboard and publish these to the Power BI service.
In order to refresh the data once loaded into the service, Power BI needs a Data Gateway to connect back to the database. This can be installed on an on-premise workstation or server. Queries that feed the dashboard are built as either ‘import’ or ‘direct’. Import queries store a dataset in the Power BI service, which is what gets refreshed from the source. Refreshing the report/dashboard connects to the latest data in this dataset. Direct queries connect through to the source data whenever the report/dashboard is opened. There are pros and cons to both approaches, but for now suffice to say both are available.
Azure SQL is another common location for storage of corporate data. In this case both Power BI Desktop and service connect without the need for a Data Gateway.
There are often inherent problems associated with connecting to operational systems. Complex queries require additional Power BI computer power, operational system performance may be compromised whilst running complex queries. Therefore data is often extracted and transformed into a database specifically designed for reporting. Queries are then simpler as much of the complex logic is pre-processed and this process minimises the impact on the source system, often happening overnight, or if in-day is required at limiting queries. To this end Integration Services is often used, which are batch jobs to extract, transform and load the data often to Azure but equally to another on-premise database (albeit not shown here). These jobs may be run on-premise (push to Azure) or on Azure (pull).