ML with circles

Mearns Learns

Import incremental data

Is the data that you want to use provided in an incremental or rolling format? For example, the file is provided daily and always contains the last seven days of data.

You want to import the latest file and add it to the previous data. You don’t want to duplicate the overlapping time periods.

Provided that each row of data can be uniquely identified, there are two ways to handle this.

  1. Import the new file, merge it with the existing data and remove the duplicates.
  2. Import all the files, merge them and remove duplicates.

My preference is the first option. You are merging two sets of data, rather that hundreds of files. This article is going to run through the steps involved in the first option.

Import the first data file

In this example, a bank statement file is received monthly. It includes data for the previous three months. The file is in a .csv format.

This first file received is called Bank Statement Current.csv and contains three months of data for January to March. The data looks like this.

Import incremental data Import incremental data

In Excel navigate to the Data menu tab on the Ribbon.
In the Get & Transform Data section, click on the From Text/CSV.
Navigate to where the file is saved, select the file and then click on the Import button.
Click on the Load button.

Power Query will load the file to Excel.

Import incremental data Import incremental data

The following M code will be created.

let
    Source = Csv.Document(File.Contents("C:\Temp\Bank Statement Current.csv"),[Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {"Description", type text}, {"Amount", type number}})
in
    #"Changed Type"

In Excel, select a cell in the imported table.
Navigate to the Table Design menu tab on the Ribbon.
Make a note of the Table Name.

Merge existing data with the latest data file

The next step is to modify the query in order to merge the existing data with the latest data file.

Navigate to the Data menu tab on the Ribbon.
Click on the Queries & Connections icon to show the Queries & Connections panel. Double click on the previous query to edit it.

Import incremental data Import incremental data

In the Query Settings panel, right click on the Promoted Headers step, then click on Insert Step After.
Click on the Insert button.
In the Formula bar, replace the reference to the previous step with the following.

= Excel.CurrentWorkbook(){[Name="Table_Name"]}[Content]

Replace the “Table_Name” with the Table Name you noted in the earlier step. This will retrieve the existing table.

Navigate to the Home menu tab on the Ribbon and click on Append Queries. Click on the Insert button.
Select the current query.
Click on the OK button.

Import incremental data Import incremental data

The existing table will be appended with itself.

In the Formula bar, change the generated code from

= Table.Combine({#"Get Existing Data", #"Get Existing Data"})

To the following, in order to reference the earlier step.

= Table.Combine({#"Get Existing Data", #"Promoted Headers"})

Select the last row in the Applied Steps.
In the Table, select all the columns by clicking on the first column, hold down the Shift key and click on the last column.
On the Home menu tab of the Ribbon, use the dropdown on Remove Rows and click on Remove Duplicates.

Alt text Alt text

Click on the Close & Load icon.

Alt text Alt text

The Existing data from the current CSV file will be loaded.

Alt text Alt text

Replace the current CSV file with the latest CSV file.

Alt text Alt text

The file location and name must stay the same as in this example, they are hard coded in the query.

Click on the Refresh icon.

The new data from the file will be added and the data that is no longer in the file will be not be removed.

In the example there is data from January to April, even though the newest file only contains data from February to April.

Alt text Alt text

The earlier data has been preserved.
The incremental refresh will continue to add the latest data from each new file and will not remove data that has already been loaded.