Implementing Paging for SharePoint in Power Automate
One of the many benefits of using Microsoft Power Automate is the ability to process large amounts of data in batches. By automating such tasks, you reduce the risk and effort associated with performing them manually and can focus on more impactful tasks. Many connectors in Power Automate are set up to process large amounts, but some are limited by the APIs they use. In this article, I’ll use an example to show how can you overcome this limitation when you work with large Microsoft SharePoint datasets.
One of the first connectors that was released with Power Automate was for SharePoint. It is no surprise because SharePoint is one of the most popular and heavily used services within the Microsoft 365 ecosystem. The SharePoint connector has paging enabled for many of its actions via the Settings menu, as shown in Figure 1.
However, in some cases, you may need to leverage the Send an HTTP Request action from the SharePoint connector. This action enables you to retrieve more information than the standard actions do, but lacks pagination functionality.
Case Scenario – Auditing OneDrive for Business
To better understand this scenario, see the Auditing Your OneDrive for Business Sharing Using Power Platform (tekkigurus.com) article that I recently published. In that scenario, I needed to scan entire OneDrive for Business sites, which can have tens to hundreds of thousands of documents in them. On a sample OneDrive that contains around 36,000 files, the time to scan all the files and find users with whom files are shared was around 18 hours. Of those files, only 185 had custom permissions. Table 1 shows the overall process:
|Get Files||Iterate through all the files in OneDrive for Business using the Get Files action.|
Use the built-in pagination shown in Figure 1.
|Once explicitly, but it was implicitly called several times to get batches of 5,000 files||~5 min|
|Apply to each||Check each file to see if it has custom permissions.||~36,000||~18 hours|
|Collect information||If file has custom permissions, collect information about it.||Only files that were shared||~18 hours|
Table 1: Tasks and durations for original implementation of OneDrive for Business scanning for custom permissions.
This process worked and falls within the 30-day Power Automate flow execution limit. However, it could only be optimized if the files with custom permissions could be retrieved.
SharePoint REST Call and Pagination
As mentioned earlier, Send an HTTP Request offers a lot of flexibility. You can pick specific properties to retrieve, to reduce the amount of information that needs to be retrieved. You can even collect information that is not available by using the built-in actions. The following REST call retrieves OneDrive for Business shared files only.
https://<tenant>-my.sharepoint.com/personal/<user email>/_api/web/lists/getbytitle ('Documents')/ items? $select=FileLeafRef,HasUniqueRoleAssignments,ID,SharedWith,Title,SharedWithDetails,FileRef,FileDirRef,ServerUrl,DisplayName & $top=1000&$skiptoken=Paged=TRUE%26p_ID=0
The first line indicates that files are to be retrieved from a person’s OneDrive for Business site from the Documents library.
The second line indicates the specific properties to collect. In our case, HasUniqueRoleAssignments is the key property. It is set to true if a file has custom permissions and false otherwise. If only these two rows are used, this REST API call will return the first 100 files.
By adding on the third line, you specify that you want to get 1000 items from this call. The p_ID specifies the ID of the first items. By increasing the p_ID parameter, you can get batches of different files. So, in the first call, you would use p_ID=0. The next time, you would start with p_ID=1000 (assuming the top count is set to 1000), etc.
Putting It All Together
Now that you know what it takes to use the REST API call, you can build a flow to collect the information. The flow uses four variables and a loop. Note that in my flow, I renamed the Send an HTTP Request action to Get Files for better readability of the flow.
The variables are as follows:
|TopCount||Number of files to return in each Send an HTTP Request call||Integer||1000|
|p_ID||ID of the first file in a set of files being retrieved||Integer||0|
|Results||Array of all the file properties I’m retrieving||Array|||
|ResultsCount||Count of number of items retrieved in single loop execution; initially set to 1 so the loop will run at least once||Integer||1|
Table 2: Variables for building the flow.
Inside the loop, the key operations are to retrieve file properties, append the retrieved file properties to the existing Results array, and then increment the various counters, as shown in Figure 3.
The loop has been set up to run until the ResultsCount is 0. This means that once Get Files (in the next section) stops retrieving values, the execution ends. As described earlier, ResultsCount is initially set to 1 so that the loop will run at least once.
This action, as described earlier, does all the heavy lifting, whereby it retrieves the file properties. The TopCount and p_ID values have been replaced by the variables defined. The TopCount is a constant that I set to 1000. The p_ID is initially set to 0.
Unify the Results with the Previous Collected Data
This is a Compose action, which again I renamed for better readability. The action uses the union() PowerFX expression to take the existing Results array values and combine them with the results from the Get Files action. The expression is as follows:
The reason for using a Compose action and not assigning the outcome of this expression directly to the Results variable is that in Power Automate flows it is not possible to reference a variable in an expression that modifies it. In other words, Results cannot be referenced in the action that modifies Results. The final three actions described below can be running in any order because there is no dependency between them.
Increment p_ID by TopCount
As the name suggests, this integer is incremented by 1000 in each iteration of the loop. In the last iteration of the loop, the value of p_ID will be the nearest 1000 below the actual number of files. If there are 3999 files, then the last p_ID will be 3000.
The variable gets set to the number of items that the Get Files action retrieves within an execution of a loop. The number is calculated using the PowerFX expression.
Set the Unified Results
The results of the concatenated arrays (from the Compose action) are assigned to the Results and ultimately returned for processing.
Revisiting the Case Scenario – Auditing OneDrive for Business
With the new workflow, I saw some tremendous speed improvements. The overall execution went from around 18 hours down to just over 17 minutes. This is largely because I let the internals of the SharePoint API do all the heavy lifting by filtering out all the files that didn’t have custom permissions and returning only the 185 files needed. This is an example of how an understanding of the APIs that are wrapped in the connectors can help optimize performance.