
Combating the pain of Surface Automation

Throughout the various projects I have worked on, the need to automate Windows applications over Citrix or Microsoft Remote Desktop (RDS) sessions has arisen many times, typically because:

  • The customer hosts certain applications on Citrix and there is no option for running those applications directly on the RPA worker. This is very common with larger organisations who host applications in multiple countries or continents.

  • The application is only available in a secure environment, requiring you to connect to a remote desktop before running the application.

  • Automation is from a cloud-based RPA platform such as Blue Prism Cloud, where the customer has not permitted a VPN to the on-premises network or has not allowed the virtual workers to join the customer’s on-premises Active Directory domain, and therefore the applications cannot be run locally.

  • In Joiners / Movers / Leavers (JML) processes where the required tools to perform the automations are stored on jump boxes or dedicated computers, requiring you to connect to a remote desktop before running the tools.

When automating over a Citrix or Microsoft RDS session using RPA software such as Blue Prism, the native Win32, MSAA, UIA and Web application inspection tools are not available, as these APIs cannot be used across a remote session. The RPA software is therefore restricted to viewing the graphics sent from the remote session.

RPA developers must therefore resort to using Surface Automation which interacts with remote applications by using a variety of techniques:

  1. Detecting graphics in the remote application (to use as an anchor for locating other elements).
  2. Using Optical Character Recognition (OCR) technologies to read words and numbers.
  3. Moving the mouse to coordinates and pressing left or right click.
  4. Using keypresses.

Surface Automation is far less reliable and slower than native automation, as I will demonstrate in this blog.

IA-Connect solves this challenge by passing instructions from the RPA worker to an IA-Connect Agent, which runs inside the remote Citrix or Microsoft RDS session. The IA-Connect Agent uses the Windows APIs to interact with the remote applications in a similar way to native automation. Communication takes place over a virtual channel which shares the same network connection as the remote session, so no additional firewall rules or ports are required.

IA-Connect does not need to be installed on the remote desktop: it is a standalone executable which can be run from a network share, and it runs under the account of the user logged into the remote session. There is no requirement for admin rights!

In this blog I will be comparing Surface Automation with IA-Connect User Interface Automation (UIA) using the Microsoft Dynamics NAV (Navision) application as an example.

Challenge 1: Top-level section

The first challenge in this automation is to navigate to the required top-level section in Microsoft Dynamics NAV. In this example I wish to navigate to “Departments”.

Sounds easy, but with Surface Automation you must first locate the coordinates of the graphic to the left of “Departments” and then move the mouse to click on it. The Surface Automation method has the following challenges:

  1. The graphic appears differently depending on whether “Departments” is selected or not. This can cause a failure to detect the graphic.

  2. Citrix and Microsoft RDP sessions compress the remote session graphics to reduce bandwidth. This causes the graphics to vary between sessions (at the pixel level) and can cause graphics to be mis-detected. This is usually solved by increasing the “error threshold”, but that runs the risk of one graphic being confused with another, similar graphic.

  3. The coordinates of the “Departments” graphic depend on how many top-level sections the user has access to. In the example above, the user has access to eight sections (“Home”, “Journals”, …, “Departments”), but there could be more or fewer depending on permissions. This means you cannot use fixed coordinates to locate the element, since its position can vary.

  4. Depending on the window size, the “Departments” option might not appear on-screen, in which case you must click the small down arrow first. But there are multiple identical-looking arrows on the main screen, so Surface Automation won’t know which one to select.

  5. Another window or error message could appear in front of the list, completely obscuring the graphics and causing the automation to fail to detect the element.
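To make point 2 concrete, the following is a toy Python sketch of threshold-based image matching. It scores each screen region by mean absolute pixel difference over made-up grayscale values; this scoring method is an assumption for illustration, not the matching algorithm used by Blue Prism or any other RPA product.

```python
# Toy grayscale "images": lists of rows of 0-255 pixel values (made-up data).
Image = list[list[int]]

# The graphic we are looking for, as captured during development.
TEMPLATE: Image = [
    [200, 40],
    [40, 200],
]

# The live screen: a slightly compressed copy of the icon at column 0,
# blank background, then a different but similar-looking icon at column 4.
SCREEN: Image = [
    [196, 44, 0, 0, 170, 70],
    [44, 196, 0, 0, 70, 170],
]


def mean_abs_diff(screen: Image, template: Image, top: int, left: int) -> float:
    """Average per-pixel difference between the template and one screen region."""
    h, w = len(template), len(template[0])
    total = sum(
        abs(screen[top + r][left + c] - template[r][c])
        for r in range(h)
        for c in range(w)
    )
    return total / (h * w)


def find_matches(screen: Image, template: Image, threshold: float) -> list[tuple[int, int]]:
    """Return the (row, col) of every region whose difference is within the threshold."""
    h, w = len(template), len(template[0])
    return [
        (top, left)
        for top in range(len(screen) - h + 1)
        for left in range(len(screen[0]) - w + 1)
        if mean_abs_diff(screen, template, top, left) <= threshold
    ]


for threshold in (2, 10, 35):
    print(threshold, find_matches(SCREEN, TEMPLATE, threshold))
```

With these values, a threshold of 2 finds nothing because compression defeats a near-exact match. A threshold of 10 finds only the real icon at (0, 0), and 35 also matches the lookalike at (0, 4). Tuning that number is exactly the balancing act described above.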

In contrast, IA-Connect has none of these issues since it can natively inspect the element using User Interface Automation (UIA) and ask the Operating System to click the element. Because it doesn’t rely on screen images, this will succeed regardless of whether the element is on-screen or not.
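The difference can be sketched with a hypothetical in-memory element tree in Python. Native inspection searches the tree by properties such as name and control type, so the result does not depend on where, or whether, the element is drawn. The `Element` class, the control types and the tree contents are illustrative assumptions, not IA-Connect’s or UIA’s actual object model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Hypothetical stand-in for a node in a UI element tree."""
    name: str
    control_type: str
    on_screen: bool = True
    children: list[Element] = field(default_factory=list)


def find_first(root: Element, name: str, control_type: Optional[str] = None) -> Optional[Element]:
    """Depth-first search by property values; pixel position plays no part."""
    stack = [root]
    while stack:
        element = stack.pop()
        if element.name == name and (control_type is None or element.control_type == control_type):
            return element
        # Push children in reverse so they are visited in their natural order.
        stack.extend(reversed(element.children))
    return None


# Hypothetical navigation pane: "Departments" has been scrolled out of view.
nav_pane = Element("Navigation", "Pane", children=[
    Element("Home", "TreeItem"),
    Element("Journals", "TreeItem"),
    Element("Departments", "TreeItem", on_screen=False),
])

target = find_first(nav_pane, "Departments", "TreeItem")
print(target.name, "found; on screen:", target.on_screen)
```

With the real Windows UI Automation COM API, the rough equivalent is `IUIAutomationElement::FindFirst` with `TreeScope_Descendants` and a property condition on the element name.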

In the screenshot below, you can see the IA-Connect Inspector (which runs on the RPA Virtual Worker) locating the native “Departments” element.

The action “Press element” can be used to perform a live test to confirm that the IA-Connect Agent can interact with the element. Once the test succeeds, you can copy the action to the Blue Prism studio, allowing it to be used from within your automation:

Challenge 2: Menu items

The next challenge is to pick the required item from an expandable list. In this example I want to click “Purchase > Order processing”.

The Surface Automation method has the following challenges:

  1. Each item in the list can be collapsed or expanded and hence the absolute coordinate of the items can vary.

  2. Microsoft Dynamics NAV tends to remember the last selection you made, and hence the initial state of this list can vary between automation runs.

  3. There is no image to assist Surface Automation with identifying the item.

  4. The item is quite often scrolled off-screen and hence cannot be seen at all by Surface Automation; you will need to click the scrollbars until it appears.

  5. OCR will need to be used to detect the words and this isn’t 100% reliable.

Again, IA-Connect combats these issues by natively inspecting the element using UIA. It can determine if “Purchase” is expanded or collapsed by querying the native element and can ask the Operating System to click the required elements. And like the previous scenario, this will succeed regardless of whether the element is on-screen or not.
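The state-aware approach can be sketched in Python as follows. This is a hypothetical model, not IA-Connect’s API. Each node records an expand/collapse state named after UIA’s `ExpandCollapseState` values, and the walker only expands an intermediate node when it is actually collapsed, so the same path works whatever state NAV remembered from the previous session. Apart from “Purchase” and “Order processing”, the menu contents are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ExpandState(Enum):
    # Mirrors UIA's ExpandCollapseState (PartiallyExpanded omitted for brevity).
    COLLAPSED = "Collapsed"
    EXPANDED = "Expanded"
    LEAF_NODE = "LeafNode"


@dataclass
class MenuNode:
    name: str
    state: ExpandState = ExpandState.LEAF_NODE
    children: list[MenuNode] = field(default_factory=list)


def select_path(root: MenuNode, path: list[str]) -> tuple[MenuNode, list[str]]:
    """Walk a path such as ["Purchase", "Order processing"].

    Intermediate nodes are expanded only if they are currently collapsed.
    Returns the selected node plus a log of the actions taken.
    """
    actions: list[str] = []
    current = root
    for i, part in enumerate(path):
        child = next((c for c in current.children if c.name == part), None)
        if child is None:
            raise LookupError(f"{part!r} not found under {current.name!r}")
        is_last = i == len(path) - 1
        if not is_last and child.state is ExpandState.COLLAPSED:
            child.state = ExpandState.EXPANDED  # stands in for invoking "expand"
            actions.append(f"expand {child.name}")
        current = child
    actions.append(f"select {current.name}")
    return current, actions


def build_menu() -> MenuNode:
    # Hypothetical menu; "Sales" is an illustrative sibling with a same-named child.
    return MenuNode("Departments", ExpandState.EXPANDED, [
        MenuNode("Sales", ExpandState.COLLAPSED, [MenuNode("Order processing")]),
        MenuNode("Purchase", ExpandState.COLLAPSED, [MenuNode("Order processing")]),
    ])


menu = build_menu()
print(select_path(menu, ["Purchase", "Order processing"])[1])
# Second run: "Purchase" is already expanded, so it is not toggled shut again.
print(select_path(menu, ["Purchase", "Order processing"])[1])
```

Selecting by path also disambiguates the two “Order processing” items. In a real UIA tree the children of a collapsed item may not exist until it is expanded, so a real implementation would likely re-query after expanding; this model keeps them in memory for simplicity.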

In the screenshot below, you can see IA-Connect Inspector locating the native “Purchase” top-level item.

The action “Select element” allows you to perform a live test to confirm that the IA-Connect Agent can interact with the element. Once the test succeeds, you can copy the action to the Blue Prism studio, allowing it to be used from within your automation:

Once the menu item has been expanded, the IA-Connect Inspector can be used to locate the sub-menu items, again providing the ability to perform live tests and to copy the required action to Blue Prism:

Challenge 3: Input new vendor

The next challenge is to input details for a new vendor.

The input screen contains many sections, most of which are collapsed. In the example screenshot below, the “General” and “Payments” sections have been expanded. The small chevrons marked in red are used to expand and collapse each section:

The Surface Automation method has the following challenges:

  1. The expand / collapse chevrons all look the same, making it hard to determine which chevron belongs to which section.

  2. The expand / collapse chevrons are often scrolled off-screen (hidden to the right) and hence cannot even be detected by Surface Automation until they have been scrolled into view. This requires a further Surface Automation task to detect the location of the scrollbars (which also all look the same).

  3. There is no image to assist Surface Automation with identifying the text fields (e.g. “Name” and “Address”). It would be unwise to locate text fields relative to the toolbar, since the toolbar can change and has various tabs (each tab looking completely different).

  4. The text input fields (once visible) all look the same, especially when empty, and hence their position is hard to determine. Typically, you would need to perform an OCR scan on the page to detect the location of the labels (remember, OCR is not 100% reliable) and then click a distance to the right of the label to (hopefully) select the text box.

  5. The drop-down boxes (e.g. payment terms code) don’t support text input so you must first detect the position of the chevron (e.g. to the right of “30 days”) and click it to obtain a list of possible values. Then you would need to use OCR to determine the names of the options in the list so you can determine the coordinate to click (to select the required option). If the drop-down list scrolls then you’d need to scroll through the list until you find the required item.

  6. Some of the lower sections (below “Foreign trade”) are offscreen and you would need to locate the position of the scroll down arrow (which looks the same as any other arrows) and press it multiple times to scroll those sections into view.

Once again, IA-Connect has none of these issues since it can inspect the elements natively using UIA. In the screenshot below, you can see the IA-Connect Inspector detecting the top-level sections “General”, “Communication”, “Invoicing”, “Payments”, “Receiving”, …

Once the “General” section has been identified, you can click “Step in” to view the elements within. In the example below, the line items correspond to the labels and text input fields in the “General” section. Many elements appear twice because one is the label and the other is the text input field.

The action “Input text” allows you to perform a live test to confirm that the IA-Connect Agent can interact with the element - in the example below, the “Address” field is being populated. Once the test succeeds, you can copy the action to the Blue Prism studio, allowing it to be used from within your automation:

The sections are easy to expand or collapse since the chevrons are named “Expand or collapse” in the UIA inspector and are child elements of the section they belong to (e.g. “General”). They are therefore easy to distinguish:

In this particular Microsoft Dynamics NAV scenario, with IA-Connect there is no need to expand each section or scroll the section onscreen since IA-Connect can interact with the element regardless of visibility.
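A scoped lookup of this kind can be sketched in Python. The search is limited to the direct children of one section and matches on both name and control type. That separates the “Address” label from the “Address” input field, and the “General” chevron from the identical “Payments” chevron. The class, control types and tree layout are hypothetical, not IA-Connect’s API or NAV’s exact UIA tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UiElement:
    """Hypothetical stand-in for a UIA element."""
    name: str
    control_type: str
    children: list[UiElement] = field(default_factory=list)


def find_child(parent: UiElement, name: str, control_type: str) -> UiElement:
    """Search only the direct children of `parent`, matching name AND control type."""
    for child in parent.children:
        if child.name == name and child.control_type == control_type:
            return child
    raise LookupError(f"no {control_type} named {name!r} under {parent.name!r}")


def build_vendor_card() -> UiElement:
    # Hypothetical fragment of the vendor card; labels and inputs share names.
    general = UiElement("General", "Group", [
        UiElement("Expand or collapse", "Button"),
        UiElement("Name", "Text"),
        UiElement("Name", "Edit"),
        UiElement("Address", "Text"),
        UiElement("Address", "Edit"),
    ])
    payments = UiElement("Payments", "Group", [
        UiElement("Expand or collapse", "Button"),
        UiElement("Payment Terms Code", "Text"),
        UiElement("Payment Terms Code", "ComboBox"),
    ])
    return UiElement("Vendor Card", "Window", [general, payments])


card = build_vendor_card()
general = find_child(card, "General", "Group")
address_box = find_child(general, "Address", "Edit")
general_chevron = find_child(general, "Expand or collapse", "Button")
payments_chevron = find_child(find_child(card, "Payments", "Group"), "Expand or collapse", "Button")
print(address_box.control_type, general_chevron is payments_chevron)
```

The two chevrons have identical properties; only their position in the tree tells them apart, which is why the search is scoped to the parent section rather than the whole window.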

Challenge 4: Get Receipt Lines

A challenge further into the automation is to read the receipt lines from the table:

We come across various challenges with automating this screen using Surface Automation:

  1. There are various filters displayed above the table (the red x and the green +) and hence the absolute coordinates of the table can vary. The filters may not even be visually present since the whole section could be collapsed (the ^ symbol to the right of the search box) and the table will then appear higher than normal.

  2. There is no image to assist Surface Automation with identifying the table.

  3. The column headers (e.g. “Document No.”) could potentially be located using OCR, but depending on the column widths, “No.” could appear below or to the right of “Document” or not appear at all.

  4. The data in the table would need to be read using OCR techniques which, as we’ve said, are not 100% reliable, and particularly devastating when dealing with financial data.

  5. Most of the data in the table is offscreen both to the right and below. You would need to scroll down and to the right using Surface Automation to ensure that all rows and columns have been “seen”, and then join the results together.

  6. Some columns could be too narrow, and hence the data in those columns would be cropped (and hence not correctly read).
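To show why point 5 is fragile, here is a minimal Python sketch of stitching together rows read by OCR from successive scrolled viewports. It assumes each capture overlaps the previous one and drops the longest repeated run of rows at each join. The technique is a generic suffix/prefix overlap merge, not a feature of any RPA tool, and the row values are made up.

```python
def merge_pages(pages: list[list[str]]) -> list[str]:
    """Concatenate viewport captures, dropping rows repeated after each scroll.

    For each new capture, find the longest run of rows at the end of the
    merged result that equals the start of the capture, and skip it.
    """
    merged: list[str] = []
    for page in pages:
        overlap = 0
        for k in range(min(len(merged), len(page)), 0, -1):
            if merged[-k:] == page[:k]:
                overlap = k
                break
        merged.extend(page[overlap:])
    return merged


# Made-up receipt lines captured from two overlapping scroll positions.
first_view = ["R-1001 | 5", "R-1002 | 2", "R-1003 | 7"]
second_view = ["R-1002 | 2", "R-1003 | 7", "R-1004 | 1"]
print(merge_pages([first_view, second_view]))

# The real table is A, B, B, C and the scroll happened to split it in half.
print(merge_pages([["A", "B"], ["B", "C"]]))  # one genuine "B" row is lost
```

The second call shows the ambiguity: identical consecutive rows that straddle a scroll boundary look exactly like overlap, so one is silently dropped. Conversely, a single OCR misread in the overlap stops the rows matching, and the overlapping rows are duplicated instead.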

And you’ve guessed it, IA-Connect has none of these issues since it natively inspects the element using UIA. In the screenshot below, you can see the IA-Connect Inspector viewing a row in the table:

IA-Connect can read the entire contents of the table (or specified rows) in a single action and output into a collection, ready for data processing:
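As a rough illustration of what “output into a collection” involves, here is a Python sketch that turns a header row plus data rows into a list of records, which is roughly the shape of a Blue Prism collection. Because native inspection returns the cell text exactly, numeric columns can be parsed with `Decimal` rather than binary floating point. The function and all values are hypothetical, and every column name except “Document No.” is made up.

```python
from decimal import Decimal


def rows_to_collection(
    headers: list[str],
    rows: list[list[str]],
    decimal_columns: tuple[str, ...] = (),
) -> list[dict]:
    """Convert table rows into a list of dicts keyed by column header.

    Columns named in decimal_columns are parsed with Decimal so amounts
    such as "10.10" keep their exact value.
    """
    if len(set(headers)) != len(headers):
        raise ValueError("duplicate column headers")
    collection = []
    for index, row in enumerate(rows):
        if len(row) != len(headers):
            raise ValueError(f"row {index} has {len(row)} cells, expected {len(headers)}")
        record = dict(zip(headers, row))
        for column in decimal_columns:
            record[column] = Decimal(record[column])
        collection.append(record)
    return collection


headers = ["Document No.", "Description", "Quantity", "Direct Unit Cost"]
rows = [
    ["107001", "Office chair", "2", "10.10"],
    ["107001", "Desk lamp", "1", "25.00"],
]
receipt_lines = rows_to_collection(headers, rows, decimal_columns=("Quantity", "Direct Unit Cost"))
total = sum(r["Quantity"] * r["Direct Unit Cost"] for r in receipt_lines)
print(total)  # Decimal arithmetic, so no binary rounding error
```

Validating the cell count per row is cheap insurance: a row that does not line up with the headers is reported rather than silently shifted into the wrong columns.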

Common surface automation shortcuts

Many of the Surface Automation challenges listed above are hard to solve, and RPA developers regularly take shortcuts to achieve their goal because they are unable to identify the elements with a high degree of accuracy. These shortcuts result in less reliable automations and increase the cost of supporting and maintaining the process once live.

Typical shortcuts include:

  1. Pre-configuring the application to a very specific set of window sizes. This can ensure that elements you require are on-screen and to a degree at fixed coordinates. This solution is volatile since an application could reset sizes for many reasons (upgrades, profile resets or a human manually loading the application). It is also hard to arrange for the window sizes to be the same across multiple workers.

  2. Pressing Tab multiple times to jump between input fields. This is risky because an unexpected field (e.g. an optional field) or a read-only field would break the sequence, resulting in the automation entering data into the wrong field.

  3. Using graphics in the toolbar to locate the text input elements below it. This is risky because a change to the toolbar will result in the automation entering data into the wrong field.

  4. Assuming that one element is a certain number of pixels from another and using the mouse to click at that offset coordinate. A layout change or simply a screen resolution change would cause the automation to enter data into the wrong field or attempt to type into nowhere.

  5. Assuming that a drop-down list has a fixed number of elements and hence assuming it is safe to click the drop-down and press “down” a specific number of times. Any minor change to the available options in the drop-down will cause the automation to select the wrong option.

  6. When dealing with two graphics which look the same, such as scrollbars or chevrons, making assumptions about their relative position, e.g. “click the lowest chevron”. This will break the automation if the application layout changes or if the “lowest” chevron simply happens to be off-screen (the automation then clicks the wrong chevron).
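Shortcuts 2 and 5 can be demonstrated in a few lines of Python. The sketch compares “press a key N times” with selecting by name, using a made-up tab order and made-up drop-down options. “Search Name” and “21 days” stand for a hypothetical field and option added in a later release.

```python
def field_after_tabs(tab_order: list[str], presses: int) -> str:
    """Shortcut 2: assume focus starts on the first field and each Tab moves one field."""
    return tab_order[presses]


def option_after_downs(options: list[str], presses: int) -> str:
    """Shortcut 5: assume the list opens on the first option and each Down moves one."""
    return options[presses]


def option_by_name(options: list[str], wanted: str) -> int:
    """Select by the option's text; fails only if the option genuinely disappears."""
    if wanted not in options:
        raise LookupError(f"option {wanted!r} not available")
    return options.index(wanted)


# Hypothetical "before" and "after" versions of the same screen.
tabs_v1 = ["Name", "Address", "City", "Phone No."]
tabs_v2 = ["Name", "Search Name", "Address", "City", "Phone No."]
terms_v1 = ["7 days", "14 days", "30 days"]
terms_v2 = ["7 days", "14 days", "21 days", "30 days"]

# Two Tabs were meant to reach "City"; after the change they land on "Address".
print(field_after_tabs(tabs_v1, 2), "->", field_after_tabs(tabs_v2, 2))
# Two Downs were meant to pick "30 days"; after the change they pick "21 days".
print(option_after_downs(terms_v1, 2), "->", option_after_downs(terms_v2, 2))
# Selecting by name still finds "30 days", wherever it now sits in the list.
print(option_by_name(terms_v1, "30 days"), option_by_name(terms_v2, "30 days"))
```

Nothing in the key-press versions fails loudly: they return a valid field or option, just the wrong one, which is why these shortcuts tend to corrupt data rather than stop the process.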

In my experience, every RPA developer I have discussed Surface Automation with has agreed that the stability of the automation is a concern. “Don’t touch it or it will break!” is a common utterance, born of the fear that any small change would break the automation.


Surface Automation is required when automating applications over a remote desktop session since native application inspection techniques cannot be used. Complex Surface Automation challenges arise in even simple scenarios and this typically results in the following outcomes:

  1. The automation takes longer to develop compared with a local application automation due to the time required for a developer to overcome the challenges and test the solutions thoroughly.

  2. Unreliable automations which are sensitive to application layout changes, application upgrades, resolution changes, OS upgrades and any other minor unexpected changes.

  3. Lower performance and greater Virtual Worker capacity footprint compared to native application automation due to the large number of intermediate steps, pattern recognition, OCR, scrolling and mouse clicking required to locate the correct application elements.

  4. Automations which are much harder to support because you must unravel the complex sequences which take place to locate fields via Surface Automation and work out what change caused the automation to break (often a minor layout change in another part of the window).

IA-Connect solves the challenges of remote session application automation by allowing native application inspection methods to be used to interact with the remote application: the same familiar RPA methods which work so well for local application automation.

Written by Simon Bond, Lead Developer, Ultima Labs
