Tokenization of Confluence User Macros as a Vector of Meta-Analysis

Abstract

Dividing a Confluence user macro into its constituent pieces, and quantifying those pieces, allows for granular analysis and adjustment at a system-wide level. This approach to analysis also provides a means of extending Confluence's built-in macro usage feature to include additional metrics.

Overview

Confluence user macros are macros written by users or administrators of the system. They exist alongside the macros installed as part of Confluence itself, as well as macro functionality installed from the Atlassian Marketplace.

Allowing users to create macros is convenient, but it is also a potential source of technical debt, instability, and inconsistency in the Confluence system.

A method that extracts the structure of these macros en masse, and then breaks it down into individual components, creates an opportunity to apply bulk fixes to endemic macro-related problems.

Part I – User Macro Structures

User macros in Confluence Server or Confluence Data Center are written in Apache Velocity, often with embedded HTML and JavaScript. Velocity is a template language whose syntax resembles that of other templating languages. It lets macro authors write simple scripts that interact with the Confluence system on the server, and any JavaScript the macro outputs can in turn interact with the browser's Document Object Model (DOM). This allows for the creation of dynamic content as part of a macro.

The code of a user macro does not run directly on the page when a Confluence page is loaded. Instead, a page contains a reference to the macro as part of its content. The macro code lives on the back end, and Confluence calls that code each time the page loads.

This concept is foundational to any discussion of user macros, and it has two parts. First, the page stores a reference to the macro along with the user-supplied parameters. Second, the macro is executed each time the page loads; adding it to a page does not simply amend the page content once.

For clarity’s sake, here is an example. 

  1. User A develops a macro, called Add Heading.  
  2. When this macro is added to a page, it asks the user which text should be made into a heading. The text that the user supplies is a macro parameter, and it lives on the page along with a reference to the Add Heading macro.  
  3. Each time the page is loaded, the body of the page is parsed by Confluence. Remember, the reference to the macro and the macro parameter are stored within the page body.
  4. When Confluence detects a macro reference on the page, it passes the parameter value to the macro code on the back end, runs that code, and places the resulting output in the rendered page.
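Steps 3 and 4 can be sketched as a small simulation in plain Groovy, runnable outside Confluence. This is an illustration under assumptions, not how Confluence is implemented internally: the `macros` map and the `render` function are hypothetical stand-ins for the back-end macro code and the page renderer. What it shows is that the stored body keeps only the reference and the parameter, while the heading HTML is produced fresh on every call to `render`.

```groovy
// Hypothetical stand-in for the back-end macro code: Add Heading turns its
// "heading" parameter into an <h1> element.
def macros = [
    'add-heading': { Map params -> "<h1>${params.heading}</h1>".toString() }
]

// What the page actually stores: a reference to the macro plus its parameter
def storedBody = '<ac:structured-macro ac:name="add-heading" ac:schema-version="1" ac:macro-id="xxxxx">' +
        '<ac:parameter ac:name="heading">THIS IS A HEADING</ac:parameter>' +
        '</ac:structured-macro>'

// Simulated page load: find each macro reference, collect its parameters,
// run the matching macro code, and put the output in place of the reference.
String render(String body, Map<String, Closure> macros) {
    def macroPattern = ~/(?s)<ac:structured-macro ac:name="([^"]+)"[^>]*>(.*?)<\/ac:structured-macro>/
    def paramPattern = ~/(?s)<ac:parameter ac:name="([^"]+)">(.*?)<\/ac:parameter>/
    return body.replaceAll(macroPattern) { String all, String name, String inner ->
        Map params = [:]
        def paramMatcher = paramPattern.matcher(inner)
        while (paramMatcher.find()) {
            params[paramMatcher.group(1)] = paramMatcher.group(2)
        }
        Closure macro = macros[name]
        // Unknown macros are left as they are
        return macro ? macro(params) : all
    }
}

println render(storedBody, macros)  // prints <h1>THIS IS A HEADING</h1>
```

The stored body is never modified; calling `render` again simply repeats the work, which mirrors the per-load execution described above.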

This happens every time the page is loaded. If the page body is retrieved through the REST API in storage format, the rendered HTML is not present. Instead of something like <h1>THIS IS A HEADING</h1>, the body of the Confluence page contains a structure like this:

<ac:structured-macro ac:name="add-heading" ac:schema-version="1" ac:macro-id="xxxxx">
<ac:parameter ac:name="heading">THIS IS A HEADING</ac:parameter>
</ac:structured-macro>

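This is the stored structure of a user macro reference: the macro name and attributes on the opening tag, followed by one element per parameter. It is these components that will be tokenized.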
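To make those components explicit, a small helper can turn one stored reference into a map holding the macro name, the attributes of the opening tag, and the parameters. This is a minimal sketch rather than the method used in the scripts below: `tokenizeMacro` is a hypothetical name, and the regex approach assumes parameter values are plain text without nested markup.

```groovy
// Hypothetical helper: break one stored macro reference into named components.
// Assumes plain-text parameter values (no nested markup).
Map tokenizeMacro(String macroXml) {
    // The opening tag carries the macro's attributes (name, schema version, id)
    def attributes = [:]
    def openTag = (macroXml =~ /<ac:structured-macro([^>]*)>/)
    if (openTag.find()) {
        def attr = (openTag.group(1) =~ /([\w:-]+)="([^"]*)"/)
        while (attr.find()) {
            attributes[attr.group(1)] = attr.group(2)
        }
    }
    // Each ac:parameter element holds one user-supplied value
    def parameters = [:]
    def param = (macroXml =~ /(?s)<ac:parameter ac:name="([^"]+)">(.*?)<\/ac:parameter>/)
    while (param.find()) {
        parameters[param.group(1)] = param.group(2)
    }
    return [name: attributes['ac:name'], attributes: attributes, parameters: parameters]
}

def example = '''<ac:structured-macro ac:name="add-heading" ac:schema-version="1" ac:macro-id="xxxxx">
<ac:parameter ac:name="heading">THIS IS A HEADING</ac:parameter>
</ac:structured-macro>'''

def tokens = tokenizeMacro(example)
println tokens.parameters.heading
```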

Part II – Extracting and Tokenizing User Macros

Within the body of a Confluence page, user macro references are ultimately strings of text with a specific structure. This predictable structure makes it possible to use regular expressions (regex) to extract every macro reference from a page.
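As an illustration of that predictability, the sketch below scans a storage-format body and counts how often each macro is referenced. It is a sketch under assumptions: `countMacros` is a hypothetical helper, the sample body is invented, and the pattern assumes `ac:name` is the first attribute on the opening tag, as in the example above.

```groovy
// Hypothetical helper: count how many times each macro is referenced in one page body
Map<String, Integer> countMacros(String body) {
    // Captures the ac:name attribute of every structured macro, whatever it is called;
    // assumes ac:name is the first attribute on the opening tag
    def matcher = body =~ /<ac:structured-macro ac:name="([^"]+)"/
    Map<String, Integer> counts = [:]
    while (matcher.find()) {
        String name = matcher.group(1)
        counts[name] = (counts[name] ?: 0) + 1
    }
    return counts
}

// Invented sample body with two user macro instances and one built-in macro
def body = '''<p>Intro</p>
<ac:structured-macro ac:name="add-heading" ac:schema-version="1"><ac:parameter ac:name="heading">One</ac:parameter></ac:structured-macro>
<ac:structured-macro ac:name="info" ac:schema-version="1"><ac:rich-text-body><p>Note</p></ac:rich-text-body></ac:structured-macro>
<ac:structured-macro ac:name="add-heading" ac:schema-version="1"><ac:parameter ac:name="heading">Two</ac:parameter></ac:structured-macro>'''

println countMacros(body)
```

Built-in macros such as `info` use the same `ac:structured-macro` element, so separating user macros from the rest would require a list of user macro names to filter against.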

For the purposes of this research, the analysis was conducted using ScriptRunner for Confluence, which runs Groovy scripts with access to Confluence's Java API.

The code example below takes a macro name as input and interpolates it into a regex pattern. The script also takes a list of Confluence page IDs as input and iterates through them.

For each page, it reads the body content as a string and matches it against the regex pattern. For each match found, it tokenizes the macro by splitting it into its constituent components at each > character.

import com.atlassian.sal.api.component.ComponentLocator
import com.atlassian.confluence.pages.PageManager
import com.atlassian.confluence.pages.Page

// Initialize page manager
def pageManager = ComponentLocator.getComponent(PageManager)

// Macro configuration
def macroName = "macro-name"
// (?s) lets .*? match across line breaks inside the stored macro
def structuredMacroPattern = /(?s)<ac:structured-macro ac:name="${macroName}".*?<\/ac:structured-macro>/

// List of page IDs for testing
def pages = ["1234567", "506952234"]

// Process page objects
pages.each { pageID ->
    Page page = null

    try {
        page = pageManager.getPage(pageID as Long)
    } catch (Exception pageFetch) {
        log.warn("Error fetching page with ID ${pageID}: ${pageFetch.message}")
    }
    if (!page) {
        // Skip IDs that do not resolve to a page (return acts as "continue" inside each)
        return
    }
    try {
        // getBodyContent() returns a BodyContent object, not a string;
        // getBody() yields the storage-format markup as a string
        def bodyContent = page.getBodyContent().getBody()

        // Collect every occurrence of the macro in the page body
        def matches = bodyContent.findAll(structuredMacroPattern)

        if (matches) {
            matches.each { String value ->
                // Split the macro into its constituent pieces at each '>'
                def tokenizedValues = value.split('>')

                // Note the index position of each component
                tokenizedValues.eachWithIndex { tokenValue, tokenIndexPos ->
                    log.warn("${tokenValue} is at position ${tokenIndexPos}")
                }

                // Here's an example of how you'd work with a specific component
                log.warn("${tokenizedValues[0]} is at position 0")

                // Here's one way to remove extraneous information: splitting position 0
                // on '"' isolates the value of the ac:name attribute
                def nameToken = tokenizedValues[0].split('"')[1]
                log.warn("${nameToken} is the macro name")

                // Here's another: for a macro shaped like the add-heading example, position 2
                // is the first parameter value plus its closing tag, so split on '<' to drop the tag
                if (tokenizedValues.size() > 2) {
                    def firstValue = tokenizedValues[2].split('<')[0]
                    log.warn("${firstValue} is the first parameter value")
                }
            }
        } else {
            log.warn("No ${macroName} macro found on page '${page.title}'")
        }
    } catch (Exception e) {
        log.warn("Encountered an error processing macro ${macroName} on page '${page.title}': ${e.message}", e)
    }
}
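To see what the `split('>')` step produces, the same split can be run on the add-heading example from Part I outside Confluence. This sketch assumes the macro is stored on a single line; line breaks between tags would simply remain inside the tokens.

```groovy
// The add-heading example from Part I, held on one line as it might appear in a page body
def value = '<ac:structured-macro ac:name="add-heading" ac:schema-version="1" ac:macro-id="xxxxx">' +
        '<ac:parameter ac:name="heading">THIS IS A HEADING</ac:parameter>' +
        '</ac:structured-macro>'

// Same split as the script: each token ends where a '>' used to be
def tokenizedValues = value.split('>')

tokenizedValues.eachWithIndex { token, i ->
    println "${i}: ${token}"
}

// Position 0 still carries attribute quoting; splitting on '"' exposes the macro name
def macroName = tokenizedValues[0].split('"')[1]
// Position 2 is the parameter value plus its closing tag; splitting on '<' drops the tag
def headingValue = tokenizedValues[2].split('<')[0]
```

Splitting position 0 on `<` yields an empty string, because that token begins with `<`. That is why the first parameter value has to be read from position 2 rather than position 0.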

 

Part III – Methods of Analysis

Once the macro has been tokenized, analysis of its contents is a matter of quantifying certain aspects of it, or of qualifying the presence of the macro across a given number of pages.
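Both kinds of measure can be sketched with Groovy's collection methods. The input here is hypothetical: a map from page title to the parameter values that tokenizing found on that page. The counts reflect only that sample data.

```groovy
// Hypothetical input: page title -> values of one macro parameter found on that page,
// as the tokenizing step would produce them
def valuesByPage = [
    'Project Alpha': ['Project is in progress'],
    'Project Beta' : ['Project is complete'],
    'Project Gamma': ['Project is in progress'],
    'Team Home'    : []
]

// Qualifying presence: how many pages carry at least one instance of the macro
int pagesWithMacro = valuesByPage.count { title, values -> !values.isEmpty() }

// Quantifying: how often each parameter value occurs across all instances
Map valueCounts = valuesByPage.values().flatten().countBy { it }

println "${pagesWithMacro} pages use the macro; value frequencies: ${valueCounts}"
```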

Below is a code example that extends the capabilities of the script in two ways. First, it examines only those pages that are associated with the macro in question. This is accomplished by querying the REST API with a targeted CQL query (macro = "macro-name") that returns all pages on which the macro is embedded.

Second, it reports which pages contain an instance of the macro with a specific parameter value, giving insight into the way in which the macro is being used.
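Reading a value by token position depends on the macro's exact layout. An alternative, swapped in here rather than taken from the script, is to look a parameter up by its `ac:name`. In this sketch, `parameterValues`, the `status-banner` macro and its `state` parameter are hypothetical; the target value matches the one used in the script. A non-greedy regex will misread a macro nested inside another instance of the same macro, so treat this as a starting point.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: the value of one named parameter for every instance of a macro
// in a storage-format body. Assumes plain-text parameter values and no self-nesting.
List<String> parameterValues(String body, String macroName, String paramName) {
    // Pattern.quote guards against regex metacharacters in the names
    Pattern macroPattern = Pattern.compile(
        '(?s)<ac:structured-macro ac:name="' + Pattern.quote(macroName) + '"[^>]*>(.*?)</ac:structured-macro>')
    Pattern paramPattern = Pattern.compile(
        '(?s)<ac:parameter ac:name="' + Pattern.quote(paramName) + '">(.*?)</ac:parameter>')
    List<String> values = []
    def macroMatcher = macroPattern.matcher(body)
    while (macroMatcher.find()) {
        def paramMatcher = paramPattern.matcher(macroMatcher.group(1))
        if (paramMatcher.find()) {
            values << paramMatcher.group(1)
        }
    }
    return values
}

// Hypothetical pages: title -> storage-format body
def pages = [
    'Project Alpha': '<ac:structured-macro ac:name="status-banner" ac:schema-version="1"><ac:parameter ac:name="state">Project is in progress</ac:parameter></ac:structured-macro>',
    'Project Beta' : '<ac:structured-macro ac:name="status-banner" ac:schema-version="1"><ac:parameter ac:name="state">Project is complete</ac:parameter></ac:structured-macro>'
]

def target = 'Project is in progress'
def matchingPages = pages.findAll { title, body -> target in parameterValues(body, 'status-banner', 'state') }
def report = matchingPages.collect { title, body -> "${title} has the target macro containing the specified value".toString() }
println report
```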

import java.net.HttpURLConnection
import java.net.URL
import groovy.json.JsonSlurper
import com.atlassian.sal.api.component.ComponentLocator
import com.atlassian.confluence.pages.PageManager
import com.atlassian.confluence.pages.Page

// Initialize page manager
def pageManager = ComponentLocator.getComponent(PageManager)

// Macro configuration
def macroName = "macro-name"
// (?s) lets .*? match across line breaks inside the stored macro
def structuredMacroPattern = /(?s)<ac:structured-macro ac:name="${macroName}".*?<\/ac:structured-macro>/
def targetValue = "Project is in progress"

// API configuration
// The CQL is macro = "macro-name", URL-encoded; use the start parameter to page through larger result sets
def apiUrl = "https://<confluence-url>/rest/api/content/search?cql=macro+%3D+%22${macroName}%22&limit=1000"
def maxItems = 1  // Set to a low number for testing, higher for batch processing
def jsonResponse = null

// Fetch macro usage info using REST API
try {
    URL url = new URL(apiUrl)
    HttpURLConnection connection = (HttpURLConnection) url.openConnection()
    connection.setRequestMethod("GET")
    connection.setRequestProperty("Accept", "application/json")
    // Add authentication here; anonymous requests only see pages visible to anonymous users

    int responseCode = connection.getResponseCode()
    if (responseCode == HttpURLConnection.HTTP_OK) {
        def response = connection.inputStream.text
        jsonResponse = new JsonSlurper().parseText(response)
    } else {
        log.error("Error calling REST API. Response Code: $responseCode")
    }
    connection.disconnect()
} catch (Exception e) {
    log.error("An error occurred: ${e.message}", e)
}

if (!jsonResponse) {
    return "No search results; check the API URL and credentials"
}

log.warn("Total number of pages with this macro: ${jsonResponse.results.size()}")

// To process every page returned, uncomment the next line
// maxItems = jsonResponse.results.size()

// Get page objects
def pages = jsonResponse.results.take(maxItems)

def matchResults = []

// Process page objects
pages.each { pageResult ->
    Page page = null

    try {
        // Search results carry the page ID as a string
        page = pageManager.getPage(pageResult.id as Long)
    } catch (Exception pageFetch) {
        log.warn("Error fetching page with ID ${pageResult.id}: ${pageFetch.message}")
    }
    if (!page) {
        // Skip results that do not resolve to a page
        return
    }
    try {
        // getBodyContent() returns a BodyContent object; getBody() yields its storage-format string
        def bodyContent = page.getBodyContent().getBody()

        // Collect every occurrence of the macro in the page body
        def matches = bodyContent.findAll(structuredMacroPattern)

        if (matches) {
            matches.each { String value ->
                // Split the macro into its constituent pieces
                def tokenizedValues = value.split('>')

                // Note the index position of each component
                tokenizedValues.eachWithIndex { tokenValue, tokenIndexPos ->
                    log.warn("${tokenValue} is at position ${tokenIndexPos}")
                }

                // For a macro shaped like the add-heading example, position 2 holds the first
                // parameter value plus its closing tag; split on '<' to drop the tag.
                // Check the logged positions and adjust the index for other macros.
                if (tokenizedValues.size() > 2 && tokenizedValues[2].split('<')[0] == targetValue) {
                    matchResults.add("${page.title} has the target macro containing the specified value".toString())
                }
            }
        } else {
            log.warn("No ${macroName} macro found on page '${page.title}'")
        }
    } catch (Exception e) {
        log.warn("Encountered an error processing macro ${macroName} on page '${pageResult.title}': ${e.message}", e)
    }
}

// A page with several matching instances is reported once
matchResults.unique()
log.warn("There are ${matchResults.size()} pages with the macro and specified value")
return matchResults
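The script above requests a single batch of search results. If the macro is embedded on more pages than one response returns, the `start` and `limit` parameters of the content search endpoint can be used to page through the rest. The sketch below assumes that stopping on an empty batch is an acceptable end condition; `buildSearchUrl`, `collectAllResults` and the `fetchPage` closure are hypothetical, and in practice `fetchPage` would wrap the authenticated HTTP call from the script.

```groovy
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical helper: build a content search URL for every page that embeds a macro.
// baseUrl is a placeholder for the Confluence base URL; start/limit control paging.
String buildSearchUrl(String baseUrl, String macroName, int start, int limit) {
    String cql = "macro = \"${macroName}\""
    String encoded = URLEncoder.encode(cql, StandardCharsets.UTF_8.name())
    return "${baseUrl}/rest/api/content/search?cql=${encoded}&start=${start}&limit=${limit}"
}

// Hypothetical paging loop: fetchPage(start, limit) stands in for the HTTP call and
// returns the parsed JSON as a Map with a "results" list.
List collectAllResults(Closure<Map> fetchPage, int limit) {
    List all = []
    int start = 0
    while (true) {
        Map json = fetchPage(start, limit)
        List results = json.results ?: []
        if (results.isEmpty()) {
            break
        }
        all.addAll(results)
        // Advance by what was actually returned, in case the server caps the limit
        start += results.size()
    }
    return all
}

println buildSearchUrl('https://confluence.example.com', 'macro-name', 0, 100)
```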

 

Part IV – Conclusion

Any value that can be quantified or qualified is a viable target for analysis. This tool extends the built-in capabilities of Confluence's macro usage feature, and allows for a granular examination of the scope and depth of macro usage in a Confluence system.

Additionally, this tool is the basis for an approach to migrating Confluence from Server or Data Center to Cloud. By extracting specific values from a macro and replacing the macro with those values as static text, a system may be migrated in a way that mitigates the absence of a particular macro in Confluence Cloud.
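The flattening step can be sketched as a regex replacement over the storage-format body. This is a minimal sketch, not a migration tool: `flattenMacro` is a hypothetical helper, the add-heading-to-`<h1>` mapping is only an example, and saving the result back to the page is not shown. Instances without the named parameter are left untouched.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical helper: replace every instance of a macro with static storage-format markup
// built from one of its parameters, so the page no longer depends on the macro.
String flattenMacro(String body, String macroName, String paramName, Closure<String> toStatic) {
    Pattern macroPattern = Pattern.compile(
        '(?s)<ac:structured-macro ac:name="' + Pattern.quote(macroName) + '"[^>]*>(.*?)</ac:structured-macro>')
    Pattern paramPattern = Pattern.compile(
        '(?s)<ac:parameter ac:name="' + Pattern.quote(paramName) + '">(.*?)</ac:parameter>')
    Matcher macroMatcher = macroPattern.matcher(body)
    StringBuffer result = new StringBuffer()
    while (macroMatcher.find()) {
        Matcher paramMatcher = paramPattern.matcher(macroMatcher.group(1))
        // Leave the macro untouched if the parameter is missing
        String replacement = paramMatcher.find() ? toStatic(paramMatcher.group(1)) : macroMatcher.group(0)
        // quoteReplacement stops '$' or '\' in the value being read as group references
        macroMatcher.appendReplacement(result, Matcher.quoteReplacement(replacement))
    }
    macroMatcher.appendTail(result)
    return result.toString()
}

def before = '<p>Intro</p><ac:structured-macro ac:name="add-heading" ac:schema-version="1" ac:macro-id="xxxxx">' +
        '<ac:parameter ac:name="heading">THIS IS A HEADING</ac:parameter></ac:structured-macro>'

def after = flattenMacro(before, 'add-heading', 'heading') { String value -> "<h1>${value}</h1>".toString() }
println after
```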

This approach, particularly when combined with targeted data from the REST API, provides a foundation for the precise analysis, quantification, and removal or replacement of Confluence macros.

 
