How to globally search and modify content
Description
Often the need arises to find pages that have certain content and modify it slightly. If the number of pages is large, making the modifications manually can be painful and error-prone. Automation can help, but you have to make sure that you are modifying exactly what you want and nothing else. See also this user question for a discussion on this topic.
Updating storage format
Always be careful updating storage format data and make sure you test before doing mass updates.
Go to How to Find Confluence Pages and Then Do Something for techniques for finding content.
Links
- How to Find Confluence Pages and Then Do Something
- How to Use SQL to Find Pages
- How to make global changes to comments
- Advanced Searching With CQL
Example 1: Changing a URL
Changing a link. For instance: http://myjira.com to http://mynewjira.com.
Steps
Set up an example page
--action storePage --space xxx --title at --parent @home --content "aaa http://myjira.com bbb"
Construct a modifyPage action for a single page using a simple text replace. Because : (colon) appears in the text and is the default key:value separator for the CLI, use # as the separator instead by setting the special parameter (spaces are significant!)
--action modifyPage --space xxx --title at --findReplace "http://myjira.com#http://mynewjira.com" --special " #"
Run against all pages with the link (using unix style escaping - see Tips). Run against your test space first before using @all.
--action runFromContentList --cql "type=page" --common "--action modifyPage --id @pageId@ --findReplace \"http://myjira.com#http://mynewjira.com\" --special \" #\" "
Results
Run: --action modifyPage --id 112197680 --findReplace "http://myjira.com#http://mynewjira.com" --special " #"
Page modified: 'at' in space: xxx. Page has id: 112197680
...
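Before running the global change, the simple text replace from this example can be previewed offline. The following is a minimal Python sketch, not the CLI's implementation: parse_find_replace and preview are hypothetical helper names, and the sketch assumes that findReplace performs a plain substring replacement with # separating the find text from the replacement. The second call uses a made-up host name to show how a literal match can also hit text you did not intend to change.

```python
def parse_find_replace(param, separator="#"):
    """Split a 'find<separator>replace' pair into its two halves."""
    find, sep, replacement = param.partition(separator)
    if not sep or not find:
        raise ValueError(f"expected 'find{separator}replace', got {param!r}")
    return find, replacement


def preview(content, param):
    """Return the modified content and the number of literal matches."""
    find, replacement = parse_find_replace(param)
    return content.replace(find, replacement), content.count(find)


pair = "http://myjira.com#http://mynewjira.com"

# The sample page content from the setup step.
print(preview("aaa http://myjira.com bbb", pair))

# A literal match is a substring match, so a longer host name
# (hypothetical here) that starts with the same text is modified too.
print(preview("see http://myjira.company.example/browse/X-1", pair))
```

If the second kind of match is possible on your site, a more specific find string (for example, one that includes the trailing slash or path) is likely the safer choice.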
Example 2: Renaming a macro
There are a few reasons why you may need to rename a macro:
- A user macro may need a better name or may conflict with a macro from a new plugin
- A macro may no longer be valid because you have discontinued a plugin or removed a user macro, but you want to save the body of the macro
Steps
- Determine the kind of body the existing macro has
- Plain text like the noformat macro
- Rich text like the panel macro
- Make sure the target macro has the same body type
- Use the techniques of Example 1
Use something similar to the following findReplace string
One rich text macro:
--findReplace "ac:name=~table-plus~#ac:name=~panel~" --special " # ~"
One rich text macro and one plain text macro:
--findReplace "ac:name=~table-plus~#ac:name=~panel~,ac:name=~csv~#ac:name=~noformat~" --special " # ~"
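To see why the target macro must have the same body type, it can help to apply the rename to a small piece of storage format by hand. The sketch below is illustrative only: SAMPLE is a made-up storage fragment, decode_pairs and rename_macros are hypothetical helpers, and the decoding rules (comma between pairs, # between find and replacement, ~ standing in for a double quote) are an approximation of what the findReplace and special parameters above describe, not the CLI's own code.

```python
# Made-up storage format fragment with one rich text and one plain text macro.
SAMPLE = (
    '<ac:structured-macro ac:name="table-plus">'
    "<ac:rich-text-body><p>kept</p></ac:rich-text-body>"
    "</ac:structured-macro>"
    '<ac:structured-macro ac:name="csv">'
    "<ac:plain-text-body><![CDATA[a,b]]></ac:plain-text-body>"
    "</ac:structured-macro>"
)


def decode_pairs(param, quote_alt="~", separator="#"):
    """Turn 'find#replace,find#replace' into a list of (find, replace) tuples."""
    pairs = []
    for item in param.split(","):
        find, _, replacement = item.partition(separator)
        pairs.append(
            (find.replace(quote_alt, '"'), replacement.replace(quote_alt, '"'))
        )
    return pairs


def rename_macros(storage, param):
    """Apply each literal find/replace pair in order."""
    for find, replacement in decode_pairs(param):
        storage = storage.replace(find, replacement)
    return storage


param = "ac:name=~table-plus~#ac:name=~panel~,ac:name=~csv~#ac:name=~noformat~"
result = rename_macros(SAMPLE, param)
print(result)
```

Only the ac:name attribute changes. The ac:rich-text-body and ac:plain-text-body elements are left exactly as they were, so the new macro has to accept the body type it inherits.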
Finding pages with macros
We recommend using CQL to find pages that use a macro. For example, use a CQL expression like macro = noformat. See Advanced Searching With CQL.
Example 3: Changing XHTML content
This site has many pages created via automation, often by a Bamboo build process. Confluence has an annoying feature that converts simple wiki text like @entry@ into an XHTML reference to a template variable. There doesn't seem to be a way to escape this behavior. The result is a number of pages with incorrect data. This example shows how this was corrected.
Steps
- Identify the problem - usually, someone notices a problem on a specific page when viewing the page
- Set up an example page for testing
- Find all content with the problem. You need to have a unique way to identify the text that is in error without finding extraneous text. This can be difficult in some cases.
- Look at the storage view of the page with the problem
- In this example: <at:var at:name="entry" /> should have been @entry@
- In the UI, search for something like: "<at:var" and verify that the pages found represent the problem you are trying to solve
Construct the same search using the CLI and verify results. Suggest you start with a single space first before doing @all. Note you need to escape the double quote using your system specific escaping syntax - see Tips
--action getPageList --space @all
Construct a find/replace string. In this case, the text between the @ signs could be anything, so we will use findReplaceRegex instead of simple text replacement that findReplace does. You need to know a little bit of regex syntax including a find group (in this example: any number of alphabetic characters) and the replacement text referencing the find group: @$1@. Test your regex first. How to use regular expressions has some pointers.
<at:var at:name="([a-zA-Z]*)" /> needs to be replaced with @$1@
- Now construct the CLI command to make a single page fix.
- Yuk! XHTML has a bunch of characters that are considered special characters for the CLI, regex processing, and/or command line processing. Tips has some rules for escaping some of the CLI related areas, but it is still a pain, especially with the differences between Windows and non-Windows systems.
Construct the parameters for find and replace. Use ~ instead of " so we can avoid escaping double quote. Use # as the separator between key and value instead of the default : (colon). For unix based command lines, the $ must be escaped.
--findReplaceRegex "<at:var at:name=~([a-zA-Z]*)~ />#@\$1@" --special " # ~"
Construct the CLI command for a single action and test on a single page
--action modifyPage --space xxx --title "my title" --findReplaceRegex "<at:var at:name=~([a-zA-Z]*)~ />#@\$1@" --special " # ~"
- Global change
Put everything together using runFromPageList (this example uses Unix-based escaping)
--action runFromPageList --space @all --common "--action modifyPage --id @pageId@ --findReplaceRegex \"<at:var at:name=~([a-zA-Z]*)~ />#@\$1@\" --special \" # ~\" "
Result
Run: --action modifyPage --id 28901397 --findReplaceRegex "<at:var at:name=~([a-zA-Z]*)~ />#@$1@" --special " # ~"
Page modified: '3.0.0 - Documentation' in space: BCLI. Page has id: 28901397
Run: --action modifyPage --id 28901413 --findReplaceRegex "<at:var at:name=~([a-zA-Z]*)~ />#@$1@" --special " # ~"
Page modified: '3.0.0 - Documentation' in space: CSOAP. Page has id: 28901413
...
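The regex from this example can be tested outside the CLI before any page is touched. The sketch below is one way to do that in Python and is not how the CLI itself works: fix and encode_param are hypothetical helpers, the sample markup is made up, and Python writes the group reference as \1 where the CLI parameter uses $1. The second helper rebuilds the findReplaceRegex value using the conventions described in the steps above (~ for double quote, # as separator, \$ for Unix command lines).

```python
import re

FIND = r'<at:var at:name="([a-zA-Z]*)" />'
REPLACEMENT_CLI = "@$1@"    # group reference as written for the CLI
REPLACEMENT_PY = r"@\1@"    # the same reference in Python's re syntax


def fix(storage):
    """Replace each template variable reference with the @name@ text."""
    return re.sub(FIND, REPLACEMENT_PY, storage)


def encode_param(find, replacement, unix_shell=True):
    """Build the parameter value: ~ for double quote, # as the separator."""
    for ch in "~#":
        if ch in find or ch in replacement:
            raise ValueError(f"{ch!r} already occurs in the text; pick another character")
    value = (find + "#" + replacement).replace('"', "~")
    if unix_shell:
        # Inside double quotes a Unix shell would expand $1, so escape it.
        value = value.replace("$", "\\$")
    return value


# Made-up sample: the second name contains a digit, so [a-zA-Z]* will not match it.
sample = '<p>Use <at:var at:name="entry" /> and <at:var at:name="x1" /></p>'
print(fix(sample))
print(encode_param(FIND, REPLACEMENT_CLI))
```

The unmatched second variable shows the kind of gap worth checking for: if your variable names can contain digits or underscores, the character class in the find group needs to be widened.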
It is very important to get the quoting right!
That means using the correct escaping for your command line client so your command line parses correctly. For embedded double quotes, that means \" for Unix-like environments and "" for other environments. Once the command line gets through to the CLI, the CLI parser takes over and enforces the CLI parsing rules. The CLI can then apply special parameter substitutions.
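One way to check the first layer of quoting on a Unix-like system is to let Python's shlex module split the command line the way a POSIX shell roughly would. This is only an approximation: shlex does not model variable expansion (so the \$ form from Example 3 is left out here), it says nothing about Windows command processors, and it does not reproduce the CLI's own parser. The command is the global change from Example 1.

```python
import shlex

# The command line exactly as typed, including the \" escapes.
typed = (
    r'--action runFromContentList --cql "type=page" '
    r'--common "--action modifyPage --id @pageId@ '
    r'--findReplace \"http://myjira.com#http://mynewjira.com\" '
    r'--special \" #\" "'
)

# Split into the arguments the shell would hand to the CLI.
args = shlex.split(typed)
common = args[args.index("--common") + 1]

print(len(args))
print(common)
```

If the quoting is right, the whole value of --common arrives as a single argument with plain double quotes inside it, which is what the CLI parser then works on. An unexpected argument count is usually the first sign of a missing or unescaped quote.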
A debugging technique for findReplace and findReplaceRegex parameters is to use --options debugFindReplace to make sure you are really looking for the right thing in the source. Also, I recommend using --simulate on run type actions and/or --limit 1 while debugging.