This article explains how to extract the URLs (links) from Confluence pages using the Confluence Command Line Interface (CLI) app.
Instructions
To write all the URLs from Confluence pages to a file, use the runFromPageList and getSource actions with the parameters described below.
Run the following command to get all links from the pages in all spaces of a Confluence instance:
--action runFromPageList --space "@all" --clearFileBeforeAppend --common "--action getSource --id @pageId@ --special "" #"" --append --file spaceLinkList.txt" --input "--findReplaceRegex ""(?s).*?href=.(http[s]{0,1}://[./a-z]+)#$1,"" --findReplaceRegex ""(?s)(((http[s]{0,1}://[./a-z]+),)*).*#$1"" --findReplaceRegex "",#' \n '"" "
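To see what the three --findReplaceRegex steps in this command do to each page's source, the following Python sketch applies the same patterns to a made-up fragment of page source. It is only a model, not the CLI's implementation: Python's re.sub stands in for the CLI's regex handling, $1 is written as \1, the third step is assumed to turn each separating comma into a line break, and the names extract_links and sample are hypothetical.

```python
import re

# The first two --findReplaceRegex patterns from the command above,
# rewritten for Python (the replacement $1 becomes \1).
STEP1_FIND, STEP1_REPLACE = r"(?s).*?href=.(http[s]{0,1}://[./a-z]+)", r"\1,"
STEP2_FIND, STEP2_REPLACE = r"(?s)(((http[s]{0,1}://[./a-z]+),)*).*", r"\1"


def extract_links(page_source: str) -> list[str]:
    # Step 1: replace everything up to and including each href=" with the URL plus a comma.
    text = re.sub(STEP1_FIND, STEP1_REPLACE, page_source)
    # Step 2: keep the leading run of "URL," items and drop the leftover page text.
    text = re.sub(STEP2_FIND, STEP2_REPLACE, text)
    # Step 3 (assumed behavior): put each URL on its own line.
    text = text.replace(",", "\n")
    return [line for line in text.splitlines() if line]


# Hypothetical page source, used only for illustration.
sample = (
    '<p>See <a href="https://example.com/docs">docs</a> and '
    '<a href="http://test.org/a">this</a>.</p>'
)

for url in extract_links(sample):
    print(url)
```

With this sample, the sketch prints https://example.com/docs and http://test.org/a on separate lines; a page with no matching href produces no output.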
Run the following command to get all the links from a single page in a space:
--action getSource --space "ZCLI" --title "PAGE2" --special " #" --findReplaceRegex "(?s).*?href=\"(http[s]{0,1}://[./a-z]+)#\$1 \n"
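The single-page command uses only one find-and-replace pattern. The Python sketch below models it on a hypothetical page, assuming the pattern is applied to every match the way Python's re.sub does and treating the replacement "$1 \n" as the URL followed by a line break. Under that assumption, each link ends up on its own line, but any text after the last link is left untouched, which is the remainder the second pattern in the all-spaces command strips away.

```python
import re

# The single --findReplaceRegex pattern from the command above, in Python syntax.
# The replacement "$1 \n" is modeled as the URL followed by a line break (assumption).
FIND = r'(?s).*?href="(http[s]{0,1}://[./a-z]+)'
REPLACE = r"\1\n"

# Hypothetical source of a single page, used only for illustration.
page = '<p><a href="https://example.com/one">1</a> <a href="https://example.com/two">2</a></p>'

result = re.sub(FIND, REPLACE, page)
print(result)
```

Here the first two output lines are the two URLs, and the last line still holds the page text that followed the second link.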
Output from the above commands:
The parameters used in the actions are:
- --space: Space key. In this case, @all represents all spaces in the instance.
- --clearFileBeforeAppend: Automatically clears an existing output file on the first append request.
- --id: Page ID. In this case, @pageId@ is replaced with the ID of each page returned by runFromPageList, so getSource runs once for every page.
- --file: Path and name of the file the output is written to.
- --findReplaceRegex: A regular expression find and replace applied to the page source to extract the links. In these commands, # separates the pattern from its replacement.
- --special: Characters used for specialized processing of certain parameters.
- --append: Appends the output to the existing file instead of overwriting it.
Notes
- The above action is available in CLI version 9.3 and later.
- Make sure the regex matches the format of the links on your pages. For example, [./a-z]+ matches only lowercase letters, dots, and slashes, so URLs containing uppercase letters, digits, hyphens, or query strings are truncated or skipped.
- The commands above use Windows command-line quoting. On Linux-based machines, adjust the quoting and escaping accordingly.
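To check whether a pattern fits your links before running it across a whole instance, you can try it on a sample of page source first. The Python sketch below compares the character class used in the commands above with a hypothetical broader pattern that takes everything up to the closing quote of the href attribute. For brevity it uses re.findall instead of the CLI's find-and-replace chain, and the broader pattern is only a starting point that has not been tested with the CLI; its quoting would need adapting for the command line.

```python
import re

# Character class used in the commands above: lowercase letters, dots and slashes only.
ORIGINAL = r'href="(http[s]{0,1}://[./a-z]+)'
# Hypothetical broader pattern: anything up to the closing quote of the attribute.
BROADER = r'href="(https?://[^"]+)'

# Hypothetical page source with mixed-case paths, digits, hyphens and a query string.
page = (
    '<a href="https://example.com/Docs/page-2?id=7">a</a> '
    '<a href="https://Example.com/">b</a>'
)

print(re.findall(ORIGINAL, page))
print(re.findall(BROADER, page))
```

The original class stops the first URL at the uppercase D (yielding https://example.com/) and misses the second URL entirely, while the broader pattern returns both full URLs.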