How to get all the URLs from pages of Confluence

This article explains how to extract, in bulk, all the URLs that appear on Confluence pages using the Confluence Command Line Interface (CLI) app.

Instructions

To write all the URLs from all Confluence pages to a file, use the runFromPageList and getSource actions.

The parameters used in the actions are:

  • space: Name of the space. In this case, @all represents all spaces in the Confluence instance.
  • clearFileBeforeAppend: Clears an existing output file automatically on the first append request.
  • id: ID of a page. In this case, @pageId@ is replaced with the ID of each page found in the selected spaces.
  • file: Path and name of the file that receives the output.
  • findReplaceRegex: The regex find-and-replace pairs used to match the links and keep only the URLs.
  • special: Characters used for specialized processing of some specific parameters. In the commands below, # separates the pattern from its replacement in findReplaceRegex.
  • append: Appends the output to the existing file instead of overwriting it.

Run the following command to get all links from all pages in all spaces of a Confluence instance:

--action runFromPageList --space "@all" --clearFileBeforeAppend --common "--action getSource --id @pageId@ --special "" #"" --append --file spaceLinkList.txt" --input "--findReplaceRegex ""(?s).*?href=.(http[s]{0,1}://[./a-z]+)#$1,"" --findReplaceRegex ""(?s)(((http[s]{0,1}://[./a-z]+),)*).*#$1"" --findReplaceRegex "",#' \n '"" "
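
To make the three findReplaceRegex passes in the command above easier to follow, here is a minimal Python sketch of what they appear to do to the source of a single page. It is not part of the CLI: the function name extract_links and the sample HTML are made up for illustration, and Python's re module stands in for the CLI's regex engine, so edge-case behavior may differ.

```python
import re

# Same URL pattern as in the command: lowercase letters, dots, and slashes only.
URL = r"http[s]{0,1}://[./a-z]+"


def extract_links(source: str) -> list[str]:
    """Mimic the three find/replace passes on one page's source (illustrative only)."""
    # Pass 1: replace everything up to and including each href URL with "URL,".
    step1 = re.sub(r"(?s).*?href=.(" + URL + r")", r"\1,", source)
    # Pass 2: keep the leading run of "URL," items and drop whatever follows it.
    step2 = re.sub(r"(?s)(((" + URL + r"),)*).*", r"\1", step1, count=1)
    # Pass 3: the command turns each comma into a line break; here we split instead.
    return [url for url in step2.split(",") if url]


if __name__ == "__main__":
    # Hypothetical page source, used only to exercise the function.
    sample = (
        '<p><a href="https://example.com/docs">Docs</a> and '
        '<a href="http://foo.org/a/b">x</a> tail</p>'
    )
    for link in extract_links(sample):
        print(link)
```

The second pass matters because the first pass leaves the text after the last link untouched; stripping that tail is what reduces the output to a clean list of URLs.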


Run the following command to get all the links from a single page of a space:

--action getSource --space "ZCLI" --title "PAGE2" --special " #" --findReplaceRegex "(?s).*?href=\"(http[s]{0,1}://[./a-z]+)#\$1 \n"


Notes on the above commands:

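  • Make sure the regex pattern matches the format of the links on your pages. The character class [./a-z] matches only lowercase letters, dots, and slashes, so a URL that contains digits, uppercase letters, hyphens, or a query string is cut short at the first such character.
  • The above commands are written for a Windows machine. Update the quoting accordingly on Linux-based machines.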
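
One way to check whether the pattern fits your links is to compare it against a broader one on a sample of page source before running the bulk command. The sketch below is an assumption-laden illustration, not something the CLI provides: the BROAD pattern and the function name truncated_links are hypothetical, and the sample HTML is invented.

```python
import re

# URL pattern taken from the commands in this article.
NARROW_URL = re.compile(r"http[s]{0,1}://[./a-z]+")
# Hypothetical broader pattern: capture up to the closing quote, whitespace, or angle bracket.
BROAD = re.compile(r"""href=.(https?://[^"'\s<>]+)""")


def truncated_links(source: str) -> list[tuple[str, str]]:
    """Return (narrow, full) pairs for links that the narrow pattern would cut short."""
    pairs = []
    for full in BROAD.findall(source):
        match = NARROW_URL.match(full)
        narrow = match.group(0) if match else ""
        if narrow != full:
            pairs.append((narrow, full))
    return pairs


if __name__ == "__main__":
    # Invented sample: the second link contains a hyphen, a digit, and a query string.
    sample = (
        '<a href="https://example.com/docs">a</a>'
        '<a href="https://example.com/page-2?x=1">b</a>'
    )
    for narrow, full in truncated_links(sample):
        print(f"{full} would be captured as {narrow}")
```

If this reports any pairs for your pages, the character class in findReplaceRegex likely needs widening before the results can be trusted.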