Dan Allen, Dec 8, 2017

Referencing Pages

You’d think an intuitive system for creating links between pages would be a central feature of every site generator. After all, that’s kind of what the web is all about. Sadly, it’s often an afterthought.

The current mechanisms tend to distract from writing. Each time you want to make a link to another page, you have to pause to compute a relative path and/or transform the path to a published URL in your head. If you move the page you’re editing, it may cause the relative path to shift, which means you have to go back and recompute those references.

It’s no wonder writers create broken links at an alarming rate or spend excessive amounts of time fixing broken links left behind by other writers. But we can’t fault the writers when it’s the system itself that’s broken. Let’s figure out a solution.

Page referencing wishlist

We need to go back to the drawing board and design a referencing system that’s humanistic, grounded in how the source files are organized in the project structure (rather than relative to other files), and decoupled from where the documentation site is published.

As writers, here’s our wishlist for a page referencing system:

Reference a page without needing to know its URL.
Reference a page without having to keep track of where the target page is in relation to the current page. Just say no to the ../../ escape hatch!
Minimize updates to references when creating a new version of a component or when moving or renaming a file.
Get autocomplete help for references in the text editor / IDE.
Check and validate references automatically and reliably.
Publish a page to multiple domains without having to customize the references for each domain.
Eliminate the need to fix references when changing the site’s URL strategy.

What we’re looking for is a readable, reliable, and maintainable system for creating references between pages, regardless of where the pages are published.

A reference to a page should really work like a mobile phone number. When you call someone’s mobile number, as long as they have their phone on them, your call will find that person. It doesn’t matter where they are located physically. They could be at the grocery store, their mom’s house, or driving down the road. Wherever. It’s the phone you’re calling.

Isn’t that like a page’s URL?

The Problem with URLs

On the web, each page has a URL. At first glance, that seems like an easy way to reference a page. But URLs in documentation are surprisingly unstable. That’s because they’re dependent on the publishing environment. Consider the case of the same page being published to a staging URL, a protected beta release URL, and a public production URL.

This would be like your friend’s mobile phone number changing whenever they left their home. You’d have to know that your friend was at the grocery store at the moment you called them, and you’d need to know their grocery-store-mobile-number.

URLs are also unreliable as references because they often change when your organization reorganizes its site structure, renames a product, or decides to make URLs prettier by dropping the .html extension.

So a URL isn’t a unique or stable reference to a documentation page. It just locates one of many possible and ephemeral published instances of it. What we need is a way to reference the source of a page that ultimately links to that page’s destination, protected from the vagaries of publishing environments and URL strategies. Like a mobile phone number, we want to get to the source.

Page IDs to the rescue

In order to reference a page, that page needs a unique and stable identifier. That identifier doesn’t change, even if the page gets published to different places. Like a mobile phone number, when that identifier is “called”, the page answers, no matter where it’s published.

We’ll call a page’s “number” a page identifier, or page ID for short. The page ID is constructed from the properties of the source document that make it unique within our documentation site.

In order to define a page ID, the first thing we need is the standard project structure we laid out in the previous article. Without a standard structure, we can’t navigate far beyond pages which are immediate neighbors because we have no way to describe pages that live anywhere else.

Organizing our files into documentation components ensures that each page lives in a well-defined, describable location. From this structure, each location derives unique, reliable coordinates we can use to refer to it. For instance, we now know what component, version, and module each page belongs to. All we have to do, then, is determine how to arrange those coordinates and when we need to use them.

Let’s consider the “coordinates” that make a source document unique:

component: the documentation component to which the page belongs; often mapped to a repository
version: the version of the component in which this page lives; often mapped to a repository branch
module: the content bundle in which the page is grouped; if blank, defaults to ROOT
page: the path to the page’s source file in the module, including any leading topic directories
fragment: the identifier of an element inside the page

Using this information, we can construct a fully-qualified page ID in the following form:

version@component:module:page#fragment

Here’s an example of a fully-qualified page ID:

1.0@antora:playbook:configure.adoc#filter-branches

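We can repurpose the inter-document xref macro in AsciiDoc to convert this page ID into a link. The page ID becomes the macro’s target, and the link text goes in the square brackets that follow it. Here’s how that looks in practice: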

xref:1.0@antora:playbook:configure.adoc#filter-branches[filter branches]
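
To make the anatomy of that page ID concrete, here’s a minimal Python sketch that splits a fully-qualified page ID into its coordinates. The PageId class and parse_full_page_id function are hypothetical names for illustration, not part of any site generator’s API, and the sketch assumes every coordinate except the fragment is present (a blank module falls back to ROOT, as described above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageId:
    """The coordinates of a page's source document (hypothetical model)."""
    version: str
    component: str
    module: str
    page: str
    fragment: Optional[str] = None


def parse_full_page_id(ref: str) -> PageId:
    """Split version@component:module:page#fragment into its coordinates."""
    # The optional fragment follows the first '#'.
    target, _, fragment = ref.partition("#")
    # The version is everything before the '@'.
    version, at, rest = target.partition("@")
    if not at or not version:
        raise ValueError(f"missing version in {ref!r}")
    parts = rest.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected component:module:page in {ref!r}")
    component, module, page = parts
    # A blank module defaults to ROOT.
    return PageId(version, component, module or "ROOT", page, fragment or None)


if __name__ == "__main__":
    print(parse_full_page_id("1.0@antora:playbook:configure.adoc#filter-branches"))
    # PageId(version='1.0', component='antora', module='playbook', page='configure.adoc', fragment='filter-branches')
```

Because every coordinate comes from the source structure, nothing in the parsed result depends on where the page is eventually published.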

We’ve avoided coupling to the published URL by using a source-to-source reference. (Notice the page coordinate ends with .adoc, the file extension of an AsciiDoc source file.) Regardless of whether you’re deploying your site locally, to staging, or to production, or you change URL strategies, the page ID always remains the same. The reference locks on to the target page and produces a URL that points to it wherever it gets published.
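
The point about URL strategies can be illustrated with a small sketch. Assuming a hypothetical strategy that publishes each page at /component/version/module/page.html (or at a trailing-slash “pretty” URL), the same coordinates yield a correct URL for any site URL, so the reference in the source never changes. The to_url function, its parameters, and the example hostnames are illustrative assumptions.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class PageId:
    version: str
    component: str
    module: str
    page: str
    fragment: Optional[str] = None


def to_url(pid: PageId, site_url: str, pretty: bool = False) -> str:
    """Compute a published URL for a page ID under a hypothetical URL strategy."""
    # Drop the .adoc source extension; the published name depends on the strategy.
    stem = PurePosixPath(pid.page).with_suffix("")
    leaf = f"{stem}/" if pretty else f"{stem}.html"
    url = f"{site_url.rstrip('/')}/{pid.component}/{pid.version}/{pid.module}/{leaf}"
    return f"{url}#{pid.fragment}" if pid.fragment else url


if __name__ == "__main__":
    pid = PageId("1.0", "antora", "playbook", "configure.adoc", "filter-branches")
    for site in ("https://staging.example.com", "https://docs.example.com"):
        print(to_url(pid, site))
    # https://staging.example.com/antora/1.0/playbook/configure.html#filter-branches
    # https://docs.example.com/antora/1.0/playbook/configure.html#filter-branches
    print(to_url(pid, "https://docs.example.com", pretty=True))
    # https://docs.example.com/antora/1.0/playbook/configure/#filter-branches
```

Switching environments or URL strategies only changes the arguments passed at publish time; the page ID stays the same.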

We’ve avoided coupling to the filesystem by using a location based on the documentation component structure. The page ID describes the source file’s project (component / version) and where in that project the source file is located (module / page).

We’ve eliminated the relative path (../../) problem by specifying the page as a module-relative path. The path always starts from a module’s pages directory, even when the referencing page is located inside a topic folder. If you move or rename a page within a module, you don’t have to change any references to other pages.
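
Here’s a sketch of why module-relative paths stay put. It assumes a hypothetical source layout of modules/<module>/pages/<page> inside each component version; source_path and filesystem_relative are illustrative helpers. The filesystem-relative path a writer would otherwise compute changes with the referencing page’s depth, while the module-relative page coordinate does not.

```python
import posixpath

# Hypothetical layout of a module's pages directory within a component version.
PAGES_DIR = "modules/{module}/pages"


def source_path(module: str, page: str) -> str:
    """Map a module and module-relative page path to a source file path."""
    return posixpath.join(PAGES_DIR.format(module=module or "ROOT"), page)


def filesystem_relative(from_file: str, to_file: str) -> str:
    """The ../-style path a writer would otherwise have to work out by hand."""
    return posixpath.relpath(to_file, posixpath.dirname(from_file))


if __name__ == "__main__":
    target = source_path("ROOT", "configure.adoc")
    for referrer in ("index.adoc", "topic/sub/install.adoc"):
        rel = filesystem_relative(source_path("ROOT", referrer), target)
        # The module-relative reference is configure.adoc in both cases.
        print(f"{referrer}: {rel} vs. configure.adoc")
    # index.adoc: configure.adoc vs. configure.adoc
    # topic/sub/install.adoc: ../../configure.adoc vs. configure.adoc
```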

Of course, inbound references to the page you move do have to be updated. To counter this, you could pin the page ID of the page you want to move, thus adding more stability. That way, references to the page don’t have to be updated even when it moves. That said, a little help from the text editor to “refactor” references could make this abstraction unnecessary.
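
Such a refactoring aid is easy to sketch because the reference syntax is regular. The hypothetical rename_page_refs function below rewrites every xref whose page coordinate matches a moved page, keeping any version, component, module, and fragment intact. To stay short, it compares only the page coordinate, so it assumes the scanned text refers to a single module; a real tool would resolve each reference first.

```python
import re

# An xref macro target runs from "xref:" up to the opening bracket of the link text.
XREF = re.compile(r"xref:([^\[\s]+)\[")


def rename_page_refs(text: str, old_page: str, new_page: str) -> str:
    """Rewrite xref targets whose page coordinate equals old_page.

    Only the page coordinate is compared; coordinates before it (version,
    component, module) and the fragment after it are kept as written.
    """
    def replace(match: re.Match) -> str:
        body, hash_mark, fragment = match.group(1).partition("#")
        # The page coordinate follows the last ':' or '@', if either is present.
        cut = max(body.rfind(":"), body.rfind("@")) + 1
        prefix, page = body[:cut], body[cut:]
        if page != old_page:
            return match.group(0)
        return f"xref:{prefix}{new_page}{hash_mark}{fragment}["

    return XREF.sub(replace, text)


if __name__ == "__main__":
    doc = ("See xref:configure.adoc#filter-branches[filter branches] "
           "and xref:1.1@configure.adoc[the 1.1 version].")
    print(rename_page_refs(doc, "configure.adoc", "setup/configure.adoc"))
    # See xref:setup/configure.adoc#filter-branches[filter branches] and xref:1.1@setup/configure.adoc[the 1.1 version].
```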

We’ve made it possible to validate and update references by using a well-defined pattern that’s easy for a script to locate, parse, and rewrite.
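
As an illustration, here’s a minimal validation pass, assuming a catalog of known page coordinates has already been collected from the sources. The FULL_XREF pattern and broken_refs function are hypothetical, and for brevity the sketch checks only fully-qualified references; shorter forms would first need to be expanded using the referencing page’s coordinates.

```python
import re
from typing import Iterator, Set, Tuple

# xref:version@component:module:page#fragment[text], fragment optional.
FULL_XREF = re.compile(
    r"xref:(?P<version>[^@\s\[]+)@(?P<component>[^:\s\[]+):"
    r"(?P<module>[^:\s\[]*):(?P<page>[^#\s\[]+)(?:#[^\s\[]*)?\["
)

# (version, component, module, page)
Coordinates = Tuple[str, str, str, str]


def broken_refs(text: str, catalog: Set[Coordinates]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, page ID) for each full reference missing from the catalog."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in FULL_XREF.finditer(line):
            key = (m["version"], m["component"], m["module"] or "ROOT", m["page"])
            if key not in catalog:
                # Strip the leading "xref:" and the trailing "[" from the match.
                yield lineno, m.group(0)[len("xref:"):-1]


if __name__ == "__main__":
    catalog = {("1.0", "antora", "playbook", "configure.adoc")}
    doc = ("See xref:1.0@antora:playbook:configure.adoc#filter-branches[filter branches].\n"
           "Also see xref:1.0@antora:playbook:configur.adoc[a typo].")
    for lineno, ref in broken_refs(doc, catalog):
        print(f"line {lineno}: broken reference {ref}")
    # line 2: broken reference 1.0@antora:playbook:configur.adoc
```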

This human-friendly referencing system saves you from having to do computations in your head while writing. You just specify the coordinates of the page you want to reference. There’s no need to worry about the source file’s physical location on disk or its published URL. All you need to know are the names of your components, versions, modules, and pages so you can fill in this information.

Specifying the component and version every time seems like overkill. Surely there must be some sort of shorthand, right?

Concise contextual references

The full page ID sure seems like a lot to type each time you need to create a page reference. It also increases coupling when linking between pages of the same module or component, something you’ll do often. The problem is, the page ID often includes more information than is necessary. What we want to do instead is trim it down to just the essential information. We can do that by taking into account one of the most important aspects of a page ID: its context.

Let’s continue with the phone analogy. Think about when you’re at the office. You probably don’t need to use a complete telephone number to call someone in your office. There’s a company directory with extension numbers for every phone. You just punch in a short extension for the person you want to reach, and the call goes through. You’re benefiting from the two phones sharing the same context.

It’s the same way for page IDs. If you want to reference a page stored in the same module as the current page, which is most often the case, you only need to specify the filename of the other page.

xref:page-in-same-module.adoc[link text]

If you want to reference a page in a different module, you only need to specify the module and filename of the other page:

xref:sibling-module:page.adoc[link text]

To reference another version of a page in the same module, just prepend the version number and an @ sign to the filename.

xref:1.1@page.adoc[link text]

You only have to use the full page ID when you need to refer to a page that belongs to another component.

xref:2.1@other-component:module:page.adoc[link text]
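
The resolution rules above can be sketched as a function that fills in whatever coordinates a reference omits from the coordinates of the page doing the referencing. This is a minimal sketch under stated assumptions: PageId and resolve are hypothetical names, coordinates are right-aligned (the page is always last, the module before it, the component before that), and naming a different component without a version is treated as an error, mirroring the rule that references to another component use the full page ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageId:
    version: str
    component: str
    module: str
    page: str
    fragment: Optional[str] = None


def resolve(ref: str, ctx: PageId) -> PageId:
    """Expand a contextual reference using the referencing page's coordinates."""
    target, _, fragment = ref.partition("#")
    version, at, rest = target.partition("@")
    if not at:
        # No '@': the reference carries no version, only coordinates.
        version, rest = "", target
    parts = rest.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many coordinates in {ref!r}")
    # Right-align: the page is always last, the module before it, the component first.
    component, module, page = [None] * (3 - len(parts)) + parts
    if component and component != ctx.component and not version:
        raise ValueError(f"a version is required for another component: {ref!r}")
    return PageId(
        version=version or ctx.version,  # inherit the current version if omitted
        component=component or ctx.component,
        # An omitted module is inherited; an explicitly blank one means ROOT.
        module=ctx.module if module is None else (module or "ROOT"),
        page=page,
        fragment=fragment or None,
    )


if __name__ == "__main__":
    here = PageId("1.0", "antora", "playbook", "index.adoc")
    for ref in ("page-in-same-module.adoc", "sibling-module:page.adoc",
                "1.1@page.adoc", "2.1@other-component:module:page.adoc"):
        r = resolve(ref, here)
        print(f"{ref} -> {r.version}@{r.component}:{r.module}:{r.page}")
    # page-in-same-module.adoc -> 1.0@antora:playbook:page-in-same-module.adoc
    # sibling-module:page.adoc -> 1.0@antora:sibling-module:page.adoc
    # 1.1@page.adoc -> 1.1@antora:playbook:page.adoc
    # 2.1@other-component:module:page.adoc -> 2.1@other-component:module:page.adoc
```

Because the version is inherited from the referencing page, resolving page-in-same-module.adoc from a page in version 2.0 yields a 2.0 target without touching the reference itself, which is what makes contextual references independent of the current component version.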

The great thing about contextual references is that they are not coupled to the current component version. That means you don’t need to update the contextual page references when creating a new version of the component. All you’ll need to do is change the version number in the antora.yml file after you create your new version branch.

Why component/version and not repository/branch?

You may have noticed we’ve been using the terms component and version instead of repository and branch when discussing page references. Why the distinction? Aren’t they the same?

There are two reasons for these abstractions:

reduced coupling
fungibility

First, let’s look at how this abstraction reduces coupling.

Reduced coupling

Our instinct might be to assume the name of the component is the same as the name of the repository. But that can be very limiting.

First, we’d lose the ability to change the name of the repository without consequence. For example, we might decide to add a prefix to the name of all documentation repositories to help distinguish them from software repositories. Once we clone a repository to work with it offline, we might not know what the repository is anymore because it’s not necessarily clear what the origin is. And if there’s more than one documentation component in a repository, we certainly can’t count on the repository to provide the component name. So we need a more stable way to identify the name of the component.

The same can be said for the branch. At first, we might think that the branch could be the version number. But that would constrain how we name branches, especially when they are mixed with other types of branches or, again, when we’re working with a local copy of the branch. We’d have to rely on the git remote to be accurate, and it may not always be.

In the end, that’s why we put this information in the component descriptor file, antora.yml. This file provides stable metadata that the site generator and other tools can use when they need to retrieve information about the component and version. It’s these values that are used in the page ID’s component and version coordinates:

name: component-a
version: '1.0'
# ...
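
For illustration, here’s how a tool might read those two values. A real implementation would use a full YAML parser; read_descriptor is a deliberately tiny, hypothetical stand-in that handles only top-level key: value lines, optional quotes, and # comments (so a value containing # would be cut short). Nothing about the repository or branch the file came from enters into the result.

```python
from typing import Dict


def read_descriptor(text: str) -> Dict[str, str]:
    """Read top-level scalar `key: value` pairs from a component descriptor."""
    data: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments (naively)
        if not line or line[0].isspace() or ":" not in line:
            continue  # skip blank lines, comment-only lines, and nested entries
        key, value = (part.strip() for part in line.split(":", 1))
        # Remove one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        data[key] = value
    return data


if __name__ == "__main__":
    descriptor = "name: component-a\nversion: '1.0'\n# ...\n"
    meta = read_descriptor(descriptor)
    print(meta["name"], meta["version"])  # component-a 1.0
```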

But there’s another benefit to storing this information in the component descriptor file: fungibility.

Fungibility

A repository provides a component, and a branch provides a version. But we could substitute a different repository to provide that component or a different branch to provide that version. In fact, different branches of different repositories could contribute different versions of the same component. Or a single component version could be spread across multiple repositories.

In the end, the source location doesn’t really matter. It’s the files that the source contributes that matter. This is how we can have a branch still in development (e.g., v4-beta) stand in for a final version (e.g., v4) before the final version is released. It also allows you to have a branch in your own fork substitute the version from the official branch before you’re ready to merge your changes. In other words, it allows us to really stretch this multi-repository design to cover a lot of different scenarios.
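
A sketch of that idea: if files are grouped by the component name and version declared in each source’s descriptor, then the repository and branch are just provenance. In the hypothetical example below, a v4-beta branch supplies version 4, and a second repository contributes another module to that same version. Source and aggregate are illustrative names, and this sketch lets a later source silently replace a duplicate file, where a real generator might report a conflict instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Source:
    """One content source: where its files come from, plus its descriptor values."""
    repo: str     # provenance only; never used as a coordinate
    branch: str   # provenance only; never used as a coordinate
    name: str     # component name declared in antora.yml
    version: str  # version declared in antora.yml
    files: List[str]


def aggregate(sources: List[Source]) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Group files by (component, version), recording which repo@branch supplied each."""
    catalog: Dict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)
    for src in sources:
        for path in src.files:
            # A later source replaces an earlier one on a duplicate path.
            catalog[(src.name, src.version)][path] = f"{src.repo}@{src.branch}"
    return dict(catalog)


if __name__ == "__main__":
    sources = [
        Source("docs-core", "v4-beta", "component-a", "4", ["modules/ROOT/pages/index.adoc"]),
        Source("docs-extras", "main", "component-a", "4", ["modules/extras/pages/tips.adoc"]),
    ]
    for key, files in aggregate(sources).items():
        print(key, sorted(files))
    # ('component-a', '4') ['modules/ROOT/pages/index.adoc', 'modules/extras/pages/tips.adoc']
```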

Extending the syntax

Page IDs provide an intuitive and reliable way to reference pages, whether within the same module, another module in the same component, or a completely different component and version. By extending the inter-document xref macro in Asciidoctor to support page IDs, writers can use page IDs to make links between pages. Transcribing the page ID into a reference for frequently cited pages should quickly become a habit for writers. When they forget, the text editor can lend a helping hand.

Since page references can be used to make links to pages anywhere in the site, they’re also a perfect fit for the navigation system. The navigation system can be designed to read AsciiDoc files composed of page references organized in a nested list.
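
As a rough sketch of that design, the hypothetical parse_nav function below turns AsciiDoc list items such as ** xref:configure.adoc[Configure] into a tree, using the number of asterisks as the nesting depth. It ignores every line that isn’t an xref list item and leaves each target unresolved, so the same page ID resolution used for inline references could be applied afterward.

```python
import re
from dataclasses import dataclass, field
from typing import List

# A list item whose only content is an xref: '** xref:target[text]'
ITEM = re.compile(r"^(\*+)\s+xref:([^\[\s]+)\[([^\]]*)\]\s*$")


@dataclass
class NavItem:
    target: str  # an unresolved page reference, exactly as written
    text: str
    children: List["NavItem"] = field(default_factory=list)


def parse_nav(source: str) -> List[NavItem]:
    """Build a navigation tree from a nested AsciiDoc list of xref items."""
    root = NavItem("", "")
    stack = [root]  # stack[d - 1] is the parent for an item at depth d
    for line in source.splitlines():
        m = ITEM.match(line.strip())
        if not m:
            continue  # skip anything that isn't an xref list item
        depth = len(m.group(1))
        if depth > len(stack):
            raise ValueError(f"list level skipped: {line!r}")
        item = NavItem(m.group(2), m.group(3))
        del stack[depth:]  # pop back to this item's parent
        stack[-1].children.append(item)
        stack.append(item)
    return root.children


if __name__ == "__main__":
    nav = """\
* xref:index.adoc[Home]
** xref:playbook:configure.adoc[Configure]
*** xref:playbook:configure.adoc#filter-branches[Filter branches]
* xref:2.1@other-component:module:page.adoc[Other component]"""

    def show(items, indent=0):
        for item in items:
            print("  " * indent + f"{item.text} -> {item.target}")
            show(item.children, indent + 1)

    show(parse_nav(nav))
    # Home -> index.adoc
    #   Configure -> playbook:configure.adoc
    #     Filter branches -> playbook:configure.adoc#filter-branches
    # Other component -> 2.1@other-component:module:page.adoc
```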

Page IDs could also be used for page routing. A page could claim multiple page IDs. These page aliases would then be translated into URL routes that redirect to the master page. Such aliases might come in handy when a page is moved or removed, or for making vanity URLs.
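
Here’s a minimal sketch of turning page aliases into redirect routes. It assumes fully-qualified page IDs and a hypothetical /component/version/module/page.html URL strategy; url_for and redirect_routes are illustrative names. An alias claimed by two different pages is rejected, since a route can only redirect to one place.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def url_for(page_id: str) -> str:
    """Hypothetical URL strategy for a full page ID (fragments not supported)."""
    version, _, rest = page_id.partition("@")
    component, module, page = rest.split(":")
    stem = PurePosixPath(page).with_suffix("")
    return f"/{component}/{version}/{module or 'ROOT'}/{stem}.html"


def redirect_routes(aliases: Dict[str, List[str]]) -> Dict[str, str]:
    """Map the URL of each alias to the URL of the page that claims it."""
    routes: Dict[str, str] = {}
    for page_id, alias_ids in aliases.items():
        target = url_for(page_id)
        for alias in alias_ids:
            source = url_for(alias)
            if routes.get(source, target) != target:
                raise ValueError(f"{alias} is claimed by more than one page")
            routes[source] = target
    return routes


if __name__ == "__main__":
    aliases = {
        "1.0@antora:playbook:configure.adoc": [
            "1.0@antora:playbook:config.adoc",  # the page's old name
            "1.0@antora::setup.adoc",           # a removed page merged into this one
        ],
    }
    for source, target in redirect_routes(aliases).items():
        print(f"{source} -> {target}")
    # /antora/1.0/playbook/config.html -> /antora/1.0/playbook/configure.html
    # /antora/1.0/ROOT/setup.html -> /antora/1.0/playbook/configure.html
```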

Page references are just one example of how the AsciiDoc syntax can be extended to simplify a common documentation task. In the next article, we’ll look deeper into how well suited AsciiDoc is for writing and producing documentation and at additional ways we could build on the syntax to address common writing requirements.

Special thanks to Sarah White and Lisa Ruff for their substantial revisions to this article. Additional thanks to Sarah White for creating the intuitive diagrams featured in this article.