Server Driven UI

Update 2023-03-29: In August 2021, Apple acquired Primephonic. On March 28, 2023, Apple launched Apple Music Classical as a new app. This new app is built on the foundations of the Primephonic app; the fundamentals of the Server Driven UI architecture as described in this post remain the same.

Note: This post is based on a talk I gave at CocoaHeadsNL in July 2020. Warning: because this is based on a transcription of a talk, sentences and wording may be weird or incoherent. The original talk, including a live Q&A afterwards, can be seen here:

Over the years, I’ve built many different types of apps. However, in one aspect they’ve all been very similar: in the client/server architecture, the server sends domain objects to the client (encoded in JSON), and the client renders these domain objects to some pretty UI.

Server Driven UI is different. The server does not send domain objects, with the client having to decide how to render those. Instead, the server decides what and how to render, and just sends instructions to the client. You know, kinda like HTML…

(Honestly, it’s not HTML… But it sort of is. But really, it isn’t)

Introduction

This post is about my experiences with building and using a Server Driven UI in the Primephonic app.

Screenshots of Primephonic app running on Android, iPhone, iPad, and the Web on a laptop

Primephonic is a classical music streaming service. It has apps for Android, iPhone, and iPad, a web player, and Sonos integration. It is similar to Apple Music or Spotify, but additionally there are a lot of features that are specific to classical music.

We built Primephonic over the course of a couple of years, and we built it all using Server Driven UI from the very beginning.

The basics

The Primephonic Composer page, separated into multiple components

This screen from Primephonic shows a Composer page. It is built up out of several components.
At the top is an Artist header. Below that there are two section headers, three detail buttons that link to other pages, and a couple of paragraphs of text.

The order of these components, as well as their content, is decided by the server. So if we want to make a change and show some albums instead of the biography, we can make that change on the server, without needing to release new versions of the client apps.

The same Composer page, but with some components swapped out

History

This is not a new idea. Here are just a few examples of other applications of the same concept:

Components

Let’s explore Server Driven UI by building some components. We start small, with a UITableViewCell.

A UITableView cell with two lines, showing a single work

This cell shows a single work by Beethoven. This looks simple, but of course there’s more to it than that.

When looking at the domain model, we see that there’s a ton of details. For each work there’s a separate title, number, key, alternative title, opus number, genre, period and some translations.

A comparison of the domain model of a work and the visual representation in UITableViewCell

This domain model lives on the server. In some database somewhere, there are a bunch of fields for this record that contain all this rich information.
What part of all this data do we send to the client? Do we want to send the whole object? Including the fields that aren’t needed? Who does the logic, who decides that the key should be in the top line, and the opus number in the second line?

In a Server Driven UI this is pretty straightforward. Everything is done on the server. The server does all the formatting and simply sends the results to the client.

{
  "title": "Piano Sonata No.14 in C# minor",
  "subtitle": "Op. 27/2 • “Moonlight”"
}

This is all the JSON that’s sent to the app. A title and a subtitle. Because that’s all that’s needed to implement the design as shown.

Multiple variants

Three variations of previous cell, all driven by optional fields in JSON

Obviously, in a real-world app, there are more variations. In certain contexts we may want to include the composer name. Or show the number of recordings that we have for this particular work.
Again, we can do all this in a server driven way. The JSON has a bunch of extra fields, all of which are optional. If the field isn’t included in the JSON, it isn’t shown in the table view cell. It is the server that decides which field is filled when and with what content.

{
  "type": "detailRow",
  "addition": "Ludwig van Beethoven",
  "title": "Piano Sonata No.14 in C# minor",
  "subtitle": "Op. 27/2 • “Moonlight”",
  "disclosureText": "873"
}
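
On the client, optional JSON fields map naturally onto optional properties. The following is a minimal sketch, assuming only `title` is required; the `visibleLines(of:)` helper is hypothetical and stands in for a cell hiding the labels that have no content.

```swift
import Foundation

// Mirrors the "detailRow" JSON above; only `title` is assumed to be required.
struct DetailRow: Decodable {
  let addition: String?
  let title: String
  let subtitle: String?
  let disclosureText: String?
}

// Hypothetical helper: the text lines a cell would show, top to bottom.
// A field the server left out simply produces no line.
func visibleLines(of row: DetailRow) -> [String] {
  [row.addition, row.title, row.subtitle].compactMap { $0 }
}

let minimalJSON = Data("""
{ "type": "detailRow", "title": "Piano Sonata No.14 in C# minor" }
""".utf8)
let minimalRow = try! JSONDecoder().decode(DetailRow.self, from: minimalJSON)

let fullJSON = Data("""
{
  "type": "detailRow",
  "addition": "Ludwig van Beethoven",
  "title": "Piano Sonata No.14 in C# minor",
  "subtitle": "Op. 27/2 • “Moonlight”",
  "disclosureText": "873"
}
""".utf8)
let fullRow = try! JSONDecoder().decode(DetailRow.self, from: fullJSON)

print(visibleLines(of: minimalRow).count)  // 1
print(visibleLines(of: fullRow).count)     // 3
```

The same struct decodes every variant; which variant appears is entirely up to the server.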

Cell with Artwork

Three variations of cells showing artworks, all driven by JSON

Let’s look at a different example, a table view cell that shows an artwork. Again, all variations of this cell are modeled in JSON. There’s a bunch of different fields that are all optional.

What’s interesting here is that the format of the JSON was very much influenced by the interaction designs for the app. When developing this app, a trio of a designer, a backend engineer, and an app developer together talked through all variations that could occur and designed the visuals and API simultaneously.

{
  "type": "artworkRow",
  "image": { "url": "..." },
  "addition": null,
  "title": "Berliner Philharmonic, H...",
  "subtitle": null,
  "description": "1980 • 30 tracks • 2h 8m"
}

Carousel

For a final component, let’s look at something more complex. This is a carousel that lists the top composers in order of popularity.

A single UITableViewCell which contains a horizontally scrolling UICollectionView

We implemented this as another table view cell, but this one contains a UICollectionView.
Looking at the JSON for this, we see this component is of type carousel. It includes the item type, so we know for the collection view which collection view cells have to be dequeued.
The items field holds an array of items for each composer.

Again, the JSON model is pretty straightforward, an image and a title. That’s all there is to display this cell.

{
  "type": "carousel",
  "itemType": "artist",
  "items": [
    {
      "title": "Beethoven",
      "image": { "url": "..." }
    },
    ...
  ]
}
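
As a rough sketch of the client side, the carousel JSON can be decoded into a small set of structs, with `itemType` selecting which collection view cell to dequeue. The reuse identifier `"ArtistCell"` and the example image URL are assumptions made for illustration, not names from the Primephonic code.

```swift
import Foundation

struct Image: Decodable { let url: String }
struct CarouselItem: Decodable { let title: String; let image: Image }

struct Carousel: Decodable {
  let itemType: String
  let items: [CarouselItem]
}

// Hypothetical mapping from the server's itemType to a cell reuse identifier.
func reuseIdentifier(forItemType itemType: String) -> String? {
  switch itemType {
  case "artist": return "ArtistCell"
  default: return nil   // unknown item type: nothing to dequeue
  }
}

let carouselJSON = Data("""
{
  "type": "carousel",
  "itemType": "artist",
  "items": [
    { "title": "Beethoven", "image": { "url": "https://example.com/beethoven.jpg" } }
  ]
}
""".utf8)
let carousel = try! JSONDecoder().decode(Carousel.self, from: carouselJSON)
```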

A complete screen: Composer

Taking it all together, let’s look again at this Composer page.

Screenshot of the composer page, along with the JSON

To render this page, the app needs to do a single API call to the backend. That API call returns all data for this single screen at once, as a list of components.

{
  "type": "componentScreen",
  "title": "Leonard Bernstein",
  "header": {
    "image": { "url": "..." },
    "title": "Leonard Bernstein"
  },
  "sections": [
    {
      "heading": {
        "title": "Popular Works",
        "button": { "title": "Show all", ... }
      },
      "components": [
        {
          "type": "detailRow",
          "title": "West Side Story",
          "disclosureText": "258"
        },
        {
          "type": "detailRow",
          "title": "Candide",
          "disclosureText": "116"
        },
        {
          "type": "detailRow",
          "title": "Symphonic Dances",
          "disclosureText": "70"
        }
      ]
    },
    {
      "heading": {
        "title": "Biography"
      },
      "components": [
        {
          "type": "text",
          "paragraphs": "Leonard Bernstein was an American pianist..."
        }
      ]
    }
  ]
}

So going from the top, we see the type of the page, a title, and a header. And then a list of sections, each section consisting of a list of components.

This whole page is implemented as a UITableViewController, and each component is implemented as a table view cell. Every row in the table renders itself based purely on the data contained in the JSON.

In fact, there is no special logic in the app for the composer page. This could just as well be a work or an album page. All logic to distinguish between those domain objects lives on the server. The app knows nothing about that.
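
One way to decode such a list of mixed components is to read the `type` field first and then decode the matching struct from the same JSON object. This is a minimal sketch under that assumption; it covers only two component types, and the `TextBlock` and `TypeKey` names are made up for the example.

```swift
import Foundation

struct DetailRow: Decodable { let title: String; let disclosureText: String? }
struct TextBlock: Decodable { let paragraphs: String }

enum Component: Decodable {
  case detailRow(DetailRow)
  case text(TextBlock)

  private enum TypeKey: String, CodingKey { case type }

  init(from decoder: Decoder) throws {
    // Peek at "type", then decode the whole object as the matching struct.
    let container = try decoder.container(keyedBy: TypeKey.self)
    let type = try container.decode(String.self, forKey: .type)
    switch type {
    case "detailRow": self = .detailRow(try DetailRow(from: decoder))
    case "text": self = .text(try TextBlock(from: decoder))
    default:
      throw DecodingError.dataCorruptedError(
        forKey: .type, in: container,
        debugDescription: "Unknown component type: \(type)")
    }
  }
}

struct Heading: Decodable { let title: String }
struct Section: Decodable { let heading: Heading?; let components: [Component] }
struct ComponentScreen: Decodable { let title: String; let sections: [Section] }

let screenJSON = Data("""
{
  "type": "componentScreen",
  "title": "Leonard Bernstein",
  "sections": [
    { "heading": { "title": "Popular Works" },
      "components": [ { "type": "detailRow", "title": "West Side Story", "disclosureText": "258" } ] },
    { "heading": { "title": "Biography" },
      "components": [ { "type": "text", "paragraphs": "Leonard Bernstein was an American pianist..." } ] }
  ]
}
""".utf8)
let screen = try! JSONDecoder().decode(ComponentScreen.self, from: screenJSON)
```

Nothing in these types mentions composers, works, or albums; the same decoder handles any page the server chooses to describe this way.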

Navigation

A topic that comes up quite a lot when talking about an app whose data comes from a server is “Navigation”.
When do we load the data that is needed?

Two screens, an overview page and a detail page

These are two screens from the app: the first shows a list of composers, the second a single composer. But how do we get from the first screen to the second screen?

Given that each screen is one API call, you might think of two approaches:

  • Tap the item, navigate to an empty white screen and start doing the API call
  • Tap the item, do the API call, wait for that to return and then push the new view controller, that is then completely filled in

We didn’t like either of these approaches. We don’t want to have the “web browser” effect, where each link sends you to a white screen that slowly loads. But we also didn’t want taps to be slow, where you have to wait for an API call to return before navigation starts.

So instead, we include the header of the second screen, as part of the link on the first screen.

Three states, an overview page, a detail page in a loading state, and a detail page with all content loaded

So there are in fact three states that we have to deal with:

The first screen contains a link, which includes the header for the second screen.
When we tap that item, we immediately push the second view controller.
That second view controller is in a loading state that already shows the big header, with a loader animation below.
And finally, when the API call returns, we fade in all data.

Let’s take a deeper look at the JSON for the Mozart item in the Composers section.

{
  "title": "Mozart",
  "image": { "url": "..." },
  "action": {
    "type": "componentScreen",
    "url": "/query/view/artist/wolfgang-amadeus-mozart-1756",
    "title": "Wolfgang Amadeus Mozart",
    "header": {
      "type": "artist",
      "image": {
        "url": "...",
        "lowresUrl": "..."
      },
      "title": "Wolfgang Amadeus Mozart",
      "subtitle": "Composer 1756 – 1791"
    }
  }
}

There are the title and image fields we saw before, but this item also has a tap action.
That action already contains the type of the target page, so that we know which view controller to load. It also contains the URL for the API call, a title, and a header.
The header is the exact same header that is part of the components in the next API call.
It also contains a title, a subtitle, and an image.
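
The loading and loaded states of the second screen can be sketched as a small enum that always carries the header. This is an illustration only: `ScreenLink`, `ScreenState`, and the `rows` stand-in for the real component list are hypothetical names.

```swift
import Foundation

struct Header: Decodable { let title: String; let subtitle: String? }

// The "action" object of a tappable item, reduced to the fields used here.
struct ScreenLink: Decodable {
  let url: String
  let title: String
  let header: Header
}

enum ScreenState {
  case loading(header: Header)                 // pushed right after the tap
  case loaded(header: Header, rows: [String])  // after the API call returns

  var header: Header {
    switch self {
    case .loading(let header), .loaded(let header, _): return header
    }
  }

  var isLoading: Bool {
    if case .loading = self { return true }
    return false
  }
}

// The header from the link is enough to show the target screen immediately.
func stateAfterTap(on link: ScreenLink) -> ScreenState {
  .loading(header: link.header)
}

// When the response arrives, keep the header and fill in the rest.
func stateAfterResponse(_ state: ScreenState, rows: [String]) -> ScreenState {
  .loaded(header: state.header, rows: rows)
}

let linkJSON = Data("""
{
  "type": "componentScreen",
  "url": "/query/view/artist/wolfgang-amadeus-mozart-1756",
  "title": "Wolfgang Amadeus Mozart",
  "header": { "type": "artist", "title": "Wolfgang Amadeus Mozart", "subtitle": "Composer 1756 – 1791" }
}
""".utf8)
let link = try! JSONDecoder().decode(ScreenLink.self, from: linkJSON)
let tapped = stateAfterTap(on: link)
let finished = stateAfterResponse(tapped, rows: ["Popular Works", "Biography"])
```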

One interesting aspect of this image is that it doesn’t just contain a url field. There’s also a lowresUrl field. That low-res image can be used as a placeholder while the larger image is being loaded. If that image happens to be in the HTTP cache, it is used.
And in fact, that image is very often in the cache, because the low res url is the same url as was used for the small image in the carousel on the previous page.

So in practice, when tapping items in a carousel, we see a nice push animation to a new view controller, where the top half of the screen is already filled with a header including the background image, while the bottom half is being loaded.

In general, we are quite happy with this form of navigation. It takes some work on the server to be able to generate the exact type of header that is also generated when requesting the target URL. But once that’s implemented, the result looks quite nice.
Obviously, it’s not as nice as it would have been had we not had to do the API call. But this is a good middle ground that works well in practice.

Actions

Local actions

The most obvious action you might expect in an app like this is playing music. So when we tap the first track in this list, the miniplayer pops up, and music starts playing.

How does this work with our Server Driven UI?

A tracklist, each track contains an “action” in the JSON

So a track is a type of component which includes a title, subtitle and a duration. And similar to the navigation example, this also includes an action.
However this action is of type “play” and it contains all data needed to start playing music.

The tracksMetadataUrl will be loaded asynchronously, to get all data needed to play all the items in the queue. But in the meantime, similar to the header for navigation, this action also includes the startTrack object, so that the player can start playing music as fast as possible, without having to wait for the metadata to be loaded.

{
  "type": "track",
  "title": "Beginning and Ending",
  "subtitle": "Max Richter",
  "duration": "4:54",
  "action": {
    "type": "play",
    "tracksMetadataUrl": "/query/view/album/886448177210/tracks",
    "startIndex": 0,
    "startTrack": {
      "id": "886448177210-0",
      "md5": "073629779deddb06a14cdb559da7a62d"
    }
  }
}
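
To illustrate the idea, here is a sketch of a queue that is playable from the start track alone and is completed once the metadata arrives. `PlayQueue` and its methods are hypothetical, and the second track id in the usage is invented for the example.

```swift
import Foundation

struct PlayAction: Decodable {
  struct StartTrack: Decodable { let id: String; let md5: String }
  let tracksMetadataUrl: String
  let startIndex: Int
  let startTrack: StartTrack
}

// Hypothetical queue: knows only the start track at first.
struct PlayQueue {
  let startIndex: Int
  private var trackIDs: [Int: String]   // queue position -> track id

  init(action: PlayAction) {
    startIndex = action.startIndex
    trackIDs = [action.startIndex: action.startTrack.id]
  }

  func trackID(at index: Int) -> String? { trackIDs[index] }

  // Called once the response for tracksMetadataUrl has been loaded.
  mutating func metadataLoaded(allTrackIDs: [String]) {
    for (index, id) in allTrackIDs.enumerated() { trackIDs[index] = id }
  }
}

let actionJSON = Data("""
{
  "type": "play",
  "tracksMetadataUrl": "/query/view/album/886448177210/tracks",
  "startIndex": 0,
  "startTrack": { "id": "886448177210-0", "md5": "073629779deddb06a14cdb559da7a62d" }
}
""".utf8)
let playAction = try! JSONDecoder().decode(PlayAction.self, from: actionJSON)
var queue = PlayQueue(action: playAction)
let playableImmediately = queue.trackID(at: queue.startIndex)
let nextBeforeMetadata = queue.trackID(at: 1)
queue.metadataLoaded(allTrackIDs: ["886448177210-0", "886448177210-1"])
```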

Remote actions

Action sheet with three buttons, opened by tapping “more” button in navigation bar

A different type of action is what we call a “remote command” action. We can see these in the menu in the top right. The JSON for this menu shows a list of actions; each of these actions has a URL to call and a payload to send back as the body of an HTTP POST.

{
  "secondaryButtons": [
    {
      "action": {
        "type": "command",
        "url": "/command/endpoint?path=favorite_playlist",
        "payload": "..."
      },
      "title": "Add playlist to favorites"
    },
    ...
  ]
}
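
Turning such a command into a request takes very little client code. The sketch below assumes the payload is an opaque string that is posted back unchanged; the base URL `https://api.example.com` and the payload value are placeholders, not the real API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

struct CommandAction: Decodable {
  let url: String
  let payload: String   // assumed to be an opaque string chosen by the server
}

// Builds the HTTP POST described by a command action.
func makeRequest(for action: CommandAction, baseURL: URL) -> URLRequest? {
  guard let url = URL(string: action.url, relativeTo: baseURL) else { return nil }
  var request = URLRequest(url: url)
  request.httpMethod = "POST"
  request.httpBody = Data(action.payload.utf8)
  return request
}

let commandJSON = Data("""
{ "type": "command", "url": "/command/endpoint?path=favorite_playlist", "payload": "opaque-payload" }
""".utf8)
let command = try! JSONDecoder().decode(CommandAction.self, from: commandJSON)
let request = makeRequest(for: command, baseURL: URL(string: "https://api.example.com")!)
```

The client never inspects the payload; it only needs to know that tapping the button means "POST this back".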

Loader & Done

Three states: a loader, a “DONE” popup, and a new action sheet showing a “remove” action

When we tap the top button, “Add playlist to favourites”, we see a loader while the API call is happening. When it completes, we see “DONE” and the page behind it is refreshed.
If we then open the menu again, we see that it now contains an entry to remove the playlist from favourites.

It’s worth noting here that everything we saw here was very generic. In fact, when we started implementing favouriting, this was the first version, and it was completely implemented on the backend. We built and shipped the first version of this feature without doing any work on the clients. Later though, we made it better.

Requesting user input

Another state, requesting user input

Here’s a second type of response a server can send. Instead of just showing a big “DONE”, the server can request input, from either a multiple choice list, or, like in this case, a piece of text. Here we see part of the flow of creating and editing playlists, and again, this is all orchestrated by the server.

Customising actions

Now, there are some things that can’t be implemented by the server. For some things, we need the client.

Navigation bar has a “share” button that opens the system share sheet

{
  ...
  "secondaryButtons": [ ... ],
  "share": {
    "title": "Check out “Episode 1: Baroque Era” on Primephonic",
    "url": "https://play.primephonic.com/playlist/AEl7xsGhQegEhOoyY6cge"
  }
}

Here we see a share button in the top right; when we tap that, the system share sheet pops up. For this we need client side logic. So what the server sends us is fairly compact. The server decides the text that is shared, and includes the Universal URL. But it’s up to each client to decide what to do with this. In the iOS example, we insert a share sheet button in the navigation bar. We also use this URL for Handoff between different Apple devices.

A better favourite button

A playlist screen showing a heart icon button for favouriting

And finally, in case you were worried we left favouriting in that ugly menu from before: of course not. When we had some time on the client side, we went in and wrote special case code to intercept the favourite item. Instead of showing it in the menu, we create a nice button with a heart icon.

In theory, when you press it, we can show a nice animation and proactively select the item. The API call to favourite the item can then be done in the background. We then of course need to also implement all the logic for handling all edge cases, like network errors or if the server returns an error, we need to unselect the item again.
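
The optimistic behaviour described above could look roughly like this. It is a sketch of the idea, not code from the app: `FavouriteButtonModel` is a hypothetical name, and the network call itself is left out.

```swift
// Hypothetical model behind the heart button: select immediately,
// roll back if the background request fails.
struct FavouriteButtonModel {
  private(set) var isSelected: Bool
  private var valueBeforeRequest: Bool?

  init(isSelected: Bool) {
    self.isSelected = isSelected
    self.valueBeforeRequest = nil
  }

  // Called on tap, before the API call is sent.
  mutating func tap() {
    valueBeforeRequest = isSelected
    isSelected.toggle()
  }

  // Called when the API call completes, successfully or not.
  mutating func requestFinished(success: Bool) {
    if !success, let previous = valueBeforeRequest {
      isSelected = previous   // network or server error: undo the selection
    }
    valueBeforeRequest = nil
  }
}

var failing = FavouriteButtonModel(isSelected: false)
failing.tap()
let shownWhileRequestRuns = failing.isSelected
failing.requestFinished(success: false)

var succeeding = FavouriteButtonModel(isSelected: false)
succeeding.tap()
succeeding.requestFinished(success: true)
```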

Unfortunately, Primephonic isn’t there quite yet. Currently when you press the button, there’s still a full screen spinner, but at least the button looks nice already!

API design

Here are some specifics related to the API, and how we integrated that in the app.

Slide showing multiple GET and POST calls that include a version number in URL

As you can see, we include versioning in our API. And in the 2 years the app has been live, we’ve already created 15 versions. But, that’s not a big issue. Versions are cheap.

We create a new version in this API whenever we add a new component type that the existing clients don’t know about. So when a request comes in, the server looks at the version number of the request, and only returns the component types that that client supports.

If an older client sends a request, the server has to decide what to do as a fallback for a certain component. Maybe there’s a nice new carousel, but the client doesn’t support that yet. Then the server can send back an old boring list of items. Or simply skip the item, for that old client.

On the server there’s a single codebase that supports all versions. The server maintains a big list that shows which API version supports which component types.

Should it ever become too much of a hassle to maintain all these versions, and statistics show a certain version is rarely used anymore, then the server can decide to no longer support that version and send back a generic error message, using the “text” component type that already existed in version 1.
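
The server-side lookup can be pictured as a table plus a fallback rule. The sketch below is written in Swift for consistency with the rest of this post; the version numbers and the choice of fallback are invented for illustration and the real server may work differently.

```swift
// Hypothetical table: the API version in which each component type first appeared.
let introducedInVersion: [String: Int] = [
  "text": 1,
  "detailRow": 1,
  "carousel": 3,
]

// Hypothetical fallbacks for clients that are too old for a component type.
// Types without an entry are simply skipped for old clients.
let fallbackType: [String: String] = [
  "carousel": "detailRow",
]

// Returns the component type to send to a client, or nil to skip the component.
// Assumes the fallback table contains no cycles.
func componentType(toSend type: String, forClientVersion version: Int) -> String? {
  if let introduced = introducedInVersion[type], introduced <= version {
    return type
  }
  if let fallback = fallbackType[type] {
    return componentType(toSend: fallback, forClientVersion: version)
  }
  return nil
}

let forNewClient = componentType(toSend: "carousel", forClientVersion: 3)
let forOldClient = componentType(toSend: "carousel", forClientVersion: 2)
let unknownType = componentType(toSend: "hologram", forClientVersion: 15)
```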

JSON Decoding

Screenshot of the Composer page

Again, this screen is implemented as a UITableViewController. It uses a diffable data source to animate between changes when reloading after performing a remote command.

The clients and server also make use of an HTTP ETag to indicate that there are no changes, so we don’t need to animate anything.
The same goes for a client side triggered cancellation.

struct Section: Decodable {
  let heading: Heading?
  let components: [ValueOrDecodingError<Component>]
}


enum Component {
  case carousel(Carousel)
  case detailRow(DetailRow)
  ...
}


struct DetailRow: Decodable {
  let title: String
  let subtitle: String?
  let disclosureText: String?
  let primaryAction: Action?
  let deleteAction: Action?
}

JSON parsing is all implemented using Swift’s Codable. We create dedicated structs that directly match with the JSON that the server returns.
Sometimes we need domain objects to sit between the API models and the views. In that case we map from these API models to separate domain models.

I should note that while the server needs to support multiple API versions at the same time, for the clients it’s a lot easier. Each client only needs to support 1 API version. So we can even remove old component types in newer API versions, and just throw away that code from the clients. The server isn’t so lucky.

One thing you may have noticed is the ValueOrDecodingError type in the components array of the top Section struct. This is there to protect against programmer error.

In theory the server should never send something to a client that a client doesn’t support. But in practice, programmers are people, and people make mistakes.

public enum ValueOrDecodingError<Wrapped> {
  case value(Wrapped)
  case error(DecodingError)
}

So the ValueOrDecodingError exists as a safeguard. It is a generic enum, that either wraps a single value, or it contains an error.

The Decodable implementation for this type tries to decode a value of the generic Wrapped type. If it succeeds, it returns that value; if it fails, it returns the decoding error.

struct Menu: Decodable {
  let buttons: [ValueOrDecodingError<Button>]
}

What that gives us in practice is that parsing doesn’t stop when there’s an error in our JSON. In this example, when we parse this menu, if the JSON array contains 4 objects, but one of those isn’t actually a button, we still get the other buttons.

The array will contain 3 buttons, and 1 error. We can then decide what to do with that error. In the case of Primephonic, we mostly log the error and skip the item, still showing the rest.
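
A Decodable implementation along the lines described above, together with the log-and-skip handling, might look like this. It is a sketch and not necessarily how the library linked below implements it; the `Button` struct and the `usableButtons(in:)` helper are assumptions for the example.

```swift
import Foundation

enum ValueOrDecodingError<Wrapped: Decodable>: Decodable {
  case value(Wrapped)
  case error(DecodingError)

  init(from decoder: Decoder) throws {
    do {
      self = .value(try Wrapped(from: decoder))
    } catch let error as DecodingError {
      // Keep the error instead of aborting the decoding of the whole array.
      self = .error(error)
    }
  }
}

struct Button: Decodable { let title: String }

struct Menu: Decodable {
  let buttons: [ValueOrDecodingError<Button>]
}

// Keeps the buttons that decoded correctly, logging and skipping the rest.
func usableButtons(in menu: Menu) -> [Button] {
  menu.buttons.compactMap { item -> Button? in
    switch item {
    case .value(let button):
      return button
    case .error(let error):
      print("Skipping undecodable button: \(error)")
      return nil
    }
  }
}

let menuJSON = Data("""
{ "buttons": [ { "title": "Play" }, { "title": "Share" }, { "oops": true }, { "title": "Delete" } ] }
""".utf8)
let menu = try! JSONDecoder().decode(Menu.self, from: menuJSON)
let buttons = usableButtons(in: menu)
```

With a plain `[Button]` property, the third object would make the whole `Menu` fail to decode; here the other three buttons survive.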

In practice, this has saved us a couple of times, when someone made a mistake that wasn’t caught in testing.

Library: https://github.com/tomlokhorst/Statham

Offline page

The offline experience of the app shows a list of downloaded albums

Finally, one interesting aspect of this Server Driven UI is that we can even use it for our offline experience.

When you’re online, the app also downloads one special page, that contains a component screen for the offline version of the app.

Then, when you’re offline, you can use that version to browse around your offline downloaded music.

When is Server Driven UI not appropriate?

For the past year, I haven’t worked on Primephonic. So I asked one of the current developers what their experiences were with Server Driven UI.

And one example that came up was the Offline experience.

Quote about how JSON for offline experience became too large

He said that while the initial development went very fast because of the reuse of components, the single JSON response for the offline version became too large.

They’ve since switched back to a more traditional API.

For the offline page we needed to send the JSON for everything, not just the first page, but any page you might navigate to. And it turned out that some users had offline JSONs of 20 megabytes that needed to be synced all the time.

So this is an example where the Server Driven UI with big JSON responses didn’t work so well.

Music player & Subscription flow

The music player and the subscription flow in Primephonic are implemented without Server Driven UI

Two other things in this app that weren’t implemented using Server Driven UI are the music player itself and the subscription flow. These are both deep native OS integrations.

When you’re playing music, it needs to integrate with the Now Playing screen on your iPhone, it needs to respond to remote control commands from your AirPods. And you can also control playback from your Apple Watch or CarPlay.

The subscription flow is an In-App Purchase, so that’s also very specific logic.

The Rijksmuseum app

Screenshots of the Rijksmuseum iPhone app

And finally, here’s an example of a completely different app, as an example of when not to use Server Driven UI.

This is another app I’ve worked on, the Rijksmuseum app.
It allows you to walk tours in the museum where you can listen to audio descriptions of artworks.

In this app almost all the logic happens in the client. One big JSON file is downloaded that contains all the information about tours. And the client does all sorts of Bluetooth logic to show you what room you’re in.

For this type of app, a Server Driven UI isn’t appropriate, even though there is a server that maintains all the data about the tours in the museum.

Summary

Server Driven UI can be a great setup for certain types of applications (or parts of applications). If you have a lot of domain objects, but ultimately a relatively simple user interface, an approach where you deal with all the domain complexity on the server can be quite helpful.

I think there are two extremes: on one side you send complete domain objects to the client; on the other side, you send only UI instructions to the client (think of HTML, or more extreme, an image or PDF). There’s probably a gradient between these two extremes, and Server Driven UI lies somewhere in the middle.

With Primephonic, we found that the browsing experience through a rich database of classical music could very well be implemented using Server Driven UI. On the other hand, the actual Music Player was best implemented fully in the native client app.

Interested in Classical Music? Check out Primephonic!
