Investigating CloudKit Sync in the Apple News App
Syncing data across a user’s own devices in a privacy-sensitive way is one of the flagship features of CloudKit. Unfortunately, many agree that “it isn’t as documented as it might be”. There is no sync sample from Apple. The WWDC 2014 and WWDC 2015 talks didn’t quite get to the topic, although they did offer some vague hints, and in the WWDC 2016 talk the presenters left a lot of sentences unfinished, possibly for lack of time, e.g. you have to do it this way “for a variety of reasons.” Looking at the CloudKit framework headers, none of the classes have sync in their name, and the word “sync” isn’t even mentioned. Things are slightly better now that the class documentation for CKFetchRecordChangesOperation has been updated to say it is the one to use for synchronising, but having information spread across multiple places is really hard work. Furthermore, fully understanding CloudKit’s behaviour requires knowledge of other things, such as NSOperation quality of service and NSURLSession’s discretionary setting.
To add to the confusion, the open-source projects out there all attempt sync in different ways. Some try to sync with the public database, which is missing the required features; others use a periodic sync rather than the realtime, push-notification-driven sync CloudKit was designed for; and zones are a real stumbling block. These common mistakes are clearly the result of a lack of good samples and documentation, and in particular a lack of some absolutely vital information, for example about CKFetchRecordZoneChangesOperation. I happened to come across a Stack Overflow answer by a CloudKit engineer who shared that not all changes are returned: they are coalesced to remove unnecessary ones, e.g. if a record was added and then deleted since the previous request (tracked by a token), it isn’t included. This was quite eye-opening because it shows the server is a lot smarter than expected, and knowing this, yes, it could be used for an efficient sync.
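To make the coalescing idea concrete, here is a minimal sketch in Python, used purely as a language-neutral model. The change log layout, the integer token and the function name are all my own assumptions and not how the CloudKit server is actually implemented; it only illustrates that a record added and then deleted after the client’s token never needs to be reported.

```python
from typing import Dict, List, Tuple

# Hypothetical server-side change log entry: (sequence number, action, record name).
# Actions are "save" or "delete"; entries are assumed to be in sequence order.
Change = Tuple[int, str, str]


def coalesced_changes(log: List[Change], since: int) -> Tuple[List[str], List[str], int]:
    """Return (changed, deleted, new_token) for everything after the token `since`."""
    # Work out which records existed on the client at the time of its token.
    known_before = set()
    for seq, action, name in log:
        if seq > since:
            continue
        if action == "delete":
            known_before.discard(name)
        else:
            known_before.add(name)

    # Keep only the final action per record after the token.
    final: Dict[str, str] = {}
    for seq, action, name in log:
        if seq > since:
            final[name] = action

    changed = sorted(n for n, a in final.items() if a != "delete")
    # A delete only matters if the client could have seen the record before.
    deleted = sorted(n for n, a in final.items() if a == "delete" and n in known_before)
    new_token = max([since] + [seq for seq, _, _ in log])
    return changed, deleted, new_token
```

With a log where record b is saved and then deleted after the client’s token, b appears in neither list, which is the behaviour the engineer described.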
What also helped me is that, as we will see later, the app hits an HTTP endpoint that is named sync, which helped confirm this must be the right path. I think Apple could have put the word sync in the class name, or at the very least put a comment in the header saying: use this class for syncing!
In a situation like this, where there is so much ambiguity, it is useful to look at how Apple does things for some ground truth. Apple uses CloudKit sync in the News app, so let’s take an in-depth look at that using two devices for testing, an iPod Touch and an iPhone 6s, both on iOS 10.1.1.
News Article Download
When the app starts up it shows news articles. Since all users can see these articles, we would expect them to be in the public database. We will use a web proxy to analyse the requests. This won’t give us exact detail of how the framework classes are being configured, but it will give us an idea of the general algorithm. We will be using the 6s to monitor the requests.
As we can see, it performs a query to the container com.apple.news.public, which is the public database. The record contains an articleID, title, thumbnail, contentURL etc., so the full article is downloaded in separate requests that go directly to the news organisation’s own servers. The thumbnail is a URL to an image on the icloud-content server you can see in the screenshot. The query contains minOrder and maxOrder, which are integers, so this would suggest they are querying for new articles. This isn’t a sync; it is usually called a delta download, i.e. it only downloads the new information. The articles have an order field, like an auto-increment or sequence number, where newer articles have a higher number. That is better than using timestamps, where two articles might have the same time, and it is possible because Apple is the only one inserting records. This kind of delta download is a great bandwidth saver; however, it only allows new articles to be downloaded, which has a limitation you can see in the screenshots below.
As you can see, the first article had its title changed between 4h ago and 13h ago. The limitation of their design is that it doesn’t allow old articles to be updated, which doesn’t fit well with the news industry, where headlines can change frequently as a result of errors or updated information. Supporting that would require already-downloaded articles to be updated, perhaps even deleted, which isn’t possible with the public database because it doesn’t have the features required for a full sync. The alternative design would be to clear the cache and re-download all the articles every time, which would ensure the user is seeing the latest list but may have higher bandwidth requirements. Apple must have run the numbers for the number of records and the amount of data involved and decided an append-only delta download was the way to go. There is another interesting usability aspect here: as you can see, we are on the history list, and if a user is browsing the list to find a previously read article, it would certainly be harder to find if the title had changed. Now it becomes a very interesting problem, because there is a trade-off between what is technically optimal and what is best for users.
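The order-based delta download and its limitation can be modelled in a few lines. This is only a sketch in Python under my own assumptions: the field names articleID, title and order come from the records seen in the proxy, but the exact meaning of minOrder and maxOrder is a guess (treated here as an inclusive range), and the function names are hypothetical.

```python
from typing import Dict, List


def fetch_by_order(records: List[dict], min_order: int, max_order: int) -> List[dict]:
    """Stand-in for the public database query: records whose order is in range."""
    hits = [r for r in records if min_order <= r["order"] <= max_order]
    return sorted(hits, key=lambda r: r["order"])


def delta_download(cache: Dict[str, dict], records: List[dict], newest_order: int) -> List[str]:
    """Download only articles newer than anything already cached."""
    start = max((r["order"] for r in cache.values()), default=0) + 1
    fresh = fetch_by_order(records, start, newest_order)
    for r in fresh:
        cache[r["articleID"]] = dict(r)  # store a copy, like a downloaded record
    return [r["articleID"] for r in fresh]
```

If the server later changes the title of an article that is already cached, its order does not change, so the next delta download skips it and the cache keeps the old headline. That is the same effect as in the screenshots.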
Bookmark & Reading List Sync
Now that we have covered how articles are downloaded, let’s look at the feature we are really interested in: how the app performs the sync between devices. The News app has two features that are synced, bookmarks and history. The history view has already been shown in the screenshots above. An item is added to the history after an article is viewed and the user has scrolled down a bit, or maybe spent some time in the article, or perhaps a combination of both. Articles can be bookmarked when viewing them by tapping the bookmark icon at the bottom right. Now the really interesting part: if you have two devices side by side, these two sets of data are updated almost instantly (~5 seconds) when changes happen. For example, bookmark an article on one device and it appears in the bookmark list on the other device; un-bookmark the article and it disappears from the other device’s list. So there we have the feature we are looking for, a full sync between devices, so let’s see how it is implemented. We will tackle the push part later and focus on the News app’s requests for now.
So we have the 6s connected to the proxy and open at the saved articles page. On the iPod Touch we open an article and bookmark it. Five seconds later this happens:
As we can see, the first request hits an endpoint called “zone/sync” in a container called com.apple.news.private. So now we know they are using both the public and private databases for this app, which is interesting to me because obviously articles need to be referenced, and we were told many times that CKReference doesn’t work across zones or databases. All they’ve done here is simply make the articleID a string field rather than a reference; I suppose they aren’t bothered about referential integrity. Next, fortunately for us, a familiar-looking class name is included in the request: it looks like a CKFetchDatabaseChangesOperation (the one in the log has the prefix CKD because the actual request goes through the CloudKit daemon, so it’s like an RPC or remote class). On the response tab (not shown in the screenshot) we see ReadingList and ReadingHistory, which are the names of the two zones that have changed. Let’s take a look at what happens next:
These next two requests are to the endpoint “record/sync”, again in the private container, and again the class looks like the familiar CKFetchRecordZoneChangesOperation (note: before iOS 10 the class was CKFetchRecordChangesOperation). The first sync request contains the zone name ReadingList and, as expected, the second contains ReadingHistory. From more testing we see that record/sync is only requested for a zone if its name was contained in the zone/sync response.
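The two-step request pattern observed above can be sketched as follows. This is a Python model under my own assumptions, not CloudKit code: FakePrivateDatabase and SyncClient are hypothetical names, tokens are plain integers, and the zone_sync and record_sync methods merely stand in for the zone/sync and record/sync endpoints seen in the proxy.

```python
from typing import Dict, List, Tuple


class FakePrivateDatabase:
    """In-memory stand-in for the server side; not a CloudKit API."""

    def __init__(self) -> None:
        self.counter = 0
        self.zone_changes: Dict[str, List[Tuple[int, str]]] = {}

    def save(self, zone: str, record_name: str) -> None:
        self.counter += 1
        self.zone_changes.setdefault(zone, []).append((self.counter, record_name))

    def zone_sync(self, token: int) -> Tuple[List[str], int]:
        """Names of zones that changed after `token`, plus a new database token."""
        changed = sorted(z for z, log in self.zone_changes.items() if log[-1][0] > token)
        return changed, self.counter

    def record_sync(self, zone: str, token: int) -> Tuple[List[str], int]:
        """Records in `zone` that changed after `token`, plus a new zone token."""
        log = self.zone_changes.get(zone, [])
        records = [name for seq, name in log if seq > token]
        return records, max([token] + [seq for seq, _ in log])


class SyncClient:
    def __init__(self, db: FakePrivateDatabase) -> None:
        self.db = db
        self.database_token = 0
        self.zone_tokens: Dict[str, int] = {}
        self.requests: List[str] = []  # log of requests, like the proxy view

    def sync(self) -> Dict[str, List[str]]:
        received: Dict[str, List[str]] = {}
        self.requests.append("zone/sync")
        zones, self.database_token = self.db.zone_sync(self.database_token)
        for zone in zones:  # only zones reported as changed get a record/sync
            self.requests.append("record/sync " + zone)
            records, token = self.db.record_sync(zone, self.zone_tokens.get(zone, 0))
            self.zone_tokens[zone] = token
            received[zone] = records
        return received
```

After a bookmark is saved on another device, a sync in this model issues one zone/sync and then a record/sync for ReadingList only, matching what the proxy showed.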
Another feature is that these lists are already up to date when the app is re-opened. If the app is killed and restarted it already has the previous data, which sparks my interest in how they are achieving caching, but for now we will focus on what technique they are using for updating, e.g. update on coming to the foreground, background fetch or push notification. To find out, we can leave the 6s connected to the proxy and sitting on the home screen, and then use News on the iPod Touch to read and bookmark an article. We’ll use the proxy and also the new Sierra Console to gain an insight into what the 6s is doing.
A push notification! And it has the content-available flag, which shows they are using silent push, implemented using a CKNotificationInfo with shouldSendContentAvailable set. We also see the private container and the zid, which is the short version of the zone name. Apple shortens the JSON key names in pushes because packet size is limited. Because the push contains the zone ID, this would suggest they are using CKRecordZoneSubscription. Now let’s see what the proxy logged:
These look like exactly the same requests as when the app is in the foreground: finding out which zones changed and then which records within them changed. In fact the same push is used when the app is in the foreground, so now we know how the info is kept up to date. I don’t know about you, but what crosses my mind is whether they coalesce the fetch-changes requests if multiple pushes are received. I might investigate that later on.
Next, I noticed there is also a pull-to-refresh enabled on the bookmark and history table views. This was perhaps implemented as a fallback in case the push notification doesn’t arrive for some reason, so the user can force a refresh of new data. Or maybe they are using the feature to clear the cache and re-download all the bookmarks to clear up any inconsistencies? Let’s do a pull-to-refresh on the 6s and see what the proxy shows:
We see only a CKFetchRecordZoneChanges request this time, no record download. Scrolling down the request shows it is for the ReadingList zone name. Similarly, if we pull-to-refresh on the History tab we see the same request but for the ReadingHistory zone name. This shows they are only using this feature to do a sync, as a replacement for a missing push notification, rather than a full clear-cache and re-download of everything.
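Going back to the silent push for a moment, here is a rough sketch of how a receiver might pull the container and zone out of such a payload before kicking off the fetches. It is a Python model, and the payload layout is an assumption: only content-available and zid were identified in the console output above, while the ck, cid and fet key names and the function name are illustrative guesses that should be checked against a real payload.

```python
from typing import Optional, Tuple


def zone_from_silent_push(payload: dict) -> Optional[Tuple[str, str]]:
    """Return (container, zone name) for a silent CloudKit-style push, else None."""
    aps = payload.get("aps", {})
    if aps.get("content-available") != 1:
        return None  # not a silent push, nothing to fetch in the background
    ck = payload.get("ck", {})             # assumed key for the CloudKit section
    container = ck.get("cid")              # assumed short key for the container
    zone = ck.get("fet", {}).get("zid")    # zid: short key for the zone name
    if container is None or zone is None:
        return None
    return container, zone
```

In an app, a non-None result would be the trigger to fetch database changes and then the zone changes, as seen in the proxy.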
Now, it wouldn’t be a proper sync without caching, which allows the app to be killed and restarted and still show the previous info. So let’s see how they achieved that. To do that we will use a jailbroken iPod Touch on iOS 9.3.3, so there is a chance that the caching has changed on iOS 10, but hopefully this is still interesting. We will connect over SSH and browse the file system to find where the News app stores its data.
The data is in /private/var/mobile/Containers/Data/Application/ContainerID, which was found by a process of elimination: first the mobile user’s Library folder was browsed, and when nothing was found there I looked at the containers. I think storing platform (or built-in) app data in the container folder is a relatively new concept. Since container folders are named with a UUID it can be tricky to find the right one. The way I do it is to modify something in the app, like saving an article to the reading list, and then sort all the containers by date. In the screenshot above we have found the folder and we can see the private data in a folder, where I have highlighted the reading-list file. There is also a reading-list-commands file, and we also see a CloudKit folder with an SQLite database named Records.db. Let’s look at these files in editors, beginning with “reading-list”.
It looks like a binary file made with NSKeyedArchiver, and it contains the list of articleIDs and the date each was added. The file named “reading-list-commands” just looks like an array of article IDs stored in binary; my guess would be that user actions that still need to be sent to the server are cached in here. At first glance flat files look really, really bad: it means all the data is held in memory and written out whenever it changes. It’s possible they chose flat files rather than a database because fetching changes might fail with a change-token-expired error (CKErrorChangeTokenExpired), in which case the documentation says to toss the cache and start a fresh download using a nil token. In that case it does make sense to just delete a file rather than empty and rebuild a database. It would be good to know how common this scenario is, though; Apple definitely needs to provide more information about what at the moment is very black-box-like. To aid my development, I have asked on Stack Overflow whether there is a way to simulate the token-expired error. The best alternative to flat files for the model is Core Data, with automatic UI updates and table sorting, so they would have needed a very good reason to give all that up and use flat files. Or maybe they just didn’t have time?
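The appeal of a flat file when the change token expires can be sketched like this. It is a Python model under stated assumptions: JSON stands in for NSKeyedArchiver, TokenExpired stands in for the token-expired error, and FlatFileCache, sync and the fetch_changes callback are hypothetical names, not anything found in the News app.

```python
import json
import os
from typing import Callable, Dict, List, Optional, Tuple


class TokenExpired(Exception):
    """Stand-in for the server rejecting an old change token."""


class FlatFileCache:
    """Whole model kept in memory and written out as one file on every change."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.state = {"token": None, "items": {}}
        if os.path.exists(path):
            with open(path) as f:
                self.state = json.load(f)

    def save(self) -> None:
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.path)  # swap the new file into place in one step

    def reset(self) -> None:
        # Tossing the cache is just deleting one file.
        if os.path.exists(self.path):
            os.remove(self.path)
        self.state = {"token": None, "items": {}}


Fetch = Callable[[Optional[str]], Tuple[Dict[str, str], List[str], str]]


def sync(cache: FlatFileCache, fetch_changes: Fetch) -> None:
    try:
        changed, deleted, token = fetch_changes(cache.state["token"])
    except TokenExpired:
        cache.reset()  # start over with no token, as the documentation advises
        changed, deleted, token = fetch_changes(None)
    cache.state["items"].update(changed)
    for name in deleted:
        cache.state["items"].pop(name, None)
    cache.state["token"] = token
    cache.save()
```

The recovery path is two lines, which may be part of the attraction compared with emptying and rebuilding database tables.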
So now we know they are using flat files for all the private syncing, which is very interesting given that other developers have attempted to sync to Core Data. Let’s see what secrets are hiding in Records.db by opening it in an SQLite editor.
No Core Data here either! The way to tell a Core Data store is lots of capitalised “Z_”-prefixed tables and fields, and there are none, so here they are using SQLite directly. We also notice the recordID is being encoded with a colon separator, e.g. recordName:zone:owner, which is interesting because I’ve seen other developers attempt to encode all the properties of the recordID in different ways, some even storing different zones in different databases. The owner might be the creatorUserRecordID.recordName (or maybe the modifier’s), because for records your own account creates that record name is usually __defaultOwner__ rather than your own user UUID. This table even contains the containerIdentifier, which you would think would be redundant information since the app knows what containers it contacts. This reminded me that the last time I looked at the CloudKit headers I did see SQLite mentioned, so let’s open those now.
In the file list on the left we see a CKSQLite class, which is an Obj-C wrapper around the SQLite library. Opening CKRecordID we see it has the methods sqliteRepresentation and initWithSqliteRepresentation, likely for the colon-separator parsing. Finally, by searching for which class is using CKSQLite we find a large class, CKPackage (pictured), which looks like it is responsible for the database we saw and for caching all the records. How it is actually used isn’t clear: it might be loosely coupled to the operations, in that they decide which records get cached, or tightly coupled, in that the operations cache records automatically. That would require more investigation, but it could suggest caching features are coming to future CloudKit APIs. At least we now know they are using flat files for syncing uploads and downloads of private records, and SQLite for caching public records. It is sad in a way that we didn’t see a Core Data database, perhaps with a sync status flag we could have looked to for inspiration, but it also reminds us not to overthink a problem, and that the tried-and-tested technique of saving files is still fine.
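The colon-separated recordID encoding seen in Records.db can be sketched as a simple round trip. This is a Python model with hypothetical names; it is not what sqliteRepresentation actually does internally. How CloudKit treats a colon inside a name is unknown to me, so this sketch simply rejects such names.

```python
from typing import NamedTuple


class RecordID(NamedTuple):
    record_name: str
    zone_name: str
    owner_name: str


def encode(record_id: RecordID) -> str:
    """Join the three parts as recordName:zone:owner."""
    for part in record_id:
        if ":" in part:
            raise ValueError("part contains the separator: " + part)
    return ":".join(record_id)


def decode(text: str) -> RecordID:
    """Split recordName:zone:owner back into its three parts."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError("expected recordName:zone:owner, got: " + text)
    return RecordID(*parts)
```

A single text column like this keeps the whole identifier usable as a key, without separate columns or separate databases per zone.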
There we have it. It’s been a long journey but we have learned a lot. Now we know the two classes we should be focussing on to perform a sync are CKFetchDatabaseChangesOperation and CKFetchRecordZoneChangesOperation. We also know that for realtime sync we should be using silent push notifications via CKRecordZoneSubscription and CKNotificationInfo with shouldSendContentAvailable, and that when a push is received we should first sync database changes and then only the record zone changes for the zones included. We also know we should implement a manual refresh just in case push notifications don’t arrive. We learned about using flat files for caching private database sync, and how CKRecordIDs can be encoded in SQLite database fields. Did we learn anything else? Probably! In the future, we might see whether they coalesce sync requests originating from multiple pushes. Another question that came to mind is whether they re-download records to the same device that created them, which is a common dilemma in sync solutions. I hope this post has helped and that you are now starting off on the right foot for building the perfect multi-device sync app!
There is one final thing I’d like to mention, concerning a new feature of the iOS 10 API. They added the ability for the sync classes to repeat themselves to get all the data, via the fetchAllChanges properties. This is great news, since one of the big developer complaints about the CloudKit API was that it was very complicated to make the required repeat requests. The strange thing is they only added it to the classes involved in syncing, not to CKQueryOperation for example. On the one hand this shows Apple’s focus with CloudKit is on improving syncing, which is great, but it is also bad in that they got the API wrong the first time and have subsequently had to rename classes (CKFetchRecordChangesOperation -> CKFetchRecordZoneChangesOperation) and add essential properties like fetchAllChanges. It also seems rushed compared to normal. For example, a block on CKFetchRecordZoneChangesOperation is named recordZoneChangeTokensUpdatedBlock; that pluralisation just seems strange to me, since a zone only has one token, and it stands out as inconsistent with the other names used. This is the kind of thing that should get cleaned up as they iterate over the API design. Maybe Scott Forstall’s perfectionist strategy of iterating API designs ten times before release is no longer being followed.
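For anyone who has not had to write it, the repeat-request loop that fetchAllChanges saves us from looks roughly like this. It is a Python model under my own assumptions: fetch_page stands in for a single fetch-changes request returning a page of records, a new token and a more-coming flag, and make_paged_server is a hypothetical fake server used only to exercise the loop.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[str], int, bool]  # (records, new token, more coming?)


def fetch_all_changes(fetch_page: Callable[[Optional[int]], Page],
                      token: Optional[int] = None) -> Tuple[List[str], int]:
    """Keep requesting with the latest token until the server says it is done."""
    records: List[str] = []
    while True:
        page, token, more_coming = fetch_page(token)
        records.extend(page)
        if not more_coming:
            return records, token


def make_paged_server(all_records: List[str], page_size: int) -> Callable[[Optional[int]], Page]:
    """Fake server that hands out `page_size` records per request."""
    def fetch_page(token: Optional[int]) -> Page:
        start = token or 0
        end = min(start + page_size, len(all_records))
        return all_records[start:end], end, end < len(all_records)
    return fetch_page
```

With real operations each iteration means creating and enqueuing a new operation from a completion block, which is why doing it by hand was so fiddly.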