Brave New World: Safari Content Blocking


By Andrey Meshkov, co-founder and CTO at AdGuard

  • Content blocking is not a priority for Apple and WebKit.
  • Content blocking in Safari is possible despite all its issues and limitations.
  • If we want to improve it, we need to contribute to WebKit ourselves.

This article is about content blocking on Apple platforms, mainly iOS. Why is it important to talk about Apple? First of all, it’s Apple, and its market share is large enough that many users are affected by its content blocking capabilities (or lack thereof). Secondly, Manifest v3 is coming to Chromium, where, unlike in Safari, about half of the technical problems have already been solved. The two approaches have a lot in common, so comparing them lets us draw some conclusions about where Safari is falling behind. In this article, we’ll go over the content blocking methods available on iOS and see how to get around their limitations when possible.

Content blocking in general: System-wide filtering

There are only two options for content blocking on iOS: system-wide filtering and Safari Content Blocking. System-wide filtering is not as widespread as Safari Content Blocking for a number of reasons. However, it’s the only way to go beyond Safari and block content in other apps and browsers. Furthermore, system-wide filtering was possible even before Safari Content Blocking was introduced in 2015. In fact, one of the first content blockers on the App Store was a fairly popular app called WeBlock, which did system-wide filtering.

All system-wide filtering methods are based on the NEVPNManager API. Using a local tunnel, the app can filter DNS, use a PAC file to block requests, scan SNI, or even intercept TLS. You can have all of these in your app, but unfortunately each comes with downsides. There are techniques to bypass DNS filtering and PAC files, and there are also technical limitations. For example, iOS imposes a strict memory limit on VPN tunnel processes and will kill any process that uses more than 15 MB of RAM.
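To make the DNS option more concrete, here is a minimal sketch of the decision a local DNS filter makes for each query. It is written in Python purely for illustration (an actual iOS tunnel would not be written in Python), and both the function name `is_blocked` and the blocklist contents are hypothetical.

```python
def is_blocked(hostname: str, blocklist: set) -> bool:
    """Return True if the hostname or any of its parent domains is listed."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.example.com", then "b.example.com", then "example.com", ...
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocklist:
            return True
    return False


# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"ads.example.com", "tracker.example.net"}
```

A filter like this would answer blocked queries with an empty or null-routed response and forward everything else to the upstream resolver. The set lookup is cheap, but the whole blocklist has to live in the tunnel process, which is exactly where the memory limit starts to matter.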

The App Store may not be consistent with Apple’s rules

The App Store Guidelines, Section 5.4, VPN Apps, states: “Parental control, content blocking, and security apps, among others, from approved providers may also use the NEVPNManager API.” But still, there are no guarantees that your app will be allowed on the App Store.

We at AdGuard have a sad history with the App Store. Everything was great back in 2015 when we launched the app, but then in 2018, Apple suddenly decided to ban all apps that did system-wide filtering. We even had to discontinue our AdGuard Pro app after that. Then after a year or so, they changed their decision again and the guidelines now contain an exemption specifically for parental control, content blocking and security apps. So we were back in business, the app was approved, and we started working on a major update, new features, and other cool stuff. In the beginning of 2020, we uploaded a major update and it was rejected again with pretty much the same wording as they had used two years before. The reviewer told me over the phone that it wasn’t his decision; they had gathered a committee that decided that they didn’t want to have a system-wide filtering app on the App Store. So in order to pass the review, we had to make some rather drastic changes to the app, go through the App Store appeal process and review board, and only then was it approved. At the same time, I see multiple apps that do very similar things to the ones that we weren’t allowed to, and nothing happens to them. This shows that an app may pass the review process, but some time later, another committee may kick the app out of the App Store—or it might never happen.

The Safari Content Blocking API has issues and limitations…

In contrast to system-wide filtering, there’s no controversy about Safari Content Blocking: it’s definitely allowed, and it’s safe to make an app that does it—but nothing good comes without complications, so let’s see the issues and limitations of this API. Fortunately some of them can be solved; maybe not fully, but to an extent.

Safari Content Blocking comes with no dedicated debugging tools. The only tool available is the browser Console, where you can see which requests were blocked, but the Console output doesn’t tell you which rule blocked them. Figuring that out can be an annoying, time-consuming process.

AdGuard, EasyList and uBlock filters are based on the original Adblock Plus “core” syntax. It has since been extended, but the “core” part of it is the same among all popular content blockers. Safari Content Blocking rules have nothing in common with this syntax, which is a problem because we don’t want to create special Safari-only filter lists. Also, Safari just doesn’t provide tools for that. What we want is to use the good old traditional filter lists like AdGuard and EasyList. For now, the AdGuard apps convert our rules into Safari Content Blocking rules automatically, in real time, right on the device. This way we can convert about 90% of all EasyList and AdGuard filters so they’ll work on iOS.
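The following sketch shows what such a conversion looks like for the two simplest rule forms. It is an illustration only, not AdGuard’s actual converter: the function name `convert_rule` is hypothetical, only `||domain^` and `domain##selector` are handled, and the generated regular expression covers the domain plus a single subdomain level rather than the full semantics of the Adblock syntax. The JSON field names (`trigger`, `url-filter`, `if-domain`, `action`, `css-display-none`) are the ones used by Safari content blocker rules.

```python
import json
import re

DOMAIN_RE = re.compile(r"[a-z0-9.-]+")  # sketch: plain ASCII host names only


def convert_rule(rule: str) -> dict:
    """Convert one simple Adblock-style rule to a Safari content blocking rule."""
    if "##" in rule:
        # Element hiding: "example.org##.ad-banner"
        domain, selector = rule.split("##", 1)
        trigger = {"url-filter": ".*"}
        if domain:
            if not DOMAIN_RE.fullmatch(domain):
                raise ValueError(f"unsupported domain: {domain}")
            # A leading "*" makes if-domain cover subdomains as well.
            trigger["if-domain"] = ["*" + domain]
        return {
            "trigger": trigger,
            "action": {"type": "css-display-none", "selector": selector},
        }
    if rule.startswith("||") and rule.endswith("^"):
        # Request blocking: "||example.org^"
        domain = rule[2:-1]
        if not DOMAIN_RE.fullmatch(domain):
            raise ValueError(f"unsupported domain: {domain}")
        # Scheme, one optional subdomain label, the domain, then "/" or ":".
        pattern = "^[htpsw]+://([a-z0-9-]+\\.)?" + domain.replace(".", "\\.") + "[/:]"
        return {"trigger": {"url-filter": pattern}, "action": {"type": "block"}}
    raise ValueError(f"unsupported rule: {rule}")


rules = [convert_rule("||example.org^"), convert_rule("example.org##.ad-banner")]
print(json.dumps(rules, indent=2))
```

Even this toy version shows why conversion is lossy: every Adblock feature needs a hand-written mapping onto a regular expression or a trigger field, and rules with no such mapping have to be dropped.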

…And slow compiling…

This point is actually pretty massive, because it’s the reason for some other limitations. Safari compiles every content blocker’s JSON file into a “prefix tree,” and the process is quite slow. For example, it takes over two seconds on a new MacBook Pro to compile a JSON with just a little over 30K rules.

Compare this with content blockers on other platforms: it takes less than a second for the AdGuard Android app to parse and compile a list with over 100K rules. The obvious difference, though, is that our Android app uses a different syntax that is not as complicated as regular expressions; it may be less flexible, but it’s specifically optimized for matching URLs.

It’s easy to explain the next limitation. A single content blocker cannot contain more than 50K rules, and that’s a hard-coded limit. We contacted the developers of WebKit (the browser engine behind Safari), and they told us that the main reason for this limitation is how slow the compiling process is. They may increase it a little bit because new devices are faster, but that won’t magically solve all our problems. There’s no room for a substantial improvement as long as the rules are based on using regular expressions. This limitation itself is a major problem. AdGuard Base filters + EasyList have 100K rules in total and simply do not fit within the limit.

There are a couple of things we do to work around this issue. Converting our rules to Safari Content Blocking rules is not enough; we also need further modifications to make the resulting list as short as possible. One of the things we do is combine similar element-hiding rules into a single rule. This helps a lot, but it’s still not enough. Another thing we do is remove obsolete or rarely used rules from the filter lists that we use in Safari. Sometimes a rarely used rule still matters, so filter list maintainers can use special “hints” to exempt rules from this “optimization” process.
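Combining element-hiding rules can be sketched as follows. This is a simplified illustration under stated assumptions, not AdGuard’s implementation: the function name `merge_css_rules` is hypothetical, rules are merged only when their triggers are identical, and the merged rule stays at the position of the first rule in its group, which is only safe if no exception rule sits between the merged rules.

```python
import json


def merge_css_rules(rules: list) -> list:
    """Merge css-display-none rules that share an identical trigger.

    Selectors are joined with ", " so one rule hides everything the
    originals did. All other rules pass through unchanged, in order.
    """
    merged = []
    by_trigger = {}  # serialized trigger -> merged css rule
    for rule in rules:
        if rule["action"]["type"] != "css-display-none":
            merged.append(rule)
            continue
        key = json.dumps(rule["trigger"], sort_keys=True)
        if key in by_trigger:
            by_trigger[key]["action"]["selector"] += ", " + rule["action"]["selector"]
        else:
            # Copy the rule so the input list is not modified.
            new_rule = {
                "trigger": dict(rule["trigger"]),
                "action": {
                    "type": "css-display-none",
                    "selector": rule["action"]["selector"],
                },
            }
            by_trigger[key] = new_rule
            merged.append(new_rule)
    return merged
```

Generic element-hiding rules all share the same trigger, so thousands of them can collapse into a handful of rules, each carrying a long comma-separated selector.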

But that’s not all. Now, we come to the issue of multiple content blockers. AdGuard registers SIX content blockers for Safari, and the user is supposed to enable them all. So, do six content blockers actually mean that the limit is now 6 × 50K = 300K rules? Yes and no; it’s just not that simple. The problem is that these content blockers are completely independent, and the rules in them can’t influence each other. If one content blocker decides that a URL should be blocked, the others can’t undo that decision. Likewise, if one content blocker decides that some page element should be hidden, it will be hidden; the others can do nothing about it. But that’s not how it works in real life on other platforms. Different filter lists are supposed to interact with each other. A good example is the EasyList supplementary language-specific lists, which may fix issues on some local websites.
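The effect of this independence can be shown with a small simulation. It is a deliberately simplified model of the behavior described above, not WebKit’s actual matching code: the function names are hypothetical, triggers are matched with Python’s `re` module, and only the `block` and `ignore-previous-rules` actions are modeled.

```python
import re


def blocker_blocks(rules: list, url: str) -> bool:
    """Evaluate one content blocker. Rules apply in order, and an
    ignore-previous-rules action cancels earlier matches in the SAME list."""
    blocked = False
    for rule in rules:
        if re.search(rule["trigger"]["url-filter"], url):
            action = rule["action"]["type"]
            if action == "block":
                blocked = True
            elif action == "ignore-previous-rules":
                blocked = False
    return blocked


def is_request_blocked(blockers: list, url: str) -> bool:
    """Blockers are independent: a block in any one of them is final."""
    return any(blocker_blocks(rules, url) for rules in blockers)


block = {"trigger": {"url-filter": "example\\.org/widget"},
         "action": {"type": "block"}}
allow = {"trigger": {"url-filter": "example\\.org/widget"},
         "action": {"type": "ignore-previous-rules"}}

URL = "https://example.org/widget.js"
same_list = is_request_blocked([[block, allow]], URL)         # exception applies
separate_lists = is_request_blocked([[block], [allow]], URL)  # exception is lost
```

In this model `same_list` is `False` and `separate_lists` is `True`: the same two rules unblock the widget when they sit in one content blocker and leave it blocked when they are split across two. This is why simply distributing filter lists over six content blockers does not give the equivalent of one 300K-rule list.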

…And slow development

This is basically the full list of changes implemented in Safari Content Blocking:

  • 2015 – Safari Content Blocking is implemented
  • 2016 – Added one new feature (make-https) and a couple of major bugs were fixed
  • 2017 – Added one more new feature (if-top-url) which is pretty useless, if you ask me, added content blockers to WKWebView, and fixed a couple of bugs

Then it drastically slows down…

  • 2018 – fixed a couple of bugs, refactoring
  • 2019 – fixed a couple of bugs
  • 2020 – no significant changes so far

This year, together with Cliqz, Brave, Adblock Plus, and some other developers, we wrote an open letter and compiled a list of the most pressing issues. However severe those issues are, they don’t mean that the WebKit developers are deliberately undermining content blockers. To us, it just seems that content blocking is not a priority for them, or that they have limited resources, or both.

Do it yourself!

Regardless of the reasons behind WebKit’s laxness, it seems the only option we have is to do it ourselves since content blocking remains a priority to us. WebKit is open source and they are open to contributions, so that seems like a good way forward. We may want to start with a proposal or a detailed specification of the changes we would like to implement in WebKit and see if it gets approved. I hope it does, and then we can implement it ourselves.

About the Author

Andrey Meshkov is a co-founder and CTO of AdGuard adblocker. He’s been working in IT for over 15 years and has accumulated tons of experience not just in his primary work area, but also in related ones, such as online privacy. Sometimes the urge to share his thoughts becomes unbearable and he takes a break from coding to write an article or two.

Andrey can be reached online at https://twitter.com/ay_meshkov/




