Good morning, hackers. Today I’d like to pick up my series on mobile
application development. To recap, we looked at:

- Ionic/Capacitor, which makes mobile app development more like web
  app development;
- React Native, a flavor of React that renders to platform-native UI
  components rather than the Web, with ahead-of-time compilation of
  JavaScript;
- NativeScript, which exposes all platform capabilities directly to
  JavaScript and lets users layer their preferred framework on top;
- Flutter, which bypasses the platform’s native UI components to
  render directly using the GPU, and uses Dart instead of
  JavaScript/TypeScript; and
- Ark, which is Flutter-like in its rendering, but programmed via a
  dialect of TypeScript, with its own multi-tier compilation
  and distribution pipeline.
Taking a step back, with the exception of Ark, which has a special
relationship to HarmonyOS and Huawei, these frameworks are all layers on
top of what is provided by Android or iOS. Why would you do that?
Presumably there are benefits to these interstitial layers; what are
they?
Probably the most basic answer is that an app framework layer offers the
promise of abstracting over the different platforms. This way you can
just have one mobile application development team instead of two or
more. In practice you still need to test on iOS and Android at least,
but this is cheaper than having fully separate Android and iOS teams.
Given that we are abstracting over platforms, it is natural also to
abandon platform-specific languages like Swift or Kotlin. This is the
moment in the strategic planning process that unleashes chaos: there is
a fundamental element of randomness and risk when choosing a programming
language and its community. Languages exist on a hype and adoption
cycle; ideally you want to catch one on its way up, and you want it to
remain popular over the life of your platform (10 years or so). This is
not an easy thing to do and it’s quite possible to bet on the wrong
horse. However the communities around popular languages also bring
their own risks, in that they have fashions that change over time, and
you might have to adapt your platform to the language as fashions come and
go, whether or not these fashions actually make better apps.
Choosing JavaScript as your language places more emphasis on the
benefits of popularity, and is in turn a promise to adapt to ongoing
fads. Choosing a more niche language like Dart places more emphasis on
predictability of where the language will go, and ability to shape the
language’s future; Flutter is a big fish in a small pond.
There are other language choices, though; if you are building your own
thing, you can choose any direction you like. What if you used Rust?
What if you doubled down on WebAssembly, somehow? In some ways we’ll
never know unless we go down one of these paths; one has to pick a
direction and stick to it for long enough to ship something, and endless
tergiversations on such basic questions as language are not helpful.
But in the early phases of platform design, all is open, and it would be
prudent to spend some time thinking about what it might look like in one
of these alternate worlds. In that spirit, let us explore these futures
to see how they might be.
alternate world: rust
The arc of history bends away from C and C++ and towards Rust. Given
that a mobile development platform has to have some low-level code,
there are arguments in favor of writing it in Rust already instead of
choosing to migrate in the future.
One advantage of Rust is that programs written in it generally have
fewer memory-safety bugs than their C and C++ counterparts, which is
important in the context of smart phones that handle untrusted
third-party data and programs, i.e., web sites.
Also, Rust makes it easy to write parallel programs. For the same
implementation effort, we can expect Rust programs to make more
efficient use of the hardware than C++ programs.
And relative to JavaScript et al, Rust also has the advantage of
predictable performance: it requires quite a good ahead-of-time
compiler, but no adaptive optimization at run-time.
These observations are just conversation-starters, though, and when it
comes to imagining what a real mobile device would look like with a Rust
application development framework, things get more complicated.
Firstly, there is the approach to UI: how do you get pixels on the
screen and events from the user? The three general solutions are to use
a web browser engine, to use platform-native widgets, or to build
everything in Rust using low-level graphics primitives.
The first approach is taken by the
Tauri
framework: an app is broken into two pieces, a Rust server and an
HTML/JS/CSS front-end. Running a Tauri app creates a WebView in which
to run the front-end, and establishes a bridge between the web client
and the Rust server. In many ways the resulting system ends up looking
a lot like Ionic/Capacitor, and many of the UI questions are left open
to the user: what UI framework to use, all of the JavaScript
programming, and so on.
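Tauri generates this bridge plumbing for you from `#[tauri::command]`
annotations on Rust functions; to make the shape of the pattern concrete,
here is a toy, standard-library-only sketch of the dispatch idea — a
command name and payload come in from the WebView, a registered Rust
handler produces the response. The `Bridge`/`invoke` names are
illustrative, not Tauri’s actual API.

```rust
use std::collections::HashMap;

// Each handler takes the front-end's payload and returns a response.
type Handler = Box<dyn Fn(&str) -> String>;

struct Bridge {
    handlers: HashMap<String, Handler>,
}

impl Bridge {
    fn new() -> Self {
        Bridge { handlers: HashMap::new() }
    }

    // The host registers commands; Tauri derives this from attributes.
    fn register(&mut self, name: &str, f: impl Fn(&str) -> String + 'static) {
        self.handlers.insert(name.to_string(), Box::new(f));
    }

    // Conceptually, this is what a postMessage from the WebView reaches.
    fn invoke(&self, name: &str, payload: &str) -> Option<String> {
        self.handlers.get(name).map(|f| f(payload))
    }
}

fn main() {
    let mut bridge = Bridge::new();
    bridge.register("greet", |name| format!("Hello, {}!", name));
    assert_eq!(bridge.invoke("greet", "hackers"),
               Some("Hello, hackers!".to_string()));
}
```

In the real framework the payloads are JSON and the calls are
asynchronous, but the division of labor is the same: UI in the WebView,
capabilities in Rust.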
Instead of using a platform’s WebView library, a Rust app could instead
ship a WebView. This would of course make the application binary size
larger, but tighter coupling between the app and the WebView may allow
you to run the UI logic from Rust itself instead of having a large JS
component. Notably this would be an interesting opportunity to adopt
the Servo web engine, which is itself written in
Rust. Servo is a project that in many ways exists in potentia; with
more investment it could become a viable alternative to Gecko, Blink, or
WebKit, and whoever does the investment would then be in a position of
influence in the web platform.
If we look towards the platform-native side, though there are quite a
number of Rust libraries that provide wrappers to native
widgets, practically all of these
primarily target the desktop. Only
cacao supports iOS widgets, and
there is no equivalent binding for Android, so any
NativeScript-like
solution in Rust would require a significant amount of work.
In contrast, the ecosystem of Rust UI libraries that are implemented on
top of OpenGL and other low-level graphics facilities is much more
active and interesting. Probably the best recent overview of this
landscape is by Raph Levien (see the “quick tour of existing
architectures” subsection). In
summary, everything is still in motion and there is no established
consensus as to how to approach the problem of UI development, but there
are many interesting experiments in progress. With my engineer hat on,
exploring these directions looks like fun. As Raph notes, some degree
of exploration seems necessary as well: we will only know if a given
approach is a good idea if we spend some time with it.
However if instead we consider the situation from the perspective of
someone building a mobile application development framework, Rust seems
more of a mid/long-term strategy than a concrete short-term option.
Sure, build low-level libraries in Rust, to the extent possible, but
there is no compelling-in-and-of-itself story yet that you can sell to
potential UI developers, because everything is still so undecided.
Finally, let us consider the question of scripting: sometimes you need
to add logic to a program at run-time. It could be because actually
most of your app is dynamic and comes from the network; in that case
your app is like a little virtual machine. If your app development
framework is written in JavaScript, like Ionic/Capacitor, then you have
a natural solution: just serve JavaScript. But if your app is written
in Rust, what do you do? Waiting until the app store pushes a new
version of the app to the user is not an option.
There would appear to be three common solutions to this problem. One is
to use JavaScript — that’s what Servo does, for example. As a web
engine, Servo doesn’t have much of a choice, but the point stands.
Currently Servo embeds a copy of SpiderMonkey, the JS engine from
Firefox, and it does make sense for Servo to take advantage of an
industrial, complete JS engine. Of course, SpiderMonkey is written in
C++; if there were a JS engine written in Rust, probably Rust
programmers would prefer it. Also it would be fun to write, or rather,
fun to start writing; reaching the level of ECMA-262 conformance of
SpiderMonkey is at least a hundred-million-dollar project. Anyway what
I am saying is that I understand why Boa was
started, and I wish them the many millions of dollars needed to see it
through to completion.
You are not obliged to script your app via JavaScript, of course; there
are many languages out there that have “extending a low-level core” as
one of their core use cases. I think the mixed success that this
approach has had over the years—who embeds Python into an iPhone
app?—should probably rule out this strategy as a core part of an
application development framework. Still, I should mention one
Rust-specific option, Rhai; the pitch is that by
being Rust-specific, you get more expressive interoperation between Rhai
and Rust than you would between Rust and any other dynamic language.
Still, it is not a solution that I would bet on: Rhai internalizes so
many Rust concepts (notably around borrowing and lifetimes) that I think
you have to know Rust to write effective Rhai, and knowing both is quite
rare. Anyone who writes Rhai would probably rather be writing Rust, and
that’s not a good equilibrium.
The third option for scripting Rust is WebAssembly. We’ll get to that
in a minute.
alternate world: the web of pixels
Let’s return to Flutter for a moment, if you will. Like the more active
Rust GUI development projects, Flutter is an all-in-one rendering
framework based on low-level primitives; all it needs is Vulkan or Metal
or (soon) WebGPU, and it handles the rest, layering on opinionated
patterns for how to build user interfaces. It didn’t arrive to this
state in a day, though. To hear Eric Seidel tell the
story, Flutter began as a
kind of “reset” for the Web, a conscious attempt to determine from the
pieces that compose the Web rendering stack, which ones enable smooth
user interfaces and which ones get in the way. After taking away all of
the parts they didn’t need, Flutter wasn’t left with much: just GPU
texture layers, a low-level drawing toolkit, and the necessary bindings
to input events. Of course what the application programmer sees is much
more high-level, but underneath, these are the platform primitives that
Flutter uses.
So, imagine you work at Google. You used to work on the web—maybe on
WebKit and then Chrome like Eric, maybe on web standards—but you broke
with this past to see what Flutter might become. Flutter works: great
job everybody! The set of graphical and input primitives that you use
is minimal enough that it is abstract by nature; it doesn’t much matter
whether you target iOS or Android, because the primitives will be there.
But the web is still the web, and it is annoying, aesthetically
speaking. Could we Flutter-ize the web? What would that mean?
That’s exactly what former HTML specification editor and now Flutter
team member Ian Hixie proposed this January in a brief manifesto,
Towards a modern Web
stack.
The basic idea is that the web and thus the browser is, well, a bit
much. Hixie proposed to start over, rebuilding the web on top of
WebAssembly (for code),
WebGPU (for
graphics),
WebHID
(for input), and ARIA (for
accessibility). Technically it’s a very interesting proposition! After
all, people that build complex web apps end up having to fight with the
platform to get the results they want; if we can reorient them to focus
on these primitives, perhaps web apps can compete better with native
apps.
However if you game out what is being proposed, I have doubts. The
existing web is largely HTML, with JavaScript and CSS as add-ons: a web
of structured text. Hixie’s flutterized web proposal, on the other
hand, is a web of pixels. This has a number of implications. One is
that each app has to ship its own text renderer and internationalization
tables, which is a bit silly to say the least. And whereas we take it
for granted that we can mouse over a web page and select its text, with
a web of pixels it is much less obvious how that would happen. Hixie’s
proposal is that apps expose structure via
ARIA, but as far as I understand there is
no association between pixels and ARIA properties: the pixels themselves
really have no built-in structure to speak of.
And of course unlike in the web of structured text, in a web of pixels
it would be up to each app to actually describe its structure via ARIA:
it’s not a built-in part of the system. But if you combine this with
the rendering story (here’s WebGPU, now draw the rest of the
owl),
Hixie’s proposal leaves a void for frameworks to fill between what the
app developer wants to write (e.g. Flutter/Dart) and the platform
(WebGPU/ARIA/etc).
I said before that I had doubts and indeed I have doubts about my
doubts. I am old enough to remember when X11 apps on Unix
desktops changed from having fonts
rendered on the server (i.e. by the operating system) to having them
rendered on the client (i.e. the app), which was associated with a
similar kind of anxiety. There were similar factors at play:
slow-moving standards (X11) and not knowing at build-time what the
platform would actually provide (which X server would be in use, etc).
But instead of using the server, you could just ship pixels, and that’s
how GNOME got good text rendering, with
Pango and
FreeType and
fontconfig, and
eventually HarfBuzz, the text
shaper used in Chromium and Flutter and many other places. Client-side
fonts not only enabled more complex text shaping but also eliminated
some round-trips for text measurement during UI layout, which is a bit
of a theme in this article series. So could it be that pixels instead
of text do not represent an apocalypse for the web? I don’t know.
Incidentally I cannot move on from this point without pointing out
another narrative thread, which is that of continued human effort over
time. Raph Levien, who I mentioned above as a
Rust UI toolkit developer, actually spent quite some time doing graphics
for GNOME in the early 2000s; I remember working with his libart_lgpl.
Behdad Esfahbod, author of HarfBuzz, built many
parts of the free software text rendering stack before moving on to
Chrome and many other things. I think that if you work on this low
level where you are constantly translating text to textures, the
accessibility and interaction benefits of using a platform-provided text
library start to fade: you are the boss of text around here and you can
implement the needed functionality yourself. From this perspective,
pixels don’t represent risk at all. In the old days of GNOME 2,
client-side font rendering didn’t lead to bad UI or poor accessibility.
To be fair, there were other factors pushing to keep work in a commons,
as the actual text rendering libraries still tended to be shipped with
the operating system as shared libraries. Would similar factors prevail
in a statically-linked web of pixels?
In a way it’s a moot question for us, because in this series we are
focussing on native app development. So, if you ship a platform, should
your app development framework look like the web-of-pixels proposal, or
something else? To me it is clear that as a platform, you need more.
You need a common development story for how to build user-facing apps:
something that looks more like Flutter and less like the primitives that
Flutter uses. Though you surely will include a web-of-pixels-like
low-level layer, because you need it yourself, probably you should also
ship shared text rendering libraries, to reduce the install size for
each individual app.
And of course, having text as part of the system has the side benefit of
making it easier to get users to install OS-level security patches: it
is well-known in the industry that users will make time for the update
if they get a new goose emoji in exchange.
alternate world: webassembly
Hark! Have you heard the good word? Have you accepted your Lord and
savior, WebAssembly, into your heart? I
jest; it does sometimes feel like messianic narratives surrounding
WebAssembly prevent us from considering its concrete aspects. But
despite the hype, WebAssembly is clearly a technology that will be a
part of the future of computing. So let’s dive in: what would it mean
for a mobile app development platform to embrace WebAssembly?
Before answering that question, a brief summary of what WebAssembly is.
WebAssembly 1.0 is a portable bytecode format that is a good compilation
target for C, C++, and Rust. These languages have good compiler
toolchains that can produce WebAssembly. The nice thing is that when
you instantiate a WebAssembly module, it is completely isolated from its
host: it can’t harm the host (approximately speaking). All points of
interoperation with the host are via copying data into memory owned by
the WebAssembly guest; the compiler toolchains abstract over these
copies, allowing a Rust-compiled-to-native host to call into a
Rust-compiled-to-WebAssembly module using idiomatic Rust code.
So, WebAssembly 1.0 can be used as a way to script a Rust application.
The guest script can be
interpreted, compiled just
in time, or compiled ahead of time for peak throughput.
Of course, people that would want to script an application probably want
a higher-level language than Rust. In a way, WebAssembly is in a
similar situation as WebGPU in the web-of-pixels proposal: it is a
low-level tool that needs higher-level toolchains and patterns to bridge
the gap between developers and primitives.
Indeed, the web-of-pixels proposal specifies WebAssembly as the compute
primitive. The idea is that you ship your application as a WebAssembly
module, and give that module WebGPU, WebHID, and ARIA capabilities via
imports.
Such a WebAssembly module doesn’t script an existing application: it
is the app. So another way for an app development platform to use
WebAssembly would be like how the web-of-pixels proposes to do it: as an
interchange format and as a low-level abstraction. As in the scripting
case, you can interpret or compile the module. Perhaps an
infrequently-run app would just be interpreted, to save on disk space,
whereas a more heavily-used app would be optimized ahead of time, or
something.
We should mention another interesting benefit of WebAssembly as a
distribution format, which is that it abstracts over the specific
chipset on the user’s device; it’s the device itself that is responsible
for efficiently executing the program, possibly via compilation to
specialized machine code. I understand for example that RISC-V people
are quite happy about this property because it lowers the barrier to
entry for them relative to an ARM monoculture.
WebAssembly does have some limitations, though. One is that if the
throughput of data transfer between guest and host is high, performance
can be bad due to copying overhead. The nascent memory-control
proposal
aims to provide an mmap capability, but it is still early days. The
need to copy would be a limitation for using WebGPU primitives.
More generally, as an abstraction, WebAssembly may not be able to
express programs in the most efficient way for a given host platform.
For example, its SIMD operations work on 128-bit vectors, whereas host
platforms may have much wider vectors. Any current limitation will
recede with time, as WebAssembly gains new features, but every year brings new hardware capabilities (tensor operation accelerator,
anyone?), so there will be some impedance-matching to do for the
foreseeable future.
The more fundamental limitation of the 1.0 version of WebAssembly is
that it’s only a good compilation target for some languages. This is
because some of the fundamental parts of WebAssembly that enable
isolation between host and guest (structured control flow, opaque stack,
no instruction pointer) make it difficult to efficiently implement
languages that need garbage collection, such as Java or Go. The coming
WebAssembly 2.0 starts to address this need by including low-level
managed arrays and records, allowing for reasonable ahead-of-time
compilation of languages like Java. Getting a dynamic language like
JavaScript to compile to efficient WebAssembly can still be a challenge,
though, because many of the just-in-time techniques needed to
efficiently implement these languages will still be missing in
WebAssembly 2.0.
Before moving on to WebAssembly as part of an app development framework,
one other note: currently WebAssembly modules do not compose very well
with each other and with the host, requiring extensive toolchain support
to enable e.g. the use of any data type that’s not a scalar integer or
floating-point value. The component model working
group is trying to
establish some abstractions and associated tooling, but (again!) it is
still early days. Anyone wading into this space needs to be prepared to
get their hands dirty.
To return to the question at hand, an app development framework can use
WebAssembly for scripting, though the problem of how to compose a host
application with a guest script requires good tooling. Or, an app
development framework that exposes a web-of-pixels primitive layer can
support running WebAssembly apps directly, though again, the set of
imports remains to be defined. Either of these two patterns can stick
with WebAssembly 1.0 or also allow for garbage collection in WebAssembly
2.0, aiming to capture mindshare among a broader community of potential
developers, potentially in a wide range of languages.
As a final observation: WebAssembly is ecumenical, in the sense that it
favors no specific church of how to write programs. As a platform,
though, you might prefer a state religion, to avoid wasting internal and
external efforts on redundant or ill-advised development. After all, if
it’s your platform, presumably you know best.
summary
What is to be done?
Probably there are as many answers as people, but since this is my blog,
here are mine:
- On the shortest time-scale I think that it is entirely reasonable to
  base a mobile application development framework on JavaScript. I
  would particularly focus on TypeScript, as late error detection is
  more annoying in native applications.
- I would build something that looks like Flutter underneath:
  reactive, based on low-level primitives, with a multithreaded
  rendering pipeline. Perhaps it makes sense to take some inspiration
  from WebF.
- In the medium term I am sympathetic to Ark’s desire to extend the
  language in a more ResultBuilder-like direction, though this is not
  without risk.
- Also in the medium term I think that modifications to TypeScript to
  allow for sound typing could provide some of the advantages of
  Dart’s ahead-of-time compiler to JavaScript developers.
- In the long term… well we can do all things with unlimited
  resources, right? So after solving climate change and homelessness,
  it makes sense to invest in frameworks that might be usable 3 or 5
  years from now. WebAssembly in particular has a chance of sweeping
  across all platforms, and the primitives for the web-of-pixels will
  be present everywhere, so if you manage to produce a compelling
  application development story targeting those primitives, you could
  eat your competitors’ lunch.
Well, friends, that brings this article series to an end; it has been
interesting for me to dive into this space, and if you have read down to
here, I can only think that you are a masochist or that you have also
found it interesting. In either case, you are very welcome. Until next
time, happy hacking.