0019 - Adoption of Web Push
Context and Problem Statement
Millions of users now rely on push notifications for background vault syncs and passwordless login requests. As the platform has grown, so has the burden of maintaining the current SignalR-based solution, which uses WebSockets and places significant pressure on cloud components during deployments and other application updates. Combined with the need to also support mobile devices with proprietary push notification protocols, a next-generation, hybridized service offering is needed to simplify and scale up notification delivery.
Modern Infrastructure Management
The notifications service is an independently deployed component and, for the Bitwarden-hosted version, is backed by dozens of virtual machines, each supporting tens of thousands of concurrent connections. With the move to less manual component maintenance in the cloud -- specifically Kubernetes -- keeping such a large number of connections alive across pods is complex, and performance spikes are expected.
SignalR has, at face value, worked well even with much larger workloads, but given the nature of the work being performed, relying on persistent WebSocket connections for certain clients has become tenuous. In aggregate, the volume of input and output from the notifications service is quite large and is used almost entirely for background-oriented operations. Various newer technologies are better suited to synchronization work. The technology or protocol chosen must be usable at large scale in the cloud as well as for self-hosted installations.
Browser extension changes and mandates such as Manifest V3 additionally present support problems with long-lived background connections.
Cost and Reliability
New solutions must not only scale to an effectively unbounded number of connections but also balance cost with user growth, all while delivering high availability. Existing options for mobile notifications specifically either have device limits (e.g. Azure Notification Hubs) or lack service level guarantees. Furthermore, some clients (e.g. F-Droid) cannot utilize well-established (albeit proprietary) push backends. A single service provider is desired for as much functionality as possible, while still offering flexibility for self-host.
With respect to mobile notifications:
- Work around Azure Notification Hubs limits - Since all mobile devices, even for self-hosted installations, must utilize the Bitwarden cloud for push due to certificate security, devise a solution that shards devices across many Notification Hubs within new subscriptions.
- Adopt a new offering for mobile push notifications - Use something other than Azure Notification Hubs that doesn't have limits, perhaps with some technology sacrifices.
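The sharding idea in the first option can be illustrated with a minimal sketch. It assumes a fixed list of hubs and a stable device identifier; the hub names, the `hubForDevice` function, and the choice of an FNV-1a hash are all hypothetical and are not part of the decision itself.

```typescript
// Hypothetical hub names; the ADR does not specify a naming scheme.
const HUBS: readonly string[] = ["hub-0", "hub-1", "hub-2", "hub-3"];

/** 32-bit FNV-1a hash over the UTF-16 code units of a string (unsigned result). */
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, 32-bit wraparound
  }
  return hash >>> 0; // force an unsigned 32-bit integer
}

/** Deterministically maps a device identifier to one of the configured hubs. */
function hubForDevice(deviceId: string, hubs: readonly string[]): string {
  if (hubs.length === 0) {
    throw new Error("at least one hub is required");
  }
  return hubs[fnv1a(deviceId) % hubs.length];
}
```

Plain modulo sharding reassigns most devices whenever the hub count changes, so a real deployment would more likely persist each device's hub assignment at registration time or use a consistent-hashing scheme.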
With respect to protocol modernization:
- Keep the SignalR solution and continue to scale up - Maintain the cluster of notifications service virtual machines and keep pushing for a larger set to handle scale. Also move all mobile notifications to the custom solution and abandon Azure Notification Hubs.
- Utilize a new approach for non-mobile push notifications - Use something other than SignalR like a homegrown Web Push backend inside the notifications service. Host a compatible Web Push backend for self-host with the clients that need it. Use Web Push with Azure Notification Hubs for the Bitwarden cloud.
- Adopt a new combined push service provider - Not only migrate away from SignalR where possible but also select a different service provider than Azure Notification Hubs that supports native mobile notification protocols. Additionally implement a Web Push backend for self-host as described above.
Chosen options: Work around Azure Notification Hubs limits and utilize a new approach for non-mobile push notifications.
During research and planning of the decision, Azure Notification Hubs unveiled support for the Web Push protocol, which significantly changed the outcome. By continuing to use Azure Notification Hubs, a significant technology investment can continue to be leveraged, users will not need to be migrated, and the focus can instead be on scale and support for new device types that can benefit from Web Push.
- Web Push is well-supported in most places we need it and is a valuable technology upgrade with ease of maintenance in the future.
- Several service providers -- including Azure Notification Hubs -- offer Web Push backends for our cloud offering, and the protocol can be implemented within the notifications service for self-host.
- Web Push fits well into Manifest V3 and service workers.
- Infrastructure maintenance burdens and cost for the notifications service should significantly decrease.
- Safari support for Web Push, which was released fairly recently at the time of writing, needs to be monitored.
- Potential cost increases for utilizing multiple Azure Notification Hubs.
The notifications service will continue to exist and support APIs for the SignalR connections as well as new ones for Web Push. Clients will connect using Web Push whenever feasible and otherwise fall back to the existing SignalR implementation. Self-host clients will utilize a similar blend of Web Push and SignalR provided by the notifications service. Web Push's necessary key exchange and security (VAPID) can use existing in-house technology for self-host and the service provider for the cloud. Clients will largely migrate to Web Push connections over time and the load on SignalR will significantly reduce, although SignalR is planned to be supported for certain clients for the foreseeable future.
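The "Web Push whenever feasible, otherwise SignalR" rule can be pictured as a small capability check on the client. The sketch below is only an illustration under assumptions: the `ClientCapabilities` shape, the `serverOffersWebPush` flag, and the `chooseTransport` function are hypothetical names, and the actual feasibility criteria are not defined in this record.

```typescript
type Transport = "web-push" | "signalr";

// Hypothetical capability snapshot gathered by a client at startup.
interface ClientCapabilities {
  hasServiceWorker: boolean; // in a browser: "serviceWorker" in navigator
  hasPushManager: boolean; // in a browser: "PushManager" in window
  serverOffersWebPush: boolean; // assumed flag from the server's configuration
}

/** Prefers Web Push when every prerequisite is present; otherwise falls back to SignalR. */
function chooseTransport(caps: ClientCapabilities): Transport {
  const canUseWebPush =
    caps.serverOffersWebPush && caps.hasServiceWorker && caps.hasPushManager;
  return canUseWebPush ? "web-push" : "signalr";
}
```

Keeping the decision in a pure function like this makes the fallback behavior easy to test without a browser, and the same check could be re-run if a Web Push subscription later fails.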
Mobile devices will continue to use Azure Notification Hubs with native iOS (APNs) and Android (FCM) push notification protocols alongside Web Push. Future support for UnifiedPush will be considered alongside Web Push for incompatible clients, although the SignalR implementation continues to be available.
Utilizing end-to-end encryption -- with user encryption keys -- of push notification payloads will be considered while migrating compatible clients to the new provider, to provide stronger security for potentially sensitive payload contents.