Cross-region replication for Vaults

Cross-region replication enables automatic, asynchronous copying of encryption keys between Vaults across OCI regions, reducing latency and improving compliance, security, and disaster recovery.

Company

Oracle Corporation

Role

Product Designer

Year

2022 - 2023

What is cross-region replication and why are we offering it?

Oracle Cloud Infrastructure (OCI) Vault is a fully managed key and secret management service that lets organizations control the keys and secrets that protect their data. Before this feature launched, Vault was a region-specific service, and all its underlying resources were stored in the same OCI region where the vault was created.

With the replication feature for Vault, organizations now have the flexibility to replicate their keys to any OCI geographic region within a realm. Cross-region replication (CRR) is a breakthrough feature because none of our direct competitors provide replication as a user-controlled feature within their console UI.

What user problem are we solving?

1️⃣

Compliance

Our existing region-specific key management service isn’t enough for customers whose critical workloads require data redundancy and business continuity across regions to meet compliance needs.

2️⃣

Application latency

Customer data accessed from multiple geographic locations is stored and processed close to users to minimize latency. Because of the existing limitation of our service, customers cannot store keys close to the data they protect.

3️⃣

Continuous scaling

Our customers make business decisions based on their cloud capabilities and they need security features that can scale to match these cloud capabilities.

What do our users have to say?

🗣️

We use Data Guard to replicate our data for fault tolerance. We need our keys to replicate to the same region as the data.

🗣️

We have fully contained config pods that run Oracle services. Each pod has a duplicate pod in another region to ensure service continuity.

🗣️

New regulatory concerns keep arising that require us to rethink and improve our business continuity and recovery protocols.

🗣️

Without having the keys to protect our data in the same region, we paused our plans to store data in other regions.

🗣️

Having a key management system that scales with our cloud requirements would be helpful, as security is the top priority for us.

Initial user flow

After my first kick-off meeting with the stakeholders, I proposed a user flow in which users could start replicating a vault while creating it. This way, users could enable replication at creation time and still edit the destination region later.

Easy, right? WELL, NO.

Chicken or egg dilemma

We got pushback from developers on this approach. To enable replication for the first time, a policy has to be applied to the vault that gives OCI permission to perform continuous replication on it. If we enabled replication in the vault creation flow, the policy-enabling API call would fail because the vault would not exist yet.

Redefining the entire experience

I went back to the drawing board and worked with the stakeholders to redefine the end-to-end experience.

Instead of just focusing on the replication part, I took a more holistic approach by dividing the flow into three parts: what happens before, during and after the replication activity. This helped me cover all the use cases that could occur within this workflow.

Final user flow

Based on the new approach, we created a new user flow. We decoupled the replication feature from vault creation, which means users now create the vault first and then enable replication from within the vault details. This experience is easier to implement, solves our policy API problem, and keeps the creation flow simple.
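The ordering constraint behind this flow can be sketched in a few lines of Python. This is a toy in-memory model, not the OCI SDK or the team's implementation: the `VaultService` class, its method names, the exception types, and the vault name are all hypothetical, and the region identifiers are only illustrative. It shows why the policy call cannot run inside the creation step, and why creating the vault first removes the problem.

```python
class VaultNotFound(Exception):
    """Raised when an operation targets a vault that does not exist yet."""


class PolicyMissing(Exception):
    """Raised when replication is requested before the policy is in place."""


class VaultService:
    """Hypothetical in-memory stand-in for the vault backend."""

    def __init__(self):
        # vault name -> {"region", "policy", "replica_region"}
        self.vaults = {}

    def create_vault(self, name, region):
        self.vaults[name] = {"region": region, "policy": False, "replica_region": None}

    def attach_replication_policy(self, name):
        # The policy is applied to an existing vault, so this fails
        # if it is called before the vault has been created.
        if name not in self.vaults:
            raise VaultNotFound(name)
        self.vaults[name]["policy"] = True

    def enable_replication(self, name, destination_region):
        vault = self.vaults.get(name)
        if vault is None:
            raise VaultNotFound(name)
        if not vault["policy"]:
            raise PolicyMissing(name)
        vault["replica_region"] = destination_region
        return vault


def final_flow(service, name, source_region, destination_region):
    """Final user flow: create first, then grant the policy, then replicate."""
    service.create_vault(name, source_region)
    service.attach_replication_policy(name)
    return service.enable_replication(name, destination_region)
```

Calling `final_flow(VaultService(), "payments", "us-ashburn-1", "us-phoenix-1")` succeeds, whereas calling `attach_replication_policy` on a vault that has not been created raises `VaultNotFound`, which mirrors the failure the developers described for the initial flow.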

Design principles

👁️

Discoverability

We must make it easier for our users to discover and engage with this new feature within our existing experience.

🏃

Minimize distraction

We must design an experience that is consistent throughout and reduces the chances of our users getting distracted.

💎

Increasing ease of use

We must design an experience that lets our users accomplish their goals effortlessly.

🪩

Transparency

We must design an experience that ensures open communication with our users across their journey, from start to end.

🪴

Scalability

We must design an experience that goes beyond our initial vision and can scale to fit future changes and decisions.

Education and entry point

According to the OCI Security team, cross-region replication is one of the most important security features. With its launch, we wanted to try a different way to promote the feature and get users to try it out.

So, aside from our usual blog announcement and inclusion in the What's New section on OCI's home page, we also promoted the launch of cross-region replication within the Vault details page. This allows users to engage with the feature almost immediately by clicking on the action below and getting started.

Letting our users focus on what's important

When users work with cross-region replication for the first time, the Vault service requires users to grant us access to make changes across tenancies to complete the replication action.

Generally, users would have to navigate outside their current service to enable policies within Access Management in OCI. However, asking users to leave the task at hand breaks their overall flow and is not a good user experience. Hence, we integrated the policy enablement step into our workflow for a seamless, continuous user experience.

Keep it simple, stupid!

Replicating a resource via API is a fairly easy process for those who can do it. We wanted to provide the same experience when users work within our UI too. Users should be able to reach their goal with minimum intervention.

As a result, users do not need to provide any details or additional information when they replicate their vault. We handle everything on the back end, even automatically assigning a name to the replicated vault.

Ease of access with room for scalability

Once the replication is successfully completed, users can easily access the information related to their replicated vault from within the source vault itself. We ensure open and continuous communication throughout the replication lifecycle by providing our users with clear status updates, including error messages and possible solutions.

For the MVP, we support replication from only one source region to only one destination region. However, we designed the experience with an eye on a future use case in which users can replicate a vault to multiple regions.

Business impact

1,500+ enterprise customers have used the replication feature since its launch

100,000+ vaults replicated

Oracle Cloud Infrastructure earns 5 points on Gartner’s Magic Quadrant for Cloud Platform

Retrospective

This project was interesting and challenging because we had to provide a simple way to perform an operation that is simple at the API level but can seem very complicated at the UI level.

Working with an enterprise tool can be a daunting task for cloud users. There are so many aspects intertwined with carrying out even the minutest of operations on a cloud, and it's the product's responsibility to ensure that users are always in control of their actions.

With this simple design, we aim to maintain transparency with users and allow them to perform a complex replication action with just a few clicks.

Designed and built in Seattle 🌊 🦆
