09 Mar 2023

Building event-driven architecture for member system

At first, Baemin was created as a single project.

Our orders grew rapidly, charting a J curve, and traffic naturally grew along with order volume.

The exploding traffic was too much for a single system with a single database to handle, so Baemin endured a long stretch riddled with failures.

Hence, Baemin set out to switch to microservices. The migration was completed on November 1, 2019, with the separation of all systems, and the platform stabilized.

This is from “Baemin’s microservice journey,” presented by Younghan Kim at WOOWACON 2020.

With the switch to microservices, Baemin entered the era of event-driven architecture.

Many companies and developers, in Korea and abroad, are talking about MSA, including in the “microservice journey” talk at WOOWACON 2020. Viewed from a distance, event-driven architecture now seems like a fairly familiar concept; viewed up close, from within a single system, it is still new territory.

So in this post, we’d like to introduce how Baemin handles event-driven architecture within a single system.

What to publish as events?

Why is being event-driven brought up whenever we talk about microservice architecture (MSA hereafter)?

It’s related to loose coupling, one of MSA’s core keywords. A loosely coupled microservice reduces its dependency on, and impact over, other systems, and a highly cohesive system emerges when each system focuses on its own purpose. An event-driven approach helps with both.

To aid understanding, let’s take the relationship between two of Baemin’s domains, member and family account, as an example.

There’s a policy that “if a member’s identity verification is initialized, the member must be withdrawn from the family account service.”

In code, this policy looked like the example below.
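
The original post’s code isn’t reproduced here; this is a minimal sketch of that tightly coupled shape, with hypothetical class and method names:

```java
// Hypothetical sketch: the member domain calls the family account domain
// directly inside its own logic.
class FamilyAccountService {
    void withdraw(long memberId) { /* remove the member from the family account */ }
}

class MemberService {

    private final FamilyAccountService familyAccountService;

    MemberService(FamilyAccountService familyAccountService) {
        this.familyAccountService = familyAccountService;
    }

    void removeIdentityVerification(long memberId) {
        clearVerification(memberId);              // the member domain's own action
        familyAccountService.withdraw(memberId);  // another domain's policy, hard-coded here
    }

    private void clearVerification(long memberId) { /* update member state */ }
}
```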

The family account withdrawal logic is deeply entangled with the member’s identity-verification removal logic; the two are strongly coupled.

As we reorganized into microservices, the two domains were separated into different systems: the member system and the family account system. A physical separation now exists between two domains that used to live in one system.

With this physical separation, what used to be code-level calls became synchronous HTTP communication. However, the intention of calling the target domain remains, so the coupling can hardly be called loose just because the systems are physically separated.

The first thing that comes to mind for removing the physical dependency is asynchronous communication. Typical asynchronous approaches include making HTTP calls on a separate thread and using a messaging system.

First, HTTP requests on a separate thread, detached from the main flow.

Because the call runs on a separate thread, the direct coupling to the main flow is removed. However, the intention of calling the target domain still remains, so from a system perspective the coupling is still not loose.
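
As an illustration only (the endpoint, names, and use of Spring’s @Async are assumptions), such a call could be moved off the main flow like this:

```java
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
class FamilyAccountClient {

    private final RestTemplate restTemplate = new RestTemplate();

    @Async // requires @EnableAsync on a @Configuration class
    public void requestWithdrawal(long memberId) {
        // The call no longer blocks the main flow, but the intention
        // ("withdraw this member from the family account") still lives here.
        restTemplate.delete("https://family-account.example/api/members/{id}", memberId);
    }
}
```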

Next, sending messages through a messaging system. You might expect the coupling to be loose simply because a messaging system is used, but an architecture built on a messaging system doesn’t automatically guarantee loose coupling.

Let’s say a message requesting withdrawal from the family account is sent when a member’s identity verification is removed. Sending the message eliminates the physical dependency, but it doesn’t loosen the coupling.

Because the published message expects a withdrawal from the family account, the member system’s message must change whenever the family account system’s policy changes. When a message’s publisher dictates what to do (a command), both the publisher’s and the receiver’s code must change whenever that task changes, so a high degree of coupling remains.

In addition, a logical dependency remains: the member system still knows about the family account’s business, so the coupling cannot be considered loose. The coupling is low physically, but high conceptually.

The dependency survives even though a message is published, because the message carries an expectation of the target domain. A message that expects something of the target domain is not an event; it is merely an asynchronous request made over a messaging system.

Let’s say an identity verification removal event is sent when a member’s identity verification is removed. The member system doesn’t know the family account system’s policies any longer. The family account system subscribes to identity verification removal events to implement the family account system’s business. The member system is no longer affected by changes in the family account system’s business. Now, the coupling is loose between the two systems.
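
The difference is visible in the shape of the message itself; the two records below are hypothetical illustrations of the contrast between a command and an event:

```java
import java.time.Instant;

// A command: tells the family account system what to do. When the family
// account policy changes, this message (and its publisher) must change too.
record WithdrawFromFamilyAccount(long memberId) {}

// An event: states only what happened in the member domain. Each subscriber
// decides for itself what to do about it.
record IdentityVerificationRemoved(long memberId, Instant occurredAt) {}
```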


We’ve traced how the dependency evolves from physical system separation, to asynchronous HTTP communication, to the event-based approach.

A messaging system removes the physical dependency, but we found that the result differs completely depending on the intention carried by the message.

What we should publish is not the purpose we want to achieve through the domain event, but the domain event itself.

A domain is a problem area you want to solve, and a domain event is a core value or action that can occur in that problem area. Some of you may associate the term domain with Domain-Driven Design (DDD hereafter), so let me say up front that it doesn’t have much to do with DDD here.

If you have difficulty defining a domain’s core values or actions, I recommend event storming. Event storming is one of DDD’s strategic design tools, but it’s a good tool for identifying and solving problem areas even outside DDD. (See: “Event storming for studying domain knowledge.”)

Publication and subscription of events

In the member system, we defined 3 types of events and 3 event subscriber layers to solve various issues. Let’s look at why each event type and layer was created and what it solves.

Application event & first subscriber layer

Before using a messaging system, we first turned to the Spring Framework’s application events.

We deal with application events first because the need to create loose coupling through events doesn’t only arise between systems; it arises inside one application, too.

Spring’s application events provide an event bus that can process tasks asynchronously and supports transaction-aware control.

The first subscriber layer, which subscribes to these application events, can efficiently handle the domain’s not-of-interest tasks within a single application using the features Spring’s application events provide.
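
A minimal sketch of this layer, assuming hypothetical event and subscriber names:

```java
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.event.EventListener;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;

record IdentityVerificationRemoved(long memberId) {}

@Service
class MemberService {

    private final ApplicationEventPublisher events;

    MemberService(ApplicationEventPublisher events) {
        this.events = events;
    }

    public void removeIdentityVerification(long memberId) {
        // ... the domain's main action ...
        events.publishEvent(new IdentityVerificationRemoved(memberId));
    }
}

@Component
class FirstLayerSubscriber {

    // Runs on a separate thread (requires @EnableAsync), so this
    // not-of-interest task doesn't block the domain's main flow.
    @Async
    @EventListener
    public void handle(IdentityVerificationRemoved event) {
        // a not-of-interest task handled inside the application
    }
}
```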

One typical not-of-interest task that must be resolved within the application is publishing events to the messaging system. Event subscriptions can be extended or modified freely without impacting the publishing side, so we can create, extend, and modify the connection to the messaging system without affecting the domain.

We can also control transactions through Spring’s application events. Allowing the transaction scope defined in the domain to be controlled externally may be considered an intrusion into the domain, but by accepting this intrusion we can create powerful subscribers.

We’ve established a system policy that any domain activity causing a state change must be delivered via the messaging system. Delivering events via the messaging system is not the domain’s concern, but it is an important policy for the system. In such a case, we can expand the transaction so that the subscriber’s action is handled within it, without any changes to the domain logic.

The member system uses AWS SNS as its messaging system, so the first subscriber layer includes an event subscriber in charge of publishing to SNS.
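
A sketch of such a subscriber with the AWS SDK for Java v2, reusing the IdentityVerificationRemoved record from the previous sketch (the configuration property and JSON serialization are assumptions; the AFTER_COMMIT phase reflects the design explained later in the “Building event storage” section):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import org.springframework.transaction.event.TransactionPhase;
import org.springframework.transaction.event.TransactionalEventListener;
import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.PublishRequest;

@Component
class SnsPublishingSubscriber {

    private final SnsClient snsClient;
    private final String topicArn;

    SnsPublishingSubscriber(SnsClient snsClient,
                            @Value("${member.sns.topic-arn}") String topicArn) { // assumed property
        this.snsClient = snsClient;
        this.topicArn = topicArn;
    }

    // Publish to SNS only after the domain transaction commits, so a
    // messaging-system failure cannot fail the domain action itself.
    @TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
    public void publish(IdentityVerificationRemoved event) {
        snsClient.publish(PublishRequest.builder()
                .topicArn(topicArn)
                .message(toJson(event))
                .build());
    }

    private String toJson(Object event) {
        return "{}"; // e.g. a Jackson ObjectMapper in practice
    }
}
```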

Note that the events published through these subscribers are internal events.

Internal event & second subscriber layer

Application events could be used to handle internal events as well, but processing them consumes the application’s own resources, which affects the performance of the domain’s key tasks. And however well implemented, application events cannot match a messaging system’s strengths in minimizing message loss and recovering from failures.

While the first subscriber layer handles the not-of-interest tasks that must be dealt with inside the application, the second subscriber layer handles all of the domain’s other not-of-interest tasks.

Separation of not-of-interest tasks

Some policies must be carried out alongside a domain activity whenever it occurs. These additional policies can be mistaken for the domain’s main action, and they expand dependencies and undermine cohesion around the main activity.

Let’s look at the login process as an example of separating not-of-interest tasks within a domain.

When a member logs in, the following tasks need to be done:

  • Change the member’s status to logged-in
  • Log the same account out from any other devices where it is logged in, per the “limitation of logins of the same account” policy
  • Record which device the member logged in from
  • Record the logout of other accounts on the same device
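
The original code isn’t reproduced here; the sketch below, with hypothetical names, shows the shape of a login method with every policy written inline:

```java
import org.springframework.stereotype.Service;

// Hypothetical sketch: main action and attached policies, all inline.
@Service
class LoginService {

    public void login(long memberId, String deviceId) {
        changeStatusToLoggedIn(memberId);            // main action? a policy?
        logoutFromOtherDevices(memberId, deviceId);  // "same account" login limitation
        recordLoginDevice(memberId, deviceId);       // login device history
        recordLogoutOfOtherAccounts(deviceId);       // logout history on this device
    }

    private void changeStatusToLoggedIn(long memberId) { /* ... */ }
    private void logoutFromOtherDevices(long memberId, String deviceId) { /* ... */ }
    private void recordLoginDevice(long memberId, String deviceId) { /* ... */ }
    private void recordLogoutOfOtherAccounts(String deviceId) { /* ... */ }
}
```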

Looking at this code, it’s hard to tell which is the domain’s main activity, because the additional policies are written into the domain logic alongside it. To increase the domain activity’s cohesion and loosen its coupling to not-of-interest tasks, we must locate the key feature and separate the rest out. You should be able to identify a domain’s main activity by looking at the policies; if the policies are vague, you can find it by separating what must be done immediately from what can be done later.

The main activity of the login function is to “change the member’s status to logged-in.” The other tasks are policies attached to the act of logging in, so we separate them from the domain logic.
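
A sketch of the separated shape (names again hypothetical): the domain keeps only its main action and publishes the fact, and each policy becomes an independent subscriber.

```java
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;

record MemberLoggedIn(long memberId, String deviceId) {}

@Service
class LoginService {

    private final ApplicationEventPublisher events;

    LoginService(ApplicationEventPublisher events) {
        this.events = events;
    }

    public void login(long memberId, String deviceId) {
        changeStatusToLoggedIn(memberId); // the single main action
        events.publishEvent(new MemberLoggedIn(memberId, deviceId));
    }

    private void changeStatusToLoggedIn(long memberId) { /* ... */ }
}

// Each policy lives in its own subscriber. In the member system these run as
// SQS consumers fed by SNS; plain listeners are shown here for brevity.
@Component
class SameAccountLoginLimiter {
    @EventListener
    void on(MemberLoggedIn e) { /* log the account out from other devices */ }
}

@Component
class LoginHistoryRecorder {
    @EventListener
    void on(MemberLoggedIn e) { /* record login device and related logouts */ }
}
```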

You can also see that the three separated not-of-interest tasks don’t depend on each other, so one event can fan out to multiple subscriptions and be processed through the AWS SNS-SQS messaging system.

By separating a domain’s not-of-interest tasks like this, we increase the cohesion of the domain activity and loosen its coupling to those tasks. In addition, the separated tasks can each be implemented independently, gaining strong cohesion and high reusability of their own.

Publication of external events

We’ve separated not-of-interest tasks within the system; for MSA, we now also need to publish external events to separate out the not-of-interest tasks that belong to external systems. Delivering events to external systems can itself be seen as yet another not-of-interest task that used to sit in the domain.

As with other internal-event handling, external events are published by the event subscriber in charge of SNS publication, this time in the second subscriber layer.

External event & third subscriber layer

We could allow internal events to be subscribed to from outside, but separating internal and external events lets us offer events that are open internally yet closed externally.

Subscribers have different purposes even when they receive the same event, and each subscriber may need extra data to recognize the event properly.

Open internal event, closed external event

For internal events, the data a subscriber needs can be added to the payload, making event handling efficient. Such payload expansion is acceptable precisely because the event is internal: internal events live within the system, so the impact their publication has on each subscriber can be understood and managed. Internal concepts that shouldn’t be exposed publicly can also be carried in these events, again because internal events never leave the system.

External events, delivered to external systems, are different. Internal events aim to increase domain cohesion and efficiently handle the domain’s not-of-interest tasks by separating them out, whereas external events aim to reduce coupling between systems. Because an external event is published to loosen inter-system coupling, the publishing side should not care what the event’s subscribers do, and cannot manage them either. The moment the publishing side takes an interest in its subscribers’ actions, a logical dependency forms again.

An external system probably needs more information to process an event. But if we add data required by the external system’s business to the payload, a direct dependency on that business’s changes forms. To create events with no dependency on external systems, the events must be generalized into a common, deliverable format.

Event generalization

The activities an external system may want to perform with an event vary widely, but the process of recognizing an event generalizes easily:

Which member (identifier) did what (action), when (event time), causing what changes (attributes)?

Any system can recognize the events it needs given an identifier, an action, attributes, and an event time. With these implemented as the payload, the receiving side can classify the events it needs and carry out whatever actions its own system requires.
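
A sketch of a generalized payload along these lines (field names are assumptions):

```java
import java.time.Instant;
import java.util.Map;

// Enough for any system to recognize "which member did what, when,
// and what changed" without knowing any subscriber's business.
record GeneralizedMemberEvent(
        long memberId,                  // identifier: which member
        String action,                  // what happened, e.g. "LOGIN"
        Map<String, String> attributes, // attributes of the change
        Instant eventTime               // when it happened
) {}
```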

External systems carry out their necessary actions within this fixed event format, so the publishing system is not affected by changes in external systems.


Tip: subscribing only to wanted events, using SNS attributes

Each subscriber can use AWS SNS’s attribute-based message filtering feature.

https://docs.aws.amazon.com/sns/latest/dg/sns-message-filtering.html

Each subscriber defines the event attributes it requires, so that it receives only the events its application needs. Filtering at SNS reduces the resources wasted when the application has to classify events itself.
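
For example, a subscriber could attach a filter policy to its subscription so that only events with a matching action attribute reach its queue; a sketch with the AWS SDK for Java v2 (the ARN is a placeholder, and the policy matches message attributes set by the publisher):

```java
import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.SetSubscriptionAttributesRequest;

class FilterPolicyExample {

    static void restrictToLoginEvents(SnsClient sns, String subscriptionArn) {
        // Only messages published with a message attribute action=LOGIN
        // will be delivered to this subscription's queue.
        sns.setSubscriptionAttributes(SetSubscriptionAttributesRequest.builder()
                .subscriptionArn(subscriptionArn)
                .attributeName("FilterPolicy")
                .attributeValue("{\"action\": [\"LOGIN\"]}")
                .build());
    }
}
```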

Zero-Payload Method

We chose the zero-payload method to deliver the additional data behind closed external events.

The zero-payload method is usually introduced as a solution to event-ordering problems, but it has the further advantage of removing external-system dependencies from the payload, keeping the coupling loose.

External systems filter the generalized events to subscribe to what they need, then call an API for any additional information, which also guarantees they work with the latest data.
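
A sketch of the subscriber side of this flow (the member API endpoint and types are assumptions):

```java
import java.time.Instant;
import org.springframework.web.client.RestTemplate;

class ExternalSubscriber {

    record MemberEvent(long memberId, String action, Instant eventTime) {}
    record MemberSnapshot(long memberId, boolean identityVerified) {}

    private final RestTemplate rest = new RestTemplate();

    void handle(MemberEvent event) {
        // The event only says what happened; fetch the latest state via API
        // instead of trusting a possibly stale or out-of-order payload.
        MemberSnapshot latest = rest.getForObject(
                "https://member.example/api/members/{id}", // assumed endpoint
                MemberSnapshot.class, event.memberId());
        // ... apply this system's own business using `latest` ...
    }
}
```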

We were able to control events’ transactions through application events, efficiently separate internal not-of-interest tasks through internal events, and publish events that have no dependency on external systems through external events.

This is how event-based architecture is built in the member system.

Building event storage

We were able to separate events’ layers and handle events stably through the messaging system, but problems still existed.

The first problem: losing the guarantee of event publication

In the SNS → SQS → application leg, failures and retries can be handled stably through SQS policies, but the application → SNS leg uses HTTP communication, so problems can occur while publishing events.

Publishing internal events was originally defined inside the transaction, so a messaging-system failure could directly become a system failure. A messaging-system failure cascading into a system failure is a serious problem and had to be solved.

It can be solved by defining the publication of internal events to happen after the transaction.

However, since publication now happens outside the transaction, we no longer have any guarantee that the event is published: the application → SNS leg is HTTP, and failures can occur for any number of network reasons.

The second problem: republishing events

Even when events are delivered successfully, a subscriber’s handling of them may go wrong, so we must be able to republish events for subscribers at any time.

The shape of the events subscribers want republished is unconstrained: a specific event, a specific period, a specific member, a specific type, or a specific attribute. Some messaging systems provide a republication feature, but not all do, and none can accommodate every such request.

Most data is stored only as its final state, making it hard to restore to a specific point in time; and even with history, reconstructing events from data that was stored without events in mind is not easy.

To solve these two problems, we decided to build an event storage.

Point of saving events

To prevent a messaging-system failure from leading to a system failure, we defined the publication of events via the messaging system as a separate transaction. This broke the definition that “publishing an event via the messaging system is part of the domain’s main activity,” and with it the guarantee of event publication.

To restore that definition through the event storage, we redefined “saving an event to the event storage” as part of the domain’s main action. The cost is that every domain event must be saved to storage, and if saving fails, the domain action is considered failed as well. Such a definition is necessary because the data must be guaranteed somewhere.

With this definition, we created a subscriber that stores events in the event storage within the scope of the transaction.
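
A sketch of that subscriber, assuming a common event supertype and a JPA repository (both hypothetical; the record schema is sketched in the “Data format” section below):

```java
import org.springframework.stereotype.Component;
import org.springframework.transaction.event.TransactionPhase;
import org.springframework.transaction.event.TransactionalEventListener;

@Component
class EventStoreSubscriber {

    private final MemberEventRecordRepository repository; // assumed JPA repository

    EventStoreSubscriber(MemberEventRecordRepository repository) {
        this.repository = repository;
    }

    // BEFORE_COMMIT joins the domain's transaction: if saving the event
    // fails, the domain action rolls back and fails with it.
    @TransactionalEventListener(phase = TransactionPhase.BEFORE_COMMIT)
    public void store(MemberEvent event) { // assumed common supertype of all events
        repository.save(MemberEventRecord.from(event)); // stored with published = false
    }
}
```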

Storage type

You might think we should choose something other than an RDBMS, since events are stored in small units and should be processed quickly.

But if we use a different type of database from the domain storage, transactions must span both storages, and implementing distributed transactions across heterogeneous databases is extremely hard.

If the event storage is the same type as the domain storage, we can rely on the DBMS for transaction processing and preserve data consistency through transactions even when infrastructure failures occur.

Persisting data and events through the same storage to ensure consistent event publication is also known as the transactional outbox pattern. The key to the pattern is using local transactions (on the same storage) to save the data and guarantee consistency in publishing events. Since our event storage exists to solve the event-publication guarantee problem, it can be considered one way of implementing the transactional outbox pattern.

Concentrating reads and writes in a single storage carries performance risks, but these can be handled well enough through scaling up/out or sharding.

So we chose RDBMS for our event storage, which is the same storage as our domain storage.

Data format

To guarantee an event’s publication, it must be possible to verify that the event has actually been published.

For that check, we need a flag indicating the publication status, plus an identifier for the event itself.

It should also be possible to republish events by querying for a specific member, action, attribute change, or period.

Fortunately, the generalization that solves event querying was already done when we handled external event publication: any system can recognize the events it needs given an “identifier,” “action,” “attributes,” and “event time.”

So we define an “identifier,” “action,” “attributes,” and “event time” to solve event querying.
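
A sketch of an event record built on those columns (field names are assumptions; constructor and mapping omitted):

```java
import java.time.Instant;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
class MemberEventRecord {

    @Id
    private String eventId;    // the event's own identifier
    private long memberId;     // identifier: which member
    private String action;     // what the member did
    private String attributes; // serialized attributes of the change
    private Instant eventTime; // when it happened
    private boolean published; // publication-status flag, false on insert

    protected MemberEventRecord() {} // required by JPA

    static MemberEventRecord from(Object event) {
        // map an in-memory event onto a record row; omitted here
        return new MemberEventRecord();
    }
}
```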

Thus the storage schema for solving both problems is in place.

Solving the problems

Event publishing guarantee

Where we needed the publication guarantee was the process of publishing internal events. When an event is first recorded, its publication status is saved as false; we then added a subscriber to the second subscriber layer that records when the event has been published and updates the data.

This publication-recording subscriber can do its job with just the event ID. A superclass has been defined for all events so that every event carries its own event ID.

Because the subscriber relies only on this common payload, all SNS events can be subscribed to and processed with a single queue.

  • When a domain event occurs, the event-storing subscriber on the first layer expands the transaction, so the event is stored in the event storage together with the domain action.
  • The SNS-publishing subscriber on the first layer publishes an internal event to SNS once the domain’s transaction has committed successfully, thanks to the AFTER_COMMIT option.
  • The event-publication-recording subscriber on the second layer receives the internal event and records that it was published successfully.

Now, whenever an internal event is successfully published through the messaging system, its publication status is updated without fail. We also built a batch program so that the system, rather than a human, detects any missed publication and republishes automatically: it republishes to SNS any event that is still unpublished 5 minutes after being stored.

  1. The limit is set at 5 minutes because we configured AWS SQS retries to continue for up to 5 minutes.
  2. The batch program doesn’t change event statuses directly: if a republished event reaches the messaging system successfully, the publication-recording subscriber will pick it up and mark it as published.

Events that were not published successfully are thus republished automatically by the publication-detection batch.
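
A sketch of such a detection batch (the repository query method and republish entry point are assumptions):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
class UnpublishedEventDetectionBatch {

    private final MemberEventRecordRepository repository;
    private final SnsPublishingSubscriber snsPublisher;

    UnpublishedEventDetectionBatch(MemberEventRecordRepository repository,
                                   SnsPublishingSubscriber snsPublisher) {
        this.repository = repository;
        this.snsPublisher = snsPublisher;
    }

    @Scheduled(fixedDelay = 60_000) // assumed interval; requires @EnableScheduling
    public void republishMissedEvents() {
        Instant threshold = Instant.now().minus(5, ChronoUnit.MINUTES);
        // Events still flagged unpublished 5 minutes after being stored.
        repository.findByPublishedFalseAndEventTimeBefore(threshold)
                  .forEach(snsPublisher::republish); // assumed republish method
        // Note: the batch never flips the flag itself; once the republished
        // event flows through SNS, the publication-recording subscriber marks it.
    }
}
```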

Like this, we built an event system in which message publication is guaranteed through the event storage, the publication-recording subscriber, and the batch program.

Republishing events

Every event is in the event storage, so any event whatsoever can be republished from it.

We built a batch program to make this easy.

It can publish either internal or external events, filtered by period, specific action, specific attribute, specific member, or specific event.


Tip: sending events to a specific subscriber, using SNS attributes

As above, each subscriber can use AWS SNS’s attribute-based message filtering feature.

https://docs.aws.amazon.com/sns/latest/dg/sns-message-filtering.html

We defined a message attribute called “target” on all SNS events.

Each subscriber is issued a unique ID, and its filter accepts both that unique ID and ALL as values of target.

ALL is the shared value accepted by every subscriber, enabling them all to receive an event.

Normally we publish with target set to ALL so that every subscriber can use the event. When an event needs to be published for one specific subscriber, the batch system publishes it with that subscriber’s unique ID in the target attribute.

With this method, you can create a mechanism to publish events for specific subscribers only.
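
A sketch of the publishing side (AWS SDK for Java v2; names and ARN handling are assumptions). Each subscription’s filter policy would then accept {"target": ["ALL", "<its unique ID>"]}:

```java
import java.util.Map;
import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.MessageAttributeValue;
import software.amazon.awssdk.services.sns.model.PublishRequest;

class TargetedPublisher {

    // target is "ALL" for normal publication, or one subscriber's unique ID
    // when the batch republishes an event for that subscriber alone.
    static void publish(SnsClient sns, String topicArn, String payloadJson, String target) {
        sns.publish(PublishRequest.builder()
                .topicArn(topicArn)
                .message(payloadJson)
                .messageAttributes(Map.of("target", MessageAttributeValue.builder()
                        .dataType("String")
                        .stringValue(target)
                        .build()))
                .build());
    }
}
```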

Integration of record tables

The member system handles personal information, and there are many requirements for data queries.

Member activity must be traceable in order to resolve issues raised through the customer center, to track fraudulent users, to cooperate with investigative agencies, and so on. The member system therefore used to hold dozens of record tables to satisfy these requirements.

With the event storage built, all member activity is stored in one consistent way, and the separate record tables have become unnecessary.

With the event storage in place, building the event-driven architecture for the member system is complete.

Closing

Member is a domain that exists in most systems. It is one of the most common domains, yet also one of the most central, since every other domain depends on it, and one of the most critical, because it deals intensively with personal information.

This event-driven architecture is the result of thinking through how the member domain, at the very center of our MSA, can avoid being affected by external systems, avoid affecting them, and handle members’ personal information safely.

We keep working to make the member system the most stable system at the center of our MSA.


Woowa is Delivery Hero’s South Korean subsidiary and operator of Baedal Minjok.


Please note: This article is an English translation of Kwon Yong-geun’s blog post titled “Building an event-based architecture for member systems” published in April 2022 for global affiliates and readers.

by | Berlin

02 Mar 2023

Personalisation @ Delivery Hero: Understanding Customers

Understanding the customer base goes a long way in powering a number of recommendation models throughout our apps.


Personalisation models make use of various signals and behaviors to build an understanding of user preferences while making sure business criteria are also met. Let us explore what this work entails.

In the previous blog, we discussed key components that make up a personalisation setup with the sole purpose of improving customer experience and helping our business bottom line. Customers are at the very center of any business. At Delivery Hero they are our heroes, and no personalisation journey can work without understanding the customers. Let’s dive deeper into this topic and how it shaped our personalisation journey.

It is very important to understand and answer questions related to users’ behavior, motivations, preferences and requirements. Exploratory analysis is a key step which powers not just the initial set of tasks but is an on-going activity which helps throughout the journey. This step generates a number of key insights, ideas and helps identify potential challenges which we try to handle in subsequent phases as the system matures.

To understand how this helps, let us first talk about how the exploratory analysis works out. The most widely used and obvious method is to leverage all the data we have about our customers. This can be in the form of transactions, the kind of items they order, their restaurant and price point preferences and more. This form of analysis makes use of implicit signals which we leverage as feedback to understand different customer behaviors. This also helps in developing personas and categorization of usage patterns.

“Customers in Singapore and Thailand Love 💖 Indian vendors for late-night dinners”

The second way of analyzing customer behavior is through explicit feedback. When customers use our apps, they are presented with the opportunity to rate the packaging, delivery, rider, restaurant/vendor and more such characteristics. They can also provide textual feedback in the form of reviews. These explicit signals are even more impactful towards building an understanding of the customer base. Most modern recommendation systems make use of both implicit and explicit feedback to improve overall experience.

“Malaysian customers just Love💖 Burger vendors for their evenings”

The final method on our list for exploratory analysis is User Research (UXR). This is by far the most meticulous and tedious task, but it provides a direct connection with our customers. It is also very costly to conduct in terms of time, effort, and money. At Delivery Hero, we typically run multiple UXR studies to understand how our customers use the app, what they use it for, their major pain points, and what they love :).

A combination of these three analysis methods helped us develop an understanding of our customers. For instance, UXR studies helped us understand that our customers in certain parts of the world just love bubble teas at certain times of the day over other beverages. This helped us prioritize the visibility of bubble-tea-serving locations to help our customers find the best ones quickly. Similarly, analysis of implicit signals pointed to hyper-local restaurant preferences, helping us take a more focused approach to improving our recommendations.

On average, most customers just love💖 to order again from vendors they have already ordered from #nostalgia #cravings

EDAs are not just helpful for understanding customer behavior. We analyze data to improve existing models and system behaviors as well; this acts as a feedback loop to ensure our models are indeed working as intended and to identify gaps for further improvement. One recent example was a high-level analysis of the impact of position and of what motivates a customer to order from a specific vendor.

This analysis helped us understand what type of vendors were being rated high by our models versus which appealed to our customers better. It helped us narrow down to a list of features which helped in improving our model performance.

Modeling User Behaviour

The insights from the exploratory phase power the modeling phase to a great extent, from helping Data Scientists decide which features to use, to the choice of models, evaluation criteria, and more. There are a number of intricate details we are skipping here for the sake of brevity, but in a nutshell, the modeling phase is split into the two main components described next.

A New Beginning: Cold Start

If you know someone, it’s comparatively easy to guess what they would like to eat or from where they would like to eat, as opposed to someone whom you just met. The same applies to the recommendation models as well. 

The coldstart models are geared towards analyzing generic and macro patterns in a population to provide better recommendations than random listing. The aim is to try to present what a new user might be interested in rather than just a plain random set of choices. The evaluation criteria and business metrics are also specifically defined separately for new customers. Coldstart models mitigate the challenge of not knowing much about new customers by trying to use aggregate information and signals/behaviors from long term customers. Some of our models even make use of platform (our app) usage behavior to tailor recommendations for new customers.

Don’t Worry, We Got You: Personalized Model

Once we know a customer’s behavior based on their interactions with our system, we try to provide more fine-tuned recommendations, tailored to individual tastes and preferences. The more we know about you, the better our recommendations get. The “know you” part is where the whole data science magic happens. Data Scientists leverage both explicitly shared feedback and implicitly deduced behaviors to power such models. Just like every other component in this setup, personalized models also go through an iterative process of improvement: from a basic rule-based setup leveraging insights from EDA and the domain understanding of experts, to our current set of machine-learning-based models, we have been in a constant state of improvement. A resilient and robust infrastructure setup powers this highly iterative, data-driven ecosystem.

Understanding the data/signals and developing models to improve concerned metrics is only one part of the picture. Another important aspect is “how does one know which model to use?” or “is the latest model any better than what we currently have in production?”. Evaluation of recommendation models which are highly temporal in nature is quite different from evaluation of usual machine learning models. We make use of offline evaluations as well as online A/B tests to evaluate our models using a number of well defined metrics. We will cover more on this in upcoming posts.

Conclusion

Personalisation is an interesting and important aspect of our platform. Understanding user preferences is one of the key steps in developing solutions powered by recommendation engines. It is a continuous and iterative process which keeps evolving as user preferences evolve. We discussed how implicit and explicit feedback help develop an understanding of our customers, along with UX Research, which involves far more effort and depth. We touched upon a few interesting insights from exploratory analysis of different populations as well. We also showed how we personalize the experience for new and existing customers by leveraging different approaches, and concluded by hinting at the standard methods that help evaluate recommendation systems. Stay tuned as we introduce more components and go more in depth in upcoming posts.


Would you like to become a Hero? We have good news for you, join our Talent Community to stay up to date with what’s going on at Delivery Hero and receive customized job alerts!

by | Berlin

01 Mar 2023

Vendor Portal/Solutions Hack Day 2023

One of the key drivers of productive teams is efficient communication, and there is no better way to foster it than a whole-day event where Developers, Project Managers, Quality Assurance Engineers, Product Analysts, and Designers join forces to work on innovative ways to solve problems for our customers.

Who had this idea?

A committee of 6 people planned and executed the event, where over 60 heroes, allocated into teams, spent the whole day ideating, coding, testing, and presenting the exciting results of their work.

Creativity vs Feasibility

We wanted to make sure all our heroes utilize 100% of their creativity. Ensuring that everyone feels empowered to think “out of the box” and work on alternative solutions was one of our priorities.

Building on the foundation of “3 days in a Vendor’s life,” the 8 teams came up with 8 different innovative ideas by the end of the day.

Everyone’s a winner!

All participants spent their day here in Berlin, ate delicious burgers and healthy salads, enjoyed a great variety of cold and hot beverages to ensure their motors kept running… Why? Because there was one more thing on the line: our prizes!

The prizes were broken down into 4 categories, and everyone voted for the winners:

  • For the most innovative solution
  • Best use of technology
  • Best overall solution
  • Jury’s choice

These heroes finished the day with flowers, a trophy, their peers’ recognition and, most importantly, a smile on their faces.

Work hard & play hard

We could not simply end the Hack Day with “just” an award ceremony. We needed more food, more drinks, and a live band! In this lovely atmosphere, the heroes stayed on, exchanging ideas and networking. As my grandma would say, happiness is an enlightened mind with a full stomach.


Would you like to become a hero and experience the next Hack Day with us? We have good news for you, join our Talent Community to stay up to date with what’s going on at Delivery Hero and receive customized job alerts!

by | Berlin

03 Feb 2023

How to Turn Stakeholders into UX Advocates: A Guide For Any UX designer & researcher

Stakeholders often perceive UX as a time-consuming, skippable process: “We already know XYZ. We can’t afford the time for research. Is insight from 5 users really trustworthy?”

As a UX practitioner, you know that gaining buy-in from stakeholders is essential to the success of your projects. But what if they’re not already sold on the value of UX design? Recently, we had a project that started with doubts from stakeholders but ended with a team full of UX advocates. 

In this post, we would like to share the lessons and tips we’ve learned from our recent project.  


1. Make the business case for the design and welcome all the questions

Your stakeholders may not yet be familiar with the value of design, so it’s essential to make a case for why it’s vital to the success of your project. Explain how design can help improve the user experience, increase conversion rates, and drive business results. Use examples and goals that are valuable to them, and start these conversations early – before the project kicks off.

Remember – no one challenges without reason. Understanding the ‘whys’ of the main stakeholders will help you to uncover what business goals and metrics are the most important to them. Above all, you will have an insight into their worries. Equipped with this knowledge, you can create a design and research plan that mitigates their worries and shows how design can impact the metrics and goals they care about.

(Tips) How we did it:

  • In the planning stage, give two or three options for a design plan with connected risks and benefits (as an example: one plan with the user research and one without). This way it will be easier for the stakeholder to choose a way forward.
  • Align clear goals from the get-go. Make sure that terms and definitions mean the same for all team members.

2. Get them involved early to build trust

The best way to get stakeholders on board with your project is to involve them from the outset. Invite them to participate in user research, help to define the project goals, and provide feedback on early designs. Getting them involved early will help them understand the value of design and build trust.

It will take time and effort to build a relationship founded on trust, sometimes maybe even as long as 6 or more months. However, such a relationship is a crucial puzzle piece to a successful designer and stakeholder long-term collaboration. If stakeholders trust you in your decision-making, you can move to strategic work more quickly.

(Tips) How you can do it:

  • Make them feel heard online and offline 
  • Prioritise challenges and ideas together
  • Create one communications channel (perhaps a Slack channel?)
  • Keep them regularly updated on the progress (if you have a chance – over-communicate)
  • Don’t omit chances for social interactions so that they get to know you as a person, not just “another UX designer in a company x”

3. Work together through all highs and lows

Another key element you need is collaboration. Collaboration is a great way to create a sense of ownership and alignment throughout the project.

People always favor their own ideas. Stakeholders especially, having their own stakes in the project, appreciate making their own decisions. Helping them feel this sense of ownership, making decisions informed by the research, has helped us turn them into the biggest research advocates.

Surprises can be good; they can be game changers. But they should come at the right time. Surprising findings often emerge while we run interviews and tests. Once they are uncovered, the earlier you share them with stakeholders and get aligned, the better. Collaboration is a great way to reach that alignment as early as possible.

(Tips) How you can do it:

  • Get stakeholders to observe the sessions
  • Hold daily debrief sessions to gather their observations & insights
  • Hold collaborative analysis sessions to incorporate their perspectives and insights into the final analysis
  • Make sure you teach research principles along the way, rather than carrying stakeholders’ literal words into the report

4. Don’t jump into solutions too early

We instinctively want to fix problems quickly. Especially when they are clearly spelt out in a research readout presentation. However, as designers, we are there to remind everyone of the power of discovering and agreeing on the problem first.

It might feel counterintuitive to the stakeholders to spend a large chunk of time on the challenge rather than start discussing solutions. However, this will help to ensure that you are solving an actual problem, rather than a symptom of the problem and that all angles are being taken into account.

(Tips) How you can do it:

  • Take minutes after the meeting so that you don’t lose track of what you spoke about
  • Invite them to a pre-read of the user research sharing session, so that they are aware of what to expect
  • Assure them that there will be enough time to co-create solutions later

5. Build empathy with the stakeholders as much as you do with the users

It’s essential to make stakeholders feel that they are a vital part of the design-led project. Especially because they are! Set up regular check-ins, listen actively, and be open to any changes to the project plan. This will help ensure that they feel invested in the project and that their feedback is being taken into account.

(Tips) How you can do it:

  • Organise stakeholder interviews prior to the sprint 0 kickoffs
  • Be proactive about following up—help make reviews easier for others
  • Understand and remember their perspective while designing

Conclusion

There are a few key steps to take when trying to transform stakeholders into design advocates. First, it is important to clearly articulate the value of design and how it can help achieve business goals. Next, it is necessary to build trust with stakeholders by being transparent and honest about the design process. Finally, it is crucial to involve stakeholders in the design process as much as possible and get their feedback before making any final decisions. By taking these steps, you can help all stakeholders see the value in design and become advocates for the design process.

by Justyna Belkevic, Dahee Kang | Berlin

26 Jan 2023

Why do the same thing over and over? External dependencies, Swift Package Manager

If you are a Mobile Engineer like me, you’re likely extremely excited about Apple Silicon computers. They are performant and energy-efficient, but above all, they allow you to build your projects faster and without jet-engine fan sounds. However, even with this improvement, compilation times are still far from what you get in web development. Building locally, or on CI, saps momentum and can occasionally push you out of the flow, decreasing overall productivity. Luckily, there are a couple of ways to improve build times.

by Mike Gerasymenko | Berlin

19 Jan 2023

Creating an SRE Culture while preventing a 12 million order loss

Back in 2019, we were in a race to constantly build new features while trying to juggle stability. During this phase, technical debt was piling up and the reliability of the platform was suffering. We had a “stability” meeting with all of the backend and infrastructure chapters EVERY morning to talk about the incidents we caused and what we were going to do next. I used to call this meeting “The ring of fire”.


Operation Hawk

We decided to call our observability project ‘Operation Hawk’, as a hawk has better vision than humans. We had too many different observability tools, spread out among local squads. The goal of this project was to bring observability into one single place, while increasing ownership in local teams so that the data could be as trustworthy as possible.

The foundation of Operation Hawk was, and still is, the implementation of the Four Golden Signals described in the Google SRE Book. However, before implementing them, we needed a new tool.

The Hunt

We wanted our observability data to be in one place, so we began the hunt for the right tool. At Delivery Hero, we only make architectural decisions through RFCs, so we started a couple of RFCs and POCs until we found the right tool.

The Golden Path

Our mission as a team was to enable Heroes to achieve operational excellence by providing best practices, observability, and governance throughout the application lifecycle. In other words, we wanted to lower the adoption bar by providing a standard, self-service approach for every service and tool we offer, since we have a self-service mindset regardless of the solution.

With that in mind, we created our SRE Framework.

The SRE Framework

At Delivery Hero, we invest time and effort into monitoring our services from day one.

We created the SRE Framework with various maturity levels, based on the adoption of the SRE best practices. The SRE framework creates a golden path to increase the reliability and stability of the platform while promoting the SRE culture in local teams and giving service owners the ownership and independence they need.

The SRE Framework is split into 5 maturity levels, with every squad starting at Maturity Level 0, where we provide a curated list of resources for learning what SRE is. At Maturity Level 4, squads own the whole process of ‘how to SRE’ in their local teams.

“…And, as we all know, culture beats strategy every time”

One of Delivery Hero’s core values is “We always aim higher 🚀”.

We quickly learned that making it easy for our developers and stakeholders to do the right thing makes the path to adoption easier. We therefore spent time and effort making the golden signals and observability best practices the ‘easy option’: monitoring is built directly into the modules used to create infrastructure, rather than pointing developers to resources for creating those monitors themselves. As a result, every service and its underlying dependencies get a fantastic observability stack ‘out of the box’, driving the proportion of services covered to 100% and empowering engineers to own their own stack.

This is now the default approach for our solutions; it’s called “Batteries Included”.

Batteries Included

Imagine you buy a toy for your child for Christmas. They rip the wrapping paper open excitedly to see what gift they have received. Their face lights up; they want to start playing immediately – but the toy needs 2 AA batteries. You go and find the packet (or take them from the TV remote). At that moment, the excitement of opening a new toy turns to frustration.

Toy manufacturers became aware of this and started to include batteries directly in their toys, resulting in happy kids and less frazzled parents. This is the ‘batteries included’ approach. 

In product usability (mostly in software), ‘batteries included’ means the product comes with all the parts required for full usability. For us, it means local teams get all their observability out of the box when they onboard a service. Not only that: whenever a resource is created on AWS, it already has all the observability included.

Batteries included is now our approach at Delivery Hero.

Conclusion

With the right tools and data creating awareness of application performance, along with its underlying dependencies and costs, we were able to shift the engineering culture and improve our MTTD (Mean Time to Detect) and MTTR (Mean Time to Recovery) by 195% and 282% respectively, with an overall reduction of around 327% in minutes spent in incidents.

In other words, Delivery Hero takes approximately 2 thousand orders per minute. Multiplying the incident minutes avoided by that rate, the combined reduction in MTTD and MTTR helped us prevent the loss of more than twelve million orders over the last two years (12,000,000 orders ÷ 2,000 orders per minute ≈ 6,000 minutes, roughly 100 hours, of incident time avoided).


If you like what you’ve read and you’re someone who wants to work on open, interesting projects in a caring environment, check out our full list of open roles here – from Backend to Frontend and everything in between. We’d love to have you on board for the exciting journey ahead!

by | Berlin