Sustainable Software Deliverability with Timelines

Filed under: — Aviran Mordo

In my previous post “Kill the Deadlines” I ranted about how (fake) deadlines are demotivating, reduce quality, and burn out development teams. But if you think about why people use deadlines, it is to deliver a software project in a timely manner. While I’m not a proponent of using fake deadlines to push developers, in a product development process it is extremely important to have a time frame in which you want to launch the product.

Agile software development offers good tools and methodologies that can help you ship products fast, as long as you use them right: don’t follow the process blindly, but understand the tools in your arsenal, pick the ones that work for you, and adjust them to your needs and culture.

While it is extremely hard (if not impossible) to accurately predict how long a software project will take, setting a time frame for a product launch is necessary to keep the team focused and to decide on priorities.

Release content
A project consists of many features: some take a long time to implement, and some take a short time. While a product manager envisions a whole set of features, implementing all of them before the release would take a very long time; the product would probably never launch, as there are always more features you would like to add and more polishing you would like to do.

Set a timeline
In order to keep everyone focused and launch a product in a timely manner, we need to set a timeline for when we want to launch it. The timeline consists of estimations of how long each feature should take. Based on these estimations we can build a release content list with all the features we would like to launch in this version. When we set a timeline we force ourselves to think about which features we want in this version. The list of features in a version should be determined by the business value they provide. For instance, we can have one big feature with a lot of value, or two small features that each have less value but together provide more value to the product than the one big feature.
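The trade-off in the last example can be sketched as a tiny exhaustive search. This is a toy illustration only (the feature names, week estimates, and value scores are all made up), not a real planning tool:

```python
from itertools import combinations

# Hypothetical release candidates: (name, estimated weeks, business value).
features = [
    ("big-editor-revamp", 6, 8),
    ("social-sharing", 2, 5),
    ("email-notifications", 3, 5),
]

def best_release_content(features, timeline_weeks):
    """Pick the subset of features that fits the timeline with maximal total value."""
    best, best_value = [], 0
    for r in range(1, len(features) + 1):
        for combo in combinations(features, r):
            weeks = sum(f[1] for f in combo)
            value = sum(f[2] for f in combo)
            if weeks <= timeline_weeks and value > best_value:
                best, best_value = list(combo), value
    return best, best_value

content, value = best_release_content(features, timeline_weeks=6)
# With a 6-week timeline, the two small features (total value 10)
# beat the one big feature (value 8).
```

The point is not the algorithm (a brute-force search would not scale to a real backlog) but that release content is a value-per-time decision, not a completeness decision.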

A timeline should have checkpoints, or milestones, where you can evaluate your progress. On the axes of time/quality/content I tend to keep quality constant (you should always strive to produce high-quality software), which leaves time and content to play with. Given a timeline, these milestones are good points to assess whether you are going to ship your initial list of features on time, or whether you should take a hard look at what you can leave out for the next version. You will be surprised how “critical” features become “less critical” and get cut when you are forced to choose between extending the timeline and cutting features, which helps you ship the product faster.

Synchronizing the company
Timelines help you synchronize different departments inside the company. Since releasing a product also requires involvement from other departments, such as marketing, support, content writing, QA, etc., having a public timeline that is visible and communicated to the whole company helps synchronize all the departments and lets them plan accordingly to be ready for the release. You can communicate the timelines via periodic email updates, public boards, posters on the walls, monitors in the office, or any other method that keeps the timeline transparent and accessible to all.

So what are timelines?
A timeline is a rough time frame in which you would like to ship a product. A timeline is NOT a deadline; it is flexible to changes and delays (unfortunately, most of the time it will be postponed, but you should try to keep that to a minimum).
Depending on the amount of work, timelines should be at a resolution of a week, a month, or a quarter.

Sometimes, due to business constraints, the timeline becomes a hard deadline. This is not an arbitrary date that you have set; it is a direct result of a business constraint. For instance, a special feature that needs to be released before a holiday, or a regulatory hard limit on something you need to change or add to your system. In this case the real reason needs to be clearly communicated to the team.

When a timeline is delayed, it should be as part of a discussion to understand the business impact of the delay. You may come to the conclusion that instead of delaying the timeline, you would rather make the hard choice of reducing the scope (content) of this version and keep the time unchanged.

So if a timeline is a rough estimate, when is the release date?
The release date, a.k.a. the deadline, should be set towards the end of the timeline, at the point where you understand all the unknowns and feel comfortable that you will be ready to launch (and so will all the other dependencies). Setting the deadline late in the process makes it a deadline based on REALITY and not a fake one. Yes, you will probably still need to push the teams to meet this deadline, but it will be something they can relate to and understand, without sacrificing quality and releasing a bad product.

Continuous improvement
Agile software development talks about the retrospective as a tool to improve the development process. In many organizations the retrospective is done after a sprint and is unfortunately geared towards improving estimations, which causes the side effect of developers padding their estimates with buffers just to be on the safe side. This should NOT be the point of retrospectives.
Retrospectives should be treated as a tool for continuous improvement of development velocity, not as a tool to improve estimations. In order to improve velocity, you need continuous feedback about what would have helped you deliver software faster.
Here are some examples of issues developers should raise that would have let them finish their tasks faster: fewer context switches, faster build times, too many production alerts, waiting on QA, provisioning staging machines takes a lot of time, too many meetings, etc.
This is as opposed to: “I estimated a task at 3 days and realized the initial database schema I chose was wrong and I had to redo it, which caused delays,” or “I thought this was an easy task and realized I first had to do a big refactoring before I could implement it.”
Once you collect this feedback you will start to identify patterns and find the bottlenecks in your development process; you can then tackle those bottlenecks and improve your development velocity. (Just between us, as a side effect of this process you will also get better estimations, but that is incidental and not the real purpose.)

In order to release a product efficiently, use agile software delivery practices, set a rough timeline, and place checkpoints along the way to see if you are on schedule. If you are late, re-evaluate the version content; you may want to cut or swap some features.
Communicate the timeline and make it visible to everyone in the company so everyone stays in sync. When you feel you can commit to a release date you can actually make, based on real progress, only then set the actual date. Lastly, have a process in place for continuous improvement of your development velocity.


Best practices for scaling with microservices and DevOps

Filed under: — Aviran Mordo

Wix.com is a highly successful cloud-based web development platform that has scaled rapidly. We now support around 80 million website builders in more than 190 countries. This translates into approximately 2 petabytes of user media files, growing by about 1.5 terabytes per day. So how did we get there? The 400-strong Wix engineering team uses a microservices architecture along with MySQL, MongoDB, and Cassandra. We host our platform in three data centers, as well as in the cloud using Amazon Web Services (AWS) and the Google Cloud Platform.

I’ve been working with Wix since 2010 and oversaw the engineering team’s transition from a traditional waterfall development-based approach to agile methodologies and helped introduce DevOps and Continuous Delivery. Here’s what I learned about using microservices and MySQL to effectively support a fast-scaling environment.

How Wix defines microservices

Wix currently has around 200 microservices, but we didn’t start out on this path. During our days supporting 1 million sites back in 2008, we used a single-monolith approach with Java, Hibernate, Ehcache, Tomcat, and MySQL. This typical scale-up approach was useful in some aspects, but ultimately, we couldn’t tolerate the downtime caused by poor code quality and interdependencies inside the monolith. So, we gradually moved to a service-level-driven architecture and broke down our monolith.

By our definition, a microservice is small enough that a single team can manage a few of them, and the team must be able to describe each microservice’s responsibility in one clear sentence.

Specifically, a microservice is a single application deployed as a process, with one clear responsibility. It does not have to be a single function or even a single class. Each microservice writes only to its own database, to keep things clean and simple. The microservice itself has to be stateless to support frequent deployments and multiple instances, and all persistent states are stored in the database.

Wix’s four sets of microservices

Our architecture involves four main groups of services:

Wix Editor Segment: This set of microservices supports creating a website. The editor is written in JavaScript and runs in a browser. It saves a JSON representation of the site to one of the editor services, which in turn stores the JSON in MySQL and then into the Wix Media Platform (WixMP) file system. The editor back-end services also use the Jetty/Spring/Scala stack.

Wix Public Segment: This set of microservices is responsible for hosting and serving published Wix sites. It uses mostly MySQL and Jetty/Spring/Scala applications to serve the HTML of a site from the data that the Editor has created. Wix sites are rendered on a browser from JSON using JavaScript (React), or on the Wix Public server for bots.

Wix Media Platform (WixMP): This is an Internet media file system that was built and optimized for hosting and delivering images, video, music, and plain files, integrated with CDNs, SSL, etc. The platform runs on AWS and the Google Cloud Platform, using cloud compute instances and storage for on-the-fly image manipulation and video transcoding. We developed the compute instances software using Python, Go, and C, where applicable.

Verticals: This is a set of applications that adds value to a Wix site, such as eCommerce, Shoutout, or Hotels. The verticals are built using an Angular front end and the Jetty/Spring/Scala stack for the back end. We selected Angular over React for verticals because Angular provides a more complete application framework, including dependency injection and service abstraction.

Why MySQL is a great NoSQL

Our microservices use MySQL, so scaling them involves scaling MySQL. We don’t subscribe to the opinion, prevalent in our industry, that a relational database can’t perform as well as a NoSQL database. In our experience, engineers who make that assumption often ignore the operational costs, and don’t always think through production scenarios, uptimes, existing support options, knowledge base maintenance, and more.

We’ve found that, in most cases, we don’t need a NoSQL database, and that MySQL is a great NoSQL database if used appropriately. Relational databases have been around for more than 40 years, and there is a vast and easily accessible body of knowledge on how to use and maintain them. We usually default to using a MySQL database, and use NoSQL only in the cases where there’s a significantly better solution to the problem, such as if we need a document store or a solution for a high data volume that MySQL cannot handle.

Scaling MySQL to support explosive growth

Using MySQL in a large-scale system can present performance challenges. Here is a top 5 list of things we do to get great performance from MySQL:

Whenever possible, we avoid database-level transactions, since they require databases to maintain locks, which in turn have an adverse effect on performance. Instead, we use logical, application-level transactions, which reduce loads and extract better performance from the databases.
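A sketch of this idea (not Wix’s actual code — the orders table, status column, and payment gateway are invented for illustration): each statement auto-commits on its own, and a compensating update takes the place of a database ROLLBACK, so no row locks are held across the slow work.

```python
# Illustrative sketch of a logical, application-level "transaction".
# Every statement is an independent, auto-committed write; if a later
# step fails we run a compensating action instead of a DB rollback.

def place_order(db, order, payment_gateway):
    # Step 1: insert the order in a "pending" state (single fast statement).
    db.execute(
        "INSERT INTO orders (id, payload, status) VALUES (%s, %s, 'pending')",
        (order.id, order.payload),
    )
    try:
        # Step 2: do the slow external work outside any database transaction.
        payment_gateway.charge(order)
    except Exception:
        # Compensating action replaces ROLLBACK: mark the order as failed.
        db.execute("UPDATE orders SET status = 'failed' WHERE id = %s", (order.id,))
        raise
    # Step 3: flip the record to its final state; readers ignore non-'paid' rows.
    db.execute("UPDATE orders SET status = 'paid' WHERE id = %s", (order.id,))
```

The database never holds a lock while the payment call is in flight; consistency is maintained by the status column rather than by a transaction boundary.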

We do not use sequential primary keys because they introduce locks. Instead, we prefer client-generated keys, such as UUIDs. Also, when you have master-master replication, auto-increment causes conflicts, so you have to create key ranges for each instance.
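For illustration, a client-generated key in Python (the record shape is made up; any UUID library in any language works the same way):

```python
import uuid

# Client-generated primary keys avoid auto-increment locks and
# master-master conflicts: any application instance can mint a key
# with no coordination with the database or other instances.

def new_site_record(title):
    return {
        "id": uuid.uuid4().hex,  # generated by the client, not by MySQL
        "title": title,
    }

# Two app instances can insert concurrently with no key-range coordination:
a = new_site_record("my-portfolio")
b = new_site_record("my-shop")
assert a["id"] != b["id"]
```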

We do not use queries with joins, and we only look up or query by primary key or index. Any field that is not indexed has no right to exist as a column. Instead, we fold such fields into a single text field (JSON is a good choice).

We often use MySQL simply as a key-value store. We store a JSON object in one of the columns, which allows us to extend the schema without making database schema changes. Accessing MySQL by primary key is extremely fast, and we get sub-millisecond read time by primary key, which is excellent for most uses. MySQL is a great NoSQL that’s ACID compliant.
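A minimal sketch of this key-value pattern (table and column names are mine, not Wix’s real schema): one indexed primary key plus a single text column holding the whole document, so the application schema can evolve without `ALTER TABLE`.

```python
import json

# Hypothetical table: an indexed primary key and an opaque JSON blob.
CREATE = """
CREATE TABLE site_documents (
  site_id CHAR(32) NOT NULL PRIMARY KEY,  -- client-generated UUID
  doc     MEDIUMTEXT NOT NULL             -- JSON blob, never queried by field
)
"""

def save(db, site_id, document):
    # REPLACE keeps the example short; the whole document is one value.
    db.execute(
        "REPLACE INTO site_documents (site_id, doc) VALUES (%s, %s)",
        (site_id, json.dumps(document)),
    )

def load(db, site_id):
    # Lookup is always by primary key, which is what makes it fast.
    row = db.query_one(
        "SELECT doc FROM site_documents WHERE site_id = %s", (site_id,)
    )
    return json.loads(row[0]) if row else None
```

Adding a new field to the document requires no schema change at all; only the application needs to understand the JSON.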

We are not big fans of sharding because it creates operational overhead in maintaining and replicating clusters inside and across data centers. In terms of database size, we’ve found that a single MySQL instance can work perfectly well with hundreds of millions of records. Having a microservices architecture helps, as it naturally splits the data into multiple databases for each microservice. When the data grows beyond a single instance capacity, we either choose to switch to a NoSQL database that can scale out (Cassandra is our default choice), or try to scale up the database and have no more than two shards.


It’s entirely possible to manage a fast-growing, scale-out architecture without being a cloud-native, two-year-old startup. It’s also possible to do this while combining microservices with relational databases. Taking a long, hard look at both the development and operational pros and cons of tooling options has served us well in creating our own story and in managing a best-in-class, SLA-oriented architecture that drives our business growth.

Original post: http://techbeacon.com/how-wix-scaled-devops-microservices


Safe Database Migration Pattern Without Downtime

Filed under: — Aviran Mordo

I’ve been giving a continuous delivery talk for a while now, and during the talk I describe a pattern for safely migrating from one database to another without downtime. Since many people have contacted me and asked for more details, I will describe it here in more detail, as promised.

You can use this pattern to migrate between two different databases, for instance from MySQL to MongoDB, or between two schemas in the same database.

The idea of this pattern is to do a lazy database migration, using feature toggles to control the behaviour of your application as it progresses through the phases of the migration.

Let’s assume you have two databases and want to migrate from the “old” database to the “new” one.

Step 1
Build and deploy the “new” database schema onto production.
In this phase your system stays the same, nothing changes other than the fact that you have deployed a new database which you can start using when ready.

Step 2
Add a new DAO to your app that writes to the “new” database.
You may need to refactor your application to have a single (or very few) point(s) in which you access the database.
At each point where you access the database or DAO, add a multi-state feature toggle that will control the write flow.
The first state of this feature toggle is “use old database”. In this state your code ignores the “new” database and simply uses the “old” one as always.
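A minimal sketch of such a multi-state toggle in Python (the state names are my own; a real system would read the current state from a runtime configuration service so it can change without a redeploy):

```python
from enum import Enum

class MigrationState(Enum):
    OLD_ONLY = 1             # step 2: use the "old" database as always
    WRITE_BOTH_READ_OLD = 2  # step 3
    READ_BOTH = 3            # step 4
    NEW_PRIMARY = 4          # step 5
    NEW_ONLY_WRITES = 5      # step 6
    NEW_ONLY = 6             # step 8

def save_record(state, old_dao, new_dao, record):
    """Single write entry point; behaviour is driven entirely by the toggle."""
    if state is MigrationState.OLD_ONLY:
        # First state: the "new" database is deployed but ignored.
        old_dao.save(record)
        return
    # The remaining states get filled in as you progress through the
    # later steps of the pattern.
    raise NotImplementedError(f"state {state} is handled in a later step")
```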

Step 3
Start writing to the “new” database but use the “old” one as primary.
We are now entering the world of distributed transactions, because you can never be 100% sure that writes to two databases will succeed or fail together.
When your code performs a write operation, it first writes to the “old” database, and if that succeeds it writes to the “new” database as well. Notice that in this step the “old” database is in a consistent state, while the “new” database can potentially be inconsistent, since writes to it can fail even though the “old” database write succeeded.
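The step 3 write path might look like this sketch (DAO names are illustrative): write to the “old” database first, and only if that succeeds attempt the “new” one; a failed “new” write is logged and swallowed, so the “old” store stays consistent.

```python
import logging

log = logging.getLogger("migration")

def dual_write(old_dao, new_dao, record):
    old_dao.save(record)      # must succeed; raises on failure
    try:
        new_dao.save(record)  # best effort; "new" may drift behind
    except Exception:
        # Swallow the error: "old" is still the source of truth, and
        # step 7's backfill will catch any records "new" missed.
        log.exception("write to 'new' database failed for %s", record)
```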

It is important to let this step run for a while (several days or even weeks) before moving to the next step. This will give you the confidence that the write path of your new code works as expected and that the “new” database is configured correctly with all the replications in place.

If at any time you decide that something is not working, you can simply change the feature toggle back to the previous state and stop writing to the “new” database. You can modify the new schema, or even drop it if you need to, as all the data is still in the “old” database and in a consistent state.

Safe database migration pattern

Step 4
Enable the read path. Change the feature toggle to enable reading from both databases.
In this step it is important to remember that the “old” database is the consistent one and should still be treated as the authoritative data source.

Since there are many read patterns, I’ll describe just a couple here; you can adjust them to your own use case.

If you have immutable data and you know the record id, first read from the “new” database; if you do not find the record there, fall back to the “old” database and look for it. Only if neither database has the record do you return “not found” to the client. Otherwise, return the result, preferring the “new” database.

If your data is mutable, you’ll need to read from both databases and prefer the “new” record only if its timestamp equals that of the record in the “old” database. Remember, in this phase only the “old” database is considered consistent.

If you don’t know the record id and need to fetch an unknown number of records, you basically need to query both databases and merge the results.
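The mutable-data read pattern above could be sketched like this (records are plain dicts with a `ts` timestamp field, a convention I made up for the example):

```python
def read_record(old_dao, new_dao, record_id):
    old = old_dao.find(record_id)   # "old" is still the source of truth
    new = new_dao.find(record_id)
    if old is None and new is None:
        return None                 # genuinely not found in either database
    if old is None:
        return new                  # only "new" has it (immutable-data case)
    if new is not None and new["ts"] == old["ts"]:
        return new                  # exercise the new read path when it's safe
    return old                      # "new" copy is missing or stale
```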

Whatever your read pattern is, remember that at this point the consistent database is the “old” one, but in this phase you should exercise the “new” database’s read path as much as you can, in order to test your application and your new DAO in a real production environment. In this phase you may find out that you are missing some indices or need more read replicas.

Let this phase run for a while before moving on. As in the previous phase, you can always turn the feature toggle back to a previous state without fear of data loss.

Another thing to note: since you are reading data from two schemas, you will probably need to maintain backward and forward compatibility between the two data sets.

Step 5
Make the “new” database the primary one. Change the feature toggle to write to the new database first (you still read from both, but now prefer the new DB).
This is a very important step. By now the write and read paths of your code have been running for a while; when you feel comfortable, you switch roles, making the “new” database the consistent one and the “old” one potentially inconsistent.
Instead of writing to the “old” database first, you now write to the “new” database first and make a “best effort” write to the old database.
This phase also requires you to change the read priority. Up until now we considered the “old” database to hold the authoritative data, but now you prefer the data in the “new” database (of course, you still need to consider the record timestamps).

This is also the point where you should try as hard as you can to avoid switching the feature toggle back to a previous state, because you would then need to run a manual migration script to compare the two databases, as writes to the “old” one may not have succeeded (remember, distributed transactions). I call this “the point of no return.”

Step 6
Stop writing to the “old” database (read from both).
Change the feature toggle again so that you stop writing to the “old” database, leaving a write path only to the “new” database. Since the “new” database still does not have all the records, you will still need to read from the “old” database as well as the new one and merge the data from both.

This is an important step as it basically transforms the “old” database to a “read-only” database.

If you feel comfortable enough, you can do steps 5 and 6 in one go.

Step 7
Eagerly migrate data from the “old” database to the “new” one.
Now that the “old” database is in “read-only” mode, it is very easy to write a migration script that copies all the records from the “old” database that are not present in the “new” one.
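A sketch of such a backfill script (DAO method names are illustrative): because the “old” database is read-only, a single streaming pass is enough.

```python
def backfill(old_dao, new_dao, batch_size=500):
    """Copy every record from "old" that the dual-write phase missed."""
    migrated = 0
    for record in old_dao.scan_all(batch_size):   # stream, don't load everything
        if new_dao.find(record["id"]) is None:    # only records "new" is missing
            new_dao.save(record)
            migrated += 1
    return migrated
```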

Step 8
Delete the “old” DAO.
This is the last step. All the data has been migrated to the “new” database, so you can now safely remove the old DAO from your code, leaving only the new DAO that uses the new database. You can, of course, stop reading from the “old” DB and remove the merging code that combined data from both DAOs.

That’s it: you are done and have safely migrated the data between two databases without downtime.


Side note:
At Wix we usually run steps 3 and 4 for at least two weeks each, and sometimes even a month, before moving on to the next step. Examples of issues we encountered during these steps:
On the write path we were holding large objects in memory which caused GC storms during peak traffic.
Replications were not configured/working properly.
Missing proper monitoring.

On the read path we had issues like missing indexes.
An inefficient data model caused poor performance, which led us to rethink our data model for better read performance.


Kill The Deadlines

Filed under: — Aviran Mordo

I have been building software for over 20 years and have participated in many projects. Usually, when you come to write a new feature or start a new project, one of the first things your manager asks you for is a time estimate (if you are lucky), and then they will set a deadline for the project’s completion.

Once a deadline is set, everybody starts working to meet that date. While setting a deadline helps management plan and gives visibility into what is coming out of the development pipeline, the act of setting a deadline, especially an aggressive one, is a destructive act that in most cases hurts the company in the long run.

Development work consists of much more than writing and testing code; it also includes research and design. The problem is that while an (experienced) developer can estimate the coding effort, there is no real way to estimate the research phase, the problems you may encounter, or how long the design is going to take (if you want to make a good design). How can you really estimate how long it will take you to learn something you don’t know?

If deadlines are aggressive, meeting them usually means that developers start cutting corners: doing a bad design just because you don’t have the time to do the right one. Developers pressed for time may stick with a bad design choice simply because they don’t have time to switch to a better one after they realize their initial design has flaws.

Another thing developers do in order to meet a deadline is cut down on testing, which hurts the quality of their work. Cutting down on automated testing may let you believe the work is progressing at a higher rate, but you will usually find the problems in production and spend much more time stabilizing the system after it is deployed. You might think you met the deadline by shipping the product, but the quality of the product is low, and you are going to pay for that in maintenance and reputation.

In addition to all that, working to meet deadlines creates stress for developers, which is not a healthy environment to be in for long, especially if they move from one deadline to another.

Now don’t get me wrong: by not setting a deadline you are not giving the project a free hand to stretch beyond reason. Developers should be aware that their performance is measured, but they should also know that they can take the time to produce a high-quality product without cutting corners. In most cases a project can be delayed by a few days or weeks without any real consequences to the company, and developers should know that they have the time they need to produce a good product.

In the exceptional case where you do have a deadline and cannot postpone the delivery, you should take into consideration that there will be quality issues and design flaws. After the delivery, you should give developers time to complete the missing tests, do the necessary refactoring, and bring the product to the desired quality.

See Next post: Sustainable Software Deliverability with Timelines


Dev Centric - Trust And Collaboration

Filed under: — Aviran Mordo

After my first post about Dev-Centric culture I got many questions on the topic, which I will try to answer in the next few posts.
At first glance, Dev-Centric sounds like the only one who matters is the developer, and everyone else is just a helper. Nothing could be further from the truth.

To understand Dev-Centric, let’s take a step back and describe how a company grows. But before that, we need to understand how a software company ships a product. Like every manufacturing plant, a software company has a pipeline that a product needs to go through in order to be shipped to customers.

Here is a standard software pipeline.

Product definition -> Design -> Develop -> Build -> QA -> Deploy

When a company is small, you have only a handful of developers who are the pipeline and do all the work. They design, code, deploy, maintain, do QA, and even define the product. However, while good developers can do all these tasks, it comes with a price: they do not focus on what they are best at, which is writing code.

So, in order to make the product better, the company hires a product manager, who will probably do a better job at defining the product than the developers. A product manager has better skills and specializes in the product definition phase of the pipeline. While the company is small, the PM works closely with the developers, with a lot of collaboration.

The same goes for QA, operations, and architects: each can probably do a better job than the developer in their own field of expertise. While the company is small, they all work together. However, when the company grows, walls start to appear as every group wants to control its own aspect of the pipeline, which in turn breeds mistrust and slows down the whole “production line” as you define more structured processes and workflows.

Dev-Centric culture tries to make the production line as fast as possible. If we look at the pipeline, the one person we cannot do without is the developer. The developer IS the production line; they are the one who manufactures the product. However, since we also want the best-quality product, we cannot give up PM, QA, and Ops; they are a very important part of the manufacturing floor.

So, since the developer is the one who ships the product, we want to make the process as fast and efficient as we can without losing quality. In order for developers to create the best product, they need to understand it.
The best way to understand a product is to help define it. This is where Dev-Centric comes into the picture. Developers should work together with the PM and help define the product. Not by getting a thousand-page spec, but by sitting in the same room with the PM, discussing the product, and defining it while writing the code. This way the code the developer writes is the right one, and there are no misunderstandings between what the PM intended and what the developer understood.

The same goes for QA. Developers should work closely with QA so they understand each other, and QA understands the product the same way the developers do. Developers should continuously release code to QA during the development process to shorten the cycles. The best way is to work with Test-Driven Development (TDD), where developers also write the automated tests for their code and QA backs up the developers with more comprehensive end-to-end tests, which also serve as acceptance tests. Another important role for QA is to review the developers’ test cases and point out any untested use cases.
The same goes for Ops and architects, as mentioned in my previous post.

Dev-Centric basically clears the path for the developer to deliver a better product faster by breaking down the walls on the manufacturing floor, having everyone work in collaboration and focus on what is important (delivering the best product), and by creating trust between people with different agendas, thereby creating a productive environment.


Building Engineering Culture Based On Quality To Drive Innovation

Filed under: — Aviran Mordo

When I joined Wix in 2010, my job was to rebuild the back-end engineering team. The previous team, which took the company to where it was back then, had been scattered to other projects, and except for one developer from the original team, I had a completely new team.

A couple of months after I arrived and started to figure things out with my new team, we decided to move to a continuous delivery methodology. Since we faced many challenges, both in moving to continuous delivery and in needing to rewrite the whole back-end system, we needed very good software engineers to build a new framework and to be the first to lead the company’s Dev-Centric culture.

We wanted to create a culture based on quality, in terms of both software engineering and people’s responsibilities. Since every person in a growing company has a profound effect on the company’s culture, this set the tone for the recruitment process. Ever since I got to Wix I have never stopped recruiting engineers; however, recruiting is a big challenge. I was looking for exceptional software engineers. The standard for passing the interview process is very high and very few actually succeed, but that is a price I’m willing to pay in order to build an ‘A team’.



Lifecycle – Wix’ Integrated CI/CD Dashboard

Filed under: — Aviran Mordo

This post was originally published on the Wix engineering blog by Ory Henn, who is part of our continuous integration / continuous deployment team. Since I write about Wix’s continuous delivery, I think this post might interest you.

There Are So Many CI/CD Tools, Why Write One In-house?

About three years ago we first set foot on the long and winding road to working in a full CI/CD mode. Since then we have made some impressive strides, finding our way to better development and deployment while handling several years’ worth of legacy code.

An important lesson learned was that CI/CD is a complex process, both in terms of the conceptual changes required from developers and in terms of integrating quite a few tools and sub-processes into a coherent and cohesive system. It quickly became clear that, with the growing size of the development team, a central point of entry into the system was needed. This service should provide a combined dashboard and action center for CI/CD, to keep day-to-day development from becoming dependent on many scattered tools and requirements.

To this end, we started building Wix Lifecycle more than two years ago. This post describes Lifecycle in broad terms, giving an overview of what it can do (and a little about what it is going to be able to do). Following posts will describe interesting series and other fun stuff the system does.

Keep in mind that our processes are nowhere near full CI/CD yet, and that Wix Lifecycle has to allow a great deal of backward compatibility for processes that are remnants of earlier days, and for teams that move towards the end goal at a different pace.
So What Does It Do?


Continuous Delivery - Part 8 - Deploying To Production

Filed under: — Aviran Mordo

Previous chapter: Cultural Change

It is about time we talked about how the actual continuous delivery process works: the application lifecycle, and how the code reaches production once development is done.

Phase 1: Git – Developers push the completed code (and the tests) to a Git repository.

Phase 2: Build – TeamCity (our CI build server) is triggered: it checks out the code from Git and runs the build, unit tests, and integration tests. Once the build passes, a SNAPSHOT artifact is created and pushed to Artifactory.

So far the process is fully automatic. At this point we decided to switch to a manual step for the actual deployments. However, this process should be as easy as pressing a button (which it actually is). We are now in the process of also automating deployment to staging, making staging a continuous deployment environment and production continuous delivery.
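The flow above can be sketched as a minimal pipeline model. All the function and artifact names here are illustrative stand-ins, not Wix’s actual tooling; the point is the shape of the process: automatic from push to snapshot, then a manual “button press” to deploy.

```python
# Minimal sketch of the phases above: an automatic path from a Git push to
# a SNAPSHOT artifact, followed by a manual deployment step.

def run_build(commit):
    # Phase 2: check out the code, build, run unit and integration tests.
    tests_passed = True  # stand-in for the real test run
    if not tests_passed:
        raise RuntimeError(f"build failed for {commit}")
    return {"commit": commit, "artifact": f"app-{commit}-SNAPSHOT.jar"}

def push_to_artifact_repo(artifact, repo):
    # Publish the snapshot so it can be deployed later.
    repo.append(artifact)

def deploy(artifact, environment):
    # The manual step: as easy as pressing a button.
    return f"deployed {artifact['artifact']} to {environment}"

repo = []
artifact = run_build("abc123")          # triggered by the Git push (phase 1)
push_to_artifact_repo(artifact, repo)   # automatic up to this point
print(deploy(repo[-1], "production"))   # manual, on demand
```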



Continuous Delivery - Part 7 - Cultural Change

Filed under: — Aviran Mordo

Previous chapter: Backward and forward compatibility

In order for continuous delivery to work, the organization has to go through a cultural change and switch to a dev-centric culture.

Continuous delivery places a lot of responsibility in the hands of the developer, who as such needs to have a sense of ownership. At Wix we say that it is the developer’s responsibility to bring a product / feature to production: they own it from the feature’s inception to the actual delivery of the product, and they maintain it in the production environment.

In order to do that several things have to happen.

Know the business:
The developer has to know the business and be connected to the product he is developing. By understanding the product, developers make better decisions and build better products. Developers are pretty smart people and they don’t need product specs shoved in their face (nobody actually reads them). Our developers work together with the product managers to determine what each feature should be. Remember: while the actual product may bundle several features together to be valuable to the customer, we develop per feature and deploy each one as soon as it is ready. This way both the product manager and the developer get to test and experience each feature on production (it is exposed only to them via a feature toggle) and determine if it is good enough. This may change the product’s direction away from the plan, so the next feature developed may actually differ from the one originally planned.
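Exposing a production feature only to its owners can be done with a toggle keyed on the viewer’s identity. A minimal sketch, assuming a simple per-user allow list (Wix’s real toggle system is of course richer than this):

```python
# Hypothetical feature toggle: the feature is deployed to production but
# visible only to the developer and product manager who own it.
FEATURE_TOGGLES = {
    "new-checkout-flow": {"enabled_for": {"dev@example.com", "pm@example.com"}},
}

def is_feature_on(feature, user_email):
    toggle = FEATURE_TOGGLES.get(feature)
    return toggle is not None and user_email in toggle["enabled_for"]

# The owners see the feature on production; everyone else gets the old flow.
print(is_feature_on("new-checkout-flow", "pm@example.com"))    # True
print(is_feature_on("new-checkout-flow", "user@example.com"))  # False
```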

Take ownership
Developers take ownership of the whole process, which means they are the ones who eventually deploy to production. This statement actually changes the roles of several departments. What the company needs to do is remove every obstacle in the developer’s way, so they can quickly deploy to production on their own.

The operations department will no longer be doing the deployments. What they will do from now on is create the automated tooling that allows developers to deploy on their own.
Operations together with dev will create the tooling to make the production environment visible and easy for developers to understand. Developers should not have to start a shell console (ssh) in order to see what is going on with the servers. We created web views of both system and application metrics, as well as exceptions.


Continuous Delivery - Part 6 - Backward & Forward Compatibility

Filed under: — Aviran Mordo

Previous Chapter: Startup - Self Test

One very important mindset developers will have to adopt and practice is backward and forward compatibility.

Most production systems do not consist of just one server, but a cluster of servers. When deploying a new piece of code, you do not deploy it to all the servers at once, because part of a continuous deployment strategy is zero downtime during deployment. If you deploy to all the servers at once and the deployment requires a server restart, then all the servers restart at the same time, causing downtime.

Now think of the following scenario. You write new code that adds a new field to a DTO, which is written to a database. If you deploy your servers gradually, there will be a period of time when some servers have the new code that uses the new field and some do not. The servers with the new code will send the new field in the DTO, while the servers that have not yet been deployed will not have the new field and will not recognize it.
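One common way to keep old servers safe during such a gradual rollout is a “tolerant reader”: parse only the fields you know about and ignore the rest. A minimal sketch, with illustrative field names (the source does not prescribe this exact mechanism):

```python
import json

# An old server receives a DTO that may contain fields added by newer code.
# Instead of failing on unknown keys, it reads only the fields it knows.
KNOWN_FIELDS = {"user_id", "name"}

def parse_user_dto(payload):
    data = json.loads(payload)
    # Silently drop fields this version does not understand yet.
    return {k: v for k, v in data.items() if k in KNOWN_FIELDS}

# A newer server added "nickname"; the old server still works.
new_payload = '{"user_id": 7, "name": "Dana", "nickname": "D"}'
print(parse_user_dto(new_payload))  # {'user_id': 7, 'name': 'Dana'}
```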


One more important concept is to avoid deployment dependencies, where you have to deploy one set of services before another. Our previous example makes this even worse. Say you work with an SOA architecture: you now have some clients that send the new field and some that do not. Or you deploy the clients that send the new field before you have deployed the servers that can read it, which might break them. You might say: I will simply deploy the server that can read the new field first, and only then deploy the client that sends it. However, in continuous deployment, as easily as you can deploy new code you can also roll it back. So even if you deploy the server first and then the client, you might later roll back the server without rolling back the client, again creating a situation where clients send unknown fields to the server.


Continuous Delivery - Part 5 - Startup - Self Test

Filed under: — Aviran Mordo

Previous Chapter: A/B Testing

So far we discussed feature toggles and A/B testing. These two methods provide safeguards so that your code does not harm your system. Feature toggles let you gradually enable new features and expose them to users while monitoring that the system behaves as expected. A/B testing, on the other hand, lets you test how your users react to new features. There is one more crucial test you, as a developer, should write that will protect your system from bad deployments (and is also a key to implementing gradual deployment).

Self-test, sometimes called a startup test or post-deployment test, is a test where your system checks that it can actually work properly. A working program does not consist only of the code the developer writes. For an application to work it needs configuration values, external resources, and dependencies such as the databases and external services it relies on.

When an application loads and starts up, its first operation should be the self-test. Your application should refuse to process any operation if the self-test did not pass successfully.
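A minimal sketch of such a startup self-test. The individual check functions here are placeholders under assumed names; a real one would open an actual database connection and call its real dependencies:

```python
# Hypothetical startup self-test: verify configuration and dependencies
# before the application agrees to serve traffic.

def check_config(config):
    # All required configuration values must be present.
    required = {"db_url", "service_url"}
    return required.issubset(config)

def check_database(config):
    # Placeholder: a real check would open a connection and run a ping query.
    return config.get("db_url", "").startswith("postgres://")

def self_test(config):
    checks = [check_config, check_database]
    return all(check(config) for check in checks)

config = {"db_url": "postgres://db:5432/app", "service_url": "http://billing"}
if not self_test(config):
    raise SystemExit("self-test failed - refusing to serve traffic")
print("self-test passed - accepting requests")
```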


Continuous Delivery - Part 4 - A/B Testing

Filed under: — Aviran Mordo

Previous chapter: Continuous Delivery - Part 3 - Feature Toggles

UPDATE: We released PETRI, our 3rd-generation experiment system, as an open source project available on GitHub

From Wikipedia: In web development and marketing, as well as in more traditional forms of advertising, A/B testing or split testing is an experimental approach to web design (especially user experience design), which aims to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement). As the name implies, two versions (A and B) are compared, which are identical except for one variation that might impact a user’s behavior. Version A might be the currently used version, while Version B is modified in some respect. For instance, on an e-commerce website the purchase funnel is typically a good candidate for A/B testing, as even marginal improvements in drop-off rates can represent a significant gain in sales. Significant improvements can be seen through testing elements like copy text, layouts, images and colors.

Although it sounds similar to feature toggles, there is a conceptual difference between A/B testing and feature toggles. With an A/B test you measure an outcome of a completed feature or flow, which hopefully does not have bugs. A/B testing is a mechanism to expose a finished feature to your users and test their reaction to it, while with a feature toggle you want to verify that the code behaves properly, as expected and without bugs. In many cases feature toggles are used on the back-end, where users don’t really experience changes in flow, while A/B tests are used on the front-end that exposes the new flow or UI to users.

Consistent user experience
One important point in A/B testing is a consistent user experience. For instance, you cannot display a new menu option one time and then not show the same option when the user returns to your site or refreshes the browser. So whatever strategy your A/B test uses to determine whether a user is in group A or group B, it should be consistent: if a user comes back to your application, they should always “fall” into the same test group.
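Deterministically hashing a stable user id is one common way to get this consistency; the same user always lands in the same group without storing any assignment. A minimal sketch, with an assumed 50/50 split (the post does not describe PETRI’s actual mechanism):

```python
import hashlib

def ab_group(user_id, experiment, b_percent=50):
    # Hash a stable user id together with the experiment name, so the same
    # user always gets the same group for a given experiment, while
    # different experiments split the population independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "B" if bucket < b_percent else "A"

# The assignment is stable across visits and browser refreshes.
first = ab_group("user-42", "new-menu")
again = ab_group("user-42", "new-menu")
print(first == again)  # True
```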
