12/28/2013

Kill The Deadlines

Filed under: — Aviran Mordo

I have been building software for over 20 years and participated in many projects. Usually when you come to write a new feature or starting a new project one of the first thing your manager asks you is a time estimate (if you are lucky) and then will set a deadline for the project completion.

Once a deadline is set everybody is start working to meet this date. While setting a deadline helps management plan and have visibility about what is coming from the development pipeline, the act of setting a deadline, especially an aggressive one is a destructive act that in most cases hearts the company in the long run.

Development work consists of much more than writing and testing code, it also consists of research and design. The problem is that while (experienced) developer can estimate the coding effort there is no real way to estimate the research phase, problems you may encounter and how long the design is going to take (if you want to make a good design). How can you really estimate how long it will take you to learn something that you don’t know?

If deadlines are aggressive, meeting them usually means that developers will start cutting corners. Do a bad design just because you don’t have the time to the right one. If developers are pressed in time they may stick to the bad design choice just because they don’t have time to switch the design to a better one after they realize their initial design has flaws.

Other things developers do in order to meet the deadline is to cut down on testing, while doing that hurts the quality of their work. While cutting down on automated testing may let you believe the work is progressing at a higher rate, however you will usually find the problems in production and spend much more time stabilizing the system after it is deployed. You might think you met the deadline shipping the product, but the quality of the product is low and you are going to pay for that in maintenance and reputation.

In addition to all that working to meet deadlines create stress on the developers which is not a healthy environment to be in for a long time, if they move from one deadline to another.

Now don’t get me wrong, by not setting a deadline you are not giving a free hand to project to stretch beyond reason. Developers should be aware that their performance is measured but they should also know that they can take the time to produce a high quality product by not cutting corners. In most cases a project can be delayed by few days or weeks without any real consequences to the company, and developers should know that if they have the time they need to produce a good product.

In the exception where you do have a deadline which you cannot postpone the delivery, you should take into consideration that there will quality issues and design flaws. After the delivery you should give developers time to complete the missing tests, do necessary refactoring and bring the product to the desired quality.

11/18/2013

Dev Centric - Trust And Collaboration

Filed under: — Aviran Mordo

After my first post about Dev Centric culture I got many questions on the topic which I will try to explain in the next few posts.
At first glance Dev Centric sounds like the only one that matters is the developer and everyone else are just simple helpers. This cannot be farther from the truth.

To understand Dev Centric let’s take a step back and describe how a company grows but before that we need to understand how a software company ship a product. Like every manufacturing plant a software company also have a pipe line that a product need to go through in order for it to be shipped to the customers.

Here is a standard software pipe line.

Product definition -> Design -> Develop -> Build -> QA -> Deploy

When a company is small you have only a handful of developers which are the pipeline, and do all the work. They design, code, deploy, maintain, do QA and also define the product. However while good developer can do all these tasks it comes with a price, they do not focus on what they are best of, which is writing code.

So in order to make the product better a company hires a product manager, which is probably going to do a better job at designing a product than the developer. Product manager has better skills and specializes in product definition phase of the pipe line. Now while the company is small PM work closely with the developers with a lot of collaboration.

Same goes with QA, operations and architects which each one can probably do a better job than the developer on their own field of experties. While the company is small they all work together. However when a company grows then walls starting to show as every group want to control their aspect of the pipe line, which in turns causes mistrust, and slows down the whole “production line” as you define more structured process and work flows.

Dev Centric culture tries to make the production line as fast as possible. Now if we look at the pipeline the one person who we cannot be without is the developer. The developer IS the production line, he is the one that manufacture the product. However since we also want the best quality product we cannot give up PM, QA and Ops, they are very important part of the manufacturing floor.

So since the developer is the one that ships the product we want to make the process as fast and as efficient as we can without losing quality. In order for the developer to create the best product he needs to understand it.
The best way to understand a product is to help define it. This is where Dev Centric comes into the picture. Developers should work together with PM and help define the product. Not buy getting a thousand pages spec but to sit in the same room with the PM and discuss the product, define it while writing the code. This way the code that the developer writes is the right one and there are no misunderstandings between what the PM intent and what the developer understood.

Same goes with QA. Developers should work closely with QA so they understand each other. QA understands the product the same way the developers do. The developers should continuously release code to QA during the development process to shorten the cycles. The best way is to work Test Driven Development (TDD) where developers are also writing the automated tests for their code and QA backs up the developers with more comprehensive end to end tests which also serve as acceptance tests. Another important role for QA is to review the developer’s test cases and point out if there are un-tested use cases.
So the same goes with Ops and architect like mentioned in my previous post.

Dev centric basically clears the path for the developer to deliver faster and better product by breaking the walls in the manufacturing floor having everyone working in collaboration and focusing on what is important (delivering the best product) by creating trust between people with different agendas and by doing that create a productive environment.

10/7/2013

Building R&D Culture Based On Quality To Drive Innovation

Filed under: — Aviran Mordo

When I joined Wix in 2010 my job was to rebuild the back-end engineering team. The previous team that took the company to where it was back then was scattered to other projects and except for one developer from the original team I had a completely new team.

A couple of months after I arrived and started to figure things out with my new team we decided to move to continuous delivery methodology. Since we faced many challenges in both moving to continuous delivery and the need to re-write the whole back-end system, we needed very good software engineers to build a new framework and to be the first ones to lead the company’s Dev-Centric culture.

We wanted to create a culture based on quality in terms of software engineering and people responsibilities. Since every person in a growing company has a profound effect on the company’s culture, it sets the tone for the recruitment process. Ever since I got to Wix I have never stopped recruiting engineers, however recruiting is a big challenge. I was looking for exceptional software engineers. The standards for passing the interview process is very high and very few actually succeeded, but that is a price I’m willing to pay in order to build an ‘A team’.

(more…)

8/22/2013

Lifecycle – Wix’ Integrated CI/CD Dashboard

Filed under: — Aviran Mordo

This post was originally published by Ory Henn on Wix engineering blog, who is part of our continuous integration / continuous deployment team. Since I write about Wix’s continuous delivery I think this post might interest you.

There Are So Many CI/CD Tools, Why Write One In-house?

About 3 years ago we first set foot on the long and winding road to working in a full CI/CD mode. Since then we have made some impressive strides, finding the way to better development and deployment while handling several years’ worth of legacy code.

An important lesson learned was that CI/CD is a complex process, both in terms of conceptual changes required from developers and in terms of integrating quite a few tools and sub-processes into a coherent and cohesive system. It quickly became clear that with the growing size of the development team, a central point of entry into the system was needed. This service should provide a combined dashboard and action center for CI/CD, to prevent day to day development from becoming dependent on many scattered tools and requirements.

To this end, we started building Wix Lifecycle more than two years ago. This post describes Lifecycle in broad terms, giving an overview of what it can do (and a little about what it is going to be able to do). Following posts will describe interesting series and other fun stuff the system does.

Keep in mind that our processes are nowhere near full CI/CD yet, and that Wix Lifecycle has to allow a great deal of backward compatibility for processes that are remnants of earlier days, and for teams that move towards the end goal at a different pace.
So What Does It Do?
(more…)

8/15/2013

Continuous Delivery - Part 8 - Deploying To Production

Filed under: — Aviran Mordo

Previous chapter: Cultural Change

It is about time we talk about the actual continuous delivery process works, application lifecycle and how the code reaches production once development is done.

Phase 1: Git – Developers push the completed code (and the tests) to a Git repository.

Phase 2: Build – Team city (our CI build server) is triggered - checks out the code from Git; Runs the build, unit tests and integration tests. Once the build is passed a SNAPSHOT artifact is created and pushed to Artifactory.

So far the process was automatically. At this point we decided to go to a manual process in order to do the actual deployments. However this process should be as easy as pressing a button (which it actually is). We are now in the process of also automating deployment to staging making staging continuous deployment environment and production continuous delivery.

(more…)

6/24/2013

Continuous Delivery - Part 7 - Cultural Change

Filed under: — Aviran Mordo

Previous chapter: Backward and forward compatibility

In order for continuous delivery to work the organization has to go a cultural change and switch to Dev-Centric Culture.

Continuous delivery gives a lot of responsibility in the hand of the developer and as such the developer need to have a sense of ownership. At Wix we say that it is the developer’s responsibility to bring a product / feature to production, and he is responsible from the feature inception to the actual delivery of the product and maintaining it on the production environment.

In order to do that several things have to happen.

Know the business :
The developer has to know the business and be connected to the product he is developing. By understanding the product the developers makes better decisions and better products. Developers are pretty smart people and they don’t need to have product specs in their face (nobody actually reads them). Our developers work together with the product mangers to determine what the product for the feature should be. Remember while the actual product may have several features bundled together for it to be valuable for the customer, we develop per feature and deploy it as soon as it is ready. This way both the product manager and the developer get to test and experience each feature on production (it is only exposed to them via Feature toggle) and determine if it is good enough, and may cause the direction of the product to change not according to plan, and actually develop a different next feature than the planned one.

Take ownership
Developers are taking ownership on the whole process, which means that they are the ones that need to eventually deploy to production. This statement actually changes the roles of several departments. What the company needs to do is to remove every obstacle in the developers way to deploy quickly on his own to production.

The operations department will no longer be doing the deployments. What they will do from now on is to create the automated tooling that will allow the developers to deploy on his own.
Operations together with dev will create the tooling to make the production environment visible and easy to understand to developers. Developers should not start a shell console (ssh) in order to view and know what is going on with the servers. We created web views for monitored metrics of both system and application metrics and exceptions.
(more…)

4/21/2013

Continuous Delivery - Part 6 - Backward & Forward Compatibility

Filed under: — Aviran Mordo

Previous Chapter: Startup - Self Test

One very important mind set developers will have to adopt and practice is backward and forward compatibility.

Most production system do not consist on just one server, but a cluster of servers. When deploying new piece of code, you do not deploy it to all the servers at once because part of Continuous deployment strategy is zero downtime during deployment. If you deploy to all the servers at once and the deployment require a server restart then all the servers will restart at the same time causing a downtime.

Now think of the following scenario. You write a new code that requires a new field in a DTO and it is written to a database. Now if you deploy your servers gradually you will have a period of time that some servers will have the new code that uses the new field and some will not. The servers that have the new code will send the new field in the DTO and the servers that not yet deployed will not have the new field and will not recognize it.

Continuous Delivery - Backward & Forward Compatibility

One more important concept is to avoid deployment dependencies where you have to deploy one set of services before you deploy the other set. If we’ll use our previous example this will even make things worse. Let’s say you work with SOA architecture and you have now clients that send the new field and some clients that do not. Or you deploy the clients that now send the new field but you have not yet deployed the servers that can read them and might break. You might say, well I will not do that and I will first deploy the server that can read the new field and only after that I’ll deploy the client that sends it. However in Continuous deployment as easily as you can deploy new code you can also rollback you code. So even if you deploy first the server and then the client you might now roll back the server without rolling back the client, thus creating again the situation where clients send unknown fields to the server.
(more…)

4/14/2013

Continuous Delivery - Part 5 - Startup - Self Test

Filed under: — Aviran Mordo

Previous Chapter: A/B Testing

So far we discussed Feature Toggle and A/B testing. These two methods enable safe guards that your code does not harm your system. Feature toggles enable to gradually use new features and gradually expose it to users, while monitoring that the system behaves as expected. A/B testing on the other hand let you test how your users react to new features. There is one more crucial test you as a developer should write that will protect your system from bad deployments (and also be a key to gradual deployment implementation).

Self-test sometimes called startup test or post deployment test is a test where your system checks that it can actually work properly. A working program does not only consist of the code that the developer write. In order for an application to work it needs configuration values, external resources and dependencies such as databases and external services it depends on to work properly.

When an application loads and starts up its first operation should be Self-test. Your application should refuse to process any operation if the self-test did not pass successfully.
(more…)

4/4/2013

Continuous Delivery - Part 4 - A/B Testing

Filed under: — Aviran Mordo

Previous chapter: Continuous Delivery - Part 3 - Feature Toggles

From Wikipedia: In web development and marketing, as well as in more traditional forms of advertising, A/B testing or split testing is an experimental approach to web design (especially user experience design), which aims to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement). As the name implies, two versions (A and B) are compared, which are identical except for one variation that might impact a user’s behavior. Version A might be the currently used version, while Version B is modified in some respect. For instance, on an e-commerce website the purchase funnel is typically a good candidate for A/B testing, as even marginal improvements in drop-off rates can represent a significant gain in sales. Significant improvements can be seen through testing elements like copy text, layouts, images and colors.

Although it sounds similar to feature toggles, there is a conceptual difference between A/B testing and feature toggles. With A/B test you measure an outcome for of a completed feature or flow, which hopefully does not have bugs. A/B testing is a mechanism to expose a finished feature to your users and test their reaction to it. While with feature toggle you would like to test that the code behaves properly, as expected and without bugs. In many cases feature toggles are used on the back-end where the users don’t not really experience changes in flow, while A/B tests are used on the front-end that exposes the new flow or UI to users.

Consistent user experience.
One important point to notice in A/B testing is consistent user experience. For instance you cannot display a new menu option one time, not show the same option a second time the user returns to your site or if the user refreshes the browser. So depending on the strategy you’re A/B test works to determine if a user is in group A or in group B , it should be consistent. If a user comes back to your application they should always “fall” to the same test group.
(more…)

3/27/2013

Continuous Delivery - Part 3 - Feature Toggles

Filed under: — Aviran Mordo

Previous chapter:The Road To Continuous Delivery - Part 2 - Visibility

One of the key elements in Continuous Delivery is the fact that you stop working with feature branches in your VCS repository; everybody works on the MASTER branch. During our transition to Continuous Deployment we switched from SVN to Git, which handles code merges much better, and has some other advantages over SVN; however SVN and basically every other VCS will work just fine.

For people who are just getting to know this methodology it sounds a bit crazy because they think developers cannot check-in their code until it’s completed and all the tests pass. But this is definitely not the case. Working in Continuous Deployment we tell developers to check-in their code as often as possible, at least once a day. So how can this work? Developers cannot finish their task in one day? Well there are few strategies to support this mode of development.

Feature toggles
Telling your developers they must check-in their code at least once a day will get you the reaction of something like “But my code is not finished yet, I cannot check it in”. The way to overcome this “problem” is with feature toggles.

Feature Toggle is a technique in software development that attempts to provide an alternative to maintaining multiple source code branches, called feature branches.
Continuous release and continuous deployment enables you to have quick feedback about your coding. This requires you to integrate your changes as early as possible. Feature branches introduce a by-pass to this process. Feature toggles brings you back to the track, but the execution paths of your feature is still “dead” and “untested”, if a toggle is “off”. But the effort is low to enable the new execution paths just by setting a toggle to “on”.

So what is really a feature toggle?
Feature toggle is basically an “if” statement in your code that is part of your standard code flow. If the toggle is “on” (the “if” statement == true) then the code is executed, and if the toggle is off then the code is not executed.
Every new feature you add to your system has to be wrapped in a feature toggle. This way developers can check-in unfinished code, as long as it compiles, that will never get executed until you change the toggle to “on”. If you design your code correctly you will see that in most cases you will only have ONE spot in your code for a specific feature toggle “if” statement.
(more…)

3/23/2013

The Road To Continuous Delivery - Part 2 - Visibility

Filed under: — Aviran Mordo

Previous chapter: The road to continuous delivery - Part 1

Production visibility

A key point for a successful continuous delivery is to make the production matrix available to the developers. At the heart of continuous delivery methodology is to empower the developer and make the developers responsible for deployment and successful operations of the production environment. In order for the developers to do that you should make all the information about the applications running in production easily available.

Although we give our developers root (sudo) access for the production servers, we do not want our developers to look at the logs in order to understand how the application behaves in production and to solve problems when they occur. Instead we developed a framework that every application at Wix is built on, which takes care of this concern.

Every application built with our framework automatically exposes a web dashboard that shows the application state and statistics. The dashboard shows the following (partial list):
• Server configuration
• All the RPC endpoints
• Resource Pools statistics
• Self test status (will be explained in future post)
• The list of artifacts (dependencies) and their version deployed with this version
• Feature toggles and their values
• Recent log entries (can be filtered by severity)
• A/B tests
• And most importantly we collect statistics about methods (timings, exceptions, number of calls and historical graphs).

Wix Dashboard

We use code instrumentation to automatically expose statistics on every controller and service end-point. Also developers can annotate methods they feel is important to monitor. For every method we can see the historical performance data, exception counters and also the last 10 exceptions for each method.

We have 2 categories for exceptions: Business exceptions and System exceptions.
Business exception is everything that has to do with application business logic. You will always have these kinds of exceptions like validation exceptions. The important thing to monitor on this kind of exception is to watch for sudden increase of these exceptions, especially after deployment.

The other type of exception is System exception. System exception is something like: “Cannot get JDBC connection”, or “HTTP connection timeout”. A perfect system should have zero System exceptions.
(more…)

3/16/2013

The Road To Continuous Delivery - Part 1

Filed under: — Aviran Mordo

The following series of posts are coming from my experience as the head of back-end engineering at Wix.com. I will try to tell the story of Wix and how we see and practice continuous delivery, hoping it will help you make the switch too.

So you decided that your development process is too slow and thinking to go to continuous delivery methodology instead of the “not so agile” Scrum. I assume you did some research, talked to a couple of companies and attended some lectures about the subject and want to practice continuous deployment too, but many companies asking me how to start and what to do?
In this series of articles I will try to describe some strategies to make the switch to Continuous delivery (CD).

Continuous Delivery is the last step in a long process. If you are just starting you should not expect that you can do this within a few weeks or even within few months. It might take you almost a year to actually make several deployments a day.
One important thing to know, it takes full commitment from the management. Real CD is going to change the whole development methodologies and affect everyone in the R&D.

Phase 1 – Test Driven Development
In order to do a successful CD you need to change the development methodology to be Test Driven Development. There are many books and online resources about how to do TDD. I will not write about it here but I will share our experience and the things we did in order to do TDD. One of the best books I recommend is “Growing Object Oriented Guided by tests

A key concept of CD is that everything should be tested automatically. Like most companies we had a manual QA department which was one of the reasons the release process is taking so long. With every new version of the product regression tests takes longer.

Usually when you’ll suggest moving to TDD and eventually to CI/CD the QA department will start having concerns that they are going to be expandable and be fired, but we did not do such thing. What we did is that we sent our entire QA department to learn Java. Up to that point our QA personnel were not developers and did not know how to write code. Our initial thought was that the QA department is going to write tests, but not Unit tests, they are going to write Integration and End to End Tests.

Since we had a lot of legacy code that was not tested at all, the best way to test it, is by integration tests because IT is similar to what manual QA is doing, testing the system from outside. We needed the man power to help the developers so training the QA personal was a good choice.

Now as for the development department, we started to teach the developers how to write tests. Of course the first tests we wrote were pretty bad ones but as time passes, like any skill, knowing how to write good test is also a skill, so it improves in time.
In order to succeed in moving to CD it is critical to get support from the management because before you see results there is a lot of investments to be done and development velocity is going to sink even further as you start training and building the infrastructure to support CD.
(more…)

Powered by WordPress