3/23/2013

The Road To Continuous Delivery - Part 2 - Visibility

Filed under: — Aviran Mordo

Previous chapter: The road to continuous delivery - Part 1

Production visibility

A key point for a successful continuous delivery is to make the production matrix available to the developers. At the heart of continuous delivery methodology is to empower the developer and make the developers responsible for deployment and successful operations of the production environment. In order for the developers to do that you should make all the information about the applications running in production easily available.

Although we give our developers root (sudo) access for the production servers, we do not want our developers to look at the logs in order to understand how the application behaves in production and to solve problems when they occur. Instead we developed a framework that every application at Wix is built on, which takes care of this concern.

Every application built with our framework automatically exposes a web dashboard that shows the application state and statistics. The dashboard shows the following (partial list):
• Server configuration
• All the RPC endpoints
• Resource Pools statistics
• Self test status (will be explained in future post)
• The list of artifacts (dependencies) and their version deployed with this version
• Feature toggles and their values
• Recent log entries (can be filtered by severity)
• A/B tests
• And most importantly we collect statistics about methods (timings, exceptions, number of calls and historical graphs).

Wix Dashboard

We use code instrumentation to automatically expose statistics on every controller and service end-point. Also developers can annotate methods they feel is important to monitor. For every method we can see the historical performance data, exception counters and also the last 10 exceptions for each method.

We have 2 categories for exceptions: Business exceptions and System exceptions.
Business exception is everything that has to do with application business logic. You will always have these kinds of exceptions like validation exceptions. The important thing to monitor on this kind of exception is to watch for sudden increase of these exceptions, especially after deployment.

The other type of exception is System exception. System exception is something like: “Cannot get JDBC connection”, or “HTTP connection timeout”. A perfect system should have zero System exceptions.
(more…)

Powered by WordPress