8/14/2015

Games of Gangs

Filed under: — Aviran Mordo

Working in a product company, you are always torn between the product’s short-term and long-term goals and the tasks that engineering wants and needs to do but that have nothing to do with the product itself: for instance improving the testing framework, building IDE plugins that improve engineers’ day-to-day work, or even creating back-office tools that solve day-to-day problems for other people in the company.
There are also tasks that engineers want to do to pay technical debt on the product, thus improving the long-term maintainability of the code.
Getting this quality time for tasks not directly related to the product development is hard as there is always pressure to release the next feature.

This is where the Guild steps in: it helps make this possible and creates a balance between working on features and taking care of other engineering and company-wide concerns.

As I described in the previous post, 20% of the time (one day a week) is dedicated to Guild activities. As part of the Guild activities, we wanted this day to be not only about talking and learning but also about doing. So we created a game called “Games of Gangs”.

“Games of Gangs” is a gamification of the Guild tasks, and at its core is our main value of building an engineering culture and sharing knowledge.
While the first half of the day is mostly dedicated to retrospective and training, the second half of the day is dedicated to doing Guild related tasks.

A Games of Gangs task can be anything that is not directly related to the product that the engineer is working on. Also we want to enhance our engineering culture and knowledge sharing by using these tasks as a tool for learning and improving. So here are the guidelines we put for the game’s tasks (these are guidelines and not rules):
Tasks should be done in pair programming with an engineer from a different team.
Tasks should conform to at least one criterion:

  • Enhance quality
  • Improve velocity
  • Enhance our framework
  • Help another company with their own tasks
  • Share knowledge

Examples of good tasks we have had: creating a Maven archetype for new projects; reducing build time; creating a CMS for our studio to manage templates; enhancing our monitoring capabilities.

To kick-start this activity and to encourage people’s participation, we assigned points to tasks based on the task’s value and its knowledge-sharing value. For instance, if you do a solo task you get only 1 point, but if you do it in a pair each of you gets 2 points. If you pair up with someone not from your company you get 3 points, and if you do it with someone from an off-shore office you get 4 points.
You would also get points for doing lectures, writing blog posts on our engineering blog and other knowledge sharing activities.

Games of Gangs can sometimes be dedicated to a specific topic we want to push, for instance cleaning up warnings in the code or upgrading to a new Scala version.

So “Games of Gangs” has become a great way to balance the engineering needs and the product needs while putting our engineering culture into play. It also creates the much-needed personal relationships between Guild members who do not meet on any other day, as they work for different companies at different physical locations.

8/12/2015

MySQL Is a Great NoSQL

Filed under: — Aviran Mordo

NoSQL is a set of database technologies built to handle massive amounts of data or specific data structures foreign to relational databases. However, the choice to use a NoSQL database is often based on hype, or a wrong assumption that relational databases cannot perform as well as a NoSQL database. Operational cost is often overlooked by engineers when it comes to selecting a database. At Wix engineering, we’ve found that in most cases we don’t need a NoSQL database, and that MySQL is a great NoSQL database if it’s used appropriately.

When building a scalable system, we found that an important factor is using proven technology so that we know how to recover fast if there’s a failure. For example, you can use the latest and greatest NoSQL database, which works well in theory, but when you have production problems, how long does it take to resume normal activity? Pre-existing knowledge and experience with the system and its workings—as well as being able to Google for answers—is critical for swift mitigation. Relational databases have been around for over 40 years, and there is a vast industry knowledge of how to use and maintain them. This is one reason we usually default to using a MySQL database instead of a NoSQL database, unless NoSQL is a significantly better solution to the problem—for example, if we need a document store, or to handle high data volume that MySQL cannot handle.

However, using MySQL in a large-scale system may have performance challenges. To get great performance from MySQL, we employ a few usage patterns. One of these is avoiding database-level transactions. Transactions require that the database maintains locks, which has an adverse effect on performance.

Instead, we use logical application-level transactions, thus reducing the load and extracting high performance from the database. For example, let’s think about an invoicing schema. If there’s an invoice with multiple line items, instead of writing all the line items in a single transaction, we simply write line by line without any transaction. Once all the lines are written to the database, we write a header record, which has pointers to the line items’ IDs. This way, if something fails while writing the individual lines to the database, and the header record was not written, then the whole transaction fails. A possible downside is that there may be orphan rows in the database. We don’t see it as a significant issue though, as storage is cheap and these rows can be purged later if more space is needed.
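The invoicing flow described above can be sketched roughly as follows. This is only an illustration, not our production code: Python’s built-in sqlite3 module in autocommit mode stands in for MySQL so that the sketch runs without a server, and the table names, column names and the write_invoice / read_invoice helpers are hypothetical.

```python
import json
import sqlite3
import uuid

# Assumption: sqlite3 in autocommit mode stands in for MySQL, so every
# statement is committed on its own and no database transaction is opened.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE invoice_lines (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
conn.execute("CREATE TABLE invoice_headers (id TEXT PRIMARY KEY, line_ids TEXT NOT NULL)")


def write_invoice(conn, items):
    """Write each line item on its own, then the header that points to them."""
    line_ids = []
    for item in items:
        line_id = str(uuid.uuid4())  # client-generated key, no auto-increment
        conn.execute(
            "INSERT INTO invoice_lines (id, body) VALUES (?, ?)",
            (line_id, json.dumps(item)),
        )
        line_ids.append(line_id)
    # The header is the logical commit point: if we fail before this insert,
    # the lines already written are orphans that no reader will ever reach.
    invoice_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO invoice_headers (id, line_ids) VALUES (?, ?)",
        (invoice_id, json.dumps(line_ids)),
    )
    return invoice_id


def read_invoice(conn, invoice_id):
    """Return the invoice's line items, or None if no header was written."""
    row = conn.execute(
        "SELECT line_ids FROM invoice_headers WHERE id = ?", (invoice_id,)
    ).fetchone()
    if row is None:
        return None
    lines = []
    for line_id in json.loads(row[0]):
        body = conn.execute(
            "SELECT body FROM invoice_lines WHERE id = ?", (line_id,)
        ).fetchone()[0]
        lines.append(json.loads(body))
    return lines
```

Readers always start from the header and follow its pointers, so a partially written invoice is simply invisible. Every lookup is by primary key, with no joins.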

Here are some of our other usage patterns to get great performance from MySQL:

  • Do not have queries with joins; only query by primary key or index.
  • Do not use sequential primary keys (auto-increment) because they introduce locks. Instead, use client-generated keys, such as GUIDs. Also, when you have master-master replication, auto-increment causes conflicts, so you will have to create key ranges for each instance.
  • Any field that is not indexed has no right to exist. Instead, we fold such fields into a single text field (JSON is a good choice).

We often use MySQL simply as a key-value store. We store a JSON object in one of the columns, which allows us to extend the schema without making database schema changes. Accessing MySQL by primary key is extremely fast, and we get submillisecond read time by primary key, which is excellent for most use cases. So we found that MySQL is a great NoSQL that’s ACID compliant.
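As a rough illustration of the key-value pattern, here is a minimal sketch. Again, sqlite3 stands in for MySQL so that it runs as-is, and the site_settings table, its columns and the put / get helpers are hypothetical names chosen for the example.

```python
import json
import sqlite3
import uuid

# Assumption: sqlite3 stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE site_settings ("
    " id TEXT PRIMARY KEY,"     # client-generated GUID
    " owner_id TEXT NOT NULL,"  # kept as a column because it is indexed
    " body TEXT NOT NULL)"      # every non-indexed field, folded into JSON
)
conn.execute("CREATE INDEX idx_site_settings_owner ON site_settings (owner_id)")


def put(conn, owner_id, fields):
    """Store a JSON document under a new client-generated key."""
    key = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO site_settings (id, owner_id, body) VALUES (?, ?, ?)",
        (key, owner_id, json.dumps(fields)),
    )
    return key


def get(conn, key):
    """Fetch a document by primary key, or None if the key is unknown."""
    row = conn.execute(
        "SELECT body FROM site_settings WHERE id = ?", (key,)
    ).fetchone()
    return None if row is None else json.loads(row[0])


first = put(conn, "owner-1", {"theme": "dark"})
# A later version of the application adds a field; no ALTER TABLE is needed.
second = put(conn, "owner-1", {"theme": "light", "locale": "en"})
```

Only the key and the indexed column are visible to the database; everything else lives in the JSON body, so the application can evolve the document shape on its own.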

In terms of database size, we found that a single MySQL instance can work perfectly well with hundreds of millions of records. Most of our use cases do not have more than several hundred million records in a single instance.

One big advantage to using relational databases as opposed to NoSQL is that you don’t need to deal with the eventually consistent nature displayed by most NoSQL databases. Our developers all know relational databases very well, and it makes their lives easy.

Don’t get me wrong, there is a place for NoSQL; relational databases have their limits: single host size and strict data structures. Operational cost is often overlooked by engineers in favor of the cool new thing. If both options are viable, we believe we need to really consider what it takes to maintain each of them in production and decide accordingly.

This article is published on JAX Magazine.

I will be speaking at JAX London and would be happy if you join my sessions. Get 10% off if you use promo code: SPKR_JLAM

8/10/2015

Building a Guild

Filed under: — Aviran Mordo

A lot of people have heard about Spotify’s company structure of Guilds and tribes. We at Wix.com have a similar structure that has evolved over time and was influenced by their ideas; however, we have our own interpretation of the structure and the role of the Guild.

In this article I will try to walk down memory lane and describe my experience in building the first Guild (the back-end JVM Guild) at Wix, which is now the role model for all the other Guilds at the company, and how it has evolved from the time I joined Wix, when we had one back-end team of 4 developers, to about 100 back-end engineers in the back-end Guild.

The Guild model did not start right away. When you are a relatively small startup, all you have is teams, and this is exactly what we had. We had one server team (4 engineers) that was basically responsible for all the back-end development at Wix. As demand for back-end engineers grew, the team grew very slowly. As in any small startup, the recruitment process was very picky, and we were looking only for the best engineers. Over the course of a year I recruited only 4 senior engineers. While this is very slow, at this stage of the company it was very important to pick only the best engineers we could find, as they are the core engineering team that would help build and shape the Guild and the engineering culture at the company in the future.

At this point, when we had around 10 engineers, we were pretty much functional teams, where everybody knew almost everything and I could move people from project to project according to the company’s priorities.

As we continued to grow (doubling the number of people every year), we saw that we were very good at focusing our efforts on the areas where the company had decided to invest at that point, but we were neglecting other existing products, which had to compete for shared engineering resources without any priority.

At this point we realized that we needed dedicated engineers for each product group (at least for the big ones). We still didn’t have a name for that, but I had essentially assigned some developers to be dedicated to certain products while the others remained shared resources.

As Wix continued its growth, we had different groups of people who worked on different projects and were less engaged with each other. So we started to formalize our engineering culture. While we always had a strong ownership and DevOps culture, we became more and more involved in knowledge-sharing activities in order to keep our engineering teams on the cutting edge and learn from each team’s experience.

At this point we started to have discussions about how to structure the company. We looked around and found the Spotify paper. We realized that while we didn’t have a name for our current structure, it resembled what Spotify had. So we adopted some of the naming and agreed that we should be working in a form of Guilds, which are defined by a profession, and Gangs, which are the product teams.
Initially we only had the engineering Gangs, who were dedicated to a product, with all the others as shared resources across products.

This was the point where the role of the Guild had started to form.

The Guild is responsible for the person’s profession, so it has the following responsibilities:

  • Recruitment (hiring and firing).
  • Assignment to product teams according to the company’s priorities.
  • Setting the professional guidelines.
  • Training.
  • Setting the engineers’ compensation (salary, bonuses, etc.).
  • Creating an engineering brand for the company.
  • Being responsible for the professional development and careers of the engineers.

As Wix continued to grow, we started to have more and more projects and product teams. What we realized then was that having dedicated engineering teams (Gangs) is not enough, because there was a bottleneck on the other shared resources. We also had multiple products that shared a common domain. What we wanted to do was to give as much independence as possible to each product domain / vertical.

So once more we had to evolve, and we created what we now call a “Company”. A “Company” is like a startup within Wix: it has all the resources it needs (developers, product managers, analysts, UX engineers, marketing, etc.) in order to progress fast and create the best product it can, regardless of the other products at Wix.

At this point the Guild also had to take on more responsibilities. While we want the “Companies” to progress as fast as they can, the “Companies” also have to stay aligned with Wix as a whole. Another issue is that we expect these “Companies” to create products that compete in the free market with other startups and big companies, but with limited resources.

The Guild now needs to play a big role in enabling the success of the “Companies” within Wix. If each “Company” had to develop everything on its own (for instance frameworks, deployment, infrastructure, monitoring, etc.), it would not stand a chance of competing with whole companies that build the same product with more resources. So the Guild took on another responsibility: taking care of all the infrastructure, the deployment pipeline, and the core services that all the “Companies” share. For instance, if we see a service that is needed in more than two “Companies” (for example a mailing service), we develop it in the Guild (which has its own core services teams) and all the “Companies” can use it, thus focusing only on the product itself and not having to worry about the infrastructure.

In order to keep alignment with the other “Companies”, and to make it easier for engineers to move between “Companies” and share knowledge and best practices, all the “Companies” share the same infrastructure and methodologies. This is a tradeoff between freedom and velocity. You lose some freedoms but gain a lot of velocity, as many of the things you need for your service are already there for you.

Now a “Company” may decide (in coordination with the Guilds) that using the existing infrastructure is the wrong solution for the product it owns, and that it wants to develop on a different stack. It can do that; however, it will need to take full responsibility for the whole application lifecycle, deployment, monitoring and integration with the rest of the Wix ecosystem. This is a lot of work, and time to market will usually be very long when you have to develop all the infrastructure on your own, so almost every “Company” opts to use the current infrastructure, although we have had several cases where developing a product on a different stack was the right decision.

So if I were to describe the line of responsibility between a “Company” and a Guild, it is that the “Company” decides what to do and the Guild says how to do it.

So now that we have “Companies” and Guilds, the Guild needs to assume more responsibilities in addition to the above:

  • Align between “Companies”. The Guilds are horizontal while “Companies” are vertical.
  • Support the engineers working in the “Companies”.
  • Provide review and guidance.
  • Develop shared infrastructure.
  • Improve development velocity.
  • Temporarily help “Companies” in need with additional resources from the Guild.

Guild masters:
Guild masters are senior engineers whose responsibilities include supporting engineers in different “Companies”. Guild masters conduct reviews, training and mentoring. Since they are horizontal and work with many “Companies”, they also identify common issues and duplication of code between “Companies”, understand the development bottlenecks, and try to solve them. For the same reason, they cross-pollinate “Companies” by bringing best practices and lessons learned from other “Companies”.

Guild activities:
For the Guild to be able to take on these responsibilities it needs developers’ time, so at Wix 20% of the engineering time is dedicated to Guild activities.

Every Thursday we have a Guild day, on which the Guild conducts training activities and Guild tasks. All the engineers from all the “Companies” assemble in one place for the Guild day.

Here is the back-end guild day schedule:
10:00-11:00 – Guild retrospective, in which we discuss engineering dilemmas and lessons learned from across “Companies”.
11:00-11:15 – Break
11:15-11:30 – Project spotlight, where someone presents a new project that is being worked on, some lessons learned and the challenges they have faced.
11:30-13:00 (usually not the whole 1.5 hours is needed) – Tech talk, which, if it does not contain any sensitive information, is also open to the public as a meetup.
13:00–EOD – Lunch and Guild tasks. (The Guild tasks are called “Games of Gangs”, which we’ll discuss in another post.)
