Updated: 5 days ago
Wix.com is a highly successful cloud-based web development platform that has scaled rapidly. We now support around 80 130 million website builders in more than 190 countries. This translates into approximately 2 petabytes of user media files and adds about 1.5 terabytes per day. So how did we get there? The 400-strong Wix engineering team used a microservices architecture and MySQL, MongoDB, and Cassandra. We host our platform in three data centers, as well as on the cloud using Amazon Web Services (AWS) and the Google Cloud Platform.
I've been working with Wix since 2010 and oversaw the engineering team's transition from a traditional waterfall development-based approach to agile methodologies and helped introduce DevOps and Continuous Delivery. Here’s what I learned about using microservices and MySQL to effectively support a fast-scaling environment.
How Wix defines microservices
Wix currently has around 200 microservices, but we didn’t start out on this path. During our days supporting 1 million sites back in 2008, we used a single-monolith approach with Java, Hibernate, Ehcache, Tomcat, and MySQL. This typical scale-up approach was useful in some aspects, but ultimately, we couldn't tolerate the downtime caused by poor code quality and interdependencies inside the monolith. So, we gradually moved to a service-level-driven architecture and broke down our monolith.
By our definition, (a single team can manage a few microservices), and the team must be able to describe each microservice's responsibility in one clear sentence.
Specifically, a microservice is a single application deployed as a process, with one clear responsibility. It does not have to be a single function or even a single class. Each microservice writes only to its own database, to keep things clean and simple. The microservice itself has to be stateless to support frequent deployments and multiple instances, and all persistent states are stored in the database.
Wix's four sets of microservices Our architecture involves four main groups of services:
Wix Media Platform (WixMP): This is an Internet media file system that was built and optimized for hosting and delivering images, video, music, and plain files, integrated with CDNs, SSL, etc. The platform runs on AWS and the Google Cloud Platform, using cloud compute instances and storage for on-the-fly image manipulation and video transcoding. We developed the compute instances software using Python, Go, and C, where applicable.
Verticals: This is a set of applications that adds value to a Wix site, such as eCommerce, Shoutout, or Hotels. The verticals are built using an Angular front end and the Jetty/Spring/Scala stack for the back end. We selected Angular over React for verticals because Angular provides a more complete application framework, including dependency injection and service abstraction.
Why MySQL is a great NoSQL Our microservices use MySQL, so scaling them involves scaling MySQL. We don't subscribe to the opinion, prevalent in our industry, that a relational database can’t perform as well as a NoSQL database. In our experience, engineers who make that assumption often ignore the operational costs, and don't always think through production scenarios, uptimes, existing support options, knowledge base maintenance, and more.
We've found that, in most cases, we don't need a NoSQL database, and that MySQL is a great NoSQL database if used appropriately. Relational databases have been around for more than 40 years, and there is a vast and easily accessible body of knowledge on how to use and maintain them. We usually default to using a MySQL database, and use NoSQL only in the cases where there's a significantly better solution to the problem, such as if we need a document store or a solution for a high data volume that MySQL cannot handle.
Scaling MySQL to support explosive growth Using MySQL in a large-scale system can present performance challenges. Here is a top 5 list of things we do to get great performance from MySQL:
Whenever possible, we avoid database-level transactions, since they require databases to maintain locks, which in turn have an adverse effect on performance. Instead, we use logical, application-level transactions, which reduce loads and extract better performance from the databases.
We do not use sequential primary keys because they introduce locks. Instead, we prefer client-generated keys, such as UUIDs. Also, when you have master-master replication, auto-increment causes conflicts, so you have to create key ranges for each instance.
We do not have queries with joins, and only look up or query by primary key or index. Any field that is not indexed has no right to exist. Instead, we fold such fields into a single text field (JSON is a good choice).
We often use MySQL simply as a key-value store. We store a JSON object in one of the columns, which allows us to extend the schema without making database schema changes. Accessing MySQL by primary key is extremely fast, and we get sub-millisecond read time by primary key, which is excellent for most uses. MySQL is a great NoSQL that's ACID compliant.
We are not big fans of sharding because it creates operational overhead in maintaining and replicating clusters inside and across data centers. In terms of database size, we've found that a single MySQL instance can work perfectly well with hundreds of millions of records. Having a microservices architecture helps, as it naturally splits the data into multiple databases for each microservice. When the data grows beyond a single instance capacity, we either choose to switch to a NoSQL database that can scale out (Cassandra is our default choice), or try to scale up the database and have no more than two shards.
It’s entirely possible to manage a fast-growing, scale-out architecture without being a cloud-native, two-year-old startup. It’s also possible to do this while combining microservices with relational databases. Taking a long, hard look at both the development and operational pros and cons of tooling options has served us well in creating our own story and in managing a best-in-class, SLA-oriented architecture that drives our business growth.