Our Journey with Proprietary, Inflexible Systems
Sometimes you need to play the short-game and roll out a solution to test its viability. When time is money, and you’re short on both, specific and ready-built solutions are a great way to get there in a hurry.
But if you are going for growth and economies of scale, it’s worth the effort to plan ahead and look down the road. Doing so can help prevent large costs and potential churn during the phase of your business when your client base is most critical.
For a burgeoning business, there is definitely danger ahead from rolling your own solutions from scratch. The exhilarating story from AWS and GCP of prebuilt plugins that just work* and never need maintenance seems so alluring if your entire team is wearing hats on top of hats and you don’t have the budget for a DevOps team.
But we’ve been down that road of making decisions more tuned for the short-term, and have paid for it. A Help Desk full of burning issues, 36-hour code-a-thons to patch and replace unstable components before they break. Without our patient, kind clients and our stellar team, I doubt we would have been able to make it this far.
So, I want to go over some key watch-outs as you follow your own path and help you avoid the pitfalls.
The Mental Mentality of Just Ship It
Have you ever noticed that they rarely show planning in montages? It seems to lack productivity when you’re intoxicated on a fresh, new product idea.
“I can knock out a V1 this weekend!” your rockstar says, out of that all-hands meeting discussing the brand new product concept. Next week they demo it and it does well, gets some of the internal team to test and before you know it, it’s stable and in production. No worries, it’ll eventually get replaced by something more long-term once there’s time to make a better version.
There is never a good time to make it better. Never.
The system that you create, once data starts flowing through it, becomes a living entity that grows roots. Replacing it goes out the window once you realize it’ll take six weeks and compete with other projects that may end up being even more worth the time.
So, you rely on patches. They start small but grow increasingly complex as the feature is expanded upon and as its technical debt grows. Soon it’s a quilt of code that takes an anthropologist to decipher.
Or only a specific developer can touch it because, as far as anyone is concerned, it seems to run on magic and unicorn dust. Also, since it was made over a weekend, it was never documented and no one remembers the code review since last quarter was a lifetime ago.
A plan doesn’t need to be perfect, and it’s not going to go as well as a heist movie, but outlining the ideal state and figuring out the iterative steps in between gives the whole team a common basis. Everyone can get a broad sense of the functionality it entails, and it can be built with the knowledge of which components will need to be swapped out in the future. All because it’s expected.
Vendors and the High Cost of Moving Fast
Cloud providers routinely issue thousands of dollars in free credits to startups and new ventures to build out platforms within their ecosystem. While very helpful in the fledgling, early times of a company’s life, these credits can lead to a bit of sticker shock once they run out.
Using Amazon’s Elastic Container Service (ECS), if you’ve set up any infrastructure before, you can have a production-ready cluster set up in a day. If it’s your first time, maybe a few days.
Of course, this convenience comes with some trade-offs.
First off, the same computing power within EC2 is almost half the cost. This means those few days you saved up-front just came from the tail-end of your runway.
Also, ECS being a proprietary tool means that you and your team will need to learn how to use a system that’s only applicable to AWS. The edge cases, and there will always be edge cases, have a bigger chance of bogging your productivity down, simply because there is less documentation than with comparable systems and because you have limited control over the entirety of the platform. Worst case, you may need AWS’s costly premium support to help you finish your implementation, eating a bit more into your credits.
You can use proprietary, maintained systems, but make sure you create the connections into your own platform in a modular way. There are great SDKs that abstract the ways you talk with managed systems and allow you to hot-swap to a different provider or toolset without having to change the underlying code. And if there isn’t an existing tool, just abstract your connections within the architecture of your application, so there is one file to update instead of hunting around your project for obfuscated patterns.
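As a minimal sketch of that "one file to update" idea, assuming a hypothetical `BlobStore` interface (every name here is invented for illustration and does not come from a real SDK): the application talks only to a small interface, and a single factory function decides which backend sits behind it. A real adapter would wrap a vendor SDK where the stand-in classes are.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BlobStore(ABC):
    """Hypothetical interface: the only storage API the rest of the app may use."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(BlobStore):
    """Stand-in backend for tests and local runs."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


class LocalDiskStore(BlobStore):
    """Stand-in for a second provider; a vendor adapter would look similar."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def make_store(provider: str, **options: str) -> BlobStore:
    """The single place that knows which backend is in use."""
    if provider == "memory":
        return InMemoryStore()
    if provider == "disk":
        return LocalDiskStore(Path(options["root"]))
    raise ValueError(f"unknown storage provider: {provider}")


def save_report(store: BlobStore, name: str, body: str) -> None:
    """Application code: depends on the interface, never on a vendor."""
    store.put(f"report-{name}.txt", body.encode("utf-8"))
```

Switching providers then means adding one adapter class and one branch in `make_store`; callers such as `save_report` stay untouched.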
It’s ultimately a judgment call as to what works best for your organization, but I strongly recommend investigating what’s out there that adds a bit more portability and has a wider base of support from the community.
For instance, Kubernetes is a great option for replacing ECS while still using Amazon’s services. It has the same benefits as ECS but a wider base of support, and its managed versions within Google Cloud, Azure, and DigitalOcean give you the flexibility to choose what helps your company the most. You can even install Kubernetes on your own private servers, if you ever get to the point of needing that flexibility, with minimal changes to your project’s structure and little time spent retraining your team on a new management layer.
Other open-source tools do this too for different parts of your infrastructure, such as PostgreSQL (SQL database), Kafka (kick-ass stateful message queues), or Timescale (time-series data). They let you run managed versions while giving you an easy exit once you need to control your costs.
Expandability and the Migraine of Migration
Have you ever tried unscrewing a running water hose, without turning it off, and without getting wet? Whether from my inability to remember all the times I’ve failed at it or my tenacity fueling the belief that I’ll finally succeed at this elusive task, I will struggle and fail, only to be drenched in water and shame. Before you create your production systems, you should have your team do the same. It’s a great lesson to learn before they turn on the digital hose that cannot be turned off.
There are a lot of solutions you can unpack from this analogy and apply to a production migration problem. You can divert running water from the primary hose to a second hose to keep yourself from getting splashed. You could kink the hose to lessen the flow, and then disconnect and reconnect. Or maybe you can put a giant bucket below the tap as you disconnect and reconnect the hose, then pour the contents where you need them.
While more and more creative solutions can be had, the best approach is to think about how to move the hose before there is any water. This gives you a chance to screw in adapters and valves that make the task much easier. Just imagine that you would lose thousands of dollars for every drop of water lost. You and your team would probably prefer constructing the hose before water starts flowing to running around with buckets.
You can honestly believe that your systems will never need to be migrated, or that future-you will find a more novel approach to the issue. But, if you’re also counting on future-you to be more successful than you are now, then that means they will have less time and much more to lose than current-you.
Proprietary systems themselves are not to blame. What matters is treating everything as if it needs to be modular. If a system isn’t modular, you can still ensure flexibility by adding a layer of modularity yourself.
Our best decision was to leverage GraphQL for the API that powers the controls for our platform. Because of the well-defined data schema, it allows for so much flexibility with running our Cloud product. We can migrate databases without having to overhaul the connection to them or fundamentally change the architecture of some of our features. Because every service downstream of our API is connected in a modular fashion, we can have 100% uptime and focus on only updating a couple of services instead of the whole system.
Also, keep the data junk drawer approach from becoming your team’s go-to solution. It seems tedious at the time to label resources and to create and maintain multiple tables, but it will ultimately make cleaning up after experiments much easier and keep data sets smaller and more portable.
Lastly, think about how you can insert easy ways to divert traffic, so that when you are tasked with connecting your hose to something better, you don’t need to break out the bucket brigade. We leverage reverse proxies and stateful message queues to do this, giving our team a really nice way to swap connections and upgrade components without a lick of downtime.
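To make the diversion idea concrete, here is a minimal sketch of weighted traffic shifting, the "second hose" from the analogy. It is an illustration only: in practice this logic usually lives in a reverse proxy or queue configuration rather than in application code, and `TrafficSplitter`, `Handler`, and `new_share` are hypothetical names invented for this example.

```python
import random
from typing import Callable, Optional

# A backend is modeled as a plain function from request to response.
Handler = Callable[[str], str]


class TrafficSplitter:
    """Routes each request to the old or new backend by a tunable weight."""

    def __init__(
        self,
        old: Handler,
        new: Handler,
        new_share: float = 0.0,
        rng: Optional[random.Random] = None,
    ) -> None:
        self._old = old
        self._new = new
        self._rng = rng or random.Random()
        self._new_share = 0.0
        self.set_new_share(new_share)

    def set_new_share(self, share: float) -> None:
        """Set the fraction of traffic (0.0 to 1.0) sent to the new backend."""
        if not 0.0 <= share <= 1.0:
            raise ValueError("share must be between 0 and 1")
        self._new_share = share

    def handle(self, request: str) -> str:
        # random() is in [0.0, 1.0): a share of 0 never picks the new
        # backend and a share of 1 always does.
        use_new = self._rng.random() < self._new_share
        backend = self._new if use_new else self._old
        return backend(request)


def old_backend(request: str) -> str:
    return f"old:{request}"


def new_backend(request: str) -> str:
    return f"new:{request}"


splitter = TrafficSplitter(old_backend, new_backend)
splitter.set_new_share(0.1)  # trickle a small share to the new component
splitter.set_new_share(1.0)  # cut over fully once it has proven itself
```

Raising the share in small steps lets you watch the new component under real load, and setting it back to zero is the rollback.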
We’ve always advocated for flexibility. Our use of an open-source consumer analytics schema was our first real step toward providing options away from vendor lock-in, and a way to keep specific business models from bearing the high cost that comes from a single-player space.
Our SaaS product offering has been a great source of learning on these lessons and has greatly influenced our new Enterprise Edition, which will also be the core processing component of our upcoming Cloud upgrade.
We know that edge cases are always present and that your team doesn’t have time to learn languages or technology that isn’t reused anywhere else in the business. All the pain points mentioned above are real, and they are considerations we built into our product. Being able to abstract configurations for different vendor services is a good step, but leveraging technology that is designed with this modularity in mind gives you peace of mind, knowing that as your environment changes, internally or externally, your team and stack are guarded against meltdown.