How refactoring keeps spirits and code standards high on the Xolo Engineering team
on May 06, 2022 • 8 minute read
Editor's note: This post was co-written by Piret Kerem, Head of Engineering at Xolo, and Erko Hansar, Xolo Co-Founder and Chief Technology Officer. While Piret is the author of the overall narrative, Erko was responsible for crafting the portions of this post that read like a technical detective novel. Keep reading to see what we mean.
I have been pretty vocal about how much I like working for a company that is building its own product. I covered the main reasons, and the kind of team it takes, in my previous blog posts. This time I want to concentrate on how to maintain the health of the product and its codebase over time, so that we can keep building additional value on top of it.
I've gotten into the habit of describing the mentality behind code ownership through the metaphor of home ownership. In general, when living in a rental it's hard to feel a sense of ownership. Want to change the wallpaper or hang some pictures in the hallway? Better talk to your landlord! And because the smallest alteration requires the permission of the owner, renters have less incentive to make changes and a lot less is done.
This is not the case with home ownership. Whether it's replacing the water heater, knocking down walls to create an open concept floor plan, or installing solar panels, it's the prerogative of the homeowner to dive into the details. You're only limited by your passion and the depth of your pockets.
The same principle holds true when building your own software product. In project-based development, it's all about completing the scope of work. You have an agreed-upon contract and once it's delivered, it's on to the next one! But when you're building a product you actually care about, you invest a lot more effort into the health of the codebase so that the project will be maintainable into the future.
As our experience and market know-how grow over time, we can launch new products faster. Our first product, Xolo Leap, allows global solopreneurs to launch a remote EU company through e-Residency. Leap was built 7 years ago, when Xolo first launched back in 2015.
Our second product, Xolo Go, was released 4 years later in 2019. It's structured to allow freelancers to invoice cross-border clients like a company without the headache and expense of actually opening a company. The platform comes with integrated business banking, along with tools for contracting, creating invoices and expense management.
In 2021, we released 2 more products. Xolo Teams was built for organizations to quickly, easily and compliantly contract, onboard and pay teams of global freelancers. We also launched our first localized product in Spain, which makes it much more efficient for local independents to register with the government and pay taxes and social security, and adds admin tools for invoicing and expense management. We launched a similar product in Italy in early 2022, with ambitious plans to continue releasing localized products to help European solopreneurs operate more efficiently. We recently launched our 6th product, Xolo Estonia, to serve solos in our own native Estonia.
Right now we have around 350 000 lines of code to keep all 6 of our products running smoothly. We're always proud to launch new products, but we're equally excited when we can delete old code without losing any functionality from the platform. We emphasize the importance of making sure our platform and existing codebase enable us to continue to grow and achieve our mission of making solopreneurship a magical experience for more and more solos in the future.
At Xolo, we encourage all engineers to have ownership of their domain area. That doesn’t mean that they have to work on a specific technology or feature alone. It means they dedicate time to ensuring that this specific part is viable long-term. We've been experimenting with various refactoring principles. We're big fans of Uncle Bob’s boy scout rule (always leave the code a little better than you found it) and have code reviews as a permanent step in the development process.
Refacto Week: the why and the how
Refacto Week, as you probably guessed, is when the entire product engineering team spends a week concentrating exclusively on refactoring. We disband our usual domain teams and come together as a big group. This is where the fun part starts: we pick our own challenges and then tackle them using our collective knowledge and pair programming techniques. This switch in the work mode gives us multiple benefits:
- No pressure to deliver business features frees up our brains to tackle even the most complex technical problems and craft more elegant solutions
- Pair programming builds stronger bonds between colleagues and keeps team spirit high
- Cross-team work encourages knowledge sharing and minimizes the dreaded silo effect in the long term — it's a win/win because we're paying back technical debt while building a stronger team!
We aim to have one refacto week per quarter. Our most recent one was back in March, with the main focus on improving application performance. The page load times for our customers in self-service and for our own employees in the back-office had deteriorated over time, due to additional logic and growing mountains of data. There was already a noticeable "broken window" effect at play: since some pages were already slow, people hardly noticed when other pages started loading slower, too.
During our last refacto week we managed to work on 36 separate tasks. While some tasks only affected a specific area, others were wreaking havoc on dozens of different pages across the application. Here are a few interesting use cases to give a bit more context to this sticky situation:
The case of the slow load time
Problem
A number of different self-service and back-office pages that usually loaded quickly (100-500 ms) were occasionally taking 16-17 seconds to load.
This happened on a variety of different pages, including the self-service dashboard, invoice creation screen, bank account overview, etc. When we analyzed the logs for the times when these pages took a long time to load, we quickly discovered the problem. These pages sometimes show balances or amounts in different currencies, and we use the currency rate for the matching date to show the value in EUR, too.
Normally the currency rates are available from the in-memory data grid (cache), but when the cache had expired, the next request had to wait until the data had been loaded from the database and cached in memory. Loading the latest currency rates directly from the database took up to 16 seconds!
There are only around 242 000 rows in that currency_rate table, so a simple SELECT query with the current date criteria should work very fast there, right? The trick is that you might not have an entry for a specific currency_from -> currency_to pair for the requested date, in which case you need to use the most recent currency rate before that date. For example, the ECB doesn't publish official currency rates for Saturdays and Sundays, so if today is Sunday and we want to convert a GBP balance to EUR, then we need to use the EUR-GBP rate from Friday instead. But when we are converting a different currency that has no official ECB rate and instead uses a different source with data for Saturdays and Sundays, then the rate row would exist for today as well!
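To make the fallback concrete, here is a minimal sketch of the lookup for a single pair on a given date. The rate column name and the example date are our own illustration, not necessarily the production schema:

-- Find the most recent EUR -> GBP rate on or before the requested date;
-- 2022-03-06 is a Sunday, so this falls back to Friday's ECB rate.
-- (The "rate" column name is an assumption for illustration.)
SELECT cura.rate, cura.date
FROM currency_rate cura
WHERE cura.currency_from = 'EUR'
  AND cura.currency_to = 'GBP'
  AND cura.date <= DATE '2022-03-06'
ORDER BY cura.date DESC
LIMIT 1;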
The slow query was written with human-friendly logic: “select me all currency rates where a newer row for the same currency pair doesn’t exist.”
SELECT cura.id, cura.date, …
FROM currency_rate cura
WHERE NOT EXISTS (
    SELECT *
    FROM currency_rate latest
    WHERE latest.date > cura.date
      AND latest.currency_from = cura.currency_from
      AND latest.currency_to = cura.currency_to)
ORDER BY cura.currency_from, cura.currency_to;
EXPLAIN ANALYZE for that query shows that both currency_rate selects do their Seq Scan once (quickly) — in approximately 30 milliseconds. The Hash calculations for the latest result set take another 45 milliseconds. This is where the trouble starts. The Hash Anti Join between those 2 result sets generates 97 400 000 rows and then discards almost all of these when applying the join filter, resulting in only 168 final rows. Matching and filtering those 97 400 000 row combinations takes over 15 seconds.
Solution
This simple problem can also be solved with a different query, whose human-friendly logic goes something like: "find all currency rates in descending date order, and only keep the first row for every currency." Add an index that's already in descending date order, et voilà!
SELECT DISTINCT ON (cura.currency_to) cura.id, cura.date, …
FROM currency_rate cura
ORDER BY cura.currency_to, cura.date DESC;
This takes 120 milliseconds for the Index Scan and another 30 milliseconds for the Unique step. So now, when the currency rate cache has expired, it takes only around 160 milliseconds to repopulate the cache with fresh data. Great success!
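For reference, the supporting index could look something like the sketch below; the exact index name and column list in our production schema may differ:

-- Hypothetical index matching the ORDER BY above, with the date
-- already sorted in descending order inside each currency group.
CREATE INDEX currency_rate_currency_to_date_idx
    ON currency_rate (currency_to, date DESC);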
The case of the slow load time: paginated lists with many rows edition
Problem
This issue appeared on many different pages where the data sets had grown over time, and the main reason seemed to be the same for all of them. We are using the DataTables JavaScript plugin to make our HTML lists support pagination, ordering and search. The user interacts with the table controls, and DataTables requests a page's worth of data from the server. The request would ask something like: find me a list of companies where the name contains "Smart" and the status is "ACTIVE," sort them by founding date, and only give me 20 rows starting from the 21st row. The server would find the matching rows and return them in JSON. So far, so good.
But in addition to these 20 rows, the response also needs to contain a count of all the rows that match the search criteria. DataTables then uses this "recordsFiltered" value to calculate the number of pages in the list and build the pagination buttons. The problem is that calculating this "count of all the matching rows" gets very slow when the underlying SELECT query has to join multiple big tables and apply the search criteria across them. Returning just the next 20 rows is one thing; processing all rows to count the whole result set is a totally different task.
Solution
If we look at a query that returns rows 21-40 after applying some search criteria (portions that aren't relevant to this specific problem have been excluded):
SELECT comp.id
     , comp.registry_code
     ...
     , COUNT(*) OVER() AS full_count
FROM company comp
LEFT OUTER JOIN contract cont ON ...
LEFT OUTER JOIN person pers ON …
…
WHERE cont.status = 'ACTIVE'
  AND pers.display_name ILIKE '...'
ORDER BY comp.id DESC
OFFSET 20
LIMIT 20;
… then this part, "COUNT(*) OVER() AS full_count", is responsible for calculating an extra column that contains the total number of rows this query would return if the OFFSET and LIMIT were not there. For this example case, the full_count value right now would be 114 912.
With this full_count column, the query took 2 000 milliseconds. Without it, it took 60 milliseconds. That’s a difference of 33x! In other cases the difference could even be 100-200x. So the full_count had to go!
Unfortunately, this meant that we had to lose some navigational elements from our paginated lists too:
- We couldn’t show the total count of filtered rows anymore under the list. Instead, we had to resort to “Showing 21 to 40 of many records.”
- And as we don't know the precise number of rows, we can't calculate the number of pages, either. Instead of having "PREVIOUS 1 2 3 4 5 NEXT" as the pagination elements, we can now show only "PREVIOUS 1 2 … NEXT". But that's a small price to pay for a 100-200x performance boost!
To make this work, we had to change DataTables to request 21 (page size + one) rows instead of 20. If the result contains 21 rows, we discard the 21st row, show the remaining 20, and, since we now know there is at least one row on the next page, show the next page number and the "NEXT" button in the pagination section.
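In SQL terms, the page query from above simply loses its full_count column and asks for one extra sentinel row. A sketch, with the join conditions and search value assumed for illustration:

-- Same shape as the earlier page query, without COUNT(*) OVER().
-- Join keys and the search value are assumptions for illustration.
SELECT comp.id
     , comp.registry_code
FROM company comp
LEFT OUTER JOIN contract cont ON cont.company_id = comp.id
LEFT OUTER JOIN person pers ON pers.id = cont.person_id
WHERE cont.status = 'ACTIVE'
  AND pers.display_name ILIKE '%Smart%'
ORDER BY comp.id DESC
OFFSET 20   -- rows 21-40 were requested
LIMIT 21;   -- page size (20) + 1: a 21st row means a next page exists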
This is not a job ad
Except yeah, it kind of is. Job ads, unfortunately, are usually clouded in so many vague buzzwords that it's hard to understand what kind of environment you'll be working in, until you actually work there. As our company is growing and we're looking for new talent to join the team (and let's face it, talented developers always have options), I wanted to take this opportunity to showcase how the Xolo Engineering team is different from a lot of other teams you've worked with. We don't want you just for your skills, we want you for your mind. You will be encouraged to take ownership, to make decisions, to speak up and share your opinion, to build something that actually matters. Sound like a place you'd like to work? Awesome, we'd be thrilled to have you.
About Piret
Piret has been leading the Engineering Team at Xolo since 2019. As a leader, she's a firm believer in a people-first management style where a culture of autonomy and personal responsibility is prized above all else. Her background in communications has taught her the importance of transparency, clear expectations and creating a candid & kind feedback culture.
Piret writes a column for the Xolo Blog where she talks about the latest lessons gained from the adventures of the Xolo Engineering team. You can read her previous posts here.