This post is the second of a two-part miniseries identifying and correcting old mistakes. Part one discusses cleaning up Git repos based on permissions faux pas.
Today's atonement for old mistakes: Using centralized/standard "includes" for path variables and eliminating passwords from committed code.
The 'How We Got Here'
As I mentioned in part one, the production environment we spun up several years ago at work for my data warehouse sourcing/ETL/shipping jobs was quite literally the first environment we'd set up in which Git (and GitHub Enterprise) was heavily involved. When we moved into Git for managing the scripts, some of our bad decisions from the olden times came along, like hardcoded paths to binaries on the production host and, more egregiously, hardcoded functional account credentials in scripts (specifically cron include files).
The Problem
We knew this was a bad decision, specifically the credential business, but as a practical matter the three individuals on our team with access to this information are the only three individuals allowed to access the production host...and also the tightly controlled/restricted GitHub repo. Further, the specific functional account credentials in question are only afforded access from the production (and test) hosts, and with read-only access. The overall risk is low. Not to defend the poor choices of our past, but the other auditable controls in place make this a thing to change, just not something terribly high on the priority list.
The hardcoded paths to client binaries were a mediocre-at-best decision made a number of years back (early 2015) when we last had a major client software update. The change at that time required a very specific cutover plan in which both client versions were needed simultaneously during the change window, which lasted several weeks. Add in the test environment business and it was "simpler" to include the full paths. All of this was done before we moved anything into Git/GitHub.
The Change Catalyst
Things worked out "okay," but this spring we made a substantial change due to our database environment undergoing a major upgrade. The upgrade required a client-side (our production host) software upgrade and functional account password changes. Since we now work in a much more nimble environment (both on the database hosting side and via Git), it was time to really "fix" the problem and get rid of those bad paths and credentials stored in the repos.
The Solution
Short version: use include files and, for credentials, store them outside of the repos or in files the repos ignore. Doing this, however, required some minor re-architecting of how we were kicking off such jobs.
Without going into excruciating detail, the old setup leaned heavily on cron include files (placed in /etc/cron.d, where they are picked up). These cron files included specific declarations of path variables (the aforementioned hardcoded paths) and, where necessary, the functional account credentials. After some failed attempts to modify the existing behavior (e.g. making the cron files use environment variables), the obvious solution came to mind:
Move all that direct call business into normal shell (bash) scripts!
By using normal bash scripts, I can much more consistently and simply (e.g. source /path/to/include) bring in a central version of environment variables as necessary, and even maintain separate client versions should that be required in the future. Further, this switch also makes it pretty simple to "include" (via source) variables such as credential information from files that either live outside of the repos or are ignored by them.
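To make that concrete, here is a minimal sketch of what one of these 'calling' scripts might look like. Everything named in it is an assumption for illustration rather than our actual setup: the file locations, the variable names (INCLUDE_FILE, CREDS_FILE, APP_BIN, DB_USER, DB_PASS), the helper functions, and the cron line in the comment.

```shell
#!/usr/bin/env bash
# run-thing.sh: hypothetical 'calling' script that a cron entry would invoke,
# e.g. (illustrative /etc/cron.d line):
#   0 2 * * * etluser /path/to/repo/run-thing.sh
# All paths and variable names here are assumptions, not the real setup.

# Central, committed include holding shared path variables.
INCLUDE_FILE="${INCLUDE_FILE:-/path/to/include}"
# Credentials file that lives outside the repo (or is listed in .gitignore).
CREDS_FILE="${CREDS_FILE:-/path/to/credentials}"

# Source a file if it is readable; otherwise complain and return non-zero.
load_file() {
    local file="$1"
    if [ -r "$file" ]; then
        . "$file"
    else
        echo "cannot read $file" >&2
        return 1
    fi
}

# Load the shared paths and the credentials, then hand off to the client
# binary named by APP_BIN, passing any arguments straight through.
run_thing() {
    load_file "$INCLUDE_FILE" || return 1
    load_file "$CREDS_FILE" || return 1
    # APP_BIN is expected from the include; DB_USER/DB_PASS from the
    # credentials file. They are exported only for this one command.
    DB_USER="$DB_USER" DB_PASS="$DB_PASS" "$APP_BIN" "$@"
}
```

A real script would end with a call such as run_thing "$@". The point of failing loudly when either file is missing is that a cron job should stop rather than run with half of its environment.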
The only real downside is that there are now several more bash scripts in the repos, effectively one for each cron entry of old. However, using the PowerShell model of verb-noun cmdlets in script naming (e.g. run-thing.sh), at least it is more obvious which are the 'calling' scripts versus the 'doing' scripts.
So Far, A Good Change
Testing all of these changes was a bit of an adventure, mainly due to the sheer volume of changes. But the change is good: passwords and credentials are no longer in the repository (and the passwords for functional accounts have been changed since extracting them), and we're now by default using the generalized version of client binary paths (e.g. /usr/bin/application vs. /usr/lib/software/version/path/application) in the include files. Should we ever need to reference specific versions, the central includes can be changed in one place, versus a hundred individual changes across ten repos.
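As a rough illustration of that last point, a central include could default to the generalized path while leaving a single place to pin a specific version. This is only a sketch: APP_BIN, APP_BIN_OVERRIDE, and set_client_paths are hypothetical names, and the paths are the placeholder examples from above.

```shell
# include.sh: hypothetical central include, sourced by each 'calling' script.
# APP_BIN, APP_BIN_OVERRIDE and set_client_paths are illustrative names only.

set_client_paths() {
    # Use the generalized client path unless a version-specific override
    # (e.g. /usr/lib/software/version/path/application) has been set.
    APP_BIN="${APP_BIN_OVERRIDE:-/usr/bin/application}"
    export APP_BIN
}

# Resolve the paths as soon as the file is sourced.
set_client_paths
```

Pinning a version then means setting APP_BIN_OVERRIDE (or editing the default) in this one file, rather than touching every script that calls the client.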
The real takeaway here is that it's never really too late to go deal with your old technical debt and poor choices. Sure, it can take some planning and time, but in the end, for a change like the one I've written about here...I can feel better knowing future me no longer needs to remember all those super strange 'gotchas.'