Home - Matt Whipple


Society

Education Isn't the Answer

When societal issues arise, a response that some people jump to is that education is the answer. Most recently I've heard this in response to much of the racism-driven domestic strife in the US. This is a non-answer: it ultimately translates to inaction and therefore, by default, to the preservation of those systems and policies which underlie the perceived problems.

The argument is likely a popular one for a range of issues because it requires more than casual inspection to dismiss. Education is inherently a good thing and is the solution for ignorance: positing that education is not the answer could feel as though it argues in favor of ignorance, although that inference ultimately rests on a straw man.

A cynical reaction to the education argument is that it is a means of distancing oneself from the problem. Equating a given problem with ignorance, and believing that enlightening the benighted souls who instigate it is the solution, likely carries with it the belief that one sits above the problem. A snarky summary would be that the subtext is something like "the problem wouldn't exist if more people were like me"; less presumptuously, the likely message is "I'm not part of the problem".

The education argument is also often used for problems which are systemic, yet it shifts the onus onto the individual. While this may foster emergent improvements from a neutral starting point, it is an inefficient and risky way to produce a targeted outcome; further, as any current issue likely indicates contrary momentum, a naive proposition of education is more likely to resemble attempting to stop flooding by randomly throwing sandbags into the streets. An emphasis on the individual may be fostered by something along the lines of libertarian ideals, though it is important to temper such ideals with pragmatism and also not to fall into the common trap of cherry picking when such ideals are espoused. Any systemic failure must be the result either of an issue which emerged organically, which speaks to a deficiency in some of the fundamental libertarian ideas, or of some previous tampering with the system. Any proposed solution born from such ideology must be accompanied by a recognition of the associated limitations and corrections and cannot be rationally presented in a bubble. In any case this position largely disregards the fact that the underlying system has either introduced a problem or has failed to protect itself from an externally introduced one.

In general the idea of resolving problems through additional knowledge also implies that such problems are incidental rather than intentional; for issues that are systemic this is likely to be naive. If a given idea has been reinforced, and particularly if it has been codified in any way, the prospect of ascribing its origin to an ignorance that could be rectified through education ignores or trivializes the forces that originally created the issue and may continue to benefit from and support its existence.

On a more fundamental level, education is a means to distribute knowledge, and knowledge by itself does not resolve anything other than a lack of knowledge. Problems are solved through the application of knowledge, and therefore education normally amounts to equipping people with tools such that they are better able to apply knowledge to resolve issues. Suggesting that education is a solution to any practical problem therefore leads to somewhat of a circular argument: education may help yield a solution to the problem and therefore cannot be that solution itself. This is another subtle reason why the "education is the solution" argument is tempting but incorrect: education may help produce solutions and is inherently beneficial, but in terms of mapping to reality it is equivalent to the statement "more research is needed" masquerading as a solution in and of itself.

Suggesting education is a solution is generally a non-solution. It is a pernicious, idealized carrot which admits a problem but provides no clear action to disrupt the status quo. It amounts to an attempt to be transcendent on a moving train, and in the current racial context it provides a glaring example of trying to project inert non-racist stances onto a society which clearly has inequality and which the data seems to indicate has racist policies. If a system has proven itself to be lacking then it takes more than knowledge or wishful thinking to correct it. The education argument could have easily been made within our Jim Crow past (and that lens is also useful for other questions around racial policies). We have since come to accept that that system was broken, and when our current system is signalling that it too has issues we must assess the system itself rather than deflecting inspection onto individuals and whatever ignorance a broken society seems to be empowering.

Mathematics

Pursuing Literacy

Having not had any significant amount of higher education my formal maths training never even made it to single variable calculus, but I tend to read a fair amount of material that vomits out mathematical notation. To me (and I suspect most people) such information is, at best, hard to read which leads me to either skim over it without understanding, or stop reading and spend what is often a disruptive amount of time to try to process the expression(s) into submission. I'm currently attempting to smooth out the consumption of such content.

Mathematical notation is inherently dense, and its content most often relies on the reader to infer many of the connections. Digesting such content more easily therefore relies on familiarity with both the syntax and a catalog of common techniques (and their outputs). This familiarity will be built with the most obvious and proven approach: practice. To this end I'll be looking to regularly read such materials and work through any included maths until they can be understood with less difficulty. Such materials are easily located within my primary field of software engineering, let alone further areas of interest across science and mathematics itself.

The process of actually working through much of the material is likely to be valuable in itself, more so than just reading explanations. Any content here is therefore only likely to be valuable to me, or to anyone who happens to stumble upon it and finds that it nudges them past a problem on which they are stuck.

In particular it feels as though it can be very easy to fall into a trap of being able to provide correct mathematical solutions without a real understanding of why the solution applies. This feels particularly pernicious in that it could manifest as an underlying lack of understanding upon which formal approaches continue to be applied: such as using pre-baked formulas without the prerequisite knowledge to produce them. This can lead to what amounts to an increased ability to leverage the packaged axioms within an ostensibly isomorphic formal system but without a defensible means to map any of its productions to anything outside of that system (i.e. reality).

Concrete Mathematics

I'm currently picking my way through Concrete Mathematics1, so here I'll attempt to elucidate some of the ideas within that book which gave me pause.

Josephus Recurrence

The third of the problems in the first chapter of Concrete Mathematics is a version of the Josephus Problem2 in which every other element is eliminated. This is the first problem that expands significantly beyond a pattern of deducing recurrences and proving them through induction.

One of the first conclusions that seemed fuzzy to me while working through the Josephus problem is the establishment of the recurrence relation. That the input is effectively halved upon each lap around the circle is evident, and that yields what are effectively new numbers which are adjusted based on whether the input size was odd or even. The missing step for me (which is relatively minor and may be due to starting with an off perspective) is how such new numbers correspond directly to a relation for the solution. As the distinction between the odd and even cases is insignificant for crossing this gap, and the logic for one can therefore be transplanted to the other, the focus here will remain on the slightly simpler even case.

Given a trivial example with numbers (let's call them I for input):

\(1 2 3 4 5 6\)

The first lap would yield:

\(1 x 3 x 5 x\)

Where each survivor effectively acquires a new number for the new lap:

\(1 x 2 x 3 x\)

The numbers after that lap can therefore be defined as:

\(T_n = (I_n + 1)/2\)

which corresponds neatly to the halving. Now that the problem has been reduced, any product of that reduction needs to be mapped back to the number in the original sequence, and therefore the above equation needs to be flipped to reflect the relationship from \(T_n\) back to \(I_n\). Some combination of algebra and inspection of the values reveals that to be:

\(I_n = 2T_n - 1\)

As the previous transformation reflected the halving, this one conveys a product range of odd numbers and therefore omits the eliminated members. The process of solving the halved problem and then transforming that result back into the original sequence therefore provides the ultimate recurrence relation:

\(j(2n) = 2j(n) - 1\)

These seem to be fairly evident dots to connect, and they resulted in a conclusion which seemed easy to digest but left me uncertain when I stopped to make sure I truly understood how the numbers involved aligned with the fundamental relationship.
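
As a concrete check (my own worked numbers rather than an example from the book), the six-element circle above can be run through the recurrence, using the analogous odd-case relation \(j(2n+1) = 2j(n) + 1\) for the reduced subproblem:

\begin{equation} j(1) = 1\\ j(3) = 2j(1) + 1 = 3\\ j(6) = 2j(3) - 1 = 5 \end{equation}

Simulating the circle directly (eliminating 2, 4, 6, then 3, then 1) also leaves 5 as the survivor, and 5 is exactly what the mapping \(I_n = 2T_n - 1\) gives for the subproblem's survivor \(T_n = 3\).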

Repertoire Method

One concept that left my head spinning the first time I flipped through CM (but felt far more approachable the second time around) was the repertoire method. I think one of the main stumbling blocks with its introduction is the set of new symbols, which are somewhat canonically expressed in the equation:

\(f(n) = A(n)\alpha{} + B(n)\beta{} + C(n)\gamma{}\)

This equation is used as a generalization to shed light on the recurrence within the Josephus Problem covered earlier where the complete set of relations follows the pattern of:

\begin{equation} f(1) = \alpha{}\\ f(2n) = 2f(n) + \beta{}\\ f(2n+1) = 2f(n) + \gamma{}\\ \end{equation}

with \(\alpha{}=1\), \(\beta{}=-1\), and \(\gamma{}=1\). A key aspect that I think I missed the first time is that there are effectively two levels of indirection being introduced into the problem, and the repertoire method is being used against that higher layer. The original problem is now somewhat buried and can be temporarily ignored while looking at the higher level of abstraction, lest it invite confusion. The replacement of the constants with \(\alpha{}\), \(\beta{}\), and \(\gamma{}\) is straightforward, but then the addition of \(A\), \(B\), and \(C\) adds another level such that the focus moves to defining the relationship between \(\alpha{}\), \(\beta{}\), and \(\gamma{}\) as \(n\) increases. The original, more concrete problem is now somewhat irrelevant and the new problem becomes establishing how the shape of the equation changes and how that can be expressed in terms of \(A\), \(B\), and \(C\). It is this more abstracted relationship that the repertoire method is used to resolve: seeking to create equations that produce the desired values of \(f(n)\) as \(n\) increases.

The book provides a table conveying how that shape evolves as \(n\) increases:

\(n\) \(f(n)\)
1 \(\alpha{}\)
2 \(2\alpha{} + \beta{}\)
3 \(2\alpha{} + \gamma{}\)
4 \(4\alpha{} + 3\beta{}\)

This effectively captures a new recurrence for which \(A\), \(B\), and \(C\) can be solved to produce a closed form, or as used in the book to support/prove an existing solution. As mentioned, the additional layers of abstraction allow us to focus on how the shape of the equation changes as \(n\) increases. At this level of abstraction the role of the equation is secondary to establishing that internal relationship between the terms. The use of \(\alpha{}\), \(\beta{}\), and \(\gamma{}\) provides the flex points by which we can fit multiple functions into the established shape, and if each such function is provably correct across \(n\) then the relationship between terms is also correct. This allows us to home in on \(A\), \(B\), and \(C\), and by using substitutions and eliminations we can isolate a closed form for each, which reproduces the recurrence reflected in the table above. The trivial functions used, such as \(f(n) = n\) and \(f(n) = 1\), are obviously entirely different from the equation that we really care about, but that they can be solved using the established shape verifies that the assertions about the shape itself are sound (this is likely tied in with the linearity of the recurrence in the parameters, which I haven't explored deeply at the moment).
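
To make the mechanics concrete, here is my own sketch of how those trivial functions pin down \(A\), \(B\), and \(C\); it leans on the book's earlier observation that solving with only \(\alpha{}=1\) (i.e. \(\beta{}=\gamma{}=0\)) gives \(A(n) = 2^m\) where \(n = 2^m + l\). Plugging each trivial function into the recurrence forces particular constants, and each choice then yields one equation relating the coefficients:

\begin{equation} f(n) = 1 \Rightarrow (\alpha{}, \beta{}, \gamma{}) = (1, -1, -1) \Rightarrow A(n) - B(n) - C(n) = 1\\ f(n) = n \Rightarrow (\alpha{}, \beta{}, \gamma{}) = (1, 0, 1) \Rightarrow A(n) + C(n) = n \end{equation}

Combined with \(A(n) = 2^m\) these give \(C(n) = l\) and \(B(n) = 2^m - 1 - l\), which can be checked against the table row by row (e.g. \(n = 3 = 2^1 + 1\) yields \(2\alpha{} + \gamma{}\)).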

Reading

ACM

I'm a proud member of the ACM3. Part of my daily routine is to chip away at comprehensively reading the material in the ACM Digital Library. Much of the material is fairly old at this point, but in addition to historical interest there are clear patterns that have carried through the history of computing and many challenges during early computing are echoed in problems encountered today. Indeed, many of the older articles circle concepts which remain relevant and which overall seem to indicate that computing has spread far more than it has advanced.

ACM membership also provides access to additional learning resources including O'Reilly, SkillSoft, and Elsevier ScienceDirect from which I'll regularly be consuming resources.

Here I'll track some of the publications which I'm currently working through.

Queue

Transactions on Data Science

Books

C++ Crash Course

I'm currently reading C++ Crash Course4. This is primarily to rebuild enough familiarity with C++ so that I can read the code for some of the software I'll be using. While I spent some time in the past writing C++ code, it was long enough ago that I'd forgotten most of the details… and the language is a fairly large one… and the language has undergone significant enhancements since then… and I don't think my usage of it at the time was ever particularly advanced.

Cracking the Coding Interview

I recently read a copy of Cracking the Coding Interview5. This popped onto my radar as I was cleaning out my inbox and found an old email from a Google recruiter who referenced it as a resource. Unfortunately, when collecting the information to write this I noticed that I was too quick to buy a copy off of Google Play: there are several books with similar titles, and I got the book by "Harry" whereas the one referenced in the email is the one by Gayle Laakmann McDowell.

The purchased book is fairly unsatisfying and not something which has aged well.

The initial part is fairly typical interview advice with nothing particularly domain specific. Much of the advice feels a little too calculated for behavioral questions; as an interviewer, if a candidate provides overly crafted answers it is likely to trigger a BS alarm or leave me with an unclear feeling towards the candidate. The ability of someone to spin their experiences such that some checkboxes are ticked does not convey what they would be like to work with, nor do overly prepared answers elucidate their ability to communicate reactively. While there is some useful guidance, it is buried within practices which would gut authenticity, and it didn't seem likely to provide any more insight than the myriad of books in the more general category of getting a job (which would also include advice on getting in the door, something absent from this book).

The book then continues to an explanation of why Java should be used for interviews. Many of the justifications for Java are emblematic of a time when Java was expected to dominate a wider range of platforms most of which it no longer dominates, and some of which are effectively dead (such as delivering Web front-end functionality through applets). There's a dissonance in that the selection of Java seems to have a tone of Java being an incidental practical choice, but then the material proceeds to dig through specific corners across the range of the Java platform. The message therefore shifts from Java knowledge being a means to an end driven by concepts (and supplemented as needed by readily available reference material) to a primary goal.

The book then continues onto a randomly distributed (and highly redundant) list of questions about a variety of Java related topics. As previously mentioned and alluded to many of these are focused on scattered corners of the platform and solutions which have since fallen out of favor. These are combined with more general questions and those that are more conceptual, though even these more fundamental questions span a range between pragmatism and dogmatism with the previously mentioned age being a major factor (seemingly pre-Java 5 which at this point is missing at least two transformative versions). The biases may also speak to the buzz around Java at the time (and possibly the fact that publisher appears to be part of Oracle which is now the steward of Java), but it still aligns with a fetishizing of tools over engineering that is unlikely to be indicative of a culture which produces elegant solutions.

The writing and editing of the book itself feels very rough around the edges, but is not noticeably worse than other technical books I've read by some publishers (Packt stands out in my mind as an example of seemingly absent editing though there are others).

The book as a whole has a very dated feel which leaves the initial impression that it has simply aged out; however, there are references to technologies which are far newer than the rest of the content. There are questions seemingly targeting mostly abandoned legacy features from Java (1.)2 and others (sometimes adjacent) targeting Java 8 features. This implies that the majority of the content is outdated but that some ultimately insignificant updates have been made, yielding a spotty veneer of freshness on top of a rotting foundation. This could arguably be valuable if looking for a position that involves a significant amount of legacy code or legacy mindsets, but that is inconsistent with the purported purpose of the book.

The feeling of a half-hearted refresh is then solidified by a section devoted to newer Java 8 features, presented as a laundry list in a form that is entirely inconsistent with the rest of the book. This section feels like a typical rundown you'd see in blog posts and serves to underscore that the promise of those features has been largely absent from the content thus far. Rather than having more recent developments integrated throughout, they are presented largely independently and it is apparently left to the reader to deduce which other content those new ideas may impact or invalidate.

The book then wraps up offering more lightly supported dogma, preaching the gospel of some of the Java approaches which inspired a generation of cargo culting and which can red flag anyone who espouses them during an interview without a deeper understanding of what they actually deliver independently of their sponsoring paradigm. This is especially true now as most languages seem to be increasing support for hybrid approaches where concepts like object-oriented and functional programming are often mixed. Indeed Java 8 is a good example of a single-paradigm language expanding and therefore exposing new alternatives to previously established patterns.

The one question that actually involves coding (there is technically another but it is bizarrely dismissed as outside the scope of the book) is a monster: designing a "database". There's not much substance in the section aside from what seem to be strong suggestions to use ubiquitous locking to protect against concurrency issues (ignoring the likelihood of the resulting horrific performance). The idea of designing a database within an interview session is ludicrous unless it were a potentially extended session for a company that designs databases. Designing any type of general purpose tool is likely a terrible fit for an interview question since the full time could be spent discussing varying use cases and the involved tradeoffs, exhausting the time that could be spent designing or articulating a solution for any specific use case. A higher level design interview could certainly explore a database supporting specific use cases, but any preemptively offered solution is not likely to fit the resulting constraints, and unless the problem is very tightly scoped it seems unreasonable to expect to get down to the level of code. Questions asking for wheel reinvention do not reflect well on anyone involved unless there's an identified reason or the line of questioning remains focused on the use and understanding of the wheel.

Ultimately this book feels like a waste of time aside from being a survey of Java hype and history. The contents in no way match the title as the focus is not on coding but on largely obsolete Java trivia. In the case of someone actually looking to acquire Java knowledge in preparation of an interview a far better source that provides sound practicable knowledge would be something like Josh Bloch's Effective Java.

I should probably pick up a copy of the right version of Cracking the Coding Interview.

Hardware

Home Network

TODO Net: SSH

Linksys WRT1900AC

Acer C738T N15Q8

After just under six years of use I decided to replace my Acer C720 Chromebook. I had an issue with a stuck key (which later worked itself out) and the power connector had become very loose, all of which felt like signs of the impending demise of a relatively old budget system. I decided to opt for a two-in-one as I'm also without a tablet at the moment and much of the planned use for the system would be more suitably done in tablet mode.

Swing For the Fences or Bunt

Upon cursory evaluation of options I realized that I would either want to spend a fair amount of money for a notably nice system, or try to minimize cost while satisfying immediate needs. As alluded to previously, the main purpose of this system is lightweight work, and concerns such as portability (and potential disposability) promise more value than power or bells and whistles. Additionally, the combination of the likelihood that I'll normally be working from home post-pandemic and the desire for any significant investment to go towards a system that could be beefed up with GPUs to support deep learning both steer me towards building a desktop if I were to invest in a powerful personal system. All of these factors led me down the cost minimization path.

Chromebook

Given the intended uses and budget I had no particular OS or vendor in mind. There are manufacturers that I'm drawn to, but for the most part such opinions are unsubstantiated and woefully dated. One driving force for me, however, is that if the stock OS ends up feeling in some way deficient I like to have the option to replace it with Linux, and I'm unlikely to spend a substantial amount of time to get that installation working. For me the line is that kernel customization is acceptable, but needing to write, tweak, or track down driver code to smooth out hardware issues is not something I'm looking for at the moment. The loss of that time would also likely be a particular nuisance given the additional effort that would be expected to configure much of the two-in-one behavior in a fresh Linux install. Given the options for cost and the inherent Linux hardware support, I was led to replace my old Chromebook with a new Chromebook.

ChromeOS

With my previous Chromebook, ChromeOS stayed on for a couple of months before being replaced by a fresh (Gentoo) Linux install. I used what was then Crouton for a bit, which was awkward but passable, but ultimately I wanted to use Docker and the kernel upon which official ChromeOS was based at the time did not support it (and just going to a standard Linux distribution represented a far more visible path than building a custom ChromeOS image). There were several other pieces of software which I wanted to have locally (such as systemd), but I think Docker was the straw which led me to give up on ChromeOS. I figured the new device would follow a similar trajectory where I'd work with and around ChromeOS for a bit until I hit a blocker, at which point I'd wipe the system and install Linux. I'd heard and assumed that the official and community support for more enhanced functionality had matured, and was looking forward to experimenting with the support for Android apps along with the discovered beta support for Linux apps. Aside from having to chase down some related settings to enable things in my managed GSuite account, everything was working smoothly and the pivot point was expected to once again be Docker.

For development, Docker is often a compelling but practically avoidable convenience; conversely, the additional isolation of namespaces can often complicate or introduce churn into local development. In terms of values such as reproducible verifiability that containers can also provide, such functionality can (and should) also be provided by less golden machines such as CI servers. Altogether, therefore, abandoning a working system for the sake of having Docker for development would not be done lightly. Though I had planned to delay pursuing Docker, I made it less than a week before finding motivation to attempt to install it (as part of verifying the publishing pipeline for this site). To my delight Docker installed and was able to be started without issue, so I'm now left with no expected reasons to need to replace ChromeOS on this system. That ChromeOS may be becoming a viable off-the-shelf OS for more advanced uses is an exciting prospect; though it will be interesting to see how that may evolve in light of Project Fuchsia and any resulting tension with the relative dominance of Linux in the open source community (which includes my personal biases).

  • Quirks
    • Storage Space

Disk Space

I think I paid too little attention to disk space when purchasing this system, and I belatedly remember that I had to upgrade the disk in my previous Chromebook to 128G. I've already inadvertently exhausted the 32G, so I either need to be careful with disk usage or consider upgrading the drive in this system also (the previous Chromebook required a relatively specialized SSD so I'm not sure what kind of compatibility concerns are waiting for me).

The nasty surprise when space was exhausted was that the Linux container which chewed through the space had its filesystem flipped to read-only and did not allow remounting as read-write. If it becomes an ongoing issue I'll have to add some more proactive monitoring to prevent that behavior.

The disk space is relatively small especially given that it is currently shared by ChromeOS and the system space required by the Crostini Linux container. For now I'm going to attempt to fit within the space and use external storage as necessary. I can limit my development projects on the system disk to work which does not involve downloading the Internet, and I may have to keep an eye on installed packages. A current space hog is some of the LaTeX packages, so I may end up having to remove the distro provided bundles and install specifically required packages from CTAN.
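
To see where the space is actually going I can lean on a couple of standard commands (a minimal sketch; the package query assumes the default Debian-based Crostini container):

# Overall usage inside the Linux container.
df -h /
# Largest installed packages by size (Installed-Size is reported in KiB).
dpkg-query -W -f='${Installed-Size}\t${Package}\n' | sort -n | tail

The tail of that listing should make it obvious whether the distro TeX Live bundles are worth replacing with specific packages from CTAN.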

Coding

Literate Programming

Literate programming is a practice I've been interested in for several years. Previously, when I was looking at solutions for code documentation, I landed on liberal use of Doxygen-style code comments (regardless of actual tool support) along with fairly comprehensive tests with names that descriptively convey specified behavior, and some form of ADR or lighter-weight design document. These options seemed far more accessible than literate programming solutions such as WEB.

Upon rebooting this Web site using org-mode and considering the fact that a fair amount of the content is likely to be some form of annotated code it seems like a natural fit to make use of the literate programming facilities provided by org-mode. So, I'll be trying it out for some of the content here. I may explore using it elsewhere also as I do spend a fair amount of my time writing code and configuration which is likely to be used or built on top of far more often than it needs modification and therefore invites a low bus factor.

Potential Personal Tipping Point

A particular experience of mine that likely solidified my current interest in literate programming took place a couple of years ago. At that time I was leading a team which was building an authorizing reverse proxy which largely consisted of some tooling built to configure and deploy nginx containers to a Kubernetes cluster. Any non-trivial nginx configuration can get tricky, and this is aggravated when it is acting as a generalized proxy. Operating at the application layer requires reconciliation of the needs of both the client and the upstream server in many ways which cannot just be blindly passed through.

Evaluation Without Prompting

Evaluating code blocks presents some security concerns, so for the purpose of headlessly generating a site which includes such blocks the evaluation needs to be enabled without requiring a prompt.

This can be done by setting the associated variable:

(setq org-confirm-babel-evaluate nil)

As this is currently used for the generation of this site, that variable is set in the `.el` file passed to the batch emacs invocation.
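
For reference the invocation looks something like the following sketch (`publish.el` here is a stand-in name for that file rather than the actual one):

# Run a headless export/publish using the settings in the given elisp file.
emacs --batch --load publish.el --funcall org-publish-all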

Practice Sites

Exercism

Exercism is a nice language-focused coding exercise site. One of my coworkers introduced it to a book club we had started at Brightcove (where we were doing the Elixir track as a group). As I'm starting to conscientiously pick through some languages it's very useful to exercise (appropriately) the languages being used.

In using it there certainly seems to be a range of quality between tracks and exercises, though that should be expected for a community provided service.

I've added a profile for myself where I'll dig into particular languages. I'll start with C since I'm currently brushing up on some basic algorithms and C provides a reasonable battery construction kit without including many of note (and remains the dominant system-level language).

HackerRank

Software

AWS

The primary cloud provider for my professional work has been AWS. Here I'll try to capture some hopefully useful information gathered while working with AWS offerings.

AWS provides solid documentation for each of the services, but this may omit some information about practical usage. In particular, AWS provides a catalog of building blocks which (seemingly more so than other cloud providers) may require or benefit from non-trivial combination; additionally, while many services have fairly obvious roles and relationships (such as S3 or EC2), there is also a raft of services which overlap such that their application to a given solution requires more in-depth differentiation.

Using MFA and Role Assumption from Command Line Bash

If you're using AWS for anything important, MFA should be required for any non-trivial actions. Additionally, if you have a fairly complex environment across multiple accounts it may be preferable to have a configuration such that permissions are attached to roles which must be assumed, rather than being invocable directly from an authenticated user. Often such needs are addressed through the use of additional executables such as `aws-mfa`, but such tools seem needlessly heavy handed and can complicate some use cases.

A particularly useful place to store AWS authentication information is in the environment; the typical AWS credentials lookup chain uses such variables and any calls which need to pass through an AWS SDK will use such variables if they are available. Therefore if such environment variables are populated the authentication concerns around interacting with AWS disappear aside from the occasional need to retrieve a refreshed STS ticket with an updated MFA code.
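
A quick way to confirm that the populated variables are being picked up is to ask the CLI (and therefore the SDK credential chain) who it thinks you are; this is just a sanity check rather than part of any of the functions below:

# Report the account and ARN resolved from the current credentials.
aws sts get-caller-identity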

  • Why Separate Tools May Make Things Worse

    A fundamental concern when using some of these tools to manage authentication is that there is no straightforward means to populate the environment variables to support that approach. When invoking a command from a shell on a POSIX-y OS that command will normally be run in a forked process; that forked process can read exported environment variables from the shell but can't modify or otherwise muck around with those variables. Allowing such access could present some fairly obvious dangers, both intentional and accidental, and while some creative solutions such as using `exec` to load code into the current process may be possible, they are best avoided for that reason. The safe and common solution for this type of problem would be some mechanism of IPC where the invoking shell is able to retrieve the appropriate values and update the environment accordingly. Obviously additional tools could provide these values back to the shell, but at that point their value over using the AWS CLI effectively vanishes.

  • Cutting Out the Middle Man

    The AWS CLI can provide the values required for the environment, so rather than introducing an intermediary the simplest approach is to load those values into the current environment as retrieved from the CLI. There's often a tendency to gravitate towards output formats such as JSON, which can quickly introduce additional complexities around handling such values in bash, but the AWS CLI also supports simple text output which fits neatly within bash's wheelhouse. Simple, space-delimited text can be easily parsed with bash, so the retrieval of these values from a call to the AWS CLI can be as simple as:

    read AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<< $(aws sts..)
    

    This approach works as long as the command produces the expected fields in the expected format; the above will parse the output produced if the AWS call is provided the arguments:

    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' --output text
    
    • Basic MFA Usage

      This can be wrapped up in a function which adds those arguments and loads the returned values into the exported environment.

      aws::load_credentials_from() {
          # Clear any existing token which could mess with the request.
          unset AWS_SESSION_TOKEN
          # Declare separately from the assignment so a failed call is not masked
          # by the exit status of `local`.
          local aws_exec_result
          aws_exec_result=$("$@" --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' --output text) ||
              echo "Failure acquiring STS ticket!"
          read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<< "${aws_exec_result}"
          export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
      }
      

      This can then be used with any appropriate call to sts to load the environment with the results of the call such as:

      aws::load_credentials_from aws sts get-session-token ${other_args}
      

      A function which issues the call to sts directly may be simpler. The design above accommodates both of the use cases documented here, but most environments are likely to only require one use case and corresponding sts call.

      Given an MFA serial and a profile name the above could be used to authenticate with an MFA token retrieved from a prompt with a call such as:

      read -s -p "Enter MFA code for ${AWS_MFA_SERIAL}:" token
      aws::load_credentials_from aws sts get-session-token --serial-number "${AWS_MFA_SERIAL}" --token-code "${token}" --profile="${AWS_PROFILE}"
      

      Typically the MFA serial will be configured for a profile, so assuming a standard profile the serial could be looked up from that file and a function exposed to use that MFA.

      declare AWS_CONFIG_FILE="${AWS_CONFIG_FILE:-${HOME}/.aws/config}"
      declare AWS_PROFILE='default'
      
      aws::lookup_mfa_serial() {
          AWS_MFA_SERIAL=$(awk -s "BEGIN { thisP=0; } /\[.*]/ {thisP=(\$0==\"[${AWS_PROFILE}]\")} /mfa_serial/ { if (thisP) { print \$3 } }" "${AWS_CONFIG_FILE}") 
      }
      aws_load_mfa_session_token() {
          [[ -n "${AWS_MFA_SERIAL}" ]] || aws::lookup_mfa_serial
          read -s -p "Enter MFA code for ${AWS_MFA_SERIAL}:" token
          aws::load_credentials_from aws sts get-session-token --serial-number "${AWS_MFA_SERIAL}" --token-code "${token}" --profile="${AWS_PROFILE}"
          # Echo newline for subsequent prompt.
          echo
      }
      

      Sourcing those functions into the current shell then provides MFA authentication by simply calling the function `aws_load_mfa_session_token`.

    • Role Assumption

      The above works for simple MFA authentication with a particular profile; in more complex environments, however, you may need to assume different roles. Each such role can be represented as a separate profile within your AWS configuration (I prefer a `role@account` naming scheme). The configuration files need to match a particular format about which I still need to dig out information, but am (un?)fortunately in the camp of having simpler needs right now so I can't experiment. In short, however, a profile can be dynamically selected by parsing the config file with a function such as:

      aws::select_profile() {
        PS3='Select an AWS Profile #: '
        select AWS_PROFILE in $(awk -s '/\[profile/ {print substr($2, 1, length($2)-1)}' "${AWS_CONFIG_FILE}"); do
          AWS_ROLE_ARN=$(awk -s "BEGIN { thisP=0; } /\[profile/ {thisP=(\"$AWS_PROFILE\"==substr(\$2, 1, length(\$2)-1)) } /role_arn/ { if (thisP) print \$3 }" "${AWS_CONFIG_FILE}")
          break
        done
      }
      

      Authenticating with the selected profile can then use the previously defined `aws::load_credentials_from` function:

      aws::assume_profile() {
        aws::load_credentials_from aws sts assume-role --profile "${1}" --role-arn "${2}" --role-session-name "${1}-session"
        echo
      }
      

      Those two functions can be stitched together using:

      aws_assume() {
        aws::select_profile
        aws::assume_profile "${AWS_PROFILE}" "${AWS_ROLE_ARN}"
      }
      

      The final product upon sourcing the above is that `aws_assume` is available as a function which will prompt you for a role to assume and load the appropriate authenticated values for that profile into the current environment.

      Implementing auto-completion for the profiles could be a nice alternative, but as already mentioned I'm not currently using this and the `select` based option was normally sufficient.

      The full latest version of these snippets that I load into my bash shell is available at https://gitlab.com/mwhipple/mw-config/-/blob/master/bash.d/aws.sh

AWS Database Migration Service (DMS)

AWS DMS6 provides what it says: database migration. This can be a highly convenient option to move between databases as long as either the source or the destination is in AWS. DMS supports a variety of databases with a nice separation between the sync process itself and the endpoints which act as interfaces to the underlying databases. DMS provides change data capture (CDC) in addition to full database loads and can therefore support a prolonged or incremental migration effort.

A distinguishing factor of DMS (as indicated by its name and the first sentence of this section) is that it is oriented towards migration. Migration in this sense could be considered analogous to a bounded replication, where after some trigger the replication is halted and the source of the migration is likely replaced by what was the destination. DMS is not designed for ongoing replication. The CDC behavior it provides could certainly serve such a purpose and be a compelling alternative to solutions like Maxwell's daemon, but DMS is not designed for that purpose and some associated use cases or optimizations may therefore be underserved.

Emacs

I used Emacs7 on and off for many years; after several attempts to use other editors and crawling back to Emacs I've finally settled on thoroughly investing in it and using it as my primary editor.

For Tinkering

Grievances and Resolutions

I've had a series of concerns that drove me away from emacs at various points, each of which I subsequently came to accept or embrace.

  • Emacs is Too Slow/Initial Adoption

    My initial exposure to emacs was in the late 1990s when it would, for one reason or another, accidentally open on Linux machines I was using. At that time it would generally elicit a groan as I waited for it to finish loading and then either stumbled through using it or closed it and switched to vim. PCs at that point were not slow through a long lens, but were slow enough that emacs (at least without some optimizations) was sluggish enough to put me off. It also felt a bit too alien, which may have been aggravated by cosmetic similarities to simpler editors such as Notepad accompanied by significant behavioral differences; but that's just speculation about thoughts I may have had ~20 years ago.

    Several years after that I was in search of a new editor (which has been a periodic quest of mine). At that time I had started down the path of using Eclipse but was looking for something smaller and more flexible, and figured that PCs had sped up enough by then that the previously perceived cost of emacs would be gone (there's cost in Eclipse also, but that is a more obvious price for what it is). I started actively learning emacs and was quickly drawn into the discoverable self-documentation that it provides and the REPL-y model of being able to dynamically execute code within and against the editor environment.

  • Too Basic

    Emacs quickly became my primary go-to editor and I started down the road of trying to use emacs for everything. Shortly thereafter I had jobs which involved a fair amount of Java coding, and so I ended up using Eclipse and IntelliJ IDEA to match the tools familiar to the rest of the team while I got my bearings in unfamiliar territory. Emacs gradually faded into the background while I used shinier alternatives for most of my work and only used emacs for simpler purposes. At this point I fell into the common trap of overreliance on my editor. At the time I somewhat attributed this to a faction of the Java culture which leaned heavily on tooling, but now I'd argue that it is an instance of a nuanced, continuing tension which falls out of having more powerful tooling and which certainly does not have clear boundaries.

    Emacs is certainly capable of providing similar functionality but my attempts left me stumbling over package selection and lower level details which were all bundled up neatly in the newer IDE alternatives.

    This pattern was broken as I was preparing to change jobs and practicing for coding exercises. Much of the convenience that IDEs provide can be used to produce some parts of code far more easily, but also with less understanding. Switching back to emacs broke that spell and forced me to think far more intentionally about the code I was writing and the value the tool was providing for me, so emacs is where I stayed.

  • Avoiding the IDE Project Model

    An immediate practical advantage of a simpler editor is avoiding the IDE's own model of the projects on which you're working. Relying on such a model can lead to obvious issues such as development processes being defined in terms of specific editors, or even specific versions of those editors, and as a result not being directly portable to any system not using that editor (including systems such as CI). A less obvious issue is that this also normally introduces a large amount of information and configuration about your project which is cached or indexed internally to the IDE. This alternative projection of your project data invites new issues when the IDE-managed state falls out of step with other definitions or when drift is introduced as that internal state evolves.

    This often manifests as the IDE misbehaving until such state is flushed. Ideally the IDE will coordinate such state, though a manual refresh is often a backup option; I've occasionally had to wipe out a project entirely, however. While this is normally not a major issue to correct once recognized, that recognition can waste a fair amount of time. Normally the need for such refreshing is also tied to other changes, and therefore isolating the problem down to the IDE rather than those changes may not be straightforward. A variation which is particularly likely to invite confusion is when the IDE continues to behave properly after the canonical project has been broken in some way.

    These pitfalls are likely to be addressed through improvements to the IDEs and through new habits/knowledge, but they impose additional overhead which doesn't produce value at runtime, and the reconciliation of reproducibility between two models is non-trivial.

    Most often, much of the additional power of IDEs aims to allow developers to work with larger projects without getting lost amidst the project size: or, from another angle, they allow projects to sprawl out and then superimpose organization onto that sprawl. The alternative is to structure projects such that they can be navigated and exercised without looking for a tool which may be sweeping the mess under the rug. While larger projects may benefit from some such features within or without an IDE, more intentional use of organization and standard build tooling confers a more inherent coherence to the project itself. A large swath of software currently being developed should also be of a small enough size (potentially something like one bounded context) that if the code is not grokkable without the support of tooling it's likely just a result of sloppiness.

  • Emacs is Too Big

    An underinformed concern that I acquired at one point was that emacs itself has grown to be too large. This was primarily based on times when it was built from source and the enormity of the repository, which was noticeable during cloning and in the space consumed by the resulting copy. This size spawned a fear that core emacs has become bloated through some combination of legacy cruft and inheriting functionality which should likely not be part of the core. Generally speaking emacs offers value as a more minimalist alternative to larger IDEs, and that value would be inversely proportional to the size of emacs itself.

    This concern was trivially addressed when I stopped to pay closer attention to the repository itself. While the repository is large, the size of the active state is a small fraction of the overall size. This is almost certainly due to the extended period during which emacs has been under active development: the reasonably full record of the changes over that time would grow to be significant even if the product of those changes had much more tightly bound growth.

    So while the repository may be notably large, the size of the current project is well within reason and not large enough to lend support to the fears imputed from the size of the repository as a whole.

  • Elisp is Too Integral

    An ideological concern with Emacs is how fundamental a role elisp plays within the system. This is particularly evident when looking at the source code, where references to `lisp.h` are ubiquitous throughout the code base, presenting a clear picture that that file, and therefore elisp, is one of the foundational building blocks upon which everything is built rather than a relatively abstracted extension mechanism.

    The most obvious concern with this is that it means that replacing elisp with another language or engine (such as Guile) would be difficult at best. Such a replacement could be advantageous in that other languages may have larger communities and better support for some areas in which elisp has come up short.

    From a design perspective this speaks to emacs being a lisp interpreter within which editor behavior is built; so while emacs is described as an extensible editor it could be viewed as more of an extension environment in which an editor exists. In my limited past experience, platforms with this motivation often end up far more difficult to work with. While extensible systems may offer well defined extension points into which additional behavior may be added, making use of an array of well-defined supporting constructs, systems which have an ostensible purpose but provide a generalized platform to enable that purpose can end up introducing a lot of noise. Such platforms often provide a combination of building blocks and contracts that must be satisfied, but push enough responsibility onto the developer to bridge that gap that any introduced limitations and non-portable knowledge outweigh the supposed benefits. The issue likely comes down to trying to provide a unified solution for both the general purpose and the system-specific aspects of any extension behavior. The general purpose behavior is most easily delivered using the more powerful and likely more familiar tools provided by operating systems and language ecosystems rather than within the confines of any specialized tools provided by a particular environment. The primary role of the target system then shifts to providing a well defined means of integrating that behavior, which in turn leads back to the perspective of focusing on an extensible system rather than a more general purpose platform (as with most things there is plenty of middle ground also).

    The above concerns are primarily based around the perception that elisp exists as an extension mechanism. In a far more practical light, the implementation language for Emacs and elisp is C, and exposing C for wide extension has several pitfalls. Safe and consistent extension of a notably sized codebase in C is likely to involve a fair amount of established conventions around code organization and memory management. After creating such additional scaffolding, the extension code will interact with the system in specific ways and start to look like a more specific dialect of C, and at a certain point this may be accompanied by the realization that introducing such a specialized form of code is akin to introducing a new language, just with implicit idioms. Therefore layering some form of language above C makes sense; elisp is a nice simple starting point, and is a natural fit to provide the more dynamic interpretation which allows the more exploratory coding favored by GNU projects.

    Elisp being an entry point doesn't necessarily explain why there wouldn't be others. While the benefits that integral language idioms can provide over code conventions are fairly straightforward, the memory management concerns are far more subtle. The management of the lifecycle of objects across multiple languages is not straightforward, and reconciling how different languages approach that issue would require significant care. On top of that, presenting anything resembling a unified runtime across different language styles is likely to acquire costs hand over fist. Many of these types of concerns are evident when reading some of the Guile documentation (and its general recommendation to push as much as possible into Guile so that the work of coordinating lifecycles across boundaries is minimized).

    On top of everything else there's also the general takeaway that this is a somewhat imagined problem. Elisp works and it's simple (and simple enough to be reimplemented elsewhere as needed). Making a practical argument against a language choice in a working system needs a fair amount of support to offset the costs of enhancing the existing language as necessary, and without that support it largely comes down to personal preference. Further, this aligns with the earlier general purpose platform vs. extensible system concept: new behavior should be able to use whatever tools are desired and simply be implemented outside of emacs, integrated at the appropriate point using elisp as glue code, otherwise following the general software tools philosophy.

  • There's Too Much Legacy Cruft
  • Standard Elisp is Too Primitive

    After using emacs for a while and starting to pay closer attention to available packages, a major gripe is that elisp feels too low level. One of the major strengths of lisps is that they allow evolving the language used to be closer to the problem space; the fundamental property that written code is itself a lisp structure confers the relatively rare benefit that such alignment is easily attainable in structure in addition to verbiage. Emacs lisp, however, does not provide a notable amount of such high level constructs. This fits in with snide Emacs quips such as it being a promising OS that is only missing a good editor.

    Although Emacs is ostensibly an editor, most of the language constructs provided remain seemingly too generalized and primitive rather than supporting what would be expected to be typical editor use cases. A particular case that stands out in my mind from several years ago was wiring up a hook for comint mode. I believe the specifics were that I was running `compile` with an external process that triggered a rebuild on code changes (typical watch behavior such as that done by watchman) and was configured to clear the screen upon each refresh. The control sequence used to clear the terminal was not working as desired in comint-mode, so I wanted to write a handler that would clear the screen when that sequence was sent. While the code was not overly difficult it was far more tedious than expected. The general use case seemed like a typical one: something along the lines of listening to a buffer change and reacting based on the modified content. The actual fix involved a fair number of smaller building blocks covering concerns such as actually retrieving the content out of the buffer, searching through that content, and paying appropriate attention to the buffer continuing to mutate after the event, along with any necessary resolutions to produce the desired outcome. While this does not qualify as particularly low level, it certainly felt like a representative enough use case for an editor that several of the steps involved could have been provided by more packaged functionality.

    This is also evident in cases such as typical emacs configuration. It seems reasonable to expect that such configuration could be highly declarative given the previously described ability of lisps to easily enable what amount to domain specific languages, combined with the evident domain of the editor (and the configuration thereof). In spite of this, emacs configuration (and general functionality) remains primarily imperative. While a more imperative orientation is more natural for defining the general purpose logic that delivers the underlying functionality provided in elisp packages, it often leaves concerns such as configuration a less readable assemblage of disconnected actions rather than a more descriptive definition of how those actions fit together into an interrelated final composition.

    This is ultimately fairly nit-picky and could easily be addressed through creating or adopting libraries that provide higher level constructs. In the past I started down the path of creating such a library for my configuration and I'm currently planning on working towards the same while trying to build as much as possible on community supported offerings.

  • Wider Elisp is Too Disorganized
  • Emacs does too much

GitLab

Why GitLab

I use GitLab as it is a git hosting service which follows an open core model. I generally try to make use of the facilities provided by git itself and more manually composed pieces rather than relying on functionality provided by any centralized service, so from that perspective the choice of hosting service becomes somewhat arbitrary. The main driver for adopting GitLab is therefore the aforementioned open core model and lending support to a relative underdog.

Some of the additional features provided by GitLab may be utilized, however the underlying functionality will be defined within the source code or similarly and GitLab will simply act as an execution agent.

GitLab vs. GitHub

GNU/Linux

Motivation

GNU/Linux is adopted as it is open-source, has a very large community and widespread support, and is also the primary target for containerized deployments, which allows my local environment to more closely match anything which may be deployed (and therefore local work maps more directly to deployed behavior).

  • Alternatives
    • Microsoft

      For tool compatibility, any preferred OS would be POSIX compliant. The last time I used a Microsoft OS (~10 years ago) it didn't fit naturally into that category; I think they may be more in line currently, but the fear is that it still represents too significant of an ecosystem split to warrant crossing (maybe next time I get a device).

    • Apple

      Apple OSes are a compelling alternative, particularly given that OS X allows for convenient development against both OS X and Linux whereas the converse case of developing for OS X on Linux is non-trivial. I often use Macs for work computers and am also likely to recommend Apple devices to others; however, I prefer Linux as it supports a higher level of tinkering and a wider range of devices. As previously mentioned Linux is also far more likely to be aligned with deployment environments (Apple seems to have little penetration in the server space), and in particular the differences between OS X and Linux can often be subtle enough to not be immediately obvious but still cause subsequent consternation.

    • Others

      There are a variety of other POSIXy operating systems available, some of which promise conceptual advantages over Linux (Linux largely adopts models similar to older versions of UNIX: these have proven to be remarkably robust but have been incrementally augmented in other OSes). Linux, however, has a far larger community and knowledge base from which to draw and therefore promises more pragmatic economy.

MongoDB

Restoring Data Onto a Different Replica Set

  • TODO Add sources

    In an ideal MongoDB deployment you have fully operationalized and independent environments with things like proper disaster recovery in place for each environment. In reality however the production environment may be the only one that's actually important and any pre-production environments may ultimately just act as a disposable proxy for production. In any relatively small engineering team such an environment may not justify significant investment and may suffer some minor neglect as a result. In such a scenario if that more disposable environment does end up being disposed of, a recovery plan may involve loading a sanitized copy of production data into that pre-production environment.

    The MongoDB documentation provides ample information on restoring backups from data files, and messing with some of the internal databases should allow you to get all of your replica sets and any relevant configuration servers for sharding mapped out properly. This can be a hassle if you're not using some kind of management interface that eases the pain, but in the end everything should be pointing to the right place.

    BUT…those instructions are suitably focused on restoring a backup to a cluster which matches that from which the backup was taken and there can be some additional wrinkles if you're doing something like restoring a backup from one environment into another.

    A particular issue I've run into is around the target environment being on a much lower "term" than the source environment. Unfortunately `term` is a generic enough…umm…term that it's likely to not be a useful hint by itself without a fair amount of background context and so its significance in associated errors may be glossed over. I no longer have the specifics for when I encountered this error and will need to dig in a bit to collect more concrete information. At the time I ran into the issue I ended up having to poke through the MongoDB source code for the significance of the term "term" to click (which was also referenced in a relevant StackOverflow post). This could affect data nodes or also config nodes as newer versions of Mongo use replica sets on both.

    The "term" here is a reference to an election term which is part of with the Raft-based consensus that MongoDB replica sets use to coordinate understanding of which node is the primary/master node that should be accepting and replicating writes to the data: thereby being responsible for consistency and durability. The term number is used to make sure that all of the nodes in the replica set are up to date as operations are replicated to them, and so if a term number in the oplog (such as retrieved from a backup) is higher than that of the cluster then the cluster ends up in a state where it considers itself out of date and therefore unable to safely handle writes. There are likely more rigourous and correct explanations of this around.

    Messing with this number is not something which is readily exposed, and while there is likely a better way to handle this issue a quick solution is to just force the cluster ahead as many terms as are needed. Note in particular that this solution may be suitable for a relatively disposable cluster but elections impose a short period of unavailability and therefore triggering a flurry of elections will cause a proportional amount of downtime which would be unacceptable for any service that should have relatively high availability.

    An election can be instigated by having the primary stepDown, and therefore terms can be incremented by repeatedly issuing a stepDown. As this is a relatively hacky approach in any case, the simplest solution is to issue a stepDown request to all of the nodes in the cluster in a loop until the required term is reached. Such a solution could be done in bash with something along the lines of:

     while true; do
       for h in ${mongohosts}; do
         mongo --host ${h} ${mongoauthoptions} --eval 'rs.stepDown()'
       done
       sleep 10
     done

    That command may certainly need some tuning as I'm not able to easily verify anything about it, and it also may need adjustment based on how connections to Mongo can be established. Harmless errors/noise will be returned for the secondaries. The outer loop could also be modified to break at the desired term number, but that's out of scope for this quickly thrown together memory; similarly, retrieval of the active term may need an additional line in there. Some variation of the above with some trivial monitoring should do the trick, and a key safety factor is that so long as the cluster has a term at least as high as the terms in the operations things should work, so if the term is advanced too far it shouldn't cause issues.
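
    For the term retrieval, something along these lines might work, though I haven't re-verified the exact field layout and it may differ across versions (the election term should be visible somewhere in the `rs.status()` output, e.g. under the optimes):

     # Hypothetical term check; the field path is an assumption and may vary
     # by version, and ${primaryhost} is a placeholder for the current primary.
     mongo --host ${primaryhost} ${mongoauthoptions} --quiet \
           --eval 'print(rs.status().optimes.lastCommittedOpTime.t)'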

Terraform

I use Terraform8 a fair amount whenever I need to configure something along the lines of managed infrastructure or services. I like Terraform as it is ultimately fairly simple while conferring the benefits of infrastructure as code. It does not produce any runtime dependencies between the tool and the resources it manages and provides a significant amount of flexibility in terms of interoperating with resources outside of its control along with attaching or detaching such resources from Terraform's control.

Making use of Terraform among a wider group can also provide benefits in terms of coordinating configuration through remote state and (when combined with an appropriate CI configuration) can help safely provision and promote infrastructure while enforcing appropriate controls in terms of permissions and workflow.

Issue Running Tests for AWS Provider

When working on extending the AWS Terraform provider to support a wider range of AWS Data Pipeline functionality I encountered an issue where running the `go` build/test tasks performed terribly when run on my ChromeBook.

The very clear suspect seemed to be memory pressure, but there also seems to be an issue in tracking the memory usage of Linux applications in Chrome OS. Running `top` within the Linux environment showed virtually all of the memory free even while reporting that the compilation process was consuming a large amount of memory. The Chrome OS Task Manager also fails to report a useful memory footprint for the Linux process or any other process, though the Cog app reported appropriately low available memory. The result of running the process was a gradual slowdown until the system became basically unusable.

As support for Linux apps on ChromeOS is currently in Beta, I'm hoping the core of this issue will be resolved as any wrinkles in that functionality are smoothed out, but in the meantime I'll be investigating at least enough to unblock my immediate needs (and hopefully resist replacing ChromeOS with Linux).

  • Preparing an Escape Hatch

    The first step in attempting to work through this issue was to make sure an escape hatch was available: leaving a terminal window running with `top` allowed for getting in to kill the process even after the system becomes painfully slow (since a sigterm can be sent with a manageably low number of key presses).

Projects

This Site

  • Publishing

    This was relatively straightforward: I borrowed from the provided example9 and modified it a bit to match my tastes.

    • Use of Project .el File

      My initial hope was to adopt a simpler approach than the example configuration as the first iteration. This site started as a single org file to export to HTML, so the hope would be that that could be done through invoking an existing function through emacs batch mode10. This didn't pan out however: basic export functionality seems focused on buffer contents and the combination of loading the file into the buffer and exporting it seemed to stretch the limits of what should be done in the shell rather than in an elisp file. More appropriate functions exist as part of the more powerful publishing functionality, though that functionality involves a combination of defining projects and tracking some of their related state. This therefore once again tips the scales back toward using a separate elisp file (and is the approach used in the example project). I had also hoped to use directory local variables to help with project definition, but as those files seem to be oriented towards mode specific editor configurations rather than more general or structural configuration it didn't seem to offer an idiomatic (even if feasible) possibility. Ultimately the solution therefore uses an elisp file within the source directory of the site which defines the project and provides functions to help with publishing.

      A big part of the original motivation was that additional logic should not be required solely for the sake of the publishing pipeline, and that remains a goal. More elaborate configuration is expected as the project evolves and therefore project specific elisp was to be expected. The logic in this file should therefore remain the authority on the project configuration regardless of whether it is accessed through a batch build process or a more typical emacs (etc.) session. Ideally that file can remain entirely unaware of the batch invocation, and the publishing pipeline therefore involves nothing more than calling an appropriate standard function.

    • Integration with Make

      With the very simple initial alternative of easily calling a readily available emacs function off the table, pursuit of abject triviality is abandoned. As in virtually every other project I work on I therefore introduce `make` to capture the resulting recipe. On a basic level this is a trivial indirection where the emacs batch command is captured in make rather than in the GitLab CI configuration file; there are, however, two further adjustments.

      The first of these tweaks is bridging the now more segmented build configuration. If the build is managed entirely within emacs then emacs has access to all of the relevant parameters; however if make is performing the more general build definition then it should also provide parameters such as where artifacts should be built. The call to emacs is therefore enhanced to pass the relevant arguments through to be used in the ultimate project configuration.
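
      As a rough sketch (with publish.el and my-publish-site standing in as hypothetical names for the project file and its entry function), the captured recipe amounts to an invocation along the lines of:

       # Hypothetical recipe body: make supplies OUT_DIR, while the elisp file
       # name and function are placeholders for the actual project code.
       emacs --batch \
             --load publish.el \
             --eval "(my-publish-site \"${OUT_DIR}\")"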

      Another slightly more interesting issue is that make introduces a new dependency which therefore must be present in the container within which GitLab calls emacs. For the smallest code footprint the desire would be to use an already published image which contains both (an up-to-date) emacs and make. After a bit of local experimentation the `dev` tagged `silex/emacs` images fit the bill. That particular image seems to be fairly large but considering how it is being used that's not expected to be of practical consequence. Normally I'd also be likely to wire up a make target which recreates the containerized behavior (a host target which calls `docker run` to invoke another container target) but at the moment this doesn't feel interesting enough to warrant the effort given the relatively ubiquitous availability of make and emacs.
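
      If I do eventually wire that up, the host side would presumably boil down to something along these lines, with `publish` standing in for whatever the container-side target ends up being called:

       # Hypothetical host-side wrapper around the containerized build.
       docker run --rm -v "$(pwd)":/work -w /work silex/emacs:dev make publish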

  • Unchanging Cool URIs

    One Web site principle I like and try to adhere to is that of Cool URIs Don't Change. I honestly haven't done a great job with it due to repeatedly ripping up and restarting Web sites, but maybe this time will be different (one of these times will be).

    Maintaining persistent URLs for given content could be handled fairly smoothly if the site has dynamic routing or is fronted by some form of reverse proxy or gateway, but a static site with no such proxy presents a challenge if the content evolves. Beyond the HTTP routing there is also an additional possible challenge if fragments are treated as part of the Cool URIs; in that case any additional server-side routing is ill-equipped to deal with any required rewrites for fragments. Much of this could be addressed through careful initial thought around the URL structure, but in practice it can be nice to evolve content such that what may initially be a section on a page grows to the point where it warrants being split out into its own page. More interestingly it may also be desirable to commingle related content which may have originally been distributed across different contexts. As I'm currently creating this site as one sprawling page I'm deferring much structural thought; assorted paths of refinement are therefore likely, but I'd like to preserve URL-based content resolution throughout.

    The above ideas imply a desired solution in which any targetable content should be accessible as either a page or as a fragment within a page. Such content should therefore ultimately be resolvable across requests that:

    • accurately target the current location
    • access the content through a page that is currently a segment
    • access the content through a segment that is currently a page
    • access the content through a segment on the wrong page

    Additionally there is the case of non-existent content (404s).

    The general problem has the basic solution of providing an index of IDs and their actual locations. Incoming requests then need to be resolved based on entries within that index (or the lack thereof). Each such entry needs to be able to be resolved in response to the HTTP request itself if the ID is requested as a page/in the path, or using client side scripting if the ID is provided as a fragment. In either case with a static Web site the available solution is to handle this in JavaScript after the initial page load. With a slight amount of sophistication the content could be somewhat separated from the container in which it is presented and then the targeted content could be loaded into the current page. As my current site is particularly simple the initial iteration will start with the less elegant approach of forcing the client to redirect.

    • Inside vs. Outside org-mode

      With this type of functionality one of the first questions is where to add the functionality between the source input and the produced output. A seemingly ideal place would be to wire up the additional output as part of the actual generation process, and similarly operating on top of whatever source model is available may seem to provide less resistance than an output model which is less aligned with the primarily utilized technologies.

      In this particular case, however, HTML is a very standard format, the needs are simple, and most importantly the desired functionality is independent of any underlying site generation mechanism.

    • Flat Structure

      An assumption in this logic is that all of the output files will be in the same directory. The relevant logic could be modified to process subdirectories recursively, but a flat structure is also a step towards stable URIs in that the subdirectory for a particular file can't change and therefore invalidate a previous URI.

    • Extracting IDs

      The first step is extracting any ids that may be of interest along with their containing page. As the ids are basic, controllable attribute values there is no need for any complex parsing, so an appropriately crafted `grep` command should be sufficient.

      • Specification

        While the basic rules for extraction are simple, it's worth making sure the resulting command handles some anticipated use cases. As this is testing the configuration/parameterization of the established logic of grep, the testing will focus more on expected behavior than on fully exercising edge and corner cases (leaving that to grep's own testing).

        We can start with a trivial example from which the id should be extracted:

         <div id="simple" />
        

        This should produce the IDs:

         simple
        

        It's not unlikely that multiple IDs may exist on the same line, so it's also worth verifying that all such IDs would be extracted:

         <div id="multi_1" /><div id="multi_2" />
        

        This should produce the IDs:

         multi_1
         multi_2
        

        This largely leaves the nuances around what text should actually be matched. For the sake of keeping everything a bit simpler it should always be expected that `id` will be lower cased, that there will be no spaces around the equals, and that the value will be a simple quoted string on a single line, yielding a regular expression equivalent to `id="[^"]+"` which would grab everything from the opening quote to the next/closing quote.
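
        As a quick sanity check of that base expression (using grep's `-o` flag to print just the matched text; the full command is built up under Implementation below):

         echo '<div id="simple" />' | grep -Eo 'id="[^"]+"'
         # prints: id="simple"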

        There are remaining concerns, however, around making sure that only `id` attributes are extracted rather than simply any attribute which ends with `id` (making sure the initial boundary is as desired). Some cases to verify such attributes will not be extracted can be added:

         <div pid="pid" />
         <div my:id="my:id" />
         <div my_id="my_id" />
         <div my-id="my-id" />
        

        While some of these cases may go against some adopted conventions, they are compliant with the spec and so future proofing against conventions which evolve either intentionally or incidentally as part of an introduced technology feels wise.

        In addition to targeted cases I also like to throw in a case which just combines variations in a somewhat representative structure.

         <div id="match_1" /><div pid="skip_1" /><div id="match_2" />
         <div id="match_3" /><div id="match_4" /><div id="match_5" />
        
         <div my:id="skip_2" />
         <div id="match_6"/>
        

        This should produce the IDs:

         match_1
         match_2
         match_3
         match_4
         match_5
         match_6
        

        All together this produces test input of:

          <div id="simple" />
          <div id="multi_1" /><div id="multi_2" />
          <div pid="pid" />
          <div my:id="my:id" />
          <div my_id="my_id" />
          <div my-id="my-id" />
          <div id="match_1" /><div pid="skip_1" /><div id="match_2" />
          <div id="match_3" /><div id="match_4" /><div id="match_5" />
         
          <div my:id="skip_2" />
          <div id="match_6"/>
        

        Which should extract the IDs:

          simple
          multi_1
          multi_2
          match_1
          match_2
          match_3
          match_4
          match_5
          match_6
        
      • Implementation

        Skimming the grep man page11 leads to an invocation along the lines of:

         extract_1() {
                 grep -Ewio 'id="[^"]+"'
         }
        

        Let's try piping the test input through that invocation.

         echo "${input}" | extract_1
        

        The `w` flag enforces word boundaries around the pattern, which handles some but not all of the possible leading text which could lead to false matches. While not expected for my immediate use cases it would be nice if the logic were robust enough to not mistakenly pull in attributes such as `foo-id` or `foo:id`. These are allowed by the spec even if not utilized by local convention, and the latter is not unlikely to evolve, so accommodating the spec provides some future proofing.
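
        For example, `-w` rejects the `pid` case (the match is preceded by a word character) but not the hyphenated variant, since `-` counts as a word boundary as far as grep is concerned:

         printf '<div pid="pid" />\n<div my-id="my-id" />\n' | grep -Ewio 'id="[^"]+"'
         # prints only: id="my-id"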

        Preferring POSIX style character classes for portability, the original invocation can be refined to:

         grep -Eio '[[:space:]]id="[^"]+"' *.html
        

        This provides a more focused word boundary for the attribute name. This introduces the drawback that now the output will include that leading whitespace character. Using a tool that supports capture groups would allow for tuning this appropriately (I think a grep I've used in the past does this but it doesn't seem widespread), but since this output needs further processing it seems easy enough to address downstream.

    • Enriching With Pages

      ID extraction retrieves IDs embedded within pages, but does not readily cover the complementary case of wanting to address the pages themselves. Conceptually this is equivalent to having a tuple similar to the extracted IDs where an ID is computed to match the container. Augmenting the stream to include those results would be a viable option, but as the output of the above `grep` command is not particularly clean this also feels like something that may best be left for downstream processing.

      Retrieving the list of files can be done trivially with `ls`. The desired output format is one file per line which could be done with the `-1` argument, but is also the default when called in a non-interactive context. Combining both types of input into a single stream can be done with `cat` and process substitution:

       grep -Eio '[[:space:]]id="[^"]+"' ${OUT_DIR}*.html | cat - <(ls ${OUT_DIR}*.html)
      

      When tidying this up both of the sources are likely to be provided by functions, each called through process substitution, but the pattern of piping stdout to cat is more conducive to exploration.
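
      For reference, the tidied form might look something like the following, with extract_ids and list_pages as hypothetical wrappers around the two commands above:

       # Hypothetical tidied form: both sources wrapped in functions and
       # combined through process substitution.
       extract_ids() { grep -Eio '[[:space:]]id="[^"]+"' ${OUT_DIR}*.html; }
       list_pages()  { ls ${OUT_DIR}*.html; }

       cat <(extract_ids) <(list_pages)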

    • Flipping and Grouping

      For the intended indexing purposes the ID values should be the keys, but the output produced at this point is inverted from that state. The desired behavior also relies on a single container for each key. Rather than bake collision handling directly into the index preparation, the initial thinking is to defer the conflict resolution by outputting an array value when multiple such containers exist and a scalar value when there is a single, unambiguous match.

      Several different tools could likely fit the bill for this step; I'll go with the stalwart `awk` since it's a good fit and a tool I enjoy using.
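
      A rough sketch of what that step could look like (assuming the combined stream described above, i.e. `page.html: id="foo"` lines from grep and bare `page.html` lines from ls, and treating a JSON object as a provisional format for the index):

       # Hypothetical flip/group step; the JSON index format is an assumption.
       awk '
         # grep lines: file before the first ":", id value between the quotes
         /id="/ {
           split($0, parts, ":")
           file = parts[1]
           match($0, /id="[^"]+"/)
           id = substr($0, RSTART + 4, RLENGTH - 5)
           entries[id] = (id in entries) ? entries[id] SUBSEP file : file
           next
         }
         # ls lines: the page itself, keyed by its basename sans extension
         {
           file = $0
           id = file
           sub(/^.*\//, "", id)
           sub(/\.html$/, "", id)
           entries[id] = (id in entries) ? entries[id] SUBSEP file : file
         }
         # emit a scalar for unambiguous matches and an array for collisions
         END {
           printf "{"
           sep = ""
           for (id in entries) {
             n = split(entries[id], files, SUBSEP)
             printf "%s\n  \"%s\": ", sep, id
             sep = ","
             if (n == 1) { printf "\"%s\"", files[1] }
             else {
               printf "["
               for (i = 1; i <= n; i++) printf "%s\"%s\"", (i > 1 ? ", " : ""), files[i]
               printf "]"
             }
           }
           print "\n}"
         }
       '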

  • Family Media Site

    A project I've been dragging my feet on for years is a system to track media files for my family. I settled on a general design and did some proof-of-concept work at the outset, but have since not gotten around to actually building out a UI (which is kinda important for sharing something like media).

    • UI
      • Standard Functionality

        My main thought around Web development (which will be covered elsewhere later) is that the base level functionality provided by modern browsers is more than sufficient for most uses, as long as the developer is able to design systems decently well. In most healthy ecosystems weight usually builds on one side of a boundary which allows a commensurate reduction in weight on the other side: either a foundational system grows richer and the solutions built on top of that foundation grow smaller, or an underlying kernel grows smaller and more focused and more specialized work is offloaded to those solutions which are built on top of it. Currently Web development seems to be growing on both sides at once, where the languages and runtimes continue to grow more powerful but the systems built on top of them remain at increasing levels of abstraction above that foundation. While this is likely beneficial for some of the more immersive uses of the Web it feels very heavy handed for basic sites and SPAs. My intent would therefore be to just use functionality readily available in a browser: however that does require the aforementioned design and dot connection in a space which I honestly don't have much interest in, which is largely the reason the UI was left alone for a while.

      • React

        The UI is ultimately going to be built using React. The main reason for this is that it is widely popular and I'm currently using it at work. More thoughts around React will likely be captured separately. The starting point will be the typescript template provided by Create React App12, and the UI will be developed using functional components and hooks with minimal additional dependencies.

Career

Assessing Companies

Software engineering positions are increasingly prevalent, and the dynamics and culture which surround those positions continue to evolve in new and interesting ways.

Home Repair

Photography

Process

Sources

Footnotes:

Author: mwhipple

Created: 2020-10-27 Tue 19:36
