Annotations
As an unqualified qualifier for context: AlphaFold is a machine learning model that predicts how protein molecules curl up, or fold.
This is a notoriously hard problem to solve. Imagine one of those magnetic bead toy sets, where you can pull it apart to get a single-file chain of magnets. However, these magnets aren’t all identical; instead, they all want to sit at different angles, have different shapes, and have different magnetic fields. Your job is to take this random chain of magnets, toss it in the air, and figure out what shape it’ll be before you catch it. That’s a simple version of the protein folding problem.
In the spirit of friendly competition among nerds, the Critical Assessment of Structure Prediction (CASP) competition was established in 1994. The goal was simple — given the sequences of proteins whose folds have already been experimentally solved (but not yet published), who can use only computers to predict the correct folds?
The CASP competition is basically an Olympic event for computational biologists, and has a massive impact on research for medical treatments and drug interactions. Experimentally determining folds for protein sequences is difficult work, so being able to even vaguely estimate protein folding would be a miracle.
In 2017, when I was a freshman computer science undergraduate, a (fairly young) professor said that he was hopeful, but not optimistic, that reliable computerized protein folding would be achieved in his lifetime — at that point, the best models were a long way from reliable, barely coming close to parity with experimental results.
Then, in 2018 and 2020, AlphaFold absolutely swept the competition:
QUOTE
Anyway — I’ll get on with it. AlphaFold and its successors are now open-source software, allowing anybody with a half-decent GPU to run them at home. The problem of protein folding still isn’t solved, but modern models are accurate enough to be incredibly useful for preliminary research, letting researchers find and follow promising leads. AlphaFold is probably a leading candidate for any list of 7 Wonders of the Modern World, and its creators won the 2024 Nobel Prize in Chemistry.
I’ll be honest — the CASP website is… dense. I’m having trouble finding the CASP16 (2024) metrics, but it seems(?) like ColabFold actually did come in first for the single-protein domain competition. Congrats!
My understanding is that single-protein folding is now nearly solved, coming close to experimentally confirmed folding results. It seems that the next big thing is predicting protein multimers — which, if I’m understanding right, is how multiple proteins fold together even if they’re not chemically attached. Fret not! CASP has a competition for that, too!
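(As a side note: my understanding of ColabFold’s input convention is that a multimer query is just the individual chains joined with a colon in a single record, rather than submitted separately. A toy sketch, with invented sequences — double-check the convention against the ColabFold docs before relying on it:)

```python
# Toy sketch of a ColabFold multimer query, as I understand the
# convention: chains that should be folded together are joined with
# ':' in one sequence line. The sequences below are made up.
chain_a = "MKTAYIAKQR"
chain_b = "GSHMLEDPVA"

multimer_query = ":".join([chain_a, chain_b])
fasta_entry = f">toy_complex\n{multimer_query}\n"
print(fasta_entry)
```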
Well — fret a little bit. The NIH under the Trump administration withheld funding for CASP17, scheduled for fall of this year. All part of the mission to make us great again?
Thankfully, stopgap funding from close partners seems to be keeping the competition afloat.
Okay — whoever coined “amber force fields” as an actual scientific parameter should get the Nobel Prize for Coolest Term.
However, this does bring up precisely why I’m taking a look at this paper. As neat as machine learning models are, they aren’t plug-and-play. Transformer-based models are built on weights — the values of connections within the model itself. These are not directly customizable.
However, that isn’t to say these models aren’t customizable. All models are. We call these customizations hyperparameters, and they’re effectively how somebody using the model can properly drive it.
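To make that concrete, here’s a rough sketch of what driving those knobs looks like. The flag names (`--num-recycle`, `--num-models`, `--num-seeds`) are from my reading of the colabfold_batch CLI; verify them against `colabfold_batch --help` on your own install before trusting anything here.

```python
# Sketch: assembling a colabfold_batch invocation from a settings dict.
# Flag names are my reading of the ColabFold CLI, not gospel.
run_settings = {
    "num_recycle": 3,  # how many times the structure gets fed back in
    "num_models": 5,   # how many of the trained model weights to try
    "num_seeds": 1,    # random seeds per model
}

def build_command(fasta_path, out_dir, settings):
    """Turn a settings dict into a colabfold_batch argument list."""
    cmd = ["colabfold_batch", fasta_path, out_dir]
    for name, value in settings.items():
        cmd += [f"--{name.replace('_', '-')}", str(value)]
    return cmd

cmd = build_command("query.fasta", "results/", run_settings)
print(" ".join(cmd))
```

Whatever values you land on, the point from above stands: record them, because they’re part of the experimental design.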
In the context of the investigation I’m trying to assist, these are very important. This paragraph seems to go over them with a light touch, so more digging may be necessary. Their values aren’t necessarily opinion, but they are part of the experimental design and should be both understood during the investigation and published as part of any results.
This is the tricky part about the localcolabfold library — it exposes these parameters for “advanced users,” which is why it’s important to understand what’s going on here.
Ah, this is the good stuff — in the proof-of-concept, I wasn’t quite sure what any of these visualizations represented. I did appreciate how pretty they were, though.
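For what it’s worth, the one plot I can now decode is the per-residue confidence score (pLDDT, on a 0–100 scale). The bands below are the ones the AlphaFold authors publish; the helper just makes them explicit, with the labels paraphrased by me.

```python
# Mapping a pLDDT score (0-100) to the confidence bands the AlphaFold
# authors describe. Band labels are paraphrased, not official strings.
def plddt_band(score):
    if score > 90:
        return "very high confidence"
    if score > 70:
        return "confident"
    if score > 50:
        return "low confidence"
    return "very low confidence (often disordered)"

for s in (95, 80, 60, 30):
    print(s, "->", plddt_band(s))
```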
My homelab server is beefy, but not half-a-terabyte-of-RAM beefy.
I don’t think that this will come up, but I know that there’s… something in here about a more complex/intensive run generating key-value pairs for additional analysis, especially in active areas of the folded protein — keeping an eye out for that, because it seems relevant.
Okay — so there are material differences in the data. The shape is roughly the same, and the local run is probably good enough for preliminary investigation, but any more intense runs will probably require more than my 2x3060 could handle. The current homelab runs are a hair above “just fuckin’ around.”
After looking at the source code for localcolabfold, I was thoroughly impressed by how well-optimized it was. The onboarding was a bit tricky for somebody without a real UNIX-y background, but onboarding is realistically the hardest part of any project, and nearly impossible to test.
Ah, okay — so there is a short-circuit in place when the result has converged. This is a good one to get confirmation of, as there’s quite a bit of iteration in the standard run — while a single run recycles, it also runs through multiple models (and, if you want to, multiple seeds on each model). This is the difference between convergence (one model seems to have homed in on a consistent result) and consensus (multiple models have returned similar final results).
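Here’s a toy sketch of that short-circuit — emphatically not ColabFold’s actual code, just the shape of the idea: keep recycling until successive predictions stop moving, then bail early instead of burning the full recycle budget.

```python
# Toy sketch of convergence-based early stopping across recycles.
# NOT ColabFold's implementation: here a "structure" is just a float,
# and we stop once successive predictions differ by less than a
# tolerance, mirroring the short-circuit idea.
def recycle_until_converged(predict, max_recycles=20, tolerance=0.5):
    """predict(prev) returns the next 'structure' (a float score)."""
    prev = predict(None)
    for i in range(1, max_recycles):
        current = predict(prev)
        if abs(current - prev) < tolerance:  # converged: short-circuit
            return current, i
        prev = current
    return prev, max_recycles

# A fake model whose guesses settle toward 87.0:
state = {"value": 50.0}
def fake_predict(prev):
    state["value"] += (87.0 - state["value"]) * 0.5
    return state["value"]

result, n_recycles = recycle_until_converged(fake_predict)
print(result, n_recycles)
```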
Interestingly, non-consensus is not a dealbreaker, and can actually be valuable information. Off the top of my head, a specific case of non-consensus as valuable data is to suss out which parts of the fold are flexible, and which are rigid. As I understand it, it’s like how those floppy blow-up mannequins in car dealerships can flop around.
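A minimal sketch of that idea, with made-up numbers standing in for real coordinates: compute the per-residue spread across the models’ outputs, and treat high spread as a hint that the region is flexible rather than rigid.

```python
# Sketch: using model disagreement to flag flexible regions. Toy data,
# not real coordinates: each row is one model's per-residue value (a
# 1-D stand-in for atom position). High spread across models at a
# residue hints at flexibility there.
from statistics import pstdev

models = [
    [1.0, 1.1, 5.0, 9.0],  # model A
    [1.0, 1.0, 6.5, 2.0],  # model B
    [1.1, 1.0, 4.0, 7.5],  # model C
]

def per_residue_spread(models):
    """Population std-dev of each residue's value across models."""
    return [pstdev(vals) for vals in zip(*models)]

spreads = per_residue_spread(models)
flexible = [i for i, s in enumerate(spreads) if s > 1.0]
print(flexible)  # residues where the models disagree -> [2, 3]
```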
I did see this parameter floating around — while I think there is some reason to adjust these, the values may be more limited by the available RAM on my workstation than hard scientific rationale.
You know — I’m usually used to papers like this floating GPUs like the Nvidia H100, which is about $35,000 retail. However, I’m pleasantly surprised by the cards they put forward, here: the V100 32GB seems to be around $700 right now, and the K40 is, like, $150.
Looking a bit closer, my homelab machine may be a bit stronger than their base AlphaFold2. However, it did still barely break 60FPS on Borderlands 4, which is the real benchmark — right?
As a sanity check, the target they mention running on the stronger machine is T1061, from CASP14. At 949… residues? amino acids? I doubt anything we’re up to would break that cap.
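In case it’s useful later, a trivial sanity-check helper for counting residues in a single-record FASTA string before submitting it — the toy sequence below is invented, not a real protein.

```python
# Count residues in a (single-record) FASTA string, to compare a query
# against a residue ceiling like the ~949-residue benchmark target.
def residue_count(fasta_text):
    lines = fasta_text.strip().splitlines()
    seq = "".join(ln.strip() for ln in lines if not ln.startswith(">"))
    return len(seq)

toy = """>toy_protein
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
APILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ
"""
print(residue_count(toy))  # well under any ~949-residue ceiling
```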
While I was hoping for a bit more information on how best to read the output graphs, and how best to tune the hyperparameters of the model to try and actually answer the investigatory questions, I think this was a good contextual primer for ColabFold, and a nice opportunity to gush about the nerds out there competing in the world’s strangest, but possibly most beneficial, sport. Why curl when you can fold?

We’ll be taking a minor detour from the usual riveting topics — the philosophy of computer programming, the graph theory behind urban design, etc. — to visit a new and exciting subject: folding proteins in the comfort of your own home!
The heaviest of disclaimers — I am absolutely not a computational biologist. The words “proteins” and “sugars” mean almost nothing to me, save for two adjectives that might describe a steak dinner and dessert. These annotations are part of a favor to a friend, who has found the need to set up and run ColabFold for some investigative digging into something she’s working on.
While I know next-to-nothing about biology, I know just enough about machine learning models to get a branch of the project, localcolabfold, running on my homelab server to generate an initial proof-of-concept.
A good second step to using a new piece of software (after, of course, actually getting it to run) is to make sure you’re using it right. While reading a single paper is no substitute for actual training, or a Ph.D. in computational biology, I’d like to at least be able to semi-accurately convey good practices for driving AlphaFold (as well as other related models), to make sure she gets the information she needs to carry out a thorough investigation.