Annotations
Some context — Dr. Bertozzi was one of the 2022 Nobel Laureates in Chemistry for her contributions to developing click and bioorthogonal chemistries.
My understanding is that the basis for a lot of mucin research, especially in the synthesis space, is built on click chemistry — a general overview being that click chemistry is good for attaching complicated things to other complicated things. That explanation does the process zero justice, but it’s the best I’ve got.
The catch here is that click chemistry attaches two complex structures together. If you’re linking two big, rigid things, you have some options in how flexible or stiff the connection point can be. The goal, then, is to use ColabFold to measure the flexibility between two rigid structures.
The point of reading this paper is to look into ColabFold hyperparameters. Here, we have an established, published use of ColabFold for specifically this domain. If we can replicate the models here using the current hyperparameters (the defaults), then we can have some confidence that those hyperparameters could apply well to other problems in this domain. If the results don’t match up, then we need to worry more about the hyperparameters.
This is where my interest is piqued, since I’d need to bug my partner for ELI5 explanations of any of the research up to this point.
I’m here to know how exactly they used ColabFold for this particular problem domain. From the core ColabFold paper, there are quite a few hyperparameters that let us make sure the model best matches the environment it’s seeking to emulate, as well as expose the relevant information for further study/replication/confirmation.
In a best-effort attempt to confirm this, I looked into the upstream paper referenced in the ColabFold method, where they confirm the AlphaFold result against the upstream Yu et al. paper investigating StcE specifically. The associated data for that paper references 3UJZ: Crystal Structure of Enterohemorrhagic E. coli StcE, which — I think — is the experimentally-determined StcE structure. There’s an associated plaintext amino acid sequence that we can pop into a .fasta file and feed to localcolabfold and… hopefully just get the same structure this paper got, but with the full ColabFold statistical report?
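For the curious, the FASTA wrapping itself is trivial. A quick sketch (the sequence string below is a stand-in, not the actual 3UJZ sequence — paste in the one from the PDB entry):

```python
def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a raw amino-acid string as a single FASTA record, wrapped at `width`."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"

# Placeholder sequence -- substitute the real 3UJZ amino acid sequence here.
with open("stce.fasta", "w") as fh:
    fh.write(to_fasta("3UJZ_StcE", "MKLQSEQUENCEGOESHERE"))
```

From there, `colabfold_batch stce.fasta stce_out/` should run the default localcolabfold pipeline and drop the ranked PDBs plus the confidence plots into `stce_out/`.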
Comparing the outputs of our run against this paper’s run, then, we either do or don’t get the same structure:
- If we do get the same structure, we can be fairly confident that this paper is also just using the default localcolabfold hyperparameters from their sample run, and have some comfort in continuing to use those hyperparameters in similar scenarios; or
- If we don’t get the same structure, we can assume they used different hyperparameters that aren’t here, or in the supplementary materials, and we may need to reach out and ask what hyperparameters they used.
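One quantitative way to decide “same structure” is Cα RMSD after superposition. The paper doesn’t say whether they compared numerically, so this is just a sketch of the standard Kabsch-alignment RMSD, assuming you’ve already extracted matched Cα coordinates from the two PDBs as Nx3 arrays:

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two Nx3 coordinate sets after optimal rigid superposition."""
    # Center both coordinate sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch: optimal rotation from the SVD of the covariance matrix.
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum() / len(P)))
```

A rigid-body transform of the same points should give an RMSD near zero; a genuinely different fold will not.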
Well, it was audacious to expect a clear-cut answer here. After using localcolabfold under the sample (default) hyperparameters to fold the 3UJZ sequence, and coloring it with the same domain coloring map available at the NIH 3UJZ source, I’m getting… something vaguely similar. From the Relaxed, Rank 1 PDB:

We’re about to get real fuzzy, here.
The Y shape demonstrated in the paper’s results does seem to be present, although not quite as cleanly as the sample figure. Additionally, my assumption (with fingers crossed) was that the C and INS domains were among the 5 domains from the NIH source. I’m not sure this ended up being the case.
As a more quantitative check that we do have a hub-and-spoke structure with three offshoots, though, we can take a look at the error graph:

From my understanding of how to read this chart from the upstream ColabFold paper (specifically, the extended figure from the bioRxiv pre-print), areas that have low confidence but high consensus across the multiple models may correspond to generally flexible offshoots from the core rigid structure of the protein. If that’s a correct understanding, those three uncertain regions would correspond to three offshoots, two of which are likely the C and INS domains mentioned.
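If I’m reading the outputs right, ColabFold stashes the per-residue pLDDT in the B-factor column of its predicted PDBs (the AlphaFold convention), so those low-confidence stretches can be pulled out programmatically rather than eyeballed off the plot. A rough sketch, assuming the standard fixed-column PDB layout; the cutoff of 70 is my own arbitrary threshold, not anything from the paper:

```python
def low_confidence_residues(pdb_text: str, cutoff: float = 70.0) -> list[int]:
    """Residue numbers whose CA pLDDT (stored in the PDB B-factor column
    by ColabFold/AlphaFold) falls below `cutoff`."""
    flagged = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resnum = int(line[22:26])   # resSeq, PDB columns 23-26
            plddt = float(line[60:66])  # B-factor, PDB columns 61-66
            if plddt < cutoff:
                flagged.append(resnum)
    return flagged

def group_regions(residues: list[int]) -> list[tuple[int, int]]:
    """Collapse sorted residue numbers into contiguous (start, end) regions."""
    regions: list[tuple[int, int]] = []
    for r in residues:
        if regions and r == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], r)
        else:
            regions.append((r, r))
    return regions
```

Running this over the Rank 1 PDB and counting the regions would be a crude sanity check on the “three flexible offshoots” reading.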
The best conclusion I can take away, then, is that the ColabFold defaults are likely good enough for cursory glances, but would need to be better understood before leaning on them for anything more rigorous.
My secondary conclusion, though, is that AlphaFold is generally a precursory/investigatory garnish that can assist in an exploratory phase. We can see here that it was used for just a handful of figures, to visually highlight important information, but is (obviously) no substitute for experimental evidence. It’s a pair of binoculars to look closer at where you’re headed, not the thing that gets you there.
Y’all are about to see me do my best, again. This is a knock-on from the ColabFold annotations with a specific instance of ColabFold that somebody pointed me to. The same disclaimer applies — I’m not a computational biologist! This is due diligence, simply to make sure I’m understanding ColabFold the best I can while using it.