Self-assembly of silk-like protein into nanoscale bicontinuous networks under phase separation conditions

Liquid-liquid phase separation (LLPS) of biomacromolecules is crucial in various inter and extracellular biological functions. This includes formation of condensates to control e.g. biochemical reactions and structural assembly. The same phenomenon is also found to be critically important in protein based high performance biological materials. Here, we use a well-characterized model triblock protein system to demonstrate the molecular level formation mechanism and structure of its condensate. Large-scale molecular modelling supported by analytical ultracentrifuge (AUC) characterization combined with our earlier high magnification precision cryo-SEM microscopy imaging lead to deducing that the condensate has a bicontinuous network structure. The bicontinuous network rises from the proteins having a combination of sites with stronger mutual attraction and multiple weakly attractive regions connected by flexible, multiconfigurational linker regions. These attractive sites and regions behave as stickers of varying adhesion strength. For the examined model triblock protein construct, the β -sheet rich end units are the stronger stickers while additional weaker stickers, contributing to the condensation affinity, rise from spring-like connections in the flexible middle region of the protein. The combination of stronger and weaker sticker-like connections and the flexible regions between the stickers result in a versatile, liquid-like, self-healing structure. This structure also explains the high flexibility, easy deformability and diffusion of the proteins decreasing only 10-100 times in the bicontinuous network formed in the condensate phase in comparison to dilute protein solution. The here demonstrated structure and condensation mechanism of a model triblock protein construct via a combination of the stronger binding regions and the weaker, flexible sacrificial-bond-like network, as well as its generalizability via polymer sticker models, provide means to understand not only intracellular organization, regulation, and cellular function but also identifies direct control factors for and enables engineering improved protein and polymer constructs to enhance control of advanced

fiber materials, smart liquid biointerfaces or self-healing matrices for pharmaceutics or bioengineering materials.

INTRODUCTION
In liquid-liquid phase separation (LLPS), a macromolecular solution reorganizes at molecular level to two (or more) coexisting liquid phases differing in their molecular compositions.2][3][4] In cells, LLPS results in, for example, membraneless organelles, such as nucleoli, Cajal bodies and PML nuclear bodies in the nucleus, but also, at a more general level, compartmentalize molecules, control molecular environments and via that command many cellular biochemical reactions. 1,4The significance of LLPS in biological cells rises from the phase separation giving the cell means to concentrate proteins, nucleic acids, and other important molecular components at discrete cellular sites, and via that provide the cell control over function.Yet, as the condensate phase boundary is a rather permeable composition gradient in a liquid, molecular exchange between the phases occurs readily: the condensates are in constant equilibrium exchange of molecular components with their surroundings. 57][8] Examples of this are the reversible stress-induced condensates 9 or function inducing reversible grouping of signaling proteins in biological cells. 10LLPS has also been demonstrated to have a role in prion formation, i.e. controlling folding of biomacromolecules.
Similarly, LLPS may have a role in several neurodegenerative illnesses such as Parkinson's, Alzheimer's and ALS, see Ref. 11 for a recent review.
Further outlining the importance of the LLPS, in biotechnology especially in biological materials engineering, condensates have so far been confirmed to be key steps, e.g. for elastomer protein, 12 amyloid, 13 or silk-like protein construct based materials, 14,15 but also materials rising inspiration from, e.g.mussel feet adhesive proteins, [16][17][18] squid beak protein components, or nacre protein biomineralization. 19In polymer technology, the condensate phase is referred to as coacervate phase and it is a key assembly phase to, for example, the superior mechanical characteristics, [20][21][22] superhydrophobic, 23 or responsive materials 24 rising from hydrated polyelectrolyte assemblies, in particular layer-by-layer polyelectrolyte multilayers (PEMs) and polyelectrolyte complexes (PECs).Hydrated polyelectrolyte assemblies rising from LLPS have a wide set of technological applications in e.g.functional films, energy applications, drug delivery systems and adhesives. 18,24,25Controlling the coacervate phase via e.g.ionic strength, 26 temperature, 27 pH, 28 but also aqueous solvent, 29 are commonly used means in engineering the characteristics of PEM and PEC based synthetic polymer materials.
Due to this wide basic biology and technological relevance, liquid-liquid phase separation has attained significant interest in the scientific community.Fundamentally, the aqueous, multicomponent macromolecular solutions present in all the above-described biology and bioand polymer materials science systems phase separate because the lowest achievable total free energy state corresponds to partitioning the aqueous solution molecular composition unevenly, 30,31 i.e. to a macromolecule rich condensate phase and a macromolecule dilute carrier phase.Simplistically thinking, the entropy loss involved in demixing (phase separating) a uniformly mixed molecular solution needs to be countered by enthalpically favorable interactions, i.e. attraction between the condensing species.However, for proteins condensates it is worth noting that an especially prevalent class of proteins present in biomolecular condensates is intrinsically disordered proteins (IDPs). 4,32,33IDPs do not have a single specific folded structure, but instead have a large range of flexibility in their conformations, and may change their conformational phase space, or folding state upon surroundings, for example condensation.For example, intracellular phase separation and formation of membrane-less bodies in cells seems to involve proteins with significant conformational heterogeneity and intrinsically disordered regions [34][35][36] .Altogether, the prevalence of IDPs in protein condensates points toward the conformational flexibility and degree of disorder in the condensing macromolecule having a role in the condensate formation.
In addition to flexibility and disordered regions, condensation of proteins has previously been associated with attractive interactions between parts of the condensing proteins, see e.g.Ref. 1 .
Various binding domains, interactions between helical and β-sheet regions, π-π-interactions, π-cation, and electrostatics based attraction have been suggested to contribute to separation. 1 Especially segments of oppositely charged residues appear to enhance condensation in many cases, see e.g.8][39] .As expected, attraction rising from electrostatics is sensitive to pH and salt. 40The salt dependency rises mainly because of electrostatic screening and ion adsorption. 41wever, the salt response can also have complex concentration and salt species dependencies. 42,43Altogether, this points toward interactions with water and water solvation of the ions and macromolecules having a role in the protein condensate formation.Signaling molecules that have an important role in intracellular organization and phase separation contain repeats of weakly binding, associative domains. 34Additionally, sequences of low-complexity regions 44 and a small subset of aliphatic amino acids, especially alanine, glycine and serine, 45,46 have been associated with phase separation.
Condensation mechanism considerations for multidomain macromolecules, such as complex proteins, have so far focused on strong, specific interactions between regions of the macromolecules forming a network that holds the condensate structurally intact. 33More recently, the presence of multiple attractive regions (multivalency) has been reported to promote phase separation 4,47 and also the coexistence of interactions of different magnitudes (stronger and weaker interactions) appear to be condensation prerequisites for a protein system. 48Additionally, in some other protein systems, structural transitions promoting β-sheet formation seem to promote LLPS. 49Furthermore, condensate formation examination via various engineered protein systems composed of repeated folded domains connected by flexible linkers have revealed that the number of attractive regions (valency) and their affinity both promote assembly into larger structures, enable phase separation at lower concentrations, 50 and decrease dynamic rearrangements of molecules in the condensed droplets. 48eoretical considerations of condensation mechanisms in macromolecule solutions involve both classical polymer physics-based considerations, see e.g.Refs. 34,36,51for reviews, and traditional materials science statistical mechanics based phase separation theory approaches, see Ref. 31 for a review.The polymer physics approaches focus on solvation and mixing characteristics, as well as, the effect of associative interactions between polymer segments 52,53 to explain the liquid-liquid phase separation characteristics.Berry et al. 31 argued that the liquidliquid phase separation can in most cases be understood via equilibrium considerations of the materials mixtures.Linear proteins with multiple attractive interaction regions (multivalent proteins) can be considered as associative polymers 52,53 with specific intra and intermolecular "stickers" of non-covalent interactions between the domains. 54The stickers in these systems span a range of length scales, can vary in strength and also be directional in nature. 34Examples include localized, point-like charge interactions or hydrogen bonding but also larger scale hydration-mediated interactions including hydrophobic attraction between both individual protein segments but also larger units of secondary structure such as β-sheet or α-helical domains.The sticker approach has been relatively popular also in numerical studies.For example, Martin et al. 13 employed such sticker-and-spacer approach to explain the phase separation behavior of their prion protein system, Zeng et al. 55 a Gaussian cluster theory approach also based on considering the proteins to have sticker sites and Choi et al. 56 employed a lattice model based on the sticker concept to simulate phase transitions of protein systems.
The traditional computational means to model the phase separation is field theory type statistical mechanics-based approaches but as biomolecular condensate formation is often subject to localized interactions, more recently also particle-based simulation methods that cover microscale from atomistic detail to mesoscale have become popular. 57They can provide predictive frameworks and may aid in interpretation of experimental results by directly probing the underlying physics with molecular detail. 58Condensate formation in biological systems has been studied via the former approach in Refs. 59,602][63] .While the field-based studies resolve basic premises for LLPS, they suffer from lack of chemical specificity and the microstructural correlations, even at the level of secondary structure.On the other hand, particle-based simulations are limited by their computational cost which results to either limitations in the description resolution or describable system size and time scale as well as their inability to efficiently probe rare events, such as low-probability conformational rearrangements and nucleation. 57So far, the particle-based simulation were mainly used to correlate single-molecule properties with phase behavior [62][63][64] .Atomistic detail molecular dynamics to resolve structural features and adhesive sites in condensate forming proteins and their subunits have been reported. 65Coarse-grained molecular modelling has resolved, e.g., phase transition temperature dependency as the function of protein system properties. 62Despite efforts in understanding the condensate microstructure, e.g., formed in protein-RNA systems, 66,67 so far, no studies have resolved the formation and the assembly mechanism of silk-like protein condensate at microstructural level.This might indicate that the condensate microstructure is system-specific and also that for many LLPS systems, no specific microstructure emerges.
In our previous experimental work on a model triblock protein construct CBM-AQ12-CBM exhibiting condensate formation 14,15,17 , the LLPS enabled pulling of long, resilient protein fibers out of it, 14 and the condensate turned out to be an efficient adhesive 17 .Motivated by this, we examine here the CBM-AQ12-CBM condensate via large scale molecular modelling, combined with analytical ultracentrifuge (AUC) characterization of the solvated proteins.The role of AUC is to verify the range of protein configurations and folding shapes rising from the molecular simulations of the partially disordered protein.Previously reported cryo-SEM imaging of the condensate 14 provides a direct comparison and verification means to the structures observed in modelling.
We chose this model triblock construct because we already have significant understanding of the condensate macroscopic properties and the conditions over which LLPS occurs.Furthermore, the protein construct is modular, contains intrinsically disordered regions, yet as a model system is simple enough to potentially reveal the condensate formation mechanism and its control means.Additional interest to this particular protein construct choice rises because its condensate has been reported to have a bicontinuous network structure. 14Hence, examining this particular model protein system could resolve the molecular features giving rise to the bicontinuous network structure, as well as, its potential significance in condensate system properties.Altogether, the performed study is the first that we are aware in which large scale molecular simulations are combined with such high precision experimental characterization methods (both AUC and cryo-SEM) that the modelling can be directly compared and verified against the experimental data.This enables extracting from the modelling data the condensate molecular level structure, its control means and formation mechanism while the overall assembly matches experimental microscopy images.

Computational methods
The Gromacs 5.1.4package 68,69 with the Amber03ws force field 70 was used for all-atom molecular dynamics (MD) modeling of the silk-like proteins.For water, in compliance with the force-field choice, the explicit TIP4P/2005 model was employed. 71The simulations model the triblock architecture of the CBM-AQ12-CBM, i.e. a repetitive mid-block terminated by CBMs at each end, but instead of the twelve AQ repeats in the experimentally characterized protein, the simulated protein CBM-AQ3-CBM consists of three AQ repeats, see Figure 1.The CBM-AQ3-CMB is used in the modelling because it captures the end unit interactions, effect of linkers, and has a sufficiently long repeat of AQ units to obtain both flexibility and intra-as well as intermolecular contact sites, following the longer repeat unit; the shorter repeat enhances dynamics and improves statistics in the simulation sampling.
Protein bonds were constrained by the LINCS 72 and water by the SETTLE 73 algorithms.The equations of motion were integrated using the leap-frog integration scheme and a 2 fs time step.
Configurations were saved for analysis every 10 ps.The protein solutes were coupled to the heat bath separate from the solvent and ions.Temperature was maintained at T = 298 K using the Bussi et al. canonical V-rescale thermostat 74 and pressure at 1 bar by the isotropic Parrinello-Rahman barostat. 75The coupling constants were  = 0.1 ps and  p = 2 ps, respectively.Long-range electrostatic interactions were modelled by the PME method. 76Van der Waals interactions were described using the Lennard-Jones potential with a 1.0 nm cut-off.
Long-range dispersion corrections were applied.Periodic boundary conditions were applied in all directions.All molecular visualizations employ the VMD software package. 77e CBMs forming the terminal domains have the crystal structure of Ref. 78

(Protein Data
Bank ID code 4JO5) as their initial configuration.Initial configurations for the linkers, AQ units, and H6-tag were generated using the Avogadro software 79 with initial secondary structures following Ref. 80The protein parts were merged using VMD 77 and pdb2gmx tool of Gromacs.To reduce bias resulting from the initial configuration in interactions between different protein regions of the flexible AQ repeat part, the configurations were relatively extended, see Fig. 1.The protonation states of the residues were set to match pH = 7 resulting in a total charge of -5e per each CBM-AQ3-CBM molecule.In all molecular dynamics runs, a corresponding number of Na + counter ions to make the simulation systems charge neutral were introduced into the solvent.
Two different types of CBM-AQ3-CBM protein simulations were performed: single protein and 200 mg/ml concentrated solution multiprotein simulations containing either 10 or 28 CBM-AQ3-CBM constructs.For single-protein simulations seven repeat runs were performed.For the concentrated solution, containing 10 proteins, six different initial configurations with varying composition (see Table S1), while for the biggest system, containing 28 proteins, there was a single initial configuration.The single protein simulations targeted at resolving structure and solution characteristics information of the protein construct, and variation of these as the protein contains intrinsically disordered flexible parts, while the concentrated multiprotein simulation represents the conditions at which CBM-AQ12-CBM is known to form a condensate phase in experiments. 14For the single protein simulations, the single CBM-AQ3-CBM proteins were first solvated by water in a 29 × 11.5 × 9.5 nm 3 box, periodic boundary conditions in all directions.The AQ repeat part of the protein adopted significantly less extended conformations during the first 10 ns of nanoseconds of simulation: at 100 ns, the protein and a 2 nm thick water shell around it were extracted and solvated again in a 16 × 16 × 16 nm 3 box as this captured more efficiently the protein shape.The NPT run was continued for 100 ns to a total of 200 ns simulation time.The concentrated, multiprotein solutions were constructed from 10 or 28 final protein conformations of the single protein simulations set in random orientation and placement into a simulation box.In these multiprotein simulations, the simulation box size for each system corresponded to 200 mg/ml protein concentration and was 16.5 × 16.5 × 16.5 nm 3 (for 10 proteins) or 22.6 × 22.6 × 22.6 nm 3 (for 28 proteins).The multiprotein simulations were run for 300 ns.

Experimental methods
Cloning, expression and purification of the CBM-AQ12-CBM was carried out according to our earlier study. 14Determination of the anisotropy (deviation from spherical shape) of the protein was conducted by a Beckman Coulter Optima Analytical Ultracentrifuge.
Sedimentation velocity experiment were performed at 50000 rpm and at 20 o C. As a buffer we used 0.1 M of NaCl, concentration of the CBM-AQ12-CBM, commonly referred to also as CBM-eADF3-CBM, 14 protein was 0.5 mg/ml.Sample was measured by UV/Vis absorbance optics at 280 nm.Data analysis were performed by Ultrascan III version 4.0 revision 2528 (http://www.ultrascan.aucsolutions.com).Partial specific volume was determined from the sequence and is 0.6994.Noise reduction was conducted by two dimensional spectrum analysis. 81We used Genetic Algorithm for further refinement and regularization. 82More detailed information about the experiment setups and data treatment procedures is available in Ref. 83 .In presented data values, the error estimate corresponds to a 95% confidence interval.
The frictional ratio of protein constructs were estimated using the UltraScan Solution Modeler (US-SOMO). 84

RESULTS AND DISCUSSION
The molecular simulations of single CBM-AQ3-CBM constructs in water (200 ns) indicate the protein adopts a significantly compacted shape in water solution.Figure 2 presents a detailed analysis corresponding to one of the examined seven different single CBM-AQ3-CBM initial configurations, while the data corresponding to the rest of the single protein simulations are provided in SI (Figure S1).A summary of the findings about the conformations adopted by the protein construct is that it has five stable α-helical Ala-rich regions located one in each linker and AQ unit, see Fig. 2(a-b), as well as, a metastable Gly-Gly β-sheet region, see Fig. 2(c).
Stable α-helical regions observed in MD simulations are in line with the high propensity of Ala-rich regions to form α-helices that has been reported in literature 85 and confirmed in our previous Fourier transform infrared spectroscopy 14 and circular dichroism 86 experiments for CBM-AQ12-CBM protein.Additionally, the conformations show intramolecular α-helix interactions and α-helix bundling, see Fig. 2(b).The structure of CBM is stable in the simulations.The residue contact map corresponding to the sample configuration in Figure 2(a-c) is presented as Figure 2(d).The contact map reveals the α-helix bundling as strips directly adjacent to the diagonal line, but the most dominant features are the two large crisscross patterns at the ends of the diagonal of the graph.These are characteristic for parallel and antiparallel β-sheets at the terminal CBM segments.
Analysis of the rest of the single CBM-AQ3-CBM trajectories starting from different initial configurations presents some variation in the degree of compactness of structure and AQ3 region packing conformations but the general structural features described above persist.
Additionally, due to the multitude of conformations to which the β-sheet and α-helical regions can pack such that the hydrophobic segments are shielded from water, the chain extensions show variation and conformational flexibility, as expected for an intrinsically disordered protein.Detailed analysis of the simulations with different initial configurations, their secondary structure evolution, the corresponding residue contact maps, and time evolution of the radius of gyration of the structures showing the flexibility are provided in the Supporting Information (Figs.S1 and S2).

Fig. 2. Sample conformation of a single silk-inspired protein CBM-AQ3-CBM after 200 ns simulation and its residue contact map. Subfigure (a) shows the protein, (b) and (c) magnifications of α-helical and β-sheet features in the structure and their residue compositions, and (d) the map of smallest distance between residue pairs of the entire protein construct.
Let us next compare the single protein conformations present in our simulations (CBM-AQ3-CBM) against the shape of CBM-AQ12-CBM proteins in dilute solution assessed by our AUC experiments.The AUC data analysis gives a value of 1.81± 0.05 for the frictional ratio that characterizes the shape of the individual molecules (data provided in SI, Figure S3).The simulations frictional ratio of the CBM-AQ3-CBM constructs, as determined by US-SOMO, 84 is calculated as an average over the seven different final configurations (see SI).These vary between 1.33 and 1.6 in their frictional ratio, with a mean of 1.45 ± 0.05.For a comparison, we determined the frictional ratio also for sample extended constructs of CBM-AQ3-CBM (see Fig. 1) and CBM-AQ5-CBM.For the sample extended constructs, the frictional ratio was 2.0 and 2.8, respectively.If one calculates a corresponding estimate for an extended CMB-AQ12-CBM protein, the expected value of the frictional ratio is around 5-6. Notably, both the AUC data derived value and the value calculated from the simulations correspond to significantly compacted protein structures.Altogether this means that the examined protein constructs adopt significantly compacted structures in water solution; the difference between experimentally and computationally determined frictional ratios can be directly explained by the higher number of repeating units in the experimental construct.We conclude the compacted conformations adopted by the single proteins represents a similar level of compactness as indicated by the AUC data.Hence, using the final configurations of the single protein simulations to construct the multiprotein initial configurations is a warranted choice.

Concentrated protein solution
Next, the concentrated protein solution was modelled by using the conformations rising from the single protein simulations in random order and orientation as initial conformations of the multiprotein setups.Systems of both 10 and 28 proteins at the same 200 mg/ml concentration were modelled for 300 ns.Notably, the time scale is sufficient for local relaxation, structural changes of the proteins locally, and changing local order but not for full protein chain diffusion.
In all multiprotein simulations corresponding to this elevated protein concentration the proteins reorganize to a non-uniform, sponge-like network arrangement.In this, the individual molecules form a sponge-like frame with the β-sheet rich CBM end units acting as main binding stickers.Examples of the network are presented in Figure 3a   As visible in Figures 3 and 4a, the network appears to be carried by compacted regions of the β-sheet rich CBMs and bundles of α-helices, both intramolecular and intermolecular.Figure 3 presents a structural analysis of the network via a contact map and identification of typical binding patterns.Similar to individual proteins (Fig. 2), each pattern in the contact map of Figure 3 can be associated with corresponding intermolecular connection between proteins: the crisscrossing patterns of the β-sheet rich CBMs dominate but also α-helix -β-sheet connections between the proteins exist.The magnifications in Fig. 3b show examples of these two characteristic connection patterns in this system and the involved protein segments.In both types of connections, attractive interactions between the molecules rise mainly from Ala, Ser, and Gly amino acids, see Fig. 3(c).Furthermore, we assessed the fluctuations of the amino acid pairwise contacts: although difficult to measure quantitatively, the amino acid pairing fluctuations measure a standard deviation of approximately 10%.This means, the pairings are flexible and evolve in time.
Notably, periodic boundary conditions may enhance or even cause the formation of the observed networks and their structural specifics.Both the 10 protein and the significantly larger 28 protein system result systematically in repeat runs with different initial configurations in very similar networks pointing toward the network structure rising from the protein.Contact maps for all the different simulation runs and initial configurations are included in the SI (Figure S4).In our earlier work, we examined the liquid-liquid phase separation and the condensate structure formation was examined also by Cryo-SEM imaging. 14Interestingly, Cryo-SEM showed the formation of a bicontinuous network that has a sponge-like structure, see Fig. 4b.
Figure 4c presents the pore size characterization while 4a presents the simulation predicted network.Perusal of especially the extended, 28 protein system reveals that the network-like structure observed in simulations bears a remarkable resemblance to the Cryo-SEM imaging of the condensate samples, see Figure 4. To gain understanding to the connection of the experimental microscopy images and the structure adopted by the proteins in the simulations, and via that, the fundamentals of liquid-liquid phase separation and its relation to the formation of protein fibers, we analyzed both the Cryo-SEM images and the simulations for the pore size distribution.For Cryo-SEM images, the average pore diameter is 47.7 ± 0.5 nm (determined by ImageJ software 88 ) which is of the same order of magnitude as the size of a single molecule of the protein CBM-AQ12-CMB.In the simulations, even the 28 protein the system is so limited in size that a full assessment of the pore size distribution is unfeasible.However, the pores rising in this system are typically between 10 and 15 nm in diameter, measured using the VMD software. 77The difference can be directly explained by the shorter AQ repeat between the end domains in the simulations, as the network in the simulations configuration is dominantly carried by the end domain connections.Furthermore, the pore size observed in simulations is also limited by the size of the simulation box.In total, the structure obtained in the simulations is consistent with the Cryo-SEM imaging and provides a molecular level explanation of the formed structure.
As the modelled network seems to be consistent with the Cryo-SEM imaged structure of the condensate, let us next discuss the significance of the simulations findings for understanding the condensate.First, the dominance of CBM connections in the network formed in the simulations suggests that the initial assembly in the protein solution could be driven by attractive interactions between the terminal groups.This is in line with experimental, AUC based observations of isolated CBMs showing dimerization with a weak dissociation constant KD of 90 ± 30 M even at low concentration in aqueous solutions. 89Furthermore, the role of terminal domain for condensate formation of AQ12 protein construct is demonstrated by the concentration threshold needed for the phase separation increasing significantly, when SpyCatcher was used as a terminal domain (instead of CBM) in SpyCatcher-AQ12-SpyCatcher proteins as opposed to CBM-AQ12-CBM proteins. 14Although we are not aware of KD values corresponding to SpyCatcher, the finding shows the end unit is important in condensate formation.
In the simulational network, CBM-CBM connections carry the main load in supporting the network structure while α-helix bundles have a lesser role.It is worth noting that the experiments employ a protein construct with 12 AQ units so the relative concentration ratio of α-helices from AQ units to CBMs is significantly higher than in the simulations.This might result in a higher number of intermolecular α-helix-α-helix connections and larger intramolecular α-helix bundles 14 than what can be observed in the simulations employing proteins with three AQ repeats.This also decreases the CBM concentration in comparison to the AQ repeat unit part in general; a longer AQ repeat unit leads to a larger conformational variety.
Attractive interactions between the terminal domains can be expected to immobilize the proteins.Indeed, for dilute solution (single protein simulations) the mean diffusion coefficient calculated from simulations was 2.3 ± 1.3 ×10 -7 cm 2 s -1 and for the concentrated solution, 0.4 ± 0.3 ×10 -7 cm 2 s -1 , see Tables S2 and S3.The diffusion coefficients, determined using the FRAP method for CBM-AQ12-CBM, in the dilute phase surrounding the condensate are 1.18 ± 0.08 ×10 -7 cm 2 s -1 and in the condensate phase, 4.5 ± 0.5 ×10 -9 cm 2 s -1 . 14This significant 400-fold decrease in the experimentally measured diffusion coefficient in the condensate is related with increased intermolecular interactions between proteins, such as attraction between the terminal domains.Notably, the diffusion coefficients calculated from MD simulations are larger than those in the FRAP measurements, because in the simulations a shorter protein construct is used.Also, the simulation timescale is very likely insufficient to fully capture the translational motion of the protein in the concentrated solution.Nevertheless, the values calculated from MD simulations are in line with the FRAP experiments results, and both indicate clearly the presence of intermolecular attractions in the condensate.
The bicontinuous network observed here in the protein condensate provides an explanation for the exceptionally low surface energy, i.e., 1.5 ± 0.5 µN/m, observed for the condensate droplets in previous study. 14In particular, as the network rises from individually relatively weak, multiple level connections, the proteins readily reorganize, enter and exit the condensate resulting in a low surface energy.Additionally, a recent study 90 points toward bicontinuous networks in which the interactions between the water and polyelectrolyte/protein are weak, and thus the water mobility is similar to bulk water, having a very low interfacial tension between the phases.To assess our system against these prior reports, we also determined the diffusion coefficients for water molecules in our protein-rich simulation systems and compared them with the diffusion coefficient calculated from bulk water simulations, using the same water model.The water diffusion coefficient in the highly concentrated protein solution here proved to be only 20% lower than the bulk water diffusion coefficient.The small decrease in the diffusion coefficient results rather from steric effects in the network than strong interactions between water and proteins.Furthermore, in the simulations, the pore size is underestimated in comparison to the experimental system due to using a shorter protein construct in the simulations: for larger pores, the water diffusion coefficient can be expected to increase as larger water droplets approach bulk water in character.[93] Discussion about the liquid-liquid condensate structure and its implications for protein condensates The results indicate that the end units of the protein construct act in the condensed protein phase as mechanical load-bearing, stronger connections that carry the main responsibility of the stability and cohesion of the condensate.A secondary contribution to the condensate structural cohesion rises from the α-helical and transient β-sheet structures that form a variety of weaker connections.Altogether, this leads to the LLPS having multiple-level hierarchical network structure where the basic characteristics of the condensate, such as low surface energy, fast in/out diffusion of the macromolecules and in general, molecular permeability, rise from coexistence of connections with varying magnitudes of load-bearing strengths.
Notably, the mid-part of the protein has compacted shape also in the concentrated protein solution.The compacted shape rises from a high degree of α-helical bundling between the different AQ units.This folding can be considered to represent randomly folded, disorganized mechanical springs, i.e. provides flexibility and resilience to the condensate, and decreases condensate viscosity.Furthermore, even small mechanical forces will displace these weaker connections and reorganize the α-helix bundling in the network.This intra and intermolecular binding structure with sticker interactions of different magnitudes actually provides a mechanistic explanation for the pulling response of silk-inspired fibers from the condensate, as demonstrated in Ref. 14 During pulling fibers from the condensate, the stronger end unit connections carry most of the mechanical load while the resilient, spring-like weaker bonding gives way and reorganizes.For the compacted, perhaps spring-like, middle part, this signifies unfolding to a more extended configuration in the fiber pulling. 87multaneously, a transition of the α-helical regions to β-sheets, characteristic for bioelastomeric fibers, 94 has been recently reported for this system. 14r simulation results here show also that any formed β-sheet regions are likely to form attractive pairs with each other, or with the remaining α-helical regions.The formation of a fiber with aligned, extended proteins can thus be envisioned with newly formed intermolecular connections stabilizing the fiber.Yet, as these connections are relatively weak, and able to form and reform, the fiber remains in fluid state while hydrated, where water acts as molecular lubricant and gives space for relaxation of the macromolecules, see e.g.Refs. 91,95This also explains previously reported seamless fusing of the protein fibers with other similar fibers. 14

CONCLUSIONS
Here, we demonstrated via molecular simulations supported with AUC experiments and our earlier 14 high precision cryo-SEM microscopy work findings the detailed structure of a bicontinouous protein network formed under liquid-liquid phase separation in a recombined triblock protein solution.The results show the bicontinuous, sponge-like structure of the condensate phase rises from attraction between the end units of the protein while secondary contributions and compaction of the proteins in the condensate phase rise from smaller attractive regions, stickers, in the flexible, multiconfigurational middle part; these weaker middle part interactions easily break and re-form again which provides a high flexibility to the network but also indicate a mechanism for protein chain alignment under directed mechanical strain: indeed, the liquid-liquid phase separation in the same triblock protein solution has earlier been demonstrated to be a prerequisite for pulling silk-like fibers. 14 the observed behavior rises from attraction between end units separated by a flexible chain combined with lesser, relatively localized attraction sites in the flexible part, the observations are very likely general for macromolecules with similar architecture, including polymers, proteins and peptide constructs.In total, the findings align well with associative polymer models 52,53 and sticker-site considerations in them.Notably, we emphasize that a hierarchical set of interactions, i.e. weaker and stronger attractive stickers, linked by flexible regions are needed to produce the liquid-liquid phase separation, the bicontinuous network, the low surface tension in it, the diffusion responses, and altogether the response experimentally characterized for this protein system.The findings have significance in understanding cellular function 1,4 but also building artificial cells, 96 and engineering advanced biomaterials solutions. 14,17,18reover, the extracted features, via connection with classical associative polymer theories, provide a handle to design and engineering of liquid-liquid phase separating polymer materials for synthetic fiber, tissue engineering, sensor, energy, and drug carrier applications.S1 lists the number of repeat runs corresponding to each simulation system.
Diffusion coefficients D were calculated using the mean square displacement (MSD) of the protein center of mass.The linear region on the MSD plot was approximated by a straight line according to the Einstein relation.We note that the simulation duration influences the absolute value of D and the point in presenting the values is a qualitative comparison.

Fig. 1 .
Fig. 1.The silk-inspired protein construct CBM-AQ3-CBM used in the molecular dynamics simulations and its amino acid sequence.The visualization shows a sample initial configuration used in the simulations.The initial configurations are extended in comparison to the resulting final simulation configurations for better configurational sampling.
Figure 4a (28 protein system).The simulations corresponding to the different initial

Fig. 3 .
Fig. 3. a) A sample protein condensate formed in the molecular dynamics simulations of concentrated, 200 mg/ml protein solution (10 CBM-AQ3-CBM proteins in the simulation box).b) The contact map of the corresponding system.The CBMs are highlighted in yellow.The snapshots on the right show the contacts between proteins corresponding to the different patterns and the amino acids involved.The running residue number r = (n-1)R + r i , where n is the molecule index (protein count index), r i the residue number within a protein, and R = 493 the total number of residues within a single protein.(c) Interprotein and (d) intraprotein amino acid-specific pair counts averaged over the 6 different initial configurations.

Fig. 4 .
Fig. 4. (a) The protein condensate formed in the molecular dynamics simulations of concentrated 200mg/ml protein solution, with 28 CBM-AQ3-CBM proteins in the 22.6×22.6×22.6 nm 3 simulation box.(b) A high-magnification SEM image of the internal structure of a condensate droplet shows details of an internal bicontinuous network (scale bar 200 nm), reprinted with permission (CC-BY 4.0 License, http://creativecommons.org/licenses/by/4.0/)from Ref. 14 (c) Pore size distribution of Final conformations and conformation time evolution for the different initial configurations of CBM-AQ3-CBM in the 200 ns molecular dynamics simulations are presented as Fig. S1 Fig. S2.presents the corresponding radius of gyration time evolution.Frictional ratio distribution of the CMB-AQ12-CBM proteins as determined from the analytical ultracentrifuge (AUC) data is available as Fig. S3.In Fig. S4, the final 200 ns molecular dynamics simulation conformations resulting from the different initial configurations of concentrated CBM-AQ3-CBM solution (200 mg/ml) and their contact maps are presented.Finally, Table