[discontinued] Cartenix: Run Nix inside Cartesi

slavni96 · June 5, 2023, 9:28pm

New POC Proposal

Cartenix: a proof of concept that lets you run Nix inside the Cartesi virtual machine, leveraging its determinism to achieve complete reproducibility.

Core Concept and purpose statement

Nix is a set of software that takes a unique approach to package management and system configuration.

Nix builds packages in sandboxes and requires you to declare all the dependencies (both sources and other packages built with Nix). This ensures a high degree of reproducibility and hermeticity (i.e. if a package works on one machine, it will also work on another). It also allows different versions of the same package, with different dependencies, to be installed on the same machine, effectively solving the “dependency hell” problem.

Thus we can say that Nix provides an infrastructure to reproducibly build software.

Unfortunately, since everything would otherwise have to be built from scratch, Nix relies on big centralized caches (substituters/binary caches in the Nix terminology) to speed things up. These caches are continuously populated by a set of trusted builders.

Cartesi would remove this need for trust in centralized builders.

We developed an early POC at ETHGlobal Lisbon winning the “Cartesi Best Use” award. The project is already listed in the ETHGlobal showcase.

Technical clarification about the early POC: during the Hackathon, we successfully executed Nix primitive operations able to produce outputs in the Nix store but not building derivations (though we were close).

The goal is therefore to expand the concepts and experience gathered so far to move forward with the development of a POC that actually succeeds in enabling a generic reproducible build system in a deterministic environment, generalizing its architecture without resorting to time-imposed shortcuts.

How will you use Cartesi, specifically?

In our proof-of-concept project, Cartenix, we will be using Cartesi specifically to enable reproducible builds using Nix. By leveraging the Cartesi virtual machine, we aim to remove the need for trust in centralized builders, which is a limitation of the traditional Nix build process.

Here’s an overview of how we will use Cartesi in our POC:

Architecture Overview: We will utilize the Cartesi VM and integrate it with the Nix package system. The Cartesi VM provides a deterministic and secure environment for executing the Nix builds.
Nix Package System Integration: We will create a Docker image containing a RISC-V Nix installation and a set of already-built dependencies; this container will run in the Cartesi VM environment. We’ll try to shrink this set of dependencies until it contains only sources (which cannot be downloaded from inside the Cartesi VM).
Reproducible Builds: By running the Nix builds inside the Cartesi VM, which is deterministic, we’ll achieve complete reproducibility.
Eliminating Trust in Caches: With Cartesi, we will remove the reliance on centralized caches (substituters/binary caches) in the Nix package system. This enhances security and eliminates potential vulnerabilities associated with trusting external caches.

By combining Cartesi and Nix, we address one of the big problems of any distributed build infrastructure by leveraging the trustlessness of the blockchain.

For a more detailed understanding of our approach and the technical aspects of the project, you can refer to the “Core Concept and Purpose Statement” and “Technical Details” sections.

Technical details

Rudiments: Besides being a build infrastructure, Nix is also a purely functional, lazy DSL that lets you easily define derivations. Nix derivations are the “build recipes” that, when given to the Nix daemon, are built into their outputs (usually packages, but in general just directories containing files). During evaluation of the code, Nix implicitly creates the dependency graph and the build plan.

Derivation outputs are hashed using several attributes, among which are the name, the architecture, and the hashes of their dependencies. When traversing the dependency graph of a derivation output you can find some special fixed-output derivations (FODs), which are simply derivations that need internet access and therefore cannot be built inside the network-isolated sandbox. To ensure reproducibility, Nix requires you to declare the content hash of the output of these FODs.

It’s important to note this duality in the hashing: normal derivations (which are built inside sandboxes) are input-addressed, while fixed-output derivations (which are fetched) are content-addressed (there is also an experimental feature, behind a flag, that enables content addressing for all derivations).
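The difference between the two addressing schemes can be illustrated with a minimal sketch. This is not Nix's actual hashing algorithm; the function names, the fields that go into the hash, and the sample data are all assumptions chosen only to show the idea.

```python
import hashlib
from typing import List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def input_address(name: str, system: str, builder: str, dep_addresses: List[str]) -> str:
    """Address computed from the recipe (the inputs), before anything is built."""
    recipe = "\n".join([name, system, builder, *sorted(dep_addresses)])
    return sha256_hex(recipe.encode())


def content_address(output: bytes) -> str:
    """Address computed from the bytes of the output itself."""
    return sha256_hex(output)


def check_fixed_output(fetched: bytes, declared_hash: str) -> bool:
    """A fetched source is accepted only if it matches the declared content hash."""
    return content_address(fetched) == declared_hash


# Hypothetical source tarball standing in for a fetched fixed-output derivation.
source_tarball = b"pretend this is a source tarball"
declared = content_address(source_tarball)

# Same recipe, same address; changing any input (here the architecture) changes it.
addr_a = input_address("hello", "riscv64-linux", "make install", [declared])
addr_b = input_address("hello", "riscv64-linux", "make install", [declared])
addr_c = input_address("hello", "x86_64-linux", "make install", [declared])

print(addr_a == addr_b, addr_a == addr_c)
print(check_fixed_output(b"tampered tarball", declared))
```

Note that an input address says nothing about the bytes a build actually produces, which is why a deterministic build environment matters for normal derivations.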

More information about the Nix model is available in the creator’s Ph.D. thesis

As the Cartesi VM does not have internet access, all the (transitive) dependencies (which usually are simply sources) must be provided within the VM. This means that, when we create the Docker image, we have to know in advance all the fixed-output derivations that will be needed by the build. For this reason, we’ll create a different Docker image for every derivation that is going to be built. However, this image creation process will be automated.
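Knowing the fixed-output derivations in advance amounts to walking the dependency graph of the target. A minimal sketch follows, assuming a toy in-memory graph; the `Derivation` class, the field names, and the example packages are hypothetical and do not reflect Nix's real data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Derivation:
    name: str
    # Declared content hash; set only for fixed-output derivations (assumption).
    fixed_output_hash: Optional[str] = None
    deps: List[str] = field(default_factory=list)


def fixed_outputs_in_closure(graph: Dict[str, Derivation], target: str) -> List[str]:
    """Return the names of all FODs reachable from `target`, sorted."""
    seen = set()
    fods = []
    stack = [target]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        drv = graph[name]
        if drv.fixed_output_hash is not None:
            fods.append(name)
        stack.extend(drv.deps)
    return sorted(fods)


# Toy graph with made-up hashes: hello -> glibc -> gcc, each with its own sources.
graph = {
    "hello": Derivation("hello", deps=["hello-src", "glibc"]),
    "glibc": Derivation("glibc", deps=["glibc-src", "gcc"]),
    "gcc": Derivation("gcc", deps=["gcc-src"]),
    "hello-src": Derivation("hello-src", fixed_output_hash="sha256-aaa"),
    "glibc-src": Derivation("glibc-src", fixed_output_hash="sha256-bbb"),
    "gcc-src": Derivation("gcc-src", fixed_output_hash="sha256-ccc"),
}

print(fixed_outputs_in_closure(graph, "hello"))
```

The resulting list is what would have to be fetched outside the VM and placed into the image for that target.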

Here is a high-level overview of the technical steps involved in the POC development proposal:

Generalize the build and installation process of Nix for the RISC-V architecture to make it more adaptable
Ensure that Nix is properly installed in a Docker image based on cartesi/python:3.10-slim-jammy (e.g. avoid hard coded paths)
Update the Dockerfile to programmatically collect all the fixed-output derivations in the build closure of the target derivation output
Move away from the early POC’s reliance on the simple primitive builtins.toFile that we adopted to build a dependency-free package.
Update the nix.py or generalize the quick start example to enhance the interaction and allow retrieving useful information about the build process (e.g. output’s content hash)

The final goal is to provide, given a Nix derivation, a programmatic way to create a Docker image able to build that derivation inside the Cartesi VM.
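As a rough illustration of what "programmatic" image creation could look like, the sketch below renders a Dockerfile from a list of pre-fetched source files. Only the base image name comes from the proposal; the `render_dockerfile` helper, the `/opt/cartenix/sources` directory, and the file names are assumptions, and a real image would also need the Nix installation steps.

```python
from typing import List

BASE_IMAGE = "cartesi/python:3.10-slim-jammy"  # base image named in the proposal


def render_dockerfile(fod_files: List[str], sources_dir: str = "/opt/cartenix/sources") -> str:
    """Render a Dockerfile that copies pre-fetched sources into the image.

    `sources_dir` is a hypothetical location, not one used by the actual project.
    """
    lines = [f"FROM {BASE_IMAGE}", f"RUN mkdir -p {sources_dir}"]
    for path in sorted(fod_files):
        lines.append(f"COPY {path} {sources_dir}/")
    return "\n".join(lines) + "\n"


# Hypothetical file names for sources fetched on the host before the image build.
dockerfile = render_dockerfile(["sources/hello.tar.gz", "sources/glibc.tar.xz"])
print(dockerfile)
```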

Value Proposition

This project is a valuable addition to the Cartesi developer ecosystem and core technology, as it tackles the challenges of reproducible builds using the Nix package system within the Cartesi VM. The successful execution of this POC offers several compelling reasons for the community and other developers to take an interest:

Enhancing Reproducibility: Cartenix leverages the deterministic nature of the Cartesi VM and seamlessly integrates it with the Nix package system, enabling deterministic and fully reproducible builds.
Eliminating Trust in Centralized Caches: The POC aims to remove the reliance on centralized caches (substituters) in the Nix package system, enhancing security and reducing vulnerabilities associated with external caches.
Fostering Collaboration and Innovation: The successful execution of Cartenix paves the way for collaboration between the Cartesi and Nix communities. It encourages the exploration of novel ideas and solutions that leverage the strengths of both technologies, fostering innovation and knowledge sharing among developers.

Subsequent Vision and Extensibility

To extend the functionality of the POC, the following immediate next steps and features could be implemented:

CLI Integration: Introducing a command-line interface (CLI) via sunodo, a command runner for Cartesi, would enhance the user experience and allow developers to interact with Nix and Cartesi more efficiently.
Shared Computation: Enabling shared computation among multiple Cartesi VM instances would enhance resource utilization by reusing already built trusted artifacts. Developers could leverage distributed computing capabilities to accelerate the build process, reduce latency, and improve overall efficiency.
IPFS/Shared Cache for Dependencies: Integrating IPFS or a shared cache mechanism for storing and retrieving dependencies would optimize the build process. This would reduce the reliance on individual developers to provide all the necessary dependencies, improving efficiency and enabling easier collaboration among developers working on similar projects.
Nix for Docker Images: Using Nix to build Docker images for the Cartesi VM would further enhance the reproducibility and portability.
Recursive Cartenix: Build the above-mentioned Docker images using Cartenix itself.

Risks and Contingency Plans

This project may face some potential challenges. Here are some examples and proposed contingency plans:

Technical Limitations: The POC’s success relies on the effective integration of the Nix Package system with the Cartesi VM. If technical limitations or compatibility issues arise during the implementation, it may hinder the progress of the project. Those limitations could also be related to resource availability, such as RAM and CPU within the Cartesi VM.
Dependencies and Complex Applications: As the project aims to generalize the build process and handle complex software dependencies, there may be challenges in managing and integrating a wide range of dependencies (e.g. Nixpkgs support for RISC-V is suboptimal).

To mitigate those risks, the development team will actively engage with the Cartesi community and seek support from the Cartesi team to address any technical challenges that arise.

Success criteria & Deliverables

The success criteria for the Cartenix POC can be defined as follows:

Generalized Build and Installation Process: The POC will be considered successful if the build and installation process of Nix for the riscv64 architecture is generalized and made adaptable.
Dependency Source Provisioning: Success will be achieved if the POC effectively adapts the step of providing dependency sources. This should ensure that all the fixed output derivations and necessary dependencies are accurately fetched and included within the Cartesi VM during the image build.
Generalized Software and Dependencies: The success criteria involve successfully generalizing the software to be built and its dependencies. The POC should demonstrate the ability to build a non-trivial application with complex dependencies.

Deliverables

The entire codebase under a permissive Apache-2.0 open source license
A readme file that provides an overview of the project and comprehensive documentation
Sample Builds: demonstrations of successfully built software using the generalized process

Estimated Duration and Funds Requested

Duration: 4 weeks

Funds request for the POC: 5000 USD

About Your Development Team

The development team for the Cartenix project consists of two people:

Andrea Ciceri

Role: DevOps Engineer and Nix developer at MLabs

Ivan Sala

Role: CTO at Beatdata, Agile & Blockchain enthusiast

CC. @aciceri

ERC-20 Payee address

0xAF0DE9616735ED16E25F0C08829d5E37068dd375

joe-cartesi · June 7, 2023, 4:56am

I see a good alignment between the project POC and the objectives of the Cartesi Community Grants Program. It’s great that the project aims to create a public good by using Cartesi technology to eliminate trust in centralized caches (binary/substituters) and leverage determinism for reproducible builds. The focus on reusable infrastructure falls in line with the program’s goals. It seems that the technology developed could be widely adopted by applications that require reproducible builds in a decentralized environment.

I also think the project is well-motivated and has the potential to significantly impact how package management systems operate in a distributed environment.

A couple of questions for you:

I’m not an expert in Nix, but I took a look. How will you make sure the Docker image that’s created for each package isn’t too big and slow? Considering each container needs to hold all the necessary ingredients (the ‘fixed output derivations’) to build the package.
Considering the current limitations of the Cartesi VM, such as limited memory, how does the project intend to handle complex builds that might require more resources?

And what a great writeup, by the way. Thanks for embracing our new format as we move toward a leaner approach to BUIDLING

aciceri · June 9, 2023, 1:28am

Hi, glad to see that we share the same ideas regarding the advantages of this synergy.

About your questions, we think that to be properly dealt with, they should be addressed with respect to the Subsequent Vision and Extensibility of the project in the next milestone.
However:

How will you make sure the Docker image that’s created for each package isn’t too big and slow? Considering each container needs to hold all the necessary ingredients (the ‘fixed output derivations’) to build the package.

At the beginning we’ll start with a derivation whose FODs in the closure are not that heavy altogether. Probably about 90% of the derivations in nixpkgs require glibc, which in turn requires gcc, so this would mean having the gcc sources injected into every Docker image. If this alone makes things too slow, we could always “cheat” by also injecting normal derivations (non-FODs, like glibc itself) into the Docker image. This wouldn’t be optimal, since it would mean trusting whoever built that derivation, but it could be a convenient compromise until we find a way to speed up the Cartesi VM. We could also choose to always ship a set of already-built trusted derivations in the Docker image. However, frankly, I’m more worried about the build process in the container being slow than about the Docker image being too large.

Considering the current limitations of the Cartesi VM, such as limited memory, how does the project intend to handle complex builds that might require more resources?

Talking with @carlofragni in Lisbon, it seemed that it was possible to increase the memory used by the machine by default. As said above, a temporary workaround could be providing a subset of trusted prebuilt derivations in the Docker image.

Anyway, we are not trying to completely solve these problems in this POC. In our opinion, how to handle memory limitations, the Docker image size, and the related performance should be addressed as part of the “Shared Computation” and “IPFS/Shared Cache for Dependencies” macro-activities described in the proposal.

For example, we know that adding IPFS support to Cartesi is on the roadmap. Indeed, if the VM could fetch content from the internet, we could create one general Docker image with Nix (rather than one image per target derivation) that builds an arbitrary derivation given an input like “build xyz trusting these derivations” (where you provide IPFS CIDs, i.e. content hashes).

milton-cartesi · June 11, 2023, 2:17am

Hello there! First of all, congratulations and thanks for submitting this proposal! I believe that tackling build reproducibility is a tough but crucial task - and not only for Cartesi. For the Cartesi ecosystem, ensuring reproducible builds means that every DApp’s template hash can be checked by anyone: simply build the DApp locally from its sources and see if the resulting hash is the same.

I admit I lack the appropriate depth of knowledge about some aspects of the subject, but here go some comments:

I also don’t see memory or size as a problem in principle. Speed, however, may be: building a lot of stuff from source inside the Cartesi Machine will certainly take some time!
I’m not sure I understand your expectations regarding “IPFS support for the Cartesi Machine”, maybe there is some confusion with the IPFS/Logger support we have for Cartesi Compute? In any case, I think it’s perfectly reasonable to fetch data from IPFS (or another source) and add it as a Cartesi Machine drive. I guess automating that would involve some tooling around the Cartesi Machine. Let me know if that makes sense and if we are on the same page here, ok?

Cheers!

slavni96 · June 12, 2023, 4:24pm

Yes, I think that we are on the same page about this.
We are not referring to IPFS/Logger support for Cartesi Compute but rather to possibly using a (new) feature that supports fetching data from IPFS. In this regard, we understood from @carlofragni that something was already ongoing, or at least had already been considered
(but maybe we misunderstood that and it’s something completely new but still of common interest that can be addressed in the future).

bubbalex · June 14, 2023, 3:25am

Looks promising and straight to the point!

joe-cartesi · June 21, 2023, 7:34am

Gotcha. Yes, it’s understandable that you might circumscribe a set of issues to tackle first in the POC rather than addressing everything all at once. It’s nice to see that there is at least a domain to which my concerns belong inside your macro-activities.

I’d like to see if @milton-cartesi is satisfied with your response, and if @carlofragni or others have any more feedback.

This proposal is shaping up quite nicely.

erick.demoura · June 21, 2023, 10:33am

The value proposition is clear and compelling.
I am in favor of moving this proposal into voting.

Blockmain.capital · June 21, 2023, 8:48pm

Great in detail. Let’s move this project into voting

joe-cartesi · June 22, 2023, 7:52am

It would be a good idea to have at least one person from the Cartesi Emulator team weigh in on this proposal. Once we have their feedback in this thread and all is ok, we can proceed to moving this to a vote.

Augusto · June 22, 2023, 6:32pm

I really like the proposal, it would indeed be great if Package Managers could be more trustless and the determinism of the Cartesi Machine can be an important piece of this puzzle.

The first question I have concerns the typical times to build packages inside the Cartesi Machine. I get a bit worried because build times of things like Firefox are already quite considerable and if we add the emulator overhead and the fact that it is single-core, it can become a bottleneck. Have you benchmarked this?

Also, I have another question that is completely orthogonal to this. What about the possibility of making Cartesi DApps reproducible? For example, in the case of Solidity, it is possible to verify that a certain binary was originated from a given source code (given the correct compilation parameters) because the Solidity team was careful enough to provide such features. For example, Etherscan made this verification accessible through its interface: Verify & Publish Contract Source Code | Etherscan

But in the case of Cartesi, it can be very difficult to provide the same level of reproducibility given that there are many compilers with varying levels of reproducibility. The only solutions I see are:

we could expect people to distribute their DApps in the form of source code (instead of binaries) and then the first thing that the validator does is to compile it. Perhaps your project could help Cartesi testing this route?
we could add some support for people to make reproducible builds outside the Cartesi Machine. This would be very different from your proposal. But do you think Nix could be helpful in this approach as well?

aciceri · June 26, 2023, 8:21am

I get a bit worried because build times of things like Firefox are already quite considerable and if we add the emulator overhead and the fact that it is single-core, it can become a bottleneck. Have you benchmarked this?

We haven’t benchmarked it, but during the hackathon we noticed that both Nix evaluation and builds seemed very slow. Besides being single-core, much of this is probably due to the fact that the RISC-V architecture is emulated and that the available memory was low.
However, we weren’t thinking about people building software like Firefox from scratch; building it with all of its transitive dependencies from scratch wouldn’t be fast even on a powerful modern PC. This is another reason why we were talking about including already-built trusted derivations in the Docker image. Furthermore, consider that once the build is done, it is done forever: people can distribute their software normally, and users can check that the hash corresponds to the one produced by the Nix builder living in Cartesi.
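The user-side check mentioned above could be as simple as the following sketch, which hashes a downloaded file and compares it with a published digest. The choice of SHA-256 over the raw file and the function names are assumptions; how Cartenix would actually publish output hashes is not defined here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_hex: str) -> bool:
    """Compare the local artifact with the hash published by the builder."""
    return file_sha256(path) == published_hex.strip().lower()


# Usage with a throwaway file standing in for a downloaded artifact.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "artifact.bin"
    artifact.write_bytes(b"example artifact contents")
    published = hashlib.sha256(b"example artifact contents").hexdigest()
    print(matches_published_hash(artifact, published))
```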

Also, I have another question that is completely orthogonal to this. What about the possibility of making Cartesi DApps reproducible?

This is a really interesting question, we also investigated this possibility during the hackathon.
Nix itself, executed outside Cartesi, doesn’t guarantee absolute reproducibility. It just provides sandboxes where things are built without access to the network or other system impurities, and where all the inputs have to be made explicit. This alone gives you enough reproducibility for most scenarios. There was this project (no longer active for about a year, I see) that tried to continuously build the NixOS minimal ISO using different combinations of hardware and kernels, checking how many derivation output paths differ. As you can see, the result is not bad but also not 100%. I believe the main problem is multi-threaded compilers; you can probably get more reproducibility by forcing single-threaded execution (still with no absolute certainty).
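The kind of comparison such a project performs can be sketched as follows: build the same set of derivations twice and count how many outputs hash identically. The function name and the sample hashes are made up for illustration and are not data from that project.

```python
from typing import Dict, List, Tuple


def reproducibility_report(build_a: Dict[str, str], build_b: Dict[str, str]) -> Tuple[float, List[str]]:
    """Compare output hashes from two independent builds of the same derivations.

    Returns the fraction of common outputs that are identical, plus the names
    of the outputs that differ.
    """
    common = sorted(build_a.keys() & build_b.keys())
    differing = [name for name in common if build_a[name] != build_b[name]]
    if not common:
        return 1.0, []
    return (len(common) - len(differing)) / len(common), differing


# Made-up hashes: only the "gcc" output differs between the two builds.
build_on_machine_1 = {"hello": "aaa", "glibc": "bbb", "gcc": "ccc", "bash": "ddd"}
build_on_machine_2 = {"hello": "aaa", "glibc": "bbb", "gcc": "xyz", "bash": "ddd"}

ratio, differing = reproducibility_report(build_on_machine_1, build_on_machine_2)
print(ratio, differing)
```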

However Nix can be used to build Docker/OCI images (without Dockerfiles then) with a degree of reproducibility way greater than directly building them with Docker (and also other advantages).

Speaking of the absolute reproducibility issue, unfortunately I don’t think there are currently solutions to this problem: either you use specifically designed software (such as the solc/solcjs you mentioned) or you limit yourself to one thread, putting aside all the non-determinism that multiple threads involve.

If, instead, we are talking about building Cartesi DApps inside a Cartesi virtual machine then yes, you would have real reproducibility (and longer build times, but again, doing it only once is enough, no?)

we could expect people to distribute their DApps in the form of source code (instead of binaries) and then the first thing that the validator does is to compile it.

In my opinion the solution is always providing both, where obviously building from sources must produce the same binaries (and Nix makes this way easier).

Perhaps your project could help Cartesi testing this route?

Probably not this grant directly, but using Nix to build DApp Docker images would be another interesting thing to try. Indeed, at the hackathon in Lisbon we were unsure whether to try running Nix inside Cartesi or to use Nix to build Cartesi DApps; both ideas seemed interesting.

we could add some support for people to make reproducible builds outside the Cartesi Machine. This would be very different from your proposal. But do you think Nix could be helpful in this approach as well?

Already answered above, absolutely yes! But probably it would be out of scope for this grant.

Hope we have answered fully

milton-cartesi · June 26, 2023, 12:56pm

Hi folks. Let me see if I understood things correctly:

This proposal is focused on making 100% reproducible builds by running Nix inside Cartesi Machines, and it’s actually aimed at building regular mainstream software, and NOT Cartesi DApps themselves. Is that right?
Running Nix outside Cartesi Machines to enable reproducible Cartesi DApps builds is NOT within the scope of this proposal

I’m not sure I understand if (1) (covered by this proposal) also applies to building Cartesi DApps themselves, which I personally see as our main interest.

Moreover, given that Cartesi Machines are already deterministic, I’m also a bit unsure about the benefits of adding Nix to the mix (sorry about the pun): is its relevance to allow reuse of previously deterministically compiled components? (sorry about my ignorance on the subject)

slavni96 · July 5, 2023, 12:06pm

Hey sorry for the late reply @milton-cartesi

This proposal is focused on making 100% reproducible builds by running Nix inside Cartesi Machines, and it’s actually aimed at building regular mainstream software, and NOT Cartesi DApps themselves. Is that right?

Exactly

Running Nix outside Cartesi Machines to enable reproducible Cartesi DApps builds is NOT within the scope of this proposal

Exactly, although as written in the previous message, we believe it would be another useful Nix application

given that Cartesi Machines are already deterministic, I’m also a bit unsure about the benefits of adding Nix to the mix (sorry about the pun): its relevance is to allow reusage of previously deterministically compiled components?

Well, Nix will benefit from Cartesi with this proposal and Cartesi will benefit in terms of finding a very useful (in our opinion) use case for it i.e. getting an infrastructure for decentralized builds.

Furthermore as written in the “Value Proposition” paragraph we believe that the project will be valuable in Enhancing Reproducibility (of any package), Eliminating Trust in Centralized Caches (a benefit for the nix ecosystem), Fostering Collaboration and Innovation (a benefit for all).

Last but not least, as described in the paragraph “Subsequent Vision and Extensibility” running nix outside Cartesi Machines is out of scope of the proposal but in the vision for the future of adopting nix in the Cartesi ecosystem.

erick.demoura · July 11, 2023, 4:34am

Based on the discussion above, it appears that the alternative project supporting reproducible Cartesi DApps aligns better with the immediate needs of the Cartesi project.

On the other hand, considering the insidious increase of malicious packages in open-source, Cartenix still sounds compelling.

I foresee that there may be technical obstacles on the path to transforming Cartenix into a practical solution. It would help to understand how you see potential challenges and the path for the actual adoption of the solution.

Also, how far are you willing to pursue this project toward adoption?

slavni96 · July 13, 2023, 4:58pm

I agree, as emerged during the discussion of this proposal there will be some challenges before being able to have a complete practical solution, but I think everything proposed is very doable and the activities could lead to other interesting outcomes and projects reusable by the community.

I really want to see Cartenix “born” and get adopted (and I think I can freely say that for both of us, @aciceri). At the moment I am investing my personal time, but I can foresee also investing resources from my company.

erick.demoura · July 14, 2023, 4:03am

Thanks for your response, @slavni96.
I keep my stance on moving forward with your proposal for a vote.

joe-cartesi · July 14, 2023, 5:23am

Alright, Thanks everyone for their input. We have allowed enough time for others to voice any feedback they may have had. Now it is time to let the community decide if this project should be funded. We will put this to a vote on snapshot, and I will put the link here soon.

joe-cartesi · July 19, 2023, 11:20am

The voting has begun! Make sure to stake your CTSI in order to vote on this proposal!

https://snapshot.org/#/cartesi-community-grants-program.eth/proposal/0xcd1159b7a49540b4402e97a0bce1099955dd314b8e116a3f0918828c689008c0

joe-cartesi · July 26, 2023, 7:12am

Congratulations, your proposal has passed the community vote with 100% who voted “yes, fund this project”!

I will reach out privately for next steps. Thank you again to everyone who took part in giving feedback and staking their CTSI to vote on snapshot.