just heard about CHASM. there isn't much written about it:

"CHASM is the Cryptographic Hash Algorithm Secured Mirroring solution. An ambitious project to replace rsync as the leading mirroring solution for Linux distributions and other large projects with multiple mirrors, with a peer-to-peer system that also provides assurances about the integrity of mirrored data."

there is a Google Summer of Code project for kernel.org, along with one for centralized statistics gathering.
https://korg.wiki.kernel.org/index.php/Gsoc2010:ideas

chasmd improvements
Assisting: John "Warthog9" Hawley
Website: http://projects.robescriva.com/projects/show/chasm

CHASM, the Cryptographic Hash Algorithm Secured Mirroring solution, is a project meant to alleviate many of the pains that mirrors have in organizing and verifying their content. The project can be thought of as a stateful rsync daemon in some respects, and is one that kernel.org and a number of other large mirroring infrastructures have been looking into for several years now. It will ultimately be used by a large portion of the bigger mirroring infrastructures, and as such it calls for high performance and good design.

This is a project to help get CHASM to a usable, production-quality state. It is currently in the middle of a rewrite into C++ for performance reasons, and several aspects still need to be fleshed out. Individuals will need a solid understanding of *NIX systems programming in C or C++ (C++ is mainly used to provide things like destructors and type safety). Familiarity with the git SCM storage model and with rsync internals are both positive traits. Developers seeking to work on CHASM will work primarily on network code, including documenting the network protocols.
Students will be expected to develop such code/protocols independently, but will be given every chance for feedback and guidance from the current developers so as to maximize the impact of their contributions. Students looking to work on CHASM should contact the current developers and register on the bug tracker (http://projects.robescriva.com/account/register).

Things to note about this project:
- There are several servers involved, most of which communicate locally over Unix domain sockets. Each server will be a separate piece of functionality.
- All code written should be accompanied by test code to aid in automated testing (see http://cdash.chasmd.org/ for our dashboard).
- C++ is the language used by the current developers. We chose C++ for its beneficial standard library and its ability to link C libraries as well.
- Code written must be capable of running for extended periods of time without excess resource consumption or leakage.

Centralized statistics gathering
Primary: John "Warthog9" Hawley

This is a multi-part project involving both the collection of statistics and the server-side aggregation of those statistics. The main idea is to create a universally usable download-statistics collection system. The Open Source community tends to rely on a far-flung array of servers and infrastructure for its download distribution. This works wonderfully for the most part, but there is little insight into the mirrors themselves from the position of the originator of the data. This lack of insight stems from a multitude of problems, ranging from privacy concerns and legal reasons to resource constraints on the mirror itself. This project is intended to help both the mirrors themselves and the upstream providers of data get a better handle on how many downloads of various things are actually occurring.
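Concretely, the client side of counting downloads comes down to parsing mirror logs. A rough C++ sketch of pulling one download record out of an HTTP access-log line — the combined log format, the struct, and all names here are my own illustrative assumptions, not anything the project has specified:

```cpp
// Hypothetical sketch: extract (client, path, status, bytes) from one
// Apache combined-format access-log line. A real mirror client would need
// per-format parsers (ftp, rsync, git, ...) and streaming I/O to cope with
// tens or hundreds of gigabytes of logs per run.
#include <sstream>
#include <string>

struct Download {
    std::string client;  // remote host, for estimating unique downloaders
    std::string path;    // what was fetched
    int status = 0;      // HTTP status; typically only 200/206 count
    long bytes = 0;      // response size, for transfer totals
};

// Parse one combined-format line, e.g.:
// 1.2.3.4 - - [03/Apr/2010:06:34:56 +0000] "GET /x.iso HTTP/1.1" 200 999 "-" "wget"
// Returns false on malformed lines (including a "-" bytes field).
bool parse_line(const std::string &line, Download &out) {
    std::istringstream in(line);
    std::string ident, user, date, tz, method, proto;
    in >> out.client >> ident >> user >> date >> tz  // date/tz keep [ and ]
       >> method                                     // carries a leading quote
       >> out.path >> proto >> out.status >> out.bytes;
    return static_cast<bool>(in);
}
```

A caller would feed each log line through `parse_line`, drop records whose status is not 200/206, and fold the rest into per-path byte and unique-client tallies before reporting upstream.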
It is intended to be an all-encompassing solution, meaning the project should work equally well for something like kernel.org, Fedora, Ubuntu, Apache, or Mozilla, should they choose to use it. The project will involve a frontend log parser capable of determining what downloads have occurred, the type of each download, how much data was transferred, and the number of unique downloaders for that server. There will also be a backend portion, initially hosted on kernel.org. This backend will be the collection point for the statistics provided by the frontend processes running on the mirrors. It will involve logging statistics, parsing out duplicates from a single mirror, dealing with mirror authenticity, and aggregating the statistics. It will also provide a website where individuals can quickly browse and discover common downloads for a particular distribution or open source project.

Things of note about this project:
- There is both a client and a server aspect to this project; both pieces need to be created and made interoperable, along with a client/server API.

Clients:
- Resource-constrained environment
- Need to be lightweight and as efficient as possible
- Potentially processing 10s or 100s of gigabytes of data in a single run for a single machine
- Will be collecting data from a variety of different log types: http, ftp, rsync, git, etc.

Server:
- Mostly a web app, for reporting and data collection
- Needs to be relatively efficient, but not to the same extent as the client
- Has to be capable of running independent of the kernel.org infrastructure

General todos:
- Prototype client
- Prototype server
- Prototype API

--
(( Anthony Bryan ...
.. Metalink [ http://www.metalinker.org ] ))
   Easier, More Reliable, Self Healing Downloads

_______________________________________________
mirrorbrain mailing list
Archive: http://mirrorbrain.org/archive/mirrorbrain/
Note: To remove yourself from this mailing list, send a mail with the content "unsubscribe" to the address mirrorbrain-request_at_mirrorbrain.org

Received on Sat Apr 03 2010 - 06:34:56 GMT
This archive was generated by hypermail 2.3.0 : Wed May 05 2010 - 18:17:05 GMT