Re: [mirrorbrain] CHASM (Cryptographic Hash Algorithm Secured Mirroring) + download stats

From: Peter Poeml <peter_at_poeml.de>
Date: Wed, 5 May 2010 20:15:22 +0200

Hi Ant,

wonderful. That sounds like a very promising project. I hear from John
that no student was found for GSoC, but it's being worked on.

Definitely something that MirrorBrain would want to integrate with.

Also, the second referenced project is interesting; it goes in the same
direction as mod_stats.

Thanks very much for the pointers!
Peter

On Sat, Apr 03, 2010 at 02:34:35AM -0400, Ant Bryan wrote:
> just heard about CHASM. there isn't much written about it:
> 
> "CHASM is the Cryptographic Hash Algorithm Secured Mirroring solution.
> 
> An ambitious project to replace rsync as the leading mirroring
> solution for Linux distributions and other large projects with
> multiple mirrors with a peer-to-peer system that also provides
> assurances about the integrity of mirrored data."
> 
> there is a Google summer of code project for kernel.org along with one
> for centralized statistics gathering.
> 
> https://korg.wiki.kernel.org/index.php/Gsoc2010:ideas
> 
> 
> chasmd improvements
> 
> Assisting: John "Warthog9" Hawley
> Website: http://projects.robescriva.com/projects/show/chasm
> 
> CHASM, the Cryptographic Hash Algorithm Secured Mirroring solution, is
> a project that is to help alleviate a lot of the pains that mirrors
> have in organizing and verifying their content. The project can be
> thought of as a stateful rsync daemon in some respects, and is a
> project that kernel.org and a number of other large mirroring
> infrastructures have been looking into for several years now. This is
> ultimately a project that will be used by a greater portion of the
> larger mirroring infrastructures and as such has a lot of need for
> high performance and good design.
> This is a project to help get CHASM to a usable and production quality
> state. It is currently in the middle of a rewrite into C++ for
> performance reasons and there are still several aspects that may need
> to be fleshed out. Individuals will need a solid understanding of *NIX
> systems programming in C or C++ (C++ is mainly used to provide things
> like destructors and type safety). Familiarity with the git scm
> storage model and with rsync internals are both positive traits.
> Developers seeking to work on CHASM will be working primarily on
> developing network code, including documenting the network protocols.
> Students will be expected to be able to develop such code/protocols
> independently, but will be provided every chance for feedback and
> guidance from the current developers so as to maximize the impact of
> their contributions.
> Students looking to work on CHASM should contact the current
> developers, and register on the bug tracker
> (http://projects.robescriva.com/account/register).
> Things to note about this project:
> There are several servers involved in this project; most of which
> communicate locally over Unix domain sockets.
> Each server will be a separate piece of functionality.
> All code written should be accompanied by test code to aid in
> automated testing (see http://cdash.chasmd.org/ for our dashboard).
> C++ is the language used by current developers. We chose C++ for its
> beneficial standard library and ability to link C libraries as well.
> Code written must be capable of running for extended periods of time
> without excess resource consumption or leakage.
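
The excerpt above does not describe CHASM's wire protocol or on-disk
formats, so the following is only a rough sketch of the general idea behind
hash-secured mirroring, not CHASM's design. It assumes a manifest that maps
relative paths to SHA-256 digests; the choice of SHA-256 and the names
sha256_of, build_manifest and verify_tree are all hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = sha256_of(full)
    return manifest


def verify_tree(root, manifest):
    """Compare a local tree against an upstream manifest.

    Returns (missing, corrupt, extra) as sorted lists of relative paths.
    """
    local = build_manifest(root)
    missing = sorted(p for p in manifest if p not in local)
    corrupt = sorted(
        p for p in manifest if p in local and local[p] != manifest[p]
    )
    extra = sorted(p for p in local if p not in manifest)
    return missing, corrupt, extra
```

A mirror could run verify_tree against a manifest published by the
upstream and re-fetch only the paths reported as missing or corrupt. How
the manifest itself would be authenticated is left out of this sketch.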
> 
> 
> Centralized statistics gathering
> 
> Primary: John "Warthog9" Hawley
> 
> This is a multi-part project involving both the collection of the
> statistics and the server aggregation of the statistics. The main idea
> of this project is to create a universally usable download
> statistics collection. The Open Source community has a tendency to
> rely on a far-flung array of servers and infrastructure to provide
> its download distribution. This works wonderfully for the most part;
> however, there is little insight into the mirrors themselves from the
> position of the originator of the data. This lack of insight is due to
> a multitude of problems, from privacy concerns and legal reasons to
> system resources on the mirror itself.
> This project is intended to help both the mirrors themselves and the
> upstream providers of data get a better handle on how many downloads
> of various things are actually occurring. It's intended to be an
> all-encompassing solution, meaning that the project will work equally
> well for kernel.org, Fedora, Ubuntu, Apache and Mozilla, should they
> choose to use it. This project will involve a frontend log parser
> capable of determining what downloads have occurred, the type of
> download and how much data was transferred, as well as the unique
> downloaders for that server. There will also be a backend portion,
> which will initially be hosted on kernel.org.
> This backend will be the collection point for the statistics that will
> be provided by frontend processes running on the mirrors. It will
> involve logging statistics, parsing out duplicates from a single
> mirror, dealing with mirror authenticity and aggregating the statistics.
> It will also provide a website for individuals to be able to quickly
> browse and discover common downloads from a particular distribution,
> or open source project.
> Things of Note about this project:
> There is both a client and a server aspect to this project; both
> pieces need to be created and made interoperable, along with a
> client/server API.
> Clients:
> Resource constrained environment
> Needs to be lightweight and as efficient as possible
> Potential to be processing tens or hundreds of gigabytes of data on a
> single run for a single machine
> Will be collecting data from a variety of different log types from
> http, ftp, rsync, git, etc.
> Server:
> Mostly a web-app, for reporting and data collection
> Needs to be relatively efficient, but not to the same extent as the client
> Has to be capable of running independent of the kernel.org infrastructure
> General todos:
> Prototype client
> Prototype server
> Prototype API
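
To make the client/server split above a bit more concrete, here is a
minimal sketch of what such a prototype might look like. Everything in it
is an assumption rather than part of the proposal: it handles only Apache
Common Log Format lines, counts only complete GET downloads (status 200),
and the names summarize and merge_reports and the (mirror_id, period)
report key are hypothetical.

```python
import re
from collections import defaultdict

# Apache Common Log Format: host ident user [time] "request" status bytes
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Client side: aggregate successful GET requests per path.

    Lines are consumed one at a time, so a large log can be streamed
    from disk without being loaded into memory as a whole.
    """
    stats = defaultdict(lambda: {"downloads": 0, "bytes": 0, "clients": set()})
    for line in lines:
        m = LINE_RE.match(line)
        # Skip unparseable lines, non-GET requests and anything but 200
        # (partial 206 responses are ignored in this sketch).
        if not m or m.group("method") != "GET" or m.group("status") != "200":
            continue
        entry = stats[m.group("path")]
        entry["downloads"] += 1
        entry["bytes"] += 0 if m.group("size") == "-" else int(m.group("size"))
        entry["clients"].add(m.group("host"))
    return {
        path: {
            "downloads": e["downloads"],
            "bytes": e["bytes"],
            "unique_clients": len(e["clients"]),
        }
        for path, e in stats.items()
    }


def merge_reports(reports):
    """Server side: combine per-mirror summaries, ignoring duplicates.

    Each report is a (mirror_id, period, summary) tuple; a repeated
    (mirror_id, period) pair is counted only once. Unique-client counts
    are not summed, since the same client may appear on several mirrors.
    """
    seen = set()
    totals = defaultdict(lambda: {"downloads": 0, "bytes": 0})
    for mirror_id, period, summary in reports:
        if (mirror_id, period) in seen:
            continue
        seen.add((mirror_id, period))
        for path, e in summary.items():
            totals[path]["downloads"] += e["downloads"]
            totals[path]["bytes"] += e["bytes"]
    return dict(totals)
```

A mirror would call summarize on an open log file and submit the result;
the collection point would pass all received reports to merge_reports.
Raw client addresses never leave the mirror in this sketch, which is one
possible way to address the privacy concerns mentioned above, and mirror
authentication is not covered at all.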
> 
> -- 
> (( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
>   )) Easier, More Reliable, Self Healing Downloads
> 
> _______________________________________________
> mirrorbrain mailing list
> Archive: http://mirrorbrain.org/archive/mirrorbrain/
> 
> Note: To remove yourself from this mailing list, send a mail with the content
>  	unsubscribe
> to the address mirrorbrain-request_at_mirrorbrain.org
