4/22/2003

Notes re Morley's idea for a Distributed [spam] Early Warning System.

Meta-goal: provide a distributed, authenticated mechanism for monitoring sources of spam and other network-abusive activity.

Philosophical goals: the system is intended to act as a widely distributed, difficult-to-shut-down, and responsibly-operated peer-to-peer network of information about ongoing spamming, portscanning, and other abusive Internet activities. It's based on the bazaar model rather than the cathedral model. It will _explicitly_ lack any central point of authority.

Basic concepts:

- Anyone who wishes can contribute information (reports of spamming and other abuse incidents, and recommendations about possible actions to take) to the pool.
- Information will flow from node to node in the system, using a flooding algorithm, with connections between nodes being agreed upon by the node operators. Some nodes may wish to offer or accept information only to/from specific peers, while others may offer information more widely (analogous to today's DNSBL sites).
- Information will be exchanged in a flexible format, which can allow a great deal of descriptive information to be passed along if relevant, but will also allow for a compact transfer of the essentials.
- The integrity of the information flow will be enhanced by the use of digital signatures or MACs on each batch of information, in order to avoid DoS'ing or flooding of the network. Batches whose validity cannot be verified will be dropped. It might be desirable to have a hard-and-fast rule in the peering system: a node may _not_ pass along a batch of data to another node outside of its organization, unless it has first ensured that the signature on that batch is valid.
- Decisions about what information to use, in what ways, will be made at each node, according to criteria set by the node's operator.
  A node can pass along reports and recommendations from another source to its peers, without actually deciding to act on them itself. Nodes will filter batches using the key identity of the originator's digital signature as the criterion.
- Information batches will, in general, be time-bound: they can be issued with a limited validity time (to allow for e.g. daily updates, with an overlap between batches being possible). Individual nodes will be able to override the originator's expiration time recommendation (in either direction) if they wish.
- Information from batches can be combined, filtered, weighted, etc. as desired by each node, and used to create router/IP blocks, mailer DENY tables, teergrube/honeypot controls, DNSBL zones, etc.
- The batching, validation, and exchange techniques will be modeled on existing, tested protocols (e.g. NNTP IHAVE/SENDME, B- and C-news batches, etc.) and will be implementable (and, initially, prototyped and implemented) using off-the-shelf software tools (e.g. Perl, GnuPG/OpenPGP/keyservers). There will probably be interchange mechanisms which use popular, existing network data protocols, e.g. HTTP queries, FTP, or even email. Ideally, the processing and transport mechanisms used by this system will be largely independent and modular.
- The design will attempt to be OS-neutral. Implementations may (should!) favor open-source techniques. A GPL'ed reference design is highly desirable.

Basic elements of information:

I envision there being two basic globs of information which would be carried or referenced by the system: incident reports and opinions.

An incident report is a self-contained report of spamming (one or more messages), portscanning, detection of an open proxy, detection of a system compromised by a trojan or virus, DoS attempt, etc. At minimum, each incident report would carry a unique ID, a timestamp, the IP address of the system which caused the problem, and the problem category.
Auxiliary information might include a URL where the captured spam could be viewed, details on the portscanning, identity of the trojan/virus/proxy detected, a severity indicator (number of spams, intensity of portscan), etc. A site might issue an incident report against an IP address which has not previously been seen to have problems and is not currently known to the DEWS data available at that site, but not against an address which is already a known spam source with numerous existing reports. Incident reports are batched together and identified by the signer's key and the batch ID, so they can be traced if necessary.

Opinions are similar to today's DNSBL entries: they're someone's personal opinions about the state or trustworthiness of a particular IP address or subnet. Each opinion would give an IP address or subnet, a protocol identifier (e.g. SMTP), a problem-intensity assessment (anywhere from 'squeaky clean' to 'dirty as hell'), and timestamp/expiration information. Opinions can cite an incident report as justification (by report ID or by a URL). Opinions would also be batched and identified by key and batch ID.

Opinions received by a node, from other nodes, would be acted upon (that is, they'd have local effects at that node) based on a "policy" document or recipe or script implemented at that node. This policy document might, for example, say things having the following sort of semantics:

- Here are signers whose opinions I trust very strongly. If any one of these signers is of the opinion that an IP address is involved in spamming, add that IP address to the local mail filter.
- Here are signers whose opinions I trust somewhat. If any two of them publish separate opinions that an IP address is spamming, add this address to the local filter.
- If any three of the following signers report incidents involving a specific Class C network, reconfigure our MTA to automatically teergrube inbound connections from that network and to automatically perform a low-impact "check for trojan or open proxy" scan on any IP address on that network which connects to our server.
- Here are signers whose opinions, and incident reports, are of unknown reliability to me. Add the IP addresses these signers report to an "inspect carefully" mail-transport-agent list.

Tools for writing such policy documents, and converting them into a form which can be used efficiently during mail arrival or during report/opinion database updating, would certainly need to be designed and written.

I'm not sure how much of the local-filter-change stuff would be done during the initial arrival of batches (e.g. adding of new spam sources to a short-term blocking filter) and how much might be done during periodic "clean up the expired or revoked entries, perform weighting/voting analysis, and set up the longer-term filters" processing in the middle of the night. No doubt this would vary a lot from one node to the next.
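
A minimal sketch of how a policy recipe of the sort described above might be evaluated, purely as an illustration. Everything named here is an assumption made for the example and not part of the design: the `Opinion` and `Policy` types, the 0-10 severity scale, the `min_severity` cutoff, and the "block"/"inspect" action names. It covers only the "trust strongly", "trust somewhat (any two)", and "unknown reliability" rules; signature verification, incident reports, the per-network teergrube rule, and per-protocol handling are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Opinion:
    signer: str     # key identity of the batch signer (hypothetical plain string)
    address: str    # IP address or subnet the opinion is about
    protocol: str   # e.g. "SMTP"; carried along but not used by this sketch
    severity: int   # assumed scale: 0 = squeaky clean ... 10 = dirty as hell
    expires: float  # expiration time, in seconds since the epoch


@dataclass
class Policy:
    strong: Set[str]          # signers trusted very strongly
    somewhat: Set[str]        # signers trusted somewhat
    unknown: Set[str]         # signers of unknown reliability
    min_severity: int = 5     # assumed cutoff for "involved in spamming"
    somewhat_quorum: int = 2  # "any two of them"


def evaluate(policy: Policy, opinions: Iterable[Opinion], now: float) -> Dict[str, str]:
    """Map each flagged address to a local action: 'block' or 'inspect'."""
    signers_by_addr: Dict[str, Set[str]] = defaultdict(set)
    for op in opinions:
        # Skip expired opinions and those below the severity cutoff.
        if op.expires <= now or op.severity < policy.min_severity:
            continue
        # A set ensures a signer who repeats an opinion is counted only once.
        signers_by_addr[op.address].add(op.signer)

    actions: Dict[str, str] = {}
    for addr, signers in signers_by_addr.items():
        if signers & policy.strong:
            actions[addr] = "block"
        elif len(signers & policy.somewhat) >= policy.somewhat_quorum:
            actions[addr] = "block"
        elif signers & policy.unknown:
            actions[addr] = "inspect"
        # Signers not listed in the policy at all are simply ignored.
    return actions


# Example run with made-up signers and documentation-range addresses.
now = 1_000_000.0
policy = Policy(strong={"alice"}, somewhat={"bob", "carol"}, unknown={"dave"})
opinions = [
    Opinion("alice", "192.0.2.1", "SMTP", 9, now + 86400),
    Opinion("bob", "192.0.2.2", "SMTP", 8, now + 86400),
    Opinion("carol", "192.0.2.2", "SMTP", 7, now + 86400),
    Opinion("bob", "192.0.2.3", "SMTP", 8, now + 86400),
    Opinion("dave", "192.0.2.4", "SMTP", 6, now + 86400),
    Opinion("alice", "192.0.2.5", "SMTP", 9, now - 1),
]
actions = evaluate(policy, opinions, now)
```

In this run the first two addresses end up blocked, the lone "somewhat trusted" report against 192.0.2.3 causes no action, dave's report is marked for inspection, and the expired opinion is ignored. Whether such an evaluation runs at batch arrival or in the nightly clean-up pass is the open question noted above.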