Concurrently Chaotic

Random notes on technology by Kenji Rikitake



Persistent Erlang processes or process pairs

An Erlang process is the minimal unit of execution in the Erlang BEAM virtual machine. Each process has its own ID, can send and receive messages, can register a name with the BEAM node it is running on, and can link with another process for error handling and monitoring. It even has its own process dictionary.

I've been thinking about a question for a few days: can you make a computer keep a process alive for more than 100 years? The process here does not necessarily have to be an Erlang one, but an Erlang process is a good candidate because the working environment it has to carry around is minimal, much smaller than that of a UNIX process.

Keeping a process alive for a long time is not possible if the process is confined to a single machine; the machine's failure means the immediate death of the process. So the process should be able to move between multiple machines and clone itself. Shared memory should be avoided as much as possible. Realizing these characteristics is less difficult in Erlang than in other language systems.

I discovered that the idea of a persistent computer process is actually not original to me. A Google search told me that Jim Gray had already published a technical report in 1985, when he was at Tandem (PDF copy from HP Labs) (a scanned text file of the report), with the idea of persistent process-pairs as a part of his model of transactions for simple fault-tolerant execution.

In his report, Gray describes a much smarter approach: pairing two processes to achieve persistence. If one of the pair fails, the other takes over the actions of the failed one. This idea is much wiser than trying to keep a single process alive.
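The takeover behavior can be sketched in a few lines. This is only a toy simulation in Python, not Erlang and not Gray's actual design: the class names `Worker` and `ProcessPair`, the counter state, and the checkpoint-after-every-step policy are all assumptions made for illustration.

```python
class Worker:
    """One member of a process pair (hypothetical name, not from Gray's report)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.state = {"counter": 0}

    def checkpoint(self):
        # Return a copy so the backup never shares memory with the primary.
        return dict(self.state)


class ProcessPair:
    """A primary and a backup; the backup holds the last checkpointed state."""

    def __init__(self):
        self.primary = Worker("A")
        self.backup = Worker("B")

    def step(self):
        """Do one unit of work on the primary, then checkpoint to the backup."""
        if not self.primary.alive:
            self.take_over()
        self.primary.state["counter"] += 1
        self.backup.state = self.primary.checkpoint()
        return self.primary.state["counter"]

    def take_over(self):
        # The backup resumes from the last checkpoint and becomes the primary;
        # a fresh backup is created so the pair is complete again.
        self.primary = self.backup
        self.backup = Worker(self.primary.name + "'")
        self.backup.state = self.primary.checkpoint()


pair = ProcessPair()
pair.step()
pair.step()
pair.primary.alive = False   # simulate a crash of the primary
result = pair.step()         # the backup takes over and continues the count
```

The point of the sketch is that the work continues from the last checkpoint rather than from zero, and that the pair is re-formed immediately after a failure. A real implementation would also have to detect the failure itself, for example through Erlang links or monitors.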

So now I have found a way to realize persistent processes; next I need to learn how to implement them. It'll be a part of my new year's resolution for 2010. A happy new year to all readers.

posted at: 30 Dec 2009 | path: /erlang | permanent link


DNS operation is utterly neglected by many people

The Twitter outage via DNS hijacking showed another case of a common symptom: DNS operation is simply neglected by people doing business on the Internet.

I did research on DNS transport security from 2002 to 2008. One of the reasons I stopped focusing on that research was that most, if not all, DNS problems are caused by operational failures, not necessarily by technical deficiencies of the DNS protocols and systems. In short, DNS is too political and social for technological experiments.

I still think DNS transport protocol issues are critical for stable Internet operation. But solving those issues does not help with recovering from human errors, such as lame delegation (a missing link) within the domain name hierarchy. And stable operation of DNS systems is very difficult to maintain without stable hardware, software, networks, and operators.

I notice many small companies (especially in Japan) keep their authoritative servers inside their offices, which is not good from the stability point of view. Actually, for many small Internet sites, including mine, not many DNS zone records have to be exposed to the public. So I've already outsourced my authoritative DNS servers, while I periodically check that those servers do the right thing.
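A periodic check for lame delegation could look roughly like the sketch below. It is an assumption-laden outline, not the monitoring I actually run: `find_lame_servers` and `is_authoritative` are hypothetical names, the host names are placeholders, and the real DNS query (typically an SOA query that inspects the AA flag of the answer) is replaced by a stand-in function so that the example runs without a network.

```python
def find_lame_servers(zone, delegated_servers, is_authoritative):
    """Return the delegated name servers that do not answer authoritatively.

    is_authoritative(server, zone) is a caller-supplied check (hypothetical);
    in practice it would send an SOA query and inspect the AA flag.
    """
    lame = []
    for server in delegated_servers:
        try:
            ok = is_authoritative(server, zone)
        except OSError:
            # An unreachable server is as useless as a lame one.
            ok = False
        if not ok:
            lame.append(server)
    return lame


# Stand-in for real DNS queries, so the sketch runs without a network.
def fake_check(server, zone):
    if server == "ns3.example.net":
        raise OSError("timeout")
    return server != "ns2.example.net"


lame = find_lame_servers(
    "example.com",
    ["ns1.example.net", "ns2.example.net", "ns3.example.net"],
    fake_check,
)
```

Passing the query function in as a parameter keeps the policy (which servers count as broken) separate from the transport, so the same loop can be driven by any resolver library.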

DNS is by definition a distributed system, and its management standard is much lower than what people (even Internet engineers) believe. For further details of how poorly DNS is managed, I suggest you read a more detailed commentary on how important DNS is as an asset, by Danny McPherson of Arbor Networks.

posted at: 21 Dec 2009 | path: /security | permanent link


Bruce Schneier's speech at IWSEC2009

I had a chance to meet Bruce Schneier face-to-face for the first time when I attended his invited talk session at the IWSEC2009 conference in Toyama, Japan, on October 28, 2009.

I once worked on translating Schneier's book Email Security (published in 1995, and now declared outdated by him) into Japanese. At that time he was a cryptography technologist. The keynote speech in Toyama showed, however, that he had become more interested in psychology and human behavior, which is not necessarily logically predictable and is often considered erroneous from a technological point of view.

While I read tweets from a few people who apparently found Schneier's speech boring, I found his speech on the psychology of security rather refreshing and interesting. Maybe that's because I've been frequently disillusioned by how technological solutions often backfire. Of course, it was not about the details of cryptography or other security protocols, which are the primary topics of IWSEC, so it might have been boring for the majority of the participants.

I won't go into the details of Schneier's speech, because most of the individual topics are frequently covered in his blog. Let me write about one of the things that intrigued me the most: risk heuristics. People are risk-averse when it comes to gains and prefer a sure gain. At the same time, they prefer a probabilistic loss, that is, risk-taking behavior, when they face the possibility of losing something. With this heuristic way of thinking, people usually don't want to pay for a less risky life, and this is exactly one of the reasons why security products don't sell well.

After the speech, I asked him why he had turned from a pure technologist into a scientist of broader topics including psychology and sociology. Unfortunately I didn't get a definitive answer on what made him do so; he only emphasized that the sociological aspects of security were as important and critical as the technological ones. Maybe I could find the answer in one of his books; especially if the reason is a highly personal one, which no one will ever know.

posted at: 06 Dec 2009 | path: /security | permanent link


The posted date field of each blog article got broken but restored

I decided to use Subversion to manage the blog contents, which was a wrong decision. Subversion does not preserve the timestamps of repository files, at least in its default mode of operation. This is not good for PyBlosxom. So I restored the old files to put back the timestamps.
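One way to guard against this kind of timestamp loss is to record the modification times in a manifest before importing the files into version control, and to reapply them after a checkout. The following Python sketch assumes that the blog engine takes the posted date from each entry file's modification time; the function names `save_mtimes` and `restore_mtimes` and the JSON manifest format are made up for illustration and are not part of PyBlosxom or Subversion.

```python
import json
import os


def save_mtimes(root, manifest_path):
    """Record the modification time of every file under root."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[os.path.relpath(path, root)] = os.stat(path).st_mtime
    with open(manifest_path, "w") as f:
        json.dump(mtimes, f, indent=2, sort_keys=True)


def restore_mtimes(root, manifest_path):
    """Reapply recorded modification times; return the number of files touched."""
    with open(manifest_path) as f:
        mtimes = json.load(f)
    restored = 0
    for relpath, mtime in mtimes.items():
        path = os.path.join(root, relpath)
        if os.path.isfile(path):
            # Keep the current access time, reset only the modification time.
            os.utime(path, (os.stat(path).st_atime, mtime))
            restored += 1
    return restored
```

The manifest is best kept outside the tree it describes (or committed as an ordinary file), so that restoring the timestamps does not depend on the very metadata that was lost.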

posted at: 05 Dec 2009 | path: /admin | permanent link


Erlang and Github

Erlang/OTP is now officially maintained in a Github repository, as of release R13B03. I think this is a milestone for the language, because the Ericsson development team finally decided to show the interim results of what they are currently doing.

One of the characteristics I like about Erlang is that the language specification and libraries have been maintained by a single entity, Ericsson's Erlang/OTP Development Team. I do not want anarchy in computer languages and operating systems. I prefer BSDism to Linuxism in this sense; I think pieces of code should be controlled by the core people while sufficiently accepting improvements from other developers.

The old Erlang/OTP daily snapshot archives, however, are no longer sufficient to catch up with the daily development cycles. And many non-Ericsson authors, including me, have contributed patches to Erlang. So there had to be some system for accepting user feedback.

Using an open repository system such as Github is a wise idea for incorporating new code into Erlang/OTP and showing the official status of modifications. Git is flexible enough to allow per-user and per-purpose branches, and Github allows users to fork each other's repositories. The Ericsson team doesn't have to build and publish its own code repository system for Erlang/OTP, which would cost them a significant amount of human and financial resources.

And now I have an official reason to learn Git: to catch up with the Erlang/OTP development cycles.

posted at: 05 Dec 2009 | path: /erlang | permanent link


The definition of eventually secure systems

I've been using Web services under a new assumption of integrity these days, one which allows data inconsistency for a span of a few minutes. The designers of those systems relax data consistency in order to give higher priority to availability and to tolerance of split database subsystems within a cluster that represents one integrated database.

Then a question comes to my mind: what does it mean for a database to be secure while allowing an unstable condition for a few minutes? Of course, guaranteeing unconditional access restriction is one way to claim a database secure, provided each party allowed to access the database does not harm its integrity at all. This sort of strict access limitation, however, is impractical for a public system. So a new notion of security, probably called eventually secure systems, should be introduced. But how? I still have no idea.

Traditionally, databases are designed under the constraints of Atomicity, Consistency, Isolation and Durability (ACID) for every query and update operation. The ACID policy demands locking of critical sections between conflicting database requests, which causes performance degradation.

On the other hand, Gilbert and Lynch [1] claim in their CAP Theorem for distributed databases that three properties cannot all be realized at the same time: data consistency, availability, and tolerance to network partition. BASE [2], which stands for basically available, soft state, eventually consistent, is an example of an anti-ACID design policy based on the CAP Theorem, giving higher priority to availability and tolerance to network partition than to data consistency.

Vogels [3] also explains the idea of eventual consistency, or an eventually consistent change of states, by analogy to the Domain Name System (DNS). DNS lets clients querying the distributed database see inconsistency while database updates propagate, but the inconsistency is resolved within a finite period determined by the configuration of the replication network between the database caches.
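The window of inconsistency described here can be illustrated with a toy model. The sketch below is a minimal Python simulation under stated assumptions, not a description of any real database or of DNS itself: the `Replica` class, the explicit version numbers, and the last-writer-wins merge rule are all chosen for illustration.

```python
class Replica:
    """A replica holding (value, version) pairs; names here are illustrative."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, version):
        # Last-writer-wins: keep the entry with the highest version number.
        current = self.store.get(key)
        if current is None or version > current[1]:
            self.store[key] = (value, version)

    def read(self, key):
        entry = self.store.get(key)
        return entry[0] if entry else None

    def sync_from(self, other):
        # Anti-entropy: pull every entry the other replica has.
        for key, (value, version) in other.store.items():
            self.write(key, value, version)


a, b = Replica(), Replica()
a.write("balance", 100, version=1)
b.sync_from(a)

a.write("balance", 80, version=2)   # the update reaches replica a only
stale = b.read("balance")           # b still serves the old value
b.sync_from(a)                      # replication catches up
fresh = b.read("balance")           # b now agrees with a
```

Between the second write and the next `sync_from` call, a client reading from `b` sees the old balance. How long that window lasts depends entirely on when replication runs, which is exactly the quantity a stricter system would need to bound.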

While the CAP Theorem, BASE, and the notion of eventually consistent systems are effective for relaxing the boundary condition of data inconsistency when building very large-scale systems, those ideas will not solve the core issue: how to make a database cluster consistent within a finite, predictable time range. I understand many applications do not require atomic consistency of data, especially those for casual conversation, such as Twitter or Facebook. I don't think, however, that a banking system can be built under the BASE principle, unless the maximum allowance of temporal data inconsistency or the maximum time to eventual convergence is given and proven.

And I think that in running large-scale systems, things often become eventually inconsistent and disintegrated, rather than eventually consistent. I still wonder how we can solve this problem consistently.

References:

posted at: 02 Sep 2009 | path: /security | permanent link


Current and outdated references of secure C programming

C is the modern assembly language for many architectures, and still the most useful computer language for me. C does not have a rigid grammar and has a lot of variants and local dialects, and it has been revised a few times, including the old UNIX C, ANSI C 1989, which first introduced prototypes, and C99.

Finding out the de facto standard elements of C is complicated work. You can find a bunch of different indentation and writing styles in C code. I do not recommend a specific coding style in this article; I can only recommend that you follow the mainstream style of the project you work in.

Sometimes you have to read books to discover the right thing to do. I recommend the following books for C programming now:

For practical programming, however, depending on books is not enough. Actually, the books I recommended above are 5 to 7 years old as of 2009, so if you want to know the cutting-edge details of programming, you should read the latest software. Consulting a C compiler manual and well-written source code such as that of the BSD kernels (both freely available) is a must if you want to write efficient code.

One thing to which you've got to pay special attention is that books slowly but surely get outdated. Books are not Web articles; they are static and will not change. The lifespan of a computer science reference book is typically very short these days, due to the rapid change of technologies. Books about C are no exception.

And I should confess that a few days ago I decided to sell the following old, worn-out books because I found them simply outdated (and I no longer recommend these two books):

The reasons I found them outdated are as follows:

Frankly speaking, I loved those old books, especially the ones I referred to the most while I was learning the language as an apprentice in the late 1980s. Those books were the only source before the Web. I had to read the old bestsellers many times to discover the details. I do respect the authors of those books. They are pioneers of UNIX and C programming.

Nothing is eternal, however; and I suggest you stop using outdated reference books ASAP for every subject, not only for programming.

posted at: 12 Aug 2009 | path: /security | permanent link


Social web points of failure

I noticed Twitter was down from 13Z to 15Z on 6-AUG-2009 (Z = UTC hours, BTW). Facebook was also affected. Other major worldwide social web sites, including Livejournal and Blogger/blogspot.com, were also victims of the denial-of-service (DoS) attack.

Elasticvapor.com has an article which says the attack was aimed from Russia at one Georgian account, and that it was multi-staged through BGP and DNS vectors. I also feel the attack was not just a simple HTTP DoS, though this is unconfirmed.

DoS attacks are so common on the Internet that there is little new to say about them. You can even hire a botnet to make a specific attack. Nevertheless, DoS attacks on popular sites do have an impact on our lives. Watching how people, including myself, responded to the simultaneous attacks on social web sites revealed that we live in a vulnerable society that depends on just a couple of domains and IP networks.

I always ask myself a question when I see a global service disruption: isn't the Internet a distributed system with adequate redundancy? Unfortunately, the answer is no. Taking down a few systems of the social web will effectively paralyze people's whole network. Those systems have become a set of points of failure.

Regaining redundancy on the Internet is not easy. Making a redundant system with multiple layers of technologies is a very hard task. The designer needs to put redundancy into every layer of the system: host machines, identifiers (such as IP addresses and domain names), physical networks, logical/overlay networks, data (or objects), etc. Even simple replication adds a big cost to an existing system. Not many sites can afford this.

Nonetheless, we've got to do the replication ASAP anyway; the Twitter attack incident tells us that replication, either manual or automatic, of data or identifiers, is still critical for providing alternative routes once the popular systems are taken down.
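On the reader's side, replicated data and identifiers only help if a client actually tries the alternatives. A minimal Python sketch of that fallback logic follows; the function names `fetch_with_fallback` and `fake_fetch` and the host names are hypothetical, and the network access is simulated so that the example is self-contained.

```python
def fetch_with_fallback(sources, fetch):
    """Try each replica in order; return (source, content) from the first that works.

    fetch(source) is a caller-supplied function (hypothetical) that raises
    OSError when a site is unreachable.
    """
    errors = []
    for source in sources:
        try:
            return source, fetch(source)
        except OSError as exc:
            errors.append((source, str(exc)))
    raise RuntimeError(f"all replicas failed: {errors}")


# Simulated outage: the primary site is down, the mirror still answers.
def fake_fetch(source):
    if source == "primary.example.com":
        raise OSError("connection refused")
    return f"article served by {source}"


used, content = fetch_with_fallback(
    ["primary.example.com", "mirror.example.org"], fake_fetch
)
```

The mirror in this sketch lives under a different domain name on purpose: a copy that shares the same identifier as the primary shares its point of failure too.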

And I now feel much safer knowing that I've got my own Web site space in addition to external blogs that are out of my control. Diversifying data and identifiers by replication is the easiest, if not the only, way to deal with failures and attacks, which are inevitable in the hostile world of the Internet. Maybe I need to copy some of the articles I put on external sites into this blog too.

posted at: 08 Aug 2009 | path: /security | permanent link


Plagiarism or unauthorized quotations

I was checking the reports submitted to me a few days ago. One of the reports looked quite professional and eloquent. So I decided to pick a sentence and put it as is into Google. Then I discovered that most of the contents were copied, or plagiarized, from a single Web page of a professional article. No wonder it looked professional.

I'm always telling other people that you need to state the source of quotations when you put them into your content. I am a pro-share, pro-remix, and pro-reuse person, and I support Creative Commons. I've never been against quotations, provided that proper and legal indication of the sources is given.

But I didn't see the source in the report. So I had to give it a very low evaluation.

Lessons:

posted at: 30 Jul 2009 | path: /writing | permanent link


Prologue (3) - added plugins

Recent posts, categories, and monthly archives are statically generated by installing the plugins pyarchives.py, pycategories.py, and pyrecentposts.py.

Also, titles are given to individual articles by another plugin called pytitle.py.

(So many things to do just for some fancy stuff...)

posted at: 28 Jul 2009 | path: /admin | permanent link


Copyright 2009 by Kenji Rikitake. All Rights Reserved.

The contents are licensed under Creative Commons License Attribution 3.0 Unported (CC-BY-3.0).

Blog made with PyBlosxom.