Interesting to see BSP rising up after 20 years. See http://www.quora.com/BSP and tinyurl.com/mikes-thesis.
In “Who are You?” (http://mfktech.wordpress.com/2011/05/13/who-are-you/) I argued that we need a means to authenticate that your online identity matches a real-life identity.
Here is a great example of the problem: http://www.cnn.com/2011/11/15/tech/social-media/salman-rushdie-facebook-name/index.html?hpt=hp_bn6. Salman Rushdie found it difficult to prove his identity sufficiently to Facebook to be considered “Salman Rushdie” on Facebook. Interestingly, not only did they want him to prove his identity (ultimately with a copy of his passport), but also to require him to use the name on the passport (Ahmed is technically Mr. Rushdie’s first name but not the one he goes by).
There are a number of interesting points to this:
- Facebook is becoming a de facto primary authentication service. Is there anything wrong with that? The fact that it requires membership in Facebook is probably the biggest objection I have. The current Facebook authentication mechanisms are certainly weak. Would you want to authenticate with Facebook to access your bank account?
- Authentication credentials and user name need not be the same. Many people go by middle names, pseudonyms, noms de plume. Your authentication credentials may be something very different from a user name (though that may be part of it). SecurID is an example of this: your PIN/revolving code need not have anything to do with your username.
Here is a powerpoint I presented at MIT. It describes examples where distributed computing and network coding can work together. It includes Battlefield Logistics, Video Caching, and Nanobots.
When are companies going to learn that my privacy should be the default, something I have to opt out of rather than opt in to? OnStar takes the opposite approach and should revise its policy.
Well, Hurricane Irene has come and gone: Rain, wind, a disrupted vacation, power out for a day. All-in-all, not too bad an experience for our family. My heart goes out to those who were more dramatically affected by the storm.
What I want to talk about here is why Hurricane Irene demonstrates that the “Cloud” will become ubiquitous (and we’re not talking cumulus here).
It just so happens that at the height of Irene, a customer of mine was having some issues with deploying a software solution for distributing video content. With no power, no landline, no cable connection I was able to do the following:
- Fire up the old generator and get enough circuits active in my house so that I could run a laptop.
- Tether my laptop to my iPhone so I could have Internet access.
- Launch a couple of instances on Amazon EC2
- Download the latest release from Subversion hosted on Codesion.
- Build the software
- Deploy the test system through our automated Amazon test system
- Send E-mail that the problem is solved
The only local resources are my phone and a laptop, electricity and cell connectivity. The laptop is only for connecting to my real computing environment in the cloud. See http://mfktech.wordpress.com/2011/03/11/calling-all-human-interface-engineers-or-why-honeycomb-might-be-a-bad-idea/ for a discussion of what the ultimate client might be: It is not the laptop and it may not be a tablet either!
The upshot of this is that with fairly minimal resources (a generator, a cell phone), I could do all the work that I could with my “fully connected” office, except maybe play video. With LTE, that limitation will give way as well.
The “extreme mobile” worker is not new. What is newer is that:
a) Connectivity is becoming more ubiquitous
b) The availability of “cloud” resources is growing
c) Tools are now web-enabled (and HTTP is becoming the lingua franca for transport – see http://mfktech.wordpress.com/2010/10/26/http-as-a-storage-transport/)
In short, it is easier to be a Hurricane Worker.
What still needs to be done?
- Connecting to the cloud is still fairly clumsy. I authenticate a different way to each service: name/password for E-mail, public key authentication for my Amazon instances, a different name/password for my Codesion account. What we really need is the ability to launch windows just like I do on my laptop (or tablet for you iPad types), and have all the cloud brokering happen in the background.
- Lower costs. In two weeks I burned through 1.75GB of data across my tether. Over 2 GB and I pay surcharges. While data costs will undoubtedly drop as the LTE infrastructure is capitalized, the proliferation of WiFi also goes a long way to bringing down costs. But again, it is a hodgepodge of infrastructures with different use policies and logon mechanisms.
- LTE may make the performance issue moot; it will be about as fast as “normal” cable, though not as fast as 100 Mbps Coax (at least not for a while) or 1 Gbps Optical (e.g., FIOS). It will be up to the telcos to see how much they gouge the customer (At 10x the speed do I reach my GB/month limit 10x faster?).
- Better battery life! I long for the day when you can work for 24 hours on a single charge (or maybe a single refill of a micro-fuel cell).
- Integration between the local state of my computing device and the cloud would be nice. Ultimately all the state may be in the cloud. Today I still need to edit “stuff” locally and upload (Google Docs notwithstanding). We want to get to the point where there is virtually no state locally.
This last point is particularly challenging when you consider how easy it is to put state on a device (e.g., a 32 GB iPhone can hold a lot of state). What we really want is to make that state a bidirectional cache of the cloud state. That is a blog entry for another day.
I went from the Town of Harvard Oxbow boat landing on the Nashua River and went just north of the Rt. 2 overpass in Ayer. It is a short trip (under 2 hours one way, including a side trip here and there). On one side of this stretch of river lies the former Fort Devens (and an area still owned by the government if the No Trespassing signs are to be believed). I think the old Tank Road parallels this. The other bank is the Oxbow Nature Refuge.
The Nashua is a slow-moving river in this stretch (it isn’t much more work to paddle upstream than it is to paddle downstream). I had the river to myself.
What made this such a fantastic paddle was the range (and quantity) of wildlife and plantlife I saw. The banks are rich with arrow-root, cattails, small purple flowers, red berries, ferns, and moss climbing over fallen trees that litter the waterway. The water clarity is not great but in one spot I could see a school of 9″ small-mouth bass swimming slowly upstream. Where the current seems to be non-existent on the surface, it is more evident near the bottom where you see fish and grasses bending to it.
Dragon and Damsel flies were everywhere. To the uninitiated: Damsel flies are dragonflies that hold their wings together vertically when they alight; dragonflies keep their wings outspread. The damselflies are often iridescent. I saw stunning green and blue ones. One damselfly rested on my hand. Every now and then you would see a dragonfly dipping to catch an insect on the surface of the water. Here and there you would see huge masses of tiny water-striding insects that the kayak just glided over, hardly disturbing them.
Upstream from the bass a duck noisily flapped across the water just in front of me, waddling out on the opposite shore. Looking from where she came I could see a brood of ducklings; the duck was clearly trying to draw my attention from her family.
From there I saw three large great blue herons sunning themselves on a fallen tree. These would be three of half a dozen that I would see on this trip.
Just before reaching Route 2, I slipped into a very shallow pond that connects to the river (this is part of the large wetland you see on your left as you travel west on Rt. 2 past the Shirley exit). Another pair of enormous Great Blue Herons flapped away. The shore was peppered with what I think are killdeer (a type of shore bird…though more typically associated with beaches). A number of painted turtles plopped into the water as I passed by. Schools of minnows swam by.
Returning to the river, I passed a gaggle of Canada geese. Just past the Rt. 2 bridges I saw a lone mute Swan. I wondered what had happened to its mate.
Also in this area was a pair of Kingfishers with their distinctive white collar, crest, bill and swooping flight.
Finally, on my way back I rounded out the animal kingdom with the only mammal I saw on the trip: A beaver crossing the river.
Despite a 90+ degree day, paddling on the river was quite pleasant. Much of the time I was in the shade and the water tended to keep it a little cooler.
Truly an amazing venture practically in my backyard!
Recently I participated in a brainstorming session on distributed communication and computation. One of the challenges we had was to figure out what the use cases for this type of technology were.
Imagine a (large) group of autonomous agents geographically dispersed within some region (e.g., on the ground). These agents are mobile, and may communicate with one another. Above the agents (e.g., in the air) are communication “collectors.” All of the agents may communicate with the collectors that are within some proximity. The collectors are also mobile so they come in and out of communication range of the agents on the ground. These collectors can consolidate information and pass it between themselves or perhaps to other receivers.
Communication may be bi-directional. Agents may receive information (instructions) from collectors which may receive it from their receivers.
None of this communication is particularly reliable: it is certainly noisy and may be subject to deliberate interference. It should also not be susceptible to eavesdropping.
The information collected may be used, among other things, to assess provisioning of resources, identify problems in the distribution of resources, etc. While the information transferred from the agents need not necessarily be complete, instructions to a given agent do need to be accurate.
An obvious example of this scenario is battlefield logistics: soldiers, drones, command center. Other examples might include nanobots in the human bloodstream being sensed by external monitors (that are not necessarily fixed). One can imagine sensors flowing through a network of pipelines that are being detected by planes flying over them. Many logistics problems fall into this category (imagine shipping containers being transported from one side of the world to another). Certainly these examples stress different aspects of the scenario: nanobots in a clinical setting are probably less worried about malicious interference and eavesdropping, where this may be an issue for the shipping container owner who wants to ensure that competitors are not getting competitive advantage by intercepting communication.
Here are some illustrations of this scenario:
What kind of computation might we need to perform with this scenario? Suppose we use our nanobot example. A nanobot can tell you the shape of the space near it, where it is to very fine resolution, and the time to millisecond resolution or better (it may use external signals to determine this information but as we know, these external signals may be poorly received). The nanobot can also signal information (not necessarily electromagnetically). Finally, it is possible to guide nanobot “clumps” to concentrate them in certain areas.
Finally, it may be that when there is a concurrence among the nanobots that they have found a tumor cell, they release a toxin inside the cell to kill it.
We may be dealing with many millions of nanobots distributed throughout the body of the person injected with them. From the information received from the nanobots we can do imaging of the area they are in, extent of tumor spread, constituency of the tumor, etc.
The next scenario is from http://mfktech.wordpress.com/2010/10/12/better-mobile-video-delivery/. Content is being delivered to a consumer that is in motion passing by cell towers. The consumer is not in proximity of a single tower during the transmission; many towers may need to have the content to transmit. Access to the content is not necessarily linear. For example, imagine someone viewing a movie and rewinding, jumping to scenes, etc.
The simplest approach is to have a central repository of the content and to feed all requests to that repository. This can involve a lot of “back-haul”, network traffic traversing from the handset all the way potentially to web sites on the internet (think NetFlix or Hulu).
The next step to improve this is to place caches closer to the consumer, at the towers. Now as a consumer passes from one tower (cache) to another (cache), it will pick up the content from the new tower. This works well if there is more than one user accessing the same piece of content.
The problem with caching the content at the towers is that each cache needs the entire content. With a coding approach, this need not be true. Instead, we code against a group of towers that will simultaneously transmit “chunks” of the cached content. As towers fade from reception, the fragments can be recovered at forward towers. The net capacity needed at the towers is reduced, and the bandwidth to put the “chunks” in place is reduced: A win-win combination.
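To make the coding idea concrete, here is a minimal sketch under a big simplifying assumption: the code is a single XOR parity across three towers, which is almost certainly simpler than any scheme a real deployment would use. The content is split into halves A and B; the towers hold A, B, and A xor B. Each tower stores only half the content, yet any two towers are enough to rebuild all of it. The names Fragment, encodeForTowers, and recover are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration: one coded fragment as cached at one tower.
struct Fragment {
    int tower;          // 0 holds A, 1 holds B, 2 holds A xor B
    std::string bytes;  // payload, half the size of the (padded) content
};

// XOR two equal-length byte strings.
std::string xorBytes(const std::string& x, const std::string& y) {
    assert(x.size() == y.size());
    std::string out(x.size(), '\0');
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = static_cast<char>(x[i] ^ y[i]);
    }
    return out;
}

// Split the content into halves A and B (zero-padding odd lengths)
// and produce the three fragments to place at the three towers.
std::vector<Fragment> encodeForTowers(const std::string& content) {
    std::string padded = content;
    if (padded.size() % 2 != 0) padded.push_back('\0');
    const std::size_t half = padded.size() / 2;
    const std::string a = padded.substr(0, half);
    const std::string b = padded.substr(half);
    return { {0, a}, {1, b}, {2, xorBytes(a, b)} };
}

// Rebuild the content from any two distinct fragments.
// originalSize is used to strip the padding byte, if any.
std::string recover(const Fragment& f1, const Fragment& f2,
                    std::size_t originalSize) {
    assert(f1.tower != f2.tower);
    const Fragment& lo = f1.tower < f2.tower ? f1 : f2;
    const Fragment& hi = f1.tower < f2.tower ? f2 : f1;
    std::string a, b;
    if (lo.tower == 0 && hi.tower == 1) {         // have A and B directly
        a = lo.bytes;
        b = hi.bytes;
    } else if (lo.tower == 0 && hi.tower == 2) {  // B = A xor (A xor B)
        a = lo.bytes;
        b = xorBytes(lo.bytes, hi.bytes);
    } else {                                      // A = B xor (A xor B)
        b = lo.bytes;
        a = xorBytes(lo.bytes, hi.bytes);
    }
    return (a + b).substr(0, originalSize);
}
```

A handset that hears any two of the three towers can call recover on their fragments. Compared with caching the full content at each tower, this toy layout stores 1.5 copies in total instead of 3.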
With the use of coding, it is also possible to provide some content protection as well. Instead of encoding the content, the coefficients for the coding can be encrypted. It may be possible to reduce the number of copies that we need and still have strong encryption results. This requires some more thought.
We haven’t spoken much about computation in this paper. Clearly there are logistic optimization algorithms that might apply to the battlefield and container transport examples. The nanobots might require local consensus algorithms to decide where they are and if they have found a target cell. An interesting question is how to apply inexact algorithms: Algorithms designed to work with partial information (or inexact information) distributed among many agents. Another interesting area is distributed algorithms that work with local information to create global effect. For example, the way ants create trails optimizes the amount of distance traveled without global communication.
In this example, ants see some number of ants in front of them and end up “optimizing” the path.
This short paper has tried to motivate distributed computing, communication, and storage with a number of use-cases. We are only scratching the surface.
How is it possible that the default setting for this is “globally accessible”? What were these guys thinking!?!
The whole attitude of everything open everywhere will have to change:
Fitbit users are unwittingly sharing details of their sex lives with the world
I’ve recently been thinking more about user authentication.
Most authentication systems work to identify that you are who you said you were when you first registered with a system. For example, I might register with Yahoo to get a My Yahoo page. I can represent myself as virtually anyone, and, once those credentials are established, authentication is really just verifying that the credentials I present (a username/password) match what was originally established. Various schemes make this authentication stronger (e.g., two factor authentication like SecurID). As a means for authentication, this is not inadequate (damning with faint praise).
But that is not a very robust mechanism if we want to share information. Suppose I want to share information with my doctor. Do I send email to email@example.com? Is it dr.pimple or DrPimple? How am I sure that Dr. Pimple even has an online presence? How are we going to authenticate people who we are relying on to be trustworthy? Remember, it was not that long ago that http://www.whitehouse.com was definitely not what you would expect when you meant to type http://www.whitehouse.gov (see http://en.wikipedia.org/wiki/Whitehouse.com).
Similarly, how will I validate that I am a resident of my town when I vote online from the comfort of my home?
Suppose my children’s school finally gets around to online emergency contact information. I don’t want everyone to access this information; I may just want the school nurse and principal to be able to access the information. How am I sure that the online entity I’m granting access is the actual person represented by the online entity?
In “real life,” we accomplish this by familiarity with the person, by consulting a reliable source of information (e.g., the phone book), a referral from your physician, etc.
What needs to happen in the Internet world if we are to begin sharing important information (e.g., medical records, financial records, personal data), is that we are going to need to trust that the online entity we are dealing with is a genuine proxy for the real-life entity. I can imagine the establishment of registries where identities are validated and available for use in E-mail, Facebook, etc.
We can take this a step or two forward. The quality of the authentication can be established as well. For example, if the person (company, real-life entity) can be authenticated strongly, e.g., an agency actually speaks to the person, verifies by objective third-party registry (think physician licensing with a state agency), then the quality might be marked “high”. A self-registered entity with only a given E-mail might be marked “low”.
You can take this one further and have users “grade” the authenticity of an entity. Large scale negative grades are useful. Grading is always tricky though: there needs to be accountability for the grade, i.e., anonymous grading is not acceptable and the quality of the authenticity of the grader needs to be accounted for. Nevertheless, you can imagine a system where the community helps to “clean out” nefarious entities on the web.
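One way to picture how grader quality could be accounted for is the sketch below. It is only an illustration under assumptions of my own: the two quality levels are the “high” and “low” named above, the 3-to-1 weighting is arbitrary, and the names Quality, Grade, weightOf, and authenticityScore are hypothetical. Anonymous grades (no grader identity) are ignored, and each remaining grade counts in proportion to how strongly the grader was authenticated.

```cpp
#include <cassert>
#include <string>
#include <vector>

// How strongly an identity was verified by the registry.
enum class Quality { Low, High };

// One user's opinion about whether an online entity is genuine.
struct Grade {
    std::string graderId;   // empty means anonymous, which is not accepted
    Quality graderQuality;  // how well the grader is authenticated
    bool authentic;         // true: grader believes the entity is genuine
};

// Assumed weights: a strongly authenticated grader counts three times
// as much as a self-registered one. The ratio is arbitrary.
int weightOf(Quality q) {
    return q == Quality::High ? 3 : 1;
}

// Weighted fraction (0.0 to 1.0) of accountable grades that say the
// entity is authentic. Returns -1.0 when no accountable grade exists.
double authenticityScore(const std::vector<Grade>& grades) {
    int total = 0;
    int positive = 0;
    for (const Grade& g : grades) {
        if (g.graderId.empty()) continue;  // skip anonymous grades
        const int w = weightOf(g.graderQuality);
        total += w;
        if (g.authentic) positive += w;
    }
    return total == 0 ? -1.0 : static_cast<double>(positive) / total;
}
```

A registry could flag an entity for review when its score drops below some threshold. Where to set that threshold, and how to keep the weights from being gamed, are part of the open questions here.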
Now, for a question I can’t quite answer right now: How do you offer this service? Can you generate money from it and still have it viewed as objective? How does it fit in the everyday workflow of E-mail, Social Media, etc.? Great questions for another day.
Recently I’ve been working with some students at MIT on organizing computation in a distributed network. We are implementing a simulation of nodes and communication paths that will carry out a computation. One question that came up was how to represent, under the covers, the parallel execution of these nodes in the simulation.
In fact the nodes typically are tiered…one set of nodes computes and sends results to another tier. Loops may exist where nodes send messages to nodes in upper tiers.
The basic model can be categorized as a set of nodes that carry out a single method, generating results, and sending that result to a new set of nodes (which may contain some of the original nodes) where the process continues. Here is a diagram.
This goes back to my thesis (1992, Harvard University, Parallel Sets: An Object-Oriented Methodology for Massively Parallel Programming – http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.4118&rep=rep1&type=pdf). We create nodes, throw them in a set, and Apply a method to these nodes. The method can return the nodes they have sent messages to and this creates a new parallel set. The process repeats until done (which might be when there is only a single element in the resulting parallel set).
The code is very nice:
ParallelSet<Node> *r = inputSet->Apply(&Node::compute);
This applies the method compute to every member in the inputSet in parallel (behind the scenes, using multiple threads). The results of compute (in this case, the next tier of nodes), are combined to create a new parallel set. This can be used for the next wave of computation.
To keep the computation reasonable (i.e., you can figure out what it is doing), there is an implicit barrier synchronization at the end of the Apply. You do not get maximum parallelization, but you can reason about what the computation is doing (see my thesis).
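For readers who want to see the shape of the idea, here is a minimal sketch of an Apply with a barrier at the end. It is not the code from the SourceForge project or from the thesis: it is a stand-in written with std::async, it returns the new set by value rather than by pointer, and the Node type with its parent and inbox fields is hypothetical. Each member's method runs as its own task, and collecting every result before returning plays the role of the implicit barrier.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <set>
#include <vector>

// Hypothetical stand-in for a parallel set; not the parset project's code.
template <typename T>
class ParallelSet {
public:
    // A method applied to a member returns the members it sent messages to.
    using Method = std::vector<T*> (T::*)();

    void insert(T* item) { items_.insert(item); }
    std::size_t size() const { return items_.size(); }
    bool contains(T* item) const { return items_.count(item) != 0; }

    // Run `method` on every member, one asynchronous task per member,
    // and merge the returned targets into the next parallel set.
    ParallelSet<T> Apply(Method method) const {
        std::vector<std::future<std::vector<T*>>> pending;
        for (T* item : items_) {
            pending.push_back(std::async(
                std::launch::async,
                [item, method] { return (item->*method)(); }));
        }
        ParallelSet<T> next;
        // get() blocks until a task is done, so leaving this loop means
        // every member has finished: this is the barrier.
        for (auto& result : pending) {
            for (T* target : result.get()) next.insert(target);
        }
        return next;
    }

private:
    std::set<T*> items_;  // a node reached by several senders appears once
};

// Hypothetical node for a sum reduction: it adds what it holds to its
// parent's inbox and reports the parent as the next node to run.
struct Node {
    int value = 0;
    Node* parent = nullptr;
    std::atomic<int> inbox{0};  // atomic: siblings write to it concurrently

    std::vector<Node*> compute() {
        if (parent == nullptr) return {};  // root: nothing left to send
        parent->inbox += value + inbox.load();
        return {parent};
    }
};
```

Calling Apply(&Node::compute) on a set of leaves yields the set of their parents, and repeating the call walks up the tiers until the result is empty. Spawning one task per member is the simplest thing that shows the barrier; a version meant for large sets would presumably hand chunks of the set to a fixed pool of threads instead.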
A very simple prototype of this is available through sourceforge.net. See https://sourceforge.net/projects/parset/. It currently works for smallish sets (< 1000 items), and for some reason doesn’t work on MacOS. Linux and Windows/Cygwin seem to be fine.
If you are interested in this code, drop me a line (firstname.lastname@example.org). I will be fixing it up as I have time. Immediate tasks are: Make it scale to large numbers of items (easy), and allow nested parallelism (objects in parallel sets that reference other parallel sets). That’s a little trickier.
And remember my post a ways back about the need for new models of synchronization and multi-threading (http://mfktech.wordpress.com/2010/11/18/concurrent-programming/). Well here we are with one example.