by Jack Krupansky - Base Technology
Current version: March 4, 2007. Prior version: May 3, 2006.
This document, a proposal for Distributed Virtual Personal Data Storage (DVPDS), supersedes my previous proposal for a Distributed Virtual Personal Computer (DVPC). DVPDS includes all of the concepts of my previous DVPC proposal, but simply changes the name to emphasize the focus on the data storage aspects of a personal computer (PC) as distinct from the computing or processing capabilities of a PC. In particular, it abstracts the user's personal data to give it a virtual form distinct from the actual storage used to store that virtual data.
The intention remains that all of a user's data would live in a distributed, virtual form on the Internet, and that the user's device (PC, phone, or other computing device) merely caches the distributed, virtual data. The intention is that the user gets all of the performance and other benefits of local mass storage with none of the downsides, such as the need for backup, anxiety caused by lost or mangled data, inconvenient access from other machines, difficulty of managing archives, etc.
The intention is not that the user would "work on the Web", but to continue to emphasize higher productivity through rich client devices with instantaneous data access and full control of that data. In practice, users will usually work connected to the Web, but at times, sometimes frequently or for extended stretches, they may work disconnected from the Internet, all seamlessly and with no loss of the positive aspects of the user experience.
With regard to the requirements for being distributed, the emphasis is on maximum diversity so that users can be guaranteed that their data will be both readily accessible and protected from loss due to even the most extreme contingencies. Degrees of diversity include vendor, geography, communications backbone, and offline storage, so that no human error, fire, flood, earthquake, explosion, vendor financial difficulty, sabotage, theft, or legal disagreement can cause any of a user's data to become inaccessible for more than the shortest period of time. Particular emphasis is placed on avoiding vendor-specific solutions. Vendor "lock-in" is unacceptable.
One area that has come to need attention since my original proposal is the more demanding storage requirements of media such as music, video, podcasts, and movies, as well as intellectual property issues such as DRM.
This proposal is in the public domain. It may be copied and modified -- provided that Jack Krupansky and Base Technology are credited and a link back to this original proposal is provided AND these same use and distribution terms are carried along.
Please note that DVPDS is only a concept right now, with no implementation or business plan to turn the concept into a product and service.
The rest of this document is unchanged since its creation to describe the DVPC concept, but should be read as referring to the DVPDS concept.
Note: This concept paper is now frozen since I have accepted a full-time employment offer with Microsoft (unrelated to anything discussed in this paper) and will not be doing any more work on this concept as long as I am employed.
The basic concept of a Distributed Virtual
Personal Computer, or DVPC, is that you have your full-blown PC with hard-drive,
but the hard-drive is really just a cache, with all your data and settings being
redundantly stored (and mirrored and cached) across any number of servers on the net,
with diversity of both vendors and geography, all transparently.
The PC operating system would need to be "upgraded" so that file changes are written through the
local hard-drive cache to the distributed virtual drive on the net. If you
are occasionally disconnected, such as on a plane or in the mountains, the cached
changes would accumulate and then be incrementally written out to the net on
future connections. Changes could also be written to a USB hard-drive or flash
memory "drive". Your physical PC could act as one or more virtual PCs
via a logon, and more than one physical PC could be used to access each of your
virtual PCs. Files on a virtual PC could be shared as defined using some access
control scheme.
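The write-through behavior described above, including the accumulation of changes while disconnected, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a design: two in-memory dictionaries stand in for the local hard-drive cache and the distributed virtual drive, and the class name `WriteThroughCache` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WriteThroughCache:
    """Hypothetical sketch: a local cache in front of a distributed virtual drive."""
    remote: dict = field(default_factory=dict)   # stands in for the virtual drive on the net
    local: dict = field(default_factory=dict)    # stands in for the local hard-drive cache
    pending: list = field(default_factory=list)  # changes made while disconnected
    online: bool = True

    def write(self, path, data):
        self.local[path] = data
        if self.online:
            self.remote[path] = data             # write through immediately
        else:
            self.pending.append((path, data))    # accumulate until the next connection

    def read(self, path):
        if path not in self.local:
            self.local[path] = self.remote[path]  # fill the cache on a miss
        return self.local[path]

    def reconnect(self):
        self.online = True
        while self.pending:                      # incrementally flush, oldest change first
            path, data = self.pending.pop(0)
            self.remote[path] = data
```

A real implementation would sit inside the operating system's file-system layer and could also mirror pending changes to a USB drive; here `reconnect()` simply replays the queued writes in the order they were made.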
Now, here's the big benefit... you're happy with your PC, you're traveling with
all your important business presentations and settings and then... you drop your
PC or it's stolen or a virus deletes your files or maybe you've done a bunch of
edits and then realize that you deleted or mangled something that you now want
to bring back without necessarily losing your later edits. Sure, maybe you remembered to
create a backup disk and maybe you even remembered to bring it with you... or
maybe not. With DVPC, you could simply go up to any PC, log on and presto,
you're accessing your virtual PC with all your data and settings, without your
PC needing to be online or even operational (as would be required by "remote access"
tools). Sure, it may
take some time for the data to load into the local hard-drive cache, but with a
high-speed net, that shouldn't be a real problem.
I also envision that the DVPC "hosting services" would deliver a DVD image of
your virtual disk on a weekly or monthly basis (or overnight on an emergency
basis) so that you could walk up to any PC with your backup DVD and instantly be
up and running. Any recent changes would come from the net or from your USB disk
or flash memory drive.
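Restoring from an archive DVD amounts to taking the snapshot as a baseline and replaying any newer changes on top of it. The sketch below assumes a hypothetical change-record format of (timestamp, path, contents), with `None` marking a deletion, and assumes all timestamps come from a common clock; the function name `restore` is illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change record: (timestamp, path, new contents, or None for a deletion)
Change = Tuple[float, str, Optional[bytes]]


def restore(snapshot: Dict[str, bytes], snapshot_time: float,
            change_logs: List[List[Change]]) -> Dict[str, bytes]:
    """Rebuild the virtual disk from a DVD snapshot plus any newer changes.

    change_logs may come from several sources (e.g. the net and a USB drive);
    a change seen in more than one log is applied only once.
    """
    state = dict(snapshot)  # never mutate the archived snapshot itself
    newer = sorted(
        {c for log in change_logs for c in log if c[0] > snapshot_time},
        key=lambda c: (c[0], c[1]),  # replay in time order
    )
    for _, path, data in newer:
        if data is None:
            state.pop(path, None)
        else:
            state[path] = data
    return state
```

Changes older than the snapshot are skipped because the DVD already contains them.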
I also envision adding support for "smart versioning"
(or "continuous versioning") of files. Apps would do
auto-save and then the user could go back and view or restore an old version,
either by backing out changes or by selecting a time of day, a date, or some other criterion. You should be
able to close an app, restart the app and then hit undo, for example.
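One simple way to support viewing a file as of a given time is to keep every auto-saved version with its timestamp and look versions up by time. The `VersionedFile` class below is a hypothetical sketch that keeps full copies rather than deltas, an assumption made for clarity; restoring an old version is recorded as a new save so that later edits are not lost.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VersionedFile:
    """Hypothetical 'smart versioning' sketch: every save is kept, never overwritten."""
    times: List[float] = field(default_factory=list)
    versions: List[bytes] = field(default_factory=list)

    def save(self, when: float, data: bytes) -> None:
        # Auto-saves are assumed to arrive in non-decreasing time order.
        if self.times and when < self.times[-1]:
            raise ValueError("saves must be appended in time order")
        self.times.append(when)
        self.versions.append(data)

    def as_of(self, when: float) -> Optional[bytes]:
        """Contents as they stood at time `when` (None if the file did not exist yet)."""
        i = bisect.bisect_right(self.times, when)
        return self.versions[i - 1] if i else None

    def restore(self, when: float, now: float) -> None:
        """Bring back an old version as a new save, keeping later edits in history."""
        old = self.as_of(when)
        if old is not None:
            self.save(now, old)
```

Because nothing is ever overwritten, "undo after restarting the app" reduces to reading the previous entry in `versions`.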
I was thinking that the project could start using Linux and OpenOffice so that
all the OS and app changes could be made easily with minimal bureaucratic hassle, and then produce a reference
design that could be "licensed" to Microsoft for Windows and Apple for the Mac.
Summary of key aspects:
Oh, and this is just the start. The virtual PC would eventually include the ability to "background" a PC task so it runs on a remote server and can be foregrounded as/if desired. This virtual PC network execution environment would be an excellent basis for a software agent platform. And all this would tie in with grid computing as well. And a great platform for sharing and collaboration. And an easier way to manage a personal "web" site (e.g., simply put files in a "web" folder, set some attributes for each file and an "auto-web" feature would automatically generate a user-friendly web page for convenient access.)
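The "auto-web" idea could be as simple as listing the files in a folder and applying per-file attributes. In this hypothetical sketch, the function name `auto_web` and the attribute names `title` and `hidden` are invented for illustration; the proposal does not specify which attributes would exist.

```python
import html
from pathlib import Path
from typing import Dict
from urllib.parse import quote


def auto_web(folder: Path, attributes: Dict[str, Dict[str, str]]) -> str:
    """Hypothetical 'auto-web' sketch: build an index page for a web folder.

    `attributes` maps a file name to its per-file settings. The keys used here
    ('title', 'hidden') are illustrative assumptions, not a defined format.
    """
    items = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue  # only publish plain files, not subfolders
        attrs = attributes.get(path.name, {})
        if attrs.get("hidden") == "yes":
            continue  # the user chose not to publish this file
        title = html.escape(attrs.get("title", path.name))
        items.append(f'<li><a href="{quote(path.name)}">{title}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```

File names are URL-encoded for the link and titles are HTML-escaped, so a file named `trip notes.txt` titled "Trip & Notes" still produces a valid page.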
The DVPC focuses on your data, but can include software as well. Actually, you wouldn't want to include the actual software, but "links" to the software stored on separate software distribution servers.
One of the side effects of the whole DVPC model is that it really is "platform independent", to the extent that you store your data in compatible formats that can be read by application software on the different platforms. So, in theory, you could create and edit some text documents, spreadsheets, presentations, graphics, etc. using a Mac and then log on from a Mac or Linux PC and work with your files.
The DVPC is not a "network computer" as the traditional term is used, as in "thin client". DVPC is not designed to make PCs cheaper or more of a commodity. It's not designed to eliminate the hard disk. Rather, it is intended to complement the full features of the personal computing experience, while at the same time focusing on eliminating all the hassles of the local hard-disk without giving up any of the benefits of a local hard-disk. So, users will still go out and buy a high-end computer with a big fast disk and a fancy graphics card to enrich their personal computing experience, but with DVPC they will no longer have to worry about managing their hard-disk data.
The DVPC would enable and support accessing your virtual PC data using a PDA or smartphone. This would actually enable a large part of the "vision" of replacing your bulky notebook PC with a PDA or smartphone.
One of my motivations for the DVPC is to "solve" the backup and archive problems once and for all. Right now, every poor user has to "worry" about backup and many suffer undue anguish whenever something "happens" to their files. If backup is an inherent, transparent part of the computing environment, all these problems go away. Coupled with versioning, the user can simply say "Gee, I'll go back and look at what the file looked like yesterday morning".
In addition to the versioning of files that would be supported by the virtual hard-drive, I envision that the servers would have automated backup software that would in turn mirror the backups to other servers, including geographically dispersed servers, to ABSOLUTELY guarantee data integrity.
In addition to backup, I envision that periodic archives of the virtual hard-drive can automatically be written to DVD and those DVDs shipped off to physically secure sites. My basic DVPC concept is agnostic as to where to draw the line as far as what is considered a "basic" service and which services would incur extra fees. One convenient offshoot of this auto-archive feature is the weekly/monthly/whenever distribution of an archive DVD to the user so that they can conveniently move to any PC anywhere and instantly continue their work without any horrendous latency.
I believe that it is important for the virtual hard-disk to be distributed (as in mirroring) over multiple servers so that the user is not "stuck" if one server goes down or is bogged down by traffic. Real-world events like earthquakes, back-hoe "accidents", terrorism, and simple human error do happen. Rather than simply mapping the virtual hard-drive to a specific domain name, there needs to be a multi-domain lookup scheme so that the network access software will automatically select an alternate domain name if access is impeded. The delta upload process also needs a multi-path scheme plus a distribution mechanism so that changes can be distributed to all the mirrored copies asynchronously. I would surmise that four mirrored images of a user's virtual hard-disk would be the "norm". Disks are cheap, so the replication should not be a significant cost issue. Some users might want more replication to gain greater reliability, to guarantee better access when net traffic is high, or to exploit parallelism.
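The multi-domain lookup and asynchronous delta distribution could look roughly like the following. This is a sketch under assumptions: `fetch` and `send` are caller-supplied stand-ins for network operations that raise `ConnectionError` when a mirror is unreachable, the class name `MirroredDrive` is hypothetical, and each mirror keeps its own queue of pending deltas so that a slow or unreachable mirror does not hold up the others.

```python
from typing import Callable, Dict, List, Tuple


class MirroredDrive:
    """Hypothetical sketch: one virtual drive mirrored under several domain names."""

    def __init__(self, domains: List[str], fetch: Callable[[str, str], bytes]):
        self.domains = domains
        self.fetch = fetch  # stand-in for a network read; raises ConnectionError on failure
        self.outbox: Dict[str, List[Tuple[str, bytes]]] = {d: [] for d in domains}

    def read(self, path: str) -> bytes:
        last_error = None
        for domain in self.domains:  # try each mirror in turn
            try:
                return self.fetch(domain, path)
            except ConnectionError as err:
                last_error = err  # access impeded: fall through to the next domain
        raise ConnectionError(f"all mirrors unreachable for {path}") from last_error

    def queue_delta(self, path: str, delta: bytes) -> None:
        for queue in self.outbox.values():  # every mirror must eventually get every delta
            queue.append((path, delta))

    def drain(self, domain: str, send: Callable[[str, str, bytes], None]) -> None:
        queue = self.outbox[domain]  # each mirror catches up on its own schedule
        while queue:
            path, delta = queue[0]
            send(domain, path, delta)  # if this raises, the delta stays queued
            queue.pop(0)
```

For example, with four mirrors configured and only the second reachable, `read` falls through to it without the user noticing, while each mirror's queue drains whenever that mirror becomes available.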
I envision that something like a USB flash memory "key" or a biometric device
of some sort will be desired to assure rock-solid control of access. Or
maybe something as simple and old-fashioned as a small "one-time pad" of
passwords that you carry around either physically or on your PDA (e.g., system
displays a code word and you scroll down to that code word and enter the
corresponding password). My concept is agnostic as to the access control
method, but there are numerous ones that would be acceptable and more can be
invented in the future.
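The code-word scheme mentioned above resembles a one-time password list. The sketch below assumes the server keeps the pad in plaintext for simplicity, whereas a real system would store only hashes. `make_pad` and `PadVerifier` are hypothetical names; Python's `secrets` and `hmac` modules supply the randomness and the constant-time comparison.

```python
import hmac
import secrets
from typing import Dict


def make_pad(n: int) -> Dict[str, str]:
    """Generate a pad of n (code word -> password) pairs for the user to carry."""
    pad: Dict[str, str] = {}
    while len(pad) < n:
        pad[secrets.token_hex(2).upper()] = secrets.token_urlsafe(6)
    return pad


class PadVerifier:
    """Server side: display a code word, then accept its password only once."""

    def __init__(self, pad: Dict[str, str]):
        self.unused = dict(pad)

    def challenge(self) -> str:
        return secrets.choice(sorted(self.unused))  # code word shown to the user

    def verify(self, code_word: str, password: str) -> bool:
        expected = self.unused.get(code_word)
        if expected is None:
            return False  # unknown or already-used code word
        ok = hmac.compare_digest(expected.encode(), password.encode())
        if ok:
            del self.unused[code_word]  # one-time: never accept this entry again
        return ok
```

Deleting an entry after a successful logon is what makes each password single-use, so an overheard password is worthless afterwards.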
My intent is that the initial focus would be on making the consumer PC easy and
foolproof to use and manage, and then to extend the concept to business and enterprise
computing.
There's a lot more to it than that, but that's the basic idea.
What do you
think?
A full implementation of the DVPC concept requires the introduction of a Global Virtual Storage Network infrastructure. A mere storage network is a place where bits can be stored and accessed, hopefully with some amount of redundant storage such as RAID and mirroring. A virtual storage network is a little more like a P2P file sharing overlay network, where the focus is on what is being accessed as opposed to where it is stored. A global virtual storage network takes the concept to the global level, assuring global geographic diversity and enough redundant storage to assure access even during significant network outages.
And of course a global virtual storage network would support significant vendor diversity. A user might contract storage services through a specific vendor (like their PC maker or ISP), but the actual bit storage would be independent of such a front-end vendor. We need an economic model that will encourage and reward reputable vendors for making the significant investments to implement, operate, maintain, and enhance the components of such a global network.
There would need to be very robust auditing services which would monitor network services in a vendor-independent manner to assure the high quality level needed to absolutely guarantee consumers that their precious data will never be lost due to "a foul-up", "an honest mistake", "a deeply-regretted mistake", or things like acts of God, stupidities of governments and bureaucracies, or acts of sabotage and terror.
A key concept here is that, to a consumer, their data would be virtual bits. An individual storage vendor would manage physical bits or logical bits. At the higher levels of the network, those physical or logical bits would simply be partial implementations of the consumer's virtual bits. There may also be at least two levels of virtual bits: the "true" bits versus encrypted variants of the true bits.
There are many products that provide elements of DVPC such as backup, archive, file synchronization, file sharing, remote PC access, revision management, etc., but none provide the full scope of DVPC.
The DVPC is not about remote access to another physical PC such as is provided by pcAnywhere, GoToMyPC, LogMeIn, or Microsoft Remote Desktop Connection. The distinction between DVPC and remote desktop access is that all of your data is stored on the network as a virtual hard-drive and the local PC accesses that data through DVPC without the need to access another PC elsewhere on the net. Remote desktop access presumes two PCs and requires that the remote PC be online. Remote desktop access runs the applications remotely, so there is minimal opportunity to take advantage of any rich local hardware (e.g., real-time zooming of large graphic images, playing high-definition movies, etc.). With DVPC, there is no need for any physical PC other than the one you are currently using. With DVPC, you no longer have the problem of forgetting to copy file changes if you frequently work with multiple computers (e.g., work, home desktop, notebook, hotel business center, cyber cafe, client work site, etc.). Also, help desk personnel can be given controlled access to your virtual hard-drive (without the need for an online physical PC) to troubleshoot files and settings, which will then automatically download to your PC when you log on.
DVPC is not a tool for managing enterprise data centers, such as the products from VMware. DVPC would dovetail nicely with such "virtual infrastructures", but would not compete with them.
DVPC should not be confused with the traditional "virtual PC" such as Microsoft Virtual PC or VMware Workstation, which are designed to let you run multiple operating systems simultaneously on a single physical PC. They use the term virtual to mean multiple "virtual" PCs running simultaneously on a single physical PC and sharing its hardware. DVPC refers to having a "virtual" hard-drive out on the Internet and then "connecting" to it from any PC and working as if the virtual hard-drive were your local drive. More properly, DVPC lets you walk up to any DVPC-enabled physical PC and work with your data as if it were already on the local hard-drive.
Desktop Virtualization is a topic that is getting more attention in IT circles, but it doesn't address the needs of the lowly consumer. It is really about how desktop PC resources are used, and it doesn't address the remote storage and smart versioning concepts of DVPC.
DVPC by itself is not an interactive collaborative computing environment such as Alan Kay's Croquet. DVPC focuses on managing a user's persistent data -- the files and settings stored on the hard-drive. DVPC could also be thought of as a virtual file system. The DVPC concept is orthogonal to the user interface and collaboration concepts, meaning that DVPC is relevant to any user interface and any collaborative computing environment. So, you should be able to set up your desktop on one PC, logout, move to a completely separate PC, log back in and be using all the same settings and work with all the same data. Another way to think of DVPC is to separate the operating system and applications from the underlying data (and settings). In theory, you should be able to access the same data using different operating systems, different user interfaces, and different collaborative computing environments, assuming that conventions were established so that the different operating systems and applications don't use the same files for different purposes.
DVPC is not simply a way to backup and share files in a local/home network. Products such as the Mirra Personal Server (from Seagate) are nice for backing up, sharing, and remotely accessing files on your consumer/SOHO network, but don't at all address the problems of power failure, human error, fire, earthquake, flood, theft, and vandalism that can wreak havoc on priceless data and don't address the issues of data on your notebook computer while away from "home".
There are a lot of interesting products and services available for larger organizations, but many of them simply are not oriented towards clueless consumers who just want their PCs to work right out of the box with no fuss.
Allmydata is a hosted storage platform based on "grid storage technology". Has a novel plan for free storage if you agree to store data for other users on your hard drive, but has traditional hosted storage for a monthly fee. No vendor diversity. Unclear geographic diversity. No smart versioning.
Box.net is a hosted storage platform that offers "free online storage" (1GB) or $4.95/mo for 10GB. No vendor diversity. Unclear geographic diversity. No smart versioning.
Connected Data Protector (from Iron Mountain) provides automated backup for PCs, focused on the needs of large organizations.
AllenPort sounds quite similar to DVPC, but the description is too sketchy to be sure. One difference appears to be that AllenPort stores your data in one centralized location rather than distributing it for added protection, although they do claim to support mirroring; DVPC is inherently designed to be decentralized, with mirroring on top of that. They also don't support DVPC-style smart versioning. It's not clear to what extent they support disconnected operation and DVD-based operation.
BeInSync synchronizes files, such as between your home and office computers, but doesn't eliminate the need for backups, archives, smart versioning, geographic and vendor diversity, etc. It focuses on remote access as well.
Carbonite Backup for Everyone - backup over the Internet. Is consumer-oriented, but no support for vendor diversity or smart versioning. Still somewhat selective for backup and not an "out of the box" OEM solution.
DataPod seems more focused on synchronizing data between multiple computers. It's not clear how or if they support disconnected and DVD-based operation. They focus on their peer-to-peer synchronization, but offer no support for the distributed protection of DVPC, nor the ability to log on from an arbitrary PC. They also focus on a special web interface for managing files. And they lack support for smart versioning.
EasyReach Find and Workspace supports organizing, searching, and remote access to enable users to "instantly find any file or e-mail on their work or home computers." This is all good stuff, but doesn't address the issue of robust storage, vendor diversity, and versions.
FolderShare synchronizes and shares files. Acquired by Microsoft on November 3, 2005.
GDrive is a rumored upcoming capability from Google that would keep a copy of your hard-drive on Google's servers. This would violate the DVPC goal of vendor diversity. DVPC is a little bit more than just a remote copy of your hard-drive.
indi from InfoEither offers a way to carry around your "identity" in a USB flash drive. This might actually dovetail with DVPC as a form of identity, but doesn't address global virtual storage in a larger sense.
Live Drive is rumored to be Microsoft's answer to Google's rumored GDrive (or Google Drive) online storage service.
LiveVault (acquired by Iron Mountain on December 1, 2005) provides internet-based automatic backup for servers, focused on small and mid-sized businesses and corporations with remote offices. Iron Mountain calls this part of their "Distributed Data Protection Strategy" and the "Online Backup and Recovery Market".
.Mac iDisk, Sync, and Backup from Apple - an Apple-specific service for network storage, synchronization, and backup. No support for vendor diversity.
Memeo AutoBackup provides "Protect, Sync, and Share" for "Everything that Matters". Sounds like a decent backup solution, but still doesn't provide complete transparency or vendor diversity.
Mozy from Berkeley Data Systems is a hosted storage platform that offers "free remote backup" (2GB), $19.95/yr for 5GB, and other higher-end plans. Supports encryption, and automatic and incremental backup. No vendor diversity. Unclear geographic diversity. No smart versioning.
Omnidrive offers a "hosted storage platform." No support for vendor diversity. Unclear about geographic diversity. No support for smart versioning.
Permabit Permeon Compliance Store provides reliable long-term storage for fixed data such as required for regulatory compliance. Obviously focused on high-end corporate needs rather than on consumer needs.
The (Personal) Virtual Computer, a proposal by Henry Minsky at MIT, contains a number of interesting ideas, including virtual servers.
PowerFile permanent storage solutions - archival storage, permanent storage appliances. Not consumer-oriented. No vendor or geographic diversity. No support for transparent versioning.
RepliStor from EMC/Legato is a tool for maintaining a master copy of data on an organizational server and then pushing it out to other locations such as branch offices.
S3 from Amazon offers "storage for the Internet" and is "designed to make web-scale computing easier for developers" for $0.15 per GB/mo of storage used and $0.20 per GB of data transferred. Oriented towards Web Service developers. No vendor diversity. Unclear geographic diversity. No smart versioning.
Streamload is a hosted storage platform that offers "free online storage" (25GB but only 100MB monthly download limit), $4.95/mo for "unlimited storage" with 2GB monthly download limit or several higher-end plans. No vendor diversity. Unclear geographic diversity. No smart versioning.
Strongspace from Joyent is a hosted storage platform for backup, file sharing, and remote file access that offers a variety of storage plans, ranging from $8/mo for 4GiB up to $290/mo for 160GiB. No vendor diversity. Unclear geographic diversity. No smart versioning.
Sun Grid Storage Utility offers Sun Grid Remote Backup and Restore Service (RBR) and Sun Grid Remote File Vault (RFV) that "completely eliminate the risks and costs of internal backup and archival infrastructure and processes, while providing services that will scale with a customer's unique and evolving storage needs." Oriented towards IT sites. No vendor diversity. Unclear geographic diversity. No smart versioning.
Symantec's GoBack is more of a "system" utility than a transparent part of the system. They do allow you to "go back" to a prior state of your file system, but only to a limited extent. The ability to restore deleted or modified files is merely a "poor man's" approximation of DVPC's Smart Versioning that would allow you to simultaneously view any number of different versions from the history of the same file, all the way back to the point where you originally began using DVPC, for example.
TimeData Continuous Data Protection (CDP) from TimeSpring does do continuous backup, but doesn't have clear support for vendor and geographic diversity or transparent user-oriented version support. Seems more oriented to server-oriented IT shops than consumers.
Microsoft Windows OneCare Backup and Restore helps automate the file backup problem.
Xdrive is yet another approach to creating an Internet-based, sharable "drive" for your system, where you can store, back up, share, and access data, but on a more explicit basis, without the transparency and multi-vendor, multi-server capabilities of DVPC. One of the biggest drawbacks of Xdrive is that it is a single-vendor, single-server solution, so users are exposed both to the possibility that the company might go out of business or be acquired by a business that lets service quality degrade, and to the vulnerability of its servers to earthquakes, terrorism, and sabotage.
Please contact us with any questions or comments.
Updated: April 14, 2007 10:10:14 PM -0400
Copyright © 2007 John W. Krupansky d/b/a Base Technology