A Data-Driven (REST) Approach To Distributed, Concurrent Software In The Enterprise: George Chrisanthakopoulos & Henrik Frystyk Nielsen

George Chrisanthakopoulos and Henrik Frystyk Nielsen gave a presentation on the work they have been doing on the data-driven approach to distributed software, the architectural challenges experienced by enterprise customers, and combinations of actors and relationships in the software and services world. Following the presentation, they fielded questions from audience members.

Henrik Frystyk Nielsen: Welcome to the event, and to rainy Seattle. You may have come across our content on Microsoft.com, or the white papers we’ve been writing for the past two years. Here is where we come from: we look at combinations of actors and relationships in the software and services world, and we use that to drive content about the architectural challenges our customers experience.

George Chrisanthakopoulos: I’m here to talk about the runtime and the actual application model. People have started using the runtime in really high-end applications, and the rest of Microsoft is, in some ways, coming to believe in the same model. You need observable, simple things. But you can’t just ship plumbing; you need to ship a model.

<Why Data-Driven?>

Where our approach differs from web services is that a service is a living document. That document could be a JPEG or an XML structure, and it becomes a very elegant model to communicate with. We don’t just say it’s a living document; we also rewrote the code that makes it possible to program against. We have one fairly simple specification that says what is state and what isn’t. It’s one thing to be able to send messages; it’s another to bind dynamically in a type-safe way and just declare your intent.

<Why Robotics?>

Robotics is very challenging, and large-scale back ends need similar facilities, as do people writing drivers. All of them need to coordinate asynchronous I/O and similar work.

<Software Challenge!>

These undergraduate students decided to use our software. There are some pretty big message loads, a lot of coordination is required, and the system needs a high degree of robustness and isolation.

<Microsoft Robotics Studio>

At the core you have the runtime, with the CCR (Concurrency and Coordination Runtime) for coordination. We also provide authoring tools, which again are not robotics-specific; they let you express a data-flow model visually. We also ship services so people can understand the patterns. You buy this with a commercial license, which costs essentially $2 per deployment. It’s a very hard domain to crack, but we seem to be making headway. We have also released additional packages.

<Diversity Of Applications>

Robotics is the first thing we’re targeting, but we did get picked up for financial trading and analysis. People want to use us for scientific modeling on a very large scale, for sensor networks, and for large-scale web application processing; MySpace is running on this stuff right now. It is also used in academic curricula and research.

<Authoring Tools>

Part of our tooling is the Visual Programming Language. You discover what’s there, drop it in, and create connections, so you can build a complex diagram that spans machines. The underlying protocol can make each interaction concurrent or exclusive. We also have an industrial-strength simulation environment.

<Runtime Environment>

We are a completely managed framework, and we can run on cell phones and PDAs. Underneath is a hosting environment, so you can start and stop hundreds of thousands of services within a process and then transfer their state. You can go to any layer in the infrastructure and observe it; we built that in from the beginning.

<Runtime Goals>

The distributed application model is DSS (Decentralized Software Services). It is fast, flexible, service-oriented, and secure by default. The concurrent programming model is the CCR, which hides traditional threads and locking primitives. We also make innovative use of language features: when you have a data dependency, the code can look sequential, and we figured out how to do that with C#. It’s programmable, and the code can look very concise.
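
The claim that code with a data dependency “can look like sequential code” can be illustrated with Python generators, which play a role similar to the C# language features mentioned above. This is a minimal single-threaded sketch under our own assumptions: `Port` and `Scheduler` are hypothetical names, not the CCR API, and the real runtime dispatches work across threads rather than from one loop.

```python
from collections import deque


class Scheduler:
    """Runs generator-based tasks; each task yields the Port it wants to read next."""

    def __init__(self):
        self.ready = deque()  # (task, value to resume it with)

    def spawn(self, task):
        self.ready.append((task, None))

    def run(self):
        while self.ready:
            task, value = self.ready.popleft()
            try:
                port = task.send(value)  # run the task until it waits on a port
            except StopIteration:
                continue  # the task finished
            port.register(task)


class Port:
    """A queue of messages plus the tasks waiting for the next message."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.messages = deque()
        self.waiting = deque()

    def post(self, message):
        self.messages.append(message)
        self._match()

    def register(self, task):
        self.waiting.append(task)
        self._match()

    def _match(self):
        # Pair queued messages with waiting tasks and hand them to the scheduler.
        while self.messages and self.waiting:
            self.scheduler.ready.append((self.waiting.popleft(), self.messages.popleft()))


def read_then_sum(sensor, results):
    # Reads like sequential code, but each `yield` gives up control until data arrives.
    first = yield sensor
    second = yield sensor
    results.append(first + second)


scheduler = Scheduler()
sensor = Port(scheduler)
results = []
scheduler.spawn(read_then_sum(sensor, results))
sensor.post(3)
sensor.post(4)
scheduler.run()
print(results)  # [7]
```

No thread blocks while `read_then_sum` waits: the task is simply not resumed until a message is matched to it.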

<Scale-Up And Scale-Out>

Failures can happen anywhere around your system, and you often don’t know the right level at which to deal with them. Here you can go to any level. We also tell you how to build applications.

<DSS App Model>

An application can be controlled or autonomous. The service is the basic unit.

<Service Interactions>

Instead of a huge variety of protocols, we have one specification, and it gives you management. We think we are augmenting the REST model. When queries are satisfied, messages flow across the network.
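
One way to picture “one specification” is a service whose entire interface is a small, fixed set of verbs applied to its state document. As we understand it, the DSS protocol (DSSP) defines operations along the lines of Get, Replace, Update, Query, and Subscribe. The Python sketch below implements only four of them, and the class name, URI, and query-as-predicate convention are hypothetical.

```python
import copy


class DocumentService:
    """Hypothetical service: a URI, a state document, and a fixed set of verbs."""

    OPERATIONS = ("GET", "REPLACE", "UPDATE", "QUERY")

    def __init__(self, uri, state):
        self.uri = uri
        self.state = state

    def handle(self, op, body=None):
        if op not in self.OPERATIONS:
            raise ValueError(f"unsupported operation: {op}")
        if op == "GET":
            return copy.deepcopy(self.state)  # a snapshot, never a live reference
        if op == "REPLACE":
            self.state = copy.deepcopy(body)  # swap the whole document
            return None
        if op == "UPDATE":
            self.state.update(copy.deepcopy(body))  # merge selected top-level fields
            return None
        # QUERY: body is a predicate applied to each record in the document
        return [copy.deepcopy(r) for r in self.state.get("records", []) if body(r)]


svc = DocumentService(
    "http://localhost/sensors",  # hypothetical URI
    {"records": [{"id": 1, "value": 5}, {"id": 2, "value": 12}]},
)
print(svc.handle("QUERY", lambda r: r["value"] > 10))  # [{'id': 2, 'value': 12}]
```

Because every service answers the same verbs, generic tools can talk to any service without service-specific client code. That is presumably the management benefit the speakers point to.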

<CCR Programming Model>

The CCR is actually separate from DSS, and people are deploying it right now for financial services. We’re about coordination, not concurrency; coordination is our focus, and we try to make it very performant. We provide isolation by looking at your code and cloning data structures, so you can modify them and nobody else sees the side effects.
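
The isolation idea can be shown with a port that clones every message as it is posted. The sender can then keep mutating its own data without the receiver ever seeing the change. The talk says the CCR decides what to clone by inspecting your code. This sketch makes the simpler, assumed choice of deep-copying everything, and `IsolatingPort` is a hypothetical name.

```python
import copy
from collections import deque


class IsolatingPort:
    """Hypothetical port that clones each message so sender and receiver share no mutable data."""

    def __init__(self):
        self._queue = deque()

    def post(self, message):
        self._queue.append(copy.deepcopy(message))

    def receive(self):
        return self._queue.popleft()


port = IsolatingPort()
reading = {"angles": [0, 90, 180]}
port.post(reading)
reading["angles"].append(270)  # the sender keeps working on its own copy
received = port.receive()
print(received["angles"])  # [0, 90, 180]
```

Deep-copying every message is the expensive end of the trade-off. A runtime that can prove a type is immutable can skip the copy, which is presumably why the CCR inspects the code rather than cloning blindly.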

<Anatomy Of A Service>

A service has an identity URI, and you separate its state. Instead of a black box or a bunch of APIs, you have a document. You can load that document from a remote web address to initialize the service. Then the infrastructure really helps you, since everything is about these independent operations that happen. For example, you can declare that a handler modifying three things at once must run exclusively.
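
Declaring a handler exclusive can be approximated with a reader/writer gate. Handlers marked concurrent may overlap, while an exclusive handler, such as one that modifies three joints at once, runs only when nothing else is running. In the speakers’ system this is declared rather than coded by hand. The Python `Interleave` class and the arm service below are hypothetical stand-ins, and this simple gate can starve exclusive handlers under a constant stream of readers.

```python
import threading


class Interleave:
    """Concurrent handlers may overlap; an exclusive handler runs alone."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def run_concurrent(self, handler):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            return handler()
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    def run_exclusive(self, handler):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            return handler()
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


class ArmService:
    """Hypothetical service whose state has three joints that must change together."""

    def __init__(self):
        self.state = {"shoulder": 0, "elbow": 0, "wrist": 0}
        self.gate = Interleave()

    def get(self):
        return self.gate.run_concurrent(lambda: dict(self.state))

    def move(self, delta):
        def apply():
            for joint in self.state:
                self.state[joint] += delta  # three fields modified as one unit

        self.gate.run_exclusive(apply)


arm = ArmService()
consistent = []


def mover():
    for _ in range(200):
        arm.move(1)


def watcher():
    for _ in range(200):
        consistent.append(len(set(arm.get().values())) == 1)


threads = [threading.Thread(target=f) for f in (mover, mover, mover, watcher, watcher)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(arm.get(), all(consistent))
```

Readers never observe a half-applied move, because `move` only runs when no reader holds the gate.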

<Services Benefits>

Services are extremely responsive and very robust. All this architecture for cloning gives us scaling, and at the same time the system stays incredibly robust. Nobody has put this much detail into making software robust and performant at the same time. You don’t have black boxes anymore, and your system starts looking very different: we can capture everything in detail, timestamp it, throttle it, and so on.

<Service Benefits 2>

Here is an example of the REST model applied to UI. For the one in the middle, the infrastructure gives you a one-liner: we capture and represent your state. You can do this from across the world, and you don’t need any local UI. And here is a Windows Forms and GDI view producing the visual representation; the service author was never aware of it. We want the UI to be independent, and that is what separating state from behavior gives you.
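
The “one-liner” UI works because state is a plain document. The infrastructure can serialize any service’s state, and a generic viewer (XSLT, JavaScript, or a forms UI) renders it without the service author writing UI code. The sketch below uses Python’s standard `xml.etree.ElementTree`. The element naming scheme, where dictionary keys become tags and list entries become `item` elements, is our own assumption and not the format the runtime uses.

```python
import xml.etree.ElementTree as ET


def _fill(element, value):
    # Dicts become child elements, lists become repeated <item> elements, scalars become text.
    if isinstance(value, dict):
        for key, item in value.items():
            _fill(ET.SubElement(element, key), item)
    elif isinstance(value, list):
        for item in value:
            _fill(ET.SubElement(element, "item"), item)
    else:
        element.text = str(value)


def state_to_xml(root_name, state):
    """Serialize any service's state; the service itself contains no UI code."""
    root = ET.Element(root_name)
    _fill(root, state)
    return ET.tostring(root, encoding="unicode")


def render_text(xml_text):
    """A generic 'viewer' that knows only the document, not the service."""
    return [f"{child.tag}: {child.text}" for child in ET.fromstring(xml_text)]


doc = state_to_xml("BumperState", {"pressed": False, "timestamp": "12:00:01"})
print(doc)
# <BumperState><pressed>False</pressed><timestamp>12:00:01</timestamp></BumperState>
print(render_text(doc))  # ['pressed: False', 'timestamp: 12:00:01']
```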

<Runtime Performance>

You don’t usually see this level of performance in a managed environment. We get 150,000 messages per second; across nodes, fully encrypted, we can do about 6,000 messages per second. These numbers are on a 4-core 2.4 GHz system. Again, performance was key for us: we made it fast so we could keep the model consistent. We can compete with very high-end HTTP servers, and you can have 100,000 of those services within one NT process.
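
The arithmetic below inverts the quoted throughput figures to put them in per-message terms. It treats each figure as sustained throughput, presumes the 150,000 figure is within a single node, and, for the last line, assumes all four cores were busy. The talk does not say how the numbers were measured.

```latex
\begin{aligned}
t_{\text{msg}} &= \frac{1}{150\,000\ \text{msg/s}} \approx 6.7\ \mu\text{s per message} \\
t_{\text{enc}} &= \frac{1}{6\,000\ \text{msg/s}} \approx 167\ \mu\text{s per encrypted cross-node message} \\
\frac{t_{\text{enc}}}{t_{\text{msg}}} &= \frac{150\,000}{6\,000} = 25 \\
t_{\text{core}} &= \frac{4\ \text{cores}}{150\,000\ \text{msg/s}} \approx 26.7\ \mu\text{s of core time per message}
\end{aligned}
```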

<Service Activation>

You can start services programmatically, through a visual editor, and so on, and you can create those instances across nodes.

<Service Partnering>

For years, compilers have given you nice features like this, but few systems have given you them across the network, for services that link only through proxies. If the partner is running within your node, we just use that URI. The infrastructure binds partners in a type-safe way, but does it dynamically at runtime. You can also override some of these relationships, and this can all be done without the developer ever knowing. This graphic comes from the robotics tutorials.
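
Dynamic, type-checked partnering can be sketched as a runtime lookup. If the partner URI is hosted on the local node, the sketch checks its contract and binds to it directly; otherwise it hands back a proxy that would forward requests over the network. `LocalNode`, `RemoteProxy`, `resolve_partner`, and the contract strings are all hypothetical, and the proxy here has no transport.

```python
class LocalNode:
    """Hypothetical node: services hosted in this process, keyed by URI."""

    def __init__(self, authority):
        self.authority = authority
        self.services = {}

    def host(self, path, service):
        uri = f"http://{self.authority}{path}"
        self.services[uri] = service
        return uri


class RemoteProxy:
    """Placeholder for a network proxy; a real one would serialize and send each request."""

    def __init__(self, uri, contract):
        self.uri = uri
        self.contract = contract  # would be verified by the remote node

    def get(self):
        raise NotImplementedError(f"no transport in this sketch for {self.uri}")


def resolve_partner(node, uri, contract):
    """Bind at runtime: same node -> direct reference, other node -> proxy."""
    service = node.services.get(uri)
    if service is None:
        return RemoteProxy(uri, contract)
    if service.contract != contract:
        raise TypeError(f"{uri} implements {service.contract}, not {contract}")
    return service


class Drive:
    contract = "http://example.org/contracts/drive"  # hypothetical contract identifier

    def get(self):
        return {"left": 0.0, "right": 0.0}


node = LocalNode("robot1")
drive_uri = node.host("/drive", Drive())
partner = resolve_partner(node, drive_uri, Drive.contract)
print(partner.get())  # {'left': 0.0, 'right': 0.0}
```

The caller writes the same `partner.get()` whether the partner is local or remote, which is what lets the binding change without the developer knowing.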

<Persistence And Synchronization>

Persistence is a natural side effect of separating state. We can do it for you, whether the store is SQL, the local file system, or something else. You can also have services that are stateless. We also provide services for publish/subscribe: you just declare a partner, and when you change state, the change flows through to that partner.
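
Because state is separate from behavior, persistence can be just another partner that receives state changes. The minimal sketch below uses JSON files as a stand-in for “SQL, local file system, etc.” The class names are hypothetical, and a SQL-backed partner would have the same shape.

```python
import json
import os
import tempfile


class StatefulService:
    """Hypothetical service that pushes every state change to its declared partners."""

    def __init__(self, state):
        self._state = dict(state)
        self._partners = []

    def subscribe(self, partner):
        self._partners.append(partner)

    def replace(self, new_state):
        self._state = dict(new_state)
        for partner in self._partners:
            partner(dict(self._state))  # each partner gets its own copy


class FilePersistence:
    """Hypothetical persistence partner: it only sees state documents, never service internals."""

    def __init__(self, path):
        self.path = path

    def __call__(self, state):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(state, f)

    def load(self):
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "drive-state.json")
drive = StatefulService({"speed": 0})
ui_updates = []
drive.subscribe(ui_updates.append)      # e.g. a UI partner
drive.subscribe(FilePersistence(path))  # a persistence partner
drive.replace({"speed": 3})
print(ui_updates, FilePersistence(path).load())  # [{'speed': 3}] {'speed': 3}
```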

<Service Extensibility>

We don’t believe in traditional hierarchy; we believe in logical hierarchy. So what we actually enable is multiple contracts in the same service, each with its own independent queue. To the outside world, they look completely independent. The versioning model also enables progressive rollout.
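
Multiple contracts with independent queues can be sketched as one shared state behind two request queues, each served by its own handler. The contract names and representations are hypothetical. The point is that draining one contract’s queue never touches the other’s, so a V1 client and a V2 client each see what looks like a separate service.

```python
from collections import deque


class TwoContractService:
    """Hypothetical service with two contracts over one state, each with its own queue."""

    def __init__(self):
        self.pressed = False
        self.queues = {"bumper/v1": deque(), "contact/v2": deque()}
        self.handlers = {"bumper/v1": self._get_v1, "contact/v2": self._get_v2}

    def post(self, contract, request):
        self.queues[contract].append(request)

    def drain(self, contract):
        """Serve one contract's queue; the other contract's backlog is untouched."""
        queue = self.queues[contract]
        replies = []
        while queue:
            queue.popleft()  # every request is a "get" in this sketch
            replies.append(self.handlers[contract]())
        return replies

    def _get_v1(self):
        return {"pressed": self.pressed}

    def _get_v2(self):
        return {"sensors": [{"name": "front", "pressed": self.pressed}]}


svc = TwoContractService()
svc.post("bumper/v1", "get")
svc.post("contact/v2", "get")
svc.pressed = True
print(svc.drain("bumper/v1"))         # [{'pressed': True}]
print(len(svc.queues["contact/v2"]))  # 1
```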

<Dealing With Partial Failure>

I am surprised at how little people talk about failure. Bluntly, robotics is a very good example. We have this built into the runtime, and it costs only nanoseconds. So as you spread work across systems, we know the root cause and know the pieces are logically related, so we can bring the failure back to where it started. In this world you can have many failures, but because the runtime supports this, you can coordinate.
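
The root-cause tracking described here can be sketched by attaching a causality context to the first message and copying it onto every message derived from it. A failure deep in the work is then reported back to whoever started it. The names `Causality`, `Message`, `dispatch`, `plan`, and `move` are hypothetical, and the talk does not describe the runtime’s actual mechanism in detail.

```python
import itertools


class Causality:
    """Hypothetical causality context: created at the root, carried by every derived message."""

    _ids = itertools.count(1)

    def __init__(self, name):
        self.id = next(self._ids)
        self.name = name
        self.failures = []  # where failures anywhere in the work are reported


class Message:
    def __init__(self, body, causality):
        self.body = body
        self.causality = causality


def dispatch(handler, message, outbox):
    """Run a handler; derived messages inherit the causality, failures go back to the root."""
    try:
        for body in handler(message.body):
            outbox.append(Message(body, message.causality))
    except Exception as exc:
        message.causality.failures.append((handler.__name__, exc))


def plan(goal):
    return [f"{goal}:leg1", f"{goal}:leg2"]


def move(leg):
    if leg.endswith("leg2"):
        raise RuntimeError(f"motor stalled on {leg}")
    return []


root = Causality("go-to-dock")
outbox = []
dispatch(plan, Message("dock", root), outbox)  # fans out into two derived messages
for derived in outbox:
    dispatch(move, derived, [])
for handler_name, error in root.failures:
    print(root.name, handler_name, error)  # go-to-dock move motor stalled on dock:leg2
```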

<Summary>

We think we have a fast and reliable runtime with a fundamental application model built around services. It’s not just services, though: we did the work to create a performant infrastructure. We think state separation is a key aspect of this. It’s a truly concurrent and scalable system, and at the same time you can observe and interact with it.

<Additional Resources>

There are online docs, downloads, and discussion forums. One question posted there was about three million messages per second. The forums give you a glimpse of the community and how people are starting to grasp the underlying runtime.

Attendee: We’re looking at using this in an existing app that uses classic threading and locking techniques. Should we migrate the whole app to this framework holistically, or just part of it? Will we discover other areas where we can use concurrency?

George Chrisanthakopoulos: So do you have to migrate completely? I’ve seen examples of both. I’ve seen people use it to stitch things together and see enormous compatibility, and it converts them; people really get infected with this model. One group manages 13 clusters of 12 machines each this way, and they discovered their UI became more responsive. So I recommend starting where you have the biggest pain. Then you’ll find yourself bothered by having two different frameworks. We do things compositionally: we don’t have global thread pools, so we can give you a mini thread pool, a small way to isolate that part. We thought quite a bit about integrating with legacy systems.

Henrik Frystyk Nielsen: Another aspect is that it may help you inspect the system. Working from the top down, you can put a service around existing functionality to abstract the file system, databases, and so on. Once you have wrapped that functionality, you can start using the other added values.

Attendee: What tools do you have for deployment and manageability? You have a huge capability to tune.

George Chrisanthakopoulos: One interesting thing about the system is that the core services are separated. If you want to monitor all interactions with a particular service, you could write some JavaScript that gathers them in one place and puts them into SQL or Excel. You can get to everything. We tried to keep our transport from being stateful. We haven’t done a lot in terms of debugging tools; we run an XPath query to see whether something passed or failed. People can use existing graphical tools to start probing the performance of the system, so in some ways we don’t need dedicated tuning tools. We are going to start doing domain-specific work with computational units.

Henrik Frystyk Nielsen: We have a tool that tells you which services you need in order to run your application.

George Chrisanthakopoulos: A little file creates a sandbox, and you run it. You need some form of Windows and .NET. All our state can be sandboxed within that structure.

Attendee: Could you show us a short demo so we can get the picture?

George Chrisanthakopoulos: Yes, we have time to do that now. Any questions first?

Attendee: Nice presentation. Can you talk more about the security model?

George Chrisanthakopoulos: We are a tiny team and didn’t want to reinvent the world of security. We have two transports. For HTTP we use HTTP.sys, which provides all the security for us because we set the credentials. It’s elegant. You can add people through the web pages, so it’s just state. It’s basically the NT model, and for us it’s a one-liner. It’s a very simple model: it doesn’t do proxying and intermediary scenarios. It’s more like you ask for it and we built it, using an industrial-strength runtime. Within an NT domain, traffic is also fully encrypted, using NTLM over a secure stream in managed code. So that’s the context you’re running in, and you can go to the website and add other users. In terms of security, we are all or nothing: once you have access to a node, you can add or drop any of its services. In 2.0 you will have a more granular security model, where a user can have “get” access but no “drop” access.
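
The difference between today’s all-or-nothing node access and the finer-grained 2.0 model can be sketched as a per-operation permission check. The operation list, class name, and methods are hypothetical; the talk only gives “get” without “drop” as an example.

```python
OPERATIONS = {"get", "query", "subscribe", "insert", "update", "replace", "delete", "create", "drop"}


class NodeAccess:
    """Hypothetical per-operation access control for the services on one node."""

    def __init__(self):
        self._grants = {}

    def grant(self, user, *operations):
        unknown = set(operations) - OPERATIONS
        if unknown:
            raise ValueError(f"unknown operations: {sorted(unknown)}")
        self._grants.setdefault(user, set()).update(operations)

    def grant_all(self, user):
        # The all-or-nothing model described in the talk: node access means every operation.
        self._grants[user] = set(OPERATIONS)

    def allowed(self, user, operation):
        return operation in self._grants.get(user, set())

    def check(self, user, operation):
        if not self.allowed(user, operation):
            raise PermissionError(f"{user} may not {operation}")


access = NodeAccess()
access.grant_all("admin")
access.grant("viewer", "get", "query", "subscribe")
print(access.allowed("viewer", "get"), access.allowed("viewer", "drop"))  # True False
```

Because the set of operations is small and fixed, a grant can be expressed in a few buckets rather than per-method annotations.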

Henrik Frystyk Nielsen: This is actually critical, because in many other systems you have to ask, “For the arbitrary number of operations you can perform, what are the arbitrary side effects?” That’s hard on the producing side, because you have to annotate all of them, and on the consuming side, because you have to understand what they all mean. In this world, there are really only a few buckets, so it’s easier to convey and consume what you want. Things have to be easy to consume as well as to produce.

<Demo>

George Chrisanthakopoulos: We’re going to look at a ten-line service. I go to its URI and do a “get” on that service, and I’m going to apply a little XSLT or some JavaScript. This is at the level of a single small service. Another example is this service by itself: the “Directory” service. It has state; it’s not a magical service. In the end, it’s a list of records, a bunch of records of a type I call service info type, and here’s an instance. My directory is nothing more than a list of records. That is its state. It can also present itself this way.

You can decide which Web 2.0 technology you bind to the running service. We don’t really have objects.

Attendee: In a massive scenario, isn’t parsing the XML a big overhead?

George Chrisanthakopoulos: This view is for a human interacting with the service. There is no XML on the main path; I can do hundreds of thousands of these operations without going through here. This exists for interaction at slower, human speed. You don’t write code for it; we present it for you.

Henrik Frystyk Nielsen: The critical part is that performance has to be good enough that you can write these services without having to recode them depending on which node they’re on.

Attendee: What's your versioning story?

George Chrisanthakopoulos: You have a service already running; it has been given a URI on some node. We capture the timestamp, stack trace, and URI of everything, and you can do dynamic filtering. So it’s running, and you have a schema. Now you say, OK, I’ll dynamically start a new instance, give it a new URI, and have it insert itself into the directory. The new service has a port queue where it implements that state and responds to the operations of the original version. It inserts itself twice: to you it looks like V1, but at the same time it has inserted itself as V2. We have partners, and we show you a list of aliases and the other contracts your service implements, so you can go to the V1 front end of the service and find that it also implements V2. With our concurrency model, you declare whether you want the V1 handlers to be exclusive or concurrent; all you do is declare them concurrent. We have actually run on systems with up to 64 cores. So the runtime helps you link the versions and manage them. They’re both there; there’s no hiding.
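
The side-by-side versioning described here can be sketched with a directory whose state is just a list of records, as in the demo. A new instance inserts itself twice: once under the old contract, so existing V1 clients find it, and once under the new one. URIs, contract names, and the alias field are hypothetical.

```python
class Directory:
    """Hypothetical directory service: its whole state is a list of records."""

    def __init__(self):
        self.records = []

    def insert(self, uri, contract, aliases=()):
        self.records.append({"uri": uri, "contract": contract, "aliases": list(aliases)})

    def find(self, contract):
        return [r["uri"] for r in self.records if r["contract"] == contract]

    def aliases_of(self, uri):
        return [a for r in self.records if r["uri"] == uri for a in r["aliases"]]


directory = Directory()
directory.insert("http://node1/bumper", "bumper/v1")  # the original V1 instance

# A new instance implements both versions, so it inserts itself twice,
# and each entry points at the other as an alias.
directory.insert("http://node1/bumper2", "bumper/v1", aliases=["http://node1/bumper2/v2"])
directory.insert("http://node1/bumper2/v2", "bumper/v2", aliases=["http://node1/bumper2"])

print(directory.find("bumper/v1"))  # ['http://node1/bumper', 'http://node1/bumper2']
print(directory.find("bumper/v2"))  # ['http://node1/bumper2/v2']
print(directory.aliases_of("http://node1/bumper2"))  # ['http://node1/bumper2/v2']
```

Existing V1 clients keep working unchanged, while a client that starts from the V1 entry can follow the alias to discover the V2 contract.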

Henrik Frystyk Nielsen: This is hugely advantageous in a distributed system. You have to be able to roll in new services and have the existing ones keep working without knowing anything about the new ones. And we are not going to run out of URIs.

Attendee: You’ve been saying that the services are stateful. So your service is a resource, and it doesn’t focus on or encapsulate processes. Does someone else have to build processes outside the scope of your services?

George Chrisanthakopoulos: Yes, you have to compose, so it works bottom up. But as you said, some services are not that simple. One may be a fusion system whose state is like the running state of the whole system, so that’s a little more complicated. To you, it abstracts the state of the services underneath it, but each of those can have meaningful state of its own. So while we separate state from behavior, a service can also have no meaningful state of its own.

Henrik Frystyk Nielsen: Services are really the abstraction we have. How do you write something extra? You write a service that says what you need to do, and hook it in. It trickles all the way down. It’s about hooking services together and making them easy to compose and consume. At the end of the day, it still ends up as something that looks like a document.

George Chrisanthakopoulos: This particular robot in the middle has a collection of services. Let me show you something that has to do with versioning: a service with a bumper contract. The document is nothing more than a timestamp, and it comes down to: are you pressed or not pressed? That’s all. In the end, this service has a state of pressed or not pressed, so you can subscribe with a standing query. Whenever that changes, we forward notifications, and we can forward hundreds of thousands of them. So fine, that’s what you did in V1. Later, someone said they wanted an alternative face to it. So here you see a second URI. It is a generic version of the bumper contract, and it’s a form of versioning: you’re transparent to the other one, and you don’t know whether it came from Microsoft or IBM. Look, there is a partner here, so if you click on this one, you’re back to V1. Laser range finders, by the way, have very similar message loads: sixty times per second, they send you a list of integers, and someone did a cylindrical projection of them. It’s a very basic concept, so basic that it’s hard to explain to people, but it actually scales massively. People are adopting this kind of REST on the side. We don’t try to recover by adding instrumentation after the fact.
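
The bumper service’s state can be sketched as a timestamp plus a pressed flag. In this sketch, a standing query registers a predicate and is notified only when the state actually changes and the predicate holds. The class, method names, and injected clock are hypothetical; the clock parameter just makes the example deterministic.

```python
import time


class BumperService:
    """Hypothetical bumper: the whole state is a timestamp and a pressed flag."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.state = {"timestamp": clock(), "pressed": False}
        self._queries = []

    def subscribe(self, predicate, callback):
        """Standing query: callback runs on each state change that satisfies predicate."""
        self._queries.append((predicate, callback))

    def sensor_reading(self, pressed):
        if pressed == self.state["pressed"]:
            return  # no change, so no notification
        self.state = {"timestamp": self.clock(), "pressed": pressed}
        for predicate, callback in self._queries:
            if predicate(self.state):
                callback(dict(self.state))


ticks = iter(range(1000))
bumper = BumperService(clock=lambda: next(ticks))  # deterministic fake clock
changes, presses = [], []
bumper.subscribe(lambda s: True, changes.append)
bumper.subscribe(lambda s: s["pressed"], presses.append)
for reading in [False, True, True, True, False]:
    bumper.sensor_reading(reading)
print(changes)  # [{'timestamp': 1, 'pressed': True}, {'timestamp': 2, 'pressed': False}]
print(presses)  # [{'timestamp': 1, 'pressed': True}]
```

Three of the five readings produce no traffic at all. Subscribers pay only for changes, which is one reason this style of notification can scale.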