Overview
Country or Region: United Kingdom
Industry: Hosting—Software as a service
Customer Profile
Digital Field Solutions provides solutions that capture handwriting electronically to simplify data capture and automate forms processing. The company is based in London, England.
Business Situation
Facing an increasing workload for its hosted solution, Digital Field Solutions had to get the most out of its existing hardware—or pay an additional £12,000 per year to lease another server.
Solution
The company adopted new parallel processing aids in Microsoft Visual Studio 2010 and the .NET Framework 4, enabling it to get four times the performance from existing hardware.
Benefits
Ease of adoption
Reduced developer effort
Significant performance gains
£12,000 annual cost savings
“Microsoft has done a fantastic job with the parallel programming tools provided in Visual Studio 2010 and the .NET Framework 4.”
Marshall Brooke, Lead Developer, Digital Field Solutions
Digital Field Solutions provides digital pen-and-paper solutions, which capture and process handwriting electronically to simplify data capture and automate forms processing. As the demand for its hosted service grew, the company had a choice: figure out how to get more performance out of existing hardware, or purchase an additional server. The company quadrupled the workload capacity of its server by modifying its code for parallel processing, thereby enabling the server’s workload to be distributed across all four processor cores instead of one. Digital Field Solutions was able to implement parallel processing quickly and easily, with minimal developer effort. The company’s customers are benefiting from fast and reliable service, and Digital Field Solutions saved an estimated £12,000 per year in hardware costs.
Situation
Founded in 2005, Digital Field Solutions provides digital pen-and-paper solutions, which capture and process handwriting electronically to simplify data capture and automate forms processing. Based on market-leading Anoto technology, the solutions use a digital pen (sometimes known as an electronic pen) that looks and feels like an ordinary ballpoint pen, enabling people to write directly on paper with ink. In addition, a sensor in the pen uses a dot pattern printed on the paper to capture pen strokes as x- and y-coordinates, and that information is stored digitally.
Data captured with the pen is then uploaded to Digital Field Solutions for processing. This processing may include the overlay of captured pen strokes on a digital copy of the paper form, resulting in a PDF document that looks exactly like the ink-on-paper copy generated in the field. It can also involve the use of handwriting recognition to convert the writing in each field of the paper form to text, which is transmitted as an XML document to a customer for back-office processing. Through such functionality, customers of Digital Field Solutions can quickly capture and process information in a way that is cost-effective and that requires little user training.
As a hosted solution provider, Digital Field Solutions develops and maintains the software that handles this post-capture processing. And because the company handles thousands of forms a day across many different clients, it’s important to achieve as much concurrency in processing as possible. “Workflow steps can include handwriting recognition; generation of PDF documents or XML files; and transmission of those files by email, file transfer protocol, HTTP calls, and web services calls—many of which are nondeterministic, long-running, asynchronous tasks,” says Marshall Brooke, Lead Developer at Digital Field Solutions.
In the past, when all processing was done sequentially, Digital Field Solutions found itself quickly running out of headroom. Because the application was not multithreaded, even on a four-core server, three cores would sit idle while one core was at 100 percent processor utilization. “When each workflow step for each form was processed sequentially, it was safe but performance was limited—and clients expect to get their information quickly,” says Brooke. “If there’s a problem or delay because processing gets backed up, we start getting calls from customers immediately. At the same time, we’re a small, five-person company with limited resources, and we need to get the most out of the servers we have before buying new ones.”
Adapting his code for multithreading by hand, however, was not something that Brooke wanted to attempt. “We needed both performance and safety,” says Brooke. “When writing multithreading code entirely by hand, it’s easy to run into problems such as thread safety and race conditions. You may develop a multithreaded application that runs just great on your desktop PC, but find that it crashes when you run it on a server with several multicore processors.”
Because of those obstacles, Brooke held off parallelizing his application as long as possible. Eventually, however, he faced a choice: tell his boss that another server was needed, or modify his application for parallel processing. He recalls, “I realized that the time had come to parallelize our application—both to improve performance for computationally intensive tasks such as handwriting recognition and to efficiently and reliably handle all the asynchronous input/output we generate in sending processed forms to clients.”
Solution
Brooke adopted the Parallel Extensions to the Microsoft .NET Framework, which were released in November 2007 as a Community Technology Preview (CTP). Today he writes parallel code using the released and supported successor of the Parallel Extensions: new parallel processing capabilities that are included in the Microsoft Visual Studio 2010 development system and the .NET Framework 4.
“I’ve been interested in parallel processing since the Parallel Extensions were first released and was happy to see Microsoft address the challenges of parallel programming,” says Brooke. “My initial reaction was, ‘Wow—parallel programming isn’t scary anymore.’ It’s been interesting to see how all of this has developed over the past few years, and the approach pioneered by Microsoft continues to be logical and well thought out.”
Rich Parallel Programming Libraries
To enable its code for parallel computing, Digital Field Solutions now uses the new parallel programming libraries in the .NET Framework 4, which are supported by new features in Visual Studio 2010. These libraries include:
Task Parallel Library (TPL), which includes parallel implementations of for and foreach loops (For and For Each in the Visual Basic language) as well as lower-level types for task-based parallelism. Implemented as a set of public types and APIs in the System.Threading.Tasks namespace, the TPL relies on a task scheduler that is integrated with the .NET ThreadPool and that scales the degree of concurrency dynamically so that all available processors and processing cores are used most efficiently.
Parallel Language-Integrated Query (PLINQ), a parallel implementation of LINQ to Objects that combines the simplicity and readability of LINQ syntax with the power of parallel programming. PLINQ implements the full set of LINQ standard query operators as extension methods in the System.Linq namespace, along with additional operators to control the execution of parallel operations. As with code that targets the Task Parallel Library, PLINQ queries scale in the degree of concurrency according to the capabilities of the host computer.
Data Structures for Parallel Programming, which introduce several new types that are useful in parallel programming—including a set of concurrent collection classes that are scalable and thread-safe, lightweight synchronization primitives, and types for lazy initialization. Developers can use these new types with any multithreaded application code, including code that uses the Task Parallel Library and PLINQ, as the brief sketch following this list illustrates.
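For illustration only (the identifiers below are hypothetical stand-ins, not Digital Field Solutions’ code), a short C# sketch shows how these pieces typically combine: a PLINQ query spreads a per-form transformation across all available cores, and a thread-safe ConcurrentQueue<T> collects the results.

using System;
using System.Collections.Concurrent;
using System.Linq;

class PlinqSketch
{
    static void Main()
    {
        // Hypothetical workload: identifiers for 1,000 submitted forms.
        int[] formIds = Enumerable.Range(1, 1000).ToArray();

        // Scalable, thread-safe collection from System.Collections.Concurrent.
        var results = new ConcurrentQueue<string>();

        // AsParallel turns this into a PLINQ query; the degree of
        // concurrency scales to the capabilities of the host computer.
        formIds.AsParallel()
               .Select(id => "processed form " + id)  // stand-in for real per-form work
               .ForAll(results.Enqueue);              // concurrent writes are safe

        Console.WriteLine("Processed {0} forms.", results.Count);
    }
}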
In parallelizing his code, Brooke found the Task Parallel Library to be ideal for handling the processor-intensive workload of handwriting recognition. “Most forms are composed of several individual rectangles, in which text is written,” he says. “With Parallel.ForEach, I can process all of those text boxes in parallel with just a single line of code.”
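A minimal sketch of that pattern, assuming a hypothetical FormField type and RecognizeHandwriting method rather than the company’s actual code, might look like this:

using System.Collections.Generic;
using System.Threading.Tasks;

class RecognitionSketch
{
    // Hypothetical stand-ins for the real form model and recognition engine.
    class FormField
    {
        public byte[] InkStrokes;
        public string Text;
    }

    static string RecognizeHandwriting(byte[] inkStrokes)
    {
        return "recognized text"; // placeholder for the CPU-intensive recognition call
    }

    static void RecognizeForm(IList<FormField> fields)
    {
        // A single Parallel.ForEach call distributes the per-field work
        // across all available processor cores.
        Parallel.ForEach(fields, field =>
        {
            field.Text = RecognizeHandwriting(field.InkStrokes);
        });
    }
}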
Brooke also makes extensive use of the new data structures and synchronization primitives included in the .NET Framework 4, which he applied to the challenge of reliably handling a large number of asynchronous events—such as communication with customers’ own systems by using File Transfer Protocol, HTTP calls, and web services calls. “Every item in a list of processing steps kicks off its own Task, and the processing for that item—such as FTP, SMTP, and HTTP calls—is also handled with the Task Parallel Library, largely making use of TaskCompletionSource,” says Brooke. “TaskCompletionSource and Task continuations are great features, in that they enable me to fling code around the system knowing that stuff will happen in a robust and manageable manner.”
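One common shape for that kind of code—sketched here against the event-based SmtpClient class for illustration, not taken from the company’s implementation—is to wrap an asynchronous operation in a Task by using TaskCompletionSource<T>:

using System;
using System.Net.Mail;
using System.Threading.Tasks;

static class SmtpTaskSketch
{
    // Wraps the event-based SmtpClient.SendAsync call in a Task.
    public static Task SendMailAsync(SmtpClient client, MailMessage message)
    {
        var tcs = new TaskCompletionSource<object>();

        client.SendCompleted += (sender, e) =>
        {
            if (e.Error != null) tcs.TrySetException(e.Error);
            else if (e.Cancelled) tcs.TrySetCanceled();
            else tcs.TrySetResult(null);
        };

        client.SendAsync(message, null);
        return tcs.Task;
    }
}

A continuation attached with ContinueWith can then record delivery or schedule a retry without tying up a thread while the send is in flight.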
Powerful Parallel Debugging Tools
Brooke also appreciates the rich parallel diagnostic tools in Visual Studio 2010, which include new Parallel Stacks and Parallel Tasks windows for debugging code. Visual Studio 2010 Premium and Ultimate also have a Concurrency Visualizer, which is integrated with the profiler. The visualizations provide graphical, tabular, and numerical data about how a multithreaded application interacts with itself and other programs, enabling developers to quickly identify areas of concern and navigate through call stacks and to relevant call sites in the source code.
“I use the Parallel Tasks and Parallel Stacks windows a lot—they’ve proven especially helpful in debugging the handwriting recognition code, which uses third-party libraries,” says Brooke. “It wasn’t clear which parts of these libraries were thread-safe, and the parallel tools in Visual Studio 2010 helped me find a few issues that I had to address. That said, the parallel libraries in .NET 4 are so noninvasive and easy to use that I really don’t have to do all that much debugging.”
New Tools for Asynchronous Development
Use of the parallel libraries is now so deeply ingrained in Brooke’s coding style that he finds it difficult to do without them when writing code for the Microsoft Silverlight browser plug-in or Windows Phone 7. “I actually created a minimal library that mimics TaskCompletionSource, so that I can use the same asynchronous patterns across all three platforms,” he says. “When the parallel libraries come to those platforms, I’ll be able to swap out my code without any changes.”
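Brooke’s shim is his own code, but a deliberately minimal sketch of the idea—a hypothetical, stripped-down stand-in for TaskCompletionSource<T> on platforms without the Task Parallel Library—could look like the following. It omits exceptions, cancellation, and scheduling, which a real implementation would need.

using System;
using System.Collections.Generic;

// Hypothetical, highly simplified stand-in for TaskCompletionSource<T>:
// it holds a future result and runs callbacks once the result arrives.
public class SimpleCompletionSource<T>
{
    private readonly object _gate = new object();
    private readonly List<Action<T>> _continuations = new List<Action<T>>();
    private bool _completed;
    private T _result;

    // Register work to run once the result is available.
    public void ContinueWith(Action<T> continuation)
    {
        lock (_gate)
        {
            if (!_completed)
            {
                _continuations.Add(continuation);
                return;
            }
        }
        continuation(_result); // already completed: run immediately
    }

    // Complete the "task" and run any waiting continuations.
    public void SetResult(T result)
    {
        List<Action<T>> toRun;
        lock (_gate)
        {
            if (_completed) throw new InvalidOperationException("Already completed.");
            _completed = true;
            _result = result;
            toRun = new List<Action<T>>(_continuations);
            _continuations.Clear();
        }
        foreach (var c in toRun) c(result);
    }
}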
To that end, Brooke is keeping his eye on the Visual Studio Async CTP, which extends Visual Studio 2010 with a new, streamlined syntax for asynchronous development. “With the Visual Studio Async CTP, Microsoft is providing first-class language support for Tasks, in terms of both asynchronously awaiting them and producing them,” he says. “For me personally, the formal release of this new technology by Microsoft will be a huge event.”
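A brief sketch of what that syntax enables—using the async and await keywords from the Async CTP, with hypothetical method names rather than the company’s code—follows:

using System.Threading.Tasks;

class AsyncCtpSketch
{
    // With the Async CTP, a method marked "async" can "await" a Task:
    // the compiler turns the rest of the method into a continuation,
    // so no thread is blocked while the work is in flight.
    static async Task ProcessFormAsync(byte[] formData)
    {
        await UploadToCustomerAsync(formData);   // hypothetical I/O-bound call
        await NotifyCustomerAsync();             // runs after the upload completes
    }

    // Hypothetical placeholders standing in for real FTP, HTTP, or SMTP work.
    static Task UploadToCustomerAsync(byte[] data)
    {
        return Task.Factory.StartNew(() => { /* send the data */ });
    }

    static Task NotifyCustomerAsync()
    {
        return Task.Factory.StartNew(() => { /* send the notification */ });
    }
}

The result reads sequentially while remaining nonblocking, which is why Brooke regards first-class language support for Tasks as such a significant step.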
Benefits
With the parallel programming tools provided in Visual Studio 2010 and the .NET Framework 4, Digital Field Solutions was able to parallelize its code quickly and easily, with minimal effort and a negligible learning curve. The company’s efforts have yielded benefits for all parties, including fast and reliable service for customers and lower hardware expenses for Digital Field Solutions.
“Everything is great with our parallel implementation, which is working extremely well and has never exhibited any performance, memory usage, or reliability issues,” says Brooke. “Microsoft has done a fantastic job with the parallel programming tools provided in Visual Studio 2010 and the .NET Framework 4.”
Ease of Adoption
Brooke adopted the new parallel libraries in the .NET Framework 4 quickly and easily, with a minimal learning curve. “The parallel libraries are an absolute joy to use—they just make sense, and there’s no way I could go back to programming without them,” he says. “For me, parallel programming is a natural progression—the way that coding should be. Code snippets and other posts from Stephen Toub’s blog on Parallel Programming with .NET have been especially useful, providing what people need most: information and guidance on how to use these great libraries.”
Reduced Developer Effort
Brooke also was able to parallelize his code with minimal effort. “Without the parallel libraries, handling the asynchronous events alone would have required hundreds of lines of code and would have been difficult to debug—with the risk of starting so many threads that it could starve the system of resources,” he says. “Today, I can handle asynchronous tasks with just five lines of code and know that it’s going to work reliably. Parallelizing the handwriting recognition functionality would have been even more complex—to the extent that I never would have attempted it on my own.”
Significant Performance Gains and Cost Savings
With parallel code, Brooke is able to get four times the performance out of a quad-core system. This has enabled Digital Field Solutions to avoid purchasing an additional server to handle its current processing workload. “In the early days, before we parallelized our code, we used to get calls from customers asking, ‘Where is this form we submitted?’ because the forms were taking too long to process,” he recalls. “Today, that’s not an issue. Just as important, we know that we’re getting all that we can out of our current hardware. Without parallelization, we would have already had to lease an additional server from our hosting provider, at a cost of an additional £12,000 per year.”
A Win-Win for Everyone
For Digital Field Solutions, the adoption of parallel computing has been a win for all stakeholders, including customers, developers, and company management. “Our use of parallel programming has benefited everyone,” says Brooke. “Customers are delighted because they get fast and reliable service, my boss is pleased because he doesn’t need to buy another server, and I’m happy because everything was so easy to implement. The only reason I can see for developers to avoid using the parallel libraries is because they’d prefer to continue throwing additional hardware at their performance problems.”
Microsoft Visual Studio 2010
Microsoft Visual Studio 2010 is an integrated development system that helps simplify the entire development process from design to deployment. Unleash your creativity with powerful prototyping, modeling, and design tools that help you bring your vision to life. Work within a personalized environment that helps accelerate the coding process and supports the use of your existing skills, and target a growing number of platforms, including Microsoft SharePoint Server 2010 and cloud services. Also, work more efficiently thanks to integrated testing and debugging tools that you can use to find and fix bugs quickly and easily to help ensure high-quality solutions.
For more information about Visual Studio 2010, go to: