A Grand Challenge:

Building the Linguistic Web 3.0

George White

7/11/2006

I. Introduction

This is a challenge I offered initially to Singapore, but it can be adopted by other forward-thinking countries & businesses as well.

The challenge is to take a leadership position in ushering in the tidal wave of progress destined to follow in the wake of a new form of Internet technology, “Web 3.0”, as I like to call it. Web 3.0 will be characterized by speech-understanding, net-connected mobile phones.

Web 3.0 (the speech-enabled, mobile, linguistic Web) is going to happen regardless of anything proposed here. You will know it’s here when you start talking on mobile phones to access the Web on a regular basis; when speech recognition translates your spoken requests into Web queries; when GPS on the phone allows the Web to know where you are; and when the Web is able to understand your natural language queries. At this point, the Web will guide you to locations you want to visit, to products you want to buy, to friends, and to information you (or others) may have stored in particular physical locations. It will record your life story: your words, your pictures, your likes & dislikes, the people you visit, and your business transactions. It will educate you and your children by providing convenient anytime/anywhere access to the world’s published knowledge. It will keep you in constant touch with the creation & flow of news from around the world and across the street. It will transform what you buy, how you are entertained, where & what you eat, the friends you maintain, how medical care is delivered, and how police & emergency services are rendered. In short, your life will be utterly transformed.

The challenge is to be first to provide certain critical pieces of this vision, and thereby reap the rewards that can be had for market innovators in these enormously leveraged areas.

Web 3.0 will be built on top of Web 2.0 which is already under discussion, and can be read about at

It's clear to everyone, I suspect, that mobile phones will become the most pervasive and popular Web access devices in the world. On the other hand, it’s NOT clear how speech recognition fits in. After three decades of research, speech-recognition has yet to demonstrate unequivocal success on problems of significant economic importance. That’s why this problem is worthy of becoming the heart of the Grand Challenge. This document will propose a new solution to this problem, one that I firmly believe can succeed.

In addition to identifying the key technology hurdles in the challenge, I will suggest novel economic objectives to motivate this research, and different organizational structures for carrying it out.

II. Major ideas:

  • Linguistic Web infrastructure: What makes adding speech to the Web possible is the increasing ability of Google, Ask.com, Yahoo and other Web portals to handle questions typed on computer terminals and mobile phones. The Web is becoming linguistic. It’s doing this without speech. It simply relies on keystrokes and button presses. But this ability to handle typed input provides the critical missing piece that was beyond the speech community. By inferring “meaning” from typed queries, Google et al. are supplying infrastructure for natural language for the first time. Adding speech recognition now becomes feasible.
  • Leverage: Mobile phones and the Internet are ideal for the Grand Challenge for a powerful reason: They are the most highly leveraged technologies on the planet. They offer the most potential economic impact per dollar invested.

For example, the microprocessor gave Bill Gates economic leverage greater than any human being had previously enjoyed. (Bill Gates’s hourly earnings were at least 10 million times those of a day laborer.) The Internet, building on the microprocessor, gave rise to more leverage than the microprocessor alone. This combination (microprocessor + Internet) catapulted two Stanford University graduate students with a mere $50,000 investment into the ranks of the 20 wealthiest people on the planet within 7 years, instead of the 15 years it took Bill Gates. Soon mobile phone access will be added to the Web. As this becomes pervasive, it will produce even more economic leverage than Bill Gates, Sergey Brin and Larry Page had.

Web 3.0 could provide the Singapore government this sort of leverage. It is certainly safe to say that the Web is one of the richest areas for innovation in the world today and its potential for creating jobs and economic impact is unmatched in any other industry.

  • The Internet will soon provide universal access to all knowledge. It won't be long now until “universal access to all knowledge” becomes reality on the Web. We have the technology to digitize and make available on the Web everything ever written (including all books), everything ever recorded (all movies, all TV and all music) and all pictures (paintings and photographs). In fact, Google itself has already stepped up to providing access to several of the bigger libraries of the world. Indeed, the incremental cost of putting everything on the Web that's not already scheduled to go onto the Web would probably be less than $1 billion USD! Any technology which can give anyone, any time, anywhere, universal access into the storehouse of all public knowledge of all mankind is truly magical. Universal access will elevate the quality of life for all human beings on the planet for all future generations. And gaining access to it from mobile phones will become essential, as will speech recognition to replace the tedious pressing of tiny buttons on increasingly tiny phones.
  • Mobile phones will make Web access truly universal. Having access to all knowledge wherever you are, be it a school room in Africa, or a sales office in New York, will fundamentally transform the nature of our lives.

You can already type into Google “Where is the nearest post office?” and get a reasonable answer. Soon you will be able to ask for any commodity you want to buy, from milk to automobiles, and the Internet will tell you where to buy it, how much it costs, and will give you driving directions to get to the store. Mobile phones will become the focal point for integrating all the new features of the Internet, GPS, RFID, wireless mini-devices, radios, Podcasts, recorders… you name it and it will be on the phone… and thus on the Internet.

  • Being able to speak queries to the Web into mobile phones, rather than pressing tiny keys, will meet a huge market need, and thus offer economic reward for solutions, as mobile devices become smaller. Furthermore, in under-developed societies, many more people talk than write. In addition, entering data on the move is difficult: speech is ideal for it and button pushing is hard. Even in developed societies, mobile phone users certainly would rather talk than type. All told, there is a wave of unmet need coming our way as mobile phones enter the Web access arena.
  • Emerging nations, especially Singapore's ASEAN neighbors, will soon undergo rapid economic expansion, and Singapore can help them by bringing Web 3.0 to them. All eyes are on China and India. But the newly emerging markets of Singapore’s neighbors, small now, are destined to become huge. Over half the people on the planet have never made a phone call. Over half the population of the world will soon be brought into the world community through mobile phones. Helping those previously disadvantaged people and cultures enter the modern world through the Internet will ultimately yield huge economic rewards for those that provide the enabling technologies and infrastructure.
  • The X-Prize theory of catalyzing research should become a significant part of delivering on this grand challenge. The theory behind the X-Prize is that highly structured projects, managed top-down, are much less efficient at generating breakthrough technologies than groups of individuals motivated by substantial prize money. See . We recommend combining the X-Prize idea with the more structured approaches that would be used to pursue this grand challenge in Singapore.
  • Uniquely Singapore

1.) Multicultural & multilingual assets for language development technologies. The fact that Singapore is home to many ethnic minorities from Asia gives it a natural advantage when it comes to developing computerized language processing systems for Asia. Customized spoken Asian language understanding will better serve Asians for location-based services, travel directions, schedules, multimedia information capture and dissemination, local communities of interest, learning systems, and financial transactions.

2.) Budget: Someone once remarked that the entire annual R&D budget for the nation of Singapore was less than the R&D budget for IBM. So Singapore can't expect, and indeed doesn't expect, to prevail based on monetary strength alone; but Singapore most assuredly can expect to prevail by cleverly focusing on the right problem, at the right time, in the right industry. Web 3.0 is not yet explicitly the focus of any nation. While Internet companies are certainly working on the near-term future of the Web, Web 3.0 as defined here is not on their radar and therefore would be a good fit for Singapore’s resources.

  • The linguistic wireless Web 3.0 is intended to support spoken language understanding. This means that mobile phone users will be able to carry on conversations with computers on the Internet. Conversational speech recognition is a notoriously hard problem. However, substantial progress is being made in the two critical areas needed to achieve it: 1) the speech recognition of words themselves, and 2) the interpretation of the "meaning" of the words.

Regarding 1: speech recognition of words. Current attempts to accomplish speech recognition have been limited by the centralized approach of using speaker-independent speech recognition servers. In the research proposal below, we’ll propose a way of using speaker-dependent, custom-trained recognizers.

Regarding 2: interpretation of meaning. Today the Web is just beginning to return sensible results when we type in full sentences such as “What time is it in London?” or “Give me driving directions” or “What is 15% of $24.95?”. Because of the success of the search portals in understanding typed English queries, it is clear that the Web is capable of becoming "linguistic" in the sense that it simulates understanding “the meaning” of words in a query.

So the “understanding of the meaning” of full-sentence typed English queries is getting better every day, and the technique used to accomplish it uses the Web itself. Queries are captured from millions of users. For example, the spelling checker in the Google search bar is not a spelling checker at all. Google simply keeps track of things people type and then notices what they ultimately get. Google can then say “Did you mean…” when someone misspells an entry. The same applies to capturing full sentences. Thus the Web is becoming capable of extracting meaning from words independently of the Singapore Grand Challenge. But more importantly, we can count on it happening and use it to achieve the larger vision of spoken language understanding.
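The log-based “Did you mean…” idea described above can be illustrated with a minimal Python sketch. It is only an illustration of the principle as stated in this essay, not Google's actual implementation; the class `ReformulationLog`, its method names, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


class ReformulationLog:
    """Hypothetical log of what users typed versus what they finally settled on."""

    def __init__(self):
        # typed query -> Counter of the queries users ultimately ran instead
        self._followups = defaultdict(Counter)

    def record(self, typed, settled_on):
        """Note that a user typed one query and ended up with another."""
        typed = typed.strip().lower()
        settled_on = settled_on.strip().lower()
        if typed != settled_on:
            self._followups[typed][settled_on] += 1

    def did_you_mean(self, typed, min_count=2):
        """Return the most common follow-up query, or None if evidence is thin."""
        counts = self._followups.get(typed.strip().lower())
        if not counts:
            return None
        best, seen = counts.most_common(1)[0]
        return best if seen >= min_count else None


log = ReformulationLog()
for _ in range(3):
    log.record("singapor weather", "singapore weather")
log.record("singapor weather", "singapore whether")
suggestion = log.did_you_mean("singapor weather")
```

No dictionary is consulted anywhere: the suggestion comes entirely from counting what other users did next, which is why the same approach can in principle extend from single words to full sentences.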

III. The Grand Challenge: Building the Linguistic Web 3.0

The Grand Challenge is to build the linguistic Web 3.0 capable of engaging in spoken language dialogs with users of mobile phones. It will require the execution of a program that delivers solutions to the main points outlined above, or otherwise uses them as guidelines.

In summary, the goal is to provide

1.) Speech recognition to translate spoken input to the Web from mobile phones;

2.) Web services that take advantage of GPS location-based input from mobile phones;

3.) Web 2.0 features defined in discussion groups, especially those focused on gathering summations of many individuals’ opinions, which yield the “wisdom of crowds”.

The first phase should be dedicated to understanding the key enabling technologies: mobile phone speech recognition, GPS location-based services on mobile phones, and Web 2.0 concepts. From this understanding, greater detail can be added to define exactly how voice control from mobile phones can take advantage of the wealth of knowledge and data that are being poured into the Internet.

To accomplish item 1, speech understanding on mobile phones, we advocate the following:

1.) We would use rapid-adaptation, speaker-specific speech recognition technology to augment speaker-independent technology.

2.) The recognizer itself might reside on the user’s PC rather than on the mobile phone/PDA.

3.) We would use the Web to learn what people want when they speak particular phrases by watching what they do next to achieve their desired objective.

4.) We would begin by collaborating with Google to provide voice access to the list of things they already provide cell phone users who enter questions using text messaging. It's already possible to send a text message to Google asking for “help”, to which Google will reply with a list of things you can do, including language translation, identifying show times and theater locations, driving directions and various calculations. Since Google has gone to the trouble of finding out which questions people are most likely to ask, it provides an ideal context for building an automatic speech recognition system. This context can allow speech recognition to succeed where it earlier failed for lack of sufficient contextual constraints.

5.) We propose to provide free speech recognition to PC users through “widgets” that they will download free from the Web, anywhere in the world. The speech recognition will answer questions in well-defined areas, such as questions about movies or address book/phonebook information. These widgets will provide us with feedback information sent to a central speech understanding learning system. We will capture recordings of user queries. We will use this speech data to train speech recognizers. We will also use it to develop language models of things people say when they want particular results.
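To make the role of “contextual constraints” in items 4 and 5 concrete, here is a minimal Python sketch of one standard technique: a small bigram language model built from logged queries is used to re-rank a recognizer's candidate transcriptions. This is an assumption about how such constraints could be applied, not a design specified in this proposal. The names `BigramModel` and `rescore`, the sample queries, and the acoustic scores are all invented for illustration.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram model trained on logged text queries."""

    def __init__(self, queries):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        self.vocab = {"</s>"}
        for query in queries:
            words = ["<s>"] + query.lower().split() + ["</s>"]
            self.vocab.update(words[1:])
            self.context_counts.update(words[:-1])
            self.bigram_counts.update(zip(words, words[1:]))

    def log_prob(self, sentence):
        """Log probability of a sentence; unseen word pairs get a small share."""
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        size = len(self.vocab) + 1  # one extra slot for unknown words
        total = 0.0
        for prev, cur in zip(words, words[1:]):
            seen = self.bigram_counts[(prev, cur)] + 1
            total += math.log(seen / (self.context_counts[prev] + size))
        return total


def rescore(nbest, model, lm_weight=1.0):
    """Pick the hypothesis with the best combined acoustic + language score.

    nbest is a list of (text, acoustic_log_score) pairs from some recognizer.
    """
    return max(nbest, key=lambda h: h[1] + lm_weight * model.log_prob(h[0]))[0]


# Hypothetical logged queries and recognizer output (scores are made up).
model = BigramModel([
    "movie times near me",
    "movie times tonight",
    "driving directions to the airport",
])
nbest = [("movie dimes near me", -10.0), ("movie times near me", -10.5)]
best = rescore(nbest, model)
```

Here the recognizer slightly prefers the acoustically similar “movie dimes near me”, but the query log makes “movie times near me” far more plausible, so the combined score selects it. The narrower the set of expected questions, the stronger this effect becomes.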

In pursuit of item 2, taking advantage of speech-recognition and GPS on mobile phones, we propose the Human Tertiary Memory Assistant.

Mobile phones with GPS connected to the Internet make it feasible to record everything you see and hear, your location everywhere you go, and record your commentary, all the time, every waking moment of your day. In other words, you will be able to augment your memory. Thus mobile phones will create a living diary for us from which we will be able to retrieve multimedia memories. We would call this the Tertiary Memory Assistant.

We envision being able to set reminders within the Tertiary Memory Assistant which will be triggered on location or time/dates or word associations.
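As a rough illustration of the three trigger types just mentioned, the following Python sketch checks a reminder against the phone's current position, the current time, and recently heard words. All names (`Reminder`, `fires`, `haversine_m`), the 200 m default radius, and the sample coordinates are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

EARTH_RADIUS_M = 6371000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


@dataclass
class Reminder:
    message: str
    place: Optional[Tuple[float, float]] = None  # trigger near this location
    radius_m: float = 200.0
    due: Optional[datetime] = None               # trigger at or after this time
    keyword: Optional[str] = None                # trigger when this word is heard

    def fires(self, position, now, heard_words):
        """True if any configured trigger (place, time, or word) matches."""
        if self.place is not None and haversine_m(position, self.place) <= self.radius_m:
            return True
        if self.due is not None and now >= self.due:
            return True
        if self.keyword is not None:
            return self.keyword.lower() in {w.lower() for w in heard_words}
        return False


# Hypothetical example: a reminder tied to a spot in Singapore.
shop = (1.2839, 103.8515)
buy_milk = Reminder("Buy milk", place=shop, keyword="groceries")
near = buy_milk.fires((1.2840, 103.8516), datetime(2006, 7, 11, 8, 0), [])
far = buy_milk.fires((1.3521, 103.8198), datetime(2006, 7, 11, 8, 0), [])
heard = buy_milk.fires((1.3521, 103.8198), datetime(2006, 7, 11, 8, 0), ["need", "Groceries"])
```

A real assistant would evaluate such checks continuously as GPS fixes and recognized words arrive; the sketch only shows the decision for a single moment.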

Recording information creates the challenges of retrieval, securing privacy & safety, and providing universal access no matter where you are. We envision using speech recognition to perform keyword spotting, both to tag memories and to retrieve them.
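A minimal Python sketch of keyword tagging and retrieval follows. It assumes the speech has already been converted to text by some recognizer, so it shows only the indexing side of keyword spotting; the `MemoryIndex` class, its methods, and the sample transcripts are hypothetical.

```python
from collections import defaultdict


class MemoryIndex:
    """Tags stored transcripts with spotted keywords and retrieves by keyword."""

    def __init__(self, keywords):
        self.keywords = {k.lower() for k in keywords}
        self._by_tag = defaultdict(set)   # keyword -> ids of memories containing it
        self._memories = []

    def add(self, transcript):
        """Store a transcript and tag it with every watched keyword it contains."""
        memory_id = len(self._memories)
        self._memories.append(transcript)
        words = {w.strip(".,!?").lower() for w in transcript.split()}
        for tag in words & self.keywords:
            self._by_tag[tag].add(memory_id)
        return memory_id

    def find(self, *spoken_keywords):
        """Return memories tagged with all of the given keywords, oldest first."""
        ids = None
        for keyword in spoken_keywords:
            hits = self._by_tag.get(keyword.lower(), set())
            ids = hits if ids is None else ids & hits
        return [self._memories[i] for i in sorted(ids or ())]


index = MemoryIndex(["dentist", "birthday", "airport"])
index.add("Dentist appointment moved to Friday.")
index.add("Pick up cake for the birthday.")
index.add("Birthday dinner near the airport.")
birthday_memories = index.find("birthday")
both = index.find("birthday", "airport")
```

Restricting the recognizer to a short list of watched keywords is far easier than transcribing everything, which is what makes keyword spotting attractive for tagging a continuous recording.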

As part of the Grand Challenge, we may decide to develop a suite of technologies to make it easier to upload personal information, as well as public information, onto the Web. We may decide to develop techniques that make it easy to keep content private or share it with friends, collaborators, business partners or the public. And once the information has been uploaded, we may develop methods for rapidly retrieving it, when you want it, using associative memory cues as well as date/time/location stamps.

In summary, the Web of the future, Web 3.0, will be accessed constantly from mobile phones; it will support spoken natural language queries (we’ll be able to talk to it); it will augment human memory; it will provide universal access to all knowledge, and it will answer questions with a level of insight that transcends any single human contributor (the key idea in Web 2.0). I believe Singapore could lead the world in ushering in Web 3.0. This is, indeed, a Grand Challenge.