The Core Interpreter

The Core Interpreter
The Interpreter
The Parser
The Lexer
The Symbol Table
The Stacked Symbol Table
Error Handling
The ActiveX Script Engine
Requirements
The IActiveScript Interface
IActiveScript::AddNamedItem
IActiveScript::SetScriptState
IActiveScript::Close
The IActiveScriptParse and IActiveScriptParseProcedure Interfaces
IActiveScriptParse::ParseScriptText
IActiveScriptParse::AddScriptlet
IActiveScriptParseProcedure::ParseProcedureText
The Scripting Engine
NamedItem List
Event Handler List
Interpreter List
The ActiveX Interpreter
The Event Handler
Error Handling
Automating External COM Objects
Calling Script Methods from External Clients
Active Debugging Features
Setting Breakpoints
Hitting Breakpoints
Call Stacks
Property Browsing
Immediate and Watch Windows
Active Debugging Support
Requirements
The Script Engine
The Interpreter
The Parser
The Symbol Table
Debug Expressions
Simple Host Backup
SimpleHostBackup::Initialize
SimpleHostBackup::PreParseScriptText
Error Handling

The Core Interpreter

At the core of every Active Script Engine is a humble interpreter, and the sample script engine is no different. It is here that the lion’s share of the work of interpreting script blocks is done. The details of interpreter design and implementation are beyond the scope of this sample. However, the basic stages and classes of the core interpreter are presented here to give a foundation for the Active Scripting Engine integration a little later.

It should be noted that there is a great deal of latitude in the design of an interpreter. This sample presents one possible implementation, but it is by no means the only implementation possible. In designing an interpreter, care and thought should be given to the functionality that will be required. In the sample interpreter, these requirements have been pointed out in an effort to make design easier.

The Interpreter

The core class of the sample interpreter is the CInterpreter class. It controls the parsing of script code, as well as the management of variables. The CInterpreter class acts like a virtual machine for the scripting language, keeping track of the state of the script and reconciling function calls and variable references. Script can be added directly to the interpreter through the ParseText method:

void CInterpreter::ParseText( LPCOLESTR scriptText, ULONG startingLineNumber,
                              DWORD dwSourceContext )

This method takes the script text to parse, the line number that the script starts on and a unique identifier for the script block. The last two parameters take into account the fact that a script block may not start on the first line of a document, and that more than one script block may be added to the interpreter.

At this stage of script engine construction, there is only one CInterpreter. However, it’s important to remember that an Active Script Engine may need to have more than one. It is also important that facilities are in place for the script to reference functions and variables outside the interpreter, and for other clients to be able to do the same for functions and variables inside. For this reason, CInterpreter defines and makes use of the PutValue, GetValue, and Call methods. These methods abstract method and variable resolution so they can be more easily overridden later.
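The value of routing every lookup through overridable methods can be illustrated with a small, portable sketch. It is not the sample's code: SketchInterpreter, HostedInterpreter, and AddExternal are hypothetical names, values are plain ints rather than VARIANTs, and Call is omitted for brevity. It only shows how a later layer might override GetValue to reach names that live outside the interpreter.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for CInterpreter; values are ints to stay portable.
class SketchInterpreter {
public:
    virtual ~SketchInterpreter() = default;

    // Default resolution: consult only the variables this interpreter owns.
    virtual bool GetValue(const std::string& name, int& out) const {
        auto it = m_vars.find(name);
        if (it == m_vars.end()) return false;
        out = it->second;
        return true;
    }

    virtual void PutValue(const std::string& name, int value) {
        m_vars[name] = value;
    }

private:
    std::map<std::string, int> m_vars;
};

// Hypothetical later layer: falls back to externally supplied names.
class HostedInterpreter : public SketchInterpreter {
public:
    void AddExternal(const std::string& name, int value) {
        m_external[name] = value;
    }

    bool GetValue(const std::string& name, int& out) const override {
        if (SketchInterpreter::GetValue(name, out)) return true;  // local names win
        auto it = m_external.find(name);
        if (it == m_external.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::string, int> m_external;
};
```

Because the core interpreter only ever calls GetValue and PutValue, the derived class can change where names come from without touching the parsing or execution code.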

The Parser

When an instance of the CInterpreter class is given script to run, the script must first be translated into meaningful instructions. This is the parser’s job. The parser must also ensure that the stream of tokens it receives is meaningful, and resolve any ambiguities that exist in the script. This check is necessary because a stream of valid tokens may still be nonsense: “Inside value why lately” contains valid words, but they don’t make up a valid statement.

In the sample script engine, this job falls to the CParser class. CParser begins parsing script by lexing the raw text into tokens. Then, using a well-defined grammar and a simple recursive-descent style, CParser takes the tokens and uses them to create a stream of meaningful, unambiguous instructions for the next phase. CInstructions, which contain the type of the instruction and the labels and/or tokens that are associated with it, represent these instructions.

CParser receives script blocks through a call to its ParseText method. This method takes the address of the instruction list where new CInstructions should be placed, the text of the script, the starting line number of the script, and the unique identifier for the script block:

void CParser::ParseText( TList<CInstruction*>* pIList, LPCOLESTR scriptText,
                         ULONG startingLineNumber, DWORD dwSourceContext )
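For readers unfamiliar with the recursive-descent style, the following miniature may help. It is a sketch under stated assumptions, not the CParser implementation: the grammar is a toy arithmetic one invented for the example, tokens arrive already split into strings, and instructions are plain strings rather than CInstructions. SketchParser and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Grammar assumed for this sketch only:
//   expr := term   { ('+' | '-') term }
//   term := number { '*' number }
class SketchParser {
public:
    explicit SketchParser(std::vector<std::string> tokens)
        : m_tokens(std::move(tokens)) {}

    // Appends stack-machine style instructions to 'out'.
    // Throws std::runtime_error on a syntax error.
    void ParseExpr(std::vector<std::string>& out) {
        ParseTerm(out);
        while (!AtEnd() && (Peek() == "+" || Peek() == "-")) {
            std::string op = m_tokens[m_pos++];
            ParseTerm(out);
            out.push_back(op == "+" ? "ADD" : "SUB");
        }
    }

    bool AtEnd() const { return m_pos == m_tokens.size(); }

private:
    const std::string& Peek() const { return m_tokens[m_pos]; }

    void ParseTerm(std::vector<std::string>& out) {
        ParseNumber(out);
        while (!AtEnd() && Peek() == "*") {
            ++m_pos;                 // consume '*'
            ParseNumber(out);
            out.push_back("MUL");
        }
    }

    void ParseNumber(std::vector<std::string>& out) {
        if (AtEnd() || !std::isdigit(static_cast<unsigned char>(Peek()[0])))
            throw std::runtime_error("syntax error: number expected");
        out.push_back("PUSH " + m_tokens[m_pos++]);
    }

    std::vector<std::string> m_tokens;
    std::size_t m_pos = 0;
};
```

Each grammar rule maps to one method, and operator precedence falls out of which rule calls which: ParseExpr calls ParseTerm, so multiplication binds tighter than addition.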

The Lexer

At the most fundamental level, a block of script is nothing more than a string of characters. The role of the lexer is to take that string and break it down into “words”, or tokens, for the next phase in the process. The sample script engine’s CParser class uses the CLexer class to perform this function. CLexer takes a Unicode string and breaks it down into CTokens, which encapsulate the tokens for the script. These CTokens are then returned to the CParser one at a time for use in the recursive-descent parsing process. This is accomplished through the getNextToken method.

STDMETHODIMP CLexer::getNextToken(CToken** theNextToken)

In designing a lexer, it is important that as little information as possible is lost during the lexing process. CTokens encapsulate the text and type of each token of course, but they also contain the row and column position of the token, the offset of the token from the beginning of the script block, and a DWORD which identifies the script block where the token originated. All of this information will be needed later.

class CToken {
private:
    // Data members
    MY_TOKEN_TYPE m_theType;          // Type of the token
    LPCOLESTR     m_theSource;        // Source string of the token
    TEXTPOS       m_theTextPosition;  // Row and column position of the token
    ULONG         m_charOffset;       // Character offset from the beginning of
                                      // the script block
    DWORD         m_dwSourceContext;  // Unique identifier for the script block
                                      // that this token is found in
};
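As an illustration of carrying this position information through the lexing step, here is a small portable sketch. It is not CLexer: SketchToken and LexSketch are hypothetical names, std::string stands in for LPCOLESTR, plain integers stand in for TEXTPOS and DWORD, tokens are split on whitespace only, and no token type is assigned.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical counterpart of CToken.
struct SketchToken {
    std::string   text;
    unsigned long row;            // line number, counted from startingLine
    unsigned long column;         // zero-based column within that line
    unsigned long charOffset;     // offset from the start of the script block
    unsigned long sourceContext;  // identifies the originating script block
};

// Splits 'script' on whitespace while recording where each token was found.
std::vector<SketchToken> LexSketch(const std::string& script,
                                   unsigned long startingLine,
                                   unsigned long sourceContext) {
    std::vector<SketchToken> tokens;
    unsigned long row = startingLine;
    unsigned long column = 0;
    std::size_t i = 0;
    while (i < script.size()) {
        const char c = script[i];
        if (c == '\n') { ++row; column = 0; ++i; continue; }
        if (std::isspace(static_cast<unsigned char>(c))) { ++column; ++i; continue; }

        const std::size_t start = i;
        const unsigned long startColumn = column;
        while (i < script.size() &&
               !std::isspace(static_cast<unsigned char>(script[i]))) {
            ++i;
            ++column;
        }
        tokens.push_back({script.substr(start, i - start), row, startColumn,
                          static_cast<unsigned long>(start), sourceContext});
    }
    return tokens;
}
```

Nothing about the token's origin is discarded, so later stages can report errors by line and column, or map a token back to its script block.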

The Symbol Table

In order for the values of variables to be stored and retrieved, the interpreter must maintain a symbol table. Here, a list of all the symbols is kept, and the interpreter can look up a variable by name to save or change its value. This job falls to the CSymbolTable class. CSymbolTable maintains a simple linked list of CSymbols, which encapsulate individual variables. The CInterpreter class creates an instance of CSymbolTable during initialization, and fills it as CInstructions are executed. Through the FindSymbol method, CInterpreter can locate the variable it needs and modify its value.

BOOL CSymbolTable::FindSymbol( LPCOLESTR symbolName, CSymbol** pp_theSymbol )
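A minimal sketch of such a lookup follows, assuming a linked list of name/value pairs. SketchSymbol, SketchSymbolTable, and AddSymbol are hypothetical; the source does not show how CSymbol stores its value or how symbols are added, so both are simplified here.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for CSymbol; the value is an int for simplicity.
struct SketchSymbol {
    std::string name;
    int value;
};

class SketchSymbolTable {
public:
    // Mirrors the shape of FindSymbol: returns true and sets *ppSymbol on a
    // hit; returns false and sets *ppSymbol to nullptr on a miss.
    bool FindSymbol(const std::string& name, SketchSymbol** ppSymbol) {
        for (SketchSymbol& symbol : m_symbols) {
            if (symbol.name == name) {
                *ppSymbol = &symbol;
                return true;
            }
        }
        *ppSymbol = nullptr;
        return false;
    }

    // Hypothetical helper, not taken from the source.
    SketchSymbol* AddSymbol(const std::string& name, int value) {
        m_symbols.push_back({name, value});
        return &m_symbols.back();
    }

private:
    std::list<SketchSymbol> m_symbols;  // list nodes keep stable addresses
};
```

Returning a pointer to the stored symbol, rather than a copy, is what lets the caller modify the variable's value in place.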

The Stacked Symbol Table

In languages where there is only one global scope, symbol table management is trivial. However, most languages allow more than one scope. Brackets, function calls, and overloaded variable names all contribute to scoping problems. In these cases, there must be a way to ensure that the variables are reconciled correctly. In the sample script engine, the CStackedSymbolTable class does this work.

The CStackedSymbolTable class maintains a pseudo-stack of CSymbolTables. When a new scope is created, the CStackedSymbolTable pushes a new CSymbolTable onto the stack. When the scope is destroyed, the CSymbolTable is popped off the stack and deleted. In between, variable references are reconciled by searching each of the CSymbolTables on the stack, starting with the top. In this fashion, variables go in and out of scope correctly.
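The push, pop, and top-down search can be sketched as follows. This is an illustration only: SketchStackedSymbolTable and its method names are hypothetical, each scope is a std::map rather than a CSymbolTable, and values are plain ints.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class SketchStackedSymbolTable {
public:
    SketchStackedSymbolTable() { PushScope(); }  // start with the global scope

    void PushScope() { m_scopes.emplace_back(); }

    // The global scope is never popped in this sketch.
    void PopScope() {
        if (m_scopes.size() > 1) m_scopes.pop_back();
    }

    // Declares (or overwrites) a variable in the innermost scope.
    void Declare(const std::string& name, int value) {
        m_scopes.back()[name] = value;
    }

    // Searches from the top of the stack down, so inner names hide outer ones.
    bool Lookup(const std::string& name, int& out) const {
        for (auto scope = m_scopes.rbegin(); scope != m_scopes.rend(); ++scope) {
            auto it = scope->find(name);
            if (it != scope->end()) {
                out = it->second;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<std::map<std::string, int>> m_scopes;
};
```

A variable declared in an inner scope hides an outer variable of the same name until the inner scope is popped, at which point the outer one becomes visible again.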

Error Handling

There are nearly unlimited combinations of characters that can be called scripts and sent to an interpreter. Only a small fraction of those are valid programs. Invalid characters, syntax errors, invalid variable references and bad method calls all add to the need for an error handling facility. This allows the interpreter to report to users what has gone wrong, and hopefully allow them to correct the problem. The CErrorHandler class provides this ability in the sample script engine.

Errors are detected at various stages of the interpreting process. The CLexer class detects invalid characters and reports them to the error handler. Similarly, the CParser class detects syntax errors and the CInterpreter class detects runtime errors such as invalid method calls and variable references. At this stage, the error handler simply displays a message to the user explaining the problem and where it occurred. With the addition of Active Scripting, however, error handling becomes much more elaborate. To facilitate this new functionality, two function pointers, HandleCompileError and HandleRuntimeError, are used to abstract the error handling process. These function pointers are directed to the error handling methods for the stage of the engine that is being used.
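The function-pointer indirection might look something like the sketch below. The source names the two pointers but not their parameters, so the ErrorCallback signature, SketchErrorHandler, ReportToConsole, ReportToHost, and LastReport are all assumptions made for illustration. To keep the sketch observable without console output, the handlers record their message in a string.

```cpp
#include <cassert>
#include <string>

// Hypothetical callback signature: a message and the line it refers to.
using ErrorCallback = void (*)(const std::string& message, unsigned long line);

struct SketchErrorHandler {
    ErrorCallback HandleCompileError;
    ErrorCallback HandleRuntimeError;
};

// Holds the most recent report so the effect of a handler can be inspected.
inline std::string& LastReport() {
    static std::string report;
    return report;
}

// Handler for the stand-alone interpreter stage (would normally print).
void ReportToConsole(const std::string& message, unsigned long line) {
    LastReport() = "console: line " + std::to_string(line) + ": " + message;
}

// Handler for a later stage that forwards errors to a scripting host.
void ReportToHost(const std::string& message, unsigned long line) {
    LastReport() = "host: line " + std::to_string(line) + ": " + message;
}
```

The lexer, parser, and interpreter call through the pointers and never need to know which stage of the engine is active; retargeting a pointer is enough to change where errors go.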

The ActiveX Script Engine

The purpose of ActiveX Scripting is to allow hosts to use script engines from multiple sources and vendors to perform scripting. The interfaces that make up ActiveX Scripting make it possible for hosts to use arbitrary languages for scripting. On the language engine side, the fact that the ActiveX Scripting interfaces are language independent presents the opportunity to decouple the interfaces of an ActiveX Script Engine from the actual implementation of its interpreter. If the core interpreter has been carefully designed, no changes need to be made to existing code to add ActiveX Scripting support.

Requirements

All ActiveX Script Engines must implement the IActiveScript interface. This interface serves as the primary interface of the ActiveX Script Engine, and allows an ActiveX Script Host to control the state of the engine, as well as its namespace and output facilities. An ActiveX Script Engine must also implement either the IActiveScriptParse interface or one of the IPersist interfaces. This permits scripts to be added to the engine. Most script engines choose to implement the IActiveScriptParse interface, because that allows script to be added directly to the engine without requiring it to be saved in a special format first.

An ActiveX Script Engine may also choose to implement the IActiveScriptParseProcedure interface. This interface provides the ability for a Host to construct a script method within the engine “on the fly” and wrap it in an IDispatch pointer so it can be called from outside the script engine. This functionality is particularly useful in creating events without the use of the IConnectionPoint interface.

The IActiveScript Interface

Although simple to use, the methods provided by the IActiveScript, IActiveScriptParse, and IActiveScriptParseProcedure interfaces are intricate. Often, they perform complex actions that the Host is not aware of. The IActiveScript interface, which serves as the primary interface of the ActiveX Script Engine, exposes three such methods.

IActiveScript::AddNamedItem

This method is used to add new items and namespaces to the script engine. The first parameter to this function, of course, is a new name to add to the engine. The second parameter contains the flags that dictate what effect the new name will have on the engine’s namespaces. The available flags are:

SCRIPTITEM_CODEONLY – This flag indicates that the name is only used to create a new namespace within the script engine. It informs the engine that there is no external COM object that will provide methods for scripts to call.
SCRIPTITEM_NOCODE – This flag indicates that the name only refers to an external COM object that will provide methods for scripts to call. It informs the engine that it doesn’t need to create a namespace for this name, because the Host will not add any scripts for this name.
SCRIPTITEM_GLOBALMEMBERS – This flag informs the engine that the Host wants this name to be used to extend the run-time libraries of the engine. Methods provided by this object can be called from any script context without specifying the named item they belong to.
SCRIPTITEM_ISPERSISTENT – This flag indicates that this named item should be persisted. It informs script engines that support any of the IPersist* interfaces or the IActiveScript::Clone method that this named item should be saved as part of the engine’s state.
SCRIPTITEM_ISSOURCE – This flag indicates that this named item supports events that the Host may wish to sink in script. The engine uses this hint to determine which objects it needs to connect as event sources.
SCRIPTITEM_ISVISIBLE – This flag informs the script engine that the named item can be referenced in scripts. Without this flag, references to the named item will be unresolved.

Some of these flags conflict with one another. SCRIPTITEM_CODEONLY conflicts with every other flag except SCRIPTITEM_ISPERSISTENT. The conflict with SCRIPTITEM_NOCODE is obvious, and the conflict with the other flags exists because SCRIPTITEM_GLOBALMEMBERS, SCRIPTITEM_ISSOURCE and SCRIPTITEM_ISVISIBLE all require a COM object to interact with. The SCRIPTITEM_NOCODE flag conflicts with SCRIPTITEM_ISSOURCE because marking a named item as an event source requires that it be able to have scripted methods, which the SCRIPTITEM_NOCODE flag denies. Although SCRIPTITEM_NOCODE doesn’t actually require SCRIPTITEM_ISVISIBLE, it doesn’t really make sense without it. Essentially, such a named item would be invisible, because internal scripts could not see it to resolve references, and external objects would have no need to.
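The conflict rules described above can be captured in a small validation routine. The sketch below is portable and deliberately does not include the Windows headers: the k* constants are local stand-ins with arbitrary bit values, and an engine built on Windows would test the real SCRIPTITEM_* constants from activscp.h instead. CheckNamedItemFlags and FlagCheck are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the SCRIPTITEM_* flags; bit values are arbitrary here.
constexpr std::uint32_t kCodeOnly      = 1u << 0;
constexpr std::uint32_t kNoCode        = 1u << 1;
constexpr std::uint32_t kGlobalMembers = 1u << 2;
constexpr std::uint32_t kIsPersistent  = 1u << 3;
constexpr std::uint32_t kIsSource      = 1u << 4;
constexpr std::uint32_t kIsVisible     = 1u << 5;

enum class FlagCheck {
    Ok,         // combination is usable
    Conflict,   // combination contradicts itself
    Pointless   // legal, but the item could never be referenced
};

FlagCheck CheckNamedItemFlags(std::uint32_t flags) {
    // CODEONLY may be combined only with ISPERSISTENT.
    const std::uint32_t others = flags & ~(kCodeOnly | kIsPersistent);
    if ((flags & kCodeOnly) && others != 0) return FlagCheck::Conflict;

    // An event source needs scripted methods, which NOCODE rules out.
    if ((flags & kNoCode) && (flags & kIsSource)) return FlagCheck::Conflict;

    // NOCODE without ISVISIBLE yields an item nothing can see.
    if ((flags & kNoCode) && !(flags & kIsVisible)) return FlagCheck::Pointless;

    return FlagCheck::Ok;
}
```

Whether an engine rejects a conflicting combination outright or merely ignores the offending flag is a design decision the text above leaves open; this sketch only classifies the input.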

When the Host calls IActiveScript::AddNamedItem, it is entering into a contract with the script engine. Except in cases where the SCRIPTITEM_CODEONLY flag was set, the script engine will call the Host’s IActiveScriptSite::GetItemInfo method to get the IUnknown interface pointer of the named item. In some cases, it may also request the ITypeInfo interface pointer of the named item. It is the Host’s responsibility to resolve the names it added to the script engine with IActiveScript::AddNamedItem into COM objects when IActiveScriptSite::GetItemInfo is called.

IActiveScript::SetScriptState

The IActiveScript::SetScriptState method is used by the Host to control the current state of the script engine. Under normal circumstances the Host only calls this method to instruct the engine to begin executing immediate code or to connect itself to event sources. However, the engine cannot transition arbitrarily from one state to another. Like a dial, if there are intervening states between the current state and the requested one, the engine must transition through each on its way to the requested state. The possible states of the script engine, and their order, are as follows:

SCRIPTSTATE_UNINITIALIZED – The initial state of the script engine. The engine has been created but has not had any scripts loaded into it yet. The engine is not generally usable from this state.
SCRIPTSTATE_INITIALIZED – The script engine has had an IActiveScriptSite interface set, and has been initialized via IActiveScriptParse::InitNew or IPersist*::InitNew, but is not running script and has not connected to any objects.
SCRIPTSTATE_STARTED – Setting this state causes the engine to execute any immediate code that was added during the initialized state. In this state, the engine has not connected to any event sources, and so will not receive any event notifications.
SCRIPTSTATE_CONNECTED – This state causes the engine to connect itself to all the event sources it’s aware of from calls to IActiveScript::AddNamedItem, and to respond appropriately to event notifications.
SCRIPTSTATE_DISCONNECTED – In this state, the script engine has temporarily disconnected itself from event sources. In normal operation, a script engine may transition freely back and forth between this state and the connected state as new script blocks and named items are added to the script engine.
SCRIPTSTATE_CLOSED – The script engine has shut down in this state, and all that is left is to release its interface pointers. Once an engine reaches this state, it releases all of its interface pointers and destroys its run-time state. Most method calls to the engine, including IActiveScript::SetScriptState, will fail when the engine is in this state.

It is not possible for a script engine to skip states when transitioning from one state to another. For example, a common scenario is for the script engine to be in the initialized state, with named items and script blocks added and ready to run. The Host calls IActiveScript::SetScriptState with the SCRIPTSTATE_CONNECTED flag to tell the engine to connect itself to events. However, the started state intervenes between the initialized and connected states. The engine must transition through the started state, executing all immediate code queued in the initialized state, before connecting to events. For this reason, no events can be handled while the script engine is executing immediate code.
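The initialized-to-connected scenario can be sketched as a loop that steps through each intervening state and performs that state's work on the way. This is a simplified model, not the sample engine: SketchEngine and State are hypothetical, only forward transitions up to the connected state are modeled, and the enum is ordered for the sketch without making any claim about the numeric values of the real SCRIPTSTATE_* constants.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Logical ordering of the states as listed in the text.
enum class State { Uninitialized, Initialized, Started, Connected, Disconnected, Closed };

class SketchEngine {
public:
    State state = State::Initialized;   // assume a site has already been set
    std::vector<std::string> log;       // work performed during transitions

    // Handles forward moves up to Connected; anything else is not modeled
    // and is reported as a failure.
    bool SetState(State requested) {
        if (requested < state || requested > State::Connected) return false;
        while (state != requested) {
            state = static_cast<State>(static_cast<int>(state) + 1);
            if (state == State::Started)   log.push_back("run immediate code");
            if (state == State::Connected) log.push_back("connect event sources");
        }
        return true;
    }
};
```

Asking an initialized engine for the connected state therefore runs the immediate code first and only then connects the event sources, which is why no events arrive while immediate code is executing.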

IActiveScript::Close

Reference counting requires special care with ActiveX Script Engines, because of the significant number of back-references that are maintained by various pieces of the system. A common mistake in ActiveX Scripting is to fail to call the IActiveScript::Close method before releasing a script engine. The Close method instructs the script engine to destroy its run-time state, and to release all interface pointers it may have acquired. This includes the IActiveScriptSite pointer, which points back to the Host. If this method is not called, then reference counts could trap the Host and other COM objects in memory after they’ve been released.
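The trap can be reproduced without COM at all. The sketch below uses hand-rolled reference counts in place of IUnknown; SketchHost, SketchEngine, SetSite, and LiveHostsAfterClientRelease are hypothetical names, and the real interfaces involve many more pointers than the single back-reference shown here.

```cpp
#include <cassert>

struct SketchHost;

struct SketchEngine {
    int refs = 1;
    SketchHost* site = nullptr;    // back-reference to the Host

    void SetSite(SketchHost* host);
    void Close();                  // drops the back-reference
    void Release() { if (--refs == 0) delete this; }
};

struct SketchHost {
    static int liveHosts;
    int refs = 1;
    SketchEngine* engine = nullptr;

    SketchHost() { ++liveHosts; }
    ~SketchHost() { --liveHosts; if (engine) engine->Release(); }
    void AddRef() { ++refs; }
    void Release() { if (--refs == 0) delete this; }
};
int SketchHost::liveHosts = 0;

void SketchEngine::SetSite(SketchHost* host) { site = host; site->AddRef(); }

void SketchEngine::Close() {
    // Clear the member before releasing: releasing the Host may destroy
    // this engine, so no member may be touched afterwards.
    SketchHost* held = site;
    site = nullptr;
    if (held) held->Release();
}

// Returns how many hosts are still alive after the client lets go of its
// only reference to the Host.
int LiveHostsAfterClientRelease(bool callCloseFirst) {
    SketchHost* host = new SketchHost;
    SketchEngine* engine = new SketchEngine;
    host->engine = engine;                 // the Host owns the engine
    engine->SetSite(host);                 // Host count is now 2
    if (callCloseFirst) engine->Close();   // Host count back to 1
    host->Release();                       // frees both only if Close was called
    const int alive = SketchHost::liveHosts;
    if (!callCloseFirst) engine->Close();  // tidy up so the sketch does not leak
    return alive;
}
```

Without the call to Close, the engine's back-reference keeps the Host's count above zero, and since the Host in turn holds the engine, neither object is ever destroyed.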