Domain-specific Language (DSL) Projects:
These projects involve the creation of specialized sublanguages for describing given domains, as well as for asking and answering questions about each domain. Each sublanguage generally has a syntax, vocabulary and semantics similar to a small subset of a natural language (English in this case), including terminology and concepts related to the domain. Each language should be extendable by extending its syntax, vocabulary and semantics. Below are descriptions of five domains for which a DSL will be developed.
Your general approach may be as follows:
A. Determine what types of questions you want to answer in your domain.
Look at the examples given for each domain below.
- Determine an abstract syntax for entering the facts that will make up the domain.
- Determine how you will display the facts, check for inconsistencies and make changes.
- Determine an abstract syntax for the questions.
e.g. How do you <do a particular task>?
What is <attribute name> of <object>?
Look at the examples given for each domain below. What types of syntax rules are needed to express them? Create abstract syntaxes (such as the above) that can express those questions.
- Fill in the blanks and provide the concrete question.
example question: What is A’s color?
or
What is the color of A?
- Determine an answer syntax
These are generally easier. They are determined by the question’s syntax. For each question syntax, you can define one or more abstract answer syntaxes.
- Fill in the blanks and provide the concrete answer.
E.g. <object>’s <attribute> is <attribute-value>.
example answers: A’s color is red,
or
Color of A is red.
Either combination is acceptable.
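The question and answer syntaxes above can be sketched as pattern matching. This is only a minimal illustration, assuming a hypothetical fact table FACTS and two regular expressions for the two concrete question forms; a real project would likely use a proper grammar.

```python
import re

# Hypothetical fact store: (attribute, object) -> value.
FACTS = {("color", "A"): "red"}

# Two concrete forms of the abstract syntax "What is <attribute name> of <object>?".
# Both straight (') and curly (’) apostrophes are accepted.
PATTERNS = [
    re.compile(r"^What is (?:the )?(?P<attr>\w+) of (?P<obj>\w+)\?$", re.IGNORECASE),
    re.compile(r"^What is (?P<obj>\w+)['’]s (?P<attr>\w+)\?$", re.IGNORECASE),
]

def answer(question):
    """Match the question against each pattern and fill in the answer syntax."""
    for pattern in PATTERNS:
        match = pattern.match(question.strip())
        if match:
            attr = match.group("attr").lower()
            obj = match.group("obj")
            value = FACTS.get((attr, obj))
            if value is None:
                return f"I do not know {obj}'s {attr}."
            return f"{obj}'s {attr} is {value}."
    return "I do not understand the question."

print(answer("What is A’s color?"))       # A's color is red.
print(answer("What is the color of A?"))  # A's color is red.
```

Each new question type then becomes one more pattern plus one answer template, which keeps the language extendable.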
B. Design an abstract syntax for expressing the facts to create your domain. (Similar to the above)
C. Design a general structure for storing the facts.
Typically, structures consisting of triples and nested triples are used to store such facts. Each triple expresses a relation or a fact between two things or groups of things.
Examples:
(is_name_of, <person>, <name>)
(is_at_location, <object>, <coordinate1, coordinate2>)
(is_uncle_of, <uncle_name>, <nephew_name> )
(is_gamescore_of, (<team>,<team>,<date>),(<integer>,<integer>))
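One possible way to hold such triples in memory is sketched below. The class name TripleStore, its methods and the sample names are assumptions made for illustration, not a required design.

```python
from collections import defaultdict

class TripleStore:
    """Stores facts as (relation, first, second) triples, indexed by relation name."""

    def __init__(self):
        self._by_relation = defaultdict(list)

    def add(self, relation, first, second):
        self._by_relation[relation].append((first, second))

    def query(self, relation, first=None, second=None):
        """Return all (first, second) pairs of a relation; None acts as a wildcard."""
        return [
            (f, s)
            for f, s in self._by_relation.get(relation, [])
            if (first is None or f == first) and (second is None or s == second)
        ]

store = TripleStore()
store.add("is_uncle_of", "Hasan", "Ali")  # hypothetical names
store.add("is_at_location", "X", (10, 5))
# Nested triple: both operands are themselves tuples.
store.add("is_gamescore_of", ("A", "B", "2024-05-01"), (2, 1))

uncles_of_ali = [uncle for uncle, _ in store.query("is_uncle_of", second="Ali")]
print(uncles_of_ali)  # ['Hasan']
```

Indexing by the relation name matches step D, where the first element of the triple selects the logic to run.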
D. Handling the semantics:
For each type of question, as expressed in the first element of the triple, make a general design for:
The overall logic,
The types of facts needed based on question type,
The data structures needed to store the facts,
The searches over the triples needed to determine the facts and storing them in the data structures,
Performing the calculations needed to create the answers from the facts.
Create a code skeleton to accomplish the above and put the answer in the desired answer syntax.
Fill in the code details.
Test your algorithm for that type of question.
Do lots of testing with different types of inputs for each type of question.
E. (Very Important!) Describe your design in a Design Document, for which a format is provided in the project package on the web site. It will be a part of your solution to the project.
1-DSL for Information Retrieval (InfoDSL)
This DSL will operate on documents stored in a folder or on the web. It will permit:
1) Parsing texts of selected documents to extract sentences, words and word-roots using information retrieval (IR) techniques (e.g. leaving out insignificant words),
2) Measuring similarities between documents by using IR techniques,
3) Searching for documents on the Web,
4) Ordering them by similarity,
5) Grouping them by subject as determined from retrieved information,
6) Creating and saving links to them in an HTML file by subject,
7) Identifying pictures and storing them in a folder by subject,
8) Saving all folders in a folder with the DSL’s name,
9) Asking and answering questions about the documents,
10) Displaying documents.
Some sample question and answer types:
- Which are the 4 documents most similar to document D1?
(A measure for similarity must have been defined.)
E1,F3,G4,K3 are the 4 documents most similar to D1.
- In how many documents are the words “nuclear” and “weapon” used together?
The words “nuclear” and “weapon” are used together in 17 documents.
- Show the 50 most frequent words and their usage counts in decreasing order.
The 50 most frequent words and their usage counts in decreasing order are:
D1,30
D2,27
D3,27
D4,27
D5,26
…,..
- How many unique word-stems are there in document D1? (Note: A word-stem consists of up to the first 5 letters of a word.)
There are 355 unique word-stems in document D1.
- Show me the links for documents E1,F3,G4.
<Should display a page containing the desired links.>
- Which document uses the word “atom” the most?
F3 uses the word “atom” the most.
- Which document contains the most pictures?
G3 contains the most pictures.
- How many pictures does G3 contain?
G3 contains 17 pictures.
….,..
- Which documents came from ? K2,H8,D10 came from
Documents T1,T5,V19 came from
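Two of the sample questions above (unique word-stems and words used together) can be sketched as follows. This is a simplified illustration: the tokenizer only handles unaccented Latin letters, no insignificant words are removed, and the two-document collection docs is hypothetical.

```python
import re

def words(text):
    """Lower-case the text and split it into alphabetic words (ASCII letters only)."""
    return re.findall(r"[a-z]+", text.lower())

def unique_stems(text, stem_length=5):
    """A word-stem is taken as up to the first 5 letters of a word, as noted above."""
    return {word[:stem_length] for word in words(text)}

def documents_using_all(documents, *terms):
    """Names of the documents in which every given term occurs."""
    return sorted(
        name for name, text in documents.items()
        if all(term in set(words(text)) for term in terms)
    )

# Hypothetical two-document collection for illustration.
docs = {
    "D1": "Nuclear weapons and nuclear energy. A nuclear weapon treaty.",
    "D2": "Atoms and atomic energy.",
}

print(len(unique_stems(docs["D1"])))                    # 6
print(documents_using_all(docs, "nuclear", "weapon"))   # ['D1']
```

The answer sentence is then produced by filling the counts into the answer syntax, e.g. "There are 6 unique word-stems in document D1."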
2- DSL for simple 2D scene description (2DSceneDSL)
In this project, a Domain Specific Language will be designed and developed that provides facilities for defining a scene containing 2-dimensional colored geometric objects, groups of objects, and their simultaneous movements. The objects will be limited to standard regular convex shapes.
The language will provide facilities for describing:
i) The scene (dimensions, color)
ii) Fixed and movable objects in the scene (coordinates of their vertices and their colors)
iii) Executing a scenario that contains descriptions of object movements (direction, speed, duration or destination, what happens in case of collisions, etc.)
The scenario will support statements like (numbers denote pixels):
Origin at top left (0,0).
X is a yellow square at 10,5 ;20,15. // Diagonal coordinates. Note that the number of points depends on shape.
Y is a red triangle at 20,10;30,10;25,15.
Z is a blue point at 4,4.
X is moving east at 1/sec for 2 secs. // east is assumed to be right direction
Y is moving southeast at 3/sec for 1 sec horizontally and vertically. //southeast means toward right and down.
It will support asking questions about the scene and objects and will provide answers by using the DSL.
Sample questions:
What are the positions of X,Y after 3 secs?
X is at 12,5; 22,15. // Has moved 2 pixels to the right in 2 secs and stopped; both diagonal corners shift.
/* For Y, moving at 3 pixels/sec for 1 sec in each direction means each vertex moves 3 pixels in each direction. */
Y is at 23,13;33,13;28,18.
How far is X from Y?
/* This means how far apart the centers of X and Y are.
We need to compute the centers of square X and triangle Y and find their distance.
Center of X is at ((12+22)/2, (5+15)/2) = (17,10).
For a triangle there are many definitions of a center. We’ll use the “centroid” as the center.
Coordinates of the centroid of a triangle are found by averaging the x-coordinates and the y-coordinates of its vertices.
Thus the centroid of Y at its new location is at ((23+33+28)/3, (13+13+18)/3) = (28,14.7).
Thus the distance between the two centers is
((28-17)^2 + (14.7-10)^2)^0.5 = (11^2 + 4.7^2)^0.5 = (121+22.09)^0.5 ≈ 12
*/
X is about 12 pixels from Y.
What color is Z?
Z is blue.
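The movement and distance questions above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and it assumes that a move shifts every vertex of a shape, so both diagonal corners of square X are translated.

```python
import math

def move(vertices, dx, dy):
    """Translate every vertex by (dx, dy); y grows downward (origin at top left)."""
    return [(x + dx, y + dy) for x, y in vertices]

def center(vertices):
    """Average of the given points: centroid of a triangle, midpoint of a diagonal."""
    xs, ys = zip(*vertices)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def distance(a, b):
    """Straight-line distance between two points, in pixels."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# X: square given by its diagonal, moving east at 1 pixel/sec for 2 secs.
x = move([(10, 5), (20, 15)], dx=1 * 2, dy=0)
# Y: triangle moving southeast at 3 pixels/sec for 1 sec on each axis.
y = move([(20, 10), (30, 10), (25, 15)], dx=3 * 1, dy=3 * 1)

print(x)                                       # [(12, 5), (22, 15)]
print(y)                                       # [(23, 13), (33, 13), (28, 18)]
print(round(distance(center(x), center(y))))   # 12
```

Averaging the two diagonal corners gives the center of a square, and averaging the three vertices gives the centroid of a triangle, so one function covers both shapes in this example.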
3- DSL for city streets (StreetsDSL)
This DSL deals with describing streets and landmarks (important locations) in a part of a city. It will consist of three sub-DSLs, one for describing the layout of streets and landmarks, one for asking questions and one for answering questions.
Each street consists of 1 or more straight parts. Each part can connect to one or more streets at an intersection. The DSL will describe streets in the following way. Between any two given points, the street is assumed to run straight.
GHI street runs from 16,12 to 18,18.
ABC street runs from 4,5 to 8,9; crosses DEF; to 15,9; to 16,12; ends at GHI.
DEF street runs from 1,1 to 26,18.
….
….
The program will thus build up a plan of the streets.
We then place some “landmarks” (known places) in the map:
Restaurant R1 is at 5,7; R2 is at 17,12.
Cinema C1 is at 11,9.
Museum M1 is at 14,12.
…
…
Let’s assume each square is 100 meters x 100 meters.
[Figure: City Streets (the scene), a grid map showing the streets ABC, DEF and GHI and the landmarks R1, C1, M1 and R2.]
The user can then ask a question like:
How do I go from R1 to M1?
The program should answer something like:
- Go to ABC street. Turn left.
- Go until DEF street. Turn left.
- Go about 120 meters. It will be on your left.
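As a starting point for route answers such as "Go about 120 meters", streets can be held as lists of points and measured segment by segment. The sketch below assumes the sample street statements above and the 100-meter grid squares; the dictionary layout and function names are hypothetical, and route finding itself is not shown.

```python
import math

METERS_PER_UNIT = 100  # each grid square is assumed to be 100 m x 100 m

# Streets as polylines, taken from the sample statements above.
streets = {
    "GHI": [(16, 12), (18, 18)],
    "ABC": [(4, 5), (8, 9), (15, 9), (16, 12)],
    "DEF": [(1, 1), (26, 18)],
}

def length_in_meters(points):
    """Sum of the straight segments between consecutive points."""
    return METERS_PER_UNIT * sum(
        math.dist(p, q) for p, q in zip(points, points[1:])
    )

def streets_ending_on(point):
    """Streets whose first or last point is the given grid point."""
    return sorted(n for n, pts in streets.items() if point in (pts[0], pts[-1]))

print(round(length_in_meters(streets["ABC"])))  # 1582
print(streets_ending_on((16, 12)))              # ['ABC', 'GHI']
```

Shared end points like 16,12 are where streets meet, so they can serve as the nodes of a graph over which a path from one landmark to another is searched.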
4-DSL for Football (FootballDSL)
This DSL deals with football teams and contests. It permits:
Entering, editing and deleting a game history including teams, match scores, who scored the goals and the time of each goal, as well as possibly other details.
Questions and answers will be in a form similar to the following:
- How many times did team A beat team B? A beat B 5 times.
- How many times did A tie with B? A tied B 3 times.
- Who scored the most goals in A? Ahmet scored the most goals for A.
- How many ties were there? There were 12 ties.
- What are the current standings?
      G   W  L  T  GF  GA  P
1. A  10  8  1  1  20  17  25
2. B  …
3. C  …
(where A, B, C are team names.) …
- How many points does A have? A has 12 points.
- How many matches did A win? A won 4 games.
- What percentage of its goals did A score in the first half of games? A scored 60 percent of its goals in the first half of games.
You may add other types of questions.
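The standings question can be answered by one pass over the stored game scores. The sketch below is illustrative only: the game facts are hypothetical, the function name is an assumption, and the scoring of 3 points per win and 1 point per tie is an assumption that the project text does not state explicitly.

```python
from collections import defaultdict

# Hypothetical game facts as nested triples:
# (is_gamescore_of, (home, away, date), (home_goals, away_goals))
games = [
    ("is_gamescore_of", ("A", "B", "2024-03-01"), (2, 1)),
    ("is_gamescore_of", ("B", "C", "2024-03-08"), (0, 0)),
    ("is_gamescore_of", ("C", "A", "2024-03-15"), (1, 3)),
]

def standings(facts, win_points=3, tie_points=1):
    """Build G, W, L, T, GF, GA, P per team; sort by points, then by name."""
    table = defaultdict(lambda: dict(G=0, W=0, L=0, T=0, GF=0, GA=0, P=0))
    for relation, (home, away, _date), (hg, ag) in facts:
        if relation != "is_gamescore_of":
            continue
        # Update both teams of the game from their own point of view.
        for team, scored, conceded in ((home, hg, ag), (away, ag, hg)):
            row = table[team]
            row["G"] += 1
            row["GF"] += scored
            row["GA"] += conceded
            if scored > conceded:
                row["W"] += 1
                row["P"] += win_points   # assumed: 3 points per win
            elif scored < conceded:
                row["L"] += 1
            else:
                row["T"] += 1
                row["P"] += tie_points   # assumed: 1 point per tie
    return sorted(table.items(), key=lambda item: (-item[1]["P"], item[0]))

for rank, (team, row) in enumerate(standings(games), start=1):
    print(rank, team, *row.values())
```

Questions such as "How many ties were there?" or "How many points does A have?" can be answered from the same table.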
5- Family tree (FamilyTreeDSL)
It will permit:
- Giving people’s names, marriages, children (sons and daughters by name, with dates of birth) and deaths (date).
(We shall ignore divorces and multiple marriages to simplify the problem. First names of people with the same last name will be unique, i.e. each first name + last name combination is unique.)
- Defining terms such as “father”, “mother” and many others that describe the family relations mentioned below.
- Asking and answering questions about family relationships such as “Who is A’s father?”, “How many uncles does B have?”, “Who is C’s maternal grandfather?” as well as others mentioned below.
As many relation names (e.g. uncle, aunt, nephew…) as possible should be supported. Where they do not suffice to express a relationship uniquely by a single word, multiple relation names will be used, e.g. “F is E’s grandfather’s aunt.” See below for more info on relation names in English.
Questions and answers will be in a form similar to the following:
Who is A’s father? A’s father is G.
What is the relationship between A and G? A is G’s son.
Who are G’s children? G’s children are A,B.
Who are B’s uncles (or aunts…)? B’s uncles are G,H,I.
(Note that unless qualified as “maternal uncle” (“dayı”) or “paternal uncle” (“amca”), “uncle” means both, i.e. “dayı” and “amca”. Similarly for “aunt”. See below.)
When did A’s father die? A’s father died on 23 December 1999.
Creating the family tree:
First define the individuals as mentioned above.
(name, sex, birthdate, deathdate)
There should be at least 30 people.
Define the basic family relationships among the individuals.
Relations are defined using triples: (relation_name, first_operand, second_operand).
They should be stored on disk as tables, in XML form or in a relational database, in a user-designed structure.
(is_son_of, son_name, father_name), similarly for daughter and mother.
Note that the person implied by the relationship (“son” in this case) is named first in the relationship.
(is_wife_of,wife_name,husband_name)
There should be at least 6 families including husbands’ sides and wives’ sides, with several boys and girls per family.
Since we are ignoring divorces and multiple marriages, all brothers and sisters have same mother and father.
Therefore the above definitions are sufficient to create a family tree. Families will be separated by unique family names. Only unique first names will be used within a family, i.e. among people with the same last name.
The family tree should be built from the above relations only. Higher relations will be introduced by definitions using the basic relations as described below.
Higher Family Relationships
Some Relation Names in English
Sisters of mother and father (i.e. “teyze” and “hala”) are called “maternal aunts” and “paternal aunts” respectively.
Brothers of mother and father (i.e. “dayı” and “amca”) are called “maternal uncles” and “paternal uncles” respectively.
All brothers and sisters are called “siblings” (“kardeş”).
Mother, father, sister and brother of spouse (both wife or husband) are called “mother-in-law”, “father-in-law”, “sister-in-law” and “brother-in-law” respectively.
Sons and daughters of uncles or aunts are called “cousins”.
Children of brothers, sisters or cousins are called nephews (male) or nieces (female).
Some Logic Expressions involving Relations
“Implies” expressions:
The “implies” operator is defined as in formal logic:
(A implies B) means whenever A is true B is true.
These can be used to identify relations which were not initially provided.
(is_son_of, son_name, father_name) implies (is_father_of, father_name, son_name)
(is_wife_of, wife_name, husband_name) implies (is_husband_of, husband_name, wife_name)
(is_brother_of, person_name1, person_name2) implies (is_brother_of, person_name2, person_name1)
(is_uncle_of, person_name1, person_name3) implies (is_nephew_of, person_name3, person_name1)
“Conjunctive Implies” expressions (using “and”)
((is_sibling_of, person_name1, person_name2) and (is_child_of, person_name1, parent_name))
implies (is_child_of, person_name2, parent_name)
-- since we are ignoring divorces and multiple marriages.
((is_child_of, child_name1, parent_name) and (is_sibling_of, uncle_name, parent_name))
implies (is_uncle_of, uncle_name, child_name1)
-- Holds for aunts as well.
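The two conjunctive rules above can be applied repeatedly until no new triple is produced. The sketch below is one possible way to do this, not a prescribed method. It ignores sex, so it derives a combined relation named is_uncle_or_aunt_of, which is a hypothetical name; the function name derive and the sample people are also assumptions.

```python
def derive(facts):
    """Apply the implication rules repeatedly until no new triple appears."""
    facts = set(facts)
    while True:
        new = set()
        for rel, a, b in facts:
            if rel == "is_sibling_of":
                new.add(("is_sibling_of", b, a))  # sibling relation is symmetric
            for rel2, c, d in facts:
                # sibling(a, b) and child(a, parent) implies child(b, parent)
                if rel == "is_sibling_of" and rel2 == "is_child_of" and c == a:
                    new.add(("is_child_of", b, d))
                # child(a, parent) and sibling(c, parent) implies uncle_or_aunt(c, a)
                if rel == "is_child_of" and rel2 == "is_sibling_of" and d == b:
                    new.add(("is_uncle_or_aunt_of", c, a))
        if new <= facts:          # nothing new: a fixed point has been reached
            return facts
        facts |= new

# Hypothetical base facts: A is a child of P, B is A's sibling, U is P's sibling.
base = {
    ("is_child_of", "A", "P"),
    ("is_sibling_of", "A", "B"),
    ("is_sibling_of", "U", "P"),
}
closure = derive(base)
print(("is_uncle_or_aunt_of", "U", "B") in closure)  # True
```

With the sex of each person available, the combined relation can be split into is_uncle_of and is_aunt_of.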
Equivalence relationships (Optional info. Use if it will be helpful.)
(equivalence_relation, person_name1, person_name2) implies that all upward relations (toward common ancestors) of person_name1 hold for person_name2 as well.
E.g. “Is_Sibling_of”, “Is_Parent_of”, “Is_Uncle_of”, “Is_Aunt_of” relations give the same result for both siblings. As an example, if C is A’s uncle, and B is A’s sibling, then C is B’s uncle also.
Queries which can be submitted
Queries can include higher relations. E.g. “Who are A’s cousins?”
What is the relation of “G” to “A”?
However if ambiguity results, the user should be asked to clarify.
E.g. “Who are A’s uncle’s children?” Since A can have more than one uncle, and the question was asked in singular form (i.e. uncle’s, not uncles’), only one uncle is meant. So the user may be warned that “uncle” is ambiguous and that the question must be clarified, e.g. the uncle’s name should be given, or “uncles’” should be used, which implies all uncles.
Often there are multiple possible answers to a question. For example consider the query:
What is the relation between “A” and “G”?
Let’s assume “A” is maternal great grandfather of “G”. The most basic (and longest) answer is (note that maternal means on G’s mother’s side).
-A is father of father of mother of G.
Other, shorter answers are:
-A is grandfather of mother of G.
-A is great grandfather of G.
When this is the case, the longest and the shortest answers should be given (i.e. the first and last answers above).
Another example:
Assume K is a son of a grandson of U, who is B’s paternal uncle.
Possible answers:
K is son of son of son of brother U of father of B
K is grandson of son of brother U of father of B
K is son of grandson of brother U of father of B
K is great grandson of brother U of father of B
K is son of grandson of paternal uncle U of B
K is great grandson of paternal uncle U of B
K is great grand paternal cousin of B (lost uncle U’s name here).
Only first and last answers should be provided. Rest may be provided optionally.
So the approach for determining the relation between two persons starts with determining the most primitive (longest) relation first, and then applying transformations to shorten the description (as in the above example). These transformations are taken from the definitions given above, or are based on them, and may be stored in the form mentioned above. When no more shortening is possible, the transformation stops and the result is returned.
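The shortening step can be sketched as a small table of rewrite rules applied to the chain of relation names. The rule table RULES below is hypothetical and incomplete, and the function names are assumptions; it only covers the names needed for the two examples above.

```python
# Rewrite rules: a pair of adjacent relation names -> one shorter name.
# "X is father of mother of G" reads as the chain ["father", "mother"].
RULES = {
    ("father", "father"): "grandfather",
    ("father", "mother"): "grandfather",
    ("father", "grandfather"): "great grandfather",
    ("grandfather", "father"): "great grandfather",
    ("grandfather", "mother"): "great grandfather",
    ("son", "son"): "grandson",
    ("son", "grandson"): "great grandson",
    ("grandson", "son"): "great grandson",
}

def shorten(chain):
    """Merge the leftmost matching pair repeatedly until no rule applies."""
    chain = list(chain)
    changed = True
    while changed:
        changed = False
        for i in range(len(chain) - 1):
            merged = RULES.get((chain[i], chain[i + 1]))
            if merged:
                chain[i:i + 2] = [merged]
                changed = True
                break
    return chain

def describe(person, chain, other):
    """Put a chain into the answer syntax '<person> is <rel> of ... of <other>'."""
    return f"{person} is " + " of ".join(chain + [other])

longest = ["father", "father", "mother"]
print(describe("A", longest, "G"))           # A is father of father of mother of G
print(describe("A", shorten(longest), "G"))  # A is great grandfather of G
```

Keeping both the original chain and the fully shortened one gives the longest and shortest answers that are required; the intermediate chains are the optional answers.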
Computer Speech Projects
These projects involve using existing packages and making additions to them to build the applications described below. Speech recognition packages are available in Java and .NET (using C#) for Turkish and English. You should first locate and read some basic literature about speech recognition and production technology in general, and then research the components available for your specific project. Speech recognition may be one word at a time, or several words at a time. The latter is preferable. Speech production is always phrase or sentence-based. The projects involve a report and a presentation on the technologies available for your project, and the demo application mentioned in the project description below. You should be able to discuss different aspects of the technology involved in your project.
For a good example, see the GVZ project. Be sure to research your topic on Google.
Also see books and theses on the subject.
6- English Speech Recognition & Production in .NET
This project will explore English speech recognition and production in .NET. It will examine available technology and implement an application such as inventory management using both speech input and speech output. It will create a unique vocabulary, word recognition, and command syntax and semantics definition. It will also evaluate phone-based speech recognition and speech output. Some details will be provided by the instructor.
7- Turkish Speech Recognition & Production in .NET
It will do similar things for Turkish speech recognition and production. It will implement an interactive banking application. Some details will be provided by the instructor.
8- English Speech Recognition & Production in Java
It will do similar things to Project 6, but in Java. It will implement a restaurant meal ordering application.
For cell phones, speech input/output in Android in particular, among others, can be evaluated. Some details will be provided by the instructor.
9- Turkish Speech Recognition & Production in Java
Similar to Project 7, but in Java. It will implement a travel agency tour information application.
Speech input/output in Android in particular, among others, can be evaluated. Some details will be provided by the instructor.
10- Voice Verification (VV)
This project will survey techniques for verifying that a voice belongs to the person it is claimed to belong to. VV is an alternative to other personal identity verification techniques such as CAPTCHA, fingerprint and retinal identification. It will implement a VV application that can be used to control access to certain directories on a PC. Some details will be provided by the instructor.
11- Simple English-to-Turkish Interpreter (English STT (speech-to-text) input, Turkish TTS (Text-to-speech) output)
a) Simplest case: one word at a time. Oral English-to-Turkish dictionary. English input will be one short word. The Turkish answer may be multiple words.
b) Medium difficulty: Two short words spoken slowly. “My school”, “Soccer ball”, “black dog”… The English input should be grammatically parsed. The Turkish output should be grammatically correct. E.g. “My school” should generate “Benim okulum” or just “okulum”. “Soccer ball” should generate “Futbol topu”, “black dog” should generate “siyah köpek”, etc. Only a small set of words (e.g. up to 15) and only a few types of phrases may be allowed (e.g. noun-noun, adjective-noun, including appropriate suffixes on the Turkish side, e.g. “top-u”, “okul-um”).