Pokéyword
Bryce Dorn (bsdorn2) and Tedman Marszalek (tmarsza2)
Abstract
Most modern information systems are founded upon users, data, and function. Specializing these factors offers an opportunity to advance search engines by creating unique search systems. Individualizing the users, data, and task support gave us a realistic goal, which we used to create a tangible tool. Specifically, the users are all current or prospective Pokémon fans, and the data is purely Pokémon related. The function of our tool is to serve as a Pokémon web search engine that retrieves a set of relevant Pokémon based upon a query of physical and personal traits. With the continual expansion of Pokémon, it is often difficult to recall the names and attributes that define specific Pokémon. Our tool will utilize multiple Pokémon internet databases to construct a broad relational database of traits and Pokémon, and will use NLP to translate a natural language input into the terms that should be queried for. Specifically, we decided to focus on creating a keyword search engine which would be used as a Pokémon identifier.
Introduction
I. Background
As fans of Pokémon, our primary motivation was to apply the concepts and tools we used in class to create a personalized modern search engine for something that we love. Additionally, as fans we understand the complexity of Pokémon characteristics and attributes, especially with the expansion and continued popularity of new Pokémon generations. While traditional Pokémon fans have grown accustomed to the complexity and abundance of information, new fans might be overwhelmed by the many details surrounding each Pokémon. By providing a simple search engine for identifying forgotten Pokémon, we believe we can both introduce Pokémon to new fans and welcome back those who have drifted away.
II. Problem
The problem is efficiently introducing Pokémon to those who are unfamiliar with it. While the internet has a multitude of databases and related database search engines, there are few keyword search engines for Pokémon. There is an overabundance of information regarding Pokémon, yet no efficient method of retrieving a Pokémon from a simple description of its characteristics. It is often the case that a not-so Poké-savvy person, usually the parent of an avid fan of the series, tries to remember what the “big blue turtle thing” is called. An experienced player will immediately recall that “Blastoise” is the Pokémon they are thinking of, but without their Pokémon-savvy child at hand or the ability to search by these traits, the name is difficult to discern. Through this tool we believe we can help expose new people to Pokémon and familiarize them with it.
In addition, as fans of Pokémon we found this problem interesting because we are aware that current users of Pokémon search engines have been dissatisfied with the quality of modern Poké-searches, which are both limited and inefficient. Our tool will ideally both improve on these qualities and integrate our own retrieval algorithm. Specifically, in assignment three we experimented with a personalized algorithm which we intended to adapt to the Pokémon data and eventually utilize in the search engine. Overall this problem is interesting and personally important because it gave us a good opportunity to apply the information retrieval techniques learned in lecture and class assignments. Specifically, we wanted to index and chunk relevant Pokémon text data and filter it to provide simple and accurate results.
III. Novelty and Procedure
Regarding the novelty of our project, we aimed to create an easy-to-use search engine with a friendly interface and up-to-date features, while maintaining the efficiency of the search engine and implementing the numerous ideas covered in class. We wanted to specialize the users, data, and function of an engine to create a tool which would allow us to practice class techniques. Through this specialization, we reduce the data and improve efficiency by immediately filtering out irrelevant search results. This provides an ideal way of consolidating appropriate information for the user and the Pokémon audience.
Related Work
Regarding related systems and outside project research, we were already familiar with several Pokémon information mobile apps. These are phone applications that attempt to mimic a Pokémon Pokédex; however, they are often buggy and inefficient. Further exploration after our proposal revealed that they often contained too much data, which limited search accuracy. While these mobile apps were interesting research for comparison when beginning our project, we aimed to implement a purely web-based search engine. Only after experimentation and refinement did we plan to focus on moving the engine onto mobile devices.
Additionally, there are a multitude of Pokémon databases on the internet, which implement retrieval both successfully and unsuccessfully. Some of the popular databases were good examples of database retrieval engines that hold a lot of information and let a user search with a simple name query or through a selection of categorization filters. These databases, such as Pokémondb and the Bulbapedia Pokémon encyclopedia, were helpful resources, both for comparison and for crawling and retrieving helpful data. Additionally, we found PokéAPI, along with its associated Python wrapper Pykemon, which provided an API for attribute aggregation. Overall, we discovered that there are many other databases, such as Veekun, but no relevant keyword search engines for simple Pokémon identification. From here we drew our inspiration.
In comparison, our tool will specifically target Pokémon physical traits and personal qualities, for example when a user wants to find a Pokémon they saw but cannot remember its name and various attributes. Additionally, our function retrieves similar Pokémon based on their general statistics, look, and “personality.” By indexing one page per Pokémon, we organized the data in a manageable way. Therefore the novelty of our project lies in our friendly Pokémon identification system.
Problem Definition
After crawling, retrieving, and indexing each Pokémon’s data, the primary problem lay in the correlation between the input and the output. The focus was primarily upon a user looking for a specific Pokémon name while having knowledge of only certain characteristics, either physical or personal. Basically, the input is whatever the user remembers in order to identify a Pokémon. Regarding the computational problem, we expect the input to be a colloquial phrase or sentence attempting to describe a Pokémon or aspects of a category of Pokémon. Therefore we expected that the query would have to be parsed along with the documents in order to eliminate unnecessary wording and give high weights to relevant characteristics. This would demand some sort of computational syntax analysis. With this parsed input we can filter out irrelevant documents and retrieve relevant ones, specifically by searching through the indexed Pokémon pages.
Regarding the specifications and requirements, in order to successfully parse the query, retrieve good information, and provide accurate results, we needed an expansive set of Pokémon data, which was crawled and retrieved from the multiple databases listed as references, such as Bulbapedia, Pokémondb, and PokéAPI. Additionally, some natural language processing would be necessary to parse and effectively analyze the queries. Thus we researched and investigated smart search functions that could remove stop words from the query and provide other text analysis and reduction. Ultimately the output would be the most relevant Pokémon, along with several other related Pokémon, and their simple identifiable traits: number, type, health points, height, and weight. We provided the “runners up” Pokémon in order to allow for implicit user feedback in the future.
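The input and output described above can be illustrated with a minimal sketch. It is not the code behind Pokéyword (which runs on Lunr in the browser); the stop-word list, the three sample pages, and the function names keywords and search are all hypothetical and exist only to show a colloquial query being reduced to keywords and matched against one indexed page per Pokémon, returning the best match followed by runners up.

```python
import re

# Hypothetical stop-word list and per-Pokémon pages, for illustration only.
STOP_WORDS = {"a", "an", "the", "that", "thing", "with", "is", "it", "of", "what"}

PAGES = {
    "Blastoise": "big blue turtle with water cannons on its shell",
    "Charmander": "small orange lizard with a flame on its tail",
    "Pikachu": "small yellow mouse with electric cheeks",
}

def keywords(text):
    """Lowercase the text, split on non-letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def search(query, pages=PAGES, runners_up=2):
    """Rank pages by how many distinct query keywords they contain."""
    terms = set(keywords(query))
    scored = []
    for name, text in pages.items():
        overlap = len(terms & set(keywords(text)))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored][: 1 + runners_up]

print(search("what is the big blue turtle thing"))
print(search("small thing with a tail"))
```

In this toy index the first query matches only Blastoise, while the second returns Charmander ahead of Pikachu because it shares two keywords rather than one.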
Methods
In order to maximize accuracy, we took data from multiple online sources. These included a RESTful API that returns specific attributes, a Wikipedia-like website (Bulbapedia), and a general Pokémon information index (Pokémondb). While the specific attributes obtained from the API were already in an organized format, we simplified and adapted the information in order to present it as a query result.
With respect to alternative solutions to the problem, we really wanted to experiment with accurately parsing a free-form query, as opposed to using the categorization filters that most of the popular database retrieval functions have. We solved this problem by making a personalized in-browser search function based on Lunr. Through this we were able to take advantage of a simple, lightweight JavaScript search engine for fast in-browser querying.
Several challenges arose when we were experimenting with the search engine. For example, we had several issues with word frequency distorting the accuracy. We discovered that a more effective natural language processing approach was necessary in order to better filter the query and retrieve results. For example, we aimed to make further use of term frequency and total frequency in order to normalize the scores. We discuss more of the challenges we encountered in the evaluation section.
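One common way to keep raw word counts from distorting a ranking is TF-IDF with length normalization. The sketch below is an illustration of that general technique under our own assumptions, not the scoring that Lunr or our engine actually applies; the function name tf_idf_scores and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, docs):
    """Score each document by the summed TF-IDF of the query terms.

    docs maps a name to a list of tokens. TF is the term count divided by
    the document length; IDF is log(N / number of documents with the term).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term])
                score += tf * idf
        scores[name] = score
    return scores

# Hypothetical data: a long page repeats "fire" but is mostly filler text.
docs = {
    "long_page": ["fire"] * 3 + ["filler"] * 27,
    "short_page": ["fire", "tail", "lizard"],
    "other_page": ["water", "turtle", "shell"],
}
scores = tf_idf_scores(["fire"], docs)
```

With raw counts the long page would win three to one; after dividing by document length, the short page in which "fire" is a third of the text scores higher.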
Evaluation/Sample Results
Regarding the initial evaluation of results, our solution relied upon pure keyword searching for relevant Pokémon. Measured against our initial goal, our implementation does not currently work as well as planned. Specifically, we ran into several roadblocks because of the data we crawled and the information we retrieved and indexed. For example, a Pokémon such as Charmander has “...fire burns at the tip of this tail” which is a defining searchable characteristic. However, the text we indexed for other Pokémon tended to contain poor information which we left unfiltered. In contrast to Charmander, Wartortle was described as being “...part of a fire-fighting squad” which limited Charmander’s relevance in favor of Wartortle. Basically, the terminology differs only slightly (such as in “fire fighter” and “fire”), but the difference matters greatly when differentiating Pokémon. Unfiltered information in the text we retrieved was the major source of the inaccuracies of our solution.
These experimental errors displayed the limitations of raw text indexing without word frequency analysis. Inevitably, we needed to go back and reanalyze the retrieved data and attempt to implement better filtering and categorization techniques. Specifically, the Lunr JavaScript search engine provides a pipeline that performs tokenization, stop word filtering, and stemming on both the query and the indexed documents. We do not have a quantitative or graphical evaluation of the results, but one can easily experiment with the search engine to see how little the scores differentiate between the Pokémon selected as relevant. By providing specific queries with an expected result, one can distinguish successful queries from less successful ones.
After an initial setup of our search engine, we gathered user recommendations and complaints in order to reevaluate and reassess our final goal for the project. Initially, the project was not specifically a keyword search, but upon inquiry we decided that was our ultimate goal. Most of the user feedback came from friends and Pokémon fans, most of whom enjoyed the friendly interface design and simplicity of the engine. Specifically, one user enjoyed the results and asked whether it could be developed further into a Pokémon identification game. Other users struggled with the lack of discourse handling, that is, the inability of the engine to handle large chunks of text. Some users attempted vague, colloquial, and lengthy queries, which are not successfully parsed, mainly because it is a keyword search engine. Overall, the user feedback was very positive, as it provided us with some constructive criticism and inspiration to continue to develop a better system.
We are also considering several solutions to the user feedback we gathered. Currently we are very interested in turning the tool into some sort of identification game. Additionally, dealing with discourse would require a more successful extraction of relevant words, possibly by weighting the words within the query. While it is important to acknowledge word weight, as it is indicative of significance, the engine should also not disregard low-weight words, as a phrase could potentially hold more value than its isolated words. Therefore it would be ideal to adapt our personalized word weighting function from assignment three to deal with not only term weighting but also phrase weighting.
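To make the pipeline stages concrete, the following sketch mimics them in Python. It is an assumption-laden illustration rather than Lunr's implementation: the stop-word set is a small made-up subset, the tokenizer in this sketch simply splits on any non-letter, and crude_stem is a stand-in suffix stripper, not a real stemmer.

```python
import re

STOP_WORDS = {"a", "an", "at", "of", "the", "this", "is"}  # illustrative subset

def tokenize(text):
    """Lowercase and split on anything that is not a letter (hyphens included)."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that carry little search value."""
    return [t for t in tokens if t not in STOP_WORDS]

def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def pipeline(text):
    """Run tokenization, stop-word filtering, and stemming in order."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]

charmander = pipeline("fire burns at the tip of this tail")
wartortle = pipeline("part of a fire-fighting squad")
```

Under these assumptions both descriptions end up containing the token "fire", which shows how a keyword query for "fire" can match the Wartortle text as readily as the Charmander text.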
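As a rough illustration of combining term and phrase weighting, the sketch below counts shared words and adds a bonus for every adjacent word pair that appears in both the query and the document. It is not our assignment three function; phrase_score, ngrams, and the phrase_bonus value of 2.0 are hypothetical choices made for the example.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def phrase_score(query_tokens, doc_tokens, phrase_bonus=2.0):
    """Count shared terms, plus a bonus for each shared adjacent pair."""
    term_hits = len(set(query_tokens) & set(doc_tokens))
    phrase_hits = len(set(ngrams(query_tokens, 2)) & set(ngrams(doc_tokens, 2)))
    return term_hits + phrase_bonus * phrase_hits

query = ["blue", "turtle"]
exact_phrase = phrase_score(query, ["big", "blue", "turtle"])
scattered = phrase_score(query, ["turtle", "with", "blue", "eyes"])
```

Both documents contain the two query words, but only the first contains them as the phrase "blue turtle", so it receives the higher score.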
Conclusions & Future Work
In conclusion, we built a simple keyword search engine which can be used for Pokémon identification. We learned a lot about IR technique implementation, and it was a good opportunity to apply the methods and tools we learned in the assignments to a practical application. Working as a team also allowed us to discuss and experiment with each other’s algorithms and design concepts. Specifically, we learned some limitations of both raw text indexing and non-NLP, lightweight search engines. Therefore it became difficult to cluster and rank the text efficiently in order to improve search accuracy. Through this our engine became overly strict, relying upon identical wording instead of expanding upon the user’s query. Diversification of query acceptance would help relax this strictness, possibly through some sort of feedback.
Currently, we enjoy using and experimenting with our tool, and we plan to continue to expand on it for the rest of the year. Ideally, we hope other Pokémon fans will find and utilize our tool as well. Through this we might find others interested in working on the project with us. Ultimately we believe the project had a positive impact on our class experience and on the development of Pokémon fan tools. Regarding our specific plans for Pokéyword in the future, it would be interesting to follow up on our original idea of creating a “Which Pokémon are You?” feature based upon sentiment analysis of Pokémon documents and a series of inputs from the user. This would be a good step in implementing and practicing more IR techniques and also expanding the function of our engine. Secondly, gathering user response statistics would be helpful in increasing accuracy. For example, the system would accept implicit user relevance feedback and associate the queried Pokémon characteristics with the resultant Pokémon, as long as the user deems the result accurate. However, this would require some database expansion, so we are currently planning methods of implementation. Lastly, new generations of Pokémon are always being released, so it is our goal to continue to update our database with new relevant information.
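The implicit relevance feedback idea mentioned above could be sketched as follows. This is only one possible design and has not been implemented in Pokéyword; the FeedbackStore class, its in-memory storage, and the accepted flag (for instance, set when the user clicks a result) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

class FeedbackStore:
    """Hypothetical in-memory store of query terms confirmed for each result."""

    def __init__(self):
        self.confirmed = defaultdict(Counter)

    def record(self, query_terms, pokemon, accepted):
        """Remember the query terms only when the user accepted the result."""
        if accepted:
            self.confirmed[pokemon].update(query_terms)

    def boost(self, query_terms, pokemon):
        """Sum the past confirmations of each current query term for this result."""
        seen = self.confirmed.get(pokemon, Counter())
        return sum(seen[t] for t in query_terms)

store = FeedbackStore()
store.record(["big", "blue", "turtle"], "Blastoise", accepted=True)
store.record(["blue", "bird"], "Blastoise", accepted=False)  # ignored
```

A later query containing "blue" and "turtle" would then receive a boost for Blastoise that could be added to its keyword score, while rejected results leave the store unchanged.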
Appendix (Individual Contributions)
Bryce: Tool and database research and retrieval, programming, presentation, report writing and editing, experimentation/result analysis, and user feedback analysis.
Tedman: Tool research, code editing, presentation, report writing, experimental result analysis, and gathering user feedback/responses.
References
[1] - Pokémon information database.
[2] - Used for Pokémon attribute aggregation.
[3] - Python wrapper for Pokeapi.
[4] - Encyclopedia entries for each Pokémon.
[5] - A simple javascript search engine which provides expansive raw text and document processing as well as fast querying.
[6] - Ruby backend for routing and allowing the search engine to be put online.
[7] - Frontend framework.