Reviewer #1
Proposal Name: Vertical Data Mining
General comments.
The topics covered in the manuscript are selected by the author. He presents a case for the position he advocates, and although the manuscript is extensive, it seems to be more in the nature of an extended academic paper than a book. While he refers to some definitely practical examples of possible applications for the techniques he presents, they are offered almost as hypothetical applications rather than applications that have produced commercially valuable results. It is most certainly an entirely academic work aimed at potential advanced students rather than practitioners. He is concerned with presenting the theoretical underpinnings of a putative approach and proofs of certain concepts, but says little about the commercial practice and value of these techniques.
1. Does the outline seem to cover the major topics you consider important in the field? Do any subjects appear to be too lightly or heavily covered?
For the author’s selected topics, he presents sufficient, if somewhat technical, coverage.
2. Is the sequence of the chapters appropriate? Should any chapters be added, deleted, moved or recombined?
The sequence is logical and flows well.
3. From the material available, what appear to be the major strengths and weaknesses of this proposed book? Does the work rank among the best in the field in your opinion? Does the work have something unique to contribute to its readers?
The major strength seems to be the academic presentation, which will be well accepted for teaching. However, for academic purposes the material needs more examples and student-guided practice exercises. The book would certainly benefit from additional teaching materials to appeal to this potential audience.
Whether it is “best in field” is hard to say: it covers relatively novel material in a way not addressed elsewhere, but the presentation is highly technical, requires a great deal of prior knowledge, and contains few “hooks to understanding” that would make the material easier to master. A “best in field” book would need a presentation that gives easier access to the material covered.
It does present novel material certainly, but this is not exactly unique.
4. Do you feel the book and its contributions are uniformly valuable? Do they have lessons to teach a wide audience? Is there an obvious practical focus for the designer and the design process?
The exact value of the contributions is not clear. Because the work is more of an extended academic paper, it is hard to judge the practical applicability of these ideas. It certainly introduces concepts that may prove valuable.
The topics (lessons) will be of interest to a relatively narrow audience. In the proposal the author notes “Students in any database or data mining course would be interested in this book. Those who need a reference for scalable data mining techniques would also be interested.” The first sentence in this statement seems unreasonably optimistic. A few students in a database course who choose to investigate this narrow topic in data mining will find the book a good theoretical introduction to some of the necessary topics. Also, a few students in some of the more advanced technical data mining courses may find the material of interest. Actually it seems likely that the first sentence covers the smallest group of potential interest, and the book is actually intended to address the needs of the group covered by the second sentence. In general this will be a book for a very narrow audience.
There is no obvious practical focus since the book is clearly intended to appeal to an academic audience and to present a theoretical background with little attention to turning these ideas into a practical implementation.
5. Of the other books on the market, which, if any, appear to be most similar and therefore, most competitive with this proposed text? Do you believe this book has the promise to deliver a significantly better book than can currently be found on the market? If not, can you suggest additional topics or reorganization that would help ensure its success?
There are no books devoted to this topic. Coverage of this topic is mainly through academic papers or chapters in collections (many of which are contributed by Dr. Perrizo since this is an area of his specialty). It seems unlikely that there is a book that would compete directly with this.
It seems to be the author’s intent to address his book exclusively to a limited academic audience, despite the statement in the proposal about who will find the book appealing. That being the case, it would require a massive overhaul of this book for it to appeal to an audience broader than “[t]hose who need a reference for scalable data mining techniques”.
6. Are you familiar with the author(s) and do they have a well-known reputation within this field?
I know of this author and his reputation is solid and well established especially in this field.
7. Who do you think will be interested in this book? What about the book will attract the audience you envision? Do you feel this book would be widely accepted by professionals in the field? How will this book be used and for what job functions or in which professions?
In section 6 of the proposal the author identifies the target audience. He states “The size of the market will be huge if the book catches on. It will be a required complement to any book on data analysis.” It seems unlikely that the term “huge” will apply to the potential size of the audience. It seems more likely that a small number of people (several hundred to at most a few thousand) will be potentially interested in this work, even if the book does “catch on”. It seems highly unlikely that this will become a “required complement” to “any” book on data analysis.
The professionals who are attracted to this book will be those in the fields identified in the proposal, and it seems that those who are primarily academic or researchers will accept this book as a (possibly) fruitful source of ideas.
It will be primarily a reference book and a source of theoretical background for those attempting to develop practical applications, not for current practitioners.
8. What do you think of the title? Could you make any suggestions?
The title is fine.
9. Please add any additional comments you feel might be helpful to us in our consideration of this proposal.
On balance, this is a fine book for a limited and highly technical audience. It is extremely unlikely to appeal beyond this audience, but it will be well accepted and useful within it. This audience is likely to be relatively small: high hundreds to low thousands of readers.
Reviewer #2
Proposal Name: Vertical Data Management and Mining
1. Does the outline seem to cover the major topics you consider important in the field? Do any subjects appear to be too lightly or heavily covered?
The proposal focuses on data management and mining with one data format: the vertical format, in contrast with conventional relational databases, which usually adopt a horizontal format. It is well known that the vertical format has advantages for some query and mining tasks but may also have disadvantages for others. There have been many studies on vertical data mining, such as M. Zaki’s work. However, it seems strange that this Vertical Data Mining proposal does not cover such popular work at all. This may raise questions about the author’s balanced treatment of the topic. Overall, I feel vertical data mining is too narrow a topic. It is OK for a thesis or a research monograph but may not be so attractive as a book.
2. Is the sequence of the chapters appropriate? Should any chapters be added, deleted, moved or recombined?
The sequence of the book (or the coverage of the book) seems too biased and also not well organized. There is not much discussion of how to mine different kinds of patterns, such as frequent patterns (association), sequential patterns, clustering, classification, and anomalies. It puts much emphasis on multi-relational data mining. Also, it is strange that it puts multi-relational mining in the middle but puts some vertical-format query optimization after it. Such an organization may not appeal to most readers, who would likely expect query processing to come before mining.
3. From the material available, what appear to be the major strengths and weaknesses of this proposed book? Does the work rank among the best in the field in your opinion? Does the work have something unique to contribute to its readers?
The major strength of the materials is the focus on one mining method. However, this is also its weakness, since the coverage of the book is somewhat biased, not only in methodology but also in the scope of data mining tasks: many essential data mining tasks are not covered. Instead, it covers more database management and processing methods. Even in this context, it would be good to discuss the pros and cons of vertical vs. horizontal formats. Overall, the book seems too narrowly focused to be a popular one. It may fit better within the scope of a specialized research monograph.
4. Do you feel the book and its contributions are uniformly valuable? Do they have lessons to teach a wide audience? Is there an obvious practical focus for the designer and the design process?
I feel the book will have difficulty being adopted as a textbook to teach a wide audience. It promotes a methodology that is just one of several possible implementation methods. It dwells too much on the author’s own pet topic instead of well-adopted methodologies in data management, warehousing and mining. It is fine for the author’s research but may not be so attractive as a textbook. It would also be good to discuss the weaknesses of vertical data, not only its strengths.
5. Of the other books on the market, which, if any, appear to be most similar and therefore, most competitive with this proposed text? Do you believe this book has the promise to deliver a significantly better book than can currently be found on the market? If not, can you suggest additional topics or reorganization that would help ensure its success?
Of the other books on the market, this one is a different kind. However, I doubt it would compete with the available textbooks due to its narrow and biased coverage. If the author would like to promote vertical data mining, he should provide complete and balanced coverage of this methodology across all kinds of mining tasks. Even then, the method may still be viewed as one of several mining methodologies and may not be very attractive as a general textbook, but it could be a good reference for some researchers if the overview of this methodology is balanced and focused.
6. Are you familiar with the author(s) and do they have a well-known reputation within this field?
I am not familiar with the author, and he may not have a well-known reputation in the field. Based on his CV, he does have many publications, but most appear in small conferences. That could be the reason the author’s name is not well known in the field.
7. Who do you think will be interested in this book? What about the book will attract the audience you envision? Do you feel this book would be widely accepted by professionals in the field? How will this book be used and for what job functions or in which professions?
Some researchers in the field, but such researchers may already read similar research papers. It may not be necessary to publish a book; in my view, a research monograph with Springer may be more appropriate. I believe the book may not be widely accepted by professionals in the field of data mining. However, it may help some professionals who try to implement their algorithms in a “vertical way”.
8. What do you think of the title? Could you make any suggestions?
The title correctly reflects the book’s focus and is clear. I do not suggest changing it.
9. Please add any additional comments you feel might be helpful to us in our consideration of this proposal.
I think the book may not be so appealing. The focus of the book is somewhat narrow, and the coverage is neither broad nor reflective of the state of the art. If you would like to have a book on this theme, you may wish to find a different (more reputed and balanced) author.
Reviewer #3
Comments
All I can tell is that the book is raw and hard to read.
There is material extracted from student work.
The book is unreadable without the Han and Kamber book (2nd edition) at hand.
Also, there is too much math, which should be moved to an appendix.
The author should define P-trees before talking about them.
Reviewer #4
1. Does the outline seem to cover the major topics you consider important in the field? Do any subjects appear to be too lightly or heavily covered?
Yes, I believe that the general topic of the book is very important to the field of data mining. As for coverage, a number of chapters, specifically those with large numbers of graphs and/or equations, should have more explanation. Also, some concepts could be introduced with more information to provide suitable background for the intended audience of the book (per the proposal). In any case, I generally enjoyed the book and the information contained within it. I always remind myself that when someone is reading a book, the question “Yes, but how?” will often remain in the reader’s mind. Those of us in the industry will have a better vision, but newcomers will not.
The Introduction (Chapter 1) could use more information to present the material clearly to the intended audience. Specifically, 1.2 could be broken into smaller parts, with more explanation of the graphics placed nearer to them, specifically the GEO star schema. 1.3 is nice, but dense.
Chapter 2 is very nice. I like the information on weaknesses, but a new reader may need more information to understand the difference between horizontal and vertical formats. The rest of Chapter 2, specifically 2.2 through 2.6, is very strong, but it could use more clarity and more thought about the intended audience. It is a lot of information, covered very quickly. 2.7 needs work, specifically more meat.
Chapter 3 is very dense, much like a textbook. Not bad, but it could use more clarity around the equations. In some cases, I wonder if the proofs are really necessary at this level of detail. Chapter 3 should also be broken into smaller sections; it reads much like a run-on. There is a lot of really good information in Chapter 3, and it deserves real attention to how it is presented to the reader. For example, 3.1 is very long, without subsections, and very academic. It might help to organize the book so that academics can read through these sections and implementers can skip them without great loss of understanding. Again, throughout Chapter 3, I would look for ways to reduce the density through subsections. In fact, Chapter 3 could be broken into multiple chapters, or the book could be organized into three large sections, with multiple chapters in each. (Chapter 3 is too long.)
Chapter 4 needs work; I get the basic idea, but it should be more fully developed.
Chapters 5 and 6 are nice, but I would like to see more examples and more detail, and again, the author should keep the audience in mind.
2. Is the sequence of the chapters appropriate? Should any chapters be added, deleted, moved or recombined?
The order of the chapters seems to be correct. At this point, I would not add, delete, or move any chapters. Within a few of the chapters, once additional content is added (per comments on #1), you may need to do some minor tweaking. (See info in #1)
3. From the material available, what appear to be the major strengths and weaknesses of this proposed book? Does the work rank among the best in the field in your opinion? Does the work have something unique to contribute to its readers?
The strength of the book is its topic and the level of detail presented. This is also its weakness, in that the author must consider his audience more carefully and make sure that adequate information is presented so as not to lose the reader. It is hard to rank the work without actually implementing some of the ideas and understanding how they might affect data mining results. But from a quick read, I believe the ideas are worthy of more research (and my group reviews tens of books per month to see if we can find anything useful). As the leader of the group, I am very picky about how I use my time, but given what I have read, I will spend time reading his published papers. So I believe it will be well received and ranked highly. Yes, the book’s approach and its vertical topic give it a unique standing. In fact, this might be its strongest selling point.
4. Do you feel the book and its contributions are uniformly valuable? Do they have lessons to teach a wide audience? Is there an obvious practical focus for the designer and the design process?