The market of small hand-held computers like hand-held PCs and Pocket PCs is growing very fast in recent years. Due to their small size, it becomes practically or physically impossible to use a regular keyboard to input data. Handwriting is one of the most natural ways for human computer interaction, therefore it becomes the primary input method for hand-held devices.
The variety of built-in handwriting recognizers shipped with hand-held computers have very limited functions, because they were designed with the assumption that people will use these devices simply for writing short memos, editing simple texts, and storing addresses and telephone numbers, etc. This largely limits the things that the hand-held computers can do.
With the fast improvement in computer hardware, these devices are becoming more and more powerful. In the very near future, hand-held computers will be powerful enough to host many computationally intensive applications, for example video editing and complicated mathematical calculation. Our main interest is with computer algebra systems, which are widely used in scientific research and the academic world. We can expect that special versions of computer algebra systems will be available on hand-held computers very soon. This will no doubt be a great help to the users for example mathematicians, physists, engineers, and science students, etc. However, there is a problem that must be solved in order to run a computer algebra systems on one of these devices: how should they interact with the users?
Since handwriting is the preferred method of interaction, we need an application that can correctly recognize handwriten mathematical expressions and correctly translate them into a form understandable by the computer algebra systems. Unfortunately no such application is available presently.
The built-in handwriting recognizers of the hand-held computers do not have this capability for several reasons. Firstly, they can only recognize a very limited set of symbols, normally those provided by a standard keyboard. And they can not recognize most of the mathematical symbols like the square root and integral sign. Secondly, even if they can recognize all symbols in an expression, they have no knowledge about the geometric relationship between the symbols. Thirdly, for some recognizers (e.g. Windows CE default recognizer), the input for english letters, digits and other symbols are designated to different regions of the input area, making it impossible to input mathematical expressions.
After observation of the above problems, we propose an experimental Pocket PC application for recognizing handwriten mathematical expressions and generating corresponding presentation MathML code. The objective of our application is to serve as a bridge between the hand-held computer users and a computer algebra systems.
This application is developed with Microsoft Embedded Visual C++. It is composed of a handwriting recognizer for recognizing individual handwriten mathematical symbols, a two-dimensional structural analyzer for interpreting the relationship between the recognized symbols and maintaining the relationships in a corresponding expression tree, and a generator for generating presentation MathML code according to the expression tree. The following is a brief introduction to our approaches.
A user interface is provided as a drawing board for user to write mathematical expressions. It has a small window for displaying recognition result for the most recent symbol, and a small window for displaying other candidates. In case recognition error occurres, a user can fix it by picking the correct candidate from a list of candidates.
In our application, a symbol (called a scribble) may contain multiple strokes (a stroke contains the points digitized from pen-down to pen-up). An internal timer is used to group strokes into symbols, if the delay between two strokes is within the timer's threshold, they are considered to be in the same scribble, otherwise they belong to different scribbles.
Both handwriting recognition and two-dimensional analysis take place in on-line mode, for each finished symbol, its identity and its geometric relationship with other symbols are recognized immediately. The advantage of on-line recognition is that it provides more interaction with the users.
To be able to provide a friendly visualization, our application takes advantage of Unicode to display the variety of mathematical symbols.
The handwriting recognition problem has been studied for more than three decades, and a lot of approaches have been proposed for both off-line and on-line recognition. For the on-line part, which we are interested in, there are methods like neural networks, hidden markov model, Structural and Syntactical Approaches, elastic matching, etc. Elastic matching is commonly used in on-line handwriting recognition, many on-line handwriting recognition systems were developed using it. In this application, the elastic matching approach is used.
This method is model based, it uses the dynamic programming technique to calculate the distance between a scribble and available models. The model that gives the minimum distance is considered as the recognition result. The basic data flow is like this:
Firstly a scribble is preprocessed to increase recognition rate and speed. This includes size normalization, smoothing, and point distance normalization. Size normalization scales the scribble up or down to a certain size and location, therefore a scribbles size and location will not affect the recognition process. Smoothing reduces possible noises in a stroke to reduce recognition errors. Point distance normalization removes redundant points to improve performance of the recognition process.
Once preprocessing is finished, the elastic matching algorithm is applied to calculate the distance between the scribble and all available models. The model that gives the minimum distance is chosen as the result, other models that also give small distances are reported as possible recognition candidates.
In a mathematical expression, symbols are spatially arranged as a complex two-dimensional structure, possibly in different sizes. Within a mathematical expression the grouping is very complicated, which makes it hard to recognize. Besides basic symbols like letters and digits, there are also binding, fence and operator symbols, each has its own grouping criteria. Futhermore, relationship between symbols in an expression can be either explicit or implicit. A factor that makes recognition even harder is that some symbols may have different meanings depending on the context they are in. All these problems have to be considered in the structural analysis process.
Whenever a scribble is recognized, it is passed to the structural analysis process. The structural analysis procedure heavily uses the bounding box information of symbols and subexpressions for calculating both the distance and the direction between symbols.
For a newly recognized scribble (let us call it newnode), first calculate the distance between its bounding box and that of all other symbols (maintained in an expression tree whose structure is determined by their relationships) to find its nearest neighbor (NN). Start with NN we can find out the proper position to attach the newnode to the expression tree. The basic idea is that the newnode may be in a grouping with the NN, or it may be in a grouping with a sub-expression containing the NN. This is the case especially for row and column relations. Therefore we designed our algorithm to check the ancestor nodes of NN to find the highest level node that is in a row or column relation with the newnode, if a node is found, it will be in a grouping with the newnode, otherwise we can be sure that newnode is in a grouping with the NN. Once the proper position is found, we can attach the newnode to the expression tree at that position with necessary tree rearrangement operations, depending on the actual relation involved.
The structural analysis is very complicated, there are alot of special cases that have to be considered, this includes numbers, function names, parentheses, implicit operators, and so on. There are also complicated rules for tree arrangements. Much effort is taken to make sure that all the relationships between symbols, whether they are explicit or implicit in the expression, are well organized in a hierarchical structure.
All the operations in the structural analysis depend on the bounding box information, the key to success is to properly update the bounding boxes of sub-expressions. Usually we determine the bounding box of a node by taking the union taking the bounding boxes of all its children nodes. However when an expression gets large, its bounding box will no longer reflect the sub-expressions baseline correctly, therefore it will cause recognition errors. To solve this problem, we propose a conditional updating method in our application. In this method, when a change happens in the expression tree, each subtree may have a different way to update its bounding box depending on its contents. In this way, even when the expression gets large, their bounding boxes still reflect the respective sub-expressions baseline.
MathML is an XML application, it is "a specification for describing mathematics as a basis for machine to machine communication" (www.w3c.org/TR/MathML2). Since MathML is the standard for mathematical communication on the web, it is supported by most computer algebra systems.
Since the structural analysis process maintains the expression tree in a well organized hierarchy, generateing MathML code is very easy, basically a preorder traversal of the expression tree is sufficient.
Currently, the handwriting recognizer - ElasticRecognizer has good recognition rate for a small set of models, however, for large collections of models, reduced recognition rate is observed. Also due to the intensive computation involved, when model the size gets large, the recognition procedure becomes very slow.
It is possible to improve performance and recognition rate by classifying models with depending on their structural information. Each symbol's structure is analyzed first to find a proper category, then applying elastic matching only on models in that category. This will largely reduce the number of models involved, therefore improving the performance. At the same time getting rid of possible recognition errors caused by models in a different category.
The structural analysis procedure in our application is able to correctly recognize mathematical expressions including polynomial equations, fractions, trigonometric functions, etc. However, it is not able to recognize expressions containing square root, matrices, etc. Future research will be dedicated to solveing the current problems involving square roots, matrices and other special mathematical elements. At the same time we are trying to make the application more user friendly. For example, removeing the restriction that symbols in an expression have to be written from left to right, and the restriction that symbols can not be intersecting each other.