Organizing and optimizing the user's information space
In carrying out their professional and other duties every day, people today have to analyze large amounts of information and search for the data they need. Over time, user data accumulates in the form of documents, and together these documents form an information space for the user. With each new document, the question of how to organize this space becomes more pressing: what began as two or three folders with files neatly arranged in a hierarchy turns into a huge pile of documents that is hard to bring back into a hierarchical form with linear constraints. The challenge is to concretize, categorize, and visualize the user's information space.
Let us define some terminology. In this article, the user's information space is understood as a set of text documents (files), not tabular or graphical ones, distributed across a file system within a hierarchy of directories. To keep the description simple, we assume that all documents in the information space belong to a single subject area, for example economics. The text files may be economic articles, scientific papers, academic literature, and other textual presentations of economic information.
At the initial stage of forming the information space, the user can navigate it easily because of its small size and, consequently, its fairly clear structure and the clear relationships between its elements. Over time, as the user carries out professional, academic, and everyday tasks, the size of the information space grows, the weight of individual links between nodes (files) decreases, and navigation becomes increasingly difficult. This increases the time needed to find information and reduces the quality and productivity of the user's work within the information space.
As a rule, this is caused not only by the growing volume of text, but also by the low speed at which the user can take it in. Searching the entire collection for the desired fragment is also difficult: the user has to formulate the search query correctly to get adequate results, which can be problematic because of, for example, the user's limited knowledge of the subject area, the presence of synonyms, or facts that describe two different things in similar wording. In addition, the full-text document search provided by the operating system offers neither personalization nor relevance ranking of the results, which also slows the user down and lowers the quality of the information space.
Article based on information from habrahabr.ru
From the shortcomings of standard search and of the organization of the information space listed above, it follows that optimizing the user's information activity requires the following:
- Divide the subject area into categories, or "zones".
- Highlight the key components of the subject area.
- Visualize the subject area to speed up human perception.
- Determine the nodes inside each element of the subject area (ontology formation).
- Define the properties of objects within the nodes of the subject area and their relationships (completing the ontologies).
- Define the connections and interactions between the nodes of the domain (a semantic network built from the ontology nodes).
- Link the layers and functional descriptions of the subject area together (overlaying the tag map, the ontologies, and the semantic network on top of each other).
- Implement personalization of the subject area and relevance of its presentation, based on iterative learning from interaction with the user.
A client-server architecture for the system provides:
- fault tolerance (the objects of the ontology are stored on a server with a data backup system);
- scalability (new users can be connected to the system fairly quickly);
- optimal use of computing resources;
- collaborative development by a group of users (the ontology is updated and optimized by several people rather than one, which accelerates its development, makes it possible to build large ontologies in a relatively short time, and avoids redundancy because the ontology design and support tools can search for duplicates and synonyms).
The stages of forming an ontology, discussed in more detail later in the article:
- Defining the domain and scope of the ontology
  - Which area will the ontology cover?
  - What will the ontology be used for?
  - What types of questions should the information in the ontology answer?
- Considering the reuse of existing ontologies
- Enumerating the important terms in the ontology
- Defining the classes and the class hierarchy
  - Top-down development starts with defining the most general concepts of the subject area and then specializes them.
  - Bottom-up development starts with defining the most specific classes, the leaves of the hierarchy, and then groups these classes into more general concepts.
  - Combined development is a combination of the top-down and bottom-up approaches: the most salient concepts are defined first and then generalized and specialized as appropriate.
- Defining the properties of classes (slots)
  - internal properties of the object
  - external properties of the object
  - parts, if the object is structured (these can be both physical and abstract parts)
  - relationships with other individual concepts
- Defining the facets of the slots
  - Slot cardinality
  - Domain and range of a slot
- Creating instances
The steps of the method for mapping two ontologies:
- If the intersection of the term sets is not empty, two sets, T_s and T_q, are built for each term from T(O): the terms related to that term by any relationship in each of the two ontologies.
- For each term from T(O), the intersection of the sets T_s and T_q is found.
- The types of relations between the terms from T(O) and the terms in the intersection of T_s and T_q are analyzed (all relationships in an ontology are divided into three types: hierarchical, synonymous, and other).
- A similarity coefficient of the ontologies is computed, a numerical measure of how close the semantics of the two ontologies are. It takes the following factors into account: whether the same term occurs in both ontologies; whether two terms stand in the same relation in both ontologies; whether two terms stand in relations of the same type or of different types in the two ontologies (for example, a hierarchical relation in one and a synonymy relation in the other); and whether there is any relationship, direct or indirect, between the same terms.
To implement the above, it is advisable to use three technologies: tag maps (or tag clouds), ontologies, and a semantic network. None of them alone eliminates all the existing drawbacks in the organization of the information space, but each optimizes its own part, and together they improve the user's information activity.
Tag maps (clouds) represent the highest level of detail of the subject area; they are a kind of GUI (graphical user interface) for it. Here we depart somewhat from the classical description of a tag map (cloud): a tag is more than just a text label. In our case, a tag is a named object that unites one or more ontologies of the subject area. The shift from the classical "cloud" toward a "map" comes from dividing the subject area into zones (just as a country is divided into regions on a political map). This division is introduced so that the user can perceive the subject area visually more quickly and intuitively find the necessary data (facts, documents) from its ontologies. The tag map is the top level; it is built from the data of the two levels below it: the ontologies and the semantic network.
Working with tag maps makes it possible to keep search results relevant for a specific user. It is advisable to log the user's navigation history through the tags of the information space, the so-called "route of knowledge". This route is drawn and corrected over time: with each new search, the user's movement across the tag map refines the relationships between its nodes. The next time the user accesses a node, the system returns not only the ontology that the node represents but also other ontologies or nodes relevant to the user. Organizing search results this way personalizes the user's information space: the user is offered the options he needs, in accordance with the data the system has collected about his preferences.
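The "route of knowledge" can be illustrated with a minimal sketch. It assumes that a route is simply a log of moves between tags and that the number of moves from one tag to another serves as the weight of the link; the class name `KnowledgeRoute`, its methods, and the sample tags are hypothetical and are not taken from the article.

```python
from collections import defaultdict


class KnowledgeRoute:
    """Logs tag-to-tag moves and uses the move counts as link weights."""

    def __init__(self):
        # weights[a][b] = how many times the user moved from tag a to tag b
        self.weights = defaultdict(lambda: defaultdict(int))
        self.last_tag = None

    def visit(self, tag):
        """Record that the user opened `tag` on the tag map."""
        if self.last_tag is not None and self.last_tag != tag:
            self.weights[self.last_tag][tag] += 1
        self.last_tag = tag

    def related(self, tag, limit=3):
        """Return up to `limit` tags most often visited right after `tag`."""
        followers = self.weights.get(tag, {})
        ranked = sorted(followers.items(), key=lambda item: (-item[1], item[0]))
        return [name for name, _ in ranked[:limit]]


route = KnowledgeRoute()
for tag in ["inflation", "interest rates", "inflation",
            "interest rates", "inflation", "wages"]:
    route.visit(tag)

# Tags to offer alongside "inflation", most frequent follow-up first
print(route.related("inflation"))
```

Counting direct moves is only one possible weighting; a real system could also take into account time spent on a node or decay old visits.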
A standard tag cloud has one significant drawback: new and old data sources (by date) are shown on the map in the same way, so it is impossible to know in advance when the node the user is about to open was added. We propose adding a "novelty" property to the tag: a color chosen according to a predetermined relative scale (palette). For example, tags describing nodes added no more than a day ago are shown in white, tags added no more than a week ago in yellow, and so on. This property also helps to organize the user's information space better and to speed up access to the required information.
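A possible sketch of the "novelty" property is shown here. Only the first two palette entries (white for a day, yellow for a week) come from the text; the fallback color and the function name `novelty_color` are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Relative palette: (maximum age, color). Only these two entries come from
# the text; the rest of the scale is not specified there.
PALETTE = [
    (timedelta(days=1), "white"),
    (timedelta(weeks=1), "yellow"),
]
FALLBACK_COLOR = "gray"  # assumption: the article does not name further colors


def novelty_color(added_at, now):
    """Pick the tag color from the age of the node the tag describes."""
    age = now - added_at
    for max_age, color in PALETTE:
        if age <= max_age:
            return color
    return FALLBACK_COLOR


now = datetime(2024, 1, 10, 12, 0)
print(novelty_color(datetime(2024, 1, 10, 9, 0), now))   # added 3 hours ago
print(novelty_color(datetime(2024, 1, 6, 12, 0), now))   # added 4 days ago
print(novelty_color(datetime(2023, 11, 1, 12, 0), now))  # added over 2 months ago
```

Keeping the palette as an ordered list of thresholds makes it easy to extend the scale without touching the lookup logic.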
Ontologies represent the lowest level of detail. Each user file is treated as a separate ontology. This convention is adopted to simplify the functional structure of the information space, because the raw data are text documents stored as files in the directory structure of a file system. In this system, an ontology is taken to be a formal explicit description of a node of the subject area. This departs from the classical definition of an ontology in that it describes a node rather than the whole subject area; this is done to increase the degree of personalization of the user's information space. There are several ready-made ontologies of the subject area "Economics"; they cover different scales and have different granularity, but they are not customized for a specific user.
With a client-server architecture, the server stores the domain classes, their slots, their facets, and the fundamental relationships. The client side holds the class instances and the personalized relationships between classes and instances. In this way, the system on the user's side can use the ready-made ontologies stored on the server, and their elements, to create its own personalized ones, and can send the results of its work back to the server, thereby keeping the server ontologies updated and evolving.
As mentioned earlier, each file is treated as a separate ontology, and it is related to the server ontology by a relationship of type "contains". A question may arise: "Why use such an ontology architecture? Why not simply create one big ontology and work within it?" The explanation is as follows: using multiple ontologies makes it possible to personalize the information space under a client-server architecture, while the analysis of relationships between ontologies does not become harder as their number grows, thanks to the ontology mapping mechanism.
A new document added to the user's information space is much easier to handle if a separate small ontology is created from it by some universal algorithm. This ontology is later mapped to the server ontology, and the original ontology is preserved.
Using the subject area "Economics" as an example, we can say, with some assumptions, that there is no universal methodology for forming ontologies of the nodes of the subject area from the user's files. In general, any ontology is formed in several stages:
- Defining the domain and scope of the ontology
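Before creating the ontology, it is necessary to define its domain and scope by answering several questions: which area the ontology will cover, what it will be used for, and what types of questions the information in it should answer.
The answers to these questions may change during the design of the ontology, but at any given moment they help to limit the scale of the model.
- Considering the reuse of existing ontologies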
It is worth checking whether the source ontologies on the server can be used as they are or improved.
- Enumerating the important terms in the ontology
It is useful to make a list of the main terms of the ontology and their properties.
- Defining the classes and the class hierarchy
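There are several possible approaches to developing a class hierarchy: top-down (from the most general concepts to more specific ones), bottom-up (from the most specific classes, the leaves of the hierarchy, to more general concepts), and combined.
- Defining the properties of classes (slots)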
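Classes by themselves do not provide enough information about the subject area, so once the classes are defined, the internal structure of the concepts has to be described.
In an ontology, several types of object properties can become slots: internal properties, external properties, parts (if the object is structured), and relationships with other individual concepts.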
A slot should be attached to the most general class in the hierarchy that can have this property.
- Defining the facets of the slots
Slots can have different facets, which describe the value type, the allowed values, the number of values (cardinality), and other properties of the values that the slot can take.
Here are some common facets:
- Slot cardinality
Slot cardinality determines how many values a slot can have. Some systems distinguish only between single cardinality (only one value) and multiple cardinality (any number of values).
- Slot value type
The value-type facet describes what types of values can be entered in a slot. The most common value types are string, number, Boolean, enumerated, and instance (instance-type slots describe relations between instances).
- Domain and range of a slot
The classes to which a slot is attached, or whose property the slot describes, are called the domain of the slot. In systems where slots are attached to classes, the domain of a slot usually consists of the classes to which the slot is attached. The basic rules for determining the domain and the range of a slot are similar:
When defining the domain or range of a slot, find the most general class or classes that can serve as the domain or range, respectively.
On the other hand, do not define the domain and range too generally: every class in the domain of a slot should be described by the slot, and instances of every class in the range of a slot should be potential fillers of the slot. Do not choose an overly general class for the range.
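The facets discussed above (cardinality, value type, domain) can be sketched as a small data structure. This is an illustration under assumptions, not the system's actual model: the names `OntClass` and `Slot` are hypothetical, the value type is represented by a plain Python type, and single cardinality is read as "at most one value".

```python
from dataclasses import dataclass


@dataclass
class OntClass:
    name: str
    parent: "OntClass | None" = None

    def is_a(self, other):
        """True if this class is `other` or one of its descendants."""
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False


@dataclass
class Slot:
    name: str
    domain: OntClass          # most general class the slot is attached to
    value_type: type          # value-type facet, e.g. str, float, bool
    multiple: bool = False    # cardinality facet: single or multiple values

    def check(self, owner, values):
        """Return a list of facet violations for `values` set on class `owner`."""
        problems = []
        if not owner.is_a(self.domain):
            problems.append(f"{owner.name} is outside the domain {self.domain.name}")
        if not self.multiple and len(values) > 1:
            problems.append(f"{self.name} takes at most one value")
        for value in values:
            if not isinstance(value, self.value_type):
                problems.append(f"{value!r} is not of type {self.value_type.__name__}")
        return problems


document = OntClass("Document")
article = OntClass("Article", parent=document)
person = OntClass("Person")

title = Slot("title", domain=document, value_type=str)
keywords = Slot("keywords", domain=document, value_type=str, multiple=True)

print(title.check(article, ["Inflation and wages"]))   # Article inherits the slot
print(keywords.check(article, ["inflation", "prices"]))
print(title.check(person, ["a", "b"]))                 # wrong domain, too many values
```

Attaching `title` to `Document` rather than to `Article` follows the rule of choosing the most general class that can have the property: every subclass then inherits the slot.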
- Creating instances
The last step is creating individual instances of the classes in the hierarchy. Defining a specific instance of a class requires (1) choosing the class, (2) creating an individual instance of that class, and (3) filling in the slot values.
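The three steps of creating an instance can be sketched as follows. The `CLASSES` table, the function `create_instance`, and the sample article data are all made up for illustration; the dictionary layout is an assumption, not the article's storage format.

```python
# Hypothetical class descriptions: class name -> slot name -> (value type, multiple?)
CLASSES = {
    "Article": {
        "title": (str, False),
        "year": (int, False),
        "keywords": (str, True),
    },
}


def create_instance(class_name, **slot_values):
    """Create an instance in three steps: choose class, create, fill slots."""
    slots = CLASSES[class_name]                      # (1) choose the class
    instance = {"class": class_name, "slots": {}}    # (2) create an empty instance
    for name, value in slot_values.items():          # (3) fill in the slot values
        if name not in slots:
            raise KeyError(f"{class_name} has no slot {name!r}")
        value_type, multiple = slots[name]
        values = list(value) if multiple else [value]
        if not all(isinstance(v, value_type) for v in values):
            raise TypeError(f"slot {name!r} expects {value_type.__name__}")
        instance["slots"][name] = values if multiple else value
    return instance


paper = create_instance(
    "Article",
    title="Inflation and wages",        # made-up example data
    year=2004,
    keywords=["inflation", "wages"],
)
print(paper["class"], paper["slots"]["year"], paper["slots"]["keywords"])
```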
Once the ontologies have been formed, they need to be matched against the server ontologies to remove duplicates, to enrich and supplement each other, and so on. Ontology mapping also makes it possible to move away from full-text search in the user's information space.
The system represents as ontologies not only the subject area and the user's documents but also the user's search queries. Treating each search query as an ontology makes it possible, through ontology mapping, to achieve a close match between the search results and the information actually sought, provided the query is formulated in detail. This approach starts the search not from the "answer" (the data available in the system) but from the question (trying to understand what exactly the user needs to find). With this approach, the specificity and accuracy of the search results are directly proportional to the level of detail of the query: the more fully the user describes what he needs, the larger the resulting query ontology, the more precisely the ontology matching mechanism works, and the better the results.
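Briefly, the method of mapping ontologies compares the terms of two ontologies and the relations between those terms, and expresses the semantic closeness of the two ontologies as a numerical similarity coefficient.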
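The mapping steps built on T(O), T_s, and T_q can be illustrated with a rough sketch. The article names the factors that the similarity coefficient takes into account but gives no formula, so the scoring here (1.0 for a shared link of the same relation type, 0.5 for a shared link of a different type, normalized by the number of linked terms) is purely an assumption. An ontology is represented as a set of `(term, relation_type, term)` triples, relation direction is ignored, and indirect relationships are not handled.

```python
def terms(ontology):
    """All terms that occur in an ontology given as (term, type, term) triples."""
    return {t for left, _, right in ontology for t in (left, right)}


def related(ontology, term):
    """Map each term related to `term` (in either direction) to the relation type."""
    result = {}
    for left, rel_type, right in ontology:
        if left == term:
            result[right] = rel_type
        elif right == term:
            result[left] = rel_type
    return result


def similarity(onto_s, onto_q, same_type=1.0, other_type=0.5):
    """Assumed scoring: share of related terms that both ontologies agree on."""
    shared = terms(onto_s) & terms(onto_q)           # T(O)
    score, possible = 0.0, 0
    for term in shared:
        t_s = related(onto_s, term)                  # T_s
        t_q = related(onto_q, term)                  # T_q
        possible += len(t_s.keys() | t_q.keys())
        for other in t_s.keys() & t_q.keys():        # intersection of T_s and T_q
            score += same_type if t_s[other] == t_q[other] else other_type
    return score / possible if possible else 0.0


doc = {
    ("inflation", "hierarchical", "economics"),
    ("inflation", "synonym", "price growth"),
    ("wages", "other", "inflation"),
}
query = {
    ("inflation", "hierarchical", "economics"),
    ("inflation", "other", "price growth"),
}
print(round(similarity(doc, query), 2))
```

The same function can compare a document ontology with a query ontology or with a server ontology, since all three are treated as ontologies in this approach.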
The lowest (ontologies) and highest (tag maps) levels of detail of the user's information space need to be connected to each other. This is done by creating and continually extending a semantic network of the subject area. A semantic network is understood here as an information model in the form of a directed graph whose vertices correspond to the objects of the system (tags and ontologies) and whose arcs specify the relationships between them.
The semantic network acts as the link between the ontology files and the tags that point to them; it works like hyperlinks but has a more complex structure, and serves as a kind of "transport" system. It is stored on the client side, because the server side needs no information about the location and structure of the user's documents. Through this technology, a user who selects a tag from any zone of the tag map gets information about the tag, the ontology behind it, and the source document file mapped to that ontology. Because the structure can be complex, the user can obtain, in addition to the document file, a configurable amount of peripheral information (to be confirmed), such as the ontologies associated with the document, tags on the tag map, and relevant documents.
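A semantic network of this kind can be sketched as a directed graph with labeled arcs. The class `SemanticNetwork`, the relation names "unites", "describes", and "near", and the file paths are hypothetical and only illustrate how selecting a tag could lead to ontologies and document files.

```python
from collections import defaultdict


class SemanticNetwork:
    """Directed graph: vertices are tags, ontologies, files; arcs carry a relation."""

    def __init__(self):
        # arcs[source] = list of (relation, target)
        self.arcs = defaultdict(list)

    def add_arc(self, source, relation, target):
        self.arcs[source].append((relation, target))

    def targets(self, source, relation):
        """All vertices reachable from `source` by one arc labeled `relation`."""
        return [t for r, t in self.arcs.get(source, []) if r == relation]

    def resolve_tag(self, tag):
        """Follow tag -> ontology -> document file arcs for a selected tag."""
        result = {}
        for ontology in self.targets(tag, "unites"):
            result[ontology] = self.targets(ontology, "describes")
        return result


net = SemanticNetwork()
net.add_arc("tag:inflation", "unites", "onto:cpi-report")
net.add_arc("tag:inflation", "unites", "onto:wage-study")
net.add_arc("onto:cpi-report", "describes", "docs/economics/cpi_report.txt")
net.add_arc("onto:wage-study", "describes", "docs/economics/wage_study.txt")
net.add_arc("tag:inflation", "near", "tag:interest-rates")

print(net.resolve_tag("tag:inflation"))
print(net.targets("tag:inflation", "near"))   # peripheral information: nearby tags
```

Because file paths appear only as vertices of this graph, keeping the graph on the client is consistent with the server never needing to know where the user's documents are stored.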
A system operating on the above principles and algorithms should have sufficiently large processing capacity, good scalability, security, and potential for expansion and evolution, thanks to the client-server architecture at its core. It will make it possible to categorize, personalize, summarize, and visualize the user's information space, which should ultimately improve the quality of the user's information activity as a whole, and to develop a detailed ontology of the subject area of the user's information space.