As the use of graph databases has grown in recent years, ever more applications of this technology involve storing, searching, and reasoning about events. Many companies now use graph databases for this purpose, and in many cases these databases are growing to billions of events. Advanced technology is now available that overcomes the performance problems that emerge when searching and reasoning over event databases of such size.
The kinds of events that graph databases manage typically have at least the following elements:
- A type, such as a phone call, a text message, a bank transaction, an observation of a moving vehicle, and so on
- A start and end time, or a single instant of time (the temporal dimensions)
- Location coordinates (the geospatial dimensions)
- A set of actors, such as sender and receiver, payer and payee, vehicle and operator, and so on (the social network dimensions).
The next logical step was to bring these advanced capabilities together to support searching and reasoning over records that combine all of these dimensions. Recent technical innovations continue the progress along this path by enabling highly efficient applications dealing with vast amounts of such multi-dimensional data. Diverse kinds of applications can benefit by harnessing this newly available power, such as tracking moving objects in time and space, managing weather data, detecting fraud or other criminal activity, and more.
Key Kinds of Data and Reasoning
Before we delve into unified multi-dimensional facilities, it is useful to summarize the key characteristics of geospatial, temporal, and social network data and the kinds of reasoning we do about such data.
Geospatial Data
Geospatial data is about location in space. Two-dimensional geospatial databases describe location in terms of latitude and longitude or in terms of x- and y-axes on a grid. Three-dimensional geospatial databases might add a dimension for altitude, height, or simply a z-axis in a 3D grid.
We can ask the following questions about locations and shapes that we have stored in a database:
- What are all the events that occurred within a specified radius of a given location?
- How far are two given locations from each other?
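As an illustration of both questions, here is a minimal Python sketch. It assumes events are plain dictionaries with hypothetical `lat` and `lon` keys in degrees, approximates the Earth as a sphere, and scans every event; the function names are ours, not part of any product.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def events_within_radius(events, lat, lon, radius_km):
    """Return the events located within radius_km of (lat, lon), by linear scan."""
    return [e for e in events if haversine_km(e["lat"], e["lon"], lat, lon) <= radius_km]


# Made-up sample events (hypothetical schema).
events = [
    {"id": "a", "lat": 37.7749, "lon": -122.4194},  # San Francisco
    {"id": "b", "lat": 37.8044, "lon": -122.2712},  # Oakland
    {"id": "c", "lat": 34.0522, "lon": -118.2437},  # Los Angeles
]
nearby = events_within_radius(events, 37.7749, -122.4194, 25.0)
print([e["id"] for e in nearby])  # prints ['a', 'b']
```

A linear scan like this is fine for a handful of events, but it examines every record on each query.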
Temporal Data
Temporal data is about time. Key questions we ask about time often have to do with time intervals. Given two time intervals, we can, for example, ask the following questions:
- Does one interval occur entirely before the other?
- Do the intervals meet (meaning one interval starts where the other interval leaves off)?
- Do the intervals overlap?
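These three interval questions can be sketched in a few lines of Python. The sketch assumes each interval is a `(start, end)` tuple of `datetime` values with `start <= end`; the function names are illustrative. Note that `overlaps` here means the intervals share more than a single instant, which is broader than the strict "overlaps" relation of formal interval algebras.

```python
from datetime import datetime


def is_before(a, b):
    """True if interval a ends strictly before interval b starts."""
    return a[1] < b[0]


def meets(a, b):
    """True if interval a ends exactly where interval b starts."""
    return a[1] == b[0]


def overlaps(a, b):
    """True if the two intervals share more than a single instant."""
    return a[0] < b[1] and b[0] < a[1]


# Made-up sample intervals on a single day.
morning = (datetime(2016, 3, 1, 9, 0), datetime(2016, 3, 1, 10, 0))
late_morning = (datetime(2016, 3, 1, 10, 0), datetime(2016, 3, 1, 11, 0))
midday = (datetime(2016, 3, 1, 10, 30), datetime(2016, 3, 1, 12, 0))

print(is_before(morning, midday))      # prints True
print(meets(morning, late_morning))    # prints True
print(overlaps(late_morning, midday))  # prints True
```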
Social Network Data
Social network data captures connections between actors, such as the fact that one person is a friend of another. But social networks do not have to be about people; social network technology is proving useful in other fields such as life sciences, where, for example, researchers study protein interaction patterns as social networks in which the actors are proteins.
We can ask the following kinds of questions about a group of actors and the connections among them that we have stored in a database:
- How far apart in the network are two given actors, and how strong is the relation?
- What are the cliques and ego groups?
- How important is a given actor in the group?
- How cohesive is the group?
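A minimal Python sketch of three of these questions follows. It assumes an undirected, unweighted network of at least two actors, measures distance in hops with a breadth-first search, uses degree centrality as one simple notion of importance, and uses edge density as one simple notion of cohesion. Relation strength and clique detection are left out, and the actor names are invented.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (actor, actor) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, source, target):
    """Number of hops on a shortest path, or None if the actors are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


def degree_centrality(graph, actor):
    """Fraction of the other actors that this actor is directly connected to."""
    return len(graph[actor]) / (len(graph) - 1)


def density(graph):
    """Fraction of all possible connections that actually exist."""
    n = len(graph)
    edge_count = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edge_count / (n * (n - 1) / 2)


graph = build_graph([("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("bob", "dan")])
print(distance(graph, "ann", "dan"))    # prints 2
print(degree_centrality(graph, "bob"))  # prints 1.0
print(round(density(graph), 2))         # prints 0.67
```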
Multi-Dimensional Graph Data – Bringing it All Together
Reasoning about each of these kinds of data as described above is useful, but, as we have seen, event databases require combining these facilities.
Geospatial, Temporal, Social Network, and More
For example, a log of cellular phone calls may well include the following data for a given call:
- The latitude and longitude of the call originator’s location and receiver’s location
- Start time and finish time
- The calling and receiving phone numbers.
With this data, we can ask questions such as the following:
- What calls have an originating location within a given radius of a given location within a given time interval?
- Did a given phone number place a call within a given radius of a given location within a given time interval?
- What calls to a given area code were made by a given phone number within a given time interval?
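To make these combined questions concrete, here is a hedged Python sketch over an in-memory call log. The `Call` record, its field names, the phone number format, and the sample data are all assumptions made for illustration. The sketch also assumes that a call counts as "within a time interval" when its start time falls inside that interval.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # spherical approximation


@dataclass(frozen=True)
class Call:
    """Hypothetical call record; field names are illustrative."""
    caller: str
    receiver: str
    origin_lat: float
    origin_lon: float
    start: datetime
    end: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def calls_near(calls, lat, lon, radius_km, window_start, window_end, caller=None):
    """Calls that started in the window and originated within radius_km of (lat, lon).

    If caller is given, only calls placed by that number are returned.
    """
    return [
        c for c in calls
        if window_start <= c.start <= window_end
        and (caller is None or c.caller == caller)
        and haversine_km(c.origin_lat, c.origin_lon, lat, lon) <= radius_km
    ]


def calls_to_area_code(calls, caller, area_code, window_start, window_end):
    """Calls placed by caller in the window to numbers starting with area_code."""
    return [
        c for c in calls
        if c.caller == caller
        and c.receiver.startswith(area_code)
        and window_start <= c.start <= window_end
    ]


# Made-up sample log.
calls = [
    Call("415-555-0101", "510-555-0199", 37.7749, -122.4194,
         datetime(2016, 3, 1, 9, 0), datetime(2016, 3, 1, 9, 5)),
    Call("415-555-0102", "415-555-0101", 37.8044, -122.2712,
         datetime(2016, 3, 1, 9, 30), datetime(2016, 3, 1, 9, 45)),
    Call("415-555-0101", "213-555-0150", 34.0522, -118.2437,
         datetime(2016, 3, 2, 14, 0), datetime(2016, 3, 2, 14, 10)),
]
day_start, day_end = datetime(2016, 3, 1), datetime(2016, 3, 2)

near_sf = calls_near(calls, 37.7749, -122.4194, 25.0, day_start, day_end)
placed_nearby = bool(calls_near(calls, 37.7749, -122.4194, 25.0, day_start, day_end,
                                caller="415-555-0102"))
to_213 = calls_to_area_code(calls, "415-555-0101", "213", day_start, datetime(2016, 3, 3))
```

Each function answers one of the three questions above by combining a geospatial test, a temporal test, and a test on the actors in a single pass over the records.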
Figure 1 is a snapshot of a screen from a phone call application of this genre. The application displays a Google map. The user sets the radius and date/time interval and clicks on a location on the map. The application then displays the locations of callers and receivers of calls that originated within that radius within the date/time interval.
For an additional enhancement, we could construct a social network of connections between phone numbers, making it possible to pose questions such as finding the phone numbers (and, implicitly, their owners) that are the most central for phone call traffic in a given radius of a given location for a given subgroup of phone numbers.
Consider another example, where we store observations of airplanes moving through space. At regular periods we record the latitude, longitude, altitude, and heading of flying airplanes and we time stamp each observation. This data enables us to ask questions such as how many airplanes were within a given altitude range with a given heading during a given time interval.
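The airplane question can be sketched as follows. The observation fields (`aircraft`, `altitude_ft`, `heading`, `time`), the heading tolerance, and the sample data are assumptions for illustration. Because an airplane is observed repeatedly, the sketch counts distinct aircraft rather than observations, and it compares headings on a circle so that 358 degrees and 2 degrees are treated as close.

```python
from datetime import datetime


def heading_difference(a, b):
    """Smallest angle in degrees between two compass headings."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)


def count_aircraft(observations, min_alt, max_alt, heading, tolerance, start, end):
    """Count distinct aircraft with at least one matching observation."""
    matching = {
        o["aircraft"] for o in observations
        if min_alt <= o["altitude_ft"] <= max_alt
        and heading_difference(o["heading"], heading) <= tolerance
        and start <= o["time"] <= end
    }
    return len(matching)


# Made-up observations with hypothetical aircraft identifiers.
observations = [
    {"aircraft": "N100", "altitude_ft": 31000, "heading": 358, "time": datetime(2016, 3, 1, 12, 0)},
    {"aircraft": "N100", "altitude_ft": 31500, "heading": 2, "time": datetime(2016, 3, 1, 12, 5)},
    {"aircraft": "N200", "altitude_ft": 33000, "heading": 5, "time": datetime(2016, 3, 1, 12, 3)},
    {"aircraft": "N300", "altitude_ft": 32000, "heading": 180, "time": datetime(2016, 3, 1, 12, 4)},
    {"aircraft": "N400", "altitude_ft": 12000, "heading": 0, "time": datetime(2016, 3, 1, 12, 2)},
]
northbound = count_aircraft(
    observations, 30000, 35000, heading=0, tolerance=10,
    start=datetime(2016, 3, 1, 12, 0), end=datetime(2016, 3, 1, 12, 10),
)
print(northbound)  # prints 2
```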
N-dimensional Data
Clearly, databases that support these scenarios are multi-dimensional. They have geospatial dimensions, temporal dimensions, and social networking dimensions. Moreover, there are strong use cases for adding further dimensions to such databases. For example, in the case of airplane tracking, each time-stamped observation may also include weather readings such as outside air temperature, wind speed, and barometric pressure.
As mentioned earlier, previous technology supported searching and reasoning over two-dimensional geospatial data, whereas more recent technology supports three-dimensional geospatial data. But new technology, such as AllegroGraph version 5, can search and reason over an open-ended number of additional dimensions. Thus these new facilities are not merely three-dimensional, because there is no restriction to three dimensions. It is more accurate to use the term N-dimensional to describe the nature of graph databases and related applications that use these new facilities.
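The idea of an open-ended number of dimensions can be illustrated with a small Python sketch in which a query is simply a mapping from dimension name to a `(low, high)` range, so adding a dimension means adding a key. This is a conceptual illustration only and does not show AllegroGraph's API; the dimension names and sample records are invented.

```python
def in_ranges(record, ranges):
    """True if every constrained dimension is present in record and inside its range."""
    return all(
        dim in record and low <= record[dim] <= high
        for dim, (low, high) in ranges.items()
    )


def search(records, ranges):
    """Return the records that satisfy every range constraint, by linear scan."""
    return [r for r in records if in_ranges(r, ranges)]


# Made-up airplane observations with weather readings as extra dimensions.
records = [
    {"id": 1, "lat": 37.6, "lon": -122.4, "altitude_ft": 31000, "air_temp_c": -48.0, "wind_kt": 60},
    {"id": 2, "lat": 37.9, "lon": -122.1, "altitude_ft": 33000, "air_temp_c": -52.0, "wind_kt": 85},
    {"id": 3, "lat": 40.7, "lon": -74.0, "altitude_ft": 32000, "air_temp_c": -50.0, "wind_kt": 40},
]

query = {
    "lat": (37.0, 38.0),
    "lon": (-123.0, -122.0),
    "altitude_ft": (30000, 35000),
    "air_temp_c": (-55.0, -45.0),
}
print([r["id"] for r in search(records, query)])  # prints [1, 2]

# Constraining one more dimension requires no change to the search code.
query["wind_kt"] = (0, 70)
print([r["id"] for r in search(records, query)])  # prints [1]
```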
Efficiency Breakthrough
The idea of combining geospatial, temporal, social networking, and other dimensions in a database record is not new, but up to now implementation of this idea has been limited. The roadblock has been serious performance degradation as multi-dimensional databases grow to enormous sizes. Despite the fact that graph databases have known efficiency advantages over relational databases for dealing with geospatial, temporal, and social network data, simply using a graph database is not enough to get over the performance hurdle with gigantic multi-dimensional databases. The performance hit is most severe when search parameters are about proximity, such as searching for events that occurred within a specified radius of a given location or within a given time interval, or within a temperature range.
But performance is another area that has recently seen the addition of innovative technology that can answer complex proximity questions across multiple dimensions over billions of records in sub-second time. A key characteristic of this new technology is that, with proper database design, the time required to execute a search does not increase substantially as the size of the database increases.
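The article does not describe how this technology works internally. As a general illustration of why an index helps proximity search, here is a sketch of one common textbook technique, a uniform grid index over planar x/y coordinates. It is not a description of any product's implementation; the class and its methods are hypothetical. The point is that a radius query inspects only the grid cells near the query point, so records far away add no work.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform grid over planar (x, y) points; an illustrative technique only."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cell_x, cell_y) -> [(item, x, y), ...]

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, item, x, y):
        self.cells[self._cell(x, y)].append((item, x, y))

    def within_radius(self, x, y, radius):
        """Return items within radius of (x, y), scanning only nearby cells."""
        min_cx, min_cy = self._cell(x - radius, y - radius)
        max_cx, max_cy = self._cell(x + radius, y + radius)
        hits = []
        for cx in range(min_cx, max_cx + 1):
            for cy in range(min_cy, max_cy + 1):
                for item, px, py in self.cells.get((cx, cy), ()):
                    if math.hypot(px - x, py - y) <= radius:
                        hits.append(item)
        return hits


# Made-up points.
points = {"a": (1, 1), "b": (4, 5), "c": (50, 50), "d": (-3, -2), "e": (12, 1)}
index = GridIndex(cell_size=10.0)
for name, (px, py) in points.items():
    index.insert(name, px, py)

print(sorted(index.within_radius(0, 0, 7)))  # prints ['a', 'b', 'd']
```

The same bucketing idea extends to time or temperature by adding coordinates to the cell key, although the number of cells to inspect grows with each added dimension.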
Highly Scalable Applications
We are just beginning to tap the potential of this powerful technology. Here are a few examples of what we can do with the new N-dimensional search and reasoning capabilities:
- Insider Threat Detection: Quickly identify risks and the potential impact that an individual’s actions pose to the public or an organization. New semantic-based behavior models can empower companies to gain the critical knowledge necessary to predict high-risk events to prevent or aid in crisis situations.
- Precision Medicine: Integrate information from structured and unstructured data (and integrate different types of data – patient information with socio-economic and genetic information, etc.) to improve efficiencies and personalize care. Provide graphical analysis of genetic info, images, clinical trials, and public health data to help fuel discoveries, improve patient care and cut the overall cost of healthcare.
- Law Enforcement/Homeland Security: An application tied to a constantly updated database of telephone calls and text messages could use the location data, the time stamps, and the social network represented by the phone numbers to determine the focal points of a criminal enterprise and monitor the movements of the key actors in near real time as their centrality emerges from the data.
Source: Datanami
 