1. Project Title: Smart Traffic Analytics for Hyderabad City (13/7/2015 - 12/7/2017)
Amount: 10 Lakhs
Funding Agency: BITS
In the proposed work, we aim to develop a software application using Big Data
- to identify trends/patterns in traffic
- to calculate average speed at different locations and for different vehicles
- to monitor and manage urban traffic
- to suggest alternate routes based on the locations
- to provide updates on traffic based on incidents, traffic jams etc.
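As an illustration of the average-speed objective in the list above, the following is a minimal sketch, not the project's implementation. It assumes each observation is a tuple of location, vehicle type, and speed in km/h; the function name `average_speed` and the sample records are hypothetical.

```python
from collections import defaultdict


def average_speed(records):
    """Return the mean speed (km/h) for each (location, vehicle_type) pair."""
    # totals maps a key to [sum of speeds, number of observations].
    totals = defaultdict(lambda: [0.0, 0])
    for location, vehicle_type, speed_kmph in records:
        entry = totals[(location, vehicle_type)]
        entry[0] += speed_kmph
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical sample records: (location, vehicle type, speed in km/h).
sample = [
    ("Gachibowli", "car", 40.0),
    ("Gachibowli", "car", 20.0),
    ("Gachibowli", "bus", 15.0),
    ("Secunderabad", "car", 30.0),
]
averages = average_speed(sample)
```

The same sum-and-count aggregation maps naturally onto a distributed group-by, which is one reason it suits a Big Data setting.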
2. Credit Analytics: (In association with BITS Alumni)
The aim of this work is to evaluate housing loan data, analyze the defaults, and find correlations between loan approvals and real estate market fluctuations. The work is implemented on Hadoop.
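One simple way to quantify such a relationship is the Pearson correlation coefficient between two aligned series, for example monthly loan approvals and a monthly property price index. The sketch below is an assumption about how this could be measured, not the method used in the project; the function name `pearson` and the input series are hypothetical, and it runs on a single machine rather than on Hadoop.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly figures: loan approvals and a property price index.
approvals = [120, 135, 150, 160, 155]
price_index = [100.0, 104.0, 109.0, 113.0, 111.0]
r = pearson(approvals, price_index)
```

A value of `r` near 1 or -1 indicates a strong linear relationship; it does not by itself establish that one series drives the other.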
3. Modelling Bayesian Networks for Performance and Capacity Management of Data Centers (In association with TRDDC Pune)
A data center consists of large numbers of computing, communication, and storage systems supporting a wide range of applications and services. Consider, for instance, the data center behind a banking application operated by a US-based investment bank. We observed that it hosts hundreds of DB2 applications on several logical partitions of mainframe boxes. The applications fire millions of queries every day and access a complex array of storage devices consisting of thousands of storage volumes and datasets. Such systems need automated solutions for performance and capacity management to better understand and control their operations.
For each component, an enterprise monitors many different metrics, including workload, latency, CPU, memory, IO and network utilization, and cache hit/miss rates, among others. Furthermore, each of these metrics is monitored at relatively fine time-scales (e.g., every few seconds). We propose to leverage Bayesian networks to analyze this data in order to perform the following operations:
One of the biggest challenges in many performance and capacity operations is the construction of causal relationships between various system metrics. Consider an example system of a database tier with Oracle instances hosted on Windows/Linux machines, where various metrics are monitored at the database instance, operating system, and system hardware levels. Deriving causal relationships across system components and metrics can prove very useful in gaining a better understanding of the system. It also opens up opportunities for performance debugging, capacity planning, prediction, what-if analysis, etc.
Another very relevant application of Bayesian networks is root-cause analysis of performance problems. When an application experiences performance problems, belief propagation techniques can be used to diagnose the root causes at the application, compute, storage, and network layers.
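To make the root-cause idea concrete, here is a minimal sketch of the simplest possible case: one observed symptom (a slow application) and a set of mutually exclusive candidate causes, resolved with Bayes' rule. This is exact enumeration on a two-level model, not full belief propagation over a multi-layer network; the function name `root_cause_posterior` and all probabilities are hypothetical.

```python
def root_cause_posterior(priors, likelihoods):
    """Return P(cause | symptom observed) for mutually exclusive causes.

    priors[c]      = P(cause c is active)
    likelihoods[c] = P(symptom observed | cause c is active)
    """
    # Unnormalized joint probability of each cause together with the symptom.
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("symptom has zero probability under every cause")
    return {cause: value / evidence for cause, value in joint.items()}


# Hypothetical numbers for a slow-application symptom.
priors = {"storage": 0.2, "network": 0.3, "compute": 0.5}
likelihoods = {"storage": 0.9, "network": 0.5, "compute": 0.1}
posterior = root_cause_posterior(priors, likelihoods)
most_likely = max(posterior, key=posterior.get)
```

With these illustrative numbers, storage becomes the most likely cause even though it has the lowest prior, because the symptom is far more probable under a storage fault.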
Many infrastructure components are programmed to generate alerts based on certain definitions, e.g., an alert is generated when CPU utilization exceeds 90%. In today's environments, due to a lack of intelligent alert generation mechanisms or poor alert suppression mechanisms, large volumes of alerts are generated (~1 million alerts per day), and analysis of these alerts becomes unmanageable. Bayesian networks can be used to identify dependencies across alerts. This information can then be used to suppress spurious alerts, generate signatures of correlated alerts, etc.
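A possible first step toward finding alert dependencies is to estimate, from historical data, how often one alert fires given that another has fired in the same time window. The sketch below computes these pairwise conditional rates; it is a simplified stand-in for learning a Bayesian network structure, and the function name `conditional_alert_rates` and the alert names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def conditional_alert_rates(windows):
    """Estimate P(alert b fires | alert a fires) from per-window alert sets."""
    single = Counter()  # number of windows in which each alert appears
    pair = Counter()    # number of windows in which the ordered pair (a, b) co-occurs
    for alerts in windows:
        unique = set(alerts)
        single.update(unique)
        pair.update(permutations(unique, 2))
    return {(a, b): count / single[a] for (a, b), count in pair.items()}


# Hypothetical alert history, one collection of alert names per time window.
windows = [
    {"disk_latency", "db_slow"},
    {"disk_latency", "db_slow"},
    {"disk_latency"},
    {"cpu_high"},
]
rates = conditional_alert_rates(windows)
```

Pairs with a high conditional rate, such as `db_slow` always accompanying `disk_latency` in this sample, are candidates for suppression or for grouping into a single correlated-alert signature.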
Research Interest