
- CMS Features -

- Problem Specification -

- Original Problem -

This is the original specification given to us when we started the project. The i-scream central monitoring system meets this specification, and aims to extend it further. This is, however, where it all began.

- Centralised Machine Monitoring -

The Computer Science department has a number of different machines running a variety of different operating systems. One of the tasks of the systems administrators is to make sure that the machines don't run out of resources. This involves watching processor loads, available disk space, swap space, etc.

It isn't practical to monitor a large number of machines by logging on and running commands such as 'uptime' on the Unix machines, or by using Performance Monitor for NT servers. Thus this project is to write monitoring software for each platform supported which reports resource usage back to one centralised location. Systems administrators would then be able to monitor all machines from this centralised location.

Once this basic functionality is implemented it could usefully be expanded to include logging of resource usage to identify long-term trends/problems, and alerter services which can directly contact sysadmins (or even the general public) to bring attention to problem areas. Ideally it should be possible to run multiple instances of the reporting tool (with all instances being updated in real time) and to be able to run the reporting tool both as a stand-alone application and embedded in a web page.

This project will require you to write code for the Unix and Win32 APIs using C and knowledge of how the underlying operating systems manage resources. It will also require some network/distributed systems code and a GUI front end for the reporting tool. It is important for students undertaking this project to understand the importance of writing efficient and small code, as the end product will really be most useful when machines start to run out of processing power/memory/disk.

John Cinnamond (email jc), whose idea this is, will provide technical support for the project.

- Features -

- Key Features of The System -

A centrally stored, dynamically reloaded, system-wide configuration system.
A totally extensible monitoring system: nothing except the Host (which generates the data) and the Clients (which view it) knows any details about the data being sent, allowing the data to be modified without changes to the server architecture.
Central server and reporting tools are all Java based for multi-platform portability.
Distribution of core server components over CORBA to allow appropriate components to run independently and to allow new components to be written to conform with the CORBA interfaces.
Use of CORBA to create a hierarchical set of data entry points to the system, allowing the system to handle event storms and remote office locations.
One location for all system messages, despite the system being distributed.
XML data protocol used to make data processing and analysis easily extensible.
A stateless server which can be moved and restarted at will, while Hosts, Clients, and reporting tools are unaffected and simply reconnect when the server is available again.
Simple and open-ended protocols to allow easy extension and platform porting of Hosts and Clients.
Self monitoring, as all data queues within the system can be monitored and raise alerts to warn of event storms and impending failures (should any occur).
A variety of web-based information displays based on Java/SQL reporting and PHP on-the-fly page generation to show the latest alerts and data.
Large, overhead-monitor, helpdesk-style displays for the latest alerting information.
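
The self-monitoring feature listed above can be illustrated with a small sketch. It is written in Python for brevity and is not taken from the i-scream code (the server itself is Java based); the class name `MonitoredQueue`, the `alert_depth` parameter and the `on_alert` callback are all hypothetical. The idea is only that a queue reports its own depth, so that a growing backlog can be flagged before it becomes a failure.

```python
import queue


class MonitoredQueue:
    """A FIFO queue that reports when its depth reaches a threshold.

    The threshold and the alert callback are assumptions made for this
    sketch; they are not part of the i-scream implementation.
    """

    def __init__(self, name, alert_depth, on_alert):
        self.name = name
        self.alert_depth = alert_depth
        self.on_alert = on_alert
        self._queue = queue.Queue()

    def put(self, item):
        self._queue.put(item)
        depth = self._queue.qsize()
        # A growing backlog suggests an event storm or a stalled consumer.
        if depth >= self.alert_depth:
            self.on_alert(self.name, depth)

    def get(self):
        # Raises queue.Empty if there is nothing to read.
        return self._queue.get_nowait()

    def depth(self):
        return self._queue.qsize()
```

A caller would pass a function such as `lambda name, depth: print(name, depth)` as `on_alert`; in a real deployment that callback would presumably feed the same alerting path as any other monitored value.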

- An Overview of the i-scream Central Monitoring System -

The i-scream system monitors status and performance information obtained from machines feeding data into it and then displays this information in a variety of ways.

This data is obtained by running small applications on the reporting machines. These applications are known as "Hosts". The i-scream system provides a range of hosts which are designed to be small and lightweight in their configuration and operation. See the website and appropriate documentation to locate currently available Host applications. These hosts are simply told where to contact the server, at which point they are totally autonomous. They are able to obtain configuration from the server, detect changes in their configuration, send data packets (via UDP) containing monitoring information, and send so-called "Heartbeat" packets (via TCP) periodically to indicate to the server that they are still alive.
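
As a rough illustration of the Host behaviour described above, the following Python sketch builds an XML data packet, sends it over UDP, and sends a heartbeat over TCP. It is not the i-scream wire format: the element and attribute names (`packet`, `item`, `machine_name`, `date`), the `HEARTBEAT` text line and the function names are all assumptions made for the example, and the real Hosts are separate applications with their own protocol documentation.

```python
import socket
import time
import xml.etree.ElementTree as ET


def build_packet(hostname, readings):
    """Build an XML data packet; the element names are illustrative only."""
    root = ET.Element("packet", {"machine_name": hostname,
                                 "date": str(int(time.time()))})
    for name, value in readings.items():
        item = ET.SubElement(root, "item", {"name": name})
        item.text = str(value)
    return ET.tostring(root, encoding="utf-8")


def send_data(server, packet):
    """Send one data packet over UDP (no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, server)


def send_heartbeat(server, hostname, timeout=5.0):
    """Open a short TCP connection to tell the server the host is alive.

    The one-line message is a placeholder, not the real heartbeat protocol.
    """
    with socket.create_connection(server, timeout=timeout) as sock:
        sock.sendall(b"HEARTBEAT " + hostname.encode("utf-8") + b"\n")
```

A Host loop would call `send_data` at its configured reporting interval and `send_heartbeat` less often. Using UDP for the bulk data keeps the Host cheap even when the server is slow or unreachable, while the TCP heartbeat gives the server a reliable liveness signal.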

This data is then fed into the i-scream server, which splits it two ways. First, it places the data in a database system, typically MySQL based, for later extraction and processing by the i-scream report generation tools. Second, it passes the data on to real-time "Clients", which handle the data as it enters the system. The system itself has an internal real-time client called the "Local Client", which runs a series of Monitors that can analyse the data. One of these Monitors also feeds the data to a file repository, which is updated as new data comes in for each machine; this data is then read and displayed by the i-scream web services to provide a web interface to the data. The system also allows TCP connections from non-local clients (such as the i-scream supplied Conient); these applications provide a real-time view of the data as it flows through the system.
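
The two-way split described above can be sketched as a simple fan-out. This is a minimal Python illustration, not the server's actual design (which is Java based and distributed over CORBA); the `Dispatcher` class and its `store` and `register` names are hypothetical. The point it shows is that the server can forward each packet unchanged, leaving interpretation to the database tools and the clients.

```python
class Dispatcher:
    """Pass each incoming packet to a store and to every registered client."""

    def __init__(self, store):
        self.store = store      # callable that persists a packet (e.g. to a database)
        self.clients = []       # callables that receive packets in real time

    def register(self, client):
        self.clients.append(client)

    def dispatch(self, packet):
        # The packet is forwarded unmodified; only the endpoints interpret it.
        self.store(packet)
        # Iterate over a copy so that clients can be removed during the loop.
        for client in list(self.clients):
            try:
                client(packet)
            except Exception:
                # A failing client is dropped so it cannot stall the others.
                self.clients.remove(client)
```

In this sketch a "Local Client" would simply be one more registered callable, alongside any handlers for remote TCP clients.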

The final section of the system links the Local Client Monitors to an alerting system. These Monitors can be configured to detect changes in the data past threshold levels. When a threshold is breached, an alert is raised. As the alert persists, it is escalated through four live levels: NOTICE, WARNING, CAUTION and CRITICAL. The alerting system tracks the level and, when a certain level is reached, the corresponding alerting mechanisms fire through whatever medium they are configured to use.
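
The escalation behaviour described above can be sketched as follows. The four level names come from the text; everything else is an assumption made for the example, including the `ThresholdMonitor` class, the rule that an alert moves up one level after a fixed number of consecutive breaching readings (`checks_per_level`), and the rule that a single reading back under the threshold clears the alert. The real Monitors may escalate on elapsed time or other criteria.

```python
LEVELS = ["NOTICE", "WARNING", "CAUTION", "CRITICAL"]


class ThresholdMonitor:
    """Raise an alert on a threshold breach and escalate it while it persists."""

    def __init__(self, threshold, checks_per_level, on_alert):
        self.threshold = threshold
        self.checks_per_level = checks_per_level  # hypothetical escalation rule
        self.on_alert = on_alert                  # called once per level change
        self.breaches = 0
        self.level = None

    def check(self, value):
        if value <= self.threshold:
            # Back within limits: clear the alert entirely.
            self.breaches = 0
            self.level = None
            return None
        self.breaches += 1
        # Move up one level every `checks_per_level` consecutive breaches,
        # stopping at the highest level.
        index = min((self.breaches - 1) // self.checks_per_level,
                    len(LEVELS) - 1)
        level = LEVELS[index]
        if level != self.level:
            self.level = level
            self.on_alert(level, value)
        return level
```

Firing the callback only on a change of level means that an alerting mechanism attached to, say, CRITICAL is triggered once when that level is reached rather than on every subsequent reading.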
