- CMS Features -
-- Problem Specification -
-- Original Problem -
-- This is the original specification given to us when we started the project. The i-scream central monitoring system meets this specification and aims to extend it further. This is, however, where it all began. -
-- Centralised Machine Monitoring -
-- The Computer Science department has a number of different machines running a variety of different operating systems. One of the tasks of the systems administrators is to make sure that the machines don't run out of resources. This involves watching processor loads, available disk space, swap space, etc. -
-- It isn't practical to monitor a large number of machines by logging on and running commands such as 'uptime' on the Unix machines, or by using Performance Monitor for NT servers. Thus this project is to write monitoring software for each supported platform which reports resource usage back to one centralised location. System administrators would then be able to monitor all machines from this centralised location. -
-- Once this basic functionality is implemented it could usefully be expanded to include logging of resource usage to identify long-term trends and problems, and alerter services which can directly contact sysadmins (or even the general public) to bring attention to problem areas. Ideally it should be possible to run multiple instances of the reporting tool (with all instances being updated in real time) and to be able to run the reporting tool both as a stand-alone application and embedded in a web page. -
-- This project will require you to write code for the Unix and Win32 APIs using C, and knowledge of how the underlying operating systems manage resources. It will also require some network/distributed systems code and a GUI front end for the reporting tool. It is important for students undertaking this project to understand the importance of writing efficient and small code, as the end product will really be most useful when machines start to run out of processing power, memory or disk. -
-- John Cinnamond (email jc), whose idea this is, will provide technical support for the project. -
-- Features -
-- Key Features of The System -
--
-
- A centrally stored, dynamically reloaded, system-wide configuration system - -
- A fully extensible monitoring system: nothing except the Host (which generates the data) and the Clients (which view it) knows any details about the data being sent, allowing the data to be modified without changes to the server architecture. - -
- Central server and reporting tools that are all Java-based for multi-platform portability - -
- Distribution of core server components over CORBA to allow appropriate components to run independently and to allow new components to be written to conform with the CORBA interfaces. - -
- Use of CORBA to create a hierarchical set of data entry points to the system, allowing the system to handle event storms and remote office locations. - -
- One location for all system messages, despite the system being distributed. - -
- An XML data protocol, used to make data processing and analysis easily extensible - -
- A stateless server which can be moved and restarted at will, while Hosts, Clients, and reporting tools are unaffected and simply reconnect when the server is available again. - -
- Simple and open end protocols to allow easy extension and platform porting of Hosts and Clients. - -
- Self-monitoring, as all data queues within the system can be monitored and raise alerts to warn of event storms and impending failures (should any occur). - -
- A variety of web-based information displays based on Java/SQL reporting and PHP on-the-fly page generation to show the latest alerts and data - -
- Large overhead-monitor, Helpdesk-style displays for the latest alerting information - -
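The self-monitoring feature in the list above can be illustrated with a small sketch. It is not the i-scream implementation (the real server is Java based); the class name `MonitoredQueue`, the `alert_ratio` parameter and the helper `queues_needing_alert` are all hypothetical, and it only assumes that each internal data queue has a known capacity which can be compared against its current size.

```python
from collections import deque


class MonitoredQueue:
    """A bounded FIFO queue that can report how full it is (hypothetical design)."""

    def __init__(self, name, capacity, alert_ratio=0.75):
        self.name = name
        self.capacity = capacity
        self.alert_ratio = alert_ratio  # assumption: alert at 75% full by default
        self._items = deque()

    def put(self, item):
        if len(self._items) >= self.capacity:
            raise OverflowError(f"queue {self.name!r} is full")
        self._items.append(item)

    def get(self):
        return self._items.popleft()

    def fill_ratio(self):
        return len(self._items) / self.capacity


def queues_needing_alert(queues):
    """Return the names of queues at or above their alert ratio."""
    return [q.name for q in queues if q.fill_ratio() >= q.alert_ratio]
```

A periodic check over all queues would then be enough to warn of an event storm before a queue actually overflows.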
- An Overview of the i-scream Central Monitoring System -
-- The i-scream system monitors status and performance information obtained from machines feeding data into it and then displays this information in a variety of ways. -
-- This data is obtained by running small applications on the reporting machines. These applications are known as "Hosts". The i-scream system provides a range of Hosts which are designed to be small and lightweight in their configuration and operation. See the website and appropriate documentation to locate currently available Host applications. These Hosts are simply told where to contact the server, at which point they are fully autonomous. They are able to obtain configuration from the server, detect changes in their configuration, send data packets (via UDP) containing monitoring information, and periodically send so-called "Heartbeat" packets (via TCP) to indicate to the server that they are still alive. -
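As a rough illustration of the Host side described above, the sketch below builds a small XML data packet and sends it over UDP. The element and attribute names (`packet`, `machine_name`, `date`) and the function names are assumptions made for this example, not the documented i-scream protocol, and the TCP heartbeat and configuration handling are left out.

```python
import socket
import time
import xml.etree.ElementTree as ET


def build_packet(machine_name, readings, timestamp=None):
    """Serialise readings into XML bytes (element names are hypothetical)."""
    stamp = int(timestamp if timestamp is not None else time.time())
    root = ET.Element("packet", {"machine_name": machine_name, "date": str(stamp)})
    for key, value in readings.items():
        # One child element per reading; the server does not need to know them.
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def send_packet(payload, server_host, server_port):
    """Fire-and-forget delivery of one data packet over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server_host, server_port))
```

Because the readings are just named child elements, a Host could add a new measurement without any change on the sending path.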
-- The data is then fed into the i-scream server, which splits it two ways. First it places the data in a database system, typically MySQL based, for later extraction and processing by the i-scream report generation tools. It then passes the data on to real-time "Clients" which handle the data as it enters the system. The system itself has an internal real-time client called the "Local Client", which runs a series of Monitors that can analyse the data. One of these Monitors also feeds the data to a file repository, which is updated as new data comes in for each machine. This data is then read and displayed by the i-scream web services to provide a web interface to the data. The system also allows TCP connections by non-local clients (such as the i-scream supplied Conient); these applications provide a real-time view of the data as it flows through the system. -
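The two-way split described above can be sketched as follows. This is only an illustration under stated assumptions: SQLite from the Python standard library stands in for the MySQL database, in-process queues stand in for the real-time Clients, and the names `PacketSplitter`, `attach_client` and the `packets` table are hypothetical.

```python
import queue
import sqlite3
import time


class PacketSplitter:
    """Stores every packet, then hands a copy to each attached client."""

    def __init__(self, db):
        self._db = db
        # Assumption: packets are stored as opaque XML text with an arrival time.
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS packets (received REAL, xml TEXT)"
        )
        self._clients = []

    def attach_client(self):
        """Register a real-time client and return the queue it should read."""
        client_queue = queue.Queue()
        self._clients.append(client_queue)
        return client_queue

    def handle(self, xml_text):
        # Way one: persist for later report generation.
        self._db.execute(
            "INSERT INTO packets VALUES (?, ?)", (time.time(), xml_text)
        )
        self._db.commit()
        # Way two: pass the packet on to every real-time client.
        for client_queue in self._clients:
            client_queue.put(xml_text)
```

In this sketch the splitter never inspects the packet contents, which matches the idea that only Hosts and Clients need to understand the data.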
-- The final section of the system links the Local Client Monitors to an alerting system. These Monitors can be configured to detect when the data passes threshold levels. When a threshold is breached an alert is raised. While the alert persists, it is escalated through four live levels: NOTICE, WARNING, CAUTION and CRITICAL. The alerting system keeps an eye on the level, and when a given level is reached, the alerting mechanisms configured for that level fire through whatever medium they are configured to use. -
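The escalation behaviour described above can be sketched as a small state machine. The four level names come from the text; everything else is an assumption for illustration, including the resting `OK` state, escalating after a fixed number of consecutive breached samples (`escalate_after`), and the `Alerter` class, which records what it would send rather than delivering anything.

```python
from enum import IntEnum


class Level(IntEnum):
    OK = 0  # assumption: resting state, not one of the four live levels
    NOTICE = 1
    WARNING = 2
    CAUTION = 3
    CRITICAL = 4


class Alerter:
    """Hypothetical alerting mechanism that fires at or above a chosen level."""

    def __init__(self, name, fire_at):
        self.name = name
        self.fire_at = fire_at
        self.fired = []  # (source, level name) pairs, in place of real delivery

    def notify(self, source, level):
        if level >= self.fire_at:
            self.fired.append((source, level.name))


class ThresholdMonitor:
    """Raises an alert on a breach and escalates it while the breach persists."""

    def __init__(self, source, threshold, escalate_after, alerters=()):
        self.source = source
        self.threshold = threshold
        self.escalate_after = escalate_after  # breached samples per level
        self.alerters = list(alerters)
        self.level = Level.OK
        self._breaches = 0

    def observe(self, value):
        previous = self.level
        if value <= self.threshold:
            self._breaches = 0
            self.level = Level.OK
        else:
            self._breaches += 1
            steps = (self._breaches - 1) // self.escalate_after
            self.level = Level(min(Level.NOTICE + steps, Level.CRITICAL))
        # Tell the alerters only when a new live level is reached.
        if self.level != previous and self.level != Level.OK:
            for alerter in self.alerters:
                alerter.notify(self.source, self.level)
        return self.level
```

For example, an alerter created with `fire_at=Level.CAUTION` stays silent during the NOTICE and WARNING stages and fires once at CAUTION and once more at CRITICAL.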
-