www/cms/features.shtml

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html>

<head>
<title>CMS Features</title>
<!--#include virtual="/style.inc" -->
</head>

<body>

<div id="container">

<div id="main">

<!--#include virtual="/header.inc" -->

<div id="contents">

  <h1 class="top">CMS Features</h1>

  <h2>Problem Specification</h2>

       <h3>Original Problem</h3>

       <p>
        This is the original specification given to us when we
        started the project. The i-scream central monitoring
        system meets this specification, and aims to extend it
        further. This is, however, where it all began.
       </p>

       <h3>Centralised Machine Monitoring</h3>

       <p>
        The Computer Science department has a number of different machines
        running a variety of different operating systems. One of the tasks
        of the systems administrators is to make sure that the machines
        don't run out of resources. This involves watching processor loads,
        available disk space, swap space, etc.
       </p>

       <p>
        It isn't practical to monitor a large number of machines by logging
        on and running commands such as 'uptime' on the Unix machines, or
        by using Performance Monitor for NT servers. Thus this project is
        to write monitoring software for each platform supported which
        reports resource usage back to one centralised location. System
        administrators would then be able to monitor all machines from this
        centralised location.
       </p>

       <p>
        Once this basic functionality is implemented it could usefully be
        expanded to include logging of resource usage to identify long-term
        trends/problems, and alerter services which can directly contact
        sysadmins (or even the general public) to bring attention to problem
        areas. Ideally it should be possible to run multiple instances of
        the reporting tool (with all instances being updated in real time)
        and to be able to run the reporting tool both as a standalone
        application and embedded in a web page.
       </p>

       <p>
        This project will require you to write code for the Unix and Win32
        APIs using C, and knowledge of how the underlying operating systems
        manage resources. It will also require some network/distributed
        systems code and a GUI front end for the reporting tool. It is
        important for students undertaking this project to understand the
        importance of writing efficient and small code, as the end product
        will really be most useful when machines start to run out of
        processing power/memory/disk.
       </p>

       <p>
        John Cinnamond (email jc), whose idea this is, will provide technical
        support for the project.
       </p>

  <h2>Features</h2>

       <h3>Key Features of The System</h3>

       <ul>
        <li>A centrally stored, dynamically reloaded, system-wide configuration system</li>
        <li>A totally extendable monitoring system: nothing except the Hosts (which
          generate the data) and the Clients (which view it) knows any details about
          the data being sent, allowing the data to be modified without changes to the
          server architecture.</li>
        <li>Central server and reporting tools all Java-based for multi-platform portability</li>
        <li>Distribution of core server components over CORBA to allow appropriate components
          to run independently and to allow new components to be written to conform with the
          CORBA interfaces.</li>
        <li>Use of CORBA to create a hierarchical set of data entry points to the system,
          allowing the system to handle event storms and remote office locations.</li>
        <li>One location for all system messages, despite the system being distributed.</li>
        <li>XML data protocol used to make data processing and analysis easily extendable</li>
        <li>A stateless server which can be moved and restarted at will, while Hosts,
          Clients, and reporting tools are unaffected and simply reconnect when the
          server is available again.</li>
        <li>Simple and open end protocols to allow easy extension and platform porting of Hosts
          and Clients.</li>
        <li>Self-monitoring, as all data queues within the system can be monitored and raise
          alerts to warn of event storms and impending failures (should any occur).</li>
        <li>A variety of web-based information displays based on Java/SQL reporting and
          PHP on-the-fly page generation to show the latest alerts and data</li>
        <li>Large overhead-monitor, helpdesk-style displays for the latest alerting information</li>
       </ul>

       <h3>An Overview of the i-scream Central Monitoring System</h3>

       <p>
        The i-scream system monitors status and performance information
        obtained from machines feeding data into it and then displays
        this information in a variety of ways.
       </p>

       <p>
        This data is obtained by running small applications on the
        reporting machines.  These applications are known as
        "Hosts".  The i-scream system provides a range of Hosts which are
        designed to be small and lightweight in their configuration and
        operation.  See the website and appropriate documentation to
        locate currently available Host applications.  A Host need only
        be told where to contact the server, after which it is
        fully autonomous.  Hosts are able to obtain configuration from
        the server, detect changes in their configuration, send data
        packets (via UDP) containing monitoring information, and
        periodically send so-called "Heartbeat" packets (via TCP) to
        indicate to the server that they are still alive.
       </p>

       <p>
        The data is then fed into the i-scream server, which splits
        it two ways.  First, it places the data in a database system,
        typically MySQL-based, for later extraction and processing by the
        i-scream report generation tools.  Second, it passes the data on to
        real-time "Clients" which handle the data as it enters the system.
        The system itself has an internal real-time client called the "Local
        Client", which runs a series of Monitors that can analyse the
        data.  One of these Monitors also feeds the data off to a file
        repository, which is updated as new data comes in for each machine.
        This data is then read and displayed by the i-scream web services
        to provide a web interface to the data.  The system also allows TCP
        connections by non-local clients (such as the i-scream supplied
        Conient).  These applications provide a real-time view of the data
        as it flows through the system.
       </p>

       <p>
        The final section of the system links the Local Client Monitors to
        an alerting system.  These Monitors can be configured to detect
        when the data crosses threshold levels.  When a threshold is
        breached, an alert is raised.  As the alert persists, it is
        escalated through four live levels: NOTICE, WARNING, CAUTION
        and CRITICAL.  The alerting system tracks the level and, when
        a configured level is reached, the corresponding alerting mechanisms
        fire through whatever medium they are configured to use.
       </p>
</div>

<!--#include virtual="/footer.inc" -->

</div>

<!--#include virtual="/menu.inc" -->

</div>

</body>
</html>