Monitoring - What to monitor


Introduction

Application and system monitoring is an integral part of hosting the Mendix platform on-premises. Done correctly, it gives a system administrator a wealth of information about the current state and health of a Mendix installation. This document details the metrics to collect in order to gain sufficient insight into your technical Mendix environment, and defines critical values for these metrics.

Prerequisites

This document is based upon the following assumptions:

Monitoring Categories

This document describes the four monitoring categories that form the basis of Mendix application and system monitoring: fault, performance, configuration, and security monitoring.

Fault Monitoring

Fault monitoring is primarily used to detect major errors in one or more system components. Examples of faults include the loss of network connectivity, a database server going offline, or the application hitting a Java out-of-memory condition. Faults are an important indication that a system or application is malfunctioning.
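As an illustration of the simplest kind of fault check, the sketch below reports a service as UP or DOWN based on whether a TCP connection can be opened. It is a minimal example, not a Mendix tool: the function name `check_tcp` and the default timeout are assumptions chosen for this illustration, and a real deployment would normally rely on a dedicated monitoring product.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Return "UP" if a TCP connection to host:port succeeds, else "DOWN".

    Hypothetical helper for illustration; host, port, and timeout are
    whatever your environment uses.
    """
    try:
        # create_connection resolves the host name and applies the timeout
        # to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return "UP"
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return "DOWN"
```

A successful connection only shows that something is listening on the port. It says nothing about whether the service behind it is answering requests correctly, which is why an end-to-end application test remains useful alongside it.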

Performance Monitoring

Performance monitoring is specifically concerned with detecting degraded performance, such as slower response times from the application, the database, or other back-end resources. Performance issues generally arise as the user load on an application increases. They are important to detect throughout the lifetime of an application because, like faults, they negatively affect the user experience.

Configuration Monitoring

Configuration monitoring is a safeguard that ensures the configuration variables affecting the application and its back-end resources stay at their predetermined settings. Incorrect configurations, such as a maximum JVM heap size that is too low, can negatively affect application performance.

Security Monitoring

Security monitoring detects intrusion attempts by unauthorized system users.

Metrics - Fault Monitoring

Fault monitoring covers the broadest range of metrics. It includes both hardware-related and software-related items and extends beyond the application server itself, because back-end resources and network connectivity components (routers, switches, etc.) need to be taken into account as well.

| Component | Type of Monitoring | Applicable Metric | Threshold |
|---|---|---|---|
| Hardware and Network | Server availability | Heartbeat/ping all servers | UP/DOWN |
| | Error report | Monitor error report logs for hard errors | ERRORS |
| | Network latency | Ping time between network components | UP/DOWN/SNMP traps |
| | CPU utilization | CPU utilization all servers | > 99% over x minutes |
| | Memory utilization | Memory utilization all servers | > 99% over x minutes |
| | Paging/swapping | OS level metric all servers | In process of paging/swapping |
| | File system | Available file space all servers | Out of space |
| | Network components | Capture SNMP traps | UP/DOWN/ERROR |
| Mendix Application Server | Admin server process | Monitor admin server process | UP/DOWN |
| | Application server process | Monitor application server process | UP/DOWN |
| | Java process | Monitor Java process | UP/DOWN |
| Web Server | IIS Worker processes | Available | UP/DOWN/ERROR |
| | Timed out connection | Connection timeout | Occurred |
| Databases | SQL Server process | Available | UP/DOWN |
| | SQL Agent process | Available | UP/DOWN |
| | SQL Server maintenance plans | Running | SUCCESS/FAILURE |
| Application | Functional | End-to-end application test | PASSED/FAILED |
| | Error logs | Search for errors emitted by the application | ERROR OCCURRED |
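Thresholds of the form "> 99% over x minutes" are meant to fire on sustained load rather than on a single spike. The sketch below shows one way to evaluate such a rule over a series of utilization samples. The `Sample` type, the function name, and the `min_samples` guard are assumptions made for this illustration. The window length stands in for "x minutes", which this document leaves to the administrator.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds, on any monotonically increasing clock
    percent: float    # utilization, 0-100


def sustained_breach(samples, threshold, window_seconds, now, min_samples=3):
    """Return True if every sample inside the window exceeds the threshold.

    Hypothetical helper: min_samples guards against declaring a sustained
    breach from too little data.
    """
    recent = [s for s in samples if now - window_seconds <= s.timestamp <= now]
    if len(recent) < min_samples:
        return False  # not enough data points to call the breach sustained
    return all(s.percent > threshold for s in recent)


# Example: one sample per minute, evaluated over a two-minute window.
cpu = [Sample(0, 50.0), Sample(60, 99.5), Sample(120, 99.8), Sample(180, 100.0)]
alert = sustained_breach(cpu, threshold=99.0, window_seconds=120, now=180)
```

The same function can serve the performance thresholds by passing 80 instead of 99.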

Metrics - Performance Monitoring

| Component | Type of Monitoring | Applicable Metric | Threshold |
|---|---|---|---|
| Hardware and Network | Network latency | Ping time and network bandwidth measurements | Timings > 1000 ms or network bandwidth maxed |
| | Error report | Monitor error report logs for hard errors | ERRORS |
| | Network latency | Ping time between network components | UP/DOWN/SNMP traps |
| | CPU utilization | CPU utilization all servers | > 80% over x minutes |
| | Memory utilization | Memory utilization all servers | > 80% over x minutes |
| | Paging/swapping | OS level metric all servers | In process of paging/swapping |
| | File system | Available file space all servers | > 80% used |
| | Network components | Capture SNMP traps | Degraded counters |
| Web Server | HTTP response | Average response time retrieving 1K GIF | Response time > 1000 ms |
| Databases | SQL Server | Average response time | Response time > 1000 ms |
| | PostgreSQL | Average response time | Response time > 1000 ms |
| Application | Complex page requests | Average response time | > 10 secs (or a lower limit) |
| | Error logs | Search for warnings emitted by the application | Warnings occur |
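The response-time thresholds above compare an average over several measurements against a limit. As a rough sketch of that idea, the code below times a probe a few times and flags the result when the mean exceeds 1000 ms. The names `measure_ms` and `response_status` are hypothetical, and `probe` stands for whatever request you choose to time, such as fetching a small static file or running a trivial database query.

```python
import time
from statistics import mean


def measure_ms(probe, runs=5):
    """Call probe() `runs` times and return each duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        probe()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def response_status(durations_ms, limit_ms=1000.0):
    """Return "DEGRADED" if the average duration exceeds limit_ms, else "OK"."""
    return "DEGRADED" if mean(durations_ms) > limit_ms else "OK"


# Example with fixed timings instead of a live probe.
status = response_status([1500.0, 900.0, 1200.0])
```

Averaging smooths out a single slow request, at the cost of hiding occasional outliers. If those matter in your environment, tracking the maximum or a high percentile alongside the mean is a common complement.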

Metrics - Configuration Monitoring

| Component | Type of Monitoring | Applicable Metric |
|---|---|---|
| Hardware and Network | Network | Each network component configuration |
| | Server | OS level configuration |
| | File system | NTFS configuration |
| Web Server | IIS server | Configurations |
| Databases | SQL server | Configurations |
| | PostgreSQL server | Configurations |
| Application | Application-specific | Configurations |
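Configuration monitoring amounts to comparing the settings currently in effect against a known-good baseline and reporting any difference. A minimal sketch of that comparison follows. The setting names (`jvm_max_heap_mb`, `db_pool_size`) are invented for this example and do not correspond to actual Mendix configuration keys. How the current values are collected is left out because it depends on the component.

```python
def find_drift(expected, actual):
    """Return {key: (expected_value, actual_value)} for each drifted setting.

    A setting counts as drifted when its value differs from the baseline
    or when it is missing; None stands in for a missing value.
    """
    drift = {}
    for key, want in expected.items():
        have = actual.get(key)
        if have != want:
            drift[key] = (want, have)
    return drift


# Hypothetical baseline and current settings.
baseline = {"jvm_max_heap_mb": 2048, "db_pool_size": 50}
current = {"jvm_max_heap_mb": 512, "db_pool_size": 50}
drift = find_drift(baseline, current)
```

Settings that appear only in the current configuration are ignored here. Whether unexpected extra settings should also raise an alert is a policy choice.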

Metrics - Security Monitoring

Security monitoring comprises the detection of, and response to, all security-related incidents within the Mendix platform, which consists of server hardware and software, back-end systems, and network connectivity components such as routers, switches, and firewalls. Because security monitoring covers such a broad technical field, listing all valid metrics is out of scope for this document. However, as data security is of vital importance to most, if not all, organizations, it needs to be mentioned here. Ample documentation on the subject of security monitoring is available online.

Mendix recommends involving a third-party supplier to audit your Mendix environment for any security-related issues.