Architecture and Hardware Requirements

This document acts as a reference point for the logical design of Canopy, and also provides guidelines on specific deployment scenarios.

Architecture

Canopy comprises three main components:

  • Application/Web service (web tier, including application server and local file storage system)
  • Database layer
  • Document generation service (docx and PDF)

It is possible to run all three components on the same hardware, assuming the minimum requirements are met and storage has been planned. Equally, Canopy can be deployed in other configurations, e.g. placing the web tier and database service on different servers and storage on a NAS/SAN.

Alternative deployment guides will be made available shortly. For any specific questions, please contact support@checksec.com.

Logical design


Services and Ports

Canopy communicates over the following TCP ports:

Service Group | Service | Port (Protocol) | Publicly Accessible? | Security
Canopy application server | Web Server | 443 (https) | Yes | Yes - hardened TLS configuration out of the box
Canopy application server | Application Service | 8000 (http) | No | Localhost configuration by default
Canopy application server | RabbitMQ server | 5672 (amqp) | No | Localhost configuration by default; default username/password
Canopy database server | PostgreSQL (default) | 5432 (pgsql) | No | Localhost configuration by default; default username/password accessible through a user-restricted admin script
Canopy report server | Docserver | 8181 (http) | No | Localhost configuration by default; no authentication required
Canopy report server | PDF converter | 9016 (http) | No | Localhost configuration by default; no authentication required
Other | Mail routing | 25 (smtp) | No | Outbound only service, for mail routing

Notes:

Web Server: This is a standard web service, running on nginx by default (Apache is also supported). It is the only interface that needs to be accessible to users of the application. The Web Service acts as a reverse proxy to the Application Service, for both user access and RESTful API access.

A self-signed certificate is used by default, which should be replaced with a production-ready certificate.

Application Service: The Application Service is built on top of Django and is typically bound to localhost only. If network communication is required (e.g. for large scale deployments), this service is placed behind another Web Server layer acting as a reverse proxy, which would use the default TLS hardened configuration.

RabbitMQ server: RabbitMQ is a backend message queue for running asynchronous jobs via Celery.

RabbitMQ can be configured to run over TLS, which may be a requirement in larger/enterprise deployments. Specific guidelines are available on the RabbitMQ site: https://www.rabbitmq.com/ssl.html

It is recommended that the default username/password be changed on install, even though the service is restricted to localhost.

PostgreSQL (default): PostgreSQL is the primary database supported by Canopy. Oracle may also be used (REF).

Additional PostgreSQL hardening guidelines are provided at: REF.

Docserver: This service accepts a docx (template) and an XML (data source) and returns a generated docx file to the application server.

This service can be run on an alternative server (for load distribution). In such a scenario, we recommend using nginx as a frontend proxy, which can be secured via TLS.

PDF converter: This service accepts a docx and returns a docx and PDF to the calling application.

This service can be run on an alternative server (for load distribution). In such a scenario, we recommend using nginx as a frontend proxy, which can be secured via TLS.

Mail routing: This is a dependency for sending mail-based notifications to users. It requires an outbound firewall rule.
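
As a quick post-install check, the default ports listed above can be probed from the server itself. The following is a minimal sketch and not part of Canopy: the DEFAULT_PORTS mapping simply restates the table, and is_listening and probe_ports are hypothetical helper names. It assumes the default localhost bindings, so the host would need to change for distributed deployments. The outbound SMTP port is left out because it is not a local listener.

```python
import socket

# Default TCP ports, restated from the services table (assumed unchanged).
DEFAULT_PORTS = {
    "Web Server (https)": 443,
    "Application Service (http)": 8000,
    "RabbitMQ server (amqp)": 5672,
    "PostgreSQL": 5432,
    "Docserver (http)": 8181,
    "PDF converter (http)": 9016,
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_ports(host="127.0.0.1", ports=DEFAULT_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in probe_ports().items():
        status = "listening" if up else "not reachable"
        print(f"{name:30s} {status}")
```

Running the same probe from a different machine is a simple way to confirm that only port 443 is reachable externally.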

Deployment scenarios

Single server

By default Canopy will set up a single server instance using the standard service protocols listed above.

Larger and Enterprise deployments

Within enterprise environments, service layers may be available for databases. Canopy supports separation of the following modules on separate servers:

  • Web server: the web server can be run on its own instance. It must be configured to connect to the application server on the exposed port (default: 8000). Multiple web servers can be deployed in high availability environments.
  • Application server: the application server can be configured on a separate server.
  • Database server: Canopy requires a single database server, which can be hosted on a network-connected server. The database URL and PORT must be configured in the canopy.ini file on the application server. Database replication is not currently supported.
  • Report server: both the docserver and the PDF converter can be deployed to a separate server (or servers) in order to offload the resource-intensive operation of document generation. Both of these services can be deployed using TLS to encrypt network communications.


For configuration guidelines, see: TBC.

Recommended hardware configurations

The following requirements are intended for Canopy Server in multi-user environments.

Item | Minimum requirement | Recommended requirement (small to medium sized consultancies) | Recommended requirement (large consultancies and enterprises)
Processor | 2 cores | 4+ cores | 16+ cores
Memory (RAM) | 4GB | 8GB | 32GB
Storage | 20GB (depends on usage) | Use based. See below: Calculating storage space. | Use based. See below: Calculating storage space.


Specific upper limits are being tested on these configurations and will be communicated in due course.

Performance benchmarks

The following performance benchmark data can be used to help determine suitable hardware deployments within environments. In summary:

  • Requests Per Second (RPS): on average, 7 requests correspond to 1 action in Canopy.
  • 1 action corresponds to a typical task performed by a single user.
  • Recommended concurrent users (RCU) is the recommended number of concurrent users to avoid performance degradation. This is an estimate based on the test scenarios. If a similar number of users perform heavier operations, resource consumption issues could appear earlier than suggested.
  • Maximum concurrent users (MCU) is the maximum upper limit of the performance tests, at which point the performance of the server was significantly affected. In all cases, the upper limit was reached due to CPU related bottlenecks.

Host | vCPUs | RAM (GB) | RCU | MCU | RPS per CPU | Gunicorn workers | Bottleneck | CPU
AWS EC2 t2.medium | 2 | 3.5 | 25 | 50 | 35 | 10 | CPU | Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
AWS EC2 t2.2xlarge | 8 | 32 | 100-125 | 250 | 43.75 | 8 | CPU | Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
AWS EC2 c4.4xlarge | 16 | 30 | 200-220 | 357 | 31.25 | 16 | CPU | Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz

All benchmarking was conducted on AWS instances. Performance results on other virtualised platforms, or on direct hardware, may be different. This data should be used for guidance only.
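
The benchmark figures can be related to each other with a little arithmetic. The sketch below is illustrative only: it assumes that total RPS is the RPS per CPU multiplied by the vCPU count, and it uses the average of 7 requests per action given in the summary. The function names are hypothetical and the data simply restates the benchmark rows.

```python
REQUESTS_PER_ACTION = 7  # average stated in the benchmark summary

# (host, vCPUs, RPS per CPU), restated from the benchmark data.
BENCHMARKS = [
    ("AWS EC2 t2.medium", 2, 35.0),
    ("AWS EC2 t2.2xlarge", 8, 43.75),
    ("AWS EC2 c4.4xlarge", 16, 31.25),
]


def total_rps(vcpus, rps_per_cpu):
    """Total requests per second, assuming RPS scales with vCPU count."""
    return vcpus * rps_per_cpu


def actions_per_second(vcpus, rps_per_cpu):
    """Approximate user actions per second the host sustained."""
    return total_rps(vcpus, rps_per_cpu) / REQUESTS_PER_ACTION


if __name__ == "__main__":
    for host, vcpus, rps_per_cpu in BENCHMARKS:
        rps = total_rps(vcpus, rps_per_cpu)
        actions = actions_per_second(vcpus, rps_per_cpu)
        print(f"{host}: {rps:.0f} RPS total, about {actions:.1f} actions/s")
```

Under these assumptions the three hosts work out to 70, 350 and 500 RPS in total. In this data the MCU figures are roughly five times the computed actions per second, although how MCU was derived is not stated here.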

Calculating storage space

Storage requirements for Canopy vary greatly based on the planned use of the system. If Canopy is going to be used to store data such as state files from proxy tools or code repository dumps, more disk space will be required. If such information is stored in an external environment, it can instead be referenced from within Canopy via the description fields.

A typical usage scenario calculation might consist of:

  • Average data per project: 1GB
  • Estimated number of projects per year: 250 (distributed across the team)
  • Estimated space requirement (per year): 250 * 1GB = 250GB, plus 2-5GB for the database server

The total estimate for 1 year might be ~255GB, which should arguably be rounded up to 300GB.

It may be appropriate to project for a 3-5 year period, towards an upper limit.
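
The calculation above can be sketched as a small helper for trying out different project volumes and planning horizons. This is an illustration rather than a sizing tool: the function names are hypothetical, the database allowance uses the upper end of the 2-5GB range, and multi-year growth is assumed to be linear.

```python
import math


def yearly_storage_gb(projects_per_year, avg_gb_per_project, database_gb=5):
    """Estimated space for one year: project data plus database allowance."""
    return projects_per_year * avg_gb_per_project + database_gb


def projected_storage_gb(years, projects_per_year, avg_gb_per_project,
                         database_gb=5):
    """Estimated space over several years, assuming linear growth."""
    return years * yearly_storage_gb(
        projects_per_year, avg_gb_per_project, database_gb
    )


def round_up_gb(estimate_gb, step_gb=100):
    """Round an estimate up to the next multiple of step_gb for headroom."""
    return math.ceil(estimate_gb / step_gb) * step_gb


if __name__ == "__main__":
    one_year = yearly_storage_gb(250, 1)           # 250 projects at 1GB each
    five_years = projected_storage_gb(5, 250, 1)
    print(one_year, round_up_gb(one_year))         # 255 300
    print(five_years, round_up_gb(five_years))     # 1275 1300
```

With the example figures, a 5 year projection comes to roughly 1.3TB after rounding up.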