Isilon, UCS, Nexus get the nod for UK Community Cloud Infrastructure

(Point of note: I work for Eduserv; this is my technology take on my current project deployment.)

Eduserv are building a new cloud service called ‘The Community Cloud Infrastructure’, with the first service to be deployed being the ‘University Modernisation Fund Cloud Pilot’. (Subsequent public and hybrid clouds are planned for deployment on the platform in the future, to support Education, Health, Public Sector and other public bodies).

As part of this project there are some exciting technology choices, and this post provides some background on the technology stack to be used in the infrastructure. The platform needs to be both scalable and resilient, whilst allowing multi-site availability zone protection to be added for cloud customers down the line without a major re-design. The aim is to scale the platform to support 5,000+ physical servers (10,000+ VMs) in the first instance, at a lower price point than the Amazon EC2 equivalent at launch.

Compute

Cisco UCS chassis with B-series blades, fabric extenders and fabric interconnects provide the computing power for the platform. Multiple fabrics provide the pod scalability, with the UCS fabric designed with enough north-south connectivity to strike a happy medium between bandwidth and scalability. Fabric commissioning takes account of the products offered on the platform, which has a bearing on the amount of contention on each physical blade.

Vendor Site: http://www.cisco.com/en/US/products/ps10265/index.html

Networking

After much calculation, Cisco Nexus 7k and 5k switches were chosen as the backbone of the network and the core of the UCS infrastructure. Resilience and scalability with a 7k core and 5k distribution layer combine to provide the right number of uplinks from the UCS chassis, plus north-south and east-west connectivity and connectivity to the storage. The entire network stack for cloud provision is based on 10Gb connectivity, with out-of-band management connecting in at 1Gb.

Vendor Site: http://www.cisco.com/en/US/products/ps9441/Products_Sub_Category_Home.html

Storage

Here is the exciting technology choice. Isilon X-series and NL-series storage will be used to power the cloud, with SATA disk technology and 10Gb connectivity to the core switches providing massive scalability, via a modular architecture, to 14PB! For those not familiar with Isilon technology, there is an excellent post from Ars Technica here: http://bit.ly/n0dWZ3

Vendor Site: http://www.isilon.com/

Software

The software stack comprises a mixture of enterprise and open source options. VMware vCloud Director provision is aimed at large institutions, IT departments, colleges and schools that wish to outsource IT operations to cloud providers for a variety of use cases, whilst OpenStack is targeted at individuals. Options for consumption include PAYG and fixed per-month billing, with access via a portal for customer service and support.

Vendor Site(s): http://www.vmware.com/products/vcloud-director/overview.html and http://www.openstack.org/

As time passes, I will post more information on the interesting technical and operational aspects of this project. If you have any queries or questions, please feel free to contact us for more information or leave a comment!

Troubleshooting vCloud Director Cells

So, you have a single- or multi-cell vCloud Director installation, but some of the cells are misbehaving or having issues? Here are some tips, tricks and gems of information I have collected over time…..

  1. NTP. Time is critically important to vCD cells. If the difference between the cells’ local times is greater than 2 seconds, there will be connectivity issues.
  2. DNS. This is critical – make sure the cells have forward and reverse DNS entries in a contactable DNS server.
  3. Check the ‘Transfer’ folder is read/write. If the NFS share is mounted read-only, the cells will complain and not function.
  4. If a single cell is working and another cell is added to the configuration, then when the transfer NFS mount is presented to the second cell, the services need to be restarted on both cells for it to function correctly.
  5. Check the cell.log file for issues, specifically with connectivity to the transfer location.
  6. Has the mount point been added to /etc/fstab for persistence? If not, add it in there…
  7. Check the transfer location is not full or running low on space. (A quick set of shell checks covering several of these items is sketched below the list.)
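For convenience, here is a minimal sketch of the kind of shell checks I run against the list above on each cell. It assumes a default install under /opt/vmware/vcloud-director; the hostname cell01.example.com and IP 192.0.2.11 are just placeholders for your own cell entries:

# 1 & 2: confirm time sync and forward/reverse DNS resolution
ntpstat
host cell01.example.com
host 192.0.2.11
# 3 & 6: confirm the transfer share is mounted read/write and is in /etc/fstab
mount | grep /opt/vmware/vcloud-director/data/transfer
grep transfer /etc/fstab
# 7: confirm the transfer location has free space
df -h /opt/vmware/vcloud-director/data/transfer
# 5: look for transfer/connectivity errors in the cell log
grep -i transfer /opt/vmware/vcloud-director/logs/cell.log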

Got any tips of your own? Please leave a comment or mail us so I can update this post over time…..

Credits for tips to this list:

Ephemeral Ports with VMware vCloud Director 1.5 and vSphere 5

This is something that I have been thinking about in a recent build project, where we are considering using vCDNI-Backed networks in vCloud Director. Dave Hill writes a great article on using (or not) Ephemeral Ports in vCD here:

http://www.virtual-blog.com/2011/03/ephemeral-ports-with-vcd-to-use-or-not-to-use/

…where the focus was on 4.0 vs. 4.1 deployments.

With the recent release of vSphere 5, the arguments for and against using Ephemeral ports in a capacity context come into sharper focus:

  • vSphere 4.1: 1,016 ephemeral port groups per vDS/vCenter
  • vSphere 5: 256 ephemeral port groups per vDS/vCenter

Source:

So, what are the choices? In vSphere, there are 3 port binding types:
  • Static – dvPort is immediately assigned, and freed when the VM is removed. Only available for connection through vCenter. Default in vCloud Director 1.5.
  • Dynamic – dvPort is assigned when VM is powered-on and NIC connected, and freed when VM is powered-off. Not available in vSphere 5. Good for environments where there are more VMs than port groups available. VMs must be powered-on and off through vCenter.
  • Ephemeral – dvPort is created and assigned when the VM is powered-on, and deleted when the VM is powered-off or the NIC is disconnected. Can be created directly on ESXi hosts. Should only be used for recovery scenarios where provisioning dvPorts directly on hosts is required.
Another reason not to use ephemeral ports is performance. Because ephemeral ports are dynamically created and destroyed, add-host and VM operations will be comparatively slower than with the static binding type.

Building vCloud Director Cells using CentOS6

In my lab infrastructure (build information soon to be published), I have been playing with vCloud Director 1.5. As part of my work day I am developing a cloud platform as part of a team at Eduserv (for more information see http://www.eduserv.org.uk/cloud), and part of that is designing availability zones so that customer information, once populated, will be resilient across geographical locations.

Now, not being part of the RedHat brigade (and being a novice getting up to speed with CentOS), I have been playing with getting vCloud Director cells to work on CentOS (an ‘unsupported’ activity according to the documentation – RHEL being a pre-requisite for vCD). In order to get this to work, I followed the process outlined below.

NOTE: There is probably a much better way of doing this, but this is the first way I found to get it to work. If you have found another or have a better suggestion – feel free to comment!
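As a rough sketch of the general shape of that process (this is an illustration of one commonly described workaround rather than an exact recipe; it assumes the installer gates on the distribution string in /etc/redhat-release, and the release string and installer filename below are placeholders, so substitute your own):

# back up the CentOS release string, then present an RHEL-style one so the installer's distribution check passes
cp /etc/redhat-release /etc/redhat-release.bak
echo "Red Hat Enterprise Linux Server release 5.7 (Tikanga)" > /etc/redhat-release
# run the vCD installer binary (filename is a placeholder for your build)
chmod +x vmware-vcloud-director-1.5.0-XXXXXX.bin
./vmware-vcloud-director-1.5.0-XXXXXX.bin
# restore the original release file afterwards
mv /etc/redhat-release.bak /etc/redhat-release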

Single vs. Multi-Cell vCloud Director Considerations

With this post, I wanted to outline the considerations for deploying vCloud Director cells in a single- vs. multi-cell deployment. These considerations are often relevant when moving a deployment from a lab or P.O.C. to a production or build environment. If you come across anything that isn’t mentioned here, please drop us a comment and add to the list!

‘Transfer’ NFS Share

When deploying vCD cells, the NFS share that hosts the ‘transfer’ location is critical. This is used as part of downloading vApps, where downloads are streamed to the transfer location and then on to the destination – to protect or proxy access to vCenter from the download location. This is not such an issue for multi-cell environments, where the transfer location is shared, typically on a SAN with enough allocated space. In single-cell deployments (as is common in lab set-ups), an important consideration is to size the cell with enough space to stream your largest deployed vApps. (The shared transfer service storage is mounted at $VCLOUD_HOME/data/transfer.)
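As a minimal sketch, assuming an NFS server and export (the hostname nfs01.example.com and export path /exports/vcd-transfer are placeholders), presenting the shared transfer storage to a cell looks something like this:

# mount the shared transfer storage on the cell (hostname and export path are examples)
mount -t nfs nfs01.example.com:/exports/vcd-transfer /opt/vmware/vcloud-director/data/transfer
# make the mount persistent across reboots
echo "nfs01.example.com:/exports/vcd-transfer /opt/vmware/vcloud-director/data/transfer nfs rw,hard,intr 0 0" >> /etc/fstab

The same rules from the troubleshooting post above apply: the mount must be read/write, reachable by every cell, and sized for your largest vApp transfers.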

Response Files

Once the initial cell has been installed, a response file is generated with the network and connection details used in the original configuration. This file is saved on the cell at:

/opt/vmware/vcloud-director/etc/responses.properties

On subsequent installations, the responses file from the first install should be used to ensure that additional installations are consistent. Copy the responses file to a secure location on the network, then as part of the installation, reference it as follows:

./installation-file -r path-to-response-file
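As a worked example (the cell hostname cell01 and installer filename here are placeholders rather than the exact files from my build):

# copy the responses file from the first cell to the cell being added
scp cell01:/opt/vmware/vcloud-director/etc/responses.properties /tmp/responses.properties
# run the installer on the new cell, re-using the first cell's responses
./vmware-vcloud-director-1.5.0-XXXXXX.bin -r /tmp/responses.properties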

Load Balancing Cells

With multiple cells, you will certainly need a method of balancing traffic across them. There are several methods of doing this, but so far I have played with:

As soon as I have more information on these, I will update this post with more information.

More thoughts to be added soon…..