NRENs and other academic organisations are currently using, building or planning to build so-called low-cost storage infrastructure. The aim of this work is to help system designers understand the real costs of storage systems and make educated decisions when choosing hardware and software platforms.
The TCO calculator analyses storage infrastructure costs, covering both hardware investment costs and operational costs such as energy, maintenance and data center costs. It also calculates and simulates overall system parameters such as throughput, IOPS etc.
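As a rough illustration of how these cost components combine, the sketch below adds the one-off hardware investment (CAPEX) to the yearly operational costs (OPEX) accumulated over an assumed service lifetime. The field names, the `lifetime_years` parameter and the absence of discounting are assumptions made for illustration; they are not the layout of the Excel sheet, and the numbers are dummy values.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    # Hypothetical inputs; they do not mirror the cells of the Excel sheet.
    hardware_capex: float        # servers, HDDs and SSDs (one-off purchase)
    energy_per_year: float       # electricity (cooling may be folded in)
    maintenance_per_year: float  # staff time and hardware support
    datacenter_per_year: float   # floor/rack space, UPS, security
    lifetime_years: int          # assumed service period of the hardware


def total_cost_of_ownership(c: CostInputs) -> float:
    """TCO = CAPEX + yearly OPEX x lifetime (simple sum, no discounting)."""
    opex_per_year = c.energy_per_year + c.maintenance_per_year + c.datacenter_per_year
    return c.hardware_capex + opex_per_year * c.lifetime_years


def cost_per_usable_tb_year(c: CostInputs, usable_tb: float) -> float:
    """Normalised cost, handy for comparing configurations of different size."""
    return total_cost_of_ownership(c) / (usable_tb * c.lifetime_years)


# Dummy numbers only.
example = CostInputs(hardware_capex=100_000, energy_per_year=8_000,
                     maintenance_per_year=30_000, datacenter_per_year=5_000,
                     lifetime_years=5)
print(total_cost_of_ownership(example))
print(cost_per_usable_tb_year(example, usable_tb=100))
```

Normalising per usable TB and year is one possible way to compare a small and a large configuration on equal terms.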
At the moment the TCO calculator considers low-cost storage systems based on disk servers. It takes into account the costs and parameters of the server platform used, the HDD and SSD drives applied, and the specifics of the software-defined storage software (currently mainly Ceph; to be extended in the future).
The content and organisation of the TCO calculator result from the discussion among NRENs that started at the TF-Storage meeting in Vienna (February 12-13, 2015) and continued in teleconferences organised afterwards. Notes from the teleconferences are (or will be) included in the child pages.
The TCO calculator is provided as an Excel sheet to be used offline. At some point we may also prepare an online tool.
You can get the tool from here: TCO_calculator
For Excel users: Excel sometimes reports the file as 'broken' and asks whether it should try to 'repair' it. Please answer yes if you trust us.
You can specify the basic system parameters on the single-server landing page. They include the server platform, the types of HDD and SSD disks used, and the data redundancy level and access pattern planned for the system.
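The data redundancy level largely determines how much of the raw disk space ends up usable. A minimal sketch, assuming either n-way replication or erasure coding with k data and m coding chunks (both schemes are offered by Ceph pools), plus a hypothetical `fill_ratio` for the headroom left free on a running cluster. The function and parameter names are illustrative, not taken from the calculator.

```python
from typing import Optional


def usable_capacity_tb(servers: int, hdds_per_server: int, hdd_tb: float,
                       replicas: Optional[int] = None,
                       ec_k: Optional[int] = None, ec_m: Optional[int] = None,
                       fill_ratio: float = 0.8) -> float:
    """Usable capacity after redundancy overhead and operational headroom.

    Pass either `replicas` (n-way replication) or both `ec_k` and `ec_m`
    (erasure coding). `fill_ratio` is an assumed safety margin, since
    clusters are not operated completely full.
    """
    raw_tb = servers * hdds_per_server * hdd_tb
    if replicas is not None and ec_k is None and ec_m is None:
        efficiency = 1 / replicas              # n copies of every object
    elif replicas is None and ec_k is not None and ec_m is not None:
        efficiency = ec_k / (ec_k + ec_m)      # k data chunks out of k + m
    else:
        raise ValueError("give either replicas, or both ec_k and ec_m")
    return raw_tb * efficiency * fill_ratio


# One server with 12 x 4 TB HDDs (dummy configuration):
print(usable_capacity_tb(1, 12, 4.0, replicas=3))       # about 12.8 TB
print(usable_capacity_tb(1, 12, 4.0, ec_k=4, ec_m=2))   # about 25.6 TB
```

Replication keeps 1/n of the raw space, while a 4+2 erasure-coded layout keeps two thirds of it.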
You can also analyse multi-server configurations in a dedicated tab, for instance to simulate various sizes of your low-cost storage system and the associated costs.
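Simulating several system sizes essentially means repeating the single-server calculation for different server counts. The sketch below is written in the spirit of the multi-server tab but not taken from it: it finds the smallest number of servers that reaches a target usable capacity and the resulting hardware cost. `usable_tb_per_server`, `server_price` and the minimum of three servers (so that three replicas can sit on distinct hosts) are hypothetical inputs.

```python
import math
from typing import Dict, Iterable


def servers_needed(target_usable_tb: float, usable_tb_per_server: float,
                   min_servers: int = 3) -> int:
    """Smallest server count that reaches the target usable capacity.

    min_servers is an assumption, e.g. 3 so that 3-way replicas can be
    placed on distinct servers even for very small systems.
    """
    return max(min_servers, math.ceil(target_usable_tb / usable_tb_per_server))


def hardware_cost_by_size(sizes_tb: Iterable[float], usable_tb_per_server: float,
                          server_price: float) -> Dict[float, float]:
    """Hardware (servers + disks) cost for each simulated system size."""
    return {size: servers_needed(size, usable_tb_per_server) * server_price
            for size in sizes_tb}


# Dummy values: 10 TB usable per fully populated server at 5000 per server.
print(hardware_cost_by_size([20, 100, 250], usable_tb_per_server=10.0,
                            server_price=5000))
```

Because servers come in whole units, cost grows in steps rather than linearly with capacity, which is one reason to simulate several sizes.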
Server, HDD and SSD parameters can be added to the tabs listing known server platforms and their components. These lists should be extended to reflect all platforms and components known to the NRENs. We count on your contribution to make the tool comprehensive!
Computations related to power usage, capacity and performance are based on several parameters and assumptions, collected in the parameters tab.
The remaining tabs contain the models and formulas for calculating system capacity, performance, CAPEX and OPEX costs etc.
For the time being, network-related costs (e.g. the cost of cluster interconnect switches and network uplinks) are not considered. This will be improved in the next versions.
The major change in the current version compared with previous ones is that the calculator is split into several sheets. With this organisation, partners will be able to work independently on particular aspects of the calculator, e.g. IOPS modelling or power consumption calculations.
Hardware aspects and parameters are simulated mainly on the basis of catalogue values. The parameters and assumptions adopted need to be verified against real-life experience and measurements.
Models for simulating power usage, MB/s and IOPS performance are relatively simplistic. We should perhaps work on improving these models based on existing heuristics, supported by real-life relationships and values collected by NRENs. Again, we need your contribution here!
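To make the discussion concrete, here is the kind of simplistic model meant here, written as a sketch rather than the calculator's actual formula: client IOPS of a replicated HDD cluster estimated from a catalogue per-disk IOPS value, assuming backend IOPS scale linearly with the number of disks and every client write turns into one backend write per replica. Journals, caches, SSD tiers and CPU or network limits are deliberately ignored, which is exactly where measured data from NRENs would help. The function name and the example values are hypothetical.

```python
def cluster_iops(n_hdds: int, iops_per_hdd: float, replicas: int,
                 read_fraction: float) -> float:
    """Back-of-the-envelope client IOPS for a replicated HDD cluster.

    Assumptions (to be checked against real measurements):
    - backend IOPS scale linearly with the number of disks;
    - a client read costs one backend I/O, a client write costs `replicas`;
    - journals, caches, SSD tiers, CPU and network limits are ignored.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    backend_iops = n_hdds * iops_per_hdd
    # Average number of backend I/Os generated by one client operation.
    backend_ios_per_client_op = read_fraction + (1 - read_fraction) * replicas
    return backend_iops / backend_ios_per_client_op


# Dummy example: 36 HDDs at a catalogue value of 100 IOPS each, 3 replicas.
for mix in (1.0, 0.7, 0.0):
    print(f"read fraction {mix}: {cluster_iops(36, 100, 3, mix):.0f} IOPS")
```

The access pattern (read/write mix) therefore matters as much as the disk count: in this model a write-only workload gets a third of the read-only figure with 3 replicas.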
Effort will also be put into including the storage software specifics in the simulations.
It is also planned to consider traditional disk array- and tape-based systems in addition to low-cost disk server platforms. This will enable comparing the various approaches for implementing the storage services.
As the TCO calculator is a work in progress, we would appreciate ANY kind of feedback. If you like it or dislike it, have an idea for an improvement or extension, or are willing to contribute to it, please contact us on the mailing list.
Summary of Action Points:
Document editing:
- Maciej, PSNC to share the updated version of the TCO calculator (split into areas such as servers, energy, staff etc)
and start collecting comments/notes in a separate document, to be developed as a "Cost Effective Storage how-to"
- ALL to get involved with the TCO tools and start working on the various sheets.
IOPS modelling:
- Panos, GRNET to play with the IOPS tools and share the experience.
- others to review / look for other sources of IOPS/bandwidth, power consumption models
Power usage (important part of cost):
- Panos, GRNET to review datasets on power consumption data and share them.
- volunteers needed to explore power consumption data
- PSNC/CSC to speak about this during week 23-27.03.2015 and share results
Next telco:
- in about 2-3 weeks, after the Easter holidays
- Maciej to set up a doodle (status: done, see below):
- ALL to indicate their availability for the 2nd week of April (6-10) using the doodle below (please fill in the doodle by the end of this week)
Notes from the TCO Calculator discussion
16 March 2015, 12.00-13.30
A preliminary tool has been designed and distributed by PSNC.
The initial aim was to calculate the minimum volume of a local storage solution for PSNC.
The numbers in the tool are dummy numbers.
A short review of the existing material was made, including:
a) SNIA cost calculator – models too simplistic, e.g. failed to simulate Ceph-based cluster performance
b) Wmarow's IOPS calculator – might be worth a look, as it includes some modelling
Some aspects of the tool have been discussed:
0) Quick summary of suggested improvements / extensions:
- to split „areas” into multiple sheets – so that we can distribute the work
- a good idea would be to have a sheet with example / reference server configs and disk parameters
- network uplinks – include 1 Gbit management connectivity in the rack space / ports budget – see the network part
- cooling efficiency factor – might be included within power price – see electricity
- collecting failure rates would be good (again might be perceived as sensitive)
1) Power consumption:
- Power for the disks must be separated from the power for the main board and other server components: memory, network interfaces etc.
- Historical power consumption data for different storage architectures needs to be analysed.
- Various architectures may need power consumption modelling. The models could be developed as options to select from.
- GRNET has collected some data – Panos will check whether it can be explored and will share conclusions/data.
- CSC has some data / models – what can be shared will be checked.
- PSNC will be able to provide the data it is planning to collect.
- We need long-term averaged measurements not the point-in-time data as the power consumption varies depending on system activity.
- We expect real-life data to be more usable than models; however, modelling should still be explored.
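The points above suggest a structure for the power part of the model: a fixed share for the server platform plus a per-drive share, fed with long-term averages rather than single readings, and then converted into a yearly energy bill. A minimal sketch; all wattages, prices and the `cooling_factor` multiplier (one possible way of including cooling in the power price) are hypothetical placeholders to be replaced with measured data.

```python
from statistics import mean
from typing import Sequence

HOURS_PER_YEAR = 24 * 365


def server_power_w(platform_w: float, n_hdds: int, hdd_w: float,
                   n_ssds: int = 0, ssd_w: float = 0.0) -> float:
    """Server power split into the platform (board, CPU, memory, NICs) and drives."""
    return platform_w + n_hdds * hdd_w + n_ssds * ssd_w


def long_term_average_w(samples_w: Sequence[float]) -> float:
    """Mean of many readings spread over weeks or months, not one snapshot."""
    if not samples_w:
        raise ValueError("need at least one power sample")
    return mean(samples_w)


def yearly_energy_cost(avg_power_w: float, price_per_kwh: float,
                       cooling_factor: float = 1.0) -> float:
    """Yearly electricity cost; cooling_factor > 1 folds cooling into the price."""
    kwh_per_year = avg_power_w / 1000 * HOURS_PER_YEAR
    return kwh_per_year * price_per_kwh * cooling_factor


# Dummy readings of one server under varying load, in watts.
avg_w = long_term_average_w([180.0, 220.0, 260.0, 200.0])
print(server_power_w(platform_w=120, n_hdds=12, hdd_w=7.5))
print(yearly_energy_cost(avg_w, price_per_kwh=0.10, cooling_factor=1.5))
```

Keeping the platform and drive terms separate lets the same formula serve both a catalogue-based estimate and a fit to measured data.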
2) Staff expenses
- Different from country to country; real numbers should not be shared in public, as they are politically sensitive.
- Overhead should be included in the total cost of an employee.
- 0.5 FTE for maintenance, covering both hardware and operating system, is realistic.
- The lower the quality (i.e. the cheaper) the hardware, the more HW maintenance is needed. This factor can be based on experimental data.
- Data Centre operations costs (using an existing facility), including building maintenance, security, UPS etc., can be virtualized and factored in.
- The same model can be applied to rented space (co-location) – we should enable calculating in RUs.
- Cooling can be part of these costs or be calculated separately as part of electricity (for now it is part of the electricity cost, but the impression is that it should be analysed more explicitly in the model to enable more detailed analysis using various parameters).
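These points translate into two small formulas: a loaded staff cost (salary plus overhead, scaled up for cheaper hardware that needs more attention) and a data centre cost expressed per rack unit, which works both for an existing facility with apportioned costs and for co-location. The sketch below is an assumption-based illustration: `hw_quality_factor` and `cost_per_ru_year` are hypothetical parameters, and the salary is a dummy figure, since real numbers are not to be shared.

```python
def staff_cost_per_year(fte: float, salary: float, overhead_ratio: float,
                        hw_quality_factor: float = 1.0) -> float:
    """Loaded staff cost per year.

    overhead_ratio: e.g. 0.3 adds 30% on top of the salary.
    hw_quality_factor: hypothetical multiplier > 1 for cheaper hardware
    that needs more maintenance (to be derived from experimental data).
    """
    return fte * hw_quality_factor * salary * (1 + overhead_ratio)


def datacenter_cost_per_year(rack_units: int, cost_per_ru_year: float) -> float:
    """Own facility (apportioned costs) or co-location, both priced per RU."""
    return rack_units * cost_per_ru_year


# Dummy figures: 0.5 FTE, invented salary, 30% overhead, 20 RU at 150 per RU.
print(staff_cost_per_year(0.5, salary=60_000, overhead_ratio=0.3))
print(datacenter_cost_per_year(rack_units=20, cost_per_ru_year=150.0))
```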
3) Network
- Cost of the switches is included (10 Gbit ports to servers, pair of ToR switches with uplink – see below)
- The uplink component is missing. Fibre Channel could also be considered as an alternative technology.
- At least 10G uplink ports per switch and a management network (access to IPMI interfaces) must be considered; in some cases 40G ports.
- Should the network be redundant? It depends on the size of the setup and the other redundancy features of the architecture. Network redundancy can be checked against the desired availability figures. (We should be careful, as saving money on switches may force a higher data replication factor, which costs more.)
- There are some Java tools available. Panos, GRNET will play with them and share experiences.
- Calculations can be included in a separate sheet.
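Checking network redundancy against availability figures can start from the textbook formula for redundant components: if one switch is available a fraction a of the time and n switches fail independently, the set is available 1 - (1 - a)^n of the time. The sketch assumes independent failures (a shared power feed or a common firmware bug would break that assumption), and the availability values are illustrative only.

```python
def redundant_availability(a_single: float, n: int) -> float:
    """Availability of n independent components where any one of them suffices."""
    if not 0.0 <= a_single <= 1.0 or n < 1:
        raise ValueError("a_single must be in [0, 1] and n >= 1")
    return 1 - (1 - a_single) ** n


def meets_target(a_single: float, n: int, target: float) -> bool:
    """True if n redundant components reach the desired availability."""
    return redundant_availability(a_single, n) >= target


# Illustrative figures: a single ToR switch at 99.9% vs a redundant pair.
print(redundant_availability(0.999, 1))
print(redundant_availability(0.999, 2))
print(meets_target(0.999, 1, target=0.9999), meets_target(0.999, 2, target=0.9999))
```

The resulting figure can then be weighed against the cost of a higher replication factor, as noted above.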
4) Modelling
- Electricity, cooling factor, staff expenses and other costs (i.e. the OPEX above) are the main cost components we have to calculate with.
- The number of racks used should be an open parameter in the model, as disk/server/rack densities differ.
- Consider not only full racks but also fractions of a rack (a couple of RUs). In some cases this level of granularity is needed.
- We could come up with 4-5 different server configurations as different calculation models to use. Footnotes are needed to explain the differences and the reasoning behind the models.
- Create separate XLS sheets per cost component, and re-distribute the calculator.
- Share numbers on power consumption, if possible.
- Using a Google doc with dummy numbers for sensitive data (staffing etc.) is fine for now. Some XLS macros may need to be developed offline. Let's decide later how to share the final product.
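For the rack granularity discussed in the modelling notes, one simple way to keep the number of racks an open parameter is to count rack units first and only then convert them to racks, either as a fraction (e.g. for co-location priced per RU) or rounded up to whole racks. The 42 U rack height, the 2 U servers and the single pair of 1 U switches below are assumptions for illustration.

```python
import math


def rack_units_needed(servers: int, ru_per_server: int,
                      switches: int = 2, ru_per_switch: int = 1) -> int:
    """Total rack units for the servers plus one (assumed) pair of ToR switches."""
    return servers * ru_per_server + switches * ru_per_switch


def racks_needed(rack_units: int, ru_per_rack: int = 42,
                 whole_racks: bool = False) -> float:
    """Racks as a fraction, or rounded up when only full racks can be used."""
    racks = rack_units / ru_per_rack
    return float(math.ceil(racks)) if whole_racks else racks


ru = rack_units_needed(servers=10, ru_per_server=2)   # 22 RU
print(ru, racks_needed(ru), racks_needed(ru, whole_racks=True))
```

Different server densities only change `ru_per_server`, so the same two functions cover the 4-5 reference configurations mentioned above.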