Solutions Linux 2011 is now over! The last day was again full of various meetings, including a very interesting one with Charles Schulz (The Document Foundation), where I learnt a lot about LibreOffice’s future that I cannot disclose 😉 (or maybe later on). I also led the System Administration track, which was extremely successful this year, with more than 130 people in the room! I’m glad to see that the new model of free conferences is allowing the event to attract more attendees.
We had the following presentations:
Open Source alternative to deploy ITIL by Erwan Taloc (Combodo)
A very interesting presentation that went past the boring parts of ITIL to show how beneficial ITIL can be once it is correctly tooled with iTop. Impressive! The tool looks very professional, efficient and quick, even if its high degree of parametrization requires appropriate knowledge to set it up. But that is the price of value, isn’t it?
iTop was created by 3 people, Denis Flaven, Erwan Taloc and Romain Quetiez, when they were in charge of outsourcing at HP.
They had thousands of customers to manage, and one tool was missing to manage work processes, contacts, SLAs, …
They developed their own tool to replace Word/Excel docs.
2006: Development of iTop
2009: first version on sf.net
2010: Creation of Combodo to provide professional support
2011: 15000 downloads for iTop 1.1
The goal is to help companies master their IT.
IT is critical for every company (they all have Internet access, CRM, ERP, …) and is becoming more and more complex.
So specialization is required, so multiple actors are involved, so complexity increases! And of course, they need to do more with less.
Each team has its own set of reference documents to manage the environment, which leads to excessive time spent solving issues.
ITIL is one method to solve these problems, but it is often judged too expensive and too heavy.
But ITIL, when used well, is a pragmatic way to manage IT. You need to choose the parts that are useful, not theoretically but practically. That’s what iTop is made for. And at the center of iTop, they have the CMDB. It is the common reference point.
Demo of the Configuration Management of iTop. Erwan explained the notion of Configuration Items, and showed the powerful links between, e.g., a business solution and a mail server. He also demonstrated how to import existing data from Excel/OpenOffice files into iTop to build part of the CMDB, as well as the traceability that the tool brings.
He also explained how to keep iTop in sync with data coming from other tools such as GLPI, OCS, FusionInventory, … He used Talend to manage the extraction from OCS and the update of iTop. All in a couple of clicks.
The challenge is to keep this information alive over time. The only way to do that is to actually use the information.
Erwan further showed how iTop materializes the dependency diagram and also the impact diagram (these are generated from the data in the CMDB). They made a Nagios plugin for the event: alarms in Nagios can be linked to incidents in iTop.
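To make the idea of diagrams generated from CMDB data more concrete, here is a minimal Python sketch. It is not iTop code: the configuration items and the DEPENDS_ON table are made up for illustration, and it only assumes that the CMDB records which item relies on which. The dependency view follows those links downwards, and the impact view follows the same links in reverse.

```python
from collections import deque

# Hypothetical CMDB extract (not real iTop data): each configuration
# item (CI) maps to the CIs it directly depends on.
DEPENDS_ON = {
    "webmail service": ["mail server", "web frontend"],
    "mail server": ["storage array"],
    "web frontend": [],
    "storage array": [],
}


def dependencies(ci, depends_on):
    """Return every CI that `ci` relies on, directly or indirectly."""
    seen, queue = set(), deque([ci])
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def impacted(ci, depends_on):
    """Return every CI affected when `ci` fails (reverse traversal)."""
    reverse = {}
    for item, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(item)
    return dependencies(ci, reverse)


print(sorted(dependencies("webmail service", DEPENDS_ON)))
print(sorted(impacted("storage array", DEPENDS_ON)))
```

Both views come from the same data, which is why a single well-maintained CMDB is enough to draw the two diagrams.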
In ITIL the most important aspect is service management. There are 2 supporting parts in iTop: the service catalog (including SLAs with TTO, TTR) and contracts (linking CIs and SLAs). iTop provides one view for IT and another one for the users (including feedback forms).
Links are also possible with lots of external tools: Archipel, Zabbix, FusionInventory, Puppet, …
GLPI users: IT managers (view of the park), helpdesk people, users (to create and follow tickets), management (for reports), purchasing department (in relation to budgets and asset management).
Architecture: Web application (PHP based) with an install wizard and very few dependencies.
Does it scale? It is used on a park with more than 130000 machines and 90000 users.
More than 1 million users have been declared on the Web site (a first political action of support).
Information management: how to enter information in GLPI?
– computers/smartphones use an agent (FusionInventory Agent)
– network equipment (no agent install possible): uses SNMP to collect information
– printers (no agent install possible either)
Integration with other tools (managing licenses, financial or technical information) is possible through a webservice interface, an API for plugins (managed on the forge) or CSV import/export mechanisms.
GLPI differs from OTRS and RT through its associated inventory and the history kept around it, which allows statistics to be produced.
For authentication, it offers LDAP integration (AD, OpenLDAP), POP3/IMAP or WebSSO/CAS.
Notions of entities and rights + profiles + rules based system.
Gonéri then took an example with lots of user profiles and showed how profiles are managed accordingly.
Helpdesk is ITIL v1 compliant (SLA, user satisfaction, Incident management, business rules, notifications).
There are multiple interfaces depending on the nature of the profile (end user, support technician, smartphones)
Another means of communication is through webservices or e-mail.
GLPI is managed by a French association under the 1901 law (Indepnet).
2 independent leaders, Julien Dombre and Jean-Mathieu Doléans, and lots of contributors/translators/…
There is the notion of Business Partners, who sponsor the project.
FusionInventory was created by GLPI community members to extend the features and offer a complete platform.
Walid then gave a demo covering park management for computers, network switches and Android phones, as well as the ticketing part.
iTop and GLPI are in fact more complementary than in competition, especially the inventory and park management of GLPI combined with the rest of the iTop features. Of course, in some other areas they provide similar features, but each tool allows you to choose the modules you like the most, and both are open enough to talk to the rest of the ecosystem. Hopefully we will soon see an integration of the 2.
These 2 presentations were extremely successful and gathered more than 130 people.
Shinken by Jean Gabes (Lectra)
State of monitoring in 2011: over the last 10 years, new needs have arisen around performance, availability and an increased number of abstraction layers, including virtualization. We also have to manage an increased number of systems due to dev, pre-prod and prod environments, so there is also a large set of systems to monitor. We are moving from a technical IT to a business IT.
One of the goals of good monitoring is to get relevant alerts.
Historical monitoring tools are not adapted anymore, including Nagios.
Multiple Nagios instances do not interoperate, but they are needed, which creates problems of configuration management and of architecture/performance.
We do have remote sites, virtualization, security through firewalls, … which is complex for Nagios to manage.
Nagios still has strengths: modularity through modules, and its community.
Shinken was created to bring these new concepts to Nagios, but it was not accepted by the Nagios authors, so it evolved into a separate project.
Nagios configuration, monitoring agents, interfaces (CGI, Centreon, Ninja, Thruk, …): everything is kept, but performance is multiplied by 2.
Features are split into small daemons: one for configuration management, one for scheduling, one to launch monitoring agents, one for passive information management, one for alerts, and one for information aggregation/presentation.
Advantage of the architecture: redundancy is embedded, and there is no limit to the number of daemons, yet there is a single configuration and a single DB. Linear scalability is embedded. It also solves remote/secured sites problems.
What about slow remote sites, and multiple customers? Monitoring clouds should be distinct, and Shinken provides the notion of realm to support this. Realms are separate, but Shinken still keeps a single configuration (with the arbiter daemon outside of the realms) and even a single data storage (or not, as you want!).
Shinken is already better architected than Nagios. However, this is not sufficient: monitoring should also bring added value. Hence the importance of correlation, of bringing back only the correct information, and of separating the source of the problem from its impact. Shinken provides a smart alert filter to support this. On top of that, we need to concentrate on solving the *important* source issues, so there is a need for a criticality level. Criticality should be managed recursively from the application level down to the individual elements.
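As a rough illustration of criticality managed recursively, the following Python sketch pushes the level declared on an application down to everything it relies on. This is not Shinken code or Shinken configuration: the element names, the RELIES_ON table and the numeric scale (higher means more important) are assumptions made for the example.

```python
# Hypothetical model (not Shinken syntax): each element lists the
# elements it relies on.
RELIES_ON = {
    "erp": ["erp-db", "erp-web"],
    "intranet wiki": ["erp-web"],
    "erp-db": ["san"],
    "erp-web": [],
    "san": [],
}

# Criticality is declared only at the application level
# (assumed scale: higher means more important).
DECLARED = {"erp": 5, "intranet wiki": 2}


def effective_criticality(relies_on, declared):
    """Give each element the highest criticality of anything relying on it."""
    effective = {name: declared.get(name, 0) for name in relies_on}

    def push_down(name):
        # Raise every child to at least the level of its parent, and
        # recurse only when a child was actually raised.
        for child in relies_on[name]:
            if effective[child] < effective[name]:
                effective[child] = effective[name]
                push_down(child)

    for name in relies_on:
        push_down(name)
    return effective


for name, level in sorted(effective_criticality(RELIES_ON, DECLARED).items()):
    print(name, level)
```

With such a table, an alert on the "san" element inherits the criticality of the "erp" application that relies on it, so it can be sorted ahead of a problem that only affects the wiki.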
Shinken also manages the impact of dependencies from the hypervisor to the guests, and knows which guests run on which hosts. The same goes for N-tier solutions with redundancy: “is the app still running?” is the only interesting question. Aggregated information should be at the heart of the tool. This is called business rules in Shinken.
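The aggregation idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the business rule syntax of Shinken: the tier names, the (member states, minimum) layout and both helper functions are hypothetical.

```python
def tier_ok(member_states, minimum=1):
    """A redundant tier is up while at least `minimum` members are up."""
    return sum(member_states) >= minimum


def application_ok(tiers):
    """The application runs only if every tier still has enough members up."""
    return all(tier_ok(states, minimum) for states, minimum in tiers.values())


# Hypothetical 2-tier application: (member states, minimum members required).
tiers = {
    "web": ([True, False], 1),  # one of the two web servers is down
    "db": ([True, True], 1),
}
print(application_ok(tiers))

# Losing the whole database tier brings the application down.
tiers["db"] = ([False, False], 1)
print(application_ok(tiers))
```

The point is that a single failed web server does not change the answer to the only question that matters, while the loss of a whole tier does.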
Shinken provides some other bonuses: a discovery module and configuration creation (the configuration is Nagios-like, so still as complex as it is powerful), including for VMware VMs. The notification and dependency systems are simplified. The tool is developed in Python, so it works on Linux/Windows/Android!
What is the current status?
Tool was developed 2 years ago. Architecture is in place as well as principles.
Need to document more. Tutorial.
Need to support newcomers.
Integration of behavior-driven monitoring methods with the cucumber module.
Managing Consoles in a better way using profiles and views (Thruk)
Jean gave a very lively presentation to an audience of 80 people, and this project is definitely one to look at closely for all of you interested in monitoring.
SystemTap by Adrien Kunysz (Acunu)
Acunu specializes in Cloud Storage/Big Data: basically a better-performing Cassandra.
What is SystemTap? An infrastructure to simplify the gathering of information about a running Linux system. It allows you to observe a running system.
It works by writing a script describing what to observe. The stap command transforms it into a kernel module, which is then loaded and executed.
SystemTap executes actions when code passes through probe points, e.g. syscall.read (each time you call read), …
Probe points support wildcards (any function of module Y, or of the socket interface, …) or can be static (every 200 ms, when the VM frees space). They can also be in userland (each time we enter ls, each time we call a function containing malloc in glibc, …). SystemTap reuses the DTrace entry points for PostgreSQL. It can also work on Java or Python functions… Cf: man stapprobes
The syntax is similar to the C language, in the spirit of awk: for each pattern, do something. It provides hashes, easy stats (sum, avg, min, max, …) and lots of functions. Most commonly used functions: pid(), uid(), execname(), tid() (thread), probefunc() (which function we are in). Cf: man stapfuncs
Options of stap:
-x trace a specific PID (instead of whole system by default)
-c trace only the command and its children
-L list all probe points of the context
-g guru mode 😉
Adrien then went on to show an example where each executable launched by the system is printed.
This could be a powerful way to help support teams debug problems.
Adrien then showed an example based on SIGKILL analysis, another example on file watching (/etc/passwd)
SystemTap is integrated in RHEL5+, Fedora and CentOS. The utrace patch is required for tracing in user space.
Other languages can also be handled if the runtime has been instrumented.
There are security mechanisms at the language level and at the execution level. However, there is the -g mode 😉
Adrien really had fun demonstrating all the features of SystemTap, and we finished late as there was a lot of interest in the room, even if we had fewer people around for this more technical subject.