As a follow-up to the previous entries concerning my new storage server, I thought I’d talk about installing and configuring the smartmontools monitoring software in OpenSolaris. Like most open source software, it’s fairly easy to compile and install on OpenSolaris; it’s the automating part that’s a little different from Linux, for which smartmontools was originally developed.
To get started, download the latest release of the smartmontools source and extract it to a temporary directory. Next, make sure you have the
gcc-dev packages installed, otherwise compiling the source is going to be a challenge (if
which gcc returns nothing, run
pfexec pkg install gcc-dev). Now you can build and install the tools quite easily with the following commands.
./configure
make
pfexec make install
At this point the
smartctl binaries are installed under
/usr/local, along with the manual pages and a sample configuration file,
/usr/local/etc/smartd.conf. There are just a couple of changes to be made in the configuration file, and a few notes before proceeding. First off, as of today, ATA disk support in smartmontools on Solaris is not there, so SCSI emulation is used instead. While this gives us basic health status, it seems to prevent any detailed SMART data from being collected. It may also be the reason why I can’t run self-tests on my disks. This all worked in Linux with these same disks, so I’m guessing it’s due to the lack of ATA support in Solaris. Secondly, before you can monitor your disks, you’ll need to know the labels for those disks. I found
zpool status worked quite well.
$ zpool status
  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h6m with 0 errors on Sun Feb 15 02:21:37 2009
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c0d0s0    ONLINE       0     0     0

errors: No known data errors

  pool: yubaba
 state: ONLINE
 scrub: scrub completed after 0h59m with 0 errors on Sun Feb 15 03:15:02 2009
config:

        NAME        STATE     READ WRITE CKSUM
        yubaba      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0

errors: No known data errors
Not only does it show which disks are in which pools, but it gives you the names of the disks that smartmontools expects, namely
c4t0d0 and so on. Now we are ready to make changes to the configuration file.
The first change to make in
smartd.conf is to comment out the
DEVICESCAN line. Leaving it enabled is fine if you want to scan every disk in the system, but I found that smartmontools didn’t like my rpool disk, and that I had to declare the disk type as “scsi” for it to do anything at all. Next we have to tell
smartd which disks to monitor, so I added the following lines to the end of the file:
/dev/rdsk/c4t0d0 -d scsi -H -m root
/dev/rdsk/c4t1d0 -d scsi -H -m root
/dev/rdsk/c5t0d0 -d scsi -H -m root
/dev/rdsk/c5t1d0 -d scsi -H -m root
This seems to work, as invoking
pfexec smartd -q onecheck resulted in output like this:
$ pfexec smartd -q onecheck
smartd version 5.38 [i386-pc-solaris2.11] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Opened configuration file /usr/local/etc/smartd.conf
Configuration file /usr/local/etc/smartd.conf parsed.
Device: /dev/rdsk/c4t0d0, opened
Device: /dev/rdsk/c4t0d0, is SMART capable. Adding to "monitor" list.
Device: /dev/rdsk/c4t1d0, opened
Device: /dev/rdsk/c4t1d0, is SMART capable. Adding to "monitor" list.
Device: /dev/rdsk/c5t0d0, opened
Device: /dev/rdsk/c5t0d0, is SMART capable. Adding to "monitor" list.
Device: /dev/rdsk/c5t1d0, opened
Device: /dev/rdsk/c5t1d0, is SMART capable. Adding to "monitor" list.
Monitoring 0 ATA and 4 SCSI devices
Device: /dev/rdsk/c4t0d0, opened SCSI device
Device: /dev/rdsk/c4t0d0, SMART health: passed
Device: /dev/rdsk/c4t1d0, opened SCSI device
Device: /dev/rdsk/c4t1d0, SMART health: passed
Device: /dev/rdsk/c5t0d0, opened SCSI device
Device: /dev/rdsk/c5t0d0, SMART health: passed
Device: /dev/rdsk/c5t1d0, opened SCSI device
Device: /dev/rdsk/c5t1d0, SMART health: passed
Started with '-q onecheck' option. All devices sucessfully checked once.
smartd is exiting (exit status 0)
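If you want to poke at an individual drive outside of smartd, smartctl works the same way. A quick sketch, assuming the default /usr/local install prefix and the disk names reported by zpool status:

```shell
# Basic health check on one disk; -d scsi matches the smartd.conf entries,
# since ATA passthrough isn't available on Solaris here
pfexec /usr/local/sbin/smartctl -d scsi -H /dev/rdsk/c4t0d0

# -a asks for everything smartctl can read; with SCSI emulation,
# expect far less detail than these disks would give under Linux
pfexec /usr/local/sbin/smartctl -d scsi -a /dev/rdsk/c4t0d0
```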
So far so good, but what about having smartd run at bootup, and continuously monitoring the disk status? In Linux, you’d use
an init script, but since this is OpenSolaris, we’ll use the Service Management Framework (SMF) instead. To do that, paste the following text into
/var/svc/manifest/site/smartd.xml, change the file ownership to
root:sys, and invoke
pfexec svccfg -v import /var/svc/manifest/site/smartd.xml. Then check that the service is running (
svcs smartd), and if not, enable it using
pfexec svcadm enable smartd.
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="manifest" name="smartd">
  <service name="site/smartd" type="service" version="1">
    <single_instance/>
    <dependency name="filesystem-local" grouping="require_all" restart_on="none" type="service">
      <service_fmri value="svc:/system/filesystem/local:default"/>
    </dependency>
    <exec_method type="method" name="start" exec="/usr/local/etc/rc.d/init.d/smartd start" timeout_seconds="60">
      <method_context>
        <method_credential user="root" group="root"/>
      </method_context>
    </exec_method>
    <exec_method type="method" name="stop" exec="/usr/local/etc/rc.d/init.d/smartd stop" timeout_seconds="60">
    </exec_method>
    <instance name="default" enabled="true"/>
    <stability value="Unstable"/>
    <template>
      <common_name>
        <loctext xml:lang="C">
          SMART monitoring service (smartd)
        </loctext>
      </common_name>
      <documentation>
        <manpage title="smartd" section="1M" manpath="/usr/local/share/man"/>
      </documentation>
    </template>
  </service>
</service_bundle>
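If svcs shows the service sitting in maintenance rather than online after the import, SMF’s own diagnostics will usually tell you why. A quick sketch, assuming the service name used above (the log path follows SMF’s standard category-service:instance naming convention):

```shell
# Explain why a service isn't running (bad manifest, failed start method,
# unsatisfied dependencies, etc.)
svcs -x smartd

# The start method's stdout/stderr lands in the per-instance SMF log
tail /var/svc/log/site-smartd:default.log
```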
At this point we have a managed service that is checking the health of our disks, and if anything comes up, it will send an email to the root user. While I would have liked to also set up short and long self-tests, I can live without them for now. In the meantime, I’ve got a weekly cron job that scrubs the data on the disks using
zpool scrub, which will identify any data read errors on the disks and attempt to correct them automatically.
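For completeness, that weekly scrub is just a one-line entry in root’s crontab. A sketch using the pool name from this setup; the schedule is an arbitrary choice, so adjust to taste:

```shell
# Added via: pfexec crontab -e root
# Scrub the raidz pool every Sunday at 02:00
0 2 * * 0 /usr/sbin/zpool scrub yubaba
```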