OpenSolaris derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

How easy are these things to boot diskless via PXE? Anyone doing it?
 
I have never tried it, but if your BIOS/NIC supports PXE/iSCSI boot, you should at least be able to boot the OS installer.

But what is the goal?
Install or run Solaris/OmniOS/OI without a local disk?

For the first, USB installer boot sticks are easier. For the second, you add complexity that may be worth the effort only for (very) many installations. Even then I would prefer a small local SATA SSD in a disk tray, optionally with a cloned spare disk if you want to be able to boot without delay when the boot disk fails. Without a spare disk, a full disaster recovery (save the current BE to the data pool via a regular replication job, hourly or daily as needed; reinstall the OS after a crash; restore the BE) is done within 20 minutes without any complexity. You can also boot Clonezilla to create/restore boot disks via SMB or NFS.

If you want a pure USB stick boot system, you can look at the Illumos distribution SmartOS, where only the base boot process is on the stick and everything else is on the data pool or in VMs.

First rule of any server: keep it simple, stupid!
 

I have a pool of physical machines that can boot into a pool of operating systems from a heavyweight file server. Any machine can boot any OS based on the dhcp/tftp configuration on the server.

It's very practical. And also good because the server has RAID and proper backup for all the OS installations at once. And you can mess around with the installs without them being booted by simply going into the appropriate directory on the NFS server.

The machines are not necessarily diskless. They might have local disks for swap space or even some ZFS pool or another. They just don't boot from those disks (Windows excluded: I boot into Windows by having a Windows disk and then simply withholding PXE info from the machine when it starts, so it falls through to disk boot).

Anyway, I want Solaris in that OS pool.
 
Just a note, while you might not trip over the ZFS issue, when it happens you could have a real mess. This is mainly a warning for those running "last available" releases of unsupported OS's.
 
After updating from r151036 to r151046 I am getting a connection refused error when trying to open the napp-it web GUI. I tried rebooting a few times and that hasn't worked. The shares are still working fine, but what is the best way to troubleshoot napp-it? Are there new firewall options in r151046 that need an exception for port 81? I'm wondering if I should have updated napp-it before updating the OS. I can't remember what version it was.
 

"Connection refused" is not a firewall problem. It is the server process not running or at least not listening on the TCP port.
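A quick way to confirm this on the server is to check whether any socket is listening on the port. The following is a minimal sketch, not an official napp-it tool: `listens_on` is a made-up helper name, and it assumes the Solaris/illumos `netstat -an` layout in which the local address is written as `address.port` and listening sockets end in `LISTEN`.

```shell
# listens_on PORT
# Reads `netstat -an` style output on stdin and succeeds (exit 0) if some
# socket in LISTEN state is bound to PORT.
# Assumption: local addresses look like "*.81" or "127.0.0.1.81".
listens_on() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            n = split($1, parts, "[.]")   # the port is the last dot-separated piece
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Usage on the server (81 is the napp-it HTTP port mentioned in this thread):
#   netstat -an | listens_on 81 && echo "listening" || echo "nothing listens on 81"
```

If nothing listens on port 81, the web server process is the thing to look at, not a firewall.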
 
I would also assume that the napp-it webserver is not running properly or at all,
as there is no special firewall active by default.

Even with an older napp-it on the newest OmniOS it should basically work on port 81.
Only the menu User can throw errors if a newer Perl is not supported.

- Have you updated 036 -> 038 lts -> 046 lts? (always update via the LTS releases)
(you can boot the 036 boot environment again for a proper update)

- what happens when you try to start the webserver manually
/etc/init.d/napp-it start
or
/var/web-gui/data/tools/httpd/napp-it-mhttpd -c \*\*.pl -u napp-it -d /var/web-gui/data/wwwroot -p 81
 
OmniOS r151048f (2023-12-11)
https://omnios.org/releasenotes.html

Weekly release for w/c 11th of December 2023.
This update requires a reboot
Security Fixes
  • curl has been updated to version 8.5.0.
  • The OpenJDK packages have been upgraded to versions 1.8.392-08, 11.0.21+9 and 17.0.9+9.
  • perl has been upgraded to version 5.36.3.
Other Changes
  • A race condition in ZFS could cause a very recently written file to appear to contain holes if inspected with lseek(SEEK_DATA). This is very hard to hit in practice, although the GNU cp command can trigger it and produce empty target files. The native illumos/OmniOS cp does not use lseek in this way and is unaffected.


  • To update from an earlier r151048, run 'pkg update' + reboot
    To update from an older OmniOS release, update in steps over the LTS versions
    To downgrade, boot a former boot environment (automatically created on updates)
 
I think I went straight 036->046, so that must have been the issue. I will roll back and try 036->038 first.


/etc/init.d/napp-it start
/etc/init.d/napp-it: line 3: 9975 Killed /var/web-gui/data/tools/httpd/napp-it-mhttpsd -c \*\*.pl -u napp-it -d /var/web-gui/data/wwwroot -p $Port -S -E /var/web-gui/data/tools/httpd/mini_httpd.pem > /dev/null 2>&1
/etc/init.d/napp-it: line 3: 9976 Killed /var/web-gui/data/tools/httpd/napp-it-mhttpd -c \*\*.pl -u napp-it -d /var/web-gui/data/wwwroot -p 81 > /dev/null 2>&1
root@san02:/rpool# /var/web-gui/data/tools/httpd/napp-it-mhttpd -c \*\*.pl -u napp-it -d /var/web-gui/data/wwwroot -p 81
ld.so.1: napp-it-mhttpd: fatal: libssl.so.1.0.0: open failed: No such file or directory
Killed

here is mail in root:
perl /var/web-gui/data/napp-it/zfsos/_lib/scripts/auto.pl
Auto-Submitted: auto-generated
X-Mailer: cron (SunOS 5.11)
X-Cron-User: root
X-Cron-Host: san02
X-Cron-Job-Name: perl /var/web-gui/data/napp-it/zfsos/_lib/scripts/auto.pl
X-Cron-Job-Type: cron
MIME-Version: 1.0
Content-Type: text/plain
Content-Length: 109
Date: Tue, 12 Dec 2023 21:00:00 -0600
Message-Id: <65791e30.9210.15bc8412@san02>
From: <root@san02>

Tty.c: loadable library and perl binaries are mismatched (got first handshake key 12400080, needed 12280080)
 
Rolled back to OmniOS 036 and updated napp-it 20.06a3 -> 21.06a13, updated to OmniOS 038 (checked that the web GUI was working), updated to OmniOS 046 and we are good! Thanks Gea
 
OmniOS 151046 comes with a newer Perl. The Tty module is a binary module and is required by Perl Expect for interactive actions.
Current napp-it free and pro support this. This is why you need to update napp-it as well (or rerun the wget online installer).
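The "update always over lts" rule from this thread can be written down as a small helper. This is only a sketch under stated assumptions: `upgrade_path` and `LTS_RELEASES` are made-up names, and the list contains just the two LTS releases mentioned above (038 and 046), so it needs extending for other versions.

```shell
# LTS releases named in this thread; extend the list for other versions.
LTS_RELEASES="151038 151046"

# upgrade_path FROM TO
# Prints the releases to step through: every listed LTS release that lies
# strictly between FROM and TO, followed by TO itself.
upgrade_path() {
    from=$1
    to=$2
    path=""
    for lts in $LTS_RELEASES; do
        if [ "$lts" -gt "$from" ] && [ "$lts" -lt "$to" ]; then
            path="$path $lts"
        fi
    done
    path="$path $to"
    echo $path    # unquoted on purpose: drops the leading space
}

# Usage: upgrade_path 151036 151046
```

For the case above this gives 151038 followed by 151046, which matches the order that worked here: update napp-it first, go to 038, check the web GUI, then go to 046.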
 
Release Notes for OmniOS v11 r151048
r151048m (2024-02-02)

https://omnios.org/releasenotes.html

Weekly release for w/c 29th of January 2024.
This is a non-reboot update
Security Fixes
  • openssl has been updated to version 3.1.5. Security fixes have been back-ported to the legacy 1.1 and 1.0 openssl packages.
  • unzip has been updated with a number of security fixes.
  • OpenJDK packages have been updated to 1.8.402-06, 11.0.22+7 and 17.0.10+7.
Other Changes
  • unzip now supports newer compression versions by virtue of being linked to libbz2.
  • The virtio-scsi driver is now included in installation media and images to support installation in virtual environments with virtio-scsi boot disks.
  • The zlib package has been updated to version 1.3.1.
 
Hi _Gea
So Broadcom has stopped free ESXi versions for home use. I run an "all-in-one" setup as you describe it on your website. Any thoughts on this change? Would you already recommend an alternative?
Thanks for sharing!
 
There are some discussions at Illumos regarding XPC. SmartOS is also an option, as is Hyper-V, but I suppose Proxmox is the main future option for home use. Really sad that you now need a full-featured Debian instead of the lightweight and minimalistic ESXi. Not sure how well Solaris & co. run under Proxmox.

btw.
napp-it cs (currently under development) can now manage *BSD, *Linux, *Illumos, OSX, Solaris and Windows ZFS servers or server groups, https://www.napp-it.org/downloads/windows.html
 
Proxmox is pretty light IMO, not much worse in terms of overhead than ESXi. It has been the choice for a lot of home labbers for a while now.
 
Thanks y'all for your suggestions. My own research points also in the direction of Proxmox. Also, it supports Windows 11 Clients by providing a TPM option, which is great (I hated ESXi for making this topic so difficult for home users). Guess I will start with some trial tests to see how well it supports OmniOS and Napp-it on top. Will report back here sometime soon, I hope.
 
Hi Gea, I notice that my OmniOS VM with napp-it on it keeps getting shut down. Do you know how to debug that?
 

Release Notes for OmniOS v11 r151048
r151048t (2024-03-22)

https://omnios.org/releasenotes.html

This update requires a reboot

Security Fixes
  • AMD CPU microcode has been updated to 20240116.
  • Intel CPU microcode has been updated to 20240312.
  • Introduced a workaround for the recently published Intel Register File Data Sampling [RFDS] vulnerability in some Intel Atom CPUs - INTEL-SA-00898
Other Changes
  • Fix for a kernel panic in the SMB server caused by a race between cancel and completion functions - illumos 15985.
  • SHA-2 calculations that use libmd and a very large block size could produce incorrect hashes.
  • A POSIX normal lock would not properly deadlock on re-entry in a single-threaded application - illumos 16200.
  • Clock calibration in KVM environments now retrieves the clock frequency directly via an MSR. This fixes the calculation in environments such as AWS. This calibration method was previously only tried in VMWare guests.
  • Added support for e1000g I219 V17 and LM+V24-27,29 network cards.
  • The ena network driver has received a number of fixes that make it more stable on multi-processor instance types, and support for device reset has been added.
 
I have a ZFS filesystem on an SSD pool that I was manually backing up to an HDD pool using zfs send -I. Unfortunately I forgot about this for a while and now I no longer have a common base snap to start my incremental replication with. Annoyingly it's off by a day: the historical HDD backup pool's most recent snap is from Dec 7 of 2022 and the oldest SSD pool's snap is Dec 6 of 2022. I know traditional knowledge says to just call it a loss and start over, but this got me thinking. Couldn't I use rsync to get both filesystems to be identical, then snap them and use that snap as the base for replication? Are there forensic tools that can view both filesystems at the block level, scan for any differences, sync them over and make the snapshots 100% consistent? This is more of a thought exercise, as the data isn't that important, but I think it's possible.
 

A napp-it replication uses dedicated snaps (*_repli.._nr_n) and protects them from autosnap, so normally you should have a common base snap. If not, you cannot sync them again. Rsync can sync files but cannot make filesystems exactly identical.

Usually in such a case you rename the target filesystem to filesystem.old and restart the replication with an initial full replication. On success, delete filesystem.old.
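Before giving up on an incremental send it can be worth checking whether a shared snapshot really is missing. The sketch below compares snapshot GUIDs rather than names, because an incremental stream needs the very same snapshot on both sides. `common_snaps` is a made-up helper, and all pool and filesystem names in the comments are placeholders.

```shell
# common_snaps SRC_LIST DST_LIST
# Each file holds "name<TAB>guid" lines, for example from
#   zfs list -H -t snapshot -o name,guid -r ssd/data > src.txt
#   zfs list -H -t snapshot -o name,guid -r hdd/backup > dst.txt
# Prints the source-side names of snapshots whose GUID also exists on the target.
common_snaps() {
    awk -F '\t' '
        NR == FNR { seen[$2] = 1; next }   # first file read: remember target GUIDs
        $2 in seen { print $1 }            # second file read: source snaps with a match
    ' "$2" "$1"
}

# If nothing is printed there is no common base, and the reseed described
# above would look roughly like this (placeholder names):
#   zfs rename hdd/backup hdd/backup.old
#   zfs snapshot ssd/data@base
#   zfs send ssd/data@base | zfs receive hdd/backup
#   zfs destroy -r hdd/backup.old     # only after the new copy has been verified
```

Any snapshot name it prints could serve as the base for `zfs send -I`; an empty result means the full replication is the way to go.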
 
The flavours of ZFS
  1. native ZFS in Solaris 11.
    This is the Unix that ZFS was developed for. It is the most resource-efficient and stable ZFS and probably the fastest one. In 20 years I have not seen as many bug reports, up to and including data loss, as on Linux in a few weeks. Native ZFS is neither free nor compatible with Open-ZFS. For noncommercial use and tests you can download Solaris 11 CBE for free (the current Solaris beta).

  2. Open-ZFS 2.x
    This is the ZFS in BSD, Linux, OSX and Windows. Due to the market share of Linux, most Open-ZFS development happens there now, so this is mainstream ZFS. A few years ago Illumos was upstream to Open-ZFS; now Open-ZFS is upstream to Illumos. The main problem here is the number of Linux distributions, each with a different Open-ZFS release or update policy. Checking the issue tracker regularly is not a bad idea. For a backup pool it is a good idea to disable features so that it is less affected by bugs.

  3. Illumos ZFS
    Illumos is the open-source fork of Solaris. It inherits all of the Solaris advantages such as stability, resource efficiency, ease of handling, the Solaris COMSTAR iSCSI project and the kernel-based SMB server integrated into ZFS, with superior support for Windows NTFS-like ACLs. Illumos is compatible with Open-ZFS 2.x but independent of it. It now uses Open-ZFS as upstream and integrates newer ZFS features after additional stability checks, which makes it more stable than Open-ZFS. The Illumos issue tracker shows a similarly low number of critical issues as Solaris.
    The distributions stay very close to Illumos (not as different as the many Linux distributions): NexentaStor (storage appliance), OmniOS (storage OS with stable and long-term stable releases), OpenIndiana (successor of OpenSolaris) and SmartOS (virtualizer OS, competitor to ESXi or Proxmox).
 
Release Notes for OmniOS v11 r151048
r151048w (2024-04-11)

Weekly release for w/c 8th of April 2024.
https://omnios.org/releasenotes.html

This update requires a reboot

Security Fixes

Other Changes
  • A panic in ZFS in conjunction with SMB2 has been fixed.
  • A bug in readline that could cause crashes with unknown locales has been resolved.
  • The system PCI and USB hardware databases have been updated.
  • For Intel CPUs which are not vulnerable to Post-barrier Return Stack Buffer (PBRSB) the kernel no longer spends time mitigating this.
 