OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

OK, the ashift values are fine: they are consistent within the data pool and set to 12.
(A mismatch can happen if you add another vdev to a pool.)
The ashift=0 values are not clear to me, but they seem uncritical as they do not affect data or disks.

Performance:
- I assume you have enabled sync on the SSD pool. This will result in poor but secure write performance. To validate, disable sync and retry a write. (On VM storage you should enable sync, or your guest filesystems can become corrupted on a crash during a write.)
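Roughly, the manual equivalent of that check from the CLI is sketched below (pool/filesystem name and test file are placeholders; if compression is enabled on the filesystem, a /dev/zero test will overstate throughput):

zfs set sync=disabled tank/vmstore                              # temporarily disable sync writes
dd if=/dev/zero of=/tank/vmstore/ddtest bs=1024k count=4096     # write a 4 GB test file and note the rate
zfs set sync=always tank/vmstore                                # re-enable sync (or standard) and repeat the dd to compare
rm /tank/vmstore/ddtest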

To benchmark pool performance use menu Pool > Benchmark.
This runs a series of large and small read/write I/O tests with sync enabled vs. sync disabled.

Check:
Is your vnic vmxnet3 or e1000? The former is faster.
Have you run System > Appliance tuning with defaults?
This will improve NFS and IP performance between storage and ESXi.

An SSD pool with an additional Slog is "suboptimal" as it has the effect that every write must be done twice: once fast via the RAM-based write cache and once on every write commit. As I see it, your pool SSDs are desktop ones and only the log device is an enterprise model with powerloss protection. In such a case the Slog improves write security a little, but a small risk remains: as the pool itself has no powerloss protection, there is no guarantee that the last writes are safe on a crash.

Best regarding security and performance would be a mirror of SSDs with powerloss protection, sync enabled and no extra Slog.
 
I'm running OmniOS v11 r151042b. All of my VMs use vmxnet3 except for OmniOS, as I cannot make it see the NIC when I choose vmxnet3 on my host. I have to flip it back to e1000 for it to recognize a NIC. Do you know any tricks for this? I've read your manuals and didn't see anything special. I also have VMware Tools installed; they might need to be updated.

I did notice that I had only one NIC on my vSwitch, probably the default setup. It had defaulted to 100 Mb. I've added two more NICs; they autodetected to 100 Mb as well, but I forced them to 1000/full. I also have a 10G MikroTik switch on the way and will attempt again to get OmniOS to see the vmxnet3 NIC.

I have also removed the SLOG and disabled sync on my SSDMirror. Still working on your other suggestions.

Thank you very much for your guidance and your work on napp-it!

UPDATE: I now have a 10GbE switch in place, connected to a 10G SFP+ port on my host. My machine ID changed so now my license needs to be updated; I will send an email about this. I've added the new NIC to my vSwitch0 in ESXi.
 
If you have a drive fail in a pool that is 80% full and start a resilver operation, only to realize you can purge some ancient snaps that free up an additional 40% of the array: will the resilver operation see that dynamically, or will it continue to resilver all of the 80% full array and not see it until after the drive replacement is complete?
 
A drive resilver reads all data with a metadata reference. If the amount of referenced data shrinks, the resilver should finish faster. With modern ZFS and sorted (sequential) resilver, the difference may not be huge.
 
The multithreaded Solaris/ZFS integrated SMB server allows file/folder-based, NTFS-like ACLs but also share-based ACLs. These are ACLs on a share control file /pool/filesystem/.zfs/shares/filesystem. This file is created when you activate a share and deleted when you disable a share. Share ACLs are therefore not persistent.

In current napp-it 22.dev, share ACLs are preserved as ZFS properties. When you re-enable a share you can now restore the last share ACL or apply basic settings like everyone=full, modify or read.
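To inspect such a share-level ACL directly on the control file, something like this works on Solarish (pool/filesystem names are placeholders):

/usr/bin/ls -V /tank/data/.zfs/shares/data      # show the share ACL of the SMB share on tank/data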
 
Save energy on your napp-it backupserver

Energy costs have multiplied since last year. This is a real problem for a backup server that is up 24/7 when you only want to back up your storage server once or a few times a day, especially as incremental ongoing ZFS replications finish within minutes.

A money-saving solution is to remotely power up the backup server via IPMI, sync the filesystems via ZFS replication and power off the backup server when the replications are finished. For this I have created a script for a napp-it 'other job' on your storage server to simplify and automate this.

Details, see https://forums.servethehome.com/ind...laris-news-tips-and-tricks.38240/#post-357328
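This is not the job script itself, just a rough illustration of the IPMI part with ipmitool (BMC address, credentials and the wait logic are placeholders):

ipmitool -I lanplus -H 192.168.2.250 -U admin -P secret chassis power on      # wake the backup server via its BMC
sleep 600                                                                     # give the OS and replication jobs time to finish
ipmitool -I lanplus -H 192.168.2.250 -U admin -P secret chassis power status  # optionally check the state
ipmitool -I lanplus -H 192.168.2.250 -U admin -P secret chassis power soft    # ACPI shutdown when replications are done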
 
OpenIndiana Hipster 2022.10 is here

https://www.openindiana.org/2022/12/04/openindiana-hipster-2022-10-is-here/
OpenIndiana is an Illumos distribution and more or less the successor of OpenSolaris. It comes in a desktop edition with a MATE GUI, browser, email and office apps, a text edition similar to OmniOS bloody, and a minimal distribution. Usually you install the desktop or text edition; minimal lacks essential tools.

While OpenIndiana Hipster tracks ongoing Illumos (every pkg update gives you the newest Illumos, so it is quite an Illumos reference installation), there are annual snapshots that give a tested starting point for beginners. This is the main difference to OmniOS, where stability with dedicated stable repositories is the main concern.

During setup, select your keyboard but keep language=en when using napp-it.
 
Use case and performance considerations for an OmniOS/OpenIndiana/Solaris based ZFS server
This is what I am asked quite often

If you simply want the best performance, durability and security, order a server with a very new CPU with a frequency > 3 GHz and 6 cores or more, 256 GB RAM and a huge flash-only storage built from 2 x 12G multipath SAS (10 dwpd) or NVMe in a multi-mirror setup, with datacenter-quality powerloss protection to protect data on a power loss during writes or background garbage collection. Do not forget to order twice, as you need a backup on a second location, at least for a disaster like fire, theft or ransomware.

Maybe you can follow this simple suggestion; mostly you search for a compromise between price, performance and capacity under a given use scenario. Be aware that when you define two of the three parameters, the third is a result of your choice, e.g. low price + high capacity = low performance.

As your main concern should be a workable solution, you should not start with a price restriction but with your use case and the performance needed for it (low, medium, high, extreme). With a few users and mainly office documents, your performance need is low; even a small server with a 1.5 GHz dual-core CPU, 4-8 GB RAM and a mirror of two SSDs or HDs can be good enough. Add some external USB disks for a rolling daily backup and you are ready.

If you are a media firm with many users that want to edit multitrack 4K video from ZFS storage, you need an extreme solution regarding pool performance (> 2 GB/s sequential read/write), network (at least multiple 10G) and capacity according to your needs. Maybe you come to the conclusion to prefer a local NVMe for hot data and a medium-class, disk-based storage for shared file access and versioning only. Do not forget to add a disaster backup solution.

After you have defined the performance class/use case (low, medium, high, extreme), select needed components.


CPU
For lower performance needs and 1G networks, you can skip this. Even a cheap dual/quad-core CPU is good enough. If your performance need is high or extreme with a high throughput in a 10G network, or when you need encryption, ZFS is quite CPU hungry, as you can see in https://www.napp-it.org/doc/downloads/epyc_performance.pdf. If you have the choice, prefer higher frequency over more cores. If you need sync write (VM storage or databases), avoid encryption as encrypted small sync writes are always very slow, and add an Slog for disk-based pools.

RAM
Solaris based ZFS systems are very resource efficient due to the deep integration of iSCSI, NFS and SMB into the Solaris kernel, which was developed around ZFS from the beginning. A 64bit Solaris based OS itself needs less than 3 GB to be stable with any pool size. Use at least 4-8 GB RAM to allow some caching for low to medium needs with only a few users.

[screenshot: memory.png]

As ZFS uses most of the RAM (unless dynamically demanded by other processes) for ultrafast read/write caching to improve performance, you may want to add more RAM. Per default Open-ZFS uses 10% of RAM for write caching. As a rule of thumb you should collect all small writes < 128k in the RAM-based write cache, as smaller writes are slower or very slow. As you can only use half of the write cache at a time (the other half is being flushed to disk), you want at least 256k of write cache, which you easily have with 4 GB RAM in a single-user scenario. The RAM needed for write caching scales with the number of users that write concurrently, so add around 0.5 GB RAM per active concurrent user.
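The 10% default corresponds to the zfs_dirty_data_max tunable in Open-ZFS. If you want to pin it to a fixed value, a sketch for /etc/system on OmniOS/OpenIndiana (the 4 GB value is only an example; verify the current tunable and its default on your release before changing it):

* cap the Open-ZFS write cache (dirty data) at 4 GB
set zfs:zfs_dirty_data_max=4294967296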

Oracle Solaris with native ZFS works differently. The RAM-based write cache holds the last 5s of writes and can consume up to 1/8 of total RAM. In general this often leads to similar RAM needs as OI/OmniOS with Open-ZFS. On a faster 10G network with a max write of 1 GB/s this means at least 8 GB RAM plus the RAM wanted for read caching.


Most of the remaining RAM is used for ultrafast RAM-based read caching (Arc). The read cache works only for small I/O with a read-last/read-most optimization; large files are not cached at all. Cache hits are therefore for metadata and small random I/O. Check napp-it menu System > Basic Statistic > Arc after some time of storage usage. Unless you have a use scenario with many users, many small files and a high volatility (e.g. a larger mailserver), the cache hit rate should be > 80% and the metadata hit rate > 90%. If the results are lower you should add more RAM or use high performance storage like NVMe where caching is not so important.

[screenshot: arc.png]

If you read about 1 GB RAM per TB storage, forget this. It is a myth unless you activate RAM-based realtime dedup (not recommended at all; if dedup is needed, use a fast NVMe mirror as a special vdev for dedup). The needed RAM size depends on the number of users, the number of files and the wanted cache hit rate, not on pool size.

L2Arc
L2Arc is an SSD or, at best, an NVMe that can be used to extend the RAM-based Arc. L2Arc is not as fast as RAM but can increase cache size when more RAM is not an option, or when the server is rebooted more often, as L2Arc is persistent. As L2Arc needs RAM to organize it, do not use more than say 5x RAM as L2Arc. Additionally you can enable read ahead on L2Arc, which may improve sequential reads a little (add "set zfs:l2arc_noprefetch=0" to /etc/system or use napp-it System > Tuning).
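Adding an L2Arc device is a one-liner; pool and device names below are placeholders:

zpool add tank cache c2t0d0      # add an SSD/NVMe as L2Arc (cache vdev) to pool tank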


Disk types
RAM can help a lot to improve ZFS performance with the help of read/write caching. For larger sequential writes and reads, or for many small I/Os, only raw storage performance counts. If you look at the specs of disks, the two most important values are the sequential transfer rate for large transfers and the iops that count when you read or write small datablocks.

Mechanical disks
On mechanical disks you find values of around 200-300 MB/s max sequential transfer rate and around 100 iops. As a Copy on Write filesystem, ZFS is not optimized for a single-user/single-datastream load; it spreads data quite evenly over the pool for best multiuser/multithread performance. It is therefore affected by fragmentation, with many smaller datablocks spread over the whole pool, where performance is limited more by iops than by sequential values. In average use you will often see no more than 100-150 MB/s per disk. When you enable sync write on a single mechanical disk, write performance is no better than say 10 MB/s due to the low iops rating.

Desktop Sata SSDs
can achieve around 500 MB/s (6G Sata) and a few thousand iops. Often the iops values from the specs are only valid for a short time until performance drops to a fraction on steady writes.

Enterprise SSDs
can hold their performance and offer powerloss protection (PLP). Without PLP, the last writes are not safe on a power outage during writes, and neither is data on disk during background operations like firmware-based garbage collection that keep SSD performance high.

Enterprise SSDs are often available as 6G Sata or 2 x 12G multipath SAS. When you have a SAS HBA, prefer 12G SAS models due to the higher performance (up to 4x faster than 6G Sata) and because SAS is full duplex while Sata is only half duplex, with more robust signalling and up to 10m cable length (Sata: 1m). The best SAS SSDs can achieve up to 2 GB/s transfer rate and over 300k iops on steady 4k writes. SAS is also a way to easily build a storage with more than 100 hotplug disks with the help of SAS expanders.

NVMe is the fastest option for storage. The best, like the Intel Optane 5800X, are rated at 1.6M iops and 6.4 GB/s transfer rate. In general, desktop NVMe lack powerloss protection and cannot hold their write iops on steady writes, so prefer datacenter models with PLP. While NVMe are ultrafast, it is not as easy to use many of them, as each wants a 4x PCIe lane connection (PCIe card, M.2 or OCuLink/U.2 connector). For a larger capacity, SAS storage is often nearly as fast and easier to implement, especially when hotplug is wanted. NVMe is perfect for a second, smaller high performance pool for databases/VMs, or to tune a ZFS pool with an Slog for faster sync write on disk based pools, a persistent L2Arc or a special vdev mirror.


ZFS Pool Layout

ZFS groups disks into vdevs and stripes several vdevs to a pool to improve performance or reliability. While a ZFS pool from a single disk without redundancy rates as described above, a vdev from several disks can behave better.

Raid-0 pool (ZFS always stripes data over vdevs in a raid-0)
You can create a pool from a single disk (this is a basic vdev) or a mirror/raid-Z vdev and add more vdevs to create a raid-0 configuration. Overall read/write performance from math is the number of vdevs x the performance of a single vdev, as each must only process 1/n of the data. Real world performance is not a factor of n but more like 1.5 to 1.8 x n, depending on the disks and disk caches, and the gain decreases with more vdevs. Keep this in mind when you want to decide if ZFS performance is "as expected".

A pool from a single n-way mirror vdev
You can mirror two or more disks to create a mirror vdev. Mostly you mirror to improve data security, as the write performance of an n-way mirror is equal to a single disk (a write is done when it is on all disks). As ZFS can read from all disks simultaneously, read performance and read iops scale with n. When a single disk rates at 100 MB/s and 100 iops, a 3-way mirror can give up to 300 MB/s and 300 iops. If you run a napp-it Pool > Benchmark with a single-stream read benchmark vs. a five-stream one, you can see the effect. In a 3-way mirror any two disks can fail without a data loss.

A pool from multiple n-way mirror vdevs
Some years ago a ZFS pool from many striped mirror vdevs was the preferred method for faster pools. Nowadays I would use mirrors only when one mirror is enough, or when an easy later extension to a Raid-10 setup, e.g. from 4 disks, is planned. If you really need performance, use SSD/NVMe as they are by far superior.

A pool from a single Z1 vdev
A Z1 vdev is good to combine up to say 4 disks. Such a 4 disk Z1 vdev gives the capacity of 3 disks. One disk of the vdev is allowed to fail without a data loss. Unlike other raid types like raid-5, a read error in a degraded Z1 does not mean the pool is lost, only that the file affected by the read error is reported as damaged. This is why Z1 is much better than, and named differently from, raid-5. Sequential read/write performance of such a vdev is similar to a 3 disk raid-0, but iops is only like a single disk (all heads must be in position prior to an I/O).

A pool from a single Z2 vdev
A Z2 vdev is good to combine say 5-10 disks. A 7 disk Z2 vdev gives the capacity of 5 disks. Any two disks of the vdev are allowed to fail without a data loss. Unlike other raid types like raid-6, a read error in a fully degraded Z2 does not mean the pool is lost, only that the file affected by the read error is reported as damaged. This is why Z2 is much better than, and named differently from, raid-6. Sequential read/write performance of such a vdev is similar to a 5 disk raid-0, but iops is only like a single disk (all heads must be in position prior to an I/O).

A pool from a single Z3 vdev
A Z3 vdev is good to combine say 11-20 disks. A 13 disk Z3 vdev gives the capacity of 10 disks. Any three disks of the vdev are allowed to fail without a data loss. There is no equivalent to Z3 in traditional raid. Sequential read/write performance of such a vdev is similar to a 10 disk raid-0, but iops is only like a single disk (all heads must be in position prior to an I/O).


A pool from multiple raid Z[1-3] vdevs
Such a pool stripes the vdevs, which means sequential performance and iops scale with the number of vdevs (not linearly, similar to the raid-0 degression with more vdevs).
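A sketch of such a striped raid-Z pool (placeholder pool/disk names):

zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0   # first 7-disk Z2 vdev
zpool add tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0      # second Z2 vdev, striped with the first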


Many small disks vs. fewer larger disks
Many small disks can be faster but are more power hungry, and as the performance improvement is not linear while the failure rate scales with the number of parts, I would always prefer fewer but larger disks. The same holds for the number of vdevs: prefer a pool with fewer vdevs. If you have a pool of say 100 disks and an annual failure rate of 5%, you have 5 bad disks per year. If you assume a resilver time of 5 days per disk, you can expect 3-4 weeks per year where a resilver is running with a noticeable performance degradation.


Special vdev
Some high end storages offer tiering, where active or performance sensitive files can be placed on a faster part of an array. ZFS does not offer traditional tiering, but you can place critical data on a faster vdev of a ZFS pool based on its physical block size (small I/O), its type (dedup or metadata) or the recsize setting of a filesystem. The main advantage is that you do not need to copy files around, so this is often a superior approach, as mostly the really slow data is data with a small physical file or block size. As a lost vdev means a lost pool, always use special vdevs as an n-way mirror. Use the same ashift as all other vdevs (mostly ashift=12 for 4k physical disks) to allow a special vdev remove.

To use a special vdev, use menu Pools > Extend and select a mirror (best a fast SSD/NVMe mirror with PLP) with type=special. Allocations in the special class are dedicated to specific block types. By default this includes all metadata, the indirect blocks of user data, and any deduplication tables. The class can also be provisioned to accept small file blocks. This means you can force all data of a certain filesystem onto the special vdev by setting the ZFS property "special_small_blocks", e.g. special_small_blocks=128K for a filesystem with a recsize setting smaller than or equal to that. In such a case all small I/O and some critical filesystems are on the faster vdev, the rest on the regular pool. If you add another special vdev mirror, load is distributed over both vdevs. If a special vdev is too full, data is stored on the other, slower vdevs.
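The same from the CLI looks roughly like this (placeholder pool/disk/filesystem names; ensure the special mirror uses the same ashift as the other vdevs, which is what the napp-it menu takes care of):

zpool add tank special mirror c3t0d0 c3t1d0       # add an SSD/NVMe mirror as special vdev
zfs set special_small_blocks=128K tank/vmstore    # route all blocks <= 128K of this filesystem to the special vdev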

Slog
With ZFS, all writes always go to the RAM-based write cache (there may be a direct I/O option in a future ZFS) and are written as fast, large transfers with a delay. On a crash during write, the content of the write cache is lost (up to several MB). Filesystems on VM storage or databases may get corrupted. If you cannot allow such a data loss you can enable sync write for a filesystem. This forces every write commit immediately to a faster ZIL area of the pool, or to a fast dedicated Slog device that can be much faster than the pool ZIL area, and additionally, in a second step, as a regular cached write. Every bit that you write is written twice, once directly and once collected in the write cache. This can never be as fast as a regular write via the write cache, so an Slog is not a performance option but a security option for when you want acceptable sync write performance. The Slog is never read except after a power outage, to redo missing writes on the next reboot, similar to the BBU protection of a hardware raid. Add an Slog only when you need sync write, and buy the best that you can afford regarding low latency, high endurance and 4k write iops. The Slog can be quite small (min 10 GB). Widely used are the Intel datacenter Optane models.
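Adding an Slog is again a one-liner (placeholder names); a mirrored Slog avoids losing sync write protection if the log device itself dies:

zpool add tank log c4t0d0                   # single Slog device
zpool add tank log mirror c4t0d0 c4t1d0     # or, alternatively, a mirrored Slog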

Tuning
Beside the above "physical" options you have a few tuning options. For faster 10G+ networks you can increase tcp buffers or NFS settings in menu System > Tuning. Another option is jumbo frames, which you can set in menu System > Network Eth, e.g. to a "payload" of 9000. Do not forget to set all switches to the highest possible MTU value, or at least to 9216 (to include the ip headers).
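On the storage VM itself the jumbo frame setting corresponds roughly to the following (the link name vmxnet3s0 is only an example; check dladm show-link first):

dladm show-link                             # list links and their current MTU
dladm set-linkprop -p mtu=9000 vmxnet3s0    # set a 9000 byte payload on the vnic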

Another setting is the ZFS recsize. For VM storage with filesystems on it I would set it to 32K or 64K (not lower, as ZFS becomes inefficient then). For media data a higher value of 512K or 1M may be faster.
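For example (placeholder filesystem names; recsize only affects newly written blocks, and 1M requires the large_blocks pool feature):

zfs set recordsize=64K tank/vmstore    # VM/database filesystems
zfs set recordsize=1M tank/media       # large media files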

More: https://www.napp-it.org/doc/downloads/napp-it_build_examples.pdf
 
I have a problem accessing ZFS data via the local network. The speed seems very slow even though I have a 1G ethernet port on my all-in-one system (ESXi boots from a separate USB stick, OmniOS is on an SSD, and the ZFS data is on HDDs). The pool is not filled up all the way, and whenever I play a 4K movie stored on ZFS (around 80-90 Mbps) via wired LAN, it stutters. When I change to wireless, somehow the situation gets better. What should I do to make the connection better?
 
You should first check pool performance in menu Pool > Benchmarks. This is a series of read/write tests with sync vs. async. If performance is as expected, test network performance via iperf (server in menu Services, client in System > Network Eth). To rule out cable/switch problems, compare with direct cabling. If you use special settings like jumbo frames, disable them.
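The napp-it menus wrap iperf; a rough manual equivalent, assuming iperf is installed on both ends (the IP address is an example):

iperf -s                              # on the storage server
iperf -c 192.168.2.203 -P 4 -t 30     # on the client: 4 parallel streams for 30 seconds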
 
Filesystem monitoring in napp-it via fswatch
https://github.com/emcrisostomo/fswatch

Filesystem monitoring is a new feature in the newest napp-it 23.dev (jan 29) in menu Service > Audit and Fswatch. It logs events like file create, modify or delete. You can use it to monitor activities, to create alerts when many files are modified in a short time (under development), e.g. due to a ransomware attack, or to sync modified files on demand, e.g. between a cloud/S3 bucket and an SMB share based on snaps, as there is no working file locking between them.

[screenshot: service.png]

You can enable monitoring in the above service menu

Menu forms:
Report:              alerts (under development)
Watched folderlist:  1..3 watched folders, e.g. path1 path2 (blank separated);
                     quote a path that contains blanks, e.g. "path 1"

Include:             path must contain this regex to be logged
Exclude:             events are excluded when the regex matches the path

Options:             default is -tnr
                     t = print timestamp
                     n = print events with a numeric trigger
                         (alternative is x to print cleartext)
                     r = recursive
                     d = log only directory access, not files
                         (reduces load)

Eventlist:           log only these events, e.g.
                     --event 8 --event 68

Tip:
Do not log large filesystems recursively, log only needed folders.
With many files it can take some time until events are logged.
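Under the hood this maps to an fswatch call roughly like the following (paths and regexes are only examples):

fswatch -tnr -i '\.jpg$' -e '\.tmp$' /tank/data/projects    # timestamped, numeric flags, recursive; include .jpg, exclude .tmp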


Logs

Logfiles are in the folder /var/web-gui/_log/monitor/
There is one logfile per day for the last 31 days, e.g. fswatch.01.log .. fswatch.31.log
Older logs are overwritten. If you want to preserve them, use an "other job".


You can show and filter logs in menu Service > Audit > Fswatch Log

e.g. show only events on .jpg files:
[screenshot: filter.png]


I have included fswatch for Illumos (OmniOS, OpenIndiana) and Solaris.
On Solaris I had to modify the sources; I am not sure about stability, see https://github.com/emcrisostomo/fswatch/issues/228
 
I'm sorry to hijack the thread, but I think my topic is fine here. After a power failure on my OmniOS NAS (151044), my pool only shows:

zfs: [ID 961531 kern.warning] WARNING: Pool 'smallpool' has encountered an uncorrectable I/O failure and has been suspended; `zpool clear` will be required before the pool can be written to.

A "zpool clear smallpool" hangs the whole machine or nothing happens even after hours. It is a raidz1 (4 disk) array without extra cache or log device. One disk got damaged during the power failure. SMART says "end to end error" on the Seagate disk. I have checked this externally on a PC. The pool can still be mounted read-only and a "zpool status -v smallpool shows" only 4 defective files. Lucky me....

Is there any realistic way to still recover the pool? I've been copying files off it for days (18 TB), but ideally I'd like to be able to resilver it with a new disk.

Thanks in advance
 
"An incorrectable I/O failure and has been suspended" indicates a pool failure without redundancy what means that more than one disk of the Z1 failed. I would power off/on and retry, optionally check cabling and disk bays. If at least three disks come back, you can access the pool in a degraded state, optionally with one or more files reported as corrupted. Unlike a Raid-5 the whole Z1 pool is not lost after a short two disk failure as ZFS can detect good/bad data due checksums.
 
root@aio-pod:~# zpool status -v smallpool
  pool: smallpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 3,95G in 0 days 00:21:49 with 132 errors on Thu Dec 15 12:26:26 2022
config:

        NAME         STATE     READ WRITE CKSUM
        smallpool    ONLINE       0     0    10
          raidz1-0   ONLINE       0     0    20
            c6t4d0   ONLINE       0     0     0
            c6t1d0   ONLINE       0     0     0
            c6t3d0   ONLINE       0     0     1
            c6t2d0   ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x70a3>
        ....


The checksum errors on the one disk increase over time. This is the disk with the SMART error. Even after a reboot / cold reset I can only mount the pool read-only. When there is no more hope, nothing can be done. After the power failure in December it probably briefly resilvered something. Since then I have been struggling with the problem.
 
I would
- back up all important data from the readonly pool
- then Disks > Replace the bad disk

then retry a pool clear + rw mount; otherwise recreate the pool (due to damaged metadata)
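For reference, the CLI equivalent would be roughly (c6t3d0 is the disk showing checksum errors above, the new disk c6t5d0 is a placeholder):

zpool replace smallpool c6t3d0 c6t5d0    # replace the failing disk with the new one
zpool clear smallpool                    # then try to clear the errors and mount read/write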
 
I had already tried a disk replace once. A replacement disk is available, even already online in another slot. But how does this work if I can only mount the pool read-only? If I mount the pool normally, I immediately get the I/O error, and a zpool clear then "hangs".

-> "backup all important data from the readonly pool".
I have already started with this. :-/
 
OK, it seems the pool (metadata) is damaged.

Backing up the data (or copying it to a new pool via Midnight Commander), then recreating the pool and restoring the data is the only option.
 
Where do I find the disk temps in napp-it? I had a hard drive go out and wanted to double check and make sure they aren't overheating, but the drives are 10 years old. Can't seem to find it in the menus.
 
Disk temp is detected via smartmontools.
If temp is not shown, first run Disks > Smartinfo > get smartvalues
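That menu wraps smartmontools; a rough manual check would be the following (the device path is only an example, and depending on the controller you may need an additional -d option such as -d sat):

smartctl -A /dev/rdsk/c6t1d0 | grep -i temperature    # read the SMART attributes and filter the temperature line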
 
Thanks, they weren't showing on the disks but that did the trick.

They are all at 50°C so I had better check the cooling. Thanks
 
50°C under load is quite normal, >60°C is not so good, >70°C is critical
 
Thanks, I just opened up the server and all the HDD fans were dead, so I replaced them. Is there a way to run a script to check the temps and notify me when they hit a certain value? The 50°C was pretty much at idle; my drives usually idle around 31-34°C. I don't think anything was happening other than maybe reading a few smaller files, nothing to put the pool under load.
 
Report job "smartcheck" alerts > 59°C.

/var/web-gui/data/napp-it/zfsos/_lib/scripts/report/r05#smart#temp_and_shortcheck#SI#AS.pl line 200.
You can adjust for a lower value (not update save unless you create a private report from it)
 
Manage ESXi via SOAP, e.g. create/delete ESXi snaps

[screenshot: soap.png]


I came up with the idea of AiO (an ESXi server with virtualized ZFS/NFS storage, VMs on NFS, and pass-through storage hardware) around 2010. This was the first stable ZFS storage solution based on (Open)Solaris or a lightweight, minimalistic OmniOS. Others copied the idea based on Free-BSD or Linux.

From the beginning, ZFS snaps offered a huge advantage over ESXi snaps as they can be created/destroyed without delay and without initial space consumption. Even thousands of snaps are possible, while ESXi snaps are limited to a few short-time ones. Combined with ZFS replication, a high speed backup or copy/move of VMs is ultra easy. That said, there is a problem with ZFS snaps and VMs, as the state of a VM in a ZFS snap is like after a sudden powerloss. There is no guarantee that a VM in a ZFS snap is not corrupted.

In napp-it I included an ESXi hotsnap function to create a safe ESXi snap prior to the ZFS snap, followed by an ESXi snap destroy. This includes an ESXi snap with hot memory state in every ZFS snap. After a VM restore from a ZFS snap you can go back to the safe ESXi snap. It works perfectly, but handling is a little complicated as you need ssh access to reach esxcli. Maybe you have asked yourself if there is no easier way, and there is one via the ESXi SOAP API, similar to the ESXi web-ui.

Thomas just published a small interactive Perl script for easy ESXi web management via SOAP. It even works with ESXi free, see ESX / ESXi - Hilfethread


1. install (missing) Perl modules

perl -MCPAN -e shell
notest install Switch
notest install Net::SSLeay
notest install LWP
notest install LWP::protocol::https
notest install Data::Dumper
notest install YAML
exit;

complete list of needed modules:
Switch
LWP::UserAgent
HTTP::Request
HTTP::Cookies
Data::Dumper
Term::ANSIColor
YAML
LIBSSL
Net::SSLeay
IO::Socket::SSL
IO::Socket::SSL::Utils
LWP::protocol::https


Howto:
Update napp-it to newest 23.dev where the script is included

example: list all datastores
perl /var/web-gui/data/napp-it/zfsos/_lib/scripts/soap/VMWare_SOAP.pl list_all_datastores --host 192.168.2.48 --user root --password 1234

Attached Datastores "63757dea-d2c65df0-3249-0025905dea0a"
Attached Datastores "192.168.2.203:/nvme/nfs"


example: list VMs:
perl /var/web-gui/data/napp-it/zfsos/_lib/scripts/soap/VMWare_SOAP.pl list_attached_vms --host 192.168.2.48 --user root --password 1234 --mountpoint /nvme/nfs --mounthost 192.168.2.203

Attached VM ID "10" = "solaris11.4cbe"
Attached VM ID "11" = "w2019.125"
Attached VM ID "12" = "oi10.2022"
Attached VM ID "14" = "w11"
Attached VM ID "15" = "ventura"
Attached VM ID "16" = "danube"
Attached VM ID "9" = "omnios.dev.117"

example: create snap
perl /var/web-gui/data/napp-it/zfsos/_lib/scripts/soap/VMWare_SOAP.pl create_snapshot --host 192.168.2.48 --user root --password 1234 --mountpoint /nvme/nfs --mounthost 192.168.2.203 --vm_id 9 --snapname latest --mem --no-quiesce --snapdesc latest

example: list (latest) snap
perl /var/web-gui/data/napp-it/zfsos/_lib/scripts/soap/VMWare_SOAP.pl list_snapshot --host 192.168.2.48 --user root --password 1234 --vm_id 9

I will use the script to work together with a normal autosnap job in a future napp-it. Until then you can create a jobid.pre (e.g. 123456.pre) in /var/web-gui/_log/jobs/ with a script to create the ESXi snap, and a jobid.post to destroy the ESXi snap after it was included in the ZFS snap.
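The calls such a pre/post pair would wrap are the ones shown above; a sketch with the same example host, datastore and VM id (the exact remove_snapshot arguments are an assumption based on the create_snapshot example, and how napp-it invokes the .pre/.post files is not shown here):

# jobid.pre: create a hot ESXi snap before the ZFS autosnap
perl /var/web-gui/data/napp-it/zfsos/_lib/scripts/soap/VMWare_SOAP.pl create_snapshot --host 192.168.2.48 --user root --password 1234 --mountpoint /nvme/nfs --mounthost 192.168.2.203 --vm_id 9 --snapname latest --mem --no-quiesce --snapdesc latest

# jobid.post: remove the ESXi snap again once it is contained in the ZFS snap
perl /var/web-gui/data/napp-it/zfsos/_lib/scripts/soap/VMWare_SOAP.pl remove_snapshot --host 192.168.2.48 --user root --password 1234 --vm_id 9 --snapname latest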


Update
I have added a SOAP menu in latest napp-it 23.dev

[screenshot: soap2.png]

[screenshot: data.png]
 
Update: ESXi Soap management in current napp-it 23.dev
https://forums.servethehome.com/ind...laris-news-tips-and-tricks.38240/#post-367124

implemented_actions = ('summary','ssh_on','ssh_off','poweron','shutdown','reboot','kill','mount','unmount','create_snapshot', 'remove_snapshot','last_snapshot','revert_snapshot','list_attached_vms','list_all_datastores');

It is now possible to manage and automate ESXi via scripts (VMs and ESXi snaps) from napp-it
 
OmniOS r151044p (2023-02-21)

Weekly release for w/c 20th of February 2023.
This update requires a reboot

Security Fixes
Git has been updated to version 2.37.6.

Other Changes
The bundled AMD CPU microcode has been updated.
The signalfd driver could cause a system panic.
It was possible that the system could panic if the in-zone NFS server was in use.

https://github.com/omniosorg/omnios-build/blob/r151044/doc/ReleaseNotes.md
 
Solaris and its open-source fork Illumos (OmniOS, OpenIndiana, Nexenta, SmartOS etc.) are not mainstream like Windows or Linux, but they are a perfect specialist OS for ZFS with three unique selling points:

1. Resource efficiency
A minimalistic 64 bit ZFS OS like OmniOS requires only around 1 GB kernel memory plus some RAM
for read/write caching, even with the Apache webserver + NFS + SMB + iSCSI activated.

Especially in a virtualized AiO setup, where low RAM need and high efficiency are critical, OmniOS performs best. The reason for the low RAM need is the perfect integration of the storage services into the OS and the fact that ZFS memory management is still Solaris-like.

[screenshot: ram.png]



2. Everything storage related is OS included even in a minimal setup

Sun invented NFS and added a superior SMB server and iSCSI stack to the free OpenSolaris, which is why they are now free in its successor Illumos. Especially when you want an SMB server with local, Windows-like SMB groups (more powerful than Unix groups), NTFS-like ACLs, Windows SID security references as extended ZFS properties (to keep AD permissions intact when you restore a backup) and ZFS snaps as Windows previous versions without any settings, there is no alternative.

3. It just works, no pain with ZFS, updates or regular bugfixes.

Sun developed (Open)Solaris more or less around ZFS as the only filesystem. There is no need to add anything after a minimal OS setup. In OmniOS you even get a repository per OS release: no unexpected new or modified behaviours with the bi-weekly or monthly security or bugfix updates. An upgrade to a newer OS version means switching to the new repository followed by a pkg upgrade. Setup from an ISO or USB installer is done in under 5 minutes, https://omnios.org/releasenotes.html
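As a sketch, such a release switch looks roughly like this (r151046 is only an example target release; always follow the release-specific upgrade steps in the OmniOS release notes):

pkg set-publisher -G '*' -g https://pkg.omnios.org/r151046/core omnios    # point the omnios publisher at the new release repository
pkg update -v --be-name=r151046                                           # upgrade into a new boot environment, then reboot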

Not to mention dtrace, the OS container solutions (zones and Linux LX zones), Bhyve, the fact that it is a perfect candidate for a storage VM under ESXi, and many other things that were developed for and under Solaris.
 
User identifiers and Linux/Unix user mapping

Have you ever heard of user mapping, or why it is critical for any serious use of Linux/Unix SMB filers?


The Problem:

Microsoft Windows, where SMB comes from, uses Windows security identifiers (SID, e.g. S-1-5-21-722635049-2797886035-3977363046-1234) to identify users or groups and to assign ACL permissions to shares, files and folders. As the related server is part of the SID, a domain user has a worldwide unique SID.

Linux or Unix uses a simple number like 1021 for a user (uid) or group (gid). This means that a user id cannot be unique. Some, like root with uid 0, are even the same on every Unix server on earth.

[screenshot: user.png]

Users and groups with Unix uid/gid and Windows SID


As SMB is a Microsoft thing, every SMB client needs the Windows SID as a reference. The Unix uid/gid is used for Linux/Unix file sharing mechanisms like NFS. But as ZFS is a Unix filesystem, every file has and needs the Unix uid/gid reference. That means when a Windows user with SID S-1-5-21-722635049-2797886035-3977363046-1234 writes a file, the Unix owner uid of the file on disk is a simple number like 1021. If you need a relation between both, you need id mapping, which means an assignment between the two kinds of identifiers.

This is not a problem in a single server / local user environment where the SID is generated from the Unix uid, e.g. S-1-5-21-722635049-2797886035-3977363046-1021 for a Unix user with uid 1021. A simple alternative is to map them based on their usernames, e.g. Winuser:paul = Unixuser:paul, or Winuser:* = Unixuser:* if you want a local Unix user for every Windows user with the same name. With both options you can keep Unix permissions and Windows permissions transparently in sync.
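On Solarish such name-based rules are configured with the idmap command; a sketch (user and domain names are placeholders):

idmap add "winuser:paul@example.com" "unixuser:paul"    # map one Windows user to one Unix user
idmap add "winuser:*@example.com" "unixuser:*"          # wildcard rule: same-named users map to each other
idmap list                                              # show the configured mappings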


SMB groups

A problem comes up when you want to use the functionality of Windows groups, where a role can be assigned to a group, e.g. "administrators" or "backup operators", or when you need groups that can contain groups. Unix groups do not offer such functionality. Only Solaris and its forks offer this, as they additionally implemented SMB group management on top of Unix groups.


Backup/Restore

If you back up files and restore them to a different server, you have a problem as permissions are not preserved or consistent: for example, a user with uid 1021 is paul on the first server and hanns or unknown on the other. You need special settings, mechanisms or mapping tables to solve the problem, or a centralized mechanism like Active Directory with Unix extensions to assign consistent uid/gid for all users in a domain. Not as easy and simple as in a Windows-only environment, but normally the only option, e.g. with a Unix SMB server like SAMBA.

Sun's developers were aware of this problem, and when they included the kernel-based Solaris SMB server into the OS and ZFS, they decided not to use the Unix uid/gid but the Windows SID only, directly as an extended ZFS file attribute. This means that if you back up a Solaris fileserver and restore the files to another AD member server, all ACLs remain intact.


How can a Windows SID work as the SMB file reference when ZFS is a Unix filesystem where every file needs a Unix uid/gid?

Sun solved the problem with ephemeral mappings. This is a temporary uid that is only valid during an active SMB session, to fulfill Unix needs. For SMB the uid is not relevant or used.

Everything is perfect, and this is the ultimate solution on Solarish for SMB-only use. If you need a strict relation between SMB users and Unix users, e.g. for special applications or multiprotocol sharing, you are back to the mapping problem even on Solarish, either with a mapping table or a centralized uid for every user from the AD server.


Unique selling points

Native processing of Windows SIDs, Windows NTFS-like ACLs, Windows compatible SMB groups and zero-config ZFS snaps = Windows previous versions, or a hassle-free SMB server setup in general, are the unique selling points of the kernel-based Solaris and Illumos SMB server over the alternative SAMBA. As NFS and SMB sharing are strict ZFS properties of a filesystem, setup is a simple on/off.
 
_Gea
since napp-it is already using port 80, can I run an HTTP server just to get and upload files from outside? Which folder should I use?

Thanks
 