Aaron Blinka
|
CompactFlash disk corruption, possible relation to popen or Pipe?
|
Aaron Blinka
04/10/2023 1:53 PM
post122204
|
Hi all,
We are experiencing production issues with drives running QNX 4.25 with CompactFlash cards (solid state) as the disk for
the node. We are seeing multiple failure modes. Most frequently, when a node is rebooted or power cycled it fails to
boot and shows the message "Reboot and select proper boot device"; the drive is unbootable. We've also seen cases
where the node is online but some files on the disk are corrupted. One example is awk: the program crashes when you
try to run it, and its checksum does not match the one on other nodes in the system. We've also seen cases where
running a program (for example, netinfo) or accessing a file gives an "Input/output error". Additionally, we
sometimes see this error message on terminal 1: "popen: Input/output error". Sometimes we see a combination of these
failure modes and sometimes just one; we have yet to see a repeatable pattern.
Our application does use a popen() call to execute some shell commands and get the result back into the program. That
particular code runs on a loop and executes roughly every 1 second.
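The polling pattern described above can be sketched as follows. This is a minimal illustration in C, not the actual
application code: the helper name run_command, the buffer handling, and the error reporting are all assumptions made
for the example. It checks the return values of both popen() and pclose(), so that an errno such as EIO is reported
together with the command that triggered it.

```c
#define _POSIX_C_SOURCE 200809L  /* expose popen()/pclose() on strict compilers */

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the original application): run `cmd`
 * through popen(), copy as much of its output as fits into `buf`, and
 * return the status reported by pclose(), or -1 on failure.
 */
int run_command(const char *cmd, char *buf, size_t buflen)
{
    FILE *fp;
    size_t used = 0;
    char scratch[128];
    int status;

    assert(cmd != NULL && buf != NULL && buflen > 0);
    buf[0] = '\0';

    fp = popen(cmd, "r");
    if (fp == NULL) {
        /* This is where an error such as "Input/output error" would surface. */
        fprintf(stderr, "popen(%s): %s\n", cmd, strerror(errno));
        return -1;
    }

    /* Append output line by line while there is room in the caller's buffer. */
    while (used + 1 < buflen &&
           fgets(buf + used, (int)(buflen - used), fp) != NULL) {
        used += strlen(buf + used);
    }

    /* Drain whatever is left so the child is not stopped by a full pipe. */
    while (fgets(scratch, (int)sizeof scratch, fp) != NULL) {
        /* discard */
    }

    /* Always pclose(): it reaps the child and releases the pipe. */
    status = pclose(fp);
    if (status == -1) {
        fprintf(stderr, "pclose(%s): %s\n", cmd, strerror(errno));
    }
    return status;
}
```

Calling a helper like this once per second from the main loop and logging every -1 return with a timestamp would make
it easier to correlate the "popen: Input/output error" messages with the other failure modes.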
In most cases we see some errors in chkfsys, especially after the nodes have had problems. The number of errors varies.
In some of the nodes that are completely unbootable we've been able to mount them as an external drive via io-usb and
a USB-to-CompactFlash adapter. Some of them don't mount, but some do. The ones that do mount successfully typically
show a very large number of errors when running chkfsys.
These failures have occurred on multiple different cards in multiple different nodes. As far as we can tell the cards
are not worn out or having hardware issues/bad sectors. Most of these cards are not very old, which also leads us to
think this isn't a hardware failure in the CompactFlash cards themselves.
One thing we noticed when investigating is that the Pipe manager (/bin/Pipe) is not running on these nodes. We're not
sure whether that is related. We read in the help manual that when the Pipe manager is not running, pipes are
implemented through the Fsys driver, but we're not sure whether that has an impact either. We are starting to
suspect that the issue may be tied to Pipe, popen, Fsys driver parameters, or some combination of these. We also
suspect there may be a correlation with the processor board, since many of these nodes run on a dual-core Intel
Core 2 Duo. Some run on boards with a single-core Pentium processor, but on the Pentium boards we haven't seen any
nodes fail to reboot, just individual files being corrupted (e.g., a bad checksum for /bin/awk). We're not sure
whether the Fsys driver behaves badly with a dual-core processor.
There is a mix of versions across the field: some are running /bin/Fsys version 4.24V, some are running 4.24Z. Some
are running /bin/Fsys.eide (some are version 4.25A, some are 4.25G), some are running /bin/Fsys.atapi version 4.25G.
All are running /boot/sys/Proc32 4.25Q.
If anyone out there has any ideas or can help us, we would be greatly appreciative.
|