Forum Topic - CompactFlash disk corruption, possible relation to popen or Pipe?: (2 Items)
   
CompactFlash disk corruption, possible relation to popen or Pipe?  
Hi all,

We are experiencing production issues on nodes running QNX 4.25 that use CompactFlash cards (solid state) as the node's disk, and we are seeing multiple failure modes. Most frequently, when a node is rebooted or power cycled it fails to come back up and displays "Reboot and select proper boot device"; the drive is unbootable. We have also seen nodes that are online but have corrupted files on the disk (one example is awk: the program crashes when you try to run it, and its checksum does not match the copy on other nodes in the system). In other cases, trying to run a program (for example, netinfo) or access a file gives an "Input/output error". Additionally, we sometimes see the message "popen: Input/output error" on terminal 1. Sometimes we see a combination of these failure modes and sometimes just one; we have yet to find a repeatable pattern.
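
To narrow down which files are affected, a read-through check along the following lines could be run on each node and its result compared against a known-good node. This is only a minimal sketch: scan_file is a hypothetical helper, and the rolling sum is a simple stand-in rather than the algorithm used by any QNX utility, so its values are only comparable with other runs of this same code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: read `path` from start to end, accumulating a simple
 * 32-bit rolling sum (an illustrative stand-in, not a standard checksum).
 * Returns 0 on success and fills *sum and *nbytes.
 * Returns -1 with errno set if the file cannot be opened or a read fails,
 * which is where an "Input/output error" (EIO) would show up. */
int scan_file(const char *path, unsigned long *sum, unsigned long *nbytes)
{
    unsigned char buf[4096];
    unsigned long s = 0;
    unsigned long total = 0;
    size_t n;
    size_t i;
    int saved_errno = 0;
    FILE *fp;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;              /* errno was set by fopen() */

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (i = 0; i < n; i++)
            s = (s * 31UL + buf[i]) & 0xFFFFFFFFUL;
        total += (unsigned long)n;
    }

    /* fread() returning 0 means EOF or error; ferror() tells them apart. */
    if (ferror(fp))
        saved_errno = (errno != 0) ? errno : EIO;

    fclose(fp);

    if (saved_errno != 0) {
        errno = saved_errno;
        return -1;
    }

    *sum = s;
    *nbytes = total;
    return 0;
}
```

A file that returns -1 points at a read problem on that node, while a file that reads cleanly but produces a different sum than on a healthy node points at silent corruption.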

Our application does use a popen() call to execute some shell commands and read the result back into the program. That particular code runs in a loop and executes roughly once per second.

In most cases we see some errors in chkfsys, especially after the nodes have had problems. The number of errors varies. For some of the nodes that are completely unbootable, we have tried mounting the card as an external drive via io-usb and a USB-to-CompactFlash adapter. Some of the cards don't mount, but some do. The ones that do mount successfully typically show a very large number of errors when running chkfsys.

These failures have occurred on multiple different cards in multiple different nodes. As far as we can tell the cards are not worn out or having hardware issues/bad sectors. Most of these cards are not very old, which also leads us to think this isn't a hardware failure in the CompactFlash cards themselves.

One thing we noticed while investigating is that the Pipe manager (/bin/Pipe) is not running on these nodes. We're not sure whether that is related. We read in the help manual that when the Pipe manager is not running, pipes are implemented through the Fsys driver, but we're not sure whether that has an impact either. We are starting to suspect that the issue may be tied to Pipe, popen, the Fsys driver parameters, or some combination of these. We also suspect there may be a correlation with the processor board, since many of these nodes run on a dual-core Intel Core 2 Duo. Some run on boards with a single-core Pentium processor, and on those we have not seen any nodes fail to reboot, only individual files being corrupted (e.g. a bad checksum for /bin/awk). We're not sure whether the Fsys driver behaves badly on a dual-core processor.

There is a mix of versions across the field: some nodes are running /bin/Fsys version 4.24V and some are running 4.24Z. Some are running /bin/Fsys.eide (version 4.25A on some, 4.25G on others), and some are running /bin/Fsys.atapi version 4.25G. All are running /boot/sys/Proc32 4.25Q.

If anyone out there has any ideas or can help us, we would be greatly appreciative. 
Re: CompactFlash disk corruption, possible relation to popen or Pipe?  
One other piece of information in case it is helpful:

Most of the nodes are running the following parameters for Fsys:

/bin/Fsys -Hdisk## -A -r8000

Where ## is the size of the CompactFlash card.

Some of the nodes (not sure if it's related, but these nodes have only seen a corrupt file like awk rather than unbootable drives) are running with:

/bin/Fsys -d0 -c0k -A -r 8000