Channel Problem determination - When a channel refuses to run - Middleware news
When a channel refuses to run
If a channel refuses to run:
* Check that DQM and the channels have been set up correctly. This is a likely problem source if the channel has never run. Reasons could be:
o A mismatch of names between sending and receiving channels (remember that uppercase and lowercase letters are significant)
o Incorrect channel types specified
o The sequence number queue (if applicable) is not available, or is damaged
o The dead-letter queue is not available
o The sequence number wrap value is different on the two channel definitions
o A queue manager, CICS system, or communication link is not available
o Following a restart, the wrong queue manager may have been attached to CICS
o A receiver channel might be in STOPPED state
o The connection might not be defined correctly
o There might be a problem with the communications software (for example, is TCP running?)
o In z/OS using CICS, check that the DFHSIT SYSIDNT name of the target CICS system matches the connection name that you have specified for that system
* An in-doubt situation might exist if the automatic synchronization on startup has failed. This is indicated by messages on the system console, and you can use the status panel to show channels that are in doubt.
The possible responses to this situation are:
o Issue a Resolve channel request with Backout or Commit.
You need to check with your remote link supervisor to establish the number of the last message or unit of work committed. Check this against the last number at your end of the link. If the remote end has committed a number, and that number is not yet committed at your end of the link, then issue a RESOLVE COMMIT command.
In all other cases, issue a RESOLVE BACKOUT command.
The effect of these commands is that backed out messages reappear on the transmission queue and are sent again, while committed messages are discarded.
If you cannot establish which case applies, backing out is probably the safer decision, at the risk of duplicating a message that was already sent.
o Issue a RESET command.
This command is for use when sequential numbering is in effect, and should be used with care. Its purpose is to reset the sequence number of messages and you should use it only after using the RESOLVE command to resolve any in-doubt situations.
* On WebSphere MQ for iSeries, Windows, UNIX systems, and z/OS without CICS, and MQSeries for OS/2 Warp, there is no need for the administrator to choose a particular sequence number to ensure that the sequence numbers are put back in step. When a sender channel starts up after being reset, it informs the receiver that it has been reset and supplies the new sequence number that is to be used by both the sender and receiver.
Note:
If the sender is WebSphere MQ for z/OS using CICS, the sequence number should be reset to the same number as any receiving queue managers.
* If the status of a receiver end of the channel is STOPPED, it can be reset by starting the receiver end.
Note:
This does not start the channel, it merely resets the status. The channel must still be started from the sender end.
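The choice between RESOLVE COMMIT and RESOLVE BACKOUT described earlier in this section can be sketched in Python. This is a minimal illustration, not part of WebSphere MQ: the function names, their parameters, and the channel name TO.REMOTE are hypothetical, and the sketch assumes that you have already obtained the last committed number from each end of the link. It reads "not yet committed at your end" as the two numbers being different. It only builds the text of an MQSC RESOLVE CHANNEL command; it does not run it.

```python
from typing import Optional


def choose_resolve_action(remote_last_committed: Optional[int],
                          local_last_committed: int) -> str:
    """Return "COMMIT" or "BACKOUT" following the rule in the text."""
    # COMMIT only when the remote end reports a committed number that
    # this end has not committed yet. Inequality is used instead of ">"
    # because sequence numbers can wrap (an assumption of this sketch).
    if (remote_last_committed is not None
            and remote_last_committed != local_last_committed):
        return "COMMIT"
    # In all other cases, including an unknown remote number, back out.
    return "BACKOUT"


def resolve_command(channel: str, action: str) -> str:
    """Build the MQSC command text for the chosen action."""
    if action not in ("COMMIT", "BACKOUT"):
        raise ValueError(f"unsupported action: {action}")
    return f"RESOLVE CHANNEL({channel}) ACTION({action})"


# Hypothetical values: the remote end has committed 42, this end only 41.
action = choose_resolve_action(remote_last_committed=42,
                               local_last_committed=41)
print(resolve_command("TO.REMOTE", action))
```

Treating an unknown remote number as BACKOUT follows the guidance that backing out is the safer choice when you cannot establish what the remote end has committed.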
Triggered channels
If a triggered channel refuses to run, the possibility of in-doubt messages should be investigated as described above.
Another possibility is that the trigger control parameter on the transmission queue has been set to NOTRIGGER by the channel. This happens when:
* There is a channel error
* The channel was stopped because of a request from the receiver
* The channel was stopped because of a problem on the sender that requires manual intervention
After diagnosing and fixing the problem, you must start the channel manually.
An example of a situation where a triggered channel fails to start is as follows:
1. A transmission queue is defined with a trigger type of FIRST.
2. A message arrives on the transmission queue, and a trigger message is produced.
3. The channel is started, but stops immediately because the communications to the remote system are not available.
4. The remote system is made available.
5. Another message arrives on the transmission queue.
6. The second message does not increase the queue depth from zero to one, so no trigger message is produced (unless the channel is in RETRY state). If this happens, the channel must be started manually.
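The sequence above can be reproduced with a toy model in Python. The TransmissionQueue class is hypothetical and deliberately simplified: it models only the rule that trigger type FIRST produces a trigger message when the queue depth goes from zero to one, and it ignores the RETRY state, NOTRIGGER, and any other trigger settings.

```python
class TransmissionQueue:
    """Toy model of a queue with trigger type FIRST (hypothetical class)."""

    def __init__(self) -> None:
        self.depth = 0
        self.trigger_messages = 0

    def put(self) -> bool:
        """Add one message; return True if a trigger message was produced."""
        was_empty = self.depth == 0
        self.depth += 1
        if was_empty:
            # FIRST: trigger only on the transition from zero to one.
            self.trigger_messages += 1
            return True
        return False


xmitq = TransmissionQueue()
first = xmitq.put()   # Step 2: a trigger message is produced.
# Step 3: the channel stops at once, so the message stays on the queue.
second = xmitq.put()  # Step 5: the depth goes from one to two.
print(first, second)  # prints "True False"
```

Because the first message is never removed from the queue, the second put does not cross the zero-to-one boundary, which is why the channel must be started manually.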
On WebSphere MQ for z/OS, if the queue manager is stopped using MODE(FORCE) during channel initiator shutdown, it may be necessary to manually restart some channels after channel initiator restart.
Conversion failure
Another reason for the channel refusing to run could be that neither end is able to carry out the necessary conversion of message descriptor data between ASCII and EBCDIC, and between integer formats. In this instance, communication is not possible.
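To see why this conversion matters, the following Python sketch encodes the same character and the same integer in two ways. It is an illustration only and does not use any WebSphere MQ interface. The choice of code page cp037 as the EBCDIC example and of a 4-byte integer is an assumption made for the sketch.

```python
import struct

text = "A"
ascii_bytes = text.encode("ascii")    # 0x41 in ASCII
ebcdic_bytes = text.encode("cp037")   # 0xC1 in this EBCDIC code page

value = 1
big_endian = struct.pack(">i", value)     # bytes 00 00 00 01
little_endian = struct.pack("<i", value)  # bytes 01 00 00 00

# Reading big-endian bytes as little-endian yields a different number.
misread = struct.unpack("<i", big_endian)[0]
print(ascii_bytes.hex(), ebcdic_bytes.hex(), misread)  # prints "41 c1 16777216"
```

If neither end can translate between the two representations, the fields of the message descriptor cannot be interpreted reliably by the receiving end.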
Network problems
When using LU 6.2, make sure that your definitions are consistent throughout the network. For example, if you have increased the RU sizes in your CICS Transaction Server for z/OS or Communications Manager definitions, but you have a controller with a small MAXDATA value in its definition, the session may fail if you attempt to send large messages across the network. A symptom of this may be that channel negotiation takes place successfully, but the link fails when message transfer occurs.
When using TCP, if your channels are unreliable and your connections break, set a KEEPALIVE value for your system or channels. You can use the SO_KEEPALIVE option to set a system-wide value, and on WebSphere MQ for z/OS, you can also use the KeepAlive Interval channel attribute (KAINT) to set channel-specific keepalive values. These options are discussed in Checking that the other end of the channel is still available, and KeepAlive Interval (KAINT).
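For orientation, the following Python sketch shows what the SO_KEEPALIVE option looks like at the socket level. It illustrates the TCP option itself and is not how you configure WebSphere MQ; for that, see the topics referenced above. The sketch opens no connection, and the keepalive timing is assumed to come from operating system settings.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Ask the TCP stack to probe an idle connection so that a broken
    # link is detected instead of leaving the socket waiting forever.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    print("keepalive enabled:", enabled)
finally:
    sock.close()
```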
Adopting an MCA
The Adopt MCA function enables WebSphere MQ to cancel a receiver channel and to start a new one in its place.
For more information about this function, see Adopting an MCA. For details of its parameters, see WebSphere MQ for z/OS System Setup Guide.
Registration time for DDNS
When a group TCP/IP listener is started, it registers with DDNS, but there might be a delay before the address is available to the network. A channel that is started in this period, and which targets the newly registered generic name, fails with an 'error in communications configuration' message. The channel then goes into retry until the name becomes available to the network. The length of the delay depends on the name server configuration used.
Dial-up problems
WebSphere MQ supports connection over dial-up lines but you should be aware that with TCP, some protocol providers assign a new IP address each time you dial in. This can cause channel synchronization problems because the channel cannot recognize the new IP addresses and so cannot ensure the authenticity of the partner. If you encounter this problem, you need to use a security exit program to override the connection name for the session.
This problem does not occur when a WebSphere MQ for AIX, iSeries, HP-UX, Linux, Solaris, or Windows product, or an MQSeries V5.1 for Compaq Tru64 UNIX or OS/2 Warp product, communicates with another product at the same level, because the queue manager name is used for synchronization instead of the IP address.