Cisco TelePresence Solution Architecture: 2010

Tuesday, September 21, 2010

Tandberg - Basic - Troubleshooting Video Quality

1. Verify Endpoint model and firmware
2. Verify Call Diagnostics/System Information:

Video:
Check:
- Video protocol (codec)
- Resolution
- Frame Rate
- Call rate

Audio:
Check:
- Audio protocol (codec)
- Audio rate
Check TX/RX Packet loss

Gathering call details:

1. Verify QoS settings and marking from Tandberg Menu

2. Verify IPLR is enabled:
Transmit: Enables/disables Intelligent Packetloss Recovery
usage: iplr

3. Verify call details from Telnet:
xstatus Call 1

*s Call 1 (status=Synced, type=Vtlph, protocol=H323, direction=Incoming, logTag=9):
     CallRate: 1920
     RemoteNumber: "3006131"
     IncomingNumber: "13302220001"
     IncomingSubAddress: ""
     Mute: Off
     Microphone: On
     Duration: 303
     MuteOutgoing: Off
     CallOnHold: False
     RemoteSiteOnHold: False
     MultiwayProgress: Off
     Channels 1 (type=Incoming):
       Rate: 1920
       Restrict: Off
       IPLR: Off
       Encryption (status=Off): /
       Audio (status=Active):
         Protocol: G722
         Rate: 64
       Video 1 (status=Active):
         Protocol: H264
         Resolution: w720p
         Rate: 1856
       Video 2 (status=Inactive): /
       Data (status=Inactive): /
     Channels 2 (type=Outgoing):
       Rate: 1920
       Restrict: Off
       IPLR: On
       Encryption (status=Off): /
       Audio (status=Active):
         Protocol: G722
         Rate: 64
       Video 1 (status=Active):
         Protocol: H264
         Resolution: w720p
         Rate: 1856
       Video 2 (status=Inactive): /
       Data (status=Inactive): /
*s/end

OK

4. Verify real time statistics:

Enable syslog
syslog on
syslog 3

H323MCS-1 pkts 23703, loss      0, jitter   0, maxjitter   1, drop    0, rate 640 (Audio input)
H323MCS-1 pkts 108081, loss      0, jitter   1, maxjitter   4, drop    0, rate 17140 (Main Video input)
H323MCS-1 pkts      2, loss      0, jitter   0, maxjitter   0, drop    0, rate 0 (Data input)
H323MCS-1 pkts 23703, loss      0, jitter   0, maxjitter   0, drop    0, rate 640 (Audio output)
H323MCS-1 pkts 73652, loss      0, jitter   7, maxjitter   7, drop    0, rate 12690 (Main Video output)
H323MCS-1 pkts     19, loss      0, jitter   0, maxjitter   0, drop    0, rate 0 (Data output)
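When many of these statistics lines scroll by, it can help to parse them and flag the channels that show loss. The following is a minimal Python sketch, assuming the line layout shown in the sample above; the function names are hypothetical, and treating `pkts` as received packets and `loss` as lost packets is an assumption about the counters, not something the output states.

```python
import re

# Matches one statistics line in the layout of the sample above.
STAT_RE = re.compile(
    r"pkts\s+(?P<pkts>\d+),\s+loss\s+(?P<loss>\d+),\s+jitter\s+(?P<jitter>\d+),"
    r"\s+maxjitter\s+(?P<maxjitter>\d+),\s+drop\s+(?P<drop>\d+),"
    r"\s+rate\s+(?P<rate>\d+)\s+\((?P<channel>[^)]+)\)"
)


def parse_stat_line(line):
    """Return the counters of one line as a dict, or None if it does not match."""
    match = STAT_RE.search(line)
    if match is None:
        return None
    stats = {key: int(value) for key, value in match.groupdict().items()
             if key != "channel"}
    stats["channel"] = match.group("channel")
    return stats


def loss_percent(stats):
    """Loss as a percentage of (received + lost) packets (assumed semantics)."""
    total = stats["pkts"] + stats["loss"]
    return 100.0 * stats["loss"] / total if total else 0.0


sample = ("H323MCS-1 pkts 108081, loss      0, jitter   1, maxjitter   4, "
          "drop    0, rate 17140 (Main Video input)")
parsed = parse_stat_line(sample)
print(parsed["channel"], parsed["maxjitter"], loss_percent(parsed))
```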

5. Enable PacketLossDownSpeed:
xConfiguration PacketlossDownSpeed Mode:

6. Collect sniffer captures

* Some commands may vary depending on the endpoint model

Monday, September 13, 2010

General - Replay video from packet capture

1) Download videosnarf
Download the tar file:
videosnarf-0.63.tar.gz
http://sourceforge.net/projects/ucsniff/files/

2) Build and Compile VideoSnarf
Extract file above
cd videosnarf-(version)
./configure
make
make install

3) Download FFmpeg
Download the tar file
ffmpeg-0.6.tar.gz
http://ffmpeg.org/download.html

4) Build and Compile FFmpeg
Extract file above
cd ffmpeg-(version)
./configure
make
make install

5) Obtain a packet capture using Wireshark, then filter it by source IP address and UDP port

Save the filtered capture as a new file containing only the stream you want to replay (use Save As and then select the Displayed packets option)

6) Use Videosnarf to detect RTP streams in the packet capture
videosnarf -i filename.pcap -c
This generates the file H264-media-1.264

7) Use FFMPEG to convert H264 file to AVI video
ffmpeg -i H264-media-1.264 myvideo.avi

8) Play myvideo.avi in your favorite video player
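Steps 6 and 7 can be scripted once the filtered capture exists. The sketch below only builds the two command lines exactly as given in the steps above; the function name `replay_commands` and its default file names are illustrative assumptions, and nothing is executed.

```python
import shlex


def replay_commands(pcap_path, h264_path="H264-media-1.264", avi_path="myvideo.avi"):
    """Return the videosnarf and ffmpeg command lines from steps 6 and 7."""
    return [
        ["videosnarf", "-i", pcap_path, "-c"],   # step 6: extract the RTP stream
        ["ffmpeg", "-i", h264_path, avi_path],   # step 7: convert H.264 to AVI
    ]


for argv in replay_commands("filename.pcap"):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` on a machine where both tools are installed.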

Friday, August 20, 2010

CTS System - Device names and numbers

TelePresence systems contain several components on the system board that process video signals. When you go through the logs you will find references to them. These are the device names and numbers used:

Video Encoder
Device 0 - Main Camera
Device 2 - PC Video Input (VGA)
Device 4 - Document Camera

At any given time only one of Device 2 or Device 4 is active, never both.

Video Decoder
Device 1 - Plasma
Device 3 - Projector Display

Scope ID 15: Primary Video Encoder
Scope ID 16: Primary Video Decoder

Thanks to Shankar for sharing this

Tuesday, August 17, 2010

CTS System - SNMP Decode dateandTime

Our MIB contains an object that reports when a specific call started:

Object ctpcStartDateAndTime

"This object specifies the value of local date and time when a call is started.

http://tools.cisco.com/Support/SNMP/do/BrowseOID.do?local=en&translate=Translate&typeName=DateAndTime

This object is of type "DateAndTime". DateAndTime is a standard textual convention defined in SNMPv2-TC, and it resolves to the base data type OCTET STRING. The DISPLAY-HINT format for DateAndTime is as follows.

DISPLAY-HINT "2d-1d-1d,1d:1d:1d.1d,1a1d:1d"

The date-time specification is as follows.

"A date-time specification.

field  octets  contents                  range
-----  ------  --------                  -----
  1     1-2    year                      0..65536
  2      3     month                     1..12
  3      4     day                       1..31
  4      5     hour                      0..23
  5      6     minutes                   0..59
  6      7     seconds                   0..60
               (use 60 for leap-second)
  7      8     deci-seconds              0..9
  8      9     direction from UTC        '+' / '-'
  9     10     hours from UTC            0..11
 10     11     minutes from UTC          0..59

For example, Tuesday May 26, 1992 at 1:30:15 PM EDT would be
displayed as:

1992-5-26,13:30:15.0,-4:0

Note that if only local time is known, then timezone information (fields 8-10) is not present."

Valid Usage

SnmpVar snmpvar = syntax.CreateVariable("'07:D2:09:03:0C:14:20:03:2B:07:00'");//hex format, length 11 bytes

SnmpVar snmpvar = syntax.CreateVariable("'07:D2:09:03:0C:14:20:03'");//hex format, length 8 bytes

SnmpVar snmpvar = syntax.CreateVariable("2002-9-21,13:53:32.3,-7:0");//string format, length 11 bytes

SnmpVar snmpvar = syntax.CreateVariable("2002-9-21,13:53:32.3");//string format, length 8 byte

Example:

07 DA 08 09 0E 13 31 00

07DA = 2010 (year)
08   = 8 (August)
09   = 9 (day of month)
0E   = 14 (hours)
13   = 19 (minutes)
31   = 49 (seconds)
00   = 0 (deci-seconds)

The value of year is in network-byte order.
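The decoding above can be checked with a few lines of Python. This is a minimal sketch that follows the field table and DISPLAY-HINT quoted earlier; the function name `decode_date_and_time` is an illustrative choice, not part of any SNMP library.

```python
import struct


def decode_date_and_time(octets):
    """Render an 8- or 11-octet DateAndTime value in DISPLAY-HINT form."""
    if len(octets) not in (8, 11):
        raise ValueError("DateAndTime must be 8 or 11 octets long")
    # The year is a 2-octet big-endian (network byte order) integer;
    # the remaining six fields are one octet each.
    year, month, day, hour, minute, second, deci = struct.unpack(
        ">HBBBBBB", octets[:8])
    text = f"{year}-{month}-{day},{hour}:{minute}:{second}.{deci}"
    if len(octets) == 11:
        # Octet 9 is the ASCII character '+' or '-', then hours and minutes.
        text += f",{chr(octets[8])}{octets[9]}:{octets[10]}"
    return text


print(decode_date_and_time(bytes.fromhex("07 DA 08 09 0E 13 31 00")))
# prints 2010-8-9,14:19:49.0
```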

Saturday, August 14, 2010

CTS system - Basic Video quality troubleshooting

High level description

1. Obtain an accurate problem description:
• Display flickering, snowy
• Pinkish, greenish, or bluish tint
• Black Display during loopback as well as during the call
• Half image, zoom in
• Other....

Obtain pictures or video when problem is happening

2. Problem details:
• New install or upgrade?
• Reproducible or random?
• Start of the call, end of the call, or after the display has been idle?
• Point-to-point calls? Multipoint calls?
• Do we have a video or a picture from when the problem occurs?
• Problem only with one type of unit? Interop?
• Did the problem appear after a firmware upgrade? Any network changes?
• Any message during the failure (e.g. network congestion)?
• Verify the display type and generation (CTS 500 GEN2 37", CTS 1000 GEN2 65", etc.)
• Obtain app code and boot code:
- App code, boot code and firmware must be the same on all three displays

Isolating the problem

First, try a loopback with the encoder enabled; this confirms that the codec itself is not dropping packets

From SSH CLI:
diag display loopback full enable

If ICMP is enabled across the path we can use mtr to see which device may be introducing delay, packet loss, etc.

utils network mtr count 100 tos 160

Replace 160 with your current QoS marking:

http://mytelepresence.blogspot.com/2010/04/telepresence-system-mtr-utility.html
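If your QoS policy is expressed as a DSCP value, it must be converted before it can replace the 160 in the mtr command. The sketch below assumes that the `tos` argument is the full 8-bit ToS byte, in which DSCP occupies the upper six bits; the helper names are hypothetical.

```python
def dscp_to_tos(dscp):
    """Convert a 6-bit DSCP value to the 8-bit ToS byte (ECN bits left at 0)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def tos_to_dscp(tos):
    """Extract the 6-bit DSCP value from an 8-bit ToS byte."""
    if not 0 <= tos <= 255:
        raise ValueError("ToS must be in the range 0..255")
    return tos >> 2


print(tos_to_dscp(160))  # prints 40
print(dscp_to_tos(32))   # prints 128
```

Under that assumption, the 160 used above corresponds to DSCP 40, and a deployment that marks traffic with DSCP 32 would use `tos 128` instead.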

Verify that packets sent and received show the proper QoS value and that the proper policy maps are matched

Network devices:

Check the SRND guide to make sure the network requirements are met:

http://www.cisco.com/en/US/docs/solutions/Enterprise/Video/TP-Book.html

1) ‘show run’ output

In addition, I would suggest collecting the following information to further isolate network traffic:

1) Interf type x/y <- WAN interface
2) ‘load-interval 30’
3) Reproduce problem and keep call up for perhaps 5 mins and then collect
the following information while the call is still up and while packet loss is being seen:

a) ‘show interf intftype x/y’
b) ‘show policy-map interf intftype x/y ’
c) ‘show queue intftype x/y’

The above should allow us to (a) determine what our guaranteed bandwidth really is (b) determine where we are seeing drops (ex: interface drops vs software queuing drops).

Thursday, August 5, 2010

CTS System - DTLS Call analysis

CTS codecs currently support two methods for securing media traffic: SRTP and DTLS. This section describes the DTLS protocol and how it works with CTS units. It covers what is needed to enable DTLS; it does not cover other security protocols used by CUCM, CTMS or CTS. For more information, consult the TelePresence security guide on CCO.
http://www.cisco.com/en/US/docs/solutions/Enterprise/Video/telepresence.html#wp41946

DTLS

Datagram Transport Layer Security (DTLS) provides communication security for datagram protocols. DTLS is based on the Transport Layer Security (TLS) protocol. This datagram-compatible version of the protocol is specifically designed to be similar to TLS, with the minimal amount of changes needed to fix problems created by the reordering or loss of packets. There are two main areas where unreliability creates problems for TLS:

The traffic encryption layer does not allow individual packets to be decrypted independently, because there are two inter-record dependencies:
- Cryptographic context is chained between records
- A Message Authentication Code (MAC) that includes a sequence number provides anti-replay and message reordering protection, but the sequence numbers are implicit in the records
The handshake layer breaks if messages are lost, because it depends on them being transmitted reliably, for these two reasons:
- The handshake is a lockstep cryptographic handshake requiring messages to be transmitted and received in a defined order, causing a problem with potential reordering and message loss
- Fragmentation can be a problem because the handshake messages are potentially larger than any given datagram

The first problem, caused by the inter-record dependencies, can be solved using a method employed in Internet Protocol Security (IPsec): adding explicit state to each individual record.

To solve the issue of packet loss DTLS employs a simple retransmission timer. Figure 1 below illustrates the basic concept. The client is expecting to see the HelloVerifyRequest message from the server. If the timer expires then the client knows that either the ClientHello or the HelloVerifyRequest was lost and retransmits.
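The retransmission timer can be sketched in a few lines of Python. The defaults below (1 second initial timeout, doubled on each retry, capped at 60 seconds) mirror the recommendation in RFC 4347; whether CTS codecs use these exact values is not stated here, and `send` and `wait_for_reply` are hypothetical callables standing in for the real socket operations.

```python
def retransmit_timeouts(initial=1.0, maximum=60.0, attempts=8):
    """Yield the timeout for each attempt, doubling up to the maximum."""
    timeout = initial
    for _ in range(attempts):
        yield timeout
        timeout = min(timeout * 2, maximum)


def send_flight(send, wait_for_reply, attempts=8):
    """Send a handshake flight and retransmit it until a reply arrives.

    send() transmits the flight (for example the ClientHello).
    wait_for_reply(timeout) returns the reply, or None if the timer expired.
    """
    for timeout in retransmit_timeouts(attempts=attempts):
        send()
        reply = wait_for_reply(timeout)
        if reply is not None:
            return reply
    raise TimeoutError(f"no reply after {attempts} attempts")


print(list(retransmit_timeouts(attempts=8)))
```

Note that the client cannot tell which message was lost; it simply resends its own flight, which is why the same `send()` is called on every attempt.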

In a TelePresence solution, regardless of the number of codecs in the system, only primary codecs perform the DTLS handshake; IP phones are not involved in the process. In every TelePresence call two DTLS handshakes occur, one for each stream (video and audio).

DTLS reuses almost all the protocol elements of TLS, with minor but important modifications so that it works properly with datagram transport.

A TLS client initiates the handshake by sending the ClientHello message. This message contains the TLS version, a list of algorithms and compression methods that the client will accept, and a random nonce used for anti-replay.

The server responds with the ServerHello, Certificate, and ServerHelloDone messages. The ServerHello contains the server’s choice of version and algorithms and a random nonce. The Certificate contains the server’s certificate chain. The ServerHelloDone is simply a marker message to indicate that no other messages are forthcoming. In more complicated handshakes other messages would appear between the Certificate and the ServerHelloDone messages.

The client then sends the ChangeCipherSpec message to indicate that it is changing to the newly negotiated protection suite.

CTS Manager - OBTP Troubleshooting

Sunday, July 25, 2010

Tandberg - Cisco Integration

Since the Tandberg acquisition, there have been many questions on our external alias (ask-telepresence-technical) and the Cisco NetPro forum about the interoperability of Tandberg products with existing Cisco ones. One of the most important questions is:
"What exactly is going to be included in our roadmap?"

We all want to know what will be supported and what will not. We want to be sure that a product acquired in the next few months will not go EoL soon, and we want to know what will happen with existing installed Cisco TP equipment.

For the next few months I will be updating this blog with information on Tandberg and Cisco integration to help you deploy both products smoothly, and pointing to our site (www.cisco.com) for the latest updates.

Tuesday, July 20, 2010

CTS Manager - CURL Browsing

To verify WebServer connection using SSL, try the following:

curl -u scheduler:C1sco123 https://172.16.154.21/exchange/ -k

If the remote web server is configured for HTTPS, you will see a Security Warning message from that web page only when connecting in SSL mode

curl is also useful for general HTTP troubleshooting; see:
http://curl.haxx.se/docs/httpscripting.html

Monday, July 19, 2010

B2B - ASR/GSR SBC Basic call analysis

To debug SIP messages arriving at the GSR SBC, follow these steps:

GSR SBC

RP/0/9/CPU0:ciscotxbu-mysbc-1#show services red
Service type     Name                    Pref. Active        Pref. Standby
--------------------------------------------------------------------------------
SBC              mycompanysbc              0/7/CPU0 Active     0/6/CPU0 Standby

RP/0/9/CPU0:ciscotxbu-mysbc-1#run attach 0/7/CPU0

attach: Starting session 0 to nodeid 0/7/CPU0

attach: Type "exit" to quit.

ksh-LC>en
ksh-LC>cd /tmp
ksh-LC>ls pdtrc*
pdtrc.201007191624.log    pdtrc.201007191936.log

ksh-LC>tail -f pdtrc.201007191936.log

Place call

ASR SBC

1) Issue CLI 'sbc dump-diagnostics'
2) Issue CLI 'sh run'
3) Issue CLI 'sh bootflash:' to find the ipstrc file (file1) dumped in step 1
4) Issue CLI 'debug sbc asr-sbc log-level buffer 0'
5) Place the call; after the call is released, issue CLI 'sbc dump-diagnostics' again
6) Issue CLI 'sh bootflash:' to find the newly dumped ipstrc file (file2) and the pdtrc file
7) Copy file1, file2 and the pdtrc file off the box (e.g. copy filename to tftp, ftp or scp)
8) Issue 'no debug all' to disable the debug

On a production network, enable the above debug only during a low-traffic window

Wednesday, July 14, 2010

CTS Manager - NDR Messages

Occasionally, you will get a Non-Delivery Report (NDR) when you try to send email. An NDR indicates that your email did not make it to the destination, but why? Most of the time you have simply made a typo in the address, but sometimes you just don't know what is wrong. That is where the error numbers listed in the NDR can help.
An NDR looks something like this:

Your message did not reach some or all of the intended recipients.
From:       donotreply@cisco.com
Subject:    Telepresence Meet-Me Meeting Confirmation
Sent:         7/12/2010 11:36 AM

The following recipient(s) could not be reached:

gogasca@blogger.com onTelepresence Meet-Me Meeting Confirmation
There was a SMTP communication problem with the recipient's email server. Please contact your system administrator.

In the above case an email was sent to Gonzalo Gasca with the subject "Telepresence Meet-Me Meeting Confirmation" on 7/12/2010 at 11:36 AM. The problem was that the destination server (blogger.com) is using a spam filter. You should verify the address and try again (as indicated). In this case we need to engage IT to verify the spam filters and allow emails from our account donotreply@cisco.com

Sometimes the last line of the NDR looks like:

This shows two error messages. The first (550) was recorded by the sending server; the recipient server sent a clarifying message (in this case 552). Any error messages after the ";" (in this case the 552) are clarifications generated by the recipient's server. So in this particular case, the destination server reported that the destination mailbox was unavailable; furthermore, it was unavailable because the recipient's mailbox was full.

Error codes for the first digit are listed here:

1: The server has accepted the command, but does not yet take action. A confirmation message is required. Currently, this is not used.
2: The server has completed the task successfully.
3: The server has understood the request, but requires further information to complete it.
4: The server has encountered a temporary failure. If the command is repeated without any change, it might be completed. This is hardly ever used by mail servers.
5: The server has encountered an error.

Error codes for the second digit are:

0: A syntax error has occurred.
1: Indicates an informational reply, for example to a HELP request.
2: Refers to the connection status.
3 and 4 are unspecified.
5: Refers to the status of the mail system as a whole and the mail server in particular.
6: This digit refers to the status of the mail server.
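The two digit tables above can be turned into a small lookup that gives a first reading of any three-digit reply code. This is a minimal sketch using shortened versions of the descriptions listed above; the function name `classify` and the wording of the labels are illustrative.

```python
# Meaning of the first digit, condensed from the list above.
FIRST_DIGIT = {
    "1": "accepted, confirmation required",
    "2": "completed successfully",
    "3": "understood, more information required",
    "4": "temporary failure",
    "5": "error",
}

# Meaning of the second digit; 3 and 4 are unspecified.
SECOND_DIGIT = {
    "0": "syntax",
    "1": "information",
    "2": "connection",
    "5": "mail system",
}


def classify(code):
    """Return (first-digit meaning, second-digit meaning) for a reply code."""
    text = str(code)
    if len(text) != 3 or not text.isdigit():
        raise ValueError("expected a three-digit reply code")
    return (FIRST_DIGIT.get(text[0], "unknown"),
            SECOND_DIGIT.get(text[1], "unspecified"))


print(classify(550))  # prints ('error', 'mail system')
print(classify(421))  # prints ('temporary failure', 'connection')
```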

Error codes list more specific errors as follows:

211 A system status message.
214 A help message for a human reader follows.
220 Service ready.
221 Service closing.
250 Requested action taken and completed. The best message of them all.
251 The recipient is not local to the server, but it will accept and forward the message.
252 The recipient cannot be VRFYed, but the server accepts the message and attempts delivery.
354 Start message input and end with a line containing only a period (CRLF.CRLF). This indicates that the server is ready to accept the message itself.
421 The service is not available and the connection will be closed.
422 The recipient has exceeded their mailbox limit. It could also be that the delivery directory on the Virtual server has exceeded its limit.
431 Not enough disk space on the delivery server. Microsoft says this NDR may be reported as an out-of-memory error.
432 Classic temporary problem, the Administrator has frozen the queue.
441 Intermittent network connection. The server has not yet responded. Classic temporary problem. If it persists, you will also see a 5.4.x status code error.
442 The server started to deliver the message but then the connection was broken.
446 Too many hops. Most likely, the message is looping.
447 Problem with a timeout. Check receiving server connectors.
449 A DNS problem. Check your smart host setting on the SMTP connector. For example, check correct SMTP format. Also, use square brackets in the IP address [197.89.1.4] You can get this same NDR error if you have been deleting routing groups.
450 The requested command failed because the user's mailbox was unavailable.
451 The command has been aborted due to a server error. Not your fault.
452 The command has been aborted because the server has insufficient system storage.
465 Multi-language situation. Your server does not have the correct language code page installed.
500 The server could not recognize the command due to a syntax error.
501 A syntax error was encountered in command arguments.
502 This command is not implemented.
503 The server has encountered a bad sequence of commands.
504 A command parameter is not implemented.
51x Problem with email address.
510 Often seen with contacts. Check the recipient address.
511 Another problem with the recipient address. Maybe an Outlook client replied to a message while offline.
512 SMTP; 550 Host unknown. An error is triggered when the host name can’t be found. For example, when trying to send an email to bob@nonexistantdomain.com.
513 Another problem with contacts. Address field may be empty. Check the address information.
514 Two objects have the same address, which confuses the categorizer.
515 Destination mailbox address invalid.
516 Mailbox may have moved.
517 Problem with sender's mail attribute, check properties sheet in ADUC.
52x NDR caused by a problem with the large size of the email.
521 The message is too large. Else it could be a permissions problem. Check the recipient's mailbox.
522 The recipient has exceeded their mailbox limit.
523 Recipient cannot receive messages this big. Server or connector limit exceeded.
524 Most likely, a distribution list or group is trying to send an email. Check where the expansion server is situated.
530 Problem with MTA, maybe someone has been editing the registry to disable the MTA / Store driver.
531 Mail system full.
532 System not accepting network messages.
533 Remote server has insufficient disk space to hold email.
534 Message too big.
535 Multiple Virtual Servers are using the same IP address and port. Email probably looping.
540 DNS Problem. There is no DNS server that can resolve this email address.
541 No answer from host.
542 Bad connection.
543 Routing server failure. No available route.
544 Cannot find the next hop.
546 Tricky looping problem, a contact has the same email address as an Active Directory user. One user is probably using an Alternate Recipient with the same email address as a contact.
547 Delivery time-out. Message is taking too long to be delivered.
548 Bad recipient policy.
550 The requested command failed because the user's mailbox was unavailable (for example because it was not found, or because the command was rejected for policy reasons). Underlying SMTP 500 error. Our server tried ehlo, the recipient's server did not understand and returned a 550 or 500 error.
551 The recipient is not local to the server. The server then gives a forward address to try.
552 The action was aborted due to exceeded storage allocation. Possibly the disk holding the operating system is full. Or could be a syntax error if you are executing SMTP from a telnet shell.
553 The command was aborted because the mailbox name is invalid. Could also be More than 5,000 recipients specified.
554 The transaction failed.
555 Wrong protocol version.
563 More than 250 attachments.
571 Permissions problem. For some reason the sender is not allowed to email this account. Perhaps an anonymous user is trying to send mail to a distribution list.
572 Distribution list cannot expand and so is unable to deliver its messages.
573 Internal server error, IP address related.
574 Extra security features not supported.
575 Cryptographic failure. Try a plain message with encryption.
576 Certificate problem, encryption level may be too high.
577 Message integrity problem.

Most errors are fairly self explanatory if read carefully. Some common error messages, what they mean and what you can do about it, are listed here.
Some explanation might help first:

The email address consists of two parts (for the sake of this article). The username and the domain name. The username is the part in front of the "@" sign and the domain name is the part after the "@" sign. Example: support@microsoft.com, support is the username and microsoft.com is the domain name.
The email address you are sending to is referred to as the recipient, the person sending the email is the sender.
The recipient's mailbox and server is also referred to as the "destination mailbox" or the "destination server". The recipient's mailbox is also referred to as "e-mail account".
The email you send, is sent to a "mailbox". So if you send an email to john.doe@company.com, the recipient's mailbox would refer to "john.doe's" mailbox.

Domain name : Part After the @ sign.
Username/recipient : Part before the @ sign.

Now some error messages:

The destination server for this recipient could not be found in Domain Name Service (DNS).
- The domain name was invalid.
- Check spelling. Verify address.
- Error number 540
The e-mail account does not exist at the organization this message was sent to. Check the e-mail address, or contact the recipient directly to find out the correct address.
- The username part of the email address is invalid
- Check spelling. Verify address.
- Error number 511
Could not deliver the message in the time limit specified. Please retry or contact your administrator.
- The recipient's email server did not respond in a timely manner. Normally email will retry for 2 days before being rejected with this error.
- There was a problem at the recipient's mail server; there is not much you can do but retry.
- Error number 447
The message could not be delivered because the recipient's mailbox is full.
There was a SMTP communication problem with the recipient's email server. Please contact your system administrator. Requested action not taken: exceeded storage allocation.
- The recipient's mailbox was full, and would not accept more email.
- Contact the username and have them delete some messages in their mailbox. Send again.
- Error number 522

Friday, July 2, 2010

TelePresence CTMS - Interop troubleshooting

#*#000 = %23*%23000
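The mapping above is ordinary URL percent-encoding of the "#" character. As a quick check, here is a sketch using Python's standard library; keeping "*" unescaped via the `safe` argument is a choice made here to reproduce the string shown above.

```python
from urllib.parse import quote, unquote

# '#' is percent-encoded as %23; '*' is kept as-is through the safe argument.
encoded = quote("#*#000", safe="*")
print(encoded)           # prints %23*%23000
print(unquote(encoded))  # prints #*#000
```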

Sridhar to complete

Friday, May 28, 2010

General - How to Troubleshoot Linux Kernel Panics

Problem Description:

Kernel panics on Linux are hard to identify and troubleshoot. Troubleshooting kernel panics often requires reproducing a situation that occurs rarely and collecting data that is difficult to gather.

Solution Summary:

This document outlines several techniques that will help reduce the amount of time necessary to troubleshoot a kernel panic.

Technical Discussion:

What is a kernel panic?
As the name implies, the Linux kernel gets into a situation where it doesn't know what to do next. When this happens, the kernel gives as much information as it can about what caused the problem, depending on what caused the panic.

There are two main kinds of kernel panics:

Hard Panic : also known as Aieee!
Soft Panic : also known as Oops

What can cause a kernel panic?
Only modules that are located within kernel space can directly cause the kernel to panic. To see what modules are dynamically loaded, run lsmod; this shows all dynamically loaded modules (Dialogic drivers, LiS, SCSI driver, filesystem, etc.). In addition to these dynamically loaded modules, components that are built into the kernel (memory map, etc.) can cause a panic.

Since hard panics and soft panics are different in nature, we will discuss how to deal with each separately.

How to Troubleshoot a Hard Kernel Panic

Hard Panics Symptoms:

Machine is completely locked up and unusable.
Num Lock / Caps Lock / Scroll Lock keys usually blink.
If in console mode, dump is displayed on monitor (including the phrase Aieee!).
Similar to Windows Blue Screen.

Hard panics causes:

The most common cause of a hard kernel panic is when a driver crashes within an interrupt handler, usually because it tried to access a null pointer within the interrupt handler. When this happens, that driver cannot handle any new interrupts and eventually the system crashes. This is not exclusive to Dialogic drivers.

Hard panics information to collect:
Depending on the nature of the panic, the kernel will log all information it can prior to locking up. Since a kernel panic is a drastic failure, it is uncertain how much information will be logged. Below are key pieces of information to collect. It is important to collect as many of these as possible, but there is no guarantee that all of them will be available, especially the first time a panic is seen.

/var/log/messages: sometimes the entire kernel panic stack trace will be logged there
Application / Library logs (RTF, cheetah, etc.): may show what was happening before the panic
Other information about what happened just prior to the panic, or how to reproduce it
Screen dump from console. Since the OS is locked, you cannot cut and paste from the screen. There are two common ways to get this info:

Digital picture of the screen (preferred, since it's quicker and easier)
Copying the screen with pen and paper or typing it into another computer

If the dump is not available either in /var/log/messages or on the screen, follow these tips to get a dump:

If in GUI mode, switch to full console mode: no dump info is passed to the GUI (not even to a GUI shell).
Make sure the screen stays on during the full test run: if a screen saver kicks in, the screen won't return after a kernel panic. Use these settings to ensure the screen stays on.

setterm -blank 0
setterm -powerdown 0
setvesablank off

From console, copy dump from screen (see above).
Hard panics Troubleshooting when a full trace is available
The stack trace is the most important piece of information to use in troubleshooting a kernel panic. It is often crucial to have a full stack trace, something that may not be available if only a screen dump is provided: the top of the stack may scroll off the screen, leaving only a partial stack trace. If a full trace is available, it is usually sufficient to isolate root cause. To identify whether or not you have a large enough stack trace, look for a line with EIP, which will show what function call and module caused the panic. In the example below, this is shown in the following line:
EIP is at _dlgn_setevmask [streams-dlgnDriver] 0xe

If the culprit is a Dialogic driver you will see a module name with:
streams-xxxxDriver (xxxx = dlgn, dvbm, mercd, etc.)

Hard panic full trace example:

Unable to handle kernel NULL pointer dereference at virtual address 0000000c
printing eip:
f89e568a
*pde = 32859001
*pte = 00000000
Oops: 0000
Kernel 2.4.9-31enterprise
CPU: 1
EIP: 0010:[] Tainted: PF
EFLAGS: 00010096
EIP is at _dlgn_setevmask [streams-dlgnDriver] 0xe
eax: 00000000 ebx: f65f5410 ecx: f5e16710 edx: f65f5410
esi: 00001ea0 edi: f5e23c30 ebp: f65f5410 esp: f1cf7e78
ds: 0018 es: 0018 ss: 0018
Process pwcallmgr (pid: 10334, stackpage=f1cf7000)
Stack: 00000000 c01067fa 00000086 f1cf7ec0 00001ea0 f5e23c30 f65f5410 f89e53ec
f89fcd60 f5e16710 f65f5410 f65f5410 f8a54420 f1cf7ec0 f8a4d73a 0000139e
f5e16710 f89fcd60 00000086 f5e16710 f5e16754 f65f5410 0000034a f894e648
Call Trace: [setup_sigcontext+218/288] setup_sigcontext [kernel] 0xda
Call Trace: [] setup_sigcontext [kernel] 0xda
dlgnwput [streams-dlgnDriver] 0xe8
Sm_Handle [streams-dlgnDriver] 0x1ea0
intdrv_lock [streams-dlgnDriver] 0x0
Gn_Maxpm [streams-dlgnDriver] 0x8ba
Sm_Handle [streams-dlgnDriver] 0x1ea0
lis_safe_putnext [streams] 0x168
__insmod_streams-dvbmDriver_S.bss_L117376 [streams-dvbmDriver] 0xab8
dvbmwput [streams-dvbmDriver] 0x6f5
dvwinit [streams-dvbmDriver] 0x2c0
lis_safe_putnext [streams] 0x168
lis_strputpmsg [streams] 0x54c
__insmod_streams_S.rodata_L35552 [streams] 0x182e
sys_putpmsg [streams] 0x6f
[system_call+51/56] system_call [kernel] 0x33
system_call [kernel] 0x33
Nov 28 12:17:58 talus kernel:
Nov 28 12:17:58 talus kernel:
Code: 8b 70 0c 8b 06 83 f8 20 8b 54 24 20 8b 6c 24 24 76 1c 89 5c
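Checking a saved trace for the EIP line is easy to automate. The following minimal Python sketch assumes the "EIP is at function [module] offset" layout seen in the example above; the function name `find_culprit` is hypothetical.

```python
import re

# Matches lines such as: EIP is at _dlgn_setevmask [streams-dlgnDriver] 0xe
EIP_RE = re.compile(r"EIP is at\s+(?P<function>\S+)\s+\[(?P<module>[^\]]+)\]")


def find_culprit(trace_text):
    """Return (function, module) from the first EIP line, or None if absent."""
    for line in trace_text.splitlines():
        match = EIP_RE.search(line)
        if match:
            return match.group("function"), match.group("module")
    return None


trace = """EFLAGS: 00010096
EIP is at _dlgn_setevmask [streams-dlgnDriver] 0xe
eax: 00000000 ebx: f65f5410 ecx: f5e16710 edx: f65f5410"""
print(find_culprit(trace))
```

A result of `None` suggests that only a partial trace was captured.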

Hard panics Troubleshooting when a full trace is not available
If only a partial stack trace is available, it can be tricky to isolate the root cause, since there is no explicit information about what module or function call caused the panic. Instead, only commands leading up to the final command will be seen in a partial stack trace. In this case, it is very important to collect as much information as possible about what happened leading up to the kernel panic (application logs, library traces, steps to reproduce, etc).

Hard panic partial trace example (note there is no line with EIP information)
ip_rcv [kernel] 0x357
sramintr [streams_dlgnDriver] 0x32d
lis_spin_lock_irqsave_fcn [streams] 0x7d
inthw_lock [streams_dlgnDriver] 0x1c
pwswtbl [streams_dlgnDriver] 0x0
dlgnintr [streams_dlgnDriver] 0x4b
Gn_Maxpm [streams_dlgnDriver] 0x7ae
__run_timers [kernel] 0xd1
handle_IRQ_event [kernel] 0x5e
do_IRQ [kernel] 0xa4
default_idle [kernel] 0x0
default_idle [kernel] 0x0
call_do_IRQ [kernel] 0x5
default_idle [kernel] 0x0
default_idle [kernel] 0x0
default_idle [kernel] 0x2d
cpu_idle [kernel] 0x2d
__call_console_drivers [kernel] 0x4b
call_console_drivers [kernel] 0xeb
Code: 8b 50 0c 85 d2 74 31 f6 42 0a 02 74 04 89 44 24 08 31 f6 0f
<0> Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing]

Hard panics using kernel debugger (KDB)
If only a partial trace is available and the supporting information is not sufficient to isolate root cause, it may be useful to use KDB. KDB is a tool that is compiled into the kernel that causes the kernel to break into a shell rather than lock up when a panic occurs. This enables you to collect additional information about the panic, which is often useful in determining root cause.

Some important things to note about using KDB:

If this is a potential Cisco issue, technical support should be contacted prior to the use of KDB.
You must use a base kernel, i.e. the 2.4.18 kernel instead of 2.4.18-5 from RedHat. This is because KDB is only available for the base kernels, and not the builds created by RedHat. While this does create a slight deviation from the original configuration, it usually does not interfere with root cause analysis.
You need different Cisco VOS drivers compiled to handle the specific kernel.

How to Troubleshoot a Soft Kernel Panic

Soft panics symptoms:

Much less severe than hard panic.
Usually results in a segmentation fault.
An oops message is logged; search /var/log/messages for the string Oops.
Machine still somewhat usable (but should be rebooted after information is collected).

Soft panics causes:

Almost anything that causes a module to crash when it is not within an interrupt handler can cause a soft panic. In this case, the driver itself will crash, but will not cause catastrophic system failure since it was not locked in the interrupt handler. The same possible causes exist for soft panics as do for hard panics (i.e. accessing a null pointer during runtime).

Soft panics information to collect:

When a soft panic occurs, the kernel will generate a dump that contains kernel symbols; this information is logged in /var/log/messages. To begin troubleshooting, use the ksymoops utility to turn kernel symbols into meaningful data.

To generate a ksymoops file:

Create new file from text of stack trace found in /var/log/messages. Make sure to strip off timestamps, otherwise ksymoops will fail.

Run ksymoops on new stack trace file:

Generic: ksymoops -o [location of Application drivers] filename

Example: ksymoops -o /lib/modules/2.4.18-5/misc ksymoops.log

All other defaults should work fine

For a man page on ksymoops, see the following webpage:
http://gd.tuwien.ac.at/linuxcommand.org/man_pages/ksymoops8.html
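Stripping the timestamps by hand is error-prone on a long trace. The following is a minimal Python sketch of that step, assuming the syslog prefix has the form seen earlier in this post (`Nov 28 12:17:58 talus kernel:`); the function name and the sample lines are illustrative only, and other syslog formats would need a different pattern.

```python
import re

# Assumed prefix: month, day, time, hostname, then "kernel:".
SYSLOG_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s?"
)

def strip_syslog_prefix(lines):
    """Remove the timestamp/host/kernel prefix, leaving only the trace text."""
    return [SYSLOG_PREFIX.sub("", line.rstrip("\n")) for line in lines]

# Illustrative lines in the style of /var/log/messages.
sample = [
    "Nov 28 12:17:58 talus kernel: Oops: 0000",
    "Nov 28 12:17:58 talus kernel: Code: 8b 70 0c 8b 06 83 f8 20",
    "Nov 28 12:17:58 talus kernel:",
]

for line in strip_syslog_prefix(sample):
    print(line)
```

Write the cleaned lines to a file such as ksymoops.log and pass that file to ksymoops as shown in the example command above.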

So you are trying to start Linux for the first time and wham! You get messages like:

Unable to mount root device.
Kernel panic - not syncing.

What do I do now? Oh, how I love Windows!!

Here's the scoop:

(1) The first part of the system that starts running is the boot loader, usually grub. This is the program that loads Linux, and/or Windows if you so desire. (The master boot record, or MBR, enables the computer to load grub.)

(2) The first thing that Grub needs to know is where the kernel is. It gets this from the /boot/grub/grub.conf file. The way that you specify the correct drive and partition in Grub, like (hd0,0), is a little different from what you use in ordinary Linux. The kernel will be in some file named vmlinuz.

(3) Once Grub has loaded the kernel into memory, the first thing that the kernel needs to know is, where is the root filesystem? The root= parameter is passed to the kernel to provide this information. Notice that now you are talking to Linux, and you identify devices in Linux's terms, like /dev/hda23.

(4) Given this information, Linux is going to try to mount the root filesystem, that is, prepare it for use. The most common mistake at this point is that you've specified the wrong device in step #3. Unfortunately, the message that results is rather nasty looking.

When Linux doesn't know how to proceed, as in this case, it says kernel panic and it stops. But, even then, it tries to go down gracefully. It tries to write anything to disk that hasn't been written out (an operation called syncing, for some darn-fool reason), and if it succeeds in doing so it will say not syncing. What's totally misleading about this message combination is that it implies, incorrectly, that the reason for the panic is not syncing, when actually the reason for the panic will be found in the preceding few lines.

You might see the message, tried to kill 'init'. That really means that a program called init died, which it is not allowed to ever do. init is a very special program in Linux: the first program created when the machine starts.

So, basically, when you get these messages on startup the situation is really a lot more dreadful looking than it actually is. You have probably just made a tpyo when entering the information in grub.conf.

(Another common place to make a typo is in /etc/fstab, which tells Linux where all the other drives are.)

So what do you do? If you're doing a first-time install you can just start over. Otherwise, you need to boot a separate CD-ROM, which will give you a stand-alone Linux installation from which you can edit the offending files.
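Once you have booted from the rescue CD, the first thing to look at is the root= value that grub.conf hands to the kernel. The following is a minimal Python sketch that pulls that value out of each kernel line so you can compare it with your real partition layout. The function name and the sample configuration are hypothetical, and it assumes the GRUB (legacy) grub.conf layout described in step (2), where each entry has a line starting with the word kernel.

```python
def kernel_root_args(grub_conf_text):
    """Return the root= value from each 'kernel' line of a GRUB (legacy) config."""
    roots = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        # Only lines whose first word is "kernel" carry kernel parameters.
        if not words or words[0] != "kernel":
            continue
        for word in words[1:]:
            if word.startswith("root="):
                roots.append(word[len("root="):])
    return roots

# Hypothetical grub.conf contents, for illustration only.
sample = """\
default=0
timeout=10
title Linux
    root (hd0,0)
    kernel /vmlinuz ro root=/dev/hda2
    initrd /initrd.img
"""

print(kernel_root_args(sample))
```

Note that the `root (hd0,0)` line is Grub's own notation for where to find the kernel file and is deliberately ignored here; only the root= parameter on the kernel line is what Linux uses to mount the root filesystem.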

Explained: kernel panic - not syncing - attempted to kill init

--------------------------------------------------------------------------------

When the kernel gets into a situation where it does not know how to proceed (most often during booting, but at other times), it issues a kernel panic by calling the panic(msg) routine defined in kernel/panic.c. (Good name, huh?) This is a call from which No One Ever Returns.

The panic() routine adds text to the front of the message, telling you more about what the system was actually doing when the panic occurred, basically how big and bad the trail of debris in the filesystem is likely to be. This is where the not syncing part comes from, and when you see that, it's good. (panic() does try to issue a sync() system-call to push all buffered data out to the hard-disks before it goes down.)

The second part of the message is what was provided by the original call to panic(). For example, we find panic("Tried to kill init!") in kernel/exit.c.

So, what does this actually mean? Well, in this case it really doesn't mean that someone tried to kill the magical init process (process #1), but simply that it tried to die. This process is not allowed to die or to be killed.

When you see this message, it's almost always at boot-time, and the real messages, the cause of the actual failure, will be found in the startup messages immediately preceding this one. This is often the case with kernel-panics. init encountered something really bad, and it didn't know what to do, so it died, so the kernel died too.

BTW, the kernel-panic code is rather cute. It can blink lights and beep the system-speaker in Morse code. It can reboot the system automagically. Obviously the people who wrote this stuff encountered it a lot.

In diagnosing, or at least understanding, kernel-panics, I find it extremely helpful to have on-hand a copy of the Linux source-code, which is usually stored someplace like /usr/src/linux-2.x. You can use the grep utility to locate the actual code which caused the panic to occur.
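If grep is not at hand, the same search can be sketched in Python. This is an illustrative helper, not part of any kernel tooling: `find_panic_calls` is a made-up name, it only looks at .c and .h files, and it only catches panic() calls whose message is a string literal on the same line as the call.

```python
import os
import re

# Matches panic("some message" ... and captures the message text.
PANIC_CALL = re.compile(r'\bpanic\s*\(\s*"([^"]*)"')

def find_panic_calls(source_root, needle):
    """Yield (path, line_number, message) for panic("...") calls containing needle."""
    for folder, _subdirs, files in os.walk(source_root):
        for name in files:
            if not name.endswith((".c", ".h")):
                continue
            path = os.path.join(folder, name)
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    match = PANIC_CALL.search(line)
                    if match and needle in match.group(1):
                        yield path, number, match.group(1)

# Adjust the path to wherever the kernel source is actually installed;
# nothing is printed if the directory does not exist.
for path, number, message in find_panic_calls("/usr/src/linux-2.x", "kill init"):
    print(f"{path}:{number}: {message}")
```

Searching for a fragment of the message (here "kill init") rather than the full text helps when the wording differs slightly between kernel versions.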

CTS System - Secure registration analysis

Get the CTS logs from this location:

http://docs.google.com/leaf?id=0B47-vpuz_NefN2VmODNiYzUtMzhlYS00OGY1LTkzYzMtN2E5ZDIxNDM3ZWFl&hl=en

You will find 2 files, Registration1 and Registration2.

Registration1 contains 4 different types of system bootup:

No security, MIC, LSC and AUTHString, including CTS logs and packet capture

Registration2 contains the CTS logs analysis for a secure registration using a MIC certificate.

Look for ##DEBUG## in the log files

CTS registration

Logs needed:

CCM SDI/SDL traces

Sniffer capture

CTS logs

CTS sysop logs, cdp, cca, ccafg, cmr, cma, sip, Srtp*, Secd*, tsps, rc.log, sysm

What to check?

Start with CTS SYSOP logs

Verify CCAFG logs to confirm CUCM file is retrieved

Check sysm files for system startup

If multi-screen verify secondaries and look for the same logs

Check CCA logs (SIP messages are included in CCA in human readable timestamps)

Check SIP logs (Epoch time)

If Security is configured check:

- SECD

Check Keep alive timers and CUCM configuration

Verify CUCM Registration

Check CUCM SDI/SDL Traces

Check system has initialized successfully from Phone UI (Take ScreenShot)

Check TSPS for Midlet/Phone UI XML

Use tcpdump or utils network capture command
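Because the CCA logs use human readable timestamps while the SIP logs use epoch time, it helps to convert one into the other before lining the two up. The following is a minimal Python sketch that assumes the SIP log timestamp is a plain count of seconds since 1970-01-01 UTC; if the value in your logs is in milliseconds, divide it by 1000 first. The function name and the sample value are illustrative and not taken from a real CTS log.

```python
from datetime import datetime, timezone

def epoch_to_utc(epoch_seconds):
    """Convert an epoch timestamp (seconds since 1970-01-01 UTC) to a UTC string."""
    moment = datetime.fromtimestamp(float(epoch_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")

# Illustrative value only.
print(epoch_to_utc("1275000000"))  # prints "2010-05-27 22:40:00 UTC"
```

The result is in UTC, so remember to apply the time zone configured on the CTS before comparing it with the CCA log entries.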

CTS System - Basic MultiPoint call analysis

The following information will help you perform a basic call control analysis for a TelePresence call between 3 CTS endpoints and a CTMS. In this call we have 1 CTS 3200, 1 CTS 3000 and 1 CTS 1300 system.

Versions:

CUCM 7.1.3.32900-4 Pub and Sub

CTS 1.6.5

CTS 3200 and CTS 3000 use ts4.local (Presentation codec).

Calls from 3001, 3002 and 3003 to 4006 around 15:45 PST

Get the files from this location and look for the word: ##DEBUG##

http://docs.google.com/leaf?id=0B47-vpuz_NefMTRmNDgzYjMtZGI0MS00NDBkLWJjMDYtODNjNDNhZTQ2OTk2&hl=en

Logs needed:

CCM SDI/SDL traces (Detailed level)

CTMS Logs set to INFO level

CTMS sysop, sip, ccs, switching, alarm, rtp, keyexchange*

CTS Logs

CTS sysop logs, rtp, cca, cmr, cma, sip, Srtp*, Secd*, Keyexchange*, rc.log, sysm

Sniffer capture (Optional)

+++++DEBUG+++++

You will find details in CUCM SDI, SDL, CTMS and CTS traces (CCA, CMR, OSD and TSPS) for each call

What to check?

Use CTS SYSOP logs

Check CCA logs

Check SIP logs

Check CUCM traces

Check CMA

Check CMR

If Security is configured check:

- SECD

- KEYEXCHANGE

- SRTP

Check CUCM Configuration/Infrastructure (i.e. BEA, SBCs, etc)

Check CUCM Traces SDI/SDL traces

Use CTMS_SYSOP

Check SIP.LOGx

Check CCS

Check SWITCHING

Check RTP

Check ALARM

If Security is configured check:

- KEYEXCHANGE

Troubleshooting

•Verify TelePresence SW Compatibility matrix

•From CTS CLI, verify Calling Services are up

•Verify System status in CTS GUI

•Verify system is registered

•Verify Region Bandwidth in CUCM (Check Video and Audio BW)

•Verify Location

•Verify CSS (Calling Search Space); use Dialed Number Analyzer

•Check Route Patterns

•Verify SIP Trunk Security Profile matches what is configured in CTMS (Protocol, port)

•Check Region between SIP trunk for CTMS and Endpoints

•Verify Static/Scheduled or Ad-hoc meeting is configured in CTMS

•Check packet flow

•Use sniffer capture (Span port is recommended)

CTS - Certification - TelePresence 642-185 Exam

This is the material I used to clear 642-185 Exam:

Cisco TelePresence Installations Specialist

The TelePresence book is available to read for free on Google Books:

http://books.google.com/books?id=1fKoJX40KtYC&printsec=frontcover&dq=telepresence+cisco&ei=Mr_6S_mmDqbukwSdiOHJAw&cd=1#v=onepage&q&f=false

SRND Guide

http://www.cisco.com/en/US/docs/solutions/Enterprise/Video/TP-Book.html

Certification training

Video 101: Introduction to Video Concepts

Troubleshooting Cisco TelePresence

4. Room design:

Hardware Guides for each CTSXXX System

Thursday, May 27, 2010

CTS System - TelePresence SIP H264 profile-level-id

In a SIP SDP, an H.264 video capability can appear like this:

CTMS:

a=rtpmap:96 mpeg4-generic/48000
a=fmtp:96 profile-level-id=16;streamtype=5;mode=AAC-hbr;config=11B0;sizeLength=13;indexLength=3;indexDeltaLength=3;constantDuration=480
a=rtpmap:0 PCMU/8000
a=rtpmap:99 L16/48000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
m=video 16390 RTP/AVP 112
b=TIAS:4000000
a=rtpmap:112 H264/90000
a=fmtp:112 profile-level-id=ABCDEF;packetization-mode=1

CTS:

a=rtpmap:96 mpeg4-generic/48000
a=fmtp:96 profile-level-id=16;streamtype=5;mode=AAC-hbr;config=B98C00;sizeLength=13;indexLength=3;indexDeltaLength=3;constantDuration=480
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
m=video 28242 RTP/AVP 112
b=TIAS:4000000
a=rtpmap:112 H264/90000
a=fmtp:112 profile-level-id=4d0028;sprop-parameter-sets=R00AKAmWUgDwBDyA,SGE7iJyA;packetization-mode=1

RFC 3984 defines this (page 38).

"If the profile-level-id parameter is used for capability exchange or session setup procedure, it indicates the profile that the codec supports and the highest level supported for the signaled profile. The profile-iop byte indicates whether the codec has additional limitations whereby only the common subset of the algorithmic features and limitations of the profiles signaled with the profile-iop byte and of the profile indicated by profile_idc is supported by the codec. For example, if a codec supports only the common subset of the coding tools of the Baseline profile and the Main profile at level 2.1 and below, the profile-level-id becomes 42E015, in which 42 stands for the Baseline profile, E0 indicates that only the common subset for all profiles is supported, and 15 indicates level 2.1."

The question is how does the 15 convert to level 2.1?

Page 298 of the ITU spec for H.264 documents how to convert this hex value into a level.

A level to which the bitstream conforms shall be indicated by the syntax elements level_idc and constraint_set3_flag as follows.

– If level_idc is equal to 11 and constraint_set3_flag is equal to 1, the indicated level is level 1b.

– Otherwise (level_idc is not equal to 11 or constraint_set3_flag is not equal to 1), level_idc shall be set equal to a value of ten times the level number specified in Table A-1 and constraint_set3_flag shall be set equal to 0.

This means that to convert the last byte, you convert the hex to decimal and divide it by 10: 0x15 = decimal 21 = level 2.1.
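The same arithmetic can be applied to the CTS value shown earlier: in 4d0028, 0x4d is decimal 77 (the Main profile) and 0x28 is decimal 40, which gives level 4.0. The following is a minimal Python sketch of that conversion. The function name and the small profile table are my own additions, the table lists only a few common profile_idc values, and the level 1b check follows the rule quoted above, reading constraint_set3_flag as bit 0x10 of the middle byte.

```python
# A few common profile_idc values; not an exhaustive list.
PROFILE_NAMES = {66: "Baseline", 77: "Main", 88: "Extended", 100: "High"}

def parse_profile_level_id(value):
    """Split a 6-hex-digit profile-level-id into profile_idc, profile-iop and level."""
    if len(value) != 6:
        raise ValueError("profile-level-id must be exactly 6 hex digits")
    profile_idc = int(value[0:2], 16)
    profile_iop = int(value[2:4], 16)
    level_idc = int(value[4:6], 16)
    # Assumption: constraint_set3_flag is bit 0x10 of the profile-iop byte.
    constraint_set3 = bool(profile_iop & 0x10)
    if level_idc == 11 and constraint_set3:
        level = "1b"
    else:
        # level_idc is ten times the level number, e.g. 21 -> "2.1".
        level = f"{level_idc / 10:.1f}"
    return {
        "profile_idc": profile_idc,
        "profile": PROFILE_NAMES.get(profile_idc, "unknown"),
        "profile_iop": profile_iop,
        "level": level,
    }

print(parse_profile_level_id("42E015"))  # the RFC 3984 example
print(parse_profile_level_id("4d0028"))  # the CTS value from the SDP above
```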

Level limits for each are documented at http://rob.opendot.cl/index.php/useful-stuff/h264-profiles-and-levels/. You can also look at table A-1 of the H.264 ITU spec linked below.

http://www.itu.int/rec/T-REC-H.264/en

Internally CUCM converts these to H.241 levels. This mapping is in the ITU-T H.241 table 8-4.

http://www.itu.int/rec/T-REC-H.241/en

Thanks to Ryan Ratliff for compiling information above
