Automatic Crystal Centering

Cornell University

Ismail Degani, Dave Schuller, Chris Heaton, Richard Gillilan

Introduction

• X-Ray Crystallography

It is not the intention of this paper to provide a thorough explanation of X-Ray Crystallography; only what is necessary to the understanding of the software is discussed. For more information regarding X-Ray Crystallography, please visit:

http://www-structure.llnl.gov/Xray/101index.html

X-Ray Crystallography is a technique for determining molecular structure from a crystal lattice. It is especially useful for bio-molecules such as DNA and proteins, which can be far too complex to investigate with other techniques (Nuclear Magnetic Resonance, for example). A thorough understanding of protein structures is vital to the creation of new drugs, cures for diseases, and therapeutic agents, to name just a few applications.

X-Ray Crystallography exploits the fact that X-rays have the proper wavelength (10^-8 cm) to be scattered by the electron cloud of an atom. This also holds true for the electron density of a molecular crystal. Roughly speaking, these scattered X-rays form diffraction patterns that are collected by a CCD device and are used to construct a very accurate model of the crystal. So, as the picture below illustrates, an incoming beam hits the center of the crystal, and spots of high intensity on the area detector correspond to planes of atoms present in the crystal lattice.

• The need for software and automation

As crystals get increasingly smaller and more complex, precise machinery that can position the crystal in the exact center of the incoming x-ray beam is required. Because crystals can be very hard to see, we need to rely on high powered digital video equipment capable of displaying crisp pictures of the crystal with the aid of real-time image enhancement.

And finally, as the demand for X-ray crystallography increases, automating as much of the centering and diffracting process as possible is essential to achieving a higher level of overall efficiency.

The picture below shows the setup of the F-1 hutch at the Cornell High Energy Synchrotron Source. It is very much like the diagram above, except that our goniometer has only one axis of rotation and three degrees of translational freedom. In addition, we have a Firewire digital camera that provides a 1024x768 resolution live stream to our software package, a snake-light used to illuminate the crystal, and a cryo-cooling system (not shown) useful for reducing radiation damage to the crystal during X-Ray bombardment.

It is the aim of our software centering package to ultimately do the following:

1. Adjust the rotation axis height to intersect the incoming x-ray beam.

2. Align the center of the crystal with the center of rotation. This is to ensure that when the phi axis motor rotates the crystal, the crystal’s center does not move.

Hardware Components:

• Sony DFW X700 IIDC 2.0 Compliant Firewire Camera. Outputs images at 15 FPS at a resolution of 1024x768 pixels in YUV4:2:2 format.

• Compumotor stepper motors that control the goniometer. They are controllable via socket or serial interface, and are accurate to the nanometer.

• Video Server – x86 1GHz PC running RedHat Linux 9.0

Software Design

The crystal centering software consists of several interconnected parts:

Video Server

In short, the video server is a very lightweight layer between the camera and the network.

• Written entirely in C (future versions may be written using C++/Qt objects).

• Connects to the camera via the Linux libraries libdc1394 and libraw1394.

• Low-level interface allows direct access to the camera’s memory buffer (no memcpy’s required when sending the buffer to the compressor).

• XVID CODEC (http://www.xvid.org) used to compress the raw YUV4:2:2 stream into an ISO compliant MPEG4 stream.

o XVID processes data in real-time at ~7 fps on current hardware (1 GHz Athlon). The production server should be powerful enough to deliver 15 fps.

o XVID utilizes assembly routines written for many hardware architectures to ensure the fastest possible computation rate.

o XVID supports dynamic bit-rate adjustment according to network traffic conditions / bandwidth limitations / user preference.

• Once data is compressed, it is transmitted via buffered TCP packets to the client(s). (UDP is being considered for an extra performance boost.)
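The framing used on the TCP connection is not described here. As a minimal sketch, assuming each compressed frame is sent with a 4-byte big-endian length prefix (a common convention, not necessarily what the video server does), the sending and receiving sides could look like this in Python. The names `send_frame`, `recv_frame`, and `demo` are hypothetical.

```python
import socket
import struct

# Assumed framing: 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct("!I")


def send_frame(sock, payload):
    """Send one compressed frame preceded by its length."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, raising if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one length-prefixed frame."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def demo():
    """Round-trip a few fake frames over a local socket pair."""
    server, client = socket.socketpair()
    try:
        frames = [b"frame-0", b"", b"x" * 2000]
        for frame in frames:
            send_frame(server, frame)
        return [recv_frame(client) for _ in frames]
    finally:
        server.close()
        client.close()
```

Because TCP is a byte stream, a receiver cannot rely on one `recv` call returning one frame; the explicit length lets the client cut the stream back into frames before handing them to the decoder.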

This implementation is certainly not the first. Many designs were tried, but only this one stood up to the performance requirement. Three failed prototypes to take note of are:

• A Pure Java Video Server

o In the interest of keeping our project platform independent, we initially sought to create a Pure Java solution.

o For Linux, camera access was provided to the server by libdc1394 via JNI.

o A Windows implementation was not developed, but camera access would be provided via a standard VFW (Video for Windows) interface.

o Streaming video compression was provided by JMF (the Java Media Framework). This package included native code to boost performance.

o The server outputted RTP (Real-time Transport Protocol) packets in order to compensate for network delays and compression lag time, ensuring a real-time broadcast.

Even though the actual image retrieval and compression was done without the overhead of the virtual machine, the Java solution still produced only approximately 1 frame every three to four seconds. Despite the elegance and portability of the system, it had to be discarded.

• A Hybrid Java/Open Darwin Server

o In an attempt to salvage existing Java code, but shift more work to native processing, we implemented a solution where all of the JNI code was left intact, but JMF was replaced by Apple’s Open Darwin Streaming system.

o Darwin was open source, and supported ISO compliant MPEG4 streams.

o Darwin was very robust: it could handle multiple streams, clients, and media devices.

Replacing JMF with Open Darwin did show some performance increase. The frame rate transmitted was now 1 fps. This was still unacceptable.

• LIBDC1394 / FFMPEG

o Our next try was to implement a completely native solution using another off-the-shelf streamer. We wrote a small C application that printed the camera’s data to stdout (Linux standard output). At the command line, a pipe was created that connected our C program to FFMPEG, a compressing/streaming package (http://www.ffmpeg.org).

o FFMPEG is a simple, command line based compressor / server set.

o FFMPEG can support multiple clients, and produce streams of several different formats, resolutions and bit-rates for a given data feed.

This server revision produced about 4-6 frames per second, but the output was approximately 10 seconds behind real-time due to FFMPEG’s buffering system. This in itself was unacceptable due to the fact that the client had to control motors. The FFMPEG system was also highly unstable, because FFMPEG is still in its beta stages.

Compumotor Interface

• Compumotors are fully programmable.

• Built in EEPROM can store macros such as “rotate phi 90 degrees” etc., which can be invoked by a remote client via telnet/sockets, or a local client via a serial interface.

CCD_BL_CM

• CCD_BL_CM is a software suite designed to collect datasets from crystal diffraction.

• Both CCD_BL_CM and the centering GUI require access to the Compumotors. Since there are issues with multiple direct connections to the Compumotors, the centering GUI communicates with the Compumotors through CCD_BL_CM.

Client

Due to performance issues, we are locked into the Linux platform on the server side, but this is not so on the client side. Many prototypes of clients have been developed for cross-platform environments.

Web Browser Client

• The most versatile client.

• Written entirely in HTML.

• The user would center the crystal by clicking on directional and rotational buttons that are simple image hyperlinks.

• The intermediary between the web client and the Compumotors is a Java Servlet engine known as Tomcat. (www.apache.org)

• Tomcat converts HTTP based POST requests into telnet motor commands

• HTML Frames are utilized to provide an uninterrupted video stream even as the control window is refreshing.

• Session control is maintained by Tomcat to ensure that only one user is using the Compumotors at a time.

• An SSL 128 bit encryption layer protects motor control against malicious access.

• The MPEG4 compliant stream is embedded via <object> tags for Internet Explorer and as a plug-in for Netscape Navigator.
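The one-user-at-a-time rule can be illustrated with a small lock. The sketch below is in Python rather than the Java Servlet code that Tomcat would run, and the `MotorLease` class and its methods are hypothetical names used only for illustration, not part of the actual servlet.

```python
import threading


class MotorLease:
    """Grants motor control to a single session at a time (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def acquire(self, session_id):
        """Return True if this session now holds (or already held) the motors."""
        with self._lock:
            if self._owner is None or self._owner == session_id:
                self._owner = session_id
                return True
            return False

    def release(self, session_id):
        """Give up control; only the current owner may release."""
        with self._lock:
            if self._owner == session_id:
                self._owner = None
                return True
            return False


lease = MotorLease()
first = lease.acquire("session-A")     # first user gets the motors
second = lease.acquire("session-B")    # second user is refused
released = lease.release("session-A")  # first user logs out
third = lease.acquire("session-B")     # now the second user succeeds
```

A request handler would check `acquire` before forwarding any motor command and answer with a "busy" page otherwise.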

Flash Client

• While maintaining the advantages of a browser based implementation, the Flash client adds a more feature rich and interactive user experience.

• Flash clients allow a user to control the quality and size of the video stream remotely.

• Several different motor control interfaces are available depending on how a user wants to manipulate the crystal.

• Pre-built components from Macromedia keep code maintenance simple.

Qt / X11 Client

The Qt / X11 client is by far the most complex and feature rich. This client is the main focus of current development and will become our production client as soon as it is completed. Because it is a full-fledged application, it can do many things browser based clients cannot. One of the most notable is image recognition, which will be discussed in the next section.

• The application is designed in two parts:

o MainPanel

▪ Responsible for maintaining a connection to CCD_BL_CM for motor control.

▪ Contains all control widgets, such as X, Y, Z movement, and Phi rotation options (continuous, 90 degrees, etc.).

▪ Displays a video/motor status window to troubleshoot errors and log motor requests.

▪ Used to set x-ray beam position (crosshairs represent beam center, and circle represents beam radius).

▪ Used to configure the following video / motor settings: (next revision will include camera zoom / brightness / contrast / sharpness / gamma correction adjustment)

▪ Uses robust Qt threaded sockets to dynamically respond to an incoming message from the motor server.

▪ Qt’s system of signals and slots facilitates rapid code development and feature implementation.

▪ All GUI design work is done with Qt Designer.

o VideoWidget

▪ An independent component that is combined with the MainPanel at compile time. The VideoWidget is the center panel that displays the live incoming crystal feed.

▪ Links with the XVID library for decompressing MPEG-4 video.

▪ Uses libXv (X-Windows Video Extension), a library that gives graphics intensive software direct access to hardware accelerated video cards. LibXv allows the VideoWidget to avoid costly YUV -> RGB conversion in software. The picture is blitted to the graphics buffer as is (using XvPutImage).

▪ Qt has no support for hardware accelerated devices, because they compromise its platform independent component set. In order to use libXv, we circumvent the Qt layer by obtaining a pointer to the underlying X-Window from Qt, and drawing directly to it. It is an unfortunate hack, but performance increases almost threefold due to it. Future versions of Qt might make it possible to do without.
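To give a sense of the work libXv saves, the following sketch shows the per-pixel arithmetic of a software YUV 4:2:2 to RGB conversion. It assumes the UYVY byte order (U, Y0, V, Y1) and full-range BT.601 coefficients; the byte order and value range actually delivered by the camera and decoder may differ, and the function names are hypothetical.

```python
def clamp(value):
    """Round to the nearest integer and clip to the 0..255 byte range."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Convert one YUV sample (assumed full-range BT.601) to an RGB triple."""
    d = u - 128
    e = v - 128
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp(r), clamp(g), clamp(b)


def uyvy_to_rgb(data):
    """Expand packed 4:2:2 data (assumed U Y0 V Y1 order) into RGB pixels.

    Every 4 input bytes describe 2 pixels that share one U and one V sample.
    """
    if len(data) % 4 != 0:
        raise ValueError("4:2:2 data length must be a multiple of 4 bytes")
    pixels = []
    for i in range(0, len(data), 4):
        u, y0, v, y1 = data[i:i + 4]
        pixels.append(yuv_to_rgb(y0, u, v))
        pixels.append(yuv_to_rgb(y1, u, v))
    return pixels
```

At 1024x768 this arithmetic would run for 786,432 pixels per frame, which is why handing the YUV buffer directly to the video hardware is attractive.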

Image Recognition

The Qt / X11 interface is robust enough to support real-time image recognition (unlike the browser based clients). The advantages of including image recognition are numerous. The centering process is mundane, predictable, and somewhat tedious. This makes it a perfect candidate for automation. If a computer could recognize where a crystal was, then it could easily calculate how much to move each motor in order to center it. In addition, computer automated centering would be much more precise, and would be executed as fast as the motors could move. In the near future, motors with air-bearings are expected to be installed. These motors turn with almost no friction, and can move extremely quickly. Combined with automatic centering, the entire process could be finished in seconds.
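As a rough illustration of how a recognized position could be turned into motor moves, the sketch below converts pixel offsets from the beam crosshair into displacements in microns. It rests on several assumptions that are not stated in this paper: the rotation axis is horizontal in the image and already at beam height, the two transverse translation stages rotate with phi (so that views taken at phi = 0 and phi = 90 degrees each isolate one stage), and a calibration constant `microns_per_pixel` is known. The function name and the sign conventions are hypothetical.

```python
def centering_moves(center_phi0, center_phi90, beam_px, microns_per_pixel):
    """Return (along_axis, transverse_a, transverse_b) corrections in microns.

    center_phi0 / center_phi90: (x, y) pixel position of the detected mount
    center in the views at phi = 0 and phi = 90 degrees.
    beam_px: (x, y) pixel position of the beam crosshair.
    """
    # Offset along the (assumed horizontal) rotation axis, from the first view.
    along_axis = (beam_px[0] - center_phi0[0]) * microns_per_pixel
    # Vertical offset seen at phi = 0 is assigned to the first transverse stage.
    transverse_a = (beam_px[1] - center_phi0[1]) * microns_per_pixel
    # Vertical offset seen at phi = 90 is assigned to the second transverse stage.
    transverse_b = (beam_px[1] - center_phi90[1]) * microns_per_pixel
    return along_axis, transverse_a, transverse_b


# Example with made-up numbers: beam at (512, 384), 2 microns per pixel.
moves = centering_moves((500, 400), (500, 380), (512, 384), 2.0)
```

In practice the signs and axis assignments depend on how the goniometer stages are wired and oriented, so they would need to be calibrated on the real hardware.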

Crystal recognition is still an unsolved problem. Crystals come in many shapes and sizes, with many colors and reflective/refractive properties. A great appreciation of the human eye is cultivated when trying to get a machine to recognize a protein crystal. The current image recognition routines we have built into our software do not recognize the crystal itself. Instead, they recognize the shape and orientation of the mount the crystal resides in. Assuming a mount well fitted to a crystal, it follows that the center of the mount should correspond to somewhere very near the geometric center of the crystal. There are two main types of mounts that are used:

Loops:

These are the standard commercially available loops for mounting crystals. The crystal is the cube in the center of the loop. It is kept in oil to keep from drying out. The oil’s viscosity / surface tension is what keeps it suspended inside the loop. During x-ray bombardment, the oil freezes into a glass-like solid (due to cryo-cooling) and the crystal cannot move.

Thorne Mounts:

A relatively new type of mount developed by Professor Robert Thorne and his students at Cornell University. These nanofabricated mounts yield more accurate results and have many other advantages over traditional loops, including lower background scatter, and excess oil drainage.

The software library we are using to aid in detecting these mounts is OpenCV, an image and motion recognition package developed and donated to open source by Intel Corp. (http://www.intel.com/research/mrl/research/opencv)

Image Processing / Recognition Techniques Used:

Image Pyramids

To aid in the detection of edges, image pyramid representations of our crystal mounts are utilized. Basically, an image pyramid is a data structure holding several different resolutions of a given image (the largest being the original image resolution.) The pyramid can also be thought of as a set of “masks” of pixels, a binary image defining which pixels are available in the image. Lower resolutions make many more pixels unavailable, which is extremely useful when trying to find large contours in even slightly noisy images.

Linear filters are utilized to smooth an image before pixels are removed to lower its resolution. Given an image I and a filter kernel F, we can define the linear filter as the convolution:

$$(I * F)(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} I(\xi, \eta)\, F(x - \xi,\, y - \eta)\; d\xi\, d\eta$$

The filter kernel we used for obtaining lower resolution images is the Gaussian kernel. We created the levels iteratively, that is, each level was computed from the level of next-higher resolution. Another method is to create each level of the pyramid directly from the base image. The one dimensional case of the Gaussian kernel is given below:

$$G(t, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{1}{2}\left(\frac{t}{\sigma}\right)^{2}}$$
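The iterative pyramid construction described above can be sketched in plain Python. This is an illustration only: the software itself relies on OpenCV, and the choices made here (sigma of 1, a kernel truncated to radius 2 and renormalized, borders handled by clamping, and dropping every second row and column) are assumptions rather than documented settings. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Sample the 1D Gaussian at integer offsets and renormalize to sum to 1."""
    weights = [
        math.exp(-0.5 * (t / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)
        for t in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Filter every row with the 1D kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), n - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def blur(image, kernel):
    """Separable 2D Gaussian blur: filter the rows, then the columns."""
    return transpose(convolve_rows(transpose(convolve_rows(image, kernel)), kernel))


def downsample(image):
    """Keep every second row and every second column."""
    return [row[::2] for row in image[::2]]


def build_pyramid(image, levels, sigma=1.0, radius=2):
    """Level 0 is the original image; each further level is built from the previous one."""
    kernel = gaussian_kernel(sigma, radius)
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(blur(pyramid[-1], kernel)))
    return pyramid
```

For example, `build_pyramid(image, 3)` on an 8x8 image yields levels of 8x8, 4x4 and 2x2 pixels. Blurring before downsampling is what suppresses small noisy details so that large contours dominate at the coarse levels.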

Color Channel Separation

Color channel separation is necessary before applying our edge detector (see next section) because it only detects gradient changes in a single color. To accurately detect where contours appear in an image, each color channel must be passed separately into the edge detection mechanism.

In this example, we’ve separated the red, green and blue channels of the crystal loop. The differences in each color channel are subtle, but in many cases can increase the likelihood of detecting an edge. Certain edges are better defined in one channel over another. For example, we see that in the green channel, the upper edge of the loop is sharper than in the blue channel.
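A minimal sketch of the separation step, assuming the image is held as rows of (R, G, B) tuples. The helper `superimpose`, which merges per-channel results by taking the per-pixel maximum, is one assumed way of recombining the channels and is not taken from the actual software.

```python
def split_channels(pixels):
    """Split rows of (r, g, b) tuples into three single-channel images."""
    red = [[p[0] for p in row] for row in pixels]
    green = [[p[1] for p in row] for row in pixels]
    blue = [[p[2] for p in row] for row in pixels]
    return red, green, blue


def superimpose(maps):
    """Combine same-sized single-channel maps by per-pixel maximum."""
    height = len(maps[0])
    width = len(maps[0][0])
    return [
        [max(m[y][x] for m in maps) for x in range(width)]
        for y in range(height)
    ]


image = [[(1, 2, 3), (4, 5, 6)]]
red, green, blue = split_channels(image)
combined = superimpose([red, green, blue])
```

Each of the three planes can then be handed to the edge detector on its own, so an edge that is sharp in only one channel is not averaged away.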

Canny Edge Detection

Canny Edge Detection is very useful when trying to determine where a contour’s edges are. Without going into too much detail, a Canny Edge Detector works as follows:

1. Smooth the image according to a Gaussian distribution to eliminate noise.

2. By looking at gradients between rows and columns of the image, find out what the “Edge Strength” is for a given pixel. The edge strength is given by: |G| = |Gx| + |Gy| where Gx is the magnitude of change in the x direction, and Gy in the y direction.

3. Once the edge strength is known, calculate the edge direction. This is very simple if we know both the changes in the x and y directions. It is the familiar slope formula:

$$\theta = \tan^{-1}\left(\frac{G_y}{G_x}\right)$$

4. Round each angle to a value of either 0, 45, 90, or 135 degrees. These are the only possible directions when going from pixel to pixel; an angle of 180, for example, is 0 with a negative magnitude.

5. Look at all edge directions and find out which correspond to valid edges by making sure directions are plausible and magnitudes are consistent. Supplied threshold values for magnitude are utilized.

Below is what a Canny edge detector would output given a crystal loop image. Note: the detector is supplied one color channel at a time. It then superimposes all outputs. This is repeated for each level of the image pyramid.
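Steps 2 to 4 of the list above can be sketched as follows. The 3x3 Sobel masks used to estimate Gx and Gy are an assumption (the masks actually used are not specified here), border pixels are simply left at zero, and the smoothing and thresholding steps are omitted. The function names are hypothetical.

```python
import math

# Assumed gradient masks (Sobel); rows are indexed top to bottom.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradient_at(image, x, y):
    """Return (Gx, Gy) at an interior pixel of a single-channel image."""
    gx = gy = 0
    for j in range(3):
        for i in range(3):
            value = image[y + j - 1][x + i - 1]
            gx += SOBEL_X[j][i] * value
            gy += SOBEL_Y[j][i] * value
    return gx, gy


def quantize_direction(gx, gy):
    """Round the angle atan(Gy / Gx) to 0, 45, 90 or 135 degrees."""
    angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold 180..360 onto 0..180
    return (int(round(angle / 45.0)) % 4) * 45


def edge_map(image):
    """Compute edge strength |Gx| + |Gy| and quantized direction per pixel."""
    height, width = len(image), len(image[0])
    strength = [[0] * width for _ in range(height)]
    direction = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx, gy = gradient_at(image, x, y)
            strength[y][x] = abs(gx) + abs(gy)
            direction[y][x] = quantize_direction(gx, gy)
    return strength, direction
```

Using `atan2` instead of a plain division avoids a division by zero when Gx is 0, and folding the angle modulo 180 implements the remark that an angle of 180 is equivalent to 0.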

Image Dilation

Image Dilation is a technique used to join separated regions of an image. In the example below, we see that many of the bamboo leaves are now connected to the branch, whereas before they were suspended in air. The panda bear’s eyes have also been filled in almost completely.

Before After

This is especially useful for image recognition, because many times noise interferes with contour recognition: many “holes” appear in recognized edges. Dilating the output ensures that a useful edge will not be discarded if it contains holes due to noise. The method we use for image dilation is quite simple, but works well because it is designed for our Canny edge detector. For every valid edge pixel (from Canny output):

• Select the pixel in the direction it points to.

• Set it as a valid edge pixel with the same edge strength and direction.

Repeat for every valid edge pixel. Sometimes two or three pixels need to be set (the dilation algorithm needs to be run several times) before a contour is connected properly. Here is an example of Canny edge dilation on a crystal loop:

Before After
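The direction-based dilation described above can be sketched as follows, operating on per-pixel strength and direction maps. The mapping from the four quantized directions to pixel offsets (`STEP`) is an assumption, as is the choice to leave existing edge pixels untouched; neither detail is specified in the text, and the function names are hypothetical.

```python
# Assumed (dx, dy) step for each quantized direction, with y increasing downward.
STEP = {0: (1, 0), 45: (1, 1), 90: (0, 1), 135: (-1, 1)}


def dilate_once(strength, direction):
    """Grow every edge pixel by one pixel in the direction it points to."""
    height, width = len(strength), len(strength[0])
    new_strength = [row[:] for row in strength]
    new_direction = [row[:] for row in direction]
    for y in range(height):
        for x in range(width):
            if strength[y][x] <= 0:
                continue  # not an edge pixel
            dx, dy = STEP[direction[y][x]]
            nx, ny = x + dx, y + dy
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and strength[ny][nx] <= 0:
                # The neighbor inherits strength and direction from its source.
                new_strength[ny][nx] = strength[y][x]
                new_direction[ny][nx] = direction[y][x]
    return new_strength, new_direction


def dilate(strength, direction, passes):
    """Run the dilation several times to close gaps wider than one pixel."""
    for _ in range(passes):
        strength, direction = dilate_once(strength, direction)
    return strength, direction
```

Each pass reads only from the input maps and writes to copies, so a single pass extends an edge by exactly one pixel; a two-pixel hole therefore needs two passes, matching the remark that the algorithm sometimes has to be run several times.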

Image Recognition Process (for loops)

The current crystal recognition algorithm is a seven-step process.

1. First sample the image at several resolutions using Image Pyramids.

2. Separate the three color planes (RGB) and find edges in each of them separately.

3. Use the Canny edge detector on each color channel of each level of the image pyramid using several different threshold levels.

4. Dilate the Canny output to make sure there are no holes between segments.

5. Take all existing (complicated) contours and simplify/remove them (no contours that overlap themselves, no contours with sharp edges, no contours that have a great deal of direction change within a small area).

6. Of the contours that are left, only pick ones that have a sizeable positive area, and are convex. This eliminates contours that are based on noise.

7. Fit ellipses to the existing contours. The biggest ellipse corresponds to the loop.

Here is an example of a loop that has been processed by our algorithm. The red outline is the contour detected, and the blue is the ellipse fitted to the contour. (The smaller green contour and ellipse are discarded.) Using the ellipse as an abstract representation of the loop, we will be able to center the loop automatically. An alternative to fitting ellipses is finding the center of mass of a given contour.
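Step 6 and the center-of-mass alternative mentioned above can be sketched with standard polygon formulas. Contours are taken to be lists of (x, y) vertices of simple (non self-intersecting) polygons, which step 5 is meant to guarantee. Using the absolute area and picking the largest surviving contour are assumptions made for this illustration, the ellipse fit itself is not shown, and the function names are hypothetical.

```python
def signed_area(contour):
    """Shoelace formula; the sign depends on the vertex order."""
    total = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_convex(contour):
    """True if every turn along a simple polygon has the same orientation."""
    n = len(contour)
    if n < 3:
        return False
    sign = 0
    for i in range(n):
        ax, ay = contour[i]
        bx, by = contour[(i + 1) % n]
        cx, cy = contour[(i + 2) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            current = 1 if cross > 0 else -1
            if sign == 0:
                sign = current
            elif current != sign:
                return False
    return sign != 0


def centroid(contour):
    """Center of mass of the area enclosed by a simple polygon."""
    area = signed_area(contour)
    cx = cy = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (6.0 * area), cy / (6.0 * area)


def pick_loop(contours, min_area):
    """Keep convex contours of sizeable area and return the largest, or None."""
    candidates = [
        c for c in contours
        if abs(signed_area(c)) >= min_area and is_convex(c)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: abs(signed_area(c)))
```

For a 4x4 square with corners (0, 0), (4, 0), (4, 4), (0, 4), `signed_area` gives 16 and `centroid` gives (2.0, 2.0); an L-shaped contour is rejected by `is_convex`, and a tiny triangle is rejected by the area threshold.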

Conclusions & Future Developments

Many crystal centering enhancements are planned for the coming months. A few of them are:

• Completing image recognition for Thorne loops using image comparison techniques. Thorne loops have a distinctive shape, and can be fitted quite nicely using contour comparison techniques with an existing Thorne loop contour.

• Rewriting the video server using the new Qt 4.0 to take advantage of robust sockets, and the “signals and slots” mechanism for easy code maintenance. (see http://doc.trolltech.com/3.2/signalsandslots.html)

• Installing air bearings on the existing Compumotors to speed up motor response time.

• Installing a motor operated zoom lens on the Firewire camera to aid in centering.

• Experimenting with different MPEG compression techniques to find out which schemes work best for image recognition at the client end. It would be interesting to see what kind of changes need to be made to existing algorithms (that are designed for the best human experience) when the viewer is a machine.

• Experimenting with thin fibers that fluoresce when bombarded with x-rays to aid in the beam alignment process.

• Infusing crystals with a purple dye and determining whether this affects diffraction results. If not, then it would facilitate crystal recognition by color detection algorithms.

• Experimenting with automatic crystal mounting devices to try and automate even more of the diffraction process.

• Using OpenGL and 3D reconstruction techniques, we may be able to produce three dimensional models of a mounted crystal loop.

The design of a crystal centering GUI equipped with image enhancement / recognition is very helpful to understanding the principles behind image recognition, and where improvements can be made. It also allows one to build experience with new software development tools / libraries available today. In the coming months, a great deal of research will be done to try and tackle the crystal recognition problem using novel algorithms and image deconstruction techniques, perhaps even artificial intelligence. A solid understanding of the current tools and techniques is vital to this endeavor. In addition, a robust software platform is now in place to test any new designs / ideas that may arise.

References

Canny Edge Detection: http://www.pages.drexel.edu/~weg22/can_tut.html

Crystallography: http://www-structure.llnl.gov/Xray/101index.html

Dilation: http://www.reindeergraphics.com/tutorial/chap6/binary02.html

Image Pyramids: http://www.prip.tuwien.ac.at/Research/ImagePyramids/gray-level-pyramid/node3.html

Nuclear Magnetic Resonance: http://www.cis.rit.edu/htbooks/nmr/inside.htm
