
Copyright 2009 Peter Baer Galvin - All Rights Reserved

Solaris 10 Administration Topics Workshop 1 - Administration

By Peter Baer Galvin

For Usenix

Last Revision Apr 2009

Saturday, May 2, 2009


About the Speaker

Peter Baer Galvin - 781 273 4100

[email protected]

www.cptech.com

[email protected]

My Blog: www.galvin.info

Bio

Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a leading systems integrator and VAR, and was the Systems Manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines. He was contributing editor of the Solaris Corner for SysAdmin Magazine, wrote Pete's Wicked World, the security column for SunWorld magazine, and Pete’s Super Systems, the systems administration column there. He is now Sun columnist for the Usenix ;login: magazine. Peter is co-author of the Operating System Concepts and Applied Operating System Concepts textbooks. As a consultant and trainer, Mr. Galvin has taught tutorials in security and system administration and given talks at many conferences and institutions.


Objectives

Cover a wide variety of topics in Solaris 10

Useful for experienced system administrators

Save time

Avoid (my) mistakes

Learn about new stuff

Answer your questions about old stuff

Won't read the man pages to you

Workshop for hands-on experience and to reinforce concepts

Note – Security covered in separate tutorial


More Objectives

What makes novice vs. advanced administrator?

Bytes as well as bits, tactics and strategy

Knows how to avoid trouble

How to get out of it once in it

How to not make it worse

Has reasoned philosophy

Has methodology


Prerequisites

Recommend at least a couple of years of Solaris experience

Or at least a few years of other Unix experience

Best is a few years of admin experience, mostly on Solaris


About the Tutorial

Every SysAdmin has a different knowledge set

A lot to cover, but notes should make good reference

So some covered quickly, some in detail

Setting base of knowledge

Please ask questions

But let’s take off-topic off-line

Solaris BOF

Fair Warning

Sites vary

Circumstances vary

Admin knowledge varies

My goals

Provide information useful for each of you at your sites

Provide opportunity for you to learn from each other


Why Listen to Me

20 Years of Sun experience

Seen much as a consultant

Hopefully, you've used:

My Usenix ;login: column

The Solaris Corner @ www.samag.com

The Solaris Security FAQ

SunWorld “Pete's Wicked World”

SunWorld “Pete's Super Systems”

Unix Secure Programming FAQ (out of date)

Operating System Concepts (The Dino Book), now 8th ed

Applied Operating System Concepts


Slide Ownership

As indicated per slide, some slides copyright Sun Microsystems

Feel free to share all the slides - as long as you don’t charge for them or teach from them for a fee


Overview

Lay of the Land


Schedule


Times and Breaks


Coverage

Solaris 10+, with some Solaris 9 where needed

Selected topics that are new, different, confusing, underused, overused, etc


Outline

Overview

Objectives

Solaris Versions, features, selection

Booting and Installation

SMF and FMA

Patching

Important Administration Tools

What’s Next for Solaris

Quick Performance Overview

Sysadmin Philosophy

Polling Time

Solaris releases in use?

Plans to upgrade?

Other OSes in use?

Use of Solaris rising or falling?

SPARC and x86

OpenSolaris?


Your Objectives?


Your Lab Environment

Apple Macbook Pro

3GB memory

Mac OS X 10.5

VMware Fusion 2.0

Solaris 10U6

50 Containers


Lab Preparation

Have a device capable of telnet on the USENIX network

Or have a buddy

Learn your “magic number”

Telnet to 131.106.62.100+”magic number”

User “root”, password “lisa”

It’s all very secure
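The address arithmetic is trivial but easy to fumble while reading a slide. A minimal Python sketch, assuming the “+” means adding your magic number to the base address as an integer (so magic number 7 gives .107); lab_host is a hypothetical helper name, not part of the lab setup:

```python
import ipaddress

# Base address from the lab instructions above.
LAB_BASE = ipaddress.IPv4Address("131.106.62.100")


def lab_host(magic_number: int) -> ipaddress.IPv4Address:
    """Return the telnet target for a magic number (assumed: base + n)."""
    if magic_number < 0:
        raise ValueError("magic number must be non-negative")
    # IPv4Address supports integer addition and raises if the result overflows.
    return LAB_BASE + magic_number


if __name__ == "__main__":
    for n in (1, 7, 50):
        print(n, lab_host(n))
```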


Lab Preparation

Or...

Use VirtualBox

Use your own system

Use a remote machine you have legit access to


Solaris Versions

Use the “best” one


Solaris 8

Many sites still running S8

Why?!

Watch the Solaris 8 Migration Assistant project

Per-socket cost

But does P-to-V (physical-to-virtual) migration of an S8 system into an S8-compatible container(!)

Fully supported by Sun as “Solaris 8”

Does not expand lifetime of S8


The Case of the System that Would only Boot Sometimes

System would run without problems

Normal shutdown, or system crash, then reboot

System would fail to boot with “short read” error

Ideas?!


Solaris 9

Improved performance (page coloring, variable page sizes, page locality)

Solaris 9 Resource Manager in the Solaris 9 Operating System

Solaris Volume Manager

Solaris Naming and Directory Service

Sun Management Center Change Manager

Network Multipathing - Solaris IP Multipathing (IPMP)

Mobile IP for the Solaris 9 Operating System

Solaris Operating Environment and Linux Compatibility

Java 2 Platform, Standard Edition 1.4 for the Solaris 9 Operating System

Not a developer release!


Solaris 10

Shipped Feb 2005

Major new features (some discussed throughout)

DTrace

Fire Engine

Solaris Cryptography Framework

NFS V4

Solaris Privileges

ZFS (S10 Update 2)

Full history w/ details available in 817-0547.pdf


Solaris 10 (2)

Netscape 7

New X Windowing features

Gnome 2.0 desktop

System V IPC resource controls

Physical memory control using a new resource capping daemon

Extended accounting for IPQoS

USB 2.0 support, and USB removable media support

Dynamic intimate shared memory large-page support (for databases) (SPARC only)

Memory placement optimization (on SunFire servers) (SPARC only)

Improved UFS logging performance

Unicode version 3.2

FTP client and server enhancements

PAM enhancements

Auditing enhancements

Password history checking


Solaris 10 (3)

Locale administrator for adding and removing locales at the command line

A new autofs configuration file

Multiterabyte volume and disk support (64-bit SPARC only)

Up to 16TB UFS file systems (64-bit SPARC only) (individual files are still limited to 1TB)

devfs dynamically attaches and detaches device entries in /devices

NCA support of multiple instances of the web server

IPv6 6to4 router and packet tunneling of IPv4 over IPv6

NFS services are started only when needed, rather than always at boot time

Sun ONE integration and availability

routeadm routing administration command

sendmail version 8.12 using TCP wrappers

BIND version 8.4.2

Availability of a reduced networking software group for selection during installation of more secure systems

Solaris Product Registry added features and a command-line interface

Solaris Flash differential archives and configuration scripts

Customized contents of Solaris Flash archives

Solaris 10 (4)

Solaris Live Upgrade 2.1

Ability to boot and install software over a WAN

Improved DHCP implementation

Solaris Management Console Patches tool can now analyze, download and install recommended patches

Improved System V IPC configuration

Signed packages and patches for more secure download

NIS to LDAP transition service

Top-down volume creation in Solaris Volume Manager

Systems Management Agent implements SNMPv1, v2c, and v3

Event ports for generating and collecting events from disjoint sources

New atomic operations API included in libc

WBEM includes many updates

Solaris Privileges for programmers allows applications to be written that need specific rights, rather than superuser rights.

Smartcard interfaces and middleware APIs

Basic Audit and Reporting Tool (BART) can compare contents of a system over time or audit an installed package for changes

Kerberos enhancements


S10 U1 (1/06) Changes

Upgrade from S8, S9, or old S10

Sun Update Connection for patching

x86 GRUB booting

Performance - large pages, kernel page relocation, memory placement optimization (MPO)

prtconf -b prints product names


S10 U2 (6/06) Changes

ZFS

New ACL model (ZFS only), based on NFS V4, more granular, chmod

Predictive self-healing for x86

iscsiadm multiple session targets

logadm -l uses local time when renaming

volfs managed by SMF

UDP and TCP performance improvement

IPv6 for ipfilter


S10 U3 (11/06) Changes

smcwebserver enhancements

fsstat

SMF management of dynamic resource pools

Zones “move” and “clone” commands

Zone migration

LDoms 1.0

Solaris Trusted Extensions

SNIA multipath management - mpathadm

S10 U4 (8/07) Changes

Improved nscd

iostat -y understands multipathing

Sun service tag product identifier

MPxIO path steering

raidctl

more FMA and predictive self-healing supported devices

stmsboot on SPARC and x86 (enables or disables MPxIO on Fibre Channel devices)

Live Upgrade includes non-global-zone support

Deferred-activation patching

Networking improvements, including zone “exclusive-IP”, nge jumbo frame support

Solaris key-management framework (KMF) to manage public key objects

iSCSI target, iscsiadm, iscsitadm

Branded zones - lx

zonecfg integrated resource management (zone.max-*), temporary pools, capped memory improvements

DTrace non-kernel use in zones


S10 U5 (5/08) Changes

Trusted Extensions installed by default, SMF managed, disabled by default

fwflash - new firmware manipulation tool

The PostScript™ Printer Description (PPD) file management utility, /usr/sbin/ppdmgr, plus PAPI print commands

Client-side support for the Internet Printing Protocol (IPP)

SunVTS 7.0 includes the following features:

Introduction of the concept of purpose-based testing

Improved diagnostics effectiveness

Web-based user interface

Simplified usage

New architecture framework

Enterprise View

Resource management expansion via CPU Caps plus projmod -a to apply project DB to active project

x86 power management

iSNS support for iSCSI target

SPARC: Hardware-Accelerated Elliptic Curve Cryptography (ECC) Support

Network enhancements, desktop tools, performance, libchewing, fsexam file code converter, more drivers


S10 U6 (10/08) Changes

ZFS boot / root (text installer)

Zones on ZFS, Auto zone-upgrade

Live upgrade from UFS to ZFS root

Roll back ZFS dataset without unmounting

ZFS quotas and reservations for file system data only

ZFS cachefile property controls what is cached, where

Separate ZIL locations, iSCSI improvements...

Current Releases


(Source: Stephen Lau (http://whacked.net))


Software Express for Solaris

Get future Solaris releases, now!

Frequent updates (~1 / month)

Basically, exports of internal Solaris builds (SPARC and x86)

Regression tested by Sun for stability

Other products might be available in the future

No patches, but bug report and on-line support for paid version

Free version allows download, access to docs

Takes a couple of hours over fast link

Need to be able to create .iso CDs, DVDs


OpenSolaris

OpenSolaris is a source base and a distro

Solaris open source under CDDL license

Updates currently biweekly or so

One week after code checked in to kernel gate

Very recent bits

Goal is to be even closer to kernel engineering

No testing done

No support

But great stuff to play with


OpenSolaris (2)

Can use either gcc or (free*) Sun Studio compiler to build

http://www.opensolaris.org/os/community/tools/sun_studio_tools/

Whole community around OpenSolaris

At http://www.opensolaris.org

This is the place that kernel developers communicate about DTrace and other areas of the kernel

Lots of great info at http://blogs.sun.com


OpenSolaris (3)

Already some interesting community work

Live discs from SchilliX - http://schillix.berlios.de/

BeleniX - http://belenix.sarovar.org/belenix_home.html

Nexenta - Debian-based GNU/Solaris(!) - http://www.gnusolaris.org/gswiki

marTux - first non-Solaris Express/Solaris Express Community Release OpenSolaris distribution for SPARC (sun4u for now, sun4v later) - http://www.martux.org/RELEASES/

opensolaris live small CD / USB distro - http://www.milax.org


OpenSolaris (4)

Note that each release has its own patching / upgrade methodology

Sometimes need to reinstall each time

For Sun flavors, use BFU to install a new archive over an old one

Just updates the kernel components, not user-land stuff


OpenSolaris Distro

Née Project Indiana

FCS 5-May-2008

Commercial support available (just like Solaris) 13-May-2008

Solaris kernel + ZFS + modern userland + new packaging system

Live CD

Could be the future of Solaris

x64 only for now(!)

ISV support is the open issue

OpenSolaris Distro (2)

IPS is the new package system; SVR4 packages supported too

IPS lets you create and manage packages

Package repositories on the web

Update all installed packages

“undo” via ZFS rollback

Search packages

Create your own repository

pkg, pkgsend, pkg.depotd

For example pkg install openoffice

Other packages: netbeans, sunstudioexpress, clustertools, webstackui, glassfishv2


OpenSolaris Distro (3)

Once in OpenSolaris, upgrades are easy:
$ pfexec pkg refresh
$ pfexec pkg image-update

When done:

A clone of opensolaris exists and has been updated and activated. On next boot the Boot Environment opensolaris-1 will be mounted on '/'. Reboot when ready to switch to this updated BE.

$ beadm list
BE            Active Active on Mountpoint Space
Name                 reboot               Used
----          ------ --------- ---------- -----
opensolaris-1 no     yes       -          17.06M
opensolaris   yes    no        -          33.92M

OpenSolaris Distro (4)

Can build your own distribution via a tool: http://www.opensolaris.org/os/project/caiman/Constructor/

The OpenSolaris Bible provides very good coverage (mostly user-land)

New automatic installer (replacing Jumpstart et al) - follow it at http://www.opensolaris.org/os/project/caiman/auto_install/


Which OS?

8, 9, 10 are all viable operating systems

<= 2.6 for legacy environments if you can’t move

2.7 for those too lazy to upgrade(!)

8 for those seeking consistency without going through upgrade effort, those waiting for 10

Solaris 9 for most, stable, apps available, good performance if conservative or not ready to move

I recommend the latest supported S10 release, SPARC and x86

Especially as it is the only OS supported on new hardware

Unless apps not available

Or company standard for previous release

Watch out for vendor support and patch cycle on x86


Solaris 10 Adoption

Everyone wants it

But waiting for vendor support

Given a list of apps, Sun can tell you expected support date

Start from that, start testing a few months before all apps expected to be supported

Some waiting for ZFS bootability (to avoid upgrading twice)


Getting it Right the First Time...

Installation


Topics

Partitions

Installation Methods

Swap Space

Upgrading

Zones / Containers


How Many Partitions?

•"It depends"
•Who/what is using the system?
•Users cause problems!
•Why few partitions?
  •Backups easy and fewer passes
  •Easy to add VxVM
  •Easy to mirror (manually or automatically)
  •Less chance of mis-allocating
•Why many partitions?
  •Finer-grain backup control
  •Faster restore after a corruption
  •More control over disk space use
  •Solaris 8 has 1TB file system limit for UFS
•Life will be different once ZFS is bootable!


Partitions My Way - 72GB Disk

•On a 72GB disk, system with 32GB memory
  •/ 10GB
  •swap 6GB
  •/var 10GB
  •4GB unused raw partition (set aside for crash)
  •2 X 9MB partitions for DiskSuite
•What to do with the rest?
  •Leave unused for emergency or optimum performance
  •Create scratch space
  •Personal sysadmin space
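The layout above deliberately leaves much of the disk unassigned. A quick Python tally of what remains, assuming 1 GB = 1024 MB and ignoring the capacity a real “72GB” drive loses to formatting; the slice labels are descriptive keys only, and unallocated_mb is a hypothetical helper:

```python
# Slice sizes from the "Partitions My Way" layout, in MB.
# Assumption: 1 GB = 1024 MB; a real "72GB" disk reports somewhat less.
DISK_MB = 72 * 1024

LAYOUT_MB = {
    "/": 10 * 1024,
    "swap": 6 * 1024,
    "/var": 10 * 1024,
    "crash (unused raw slice)": 4 * 1024,
    "DiskSuite state database 1": 9,
    "DiskSuite state database 2": 9,
}


def unallocated_mb(disk_mb: int, slices: dict[str, int]) -> int:
    """Return the MB left over after the listed slices are carved out."""
    used = sum(slices.values())
    if used > disk_mb:
        raise ValueError(f"layout needs {used} MB but the disk has {disk_mb} MB")
    return disk_mb - used


if __name__ == "__main__":
    left = unallocated_mb(DISK_MB, LAYOUT_MB)
    print(f"{left} MB (about {left / 1024:.0f} GB) left over")
```

Under these assumptions roughly 42GB stays free for scratch, emergency, or sysadmin space.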


ZFS Boot / Root

It all changes once ZFS is root file system

snapshots before all changes

rollback if don’t like the changes

No partitioning needed

One-command mirroring


/crash?!

From the kernel developer who wrote dumpadm:

“dumpadm can either be used to configure a swap device as the dump device, or a dedicated dump device (e.g. a raw /dev/dsk/xxx partition not being used as a filesystem). We actually prefer that because you can never have your dump swapped over if savecore runs out of disk space, and we run savecore in the background if you have one, improving reboot time.” – Mike Shapiro


Installation Methods

•CDROM / DVDROM
  •for single system or custom systems
•Jumpstart - scripted network install
•Flash Archive - image-based install, based on Jumpstart


Swap Devices (continued)

Performance tip: access to a swap page is 10^4 X slower than access to a memory page

Also, disk location of swap or head contention can cause a 10^1 X difference in access time

The Web Start installer requires at least 512MB of swap space

Need to mirror swap to prevent disk failure from causing crash
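To see why even occasional swapping hurts, the classic effective-access-time calculation helps. A minimal Python sketch: the 100 ns memory access time is an illustrative assumption (not from the slide), the 10^4 penalty is the slide's ratio, and every other paging overhead is ignored:

```python
# Illustrative figures: 100 ns per memory access is an assumption;
# the 10^4 factor is the slide's swap-versus-memory ratio.
MEMORY_NS = 100.0
SWAP_FACTOR = 10 ** 4
SWAP_NS = MEMORY_NS * SWAP_FACTOR


def effective_access_ns(swap_fraction: float,
                        mem_ns: float = MEMORY_NS,
                        swap_ns: float = SWAP_NS) -> float:
    """Average access time when a fraction of accesses must come from swap."""
    if not 0.0 <= swap_fraction <= 1.0:
        raise ValueError("swap_fraction must be between 0 and 1")
    return (1 - swap_fraction) * mem_ns + swap_fraction * swap_ns


if __name__ == "__main__":
    for p in (0.0, 0.0001, 0.001):
        eat = effective_access_ns(p)
        print(f"{p:>7}: {eat:8.1f} ns, {eat / MEMORY_NS:5.1f}x the memory access time")
```

With these numbers, one access in a thousand going to swap makes the average access about 11 times slower, which is why avoiding swap matters more than tuning it.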


Swap Devices (cont)

•Yes, mirroring can cause performance degradation, but without mirroring the system is not proof against a failed disk causing a crash
•Can be raw partitions or files in file systems
  •Both work well
•Add swap space with swap -a <device>
  •swap -a /dev/dsk/c2t0d0s2
•or swap -a <file>
  •swap -a /swap1/swapfile


Swap Device (cont)

•Check use with
# swap -l
swapfile             dev    swaplo  blocks    free
/dev/dsk/c0t0d0s1    32,1       16 1049744  927360
/dev/dsk/c2t0d0s2    32,242     16 4194272 4194272
/swap1/swapfile      -          16  819184  819184
# swap -s
total: 77240k bytes allocated + 30912k reserved = 108152k used, 3012576k available
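Reading these numbers by eye is error-prone. A Python sketch that parses the exact swap -l and swap -s output shown above (captured as text, not by running the commands), totals the 512-byte blocks per device, and checks that swap -s reports used = allocated + reserved; the function names are hypothetical:

```python
import re

# Output captured from the swap -l and swap -s commands above.
SWAP_L = """\
swapfile             dev    swaplo  blocks    free
/dev/dsk/c0t0d0s1    32,1       16 1049744  927360
/dev/dsk/c2t0d0s2    32,242     16 4194272 4194272
/swap1/swapfile      -          16  819184  819184
"""
SWAP_S = ("total: 77240k bytes allocated + 30912k reserved = "
          "108152k used, 3012576k available")

BLOCK_BYTES = 512  # swap -l reports sizes in 512-byte blocks


def parse_swap_l(text: str) -> list[tuple[str, int, int]]:
    """Return (swapfile, blocks, free) for each swap device listed."""
    devices = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 5:
            continue
        path, _dev, _swaplo, blocks, free = fields
        devices.append((path, int(blocks), int(free)))
    return devices


def parse_swap_s(text: str) -> dict[str, int]:
    """Pull the four kilobyte figures out of swap -s output."""
    allocated, reserved, used, available = (
        int(n) for n in re.findall(r"(\d+)k", text))
    return {"allocated": allocated, "reserved": reserved,
            "used": used, "available": available}


def blocks_to_mb(blocks: int) -> float:
    return blocks * BLOCK_BYTES / (1024 * 1024)


if __name__ == "__main__":
    devs = parse_swap_l(SWAP_L)
    total = sum(b for _, b, _ in devs)
    free = sum(f for _, _, f in devs)
    print(f"devices: {total} blocks ({blocks_to_mb(total):.0f} MB), "
          f"{free} free ({blocks_to_mb(free):.0f} MB)")
    s = parse_swap_s(SWAP_S)
    consistent = s["used"] == s["allocated"] + s["reserved"]
    print(s, "consistent" if consistent else "MISMATCH")
```

Note that swap -s accounts for virtual swap, which includes physical memory, so its figures are not expected to match the swap -l device totals.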


Where to Swap?

Best to avoid swapping altogether

Spread swap among multiple controllers, multiple disks

Can swap to raw disk partition or file system file

Make file system file with mkfile

# mkfile 100m /opt/swapfile

Performance decreases with file system

Almost never have >1 swap space per device
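mkfile sizes are easy to mis-scale. A Python sketch of the suffix arithmetic, assuming the suffixes described in mkfile(1M) (none for bytes, k, b for 512-byte blocks, m, and g; check the man page on your release), plus the same size expressed in the 512-byte blocks that swap -l reports in; both helpers are hypothetical:

```python
import re

# Size suffix multipliers as described in mkfile(1M) (assumed): no suffix =
# bytes, k = kilobytes, b = 512-byte blocks, m = megabytes, g = gigabytes.
SUFFIX_BYTES = {"": 1, "k": 1024, "b": 512, "m": 1024 ** 2, "g": 1024 ** 3}


def mkfile_bytes(size: str) -> int:
    """Translate an mkfile size argument such as '100m' into bytes."""
    match = re.fullmatch(r"(\d+)([kbmg]?)", size.strip())
    if match is None:
        raise ValueError(f"unrecognized mkfile size: {size!r}")
    number, suffix = match.groups()
    return int(number) * SUFFIX_BYTES[suffix]


def in_swap_blocks(size: str) -> int:
    """Same size expressed in the 512-byte blocks that swap -l reports."""
    return mkfile_bytes(size) // 512


if __name__ == "__main__":
    print(mkfile_bytes("100m"))    # 104857600
    print(in_swap_blocks("100m"))  # 204800
```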


Upgrading OS Releases

Just plopping in the new CDROM and answering questions is not recommended

Do not upgrade a potentially-security-breached system

Perform a new install instead

Why do most sites avoid upgrades?

Upgrading with zones adds complexity / limits (from http://docs.sun.com/app/docs/doc/817-1592/6mhahuoul?q=upgrade+zone&a=view as of S10U2)

You can use either the standard Solaris interactive installation program or the custom JumpStart installation program to upgrade your Solaris system with zones installed. Solaris Live Upgrade is not supported for this release. For information, see Solaris 10 Installation Guide: Solaris Live Upgrade and Upgrade Planning and Solaris 10 Installation Guide: Custom JumpStart and Advanced Installations.


Upgrading OS Releases (2)

New technologies can help

appcert to check if given app “guaranteed” to run on new OS rev

Determine if your platform is supported: http://www.sun.com/bigadmin/hcl/

Determine the platform to use (if changing platforms)

Opteron servers great (but need to move to x86/x64)

Sun T1-based systems great for lots of threads

Will your workload run well on a T1? http://www.sun.com/bigadmin/content/cooltst_tool/

Jumpstart for upgrades – possible but not guaranteed

Live Upgrade (working as of 10/01 release)

Splits the mirror (if SVM and >= S9 8/03) or finds an available disk

Automates duplication of boot disk, upgrade to duplicate disk

Allows upgrade while system live

Easy test and fall-back to previous release

Boot alternate; if unhappy, reboot primary boot disk

Production Server Upgrade Methodology

Check all app certifications for support under new release

If in house or non-supported app, build test environment and test

Perform full backup

And test it!

Record all system details via Explorer or manually


Production Server Upgrade Methodology (2)

Mirror root disk if possible (or break existing mirror)

Upgrade one half of mirror, test

Fall back if necessary

Or re-mirror after testing period over for RAID protection

Undo DiskSuite mirroring, VXVM encapsulation

Check VX manuals for upgrade instructions, including use of begin and end scripts to save and restore VX state


Production Server Upgrade Methodology (3)

Update via CDROM

Or other method if you get it working

Restore VX state if it was saved

Test system and apps

Check log files, run usual system status commands

Analyze old /etc/system

Do not just copy it over – reset it based on new OS release

Run explorer to capture new system state

Perform full backup to record “known good state”

After test period, remirror root disk

Faster / Easier > S10U5

ZFS root allows multiple boot environments

Use LU on Solaris

Quite feature rich:
# lucreate -A 'mydescription' -c first_disk \
    -m /:/dev/dsk/c02t4d0s0:ufs -m /usr:/dev/dsk/c02t4d0s1:ufs \
    -M /etc/lu/swapslices -n second_disk

BE on OpenSolaris / Nevada:
# beadm list
BE            Active Mountpoint Space Policy Created
----          ------ ---------- ----- ------ -------
opensolaris   NR     /          2.36G static 2008-12-01 17:03
opensolaris-1 -      -          57.0K static 2008-12-01 17:55
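In this beadm format the Active column packs flags together: N marks the BE running now and R the one the next reboot will use. A Python sketch that parses the listing above (as captured text) to answer “what boots next?”; parse_beadm and boots_next are hypothetical names:

```python
from typing import Optional

# beadm list output from above, captured as text.
BEADM_LIST = """\
BE            Active Mountpoint Space Policy Created
----          ------ ---------- ----- ------ -------
opensolaris   NR     /          2.36G static 2008-12-01 17:03
opensolaris-1 -      -          57.0K static 2008-12-01 17:55
"""


def parse_beadm(text: str) -> dict[str, dict[str, object]]:
    """Map each boot environment name to its Active flags and mountpoint."""
    envs = {}
    for line in text.splitlines()[2:]:  # skip header and dashed separator
        fields = line.split()
        if len(fields) < 3:
            continue
        name, active, mountpoint = fields[0], fields[1], fields[2]
        envs[name] = {
            "active_now": "N" in active,
            "active_on_reboot": "R" in active,
            "mountpoint": mountpoint,
        }
    return envs


def boots_next(envs: dict[str, dict[str, object]]) -> Optional[str]:
    """Return the BE flagged R, i.e. the one the next reboot will use."""
    for name, info in envs.items():
        if info["active_on_reboot"]:
            return name
    return None


if __name__ == "__main__":
    envs = parse_beadm(BEADM_LIST)
    print("running:", [n for n, i in envs.items() if i["active_now"]])
    print("next boot:", boots_next(envs))
```

After a beadm activate opensolaris-1, the R flag would move to opensolaris-1 and boots_next would report it instead.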


Install using JET

Jumpstart Enterprise Toolkit

Unsupported extensions to Jumpstart by Sun to make it easier / faster

1. Add the packages
# pkgadd -d SUNWjet.pkg
2. Add /opt/SUNWjet/bin to the path of the root user
3. Either:
   1. run 'copy_solaris_media' to copy the Solaris image from CD/DVD to disk
   2. run 'add_solaris_location' to inform the toolkit of existing Solaris images
4. Create a 'template' for a new client, using the 'make_template' command:
# make_template machine1
5. Edit the new template and configure the build:
# vi /opt/jet/Templates/machine1
6. Configure the build environment for this client:
# make_client machine1
7. Start the build on the client:
   * (for Sparc) ok boot net - install
   * (for x86/64) Force a PXE boot


Upgrading to the Same OS

If binaries deleted, corruption problems, packages missing, other program-level problems (not config file problems)

Perform an “upgrade” to the same OS release as is currently running

Will refresh all packages to their original state

Need to re-patch the system


Giving a system the boot

Booting


Topics

•Solaris 10 booting
•Service Management Facility (SMF)


System Boot < S10

UFS boot block -> ufsboot -> /kernel/unix -> init -> /etc/inittab -> /etc/rcS, /etc/rc2, /etc/rc3


Run levels

0 System shutdown
1 "Systemic state": one user, no services or daemons
2 Multi-user without NFS: mounts all partitions, starts services
3 Multi-user with NFS
4 Spare multi-user state (unused)
5 Power down
6 Reboot: kills all processes, unmounts, reboots
S,s Single-user state: no daemons, system mounts
Q,q Causes init to reread the inittab file
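On Solaris 10, the run levels that have SMF equivalents map onto milestones (per svc.startd(1M): S, 2 and 3). A Python sketch of that mapping which builds, but does not run, the matching svcadm milestone command line; the helper names are hypothetical, and levels such as 0, 5 and 6 are deliberately left to init:

```python
# Run levels with SMF milestone equivalents (per svc.startd(1M), assumed).
RUNLEVEL_MILESTONES = {
    "S": "svc:/milestone/single-user:default",
    "2": "svc:/milestone/multi-user:default",
    "3": "svc:/milestone/multi-user-server:default",
}


def milestone_for(runlevel: str) -> str:
    """Return the milestone FMRI for a run level such as 's', '2' or '3'."""
    key = runlevel.strip().upper()
    if key not in RUNLEVEL_MILESTONES:
        raise KeyError(f"run level {runlevel!r} has no milestone; use init")
    return RUNLEVEL_MILESTONES[key]


def svcadm_milestone_cmd(runlevel: str, make_default: bool = False) -> list[str]:
    """Build (but do not run) the svcadm command for a run level.

    With make_default=True, -d makes it the default boot milestone.
    """
    cmd = ["svcadm", "milestone"]
    if make_default:
        cmd.append("-d")
    cmd.append(milestone_for(runlevel))
    return cmd


if __name__ == "__main__":
    print(" ".join(svcadm_milestone_cmd("s")))
    print(" ".join(svcadm_milestone_cmd("3", make_default=True)))
```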


Solaris 10 Service Management Facility (SMF)

Part of larger predictive self-healing facility (Build 69 and beyond)

Replacing inetd, changing use of /etc/rc files, etc

Much more sophisticated management of system startup and daemons

Builds reference tree of which processes need which, and order to start them in

If service fails, knows how to restart the service and all that depended on it

Startup to login prompt much faster with multithreading – each service started when those it depends on are ready

The only mandatory difference in S10
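The dependency-driven startup described above is essentially a topological sort: every service whose prerequisites are online can start at once. A Python sketch using the standard library's graphlib (Python 3.9+). The FMRIs appear in this tutorial's svcs listings, but the dependency edges are invented for illustration and are not the real Solaris graph; dependents() roughly models “restart everything that depended on it”, ignoring SMF's per-dependency restart_on settings:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: service -> services it needs online first.
# The FMRIs appear in this tutorial; the edges are made up for illustration.
DEPENDS_ON: dict[str, set[str]] = {
    "svc:/system/filesystem/root:default": set(),
    "svc:/network/loopback:default": set(),
    "svc:/network/pfil:default": set(),
    "svc:/network/physical:default": {
        "svc:/network/loopback:default", "svc:/network/pfil:default"},
    "svc:/milestone/name-services:default": {"svc:/network/physical:default"},
    "svc:/application/print/server:default": {
        "svc:/system/filesystem/root:default",
        "svc:/milestone/name-services:default"},
}


def start_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group services into waves; everything in a wave can start in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def dependents(deps: dict[str, set[str]], failed: str) -> set[str]:
    """Every service that directly or indirectly depends on `failed`."""
    affected: set[str] = set()
    changed = True
    while changed:
        changed = False
        for svc, needs in deps.items():
            if svc not in affected and (failed in needs or needs & affected):
                affected.add(svc)
                changed = True
    return affected


if __name__ == "__main__":
    for i, wave in enumerate(start_waves(DEPENDS_ON), 1):
        print(f"wave {i}: {', '.join(wave)}")
    print(sorted(dependents(DEPENDS_ON, "svc:/network/loopback:default")))
```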


SMF (conf)

Has a repository containing state and configuration of services, dependencies, and methods of managing services

Has manifests (in XML format) to describe services -> input into repository

/var/svc/manifest

Changes to services can be made here

Won’t be reflected until service restarted or refreshed

Repository/database used for services: /etc/svc/repository.db

Has commands to manage services and the repository


SMF (cont)

Booting now much “quieter”

Each service has its own log in /var/svc/log (or /etc/svc/volatile early in boot)

Services that would have hung boot are now debuggable in maintenance mode

New boot -m verbose option to display a message per service

Processes are automatically restarted by svc.startd or placed in maintenance mode (watch out for kill -9)

Location of the scripts to be executed: /lib/svc/method
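Finding a service's log is mechanical: the FMRI loses its svc:/ prefix, slashes become dashes, and .log is appended. A Python sketch of that naming rule as observed on Solaris 10 systems (not an official API); smf_log_path is a hypothetical helper and requires the instance name:

```python
def smf_log_path(fmri: str, early_boot: bool = False) -> str:
    """Return the log file path SMF uses for a service instance."""
    prefix = "svc:/"
    if not fmri.startswith(prefix):
        raise ValueError(f"not an SMF service FMRI: {fmri!r}")
    name = fmri[len(prefix):]
    if ":" not in name:
        raise ValueError("include the instance name, e.g. ':default'")
    # Early-boot services log under /etc/svc/volatile before /var is available.
    directory = "/etc/svc/volatile" if early_boot else "/var/svc/log"
    return f"{directory}/{name.replace('/', '-')}.log"


if __name__ == "__main__":
    print(smf_log_path("svc:/application/print/server:default"))
    # /var/svc/log/application-print-server:default.log
```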


rc scripts

Only a few rc scripts out of the box

# ls /etc/rc3.d
README          S52imq          S77dmi          S84appserv
S16boot.server  S75seaport      S80mipagent     S90samba
S50apache       S76snmpdx       S82initsma

There for non smf-converted services

There for backward compatibility

rc scripts are started after all SMF services start, but get none of the other SMF features

Inittab now sparse. It includes info on modifying ttymon for example:# For modifying parameters passed to ttymon, use svccfg(1m) to modify

# the SMF repository. For example:

#

# # svccfg

# svc:> select system/console-login

# svc:/system/console-login> setprop ttymon/terminal_type = "xterm"

# svc:/system/console-login> exit


The boot process S10 SPARC

ufs bootblock -> ufsboot -> /kernel/unix -> init (reads inittab) -> svc.startd (configuration from /etc/svc/repository.db)


The boot process S10 X86

BIOS -> boot block -> GRUB -> /kernel/unix -> init (reads inittab) -> svc.startd (configuration from /etc/svc/repository.db)


GRUB

Nice, fairly standard boot management

Controlled via files in /boot/grub

A RAM disk image (the boot archive) is created automatically when system files change, and is used to speed boot

Create it manually if needed via

bootadm update-archive


svcs

Displays services and their states

# svcs

STATE STIME FMRI

legacy_run Feb_28 lrc:/etc/rcS_d/S50sk98sol

legacy_run Feb_28 lrc:/etc/rc2_d/S10lu

legacy_run Feb_28 lrc:/etc/rc2_d/S20sysetup

legacy_run Feb_28 lrc:/etc/rc2_d/S40llc2

. . .

legacy_run Feb_28 lrc:/etc/rc3_d/S84appserv

legacy_run Feb_28 lrc:/etc/rc3_d/S90samba

online Feb_28 svc:/system/svc/restarter:default

online Feb_28 svc:/network/pfil:default

online Feb_28 svc:/system/filesystem/root:default

online Feb_28 svc:/network/loopback:default

online Feb_28 svc:/milestone/name-services:default

. . .
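One way to approach the lab question on parsing svcs output is a short Python sketch, assuming the default STATE STIME FMRI columns shown above. It keeps only lines whose third field looks like an FMRI (so the header and ". . ." lines are skipped), counts services by state, and lists those in states that usually need attention. The function names are illustrative; for scripts, svcs -H (no header) and -o (select columns) give more stable input.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class SvcsRow(NamedTuple):
    state: str
    stime: str
    fmri: str


def parse_svcs(output: str) -> List[SvcsRow]:
    """Parse default `svcs` output into (state, stime, fmri) rows."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # A real row ends in an FMRI such as svc:/... or lrc:/...;
        # this skips the header and ". . ." elision lines.
        if len(fields) != 3 or ":/" not in fields[2]:
            continue
        rows.append(SvcsRow(*fields))
    return rows


def state_counts(rows: List[SvcsRow]) -> Dict[str, int]:
    """Number of services in each state (online, legacy_run, ...)."""
    return dict(Counter(row.state for row in rows))


def needs_attention(rows: List[SvcsRow]) -> List[str]:
    """FMRIs in states that typically need an administrator's look."""
    bad = {"maintenance", "degraded", "offline"}
    return [row.fmri for row in rows if row.state in bad]
```

Any FMRI returned by needs_attention is a natural candidate for svcs -x.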


svcs (cont)

Displays details about services (e.g. what failed)

# svcs -x

svc:/application/print/server:default (LP print server)

State: disabled since Mon Feb 28 11:01:34 2005

Reason: Disabled by an administrator.

See: http://sun.com/msg/SMF-8000-05

See: lpsched(1M)

Impact: 2 dependent services are not running. (Use -v for list.)

Displays info on all services (even disabled ones)

# svcs -a


svcs (cont)

Displays details about services (e.g. which processes belong to a service)

# svcs -p ssh

STATE STIME FMRI

online Feb_28 svc:/network/ssh:default

Feb_28 366 sshd


svcadm

Changes service states persistently (unless the -t option is used, which makes the change temporary until the next reboot)

# svcs sendmail

STATE STIME FMRI

online Feb_28 svc:/network/smtp:sendmail

# svcadm disable sendmail

# svcs sendmail

STATE STIME FMRI

disabled 17:46:01 svc:/network/smtp:sendmail


svcprop

List the properties of a service via

svcprop service_name

# svcprop zones

general/enabled boolean false

general/entity_stability astring Unstable

general/single_instance boolean true

multi-user-server/entities fmri svc:/milestone/multi-user-server

multi-user-server/grouping astring require_all

multi-user-server/restart_on astring none

multi-user-server/type astring service

startd/duration astring transient

start/exec astring /lib/svc/method/svc-zones\ %m

start/timeout_seconds count 60

start/type astring method

stop/exec astring /lib/svc/method/svc-zones\ %m

stop/timeout_seconds count 500

stop/type astring method


svcprop (cont)

tm_common_name/C ustring Solaris\ zones

tm_man_zones/manpath astring /usr/share/man

tm_man_zones/section astring 5

tm_man_zones/title astring zones

tm_man_zoneadm/manpath astring /usr/share/man

tm_man_zoneadm/section astring 1M

tm_man_zoneadm/title astring zoneadm

restarter/logfile astring /var/svc/log/system-zones:default.log

restarter/start_pid count 525

restarter/start_method_timestamp time 1144642223.336907000

restarter/start_method_waitstatus integer 0

restarter/transient_contract count

restarter/auxiliary_state astring none

restarter/next_state astring none

restarter/state astring online

restarter/state_timestamp time 1144642223.379661000
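The svcprop output above is one "pg/name type value" triple per line, with spaces inside a value escaped by a backslash (Solaris\ zones) and some properties, such as restarter/transient_contract, having no value at all. A minimal Python sketch under those assumptions turns it into a dictionary; the function names are hypothetical and multi-valued properties are assumed to be separated by unescaped spaces.

```python
from typing import Dict, List, Tuple


def split_values(text: str) -> List[str]:
    """Split a svcprop value field on unescaped spaces and drop the backslashes."""
    values: List[str] = []
    current: List[str] = []
    escaped = False
    for ch in text:
        if escaped:
            current.append(ch)  # escaped char (e.g. "\ ") is kept literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch.isspace():
            if current:
                values.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        values.append("".join(current))
    return values


def parse_svcprop(output: str) -> Dict[str, Tuple[str, List[str]]]:
    """Map "pg/prop" to (type, [values]) for each line of svcprop output."""
    props: Dict[str, Tuple[str, List[str]]] = {}
    for line in output.splitlines():
        parts = line.split(None, 2)  # name, type, rest of line
        if len(parts) < 2:
            continue
        name, ptype = parts[0], parts[1]
        props[name] = (ptype, split_values(parts[2] if len(parts) == 3 else ""))
    return props
```

When only a single value is needed, svcprop -p pg/prop FMRI prints just that value and is simpler than parsing the full listing.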


inetadm

SMF command that manages inetd-controlled services

inetd itself is now an SMF service, and acts as the restarter for those services

Original inetd.conf entries are now services

Changes to inetd.conf are reflected in the services only when inetconv is run


inetadm (cont)

# inetadm

ENABLED STATE FMRI

enabled online svc:/application/x11/xfs:default

enabled online svc:/application/font/stfsloader:default

enabled offline svc:/application/print/rfc1179:default

enabled online svc:/network/rpc/metamed:default

enabled online svc:/network/rpc/metamh:default

enabled online svc:/network/rpc/gss:default

disabled disabled svc:/network/rpc/ocfserv:default

enabled online svc:/network/rpc/smserver:default

disabled disabled svc:/network/rpc/rex:default

. . .

enabled online svc:/network/shell:default

disabled disabled svc:/network/shell:kshell

disabled disabled svc:/network/talk:default

enabled online svc:/network/rpc-100235_1/rpc_ticotsord:default

enabled online svc:/network/imap/tcp:default

enabled online svc:/network/imaps/tcp:default

enabled online svc:/network/pop3/tcp:default

enabled online svc:/network/pop3s/tcp:default
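The inetadm listing above mixes healthy entries with ones like rfc1179 that are enabled but offline. As a sketch, assuming the three-column ENABLED STATE FMRI layout shown, a Python filter can pick out services that are enabled but not online; the function name is hypothetical.

```python
from typing import List


def enabled_not_online(output: str) -> List[str]:
    """Return FMRIs that inetadm reports as enabled but not online."""
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows are exactly ENABLED, STATE, FMRI; skip header and ". . ."
        if len(fields) != 3 or not fields[2].startswith("svc:/"):
            continue
        enabled, state, fmri = fields
        if enabled == "enabled" and state != "online":
            flagged.append(fmri)
    return flagged
```

Each flagged FMRI can then be passed to svcs -x for an explanation.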


inetadm -p

Displays the default property values inherited by inetd services

# inetadm -p

NAME=VALUE

bind_addr=""

bind_fail_max=-1

bind_fail_interval=-1

max_con_rate=-1

max_copies=-1

con_rate_offline=-1

failrate_cnt=40

failrate_interval=60

inherit_env=TRUE

tcp_trace=FALSE

tcp_wrappers=FALSE


inetadm -l

# inetadm -l imap/tcp

SCOPE NAME=VALUE

name="imap"

endpoint_type="stream"

proto="tcp"

isrpc=FALSE

wait=FALSE

exec="/opt/csw/sbin/imapd"

user="root"

default bind_addr=""

default bind_fail_max=-1

default bind_fail_interval=-1

default max_con_rate=-1

default max_copies=-1

default con_rate_offline=-1

default failrate_cnt=40

default failrate_interval=60

default inherit_env=TRUE

default tcp_trace=FALSE

default tcp_wrappers=FALSE
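In the inetadm -l output, lines tagged "default" in the SCOPE column are values inherited from inetd's defaults, while untagged lines are set on the service itself. A minimal Python sketch, assuming exactly that layout and double-quoted string values, separates the two sets; the function name is hypothetical.

```python
from typing import Dict, Tuple


def parse_inetadm_l(output: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split `inetadm -l` output into (service-specific, inherited default) properties."""
    own: Dict[str, str] = {}
    inherited: Dict[str, str] = {}
    for line in output.splitlines():
        text = line.strip()
        if not text or text.startswith("SCOPE"):
            continue
        target = own
        if text.startswith("default "):
            # The "default" scope marks a value inherited from inetd's defaults
            target = inherited
            text = text[len("default "):].strip()
        name, sep, value = text.partition("=")
        if not sep:
            continue
        # String values are shown in double quotes, e.g. exec="/opt/csw/sbin/imapd"
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        target[name] = value
    return own, inherited
```

After inetadm -m sets a property on a service, one would expect that property to move from the inherited set to the service-specific one.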


inetadm -m / -M

Modify a property of an inetd service

# inetadm -m ftp tcp_trace=TRUE

Modify one of the default inetd properties (inherited by every service that does not override it)

# inetadm -M tcp_wrappers=TRUE


svccfg

Manipulates data in the service configuration repository

Full, rich feature set

For example:

# svccfg

svc:> list

. . .

network/imap/tcp

network/imaps/tcp

network/pop3/tcp

network/pop3s/tcp


svc:> select telnet

svc:/network/telnet> listprop

. . .

general framework

general/entity_stability astring Unstable

general/restarter fmri svc:/network/inetd:default


svccfg (cont)

svc:/network/telnet> setprop

Usage: setprop pg/name = [type:] value
       setprop pg/name = [type:] ([value...])

Set the pg/name property of the currently selected entity. Values may be enclosed in double-quotes. Value lists may span multiple lines.

svc:/network/telnet> help

General commands: help set repository end

Manifest commands: inventory validate import export archive

Profile commands: apply extract

Entity commands: list select unselect add delete

Snapshot commands: listsnap selectsnap revert

Property group commands: listpg addpg delpg

Property commands: listprop setprop delprop editprop

Property value commands: addpropvalue delpropvalue setenv unsetenv


Milestones

Augment, rather than replace, the old “run levels”

If all services for a milestone are running, the milestone is reached

If not, the milestone is not reached

Milestone configurations:

# ls /var/svc/manifest/milestone

multi-user-server.xml name-services.xml single-user.xml multi-user.xml network.xml sysconfig.xml

# svcs "svc:/milestone/*"

online Sep_22 svc:/milestone/name-services:default

online Sep_22 svc:/milestone/network:default

online Sep_22 svc:/milestone/devices:default

online Sep_22 svc:/milestone/single-user:default

online Sep_22 svc:/milestone/sysconfig:default

online Sep_22 svc:/milestone/multi-user:default

online Sep_22 svc:/milestone/multi-user-server:default


Milestones vs. Run Levels

Liane Praza's Weblog, Friday February 04, 2005

smf milestones, runlevels, and system maintenance

A number of questions about smf(5) milestones have been surfacing lately, so I'll try to give a summary of the topic and answer a few common questions here.

An smf(5) milestone is really nothing more than a service which aggregates a bunch of service dependencies. Usually, a milestone does nothing useful itself, but declares a specific state of system-readiness which other services can depend upon. One example is the name-services milestone. It simply depends upon the possible name services you might be running:

$ svcs -d name-services

STATE STIME FMRI

disabled Jan_04 svc:/network/rpc/nisplus:default

disabled Jan_04 svc:/network/dns/client:default

disabled Jan_04 svc:/network/ldap/client:default

online Jan_04 svc:/network/nis/client:default


Milestones vs. Run Levels (cont)

and has no useful actions to perform during the start or stop method:

$ svcprop -p start name-services

start/exec astring :true

start/timeout_seconds count 3

start/type astring method

$ svcprop -p stop name-services

stop/exec astring :true

stop/timeout_seconds count 3

stop/type astring method

The name-services milestone is considered online as long as any name services which are enabled are running. There's also nothing different about these milestones to smf(5), it just sees them as yet-another-service.

We've implemented standard Unix system run-levels in smf(5) using milestones. The single-user, multi-user, and multi-user-server milestones correspond to run-levels S, 2, and 3, respectively. In addition to the runlevel milestones, there are the all and none keywords. These aren't actual services, but shorthand for either the graph with no services, or the graph with all services. This set of five special milestones can either be booted directly to (boot -m milestone=) or reached by running svcadm milestone. As mentioned in a previous entry, the way we reach a limited milestone (any special milestone but all) is to temporarily disable all services which aren't part of the milestone's subgraph.


Milestones vs. Run Levels (cont)

A common question is why the console-login service is disabled if you boot to a milestone that isn't all. This can easily be determined by looking at console-login's dependents.

$ svcs -D console-login

STATE STIME FMRI

As there are no milestones which have console-login as one of their dependencies, it won't be started as part of any milestone but all. Fortunately, we'll always start an sulogin(1M) prompt if a login service can't be reached.

So, why are milestones useful then? The most useful milestone is none, for the recovery/exploration scenario I described here. The other use is when doing service development. You can use svcadm milestone to transition to limited milestones then back up without rebooting the system.

There's a large omission in my description of milestone use above. I don't mention system maintenance or patching anywhere. A very common question is: Should I stop using init s, boot -s, and my other standard procedures to change runlevels and perform standard system maintenance? Emphatically, no! Your old favorite commands continue to work as they always have. There's no need to change procedures. There's no reason to retrain your fingers with a much longer-to-type command when init s works just fine. The init invocations will work just like they always have, where svcadm milestone won't. For example, running svcadm milestone svc:/milestone/single-user:default won't change the run-level of the system (as described by who -r). Running init s will.
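The correspondence described in the post (single-user, multi-user, and multi-user-server for run levels S, 2, and 3, plus the all and none keywords) fits in a small lookup table. This Python sketch is only illustrative: the function names are hypothetical, it assumes boot -m milestone= accepts the short milestone names, and, as the post stresses, svcadm milestone does not change the run level reported by who -r, so init remains the tool for routine maintenance.

```python
from typing import List

# Run level -> milestone short name, per the mapping described in the post.
RUNLEVEL_MILESTONES = {
    "S": "single-user",
    "2": "multi-user",
    "3": "multi-user-server",
}
# "all" and "none" are keywords, not real services.
SPECIAL_MILESTONES = {"all", "none"}


def milestone_name(target: str) -> str:
    """Accept a run level (S, s, 2, 3) or all/none; return the milestone name."""
    if target in SPECIAL_MILESTONES:
        return target
    try:
        return RUNLEVEL_MILESTONES[target.upper()]
    except KeyError:
        raise ValueError(f"no milestone for run level {target!r}") from None


def svcadm_milestone_args(target: str) -> List[str]:
    """argv for moving a running system to a milestone (full FMRI when real)."""
    name = milestone_name(target)
    fmri = name if name in SPECIAL_MILESTONES else f"svc:/milestone/{name}:default"
    return ["svcadm", "milestone", fmri]


def boot_milestone_option(target: str) -> str:
    """Argument for boot -m, e.g. milestone=single-user."""
    return "milestone=" + milestone_name(target)
```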


SMF Notes

svcs -a shows all services, no matter the state

Also of interest:

svcadm restart – restart the service

svcadm refresh – reread the service configuration

svcs -d FMRI – shows the named service and the services it depends on

svcs -D FMRI – shows the named service and its dependents

boot -m milestone=name – boots to the named milestone

svcadm milestone – transitions to the named milestone

svccfg apply /var/svc/profile/generic_limited_net.xml – disables generic extraneous network daemons


SMF Notes (cont)

/var/svc/profile/site.xml – copied by the JumpStart script as the default set of services on JumpStarted systems

Check out the Q&A (FAQ): http://mail.opensolaris.org/pipermail/smf-discuss/2006-June/000672.html

Never modify manifests in place. Always use svccfg to modify or customize a service

Web site to create SMF manifests http://es.opensolaris.org/easySMF/


Labs

What services are running?

How do you parse the output of svcs?

Which are disabled? Failed?

What does inetd.conf look like?

What is in the rc directories?

What do the service log files show?

Kill off an unimportant service via kill

What happened?

Disable it via SMF

Where is the SMF configuration information stored?

How would you change the parameters of a service?

What does an RPC service look like now?


Labs (cont)

What profiles are available?

What run level are we at?

How would you enable tcp-wrappers?


Tips and Tricks

Systems Management


Topics

• Patches

• Fault management architecture (FMA)

• Crash and core dumps

• Odds and Ends

• Analyzing a system


Patches

• Sun patches come with installpatch and backoutpatch routines to automate patching

• Solaris >= 2.6 includes new patch commands

  • patchadd (accepts a path or URL for the patch!)

  • patchrm

• All patch operations are also logged in the /var/sadm/patch subdirectories

• installpatch -u doesn’t verify file attributes

• installpatch -d doesn’t save the original files

  • With today's big disks, don’t use this option

Use pkgrm to remove unneeded packages (sendmail et al) so they never need to be patched


OpenSolaris

Or run OpenSolaris and never patch again

Rather, upgrade packages

Including the kernel!


Patches (cont)

• Other useful patch tools

  • SUC(E): Sun Update Connection (Enterprise)

  • smpatch (née patchpro) (from Sunsolve)

  • patchDiag (from Sunsolve)

  • patchcheck (from Sunsolve)

  • patchreport http://www.cs.duke.edu/~wjs/pr.html

• Watch out for Sun patchmanager

  • Doesn’t warn when a reboot is needed

  • Doesn’t warn of special instructions

• Patches from

  • ftp://sunsolve.sun.com

  • http://sunsolve.sun.com


Patches (continued)

New with Nevada is the Sun Update Manager

GUI interface built into CDE(?) and Xorg

Automatic patch updates (like Windows!)

Can use proxy, caching, etc.

Possibly a command line interface as well

Future of this unclear

Also, Sun bought Aduva, which will result in Sun Update Connection Enterprise 1.0 (Sun UCE 1.0) - now part of Sun xVM Ops Center


Patch Philosophy

On new systems, install

Recommended Patches

Suggested Patches

Sometimes not the same thing!

Install security patches if you care

Install point patches only if you see symptoms

Watch out – patches can overwrite security changes (startup script removal, sendmail, inetd.conf changes)

Retest after changes made


FMA


FMA

New with Solaris 10: the Solaris Fault Management Architecture (called predictive self-healing by marketing)

Two components – service manager and fault manager

The fault manager is designed to detect faults (as before) and analyze them

Can reduce downtime and debugging by not “waiting for that problem to happen again”

New daemon, fmd, runs by default at boot

Still logs to syslog et al., and to /var/fm/fmd/fltlog

Command line interfaces: fmadm, fmdump, fmstat

Currently, better hardware info from SPARC than from Opteron CPUs


FMA Fault Management

Should be much more likely to catch and debug intermittent or correctable errors and point to a correction (from a BigAdmin article):

SUNW-MSG-ID: SUN4U-8000-6H, TYPE: Fault, VER: 1, SEVERITY: Major

EVENT-TIME: Sun Oct 17 14:15:50 PDT 2004

PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: myhost

EVENT-ID: 64fe6c23-12b7-ccd1-f0a7-b531941738f8

DESC: The number of errors associated with this CPU has exceeded acceptable levels. Refer to http://sun.com/msg/SUN4U-8000-6H for more information.

AUTO-RESPONSE: An attempt will be made to remove the affected CPU from service.

IMPACT: Performance of this system may be affected.

REC-ACTION: Schedule a repair procedure to replace the affected CPU. Use fmdump -v -u <EVENT_ID> to identify the CPU.
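The FMA message above is a series of KEY: value fields, and its REC-ACTION asks for fmdump -v -u with the EVENT-ID. A Python sketch, assuming the field names shown in this sample are the ones that appear (other message types may carry different keys), splits the message on those keys and builds that fmdump call; the function names are hypothetical.

```python
import re
from typing import Dict, List

# Field names as they appear in the sample FMA message.
FMA_KEYS = ("SUNW-MSG-ID", "TYPE", "VER", "SEVERITY", "EVENT-TIME", "PLATFORM",
            "CSN", "HOSTNAME", "EVENT-ID", "DESC", "AUTO-RESPONSE", "IMPACT",
            "REC-ACTION")
# A key must not be preceded by a word character or hyphen, so "VER" does not
# match inside "SEVERITY".
_KEY_RE = re.compile(r"(?<![\w-])(" + "|".join(map(re.escape, FMA_KEYS)) + r"):")


def parse_fma_message(text: str) -> Dict[str, str]:
    """Split an FMA message into {KEY: value}; each value runs up to the next key."""
    matches = list(_KEY_RE.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Drop the ", " separators used between keys on the same line.
        fields[match.group(1)] = text[match.end():end].strip().rstrip(",").strip()
    return fields


def fmdump_command(fields: Dict[str, str]) -> List[str]:
    """Build the fmdump call suggested by REC-ACTION for this event."""
    return ["fmdump", "-v", "-u", fields["EVENT-ID"]]
```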


fmadm

Main administrative interface

# fmadm

Usage: fmadm [-P prog] [-q] [cmd [args ... ]]

fmadm config - display fault manager configuration

fmadm faulty [-ai] - display list of faulty resources

fmadm flush <fmri> ... - flush cached state for resource

fmadm load <path> - load specified fault manager module

fmadm repair <fmri>|<uuid> - record repair to resource(s)

fmadm reset [-s serd] <module> - reset module or sub-component

fmadm rotate <logname> - rotate log file

fmadm unload <module> - unload specified fault manager module

# fmadm config

MODULE VERSION STATUS DESCRIPTION

cpumem-retire 1.0 active CPU/Memory Retire Agent

eft 1.12 active eft diagnosis engine

fmd-self-diagnosis 1.0 active Fault Manager Self-Diagnosis

io-retire 1.0 active I/O Retire Agent

syslog-msgs 1.0 active Syslog Messaging Agent


fmdump

Facility to display fault logs and detailed information (from a BigAdmin article)

# fmdump -v -u 64fe6c23-12b7-ccd1-f0a7-b531941738f8

TIME                 UUID                                 SUNW-MSG-ID

Oct 17 14:15:50.1630 64fe6c23-12b7-ccd1-f0a7-b531941738f8 SUN4U-8000-6H

  100%  fault.cpu.ultraSPARC-III.l2cachedata

        FRU: hc:///component=Slot 1

       rsrc: cpu:///cpuid=1/serial=1107C270C8A


fmstat

Information about resource use by FMA

# fmstat

module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz

cpumem-retire 0 0 0.0 0.0 0 0 0 0 0 0

eft 0 0 0.0 0.0 0 0 0 0 260K 0

fmd-self-diagnosis 0 0 0.0 0.0 0 0 0 0 0 0

io-retire 0 0 0.0 0.0 0 0 0 0 0 0

syslog-msgs 0 0 0.0 0.0 0 0 0 0 32b 0
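The memsz and bufsz columns of fmstat carry unit suffixes (260K, 32b). As a sketch for tracking module memory over time, assuming b means bytes and K/M/G are powers of 1024, a Python helper can convert the sizes and index them by module; the function names are hypothetical.

```python
from typing import Dict, Tuple

# Assumed suffixes: b = bytes, K/M/G = powers of 1024.
_UNITS = {"b": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(text: str) -> int:
    """Convert an fmstat size such as "260K", "32b" or "0" to bytes."""
    if text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)


def parse_fmstat(output: str) -> Dict[str, Tuple[int, int]]:
    """Return {module: (memsz_bytes, bufsz_bytes)} from default fmstat output."""
    usage = {}
    for line in output.splitlines():
        fields = line.split()
        # 11 columns: module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
        if len(fields) != 11 or fields[0] == "module":
            continue
        usage[fields[0]] = (size_to_bytes(fields[9]), size_to_bytes(fields[10]))
    return usage
```

Comparing two snapshots taken some time apart would show whether any module's memory is growing.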


logadm

Tool for managing log files

Configurable to automatically rotate files, delete old files, etc


FMA Odds and Ends

FMA supports AMD “M2” CPUs (Rev F)

Enabled by default

S10 8/07 provides predictive self-healing on PCI-Express on x64 systems


Odds and Ends


routeadm

routeadm is now the proper way to manage routing and IP forwarding

# routeadm -e ipv4-forwarding

# routeadm -u (applies the configuration to the running system)


Sun SRS NetConnect

Has had many names over time

Now useful, free (if you have a “good” support contract), going away!?

Time to have a new look, but possibly going away in favor of xVM Ops Center

Can send data back to Sun or to a server at your site

Provides patch info, uptime, performance monitoring, event monitoring, etc.

But does not phone home for service calls

You have to do that

http://www.sun.com/service/netconnect/


Must Read

Finally, Sun has documented kernel tunables

Read the “Solaris Tunable Parameters Reference Manual”

Unique per Solaris release, starting with S8

At docs.sun.com (for free)


mpathadm

Solaris 10 Update 3 and later

Tool to manage multipathing via ANSI standard API

Probably the best way to manage storage multipathing


Analyzing a System

Methodology I use when approaching a “broken” system

“Slow” system, failing applications, etc

Learned the hard way: I always regret skipping any steps


Analyzing a System - Capture

Capture the problem definition as succinctly as possible

Helps avoid the “death spiral”

When did the problem start?

What invokes it?

What avoids it?

What is it?

What changes were made before it started?

What debugging, analysis, or testing changes have been made since it started?

What existing diagnosis is available (performance trends, performance monitoring tools)?


Analyzing a System - Capture 2

Capture available testing resources

Any dev or Q/A systems available?

Ability to reproduce the problem

Ability to test and make changes in production

Ability to test under load, load generation tools

Downtime windows, change limits (validated system, production lockdown)

Change deployment method and cycle

Production testing possible? Performance impact possible?

Capture state with explorer

Now part of “Services Tools Bundle”

http://www.sun.com/download/products.xml?id=47c7250a

Capture state with GUDS

Apparently only available from Sun Support on an as-needed basis


Analyzing a System - Audit

OS release and patch level

Application release and patch level

Is the application level supported on the OS level?

Scan through dmesg / /var/adm/messages

Don’t ignore anything odd - could be the canary

Check system health via

df (full disks, mount options, file system types via -n)

ifconfig (network parameter mismatches), kstat (grep for the interface name)

/etc/system (inherited settings, system variables)

/etc/project (project-based resource settings)

Quick scan of “the usual suspects”

iostat, vmstat, netstat, prstat, mpstat, lockstat, intrstat

Then go process-level if the problem can be narrowed down
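For the first pass through /var/adm/messages, it can help to tally messages by severity before reading them. This Python sketch assumes the Solaris syslog tag of the form "[ID 107833 kern.warning]" is present on the lines of interest; untagged lines are skipped here, so they still need to be read by eye, in keeping with "don't ignore anything odd". The function name and the set of "serious" levels are illustrative choices.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumed Solaris syslog tag, e.g. "[ID 107833 kern.warning]".
_TAG_RE = re.compile(r"\[ID \d+ (\w+)\.(\w+)\]")
# Levels worth reading first; info/notice/debug are tallied but not returned.
_SERIOUS = {"emerg", "alert", "crit", "err", "warning"}


def scan_messages(lines: List[str]) -> Tuple[Dict[str, int], List[str]]:
    """Tally tagged messages by facility.level and return the serious ones."""
    tally: Counter = Counter()
    serious = []
    for line in lines:
        match = _TAG_RE.search(line)
        if not match:
            continue
        facility, level = match.groups()
        tally[f"{facility}.{level}"] += 1
        if level in _SERIOUS:
            serious.append(line.rstrip("\n"))
    return dict(tally), serious
```

On the system itself, the input would be the lines of /var/adm/messages, e.g. read with open("/var/adm/messages").readlines().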


Analyze a System - Next Steps

Have the McDougall and Mauro books handy

Install the DTraceToolkit if possible

Have DTrace one-liners handy

Watch for system overhead

Is the current scheduler class appropriate?

If the workload isn’t time-sharing, don’t run it under the time-sharing scheduler

If processes are stepping on each other or one is running amok, consider imposing whatever limits the OS release supports

Processor sets

CPU caps, memory limits


Analyze a System - Drill Down

From “Performance Analysis Using DTrace” by Benoit Chaffanjon - here are some examples, but read the paper

Distribution (quantize graph) of time spent in each system call by each process, for non-root processes

syscall:::entry
/uid != 0/
{
        self->tm = timestamp;
}

syscall:::return
/self->tm/
{
        /* nanoseconds from entry to return, keyed by execname, pid, syscall */
        @[execname, pid, probefunc] = quantize(timestamp - self->tm);
        self->tm = 0;
}

Short-lived processes

dtrace -n 'proc:::exec { printf("%s execing %s, uid/zone = %d/%s\n", execname, args[0], uid, zonename); }'


Analyze a System - Drill Down - 2

Code compiled on an old compiler may work but will not perform well

If not stripped, detect with an old friend:

dump -c <binary> | grep WS6U2 (or SUNWspro)

If stripped, use of obsolete library functions such as .mul or .div is a sign of an old compiler targeting SPARC V7

Recompilation is key: get Sun Studio; it's free at

http://www.sun.com/software/solaris/get.jsp


Analyze a System - Drill Down - 3

Error handling is time consuming: detect and fix errors before moving forward

It will change your performance picture!

The most common error is errno 2 (ENOENT), file not found

How to detect them (or use errinfo from the DTraceToolkit):

/usr/sbin/dtrace -qn 'syscall:::return /errno != 0 && pid != $pid/ { @Errs[execname, probefunc, errno] = count(); } dtrace:::END { printa("%s %s %d %@d\n", @Errs); }'

Not a Number (NaN) exception handling is managed by the OS (in software) on UltraSPARC III

Detect with:

# kstat -n fpu_traps

fpu_unfinished_traps 77652

Only way to fix it: upgrade the CPUs to UltraSPARC IV, SPARC64 VI, Opteron, or Xeon
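The errno one-liner on this slide prints lines of the form "execname syscall errno count". A Python sketch can turn the numbers into symbolic names and sort by frequency, which makes the common ENOENT case stand out. The function name is hypothetical, and because Python's errno table reflects the platform running the script, it should be run on the Solaris host itself (low numbers such as 2 and 13 agree across systems, higher ones differ).

```python
import errno
from typing import List, Tuple


def summarize_errors(output: str) -> List[Tuple[str, str, str, int]]:
    """Parse "execname syscall errno count" lines, name the errnos, sort by count."""
    rows = []
    for line in output.splitlines():
        # rsplit keeps any spaces in execname inside the first field
        parts = line.rsplit(None, 3)
        if len(parts) != 4 or not parts[2].isdigit() or not parts[3].isdigit():
            continue
        execname, syscall, err, count = parts
        name = errno.errorcode.get(int(err), err)  # e.g. 2 -> "ENOENT"
        rows.append((execname.strip(), syscall, name, int(count)))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```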


Analyze a System - Drill Down - 4

Identify the impact of application log writing

Who is doing what?

dtrace -n 'io:::start { @[execname, args[2]->fi_pathname] = count(); }'

And what is the block size?

dtrace -n 'io:::start { @[execname, args[2]->fi_pathname] = quantize(args[0]->b_bufsize); }'

For hot spots, the number of pending I/Os, and more, look beyond the DTraceToolkit: ($) Ortera Atlas http://www.ortera.com


Analyze a System - Drill Down - 5

Use the proper chip for the proper workload. See http://www.spec.org and http://www.tpc.org

Single-threaded workloads are common. Make sure your application is multi-threaded and that the threading actually works.

Verify with:

dtrace -n 'profile:::profile-100hz /pid/ { @[pid, execname] = lquantize(cpu, 0, 512, 1); }'

Fixed-priority (FX) and fair-share (FSS) scheduling are good practices; set them with priocntl

Processor sets and process binding are good tuning tools (example: Oracle database redo log process)


Analyze a System - Drill Down - 6

Draw your system memory map with:

mdb -k << !
::memstat
!

Then, drill down per process with pmap -sx

The memory allocator matters: libumem.so (32 or 64 bit) often yields performance gains

If Java, think garbage collection: -XX:+UseParallelGC and -XX:+AggressiveHeap work well on SMP

The memory page size matters

Use large 4M pages when possible. You can change the page size on the fly with ppgsz

LD_PRELOAD=$LD_PRELOAD:mpss.so.1 can be used to control the page size used by any software

Setting MPSSHEAP=size controls the heap page size

Setting MPSSSTACK=size controls the stack page size

Use pmap to verify that the change worked
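To check whether a page-size change took effect, one can total the mapped Kbytes per page size from pmap -sx. This Python sketch assumes the Solaris 10 column layout "Address Kbytes RSS Anon Locked Pgsz Mode Mapped File" and treats only lines starting with a hex address as mappings; the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict

# A mapping line starts with a hex address (8 or 16 digits).
_ADDR_RE = re.compile(r"^[0-9A-Fa-f]{8,16}$")


def kbytes_by_pagesize(output: str) -> Dict[str, int]:
    """Sum the Kbytes column of `pmap -sx` output per Pgsz value."""
    totals: Dict[str, int] = defaultdict(int)
    for line in output.splitlines():
        fields = line.split()
        # Assumed columns: Address Kbytes RSS Anon Locked Pgsz Mode Mapped File
        if len(fields) < 7 or not _ADDR_RE.match(fields[0]) or not fields[1].isdigit():
            continue
        totals[fields[5]] += int(fields[1])
    return dict(totals)
```

After setting MPSSHEAP=4M, one would hope to see the heap mappings counted under "4M" rather than "8K".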


Analyze a System - Drill Down - 7

1 Gb Ethernet isn’t that fast any more

Difficult to spot as a bottleneck

DTrace not fully IP-stack enabled

In the meantime, check out nicstat: http://www.brendangregg.com/K9Toolkit/nicstat

All this and more in a SAGE wiki: XXX


Sys Admin Labs

Explore the commands in this section


You can tune a file system but you can’t tuna fish

Performance


Overview

• DTrace

• Other Old and New Important Performance Tools


Overview of Performance Tools – Process Stats (Courtesy of McDougall and Mauro)

Process Stats

cputrack – per-process hw counters

pargs – process arguments

pflags – process flags

pcred – process credentials

pldd – process's library dependencies

psig – process signal disposition

pstack – process stack dump

pmap – process memory map

pfiles – open files and names

prstat – process statistics

ptree – process tree

ptime – process microstate times

pwdx – process working directory


Overview of Performance Tools – Process Control (Courtesy of McDougall and Mauro)

Process Control

pgrep – grep for processes

pkill – signal (kill) processes matching given criteria

pstop – stop processes

prun – set stopped processes running

prctl – view/set process resources

pwait – wait for process

preap – reap a zombie process


Overview of Performance Tools – Tracing (Courtesy of McDougall and Mauro)

Process Tracing/debugging

abitrace – trace ABI interfaces

dtrace – trace the world

mdb – debug/control processes

truss – trace functions and system calls

Kernel Tracing/debugging

dtrace – trace and monitor kernel

lockstat – monitor locking statistics

lockstat -k – profile kernel

mdb – debug live and kernel cores


Overview of Performance Tools – System Stats (Courtesy of McDougall and Mauro)

System Stats

acctcom – process accounting

busstat – Bus hardware counters

cpustat – CPU hardware counters

iostat – IO & NFS statistics

kstat – display kernel statistics

mpstat – processor statistics

netstat – network statistics

nfsstat – NFS client and server stats

sar – kitchen sink utility

vmstat – virtual memory stats

intrstat – interrupt stats

Before You DTrace

Don’t skip standard methods in a rush to DTrace

Use macro- and micro-techniques appropriately

df, vmstat (especially the sr, or page scan rate, column), mpstat (CPU use), prstat, intrstat, iostat (high response times)

Check /var/adm/messages

Then DTrace!

DTrace


DTrace Quick Overview - From the Horse’s Mouth

http://www.youtube.com/watch?v=6chLw2aodYQ&fmt=18


DTrace Overview

Best tool ever for understanding system behavior

Dynamic probes within the kernel and user processes

Has its own programming language (D)

Zero overhead until used

Focus on usability in production systems

Can be used to find out about almost all happenings in the kernel

Interview with the developers - http://www.samag.com/documents/s=9171/sam0406h/0406h.htm

See talk from Usenix 2004

blogs.sun.com/bmc (!)

Note that all code from the Solaris Dynamic Tracing Guide is available in /usr/demo/dtrace

Language interfaces being designed (e.g., what is this Java object doing?)

Already available for Python, Perl, Java (SE 6), and others

Still need to start from system-wide commands (mpstat et al)

(some following slides are from DTrace Boot Camp talk by Adam Leventhal – DTrace team member at Sun)


DTrace

Fully scalable

Enabled in Solaris 10 – no custom kernel or configuration changes needed

Way too much to cover here

Will cover how to use it: the D language, probes, variables, actions, and the basics of using DTrace for kernel and process tracing

Good example code available at http://users.tpg.com.au/adsln4yb/dtrace.html

All DTrace resources at http://www.sun.com/bigadmin/content/dtrace

DTrace discussion list at

http://www.opensolaris.org/os/community/dtrace


DTrace

(From Sun “how-to” doc)

DTrace Example - 1

connections.d snoops inbound TCP connections as they are established, displaying the server process that accepted each connection.

# ./connections.d
UID  PID   IP_SOURCE        PORT  CMD
0    254   192.168.001.001  23    /usr/sbin/inetd -s
0    254   192.168.001.001  23    /usr/sbin/inetd -s
0    254   192.168.001.001  79    /usr/sbin/inetd -s
0    254   192.168.001.001  21    /usr/sbin/inetd -s
0    254   192.168.001.001  79    /usr/sbin/inetd -s
100  2319  192.168.001.001  6000  /usr/openwin/bin/Xsun :0 -nobanner
0    254   192.168.001.001  79    /usr/sbin/inetd -s
[...]


DTrace Example - 2

The following script counts the number of write(2) calls by application:

syscall::write:entry
{
        @counts[execname] = count();
}


DTrace Example - 4

# dtrace -s write-calls-by-app.d
dtrace: script 'write-calls-by-app.d' matched 1 probe
^C

  dtrace                   1
  login                    1
  sshd                     2
  sh                       6
  telnet                   6
  w                        7
  df                      12
  in.telnetd              25
  mixer_applet2           61
  gnome-panel            108
  metacity               125
  gnome-terminal         197
#


DTrace Example - 5

Let’s look at the size of the writes sshd makes to file descriptor 5, broken down by the user stack (code path) that issued them (!):

syscall::write:entry
/execname == "sshd" && arg0 == 5/
{
        @[ustack()] = quantize(arg2);
}


DTrace Example - 6

bash-2.05b# dtrace -s write-sshd-fd-5.d
dtrace: script 'write-sshd-fd-5.d' matched 1 probe
^C

              libc.so.1`_write+0xc
              sshd`atomicio+0x2d
              805b59c
              sshd`main+0xd59
              805b1fa

           value  ------------- Distribution ------------- count
               8 |                                         0
              16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
              32 |                                         0

              libc.so.1`_write+0xc
              sshd`packet_write_poll+0x2e
              sshd`packet_write_wait+0x23
              sshd`userauth_finish+0x19f
              805f42e
              sshd`dispatch_run+0x49
              sshd`do_authentication2+0x7c
              sshd`main+0xdc7
              805b1fa


DTrace Example - 7

#!/usr/sbin/dtrace -s
/* $1 = process ID, $2 = user function to follow; prints U for user code, K while in a system call */
#pragma D option flowindent

pid$1::$2:entry
{
        self->trace = 1;
}

pid$1:::entry,
pid$1:::return,
fbt:::
/self->trace/
{
        printf("%s", curlwpsinfo->pr_syscall ? "K" : "U");
}

pid$1::$2:return
/self->trace/
{
        self->trace = 0;
}


DTrace Toolkit

DTrace Toolkit with lots (> 90) of great scripts

Includes scripts for Python, Perl, Java, PHP, Ruby, Tcl, Javascript

Best starting point for learning DTrace: http://www.opensolaris.org/os/community/dtrace/dtracetoolkit/


DTrace Toolkit Hits

dexplorer - run a lot of tools for a few seconds and log output to a file

Other key scripts include

dtruss, dvmstat, execsnoop, hotkernel, hotuser, errinfo, iopattern, iosnoop, iotop, opensnoop, procsystime, rwsnoop, rwtop, statsnoop


DTrace Lab Preliminary Steps

Get a Solaris 10 machine

Become the root user

Make a new directory – use it to save all the examples from this talk

Might want to record your command-line history for future reference

In a zone, DTrace has limited functionality so not all examples will work


Introduction to DTrace

Quick start overview of DTrace and programming basics

Listing probes

Enabling probes

Built-in variables

The trace() and printf() actions

Then more actions, examples, fun!

What is a Probe?

Provided by a “provider”

Many providers, including

syscall - system calls entry / return

vminfo - virtual memory stats

sysinfo - sysinfo stats

io - disk and NFS events

sched - system scheduling events

profile - fixed sampling

dtrace - BEGIN / END probes

pid - user-level tracing

fbt - raw kernel function entry / return tracing


Probe Names

provider:module:function:name

Some providers don’t use all 4 fields

Leave any field blank to match all

Fundamentally: determine the proper provider, then check the DTrace Guide

If using the “syscall” provider, check the section 2 man pages

If using the “pid” provider, check the program source code and the section 3C man pages

If using the “fbt” provider, check the kernel source at http://cvs.opensolaris.org/source/


Listing Probes

Use dtrace -l to list all probes

Use dtrace -lv to list all probes with attributes

Use dtrace -lP <provider> to list probes for specific provider

Can mix -l with -n to list probes matching a pattern

Specify probes by a four-tuple: provider:module:function:name

Any component can be blank

Exercise: list some probes

Exercise: combine -l and -n

Exercise: try using wildcards and grep for the various components of the probe tuple
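One way to picture what these exercises are probing: each field of provider:module:function:name acts as a glob pattern, a blank field matches everything, and a description with fewer than four fields is aligned to the right (so pagefault:entry means function pagefault, probe name entry, as in one of the scripts later in this section). The Python below is only a model of that matching rule, with a made-up probe list; it is not how dtrace(1M) is implemented.

```python
from fnmatch import fnmatchcase

# Hypothetical probe list (provider, module, function, name); a real system
# reports tens of thousands of these via `dtrace -l`.
PROBES = [
    ("syscall", "", "open", "entry"),
    ("syscall", "", "open", "return"),
    ("syscall", "", "open64", "entry"),
    ("syscall", "", "write", "entry"),
    ("fbt", "unix", "pagefault", "entry"),
    ("fbt", "unix", "pagefault", "return"),
    ("dtrace", "", "", "BEGIN"),
]


def parse_description(desc):
    """Split a description into 4 fields; short descriptions are right-aligned."""
    fields = desc.split(":")
    if len(fields) > 4:
        raise ValueError("too many fields in " + repr(desc))
    return [""] * (4 - len(fields)) + fields


def matches(desc, probe):
    """An empty field matches anything; any other field is a glob pattern."""
    return all(
        pattern == "" or fnmatchcase(value, pattern)
        for pattern, value in zip(parse_description(desc), probe)
    )


def list_probes(desc):
    """Return the matching probes in provider:module:function:name form."""
    return [":".join(p) for p in PROBES if matches(desc, p)]


if __name__ == "__main__":
    for d in ("syscall::open*:entry", "pagefault:entry", "BEGIN"):
        print(d, "->", list_probes(d))
```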


Enabling Probes

Try it:

# dtrace -n syscall:::entry

Note here that “dtrace” is the “consumer”, and a probe is specified

Traces every system call on the system

What does the output mean?

Exercise: trace a single system call entry

The trace() Action and Variables

Use the trace() action to trace any datum

e.g. results of computation, variables, etc.

Try tracing a value:

# dtrace -n 'syscall:::entry { trace(10); }'

# dtrace -n 'syscall::fork*: { trace(pid); }'

Exercise: trace a variable

execname – currently running executable

timestamp – nanosecond timestamp

walltimestamp – nanoseconds since the Unix epoch (00:00 UTC, January 1, 1970)

pid, uid, gid, etc. – what you'd expect


Tracing Process Creation

# dtrace -n 'syscall::exec*:return { trace(execname); }'

Try at “entry” rather than return - what happens?


D Program Structure

probe-description
{
        action;
        action;
        ...
}


Predicates

Predicates are arbitrary D expressions that determine if a clause is executed

Specify a predicate like this: /arbitrary-expression/

Try limiting tracing to a particular executable:

# dtrace -n 'syscall:::entry /execname == "Xorg"/ {}'

Exercise: mix predicates and the trace() action


D Program Structure

probe-description
/predicate/
{
        action;
        action;
        ...
}


More Variables

Each part of a probe has an associated variable

probeprov – provider name

probemod – module containing the probe (if any)

probefunc – function containing the probe

probename – name of probe

Probes can have arguments (arg0, arg1, etc.)

Different for each provider and each probe

syscall entry probe arguments are the parameters passed to the system call

Exercise: try tracing system call arguments


The printf() Action

Modeled after printf(3C) – behaves as you'd expect

Small difference: 'l's (“ell”) are not needed to specify argument size, but you can use them

Exercise: use printf to trace the pid and execname

Done? Try out your favorite printf() format characters


Fun With walltimestamp

The printf() action has some additional format characters (some borrowed from mdb(1))

%Y can be used to format a date

Try it:

# dtrace -n 'BEGIN { printf("%Y", walltimestamp); exit(0); }'
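Since walltimestamp counts nanoseconds since the epoch, converting a raw value by hand just means dividing by 10^9. A small Python sketch of that arithmetic, using a hypothetical timestamp; the string it prints uses Python's strftime format, which is not necessarily identical to what %Y produces:

```python
from datetime import datetime, timezone

NS_PER_SEC = 1_000_000_000


def walltimestamp_to_utc(ns):
    """Convert a nanoseconds-since-epoch value to an aware UTC datetime."""
    seconds, remainder_ns = divmod(ns, NS_PER_SEC)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Keep microsecond precision; sub-microsecond digits are dropped.
    return dt.replace(microsecond=remainder_ns // 1000)


if __name__ == "__main__":
    sample = 1_241_222_400 * NS_PER_SEC + 500_000_000  # hypothetical value
    print(walltimestamp_to_utc(sample).strftime("%Y %b %d %H:%M:%S.%f"))
```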


D-Scripts

Can do everything from the command line

Large DTrace enablings become confusing on the command line

Put them in an executable script:

#!/usr/sbin/dtrace -s

syscall:::entry { trace(execname); }

Exercise: try it – make it executable

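Quiet mode (dtrace -q, or #pragma D option quiet) suppresses dtrace's default output, so only what you explicitly trace or print appears

In quiet mode, end each printf() format string with \n to terminate the line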


Aggregations

Often the individual data points are overwhelming

Aggregations provide a way of accumulating data

Done at the data source, so very efficient

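Data is kept in per-CPU buffers, so aggregations scale well on MP systems

Several aggregating functions, e.g. count(), sum(), avg(), min(), max(), quantize(), lquantize()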

Aggregations can be keyed by an arbitrary tuple of D expressions

Think of them as associative arrays

By default, the contents of aggregations are printed when the consumer completes

e.g. when you hit ^C
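To make the associative-array view concrete, here is a Python model of what a clause such as syscall:::entry /execname != "dtrace"/ { @[execname] = count(); } accumulates. Each (hypothetical) CPU keeps its own partial counts, and the consumer merges them and prints them sorted by value, matching the ascending order seen in the DTrace examples. The event stream is invented and the per-CPU split is a simplification of DTrace's buffering.

```python
from collections import Counter

# Hypothetical stream of syscall:::entry firings: (cpu, execname)
EVENTS = [
    (0, "sshd"), (1, "sh"), (0, "sh"), (1, "dtrace"),
    (0, "sshd"), (1, "sshd"), (0, "df"),
]


def predicate(execname):
    # Models the D predicate /execname != "dtrace"/
    return execname != "dtrace"


def run(events):
    per_cpu = {}  # cpu -> Counter, standing in for per-CPU aggregation buffers
    for cpu, execname in events:
        if predicate(execname):
            per_cpu.setdefault(cpu, Counter())[execname] += 1  # count()
    merged = Counter()
    for partial in per_cpu.values():  # the consumer merges partial results
        merged.update(partial)
    return merged


def print_aggregation(agg):
    # Sorted by value, ascending, like the DTrace example output
    for key, value in sorted(agg.items(), key=lambda kv: (kv[1], kv[0])):
        print(f"  {key:<20}{value:>8}")


if __name__ == "__main__":
    print_aggregation(run(EVENTS))
```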


Simple Aggregation With count()

Aggregations are specified like this: @name[arbitrary-tuple] = action(arguments)

The name and tuple may be omitted

The arguments depend on the aggregating action

Try it: # dtrace -n 'syscall:::entry{ @ = count(); }'

Exercise: try specifying a name for the aggregation

Exercise: try adding tuple keys (comma separated)

Exercise: produce a count for each system call

The quantize() Aggregating Action

The quantize() action is particularly useful for performance work

Takes a single numeric argument

Produces a histogram in power of two buckets

Try it:

# dtrace -n 'syscall::write:entry { @ = quantize(arg2); }'
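The bucketing rule can be modeled in a few lines of Python: a positive value lands in the bucket labeled with the largest power of two not exceeding it, zero gets its own bucket, and (in this sketch) negative values mirror the positive side. This is an aid for reading quantize() output, not DTrace's code; the sample sizes are made up.

```python
from collections import Counter


def bucket(value):
    """Return the power-of-two bucket label for an integer value."""
    if value == 0:
        return 0
    magnitude = 1 << (abs(value).bit_length() - 1)  # largest 2**k <= |value|
    return magnitude if value > 0 else -magnitude


def quantize(values):
    return Counter(bucket(v) for v in values)


def show(hist):
    # Simple text histogram, one row per non-empty bucket
    total = sum(hist.values())
    for label in sorted(hist):
        bar = "@" * round(40 * hist[label] / total)
        print(f"{label:>12} |{bar:<40} {hist[label]}")


if __name__ == "__main__":
    show(quantize([16, 17, 31, 32, 1, 0, 1000]))  # hypothetical write sizes
```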


Using DTrace as Non-Root

DTrace use is controlled by three privileges from the Solaris least-privilege model

dtrace_proc – process access

dtrace_user – user space access

dtrace_kernel – kernel access

Note that giving the dtrace_kernel privilege to a user is (almost) equivalent to giving them root


DTrace and Zones

As of Nevada 37, DTrace can run inside non-global zones

Note that zones can be given the dtrace_proc and dtrace_user privileges, but not dtrace_kernel

# zonecfg -z myzone
zonecfg:myzone> set limitpriv="default,dtrace_proc,dtrace_user"
zonecfg:myzone> exit

# zoneadm -z myzone boot
# zlogin myzone

myzone# dtrace -l
...


DTrace One-Liners

Snarfed from http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_One_Liners

Processes

* New processes with arguments:
dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'

Files

* Files opened by process name:
dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'

* Files created using creat() by process name:
dtrace -n 'syscall::creat*:entry { printf("%s %s",execname,copyinstr(arg0)); }'

Syscalls

* Syscall count by process name:
dtrace -n 'syscall:::entry { @num[execname] = count(); }'

* Syscall count by syscall:
dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'

* Syscall count by process ID:
dtrace -n 'syscall:::entry { @num[pid,execname] = count(); }'

* Read bytes by process name:
dtrace -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'

I/O

* Write bytes by process name:
dtrace -n 'sysinfo:::writech { @bytes[execname] = sum(arg0); }'

* Read size distribution by process name:
dtrace -n 'sysinfo:::readch { @dist[execname] = quantize(arg0); }'

* Write size distribution by process name:
dtrace -n 'sysinfo:::writech { @dist[execname] = quantize(arg0); }'

Physical I/O

* Disk size by process ID:
dtrace -n 'io:::start { printf("%d %s %d",pid,execname,args[0]->b_bcount); }'

* Disk size aggregation:
dtrace -n 'io:::start { @size[execname] = quantize(args[0]->b_bcount); }'

* Pages paged in by process name:
dtrace -n 'vminfo:::pgpgin { @pg[execname] = sum(arg0); }'


More DTrace One-Liners

Memory

* Minor faults by process name:
dtrace -n 'vminfo:::as_fault { @mem[execname] = sum(arg0); }'

User-land

* Sample user stack trace of specified process ID at 1001 Hertz:
dtrace -n 'profile-1001 /pid == $target/ { @num[ustack()] = count(); }' -p PID

* Trace why threads are context switching off the CPU, from the user-land perspective:
dtrace -n 'sched:::off-cpu { @[execname, ustack()] = count(); }'

* User stack size for processes:
dtrace -n 'sched:::on-cpu { @[execname] = max(curthread->t_procp->p_stksize); }'

Kernel

* Sample kernel stack trace at 1001 Hertz:
dtrace -n 'profile-1001 /!pid/ { @num[stack()] = count(); }'

* Interrupts by CPU:
dtrace -n 'sdt:::interrupt-start { @num[cpu] = count(); }'

* CPU cross calls by process name:
dtrace -n 'sysinfo:::xcalls { @num[execname] = count(); }'

* Trace why threads are context switching off the CPU, from the kernel perspective:
dtrace -n 'sched:::off-cpu { @[execname, stack()] = count(); }'

* Kernel function calls by module:
dtrace -n 'fbt:::entry { @calls[probemod] = count(); }'


A Couple of My D Scripts

#!/usr/sbin/dtrace -s
/* distribution of time (ns) spent in the kernel's pagefault() routine */
#pragma D option flowindent

pagefault:entry
{
        self->mytime = timestamp;
        self->t = 1;
}

pagefault:return
/self->t == 1/
{
        @[probename] = quantize(timestamp - self->mytime);
        self->mytime = 0;
        self->t = 0;
}


A Couple of My D Scripts

#!/usr/sbin/dtrace -s
/* pass in a process ID, output is 1 line per file that failed to open */
#pragma D option quiet

syscall::open*:entry
/pid == $1/
{
        self->filename = copyinstr(arg0);
}

syscall::open*:return
/errno != 0 && pid == $1 && strlen(self->filename)/
{
        printf("%s %s %d %s\n", execname, probefunc, errno, self->filename);
}


A Couple of My D Scripts

#!/usr/sbin/dtrace -qs
/* for a given pid, show the distribution of time (ns) spent in each system call */

syscall:::entry
/pid == $1/
{
        self->tm = timestamp;
}

syscall:::return
/self->tm/
{
        @[probefunc] = quantize(timestamp - self->tm);
        self->tm = 0;
}
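The script relies on self->tm being thread-local: each thread records its own entry time, so interleaved system calls from different threads do not overwrite each other. A Python model of that bookkeeping, with an invented trace and a dict keyed by thread ID standing in for self->:

```python
from collections import defaultdict

# Hypothetical trace: (timestamp_ns, thread_id, probe, syscall)
TRACE = [
    (1000, 1, "entry", "read"),
    (1200, 2, "entry", "write"),
    (1500, 1, "return", "read"),
    (1900, 2, "return", "write"),
    (2000, 1, "entry", "read"),
    (2100, 1, "return", "read"),
]


def syscall_times(trace):
    start = {}  # thread_id -> entry timestamp (plays the role of self->tm)
    elapsed = defaultdict(list)  # syscall -> durations in ns
    for ts, tid, probe, syscall in trace:
        if probe == "entry":
            start[tid] = ts
        elif tid in start:  # like the /self->tm/ predicate
            # pop() clears the slot, like self->tm = 0
            elapsed[syscall].append(ts - start.pop(tid))
    return dict(elapsed)


if __name__ == "__main__":
    for name, durations in syscall_times(TRACE).items():
        print(name, durations)
```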


Great DTrace Talk

By Benoît Chaffanjon, Sun Solutions Center

Good methodology, step-by-step examples, one-liners

Show all processes as created:

dtrace -n 'proc:::exec { printf("%s execing %s, uid/zone=%d/%s\n", execname, args[0], uid, zonename); }'

Watch for errors (they slow thread execution):

/usr/sbin/dtrace -qn 'syscall:::return /errno != 0 && pid != $pid/ { @Errs[execname, probefunc, errno] = count(); } dtrace:::END { printa("%s %s %d %@d\n", @Errs); }'


Great DTrace Talk (cont)

Check number of threads in a process:

profile:::profile-100hz /pid/ { @[pid, execname] = lquantize(cpu, 0, 512, 1); }

Who is causing cross calls (xcalls)?

dtrace -q -n 'xcalls { @[pid,tid,execname] = count(); }'

Full talk at: http://opensolaris.org/os/project/sdosug/past_meetings/Performance_Analysis_Using_DTrace.pdf


Chime

First step toward a GUI for DTrace

Available open source at http://opensolaris.org/os/project/dtrace-chime/


Sun Studio Compilers

Some very nice tools, including DProfile and D-Light

D-Light: DTrace-based tool integrated in Sun Studio 12 and above

GUI and graphical display: http://developers.sun.com/sunstudio/downloads/express/index.jsp


D-Light


Ortera (commercial)

“A visual DTrace for storage”

Runs on SPARC S8, 9, 10; S10x86 servers

Sol, Win, Linux, Mac client

Haven’t tried it myself but have heard good things

http://www.ortera.com/products/index.html


DTrace Tutorial

“Context Switch” - UK consulting and training company.

Offers a DTrace workshop and has put the course online

http://www.context-switch.com/performance/dtrace.htm


Other Old and New Important Performance Tools


prstat

“Best” process status tool

Updates like top

-m microstate accounting

-L light-weight processes info

-Z zone-based info

-t per-user info

-T task info

-v verbose process usage

fsstat

New as of Nevada 36

Per file system statistics!

# fsstat -F
   new   name   name   attr   attr lookup  rddir   read   read  write  write
  file  remov   chng    get    set    ops    ops    ops  bytes    ops  bytes
 12.3K  1.48K    949   683K  8.94K  5.32M  16.8K   555K   266M  2.90M  2.58G ufs
     0      0      0  2.51K      0  4.14K  1.66K  1.60K   313K      0      0 proc
     0      0      0     83      1    117      2      8  19.5K      0      0 nfs
 76.6K     15  3.36K  1.81M   268K  3.54M  1.58K  6.33K  15.4M   207K  1.26G zfs
     0      0      0  21.3K      0   116K    843   323K   455M      0      0 hsfs
     0      0      0  32.4K      0      0      0      0      0      0      0 lofs
 4.07K  2.87K    742  23.3K    366  8.28K     78  51.9K  51.3M  42.7K  48.5M tmpfs
     0      0      0    829      0      0      0     23  2.73K      0      0 mntfs
     0      0      0      0      0      0      0      0      0      0      0 nfs3
     0      0      0      0      0      0      0      0      0      0      0 nfs4
     0      0      0      4      0      0      0      0      0      0      0 autofs
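fsstat abbreviates large counters with K, M, and G suffixes, so quick ratios such as average bytes per read need the suffixes expanded first. This Python sketch assumes the suffixes are powers of 1024 (an assumption worth checking against your fsstat) and uses the ufs row above:

```python
# Assumed binary multipliers for fsstat's abbreviated counters
MULTIPLIER = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def expand(field):
    """Turn an fsstat-style value such as '555K' or '2.58G' into a float."""
    suffix = field[-1] if field[-1].isalpha() else ""
    number = field[:-1] if suffix else field
    return float(number) * MULTIPLIER[suffix]


def average_read_size(read_ops, read_bytes):
    ops = expand(read_ops)
    return expand(read_bytes) / ops if ops else 0.0


if __name__ == "__main__":
    # ufs row from the fsstat -F output: read ops = 555K, read bytes = 266M
    print(f"ufs average read: {average_read_size('555K', '266M'):.0f} bytes")
```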


trapstat

# trapstat
vct name                |    cpu0
------------------------+---------
 24 cleanwin            |     158
 41 level-1             |     101
 44 level-4             |       9
 46 level-6             |       2
 49 level-9             |     100
 4a level-10            |     100
 4e level-14            |     101
 60 int-vec             |      11
 64 itlb-miss           |     353
 68 dtlb-miss           |    1701
 6c dtlb-prot           |       3
 84 spill-user-32       |      89
 8c spill-user-32-cln   |      14
 98 spill-kern-64       |     854
 a4 spill-asuser-32     |      34
 ac spill-asuser-32-cln |      94
 c4 fill-user-32        |      94
 cc fill-user-32-cln    |     101


pmap -x

# pmap -x 656
656:    -sh
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000     288     200       -       - r-x--  dev:136,8 ino:2613
00066000      16      16       -       - rwx--  dev:136,8 ino:2613
0006A000      16      16       -       - rwx--    [ heap ]
FFBFE000       8       8       -       - rw---    [ stack ]
-------- ------- ------- ------- -------
total Kb     328     240       -       -


mdb ::memstat

# mdb -k
Loading modules: [ unix krtld genunix ip usba s1394 ufs_log nfs random ptm login dmux cpc lofs ]
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                       3604                28    6%
Anon                         5941                46   10%
Exec and libs                2442                19    4%
Page cache                   2028                15    3%
Free (cachelist)             2815                21    5%
Free (freelist)             45272               353   73%

Total                       62102               485
>
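The MB and %Tot columns follow directly from the page counts. Assuming 8 KB pages (which fits this output: 3604 pages × 8 KB ≈ 28 MB), the Python sketch below reproduces every figure by truncating MB and rounding the percentage; that rounding rule is inferred from this output, not taken from the mdb source.

```python
PAGE_SIZE = 8192  # assumed 8 KB pages (SPARC); x86 systems typically use 4 KB

MEMSTAT = [  # (category, pages) copied from the ::memstat output above
    ("Kernel", 3604),
    ("Anon", 5941),
    ("Exec and libs", 2442),
    ("Page cache", 2028),
    ("Free (cachelist)", 2815),
    ("Free (freelist)", 45272),
]


def summarize(rows, page_size=PAGE_SIZE):
    total = sum(pages for _, pages in rows)
    out = []
    for name, pages in rows:
        mb = pages * page_size // (1024 * 1024)  # truncated, as the output suggests
        pct = round(100 * pages / total)
        out.append((name, pages, mb, pct))
    return out, total, total * page_size // (1024 * 1024)


if __name__ == "__main__":
    rows, total_pages, total_mb = summarize(MEMSTAT)
    for name, pages, mb, pct in rows:
        print(f"{name:<18}{pages:>10}{mb:>8}{pct:>5}%")
    print(f"{'Total':<18}{total_pages:>10}{total_mb:>8}")
```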


kstat

Displays lots and lots of kernel statistics

Mostly useful when working with Sun to debug a problem, as most of the displayed variables are undocumented


lockstat

Obscure, but essential in some circumstances, especially for kernel engineers

Shows kernel locks and their timings

Useful for determining the cause of system slowness when all else fails

Improved with even more options in Solaris 8

Rewritten in S10 as part of DTrace


lockstat (cont)

Can get kernel profiling as well

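# lockstat -I -i 997 sleep 120

(-I samples the kernel via profiling interrupts, here at 997 Hz, for as long as the command, sleep 120, runs)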


Solaris Resource Manager

Through S8, a separately sold commercial Sun product

Goes beyond partitioning

More fine-grained control

Virtual memory control as well

Good for server consolidation

A fuller, better version is included with S9 and later


Solaris Resource Manager - 1

Really a bunch of disparate facilities

Resource limitations

Based on projects, tasks, processes

Controlled by /etc/project

Only a limited set of resources can be controlled

IP QOS

Fair share scheduler


Solaris Resource Manager - 2

Fair Share Scheduler is cool

Shares are given out within a domain (projects, tasks, processes)

Each object is entitled to (its shares) / (total shares of all active objects) of the system, if it wants it

Otherwise, unused resources go to whoever requests them
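A worked sketch of the share arithmetic in Python: the denominator only includes projects that currently want CPU, so an idle project's shares do not shrink anyone else's slice. The project names and share values are hypothetical, and the real FSS adjusts priorities continuously rather than computing a static split.

```python
from fractions import Fraction


def entitlements(shares, active):
    """CPU fraction each active project is entitled to under fair share."""
    total = sum(shares[p] for p in active)
    return {p: Fraction(shares[p], total) for p in active} if total else {}


if __name__ == "__main__":
    shares = {"web": 60, "batch": 30, "dev": 10}  # hypothetical cpu-shares
    print(entitlements(shares, ["web", "batch", "dev"]))  # all busy
    print(entitlements(shares, ["batch", "dev"]))  # web idle
```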


Solaris Resource Manager - 3

Make FSS the default scheduling class on the system (takes effect at the next boot):

# dispadmin -d FSS

Check the default class with dispadmin -d

Move all existing TS processes to the FSS scheduling class:

# priocntl -s -c FSS -i class TS

Check scheduling classes in use:

# ps -ef -o pset,class | grep -v CLS | sort | uniq
-    IA
-    TS
-    FSS
-    SYS

Enter share information for each project in /etc/project:

testproject:100::::project.cpu-shares=(privileged,10,none)


Solaris Resource Manager - 4

Or set them dynamically:

# prctl -r -n project.cpu-shares -v 3 -i project testproject

Launch a task in the project:

$ newtask -p testproject /usr/tmp/cpuhog &

Use prstat to watch the system, including per-project information (prstat -J)


Solaris Resource Manager (>= S9)

Physical memory use now manageable via rcapd daemon

Asynchronous resource limiter

Runs around kicking out pages to reduce the working set, as dictated by configuration

See “System Administration Guide: N1…” Ch 10


Resource Manager and /etc/systemMany items moved from /etc/system

Check app deployment guides to be sure

Note this is per zone the app is installed in

For example, Websphere guidance (http://www.sun.com/software/whitepapers/solaris10/websphere6_sol10.pdf) shows to leave /etc/system alone and instead modify /etc/project to containing the following additions to the resource controls for user.root. These project resource settings are read at login.

bash-3.00# cat /etc/project
system:0::::
user.root:1::::process.max-file-descriptor=(privileged,1024,deny);process.max-sem-ops=(privileged,512,deny);process.max-sem-nsems=(privileged,512,deny);project.max-sem-ids=(privileged,1024,deny);project.max-shm-ids=(privileged,1024,deny);project.max-shm-memory=(privileged,4294967296,deny)
noproject:2::::
default:3::::
group.staff:10::::
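In the file itself the user.root entry is one physical line: the resource controls form the sixth (attributes) field and are separated by semicolons. A minimal sketch of building such an entry from one setting per argument, for bash or ksh93. `project_with_rctls` is a hypothetical helper; projadd and projmod are the supported tools for changing a live /etc/project.

```shell
# Hypothetical helper: join resource-control settings (one per argument)
# with ";" into the attributes field of a single /etc/project entry.
project_with_rctls() {
    # $1 = project name, $2 = project ID, remaining args = attribute settings
    pname=$1 projid=$2
    shift 2
    # "$*" joins the remaining arguments with the first character of IFS;
    # IFS is only changed inside the command-substitution subshell
    attrs=$(IFS=';'; printf '%s' "$*")
    printf '%s:%s::::%s\n' "$pname" "$projid" "$attrs"
}

project_with_rctls user.root 1 \
    'process.max-file-descriptor=(privileged,1024,deny)' \
    'project.max-shm-memory=(privileged,4294967296,deny)'
# prints: user.root:1::::process.max-file-descriptor=(privileged,1024,deny);project.max-shm-memory=(privileged,4294967296,deny)
```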


Resource management and /etc/system (cont)

Now that we are using resource management, we can use some new features. For example, to have a message sent to syslog whenever a process or project hits one of these limits, run the following commands (they update the /etc/rctladm.conf file):

# rctladm -e syslog process.max-file-descriptor
# rctladm -e syslog process.max-sem-ops
# rctladm -e syslog process.max-sem-nsems
# rctladm -e syslog project.max-sem-ids
# rctladm -e syslog project.max-shm-ids
# rctladm -e syslog project.max-shm-memory


Things I Don’t Cover (yet)

dladm

shareadm

smbios

wificonfig


Cool Commands (<= Nevada)

Profile of executable: truss -c

Show a process’ system calls and the time they took: truss -D

Stop a process at a given system call: truss -T

Download pre-built open source packages: pkg-get (http://www.bolthole.com/solaris)

Manage core file creation and retention: coreadm

Reap zombie processes: preap

What is system memory being used by? vmstat -p

What is a process using memory for? pmap -x <pid>


T1 (Niagara) Performance

cooltst analyzes expected app performance if app moved to T1 chip

http://cooltools.sunsource.net/cooltst/

cooltuner (part of Cool Tools) suggests application performance improvements on T1

http://cooltools.sunsource.net/cooltuner/index.html


What’s Next for Solaris?


Multiple Projects in the Works

Clearview - unify features implemented by network interfaces

Crossbow - network virtualization and resource control

NFS - pNFS, NFS server in non-global zone, DTrace for NFS v4

Network storage (FCoE, InfiniBand)

OpenSolaris ports - easy install 3rd party software

Mem / VM resource management

ZFS encryption

Native CIFS client, server

Caiman - Solaris install experience

See them all at http://www.opensolaris.org/os/projects


In the Liberal Arts Tradition

Philosophy


System Administration Best Practices

From the March 2003 SysAdmin Magazine column

Consensus administration best practices (Solaris and general) with contributions from many experienced sysadmins

Contribute at [email protected]


SysAdmin Best Practices (1)

Keep an eye peeled and the wall at your back

Know how your systems run when there are no problems; put debugging tools in place

Communicate with users

They can “help” spot problems and give you room to work when trouble strikes

Help users fix it themselves

Transfer knowledge to fellow admins and users

Use available information

RTFM is still right after all these years; use available tech support


SysAdmin Best Practices (2)

Know when to use strategy, when to use tactics

Hand-to-hand combat vs. arranging the battlefield to increase your odds of winning

All projects take 2X the scheduled time and money

So double your estimates to prepare!

It’s not done until it’s tested

Untested changes cause great aggravation

It’s not done until it’s documented

Decreases wheel-reinvention and miscommunication

Never change anything on Fridays… or Mondays

Speed kills, and causes unhappy weekends


SysAdmin Best Practices (3)

Audit before edit

Review system logs; understand the system's state before making changes

Use defaults whenever possible

Too clever causes too complex

Always be able to undo what you are about to do

Copy individual files and directories; back up systems to disk/tape

Do not spoil management

Don’t let management put you in lose/lose situations

If you haven’t seen it work, it probably doesn’t

Discount the marketing, watch the details

If you’re fighting fires, find the source

Implement alarming and log file monitoring; push important data, don’t pull unimportant data
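For being able to undo what you are about to do, the smallest habit is copying a file aside before editing it. A minimal sketch for bash or ksh. `backup_file` and its <file>.<timestamp> naming are assumptions for illustration, not a Solaris convention.

```shell
# Hypothetical helper: copy a file to <file>.<YYYYmmdd-HHMMSS> before editing.
# cp -p keeps the mode and timestamps (and owner/group when run as root).
backup_file() {
    if [ ! -f "$1" ]; then
        echo "backup_file: $1: not a regular file" >&2
        return 1
    fi
    bk="$1.$(date +%Y%m%d-%H%M%S)"
    if [ -e "$bk" ]; then
        # two backups within the same second would collide; refuse to overwrite
        echo "backup_file: $bk already exists" >&2
        return 1
    fi
    cp -p "$1" "$bk" && echo "$bk"    # print the backup name on success
}

# Usage: b=$(backup_file /etc/system) && vi /etc/system && diff "$b" /etc/system
```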


SysAdmin Best Practices (4)

If you don’t understand it, don’t play with it on production systems

Get a QA environment for experiments, before mistakes cost you in production

If it can be accidentally used, and can produce bad consequences, protect it

Put scripts around powerful commands or procedures, boxes around power-off buttons

Occam’s Razor is very sharp

Check the simple stuff first, avoid complex solutions to simple problems

The last change is the most suspicious

Even if whatever changed couldn’t possibly be causing the current problem, it probably is

When in doubt, reboot

Rebooting still solves problems, when used appropriately


SysAdmin Best Practices (5)

If it ain’t broke, don’t fix it

Consider how much time has been wasted by those who said “just one more tweak”…

Save early and often

Don’t be the guy who lost his thesis when his floppy disk went bad

Dedicate a system disk (or 4)

Have a plan

Develop a written task list; reuse it when the task recurs, or use it as the basis for similar tasks

Cables and connectors can go bad

Be sure to check them, especially after board changes & system moves

Mind the power

Check power supplied vs. power drawn, grounding, single power grid vs. multiple into a system

Same with cooling


SysAdmin Best Practices (6)

Try before you buy

If possible, it is the best way to ensure that the solution fits your needs, in your environment

Don’t panic and have fun

Rash decisions cause serious problems

Know where you are

And make it very obvious!

E.g., color-coded windows & prompts
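One way to make "where you are" obvious is a prompt whose color depends on the host. A minimal sketch for bash, assuming a hypothetical site naming convention in which production hosts start with "prod". `host_color` is not a standard command, and ksh needs a different PS1 syntax.

```shell
# Hypothetical helper: pick an ANSI color code from the host name.
# Assumption: production hosts are named prod*; adjust to your site.
host_color() {
    case $1 in
        prod*) echo 31 ;;   # red: production, be careful
        *)     echo 32 ;;   # green: everything else
    esac
}

# bash only: \[ \] mark the escape sequences as zero-width for line editing
PS1='\[\033['"$(host_color "$(uname -n)")"'m\]\u@\h:\w\$ \[\033[0m\]'
```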


SysAdmin Best Practices (pearls)

Keep your propagation constant less than 1. (This comes from nuclear reactor physics. A reactor with a propagation constant less than 1 is a generator. More than 1 is a warhead. Basically, don’t let things get out of control.)

Everything works in front of the salesman.

Don’t cross the streams (Ghostbusters reference — heed safety tips).

If at first you don’t succeed, blame the compiler.

If you finish a project early, the scope will change to render your work meaningless before the due date.

If someone is trying to save your life, cooperate.

Never beam down to the planet while wearing a red shirt (Star Trek reference — don’t go looking for trouble).

Learning from your mistakes is good. Learning from someone else’s mistakes is better.

The fact that something should have worked does not change the fact that it didn’t.


SysAdmin Best Practices

The customer isn’t always right, but he pays the bills.

Flattery is flattery, but chocolate gets results.

When dealing with an enigmatic symptom, whether it’s an obscure application or database error or a system “hanging”, the hardware is always guilty until proven innocent.

Use only standard cross-platform file formats to share documentation (e.g., ASCII text, HTML, or PDF).

Use a log file on every computer to log every change you make.

Share your knowledge and keep no secrets.

Don’t reinvent the wheel, but be creative.

If you can’t live without it, print it out on hardcopy.

Always know where your software licenses are.

Always know where your installation CDs/DVDs/tapes are.

The question you ask as a sys admin is not “Are you paranoid?”; it’s “Are you paranoid enough?”
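For keeping a per-machine log of every change, a one-line helper lowers the friction enough that the log actually gets written. A minimal sketch for bash or ksh. `log_change` and its default path /var/adm/changelog are assumptions; set CHANGELOG to use another file.

```shell
# Hypothetical helper: append "date time user@host: message" to a change log.
# The default path is an assumption; override it with CHANGELOG=/some/file.
log_change() {
    logfile=${CHANGELOG:-/var/adm/changelog}
    printf '%s %s@%s: %s\n' "$(date '+%Y-%m-%d %H:%M')" \
        "${LOGNAME:-unknown}" "$(uname -n)" "$*" >> "$logfile"
}

# Usage: log_change "set project.cpu-shares=3 on testproject via prctl"
```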


SysAdmin Best Practices

Reboots are for pansies - avoid them at all costs - even when you think you need to perform one!

Users will eventually find out about the changes you have made to the system - there is no need to "inform" them with emails, meetings, man pages, etc.

If you haven't moved the cables - they are not the problem!

Cut your time estimates in half - a good Sys. Admin thrives on intense situations.

There is no better time to make a change than Friday afternoon; people will be more than willing to stay a little while extra to help you test and debug if it is necessary.

The people who write software don't know what they are doing - you have to choose your own settings every time you install a package

Backups take too long to produce and are rarely needed - make the system change and "wing it"!


References

You Are Now Free to Move About Solaris


References

[Kozierok] Kozierok, The TCP/IP Guide, No Starch Press, 2005

[Nemeth] Nemeth et al., Unix System Administration Handbook, 3rd edition, Prentice Hall, 2001

[SunFlash] The SunFlash announcement mailing list run by John J. McLaughlin. News and a whole lot more. Mail [email protected]

Sun online documents at docs.sun.com

[Kasper] Kasper and McClellan, Automating Solaris Installations, SunSoft Press, 1995


References (continued)

[O’Reilly] Networking CD Bookshelf, Version 2.0, O’Reilly 2002

[McDougall] Richard McDougall et al, Resource Management, Prentice Hall, 1999 (and other "Blueprint" books)

[Stern] Stern, Eisler, Labiaga, Managing NFS and NIS, 2nd Edition, O’Reilly and Associates, 2001


References (continued)

[Garfinkel and Spafford] Simson Garfinkel and Gene Spafford, Practical Unix & Internet Security, 3rd Ed, O’Reilly & Associates, Inc, 2003 (Best overall Unix security book)

[McDougall, Mauro, Gregg] McDougall, Mauro, and Gregg, Solaris Internals and Solaris Performance and Tools, 2007 (great Solaris internals, DTrace, mdb books)


References (continued)

Subscribe to the Firewalls mailing list by sending "subscribe firewalls <mailing-address>" to [email protected]

USENIX membership and conferences. Contact USENIX office at (714)588-8649 or [email protected]

Sun Support: Sun’s technical bulletins, plus access to bug database: sunsolve.sun.com

Solaris 2 FAQ by Casper Dik: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/Solaris2/FAQ


References (continued)

Sun Managers Mailing List FAQ by John DiMarco: ftp://ra.mcs.anl.gov/sun-managers/faq

Sun's unsupported tool site (IPv6, printing): http://playground.sun.com/

SunSolve STBs and Infodocs: http://www.sunsolve.com


References (continued)

comp.sys.sun.* FAQ by Rob Montjoy: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/comp-sys-sun-faq

“Cache File System” White Paper from Sun: http://www.sun.com/sunsoft/Products/Solaris-whitepapers/Solaris-whitepapers.html

“File System Organization, The Art of Automounting” by Sun: ftp://sunsite.unc.edu/pub/sun-info/white-papers/TheArtofAutomounting-1.4.ps

Solaris 2 Security FAQ by Peter Baer Galvin: http://www.sunworld.com/common/security-faq.html

Secure Unix Programming FAQ by Peter Baer Galvin: http://www.sunworld.com/swol-08-1998/swol-08-security.html


References (continued)

Firewalls mailing list FAQ: ftp://rtfm.mit.edu/pub/usenet-by-group/Comp.answers/firewalls-faq

There are a few Solaris-helping files available via anonymous ftp at ftp://ftp.cs.toronto.edu/pub/darwin/solaris2

Peter’s Solaris Corner at SysAdmin Magazine: http://www.samag.com/solaris

Marcus and Stern, Blueprints for High Availability, Wiley, 2000

Privilege Bracketing in Solaris 10: http://www.sun.com/blueprints/0406/819-6320.pdf


References (continued)

Peter Baer Galvin's Sysadmin Column (and old Pete's Wicked World security columns, etc.): http://www.galvin.info

My blog at http://pbgalvin.wordpress.com

Operating Environments: Solaris 8 Operating Environment Installation and Boot Disk Layout by Richard Elling, http://www.sun.com/blueprints (March 2000)

Sun’s BigAdmin web site, including Solaris and Solaris x86 tools and information: http://www.sun.com/bigadmin


References (continued)

DTrace: http://users.tpg.com.au/adsln4yb/dtrace.html

http://www.solarisinternals.com/si/dtrace/index.php

http://www.sun.com/bigadmin/content/dtrace/
