22
Using NoSQL databases to store RADIUS and Syslog data, part 1: Idea Karri Huhtanen 18.9.2012

Using NoSQL databases to store RADIUS and Syslog data

Embed Size (px)

DESCRIPTION

A seminar presentation done for TUT's NoSQL course. A brief look into the possibility and the feasibility of using NoSQL databases to store RADIUS accounting and Syslog data. In this particular case, Syslog-NG, Radiator RADIUS server and MongoDB were used as trial platforms. The presentation includes configuration examples and also some code.

Citation preview

Page 1: Using NoSQL databases to store RADIUS and Syslog data

Using NoSQL databases to store RADIUS and Syslog

data, part 1: IdeaKarri Huhtanen

18.9.2012

Page 2: Using NoSQL databases to store RADIUS and Syslog data

Some background

• currently RADIUS accounting data is stored usually in SQL databases with fixed database schema

• for Syslog messages an SQL database can be used, but commercial log analyzers (like Splunk) usually use their own solutions which may or may not be SQL databases

• Started thinking if NoSQL database could be applied to both or one of these?

Page 3: Using NoSQL databases to store RADIUS and Syslog data

RADIUS accounting messageWed  Aug    8  13:49:33  2012                User-­‐Name  =  "jotain@realm"                NAS-­‐Port  =  8                NAS-­‐IP-­‐Address  =  192.168.229.131                Framed-­‐IP-­‐Address  =  192.168.163.226                NAS-­‐Identifier  =  "Cisco_66:77:88"                Airespace-­‐WLAN-­‐Id  =  4                Acct-­‐Session-­‐Id  =  "50223ea9/00:11:22:33:44:55/2292"                Acct-­‐Authentic  =  Remote                Tunnel-­‐Type  =  0:VLAN                Tunnel-­‐Medium-­‐Type  =  0:802                Tunnel-­‐Private-­‐Group-­‐ID  =  0:222                Event-­‐Timestamp  =  1344422780                Acct-­‐Status-­‐Type  =  Alive                Acct-­‐Input-­‐Octets  =  1262012                Acct-­‐Input-­‐Gigawords  =  0                Acct-­‐Output-­‐Octets  =  13518133                Acct-­‐Output-­‐Gigawords  =  0                Acct-­‐Input-­‐Packets  =  11692                Acct-­‐Output-­‐Packets  =  11154                Acct-­‐Session-­‐Time  =  1235                Acct-­‐Delay-­‐Time  =  19                Calling-­‐Station-­‐Id  =  "00:11:22:33:44:55"                Called-­‐Station-­‐Id  =  "f4:7f:35:5e:bf:b0"                cisco-­‐avpair  =  "nas-­‐update=true"                Digest-­‐Response  =  "P"C<188>"                Digest-­‐Response  =  "P"C<194>"                Timestamp  =  1344422954

One message contains undetermined number of attributes.

Some can be interpreted, some stay unknown.

Because there can be a changing number of changing type of attributes I began to wonder if NoSQL could be used for storing these?

interpreted attributes,the unknown attributes are usually left in OID:FieldDataType binary format

Page 4: Using NoSQL databases to store RADIUS and Syslog data

Syslog messageThe  syslog  message  has  the  following  ABNF  [RFC5234]  definition:

           SYSLOG-­‐MSG            =  HEADER  SP  STRUCTURED-­‐DATA  [SP  MSG]

           HEADER                    =  PRI  VERSION  SP  TIMESTAMP  SP  HOSTNAME                                                SP  APP-­‐NAME  SP  PROCID  SP  MSGID            PRI                          =  "<"  PRIVAL  ">"            PRIVAL                    =  1*3DIGIT  ;  range  0  ..  191            VERSION                  =  NONZERO-­‐DIGIT  0*2DIGIT            HOSTNAME                =  NILVALUE  /  1*255PRINTUSASCII

           APP-­‐NAME                =  NILVALUE  /  1*48PRINTUSASCII            PROCID                    =  NILVALUE  /  1*128PRINTUSASCII            MSGID                      =  NILVALUE  /  1*32PRINTUSASCII

           TIMESTAMP              =  NILVALUE  /  FULL-­‐DATE  "T"  FULL-­‐TIME            FULL-­‐DATE              =  DATE-­‐FULLYEAR  "-­‐"  DATE-­‐MONTH  "-­‐"  DATE-­‐MDAY            DATE-­‐FULLYEAR      =  4DIGIT            DATE-­‐MONTH            =  2DIGIT    ;  01-­‐12            DATE-­‐MDAY              =  2DIGIT    ;  01-­‐28,  01-­‐29,  01-­‐30,  01-­‐31  based  on                                                                ;  month/year            FULL-­‐TIME              =  PARTIAL-­‐TIME  TIME-­‐OFFSET            PARTIAL-­‐TIME        =  TIME-­‐HOUR  ":"  TIME-­‐MINUTE  ":"  TIME-­‐SECOND                                                [TIME-­‐SECFRAC]            TIME-­‐HOUR              =  2DIGIT    ;  00-­‐23            TIME-­‐MINUTE          =  2DIGIT    ;  00-­‐59            TIME-­‐SECOND          =  2DIGIT    ;  00-­‐59            TIME-­‐SECFRAC        =  "."  1*6DIGIT            TIME-­‐OFFSET          =  "Z"  /  TIME-­‐NUMOFFSET            TIME-­‐NUMOFFSET    =  ("+"  /  "-­‐")  TIME-­‐HOUR  ":"  TIME-­‐MINUTE

           STRUCTURED-­‐DATA  =  NILVALUE  /  1*SD-­‐ELEMENT            SD-­‐ELEMENT            =  "["  SD-­‐ID  *(SP  SD-­‐PARAM)  "]"            SD-­‐PARAM                =  PARAM-­‐NAME  "="  %d34  PARAM-­‐VALUE  %d34            SD-­‐ID                      =  SD-­‐NAME            PARAM-­‐NAME            =  SD-­‐NAME            PARAM-­‐VALUE          =  UTF-­‐8-­‐STRING  ;  characters  '"',  '\'  and                                                                          ;  ']'  MUST  be  escaped.            SD-­‐NAME                  =  1*32PRINTUSASCII                                                ;  except  '=',  SP,  ']',  %d34  (")

           MSG                          =  MSG-­‐ANY  /  MSG-­‐UTF8            MSG-­‐ANY                  =  *OCTET  ;  not  starting  with  BOM            MSG-­‐UTF8                =  BOM  UTF-­‐8-­‐STRING            BOM                          =  %xEF.BB.BF

Until researching into this I thought

Syslog messages had fixed structure and

could be then handled with fixed database schema.

Then I read the RFC5424: http://

tools.ietf.org/html/rfc5424

Here we have once again parameters, although they are within one defined STRUCTURED-

DATA field.

So could NoSQL be used also for Syslog?

Page 5: Using NoSQL databases to store RADIUS and Syslog data

So what happens next?• Selection of NoSQL database:

• Likely Column Family Store if no one can suggest a better one?

• Something easy to setup and use, will concentrate into getting RADIUS server and/or Syslogd transferring data to database.

• Setting up a WiFi access point and/or controller to provide real RADIUS and Syslog data

• Storing data, retrieving data, searching data, deleting data to see what works

• Writing and presenting Part II: “Implementation and Results” of these slides

Page 6: Using NoSQL databases to store RADIUS and Syslog data

Results (hopefully)

• Is storing RADIUS accounting and Syslog messages into NoSQL database: a brilliant idea, brilliantly stupid idea or something else?

• How hard can it be? What does it require to do this, is it possible and how?

• Does it actually work? What can you do with data? Is there some indication of performance improvements or problems?

• Will not do complete performance measurements though, designing and setting up reliable measurement environment will probably take too much time.

Page 7: Using NoSQL databases to store RADIUS and Syslog data

Using NoSQL databases to store RADIUS and Syslog data,

part 1I: The Saga Continues

Karri Huhtanen27.11.2012

Page 8: Using NoSQL databases to store RADIUS and Syslog data

Happened earlier

• currently RADIUS accounting data is stored usually in SQL databases with fixed database schema

• for Syslog messages an SQL database can be used, but commercial log analyzers (like Splunk) usually use their own solutions which may or may not be SQL databases

• Started thinking if NoSQL database could be applied to both or one of these?

Page 9: Using NoSQL databases to store RADIUS and Syslog data

Results (luckily)

• Is storing RADIUS accounting and Syslog messages into NoSQL database: a brilliant idea, brilliantly stupid idea or something else?

• How hard can it be? What does it require to do this, is it possible and how?

• Does it actually work? What can you do with data? Is there some indication of performance improvements or problems?

• Will not do complete performance measurements though, designing and setting up reliable measurement environment will probably take too much time.

a good idea

easy, 1 night before presentation required

Yes. Store and Process. Unknown.Some issues to be considered.

Coded one Python script.

Page 10: Using NoSQL databases to store RADIUS and Syslog data

So what happened?• Selection of NoSQL database:

• Likely Column Family Store if no one can suggest a better one?

• Something easy to setup and use, will concentrate into getting RADIUS server and/or Syslogd transferring data to database.

• Setting up a WiFi access point and/or controller to provide real RADIUS and Syslog data

• Storing data, retrieving data, searching data, deleting data to see what works

• Writing and presenting Part II: “Implementation and Results” of these slides

MongoDB

Done, but not thoroughly

Done

Page 11: Using NoSQL databases to store RADIUS and Syslog data

That was the executive summary. Thank you.

Page 12: Using NoSQL databases to store RADIUS and Syslog data

Now some more detailed information and even some code.

Page 13: Using NoSQL databases to store RADIUS and Syslog data

storing RADIUS accounting and Syslog messages into NoSQL database

• It is a good idea because:

• When we have massive amount of log or accounting data, we need massive database clusters.

• Data is mainly stored, read, analyzed and occasionally deleted. Data will not be updated or changed and is relatively simple (few tables with a lot of columns).

• NoSQL may provide better way to scale this horizontally by distribution and sharding.

• It is already being done. Several log analyzers, stores already use NoSQL databases as backends. There exists projects such as Greylog2 etc. which provide complete solutions from log storage, visualization, analysis etc.

• Logs and accounting data are actually use cases for some NoSQL databases, for example: http://docs.mongodb.org/manual/use-cases/storing-log-data/

Page 14: Using NoSQL databases to store RADIUS and Syslog data

storing RADIUS accounting and Syslog messages into NoSQL database

• It is not a brilliant idea because:

• If we look what we need to do to optimize the performance it starts to look like a lot like designing and optimizing a SQL database: http://docs.mongodb.org/manual/use-cases/storing-log-data/

• You cannot forget datatypes or database design even with NoSQL databases especially when going into production.

• Prototypes may be faster and easier for developers, but creating a design and configuration which survices production use may be as hard as it has ever been. The difference is that instead of SQL database expert, you know need a NoSQL expert.

• ... but it is not a brilliantly stupid idea either, it is an idea worth considering depending of the project.

Page 15: Using NoSQL databases to store RADIUS and Syslog data

How hard can it be?

• With Ubuntu Linux Server 12.04 LTS:

• sudo apt-get install python-pymongo mongodb syslog-ng syslog-ng-mod-mongodb

• for Syslog-NG, just some configuration

• for Radiator, some configuration and coding an external Python script to handle accounting messages

• But this is far from production use, it is more like proto or proof of concept implementation done in 1 work day.

Page 16: Using NoSQL databases to store RADIUS and Syslog data

Demo

Page 18: Using NoSQL databases to store RADIUS and Syslog data

Radiator RADIUS server

# /etc/radiator/radiator.cfg## send all RADIUS accounting requests to external script#<Handler Request-Type = Accounting-Request> <AuthBy EXTERNAL> Command %D/acct2mongo.py </AuthBy> AcctLogFileName %L/acct-acct2mongodb-%Y-%M.log</Handler>

Page 19: Using NoSQL databases to store RADIUS and Syslog data

#!/usr/bin/env pythonfrom pymongo import Connectionimport datetimeimport sys

def main():

line = str() post = dict()

# opening connection connection = Connection( 'localhost', 27017) # database 'radius' db = connection['radius'] # collection 'accounting' collection = db['accounting']

post['acct2mongotimestamp'] = datetime.datetime.utcnow()

for line in sys.stdin.readlines(): pieces = line.split(' = ', 1) if len(pieces) == 2: post[pieces[0].strip().strip('"')]=pieces[1].strip().strip('"')

collection.insert(post) connection.end_request() connection.disconnect()

# 0 Means reply with an acceptance. For Access-Requests, # an Access-Accept will be sent. For Accounting-Requests, # an Accounting-Response will be sent. return 0

if __name__ == '__main__': main()

acct2mongo.py

Page 20: Using NoSQL databases to store RADIUS and Syslog data

Does it actually work? What can you do with data?

• Yes it does actually work, but once again it does not solve or be applicable to everything.

• One can store, read, search and delete data supposedly very efficiently, but anything more complicated is harder and must be implemented by developer.

• For example: MongoDB does not have a reliable decimal datatype. It is better to keep numbers as a string and convert them when processing data.

• Repeating earlier statement: “You cannot forget datatypes or database design even with NoSQL databases especially when going into production.”

Page 21: Using NoSQL databases to store RADIUS and Syslog data

Performance?

• Would need to be measured and verified and with real production environment or solution.

• Would also need to be compared with well designed and optimised SQL database, maybe even one functioning as NoSQL one.

• In the implementation this was not tested as the datasets were very small compared to real datasets.

Page 22: Using NoSQL databases to store RADIUS and Syslog data

Conclusions• NoSQL should be at least considered as an option

when designing and implementing large scale Syslog or Radius Accounting storages.

• For development it is flexible.

• For production use NoSQL solution still needs design, careful planning and testing to verify if the performance, reliability and security is enough. Probably as much as SQL database design.

• Key issue will probably be can the SQL database handle the data or is horizontal scaling required.