Synap Next™ - Document Filter Ver 3.x
COPYRIGHT © 2007 SYNAPSOFT.COM ALL RIGHTS RESERVED
Software that works!
▶ Overview
▶ Key features
▶ Block Diagram
▶ Specification
▶ Performance
Overview
• Synap Next™ Document Filter extracts text and document information from a variety of document files, including MS-Office Word / PowerPoint / Excel, PDF, Hansoft Hangul and JustSystems Ichitaro.
Document file formats (input)
• MS-Word (Doc)
• MS-PowerPoint (Ppt)
• MS-Excel (Xls)
• Hansoft Hangul (Hwp)
• JustSystems Ichitaro (JTD)
Synap Next Document Filter
• Format detection
• Error check
• Summary information
• Text extraction (full text, summary)
Output targets
• KMS, EDMS, Groupware, CMS (various systems)
• Search engine
• Database
• E-mail, document security, personal information (security systems)
Key features
1. Stable, fast document filtering
2. Automatic detection of supported file formats
3. Support for a wide range of document formats and platforms (OS)
4. Multi-thread support
5. Multilingual support based on Unicode
6. Support for document files attached to e-mail
7. Filtering of files embedded in compound document files
8. Filtering of document files inside zipped files
9. Easy-to-use API
10. Memory-management and file-filtering APIs
11. A single unified library for all document formats
12. C/C++ API included for customizing filter capability
13. C/C++ API can be exported and called from Java, Python, VB and Delphi
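Automatic format detection (feature 2) is commonly done by inspecting the first bytes of a file. The following is a minimal Python sketch of that general technique, not the product's detection logic. The signature table and the names `SIGNATURES`, `detect_format` and `detect_file` are assumptions introduced for illustration.

```python
from typing import Optional

# Well-known leading byte signatures ("magic numbers"). This table is an
# illustrative assumption, not the product's internal detection table.
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),  # DOC/XLS/PPT container
    (b"%PDF-", "PDF"),
    (b"{\\rtf", "RTF"),
    (b"PK\x03\x04", "ZIP"),
    (b"\x1f\x8b", "GZIP"),
    (b"BZh", "BZIP2"),
]


def detect_format(data: bytes) -> Optional[str]:
    """Return a format label based on the leading bytes, or None if unknown."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def detect_file(path: str) -> Optional[str]:
    """Read only the first bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return detect_format(handle.read(16))
```

Signature checks only identify the container. Telling a Word file from an Excel file inside an OLE2 container would require reading the container's internal streams, which this sketch does not attempt.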
Block Diagram
▶ Input
• Any Document, Html, Text
• File Stream Interface / Memory Stream Interface
▶ Entry points
• Docu2Txt, doc2Txt, ppt2Txt, xls2Txt, hwp2Txt, Pdf2Txt, Html2Txt, Text2Txt
▶ Detection and document information
• File Format Detector
• Doc Information Extractor
• Character Set Detector
▶ Analyzers
• WorldAnalyzer : MS Word 95, 97, 2000/3/7, XP
• PowerPointAnalyzer : MS PowerPoint 95, 97, 2000/3/7, XP
• ExcelAnalyzer : MS Excel 95, 97, 2000/3/7, XP
• HwpAnalyzer : HWP 2.3.x, 96, 97, 2002/7
• PDFAnalyzer : Adobe PDF
• HtmlAnalyzer
• OLE2 Interface (Microsoft Office & Hwp)
▶ Source-to-Unicode converters
• Office Code (ASCII, DBS2, UCS2) to Unicode Converter
• HWP to Unicode Converter
• Cid to Unicode Converter
• Special to Unicode Converter
▶ Unicode Output Buffer Manager (UTF16 Pivot, Surrogate Unsupported, Language Tag Aware)
▶ Output converters
• Unicode to KS5600 Converter
• Unicode to Unicode Converter
• Unicode to Japanese Converter
• Unicode to Chinese Converter
▶ Data flow
• Input Document Data → Extract Text Processing → Output Text Data
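The diagram routes extracted text through a UTF-16 pivot buffer and then through per-language output converters. The pivot idea can be sketched with Python's built-in codecs, as below. The codec names chosen for each converter are assumptions (in particular, `euc_kr` is only an assumed stand-in for the "KS5600" converter), and `OUTPUT_CODECS`, `to_pivot` and `from_pivot` are hypothetical names, not part of the product API.

```python
# Mapping from output target to a Python codec name. These pairings are
# assumptions for illustration; the slide does not name exact encodings.
OUTPUT_CODECS = {
    "korean": "euc_kr",       # assumed stand-in for the "KS5600" converter
    "japanese": "shift_jis",
    "chinese": "gb2312",
    "unicode": "utf-8",
}


def to_pivot(raw: bytes, source_codec: str) -> bytes:
    """Decode source bytes and re-encode them as a UTF-16LE pivot buffer."""
    return raw.decode(source_codec).encode("utf-16-le")


def from_pivot(pivot: bytes, target: str) -> bytes:
    """Convert the UTF-16LE pivot buffer to the requested output encoding."""
    text = pivot.decode("utf-16-le")
    # Characters missing from the target character set are replaced with "?".
    return text.encode(OUTPUT_CODECS[target], errors="replace")


# Source text in a legacy Korean code page, converted once into the pivot
# and then written out in two different target encodings.
pivot = to_pivot("문서 필터".encode("cp949"), "cp949")
korean_bytes = from_pivot(pivot, "korean")
utf8_bytes = from_pivot(pivot, "unicode")
```

With a single pivot encoding, each input format needs only one converter into the pivot and each output language needs only one converter out of it, instead of one converter per input/output pair.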
Specification
▶ Supported Documents
• Microsoft Office Word (Doc), PowerPoint (Ppt), Excel (Xls) : 95, 97, 2000, XP, 2003, 2007
• (Korean) Hansoft Hangul (HWP) : 2.x, 3.x, 96, 97, wordian, 2002, 2004, 2005, 2007
• Adobe Acrobat PDF : PDF 1.x
• Rich Text Format (RTF)
• (Korean) Handy Soft Arirang (HWD)
• (Japanese) JustSystems Ichitaro (JTD) : 8 ~ 12
• Microsoft Document Imaging (MDI)
• Microsoft Outlook Message (MSG)
• Open Office (Odt, Ods, Odp) : 1.x, 2.x
• WordPerfect (WP, WPD) : 4.0, 5.0, 6.0 ~ X3
• Autodesk Drawing File (DWG) : R11 ~ R14, 2000, 2004, 2005
• Flash Movie File (SWF) : 2 ~ 8
• Compressed file (ZIP, TAR, GZIP, (Korean) ARZ, BZIP)
• XML/HTML, MHT, CHM, EML, MIME, TEXT, MP3 TAG
▶ Various platforms (OS) Support
• IBM AIX 4.3, 5.x
• RedHat Linux 6.x, 7.x, 8.x, 9.x, 10.x
• RedHat Enterprise Linux 3, 4, 5, 6
• RedHat Fedora ~7
• Solaris 7, 8, 9, 10, 10 x86
• HP-UX IA 11.x
• Windows 95/98/NT/2000/XP/2003/Vista
▶ Various compilers Support
• gcc 2.x, 3.x, 4.x
• MS-VC 6, 7, 8
• xlC
• aCC
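Compressed files appear in the supported documents list, and the key features mention filtering documents inside zipped files. The sketch below shows one way such a traversal could work, using Python's standard `zipfile` module. It illustrates the general idea only. The function name `iter_zip_members` and the `max_depth` limit are assumptions, not part of the product.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_zip_members(data: bytes, max_depth: int = 3) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, content) for each file in a ZIP archive, descending into nested ZIPs."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            content = archive.read(info)
            # A member that starts with the ZIP signature is treated as a nested archive.
            if max_depth > 0 and content.startswith(b"PK\x03\x04"):
                try:
                    for inner_path, inner in iter_zip_members(content, max_depth - 1):
                        yield f"{info.filename}/{inner_path}", inner
                    continue
                except zipfile.BadZipFile:
                    pass  # not a valid ZIP after all; fall through and yield raw bytes
            yield info.filename, content
```

Each yielded member could then be handed to a format detector and the matching text extractor. The depth limit guards against deeply nested or maliciously recursive archives.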
Performance
▶ Test Data
Item                          doc      xls      ppt        pdf      txt     Total
Number of files               258      298      58         412      190     1,216
Sum total (MB)                100      135      70.2       100      3.57    408.77
Average (KB)                  398.15   463.99   1,240.26   249.68   19.24   2,371.32
Output file size (KB)         6,521    36,772   650        17,996   5,867   67,806
▶ Windows Test
OS : Windows XP Professional SP2, CPU : Intel Pentium 4 2.33 GHz, Memory : 256 MB
(Unit : second)
Item                          doc      xls      ppt        pdf      txt     Average
Total turnaround time         7.15     18.92    2.36       47.24    2.94    17.72
Total filtering time          7.09     18.55    2.35       47.06    2.88    15.58
Total text-file output time   0.07     0.37     0.01       0.18     0.06    0.138
Average turnaround time       0.03     0.06     0.04       0.11     0.02    0.05
▶ Linux Test
OS : Redhat Fedora Core 6 Kernel 2.6.18, CPU : Intel Pentium 4 3.0 GHz, Memory : 256 MB
(Unit : second)
Item                          doc      xls      ppt        pdf      txt     Average
Total turnaround time         5.06     15.91    2.55       27.00    1.06    10.31
Total filtering time          5.00     15.64    2.54       26.65    1.02    10.17
Total text-file output time   0.06     0.27     0.01       0.35     0.04    0.14
Average turnaround time       0.02     0.05     0.04       0.06     0.005   0.03
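The "Average turnaround time" figures appear to be the total turnaround time divided by the number of files per format. The short Python check below reproduces the Windows row from the Test Data file counts under that assumption. The variable and function names are introduced here for illustration.

```python
# File counts from the Test Data table and total turnaround times (seconds)
# from the Windows test, copied from the slide.
FILE_COUNTS = {"doc": 258, "xls": 298, "ppt": 58, "pdf": 412, "txt": 190}
WINDOWS_TOTAL_TURNAROUND = {"doc": 7.15, "xls": 18.92, "ppt": 2.36, "pdf": 47.24, "txt": 2.94}


def average_turnaround(totals, counts):
    """Seconds per file for each format, rounded to two decimals."""
    return {fmt: round(totals[fmt] / counts[fmt], 2) for fmt in totals}


averages = average_turnaround(WINDOWS_TOTAL_TURNAROUND, FILE_COUNTS)
print(averages)
# {'doc': 0.03, 'xls': 0.06, 'ppt': 0.04, 'pdf': 0.11, 'txt': 0.02}
```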
▶ Example case
▶ Customer
▶ Synap Next™
Example Case
1. Search Business
▶ Search Engine, Portal site
• Electronic document retrieval system : A search engine can index text from document files without a word processor.
• Search result preview
• Customers : Naver (Portal : 7 consecutive years), Empas (Portal : 5 years), Daum (Portal : 5 years), Happycampus (Document Portal : 2 years), 3Soft (K2 Search Engine : 4 years), Korea Wisenut (Search : 4 years), KONAN (Search : 4 years), OPENBASE (Search : 5 years), Fujitsu Korea (Search), Daumsoft (Search : 5 years), Diquest (Search : 5 years), REPIA (Search : 5 years), etc.
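Once a filter has turned binary documents into plain text, a search engine can index that text like any other content. The Python sketch below builds a tiny inverted index over already-extracted text. The sample documents and the `build_index` function are hypothetical, and real search engines use far more elaborate tokenization and storage.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


# Stand-ins for text that a filter would have extracted from binary documents.
extracted = {
    "report.doc": "Quarterly sales report",
    "budget.xls": "Sales budget for 2007",
}
index = build_index(extracted)
hits = sorted(index["sales"])  # documents that mention "sales"
```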
▶ Desktop Search
• A filter embedded in a desktop search tool helps people find documents.
• NHN, Empas and Konan Technology use the filter in their desktop search products.
▶ KMS, EDMS, EKP, CMS
• The filter lets KMS, EDMS, EKP and CMS systems search the contents of document files.
• Knowledge Cube, Shinsegae I&C and OnTheIT have adopted the filter for their products.
2. Security Business
Network elements shown in the diagram : Router, Firewall, Switch, Server (DB, Mail, File), PC
▶ A filter embedded in a security solution can inspect security-sensitive documents, e-mail and other content on the network.
• Web application firewall (WAF) : Secui.com, Piolink, Monitorapp, PentaSecurity, Inca Internet, Winstechnet, etc.
• Spam mail : Mobizen, JIRAN soft, Terracetech, etc.
• Personal information management system (PIMS) : Sentineltechnology, Xcerenet, expernet, winnerdime, etc.
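A security product that receives extracted text can scan it for patterns that look like personal information. The following Python sketch illustrates that idea with two deliberately simple regular expressions. The patterns, the sample text and the `find_personal_info` function are assumptions for illustration and are not taken from any of the listed products.

```python
import re

# Deliberately simple patterns for illustration; real detection rules
# would be stricter and would cover more kinds of identifiers.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b"),
}


def find_personal_info(text):
    """Return every match of each pattern found in the extracted text."""
    return {kind: pattern.findall(text) for kind, pattern in PATTERNS.items()}


# Stand-in for text extracted from an attached document.
sample = "Contact: hong@example.com, TEL 02-123-4567"
findings = find_personal_info(sample)
```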
Customer
• Internet Service
• Search Engine Solution & SI
• Security Solution
• KM, EKP, CMS Solution
• Government Agency
National Security Research Institute
Electronics and Telecommunications Research Institute
Korea Information Security Agency
Korea Industrial Technology Foundation
Synap Next™ Products
SYNAP NEXT™
Document Filter
• Extracts text and document information from various document files
• MS-Office Word / PowerPoint / Excel, HWP, PDF, HTML, Zip, etc.
• Support for various platforms (OS)
Converter
• Converts document files to HTML/XML
• MS-Office Word / PowerPoint / Excel, HWP
• Support for various platforms (OS)
Web Office
• Open beta service is expected in March 2008.
• Work on documents, spreadsheets and presentations on the web
▶ Contact us
Contact Us
Synapsoft Corporation
Rm.706, Woolim e-BIZ Center II, 184-1, Guro 3-dong, Guro-gu, Seoul 152-769, Korea
TEL) 82-2-890-3400 FAX) 82-2-890-3414
Homepage : http://www.synap.co.kr , Blog : http://synap.tistory.com
▶ Technical Consulting & Support
• Sungyeon Lee
TEL) 82-2-890-3406 E-Mail) [email protected]
▶ Sales
• Jaesung Kim
TEL) 82-2-890-3402 E-Mail) [email protected]