Software Architecture 18: Crowd Sourcing
Haopeng Chen
REliable, INtelligent and Scalable Systems Group (REINS), Shanghai Jiao Tong University
Shanghai, China
http://reins.se.sjtu.edu.cn/~chenhp
e-mail:[email protected]
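
Traditional Software Engineering
• Addresses the software crisis
– Develops software with "engineering" methods to improve development efficiency and correctness
• Characteristics
– Elite-driven
– Plan-driven
– Closed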
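
New Problems Facing Software Development
• How can "feature-rich" software be constructed and evolved quickly?
– Software is large, has many functions, is updated frequently, and must be highly extensible; development technologies change rapidly
• How can existing resources be fully used and refined into better solutions or target software?
– Software development is homogeneous: the Internet (e.g., Google, SourceForge) hosts many code resources and technical solutions, yet a great deal of development is duplicated
• How can software development be coupled with the structure of social networks?
– In a highly connected social network, the best information found so far spreads quickly, so software can be developed faster
• How can software be evolved to adapt to new requirements and new environments?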
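
Developing an Integrated Development Environment
• How would one use software engineering to develop an IDE that supports editing, compiling, debugging, and running programs in all kinds of languages?
– Requirements analysis
– Architecture design
– Implementation
– Testing
– Deployment
– Maintenance
– Version upgrades
– …
Drawback: the software is not flexible enough; it cannot adapt to new languages or support new functional requirements
[Figure: Visual Studio timeline]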
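
The Eclipse Integrated Development Environment
• Four components: JDT supports Java development, CDT supports C development, PDE supports plug-in development, and the Eclipse Platform is an open, extensible IDE that provides the building blocks and the foundation for constructing and running integrated software development tools
• The Eclipse Platform lets tool builders independently develop tools that integrate seamlessly with other people's tools, so that users need not tell where one feature ends and another begins
• Eclipse may become an IDE integrator for development in any language; users only need to download the plug-ins for the languages they use
Eclipse is a successful software project that relies on collective intelligence; no one can predict which direction it will take!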
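
The Eclipse Integrated Development Environment (continued)
• The Eclipse project has become one that both companies and individuals can contribute to
[Figure: contributions by company over 3 months; contributions by individual over 3 months]
• Anyone can report Eclipse bugs online (Bugzilla)
• Stimulated and motivated by collective intelligence, Eclipse keeps gaining functionality and its quality is stabilizing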
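
Goal
• Today's software development is fast-paced, distributed, and constantly changing; use collective intelligence as the driving force of development to build highly complex, highly open, high-quality software
• Potential approach: collective-intelligence-driven software construction and evolution
– Key prerequisites: social networks, problem search
– Research question 1: collective-intelligence-driven construction of complex software
– Research question 2: software co-evolution
– Research question 3: continuous software quality assurance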
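
Theoretical Foundations
[Figure: software construction and evolution theory + collective intelligence → collective-intelligence-driven software construction and evolution]
• Software construction and evolution
– Divide and conquer: decompose the development task into parallel or hierarchical tasks and assign them to developers, who may interact with one another
– Software evolution gives the software new features or adapts it to a new environment
• Collective intelligence
– For example, the ant colony algorithm: "Each ant attends only to immediate information within a very small range and makes decisions from this local information using a few simple rules. In this way, complex behavior emerges in the colony as a whole."
• Collective-intelligence-driven software construction and evolution
– Driven by collective intelligence, balance software goals, architecture design, algorithm implementation, quality control, and organizational form so that software is constructed and evolved efficiently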
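
The Essence of Collective-Intelligence-Driven Software Construction and Evolution
• Powerful collective development capacity
– A crowd can develop complex software efficiently
• Collective intelligence guides the direction in which software develops
– Through crossover, mutation, reproduction, and similar operations, software can converge quickly and can also diverge quickly
• Software construction and evolution
– Collective intelligence drives the construction and evolution of high-quality, highly complex, highly open software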
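
Patterns of Collective-Intelligence-Driven Software Construction
• Task partitioning, assignment, and composition
• Social-network-based multi-team collaborative software construction (example: the Codebook project)
• Reusing existing projects (example: Chrome browser development)
• Open software construction (example: Chrome browser extensions)
• Search-based problem solving in software construction
• Automatic software generation and crossover
• Crowd-AI interaction
• Developing software through crowdsourcing (example: TopCoder)
• …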
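
A Simple Software Construction Process
• Development tasks can be decomposed into small, independent subtasks
• Lacks intelligence
– Unsuitable for complex, innovative, highly skilled development tasks
Intelligence must be introduced so that the process suits large-scale, diverse software construction and evolution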
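
A Complex Software Construction Process
• Requires an integrated platform for managing tasks and developers
• Complex tasks must be decomposed into smaller subtasks, each designed to meet specific needs or to have specific characteristics so that it can be assigned to a suitable group of developers. The crowd must be properly motivated, selected (e.g., by reputation), and organized (e.g., in a hierarchy)
• Tasks may be organized into multi-stage workflows in which developers collaborate synchronously or asynchronously. AI may guide the crowd (or be guided by it)
• Software quality must be assured: each developer's output must be of high quality, and the pieces must fit together cleanly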
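
Task Partitioning and Composition
• Partition the software construction task, describe and manage the dependencies between tasks and subtasks, and compose the construction results
– Model software behavior
– Encapsulate and reuse mature design patterns
– Keep tasks cohesive and loosely coupled
– To improve software construction, try and iterate over different workflow parameters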
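One way to picture "managing the dependencies between subtasks" is a topological ordering. The sketch below is only an illustration: the subtask names are hypothetical, and it uses Python's standard `graphlib` module (Python 3.9+) to group subtasks into batches that could be handed to developers in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks, each mapped to the subtasks it depends on.
dependencies = {
    "parser": set(),
    "type_checker": {"parser"},
    "code_generator": {"type_checker"},
    "editor_plugin": {"parser"},
    "release_build": {"code_generator", "editor_plugin"},
}

def plan_batches(deps):
    """Group subtasks into batches; a batch only needs earlier batches to be finished."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # subtasks whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)
    return batches

batches = plan_batches(dependencies)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Each batch contains subtasks with no dependency on one another, so they can be assigned to different developers at the same time; composition then follows the batch order.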
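
Social-Network-Based Multi-Team Collaborative Software Construction
• Codebook is a social-network-based web service that supports task coordination in large development projects
– By sharing artifacts and tasks and identifying the relationships between people and artifacts, it provides various kinds of development coordination information
– In the example figure, development manager Pam and developer Dave jointly develop the Squre method; Bug #673 also ties Pam and Dave closely together …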

Reusing Existing Projects
• The Google Chrome browser brings together more than 100 open-source projects:
David M. Gay's floating point routines; open-vcdiff; gpsd; libjingle; mtpd; Simplified Wrapper and Interface Generator (SWIG);
dynamic annotations; WebKit; GIMP Toolkit; libjpeg; Netscape Plugin Application Programming Interface (NPAPI); talloc;
Netscape Portable Runtime (NSPR); MSDN sample code; hunspell; libjpeg-turbo; ocmock; tcmalloc;
google-glog's symbolization library; Almost Native Graphics Layer Engine; hunspell dictionaries; International Phone Number Library; OpenMAX IL; tlslite;
valgrind; Darwin; hyphen; libpng; opus; undoview;
xdg-mime; Apple sample code; IAccessible2 COM interfaces for accessibility; libsrtp; OTS (OpenType Sanitizer); The USB ID Repository;
xdg-user-dirs; WebKit private system interface; iccjpeg; libusb; PLY (Python Lex-Yacc); Internationalization Library for v8;
Breakpad, An open-source multi-platform crash reporting system; Android; icon_family; libva; Protocol Buffers; Webdriver;
BSDiff; Binary-, RedBlack- and AVL-Trees in Python and Cython; icu; libvpx; Python FTP server library; WebRTC;
XZ Utils; bsdiff; Chinese and Japanese Word List; WebP image encoder/decoder; mock; The Windows Installer XML (WiX);
Google code support upload script; bspatch; ISimpleDOM COM interfaces for accessibility; libxml; Quick Color Management System; wtl;
Java Native Interface from Android NDK; bzip2; jemalloc; libxslt; re2 - an efficient, principled regular expression library; x86inc;
mock4js; Google Cache Invalidation API; jsoncpp; libyuv; google-safe-browsing; XUL Runner SDK;
Mozilla Personal Security Manager; Compact Language Detection; google-jstemplate; LZMA SDK; sfntly; yasm;
Network Security Services (NSS); codesighs; Khronos header files; mach_override; simplejson; zlib;
google-url; devscripts; launchpad-translations; mesa; skia; V8 JavaScript Engine;
native client; Expat XML Parser; LCOV - the LTP GCOV extension; modp base64 decoder; SMHasher; Strongtalk;
Native Client SDK; eyesfree; NSBezierPath additions from Sean Patrick O'Brien; Snappy: A fast compressor/decompressor;
ffmpeg; LevelDB: A Fast Persistent Key-Value Store; Mongoose webserver; speex;
Spdyshark; flac; NVidia Control X Extension Library; Mozc Japanese Input Method Editor; sqlite;
PPAPI; Flot Javascript/JQuery library for creating graphs; libevent; Cocoa extension code from Camino; Sudden Motion Sensor library;
seccompsandbox; OpenGL ES 2.0 Programming Guide; libexif; mt19937ar; SwiftShader software renderer
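
Open Software Construction
• Google Chrome extensions can greatly extend the browser's functionality
– Extension capabilities: capturing the content of specific web pages, capturing HTTP messages, capturing user browsing actions, …
– Extensions are simple to develop; the language is JavaScript, so developers can get started quickly
– Extension developers can upload their extensions to the Chrome Web Store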
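
Search-Based Problem Solving in Software Construction
• Use heuristic search to solve key problems or algorithms in software construction
– Define the search space for the key problem (i.e., the space made up of the candidate solutions)
– Search the solution space heuristically
– Design metrics to evaluate solutions
• Heuristic search can be applied throughout software development
– Requirements engineering
– Software design
– Software construction
– …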
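The three ingredients listed above (a search space, a heuristic search, and an evaluation metric) can be sketched with a small hill-climbing example. Everything here is an assumption made for illustration: the problem (choosing reusable components that cover a set of required features at low price), the component data, and the penalty of 100 per uncovered feature.

```python
def cost(selection, components, required):
    """Metric: total price plus a heavy penalty for each uncovered required feature."""
    covered = set()
    price = 0
    for chosen, (features, component_price) in zip(selection, components):
        if chosen:
            covered |= features
            price += component_price
    return price + 100 * len(required - covered)

def hill_climb(components, required):
    """Steepest-descent search; the search space is all bit vectors over the components."""
    current = [False] * len(components)
    current_cost = cost(current, components, required)
    while True:
        best, best_cost = None, current_cost
        for i in range(len(current)):
            neighbour = current.copy()
            neighbour[i] = not neighbour[i]      # neighbours differ in exactly one bit
            c = cost(neighbour, components, required)
            if c < best_cost:
                best, best_cost = neighbour, c
        if best is None:                          # no neighbour improves: local optimum
            return current, current_cost
        current, current_cost = best, best_cost

# Hypothetical candidate components: (features provided, price).
components = [
    ({"parse", "lint"}, 5),
    ({"parse"}, 2),
    ({"lint", "format"}, 4),
    ({"format"}, 3),
]
required = {"parse", "lint", "format"}
solution, solution_cost = hill_climb(components, required)
print(solution, solution_cost)
```

Hill climbing can stop at a local optimum; restarts or other metaheuristics such as genetic algorithms are common ways around that.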
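
Automatic Software Generation and Crossover
• Generate software automatically through machine learning
– Mine historical code repositories for useful code
– Synthesize code from partial input/output data
• Software crossover
– Reuse components or high-quality code from legacy software
– Generate the target software by crossing multiple software systems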
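
Developing Software Through Crowdsourcing
• TopCoder is a website for programmers that uses competitions, ratings, payments, and similar mechanisms to attract many programmers to work in their spare time
– TopCoder's clients include AOL and Merrill Lynch
– TopCoder splits software projects into many small units, posts them online, and invites top programmers worldwide to compete for them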
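
Crowd-AI Interaction
[Figure: the crowd guides AI; AI guides the crowd]
• People become computational components that carry out the (computational) tasks AI systems cannot handle
– Algorithms benefit from crowd-generated training data
– Algorithms simulate human cognition
– Design machine learning algorithms
– Design algorithms that trade off the cost and performance of software development
• AI guides crowd development
– Build development models through machine learning and mining of historical data
– AI and the crowd interact, train and learn from each other, and jointly control the complex software construction process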

Theory of Evolution
• "It is not the strongest of the species that survives, nor the most intelligent, but the one most adaptable to change."
– Attributed to Charles Darwin, On the Origin of Species
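
Example: The Evolution of the Linux Operating System
• Linux was started by Linus Torvalds, then a student at the University of Helsinki in Finland
• Linux is free software: users can obtain it and its source code at no cost and may modify and extend them freely
• Linux is in a phase of free development and has spawned hundreds of (commercial and non-commercial) distributions, such as Red Hat and Debian, each with its own characteristics
• Today, thanks to its excellent design and strong performance, plus strong backing from well-known companies such as IBM and Intel, Linux has gradually become one of the mainstream operating systems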
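
Genetic Algorithms in Software Evolution
• Main features of genetic algorithms
– Bit-string representation
– Proportional (fitness-proportionate) selection
– Selection, crossover, mutation, elitism, and reproduction as the main ways of producing new individuals
How should individuals be represented in software evolution?
What are the crossover and mutation operators for software evolution?
Should the evolution of the social network be used to guide software evolution?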
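The features listed above can be seen together in a minimal genetic algorithm. This is a sketch on a toy problem (OneMax, maximizing the number of 1 bits), not a proposal for how software individuals should be encoded; the population size, mutation rate, and seed are arbitrary assumptions.

```python
import random

def fitness(bits):
    """Toy fitness (OneMax): the number of 1 bits in the string."""
    return sum(bits)

def select(population, rng):
    """Fitness-proportionate (roulette-wheel) selection."""
    weights = [fitness(ind) + 1 for ind in population]  # +1 keeps every weight positive
    return rng.choices(population, weights=weights, k=1)[0]

def crossover(a, b, rng):
    """One-point crossover of two bit strings of equal length."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(length=20, size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]
    history = []
    for _ in range(generations):
        elite = max(population, key=fitness)
        children = [elite]                      # elitism: the best individual survives
        while len(children) < size:
            child = crossover(select(population, rng), select(population, rng), rng)
            children.append(mutate(child, rate, rng))
        population = children
        history.append(fitness(max(population, key=fitness)))
    return max(population, key=fitness), history

best, history = evolve()
print(fitness(best), history[:5])
```

Because of elitism, the best fitness recorded in `history` never decreases from one generation to the next. The open questions on the slide are exactly the parts this sketch sidesteps: what plays the role of the bit string, and what crossover and mutation mean for real software.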
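
Software Co-Evolution
• Competitive co-evolution
– Between antivirus software and viruses, or between operating systems (e.g., Windows and Linux)
– Software populations adapt to one another as they evolve, so that all of the software co-evolves
– How should competitors be set up for the target software, and which fitness measure should be used?
• Cooperative co-evolution
– The software is partitioned; each "component" evolves independently so that the software as a whole is optimized
– Particle swarm optimization and ant colony algorithms are typical examples of cooperative co-evolution
– How should information exchange be established between software and the social network, among software "components", and between components and the environment?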

Expected Software Quality
[Figure: failure rate versus time, showing the ideal curve and the actual curve; each change raises the failure rate because of side effects]
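
Software Quality Assurance
• In intelligence-driven software construction and evolution, software goes through crossover, mutation, reproduction, and co-evolution, so its quality fluctuates frequently
– How can the impact of collective intelligence on software quality be measured?
• Qualitative and quantitative
– How can quality be improved during software construction and evolution?
• Quality needs to rise steadily
– How can a quality assurance framework and methods be established for intelligence-driven software construction and evolution?
• Traditional testing and analysis may not suit (collective-intelligence-driven) software construction and evolution; new quality assurance methods are needed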
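
A Platform for Collective-Intelligence-Driven Software Construction and Evolution
• Support collective-intelligence-style software construction
– Support software development, debugging, integration, deployment, operation, and management
– Support heuristic search and genetic algorithms to solve key hard problems
• Support software co-evolution
– Support competitive and cooperative software evolution
– Support software crossover, mutation, and reproduction
• Support a continuous quality assurance framework, tools, and methods
– Support the analysis of frequent software changes and the discovery and repair of defects
– Support quality assurance for software that evolves while in operation
• Build an Internet-based software repository to accumulate and manage software assets and to support collective-intelligence-driven software construction and evolution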
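
Core Scientific Problems
• Search: computability theory and nondeterminism in problem solving
• Evolution: context-based reasoning about and transformation of software behavior
• Quality: fluctuation and convergence of software quality
Scientific challenges
• Collective-intelligence-driven software construction
• Software co-evolution
• Continuous software assurance
Scientific problems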

Introduction
• What is crowdsourcing?
– Outsourcing a task via an open call
• Crowdsourcing websites
– MTurk / TopCoder / Upwork / CrowdFlower
– 80,000 jobs, 5+ million workers in Upwork
• Task-worker matching plays a crucial role
• How to describe task requirements and worker skills
• What criteria should be given in the description

Introduction (continued)
• Description with natural language (Taskcn)
– Not machine-readable
– Inefficient
– Subjective
• Tags (Upwork)
– Not sufficient to articulate task publishers' needs
• task A (Java && Javascript)
• task B (Java || Javascript)
– Requirements are not exhaustive
– Matching rules vary by skill
– No single suitable worker for the task
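The task A / task B contrast can be made concrete with a few lines of Python. The worker names and skill sets below are hypothetical; the point is only that the same tag set means different things under an "and" rule and an "or" rule, which plain tags cannot express.

```python
def matches_all(worker_skills, required):
    """Task A style rule: every listed skill is required (Java && Javascript)."""
    return required <= worker_skills

def matches_any(worker_skills, required):
    """Task B style rule: one listed skill is enough (Java || Javascript)."""
    return bool(required & worker_skills)

tags = {"Java", "Javascript"}  # both tasks would carry exactly the same tags

# Hypothetical workers and their skills.
workers = {
    "w1": {"Java", "Javascript"},
    "w2": {"Java"},
    "w3": {"Python"},
}

task_a = [name for name, skills in workers.items() if matches_all(skills, tags)]
task_b = [name for name, skills in workers.items() if matches_any(skills, tags)]
print(task_a)
print(task_b)
```

With tags alone, w2 is either wrongly offered task A or wrongly withheld from task B, depending on which fixed rule the platform applies.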

A Solution – STWM
• Meta-model for description
– Extensible / customized
• Self-adaptive task-worker matching algorithm
– Efficient
– Match tasks and workers according to the customized rules
– Recommend suitable workers in a certain order
• Team formation
– Workers to form a team
– Recommend teams to the task publisher

Framework of STWM

Meta model for description
• constraint must be satisfied during the matching process
• match: =, within, >, <, ≠ … API provided
• composite: Max, Min, ∪, ∩ … API provided

Meta model for description (continued)
Definitions of the metadata class for properties of time, pay and language skill

Meta model for description (continued)
A definition of class language and two instances of this class

Meta model for description (continued)
// language_of_task1
{
  "name": "language",
  "value": ["java", "C++", "javaScript"],
  "weight": 0.9,
  "skill_level": {
    "class": "com.stwm.Operation",
    [
      {"op1": ">", "args1": [3.0, "double"]},
      {"op2": ">", "args2": [3.0, "double"]},
      {"op3": ">", "args3": [3.0, "double"]}
    ]
  },
  "constraint": {
    "class": "com.stwm.Operation",
    [
      {"op1": "in", "args1": ["java", "collection"]},
      {"op2": "in", "args2": ["C++", "collection"]},
      {"op3": "||", "args3": ["?", "?"]},
      {"op4": "in", "args4": ["javaScript", "collection"]},
      {"op5": "&&", "args5": ["?", "?"]},
      {"op6": "contain", "args6": [["java", "C++", "javaScript"], "collection"]}
    ]
  }
}

Meta model for description (continued)
• Class definition for task and worker:
score: a criterion used to sort matched workers
skill_weight: the weight of the skill_requirement among the four listed property requirements in the task class

Matching algorithm for individual worker
• necessary property:
– if p.domain ≠ skill and p.weight ≥ baseline_weight
– if p.domain = skill and p.weight ≥ avg_weight(skills)
• calculation formula of worker.score
– w is an instance of Class worker, p' is the property of the worker w with the same property name as p

Matching algorithm for individual worker (continued)
Algorithm 1. Task-worker matching algorithm
Input: Set<worker> W; task T.
Output: Set<worker> W'.
function matching(Set<worker> W, task T):
    FinalSet, PreferCandidate, Candidate, W' ← ∅;
    Set<worker> ℛ = Cluster(W, T);
    for each worker w in ℛ:
        if ∀ p ∈ M, p.match(p') = true then
            Calculate w.score;
            FinalSet = FinalSet ∪ w;
        else if ∀ p ∈ M', p.match(p') = true then
            Calculate w.score;
            PreferCandidate = PreferCandidate ∪ w;
        else if ∃ p ∈ M, p.match(p') = true then
            Candidate = Candidate ∪ w;
        end if
    end for
    if FinalSet ≠ ∅ then
        W' = Sort(T, FinalSet);
    else if PreferCandidate ≠ ∅ then
        W' = Sort(T, PreferCandidate);
    end if
    return W';
end function
Time complexity: O(km)
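A runnable sketch of the control flow of Algorithm 1 follows. Several things are assumptions rather than the authors' definitions: M is taken to be all property requirements of the task and M' the necessary ones, the clustering step is skipped, and the score is a placeholder (the sum of the weights of matched requirements) because the slide's score formula is not reproduced in the text. The `Requirement` class and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Requirement:
    name: str
    weight: float
    necessary: bool                   # assumed meaning of membership in M'
    match: Callable[[object], bool]   # plays the role of p.match(p')

def matching(workers: Dict[str, Dict[str, object]], task: List[Requirement]) -> List[str]:
    final, prefer, candidate = {}, {}, {}
    necessary = [p for p in task if p.necessary]
    for name, props in workers.items():
        hits = [p for p in task if p.name in props and p.match(props[p.name])]
        hit_names = {p.name for p in hits}
        score = sum(p.weight for p in hits)   # placeholder score, not the slide's formula
        if len(hits) == len(task):            # every requirement in M matches
            final[name] = score
        elif all(p.name in hit_names for p in necessary):   # every requirement in M'
            prefer[name] = score
        elif hits:                            # at least one requirement matches
            candidate[name] = score
    chosen = final or prefer                  # as in Algorithm 1, Candidate is not returned
    return sorted(chosen, key=chosen.get, reverse=True)

task = [
    Requirement("language", 0.9, True, lambda skills: "java" in skills),
    Requirement("pay", 0.5, False, lambda rate: rate <= 30),
    Requirement("time", 0.3, False, lambda hours: hours >= 20),
]
workers = {
    "ann": {"language": {"java", "c++"}, "pay": 25, "time": 40},
    "bob": {"language": {"java"}, "pay": 50, "time": 40},
    "cy": {"language": {"ruby"}, "pay": 10, "time": 40},
}
print(matching(workers, task))
```

Here ann satisfies everything and is returned alone; if ann were removed, bob (who satisfies only the necessary requirement) would be recommended instead, which is the fallback behaviour the algorithm describes.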

Team formation algorithm
• Worker W
– a_ij = 1 if the jth worker W_j has the ith property of I (I_i)
– a_ij = 0 otherwise
• Team Q
– q_i = 1 if the ith property of I (I_i) is covered by the team Q
– Compute its team profile q, defining the expertise of the team as the (binary) sum of the properties of each individual
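The definitions above translate directly into code. In this sketch the property list I and the two workers are made-up examples; the "binary sum" is implemented as a bitwise OR of the members' rows of a_ij values.

```python
properties = ["Java", "JavaScript", "Ruby", "Html5"]   # hypothetical property list I

def profile(skills):
    """Row of a_ij values for one worker: 1 if the worker has property I_i, else 0."""
    return [1 if p in skills else 0 for p in properties]

def team_profile(team):
    """Team profile q: binary sum (logical OR) of the members' profiles."""
    q = [0] * len(properties)
    for skills in team:
        q = [qi | ai for qi, ai in zip(q, profile(skills))]
    return q

q = team_profile([{"Java", "Ruby"}, {"JavaScript", "Java"}])
print(q)
```

A team covers a task when q_i = 1 for every property the task requires; in this example Html5 is still uncovered.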

Team formation algorithm (continued)
• The team formation problem can be formally formulated as a binary integer program as follows, where c_j represents the cost of choosing the worker w_j.
• Meta-RaPS-SCP
– a feasible solution for a SCP instance
– effective, simple, randomness
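The formula itself did not survive in the text. For orientation, the standard set covering problem (SCP) written with the symbols already defined (a_ij, c_j) looks as follows; the decision variable x_j and the index bounds m and n are notation assumed here, and the slide's own formulation may differ in detail.

```latex
\begin{aligned}
\min \quad & \sum_{j=1}^{n} c_j \, x_j \\
\text{s.t.} \quad & \sum_{j=1}^{n} a_{ij} \, x_j \ge 1, \qquad i = 1, \dots, m, \\
& x_j \in \{0, 1\}, \qquad j = 1, \dots, n .
\end{aligned}
```

Here x_j = 1 means worker w_j is chosen, and the i-th constraint says that at least one chosen worker has property I_i, which is the same as requiring q_i = 1 in the team profile.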

Team formation algorithm (continued)
Algorithm 2. Team formation
Input: Set<worker> W; task T; int preferCount; int maxLoops.
Output: Set<team> Qs; // a team is a set of workers
function teamFormation(Set<worker> W, task T, int preferCount, int maxLoops):
    Qs ← ∅; Set<Property> PSet ← ∅; int i = 0;
    while Qs.size < preferCount && i ≤ maxLoops:
        Boolean isFeasible = true;
        team Q = Meta-RaPS-SCP(T, W, %priority, %restriction);
        if Q == ∅
            return Qs;
        end if
        for each property p of task T:
            PSet = ∅;
            for all workers in team Q add p' to PSet;
            // p' is the property of the worker in team Q with the same property name as p
            Property pt = p.composite(PSet);
            // the composite function is given in the definition of p as shown in the meta-model
            if p.match(pt) = false then
                isFeasible = false;
                break;
            end if
        end for
        if isFeasible = true then
            Qs = Qs ∪ Q;
        end if
        i++;
    end while
    return Qs;
end function
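The loop of Algorithm 2 can be sketched in Python under strong simplifications. The function `random_greedy_cover` is a stand-in, not Meta-RaPS-SCP: it is a plain greedy set cover that picks at random among the two best candidates. The feasibility check is also reduced to one assumed rule (the summed pay of the team must fit a budget), and all worker data are hypothetical.

```python
import random

def random_greedy_cover(required, workers, rng):
    """Stand-in for Meta-RaPS-SCP: greedy set cover with a random pick among the best two."""
    uncovered, team = set(required), []
    while uncovered:
        useful = [(len(uncovered & skills) / cost, name)
                  for name, (skills, cost) in workers.items()
                  if name not in team and uncovered & skills]
        if not useful:
            return None                      # no feasible cover exists
        useful.sort(reverse=True)            # best ratio of newly covered skills to cost first
        _, pick = rng.choice(useful[:2])     # randomness lets repeated calls differ
        team.append(pick)
        uncovered -= workers[pick][0]
    return frozenset(team)

def team_formation(required, budget, workers, prefer_count, max_loops, seed=0):
    rng = random.Random(seed)
    teams = set()
    for _ in range(max_loops):
        if len(teams) >= prefer_count:
            break
        team = random_greedy_cover(required, workers, rng)
        if team is None:
            return teams
        total_cost = sum(workers[name][1] for name in team)  # assumed composite rule for pay
        if total_cost <= budget:                              # assumed match rule for pay
            teams.add(team)
    return teams

# Hypothetical workers: name -> (skills, cost).
workers = {
    "ann": ({"Java", "JavaScript"}, 20),
    "bob": ({"Java"}, 10),
    "cy": ({"JavaScript", "Ruby"}, 15),
    "dee": ({"Ruby"}, 8),
}
required = {"Java", "JavaScript", "Ruby"}
teams = team_formation(required, budget=40, workers=workers, prefer_count=2, max_loops=50)
print(teams)
```

As in the pseudocode, the loop stops once enough teams are found or the loop budget is exhausted, and it returns early when no cover exists at all.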

Simulation experiments
• Python Scrapy to grab worker data from Upwork
• 500 pages of worker data (4500)
• Re-description to construct properties
• Construct 10000 workers (each one has at most 6 properties and at least 1 property)
• 1 master, 3 slaves

Simulation experiments (continued)
• Exp1: experiment for task-worker matching with comparison
– Same skill requirements, different preference

Simulation experiments (continued)
• Exp1: experiment for task-worker matching with comparison
Definition for property database

Simulation experiments (continued)
• Tags description for task A and task B

Simulation experiments (continued)
• Analysis:
• Validate the effectiveness, extensibility and correctness of the meta-model
– database property
• Validate the correctness of the matching algorithm
• Validate the efficiency of the matching process
– Cluster: nearly 40%
– The whole matching process: 3.67 s
– (description process 2.34 s, cluster 0.82 s)

Simulation experiments (continued)
• Exp2: experiment for team formation
– set preferCount = 4, maxLoops = 100, %priority = 80% and %restriction = 60%

Simulation experiments (continued)
• Exp2: experiment for team formation
Nearly 1.13 s; 0.03 s for the Meta-RaPS-SCP procedure

Simulation experiments (continued)
• Then set stricter constraints on the properties of task C:
– langOfC.value = {Java, JavaScript, Ruby, Html5}
– constraint: Java && JavaScript && Ruby && Html5
– payOfC.value = 50
– No suitable team found
– Set %priority = 5%, %restriction = 90%
– Or increase maxLoops = 200, 400, 600, 800
– Still no suitable team
– The task should be re-described

System Design
➢ Clustering strategy: reduce the search space
• First division – based on the type
– Platform customized
– Flexible and scalable
– Avoids the curse of dimensionality
• Second division – based on the K-Means algorithm
– K-Means: simple and efficient
– Maximum iteration number: controls convergence
– Last output as input
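The second division can be illustrated with a plain K-Means in pure Python. This is a generic textbook version, not the system's Map-Reduce implementation; the points are made up, `max_iterations` corresponds to the iteration cap, and passing the previous centroids back in is one reading of "last output as input".

```python
def kmeans(points, centroids, max_iterations=20):
    """Plain K-Means on tuples of floats; `centroids` supplies the starting centres.

    `max_iterations` must be at least 1.
    """
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:      # converged before reaching the iteration cap
            break
        centroids = updated
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])

# Warm start: when a new worker arrives, reuse the previous centroids as input.
centroids2, clusters2 = kmeans(points + [(2.0, 1.0)], centroids)
print(centroids, [len(c) for c in clusters2])
```

Reusing the last centroids means a re-clustering after small changes usually converges in very few iterations, while the iteration cap bounds the cost in the worst case.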
• Map-Reduce implementation

System Design (continued)
➢ Dynamic measurement
• Reasons for dynamic measurement
• Objective evaluations given by different publishers
• Sliding window analysis
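A sliding window over publisher evaluations can be sketched as follows. The window size, the use of a simple mean, and the rating values are assumptions for illustration; the slides do not state how the system aggregates the window.

```python
from collections import deque

class SlidingScore:
    """Keeps the mean of the most recent `window` evaluations of one worker."""

    def __init__(self, window=3):
        self.ratings = deque(maxlen=window)

    def add(self, rating):
        self.ratings.append(rating)   # the oldest rating drops out automatically
        return self.current()

    def current(self):
        return sum(self.ratings) / len(self.ratings) if self.ratings else None

score = SlidingScore(window=3)
trace = [score.add(r) for r in [5, 4, 3, 2]]
print(trace)
```

Because only recent evaluations count, a worker whose performance changes is re-ranked quickly instead of being held to an all-time average.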

Experiments
➢ Data
• Python Scrapy – 50,000 pieces of data from Upwork.com
• Transform to suitability description
• Simulated 500,000 developers

Experiments (continued)
➢ Efficiency of clustering method

Experiments (continued)
➢ Effectiveness of clustering method

Experiments (continued)
➢ Accuracy of dynamic measurement
• before
• after

Experiments (continued)
➢ Accuracy of dynamic measurement
Most suitable worker: A, B, C
Most suitable worker: B, A, A

Publications
• Fu Y, Chen H, Song F. STWM: A Solution to Self-adaptive Task-Worker Matching in Software Crowdsourcing[M]//Algorithms and Architectures for Parallel Processing. Springer International Publishing, 2015: 383-398.
• Song F, Chen H, Fu Y. An Approach to Rapid Worker Discovery in Software Crowdsourcing[M]//Algorithms and Architectures for Parallel Processing. Springer International Publishing, 2015: 370-382.
Thank You!