Best Project Center

PRIVACY PRESERVING AUTHENTICATION - Shared Authority Based Privacy-preserving Authentication Protocol In Cloud Computing

Cloud Computing Is An Emerging Data Interactive Paradigm To Realize Users' Data Remotely Stored In An Online Cloud Server. Cloud Services Provide Great Conveniences For The Users To Enjoy The On-demand Cloud Applications Without Considering The Local Infrastructure Limitations. During The Data Accessing, Different Users May Be In A Collaborative Relationship, And Thus Data Sharing Becomes Significant To Achieve Productive Benefits. The Existing Security Solutions Mainly Focus On The Authentication To Realize That A User's Privative Data Cannot Be Illegally Accessed, But Neglect A Subtle Privacy Issue During A User Challenging The Cloud Server To Request Other Users For Data Sharing. The Challenged Access Request Itself May Reveal The User's Privacy No Matter Whether Or Not It Can Obtain The Data Access Permissions. In This Paper, We Propose A Shared Authority Based Privacy-preserving Authentication Protocol (SAPA) To Address Above Privacy Issue For Cloud Storage. In The SAPA, 1) Shared Access Authority Is Achieved By Anonymous Access Request Matching Mechanism With Security And Privacy Considerations (e.g., Authentication, Data Anonymity, User Privacy, And Forward Security); 2) Attribute Based Access Control Is Adopted To Realize That The User Can Only Access Its Own Data Fields; 3) Proxy Re-encryption Is Applied To Provide Data Sharing Among The Multiple Users. Meanwhile, Universal Composability (UC) Model Is Established To Prove That The SAPA Theoretically Has The Design Correctness. It Indicates That The Proposed Protocol Is Attractive For Multi-user Collaborative Cloud Applications.

PERFORMANCE AND COST EVALUATION - Performance And Cost Evaluation Of An Adaptive Encryption Architecture For Cloud Databases

The Cloud Database As A Service Is A Novel Paradigm That Can Support Several Internet-based Applications, But Its Adoption Requires The Solution Of Information Confidentiality Problems. We Propose A Novel Architecture For Adaptive Encryption Of Public Cloud Databases That Offers An Interesting Alternative To The Tradeoff Between The Required Data Confidentiality Level And The Flexibility Of The Cloud Database Structures At Design Time. We Demonstrate The Feasibility And Performance Of The Proposed Solution Through A Software Prototype. Moreover, We Propose An Original Cost Model That Is Oriented To The Evaluation Of Cloud Database Services In Plain And Encrypted Instances And That Takes Into Account The Variability Of Cloud Prices And Tenant Workloads During A Medium-term Period.

SECURE DATA FORWARDING - A Secure Erasure Code Based Cloud Storage System With Secure Data Forwarding

A Cloud Storage System, Consisting Of A Collection Of Storage Servers, Provides Long-term Storage Services Over The Internet. Storing Data In A Third Party's Cloud System Causes Serious Concern Over Data Confidentiality. General Encryption Schemes Protect Data Confidentiality, But Also Limit The Functionality Of The Storage System Because A Few Operations Are Supported Over Encrypted Data. Constructing A Secure Storage System That Supports Multiple Functions Is Challenging When The Storage System Is Distributed And Has No Central Authority. We Propose A Threshold Proxy Re-encryption Scheme And Integrate It With A Decentralized Erasure Code Such That A Secure Distributed Storage System Is Formulated. The Distributed Storage System Not Only Supports Secure And Robust Data Storage And Retrieval, But Also Lets A User Forward His Data In The Storage Servers To Another User Without Retrieving The Data Back. The Main Technical Contribution Is That The Proxy Re-encryption Scheme Supports Encoding Operations Over Encrypted Messages As Well As Forwarding Operations Over Encoded And Encrypted Messages. Our Method Fully Integrates Encrypting, Encoding, And Forwarding. We Analyze And Suggest Suitable Parameters For The Number Of Copies Of A Message Dispatched To Storage Servers And The Number Of Storage Servers Queried By A Key Server. These Parameters Allow More Flexible Adjustment Between The Number Of Storage Servers And Robustness.

HIERARCHICAL ATTRIBUTE - HASBE-A Hierarchical Attribute Based Solution For Flexible And Scalable Access Control In Cloud Computing

Cloud Computing Has Emerged As One Of The Most Influential Paradigms In The IT Industry In Recent Years. Since This New Computing Technology Requires Users To Entrust Their Valuable Data To Cloud Providers, There Have Been Increasing Security And Privacy Concerns On Outsourced Data. Several Schemes Employing Attribute-based Encryption (ABE) Have Been Proposed For Access Control Of Outsourced Data In Cloud Computing; However, Most Of Them Suffer From Inflexibility In Implementing Complex Access Control Policies. In Order To Realize Scalable, Flexible, And Fine-grained Access Control Of Outsourced Data In Cloud Computing, In This Paper, We Propose Hierarchical Attribute-set-based Encryption (HASBE) By Extending Ciphertext-policy Attribute-set-based Encryption (ASBE) With A Hierarchical Structure Of Users. The Proposed Scheme Not Only Achieves Scalability Due To Its Hierarchical Structure, But Also Inherits Flexibility And Fine-grained Access Control In Supporting Compound Attributes Of ASBE. In Addition, HASBE Employs Multiple Value Assignments For Access Expiration Time To Deal With User Revocation More Efficiently Than Existing Schemes. We Formally Prove The Security Of HASBE Based On Security Of The Ciphertext-policy Attribute-based Encryption Scheme By Bethencourt And Analyze Its Performance And Computational Complexity. We Implement Our Scheme And Show That It Is Both Efficient And Flexible In Dealing With Access Control For Outsourced Data In Cloud Computing With Comprehensive Experiments.

CQA POST VOTING PREDICTION - QAAN Question Answering Attention Networking For Community Question Classification

Community Question Answering (CQA) Provides Platforms For Users With Various Backgrounds To Obtain Information And Share Knowledge. In Recent Years, With The Rapid Development Of Such Online Platforms, An Enormous Amount Of Archive Data Has Accumulated, And It Has Become More And More Difficult For Expert Users To Identify Desirable Questions. To Reduce The Proportion Of Unanswered Questions In CQA And Help Expert Users Find The Questions They Are Interested In, Question Classification, Which Aims To Assign A Newly Posted Question To A Specific Preset Category, Has Become An Important Task. In This Paper, We Propose A Novel Question Answering Attention Network (QAAN) For Investigating The Role Of The Paired Answer Of Questions For Classification. Specifically, QAAN Studies The Correlation Between Question And Paired Answer, Taking The Questions As The Primary Part Of The Question Representation And Aggregating The Answer Information Based On Similarity And Disparity With The Answer. Our Experiment Is Implemented On The Yahoo! Answers Dataset. The Results Show That QAAN Outperforms All The Baseline Models.

REPRESENTATIVE TRAVEL ROUTE RECOMMENDATION - Personalized Tourism Route Recommendation System Based On Dynamic Clustering Of User Groups

Tourism Path Dynamic Planning Is An Asynchronous Group Model Planning Problem. It Is Required To Find Group Patterns With Similar Trajectory Behavior Under The Constraint Of Unequal Time Intervals. Traditional Trajectory Group Pattern Mining Algorithms Often Deal With GPS Data With Fixed Time Interval Sampling Constraints, So They Can Not Be Directly Used In Coterie Pattern Mining. At The Same Time, Traditional Group Pattern Mining Has The Problem Of Lack Of Semantic Information, Which Reduces The Integrity And Accuracy Of Personalized Travel Route Recommendation. Therefore, This Paper Proposes A Semantic Based Distance Sensitive Recommendation Strategy. In Order To Efficiently Process Large-scale Social Network Trajectory Data, This Paper Uses MapReduce Programming Model With Optimized Clustering To Mine Coterie Group Patterns. The Experimental Results Show That: Under MapReduce Programming Model, Coterie Group Pattern Mining With Optimized Clustering And Semantic Information Is Superior To Traditional Group Mode In Personalized Travel Route Recommendation Quality, And Can Effectively Process Large-scale Social Network Trajectory Data.

CREDIT CARD FRAUD DETECTION - Fraud Detection In Credit Card Data Using Unsupervised Machine Learning Based Scheme

The Development Of Communication Technologies And E-commerce Has Made The Credit Card The Most Common Means Of Payment For Both Online And Regular Purchases. Security In This System Is Therefore Essential To Prevent Fraudulent Transactions, Which Are Increasing Each Year. Researchers Are Trying Novel Techniques To Detect And Prevent Such Fraud, But There Remains A Need For Techniques That Detect It Precisely And Efficiently. This Paper Proposes A Scheme For Detecting Fraud In Credit Card Data That Uses A Neural Network (NN) Based Unsupervised Learning Technique. The Proposed Method Outperforms The Existing Approaches Of Auto Encoder (AE), Local Outlier Factor (LOF), Isolation Forest (IF) And K-Means Clustering. The Proposed NN Based Fraud Detection Method Achieves 99.87% Accuracy, Whereas The Existing Methods AE, IF, LOF And K-Means Give 97%, 98%, 98% And 99.75% Accuracy Respectively.

SECURE MINING OF ASSOCIATION RULES - Scalable Privacy-Preserving Distributed Extremely Randomized Trees For Structured Data With Multiple Colluding Parties

Today, In Many Real-world Applications Of Machine Learning Algorithms, The Data Is Stored On Multiple Sources Instead Of At One Central Repository. In Many Such Scenarios, Due To Privacy Concerns And Legal Obligations, E.g., For Medical Data, And Communication/computation Overhead, For Instance For Large Scale Data, The Raw Data Cannot Be Transferred To A Center For Analysis. Therefore, New Machine Learning Approaches Are Proposed For Learning From The Distributed Data In Such Settings. In This Paper, We Extend The Distributed Extremely Randomized Trees (ERT) Approach W.r.t. Privacy And Scalability. First, We Extend Distributed ERT To Be Resilient W.r.t. The Number Of Colluding Parties In A Scalable Fashion. Then, We Extend The Distributed ERT To Improve Its Scalability Without Any Major Loss In Classification Performance. We Refer To Our Proposed Approach As K-PPD-ERT Or Privacy-Preserving Distributed Extremely Randomized Trees With K Colluding Parties.

TAXI DRIVERS ROUTE CHOICE BEHAVIOR USING THE TRACE RECORDS - A Mixed Path Size Logit-Based Taxi Customer-Search Model Considering Spatio-Temporal Factors In Route Choice

This Paper Introduces A Model To Analyze The Route Choice Behavior Of Taxi Drivers Searching For Their Next Passenger In An Urban Road Network. Considering The Path Overlap Between Selected Routes In The Process Of Customer-searching, A Mixed Path Size Logit (MPSL) Model Is Proposed To Analyze Route Choice Behaviors Through The Spatio-temporal Features Of A Route, Including Customer Generation Rate, Path Travel Time, Cumulative Intersection Delay, Path Distance, And Path Size. Specifically, Customer Generation Rate Is Defined As Attraction Strength Based On Historical Pick-up Records In The Route, While The Intersection Travel Delay And Path Travel Time Are Estimated From Large-scale Taxi Global Positioning System (GPS) Trajectories. In The Experiment, The GPS Data Were Collected From About 36,000 Taxi Vehicles In Beijing At 30-s Intervals Over Six Months. In The Model Application, An Area Of Approximately 10 Square Kilometers In The Center Of Beijing Is Selected To Demonstrate The Effectiveness Of The Proposed Model. The Results Indicate That The MPSL Model Can Effectively Analyze The Route Choice Behavior In The Customer-searching Process And Achieves Higher Accuracy Than The Traditional Multinomial Logit Model And The Basic Path Size Logit (PSL) Model.
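To make the path size idea above concrete, here is a minimal sketch of the basic (non-mixed) path size logit, assuming the commonly used formulation in which each link's share of a path's length is divided by the number of candidate paths using that link. The function names, the `beta_ps` parameter, and the toy network are illustrative assumptions; the abstract's mixed model additionally treats coefficients as random, which this sketch does not do.

```python
import math

def path_size_factors(paths, link_lengths):
    """Path size per path; overlapping paths get factors below 1.

    paths: list of lists of link ids (hypothetical toy representation).
    link_lengths: dict mapping link id -> length.
    """
    # Count how many candidate paths use each link.
    usage = {}
    for path in paths:
        for link in set(path):
            usage[link] = usage.get(link, 0) + 1
    factors = []
    for path in paths:
        total = sum(link_lengths[a] for a in path)
        factors.append(sum(link_lengths[a] / total / usage[a] for a in path))
    return factors

def psl_probabilities(utilities, path_sizes, beta_ps=1.0):
    """Choice probabilities with utility corrected by beta_ps * ln(path size)."""
    scores = [math.exp(v + beta_ps * math.log(ps))
              for v, ps in zip(utilities, path_sizes)]
    z = sum(scores)
    return [s / z for s in scores]

# Toy network: two routes share link "a", the third is fully distinct.
paths = [["a", "b"], ["a", "c"], ["d"]]
lengths = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 2.0}
ps = path_size_factors(paths, lengths)
probs = psl_probabilities([0.0, 0.0, 0.0], ps)
```

With equal utilities, the two overlapping routes each receive a smaller share than the distinct route, which is the correction a plain multinomial logit lacks.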

FILE TRANSFER USING CRYPTOGRAPHIC TECHNIQUE - Enhancing Secure Digital Communication Media Using Cryptographic Steganography Techniques

Data Hiding Is An Anti-forensic Technique That Makes Data Difficult To Access. Steganography, One Approach To Data Hiding, Embeds Texts, Files, Or Other Multimedia Files Within Other Texts, Files, Or Multimedia Files To Reduce Their Visibility To Attackers. Cryptography Transforms Readable Text Into Illegible Information. This Paper Presents A Secure Communication Medium For Transferring Text, Multimedia, Or Other Digital Files Between Sender And Receiver. To Secure The Communication Medium, Possible Threats And Vulnerabilities Must Be Reduced, So The Transferred Media Is The Main Consideration In Building A Robust Communication System. Data Hiding Techniques Are Used To Improve The Security Of The Communication Media Using Salt Encryption. This Paper Proposes A Methodology For Developing Secure Communication Media Using A Combination Of Cryptography And Steganography Techniques, And Describes Experimental Results From Different Technical Analyses.

PREDICT LENGTH OF STAY OF STROKE PATIENTS USING DATA MINING TECHNIQUES - SNOMED CT-Based Standardized E-Clinical Pathways For Enabling Big Data Analytics In Healthcare

Automation Of Healthcare Facilities Represents A Challenging Task Of Streamlining A Highly Information-intensive Sector. Modern Healthcare Processes Produce Large Amounts Of Data That Have Great Potential For Health Policymakers And Data Science Researchers. However, A Considerable Portion Of Such Data Is Not Captured In Electronic Format And Hidden Inside The Paperwork. A Major Source Of Missing Data In Healthcare Is Paper-based Clinical Pathways (CPs). CPs Are Healthcare Plans That Detail The Interventions For The Treatment Of Patients, And Thus Are The Primary Source For Healthcare Data. However, Most CPs Are Used As Paper-based Documents And Not Fully Automated. A Key Contribution Towards The Full Automation Of CPs Is Their Proper Computer Modeling And Encoding Their Data With International Clinical Terminologies. We Present In This Research An Ontology-based CP Automation Model In Which CP Data Are Standardized With SNOMED CT, Thus Enabling Machine Learning Algorithms To Be Applied To CP-based Datasets. CPs Automated Under This Model Contribute Significantly To Reducing Data Missingness Problems, Enabling Detailed Statistical Analyses On CP Data, And Improving The Results Of Data Analytics Algorithms. Our Experimental Results On Predicting The Length Of Stay (LOS) Of Stroke Patients Using A Dataset Resulting From An E-clinical Pathway Demonstrate Improved Prediction Results Compared With LOS Prediction Using Traditional EHR-based Datasets. Fully Automated CPs Enrich Medical Datasets With More CP Data And Open New Opportunities For Machine Learning Algorithms To Show Their Full Potential In Improving Healthcare, Reducing Costs, And Increasing Patient Satisfaction

PREDICT CHANGING STUDENTS ATTITUDE USING DATA MINING - Supporting Teachers To Monitor Students Learning Progress In An Educational Environment With Robotics Activities

Educational Robotics Has Proven Its Positive Impact On The Performances And Attitudes Of Students. However, The Educational Environments That Employ Them Rarely Provide Teachers With Relevant Information That Can Be Used To Monitor Student Learning Progress Effectively. To Overcome These Limitations, In This Paper We Present IDEE (Integrated Didactic Educational Environment), An Educational Environment For Physics That Uses The EV3 LEGO Mindstorms® Educational Kit As Its Robotic Component. To Provide Support To Teachers, IDEE Includes A Dashboard That Provides Them With Information About The Students’ Learning Process. This Analysis Is Done By Means Of An Additive Factor Model (AFM), A Well-known Technique In The Educational Data Mining Research Area. However, It Has Usually Been Employed To Analyze Students’ Performance Data Outside The System. This Can Be A Burden For The Teacher Who, In Most Cases, Is Not An Expert In Data Analysis. Our Goal In This Paper Is To Show How The Coefficients Of AFM Provide Valuable Information To The Teacher Without Requiring Any Deep Expertise In Data Analysis. In Addition, We Show An Improved Version Of The AFM That Provides A Deeper Understanding Of The Students’ Learning Process.
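As a rough illustration of what the AFM coefficients mentioned above express, the following sketch uses the standard AFM form, in which the log-odds of a correct answer are the student's proficiency plus, for each skill the item requires, a skill easiness term and a learning rate multiplied by the number of prior practice opportunities. All names and numbers here are hypothetical and are not taken from IDEE or from the improved AFM the abstract refers to.

```python
import math

def afm_probability(theta, skills, beta, gamma, opportunities):
    """Probability of a correct response under the standard AFM.

    theta: student proficiency.
    skills: skills required by the item.
    beta: dict skill -> easiness coefficient.
    gamma: dict skill -> learning rate per practice opportunity.
    opportunities: dict skill -> prior practice count for this student.
    """
    logit = theta + sum(beta[k] + gamma[k] * opportunities[k] for k in skills)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical skill "force": a positive gamma means practice helps.
beta = {"force": -1.0}
gamma = {"force": 0.5}
p_first = afm_probability(0.0, ["force"], beta, gamma, {"force": 0})
p_later = afm_probability(0.0, ["force"], beta, gamma, {"force": 4})
```

A teacher-facing dashboard could read a larger `gamma` as a skill that students pick up quickly and a low `beta` as a skill that starts out hard.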

MALWARE DETECTION IN GOOGLE PLAY - Towards De-Anonymization Of Google Play Search Rank Fraud

Search Rank Fraud, The Fraudulent Promotion Of Products Hosted On Peer-review Sites, Is Driven By Expert Workers Recruited Online, Often From Crowdsourcing Sites. In This Paper We Introduce The Fraud De-anonymization Problem, Which Goes Beyond Fraud Detection, To Unmask The Human Masterminds Responsible For Posting Search Rank Fraud In Peer-review Sites. We Collect And Study Data From Crowdsourced Search Rank Fraud Jobs, And Survey The Capabilities And Behaviors Of 58 Search Rank Fraud Workers Recruited From 6 Crowdsourcing Sites. We Collect A Gold Standard Dataset Of Google Play User Accounts Attributed To 23 Crowdsourced Workers And Analyze Their Fraudulent Behaviors In The Wild. We Propose Dolos, A Fraud De-anonymization System That Leverages Traits And Behaviors We Extract From Our Studies, To Attribute Detected Fraud To Crowdsourcing Site Workers, Thus To Real Identities And Bank Accounts. We Introduce MCDense, A Min-cut Dense Component Detection Algorithm To Uncover Groups Of User Accounts Controlled By Different Workers, And Use Stylometry And Supervised Learning To Attribute Them To Crowdsourcing Site Profiles. Dolos Correctly Identified The Owners Of 95 Percent Of Fraud Worker-controlled Communities, And Uncovered Fraud Workers Who Promoted As Many As 97.5 Percent Of Fraud Apps We Collected From Google Play. When Evaluated On 13,087 Apps (820,760 Reviews), Which We Monitored Over More Than 6 Months, Dolos Identified 1,056 Apps With Suspicious Reviewer Groups. We Report Orthogonal Evidence Of Their Fraud, Including Fraud Duplicates And Fraud Re-posts. Dolos Significantly Outperformed Adapted Dense Subgraph Detection And Loopy Belief Propagation Competitors, On Two New Coverage Scores That Measure The Quality Of Detected Community Partitions.

LOCATION AWARE KEYWORD QUERY SUGGESTION - Integrating “Random Forest” With Indexing And Query Processing For Personalized Search

The Internet Has Become An Integral Part Of At Least 4.4 Billion Lives. An Average Person Looks At Their Device At Least 20 Times A Day. One Can Only Imagine The Number Of Queries A Search Engine Gets On A Daily Basis. With The Help Of All The Data Acquired Over The Years, The Internet Updates Us With All The Biggest Trends And Live Events Happening All Over The World. A Search Engine Is Able To Provide Query Suggestions Based On The Number Of Times A Keyword Has Been Searched For Or On How The Current Query Relates To A Certain Trend. All These Trends Are Updated To Every Device Internationally Or Locally. This Concept Is Generalized Throughout All Devices That Use Any Kind Of Search Engine On Any Application. In This Paper We Propose Using Random Forest As A Predictive Model Integrated With The Indexing Process Of The Search Engine To Produce Query Suggestions That A User Would Want To Search, In Contrast To The Query Suggestions That Are Usually Displayed Based On Hyped Trends And Fashion.

USER TRUST AND ITEM RATINGS PREDICT - A Novel Implicit Trust Recommendation Approach For Rating Prediction

Rating Prediction, An Application Widely Used In Recommender Systems, Has Gradually Become A Valuable Way To Help Users Narrow Down Their Choices Quickly And Make Wise Decisions From The Vast Amount Of Information. However, Most Existing Collaborative Recommendation Models Suffer From Poor Accuracy Due To Data Sparsity And Cold Start Problems, Because Recommender Systems Contain Only A Small Amount Of Explicit Data. To Solve This Problem, A New Implicit Trust Recommendation Approach (ITRA) Is Proposed To Generate Item Rating Predictions By Mining And Utilizing Implicit User Information In Recommender Systems. Specifically, A User Trust Neighbor Set With Preferences And Tastes Similar To Those Of A Target User Is First Obtained By A Trust Expansion Strategy Via User Trust Diffusion Features In A Trust Network. Then, The Trust Ratings Mined From User Trust Neighbors Are Used To Compute Trust Similarity Among Users Based On A User Collaborative Filtering Model. Finally, Using The Above Filtered Trust Ratings And User Trust Similarity, The Prediction Results Are Generated By A Trust Weighting Method. In Addition, Empirical Experiments Are Conducted On Three Real-world Datasets, And The Results Demonstrate That Our Rating Prediction Model Has Obvious Advantages Over The State-of-the-art Comparison Methods In Terms Of The Accuracy Of Recommendations.
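The abstract does not give the exact trust weighting formula, so the sketch below only illustrates the general shape of such a final step, assuming the familiar mean-centered weighted average from user-based collaborative filtering with trust similarity as the weight. The function name and the tuple layout are assumptions made for illustration, not ITRA's definition.

```python
def predict_rating(target_mean, neighbors):
    """Predict a rating as the target user's mean plus a weighted deviation.

    neighbors: list of (trust_similarity, neighbor_rating, neighbor_mean)
    tuples for trusted neighbors who rated the item (assumed layout).
    """
    numerator = sum(s * (r - m) for s, r, m in neighbors)
    denominator = sum(abs(s) for s, _, _ in neighbors)
    if denominator == 0:
        # No usable neighbors: fall back to the user's own average.
        return target_mean
    return target_mean + numerator / denominator

# One strongly trusted neighbor rated the item one point above their mean.
prediction = predict_rating(3.0, [(1.0, 5.0, 4.0)])
```

Centering each neighbor's rating on that neighbor's mean keeps generous and harsh raters comparable before the trust weights are applied.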

PRIVACY POLICY INFERENCE OF USER-UPLOADED IMAGES - User Flagging For Posts At 3DTube.org, The First Social Platform For 3D-Exclusive Contents

Social Networks Have Been A Popular Way For A Community To Share Content, Information, And News. Despite Section 230 Of The Communications Decency Act Of 1996 Protecting Social Platforms From Legal Liability Regarding User Uploaded Contents Of Their Platforms In The USA, There Has Been A Recent Call For Some Jurisdiction Over Platform Management Practices. This Duty Of Potential Jurisdiction Would Be Especially Challenging For Social Networks That Are Rich In Multimedia Contents, Such As 3DTube.org, Since 3D Capabilities Have A History Of Attracting Adult Materials And Other Controversial Content. This Paper Presents The Design Of 3DTube.org To Address Two Major Issues: (1) The Need For A Social Media Platform Of 3D Contents And (2) The Policies And Designs For Mediation Of Said Contents. Content Mediation Can Be Seen As A Compromise Between Two Conflicting Goals: Platform Micromanaging Of Content, Which Is Resource-intensive, And User Notification Of Flagged Content And Material, Prior To Viewing. This Paper Details 3DTube.org's Solution To Such A Compromise.

SEMANTICALLY SECURE ENCRYPTED RELATIONAL DATA USING K-NEAREST NEIGHBOR CLASSIFICATION - A Distributed Storage And Computation K-Nearest Neighbor Algorithm Based Cloud-Edge Computing For Cyber-Physical-Social Systems

The K-nearest Neighbor (kNN) Algorithm Is A Classic Supervised Machine Learning Algorithm. It Is Widely Used In Cyber-physical-social Systems (CPSS) To Analyze And Mine Data. However, In Practical CPSS Applications, The Standard Linear kNN Algorithm Struggles To Efficiently Process Massive Data Sets. This Paper Proposes A Distributed Storage And Computation K-nearest Neighbor (D-kNN) Algorithm. The D-kNN Algorithm Has The Following Advantages: First, The Concept Of K-nearest Neighbor Boundaries Is Proposed, And Searching Within These Boundaries Can Effectively Reduce The Time Complexity Of kNN. Second, Based On The K-nearest Neighbor Boundary, Massive Data Sets Beyond The Main Storage Space Are Stored On Distributed Storage Nodes. Third, The Algorithm Performs K-nearest Neighbor Searching Efficiently By Performing Distributed Calculations At Each Storage Node. Finally, A Series Of Experiments Were Performed To Verify The Effectiveness Of The D-kNN Algorithm. The Experimental Results Show That The D-kNN Algorithm Based On Distributed Storage And Calculation Effectively Improves The Operation Efficiency Of K-nearest Neighbor Search. The Algorithm Can Be Easily And Flexibly Deployed In A Cloud-edge Computing Environment To Process Massive Data Sets In CPSS.
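The third point above, per-node computation followed by a merge, can be sketched in a few lines. This is a simplified partition-and-merge kNN under the assumption that each storage node returns its own k closest points; it does not implement the paper's k-nearest neighbor boundaries, and the node layout and function names are hypothetical.

```python
import heapq
import math
from collections import Counter

def local_top_k(points, query, k):
    """Return the k closest (distance, label) pairs held by one node."""
    return heapq.nsmallest(
        k, ((math.dist(vec, query), label) for vec, label in points)
    )

def distributed_knn(nodes, query, k):
    """Merge per-node candidates, keep the global k nearest, and vote."""
    candidates = []
    for points in nodes:  # in a real deployment each node would run remotely
        candidates.extend(local_top_k(points, query, k))
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two hypothetical storage nodes, each holding (vector, label) pairs.
nodes = [
    [((0.0, 0.0), "a"), ((1.0, 0.0), "a")],
    [((5.0, 5.0), "b"), ((0.0, 1.0), "a"), ((6.0, 5.0), "b")],
]
label = distributed_knn(nodes, (0.2, 0.1), k=3)
```

Because every node returns at most k candidates, the coordinator only ever merges k times the number of nodes, regardless of how large each partition is.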

COMPLICATION RISK PROFILING IN DIABETES CARE - A Bayesian Multi-Task And Feature Relationship Learning Approach

Diabetes Mellitus, Commonly Known As Diabetes, Is A Chronic Disease That Often Results In Multiple Complications. Risk Prediction Of Diabetes Complications Is Critical For Healthcare Professionals To Design Personalized Treatment Plans For Patients In Diabetes Care For Improved Outcomes. In This Paper, Focusing On Type 2 Diabetes Mellitus (T2DM), We Study The Risk Of Developing Complications After The Initial T2DM Diagnosis From Longitudinal Patient Records. We Propose A Novel Multi-task Learning Approach To Simultaneously Model Multiple Complications Where Each Task Corresponds To The Risk Modeling Of One Complication. Specifically, The Proposed Method Strategically Captures The Relationships (1) Between The Risks Of Multiple T2DM Complications, (2) Between Different Risk Factors, And (3) Between The Risk Factor Selection Patterns, Which Assumes Similar Complications Have Similar Contributing Risk Factors. The Method Uses Coefficient Shrinkage To Identify An Informative Subset Of Risk Factors From High-dimensional Data, And Uses A Hierarchical Bayesian Framework To Allow Domain Knowledge To Be Incorporated As Priors. The Proposed Method Is Favorable For Healthcare Applications Because In Addition To Improved Prediction Performance, Relationships Among The Different Risks And Among Risk Factors Are Also Identified. Extensive Experimental Results On A Large Electronic Medical Claims Database Show That The Proposed Method Outperforms State-of-the-art Models By A Significant Margin. Furthermore, We Show That The Risk Associations Learned And The Risk Factors Identified Lead To Meaningful Clinical Insights.

SUPPLY AND DEMAND CHAIN INTEGRATION - Sustainable Supply And Demand Chain Integration Within Global Manufacturing Industries

Given Emerging Industrial Management Strategies, Particularly Those That Consider The Three Pillars Of Sustainability, There Is A Vital Need To Determine How Sustainability Practices Differ Between Supply And Demand Distribution Systems In The Global Manufacturing Environments That Support Successful Global Trade And Logistics. This Research Paper Aims To Explore The Interactions And Advantages Of Sustainability Applications Within Both Supply And Demand Chain Management. The Research Framework Adopted Consists Of A Survey Questionnaire Conducted Within A Global Tyre Manufacturing Company. The Research Results And Analysis Justify The Need For The Application Of Ethical Codes, Supply Chain Transformation And The Effective Association Of Industry Executives, Professional Bodies And The Government. The Study Also Identifies That The Vital Incentives For The Organisation To Move Towards A Sustainable Supply Demand Chain (SSDC) Are Mostly The Financial Benefits Of Doing So, And Therefore A Positive Mind-set Shift Towards Greening Practices Is Required.

MULTI-KEYWORD RANKED SEARCH - Privacy Preserving Multi-Keyword Ranked Search Over Encrypted Cloud Data

With The Advent Of Cloud Computing, Data Owners Are Motivated To Outsource Their Complex Data Management Systems From Local Sites To The Commercial Public Cloud For Great Flexibility And Economic Savings. But For Protecting Data Privacy, Sensitive Data Have To Be Encrypted Before Outsourcing, Which Obsoletes Traditional Data Utilization Based On Plaintext Keyword Search. Thus, Enabling An Encrypted Cloud Data Search Service Is Of Paramount Importance. Considering The Large Number Of Data Users And Documents In The Cloud, It Is Necessary To Allow Multiple Keywords In The Search Request And Return Documents In The Order Of Their Relevance To These Keywords. Related Works On Searchable Encryption Focus On Single Keyword Search Or Boolean Keyword Search, And Rarely Sort The Search Results. In This Paper, For The First Time, We Define And Solve The Challenging Problem Of Privacy-preserving Multi-keyword Ranked Search Over Encrypted Data In Cloud Computing (MRSE). We Establish A Set Of Strict Privacy Requirements For Such A Secure Cloud Data Utilization System. Among Various Multi-keyword Semantics, We Choose The Efficient Similarity Measure Of "coordinate Matching," I.e., As Many Matches As Possible, To Capture The Relevance Of Data Documents To The Search Query. We Further Use "inner Product Similarity" To Quantitatively Evaluate Such Similarity Measure. We First Propose A Basic Idea For The MRSE Based On Secure Inner Product Computation, And Then Give Two Significantly Improved MRSE Schemes To Achieve Various Stringent Privacy Requirements In Two Different Threat Models. To Improve Search Experience Of The Data Search Service, We Further Extend These Two Schemes To Support More Search Semantics. Thorough Analysis Investigating Privacy And Efficiency Guarantees Of Proposed Schemes Is Given. Experiments On The Real-world Data Set Further Show Proposed Schemes Indeed Introduce Low Overhead On Computation And Communication.
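The "coordinate matching" measure described above is easy to see in plaintext: if documents and queries are binary keyword vectors, their inner product counts the matching keywords. The sketch below shows only this scoring and ranking step and deliberately omits the secure inner product computation that the MRSE schemes add; the dictionary, document ids, and function names are made up for illustration.

```python
def to_vector(keywords, dictionary):
    """Binary vector: 1 where the dictionary word appears in keywords."""
    present = set(keywords)
    return [1 if word in present else 0 for word in dictionary]

def rank_documents(docs, query_keywords, dictionary, top_k):
    """Rank documents by inner product with the query vector."""
    q = to_vector(query_keywords, dictionary)
    scored = []
    for doc_id, keywords in docs.items():
        d = to_vector(keywords, dictionary)
        score = sum(a * b for a, b in zip(d, q))  # number of matched keywords
        scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored[:top_k]]

dictionary = ["cloud", "privacy", "search", "storage"]
docs = {
    "d1": ["cloud", "storage"],
    "d2": ["cloud", "privacy", "search"],
    "d3": ["search"],
}
top = rank_documents(docs, ["cloud", "search", "privacy"], dictionary, top_k=2)
```

In an encrypted setting the server would have to compute the same scores without seeing either vector in the clear, which is the part the proposed schemes address.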

SKYLINE PRODUCT - Finding Optimal Skyline Product Combination Under Price Promotion

Nowadays, With The Development Of E-commerce, A Growing Number Of Customers Choose To Go Shopping Online. To Find Attractive Products From Online Shopping Marketplaces, The Skyline Query Is A Useful Tool Which Offers More Interesting And Preferable Choices For Customers. The Skyline Query And Its Variants Have Been Extensively Investigated. However, To The Best Of Our Knowledge, They Have Not Taken Into Account The Requirements Of Customers In Certain Practical Application Scenarios. Recently, Online Shopping Marketplaces Usually Hold Some Price Promotion Campaigns To Attract Customers And Increase Their Purchase Intention. Considering The Requirements Of Customers In This Practical Application Scenario, We Are Concerned About Product Selection Under Price Promotion. We Formulate A Constrained Optimal Product Combination (COPC) Problem. It Aims To Find Out The Skyline Product Combinations Which Both Meet A Customer's Willingness To Pay And Bring The Maximum Discount Rate. The COPC Problem Is Significant To Offer Powerful Decision Support For Customers Under Price Promotion, Which Is Certified By A Customer Study. To Process The COPC Problem Effectively, We First Propose A Two List Exact (TLE) Algorithm. The COPC Problem Is Proven To Be NP-hard, And The TLE Algorithm Is Not Scalable Because It Needs To Process An Exponential Number Of Product Combinations. Additionally, We Design A Lower Bound Approximate (LBA) Algorithm That Has A Guarantee About The Accuracy Of The Results And An Incremental Greedy (IG) Algorithm That Has Good Performance. The Experiment Results Demonstrate The Efficiency And Effectiveness Of Our Proposed Algorithms.
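For readers unfamiliar with the skyline query that the COPC problem builds on, the following is a minimal sketch of a plain skyline over single products, assuming every attribute is "lower is better". It is a quadratic-time illustration, not the TLE, LBA, or IG algorithms, and the product names and attributes are invented.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(products):
    """Keep the products that no other product dominates."""
    return {
        name: attrs
        for name, attrs in products.items()
        if not any(
            dominates(other, attrs)
            for other_name, other in products.items()
            if other_name != name
        )
    }

# Hypothetical products described by (price, delivery days).
products = {
    "A": (100, 3),
    "B": (120, 2),
    "C": (130, 4),
    "D": (90, 5),
}
result = skyline(products)
```

Product C drops out because A is both cheaper and faster, while A, B, and D each win on at least one attribute and so remain as candidate choices.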

SENTIMENTAL ANALYSIS - Age Related Sentimental Analysis For Efficient Review Mining

Natural Language Processing Has Been A Continuous Field Of Interest Since The 1950s. It Is Concerned With The Interaction Between Computers And Human Natural Languages. The History Of Natural Language Processing Started With Alan Turing’s Article Titled “Computing Machinery And Intelligence”. How Natural Language Is Processed By Computers Is The Main Concern Of NLP. Speech Recognition, Text Analysis, And Text Translation Are A Few Areas Where Natural Language Processing Along With Artificial Intelligence Is Employed. NLP Includes Various Evaluation Tasks Such As Stemming, Grammar Induction, Topic Segmentation, Etc. This Project Aims At Developing A Program For Age Related Sentiment Analysis. Sentiment Analysis Refers To The Use Of Natural Language Processing, Text Analysis, Computational Linguistics, And Biometrics To Systematically Identify, Extract, Quantify, And Study Affective States And Subjective Information. Methods To Approach Sentiment Analysis Are Classified Mainly Into Knowledge Based, Statistical, And Hybrid Approaches. Given A Text, The Mood Of The Text Will Be Analysed. The Main Constraint Applied Here Is Age: The Text Will Be Analysed In Relation To Age. The Opinion Or Mood Behind A Particular Text Varies For Every Age Group Since Their Understanding Levels And Conceptual Knowledge Vary. Word Ambiguity Is Analysed And Removed Based On Keyword Detection And Context Analysis. Because Age Is Taken Into Consideration While Analysing The Text, The Analysis Varies Even For The Same Text In The Same Context.

GEOGRAPHICAL PROBABILISTIC FACTOR MODEL - A General Geographical Probabilistic Factor Model For Point Of Interest Recommendation

The Problem Of Point Of Interest (POI) Recommendation Is To Provide Personalized Recommendations Of Places, Such As Restaurants And Movie Theaters. The Increasing Prevalence Of Mobile Devices And Of Location Based Social Networks (LBSNs) Poses Significant New Opportunities As Well As Challenges, Which We Address. The Decision Process For A User To Choose A POI Is Complex And Can Be Influenced By Numerous Factors, Such As Personal Preferences, Geographical Considerations, And User Mobility Behaviors. This Is Further Complicated By The Connection Between LBSNs And Mobile Devices. While There Are Some Studies On POI Recommendations, They Lack An Integrated Analysis Of The Joint Effect Of Multiple Factors. Meanwhile, Although Latent Factor Models Have Been Proved Effective And Are Thus Widely Used For Recommendations, Adapting Them To POI Recommendations Requires Delicate Consideration Of The Unique Characteristics Of LBSNs. To This End, In This Paper, We Propose A General Geographical Probabilistic Factor Model (Geo-PFM) Framework Which Strategically Takes Various Factors Into Consideration. Specifically, This Framework Captures The Geographical Influences On A User's Check-in Behavior. Also, User Mobility Behaviors Can Be Effectively Leveraged In The Recommendation Model. Moreover, Based On Our Geo-PFM Framework, We Further Develop A Poisson Geo-PFM Which Provides A More Rigorous Probabilistic Generative Process For The Entire Model And Is Effective In Modeling The Skewed User Check-in Count Data As Implicit Feedback For Better POI Recommendations. Finally, Extensive Experimental Results On Three Real-world LBSN Datasets (which Differ In Terms Of User Mobility, POI Geographical Distribution, Implicit Response Data Skewness, And User-POI Observation Sparsity) Show That The Proposed Recommendation Methods Outperform State-of-the-art Latent Factor Models By A Significant Margin.

SCALABLE GRAPH-BASED RANKING MODEL- EMR: A Scalable Graph-based Ranking Model For Content-based Image Retrieval

Graph-based Ranking Models Have Been Widely Applied In The Information Retrieval Area. In This Paper, We Focus On A Well Known Graph-based Model: The Ranking On Data Manifold Model, Or Manifold Ranking (MR). In Particular, It Has Been Successfully Applied To Content-based Image Retrieval Because Of Its Outstanding Ability To Discover The Underlying Geometrical Structure Of The Given Image Database. However, Manifold Ranking Is Computationally Very Expensive, Which Significantly Limits Its Applicability To Large Databases, Especially When The Queries Are Outside The Database. We Propose A Novel Scalable Graph-based Ranking Model Called Efficient Manifold Ranking (EMR), Which Tries To Address The Shortcomings Of MR From Two Main Perspectives: Scalable Graph Construction And Efficient Ranking Computation. Specifically, We Build An Anchor Graph On The Database Instead Of A Traditional K-nearest Neighbor Graph, And Design A New Form Of Adjacency Matrix To Speed Up The Ranking. An Approximate Method Is Adopted For Efficient Out-of-sample Retrieval. Experimental Results On Some Large Scale Image Databases Demonstrate That EMR Is A Promising Method For Real World Retrieval Applications.
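
The Original Manifold Ranking Iteration That EMR Aims To Speed Up Can Be Sketched As Follows. This Is A Plain Python Illustration Of The Standard Update f = alpha * S * f + (1 - alpha) * y On A Tiny Hand-made Graph, Not The EMR Anchor Graph Construction; The Function Name, The Value Of alpha, And The Example Graph Are Assumptions.

```python
import math
from typing import List


def manifold_rank(W: List[List[float]], y: List[float],
                  alpha: float = 0.9, iters: int = 200) -> List[float]:
    """Iterate f <- alpha * S f + (1 - alpha) * y, with S = D^{-1/2} W D^{-1/2}.

    W is a symmetric, non-negative adjacency matrix; y marks the query node(s).
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetrically normalized adjacency matrix.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


if __name__ == "__main__":
    # Hypothetical path graph 0 - 1 - 2 - 3, with node 0 as the query.
    W = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
    scores = manifold_rank(W, [1.0, 0.0, 0.0, 0.0])
    print(scores)  # scores fall off with graph distance from the query
```

Each Iteration Touches Every Entry Of The Full Matrix, Which Hints At Why The Abstract Describes Manifold Ranking As Expensive On Large Databases.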

ROUTE-SAVER- Leveraging Route APIs For Accurate And Efficient Query Processing At Location Based Services

Location-based Services (LBS) Enable Mobile Users To Query Points-of-interest (e.g., Restaurants, Cafes) On Various Features (e.g., Price, Quality, Variety). In Addition, Users Require Accurate Query Results With Up-to-date Travel Times. Lacking The Monitoring Infrastructure For Road Traffic, The LBS May Obtain Live Travel Times Of Routes From Online Route APIs In Order To Offer Accurate Results. Our Goal Is To Reduce The Number Of Requests Issued By The LBS Significantly While Preserving Accurate Query Results. First, We Propose To Exploit Recent Routes Requested From Route APIs To Answer Queries Accurately. Then, We Design Effective Lower/upper Bounding Techniques And Ordering Techniques To Process Queries Efficiently. Also, We Study Parallel Route Requests To Further Reduce The Query Response Time. Our Experimental Evaluation Shows That Our Solution Is Three Times More Efficient Than A Competitor, And Yet Achieves High Result Accuracy (above 98 Percent).

TWEET SEGMENTATION - Tweet Segmentation And Its Application To Named Entity Recognition

Twitter Has Attracted Millions Of Users To Share And Disseminate The Most Up-to-date Information, Resulting In Large Volumes Of Data Produced Every Day. However, Many Applications In Information Retrieval (IR) And Natural Language Processing (NLP) Suffer Severely From The Noisy And Short Nature Of Tweets. In This Paper, We Propose A Novel Framework For Tweet Segmentation In A Batch Mode, Called HybridSeg. By Splitting Tweets Into Meaningful Segments, The Semantic Or Context Information Is Well Preserved And Easily Extracted By The Downstream Applications. HybridSeg Finds The Optimal Segmentation Of A Tweet By Maximizing The Sum Of The Stickiness Scores Of Its Candidate Segments. The Stickiness Score Considers The Probability Of A Segment Being A Phrase In English (i.e., Global Context) And The Probability Of A Segment Being A Phrase Within The Batch Of Tweets (i.e., Local Context). For The Latter, We Propose And Evaluate Two Models To Derive Local Context By Considering The Linguistic Features And Term-dependency In A Batch Of Tweets, Respectively. HybridSeg Is Also Designed To Iteratively Learn From Confident Segments As Pseudo Feedback. Experiments On Two Tweet Data Sets Show That Tweet Segmentation Quality Is Significantly Improved By Learning Both Global And Local Contexts Compared With Using Global Context Alone. Through Analysis And Comparison, We Show That Local Linguistic Features Are More Reliable For Learning Local Context Than Term-dependency. As An Application, We Show That High Accuracy Is Achieved In Named Entity Recognition By Applying Segment-based Part-of-speech (POS) Tagging.
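
The Idea Of Choosing The Segmentation That Maximizes The Sum Of Stickiness Scores Can Be Sketched With Dynamic Programming. The Snippet Below Is Only An Illustration: The Toy Stickiness Function, Its Scores, And The Maximum Segment Length Are Hypothetical And Do Not Reproduce HybridSeg's Global Or Local Context Models.

```python
from typing import Callable, List, Tuple

Segment = Tuple[str, ...]

# Hypothetical scores standing in for a learned stickiness function.
TOY_SCORES = {("new", "york"): 2.0}


def toy_stickiness(seg: Segment) -> float:
    if seg in TOY_SCORES:
        return TOY_SCORES[seg]
    return 0.5 if len(seg) == 1 else 0.0


def segment(words: List[str], stickiness: Callable[[Segment], float],
            max_len: int = 3) -> List[Segment]:
    """Split words into segments maximizing the total stickiness score."""
    n = len(words)
    best = [0.0] + [float("-inf")] * n  # best[i]: best score for words[:i]
    back = [0] * (n + 1)                # back[i]: start index of the last segment
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            score = best[j] + stickiness(tuple(words[j:i]))
            if score > best[i]:
                best[i], back[i] = score, j
    # Walk the back-pointers from the end to recover the segments.
    segments: List[Segment] = []
    i = n
    while i > 0:
        segments.append(tuple(words[back[i]:i]))
        i = back[i]
    return segments[::-1]


if __name__ == "__main__":
    print(segment("i love new york".split(), toy_stickiness))
    # [('i',), ('love',), ('new', 'york')]
```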

PRIVACY AND DATA CONFIDENTIALITY - Fast A Fast Clustering-Based Database With Privacy And Data Confidentiality

In Order To Prevent The Disclosure Of Sensitive Information And Protect Users' Privacy, Generalization And Suppression Techniques Are Often Used To Anonymize The Quasi-identifiers Of The Data Before It Is Shared. Data Streams Are Inherently Infinite And Highly Dynamic, Which Makes Them Very Different From Static Datasets, So The Anonymization Of Data Streams Needs To Be Capable Of Solving More Complicated Problems. The Methods For Anonymizing Static Datasets Cannot Be Applied To Data Streams Directly. In This Paper, An Anonymization Approach For Data Streams Is Proposed, Based On An Analysis Of The Published Anonymization Methods For Data Streams. This Approach Scans The Data Only Once To Recognize And Reuse The Clusters That Satisfy The Anonymization Requirements, Speeding Up The Anonymization Process. Experimental Results On A Real Dataset Show That The Proposed Method Reduces The Information Loss Caused By Generalization And Suppression, Satisfies The Anonymization Requirements, And Has Low Time And Space Complexity.

INSTANT MESSAGE USING DATA MINING AND ONTOLOGY- Framework For Surveillance Of Instant Messages In Instant Messengers And Social Networking Sites Using Data Mining And Ontology

Innumerable Terror-related And Suspicious Messages Are Sent Through Instant Messengers (IM) And Social Networking Sites (SNS) And Go Untraced, Hindering Network Communications And Cyber Security. We Propose A Framework That Discovers And Predicts Such Messages Sent Using IM Or SNS Like Facebook, Twitter, LinkedIn, And Others. Further, These Instant Messages Are Put Under Surveillance That Identifies The Type Of Suspected Cyber Threat Activity By The Culprit Along With Their Personal Details. The Framework Is Developed Using An Ontology Based Information Extraction (OBIE) Technique And Association Rule Mining (ARM), A Data Mining Technique, With A Set Of Pre-defined Knowledge-based (logical) Rules For The Decision Making Process That Are Learned From Domain Experts And From Past Learning Experience With Suspicious Datasets Like The GTD (Global Terrorism Database). The Experimental Results Obtained Will Help In Taking Prompt Decisions For Eradicating Cyber Crimes.

DATA RETRIEVAL PROCESS- Generating Boolean Matrix For Data Retrieval Process

A Data Retrieval (DR) Or Information Retrieval (IR) Process Begins When A User Enters A Query Into The System. Queries Are Formal Statements Of Information Needs, For Example Search Strings In Web Search Engines. In IR, A Query Does Not Uniquely Identify A Single Object In The Collection. Instead, Several Objects May Match The Query, Perhaps With Different Degrees Of Relevancy. An Object Is An Entity Which Keeps Or Stores Information In A Database. User Queries Are Matched To Objects Stored In The Database. Depending On The Application, The Data Objects May Be, For Example, Text Documents, Images Or Videos. The Documents Themselves Are Not Kept Or Stored Directly In The IR System, But Are Instead Represented In The System By Document Surrogates. Most IR Systems Compute A Numeric Score For How Well Each Object In The Database Matches The Query, And Rank The Objects According To This Value. The Top Ranking Objects Are Then Shown To The User. The Process May Then Be Iterated If The User Wishes To Refine The Query. In This Paper We Explain IR Methods And Assess Them From Two Viewpoints, Then Propose A Simple Method For Ranking Terms And Documents In IR, Implement The Method, And Check The Result.
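
As A Small Illustration Of The Boolean Matrix Named In The Project Title, The Following Python Sketch Builds A Term-document Incidence Matrix And Answers A Conjunctive (AND) Query Against It. The Function Names And Sample Documents Are Hypothetical, And The Ranking Method Proposed In The Paper Is Not Reproduced Here.

```python
from typing import Dict, List


def boolean_matrix(docs: List[str]) -> Dict[str, List[int]]:
    """Map each term to a 0/1 row with one column per document."""
    tokens = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*tokens)) if tokens else []
    return {t: [1 if t in toks else 0 for toks in tokens] for t in vocab}


def and_query(matrix: Dict[str, List[int]], terms: List[str], n_docs: int) -> List[int]:
    """Return indices of documents that contain every query term."""
    result = [1] * n_docs
    for t in terms:
        row = matrix.get(t.lower(), [0] * n_docs)  # unknown term matches nothing
        result = [a & b for a, b in zip(result, row)]
    return [i for i, bit in enumerate(result) if bit]


if __name__ == "__main__":
    docs = [
        "data mining and retrieval",
        "information retrieval systems",
        "data retrieval process",
    ]
    m = boolean_matrix(docs)
    print(m["data"])                                      # [1, 0, 1]
    print(and_query(m, ["data", "retrieval"], len(docs)))  # [0, 2]
```

A Boolean Matrix Only Says Whether A Document Matches; Producing The Ranked Results Described Above Would Need A Numeric Score On Top Of It.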

UNCERTAIN OBJECT- Query Aware Determinization Of Uncertain Objects

This Paper Considers The Problem Of Determinizing Probabilistic Data To Enable Such Data To Be Stored In Legacy Systems That Accept Only Deterministic Input. Probabilistic Data May Be Generated By Automated Data Analysis/enrichment Techniques Such As Entity Resolution, Information Extraction, And Speech Processing. The Legacy System May Correspond To Pre-existing Web Applications Such As Flickr, Picasa, Etc. The Goal Is To Generate A Deterministic Representation Of Probabilistic Data That Optimizes The Quality Of The End-application Built On Deterministic Data. We Explore Such A Determinization Problem In The Context Of Two Different Data Processing Tasks: Triggers And Selection Queries. We Show That Approaches Such As Thresholding Or Top-1 Selection, Traditionally Used For Determinization, Lead To Suboptimal Performance For Such Applications. Instead, We Develop A Query-aware Strategy And Show Its Advantages Over Existing Solutions Through A Comprehensive Empirical Evaluation Over Real And Synthetic Datasets.

EFFECTIVE AND EFFICIENT CLUSTERING METHOD - Effective And Efficient Clustering Methods For Correlated Probabilistic Graphs

Recently, Probabilistic Graphs Have Attracted Significant Interest From The Data Mining Community. It Is Observed That Correlations May Exist Among Adjacent Edges In Various Probabilistic Graphs. As One Of The Basic Mining Techniques, Graph Clustering Is Widely Used In Exploratory Data Analysis, Such As Data Compression, Information Retrieval, Image Segmentation, Etc. Graph Clustering Aims To Divide Data Into Clusters According To Their Similarities, And A Number Of Algorithms Have Been Proposed For Clustering Graphs, Such As The PKwikCluster Algorithm, Spectral Clustering, K-path Clustering, Etc. However, Little Research Has Been Performed To Develop Efficient Clustering Algorithms For Probabilistic Graphs. In Particular, It Becomes More Challenging To Efficiently Cluster Probabilistic Graphs When Correlations Are Considered. In This Paper, We Define The Problem Of Clustering Correlated Probabilistic Graphs. To Solve This Challenging Problem, We Propose Two Algorithms, Namely The PEEDR And The CPGS Clustering Algorithms. For Each Of The Proposed Algorithms, We Develop Several Pruning Techniques To Further Improve Its Efficiency. We Evaluate The Effectiveness And Efficiency Of Our Algorithms And Pruning Methods Through Comprehensive Experiments.

DOCUMENT ANNOTATION USING CONTENT AND QUERYING VALUE- Facilitating Document Annotation Using Content And Querying Value

A Large Number Of Organizations Today Generate And Share Textual Descriptions Of Their Products, Services, And Actions. Such Collections Of Textual Data Contain A Significant Amount Of Structured Information, Which Remains Buried In The Unstructured Text. While Information Extraction Algorithms Facilitate The Extraction Of Structured Relations, They Are Often Expensive And Inaccurate, Especially When Operating On Top Of Text That Does Not Contain Any Instances Of The Targeted Structured Information. We Present A Novel Alternative Approach That Facilitates The Generation Of The Structured Metadata By Identifying Documents That Are Likely To Contain Information Of Interest, Where This Information Will Subsequently Be Useful For Querying The Database. Our Approach Relies On The Idea That Humans Are More Likely To Add The Necessary Metadata During Creation Time, If Prompted By The Interface; Or That It Is Much Easier For Humans (and/or Algorithms) To Identify The Metadata When Such Information Actually Exists In The Document, Instead Of Naively Prompting Users To Fill In Forms With Information That Is Not Available In The Document. As A Major Contribution Of This Paper, We Present Algorithms That Identify Structured Attributes That Are Likely To Appear Within The Document, By Jointly Utilizing The Content Of The Text And The Query Workload. Our Experimental Evaluation Shows That Our Approach Generates Superior Results Compared To Approaches That Rely Only On The Textual Content Or Only On The Query Workload To Identify Attributes Of Interest.

PRIVACY PROTECTION - Supporting Privacy Protection In Personalized Web Search

Personalized Web Search (PWS) Has Demonstrated Its Effectiveness In Improving The Quality Of Various Search Services On The Internet. However, Evidence Shows That Users' Reluctance To Disclose Their Private Information During Search Has Become A Major Barrier To The Wide Proliferation Of PWS. We Study Privacy Protection In PWS Applications That Model User Preferences As Hierarchical User Profiles. We Propose A PWS Framework Called UPS That Can Adaptively Generalize Profiles By Queries While Respecting User-specified Privacy Requirements. Our Runtime Generalization Aims At Striking A Balance Between Two Predictive Metrics That Evaluate The Utility Of Personalization And The Privacy Risk Of Exposing The Generalized Profile. We Present Two Greedy Algorithms, Namely GreedyDP And GreedyIL, For Runtime Generalization. We Also Provide An Online Prediction Mechanism For Deciding Whether Personalizing A Query Is Beneficial. Extensive Experiments Demonstrate The Effectiveness Of Our Framework. The Experimental Results Also Reveal That GreedyIL Significantly Outperforms GreedyDP In Terms Of Efficiency.

TRUSTEDDB- A Trusted Hardware-Based Database With Privacy And Data Confidentiality

Traditionally, As Soon As Confidentiality Becomes A Concern, Data Are Encrypted Before Outsourcing To A Service Provider. Any Software-based Cryptographic Constructs Then Deployed, For Server-side Query Processing On The Encrypted Data, Inherently Limit Query Expressiveness. Here, We Introduce TrustedDB, An Outsourced Database Prototype That Allows Clients To Execute SQL Queries With Privacy And Under Regulatory Compliance Constraints By Leveraging Server-hosted, Tamper-proof Trusted Hardware In Critical Query Processing Stages, Thereby Removing Any Limitations On The Type Of Supported Queries. Despite The Cost Overhead And Performance Limitations Of Trusted Hardware, We Show That The Costs Per Query Are Orders Of Magnitude Lower Than Any (existing Or) Potential Future Software-only Mechanisms. TrustedDB Is Built And Runs On Actual Hardware, And Its Performance And Costs Are Evaluated Here.

FAST CLUSTERING- A Fast Clustering-Based Feature Subset Selection Algorithm For High-Dimensional Data

Feature Selection Involves Identifying A Subset Of The Most Useful Features That Produces Results Compatible With Those Of The Original Entire Set Of Features. A Feature Selection Algorithm May Be Evaluated From Both The Efficiency And Effectiveness Points Of View. While The Efficiency Concerns The Time Required To Find A Subset Of Features, The Effectiveness Is Related To The Quality Of The Subset Of Features. Based On These Criteria, A Fast Clustering-based Feature Selection Algorithm (FAST) Is Proposed And Experimentally Evaluated In This Paper. The FAST Algorithm Works In Two Steps. In The First Step, Features Are Divided Into Clusters By Using Graph-theoretic Clustering Methods. In The Second Step, The Most Representative Feature That Is Strongly Related To Target Classes Is Selected From Each Cluster To Form A Subset Of Features. Because Features In Different Clusters Are Relatively Independent, The Clustering-based Strategy Of FAST Has A High Probability Of Producing A Subset Of Useful And Independent Features. To Ensure The Efficiency Of FAST, We Adopt The Efficient Minimum-spanning Tree (MST) Clustering Method. The Efficiency And Effectiveness Of The FAST Algorithm Are Evaluated Through An Empirical Study. Extensive Experiments Are Carried Out To Compare FAST And Several Representative Feature Selection Algorithms, Namely, FCBF, ReliefF, CFS, Consist, And FOCUS-SF, With Respect To Four Types Of Well-known Classifiers, Namely, The Probability-based Naive Bayes, The Tree-based C4.5, The Instance-based IB1, And The Rule-based RIPPER, Before And After Feature Selection. The Results, On 35 Publicly Available Real-world High-dimensional Image, Microarray, And Text Data Sets, Demonstrate That FAST Not Only Produces Smaller Subsets Of Features But Also Improves The Performance Of The Four Types Of Classifiers.
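
The First Step, Clustering Features With A Minimum Spanning Tree, Can Be Sketched Generically As Follows. The Snippet Builds An MST With Kruskal's Algorithm And Removes Edges Heavier Than A Cut-off So That The Remaining Connected Components Form The Clusters. The Edge Weights, The Cut-off Rule, And The Function Names Are Assumptions; FAST's Actual Edge Weights And Edge Removal Criterion Are Not Given In This Abstract.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, int, int]  # (weight, feature u, feature v); weights are hypothetical


def find(parent: List[int], x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def mst_clusters(n: int, edges: List[Edge], cut: float) -> List[List[int]]:
    """Cluster n features: build an MST, then drop MST edges heavier than `cut`."""
    # Kruskal: take edges by increasing weight, skipping any that would close a cycle.
    parent = list(range(n))
    mst: List[Edge] = []
    for w, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            mst.append((w, u, v))
    # Re-link only the light MST edges; each remaining component is a cluster.
    parent = list(range(n))
    for w, u, v in mst:
        if w <= cut:
            parent[find(parent, u)] = find(parent, v)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    edges = [(0.1, 0, 1), (0.2, 1, 2), (0.9, 2, 3),
             (0.15, 3, 4), (0.8, 0, 2), (1.0, 1, 4)]
    print(mst_clusters(5, edges, cut=0.5))  # [[0, 1, 2], [3, 4]]
```

The Second Step Would Then Pick One Representative Feature From Each Cluster, Which Requires A Relevance Measure Between Features And Target Classes That Is Outside The Scope Of This Sketch.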

DISTRIBUTED PROCESSING - Distributed Processing Of Probabilistic Top-k Queries In Wireless Sensor Networks

In This Paper, We Introduce The Notion Of Sufficient Set And Necessary Set For Distributed Processing Of Probabilistic Top-k Queries In Cluster-based Wireless Sensor Networks. These Two Concepts Have Very Nice Properties That Can Facilitate Localized Data Pruning In Clusters. Accordingly, We Develop A Suite Of Algorithms, Namely, Sufficient Set-based (SSB), Necessary Set-based (NSB), And Boundary-based (BB), For Intercluster Query Processing With Bounded Rounds Of Communications. Moreover, In Responding To Dynamic Changes Of Data Distribution In The Network, We Develop An Adaptive Algorithm That Dynamically Switches Among The Three Proposed Algorithms To Minimize The Transmission Cost. We Show The Applicability Of Sufficient Set And Necessary Set To Wireless Sensor Networks With Both Two-tier Hierarchical And Tree-structured Network Topologies. Experimental Results Show That The Proposed Algorithms Reduce Data Transmissions Significantly And Incur Only Small Constant Rounds Of Data Communications. The Experimental Results Also Demonstrate The Superiority Of The Adaptive Algorithm, Which Achieves A Near-optimal Performance Under Various Conditions.

XML RETRIEVAL - Using Personalization To Improve XML Retrieval

As The Amount Of Information Increases Every Day And Users Normally Formulate Short And Ambiguous Queries, Personalized Search Techniques Are Becoming Almost A Must. Using The Information About The User Stored In A User Profile, These Techniques Retrieve Results That Are Closer To The User's Preferences. On The Other Hand, Information Is Increasingly Being Stored In A Semi-structured Way, And XML Has Emerged As A Standard For Representing And Exchanging This Type Of Data. XML Search Allows A Higher Retrieval Effectiveness, Due To Its Ability To Retrieve And Show The User Specific Parts Of The Documents Instead Of The Full Document. In This Paper We Propose Several Personalization Techniques In The Context Of XML Retrieval. We Try To Combine The Different Approaches Where Personalization May Be Applied: Query Reformulation, Re-ranking Of Results And Retrieval Model Modification. The Experimental Results Obtained From A User Study Using A Parliamentary Document Collection Support The Validity Of Our Approach.

ASSOCIATION RULE AND THE APRIORI ALGORITHM- A Data Mining Project - Discovering Association Rules Using The Apriori Algorithm

Data Mining Has A Lot Of E-Commerce Applications. The Key Problem Is How To Find Useful Hidden Patterns For Better Business Applications In The Retail Sector. To Solve This Problem, The Apriori Algorithm Is One Of The Most Popular Data Mining Approaches For Finding Frequent Item Sets From A Transaction Dataset And Deriving Association Rules. Rules Are The Knowledge Discovered From The Database. Finding Frequent Item Sets (item Sets With Frequency Larger Than Or Equal To A User Specified Minimum Support) Is Not Trivial Because Of The Combinatorial Explosion Of Candidates. Once Frequent Item Sets Are Obtained, It Is Straightforward To Generate Association Rules With Confidence Larger Than Or Equal To A User Specified Minimum Confidence. The Paper Illustrates The Apriori Algorithm On A Simulated Database And Finds The Association Rules At Different Confidence Values.
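
The Two Stages Described Above, Finding Frequent Item Sets And Then Generating Rules, Can Be Sketched In A Few Lines Of Python. This Is A Compact Textbook-style Version For Illustration, Not The Paper's Implementation; The Function Names, The Use Of Absolute Support Counts, And The Sample Transactions Are Assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every item set whose support count is at least min_support."""
    def support(itemset: FrozenSet[str]) -> int:
        return sum(1 for t in transactions if itemset <= t)

    level = {frozenset([i]) for t in transactions for i in t}
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while level:
        kept = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: unions of frequent (k-1)-item sets that have exactly k items.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))}
    return frequent


def rules(frequent: Dict[FrozenSet[str], int],
          min_conf: float) -> List[Tuple[Set[str], Set[str], float]]:
    """Generate rules lhs -> rhs with confidence = support(itemset) / support(lhs)."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_conf:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out


if __name__ == "__main__":
    data = [{"bread", "milk"}, {"bread", "butter"},
            {"bread", "milk", "butter"}, {"milk"}]
    freq = apriori(data, min_support=2)
    print(rules(freq, min_conf=0.9))  # [({'butter'}, {'bread'}, 1.0)]
```

The Prune Step Is What Keeps The Candidate Count Manageable: Any Candidate With An Infrequent Subset Is Discarded Before Its Support Is Counted.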

TEXT CLASSIFICATION AND CLUSTERING- Similarity Measure For Text Classification And Clustering

Measuring The Similarity Between Documents Is An Important Operation In The Text Processing Field. In This Paper, A New Similarity Measure Is Proposed. To Compute The Similarity Between Two Documents With Respect To A Feature, The Proposed Measure Takes The Following Three Cases Into Account: A) The Feature Appears In Both Documents, B) The Feature Appears In Only One Document, And C) The Feature Appears In None Of The Documents. For The First Case, The Similarity Increases As The Difference Between The Two Involved Feature Values Decreases. Furthermore, The Contribution Of The Difference Is Normally Scaled. For The Second Case, A Fixed Value Is Contributed To The Similarity. For The Last Case, The Feature Has No Contribution To The Similarity. The Proposed Measure Is Extended To Gauge The Similarity Between Two Sets Of Documents. The Effectiveness Of Our Measure Is Evaluated On Several Real-world Data Sets For Text Classification And Clustering Problems. The Results Show That The Performance Obtained By The Proposed Measure Is Better Than That Achieved By Other Measures.

ASSOCIATION RULE MINING - An Efficient Multi-Party Communication Scheme With Association Rule Mining

We Propose A Protocol For Secure Mining Of Association Rules In Horizontally Distributed Databases. Our Protocol, Like The Earlier Protocol It Is Compared Against, Is Based On The Fast Distributed Mining (FDM) Algorithm, Which Is An Unsecured Distributed Version Of The Apriori Algorithm. The Main Ingredients In Our Protocol Are Two Novel Secure Multi-party Algorithms: One That Computes The Union Of Private Subsets That Each Of The Interacting Players Holds, And Another That Tests The Inclusion Of An Element Held By One Player In A Subset Held By Another. Our Protocol Offers Enhanced Privacy With Respect To That Earlier Protocol. In Addition, It Is Simpler And Is Significantly More Efficient In Terms Of Communication Rounds, Communication Cost And Computational Cost.

RRW- A Robust And Reversible Watermarking Technique For Relational Data

Advancement In Information Technology Is Playing An Increasing Role In The Use Of Information Systems Comprising Relational Databases. These Databases Are Used Effectively In Collaborative Environments For Information Extraction; Consequently, They Are Vulnerable To Security Threats Concerning Ownership Rights And Data Tampering. Watermarking Is Advocated To Enforce Ownership Rights Over Shared Relational Data And To Provide A Means Of Tackling Data Tampering. When Ownership Rights Are Enforced Using Watermarking, The Underlying Data Undergoes Certain Modifications, As A Result Of Which The Data Quality Gets Compromised. Reversible Watermarking Is Employed To Ensure Data Quality Along With Data Recovery. However, Such Techniques Are Usually Not Robust Against Malicious Attacks And Do Not Provide Any Mechanism To Selectively Watermark A Particular Attribute By Taking Into Account Its Role In Knowledge Discovery. Therefore, Reversible Watermarking Is Required That Ensures: (i) Watermark Encoding And Decoding By Accounting For The Role Of All The Features In Knowledge Discovery; And (ii) Original Data Recovery In The Presence Of Active Malicious Attacks. In This Paper, A Robust And Semi-blind Reversible Watermarking (RRW) Technique For Numerical Relational Data Is Proposed That Addresses The Above Objectives. Experimental Studies Demonstrate The Effectiveness Of RRW Against Malicious Attacks And Show That The Proposed Technique Outperforms Existing Ones.

PRIVACY PRESERVING SOCIAL MEDIA DATA PUBLISHING- Privacy Preserving Social Media Data Publishing For Personalized Rank Based Recommendation

Personalized Recommendation Is Crucial To Help Users Find Pertinent Information. It Often Relies On A Large Collection Of User Data, In Particular Users' Online Activity (e.g., Tagging/rating/checking-in) On Social Media, To Mine User Preference. However, Releasing Such User Activity Data Makes Users Vulnerable To Inference Attacks, As Private Data (e.g., Gender) Can Often Be Inferred From The Users' Activity Data. In This Paper, We Propose PrivRank, A Customizable And Continuous Privacy-preserving Social Media Data Publishing Framework That Protects Users Against Inference Attacks While Enabling Personalized Ranking-based Recommendations. Its Key Idea Is To Continuously Obfuscate User Activity Data Such That The Privacy Leakage Of User-specified Private Data Is Minimized Under A Given Data Distortion Budget, Which Bounds The Ranking Loss Incurred By The Data Obfuscation Process In Order To Preserve The Utility Of The Data For Enabling Recommendations. An Empirical Evaluation On Both Synthetic And Real-world Datasets Shows That Our Framework Can Efficiently Provide Effective And Continuous Protection Of User-specified Private Data, While Still Preserving The Utility Of The Obfuscated Data For Personalized Ranking-based Recommendation. Compared To State-of-the-art Approaches, PrivRank Achieves Both Better Privacy Protection And Higher Utility In All The Ranking-based Recommendation Use Cases We Tested.

ACTIVE LEARNING FOR RANKING - Active Learning For Ranking Through Expected Loss Optimization

Learning To Rank Arises In Many Data Mining Applications, Ranging From Web Search Engines And Online Advertising To Recommendation Systems. In Learning To Rank, The Performance Of A Ranking Model Is Strongly Affected By The Number Of Labeled Examples In The Training Set; On The Other Hand, Obtaining Labeled Examples For Training Data Is Very Expensive And Time-consuming. This Presents A Great Need For Active Learning Approaches That Select The Most Informative Examples For Ranking Learning; However, In The Literature There Is Still Very Limited Work Addressing Active Learning For Ranking. In This Paper, We Propose A General Active Learning Framework, Expected Loss Optimization (ELO), For Ranking. The ELO Framework Is Applicable To A Wide Range Of Ranking Functions. Under This Framework, We Derive A Novel Algorithm, Expected Discounted Cumulative Gain (DCG) Loss Optimization (ELO-DCG), To Select The Most Informative Examples. Then, We Investigate Both Query And Document Level Active Learning For Ranking And Propose A Two-stage ELO-DCG Algorithm Which Incorporates Both Query And Document Selection Into Active Learning. Furthermore, We Show That The Algorithm Can Flexibly Deal With The Skewed Grade Distribution Problem Through A Modification Of The Loss Function. Extensive Experiments On Real-world Web Search Data Sets Have Demonstrated The Great Potential And Effectiveness Of The Proposed Framework And Algorithms.
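
For Reference, Discounted Cumulative Gain Can Be Computed As In The Sketch Below. It Uses One Widely Used Form Of The Metric, With Gain 2^rel - 1 And A log2(rank + 1) Discount; Whether The Paper Uses Exactly This Form Is An Assumption, And The Expected Loss Computation Of ELO-DCG Itself Is Not Reproduced.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG of a ranked list of relevance grades, optionally truncated at rank k."""
    rels = relevances[:k] if k is not None else relevances
    # enumerate() is 0-based, so the item at index i has rank i + 1
    # and its discount is log2(rank + 1) = log2(i + 2).
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    print(dcg([3, 2, 0]))   # 7/1 + 3/log2(3) + 0
    print(ndcg([3, 2, 0]))  # 1.0, the list is already ideally ordered
    print(ndcg([0, 2, 3]))  # below 1.0, the best document is ranked last
```

Because The Discount Grows With Rank, Mistakes Near The Top Of The List Cost More Than Mistakes Further Down, Which Is Why The Choice Of Examples To Label Matters For Ranking.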

REPRESENTATIVE PATTERN SETS- A Flexible Approach To Finding Representative Pattern Sets

Frequent Pattern Mining Often Produces An Enormous Number Of Frequent Patterns, Which Imposes A Great Challenge On Visualizing, Understanding And Further Analysis Of The Generated Patterns. This Calls For Finding A Small Number Of Representative Patterns To Best Approximate All Other Patterns. In This Paper, We Develop An Algorithm Called MinRPset To Find A Minimum Representative Pattern Set With Error Guarantee. MinRPset Produces The Smallest Solution That We Can Possibly Have In Practice Under The Given Problem Setting, And It Takes A Reasonable Amount Of Time To Finish When The Number Of Frequent Closed Patterns Is Below One Million. However, MinRPset Is Very Space-consuming And Time-consuming On Some Dense Datasets When The Number Of Frequent Closed Patterns Is Large. To Solve This Problem, We Propose Another Algorithm Called FlexRPset, Which Provides One Extra Parameter K To Allow Users To Make A Trade-off Between Result Size And Efficiency. We Adopt An Incremental Approach To Let The Users Make The Trade-off Conveniently. Our Experimental Results Show That MinRPset And FlexRPset Produce Fewer Representative Patterns Than RPlocal, An Efficient Algorithm Developed For Solving The Same Problem.