Best Project Center | Final Year Project Center in Chennai


26 Results Found

PRIVACY PRESERVING RECORD LINKAGE - Secure Approximate String Matching for Privacy-Preserving Record Linkage

Real-world applications of record linkage often require matching to be robust in spite of small variations in string fields. For example, two health care providers should be able to detect a patient in common, even if one record contains a typo or transcription error. In the privacy-preserving setting, however, the problem of approximate string matching has been cast as a trade-off between security and practicality, and the literature has mainly focused on Bloom filter encodings, an approach which can leak significant information about the underlying records. We present a novel public-key construction for secure two-party evaluation of threshold functions in restricted domains, based on embeddings found in the message spaces of additively homomorphic encryption schemes. We use this to construct an efficient two-party protocol for privately computing the threshold Dice coefficient. Relative to the approach of Bloom filter encodings, our proposal offers formal security guarantees and greater matching accuracy. We implement the protocol and demonstrate the feasibility of this approach in linking medium-sized patient databases with tens of thousands of records.
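As a rough plaintext illustration of the quantity the protocol computes privately (this is not the secure construction itself; the bigram tokenization and the threshold value are illustrative assumptions), the Dice coefficient over character bigrams can be sketched as:

```python
def bigrams(s):
    # set of overlapping 2-character tokens; assumes len(s) >= 2
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice(a, b):
    # Dice coefficient: 2*|A ∩ B| / (|A| + |B|)
    A, B = bigrams(a), bigrams(b)
    return 2 * len(A & B) / (len(A) + len(B))

def threshold_match(a, b, t=0.8):
    # declare a match when similarity reaches the threshold t
    return dice(a, b) >= t
```

In the paper this comparison is carried out under homomorphic encryption, so neither party sees the other's string.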

CYBER CRIME TECHNIQUES USING DATA MINING - A Data Mining Framework to Predict Cyber Attacks for Cyber Security

Cyber-attacks are increasing exponentially with the advancement of technology. Therefore, the detection and prediction of cyber-attacks are very important for every organization that deals with sensitive data for business purposes. In this paper, we present a cyber security framework that uses a data mining technique to predict cyber-attacks, which can help in taking proper interventions to reduce them. The two main components of the framework are the detection and the prediction of cyber-attacks. The framework first extracts the patterns related to cyber-attacks from historical data using a J48 decision tree algorithm and then builds a prediction model for future cyber-attacks. We then apply the framework to publicly available cyber security datasets provided by the Canadian Institute for Cybersecurity. The datasets contain several kinds of cyber-attacks, including DDoS, port scan, bot, brute force, SQL injection, and Heartbleed. The proposed framework correctly detects the cyber-attacks and provides the patterns related to them. The overall accuracy of the proposed prediction model in detecting cyber-attacks is around 99%. The patterns extracted from historical data can be applied to predict future cyber-attacks, and the experimental results indicate the superiority of the model in detecting them.
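The J48 algorithm mentioned above grows its tree by choosing, at each node, the attribute with the highest information gain. A minimal sketch of that core computation (the attribute names and labels below are invented examples, not taken from the CIC datasets):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    # Shannon entropy of a label list, in bits
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # entropy reduction from splitting the rows on attribute `attr`
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A decision-tree learner would pick the attribute maximizing `info_gain` and recurse on each partition.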

FRAUD DETECTION FROM MASSIVE USER BEHAVIORS - Fraud Detection in Dynamic Interaction Networks

Fraud detection from massive user behaviors is often regarded as trying to find a needle in a haystack. In this paper, we suggest that abnormal behavioral patterns can be better revealed if both the sequential and the interaction behaviors of users are modeled simultaneously, which has rarely been addressed in prior work. Along this line, we propose a COllective Sequence and INteraction (COSIN) model, in which the behavioral sequences and interactions between source and target users in a dynamic interaction network are modeled uniformly in a probabilistic graphical model. More specifically, the sequential schema is modeled with a hierarchical hidden Markov model, which is then shifted to the interaction schema to generate the interaction counts through Poisson factorization. A hybrid Gibbs-variational algorithm is then proposed for efficient parameter estimation of the COSIN model. We conduct extensive experiments on both synthetic and real-world telecom datasets at different scales, and the results show that the proposed model outperforms competitive baseline methods and is scalable. A case study is further presented to demonstrate the explainability of the model.

NOVEL BIOMETRIC FOR MEDICAL IMAGES - A Novel Biometric-Inspired Robust Security Framework for Medical Images

The protection of sensitive and confidential data becomes a challenging task in the present scenario, as more and more digital data is stored and transmitted between end users. Privacy is vitally necessary in the case of medical data, which contains important information about patients. In this article, a novel biometric-inspired medical image encryption technique is proposed, based on the newly introduced parameterized all-phase orthogonal transformation (PR-APBST), singular value decomposition, and QR decomposition. The proposed technique utilizes the biometrics of the patient/owner to generate a key management system that supplies the parameters involved in the technique. The medical image is then encrypted employing the PR-APBST, QR, and singular value decompositions and is ready for secure transmission or storage. Finally, a reliable decryption process is employed to reconstruct the original medical image from the encrypted image. The validity and feasibility of the proposed framework have been demonstrated through extensive experiments on various medical images and security analysis.

SOCIAL CONTEXTUAL IMAGE RECOMMENDATION - A Hierarchical Attention Model for Social Contextual Image Recommendation

Image-based social networks are among the most popular social networking services of recent years. With a tremendous number of images uploaded every day, understanding users' preferences for user-generated images and making recommendations has become an urgent need. Many hybrid models have been proposed to fuse various kinds of side information (e.g., image visual representation, social network) with user-item historical behavior to enhance recommendation performance. However, due to the unique characteristics of user-generated images on social image platforms, previous studies failed to capture the complex aspects that influence users' preferences in a unified framework. Moreover, most of these hybrid models relied on predefined weights for combining different kinds of information, which usually results in sub-optimal recommendation performance. To this end, in this paper we develop a hierarchical attention model for social contextual image recommendation. In addition to the basic latent user interest modeling of popular matrix factorization based recommendation, we identify three key aspects (i.e., upload history, social influence, and owner admiration) that affect each user's latent preferences, where each aspect summarizes a contextual factor from the complex relationships between users and images. After that, we design a hierarchical attention network that naturally mirrors the hierarchical relationship (elements within each aspect, and the aspect level) of users' latent interests with the identified key aspects. Specifically, by taking embeddings from state-of-the-art deep learning models tailored for each kind of data, the hierarchical attention network can learn to attend differently to more or less relevant content. Finally, extensive experimental results on real-world datasets clearly show the superiority of our proposed model.

DATA ANALYSIS IN COVID-19 - Data Analysis of COVID-19 in Various Categories

The COVID-19 epidemic has caused a large number of human losses and havoc in the economic, social, societal, and health systems around the world. Controlling such an epidemic requires understanding its characteristics and behavior, which can be identified by collecting and analyzing the related big data. Big data analytics tools play a vital role in building the knowledge required for making decisions and precautionary measures. However, due to the vast amount of data available on COVID-19 from various sources, there is a need to review the roles of big data analysis in controlling the spread of COVID-19, to present the main challenges and directions of COVID-19 data analysis, and to provide a framework of the related existing applications and studies to facilitate future research. Therefore, in this paper, we conduct a literature review to highlight the contributions of several studies in the domain of COVID-19-based big data analysis. The study presents, as a taxonomy, several applications used to manage and control the pandemic. Moreover, this study discusses several challenges encountered when analyzing COVID-19 data. The findings of this paper suggest valuable future directions to be considered for further research and applications.

FAKE REVIEW REMOVAL - Fake Product Review Monitoring and Removal for Genuine Online Product Reviews Using Opinion Mining

Most people look for reviews of a product before spending money on it. Users come across many reviews on websites, but cannot tell whether those reviews are genuine or fake. On some review sites, favorable reviews are added by the product company itself in order to create a falsely positive impression; such companies post good reviews for many different products manufactured by their own firm, and users cannot find out whether a review is genuine or fake. To detect fake reviews, this system, "Fake Product Review Monitoring and Removal for Genuine Online Product Reviews Using Opinion Mining", is introduced. The system identifies fake reviews by tracking the IP address along with review-posting patterns. A user logs in to the system with a user ID and password, views various products, and posts reviews about them. To decide whether a review is fake or genuine, the system tracks the user's IP address; if it observes fake reviews sent from the same IP address many times, it informs the admin to remove those reviews from the system. The system uses a data mining methodology and helps users find correct reviews of products.
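The IP-tracking step described above can be sketched as a simple frequency check (the record layout and the repeat threshold are illustrative assumptions, not the system's actual schema):

```python
from collections import Counter

def flag_repeat_reviewers(reviews, limit=3):
    """Return IP addresses that posted `limit` or more reviews,
    as candidates for admin inspection and removal."""
    counts = Counter(r["ip"] for r in reviews)
    return sorted(ip for ip, c in counts.items() if c >= limit)
```

A real system would combine this signal with posting patterns (timing, repeated text) before alerting the admin.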

THYROID DISEASE USING DATA MINING - Prediction of Thyroid Disease Using Data Mining Techniques

Classification-based data mining plays an important role in various healthcare services. In the healthcare field, an important and challenging task is to diagnose health conditions and treat disease properly at an early stage. Various diseases can be diagnosed and treated early; thyroid disease is one example. The traditional way of diagnosing thyroid disease depends on clinical examination and many blood tests, and the main task is to diagnose the disease at an early stage with higher accuracy. Data mining techniques play an important role in the healthcare field for decision making, disease diagnosis, and providing better treatment for patients at low cost, and thyroid disease classification is an important such task. The purpose of this study is the prediction of thyroid disease using different classification techniques, and also to find the correlation of TSH, T3, and T4 with hyperthyroidism and hypothyroidism, as well as their correlation with gender.
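The clinical intuition the classifiers learn can be caricatured as a threshold rule on TSH and T4 (the cutoff values below are illustrative placeholders, not validated reference ranges, and a real model would be trained from labeled data):

```python
def classify_thyroid(tsh, t4):
    """Toy rule-based screen: high TSH with low T4 suggests hypothyroidism,
    low TSH with high T4 suggests hyperthyroidism. Thresholds are
    illustrative only."""
    if tsh > 4.0 and t4 < 0.8:
        return "hypothyroid"
    if tsh < 0.4 and t4 > 1.8:
        return "hyperthyroid"
    return "euthyroid"
```

The study's point is that mined classifiers can learn such boundaries, and the T3/T4/TSH correlations, directly from patient records.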

HEALTH EXAMINATION RECORDS - Mining Health Examination Records: A Graph-Based Approach

General health examination is an integral part of healthcare in many countries. Identifying the participants at risk is important for early warning and preventive intervention. The fundamental challenge of learning a classification model for risk prediction lies in the unlabeled data that constitutes the majority of the collected dataset. In particular, the unlabeled data describes participants in health examinations whose health conditions can vary greatly from healthy to very ill, and there is no ground truth for differentiating their states of health. In this paper, we propose a graph-based, semi-supervised learning algorithm called SHG-Health (Semi-supervised Heterogeneous Graph on Health) for risk prediction, to classify a progressively developing situation with the majority of the data unlabeled. An efficient iterative algorithm is designed and a proof of convergence is given. Extensive experiments based on both real health examination datasets and synthetic datasets are performed to show the effectiveness and efficiency of our method.

RANWAR RANKING - RANWAR: Rank-Based Weighted Association Rule Mining from Gene Expression and Methylation Data

Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of rules over items (or genes) produced by association rule mining (ARM) algorithms confuses the decision maker. In this article, we propose a weighted rule-mining technique, RANWAR (rank-based weighted association rule mining), which ranks the rules using two novel rule-interestingness measures, the rank-based weighted condensed support (wcs) and the weighted condensed confidence (wcc), to bypass this problem. These measures depend on the rank of the items (genes); using the rank, we assign a weight to each item. RANWAR generates far fewer frequent itemsets than state-of-the-art association rule mining algorithms, which saves execution time. We run RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontology (GO) and KEGG pathway analyses. Many top-ranked rules extracted by RANWAR that hold poor ranks under the traditional Apriori algorithm are highly biologically significant to the related diseases. Finally, the top rules found by RANWAR that are not found by Apriori are reported.
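To illustrate the idea of rank-weighted support (the rank-to-weight mapping below is one plausible choice for illustration, not the paper's exact wcs formula), an itemset's support can be scaled by the average weight of its items:

```python
def weighted_support(itemset, transactions, rank):
    """Support of `itemset` scaled by an average item weight derived
    from item ranks (rank 1 = most important). Illustrative sketch of
    rank-weighted support, not RANWAR's published wcs measure."""
    m = max(rank.values())
    # linear weight: best rank -> 1.0, worst rank -> 1/m
    w = sum((m - rank[i] + 1) / m for i in itemset) / len(itemset)
    count = sum(1 for t in transactions if itemset <= t)
    return w * count / len(transactions)
```

Under such a measure, itemsets built from highly ranked genes rise in the rule ranking even when their raw frequency is modest.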

OUTLIER DETECTION - An Efficient Approach for Outlier Detection with Imperfect Data Labels

The task of outlier detection is to identify data objects that are markedly different from or inconsistent with the normal set of data. Most existing solutions typically build a model using the normal data and identify outliers that do not fit the model well. However, in addition to normal data, limited negative examples or outliers also exist in many applications, and the data may be corrupted such that the outlier detection data is imperfectly labeled. This makes outlier detection far more difficult than in the traditional setting. This paper presents a novel outlier detection approach that addresses data with imperfect labels and incorporates limited abnormal examples into learning. To deal with imperfect labels, we introduce likelihood values for each input datum which denote its degree of membership in the normal and abnormal classes, respectively. Our proposed approach works in two steps. In the first step, we generate a pseudo-training dataset by computing likelihood values for each example based on its local behavior, using a kernel k-means clustering method and a kernel LOF-based method. In the second step, we incorporate the generated likelihood values and the limited abnormal examples into an SVDD-based learning framework to build a more accurate classifier for global outlier detection. By integrating local and global outlier detection, our proposed method explicitly handles data with imperfect labels and enhances the performance of outlier detection. Extensive experiments on real-life datasets have demonstrated that our proposed approaches achieve a better trade-off between detection rate and false alarm rate than state-of-the-art outlier detection approaches.

PREDICTION OF DIFFICULT KEYWORD QUERIES - Efficient Prediction of Difficult Keyword Queries over Databases

Keyword queries on databases provide easy access to data, but often suffer from low ranking quality, i.e., low precision and/or recall, as shown in recent benchmarks. It would be useful to identify queries that are likely to have low ranking quality, in order to improve user satisfaction; for instance, the system may suggest alternative queries for such hard queries. In this paper, we analyze the characteristics of hard queries and propose a novel framework to measure the degree of difficulty of a keyword query over a database, considering both the structure and content of the database and the query results. We evaluate our query difficulty prediction model against two effectiveness benchmarks for popular keyword search ranking methods. Our empirical results show that our model predicts hard queries with high accuracy. Further, we present a suite of optimizations to minimize the incurred time overhead.
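One simple intuition behind such difficulty predictors, sketched very loosely here (this proxy is an illustration of the idea, not the paper's model): when a ranker assigns nearly identical scores to all candidate results, it cannot separate relevant from irrelevant answers, so the query is likely hard.

```python
import statistics

def difficulty_proxy(result_scores):
    """Toy difficulty score in (0, 1]: low spread among result scores
    suggests the ranker cannot discriminate, i.e. a harder query."""
    if len(result_scores) < 2:
        return 1.0
    return 1.0 / (1.0 + statistics.pstdev(result_scores))
```

A system could flag queries whose proxy score exceeds a tuned threshold and offer alternative formulations to the user.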

MEDICAL DISEASE TREATMENT SYSTEM - An Automatic Medical Disease Treatment System Using Data Mining

Our proposed system identifies reliable information in the medical domain, which stands as a building block for a healthcare system that is up to date with the latest discoveries, using tools such as NLP and ML techniques. This research focuses on disease and treatment information and the relation that exists between these two entities. The main goal is to identify the disease name from the specified symptoms, extract the relevant sentences from the article, obtain the relation that exists between disease and treatment, and classify the information into cure, prevention, and side effect for the user.

FAST CLUSTERING - A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data

Feature selection involves identifying a subset of the most useful features that produces results comparable to the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view: efficiency concerns the time required to find a subset of features, while effectiveness relates to the quality of that subset. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to the target classes is selected from each cluster to form a subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-spanning-tree (MST) clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study. Extensive experiments compare FAST with several representative feature selection algorithms, namely FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types of well-known classifiers, namely the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the rule-based RIPPER, before and after feature selection. The results, on 35 publicly available real-world high-dimensional image, microarray, and text datasets, demonstrate that FAST not only produces smaller subsets of features but also improves the performance of all four types of classifiers.

PATTERN CLASSIFIERS UNDER ATTACK - Security Evaluation of Pattern Classifiers Under Attack

Pattern classification systems are commonly used in adversarial applications, like biometric authentication, network intrusion detection, and spam filtering, in which data can be purposely manipulated by humans to undermine their operation. As this adversarial scenario is not taken into account by classical design methods, pattern classification systems may exhibit vulnerabilities whose exploitation may severely affect their performance and consequently limit their practical utility. Extending pattern classification theory and design methods to adversarial settings is thus a novel and very relevant research direction, which has not yet been pursued in a systematic way. In this paper, we address one of the main open issues: evaluating at design time the security of pattern classifiers, namely the performance degradation they may incur under potential attacks during operation. We propose a framework for the empirical evaluation of classifier security that formalizes and generalizes the main ideas proposed in the literature, and give examples of its use in three real applications. Reported results show that security evaluation can provide a more complete understanding of the classifier's behavior in adversarial environments and lead to better design choices.

CREDIT CARD FRAUD DETECTION - Credit Card Fraud Detection Using a Hidden Markov Model

Over the past few years there has been tremendous advancement in electronic commerce technology, and the use of credit cards has dramatically increased. As the credit card becomes the most popular mode of payment for both online and regular purchases, cases of fraud associated with it are also rising. In this paper we present the theory necessary to detect fraud in credit card transaction processing using a hidden Markov model (HMM). An HMM is initially trained with the normal behavior of a cardholder. If an incoming credit card transaction is not accepted by the trained HMM with sufficiently high probability, it is considered fraudulent. At the same time, we try to ensure that genuine transactions are not rejected, by using an enhancement (a hybrid model). In further sections we compare different methods for fraud detection and show why the HMM is preferred over other methods.
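The accept/reject logic can be illustrated with an observable Markov chain over spending categories, which is simpler than the paper's hidden Markov model but shows the same idea: train transition probabilities on a cardholder's history, then flag sequences the model assigns too little probability (states, probabilities, and the threshold below are all illustrative assumptions):

```python
from collections import defaultdict

def train_transitions(sequences):
    """Estimate transition probabilities from observed spending sequences
    (e.g. 'low'/'med'/'high' transaction amounts)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

def sequence_prob(model, seq, floor=1e-6):
    # probability of the observed sequence; unseen transitions get a floor
    p = 1.0
    for a, b in zip(seq, seq[1:]):
        p *= model.get(a, {}).get(b, floor)
    return p

def looks_fraudulent(model, seq, threshold=1e-4):
    # reject sequences the trained model finds too improbable
    return sequence_prob(model, seq) < threshold
```

In the full HMM version the spending observations are emissions of hidden states, and the forward algorithm replaces the direct product above.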

PRIVACY PRESERVING DATA PUBLISHING - Slicing: A New Approach to Privacy-Preserving Data Publishing

Several anonymization techniques, such as generalization and bucketization, have been designed for privacy-preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data without a clear separation between quasi-identifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obeys the ℓ-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure.
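The core mechanism can be sketched directly: partition the columns into groups (vertical), partition the rows into buckets (horizontal), and independently permute each column group within every bucket so that cross-group linkage is broken. This is a bare sketch of the partitioning step, not the full ℓ-diversity-aware algorithm; the example columns are invented.

```python
import random

def slice_table(rows, groups, bucket_size, seed=0):
    """Break linkage between column groups by shuffling each group's
    values independently within every horizontal bucket."""
    rng = random.Random(seed)
    sliced = []
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]
        shuffled = []
        for g in groups:
            vals = [tuple(r[c] for c in g) for r in bucket]
            rng.shuffle(vals)  # permute this group's tuples within the bucket
            shuffled.append(vals)
        for i in range(len(bucket)):
            rec = {}
            for g, vals in zip(groups, shuffled):
                rec.update(dict(zip(g, vals[i])))
            sliced.append(rec)
    return sliced
```

Note that within each bucket the multiset of values per column group is preserved (so aggregate utility survives), while the association between, say, quasi-identifiers and the sensitive attribute is randomized.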

REVIEW RATING PREDICTION - Rating Prediction Based on Social Sentiment from Textual Reviews

In recent years, we have witnessed a flourishing of review websites, which presents a great opportunity to share our viewpoints about the various products we purchase. However, it also creates an information overload problem, making it crucial to mine valuable information from reviews to understand a user's preferences and make accurate recommendations. Traditional recommender systems (RS) consider factors such as a user's purchase records, product category, and geographic location. In this work, we propose a sentiment-based rating prediction method (RPS) to improve prediction accuracy in recommender systems. Firstly, we propose a social-user sentiment measurement approach and calculate each user's sentiment on items/products. Secondly, we consider not only a user's own sentimental attributes but also interpersonal sentimental influence. Then, we consider product reputation, which can be inferred from the sentimental distributions of a user set, reflecting customers' comprehensive evaluation. Finally, we fuse three factors (user sentiment similarity, interpersonal sentimental influence, and item reputation similarity) into our recommender system to make an accurate rating prediction. We conduct a performance evaluation of the three sentimental factors on a real-world dataset collected from Yelp. Our experimental results show that sentiment can well characterize user preferences, which helps to improve recommendation performance.
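The fusion of the three factors can be caricatured as a weighted linear adjustment of a base rating (the linear form and the weights below are illustrative assumptions; the paper fuses the factors inside a matrix factorization objective):

```python
def predict_rating(base, user_sent, friend_sents, item_rep,
                   a=0.5, b=0.3, c=0.2):
    """Toy linear fusion: adjust a base rating by the user's own sentiment,
    the average sentiment of their social circle, and item reputation.
    Weights a, b, c are illustrative, not learned."""
    social = sum(friend_sents) / len(friend_sents) if friend_sents else 0.0
    return base + a * user_sent + b * social + c * item_rep
```

Sentiments here are assumed to lie in [-1, 1], so a positive personal or social sentiment pushes the predicted rating up.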

PRIVACY PRESERVING OUTSOURCED MINING - Privacy-Preserving Outsourced Association Rule Mining on Vertically Partitioned Databases

Association rule mining and frequent itemset mining are two popular and widely studied data analysis techniques for a range of applications. In this paper, we focus on privacy-preserving mining on vertically partitioned databases. In such a scenario, data owners wish to learn the association rules or frequent itemsets from a collective dataset while disclosing as little information about their (sensitive) raw data as possible to other data owners and third parties. To ensure data privacy, we design an efficient homomorphic encryption scheme and a secure comparison scheme. We then propose a cloud-aided frequent itemset mining solution, which is used to build an association rule mining solution. Our solutions are designed for outsourced databases that allow multiple data owners to efficiently share their data securely without compromising data privacy. Our solutions leak less information about the raw data than most existing solutions. In comparison to the only known solution achieving a similar privacy level, the performance of our proposed solutions is three to five orders of magnitude higher. Based on our experimental findings using different parameters and datasets, we demonstrate that the run time of each of our solutions is only one order of magnitude higher than that of the best non-privacy-preserving data mining algorithms. Since both the data and the computing work are outsourced to cloud servers, the resource consumption at the data owner end is very low.

HIERARCHICAL TENSOR GEOSPATIAL DATA - A Hierarchical Tensor-Based Approach to Compressing, Updating and Querying Geospatial Data


MULTI-LABEL AND COST-SENSITIVE CLASSIFICATION - Generalized k-Labelsets Ensemble for Multi-Label and Cost-Sensitive Classification

The label powerset (LP) method is one category of multi-label learning algorithm. This paper presents a basis expansion model for multi-label classification, where each basis function is an LP classifier trained on a random k-labelset. The expansion coefficients are learned to minimize the global error between the prediction and the ground truth, and we derive an analytic solution to learn the coefficients efficiently. We further extend this model to handle the cost-sensitive multi-label classification problem, and apply it in social tagging to handle the issue of noisy training sets by treating the tag counts as misclassification costs. We have conducted experiments on several benchmark datasets and compared our method with other state-of-the-art multi-label learning methods. Experimental results on both multi-label classification and cost-sensitive social tagging demonstrate that our method outperforms other methods.

FOOD RECOGNITION SYSTEM - A Food Recognition System for Diabetic Patients Based on an Optimized Bag-of-Features Model

Computer-vision-based food recognition could be used to estimate a meal's carbohydrate content for diabetic patients. This study proposes a methodology for automatic food recognition based on the bag-of-features (BoF) model. An extensive technical investigation was conducted to identify and optimize the best-performing components of the BoF architecture and to estimate the corresponding parameters. For the design and evaluation of the prototype system, a visual dataset with nearly 5,000 food images was created and organized into 11 classes. The optimized system computes dense local features using the scale-invariant feature transform (SIFT) on the HSV color space, builds a visual dictionary of 10,000 visual words using hierarchical k-means clustering, and finally classifies the food images with a linear support vector machine classifier. The system achieved a classification accuracy of around 78%, proving the feasibility of the proposed approach on a very challenging image dataset.

ONLINE IMAGE RETRIEVAL - Mining User Queries with Markov Chains: Application to Online Image Retrieval

We propose a novel method for automatic annotation, indexing, and annotation-based retrieval of images. The new method, which we call Markovian semantic indexing (MSI), is presented in the context of an online image retrieval system. Assuming such a system, the users' queries are used to construct an aggregate Markov chain (AMC) through which the relevance between the keywords seen by the system is defined. The users' queries are also used to automatically annotate the images. A stochastic distance between images, based on their annotation and the keyword relevance captured in the AMC, is then introduced. Geometric interpretations of the proposed distance are provided, and its relation to a clustering in the keyword space is investigated. By means of a new measure of Markovian state similarity, the mean first cross-passage time (CPT), optimality properties of the proposed distance are proved. Images are modeled as points in a vector space and their similarity is measured with MSI. The new method is shown to possess certain theoretical advantages and to achieve better precision-versus-recall results when compared to latent semantic indexing (LSI) and probabilistic latent semantic indexing (pLSI) in annotation-based image retrieval (ABIR) tasks.

DISTRIBUTED DATABASES - Secure Mining of Association Rules in Horizontally Distributed Databases

We propose a protocol for secure mining of association rules in horizontally distributed databases. The current leading protocol is that of Kantarcioglu and Clifton. Our protocol, like theirs, is based on the Fast Distributed Mining (FDM) algorithm of Cheung et al., which is an unsecured distributed version of the Apriori algorithm. The main ingredients of our protocol are two novel secure multi-party algorithms: one that computes the union of private subsets held by the interacting players, and another that tests the inclusion of an element held by one player in a subset held by another. Our protocol offers enhanced privacy with respect to the earlier protocol. In addition, it is simpler and significantly more efficient in terms of communication rounds, communication cost, and computational cost.
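For reference, the (non-secure, centralized) Apriori algorithm that FDM distributes can be sketched in a few lines: grow candidate itemsets level by level, keeping only those whose support meets the threshold.

```python
def apriori(transactions, minsup):
    """Plain Apriori: return all itemsets with support >= minsup,
    mapped to their support. Centralized and unsecured; the paper's
    protocol computes the same result across parties privately."""
    ts = [frozenset(t) for t in transactions]
    n = len(ts)

    def sup(s):
        return sum(1 for t in ts if s <= t) / n

    current = {frozenset([i]) for t in ts for i in t}
    current = {s for s in current if sup(s) >= minsup}
    frequent = {}
    while current:
        frequent.update({s: sup(s) for s in current})
        k = len(next(iter(current))) + 1
        # candidate generation: unions of frequent sets one item larger
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = {c for c in candidates if sup(c) >= minsup}
    return frequent
```

In the horizontally distributed setting, each party holds a share of the transactions, and the secure union and inclusion tests described above let the parties compute the globally frequent itemsets without revealing their local supports.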

DOMAIN KNOWLEDGE - Web-Page Recommendation Based on Web Usage and Domain Knowledge

Web-page recommendation plays an important role in intelligent web systems. Useful knowledge discovery from web usage data and satisfactory knowledge representation for effective web-page recommendation are crucial and challenging. This paper proposes a novel method to efficiently provide better web-page recommendation through semantic enhancement, by integrating the domain and web usage knowledge of a website. Two new models are proposed to represent the domain knowledge: the first uses an ontology, while the second uses an automatically generated semantic network of domain terms, web pages, and the relations between them. Another new model, the conceptual prediction model, is proposed to automatically generate a semantic network of semantic web usage knowledge, which is the integration of domain knowledge and web usage knowledge. A number of effective queries have been developed against these knowledge bases, and based on them a set of recommendation strategies has been proposed to generate web-page candidates. The recommendation results have been compared with those obtained from an advanced existing web usage mining (WUM) method. The experimental results demonstrate that the proposed method produces significantly higher performance than the WUM method.

PATTERN IDENTIFICATION - Visualization and Pattern Identification in Large-Scale Time Series Data

Visualization of massively large datasets presents two significant problems. First, the dataset must be prepared for visualization, and traditional dataset manipulation methods fail due to a lack of temporary storage or memory. The second problem is the presentation of the data in the visual medium, particularly real-time visualization of streaming time series data. An ongoing research project addresses both of these problems using data from two national repositories. This work is presented here, with the results of the current effort summarized and future plans, including 3D visualization, outlined.