Commit d75d781a authored by wangys_biolab

Initial commit

SSA_embed.model filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
## prPred-DRLF
prPred-DRLF is a tool for identifying plant resistance proteins (R proteins) based on deep representation learning features.
prPred-DRLF is an open-source, Python-based toolkit that requires a Python environment (Python version 3.7).
### **If your computer has a GPU, it will run faster.**
### **Download**
git clone git@github.com:Wangys-prog/prPred-DRLF.git
### **Install dependencies**
pip3 install -r requirements.txt
or
pip3 install joblib==1.0.1
pip3 install tape_proteins==0.5
pip3 install numpy==1.19.2
pip3 install pandas==1.2.0
pip3 install Bio==0.4.1
pip3 install lightgbm==2.3.0
#### For Python 3.7
#### If you have a GPU with CUDA 10.0
pip install torch==1.2.0 torchvision==0.4.0
#### If you have a GPU with CUDA 9.2
pip install torch==1.2.0+cu92 torchvision==0.4.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
#### CPU only (Python 3.7)
pip install torch==1.2.0+cpu torchvision==0.4.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
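
After installing one of the wheels above, it can help to confirm which device PyTorch actually sees. The following is a minimal sketch, not part of prPred-DRLF: the helper name `describe_torch` is hypothetical, and the only assumption is that `torch` may or may not be importable in the current environment.

```python
import importlib.util


def describe_torch():
    """Return a short description of the local PyTorch install (hypothetical helper)."""
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    # is_available() is True only with a CUDA build and a usable GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"torch {torch.__version__} on {device}"


if __name__ == "__main__":
    print(describe_torch())
```

If the reported device is `cpu` on a machine with a GPU, the installed wheel probably does not match the local CUDA version.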
### Input parameters
python3 prPred-DRLF.py -h
$ usage: Script for predicting plant R protein using deep representation learning features
$ [-h] [-i I] [-o O]
$ optional arguments:
$ -h, --help show this help message and exit
$ -i I input sequences in Fasta format
$ -o O path to saved CSV file
python3 prPred-DRLF.py -i ./dataset/test_data.fasta -o ./dataset/predict_result
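
The help text above corresponds to a two-option command-line interface. As an illustration only, a parser with the same options could be sketched with `argparse` as follows; `build_parser` is a hypothetical name, and the real `prPred-DRLF.py` may be organised differently.

```python
import argparse


def build_parser():
    # Mirrors the help output shown above; not copied from prPred-DRLF.py.
    parser = argparse.ArgumentParser(
        description=(
            "Script for predicting plant R protein using "
            "deep representation learning features"
        )
    )
    parser.add_argument("-i", help="input sequences in Fasta format")
    parser.add_argument("-o", help="path to saved CSV file")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    ["-i", "./dataset/test_data.fasta", "-o", "./dataset/predict_result"]
)
print(args.i, args.o)
```

Because only short option names are given, `argparse` stores the values as `args.i` and `args.o` and shows them as `I` and `O` in the usage line.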
### Webserver
http://lab.malab.cn/soft/prPred-DRLF
### Other tools
http://lab.malab.cn/~wys/
### Please cite
(1) Yansu Wang, Lei Xu, Quan Zou, Chen Lin. prPred-DRLF: plant R protein predictor using deep representation learning features. Proteomics. 2021. DOI: 10.1002/pmic.202100161
(2) Yansu Wang, Murong Zhou, Quan Zou, Lei Xu. Machine learning for phytopathology: from the molecular scale towards the network scale. Briefings in Bioinformatics. 2021. DOI: 10.1093/bib/bbab037
(3) Wang Y, Wang P, Guo Y, et al. prPred: A Predictor to Identify Plant Resistance Proteins by Incorporating k-Spaced Amino Acid (Group) Pairs. Frontiers in Bioengineering and Biotechnology. 2021, 8: 1593.
,Sequence_ID,R protein possibility
0,negative212|0,0.002
1,positive29|1,1.0
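
The result file above is a plain CSV with an unnamed index column, a `Sequence_ID` column and an `R protein possibility` score. A minimal sketch of reading it with the standard library is shown below; the function name `predicted_r_proteins` and the 0.5 cutoff are assumptions for illustration, not values taken from prPred-DRLF.

```python
import csv
import io

# Sample rows copied from the result file above.
SAMPLE = """,Sequence_ID,R protein possibility
0,negative212|0,0.002
1,positive29|1,1.0
"""


def predicted_r_proteins(handle, threshold=0.5):
    """Return IDs scoring at or above the threshold (0.5 is an assumed cutoff)."""
    reader = csv.DictReader(handle)
    return [
        row["Sequence_ID"]
        for row in reader
        if float(row["R protein possibility"]) >= threshold
    ]


print(predicted_r_proteins(io.StringIO(SAMPLE)))
```

For a real run, pass an open file handle for the CSV written by the `-o` option instead of the in-memory sample.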
,UniRep_F232,BiLSTM_F3253,BiLSTM_F231,UniRep_F1029,UniRep_F1338,UniRep_F747,BiLSTM_F750,BiLSTM_F2662,UniRep_F774,UniRep_F1757,UniRep_F650,UniRep_F1574,BiLSTM_F1420,BiLSTM_F77,UniRep_F662,BiLSTM_F1270,BiLSTM_F1404,BiLSTM_F1775,BiLSTM_F213,UniRep_F57,BiLSTM_F101,UniRep_F1800,UniRep_F1827,UniRep_F682,BiLSTM_F1758,UniRep_F204,UniRep_F861,BiLSTM_F2557,UniRep_F1793,BiLSTM_F655,BiLSTM_F1786,BiLSTM_F853,BiLSTM_F1448,UniRep_F4,BiLSTM_F671,UniRep_F510,BiLSTM_F1116,UniRep_F1754,BiLSTM_F482,UniRep_F212,BiLSTM_F1526,UniRep_F945,UniRep_F1573,BiLSTM_F1912,UniRep_F1241,BiLSTM_F1677,BiLSTM_F74,BiLSTM_F1696,BiLSTM_F524,UniRep_F814,BiLSTM_F1124,BiLSTM_F528,UniRep_F1299,UniRep_F408,UniRep_F1736,BiLSTM_F415,UniRep_F868,BiLSTM_F596,UniRep_F1818,UniRep_F1369,UniRep_F529,UniRep_F472,UniRep_F165,UniRep_F860,UniRep_F1376,BiLSTM_F3362,UniRep_F1214,BiLSTM_F22,UniRep_F1187,UniRep_F1182,UniRep_F1576,UniRep_F156,UniRep_F1095,UniRep_F1082,BiLSTM_F2843,UniRep_F1780,UniRep_F322,UniRep_F1791,BiLSTM_F154,BiLSTM_F252,UniRep_F51,BiLSTM_F3119,UniRep_F1843,UniRep_F660,UniRep_F1459,BiLSTM_F3485,BiLSTM_F2284,BiLSTM_F974,UniRep_F1336,BiLSTM_F1002,BiLSTM_F3578,BiLSTM_F865,BiLSTM_F969,BiLSTM_F3463,BiLSTM_F1066,UniRep_F657,BiLSTM_F1233,BiLSTM_F2256,UniRep_F538,UniRep_F1389,BiLSTM_F1481,BiLSTM_F1997,UniRep_F72,UniRep_F314,UniRep_F288,UniRep_F1625,UniRep_F97,BiLSTM_F1631,UniRep_F237,BiLSTM_F1705,BiLSTM_F1707,BiLSTM_F1740,BiLSTM_F1755,UniRep_F201,UniRep_F1423,UniRep_F183,BiLSTM_F2003,UniRep_F337,BiLSTM_F1424,UniRep_F49,BiLSTM_F1096,BiLSTM_F2179,BiLSTM_F1186,UniRep_F1711,UniRep_F1709,UniRep_F25,UniRep_F32,UniRep_F1581,UniRep_F1615,UniRep_F409,BiLSTM_F2134,BiLSTM_F2117,BiLSTM_F1339,UniRep_F362,BiLSTM_F2097,BiLSTM_F1414,UniRep_F1335,UniRep_F1229,UniRep_F1024,BiLSTM_F75,UniRep_F980,UniRep_F1877,BiLSTM_F2869,UniRep_F1840,BiLSTM_F87,BiLSTM_F525,BiLSTM_F2725,BiLSTM_F336,UniRep_F1065,BiLSTM_F2711,UniRep_F879,UniRep_F1817,UniRep_F889,BiLSTM_F3019,UniRep_F1090,BiLSTM_F3181,UniRep_F902,UniRep_F904,BiLSTM_F370,UniRep_F1832,Un
iRep_F1093,UniRep_F925,BiLSTM_F110,BiLSTM_F2747,BiLSTM_F2877,BiLSTM_F273,BiLSTM_F2879,BiLSTM_F166,BiLSTM_F794,UniRep_F703,UniRep_F704,BiLSTM_F2,BiLSTM_F3382,UniRep_F749,UniRep_F770,UniRep_F1215,BiLSTM_F2837,BiLSTM_F38,UniRep_F919,UniRep_F790,UniRep_F1539,BiLSTM_F3312,BiLSTM_F696,BiLSTM_F2839,BiLSTM_F677,UniRep_F1478,UniRep_F1149,BiLSTM_F53,BiLSTM_F3114,BiLSTM_F3051,BiLSTM_F3053,UniRep_F62,UniRep_F73,BiLSTM_F3423,UniRep_F92,UniRep_F129,UniRep_F155,UniRep_F152,BiLSTM_F2987,BiLSTM_F2991
negative212|0,-0.034811634570360184,0.004029435571283102,0.20001398026943207,-0.2960071265697479,-0.0024540922604501247,0.006126032676547766,0.0010587710421532393,4.685978638008237e-06,-0.05614127963781357,0.1427001953125,-0.041672732681035995,-0.05256567895412445,0.0012017363915219903,0.01676754467189312,0.008841787464916706,-7.350556552410126e-05,0.011933702044188976,-0.015835225582122803,0.05146706476807594,-0.04539719223976135,0.07343664765357971,-0.006802583113312721,0.17321082949638367,0.15729841589927673,-0.008490246720612049,-0.07197565585374832,0.11062388867139816,0.020558515563607216,0.0015170032856985927,0.0009310527821071446,-0.016358856111764908,-0.00043481437023729086,0.0002815321204252541,-0.008164050057530403,0.0014553946675732732,-0.32523977756500244,-0.001188446767628193,-0.029281049966812134,0.1990051567554474,0.0030880800914019346,0.0009616818861104548,-0.025534940883517265,0.0034448658116161823,-0.0011311048874631524,0.02875017188489437,0.03008027747273445,0.18625584244728088,-0.0007277324330061674,0.0065430221147835255,0.00959217268973589,-0.06674424558877945,0.0062814378179609776,-0.17130865156650543,0.044943418353796005,0.22742120921611786,0.010491505265235901,0.015428616665303707,-0.05097590386867523,0.16133497655391693,-0.08295519649982452,-0.081154964864254,-0.0017777030589058995,-0.007788260001689196,0.0965556874871254,-0.013366347178816795,0.08124931901693344,0.03668227419257164,0.013384564779698849,-0.0794229805469513,-0.0679733008146286,0.0654965341091156,0.0057541728019714355,-0.024419866502285004,-0.36622166633605957,-0.00881290528923273,0.043333500623703,-0.1656431257724762,-0.021231194958090782,0.7599426507949829,0.05204673483967781,0.014051737263798714,-0.0025678942911326885,-0.038811005651950836,0.028863953426480293,0.010074766352772713,0.018057193607091904,0.009611126966774464,-9.985068572859745e-06,-0.29714322090148926,-0.0022403011098504066,0.0007539559155702591,-0.0007802600739523768,0.0012872798833996058,-0.02039498463273048
4,0.00018372003978583962,-0.01142818108201027,-0.12469439953565598,0.0030504970345646143,0.005137742962688208,0.04964349791407585,-0.063686802983284,-0.008631475269794464,-0.05681881681084633,-0.03665829077363014,-0.024518385529518127,-0.013852118514478207,0.01943495310842991,0.045649390667676926,-0.03180487081408501,-0.00045372816384769976,0.06544964760541916,0.007043574005365372,0.0034767272882163525,0.12061934918165207,0.002497767796739936,0.00977579690515995,0.00952797383069992,-0.3852512240409851,0.0011000894010066986,0.03510650619864464,-0.0007191549520939589,-0.046695832163095474,-0.08960983902215958,-0.04537704586982727,-0.024351483210921288,-0.07213440537452698,-0.02844931185245514,0.010713263414800167,0.14630551636219025,0.0008363188244402409,0.0043046604841947556,6.591193960048258e-05,0.013596250675618649,0.03074769861996174,-0.0036781695671379566,0.052703484892845154,-0.1296735256910324,-0.05928082391619682,-0.11912056058645248,0.2954358160495758,0.02697659283876419,0.16921155154705048,-0.008907645009458065,-0.10611321777105331,0.03804710879921913,0.015096589922904968,-0.009671171195805073,0.008266295306384563,0.032908570021390915,0.0051622698083519936,-0.02861451357603073,0.2160898745059967,-0.06824861466884613,-0.05382305756211281,-0.21336129307746887,0.0024776076897978783,0.025232091546058655,0.017331881448626518,0.039423130452632904,-0.15889021754264832,-0.16561467945575714,-0.02627953328192234,0.01685287244617939,0.04109898954629898,-0.00025445071514695883,0.11988529562950134,0.3718585968017578,0.08955643326044083,0.03052324242889881,-0.10895266383886337,-0.6846140623092651,0.04898446798324585,0.010754983872175217,-0.004667660221457481,0.00487252464517951,-0.005179384723305702,-0.014620037749409676,0.03433913365006447,-0.0711006224155426,0.0013567559653893113,0.006823370233178139,-0.015461510978639126,-0.0019957523327320814,0.0009553214185871184,0.06192902475595474,-0.05836094543337822,0.04120407626032829,0.09122627973556519,0.0026462250389158726,-0
.027500014752149582,0.003278030315414071,-0.3406601548194885,0.064644455909729,-0.043833933770656586,-0.2154310941696167,0.035033516585826874,-0.004995195660740137,0.003603357821702957,0.06984248757362366,-0.008638323284685612
positive29|1,0.047021638602018356,0.014518939889967442,0.30660468339920044,-0.0997648760676384,-0.021379224956035614,0.21550607681274414,0.0017186225159093738,0.011030955240130424,-0.03562738373875618,0.12322363257408142,-0.511938750743866,0.0559275783598423,0.0007715504034422338,0.1915474236011505,0.02599368616938591,0.0009804734727367759,7.705090683884919e-05,0.23463262617588043,0.05079019069671631,-0.013588722795248032,0.10005445033311844,-0.12152118980884552,0.21101221442222595,0.11690635234117508,0.0007002240163274109,-0.08805426210165024,0.1521615982055664,0.026477152481675148,0.008704038336873055,0.006376922130584717,-0.01420477032661438,-0.001107917632907629,0.004063126165419817,-0.009208302944898605,9.926391794579104e-05,-0.1387002170085907,-0.0005364833050407469,-0.05399543419480324,0.38372454047203064,0.08831854164600372,0.0015583375934511423,0.0480845645070076,-0.21832117438316345,-0.0013783295871689916,-0.019745778292417526,0.018389929085969925,0.372477263212204,-0.000593259755987674,0.017678584903478622,0.08803900331258774,-0.015332366339862347,0.015322189778089523,-0.029843388125300407,0.04684191197156906,0.06406532227993011,0.154341921210289,-0.033147308975458145,-0.0544540137052536,-0.005841437727212906,-0.026871688663959503,-0.11748846620321274,0.04943282529711723,-0.11073572933673859,0.013253855518996716,0.0037885296624153852,0.03634146228432655,0.0020501711405813694,0.0045037781819701195,-0.06989450752735138,-0.05759497731924057,0.06161860376596451,0.03754503279924393,-0.08671807497739792,-0.3750254511833191,-0.001583936857059598,-0.008361173793673515,-0.25496914982795715,-0.08594770729541779,0.9308525919914246,0.012229636311531067,0.07179277390241623,-0.002392892260104418,-0.17682287096977234,-0.042692698538303375,0.029441799968481064,0.024413809180259705,0.034294478595256805,-0.00017307627422269434,-0.11905552446842194,-0.012487738393247128,0.012764235958456993,-0.0004319600120652467,0.0007568246219307184,0.004885090980678797,-0.002354419091716
4087,0.035879358649253845,-0.08125337213277817,0.009058305062353611,0.12687283754348755,0.25194916129112244,-0.1339229792356491,-0.0035269600339233875,-0.13834422826766968,0.058383695781230927,-0.014322595670819283,0.013162012211978436,0.001530149718746543,0.02281617373228073,-0.014277858659625053,0.0003172357683070004,0.0232255682349205,-0.00930514745414257,0.005449910182505846,0.21019788086414337,0.07679086923599243,0.0696309506893158,0.008368267677724361,-0.09474487602710724,0.0053260428830981255,0.18975666165351868,-0.000974863360170275,-0.07034582644701004,-0.028055427595973015,-0.2005874514579773,-0.00982706993818283,-0.2449072301387787,0.019937530159950256,-0.017223266884684563,0.09532631933689117,0.027423355728387833,0.0014937054365873337,-0.015802418813109398,0.01190674863755703,0.08190274238586426,-0.007283797487616539,0.03169621154665947,-0.13005363941192627,-0.10680436342954636,-0.21628670394420624,0.23169612884521484,0.1669270396232605,-0.12550489604473114,-0.011671730317175388,-0.08160116523504257,0.023034395650029182,0.02697242982685566,-0.0029739653691649437,0.014881371520459652,0.017498156055808067,-0.0025019815657287836,-0.040142972022295,-0.10432545095682144,-0.13928277790546417,-0.004976244177669287,-0.041907068341970444,-0.0027595101855695248,0.0168609656393528,0.023636601865291595,0.022314006462693214,-0.011900339275598526,-0.1638420820236206,-0.027477165684103966,0.16686472296714783,0.0824504867196083,-0.004815066233277321,0.1017688736319542,-0.0021219602786004543,0.08912322670221329,0.030671877786517143,-0.14924025535583496,-0.420635461807251,0.06387920677661896,0.013131505809724331,0.01728537678718567,0.1849440187215805,0.037930022925138474,-0.08270659297704697,0.022981420159339905,0.03790099918842316,-0.09256094694137573,-0.01590036414563656,-0.012757503427565098,-0.008787378668785095,0.00020024998229928315,0.0389237254858017,0.00031973194563761353,0.013933354057371616,0.05314409360289574,0.0014059138484299183,-0.009592892602086067,-0.00225
70237051695585,-0.11398699134588242,0.047493431717157364,-0.123441681265831,-0.1520315706729889,0.13869796693325043,-0.016330208629369736,0.04511628672480583,-0.9886535406112671,0.02411423623561859
>negative1|0
MFTTRKVGFVIENDRFARSICVNSDMTFYELEAAVAGEFHVDANAWITNLSYWLPKQLSIFSTSKKPPVALNSTVAPKGFLLVKDTKAHLNFCHSIEPSVGGDIQKFDGRELGESFGTAKYIQRELESVRVESQVVQFVTAPTIETNGDVDDGAGKYFSSEVEGVHVELAVVKSVSAPTTETNGEAVDADAGTPSLSTDIDGRCEEDFFANYSEDEVDQRELEALERAEAEKARVKGISLRMKKALKLVIEYAVEWCDGDTRLPGRVMRSSSGLAMG
>negative10|0
MRCLRPGGARRRCQLLDGDTAAFCASLVDGLAQLESTLLREEDDGDGGGGGGGAVSMRWCADAMRLVKRMQRELLVMFKKADVPVGSAVSYGGGGGGGDGGGCWFEHYMQETAALLDFCNAFKSAVSRLHRYCMVVDFAAQVGCAGAGAAAENGGGAGGWWLEEEPGGDDAGAIRHRLSDVRAAVSEAERLGRKIMSSSSGGGGAGDDDAGGMVVVMLVAKITMAVVSMFVLQALTSPIVPLAADVDDGHCTLGRAAAVPVPELQPWRESLSVITDRFPRRPGVAEHERVAMVVKSMMINTKMEGEEETKNGKQEQEDDHVELLRTRSGELREGVEMFDCVLDEVFDEVIKGRNEMLGIFRDKALTLG
>negative11|0
MAMAYVAAAVLAAVTLTASFAGGEACNFVASMTWTAACQQTGRWENLCQQTLQTAPATAEVTVYALVATRLAKLRYENTLGEVDTMLWPRNAPADARAALDNCKYKYSVALGRMDAVSDEIFACDFSHARQEYVDAEVAVRSCQDGLLRLPNQYHSWPLFAKVSADYDLTMVAYLLGAIFLGR
>negative21|0
MLISNGFCSKREKGGKLATWTIFIFLFFFLXTIISXPFSTNMKSKTVCFGIRGRNNKSKENKVEDLEKLSHVKPNHKGKTKRCASTTPVGDDDGGINVHGANIVASNDAGVAAAVITAAHMSLMCVNDGQDGSGHGHGGESGVDGDGG
>negative32|0
MASLVNPLKTDVALRRPSTLDDAIMLAQAYEQRLHLGATDPIPERAVRALRMVLPAPSTASSTPSTDAASTTASGKQPLVAALPRKHLSPVEMAQRRTEGLCYNCHEKYVSGQHCEKSFIIEVISFPDDTEGEDDPPPAANTILADAGKLLISLHALAGIRALMFNTIKVCACVGTVDYLALLDSGSMHNFLSETVAHVRISSSSRAPASVSPSRMATASPRWDAARHSTSSSARPLSTLISTPFCWEGMTSSWAHIGSARSVRRCGTLLTNRFRSAQVTTK
>negative34|0
MDSSREANRKHDAEEKREDEKTSLMEVNLMVVLLSQLSTKKEDENGLEKERRDHVAHEARESRARANKRFMKDTKLMEKRTHVLQPMAKEFRHYRKM
>negative37|0
MVGGKSPIDAFKCQLVCKLFRDSATLDTIYQTMDKKRLRFRPFSADMYEVLRRCRKLNNPYILFNYGMLGEDDAGKQLLQYATDKGQLDAIFVLGMLLMAEGSERKQEALIMLNNAYINTRRCWNLRQTCYKVRSHLDACLVEFAKMFDISLE
>negative42|0
MEAKVVATVLIVLFLTLGGEAAAKICHDHSQTFKGMCFHTSNCIACCTNEGYTGGYCKPFTYRCMCTKDCGGDSPPDDPPPAMPTSPAATTTVA
>negative51|0
MSERRRRDLGGASDGAPFARQRWFPDEGAVTMTPEARAPAVAQGAASGQRLVRRCRPKVCAMVEVPRVRTAATTPSRGRARRDCLQRRRPHEARVAPAVSVVAQGACCKGLWSARRFRRPSITNSSDLVVSVASSDDDLPQSFVSYMRLLGDHILGFFSLIMMYDVSIYATFYNDLYLARY
>negative52|0
MEGLNVGLEEVVRHRIIRGAVIGDGDLNLNLNKSYFYRIRLPFHCVEEMVGITGYDPACIPFSYLGLPFGANMNRGVNWNPVIQSETKREIPWVKRNLVLNFKEKGGLRIGSLKTFNLALRIKWRLCFYKEVDVLWAKVIQNIDGMHDGLGLSSPKLRGGVWSNLVKAFDNWDLL
>negative53|0
MEKGPARHDWFKFGLARHESRAGPCSKFEPVVLPGPTRCCGPCSCWPGTETSRSSVGRHLIRFSRFPRRPPVSDSLLDSPGVGRRQPIHSSSPQARRPAPQARRPDSSGPAATRHAAISPAWRPDSSGPAATRHAAISLGVTPSGQRPRAHAQRPSSGPAATMPCMPAARQAPSGQSISPSAISGSPEAPAQRPSPDLPTATEIQT
>negative54|0
MSLLTNRMAFSLLLMIIFGHPLALLAHSTPPISSRITVVGAVYCDTCLSDGFSKHSYFIPGADVHIQCRFNANVPKTSEMISFSVNRTTDIYGVYKLEIPSVEGVDCVDGPPIQSFCQASLLESSSSACNVPSLKATTTEVSVKSKQENLCVYSFTPLSFRPSKRNDTICRKRGHKSSQASSRSISQAANLQPLSFPFSSPPPFLPFPFPFSSPPPSLPFPFPFSSPPPSLPFPFPFSSPPPSLPFPFPFSSQPPSLPFPFPHLPPFPSIPSFLHSPPPPPPSFDLGDPRTWIPHIPLISPPPPPPPPPPPVFDLRDPKTWTPLIPPLSPPNLQKQNP
>negative55|0
MSFGTRLLLFLILTLPLVTSSSPNTLHVSGIVKTGTTSRFLMMTIEDYDDPSANTRHDPSVPTNAKADTTP
>negative67|0
MAGFHIPGDPYFPNQGNARWIDDEPEQQIEEDPEEDPEENPEEEPKEEDKQEEEEEDEHEGEEEDEEEEEEEDEEEVEEDDDIVMVGNEMEEQPEVFNPPYITRVPAHRFGYNRPEPPWVTTIERWSRQQRQRSPYGNQRGYYDLIHGGPTDRALPVTIHRIASMDDRGRTTTDQVKELSAIVQSTTDRTRDLERDSHHRDQLIQDLLAARAETREYRERYMALEERIVAAERRLAELQGESTSSQAPSKKGRHD
>negative72|0
MASSSTSSSLAAAAMVAVVLLLLGATATSTQAARLLDEELPTAAIPAIPGVPGVPAVGPGIPVIPGVPGVPVVGPGIPFVPVIPGVPVIVPIIPGVPMIAGMTTLPVPPFVPPIDPGAGAGFPGVPPASSTTVQEDPQPPMPSVVPPVP
>negative73|0
MSDHRLQPIPARHVIAAAAAPNPEPPPPPRPAWAQDRPARAHRSPSSSPPRASLPASPSAAASLPASQLPLAAFVPATAGERQVAADRASAPRLLGEPRSPDPRARAPFSLLAPLDASLSPPDMSDHRLQPTPARHVIAATAAPNPEPPQPPRPAWARDGPARAHRSPSRRPSSSPPHAPLPASPSAAPSPSASQPPLAAFVPGHHRRAAGRRRPRLHHAAPGRATLARARSPPPPPIVASRRRPSPCCPTPPSPTPPTSPAPPRAVVAGRPRRRVELAAAPREAPRASAAPLLPRRRRPPPLARTSPALLRRTPTIPI
>negative80|0
MIYFCFLGFGIKSRVMENMNEEIDLGLSLGATNQQRVESCSDSGAGVNADLGSRIDTTNTNTKPFVRPHPLTELVWSTQNGLTIKYTGCSPCFAHTKDDERVIDASFFRSTLPLAHTGRQNRRTADLLPLLSNEPEKLEIKTENSDRFTMDVGLPLKMTQEHTNVKRGEELKEEGSSSAPFLEKMEETAENDVIVKDLENSEKDQKGGENREEGCKEDDDESHKSMESCNSANLSSKKNKGWRFEEQLIIGSKRIKKQSQEYSPVVKQDSSFMNWISTMVKSVKPYQEETPRPFDDDQNKRLGFQTVFQSLYSQDPKRLETKTDIDDKSVDASKEIILFDKTASDHNPYDNLNQKAFGNLWITRLFPKIPSNVNTLVAKDSSTIDASAPCGMNQFEKPVEKSKCFFCGKTGHELRDCLQINENGNVFFKKKVSNINEIASTSGNNKLFNEEKIHRIPKGMFDTIRSLRLSRTDILKWMNSHLPLTNLDGYFLRLRLAKWEEGVGEAGYYVACITDENPSKGSKKPIRVNIGGVECLVESRYISNCDFLEDELIAWWQRTLTSGQVPMEERLTFKLEEREKLGF
>negative88|0
MEGFTNFLQVMETKVDLLARKVDLLTSKMDLCATNVDLLTSKMNLCATNSDLKAMLTSMKGKNVPAKKTLAMAASDSEDSSDSSSSDDIELCFRFLIKIVIQLRPSYFIIIFAYIDLKEGANKALELNGSDMAGKELVVKTALLMRDSYLGYSCNGGIGGRFGGRFGSVGGIFGGGRCGGLGQYYAQV
>negative96|0
MSMFTIHIERGNALRRLLLPIANLSNTAQINCSIPCLSFVVSPRNNFIVVLRIFPAFFTSYQFNPNYAGNYTSFHAKLYLRLLHTNISNMFTDELSVTIYLNRFESTIPVRFYEPRSGYLQFSALLLARARRIPMYPVDMGVFFAIRSDEFLKILTDLRVLSGEYFVIYVSTTKALFKSLRREIAYNTRDDENMMVMGGVPTGGARQFFLICPSPLHFFYYTASISKMVWFFIAAENHTRHFLSCCVNIHASYLVCF
>negative108|0
MAMEICAEAFPLSRFPTLEKSYCADSRDTERSHNTNWITEVHFSNRGSNKLWISEVAKIKKRIRSLNLRKSLGQARFSSSTGCIHEQLLKFYSLNYIHAGLSAESLPFPCLLKKLSGLCFAILLYTRFAGYIRKGTCQLSLCKGFRWSSA
>negative110|0
MQFVIGSVSPFACTSLGGGVIRLKTTTVPVLAFFRSGEKIQDIVWPQTIEIKGRVKWGVFKKFLKDLRHSQNRSVMVISLLLNVESSKNGLEGMKQVANMLKVNKRIGLATTEDGSYIYVCPGHKDIITVLAKYGFWKGASTIDRNQDSLIGFVVRDRKPPVNTSEGEVSIEKIQEIGGTHVEAQEVIYDKQSYPVHPLCAPVDHGLYFRRPPLVLAEQGSYYPYPPPGLLGHDFHARHPPIGLSGQGLHPHAPECQGVFHQLPPCARGQGFITHCAPNFALEYFKPLTDHGNNNGSQHMRPPSGSAPGHLWHRMSCARSGNNLPGFERSEGFSGSSSTYYSRFENGSGSMPPN
>negative111|0
MESYRVFTTICLLLCLFLSANFFTHYVVDARKSVGFEREPKKVMMIKALKHTSLLQKMMTQLNLAQPLDYSSSSNTQPYGVSTTLTLPPYVSLPPLSVPGNAPPFCINPPNTPPSSSYPGLSPPPGPITLPNPPDSSSNPNSNPNPPESSSNPNPPDSSSNPNSNPNPPVTVPNPPESSSNPNPPDSSSNPNSNPNPPESSSNPNPPVTVPNPPESSSNPNPPESSSNPNPPITIPYPPESSSPNPPEIVPSPPESGYTPGPVLGPPYSEPGPSTPTGSIPSPSSGFLPPIVYPPPMAPPSPSVTPTSAYWCVAKPSVPDPIIQEAMNFACGSGADCHSIQPNGPCFKPNTLWAHASFAYNSYWQRTKSTGGSCTFGGTGMLVTVDPSFNGCHFDFF
>negative112|0
MENILNYLQMSLEFQIAHFIEVVKAKAIQVGIQHDGLLTVLTLCFVFYLIVVSTGHLWKNKKKKKRGKIQRVLTRSMSIGVLHGGELALERLVDYHQAKANVQLLDVTETELDSLLKEGLPDFKKLQRCIAKLEMSGKESKAVKILESAIKEARLEGKPHEAYEFEMLLVESLIYQGEFIKALSYECLNDEFITDARRPLYKAIIYLSLGYTEEEAKKFWWEFKRIRGHMKRSRNTQDVQLFEITTDFDKFMCIVKSLEEDIKQVKAKANKNK
>negative118|0
MVVFPVVKLATLALRTACKPIANRLKKEAGYHPKFRNFIISIAQANHRLTTRVQRRIYGHATDVAIHPLNEEKAVQAAADLLGELFVFSTNVADAILIAVADILKTLMLQPKTRTHATKFSSVPSIKISAIIDFVLFRLQELPSSLRCKEVQDQKQEKKNYVDRRYRLSSCLCLSWIICFEGKLWVEAKLVFL
>negative126|0
MAERAWIFSLPGVRQLGRSFLSAARRFPFPKLHGCCTSPAERLFPTALPPCFSFPLASSPRRRAPAPAIGRAPASMDAAPAACSPCFSSPSSFPSPSHGQLRAHPALSLFLSRPSLFFHGCSRFPVPRPRPAVPPLSSSSGSVQFALALAMVAVVPMPTGARSAAVSSRLPATARHVPVFEFVDLHSSPHGRTTSSMLLFGGRLLA
>negative127|0
MKNLKASKFQFLCKPTQHGSKLTQIRKRKRNNYQYQQMKSGVLERCNQRRFSWKAAIKDVGGGNLNGYYKSDILLEGFEIQNRETHHNLQHDLTEYIWERQFQGQNDGEDDDDDDGDDEEEDDGGDEADDTGDDNDDD
>negative137|0
MGRFCDLAILCPDMVPTDSKKIERYIWGLSPQIQLSVLASRPDTFDSAKELAQSLIDHGNYWNSVVGVQDQQKENNFGKQGGNKRKDESSQESSRKQHQGHTIRYCKNPARPINQLPNAGVSQACYGCGKVGHYKRDCPGTANTGTDGRMLTITAAGEATPNPR
>negative138|0
MRDVHLLFGQSTPFLHMQMHEGLLDQNGLRKTTQSKPILHMQMHTPTNSAFCDDKNVCESPVFVPELKKNSFGKFLTEEIPEFIPSRTGVTDSEKKAAEDHSKEDSSLTASEKQKETSIPDEEQKSETSSEDQTTEGECFEDSSDETIEDQDSESSSEDQSCSDSSYEASEELNSSEYQTSESSTDSDNDEECSEDESRLDYLKKSVKNLVNSTAYF
>negative139|0
MPRYKRFKFGLRESEQSLRLSERSLEKTYKRLVEKEQQLCDRLNQRSKKVKQRLKETELSLDEISAKLEERHRRLFDRLEKSVQKLELSETNLEECLKSYDYAITKLRKCG
>negative141|0
MAARPALARLDVSTVAAGAGKQQLVHARRRRRPAHAHRLPVPGAGSAIRLACSSPFLGGSNGSRSLKHNAAAGEKSAADHAAGALEDELIQKENSGGDAAAGASPPSSCDNHGAPQQIEVTADTNDGDKEKTNGPARDVHIKAKLLGYNLEPGSGPHYNHLGPV
>negative148|0
MCPANTHEFMTVFPKDISHEPVATCPITVAGAVSTIHVQQSKHGKGVGSRGPRCFLGGGYVAGTAWCRVQVGTPEGDKSRFPPDKDGLFVGGFMQTSSTIFVQANNWGLSVFGIKELSAVFGPKTLLSCCCFVLKTMSQYQAWCLGGVFGSEILVSLGSEKDELMMRAHRHILPHHTTVSRKRDRQPFFSRHDPRKRRRPLPETTEEERPTKRPAPPRTVVVMGLPQDCSVLDLKSRFEIYGAISRIRIDRDAVGYITYRTKDSADASIAADLDPSFGITVNSKKVRVLWATDPLAMWREGVGNSRDKSSMSKLVRAEVPLSKHGRGNRLSSAIGNTKRSEDSSGSSVLEVPFKGREIVAYDDIL
>negative150|0
MAIEELVVRTADIIDQQAMETLLIENHSSEAKDMEKMRSSMEILSRVDLDLAYSDEKLTNLDNLLMRVLACENEVEAISFENDEISPDSIEKVLTLDHLSAILNSEIRQLDNFIGTLQDLITEARQKMSTCREFSELHMIMENKFHDTEELLRQSKERVLEMKMQLAKLQMTSLAFDKNEWRHSMSLDLSDINNASTREFKPHMQTVEQRRILRMLEKSLERELDLEKKLTLLKQNEDDLKLKLRLTEQVALCMEEGAEVIWGRFLESENTAEVLMGISREMVSRLQVVNFNLNGSLQREKDFTSKIDNCIGQINAKDITIQNLNSCNKQLAAENAEVSSLRDRLKLLEDKLKESESKLLKANELNEASEERLKELECIVESQKEDIDIAEHKAESAEEKVAHLTETNLELHEELDFLKSSNESNAKKVSILEKKVRELELQLQNAKTSSEAGQEQQNMLYSAIWDMETLIDELKQKVLKAESKTEHAEEQCLILSEANLELTKEVEFLRSRVEGLETSLNQATVEKLASAKDINIKTSFIMDLVMQLAIERERVQKQMFCLAKENKSLVRTLESKKDQLSINVLVNGVDDIGLSSSRVASTIAAAKDSSCELLTESSLKSSQVINELTPDKTTDQSSKSSISTNDEPSMVVKSEETQVQLQTTQHQHKYIVMAIFVLLLSTLGLYIFDKRNNILDWSMVRL
>negative160|0
MGRSDSRSPARGRGSPRRRSPSRRERSPAHKRSSHAASSAVAEKPSRRARSRSPVPLSPARERPSSRNRSPKRRKSFSPASHSPIREKPTSCVRSPKRANSRSPDLKLLQGEKFSGRARSPRRAKLQSPEPRSPSSRTKRLRRAERDAEEKSREREPEKNHGRSSDRATHREKDSDRILPEPRSPSPLTKRLKRAERDAEEKLREREPEKNHGRSSDRAKHREKDSDRMLPESRSPSPRTKRLRRAEREAEEKSREREPGKNHGRSSDRATLRENDSDRVMQSERRERKSGKDSIDNGSYKSRNGRSASPSERQHRSRHRSRSPAAADSRAHPEATNSRRGEHRNGEDDSLSKMKEAEEALEAKNKDKPSFELSGKLAAETNRVRGITLLFNEPPDARKPDIRWRLYVFKGGEVLNEPLYVHRQSCYLFGRERRVADIPTDHPSCSKQHAVLQYRQVEKDNPDGTSSKQVRPYVMDLGSTNGTFINDNRIEPERYYELFEKDTLKFGNSSREYVLLRSMEVSFALIGLMLDAFTRRRSGFCYAKFTSASSRLEIWALLS
>negative163|0
MSIVPISIAVASDLDTGHSEADQMFDKNPQGELPLSQTDLSSDAKELLNECIHHGEDGEFKVSATSILEESFFNTRNDPKADDAYSNAHFIAPPIVVNLVADKSAGLPFPVTQMLGGNTKSIEIDFLDTLQPKESADKDGEWLRNIADIVFDKSLELKATKLQCQLANVAPLETAKYVSTEILSAHFDEPTWNEGMTSLFLHTSVAHEDDKIVEFLPYIFKQPSLTLVLDTFRDLAMNLKYVLHTCNCFDTGQHASNEVVSTLLTTKRRKDCHLYSAHRKGTRDIGLPICGYVDSCFATGQVLWACFNMIELYVSSLLRQVQLIGYPHVFQVLDIAVVSSRCKPSEWRADPVNSLQIQHFRVKCNGCKGFRPSRLLLLLLASDEMVDSTPAHSERHDFIDPGANFFIVTSRANADLYVWDLGINSASLALTGCIENVAALIFMGCTGKFCFITFSNSKTGVWDPGQPWFVNYYNSQSNFALSVSSLPKLTHFNIIDWTCMRSQDPSDIALNGSYSSDDTNNSFLPCSLKIPLFLRVANFSGELLLAKGDTTLLRNIIYQSTQVAILGDILELGPTEFTFNELMLQFCCDARFNVTAPIERTENINCAEEIKLLCFNDAHCPESKLPRRGVCSS
>negative180|0
MAAEMALVKPITKFNTINTTTARLSSRRLPFTVRMSAATTTPPTSKPSKKPQKQGIKESLLTPRFYTTDFDEMETLFNTEINKNLNEAEFEALLQEFKTDYNQTHFVRNKEFKEAADKMQGPLRQIFVEFLERSCTAEFSGFLLYKELGRRLKKTNPVVAEIFSLMSRDEARHAGFLNKGLSDFNLALDLGFLTKARKYTFFKPKFIFYATYLSEKIGYWRYITIYRHLKTNPEFICYPIFKYFENWCQDENRHGDFFSALMKAQPQFLNDWKAKLWSRFFCLSVINHLHLKYPFIMISQLEKALYIAFYIVLIIILLVLTKMQVYVTMYLNDCQRTDFYEGIGLNTKEFDMHVIIETNRTTARIFPAVLDVENPEFKRKLDRMVEINTKLIAVSESDEIPLVKNFKKIPLIAALASELLAAYLMKPIESGSVDFAEFEPQLTY
>negative191|0
MPFRRFCGRTILRAIRTLLRPDRIGNLQPHNSEVQRLYINKLPTFWSKPFTKKQPFLIHILQTHQSALEKLRQQLQMENQMATNLVRSTSLPTITHPLIASAEEQLQRLKSSEGTSSSSHSTICQKLDGLRKLYECVDDVLHLPLSQQVLSYEKHVKLMEQVSNSSLKVLDTCSITKDAFSQMKERVRLLESSLRRRRGGEFSLSNEVGVYIASNKKLNKVICKGFRNLHKKENDQTIVVENSELTSLVSLIKGVEDVTLMVLESTSSFISHTKMISKQNSWSLVSKLLKPKRVSCEGIDIEIEKINMELLLLLRNEKIDHSQILNAAKKLQVFESSIQELEDGLGAIFRLLLKNRVSLLNILSH
>negative193|0
MSVVPLMPSMSVVPAMLVVPVVPSTSVVPVFEWWPEVVPVIVWWLEVVPVLGVKEEPSITTKIMKLFGVIISEEEDAEEPPSKRPSLSSSSSSSNRRHRYECPYCIRKFKNSQGLGGHQNAHKRERLHLKRPELQPNTNPAVASFAPLLHKEATESENQAVFQSTATERLLSVASSVASGVRQPAGSLSCTTDGVGMSSVVGPQSNNHVNDGVTSFDHIDLHLSL
>negative195|0
MFLFYYLHHHSSSLVSMLISITISYNSTKWADEERYQTSMFQWLHGLLITSSSLSWIKLKERRKTGVKHLCMTHYNML
>negative203|0
MRRLLSGKKLQLVLDLDHTLIQTRQIKKTPHLKREKDIYRFYPIYLLSMISLQQLILPNLTKSNIMEQNFMWYDNESCNSITTSSLQCRHSSFMDNKCARCTQSIADCGNPVTIPFGYIHPHLSHTLEEIDRLRDENLQCLLERKKLHLVLDLDHTLVHATKIMQPMLGRESTESMNDFHEILDSRYMIKIRPGVHEFLEQVSSMFDLSIYTRSVREYAHKVVEVLNSGDGLSKFSWVITREDCLKNKRKGLDVVLSHERVVLIVDDKERVWEESGRENLIKIRPYFYIQEEEDMDEEDMDDELDRVLNVLKEVHNGFYDDNIEFGKLDYGKRDVREVLKRVLLSSKG
>negative209|0
MADSGKKQVGESSISQKGSKRKGQGQRYSVEATQKLEAFFKKCRFPDEDQRNQLAIEVGLDPDQIKGWFQNKRTQTKTKNERSDNQNFQNENEKFRREIIEMKEAMENNMRSKCDGPLIGEEERAGNIEKLKINMQRLREERKRIISNIISSYHEKSFVMDSNLAPPNSTLGSLTDSSDECLLRQTICGSPIGYNSSFHPENNNDNNNVRAHSINIKNIPIISQLEQENYGFHHDNNGEKSVIFEIVVPAINEMLGLVYVNEPLWVKSSVDEGWLIHRESYDRTFSNSNRPYKSTARIESSRSFGVVPMTAIDLIKNFRDPIKWMNMFPTIVTKARIVDVLDSGNTEMYEKLHILSPLVEARDFVFIRCCKQLDQTTWIMVDVSYDLFKEIKTCAPSYAWKFPSGCVIQDTGDGKSMVAWIEHVQIDEKCQVNHIFRDLLCGRQTYGAERWIVTLQRMSERYNFAMPVTCATTDDSQGGSVLYLTLYVFMIFQN
>negative213|0
MVFVVLLVAVGGGVGIGGLGGGISGFDISIGGIDGGVGCLGGLGASIGGISGGISGFCGSLGALGGGIGSLRGIVGGFGGSGGRLGGGGGGGLGGPSISTGLVDTIAEPTVKLIKKKLDGAIAIRRAIRQGLPNFEALHDQYTVTNSNASSGGITGGVVDIGGSHLDADTDASRDDEHGRQLANPNAYAVADKIVDLNFYNNFKDRYNDLKRQDETPATVGVVPQLIEIFLSLFITADMQFVAAYSLSVANDR
>negative214|0
MALTPRELAYASVNPLLSTTWLKFSVSPSTSLGHENLLTKYAKGSQTLPVTPLINLASKQLADEISFMVLTKITKFVESCCGSVVKNTKMLVPTYFHETQCQATTDTNINGLNITSIINKPTAAAIAYGLTNSPITVCKPLHDFLEKLAGGERVFLVGLNFGGAYISLAMELYPSKVSKAVFVAAENIQGHILFFSHRELIRFLLVFVANFSSSRSREEARDRESLEEAEGIFVLRYCGKRSVIAVNLGASEVALRDRVEAFAIAIKVASG
>negative216|0
MPSSSIQNKVPLFILFPQSHLYSIPPRVFGSTCFVHNLAPGKDKLALRALKYVFLGYSRVQKGYRCYSPDLRRYFMSIDVIFFESRPYYISSDHPDVSEILPVSPVLPAPSLVLPVSPDLPKPTFAKSTFTSTSLTAVPPLLTYHRRPRPLFVPDDSCHVSDAAPTADLPSPSQSIALQKGSVSGNGSEHSAENVEKSVEKLEPCTSHQVKAKAGRHKQNHSRSLGLLAAKLFDDEVPLRKKLKLFNRLATVQDDGTVQFEVPGDIKPEKLDFGTGVVYNGALGEAANDVDNVADIPELPPLQIVMLIVGTRGDVQPFVAIGKKFQENGHRVRLATHANFKEFVLGAGLEFYPLGGDPKVLAAYMVKNKGFLPSGPSEILIQRNQIKDIVFSLLPACIDPDPESNVPFKVNAIIANPPAYGHMHVAEALKVPLHIFFTMPWTFLLSYVGRPTSEFPHPLSRVKQSVGYRLSYQVVDGLIWLGIRDVINDFRKKKLKLRPVTYLSNSNSYHPDVPYGYIWSPHLVPKPKDWGPKIDVVGFCFLDLASNYEPRESLVKWLEDGEKPIYIGFGSLPVQEPEKMTKIIVQALEMTGQRGIINKGWGGLGNLKEPKDFVYLLDNCPHDWLFLQCAAVVHHGGAGTTAAGLKTACPTTVVPFFGDQTFWGERVHARGVGPAPIPVDEFSLEKLVAAIRFMLDPKVKERAVELAKAMEHEDGVTGAVKAFYKHFPRESLEPKPEVSPRPHHFFSLRRCFGRS
>negative217|0
MGSCLSSEAPATGDGPAAWRKRHHGSQEGAAGGGGKKLPGGAGEMTEDELAWVLERLCGKGASAVASLHTQQDRKGTNQDAMVVWEGGGTAEGADAGRGGDGLGQRSCGGDRRREDGADAACR
>negative222|0
MRSGEAAGAVRSSSDQGNSRKKPRFDAGEEEEEELARMPLADAFVGAGSSGDGDGAAAGAGGCAAAPSVELLDIVQHPLPGYGAPVALSFSPDDRRVAFLYSPDGTLHRNVYAFDPAQRRQELLFGPPDGGGLEEGNLSAEERLRRERARERGLGVTRYEWRARLPGTPASRAGIVVPLPSGVYFQDLSGAEPVLKLQSSATSPIIDPHLSPDGSMIAYVRDDELHTVGFSDGQTTQLTYGASESGKIHGLAEYIAQEEMERKMGFWWSPDSKHLAFTEVDSSEIPLYRIMHQGKSSVGPDAQEDHAYPFAGAANVKVRLGVVSSHGGEITWMDLLCGEPNSIHGDEEYLARVNWMHNSAIAVQVLNRTHSKLKLLKFDIASGKREVILEEEHDTWITLHDCFTPLDKGVNSKHPGGFIWASEKTGFRHLYLHDKNGVCLGPLTQGDWMIDQIAGVNESSGVIYFTGTLDGPLETNLYSTNLFPDWSLPLQVPKRLTHGTGRHSVILDHQLLRFIDVYDSIKSPPVILLCSLLDGSVIMPLYEQPLTVQPLKKFQQLSPEIVQIEGKDGTALYGTLYLPDEKKYGPPPYKTLVNVYGGPSVQLVSDSWISTVDMRAQFLRSKGILVWKMDNRGTARRGLQFEGQLKYNIGRVDAEDQLAGAEWLIKKGLAKPGHIGLYGWSYGGFLSAMCLARFPDTFSCAVSGAPVTAWDGYDTFYTEKYMGLPSEQRDAYRYGSIMHHVKNLRGRLLLIHGMIDENVHFRHTARLINSLMAEGKPYDILLFPDERHMPRRLGDRIYMEERIWDFVERNL
>negative231|0
MASEIGRGKRQASEIGRGMREASKIKSGGARKLMHTDVQLHVYGVPFSLNRGLLAARSSKLAALLKENDEDDISHLLGDIPTDSETFELVARFCHGFDIILSPDNIIKVLCLAHYLGMSEIHSTNNLTKKAGLYFQNNVLSSWNKTIKALKSAEIILQQAADLSLVDACAEFIIAKVLHNPSLLGEPMRNITTADDDSENDENVYKSNVKRRLFVHDWKSEDLTLLSIALYEPIIRAMVHREVPLEYVASSLFQYLSKWVFLDTKREDDDPSTYTRNSQREIIEAVERLLPQKRGLIPSSLLSKMLQSAIILDAHTECKNGLETRIGKQLDQATVKDLLIPAQGYAKEEQYDTVSVKRILKNFYSNYESTEKSGLVVVAELVDDFLAEVSSDIDLKLNTFLSLAELSQAATAGTNRNSDGIYRAIDIYLDRHRYLTDWEREEVCRVLDCSKMSPEACEHAATNEKLPVRVTVQILFSVQLKLKDNVTKRIKRGPDNRLLKLEEDEEDAKGTSNSEEEMMKAEMEMMGNKVLELEKECHMMRREIQRGSSQHKIQKEKTSMWKEMKRKIGCMTSSHETNCHVKKKKVHPR
>negative235|0
MTPYSLNPPGPSIQAGQNQLFNISPNNQDCRTFFNIFDPRQTSIEIGGLRENYRQDDKMILHDGSSSNCNSSFNISPETVVMVDPLSSACDRRNLPSEEESKNNDHGSGNKWMSSKMRLMKKMMRPSISPTTDKAINSSPRFQNHQGLESRRYSQRSPRNNNGSSTPRVCSDCNTSTTPLWRTGPKGPKSLCNACGIRQRKARRAMAEAANGLVTPIACEKTRLHNKEKKSRMNHFAQFKNKYKSTTTTTTTTVGSSEGVRKLEYFNNFAISLRSNNSDFEQMFPRDEVAEAALLLMDLSCGFVHL
>negative245|0
MVETEVAALIAPWLTPTSLFVLLNILLATIFLANNNNNNNHNNNSKNNLETHHHHQYHQSPDDGASLVPPPPQLVRAPSLFSRVASFNFSNYYSESHASDPATVTFTRAPSFLQRVTSFKFGRGNGTENEEEARDSGVTHHVRRENKSDSDSEDRKSEEKVVKVKKKVNKKVKKEVTAVKKEKEVVVPEVKETTSLFRKKDEVAEVKETTSLKREDEEGVDSKADDFIHKFKQQLKLQRLESLLRYRNN
>negative249|0
MCSPKNLCSIVRQKKQNMSGKSDFAQKLLYDLKLRKERVAAAQNTGYSYSTPRDGRANPGQTYRGSRQTKTLEYVNLKVGSTSNRSFRVEESSGQIVTYGTGRVRNSERVGDLSMALAFAIENGGKFTKMDSGSSRNPVLKFLQQFGQRSIDISKAERRTYHLPNGQFPSISHVHINEVSKGIQKLNQILRACSNGLNFNRNSIEVGQELLKGAMDLEESLRMLVNLQEASDHMIKPQNKNRIKLLDEDEDEDEDDDVKIVDQKQLDLPRFSFDKPSRNSYITKGTTKNNIKQQLMALTYPDQTPKLHEKQPLSTNKPVSHKRSASCAPDFKNLDQKNQSSGPKSGSEKGRISNVIAKLMGLDEIPQKEDKKASRKGSDPKKKQEPVLKRRSDPTGTRDAENMPSLSVDKNMISNMLSVQDAKHVRRADNARASPRRNSDMVSSGRPQQQEERSIAVNKDTGPVSSGLQTINNVMDKQHSKDIQLNQVPGHQVNFQQKQTEKNQTSVKGKITKTIEIKETISQLKKPQTSRVPLINIVLQEDTIQKEKKEIDKFPLSNEQKALHKDEVRQVQKLKKPEGQDEKHQAGKKEQLPSNKTMQVRTHKANQVETITSPKARNGAASSKTKQSSLKQSILGTKNSTKSKNGAPSKDSLDKIKQVALVKHRHSSTSAAIMQSSENKNANQNISPGAENLNSIDQSSKREKPINLPSTDRKGHHTKIHITETFPKIEKLPRKREDILQEESAPPKHLSPTLQDMKLQKDDKSCSQLTKNQAREAKADNIGSNDSEMSTERLNLQTELHCKIENSTSCNTTMEKEREVLIGSETMISNENHQEKTLQEVDISMDQKLGEDRPKISQAMDQFNGIHQEASLNSKLFNDEQNQCFPAKFTGKGGTKLANVVRHDQETTLPLVAQEPLTEPEKHFKESVIKNHLFLNTAEALFKLNIPISILHASDQNNQGEDVKLMIDCGYEIMKRKAIRQELALHPYATISIGYTKTRSLDGLIKQLCKDFDTLKSYGGNGNVSAECDEADYLHKMIGKDMQNGNPDVNSMWDFGWSTEQLMFGFLEKDDIVKDVEKHLLNGLIDEITMDLLRIAISV
>negative256|0
MLRTRLDGANPNDNTTSCWTVNYNQHCVSLLLSRSFCIFPLTQTSHVCPFLLPQEIDCYIWRGFEPHKRHRLLPLDIDAQKTDSCRLHLSSSTFVNGICFTPPNSREQELLSLKLHRNPVAAVILDYRVFCTPRVISGFWIGPDVEDGSGFVEAIVYQMFDA
>negative263|0
MCLILSPLLIYLGSFCFGKSNRFSAPEKIRGDFTTLKVETCEQMHHSPSLKIFFDLNPNKQIYEKVYVDHRLNIIREGKGLERRMRAFWCFSFADFSFYSADYITINQEKKICLEKKDLSL
>negative270|0
MTQILIFSPSPILSPFNEREQRWRKKVICCSRQHVWDESLLQSNGGFRKFRASNSTAGSYGGSAEISKGQPLPWISTWPNKQMLMPFSKVNSKFCISHVLVN
>negative272|0
MYLSSRTGSRLLFLVKASFNTFEKVATISSTLIYFVSHKLVSSRSCSHNFRNICGYIRHNCRKCIRFCFCTFSIKISLPLVIARRIEY
>negative274|0
MEDTGFSGVNGSCDYLVQSPDHQPLQLVPYHVGCSSHSVGCSSSGINLDRRRVGRKSSSKPKASKKKNGYHQPQGEKKQRLIWTPELHQCFVDIFNRTPSTKLFPRTILEQMSGKYPFVTRENIASHLQKHKKNLNKKDEEENSAKASSKSVPLKIPKESNSQSINPKTSNSNSTFPNSQLILQSYPNSTNPISSQVLLHSHPNSAIPSFDDKTWVQNSQGCGFSREQFEQFALEQIALEQLYGCLKRSINHQVDYQQQQHSFNNNVCESQENPNWALQIGQSSTAPISDLAVSTNLCPPPISANQISGNDGNGIGFHPIMDFNSFGGWNNTNNDPFLDTDLPNFAFGNCYDNPEPLDFRYPLNLNTINGFQTDSEQMYYPSINPMENDFTFQQCFDGYNHPNVVPQTFGLGYNEVHAKNNVELRNLDALADNDLQIFNGLNNDQISQGNECFPGPKGSVT
>negative285|0
MTSPASTSSWFSGIVRTASGNPPSSAMPPAGGVTSAPVSLPDTPAAVSGKGGVVAPVATTGVGARRKQLQGTLFKYGPKSAQGCWIQFRSSAACFAIPLAWEFFGSDVLDASRMVAFRTGDFTRQVIFIGGLADGLLATDYLEPLSLALEVEKWSLVQPLLSSSYTGYGISSLEQDALELDQLIGYLINKDNSEGVILLGHSTGCQDIIHYMQTNFACSKAVSGVILQAPVSDREYRATLPETGEMIDLAAKMISAGRGMDLMPREANSDAPITAYRFHSLCAYMGDDDMFSSDLSEDQLKQRLGHMSTTQCLVIFSMADEYVPEYVDKKALVDRLCRALGDSEKVEIKWGNHALSNRVQEAVEVIVDFVKREGPKGWDDPWS
>negative286|0
MVAVMAMVMVEMIHHHFLKWFIKNDFESSAPLPPEAKITSGSRASYLN
>negative289|0
MGIPLVAIRFWEKSPDHVSLEFLLALVHHPQGSLGSLLFLVYGIDEVSLFSHASFLDLAPKSASDSTIALDRVVDPVAMTISFRFNHRLRSCSRSSCYDQTWYCSRFAIIITGVDAAELGSAMGQAGEELVGALLCFFVDA
>negative290|0
MEFDVQLKKLQIGLSAKNTVSISPALSPSDPFLRHFTLLNAATFLLRSTLHLLLLLLLYTFQSTRSDFFYLESDETVDFSGCRNQRFEEKMLGFSNGSGGGGVGAEDLAQLQSTMRAIELACSYIQINSNPVAAEATILSLHQSPQPYKACRYILENSQVANARFQAAAAIRESAIREWSFLATDDKGGLISFCLGYVMQHANSSEGYVLSKVSSVAAQLMKRGWLEFTPAQKEVFFYQINQAILGSHGLDVQFIGVNFLESLVSEFSPSTSSAMGLPREFHENCRKSLEQNFLKSFYQWAQDAALSVTSKIIESHSSVPEVKVCNATLRLMHQILNWEFPYSKGGTRASINVFSDGIRPDNALSRKTECVIVQPGASWCDVLLSSSHVGWLINFYSSVRQKFDLEGYWLDCPVAVSARKLIVQLCSLAGEIFPSNNVQMRDQHLLLLLTGVLPWIDPPDVISKEIEEGRSGSEMIDGCRALLSIGTVTTPVVFDQLLRSLRPFGTLTLLSMLMGEVVKVLMANSTDEETWSYEARDILLDTWTTLLTSMDGSGGNAWLPPEGIHAAASLFSLIVESELKVASASATTEDDADCLASVSAMDERLGSYALIARAAVDATIPFLAKLFSDHVARLHQGRGTVDPTETLEEVYSLLLIIGHVLADEGEGETALVPDALQSHFVDVVEANNHPVVVLSSSIIKFAEQCLDAEMRSSIFSPRLMEAVIWFLARWSFTYLLLVEECNLGSNKLQSLPSRACLFTYFNEHNQGKFVLDIIVRISLTSLTSYPGEKDLQELTCFQLLHALVRRRNICFHLLSLDSWRNLANAFANDKTLFLLNSVSQRSLAQTLVLSAYGMRSSDASNQYVKDLMAHMTSSLVDLSNSSDLKNLAQQPDIIMLVSCVLERLRGAASATEPRTQRAIYEMGLSVMNPVLRLLEVYKHESAVIYLLLKFVVDWVDGQLSYLEAHETAVVINFCMSLLQIYSSHNIGKISLSLSSTLLNEAKTEKYKDLRALLQLLSHLCSKDMVDFSSDSIETQSTNISQVVYFGLHIITPLITLELLKYPKLCFDYFSLISHMLEVYPETLAQLNNDAFSHVLTTVDFGLHQQDVDIVTMCLRALKALASYHYKEKNAGNSGLGSHAAGHTDPNGVFHEGILSRFLRTLLHFLLFEDYSTDLVSTAADALFPLILCEPNLYQGLGNELIEKQANPNFKTRLANALQVLTTSNQLSSSLDRLNYQRFRKNLNNFLVEVRGFLKTR
>negative294|0
MAHANTGPSRGPNLRVQGGGEKQQSPTSSASSRPHPASPPQTLSPPASPLLHGGGGHHRTMRSSPAASGTTPSNMDSGSESDSAPEELTAVQGVEKHDEISKVEKDSAIRVSQQEKERRRRWAQRRTSSKPDKKEPLEVEDKDIKQKAENEEDEESEETHTMPGMLPTNVIEMLAAREKQTFSSDSEEEITNQKVQKRKKRLKSSGPETILLKDVRSTQHVKNALAFLEQRKNQVPRSNAVLKNANKALRLLSSKGNFLS
>negative299|0
MMILDDRNNKRSPFPFTASQWKELEQQLLTFKYLISGVPVPPQLFSNVITNPFASSSSTLFPYQQSLPFGWGCFQVGYGRKIDPEPGRCRRTDGKKWRCSKEAFPDSKYCEKHMHRGKNKTKKHIQVFSTTPPKTTTTNTIIPTTPSTTNKPYLSSPVISSSVSPSYINQLHSYDHTTSFYPFLSPQSSSFSTHFTESTQNVTAPHWLLDTQLYDDQPKKDMRYLKEMKGAVVNEKRSSQFQDTRYNKINKEERGERFGGVDRVDETQMLFHHFLGD
>negative300|0
MQRKPPMATRPSLRSSDGARPRSKSGAGRPPSSSRSPGPSRPSSAAAADKPVPSFLRPTVSSSMHSSSSSSSLPLPCASSSSAAGSKGAAAATARRSADKAPARPVGAPRPITPKDRASAQPVGAPRPITPKDRASAQPVGAPRPITLKDKAPAQPVGVPRPAKAKAPASASLWGAVSPRQLMQRASNAFKASSRSRSKKGKEEAAASASSGGKGGAGASARAKGQTSRAQHQQQQPETPAEPSPAVTPLELEAEEPVLLESDAAGQNGQDVATSQEAASTDITTVAVRCQEEQVGAGQPKEKAEAEMSDLQEERPQSSEVVETETGAHERTDDGSPAVVGEAAATPEGEDELATSAAEEKIVEEIQRAEATENSKANANPEEPDQETGVISEEPKEGTSVVSEEPKPKEAADPTTAQKREEASDEPKTAAGSNASAPTTPLTEAANYDGNDDVEATPKQVSASEPVTPVAEGISKGKEVMETQQQSASAPTTPASKQAAIPEDAASALAFRGRRVRTAMERRSEEEQPKRKEVARSNDVIEEAKSKLMEKRKSKVKALVGAFETVMDSPRAS
>positive1|1
MKNLDHIAASVDWEKESLPEYQDLIFLLFFALFFPVLRFILDRFVFEALAKRMIFGKKTVVNINGREERKKINKFKESAWKFVYFLSAELLALSVTCNEPWFTDSRYFWAGPGDVVWPNLKMKLKLKLLYMYAGGFYFYSIFATLYWETRRYDFAAQIIHHVTTVSLIVLSYVYGFARIGSVVLALHDGSDVFMEIAKMSKYSGFDLIADIFFSLFALVFTSLRIICYPFWIIRSTCYELLYVLDIQKERTTGIILYFVFNALLICLLVLHLFWFKIILRMVKNQILSRGHITDDVREDSESDDDHKD
>positive2|1
MAHASVASLMRTIESLLTFNSPMQSLSCDHREELCALREKVSSLEVFVKNFEKNNVFGEMTDFEVEVREVASAAEYTIQLRLTGTVLGENKSQKKKARRRFRQSLQQVAEDMDHIWKESTKIQDKGKQVSKESLVHDFSSSTNDILKVKNNMVGRDDQRKQLLEDLTRSYSGEPKVIPIVGMGGIGKTTLAKEVYNDESILCRFDVHAWATISQQHNKKEILLGLLHSTIKMDDRVKMIGEAELADMLQKSLKRKRYLIVLDDIWSCEVWDGVRRCFPTEDNAGSRILLTTRNDEVACYAGVENFSLRMSFMDQDESWSLFKSAAFSSEALPYEFETVGKQIADECHGLPLTIVVVAGLLKSKRTIEDWKTVAKDVKSFVTNDPDERCSRVLGLSYDHLTSDLKTCLLHFGIFPEDSDIPVKNLMRSWMAEGFLKLENDLEGEVEKCLQELVDRCLVLVSKRSRDGTKIRSCKVHDLIYDLCVREVQRENIFIMNDIVLDVSYPECSYLCMYKMQPFKRVTGDEINYCPYGLYRALLTPVNRQLRDHDNNNLLKRTHSVFSFHLEPLYYVLKSEVVHFKLLKVLELRHRQIDGFPREILSLIWLRYLSLFSYGNFDVPPEICRLWNLQTFIVQRFRSDIIIFAEEIWELMQLRHLKLPRFYLPDCPSGSVDKGRHLDFSNLQTISYLSPRCCTKEVIMGIQNVKKLGISGNKDDYKSFRDSGLPNNLVYLQQLEILSLISVDYSLLPVIISSAKAFPATLKKLKLERTYLSWSYLDIIAELPNLEVLKLMDDACCGEEWHPIVMGFNRLKLLLIKYSFLKFWKATNDNFPVLERLMIRSCKNLKEIPIEFADIHTLQLIELRECPPKLGESAARIQKEQEDLGNNPVDVRISNPLKESDSDSEEH
>positive18|1
MNFNNELSDLKNRFLFRTLRAQKCSDVARDRIDFFIWELKFLNCFLHLQSFAFASECGMLDISQKMIEICKRFNTPPPHNSFAYWKEVICKRLCAISIQPDASSDDGFACWKKVIWKTKQEFRAKYSFPKTLLADNKVYDDDDTNPKFVMEFIDAVVGNLNVLVKINDPSSLLFVPGPKEQIEQVLKELKLLRFFVCFVSNKCIEPQYQHTTFYTHALIEASHIAMVVWLNLPIYGNRNQDLASSEVSCLLSDFMEMKIKSIQPDISRNNIYIDVLRALKSTIPQAQDKHAAESGIVETPTHNLMVGLSDQMANLQEMLCLLRDNLIHLPILDLEFHLQDMDSVIVDAGLLIYSLYDIKGQKEDTTLEDINQALGFDLPRNIEPIKAMINLVMQKAFQCNLPRIHGLGYVDFLLKNLKDFQGRYSDSLDFLKNQLQVIQTEFESLQPFLKVVVEEPHNKLKTLNEDCATQIIRKAYEVEYVVDACINKEVPQWCIERWLLDIIEEITCIKAKIQEKNTVEDTMKTVIARTSSKLARTPRMNEEIVGFEDVIENLRKKLLNGTKGQDVISIHGMPGLGKTTLANSLYSDRSVFSQFDICAQCCVSQVYSYKDLILALLRDAIGEGSVRRELHANELADMLRKTLLPRRYLILVDDVWENSVWDDLRGCFPDVNNRSRIILTTRHHEVAKYASVHSDPLHLRMFDEVESWKLLEKKVFGEESCSPLLKNVGLRIAKMCGQLPLSIVLVAGILSEMEKEVECWEQVANNLGSYIHNDSRAIVDKSYHVLPCHLKSCFLYFGAFLEDRVIDISRLIRLWISEAFIKSSEGRRLEDIAEGYLENLIGRNLVMVTQRSISDGKAKECRLHDVLLDFCKERAAEENFLLWINRDQITKPSSCVYSHKQHAHLAFTEMHNLVEWSASCSFVGSVVLSNKYDSYFSTRDISSLHDFSISRILPNFKFLKVLDLEHRVFIDFIPTELVYLKYFSAHIEQNSIPSSISNLWNLETLILKSPIYALRCTLLLPSTVWDMVKLRHLYIPDFSTRIEAALLENSAKLYNLETLSTLYFSRVEDAELMLRKTPNLRKLICEVECLEYPPQYHVLNFPIRLEILKLYRSKFKTIPFCISAPNLKYLKLCGFSLDSQYLSETADHLKHLEVLILYKVEFGDHREWKVSNGKFPQLKILKLEYLSLVKWIVADDAFPNLEQLVLRGCQDLMEIPSCFMDILSLKYIGVEYCNESVVKSALNIQETQVEDYQNTNFKLVLIEFSLQKKAWKLNLTDAEDMHNAVKNILAEIR
>positive38|1
MAPAVSASQGVIMRSLTSKLDSLLLQPPEPPPPAQPSSLRKGERKKILLLRGDLRHLLDDYYLLVEPPSDTAPPPDSTAACWAKEVRELSYDVDDFLDELTTQLLHHRGGGDGSSTAGAKKMISSMIARLRGELNRRRWIADEVTLFSARVKEAIRRQESYHLGRRTSSSRPREEVDDDDREDSAGNERRRFLSLTFGMDDAAVHGQLVGRDISMQKLVRWLADGEPKLKVASIVGSGGVGKTTLATEFYRLHGRRLDAPFDCRAFVRTPRKPDMTKILTDMLSQLRPQHQHQSSDVWEVDRLLETIRTHLQDKRYFIIIEDLWASSMWDIVSRGLPDNNSCSRILITTEIEPVALACCGYNSEHIIKIDPLGDDVSSQLFFSGVVGQGNEFPGHLTEVSHDMIKKCGGLPLAITITARHFKSQLLDGMQQWNHIQKSLTTSNLKKNPTLQGMRQVLNLIYNNLPHCLKACLLYLSIYKEDYIIRKANLVRQWMAEGFINSIENKVMEEVAGNYFDELVGRGLVQPVDVNCKNEVLSCVVHHMVLNFIRCKSIEENFSITLDHSQTTVRHADKVRRLSLHFSNAHDTTPLAGLRLSQVRSMAFFGQVKCMPSIADYRLLRVLILCFWADQEKTSYDLTSIFELLQLRYLKITGNITVKLPEKIQGLQHLQTLEADARATAVLLDIVHTQCLLHLRLVLLDLLPHCHRYIFTSIPKWTGKLNNLRILNIAVMQISQDDLDTLKGLGSLTALSLLVRTAPAQRIVAANEGFGSLKYFMFVCTAPCMTFVEGAMPSVQRLNLRFNANEFKQYDSKETGLEHLVALAEISARIGGTDDDESNKTEVESALRTAIRKHPTPSTLMVDIQWVDWIFGAEGRDLDEDLAQQDDHGYGFFILFPGYNLQGLLSFFLSLPWLLSLPSMHLQPDLMIV
>positive40|1
MKRRRLFFSVLLSILTLFINGPLITTAQSPPSSSTSCNRICGGIEIPFPFGIGRRDCFLNDWYEVVCNSTTSGKSLAPFLYKINRELVSITLRSSIDSSYGVVHIKSPVTSSGCSQRPVKPLPLNLTGKGSPFFITDSNRLVSVGCDNRALITDIESQITGCESSCDGDKSRLDKICGGYTCCQAKIPADRPQVIGVDLESSGGNTTQGGNCKVAFLTNETYSPANVTEPEQFYTNGFTVIELGWYFDTSDSRLTNPVGCVNLTETGIYTSAPSCVCEYGNFSGFGYSNCYCNQIGYRGNPYLPGGCIDIDECEEGKGLSSCGELTCVNVPGSWRCELNGVGKIKPLFPGLVLGFPLLFLVLGIWGLIKFVKKRRKIIRKRMFFKRNGGLLLKQQLTTRGGNVQSSKIFSSKELEKATDNFNMNRVLGQGGQGTVYKGMLVDGRIVAVKRSKVLDEDKVEEFINEVGVLSQINHRNIVKLMGCCLETEVPILVYEHIPNGDLFKRLHHDSDDYTMTWDVRLRISVEIAGALAYLHSAASTPVYHRDVKTTNILLDEKYRAKVSDFGTSRSINVDQTHLTTLVAGTFGYLDPEYFQTSQFTDKSDVYSFGVVLVELITGEKPFSVMRPEENRGLVSHFNEAMKQNRVLDIVDSRIKEGCTLEQVLAVAKLARRCLSLKGKKRPNMREVSVELERIRSSPEDLELHIEEEDEEECAMEINMDDSWSVDMTAPASLFDLSPKLDVEPLVPQRTW
>positive47|1
MGGCFSVSLPCDQVVSQFSQLLCVRGSYIHNLSKNLASLQKAMRMLKARQYDVIRRLETEEFTGRQQRLSQVQVWLTSVLIIQNQFNDLLRSNEVELQRLCLCGFCSKDLKLSYRYGKRVIMMLKEVESLSSQGFFDVVSEATPFADVDEIPFQPTIVGQEIMLEKAWNRLMEDGSGILGLYGMGGVGKTTLLTKINNKFSKIDDRFDVVIWVVVSRSSTVRKIQRDIAEKVGLGGMEWSEKNDNQIAVDIHNVLRRRKFVLLLDDIWEKVNLKAVGVPYPSKDNGCKVAFTTRSRDVCGRMGVDDPMEVSCLQPEESWDLFQMKVGKNTLGSHPDIPGLARKVARKCRGLPLALNVIGEAMACKRTVHEWCHAIDVLTSSAIDFSGMEDEILHVLKYSYDNLNGELMKSCFLYCSLFPEDYLIDKEGLVDYWISEGFINEKEGRERNINQGYEIIGTLVRACLLLEEERNKSNVKMHDVVREMALWISSDLGKQKEKCIVRAGVGLREVPKVKDWNTVRKISLMNNEIEEIFDSHECAALTTLFLQKNDVVKISAEFFRCMPHLVVLDLSENQSLNELPEEISELASLRYFNLSYTCIHQLPVGLWTLKKLIHLNLEHMSSLGSILGISNLWNLRTLGLRDSRLLLDMSLVKELQLLEHLEVITLDISSSLVAEPLLCSQRLVECIKEVDFKYLKEESVRVLTLPTMGNLRKLGIKRCGMREIKIERTTSSSSRNKSPTTPCFSNLSRVFIAKCHGLKDLTWLLFAPNLTFLEVGFSKEVEDIISEEKAEEHSATIVPFRKLETLHLFELRGLKRIYAKALHFPCLKVIHVEKCEKLRKLPLDSKSGIAGEELVIYYGEREWIERVEWEDQATQLRFLPSSRWRWRET
>positive48|1
MISLPLLLFVLLFSALLLCPSSSDDDGDAAGDELALLSFKSSLLYQGGQSLASWNTSGHGQHCTWVGVVCGRRRRRHPHRVVKLLLRSSNLSGIISPSLGNLSFLRELDLGDNYLSGEIPPELSRLSRLQLLELSDNSIQGSIPAAIGACTKLTSLDLSHNQLRGMIPREIGASLKHLSNLYLYKNGLSGEIPSALGNLTSLQEFDLSFNRLSGAIPSSLGQLSSLLTMNLGQNNLSGMIPNSIWNLSSLRAFSVRENKLGGMIPTNAFKTLHLLEVIDMGTNRFHGKIPASVANASHLTVIQIYGNLFSGIITSGFGRLRNLTELYLWRNLFQTREQDDWGFISDLTNCSKLQTLNLGENNLGGVLPNSFSNLSTSLSFLALELNKITGSIPKDIGNLIGLQHLYLCNNNFRGSLPSSLGRLKNLGILLAYENNLSGSIPLAIGNLTELNILLLGTNKFSGWIPYTLSNLTNLLSLGLSTNNLSGPIPSELFNIQTLSIMINVSKNNLEGSIPQEIGHLKNLVEFHAESNRLSGKIPNTLGDCQLLRYLYLQNNLLSGSIPSALGQLKGLETLDLSSNNLSGQIPTSLADITMLHSLNLSFNSFVGEVPTIGAFAAASGISIQGNAKLCGGIPDLHLPRCCPLLENRKHFPVLPISVSLAAALAILSSLYLLITWHKRTKKGAPSRTSMKGHPLVSYSQLVKATDGFAPTNLLGSGSFGSVYKGKLNIQDHVAVKVLKLENPKALKSFTAECEALRNMRHRNLVKIVTICSSIDNRGNDFKAIVYDFMPNGSLEDWIHPETNDQADQRHLNLHRRVTILLDVACALDYLHRHGPEPVVHCDIKSSNVLLDSDMVAHVGDFGLARILVDGTSLIQQSTSSMGFIGTIGYAAPEYGVGLIASTHGDIYSYGILVLEIVTGKRPTDSTFRPDLGLRQYVELGLHGRVTDVVDTKLILDSENWLNSTNNSPCRRITECIVWLLRLGLSCSQELPSSRTPTGDIIDELNAIKQNLSGLFPVCEGGSLEF
>positive52|1
MDKWKYARLAQFLFTLSLLFLETSFGLGGNKTLCLDKERDALLEFKRGLTDSFDHLSTWGDEEDKQECCKWKGIECDRRTGHVTVIDLHNKFTCSAGASACFAPRLTGKLSPSLLELEYLNYLDLSVNEFERSEIPRFIGSLKRLEYLNLSASFFSGVIPIQFQNLTSLRTLDLGENNLIVKDLRWLSHLSSLEFLSLSSSNFQVNNWFQEITKVPSLKELDLSGCGLSKLAPSQADLANSSFISLSVLHLCCNEFSSSSEYSWVFNLTTSLTSIDLLYNQLSGQIDDRFGTLMYLEHLDLANNLKIEGGVPSSFGNLTRLRHLDMSNTQTVQWLPELFLRLSGSRKSLEVLGLNENSLFGSIVNATRFSSLKKLYLQKNMLNGSFMESAGQVSTLEYLDLSENQMRGALPDLALFPSLRELHLGSNQFRGRIPQGIGKLSQLRILDVSSNRLEGLPESMGQLSNLESFDASYNVLKGTITESHLSNLSSLVDLDLSFNSLALKTSFNWLPPFQLQVISLPSCNLGPSFPKWLQNQNNYTVLDISLASISDTLPSWFSSFPPDLKILNLSNNQISGRVSDLIENTYGYRVIDLSYNNFSGALPLVPTNVQIFYLHKNQFFGSISSICRSRTSPTSLDLSHNQFSGELPDCWMNMTSLAVLNLAYNNFSGEIPHSLGSLTNLKALYIRQNSLSGMLPSFSQCQGLQILDLGGNKLTGSIPGWIGTDLLNLRILSLRFNRLHGSIPSIICQLQFLQILDLSANGLSGKIPHCFNNFTLLYQDNNSGEPMEFIVQGFYGKFPRRYLYIGDLLVQWKNQESEYKNPLLYLKTIDLSSNELIGGVPKEIADMRGLKSLNLSRNELNGTVIEGIGQMRMLESLDMSRNQLSGVIPQDLANLTFLSVLDLSNNQLSGRIPSSTQLQSFDRSSYSDNAQLCGPPLQECPGYAPPSPLIDHGSNNNPQEHDEEEEFPSLEFYISMVLSFFVAFWGILGCLIVNSSWRNAYFKFLTDTTSWLDMISRVWFARLKKKLRRAR
>positive55|1
MMNQNCFNSCSPLTVDALEPKKSSCAAKCIQVNGPLIVGAGPSGLATAAVLKQYSVPYVIIERADCIASLWQHKTYDRLRLNVPRQYCELPGLPFPPDFPEYPTKNQFISYLVSYAKHFEIKPQLNESVNLAGYDETCGLWKVKTVSEINGSTSEYMCKWLIVATGENAEMIVPEFEGLQDFGGQVIHACEYKTGEYYTGENVLAVGCGNSGIDISLDLSQHNANPFMVVRSSVQGRNFPEEINIVPAIKKFTQGKVEFVNGQILEIDSVILATGYTSNVTSWLMESEFFSREGCPKSPFPNGWKGEDGLYAVGFTGIGLFGASIDATNVAQDIAKIWKEQM
>positive56|1
MKLLSKTFLILTLTFFFFGIALAKQSFEPEIEALKSFKNGISNDPLGVLSDWTIIGSLRHCNWTGITCDSTGHVVSVSLLEKQLEGVLSPAIANLTYLQVLDLTSNSFTGKIPAEIGKLTELNQLILYLNYFSGSIPSGIWELKNIFYLDLRNNLLSGDVPEEICKTSSLVLIGFDYNNLTGKIPECLGDLVHLQMFVAAGNHLTGSIPVSIGTLANLTDLDLSGNQLTGKIPRDFGNLLNLQSLVLTENLLEGDIPAEIGNCSSLVQLELYDNQLTGKIPAELGNLVQLQALRIYKNKLTSSIPSSLFRLTQLTHLGLSENHLVGPISEEIGFLESLEVLTLHSNNFTGEFPQSITNLRNLTVLTVGFNNISGELPADLGLLTNLRNLSAHDNLLTGPIPSSISNCTGLKLLDLSHNQMTGEIPRGFGRMNLTFISIGRNHFTGEIPDDIFNCSNLETLSVADNNLTGTLKPLIGKLQKLRILQVSYNSLTGPIPREIGNLKDLNILYLHSNGFTGRIPREMSNLTLLQGLRMYSNDLEGPIPEEMFDMKLLSVLDLSNNKFSGQIPALFSKLESLTYLSLQGNKFNGSIPASLKSLSLLNTFDISDNLLTGTIPGELLASLKNMQLYLNFSNNLLTGTIPKELGKLEMVQEIDLSNNLFSGSIPRSLQACKNVFTLDFSQNNLSGHIPDEVFQGMDMIISLNLSRNSFSGEIPQSFGNMTHLVSLDLSSNNLTGEIPESLANLSTLKHLKLASNNLKGHVPESGVFKNINASDLMGNTDLCGSKKPLKPCTIKQKSSHFSKRTRVILIILGSAAALLLVLLLVLILTCCKKKEKKIENSSESSLPDLDSALKLKRFEPKELEQATDSFNSANIIGSSSLSTVYKGQLEDGTVIAVKVLNLKEFSAESDKWFYTEAKTLSQLKHRNLVKILGFAWESGKTKALVLPFMENGNLEDTIHGSAAPIGSLLEKIDLCVHIASGIDYLHSGYGFPIVHCDLKPANILLDSDRVAHVSDFGTARILGFREDGSTTASTSAFEGTIGYLAPEFAYMRKVTTKADVFSFGIIMMELMTKQRPTSLNDEDSQDMTLRQLVEKSIGNGRKGMVRVLDMELGDSIVSLKQEEAIEDFLKLCLFCTSSRPEDRPDMNEILTHLMKLRGKANSFREDRNEDREV
>positive68|1
MASSSSSPSSRRYDVFPSFSGVDVRKTFLSHLIEALDRRSINTFMDHGIVRSCIIADALITAIREARISIVIFSENYASSTWCLNELVEIHKCYKKGEQMVIPVFYGVDPSHVRKQIGGFGDVFKKTCEDKPEDQKQRWVKALTDISNLAGEDLRNGPTEAFMVKKIANDVSNKLFPLPKGFGDFVGIEDHIKAIKSILCLESKEARIMVGIWGQSGIGKSTIGRALFSQLSSQFHHRAFITYKSTSGSDVSGMKLSWEKELLSEILGQKDIKIDHFGVVEQRLKHKKVLILLDDVDNLEFLKTLVGKAEWFGSGSRIIVITQDKQLLKAHEIDLVYEVELPSQGLALKMISQYAFGKDSPPDDFKELAFEVAELVGSLPLGLSVLGSSLKGRDKDEWVKMMPRLRNDSDDKIEETLRVGYDRLNKKNRELFKCIACFFNGFKVSNVKELLEDDVGLTMLADKSLIRITPDGDIEMHNLLEKLGREIDRAKSKGNPAKRQFLTNFEDIQEVVTEKTGTETVLGIRVPPTVLFSTRPLLVINEESFKGMRNLQYLEIGHWSEIGLWSEIGLWSKIDLPQGLVYLPLKLKLLKWNYCPLKSLPSTFKAEYLVNLIMKYSKLEKLWEGTLPLGSLKKMDLGCSNNLKEIPDLSLAINLEELNLSKCESLVTLPSSIQNAIKLRTLYCSGVLLIDLKSLEGMCNLEYLSVDWSSMEGTQGLIYLPRKLKRLWWDYCPVKRLPSNFKAEYLVELRMENSDLEKLWDGTQPLGSLKEMYLHGSKYLKEIPDLSLAINLERLYLFGCESLVTLPSSIQNATKLINLDMRDCKKLESFPTDLNLESLEYLNLTGCPNLRNFPAIKMGCSYFEILQDRNEIEVEDCFWNKNLPAGLDYLDCLMRCMPCEFRPEYLTFLDVSGCKHEKLWEGIQSLGSLKRMDLSESENLTEIPDLSKATNLKRLYLNGCKSLVTLPSTIGNLHRLVRLEMKECTGLELLPTDVNLSSLIILDLSGCSSLRTFPLISTRIECLYLENTAIEEVPCCIEDLTRLSVLLMYCCQRLKNISPNIFRLTSLMVADFTDCRGVIKALSDATVVATMEDHVSCVPLSENIEYTCERFWDELYERNSRSIFSYKDEDGDVYWVNWDLMMMLMLI
>positive70|1
MIAEVAAGGALGLALSVLHEAVKRAKDRSVTTRFILHRLEATIDSITPLVVQIDKFSEEMEDSTSRKVNKRLKLLLENAVSLVEENAELRRRNVRKKFRYMRDIKEFEAKLRWVVDVDVQVNQLADIKELKAKMSEISTKLDKIMPQPKFEIHIGWCSGKTNRAIRFTFCSDDS
>positive75|1
MDIVTGAISNLIPKLGELLTEEFKLHKGVKKNIEDLGKELESMNAALIKIGEVPREQLDSQDKLWADEVRELSYVIEDVVDKFLVQVDGIQSDDNNNKFKGLMKRTTELLKKVKHKHGIAHAIKDIQEQLQKVADRRDRNKVFVPHPTRPIAIDPCLRALYAEATELVGIYGKRDQDLMRLLSMEGDDASNKRLKKVSIVGFGGLGKTTLARAVYEKIKGDFDCRAFVPVGQNPDMKKVLRDILIDLGNPHSDLAMLDANQLIKKLHEFLENKRYLVIIDDIWDEKLWEGINFAFSNRNNLGSRLITTTRIVSVSNSCCSSDGDSVYQMEPLSVDDSRMLFSKRIFPDENGCINEFEQVSRDILKKCGGVPLAIITIASALAGDQKMKPKCEWDILLRSLGSGLTEDNSLEEMRRILSFSYSNLPSHLKTCLLYLCVYPEDSMISRDKLIWKWVAEGFVHHENQGNSLYLLGLNYFNQLINRSMIQPIYNYSGEAYACRVHDMVLDLICNLSYEAKFVNLLDGTGNSMSSQSNCRRLSLQKRNEDHQVRPFTDIKSMSRVRSITIFPSAIEVMPSLSRFDVLRVLDLSRCNLGENSSLQLNLKDVGHLTHLRYLGLEGTNISKLPAEIGKLQFLEVLDLGNNRNIKELPSTVCNFRRLIYLNLVGCQVVPPVGLLQNLTAIEVLRGILVSLNIIAQELGKLKSMRELEIRFNDGSLDLYEGFVKSLCNLHHIESLIIGCNSRETSSFEVMDLLGERWVPPVHLREFESSMPSQLSALRGWIKRDPSHLSNLSDLVLPVKEVQQDDVEIIGGLLALRRLWIKSNHQTQRLLVIPVDGFHCIVDFQLDCGSATQILFEPGALPRAESVVISLGVRVAKEDGNRGFDLGLQGNLLSLRRHVFVLIYCGGARVGEAKEAKAALRRAQEAHPDHLRIYIDMRPCIAEGAHDDDLCEGEEEN
>positive81|1
MAPCLVSASTGAMGSLLTKLETMLDDEYILLNVRRDIKFVIHELAMWQSFLLDVADTEEPGQHDKSCADLVRELSYDIEDKIDNSMSLMLHHACPKSGIKKHMSKFKNLLPVKIPYQIAKDIKDIKSQILEVSNRCERYRFEDVCLARTEFVDPRLCTVDTCAADLVGIDGPKHELVKWLRNGEDESVHQQKVVSIVGCAGLGKTTLAKQVYDELRINFEYRAFVSISRSPNMATILKCVLSQFHAQDYSSDESEIPKLVDQIRDLLQDKRYFVIIDDIWDMKTWDVLKCALCKNSCGSVIMTTTRIYDVAKSCCSSNGDLVYNIQPLSVADSEELFLNRVFGHEKGFPPELKEVSKDVLRKCGGLPLAINAISSLLAAEKIEEWDRVGLSNVFAQGEKSDIDAMKYKLSLCYFDLPLHLRSCLLYLIMFPEDCLIEKERLVHRWISEGFIRNEDGEDLVEVGERYLYELVNRSLIESVGVPYDGKARFYRVHNVILDFLMIKSMEENFCTLTSNQSRLDYKVRRLSLFANKDPSCIAQLDLSHARSLGASGHLGQLISSVKSNALRVLDVQDCSELGNHHVKDIGRNPLLRYLNISGTDVTELPIQIGDMGFLETLDASFTELVEMPGSITRLRQLQRLFVSDETKLPDEIGNMKRLQELGDINAFKQSVNFLNELGKLTGLRKLGIIWDTNDILKSGKGSSKEKRLVSSLSKLDAGRLSNLYVTFYLREKDGFIGHPFLPALNSIREVYLRRGRMCWMNKWLLSLANLEKLYISGGDEIEQDDLRTVGSIPTLVEFKLYSGCLGPIIISSGFEQLERLELKFSFSQLTFEVGAMPNLKKLDLHVYLSKFKSAGAGFDFGIQHLSSLACVSIVIFCEGVSAAYVEAAEGAFKSMVNAHPNPNRPMLEMTRESADFMSQDE
>positive88|1
MKTNFVILLLLLCVFAISPSQQEEINQHNPGIYHQKLLYKVQQWRTSLKESNSVELKLSLAAIVAGVLYFLAALISSACGIGSGGLFIPITTLVSRLDLKTGKRFLGQYLIWVILLLGQLHECKSCIEKERVALLDFKKYWMSITQESDLDYVFPTWNNDTKSDCCQWESIMCNPTSGRLIRLHVGASNLKENSLLNISLLHPFEEVRSLELSAGLNGFVDNVEGYKSLRKLKNLEILDLSYNNRFNNNILPFINAATSLTSLSLQNNSMEGPFPFEEIKDLTNLKLLDLSRNILKGPMQGLTHLKKLKALDLSNNVFSSIMELQVVCEMKNLWELDLRENKFVGQLPLCLGRLNKLRVLDLSSNQLNGNLPSTFNRLESLEYLSLLDNNFTGFFSFDPLANLTKLKVFKLSSTSDMLQIKTESEPKYQFQLSVVVIRVCSLEKIPSFLEYQKNLRLVDLSNNRLSGNLPTWLLANNPELKVLQLQDNLFTIFQMPATIVHELQFLDFSVNDISGLLPDNIGYALPNLLRMNGSRNGFQGHLPSSMGEMVNITSLDLSYNNFSGKLPRRFVTGCFSLKHLKLSHNNFSGHFLPRETSFTSLEELRVDSNSFTGKIGVGLLSSNTTLSVLDMSNNFLTGDIPSWMSNLSGLTILSISNNFLEGTIPPSLLAIGFLSLIDLSGNLLSGSLPSRVGGEFGIKLFLHDNMLTGPIPDTLLEKVQILDLRYNQLSGSIPQFVNTESIYILLMKGNNLTGSMSRQLCDLRNIRLLDLSDNKLNGFIPSCLYNLSFGPEDTNSYVGTAITKITPFKFYESTFVVEDFVVISSSFQEIEIKFSMKRRYDSYFGATEFNNDVLDYMYGMDLSSNELSGVIPAELGSLSKLRVMNLSCNFLSSSIPSSFSNLKDIESLDLSHNMLQGSIPQQLTNLSSLVVFDVSYNNLSGIIPQGRQFNTFDEKSYLGNPLLCGPPTNRSCDAKKTSDESENGGEEEDDEAPVDMLAFYFSSASTYVTTLIGIFILMCFDCPLRRAWLRIVDASIASVKSMLP
>positive91|1
MADWAMHHYLLLANQQRHRALADVAVRRRQLLLDSGRVFMLLGAVILMHMLTTTGGGASSGCTRGAEPCVALLLWLLGAALAMLSLVAGRFPVLAAAIAEELGDHLLGGLWSL
>positive98|1
MAEGVVGSLIVKLGDALASEAVEVAKSLLGLEGSALKRLFSEIREVKGELESIHAFLQAAERFKDADETTSAFVKQVRSLALSIEDVVDEFTYELGEGDGRMGMAVALKRMCKMGTWSRLAGNLQDIKVNLKNAAERRIRYDLKGVERGAKSMAGRRSSNWRSDSVLFKREDELVGIEKKRDLLMKWVKDEEQRRMVVSVWGMSGIGKTALVANVYNAIKADFDTCAWITVSQSYEADDLLRRTAQEFRKNDRKKDFPIDVDITNYRGLVETTRSYLENKRYVLVLDDVWNANVWFDSKDAFEDGNIGRIILTSRNYDVALLAHETHIINLQPLEKHHAWDLFCKEAFWKNEIRNCPPELQPWANNFVDKCNGLPIAIVCIGRLLSFQGSTYSDWEKVYKNLEMQLTNNSIMDMMNIILKISLEDLPHNIKNCFLYCSMFPENYVMKRKSLVRLWVAEGFIEETEHRTLEEVAEHYLTELVNRCLLLLVKRNEAGHVHEVQMHDILRVLALSKAHEQNFCIVVNHSRSTHLIGEARRLSIQRGDFAQLADHAPHLRSLLLFQSSPNVSSLQSLPKSMKLLSVLDLTDSSVDRLPKEVFGLFNLRFLGLRRTKISKLPSSIGRLKILLVLDAWKCKIVKLPLAITKLQKLTHLIVTSKAVVVSKQFVPSFDVPAPLRICSMTTLQTLLLMEASSQMVHHLGSLVELRTFRISKVRSCHCEQLFMAITNMIHLTRLGI
>positive101|1
MGTVLDALAWKFLEKLGQLIEDEVIMTLSVKRGIESLKKNLEFFNAVHEDAEALAMEDPGIDSWWKNMRDVMFDVDDIVDLFMVHSQKLLLPPRPVCCNQPLFSSFAKFSFDHMIAKRIDNINEKFEEIKMNKEMFGLERTNRQQIQITIVDRSQTSPVDELEVVGEDIRRAIDDMVKMIVSSNYNESRSTVFGIQGMGGIGKTTLAQKIYNEQRIREKFQVHIWLCISQNYTETSLLKQAIRMAGGICDQLETKTELLPLLVDTIRGKSVFLVLDDVWKSDVWIDLLRLPFLRGLNSHILVTSRNLDVLVEMHATYTHKVNKMNDCDGLELLMKMSLGPYEQSREFSGVGYQIVKKCDGLPLAIKVVAGVLSTKRTRAEWESIRDSKWSIHGLPRELGGPLYLSYSNLPPELKQCFLWCALLPSNFVIRRDAVAYWWVAEGFVTEVHGYSIHEVAEEYYHELIRRNLLQPRPEFVDKGESTMHDLLRSLGQFLTKDHSIFMNMEYSKALPNLRHLCISNDVEEIPAIEKQKCLRSLLVFDNKNFMKINKDIFRELKHIRVLVLSGTSIQIIPESVGNFLLLRLLDLSYTKIQKLPESIGKLTSLEYLSLHGCIHLDSLPDSLMRLSNISFLELEQTAIDHVPKGVAKLQQLYNLRGVFDSGTGFRLDELQCLSNIQRLRIVKLEKAAPGGSFVLKNCLHLRELWLGCTIGGHDKTYYQANEIERIQQVYELLIPSPSLLYIFLVGFPGVRFPDWLCSEPERKMPNLGHMHLNDCTSCSMLPPAGQMPELLVFKIKGADAIVNMGAELLGKGVNSAKHITIFPKLELLLITNMSNLESWSLNTWNLCGKSEQLVLMPCLKRLFLNDCPKLRALPEDLHRIANLRRIHIEGAHTLQEVDNLPSVLWLKVKNNRCLRRISNLCNLKDLLAQDCPALYQAENLISLKRLYMVDCHNAKQFRMSLLEDQQLAVHVVTVGADGRDIFPDESLYN
>positive111|1
MEGLARETNPSSHHQDFASCASDERPDEPELELASRRRQNGAGNNEHVSENMLLDSSKFGALKRREFFNNLLKNLEDDHPRFLRRQKERIDRVDVKLPAIEVRYNNLFVEAECRVTKGNHLPSLWNSTKGAFSGLVKLLGFETERAKTNVLEDVSGIIKPCRLTLLLGPPGCGKSTLLRALAGKLDKSLKVTGDISYNCYELHEFVPEKTAVYINQHDLHIAEMTVRETLDFSAQCQGVGRRPKILKEVNTRESVAGIIPDADIDLYMKVVAVEASERSLQTDYILKIMGLETCADTMVGDAMRRGISGGQKKRLTTAEMIVGPAKAYFMDEISNGLDSSTTFQIINCFQQLTNISEYTMVISLLQPTPEVFDLFDDLILMAEGKIIYHGPRNEALNFFEECGFKCPERKAAADFLQEILSRKDQEQYWLGPHESYRYISPHELSSMFKENHRGRKLHEQSVPPKSQFGKEALAFNKYSLRKLEMFKACGAREALLMKRNMFVYVFKTGQLAIIALVTMSVFLRTRMTISFTHANYYMGALFFSIFMIMLNGIPEMSMQIGRLPSFYKQKSYYFYSSWAYAIPASVLKVPVSILDSLVWISITYYGIGYTPTVSRFFCQFLILCLLHHSVTSQYRFIASYFQTPIVSFFYLFLALTVFLTFGGFILPKTSMPEWLNWGFWISPMAYAEISIVINEFLAPRWQKESIQNITIGNQILVNHGLYYSWHFYWISFGALLGSILLFYIAFGLALDYRTPTEEYHGSRPTKSLCQQQEKDSTIQNESDDQSNISKAKMTIPTMHLPITFHNLNYYIDTPPEMLKQGYPTRRLRLLNNITGALRPGVLSALMGVSGAGKTTLLDVLAGRKTGGYIEGDIRIGGYPKVQETFVRILGYCEQVDIHSPQLTVEESVTYSAWLRLPSHVDKQTRSKFVAEVLETVELDQIKDVLVGSPQKNGLSMEQRKRLTIAVELVSNPSIILMDEPTTGLDTRSAAIVIRAVKNICETGRTVVCTIHQPSTEIFEAFDELILMKTGGKTIYNGPIGERSCKVIEYFEKISGVPKIKSNCNPATWMMDVTSTSMEVQHNMDFAILYEESSLHREAEDLVEQLSIPLPNSENLRFSHSFAQNGWIQLKACLWKQNITYWRSPQYNLRRIMMTVISALIYGVLFWKHAKVLNNEQDMLSVFGAMYLGFTTIGAYNDQTIIPFSTTERIVMYREKFAGMYSSWSYSFAQAFIEIPYVFIQVVLYTLIVYPSTGYYWTAHKFLWFFYTTFCSILSYVYVGLLLVSITPNVQVATILASFFNTMQTLFSGFILPAPQIPKWWTWLYYLTPTSWALNALLTSQYGNIEKEVKAFGETKSVSIFLNDYFGFHQDKLSIVATVLVAFPFVLIILFSLSIEKLNFQKR
>positive112|1
MDILISVTAKIAEYTVEPVGRQLGYVFFIRSNFQKLKTQVEKLKITRESVQHKIHSARRNAEDIKPAVEEWLKKVDDFVRESDEILANEGGHGGLCSTYLVQRHKLSRKASKMVDEVLEMKNEGESFDMVSYKSVIPSVDCSLPKVPDFLDFESRKSIMEQIMDALSDGNVHRIGVYGMGGVGKTMLVKDILRKIVESKKPFDEVVTSTISQTPDFRSIQGQLADKLGLKFEQETIEGRATILRKRLKMERSILVVLDDVWEYIDLETIGIPSVEDHTGCKILFTTRIKHLISNQMCANKIFEIKVLGKDESWNLFKAMAGDIVDASDLKPIAIRIVRECAGLPIAITTVAKALRNKPSDIWNDALDQLKTVDVGMANIGEMEKKVYLSLKLSYDCLGYEEVKLLFLLCSMFPEDFSIDVEGLHVYAMGMGFLHGVDTVVKGRRRIKKLVDDLISSSLLQQYSEYGCNYVKMHDMVRDVALLIASKNEHVRTLSYVKRSNEEWEEEKLLGNHTAVFIDGLHYPLPKLTLPKVQLLRLVAKYCWEHNKRVSVVETFFEEMKELKGLVVENVNISLMQRPSDVYSLANIRVLRLERCQLLGSIDWIGELKKLEILDFSESNITQIPTTMSQLTQLKVLNLSSCEQLEVIPPNILSKLTKLEELDLETFDGWEGEEWYEGRKNASLSELKCLRHLYALNLTIQDEEIMPENLFLVGKLKLQKFNICIGCESKLKYTFAYKNRIKNFIGIKMESGRCLDDWIKNLLKRSDNVLLEGSVCSKVLHSELVGANNFVSLPNLEKLEIVNAKSLKMIWSNNVPILNSFSKLEEIKIYSCNNLQKVLFPPNMMDILTCLKVLEIKNCDLLEGIFEAQEPISVVESNNLPILNSFSKLEEIRIWSCNNLQKVLFPSNMMGILPCLKVLDIRGCELLEGIFEVQEPISVVESNSVPILNSFSKLEKIRIWSCNNLQKILFPSNMMGILTCLKVLEIRDCELLEGIFEVQEPISVVESNNLPILNSFSKLEEIRIGSCNNLQKVLFPPNMMGILTCLKVLEIRHCNLLEGIFEVQEPISIVEASPILLQNLSSLMLCNLPNLEYVWSKNPYELLSLENIKSLTIDKCPRLRREYSVKILKQLEDVSIDIKQLMKVIEKEKSAHHNMLESKQWETSSSSKDGVLRLGDGSKLFPNLKSLKLYGFVDYNSTHLPMEMLQILFQLVVFELEGAFLEEIFPSNILIPSYMVLRRLALSKLPKLKHLWSEECSQNNITSVLQHLISLRISECGRLSSLLSSIVCFTNLKHLRVYKCDGLTHLLNPSVATTLVQLESLTIEECKRMSSVIEGGSTEEDGNDEMVVFNNLQHLYIFNCSNLTSFYCGRCIIKFPCLRQVDIWNCSEMKVFSLGIVSTPRLKYENFSLKNDYDDERCHPKYPKDMLVEDMNVITREYWEDNVDTGIPNLFAEQSLEENRSENSSSSKNNVEKE
>positive116|1
MEVVTGAMSTLLPLLGDLLKEEYNLQKSTKGEIKFLKAELESMEAALIKISEAPLDQPPNIQVKLWARDVKDLSYEIEDGIDKFRVHLECRQQKKPHSFMGFIHKSMDMLTKGKIRHKIGIDIKDIKSRIKEVSDRRERYKVDSVAPKPTGTSTDTLRQLALFKKAEELIGTKEKSLDIVKMLTEGDEVFKKHLKMVSIVGFGGLGKTTLANVVYEKLRGDFDCAAFVSVSLNPDMKKLFKCLLHQLDKGEYKNIMDESAWSETQLISEIRDFLRDKRYFILIDDIWDKSVWNNIRCALIENECGSRVIATTRILDVAKEVGGVYQLKPLSTSDSGQLFYQRIFGIGDKRPPIQLAEVSEKILGKCGGVPLAIITLASMLAGKKEHENTYTYWYKVYQSMGSGLENNPGLMDMRRILHVSYYDLPPNLKTCLLYLSLYPEDYNIETKELIWKWIGEGFIHEEQGKSLYEVGEDYIAELINKSLVQPMYINIANKASSVRVHDMVLDLITSLSNEENFLATLGGQQTRSLPRKIRRLSLQSSNEEDVQPMPTMSSLSHVRSLTVFSKDLSLLSALSGFLVLRALDLSGCEEVGNHHMKDICNLFHLRYLSLEGTSITEIPKEISNLRLLQLLVIRSTKMKKFPSTFVQLGQLVFIDMGNREVSRLLLKSMSTLPSLSSLAIGIGELREEDLQILGSMPSLHDLSIDVGYWERGRDKRLVIDSGSPFRSLTRFSIKGCGFIDFMFAQGTLQKLQILELSIFGKAIKDRFGDFQFGLENLSSLEHVYVDARGRGIIPSQEAELSGALEKELDINPNKPTLTVKVTPR
>positive118|1
MAEAFIQVLLDNLTSFLKGELVLLFGFQDEFQRLSSMFSTIQAVLEDAQEKQLNNKPLENWLQKLNAATYEVDDILDEYKTKATRFSQSEYGRYHPKVIPFRHKVGKRMDQVMKKLKAIAEERKNFHLHEKIVERQAVRRETGSVLTEPQVYGRDKEKDEIVKILINNVSDAQHLSVLPILGMGGLGKTTLAQMVFNDQRVTEHFHSKIWICVSEDFDEKRLIKAIVESIEGRPLLGEMDLAPLQKKLQELLNGKRYLLVLDDVWNEDQQKWANLRAVLKVGASGASVLTTTRLEKVGSIMGTLQPYELSNLSQEDCWLLFMQRAFGHQEEINPNLVAIGKEIVKKSGGVPLAAKTLGGILCFKREERAWEHVRDSPIWNLPQDESSILPALRLSYHQLPLDLKQCFAYCAVFPKDAKMEKEKLISLWMAHGFLLSKGNMELEDVGDEVWKELYLRSFFQEIEVKDGKTYFKMHDLIHDLATSLFSANTSSSNIREINKHSYTHMMSIGFAEVVFFYTLPPLEKFISLRVLNLGDSTFNKLPSSIGDLVHLRYLNLYGSGMRSLPKQLCKLQNLQTLDLQYCTKLCCLPKETSKLGSLRNLLLDGSQSLTCMPPRIGSLTCLKTLGQFVVGRKKGYQLGELGNLNLYGSIKISHLERVKNDKDAKEANLSAKGNLHSLSMSWNNFGPHIYESEEVKVLEALKPHSNLTSLKIYGFRGIHLPEWMNHSVLKNIVSILISNFRNCSCLPPFGDLPCLESLELHWGSADVEYVEEVDIDVHSGFPTRIRFPSLRKLDIWDFGSLKGLLKKEGEEQFPVLEEMIIHECPFLTLSSNLRALTSLRICYNKVATSFPEEMFKNLANLKYLTISRCNNLKELPTSLASLNALKSLKIQLCCALESLPEEGLEGLSSLTELFVEHCNMLKCLPEGLQHLTTLTSLKIRGCPQLIKRCEKGIGEDWHKISHIPNVNIYI
>positive120|1
MAEVVLAGLRLAATPICVKLLCNASTCLGVDMTRELHELETIIIPQFELVIEAAEKGNHRAKLDRWLRELKQAFYNAEDLLDEHEYNILKCKAKHKDSLVKDSTQVHDSSISNILKQPMRAVSSRMSNLRPENRKILCQLNELKTMLEKAKEFRELIHLPAGNSLEGPSVPTIVVPVVTSLLPPRVFGRNMDRDRIIHLLTKPMATVSSSVGYSGLAIVAHGGAGKSTLAQCVYNDKRAQEHFDVRMWVCISRKLDVHRHTREIIESATNGECPRVDNLDTLQCRLKDIMQKSEKFLLVLDDVWFDESVNEREWDQLLDPLVSQQEGSRVLVTSRRDVLPAALHCKDVVHLENMEDAEFLALFKYHAFSGTEIRNPQLHARLEEVAEKIAKRLGQSPLAARTVGSQLSRNKDIAIWKSALNIENLSEPMKALLWSYNKLDSRLQRCFLYCSLFPKGHKYKIDEMVDLWVAEGLVDSRNQGDKRIEDIGRDYFNEMVSGSFFQPVSERYMGTWYIMHDLLHDLAESLTKEDCFRLEDDGVKEIPATVRHLSICVDSMKFHKQKICKLRYLRTVICIDPLMDDGDDIFNQLLKNLKKLRVLHLSFYNSSSLPECIGELKHLRYLSIISTLISELPRSLCTLFHLELLHLNDKVKNLPDRLCNLRKLRRLEAYDDRNRMYKLYRAALPQIPYIGKLSLLQDIDGFCVQKQKGYELRQLRDMNKLGGNLRVVNLENVTGKDEASESKLHQKTHLRGLHLSWNDVDDMDVSHLEILEGLRPPSQLEDLTIEGYKSTMYPSWLLDGSYFENLESFTLANCCVIGSLPPNTEIFRHCMTLTLENVPNMKTLPFLPEGLTSLSIEGCPLLVFTTNNDELEHHDYRESITRANNLETQLVLIWEANSDSDIRSTLSSEHSSMKKLTELMDTDMSGNLQTIESALEIERDEALVKEDIIKVWLCCHEERMRFIYSRKAGLPLVLPSGLCVLSLSSCSITDGALAICLGGLTSLRNLFLTEIMTLTTLPPEEVFQHLGNLRYLVIRSCWCLRSFGGLRSATSLSEIRLFSCPSLQLARGAEFMQMSLEKLCVYNCVLSADFFCGDWPHLDDILLSGCRSSSSLHVGDLTSLESFSLYHFPDLCTLEGLSSLQLHHVHLIDVPKLTTESISQFRVQRSLYISSSVMLNHMLSAEGFVVPEFLSLESCKEPSVSFEESANFTSVKCLRLCNCEMRSPPGNMKCLSSLTKLDIYDCPNISSIPDLPSSLQHICIWGCELLKESCRAPEGESWPKIAHIRWKEFR
>positive125|1
MSFDSFISFRGEDTRNTFTGHLYKELVGLGITTFMDDKKLLIGDSLSEKLIKAIENSDSFIVVLSENYASSKWCLRELAKIIDCTDEQKHRVLLPVFYHVNPRDVRRQSGCFENSFRLHEELLRELDHMERDKYMEEVQQWRRAFTKVGDLTGVVVTKDCVEVDSIGKITNQLLDMLLHHQKLVPWDELTKLVDIERQLFKMEKLNDLEPNVVRFIGIIGMGGIGKTTIAEVFYDRVARFFGKNRCFLRIYEHTTLLSLQQQLLSQLLQTKDLIINNENEGARMIGSRLKDKRVLIVLDGVKEKTQLEQLVGNPNWFGSGSKIIITTRNRDVLRQPNYKDKMVEYSVEFLDTKSAMTLFCKHAFGCGFPSKNFEDFSKEIVERVEGHPQALIQIGSSLYDKGIEIWKEELKSLEEDYNNRIFKTLKISFDDLGKTSQEVFLDLACFFNEKTKEKVIEILKSFDYRPHSEIQLLQDRCLIEVRSDNTILMPKCIQTMGQQIEREADKRSRIWLPKDAQDVFDEPHQRVKDIKGVVLKLEEKQDEIELEGKVFEDMRSLKILEIGNVEVSGDFTHLSKQLRLLNWHSYPSQCLPLSFESRYLFQLLLPLSQTRQLWNGQKVGFEKLKVINVSGSKNLRETPNFTKVPNLESLDLSNCTRLWKIDSSIIRLNRLTLLDITCCITLKNLPFSRSCKSLITINYVGSGLEEKGTCNFHYGK
>positive127|1
MRRCGYSLGLGEPNLDGKPNLDYDAVCRPSELHALKKGALDYIQNSENQILFTIHQIFESWIFSSKKLLDRISERISKEEFTKAADDCWILEKIWKLLEEIENLHLLMDPDDFLHLKTQLRMKTVADSETFCFRSKGLIEVTKLSKDLRHKVPKILGVEVDPMGGPVIQESAMELYREKRRYEKIHLLQAFQGVESAVKGFFFNYKQLLVIMMGSLEAKANFAVIGGSTESSDLLAQLFLEPTYYPSLDGAKTFIGDCWEHDQAVGSGLDCRHHRKNRTAKQ
>positive129|1
MERRLMIPCFFWLILVLDLVLRVSGNAEGDALSALKNSLADPNKVLQSWDATLVTPCTWFHVTCNSDNSVTRVDLGNANLSGQLVMQLGQLPNLQYLELYSNNITGTIPEQLGNLTELVSLDLYLNNLSGPIPSTLGRLKKLRFLRLNNNSLSGEIPRSLTAVLTLQVLDLSNNPLTGDIPVNGSFSLFTPISFANTKLTPLPASPPPPISPTPPSPAGSNRITGAIAGGVAAGAALLFAVPAIALAWWRRKKPQDHFFDVPAEEDPEVHLGQLKRFSLRELQVASDNFSNKNILGRGGFGKVYKGRLADGTLVAVKRLKEERTQGGELQFQTEVEMISMAVHRNLLRLRGFCMTPTERLLVYPYMANGSVASCLRERPESQPPLDWPKRQRIALGSARGLAYLHDHCDPKIIHRDVKAANILLDEEFEAVVGDFGLAKLMDYKDTHVTTAVRGTIGHIAPEYLSTGKSSEKTDVFGYGVMLLELITGQRAFDLARLANDDDVMLLDWVKGLLKEKKLEALVDVDLQGNYKDEEVEQLIQVALLCTQSSPMERPKMSEVVRMLEGDGLAERWEEWQKEEMFRQDFNYPTHHPAVSGWIIGDSTSQIENEYPSGPR
>positive131|1
MAASFCGSRRYDVFPSFSKVDVRRSFLAHLLKELDRRLINTFTDHGMERNLPIDAELLSAIAESRISIVIFSKNYASSTWCLDELVEIHTCYKELAQIVVPVFFNVHPSQVKKQTGEFGKVFGKTCKGKPENRKLRWMQALAAVANIAGYDLQNWPDEAVMIEMVADDVSKKLFKSSNDFSDIVGIEAHLEAMSSILRLKSEKARMVGISGPSGIGKTTIAKALFSKLSPQFHLRAFVTYKRTNQDDYDMKLCWIEKFLSEILGQKDLKVLDLGAVEQSLMHKKVLIILDDVDDLELLKTLVGQTGWFGFGSRIVVITQDRQLLKAHDINLIYEVAFPSAHLALEIFCQSAFGKIYPPSDFRELSVEFAYLAGNLPLDLRVLGLAMKGKHREEWIEMLPRLRNDLDGKFKKTLRNYLPVIRKRVSNEEGGREKLKKGNKKLDLDEEFPGGEIYSDEIPSPTSNWKDTDDFDSGDIIPIIADKSTTIIPNRRHSNDDWCSFCEFLRNRIPPLNPFKCSANDVIDFLRTRQVLGSTEALVDRLIFSSEAFGIKPEENPFRSQAVTSYLKAARDMTREKECILVFSCHDNLDVDETSFIEAISKELHKQGFIPLTYNLLGRENLDEEMLYGSRVGIMILSSSYVSSRQSLDHLVAVMEHWKTTDLVIIPIYFKVRLSDICGLKGRFEAAFLQLHMSLQEDRVQKWKAAMSEIVSIGGHEWTKGSQFILAEEVVRNASLRLYLKSSKNLLGILALLNHSQSTDVEIMGIWGIAGIGKTSIAREIFELHAPHYDFCYFLQDFHLMCQMKRPRQLREDFISKLFGEEKGLGASDVKPSFMRDWFHKKTILLVLDDVSNARDAEAVIGGFGWFSHGHRIILTSRSKQVLVQCKVKKPYEIQKLSDFESFRLCKQYLDGENPVISELISCSSGIPLALKLLVSSVSKQYITNMKDHLQSLRKDPPTQIQEAFRRSFDGLDENEKNIFLDLACFFRGQSKDYAVLLLDACGFFTYMGICELIDESLISLVDNKIEMPIPFQDMGRIIVHEEDEDPCERSRLWDSKDIVDVLTNNSGTEAIEGIFLDASDLTCELSPTVFGKMYNLRLLKFYCSTSGNQCKLTLPHGLDTLPDELSLLHWENYPLVYLPQKFNPVNLVELNMPYSNMEKLWEGKKNLEKLKNIKLSHSRELTDILMLSEALNLEHIDLEGCTSLIDVSMSIPCCGKLVSLNMKDCSRLRSLPSMVDLTTLKLLNLSGCSEFEDIQDFAPNLEEIYLAGTSIRELPLSIRNLTELVTLDLENCERLQEMPSLPVEIIRRT
>positive135|1
MAEIAVLLVLKKIAIALAGETLSFAKPLLAKKSESVAALPDDMKLISNELELIRAFLKEIGRKGWKSEVIETWIGQVRRLAYDMEDTVDHFIYVVGTHDQMGSCWDYMKKIAKKPRRLVSLDEIASEIKKIKQELKQLSESRDRWTKPLDGGSGIPAGSYETEKEMYLPGHDYTISDEELAGIDENKQTLISSLKFEDPSLRIIAVWGMGGVGKSTLVNNVYKNEGSNFDCRAWVSISQSYRLEDIWKKMLTDLIGKDKIEFDLGTMDSAELREQLTKTLDKRQYLIILDDVWMANVFFKIKEVLVDNGLGSRVIITTRIEEVASLAKGSCKIKVEPLGVDDSWHVFCRKAFLKDENHICPPELRQCGINIVEKCDGLPLALVAIGSILSLRPKNVDEWKLFYDQLIWELHNNENLNRVEKIMNLSYKYLPDYLKNCFLYCAMFPEDYLIHRKRLIRLWIAEGFIEQKGACSLEDTAESYLKELIRRSMLHVAERNCFGRIKCIRMHDLVRELAIFQSKREGFSTTYGGNNEAVLVGSYSRRVAVLQCSKGIPSTIDPSRLRTLITFDTSRALSVWYSSISSKPKYLAVLDLSSLPIETIPNSIGELFNLRLLCLNKTKVKELPKSITKLQNLQTMSLENGELVKFPQGFSKLKKLRHLMVSRLQDVTFSGFKSWEAVEPFKGLWTLIELQTLYAITASEVLVAKLGNLSQLRRLIICDVRSNLCAQLCGSLSKLCQLSRLTIRACNEDEVLQLDHLTFPNPLQTLSLDGRLSEGTFKSPFFLNHGNGLLRLMLFYSQLSENPVPHLSELSNLTRLSLIKAYTGQELYFQAGWFLNLKELYLKNLSRLNQIDIQEGALASLERITMKHLPELREVPVGFRFLKSLKTIFFSDMHPEFESSFQKEM
>positive140|1
MGSKYSKATNSINDASNLSYGVPFENYRVPFVDLEEATNNFDDNFFIGEGGFGKVYRGVLRDGTKVALKKHKPESSQGIEEFETEIEILSFCSHPHLVSLIGFCDERNEMILIYDYMENGNLKSHLYGSDLPSMSWEQRLEICIGAARGLHYLHKNAVIHRDVKCTNILLDENFVPKITDFGISKTMPELDQTHLSTVVRGNIGYIAPEYALWGQLTEKSDVYSFGVVLFEVLCARPALDRSEIMSLDDETQKMGQLEQIVDPTIAAKIRPESLRMFGETAIKCLAPSSKNRPSMGDVLWKLEYALCLQEPTIQDDPE
>positive149|1
MAAAEMERTMSFDAAEKLKAADGGGGEVDDELEEGEIVEESNDTASYLGKEITVKHPLEHSWTFWFDNPTTKSRQTAWGSSLRNVYTFSTVEDFWGAYNNIHHPSKLIMGADFHCFKHKIEPKWEDPVCANGGTWKMSFSKGKSDTSWLYTLLAMIGHQFDHGDEICGAVVSVRAKGEKIALWTKNAANETAQVSIGKQWKQFLDYSDSVGFIFHDDAKRLDRNAKNRYTV
>positive152|1
MELPRNKVAGSNQGNENLKAKAKWGSDIKKFSEHQIKRITKNYSTHVGKGAFGEVFRGFLDDGSPVAVKKYIHQNMKEWFDKEITIHCQVNHKNIVKLLGYCSEENALMMLTEYIPRGNLKDLLHGSDDPISFEARLCIAIDCAEALAFMHSMSPPIIHGDIKPDNILLDDNLGAKLADFGISRLLSMDNTHFTMNVIGSRGYMDPEHIETGRVDPKIDVYSFGVVLVELVTRDMASQNGICNGLARNFIGASLTKNNFFSEAFGKQKKAREMFDIQIANMSNMEVLDKFGELAVECLRRDIKKRPEMNHVLERLRMLGKDHEKGQDRVKEHGVLPPFSSSQPKGRLETGRKSSSSDHERLFSQEVVEQEEKNQKYFTWRTSIANGPPESFYDWIRGNDLEIPNQRSPDEVFSRGRWRLLTCQNGLRIFEVLEPAVYLARAIGKAMKAVGVIDASSEAIFQLVMSMDDTRHKWDCSYKYGSLVEEVDGHTAILYHRLRLDWFLTFVWPRDLCYVRHWRRYYDGSYVVLFQSREHPNCGPQPGFVRAHVEIGGFRISPLKSHEGRPRTQVQYLMQIDLKGWGVGYLSSFQQHCVLRMLNTIAELREWFSRSDDRPISAKASLTMDQSKCTTILEEEFDEDEWLSQSDESQ
>negative212|0
MGSQAGREASIFDSAVKGDTTILDRANIAKYGESSYLSHTPSLGNNIIHIAARRGKTTFVDAAIRLFPDLLWQKNNNDNTVVHEAAAEMGTKDCVRLLISYLKSSMSSSTTSSVSDHNRSNATAIAVPFLLERNCDGETPFHVALRCRNLAAAEELFADVKTHQQILLIKNNSGETPLHLYARYCAGGTFPKDLEHDDNHNVVDKPKFIDALIISNSAAIFEQDSDGFIPVMRAAQYGRVFAVLRMLLSYKQSTECRDLKGMNVLHHLRLRIADLNARDKDYTFWICEKILELHGVDTLIFLQDKDGNTPLHLAIMDRDYDIAELILKRYVEAYLKERKELIVSIKNKEGKTVLDLITFVPDIPITLRKLMEQSSMGCMMPEELYEAAKMGYVGVFGPQTSEVSNLSQETGSTTKNTQDDEYFLSQDVDGRNILHIALEHEREVFITTAIKSFPNLMYHKDSNGDTPLHIAARLRSNHSAYELLRESFHYWYANNRQSYDVTLRVPPWKVTNSRSNTPLHEAARTSNYSSLYTFIANCITPEIRSEAMSDVNEDGETVLHLIARYGQKLDAVLDHEHQHSLSVTEVLRESITSVYMRDRDGLTPILRAAHCGQIWVVARIGKMYPKSIYISDYKGKTVLHHLCSSALDVVNELEIVLQLWEQVSETFPNGDDLMFSQDHDGNTPLHLAIMAQNFRVVHYFIRYYIKVKSSLRNELMGLMNNDGKSIMNLITCGSDIPPNLKKLIKQASMETPMEDDIYEAATKGDINVINEHNRLQCQAYDGSNILHIAVRHRKLSLLMAVVEKNYLYHLIYWEDSKGDTPLHVAAGIQAPEALQFVEQCVKKWKPFYGLLPWTVRNMKGNTVLHEAARFCNYEVILSALKYATDINDVKLNQKIIKDVNDDGETVLHIIARYATYKAKDFLEKVSRELKSLVYMRDYEGFTPVLRAVQCGRLGVARLLVQRYPQSVEIADNKGRTILHHMRSLVVDLVDDLKDIVPVWKDILNYPEADNLRNAQDEDGNTPLHLSIADANVAKAKFLIERCLESTNKQELDINNNDGHTVYDLLSSRSHIPATKELQQLVSRSYVKERALVYDMMEDDLYNAAVKGDVQIFVKVPMDLESQPPTSVEAYFCRQTPGGSNIIHIALRHGSPKVKQFVKTALTHYPILSMRPDNNGDTPLHLAAKWKTGLSSVEVLIEASKCFINELGESDKAFYVAPWKVKNYKGNFPIHEALQSNNLKAAENLLGCDVEATSRVNDLGETPLHAFAKNGFAINNKKEAEKFVEKLIIAKTKEDESSNTASYIRDDEGLNQNAYNSAYIQDDEGLSQNAYNSAYIRDDEGLTPLLRAARSGRLEVVRAILTHCPQSAYLRDPWGRTFLHLLRFTGEDIDESLDGNFQKTGKELFVLREADSQRLVQDYEGNTPLHYAIKTQNSIAAIVLTQRCLEDEEHKELGLVNRDGQTVLDLLALHDVPSEIIQQIRKKLPKEVYLARSSYGIRNTETNQSANALSVVAALLATITFAAGLQVPGGFDSDDGSPVLLKNAVFAAFMVANTIAMCCSMLCLFLLLWVGIRKSHGSLMILDISIILLEFSFYLTMLAFTLGVFVVTLQKSLWLAVLVCVLSFITFLLTWKCSIKLVIKFVESVVAAGKKLASCCASKSADTEGACVGDNRTEK
>positive29|1
MAEILLTSVINKSVEIAGNLLIQEGKRLYWLKEDIDWLQREMRHIRSYVDNAKAKEAGGDSRVKNLLKDIQELAGDVEDLLDDFLPKIQQSNKFNYCLKRSSFADEFAMEIEKIKRRVVDIDRIRKTYNIIDTDNNNDDCVLLDRRRLFLHADETEIIGLDDDFNMLQAKLLNQDLHYGVVSIVGMPGLGKTTLAKKLYRLIRDQFECSGLVYVSQQPRASEILLDIAKQIGLTEQKMKENLEDNLRSLLKIKRYVFLLDDIWDVEIWDDLKLVLPECDSKVGSRIIITSRNSNVGRYIGGESSLHALQPLESEKSFELFTKKIFNFDDNNSWANASPDLVNIGRNIVGRCGGIPLAIVVTAGMLRARERTEHAWNRVLESMGHKVQDGCAKVLALSYNDLPIASRPCFLYFGLYPEDHEIRAFDLINMWIAEKFIVVNSGNRREAEDLAEDVLNDLVSRNLIQLAKRTYNGRISSCRIHDLLHSLCVDLAKESNFFHTAHDAFGDPGNVARLRRITFYSDNVMIEFFRSNPKLEKLRVLFCFAKDPSIFSHMAYFDFKLLHTLVVVMSQSFQAYVTIPSKFGNMTCLRYLRLEGNICGKLPNSIVKLTRLETIDIDRRSLIQPPSGVWESKHLRHLCYRDYGQACNSCFSISSFYPNIYSLHPNNLQTLMWIPDKFFEPRLLHRLINLRKLGILGVSNSTVKMLSIFSPVLKALEVLKLSFSSDPSEQIKLSSYPHIAKLHLNVNRTMALNSQSFPPNLIKLTLANFTVDRYILAVLKTFPKLRKLKMFICKYNEEKMDLSGEANGYSFPQLEVLHIHSPNGLSEVTCTDDVSMPKLKKLLLTGFHCRISLSERLKKLSK
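The records above use headers of the form `>name|label`, where the label appears to be `1` for the positive (R protein) entries and `0` for the negative ones. A minimal sketch of reading that layout follows; the helper name `parse_labeled_fasta` is hypothetical and is not the `fasta` function that the prediction script imports from `prediction.get_feature`.

```python
def parse_labeled_fasta(text):
    """Parse FASTA text with '>name|label' headers into (name, label, sequence) tuples.

    Assumption: the part after '|' is an integer class label (1 or 0 in this dataset).
    """
    records = []
    name, label, chunks = None, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, label, "".join(chunks)))
            name, _, raw_label = line[1:].partition("|")
            label = int(raw_label) if raw_label else None
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if name is not None:
        records.append((name, label, "".join(chunks)))
    return records
```

Sequence lines that appear before any header are skipped in this sketch rather than raising an error.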
import joblib
import pandas as pd
from prediction.get_feature import fasta,GnerateFeatures
import argparse


def predict(inputfasta,outfile):
    # Parse the input FASTA file and compute the selected embedding features.
    seq_dict,id_list,seq_list = fasta(inputfasta)
    feature_sel= GnerateFeatures(seq_dict)

    # Load the saved LightGBM model, then refit it on the stored training
    # features and labels before predicting (the first CSV column is dropped).
    model = joblib.load(open('./prediction/BiLSTM_unirep_model_lgb.pkl', 'rb'))
    x=pd.read_csv("./prediction/BiLSTM_unirep_lgbm_feature.csv")
    x2=x.iloc[:,1:]
    y=pd.read_csv("./prediction/train_label.csv")
    y2=y.iloc[:,1:]
    model.fit(x2,y2)

    # Column 1 of predict_proba holds the probability of the second class.
    pred_proba = model.predict_proba(feature_sel)
    id3=[]
    for i in range(len(id_list)):
        id2 = []
        id2.append(str(id_list[i]))
        id2.append(str(round(float(pred_proba[:,1][i]),3)))
        id3.append(id2)

    # Write one row per input sequence to the output CSV file.
    col = ["Sequence_ID", "R protein possibility"]
    result2=pd.DataFrame(data=id3,columns=col)
    result2.to_csv(outfile)


if __name__=="__main__":
    parser = argparse.ArgumentParser(
        'Script for predicting plant R protein using deep representation learning features')
    parser.add_argument('-i', type=str, help='input sequences in Fasta format')
    parser.add_argument('-o', type=str, help='path to saved CSV file')
    args = parser.parse_args()
    inputfasta= args.i
    outfile = args.o
    predict(inputfasta,outfile)
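The result table built by `predict` pairs each sequence ID with the rounded probability from column 1 of `predict_proba`. A dependency-free sketch of that step is shown here, assuming `pred_proba` is a sequence of two-element rows; the names `format_predictions` and `write_predictions` are hypothetical. Unlike the `DataFrame.to_csv` call in the script, this version does not write a leading index column.

```python
import csv
import io


def format_predictions(id_list, pred_proba):
    """Pair each sequence ID with its column-1 probability, rounded to 3 decimals."""
    return [
        (str(seq_id), round(float(row[1]), 3))
        for seq_id, row in zip(id_list, pred_proba)
    ]


def write_predictions(rows, handle):
    """Write the header used by the script, then one row per sequence."""
    writer = csv.writer(handle)
    writer.writerow(["Sequence_ID", "R protein possibility"])
    writer.writerows(rows)


# Usage with made-up probabilities (not real model output).
rows = format_predictions(
    ["positive1|1", "negative285|0"],
    [[0.1234, 0.8766], [0.9, 0.1]],
)
buffer = io.StringIO()
write_predictions(rows, buffer)
```

Passing an open file handle instead of the `io.StringIO` buffer would write the same rows to disk.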
embbed_models @ 042c4d78
Subproject commit 042c4d787140a2504b74b1cdb19c5586ba1c170f
UniRep_F232,BiLSTM_F3253,BiLSTM_F231,UniRep_F1029,UniRep_F1338,UniRep_F747,BiLSTM_F750,BiLSTM_F2662,UniRep_F774,UniRep_F1757,UniRep_F650,UniRep_F1574,BiLSTM_F1420,BiLSTM_F77,UniRep_F662,BiLSTM_F1270,BiLSTM_F1404,BiLSTM_F1775,BiLSTM_F213,UniRep_F57,BiLSTM_F101,UniRep_F1800,UniRep_F1827,UniRep_F682,BiLSTM_F1758,UniRep_F204,UniRep_F861,BiLSTM_F2557,UniRep_F1793,BiLSTM_F655,BiLSTM_F1786,BiLSTM_F853,BiLSTM_F1448,UniRep_F4,BiLSTM_F671,UniRep_F510,BiLSTM_F1116,UniRep_F1754,BiLSTM_F482,UniRep_F212,BiLSTM_F1526,UniRep_F945,UniRep_F1573,BiLSTM_F1912,UniRep_F1241,BiLSTM_F1677,BiLSTM_F74,BiLSTM_F1696,BiLSTM_F524,UniRep_F814,BiLSTM_F1124,BiLSTM_F528,UniRep_F1299,UniRep_F408,UniRep_F1736,BiLSTM_F415,UniRep_F868,BiLSTM_F596,UniRep_F1818,UniRep_F1369,UniRep_F529,UniRep_F472,UniRep_F165,UniRep_F860,UniRep_F1376,BiLSTM_F3362,UniRep_F1214,BiLSTM_F22,UniRep_F1187,UniRep_F1182,UniRep_F1576,UniRep_F156,UniRep_F1095,UniRep_F1082,BiLSTM_F2843,UniRep_F1780,UniRep_F322,UniRep_F1791,BiLSTM_F154,BiLSTM_F252,UniRep_F51,BiLSTM_F3119,UniRep_F1843,UniRep_F660,UniRep_F1459,BiLSTM_F3485,BiLSTM_F2284,BiLSTM_F974,UniRep_F1336,BiLSTM_F1002,BiLSTM_F3578,BiLSTM_F865,BiLSTM_F969,BiLSTM_F3463,BiLSTM_F1066,UniRep_F657,BiLSTM_F1233,BiLSTM_F2256,UniRep_F538,UniRep_F1389,BiLSTM_F1481,BiLSTM_F1997,UniRep_F72,UniRep_F314,UniRep_F288,UniRep_F1625,UniRep_F97,BiLSTM_F1631,UniRep_F237,BiLSTM_F1705,BiLSTM_F1707,BiLSTM_F1740,BiLSTM_F1755,UniRep_F201,UniRep_F1423,UniRep_F183,BiLSTM_F2003,UniRep_F337,BiLSTM_F1424,UniRep_F49,BiLSTM_F1096,BiLSTM_F2179,BiLSTM_F1186,UniRep_F1711,UniRep_F1709,UniRep_F25,UniRep_F32,UniRep_F1581,UniRep_F1615,UniRep_F409,BiLSTM_F2134,BiLSTM_F2117,BiLSTM_F1339,UniRep_F362,BiLSTM_F2097,BiLSTM_F1414,UniRep_F1335,UniRep_F1229,UniRep_F1024,BiLSTM_F75,UniRep_F980,UniRep_F1877,BiLSTM_F2869,UniRep_F1840,BiLSTM_F87,BiLSTM_F525,BiLSTM_F2725,BiLSTM_F336,UniRep_F1065,BiLSTM_F2711,UniRep_F879,UniRep_F1817,UniRep_F889,BiLSTM_F3019,UniRep_F1090,BiLSTM_F3181,UniRep_F902,UniRep_F904,BiLSTM_F370,UniRep_F1832,UniRep_F1093,UniRep_F925,BiLSTM_F110,BiLSTM_F2747,BiLSTM_F2877,BiLSTM_F273,BiLSTM_F2879,BiLSTM_F166,BiLSTM_F794,UniRep_F703,UniRep_F704,BiLSTM_F2,BiLSTM_F3382,UniRep_F749,UniRep_F770,UniRep_F1215,BiLSTM_F2837,BiLSTM_F38,UniRep_F919,UniRep_F790,UniRep_F1539,BiLSTM_F3312,BiLSTM_F696,BiLSTM_F2839,BiLSTM_F677,UniRep_F1478,UniRep_F1149,BiLSTM_F53,BiLSTM_F3114,BiLSTM_F3051,BiLSTM_F3053,UniRep_F62,UniRep_F73,BiLSTM_F3423,UniRep_F92,UniRep_F129,UniRep_F155,UniRep_F152,BiLSTM_F2987,BiLSTM_F2991
from __future__ import print_function, division
import numpy as np
class Alphabet:
    def __init__(self, chars, encoding=None, mask=False, missing=255):
        self.chars = np.frombuffer(chars, dtype=np.uint8)
        # lookup table over all 256 byte values; unknown bytes map to `missing`
        self.encoding = np.zeros(256, dtype=np.uint8) + missing
        if encoding is None:
            self.encoding[self.chars] = np.arange(len(self.chars))
            self.size = len(self.chars)
        else:
            self.encoding[self.chars] = encoding
            self.size = encoding.max() + 1
        self.mask = mask
        if mask:
            self.size -= 1

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        return chr(self.chars[i])

    def encode(self, x):
        """ encode a byte string into alphabet indices """
        x = np.frombuffer(x, dtype=np.uint8)
        return self.encoding[x]

    def decode(self, x):
        """ decode index array, x, to byte string of this alphabet """
        string = self.chars[x]
        return string.tobytes()

    def unpack(self, h, k):
        """ unpack integer h into array of this alphabet with length k """
        n = self.size
        kmer = np.zeros(k, dtype=np.uint8)
        for i in reversed(range(k)):
            c = h % n
            kmer[i] = c
            h = h // n
        return kmer

    def get_kmer(self, h, k):
        """ retrieve byte string of length k decoded from integer h """
        kmer = self.unpack(h, k)
        return self.decode(kmer)

DNA = Alphabet(b'ACGT')

class Uniprot21(Alphabet):
    def __init__(self, mask=False):
        chars = alphabet = b'ARNDCQEGHILKMFPSTWYVXOUBZ'
        encoding = np.arange(len(chars))
        encoding[21:] = [11,4,20,20] # encode 'OUBZ' as synonyms
        super(Uniprot21, self).__init__(chars, encoding=encoding, mask=mask, missing=20)

class SDM12(Alphabet):
    """
    A D KER N TSQ YF LIVM C W H G P
    See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2732308/#B33
    "Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment"
    Peterson et al. 2009. Bioinformatics.
    """
    def __init__(self, mask=False):
        chars = alphabet = b'ADKNTYLCWHGPXERSQFIVMOUBZ'
        groups = [b'A',b'D',b'KERO',b'N',b'TSQ',b'YF',b'LIVM',b'CU',b'W',b'H',b'G',b'P',b'XBZ']
        # map every byte value to the index of the group that contains it
        groups = {c:i for i in range(len(groups)) for c in groups[i]}
        encoding = np.array([groups[c] for c in chars])
        super(SDM12, self).__init__(chars, encoding=encoding, mask=mask)

SecStr8 = Alphabet(b'HBEGITS ')
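
The lookup-table idea behind `Alphabet.encode` and the base-n arithmetic behind `Alphabet.unpack` can be sketched without NumPy. This is a minimal illustration, not the repository's implementation: `build_encoding`, `encode`, and `unpack` are hypothetical stand-alone helpers, and the input bytes are made up.

```python
def build_encoding(chars, encoding=None, missing=255):
    """Return a 256-entry table mapping byte values to alphabet indices."""
    table = [missing] * 256
    if encoding is None:
        encoding = range(len(chars))
    for byte, index in zip(chars, encoding):
        table[byte] = index
    return table


def encode(table, seq):
    """Map each byte of seq through the lookup table."""
    return [table[b] for b in seq]


def unpack(h, k, n):
    """Write integer h as k base-n digits, most significant digit first."""
    digits = [0] * k
    for i in reversed(range(k)):
        digits[i] = h % n
        h //= n
    return digits


# Uniprot21-style table: 'O', 'U', 'B', 'Z' reuse the indices of K, C, X, X,
# and any byte outside the alphabet falls back to index 20 ('X').
chars = b'ARNDCQEGHILKMFPSTWYVXOUBZ'
indices = list(range(21)) + [11, 4, 20, 20]
uniprot21 = build_encoding(chars, indices, missing=20)

encoded = encode(uniprot21, b'MKU?')

# Decode the integer 27 as a DNA 3-mer over b'ACGT' (base 4).
dna_kmer = bytes(b'ACGT'[d] for d in unpack(27, 3, 4))
```

Because the table covers every byte value, encoding a sequence is a single indexed lookup per residue, which is what makes the NumPy version a one-line fancy-indexing operation.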
# Default ignored files
/shelf/
/workspace.xml
<component name="InspectionProjectProfileManager">
<settings>
<option name="USE_PROJECT_PROFILE" value="false" />
<version value="1.0" />
</settings>
</component>
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.8" project-jdk-type="Python SDK" />
</project>
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectModuleManager">
<modules>
<module fileurl="file://$PROJECT_DIR$/.idea/src.iml" filepath="$PROJECT_DIR$/.idea/src.iml" />
</modules>
</component>
</project>
<?xml version="1.0" encoding="UTF-8"?>
<module type="PYTHON_MODULE" version="4">
<component name="NewModuleRootManager">
<content url="file://$MODULE_DIR$" />
<orderEntry type="inheritedJdk" />
<orderEntry type="sourceFolder" forTests="false" />
</component>
</module>
from __future__ import print_function,division
import sys
import os
#sys.path.append('./embedding_model/')
sys.path.append(os.path.join(os.pardir, os.pardir))
import pandas as pd
import torch
import torch.nn as nn
import time
import logging
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
logging.info(DEVICE)
from preprocessing.alphabets import Uniprot21
import warnings
warnings.filterwarnings('ignore')

def unstack_lstm(lstm):
    # split a multi-layer bidirectional LSTM into a list of single-layer LSTMs
    device = next(iter(lstm.parameters())).device

    in_size = lstm.input_size
    hidden_dim = lstm.hidden_size
    layers = []
    for i in range(lstm.num_layers):
        layer = nn.LSTM(in_size, hidden_dim, batch_first=True, bidirectional=True)
        layer.to(device)

        attributes = ['weight_ih_l', 'weight_hh_l', 'bias_ih_l', 'bias_hh_l']
        for attr in attributes:
            # copy the forward-direction weights of layer i
            dest = attr + '0'
            src = attr + str(i)
            getattr(layer, dest).data[:] = getattr(lstm, src)

            # copy the reverse-direction weights of layer i
            dest = attr + '0_reverse'
            src = attr + str(i) + '_reverse'
            getattr(layer, dest).data[:] = getattr(lstm, src)
        layer.flatten_parameters()
        layers.append(layer)
        in_size = 2*hidden_dim
    return layers

def embed_stack(x, lm_embed, lstm_stack, proj, include_lm=True, final_only=False):
    zs = []

    # one-hot encoding of the residues is always part of the output
    x_onehot = x.new(x.size(0),x.size(1), 21).float().zero_()
    x_onehot.scatter_(2,x.unsqueeze(2),1)
    zs.append(x_onehot)

    h = lm_embed(x)
    if include_lm and not final_only:
        zs.append(h)

    if lstm_stack is not None:
        for lstm in lstm_stack:
            h,_ = lstm(h)
            if not final_only:
                zs.append(h)
        if proj is not None:
            h = proj(h.squeeze(0)).unsqueeze(0)
            zs.append(h)

    # concatenate all collected representations along the feature axis
    z = torch.cat(zs, 2)
    return z

def embed_sequence(x, lm_embed, lstm_stack, proj, include_lm=True, final_only=False
                  , pool='none', use_cuda=False):
    if len(x) == 0:
        return None

    alphabet = Uniprot21()
    x = x.upper()
    # convert to alphabet indices
    x = alphabet.encode(x)
    x = torch.from_numpy(x)
    if use_cuda:
        x = x.to(DEVICE)

    # embed the sequence
    with torch.no_grad():
        x = x.long().unsqueeze(0)
        z = embed_stack(x, lm_embed, lstm_stack, proj
                       , include_lm=include_lm, final_only=final_only)
        # pool over the sequence axis if requested
        z = z.squeeze(0)
        if pool == 'sum':
            z = z.sum(0)
        elif pool == 'max':
            z,_ = z.max(0)
        elif pool == 'avg':
            z = z.mean(0)
        z = z.cpu().numpy()

    return z

def load_model(path, use_cuda=False):
    # map_location lets a checkpoint that was saved on a GPU load on a CPU-only machine
    encoder = torch.load(path, map_location=DEVICE if use_cuda else 'cpu')
    encoder.eval()
    if use_cuda:
        encoder=encoder.to(DEVICE)
    encoder = encoder.embedding
    lm_embed = encoder.embed
    lstm_stack = unstack_lstm(encoder.rnn)
    proj = encoder.proj
    return lm_embed, lstm_stack, proj

def SSA_Embed(input_seq):
    T0 = time.time()
    SSAEMB_ = []
    PID_ =[]
    # inData = fasta.fasta2csv(fastaFile)
    # SEQ_ = inData["Seq"]
    # PID_ = inData["PID"]
    # CLASS_ = inData["Class"]
    logging.info("SSA Embedding...")
    print("Loading SSA Model...", file=sys.stderr, end='\r')
    lm_embed, lstm_stack, proj = load_model("./prediction/embbed_models/SSA_embed.model", use_cuda=True)
    include_lm = True
    final_only = True
    for key, value in input_seq.items():
        PID_.append(key)
        sequence = str(value).encode("utf-8")
        z = embed_sequence(sequence, lm_embed, lstm_stack, proj
                          , include_lm=include_lm, final_only=final_only
                          , pool='avg', use_cuda=True)
        SSAEMB_.append(z)
        # count += 1
        # print('#{}$'.format(count), file=sys.stderr, end='\r')
    ssa_feature = pd.DataFrame(SSAEMB_)
    col = ["SSA_F" + str(i + 1) for i in range(0, 121)]
    ssa_feature.columns = col
    ssa_feature = pd.concat([ssa_feature], axis=1)
    ssa_feature.index = PID_
    logging.info(ssa_feature.shape)
    # ssa_feature.to_csv("./feature_vectors/SSA_feature.csv")
    logging.info("SSA embedding finished.")
    logging.info("it took %0.3f mins.\n" % ((time.time() - T0) / 60))
    return ssa_feature

def BiLSTM_Embed(input_seq):
    T0=time.time()
    BiLSTMEMB_=[]
    PID=[]
    logging.info("\nBiLSTM Embedding...")
    print("Loading BiLSTM Model...", file=sys.stderr, end='\r')
    lm_embed, lstm_stack, proj = load_model("./prediction/embbed_models/SSA_embed.model", use_cuda=True)
    # drop the final projection so the raw BiLSTM layer outputs are used
    proj = None
    for key,value in input_seq.items():
        PID.append(key)
        sequence = str(value).encode("utf-8")
        z = embed_sequence(sequence, lm_embed, lstm_stack, proj
                          , final_only=False,include_lm = True
                          , pool='avg', use_cuda=True)
        BiLSTMEMB_.append(z)
    bilstm_feature=pd.DataFrame(BiLSTMEMB_)
    col=["BiLSTM_F"+str(i+1) for i in range(0,3605)]
    bilstm_feature.columns=col
    bilstm_feature=pd.concat([bilstm_feature],axis=1)
    bilstm_feature.index=PID
    # bilstm_feature.to_csv("./feature_vectors/bilstm_feature.csv")
    logging.info("BiLSTM embedding finished.")
    logging.info("it took %0.3f mins.\n"%((time.time()-T0)/60))
    return bilstm_feature
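
Both embedding functions reduce a per-residue matrix (one row per residue) to a single fixed-length vector with `pool='avg'`, then label the columns `SSA_F1`, `SSA_F2`, and so on. The following is a minimal pure-Python sketch of that pooling and naming step; `pool_residues` and `feature_names` are hypothetical helpers and the numbers are made up.

```python
def pool_residues(rows, mode='avg'):
    """Collapse a list of per-residue feature rows into one feature vector."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    if mode == 'sum':
        return [sum(c) for c in columns]
    if mode == 'max':
        return [max(c) for c in columns]
    if mode == 'avg':
        return [sum(c) / len(c) for c in columns]
    raise ValueError('Unknown pooling mode: ' + mode)


def feature_names(prefix, n):
    """Build 1-based column names such as SSA_F1 ... SSA_Fn."""
    return [prefix + '_F' + str(i + 1) for i in range(n)]


# Two residues, two features each (illustrative values only).
per_residue = [[1.0, 0.0],
               [3.0, 4.0]]

avg_vector = pool_residues(per_residue, 'avg')
max_vector = pool_residues(per_residue, 'max')
names = feature_names('SSA', len(avg_vector))
```

Pooling over the sequence axis is what lets proteins of different lengths share one feature table: the output length depends only on the embedding width, not on the number of residues.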
from __future__ import print_function,division
import sys
import os
sys.path.append(os.pardir)
sys.path.append(os.path.join(os.pardir, os.pardir))
import time
import pandas as pd
import torch
import warnings
warnings.filterwarnings('ignore')
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
from tape import UniRepModel,TAPETokenizer

def UniRep_Embed(input_seq):
    T0=time.time()
    UNIREPEB_=[]
    PID = []
    print("UniRep Embedding...")
    model = UniRepModel.from_pretrained('babbler-1900')
    model=model.to(DEVICE)
    tokenizer = TAPETokenizer(vocab='unirep')
    for key,value in input_seq.items():
        sequence = value
        if len(sequence) == 0:
            # skip before recording the ID so PID stays aligned with the feature rows
            print('# WARNING: sequence', key, 'has length=0. Skipping.', file=sys.stderr)
            continue
        PID.append(key)
        with torch.no_grad():
            token_ids = torch.tensor([tokenizer.encode(sequence)])
            token_ids = token_ids.to(DEVICE)
            output = model(token_ids)
            unirep_output = output[0]
            unirep_output=torch.squeeze(unirep_output)
            # average over sequence positions to get one vector per protein
            unirep_output= unirep_output.mean(0)
            unirep_output = unirep_output.cpu().numpy()
            UNIREPEB_.append(unirep_output.tolist())
    unirep_feature=pd.DataFrame(UNIREPEB_)
    col=["UniRep_F"+str(i+1) for i in range(0,1900)]
    unirep_feature.columns=col
    unirep_feature=pd.concat([unirep_feature],axis=1)
    unirep_feature.index=PID
    # print(unirep_feature.shape)
    unirep_feature.to_csv("./dataset/unirep_feature.csv")
    print("Getting Deep Representation Learning Features with UniRep is done.")
    print("it took %0.3f mins.\n"%((time.time()-T0)/60))
    return unirep_feature
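
The embedding functions iterate over `input_seq.items()`, so they expect a mapping from sequence ID to sequence string. How the repository builds that mapping is not shown here; the following is a minimal standard-library sketch, assuming plain FASTA text. `parse_fasta` is a hypothetical helper and the two records are made-up toy data.

```python
def parse_fasta(lines):
    """Return {record_id: sequence} from an iterable of FASTA lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            parts = line[1:].split()
            name = parts[0] if parts else ''  # ID is the first word of the header
            records[name] = ''
        elif name is not None:
            records[name] += line  # sequences may span several lines
    return records


# Toy input; IDs follow the "name|label" pattern used in the label table.
example = """>positive29|1
MKT
LLV
>negative38|0

ACD
""".splitlines()

input_seq = parse_fasta(example)
```

A dictionary built this way can be passed directly as `input_seq`, since insertion order is preserved and each value is a plain string.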
from __future__ import print_function,division
import torch
import torch.nn as nn
import torch.nn.functional as F

class L1(nn.Module):
    # negative L1 distance between every row of x and every row of y
    def forward(self, x, y):
        return -torch.sum(torch.abs(x.unsqueeze(1)-y), -1)

class L2(nn.Module):
    # negative squared L2 distance between every row of x and every row of y
    def forward(self, x, y):
        return -torch.sum((x.unsqueeze(1)-y)**2, -1)

class DotProduct(nn.Module):
    def forward(self, x, y):
        return torch.mm(x, y.t())

def pad_gap_scores(s, gap):
    # append one gap column and one gap row to the score matrix
    col = gap.expand(s.size(0), 1)
    s = torch.cat([s, col], 1)
    row = gap.expand(1, s.size(1))
    s = torch.cat([s, row], 0)
    return s

class OrdinalRegression(nn.Module):
    def __init__(self, embedding, n_classes, compare=L1()
                , align_method='ssa', beta_init=10
                , allow_insertions=False, gap_init=-10
                ):
        super(OrdinalRegression, self).__init__()

        self.embedding = embedding
        self.n_out = n_classes

        self.compare = compare
        self.align_method = align_method
        self.allow_insertions = allow_insertions
        self.gap = nn.Parameter(torch.FloatTensor([gap_init]))

        self.theta = nn.Parameter(torch.ones(1,n_classes-1))
        self.beta = nn.Parameter(torch.zeros(n_classes-1)+beta_init)

        self.clip()

    def forward(self, x):
        return self.embedding(x)

    def clip(self):
        # clip the weights of ordinal regression to be non-negative
        self.theta.data.clamp_(min=0)

    def score(self, z_x, z_y):
        if self.align_method == 'ssa':
            s = self.compare(z_x, z_y)
            if self.allow_insertions:
                s = pad_gap_scores(s, self.gap)

            a = F.softmax(s, 1)
            b = F.softmax(s, 0)

            if self.allow_insertions:
                index = s.size(0)-1
                index = s.data.new(1).long().fill_(index)
                a = a.index_fill(0, index, 0)

                index = s.size(1)-1
                index = s.data.new(1).long().fill_(index)
                b = b.index_fill(1, index, 0)

            a = a + b - a*b
            c = torch.sum(a*s)/torch.sum(a)

        elif self.align_method == 'ua':
            s = self.compare(z_x, z_y)
            c = torch.mean(s)

        elif self.align_method == 'me':
            z_x = z_x.mean(0)
            z_y = z_y.mean(0)
            c = self.compare(z_x.unsqueeze(0), z_y.unsqueeze(0)).squeeze(0)

        else:
            raise Exception('Unknown alignment method: ' + self.align_method)

        logits = c*self.theta + self.beta
        return logits.view(-1)
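
The `'ssa'` branch of `score` can be followed step by step on small inputs. This is a pure-Python sketch of the same arithmetic (negative L1 similarity, row-wise and column-wise softmax, combined weights `a + b - a*b`, weighted mean of the scores) without gap handling. The helper names and the toy embeddings are hypothetical.

```python
import math


def l1_similarity(z_x, z_y):
    """s[i][j] = -(L1 distance between row i of z_x and row j of z_y)."""
    return [[-sum(abs(p - q) for p, q in zip(x, y)) for y in z_y] for x in z_x]


def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def ssa_score(z_x, z_y):
    s = l1_similarity(z_x, z_y)
    n, m = len(s), len(s[0])
    # a: softmax over dim 1, so every row sums to 1
    a = [softmax(row) for row in s]
    # b: softmax over dim 0, so every column sums to 1
    cols = [softmax([s[i][j] for i in range(n)]) for j in range(m)]
    b = [[cols[j][i] for j in range(m)] for i in range(n)]
    # combined alignment weight, then the weighted mean of the scores
    w = [[a[i][j] + b[i][j] - a[i][j] * b[i][j] for j in range(m)] for i in range(n)]
    num = sum(w[i][j] * s[i][j] for i in range(n) for j in range(m))
    den = sum(w[i][j] for i in range(n) for j in range(m))
    return num / den


# One-dimensional toy embeddings for two "sequences" of length 2.
same = ssa_score([[0.0], [5.0]], [[0.0], [5.0]])
different = ssa_score([[0.0], [5.0]], [[20.0], [30.0]])
```

Since every entry of `s` is a negated distance, the score is never positive, and pairs whose residues have close matches score nearer to zero than pairs without any.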
from __future__ import print_function,division
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import PackedSequence

class LMEmbed(nn.Module):
    def __init__(self, nin, nout, lm, padding_idx=-1, transform=nn.ReLU()
                , sparse=False):
        super(LMEmbed, self).__init__()

        if padding_idx == -1:
            padding_idx = nin-1

        self.lm = lm
        self.embed = nn.Embedding(nin, nout, padding_idx=padding_idx, sparse=sparse)
        self.proj = nn.Linear(lm.hidden_size(), nout)
        self.transform = transform
        self.nout = nout

    def forward(self, x):
        packed = type(x) is PackedSequence
        h_lm = self.lm.encode(x)

        # embed and unpack if packed
        if packed:
            h = self.embed(x.data)
            h_lm = h_lm.data
        else:
            h = self.embed(x)

        # project
        h_lm = self.proj(h_lm)
        h = self.transform(h + h_lm)

        # repack if needed
        if packed:
            h = PackedSequence(h, x.batch_sizes)

        return h

class Linear(nn.Module):
    def __init__(self, nin, nhidden, nout, padding_idx=-1,
                 sparse=False, lm=None):
        super(Linear, self).__init__()

        if padding_idx == -1:
            padding_idx = nin-1

        if lm is not None:
            self.embed = LMEmbed(nin, nhidden, lm, padding_idx=padding_idx, sparse=sparse)
            self.proj = nn.Linear(self.embed.nout, nout)
            self.lm = True
        else:
            self.proj = nn.Embedding(nin, nout, padding_idx=padding_idx, sparse=sparse)
            self.lm = False

        self.nout = nout

    def forward(self, x):
        if self.lm:
            h = self.embed(x)
            if type(h) is PackedSequence:
                h = h.data
                z = self.proj(h)
                z = PackedSequence(z, x.batch_sizes)
            else:
                h = h.view(-1, h.size(2))
                z = self.proj(h)
                z = z.view(x.size(0), x.size(1), -1)
        else:
            # without a language model the lookup table is stored in self.proj;
            # self.embed does not exist in this branch
            if type(x) is PackedSequence:
                z = self.proj(x.data)
                z = PackedSequence(z, x.batch_sizes)
            else:
                z = self.proj(x)

        return z

class StackedRNN(nn.Module):
    def __init__(self, nin, nembed, nunits, nout, nlayers=2, padding_idx=-1, dropout=0,
                 rnn_type='lstm', sparse=False, lm=None):
        super(StackedRNN, self).__init__()

        if padding_idx == -1:
            padding_idx = nin-1

        if lm is not None:
            self.embed = LMEmbed(nin, nembed, lm, padding_idx=padding_idx, sparse=sparse)
            nembed = self.embed.nout
            self.lm = True
        else:
            self.embed = nn.Embedding(nin, nembed, padding_idx=padding_idx, sparse=sparse)
            self.lm = False

        if rnn_type == 'lstm':
            RNN = nn.LSTM
        elif rnn_type == 'gru':
            RNN = nn.GRU
        else:
            # fail with a clear message instead of a NameError further down
            raise ValueError('Unknown rnn_type: ' + str(rnn_type))

        self.dropout = nn.Dropout(p=dropout)
        if nlayers == 1:
            dropout = 0

        self.rnn = RNN(nembed, nunits, nlayers, batch_first=True
                      , bidirectional=True, dropout=dropout)
        self.proj = nn.Linear(2*nunits, nout)
        self.nout = nout

    def forward(self, x):
        if self.lm:
            h = self.embed(x)
        else:
            if type(x) is PackedSequence:
                h = self.embed(x.data)
                h = PackedSequence(h, x.batch_sizes)
            else:
                h = self.embed(x)

        h,_ = self.rnn(h)

        if type(h) is PackedSequence:
            h = h.data
            h = self.dropout(h)
            z = self.proj(h)
            z = PackedSequence(z, x.batch_sizes)
        else:
            h = h.view(-1, h.size(2))
            h = self.dropout(h)
            z = self.proj(h)
            z = z.view(x.size(0), x.size(1), -1)

        return z
from __future__ import print_function,division
import torch
import torch.nn as nn
import torch.nn.functional as F
from .comparison import L1, pad_gap_scores

class SCOPCM(nn.Module):
    def __init__(self, embedding, similarity_kwargs={},
                 cmap_kwargs={}):
        super(SCOPCM, self).__init__()

        self.embedding = embedding
        embed_dim = embedding.nout

        self.scop_predict = OrdinalRegression(5, **similarity_kwargs)
        self.cmap_predict = ConvContactMap(embed_dim, **cmap_kwargs)

    def clip(self):
        self.scop_predict.clip()
        self.cmap_predict.clip()

    def forward(self, x):
        return self.embedding(x)

    def score(self, z_x, z_y):
        return self.scop_predict(z_x, z_y)

    def predict(self, z):
        return self.cmap_predict(z)

class ConvContactMap(nn.Module):
    def __init__(self, embed_dim, hidden_dim=50, width=7, act=nn.ReLU()):
        super(ConvContactMap, self).__init__()
        self.hidden = nn.Conv2d(2*embed_dim, hidden_dim, 1)
        self.act = act
        self.conv = nn.Conv2d(hidden_dim, 1, width, padding=width//2)
        self.clip()

    def clip(self):
        # force the conv layer to be transpose invariant
        w = self.conv.weight
        self.conv.weight.data[:] = 0.5*(w + w.transpose(2,3))

    def forward(self, z):
        return self.predict(z)

    def predict(self, z):
        # z is (b,L,d)
        z = z.transpose(1, 2) # (b,d,L)
        z_dif = torch.abs(z.unsqueeze(2) - z.unsqueeze(3))
        z_mul = z.unsqueeze(2)*z.unsqueeze(3)
        z = torch.cat([z_dif, z_mul], 1)
        # (b,2d,L,L)
        h = self.act(self.hidden(z))
        logits = self.conv(h).squeeze(1)
        return logits

class OrdinalRegression(nn.Module):
    def __init__(self, n_classes, compare=L1()
                , align_method='ssa', beta_init=10
                , allow_insertions=False, gap_init=-10
                ):
        super(OrdinalRegression, self).__init__()

        self.n_out = n_classes

        self.compare = compare
        self.align_method = align_method
        self.allow_insertions = allow_insertions
        self.gap = nn.Parameter(torch.FloatTensor([gap_init]))

        self.theta = nn.Parameter(torch.ones(1,n_classes-1))
        self.beta = nn.Parameter(torch.zeros(n_classes-1)+beta_init)

        self.clip()

    def clip(self):
        # clip the weights of ordinal regression to be non-negative
        self.theta.data.clamp_(min=0)

    def forward(self, z_x, z_y):
        if self.align_method == 'ssa':
            s = self.compare(z_x, z_y)
            if self.allow_insertions:
                s = pad_gap_scores(s, self.gap)

            a = F.softmax(s, 1)
            b = F.softmax(s, 0)

            if self.allow_insertions:
                index = s.size(0)-1
                index = s.data.new(1).long().fill_(index)
                a = a.index_fill(0, index, 0)

                index = s.size(1)-1
                index = s.data.new(1).long().fill_(index)
                b = b.index_fill(1, index, 0)

            a = a + b - a*b
            c = torch.sum(a*s)/torch.sum(a)

        elif self.align_method == 'ua':
            s = self.compare(z_x, z_y)
            c = torch.mean(s)

        elif self.align_method == 'me':
            z_x = z_x.mean(0)
            z_y = z_y.mean(0)
            c = self.compare(z_x.unsqueeze(0), z_y.unsqueeze(0)).squeeze(0)

        else:
            raise Exception('Unknown alignment method: ' + self.align_method)

        logits = c*self.theta + self.beta
        return logits.view(-1)
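
`ConvContactMap.predict` builds, for every pair of positions (i, j), the features `|z_i - z_j|` followed by `z_i * z_j`, and `clip` averages the convolution kernel with its transpose. Both operations are symmetric in i and j. The following pure-Python sketch shows the two steps on toy values; `pairwise_features` and `symmetrize` are hypothetical helpers, not part of the repository.

```python
def pairwise_features(z):
    """For each pair (i, j), concatenate |z_i - z_j| and z_i * z_j."""
    length = len(z)
    return [
        [
            [abs(p - q) for p, q in zip(z[i], z[j])]
            + [p * q for p, q in zip(z[i], z[j])]
            for j in range(length)
        ]
        for i in range(length)
    ]


def symmetrize(w):
    """Average a square kernel with its transpose."""
    k = len(w)
    return [[0.5 * (w[i][j] + w[j][i]) for j in range(k)] for i in range(k)]


# Two positions with two embedding features each (illustrative values only).
z = [[1.0, 2.0],
     [3.0, -1.0]]

features = pairwise_features(z)
kernel = symmetrize([[1.0, 2.0],
                     [4.0, 3.0]])
```

Symmetric inputs combined with a transpose-invariant kernel mean the predicted logit for (i, j) matches the one for (j, i), which fits the fact that a contact between two residues has no direction.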
from __future__ import print_function,division
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import PackedSequence, pack_padded_sequence, pad_packed_sequence

class BiLM(nn.Module):
    def __init__(self, nin, nout, embedding_dim, hidden_dim, num_layers
                , tied=True, mask_idx=None, dropout=0):
        super(BiLM, self).__init__()

        if mask_idx is None:
            mask_idx = nin-1
        self.mask_idx = mask_idx
        self.embed = nn.Embedding(nin, embedding_dim, padding_idx=mask_idx)
        self.dropout = nn.Dropout(p=dropout)

        self.tied = tied
        if tied:
            # one stack of LSTMs shared by both directions
            layers = []
            nin = embedding_dim
            for _ in range(num_layers):
                layers.append(nn.LSTM(nin, hidden_dim, 1, batch_first=True))
                nin = hidden_dim
            self.rnn = nn.ModuleList(layers)
        else:
            # separate stacks for the forward (lrnn) and reverse (rrnn) directions
            layers = []
            nin = embedding_dim
            for _ in range(num_layers):
                layers.append(nn.LSTM(nin, hidden_dim, 1, batch_first=True))
                nin = hidden_dim
            self.lrnn = nn.ModuleList(layers)

            layers = []
            nin = embedding_dim
            for _ in range(num_layers):
                layers.append(nn.LSTM(nin, hidden_dim, 1, batch_first=True))
                nin = hidden_dim
            self.rrnn = nn.ModuleList(layers)

        self.linear = nn.Linear(hidden_dim, nout)

    def hidden_size(self):
        h = 0
        if self.tied:
            for layer in self.rnn:
                h += 2*layer.hidden_size
        else:
            for layer in self.lrnn:
                h += layer.hidden_size
            for layer in self.rrnn:
                h += layer.hidden_size
        return h

    def reverse(self, h):
        packed = type(h) is PackedSequence
        if packed:
            h,batch_sizes = pad_packed_sequence(h, batch_first=True)
            h_rvs = h.clone().zero_()
            for i in range(h.size(0)):
                n = batch_sizes[i]
                idx = [j for j in range(n-1, -1, -1)]
                idx = torch.LongTensor(idx).to(h.device)
                h_rvs[i,:n] = h[i].index_select(0, idx)
            # repack h_rvs
            h_rvs = pack_padded_sequence(h_rvs, batch_sizes, batch_first=True)
        else:
            idx = [i for i in range(h.size(1)-1, -1, -1)]
            idx = torch.LongTensor(idx).to(h.device)
            h_rvs = h.index_select(1, idx)
        return h_rvs

    def transform(self, z_fwd, z_rvs, last_only=False):
        # sequences are flanked by the start/stop token as:
        # [stop, x, stop]

        # z_fwd should be [stop,x]
        # z_rvs should be [x,stop] reversed

        # first, do the forward direction
        if self.tied:
            layers = self.rnn
        else:
            layers = self.lrnn

        h_fwd = []
        h = z_fwd
        for rnn in layers:
            h,_ = rnn(h)
            if type(h) is PackedSequence:
                h = PackedSequence(self.dropout(h.data), h.batch_sizes)
            else:
                h = self.dropout(h)
            if not last_only:
                h_fwd.append(h)
        if last_only:
            h_fwd = h

        # now, do the reverse direction
        if self.tied:
            layers = self.rnn
        else:
            layers = self.rrnn

        # we'll need to reverse the direction of these
        # hidden states back to match forward direction

        h_rvs = []
        h = z_rvs
        for rnn in layers:
            h,_ = rnn(h)
            if type(h) is PackedSequence:
                h = PackedSequence(self.dropout(h.data), h.batch_sizes)
            else:
                h = self.dropout(h)
            if not last_only:
                h_rvs.append(self.reverse(h))
        if last_only:
            h_rvs = self.reverse(h)

        return h_fwd,h_rvs

    def embed_and_split(self, x, pad=False):
        packed = type(x) is PackedSequence
        if packed:
            x,batch_sizes = pad_packed_sequence(x, batch_first=True)

        if pad:
            # pad x with the start/stop token
            x = x + 1
            ## append start/stop tokens to x
            x_ = x.data.new(x.size(0), x.size(1)+2).zero_()
            if packed:
                for i in range(len(batch_sizes)):
                    n = batch_sizes[i]
                    x_[i,1:n+1] = x[i,:n]
                batch_sizes = [s+2 for s in batch_sizes]
            else:
                x_[:,1:-1] = x
            x = x_

        # sequences x are flanked by the start/stop token as:
        # [stop, x, stop]

        # now, encode x as distributed vectors
        z = self.embed(x)

        # to pass to transform, we discard the last element for z_fwd and the first element for z_rvs
        z_fwd = z[:,:-1]
        z_rvs = z[:,1:]
        if packed:
            lengths = [s-1 for s in batch_sizes]
            z_fwd = pack_padded_sequence(z_fwd, lengths, batch_first=True)
            z_rvs = pack_padded_sequence(z_rvs, lengths, batch_first=True)
        # reverse z_rvs
        z_rvs = self.reverse(z_rvs)

        return z_fwd, z_rvs

    def encode(self, x):
        z_fwd,z_rvs = self.embed_and_split(x, pad=True)
        h_fwd_layers,h_rvs_layers = self.transform(z_fwd, z_rvs)

        # concatenate hidden layers together
        packed = type(z_fwd) is PackedSequence
        concat = []
        for h_fwd,h_rvs in zip(h_fwd_layers,h_rvs_layers):
            if packed:
                h_fwd,batch_sizes = pad_packed_sequence(h_fwd, batch_first=True)
                h_rvs,batch_sizes = pad_packed_sequence(h_rvs, batch_first=True)
            # discard last element of h_fwd and first element of h_rvs
            h_fwd = h_fwd[:,:-1]
            h_rvs = h_rvs[:,1:]

            # accumulate for concatenation
            concat.append(h_fwd)
            concat.append(h_rvs)

        h = torch.cat(concat, 2)
        if packed:
            batch_sizes = [s-1 for s in batch_sizes]
            h = pack_padded_sequence(h, batch_sizes, batch_first=True)

        return h

    def forward(self, x):
        # x's are already flanked by the start/stop token as:
        # [stop, x, stop]
        z_fwd,z_rvs = self.embed_and_split(x, pad=False)

        h_fwd,h_rvs = self.transform(z_fwd, z_rvs, last_only=True)

        packed = type(z_fwd) is PackedSequence
        if packed:
            h_flat = h_fwd.data
            logp_fwd = self.linear(h_flat)
            logp_fwd = PackedSequence(logp_fwd, h_fwd.batch_sizes)

            h_flat = h_rvs.data
            logp_rvs = self.linear(h_flat)
            logp_rvs = PackedSequence(logp_rvs, h_rvs.batch_sizes)

            logp_fwd,batch_sizes = pad_packed_sequence(logp_fwd, batch_first=True)
            logp_rvs,batch_sizes = pad_packed_sequence(logp_rvs, batch_first=True)

        else:
            b = h_fwd.size(0)
            n = h_fwd.size(1)
            h_flat = h_fwd.contiguous().view(-1, h_fwd.size(2))
            logp_fwd = self.linear(h_flat)
            logp_fwd = logp_fwd.view(b, n, -1)

            h_flat = h_rvs.contiguous().view(-1, h_rvs.size(2))
            logp_rvs = self.linear(h_flat)
            logp_rvs = logp_rvs.view(b, n, -1)

        # prepend forward logp with zero
        # append reverse logp with zero

        b = h_fwd.size(0)
        zero = h_fwd.data.new(b,1,logp_fwd.size(2)).zero_()
        logp_fwd = torch.cat([zero, logp_fwd], 1)
        logp_rvs = torch.cat([logp_rvs, zero], 1)

        logp = F.log_softmax(logp_fwd + logp_rvs, dim=2)

        if packed:
            batch_sizes = [s+1 for s in batch_sizes]
            logp = pack_padded_sequence(logp, batch_sizes, batch_first=True)

        return logp
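
The padding and splitting done by `embed_and_split(x, pad=True)` is easier to see on token indices than on embedded tensors. The sketch below mirrors only the index bookkeeping for a single unpacked sequence: shift every token by one so that 0 is free for the start/stop token, flank the sequence with 0, then take `[stop, x]` for the forward direction and the reversed `[x, stop]` for the reverse direction. `flank_and_split` is a hypothetical helper and the token values are made up.

```python
def flank_and_split(tokens):
    """Return (padded, forward_input, reverse_input) for one token sequence."""
    # shift tokens by 1 so index 0 can serve as the start/stop token
    padded = [0] + [t + 1 for t in tokens] + [0]
    forward_input = padded[:-1]        # [stop, x]
    reverse_input = padded[1:][::-1]   # [x, stop], reversed
    return padded, forward_input, reverse_input


padded, fwd, rvs = flank_and_split([3, 0, 7])
```

Dropping the last element on the forward side and the first on the reverse side is what keeps each direction from seeing the residue it is asked to predict.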
PID,Category
negative38|0,0
negative135|0,0
negative243|0,0
negative264|0,0
negative36|0,0
negative162|0,0
negative19|0,0
negative221|0,0
negative113|0,0
negative61|0,0
negative277|0,0
negative66|0,0
negative75|0,0
negative184|0,0
negative186|0,0
negative115|0,0
negative276|0,0
negative6|0,0
negative142|0,0
negative12|0,0
negative82|0,0
negative171|0,0
negative303|0,0
negative181|0,0
negative20|0,0
negative241|0,0
negative76|0,0
negative260|0,0
negative57|0,0
negative9|0,0
negative25|0,0
negative50|0,0
negative198|0,0
negative267|0,0
negative218|0,0
negative33|0,0
negative262|0,0
negative169|0,0
negative132|0,0
negative252|0,0
negative46|0,0
negative284|0,0
negative291|0,0
negative253|0,0
negative14|0,0
negative84|0,0
negative35|0,0
negative62|0,0
negative119|0,0
negative232|0,0
negative182|0,0
negative295|0,0
negative190|0,0
negative164|0,0
negative296|0,0
negative167|0,0
negative45|0,0
negative211|0,0
negative189|0,0
negative143|0,0
negative287|0,0
negative161|0,0
negative230|0,0
negative178|0,0
negative3|0,0
negative271|0,0
negative70|0,0
negative107|0,0
negative259|0,0
negative204|0,0
negative85|0,0
negative172|0,0
negative7|0,0
negative157|0,0
negative129|0,0
negative31|0,0
negative149|0,0
negative128|0,0
negative16|0,0
negative175|0,0
negative248|0,0
negative265|0,0
negative43|0,0
negative109|0,0
negative44|0,0
negative205|0,0
negative200|0,0
negative56|0,0
negative240|0,0
negative140|0,0
negative133|0,0
negative275|0,0
negative247|0,0
negative242|0,0
negative91|0,0
negative92|0,0
negative281|0,0
negative166|0,0
negative250|0,0
negative199|0,0
negative59|0,0
negative18|0,0
negative225|0,0
negative69|0,0
negative90|0,0
negative64|0,0
negative301|0,0
negative206|0,0
negative95|0,0
negative147|0,0
negative210|0,0
negative2|0,0
negative106|0,0
negative183|0,0
negative197|0,0
negative236|0,0
negative78|0,0
negative269|0,0
negative215|0,0
negative134|0,0
negative302|0,0
negative188|0,0
negative233|0,0
negative153|0,0
negative258|0,0
negative48|0,0
negative68|0,0
negative27|0,0
negative194|0,0
negative224|0,0
negative104|0,0
negative65|0,0
negative17|0,0
negative136|0,0
negative251|0,0
negative63|0,0
negative165|0,0
negative229|0,0
negative257|0,0
negative280|0,0
negative152|0,0
negative77|0,0
negative49|0,0
negative168|0,0
negative89|0,0
negative117|0,0
negative79|0,0
negative130|0,0
negative58|0,0
negative288|0,0
negative102|0,0
negative144|0,0
negative219|0,0
negative41|0,0
negative173|0,0
negative237|0,0
negative86|0,0
negative244|0,0
negative116|0,0
negative254|0,0
negative30|0,0
negative15|0,0
negative123|0,0
negative101|0,0
negative40|0,0
negative97|0,0
negative174|0,0
negative238|0,0
negative74|0,0
negative293|0,0
negative23|0,0
negative81|0,0
negative5|0,0
negative120|0,0
negative87|0,0
negative151|0,0
negative297|0,0
negative227|0,0
negative146|0,0
negative47|0,0
negative125|0,0
negative114|0,0
negative282|0,0
negative179|0,0
negative4|0,0
negative246|0,0
negative298|0,0
negative239|0,0
negative268|0,0
negative100|0,0
negative304|0,0
negative155|0,0
negative28|0,0
negative131|0,0
negative94|0,0
negative283|0,0
negative158|0,0
negative8|0,0
negative261|0,0
negative176|0,0
negative208|0,0
negative83|0,0
negative124|0,0
negative279|0,0
negative170|0,0
negative273|0,0
negative93|0,0
negative255|0,0
negative207|0,0
negative223|0,0
negative24|0,0
negative145|0,0
negative122|0,0
negative99|0,0
negative220|0,0
negative196|0,0
negative154|0,0
negative29|0,0
negative228|0,0
negative292|0,0
negative71|0,0
negative105|0,0
negative278|0,0
negative202|0,0
negative187|0,0
negative60|0,0
negative103|0,0
negative26|0,0
negative177|0,0
negative201|0,0
negative159|0,0
negative192|0,0
negative156|0,0
negative266|0,0
negative13|0,0
negative98|0,0
negative39|0,0
negative22|0,0
negative234|0,0
negative226|0,0
negative185|0,0
negative121|0,0
negative212|0,0
positive29|1,1
positive69|1,1
positive4|1,1
positive139|1,1
positive123|1,1
positive21|1,1
positive117|1,1
positive86|1,1
positive109|1,1
positive24|1,1
positive76|1,1
positive30|1,1
positive72|1,1
positive77|1,1
positive7|1,1
positive133|1,1
positive145|1,1
positive147|1,1
positive34|1,1
positive148|1,1
positive82|1,1
positive115|1,1
positive136|1,1
positive33|1,1
positive132|1,1
positive78|1,1
positive46|1,1
positive121|1,1
positive8|1,1
positive108|1,1
positive90|1,1
positive114|1,1
positive146|1,1
positive57|1,1
positive95|1,1
positive106|1,1
positive141|1,1
positive41|1,1
positive50|1,1
positive14|1,1
positive74|1,1
positive100|1,1
positive10|1,1
positive66|1,1
positive94|1,1
positive45|1,1
positive137|1,1
positive36|1,1
positive35|1,1
positive64|1,1
positive87|1,1
positive142|1,1
positive84|1,1
positive5|1,1
positive28|1,1
positive128|1,1
positive67|1,1
positive62|1,1
positive44|1,1
positive138|1,1
positive22|1,1
positive103|1,1
positive96|1,1
positive104|1,1
positive15|1,1
positive99|1,1
positive80|1,1
positive17|1,1
positive119|1,1
positive43|1,1
positive65|1,1
positive113|1,1
positive151|1,1
positive49|1,1
positive144|1,1
positive11|1,1
positive12|1,1
positive122|1,1
positive73|1,1
positive54|1,1
positive23|1,1
positive13|1,1
positive126|1,1
positive97|1,1
positive39|1,1
positive25|1,1
positive79|1,1
positive83|1,1
positive85|1,1
positive26|1,1
positive32|1,1
positive92|1,1
positive9|1,1
positive3|1,1
positive20|1,1
positive134|1,1
positive110|1,1
positive102|1,1
positive42|1,1
positive60|1,1
positive16|1,1
positive58|1,1
positive53|1,1
positive31|1,1
positive130|1,1
positive93|1,1
positive124|1,1
positive71|1,1
positive89|1,1
positive51|1,1
positive107|1,1
positive37|1,1
positive59|1,1
positive150|1,1
positive61|1,1
positive27|1,1
positive143|1,1
positive19|1,1
positive6|1,1
positive105|1,1
positive63|1,1
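
The label table above has two columns, `PID` and `Category`, and each PID carries its label again after a `|`. The following is a minimal standard-library sketch for reading such a table and counting the classes. The three data rows are copied from the table; `count_labels` is a hypothetical helper, and reading from an in-memory string rather than a file is an assumption made to keep the example self-contained.

```python
import csv
import io


def count_labels(text):
    """Count rows per Category in a PID,Category table given as CSV text."""
    counts = {}
    for row in csv.DictReader(io.StringIO(text)):
        # the part after '|' in the PID repeats the category label
        name, _, suffix = row['PID'].rpartition('|')
        if suffix != row['Category']:
            raise ValueError('label mismatch for ' + row['PID'])
        counts[row['Category']] = counts.get(row['Category'], 0) + 1
    return counts


sample = """PID,Category
negative38|0,0
negative135|0,0
positive29|1,1
"""

counts = count_labels(sample)
```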
joblib==1.0.1
tape_proteins==0.4
torch==1.2.0+cpu
numpy==1.19.2
pandas==1.2.0
Bio==0.4.1
tape==1.0